Skip to content

Fix agx crash on gui_init in AgX#21010

Open
masterpiga wants to merge 1 commit into
darktable-org:masterfrom
masterpiga:fix_agx_init
Open

Fix agx crash on gui_init in AgX#21010
masterpiga wants to merge 1 commit into
darktable-org:masterfrom
masterpiga:fix_agx_init

Conversation

@masterpiga
Copy link
Copy Markdown
Collaborator

@masterpiga masterpiga commented May 13, 2026

This caused darktable to crash occasionally when entering the darkroom.

Cause: During gui_init, dt_bauhaus_slider_set_soft_range() on the curve_contrast_around_pivot slider fires a signal chain → _update_curve_warnings → tries to dereference curve_toe_power/curve_shoulder_power which may not be assigned yet.
Fix: Added if(widget) guards before the two dt_bauhaus_widget_set_quad_paint() calls.

Co-authored with Claude.

@masterpiga masterpiga added bugfix pull request fixing a bug priority: high core features are broken and not usable at all, software crashes scope: image processing correcting pixels labels May 13, 2026
Copy link
Copy Markdown
Member

@TurboGit TurboGit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove submodules from commit. TIA.

Comment thread src/tests/integration
Comment thread src/external/lua-scripts
@TurboGit TurboGit added this to the 5.6 milestone May 13, 2026
@masterpiga
Copy link
Copy Markdown
Collaborator Author

Please remove submodules from commit. TIA.

Uups sorry, removed, thanks!

@masterpiga masterpiga requested a review from TurboGit May 13, 2026 17:30
@kofa73
Copy link
Copy Markdown
Collaborator

kofa73 commented May 13, 2026

There may be a better alternative. I'll detail, just wanted to say we should not merge, yet.

@kofa73
Copy link
Copy Markdown
Collaborator

kofa73 commented May 13, 2026

  • gui_init is called by imageop.c#dt_iop_gui_init, which increments darktable.gui->reset:
    void dt_iop_gui_init(dt_iop_module_t *module)
    {
      ++darktable.gui->reset;
      --darktable.bauhaus->skip_accel;
      dt_pthread_mutex_init(&module->gui_lock, NULL);
      if(module->gui_init) module->gui_init(module);
      ++darktable.bauhaus->skip_accel;
      --darktable.gui->reset;
    }
  • bauhaus.c#set_soft_range calls dt_bauhaus_slider_set_soft_min, dt_bauhaus_slider_set_soft_max, both of which call dt_bauhaus_slider_set, which then calls _slider_set_normalized. There, the reset check blocks the actual update:
    static void _slider_set_normalized(dt_bauhaus_widget_t *w, float pos)
    {
      dt_bauhaus_slider_data_t *d = &w->slider;
      // ...
      if(!darktable.gui->reset)
      {
        d->is_changed = -1;
        _slider_value_change(w);
      }
    }

So, if I understand correctly, this should not happen.
I have the feeling it's the same issue I found in #19745, which I was then unable to ever replicate.

I have my best agents :-) on the case. Will keep you posted.

@masterpiga
Copy link
Copy Markdown
Collaborator Author

masterpiga commented May 13, 2026

It certainly seems related to the issue that #19745 tried to fix, if not exactly the same.

How I triggered it (twice in a row):

macOS, master branch synced to yesterday or the day before. There is a bug in the darkroom zoom (which I addressed separately in #21011) which caused the zoom level to get stuck in a kind of ping-pong situation. I.e., I would try to zoom out after zooming in, but the zoom would bounce back to where it was before, or not move at all.

I tried to exit the darkroom and re-enter, to see if that would "reset" the zoom somehow. Upon re-entering the darkroom, I got the spinning wheel of death and then a nice crash report, pointing to a NULL access that Claude traced back to those widgets not being initialized when they are accessed.

@kofa73
Copy link
Copy Markdown
Collaborator

kofa73 commented May 13, 2026

So maybe there's some other bug that breaks the 'reset' protection.

How about using a single if? Those sliders should be created together; if we hit this code with either of them being null, something weird is going on anyway.

  if(!g->basic_curve_controls.curve_toe_power
     || !g->basic_curve_controls.curve_shoulder_power)
    return;

Instead of the 2 if statements.

@kofa73
Copy link
Copy Markdown
Collaborator

kofa73 commented May 13, 2026

Hmm.... #20723

re-entrant GTK event processing, allowing other code paths to run their own ++/-- cycles on darktable.gui->reset

I'll dig further tomorrow.

@zisoft
Copy link
Copy Markdown
Collaborator

zisoft commented May 14, 2026

  • bauhaus.c#set_soft_range calls dt_bauhaus_slider_set_soft_min, dt_bauhaus_slider_set_soft_max, both of which call dt_bauhaus_slider_set, which then calls _slider_set_normalized. There, the reset check blocks the actual update

Maybe also the reason for #17236 ?

@masterpiga
Copy link
Copy Markdown
Collaborator Author

masterpiga commented May 14, 2026

How about using a single if?

Since the two widgets are used in two separate statements I think it's tidier if they are checked for independently. However, I don't feel strongly about it, so happy to change it if you think that it's important.

Maybe also the reason for #17236 ?

PR #20723 that Kofa linked above is supposed to fix that. That is a very elusive bug. It was affecting me until I started building from sources, then I have never ever seen it happen again.

That said, the change that I am proposing shouldn't be very controversial. In the worst case it is a no-op, and anyways it adds a NULL check which is always a good practice IMHO. I agree that we want to understand why something that should not be NULL actually is, but that investigation can happen independently.

@kofa73
Copy link
Copy Markdown
Collaborator

kofa73 commented May 14, 2026

However, I don't feel strongly about it, so happy to change it if you think that it's important.

I just thought that's less code, easier to read.

This is defensive code, certainly not controversial, but something seems to be off in darktable (or in a compiler - "It was affecting me until I started building from sources, then I have never ever seen it happen again"), as all the calls that could trigger that path (including dt_bauhaus_slider_set_soft_range()) seem to be protected using darktable->reset, and yet we still reach the code.

That was also the conclusion before (see #19745 (comment) and below), and @dterrahe , who knows the UI inside-out, also agreed.
There, the call path was:

#7  dt_bauhaus_widget_set_quad_paint (widget=0x0, f=0x0, paint_flags=16, paint_data=0x0) at darktable/src/bauhaus/bauhaus.c:1100
#8  0x00007bc65ff842b0 in _update_curve_warnings (self=<optimized out>) at darktable/src/iop/agx.c:2089
#9  0x00007bc65ff87d2d in gui_changed (self=0x60605ccdf140, widget=<optimized out>, previous=0x7fffb5911754) at darktable/src/iop/agx.c:2144
#10 0x00007bc6ddc5fd88 in dt_iop_gui_changed (action=0x60605ccdf140, widget=widget@entry=0x60605ffa6650, data=data@entry=0x7fffb5911754) at darktable/src/develop/imageop.c:3969
#11 0x00007bc6ddaec60f in _slider_value_change (w=0x60605ffa6650) at darktable/src/bauhaus/bauhaus.c:3384
#12 0x00007bc6ddaf08f6 in dt_bauhaus_slider_set_soft_range (widget=0x60605ffa6650, soft_min=<optimized out>, soft_max=1) at darktable/src/bauhaus/bauhaus.c:1017
#13 0x00007bc65ff8820f in _create_basic_curve_controls_box (self=0x60605ccdf140, g=0x60605ebf5d80) at darktable/src/iop/agx.c:2181
#14 gui_init (self=0x60605ccdf140) at darktable/src/iop/agx.c:2800

The call to dt_iop_gui_changed from _slider_value_change should have been prevented by the reset guard in _slider_set_normalized, which seems to have been optimised away (dt_bauhaus_slider_set_soft_range -> dt_bauhaus_slider_set_soft_min -> dt_bauhaus_slider_set -> _slider_set_normalized -> (guarded) _slider_value_change).
It this was a compiler issue in our case (the guard being missing, removed during compilation), I'd expect the crash to occur all the time.

So, your fix avoids the crash, and is certainly beneficial - thanks! It'd be great to find the root cause, though, as other modules (current or future) could also be affected. (Probably not as part of this PR, though.)

@TurboGit
Copy link
Copy Markdown
Member

One possible explanation is that on some compiler / architecture :

++darktable.gui->reset;
...
--darktable.gui->reset;

Aren't atomic in some context. What about replacing them by:

dt_atomic_add_int() from atomic.h?

Maybe even adding an dt_atomic_incr_int() and dt_atomic_decr_int().

@ralfbrown
Copy link
Copy Markdown
Collaborator

I encountered the exact crash this PR is trying to fix a couple of days ago (Linux, GCC 12.3.0), but hadn't gotten around to making a PR of my own.

If all paths to the problematic function are indeed protected with reset++, then a comment I left recently about that variable possibly needing to be made atomic gains confidence - the compiler can't optimize away or reorder atomic increment/decrement the way it can for a simple variable.

@kofa73
Copy link
Copy Markdown
Collaborator

kofa73 commented May 14, 2026

But wouldn't a compiler optimising the protection away always result in a crash? The call path is deterministic.

@TurboGit
Copy link
Copy Markdown
Member

No and that's the beauty of a race condition. It all depends on the scheduler and sometime very hard to reproduce. You can live for years with a race condition without noticing it.

@TurboGit
Copy link
Copy Markdown
Member

@masterpiga @kofa73 : Let me do the atomic change, we'll see if this help fixing this issue.

@kofa73
Copy link
Copy Markdown
Collaborator

kofa73 commented May 14, 2026

I would understand the atomic stuff. I was referring to the compiler removing the protection. Without the reset protection, there is a straight path from gui_init to the crash site, as @masterpiga outlined. If either incrementing the reset counter before gui_init, or checking it in _slider_set_normalized is removed, there would be a deterministic crash.

Adding the atomics won't hurt - but can we reproduce this to test? I encountered the crash in November, and nothing since. @masterpiga seems to see it more often, though.

@TurboGit
Copy link
Copy Markdown
Member

It is even worse than that, block like:

  if(darktable.gui->reset) return;

  ++darktable.gui->reset;

Is completely wrong. This should also be made full atomic... I'm on it.

@TurboGit
Copy link
Copy Markdown
Member

See #21026

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bugfix pull request fixing a bug priority: high core features are broken and not usable at all, software crashes scope: image processing correcting pixels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants