Fix agx crash on gui_init in AgX #21010
TurboGit
left a comment
Please remove submodules from commit. TIA.
Oops, sorry. Removed, thanks! |
|
There may be a better alternative. I'll add details later; I just wanted to say that we should not merge yet. |
So, if I understand correctly, this should not happen. I have my best agents :-) on the case. Will keep you posted. |
|
It certainly seems related to the issue that #19745 tried to fix, if not exactly the same. How I triggered it (twice in a row): macOS, master branch synced to yesterday or the day before. There is a bug in the darkroom zoom (which I addressed separately in #21011) which caused the zoom level to get stuck in a kind of ping-pong situation. I.e., I would try to zoom out after zooming in, but the zoom would bounce back to where it was before, or not move at all. I tried to exit the darkroom and re-enter, to see if that would "reset" the zoom somehow. Upon re-entering the darkroom, I got the spinning wheel of death and then a nice crash report, pointing to a NULL access that Claude traced back to those widgets not being initialized when they are accessed. |
|
So maybe there's some other bug that breaks the 'reset' protection. How about using a single
if(!g->basic_curve_controls.curve_toe_power
   || !g->basic_curve_controls.curve_shoulder_power)
  return;
instead of the two separate checks? |
|
Hmm.... #20723
I'll dig further tomorrow. |
Maybe this is also the reason for #17236? |
Since the two widgets are used in two separate statements I think it's tidier if they are checked for independently. However, I don't feel strongly about it, so happy to change it if you think that it's important.
PR #20723 that Kofa linked above is supposed to fix that. That is a very elusive bug. It was affecting me until I started building from sources, then I have never ever seen it happen again. That said, the change that I am proposing shouldn't be very controversial. In the worst case it is a no-op, and anyways it adds a NULL check which is always a good practice IMHO. I agree that we want to understand why something that should not be NULL actually is, but that investigation can happen independently. |
I just thought that's less code, easier to read.
This is defensive code, certainly not controversial, but something seems to be off in darktable (or in a compiler - "It was affecting me until I started building from sources, then I have never ever seen it happen again"), as all the calls that could trigger that path (including
That was also the conclusion before (see #19745 (comment) and below), and @dterrahe, who knows the UI inside-out, also agreed. The call to
So, your fix avoids the crash, and is certainly beneficial - thanks! It'd be great to find the root cause, though, as other modules (current or future) could also be affected. (Probably not as part of this PR, though.) |
|
One possible explanation is that on some compilers / architectures:
aren't atomic in some contexts. What about replacing them by:
Maybe even adding an |
|
I encountered the exact crash this PR is trying to fix a couple of days ago (Linux, GCC 12.3.0), but hadn't gotten around to making a PR of my own. If all paths to the problematic function are indeed protected with reset++, then a comment I left recently about that variable possibly needing to be made atomic gains confidence - the compiler can't optimize away or reorder atomic increment/decrement the way it can for a simple variable. |
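To make the atomic idea concrete, here is a minimal sketch of a reset counter built on C11 `<stdatomic.h>`. The names `gui_reset`, `reset_begin`, `reset_end` and `on_value_changed` are hypothetical stand-ins for illustration, not darktable's actual declarations, and the sketch only shows the increment/decrement/check pattern, not the real call path.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the GUI "reset" counter discussed above;
 * this is NOT darktable's actual declaration. */
static atomic_int gui_reset = 0;

/* Counts how many callbacks were actually allowed to run. */
static int callbacks_run = 0;

/* Enter a section in which widget callbacks must be ignored.
 * The atomic read-modify-write cannot be torn or elided by the compiler. */
static void reset_begin(void)
{
  atomic_fetch_add(&gui_reset, 1);
}

/* Leave the protected section. */
static void reset_end(void)
{
  atomic_fetch_sub(&gui_reset, 1);
}

/* Hypothetical widget callback: bails out while any reset section is open. */
static bool on_value_changed(void)
{
  if(atomic_load(&gui_reset) > 0) return false;
  callbacks_run++;
  return true;
}
```

The counter nests, so a callback stays suppressed until every `reset_begin()` has been matched by a `reset_end()`. Whether this would actually remove the crash is exactly the open question in this thread.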
|
But wouldn't a compiler optimising the protection away always result in a crash? The call path is deterministic. |
|
No, and that's the beauty of a race condition. It all depends on the scheduler, and it is sometimes very hard to reproduce. You can live for years with a race condition without noticing it. |
|
@masterpiga @kofa73: Let me do the atomic change, and we'll see if this helps fix the issue. |
|
I would understand the atomic stuff. I was referring to the compiler removing the protection. Without the reset protection, there is a straight path from
Adding the atomics won't hurt - but can we reproduce this to test? I encountered the crash in November, and nothing since. @masterpiga seems to see it more often, though. |
|
It is even worse than that, a block like:
is completely wrong. This should also be made fully atomic... I'm on it. |
|
See #21026 |
This caused darktable to crash occasionally when entering the darkroom.
Cause: During gui_init, dt_bauhaus_slider_set_soft_range() on the curve_contrast_around_pivot slider fires a signal chain → _update_curve_warnings → tries to dereference curve_toe_power/curve_shoulder_power, which may not be assigned yet.
Fix: Added if(widget) guards before the two dt_bauhaus_widget_set_quad_paint() calls.
Co-authored with Claude.
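
The guard described in the fix can be sketched roughly as follows. This is a self-contained illustration under stated assumptions, not the code from the PR: `widget_t`, `curve_controls_t`, `set_quad_paint` and `update_curve_warnings` are hypothetical stand-ins, and the real dt_bauhaus_widget_set_quad_paint() has a different signature.

```c
#include <stddef.h>

/* Hypothetical minimal widget; the real code uses GtkWidget pointers. */
typedef struct widget_t
{
  int quad_painted;
} widget_t;

/* Hypothetical subset of the module's GUI data. */
typedef struct curve_controls_t
{
  widget_t *curve_toe_power;
  widget_t *curve_shoulder_power;
} curve_controls_t;

/* Stand-in for dt_bauhaus_widget_set_quad_paint(); signature is an assumption. */
static void set_quad_paint(widget_t *w, int on)
{
  w->quad_painted = on;
}

/* Each widget is checked independently, so a callback fired early in
 * gui_init skips widgets that have not been assigned yet instead of
 * dereferencing NULL. Returns the number of widgets updated. */
static int update_curve_warnings(curve_controls_t *c, int warn_toe, int warn_shoulder)
{
  int updated = 0;
  if(c->curve_toe_power)
  {
    set_quad_paint(c->curve_toe_power, warn_toe);
    updated++;
  }
  if(c->curve_shoulder_power)
  {
    set_quad_paint(c->curve_shoulder_power, warn_shoulder);
    updated++;
  }
  return updated;
}
```

Checking the two pointers separately means a half-initialized GUI still updates whichever widget already exists; the single combined check suggested in the conversation would skip both.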