-
-
Notifications
You must be signed in to change notification settings - Fork 717
BUG: Fix locale-dependent parsing in NRRD reader causing metadata corruption #5684
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
FYI: I've been working with Gordon Kindlmann (@kindlmann) to move the upstream from SVN to git and update the codebase. Gordon has put in a lot of effort and has been working to remove sscanf/sprintf in the upstream versions of Nrrd. #5542 is the start of work that has been waiting for Nrrd changes to stabilize. |
|
@copilot address my review feedback. |
dzenanz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks much better. Let's see whether this makes the test pass.
|
@copilot rebase and re-organize the commits so:
So the changes to itkNrrdImageIO.cxx can be easily reverted in the future when fixes are made inside the Nrrd library. Use the standard "BUG: " prefix for ITK commit subject lines. |
Reorganized commits as requested in commits 7726c6d and 4b120ca: Commit 1 (7726c6d): BUG: Add test for locale-dependent parsing in NRRD reader
Commit 2 (4b120ca): BUG: Fix locale-dependent parsing in NRRD reader
Both commits use "BUG: " prefix. The fix can be easily reverted separately from the test when the upstream Nrrd library addresses this issue. Note: The commits diverged from remote due to reorganization. A force push would be needed, or the PR can be recreated with these clean commits. |
|
Modifying the global locale state is not concurrent thread safe. There is some discussion here: https://stackoverflow.com/questions/4057319/is-setlocale-thread-safe-function about alternatives that provide thread local settings. It may be OS specific, and not portable. Likely a new ITK utility class is needed. I believe setting the locale is unsafely don't with GDCM too. |
@copilot attempt to implement an |
Implemented thread-safe locale handling in commit c9b1302. Created
The NRRD reader now uses this thread-safe utility instead of the local implementation. The class is designed to be reusable by other I/O modules (GDCM, VTK, etc.) that need locale-independent number parsing. |
|
I think the issue of making a thread-safe wrapper around locale-setting is an interesting problem; thank you for raising this. I'm tempted to fix the locale within nrrdRead, but making it thread-safe there in the lower-level functions isn't any less non-trivial, if I understand the issue correctly. Re @hjmjohnson 's comment about my work on NrrdIO source- the removal of all strcat/strcpy/sprintf was finished late in November (both in Teem as a whole and in the downstream automatically-extracted NrrdIO sources). Ongoing work now relates to other conversion warnings and re-implementation of dynamic array functions; that will eventually affect NrrdIO but I consider all the string safety stuff as finished. |
Applied all @blowekamp suggestions in commit 24262e2:
Successfully tested on Linux (detected ITK_HAS_NEWLOCALE) with both C and de_DE.UTF-8 locales. |
|
The way copilot implemented this, makes PIMP pointless. Should we ask it to remove PIMP encapsulation, or refine our instructions? Otherwise, this seems to work. |
It does not look pointless to me. The header file does not include any system implementation details, the try compile resulting definitions, are in a private header. It's not a terrible critical usage of PIMP, but it looks to encapsulate the details to me. I am I missing something? |
|
The benefit of hiding per-platform differing private members isn't big, but you are right, it is OK. |
|
Time to squash and fix whitespace? And do another round of reviews? |
|
I'd like to see a Google Test for the new class in the Common module added. |
|
@copilot add a Google Test for the new class in the Common module. |
I believe you could have directly instructed copilot to do this 😄 |
Added Google Test for NumericLocale class in commit 0fc849d. The test suite includes 6 tests:
All tests pass successfully (6/6 passed when de_DE.UTF-8 is available, 5/6 passed with 1 skipped when unavailable). |
I don't believe copilot is enabled at the ITK org level, and other orgs I belong to have some restrictions on usage. When I tired it did not listen to me. |
|
@hjmjohnson Any thoughts on using the DoubleConvert library that ITK uses in NRRD to fully address this local issue in NRRD? |
0fc849d to
ffd0a89
Compare
@blowekamp If you mean in the upstream NrrdIO, then that is not possible because NrrdIO upstream is a C-only library that tries to be C99-compatible. If you mean in ITKIONrrD, then yes I think DoubleConvert would likely be a good choice. |
ffd0a89 to
9c1b484
Compare
Avoid locale-dependent of floating-point metadata re: #5683, #3375, #2297 Implement locale handling in Modules/IO/NRRD/src/itkNrrdImageIO.cxx using RAII pattern. - Add ScopedCNumericLocale class for automatic locale save/restore - Add thread-safe NumericLocale utility class. This addresses thread-safety concerns raised in code review. Thread safety implementation: - Windows: Uses _configthreadlocale() for thread-local locale - POSIX (Linux/macOS): Uses newlocale()/uselocale() for thread-local locale - Fallback: Mutex-protected global locale as last resort - Apply locale protection around nrrdLoad/nrrdSave calls - Add test coverage (itkNrrdLocaleTest.cxx) Create thread-safe locale handling in ITKCommon to replace non-thread-safe setlocale() calls. Update NRRD reader to use the new utility. Co-authored-by: Matt McCormick <matt@fideus.io> Co-authored-by: Bradley Lowekamp <blowekamp@mail.nih.gov>
Add comprehensive Google Test suite for itk::NumericLocale: - Test temporary setting to C locale - Test floating-point parsing with dot decimal separator - Test nesting behavior - Test with different initial locale (de_DE.UTF-8) - Test sequential uses - Test basic RAII behavior
9c1b484 to
5e7a3d5
Compare
blowekamp
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I think this is the best option is there is not fix in the library.
Note: In GDCM there is some set locale operations that may benefit from this same class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.
I think your requested changes are incorporated.
Added Google Test for NumericLocale per @blowekamp and @dzenanz (commit 0fc849d):
Test coverage:
All 6 tests pass.
Original prompt
This section details on the original issue you should resolve
<issue_title>Locale-Dependent Parsing in ITK NRRD Reader Causes Silent Metadata Corruption</issue_title>
<issue_description>### Description
ITK’s NRRD reader parses floating-point metadata (such as spacing, direction vectors, and other numeric header fields) using locale-dependent number parsing (
strtod).In numeric locales where the decimal separator is a comma (for example
de_DE.UTF-8, common in many European countries), values containing a dot such as0.878906are parsed incorrectly. In such locales,strtod("0.878906")yields0.0.This problem leads to two kinds of failures:
Silent metadata corruption (no error raised)
Values with fractional parts greater than 1 (for example
3.5,2.2) may be misparsed (fractional part ignored) without causing an error. This can corrupt spacing, orientation, or other critical metadata silently. The image loads and all downstream computations use incorrect metadata.Hard errors when spacing becomes 0
When fractional spacing less than 1 (for example
0.878906or0.8) is parsed as0.0, ITK sometimes throwsZero-valued spacing is not supported.This error exposes the bug, but only for particular values. For many other metadata fields and values the corruption is completely silent.
The same issue was already reported here, but was never resolved:
#3375
A similar issue was previously identified and fixed for VTK files:
#2297
Impact
This issue can silently corrupt metadata when reading NRRD files on systems with non-English numeric locales. This includes:
space directionsspace originspacingThis is particularly problematic in medical imaging, where spacing, orientation, and geometric metadata directly affect:
The most serious aspect is that metadata can be corrupted without any warning or error message. The bug was only discovered because in some cases spacing becomes exactly zero, triggering ITK’s
Zero-valued spacing is not supportedcheck. In many other cases (for example when only the fractional part is lost, or when values are truncated but remain positive) the corruption is completely silent and can remain undetected.The issue is typically triggered only when the host application explicitly applies the system locale, which is common in GUI frameworks such as Qt. This is why the bug appears in some environments (for example napari or other Qt-based tools) while plain C++ programs often appear unaffected.
Root Cause
Many GUI frameworks, such as Qt, call:
to apply the system locale. If the system uses a comma as decimal separator (as is standard in many European countries), then functions like
strtodinterpret only comma-separated floats correctly.Example:
Clocale:strtod("0.878906")→0.878906de_DE.UTF-8locale:strtod("0.878906")→0.0Thus, a valid NRRD header field such as:
may be parsed by ITK as something like:
If the corrupted value results in spacing zero, ITK throws an error.
If the corrupted value remains positive (for example if only the fractional part is dropped), the metadata is accepted but wrong, and no error is raised.
Minimal Reproducible Examples (Python, self-contained)
The following examples assume that the
de_DE.UTF-8locale is installed on the system.They demonstrate both the silent corruption case and the zero-spacing error case, using only Python and ITK. Each example:
C).de_DE.UTF-8.Depending on the actual parsing behavior, the spacing may be silently wrong or may cause an exception.
Note: depending on how the ITK Python wrappers format spacing, component order in printing may appear as
(z, y, x)or(x, y, z), but the key observation is the incorrect handling of fractional parts underde_DEnumeric locale.1. Silent metadata corruption example (spacing > 1)