Fix incorrect snprintf format specifier and undefined bsearch comparator#377
Open
aaranmcguire wants to merge 1 commit intoWizardMac:devfrom
Open
Fix incorrect snprintf format specifier and undefined bsearch comparator#377aaranmcguire wants to merge 1 commit intoWizardMac:devfrom
aaranmcguire wants to merge 1 commit intoWizardMac:devfrom
Conversation
1. spss/readstat_sav_parse.c: %*s should be %.*s snprintf uses %*s (minimum field width — pads with spaces) where %.*s (precision — truncates to N chars) is intended. This means if temp_val is longer than str_len, the full string is copied rather than a str_len-character prefix. The same function also does memcpy(temp_val, str_start, str_len) where temp_val is char[65]. If str_len > 64 the memcpy overflows the stack buffer. Added a bounds check before the memcpy. Note: this file is generated by Ragel from readstat_sav_parse.rl. The fix is applied to both the generated .c and should be applied to the .rl source as well. 2. stata/readstat_dta_read.c: dta_compare_strls returns uint64 diff dta_compare_strls returns key->o - target->o where both are uint64_t. When the difference exceeds INT_MAX, the truncated int has the wrong sign, causing bsearch to navigate the wrong direction — potentially returning NULL (strL silently replaced with empty string) or a wrong entry (silent data corruption). Replaced with safe three-way comparison: (a>b)-(a<b) evaluates to +1, 0, or -1 for any comparable type without overflow.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
PR 6: Fix incorrect
snprintfformat specifier and undefined comparator behaviourSummary
Two unrelated but both definitively-wrong code patterns:
%*sshould be%.*s— in the SAV long-variable-names parser,snprintfuses%*s(minimum-width specifier) where%.*s(precisionspecifier, which truncates) is intended. Combined with a missing bounds
check on a
memcpytarget, a malformed SAV file can overflow a 64-bytestack buffer.
Unsigned subtraction returned as
intindta_compare_strls— thebsearchcomparator for Stata strL references returnskey->o - target->owhere both areuint64_t. When the true differenceexceeds
INT_MAX, the truncatedinthas the wrong sign, causingbsearchto navigate in the wrong direction and potentially return garbage pointers.
Bug 1:
%*svs%.*s+memcpyoverflow (spss/readstat_sav_parse.c)What
%*sand%.*sactually dosnprintf(dst, dst_size, "%*s", width, src)—*specifies minimumfield width (padding). If
srcis longer thanwidth, the fullsrcis copied (up to
dst_size). This is NOT truncation.snprintf(dst, dst_size, "%.*s", prec, src)—.*specifies precision(maximum characters from
src). This IS truncation.The code intends to copy at most
str_lencharacters oftemp_valintoinfo->longname.%*sdoes not achieve this — it copies all oftemp_val.%.*sis the correct specifier.memcpyoverflow intotemp_val[64]The same function also does:
temp_valis declaredchar temp_val[64+1]. If the Ragel grammar allowsstr_len > 64(which it does — the grammar constrains key characters butnot value length),
memcpyoverflowstemp_valonto the stack.Vulnerable code
Reproduction
With AddressSanitizer on a crafted SAV:
Fix
Note: This file is generated by Ragel from
readstat_sav_parse.rl. Thefix should be applied to the
.rlsource and the generated.cre-generated,or applied directly to the
.cif the.rlsource is not being maintained.Bug 2: Unsigned subtraction overflow in
dta_compare_strlsdta_strl_tfield typesVulnerable code
key->o - target->oisuint64_tsubtraction. The result is alwaysnon-negative (wraps on underflow). When cast to
int:key->o > target->oandkey->o - target->o > INT_MAX(e.g.ovalues differ by
0x80000001), the cast truncates to a negativeint.bsearchsees a negative return and goes left — the opposite directionfrom correct — potentially returning NULL or a wrong element.
What goes wrong
dta_compare_strlsis used as the comparator inbsearchover the strLlookup table. A wrong comparator makes
bsearchnavigate incorrectly, so:with empty string.
one at the queried
(v, o)coordinates — silent data corruption.The
vsubtraction is safe becauseuint16_tdifferences fit inint.Only
ohas the problem.Reproduction
A Stata DTA file (v117+) with at least two strL entries whose observation
indices (
o) differ by more thanINT_MAX(2^31−1):The bug is deterministic and format-dependent. A DTA file with
ovalues{0x100000001, 0x000000001}will causebsearchto go thewrong direction on the first comparison.
Fix
Use explicit three-way comparison that is safe for any unsigned type:
(a > b) - (a < b)evaluates to+1,0, or−1for any comparable type,without overflow.