Fix integer overflow and undefined behaviour in SAV reader#375
Open
aaranmcguire wants to merge 1 commit intoWizardMac:devfrom
Open
Fix integer overflow and undefined behaviour in SAV reader#375aaranmcguire wants to merge 1 commit intoWizardMac:devfrom
aaranmcguire wants to merge 1 commit intoWizardMac:devfrom
Conversation
Four arithmetic issues in spss/readstat_sav_read.c: 1. size*count multiplication overflow (sav_parse_records_pass1 and pass2): Two uint32_t values from the file are assigned to size_t then multiplied. On 32-bit platforms the product can overflow, producing a data_len of 0 which causes the parser to skip the record and continue reading at the wrong file position. 2. abs(n_missing_values) undefined behaviour (sav_skip_variable_record): When n_missing_values == INT_MIN, abs() is undefined. On two's- complement targets it typically returns INT_MIN (negative), which when used as a seek offset produces a large wrong seek. Replaced with an explicit conditional and a validity range check (> 3 is always invalid per SPSS spec). 3. uint32 label_len overflow (sav_read_variable_label): label_len near UINT32_MAX causes (label_len+3)/4*4 to wrap to 0, resulting in readstat_malloc(0). Added a 0xFFFF sanity bound (well above the SPSS limit of 255 characters). 4. uint32_t buffer length overflow in long-string value labels: value_buffer_len and label_buffer_len are uint32_t; value_len*4+1 overflows when value_len >= 0x40000000, producing a tiny realloc target. Changed to size_t with an explicit > 0xFFFFFF guard.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
PR 4: Fix integer overflow and undefined behaviour in SAV reader
Summary
Four integer-arithmetic issues in
spss/readstat_sav_read.c:size * countmultiplication overflow — file-controlleduint32valuesmultiplied without overflow check, used as allocation size and seek distance.
abs(INT_MIN)undefined behaviour — callingabs()onINT_MINis UBin C; on two's-complement targets it typically returns
INT_MIN(negative),producing a huge seek.
uint32variable-label length overflow — a 32-bit length field nearUINT32_MAXcauses(label_len + 3) / 4 * 4to wrap to 0, leading to azero-byte
malloc.uint32_tbuffer-length overflow in long-string value labels — twouint32_tvariables hold lengths computed aslen * 4 + 1; whenlen≥0x40000000the multiplication wraps to a small number, causingreadstat_reallocto return a tiny buffer thatreadstat_convertthenoverflows.
Bug 1:
size * countmultiplication overflowVulnerable code (same pattern at two locations, pass-1 and pass-2)
Why it is exploitable
On 32-bit platforms where
size_tis 32 bits: ifsize = 0x80000000andcount = 2, thensize * count = 0x100000000which wraps to 0.readstat_realloc(data_buf, data_len=0):readstat_realloc(p, 0)returns NULL →READSTAT_ERROR_MALLOC, which isa safe failure. ✓
data_lenbytes:io->seek(0, ...)is a no-op,so the parser continues reading the next record at the wrong file position.
Subsequent parsing produces garbage or, depending on what happens to be at
that offset, a crash.
On 64-bit the widened multiply cannot overflow (uint32 × uint32 ≤ 2^64).
Reproduction
Fix
Applied at both the pass-1 (
sav_parse_records_pass1) and pass-2(
sav_parse_records_pass2) locations.Bug 2:
abs(n_missing_values)undefined behaviourVulnerable code
Why it is undefined behaviour
n_missing_valuesis a signed 32-bit integer read from the file. SPSS usesnegative values (−1, −2, −3) to encode range-type missing specifications.
If the file sets
n_missing_values = INT_MIN(0x80000000), thenabs(INT_MIN)is undefined behaviour in C. On all common two's-complementtargets it returns
INT_MINitself (−2147483648). Cast to an unsigned offset:(off_t)(−2147483648) * 8 = −17179869184— a large negative seek — or on a32-bit
off_t, wraps to a large positive, seeking past the end of the file.The later
sav_read_variable_missing_valuesvalidates to[-3, 3], butsav_skip_variable_record(pass-1) runs before that function and has no suchguard.
Reproduction
Fix
Bug 3:
uint32variable-label length wraps to 0Vulnerable code
Why it matters
If
label_len = 0xFFFFFFFD(just 3 belowUINT32_MAX):The error is handled safely (returns
READSTAT_ERROR_MALLOC) but thediagnostic is misleading — the real issue is an impossibly large label, not
a genuine OOM. More importantly, label lengths this large can only come from
malformed or malicious files.
Fix
Add an explicit sanity bound that matches the maximum SPSS label length
(255 characters):
Bug 4:
uint32_toverflow in long-string value-label buffer lengthsVulnerable code
How the heap write occurs
value_lenis auint32_tfrom the file.value_buffer_lenis alsouint32_t, so the arithmetic isuint32_t × 4 + 1.If
value_len = 0x40000001:readstat_realloc(ptr, 5)succeeds — 5 bytes allocated.Then
readstat_convertis called with an output length ofvalue_buffer_len(5), but it reads
value_len(0x40000001 ≈ 1 GB) bytes from the file into a5-byte buffer.
readstat_convertinternally clips to itsdst_lenparameter— but
dst_lenhere is 5, notvalue_len * 4 + 1, soreadstat_convertcorrectly writes at most 5 bytes.
However, the file is then advanced by
value_lenbytes viaio->read.Because the file isn't actually 1 GB, this read returns short and
READSTAT_ERROR_PARSEfires. The out-of-bounds write doesn't materialiseon real files, but the logic is fragile and the declaration type is wrong.
Fix
Change the buffer-length type to
size_tand guard against huge values: