Commit e6110ef

gh-144759: Fix undefined behavior from NULL pointer arithmetic in lexer (#144788)
Guard against NULL pointer arithmetic in `_PyLexer_remember_fstring_buffers` and `_PyLexer_restore_fstring_buffers`. When `start` or `multi_line_start` is NULL (both are left uninitialized in `tok_mode_stack[0]`), computing `NULL - tok->buf` is undefined behavior. Add explicit NULL checks that store -1 as a sentinel offset and restore the pointer to NULL when that sentinel is found.

Add `test_lexer_buffer_realloc_with_null_start` to test_repl.py. It exercises the code path where the lexer buffer is reallocated while `tok_mode_stack[0]` has NULL `start`/`multi_line_start` pointers, which triggers `_PyLexer_remember_fstring_buffers` and verifies that the guarded code path completes successfully.
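The NULL-sentinel scheme described above can be sketched in isolation. `pointer_to_offset` and `offset_to_pointer` are hypothetical helper names used only for illustration; the patch writes these expressions inline. Like the patch, the sketch assumes a valid offset is never negative, so -1 is free to mean "no pointer".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not CPython API): convert a pointer into `buf` to an
 * offset. NULL is stored as the sentinel -1, so `NULL - buf` is never
 * evaluated. */
static ptrdiff_t
pointer_to_offset(const char *p, const char *buf)
{
    return p == NULL ? -1 : p - buf;
}

/* Hypothetical inverse: a negative (sentinel) offset maps back to NULL
 * instead of computing `buf + (-1)`. */
static char *
offset_to_pointer(ptrdiff_t offset, char *buf)
{
    return offset < 0 ? NULL : buf + offset;
}
```

For example, with `char buf[8] = "f'{x}'";`, `pointer_to_offset(buf + 2, buf)` yields 2 and `pointer_to_offset(NULL, buf)` yields -1; converting those values back gives `buf + 2` and `NULL`.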
1 parent 645f5c4 commit e6110ef

3 files changed (24 additions, 4 deletions)

Lib/test/test_repl.py

Lines changed: 16 additions & 0 deletions
@@ -143,6 +143,22 @@ def test_multiline_string_parsing(self):
         output = kill_python(p)
         self.assertEqual(p.returncode, 0)
 
+    @cpython_only
+    def test_lexer_buffer_realloc_with_null_start(self):
+        # gh-144759: NULL pointer arithmetic in the lexer when start and
+        # multi_line_start are NULL (uninitialized in tok_mode_stack[0])
+        # and the lexer buffer is reallocated while parsing long input.
+        long_value = "a" * 2000
+        user_input = dedent(f"""\
+        x = f'{{{long_value!r}}}'
+        print(x)
+        """)
+        p = spawn_repl()
+        p.stdin.write(user_input)
+        output = kill_python(p)
+        self.assertEqual(p.returncode, 0)
+        self.assertIn(long_value, output)
+
     def test_close_stdin(self):
         user_input = dedent('''
             import os

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+Fix undefined behavior in the lexer when ``start`` and ``multi_line_start``
+pointers are ``NULL`` in ``_PyLexer_remember_fstring_buffers()`` and
+``_PyLexer_restore_fstring_buffers()``. The ``NULL`` pointer arithmetic
+(``NULL - valid_pointer``) is now guarded with explicit ``NULL`` checks.

Parser/lexer/buffer.c

Lines changed: 4 additions & 4 deletions
@@ -13,8 +13,8 @@ _PyLexer_remember_fstring_buffers(struct tok_state *tok)
 
     for (index = tok->tok_mode_stack_index; index >= 0; --index) {
         mode = &(tok->tok_mode_stack[index]);
-        mode->start_offset = mode->start - tok->buf;
-        mode->multi_line_start_offset = mode->multi_line_start - tok->buf;
+        mode->start_offset = mode->start == NULL ? -1 : mode->start - tok->buf;
+        mode->multi_line_start_offset = mode->multi_line_start == NULL ? -1 : mode->multi_line_start - tok->buf;
     }
 }
 
@@ -27,8 +27,8 @@ _PyLexer_restore_fstring_buffers(struct tok_state *tok)
 
     for (index = tok->tok_mode_stack_index; index >= 0; --index) {
         mode = &(tok->tok_mode_stack[index]);
-        mode->start = tok->buf + mode->start_offset;
-        mode->multi_line_start = tok->buf + mode->multi_line_start_offset;
+        mode->start = mode->start_offset < 0 ? NULL : tok->buf + mode->start_offset;
+        mode->multi_line_start = mode->multi_line_start_offset < 0 ? NULL : tok->buf + mode->multi_line_start_offset;
     }
 }
 
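The reason offsets are saved at all is that growing the buffer with `realloc()` may move it, invalidating every pointer into it. The sketch below is a simplified model under stated assumptions: `struct mode_sketch`, `remember_offsets`, `restore_pointers`, and `grow_buffer` are hypothetical stand-ins, not CPython code. The two loops only mirror the shape of the patched functions (walking the mode stack from the top index down to 0), and mode 0 keeps NULL pointers throughout, as `tok_mode_stack[0]` does in the scenario the commit describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for one entry of the lexer's mode stack;
 * field names echo the diff, but this is not CPython's struct. */
struct mode_sketch {
    const char *start;
    const char *multi_line_start;
    ptrdiff_t start_offset;
    ptrdiff_t multi_line_start_offset;
};

/* Before the buffer moves: save offsets, storing NULL as -1. */
static void
remember_offsets(struct mode_sketch *modes, int top, const char *buf)
{
    for (int i = top; i >= 0; --i) {
        struct mode_sketch *m = &modes[i];
        m->start_offset = m->start == NULL ? -1 : m->start - buf;
        m->multi_line_start_offset =
            m->multi_line_start == NULL ? -1 : m->multi_line_start - buf;
    }
}

/* After the buffer moves: rebuild pointers, turning -1 back into NULL. */
static void
restore_pointers(struct mode_sketch *modes, int top, const char *buf)
{
    for (int i = top; i >= 0; --i) {
        struct mode_sketch *m = &modes[i];
        m->start = m->start_offset < 0 ? NULL : buf + m->start_offset;
        m->multi_line_start = m->multi_line_start_offset < 0
                                  ? NULL
                                  : buf + m->multi_line_start_offset;
    }
}

/* Grow a heap buffer, keeping each mode's pointers valid across realloc().
 * Returns the new buffer, or NULL on failure (the old buffer is untouched). */
static char *
grow_buffer(char *buf, size_t new_size, struct mode_sketch *modes, int top)
{
    remember_offsets(modes, top, buf);
    char *new_buf = realloc(buf, new_size);
    if (new_buf == NULL) {
        return NULL;
    }
    restore_pointers(modes, top, new_buf);
    return new_buf;
}
```

In this model, after `grow_buffer()` succeeds, a mode whose pointers were NULL comes back with NULL pointers and -1 sentinels, while a mode that pointed into the old buffer points at the same characters in the relocated one. Without the NULL checks, the NULL mode would compute `NULL - buf` before the `realloc()` and `buf + offset` on a garbage offset after it, which is the undefined behavior the patch removes.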