fix(validator): bound tree-sitter parse to prevent scoring-round DoS #1276
Open
anderdc wants to merge 1 commit into
…ures Sets parser.timeout_micros (2s) on every cached tree-sitter parser so adversarial inputs cannot hang the scoring round in C, and wraps each PR in score_miner_prs with a try/except so one bad PR cannot abort the UID's scoring loop. Without the timeout, parser.parse() can spin forever in tree-sitter's error-recovery paths on inputs as small as 16 bytes, holding the GIL and preventing the round from completing. The timeout makes the C code raise ValueError, which the existing parse_code wrapper already catches and converts to a None tree (handled as score=0 downstream).
Summary
- Sets `parser.timeout_micros = 2_000_000` on every cached tree-sitter parser so adversarial PR file contents cannot hang the scoring round in C.
- Wraps each PR in `score_miner_prs` with a `try/except` so one bad PR cannot abort the rest of a UID's scoring loop.

Why
`gittensor/validator/utils/tree_sitter_scoring.py` calls `parser.parse(content.encode('utf-8'))` with no timeout. Several bundled grammars in `tree-sitter-language-pack==0.7.2` have error-recovery loops that, on adversarial input, spin forever in C while holding the GIL, including a known 16-byte TSX payload (tree-sitter-typescript#323). The `try/except Exception` already inside `parse_code` does not help here because a C-level hang never raises a Python exception.

The unguarded path runs synchronously inside the validator's scoring round.
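
To illustrate why an exception handler cannot rescue a call that never returns, here is a minimal sketch using only the standard library. `blocking_call` and `guarded` are hypothetical names, and `Event.wait()` is only a stand-in for the hang: unlike the tree-sitter case described above, it releases the GIL, which is the only reason the main thread can observe the stuck worker at all.

```python
import threading
from typing import List


def blocking_call(release: threading.Event) -> None:
    """Hypothetical stand-in for a call that hangs: it blocks and raises nothing."""
    release.wait()


def guarded(release: threading.Event, log: List[str]) -> None:
    """Wrap the call the same way a broad try/except would."""
    try:
        blocking_call(release)
        log.append("returned")
    except Exception:
        # Only reached if the call raises; a hang never gets here.
        log.append("caught")


release = threading.Event()
log: List[str] = []
worker = threading.Thread(target=guarded, args=(release, log), daemon=True)
worker.start()

worker.join(timeout=0.2)           # give the worker a moment, then check on it
still_blocked = worker.is_alive()  # the call has not returned
handler_ran = "caught" in log      # the except branch has not run

release.set()                      # unblock the stand-in so the demo can finish
worker.join()
print(still_blocked, handler_ran, log)
```

The handler only becomes relevant once the blocked call returns control to Python, which is what a parser-level timeout provides.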
A miner can open a PR (no merge required; `OPEN` PRs are scored) containing a single pathological file in any `master_repositories.json` repo. Every validator re-fetches that PR each round for roughly `PR_LOOKBACK_DAYS=35` days and hangs, blocking weight setting.

What this PR does (and doesn't)
Setting `parser.timeout_micros` makes the C code raise `ValueError('Parsing failed')` back into Python after 2s. The existing wrapper in `parse_code` already catches `Exception` and returns `None`, which `score_tree_diff` already handles (the file degrades to a `tree-diff` with empty signatures, score 0). 2s is well above the millisecond cost of real files.

The per-PR `try/except` in `score_miner_prs` is defense-in-depth: any future exception in PR scoring degrades that PR to score 0 instead of aborting the rest of the loop.

Scope intentionally minimal: does not include subprocess isolation. One known timeout-immune class remains (`ts_subtree_balance` hangs, tree-sitter#4019, one known input on `.scala`); that requires an external wall-clock and can be addressed in a follow-up if it appears in the wild.

Test plan
- `uv run pytest tests/` → 753 passed.
- `parse_code('<a {{b:>c:d(e f)', 'tsx')` returns `None` in 2.00s (previously: infinite hang).
- `parser.timeout_micros = 2000000` is set on cached parsers.
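
The degradation path described in this PR (timeout raises `ValueError`, the wrapper returns `None`, the file scores 0, and the per-PR `try/except` contains anything else) can be sketched end to end as follows. This is a minimal illustration, not the code in this PR: `StubParser`, `parse_code_sketch`, `score_file_sketch`, and `score_miner_prs_sketch` are hypothetical stand-ins, the stub only imitates a timeout instead of calling tree-sitter, the scoring rule is a toy, and the 50 ms limit is chosen to keep the demo fast rather than the 2s used here.

```python
import time
from typing import Dict, List, Optional

# The 16-byte TSX payload quoted in the test plan.
HOSTILE = "<a {{b:>c:d(e f)"


class StubParser:
    """Hypothetical stand-in for a cached tree-sitter parser (not the real API)."""

    def __init__(self, timeout_micros: int) -> None:
        self.timeout_micros = timeout_micros

    def parse(self, source: bytes) -> str:
        if source == HOSTILE.encode("utf-8"):
            # Imitate a runaway parse that is cut off once the time limit expires.
            time.sleep(self.timeout_micros / 1_000_000)
            raise ValueError("Parsing failed")
        return f"tree<{len(source)} bytes>"


def parse_code_sketch(parser: StubParser, content: str) -> Optional[str]:
    """Return a tree, or None when parsing raises (including the timeout)."""
    try:
        return parser.parse(content.encode("utf-8"))
    except Exception:
        return None


def score_file_sketch(parser: StubParser, content: str) -> float:
    """Toy scoring rule: 1.0 for a parsed file, 0.0 when there is no tree."""
    return 0.0 if parse_code_sketch(parser, content) is None else 1.0


def score_miner_prs_sketch(
    parser: StubParser, prs: Dict[str, Optional[List[str]]]
) -> Dict[str, float]:
    """Score each PR independently so one failure cannot abort the loop."""
    scores: Dict[str, float] = {}
    for pr_id, files in prs.items():
        try:
            scores[pr_id] = sum(score_file_sketch(parser, f) for f in files)
        except Exception:
            # Defense-in-depth: an unexpected error costs only this PR.
            scores[pr_id] = 0.0
    return scores


parser = StubParser(timeout_micros=50_000)
scores = score_miner_prs_sketch(
    parser,
    {
        "ok": ["x = 1", "y = 2"],
        "hostile": [HOSTILE],
        "broken": None,  # iterating None raises TypeError inside the loop
        "after": ["z = 3"],
    },
)
print(scores)
```

In this sketch the hostile file and the malformed PR both degrade to a score of 0, while the PRs before and after them are still scored.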