-
Notifications
You must be signed in to change notification settings - Fork 0
feat(codec): envelope validation + size cap #35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
📝 WalkthroughWalkthroughAdds codec versioning and stricter envelope validation across runtimes and codec utilities, injects Changes
Sequence Diagram(s)sequenceDiagram
participant Client as Client
participant NodeRuntime as NodeRuntime
participant PythonSubprocess as PythonSubprocess
participant CodecValidator as CodecValidator
Client->>NodeRuntime: invoke Python-backed function
NodeRuntime->>PythonSubprocess: start subprocess / send request
PythonSubprocess->>PythonSubprocess: serialize result, inject codecVersion
PythonSubprocess->>PythonSubprocess: measure byte length vs TYWRAP_CODEC_MAX_BYTES
alt payload within limit
PythonSubprocess->>NodeRuntime: emit JSONL envelope response
NodeRuntime->>CodecValidator: decode & validate envelope (codecVersion, encoding, fields)
CodecValidator->>NodeRuntime: return decoded result
NodeRuntime->>Client: deliver result
else payload too large
PythonSubprocess->>NodeRuntime: emit size-limit error (includes TYWRAP_CODEC_MAX_BYTES)
NodeRuntime->>Client: surface size-limit error
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related issues
Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
📜 Recent review detailsConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro 📒 Files selected for processing (1)
🧰 Additional context used🪛 Ruff (0.14.11)runtime/python_bridge.py581-581: Avoid specifying long messages outside the exception class (TRY003) 584-584: Avoid specifying long messages outside the exception class (TRY003) 596-596: Avoid specifying long messages outside the exception class (TRY003) 🔇 Additional comments (7)
✏️ Tip: You can disable this entire section by setting Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 8
🤖 Fix all issues with AI agents
In `@docs/codec-roadmap.md`:
- Around line 37-56: The fenced code blocks for the envelope examples (e.g., the
Envelope JSON block showing "__tywrap__": "scipy.sparse", "codecVersion",
"encoding", "format", etc.) lack language specifiers and preceding blank lines;
update each envelope code fence (SciPy, Torch, Sklearn examples referenced
around the "__tywrap__" JSON blocks) to include a blank line before the
triple-backtick and add the language tag "json" after the opening ``` so the
blocks become ```json; apply this same change to all three envelope examples
mentioned.
In `@docs/configuration.md`:
- Line 298: Add a brief inline comment explaining the purpose of the
TYWRAP_CODEC_MAX_BYTES environment variable so readers understand what it
controls; locate the line defining TYWRAP_CODEC_MAX_BYTES and append or insert a
short comment like "Max response payload size (bytes)" or add an entry for
TYWRAP_CODEC_MAX_BYTES to the Environment Variables table if present, ensuring
the description mentions units and impact (e.g., limits codec/response size).
In `@docs/runtimes/nodejs.md`:
- Around line 211-216: The "Payload Size Limit" heading and its code fence lack
required blank lines; add a blank line before and after the "### Payload Size
Limit" heading and ensure there is a blank line before the opening ``` and after
the closing ``` surrounding the TYWRAP_CODEC_MAX_BYTES example, and append a
short sentence noting that if a response exceeds TYWRAP_CODEC_MAX_BYTES (env
var) the call fails with an explicit error rather than delivering the payload.
In `@runtime/python_bridge.py`:
- Around line 380-399: get_codec_max_bytes() is called on every response and
does repeated os.environ lookup and parsing; compute and cache its result once
at module-import time (e.g. evaluate get_codec_max_bytes() into a module-level
constant like CODEC_MAX_BYTES or _CODEC_MAX_BYTES next to FALLBACK_JSON) and
update places that call get_codec_max_bytes() (including main and the
response-path that currently calls it) to read that constant instead; preserve
current semantics (None for unset/blank/non-positive or invalid values raising
ValueError at import if you prefer, or treat parse errors as None to match
existing behavior) so behavior remains unchanged while avoiding per-response
environment parsing.
In `@src/utils/codec.ts`:
- Around line 368-377: The type assertion on version when returning the
SklearnEstimator is redundant; remove the "as string | undefined" cast in the
return object so that version is returned as-is (the local variable name is
version and the return is the object satisfying SklearnEstimator), leaving the
rest of the return (className, module, params) unchanged.
- Around line 344-354: Unnecessary casts and any usage occur when building the
returned TorchTensor-like object; instead, capture and validate raw values
first: read rawShape/rawDtype/rawDevice from value without casting, validate
rawShape with Array.isArray and element type checks before assigning to shape
(then you can safely narrow to number[]), and set dtype/device by checking
typeof rawDtype/rawDevice === 'string' and assigning those validated strings to
local consts; finally return the object using these validated locals (and keep
the isPromiseLike(decoded) branch returning the same shaped object) so you
remove the unnecessary assertion and all (any) casts.
- Around line 306-328: The two uses of (value as any).dtype should be replaced
by a properly narrowed binding: extract const maybeDtype = (value as { dtype?:
unknown }).dtype, then check typeof maybeDtype === 'string' and use maybeDtype
in the returned objects' dtype field; update both places (the first
sparse-return block and the csr/csc return) to remove any casts and rely on the
validated maybeDtype variable.
In `@test/runtime_node.test.ts`:
- Around line 390-392: The cleanup rm(tempDir, { recursive: true, force: true })
call can fail on Windows; update the call in the teardown to include retry
options like maxRetries and retryDelay (matching other cleanup calls in this
file) so it becomes rm(tempDir, { recursive: true, force: true, maxRetries:
<same-value>, retryDelay: <same-value> }) to avoid flaky failures when tempDir
is still held open; locate the call using the tempDir variable and the rm
invocation in runtime_node.test.ts and add the same numeric values used at lines
where other rm calls include maxRetries/retryDelay.
📜 Review details
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (7)
docs/codec-roadmap.mddocs/configuration.mddocs/runtimes/nodejs.mdruntime/python_bridge.pysrc/utils/codec.tstest/runtime_codec.test.tstest/runtime_node.test.ts
🧰 Additional context used
🧬 Code graph analysis (2)
test/runtime_codec.test.ts (1)
src/utils/codec.ts (3)
decodeValueAsync(405-407)CodecEnvelope(43-102)decodeValue(412-422)
test/runtime_node.test.ts (1)
src/runtime/node.ts (1)
NodeBridge(87-449)
🪛 GitHub Check: lint
src/utils/codec.ts
[failure] 375-375:
This assertion is unnecessary since it does not change the type of the expression
[warning] 349-349:
Unexpected any. Specify a different type
[warning] 347-347:
Unexpected any. Specify a different type
[failure] 345-345:
This assertion is unnecessary since it does not change the type of the expression
[warning] 327-327:
Unexpected any. Specify a different type
[warning] 312-312:
Unexpected any. Specify a different type
🪛 markdownlint-cli2 (0.18.1)
docs/codec-roadmap.md
38-38: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
38-38: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
69-69: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
69-69: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
102-102: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
102-102: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
docs/runtimes/nodejs.md
213-213: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
214-214: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
🪛 Ruff (0.14.11)
runtime/python_bridge.py
396-396: Avoid specifying long messages outside the exception class
(TRY003)
580-582: Abstract raise to an inner function
(TRY301)
580-582: Avoid specifying long messages outside the exception class
(TRY003)
584-584: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: os (windows-latest)
🔇 Additional comments (13)
runtime/python_bridge.py (2)
576-583: LGTM - size enforcement logic is correct.The implementation correctly:
- Serializes first, then checks size (unavoidable without a streaming serializer)
- Uses UTF-8 byte length (not string length) for accurate size calculation
- Falls back to an unchecked error response if the size check itself fails, preventing infinite loops
The error message clearly identifies the constraint that was violated.
32-33: LGTM - codec versioning is well implemented.The
CODEC_VERSIONconstant centralizes the version number, and its consistent use across all envelope serializers (ndarray, dataframe, series, scipy.sparse, torch.tensor, sklearn.estimator) ensures forward compatibility and enables decoders to reject incompatible payloads.test/runtime_node.test.ts (1)
356-396: Test logic is correct and well-structured.The test properly validates the
TYWRAP_CODEC_MAX_BYTESenforcement by:
- Creating a Python module that returns a payload larger than the configured limit
- Setting the limit to 200 bytes
- Asserting the error message references the violated constraint
docs/codec-roadmap.md (1)
17-31: Documentation accurately reflects the implementation.The envelope conventions section clearly documents:
- The
__tywrap__marker- The
codecVersionfield with backward compatibility note- The
encodingfield- Size limit configuration via
TYWRAP_CODEC_MAX_BYTESThis aligns well with the implementation in
runtime/python_bridge.py.src/utils/codec.ts (4)
43-102: LGTM! Well-structured envelope type definitions.The
CodecEnvelopeunion type clearly defines the shape for each envelope marker with appropriate optionalcodecVersionfields. The type definitions align well with the validation logic and support both arrow and json encodings where applicable.
177-194: LGTM! Robust version validation.The
assertCodecVersionfunction correctly handles the optional nature ofcodecVersion, validates the type (includingNumber.isFinitecheck to rejectNaN/Infinity), and provides clear error messages with the envelope marker context.
205-218: LGTM! Proper marker dispatch and version validation.The unified marker check followed by
assertCodecVersionensures consistent version validation across all supported envelope types while allowing unknown markers to pass through unchanged.
220-275: LGTM! Consistent validation across dataframe, series, and ndarray.The encoding and field validation is thorough with clear, specific error messages. The repetitive pattern is acceptable given each type has slightly different field requirements.
test/runtime_codec.test.ts (5)
345-409: LGTM! Comprehensive SciPy sparse envelope validation tests.The tests properly cover:
- Valid CSR envelope with
codecVersion- Missing
indices/indptrfor CSR/CSC formats- Missing
colfor COO format- Invalid shape validation (non-2-item array)
Good alignment with the validation logic in
codec.ts.
535-554: LGTM! Thorough torch.tensor validation tests.Tests correctly cover:
- Invalid nested value (not an ndarray envelope)
- Unsupported encoding type
The error assertions match the implementation in
decodeEnvelopeCore.
577-601: LGTM! Good sklearn.estimator validation coverage.Tests properly validate:
- Numeric
versionrejection (must be string)- Non-string
classNamerejectionError assertions align with implementation.
603-623: LGTM! Essential codecVersion validation tests.Tests correctly cover:
- Unsupported version number (2 vs expected 1)
- Invalid version type (string instead of number)
- Both async (
decodeValueAsync) and sync (decodeValue) paths
663-722: LGTM! Updated edge case tests align with stricter validation.Tests correctly verify that invalid envelopes (missing encoding, invalid encoding, missing b64, non-string b64, missing data) now throw with appropriate error messages rather than passing through.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@runtime/python_bridge.py`:
- Around line 49-52: Narrow the broad exception handler around the int(raw)
conversion: replace the "except Exception as exc" that wraps "value = int(raw)"
with "except ValueError as exc" so only conversion errors from int() are caught,
then re-raise the existing ValueError('TYWRAP_CODEC_MAX_BYTES must be an integer
byte count') from exc as before.
📜 Review details
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (5)
docs/codec-roadmap.mddocs/configuration.mddocs/runtimes/nodejs.mdruntime/python_bridge.pytest/runtime_node.test.ts
🧰 Additional context used
🧬 Code graph analysis (1)
test/runtime_node.test.ts (1)
src/runtime/node.ts (1)
NodeBridge(87-449)
🪛 markdownlint-cli2 (0.18.1)
docs/runtimes/nodejs.md
213-213: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
214-214: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
🪛 Ruff (0.14.11)
runtime/python_bridge.py
52-52: Avoid specifying long messages outside the exception class
(TRY003)
599-601: Abstract raise to an inner function
(TRY301)
599-601: Avoid specifying long messages outside the exception class
(TRY003)
603-603: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: os (windows-latest)
🔇 Additional comments (8)
docs/runtimes/nodejs.md (1)
211-220: Documentation section is well-structured.The payload size limit documentation clearly explains the configuration and behavior when the limit is exceeded. The markdown formatting with blank lines around the heading and code fence is correct.
runtime/python_bridge.py (3)
32-59: LGTM! Codec versioning and max bytes caching implemented correctly.The
CODEC_VERSIONconstant and cachedCODEC_MAX_BYTESprovide consistent envelope versioning and avoid per-response environment lookups. The implementation addresses the past review suggestion to cache the result at module load time.
152-158: Consistent codecVersion injection across all envelope types.The
codecVersionfield is correctly added to all serialized envelopes (ndarray, dataframe, series, scipy.sparse, torch.tensor, sklearn.estimator), enabling decoder-side version validation.
596-612: Payload size enforcement is correctly implemented.The size check at lines 598-601 properly rejects oversized payloads before writing to stdout. The fallback error response (lines 604-612) is minimal and won't trigger recursive size violations.
One note: the static analysis hint about catching blind
Exceptionat line 603 is a false positive here—graceful error recovery in the main loop requires catching all exceptions to maintain protocol compliance.test/runtime_node.test.ts (1)
356-396: Well-structured test for payload size limit enforcement.The test correctly:
- Creates a temporary Python module returning a 5000-char payload
- Sets
TYWRAP_CODEC_MAX_BYTES=200to trigger the limit- Asserts the error message references the environment variable
- Properly cleans up resources with retry options for Windows compatibility
docs/configuration.md (1)
298-298: Environment variable documentation is complete.The
TYWRAP_CODEC_MAX_BYTESentry includes a clear inline comment explaining its purpose (max response payload size in bytes), addressing the past review suggestion.docs/codec-roadmap.md (2)
17-31: Envelope conventions section provides clear documentation.The new section establishes consistent expectations for all tywrap codec envelopes:
- Required fields (
__tywrap__,codecVersion,encoding)- Versioning strategy (v1, backward compat for missing)
- Size limit configuration via
TYWRAP_CODEC_MAX_BYTES
37-115: Envelope specifications are well-documented with proper formatting.All three envelope examples (SciPy, Torch, Sklearn) now include:
- JSON language specifiers for syntax highlighting
codecVersion: 1field aligned with the implementation- Blank lines around code fences for markdown compliance
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
|
@coderabbitai review\n\nAddressed all requested changes; please re-review. |
|
✅ Actions performedReview triggered.
|
|
@coderabbitai approve\n\nAll requested changes are addressed; please update review state. |
|
✅ Actions performedComments resolved and changes approved. |
|
@coderabbitai help |
ChatThere are 3 ways to chat with CodeRabbit:
CodeRabbit commands
Other keywords and placeholders
CodeRabbit configuration file (
|
Summary\n- define codecVersion=1 in Python envelopes and validate on decode\n- add explicit envelope validation for SciPy/Torch/Sklearn/ndarray/dataframe/series\n- enforce optional TYWRAP_CODEC_MAX_BYTES size cap in the Python bridge\n- document envelope specs + new env var\n\n## Testing\n- npm test -- test/runtime_codec.test.ts test/runtime_node.test.ts\n\nCloses #24