-
Notifications
You must be signed in to change notification settings - Fork 0
feat(codec): auto-register Arrow decoder #37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. 📝 WalkthroughWalkthroughAdds a new public API Changes
Sequence Diagram(s)sequenceDiagram
participant NodeRuntime as Node.js Runtime
participant Codec as Codec Registry
participant Loader as Module Loader
participant ArrowModule as Apache Arrow
NodeRuntime->>Codec: call autoRegisterArrowDecoder({ loader? })
Codec->>Codec: check if decoder already registered
alt already registered
Codec-->>NodeRuntime: return true
else not registered
Codec->>Loader: invoke loader()
Loader->>ArrowModule: require / import 'apache-arrow'
ArrowModule-->>Loader: return module (expects tableFromIPC)
Loader-->>Codec: provide module
Codec->>Codec: validate tableFromIPC and register decoder
Codec-->>NodeRuntime: return true
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
📜 Recent review detailsConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro 📒 Files selected for processing (4)
🧰 Additional context used🧬 Code graph analysis (1)src/runtime/node.ts (1)
🔇 Additional comments (11)
✏️ Tip: You can disable this entire section by setting Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 44de44b1b4
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
runtime/python_bridge.py
Outdated
| 'arrowAvailable': arrow_available(), | ||
| 'scipyAvailable': module_available('scipy.sparse'), | ||
| 'torchAvailable': module_available('torch'), | ||
| 'sklearnAvailable': module_available('sklearn.base'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid heavy imports in metadata feature detection
module_available uses importlib.util.find_spec, and when you pass dotted module names like scipy.sparse or sklearn.base, Python implicitly imports the parent package to resolve the submodule. That means handle_meta() will eagerly import heavy libraries on every bridge startup, which defeats the “lightweight” intent and can add noticeable startup latency or trigger side effects even if those codecs are never used. To keep metadata probing cheap, prefer checking only the top-level packages (e.g., scipy, sklearn) or use a lookup that doesn’t require importing the parent.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@docs/runtimes/nodejs.md`:
- Around line 180-186: The code snippet mixes auto and manual registration: it
calls autoRegisterArrowDecoder() while the prose instructs to "register
manually"; separate and clarify both paths by showing them as distinct examples
and updating the text: one short example demonstrating the auto path using
autoRegisterArrowDecoder() (for when apache-arrow is installed) and a separate
manual customization example that imports apache-arrow's tableFromIPC and calls
registerArrowDecoder(bytes => tableFromIPC(bytes)); reference
autoRegisterArrowDecoder, registerArrowDecoder, and tableFromIPC in the updated
snippet and reword the surrounding sentence to explicitly state which example is
auto and which is manual.
In `@src/runtime/optimized-node.ts`:
- Around line 190-193: The loader passed to autoRegisterArrowDecoder uses an
unnecessary async wrapper around the synchronous require call; change the loader
to return the module directly (remove async wrapper) so that
autoRegisterArrowDecoder({ loader: ... }) calls a synchronous loader — locate
the call referencing createRequire and autoRegisterArrowDecoder and replace the
async loader function with a direct-return loader that invokes
require('apache-arrow') and returns that value.
In `@src/utils/codec.ts`:
- Around line 152-163: The inline async default assigned to loader (the
options.loader fallback used when isNodeRuntime() is true) has no explicit
return type and violates noImplicitReturns; update that anonymous async function
to declare its return type as Promise<unknown> (e.g. async (): Promise<unknown>
=> { ... }) so the loader variable's default matches the expected signature and
satisfies TypeScript's rule.
In `@test/runtime_codec.test.ts`:
- Around line 78-92: Add assertions to the failing-case tests to ensure no
decoder was registered: after calling autoRegisterArrowDecoder({ loader }) in
both "should return false when loader lacks tableFromIPC" and "should return
false when loader throws" tests, call hasArrowDecoder() and assert it is false
(expect(hasArrowDecoder()).toBe(false)); update references to
autoRegisterArrowDecoder and hasArrowDecoder to locate the checks and keep the
existing expect(registered).toBe(false) assertions.
📜 Review details
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (14)
docs/api/README.mddocs/codec-roadmap.mddocs/runtimes/browser.mddocs/runtimes/nodejs.mdexamples/living-app/README.mdexamples/living-app/src/index.tsruntime/python_bridge.pysrc/index.tssrc/runtime/node.tssrc/runtime/optimized-node.tssrc/types/index.tssrc/utils/codec.tstest/runtime_codec.test.tstest/runtime_node.test.ts
🧰 Additional context used
🧬 Code graph analysis (5)
src/runtime/node.ts (2)
src/index.ts (1)
autoRegisterArrowDecoder(72-72)src/utils/codec.ts (1)
autoRegisterArrowDecoder(146-175)
test/runtime_codec.test.ts (1)
src/utils/codec.ts (3)
autoRegisterArrowDecoder(146-175)hasArrowDecoder(127-129)registerArrowDecoder(114-118)
src/runtime/optimized-node.ts (2)
src/index.ts (1)
autoRegisterArrowDecoder(72-72)src/utils/codec.ts (1)
autoRegisterArrowDecoder(146-175)
src/utils/codec.ts (1)
src/index.ts (2)
registerArrowDecoder(73-73)autoRegisterArrowDecoder(72-72)
examples/living-app/src/index.ts (2)
src/index.ts (1)
autoRegisterArrowDecoder(72-72)src/utils/codec.ts (1)
autoRegisterArrowDecoder(146-175)
🪛 GitHub Check: lint
src/utils/codec.ts
[warning] 155-155:
Missing return type on function
🪛 Ruff (0.14.11)
runtime/python_bridge.py
104-104: Do not catch blind exception: Exception
(BLE001)
🔇 Additional comments (13)
examples/living-app/README.md (1)
7-7: Docs align with Node auto-registration.This statement accurately reflects the new default behavior and keeps the living-app guide consistent.
runtime/python_bridge.py (2)
5-105: Lightweight feature detection keeps startup cheap.Using find_spec avoids heavy imports/side effects while still exposing availability.
563-575: Bridge metadata expansion looks good.Exposing scipy/torch/sklearn availability in meta supports runtime feature-gating on the TS side.
test/runtime_node.test.ts (1)
139-145: Good coverage for new availability flags.These assertions validate the BridgeInfo shape while keeping runtime variability intact.
src/runtime/node.ts (1)
8-10: Auto-registration is correctly sequenced before process start.This ensures Arrow decoding is ready for the first response without impacting the spawn flow.
Also applies to: 275-280
docs/runtimes/browser.md (1)
66-66: Doc clarification matches runtime behavior.Clear, accurate note about Node auto-registration when apache-arrow is present.
src/index.ts (1)
69-75: LGTM!The new
autoRegisterArrowDecoderexport is correctly added alongside the existing codec utilities, maintaining logical grouping and consistent import paths.src/types/index.ts (1)
319-323: LGTM!The new availability flags follow the established naming convention (
*Available) and are correctly typed as booleans. The placement afterarrowAvailablemaintains logical grouping for feature detection properties.docs/api/README.md (1)
140-157: LGTM!The documentation clearly explains the auto-registration flow, shows correct usage patterns, and distinguishes between automatic registration in
NodeBridgeversus manual registration for standalone usage. The example code is accurate.test/runtime_codec.test.ts (2)
55-64: LGTM!Good test coverage for the happy path - verifies the loader is invoked, registration succeeds, and the decoder becomes available.
66-76: LGTM!Correctly verifies the short-circuit behavior when a decoder is already registered, using a throwing loader to ensure it's never invoked.
docs/codec-roadmap.md (1)
17-23: Docs update looks consistent and clear.The additions align with the auto-registration direction and set clear expectations for defaults, feature detection, and future transport work.
Also applies to: 38-42, 108-109, 139-142
examples/living-app/src/index.ts (1)
7-7: LGTM — example aligns with new auto-registration API.The import and enablement flow now matches the updated codec API and keeps a clear failure path when Arrow isn’t available.
Also applies to: 55-59
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
|
@coderabbitai review incremental changes on latest push. Addressed requested updates + docstrings. |
|
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@runtime/python_bridge.py`:
- Around line 87-111: Remove the unused noqa and tighten the broad exception
handlers: in arrow_available() delete the "# noqa: F401" and replace the broad
except with except ImportError: returning False; in
module_available(module_name: str) narrow the catch to likely issues (e.g.
except (ImportError, AttributeError, TypeError, ValueError): return False) and
add a short comment that these specific exceptions guard against importlib edge
cases so linters stop flagging BLE001.
♻️ Duplicate comments (1)
runtime/python_bridge.py (1)
607-610: Avoid dotted-module checks in metadata probing.Line 608-610 passes dotted module names into
module_available, which can import heavy parent packages viafind_spec, undermining “lightweight” probing. Prefer top‑level module checks.♻️ Suggested tweak
- 'scipyAvailable': module_available('scipy.sparse'), + 'scipyAvailable': module_available('scipy'), @@ - 'sklearnAvailable': module_available('sklearn.base'), + 'sklearnAvailable': module_available('sklearn'),
📜 Review details
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (5)
docs/runtimes/nodejs.mdruntime/python_bridge.pysrc/runtime/optimized-node.tssrc/utils/codec.tstest/runtime_codec.test.ts
🧰 Additional context used
🧬 Code graph analysis (3)
test/runtime_codec.test.ts (2)
src/utils/codec.ts (3)
autoRegisterArrowDecoder(163-192)hasArrowDecoder(127-129)registerArrowDecoder(114-118)src/index.ts (2)
autoRegisterArrowDecoder(72-72)registerArrowDecoder(73-73)
src/runtime/optimized-node.ts (2)
src/utils/codec.ts (1)
autoRegisterArrowDecoder(163-192)src/index.ts (1)
autoRegisterArrowDecoder(72-72)
src/utils/codec.ts (1)
src/index.ts (2)
registerArrowDecoder(73-73)autoRegisterArrowDecoder(72-72)
🪛 Ruff (0.14.11)
runtime/python_bridge.py
95-95: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
96-96: Do not catch blind exception: Exception
(BLE001)
110-110: Do not catch blind exception: Exception
(BLE001)
🔇 Additional comments (4)
src/utils/codec.ts (1)
131-192: Auto-registration flow is clean and defensive.The Node-runtime guard, loader fallback, and safe
falsereturn keep optional Arrow support robust without breaking non-Node environments.test/runtime_codec.test.ts (1)
55-94: Auto-register tests cover the critical paths.Success, skip, and failure cases are all exercised, and the decoder state is asserted after failures. Nice coverage.
src/runtime/optimized-node.ts (1)
190-193: Startup auto-registration looks good.Decoder registration happens early and remains optional without breaking startup.
docs/runtimes/nodejs.md (1)
179-191: Docs now clearly separate auto vs manual Arrow registration.The split examples make the default behavior and customization path easy to follow.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
|
Addressed review notes: narrowed exception handling + removed unused noqa, made module probing top-level, and added docstring for why meta exposes capabilities. |
|
@coderabbitai review\n\nAll requested changes addressed; please re-review. |
|
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@runtime/python_bridge.py`:
- Around line 101-112: Update the bridge metadata docs/type to clarify that
module availability flags are best-effort hints, not guarantees: add a short
note to the BridgeInfo docstring (or the docstring near module_available)
stating that module_available(module_name) returns an informational capability
hint and that actual runtime checks like is_scipy_sparse, is_torch_tensor, and
is_sklearn_estimator perform independent try/catch import validation at
serialization time and should be relied on for definitive behavior; keep the
wording concise and place it alongside the BridgeInfo/type declaration or
module_available docstring so consumers see the caveat where flags are exposed.
📜 Review details
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
runtime/python_bridge.py
🔇 Additional comments (9)
runtime/python_bridge.py (9)
5-5: LGTM: stdlib import for find_spec.
87-97: Optional PyArrow detection is safer now.
Narrowing toImportError/OSErrorkeeps startup resilient without masking unrelated failures.
116-120: Docstring clarifies optional NumPy handling.
129-133: Docstring clarifies optional pandas DataFrame handling.
142-146: Docstring clarifies optional pandas Series handling.
155-159: Docstring clarifies optional scipy sparse handling.
171-175: Docstring clarifies optional torch tensor handling.
187-191: Docstring clarifies optional sklearn estimator handling.
600-617: Bridge metadata now exposes optional codec availability.
This aligns well with feature detection on the Node side.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
|
Addressed CodeRabbit items: clarified auto vs manual Arrow registration docs, made Node loader sync, widened loader type, and added best-effort hint note for module availability. Please re-review. |
|
@coderabbitai review\n\nAll requested changes addressed; please update review state. |
|
✅ Actions performedReview triggered.
|
Summary
Testing