Conversation

@bbopen
Owner

@bbopen bbopen commented Jan 15, 2026

Summary

  • auto-register Arrow decoder in Node bridges for frictionless Arrow defaults
  • expose helper for manual registration + tests for auto-register behavior
  • update living app + docs to rely on auto-registration

Testing

  • npm test -- test/runtime_codec.test.ts

@coderabbitai

coderabbitai bot commented Jan 15, 2026

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Adds a new public API autoRegisterArrowDecoder(): Promise<boolean>, integrates it into Node startup and codec utilities to auto-load/register Apache Arrow decoders, extends Python bridge feature flags, updates types/tests, and revises docs/examples to prefer the auto-registration flow.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Core codec auto-registration (src/utils/codec.ts, src/index.ts, test/runtime_codec.test.ts) | New autoRegisterArrowDecoder(options?: { loader?: () => Promise<unknown> }): Promise<boolean> export that lazily loads and registers an Arrow IPC decoder (accepts optional loader or uses Node loader). Adds tests for success, no-op when already registered, missing tableFromIPC, and loader rejection. |
| Node runtime integration (src/runtime/node.ts, src/runtime/optimized-node.ts, test/runtime_node.test.ts) | Startup now awaits autoRegisterArrowDecoder() (uses createRequire(import.meta.url) loader in Node) before bridge/process startup so Arrow decoding is available; tests updated to assert added BridgeInfo flags. |
| Python bridge & types (runtime/python_bridge.py, src/types/index.ts) | Added module_available(module_name: str) -> bool, exposed scipyAvailable, torchAvailable, and sklearnAvailable in bridge metadata; BridgeInfo gains corresponding boolean fields. |
| Docs & examples (docs/api/README.md, docs/codec-roadmap.md, docs/runtimes/browser.md, docs/runtimes/nodejs.md, examples/living-app/README.md, examples/living-app/src/index.ts) | Documentation and examples updated to demonstrate the auto-registration flow, replace manual require-based Arrow setup with autoRegisterArrowDecoder() usage, and add DX/roadmap notes about codec defaults and feature detection. |

Sequence Diagram(s)

sequenceDiagram
    participant NodeRuntime as Node.js Runtime
    participant Codec as Codec Registry
    participant Loader as Module Loader
    participant ArrowModule as Apache Arrow

    NodeRuntime->>Codec: call autoRegisterArrowDecoder({ loader? })
    Codec->>Codec: check if decoder already registered
    alt already registered
        Codec-->>NodeRuntime: return true
    else not registered
        Codec->>Loader: invoke loader()
        Loader->>ArrowModule: require / import 'apache-arrow'
        ArrowModule-->>Loader: return module (expects tableFromIPC)
        Loader-->>Codec: provide module
        Codec->>Codec: validate tableFromIPC and register decoder
        Codec-->>NodeRuntime: return true
    end
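
The flow in the diagram can be sketched in TypeScript roughly as follows. This is a minimal illustration under stated assumptions, not the code in src/utils/codec.ts: the module-level `arrowDecoder` variable and the `ArrowDecoder` type are hypothetical, and the Node loader fallback described in the walkthrough is omitted, so a missing loader simply yields `false` here.

```typescript
// Hypothetical, simplified stand-ins for the helpers in src/utils/codec.ts.
type ArrowDecoder = (bytes: Uint8Array) => unknown;
type ArrowModuleLoader = () => unknown | Promise<unknown>;

// Assumed registry: a single module-level slot for the decoder.
let arrowDecoder: ArrowDecoder | undefined;

function hasArrowDecoder(): boolean {
  return arrowDecoder !== undefined;
}

function registerArrowDecoder(decoder: ArrowDecoder): void {
  arrowDecoder = decoder;
}

async function autoRegisterArrowDecoder(
  options: { loader?: ArrowModuleLoader } = {}
): Promise<boolean> {
  // Idempotent: when a decoder is already registered, the loader never runs.
  if (hasArrowDecoder()) return true;

  // Simplification: the helper described above falls back to a Node loader here.
  const loader = options.loader;
  if (!loader) return false;

  try {
    const mod = (await loader()) as { tableFromIPC?: unknown } | null | undefined;
    const candidate = mod?.tableFromIPC;
    // Validate the module shape before touching the registry.
    if (typeof candidate !== 'function') return false;
    const decode = candidate as ArrowDecoder;
    registerArrowDecoder(bytes => decode(bytes));
    return true;
  } catch {
    // A throwing or rejecting loader is treated as "Arrow not available".
    return false;
  }
}
```

In this sketch every failure path returns before `registerArrowDecoder` is called, so a failed attempt leaves the registry untouched and a later call can still succeed.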

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Poem

🐰 I hopped in the runtime, found modules to be merry,
autoRegisterArrowDecoder — no more require to carry!
TableFromIPC joins the dance, loaders hum a tune,
Scipy, Torch, and Sklearn peek like stars and moon.
Data hops, decodes, and scurries off by noon.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'feat(codec): auto-register Arrow decoder' directly summarizes the main change: adding auto-registration of the Arrow decoder in Node bridges, which is the primary objective of the PR. |
| Description check | ✅ Passed | The description clearly relates to the changeset by outlining the three main initiatives: auto-registration in Node bridges, exposing a helper for manual registration with tests, and updating documentation and examples. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 93.33%, which is sufficient. The required threshold is 80.00%. |

📜 Recent review details

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5292801 and 1431438.

📒 Files selected for processing (4)
  • docs/runtimes/nodejs.md
  • runtime/python_bridge.py
  • src/runtime/node.ts
  • src/utils/codec.ts
🧰 Additional context used
🧬 Code graph analysis (1)
src/runtime/node.ts (1)
src/utils/codec.ts (1)
  • autoRegisterArrowDecoder (165-194)
🔇 Additional comments (11)
runtime/python_bridge.py (5)

5-5: LGTM!

The import of importlib.util enables lightweight module detection via find_spec without triggering heavy imports.


87-98: LGTM!

Exception handling correctly narrowed to (ImportError, OSError), addressing the previous review feedback. The docstring clearly explains the function's purpose.


101-114: LGTM!

The implementation addresses previous review feedback:

  • Uses find_spec for lightweight probing without importing heavy modules.
  • Exception handling is appropriately narrow with a comment explaining the rationale.
  • Docstring clearly documents that these are best-effort hints, not guarantees.

116-197: LGTM!

Docstrings added to all detector functions improve code documentation and explain the optional dependency detection pattern.


601-619: LGTM!

The handle_meta() function now exposes capability flags using top-level package names only, addressing the previous review concern about heavy imports during metadata detection.

src/utils/codec.ts (3)

131-143: LGTM!

The ArrowModuleLoader type and isNodeRuntime() helper are cleanly implemented with defensive type checks for safe cross-environment usage.


145-157: LGTM!

The helper centralizes Arrow module validation and registration, providing a clear error message when tableFromIPC is missing or invalid.


165-194: LGTM!

The autoRegisterArrowDecoder function is well-designed:

  • Idempotent (safe to call multiple times).
  • Gracefully handles missing loaders and failed imports by returning false.
  • The Node-specific loader correctly tries CommonJS require first with ESM fallback.
  • Return type annotation added per previous review feedback.
docs/runtimes/nodejs.md (1)

179-194: LGTM!

Documentation clearly separates the auto-registration path from the manual customization path, addressing the previous review feedback about clarity.

src/runtime/node.ts (2)

8-10: LGTM!

Imports correctly updated to bring in createRequire for CommonJS module loading and autoRegisterArrowDecoder for the new auto-registration pathway.


275-280: LGTM!

The Arrow decoder is correctly registered before spawning the Python process. The synchronous loader pattern works with the async API since ArrowModuleLoader accepts both sync and async return types. Ignoring the return value is acceptable since Arrow decoding failures will produce clear errors later if apache-arrow is unavailable.

@bbopen bbopen self-assigned this Jan 15, 2026
@bbopen bbopen added the enhancement, area:codec, and priority:p2 labels Jan 15, 2026
@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 44de44b1b4

Comment on lines 571 to 574
'arrowAvailable': arrow_available(),
'scipyAvailable': module_available('scipy.sparse'),
'torchAvailable': module_available('torch'),
'sklearnAvailable': module_available('sklearn.base'),

P2 Badge Avoid heavy imports in metadata feature detection

module_available uses importlib.util.find_spec, and when you pass dotted module names like scipy.sparse or sklearn.base, Python implicitly imports the parent package to resolve the submodule. That means handle_meta() will eagerly import heavy libraries on every bridge startup, which defeats the “lightweight” intent and can add noticeable startup latency or trigger side effects even if those codecs are never used. To keep metadata probing cheap, prefer checking only the top-level packages (e.g., scipy, sklearn) or use a lookup that doesn’t require importing the parent.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@docs/runtimes/nodejs.md`:
- Around line 180-186: The code snippet mixes auto and manual registration: it
calls autoRegisterArrowDecoder() while the prose instructs to "register
manually"; separate and clarify both paths by showing them as distinct examples
and updating the text: one short example demonstrating the auto path using
autoRegisterArrowDecoder() (for when apache-arrow is installed) and a separate
manual customization example that imports apache-arrow's tableFromIPC and calls
registerArrowDecoder(bytes => tableFromIPC(bytes)); reference
autoRegisterArrowDecoder, registerArrowDecoder, and tableFromIPC in the updated
snippet and reword the surrounding sentence to explicitly state which example is
auto and which is manual.

In `@src/runtime/optimized-node.ts`:
- Around line 190-193: The loader passed to autoRegisterArrowDecoder uses an
unnecessary async wrapper around the synchronous require call; change the loader
to return the module directly (remove async wrapper) so that
autoRegisterArrowDecoder({ loader: ... }) calls a synchronous loader — locate
the call referencing createRequire and autoRegisterArrowDecoder and replace the
async loader function with a direct-return loader that invokes
require('apache-arrow') and returns that value.
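
Why the async wrapper is unnecessary can be shown with a small sketch. It assumes the loader type accepts either a plain value or a promise; the names `resolveArrowModule` and `fakeArrow` are hypothetical, and `fakeArrow` stands in for whatever `require('apache-arrow')` would return so the snippet runs without the package installed.

```typescript
// Assumed loader shape: it may hand back the module itself or a promise for it.
type ArrowModuleLoader = () => unknown | Promise<unknown>;

// Hypothetical helper that mirrors how an async API can consume either style.
async function resolveArrowModule(loader: ArrowModuleLoader): Promise<unknown> {
  // `await` leaves a plain value unchanged and unwraps a promise,
  // so wrapping a synchronous require() in `async` adds nothing.
  return await loader();
}

// Stand-in for the object that require('apache-arrow') would return.
const fakeArrow = { tableFromIPC: (bytes: Uint8Array): number => bytes.byteLength };

// Direct-return loader, the form suggested for optimized-node.ts.
const syncLoader: ArrowModuleLoader = () => fakeArrow;

// Async-wrapped loader, the form being replaced.
const asyncLoader: ArrowModuleLoader = async () => fakeArrow;
```

Both loaders resolve to the same module object when passed to `resolveArrowModule`, which is why the direct-return form is enough for a synchronous `require`.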

In `@src/utils/codec.ts`:
- Around line 152-163: The inline async default assigned to loader (the
options.loader fallback used when isNodeRuntime() is true) has no explicit
return type, which triggers the lint warning "Missing return type on function";
update that anonymous async function to declare its return type as
Promise<unknown> (e.g. async (): Promise<unknown> => { ... }) so the loader
variable's default matches the expected signature and clears the warning.

In `@test/runtime_codec.test.ts`:
- Around line 78-92: Add assertions to the failing-case tests to ensure no
decoder was registered: after calling autoRegisterArrowDecoder({ loader }) in
both "should return false when loader lacks tableFromIPC" and "should return
false when loader throws" tests, call hasArrowDecoder() and assert it is false
(expect(hasArrowDecoder()).toBe(false)); update references to
autoRegisterArrowDecoder and hasArrowDecoder to locate the checks and keep the
existing expect(registered).toBe(false) assertions.
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 607e65c and 44de44b.

📒 Files selected for processing (14)
  • docs/api/README.md
  • docs/codec-roadmap.md
  • docs/runtimes/browser.md
  • docs/runtimes/nodejs.md
  • examples/living-app/README.md
  • examples/living-app/src/index.ts
  • runtime/python_bridge.py
  • src/index.ts
  • src/runtime/node.ts
  • src/runtime/optimized-node.ts
  • src/types/index.ts
  • src/utils/codec.ts
  • test/runtime_codec.test.ts
  • test/runtime_node.test.ts
🧰 Additional context used
🧬 Code graph analysis (5)
src/runtime/node.ts (2)
src/index.ts (1)
  • autoRegisterArrowDecoder (72-72)
src/utils/codec.ts (1)
  • autoRegisterArrowDecoder (146-175)
test/runtime_codec.test.ts (1)
src/utils/codec.ts (3)
  • autoRegisterArrowDecoder (146-175)
  • hasArrowDecoder (127-129)
  • registerArrowDecoder (114-118)
src/runtime/optimized-node.ts (2)
src/index.ts (1)
  • autoRegisterArrowDecoder (72-72)
src/utils/codec.ts (1)
  • autoRegisterArrowDecoder (146-175)
src/utils/codec.ts (1)
src/index.ts (2)
  • registerArrowDecoder (73-73)
  • autoRegisterArrowDecoder (72-72)
examples/living-app/src/index.ts (2)
src/index.ts (1)
  • autoRegisterArrowDecoder (72-72)
src/utils/codec.ts (1)
  • autoRegisterArrowDecoder (146-175)
🪛 GitHub Check: lint
src/utils/codec.ts

[warning] 155-155:
Missing return type on function

🪛 Ruff (0.14.11)
runtime/python_bridge.py

104-104: Do not catch blind exception: Exception

(BLE001)

🔇 Additional comments (13)
examples/living-app/README.md (1)

7-7: Docs align with Node auto-registration.

This statement accurately reflects the new default behavior and keeps the living-app guide consistent.

runtime/python_bridge.py (2)

5-105: Lightweight feature detection keeps startup cheap.

Using find_spec avoids heavy imports/side effects while still exposing availability.


563-575: Bridge metadata expansion looks good.

Exposing scipy/torch/sklearn availability in meta supports runtime feature-gating on the TS side.

test/runtime_node.test.ts (1)

139-145: Good coverage for new availability flags.

These assertions validate the BridgeInfo shape while keeping runtime variability intact.

src/runtime/node.ts (1)

8-10: Auto-registration is correctly sequenced before process start.

This ensures Arrow decoding is ready for the first response without impacting the spawn flow.

Also applies to: 275-280

docs/runtimes/browser.md (1)

66-66: Doc clarification matches runtime behavior.

Clear, accurate note about Node auto-registration when apache-arrow is present.

src/index.ts (1)

69-75: LGTM!

The new autoRegisterArrowDecoder export is correctly added alongside the existing codec utilities, maintaining logical grouping and consistent import paths.

src/types/index.ts (1)

319-323: LGTM!

The new availability flags follow the established naming convention (*Available) and are correctly typed as booleans. The placement after arrowAvailable maintains logical grouping for feature detection properties.

docs/api/README.md (1)

140-157: LGTM!

The documentation clearly explains the auto-registration flow, shows correct usage patterns, and distinguishes between automatic registration in NodeBridge versus manual registration for standalone usage. The example code is accurate.

test/runtime_codec.test.ts (2)

55-64: LGTM!

Good test coverage for the happy path - verifies the loader is invoked, registration succeeds, and the decoder becomes available.


66-76: LGTM!

Correctly verifies the short-circuit behavior when a decoder is already registered, using a throwing loader to ensure it's never invoked.

docs/codec-roadmap.md (1)

17-23: Docs update looks consistent and clear.

The additions align with the auto-registration direction and set clear expectations for defaults, feature detection, and future transport work.

Also applies to: 38-42, 108-109, 139-142

examples/living-app/src/index.ts (1)

7-7: LGTM — example aligns with new auto-registration API.

The import and enablement flow now matches the updated codec API and keeps a clear failure path when Arrow isn’t available.

Also applies to: 55-59

@bbopen
Owner Author

bbopen commented Jan 16, 2026

@coderabbitai review incremental changes on latest push. Addressed requested updates + docstrings.

@coderabbitai

coderabbitai bot commented Jan 16, 2026

@bbopen I'll review the incremental changes from your latest push, with a focus on the updates you've addressed and the docstrings you've added.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@runtime/python_bridge.py`:
- Around line 87-111: Remove the unused noqa and tighten the broad exception
handlers: in arrow_available() delete the "# noqa: F401" and replace the broad
except with except ImportError: returning False; in
module_available(module_name: str) narrow the catch to likely issues (e.g.
except (ImportError, AttributeError, TypeError, ValueError): return False) and
add a short comment that these specific exceptions guard against importlib edge
cases so linters stop flagging BLE001.
♻️ Duplicate comments (1)
runtime/python_bridge.py (1)

607-610: Avoid dotted-module checks in metadata probing.

Lines 608-610 pass dotted module names into module_available, which can import heavy parent packages via find_spec, undermining “lightweight” probing. Prefer top‑level module checks.

♻️ Suggested tweak
-        'scipyAvailable': module_available('scipy.sparse'),
+        'scipyAvailable': module_available('scipy'),
@@
-        'sklearnAvailable': module_available('sklearn.base'),
+        'sklearnAvailable': module_available('sklearn'),
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 44de44b and 9b52eb8.

📒 Files selected for processing (5)
  • docs/runtimes/nodejs.md
  • runtime/python_bridge.py
  • src/runtime/optimized-node.ts
  • src/utils/codec.ts
  • test/runtime_codec.test.ts
🧰 Additional context used
🧬 Code graph analysis (3)
test/runtime_codec.test.ts (2)
src/utils/codec.ts (3)
  • autoRegisterArrowDecoder (163-192)
  • hasArrowDecoder (127-129)
  • registerArrowDecoder (114-118)
src/index.ts (2)
  • autoRegisterArrowDecoder (72-72)
  • registerArrowDecoder (73-73)
src/runtime/optimized-node.ts (2)
src/utils/codec.ts (1)
  • autoRegisterArrowDecoder (163-192)
src/index.ts (1)
  • autoRegisterArrowDecoder (72-72)
src/utils/codec.ts (1)
src/index.ts (2)
  • registerArrowDecoder (73-73)
  • autoRegisterArrowDecoder (72-72)
🪛 Ruff (0.14.11)
runtime/python_bridge.py

95-95: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


96-96: Do not catch blind exception: Exception

(BLE001)


110-110: Do not catch blind exception: Exception

(BLE001)

🔇 Additional comments (4)
src/utils/codec.ts (1)

131-192: Auto-registration flow is clean and defensive.

The Node-runtime guard, loader fallback, and safe false return keep optional Arrow support robust without breaking non-Node environments.

test/runtime_codec.test.ts (1)

55-94: Auto-register tests cover the critical paths.

Success, skip, and failure cases are all exercised, and the decoder state is asserted after failures. Nice coverage.

src/runtime/optimized-node.ts (1)

190-193: Startup auto-registration looks good.

Decoder registration happens early and remains optional without breaking startup.

docs/runtimes/nodejs.md (1)

179-191: Docs now clearly separate auto vs manual Arrow registration.

The split examples make the default behavior and customization path easy to follow.

@bbopen
Owner Author

bbopen commented Jan 16, 2026

Addressed review notes: narrowed exception handling + removed unused noqa, made module probing top-level, and added docstring for why meta exposes capabilities.

@bbopen
Owner Author

bbopen commented Jan 16, 2026

@coderabbitai review

All requested changes addressed; please re-review.

@coderabbitai

coderabbitai bot commented Jan 16, 2026

@bbopen I'll review the incremental changes now. Thank you for addressing all the requested feedback, particularly the narrowed exception handling, the top-level module probing, and the added docstring explaining why meta exposes capabilities.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@runtime/python_bridge.py`:
- Around line 101-112: Update the bridge metadata docs/type to clarify that
module availability flags are best-effort hints, not guarantees: add a short
note to the BridgeInfo docstring (or the docstring near module_available)
stating that module_available(module_name) returns an informational capability
hint and that actual runtime checks like is_scipy_sparse, is_torch_tensor, and
is_sklearn_estimator perform independent try/catch import validation at
serialization time and should be relied on for definitive behavior; keep the
wording concise and place it alongside the BridgeInfo/type declaration or
module_available docstring so consumers see the caveat where flags are exposed.
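
On the TypeScript side, treating the flags as hints might look like the sketch below. The four field names come from this PR; the interface name `BridgeCapabilityFlags` and the helper `reportedCodecs` are hypothetical and only illustrate reading the flags for display or early warnings, not for deciding whether serialization will succeed.

```typescript
// Assumed subset of BridgeInfo: only the availability flags named in this PR.
interface BridgeCapabilityFlags {
  arrowAvailable: boolean;
  scipyAvailable: boolean;
  torchAvailable: boolean;
  sklearnAvailable: boolean;
}

// Hypothetical helper: list the optional codecs the bridge reports as present.
// The result is a best-effort hint; the Python side still validates imports
// when it actually serializes a value.
function reportedCodecs(info: BridgeCapabilityFlags): string[] {
  const flags: Array<[string, boolean]> = [
    ['arrow', info.arrowAvailable],
    ['scipy', info.scipyAvailable],
    ['torch', info.torchAvailable],
    ['sklearn', info.sklearnAvailable],
  ];
  return flags.filter(([, available]) => available).map(([name]) => name);
}
```

A caller could log the returned names at startup while still handling a serialization error if a reported package later fails to import.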
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 9b52eb8 and 5292801.

📒 Files selected for processing (1)
  • runtime/python_bridge.py
🔇 Additional comments (9)
runtime/python_bridge.py (9)

5-5: LGTM: stdlib import for find_spec.


87-97: Optional PyArrow detection is safer now.
Narrowing to ImportError/OSError keeps startup resilient without masking unrelated failures.


116-120: Docstring clarifies optional NumPy handling.


129-133: Docstring clarifies optional pandas DataFrame handling.


142-146: Docstring clarifies optional pandas Series handling.


155-159: Docstring clarifies optional scipy sparse handling.


171-175: Docstring clarifies optional torch tensor handling.


187-191: Docstring clarifies optional sklearn estimator handling.


600-617: Bridge metadata now exposes optional codec availability.
This aligns well with feature detection on the Node side.

@bbopen
Owner Author

bbopen commented Jan 16, 2026

Addressed CodeRabbit items: clarified auto vs manual Arrow registration docs, made Node loader sync, widened loader type, and added best-effort hint note for module availability. Please re-review.

@bbopen
Owner Author

bbopen commented Jan 16, 2026

@coderabbitai review

All requested changes addressed; please update review state.

@coderabbitai

coderabbitai bot commented Jan 16, 2026

@bbopen I'll review the incremental changes now. Thank you for addressing all the feedback—particularly the documentation clarifications around auto vs manual Arrow registration, the synchronous Node loader change, the widened loader type, and the best-effort module availability hint.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Labels

  • area:codec (Area: codecs and serialization)
  • enhancement (New feature or request)
  • priority:p2 (Priority P2 (medium))

2 participants