fix(codec): await nested Arrow in torch tensors #30
Conversation
@CodeRabbit full review
✅ Actions performed: Full review triggered.
📝 Walkthrough
The changes add asynchronous envelope decoding support by introducing decodeEnvelopeAsync and wiring decodeValueAsync to it.
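Concretely, the nested envelope the fix targets is a torch.tensor payload whose inner ndarray crossed the wire as Arrow; a sketch of that shape (field names mirror the decoder shown further down, the base64 string is a placeholder):

// A torch.tensor envelope wrapping an Arrow-encoded ndarray (sketch).
const payload = {
  __tywrap__: 'torch.tensor',
  encoding: 'ndarray',
  value: { __tywrap__: 'ndarray', encoding: 'arrow', b64: '<arrow-ipc-base64>' },
  shape: [2, 2],
  dtype: 'float32',
  device: 'cpu',
};
// Before this change the async path could leave TorchTensor.data as an unresolved
// Promise for such a payload; decodeValueAsync(payload) now awaits the inner decode.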
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 1
📜 Review details
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
src/utils/codec.ts
test/runtime_codec.test.ts
🧰 Additional context used
🧬 Code graph analysis (2)
test/runtime_codec.test.ts (1)
src/utils/codec.ts (3)
registerArrowDecoder (102-106), CodecEnvelope (43-90), decodeValueAsync (374-376)
src/utils/codec.ts (1)
src/index.ts (1)
decodeValueAsync (71-71)
🔇 Additional comments (3)
src/utils/codec.ts (2)
331-346: The async torch tensor decode fix looks correct. The key change at line 339 properly awaits the nested decodeEnvelopeAsync call, ensuring that when the inner ndarray uses Arrow encoding, the Promise is resolved before being assigned to TorchTensor.data. This directly addresses issue #21.
374-376: LGTM! The change to use decodeEnvelopeAsync with tryDecodeArrowTable correctly enables async decoding throughout the envelope hierarchy.
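For reference, the three lines being approved here (codec.ts 374-376) are not shown in this diff; they presumably amount to something like the following sketch (the DecodedValue return type and the cast are assumptions):

export async function decodeValueAsync(value: unknown): Promise<DecodedValue> {
  // Route envelopes through the async decoder, using the module's Arrow-table
  // decoder for any arrow-encoded payloads encountered along the way.
  return (await decodeEnvelopeAsync(value, tryDecodeArrowTable)) as DecodedValue;
}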
test/runtime_codec.test.ts (1)
395-419: Test is correct as written; no changes needed. The registerArrowDecoder API accepts synchronous decoders only (type signature: (bytes: Uint8Array) => ArrowTable | Uint8Array), and the sync decoder in the test correctly matches this contract. The bug fix being addressed was about properly awaiting nested envelope decoding in decodeEnvelopeAsync (see line 339 in codec.ts), not about supporting async decoders. The test validates this by confirming that a torch tensor containing a nested Arrow-encoded ndarray is correctly decoded with proper await semantics in the recursive decoding path. Likely an incorrect or invalid review comment.
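A minimal sketch of the kind of regression test described above (the real test lives at test/runtime_codec.test.ts lines 395-419; the vitest imports, relative import paths, and the tiny byte payload are assumptions here):

import { describe, expect, it } from 'vitest';
import { registerArrowDecoder } from '../src/utils/codec';
import { decodeValueAsync } from '../src/index';

describe('decodeValueAsync', () => {
  it('awaits an Arrow-encoded ndarray nested inside a torch.tensor envelope', async () => {
    // A synchronous decoder, matching the (bytes: Uint8Array) => ArrowTable | Uint8Array contract.
    registerArrowDecoder(bytes => bytes);

    const envelope = {
      __tywrap__: 'torch.tensor',
      encoding: 'ndarray',
      value: {
        __tywrap__: 'ndarray',
        encoding: 'arrow',
        b64: Buffer.from([1, 2, 3]).toString('base64'),
      },
      shape: [3],
      dtype: 'uint8',
      device: 'cpu',
    };

    const tensor = (await decodeValueAsync(envelope)) as { data: unknown };
    // The regression being guarded: data must be a concrete value, never a Promise.
    expect(tensor.data).not.toBeInstanceOf(Promise);
    expect(tensor.data).toBeInstanceOf(Uint8Array);
  });
});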
async function decodeEnvelopeAsync<T>(
  value: unknown,
  decodeArrow: (bytes: Uint8Array) => T | Promise<T>
): Promise<T | unknown> {
  if (!isObject(value)) {
    return value;
  }
  const marker = (value as { __tywrap__?: unknown }).__tywrap__;
  if (
    (marker === 'dataframe' || marker === 'series') &&
    (value as { encoding?: unknown }).encoding === 'arrow' &&
    typeof (value as { b64?: unknown }).b64 === 'string'
  ) {
    const bytes = fromBase64(String((value as { b64: string }).b64));
    return await decodeArrow(bytes);
  }
  if (
    marker === 'dataframe' &&
    (value as { encoding?: unknown }).encoding === 'json' &&
    'data' in (value as object)
  ) {
    return (value as { data: unknown }).data;
  }
  if (
    marker === 'series' &&
    (value as { encoding?: unknown }).encoding === 'json' &&
    'data' in (value as object)
  ) {
    return (value as { data: unknown }).data;
  }
  if (marker === 'ndarray') {
    if (
      (value as { encoding?: unknown }).encoding === 'arrow' &&
      typeof (value as { b64?: unknown }).b64 === 'string'
    ) {
      const bytes = fromBase64(String((value as { b64: string }).b64));
      return await decodeArrow(bytes);
    }
    if ((value as { encoding?: unknown }).encoding === 'json' && 'data' in (value as object)) {
      return (value as { data: unknown }).data;
    }
  }
  if (
    marker === 'scipy.sparse' &&
    (value as { encoding?: unknown }).encoding === 'json' &&
    typeof (value as { format?: unknown }).format === 'string' &&
    Array.isArray((value as { shape?: unknown }).shape) &&
    Array.isArray((value as { data?: unknown }).data)
  ) {
    const sparse = value as {
      format: 'csr' | 'csc' | 'coo';
      shape: readonly number[];
      data: readonly unknown[];
      indices?: readonly number[];
      indptr?: readonly number[];
      row?: readonly number[];
      col?: readonly number[];
      dtype?: string;
    };
    return {
      format: sparse.format,
      shape: sparse.shape,
      data: sparse.data,
      indices: sparse.indices,
      indptr: sparse.indptr,
      row: sparse.row,
      col: sparse.col,
      dtype: sparse.dtype,
    } satisfies SparseMatrix;
  }
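  // The fix for #21: a torch.tensor envelope wraps another envelope in `value`
  // (often an Arrow-encoded ndarray), so the nested decode below must be awaited
  // before it is assigned to TorchTensor.data.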
  if (marker === 'torch.tensor' && (value as { encoding?: unknown }).encoding === 'ndarray') {
    const torchValue = value as {
      value?: unknown;
      shape?: readonly number[];
      dtype?: string;
      device?: string;
    };
    if ('value' in (torchValue as object)) {
      const decoded = await decodeEnvelopeAsync(torchValue.value, decodeArrow);
      return {
        data: decoded,
        shape: torchValue.shape,
        dtype: torchValue.dtype,
        device: torchValue.device,
      } satisfies TorchTensor;
    }
  }
  if (
    marker === 'sklearn.estimator' &&
    (value as { encoding?: unknown }).encoding === 'json' &&
    typeof (value as { className?: unknown }).className === 'string' &&
    typeof (value as { module?: unknown }).module === 'string' &&
    isObject((value as { params?: unknown }).params)
  ) {
    const estimator = value as {
      className: string;
      module: string;
      version?: string;
      params: Record<string, unknown>;
    };
    return {
      className: estimator.className,
      module: estimator.module,
      version: estimator.version,
      params: estimator.params,
    } satisfies SklearnEstimator;
  }
  return value as unknown;
}
🧹 Nitpick | 🔵 Trivial
Significant code duplication between decodeEnvelope and decodeEnvelopeAsync.
The new async function duplicates ~100 lines from the sync version. This creates maintenance burden—any future envelope type or logic change must be applied in both places.
Consider refactoring to share the common logic. One approach: make the core function always async-compatible by accepting a decoder that may return T | Promise<T>, then have the sync wrapper use a blocking pattern or keep sync-only paths separate.
♻️ Suggested approach to reduce duplication
// Example: Single implementation that handles both sync and async
async function decodeEnvelopeCore<T>(
  value: unknown,
  decodeArrow: (bytes: Uint8Array) => T | Promise<T>,
  recurse: (v: unknown) => T | Promise<T | unknown>
): Promise<T | unknown> {
  // ... shared envelope matching logic ...
  // For torch.tensor:
  if (marker === 'torch.tensor' && ...) {
    const decoded = await recurse(torchValue.value);
    return { data: decoded, ... };
  }
  // ...
}

// Sync version wraps with sync decoder
export function decodeValue(value: unknown): DecodedValue {
  // Use sync-only path or validate decoder is sync
}

// Async version
export async function decodeValueAsync(value: unknown): Promise<DecodedValue> {
  return decodeEnvelopeCore(value, tryDecodeArrowTable,
    v => decodeEnvelopeCore(v, tryDecodeArrowTable, ...));
}
🤖 Prompt for AI Agents
In `@src/utils/codec.ts` around lines 261 - 369, decodeEnvelopeAsync duplicates
most logic from the synchronous decodeEnvelope; extract the shared
envelope-matching/field-extraction logic into a single core function (e.g.,
decodeEnvelopeCore) that takes a decodeArrow decoder returning T | Promise<T>
and a recurse/callback to decode nested values, then have decodeEnvelope call
the core synchronously (or wrap a sync decoder) and decodeEnvelopeAsync await
the core; update torch.tensor handling to call the provided recurse for nested
value decoding and remove the duplicated branches from decodeEnvelopeAsync so
both variants reuse the same implementation.
Fixes #21

- Make async decoding await nested Arrow ndarrays inside torch tensor envelopes.
- Add a regression test to ensure TorchTensor.data is a concrete value, not a Promise.