@bbopen
Owner

@bbopen bbopen commented Jan 14, 2026

Fixes #21

- Make async decoding await nested Arrow ndarrays inside torch tensor envelopes.
- Add a regression test to ensure TorchTensor.data is a concrete value, not a Promise.

@bbopen bbopen added bug Something isn't working area:codec Area: codecs and serialization priority:p1 Priority P1 (high) labels Jan 14, 2026
@bbopen bbopen self-assigned this Jan 14, 2026
@bbopen
Owner Author

bbopen commented Jan 14, 2026

@CodeRabbit full review

@coderabbitai

coderabbitai bot commented Jan 14, 2026

✅ Actions performed

Full review triggered.

@coderabbitai

coderabbitai bot commented Jan 14, 2026

📝 Walkthrough

The changes add asynchronous envelope decoding support by introducing decodeEnvelopeAsync that handles Promise-returning arrow decoders, and update decodeValueAsync to use this async version. A test for torch tensors with nested arrow-encoded ndarrays is included.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Async Envelope Decoder: src/utils/codec.ts | Added decodeEnvelopeAsync function mirroring decodeEnvelope with Promise support for arrow decoders; updated decodeValueAsync to call async version instead of sync variant. Handles all envelope types (dataframe/series with arrow, ndarray with arrow/json, scipy.sparse, torch.tensor, sklearn.estimator) with proper awaiting of nested async decodes. |
| Torch Tensor Async Test: test/runtime_codec.test.ts | Added test case validating that decodeValueAsync properly resolves torch tensor envelopes containing nested arrow-encoded ndarrays, ensuring data field is a concrete Uint8Array rather than a Promise. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A promise wrapped in torch so bright,
Now awaits with async might!
Arrow's data flows without a hitch,
Nested deep, yet resolved—no glitch! 🎯✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title concisely describes the main fix: awaiting nested Arrow decoding in torch tensor envelopes, which matches the primary change. |
| Description check | ✅ Passed | The description clearly relates to the changeset by referencing issue #21, explaining the async decoding fix, and mentioning the regression test added. |
| Linked Issues check | ✅ Passed | The pull request implements all requirements from issue #21: adds async envelope decoder, awaits nested decodes, and includes a regression test asserting concrete TorchTensor.data values. |
| Out of Scope Changes check | ✅ Passed | All changes are scoped to fixing async decoding of nested Arrow ndarrays in torch tensors; no out-of-scope modifications are present. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/utils/codec.ts`:
- Around line 261-369: decodeEnvelopeAsync duplicates most logic from the
synchronous decodeEnvelope; extract the shared
envelope-matching/field-extraction logic into a single core function (e.g.,
decodeEnvelopeCore) that takes a decodeArrow decoder returning T | Promise<T>
and a recurse/callback to decode nested values, then have decodeEnvelope call
the core synchronously (or wrap a sync decoder) and decodeEnvelopeAsync await
the core; update torch.tensor handling to call the provided recurse for nested
value decoding and remove the duplicated branches from decodeEnvelopeAsync so
both variants reuse the same implementation.
📜 Review details

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e6fa8eb and ac12e19.

📒 Files selected for processing (2)
  • src/utils/codec.ts
  • test/runtime_codec.test.ts
🧰 Additional context used
🧬 Code graph analysis (2)
test/runtime_codec.test.ts (1)
src/utils/codec.ts (3)
  • registerArrowDecoder (102-106)
  • CodecEnvelope (43-90)
  • decodeValueAsync (374-376)
src/utils/codec.ts (1)
src/index.ts (1)
  • decodeValueAsync (71-71)
🔇 Additional comments (3)
src/utils/codec.ts (2)

331-346: The async torch tensor decode fix looks correct.

The key change at line 339 properly awaits the nested decodeEnvelopeAsync call, ensuring that when the inner ndarray uses Arrow encoding, the Promise is resolved before being assigned to TorchTensor.data. This directly addresses issue #21.
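As a minimal sketch of the failure mode described above (not the project's code), the snippet below uses simplified, hypothetical envelope shapes and a stand-in `decodeArrowAsync` decoder. It assumes the Arrow decode step returns a Promise, and contrasts building the tensor without awaiting the nested decode against awaiting it first.

```typescript
// Simplified, hypothetical envelope shapes; the real CodecEnvelope has more
// variants and carries a base64 payload rather than a number array.
type NdarrayEnvelope = { __tywrap__: 'ndarray'; encoding: 'arrow'; payload: number[] };
type TorchEnvelope = { __tywrap__: 'torch.tensor'; encoding: 'ndarray'; value: NdarrayEnvelope };

// Stand-in for a Promise-returning Arrow decoder (an assumption for this sketch).
const decodeArrowAsync = async (payload: number[]): Promise<Uint8Array> =>
  Uint8Array.from(payload);

// Nested decode result is stored without awaiting: data ends up as a Promise.
function decodeTorchWithoutAwait(env: TorchEnvelope): { data: unknown } {
  return { data: decodeArrowAsync(env.value.payload) };
}

// Nested decode is awaited before the tensor object is built.
async function decodeTorchWithAwait(env: TorchEnvelope): Promise<{ data: unknown }> {
  return { data: await decodeArrowAsync(env.value.payload) };
}

const envelope: TorchEnvelope = {
  __tywrap__: 'torch.tensor',
  encoding: 'ndarray',
  value: { __tywrap__: 'ndarray', encoding: 'arrow', payload: [1, 2, 3] },
};

const buggy = decodeTorchWithoutAwait(envelope);
console.log(buggy.data instanceof Promise); // true

decodeTorchWithAwait(envelope).then((fixed) => {
  console.log(fixed.data instanceof Uint8Array); // true
});
```

Awaiting only the outer call would not help in the first variant, because the Promise is stored one level down, inside the returned object.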


374-376: LGTM!

The change to use decodeEnvelopeAsync with tryDecodeArrowTable correctly enables async decoding throughout the envelope hierarchy.

test/runtime_codec.test.ts (1)

395-419: Test is correct as written—no changes needed.

The registerArrowDecoder API accepts synchronous decoders only (type signature: (bytes: Uint8Array) => ArrowTable | Uint8Array). The sync decoder in the test correctly matches this contract.

The bug fix addressed was about properly awaiting nested envelope decoding in decodeEnvelopeAsync (see line 339 in codec.ts), not about supporting async decoders. The test validates this by confirming that a torch tensor containing a nested Arrow-encoded ndarray is correctly decoded with proper await semantics in the recursive decoding path.
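A regression check of this kind can be generalized. The sketch below is a hypothetical helper, `findPromisePath`, that is not part of the PR: it walks a decoded value and reports where a Promise was left unresolved, so a test can assert on any nesting depth rather than on `data` alone. It assumes decoded values are plain objects, arrays, and typed arrays without cycles.

```typescript
// Hypothetical helper: returns the path of the first Promise found inside a
// decoded value, or undefined when every nested value is concrete.
function findPromisePath(value: unknown, path: string = '$'): string | undefined {
  if (value instanceof Promise) {
    return path;
  }
  // Typed arrays (e.g. Uint8Array) are leaf values; do not walk their elements.
  if (ArrayBuffer.isView(value)) {
    return undefined;
  }
  if (Array.isArray(value)) {
    for (let i = 0; i < value.length; i++) {
      const found = findPromisePath(value[i], `${path}[${i}]`);
      if (found !== undefined) return found;
    }
    return undefined;
  }
  if (typeof value === 'object' && value !== null) {
    for (const [key, child] of Object.entries(value)) {
      const found = findPromisePath(child, `${path}.${key}`);
      if (found !== undefined) return found;
    }
  }
  return undefined;
}

// Tensor-like objects shaped after the TorchTensor fields used in this PR.
const leaked = { data: Promise.resolve(new Uint8Array([1, 2, 3])), shape: [3], dtype: 'uint8' };
const resolved = { data: new Uint8Array([1, 2, 3]), shape: [3], dtype: 'uint8' };

console.log(findPromisePath(leaked)); // $.data
console.log(findPromisePath(resolved)); // undefined
```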

Likely an incorrect or invalid review comment.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

Comment on lines +261 to +369
async function decodeEnvelopeAsync<T>(
  value: unknown,
  decodeArrow: (bytes: Uint8Array) => T | Promise<T>
): Promise<T | unknown> {
  if (!isObject(value)) {
    return value;
  }
  const marker = (value as { __tywrap__?: unknown }).__tywrap__;
  if (
    (marker === 'dataframe' || marker === 'series') &&
    (value as { encoding?: unknown }).encoding === 'arrow' &&
    typeof (value as { b64?: unknown }).b64 === 'string'
  ) {
    const bytes = fromBase64(String((value as { b64: string }).b64));
    return await decodeArrow(bytes);
  }
  if (
    marker === 'dataframe' &&
    (value as { encoding?: unknown }).encoding === 'json' &&
    'data' in (value as object)
  ) {
    return (value as { data: unknown }).data;
  }
  if (
    marker === 'series' &&
    (value as { encoding?: unknown }).encoding === 'json' &&
    'data' in (value as object)
  ) {
    return (value as { data: unknown }).data;
  }
  if (marker === 'ndarray') {
    if (
      (value as { encoding?: unknown }).encoding === 'arrow' &&
      typeof (value as { b64?: unknown }).b64 === 'string'
    ) {
      const bytes = fromBase64(String((value as { b64: string }).b64));
      return await decodeArrow(bytes);
    }
    if ((value as { encoding?: unknown }).encoding === 'json' && 'data' in (value as object)) {
      return (value as { data: unknown }).data;
    }
  }
  if (
    marker === 'scipy.sparse' &&
    (value as { encoding?: unknown }).encoding === 'json' &&
    typeof (value as { format?: unknown }).format === 'string' &&
    Array.isArray((value as { shape?: unknown }).shape) &&
    Array.isArray((value as { data?: unknown }).data)
  ) {
    const sparse = value as {
      format: 'csr' | 'csc' | 'coo';
      shape: readonly number[];
      data: readonly unknown[];
      indices?: readonly number[];
      indptr?: readonly number[];
      row?: readonly number[];
      col?: readonly number[];
      dtype?: string;
    };
    return {
      format: sparse.format,
      shape: sparse.shape,
      data: sparse.data,
      indices: sparse.indices,
      indptr: sparse.indptr,
      row: sparse.row,
      col: sparse.col,
      dtype: sparse.dtype,
    } satisfies SparseMatrix;
  }
  if (marker === 'torch.tensor' && (value as { encoding?: unknown }).encoding === 'ndarray') {
    const torchValue = value as {
      value?: unknown;
      shape?: readonly number[];
      dtype?: string;
      device?: string;
    };
    if ('value' in (torchValue as object)) {
      const decoded = await decodeEnvelopeAsync(torchValue.value, decodeArrow);
      return {
        data: decoded,
        shape: torchValue.shape,
        dtype: torchValue.dtype,
        device: torchValue.device,
      } satisfies TorchTensor;
    }
  }
  if (
    marker === 'sklearn.estimator' &&
    (value as { encoding?: unknown }).encoding === 'json' &&
    typeof (value as { className?: unknown }).className === 'string' &&
    typeof (value as { module?: unknown }).module === 'string' &&
    isObject((value as { params?: unknown }).params)
  ) {
    const estimator = value as {
      className: string;
      module: string;
      version?: string;
      params: Record<string, unknown>;
    };
    return {
      className: estimator.className,
      module: estimator.module,
      version: estimator.version,
      params: estimator.params,
    } satisfies SklearnEstimator;
  }
  return value as unknown;
}

🧹 Nitpick | 🔵 Trivial

Significant code duplication between decodeEnvelope and decodeEnvelopeAsync.

The new async function duplicates ~100 lines from the sync version. This creates maintenance burden—any future envelope type or logic change must be applied in both places.

Consider refactoring to share the common logic. One approach: make the core function always async-compatible by accepting a decoder that may return T | Promise<T>, then have the sync wrapper use a blocking pattern or keep sync-only paths separate.

♻️ Suggested approach to reduce duplication
// Example: Single implementation that handles both sync and async
async function decodeEnvelopeCore<T>(
  value: unknown,
  decodeArrow: (bytes: Uint8Array) => T | Promise<T>,
  recurse: (v: unknown) => T | Promise<T | unknown>
): Promise<T | unknown> {
  // ... shared envelope matching logic ...
  // For torch.tensor:
  if (marker === 'torch.tensor' && ...) {
    const decoded = await recurse(torchValue.value);
    return { data: decoded, ... };
  }
  // ...
}

// Sync version wraps with sync decoder
export function decodeValue(value: unknown): DecodedValue {
  // Use sync-only path or validate decoder is sync
}

// Async version
export async function decodeValueAsync(value: unknown): Promise<DecodedValue> {
  return decodeEnvelopeCore(value, tryDecodeArrowTable, 
    v => decodeEnvelopeCore(v, tryDecodeArrowTable, ...));
}
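
One caveat with an `async` core is that a synchronous caller cannot use it without awaiting. A different way to share one implementation, sketched below under simplified assumptions, is to keep the core non-`async` and thread a "value or Promise" through a small `mapMaybe` helper. The names `MaybePromise`, `mapMaybe`, `decodeCore`, and the `kind`-tagged envelopes are hypothetical and are not taken from this PR or from #33.

```typescript
type MaybePromise<T> = T | Promise<T>;

// Apply fn immediately if the value is available, otherwise once it resolves.
function mapMaybe<A, B>(value: MaybePromise<A>, fn: (a: A) => B): MaybePromise<B> {
  if (value instanceof Promise) {
    return (value as Promise<A>).then(fn);
  }
  return fn(value as A);
}

// Simplified, hypothetical envelopes; the real code matches on __tywrap__ and encoding.
type Envelope =
  | { kind: 'ndarray'; payload: number[] }
  | { kind: 'torch.tensor'; value: Envelope; shape: number[] };

type Decoded = Uint8Array | { data: Decoded; shape: number[] };

type ArrowDecoder = (payload: number[]) => MaybePromise<Uint8Array>;

// Single core: stays synchronous with a sync decoder, returns a Promise otherwise.
function decodeCore(env: Envelope, decodeArrow: ArrowDecoder): MaybePromise<Decoded> {
  if (env.kind === 'ndarray') {
    return decodeArrow(env.payload);
  }
  const shape = env.shape;
  const inner = decodeCore(env.value, decodeArrow);
  // The nested result is unwrapped before it is placed in data.
  return mapMaybe<Decoded, Decoded>(inner, (data) => ({ data, shape }));
}

const nested: Envelope = {
  kind: 'torch.tensor',
  value: { kind: 'ndarray', payload: [1, 2, 3] },
  shape: [3],
};

const syncResult = decodeCore(nested, (p) => Uint8Array.from(p));
console.log(syncResult instanceof Promise); // false

const asyncResult = decodeCore(nested, async (p) => Uint8Array.from(p));
console.log(asyncResult instanceof Promise); // true
```

The trade-off is a looser return type: callers on the sync path must narrow the result or trust that their decoder is synchronous.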

@bbopen
Owner Author

bbopen commented Jan 15, 2026

Closing as duplicate of #33 (same fix for #21). #33 includes the additional tests + CodeRabbit-requested refactor to dedupe sync/async envelope decoding.

@bbopen bbopen closed this Jan 15, 2026
@bbopen bbopen deleted the fix/issue-21-async-torch-decode branch January 15, 2026 05:59

Development

Successfully merging this pull request may close these issues.

decodeValueAsync returns Promise for torch tensor when Arrow encoding is used
