Skip to content

Support different ops at same call site with compile=True#6378

Open
rostan-t wants to merge 2 commits into
NVIDIA:mainfrom
rostan-t:ndd-compile-polymorphic-call-site
Open

Support different ops at same call site with compile=True#6378
rostan-t wants to merge 2 commits into
NVIDIA:mainfrom
rostan-t:ndd-compile-polymorphic-call-site

Conversation

@rostan-t
Copy link
Copy Markdown
Collaborator

Category:

Bug fix (non-breaking change which fixes an issue)

Description:

Currently, with transparent pipelining, the following patterns is silently incorrect:

for op in [ndd.flip, ndd.sphere]:
    out = op(images)

When a call site is reached multiple times during capture, we check if the arguments differ. If they don't, we return the already registered nodes. This fails because the operator itself isn't check.

This PR allows the call trie to track multiple operators on the same call site.

Additional information:

Affected modules and functionalities:

Transparent pipelining

Key points relevant for the review:

Tests:

  • Existing tests apply
  • New tests added
    • Python tests
    • GTests
    • Benchmark
    • Other
  • N/A

Checklist

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: N/A

Signed-off-by: Rostan Tabet <rtabet@nvidia.com>
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 29, 2026

Greptile Summary

This PR fixes a silent correctness bug in transparent pipelining (compile=True): when different operator types were called at the same call site (e.g. inside a loop over [ndd.flip, ndd.sphere]), the _CallTrie keyed nodes only by call chain, causing the first op's node to be silently reused for all subsequent ops at that site. The fix keys each trie leaf by (call chain, operator type) via a dict[type[Operator], CompileNode], so each op gets its own node regardless of site sharing.

  • _CallTrie.node (single slot) replaced by _CallTrie.nodes (dict keyed by op type); insert, find, and lookup all gain an op parameter.
  • get_compiled_result and _compile_intercept updated to pass op_class through; the walrus-operator result check is replaced by an explicit is not None guard, and device is added to the tracing-phase match predicate in record.
  • A new test_compile_different_ops_same_call_site test is added, and all existing zip calls are tightened to strict=True.

Confidence Score: 5/5

Safe to merge — the change is narrowly scoped to the call-trie lookup path and the fix correctly keys compiled nodes by both call chain and operator type.

The trie restructuring is straightforward: a single node slot becomes a dict keyed by operator class, and the parameter threads cleanly through every call site. The new test exercises the previously-broken pattern across three compiled epochs and compares against a dynamic reference, giving good confidence in the fix. No regressions are visible in the existing test suite changes.

No files require special attention.

Important Files Changed

Filename Overview
dali/python/nvidia/dali/experimental/dynamic/_compile.py Core fix: _CallTrie nodes dict keyed by op type; op_class threaded through find/lookup/insert and get_compiled_result; device added to record match check; walrus replaced with explicit is-not-None guard. Logic is correct.
dali/test/python/experimental_mode/test_compile.py New test correctly covers the multi-op same-call-site scenario with reference comparison against dynamic results over 3 compiled epochs; zip(strict=True) applied throughout.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[op call enters _compile_intercept] --> B{compile_ctx present?}
    B -- No --> C[Dynamic fn_call]
    B -- Yes --> D{State?}
    D -- COMPILED --> E[get_compiled_result\nframe, op_class, inputs, kwargs]
    E --> F{_call_trie.lookup\nframe, op_class}
    F -- None --> G[Dynamic fn_call fallback]
    F -- node found --> H{inputs/kwargs/device match?}
    H -- No --> G
    H -- Yes --> I[Return _pipeline_results node]
    D -- TRACING --> J[fn_call dynamic]
    J --> K[classify call site]
    K --> L[record\ncall_chain, op_class, backend, inputs, kwargs]
    L --> M{_call_trie.find\ncall_chain, op_class}
    M -- None --> N[Create new CompileNode\ninsert into trie + nodes list]
    M -- existing found --> O{inputs == existing.inputs\nAND kwargs match\nAND device match?}
    O -- Yes --> P[Return existing node]
    O -- No --> Q[Return None\nno CompiledBatch]
    N --> R[Wrap result in CompiledBatch]
    P --> R
Loading

Reviews (2): Last reviewed commit: "Use strict=True in zip for test result c..." | Re-trigger Greptile

Comment thread dali/test/python/experimental_mode/test_compile.py Outdated
Signed-off-by: Rostan Tabet <rtabet@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant