Skip direct-dominating live-in optimization for loop headers by guosran · Pull Request #281 · coredac/dataflow

guosran · 2026-03-08T05:01:51Z

Summary

This pr addresses issue #270.

Problem

In nested loop kernels, values defined in outer loop blocks and used in inner loop bodies were incorrectly classified as "direct dominating live-ins" by CanonicalizeLiveInPass. This prevented them from being promoted to block arguments, which in turn caused TransformCtrlToDataFlowPass to not create inner-rate PHI_START operations for them.

Fix

Before applying the direct-dominating optimization, check whether the using block is a loop header. If so, skip the optimization and let the value be promoted to a block argument.

Result

Before the fix, %14 (the outer-rate GEP) has no reserve/phi_start for the pointer. It is fed directly into the inner-loop body via a plain data_mov, with no rate-matching:

// Outer-rate GEP — runs once per outer iteration (index_per_ii=3)
%14 = "neura.gep"(%13) <{operandSegmentSizes = array<i32: 0, 1>}> {dfg_id = 42 : i32,
    lhs_value = "%arg0",
    mapping_locs = [{id = 0 : i32, index_per_ii = 3 : i32, ...}]}
    : (!neura.data<i64, i1>) -> !neura.data<!llvm.ptr, i1>

// No reserve + phi_start for the pointer — the value is forwarded as-is
%30 = "neura.data_mov"(%14) {dfg_id = 46 : i32,
    mapping_locs = [{id = 0 : i32, index_per_ii = 3 : i32, ...}]}
    : (!neura.data<!llvm.ptr, i1>) -> !neura.data<!llvm.ptr, i1>

// Inner-loop GEP directly consumes the outer-rate data_mov (index_per_ii=4)
%32 = "neura.gep"(%30, %31) <{operandSegmentSizes = array<i32: 1, 1>}> {dfg_id = 49 : i32,
    mapping_locs = [{id = 1 : i32, index_per_ii = 4 : i32, ...}]}
    : (!neura.data<!llvm.ptr, i1>, !neura.data<i64, i1>) -> !neura.data<!llvm.ptr, i1>

After the fix, %14 is promoted to a block argument. A reserve + phi_start pair is generated, giving the pointer an inner-rate PHI_START

// Outer-rate GEP — same as before (index_per_ii=3)
%14 = "neura.gep"(%13) <{operandSegmentSizes = array<i32: 0, 1>}> {dfg_id = 43 : i32,
    lhs_value = "%arg0",
    mapping_locs = [{id = 0 : i32, index_per_ii = 3 : i32, ...}]}
    : (!neura.data<i64, i1>) -> !neura.data<!llvm.ptr, i1>

// reserve + phi_start — inner-rate slot for the pointer (NEW)
%24 = neura.reserve {dfg_id = 8 : i32} : !neura.data<!llvm.ptr, i1>
%25 = "neura.data_mov"(%14) {dfg_id = 47 : i32,
    mapping_locs = [{id = 1 : i32, index_per_ii = 3 : i32, ...}]}
    : (!neura.data<!llvm.ptr, i1>) -> !neura.data<!llvm.ptr, i1>
%26 = neura.phi_start %25, %24 {dfg_id = 50 : i32,
    mapping_locs = [{id = 4 : i32, index_per_ii = 4 : i32, ...}]}
    : !neura.data<!llvm.ptr, i1>, !neura.data<!llvm.ptr, i1> -> !neura.data<!llvm.ptr, i1>

// Inner-loop GEP now consumes the rate-matched phi_start output (index_per_ii=6)
%35 = "neura.gep"(%33, %34) <{operandSegmentSizes = array<i32: 1, 1>}> {dfg_id = 63 : i32,
    mapping_locs = [{id = 5 : i32, index_per_ii = 6 : i32, ...}]}
    : (!neura.data<!llvm.ptr, i1>, !neura.data<i64, i1>) -> !neura.data<!llvm.ptr, i1>

guosran · 2026-03-08T05:05:11Z

Commits from the previous unmerged pr is shown here. This pr should start from 1f60123 (the first failed commit).

Copilot

Pull request overview

This PR fixes an SSA live-in canonicalization bug impacting nested loops by preventing the “direct-dominating live-in” optimization from firing on loop headers, enabling correct promotion to block arguments and subsequent inner-rate PHI_START creation during ctrl-to-data-flow lowering. It also renames/rewires mapping and task placement passes and updates affected tests/docs accordingly.

Changes:

Update CanonicalizeLiveInPass to skip the direct-dominating-live-in optimization for loop headers (fix for #270).
Rename MapToAcceleratorPass → MapOperationOnTilePass and update pass registration, build files, docs, and tests.
Replace MapTaskOnCgraPass with AllocateCgraToTaskPass and invoke global placement from ResourceAwareTaskOptimizationPass, updating taskflow tests.

Reviewed changes

Copilot reviewed 43 out of 46 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
test/neura/steer_ctrl/loop_with_return_value.mlir	Updates disabled mapping pass invocation name in test pipeline comments.
test/neura/fusion/test.mlir	Updates mapping pass name and adjusts FileCheck IDs after mapping changes.
test/neura/for_loop/relu_test.mlir	Updates mapping pass flag name in RUN line.
test/neura/ctrl/branch_for.mlir	Updates mapping pass flag name in RUN lines.
test/multi-cgra/taskflow/resource-heavy/resource-heavy.mlir	Splits long `RESOPT` attribute checks for readability/stability.
test/multi-cgra/taskflow/resnet/simple_resnet_tosa.mlir	Splits long `RESOPT` attribute checks for readability/stability.
test/multi-cgra/taskflow/parallel-nested/parallel-nested.mlir	Switches to `--allocate-cgra-to-task` and updates RESOPT/placement checks.
test/multi-cgra/taskflow/multi-nested/multi-nested.mlir	Switches to `--allocate-cgra-to-task` and updates placement + RESOPT expectations.
test/multi-cgra/taskflow/irregular-loop/irregular-loop.mlir	Switches to `--allocate-cgra-to-task` and splits `RESOPT` checks.
test/multi-cgra/kernel_mapping/relu/relu.mlir	Updates mapping pass flag name in RUN line.
test/multi-cgra/kernel_mapping/loop-in-kernel/loop-in-kernel.mlir	Updates mapping pass flag name in RUN line.
test/multi-cgra/kernel_mapping/fir/fir.mlir	Updates mapping pass flag name in RUN line.
test/mapping_quality/tiny_loop.mlir	Updates mapping pass flag name in RUN lines.
test/mapping_quality/branch_for.mlir	Updates mapping pass flag name in RUN lines.
test/honor_arch/fir_removed_tiles_test.mlir	Updates mapping pass flag name in RUN line.
test/e2e/relu/relu_kernel.mlir	Updates mapping pass flag name in RUN line.
test/e2e/histogram/histogram_kernel.mlir	Updates mapping pass flag name in RUN line.
test/e2e/gemv/gemv_kernel.mlir	Updates mapping pass flag name and refreshes extensive mapping/YAML/ASM FileCheck expectations.
test/e2e/fir/fir_kernel_vec.mlir	Updates mapping pass flag name in RUN line.
test/e2e/fir/fir_kernel.mlir	Updates mapping pass flag name in RUN line.
test/e2e/axpy/axpy_kernel.mlir	Updates mapping pass flag name in RUN line.
test/controflow_fuse/simple_loop_reduction/simple_loop_reduction.mlir	Updates mapping pass flag name in RUN line.
test/controflow_fuse/simple_loop/simple_loop.mlir	Updates mapping pass flag name in RUN line.
test/controflow_fuse/perfect_nested/perfect_nested.mlir	Updates mapping pass flag name in RUN line.
test/code_gen/test_code_generate.mlir	Updates mapping pass flag name in RUN line.
test/c2llvm2mlir/simple_loop/test.mlir	Updates mapping pass flag name in RUN line.
test/c2llvm2mlir/nested_loop/test.mlir	Updates mapping pass flag name in RUN line.
test/arch_spec/README.md	Updates documentation to use `--map-operation-on-tile`.
lib/TaskflowDialect/Transforms/Optimizations/ResourceAwareTaskOptimizationPass.cpp	Renames mapper references and invokes AllocateCgraToTask-based placement at convergence.
lib/TaskflowDialect/Transforms/CMakeLists.txt	Swaps in `AllocateCgraToTaskPass.cpp` and adds Optimizations subdirectory.
lib/TaskflowDialect/Transforms/AllocateCgraToTaskPass.cpp	Implements new task+SRAM placement pass with multi-CGRA placement search.
lib/TaskflowDialect/CMakeLists.txt	Adjusts subdirectory structure so Optimizations is included via Transforms.
lib/NeuraDialect/Transforms/MapOperationOnTilePass.cpp	Renames mapper pass (class/argument/logs) and updates factory function.
lib/NeuraDialect/Transforms/CanonicalizeLiveInPass.cpp	Skips direct-dominating-live-in optimization when the using block is a loop header.
lib/NeuraDialect/Transforms/CMakeLists.txt	Replaces `MapToAcceleratorPass.cpp` with `MapOperationOnTilePass.cpp`.
lib/NeuraDialect/NeuraPasses.cpp	Updates conversion pipeline to run `createMapOperationOnTilePass()`.
include/TaskflowDialect/TaskflowPasses.td	Renames pass definition to `allocate-cgra-to-task`.
include/TaskflowDialect/TaskflowPasses.h	Updates pass factory name and adds `runAllocateCgraToTask()` helper declaration.
include/NeuraDialect/NeuraPasses.td	Renames pass definition to `map-operation-on-tile` and updates constructor name.
include/NeuraDialect/NeuraPasses.h	Renames pass factory API to `createMapOperationOnTilePass`.
include/NeuraDialect/Architecture/Architecture.h	Updates reference in comment to new mapper pass name.

Comments suppressed due to low confidence (4)

lib/TaskflowDialect/Transforms/AllocateCgraToTaskPass.cpp:285

placement.primary() returns {-1,-1} when placement.cgra_positions is empty, but the code still pushes it into task_node->placement. That can result in invalid task_mapping_info (row/col = -1) and can skew the SRAM centroid computation in later iterations. Handle the no-placement case explicitly (e.g., signal pass failure / emit an error, or keep the task unannotated and skip SRAM assignment) before committing any placement.
lib/TaskflowDialect/Transforms/AllocateCgraToTaskPass.cpp:281
When a placement can’t be found for cgra_count, the pass falls back to cgra_count-1 but does not update the task’s cgra_count (or tile_shape) attribute accordingly. That can leave IR in an inconsistent state where cgra_count disagrees with the number of entries in task_mapping_info.cgra_positions, which downstream passes may rely on. Consider either failing the pass, or updating the attributes to match the chosen fallback placement (and possibly iterating fallback down to 1).
lib/TaskflowDialect/Transforms/AllocateCgraToTaskPass.cpp:432
parseTileShapeOffsets() is introduced but never used, and the placement search currently ignores each task’s tile_shape attribute (even though other code paths/comments imply placement should respect it). If tile_shape is intended to constrain the physical shape, wire this helper into findBestPlacement() (read the task’s tile_shape StringAttr, validate it matches cgra_count, then call tryPlaceShape() with those offsets). Otherwise, remove the dead helper / update comments to reflect that the allocator chooses shapes autonomously.
lib/TaskflowDialect/Transforms/AllocateCgraToTaskPass.cpp:553
The non-rectangular shape search uses a 64-bit bitmask (1ULL << (nr * grid_cols_ + nc)). If grid_rows * grid_cols >= 64, this shift becomes undefined and the mask cannot represent all cells. Add a guard/assert limiting the supported grid size (e.g., 4x4) or switch to a wider/dynamic bitset representation for visited states.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

include/TaskflowDialect/TaskflowPasses.td

lib/NeuraDialect/Transforms/CanonicalizeLiveInPass.cpp

lib/TaskflowDialect/Transforms/Optimizations/ResourceAwareTaskOptimizationPass.cpp

include/NeuraDialect/NeuraPasses.td

test/e2e/gemv/gemv_kernel.mlir

Jackcuii

work perfectly on simulator, good job bro!

Values crossing from outer blocks to inner loop headers were marked as direct-dominating live-ins, preventing block-arg promotion. This caused missing inner-rate PHI_STARTs in the dataflow IR, starving inner-loop operations of valid data each cycle. Added back-edge check: if the using block has a predecessor it dominates (i.e. it is a loop header), the value is promoted to a block argument.

tancheng · 2026-03-09T18:18:34Z

Summary

This pr addresses issue #270.

Problem

In nested loop kernels, values defined in outer loop blocks and used in inner loop bodies were incorrectly classified as "direct dominating live-ins" by CanonicalizeLiveInPass. This prevented them from being promoted to block arguments, which in turn caused TransformCtrlToDataFlowPass to not create inner-rate PHI_START operations for them.

Fix

Before applying the direct-dominating optimization, check whether the using block is a loop header. If so, skip the optimization and let the value be promoted to a block argument.

Result

After the fix, the GEP pointer is correctly promoted to a block argument in ^bb4:
^bb1:
  %6 = neura.gep(%arg0, shl(i, 4))
  neura.br ..., %6 : !llvm.ptr, ... to ^bb4

^bb4(%15: i64, %16: i32, %17: !llvm.ptr, ...):  // %17 is now a block arg
  gep(%17, %15)
And TransformCtrlToDataFlowPass creates an inner-rate PHI_START for it:
%9  = neura.gep(%arg0, shl(i, 4))          // outer-rate GEP
%17 = neura.phi_start %9, %16              // inner-rate PHI_START for the pointer
%22 = neura.gep(%17, %21)                  // inner-loop body uses inner-rate pointer
TODO

Have not updated all failed checks yet, will do so once the changes are reviewed and approved. gemv has been updated.

Can we also provide the IRs about what it would look like w/o this fix (in the Problem section of the PR description)?

guosran · 2026-03-09T18:40:50Z

Can we also provide the IRs about what it would look like w/o this fix (in the Problem section of the PR description)?

Updated.

lib/NeuraDialect/Transforms/CanonicalizeLiveInPass.cpp

Copilot AI review requested due to automatic review settings March 8, 2026 05:01

Copilot started reviewing on behalf of guosran March 8, 2026 05:02 View session

Copilot AI reviewed Mar 8, 2026

View reviewed changes

tancheng requested review from Jackcuii, ShangkunLi and n0thingNoob March 8, 2026 05:24

guosran force-pushed the fix/nested-loop-live-in-bug branch from 1f60123 to fa5fb36 Compare March 8, 2026 05:31

Jackcuii approved these changes Mar 9, 2026

View reviewed changes

guosran force-pushed the fix/nested-loop-live-in-bug branch from fa5fb36 to 19f5088 Compare March 9, 2026 18:09

tancheng approved these changes Mar 9, 2026

View reviewed changes

lib/NeuraDialect/Transforms/CanonicalizeLiveInPass.cpp Outdated Show resolved Hide resolved

guosran added 4 commits March 10, 2026 05:35

Updated tests

444f39b

update files

5451143

update tests

0ff7a92

Update comment

89e2142

guosran merged commit dc75783 into main Mar 10, 2026
1 check passed

guosran deleted the fix/nested-loop-live-in-bug branch March 10, 2026 01:38

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Skip direct-dominating live-in optimization for loop headers#281

Skip direct-dominating live-in optimization for loop headers#281
guosran merged 5 commits intomainfrom
fix/nested-loop-live-in-bug

guosran commented Mar 8, 2026 •

edited

Loading

Uh oh!

guosran commented Mar 8, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Jackcuii left a comment

Uh oh!

tancheng commented Mar 9, 2026

Summary

Problem

Fix

Result

TODO

Uh oh!

guosran commented Mar 9, 2026 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

guosran commented Mar 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Problem

Fix

Result

Uh oh!

guosran commented Mar 8, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Jackcuii left a comment

Choose a reason for hiding this comment

Uh oh!

tancheng commented Mar 9, 2026

Summary

Problem

Fix

Result

TODO

Uh oh!

guosran commented Mar 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

guosran commented Mar 8, 2026 •

edited

Loading

guosran commented Mar 9, 2026 •

edited

Loading