Conversation
|
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
There was a problem hiding this comment.
Pull request overview
This PR revives/implements an alternative inlining pipeline in the CoreCLR JIT based on splitting trees/blocks around inline candidates (instead of using GT_RET_EXPR placeholders), with follow-on updates across importer, inliner, and various optimizations that previously depended on GT_RET_EXPR.
Changes:
- Remove
GT_RET_EXPRfrom the IR and update related utilities (node lists, cloning, side-effect checks, class-handle queries, etc.). - Rework inlining to inline directly from
GT_CALLsites using statement/tree splitting plus new setup/teardown statement list plumbing. - Update guarded devirtualization/fat calli transformations, struct return/retbuf handling, and instrumentation to work without
GT_RET_EXPR.
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| src/coreclr/jit/simd.cpp | Stop treating GT_RET_EXPR as a call-like SIMD stack value; normalize only GT_CALL. |
| src/coreclr/jit/lclvars.cpp | Add a null-layout guard to struct promotion eligibility checks. |
| src/coreclr/jit/inline.h | Adjust inline policy/debug hooks, InlineResult ctor signature, introduce InlineIRResult, add StatementListBuilder, track retbuf arg info. |
| src/coreclr/jit/inline.cpp | Wire InlineResult construction to accept an InlineContext* and update policy notes accordingly. |
| src/coreclr/jit/indirectcalltransformer.cpp | Refactor call discovery and operand spilling; handle STORE_LCL_VAR(call) statement shapes. |
| src/coreclr/jit/importervectorization.cpp | Remove GT_RET_EXPR-based span/call handling paths. |
| src/coreclr/jit/importercalls.cpp | Rework importer handling for inline/GDV candidates without GT_RET_EXPR; adjust some intrinsic patterns and candidate bookkeeping. |
| src/coreclr/jit/importer.cpp | Remove various GT_RET_EXPR-specific normalization/spill/return handling paths; adapt retbuf and inline return plumbing. |
| src/coreclr/jit/gtstructs.h | Remove RetExpr node struct mapping. |
| src/coreclr/jit/gtlist.h | Remove GTNODE(RET_EXPR, ...) from the node list. |
| src/coreclr/jit/gentree.h | Remove GenTreeRetExpr definition. |
| src/coreclr/jit/gentree.cpp | Remove GT_RET_EXPR handling across layout queries, cloning constraints, leaf display, side-effect logic; extend gtSplitTree signature/behavior. |
| src/coreclr/jit/fgstmt.cpp | Add helpers to splice whole statement lists into blocks (before/at end). |
| src/coreclr/jit/fgprofile.cpp | Instrumentation temporarily links inlinee “return” IR using InlineIRResult instead of GT_RET_EXPR. |
| src/coreclr/jit/fgopt.cpp | Simplify async-call scanning by removing GT_RET_EXPR recursion. |
| src/coreclr/jit/fginline.cpp | Major rewrite: new walker that inlines from GT_CALL via splitting + statement/block insertion; adds setup/teardown list building. |
| src/coreclr/jit/fgbasic.cpp | Add fgSplitBlockBeforeStatement helper. |
| src/coreclr/jit/debuginfo.cpp | Disable (comment out) debug-info validation checks. |
| src/coreclr/jit/compiler.hpp | Remove GT_RET_EXPR from operand visitation cases. |
| src/coreclr/jit/compiler.h | Update signatures (gtSplitTree, inline helpers), add friendship and statement-list splicing APIs. |
Comments suppressed due to low confidence (2)
src/coreclr/jit/inline.cpp:636
- The comment for InlineResult::InlineResult still describes a "stmt" parameter, but the constructor now takes an InlineContext* instead. Update the parameter documentation to reflect the new signature to avoid confusion for future call sites.
// Arguments:
// compiler - the compiler instance examining a call for inlining
// call - the call in question
// stmt - statement containing the call (if known)
// description - string describing the context of the decision
src/coreclr/jit/importer.cpp:405
- In impAppendStmt, the inner
if (call->ShouldHaveRetBufArg())is redundant because it's nested underif (call->TypeIs(TYP_VOID) && call->ShouldHaveRetBufArg()). As written, theelsebranch is unreachable; consider simplifying to a single path (or, if the two branches were meant to distinguish different shapes, adjust the condition so both are reachable as intended).
if (call->TypeIs(TYP_VOID) && call->ShouldHaveRetBufArg())
{
GenTree* retBuf;
if (call->ShouldHaveRetBufArg())
{
assert(call->gtArgs.HasRetBuffer());
retBuf = call->gtArgs.GetRetBufferArg()->GetNode();
}
else
{
assert(!call->gtArgs.HasThisPointer());
retBuf = call->gtArgs.GetArgByIndex(0)->GetNode();
}
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 20 out of 20 changed files in this pull request and generated 5 comments.
Comments suppressed due to low confidence (3)
src/coreclr/jit/inline.cpp:1314
- Line 1314 contains a commented-out DebugInfo validation call that appears to be intentionally disabled. This should either be removed if no longer needed, or uncommented if the validation is important. The comment doesn't explain why it's disabled, which makes it unclear whether this is temporary debugging code or intentionally disabled for a reason.
// DebugInfo(parentContext, context->m_Location).Validate();
src/coreclr/jit/indirectcalltransformer.cpp:997
- On line 996, lvSingleDef is unconditionally set to false when doesReturnValue is true, but this is done after potentially demoting the call to non-inline in the if block above (lines 968-978). If the call was demoted and the statement was already added, we're still modifying lvSingleDef. While this may be correct (since multiple defs will exist after GDV transformation), the logic would be clearer if this was moved inside the else block starting at line 980 where we know the call remains an inline candidate.
if (doesReturnValue)
{
compiler->lvaGetDesc(stmt->GetRootNode()->AsLclVarCommon())->lvSingleDef = false;
}
src/coreclr/jit/importer.cpp:789
- On line 785, the condition checks
!srcCall->IsInlineCandidate()before marking the local as DNER. However, this means that if a call IS an inline candidate, the local won't be marked DNER even though it's being used as a hidden buffer struct arg. If the inline later fails, the local may still be in an enregisterable state which could cause issues. The old code didn't have this condition. Consider whether this check is correct or if it should be removed.
if (destAddr->OperIs(GT_LCL_ADDR) && !srcCall->IsInlineCandidate())
{
lvaSetVarDoNotEnregister(destAddr->AsLclVarCommon()->GetLclNum()
DEBUGARG(DoNotEnregisterReason::HiddenBufferStructArg));
}
| // TODO: move to InlineInfo | ||
| struct InlineIRResult | ||
| { | ||
| GenTree* substExpr; | ||
| BasicBlock* substBB; | ||
| }; |
There was a problem hiding this comment.
The TODO comment "TODO: move to InlineInfo" on line 594 suggests this is incomplete work. The InlineIRResult struct should likely be moved into InlineInfo rather than being a separate struct that's embedded in InlineCandidateInfo. This would make the structure more logical since the result is specific to each inline attempt.
| { | ||
| uint64_t cns1 = static_cast<uint64_t>(op1->AsIntConCommon()->LngValue()); | ||
| result = gtNewLconNode(BitOperations::RotateLeft(cns1, cns2)); | ||
| result = gtNewLconNode((int64_t)BitOperations::RotateLeft(cns1, cns2)); |
There was a problem hiding this comment.
The cast to (int64_t) on line 6221 changes the semantics from the original code which returned uint64_t directly. BitOperations::RotateLeft returns uint64_t but it's being explicitly cast to int64_t before being passed to gtNewLconNode. While this may work due to bit patterns, it's semantically incorrect and could cause issues with sign extension or interpretation. The original code relied on implicit conversion.
| result = gtNewLconNode((int64_t)BitOperations::RotateLeft(cns1, cns2)); | |
| result = gtNewLconNode(BitOperations::RotateLeft(cns1, cns2)); |
| { | ||
| uint64_t cns1 = static_cast<uint64_t>(op1->AsIntConCommon()->LngValue()); | ||
| result = gtNewLconNode(BitOperations::RotateRight(cns1, cns2)); | ||
| result = gtNewLconNode((int32_t)BitOperations::RotateRight(cns1, cns2)); |
There was a problem hiding this comment.
The cast to (int32_t) on line 6270 is incorrect. BitOperations::RotateRight returns uint64_t for the long case, but it's being cast to int32_t (truncating the upper 32 bits) before being passed to gtNewLconNode which expects an int64_t. This will silently lose data. It should be (int64_t) to match the signature and preserve all bits.
| result = gtNewLconNode((int32_t)BitOperations::RotateRight(cns1, cns2)); | |
| result = gtNewLconNode(static_cast<int64_t>(BitOperations::RotateRight(cns1, cns2))); |
| GenTree* substExpr = call->gtInlineCandidateInfo->result.substExpr; | ||
| auto getComplexity = [](GenTree* tree) { | ||
| return 1; | ||
| }; | ||
| if ((substExpr != nullptr) && substExpr->IsValue() && | ||
| m_compiler->gtComplexityExceeds(substExpr, 16, getComplexity)) |
There was a problem hiding this comment.
The lambda getComplexity on line 351-353 always returns 1 regardless of the tree parameter. This makes the complexity check on line 355 effectively just count nodes up to 16. This seems like incomplete or placeholder code that should either be removed (to use the default complexity metric) or properly implemented to assess actual tree complexity.
| GenTree* substExpr = call->gtInlineCandidateInfo->result.substExpr; | |
| auto getComplexity = [](GenTree* tree) { | |
| return 1; | |
| }; | |
| if ((substExpr != nullptr) && substExpr->IsValue() && | |
| m_compiler->gtComplexityExceeds(substExpr, 16, getComplexity)) | |
| GenTree* substExpr = call->gtInlineCandidateInfo->result.substExpr; | |
| if ((substExpr != nullptr) && substExpr->IsValue() && | |
| m_compiler->gtComplexityExceeds(substExpr, 16)) |
| auto getComplexity = [](GenTree* tree) { | ||
| return 1; | ||
| }; | ||
| if (m_compiler->gtComplexityExceeds(m_statement->GetRootNode(), 16, getComplexity)) |
There was a problem hiding this comment.
The lambda getComplexity on lines 667-669 (same as lines 351-353) always returns 1 regardless of the tree parameter. This makes the complexity check effectively just count nodes up to 16. This appears to be placeholder code that should either be removed to use default complexity metrics or properly implemented.
| auto getComplexity = [](GenTree* tree) { | |
| return 1; | |
| }; | |
| if (m_compiler->gtComplexityExceeds(m_statement->GetRootNode(), 16, getComplexity)) | |
| if (m_compiler->gtComplexityExceeds(m_statement->GetRootNode(), 16)) |
|
One source of regressions is that for [000011] nACXG---R-- ▌ STORE_BLK struct<System.TimeSpan, 8> (copy)
[000010] ---X------- ├──▌ FIELD_ADDR byref <unknown class>:<unknown field>
[000006] ----------- │ └──▌ LCL_VAR ref V00 this
[000009] I-C-G------ └──▌ CALL struct System.TimeSpan:FromSeconds(long):System.TimeSpan (exactContextHandle=0x00007FF94C1F7CE1)
[000008] ----------- arg0 └──▌ CNS_INT long 10Note We make up for it in morph since we usually fold the NRE side effect of the address computation into the store itself, and when we do that we end up evaluating the call first. But when the call is an inline candidate we now end up spilling Some kind of fix is needed here (as a separate PR). We could set (this issue is similar to #77650) |
Reviving my prototype as I have some runtime async work that this would simplify...