103 changes: 101 additions & 2 deletions src/coreclr/jit/codegenwasm.cpp
@@ -13,6 +13,8 @@
static const int LINEAR_MEMORY_INDEX = 0;

#ifdef TARGET_64BIT
static const instruction INS_I_load = INS_i64_load;
static const instruction INS_I_store = INS_i64_store;
static const instruction INS_I_const = INS_i64_const;
static const instruction INS_I_add = INS_i64_add;
static const instruction INS_I_mul = INS_i64_mul;
@@ -21,6 +23,8 @@ static const instruction INS_I_le_u = INS_i64_le_u;
static const instruction INS_I_ge_u = INS_i64_ge_u;
static const instruction INS_I_gt_u = INS_i64_gt_u;
#else // !TARGET_64BIT
static const instruction INS_I_load = INS_i32_load;
static const instruction INS_I_store = INS_i32_store;
static const instruction INS_I_const = INS_i32_const;
static const instruction INS_I_add = INS_i32_add;
static const instruction INS_I_mul = INS_i32_mul;
@@ -427,7 +431,9 @@ void CodeGen::WasmProduceReg(GenTree* node)
//
// If the operand is a candidate, we use that candidate's current register.
// Otherwise it must have been allocated into a temporary register initialized
// in 'WasmProduceReg'.
// in 'WasmProduceReg'. To do this, set the LIR::Flags::MultiplyUsed flag during
// lowering or other pre-regalloc phases, and ensure that regalloc is updated to
// call CollectReferences on the node(s) that need to be used multiple times.
//
// Arguments:
// operand - The operand node
@@ -2420,9 +2426,102 @@ void CodeGen::genCodeForStoreBlk(GenTreeBlk* blkOp)
}
}

//------------------------------------------------------------------------
// genCodeForCpObj: Produce code for a GT_STORE_BLK node that represents a cpobj operation.
//
// Arguments:
// cpObjNode - the node
//
void CodeGen::genCodeForCpObj(GenTreeBlk* cpObjNode)
Member: We probably can never get here if we extend TryLowerBlockStoreAsGcBulkCopyCall to never bail. So all cpobj will go through the batch helper.

Member (author): Is that bad?

Member: > Is that bad?

I mean for simplicity. The bulk helper is likely a bit less efficient for small object copies with just one gc ref, where we can emit just one write barrier; otherwise it's better. It is also simdified.

Member (author): Oh, I misunderstood. So for a struct that has, e.g., 4 gc ptrs at the head and then some random ints and floats at the tail, TryLowerBlockStoreAsGcBulkCopyCall will in theory just lower this away entirely. I've been testing with interleaved ptrs where a bulk copy isn't possible. I was definitely thinking we'd want to use bulk copies if possible.

Member: TryLowerBlockStoreAsGcBulkCopyCall can handle any layout, even if there are no gc refs at all (although in that case we should never be in cpobj, because we rely on the fact that the size is divisible by pointer size).

{
NYI_WASM("genCodeForCpObj");
GenTree* dstAddr = cpObjNode->Addr();
GenTree* source = cpObjNode->Data();
var_types srcAddrType = TYP_BYREF;

assert(source->isContained());
if (source->OperIs(GT_IND))
{
Copilot AI (Feb 25, 2026, on lines +2429 to +2443): This PR introduces a new WASM codegen path for GT_STORE_BLK/cpobj with GC references (including per-slot write barriers). There doesn't appear to be a targeted test exercising the new WASM-only BlkOpKindCpObjUnroll path (especially the interleaved-GC-ptr case from the PR description), which makes regressions likely. Please add a focused JIT test that runs under WASM and covers struct copies with interleaved refs (and the stack-only copy case).
source = source->gtGetOp1();
assert(!source->isContained());
srcAddrType = source->TypeGet();
}

noway_assert(source->IsLocal());

// If the destination is on the stack we don't need the write barrier.
bool dstOnStack = cpObjNode->IsAddressNotOnHeap(m_compiler);
// We should have generated a memory.copy for this scenario in lowering.
assert(!dstOnStack);

#ifdef DEBUG
assert(!dstAddr->isContained());

// This GenTree node has data about GC pointers, this means we're dealing
// with CpObj.
assert(cpObjNode->GetLayout()->HasGCPtr());
#endif // DEBUG

genConsumeOperands(cpObjNode);

regNumber srcReg = GetMultiUseOperandReg(source);
regNumber dstReg = GetMultiUseOperandReg(dstAddr);

if (cpObjNode->IsVolatile())
{
// TODO-WASM: Memory barrier
}

// TODO: Do we need an implicit null check here?
Member: Yes


emitter* emit = GetEmitter();

ClassLayout* layout = cpObjNode->GetLayout();
unsigned slots = layout->GetSlotCount();

emitAttr attrSrcAddr = emitActualTypeSize(srcAddrType);
emitAttr attrDstAddr = emitActualTypeSize(dstAddr->TypeGet());

unsigned gcPtrCount = cpObjNode->GetLayout()->GetGCPtrCount();

unsigned i = 0, offset = 0;
while (i < slots)
{
// Copy the pointer-sized non-gc-pointer slots one at a time using regular I-sized load/store pairs,
// and gc-pointer slots using a write barrier.
if (!layout->IsGCPtr(i))
{
// Do a pointer-sized load+store pair at the appropriate offset relative to dest and source
emit->emitIns_I(INS_local_get, attrDstAddr, WasmRegToIndex(dstReg));
emit->emitIns_I(INS_local_get, attrSrcAddr, WasmRegToIndex(srcReg));
emit->emitIns_I(INS_I_load, EA_PTRSIZE, offset);
emit->emitIns_I(INS_I_store, EA_PTRSIZE, offset);
}
else
{
// Compute the actual dest/src of the slot being copied to pass to the helper.
emit->emitIns_I(INS_local_get, attrDstAddr, WasmRegToIndex(dstReg));
emit->emitIns_I(INS_I_const, attrDstAddr, offset);
emit->emitIns(INS_I_add);
emit->emitIns_I(INS_local_get, attrSrcAddr, WasmRegToIndex(srcReg));
emit->emitIns_I(INS_I_const, attrSrcAddr, offset);
emit->emitIns(INS_I_add);
// Call the byref assign helper. On other targets this updates the dst/src regs but here it won't,
// so we have to do the local.get+i32.const+i32.add dance every time.
genEmitHelperCall(CORINFO_HELP_ASSIGN_BYREF, 0, EA_PTRSIZE);
gcPtrCount--;
Copilot AI (Feb 25, 2026, on lines +2502 to +2511): In genCodeForCpObj, the CORINFO_HELP_ASSIGN_BYREF helper call is emitted without pushing the implicit SP argument required by the WASM managed calling convention (see the contract documented in genEmitHelperCall). This will shift the helper arguments and can easily explain the current runtime crash. Push the stack pointer local first (below dst/src) before calling genEmitHelperCall so the helper receives (sp, dst, src) in the expected order.

Member (author): I think we are going to do something so that HELP_ASSIGN_BYREF specifically only takes two args (dest and src), to reduce code size and improve performance for write barriers. That's not implemented yet, though.

Copilot AI (Feb 25, 2026, on lines +2508 to +2511): genEmitHelperCall for WASM explicitly requires the shadow-stack SP argument to be pushed first on the stack (see the comment in CodeGen::genEmitHelperCall). The CORINFO_HELP_ASSIGN_BYREF call here is emitted after pushing only dst/src, so the helper will receive the wrong arguments; this matches the crash described in the PR. Push the SP local (WasmShadowStackPointer) below dst/src before invoking genEmitHelperCall, or otherwise adjust the helper ABI in tandem so the call signature matches what the runtime expects.
}
++i;
offset += TARGET_POINTER_SIZE;
}

assert(gcPtrCount == 0);

if (cpObjNode->IsVolatile())
{
// TODO-WASM: Memory barrier
}

WasmProduceReg(cpObjNode);
}
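For readers following along, the per-slot loop above can be sketched outside the JIT as plain C++. `Layout`, `AssignByrefHelper`, and `CpObj` below are hypothetical stand-ins for `ClassLayout`, `CORINFO_HELP_ASSIGN_BYREF`, and the emitted wasm sequence; this is a behavioral sketch, not the actual instruction stream:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for ClassLayout: records which pointer-sized slots hold GC refs.
struct Layout
{
    std::vector<bool> isGcPtr;
    size_t SlotCount() const { return isGcPtr.size(); }
};

static int g_barrierCalls = 0;

// Stand-in for CORINFO_HELP_ASSIGN_BYREF: copies one GC slot through a "write barrier".
static void AssignByrefHelper(uintptr_t* dst, const uintptr_t* src)
{
    g_barrierCalls++;
    *dst = *src;
}

// Behavioral mirror of the codegen loop: plain load/store pairs for non-GC
// slots, one helper call per GC slot.
static void CpObj(uintptr_t* dst, const uintptr_t* src, const Layout& layout)
{
    for (size_t i = 0; i < layout.SlotCount(); i++)
    {
        if (!layout.isGcPtr[i])
        {
            dst[i] = src[i]; // I-sized load + store at the same offset
        }
        else
        {
            AssignByrefHelper(&dst[i], &src[i]); // write barrier for GC slots
        }
    }
}
```

For a struct with interleaved refs, e.g. slots {non-GC, ref, non-GC, ref}, this takes two helper calls and two raw copies, matching the interleaved-ptr case discussed in the review thread.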

//------------------------------------------------------------------------
4 changes: 4 additions & 0 deletions src/coreclr/jit/compiler.h
@@ -2566,6 +2566,10 @@ class Compiler
friend class ReplaceVisitor;
friend class FlowGraphNaturalLoop;

#ifdef TARGET_WASM
friend class WasmRegAlloc; // For m_pLowering
#endif

#ifdef FEATURE_HW_INTRINSICS
friend struct GenTreeHWIntrinsic;
friend struct HWIntrinsicInfo;
2 changes: 1 addition & 1 deletion src/coreclr/jit/gentree.cpp
@@ -12844,7 +12844,7 @@ void Compiler::gtDispTree(GenTree* tree,

#ifdef TARGET_WASM
case GenTreeBlk::BlkOpKindNativeOpcode:
printf(" (memory.copy|fill)");
printf(" (memory.%s)", tree->OperIsCopyBlkOp() ? "copy" : "fill");
break;
#endif

64 changes: 64 additions & 0 deletions src/coreclr/jit/lowerwasm.cpp
@@ -244,6 +244,11 @@ void Lowering::LowerBlockStore(GenTreeBlk* blkNode)
ClassLayout* layout = blkNode->GetLayout();
bool doCpObj = layout->HasGCPtr();

// If copying to the stack instead of the heap, we should treat it as a raw memcpy for
// smaller generated code and potentially better performance.
if (blkNode->IsAddressNotOnHeap(m_compiler))
doCpObj = false;
Member: Other targets do this only under a size check, and they set blkNode->gtBlkOpGcUnsafe = true;

Member (author): We're trying to always use the native wasm memcpy/memset opcodes because the VMs unroll them for us, which should provide both optimal code size and perf. So I think omitting the size check is okay.

Great point on gc unsafe.

Member (@EgorBo, Feb 25, 2026): Yeah, it's fine. gtBlkOpGcUnsafe is needed for correctness; I'm not sure wasm can run into this, but for other targets we put the copy into a nogc region in this case so we can ignore potential atomicity issues.

Contributor: gtBlkOpGcUnsafe is not needed on WASM. There is no emitter-level tracking of no-gc regions on WASM.

Member: Well, we can do what other targets do and then just ignore it in the emitter till we implement threads, interruptible gc or whatever?

Member: It feels like LowerBlockStore should just be shared between all impls; currently it's copy-paste for all 6 or so targets (not for this PR).

Contributor (@SingleAccretion, Feb 25, 2026): > well, we can do what other targets do and then just ignore it in the emitter till we implement threads, interruptible gc or whatever?
We won't need it for those things either. We will only need it if a WASM proposal comes around which makes stack walking in the traditional sense possible, which is exceedingly unlikely.
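The heap-vs-stack decision at the center of this thread can be sketched as a predicate (the function below is a hypothetical mirror, not the actual Lowering API):

```cpp
#include <cassert>

// Hypothetical mirror of the check in LowerBlockStore: only a block copy whose
// layout contains GC pointers *and* whose destination may be on the heap needs
// the per-slot cpobj expansion; a destination known to be off-heap (e.g. the
// shadow stack) needs no write barriers, so a raw wasm memory.copy is used
// instead for smaller code and likely better performance.
static bool UseCpObjExpansion(bool layoutHasGcPtr, bool dstNotOnHeap)
{
    bool doCpObj = layoutHasGcPtr;
    if (dstNotOnHeap)
    {
        doCpObj = false; // treat as a raw memcpy
    }
    return doCpObj;
}
```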


// CopyObj or CopyBlk
if (doCpObj)
{
Expand All @@ -254,6 +259,11 @@ void Lowering::LowerBlockStore(GenTreeBlk* blkNode)
}

blkNode->gtBlkOpKind = GenTreeBlk::BlkOpKindCpObjUnroll;
dstAddr->gtLIRFlags |= LIR::Flags::MultiplyUsed;
if (src->OperIs(GT_IND))
src->gtGetOp1()->gtLIRFlags |= LIR::Flags::MultiplyUsed;
else
src->gtLIRFlags |= LIR::Flags::MultiplyUsed;
}
else
{
@@ -512,6 +522,60 @@ void Lowering::AfterLowerBlock()
// instead be ifdef-ed out for WASM.
m_anyChanges = true;

// Invariant nodes can be safely moved by the stackifier with no side effects.
// For other nodes, the side effects would require us to turn them into a temporary local, but this
// is not possible for contained nodes like an IND inside a STORE_BLK. However, the few types of
// contained nodes we have in Wasm should be safe to move freely since the lack of 'dup' or
// persistent registers in Wasm means that the actual codegen will trigger the side effect(s) and
// store the result into a Wasm local for any later uses during the containing node's execution,
// i.e. cpobj where the src and dest get stashed at the start and then used as add operands
// repeatedly.
// Locals can also be safely moved as long as they aren't address-exposed, due to local var nodes
// being implicitly pseudo-contained.
// TODO-WASM: Verify that it is actually safe to do this for all contained nodes.
if (node->IsInvariant() || node->isContained() ||
(node->OperIs(GT_LCL_VAR) &&
!m_lower->m_compiler->lvaGetDesc(node->AsLclVarCommon())->IsAddressExposed()))
{
JITDUMP("Stackifier moving node [%06u] after [%06u]\n", Compiler::dspTreeID(node),
Compiler::dspTreeID(prev));
m_lower->BlockRange().Remove(node);
m_lower->BlockRange().InsertAfter(prev, node);
break;
}

/*
else
{
// To resolve this scenario we have two options:
// 1. We try moving the whole tree rooted at `node`.
// To avoid quadratic behavior, we first stackify it and collect all the side effects
// from it. Then we check for interference of those side effects with nodes between
// 'node' and 'prev'.
// 2. Failing that, we insert a temporary ('ReplaceWithLclVar') for 'node'.
// To avoid explosion of temporaries, we maintain a busy/free set of them.
// For now, for simplicity we are implementing #2 only.

LIR::Use nodeUse;
// FIXME-WASM: TryGetUse is inefficient here, replace it with something more optimal
if (!m_lower->BlockRange().TryGetUse(node, &nodeUse))
{
JITDUMP("node==[%06u] prev==[%06u]\n", Compiler::dspTreeID(node),
Compiler::dspTreeID(prev)); NYI_WASM("Could not get a LIR::Use for the node to be moved by the
stackifier");
}

unsigned lclNum = nodeUse.ReplaceWithLclVar(m_lower->m_compiler);
GenTree* newNode = nodeUse.Def();
JITDUMP("Stackifier replaced node [%06u] with lcl var %u\n", Compiler::dspTreeID(node), lclNum);
m_lower->BlockRange().Remove(newNode);
m_lower->BlockRange().InsertAfter(prev, newNode);
JITDUMP("Stackifier moved new node [%06u] after [%06u]\n", Compiler::dspTreeID(newNode),
Compiler::dspTreeID(prev)); break;
}
*/
Copilot AI (Feb 25, 2026, on lines +548 to +577): This large block of commented-out code should either be removed or converted to a proper TODO comment with a tracking issue. Leaving this much implementation detail in comments makes the code harder to maintain and review. If this is experimental code for future work, consider moving it to a design document or tracking issue instead.

Member (author): This is here because I'm not sure whether we should add it or not. I tested it and it does work for some cases, but it's no longer exercised by any of my test cases, so I can't say for sure whether it's actually mergeable.


JITDUMP("node==[%06u] prev==[%06u]\n", Compiler::dspTreeID(node), Compiler::dspTreeID(prev));
NYI_WASM("IR not in a stackified form");
}
29 changes: 29 additions & 0 deletions src/coreclr/jit/regallocwasm.cpp
@@ -8,6 +8,8 @@

#include "regallocwasm.h"

#include "lower.h" // for LowerRange()

RegAllocInterface* GetRegisterAllocator(Compiler* compiler)
{
return new (compiler->getAllocator(CMK_LSRA)) WasmRegAlloc(compiler);
@@ -330,6 +332,10 @@ void WasmRegAlloc::CollectReferencesForNode(GenTree* node)
CollectReferencesForBinop(node->AsOp());
break;

case GT_STORE_BLK:
CollectReferencesForBlockStore(node->AsBlk());
break;

default:
assert(!node->OperIsLocalStore());
break;
@@ -417,6 +423,21 @@ void WasmRegAlloc::CollectReferencesForBinop(GenTreeOp* binopNode)
ConsumeTemporaryRegForOperand(binopNode->gtGetOp1() DEBUGARG("binop overflow check"));
}

//------------------------------------------------------------------------
// CollectReferencesForBlockStore: Collect virtual register references for a block store.
//
// Arguments:
// node - The GT_STORE_BLK node
//
void WasmRegAlloc::CollectReferencesForBlockStore(GenTreeBlk* node)
{
GenTree* src = node->Data();
if (src->OperIs(GT_IND))
src = src->gtGetOp1();

ConsumeTemporaryRegForOperand(src DEBUGARG("block store source"));
ConsumeTemporaryRegForOperand(node->Addr() DEBUGARG("block store destination"));
}

//------------------------------------------------------------------------
// CollectReferencesForLclVar: Collect virtual register references for a LCL_VAR.
//
@@ -476,6 +497,9 @@ void WasmRegAlloc::RewriteLocalStackStore(GenTreeLclVarCommon* lclNode)
CurrentRange().InsertAfter(lclNode, store);
CurrentRange().Remove(lclNode);
CurrentRange().InsertBefore(insertionPoint, lclNode);

auto tempRange = LIR::ReadOnlyRange(store, store);
m_compiler->m_pLowering->LowerRange(m_currentBlock, tempRange);
}

//------------------------------------------------------------------------
@@ -539,6 +563,8 @@ void WasmRegAlloc::RequestTemporaryRegisterForMultiplyUsedNode(GenTree* node)
// Note how due to the fact we're processing nodes in stack order,
// we don't need to maintain free/busy sets, only a simple stack.
regNumber reg = AllocateTemporaryRegister(genActualType(node));
// If the node already has a regnum, trying to assign it a second one is no good.
assert(node->GetRegNum() == REG_NA);
node->SetRegNum(reg);
}

@@ -561,6 +587,7 @@ void WasmRegAlloc::ConsumeTemporaryRegForOperand(GenTree* operand DEBUGARG(const
}

regNumber reg = ReleaseTemporaryRegister(genActualType(operand));
// If this assert fails you likely called ConsumeTemporaryRegForOperand on your operands in the wrong order.
assert(reg == operand->GetRegNum());
CollectReference(operand);
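The "wrong order" assert reflects the LIFO discipline noted in RequestTemporaryRegisterForMultiplyUsedNode: because nodes are processed in stack order, temporaries never need busy/free sets, just a stack plus a high-water mark. A minimal sketch (`TempRegStack` here is hypothetical, not the JIT's actual TemporaryRegStack type):

```cpp
#include <cassert>
#include <vector>

// Hypothetical LIFO pool: allocations and releases nest with node evaluation
// order, so a plain stack replaces busy/free sets entirely. The high-water
// mark records how many registers must eventually be reserved.
class TempRegStack
{
    std::vector<int> m_live; // registers currently handed out, in order
    int m_next = 0;          // next fresh register number
    int m_maxCount = 0;      // high-water mark
public:
    int Allocate()
    {
        int reg = m_next++;
        m_live.push_back(reg);
        if (m_next > m_maxCount)
        {
            m_maxCount = m_next;
        }
        return reg;
    }

    int Release()
    {
        assert(!m_live.empty()); // mirrors the "wrong order" assert above
        int reg = m_live.back(); // operands must be consumed in reverse order
        m_live.pop_back();
        m_next--;
        return reg;
    }

    int MaxCount() const { return m_maxCount; }
    bool Empty() const { return m_live.empty(); }
};
```

Releasing out of order (consuming operands in the wrong sequence) would return a register that doesn't match the operand's assigned one, which is exactly what the assert catches.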

@@ -605,6 +632,8 @@ void WasmRegAlloc::ResolveReferences()
{
TemporaryRegStack& temporaryRegs = m_temporaryRegs[static_cast<unsigned>(type)];
TemporaryRegBank& allocatedTemporaryRegs = temporaryRegMap[static_cast<unsigned>(type)];
// If temporaryRegs.Count != 0 that means CollectReferences failed to CollectReference one or more multiply-used
// nodes.
assert(temporaryRegs.Count == 0);

allocatedTemporaryRegs.Count = temporaryRegs.MaxCount;
1 change: 1 addition & 0 deletions src/coreclr/jit/regallocwasm.h
@@ -125,6 +125,7 @@ class WasmRegAlloc : public RegAllocInterface
void CollectReferencesForCast(GenTreeOp* castNode);
void CollectReferencesForBinop(GenTreeOp* binOpNode);
void CollectReferencesForLclVar(GenTreeLclVar* lclVar);
void CollectReferencesForBlockStore(GenTreeBlk* node);
void RewriteLocalStackStore(GenTreeLclVarCommon* node);
void CollectReference(GenTree* node);
void RequestTemporaryRegisterForMultiplyUsedNode(GenTree* node);