Test Mat's Win/ARM64 PR1 and PR5 alone#58

Open
swesonga wants to merge 2 commits into macarte/baselinePR-TestTrampolineFix from swesonga/PR1+PR5-winarm64

@swesonga
Member

Determine what fails without PRs 2, 3, and 4.

macarte added 2 commits March 6, 2026 13:44
MSVC's /volatile:iso (default on ARM64) makes volatile reads/writes
plain LDR/STR with no acquire/release barriers. HotSpot's C++ runtime
was written assuming volatile provides acquire/release semantics.

Changes:

1. flags-cflags.m4: Add /volatile:ms to JVM_CFLAGS for Windows AArch64
   to restore acquire/release semantics for volatile accesses.

2. orderAccess_windows_aarch64.hpp: Replace std::atomic_thread_fence()
   with __dmb() intrinsics for READ_MEM_BARRIER (dmb ishld),
   WRITE_MEM_BARRIER and FULL_MEM_BARRIER (dmb ish). The __dmb()
   intrinsic acts as both a hardware barrier and compiler barrier for
   volatile/non-atomic accesses, which std::atomic_thread_fence() does
   not guarantee under /volatile:iso.

3. atomicAccess_windows_aarch64.hpp: Override PlatformLoad/PlatformStore
   with __ldar/__stlr intrinsics (defense-in-depth for Atomic::load/
   store). Add PlatformOrderedLoad/PlatformOrderedStore specializations
   using __ldar/__stlr to avoid redundant dmb in load_acquire/
   release_store paths, matching the Linux AArch64 approach.
…alThread

On ARM64, volatile write (STLR/release) + volatile read (LDAR/acquire) to
different addresses does NOT provide StoreLoad ordering. This breaks
Dekker-like protocols where one side writes field A then reads field B,
while the other writes B then reads A — both sides can miss each other's
stores.
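
A minimal Java sketch of the Dekker-like shape described above, not code from this change. The class name, the fields `a` and `b`, and the harness method `countBothMissed` are hypothetical. It uses `VarHandle` release/acquire access modes, which by themselves do not order a store before a later load of a different variable, and places `VarHandle.fullFence()` between the store and the load on each side:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

class DekkerFenceSketch {
    // Hypothetical fields standing in for "field A" and "field B".
    int a;
    int b;

    static final VarHandle A;
    static final VarHandle B;
    static {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            A = lookup.findVarHandle(DekkerFenceSketch.class, "a", int.class);
            B = lookup.findVarHandle(DekkerFenceSketch.class, "b", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Side 1: release-store to a, full fence, acquire-load of b.
    int writeAThenReadB() {
        A.setRelease(this, 1);
        VarHandle.fullFence(); // supplies the StoreLoad ordering
        return (int) B.getAcquire(this);
    }

    // Side 2: the mirror image of side 1.
    int writeBThenReadA() {
        B.setRelease(this, 1);
        VarHandle.fullFence();
        return (int) A.getAcquire(this);
    }

    // Runs both sides concurrently and counts rounds where each side
    // read 0, i.e. both missed the other's store.
    static int countBothMissed(int rounds) {
        int missed = 0;
        for (int i = 0; i < rounds; i++) {
            final DekkerFenceSketch s = new DekkerFenceSketch();
            final int[] r = new int[2];
            Thread t1 = new Thread(() -> r[0] = s.writeAThenReadB());
            Thread t2 = new Thread(() -> r[1] = s.writeBThenReadA());
            t1.start();
            t2.start();
            try {
                t1.join();
                t2.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            if (r[0] == 0 && r[1] == 0) {
                missed++;
            }
        }
        return missed;
    }

    public static void main(String[] args) {
        System.out.println("both-missed rounds: " + countBothMissed(1000));
    }
}
```

With both fences in place, at least one side should observe the other's store in every round. Removing the fences makes the both-missed outcome permitted for these access modes, although a short run like this is not guaranteed to reproduce it.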

This adds U.fullFence() / VarHandle.fullFence() at all identified
Dekker-pattern sites:

VirtualThread.java:
  - afterYield() PARKING path: between setState(PARKED/TIMED_PARKED) and
    reading parkPermit (Dekker with unpark)
  - afterYield() BLOCKING path: between setState(BLOCKED) and reading
    blockPermit (Dekker with unblock)
  - afterYield() WAITING path: between setState(WAIT/TIMED_WAIT) and
    reading notified (Dekker with notify); fences in both untimed and
    timed sub-paths (adapted for tip's per-path inline checks)
  - afterDone(): between setState(TERMINATED) and reading
    notifyAllAfterTerminate (Dekker with beforeJoin)
  - unpark(): between getAndSetParkPermit(true) and reading state
    (Dekker with afterYield PARKING path)
  - unblock(): between blockPermit=true and reading state (Dekker with
    afterYield BLOCKING path)
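
The park/unpark pairing in this list can be sketched as a two-sided handshake. This is an illustration only, not the VirtualThread code: the class, constants, and method names are hypothetical, and the `java.util.concurrent.atomic` release/acquire accessors are used so that the fence is what supplies the store-then-load ordering.

```java
import java.lang.invoke.VarHandle;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

class ParkHandshakeSketch {
    static final int RUNNING = 0;
    static final int PARKED = 1;

    // Hypothetical stand-ins for the thread state and the park permit.
    final AtomicInteger state = new AtomicInteger(RUNNING);
    final AtomicBoolean parkPermit = new AtomicBoolean(false);

    // Parking side: publish PARKED, fence, then look for a permit.
    // Returns true if a permit was seen, so the parker must continue itself.
    boolean parkSideSawPermit() {
        state.setRelease(PARKED);
        VarHandle.fullFence();
        return parkPermit.getAcquire();
    }

    // Unparking side: publish the permit, fence, then look at the state.
    // Returns true if PARKED was seen, so the unparker must reschedule.
    boolean unparkSideSawParked() {
        parkPermit.setRelease(true);
        VarHandle.fullFence();
        return state.getAcquire() == PARKED;
    }

    // Counts rounds in which neither side noticed the other (a lost wakeup).
    static int countLostWakeups(int rounds) {
        int lost = 0;
        for (int i = 0; i < rounds; i++) {
            final ParkHandshakeSketch h = new ParkHandshakeSketch();
            final boolean[] seen = new boolean[2];
            Thread parker = new Thread(() -> seen[0] = h.parkSideSawPermit());
            Thread unparker = new Thread(() -> seen[1] = h.unparkSideSawParked());
            parker.start();
            unparker.start();
            try {
                parker.join();
                unparker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            if (!seen[0] && !seen[1]) {
                lost++;
            }
        }
        return lost;
    }
}
```

A lost wakeup here corresponds to the case where the parker sees no permit and the unparker sees no parked thread. The fences on both sides are intended to rule that out, which is why the commit pairs the fence in unpark() with the one in the afterYield() PARKING path.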

LinkedTransferQueue.java:
  - xfer(): between cmpExItem CAS and reading waiter (Dekker with
    await() which writes waiter then reads item)

SynchronousQueue.java:
  - xferLifo(): between cmpExItem CAS and reading waiter (same Dekker
    as LinkedTransferQueue)

AbstractQueuedSynchronizer.java:
  - acquire(): between node.status=WAITING and re-reading state in
    tryAcquire/tryAcquireShared (Dekker with release/releaseShared)
  - release(): between tryRelease state update and reading node.status
    in signalNext
  - releaseShared(): same as release()

These fences are correctness-critical on ARM64, functionally redundant
on x86 (TSO already provides StoreLoad), and appear only on non-hot
paths (state transitions, not tight loops).

2 participants