
Added win64 ports of ThreadX and ThreadX SMP #529

Open
fdesbiens wants to merge 18 commits into eclipse-threadx:dev from fdesbiens:win64

@fdesbiens
Contributor

No description provided.

fdesbiens and others added 18 commits April 3, 2026 11:00
Added the win64 port sources, CMake integration, and Windows build and test scripts.

Updated shared initialization and regression infrastructure for MSVC-hosted Windows simulation, and adjusted Windows-host timing tolerances in regression tests to keep the suite stable.
Replaced coarse Win64 scheduler polling with an event-driven wake path and switched the simulated timer to one-shot rearming to avoid catch-up ticks. Reduced Win64 regression slow-timer settings to 10 ms for the stable configurations, while keeping disable_notify_callbacks_build at 15 ms for reliability.
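The difference between periodic scheduling and one-shot rearming can be sketched in portable C. This is only an illustration of the idea, not the port's code: the function names are hypothetical, time is modelled as plain microsecond counters, and the 10 ms period is taken from the settings mentioned above.

```c
#include <stdint.h>

/* Periodic style: the next deadline is derived from the previous deadline,
   so a late wake-up leaves a backlog of ticks that fire back to back. */
static uint64_t next_deadline_periodic(uint64_t prev_deadline_us, uint64_t period_us)
{
    return prev_deadline_us + period_us;
}

/* One-shot style: the timer is rearmed relative to "now", so a late
   wake-up delays the next tick instead of queuing extra ones. */
static uint64_t next_deadline_one_shot(uint64_t now_us, uint64_t period_us)
{
    return now_us + period_us;
}

/* Number of ticks that are already due at now_us for a given next deadline. */
static unsigned ticks_already_due(uint64_t next_deadline_us, uint64_t now_us,
                                  uint64_t period_us)
{
    if (next_deadline_us > now_us) {
        return 0u;
    }
    return (unsigned)((now_us - next_deadline_us) / period_us) + 1u;
}
```

With a 10 ms period, a tick that was due at 10 ms but serviced at 45 ms leaves three catch-up ticks pending in the periodic model and none in the one-shot model.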

Hardened the Windows build wrapper by invoking Ninja directly for Ninja build trees, fixing timeout detection, enabling a default build timeout, and limiting fallback command replay to real timeout cases.
Restored Linux builds in scripts/build_tx.sh by making the regression tx_initialize_low_level generator tolerant of port-specific formatting. Replaced the brittle exact-string insertion logic with line-based matching so the test interrupt dispatcher hook was inserted reliably for both Linux and Windows simulator ports.
…cheduler timeout

Three targeted fixes that together produce a 1.2x overall speedup (17% less
run time) across the SMP regression suite (150.3s -> 124.8s) with no
regressions (all 109 tests pass).

## Fix 1 - Skip SuspendThread when _tx_thread_preempt_disable != 0
(tx_thread_context_save.c / tx_thread_context_restore.c)

When the timer ISR fires while a ThreadX thread is inside a TX_DISABLE
section (_tx_thread_preempt_disable != 0), the old code called
SuspendThread() / ResumeThread() unconditionally, wasting ~100 us per tick
for zero benefit: context_restore would always skip preemption in that state
because the ISR cannot lower _tx_thread_preempt_disable below its value at
ISR entry while it holds the Win32 critical section.

New behaviour:
- context_save sets suspension_type = 3 (new port-local state) instead of
  calling SuspendThread, letting the thread continue automatically once the
  critical section is released.
- context_restore clears suspension_type 3 without calling ResumeThread.

This is the primary driver of the improvement (e.g.
threadx_thread_delayed_suspension_test: 18.07s -> 2.29s, 7.9x speedup).

Also fixes a latent bug in context_restore where ResumeThread was called
even when context_save had skipped SuspendThread because mutex_access was
TRUE (thread spinning on the Win32 CS).
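The save/restore pairing described in Fix 1 can be sketched as a small portable simulation. It is an assumption-laden sketch rather than the port's code: the `sim_` names and the counting stubs that stand in for SuspendThread() / ResumeThread() are hypothetical, the value 1 for "OS-suspended" is assumed, and only the no-preemption path of context_restore is modelled. The value 3 is the one named in the commit message.

```c
/* Hypothetical stand-ins for the Win32 calls; they only count invocations. */
static unsigned suspend_calls;
static unsigned resume_calls;
static void fake_suspend_thread(void) { suspend_calls++; }
static void fake_resume_thread(void)  { resume_calls++; }

enum {
    SIM_TYPE_NONE            = 0,
    SIM_TYPE_OS_SUSPENDED    = 1, /* assumed value: SuspendThread was issued */
    SIM_TYPE_SKIP_PREEMPT_OFF = 3 /* from the commit message: type 3 */
};

typedef struct {
    int suspension_type;
} sim_thread;

/* ISR entry: suspend the interrupted thread only when preemption is possible. */
static void sim_context_save(sim_thread *t, unsigned preempt_disable)
{
    if (preempt_disable != 0u) {
        /* Preemption cannot happen in this state, so skip the OS call. */
        t->suspension_type = SIM_TYPE_SKIP_PREEMPT_OFF;
    } else {
        fake_suspend_thread();
        t->suspension_type = SIM_TYPE_OS_SUSPENDED;
    }
}

/* ISR exit without preemption: resume only if a suspend was actually issued. */
static void sim_context_restore(sim_thread *t)
{
    if (t->suspension_type != SIM_TYPE_SKIP_PREEMPT_OFF) {
        fake_resume_thread();
    }
    t->suspension_type = SIM_TYPE_NONE;
}
```

The point of the sketch is that suspend and resume calls stay balanced in both states, while the preempt-disabled state issues neither.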

## Fix 2 - 2 ms scheduler event timeout
(tx_initialize_low_level.c, _tx_win32_wait_for_scheduler_event)

Matches the Linux SMP port's sem_timedwait(2 ms) pattern.
- Prevents indefinite stall on any missed SetEvent().
- Introduces slight timing jitter that helps break the systematic phase
  resonance observed in threadx_thread_wait_abort_and_isr_test, where the
  timer tick was always landing outside the _tx_thread_preempt_disable
  window, requiring far more ticks to accumulate 20 condition hits.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When context_save fires and the current thread has mutex_access == TRUE
(it is spinning on the Win32 critical section waiting to acquire it),
calling SuspendThread is wasteful: the thread cannot execute any protected
ThreadX code while blocked on the spinlock and will proceed naturally once
the ISR releases the CS.  Flag such threads with suspension_type 4 so
context_restore skips the matching ResumeThread.

Two subtle bugs were fixed along the way:

1. Thread-ID scan instead of stale-TLS lookup
   _tx_win32_critical_section_obtain previously used the TLS variable
   _tx_win32_current_virtual_core to find the calling thread's struct.
   That TLS is only refreshed via the run-semaphore wake path (type 2);
   after a type-1 (SuspendThread/ResumeThread) hand-off the TLS can point
   to the wrong virtual core.  A thread on core N with stale TLS=M would
   stamp mutex_access = TRUE on _tx_thread_current_ptr[M], which is a
   completely different thread.  The fix scans _tx_win32_virtual_cores[]
   by OS thread ID to find the correct struct.

2. context_restore no-preemption path: drop mutex_access guard
   The original defensive else in context_restore skipped ResumeThread
   when mutex_access was TRUE.  After fix #1 this situation still has a
   narrow race window (set before CAS, clear after CAS; timer ISR can
   land between them).  If the ISR had already called SuspendThread
   (suspension_type == 0) before mutex_access was stamped, the defensive
   else would leave the thread permanently OS-suspended — deadlock.
   The fix relies solely on suspension_type: type 3/4 => no SuspendThread
   was issued => no ResumeThread; any other type => ResumeThread always.
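The deadlock in item 2 can be reproduced with a small portable model. This is a sketch under stated assumptions, not the port's code: the `race_thread` struct, its `os_suspended` flag, and both function names are hypothetical, and the OS suspend state is simulated with a boolean. The type values 3 and 4 and the suspension_type == 0 race state are the ones named in the commit message.

```c
#include <stdbool.h>

typedef struct {
    int  suspension_type; /* 3 or 4: SuspendThread was skipped by context_save */
    bool mutex_access;    /* thread is spinning on the Win32 critical section */
    bool os_suspended;    /* simulated OS-level suspend state */
} race_thread;

/* Old rule: skip the resume whenever mutex_access is TRUE. */
static void restore_old_rule(race_thread *t)
{
    if (!t->mutex_access) {
        t->os_suspended = false; /* stands in for ResumeThread */
    }
}

/* New rule: decide on suspension_type alone. */
static void restore_new_rule(race_thread *t)
{
    if (t->suspension_type != 3 && t->suspension_type != 4) {
        t->os_suspended = false; /* stands in for ResumeThread */
    }
    t->suspension_type = 0;
}

/* Race state from the commit message: the ISR has already suspended the
   thread (suspension_type == 0) and mutex_access is stamped TRUE afterwards. */
static race_thread make_raced_thread(void)
{
    race_thread t;
    t.suspension_type = 0;
    t.mutex_access = true;
    t.os_suspended = true;
    return t;
}
```

Applied to the raced state, the old rule leaves `os_suspended` set (the thread never runs again), while the new rule clears it. For a thread flagged type 3 or 4, the new rule still issues no resume.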

Results (109/109 tests pass):
  Round-1 baseline  : 124.8 s
  This commit       : 105.7 s  (-15.3 %, total -30 % vs original 150 s)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…scan

Three targeted optimisations over the Round 2 baseline (105.7 s):

1. Restore start_ack rendezvous in scheduler (revert Round 3 Phase 2)
   The fast-rendezvous approach (removing the start_ack wait) degraded
   wait_abort_and_isr from 11.98 s to 30 s because the CS-hold during
   start_ack provides the timing window where other threads spin with
   preempt_disable != 0, which the ISR must observe to satisfy the
   test condition.  Reverted tx_thread_schedule.c to the Round 2 state
   and updated the explanatory comment in tx_thread_context_restore.c.

2. Increase TX_WIN32_CONTENTION_PAUSE_COUNT 64 → 256
   Each time a thread spins on the Win32 critical section and fails a
   CAS, it increments a counter; on reaching the threshold it calls
   _tx_win32_thread_yield() (SwitchToThread/Sleep(0)) and resets the
   counter.  With a threshold of 64, heavily contended tests triggered
   SwitchToThread() on every 64th failed CAS — extremely expensive
   (~50 µs/call).  Raising to 256 reduces the call rate 4x while
   keeping the same eventual-yield guarantee.  The
   smp_random_resume_suspend* tests benefit most: 8-10 s → ~2 s each.

3. TLS-hinted current_thread lookup in _tx_win32_critical_section_obtain
   The hot path that stamps mutex_access=TRUE previously scanned all
   TX_THREAD_SMP_MAX_CORES (4) virtual-core entries on every CS
   acquisition by a ThreadX thread.  The TLS
   _tx_win32_current_virtual_core index is checked first; if it matches
   (common case) the scan is skipped entirely.  A full 4-way fallback scan
   is still performed when the TLS value is stale (e.g. after a type-1
   scheduler hand-off), preserving the Round 2 correctness fix.
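The hinted lookup in item 3 can be sketched as follows. The names are hypothetical and the layout is assumed: `sim_cores[]` stands in for _tx_win32_virtual_cores[], `os_thread_id` is an assumed field holding the OS thread ID mapped to each virtual core, and the TLS index is passed in as a plain `hint` argument so the sketch stays portable. The scan counter exists only to make the fast path observable.

```c
#include <stdint.h>

#define SIM_MAX_CORES 4 /* TX_THREAD_SMP_MAX_CORES is 4 in the commit message */

typedef struct {
    uint32_t os_thread_id; /* assumed field: OS thread ID mapped to this core */
} sim_virtual_core;

static sim_virtual_core sim_cores[SIM_MAX_CORES];
static unsigned sim_full_scans; /* illustration only: counts fallback scans */

/* Return the virtual-core index that owns self_id, or -1 if none does.
   The hint (the possibly stale TLS index) is trusted only after it has been
   verified against the caller's own thread ID. */
static int sim_find_core(uint32_t self_id, int hint)
{
    int i;

    /* Fast path: the hint is in range and still points at the caller. */
    if (hint >= 0 && hint < SIM_MAX_CORES &&
        sim_cores[hint].os_thread_id == self_id) {
        return hint;
    }

    /* Slow path: stale hint, fall back to scanning every core by thread ID. */
    sim_full_scans++;
    for (i = 0; i < SIM_MAX_CORES; i++) {
        if (sim_cores[i].os_thread_id == self_id) {
            return i;
        }
    }
    return -1;
}
```

Because the hint is verified before use, a stale value costs one extra comparison plus the scan, and it can never select another thread's entry.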

Results (109/109 pass, all timings on same machine):
  Baseline (original):            150.3 s
  Round 1 (skip redundant susp):  124.8 s  (-17 %)
  Round 2 (mutex_access type-4):  105.7 s  (-30 %)
  Round 4 (this commit):           65.45 s  (-56 %) ← new best
  Linux reference:                  59.8 s

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Tests-Win-After5.txt captures a fresh rebuild of commit f47e102d to
correct a stale-binary measurement.  The earlier Tests-Win-After4.txt
(65.45 s) remains valid but reflects a lucky run where the probabilistic
delayed_suspension test resolved in ~50 ms instead of the typical ~11 s.

The 78.24 s figure is the representative round-4 result:
  Baseline (original):            150.3 s
  Round 1:                        124.8 s  (-17 %)
  Round 2:                        105.7 s  (-30 %)
  Round 4 (typical):               78.2 s  (-48 %)  <- this commit
  Round 4 (best case):             65.5 s  (-56 %)
  Linux reference:                  59.8 s
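
The percentages in the table are run-time reductions relative to the 150.3 s baseline. A minimal helper for recomputing them (hypothetical function names, not part of the PR) could look like this:

```c
/* Relative change in run time, in percent; negative means faster. */
static double percent_change(double baseline_s, double measured_s)
{
    return (measured_s - baseline_s) / baseline_s * 100.0;
}

/* Speedup factor: how many times faster the measured run is. */
static double speedup_factor(double baseline_s, double measured_s)
{
    return baseline_s / measured_s;
}
```

For example, 150.3 s -> 124.8 s is about -17 % in run time, which is the same thing as a 1.20x speedup, and 150.3 s -> 78.2 s is about -48 %.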

Remaining gap to Linux is dominated by timer-bound and probabilistic tests
(byte_memory_thread_contention ~13.5 s, wait_abort_and_isr ~14-15 s,
delayed_suspension ~0.5-11 s, timer_multiple ~6.3 s).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Attempted Win32 thread priority mapping (TX priority 0 -> BELOW_NORMAL,
all others -> LOWEST) to guide OS scheduling toward higher-priority
ThreadX threads. Result: 83.36s - worse than Round 4's 78.24s baseline.

Root cause: the ISR resonance tests (wait_abort_and_isr, delayed_suspension)
depend on precise timing equilibria. Elevating any user thread to
BELOW_NORMAL disrupts these equilibria unpredictably. The timer-bound
tests (byte_memory_thread_contention, timer_multiple_test) that dominate
the total time cannot benefit from priority mapping.

Port code reverted to Round 4 state (commit f47e102d).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replaced per-thread semaphore handshakes with Windows address waits.

Added high-resolution waitable timer support and regression ISR sampling.

Preserved the 100 Hz ThreadX tick cadence.

Tightened SMP test and clean-build watchdog defaults.

Co-authored-by: Codex (gpt 5.5) <codex@openai.com>
Updated Win32, Win64, and Win64 SMP port version metadata to 6.5.1.202602.

Aligned standalone block-comment terminators in the regular Windows port headers.

Co-authored-by: Codex (gpt 5.5) <codex@openai.com>
Waited for each created Windows host thread to reach the controlled run-semaphore handoff before allowing ThreadX scheduling.

Guarded disable-notify builds against stale host threads entering the ThreadX shell after deletion.

Co-authored-by: Codex (gpt 5.5) <codex@openai.com>
Enabled high-resolution waitable timers for the Win64 simulator and used SetWaitableTimerEx when available.

Bounded Windows host-thread cleanup during thread delete/reset so stale host threads could not spin indefinitely during regression cleanup.

Co-authored-by: Codex (gpt 5.5) <codex@openai.com>
Added -Clean support to the regular and SMP Windows test scripts so stale CTest Testing directories were removed before a run.

Skipped Visual Studio DevShell re-entry when the active MSVC environment already matched the requested architecture.

Defaulted SMP failure repeats to two attempts for timing-sensitive Windows simulator regressions.

Co-authored-by: Codex (gpt 5.5) <codex@openai.com>
Removed Windows-specific timer-thread cleanup from testcontrol after the port handled stale host-thread teardown directly.

Restored stricter event flag, sleep, and timer expectations where port fixes made the previous Windows accommodations unnecessary.

Co-authored-by: Codex (gpt 5.5) <codex@openai.com>
@fdesbiens
Contributor Author

@billlamiework and @cypherbridge

This PR adds brand-new Win64 ports of ThreadX and ThreadX SMP. There are now PowerShell scripts to build the code and run the regression tests for them.

At this point, all tests pass reliably for both versions. Would you mind reviewing the code, please?

If you want to run them, you will need Visual Studio Build Tools 2022 (I used 19.42.34436.0) or any Visual Studio installation bundling them. The code targets Windows 11 only, as other versions are obsolete at this point. I will probably update the win32 port to use the same compiler later.
