
Complexity MoE + PID Dynamics (Token-Routed I64)#224

Closed
Complexity-ML wants to merge 8 commits into openai:main from Complexity-ML:complexity-moe-pid

Conversation

@Complexity-ML

Summary

Novel architecture from Complexity Framework:

  • Token-Routed MoE — 4 experts, deterministic routing (token_id % 4), mask-multiply (fullgraph safe)
  • PID Dynamics — mu traverses all 9 layers, tight clamping for stability
  • SwiGLU activation replacing relu²
  • Cosine Warm Restarts (SGDR) LR schedule

14.7M params, under 16MB cap. Awaiting compute credits for final val_bpb.
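The mask-multiply dispatch described above can be sketched as follows (a minimal numpy stand-in for the PyTorch experts; function and argument names are hypothetical, not from the PR):

```python
import numpy as np

def moe_mask_multiply(token_ids, x, expert_weights, num_experts=4):
    """Deterministic token-routed MoE: expert = token_id % num_experts.

    Instead of gathering tokens per expert (data-dependent shapes),
    every expert processes every token and a 0/1 mask zeroes out the
    tokens it does not own. Shapes stay static, which is what makes
    this path safe for torch.compile with fullgraph=True.
    """
    routes = token_ids % num_experts              # (T,) expert index per token
    out = np.zeros_like(x)
    for e in range(num_experts):
        mask = (routes == e).astype(x.dtype)[:, None]   # (T, 1) ownership mask
        out += mask * (x @ expert_weights[e])           # masked expert output
    return out
```

Each token's output comes from exactly one expert, but all four expert GEMMs run on every token; that redundancy is the price of static shapes (the later scatter-dispatch CUDA kernel in this PR removes it at eval time).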

Status

⏳ Pending training results — compute credits requested.

🤖 Generated with Claude Code

Token-Routed MoE (4 experts, deterministic routing) + PID dynamics
with mu traversing all layers + SwiGLU + Cosine Warm Restarts.
14.7M params, under 16MB cap. Awaiting compute for final val_bpb.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
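The PR does not show how the PID dynamics are wired in (in particular, what the error signal for mu is), but a generic discrete PID step with the tight clamping the summary mentions would look roughly like this (gains and clamp range are assumptions for illustration):

```python
def pid_update(mu, error, state, kp=0.1, ki=0.01, kd=0.05,
               clamp=(-1.0, 1.0)):
    """One discrete PID step for a per-layer scalar `mu`.

    `state` carries (integral, previous error) between steps; the
    result is clamped to a tight range, matching the stability note
    in the PR summary. Gains kp/ki/kd here are hypothetical.
    """
    integral, prev_error = state
    integral += error
    derivative = error - prev_error
    mu = mu + kp * error + ki * integral + kd * derivative
    mu = max(clamp[0], min(clamp[1], mu))        # tight clamp for stability
    return mu, (integral, error)
```

With mu threaded through all 9 layers, each layer would apply one such update before passing mu on.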
- LearnedHashRouter: Linear(H, E) micro-router, soft routing, fullgraph safe
- 3 routing modes: modulo | learned | hybrid (modulo base + learned override)
- Hybrid starts deterministic, learns to override when beneficial
- Cosine warm restarts (SGDR): cycles 5k/10k/20k, peak decay 0.7x
- Condensed comments to stay under 1500 lines (1497)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
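The SGDR schedule this commit describes (cycle lengths 5k/10k/20k, each restart's peak scaled by 0.7x) can be sketched as below; the base LR value and the behavior past the last declared cycle are assumptions:

```python
import math

def sgdr_lr(step, base_lr=1e-3, cycles=(5000, 10000, 20000), peak_decay=0.7):
    """Cosine annealing with warm restarts (SGDR).

    LR decays from the cycle's peak toward 0 along a half-cosine;
    at each restart the peak is multiplied by `peak_decay` (0.7x in
    the PR). Steps past the last cycle repeat the final cycle length.
    """
    peak = base_lr
    for length in cycles:
        if step < length:
            return 0.5 * peak * (1 + math.cos(math.pi * step / length))
        step -= length
        peak *= peak_decay
    # past all declared cycles: keep annealing at the last decayed peak
    last = cycles[-1]
    return 0.5 * peak * (1 + math.cos(math.pi * (step % last) / last))
```

At step 0 the LR is `base_lr`; at step 5000 it restarts at `0.7 * base_lr`, and so on.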
@Complexity-ML
Author

Update v2 — Learned Hash Router (Hybrid Mode)

Added context-aware routing on top of deterministic base:

  • LearnedHashRouter: Linear(H, E) micro-router, soft routing, fullgraph safe
  • 3 routing modes: modulo | learned | hybrid
  • Hybrid: starts with modulo stability, learns to override when beneficial
  • Added Cosine Warm Restarts (SGDR) LR schedule
  • 1497 lines, under 1500 cap

Awaiting compute credits to train and produce val_bpb.
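The hybrid mode above (modulo base, learned override) is not fully specified in the PR; one plausible reading, with a confidence-threshold override rule that is purely an assumption, looks like this:

```python
import numpy as np

def hybrid_route(token_ids, h, W_router, num_experts=4, threshold=0.9):
    """Hybrid routing: deterministic modulo base, learned override.

    A Linear(H, E) micro-router scores each token's hidden state; when
    its softmax confidence exceeds `threshold` (hypothetical rule, not
    from the PR), the learned choice replaces token_id % num_experts.
    """
    logits = h @ W_router                        # (T, E) router scores
    logits = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(logits)
    probs /= probs.sum(axis=-1, keepdims=True)
    learned = probs.argmax(axis=-1)
    confident = probs.max(axis=-1) > threshold
    base = token_ids % num_experts               # deterministic fallback
    return np.where(confident, learned, base)
```

With an untrained (zero) router the softmax is uniform, nothing clears the threshold, and routing reduces to pure modulo, matching the "starts with modulo stability" description.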

- 11 layers (was 9) + MLP 2x expansion (was 1x)
- 26.5M params, ~14.2MB with int6+zstd (fits 16MB)
- Hybrid learned router + PID dynamics unchanged
- Need int6+zstd quantizer for final artifact

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@Complexity-ML
Author

Update v2 — Scaled up to 26.5M params

  • 11 layers (was 9), MLP 2x expansion (was 1x)
  • 26.5M params, estimated ~14.2MB with int6+zstd → fits 16MB
  • Hybrid learned router + PID dynamics unchanged
  • 4 SwiGLU experts with token-routed dispatch

Still awaiting compute credits. Will update with val_bpb once trained.

@Complexity-ML Complexity-ML marked this pull request as ready for review March 20, 2026 16:30
@Complexity-ML Complexity-ML marked this pull request as draft March 20, 2026 16:31
Complexity-ML and others added 5 commits March 20, 2026 19:19
- Int6 quantization (QUANT_BITS=6, range [-31,31]) instead of int8
- zstd-22 compression with zlib fallback
- SWA: fp32 checkpoint averaging during late training
- 1500 lines exactly (at the limit)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
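The int6 scheme this commit names (QUANT_BITS=6, range [-31, 31]) can be sketched with symmetric per-tensor scaling; the scale choice is an assumption, and zlib stands in here for the zstd-22 path since the commit lists zlib as the fallback:

```python
import numpy as np
import zlib

def quantize_int6(w):
    """Symmetric int6: map [-max|w|, max|w|] linearly onto [-31, 31]."""
    scale = max(np.abs(w).max() / 31.0, 1e-12)   # guard against all-zero tensors
    q = np.clip(np.round(w / scale), -31, 31).astype(np.int8)
    return q, scale

def compress_weights(q):
    """PR uses zstd level 22; zlib level 9 shown as the stated fallback."""
    return zlib.compress(q.tobytes(), 9)
```

Dequantization is just `q * scale`, with worst-case error of half a quantization step; the six-bit values still occupy int8 storage here, so the real artifact would also bit-pack before compressing.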
- JIT-compiled CUDA kernel for true scatter-dispatch MoE (from vllm-i64)
- Route → scatter → cuBLAS expert GEMM → gather pipeline
- 4x less wasted compute vs mask-multiply (only active expert runs)
- Falls back to PyTorch mask multiply if kernel unavailable
- Used at eval time only (training uses torch.compile path)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
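The route → scatter → expert GEMM → gather pipeline this commit describes can be illustrated in numpy (a stand-in for the JIT CUDA kernel; names hypothetical):

```python
import numpy as np

def moe_scatter_dispatch(token_ids, x, expert_weights, num_experts=4):
    """Route -> scatter -> per-expert GEMM -> gather.

    Only the tokens an expert owns enter its GEMM, so no compute is
    spent on masked-out rows -- the ~4x waste of the mask-multiply
    path disappears, at the cost of data-dependent shapes (which is
    why the PR uses this path at eval time only).
    """
    routes = token_ids % num_experts
    out = np.empty_like(x)
    for e in range(num_experts):
        idx = np.nonzero(routes == e)[0]             # scatter: rows for expert e
        if idx.size:
            out[idx] = x[idx] @ expert_weights[e]    # dense GEMM on owned rows
    return out                                       # results gathered in token order
```

The output is numerically identical to the mask-multiply path; only the amount of work differs.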
Separate eval script (not counted in 1500 line limit):
- Sliding window with configurable stride (default 64) and window (default 2048)
- Loads quantized model, scores only last stride tokens per window
- Expected ~0.03 BPB improvement at eval time

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
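The sliding-window scoring described above (window 2048, stride 64, scoring only the last stride tokens per window) amounts to the following index arithmetic; this sketch only generates the spans, the actual script would run the model over each one:

```python
def sliding_windows(n_tokens, window=2048, stride=64):
    """Yield (ctx_start, ctx_end, score_start) spans.

    Each window sees up to `window` tokens of left context, but only
    the final `stride` tokens (score_start..ctx_end) are scored, so
    every token is scored exactly once with near-maximal context --
    the source of the ~0.03 BPB eval-time gain the commit expects.
    """
    pos = 0
    while pos < n_tokens:
        ctx_end = min(pos + stride, n_tokens)
        ctx_start = max(0, pos + stride - window)
        yield ctx_start, ctx_end, pos
        pos += stride
```

For a 300-token stream with window=128, stride=64, this yields five spans whose scored regions tile 0..300 without gaps or overlap.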
