Staging: Int6 MLP3x 11L + SmearGate + BigramHash4096x128 + MuonWD038 + SWA50 + DocSliding (single-run val_bpb=1.1568)#208

Closed
ajkpersonal wants to merge 1 commit into openai:main from ajkpersonal:ajk-11L-seq2048-lexical

Conversation


@ajkpersonal ajkpersonal commented Mar 20, 2026

Summary

  • add records/track_10min_16mb/2026-03-20_Int6MLP3x_11L_SmearGate_Bigram4096x128_MuonWD038_SWA50_DocSliding
  • dense-lexical 11x512 KV4 candidate: seq2048, MLP_MULT=3, SmearGate, BigramHash(4096x128), MUON_WEIGHT_DECAY=0.038, ADAM_WEIGHT_DECAY=0.01, SWA_EVERY=50, SWA_START_FRAC=0.50
  • chosen legal export/eval path is int6_zstd_core with doc_sliding 2048/256
  • single recorded 8xH100 run reaches step 6038/20000 in 597185 ms (~9.95 minutes)
  • chosen legal eval from the included sweep: val_loss=1.95474571, val_bpb=1.15677715, artifact_bytes=15704854
  • versus the current merged leaderboard leader on 2026-03-20, this is numerically better by 0.02877578 nats and 0.01797600 BPB
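The two margins in the last bullet can be re-derived from the leader figures quoted in the Notes; a quick arithmetic sanity check:

```python
# Sanity-check the claimed margins over the merged leaderboard leader.
# All numbers are copied from this PR description.
leader_val_loss, leader_val_bpb = 1.98352149, 1.17475315
run_val_loss, run_val_bpb = 1.95474571, 1.15677715

delta_loss = leader_val_loss - run_val_loss
delta_bpb = leader_val_bpb - run_val_bpb

assert round(delta_loss, 8) == 0.02877578  # nats, as claimed
assert round(delta_bpb, 8) == 0.01797600   # BPB, as claimed
```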

Notes

  • this is a single-run staging submission, not yet a leaderboard record claim
  • the current merged leaderboard leader on 2026-03-20 is mean_val_loss=1.98352149, mean_val_bpb=1.17475315; this run is numerically better, but more runs are still needed for the required significance test before making a SOTA claim
  • the built-in integrated export printed in train.log was slightly over the 16MB cap (artifact_bytes=16032236), so the promoted score in this folder comes from the included legal re-export path
  • the checked-in train_gpt.py is a whitespace-trimmed copy of the logged trainer source so it stays under the repo's 1500-line cap; behavior is unchanged
  • checked-in code/artifact sizes are 70147 and 15704854 bytes respectively
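The byte figures above are consistent with the track's size cap; a minimal sketch, assuming the "16MB" in track_10min_16mb means 16,000,000 decimal bytes (which matches 16032236 being over the cap and 15704854 under it):

```python
# Assumption: "16MB" in track_10min_16mb = 16_000_000 decimal bytes.
CAP_BYTES = 16_000_000
integrated_export = 16_032_236  # built-in export printed in train.log (over cap)
legal_reexport = 15_704_854     # promoted int6_zstd_core artifact

assert integrated_export > CAP_BYTES  # why the integrated score is not promoted
assert legal_reexport <= CAP_BYTES    # the re-exported artifact is legal
```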

Test plan

  • PR diff only adds the new record folder relative to upstream/main
  • submission.json validates as JSON
  • train_gpt.py and checkpoint_frontier_sweep.py parse cleanly
  • run.sh and eval_doc2048_256.sh pass bash -n
  • train_gpt.py is under the line cap (1436 lines)
  • still needed: additional 8xH100 runs to establish p < 0.01 significance for a formal SOTA claim
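The mechanical checks above can be scripted; a hedged sketch as a reusable helper, where the file names are the ones listed in this PR and the 1500-line cap is the repo rule mentioned in the Notes:

```python
# Sketch of the test plan's mechanical checks against a record folder.
import ast
import json
import subprocess
from pathlib import Path

LINE_CAP = 1500  # repo's line cap for the trainer source

def check_record(rec: Path) -> None:
    """Raise if any mechanical check from the test plan fails."""
    json.loads((rec / "submission.json").read_text())             # valid JSON
    for py in ("train_gpt.py", "checkpoint_frontier_sweep.py"):
        ast.parse((rec / py).read_text())                         # parses cleanly
    for sh in ("run.sh", "eval_doc2048_256.sh"):
        subprocess.run(["bash", "-n", str(rec / sh)], check=True) # syntax-only bash check
    n_lines = len((rec / "train_gpt.py").read_text().splitlines())
    assert n_lines <= LINE_CAP, f"train_gpt.py has {n_lines} lines"
```

The PR-diff-only check is omitted since it depends on the local git remotes.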

@ajkpersonal ajkpersonal changed the title Staging: 11L dense-lexical doc-sliding candidate (single-run val_bpb=1.1568) Staging: Int6 MLP3x 11L + SmearGate + BigramHash4096x128 + MuonWD038 + SWA50 + DocSliding (single-run val_bpb=1.1568) Mar 20, 2026