Staging: Int6 MLP3x 11L + SmearGate + BigramHash4096x128 + MuonWD038 + SWA50 + DocSliding (single-run val_bpb=1.1568)#208
Closed
ajkpersonal wants to merge 1 commit into openai:main from
Conversation
Summary
- Record folder: `records/track_10min_16mb/2026-03-20_Int6MLP3x_11L_SmearGate_Bigram4096x128_MuonWD038_SWA50_DocSliding`
- Shape: `11x512`, `KV4`; candidate config: `seq2048`, `MLP_MULT=3`, SmearGate, BigramHash (`4096x128`), `MUON_WEIGHT_DECAY=0.038`, `ADAM_WEIGHT_DECAY=0.01`, `SWA_EVERY=50`, `SWA_START_FRAC=0.50`
- Export: `int6_zstd_core` with `doc_sliding 2048/256`
- The 8xH100 run reaches `step: 6038/20000` in 597185 ms with `val_loss=1.95474571`, `val_bpb=1.15677715`, `artifact_bytes=15704854`
- Against the 2026-03-20 baseline, this is numerically better by 0.02877578 nats and 0.01797600 BPB
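As a quick sanity check (not part of the PR), the "numerically better by" deltas are just this run's metrics subtracted from the 2026-03-20 baseline means quoted in the Notes:

```python
# Verify the improvement deltas quoted in the PR summary.
baseline_val_loss = 1.98352149  # mean_val_loss of the 2026-03-20 baseline
baseline_val_bpb = 1.17475315   # mean_val_bpb of the baseline
run_val_loss = 1.95474571       # this run's val_loss (nats)
run_val_bpb = 1.15677715        # this run's val_bpb

delta_nats = baseline_val_loss - run_val_loss
delta_bpb = baseline_val_bpb - run_val_bpb
print(f"better by {delta_nats:.8f} nats and {delta_bpb:.8f} BPB")
```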
Notes
- The 2026-03-20 baseline is `mean_val_loss=1.98352149`, `mean_val_bpb=1.17475315`; this run is numerically better, but more runs are still needed for the required significance test before making a SOTA claim
- `train.log` was slightly over the 16 MB cap (`artifact_bytes=16032236`), so the promoted score in this folder comes from the included legal re-export path
- `train_gpt.py` is a whitespace-trimmed copy of the logged trainer source so it stays under the repo's 1500-line cap; behavior is unchanged
- The two artifacts are 70147 and 15704854 bytes respectively
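The Notes defer a SOTA claim to a significance test over repeated runs. The PR does not specify the test, so the sketch below assumes a one-sided permutation test on per-run `val_bpb`; all per-run values other than the single real candidate run (1.15677715) and the real baseline mean (1.17475315) are hypothetical:

```python
import random

# Hypothetical per-run val_bpb samples; only the baseline mean
# (1.17475315) and one candidate run (1.15677715) are real numbers.
baseline = [1.1740, 1.1755, 1.1748, 1.1752]
candidate = [1.1568, 1.1579, 1.1561, 1.1572]

def perm_test(a, b, n=10000, seed=0):
    """One-sided permutation test: chance of seeing a gap
    mean(a) - mean(b) at least as large under random relabeling."""
    rng = random.Random(seed)
    obs = sum(a) / len(a) - sum(b) / len(b)
    pooled = a + b
    hits = 0
    for _ in range(n):
        rng.shuffle(pooled)
        diff = sum(pooled[:len(a)]) / len(a) - sum(pooled[len(a):]) / len(b)
        if diff >= obs:
            hits += 1
    return hits / n

p = perm_test(baseline, candidate)
print(f"one-sided p = {p:.4f}")  # claim SOTA only if p < 0.01
```

One consequence worth noting: with four runs per arm the smallest achievable one-sided permutation p-value is 1/C(8,4) = 1/70 ≈ 0.014, so hitting p < 0.01 this way needs at least five runs per side (1/C(10,5) = 1/252 ≈ 0.004).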
Test plan
- `upstream/main`
- `submission.json` validates as JSON
- `train_gpt.py` and `checkpoint_frontier_sweep.py` parse cleanly
- `run.sh` and `eval_doc2048_256.sh` pass `bash -n`
- `train_gpt.py` is under the line cap (1436 lines)
- Remaining: more 8xH100 runs to establish `p < 0.01` significance for a formal SOTA claim
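The static checks above can be run as a single script. The stub files created below are stand-ins so the snippet is self-contained; in the repo you would run the same commands against the real artifacts at the record folder root:

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"
# Stand-ins for the PR's real files, so this sketch runs anywhere.
printf '#!/bin/bash\necho train\n' > run.sh
printf '#!/bin/bash\necho eval\n' > eval_doc2048_256.sh
printf '{"val_bpb": 1.15677715}\n' > submission.json
printf 'print("trainer stub")\n' > train_gpt.py

bash -n run.sh                                   # syntax check only, no execution
bash -n eval_doc2048_256.sh
python3 -m json.tool submission.json > /dev/null  # fails if not valid JSON
python3 -m py_compile train_gpt.py                # fails if it does not parse
lines=$(wc -l < train_gpt.py)
[ "$lines" -le 1500 ] && echo "line cap OK ($lines lines)"
```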