Submission: 10L + Sliding Window eval (mean val_bpb=1.1899)#221
shajalahamedcse wants to merge 4 commits into openai:main from
Conversation
Track: 10min_16mb
Author: shajalahamedcse
Key change: `train_seq_len=4096` with 10 layers and sliding window eval (stride=64). Training on longer sequences improves predictions while keeping the same model architecture and evaluation method.

Seed results:
- seed=1337: val_bpb=1.1900, artifact=15,115,793 B
- seed=42: val_bpb=1.1908, artifact=15,128,724 B
- seed=7: val_bpb=1.1888, artifact=15,154,068 B
- mean: val_bpb=1.1899, std=0.0008

Hardware: Modal 8×H100 SXM, `torchrun --nproc_per_node=8`. Training capped at MAX_WALLCLOCK_SECONDS=600.
Removed error traceback and submission results from log.
Community Review — Submission: 10L + Sliding Window eval (mean val_bpb=1.1899)

BPB: 1.1899 | Compliance: LOOKS CLEAN — pure-neural submission, no TTT/SLOT/n-gram-cache

What I found in the code (at head SHA): Static code review found no TTT adaptation function, no SLOT optimization loop, no n-gram-cache class, and no pre-quant val-token fine-tune. The eval path uses the standard sliding-window stride-64 pattern. The submission is a pure-neural architecture iteration on the standard SP1024/SP4096/SP8192 baseline.

CPU smoke test (CT2038 proteus-engine, 2026-04-11): import OK in 0.07s, dim=512, layers=10, vocab=1024, code=55483 B, SMOKE_TEST_PASS.

Verdict: LOOKS CLEAN. Recommendation to @cocohearts @valerio-oai @0hq @yuzhougu-oai @notapplica: MERGE pending the usual record-track checks (3-seed validation, under-16MB artifact cap, ≤600s train + ≤600s eval on 8×H100 SXM). No compliance flags from the classification pass; this looks like a clean pure-neural iteration on the standard baseline.

Auto-classification caveat: this review was drafted by the deterministic AST-based classifier. If there's a non-standard eval mechanism (logit postprocessing, hedge mixing, etc.) that I missed because it's factored into a helper file or a non-standard function name, please flag it and I'll re-run the audit manually.

Reviewed by @MatoTeziTanka — The Agora.
Key idea:
The model was like a student who studied short paragraphs but was being tested on long chapters — so we asked: what if it practiced on long text too? We changed one line (`train_seq_len = 4096`) so it trained on 4096-token passages instead of 1024, teaching it real long-range patterns, then evaluated it with overlapping windows (stride=64) so every token gets maximum context during scoring. We ran it on Modal 8×H100 GPUs and got a consistent mean val_bpb of 1.1899 across 3 random seeds (1337, 42, 7).
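The overlapping-window scoring described above can be sketched as follows. This is an illustrative reconstruction, not the submission's actual eval code: the model interface (a module returning `(1, T, vocab)` logits) and the `bytes_per_token` normalization are assumptions.

```python
import math
import torch
import torch.nn.functional as F

def sliding_window_bpb(model, tokens, seq_len=4096, stride=64, bytes_per_token=1.0):
    """Bits-per-byte over a 1-D LongTensor of tokens using overlapping windows.

    Each window advances by `stride`; only targets that no earlier window
    scored are counted, so every token is scored exactly once while seeing
    (close to) a full window of left context.
    """
    device = next(model.parameters()).device
    nll_sum, n_scored, prev_end = 0.0, 0, 0
    for start in range(0, tokens.numel() - 1, stride):
        end = min(start + seq_len, tokens.numel() - 1)
        inputs = tokens[start:end].unsqueeze(0).to(device)
        targets = tokens[start + 1:end + 1].unsqueeze(0).to(device)
        with torch.no_grad():
            logits = model(inputs)  # assumed shape: (1, T, vocab)
        nll = F.cross_entropy(logits.squeeze(0), targets.squeeze(0),
                              reduction="none")
        n_new = end - max(start, prev_end)  # targets not yet scored
        nll_sum += nll[-n_new:].sum().item()
        n_scored += n_new
        prev_end = end
        if end == tokens.numel() - 1:
            break
    # nats -> bits, then normalize per byte of underlying text
    return nll_sum / math.log(2) / (n_scored * bytes_per_token)
```

With `stride == seq_len` this degenerates to ordinary non-overlapping evaluation; the small stride of 64 trades roughly `seq_len / stride` times more forward passes for better per-token context.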
Combine `train_seq_len=4096` with 10 layers and sliding window evaluation (stride=64).

Seed results
Config
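Per the description above, the configuration differs from the baseline in a single line. A minimal sketch (variable names are assumptions, not the submission's actual identifiers):

```python
# Assumed names; only train_seq_len changes relative to the 1024-token baseline.
n_layer = 10           # unchanged: 10 transformer layers
train_seq_len = 4096   # was 1024: train on 4096-token sequences
eval_stride = 64       # sliding-window evaluation stride
```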
Hardware
Modal 8×H100 SXM, `torchrun --standalone --nproc_per_node=8`, `MAX_WALLCLOCK_SECONDS=600`