MetaStack v3: 1.1792 sliding bpb, 10L BigramHash SmearGate OrthoInit SWA #205

xinpw8 wants to merge 3 commits into openai:main from
Conversation
10-layer GPT with BigramHash embeddings, SmearGate, OrthoInit, SWA (30 checkpoints), Muon WD=0.04, mixed int5(MLP)/int6(attn) quantization, 2% magnitude pruning. Sliding window eval at stride=64, seq_len=1024.
val_bpb: 1.1792 (sliding window int6+zstd22, 12.1MB artifact). Trained 600s on 8xH100 SXM, seed 1337, step 7819.
Includes: search harness (ProteinLite), Vast.ai deploy pipeline, live GPU monitor, v1/v2 lineage, test suite.
Pull request overview
This PR introduces a standalone hyperparameter search harness (“ProteinLite” + runner/log parser), a live monitoring utility, and a Vast.ai deployment pipeline to support rapid Parameter Golf experimentation and remote execution, alongside new MetaStack v2/v3 record artifacts.
Changes:
- Add search/ harness (config loader, run launcher, log parsing, live-status tracking, optimizer) plus unit tests.
- Add Vast.ai deployment scripts (remote bootstrap, config rendering, DDP smoke checks) and a live GPU/training monitor.
- Add new record folders/config presets for MetaStack v2 WD + MetaStack v3 competitive submission metadata.
Reviewed changes
Copilot reviewed 33 out of 34 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| tools/monitor.py | Live SSH/local GPU + log + search-run status monitor. |
| search/__init__.py | Declares search package. |
| search/config.py | YAML-backed config loading for runner/search space/settings. |
| search/log_parser.py | Parses training logs into structured metrics + objective selection. |
| search/protein_lite.py | Implements ProteinLite optimizer (Sobol warm start + GP guidance). |
| search/runner.py | Builds commands/env and launches runs while teeing output. |
| search/run_search.py | Orchestrates search loop, live status JSON, and result persistence. |
| search_configs/metastack_v2_wd_smoke.yaml | Smoke search preset for v2 WD. |
| search_configs/metastack_v2_wd_sliding_local.yaml | Local sliding-eval search preset for v2 WD. |
| search_configs/metastack_v2_wd_sliding_remote.yaml | Remote 8-GPU sliding-eval search preset for v2 WD. |
| tests/test_search_configs.py | Unit test coverage for config presets. |
| tests/test_search_live_status.py | Unit tests for live status parsing/phase tracking. |
| tests/test_search_log_parser.py | Unit tests for log parsing/objective selection. |
| tests/test_search_protein_lite.py | Unit tests for deterministic suggestions + scoring semantics. |
| tests/test_search_run_search.py | Unit tests for observation building + log path resolution. |
| tests/test_search_runner.py | Unit tests for command building + launch output streaming. |
| deploy/vast/requirements.remote.txt | Remote runtime deps for search/training bootstrap. |
| deploy/vast/rsync_excludes.txt | Excludes large/ephemeral artifacts from rsync deploy. |
| deploy/vast/render_remote_config.py | Rewrites configs for remote workdir/python/log/output paths. |
| deploy/vast/remote_bootstrap_and_run.sh | End-to-end remote preflight/bootstrap/tests/data fetch/search runner. |
| deploy/vast/Dockerfile.amd64 | Builds an amd64 runtime image with remote deps installed. |
| deploy/vast/ddp_smoke.py | Minimal NCCL all-reduce sanity check. |
| deploy/vast/deploy_and_launch.py | Syncs repo to Vast instance and launches remote workflow over SSH. |
| deploy/vast/deploy_and_launch.sh | Thin wrapper to run deploy_and_launch.py. |
| deploy/vast/create_vast_template.sh | Helper to create Vast template with required env/search params. |
| deploy/vast/build_amd64_image.sh | Helper to build/push amd64 docker image for Vast runs. |
| records/track_10min_16mb/2026-03-19_MetaStack_v1/train_gpt.py | Adds v1 lineage trainer snapshot to records. |
| records/track_10min_16mb/2026-03-20_MetaStack_v2_WD/train_gpt.py | Adds v2 WD trainer snapshot to records. |
| records/track_10min_16mb/2026-03-20_MetaStack_v2_WD/README.md | Documents v2 WD thesis/results/repro steps. |
| records/track_10min_16mb/2026-03-20_MetaStack_v2_WD/EXPERIMENT_LEDGER.md | Append-only experiment ledger for v2 WD. |
| records/track_10min_16mb/2026-03-20_MetaStack_v3_competitive/README.md | Documents v3 competitive model details + repro command. |
| records/track_10min_16mb/2026-03-20_MetaStack_v3_competitive/submission.json | Submission metadata for v3 competitive artifact. |
| .gitignore | Ignores Vast venv + deploy run outputs + logs. |
```python
for r in completed:
    rid = r.get("run_id", "?")[:35]
    st = r.get("status", "?")[:12]
    pre = r.get("terminal_prequant_bpb")
    i6 = r.get("final_int6_bpb")
    sl = r.get("sliding_window_bpb")
    sz = r.get("int6_artifact_bytes")
    color = "\033[32m" if st == "completed" else "\033[31m"
    pre_s = f"{pre:.4f}" if pre else "N/A"
i6_s = f"{i6:.4f}" if i6 else "N/A"
sl_s = f"{sl:.4f}" if sl else "N/A"
sz_s = format_bytes(sz) if sz else "N/A"
lines.append(f" {rid:<35} {color}{st:<12}\033[0m {pre_s:>8} {i6_s:>8} {sl_s:>8} {sz_s:>10}")
lines.append("")
```
The formatting block for completed search runs is mis-indented: i6_s/sl_s/sz_s and the lines.append(...) call are outside the for r in completed: loop, which will raise UnboundLocalError (or even a syntax/indentation error depending on editor) and only print one row. Re-indent those lines inside the loop (and consider using is not None checks instead of if pre so 0.0 doesn’t display as N/A).
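A hedged sketch of the corrected block: everything is re-indented inside the loop and the truthiness checks become `is not None` guards. The wrapper function and the `format_bytes` stub are assumptions for illustration, not the PR's actual code.

```python
def format_bytes(n: int) -> str:
    # Stand-in for the monitor's real helper.
    return f"{n / 1e6:.1f}MB"

def render_completed(completed: list[dict]) -> list[str]:
    lines = []
    for r in completed:
        rid = r.get("run_id", "?")[:35]
        st = r.get("status", "?")[:12]
        pre = r.get("terminal_prequant_bpb")
        i6 = r.get("final_int6_bpb")
        sl = r.get("sliding_window_bpb")
        sz = r.get("int6_artifact_bytes")
        color = "\033[32m" if st == "completed" else "\033[31m"
        # `is not None` so a legitimate 0.0 metric is not rendered as N/A
        pre_s = f"{pre:.4f}" if pre is not None else "N/A"
        i6_s = f"{i6:.4f}" if i6 is not None else "N/A"
        sl_s = f"{sl:.4f}" if sl is not None else "N/A"
        sz_s = format_bytes(sz) if sz is not None else "N/A"
        lines.append(f" {rid:<35} {color}{st:<12}\033[0m {pre_s:>8} {i6_s:>8} {sl_s:>8} {sz_s:>10}")
    lines.append("")
    return lines
```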
```python
def run_remote(ssh_cmd: str, cmd: str, timeout: int = 10) -> str | None:
    try:
        result = subprocess.run(
            ssh_cmd.split() + [cmd],
            capture_output=True, text=True, timeout=timeout,
        )
```
run_remote() tokenizes the provided SSH command with ssh_cmd.split(), which breaks when flags include quoted values (e.g. -o StrictHostKeyChecking=accept-new) or paths with spaces. Use shlex.split(ssh_cmd) to parse the SSH command line robustly.
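For example, with an illustrative SSH command line, `str.split` and `shlex.split` agree on simple flags but diverge as soon as a value is quoted:

```python
import shlex

# Simple flags: both tokenizers agree.
simple = "ssh -o StrictHostKeyChecking=accept-new -p 2222 root@host"
assert simple.split() == shlex.split(simple)

# A quoted value (e.g. an identity file whose path contains a space)
# breaks naive whitespace splitting but is handled by shlex.
quoted = 'ssh -i "/keys/my key" root@host'
print(quoted.split())       # ['ssh', '-i', '"/keys/my', 'key"', 'root@host']
print(shlex.split(quoted))  # ['ssh', '-i', '/keys/my key', 'root@host']
```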
```python
def run_local(cmd: str, timeout: int = 10) -> str | None:
    try:
        result = subprocess.run(
            cmd, shell=True, capture_output=True, text=True, timeout=timeout,
        )
        if result.returncode == 0:
            return result.stdout
    except (subprocess.TimeoutExpired, Exception):
        pass
    return None


def run_cmd(ssh_cmd: str | None, cmd: str, timeout: int = 10) -> str | None:
    if ssh_cmd:
        return run_remote(ssh_cmd, cmd, timeout)
    return run_local(cmd, timeout)
```
run_local() executes cmd with shell=True while many call sites interpolate user-provided workdir/paths into the command string. This is both fragile (spaces/quoting) and exposes command-injection risk if the monitor is pointed at an untrusted workdir. Prefer subprocess.run([...]) with argument lists (and quote/escape remote paths similarly).
```python
WD_RE = re.compile(r"weight_decay\s+token:([\d.]+)\s+head:([\d.]+)\s+muon:([\d.]+)\s+scalar:([\d.]+)")


def parse_training_log(text: str) -> dict:
    """Parse key metrics from a training log."""
    info: dict = {}
    last_step = None
    last_val = None
    for line in text.splitlines():
        m = STEP_RE.search(line)
        if m:
            step, total = int(m.group(1)), int(m.group(2))
            last_step = {"step": step, "total": total, "time_ms": int(m.group(5)), "avg_ms": float(m.group(6))}
            if m.group(3):
                last_val = {"val_loss": float(m.group(3)), "val_bpb": float(m.group(4)), "step": step, "total": total}
        m = QUANT_RE.search(line)
        if m:
            info["quant"] = {"int8_bpb": float(m.group(1)), "int6_bpb": float(m.group(2)),
                            "int8_sz": int(m.group(3)), "int6_sz": int(m.group(4))}
        m = ROUNDTRIP_RE.search(line)
        if m:
            info.setdefault("roundtrips", {})[m.group(1)] = {"bpb": float(m.group(3)), "time_ms": int(m.group(4))}
        m = SLIDING_RE.search(line)
        if m and "sliding_window:progress" not in line:
            info["sliding"] = {"bpb": float(m.group(2)), "complete": True}
        m = SLIDING_PROG_RE.search(line)
        if m:
            info["sliding_progress"] = {"pct": float(m.group(1)), "wps": float(m.group(2))}
        if STOP_RE.search(line):
            info["stopped_early"] = True
        m = PARAMS_RE.search(line)
        if m:
            info["params"] = int(m.group(1))
        m = WD_RE.search(line)
        if m:
            info["wd"] = {"token": m.group(1), "muon": m.group(3), "scalar": m.group(4)}
    if last_step:
```
parse_training_log() matches weight_decay token:<...> head:<...> muon:<...> scalar:<...> but the captured head value is dropped and the remaining values are stored as strings. This makes the displayed WD incomplete/inconsistent; store all four fields and parse them as floats for reliable formatting/comparisons.
tests/test_search_configs.py (Outdated)
```python
ROOT = Path("/home/spark-advantage/parameter-golf")


class SearchConfigPresetTests(unittest.TestCase):
    def test_v2_wd_sliding_local_has_weight_decay_knobs(self):
        config = load_search_config(ROOT / "search_configs/metastack_v2_wd_sliding_local.yaml")
        self.assertIn("MUON_WEIGHT_DECAY", config.search_space)
        self.assertIn("SCALAR_WEIGHT_DECAY", config.search_space)
        self.assertEqual(config.fixed_env["TOKEN_WEIGHT_DECAY"], 0.0)
        self.assertEqual(config.fixed_env["HEAD_WEIGHT_DECAY"], 0.0)
        self.assertIn("2026-03-20_MetaStack_v2_WD/train_gpt.py", str(config.runner.script_path))
```
This test hard-codes ROOT = Path("/home/spark-advantage/parameter-golf"), which will fail on CI and any developer machine with a different checkout location. Derive the repo root from __file__ (e.g., Path(__file__).resolve().parents[1]) or use relative paths so the test is portable.
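For instance, with the repo layout `<root>/tests/test_search_configs.py`, the root is two path components up from the test file. The standalone helper below is hypothetical; in the test module itself this collapses to `ROOT = Path(__file__).resolve().parents[1]`:

```python
from pathlib import Path

def repo_root(test_file: str) -> Path:
    """Repo root for a test living at <root>/tests/<name>.py."""
    # parents[0] is the tests/ directory, parents[1] is the checkout root.
    return Path(test_file).resolve().parents[1]
```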
```bash
start_monitors() {
  (
    printf 'timestamp,index,name,util_gpu,util_mem,mem_used_mb,mem_total_mb,power_w,temp_c\n'
    while true; do
      local now
      now="$(timestamp)"
      while IFS=, read -r idx name util_gpu util_mem mem_used mem_total power temp; do
        [[ -z "${idx:-}" ]] && continue
        printf '%s,%s,%s,%s,%s,%s,%s,%s,%s\n' "$now" "$idx" "$name" "$util_gpu" "$util_mem" "$mem_used" "$mem_total" "$power" "$temp"
      done < <(nvidia-smi --query-gpu=index,name,utilization.gpu,utilization.memory,memory.used,memory.total,power.draw,temperature.gpu --format=csv,noheader,nounits 2>/dev/null || true)
      sleep 5
    done
  ) >"$GPU_MONITOR_LOG" &
  GPU_MONITOR_PID="$!"

  (
    printf 'timestamp,fs_size_bytes,fs_used_bytes,fs_avail_bytes,mem_total_bytes,mem_available_bytes,load1\n'
    while true; do
      local now fs size used avail mem_total mem_avail load1
      now="$(timestamp)"
      read -r size used avail < <(df -B1 --output=size,used,avail "$WORKDIR" | awk 'NR==2 {print $1, $2, $3}')
      read -r mem_total mem_avail < <(free -b | awk '/^Mem:/ {print $2, $7}')
      read -r load1 _ < /proc/loadavg
      printf '%s,%s,%s,%s,%s,%s,%s\n' "$now" "$size" "$used" "$avail" "$mem_total" "$mem_avail" "$load1"
      sleep 15
```
start_monitors() runs its loops inside subshells (...) but uses local now / local now fs ... inside those subshells. In bash, local is only valid inside a function, so this will print an error and can stop the monitor loops. Remove local (or restructure into a real helper function invoked normally).
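A sketch of the suggested restructuring: each loop body is hoisted into a named helper function so `local` always appears directly inside a function body, and `start_monitors` only backgrounds the helpers. The loop is abbreviated to a single sampling pass here, and `gpu_monitor_loop` is an illustrative name:

```shell
timestamp() { date -u +%Y-%m-%dT%H:%M:%SZ; }

gpu_monitor_loop() {
  # `local` is unambiguously inside a function body now.
  local now
  printf 'timestamp,index,name\n'
  now="$(timestamp)"
  # ...one nvidia-smi sampling pass would go here; a stub row stands in:
  printf '%s,0,stub\n' "$now"
}

start_monitors() {
  gpu_monitor_loop >"$GPU_MONITOR_LOG" &
  GPU_MONITOR_PID="$!"
  wait "$GPU_MONITOR_PID"
}
```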
```bash
run_unit_tests() {
  CURRENT_PHASE="unit_tests"
  write_status "$CURRENT_PHASE" "running" "running deployment unit tests"
  run_logged "unit tests" "$PYTHON_BIN" -m unittest tests.test_search_runner tests.test_vast_render_remote_config
```
The remote bootstrap script runs python -m unittest tests.test_search_runner tests.test_vast_render_remote_config, but tests/test_vast_render_remote_config.py is not present in this PR/repo. This will make every remote run fail during the unit test phase; either add the missing test module or remove/replace it with the intended test(s).
Suggested change:

```diff
-run_logged "unit tests" "$PYTHON_BIN" -m unittest tests.test_search_runner tests.test_vast_render_remote_config
+run_logged "unit tests" "$PYTHON_BIN" -m unittest tests.test_search_runner
```
deploy/vast/create_vast_template.sh (Outdated)
```bash
VAST_BIN="${VAST_BIN:-$ROOT_DIR/.venv-vastai/bin/vastai}"
TEMPLATE_NAME="${PG_TEMPLATE_NAME:-parameter-golf-8xh100}"
IMAGE="${PG_TEMPLATE_IMAGE:-nvcr.io/nvidia/pytorch:25.12-py3}"
DISK_SPACE="${PG_TEMPLATE_DISK_GB:-200}"
```
Defaults are inconsistent: DISK_SPACE defaults to 200GB, but SEARCH_PARAMS includes disk_space>=300. With defaults, the created template configuration won’t match its own search filter expectations. Align the default PG_TEMPLATE_DISK_GB with the search params (or drop disk_space>=... from the default filter).
Suggested change:

```diff
-DISK_SPACE="${PG_TEMPLATE_DISK_GB:-200}"
+DISK_SPACE="${PG_TEMPLATE_DISK_GB:-300}"
```
```python
from scipy.stats.qmc import Sobol
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, DotProduct, Matern, WhiteKernel
```
This module unconditionally imports scipy and scikit-learn, but the repo’s top-level requirements.txt doesn’t include them. As-is, a clean install will fail to import search.protein_lite (and the added unit tests will fail). Add these dependencies to the standard requirements (or make the import optional with a clear error message/instructions).
| from scipy.stats.qmc import Sobol | |
| from sklearn.gaussian_process import GaussianProcessRegressor | |
| from sklearn.gaussian_process.kernels import ConstantKernel, DotProduct, Matern, WhiteKernel | |
| try: | |
| from scipy.stats.qmc import Sobol | |
| from sklearn.gaussian_process import GaussianProcessRegressor | |
| from sklearn.gaussian_process.kernels import ConstantKernel, DotProduct, Matern, WhiteKernel | |
| _SCIPY_SKLEARN_AVAILABLE = True | |
| except ImportError: # pragma: no cover - optional dependencies | |
| _SCIPY_SKLEARN_AVAILABLE = False | |
| class _MissingOptionalDependency: | |
| def __init__(self, *args, **kwargs): | |
| raise ImportError( | |
| "Optional dependencies 'scipy' and 'scikit-learn' are required for " | |
| "Sobol sampling and Gaussian process models in 'search.protein_lite'. " | |
| "Install them with:\n\n pip install scipy scikit-learn\n" | |
| ) | |
| # Placeholders that raise a clear error if used without the optional deps | |
| Sobol = _MissingOptionalDependency | |
| GaussianProcessRegressor = _MissingOptionalDependency | |
| ConstantKernel = _MissingOptionalDependency | |
| DotProduct = _MissingOptionalDependency | |
| Matern = _MissingOptionalDependency | |
| WhiteKernel = _MissingOptionalDependency | |
| warnings.warn( | |
| "Optional dependencies 'scipy' and 'scikit-learn' are not installed. " | |
| "Functionality requiring Sobol sampling or Gaussian process models in " | |
| "'search.protein_lite' will not be available.", | |
| RuntimeWarning, | |
| ) |
- monitor.py: fix indentation bug in search results display loop
- monitor.py: include dropped 'head' weight_decay field in parsed log
- test_search_configs.py: derive repo root from __file__ instead of hardcoded path
- create_vast_template.sh: align disk default (300GB) with search filter
- commit missing test_vast_render_remote_config.py referenced by bootstrap
<3