MetaStack v3: 1.1792 sliding bpb, 10L BigramHash SmearGate OrthoInit SWA #205

xinpw8 wants to merge 3 commits into openai:main from
Conversation
10-layer GPT with BigramHash embeddings, SmearGate, OrthoInit, SWA (30 checkpoints), Muon WD=0.04, mixed int5(MLP)/int6(attn) quantization, 2% magnitude pruning. Sliding window eval at stride=64, seq_len=1024.
val_bpb: 1.1792 (sliding window int6+zstd22, 12.1MB artifact). Trained 600s on 8xH100 SXM, seed 1337, step 7819.
Includes: search harness (ProteinLite), Vast.ai deploy pipeline, live GPU monitor, v1/v2 lineage, test suite.
Pull request overview
This PR introduces a standalone hyperparameter search harness (“ProteinLite” + runner/log parser), a live monitoring utility, and a Vast.ai deployment pipeline to support rapid Parameter Golf experimentation and remote execution, alongside new MetaStack v2/v3 record artifacts.
Changes:
- Add search/ harness (config loader, run launcher, log parsing, live-status tracking, optimizer) plus unit tests.
- Add Vast.ai deployment scripts (remote bootstrap, config rendering, DDP smoke checks) and a live GPU/training monitor.
- Add new record folders/config presets for MetaStack v2 WD + MetaStack v3 competitive submission metadata.
Reviewed changes
Copilot reviewed 33 out of 34 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| tools/monitor.py | Live SSH/local GPU + log + search-run status monitor. |
| search/__init__.py | Declares search package. |
| search/config.py | YAML-backed config loading for runner/search space/settings. |
| search/log_parser.py | Parses training logs into structured metrics + objective selection. |
| search/protein_lite.py | Implements ProteinLite optimizer (Sobol warm start + GP guidance). |
| search/runner.py | Builds commands/env and launches runs while teeing output. |
| search/run_search.py | Orchestrates search loop, live status JSON, and result persistence. |
| search_configs/metastack_v2_wd_smoke.yaml | Smoke search preset for v2 WD. |
| search_configs/metastack_v2_wd_sliding_local.yaml | Local sliding-eval search preset for v2 WD. |
| search_configs/metastack_v2_wd_sliding_remote.yaml | Remote 8-GPU sliding-eval search preset for v2 WD. |
| tests/test_search_configs.py | Unit test coverage for config presets. |
| tests/test_search_live_status.py | Unit tests for live status parsing/phase tracking. |
| tests/test_search_log_parser.py | Unit tests for log parsing/objective selection. |
| tests/test_search_protein_lite.py | Unit tests for deterministic suggestions + scoring semantics. |
| tests/test_search_run_search.py | Unit tests for observation building + log path resolution. |
| tests/test_search_runner.py | Unit tests for command building + launch output streaming. |
| deploy/vast/requirements.remote.txt | Remote runtime deps for search/training bootstrap. |
| deploy/vast/rsync_excludes.txt | Excludes large/ephemeral artifacts from rsync deploy. |
| deploy/vast/render_remote_config.py | Rewrites configs for remote workdir/python/log/output paths. |
| deploy/vast/remote_bootstrap_and_run.sh | End-to-end remote preflight/bootstrap/tests/data fetch/search runner. |
| deploy/vast/Dockerfile.amd64 | Builds an amd64 runtime image with remote deps installed. |
| deploy/vast/ddp_smoke.py | Minimal NCCL all-reduce sanity check. |
| deploy/vast/deploy_and_launch.py | Syncs repo to Vast instance and launches remote workflow over SSH. |
| deploy/vast/deploy_and_launch.sh | Thin wrapper to run deploy_and_launch.py. |
| deploy/vast/create_vast_template.sh | Helper to create Vast template with required env/search params. |
| deploy/vast/build_amd64_image.sh | Helper to build/push amd64 docker image for Vast runs. |
| records/track_10min_16mb/2026-03-19_MetaStack_v1/train_gpt.py | Adds v1 lineage trainer snapshot to records. |
| records/track_10min_16mb/2026-03-20_MetaStack_v2_WD/train_gpt.py | Adds v2 WD trainer snapshot to records. |
| records/track_10min_16mb/2026-03-20_MetaStack_v2_WD/README.md | Documents v2 WD thesis/results/repro steps. |
| records/track_10min_16mb/2026-03-20_MetaStack_v2_WD/EXPERIMENT_LEDGER.md | Append-only experiment ledger for v2 WD. |
| records/track_10min_16mb/2026-03-20_MetaStack_v3_competitive/README.md | Documents v3 competitive model details + repro command. |
| records/track_10min_16mb/2026-03-20_MetaStack_v3_competitive/submission.json | Submission metadata for v3 competitive artifact. |
| .gitignore | Ignores Vast venv + deploy run outputs + logs. |
```python
for r in completed:
    rid = r.get("run_id", "?")[:35]
    st = r.get("status", "?")[:12]
    pre = r.get("terminal_prequant_bpb")
    i6 = r.get("final_int6_bpb")
    sl = r.get("sliding_window_bpb")
    sz = r.get("int6_artifact_bytes")
    color = "\033[32m" if st == "completed" else "\033[31m"
    pre_s = f"{pre:.4f}" if pre else "N/A"
i6_s = f"{i6:.4f}" if i6 else "N/A"
sl_s = f"{sl:.4f}" if sl else "N/A"
sz_s = format_bytes(sz) if sz else "N/A"
lines.append(f" {rid:<35} {color}{st:<12}\033[0m {pre_s:>8} {i6_s:>8} {sl_s:>8} {sz_s:>10}")
lines.append("")
```
The formatting block for completed search runs is mis-indented: i6_s/sl_s/sz_s and the lines.append(...) call are outside the for r in completed: loop, which will raise UnboundLocalError (or even a syntax/indentation error depending on editor) and only print one row. Re-indent those lines inside the loop (and consider using is not None checks instead of if pre so 0.0 doesn’t display as N/A).
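A hedged sketch of the corrected block: everything is re-indented inside the loop and the truthiness checks become `is not None` guards. The wrapper function and the `format_bytes` stub are assumptions for illustration, not the PR's actual code.

```python
def format_bytes(n: int) -> str:
    # Stand-in for the monitor's real helper.
    return f"{n / 1e6:.1f}MB"

def render_completed(completed: list[dict]) -> list[str]:
    lines = []
    for r in completed:
        rid = r.get("run_id", "?")[:35]
        st = r.get("status", "?")[:12]
        pre = r.get("terminal_prequant_bpb")
        i6 = r.get("final_int6_bpb")
        sl = r.get("sliding_window_bpb")
        sz = r.get("int6_artifact_bytes")
        color = "\033[32m" if st == "completed" else "\033[31m"
        # `is not None` so a legitimate 0.0 metric is not rendered as N/A
        pre_s = f"{pre:.4f}" if pre is not None else "N/A"
        i6_s = f"{i6:.4f}" if i6 is not None else "N/A"
        sl_s = f"{sl:.4f}" if sl is not None else "N/A"
        sz_s = format_bytes(sz) if sz is not None else "N/A"
        lines.append(f" {rid:<35} {color}{st:<12}\033[0m {pre_s:>8} {i6_s:>8} {sl_s:>8} {sz_s:>10}")
    lines.append("")
    return lines
```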
```python
def run_remote(ssh_cmd: str, cmd: str, timeout: int = 10) -> str | None:
    try:
        result = subprocess.run(
            ssh_cmd.split() + [cmd],
            capture_output=True, text=True, timeout=timeout,
        )
```
run_remote() tokenizes the provided SSH command with ssh_cmd.split(), which breaks when flags include quoted values (e.g. -o StrictHostKeyChecking=accept-new) or paths with spaces. Use shlex.split(ssh_cmd) to parse the SSH command line robustly.
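For example, with an illustrative SSH command line, `str.split` and `shlex.split` agree on simple flags but diverge as soon as a value is quoted:

```python
import shlex

# Simple flags: both tokenizers agree.
simple = "ssh -o StrictHostKeyChecking=accept-new -p 2222 root@host"
assert simple.split() == shlex.split(simple)

# A quoted value (e.g. an identity file whose path contains a space)
# breaks naive whitespace splitting but is handled by shlex.
quoted = 'ssh -i "/keys/my key" root@host'
print(quoted.split())       # ['ssh', '-i', '"/keys/my', 'key"', 'root@host']
print(shlex.split(quoted))  # ['ssh', '-i', '/keys/my key', 'root@host']
```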
```python
def run_local(cmd: str, timeout: int = 10) -> str | None:
    try:
        result = subprocess.run(
            cmd, shell=True, capture_output=True, text=True, timeout=timeout,
        )
        if result.returncode == 0:
            return result.stdout
    except (subprocess.TimeoutExpired, Exception):
        pass
    return None


def run_cmd(ssh_cmd: str | None, cmd: str, timeout: int = 10) -> str | None:
    if ssh_cmd:
        return run_remote(ssh_cmd, cmd, timeout)
    return run_local(cmd, timeout)
```
run_local() executes cmd with shell=True while many call sites interpolate user-provided workdir/paths into the command string. This is both fragile (spaces/quoting) and exposes command-injection risk if the monitor is pointed at an untrusted workdir. Prefer subprocess.run([...]) with argument lists (and quote/escape remote paths similarly).
```python
WD_RE = re.compile(r"weight_decay\s+token:([\d.]+)\s+head:([\d.]+)\s+muon:([\d.]+)\s+scalar:([\d.]+)")


def parse_training_log(text: str) -> dict:
    """Parse key metrics from a training log."""
    info: dict = {}
    last_step = None
    last_val = None
    for line in text.splitlines():
        m = STEP_RE.search(line)
        if m:
            step, total = int(m.group(1)), int(m.group(2))
            last_step = {"step": step, "total": total, "time_ms": int(m.group(5)), "avg_ms": float(m.group(6))}
            if m.group(3):
                last_val = {"val_loss": float(m.group(3)), "val_bpb": float(m.group(4)), "step": step, "total": total}
        m = QUANT_RE.search(line)
        if m:
            info["quant"] = {"int8_bpb": float(m.group(1)), "int6_bpb": float(m.group(2)),
                            "int8_sz": int(m.group(3)), "int6_sz": int(m.group(4))}
        m = ROUNDTRIP_RE.search(line)
        if m:
            info.setdefault("roundtrips", {})[m.group(1)] = {"bpb": float(m.group(3)), "time_ms": int(m.group(4))}
        m = SLIDING_RE.search(line)
        if m and "sliding_window:progress" not in line:
            info["sliding"] = {"bpb": float(m.group(2)), "complete": True}
        m = SLIDING_PROG_RE.search(line)
        if m:
            info["sliding_progress"] = {"pct": float(m.group(1)), "wps": float(m.group(2))}
        if STOP_RE.search(line):
            info["stopped_early"] = True
        m = PARAMS_RE.search(line)
        if m:
            info["params"] = int(m.group(1))
        m = WD_RE.search(line)
        if m:
            info["wd"] = {"token": m.group(1), "muon": m.group(3), "scalar": m.group(4)}
    if last_step:
```
parse_training_log() matches weight_decay token:<...> head:<...> muon:<...> scalar:<...> but the captured head value is dropped and the remaining values are stored as strings. This makes the displayed WD incomplete/inconsistent; store all four fields and parse them as floats for reliable formatting/comparisons.
tests/test_search_configs.py (Outdated)
```python
ROOT = Path("/home/spark-advantage/parameter-golf")


class SearchConfigPresetTests(unittest.TestCase):
    def test_v2_wd_sliding_local_has_weight_decay_knobs(self):
        config = load_search_config(ROOT / "search_configs/metastack_v2_wd_sliding_local.yaml")
        self.assertIn("MUON_WEIGHT_DECAY", config.search_space)
        self.assertIn("SCALAR_WEIGHT_DECAY", config.search_space)
        self.assertEqual(config.fixed_env["TOKEN_WEIGHT_DECAY"], 0.0)
        self.assertEqual(config.fixed_env["HEAD_WEIGHT_DECAY"], 0.0)
        self.assertIn("2026-03-20_MetaStack_v2_WD/train_gpt.py", str(config.runner.script_path))
```
This test hard-codes ROOT = Path("/home/spark-advantage/parameter-golf"), which will fail on CI and any developer machine with a different checkout location. Derive the repo root from __file__ (e.g., Path(__file__).resolve().parents[1]) or use relative paths so the test is portable.
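For instance, with the repo layout `<root>/tests/test_search_configs.py`, the root is two path components up from the test file. The standalone helper below is hypothetical; in the test module itself this collapses to `ROOT = Path(__file__).resolve().parents[1]`:

```python
from pathlib import Path

def repo_root(test_file: str) -> Path:
    """Repo root for a test living at <root>/tests/<name>.py."""
    # parents[0] is the tests/ directory, parents[1] is the checkout root.
    return Path(test_file).resolve().parents[1]
```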
```bash
start_monitors() {
  (
    printf 'timestamp,index,name,util_gpu,util_mem,mem_used_mb,mem_total_mb,power_w,temp_c\n'
    while true; do
      local now
      now="$(timestamp)"
      while IFS=, read -r idx name util_gpu util_mem mem_used mem_total power temp; do
        [[ -z "${idx:-}" ]] && continue
        printf '%s,%s,%s,%s,%s,%s,%s,%s,%s\n' "$now" "$idx" "$name" "$util_gpu" "$util_mem" "$mem_used" "$mem_total" "$power" "$temp"
      done < <(nvidia-smi --query-gpu=index,name,utilization.gpu,utilization.memory,memory.used,memory.total,power.draw,temperature.gpu --format=csv,noheader,nounits 2>/dev/null || true)
      sleep 5
    done
  ) >"$GPU_MONITOR_LOG" &
  GPU_MONITOR_PID="$!"

  (
    printf 'timestamp,fs_size_bytes,fs_used_bytes,fs_avail_bytes,mem_total_bytes,mem_available_bytes,load1\n'
    while true; do
      local now fs size used avail mem_total mem_avail load1
      now="$(timestamp)"
      read -r size used avail < <(df -B1 --output=size,used,avail "$WORKDIR" | awk 'NR==2 {print $1, $2, $3}')
      read -r mem_total mem_avail < <(free -b | awk '/^Mem:/ {print $2, $7}')
      read -r load1 _ < /proc/loadavg
      printf '%s,%s,%s,%s,%s,%s,%s\n' "$now" "$size" "$used" "$avail" "$mem_total" "$mem_avail" "$load1"
      sleep 15
```
start_monitors() runs its loops inside subshells (...) but uses local now / local now fs ... inside those subshells. In bash, local is only valid inside a function, so this will print an error and can stop the monitor loops. Remove local (or restructure into a real helper function invoked normally).
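A sketch of the suggested restructuring: each loop body is hoisted into a named helper function so `local` always appears directly inside a function body, and `start_monitors` only backgrounds the helpers. The loop is abbreviated to a single sampling pass here, and `gpu_monitor_loop` is an illustrative name:

```shell
timestamp() { date -u +%Y-%m-%dT%H:%M:%SZ; }

gpu_monitor_loop() {
  # `local` is unambiguously inside a function body now.
  local now
  printf 'timestamp,index,name\n'
  now="$(timestamp)"
  # ...one nvidia-smi sampling pass would go here; a stub row stands in:
  printf '%s,0,stub\n' "$now"
}

start_monitors() {
  gpu_monitor_loop >"$GPU_MONITOR_LOG" &
  GPU_MONITOR_PID="$!"
  wait "$GPU_MONITOR_PID"
}
```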
```bash
run_unit_tests() {
  CURRENT_PHASE="unit_tests"
  write_status "$CURRENT_PHASE" "running" "running deployment unit tests"
  run_logged "unit tests" "$PYTHON_BIN" -m unittest tests.test_search_runner tests.test_vast_render_remote_config
```
The remote bootstrap script runs python -m unittest tests.test_search_runner tests.test_vast_render_remote_config, but tests/test_vast_render_remote_config.py is not present in this PR/repo. This will make every remote run fail during the unit test phase; either add the missing test module or remove/replace it with the intended test(s).
Suggested change:

```diff
-run_logged "unit tests" "$PYTHON_BIN" -m unittest tests.test_search_runner tests.test_vast_render_remote_config
+run_logged "unit tests" "$PYTHON_BIN" -m unittest tests.test_search_runner
```
deploy/vast/create_vast_template.sh (Outdated)
```bash
VAST_BIN="${VAST_BIN:-$ROOT_DIR/.venv-vastai/bin/vastai}"
TEMPLATE_NAME="${PG_TEMPLATE_NAME:-parameter-golf-8xh100}"
IMAGE="${PG_TEMPLATE_IMAGE:-nvcr.io/nvidia/pytorch:25.12-py3}"
DISK_SPACE="${PG_TEMPLATE_DISK_GB:-200}"
```
Defaults are inconsistent: DISK_SPACE defaults to 200GB, but SEARCH_PARAMS includes disk_space>=300. With defaults, the created template configuration won’t match its own search filter expectations. Align the default PG_TEMPLATE_DISK_GB with the search params (or drop disk_space>=... from the default filter).
Suggested change:

```diff
-DISK_SPACE="${PG_TEMPLATE_DISK_GB:-200}"
+DISK_SPACE="${PG_TEMPLATE_DISK_GB:-300}"
```
```python
from scipy.stats.qmc import Sobol
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, DotProduct, Matern, WhiteKernel
```
This module unconditionally imports scipy and scikit-learn, but the repo’s top-level requirements.txt doesn’t include them. As-is, a clean install will fail to import search.protein_lite (and the added unit tests will fail). Add these dependencies to the standard requirements (or make the import optional with a clear error message/instructions).
| from scipy.stats.qmc import Sobol | |
| from sklearn.gaussian_process import GaussianProcessRegressor | |
| from sklearn.gaussian_process.kernels import ConstantKernel, DotProduct, Matern, WhiteKernel | |
| try: | |
| from scipy.stats.qmc import Sobol | |
| from sklearn.gaussian_process import GaussianProcessRegressor | |
| from sklearn.gaussian_process.kernels import ConstantKernel, DotProduct, Matern, WhiteKernel | |
| _SCIPY_SKLEARN_AVAILABLE = True | |
| except ImportError: # pragma: no cover - optional dependencies | |
| _SCIPY_SKLEARN_AVAILABLE = False | |
| class _MissingOptionalDependency: | |
| def __init__(self, *args, **kwargs): | |
| raise ImportError( | |
| "Optional dependencies 'scipy' and 'scikit-learn' are required for " | |
| "Sobol sampling and Gaussian process models in 'search.protein_lite'. " | |
| "Install them with:\n\n pip install scipy scikit-learn\n" | |
| ) | |
| # Placeholders that raise a clear error if used without the optional deps | |
| Sobol = _MissingOptionalDependency | |
| GaussianProcessRegressor = _MissingOptionalDependency | |
| ConstantKernel = _MissingOptionalDependency | |
| DotProduct = _MissingOptionalDependency | |
| Matern = _MissingOptionalDependency | |
| WhiteKernel = _MissingOptionalDependency | |
| warnings.warn( | |
| "Optional dependencies 'scipy' and 'scikit-learn' are not installed. " | |
| "Functionality requiring Sobol sampling or Gaussian process models in " | |
| "'search.protein_lite' will not be available.", | |
| RuntimeWarning, | |
| ) |
- monitor.py: fix indentation bug in search results display loop
- monitor.py: include dropped 'head' weight_decay field in parsed log
- test_search_configs.py: derive repo root from __file__ instead of hardcoded path
- create_vast_template.sh: align disk default (300GB) with search filter
- commit missing test_vast_render_remote_config.py referenced by bootstrap
<3