diff --git a/CHANGELOG.md b/CHANGELOG.md
new file mode 100644
index 0000000..3812ec1
--- /dev/null
+++ b/CHANGELOG.md
@@ -0,0 +1,286 @@
+# Changelog
+
+All notable user-facing and developer-facing changes to iris.
+
+## [Unreleased] — 2026-05-03
+
+The headline of this release is a complete snapshot/rollback stack: capture
+the full machine state to disk, restore it, roll back inside a session, ship
+snapshots between machines over HTTP, and validate that any of the above
+produces deterministic results.
+
+### Added
+
+#### Snapshot system
+
+- **Save/restore/rollback** (`save_snapshot` / `load_snapshot` /
+  `ci_restore` / `ci_rollback` on `Machine`). Captures CPU, MC, IOC, HPC3,
+  REX3, RTC, EEPROM, SCSI, Seeq, and all RAM banks plus the COW disk
+  overlay. Snapshots live under `saves/<name>/`.
+- **In-memory rollback checkpoint** (Phase 2.1): `ci_rollback` skips disk
+  by replaying a cached `RollbackCheckpoint` taken at the last `ci_restore`.
+  Measured ~42 ms per rollback on M2 vs 145–213 ms for the disk path.
+- **Reflink overlay capture** (Phase 1.3): on APFS / btrfs / xfs, snapshot
+  copies of multi-GB COW overlays use `clonefile(2)` / `FICLONE` and consume
+  ~18 MB actual disk for a 4 GB apparent overlay.
+- **Auto-fork-on-restore** (Phase 2.3): `ci_restore` captures the overlay's
+  dirty-sector set so the running session can mutate the disk without
+  poisoning the parent snapshot.
+- **Scratch SCSI volume** (Phase 2.4): a host-controlled raw block device
+  for file injection/extraction without networking. Configure with
+  `scratch = true` in `iris.toml`; iris pre-formats it with a minimal SGI
+  Volume Header so IRIX surfaces it as `/dev/rdsk/dks0dNs0`. CI commands
+  `scratch-write` / `scratch-read` / `scratch-clear` / `scratch-info`. New
+  module `src/sgi_vh.rs`.
+- **Content-addressable chunked RAM** (Phase 3.1): each RAM bank and
+  framebuffer is split into 64 KB chunks, BLAKE3-hashed, stored once under
+  `saves/.cas/`.
Snapshots reference chunks by hash; identical chunks
+  across snapshots share storage. A second snapshot of an unchanged
+  machine adds **zero bytes** to disk. New module `src/chunk_store.rs`.
+- **Snapshot determinism validator** (Phase 3.3): `validate <snapshot>
+  [<n_instructions>]` loads the snapshot twice with peripheral threads
+  stopped, steps each pass `n_instructions` times in-line, and diffs the
+  resulting CPU register digests. 1M instructions in 265 ms. Surfaces
+  `load_state` field omissions, host-wallclock leakage at load time, and
+  unrestored TLB/cache structures. New module `src/validate.rs`.
+- **Snapshot library commands** (Phase 3.2):
+  - `tree` — render snapshot parent-chain hierarchy
+  - `diff <a> <b>` — per-device, per-RAM-chunk, per-COW-sector delta
+  - `gc` — sweep CAS chunks not referenced by any kept snapshot
+- **HTTP snapshot registry** (Phase 3.4): `pull <url> <name>` and `push
+  <url> <name>` ship snapshots between machines. URL layout mirrors disk
+  layout, so any static HTTP server (`python3 -m http.server` against
+  `saves/`) works as a read-only pull source. Pull validates each chunk's
+  BLAKE3 hash; push uploads chunks first and the manifest last so an
+  interrupted push never publishes an incomplete snapshot. Hand-rolled
+  HTTP/1.1 client over `std::net` — no new dependency. New module
+  `src/registry.rs`. Demonstrated 138× speedup on warm pulls (21 ms vs
+  2.9 s) thanks to local-CAS dedup.
+
+#### CI control socket
+
+`--ci` enables a Unix-domain control plane at `/tmp/iris.sock`.
New +newline-delimited JSON commands beyond the existing `start` / `quit` / +`serial-{send,read}` / `wait-serial` / `screenshot`: + +- `save` / `restore` / `rollback` / `list` / `info` / `delete` +- `validate` +- `tree` / `diff` / `gc` +- `scratch-write` / `scratch-read` / `scratch-clear` / `scratch-info` +- `pull` / `push` + +#### Snapshot manifest + +A `snapshot.toml` at the top of every snapshot directory records: +- `schema_version` (currently 3) +- `host_arch` (cross-arch loads are refused — FPU bit-layout differs) +- `iris_git_rev` (warns on mismatch) +- `created_at_unix` +- `parent` (snapshot name this was restored from, if any) +- `description` +- `installed_bundles` + +`tree` walks `parent` to render snapshot lineage; `diff` uses it to +report what changed between two related snapshots; `gc` uses it to +compute the live chunk set. + +#### Tests and validation + +- **Per-device round-trip property tests** (Phase 1.7): every `Saveable` + device has a `save_load_round_trip` test that mutates state, captures + v1 = `save_state()`, loads v1 into a fresh device, captures v2 = + `save_state()`, asserts v1 == v2. Catches `load_state` field omissions + before they corrupt snapshots silently. Covers 10 devices: + `eeprom_93c56`, `ds1x86`, `ioc`, `pit8254`, `mc`, `mips_tlb`, `ps2`, + `z85c30`, `wd33c93a`, `seeq8003`. +- **CiSerialBackend regression test**: round-trips a 53-char single-line + `dd` command through the loopback to prevent regression of the chunked- + input drop bug (see Fixed below). +- 28+ new unit tests across the new modules; all 198+ lib tests pass. + +### Changed + +- **Snapshot schema version bumped twice this release**: + - **v0 → v1** (Phase 1.2): added `snapshot.toml` manifest with + `schema_version`, `host_arch`, `parent`, etc. + - **v1 → v2** (Phase 2.2): per-device state moved from `*.toml` (hex + strings) to `*.bin` (postcard-encoded `BinValue`). cpu state file + shrunk 24% (3.65 MB → 2.79 MB) and parses 3.4× faster (19.7 ms → 5.8 ms). 
+ - **v2 → v3** (Phase 3.1): RAM banks and framebuffers moved from raw + `bank{N}.bin`/`rex3_*.bin` files to the content-addressable chunk + store at `saves/.cas/`. Each snapshot writes a tiny `chunks.bin` + manifest of per-bank/per-framebuffer chunk hashes. + - **Backward compatibility**: load reads any of v0/v1/v2/v3; the + appropriate code path is dispatched off `manifest.schema_version`. + New saves write the highest version. +- **`load_snapshot` refactored** into `load_snapshot_inner` (private) + + `load_snapshot` (public, auto-starts CPU + peripherals on return) + + `load_snapshot_paused` (used by the determinism validator; leaves all + threads stopped). +- **`Machine::with_paused`** helper: briefly stops all device threads to + perform a host-side mutation (used by scratch-write etc.), then + resumes — but only restarts the CPU if it was running before, so + pre-`start` operations don't auto-launch the CPU. +- **iris.toml**: documented `[scsi.2]` scratch-volume block (commented + out by default). New optional fields `scratch: bool` and `size_mb: + Option` on `ScsiDeviceConfig`. + +### Fixed + +- **`cp0_compare` write recalibration: synthetic clock available behind + `--features ci_clock`.** The previous implementation in + `src/mips_core.rs` measured `Instant::now()` between successive + Compare writes to compute a wallclock-stretched `count_step`. Two + passes from the same starting state would see different host + scheduling → different `dt_ns` → different `count_step` → different + timer-interrupt timing → divergent guest execution. With + `--features ci_clock` we swap in `dt_ns = (cycles since last Compare + write) * 10ns` (R4400 ~100 MIPS), giving the Phase 3.3 validator + `deterministic: true` at any N. Default builds keep the wallclock + path so interactive desktop sessions retain real-time IRIX timing. + Tradeoff under `ci_clock`: guest wall-clock no longer tracks host + wall-clock — exactly what reproducible CI wants. 
+- **CiSerialBackend chunked-input loss** (Phase 3.5). The SCC channel-A + RX worker silently dropped bytes when its 8-byte `rx_queue` was full, + producing the symptom `dd if=/dev/rdsk/dks0d2s0 bs=512` arriving at + the IRIX shell as `dd if=/d=512`. Fixed by holding the byte in a + local `pending: Option` slot and retrying instead of dropping — + proper flow control: bytes only leave `host_to_guest` when there's + downstream space. Regression test `long_input_round_trips_without_loss` + in `src/z85c30.rs`. +- **EEPROM round-trip**: discovered during 1.7 testing that the EEPROM + has 128 words (not 256). Test corrected. +- **IOC round-trip**: `load_state` re-runs `update_interrupts()` which + re-derives the MAP_INT0/MAP_INT1 cascade bits in `l0_stat`/`l1_stat`. + Test now calls `update_interrupts` before the first save so the saved + state already reflects the cascade — matches what a real running + machine always shows. +- **Z85c30 default constructor binds TCP** 8880/8881 on `new()`; tests + use `new_null()` instead so two test instances don't race on the same + ports. Also the right choice for CI mode (which already used it). + +### Deprecated / Descoped + +- **Persistent JIT cache** (was Phase 2.5): descoped. Interp on M2 hits + Indy parity (60–100 MIPS for integer code). The plan-cited 1.5–2× JIT + win wasn't worth the maintenance burden of an unstable JIT (still-open + POST hang on M2, prior Loads-tier and store-correctness issues). JIT + code stays mothballed behind the existing `--features jit` flag — + re-enable if a future workload outgrows interp. 
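The CiSerialBackend fix above is a small instance of a general flow-control pattern: hold the in-flight byte, never drop it. A minimal Python sketch of that pattern — illustrative names, an 8-slot queue matching the SCC channel-A RX queue, not iris internals:

```python
from collections import deque

RX_CAPACITY = 8  # matches the SCC channel-A rx_queue described above

def rx_worker_step(host_to_guest, rx_queue, pending):
    """One cycle of the RX worker (sketch, not iris's actual code).
    A byte leaves host_to_guest only when it can eventually be delivered;
    if the downstream queue is full it waits in `pending` instead of
    being dropped (the old behavior that truncated long pastes)."""
    if pending is None and host_to_guest:
        pending = host_to_guest.popleft()
    if pending is not None and len(rx_queue) < RX_CAPACITY:
        rx_queue.append(pending)
        pending = None
    return pending

# Simulate a guest that drains one byte every third worker cycle while
# the host pushes a long command line — nothing may be lost.
host = deque(b'dd if=/dev/rdsk/dks0d2s0 bs=512')
rx, out, pending, step = deque(), bytearray(), None, 0
while host or pending is not None or rx:
    pending = rx_worker_step(host, rx, pending)
    step += 1
    if step % 3 == 0 and rx:
        out.append(rx.popleft())  # slow guest consumer

assert bytes(out) == b'dd if=/dev/rdsk/dks0d2s0 bs=512'
```

The invariant is the one named in the fix: bytes only leave `host_to_guest` when there is downstream space, so back-pressure propagates instead of silently eating input.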
+ +### Module map + +New modules under `src/`: + +| Module | Purpose | +|---|---| +| `sgi_vh.rs` | Minimal SGI Volume Header writer for the scratch volume | +| `chunk_store.rs` | Content-addressable chunk store (BLAKE3, 64 KB) | +| `validate.rs` | Snapshot determinism check (interp two-pass diff) | +| `registry.rs` | Hand-rolled HTTP/1.1 client for snapshot pull/push | + +Existing modules with significant changes: + +| Module | Changes | +|---|---| +| `snapshot.rs` | Manifest, BinValue (postcard), ChunksManifest, write_state/read_state, write_chunks_manifest | +| `machine.rs` | save/load/restore/rollback orchestration, with_paused, scratch_path, schema-version-aware dispatch | +| `ci.rs` | 15+ new commands | +| `mips_exec.rs` | step_n_inline, state_digest, CpuStateDigest | +| `mips_core.rs` | Deterministic `cp0_compare` recalibration | +| `cow_disk.rs` | Reflink-based overlay capture | +| `z85c30.rs` | RX worker pending-byte hold, save_load_round_trip + long_input_round_trips_without_loss tests | +| `config.rs` | scratch + size_mb on ScsiDeviceConfig | + +### Performance numbers (M2 interp) + +| Metric | Value | +|---|---| +| Cold restore (disk) | 145–213 ms | +| In-memory rollback | 42 ms | +| Save (warm CAS, no guest changes) | 232 ms | +| Save (cold CAS, first save) | 851 ms | +| 1 MB scratch-write while CPU running | 31 ms | +| 1M-instruction determinism check | 265 ms | +| Snapshot pull (cold local CAS) | 2.9 s / 268 MB | +| Snapshot pull (warm local CAS) | 21 ms / 3.5 MB metadata | +| 100 snapshots from same parent (estimated) | ~1.5 GB total vs ~27 GB without dedup | + +### Dependencies added + +- `postcard = "1"` — non-self-describing binary serde format for v2 device state and v3 chunks manifest. +- `blake3 = "1"` — content hashing for the CAS chunk store. + +No HTTP client dependency added — `registry.rs` uses `std::net::TcpStream` +directly. 
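The CAS storage model that `chunk_store.rs` and the `blake3` dependency implement is easy to see in miniature. A hedged Python sketch — stdlib SHA-256 standing in for BLAKE3, 4-byte chunks standing in for 64 KB, a dict standing in for `saves/.cas/`:

```python
import hashlib

CHUNK = 4  # iris uses 64 KiB chunks; tiny here so the example stays readable

def snapshot_ram(ram, cas):
    """Split RAM into chunks, store each chunk in the CAS keyed by its
    hash, and return the manifest (the ordered list of chunk hashes)."""
    manifest = []
    for off in range(0, len(ram), CHUNK):
        chunk = ram[off:off + CHUNK]
        h = hashlib.sha256(chunk).hexdigest()
        cas.setdefault(h, chunk)       # dedup: identical content stored once
        manifest.append(h)
    return manifest

def restore_ram(manifest, cas):
    return b''.join(cas[h] for h in manifest)

cas = {}
ram = bytes(32)                        # 32 bytes of zeroed "RAM"
m1 = snapshot_ram(ram, cas)
assert len(cas) == 1                   # all-zero chunks collapse to one entry

m2 = snapshot_ram(ram, cas)            # second snapshot, machine unchanged
assert m2 == m1 and len(cas) == 1      # identical manifest, zero new chunks

ram = ram[:4] + b'dirt' + ram[8:]      # guest dirties exactly one chunk
m3 = snapshot_ram(ram, cas)
assert len(cas) == 2                   # only the changed chunk is stored
assert restore_ram(m3, cas) == ram     # and restore round-trips
```

This is the whole reason a second snapshot of an unchanged machine adds zero bytes: the manifest is new, but every hash it contains already exists in the store.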
+ +--- + +### `iris-ci` wrapper binary + +Driving the CI socket via raw `printf … | nc -U /tmp/iris.sock` proved tedious +and error-prone in real use (long lines, brittle JSON quoting, hand-managed +timeouts, bs=512 foot-guns). New `iris-ci` companion binary replaces all of +that. + +#### Subcommands + +**Direct passthroughs to socket commands:** +`ping`, `start`, `quit`, `save`, `restore`, `rollback`, `list`, `info`, +`delete`, `tree`, `diff`, `gc`, `validate`, `screenshot`, `pull`, `push`, +`serial-send`, `serial-read`, `serial-wait`, `scratch read`, `scratch write`, +`scratch clear`, `scratch info`. + +**High-level macros** for the multi-step rituals that dominate a real CI loop: + +- `iris-ci boot` — the full PROM-menu-to-login dance (start CPU + wait + `Option?` + send `1` + wait `IRIS console login`) in one command. +- `iris-ci login [USER]` — sends username + handles vt100 prompt + waits for + `#`. Defaults to `root`. +- `iris-ci run ""` — sends a shell command, waits for the prompt, + prints just the captured stdout, returns non-zero on guest failure. Uses + csh `$status` by default; `--shell sh` switches to `$?`. Solves the SCC + echo-of-input ambiguity by waiting for `\nIRIS-CI-RC=` (only matches at + the start of the output line, never inside the typed-input echo line). +- `iris-ci put HOST_FILE [--to GUEST_PATH]` — copies a host file into the + guest. Stages bytes in the scratch volume, drives the guest with + `dd if=/dev/rdsk/dks0d2s0 of=… bs=512 count=N` where N is computed + automatically, then truncates the destination to the original byte length + with `dd if=/dev/null of=… bs=1 seek=N count=0`. **The user never types + bs=512 or sector counts.** +- `iris-ci get GUEST_PATH [--to HOST_FILE]` — pulls a guest file out. + Zeros scratch, drives the guest `dd … bs=512 conv=sync,notrunc` to write + with sector padding, looks up the byte count via `wc -c`, reads back + exactly that many bytes from scratch. 
+- `iris-ci script FILE` — runs a sequence of iris-ci commands from a file
+  (one per line, `#` comments, double-quoted args). Each step prints
+  `[ok Nms] <step>` or `[FAIL Nms] <step>: <error>`. Aborts on first
+  failure with non-zero overall exit.
+
+#### Connection options
+
+- Default socket `/tmp/iris.sock`; override with `--socket PATH` or
+  `IRIS_SOCKET` environment variable.
+- `--json` for raw JSON responses (scriptable). `--quiet` for silent-on-success.
+- Exit codes: 0 success, 1 socket/connection error, 2 iris error response,
+  3 local error (file not found, etc.).
+
+#### Implementation
+
+- New binary `iris-ci` at `src/iris_ci_main.rs` (~700 lines), declared as
+  `[[bin]]` in `Cargo.toml`. No new dependencies — reuses the existing
+  `clap`, `serde_json`, and `std::os::unix::net`.
+- Single-request, single-response per invocation. Connects, sends one
+  newline-delimited JSON request, reads one line of response, shuts down
+  the write side so the server's read loop exits cleanly.
+
+#### What this replaced in the manual test runbook
+
+| Before | After |
+|---|---|
+| 6-step PROM-to-shell ritual via `printf` + `nc` | `iris-ci boot && iris-ci login` |
+| `printf '%s\n' '{"cmd":"serial-send",...}' \| nc …` | `iris-ci serial send "..."` |
+| Hand-built `dd if=… bs=512 count=K` recipes for file injection | `iris-ci put localfile.tar` |
+| Hand-built `dd … conv=sync,notrunc` + `wc -c` for extraction | `iris-ci get /tmp/foo --to ./foo.tar` |
+| Multi-line shell sequences with manual error handling | `iris-ci script tests/scenario.iris` |
+| JSON output piped through `head -c` and visually parsed | Pretty-printed tables + `--json` opt-in |
diff --git a/Cargo.toml b/Cargo.toml
index 0c4b439..1ce4d98 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -9,6 +9,11 @@ default-run = "iris"
 debug_cache = []
 developer = []
 developer_ip7 = [] # CP0 Compare/timer calibration stats and debug prints
+# Synthetic deterministic clock for CP0 Compare calibration: dt_ns derived from
+# (instructions
executed) * 10ns instead of host Instant::now(). Required for +# the snapshot determinism validator (Phase 3.3). Default OFF preserves the +# wallclock-anchored timer that interactive desktop builds expect. +ci_clock = [] # Lightning: pedal-to-the-metal build — disables breakpoint checks and traceback buffer updates. # Incompatible with interactive debugging. For end-user / benchmarking builds only. lightning = [] @@ -38,9 +43,12 @@ crossbeam-utils = "0.8" bitfield = "0.14" cpal = "0.15" serde = { version = "1.0.228", features = ["derive"] } +serde_json = "1.0" toml = "1.0.3" -parking_lot = "0.12" +postcard = { version = "1", features = ["alloc"] } +blake3 = "1" png = "0.17" +parking_lot = "0.12" spin = "0.10.0" gdbstub = { version = "0.7", features = ["std"] } gdbstub_arch = "0.3" @@ -86,5 +94,9 @@ path = "src/main.rs" name = "coffdump" path = "src/coffdump.rs" +[[bin]] +name = "iris-ci" +path = "src/iris_ci_main.rs" + diff --git a/README.md b/README.md index 66621bc..09f722e 100644 --- a/README.md +++ b/README.md @@ -71,6 +71,7 @@ cargo run --release --features lightning # disable emulator breakpoi cargo run --release --features jit # enable Cranelift MIPS JIT compiler cargo run --release --features rex-jit # enable REX3 graphics JIT compiler cargo run --release --features tlbvmap # enable 8k slot to tlb entry map (increases cache use but may help depending on host cpu arch) +cargo run --release --features ci_clock # synthetic deterministic CP0 Compare clock (CI/snapshot validator only; loses realtime desktop timing) cargo run --release --features lightning,rex-jit,tlbvmap # recommended for best speed right now ``` @@ -131,6 +132,122 @@ Writes go to `scsi1.raw.overlay`. Monitor commands: - `cow reset` - discard all overlay writes +## Snapshots and rollback + +Capture the full machine state — RAM, every device, plus the COW overlay — into +`saves//`, and restore it later. 
CPU, MC, IOC, HPC3, REX3, RTC, EEPROM, +SCSI controller, and the Seeq Ethernet chip all round-trip. Current schema +version is 3: postcard-encoded binary device state plus content-addressable +chunked RAM under `saves/.cas/`. A second snapshot taken from the same parent +adds **zero bytes** to disk for any RAM region that didn't change — same +storage model as Docker layers. + +From the interactive monitor (`telnet 127.0.0.1 8888`): +``` +save base/desktop # writes saves/base/desktop/ +load base/desktop # restore everything (RAM, devices, disk overlay) +``` + +From `iris-ci` (the wrapper — see CI socket section below): +```bash +iris-ci save base/desktop +iris-ci restore base/desktop # full disk-backed reload (~150 ms cold) +iris-ci rollback # in-memory rewind to last restore (~40 ms) +iris-ci diff base/desktop tests/grep # what changed: devices, RAM chunks, COW sectors +iris-ci validate base/desktop -n 1000000 # bit-deterministic re-execution check (build with --features ci_clock) +iris-ci tree # snapshot parent-chain hierarchy +iris-ci gc # sweep CAS chunks no kept snapshot references +iris-ci pull http://reg/snapshots/base # fetch a snapshot from another machine +``` + +Two restore tiers: +- **`restore `** — full disk-backed reload. ~150 ms. Use after a hard + reset or to switch to a different snapshot. +- **`rollback`** — in-memory rewind to the last `restore` checkpoint. ~40 ms, + no disk I/O. Use this in tight inner test loops where you keep returning to + the same starting state. + +Reflinks are used on APFS / btrfs / xfs so capturing a snapshot of a 4 GB disk +image takes <10 ms and uses ~18 MB of actual disk. + +See [CHANGELOG.md](CHANGELOG.md) for the full feature set, and +[manual_test_runbook.md](manual_test_runbook.md) for a copy-paste tour. + + +## CI control socket and `iris-ci` + +`--ci` enables a Unix-socket control plane for headless automation, plus a +small in-process serial backend so the harness can drive the IRIX console +directly. 
The default socket path is `/tmp/iris.sock`.
+
+```
+cargo run --release --features lightning -- --ci
+```
+
+`cargo build` produces a companion binary, `iris-ci`, that's the **canonical
+way** to drive the socket. Don't bother with raw `nc` + JSON unless you're
+debugging the wrapper itself.
+
+```bash
+# In one terminal: launch iris (Newport window opens, --ci is just an extra channel)
+./target/release/iris --ci
+
+# In another terminal: drive it
+./target/release/iris-ci boot                 # PROM menu → IRIS console login (one cmd)
+./target/release/iris-ci login                # send root + dismiss vt100 prompt + wait #
+./target/release/iris-ci run 'ls /'           # send shell command, get stdout + exit code
+./target/release/iris-ci save base/multiuser
+./target/release/iris-ci put localfile.tar    # copy file into guest, no bs=512 math
+./target/release/iris-ci get /tmp/out --to ./out.tar
+./target/release/iris-ci diff base mutated    # per-device + chunk + cow-sector deltas
+./target/release/iris-ci tree
+./target/release/iris-ci script tests/scenario.iris  # batch-run a sequence of cmds
+```
+
+Run `iris-ci --help` for the full list, or `iris-ci <subcommand> --help` for
+any subcommand. Every operation has a typed clap arg — no JSON quoting, no
+hand-managed timeouts.
+
+For automation that doesn't want to depend on `iris-ci`, the underlying socket
+protocol is newline-delimited JSON; `cmd` and `args` per request, `{ok, data,
+error}` per response. See `src/ci.rs` for the dispatch table.
+
+
+## Scratch volume — file injection without networking
+
+A SCSI device with `scratch = true` is a host-controlled raw block device for
+pushing files into the guest (and pulling artifacts back out) without bringing
+up NFS or anything else. iris pre-formats the underlying file with a minimal
+SGI Volume Header on first run, and exposes it inside IRIX as
+`/dev/rdsk/dks0d2s0`.
+ +Enable in `iris.toml`: +```toml +[scsi.2] +path = "scratch.raw" +cdrom = false +overlay = false +scratch = true +size_mb = 64 +``` + +The easy way (via `iris-ci`): +```bash +iris-ci put localfile.tar # copies host file into the guest +iris-ci get /tmp/output.log --to ./out.log # pulls a guest file out +``` + +`iris-ci put`/`get` handle the IRIX `dd bs=512` sector-alignment quirk +transparently — they compute the right block count from the host file size, +issue the right `dd` recipe to the guest, and truncate to the original byte +length on the receiving end. + +Manual/raw paths (if you want to drive `dd` yourself): +- Reads MUST use `bs=512` (or any 512-multiple); `bs=64` returns "I/O error". +- Writes must be padded to `bs`; add `conv=sync` for short inputs. +- Inside IRIX: `dd if=/dev/rdsk/dks0d2s0 bs=512 | tar xf -` + + ## Input Click the window to grab mouse and keyboard. Right Ctrl releases the grab. @@ -148,8 +265,9 @@ getting IRIX running. These are meant for both humans and AI assistants working on the codebase. - `rules/jit/` - dispatch architecture, store compilation, sync, verify mode, probe tuning -- `rules/irix/` - networking config, keyboard quirks +- `rules/irix/` - networking config, keyboard quirks, csh + scratch raw-device gotchas - `rules/testing/` - disk image handling, avoiding filesystem corruption +- `rules/snapshot/` - snapshot binary format, scratch-volume conventions, round-trip tests, CI overlay paths, **iris-ci as the canonical CI interface** If you're about to touch the JIT dispatch loop, read `rules/jit/dispatch-architecture.md` first. It'll save you a few days. diff --git a/iris.toml b/iris.toml index 1305f1f..1e40a57 100644 --- a/iris.toml +++ b/iris.toml @@ -13,9 +13,11 @@ no_audio = false # PROM ROM image (required). prom = "prom.bin" -# Window scale factor: 1 = native resolution, 2 = 2× for HiDPI/4K monitors. +# Window scale factor. 
Uses logical points (macOS) / DPI-scaled pixels, so +# scale=2 is visibly ~2× bigger than scale=1 on every display. +# Valid: 1, 2, 3, or 4. # Can also be set with the --2x command-line flag (CLI takes precedence). -scale = 1 +scale = 2 # RAM bank sizes in MB. # Each bank must be 0 (absent), 8, 16, 32, 64, or 128. @@ -32,19 +34,45 @@ banks = [128, 128, 0, 0] # Internal hard disk [scsi.1] -path = "scsi1.raw" +path = "irix65_4g.raw" cdrom = false +overlay = true # Internal hard disk +#[scsi.2] +#path = "scsi2.raw" +#cdrom = false + +# Scratch volume for host<->guest file injection without networking. +# iris auto-creates the file at `path` if missing — first 4 KB hold a minimal +# SGI Volume Header (so IRIX recognises the device); the rest (size_mb MB +# minus 4 KB) is the host-controlled payload area. The CI socket exposes +# scratch-write/scratch-read/scratch-clear/scratch-info; offsets passed to +# those commands are relative to the payload start, so the VH is never +# touched. The guest reads the same bytes at offset 0 of /dev/rdsk/dks0dNvol. +# No higher-level format is imposed — typical use is a tar stream: +# host: iris CI: scratch-write {host_path: "bundle.tar"} +# guest: dd if=/dev/rdsk/dks0d2s0 bs=512 | tar xf - +# tar cf - /var/log/foo | dd of=/dev/rdsk/dks0d2s0 bs=512 conv=notrunc +# host: iris CI: scratch-read {to_path: "log.tar"} +# IRIX raw block-device reads must be sector-aligned (bs must be a multiple +# of 512); bs=64 etc returns "Read error: I/O error". +# Implies cdrom=false, overlay=false (the volume must be host-writable, and +# scratch contents intentionally survive snapshot rollback so a freshly +# injected bundle isn't reverted). [scsi.2] -path = "scsi2.raw" -cdrom = false +path = "scratch.raw" +cdrom = false +overlay = false +scratch = true +size_mb = 64 # NFS share — requires unfsd on the host. # The shared directory is exported to the VM at 192.168.0.1:/path (standard NFS port 2049). 
# From IRIX: mount 192.168.0.1:/absolute/path /mnt [nfs] shared_dir = "./shared" +unfsd = "/usr/local/sbin/unfsd" # Port forwarding rules — forward host ports into the guest (IRIX). # proto: "tcp" or "udp" @@ -77,7 +105,7 @@ bind = "localhost" # For a single disc, set path only. # For a changer (cycled with "scsi eject 4" in the monitor), list all # ISO images in `discs`; the first entry is mounted at startup. -[scsi.4] -path = "cdrom4.iso" -cdrom = true -#discs = ["second.iso", "cdrom4.iso", "patches.iso"] +#[scsi.4] +#path = "cdrom4.iso" +#cdrom = true +##discs = ["second.iso", "cdrom4.iso", "patches.iso"] diff --git a/manual_test_runbook.md b/manual_test_runbook.md new file mode 100644 index 0000000..0798b95 --- /dev/null +++ b/manual_test_runbook.md @@ -0,0 +1,254 @@ +# Manual test runbook + +Copy-paste each block in order. The whole sequence runs ~5–10 minutes including +IRIX boot. Uses the `iris-ci` wrapper, not raw `nc` — every command is one short +line, no JSON escaping. + +## Setup + +```bash +cd ~/projects/github/unxmaal/iris + +# Build (produces both `iris` and `iris-ci`) +cargo build --release --features lightning + +# In iris.toml, uncomment the [scsi.2] scratch block: +# path = "scratch.raw" cdrom = false overlay = false +# scratch = true size_mb = 64 + +# Clean state from any prior run +rm -f /tmp/iris.sock /tmp/iris-ci-*-scsi*.overlay scratch.raw 2>/dev/null +rm -rf saves/.cas saves/test-* 2>/dev/null + +# Put iris-ci on PATH so the rest is shorter +alias ci=./target/release/iris-ci +``` + +## Boot iris and IRIX + +```bash +# Launch iris in the background (one terminal, --ci enables the control socket) +./target/release/iris --ci > /tmp/iris.log 2>&1 & +until [ -S /tmp/iris.sock ]; do sleep 1; done + +# Boot to root shell — one command replaces the 6-step PROM-menu dance +ci boot # ~40s on M2 interp +ci login # ~2s; defaults to root with no password +``` + +**Expected:** `boot: ready at login` followed by `login: shell ready`. Total ~42 s. 
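`boot` and `login` are just scripted waits over the serial stream: accumulate output until a marker appears, keep the unconsumed tail for the next wait. The core of any such macro can be sketched in a few lines of Python (illustrative only — the real logic lives inside `iris-ci`):

```python
def wait_for(read_chunk, pattern, buf):
    """Accumulate serial output into `buf` until `pattern` appears.
    Returns everything up to and including the match; the tail stays
    in `buf` for the next wait. A real harness adds a deadline."""
    while pattern not in buf:
        buf += read_chunk()
    cut = buf.index(pattern) + len(pattern)
    matched = bytes(buf[:cut])
    del buf[:cut]
    return matched

# Feed console output with awkward chunk boundaries, the way a real
# serial stream arrives (prompt text per the boot macro description).
chunks = iter([b'Starting up', b'...\nOpti', b'on? ', b'\nIRIS console login: '])
buf = bytearray()
first = wait_for(lambda: next(chunks), b'Option?', buf)
assert first.endswith(b'Option?')
# ... the macro would send '1' here, then wait for the next marker ...
second = wait_for(lambda: next(chunks), b'IRIS console login', buf)
assert second.endswith(b'IRIS console login')
```

Keeping the tail in the shared buffer is what lets a marker that straddles two reads (`Opti` + `on? `) still match.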
+ +--- + +## Test 1 — Bundle install + diff + +Snapshot a clean baseline, inject a "bundle" via the scratch volume, install it +in IRIX, snapshot the result, see exactly what changed. The `put` command +handles the IRIX `dd bs=512` quirk transparently — you never type a sector count. + +```bash +ci save test-1/before + +# Build a small "bundle" on the host +echo "fake bundle, marker=$(date +%s)" > /tmp/bundle.txt +tar -cf /tmp/bundle.tar -C /tmp bundle.txt + +# Inject it into the guest. iris-ci handles bs=512 and truncation. +ci put /tmp/bundle.tar --to /tmp/bundle.tar + +# Extract in the guest +ci run 'cd /tmp && tar xf bundle.tar' +ci run 'cat /tmp/bundle.txt' + +# Snapshot post-install +ci save test-1/after + +# Diff +ci diff test-1/before test-1/after +du -sh saves/.cas +``` + +**Expected:** +- `cat /tmp/bundle.txt` echoes the `marker=` line back from the guest. +- `diff` shows small `bank0/bank1` chunk deltas (a few %), banks 2/3 unchanged, + cow_diff lists new dirty sectors on scsi 1 from the tar extract, devices + changed includes `mc`, `cpu`, `scsi`. +- `du -sh saves/.cas` ≈ 250–260 MB (one snapshot's worth; second snapshot + added almost nothing thanks to CAS dedup). + +--- + +## Test 2 — Rollback inner loop + +The mogrix CI test loop: install bundle → run test → rollback → next bundle. + +```bash +ci save test-2/clean +ci restore test-2/clean # arms the in-memory checkpoint + +for run in 1 2 3 4 5; do + echo "=== run $run ===" + ci run "echo run-$run > /tmp/run.txt && ls /tmp/run.txt" + T=$(date +%s%N) + ci rollback >/dev/null + T2=$(date +%s%N); echo "rollback: $(( (T2-T)/1000000 )) ms" + ci run 'ls /tmp/run.txt 2>&1 || echo missing' +done +``` + +**Expected:** +- Each `rollback` prints in the **40–80 ms range** — in-memory, not disk. +- After every rollback, `ls /tmp/run.txt` says missing (or "No such file") — + RAM and the SCSI overlay both reverted. 
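What this loop relies on — RAM and dirty-sector state snapping back with no disk I/O — is conceptually a cached copy armed at restore time. A toy Python sketch of the idea (not iris's actual `RollbackCheckpoint`, which also covers every device):

```python
import copy

class Machine:
    """Toy stand-in for the emulated machine; illustrative only."""
    def __init__(self):
        self.ram = bytearray(16)
        self.dirty_sectors = {}          # COW overlay writes since restore
        self._checkpoint = None

    def restore(self, state):
        self.ram[:] = state
        self.dirty_sectors.clear()
        # Arm the in-memory checkpoint: rollback replays this cached copy
        # instead of re-reading the snapshot from disk.
        self._checkpoint = (bytes(self.ram), copy.deepcopy(self.dirty_sectors))

    def rollback(self):
        ram, dirty = self._checkpoint
        self.ram[:] = ram
        self.dirty_sectors = copy.deepcopy(dirty)

m = Machine()
m.restore(bytes(range(16)))
m.ram[0] = 0xFF                          # guest mutates RAM...
m.dirty_sectors[7] = b'\x00' * 512       # ...and writes a disk sector
m.rollback()
assert m.ram == bytearray(range(16)) and m.dirty_sectors == {}
```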
+ +--- + +## Test 3 — CAS dedup at scale + +Take 10 snapshots over a brief idle period and confirm disk usage barely grows. + +```bash +for i in 01 02 03 04 05 06 07 08 09 10; do + ci run "date >> /tmp/log" >/dev/null + ci save test-3/snap$i >/dev/null + printf 'snap%s cas=%s\n' "$i" "$(du -sh saves/.cas | cut -f1)" +done + +# Delete every other one and gc +for i in 03 05 07 09; do + ci delete test-3/snap$i >/dev/null +done +ci gc +du -sh saves/.cas +``` + +**Expected:** +- `snap01` ≈ 250 MB. Each subsequent snap adds **<5 MB** (idle guest). +- `gc` reports `removed_chunks > 0` and `bytes_freed > 0`. + +--- + +## Test 4 — Determinism check + +After save, two cold runs of the same instructions should reach identical state. + +```bash +ci save test-4/repeatable +ci validate test-4/repeatable -n 0 # just load → digest +ci validate test-4/repeatable -n 1000000 # run 1M instructions twice + diff +``` + +**Expected:** Both runs print `deterministic for N instructions (PC=0x...)`. The +1M run completes in ~250–300 ms. + +--- + +## Test 5 — Snapshot tree + +`tree` shows parent-chain hierarchy. + +```bash +ci save test-5/base +ci restore test-5/base # restoring stamps `parent` on future saves + +ci run 'echo bundle-A >> /tmp/log' +ci save test-5/grep-A + +ci restore test-5/base +ci run 'echo bundle-B >> /tmp/log' +ci save test-5/grep-B + +ci tree +``` + +**Expected:** the tree shows `test-5/base` at top with `grep-A` and `grep-B` +indented under it. + +--- + +## Test 6 — Script mode + +Replace the test sequence above with a one-line invocation against a `.iris` +file. 
+ +```bash +cat > /tmp/scenario.iris <<'EOF' +# scratch volume + bundle install scenario +ping +save test-6/before +put /tmp/bundle.tar --to /tmp/bundle.tar +run "cd /tmp && tar xf bundle.tar" +run "cat /tmp/bundle.txt" +save test-6/after +diff test-6/before test-6/after +EOF + +ci script /tmp/scenario.iris +``` + +**Expected:** each step prefixed with `[ok Nms]`, plus the natural output +of each command (diff table, etc.). Aborts on first failure. + +--- + +## Test 7 — HTTP registry pull + +Ship a snapshot between two "machines" (same machine, different `saves/`). + +```bash +# Move our latest snapshot into a registry directory +mkdir -p /tmp/iris-reg/snapshots /tmp/iris-reg/cas +cp -r saves/test-1/after /tmp/iris-reg/snapshots/test-1-after +cp -r saves/.cas/* /tmp/iris-reg/cas/ + +# Serve it +( cd /tmp/iris-reg && python3 -m http.server 8765 ) & +SVR=$! +sleep 1 + +# Delete local + pull +rm -rf saves/test-pulled saves/.cas +ci pull http://127.0.0.1:8765 test-pulled +ci pull http://127.0.0.1:8765 test-pulled # second pull, expect 0 chunks + +ci restore test-pulled +ci run 'cat /tmp/bundle.txt' + +# Cleanup +kill $SVR +rm -rf /tmp/iris-reg +``` + +**Expected:** +- First pull fetches all chunks (~270 MB). +- Second pull skips all chunks, transfers only ~3.5 MB of metadata, completes + in ~20 ms. +- Restore + cat shows the bundle marker — full round-trip working. 
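Test 7's crash-safety property — an interrupted push never publishes an incomplete snapshot — falls out of upload order alone: chunks first, manifest last. A Python sketch against an in-memory stand-in for the registry (function and field names are hypothetical, not the wire protocol):

```python
def push(snapshot, registry, fail_after=None):
    """Upload chunks first, manifest last. If the link dies at any point,
    the registry never exposes a manifest whose chunks are missing."""
    sent = 0
    for h, data in snapshot['chunks'].items():
        if fail_after is not None and sent >= fail_after:
            raise ConnectionError('link dropped mid-push')
        registry.setdefault('cas', {})[h] = data
        sent += 1
    # Only once every chunk is uploaded does the snapshot become visible.
    registry.setdefault('manifests', {})[snapshot['name']] = list(snapshot['chunks'])

snap = {'name': 'base', 'chunks': {'h1': b'a', 'h2': b'b', 'h3': b'c'}}
reg = {}
try:
    push(snap, reg, fail_after=2)        # simulate an interrupted push
except ConnectionError:
    pass
assert 'base' not in reg.get('manifests', {})   # nothing published

push(snap, reg)                          # retry; already-uploaded chunks are cheap
assert all(h in reg['cas'] for h in reg['manifests']['base'])
```

The same ordering is why a retry is nearly free: the chunks from the failed attempt are already in the store, so only the remainder plus the manifest move.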
+ +--- + +## Cleanup + +```bash +ci quit +sleep 1 +rm -f /tmp/iris.sock /tmp/iris-ci-*-scsi*.overlay /tmp/iris.log /tmp/bundle.* scratch.raw +rm -rf saves/.cas saves/test-* saves/test-pulled + +# Optionally re-comment [scsi.2] in iris.toml +``` + +--- + +## What each test really proves + +| Test | Validates | +|---|---| +| Setup / boot | iris-ci wrapper + boot/login macros | +| 1 | Scratch volume + put + diff (Phases 2.4, 3.2) | +| 2 | In-memory rollback + COW overlay revert (Phase 2.1) | +| 3 | CAS dedup (Phase 3.1) + gc (Phase 3.2) | +| 4 | Snapshot determinism (Phase 3.3) — guards every future device change | +| 5 | Parent-chain tracking (Phase 1.2) + tree (Phase 3.2) | +| 6 | Script mode — replaces hand-managed multi-step sequences | +| 7 | HTTP registry pull (Phase 3.4) — Docker-layer-style snapshot sharing | diff --git a/rules/irix/irix-csh-scratch-raw-device-gotchas-when-you-cant-use-iris-ci.md b/rules/irix/irix-csh-scratch-raw-device-gotchas-when-you-cant-use-iris-ci.md new file mode 100644 index 0000000..dbf7c9d --- /dev/null +++ b/rules/irix/irix-csh-scratch-raw-device-gotchas-when-you-cant-use-iris-ci.md @@ -0,0 +1,44 @@ +# IRIX csh + scratch raw-device gotchas (when you can't use iris-ci) + +**Keywords:** irix,csh,bs512,scratch,dd,redirect,marker,wait,serial +**Category:** irix + +# IRIX csh + scratch raw-device gotchas + +If you're driving the CI socket without `iris-ci` (raw `nc`, foreign language harness, etc.), these are the pitfalls the wrapper handles for you. Use the wrapper if you can. + +## csh redirect syntax + +IRIX root logs into csh. `2>&1` is sh-only and csh fails to parse it silently. Use: + +- `>& /dev/null` — combined stdout+stderr to /dev/null +- `>& file` — combined stdout+stderr to file +- `>> file` — append stdout (csh has no portable stderr-only redirect) + +If you need sh semantics, wrap in `sh -c "..."`. 
+
+## csh echoes typed input
+
+Any string in the typed command appears in the serial buffer twice — once as the literal input echo, once expanded in the output. A wait pattern of `IRIS-CI-RC=` matches the typed line (which contains the literal `IRIS-CI-RC=$status`) before the command runs.
+
+Use `\nIRIS-CI-RC=` as the wait pattern. The typed line has the marker inline; only the output line starts a fresh line with the marker, so the newline-prefixed pattern only matches the actual output.
+
+## Raw block-device alignment
+
+`/dev/rdsk/dks0dNs0` (the scratch payload partition) requires:
+
+- **Reads** in 512-byte multiples. `dd bs=64` returns `Read error: I/O error`.
+- **Writes** padded to `bs`. From a 28-byte input, `dd bs=512 conv=sync,notrunc` zero-pads to 512.
+
+After a `dd … of=FILE bs=512 count=N` from the scratch device, the guest file is N×512 bytes — too long. Truncate to the real size with `dd if=/dev/null of=FILE bs=1 seek=ORIG count=0`.
+
+## Looking up byte counts in the guest
+
+`ls -l` column layout varies by IRIX version. Use `wc -c < FILE`, which prints just the byte count on one line and is cleanly parseable.
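The alignment, padding, and truncation rules above compose into one round-trip. Here is a host-side sketch with ordinary files standing in for the raw device (the filenames and the 19-byte payload are illustrative, not from iris):

```shell
# Recreate the bs=512 round-trip with plain files (no raw device needed).
cd "$(mktemp -d)"
printf 'hello from the host' > payload                              # 19 bytes
dd if=payload of=device.img bs=512 conv=sync,notrunc 2>/dev/null    # zero-pad to 512
dd if=device.img of=received bs=512 count=1 2>/dev/null             # "guest" read: 512 bytes
size=$(wc -c < payload)                                             # 19
dd if=/dev/null of=received bs=1 seek="$size" count=0 2>/dev/null   # truncate back to 19
cmp payload received && echo round-trip-ok
```

The same `conv=sync` pad, 512-multiple read, and `seek=` truncate are what the wrapper computes for you.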
+
+## See also
+
+- `rules/snapshot/iris-ci-is-the-canonical-ci-socket-interface.md` — the wrapper that hides all of this
+- `rules/snapshot/scratch-scsi-volume-sgi-vh-layout-and-irix-raw-device-gotchas.md` — partition layout and SGI VH details
+- `rules/snapshot/ci-mode-overlay-path-is-tmpiris-ci-pid-scsiidoverlay.md` — where `--ci`'s COW overlay actually lives
+
diff --git a/rules/snapshot/ci-mode-overlay-path-is-tmpiris-ci-pid-scsiidoverlay.md b/rules/snapshot/ci-mode-overlay-path-is-tmpiris-ci-pid-scsiidoverlay.md
new file mode 100644
index 0000000..e9524aa
--- /dev/null
+++ b/rules/snapshot/ci-mode-overlay-path-is-tmpiris-ci-pid-scsiidoverlay.md
@@ -0,0 +1,34 @@
+# CI mode overlay path is /tmp/iris-ci-PID-scsiID.overlay
+
+**Keywords:** ci,overlay,scratch,/tmp,iris-ci,wd33c93a,cow,snapshot,debugging
+**Category:** snapshot
+
+# CI Mode Overlay Path is /tmp-Based, Not Image-Sibling
+
+When iris is invoked with `--ci`, the COW overlay file does NOT live next to the base image (`<image>.overlay`). It goes to `/tmp/iris-ci-PID-scsiID.overlay`. This isolates concurrent CI runs from each other and from any interactive session sharing the same base image.
+
+## Where it's set
+`src/machine.rs:197`:
+```rust
+let ci_overlay = format!("/tmp/iris-ci-{}-scsi{}.overlay", ci_pid, id);
+hpc3.add_scsi_device_with_overlay(id as usize, &path, dev.cdrom, discs, dev.overlay, &ci_overlay)
+```
+
+`src/wd33c93a.rs:255-258` honors the override:
+```rust
+let overlay_path = overlay_path_override
+    .map(|s| s.to_string())
+    .unwrap_or_else(|| format!("{}.overlay", path));
+```
+
+## Implications
+- `rm -f irix65_4g.raw.overlay` before launching `--ci` is a no-op.
+- To inspect the live overlay during a `--ci` run, find it via `lsof -p PID | grep overlay`.
+- After the iris process exits, the CI overlay file remains under `/tmp` until the next reboot or manual cleanup.
+- `save_snapshot` correctly captures the CI overlay regardless of path (it routes through `cow_disk::export_overlay`, which uses `self.overlay_path`).
+
+## Verification
+```
+lsof -p $(pgrep -f 'target/release/iris.*--ci') | grep overlay
+```
+Should show: `/private/tmp/iris-ci-PID-scsi1.overlay`
diff --git a/rules/snapshot/iris-ci-is-the-canonical-ci-socket-interface.md b/rules/snapshot/iris-ci-is-the-canonical-ci-socket-interface.md
new file mode 100644
index 0000000..1175b2c
--- /dev/null
+++ b/rules/snapshot/iris-ci-is-the-canonical-ci-socket-interface.md
@@ -0,0 +1,61 @@
+# iris-ci is the canonical CI socket interface
+
+**Keywords:** iris-ci,wrapper,ci,socket,bs512,csh,run,put,get,boot,login
+**Category:** snapshot
+
+# iris-ci — the right way to drive the CI socket
+
+Built alongside `iris` from `src/iris_ci_main.rs`. Talks to `/tmp/iris.sock` with typed clap subcommands. Use this, not raw `nc` + JSON, for any new automation, runbook, or test scenario.
+
+## Common workflows
+
+```bash
+iris-ci boot                 # PROM menu → IRIS console login (~40s on M2)
+iris-ci login                # send root + handle vt100 prompt + wait for # prompt
+iris-ci run 'echo hello'     # send shell command, get stdout, exit on guest failure
+iris-ci put localfile.tar    # copy host file into guest, no bs=512 math
+iris-ci get /tmp/log --to .  # pull guest file out, no conv=sync math
+iris-ci save base/desktop
+iris-ci diff a b             # per-device + chunk + cow-sector deltas
+iris-ci script tests/x.iris  # batch-run a sequence (one cmd per line, # comments)
+iris-ci pull http://reg/foo bar
+```
+
+`iris-ci --help` for the full subcommand list, `iris-ci <subcommand> --help` for any subcommand.
+
+## Why not raw nc + JSON
+
+Three real bugs that bit during dogfooding and that the wrapper handles for you:
+
+### 1. csh redirect syntax
+
+IRIX root login uses csh by default. `2>&1` is sh-only. Use `>& /dev/null` for combined stdout+stderr in csh.
+
+### 2. 
csh echoes typed input verbatim + +Any wait pattern that appears in your typed command will match the input echo BEFORE the command runs. So a marker like `IRIS-CI-RC=` matches both: +- the typed-input echo line (which contains literal `IRIS-CI-RC=$status`) +- the actual output line (which contains `IRIS-CI-RC=0`) + +Wait for `\nIRIS-CI-RC=` (newline-prefixed) — only matches at the start of the OUTPUT line, never inside the typed-input echo line because the echo is on its own line with no leading `\n` immediately before the marker. + +### 3. IRIX raw block-device gotchas + +- Reads MUST use `bs=512` or any 512-multiple. `bs=64` returns `Read error: I/O error` with no SCSI-level diagnostic. +- Writes must be padded to `bs`. From a 28-byte input file, `dd bs=512 conv=sync,notrunc` zero-pads to 512. Without `conv=sync`, the partial-block write fails. +- After receiving via `dd … of=FILE bs=512 count=N`, the guest file is N×512 bytes — too long. Truncate with `dd if=/dev/null of=FILE bs=1 seek=ORIG count=0`. + +`iris-ci put` and `iris-ci get` handle all three transparently. The user passes a host filename and a guest path; the wrapper computes counts, chooses csh-correct redirects, runs `wc -c` for size lookup, and truncates as needed. + +## When to read the JSON directly + +For automation that doesn't want to depend on `iris-ci` (e.g. a test harness in another language), the underlying socket protocol is newline-delimited JSON. Each request is one JSON object with `cmd` and `args`; each response is one JSON object with `ok` and `data` or `error`. See `src/ci.rs` for the dispatch table. Don't expect to do this comfortably from a shell script — that's why iris-ci exists. + +## Implementation notes + +- Single-request, single-response per invocation. Connect, write one line, read one line, shutdown the write side so the server's read loop exits cleanly. +- `cmd_run` waits for `\nIRIS-CI-RC=` then drains the trailing `\nIRIS N# ` to keep the next command's drain clean. 
Sleeps 150ms between the wait and the trailing read to let those bytes arrive. +- `extract_run_stdout` skips the first `\n` (end of typed echo line), strips the trailing `\nIRIS-CI-RC=` marker, normalises CRLF. +- `cmd_put` uses `dd if=/dev/null of=FILE bs=1 seek=N count=0` for truncation rather than perl; perl isn't reliably installed in IRIX 6.5. +- `cmd_get` uses `wc -c < FILE` for size lookup. Avoids parsing `ls -l` columns which vary across IRIX versions. + diff --git a/rules/snapshot/per-device-saveloadsave-round-trip-is-the-regression-net.md b/rules/snapshot/per-device-saveloadsave-round-trip-is-the-regression-net.md new file mode 100644 index 0000000..106c5b3 --- /dev/null +++ b/rules/snapshot/per-device-saveloadsave-round-trip-is-the-regression-net.md @@ -0,0 +1,49 @@ +# Per-device save→load→save round-trip is the regression net + +**Keywords:** snapshot,round-trip,save_state,load_state,regression,test,convention +**Category:** snapshot + +# Round-Trip Test Convention + +Every device with a `Saveable` impl gets a `save_load_round_trip` test in its `#[cfg(test)] mod tests`. Catches save/load asymmetries that would otherwise corrupt snapshots silently. + +## Pattern + +```rust +#[test] +fn save_load_round_trip() { + let src = Device::new(...); + // 1. Mutate to non-default state. + { + let mut s = src.state.lock(); + // ... touch fields that save_state serializes + } + let v1 = src.save_state(); + + let dst = Device::new(...); + dst.load_state(&v1).expect("load_state"); + let v2 = dst.save_state(); + + assert_eq!(v1, v2, "Device save_state mismatch after load_state round-trip"); +} +``` + +## Conventions + +- **Mutate first.** Saving an all-default state proves nothing — a load that no-ops on every field will pass. +- **Use null/CI constructors when devices bind ports.** Z85c30::new_null avoids TCP 8880/8881; Ioc::new_ci uses null backends. 
+- **If load_state has a side-effect that derives state from other fields, call it on src before saving.** Example: IOC update_interrupts re-derives MAP_INT0/MAP_INT1 cascade bits in l0_stat/l1_stat from (map_stat & map_mask{0,1}). Save the post-derive state so v1 already includes the cascade — otherwise v2 differs by the cascade bits. +- **Disable wall-clock-driven side effects.** RTC: clear TE_BIT before saving so save_state doesn't tick the host clock between v1 and v2. + +## What's covered + +eeprom_93c56, ds1x86, ioc, pit8254, mc, mips_tlb, ps2, z85c30, wd33c93a, seeq8003. + +## What's not (yet) covered + +- hpc3 — composite of nested devices. Round-trip indirectly tested via end-to-end snapshot/restore. +- rex3 — 16 MB framebuffers + massive VC2/CMAP/XMAP state. +- mips_exec — needs Tlb+Cache type params + Bus integration. + +These are exercised by the end-to-end snapshot/restore validation in the CI socket workflow. + diff --git a/rules/snapshot/scratch-scsi-volume-sgi-vh-layout-and-irix-raw-device-gotchas.md b/rules/snapshot/scratch-scsi-volume-sgi-vh-layout-and-irix-raw-device-gotchas.md new file mode 100644 index 0000000..4e8a377 --- /dev/null +++ b/rules/snapshot/scratch-scsi-volume-sgi-vh-layout-and-irix-raw-device-gotchas.md @@ -0,0 +1,40 @@ +# Scratch SCSI volume - SGI VH layout and IRIX raw-device gotchas + +**Keywords:** scratch,sgi,vh,volume,header,partition,scsi,raw,dd,iris,irix,phase2.4 +**Category:** snapshot + +# Scratch SCSI Volume (Phase 2.4) + +A SCSI device with `scratch = true` in `iris.toml` is a host-controlled raw block device for file injection/extraction without networking. iris pre-formats it with a minimal SGI Volume Header. 
+ +## Partition layout + +| Slot | Device node | Purpose | Type | first_block | nblks | +|------|------------------------|--------------------------|-------------|-------------|-------------| +| 0 | /dev/rdsk/dks0dNs0 | Payload (host writes) | PT_RAW=3 | 8 | total - 8 | +| 8 | /dev/rdsk/dks0dNvh | Volume header itself | PT_VOLHDR=0 | 0 | 8 | +| 10 | /dev/rdsk/dks0dNvol | Whole-disk view | PT_VOLUME=6 | 0 | total | + +Slot 10 (vol) is special - by SGI convention it always starts at sector 0 regardless of first_block. Use slot 0 (s0) for payload reads. + +## Host wire format + +scratch-write and scratch-read operate on the payload area. offset = 0 means raw-byte 4096 in the underlying file (the first byte after the VH). The CI commands never touch the VH. + +``` +host: iris CI: scratch-write {host_path: "bundle.tar"} +guest: dd if=/dev/rdsk/dks0d2s0 bs=512 | tar xf - +``` + +## IRIX gotchas + +1. Reads must be sector-aligned. dd bs=64 returns "Read error: I/O error" with no SCSI-level error. Use bs=512 (or any 512-multiple). +2. Writes must be padded to bs. dd bs=512 from a 28-byte file produces "0+1 records in / 0+0 records out" with "Write error: I/O error". Add conv=sync to pad with zeros, plus conv=notrunc if you don't want to truncate the device file: + `dd if=/tmp/data of=/dev/rdsk/dks0d2s0 bs=512 conv=sync,notrunc` +3. Without a valid VH at sector 0, IRIX creates the device nodes but every read returns I/O error. +4. Checksum is required: vh_csum at offset 0x1F8 must make the sum of all 128 big-endian u32 words equal 0. iris computes this in sgi_vh::fix_csum. + +## When to use scratch over unfsd + +unfsd needs a manual build on macOS, is flaky in our experience, and requires IRIX networking before any file movement. The scratch volume works at PROM time, single-user, or any other phase. 
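Gotcha 4's checksum rule is small enough to sketch. The function below is a hypothetical illustration of that rule, not iris's actual `sgi_vh::fix_csum`: zero the csum slot, sum the 512-byte header as 128 big-endian u32 words, and store the wrapping negation at offset 0x1F8 so the total wraps to zero.

```rust
/// Hypothetical sketch of the vh_csum rule (not iris's `sgi_vh::fix_csum`):
/// pick the u32 at byte offset 0x1F8 so that the wrapping sum of all 128
/// big-endian words in the 512-byte volume header is exactly 0.
fn fix_csum(vh: &mut [u8; 512]) {
    vh[0x1F8..0x1FC].copy_from_slice(&[0u8; 4]); // zero the csum slot first
    let mut sum: u32 = 0;
    for w in vh.chunks_exact(4) {
        sum = sum.wrapping_add(u32::from_be_bytes([w[0], w[1], w[2], w[3]]));
    }
    // The negation of the other 127 words' sum makes the total wrap to 0.
    vh[0x1F8..0x1FC].copy_from_slice(&sum.wrapping_neg().to_be_bytes());
}
```

Any non-zero header content works the same way; only the word at 0x1F8 changes.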
+
diff --git a/rules/snapshot/snapshot-manifest-format-snapshottoml-schema-version1.md b/rules/snapshot/snapshot-manifest-format-snapshottoml-schema-version1.md
new file mode 100644
index 0000000..6eb0252
--- /dev/null
+++ b/rules/snapshot/snapshot-manifest-format-snapshottoml-schema-version1.md
@@ -0,0 +1,37 @@
+# Snapshot manifest format (snapshot.toml schema_version=1)
+
+**Keywords:** snapshot,manifest,schema,version,parent,host_arch,iris_git_rev,saves
+**Category:** snapshot
+
+# Snapshot Manifest (snapshot.toml)
+
+Every snapshot saved by Phase 1+ writes a `snapshot.toml` at the top of `saves/<name>/`. It is read FIRST on load so format mismatches fail fast with a clear error, before any device state is touched.
+
+## Schema
+```toml
+schema_version = 1            # u32, current = 1
+host_arch = "aarch64"         # std::env::consts::ARCH at save time
+created_at_unix = 1777764190  # u64 unix seconds
+installed_bundles = []        # Vec<String>; populated by mogrix tooling
+# optional:
+iris_git_rev = "abc123"       # from option_env!("IRIS_GIT_REV") at build time
+parent = "base/desktop"       # name of the snapshot we restored from before this save
+description = "post-mogrix"   # free-form note
+```
+
+## Load behavior (`src/machine.rs:633` `load_snapshot`)
+- **No manifest** → treated as legacy v0 with a warning. Best-effort load. (Old `saves/working*` snapshots are v0.)
+- **`schema_version > 1`** → refuse: "snapshot schema_version N is newer than this iris build supports (1)".
+- **`host_arch` mismatch** → refuse. FPU bit-layout differs cross-arch and there's no migration plumbing yet.
+- **`iris_git_rev` mismatch with current build** → warn but proceed. Snapshots are not pinned to commits.
+
+## Where it's defined
+- `src/snapshot.rs` — `Manifest` struct, `to_toml`/`from_toml`, `Snapshot::write_manifest`/`read_manifest`, `SCHEMA_VERSION` const.
+- `src/machine.rs:594` writes the manifest first thing in `save_snapshot`. `parent` is auto-set to `self.last_restore`.
+- `src/machine.rs:643-672` validates on load.
+
+## CI inspection
+- `info <name>` socket command returns the manifest plus `bytes_on_disk` for any snapshot. Legacy snapshots return `{"schema_version":0,"legacy":true}`.
+
+## Future bumps
+When a device's `save_state` format changes incompatibly, increment `SCHEMA_VERSION` and add migration logic keyed off the old version number. Don't silently break v1 readers.
diff --git a/rules/snapshot/snapshot-v2-device-state-is-postcard-encoded-binvalue-bin.md b/rules/snapshot/snapshot-v2-device-state-is-postcard-encoded-binvalue-bin.md
new file mode 100644
index 0000000..8664839
--- /dev/null
+++ b/rules/snapshot/snapshot-v2-device-state-is-postcard-encoded-binvalue-bin.md
@@ -0,0 +1,46 @@
+# Snapshot v2 device state is postcard-encoded BinValue *.bin
+
+**Keywords:** snapshot,binvalue,postcard,schema_version,binary,device,state
+**Category:** snapshot
+
+# Snapshot v2: Postcard BinValue Device State
+
+For schema_version=2 snapshots, every per-device save_state lives in `<device>.bin` (postcard-encoded) instead of `<device>.toml`.
+
+## Why a tagged enum
+
+Postcard is non-self-describing — it cannot deserialize directly into `toml::Value`, whose Deserialize impl uses `deserialize_any`. To round-trip toml::Value we mirror it as `BinValue` (tagged) in `src/snapshot.rs`:
+
+```rust
+pub enum BinValue {
+    String(String), Integer(i64), Float(f64), Boolean(bool),
+    Array(Vec<BinValue>), Table(Vec<(String, BinValue)>),
+    Datetime(String), // ISO-8601, falls back to String on parse error
+}
+```
+
+The conversion `toml::Value` <-> `BinValue` is a single tree walk; sub-millisecond for typical device tables.
+
+## What's TOML, what's binary
+
+- TOML (text): `snapshot.toml` (manifest), `cow.toml` (overlay dirty sectors).
+- Binary (postcard): `cpu.bin`, `mc.bin`, `ioc.bin`, `scc.bin`, `pit.bin`, `ps2.bin`, `rtc.bin`, `eeprom.bin`, `scsi.bin`, `seeq.bin`, `hpc3.bin`, `rex3.bin` (when REX3 present).
+
+- Raw (untouched): `bank0..3.bin` (RAM), `rex3_rgb/aux.bin` (framebuffers), `scsi*.overlay` (COW).
+
+## Save/load helpers
+
+`Snapshot::write_state(base, &Value, schema_version)` picks `.bin` for v2+ and `.toml` for legacy. Mirror: `read_state(base, schema_version)`. v2 read also falls back to .toml when .bin is missing — half-migrated snapshots from external tooling still load.
+
+## Performance
+
+cpu state on M2 (3.6 MB cpu.toml legacy):
+- TOML parse: 19.7 ms avg
+- Postcard decode + BinValue->Value: 5.8 ms avg
+- 3.4x speedup, 24% size reduction (2.79 MB bin vs 3.65 MB toml)
+
+End-to-end cold restore: 189 ms (v1) -> 145 ms (v2), ~23%.
+
+## Backward compatibility
+
+Manifest read first. Missing manifest -> schema_version=0, loads `*.toml`. Manifest version > SCHEMA_VERSION -> hard refuse. host_arch mismatch -> hard refuse (FPU bit-layout differs cross-arch).
+
diff --git a/rust-toolchain.toml b/rust-toolchain.toml
new file mode 100644
index 0000000..2bf54c6
--- /dev/null
+++ b/rust-toolchain.toml
@@ -0,0 +1,3 @@
+[toolchain]
+channel = "nightly"
+components = ["rustc", "cargo", "rust-std", "clippy", "rustfmt"]
diff --git a/src/chunk_store.rs b/src/chunk_store.rs
new file mode 100644
index 0000000..6ee7e68
--- /dev/null
+++ b/src/chunk_store.rs
@@ -0,0 +1,321 @@
+//! Phase 3.1: content-addressable chunk store for snapshot RAM.
+//!
+//! Each snapshot's RAM banks are split into 64 KB chunks, BLAKE3-hashed,
+//! and stored as `saves/.cas/<aa>/<tail>.chunk` (sharded by the first byte to
+//! keep any one directory under a few thousand files). Snapshots reference
+//! chunks by hash; identical chunks across snapshots share storage. A
+//! `mogrix-bundle-test` workflow that snapshots between every install
+//! shares 95–99% of RAM with its parent, so adding a new snapshot costs
+//! only the bytes that actually changed.
+//!
+//! Layout:
+//! ```text
+//! saves/.cas/
+//!   ab/
+//!     cd1234...beef.chunk   ← BLAKE3 hash, hex64, raw 64KB content
+//!   cd/
+//!     ef9876...cafe.chunk
+//! ```
+//!
+//! On-disk chunks are immutable (CAS). `gc(live_set)` deletes any chunk
+//! whose hash isn't referenced by a kept snapshot's manifest — cheap to run
+//! and the only way to actually free space (since `delete <name>` only
+//! removes the manifest, not the underlying chunks).
+
+use std::collections::HashSet;
+use std::fs;
+use std::io::{self, Read, Write};
+use std::path::{Path, PathBuf};
+
+/// Chunk size in bytes. 64 KB is the plan-cited sweet spot — small enough
+/// that a few-page write to RAM only dirties one chunk, large enough that
+/// per-chunk hashing + filesystem overhead doesn't dominate.
+pub const CHUNK_SIZE: usize = 64 * 1024;
+
+const CAS_DIR: &str = ".cas";
+const CHUNK_EXT: &str = "chunk";
+
+/// 32-byte BLAKE3 digest.
+pub type ChunkHash = [u8; 32];
+
+pub struct ChunkStore {
+    root: PathBuf,
+}
+
+impl ChunkStore {
+    /// `saves_dir` is e.g. `Path::new("saves")`. The chunk store lives at
+    /// `saves_dir/.cas/`.
+    pub fn new(saves_dir: impl AsRef<Path>) -> Self {
+        Self { root: saves_dir.as_ref().join(CAS_DIR) }
+    }
+
+    pub fn root(&self) -> &Path { &self.root }
+
+    /// Hash `data`, write it as `saves/.cas/<aa>/<tail>.chunk` if absent,
+    /// return the hash. Idempotent — concurrent saves of the same chunk are
+    /// safe; the second call is a no-op.
+    ///
+    /// Crash-safety: chunks are written to a `.tmp` sibling then renamed
+    /// (atomic on POSIX), so a partial write never appears under the final
+    /// content-addressed name. We deliberately skip per-chunk `fsync` —
+    /// 4096 fsyncs per snapshot was costing ~20 s on APFS for the first
+    /// save of a 256 MB image. If the process dies mid-save the manifest
+    /// (`chunks.bin`) hasn't been written yet, so any complete chunks are
+    /// just orphaned bytes that `gc` will sweep later.
+    pub fn put(&self, data: &[u8]) -> io::Result<ChunkHash> {
+        let hash: ChunkHash = blake3::hash(data).into();
+        let path = self.path_for(&hash);
+        if path.exists() {
+            return Ok(hash);
+        }
+        if let Some(dir) = path.parent() {
+            fs::create_dir_all(dir)?;
+        }
+        let tmp = path.with_extension("chunk.tmp");
+        {
+            let mut f = fs::File::create(&tmp)?;
+            f.write_all(data)?;
+        }
+        // Rename is atomic on POSIX. If two threads raced, the loser's
+        // rename overwrites the winner's identical content — fine.
+        fs::rename(&tmp, &path)?;
+        Ok(hash)
+    }
+
+    pub fn get(&self, hash: &ChunkHash) -> io::Result<Vec<u8>> {
+        let path = self.path_for(hash);
+        let mut f = fs::File::open(&path)?;
+        let mut data = Vec::with_capacity(CHUNK_SIZE);
+        f.read_to_end(&mut data)?;
+        Ok(data)
+    }
+
+    pub fn has(&self, hash: &ChunkHash) -> bool {
+        self.path_for(hash).exists()
+    }
+
+    /// Remove any chunk whose hash isn't in `live`. Returns (removed_count,
+    /// removed_bytes). Safe to interrupt — chunks not yet visited stay.
+    pub fn gc(&self, live: &HashSet<ChunkHash>) -> io::Result<(usize, u64)> {
+        if !self.root.is_dir() {
+            return Ok((0, 0));
+        }
+        let mut removed = 0usize;
+        let mut bytes_removed = 0u64;
+        for shard in fs::read_dir(&self.root)? {
+            let shard = shard?;
+            if !shard.file_type()?.is_dir() { continue; }
+            for chunk in fs::read_dir(shard.path())? {
+                let chunk = chunk?;
+                let path = chunk.path();
+                let Some(stem) = path.file_stem().and_then(|s| s.to_str()) else { continue };
+                let Some(hash) = parse_hex62(stem, &shard.file_name().to_string_lossy()) else { continue };
+                if !live.contains(&hash) {
+                    let size = chunk.metadata().map(|m| m.len()).unwrap_or(0);
+                    if fs::remove_file(&path).is_ok() {
+                        removed += 1;
+                        bytes_removed += size;
+                    }
+                }
+            }
+        }
+        Ok((removed, bytes_removed))
+    }
+
+    /// Total bytes occupied by the chunk store. Useful for `info` reporting.
+    pub fn total_size(&self) -> io::Result<u64> {
+        if !self.root.is_dir() { return Ok(0); }
+        let mut total = 0u64;
+        for shard in fs::read_dir(&self.root)? {
+            let shard = shard?;
+            if !shard.file_type()?.is_dir() { continue; }
+            for chunk in fs::read_dir(shard.path())? {
+                let chunk = chunk?;
+                total += chunk.metadata().map(|m| m.len()).unwrap_or(0);
+            }
+        }
+        Ok(total)
+    }
+
+    pub fn path_for(&self, hash: &ChunkHash) -> PathBuf {
+        let hex = hex_encode(hash);
+        // Shard by first byte: saves/.cas/ab/cd1234...beef.chunk
+        let (head, tail) = hex.split_at(2);
+        self.root.join(head).join(format!("{}.{}", tail, CHUNK_EXT))
+    }
+}
+
+fn hex_encode(bytes: &[u8; 32]) -> String {
+    const HEX: &[u8; 16] = b"0123456789abcdef";
+    let mut s = String::with_capacity(64);
+    for &b in bytes.iter() {
+        s.push(HEX[(b >> 4) as usize] as char);
+        s.push(HEX[(b & 0x0f) as usize] as char);
+    }
+    s
+}
+
+fn parse_hex62(tail: &str, head: &str) -> Option<ChunkHash> {
+    if tail.len() != 62 || head.len() != 2 { return None; }
+    let mut out = [0u8; 32];
+    let mut full = String::with_capacity(64);
+    full.push_str(head);
+    full.push_str(tail);
+    let bytes = full.as_bytes();
+    for i in 0..32 {
+        out[i] = (hex_nibble(bytes[i * 2])? << 4) | hex_nibble(bytes[i * 2 + 1])?;
+    }
+    Some(out)
+}
+
+fn hex_nibble(c: u8) -> Option<u8> {
+    match c {
+        b'0'..=b'9' => Some(c - b'0'),
+        b'a'..=b'f' => Some(10 + c - b'a'),
+        b'A'..=b'F' => Some(10 + c - b'A'),
+        _ => None,
+    }
+}
+
+/// Walk `words` (host-endian u32) as big-endian byte chunks of `CHUNK_SIZE`,
+/// store each chunk via `store.put`, and collect the hashes in order. The
+/// final chunk may be smaller if `words.len() * 4` isn't a multiple of
+/// `CHUNK_SIZE`. Returns the per-chunk hash list — concat'ing those chunks
+/// in order reproduces the bank's BE byte stream exactly.
+pub fn put_words_as_chunks(
+    store: &ChunkStore,
+    words: &[u32],
+) -> io::Result<Vec<ChunkHash>> {
+    let bytes_total = words.len() * 4;
+    let chunk_words = CHUNK_SIZE / 4;
+    let mut hashes = Vec::with_capacity(bytes_total.div_ceil(CHUNK_SIZE));
+    let mut buf = vec![0u8; CHUNK_SIZE];
+    let mut i = 0usize;
+    while i < words.len() {
+        let take = (words.len() - i).min(chunk_words);
+        let bytes_this_chunk = take * 4;
+        for (k, &w) in words[i..i + take].iter().enumerate() {
+            buf[k * 4..k * 4 + 4].copy_from_slice(&w.to_be_bytes());
+        }
+        let chunk_slice = &buf[..bytes_this_chunk];
+        hashes.push(store.put(chunk_slice)?);
+        i += take;
+    }
+    Ok(hashes)
+}
+
+/// Inverse of `put_words_as_chunks`. Given a hash list, fetch each chunk
+/// and decode BE bytes back into a `Vec<u32>`. Caller is responsible for
+/// cross-checking the resulting length against the bank's expected size.
+pub fn get_chunks_as_words(
+    store: &ChunkStore,
+    hashes: &[ChunkHash],
+) -> io::Result<Vec<u32>> {
+    let mut words = Vec::with_capacity(hashes.len() * (CHUNK_SIZE / 4));
+    for h in hashes {
+        let bytes = store.get(h)?;
+        if bytes.len() % 4 != 0 {
+            return Err(io::Error::new(
+                io::ErrorKind::InvalidData,
+                format!("chunk size {} not a multiple of 4", bytes.len()),
+            ));
+        }
+        for chunk in bytes.chunks_exact(4) {
+            words.push(u32::from_be_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]));
+        }
+    }
+    Ok(words)
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    fn unique_tmp_dir(tag: &str) -> PathBuf {
+        let nanos = std::time::SystemTime::now()
+            .duration_since(std::time::UNIX_EPOCH)
+            .map(|d| d.as_nanos())
+            .unwrap_or(0);
+        let p = std::env::temp_dir().join(format!("iris-cas-{}-{}", tag, nanos));
+        fs::create_dir_all(&p).unwrap();
+        p
+    }
+
+    #[test]
+    fn put_get_round_trip() {
+        let dir = unique_tmp_dir("rt");
+        let store = ChunkStore::new(&dir);
+        let data = b"hello world chunk content here";
+        let h = store.put(data).unwrap();
+        assert!(store.has(&h));
+        assert_eq!(store.get(&h).unwrap(), data);
+        let _ = fs::remove_dir_all(&dir);
+    }
+
+    #[test]
+    fn put_dedupes_identical_content() {
+        let dir = unique_tmp_dir("dedupe");
+        let store = ChunkStore::new(&dir);
+        let data = vec![0xAB; 1024];
+        let h1 = store.put(&data).unwrap();
+        let h2 = store.put(&data).unwrap();
+        assert_eq!(h1, h2);
+        // Only one file on disk.
+        let mut count = 0;
+        for shard in fs::read_dir(&dir.join(".cas")).unwrap() {
+            for _ in fs::read_dir(shard.unwrap().path()).unwrap() {
+                count += 1;
+            }
+        }
+        assert_eq!(count, 1, "duplicate put should not write twice");
+        let _ = fs::remove_dir_all(&dir);
+    }
+
+    #[test]
+    fn put_get_words_round_trip() {
+        let dir = unique_tmp_dir("words");
+        let store = ChunkStore::new(&dir);
+        // 33 KB worth of words — exercises the partial-final-chunk path.
+        let words: Vec<u32> = (0..33 * 256).map(|i| 0x80000000_u32 ^ (i as u32)).collect();
+        let hashes = put_words_as_chunks(&store, &words).unwrap();
+        let got = get_chunks_as_words(&store, &hashes).unwrap();
+        assert_eq!(got, words);
+        let _ = fs::remove_dir_all(&dir);
+    }
+
+    #[test]
+    fn put_words_two_banks_share_zero_chunks() {
+        // Two all-zero banks should produce the same hashes — same chunk
+        // stored once, both bank manifests reference it.
+        let dir = unique_tmp_dir("zero");
+        let store = ChunkStore::new(&dir);
+        let words_a = vec![0u32; CHUNK_SIZE / 4];
+        let words_b = vec![0u32; CHUNK_SIZE / 4];
+        let h_a = put_words_as_chunks(&store, &words_a).unwrap();
+        let h_b = put_words_as_chunks(&store, &words_b).unwrap();
+        assert_eq!(h_a, h_b);
+        // One physical chunk file.
+ let mut count = 0; + for shard in fs::read_dir(&dir.join(".cas")).unwrap() { + for _ in fs::read_dir(shard.unwrap().path()).unwrap() { + count += 1; + } + } + assert_eq!(count, 1, "two zero banks must dedupe to a single chunk"); + let _ = fs::remove_dir_all(&dir); + } + + #[test] + fn gc_removes_unreferenced() { + let dir = unique_tmp_dir("gc"); + let store = ChunkStore::new(&dir); + let h_keep = store.put(b"keep me").unwrap(); + let _h_drop = store.put(b"drop me").unwrap(); + let mut live = HashSet::new(); + live.insert(h_keep); + let (removed, _bytes) = store.gc(&live).unwrap(); + assert_eq!(removed, 1); + assert!(store.has(&h_keep)); + let _ = fs::remove_dir_all(&dir); + } +} diff --git a/src/ci.rs b/src/ci.rs new file mode 100644 index 0000000..34b0ddc --- /dev/null +++ b/src/ci.rs @@ -0,0 +1,1014 @@ +//! CI control socket. +//! +//! Unix domain socket that drives the emulator for automated testing. The +//! protocol is newline-delimited JSON, strict request/response, single client. +//! See `ci_mode_plan.md` in the repo root. + +#![cfg(unix)] + +use std::io::{BufRead, BufReader, Write}; +use std::os::unix::net::{UnixListener, UnixStream}; +use std::sync::Arc; +use std::thread; +use std::time::Duration; + +use parking_lot::Mutex; +use serde::{Deserialize, Serialize}; +use serde_json::Value; + +use crate::machine::Machine; +use crate::rex3::Rex3; +use crate::z85c30::CiSerialBackend; + +/// Set at `start_server`; consulted by `quit` so the socket file is cleaned up +/// before `std::process::exit` (which skips Drop). 
+static SOCKET_PATH: Mutex> = Mutex::new(None); + +#[derive(Deserialize)] +struct Request { + cmd: String, + #[serde(default)] + args: Value, +} + +#[derive(Serialize)] +struct Response { + ok: bool, + #[serde(skip_serializing_if = "Option::is_none")] + data: Option, + #[serde(skip_serializing_if = "Option::is_none")] + error: Option, +} + +impl Response { + fn ok() -> Self { Self { ok: true, data: None, error: None } } + fn data(v: Value) -> Self { Self { ok: true, data: Some(v), error: None } } + fn err(msg: impl Into) -> Self { + Self { ok: false, data: None, error: Some(msg.into()) } + } +} + +// ---------------------------------------------------------------------------- +// Server +// ---------------------------------------------------------------------------- + +/// Holder for the raw `*mut Machine` passed in from `main`. The pointer is +/// valid for the process lifetime because `Machine` lives on main's stack. +/// Mirrors the `SystemController` pattern in `machine.rs`. +struct MachinePtr(*mut Machine); +unsafe impl Send for MachinePtr {} +unsafe impl Sync for MachinePtr {} + +pub struct CiServer { + socket_path: String, + machine: Arc>, + ci_serial: Arc, + /// Optional in case --headless is also passed (no REX3). Screenshot + /// commands return an error in that case. + rex3: Option>, +} + +impl Drop for CiServer { + fn drop(&mut self) { + let _ = std::fs::remove_file(&self.socket_path); + } +} + +impl CiServer { + fn with_machine(&self, f: impl FnOnce(&mut Machine) -> R) -> R { + let mut guard = self.machine.lock(); + // SAFETY: pointer is valid for process lifetime; this mutex serializes + // all Machine accesses from CI command handlers. CPU/peripheral threads + // observe state changes only when the methods we call stop them first + // (ci_restore/ci_rollback do). + let machine = unsafe { &mut *(guard.0) }; + f(machine) + } +} + +/// Bind the control socket, spawn the accept thread, return a handle. 
+/// +/// # Safety +/// `machine_ptr` must remain valid for the process lifetime. Pass the address +/// of a `Machine` owned by `main`'s stack (or a heap-pinned Box that `main` +/// keeps alive). +pub fn start_server( + machine_ptr: *mut Machine, + socket_path: &str, +) -> Result, String> { + // SAFETY: caller guarantees the pointer is valid. + let ci_serial = unsafe { (*machine_ptr).get_ci_serial() } + .ok_or_else(|| "CI mode: CiSerialBackend not installed on Machine".to_string())?; + let rex3 = unsafe { (*machine_ptr).get_rex3() }; + + let path = socket_path.to_string(); + // Clear stale socket from a previous run. + let _ = std::fs::remove_file(&path); + let listener = UnixListener::bind(&path) + .map_err(|e| format!("failed to bind {}: {}", path, e))?; + + eprintln!("iris: --ci control socket listening at {}", path); + + *SOCKET_PATH.lock() = Some(path.clone()); + + let server = Arc::new(CiServer { + socket_path: path, + machine: Arc::new(Mutex::new(MachinePtr(machine_ptr))), + ci_serial, + rex3, + }); + + let server_clone = server.clone(); + thread::Builder::new() + .name("iris-ci-accept".into()) + .spawn(move || { + for conn in listener.incoming() { + match conn { + Ok(stream) => { + let s = server_clone.clone(); + thread::Builder::new() + .name("iris-ci-handler".into()) + .spawn(move || handle_client(s, stream)) + .ok(); + } + Err(e) => eprintln!("iris-ci-accept: {}", e), + } + } + }) + .map_err(|e| format!("failed to spawn CI accept thread: {}", e))?; + + Ok(server) +} + +// ---------------------------------------------------------------------------- +// Connection handling +// ---------------------------------------------------------------------------- + +fn handle_client(server: Arc, stream: UnixStream) { + let reader = match stream.try_clone() { + Ok(s) => BufReader::new(s), + Err(e) => { + eprintln!("iris-ci-handler: clone failed: {}", e); + return; + } + }; + let mut writer = stream; + for line in reader.lines() { + let Ok(line) = line else { break }; 
+        let trimmed = line.trim();
+        if trimmed.is_empty() { continue; }
+
+        let response = match serde_json::from_str::<Request>(trimmed) {
+            Ok(req) => dispatch(&server, &req),
+            Err(e) => Response::err(format!("invalid json: {}", e)),
+        };
+
+        let mut out = match serde_json::to_vec(&response) {
+            Ok(v) => v,
+            Err(e) => serde_json::to_vec(&Response::err(format!("encode: {}", e))).unwrap_or_default(),
+        };
+        out.push(b'\n');
+        if writer.write_all(&out).is_err() { break; }
+    }
+}
+
+// ----------------------------------------------------------------------------
+// Dispatch
+// ----------------------------------------------------------------------------
+
+fn dispatch(server: &CiServer, req: &Request) -> Response {
+    match req.cmd.as_str() {
+        "ping"          => Response::ok(),
+        "quit"          => cmd_quit(),
+        "start"         => cmd_start(server),
+        "save"          => cmd_save(server, &req.args),
+        "restore"       => cmd_restore(server, &req.args),
+        "rollback"      => cmd_rollback(server),
+        "list"          => cmd_list(&req.args),
+        "info"          => cmd_info(&req.args),
+        "delete"        => cmd_delete(&req.args),
+        "serial-send"   => cmd_serial_send(server, &req.args),
+        "serial-read"   => cmd_serial_read(server),
+        "wait-serial"   => cmd_wait_serial(server, &req.args),
+        "screenshot"    => cmd_screenshot(server, &req.args),
+        "scratch-write" => cmd_scratch_write(server, &req.args),
+        "scratch-read"  => cmd_scratch_read(server, &req.args),
+        "scratch-clear" => cmd_scratch_clear(server),
+        "scratch-info"  => cmd_scratch_info(server),
+        "validate"      => cmd_validate(server, &req.args),
+        "gc"            => cmd_gc(),
+        "diff"          => cmd_diff(&req.args),
+        "tree"          => cmd_tree(),
+        "pull"          => cmd_pull(&req.args),
+        "push"          => cmd_push(&req.args),
+        other => Response::err(format!("unknown command: {}", other)),
+    }
+}
+
+fn cmd_quit() -> Response {
+    // Schedule process exit after a brief delay so the response flushes.
+ thread::spawn(|| { + thread::sleep(Duration::from_millis(50)); + if let Some(p) = SOCKET_PATH.lock().take() { + let _ = std::fs::remove_file(&p); + } + std::process::exit(0); + }); + Response::ok() +} + +fn cmd_start(server: &CiServer) -> Response { + server.with_machine(|m| m.cpu_start()); + Response::ok() +} + +fn cmd_save(server: &CiServer, args: &Value) -> Response { + let name = match args.get("name").and_then(|v| v.as_str()) { + Some(n) => n.to_string(), + None => return Response::err("save: missing 'name' arg"), + }; + match server.with_machine(|m| m.save_snapshot(&name)) { + Ok(()) => Response::ok(), + Err(e) => Response::err(format!("save failed: {}", e)), + } +} + +fn cmd_restore(server: &CiServer, args: &Value) -> Response { + let name = match args.get("name").and_then(|v| v.as_str()) { + Some(n) => n.to_string(), + None => return Response::err("restore: missing 'name' arg"), + }; + match server.with_machine(|m| m.ci_restore(&name)) { + Ok(()) => Response::ok(), + Err(e) => Response::err(format!("restore failed: {}", e)), + } +} + +fn cmd_rollback(server: &CiServer) -> Response { + match server.with_machine(|m| m.ci_rollback()) { + Ok(()) => Response::ok(), + Err(e) => Response::err(format!("rollback failed: {}", e)), + } +} + +fn cmd_list(_args: &Value) -> Response { + // Walk saves/ recursively, return every directory that contains a + // snapshot.toml (current format) OR a cpu.toml (legacy v0). Names are + // returned slash-joined relative to saves/. 
+    let root = std::path::Path::new("saves");
+    if !root.is_dir() {
+        return Response::data(serde_json::json!({ "snapshots": [] }));
+    }
+    let mut out: Vec<String> = Vec::new();
+    let mut stack: Vec<std::path::PathBuf> = vec![root.to_path_buf()];
+    while let Some(dir) = stack.pop() {
+        let entries = match std::fs::read_dir(&dir) {
+            Ok(e) => e,
+            Err(_) => continue,
+        };
+        let mut subdirs: Vec<std::path::PathBuf> = Vec::new();
+        let mut is_snapshot = false;
+        for e in entries.flatten() {
+            let p = e.path();
+            if p.is_dir() {
+                subdirs.push(p);
+            } else if let Some(name) = p.file_name().and_then(|n| n.to_str()) {
+                if name == "snapshot.toml" || name == "cpu.toml" {
+                    is_snapshot = true;
+                }
+            }
+        }
+        if is_snapshot {
+            if let Ok(rel) = dir.strip_prefix(root) {
+                let s = rel.to_string_lossy().replace('\\', "/");
+                if !s.is_empty() {
+                    out.push(s);
+                }
+            }
+        }
+        for s in subdirs {
+            stack.push(s);
+        }
+    }
+    out.sort();
+    Response::data(serde_json::json!({ "snapshots": out }))
+}
+
+fn cmd_info(args: &Value) -> Response {
+    let name = match args.get("name").and_then(|v| v.as_str()) {
+        Some(n) => n,
+        None => return Response::err("info: missing 'name' arg"),
+    };
+    let dir = std::path::Path::new("saves").join(name);
+    if !dir.is_dir() {
+        return Response::err(format!("info: snapshot '{}' not found", name));
+    }
+    let snap = crate::snapshot::Snapshot::new(&dir);
+    let manifest = match snap.read_manifest() {
+        Ok(Some(m)) => Some(m),
+        Ok(None) => None,
+        Err(e) => return Response::err(format!("info: manifest read failed: {}", e)),
+    };
+
+    // Disk usage rollup: sum file sizes inside the snapshot dir.
+ let mut bytes_on_disk: u64 = 0; + if let Ok(walker) = std::fs::read_dir(&dir) { + for e in walker.flatten() { + if let Ok(meta) = e.metadata() { + if meta.is_file() { + bytes_on_disk += meta.len(); + } + } + } + } + + let mut out = serde_json::Map::new(); + out.insert("name".into(), Value::String(name.to_string())); + out.insert("bytes_on_disk".into(), Value::Number(bytes_on_disk.into())); + if let Some(m) = manifest { + out.insert("schema_version".into(), Value::Number(m.schema_version.into())); + out.insert("host_arch".into(), Value::String(m.host_arch)); + out.insert("created_at_unix".into(), Value::Number(m.created_at_unix.into())); + if let Some(rev) = m.iris_git_rev { out.insert("iris_git_rev".into(), Value::String(rev)); } + if let Some(p) = m.parent { out.insert("parent".into(), Value::String(p)); } + if let Some(d) = m.description { out.insert("description".into(), Value::String(d)); } + out.insert("installed_bundles".into(), + Value::Array(m.installed_bundles.into_iter().map(Value::String).collect())); + } else { + out.insert("schema_version".into(), Value::Number(0.into())); + out.insert("legacy".into(), Value::Bool(true)); + } + Response::data(Value::Object(out)) +} + +fn cmd_delete(args: &Value) -> Response { + let name = match args.get("name").and_then(|v| v.as_str()) { + Some(n) => n, + None => return Response::err("delete: missing 'name' arg"), + }; + if name.is_empty() || name.contains("..") { + return Response::err("delete: invalid name"); + } + let dir = std::path::Path::new("saves").join(name); + if !dir.is_dir() { + return Response::err(format!("delete: snapshot '{}' not found", name)); + } + if let Err(e) = std::fs::remove_dir_all(&dir) { + return Response::err(format!("delete: {}: {}", dir.display(), e)); + } + Response::ok() +} + +fn cmd_serial_send(server: &CiServer, args: &Value) -> Response { + let data = match args.get("data").and_then(|v| v.as_str()) { + Some(s) => s, + None => return Response::err("serial-send: missing 'data' arg"), 
+ }; + server.ci_serial.push_host(data.as_bytes()); + Response::ok() +} + +fn cmd_serial_read(server: &CiServer) -> Response { + let bytes = server.ci_serial.drain_guest(); + let s = String::from_utf8_lossy(&bytes).into_owned(); + Response::data(Value::String(s)) +} + +fn cmd_screenshot(server: &CiServer, args: &Value) -> Response { + let Some(rex3) = &server.rex3 else { + return Response::err("screenshot: REX3 not present (running with --headless?)"); + }; + let Some(path) = args.get("path").and_then(|v| v.as_str()) else { + return Response::err("screenshot: missing 'path' arg"); + }; + + // Snapshot the framebuffer under the screen lock; unlock before the PNG + // encode so the refresh thread isn't blocked during disk I/O. + let (width, height, rgba_copy) = { + let screen = rex3.screen.lock(); + let w = screen.width; + let h = screen.height; + let mut out = Vec::with_capacity(w * h); + // `rgba` has row stride 2048; copy the visible window. + for y in 0..h { + let base = y * 2048; + out.extend_from_slice(&screen.rgba[base..base + w]); + } + (w, h, out) + }; + + // Encode each u32 0xFFRRGGBB as 3 RGB bytes in the order the PNG encoder + // expects. 
+    let mut rgb = Vec::with_capacity(width * height * 3);
+    for px in &rgba_copy {
+        rgb.push(((px >> 16) & 0xff) as u8);
+        rgb.push(((px >> 8) & 0xff) as u8);
+        rgb.push((px & 0xff) as u8);
+    }
+
+    let file = match std::fs::File::create(path) {
+        Ok(f) => f,
+        Err(e) => return Response::err(format!("screenshot: create {}: {}", path, e)),
+    };
+    let bw = std::io::BufWriter::new(file);
+    let mut enc = png::Encoder::new(bw, width as u32, height as u32);
+    enc.set_color(png::ColorType::Rgb);
+    enc.set_depth(png::BitDepth::Eight);
+    let mut writer = match enc.write_header() {
+        Ok(w) => w,
+        Err(e) => return Response::err(format!("screenshot: png header: {}", e)),
+    };
+    if let Err(e) = writer.write_image_data(&rgb) {
+        return Response::err(format!("screenshot: png write: {}", e));
+    }
+
+    Response::data(serde_json::json!({
+        "path": path,
+        "width": width,
+        "height": height,
+        "bytes": rgb.len(), // raw RGB payload size, not the encoded PNG file size
+    }))
+}
+
+fn cmd_wait_serial(server: &CiServer, args: &Value) -> Response {
+    let pattern = match args.get("pattern").and_then(|v| v.as_str()) {
+        Some(p) => p.to_string(),
+        None => return Response::err("wait-serial: missing 'pattern' arg"),
+    };
+    let timeout_ms = args.get("timeout_ms").and_then(|v| v.as_u64()).unwrap_or(10_000);
+
+    match server.ci_serial.wait_for(pattern.as_bytes(), Duration::from_millis(timeout_ms)) {
+        Some(consumed) => {
+            let s = String::from_utf8_lossy(&consumed).into_owned();
+            Response::data(Value::String(s))
+        }
+        None => Response::err(format!("wait-serial: timeout after {}ms waiting for {:?}", timeout_ms, pattern)),
+    }
+}
+
+// ----------------------------------------------------------------------------
+// Scratch volume (Phase 2.4): file injection / extraction without networking.
+//
+// The scratch device is a raw SCSI LUN (`scratch = true` in iris.toml).
+// iris pre-formats the underlying file with a minimal SGI Volume Header at
+// sector 0 so IRIX recognises it (without the VH, /dev/rdsk/dks0dNvol
+// returns I/O error on every read). The VH defines partition slot 7
+// ("vol") spanning sectors 8..end and slot 8 ("vh") spanning sectors 0..7.
+//
+// Wire convention:
+//   - `scratch-write` and `scratch-read` operate on the *payload* area —
+//     `offset = 0` means the first byte after the VH (raw byte 4096 in the
+//     underlying file). The VH is never touched by these commands.
+//   - The guest reads the same payload at offset 0 of /dev/rdsk/dks0dNvol
+//     because partition 7's first_block = 8.
+//   - Typical guest read: `dd if=/dev/rdsk/dks0d2vol bs=64k | tar xf -`.
+//
+// Each scratch op briefly stops the machine to quiesce in-flight SCSI I/O
+// (Machine::with_paused). The CPU is restarted only if it was running before
+// — a scratch-write issued before the harness `start`s the CPU does not
+// auto-start it.
+// ----------------------------------------------------------------------------
+
+use crate::sgi_vh::SCRATCH_PAYLOAD_OFFSET;
+
+/// Reject names that would escape the host or smuggle in shell metachars. The
+/// host-side path is read by serde_json so quoting is already handled, but a
+/// caller-supplied "../" can still escape an intended sandbox.
+fn validate_host_path(p: &str) -> Result<std::path::PathBuf, String> {
+    if p.is_empty() {
+        return Err("path: empty".into());
+    }
+    let pb = std::path::PathBuf::from(p);
+    if pb.components().any(|c| matches!(c, std::path::Component::ParentDir)) {
+        return Err(format!("path: '..' 
components not allowed in {:?}", p));
+    }
+    Ok(pb)
+}
+
+fn cmd_scratch_write(server: &CiServer, args: &Value) -> Response {
+    let host_path = match args.get("host_path").and_then(|v| v.as_str()) {
+        Some(p) => match validate_host_path(p) {
+            Ok(pb) => pb,
+            Err(e) => return Response::err(format!("scratch-write: {}", e)),
+        },
+        None => return Response::err("scratch-write: missing 'host_path' arg"),
+    };
+    let offset = args.get("offset").and_then(|v| v.as_u64()).unwrap_or(0);
+
+    let bytes = match std::fs::read(&host_path) {
+        Ok(b) => b,
+        Err(e) => return Response::err(format!("scratch-write: read {}: {}", host_path.display(), e)),
+    };
+
+    let result = server.with_machine(|m| {
+        let scratch = match m.scratch_path() {
+            Some(p) => p.to_path_buf(),
+            None => return Err("scratch volume not configured (set `scratch = true` on a SCSI device in iris.toml)".to_string()),
+        };
+        m.with_paused(|| -> Result<u64, String> {
+            use std::io::{Seek, SeekFrom, Write};
+            let mut f = std::fs::OpenOptions::new()
+                .write(true)
+                .open(&scratch)
+                .map_err(|e| format!("open {}: {}", scratch.display(), e))?;
+            // Skip the VH partition; offset is relative to the payload area.
+            let raw_offset = SCRATCH_PAYLOAD_OFFSET.checked_add(offset)
+                .ok_or_else(|| "offset overflow".to_string())?;
+            f.seek(SeekFrom::Start(raw_offset)).map_err(|e| format!("seek: {}", e))?;
+            f.write_all(&bytes).map_err(|e| format!("write: {}", e))?;
+            f.sync_all().map_err(|e| format!("fsync: {}", e))?;
+            Ok(bytes.len() as u64)
+        })
+    });
+
+    match result {
+        Ok(n) => Response::data(serde_json::json!({
+            "bytes_written": n,
+            "offset": offset,
+            "host_path": host_path.display().to_string(),
+        })),
+        Err(e) => Response::err(format!("scratch-write: {}", e)),
+    }
+}
+
+fn cmd_scratch_read(server: &CiServer, args: &Value) -> Response {
+    let to_path = match args.get("to_path").and_then(|v| v.as_str()) {
+        Some(p) => match validate_host_path(p) {
+            Ok(pb) => pb,
+            Err(e) => return Response::err(format!("scratch-read: {}", e)),
+        },
+        None => return Response::err("scratch-read: missing 'to_path' arg"),
+    };
+    let offset = args.get("offset").and_then(|v| v.as_u64()).unwrap_or(0);
+    let length = args.get("length").and_then(|v| v.as_u64());
+
+    let result = server.with_machine(|m| {
+        let scratch = match m.scratch_path() {
+            Some(p) => p.to_path_buf(),
+            None => return Err("scratch volume not configured (set `scratch = true` on a SCSI device in iris.toml)".to_string()),
+        };
+        m.with_paused(|| -> Result<u64, String> {
+            use std::io::{Read, Seek, SeekFrom};
+            let mut f = std::fs::File::open(&scratch)
+                .map_err(|e| format!("open {}: {}", scratch.display(), e))?;
+            let total = f.metadata().map(|m| m.len()).unwrap_or(0);
+            let payload_total = total.saturating_sub(SCRATCH_PAYLOAD_OFFSET);
+            let raw_offset = SCRATCH_PAYLOAD_OFFSET.checked_add(offset)
+                .ok_or_else(|| "offset overflow".to_string())?;
+            let len = match length {
+                Some(n) => n.min(payload_total.saturating_sub(offset)),
+                None => payload_total.saturating_sub(offset),
+            };
+            f.seek(SeekFrom::Start(raw_offset)).map_err(|e| format!("seek: {}", e))?;
+            let mut buf = vec![0u8; len as usize];
+            f.read_exact(&mut buf).map_err(|e| 
format!("read: {}", e))?;
+            std::fs::write(&to_path, &buf)
+                .map_err(|e| format!("write {}: {}", to_path.display(), e))?;
+            Ok(buf.len() as u64)
+        })
+    });
+
+    match result {
+        Ok(n) => Response::data(serde_json::json!({
+            "bytes_read": n,
+            "offset": offset,
+            "to_path": to_path.display().to_string(),
+        })),
+        Err(e) => Response::err(format!("scratch-read: {}", e)),
+    }
+}
+
+fn cmd_scratch_clear(server: &CiServer) -> Response {
+    let result = server.with_machine(|m| {
+        let scratch = match m.scratch_path() {
+            Some(p) => p.to_path_buf(),
+            None => return Err("scratch volume not configured".to_string()),
+        };
+        m.with_paused(|| -> Result<u64, String> {
+            use std::io::{Seek, SeekFrom, Write};
+            let mut f = std::fs::OpenOptions::new()
+                .write(true)
+                .open(&scratch)
+                .map_err(|e| format!("open {}: {}", scratch.display(), e))?;
+            let size = f.metadata().map(|m| m.len()).unwrap_or(0);
+            // Zero only the payload area (after the VH). Zero in 1 MiB chunks
+            // rather than allocating a buffer the full size of the volume.
+ let chunk = vec![0u8; 1024 * 1024]; + f.seek(SeekFrom::Start(SCRATCH_PAYLOAD_OFFSET)) + .map_err(|e| format!("seek: {}", e))?; + let mut remaining = size.saturating_sub(SCRATCH_PAYLOAD_OFFSET); + while remaining > 0 { + let n = remaining.min(chunk.len() as u64) as usize; + f.write_all(&chunk[..n]).map_err(|e| format!("write: {}", e))?; + remaining -= n as u64; + } + f.sync_all().map_err(|e| format!("fsync: {}", e))?; + Ok(size.saturating_sub(SCRATCH_PAYLOAD_OFFSET)) + }) + }); + + match result { + Ok(n) => Response::data(serde_json::json!({ "bytes_cleared": n })), + Err(e) => Response::err(format!("scratch-clear: {}", e)), + } +} + +fn cmd_scratch_info(server: &CiServer) -> Response { + let path = server.with_machine(|m| m.scratch_path().map(|p| p.to_path_buf())); + let Some(path) = path else { + return Response::err("scratch-info: scratch volume not configured"); + }; + let size = std::fs::metadata(&path).map(|m| m.len()).unwrap_or(0); + Response::data(serde_json::json!({ + "path": path.display().to_string(), + "size_bytes": size, + "payload_offset": SCRATCH_PAYLOAD_OFFSET, + "payload_size_bytes": size.saturating_sub(SCRATCH_PAYLOAD_OFFSET), + })) +} + +// ---------------------------------------------------------------------------- +// Snapshot determinism validator (Phase 3.3) +// ---------------------------------------------------------------------------- + +fn cmd_validate(server: &CiServer, args: &Value) -> Response { + let name = match args.get("name").and_then(|v| v.as_str()) { + Some(n) => n.to_string(), + None => return Response::err("validate: missing 'name' arg"), + }; + let n = args + .get("n_instructions") + .and_then(|v| v.as_u64()) + .unwrap_or(1_000_000); + + let report_result = server.with_machine(|m| { + crate::validate::validate_snapshot_determinism(m, &name, n) + }); + + match report_result { + Ok(report) => Response::data(serde_json::json!({ + "deterministic": report.deterministic, + "instructions_run": report.instructions_run, + "summary": 
report.summary(),
+            "diffs": report.diffs.iter().map(|(f, a, b)| {
+                serde_json::json!({"field": f, "a": a, "b": b})
+            }).collect::<Vec<_>>(),
+            "pc": format!("0x{:016x}", report.state_a.pc),
+        })),
+        Err(e) => Response::err(format!("validate: {}", e)),
+    }
+}
+
+// ----------------------------------------------------------------------------
+// Snapshot library: gc / diff / tree (Phase 3.2)
+// ----------------------------------------------------------------------------
+
+/// Walk every snapshot directory under `saves/`, parse each `chunks.bin`, and
+/// collect the set of referenced chunk hashes. Used by `gc` to figure out
+/// which chunks are still live.
+fn collect_live_chunks() -> std::io::Result<std::collections::HashSet<[u8; 32]>> {
+    use std::collections::HashSet;
+    let mut live: HashSet<[u8; 32]> = HashSet::new();
+    let root = std::path::Path::new("saves");
+    if !root.is_dir() {
+        return Ok(live);
+    }
+    let mut stack: Vec<std::path::PathBuf> = vec![root.to_path_buf()];
+    while let Some(dir) = stack.pop() {
+        for e in std::fs::read_dir(&dir)?.flatten() {
+            let p = e.path();
+            if let Some(name) = p.file_name().and_then(|n| n.to_str()) {
+                if name == ".cas" { continue; }
+            }
+            if p.is_dir() {
+                stack.push(p);
+                continue;
+            }
+            if p.file_name().and_then(|n| n.to_str()) == Some("chunks.bin") {
+                if let Ok(bytes) = std::fs::read(&p) {
+                    if let Ok(m) = postcard::from_bytes::<crate::snapshot::ChunksManifest>(&bytes) {
+                        for h in m.referenced_hashes() {
+                            live.insert(*h);
+                        }
+                    }
+                }
+            }
+        }
+    }
+    Ok(live)
+}
+
+fn cmd_gc() -> Response {
+    let live = match collect_live_chunks() {
+        Ok(l) => l,
+        Err(e) => return Response::err(format!("gc: collect live: {}", e)),
+    };
+    let store = crate::chunk_store::ChunkStore::new("saves");
+    let total_before = store.total_size().unwrap_or(0);
+    match store.gc(&live) {
+        Ok((removed, bytes)) => {
+            // Drop now-empty shard dirs so saves/.cas stays tidy.
+ if let Ok(entries) = std::fs::read_dir(store.root()) { + for e in entries.flatten() { + let p = e.path(); + if p.is_dir() { + let empty = std::fs::read_dir(&p).map(|mut it| it.next().is_none()).unwrap_or(false); + if empty { + let _ = std::fs::remove_dir(&p); + } + } + } + } + Response::data(serde_json::json!({ + "live_chunks": live.len(), + "removed_chunks": removed, + "bytes_freed": bytes, + "bytes_before": total_before, + "bytes_after": total_before.saturating_sub(bytes), + })) + } + Err(e) => Response::err(format!("gc: {}", e)), + } +} + +/// Diff two snapshots: per-device state diffs, RAM chunk-level deltas, COW +/// overlay sector deltas. Heavy lifting reuses BinValue's `PartialEq` (toml +/// equality) for device state and ChunksManifest hashes for RAM/framebuffer +/// regions. +fn cmd_diff(args: &Value) -> Response { + let a = match args.get("a").and_then(|v| v.as_str()) { + Some(s) => s.to_string(), + None => return Response::err("diff: missing 'a' arg"), + }; + let b = match args.get("b").and_then(|v| v.as_str()) { + Some(s) => s.to_string(), + None => return Response::err("diff: missing 'b' arg"), + }; + if a.is_empty() || a.contains("..") || b.is_empty() || b.contains("..") { + return Response::err("diff: invalid name"); + } + + let dir_a = std::path::PathBuf::from("saves").join(&a); + let dir_b = std::path::PathBuf::from("saves").join(&b); + if !dir_a.is_dir() { + return Response::err(format!("diff: snapshot '{}' not found", a)); + } + if !dir_b.is_dir() { + return Response::err(format!("diff: snapshot '{}' not found", b)); + } + + let snap_a = crate::snapshot::Snapshot::new(&dir_a); + let snap_b = crate::snapshot::Snapshot::new(&dir_b); + + let sv_a = snap_a.read_manifest().ok().flatten().map(|m| m.schema_version).unwrap_or(0); + let sv_b = snap_b.read_manifest().ok().flatten().map(|m| m.schema_version).unwrap_or(0); + + // Per-device state. 
The device states compared here are the ones every
+    // configuration writes; rex3 is optional so it's handled separately.
+    let device_bases = [
+        "cpu", "mc", "ioc", "scc", "pit", "ps2", "rtc",
+        "eeprom", "scsi", "seeq", "hpc3",
+    ];
+    let mut devices_changed: Vec<&'static str> = Vec::new();
+    let mut devices_unchanged: Vec<&'static str> = Vec::new();
+    for &base in &device_bases {
+        let va = snap_a.read_state(base, sv_a).ok();
+        let vb = snap_b.read_state(base, sv_b).ok();
+        match (va, vb) {
+            (Some(va), Some(vb)) => {
+                if va == vb { devices_unchanged.push(base); }
+                else { devices_changed.push(base); }
+            }
+            _ => devices_changed.push(base),
+        }
+    }
+    // REX3 separately because it's optional.
+    let rex_a = snap_a.read_state("rex3", sv_a).ok();
+    let rex_b = snap_b.read_state("rex3", sv_b).ok();
+    let rex3_changed = match (rex_a, rex_b) {
+        (Some(va), Some(vb)) => Some(va != vb),
+        (None, None) => None,
+        _ => Some(true),
+    };
+
+    // RAM bank deltas via chunks.bin (v3+ only).
+    let mut bank_changed_chunks = [0u32; 4];
+    let mut bank_total_chunks = [0u32; 4];
+    let mut framebuffer_changed_chunks: Option<(u32, u32)> = None;
+    if sv_a >= 3 && sv_b >= 3 {
+        if let (Ok(ma), Ok(mb)) = (snap_a.read_chunks_manifest(), snap_b.read_chunks_manifest()) {
+            for i in 0..4 {
+                let ah = &ma.bank_chunks[i];
+                let bh = &mb.bank_chunks[i];
+                let n = ah.len().max(bh.len());
+                bank_total_chunks[i] = n as u32;
+                let mut changed = 0u32;
+                for k in 0..n {
+                    let av = ah.get(k);
+                    let bv = bh.get(k);
+                    if av != bv { changed += 1; }
+                }
+                bank_changed_chunks[i] = changed;
+            }
+            if let (Some((rgb_a, aux_a)), Some((rgb_b, aux_b))) =
+                (&ma.framebuffer_chunks, &mb.framebuffer_chunks)
+            {
+                let n = rgb_a.len().max(rgb_b.len()) + aux_a.len().max(aux_b.len());
+                let mut changed = 0u32;
+                for k in 0..rgb_a.len().max(rgb_b.len()) {
+                    if rgb_a.get(k) != rgb_b.get(k) { changed += 1; }
+                }
+                for k in 0..aux_a.len().max(aux_b.len()) {
+                    if aux_a.get(k) != aux_b.get(k) { changed += 1; }
+                }
+                
framebuffer_changed_chunks = Some((changed, n as u32));
+            }
+        }
+    }
+
+    // COW overlay sector deltas from cow.toml.
+    let cow_a = snap_a.read_toml("cow.toml").ok();
+    let cow_b = snap_b.read_toml("cow.toml").ok();
+    let mut cow_diff_per_id: Vec<(usize, u64, u64, u64)> = Vec::new(); // (id, only_a, only_b, both)
+    if let (Some(ca), Some(cb)) = (cow_a, cow_b) {
+        let mut ids: std::collections::BTreeSet<usize> = Default::default();
+        if let Some(t) = ca.as_table() {
+            for k in t.keys() {
+                if let Some(s) = k.strip_prefix("scsi") {
+                    if let Ok(n) = s.parse::<usize>() { ids.insert(n); }
+                }
+            }
+        }
+        if let Some(t) = cb.as_table() {
+            for k in t.keys() {
+                if let Some(s) = k.strip_prefix("scsi") {
+                    if let Ok(n) = s.parse::<usize>() { ids.insert(n); }
+                }
+            }
+        }
+        for id in ids {
+            let key = format!("scsi{}", id);
+            let set_a: std::collections::HashSet<u64> = ca.get(&key)
+                .and_then(|v| v.as_array())
+                .map(|arr| arr.iter().filter_map(|x| x.as_integer().map(|i| i as u64)).collect())
+                .unwrap_or_default();
+            let set_b: std::collections::HashSet<u64> = cb.get(&key)
+                .and_then(|v| v.as_array())
+                .map(|arr| arr.iter().filter_map(|x| x.as_integer().map(|i| i as u64)).collect())
+                .unwrap_or_default();
+            let only_a = set_a.difference(&set_b).count() as u64;
+            let only_b = set_b.difference(&set_a).count() as u64;
+            let both = set_a.intersection(&set_b).count() as u64;
+            cow_diff_per_id.push((id, only_a, only_b, both));
+        }
+    }
+
+    Response::data(serde_json::json!({
+        "a": a,
+        "b": b,
+        "schema_a": sv_a,
+        "schema_b": sv_b,
+        "devices_changed": devices_changed,
+        "devices_unchanged": devices_unchanged,
+        "rex3_changed": rex3_changed,
+        "bank_changed_chunks": bank_changed_chunks,
+        "bank_total_chunks": bank_total_chunks,
+        "framebuffer_changed_chunks": framebuffer_changed_chunks,
+        "cow_diff": cow_diff_per_id.into_iter().map(|(id, only_a, only_b, both)| {
+            serde_json::json!({"scsi_id": id, "only_a": only_a, "only_b": only_b, "both": both})
+        }).collect::<Vec<_>>(),
+    }))
+}
+
+/// Walk every snapshot under 
`saves/`, build a parent → children map, render
+/// indented tree text. Snapshots without a parent (or with a parent that
+/// doesn't exist locally) hang off a synthetic `(none)` root.
+fn cmd_tree() -> Response {
+    use std::collections::BTreeMap;
+    let root = std::path::Path::new("saves");
+    if !root.is_dir() {
+        return Response::data(serde_json::json!({"tree": "(no saves directory)"}));
+    }
+
+    // (name, parent) for each snapshot.
+    let mut entries: Vec<(String, Option<String>)> = Vec::new();
+    let mut stack: Vec<std::path::PathBuf> = vec![root.to_path_buf()];
+    while let Some(dir) = stack.pop() {
+        let Ok(it) = std::fs::read_dir(&dir) else { continue };
+        let mut subdirs: Vec<std::path::PathBuf> = Vec::new();
+        let mut found_manifest = false;
+        let mut found_legacy_cpu = false;
+        for e in it.flatten() {
+            let p = e.path();
+            if let Some(n) = p.file_name().and_then(|n| n.to_str()) {
+                if n == ".cas" { continue; }
+            }
+            if p.is_dir() {
+                subdirs.push(p);
+            } else if let Some(name) = p.file_name().and_then(|n| n.to_str()) {
+                if name == "snapshot.toml" { found_manifest = true; }
+                if name == "cpu.toml" { found_legacy_cpu = true; }
+            }
+        }
+        if found_manifest || found_legacy_cpu {
+            if let Ok(rel) = dir.strip_prefix(root) {
+                let display_name = rel.to_string_lossy().replace('\\', "/");
+                if !display_name.is_empty() {
+                    let snap = crate::snapshot::Snapshot::new(&dir);
+                    let parent = snap.read_manifest().ok().flatten().and_then(|m| m.parent);
+                    entries.push((display_name, parent));
+                }
+            }
+        }
+        for s in subdirs { stack.push(s); }
+    }
+
+    // Build parent → children map (None parent → top-level).
+    let mut by_parent: BTreeMap<Option<String>, Vec<String>> = BTreeMap::new();
+    let names: std::collections::HashSet<String> = entries.iter().map(|(n, _)| n.clone()).collect();
+    for (name, parent) in &entries {
+        let key = match parent {
+            Some(p) if names.contains(p) => Some(p.clone()),
+            _ => None,
+        };
+        by_parent.entry(key).or_default().push(name.clone());
+    }
+    for v in by_parent.values_mut() { v.sort(); }
+
+    fn render(out: &mut String, by_parent: &BTreeMap<Option<String>, Vec<String>>, parent: Option<&str>, depth: usize) {
+        let key = parent.map(String::from);
+        if let Some(children) = by_parent.get(&key) {
+            for child in children {
+                for _ in 0..depth { out.push_str("  "); }
+                out.push_str("- ");
+                out.push_str(child);
+                out.push('\n');
+                render(out, by_parent, Some(child), depth + 1);
+            }
+        }
+    }
+    let mut text = String::new();
+    render(&mut text, &by_parent, None, 0);
+    if text.is_empty() { text.push_str("(no snapshots)\n"); }
+
+    Response::data(serde_json::json!({
+        "snapshots": entries.iter().map(|(n, p)| {
+            serde_json::json!({"name": n, "parent": p})
+        }).collect::<Vec<_>>(),
+        "tree": text.trim_end_matches('\n').to_string(),
+    }))
+}
+
+// ----------------------------------------------------------------------------
+// HTTP snapshot registry (Phase 3.4)
+// ----------------------------------------------------------------------------
+
+fn cmd_pull(args: &Value) -> Response {
+    let url = match args.get("url").and_then(|v| v.as_str()) {
+        Some(s) => s.to_string(),
+        None => return Response::err("pull: missing 'url' arg"),
+    };
+    let name = match args.get("name").and_then(|v| v.as_str()) {
+        Some(s) => s.to_string(),
+        None => return Response::err("pull: missing 'name' arg"),
+    };
+    let saves = std::path::PathBuf::from("saves");
+    if !saves.is_dir() {
+        if let Err(e) = std::fs::create_dir_all(&saves) {
+            return Response::err(format!("pull: create saves/: {}", e));
+        }
+    }
+    match crate::registry::pull(&url, &name, &saves) {
+        Ok(report) => Response::data(serde_json::json!({
+            "name": name,
+            "url": url,
+            
"chunks_fetched": report.chunks_fetched,
+            "chunks_skipped": report.chunks_skipped,
+            "files_transferred": report.files_transferred,
+            "bytes_transferred": report.bytes_transferred,
+        })),
+        Err(e) => Response::err(format!("pull: {}", e)),
+    }
+}
+
+fn cmd_push(args: &Value) -> Response {
+    let url = match args.get("url").and_then(|v| v.as_str()) {
+        Some(s) => s.to_string(),
+        None => return Response::err("push: missing 'url' arg"),
+    };
+    let name = match args.get("name").and_then(|v| v.as_str()) {
+        Some(s) => s.to_string(),
+        None => return Response::err("push: missing 'name' arg"),
+    };
+    match crate::registry::push(&url, &name, std::path::Path::new("saves")) {
+        Ok(report) => Response::data(serde_json::json!({
+            "name": name,
+            "url": url,
+            // Pull and push share one transfer-report struct, so
+            // `chunks_fetched` counts uploads here.
+            "chunks_uploaded": report.chunks_fetched,
+            "chunks_skipped": report.chunks_skipped,
+            "files_transferred": report.files_transferred,
+            "bytes_transferred": report.bytes_transferred,
+        })),
+        Err(e) => Response::err(format!("push: {}", e)),
+    }
+}
diff --git a/src/config.rs b/src/config.rs
index f775f80..09e9994 100644
--- a/src/config.rs
+++ b/src/config.rs
@@ -18,6 +18,19 @@ pub struct ScsiDeviceConfig {
     /// `{path}.overlay`. Delete the overlay file to reset to clean state.
     #[serde(default)]
     pub overlay: bool,
+    /// Scratch volume: a host-controlled raw block device used for file
+    /// injection/extraction without networking. iris auto-creates a zero-filled
+    /// file at `path` if it doesn't exist (size = `size_mb`, default 64). The
+    /// CI socket exposes scratch-write/read/clear/info to mutate it from the
+    /// host side. No filesystem is imposed: callers can write a tar stream and
+    /// the guest reads it with `dd if=/dev/rdsk/dks0dNvol | tar xf -`.
+    /// Implies !cdrom && !overlay (the volume must be host-writable directly).
+    #[serde(default)]
+    pub scratch: bool,
+    /// Size in MB for an auto-created scratch volume. Ignored when the file
+    /// already exists or `scratch=false`.
+    #[serde(default)]
+    pub size_mb: Option<u64>,
 }
 
 /// Protocol for port forwarding.
@@ -115,8 +128,24 @@ pub struct MachineConfig {
     /// If Some(port), start the GDB RSP stub on that TCP port.
     #[serde(default)]
     pub gdb_port: Option<u16>,
+
+    /// CI mode: opens a control socket for automation, applies speed-favoring
+    /// fidelity shortcuts. The host window is skipped unless ci_display is
+    /// also set.
+    #[serde(default)]
+    pub ci: bool,
+
+    /// Unix socket path for CI control. Used only when `ci` is true.
+    #[serde(default = "default_ci_socket")]
+    pub ci_socket: String,
+
+    /// With `ci`, keep the Newport window visible (deferred rendering) for
+    /// interactive test development.
+    #[serde(default)]
+    pub ci_display: bool,
 }
 
+fn default_ci_socket() -> String { "/tmp/iris.sock".to_string() }
+
 fn default_prom() -> String {
     "prom.bin".to_string()
 }
@@ -134,12 +163,16 @@ fn default_scsi() -> std::collections::HashMap<usize, ScsiDeviceConfig> {
         discs: vec![],
         cdrom: false,
         overlay: false,
+        scratch: false,
+        size_mb: None,
     });
     map.insert(4, ScsiDeviceConfig {
         path: "cdrom4.iso".to_string(),
         discs: vec![],
         cdrom: true,
         overlay: false,
+        scratch: false,
+        size_mb: None,
     });
     map
 }
@@ -156,6 +189,9 @@ impl Default for MachineConfig {
             headless: false,
             no_audio: false,
             gdb_port: None,
+            ci: false,
+            ci_socket: default_ci_socket(),
+            ci_display: false,
         }
     }
 }
@@ -178,8 +214,8 @@ impl MachineConfig {
     /// Validate bank sizes, returns a description of any errors.
pub fn validate(&self) -> Result<(), String> {
-        if self.scale != 1 && self.scale != 2 {
-            return Err(format!("scale {} is invalid (valid: 1, 2)", self.scale));
+        if self.scale < 1 || self.scale > 4 {
+            return Err(format!("scale {} is invalid (valid: 1, 2, 3, 4)", self.scale));
         }
         for (i, &sz) in self.banks.iter().enumerate() {
             if !VALID_BANK_SIZES.contains(&sz) {
@@ -310,6 +346,20 @@ pub struct Cli {
     /// Connect with: target remote localhost:<port>
     #[arg(long = "gdb-port", value_name = "PORT")]
     pub gdb_port: Option<u16>,
+
+    /// CI mode: enable the control socket and apply speed-favoring fidelity
+    /// shortcuts. Implies --headless unless --ci-display is also set.
+    #[arg(long, default_value_t = false)]
+    pub ci: bool,
+
+    /// Override the default control-socket path (/tmp/iris.sock).
+    #[arg(long = "ci-socket", value_name = "PATH")]
+    pub ci_socket: Option<String>,
+
+    /// With --ci, keep the Newport window visible for interactive test
+    /// development (deferred rendering at 10–15 fps).
+    #[arg(long = "ci-display", default_value_t = false)]
+    pub ci_display: bool,
 }
 
 impl Cli {
@@ -329,6 +379,8 @@ impl Cli {
             discs: vec![],
             cdrom,
             overlay: false,
+            scratch: false,
+            size_mb: None,
         });
         entry.path = path;
         entry.cdrom = cdrom;
@@ -349,6 +401,12 @@ impl Cli {
         if self.headless { cfg.headless = true; }
         if self.no_audio { cfg.no_audio = true; }
 
+        if self.ci { cfg.ci = true; }
+        if let Some(p) = &self.ci_socket { cfg.ci_socket = p.clone(); }
+        if self.ci_display { cfg.ci_display = true; }
+        // NB: --ci does NOT imply --headless. REX3 stays alive so screenshots
+        // work; main.rs simply skips the host window when ci && !ci_display.
+
         // NFS: --nfs-dir enables NFS; other flags refine an existing [nfs] section or the defaults.
if let Some(dir) = &self.nfs_dir { let base = cfg.nfs.get_or_insert_with(|| NfsConfig { diff --git a/src/cow_disk.rs b/src/cow_disk.rs index 1a2e89c..16cdd59 100644 --- a/src/cow_disk.rs +++ b/src/cow_disk.rs @@ -7,9 +7,95 @@ use std::collections::HashSet; use std::fs::{File, OpenOptions}; use std::io::{self, Read, Seek, SeekFrom, Write}; +use std::path::{Path, PathBuf}; const SECTOR_SIZE: u64 = 512; +/// Clone `src` to `dst` via filesystem-level CoW (APFS clonefile, Linux +/// FICLONE) when supported; fall back to a regular byte copy otherwise. On a +/// reflink-capable filesystem this is metadata-only — sub-millisecond for any +/// size — which makes per-snapshot overlay capture essentially free. +fn reflink_or_copy(src: &Path, dst: &Path) -> io::Result<()> { + let _ = std::fs::remove_file(dst); + if try_reflink(src, dst).is_ok() { + return Ok(()); + } + std::fs::copy(src, dst).map(|_| ()) +} + +#[cfg(target_os = "macos")] +fn try_reflink(src: &Path, dst: &Path) -> io::Result<()> { + use std::ffi::CString; + use std::os::unix::ffi::OsStrExt; + let src_c = CString::new(src.as_os_str().as_bytes()) + .map_err(|e| io::Error::new(io::ErrorKind::InvalidInput, e))?; + let dst_c = CString::new(dst.as_os_str().as_bytes()) + .map_err(|e| io::Error::new(io::ErrorKind::InvalidInput, e))?; + let rc = unsafe { libc::clonefile(src_c.as_ptr(), dst_c.as_ptr(), 0) }; + if rc == 0 { Ok(()) } else { Err(io::Error::last_os_error()) } +} + +#[cfg(target_os = "linux")] +fn try_reflink(src: &Path, dst: &Path) -> io::Result<()> { + use std::os::unix::io::AsRawFd; + // FICLONE = _IOW(0x94, 9, int); see linux/fs.h. 
+    const FICLONE: libc::c_ulong = 0x40049409;
+    let src_f = File::open(src)?;
+    let dst_f = OpenOptions::new().write(true).create(true).truncate(true).open(dst)?;
+    let rc = unsafe { libc::ioctl(dst_f.as_raw_fd(), FICLONE, src_f.as_raw_fd()) };
+    if rc == 0 {
+        Ok(())
+    } else {
+        let err = io::Error::last_os_error();
+        let _ = std::fs::remove_file(dst);
+        Err(err)
+    }
+}
+
+#[cfg(not(any(target_os = "macos", target_os = "linux")))]
+fn try_reflink(_src: &Path, _dst: &Path) -> io::Result<()> {
+    Err(io::Error::new(io::ErrorKind::Unsupported, "reflink not supported on this OS"))
+}
+
+/// Sidecar file holding the dirty sector list. Written next to the overlay
+/// (e.g. `foo.overlay.dirty`). Format: binary, `u64` little-endian count
+/// followed by that many `u64` sector LBAs, also LE. Compact enough that
+/// flushing it on shutdown or on a periodic schedule is cheap.
+fn dirty_sidecar_path(overlay_path: &str) -> PathBuf {
+    PathBuf::from(format!("{}.dirty", overlay_path))
+}
+
+fn load_dirty_sidecar(path: &Path) -> io::Result<HashSet<u64>> {
+    if !path.exists() { return Ok(HashSet::new()); }
+    let mut f = File::open(path)?;
+    let mut count_buf = [0u8; 8];
+    if f.read_exact(&mut count_buf).is_err() { return Ok(HashSet::new()); }
+    let count = u64::from_le_bytes(count_buf) as usize;
+    let mut set = HashSet::with_capacity(count);
+    let mut buf = [0u8; 8];
+    for _ in 0..count {
+        if f.read_exact(&mut buf).is_err() { break; }
+        set.insert(u64::from_le_bytes(buf));
+    }
+    Ok(set)
+}
+
+fn save_dirty_sidecar(path: &Path, dirty: &HashSet<u64>) -> io::Result<()> {
+    // Write atomically: write to a temp file then rename.
+ let tmp = path.with_extension("dirty.tmp"); + { + let mut f = File::create(&tmp)?; + let count = dirty.len() as u64; + f.write_all(&count.to_le_bytes())?; + for &s in dirty { + f.write_all(&s.to_le_bytes())?; + } + f.sync_all()?; + } + std::fs::rename(&tmp, path)?; + Ok(()) +} + pub struct CowDisk { base: File, overlay: File, @@ -32,16 +118,23 @@ impl CowDisk { .create(true) .open(overlay_path)?; - // Rebuild the dirty set from the overlay file size. - // The overlay is a sparse file with the same layout as the base. - // Any sector that has been written occupies space, but we can't easily - // detect sparse holes portably. Instead, track dirty sectors in memory - // and accept that a fresh start after crash loses the dirty set - // (overlay is deleted on state load anyway). - let dirty = HashSet::new(); + // Recover the dirty set from our sidecar file (written on flush / + // shutdown by previous runs). If the sidecar is missing we start + // empty — any prior writes in the overlay file are effectively + // invisible until the sidecar gets written. This is deliberate: + // "dirty" means "the host finished writing this sector," not "the + // file has some bytes here" (sparse allocation can contain partial + // writes from an interrupted run, which can't be trusted). 
+ let sidecar = dirty_sidecar_path(overlay_path); + let dirty = load_dirty_sidecar(&sidecar).unwrap_or_default(); - eprintln!("iris: COW overlay active (base: {}, overlay: {})", base_path, overlay_path); - eprintln!("iris: to reset disk to clean state, delete {}", overlay_path); + eprintln!("iris: COW overlay active (base: {}, overlay: {}, dirty sectors: {})", + base_path, overlay_path, dirty.len()); + if dirty.is_empty() && std::fs::metadata(overlay_path).map(|m| m.len()).unwrap_or(0) > 0 { + eprintln!("iris: note: overlay file has data but no .dirty sidecar — prior writes are not in use"); + } + eprintln!("iris: to reset disk to clean state, delete {} and {}", + overlay_path, sidecar.display()); Ok(Self { base, @@ -152,11 +245,107 @@ impl CowDisk { self.dirty.clear(); self.overlay.set_len(0)?; self.overlay.seek(SeekFrom::Start(0))?; + // Also clear the sidecar so we don't "remember" sectors that no + // longer exist after the truncation. + let _ = std::fs::remove_file(dirty_sidecar_path(&self.overlay_path)); Ok(()) } + /// Flush the overlay file's data and persist the dirty sector set to + /// the sidecar. Call this on clean shutdown or before snapshot save so + /// a subsequent run can read back what we wrote. + pub fn flush(&mut self) -> io::Result<()> { + self.overlay.sync_all()?; + save_dirty_sidecar(&dirty_sidecar_path(&self.overlay_path), &self.dirty) + } + /// Number of dirty sectors in the overlay. pub fn dirty_count(&self) -> usize { self.dirty.len() } + + /// Copy the current overlay file to `dest` and return the dirty sector + /// list (sorted, ascending). Used by snapshot save so the entire disk + /// state — base + overlay — is captured consistently with RAM. 
+    pub fn export_overlay(&mut self, dest: &Path) -> io::Result<Vec<u64>> {
+        self.overlay.sync_all()?;
+        reflink_or_copy(Path::new(&self.overlay_path), dest)?;
+        let mut dirty: Vec<u64> = self.dirty.iter().copied().collect();
+        dirty.sort_unstable();
+        Ok(dirty)
+    }
+
+    /// Replace the overlay contents with `source` and adopt `dirty` as the
+    /// dirty sector set. Used by snapshot load. If `source` doesn't exist
+    /// the overlay is truncated instead (matches `reset_overlay` behavior —
+    /// handles old snapshots without overlay data).
+    pub fn import_overlay(&mut self, source: &Path, dirty: Vec<u64>) -> io::Result<()> {
+        if source.exists() {
+            reflink_or_copy(source, Path::new(&self.overlay_path))?;
+        } else {
+            // Clear the overlay: nothing saved for this device.
+            std::fs::File::create(&self.overlay_path)?;
+        }
+        // Reopen the file handle — the previous File object points at the
+        // old inode (which std::fs::copy replaced on some platforms).
+        self.overlay = OpenOptions::new()
+            .read(true)
+            .write(true)
+            .create(true)
+            .open(&self.overlay_path)?;
+        self.dirty = dirty.into_iter().collect();
+        Ok(())
+    }
+}
+
+impl Drop for CowDisk {
+    fn drop(&mut self) {
+        if let Err(e) = self.flush() {
+            eprintln!("iris: COW flush on drop failed for {}: {} (writes may be lost)",
+                self.overlay_path, e);
+        }
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use std::io::Write as _;
+
+    fn unique_tmp(tag: &str, ext: &str) -> PathBuf {
+        let nanos = std::time::SystemTime::now()
+            .duration_since(std::time::UNIX_EPOCH)
+            .map(|d| d.as_nanos())
+            .unwrap_or(0);
+        std::env::temp_dir().join(format!("iris-cow-{}-{}.{}", tag, nanos, ext))
+    }
+
+    #[test]
+    fn reflink_or_copy_preserves_bytes() {
+        let src = unique_tmp("reflink-src", "bin");
+        let dst = unique_tmp("reflink-dst", "bin");
+        let payload: Vec<u8> = (0u8..=255).cycle().take(64 * 1024 + 17).collect();
+        {
+            let mut f = File::create(&src).unwrap();
+            f.write_all(&payload).unwrap();
+            f.sync_all().unwrap();
+        }
+        reflink_or_copy(&src, 
&dst).expect("reflink_or_copy"); + let read_back = std::fs::read(&dst).unwrap(); + assert_eq!(read_back, payload); + let _ = std::fs::remove_file(&src); + let _ = std::fs::remove_file(&dst); + } + + #[test] + fn reflink_or_copy_overwrites_existing_dst() { + let src = unique_tmp("reflink-src2", "bin"); + let dst = unique_tmp("reflink-dst2", "bin"); + std::fs::write(&src, b"new content").unwrap(); + std::fs::write(&dst, b"old content that is longer than the new one").unwrap(); + reflink_or_copy(&src, &dst).expect("overwrite"); + assert_eq!(std::fs::read(&dst).unwrap(), b"new content"); + let _ = std::fs::remove_file(&src); + let _ = std::fs::remove_file(&dst); + } } diff --git a/src/ds1x86.rs b/src/ds1x86.rs index 0acd115..c22f619 100644 --- a/src/ds1x86.rs +++ b/src/ds1x86.rs @@ -394,3 +394,39 @@ impl Saveable for Ds1x86 { Ok(()) } } + +#[cfg(test)] +mod tests { + use super::*; + + /// Phase 1.7 round-trip: a fresh RTC loaded from a captured save_state must + /// re-serialize byte-identically. Save_state flushes the live host clock + /// into regs when TE is set, so we clear TE first to make the test stable. + #[test] + fn save_load_round_trip() { + let src = Ds1x86::new(8192); + // Disable transfer-enable so save_state doesn't tick the clock between + // calls; mutate a few NVRAM bytes outside the time-keeping registers. + { + let mut d = src.data.lock(); + d.regs[CMD_REG_OFFSET] &= !TE_BIT; + d.regs[64] = 0xa5; + d.regs[1024] = 0x5a; + d.regs[8190] = 0xff; + } + let v1 = src.save_state(); + + let dst = Ds1x86::new(8192); + dst.load_state(&v1).expect("load_state"); + // Same: clear TE on dst before re-serializing so its save_state path + // matches src's behavior. (load_state preserves the TE bit from v1, so + // it should already be cleared, but be defensive.) 
+        {
+            let mut d = dst.data.lock();
+            d.regs[CMD_REG_OFFSET] &= !TE_BIT;
+        }
+        let v2 = dst.save_state();
+
+        assert_eq!(v1, v2, "Ds1x86 save_state mismatch after load_state round-trip");
+    }
+}
diff --git a/src/eeprom_93c56.rs b/src/eeprom_93c56.rs
index 4aa4ad0..011b701 100644
--- a/src/eeprom_93c56.rs
+++ b/src/eeprom_93c56.rs
@@ -387,4 +387,25 @@ mod tests {
 
         assert_eq!(data, 0xABCD);
     }
+
+    /// Phase 1.7 round-trip: a fresh Eeprom loaded from a captured save_state
+    /// must re-serialize byte-identically. Catches load_state_mut forgetting a
+    /// field that save_state writes.
+    #[test]
+    fn save_load_round_trip() {
+        // Mutate a few words so we're not testing the all-default 0xFFFF value.
+        let mut src = Eeprom93c56::new();
+        src.write_enable = true;
+        src.data[0] = 0xdead;
+        src.data[42] = 0xbeef;
+        src.data[64] = 0x1234;
+        src.data[127] = 0xcafe;
+        let v1 = src.save_state_owned();
+
+        let mut dst = Eeprom93c56::new();
+        dst.load_state_mut(&v1).expect("load_state_mut");
+        let v2 = dst.save_state_owned();
+
+        assert_eq!(v1, v2, "EEPROM save_state mismatch after load_state round-trip");
+    }
 }
\ No newline at end of file
diff --git a/src/hpc3.rs b/src/hpc3.rs
index 5e8bc69..f1b4f5c 100644
--- a/src/hpc3.rs
+++ b/src/hpc3.rs
@@ -1067,7 +1067,15 @@ impl Hpc3 {
     }
 
     pub fn add_scsi_device(&self, id: usize, path: &str, is_cdrom: bool, discs: Vec<String>, overlay: bool) -> std::io::Result<()> {
-        self.scsi_dev.add_device(id, path, is_cdrom, discs, overlay)
+        self.scsi_dev.add_device(id, path, is_cdrom, discs, overlay, None)
+    }
+
+    /// Same as `add_scsi_device` but lets the caller specify where the COW
+    /// overlay file lives. Used by `--ci` mode to keep per-process overlays
+    /// in `/tmp` so parallel `--ci` instances (and an interactive session)
+    /// don't race on the same file.
+    pub fn add_scsi_device_with_overlay(&self, id: usize, path: &str, is_cdrom: bool, discs: Vec<String>, overlay: bool, overlay_path: &str) -> std::io::Result<()> {
+        self.scsi_dev.add_device(id, path, is_cdrom, discs, overlay, Some(overlay_path))
+    }
 
     pub fn ioc(&self) -> &Ioc {
diff --git a/src/ioc.rs b/src/ioc.rs
index a906901..6d24a00 100644
--- a/src/ioc.rs
+++ b/src/ioc.rs
@@ -199,6 +199,18 @@ pub struct Ioc {
 
 impl Ioc {
     pub fn new(guinness: bool) -> Self {
+        Self::new_inner(guinness, false)
+    }
+
+    /// CI-mode constructor: skips TCP serial backend binding on SCC channels
+    /// so multiple instances can run in parallel without port conflicts.
+    /// Caller must install backends via `scc().set_backend_{a,b}` before the
+    /// first `start()`.
+    pub fn new_ci(guinness: bool) -> Self {
+        Self::new_inner(guinness, true)
+    }
+
+    fn new_inner(guinness: bool, ci_mode: bool) -> Self {
         let sys_id = if guinness { 0x26 } else { 0x11 }; // primarily prom looks at bit 1 to detect full house.
         let state = Arc::new(Mutex::new(IocState {
             sys_id,
@@ -241,9 +253,15 @@ impl Ioc {
             source: IocInterrupt::KbMouse,
         });
 
+        let scc = if ci_mode {
+            Z85c30::new_null(Some(serial_irq))
+        } else {
+            Z85c30::new(Some(serial_irq))
+        };
+
         Self {
             state,
-            scc: Z85c30::new(Some(serial_irq)),
+            scc,
             pit: Pit8254::new(1_000_000, Some(timer0_cb), Some(timer1_cb), None),
             ps2: Arc::new(Ps2Controller::new(Some(ps2_cb))),
             guinness,
@@ -673,3 +691,41 @@ impl Saveable for Ioc {
         Ok(())
     }
 }
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    /// Phase 1.7 round-trip: a fresh IOC loaded from a captured save_state must
+    /// re-serialize byte-identically. Catches load_state forgetting any of the
+    /// 16 register fields that save_state writes.
+    #[test]
+    fn save_load_round_trip() {
+        // new_ci uses null serial backends — avoids TCP port binding under
+        // concurrent test runs.
+ let src = Ioc::new_ci(true); + { + let mut s = src.state.lock(); + s.l0_stat = 0x12; s.l0_mask = 0x34; + s.l1_stat = 0x56; s.l1_mask = 0x78; + s.map_stat = 0x9a; s.map_mask0 = 0xbc; s.map_mask1 = 0xde; + s.map_pol = 0xf0; s.err_stat = 0x01; + s.gc_select = 0x0f; s.gen_cntl = 0xa5; s.panel = 0x5a; + s.read_reg = 0xff; s.dma_sel = 0x33; + s.reset_reg = 0x77; s.write_reg = 0xee; + // load_state re-runs update_interrupts, so the saved snapshot must + // already reflect the cascade-derived bits (MAP_INT0/MAP_INT1) for + // v1 to round-trip cleanly. In a real save these are always + // up-to-date because the bus driver runs update_interrupts on + // every register write. + s.update_interrupts(); + } + let v1 = src.save_state(); + + let dst = Ioc::new_ci(true); + dst.load_state(&v1).expect("load_state"); + let v2 = dst.save_state(); + + assert_eq!(v1, v2, "Ioc save_state mismatch after load_state round-trip"); + } +} diff --git a/src/iris_ci_main.rs b/src/iris_ci_main.rs new file mode 100644 index 0000000..6720e05 --- /dev/null +++ b/src/iris_ci_main.rs @@ -0,0 +1,906 @@ +//! `iris-ci` — ergonomic wrapper around the iris CI control socket. +//! +//! Replaces the raw `printf '...' | nc -U /tmp/iris.sock` pattern that's +//! awkward to type, brittle to quote, and tedious to compose. Every +//! socket-level operation gets a typed clap subcommand, plus macros for +//! the recurring multi-step rituals (boot, login, run, put/get). +//! +//! The headline ergonomic wins: +//! - `iris-ci boot` does the full PROM-menu-to-login dance in one command. +//! - `iris-ci run "ls /tmp"` sends the command, waits for the prompt, and +//! returns just the captured stdout + exit status. +//! - `iris-ci put localfile.tar` copies a host file into the guest with +//! the right `dd bs=512 count=N` recipe baked in — no foot-gun. +//! - `iris-ci script tests/scenario.iris` runs a sequence of commands +//! and prints per-step status + duration. +//! +//! 
Returns 0 on success, 1 on socket error, 2 on iris error response,
+//! 3 on local error (file not found etc).
+
+use clap::{Parser, Subcommand};
+use serde_json::{json, Value};
+use std::io::{BufRead, BufReader, Write};
+use std::net::Shutdown;
+use std::os::unix::net::UnixStream;
+use std::path::PathBuf;
+use std::time::{Duration, Instant};
+
+const DEFAULT_SOCKET: &str = "/tmp/iris.sock";
+const PROMPT_RE: &str = "IRIS"; // Match "IRIS N# " — N is a counter that increments
+const RC_MARKER: &str = "IRIS-CI-RC=";
+
+#[derive(Parser, Debug)]
+#[command(
+    name = "iris-ci",
+    about = "Drive the iris CI control socket without raw nc + JSON.",
+    version
+)]
+struct Cli {
+    /// Path to the iris CI Unix socket. Override with $IRIS_SOCKET.
+    #[arg(long, global = true)]
+    socket: Option<PathBuf>,
+
+    /// Print raw JSON responses instead of pretty output.
+    #[arg(long, global = true)]
+    json: bool,
+
+    /// Be silent on success (for use in scripts).
+    #[arg(long, short = 'q', global = true)]
+    quiet: bool,
+
+    #[command(subcommand)]
+    cmd: Cmd,
+}
+
+#[derive(Subcommand, Debug)]
+enum Cmd {
+    /// Liveness check — returns "ok" if the socket is reachable.
+    Ping,
+    /// Start the CPU thread (no-op if already running).
+    Start,
+    /// Cleanly shut down iris.
+    Quit,
+
+    /// Save the current machine state to saves/<name>/.
+    Save {
+        name: String,
+        #[arg(long)]
+        description: Option<String>,
+    },
+    /// Disk-backed full restore (~145 ms cold, sets up rollback checkpoint).
+    Restore { name: String },
+    /// In-memory rewind to the last `restore` checkpoint (~40 ms).
+    Rollback,
+    /// List all saved snapshots.
+    List,
+    /// Show metadata (manifest, schema_version, size) for one snapshot.
+    Info { name: String },
+    /// Delete a snapshot (does NOT free CAS chunks; run `gc` after).
+    Delete { name: String },
+    /// Render the snapshot parent-chain tree.
+    Tree,
+    /// Compare two snapshots: device + RAM-chunk + COW-sector deltas.
+    Diff { a: String, b: String },
+    /// Sweep CAS chunks not referenced by any kept snapshot.
+    Gc,
+    /// Run the snapshot determinism validator.
+    Validate {
+        name: String,
+        /// Number of instructions to step in each pass (default 1_000_000).
+        #[arg(short = 'n', long, default_value_t = 1_000_000)]
+        n: u64,
+    },
+    /// Save the REX3 framebuffer to a PNG.
+    Screenshot { path: PathBuf },
+
+    /// Send keystrokes to the IRIX serial console.
+    SerialSend {
+        text: String,
+        /// Don't append \r to the text.
+        #[arg(long)]
+        no_cr: bool,
+    },
+    /// Drain the serial output buffer and print it.
+    SerialRead,
+    /// Wait until `pattern` appears in serial output (or timeout).
+    SerialWait {
+        pattern: String,
+        #[arg(long, default_value_t = 30)]
+        timeout: u64,
+    },
+
+    /// Boot from PROM menu through to the IRIS console login prompt.
+    Boot {
+        /// Total timeout in seconds for boot to reach the login prompt.
+        #[arg(long, default_value_t = 240)]
+        timeout: u64,
+    },
+    /// Send root login + dismiss the vt100 prompt + wait for the shell.
+    Login {
+        #[arg(default_value = "root")]
+        user: String,
+        /// Optional password (most IRIX root accounts have none).
+        #[arg(long)]
+        password: Option<String>,
+    },
+    /// Send a shell command, wait for the prompt, return stdout + exit code.
+    Run {
+        command: String,
+        /// Guest shell. csh uses $status; sh uses $?.
+        #[arg(long, default_value = "csh")]
+        shell: String,
+        #[arg(long, default_value_t = 60)]
+        timeout: u64,
+    },
+    /// Drain output and wait for the next shell prompt.
+    WaitPrompt {
+        #[arg(long, default_value_t = 30)]
+        timeout: u64,
+    },
+
+    /// Copy a host file into the guest. Handles bs=512 + count automatically.
+    Put {
+        host_path: PathBuf,
+        /// Where to put it inside IRIX. Defaults to /tmp/.
+        #[arg(long)]
+        to: Option<String>,
+        #[arg(long, default_value_t = 120)]
+        timeout: u64,
+    },
+    /// Pull a guest file out to the host. Handles tar + scratch round-trip.
+    Get {
+        guest_path: String,
+        /// Where to write on the host. 
Defaults to ./.
+        #[arg(long)]
+        to: Option<PathBuf>,
+        #[arg(long, default_value_t = 120)]
+        timeout: u64,
+    },
+
+    /// Raw scratch volume operations (bypass guest interaction).
+    #[command(subcommand)]
+    Scratch(ScratchCmd),
+
+    /// Pull a snapshot from a remote registry (e.g. `http://localhost:8765`).
+    Pull { url: String, name: String },
+    /// Push a snapshot to a remote registry.
+    Push { url: String, name: String },
+
+    /// Run a sequence of iris-ci commands from a file (one per line, # comments).
+    Script { path: PathBuf },
+}
+
+#[derive(Subcommand, Debug)]
+enum ScratchCmd {
+    /// Copy raw bytes from a host file into the scratch payload area.
+    Write {
+        path: PathBuf,
+        #[arg(long, default_value_t = 0)]
+        offset: u64,
+    },
+    /// Copy raw bytes from the scratch payload area into a host file.
+    Read {
+        path: PathBuf,
+        #[arg(long, default_value_t = 0)]
+        offset: u64,
+        #[arg(long)]
+        length: Option<u64>,
+    },
+    /// Zero the scratch payload area (preserves the SGI VH at sector 0).
+    Clear,
+    /// Show scratch volume size + payload offset.
+    Info,
+}
+
+// ---- main / dispatch ---------------------------------------------------------
+
+fn main() {
+    let cli = Cli::parse();
+    let socket = cli
+        .socket
+        .clone()
+        .or_else(|| std::env::var_os("IRIS_SOCKET").map(PathBuf::from))
+        .unwrap_or_else(|| PathBuf::from(DEFAULT_SOCKET));
+    let opts = Opts {
+        socket,
+        json: cli.json,
+        quiet: cli.quiet,
+    };
+
+    let exit = match dispatch(&opts, cli.cmd) {
+        Ok(()) => 0,
+        Err(Error::Local(e)) => {
+            eprintln!("iris-ci: {}", e);
+            3
+        }
+        Err(Error::Connection(e)) => {
+            eprintln!("iris-ci: connect {}: {}", opts.socket.display(), e);
+            1
+        }
+        Err(Error::Iris(e)) => {
+            eprintln!("iris-ci: iris error: {}", e);
+            2
+        }
+    };
+    std::process::exit(exit);
+}
+
+fn dispatch(opts: &Opts, cmd: Cmd) -> Result<()> {
+    match cmd {
+        Cmd::Ping => simple(opts, "ping", json!({}), "ok"),
+        Cmd::Start => simple(opts, "start", json!({}), "started"),
+        Cmd::Quit => simple(opts, "quit", json!({}), "quit"),
+        Cmd::Save { name, description } => {
+            // Build the status message before `name` is moved into the args.
+            let msg = format!("saved: {}", name);
+            let mut args = json!({"name": name});
+            if let Some(d) = description { args["description"] = Value::String(d); }
+            simple(opts, "save", args, &msg)
+        }
+        Cmd::Restore { name } => simple(opts, "restore", json!({"name": name}), "restored"),
+        Cmd::Rollback => simple(opts, "rollback", json!({}), "rolled back"),
+        Cmd::List => cmd_list(opts),
+        Cmd::Info { name } => cmd_info(opts, &name),
+        Cmd::Delete { name } => simple(opts, "delete", json!({"name": name}), "deleted"),
+        Cmd::Tree => cmd_tree(opts),
+        Cmd::Diff { a, b } => cmd_diff(opts, &a, &b),
+        Cmd::Gc => cmd_gc(opts),
+        Cmd::Validate { name, n } => cmd_validate(opts, &name, n),
+        Cmd::Screenshot { path } => simple(opts, "screenshot", json!({"path": path.display().to_string()}), "screenshot"),
+
+        Cmd::SerialSend { text, no_cr } => {
+            let data = if no_cr { text } else { format!("{}\r", text) };
+            simple(opts, "serial-send", json!({"data": data}), "sent")
+        }
+        Cmd::SerialRead => 
cmd_serial_read(opts),
+        Cmd::SerialWait { pattern, timeout } => cmd_serial_wait(opts, &pattern, timeout * 1000),
+
+        Cmd::Boot { timeout } => cmd_boot(opts, timeout),
+        Cmd::Login { user, password } => cmd_login(opts, &user, password.as_deref()),
+        Cmd::Run { command, shell, timeout } => cmd_run(opts, &command, &shell, timeout * 1000),
+        Cmd::WaitPrompt { timeout } => cmd_wait_prompt(opts, timeout * 1000),
+
+        Cmd::Put { host_path, to, timeout } => cmd_put(opts, &host_path, to.as_deref(), timeout * 1000),
+        Cmd::Get { guest_path, to, timeout } => cmd_get(opts, &guest_path, to.as_deref(), timeout * 1000),
+
+        Cmd::Scratch(s) => cmd_scratch(opts, s),
+
+        Cmd::Pull { url, name } => cmd_pull(opts, &url, &name),
+        Cmd::Push { url, name } => cmd_push(opts, &url, &name),
+
+        Cmd::Script { path } => cmd_script(opts, &path),
+    }
+}
+
+// ---- error type --------------------------------------------------------------
+
+#[derive(Debug)]
+enum Error {
+    Local(String),
+    Connection(std::io::Error),
+    Iris(String),
+}
+type Result<T> = std::result::Result<T, Error>;
+impl From<std::io::Error> for Error {
+    fn from(e: std::io::Error) -> Self { Error::Connection(e) }
+}
+
+struct Opts {
+    socket: PathBuf,
+    json: bool,
+    quiet: bool,
+}
+
+// ---- socket client -----------------------------------------------------------
+
+/// Send one JSON command, return the parsed response data on `ok:true`,
+/// or an Error on connection failure or `ok:false`.
+///
+/// Protocol detail: the server (`src/ci.rs::handle_client`) keeps the
+/// connection open and reads requests in a loop, expecting the client to
+/// close. We send our single request, then read exactly one newline-
+/// terminated response line, then drop the stream — the server's reader
+/// loop sees EOF and exits cleanly.
+fn send(opts: &Opts, cmd: &str, args: Value) -> Result<Value> {
+    let s = UnixStream::connect(&opts.socket)?;
+    s.set_read_timeout(Some(Duration::from_secs(300))).ok();
+    let req = json!({"cmd": cmd, "args": args});
+    let line = format!("{}\n", serde_json::to_string(&req).expect("json"));
+    {
+        let mut writer = s.try_clone()?;
+        writer.write_all(line.as_bytes())?;
+        writer.flush()?;
+    }
+    // Read exactly one line of response.
+    let mut reader = BufReader::new(s.try_clone()?);
+    let mut buf = String::new();
+    reader.read_line(&mut buf)?;
+    // Tell the server we're done so its read loop exits.
+    let _ = s.shutdown(Shutdown::Both);
+    let trimmed = buf.trim();
+    if trimmed.is_empty() {
+        return Err(Error::Iris("empty response".into()));
+    }
+    let resp: Value = serde_json::from_str(trimmed).map_err(|e| {
+        Error::Iris(format!("bad response: {}: {}", e, trimmed))
+    })?;
+    if resp.get("ok").and_then(|v| v.as_bool()) != Some(true) {
+        let msg = resp
+            .get("error")
+            .and_then(|v| v.as_str())
+            .unwrap_or("unknown error");
+        return Err(Error::Iris(format!("{}: {}", cmd, msg)));
+    }
+    Ok(resp.get("data").cloned().unwrap_or(Value::Null))
+}
+
+// ---- 1:1 commands with pretty output -----------------------------------------
+
+fn simple(opts: &Opts, cmd: &str, args: Value, ok_msg: &str) -> Result<()> {
+    let data = send(opts, cmd, args)?;
+    print_response(opts, ok_msg, &data);
+    Ok(())
+}
+
+fn print_response(opts: &Opts, ok_msg: &str, data: &Value) {
+    if opts.json {
+        println!("{}", serde_json::to_string_pretty(data).unwrap_or_else(|_| data.to_string()));
+        return;
+    }
+    if opts.quiet { return; }
+    if data.is_null() || (data.is_object() && data.as_object().map(|m| m.is_empty()).unwrap_or(true)) {
+        println!("{}", ok_msg);
+    } else {
+        println!("{}: {}", ok_msg, data);
+    }
+}
+
+fn cmd_list(opts: &Opts) -> Result<()> {
+    let data = send(opts, "list", json!({}))?;
+    if opts.json { println!("{}", serde_json::to_string_pretty(&data).unwrap_or_default()); return Ok(()); }
+    if let 
Some(arr) = data.get("snapshots").and_then(|v| v.as_array()) {
+        for s in arr {
+            if let Some(name) = s.as_str() {
+                println!("{}", name);
+            }
+        }
+    }
+    Ok(())
+}
+
+fn cmd_info(opts: &Opts, name: &str) -> Result<()> {
+    let data = send(opts, "info", json!({"name": name}))?;
+    if opts.json { println!("{}", serde_json::to_string_pretty(&data).unwrap_or_default()); return Ok(()); }
+    let f = |k: &str| data.get(k).cloned().unwrap_or(Value::Null);
+    println!("name {}", f("name"));
+    println!("schema_version {}", f("schema_version"));
+    println!("host_arch {}", f("host_arch"));
+    println!("created_at_unix {}", f("created_at_unix"));
+    println!("bytes_on_disk {}", f("bytes_on_disk"));
+    if let Some(p) = data.get("parent") { if !p.is_null() { println!("parent {}", p); } }
+    if let Some(d) = data.get("description") { if !d.is_null() { println!("description {}", d); } }
+    if let Some(b) = data.get("installed_bundles") { println!("installed {}", b); }
+    Ok(())
+}
+
+fn cmd_tree(opts: &Opts) -> Result<()> {
+    let data = send(opts, "tree", json!({}))?;
+    if opts.json { println!("{}", serde_json::to_string_pretty(&data).unwrap_or_default()); return Ok(()); }
+    if let Some(t) = data.get("tree").and_then(|v| v.as_str()) {
+        println!("{}", t);
+    }
+    Ok(())
+}
+
+fn cmd_diff(opts: &Opts, a: &str, b: &str) -> Result<()> {
+    let data = send(opts, "diff", json!({"a": a, "b": b}))?;
+    if opts.json { println!("{}", serde_json::to_string_pretty(&data).unwrap_or_default()); return Ok(()); }
+    println!("diff {} → {}", a, b);
+    if let Some(arr) = data.get("devices_changed").and_then(|v| v.as_array()) {
+        let names: Vec<String> = arr.iter().filter_map(|v| v.as_str().map(str::to_string)).collect();
+        if !names.is_empty() {
+            println!(" devices changed: {}", names.join(", "));
+        }
+    }
+    if let Some(arr) = data.get("devices_unchanged").and_then(|v| v.as_array()) {
+        let names: Vec<String> = arr.iter().filter_map(|v| v.as_str().map(str::to_string)).collect();
+        if !names.is_empty() {
+            println!(" devices 
unchanged: {}", names.join(", ")); + } + } + if let (Some(c), Some(t)) = ( + data.get("bank_changed_chunks").and_then(|v| v.as_array()), + data.get("bank_total_chunks").and_then(|v| v.as_array()), + ) { + for i in 0..c.len().min(t.len()) { + let c = c[i].as_u64().unwrap_or(0); + let t = t[i].as_u64().unwrap_or(0); + if t > 0 { + println!(" bank{}: {}/{} chunks changed", i, c, t); + } + } + } + if let Some(arr) = data.get("cow_diff").and_then(|v| v.as_array()) { + for entry in arr { + let id = entry.get("scsi_id").and_then(|v| v.as_u64()).unwrap_or(0); + let only_a = entry.get("only_a").and_then(|v| v.as_u64()).unwrap_or(0); + let only_b = entry.get("only_b").and_then(|v| v.as_u64()).unwrap_or(0); + let both = entry.get("both").and_then(|v| v.as_u64()).unwrap_or(0); + println!(" scsi{}: only-a={} only-b={} both={}", id, only_a, only_b, both); + } + } + Ok(()) +} + +fn cmd_gc(opts: &Opts) -> Result<()> { + let data = send(opts, "gc", json!({}))?; + if opts.json { println!("{}", serde_json::to_string_pretty(&data).unwrap_or_default()); return Ok(()); } + let removed = data.get("removed_chunks").and_then(|v| v.as_u64()).unwrap_or(0); + let bytes = data.get("bytes_freed").and_then(|v| v.as_u64()).unwrap_or(0); + let live = data.get("live_chunks").and_then(|v| v.as_u64()).unwrap_or(0); + println!("gc: {} chunks removed, {} bytes freed, {} live", removed, bytes, live); + Ok(()) +} + +fn cmd_validate(opts: &Opts, name: &str, n: u64) -> Result<()> { + let data = send(opts, "validate", json!({"name": name, "n_instructions": n}))?; + if opts.json { println!("{}", serde_json::to_string_pretty(&data).unwrap_or_default()); return Ok(()); } + if let Some(s) = data.get("summary").and_then(|v| v.as_str()) { + println!("{}", s); + } + if data.get("deterministic").and_then(|v| v.as_bool()) != Some(true) { + // Validation surfaced a real divergence — exit with iris-error code so + // scripts can branch on it. 
+ return Err(Error::Iris("non-deterministic".into())); + } + Ok(()) +} + +// ---- serial helpers ---------------------------------------------------------- + +fn cmd_serial_read(opts: &Opts) -> Result<()> { + let data = send(opts, "serial-read", json!({}))?; + if let Some(s) = data.as_str() { + if !s.is_empty() { + // Re-render \r\n cleanly — IRIX uses CRLF on the wire. + print!("{}", s.replace("\r\n", "\n").replace('\r', "\n")); + } + } + Ok(()) +} + +fn cmd_serial_wait(opts: &Opts, pattern: &str, timeout_ms: u64) -> Result<()> { + let data = send(opts, "wait-serial", json!({"pattern": pattern, "timeout_ms": timeout_ms}))?; + if let Some(s) = data.as_str() { + if !opts.quiet { + print!("{}", s.replace("\r\n", "\n").replace('\r', "\n")); + } + } + Ok(()) +} + +// ---- boot/login/run macros -------------------------------------------------- + +fn cmd_boot(opts: &Opts, timeout_s: u64) -> Result<()> { + let deadline = Instant::now() + Duration::from_secs(timeout_s); + if !opts.quiet { eprintln!("boot: starting CPU"); } + send(opts, "start", json!({}))?; + if !opts.quiet { eprintln!("boot: waiting for PROM menu"); } + wait_with_deadline(opts, "Option?", deadline)?; + if !opts.quiet { eprintln!("boot: PROM reached, selecting 1) Start System"); } + send(opts, "serial-send", json!({"data": "1\r"}))?; + if !opts.quiet { eprintln!("boot: waiting for kernel boot to login prompt"); } + wait_with_deadline(opts, "IRIS console login", deadline)?; + if !opts.quiet { eprintln!("boot: ready at login"); } + Ok(()) +} + +fn cmd_login(opts: &Opts, user: &str, password: Option<&str>) -> Result<()> { + send(opts, "serial-send", json!({"data": format!("{}\r", user)}))?; + // IRIX presents `TERM = (vt100)` after the username; pressing enter accepts. 
+ std::thread::sleep(Duration::from_millis(2000)); + if let Some(p) = password { + send(opts, "wait-serial", json!({"pattern": "Password:", "timeout_ms": 5000}))?; + send(opts, "serial-send", json!({"data": format!("{}\r", p)}))?; + } + send(opts, "serial-send", json!({"data": "\r"}))?; + let deadline = Instant::now() + Duration::from_secs(15); + wait_with_deadline(opts, "#", deadline)?; + if !opts.quiet { eprintln!("login: shell ready"); } + Ok(()) +} + +fn cmd_wait_prompt(opts: &Opts, timeout_ms: u64) -> Result<()> { + let deadline = Instant::now() + Duration::from_millis(timeout_ms); + wait_with_deadline(opts, PROMPT_RE, deadline)?; + Ok(()) +} + +/// Run a command and return captured stdout + exit code. Internal helper +/// shared by `cmd_run` (which prints stdout) and `cmd_get` (which parses it). +fn run_capture(opts: &Opts, command: &str, shell: &str, timeout_ms: u64) -> Result<(String, i32)> { + let rc_var = match shell { + "csh" | "tcsh" => "$status", + "sh" | "bash" | "ksh" => "$?", + other => return Err(Error::Local(format!("unknown shell {}", other))), + }; + // Drain anything stale before sending. + let _ = send(opts, "serial-read", json!({}))?; + let line = format!("{}; echo {}{}\r", command, RC_MARKER, rc_var); + send(opts, "serial-send", json!({"data": line}))?; + // Single wait: pattern `\nIRIS-CI-RC=` only matches at the start of the + // output line (the typed-input echo line has `IRIS-CI-RC=$status` inline, + // so it has no preceding newline immediately before the marker). + let pat = format!("\n{}", RC_MARKER); + let captured = send( + opts, + "wait-serial", + json!({"pattern": pat, "timeout_ms": timeout_ms}), + )?; + let raw = captured.as_str().unwrap_or("").to_string(); + // Drain trailing chars (rc digits + next prompt). 
+    std::thread::sleep(Duration::from_millis(150));
+    let trailing = send(opts, "serial-read", json!({}))?;
+    let trailing_s = trailing.as_str().unwrap_or("");
+    let rc = parse_rc(&format!("{}{}", RC_MARKER, trailing_s)).unwrap_or(-1);
+    let stdout = extract_run_stdout(&raw);
+    Ok((stdout, rc))
+}
+
+/// Send a command, wait for a sentinel, print stdout, fail on non-zero exit.
+/// csh: appends `; echo IRIS-CI-RC=$status`. sh: appends `; echo IRIS-CI-RC=$?`.
+fn cmd_run(opts: &Opts, command: &str, shell: &str, timeout_ms: u64) -> Result<()> {
+    let (stdout, rc) = run_capture(opts, command, shell, timeout_ms)?;
+    if !stdout.is_empty() {
+        println!("{}", stdout);
+    }
+    if rc != 0 {
+        return Err(Error::Iris(format!("guest exit {}", rc)));
+    }
+    Ok(())
+}
+
+/// `wait-serial` for `\nIRIS-CI-RC=` returns bytes shaped like:
+///
+///     <typed echo>\r\n<stdout>\r?\nIRIS-CI-RC=
+///
+/// Skip the first newline (end of the typed echo), strip the trailing
+/// pattern + its leading newline, normalise CRLF, return.
+fn extract_run_stdout(buf: &str) -> String {
+    // Drop the typed-echo line (everything up through and including its
+    // first \n).
+    let after_echo = match buf.find('\n') {
+        Some(i) => &buf[i + 1..],
+        None => buf,
+    };
+    // Drop the trailing `\nIRIS-CI-RC=` — the exact pattern we waited for.
+    let trimmed = match after_echo.rfind(RC_MARKER) {
+        Some(p) => &after_echo[..p],
+        None => after_echo,
+    };
+    trimmed
+        .trim_end_matches(['\r', '\n'])
+        .replace("\r\n", "\n")
+        .replace('\r', "\n")
+}
+
+/// Pull the digits after IRIS-CI-RC= out of a buffer.
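The digit scan described above can be exercised on its own; a hedged standalone sketch (editor's illustration only — `RC_MARKER` is assumed to be the literal `IRIS-CI-RC=` that the run commands echo):

```rust
// Standalone restatement of the rc-digit extraction contract: find the
// last marker occurrence, then take ASCII digits (plus a leading '-')
// until the first non-digit character.
const RC_MARKER: &str = "IRIS-CI-RC=";

fn parse_rc_sketch(buf: &str) -> Option<i32> {
    let pos = buf.rfind(RC_MARKER)?;
    let tail = &buf[pos + RC_MARKER.len()..];
    let digits: String = tail
        .chars()
        .take_while(|c| c.is_ascii_digit() || *c == '-')
        .collect();
    digits.parse().ok()
}

fn main() {
    // rfind means a repeated marker resolves to the most recent echo.
    assert_eq!(parse_rc_sketch("ls\r\nhello\r\nIRIS-CI-RC=0\r\n# "), Some(0));
    assert_eq!(parse_rc_sketch("IRIS-CI-RC=1\nIRIS-CI-RC=127"), Some(127));
    assert_eq!(parse_rc_sketch("no marker here"), None);
}
```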
+fn parse_rc(buf: &str) -> Option<i32> {
+    let pos = buf.rfind(RC_MARKER)?;
+    let tail = &buf[pos + RC_MARKER.len()..];
+    let digits: String = tail.chars().take_while(|c| c.is_ascii_digit() || *c == '-').collect();
+    digits.parse().ok()
+}
+
+fn wait_with_deadline(opts: &Opts, pattern: &str, deadline: Instant) -> Result<()> {
+    let now = Instant::now();
+    if now >= deadline {
+        return Err(Error::Iris(format!("wait {}: deadline already passed", pattern)));
+    }
+    let timeout_ms = (deadline - now).as_millis() as u64;
+    send(opts, "wait-serial", json!({"pattern": pattern, "timeout_ms": timeout_ms}))
+        .map(|_| ())
+}
+
+// ---- put / get (the bs=512 foot-gun killers) --------------------------------
+
+fn cmd_put(opts: &Opts, host_path: &std::path::Path, to: Option<&str>, timeout_ms: u64) -> Result<()> {
+    let bytes = std::fs::read(host_path)
+        .map_err(|e| Error::Local(format!("read {}: {}", host_path.display(), e)))?;
+    let basename = host_path
+        .file_name()
+        .and_then(|s| s.to_str())
+        .unwrap_or("inject.bin");
+    let guest_path = to
+        .map(String::from)
+        .unwrap_or_else(|| format!("/tmp/{}", basename));
+
+    // 1. Write to scratch volume payload area at offset 0.
+    let scratch_payload = send(opts, "scratch-info", json!({}))?;
+    let payload_size = scratch_payload
+        .get("payload_size_bytes")
+        .and_then(|v| v.as_u64())
+        .ok_or_else(|| Error::Iris("scratch-info: no payload_size_bytes".into()))?;
+    if (bytes.len() as u64) > payload_size {
+        return Err(Error::Local(format!(
+            "{} bytes too large for {} byte scratch payload",
+            bytes.len(),
+            payload_size
+        )));
+    }
+    let host_path_for_socket = host_path.canonicalize()
+        .unwrap_or_else(|_| host_path.to_path_buf());
+    send(
+        opts,
+        "scratch-write",
+        json!({"host_path": host_path_for_socket.display().to_string()}),
+    )?;
+    if !opts.quiet {
+        eprintln!("put: {} bytes staged in scratch", bytes.len());
+    }
+
+    // 2. Drive the guest to read exactly the right number of 512-byte sectors.
+ // Use `>&` for combined stderr+stdout (csh syntax — `2>&1` is sh-only). + // cmd_run wraps with `; echo IRIS-CI-RC=$status` itself. + let sectors = (bytes.len() as u64).div_ceil(512); + let dd_cmd = format!( + "dd if=/dev/rdsk/dks0d2s0 of={} bs=512 count={} >& /dev/null", + guest_path, sectors + ); + cmd_run(opts, &dd_cmd, "csh", timeout_ms)?; + + // 3. Truncate the guest file to the original byte length (dd reads in + // sector multiples, so a 28-byte input becomes 512 bytes on the guest). + // `dd of=FILE bs=1 seek=N count=0` is POSIX and IRIX-clean. + let dd_trunc = format!( + "dd if=/dev/null of={} bs=1 seek={} count=0 >& /dev/null", + guest_path, + bytes.len() + ); + cmd_run(opts, &dd_trunc, "csh", 10_000)?; + + if !opts.quiet { + eprintln!("put: {} → {} ({} bytes)", host_path.display(), guest_path, bytes.len()); + } + Ok(()) +} + +fn cmd_get(opts: &Opts, guest_path: &str, to: Option<&std::path::Path>, timeout_ms: u64) -> Result<()> { + let host_path: PathBuf = match to { + Some(p) => p.to_path_buf(), + None => { + let basename = guest_path + .rsplit('/') + .next() + .filter(|s| !s.is_empty()) + .unwrap_or("captured.bin"); + PathBuf::from(basename) + } + }; + + // 1. Zero scratch payload so trailing zeros after the file are unambiguous. + send(opts, "scratch-clear", json!({}))?; + + // 2. Drive the guest to write the file to scratch with conv=sync padding. + // csh redirect syntax: `>&` for stdout+stderr. cmd_run adds the + // rc-marker echo itself. + let dd_cmd = format!( + "dd if={} of=/dev/rdsk/dks0d2s0 bs=512 conv=sync,notrunc >& /dev/null", + guest_path + ); + cmd_run(opts, &dd_cmd, "csh", timeout_ms)?; + + // 3. Look up the guest file size so we know how much to slice off the + // scratch payload (which is now padded to a 512-byte boundary). Use + // a pure-shell approach: `wc -c` outputs just the byte count. + // `awk` is also available but `wc -c` is simpler to parse. 
+    let stat_cmd = format!("wc -c < {}", guest_path);
+    let (stat_stdout, stat_rc) = run_capture(opts, &stat_cmd, "csh", 10_000)?;
+    if stat_rc != 0 {
+        return Err(Error::Iris(format!(
+            "guest stat of {} failed (exit {})", guest_path, stat_rc
+        )));
+    }
+    let size_bytes = stat_stdout
+        .lines()
+        .filter_map(|l| l.trim().parse::<u64>().ok())
+        .next()
+        .ok_or_else(|| Error::Iris(format!(
+            "couldn't parse byte count from `wc -c < {}`: {:?}",
+            guest_path, stat_stdout
+        )))?;
+
+    // 4. Read the exact number of bytes back from scratch.
+    let host_abs = std::path::absolute(&host_path).unwrap_or_else(|_| host_path.clone());
+    send(
+        opts,
+        "scratch-read",
+        json!({
+            "to_path": host_abs.display().to_string(),
+            "length": size_bytes,
+            "offset": 0,
+        }),
+    )?;
+    if !opts.quiet {
+        eprintln!(
+            "get: {} ({} bytes) → {}",
+            guest_path,
+            size_bytes,
+            host_path.display()
+        );
+    }
+    Ok(())
+}
+
+// ---- scratch raw -------------------------------------------------------------
+
+fn cmd_scratch(opts: &Opts, s: ScratchCmd) -> Result<()> {
+    match s {
+        ScratchCmd::Write { path, offset } => {
+            let abs = path
+                .canonicalize()
+                .map_err(|e| Error::Local(format!("{}: {}", path.display(), e)))?;
+            simple(
+                opts,
+                "scratch-write",
+                json!({"host_path": abs.display().to_string(), "offset": offset}),
+                "wrote",
+            )
+        }
+        ScratchCmd::Read { path, offset, length } => {
+            let abs = std::path::absolute(&path).unwrap_or(path);
+            let mut args = json!({"to_path": abs.display().to_string(), "offset": offset});
+            if let Some(n) = length {
+                args["length"] = json!(n);
+            }
+            simple(opts, "scratch-read", args, "read")
+        }
+        ScratchCmd::Clear => simple(opts, "scratch-clear", json!({}), "cleared"),
+        ScratchCmd::Info => {
+            let data = send(opts, "scratch-info", json!({}))?;
+            if opts.json {
+                println!("{}", serde_json::to_string_pretty(&data).unwrap_or_default());
+            } else {
+                println!("path {}", data.get("path").cloned().unwrap_or(Value::Null));
+                println!("size_bytes {}",
data.get("size_bytes").cloned().unwrap_or(Value::Null)); + println!("payload_offset {}", data.get("payload_offset").cloned().unwrap_or(Value::Null)); + println!("payload_size_bytes {}", data.get("payload_size_bytes").cloned().unwrap_or(Value::Null)); + } + Ok(()) + } + } +} + +// ---- pull / push ------------------------------------------------------------- + +fn cmd_pull(opts: &Opts, url: &str, name: &str) -> Result<()> { + let data = send(opts, "pull", json!({"url": url, "name": name}))?; + if opts.json { println!("{}", serde_json::to_string_pretty(&data).unwrap_or_default()); return Ok(()); } + let f = |k: &str| data.get(k).and_then(|v| v.as_u64()).unwrap_or(0); + println!( + "pull {}: {} chunks fetched, {} skipped, {} files, {} bytes", + name, + f("chunks_fetched"), + f("chunks_skipped"), + f("files_transferred"), + f("bytes_transferred"), + ); + Ok(()) +} + +fn cmd_push(opts: &Opts, url: &str, name: &str) -> Result<()> { + let data = send(opts, "push", json!({"url": url, "name": name}))?; + if opts.json { println!("{}", serde_json::to_string_pretty(&data).unwrap_or_default()); return Ok(()); } + let f = |k: &str| data.get(k).and_then(|v| v.as_u64()).unwrap_or(0); + println!( + "push {}: {} chunks uploaded, {} skipped, {} files, {} bytes", + name, + f("chunks_uploaded"), + f("chunks_skipped"), + f("files_transferred"), + f("bytes_transferred"), + ); + Ok(()) +} + +// ---- script file mode -------------------------------------------------------- + +/// Parse a script line into argv tokens. Supports double-quoted strings with +/// `\"` and `\\` escapes — same surface as a typical shell so users can +/// write `run "echo hello"` without bash being involved. 
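For contrast, the failure mode the script tokenizer's quoting rules exist to avoid — naive whitespace splitting breaks a quoted argument into pieces (a runnable editor's illustration; per the contract above, `tokenize` yields two tokens for this input):

```rust
fn main() {
    // A script line with a quoted argument. The documented contract is
    // that this becomes two tokens: `run` and `echo hello`.
    let line = r#"run "echo hello""#;

    // Naive whitespace splitting instead yields three broken tokens,
    // which is why the script parser carries its own quote-aware lexer.
    let naive: Vec<&str> = line.split_whitespace().collect();
    assert_eq!(naive, vec!["run", "\"echo", "hello\""]);
}
```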
+fn tokenize(line: &str) -> std::result::Result<Vec<String>, String> {
+    let mut out = Vec::new();
+    let mut cur = String::new();
+    let mut in_quote = false;
+    let mut escape = false;
+    let mut started = false;
+    for c in line.chars() {
+        if escape {
+            cur.push(c);
+            escape = false;
+            continue;
+        }
+        if in_quote {
+            match c {
+                '\\' => escape = true,
+                '"' => {
+                    in_quote = false;
+                    out.push(std::mem::take(&mut cur));
+                    started = false;
+                }
+                _ => cur.push(c),
+            }
+            continue;
+        }
+        match c {
+            '"' => { in_quote = true; started = true; }
+            ' ' | '\t' => {
+                if started { out.push(std::mem::take(&mut cur)); started = false; }
+            }
+            _ => { cur.push(c); started = true; }
+        }
+    }
+    if in_quote { return Err("unterminated quote".into()); }
+    if started { out.push(cur); }
+    Ok(out)
+}
+
+fn cmd_script(opts: &Opts, path: &std::path::Path) -> Result<()> {
+    let text = std::fs::read_to_string(path)
+        .map_err(|e| Error::Local(format!("read {}: {}", path.display(), e)))?;
+    let mut overall_failed = false;
+    for (lineno, raw) in text.lines().enumerate() {
+        let line = raw.trim();
+        if line.is_empty() || line.starts_with('#') {
+            continue;
+        }
+        let tokens = tokenize(line).map_err(|e| Error::Local(format!("line {}: {}", lineno + 1, e)))?;
+        if tokens.is_empty() {
+            continue;
+        }
+
+        // Re-parse via clap to dispatch.
+        let mut argv = vec!["iris-ci".to_string()];
+        argv.extend(tokens.iter().cloned());
+        let cli = match Cli::try_parse_from(&argv) {
+            Ok(c) => c,
+            Err(e) => {
+                eprintln!("[line {}] parse error: {}", lineno + 1, e);
+                overall_failed = true;
+                break;
+            }
+        };
+
+        // Inherit our --socket / --json / --quiet from the outer invocation.
+        let sub_opts = Opts {
+            socket: opts.socket.clone(),
+            json: opts.json || cli.json,
+            quiet: opts.quiet || cli.quiet,
+        };
+
+        let pretty = format_step(line);
+        let t = Instant::now();
+        let res = dispatch(&sub_opts, cli.cmd);
+        let elapsed = t.elapsed();
+        match res {
+            Ok(()) => {
+                if !opts.quiet {
+                    println!("[ok {:>6.0?}] {}", elapsed, pretty);
+                }
+            }
+            Err(e) => {
+                eprintln!("[FAIL {:>6.0?}] {}: {:?}", elapsed, pretty, e);
+                overall_failed = true;
+                break;
+            }
+        }
+    }
+    if overall_failed {
+        return Err(Error::Local("script aborted on error".into()));
+    }
+    Ok(())
+}
+
+/// Truncate long script lines for display.
+fn format_step(line: &str) -> String {
+    if line.len() <= 72 {
+        line.to_string()
+    } else {
+        // chars().take keeps the cut safe on multi-byte UTF-8 input, where
+        // a byte-index slice could panic off a char boundary.
+        format!("{}…", line.chars().take(71).collect::<String>())
+    }
+}
diff --git a/src/lib.rs b/src/lib.rs
index 47fe9f6..210d77c 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -43,6 +43,11 @@ pub mod disp;
 pub mod exp;
 pub mod gdb_stub;
 pub mod snapshot;
+pub mod sgi_vh;
+pub mod chunk_store;
+pub mod validate;
+pub mod registry;
+pub mod ci;
 pub mod hptimer;
 pub mod hptimer_tests;
 pub mod vga_font;
diff --git a/src/machine.rs b/src/machine.rs
index 38d17da..1cc9e6f 100644
--- a/src/machine.rs
+++ b/src/machine.rs
@@ -31,7 +31,8 @@ use crate::hpc3::Hpc3;
 use crate::ioc::Ioc;
 use crate::monitor::Monitor;
 use crate::rex3::Rex3;
-use crate::snapshot::Snapshot;
+use crate::snapshot::{Snapshot, Manifest, SCHEMA_VERSION, ChunksManifest};
+use crate::chunk_store::{ChunkStore, get_chunks_as_words, put_words_as_chunks};
 use crate::hptimer::TimerManager;
 
 pub fn emulator_name() -> &'static str {
@@ -69,10 +70,64 @@ pub struct Machine {
     pub event_tx: mpsc::SyncSender,
     event_rx: Option>,
     timer_manager: Arc,
+    /// When `cfg.ci` is set, the channel-B backend is replaced by this
+    /// in-process one so the CI control socket can drive the console.
+    ci_serial: Option<Arc<crate::z85c30::CiSerialBackend>>,
+    /// Most recent snapshot restored via `ci_restore`. `rollback` reuses this
+    /// name as the fallback path if the in-memory checkpoint is absent.
+    last_restore: Option<String>,
+    /// In-memory copy of the just-loaded state, taken at the end of every
+    /// successful `ci_restore`. Lets `ci_rollback` skip disk IO and TOML
+    /// re-parsing — paste back the cached `toml::Value`s and `memcpy` the
+    /// bank/framebuffer buffers. Cleared on any explicit `load_snapshot`
+    /// outside the CI path.
+    last_restore_checkpoint: Option<RollbackCheckpoint>,
+    /// Path of the configured scratch SCSI volume, if any. The CI socket reads
+    /// and writes this file directly (with the machine briefly stopped) to
+    /// inject/exfiltrate files without going through the network. None when no
+    /// SCSI device has `scratch = true` set in the config.
+    scratch_path: Option<std::path::PathBuf>,
+}
+
+/// In-memory snapshot of the just-restored guest state. Populated at the end
+/// of `ci_restore`; consumed by `ci_rollback`. Trades ~270 MB of RSS for
+/// disk-IO-free rollback.
+struct RollbackCheckpoint {
+    /// Snapshot directory (saves/<name>/) — re-used by rollback to reflink
+    /// the COW overlays back into place.
+    overlay_dir: std::path::PathBuf,
+    /// Per-SCSI-id dirty sector lists from cow.toml at the time of restore.
+    overlay_sets: Vec<(usize, Vec<u64>)>,
+
+    /// Native-endian RAM bank words. `bank_words[i].len() ==
+    /// banks[i].size_bytes / 4` for present banks; populated for all four.
+    bank_words: [Vec<u32>; 4],
+
+    /// Framebuffer contents (RGB, aux). `None` when running headless.
+    framebuffers: Option<(Vec<u32>, Vec<u32>)>,
+
+    /// Parsed device save_state TOMLs. Holding `toml::Value` directly skips
+    /// the ~80 ms cpu.toml string-parse cost on every rollback.
+    cpu: toml::Value,
+    mc: toml::Value,
+    ioc: toml::Value,
+    scc: toml::Value,
+    pit: toml::Value,
+    ps2: toml::Value,
+    rtc: toml::Value,
+    eeprom: toml::Value,
+    scsi: toml::Value,
+    seeq: toml::Value,
+    hpc3: toml::Value,
+    rex3: Option<toml::Value>,
+}
 
 impl Machine {
     pub fn new(cfg: MachineConfig) -> Self {
+        // Capture config flags that are needed after the local `cfg` binding
+        // is shadowed later in this function.
+        let ci_enabled = cfg.ci;
+
         // 0. Shared EEPROM
         let eeprom = Arc::new(Mutex::new(Eeprom93c56::new()));
 
@@ -102,8 +157,22 @@ impl Machine {
         let l1i_fetch_count = Arc::new(AtomicU64::new(0)); // L1-I fetch counter
         let uncached_fetch_count = Arc::new(AtomicU64::new(0)); // uncached instruction fetches
 
-        // HPC3 (512KB at 0x1FB80000)
-        let ioc = Ioc::new(true);
+        // HPC3 (512KB at 0x1FB80000). CI mode skips the SCC TCP backend
+        // bindings so multiple `--ci` instances can coexist.
+        let ioc = if ci_enabled { Ioc::new_ci(true) } else { Ioc::new(true) };
+
+        // CI mode replaces the default TCP backend on channel B (tty1, the
+        // SGI serial console) with an in-process backend the control socket
+        // drives directly. Channel A (tty2) keeps its default TCP backend.
+        // Must happen before any peripheral `start()` call (which clones the
+        // current backend Arc into the RX/TX threads).
+        let ci_serial = if ci_enabled {
+            let b = Arc::new(crate::z85c30::CiSerialBackend::new());
+            ioc.scc().set_backend_b(b.clone());
+            Some(b)
+        } else {
+            None
+        };
         let timer_manager = Arc::new(TimerManager::new());
         ioc.set_timer_manager(timer_manager.clone());
         ioc.set_heartbeat(heartbeat.clone());
@@ -113,23 +182,68 @@ impl Machine {
         // Attach SCSI devices from config (IDs 1–7).
         let mut scsi_ids: Vec = cfg.scsi.keys().copied().collect();
         scsi_ids.sort();
+        // CI mode: isolate each COW overlay under /tmp so an interactive
+        // iris holding {base}.overlay can coexist with any number of `--ci`
+        // processes. Files are kept for post-mortem inspection; cleanup
+        // happens on machine drop below.
+        let ci_pid = std::process::id();
+        // Track the on-disk path of any scratch device so the CI socket can
+        // read/write its bytes directly (Phase 2.4).
+        let mut scratch_path: Option<std::path::PathBuf> = None;
         for id in scsi_ids {
             let dev = &cfg.scsi[&id];
-            // For CD-ROMs: build ordered disc list; first entry is mounted now.
-            // For HDDs: disc list is unused (empty).
+ // Scratch volume: pre-create a raw file with a minimal SGI Volume + // Header if it doesn't exist. Refuse cdrom/overlay combinations — + // scratch must be a host-writable raw file. Default size 64 MB. + // + // The VH lays out partition 7 ("vol") spanning sectors 8..end and + // partition 8 ("vh") spanning sectors 0..7 (the VH itself). + // Without a VH, IRIX recognises the device but returns I/O error + // on every read because /dev/rdsk/dks0dNvh and /dev/rdsk/dks0dNvol + // both consult the partition table at sector 0. + // + // Convention: host writes payload via scratch-write at offset >= + // SCRATCH_PAYLOAD_OFFSET (4096). Guest reads from offset 0 of + // /dev/rdsk/dks0dNvol (which maps to sector 8 of the disk by + // partition 7's first_block=8). + if dev.scratch { + if dev.cdrom || dev.overlay { + println!("Note: SCSI ID {}: scratch=true is incompatible with cdrom/overlay; ignoring scratch flag", id); + } else { + let path = std::path::Path::new(&dev.path); + if !path.exists() { + let size_mb = dev.size_mb.unwrap_or(64) as u64; + let bytes = size_mb * 1024 * 1024; + match crate::sgi_vh::create_scratch_image(path, bytes) { + Ok(()) => println!("iris: created scratch volume {} ({} MB, with SGI VH)", dev.path, size_mb), + Err(e) => println!("Note: could not create scratch volume {}: {}", dev.path, e), + } + } + if scratch_path.is_some() { + println!("Note: multiple scratch SCSI devices configured; CI socket will use the lowest-id one"); + } else { + scratch_path = Some(path.to_path_buf()); + } + } + } let (path, discs) = if dev.cdrom { let mut list = dev.discs.clone(); if list.is_empty() { list.push(dev.path.clone()); } else if list[0] != dev.path { - // Ensure path is front of list if explicitly set list.insert(0, dev.path.clone()); } (list[0].clone(), list) } else { (dev.path.clone(), vec![]) }; - if let Err(e) = hpc3.add_scsi_device(id as usize, &path, dev.cdrom, discs, dev.overlay) { + let result = if ci_enabled && dev.overlay && !dev.cdrom { + let 
ci_overlay = format!("/tmp/iris-ci-{}-scsi{}.overlay", ci_pid, id);
+                hpc3.add_scsi_device_with_overlay(id as usize, &path, dev.cdrom, discs, dev.overlay, &ci_overlay)
+            } else {
+                hpc3.add_scsi_device(id as usize, &path, dev.cdrom, discs, dev.overlay)
+            };
+            if let Err(e) = result {
                 println!("Note: Could not attach {} to SCSI ID {}: {}", path, id, e);
             }
         }
@@ -292,7 +406,35 @@ impl Machine {
             event_tx,
             event_rx: Some(event_rx),
             timer_manager,
+            ci_serial,
+            last_restore: None,
+            last_restore_checkpoint: None,
+            scratch_path,
+        }
+    }
+
+    /// Path of the configured scratch SCSI volume, if any. Used by the CI
+    /// socket scratch-{write,read,clear,info} commands to act on the file
+    /// directly while the machine is briefly stopped.
+    pub fn scratch_path(&self) -> Option<&std::path::Path> {
+        self.scratch_path.as_deref()
+    }
+
+    /// Briefly stop the machine, run `work`, then restart peripherals and the
+    /// CPU only if it was running before. Used by the scratch-write/read/clear
+    /// CI commands to mutate the scratch file without racing the SCSI device's
+    /// in-flight reads. CPU stays stopped if the harness hasn't called `start`
+    /// yet — a file injected before boot stays injected, the CPU doesn't get
+    /// auto-started.
+    pub fn with_paused<R>(&mut self, work: impl FnOnce() -> R) -> R {
+        let was_running = self.cpu.is_running();
+        self.stop();
+        let r = work();
+        self.restart_peripherals();
+        if was_running {
+            self.cpu.start();
         }
+        r
     }
 
     pub fn start(&mut self) {
@@ -301,10 +443,19 @@ impl Machine {
         self.hpc3.start();
         if let Some(rex3) = &self._phys.rex3 { rex3.start(); }
 
-        // Start monitor server on localhost:8888
-        self.monitor.clone().start_server("127.0.0.1:8888".to_string());
+        // Monitor server on localhost:8888. Skipped in CI mode — the control
+        // socket replaces it, and binding a fixed port would prevent parallel
+        // `--ci` instances.
+        if self.ci_serial.is_none() {
+            self.monitor.clone().start_server("127.0.0.1:8888".to_string());
+        }
+
+        // CI mode: the harness drives startup via `restore` / `start`. Don't
+        // autostart the CPU so the first command finds a quiet machine.
         #[cfg(not(any(debug_assertions, feature = "developer")))]
-        self.cpu.start();
+        if self.ci_serial.is_none() {
+            self.cpu.start();
+        }
     }
 
     /// Register a SystemController with the monitor so that `reset`, `save`,
@@ -424,6 +575,181 @@ impl Machine {
         MipsCpuDebugAdapter::new(self.cpu.clone())
     }
 
+    /// The in-process serial backend used by `--ci` mode. `None` in
+    /// interactive mode.
+    pub fn get_ci_serial(&self) -> Option<Arc<crate::z85c30::CiSerialBackend>> {
+        self.ci_serial.clone()
+    }
+
+    /// Start the CPU thread. Called explicitly by the CI `start` command or
+    /// by `ci_restore`; in `--ci` mode the CPU is not autostarted in
+    /// `start()` — the harness drives startup via `restore`.
+    pub fn cpu_start(&self) {
+        self.cpu.start();
+    }
+
+    /// Step the CPU `n` instructions in-line on the calling thread, with all
+    /// peripheral threads stopped so the CPU sees no external interrupts.
+    /// Used by the Phase 3.3 snapshot determinism validator.
+    /// Caller must arrange `load_snapshot_paused` first.
+    pub fn cpu_step_n_inline(&self, n: u64) -> Result {
+        self.cpu.step_n_inline(n)
+    }
+
+    /// Snapshot the deterministic-from-state CPU registers.
+    pub fn cpu_state_digest(&self) -> Result {
+        self.cpu.state_digest()
+    }
+
+    /// Full rewind: load the named snapshot, which now captures the COW
+    /// overlay too so the filesystem state is deterministic per snapshot.
+    /// The CPU resumes automatically (load_snapshot restarts it). After the
+    /// load, an in-memory checkpoint of the just-restored state is taken so
+    /// the next `ci_rollback` can run without touching disk.
+    pub fn ci_restore(&mut self, name: &str) -> Result<(), String> {
+        // Clear any leftover serial bytes from the previous run so the
+        // next command doesn't see stale output.
+ if let Some(ci) = &self.ci_serial { + ci.reset(); + } + + self.load_snapshot(name)?; + self.last_restore = Some(name.to_string()); + // Capture the rollback checkpoint. If this fails, the restore still + // succeeded — rollback will fall back to the disk path. + match self.capture_rollback_checkpoint(name) { + Ok(cp) => self.last_restore_checkpoint = Some(cp), + Err(e) => { + eprintln!("ci_restore: rollback checkpoint capture failed: {} — rollback will use the disk path", e); + self.last_restore_checkpoint = None; + } + } + Ok(()) + } + + /// Roll back to the state captured at the last `ci_restore`. Uses the + /// in-memory checkpoint when present; falls back to a disk reload if it's + /// absent (legacy snapshot loaded outside CI, or capture failed). + pub fn ci_rollback(&mut self) -> Result<(), String> { + if let Some(ci) = &self.ci_serial { + ci.reset(); + } + + // Take the checkpoint out so the apply path can hold &cp without + // borrowing self at the same time. Restored after apply so repeated + // rollbacks work. + let cp = match self.last_restore_checkpoint.take() { + Some(cp) => cp, + None => { + let name = self.last_restore.clone() + .ok_or_else(|| "no previous restore to roll back to".to_string())?; + eprintln!("ci_rollback: no in-memory checkpoint — falling back to disk reload"); + return self.ci_restore(&name); + } + }; + let result = self.apply_rollback_checkpoint(&cp); + self.last_restore_checkpoint = Some(cp); + result + } + + /// Capture in-memory state for fast rollback. Stops the CPU briefly. 
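The take-then-put-back move in `ci_rollback` above — lifting the checkpoint out of `self` so applying it doesn't alias a mutable borrow — is a general `Option::take` pattern. A minimal standalone sketch (the `Holder` type is hypothetical, not part of iris):

```rust
// Sketch of the Option::take pattern: move the cached value out of
// self, operate while &mut self is borrowable again, then restore the
// value so repeated calls keep working.
struct Holder {
    cached: Option<String>,
    applied: usize,
}

impl Holder {
    fn apply(&mut self, v: &str) {
        self.applied += v.len();
    }

    fn rollback(&mut self) -> Result<(), String> {
        let cp = self.cached.take().ok_or("no checkpoint")?;
        self.apply(&cp);        // &mut self is free: cp no longer lives in self
        self.cached = Some(cp); // put back for the next rollback
        Ok(())
    }
}

fn main() {
    let mut h = Holder { cached: Some("abcd".into()), applied: 0 };
    assert!(h.rollback().is_ok());
    assert!(h.rollback().is_ok()); // checkpoint was restored, so it works twice
    assert_eq!(h.applied, 8);

    let mut empty = Holder { cached: None, applied: 0 };
    assert!(empty.rollback().is_err());
}
```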
+    fn capture_rollback_checkpoint(&mut self, name: &str) -> Result<RollbackCheckpoint, String> {
+        self.stop();
+
+        let cpu = self.cpu.save_state();
+        let mc = self.mc.save_state();
+        let ioc = self.hpc3.ioc().save_state();
+        let scc = self.hpc3.ioc().scc().save_state();
+        let pit = self.hpc3.ioc().pit().save_state();
+        let ps2 = self.hpc3.ioc().ps2().save_state();
+        let rtc = self.hpc3.rtc().save_state();
+        let eeprom = self.hpc3.eeprom().lock().save_state_owned();
+        let scsi = self.hpc3.scsi().save_state();
+        let seeq = self.hpc3.seeq().save_state();
+        let hpc3 = self.hpc3.save_state();
+        let rex3 = self._phys.rex3.as_ref().map(|r| r.save_state());
+
+        let bank_words: [Vec<u32>; 4] = [
+            self._phys.snapshot_bank_inmem(0),
+            self._phys.snapshot_bank_inmem(1),
+            self._phys.snapshot_bank_inmem(2),
+            self._phys.snapshot_bank_inmem(3),
+        ];
+
+        let framebuffers = self._phys.rex3.as_ref()
+            .map(|r| r.snapshot_framebuffers_inmem());
+
+        // Re-read cow.toml so rollback knows which dirty sectors to import
+        // back. The file was just consumed by load_snapshot but it's tiny and
+        // re-reading from page cache is cheap (~µs).
+        let overlay_dir = std::path::PathBuf::from("saves").join(name);
+        let snap = Snapshot::new(&overlay_dir);
+        let mut overlay_sets: Vec<(usize, Vec<u64>)> = Vec::new();
+        if let Ok(cow_toml) = snap.read_toml("cow.toml") {
+            if let Some(tbl) = cow_toml.as_table() {
+                for (k, v) in tbl {
+                    let Some(id_str) = k.strip_prefix("scsi") else { continue };
+                    let Ok(id) = id_str.parse::<usize>() else { continue };
+                    let Some(arr) = v.as_array() else { continue };
+                    let dirty: Vec<u64> = arr.iter()
+                        .filter_map(|x| x.as_integer().map(|i| i as u64))
+                        .collect();
+                    overlay_sets.push((id, dirty));
+                }
+            }
+        }
+
+        self.restart_peripherals();
+        self.cpu.start();
+
+        Ok(RollbackCheckpoint {
+            overlay_dir,
+            overlay_sets,
+            bank_words,
+            framebuffers,
+            cpu, mc, ioc, scc, pit, ps2, rtc, eeprom, scsi, seeq, hpc3, rex3,
+        })
+    }
+
+    /// Apply an in-memory checkpoint, restoring the guest to the state at
+    /// the moment of capture. Skips disk IO and TOML string-parsing.
+    fn apply_rollback_checkpoint(&mut self, cp: &RollbackCheckpoint) -> Result<(), String> {
+        self.stop();
+        self.power_on_devices();
+
+        self.cpu.load_state(&cp.cpu)?;
+        self.mc.load_state(&cp.mc)?;
+        self.hpc3.ioc().load_state(&cp.ioc)?;
+        self.hpc3.ioc().scc().load_state(&cp.scc)?;
+        self.hpc3.ioc().pit().load_state(&cp.pit)?;
+        self.hpc3.ioc().ps2().load_state(&cp.ps2)?;
+        self.hpc3.rtc().load_state(&cp.rtc)?;
+        self.hpc3.eeprom().lock().load_state_mut(&cp.eeprom)?;
+        self.hpc3.scsi().load_state(&cp.scsi)?;
+        self.hpc3.seeq().load_state(&cp.seeq)?;
+        self.hpc3.load_state(&cp.hpc3)?;
+        if let (Some(rex3), Some(rex3_toml)) = (&self._phys.rex3, &cp.rex3) {
+            rex3.load_state(rex3_toml)?;
+        }
+
+        for (i, words) in cp.bank_words.iter().enumerate() {
+            self._phys.restore_bank_inmem(i, words);
+        }
+        if let (Some(rex3), Some((rgb, aux))) = (&self._phys.rex3, &cp.framebuffers) {
+            rex3.restore_framebuffers_inmem(rgb, aux);
+        }
+
+        // Reflink the overlay back into place.
saves//scsi*.overlay is + // unchanged by guest writes (writes go to the live overlay), so this + // can re-import directly. + self.hpc3.scsi().import_overlays(&cp.overlay_dir, &cp.overlay_sets) + .map_err(|e| format!("rollback: COW overlay import: {}", e))?; + + self.restart_peripherals(); + self.cpu.start(); + Ok(()) + } + /// Restart peripherals (MC, HPC3, REX3) without restarting the monitor server. fn restart_peripherals(&mut self) { self.mc.start(); @@ -477,135 +803,258 @@ impl Machine { let snap = Snapshot::new(&dir); snap.ensure_dir().map_err(|e| e.to_string())?; - // CPU + TLB - let cpu_toml = self.cpu.save_state(); - snap.write_toml("cpu.toml", &cpu_toml).map_err(|e| e.to_string())?; - - // Memory Controller - let mc_toml = self.mc.save_state(); - snap.write_toml("mc.toml", &mc_toml).map_err(|e| e.to_string())?; - - // IOC - let ioc_toml = self.hpc3.ioc().save_state(); - snap.write_toml("ioc.toml", &ioc_toml).map_err(|e| e.to_string())?; - - // SCC (Z85C30 serial) - let scc_toml = self.hpc3.ioc().scc().save_state(); - snap.write_toml("scc.toml", &scc_toml).map_err(|e| e.to_string())?; - - // PIT (8254 timer) - let pit_toml = self.hpc3.ioc().pit().save_state(); - snap.write_toml("pit.toml", &pit_toml).map_err(|e| e.to_string())?; - - // PS2 - let ps2_toml = self.hpc3.ioc().ps2().save_state(); - snap.write_toml("ps2.toml", &ps2_toml).map_err(|e| e.to_string())?; - - // RTC (DS1x86) - let rtc_toml = self.hpc3.rtc().save_state(); - snap.write_toml("rtc.toml", &rtc_toml).map_err(|e| e.to_string())?; - - // EEPROM (93C56) - let eeprom_toml = self.hpc3.eeprom().lock().save_state_owned(); - snap.write_toml("eeprom.toml", &eeprom_toml).map_err(|e| e.to_string())?; - - // SCSI (WD33C93A) - let scsi_toml = self.hpc3.scsi().save_state(); - snap.write_toml("scsi.toml", &scsi_toml).map_err(|e| e.to_string())?; - - // Seeq8003 (Ethernet) - let seeq_toml = self.hpc3.seeq().save_state(); - snap.write_toml("seeq.toml", &seeq_toml).map_err(|e| e.to_string())?; - - // HPC3 - 
let hpc3_toml = self.hpc3.save_state(); - snap.write_toml("hpc3.toml", &hpc3_toml).map_err(|e| e.to_string())?; - - // REX3 + // Write the manifest first so `read_manifest` succeeds even if a later + // step crashes — the partial snapshot is at least diagnosable. + let mut manifest = Manifest::for_current_save(); + manifest.parent = self.last_restore.clone(); + snap.write_manifest(&manifest).map_err(|e| e.to_string())?; + let sv = manifest.schema_version; + + // Device state — schema_version=2 writes *.bin (postcard-encoded + // BinValue tree); legacy writes *.toml. write_state encapsulates the + // choice so this orchestrator stays format-agnostic. + snap.write_state("cpu", &self.cpu.save_state(), sv).map_err(|e| e.to_string())?; + snap.write_state("mc", &self.mc.save_state(), sv).map_err(|e| e.to_string())?; + snap.write_state("ioc", &self.hpc3.ioc().save_state(), sv).map_err(|e| e.to_string())?; + snap.write_state("scc", &self.hpc3.ioc().scc().save_state(), sv).map_err(|e| e.to_string())?; + snap.write_state("pit", &self.hpc3.ioc().pit().save_state(), sv).map_err(|e| e.to_string())?; + snap.write_state("ps2", &self.hpc3.ioc().ps2().save_state(), sv).map_err(|e| e.to_string())?; + snap.write_state("rtc", &self.hpc3.rtc().save_state(), sv).map_err(|e| e.to_string())?; + snap.write_state("eeprom", &self.hpc3.eeprom().lock().save_state_owned(), sv).map_err(|e| e.to_string())?; + snap.write_state("scsi", &self.hpc3.scsi().save_state(), sv).map_err(|e| e.to_string())?; + snap.write_state("seeq", &self.hpc3.seeq().save_state(), sv).map_err(|e| e.to_string())?; + snap.write_state("hpc3", &self.hpc3.save_state(), sv).map_err(|e| e.to_string())?; + + // REX3 (optional — absent in headless config). Framebuffers are + // included in the chunks manifest below for v3+; v2 wrote them as + // standalone .bin files. 
if let Some(rex3) = &self._phys.rex3 { - let rex3_toml = rex3.save_state(); - snap.write_toml("rex3.toml", &rex3_toml).map_err(|e| e.to_string())?; - rex3.save_framebuffers(&snap.dir).map_err(|e| e.to_string())?; + snap.write_state("rex3", &rex3.save_state(), sv).map_err(|e| e.to_string())?; + if sv < 3 { + rex3.save_framebuffers(&snap.dir).map_err(|e| e.to_string())?; + } + } + + // Bulk memory: v3+ goes to the content-addressable chunk store + // shared across all snapshots in `saves/.cas/`. v2 (legacy) writes + // raw bank{N}.bin files. Chunk hashes go in chunks.bin so load can + // walk the right chunks back out. + if sv >= 3 { + let store = ChunkStore::new("saves"); + let mut chunks = ChunksManifest::default(); + for i in 0..4 { + let words = self._phys.snapshot_bank_inmem(i); + chunks.bank_chunks[i] = put_words_as_chunks(&store, &words) + .map_err(|e| format!("CAS bank{} put: {}", i, e))?; + } + if let Some(rex3) = &self._phys.rex3 { + let (rgb, aux) = rex3.snapshot_framebuffers_inmem(); + let rgb_chunks = put_words_as_chunks(&store, &rgb) + .map_err(|e| format!("CAS rex3 rgb put: {}", e))?; + let aux_chunks = put_words_as_chunks(&store, &aux) + .map_err(|e| format!("CAS rex3 aux put: {}", e))?; + chunks.framebuffer_chunks = Some((rgb_chunks, aux_chunks)); + } + snap.write_chunks_manifest(&chunks).map_err(|e| e.to_string())?; + } else { + for i in 0..4 { + self._phys.save_bank(i, dir.join(format!("bank{}.bin", i))).map_err(|e| e.to_string())?; + } } - // Bulk memory (raw binary, big-endian word layout) — 4 × 128MB banks - for i in 0..4 { - self._phys.save_bank(i, dir.join(format!("bank{}.bin", i))).map_err(|e| e.to_string())?; + // COW overlays per SCSI device, plus a `cow.toml` with the dirty + // sector set for each one. Keeps the on-disk filesystem state + // consistent with the captured RAM. 
+        let overlays = self.hpc3.scsi().export_overlays(&snap.dir)
+            .map_err(|e| format!("COW overlay export: {}", e))?;
+        let mut cow_tbl = toml::map::Map::new();
+        for (id, dirty) in overlays {
+            let arr: Vec<toml::Value> = dirty.into_iter()
+                .map(|v| toml::Value::Integer(v as i64))
+                .collect();
+            cow_tbl.insert(format!("scsi{}", id), toml::Value::Array(arr));
         }
+        snap.write_toml("cow.toml", &toml::Value::Table(cow_tbl))
+            .map_err(|e| e.to_string())?;

         self.restart_peripherals();
+        // Resume execution so the session feels like it never paused.
+        // Without this the user sees JIT shutdown stats and a dead prompt
+        // after `save` — the CPU would otherwise stay stopped.
+        self.cpu.start();
         println!("Snapshot saved to saves/{}", name);
         Ok(())
     }

-    /// Restore full machine snapshot from `saves/<name>/`.
+    /// Restore full machine snapshot from `saves/<name>/`. CPU is auto-started
+    /// at the end so the guest resumes from the snapshotted PC.
+    /// For determinism validation use `load_snapshot_paused` instead.
     pub fn load_snapshot(&mut self, name: &str) -> Result<(), String> {
+        self.load_snapshot_inner(name)?;
+        self.cpu.start();
+        println!("Snapshot loaded from saves/{}", name);
+        Ok(())
+    }
+
+    /// Same body as `load_snapshot` but leaves CPU and peripheral threads
+    /// stopped on return. Used by the Phase 3.3 determinism validator which
+    /// must prevent any thread from running between load and digest, since
+    /// thread scheduling jitter would mask CPU determinism issues.
+    pub fn load_snapshot_paused(&mut self, name: &str) -> Result<(), String> {
+        self.load_snapshot_inner(name)?;
+        // load_snapshot_inner restarted peripherals; stop them again.
+        self.hpc3.stop();
+        self.mc.stop();
+        if let Some(rex3) = &self._phys.rex3 { rex3.stop(); }
+        Ok(())
+    }
+
+    /// Restore full machine snapshot from `saves/<name>/`.
+    ///
+    /// JIT-cache invariant: `self.stop()` exits the CPU thread, which drops
+    /// the `CodeCache` owned by `run_jit_dispatch`. 
Subsequent `cpu.start()` + /// (in the public `load_snapshot` wrapper) builds a fresh cache. So no + /// explicit invalidation is needed here as long as that ownership + /// pattern holds. The persistent JIT profile uses content_hash to skip + /// stale entries (see `profile_stale` in dispatch.rs). + fn load_snapshot_inner(&mut self, name: &str) -> Result<(), String> { self.stop(); + // Any prior in-memory rollback checkpoint is now stale (it described + // a different snapshot). ci_restore will recapture if reached via + // that path; the monitor `load` command leaves it cleared. + self.last_restore_checkpoint = None; + // Reset to clean state before loading self.power_on_devices(); let dir = std::path::PathBuf::from("saves").join(name); let snap = Snapshot::new(&dir); - // CPU + TLB - let cpu_toml = snap.read_toml("cpu.toml").map_err(|e| e.to_string())?; - self.cpu.load_state(&cpu_toml)?; + // Validate the manifest before reading anything else. Legacy snapshots + // (no snapshot.toml) are accepted with a warning. Cross-arch loads are + // refused — FPU bit-layout differs between aarch64 and x86_64 and we + // don't have migration plumbing yet. + let schema_version = match snap.read_manifest()? 
{ + Some(m) => { + if m.host_arch != std::env::consts::ARCH { + return Err(format!( + "snapshot host_arch '{}' does not match current host '{}'; cross-arch load is not supported", + m.host_arch, std::env::consts::ARCH + )); + } + if m.schema_version > SCHEMA_VERSION { + return Err(format!( + "snapshot schema_version {} is newer than this iris build supports ({})", + m.schema_version, SCHEMA_VERSION + )); + } + if let Some(rev) = &m.iris_git_rev { + if let Some(my_rev) = option_env!("IRIS_GIT_REV") { + if rev != my_rev { + eprintln!("load_snapshot: snapshot was captured at iris {} but current build is {}", rev, my_rev); + } + } + } + m.schema_version + } + None => { + eprintln!("load_snapshot: no snapshot.toml in {} — treating as legacy v0 (no manifest)", dir.display()); + 0 + } + }; + + // Device state — read_state picks .bin (v2+) or .toml + // (legacy). v2 also falls back to .toml if .bin is absent. + let cpu = snap.read_state("cpu", schema_version).map_err(|e| e.to_string())?; + self.cpu.load_state(&cpu)?; - // Memory Controller - let mc_toml = snap.read_toml("mc.toml").map_err(|e| e.to_string())?; - self.mc.load_state(&mc_toml)?; + let mc = snap.read_state("mc", schema_version).map_err(|e| e.to_string())?; + self.mc.load_state(&mc)?; - // IOC - let ioc_toml = snap.read_toml("ioc.toml").map_err(|e| e.to_string())?; - self.hpc3.ioc().load_state(&ioc_toml)?; + let ioc = snap.read_state("ioc", schema_version).map_err(|e| e.to_string())?; + self.hpc3.ioc().load_state(&ioc)?; - // SCC (Z85C30 serial) - let scc_toml = snap.read_toml("scc.toml").map_err(|e| e.to_string())?; - self.hpc3.ioc().scc().load_state(&scc_toml)?; + let scc = snap.read_state("scc", schema_version).map_err(|e| e.to_string())?; + self.hpc3.ioc().scc().load_state(&scc)?; - // PIT (8254 timer) - let pit_toml = snap.read_toml("pit.toml").map_err(|e| e.to_string())?; - self.hpc3.ioc().pit().load_state(&pit_toml)?; + let pit = snap.read_state("pit", schema_version).map_err(|e| e.to_string())?; + 
self.hpc3.ioc().pit().load_state(&pit)?; - // PS2 - let ps2_toml = snap.read_toml("ps2.toml").map_err(|e| e.to_string())?; - self.hpc3.ioc().ps2().load_state(&ps2_toml)?; + let ps2 = snap.read_state("ps2", schema_version).map_err(|e| e.to_string())?; + self.hpc3.ioc().ps2().load_state(&ps2)?; - // RTC (DS1x86) - let rtc_toml = snap.read_toml("rtc.toml").map_err(|e| e.to_string())?; - self.hpc3.rtc().load_state(&rtc_toml)?; + let rtc = snap.read_state("rtc", schema_version).map_err(|e| e.to_string())?; + self.hpc3.rtc().load_state(&rtc)?; - // EEPROM (93C56) - let eeprom_toml = snap.read_toml("eeprom.toml").map_err(|e| e.to_string())?; - self.hpc3.eeprom().lock().load_state_mut(&eeprom_toml)?; + let eeprom = snap.read_state("eeprom", schema_version).map_err(|e| e.to_string())?; + self.hpc3.eeprom().lock().load_state_mut(&eeprom)?; - // SCSI (WD33C93A) - let scsi_toml = snap.read_toml("scsi.toml").map_err(|e| e.to_string())?; - self.hpc3.scsi().load_state(&scsi_toml)?; + let scsi = snap.read_state("scsi", schema_version).map_err(|e| e.to_string())?; + self.hpc3.scsi().load_state(&scsi)?; - // Seeq8003 (Ethernet) - let seeq_toml = snap.read_toml("seeq.toml").map_err(|e| e.to_string())?; - self.hpc3.seeq().load_state(&seeq_toml)?; + let seeq = snap.read_state("seeq", schema_version).map_err(|e| e.to_string())?; + self.hpc3.seeq().load_state(&seeq)?; - // HPC3 - let hpc3_toml = snap.read_toml("hpc3.toml").map_err(|e| e.to_string())?; - self.hpc3.load_state(&hpc3_toml)?; + let hpc3 = snap.read_state("hpc3", schema_version).map_err(|e| e.to_string())?; + self.hpc3.load_state(&hpc3)?; - // REX3 if let Some(rex3) = &self._phys.rex3 { - let rex3_toml = snap.read_toml("rex3.toml").map_err(|e| e.to_string())?; - rex3.load_state(&rex3_toml)?; - rex3.load_framebuffers(&snap.dir).map_err(|e| e.to_string())?; + let rex3_v = snap.read_state("rex3", schema_version).map_err(|e| e.to_string())?; + rex3.load_state(&rex3_v)?; + // v3+ stores framebuffers in the chunk store; v2 used .bin 
files. + if schema_version < 3 { + rex3.load_framebuffers(&snap.dir).map_err(|e| e.to_string())?; + } + } + + // Bulk memory: v3+ comes from the content-addressable chunk store + // shared across snapshots; v2 reads raw bank{N}.bin files. + if schema_version >= 3 { + let store = ChunkStore::new("saves"); + let chunks = snap.read_chunks_manifest() + .map_err(|e| format!("read chunks.bin: {}", e))?; + for (i, hashes) in chunks.bank_chunks.iter().enumerate() { + if hashes.is_empty() { continue; } + let words = get_chunks_as_words(&store, hashes) + .map_err(|e| format!("CAS bank{} get: {}", i, e))?; + self._phys.restore_bank_inmem(i, &words); + } + if let (Some(rex3), Some((rgb_h, aux_h))) = (&self._phys.rex3, &chunks.framebuffer_chunks) { + let rgb = get_chunks_as_words(&store, rgb_h) + .map_err(|e| format!("CAS rex3 rgb get: {}", e))?; + let aux = get_chunks_as_words(&store, aux_h) + .map_err(|e| format!("CAS rex3 aux get: {}", e))?; + rex3.restore_framebuffers_inmem(&rgb, &aux); + } + } else { + for i in 0..4 { + self._phys.load_bank(i, dir.join(format!("bank{}.bin", i))).map_err(|e| e.to_string())?; + } } - // Bulk memory — 4 × 128MB banks - for i in 0..4 { - self._phys.load_bank(i, dir.join(format!("bank{}.bin", i))).map_err(|e| e.to_string())?; + // COW overlays — best-effort for backward compatibility with + // snapshots saved before overlay capture was added. 
+        if let Ok(cow_toml) = snap.read_toml("cow.toml") {
+            let mut sets: Vec<(usize, Vec<u64>)> = Vec::new();
+            if let Some(tbl) = cow_toml.as_table() {
+                for (k, v) in tbl {
+                    let Some(id_str) = k.strip_prefix("scsi") else { continue };
+                    let Ok(id) = id_str.parse::<usize>() else { continue };
+                    let Some(arr) = v.as_array() else { continue };
+                    let dirty: Vec<u64> = arr.iter()
+                        .filter_map(|x| x.as_integer().map(|i| i as u64))
+                        .collect();
+                    sets.push((id, dirty));
+                }
+            }
+            self.hpc3.scsi().import_overlays(&snap.dir, &sets)
+                .map_err(|e| format!("COW overlay import: {}", e))?;
+        } else {
+            eprintln!("load_snapshot: no cow.toml in snapshot — overlays left unchanged");
         }

         self.restart_peripherals();
-        println!("Snapshot loaded from saves/{}", name);
         Ok(())
     }
 }
diff --git a/src/main.rs b/src/main.rs
index 28721b1..cf6a76d 100644
--- a/src/main.rs
+++ b/src/main.rs
@@ -5,6 +5,12 @@ fn main() {
     let (mut cfg, scale) = load_config();
     let headless = cfg.headless;
     let gdb_port = cfg.gdb_port;
+    let ci_enabled = cfg.ci;
+    let ci_display = cfg.ci_display;
+    let ci_socket_path = cfg.ci_socket.clone();
+
+    // CI control socket will be started after Machine::new below (it needs a
+    // pointer into the constructed Machine).

     // Start unfsd before the machine so NFS is ready when IRIX boots.
     // If start_unfsd returns None (directory missing/uncreatable, or binary not found),
@@ -24,6 +30,22 @@ fn main() {
         .unwrap();
     machine.register_system_controller();

+    // CI control socket: started after Machine::new so it can hand out the
+    // machine pointer + CiSerialBackend to command handlers.
+    #[cfg(unix)]
+    let _ci_server = if ci_enabled {
+        let mptr: *mut iris::machine::Machine = &mut *machine;
+        match iris::ci::start_server(mptr, &ci_socket_path) {
+            Ok(s) => Some(s),
+            Err(e) => {
+                eprintln!("iris: failed to start CI server: {}", e);
+                std::process::exit(1);
+            }
+        }
+    } else {
+        None
+    };
+
     // DIAG: optionally enable verbose logging from startup via IRIS_DEBUG_LOG. 
// IRIS_DEBUG_LOG="mc,mips" enables those modules. "all" enables everything. // Output is broadcast to a stderr sink so jit-diag.sh's tee captures it inline. @@ -57,14 +79,22 @@ fn main() { } machine.start(); - std::thread::spawn(|| { - Machine::run_console_client(); - }); - - if headless { - // Headless mode: no window, no graphics, no audio. - // Park the main thread and let the machine run until killed. - eprintln!("iris: running headless (no window)"); + if !ci_enabled { + std::thread::spawn(|| { + Machine::run_console_client(); + }); + } + + let show_window = !headless && !(ci_enabled && !ci_display); + if !show_window { + if headless { + eprintln!("iris: running headless (no REX3, no window)"); + } else if ci_enabled { + eprintln!("iris: --ci mode (REX3 rendering to offscreen buffer, no window)"); + } + // Park the main thread so background threads (CPU, REX3 refresh, + // CI socket) keep running. `quit` via the CI socket calls + // std::process::exit. std::thread::park(); } else { use iris::ui::Ui; diff --git a/src/mc.rs b/src/mc.rs index fa67513..a641f67 100644 --- a/src/mc.rs +++ b/src/mc.rs @@ -1332,4 +1332,48 @@ mod tests { // Read back via BusDevice { let _r = mc.read32(MC_BASE + REG_CPUCTRL0); assert!(_r.is_ok(), "Failed to read CPUCTRL0"); assert_eq!(_r.data, val); } } + + /// Phase 1.7 round-trip: a fresh MC loaded from a captured save_state must + /// re-serialize byte-identically. Catches load_state forgetting a field + /// (regs, semaphores, GIO DMA registers) that save_state writes. + #[test] + fn save_load_round_trip() { + let eeprom = Arc::new(Mutex::new(Eeprom93c56::new())); + let src = MemoryController::new(eeprom.clone(), true, [128, 128, 0, 0]); + + // Mutate registers and DMA state so we're not testing all-default state. 
+        let _ = src.write32(MC_BASE + REG_CPUCTRL0, 0xdead_beef);
+        {
+            let mut s = src.state.lock();
+            s.sys_semaphore = true;
+            s.user_semaphores[0] = true;
+            s.user_semaphores[7] = true;
+            s.user_semaphores[15] = true;
+        }
+        {
+            let mut d = src.giodma.state.lock();
+            d.gio_mask = 0x0000_00ff;
+            d.gio_sub = 0x0000_0001;
+            d.cause = 0x0000_0010;
+            d.ctl = 0x4000_0000;
+            d.memadr = 0x0800_0000;
+            d.size = 0x0000_1000;
+            d.stride = 0x0000_0040;
+            d.gio_adr = 0x1f00_0000;
+            d.mode = 0x0000_0007;
+            d.count = 0x0000_0040;
+            d.run = 0x0000_0001;
+            d.stdma = 0x0000_0002;
+            d.tlb_hi[0] = 0xa5a5_a5a5;
+            d.tlb_lo[1] = 0x5a5a_5a5a;
+            d.run_real = true;
+        }
+        let v1 = src.save_state();
+
+        let dst = MemoryController::new(eeprom, true, [128, 128, 0, 0]);
+        dst.load_state(&v1).expect("load_state");
+        let v2 = dst.save_state();
+
+        assert_eq!(v1, v2, "MemoryController save_state mismatch after load_state round-trip");
+    }
 }
diff --git a/src/mem.rs b/src/mem.rs
index 07cef6d..5a1d659 100644
--- a/src/mem.rs
+++ b/src/mem.rs
@@ -82,6 +82,24 @@ impl Memory {
         }
         Ok(())
     }
+
+    /// Clone the bank's word buffer in native endian. Used by the in-memory
+    /// rollback checkpoint to capture state without touching disk. Caller
+    /// should ensure the CPU/peripheral threads are stopped to avoid a
+    /// torn read.
+    pub fn snapshot_words(&self) -> Vec<u32> {
+        let data = unsafe { self.data() };
+        data.to_vec()
+    }
+
+    /// Overwrite the bank's word buffer from `src`. Length is clamped to the
+    /// bank's word count; extra source words are dropped, missing tail words
+    /// are left untouched. Pair with `snapshot_words` for rollback. 
+    pub fn restore_words(&self, src: &[u32]) {
+        let data = unsafe { self.data() };
+        let n = src.len().min(data.len());
+        data[..n].copy_from_slice(&src[..n]);
+    }
 }

 impl Resettable for Memory {
@@ -321,3 +339,50 @@ impl BusDevice for UnmappedRam {
     fn read64(&self, _addr: u32) -> BusRead64 { BusRead64::ok(0) }
     fn write64(&self, _addr: u32, _v: u64) -> u32 { BUS_OK }
 }
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use crate::traits::BusDevice;
+
+    #[test]
+    fn snapshot_and_restore_words_roundtrip() {
+        let m = Memory::new(1); // 1 MB = 256 K words
+        // Seed unique data via the bus interface so storage layout matches
+        // production access patterns.
+        for i in 0..32 {
+            let addr = (i * 4) as u32;
+            m.write32(addr, 0xDEAD0000 + i as u32);
+        }
+        let snap = m.snapshot_words();
+        // Mutate in place.
+        for i in 0..32 {
+            let addr = (i * 4) as u32;
+            m.write32(addr, 0xCAFEBABE);
+        }
+        // Restore from snapshot, verify original data returns.
+        m.restore_words(&snap);
+        for i in 0..32 {
+            let addr = (i * 4) as u32;
+            let v = m.read32(addr).data;
+            assert_eq!(v, 0xDEAD0000 + i as u32, "word {} mismatch after restore", i);
+        }
+    }
+
+    #[test]
+    fn restore_words_clamps_to_bank_size() {
+        let m = Memory::new(1);
+        let words = m.snapshot_words();
+        let bank_words = words.len();
+        // Source buffer larger than bank — should not panic, should not write
+        // past end.
+        let mut larger = vec![0xAAAAAAAAu32; bank_words + 100];
+        for (i, w) in larger.iter_mut().enumerate().take(bank_words + 100) {
+            *w = (i as u32).wrapping_mul(7);
+        }
+        m.restore_words(&larger);
+        // Spot-check a word inside the bank.
+        let v = m.read32(0).data;
+        assert_eq!(v, 0);
+    }
+}
diff --git a/src/mips_core.rs b/src/mips_core.rs
index 6225efb..c2f5054 100644
--- a/src/mips_core.rs
+++ b/src/mips_core.rs
@@ -80,9 +80,13 @@ pub struct MipsCore {
     /// Shared with the display refresh thread for status bar display.
     pub count_step_atomic: Arc<AtomicU64>,
     /// Cycle count when cp0_compare was last written (0 = never written yet). 
- compare_last_cycles: u64, - /// Wall-clock instant when cp0_compare was last written. - compare_last_instant: std::time::Instant, + /// `pub(crate)` so snapshot load in `mips_exec.rs` can re-anchor the + /// calibration after restoring CP0 fields. + pub(crate) compare_last_cycles: u64, + /// Wall-clock instant when cp0_compare was last written. Reset to + /// `Instant::now()` on snapshot load — Instants from a previous run are + /// meaningless across a restore. + pub(crate) compare_last_instant: std::time::Instant, /// Frequency map of CP0 Compare delta values (hardware counts, rounded to nearest 100). /// Key = `(delta >> 16) / 100 * 100`, value = number of occurrences. #[cfg(feature = "developer_ip7")] @@ -542,10 +546,23 @@ impl MipsCore { // Formula: count_step = delta * dt_ns / (dc * 1_000_000) // delta = count units to next compare (what the kernel programmed) // dc = instructions executed in last interval - // dt_ns = wall-clock ns elapsed in last interval - // = (count units per instruction) * (wall-clock stretch factor) + // dt_ns = ns elapsed in last interval + // = (count units per instruction) * (rate-stretch factor) // Only calibrate for ~1ms timer intervals (IRIX 1000 Hz scheduler); // leave count_step unchanged for other timer uses (one-shot, low-freq). + // + // Two clock sources, gated by `ci_clock`: + // default (interactive desktop): dt_ns from host `Instant::now()`, + // so the guest timer tracks real wall-clock. Sensitive to host + // scheduling jitter, but that's the price of a real-time desktop. + // --features ci_clock: dt_ns = dc * 10 ns (synthetic R4400 ~100 MIPS). + // Decouples guest-perceived time from host scheduling so the Phase + // 3.3 snapshot determinism validator passes at any N. Tradeoff: a + // CI run that takes 5 host minutes may present as 30 guest minutes + // depending on host MIPS — exactly what reproducible CI wants. 
+ #[cfg(feature = "ci_clock")] + const NS_PER_GUEST_CYCLE: u64 = 10; + #[cfg(not(feature = "ci_clock"))] let now = std::time::Instant::now(); let cycles_now = self.local_cycles; // Compute new_delta before the calibration block so we can guard on it. @@ -554,9 +571,13 @@ impl MipsCore { let new_delta = self.cp0_compare.wrapping_sub(self.cp0_count); if new_delta >> 63 != 0 { self.compare_last_cycles = cycles_now; - self.compare_last_instant = now; + #[cfg(not(feature = "ci_clock"))] + { self.compare_last_instant = now; } } else if self.compare_last_cycles != 0 { let dc = cycles_now.wrapping_sub(self.compare_last_cycles); + #[cfg(feature = "ci_clock")] + let dt_ns = dc.saturating_mul(NS_PER_GUEST_CYCLE); + #[cfg(not(feature = "ci_clock"))] let dt_ns = now.duration_since(self.compare_last_instant).as_nanos() as u64; // new_delta: what the *next* interval will fire at, stored as 32.32 fp. #[cfg(feature = "developer_ip7")] @@ -598,7 +619,8 @@ impl MipsCore { } // First write: keep default count_step (1<<15), just record state. self.compare_last_cycles = cycles_now; - self.compare_last_instant = now; + #[cfg(not(feature = "ci_clock"))] + { self.compare_last_instant = now; } } 12 => { let old = self.cp0_status; diff --git a/src/mips_exec.rs b/src/mips_exec.rs index 6baaf57..a3674ff 100644 --- a/src/mips_exec.rs +++ b/src/mips_exec.rs @@ -4820,6 +4820,106 @@ impl MipsCpu { fn try_lock_executor(&self) -> Result>, String> { self.executor.try_lock().ok_or_else(|| "CPU thread holds the executor lock; try 'cpu stop' first".to_string()) } + + /// Step the executor `n` times in-line on the calling thread. Caller must + /// have stopped the runtime CPU thread first (otherwise we deadlock on + /// the executor mutex). Returns the number of steps actually executed — + /// will be `< n` only if the CPU stops itself (e.g. soft-reset). + /// + /// Used by Phase 3.3 snapshot determinism validator. 
Single-threaded,
+    /// no thread scheduling jitter, so two runs from identical state should
+    /// reach identical state after the same number of steps.
+    pub fn step_n_inline(&self, n: u64) -> Result<u64, String> {
+        let mut exec = self.try_lock_executor()?;
+        let mut executed = 0u64;
+        for _ in 0..n {
+            let _status = exec.step();
+            executed += 1;
+            // Don't break on exceptions — they're part of normal CPU
+            // operation and a deterministic run should re-enter and continue.
+        }
+        exec.flush_cycles();
+        Ok(executed)
+    }
+
+    /// Snapshot the deterministic-from-state CPU registers. Excludes host
+    /// wallclock anchors like `compare_last_instant` (they're meaningless
+    /// across runs) but includes their calibrated equivalents (count_step,
+    /// compare_delta_*).
+    pub fn state_digest(&self) -> Result<CpuStateDigest, String> {
+        let exec = self.try_lock_executor()?;
+        let c = &exec.core;
+        Ok(CpuStateDigest {
+            gpr: c.gpr,
+            pc: c.pc,
+            hi: c.hi,
+            lo: c.lo,
+            cp0_count: c.cp0_count,
+            cp0_compare: c.cp0_compare,
+            cp0_status: c.cp0_status,
+            cp0_cause: c.cp0_cause,
+            cp0_epc: c.cp0_epc,
+            cp0_badvaddr: c.cp0_badvaddr,
+            cp0_entryhi: c.cp0_entryhi,
+            count_step: c.count_step,
+            in_delay_slot: exec.in_delay_slot,
+        })
+    }
+}
+
+/// Deterministic-from-state CPU register snapshot. Excludes host wallclock
+/// anchors so two runs from the same starting state can be diffed cleanly.
+/// `local_cycles` is intentionally not included — it's a runtime perf counter
+/// that's not part of save_state and stays stale across `load_snapshot`.
+#[derive(Debug, Clone, PartialEq, Eq)]
+pub struct CpuStateDigest {
+    pub gpr: [u64; 32],
+    pub pc: u64,
+    pub hi: u64,
+    pub lo: u64,
+    pub cp0_count: u64,
+    pub cp0_compare: u64,
+    pub cp0_status: u32,
+    pub cp0_cause: u32,
+    pub cp0_epc: u64,
+    pub cp0_badvaddr: u64,
+    pub cp0_entryhi: u64,
+    pub count_step: u64,
+    pub in_delay_slot: bool,
+}
+
+impl CpuStateDigest {
+    /// Return a list of (field_name, lhs_repr, rhs_repr) for every field that 
Empty if states are bit-identical. For arrays, only diverging + /// indices are reported. + pub fn diff(&self, other: &CpuStateDigest) -> Vec<(String, String, String)> { + let mut out = Vec::new(); + for (i, (a, b)) in self.gpr.iter().zip(other.gpr.iter()).enumerate() { + if a != b { + out.push((format!("gpr[{}]", i), format!("0x{:016x}", a), format!("0x{:016x}", b))); + } + } + macro_rules! cmp { + ($name:ident, $fmt:expr) => { + if self.$name != other.$name { + out.push((stringify!($name).to_string(), format!($fmt, self.$name), format!($fmt, other.$name))); + } + }; + } + cmp!(pc, "0x{:016x}"); + cmp!(hi, "0x{:016x}"); + cmp!(lo, "0x{:016x}"); + cmp!(cp0_count, "0x{:016x}"); + cmp!(cp0_compare, "0x{:016x}"); + cmp!(cp0_status, "0x{:08x}"); + cmp!(cp0_cause, "0x{:08x}"); + cmp!(cp0_epc, "0x{:016x}"); + cmp!(cp0_badvaddr, "0x{:016x}"); + cmp!(cp0_entryhi, "0x{:016x}"); + cmp!(count_step, "{}"); + cmp!(in_delay_slot, "{}"); + out + } } fn is_call_instruction(instr: u32) -> bool { @@ -5951,7 +6051,16 @@ impl Saveable for MipsCp ($f:ident) => { cp0.insert(stringify!($f).into(), hex_u64(c.$f)); } } cp0u32!(cp0_index); cp0u32!(cp0_random); cp0u32!(cp0_wired); - cp0u64!(cp0_count); cp0u64!(cp0_compare); cp0u32!(cp0_status); cp0u32!(cp0_cause); + cp0u64!(cp0_count); cp0u64!(cp0_compare); + // Timer calibration state. Without these, restore loses the kernel's + // learned tick rate and runs at the default count_step until IRIX + // touches Compare again — guest scheduler drifts noticeably for the + // first few seconds after every restore. compare_last_cycles and + // compare_last_instant are intentionally not saved: they're host-wall + // anchors, not calibrated state, and must be reset on load. 
+ cp0u64!(count_step); cp0u64!(compare_delta_prev); + cp0u64!(compare_delta_slow); cp0u64!(compare_delta_fast); + cp0u32!(cp0_status); cp0u32!(cp0_cause); cp0u32!(cp0_prid); cp0u32!(cp0_config); cp0u32!(cp0_lladdr); cp0u32!(cp0_watchlo); cp0u32!(cp0_watchhi); cp0u32!(cp0_ecc); cp0u32!(cp0_cacheerr); cp0u32!(cp0_taglo); cp0u32!(cp0_taghi); @@ -6005,6 +6114,19 @@ impl Saveable for MipsCp }} ld32!(cp0_index); ld32!(cp0_random); ld32!(cp0_wired); ld64!(cp0_count); ld64!(cp0_compare); + ld64!(count_step); ld64!(compare_delta_prev); + ld64!(compare_delta_slow); ld64!(compare_delta_fast); + // Mirror count_step into its atomic shadow (read by the display + // thread) so the live UI matches the restored core state. + c.count_step_atomic.store(c.count_step, std::sync::atomic::Ordering::Relaxed); + // Re-anchor the host-wall calibration timer. Setting cycles to 0 + // forces the next CP0 Compare write to take the "first write" + // path (no calibration), which is what we want — dt_ns measured + // against an Instant from the previous run would be garbage. The + // saved count_step keeps the rate steady until calibration catches + // up over the next few Compare writes. + c.compare_last_cycles = 0; + c.compare_last_instant = std::time::Instant::now(); ld32!(cp0_status); ld32!(cp0_cause); ld32!(cp0_prid); ld32!(cp0_config); ld32!(cp0_lladdr); ld32!(cp0_watchlo); ld32!(cp0_watchhi); ld32!(cp0_ecc); ld32!(cp0_cacheerr); ld32!(cp0_taglo); ld32!(cp0_taghi); diff --git a/src/mips_tlb.rs b/src/mips_tlb.rs index 56ffa2c..7e7c96c 100644 --- a/src/mips_tlb.rs +++ b/src/mips_tlb.rs @@ -630,6 +630,16 @@ impl Tlb for MipsTlb { self.vmap_fill(i); } } + // Reset MRU lists to canonical post-power-on order. Without this, two + // restores of the same snapshot can have different `tlbwr` victims if + // the prior session's MRU history leaks in. 
+ for list in 0..MRU_LISTS { + self.mru_head[list] = 0; + for i in 0..TLB_NUM_ENTRIES - 1 { + self.mru_next[list][i] = (i + 1) as u8; + } + self.mru_next[list][TLB_NUM_ENTRIES - 1] = MRU_NONE; + } Ok(()) } diff --git a/src/mips_tlb_test.rs b/src/mips_tlb_test.rs index 2948d55..6eb7eab 100644 --- a/src/mips_tlb_test.rs +++ b/src/mips_tlb_test.rs @@ -283,4 +283,29 @@ mod tests { let result = tlb.probe(va_wrong_r, asid, true); assert_eq!(result & 0x80000000, 0x80000000, "Expected probe to miss with wrong R field"); } + + /// Phase 1.7 round-trip: a fresh TLB loaded from a captured save_state must + /// re-serialize byte-identically. Catches load_state forgetting a field + /// that save_state writes. + #[test] + fn save_load_round_trip() { + let mut src = MipsTlb::new(TLB_NUM_ENTRIES); + // Write a few entries with varied bit patterns so we're not just + // testing all-zero defaults. + for (slot, vpn2) in [(0usize, 0x100u64), (5, 0x800), (17, 0x4000), (47, 0xffff)].iter().copied() { + let mut e = TlbEntry::new(); + e.page_mask = (slot as u64) << 13; + e.entry_hi = (2u64 << 62) | (vpn2 << 13) | (slot as u64 & 0xff); + e.entry_lo0 = ((slot as u64) << 6) | (3 << 3) | 0x6; + e.entry_lo1 = ((slot as u64 + 1) << 6) | (3 << 3) | 0x6; + src.write(slot, e); + } + let v1 = src.save_state(); + + let mut dst = MipsTlb::new(TLB_NUM_ENTRIES); + dst.load_state(&v1).expect("load_state"); + let v2 = dst.save_state(); + + assert_eq!(v1, v2, "MipsTlb save_state mismatch after load_state round-trip"); + } } diff --git a/src/net.rs b/src/net.rs index 09f31a4..52e980b 100644 --- a/src/net.rs +++ b/src/net.rs @@ -1057,26 +1057,39 @@ impl NatEngine { // ── NFS destination remapping ───────────────────────────────────────────── // - // IRIX talks to 192.168.0.1 on VM-visible NFS/mountd ports. Rewrite the - // destination to 127.0.0.1 on the high host-side ports where unfsd listens. + // Rewrite guest outbound destination to a host-reachable address. 
+    //
+    // IRIX sees the gateway at 192.168.0.1 but that's a virtual address iris
+    // doesn't actually bind to, so unmodified TcpStream::connect() fails. We
+    // rewrite any gateway-destined packet to 127.0.0.1. NFS ports additionally
+    // shift to the high host ports where unfsd listens.
     fn nfs_remap_dst(&self, dst_ip: Ipv4Addr, dport: u16) -> (Ipv4Addr, u16) {
-        let Some(nfs) = &self.config.nfs else { return (dst_ip, dport); };
         if dst_ip != self.config.gateway_ip { return (dst_ip, dport); }
-        match dport {
-            NFS_VM_PORT => (Ipv4Addr::LOCALHOST, nfs.nfs_host_port),
-            MOUNTD_VM_PORT => (Ipv4Addr::LOCALHOST, nfs.mountd_host_port),
-            _ => (dst_ip, dport),
+        if let Some(nfs) = &self.config.nfs {
+            match dport {
+                NFS_VM_PORT => return (Ipv4Addr::LOCALHOST, nfs.nfs_host_port),
+                MOUNTD_VM_PORT => return (Ipv4Addr::LOCALHOST, nfs.mountd_host_port),
+                _ => {}
+            }
         }
+        // Generic outbound: guest→gateway becomes guest→host loopback on
+        // the same port. Lets the guest reach any service the host is
+        // running on 127.0.0.1:<port> (pyftpdlib on 2121, python -m
+        // http.server, etc.).
+        (Ipv4Addr::LOCALHOST, dport)
     }

-    // Reverse: translate (127.0.0.1, host_port) back to (192.168.0.1, vm_port)
-    // so replies to IRIX appear to come from the gateway on the standard NFS ports.
+    // Reverse: translate (127.0.0.1, host_port) back to the address the guest
+    // dialed, so replies look like they came from the gateway. 
fn nfs_unmap_src(&self, src_ip: Ipv4Addr, sport: u16) -> (Ipv4Addr, u16) {
-        let Some(nfs) = &self.config.nfs else { return (src_ip, sport); };
         if src_ip != Ipv4Addr::LOCALHOST { return (src_ip, sport); }
-        if sport == nfs.nfs_host_port { return (self.config.gateway_ip, NFS_VM_PORT); }
-        if sport == nfs.mountd_host_port { return (self.config.gateway_ip, MOUNTD_VM_PORT); }
-        (src_ip, sport)
+        if let Some(nfs) = &self.config.nfs {
+            if sport == nfs.nfs_host_port { return (self.config.gateway_ip, NFS_VM_PORT); }
+            if sport == nfs.mountd_host_port { return (self.config.gateway_ip, MOUNTD_VM_PORT); }
+        }
+        // Generic outbound: reply from host-side dport becomes gateway:dport
+        // to the guest.
+        (self.config.gateway_ip, sport)
     }

     // ── Portmap (port 111) — tiny inline RPC GETPORT responder ───────────────
diff --git a/src/physical.rs b/src/physical.rs
index fcb87fd..0dbc4cf 100644
--- a/src/physical.rs
+++ b/src/physical.rs
@@ -238,6 +238,17 @@ impl Physical {
             bank.power_on();
         }
     }
+
+    /// Snapshot bank `bank` into a native-endian Vec<u32>. Used by the
+    /// in-memory rollback checkpoint to skip the disk byte-shuffle.
+    pub fn snapshot_bank_inmem(&self, bank: usize) -> Vec<u32> {
+        self.banks[bank].snapshot_words()
+    }
+
+    /// Restore bank `bank` from a buffer produced by `snapshot_bank_inmem`.
+    pub fn restore_bank_inmem(&self, bank: usize, src: &[u32]) {
+        self.banks[bank].restore_words(src);
+    }
 }

 impl Physical {
diff --git a/src/pit8254.rs b/src/pit8254.rs
index 24bcfe8..519168c 100644
--- a/src/pit8254.rs
+++ b/src/pit8254.rs
@@ -593,6 +593,25 @@ mod tests {
         assert!(delta < 1000, "count={} expected ~{} (delta={})", count, expected, delta);
     }
+
+    /// Phase 1.7 round-trip: program a few channels with non-default values,
+    /// save, load into a fresh PIT, save again, assert the two save_states are
+    /// byte-identical. Catches load_state forgetting a channel field. 
+    #[test]
+    fn save_load_round_trip() {
+        let _lock = SERIAL.lock().unwrap();
+        let src = make_pit(1_000_000);
+        program_mode2(&src, 0, 0x1234);
+        program_mode2(&src, 1, 0x5678);
+        program_mode2(&src, 2, 0xabcd);
+        let v1 = src.save_state();
+
+        let dst = make_pit(1_000_000);
+        dst.load_state(&v1).expect("load_state");
+        let v2 = dst.save_state();
+
+        assert_eq!(v1, v2, "Pit8254 save_state mismatch after load_state round-trip");
+    }
 }

 impl Saveable for Pit8254 {
diff --git a/src/ps2.rs b/src/ps2.rs
index 908d918..b881e6a 100644
--- a/src/ps2.rs
+++ b/src/ps2.rs
@@ -910,6 +910,42 @@ impl Saveable for Ps2Controller {
     }
 }

+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    /// Phase 1.7 round-trip: a fresh PS/2 controller loaded from a captured
+    /// save_state must re-serialize byte-identically. Mutates rx_queue,
+    /// command_state, and assorted flag/byte fields so the test exercises every
+    /// branch of load_state.
+    #[test]
+    fn save_load_round_trip() {
+        let src = Ps2Controller::new(None);
+        {
+            let mut s = src.state.lock();
+            s.rx_queue.push_back((0xfa, Ps2Source::Keyboard));
+            s.rx_queue.push_back((0x42, Ps2Source::Mouse));
+            s.rx_queue.push_back((0xee, Ps2Source::MouseCmd));
+            s.mouse_queue_bytes = 1;
+            s.next_write_is_mouse = true;
+            s.led_state = 0x07;
+            s.scancode_set = 1;
+            s.config = 0x65;
+            s.command_state = CommandState::SetTypematic;
+            s.scanning_enabled = true;
+            s.mouse_enabled = true;
+            s.last_read = 0x12;
+        }
+        let v1 = src.save_state();
+
+        let dst = Ps2Controller::new(None);
+        dst.load_state(&v1).expect("load_state");
+        let v2 = dst.save_state();
+
+        assert_eq!(v1, v2, "Ps2Controller save_state mismatch after load_state round-trip");
+    }
+}
+
 #[derive(Debug, Clone, Copy, PartialEq, Eq)]
 #[repr(u16)]
 pub enum ScancodeSet1 {
diff --git a/src/registry.rs b/src/registry.rs
new file mode 100644
index 0000000..1df270b
--- /dev/null
+++ b/src/registry.rs
@@ -0,0 +1,526 @@
+//! Phase 3.4: HTTP snapshot registry.
+//!
+//!
+//! Pull/push iris snapshots between machines, Docker-layer-style. The CAS
+//! chunk store from Phase 3.1 makes this nearly free at the wire level: only
+//! chunks that the receiving side doesn't already have are transferred.
+//!
+//! ## URL layout
+//!
+//! Mirrors the on-disk layout, so any static file server pointing at
+//! `saves/` (e.g. `python3 -m http.server` running in your saves directory)
+//! works as a read-only pull source. Push needs a server that accepts PUT.
+//!
+//! ```text
+//! GET /snapshots/<name>/snapshot.toml   ← manifest, schema_version
+//! GET /snapshots/<name>/cpu.bin         ← v2+ device state
+//! GET /snapshots/<name>/chunks.bin      ← v3+ CAS hash list
+//! GET /snapshots/<name>/cow.toml        ← per-SCSI dirty sectors
+//! GET /snapshots/<name>/scsi1.overlay   ← per-SCSI overlay bytes
+//! GET /cas/<hh>/<rest-of-hash>.chunk    ← content-addressed RAM chunk
+//! ```
+//!
+//! ## Wire format
+//!
+//! Hand-rolled HTTP/1.1 over `std::net::TcpStream` — no new dependency. HTTP
+//! only (no TLS); use behind a tunnel or trusted network. Single-request,
+//! single-connection — no keep-alive. Plenty for snapshot transfers because
+//! the per-request overhead is dwarfed by the chunk payload.
+//!
+//! ## Commit ordering
+//!
+//! Push uploads chunks first, then `snapshot.toml` LAST. An interrupted push
+//! leaves orphan chunks (which `gc` on the server side will sweep) but never
+//! a half-published snapshot manifest pointing at missing chunks. Pull
+//! validates `chunks.bin` against fetched chunks at the end so a torn pull
+//! is detectable.

+use std::io::{Read, Write};
+use std::net::{TcpStream, ToSocketAddrs};
+use std::path::{Path, PathBuf};
+use std::time::Duration;
+
+use crate::chunk_store::{ChunkHash, ChunkStore};
+use crate::snapshot::{ChunksManifest, Snapshot};
+
+const HTTP_TIMEOUT: Duration = Duration::from_secs(60);
+
+/// Outcome of a `pull` or `push` operation. JSON-serializable for the CI socket.
+#[derive(Debug, Clone, Default)]
+pub struct TransferReport {
+    pub chunks_fetched: u64,
+    pub chunks_skipped: u64,
+    pub bytes_transferred: u64,
+    pub files_transferred: u64,
+}
+
+/// Pull a snapshot from `base_url` into the local `saves_dir`. Idempotent —
+/// chunks already in the local store are not re-downloaded. Returns a
+/// transfer report.
+pub fn pull(base_url: &str, name: &str, saves_dir: &Path) -> Result<TransferReport, String> {
+    if name.is_empty() || name.contains("..") {
+        return Err("pull: invalid snapshot name".into());
+    }
+    let base = base_url.trim_end_matches('/');
+    let mut report = TransferReport::default();
+
+    let snap_dir = saves_dir.join(name);
+    std::fs::create_dir_all(&snap_dir).map_err(|e| format!("create {}: {}", snap_dir.display(), e))?;
+
+    // 1. Manifest first — tells us schema_version, which gates which other
+    //    files exist.
+    let manifest_url = format!("{}/snapshots/{}/snapshot.toml", base, name);
+    let manifest_bytes = http_get(&manifest_url).map_err(|e| format!("fetch manifest: {}", e))?;
+    std::fs::write(snap_dir.join("snapshot.toml"), &manifest_bytes)
+        .map_err(|e| format!("write snapshot.toml: {}", e))?;
+    report.files_transferred += 1;
+    report.bytes_transferred += manifest_bytes.len() as u64;
+
+    let snap_local = Snapshot::new(&snap_dir);
+    let manifest = snap_local
+        .read_manifest()
+        .map_err(|e| format!("parse manifest: {}", e))?
+        .ok_or_else(|| "manifest missing or unparseable".to_string())?;
+    let sv = manifest.schema_version;
+
+    // 2. Per-device state. v2+ uses .bin (postcard); v1 used .toml. v0 has no
+    //    manifest so we don't get here.
+    let device_bases = [
+        "cpu", "mc", "ioc", "scc", "pit", "ps2", "rtc",
+        "eeprom", "scsi", "seeq", "hpc3", "rex3",
+    ];
+    let suffix = if sv >= 2 { "bin" } else { "toml" };
+    for base_name in device_bases {
+        let url = format!("{}/snapshots/{}/{}.{}", base, name, base_name, suffix);
+        match http_get(&url) {
+            Ok(bytes) => {
+                std::fs::write(snap_dir.join(format!("{}.{}", base_name, suffix)), &bytes)
+                    .map_err(|e| format!("write {}.{}: {}", base_name, suffix, e))?;
+                report.files_transferred += 1;
+                report.bytes_transferred += bytes.len() as u64;
+            }
+            Err(e) if e.contains("404") => {
+                // rex3 is optional (headless configs skip it); other devices
+                // could in principle be absent in a future config.
+                continue;
+            }
+            Err(e) => return Err(format!("fetch {}.{}: {}", base_name, suffix, e)),
+        }
+    }
+
+    // 3. cow.toml (overlay dirty sector lists) and the scsi*.overlay files.
+    if let Ok(bytes) = http_get(&format!("{}/snapshots/{}/cow.toml", base, name)) {
+        std::fs::write(snap_dir.join("cow.toml"), &bytes)
+            .map_err(|e| format!("write cow.toml: {}", e))?;
+        report.files_transferred += 1;
+        report.bytes_transferred += bytes.len() as u64;
+
+        if let Ok(text) = std::str::from_utf8(&bytes) {
+            if let Ok(toml::Value::Table(t)) = text.parse::<toml::Value>() {
+                for (key, _) in t {
+                    if let Some(id_str) = key.strip_prefix("scsi") {
+                        if id_str.parse::<u8>().is_ok() {
+                            let fname = format!("{}.overlay", key);
+                            let url = format!("{}/snapshots/{}/{}", base, name, fname);
+                            if let Ok(b) = http_get(&url) {
+                                std::fs::write(snap_dir.join(&fname), &b)
+                                    .map_err(|e| format!("write {}: {}", fname, e))?;
+                                report.files_transferred += 1;
+                                report.bytes_transferred += b.len() as u64;
+                            }
+                        }
+                    }
+                }
+            }
+        }
+    }
+
+    // 4. v3+: fetch chunks.bin, then any chunk hashes the local store doesn't
+    //    already have.
+    if sv >= 3 {
+        let chunks_url = format!("{}/snapshots/{}/chunks.bin", base, name);
+        let chunks_bytes = http_get(&chunks_url).map_err(|e| format!("fetch chunks.bin: {}", e))?;
+        let chunks: ChunksManifest = postcard::from_bytes(&chunks_bytes)
+            .map_err(|e| format!("parse chunks.bin: {}", e))?;
+        std::fs::write(snap_dir.join("chunks.bin"), &chunks_bytes)
+            .map_err(|e| format!("write chunks.bin: {}", e))?;
+        report.files_transferred += 1;
+        report.bytes_transferred += chunks_bytes.len() as u64;
+
+        let store = ChunkStore::new(saves_dir);
+        let mut seen: std::collections::HashSet<ChunkHash> = std::collections::HashSet::new();
+        for hash in chunks.referenced_hashes() {
+            if !seen.insert(*hash) {
+                continue;
+            }
+            if store.has(hash) {
+                report.chunks_skipped += 1;
+                continue;
+            }
+            let url = format!("{}/cas/{}/{}.chunk", base, hex2_of(hash), hex62_of(hash));
+            let bytes = http_get(&url).map_err(|e| format!("fetch chunk {}: {}", hex_of(hash), e))?;
+            // Validate the server gave us the right content.
+            let actual: ChunkHash = blake3::hash(&bytes).into();
+            if &actual != hash {
+                return Err(format!(
+                    "chunk hash mismatch for {}: got {}",
+                    hex_of(hash),
+                    hex_of(&actual)
+                ));
+            }
+            store
+                .put(&bytes)
+                .map_err(|e| format!("store chunk {}: {}", hex_of(hash), e))?;
+            report.chunks_fetched += 1;
+            report.bytes_transferred += bytes.len() as u64;
+        }
+    }
+
+    Ok(report)
+}
+
+/// Push a local snapshot to `base_url`. Uploads only chunks the server
+/// doesn't already have. Manifest goes LAST so an interrupted push never
+/// leaves a half-committed snapshot.
+pub fn push(base_url: &str, name: &str, saves_dir: &Path) -> Result<TransferReport, String> {
+    if name.is_empty() || name.contains("..") {
+        return Err("push: invalid snapshot name".into());
+    }
+    let base = base_url.trim_end_matches('/');
+    let mut report = TransferReport::default();
+
+    let snap_dir = saves_dir.join(name);
+    if !snap_dir.is_dir() {
+        return Err(format!("push: snapshot '{}' not found", name));
+    }
+    let snap_local = Snapshot::new(&snap_dir);
+    let manifest = snap_local
+        .read_manifest()
+        .map_err(|e| format!("read manifest: {}", e))?
+        .ok_or_else(|| "manifest missing — only v1+ snapshots can be pushed".to_string())?;
+    let sv = manifest.schema_version;
+
+    // 1. Chunks first (v3+). Manifest goes last so the snapshot only becomes
+    //    visible to pullers once all its chunks are in place.
+    if sv >= 3 {
+        let chunks_path = snap_dir.join("chunks.bin");
+        let chunks_bytes = std::fs::read(&chunks_path).map_err(|e| format!("read chunks.bin: {}", e))?;
+        let chunks: ChunksManifest = postcard::from_bytes(&chunks_bytes)
+            .map_err(|e| format!("parse chunks.bin: {}", e))?;
+        let store = ChunkStore::new(saves_dir);
+        let mut seen: std::collections::HashSet<ChunkHash> = std::collections::HashSet::new();
+        for hash in chunks.referenced_hashes() {
+            if !seen.insert(*hash) {
+                continue;
+            }
+            let url = format!("{}/cas/{}/{}.chunk", base, hex2_of(hash), hex62_of(hash));
+            if http_head(&url).unwrap_or(false) {
+                report.chunks_skipped += 1;
+                continue;
+            }
+            let bytes = store
+                .get(hash)
+                .map_err(|e| format!("read chunk {}: {}", hex_of(hash), e))?;
+            http_put(&url, &bytes).map_err(|e| format!("PUT chunk {}: {}", hex_of(hash), e))?;
+            report.chunks_fetched += 1;
+            report.bytes_transferred += bytes.len() as u64;
+        }
+    }
+
+    // 2. Per-device state.
+    let device_bases = [
+        "cpu", "mc", "ioc", "scc", "pit", "ps2", "rtc",
+        "eeprom", "scsi", "seeq", "hpc3", "rex3",
+    ];
+    let suffix = if sv >= 2 { "bin" } else { "toml" };
+    for base_name in device_bases {
+        let p = snap_dir.join(format!("{}.{}", base_name, suffix));
+        if !p.exists() {
+            continue; // rex3 may legitimately be absent
+        }
+        let bytes = std::fs::read(&p).map_err(|e| format!("read {}: {}", p.display(), e))?;
+        let url = format!("{}/snapshots/{}/{}.{}", base, name, base_name, suffix);
+        http_put(&url, &bytes).map_err(|e| format!("PUT {}.{}: {}", base_name, suffix, e))?;
+        report.files_transferred += 1;
+        report.bytes_transferred += bytes.len() as u64;
+    }
+
+    // 3. cow.toml + scsi*.overlay (each overlay file is a sector-image, can be MB).
+    let cow_path = snap_dir.join("cow.toml");
+    if cow_path.exists() {
+        let cow_bytes = std::fs::read(&cow_path).map_err(|e| format!("read cow.toml: {}", e))?;
+        // Push overlay binaries first, cow.toml last (the index that lists them).
+        if let Ok(text) = std::str::from_utf8(&cow_bytes) {
+            if let Ok(toml::Value::Table(t)) = text.parse::<toml::Value>() {
+                for (key, _) in t {
+                    if let Some(id_str) = key.strip_prefix("scsi") {
+                        if id_str.parse::<u8>().is_ok() {
+                            let fname = format!("{}.overlay", key);
+                            let p = snap_dir.join(&fname);
+                            if p.exists() {
+                                let bytes = std::fs::read(&p)
+                                    .map_err(|e| format!("read {}: {}", fname, e))?;
+                                let url = format!("{}/snapshots/{}/{}", base, name, fname);
+                                http_put(&url, &bytes)
+                                    .map_err(|e| format!("PUT {}: {}", fname, e))?;
+                                report.files_transferred += 1;
+                                report.bytes_transferred += bytes.len() as u64;
+                            }
+                        }
+                    }
+                }
+            }
+        }
+        let url = format!("{}/snapshots/{}/cow.toml", base, name);
+        http_put(&url, &cow_bytes).map_err(|e| format!("PUT cow.toml: {}", e))?;
+        report.files_transferred += 1;
+        report.bytes_transferred += cow_bytes.len() as u64;
+    }
+
+    // 4. chunks.bin (v3+) — uploaded BEFORE manifest because pullers fetch
+    //    it after manifest and would otherwise race a concurrent push.
+    if sv >= 3 {
+        let chunks_bytes = std::fs::read(snap_dir.join("chunks.bin"))
+            .map_err(|e| format!("read chunks.bin: {}", e))?;
+        let url = format!("{}/snapshots/{}/chunks.bin", base, name);
+        http_put(&url, &chunks_bytes).map_err(|e| format!("PUT chunks.bin: {}", e))?;
+        report.files_transferred += 1;
+        report.bytes_transferred += chunks_bytes.len() as u64;
+    }
+
+    // 5. Manifest LAST (commit point).
+    let manifest_bytes = std::fs::read(snap_dir.join("snapshot.toml"))
+        .map_err(|e| format!("read snapshot.toml: {}", e))?;
+    let url = format!("{}/snapshots/{}/snapshot.toml", base, name);
+    http_put(&url, &manifest_bytes).map_err(|e| format!("PUT snapshot.toml: {}", e))?;
+    report.files_transferred += 1;
+    report.bytes_transferred += manifest_bytes.len() as u64;
+
+    Ok(report)
+}
+
+// ---- minimal HTTP/1.1 client over std::net ----
+
+struct ParsedUrl {
+    host: String,
+    port: u16,
+    path: String,
+}
+
+fn parse_url(url: &str) -> Result<ParsedUrl, String> {
+    let rest = url
+        .strip_prefix("http://")
+        .ok_or_else(|| format!("only http:// URLs supported, got {}", url))?;
+    let (host_port, path) = match rest.find('/') {
+        Some(i) => (&rest[..i], &rest[i..]),
+        None => (rest, "/"),
+    };
+    let (host, port) = match host_port.rsplit_once(':') {
+        Some((h, p)) => (h.to_string(), p.parse().map_err(|e| format!("bad port: {}", e))?),
+        None => (host_port.to_string(), 80u16),
+    };
+    Ok(ParsedUrl {
+        host,
+        port,
+        path: path.to_string(),
+    })
+}
+
+fn http_send(method: &str, url: &str, body: Option<&[u8]>) -> Result<(u16, Vec<u8>), String> {
+    let p = parse_url(url)?;
+    let addr = (p.host.as_str(), p.port)
+        .to_socket_addrs()
+        .map_err(|e| format!("resolve {}: {}", p.host, e))?
+        .next()
+        .ok_or_else(|| format!("no addresses for {}", p.host))?;
+    let mut s = TcpStream::connect_timeout(&addr, HTTP_TIMEOUT).map_err(|e| format!("connect: {}", e))?;
+    s.set_read_timeout(Some(HTTP_TIMEOUT)).ok();
+    s.set_write_timeout(Some(HTTP_TIMEOUT)).ok();
+
+    let mut req = Vec::with_capacity(256);
+    write!(req, "{} {} HTTP/1.1\r\nHost: {}:{}\r\nConnection: close\r\n",
+           method, p.path, p.host, p.port).map_err(|e| e.to_string())?;
+    if let Some(b) = body {
+        write!(req, "Content-Length: {}\r\nContent-Type: application/octet-stream\r\n", b.len())
+            .map_err(|e| e.to_string())?;
+    }
+    req.extend_from_slice(b"\r\n");
+    if let Some(b) = body {
+        req.extend_from_slice(b);
+    }
+    s.write_all(&req).map_err(|e| format!("write request: {}", e))?;
+
+    let mut buf = Vec::new();
+    s.read_to_end(&mut buf).map_err(|e| format!("read response: {}", e))?;
+
+    // Parse status + headers + body.
+    let split = buf
+        .windows(4)
+        .position(|w| w == b"\r\n\r\n")
+        .ok_or_else(|| "malformed response: no header terminator".to_string())?;
+    let header = std::str::from_utf8(&buf[..split])
+        .map_err(|e| format!("non-utf8 header: {}", e))?;
+    let body_start = split + 4;
+    let mut lines = header.lines();
+    let status_line = lines.next().ok_or_else(|| "empty response".to_string())?;
+    let status: u16 = status_line
+        .split_whitespace()
+        .nth(1)
+        .and_then(|s| s.parse().ok())
+        .ok_or_else(|| format!("bad status line: {}", status_line))?;
+
+    // Handle Content-Length and Transfer-Encoding: chunked.
+    let mut content_length: Option<usize> = None;
+    let mut chunked = false;
+    for line in lines {
+        if let Some(v) = line.strip_prefix_ignore_case("Content-Length: ") {
+            content_length = v.trim().parse().ok();
+        }
+        if line.eq_ignore_ascii_case("Transfer-Encoding: chunked") {
+            chunked = true;
+        }
+    }
+    let body = if chunked {
+        decode_chunked(&buf[body_start..])?
+    } else if let Some(n) = content_length {
+        buf[body_start..body_start + n.min(buf.len() - body_start)].to_vec()
+    } else {
+        buf[body_start..].to_vec()
+    };
+    Ok((status, body))
+}
+
+trait StripPrefixIgnoreCase {
+    fn strip_prefix_ignore_case(&self, prefix: &str) -> Option<&str>;
+}
+impl StripPrefixIgnoreCase for str {
+    fn strip_prefix_ignore_case(&self, prefix: &str) -> Option<&str> {
+        if self.len() >= prefix.len() && self[..prefix.len()].eq_ignore_ascii_case(prefix) {
+            Some(&self[prefix.len()..])
+        } else {
+            None
+        }
+    }
+}
+
+fn decode_chunked(data: &[u8]) -> Result<Vec<u8>, String> {
+    let mut out = Vec::with_capacity(data.len());
+    let mut i = 0;
+    while i < data.len() {
+        // Read the size line up to \r\n.
+        let crlf = data[i..]
+            .windows(2)
+            .position(|w| w == b"\r\n")
+            .ok_or_else(|| "chunked: missing size CRLF".to_string())?;
+        let size_str = std::str::from_utf8(&data[i..i + crlf])
+            .map_err(|_| "chunked: non-utf8 size line".to_string())?;
+        let size = usize::from_str_radix(size_str.split(';').next().unwrap_or(size_str).trim(), 16)
+            .map_err(|e| format!("chunked: bad size {}: {}", size_str, e))?;
+        i += crlf + 2;
+        if size == 0 {
+            break;
+        }
+        if i + size > data.len() {
+            return Err("chunked: short chunk data".into());
+        }
+        out.extend_from_slice(&data[i..i + size]);
+        i += size + 2; // skip trailing \r\n
+    }
+    Ok(out)
+}
+
+fn http_get(url: &str) -> Result<Vec<u8>, String> {
+    let (status, body) = http_send("GET", url, None)?;
+    if status == 200 {
+        Ok(body)
+    } else {
+        Err(format!("HTTP {} for {}", status, url))
+    }
+}
+
+fn http_head(url: &str) -> Result<bool, String> {
+    let (status, _) = http_send("HEAD", url, None)?;
+    Ok(status == 200)
+}
+
+fn http_put(url: &str, body: &[u8]) -> Result<(), String> {
+    let (status, resp) = http_send("PUT", url, Some(body))?;
+    if (200..300).contains(&status) {
+        Ok(())
+    } else {
+        Err(format!(
+            "HTTP {} for PUT {}: {}",
+            status,
+            url,
+            String::from_utf8_lossy(&resp)
+        ))
+    }
+}
+
+// ---- hex helpers (mirror chunk_store layout) ----
+
+fn hex_of(h: &ChunkHash) -> String {
+    const HEX: &[u8; 16] = b"0123456789abcdef";
+    let mut s = String::with_capacity(64);
+    for &b in h.iter() {
+        s.push(HEX[(b >> 4) as usize] as char);
+        s.push(HEX[(b & 0x0f) as usize] as char);
+    }
+    s
+}
+fn hex2_of(h: &ChunkHash) -> String {
+    let s = hex_of(h);
+    s[..2].to_string()
+}
+fn hex62_of(h: &ChunkHash) -> String {
+    let s = hex_of(h);
+    s[2..].to_string()
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn parse_url_default_port() {
+        let p = parse_url("http://example.com/foo/bar").unwrap();
+        assert_eq!(p.host, "example.com");
+        assert_eq!(p.port, 80);
+        assert_eq!(p.path, "/foo/bar");
+    }
+
+    #[test]
+    fn parse_url_explicit_port() {
+        let p = parse_url("http://localhost:8080/").unwrap();
+        assert_eq!(p.host, "localhost");
+        assert_eq!(p.port, 8080);
+        assert_eq!(p.path, "/");
+    }
+
+    #[test]
+    fn parse_url_no_path() {
+        let p = parse_url("http://localhost:8080").unwrap();
+        assert_eq!(p.path, "/");
+    }
+
+    #[test]
+    fn parse_url_rejects_https() {
+        assert!(parse_url("https://example.com/").is_err());
+    }
+
+    #[test]
+    fn decode_chunked_happy() {
+        let data = b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n";
+        let out = decode_chunked(data).unwrap();
+        assert_eq!(out, b"Wikipedia");
+    }
+
+    #[test]
+    fn hex_round_trip() {
+        let h: ChunkHash = blake3::hash(b"hello").into();
+        let s = hex_of(&h);
+        assert_eq!(s.len(), 64);
+        assert_eq!(hex2_of(&h).len(), 2);
+        assert_eq!(hex62_of(&h).len(), 62);
+        assert_eq!(format!("{}{}", hex2_of(&h), hex62_of(&h)), s);
+    }
+}
diff --git a/src/rex3.rs b/src/rex3.rs
index db1399c..20ea6c8 100644
--- a/src/rex3.rs
+++ b/src/rex3.rs
@@ -5,7 +5,7 @@
 use std::thread;
 use crossbeam_utils::CachePadded;
 use crate::traits::{BusRead8, BusRead16, BusRead32, BusRead64, BUS_OK, BUS_ERR, BusDevice, Device, Resettable, Saveable};
 use crate::devlog::{LogModule, devlog_is_active, devlog};
-use crate::snapshot::{get_field, u32_slice_to_toml, u16_slice_to_toml, load_u32_slice, load_u16_slice, toml_u32, toml_u64, hex_u32, hex_u64};
+use crate::snapshot::{get_field, u32_slice_to_toml, u16_slice_to_toml, u8_slice_to_toml, load_u32_slice, load_u16_slice, load_u8_slice, toml_u32, toml_u64, toml_u8, hex_u32, hex_u64, hex_u8};
 use std::cell::{Cell, UnsafeCell};
 use crate::vc2::Vc2;
 use crate::xmap9::Xmap9;
@@ -3671,6 +3671,28 @@ impl Rex3 {
         Ok(())
     }

+    /// Clone the framebuffers (RGB and aux) into native-endian `Vec<u32>`
+    /// buffers. Pair with `restore_framebuffers_inmem` for the in-memory
+    /// rollback checkpoint; bypasses the byte-shuffle the disk path needs.
+    pub fn snapshot_framebuffers_inmem(&self) -> (Vec<u32>, Vec<u32>) {
+        let rgb = unsafe { &*self.fb_rgb.get() };
+        let aux = unsafe { &*self.fb_aux.get() };
+        (rgb.to_vec(), aux.to_vec())
+    }
+
+    /// Restore framebuffers from buffers captured by
+    /// `snapshot_framebuffers_inmem`. Lengths are clamped to the actual
+    /// framebuffer size.
+    pub fn restore_framebuffers_inmem(&self, rgb: &[u32], aux: &[u32]) {
+        let dst_rgb = unsafe { &mut *self.fb_rgb.get() };
+        let n = rgb.len().min(dst_rgb.len());
+        dst_rgb[..n].copy_from_slice(&rgb[..n]);
+
+        let dst_aux = unsafe { &mut *self.fb_aux.get() };
+        let n = aux.len().min(dst_aux.len());
+        dst_aux[..n].copy_from_slice(&aux[..n]);
+    }
+
     pub fn load_framebuffers(&self, dir: &std::path::Path) -> std::io::Result<()> {
         let path_rgb = dir.join("rex3_rgb.bin");
         if path_rgb.exists() {
@@ -4657,6 +4679,13 @@ impl Saveable for Rex3 {
             tbl.insert("cmap1".into(), save_cmap(&cmap));
         }

+        // Bt445 RAMDAC (palette + registers) — missing this makes every
+        // pixel decode to black after restore.
+        {
+            let dac = self.bt445.lock();
+            tbl.insert("bt445".into(), save_bt445(&dac));
+        }
+
         toml::Value::Table(tbl)
     }
@@ -4691,6 +4720,7 @@ impl Saveable for Rex3 {
         if let Some(xv) = get_field(v, "xmap1") { load_xmap9(&mut self.xmap1.lock(), xv); }
         if let Some(cv) = get_field(v, "cmap0") { load_cmap(&mut self.cmap0.lock(), cv); }
         if let Some(cv) = get_field(v, "cmap1") { load_cmap(&mut self.cmap1.lock(), cv); }
+        if let Some(dv) = get_field(v, "bt445") { load_bt445(&mut self.bt445.lock(), dv); }

         Ok(())
     }
@@ -4732,6 +4762,63 @@ fn load_cmap(cmap: &mut crate::cmap::Cmap, v: &toml::Value) {
     cmap.dirty = true;
 }

+// Bt445 RAMDAC: palette + control registers. Critical for snapshot restore
+// because `power_on` wipes the palette to all-zero, which makes every pixel
+// decode to black after the gamma lookup in disp.rs::refresh.
+fn save_bt445(dac: &crate::bt445::Bt445) -> toml::Value {
+    let flatten = |rgb: &[[u8; 3]]| -> Vec<u8> {
+        let mut v = Vec::with_capacity(rgb.len() * 3);
+        for e in rgb { v.extend_from_slice(e); }
+        v
+    };
+    let mut tbl = toml::map::Map::new();
+    tbl.insert("palette".into(), u8_slice_to_toml(&flatten(&dac.palette)));
+    tbl.insert("overlay".into(), u8_slice_to_toml(&flatten(&dac.overlay)));
+    tbl.insert("cursor_color".into(), u8_slice_to_toml(&flatten(&dac.cursor_color)));
+    tbl.insert("addr".into(), hex_u8(dac.addr));
+    tbl.insert("rgb_counter".into(), hex_u8(dac.rgb_counter));
+    tbl.insert("read_enable".into(), hex_u8(dac.read_enable));
+    tbl.insert("blink_enable".into(), hex_u8(dac.blink_enable));
+    tbl.insert("cmd0".into(), hex_u8(dac.cmd0));
+    tbl.insert("rgb_ctrl".into(), u8_slice_to_toml(&dac.rgb_ctrl));
+    tbl.insert("setup".into(), u8_slice_to_toml(&dac.setup));
+    toml::Value::Table(tbl)
+}
+
+fn load_bt445(dac: &mut crate::bt445::Bt445, v: &toml::Value) {
+    let unflatten = |bytes: &[u8], dest: &mut [[u8; 3]]| {
+        for (i, chunk) in bytes.chunks(3).enumerate() {
+            if i >= dest.len() { break; }
+            if chunk.len() == 3 {
+                dest[i] = [chunk[0], chunk[1], chunk[2]];
+            }
+        }
+    };
+    if let Some(r) = get_field(v, "palette") {
+        let mut buf = vec![0u8; dac.palette.len() * 3];
+        load_u8_slice(r, &mut buf);
+        unflatten(&buf, &mut dac.palette);
+    }
+    if let Some(r) = get_field(v, "overlay") {
+        let mut buf = vec![0u8; dac.overlay.len() * 3];
+        load_u8_slice(r, &mut buf);
+        unflatten(&buf, &mut dac.overlay);
+    }
+    if let Some(r) = get_field(v, "cursor_color") {
+        let mut buf = vec![0u8; dac.cursor_color.len() * 3];
+        load_u8_slice(r, &mut buf);
+        unflatten(&buf, &mut dac.cursor_color);
+    }
+    if let Some(x) = get_field(v, "addr") { if let Some(n) = toml_u8(x) { dac.addr = n; } }
+    if let Some(x) = get_field(v, "rgb_counter") { if let Some(n) = toml_u8(x) { dac.rgb_counter = n; } }
+    if let Some(x) = get_field(v, "read_enable") { if let Some(n) = toml_u8(x) { dac.read_enable = n; } }
+    if let Some(x) = get_field(v, "blink_enable") { if let Some(n) = toml_u8(x) { dac.blink_enable = n; } }
+    if let Some(x) = get_field(v, "cmd0") { if let Some(n) = toml_u8(x) { dac.cmd0 = n; } }
+    if let Some(r) = get_field(v, "rgb_ctrl") { load_u8_slice(r, &mut dac.rgb_ctrl); }
+    if let Some(r) = get_field(v, "setup") { load_u8_slice(r, &mut dac.setup); }
+    dac.dirty = true;
+}
+
 #[cfg(test)]
 #[path = "rex3_tests.rs"]
 mod tests;
diff --git a/src/scsi.rs b/src/scsi.rs
index f92f8ce..a70dbc7 100644
--- a/src/scsi.rs
+++ b/src/scsi.rs
@@ -167,6 +167,24 @@ impl ScsiDevice {
         }
     }

+    /// Copy the COW overlay into `dest` and return its dirty sector set.
+    /// Direct-mode devices return an empty list and create no file.
+    pub fn cow_export(&mut self, dest: &std::path::Path) -> io::Result<Vec<u64>> {
+        match &mut self.backend {
+            DiskBackend::Cow(cow) => cow.export_overlay(dest),
+            DiskBackend::Direct(_) => Ok(Vec::new()),
+        }
+    }
+
+    /// Replace the COW overlay with the contents of `source` and adopt
+    /// `dirty` as the dirty sector set. No-op on direct-mode devices.
+    pub fn cow_import(&mut self, source: &std::path::Path, dirty: Vec<u64>) -> io::Result<()> {
+        match &mut self.backend {
+            DiskBackend::Cow(cow) => cow.import_overlay(source, dirty),
+            DiskBackend::Direct(_) => Ok(()),
+        }
+    }
+
     /// Number of dirty sectors in the COW overlay, or 0 if direct mode.
     pub fn cow_dirty_count(&self) -> usize {
         match &self.backend {
diff --git a/src/seeq8003.rs b/src/seeq8003.rs
index df7d503..84ce9b2 100644
--- a/src/seeq8003.rs
+++ b/src/seeq8003.rs
@@ -805,3 +805,31 @@ impl Saveable for Seeq8003 {
         Ok(())
     }
 }
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    /// Phase 1.7 round-trip: a fresh Seeq loaded from a captured save_state
+    /// must re-serialize byte-identically. Mutates station_addr and the four
+    /// rx/tx command/status registers.
+    #[test]
+    fn save_load_round_trip() {
+        let src = Seeq8003::new(None, None, None, Arc::new(AtomicU64::new(0)));
+        {
+            let mut st = src.state.lock();
+            st.station_addr = [0x08, 0x00, 0x69, 0x12, 0x34, 0x56];
+            st.rx_cmd = 0x18;
+            st.rx_stat = 0xa1;
+            st.tx_cmd = 0x40;
+            st.tx_stat = 0x82;
+        }
+        let v1 = src.save_state();
+
+        let dst = Seeq8003::new(None, None, None, Arc::new(AtomicU64::new(0)));
+        dst.load_state(&v1).expect("load_state");
+        let v2 = dst.save_state();
+
+        assert_eq!(v1, v2, "Seeq8003 save_state mismatch after load_state round-trip");
+    }
+}
diff --git a/src/sgi_vh.rs b/src/sgi_vh.rs
new file mode 100644
index 0000000..8e472f8
--- /dev/null
+++ b/src/sgi_vh.rs
@@ -0,0 +1,221 @@
+//! Minimal SGI Volume Header writer for the Phase 2.4 scratch volume.
+//!
+//! IRIX requires a recognisable partition table at sector 0 before the
+//! `/dev/rdsk/dks0dNvol` and `/dev/rdsk/dks0dNvh` device nodes return real
+//! data. Without one, IRIX enumerates the SCSI target in `hinv` but every
+//! read returns "I/O error". This module writes a 512-byte SGI Volume Header
+//! into sector 0 of a freshly-created scratch image with three partition
+//! entries:
+//!
+//! - **slot 0 ("payload")**: type 3 (`PT_RAW`), spans sectors 8..end. IRIX
+//!   surfaces this as `/dev/rdsk/dks0dNs0`. This is the partition the host
+//!   injects payload bytes into and that the guest reads — `first_block` is
+//!   honoured, so reads from offset 0 of `s0` map to byte 4096 of the disk
+//!   (right after the VH).
+//! - **slot 8 ("vh")**: type 0 (`PT_VOLHDR`), spans sectors 0..7. IRIX
+//!   surfaces this as `/dev/rdsk/dks0dNvh`. Present only so IRIX's standard
+//!   convention is satisfied; the host-side `scratch-write` never touches it.
+//! - **slot 10 ("vol")**: type 6 (`PT_VOLUME`), spans the entire disk. IRIX
+//!   surfaces this as `/dev/rdsk/dks0dNvol`. The `vol` partition by SGI
+//!   convention always covers sector 0 onwards regardless of `first_block`,
+//!   so reading it returns the VH first — use `s0` for payload reads.
+//!
+//! NB: IRIX raw block-device reads must be sector-aligned (multiples of 512
+//! bytes). `dd if=/dev/rdsk/dks0dNs0 bs=512 count=N` works; `bs=64` returns
+//! "Read error: I/O error" with no SCSI-level error.
+//!
+//! Convention: the host writes payload at offset `SCRATCH_PAYLOAD_OFFSET`
+//! (4096 = sector 8); the guest reads payload from offset 0 of the `s0`
+//! partition, which the kernel maps to sector 8 of the underlying disk.
+//!
+//! All values are big-endian per SGI convention.
+
+use std::fs::File;
+use std::io::{self, Write};
+use std::path::Path;
+
+/// First payload byte. Reserved bytes 0..4095 hold the 8-sector VH partition.
+pub const SCRATCH_PAYLOAD_OFFSET: u64 = 4096;
+
+const SECTOR_SIZE: u64 = 512;
+const VH_SECTORS: u64 = 8;
+const SGI_MAGIC: u32 = 0x0BE5_A941;
+
+const PT_VOLHDR: u32 = 0;
+const PT_RAW: u32 = 3;
+const PT_VOLUME: u32 = 6;
+
+const PT_TABLE_OFFSET: usize = 0x138;
+const PT_ENTRY_SIZE: usize = 12;
+const CSUM_OFFSET: usize = 0x1F8;
+
+/// Create a fresh scratch image at `path` of `total_bytes` size, with a
+/// minimal SGI Volume Header at sector 0. Overwrites any existing file.
+pub fn create_scratch_image(path: &Path, total_bytes: u64) -> io::Result<()> {
+    if total_bytes < SCRATCH_PAYLOAD_OFFSET + SECTOR_SIZE {
+        return Err(io::Error::new(
+            io::ErrorKind::InvalidInput,
+            format!(
+                "scratch size {} bytes is too small (minimum {} bytes)",
+                total_bytes,
+                SCRATCH_PAYLOAD_OFFSET + SECTOR_SIZE
+            ),
+        ));
+    }
+    if total_bytes % SECTOR_SIZE != 0 {
+        return Err(io::Error::new(
+            io::ErrorKind::InvalidInput,
+            format!("scratch size {} is not a multiple of {} bytes", total_bytes, SECTOR_SIZE),
+        ));
+    }
+
+    let total_sectors = total_bytes / SECTOR_SIZE;
+    let vol_sectors = total_sectors - VH_SECTORS;
+
+    let mut vh = build_vh(vol_sectors);
+    fix_csum(&mut vh);
+
+    let mut f = File::create(path)?;
+    f.set_len(total_bytes)?;
+    f.write_all(&vh)?;
+    f.sync_all()?;
+    Ok(())
+}
+
+fn build_vh(vol_sectors: u64) -> [u8; SECTOR_SIZE as usize] {
+    let mut vh = [0u8; SECTOR_SIZE as usize];
+
+    // Magic.
+    vh[0..4].copy_from_slice(&SGI_MAGIC.to_be_bytes());
+
+    // root_partnum / swap_partnum / bootfile / device_parameters all stay 0.
+
+    // Partition table at PT_TABLE_OFFSET (0x138).
+    // Slot 0 ("payload"): type PT_RAW, sectors 8..end. IRIX maps this to
+    // /dev/rdsk/dks0dNs0 with first_block honoured — reads at offset 0 of
+    // s0 land at byte 4096 of the disk (right after the VH).
+    write_pt_entry(&mut vh, 0, vol_sectors as u32, VH_SECTORS as u32, PT_RAW);
+    // Slot 8 ("vh"): type PT_VOLHDR, sectors 0..7. IRIX maps this to
+    // /dev/rdsk/dks0dNvh.
+    write_pt_entry(&mut vh, 8, VH_SECTORS as u32, 0, PT_VOLHDR);
+    // Slot 10 ("vol"): type PT_VOLUME, whole disk. IRIX maps this to
+    // /dev/rdsk/dks0dNvol — convenient for raw whole-disk dumps but always
+    // starts at sector 0 (the VH), so use s0 for payload reads.
+    let total_sectors_u32 = (vol_sectors + VH_SECTORS) as u32;
+    write_pt_entry(&mut vh, 10, total_sectors_u32, 0, PT_VOLUME);
+
+    vh
+}
+
+fn write_pt_entry(vh: &mut [u8; SECTOR_SIZE as usize], slot: usize, nblks: u32, first: u32, ty: u32) {
+    let off = PT_TABLE_OFFSET + slot * PT_ENTRY_SIZE;
+    vh[off..off + 4].copy_from_slice(&nblks.to_be_bytes());
+    vh[off + 4..off + 8].copy_from_slice(&first.to_be_bytes());
+    vh[off + 8..off + 12].copy_from_slice(&ty.to_be_bytes());
+}
+
+/// Set csum so the 32-bit two's-complement sum of all 128 big-endian words
+/// equals zero. fx, prtvtoc, and the IRIX kernel all check this.
+fn fix_csum(vh: &mut [u8; SECTOR_SIZE as usize]) {
+    // Zero the existing csum first, then sum, then store -sum.
+    vh[CSUM_OFFSET..CSUM_OFFSET + 4].fill(0);
+    let mut sum: u32 = 0;
+    for chunk in vh.chunks_exact(4) {
+        let w = u32::from_be_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
+        sum = sum.wrapping_add(w);
+    }
+    let csum = (!sum).wrapping_add(1); // -sum
+    vh[CSUM_OFFSET..CSUM_OFFSET + 4].copy_from_slice(&csum.to_be_bytes());
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    fn unique_tmp_path(tag: &str) -> std::path::PathBuf {
+        let nanos = std::time::SystemTime::now()
+            .duration_since(std::time::UNIX_EPOCH)
+            .map(|d| d.as_nanos())
+            .unwrap_or(0);
+        std::env::temp_dir().join(format!("iris-vh-{}-{}.raw", tag, nanos))
+    }
+
+    #[test]
+    fn scratch_image_has_correct_size_and_magic() {
+        let p = unique_tmp_path("size");
+        let size: u64 = 4 * 1024 * 1024; // 4 MB
+        create_scratch_image(&p, size).expect("create");
+        let meta = std::fs::metadata(&p).unwrap();
+        assert_eq!(meta.len(), size, "image size must match request");
+        let bytes = std::fs::read(&p).unwrap();
+        assert_eq!(&bytes[0..4], &SGI_MAGIC.to_be_bytes(), "missing SGI magic");
+        let _ = std::fs::remove_file(&p);
+    }
+
+    #[test]
+    fn partition_table_describes_vol_and_vh() {
+        let p = unique_tmp_path("pt");
+        let size: u64 = 64 * 1024 * 1024;
+        create_scratch_image(&p, size).expect("create");
+        let bytes = std::fs::read(&p).unwrap();
+
+        // Slot 0 (payload): nblks = total - 8, first = 8, type = PT_RAW.
+        let off0 = PT_TABLE_OFFSET;
+        let nblks = u32::from_be_bytes(bytes[off0..off0 + 4].try_into().unwrap());
+        let first = u32::from_be_bytes(bytes[off0 + 4..off0 + 8].try_into().unwrap());
+        let ty = u32::from_be_bytes(bytes[off0 + 8..off0 + 12].try_into().unwrap());
+        assert_eq!(nblks, (size / SECTOR_SIZE - VH_SECTORS) as u32);
+        assert_eq!(first, VH_SECTORS as u32);
+        assert_eq!(ty, PT_RAW);
+
+        // Slot 8 (vh): nblks = 8, first = 0, type = PT_VOLHDR.
+        let off8 = PT_TABLE_OFFSET + 8 * PT_ENTRY_SIZE;
+        let nblks = u32::from_be_bytes(bytes[off8..off8 + 4].try_into().unwrap());
+        let first = u32::from_be_bytes(bytes[off8 + 4..off8 + 8].try_into().unwrap());
+        let ty = u32::from_be_bytes(bytes[off8 + 8..off8 + 12].try_into().unwrap());
+        assert_eq!(nblks, VH_SECTORS as u32);
+        assert_eq!(first, 0);
+        assert_eq!(ty, PT_VOLHDR);
+
+        // Slot 10 (vol): nblks = total, first = 0, type = PT_VOLUME (whole disk).
+ let off10 = PT_TABLE_OFFSET + 10 * PT_ENTRY_SIZE; + let nblks = u32::from_be_bytes(bytes[off10..off10 + 4].try_into().unwrap()); + let first = u32::from_be_bytes(bytes[off10 + 4..off10 + 8].try_into().unwrap()); + let ty = u32::from_be_bytes(bytes[off10 + 8..off10 + 12].try_into().unwrap()); + assert_eq!(nblks, (size / SECTOR_SIZE) as u32); + assert_eq!(first, 0); + assert_eq!(ty, PT_VOLUME); + + let _ = std::fs::remove_file(&p); + } + + #[test] + fn checksum_sums_to_zero() { + let p = unique_tmp_path("csum"); + let size: u64 = 64 * 1024 * 1024; + create_scratch_image(&p, size).expect("create"); + let bytes = std::fs::read(&p).unwrap(); + let mut sum: u32 = 0; + for chunk in bytes[..512].chunks_exact(4) { + let w = u32::from_be_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]); + sum = sum.wrapping_add(w); + } + assert_eq!(sum, 0, "VH csum must make 32-bit sum of 128 BE words == 0"); + let _ = std::fs::remove_file(&p); + } + + #[test] + fn rejects_too_small_image() { + let p = unique_tmp_path("small"); + let r = create_scratch_image(&p, 4096); // exactly VH size, no payload + assert!(r.is_err()); + } + + #[test] + fn rejects_non_sector_aligned_size() { + let p = unique_tmp_path("misaligned"); + let r = create_scratch_image(&p, 4096 + 100); + assert!(r.is_err()); + } +} diff --git a/src/snapshot.rs b/src/snapshot.rs index 537ab82..15030b8 100644 --- a/src/snapshot.rs +++ b/src/snapshot.rs @@ -1,25 +1,134 @@ // System Snapshot — save and restore full machine state to/from a directory. 
// -// Layout of saves/<name>/: -// cpu.toml — CPU core (GPRs, CP0, FPU), TLB entries -// mc.toml — Memory Controller registers + GIO DMA state -// ioc.toml — IOC interrupt registers -// hpc3.toml — HPC3 state register, PBUS PIO, DMA channel registers -// rex3.toml — REX3 drawing registers, VC2, XMAP9, CMAP palette +// Layout of saves/<name>/ (schema_version = 2): +// snapshot.toml — manifest (always TOML, human-readable) +// cpu.bin — CPU core (GPRs, CP0, FPU), TLB entries (postcard BinValue) +// mc.bin — Memory Controller registers + GIO DMA state +// ioc.bin — IOC interrupt registers +// hpc3.bin — HPC3 state register, PBUS PIO, DMA channel registers +// rex3.bin — REX3 drawing registers, VC2, XMAP9, CMAP palette +// {scc,pit,ps2,rtc,eeprom,scsi,seeq}.bin — peripheral device state +// cow.toml — COW overlay dirty sectors per SCSI device (stays TOML) // bank0.bin — 128 MB RAM bank A (raw u8, big-endian word layout) // bank1.bin — 128 MB RAM bank B // bank2.bin — 128 MB RAM bank C // bank3.bin — 128 MB RAM bank D +// +// schema_version = 1: same layout but device state is *.toml (hex strings). +// schema_version = 0 (no manifest): legacy, also *.toml. +use serde::{Deserialize, Serialize}; use std::fs; use std::io::{Read, Write}; use std::path::PathBuf; use toml::Value; +/// On-disk schema version for the snapshot directory layout. Bumped when a +/// device's save_state format changes incompatibly. Old snapshots without a +/// manifest are treated as v0 (legacy, best-effort load). +/// +/// v1 → v2: device state moved from *.toml (hex strings, ~80 ms cpu.toml +/// parse) to *.bin (postcard-encoded BinValue tree, sub-millisecond). Manifest +/// and cow.toml stay TOML. +/// +/// v2 → v3: RAM banks and framebuffers moved from raw `bank{N}.bin`/`rex3_*.bin` +/// files to a content-addressable chunk store at `saves/.cas/`. Each snapshot +/// writes a tiny `chunks.bin` manifest of per-bank/per-framebuffer chunk +/// hashes. 
Two snapshots from the same parent share 95–99% of chunks, so a +/// fresh save-after-bundle-install costs only the bytes that changed. +pub const SCHEMA_VERSION: u32 = 3; + +const MANIFEST_FILE: &str = "snapshot.toml"; + pub struct Snapshot { pub dir: PathBuf, } +/// Top-level snapshot manifest. Lives at `saves/<name>/snapshot.toml`. Written +/// first on save and read first on load so the rest of the pipeline can fail +/// fast with a clear error before reading half a snapshot. +#[derive(Debug, Clone)] +pub struct Manifest { + pub schema_version: u32, + pub iris_git_rev: Option<String>, + pub host_arch: String, + pub created_at_unix: u64, + pub parent: Option<String>, + pub description: Option<String>, + pub installed_bundles: Vec<String>, +} + +impl Manifest { + /// Build a manifest describing the current build/host, with no parent or + /// description. Caller can mutate fields before writing. + pub fn for_current_save() -> Self { + let created_at_unix = std::time::SystemTime::now() + .duration_since(std::time::UNIX_EPOCH) + .map(|d| d.as_secs()) + .unwrap_or(0); + Self { + schema_version: SCHEMA_VERSION, + iris_git_rev: option_env!("IRIS_GIT_REV").map(String::from), + host_arch: std::env::consts::ARCH.to_string(), + created_at_unix, + parent: None, + description: None, + installed_bundles: Vec::new(), + } + } + + pub fn to_toml(&self) -> Value { + let mut tbl = toml::map::Map::new(); + tbl.insert("schema_version".into(), Value::Integer(self.schema_version as i64)); + if let Some(rev) = &self.iris_git_rev { + tbl.insert("iris_git_rev".into(), Value::String(rev.clone())); + } + tbl.insert("host_arch".into(), Value::String(self.host_arch.clone())); + tbl.insert("created_at_unix".into(), Value::Integer(self.created_at_unix as i64)); + if let Some(parent) = &self.parent { + tbl.insert("parent".into(), Value::String(parent.clone())); + } + if let Some(d) = &self.description { + tbl.insert("description".into(), Value::String(d.clone())); + } + let bundles: Vec<Value> = self.installed_bundles.iter() + .map(|s| 
Value::String(s.clone())).collect(); + tbl.insert("installed_bundles".into(), Value::Array(bundles)); + Value::Table(tbl) + } + + pub fn from_toml(v: &Value) -> Result<Self, String> { + let tbl = v.as_table().ok_or("manifest: not a table")?; + let schema_version = tbl.get("schema_version") + .and_then(|x| x.as_integer()) + .ok_or("manifest: missing schema_version")? as u32; + let host_arch = tbl.get("host_arch") + .and_then(|x| x.as_str()) + .ok_or("manifest: missing host_arch")? + .to_string(); + let created_at_unix = tbl.get("created_at_unix") + .and_then(|x| x.as_integer()) + .map(|i| i as u64) + .unwrap_or(0); + let iris_git_rev = tbl.get("iris_git_rev").and_then(|x| x.as_str()).map(String::from); + let parent = tbl.get("parent").and_then(|x| x.as_str()).map(String::from); + let description = tbl.get("description").and_then(|x| x.as_str()).map(String::from); + let installed_bundles = tbl.get("installed_bundles") + .and_then(|x| x.as_array()) + .map(|arr| arr.iter().filter_map(|x| x.as_str().map(String::from)).collect()) + .unwrap_or_default(); + Ok(Self { + schema_version, + iris_git_rev, + host_arch, + created_at_unix, + parent, + description, + installed_bundles, + }) + } +} + impl Snapshot { pub fn new(dir: impl Into<PathBuf>) -> Self { Self { dir: dir.into() } } @@ -56,9 +165,183 @@ impl Snapshot { fs::read(path) } + /// Postcard-encode a `toml::Value` (via the tagged `BinValue` mirror) and + /// write it as `<name>`. Sub-millisecond for typical device tables vs ~80 + /// ms TOML parse on cpu.toml. + pub fn write_value_bin(&self, name: &str, v: &Value) -> std::io::Result<()> { + let bv = BinValue::from_toml(v); + let bytes = postcard::to_allocvec(&bv) + .map_err(|e| std::io::Error::new(std::io::ErrorKind::Other, e))?; + self.write_bin(name, &bytes) + } + + /// Inverse of `write_value_bin`. Returns the reconstructed `toml::Value`. 
+ pub fn read_value_bin(&self, name: &str) -> std::io::Result<Value> { + let bytes = self.read_bin(name)?; + let bv: BinValue = postcard::from_bytes(&bytes) + .map_err(|e| std::io::Error::new(std::io::ErrorKind::Other, e))?; + Ok(bv.into_toml()) + } + + /// Postcard-encode a `ChunksManifest` (v3+ snapshots). + pub fn write_chunks_manifest(&self, m: &ChunksManifest) -> std::io::Result<()> { + let bytes = postcard::to_allocvec(m) + .map_err(|e| std::io::Error::new(std::io::ErrorKind::Other, e))?; + self.write_bin("chunks.bin", &bytes) + } + + pub fn read_chunks_manifest(&self) -> std::io::Result<ChunksManifest> { + let bytes = self.read_bin("chunks.bin")?; + postcard::from_bytes(&bytes) + .map_err(|e| std::io::Error::new(std::io::ErrorKind::Other, e)) + } + + /// Write a device save_state value, picking `<base>.bin` for v2+ and + /// `<base>.toml` for legacy schemas. Centralizes the per-call branching + /// in machine.rs. + pub fn write_state(&self, base: &str, v: &Value, schema_version: u32) -> std::io::Result<()> { + if schema_version >= 2 { + self.write_value_bin(&format!("{}.bin", base), v) + } else { + self.write_toml(&format!("{}.toml", base), v) + } + } + + /// Read a device save_state value. For v2+ tries `<base>.bin` first and + /// falls back to `<base>.toml` for snapshots half-migrated by external + /// tooling. For legacy schemas reads `<base>.toml` directly. + pub fn read_state(&self, base: &str, schema_version: u32) -> std::io::Result<Value> { + if schema_version >= 2 { + match self.read_value_bin(&format!("{}.bin", base)) { + Ok(v) => Ok(v), + Err(_) => self.read_toml(&format!("{}.toml", base)), + } + } else { + self.read_toml(&format!("{}.toml", base)) + } + } + pub fn ensure_dir(&self) -> std::io::Result<()> { fs::create_dir_all(&self.dir) } + + /// Write the manifest to `snapshot.toml`. Always called first on save. + pub fn write_manifest(&self, m: &Manifest) -> std::io::Result<()> { + self.write_toml(MANIFEST_FILE, &m.to_toml()) + } + + /// Read the manifest. 
Returns `Ok(None)` if `snapshot.toml` is absent + /// (legacy snapshots taken before this format was introduced). + pub fn read_manifest(&self) -> Result<Option<Manifest>, String> { + let path = self.dir.join(MANIFEST_FILE); + if !path.exists() { + return Ok(None); + } + let v = self.read_toml(MANIFEST_FILE).map_err(|e| e.to_string())?; + Manifest::from_toml(&v).map(Some) + } +} + +// ---- ChunksManifest: per-bank / per-framebuffer chunk hash lists (v3+) ---- + +use crate::chunk_store::ChunkHash; + +/// Per-snapshot pointer into the content-addressable chunk store. Every bank +/// and (optionally) each framebuffer is split into 64 KB chunks; this +/// manifest records the BLAKE3 hash of each chunk in order. Loading a +/// snapshot fetches the chunks and concatenates them back into the bank's +/// big-endian byte stream. +/// +/// Stored as `chunks.bin` in the snapshot dir, postcard-encoded. +#[derive(Debug, Clone, Serialize, Deserialize, Default)] +pub struct ChunksManifest { + /// One entry per RAM bank (0..3). Empty inner Vec means the bank wasn't + /// captured (e.g. zero-sized in this configuration). + pub bank_chunks: [Vec<ChunkHash>; 4], + /// REX3 framebuffer chunks: (rgb, aux). `None` when running headless. + pub framebuffer_chunks: Option<(Vec<ChunkHash>, Vec<ChunkHash>)>, +} + +impl ChunksManifest { + /// Iterate every chunk hash referenced by this manifest. Used by `gc` + /// to build the live set across all kept snapshots. + pub fn referenced_hashes(&self) -> impl Iterator<Item = &ChunkHash> { + self.bank_chunks.iter().flatten().chain( + self.framebuffer_chunks + .iter() + .flat_map(|(rgb, aux)| rgb.iter().chain(aux.iter())), + ) + } +} + +// ---- BinValue: tagged binary mirror of toml::Value ---- +// +// Postcard is non-self-describing — it cannot deserialize directly into the +// untagged `toml::Value` enum (which relies on `deserialize_any`). BinValue +// carries an explicit variant tag so postcard can round-trip it. 
The +// conversion to/from `toml::Value` is a single tree walk and runs in low +// milliseconds even for the largest device tables. +// +// Datetime is rare in our save_state output — encode it as an ISO-8601 string +// and reparse on the way back. If parsing fails the value falls back to a +// plain `toml::Value::String` so a malformed datetime never panics a load. + +/// Tagged binary mirror of `toml::Value`. Order-preserving for tables (matches +/// `toml::Value::Table` which uses an `IndexMap` under the hood). +#[derive(Debug, Clone, Serialize, Deserialize)] +pub enum BinValue { + String(String), + Integer(i64), + Float(f64), + Boolean(bool), + Array(Vec<BinValue>), + Table(Vec<(String, BinValue)>), + Datetime(String), +} + +impl BinValue { + pub fn from_toml(v: &Value) -> Self { + match v { + Value::String(s) => BinValue::String(s.clone()), + Value::Integer(i) => BinValue::Integer(*i), + Value::Float(f) => BinValue::Float(*f), + Value::Boolean(b) => BinValue::Boolean(*b), + Value::Array(arr) => { + BinValue::Array(arr.iter().map(BinValue::from_toml).collect()) + } + Value::Table(tbl) => { + let mut out = Vec::with_capacity(tbl.len()); + for (k, v) in tbl { + out.push((k.clone(), BinValue::from_toml(v))); + } + BinValue::Table(out) + } + Value::Datetime(dt) => BinValue::Datetime(dt.to_string()), + } + } + + pub fn into_toml(self) -> Value { + match self { + BinValue::String(s) => Value::String(s), + BinValue::Integer(i) => Value::Integer(i), + BinValue::Float(f) => Value::Float(f), + BinValue::Boolean(b) => Value::Boolean(b), + BinValue::Array(arr) => { + Value::Array(arr.into_iter().map(BinValue::into_toml).collect()) + } + BinValue::Table(entries) => { + let mut tbl = toml::map::Map::new(); + for (k, v) in entries { + tbl.insert(k, v.into_toml()); + } + Value::Table(tbl) + } + BinValue::Datetime(s) => match s.parse::<toml::value::Datetime>() { + Ok(dt) => Value::Datetime(dt), + Err(_) => Value::String(s), + }, + } + } } // ---- scalar hex helpers ---- @@ -182,3 +465,243 @@ pub fn 
load_u8_slice(v: &Value, dst: &mut [u8]) { pub fn get_field<'a>(table: &'a Value, key: &str) -> Option<&'a Value> { table.as_table()?.get(key) } + +#[cfg(test)] +mod tests { + use super::*; + + fn unique_tmp_dir(tag: &str) -> PathBuf { + let nanos = std::time::SystemTime::now() + .duration_since(std::time::UNIX_EPOCH) + .map(|d| d.as_nanos()) + .unwrap_or(0); + let p = std::env::temp_dir().join(format!("iris-snap-test-{}-{}", tag, nanos)); + fs::create_dir_all(&p).unwrap(); + p + } + + #[test] + fn manifest_round_trip_full() { + let m = Manifest { + schema_version: 1, + iris_git_rev: Some("abc123".into()), + host_arch: "aarch64".into(), + created_at_unix: 1_700_000_000, + parent: Some("base/desktop".into()), + description: Some("post mogrix install".into()), + installed_bundles: vec!["grep-2.5.4".into(), "sed-4.2.2".into()], + }; + let v = m.to_toml(); + let m2 = Manifest::from_toml(&v).expect("parse"); + assert_eq!(m2.schema_version, m.schema_version); + assert_eq!(m2.iris_git_rev, m.iris_git_rev); + assert_eq!(m2.host_arch, m.host_arch); + assert_eq!(m2.created_at_unix, m.created_at_unix); + assert_eq!(m2.parent, m.parent); + assert_eq!(m2.description, m.description); + assert_eq!(m2.installed_bundles, m.installed_bundles); + } + + #[test] + fn manifest_round_trip_minimal() { + let m = Manifest { + schema_version: 1, + iris_git_rev: None, + host_arch: "x86_64".into(), + created_at_unix: 0, + parent: None, + description: None, + installed_bundles: vec![], + }; + let v = m.to_toml(); + let m2 = Manifest::from_toml(&v).expect("parse"); + assert!(m2.iris_git_rev.is_none()); + assert!(m2.parent.is_none()); + assert!(m2.description.is_none()); + assert!(m2.installed_bundles.is_empty()); + } + + #[test] + fn manifest_rejects_missing_schema_version() { + let mut tbl = toml::map::Map::new(); + tbl.insert("host_arch".into(), Value::String("aarch64".into())); + let v = Value::Table(tbl); + assert!(Manifest::from_toml(&v).is_err()); + } + + #[test] + fn 
manifest_disk_round_trip() { + let dir = unique_tmp_dir("manifest"); + let snap = Snapshot::new(&dir); + let m = Manifest::for_current_save(); + snap.write_manifest(&m).expect("write"); + let loaded = snap.read_manifest().expect("read").expect("present"); + assert_eq!(loaded.schema_version, SCHEMA_VERSION); + assert_eq!(loaded.host_arch, std::env::consts::ARCH); + // cleanup + let _ = fs::remove_dir_all(&dir); + } + + #[test] + fn manifest_absent_returns_none() { + let dir = unique_tmp_dir("missing"); + let snap = Snapshot::new(&dir); + let loaded = snap.read_manifest().expect("read"); + assert!(loaded.is_none()); + let _ = fs::remove_dir_all(&dir); + } + + #[test] + fn for_current_save_uses_runtime_arch() { + let m = Manifest::for_current_save(); + assert_eq!(m.schema_version, SCHEMA_VERSION); + assert_eq!(m.host_arch, std::env::consts::ARCH); + assert!(m.parent.is_none()); + } + + fn sample_value() -> Value { + // Mirrors a slice of cpu.toml: top-level scalars + a sub-table with + // mixed integer/string/array entries. Order matters for the table + // round-trip assertion. 
+ let mut cp0 = toml::map::Map::new(); + cp0.insert("cp0_index".into(), Value::String("0x00000001".into())); + cp0.insert("cp0_count".into(), Value::String("0x000000000badf00d".into())); + cp0.insert("cp0_status".into(), Value::Integer(0x4040_0000)); + let mut tbl = toml::map::Map::new(); + tbl.insert("pc".into(), Value::String("0x9fc00000".into())); + tbl.insert( + "gpr".into(), + Value::Array(vec![ + Value::String("0x0000000000000000".into()), + Value::String("0x0000000000000001".into()), + Value::String("0xffffffff80001234".into()), + ]), + ); + tbl.insert("cp0".into(), Value::Table(cp0)); + tbl.insert("running".into(), Value::Boolean(true)); + tbl.insert("ratio".into(), Value::Float(1.5)); + Value::Table(tbl) + } + + #[test] + fn binvalue_round_trip_matches_toml() { + let v = sample_value(); + let bv = BinValue::from_toml(&v); + let back = bv.into_toml(); + assert_eq!(back, v); + } + + #[test] + fn binvalue_postcard_round_trip() { + let v = sample_value(); + let bv = BinValue::from_toml(&v); + let bytes = postcard::to_allocvec(&bv).expect("encode"); + let bv2: BinValue = postcard::from_bytes(&bytes).expect("decode"); + assert_eq!(bv2.into_toml(), v); + } + + #[test] + fn write_state_v2_writes_bin_and_reads_back() { + let dir = unique_tmp_dir("state-v2"); + let snap = Snapshot::new(&dir); + let v = sample_value(); + snap.write_state("cpu", &v, 2).expect("write v2"); + assert!(dir.join("cpu.bin").exists(), "expected cpu.bin to be written"); + assert!(!dir.join("cpu.toml").exists(), "v2 must not write cpu.toml"); + let back = snap.read_state("cpu", 2).expect("read v2"); + assert_eq!(back, v); + let _ = fs::remove_dir_all(&dir); + } + + #[test] + fn write_state_v1_writes_toml_and_reads_back() { + let dir = unique_tmp_dir("state-v1"); + let snap = Snapshot::new(&dir); + let v = sample_value(); + snap.write_state("cpu", &v, 1).expect("write v1"); + assert!(dir.join("cpu.toml").exists(), "expected cpu.toml to be written"); + assert!(!dir.join("cpu.bin").exists(), "v1 
must not write cpu.bin"); + let back = snap.read_state("cpu", 1).expect("read v1"); + assert_eq!(back, v); + let _ = fs::remove_dir_all(&dir); + } + + #[test] + fn read_state_v2_falls_back_to_toml_when_bin_missing() { + // External tooling may legitimately produce a v2 manifest with .toml + // device files (e.g. dump-and-edit workflow). Loader must be tolerant. + let dir = unique_tmp_dir("state-fallback"); + let snap = Snapshot::new(&dir); + let v = sample_value(); + snap.write_toml("cpu.toml", &v).expect("write toml"); + let back = snap.read_state("cpu", 2).expect("read with fallback"); + assert_eq!(back, v); + let _ = fs::remove_dir_all(&dir); + } + + /// Hand-runnable bench: `cargo test --release --features lightning -- --ignored bench_cpu_toml_vs_bin --nocapture`. + /// Reads saves/working/cpu.toml (3.6 MB legacy snapshot) and prints the + /// parse-time delta between toml::from_str and postcard::from_bytes. + #[test] + #[ignore] + fn bench_cpu_toml_vs_bin() { + let path = "saves/working/cpu.toml"; + let s = match std::fs::read_to_string(path) { + Ok(s) => s, + Err(e) => { + eprintln!("skipping: cannot read {}: {}", path, e); + return; + } + }; + println!("cpu.toml: {} bytes", s.len()); + + let runs = 5; + let mut toml_total_us = 0u128; + let mut toml_v: Option<Value> = None; + for _ in 0..runs { + let t = std::time::Instant::now(); + toml_v = Some(toml::from_str::<Value>(&s).unwrap()); + toml_total_us += t.elapsed().as_micros(); + } + println!("toml::from_str avg over {} runs: {:.2} ms", + runs, toml_total_us as f64 / runs as f64 / 1000.0); + + let v = toml_v.take().unwrap(); + let bv = BinValue::from_toml(&v); + let bytes = postcard::to_allocvec(&bv).unwrap(); + println!("postcard encoded: {} bytes (vs toml {} bytes, ratio {:.2}x)", + bytes.len(), s.len(), s.len() as f64 / bytes.len() as f64); + + let mut bin_total_us = 0u128; + for _ in 0..runs { + let t = std::time::Instant::now(); + let bv: BinValue = postcard::from_bytes(&bytes).unwrap(); + let _ = bv.into_toml(); + 
bin_total_us += t.elapsed().as_micros(); + } + println!("postcard decode + into_toml avg over {} runs: {:.2} ms", + runs, bin_total_us as f64 / runs as f64 / 1000.0); + println!("speedup: {:.1}x", + toml_total_us as f64 / bin_total_us as f64); + } + + #[test] + fn binvalue_payload_is_smaller_than_toml() { + // Sanity check the size win on a representative-ish payload. + let mut tbl = toml::map::Map::new(); + let big_arr: Vec<Value> = (0..1024) + .map(|i| Value::String(format!("0x{:016x}", i as u64))) + .collect(); + tbl.insert("gpr_big".into(), Value::Array(big_arr)); + let v = Value::Table(tbl); + let toml_bytes = toml::to_string(&v).unwrap().into_bytes(); + let bv = BinValue::from_toml(&v); + let bin_bytes = postcard::to_allocvec(&bv).unwrap(); + assert!( + bin_bytes.len() < toml_bytes.len(), + "bin {} bytes should be smaller than toml {} bytes", + bin_bytes.len(), + toml_bytes.len() + ); + } +} diff --git a/src/validate.rs b/src/validate.rs new file mode 100644 index 0000000..45287f6 --- /dev/null +++ b/src/validate.rs @@ -0,0 +1,88 @@ +//! Phase 3.3: snapshot determinism validator. +//! +//! Loads a saved snapshot twice, runs the CPU `n` instructions inline (with +//! all peripheral threads stopped to eliminate scheduling jitter), and +//! diffs the resulting CPU state digests. Two passes over the same starting +//! state should produce bit-identical CPU registers — any divergence points +//! at non-determinism in `load_snapshot` (host wallclock leakage, missing +//! `load_state` field, uninitialised structure) that would silently corrupt +//! mogrix CI replays. +//! +//! With JIT descoped (Phase 2.5), the original "JIT vs interp lockstep" +//! framing is gone; this is the snapshot-determinism portion. Peripheral +//! threads are stopped during the test so device-side timing variance +//! doesn't leak into the result. + +use crate::machine::Machine; +use crate::mips_exec::CpuStateDigest; + +/// Result of `validate_snapshot_determinism`. 
+#[derive(Debug)] +pub struct DeterminismReport { + pub instructions_run: u64, + pub deterministic: bool, + /// Per-field divergence list. Empty when `deterministic` is true. + pub diffs: Vec<(String, String, String)>, + pub state_a: CpuStateDigest, + pub state_b: CpuStateDigest, +} + +impl DeterminismReport { + pub fn summary(&self) -> String { + if self.deterministic { + format!( + "deterministic for {} instructions (PC=0x{:016x})", + self.instructions_run, self.state_a.pc + ) + } else { + let mut s = format!( + "DIVERGED after {} instructions ({} field(s)):", + self.instructions_run, + self.diffs.len() + ); + for (field, a, b) in &self.diffs { + s.push_str(&format!("\n {}: A={} B={}", field, a, b)); + } + s + } + } +} + +/// Run two passes of `load_snapshot(name); step n; capture` and diff the +/// resulting CPU state digests. Side effects: leaves the machine stopped +/// after both passes, with the second-pass state loaded. Caller is +/// responsible for any subsequent `start`/`restart_peripherals`. +pub fn validate_snapshot_determinism( + machine: &mut Machine, + name: &str, + n_instructions: u64, +) -> Result<DeterminismReport, String> { + // Pass A: load with everything paused → step inline → capture. + // load_snapshot_paused leaves CPU and peripheral threads stopped, so no + // thread runs between load and digest. This is the key to surfacing + // genuine load_state determinism issues vs. thread-scheduling jitter. + machine.load_snapshot_paused(name)?; + let executed_a = machine.cpu_step_n_inline(n_instructions)?; + let state_a = machine.cpu_state_digest()?; + + // Pass B: same starting snapshot, fresh load. 
+ machine.load_snapshot_paused(name)?; + let executed_b = machine.cpu_step_n_inline(n_instructions)?; + let state_b = machine.cpu_state_digest()?; + + if executed_a != executed_b { + return Err(format!( + "step counts disagree: A ran {}, B ran {} (CPU stopped itself differently)", + executed_a, executed_b + )); + } + + let diffs = state_a.diff(&state_b); + Ok(DeterminismReport { + instructions_run: executed_a, + deterministic: diffs.is_empty(), + diffs, + state_a, + state_b, + }) +} diff --git a/src/wd33c93a.rs b/src/wd33c93a.rs index c4a6a8c..bf3e23b 100644 --- a/src/wd33c93a.rs +++ b/src/wd33c93a.rs @@ -234,12 +234,27 @@ impl Wd33c93a { /// For CD-ROMs, `discs` is the full ordered list of ISO paths; the first /// entry is mounted immediately. For HDDs `discs` is ignored — only /// `path` is used. - pub fn add_device(&self, id: usize, path: &str, is_cdrom: bool, discs: Vec<String>, overlay: bool) -> std::io::Result<()> { + /// + /// If `overlay_path_override` is `Some`, it specifies where the COW + /// overlay file lives. This lets CI mode isolate its overlay from an + /// interactive session sharing the same base image. Ignored when + /// `overlay` is false. + pub fn add_device( + &self, + id: usize, + path: &str, + is_cdrom: bool, + discs: Vec<String>, + overlay: bool, + overlay_path_override: Option<&str>, + ) -> std::io::Result<()> { use crate::cow_disk::CowDisk; use crate::scsi::DiskBackend; let (backend, size) = if overlay && !is_cdrom { - let overlay_path = format!("{}.overlay", path); + let overlay_path = overlay_path_override + .map(|s| s.to_string()) + .unwrap_or_else(|| format!("{}.overlay", path)); let cow = CowDisk::new(path, &overlay_path)?; let sz = cow.size(); (DiskBackend::Cow(cow), sz) @@ -317,6 +332,59 @@ impl Wd33c93a { .collect() } + /// Reset the COW overlay on every attached device that's using COW. + /// Direct-mode devices are left alone. Used by `Machine::ci_restore`. 
+ pub fn reset_all_overlays(&self) -> Vec<(usize, std::io::Result<()>)> { + let mut state = self.state.lock(); + let mut results = Vec::new(); + for id in 0..8 { + if let Some(dev) = &mut state.devices[id] { + if dev.is_cow() { + results.push((id, dev.cow_reset())); + } + } + } + results + } + + /// Copy every COW overlay into `dir` as `scsi<id>.overlay`. Returns a + /// list of `(id, dirty_sector_list)` entries so snapshot save can + /// persist the dirty set alongside the raw overlay bytes. + pub fn export_overlays(&self, dir: &std::path::Path) -> std::io::Result<Vec<(usize, Vec<u64>)>> { + let mut state = self.state.lock(); + let mut out = Vec::new(); + for id in 0..8 { + if let Some(dev) = &mut state.devices[id] { + if dev.is_cow() { + let dest = dir.join(format!("scsi{}.overlay", id)); + let dirty = dev.cow_export(&dest)?; + out.push((id, dirty)); + } + } + } + Ok(out) + } + + /// Replace each COW overlay with its saved counterpart in `dir` and + /// adopt the matching dirty sector set. Devices with no corresponding + /// entry in `dirty_sets` keep their current overlay untouched. 
+ pub fn import_overlays( + &self, + dir: &std::path::Path, + dirty_sets: &[(usize, Vec<u64>)], + ) -> std::io::Result<()> { + let mut state = self.state.lock(); + for (id, dirty) in dirty_sets { + if let Some(dev) = &mut state.devices[*id] { + if dev.is_cow() { + let src = dir.join(format!("scsi{}.overlay", id)); + dev.cow_import(&src, dirty.clone())?; + } + } + } + Ok(()) + } + pub fn read_fifo(&self) -> u8 { let mut state = self.state.lock(); state.fifo.pop_front().unwrap_or(0) @@ -1375,4 +1443,42 @@ impl Wd33c93aState { self.regs[regs::TRANSFER_COUNT_2ND as usize] = ((count >> 8) & 0xFF) as u8; self.regs[regs::TRANSFER_COUNT_LSB as usize] = (count & 0xFF) as u8; } +} + +#[cfg(test)] +mod tests { + use super::*; + + fn make_scsi() -> Wd33c93a { + Wd33c93a::new(None, None, Arc::new(AtomicU64::new(0))) + } + + /// Phase 1.7 round-trip: a fresh SCSI controller loaded from a captured + /// save_state must re-serialize byte-identically. Mutates regs and the + /// scalar shadow fields (ar, asr, target_id, pending_*). + #[test] + fn save_load_round_trip() { + let src = make_scsi(); + { + let mut s = src.state.lock(); + s.regs[regs::CONTROL as usize] = 0x60; + s.regs[regs::SCSI_STATUS as usize] = 0x10; + s.regs[regs::COMMAND_PHASE as usize] = 0x46; + s.regs[regs::OWN_ID as usize] = 0x07; + s.ar = 0x42; + s.asr = 0x10; + s.data_direction_in = true; + s.target_id = 4; + s.pending_status = 0x02; + s.pending_msg = 0x80; + s.advanced_mode = true; + } + let v1 = src.save_state(); + + let dst = make_scsi(); + dst.load_state(&v1).expect("load_state"); + let v2 = dst.save_state(); + + assert_eq!(v1, v2, "Wd33c93a save_state mismatch after load_state round-trip"); + } } \ No newline at end of file diff --git a/src/z85c30.rs b/src/z85c30.rs index d0f3cad..ee44833 100644 --- a/src/z85c30.rs +++ b/src/z85c30.rs @@ -320,6 +320,17 @@ pub trait SerialBackend: Send + Sync { fn recv_byte(&self) -> io::Result<u8>; } +/// Drops TX bytes and never yields RX. 
Used as a placeholder when a channel +/// isn't wired to a host I/O source (e.g. CI mode unused channel). +struct NullBackend; + +impl SerialBackend for NullBackend { + fn send_byte(&self, _byte: u8) {} + fn recv_byte(&self) -> io::Result<u8> { + Err(io::Error::new(io::ErrorKind::WouldBlock, "null")) + } +} + #[cfg(unix)] struct UnixSocketBackend { listener: UnixListener, @@ -456,28 +467,68 @@ impl SerialBackend for TcpSocketBackend { pub struct Z85c30 { pub channel_a: Arc<(Mutex<Channel>, Condvar)>, pub channel_b: Arc<(Mutex<Channel>, Condvar)>, - backend_a: Arc<dyn SerialBackend>, - backend_b: Arc<dyn SerialBackend>, + // Swappable so CI mode can replace the default TCP backend with a + // `CiSerialBackend` before `start()` is called. Wrapped in `Arc<Mutex<_>>` + // so `Z85c30` stays `Clone` and the swap is thread-safe. + backend_a: Arc<Mutex<Arc<dyn SerialBackend>>>, + backend_b: Arc<Mutex<Arc<dyn SerialBackend>>>, running: Arc<AtomicBool>, threads: Arc<Mutex<Vec<thread::JoinHandle<()>>>>, } impl Z85c30 { + /// Default constructor: binds TCP serial backends on 127.0.0.1:8880 + /// (channel A / tty2) and 127.0.0.1:8881 (channel B / tty1). pub fn new(callback: Option>) -> Self { + Self::new_inner(callback, true) + } + + /// CI-mode constructor: uses null backends instead of binding TCP. The + /// caller is expected to install real backends via `set_backend_a` / + /// `set_backend_b` before the first `start()`. Avoids port conflicts + /// when multiple `--ci` instances run in parallel. 
+ pub fn new_null(callback: Option>) -> Self { + Self::new_inner(callback, false) + } + + fn new_inner(callback: Option>, bind_tcp: bool) -> Self { let ip_a = Arc::new(AtomicU8::new(0)); let ip_b = Arc::new(AtomicU8::new(0)); + let (backend_a, backend_b): (Arc<dyn SerialBackend>, Arc<dyn SerialBackend>) = if bind_tcp { + ( + Arc::new(TcpSocketBackend::new("127.0.0.1:8880")), + Arc::new(TcpSocketBackend::new("127.0.0.1:8881")), + ) + } else { + (Arc::new(NullBackend), Arc::new(NullBackend)) + }; + Self { channel_a: Arc::new((Mutex::new(Channel::new("A", ip_a.clone(), ip_b.clone(), callback.clone())), Condvar::new())), // Note: Channel B gets ip_b as its 'num' and ip_a as 'other' channel_b: Arc::new((Mutex::new(Channel::new("B", ip_b, ip_a, callback)), Condvar::new())), - backend_a: Arc::new(TcpSocketBackend::new("127.0.0.1:8880")), - backend_b: Arc::new(TcpSocketBackend::new("127.0.0.1:8881")), + backend_a: Arc::new(Mutex::new(backend_a)), + backend_b: Arc::new(Mutex::new(backend_b)), running: Arc::new(AtomicBool::new(false)), threads: Arc::new(Mutex::new(Vec::new())), } } + /// Swap in an alternate backend for channel A (tty2 on Indy). + /// Must be called before `start()` — running RX/TX threads cache the + /// backend Arc at spawn time and will not observe the new one until + /// they are stopped and restarted. + pub fn set_backend_a(&self, backend: Arc<dyn SerialBackend>) { + *self.backend_a.lock() = backend; + } + + /// Swap in an alternate backend for channel B (tty1, the PROM/IRIX + /// serial console on Indy). Same constraint as `set_backend_a`. 
+    pub fn set_backend_b(&self, backend: Arc<dyn SerialBackend>) {
+        *self.backend_b.lock() = backend;
+    }
+
     pub fn read_a_control(&self) -> u8 {
         let mut a = self.channel_a.0.lock();
         if a.reg_ptr == 2 {
@@ -610,8 +661,8 @@ impl Device for Z85c30 {
         }
 
         let pairs = [
-            (self.channel_a.clone(), self.backend_a.clone()),
-            (self.channel_b.clone(), self.backend_b.clone()),
+            (self.channel_a.clone(), self.backend_a.lock().clone()),
+            (self.channel_b.clone(), self.backend_b.lock().clone()),
         ];
 
         let mut threads = self.threads.lock();
@@ -691,46 +742,75 @@ impl Device for Z85c30 {
             }
 
             threads.push(thread::Builder::new().name(format!("SCC-RX-{}", ch_name)).spawn(move || {
                 let mut last_rx_time = Instant::now();
+                // When the SCC's 8-byte rx_queue is full, hold the just-read
+                // byte here and retry on the next iteration instead of
+                // dropping it. This prevents loss when the host pushes a
+                // long line into CiSerialBackend faster than IRIX's tty
+                // driver clocks bytes off rx_queue. Without this hold, a
+                // ~30-char `dd if=/dev/rdsk/dks0d2s0 bs=512` arrives at
+                // the shell as `dd if=/d=512` (chars 9..24 dropped).
+                let mut pending: Option<u8> = None;
                 while running.load(Ordering::Relaxed) {
-                    if let Ok(mut byte) = rx_backend.recv_byte() {
-                        if byte == 0x05 {
-                            crate::dlog_dev!(LogModule::Scc, "SCC: Converting ^E to ^D (BREAK)");
-                            byte = 0x04;
-                        }
-                        let (lock, _cvar) = &*rx_channel;
-                        let mut channel = lock.lock();
-
-                        let wr3 = channel.regs[scc_regs::WR3 as usize];
-                        let rx_enabled = (wr3 & wr3::RX_ENABLE) != 0;
-
-                        // Get pre-calculated delay
-                        let delay_micros = channel.tx_delay;
-                        let char_duration = Duration::from_micros(delay_micros);
-
-                        if rx_enabled && channel.rx_queue.len() < 8 {
-                            crate::dlog_dev!(LogModule::Scc, "SCC: RX({}) '{}' ({:02x})", channel.name, if byte.is_ascii_graphic() { byte as char } else { '.' }, byte);
-                            channel.rx_queue.push_back(byte);
-                            channel.status |= rr0::RX_CHAR_AVAILABLE;
-                            channel.update_ip();
-                        }
+                    let mut byte = match pending.take() {
+                        Some(b) => b,
+                        None => match rx_backend.recv_byte() {
+                            Ok(b) => b,
+                            Err(_) => {
+                                thread::sleep(Duration::from_millis(10));
+                                continue;
+                            }
+                        },
+                    };
+                    if byte == 0x05 {
+                        crate::dlog_dev!(LogModule::Scc, "SCC: Converting ^E to ^D (BREAK)");
+                        byte = 0x04;
+                    }
+
+                    let (lock, _cvar) = &*rx_channel;
+                    let mut channel = lock.lock();
+
+                    let wr3 = channel.regs[scc_regs::WR3 as usize];
+                    let rx_enabled = (wr3 & wr3::RX_ENABLE) != 0;
+                    let delay_micros = channel.tx_delay;
+                    let char_duration = Duration::from_micros(delay_micros);
+
+                    if !rx_enabled {
+                        // RX disabled — drop the byte (matches real hw with
+                        // RX off). Don't hold it in `pending` or we'd block
+                        // forever waiting for re-enable.
+                        drop(channel);
+                        continue;
+                    }
 
-                        let now = Instant::now();
-                        if last_rx_time < now {
-                            if now.duration_since(last_rx_time) > Duration::from_millis(100) {
-                                last_rx_time = now;
-                            }
-                        }
-                        last_rx_time += char_duration;
-                        let wait = last_rx_time.saturating_duration_since(now);
-                        if !wait.is_zero() {
-                            thread::sleep(wait);
+                    if channel.rx_queue.len() >= 8 {
+                        // SCC FIFO full. Hold the byte and back off briefly
+                        // so the guest's tty driver gets a chance to drain
+                        // rx_queue. Don't drop — that's the bug this
+                        // section fixes.
+                        drop(channel);
+                        pending = Some(byte);
+                        thread::sleep(Duration::from_millis(1));
+                        continue;
+                    }
+
+                    crate::dlog_dev!(LogModule::Scc, "SCC: RX({}) '{}' ({:02x})", channel.name, if byte.is_ascii_graphic() { byte as char } else { '.' }, byte);
+                    channel.rx_queue.push_back(byte);
+                    channel.status |= rr0::RX_CHAR_AVAILABLE;
+                    channel.update_ip();
+                    drop(channel);
+
+                    // Pacing — simulate baud-rate inter-character spacing.
+                    let now = Instant::now();
+                    if last_rx_time < now {
+                        if now.duration_since(last_rx_time) > Duration::from_millis(100) {
+                            last_rx_time = now;
+                        }
-                    } else {
-                        // Avoid busy loop on error
-                        thread::sleep(Duration::from_millis(10));
+                    }
+                    last_rx_time += char_duration;
+                    let wait = last_rx_time.saturating_duration_since(now);
+                    if !wait.is_zero() {
+                        thread::sleep(wait);
+                    }
                 }
             }).unwrap());
@@ -834,3 +914,195 @@ impl Saveable for Z85c30 {
         Ok(())
     }
 }
+
+// ============================================================================
+// CiSerialBackend — in-process serial backend used by --ci mode.
+// ============================================================================
+
+/// Serial backend that the CI control socket reads from and writes to. The
+/// guest sees this as channel A (the IRIX console). Host pushes bytes into
+/// `host_to_guest` via `push_host`; the existing RX thread drains them into
+/// `channel_a.rx_queue`. Guest output reaches `send_byte`, which pushes into
+/// `guest_to_host` and wakes anyone waiting in `wait_for`.
+pub struct CiSerialBackend {
+    host_to_guest: Mutex<VecDeque<u8>>,
+    guest_to_host: Mutex<Vec<u8>>,
+    cv: Condvar,
+}
+
+impl CiSerialBackend {
+    pub fn new() -> Self {
+        Self {
+            host_to_guest: Mutex::new(VecDeque::new()),
+            guest_to_host: Mutex::new(Vec::new()),
+            cv: Condvar::new(),
+        }
+    }
+
+    /// Inject bytes from host to guest (the harness typing on the console).
+    pub fn push_host(&self, data: &[u8]) {
+        let mut q = self.host_to_guest.lock();
+        q.extend(data.iter().copied());
+    }
+
+    /// Drain everything the guest has produced since the last call. Empties
+    /// the buffer; the returned `Vec<u8>` is the guest output as raw bytes.
+    pub fn drain_guest(&self) -> Vec<u8> {
+        let mut q = self.guest_to_host.lock();
+        std::mem::take(&mut *q)
+    }
+
+    /// Block until `needle` is seen in guest output, or `timeout` expires.
+    /// On success returns the consumed bytes up to and including the match;
+    /// bytes that arrived after the match stay in the buffer for the next
+    /// `serial-read`. On timeout returns `None` without consuming anything.
+    pub fn wait_for(&self, needle: &[u8], timeout: Duration) -> Option<Vec<u8>> {
+        if needle.is_empty() {
+            return Some(Vec::new());
+        }
+        let deadline = Instant::now() + timeout;
+        let mut q = self.guest_to_host.lock();
+        loop {
+            if let Some(pos) = find_subseq(&q, needle) {
+                let end = pos + needle.len();
+                let consumed: Vec<u8> = q.drain(..end).collect();
+                return Some(consumed);
+            }
+            let now = Instant::now();
+            if now >= deadline {
+                return None;
+            }
+            if self.cv.wait_until(&mut q, deadline).timed_out() {
+                // One more scan in case bytes arrived between the last check
+                // and the timeout.
+                if let Some(pos) = find_subseq(&q, needle) {
+                    let end = pos + needle.len();
+                    let consumed: Vec<u8> = q.drain(..end).collect();
+                    return Some(consumed);
+                }
+                return None;
+            }
+        }
+    }
+
+    /// Clear both queues. Called on `restore`/`rollback` so stale serial
+    /// output from the previous run doesn't leak into the next test.
+    pub fn reset(&self) {
+        self.host_to_guest.lock().clear();
+        self.guest_to_host.lock().clear();
+    }
+}
+
+fn find_subseq(haystack: &[u8], needle: &[u8]) -> Option<usize> {
+    if needle.is_empty() || haystack.len() < needle.len() {
+        return None;
+    }
+    haystack.windows(needle.len()).position(|w| w == needle)
+}
+
+impl SerialBackend for CiSerialBackend {
+    fn send_byte(&self, byte: u8) {
+        self.guest_to_host.lock().push(byte);
+        self.cv.notify_all();
+    }
+
+    fn recv_byte(&self) -> io::Result<u8> {
+        let mut q = self.host_to_guest.lock();
+        match q.pop_front() {
+            Some(b) => Ok(b),
+            None => Err(io::Error::new(io::ErrorKind::WouldBlock, "empty")),
+        }
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    /// Phase 1.7 round-trip: a fresh SCC loaded from a captured save_state
+    /// must re-serialize byte-identically. Use new_null so the test doesn't
+    /// bind any TCP ports.
+    #[test]
+    fn save_load_round_trip() {
+        let src = Z85c30::new_null(None);
+        {
+            let mut ch = src.channel_a.0.lock();
+            ch.regs[0] = 0x44;
+            ch.regs[1] = 0x12;
+            ch.regs[3] = 0xc1;
+            ch.regs[5] = 0xea;
+            ch.reg_ptr = 7;
+            ch.status = 0x40;
+        }
+        {
+            let mut ch = src.channel_b.0.lock();
+            ch.regs[0] = 0x88;
+            ch.regs[2] = 0x10;
+            ch.regs[15] = 0x05;
+            ch.reg_ptr = 3;
+            ch.status = 0x80;
+        }
+        let v1 = src.save_state();
+
+        let dst = Z85c30::new_null(None);
+        dst.load_state(&v1).expect("load_state");
+        let v2 = dst.save_state();
+
+        assert_eq!(v1, v2, "Z85c30 save_state mismatch after load_state round-trip");
+    }
+
+    /// Phase 3.5: a long single-line `serial-send` from the host must arrive
+    /// at the guest's tty intact. Before the rx-thread fix, bytes 9..N of any
+    /// burst were silently dropped when SCC's 8-byte rx_queue filled — a
+    /// 53-char `dd if=/dev/rdsk/dks0d2s0 of=/tmp/r.bin bs=512 count=1\r`
+    /// arrived at IRIX as `dd if=/d=512 count=1`, causing CI scripts to
+    /// fabricate shell errors out of thin air. This test pushes that exact
+    /// line through the loopback CiSerialBackend, drains the SCC rx_queue at
+    /// the rate the IRIX kernel would (one byte at a time, polled), and
+    /// asserts every byte arrives.
+    #[test]
+    fn long_input_round_trips_without_loss() {
+        use std::sync::Arc;
+        use std::time::{Duration, Instant};
+
+        let scc = Z85c30::new_null(None);
+        let backend = Arc::new(CiSerialBackend::new());
+        scc.set_backend_a(backend.clone());
+
+        // Enable RX on channel A so the rx thread queues bytes. tx_delay is
+        // tx-direction baud-rate emulation; set a small value so the test
+        // doesn't pay 19.2 kbaud-per-char latency.
+        {
+            let mut ch = scc.channel_a.0.lock();
+            ch.regs[scc_regs::WR3 as usize] |= wr3::RX_ENABLE;
+            ch.tx_delay = 50; // 50 µs/byte
+        }
+
+        scc.start();
+
+        let line = b"dd if=/dev/rdsk/dks0d2s0 of=/tmp/r.bin bs=512 count=1\r";
+        backend.push_host(line);
+
+        // Drain rx_queue at ~20 kHz so the rx thread always has space to
+        // push pending bytes. Mirrors how IRIX's tty driver consumes
+        // RR0::RX_CHAR_AVAILABLE.
+        let mut received = Vec::with_capacity(line.len());
+        let deadline = Instant::now() + Duration::from_secs(5);
+        while received.len() < line.len() && Instant::now() < deadline {
+            let popped = {
+                let mut ch = scc.channel_a.0.lock();
+                ch.rx_queue.pop_front()
+            };
+            match popped {
+                Some(b) => received.push(b),
+                None => std::thread::sleep(Duration::from_micros(50)),
+            }
+        }
+
+        scc.stop();
+
+        assert_eq!(received.len(), line.len(),
+            "expected {} bytes, got {} (lossy rx_queue?)", line.len(), received.len());
+        assert_eq!(&received, line, "byte content mismatch — bytes dropped or reordered");
+    }
+}
diff --git a/tools/iris-test b/tools/iris-test
new file mode 100755
index 0000000..2700a7b
--- /dev/null
+++ b/tools/iris-test
@@ -0,0 +1,270 @@
+#!/usr/bin/env python3
+"""iris-test — drive the IRIS --ci control socket against a test spec.
+
+A test is a YAML file. Steps currently supported:
+
+    - type: serial
+      send: "rpm -ivh /tmp/grep.rpm\n"
+      expect: "complete"
+      timeout: 30
+
+    - type: sleep
+      seconds: 2
+
+See ci_mode_plan.md for the full schema. Output is a one-line summary per
+test (PASS/FAIL + elapsed), plus the captured serial output on failure.
+
+Usage:
+    tools/iris-test path/to/test.yaml
+
+Optional flags:
+    --socket PATH    Unix socket path (default: /tmp/iris.sock)
+    --verbose        Dump every RPC and raw serial output
+    --no-restore     Skip the restore step (useful when driving a
+                     pre-booted emulator manually)
+"""
+
+from __future__ import annotations
+
+import argparse
+import json
+import socket
+import sys
+import time
+from dataclasses import dataclass
+from pathlib import Path
+from typing import Any
+
+
+# ----------------------------------------------------------------------------
+# YAML without PyYAML dependency — simple subset parser.
+# Supports the schema in ci_mode_plan.md. If the user has PyYAML installed we
+# prefer that; otherwise fall back to a minimal parser for our schema.
+# ----------------------------------------------------------------------------
+
+def load_yaml(path: Path) -> dict[str, Any]:
+    try:
+        import yaml  # type: ignore
+        with path.open() as f:
+            return yaml.safe_load(f)
+    except ImportError:
+        return _mini_yaml(path.read_text())
+
+
+def _mini_yaml(text: str) -> dict[str, Any]:
+    """A tiny YAML subset: top-level scalars, a 'steps' list of dicts with
+    scalar values. No anchors, flows, or nested structures."""
+    root: dict[str, Any] = {}
+    steps: list[dict[str, Any]] = []
+    in_steps = False
+    current: dict[str, Any] | None = None
+
+    def coerce(v: str) -> Any:
+        if (v.startswith('"') and v.endswith('"')) or (v.startswith("'") and v.endswith("'")):
+            return v[1:-1].encode().decode("unicode_escape")
+        try:
+            if "." in v:
+                return float(v)
+            return int(v)
+        except ValueError:
+            return v
+
+    for raw in text.splitlines():
+        line = raw.rstrip()
+        if not line or line.lstrip().startswith("#"):
+            continue
+        if not line.startswith(" "):
+            if line == "steps:":
+                in_steps = True
+                continue
+            k, _, v = line.partition(":")
+            root[k.strip()] = coerce(v.strip())
+            continue
+        stripped = line.lstrip()
+        if stripped.startswith("- "):
+            if current is not None:
+                steps.append(current)
+            current = {}
+            stripped = stripped[2:]
+        if current is None:
+            continue
+        k, _, v = stripped.partition(":")
+        current[k.strip()] = coerce(v.strip())
+    if current is not None:
+        steps.append(current)
+    if in_steps:
+        root["steps"] = steps
+    return root
+
+
+# ----------------------------------------------------------------------------
+# Control-socket client.
+# ----------------------------------------------------------------------------
+
+class CiClient:
+    def __init__(self, path: str, verbose: bool = False):
+        self.verbose = verbose
+        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
+        try:
+            self.sock.connect(path)
+        except (FileNotFoundError, ConnectionRefusedError) as e:
+            raise SystemExit(
+                f"iris-test: no iris --ci listener at {path} ({e}).\n"
+                f"  Start one first, e.g. in another terminal:\n"
+                f"    ./target/release/iris --ci --ci-socket {path}"
+            )
+        self.f = self.sock.makefile("rw", buffering=1, newline="\n")
+
+    def rpc(self, cmd: str, **args: Any) -> dict[str, Any]:
+        req = {"cmd": cmd, "args": args}
+        self.f.write(json.dumps(req) + "\n")
+        self.f.flush()
+        line = self.f.readline()
+        if not line:
+            raise RuntimeError(f"socket EOF while waiting for {cmd!r}")
+        resp = json.loads(line)
+        if self.verbose:
+            print(f"  rpc {cmd} {args} -> {resp}", file=sys.stderr)
+        return resp
+
+    def close(self) -> None:
+        try:
+            self.sock.close()
+        except OSError:
+            pass
+
+
+# ----------------------------------------------------------------------------
+# Step runners.
+# ----------------------------------------------------------------------------
+
+@dataclass
+class StepResult:
+    ok: bool
+    message: str
+    elapsed_s: float
+    captured: str = ""
+
+
+def run_serial(client: CiClient, step: dict[str, Any]) -> StepResult:
+    start = time.monotonic()
+    send = step.get("send", "")
+    expect = step.get("expect")
+    timeout = int(step.get("timeout", 30))
+
+    if send:
+        resp = client.rpc("serial-send", data=send)
+        if not resp.get("ok"):
+            return StepResult(False, f"serial-send failed: {resp.get('error')}", time.monotonic() - start)
+
+    captured = ""
+    if expect:
+        resp = client.rpc("wait-serial", pattern=expect, timeout_ms=timeout * 1000)
+        if not resp.get("ok"):
+            # Drain whatever is there so it shows up in failure output.
+            drained = client.rpc("serial-read").get("data", "")
+            return StepResult(False, resp.get("error", "wait-serial failed"),
+                              time.monotonic() - start, captured=drained)
+        captured = resp.get("data", "")
+
+    return StepResult(True, f"matched {expect!r}" if expect else "sent", time.monotonic() - start, captured=captured)
+
+
+def run_sleep(_client: CiClient, step: dict[str, Any]) -> StepResult:
+    start = time.monotonic()
+    secs = float(step.get("seconds", 1))
+    time.sleep(secs)
+    return StepResult(True, f"slept {secs}s", time.monotonic() - start)
+
+
+def run_screenshot(client: CiClient, step: dict[str, Any]) -> StepResult:
+    start = time.monotonic()
+    path = step.get("path", "/tmp/iris-ci-screenshot.png")
+    min_bytes = int(step.get("min_bytes", 1024))  # sanity lower bound
+
+    resp = client.rpc("screenshot", path=path)
+    if not resp.get("ok"):
+        return StepResult(False, f"screenshot: {resp.get('error')}", time.monotonic() - start)
+    try:
+        st = Path(path).stat()
+    except OSError as e:
+        return StepResult(False, f"screenshot file missing: {e}", time.monotonic() - start)
+    if st.st_size < min_bytes:
+        return StepResult(False, f"screenshot too small ({st.st_size} bytes, need >={min_bytes})",
+                          time.monotonic() - start)
+    dims = resp.get("data", {})
+    w, h = dims.get("width"), dims.get("height")
+    return StepResult(True, f"screenshot {w}x{h}, {st.st_size} bytes -> {path}",
+                      time.monotonic() - start)
+
+
+STEP_RUNNERS = {
+    "serial": run_serial,
+    "sleep": run_sleep,
+    "screenshot": run_screenshot,
+}
+
+
+# ----------------------------------------------------------------------------
+# Main.
+# ----------------------------------------------------------------------------
+
+def run_test(spec: dict[str, Any], socket_path: str, verbose: bool, no_restore: bool) -> int:
+    name = spec.get("name", "unnamed")
+    snapshot = spec.get("snapshot")
+    steps = spec.get("steps", [])
+
+    overall_start = time.monotonic()
+    client = CiClient(socket_path, verbose=verbose)
+
+    try:
+        if not no_restore:
+            if not snapshot:
+                print(f"FAIL {name} (spec missing 'snapshot')", file=sys.stderr)
+                return 2
+            resp = client.rpc("restore", name=snapshot)
+            if not resp.get("ok"):
+                print(f"FAIL {name} (restore: {resp.get('error')})", file=sys.stderr)
+                return 2
+
+        for i, step in enumerate(steps):
+            kind = step.get("type", "")
+            runner = STEP_RUNNERS.get(kind)
+            if runner is None:
+                print(f"FAIL {name} (step {i}: unknown type {kind!r})", file=sys.stderr)
+                return 2
+            result = runner(client, step)
+            if not result.ok:
+                print(f"FAIL {name} step {i} ({kind}): {result.message}")
+                if result.captured:
+                    print("--- serial output ---")
+                    print(result.captured, end="")
+                    if not result.captured.endswith("\n"):
+                        print()
+                    print("--- end ---")
+                return 1
+            if verbose:
+                print(f"  step {i} ({kind}, {result.elapsed_s:.1f}s): {result.message}",
+                      file=sys.stderr)
+
+        total = time.monotonic() - overall_start
+        print(f"PASS {name} ({total:.1f}s)")
+        return 0
+    finally:
+        client.close()
+
+
+def main() -> int:
+    ap = argparse.ArgumentParser(description="Run an IRIS --ci test spec")
+    ap.add_argument("spec", type=Path)
+    ap.add_argument("--socket", default="/tmp/iris.sock")
+    ap.add_argument("--verbose", action="store_true")
+    ap.add_argument("--no-restore", action="store_true")
+    args = ap.parse_args()
+
+    spec = load_yaml(args.spec)
+    return run_test(spec, args.socket, args.verbose, args.no_restore)
+
+
+if __name__ == "__main__":
+    sys.exit(main())
diff --git a/tools/tests/prom-smoke.yaml b/tools/tests/prom-smoke.yaml
new file mode 100644
index 0000000..a90f3dc
--- /dev/null
+++ b/tools/tests/prom-smoke.yaml
@@ -0,0 +1,11 @@
+name: prom-smoke
+# Minimal sanity: cold-boot into PROM, wait for any banner output.
+# No snapshot — start the CPU fresh. Use --no-restore with iris-test.
+
+steps:
+  - type: sleep
+    seconds: 1
+
+  - type: serial
+    expect: "SGI"
+    timeout: 60
diff --git a/tools/tests/restore-smoke.yaml b/tools/tests/restore-smoke.yaml
new file mode 100644
index 0000000..8f3e93c
--- /dev/null
+++ b/tools/tests/restore-smoke.yaml
@@ -0,0 +1,15 @@
+name: restore-smoke
+# Loads the snapshot you captured interactively and verifies the guest is
+# alive: send a CR and wait for an echo. Works regardless of whether the
+# snapshot is in PROM, the maintenance menu, or an IRIX shell — any of
+# them will echo a newline or re-display their prompt.
+snapshot: working2
+
+steps:
+  - type: sleep
+    seconds: 1
+
+  - type: serial
+    send: "\r"
+    expect: "\n"
+    timeout: 10
diff --git a/tools/tests/screenshot-smoke.yaml b/tools/tests/screenshot-smoke.yaml
new file mode 100644
index 0000000..142407d
--- /dev/null
+++ b/tools/tests/screenshot-smoke.yaml
@@ -0,0 +1,17 @@
+name: screenshot-smoke
+# Restores the IRIX snapshot and captures a screenshot. Verifies that:
+#   1. restore succeeded (snapshot machinery works end-to-end including overlay)
+#   2. REX3 is alive and rendering in --ci mode (no --headless)
+#   3. the framebuffer has plausible content (file size sanity check)
+#
+# This is the "GUI mode smoke" — doesn't rely on serial console output.
+snapshot: working3
+
+steps:
+  # Give REX3's refresh thread a beat to composite a frame after restore.
+  - type: sleep
+    seconds: 2
+
+  - type: screenshot
+    path: /tmp/iris-screenshot.png
+    min_bytes: 10240
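For reference, the YAML subset that `tools/iris-test` accepts can be checked without the emulator. The sketch below mirrors the `_mini_yaml` fallback parser from the patch (slightly simplified — it always emits a `steps` key and skips the PyYAML path) and feeds it the restore-smoke spec; the `mini_yaml` name is illustrative, not part of the patch.

```python
# Standalone sketch of the _mini_yaml subset parser in tools/iris-test,
# exercised against the restore-smoke spec shown above.
def mini_yaml(text):
    root, steps, current = {}, [], None

    def coerce(v):
        # Quoted scalars get unicode-escape decoding, like the real parser.
        if len(v) >= 2 and v[0] == v[-1] and v[0] in "\"'":
            return v[1:-1].encode().decode("unicode_escape")
        try:
            return float(v) if "." in v else int(v)
        except ValueError:
            return v

    for raw in text.splitlines():
        line = raw.rstrip()
        if not line or line.lstrip().startswith("#"):
            continue
        if not line.startswith(" "):            # top-level key
            if line == "steps:":
                continue
            k, _, v = line.partition(":")
            root[k.strip()] = coerce(v.strip())
            continue
        stripped = line.lstrip()
        if stripped.startswith("- "):           # a new step dict begins
            if current is not None:
                steps.append(current)
            current = {}
            stripped = stripped[2:]
        if current is None:
            continue
        k, _, v = stripped.partition(":")
        current[k.strip()] = coerce(v.strip())

    if current is not None:
        steps.append(current)
    root["steps"] = steps
    return root


spec = mini_yaml(
    "name: restore-smoke\n"
    "snapshot: working2\n"
    "steps:\n"
    "  - type: sleep\n"
    "    seconds: 1\n"
    "  - type: serial\n"
    "    send: \"\\r\"\n"
    "    expect: \"\\n\"\n"
    "    timeout: 10\n"
)
```

After parsing, `spec["steps"]` holds two dicts with coerced scalar values (`seconds: 1` and `timeout: 10` become ints, the quoted `"\r"` becomes a real carriage return), which is exactly the shape `run_test` hands to the step runners.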