197 changes: 144 additions & 53 deletions HACKATHON_SUBMISSION.md
@@ -1,98 +1,189 @@
# Agent Flight Recorder
# NullOS Mission Control

## Problem Discovered

NullWatch already provides the observability layer for the nullclaw ecosystem:
run summaries, spans, evals, OTLP ingest, cost, token usage, and failure context.
It also exports a NullHub-compatible manifest. NullHub already provides the
operator UI and orchestration pages, but it did not register NullWatch or expose
its tracing/eval data in the UI.
The nullclaw ecosystem already has the building blocks of a lightweight local
agent platform: NullHub for control, NullBoiler for orchestration,
NullTickets for tracker-backed work, and NullWatch for traces and evals.
What was missing was a memorable local demo that shows these ideas as one
operator experience.

## Chosen Solution
Without that vertical slice, a new contributor or hackathon judge has to infer
the platform story from separate repositories, APIs, and docs.

Add a local-first Observability cockpit to NullHub:
## Chosen Solution

- register `nullwatch` as a known component
- proxy `/api/observability/*` to a managed NullWatch instance
- add a Flight Recorder page for runs, spans, evals, cost, tokens, and errors
- document the local demo flow through NullHub's managed install path
Add a local-first Mission Control page to NullHub:

- a deterministic backend mission API under `/api/mission-control`
- a versioned embedded replay fixture for scenario data
- a `/mission-control` control-room UI
- one cinematic workflow showing agent roles, checkpointing, test failure,
human intervention, recovered replay, review, and telemetry
- schema-versioned API responses and structured errors for invalid actions
- NullWatch-style trace references that map replay events to run ids, span ids,
operations, and eval keys
- a replay artifact export for sharing the current snapshot, source fixture,
and ecosystem mapping as JSON
- a local smoke test for the full mission lifecycle
- a judge-mode demo driver and a local macOS video recorder
- screenshots and a written demo plan for PR review

The demo is intentionally deterministic. It does not call hosted services,
require model keys, or depend on a running multi-repo stack.
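
To illustrate what "deterministic" can mean here, the following is a minimal sketch of phase progression computed purely from elapsed time. It is not the code in `src/api/mission_control.zig`: the phase ids, the timings, and the `phaseAt` helper are assumptions made for illustration, and the real timings come from the replay fixture.

```typescript
// Hypothetical phase timeline. The ids and offsets are illustrative only;
// the actual values are defined by the versioned replay fixture.
interface PhaseSpec {
  id: string;
  startsAtMs: number; // offset from mission launch, in milliseconds
}

const PHASES: readonly PhaseSpec[] = [
  { id: "research", startsAtMs: 0 },
  { id: "patching", startsAtMs: 5000 },
  { id: "checkpointing", startsAtMs: 10000 },
  { id: "test_execution", startsAtMs: 15000 },
];

// Pure function: the same (launch time, current time) pair always yields
// the same phase, so a replay never depends on external services.
// Assumes `phases` is sorted by ascending `startsAtMs`.
function phaseAt(
  phases: readonly PhaseSpec[],
  launchedAtMs: number,
  nowMs: number,
): string {
  const elapsed = nowMs - launchedAtMs;
  let current = "idle";
  if (elapsed < 0) return current;
  for (const phase of phases) {
    if (phase.startsAtMs > elapsed) break;
    current = phase.id;
  }
  return current;
}
```

Because the phase is derived from timestamps rather than stored mutable state, polling the API twice at the same instant would return the same answer under this model.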

## Why This Idea Was Chosen

This is stronger than a single CLI preflight because it connects multiple parts
of the ecosystem into a visible agent platform story: execution, orchestration,
task tracking, observability, and operations. It is still hackathon-sized because
it uses existing NullWatch APIs and NullHub UI patterns instead of changing core
agent runtime behavior.
This was chosen over a smaller CLI-only contribution because it creates a
stronger hackathon story: judges can see autonomy, orchestration,
observability, failure recovery, and human-in-the-loop control in under three
minutes.

It belongs in NullHub because NullHub is already the control plane for the
ecosystem. The page can honestly present simulated NullTickets-style tasks,
NullBoiler-style checkpoints, and NullWatch-style telemetry while leaving a
clear future path to real cross-service wiring.

## What Was Implemented

- NullWatch component registration in the NullHub registry.
- Observability reverse proxy with optional bearer token forwarding.
- Sidebar entry and `/observability` UI page.
- API client methods for NullWatch summary, runs, spans, evals, and health.
- README documentation for the proxy and local demo setup.
- Added `src/api/mission_control.zig` with structured mission state, reset,
launch, recover, deterministic phase progression, telemetry, graph nodes,
graph edges, agent roles, failure details, and recovery details.
- Added `src/api/mission_control/code_red.v1.json` as the versioned replay
fixture for phase timing, graph, events, telemetry, and failure/recovery
metadata.
- Added `src/api/mission_control_replay.zig` to parse and validate replay
fixtures before serving mission state.
- Added validated trace references in mission events so the demo can deep-link
from Mission Control to `/observability?run_id=...` without requiring
NullWatch to be running for the local replay.
- Added explicit response metadata: `schema_version`, `mode`, `scenario_id`,
`scenario_version`, and `generated_at_ms`.
- Added `GET /api/mission-control/replay` to export the current snapshot,
source fixture, and NullTickets/NullBoiler/NullClaw/NullWatch mapping
metadata as a portable JSON artifact.
- Added transition guards so early recovery and duplicate launch return
actionable `409 Conflict` responses.
- Registered the Mission Control API in the NullHub server route table and API
metadata.
- Added typed frontend client methods for mission state and actions.
- Added a sidebar entry and `/mission-control` Svelte page with adaptive
polling, retry handling, trace chips, observability deep links, and responsive
mission panels.
- Added in-screen three-minute story beats and a failed-vs-recovered comparison
panel so the demo narrative remains visible during judging and PR review.
- Added a PR-ready plan file, README documentation, and screenshots.
- Added backend tests for mission path routing, idle state, failure state,
recovery state, action handlers, invalid transitions, and route semantics.
- Added replay fixture tests for duplicate ids, graph references, telemetry
references, trace references, ordering, required fields, and required phases.
- Added `tests/test_mission_control_smoke.sh` for live API validation.
- Added `scripts/mission_control_demo.sh` for a timed judge-mode mission run.
- Added `scripts/record_mission_control_demo.sh` and
`docs/demo/mission-control-local-demo.md` so the local demo can be recorded
as a review video artifact.
- Added `docs/demo/mission-control-replay-artifact.md` to document the export
schema and ecosystem mapping.
- Added `docs/demo/mission-control-pr-package.md` with the copy-ready PR title,
PR description, reviewer path, validation matrix, and three-minute hackathon
story.
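
The transition guards described above can be sketched as a small state machine. This is an illustrative model only: the status names, error codes, and messages are assumptions, and the actual guards live in the Zig backend.

```typescript
// Hypothetical mission statuses and actions, chosen for illustration.
type MissionStatus = "idle" | "running" | "failed" | "recovering" | "completed";
type MissionAction = "launch" | "recover" | "reset";

// Structured error for an invalid transition (HTTP 409 Conflict).
interface ActionError {
  status: 409;
  code: string;
  message: string;
}

type ActionResult =
  | { ok: true; next: MissionStatus }
  | { ok: false; error: ActionError };

function conflict(code: string, message: string): ActionResult {
  return { ok: false, error: { status: 409, code, message } };
}

// Returns either the next status or an actionable conflict.
function applyAction(current: MissionStatus, action: MissionAction): ActionResult {
  switch (action) {
    case "reset":
      // Reset is always allowed and returns the mission to idle.
      return { ok: true, next: "idle" };
    case "launch":
      if (current !== "idle") {
        return conflict(
          "mission_already_launched",
          "Reset the mission before launching again.",
        );
      }
      return { ok: true, next: "running" };
    case "recover":
      if (current !== "failed") {
        return conflict(
          "recovery_not_available",
          "Recovery is only possible after the test failure.",
        );
      }
      return { ok: true, next: "recovering" };
  }
}
```

Keeping the guard a pure function of `(current, action)` makes the duplicate-launch and early-recovery cases straightforward to unit test without a running server.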

## Files Changed

- `src/installer/registry.zig`
- `src/api/observability.zig`
- `src/api/proxy.zig`
- `src/api/components.zig`
- `MISSION_CONTROL_PLAN.md`
- `src/api/mission_control.zig`
- `src/api/mission_control_replay.zig`
- `src/api/mission_control/code_red.v1.json`
- `src/api/meta.zig`
- `src/root.zig`
- `src/server.zig`
- `ui/src/lib/api/client.ts`
- `ui/src/lib/components/Sidebar.svelte`
- `ui/src/routes/observability/+page.svelte`
- `ui/src/routes/mission-control/+page.svelte`
- `tests/test_mission_control_smoke.sh`
- `scripts/mission_control_demo.sh`
- `scripts/record_mission_control_demo.sh`
- `docs/demo/.gitignore`
- `docs/demo/mission-control-local-demo.md`
- `docs/demo/mission-control-replay-artifact.md`
- `docs/demo/mission-control-pr-package.md`
- `docs/screenshots/nullhub-mission-control-live.png`
- `docs/screenshots/nullhub-mission-control-recovered.png`
- `README.md`
- `HACKATHON_SUBMISSION.md`

## How To Test Or Demo

Start NullHub:
Run the backend tests:

```bash
zig build run -- serve --no-open
zig build test -Dembed-ui=false --summary all
```

Install NullWatch from NullHub:
Build the UI:

1. Open the web UI.
2. Go to `Install Component`.
3. Select `NullWatch`.
4. Keep or set the API port to `7710`.
5. Finish the wizard. The installer starts the NullWatch instance and NullHub
discovers it automatically.
```bash
npm --prefix ui run build
```

Optional sample data can be ingested through the NullHub proxy:
Start NullHub locally:

```bash
curl -X POST http://127.0.0.1:19800/api/observability/v1/spans \
-H 'Content-Type: application/json' \
-d '{"run_id":"demo-run-1","trace_id":"trace-demo-1","span_id":"span-1","source":"nullclaw","operation":"tool.call","status":"error","started_at_ms":1710000000000,"ended_at_ms":1710000001500,"tool_name":"shell","error_message":"tool call failed: command timed out","attributes_json":"{\"exit_code\":124}"}'
zig build run -- serve --host 127.0.0.1 --port 19802 --no-open
```

curl -X POST http://127.0.0.1:19800/api/observability/v1/evals \
-H 'Content-Type: application/json' \
-d '{"run_id":"demo-run-1","eval_key":"tool_success","scorer":"deterministic","score":0.0,"verdict":"fail","dataset":"demo","notes":"The tool call timed out."}'
Run the live smoke test:

```bash
NULLHUB_URL=http://127.0.0.1:19802 ./tests/test_mission_control_smoke.sh
```

Open `/observability` in NullHub and inspect the NullWatch runs.
Run the automated local demo:

```bash
MISSION_CONTROL_OPEN_BROWSER=1 ./scripts/mission_control_demo.sh
```

Export the current replay artifact:

```bash
curl -fsS http://127.0.0.1:19802/api/mission-control/replay \
-o mission-control-replay.json
```
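
Once exported, the artifact could be sanity-checked offline. The sketch below mirrors two of the replay fixture checks mentioned earlier (duplicate ids and graph references). The `nodes`, `edges`, `from`, and `to` field names are assumptions, not the documented export schema; see `docs/demo/mission-control-replay-artifact.md` for the real one.

```typescript
// Assumed graph shape, for illustration only.
interface GraphNode {
  id: string;
}

interface GraphEdge {
  from: string;
  to: string;
}

interface ReplayGraph {
  nodes: GraphNode[];
  edges: GraphEdge[];
}

// Returns a list of human-readable problems; an empty list means the
// graph passed both checks.
function validateGraph(graph: ReplayGraph): string[] {
  const problems: string[] = [];
  const seen = new Set<string>();

  // Check 1: node ids must be unique.
  for (const node of graph.nodes) {
    if (seen.has(node.id)) problems.push(`duplicate node id: ${node.id}`);
    seen.add(node.id);
  }

  // Check 2: every edge endpoint must reference a known node.
  for (const edge of graph.edges) {
    for (const end of [edge.from, edge.to]) {
      if (!seen.has(end)) problems.push(`edge references unknown node: ${end}`);
    }
  }
  return problems;
}
```

In practice the input would come from `JSON.parse` on the contents of `mission-control-replay.json`, with the graph extracted from wherever the export schema places it.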

Record a local macOS video artifact:

```bash
./scripts/record_mission_control_demo.sh
```

The generated `.mov` is ignored by git and can be uploaded directly to the PR
discussion or hackathon submission.

Open `/mission-control`, then:

## Screenshots
1. Click `Launch Mission`.
2. Watch the workflow progress through research, patching, checkpointing, and
test execution.
3. When the test fails, click `Fork From Checkpoint`.
4. Use the trace chips or the failed/recovered run links to follow the deep links
into Flight Recorder.
5. Watch the recovered run pass and complete review.
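
The deep links in step 4 follow the `/observability?run_id=...` pattern. A minimal sketch of building such a link from a trace reference is shown below. The `TraceRef` shape and the optional `span_id` query parameter are assumptions; only `run_id` appears in this document.

```typescript
// Hypothetical trace reference attached to a mission event.
interface TraceRef {
  runId: string;
  spanId?: string;
}

// Builds a relative deep link into the observability page.
// `span_id` is an assumed parameter name, included only when a span is known.
function observabilityLink(ref: TraceRef): string {
  let link = `/observability?run_id=${encodeURIComponent(ref.runId)}`;
  if (ref.spanId !== undefined) {
    link += `&span_id=${encodeURIComponent(ref.spanId)}`;
  }
  return link;
}
```

Encoding each value with `encodeURIComponent` keeps the link valid even if a run id contains characters such as spaces or `&`.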

Flight Recorder overview:
Live mission state:

![NullHub Observability overview](docs/screenshots/nullhub-observability-overview.png)
![NullHub Mission Control live workflow](docs/screenshots/nullhub-mission-control-live.png)

Failure detail with tool-call error context:
Recovered mission:

![NullHub Observability failure detail](docs/screenshots/nullhub-observability-failure.png)
![NullHub Mission Control recovered workflow](docs/screenshots/nullhub-mission-control-recovered.png)

## Limitations And Future Improvements

- `NULLWATCH_URL` remains useful for pointing NullHub at an external NullWatch
instance, but the default demo path uses a managed NullWatch install.
- The first UI version renders a compact timeline, not a full waterfall chart.
- Run correlation with NullBoiler orchestration pages can be added as a follow-up
when both systems share stable run ids.
- The MVP uses deterministic demo state instead of real cross-service execution.
- The mission replay maps to NullTickets, NullBoiler, and NullWatch concepts,
but does not yet write into those services.
- A future version could add durable replay storage, side-by-side replay
comparison, exportable replay bundles, real NullWatch span hydration, and a
judge-mode one-click replay.