
WIP - burst-100k benchmark mvp #118

Open
kisernl wants to merge 12 commits into master from one-hundred-k-mvp

Conversation

@kisernl (Collaborator) commented May 13, 2026

This pull request implements the infrastructure and code to support the "100k burst" benchmark, enabling large-scale concurrent benchmarking of compute providers. It introduces a new database schema, coordinator logic, provider configuration, launch automation, and supporting scripts and dependencies. The changes are grouped below by theme.

Database and Schema Setup:

  • Added db/burst-100k.sql, defining the runs and sandbox_results tables and associated indexes for tracking benchmark runs and their results. The schema is idempotent and optimized for status and stuck-run queries.

Coordinator and Benchmark Logic:

  • Added src/burst-100k/coordinator.ts, the main coordinator that orchestrates the benchmark run: validates environment, manages heartbeats, coordinates results writing to Postgres and R2, handles shutdown, and computes final statistics.
  • Added src/burst-100k/providers.ts, defining the opt-in provider(s) (initially "e2b") and their configuration, including environment requirements and concurrency targets.
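Based on the description of providers.ts (a BurstProviderConfig type, an opt-in registry, required-env gates, and a getProvider(name) lookup), the shape might look like the following sketch. The specific fields and the e2b values here are illustrative assumptions, not the PR's actual code:

```typescript
// Hypothetical sketch of the opt-in provider registry; field names and
// values are placeholders, not the PR's real configuration.
export interface BurstProviderConfig {
  name: string;
  concurrencyTarget: number; // peak in-flight sandbox creations
  rampSeconds: number;       // linear ramp-up window
  timeoutMs: number;         // per-request timeout
  requiredEnv: string[];     // env vars that gate the provider opt-in
}

const PROVIDERS: Record<string, BurstProviderConfig> = {
  e2b: {
    name: "e2b",
    concurrencyTarget: 100_000,
    rampSeconds: 60,
    timeoutMs: 30_000,
    requiredEnv: ["E2B_API_KEY"], // assumed gate for illustration
  },
};

export function getProvider(name: string): BurstProviderConfig {
  const p = PROVIDERS[name];
  if (!p) throw new Error(`unknown provider: ${name}`);
  const missing = p.requiredEnv.filter((v) => !process.env[v]);
  if (missing.length > 0) {
    throw new Error(`${name} missing env vars: ${missing.join(", ")}`);
  }
  return p;
}
```

Keeping the registry opt-in means a provider only becomes eligible once its credentials are wired into the environment, which matches the "required env var gates" described in the review walkthrough.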

Automation and Tooling:

  • Added scripts/burst-100k-launch.sh, a robust shell script to provision a Namespace VM, upload the coordinator bundle, apply the schema, and launch the benchmark with proper environment and error handling.
  • Added a comprehensive implementation checklist in one-hundred-k-mvp-checklist.md to track progress and document lessons learned.

Dependency and Build Updates:

  • Updated package.json to add dependencies required for the benchmark (pg, p-limit, esbuild, @types/pg) and new scripts for bundling and running the coordinator locally.


Summary by CodeRabbit

  • New Features

    • Added a 100k concurrent sandbox burst benchmark system for performance testing across providers.
  • Documentation

    • Added planning guide, implementation checklist, and data inventory documentation for the benchmark.
  • Chores

    • Added npm build scripts and dependencies (p-limit, pg, esbuild).
    • Added database schema and launch script for benchmark orchestration.
    • Updated .gitignore to exclude build artifacts.

Review Change Stack

Noah Kiser added 2 commits May 13, 2026 09:59
Adds opt-in 100k-sandbox burst benchmark module alongside the daily
~100-burst path. Includes the design plan + implementation checklist,
idempotent Postgres schema, and a coordinator (types, e2b provider,
pg/R2 sinks, p-limit ramp runner) bundled via esbuild. Local smoke
validated against e2b/Neon/R2; launch script + workflow are next.
scripts/burst-100k-launch.sh provisions a Namespace VM, applies the
schema, uploads the bundled coordinator, records the run, and starts
the coordinator detached. Uses --bare/--cidfile for nsc, installs
nodejs via apk, passes env via an uploaded chmod-600 startup script
(printf %q-quoted) that self-destructs after detaching node, and
pgrep-verifies the hand-off succeeded.

Validated end-to-end at N=10 against e2b/Neon/R2.
github-actions Bot (Contributor) commented May 13, 2026

Browser Benchmark Results

| # | Provider | Score | Create | Connect | Navigate | Release | Total | Status |
|---|---|---|---|---|---|---|---|---|
| 1 | Hyperbrowser | 95.4 | 0.18s | 0.15s | 0.10s | 0.09s | 0.55s | 10/10 |
| 2 | Kernel | 94.5 | 0.03s | 0.36s | 0.13s | 0.06s | 0.57s | 10/10 |
| 3 | Browserbase | 93.6 | 0.22s | 0.11s | 0.16s | 0.14s | 0.64s | 10/10 |
| 4 | Steel | 87.0 | 0.21s | 0.79s | 0.16s | 0.13s | 1.41s | 10/10 |

View full run · SVG available as build artifact

github-actions Bot (Contributor) commented May 13, 2026

Sandbox Benchmark Results

Sequential

| # | Provider | Score | Median TTI | P95 | P99 | Status |
|---|---|---|---|---|---|---|
| 1 | declaw | 98.8 | 0.04s | 0.25s | 0.25s | 10/10 |
| 2 | upstash | 96.5 | 0.33s | 0.39s | 0.39s | 10/10 |
| 3 | daytona | 96.2 | 0.27s | 0.54s | 0.54s | 10/10 |
| 4 | archil | 94.9 | 0.45s | 0.59s | 0.59s | 10/10 |
| 5 | vercel | 92.2 | 0.69s | 0.92s | 0.92s | 10/10 |
| 6 | blaxel | 91.8 | 0.50s | 1.29s | 1.29s | 10/10 |
| 7 | runloop | 81.9 | 1.59s | 2.13s | 2.13s | 10/10 |
| 8 | hopx | 79.1 | 1.63s | 2.78s | 2.78s | 10/10 |
| 9 | modal | 77.9 | 1.85s | 2.76s | 2.76s | 10/10 |
| 10 | namespace | 75.9 | 1.76s | 3.38s | 3.38s | 10/10 |
| 11 | e2b | 73.7 | 0.61s | 5.65s | 5.65s | 10/10 |
| 12 | codesandbox | 72.2 | 2.57s | 3.11s | 3.11s | 10/10 |
| 13 | cloudflare | 71.6 | 2.57s | 3.24s | 3.24s | 10/10 |
| 14 | tensorlake | 0.0 | 0.00s | 0.00s | 0.00s | 0/10 |

Staggered

| # | Provider | Score | Median TTI | P95 | P99 | Status |
|---|---|---|---|---|---|---|
| 1 | declaw | 99.6 | 0.04s | 0.05s | 0.05s | 10/10 |
| 2 | upstash | 96.4 | 0.34s | 0.38s | 0.38s | 10/10 |
| 3 | archil | 96.0 | 0.33s | 0.51s | 0.51s | 10/10 |
| 4 | daytona | 95.5 | 0.28s | 0.70s | 0.70s | 10/10 |
| 5 | e2b | 95.0 | 0.43s | 0.61s | 0.61s | 10/10 |
| 6 | blaxel | 94.8 | 0.51s | 0.55s | 0.55s | 10/10 |
| 7 | vercel | 92.5 | 0.59s | 0.97s | 0.97s | 10/10 |
| 8 | hopx | 84.0 | 1.39s | 1.91s | 1.91s | 10/10 |
| 9 | namespace | 80.8 | 1.77s | 2.15s | 2.15s | 10/10 |
| 10 | modal | 80.2 | 1.86s | 2.16s | 2.16s | 10/10 |
| 11 | runloop | 79.4 | 1.93s | 2.24s | 2.24s | 10/10 |
| 12 | codesandbox | 70.5 | 2.59s | 3.49s | 3.49s | 10/10 |
| 13 | cloudflare | 68.9 | 2.80s | 3.57s | 3.57s | 10/10 |
| 14 | tensorlake | 0.0 | 0.00s | 0.00s | 0.00s | 0/10 |

Burst

| # | Provider | Score | Median TTI | P95 | P99 | Status |
|---|---|---|---|---|---|---|
| 1 | declaw | 99.5 | 0.05s | 0.05s | 0.05s | 10/10 |
| 2 | daytona | 97.2 | 0.25s | 0.33s | 0.33s | 10/10 |
| 3 | archil | 95.7 | 0.37s | 0.52s | 0.52s | 10/10 |
| 4 | upstash | 95.0 | 0.42s | 0.63s | 0.63s | 10/10 |
| 5 | blaxel | 94.2 | 0.54s | 0.64s | 0.64s | 10/10 |
| 6 | e2b | 93.8 | 0.50s | 0.82s | 0.82s | 10/10 |
| 7 | vercel | 90.8 | 0.61s | 1.37s | 1.37s | 10/10 |
| 8 | runloop | 79.9 | 1.92s | 2.16s | 2.16s | 10/10 |
| 9 | modal | 79.4 | 1.85s | 2.38s | 2.38s | 10/10 |
| 10 | hopx | 79.3 | 2.01s | 2.16s | 2.16s | 10/10 |
| 11 | namespace | 77.7 | 1.94s | 2.66s | 2.66s | 10/10 |
| 12 | codesandbox | 68.4 | 3.01s | 3.38s | 3.38s | 10/10 |
| 13 | cloudflare | 59.3 | 2.31s | 5.05s | 5.05s | 9/10 |
| 14 | tensorlake | 0.0 | 0.00s | 0.00s | 0.00s | 0/10 |

View full run · SVGs available as build artifacts

github-actions Bot (Contributor) commented May 13, 2026

Storage Benchmark Results

1MB Files

| # | Provider | Score | Download | Throughput | Upload | Status |
|---|---|---|---|---|---|---|
| 1 | AWS S3 | 94.8 | 0.13s | 63.9 Mbps | 0.05s | 1000/1000 |
| 2 | Cloudflare R2 | 94.8 | 0.11s | 79.6 Mbps | 0.18s | 1000/1000 |
| 3 | Tigris | 93.7 | 0.52s | 16.1 Mbps | 0.12s | 1000/1000 |

4MB Files

| # | Provider | Score | Download | Throughput | Upload | Status |
|---|---|---|---|---|---|---|
| 1 | Cloudflare R2 | 95.0 | 0.18s | 187.6 Mbps | 0.30s | 1000/1000 |
| 2 | AWS S3 | 92.5 | 0.91s | 36.7 Mbps | 0.23s | 1000/1000 |
| 3 | Tigris | 90.3 | 1.91s | 17.5 Mbps | 0.36s | 1000/1000 |

10MB Files

| # | Provider | Score | Download | Throughput | Upload | Status |
|---|---|---|---|---|---|---|
| 1 | Cloudflare R2 | 94.6 | 0.30s | 278.1 Mbps | 0.71s | 1000/1000 |
| 2 | AWS S3 | 88.8 | 2.50s | 33.5 Mbps | 0.51s | 1000/1000 |
| 3 | Tigris | 84.4 | 3.94s | 21.3 Mbps | 0.52s | 992/1000 |

16MB Files

| # | Provider | Score | Download | Throughput | Upload | Status |
|---|---|---|---|---|---|---|
| 1 | Cloudflare R2 | 94.3 | 0.44s | 304.3 Mbps | 0.74s | 1000/1000 |
| 2 | AWS S3 | 86.8 | 3.19s | 42.0 Mbps | 0.52s | 1000/1000 |
| 3 | Tigris | 79.5 | 5.76s | 23.3 Mbps | 0.55s | 989/1000 |

View full run · SVGs available as build artifacts

Noah Kiser added 6 commits May 13, 2026 14:12
Same S3-compatible API, different provider. Renames sinks/r2.ts →
sinks/tigris.ts (R2Sink → TigrisSink), env vars R2_* → TIGRIS_STORAGE_*,
and the runs.r2_prefix column → tigris_prefix. Also fixes launch.sh's
pgrep false-negative (now retries up to 10s and matches against
coordinator.cjs) and updates the plan doc to reflect Tigris and the
current --bare/--cidfile nsc flags.

Validated end-to-end at N=10: 10/10 sandboxes ok, all three Tigris
objects (raw.jsonl, heartbeat.json, meta.json) written.
Coordinator now reads $COORDINATOR_LOG_PATH (set to /root/run.log by
launch.sh) and pushes its own stdout/stderr to Tigris on every
heartbeat and at shutdown. Closes the "logs die with the VM" gap.
Local runs skip silently when the env var is unset.
coderabbitai Bot commented May 13, 2026

📝 Walkthrough

Walkthrough

This pull request implements a complete "100k burst" benchmark system that orchestrates 100,000 concurrent sandbox creation requests, persists results to Postgres and Tigris, and decouples long-running work from GitHub Actions via SSH to a Namespace VM.

Changes

100k Burst Benchmark Implementation

  • Planning documentation and database schema — one-hundred-k-mvp-plan.md, one-hundred-k-mvp-checklist.md, one-hundred-k-mvp-data-inventory.md, db/burst-100k.sql, .gitignore: Documents the burst-100k design, goals, architecture, and operational model. Defines the Postgres runs table (with heartbeat/stuck-run indexes) and sandbox_results table (composite PK, status enum, HTTP/error fields, foreign key to runs) for durable result capture. Updates repo configuration to ignore the build output directory.
  • Type definitions and provider registry — src/burst-100k/types.ts, src/burst-100k/providers.ts: Defines BurstProviderConfig (concurrency/ramp/timeout), SandboxResult (timing/latency/status/error), ProgressStats, and FinalStats types. Exports an opt-in provider registry for e2b, modal, and runloop with getProvider(name) lookup and required env var gates.
  • Concurrency-ramped task runner — src/burst-100k/runner.ts: Implements runBurst, which linearly ramps sandbox-creation requests over a concurrency-limited pool, measures per-request latency, classifies failures (timeout/http_error/network_error), and triggers non-blocking cleanup via sandbox.destroy().
  • Postgres and Tigris persistence — src/burst-100k/sinks/postgres.ts, src/burst-100k/sinks/tigris.ts: Implements a batched Postgres sink for runs/sandbox_results with heartbeat and lifecycle updates, and an S3-compatible Tigris sink for streaming JSONL results, periodic heartbeat snapshots, final metadata, and coordinator logs via multipart upload.
  • Coordinator main loop and orchestration — src/burst-100k/coordinator.ts: Wires provider/Postgres/Tigris from environment, validates required env vars, runs the burst orchestrator with streaming writes and periodic heartbeats, computes final latency distributions (p50/p99/histogram), and manages SIGTERM/SIGINT shutdown with log uploads and failure recording.
  • Build configuration and launch orchestration — package.json, scripts/burst-100k-launch.sh: Adds npm scripts for bundling the coordinator to CJS via esbuild. The launch script validates env, bundles the coordinator, applies the Postgres schema idempotently, provisions a Namespace VM, uploads the bundle, generates a startup script with quoted env forwarding, records the run in Postgres, verifies process startup, and surfaces operational next steps.
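The runner layer summarized above (linear ramp, concurrency-limited pool, per-request latency, failure classification, non-fatal result callbacks) can be sketched in a self-contained way. The real runner.ts uses p-limit and a provider SDK; the inline limiter and the createSandbox/onResult stand-ins below are assumptions made so the sketch runs on its own:

```typescript
// Minimal sketch of a concurrency-ramped burst runner. Not the PR's code:
// the limiter replaces p-limit and createSandbox replaces the provider SDK.
type Status = "ok" | "timeout" | "http_error" | "network_error";

interface SandboxResult {
  index: number;
  latencyMs: number;
  status: Status;
  error?: string;
}

const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

// Tiny p-limit-style gate: at most `max` tasks run at once.
function limiter(max: number) {
  let active = 0;
  const queue: (() => void)[] = [];
  const next = () => {
    if (active < max && queue.length > 0) {
      active++;
      queue.shift()!();
    }
  };
  return <T>(fn: () => Promise<T>): Promise<T> =>
    new Promise<T>((resolve, reject) => {
      queue.push(() =>
        fn().then(resolve, reject).finally(() => {
          active--;
          next();
        }),
      );
      next();
    });
}

async function runBurst(
  total: number,
  concurrency: number,
  rampMs: number,
  createSandbox: (i: number) => Promise<void>,
  onResult: (r: SandboxResult) => Promise<void>,
): Promise<SandboxResult[]> {
  const limit = limiter(concurrency);
  const results: SandboxResult[] = [];
  const tasks = Array.from({ length: total }, async (_, i) => {
    await sleep((i / total) * rampMs); // linear ramp of submissions
    await limit(async () => {
      const t0 = Date.now();
      let result: SandboxResult;
      try {
        await createSandbox(i);
        result = { index: i, latencyMs: Date.now() - t0, status: "ok" };
      } catch (e: any) {
        result = {
          index: i,
          latencyMs: Date.now() - t0,
          status: "network_error", // real runner also classifies timeout/http_error
          error: e?.message,
        };
      }
      results.push(result);
      try {
        await onResult(result); // streaming write
      } catch (e: any) {
        // A single sink failure must not abort the burst; log and continue.
        console.error("[runner] onResult failed:", e?.message ?? e);
      }
    });
  });
  await Promise.all(tasks);
  return results;
}
```

The key design point this illustrates is that result-writing failures are caught per task, so one bad write never tears down a 100k-request run.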

Sequence Diagram(s)

sequenceDiagram
  participant Main as Coordinator Main
  participant Postgres as PostgresSink
  participant Tigris as TigrisSink
  participant Runner as runBurst
  participant Compute as Provider Compute
  
  Main->>Postgres: bootstrap(provider, commit_sha, instance_id, tigris_prefix)
  Main->>Tigris: initialize with config
  Main->>+Runner: runBurst with callbacks
  
  loop Ramp & Concurrency
    Runner->>Compute: createSandbox (ramped over rampSeconds)
    Compute-->>Runner: SandboxResult (latency, status, timing)
    Runner->>Tigris: writeResult (streaming JSONL)
    Runner->>Postgres: write (buffered)
    Runner->>Main: onProgress callback
  end
  
  Runner-->>-Main: completion
  Main->>Postgres: flush pending results
  Main->>Postgres: complete(final stats with p50/p99)
  Main->>Tigris: writeMeta (full distribution, final stats)
  Main->>Tigris: writeLog (coordinator log)
  Main->>Tigris: close (await multipart upload)
  Main->>Postgres: close connection
  
  Note over Main: SIGTERM/SIGINT gracefully flushes and exits
sequenceDiagram
  participant Script as burst-100k-launch.sh
  participant Build as esbuild
  participant DB as Postgres
  participant NSC as nsc CLI
  participant VM as Namespace VM
  participant Coordinator as coordinator process
  
  Script->>Script: validate required env vars
  Script->>Build: npm run bundle:burst-100k
  Build-->>Script: dist/burst-100k.cjs
  Script->>DB: psql (apply burst-100k.sql)
  Script->>NSC: nsc create (provision VM)
  NSC-->>Script: instance_id
  Script->>DB: INSERT runs row (ON CONFLICT DO NOTHING)
  Script->>VM: upload dist/burst-100k.cjs
  Script->>VM: generate /root/start.sh (with quoted env)
  Script->>VM: chmod +x /root/start.sh
  Script->>VM: execute /root/start.sh detached
  Script->>Script: wait & retry pgrep coordinator.cjs
  Coordinator->>VM: run (heartbeat, write results)
  Script-->>Script: print OK + operational commands
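The final-stats step at the end of the first diagram (p50/p99 over the collected latencies) reduces to a small percentile helper. This is a sketch of the idea, not the coordinator's actual implementation; the FinalStats shape is assumed from the walkthrough:

```typescript
// Sketch of the coordinator's final-stats computation over per-sandbox
// latencies. Shapes are assumptions based on the walkthrough, not PR code.
interface FinalStats {
  total: number;
  ok: number;
  p50Ms: number;
  p99Ms: number;
}

// Nearest-rank percentile over an ascending-sorted array.
function percentile(sorted: number[], p: number): number {
  if (sorted.length === 0) return 0;
  const idx = Math.min(sorted.length - 1, Math.ceil((p / 100) * sorted.length) - 1);
  return sorted[Math.max(0, idx)];
}

function finalStats(latencies: number[], okCount: number): FinalStats {
  const sorted = [...latencies].sort((a, b) => a - b);
  return {
    total: latencies.length,
    ok: okCount,
    p50Ms: percentile(sorted, 50),
    p99Ms: percentile(sorted, 99),
  };
}
```

At N=100,000 the sort is still cheap (well under a second), so computing the distribution once at shutdown is simpler than maintaining a streaming quantile estimator.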

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 A burst of a hundred thousand dreams,
Racing through concurrency streams,
Postgres and Tigris hold the tale,
Heartbeats and latencies never fail—
From humble laptop to Namespace VM,
The benchmark hops with grace and vim! 🚀

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 10.00%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.
  • Title check — ❓ Inconclusive: the title 'WIP - burst-100k benchmark mvp' is vague and uses non-descriptive terms ('WIP', 'mvp') without conveying the primary technical change or scope of the changeset. Resolution: replace with a more specific title, e.g. 'Add burst-100k benchmark infrastructure with coordinator, schema, and launch automation'.
✅ Passed checks (3 passed)
  • Description Check — ✅ Passed (skipped: CodeRabbit's high-level summary is enabled).
  • Linked Issues check — ✅ Passed (skipped: no linked issues were found for this pull request).
  • Out of Scope Changes check — ✅ Passed (skipped: no linked issues were found for this pull request).

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.




coderabbitai Bot left a comment

Actionable comments posted: 7

🧹 Nitpick comments (1)
src/burst-100k/runner.ts (1)

54-77: ⚡ Quick win

Consider logging errors from onResult callback.

Line 68 swallows errors from callbacks.onResult(result) to prevent a single write failure from aborting the entire burst. However, silent failures make debugging difficult if results are not being written.

Consider logging caught errors:

try {
  await callbacks.onResult(result);
} catch (e: any) {
  console.error('[runner] onResult failed:', e?.message ?? e);
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/burst-100k/runner.ts` around lines 54 - 77, In the finally block where
the code awaits callbacks.onResult(result) inside runner.ts, change the empty
catch that swallows errors to log the caught error details (message and stack if
present) so write failures are visible; specifically wrap the await
callbacks.onResult(result) so its catch logs something like a clear prefix (e.g.
"[runner] onResult failed:") plus e?.message and e?.stack (or the error object)
and then continue to swallow to avoid aborting the burst.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@db/burst-100k.sql`:
- Around line 15-17: Stuck-run detection currently only checks last_heartbeat
age and therefore misses rows with status='running' where last_heartbeat IS
NULL; update the detection logic to treat NULL heartbeats as stale by including
rows WHERE status = 'running' AND (last_heartbeat IS NULL OR last_heartbeat <
now() - interval '...') so that runs that never emitted a heartbeat are flagged;
apply the same change to the other occurrence that references last_heartbeat and
status (the second block mentioned around lines 29-32).

In `@one-hundred-k-mvp-checklist.md`:
- Around line 12-14: Update the checklist and any related docs to remove or
replace legacy R2/static-token steps with the new Tigris/OIDC flow: search for
and update references to "R2", "sinks/r2.ts", "NSC_TOKEN", and any checklist
items that suggest running aws s3 round-trips or static token setup, and replace
them with instructions that validate Tigris connectivity and OIDC-based auth
(e.g., OIDC trust configuration and runtime token exchange), making the guidance
consistent across lines noted (around items previously at 12–14, 35, 45, 50–52,
81, 104).

In `@one-hundred-k-mvp-plan.md`:
- Line 60: Several fenced code blocks are unlabeled and trigger markdownlint
MD040; update each unlabeled triple-backtick fence that surrounds the snippets
with an appropriate language identifier (e.g., use ```text) so the blocks become
labeled. Specifically, change the fences around the blocks containing the
strings "src/burst-100k/", "GitHub Action (workflow_dispatch / schedule)",
"s3://<bucket>/<run_id>/", and "GitHub Secrets" to use a language tag (e.g.,
text), and apply the same fix to the other unlabeled fences that contain the
same or similar snippets referenced in the comment (the other occurrences noted
in the review). Ensure every opening ``` has a language identifier and the
corresponding closing ``` remains.
- Around line 135-136: Update the example options array so it accurately
reflects current opt-in workflow providers: replace or augment the existing
entry "options: [e2b, modal, daytona, codesandbox]" to include "runloop" (i.e.,
ensure the list contains runloop alongside e2b, modal, daytona, codesandbox) and
keep the accompanying comment about only providers opted in; make the change
where the options line appears in the document.

In `@scripts/burst-100k-launch.sh`:
- Around line 71-76: The INSERT into the runs table uses unescaped shell
variable interpolation (RUN_ID, PROVIDER, GITHUB_SHA, INSTANCE_ID,
TIGRIS_STORAGE_BUCKET) which can cause SQL injection or syntax errors; change
the psql invocation to use parameterized psql variables (e.g. \set or --set) or
psql's parameter binding (psql ... -c "INSERT ... VALUES (:v1,:v2,...)") and
pass the shell values via --set / -v to safely substitute them, or alternatively
shell-escape each variable before embedding; update the block that builds the
SQL string where RUN_ID, PROVIDER, GITHUB_SHA, INSTANCE_ID and
TIGRIS_STORAGE_BUCKET are referenced so values are passed as psql parameters
instead of raw interpolation.

In `@src/burst-100k/coordinator.ts`:
- Around line 30-35: The code sets provider.concurrencyTarget =
parseInt(override, 10) without validating the result; update the override
handling so after calling parseInt(override, 10) you check for NaN (e.g.,
Number.isNaN or !Number.isFinite) and handle invalid input by either falling
back to the original provider.concurrencyTarget or exiting with a clear error
log; update the console.log to reflect the validated integer (or the fallback)
and reference the exact symbols process.env.CONCURRENCY_TARGET,
parseInt(override, 10) and provider.concurrencyTarget when making the change.

In `@src/burst-100k/sinks/postgres.ts`:
- Around line 38-47: The timed flush handler in write (references: write,
flushTimer, flush, buffer, BATCH_SIZE, BATCH_TIMEOUT_MS) swallows errors via
.catch and only logs them, risking silent data loss; modify the handler to track
consecutive failures (e.g., a failureCounter on the PostgresSink instance),
increment it on each flush rejection and emit a warning or call a provided error
callback / trigger run-failure once the counter exceeds a threshold N, and reset
the counter on a successful flush; alternatively allow the error to propagate by
rethrowing from the timer callback or invoking the sink's error handler so the
coordinator can react.

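The postgres.ts suggestion above — track consecutive flush failures and escalate past a threshold instead of only logging — could be sketched like this. The Client interface stands in for pg's Pool, and all names are illustrative rather than the PR's actual sink API:

```typescript
// Sketch of the reviewer's failure-counter suggestion for a batched sink.
// `Client` is a stand-in for pg's Pool; this is not the PR's postgres.ts.
interface Client {
  insertBatch(rows: unknown[]): Promise<void>;
}

class BatchedSink {
  private buffer: unknown[] = [];
  private consecutiveFailures = 0;

  constructor(
    private client: Client,
    private batchSize = 500,
    private maxFailures = 3,
    // Escalation hook so the coordinator can fail the run, not just log.
    private onFatal: (e: unknown) => void = (e) => { throw e; },
  ) {}

  async write(row: unknown): Promise<void> {
    this.buffer.push(row);
    if (this.buffer.length >= this.batchSize) await this.flush();
  }

  async flush(): Promise<void> {
    if (this.buffer.length === 0) return;
    const batch = this.buffer.splice(0);
    try {
      await this.client.insertBatch(batch);
      this.consecutiveFailures = 0; // healthy flush resets the counter
    } catch (e) {
      this.consecutiveFailures++;
      this.buffer.unshift(...batch); // keep rows for the next flush attempt
      console.error("[sink] flush failed:", e);
      if (this.consecutiveFailures >= this.maxFailures) this.onFatal(e);
    }
  }
}
```

The counter reset on success is the important detail: transient blips stay as warnings, while a sustained Postgres outage surfaces to the coordinator before an hour of results is silently dropped.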
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 5678f203-d2c0-455d-b181-aab0a46e1b9e

📥 Commits

Reviewing files that changed from the base of the PR and between 8205b11 and 63c19b3.

⛔ Files ignored due to path filters (1)
  • package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (13)
  • .gitignore
  • db/burst-100k.sql
  • one-hundred-k-mvp-checklist.md
  • one-hundred-k-mvp-data-inventory.md
  • one-hundred-k-mvp-plan.md
  • package.json
  • scripts/burst-100k-launch.sh
  • src/burst-100k/coordinator.ts
  • src/burst-100k/providers.ts
  • src/burst-100k/runner.ts
  • src/burst-100k/sinks/postgres.ts
  • src/burst-100k/sinks/tigris.ts
  • src/burst-100k/types.ts

Comment thread db/burst-100k.sql
Comment on lines +15 to +17
last_heartbeat TIMESTAMPTZ,
status TEXT NOT NULL -- running | done | failed
CHECK (status IN ('running', 'done', 'failed')),

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Stuck-run detection misses runs that never emitted a heartbeat.

Using only last_heartbeat excludes status='running' rows where heartbeat is NULL (e.g., coordinator dies before first heartbeat), so they won’t be flagged as stuck.

Suggested fix
--- a/db/burst-100k.sql
+++ b/db/burst-100k.sql
@@
--- Partial index for the stuck-run query:
---   SELECT * FROM runs WHERE status='running' AND last_heartbeat < now() - interval '5 minutes';
+-- Partial index for the stuck-run query:
+--   SELECT * FROM runs
+--   WHERE status='running'
+--     AND COALESCE(last_heartbeat, started_at) < now() - interval '5 minutes';
 CREATE INDEX IF NOT EXISTS runs_stuck
-  ON runs (last_heartbeat) WHERE status = 'running';
+  ON runs ((COALESCE(last_heartbeat, started_at))) WHERE status = 'running';

Also applies to: 29-32


Comment thread one-hundred-k-mvp-checklist.md
Comment on lines +12 to +14
- [x] R2 bucket created; access key has write + multipart permission
- [x] R2 reachable from a Namespace VM (one-off `aws s3 cp` round-trip)
- [x] Namespace auth via static token (`NSC_TOKEN` env secret in `burst-100k` environment); OIDC trust deferred

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Checklist guidance is inconsistent with the Tigris/OIDC flow in this PR.

Several items still point to R2/static-token-era steps (R2, sinks/r2.ts, NSC_TOKEN). This can drive incorrect environment setup and validation for new runs.

Suggested fix (representative updates)
-- [x] R2 bucket created; access key has write + multipart permission
-- [x] R2 reachable from a Namespace VM (one-off `aws s3 cp` round-trip)
-- [x] Namespace auth via static token (`NSC_TOKEN` env secret in `burst-100k` environment); OIDC trust deferred
+- [x] Tigris bucket created; access key has write + multipart permission
+- [x] Tigris reachable from a Namespace VM (one-off `aws s3 cp` round-trip)
+- [x] Namespace auth via OIDC (`namespacelabs/nscloud-setup@v0`) validated

-- [x] `sinks/r2.ts` — `@aws-sdk/lib-storage` multipart upload for `raw.jsonl`; `putObject` for `heartbeat.json` and `meta.json`
+- [x] `sinks/tigris.ts` — `@aws-sdk/lib-storage` multipart upload for `raw.jsonl`; `putObject` for `heartbeat.json` and `meta.json`

-- [x] `raw.jsonl` present in R2 at `s3://<bucket>/<run_id>/` (100 lines, first/last span the ~60s ramp)
-- [x] `heartbeat.json` present in R2 and was updated
+- [x] `raw.jsonl` present in Tigris at `s3://<bucket>/<run_id>/` (100 lines, first/last span the ~60s ramp)
+- [x] `heartbeat.json` present in Tigris and was updated

-- [ ] R2 multipart parts appear under the run prefix
+- [ ] Tigris multipart parts appear under the run prefix

-- [ ] `raw.jsonl` in R2 contains ~100k lines
+- [ ] `raw.jsonl` in Tigris contains ~100k lines

Also applies to: 35-35, 45-45, 50-52, 81-81, 104-104


Comment thread one-hundred-k-mvp-plan.md
output, which is the wrong model for a coordinator that runs detached on a
remote VM and streams to Tigris/Postgres.

```

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add language identifiers to fenced code blocks.

These unlabeled fences trigger markdownlint MD040; labeling them keeps docs lint-clean.

Suggested fix
-```
+```text
 src/burst-100k/
@@
-```
+```text
 GitHub Action (workflow_dispatch / schedule)
@@
-```
+```text
 s3://<bucket>/<run_id>/
@@
-```
+```text
 GitHub Secrets

Also applies to: 93-93, 390-390, 411-411

🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 60-60: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


Comment thread one-hundred-k-mvp-plan.md
Comment on lines +135 to +136
options: [e2b, modal, daytona, codesandbox] # only providers opted in
# schedule trigger deliberately omitted for v1 — see open questions

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Workflow provider options example is out of sync with current opt-in providers.

The example says “only providers opted in” but lists daytona/codesandbox and omits runloop, which can mislead readers during setup.

Suggested fix
-        options: [e2b, modal, daytona, codesandbox]  # only providers opted in
+        options: [e2b, modal, runloop]  # only providers opted in

Comment on lines +71 to +76
psql "$PG_URL" -v ON_ERROR_STOP=1 -q -c "
INSERT INTO runs (id, provider, commit_sha, instance_id, started_at, status, tigris_prefix)
VALUES ('$RUN_ID', '$PROVIDER', '$GITHUB_SHA', '$INSTANCE_ID', now(), 'running',
's3://${TIGRIS_STORAGE_BUCKET}/${RUN_ID}/')
ON CONFLICT (id) DO NOTHING;
"

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

SQL injection risk in inline string interpolation.

The SQL statement directly interpolates $RUN_ID, $PROVIDER, $GITHUB_SHA, and $INSTANCE_ID without escaping. If any of these variables contain single quotes or other SQL special characters, the query will break or could be exploited.

Use psql parameterized queries or escape the values properly.

🛡️ Proposed fix using psql variables
-psql "$PG_URL" -v ON_ERROR_STOP=1 -q -c "
-  INSERT INTO runs (id, provider, commit_sha, instance_id, started_at, status, tigris_prefix)
-  VALUES ('$RUN_ID', '$PROVIDER', '$GITHUB_SHA', '$INSTANCE_ID', now(), 'running',
-          's3://${TIGRIS_STORAGE_BUCKET}/${RUN_ID}/')
-  ON CONFLICT (id) DO NOTHING;
-"
+psql "$PG_URL" -v ON_ERROR_STOP=1 -q \
+  -v run_id="$RUN_ID" \
+  -v provider="$PROVIDER" \
+  -v commit_sha="$GITHUB_SHA" \
+  -v instance_id="$INSTANCE_ID" \
+  -v tigris_prefix="s3://${TIGRIS_STORAGE_BUCKET}/${RUN_ID}/" \
+  -c "
+  INSERT INTO runs (id, provider, commit_sha, instance_id, started_at, status, tigris_prefix)
+  VALUES (:'run_id', :'provider', :'commit_sha', :'instance_id', now(), 'running', :'tigris_prefix')
+  ON CONFLICT (id) DO NOTHING;
+"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/burst-100k-launch.sh` around lines 71 - 76, the INSERT into the runs
table interpolates shell variables (RUN_ID, PROVIDER, GITHUB_SHA, INSTANCE_ID,
TIGRIS_STORAGE_BUCKET) directly into the SQL string, which risks SQL injection or
syntax errors if any value contains a quote. Pass the values as psql variables
instead: set each one on the command line with `-v name="$VALUE"` and reference it
in the SQL as `:'name'` so psql quotes it as a string literal, or shell-escape
each variable before embedding it.

Comment on lines +30 to +35
// Allow env override of concurrencyTarget for local smoke tests.
const override = process.env.CONCURRENCY_TARGET;
if (override) {
provider.concurrencyTarget = parseInt(override, 10);
console.log(`[coordinator] override CONCURRENCY_TARGET=${provider.concurrencyTarget}`);
}

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Validate CONCURRENCY_TARGET parsing to prevent NaN.

Line 33 uses parseInt(override, 10) without validation. If CONCURRENCY_TARGET is set to a non-numeric value, provider.concurrencyTarget will be NaN, causing the runner to fail with cryptic errors.

🛡️ Proposed fix
  const override = process.env.CONCURRENCY_TARGET;
  if (override) {
-   provider.concurrencyTarget = parseInt(override, 10);
+   const parsed = parseInt(override, 10);
+   if (isNaN(parsed) || parsed <= 0) {
+     console.error(`[coordinator] invalid CONCURRENCY_TARGET: ${override}`);
+     process.exit(1);
+   }
+   provider.concurrencyTarget = parsed;
    console.log(`[coordinator] override CONCURRENCY_TARGET=${provider.concurrencyTarget}`);
  }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/burst-100k/coordinator.ts` around lines 30 - 35, The code sets
provider.concurrencyTarget = parseInt(override, 10) without validating the
result; update the override handling so after calling parseInt(override, 10) you
check for NaN (e.g., Number.isNaN or !Number.isFinite) and handle invalid input
by either falling back to the original provider.concurrencyTarget or exiting
with a clear error log; update the console.log to reflect the validated integer
(or the fallback) and reference the exact symbols
process.env.CONCURRENCY_TARGET, parseInt(override, 10) and
provider.concurrencyTarget when making the change.

Comment on lines +38 to +47
async write(result: SandboxResult): Promise<void> {
this.buffer.push(result);
if (this.buffer.length >= BATCH_SIZE) {
await this.flush();
} else if (!this.flushTimer) {
this.flushTimer = setTimeout(() => {
this.flush().catch(err => console.error('[pg] timed flush failed:', err.message));
}, BATCH_TIMEOUT_MS);
}
}

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Consider propagating flush errors to prevent silent data loss.

The timed flush (line 44) catches and logs errors without propagating them. If flushes consistently fail due to network or database issues, results will accumulate in the buffer and be lost without alerting the coordinator.

Consider tracking flush failures and failing the run or at least emitting a warning after N consecutive failures.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/burst-100k/sinks/postgres.ts` around lines 38 - 47, The timed flush
handler in write (references: write, flushTimer, flush, buffer, BATCH_SIZE,
BATCH_TIMEOUT_MS) swallows errors via .catch and only logs them, risking silent
data loss; modify the handler to track consecutive failures (e.g., a
failureCounter on the PostgresSink instance), increment it on each flush
rejection and emit a warning or call a provided error callback / trigger
run-failure once the counter exceeds a threshold N, and reset the counter on a
successful flush; alternatively allow the error to propagate by rethrowing from
the timer callback or invoking the sink's error handler so the coordinator can
react.
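One way to realize this suggestion is a small failure counter wrapped around the sink's flush path. This is only a sketch: the names `FlushGuard`, `onFatal`, and `MAX_CONSECUTIVE_FAILURES` are illustrative and not taken from the PR.

```typescript
// Sketch of the consecutive-failure escalation idea. All names here are
// hypothetical; wire record() into the sink's timed-flush .catch handler.
const MAX_CONSECUTIVE_FAILURES = 3;

class FlushGuard {
  private consecutiveFailures = 0;

  constructor(private onFatal: (reason: string) => void) {}

  // Call after each flush attempt: resets on success, escalates after N
  // consecutive failures so the coordinator can fail the run loudly
  // instead of silently dropping buffered results.
  record(success: boolean, err?: Error): void {
    if (success) {
      this.consecutiveFailures = 0;
      return;
    }
    this.consecutiveFailures += 1;
    console.error(
      `[pg] flush failed (${this.consecutiveFailures} in a row): ${err?.message}`,
    );
    if (this.consecutiveFailures >= MAX_CONSECUTIVE_FAILURES) {
      this.onFatal(
        `postgres sink: ${this.consecutiveFailures} consecutive flush failures`,
      );
    }
  }
}
```

Resetting the counter on success keeps transient blips from killing a long run, while a genuine outage still surfaces after a bounded number of retries.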

Noah Kiser added 4 commits May 14, 2026 14:06
Coordinator tallies per-status counts during the burst and writes them
to new columns on runs (timeouts, http_errors, network_errors) plus
an error_histogram object in Tigris meta.json. Schema migration is
idempotent (ALTER TABLE ADD COLUMN IF NOT EXISTS), so re-running the
launch script catches up existing DBs.
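The per-status tally this commit describes can be sketched as a simple counter map. The status names below are assumptions inferred from the new column names (timeouts, http_errors, network_errors), not the PR's actual enum.

```typescript
// Sketch: count sandbox outcomes into a histogram whose keys mirror the
// new runs columns. Status strings are assumed, not taken from the PR.
type Status = 'ok' | 'timeout' | 'http_error' | 'network_error';

function tallyStatuses(statuses: Status[]): Record<Status, number> {
  const histogram: Record<Status, number> = {
    ok: 0,
    timeout: 0,
    http_error: 0,
    network_error: 0,
  };
  for (const s of statuses) histogram[s] += 1;
  return histogram;
}
```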
Coordinator now tracks every sandbox's start/end timestamps and builds
an interval-overlap sweep at run-end. Writes concurrency_summary
(peak_concurrent, peak_t_ms, mean_concurrent, total_run_ms) and a
1Hz concurrency_timeline to Tigris meta.json. Lets us tell whether
the ramp actually behaved and where the burst saturates.
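An interval-overlap sweep of the kind described can be sketched by expanding each sandbox lifetime into +1/-1 events and scanning them in time order. The `{ startMs, endMs }` shape and function name are assumptions, not the PR's actual types.

```typescript
// Hypothetical input shape for one sandbox lifetime.
interface Interval { startMs: number; endMs: number; }

function peakConcurrency(intervals: Interval[]): { peak: number; peakTMs: number } {
  // Turn each interval into +1 (start) / -1 (end) events, sorted by time;
  // at equal timestamps, ends sort before starts so back-to-back sandboxes
  // don't count as overlapping.
  const events: Array<[number, number]> = [];
  for (const { startMs, endMs } of intervals) {
    events.push([startMs, +1], [endMs, -1]);
  }
  events.sort((a, b) => a[0] - b[0] || a[1] - b[1]);

  let current = 0;
  let peak = 0;
  let peakTMs = 0;
  for (const [t, delta] of events) {
    current += delta;
    if (current > peak) {
      peak = current;
      peakTMs = t;
    }
  }
  return { peak, peakTMs };
}
```

The same event list, bucketed per second, yields a 1Hz timeline like the one the commit uploads to meta.json.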
Runner reflects every primitive prop off the adapter's returned
sandbox object (skipping credential-shaped keys) and stores the
result as a JSONB column on sandbox_results and as a field in
Tigris raw.jsonl. Verified on e2b and runloop — both expose
{ provider, sandboxId }, which lets us cross-reference against
provider dashboards. Schema migration is idempotent.
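Reflecting primitive props while skipping credential-shaped keys could look like the following. The blocklist regex and function name are assumptions for illustration, not the PR's actual filter.

```typescript
// Hypothetical filter for keys that look like credentials.
const CREDENTIAL_KEY = /token|secret|key|password|credential/i;

// Copy string/number/boolean props off an adapter's returned sandbox
// object, dropping anything credential-shaped or non-primitive.
function reflectPrimitives(
  obj: Record<string, unknown>,
): Record<string, string | number | boolean> {
  const out: Record<string, string | number | boolean> = {};
  for (const [k, v] of Object.entries(obj)) {
    if (CREDENTIAL_KEY.test(k)) continue;
    if (typeof v === 'string' || typeof v === 'number' || typeof v === 'boolean') {
      out[k] = v;
    }
  }
  return out;
}
```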
Coordinator samples every 5s (process CPU, memory, event-loop lag
percentiles, load averages, /proc/self/fd count, /proc/net/sockstat)
into <run_id>/metrics.jsonl. Uploaded on every 30s heartbeat for
partial-result durability plus a final flush at shutdown. Headline
peaks land in meta.json.metrics_summary for at-a-glance review.
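A sampler along these lines might capture process-level counters as shown below. This is a minimal sketch under assumptions: field names are illustrative, and it omits the event-loop-lag percentiles and /proc reads the commit mentions.

```typescript
import * as os from 'os';

// Illustrative shape for one sample row in metrics.jsonl.
interface ProcessSample {
  tsMs: number;
  rssBytes: number;
  heapUsedBytes: number;
  userCpuUs: number;
  load1: number;
}

// Take one snapshot of process memory, CPU, and 1-minute load average;
// the coordinator would call this on a 5s interval and append the JSON
// line to the run's metrics file.
function sampleProcess(): ProcessSample {
  const mem = process.memoryUsage();
  const cpu = process.cpuUsage();
  return {
    tsMs: Date.now(),
    rssBytes: mem.rss,
    heapUsedBytes: mem.heapUsed,
    userCpuUs: cpu.user,
    load1: os.loadavg()[0],
  };
}
```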