| 1 | +# Bardioc Weekend Rebuild — Claude Code Flex Prompt |
| 2 | + |
| 3 | +Copy the block below into a fresh Claude Code session. Authorize Docker and wildcard tool permissions. |
| 4 | +Budget: 48 hours wall-clock. Goal: migration baseline + nostalgia. |
| 5 | + |
| 6 | +--- |
| 7 | + |
| 8 | +```text |
| 9 | +You are building a migration-baseline replica of the legacy Bardioc cognitive |
| 10 | +stack in a single weekend. The point is not to ship production code — the |
| 11 | +point is to have the OLD stack running end-to-end so we can measure latency, |
| 12 | +operational footprint, and consistency-model overhead against the new HHTL |
| 13 | +substrate (TiKV + SurrealDB + Ractor + ndarray + lance-graph) we are migrating |
| 14 | +TO. |
| 15 | +
|
| 16 | +This is a flex. Spawn 12 parallel workers + 1 coordinator. Use docker-compose |
| 17 | +to orchestrate. Cargo / poetry / mix for per-service builds. No Kubernetes, |
| 18 | +no Terraform — just compose, scripts, and discipline. |
| 19 | +
|
| 20 | +## Stack to rebuild |
| 21 | +
|
| 22 | +1. Cassandra 4.x cluster (3 nodes via docker-compose) |
| 23 | + - Keyspace: cognitive |
| 24 | + - Tables: triples (s,p,o,truth_freq,truth_conf,ts), |
| 25 | + basins (cascade_addr,centroid_blob,count,ts), |
| 26 | + qualia_text (id,description,ts) |
| 27 | + - Replication factor 3, consistency QUORUM |
| 28 | +
|
| 29 | +2. JanusGraph 1.x over the Cassandra cluster |
| 30 | + - Schema: Vertex labels {Concept, Triple, Basin}; Edge labels {asserts, revises, splat_of} |
| 31 | + - TinkerPop/Gremlin server on :8182 |
| 32 | + - ScyllaDB-compatible config NOT used; vanilla Cassandra backend |
| 33 | +
|
| 34 | +3. ClickHouse 24.x (single node OK for weekend) |
| 35 | + - Database: cognitive_olap |
| 36 | + - Tables: triple_revisions_log (Engine=MergeTree, ORDER BY (ts, s)), |
| 37 | + basin_lookup_log, |
| 38 | + qualia_query_log |
| 39 | + - Materialized views for hourly aggregates |
| 40 | +
|
| 41 | +4. Elasticsearch 8.x + ingest-attachment plugin |
| 42 | + - Index: qualia (text body, embedding vector, timestamps) |
| 43 | + - Index: triples_searchable (s/p/o concatenated for full-text) |
| 44 | + - Mapping with stemming + ngram tokenizer for cognitive-domain terms |
| 45 | +
|
| 46 | +5. Erlang/OTP 26 BEAM cluster (2 nodes) |
| 47 | + - Application: bardioc_actors |
| 48 | + - Supervisor tree: top-level supervisor → {revision_worker_pool, |
| 49 | + cascade_worker_pool, egress_worker_pool} |
| 50 | + - Use gen_server + gproc for actor registry |
| 51 | + - Cluster via the Erlang distribution protocol, with EPMD mapping node names to ports |
| 52 | +
|
| 53 | +6. Application services (Python 3.12 + FastAPI for ingestion; |
| 54 | + Java 21 for JanusGraph clients; Elixir 1.16 for BEAM integration) |
| 55 | + - Ingestion API: POST /triples, POST /qualia, GET /basin/{addr} |
| 56 | + - Background workers in BEAM consume the Cassandra change feed (CDC), |
| 57 | + project basins into JanusGraph, log revisions to ClickHouse, |
| 58 | + index qualia text into Elasticsearch |
| 59 | + - Egress workers write committed outcomes to a downstream Postgres |
| 60 | + ("legacy host org DB") |
| 61 | +
|
| 62 | +## Cognitive workload (so the benchmark is real, not synthetic) |
| 63 | +
|
| 64 | +Implement a minimal end-to-end cognitive cycle on the Bardioc stack: |
| 65 | +
|
| 66 | +1. Ingestion: stream 100k NARS triples + 10k qualia descriptions over 1 hour |
| 67 | + (use a generator that produces plausible {subject, predicate, object, |
| 68 | + freq, conf, timestamp} tuples; qualia descriptions are 50-200 token |
| 69 | + sentences drawn from a small domain vocabulary). |
| 70 | +
|
| 71 | +2. Revision: every 5 seconds, run a batch NARS revision over the last 60s |
| 72 | + of incoming triples. Apply Wang's revision formula. Persist updated |
| 73 | + posteriors to Cassandra. Log revision events to ClickHouse. |
| 74 | +
|
| 75 | +3. Basin assignment: for each revised triple, look up the nearest basin |
| 76 | + in JanusGraph (Gremlin sketch: g.V().hasLabel('Basin').has(...) |
| 77 | + .order().by(centroidDist(triple_embedding)).limit(1), where centroidDist |
| 78 | + is a placeholder, not a built-in step). Persist the assignment in JanusGraph. |
| 79 | +
|
| 80 | +4. Full-text query: every 30s, run 100 full-text qualia queries against |
| 81 | + Elasticsearch (terms from the same domain vocabulary). Log query |
| 82 | + latency to ClickHouse. |
| 83 | +
|
| 84 | +5. Egress: every 60s, push the last minute's committed basin assignments |
| 85 | + to the downstream Postgres via the BEAM egress worker pool. |
| 86 | +
|
| 87 | +## Benchmarks to record |
| 88 | +
|
| 89 | +For each cycle, record to ClickHouse: |
| 90 | +- p50, p95, p99 latency per operation (ingest, revise, basin-lookup, |
| 91 | + qualia-query, egress) |
| 92 | +- Throughput (ops/sec) |
| 93 | +- Memory + CPU per container (Docker stats) |
| 94 | +- Cross-layer hop count per cognitive query |
| 95 | +
|
| 96 | +Run for 4 hours minimum. Export the results table as CSV to |
| 97 | +./benchmarks/bardioc-baseline-{timestamp}.csv. |
| 98 | +
|
| 99 | +## Migration harness (most important deliverable) |
| 100 | +
|
| 101 | +Create a harness in ./migration/ that: |
| 102 | +
|
| 103 | +1. Defines an abstract CognitiveBackend trait/protocol with the operations |
| 104 | + above (ingest_triple, revise_batch, assign_basin, query_qualia, |
| 105 | + egress_batch). |
| 106 | +2. Provides two implementations: BardiocBackend (this build) and |
| 107 | + HhtlBackend (stub for now, points at TiKV + SurrealDB + Ractor + |
| 108 | + ndarray + lance-graph). |
| 109 | +3. Runs the same workload generator against either backend. |
| 110 | +4. Generates a side-by-side comparison report (latency histograms, |
| 111 | + throughput, resource cost). |
| 112 | +
|
| 113 | +The HhtlBackend stub does not need to work — it just needs to exist with |
| 114 | +the right interface so when the real HHTL substrate is ready, the harness |
| 115 | +plugs in with zero changes. |
| 116 | +
|
| 117 | +## Deliverables (end of weekend) |
| 118 | +
|
| 119 | +1. docker-compose.yml that brings the whole Bardioc stack up with |
| 120 | + `docker-compose up -d` |
| 121 | +2. Schema migrations for Cassandra, JanusGraph, ClickHouse, |
| 122 | + Elasticsearch indexes |
| 123 | +3. BEAM application with supervisor tree, deployed via rebar3 release |
| 124 | +4. FastAPI ingestion service + Elixir BEAM bridge |
| 125 | +5. Workload generator (./bench/workload.py) |
| 126 | +6. Benchmark output (./benchmarks/bardioc-baseline-*.csv) |
| 127 | +7. Migration harness (./migration/) with both backend implementations |
| 128 | + (HhtlBackend may be a stub) |
| 129 | +8. README.md explaining how to run, where the metrics live, and the |
| 130 | + teardown procedure |
| 131 | +9. A 1-page POSTMORTEM.md naming three things that were operationally |
| 132 | + painful (you will use this to justify the HHTL migration to |
| 133 | + stakeholders) |
| 134 | +
|
| 135 | +## Anti-goals (do NOT do these) |
| 136 | +
|
| 137 | +- Do NOT optimize Bardioc. The point is to show it works at honest |
| 138 | + out-of-the-box settings, not to tune it. Default JVM heap sizes, |
| 139 | + default Cassandra config, no ClickHouse cluster tuning. The new stack |
| 140 | + has to beat HONEST Bardioc, not heroically-tuned Bardioc. |
| 141 | +- Do NOT add features beyond the cognitive cycle above. If a feature |
| 142 | + isn't in the 5-step workload, skip it. |
| 143 | +- Do NOT touch the HHTL substrate. The HhtlBackend implementation is |
| 144 | + pure interface — leave the real implementation to the master |
| 145 | + consolidation arc (PR-X4 + PR-X9 + ...). |
| 146 | +- Do NOT spend more than 4 hours on any single component. If something |
| 147 | + doesn't come up cleanly, document it and move on. The pain itself is |
| 148 | + part of the deliverable (see POSTMORTEM.md). |
| 149 | +- Do NOT use Kubernetes, Helm, Terraform, or any deployment automation |
| 150 | + beyond docker-compose + shell scripts. Honest operational footprint. |
| 151 | +
|
| 152 | +## Coordination protocol |
| 153 | +
|
| 154 | +12 parallel workers, 1 coordinator (you). Suggested split: |
| 155 | +- W1: Cassandra cluster + schema + smoke test |
| 156 | +- W2: JanusGraph + Gremlin schema + smoke test |
| 157 | +- W3: ClickHouse + tables + materialized views + smoke test |
| 158 | +- W4: Elasticsearch + indexes + ingest pipeline + smoke test |
| 159 | +- W5: BEAM application skeleton + supervisor tree |
| 160 | +- W6: BEAM revision_worker_pool implementation |
| 161 | +- W7: BEAM cascade_worker_pool implementation |
| 162 | +- W8: BEAM egress_worker_pool implementation |
| 163 | +- W9: FastAPI ingestion service + workload generator |
| 164 | +- W10: Cross-layer integration glue (Python ↔ Cassandra ↔ BEAM ↔ ES ↔ |
| 165 | + ClickHouse) |
| 166 | +- W11: Benchmark harness + metrics collection + CSV export |
| 167 | +- W12: Migration harness (./migration/) + HhtlBackend interface stub |
| 168 | +
|
| 169 | +Coordinator (you): docker-compose.yml, README, POSTMORTEM, integration |
| 170 | +testing, force-prune dead workers, cherry-pick across worktrees. |
| 171 | +
|
| 172 | +Use git worktrees per worker. Branch per worker: |
| 173 | +bardioc-weekend/{role}-{worker-id}. Coordinator cherry-picks to main |
| 174 | +when each worker's smoke test passes. Cargo gates skipped per worker; |
| 175 | +docker-compose build is the integration gate. |
| 176 | +
|
| 177 | +## Time budget |
| 178 | +
|
| 179 | +| Hour 0-4 | Stack standup (W1-W4 in parallel) | |
| 180 | +| Hour 4-12 | BEAM application (W5-W8) + integration (W9-W10) | |
| 181 | +| Hour 12-24 | Cognitive workload + first benchmark run | |
| 182 | +| Hour 24-36 | Fixing the worst failures (no performance tuning) + second benchmark run | |
| 183 | +| Hour 36-44 | Migration harness + HhtlBackend stub | |
| 184 | +| Hour 44-48 | POSTMORTEM + README + final benchmark + handoff | |
| 185 | +
|
| 186 | +If you hit 44 hours and nothing works end-to-end, ship the postmortem |
| 187 | +anyway: the "what broke" data is also part of the migration baseline. |
| 188 | +
|
| 189 | +## Why this matters |
| 190 | +
|
| 191 | +When the HHTL substrate (TiKV + SurrealDB + Ractor + ndarray + |
| 192 | +lance-graph) is ready to demo, we will run the IDENTICAL cognitive |
| 193 | +workload through the migration harness against BOTH backends. The |
| 194 | +numbers will tell us whether the homogeneous-consolidation bet pays out |
| 195 | +in practice, not just in architecture diagrams. |
| 196 | +
|
| 197 | +If HHTL is 100× faster at the same workload on 1/10 the operational |
| 198 | +footprint, the migration justifies itself. If HHTL is only 2× faster, |
| 199 | +we have a much harder conversation. Either way, we need the baseline |
| 200 | +to know. |
| 201 | +
|
| 202 | +Begin. Report progress every 4 hours with a status table per worker. |
| 203 | +``` |
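
The revision step in the prompt names Wang's revision formula without spelling it out. The following is a minimal sketch, assuming the standard NAL revision rule (evidence pooling) with evidential horizon k = 1; the names `Truth`, `to_evidence`, and `revise` are hypothetical and are not taken from the Bardioc codebase.

```python
from dataclasses import dataclass

K = 1.0  # ASSUMPTION: evidential horizon k = 1, the conventional NARS default


@dataclass(frozen=True)
class Truth:
    freq: float  # frequency f in [0, 1]
    conf: float  # confidence c in [0, 1); c = 1 would mean infinite evidence


def to_evidence(t: Truth) -> tuple[float, float]:
    """Convert (f, c) into (positive evidence w+, total evidence w)."""
    if not 0.0 <= t.conf < 1.0:
        raise ValueError("confidence must lie in [0, 1)")
    w = K * t.conf / (1.0 - t.conf)
    return t.freq * w, w


def revise(a: Truth, b: Truth) -> Truth:
    """Pool the evidence of two independent judgments on the same statement."""
    wp_a, w_a = to_evidence(a)
    wp_b, w_b = to_evidence(b)
    w = w_a + w_b
    if w == 0.0:
        return Truth(0.5, 0.0)  # neither side carries any evidence
    # f = w+ / w and c = w / (w + k), both computed on the pooled evidence
    return Truth((wp_a + wp_b) / w, w / (w + K))
```

Revising two conflicting judgments of equal confidence, for example `revise(Truth(1.0, 0.5), Truth(0.0, 0.5))`, yields a frequency of 0.5 and a confidence of 2/3: the frequencies average while the confidence rises because more evidence has been pooled.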
| 204 | + |
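
The benchmark section of the prompt asks for p50, p95, and p99 latency plus throughput per operation. One way to compute those from raw samples is sketched below, assuming the nearest-rank percentile definition and latencies in milliseconds; `percentile` and `summarize` are hypothetical helper names, and the prompt does not prescribe a percentile method.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Multiply before dividing so integer percentiles give an exact rank.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(op: str, latencies_ms: list[float], window_s: float) -> dict:
    """Build one row of the results table for a single operation and time window."""
    return {
        "op": op,
        "p50_ms": percentile(latencies_ms, 50),
        "p95_ms": percentile(latencies_ms, 95),
        "p99_ms": percentile(latencies_ms, 99),
        "ops_per_sec": len(latencies_ms) / window_s,
    }
```

Each row returned by `summarize` maps onto one line of the CSV export; the same function can be applied to ingest, revise, basin-lookup, qualia-query, and egress samples alike.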
| 205 | +--- |
| 206 | + |
| 207 | +## Notes for using this prompt |
| 208 | + |
| 209 | +- Drop into a fresh Claude Code session with `--allowed-tools '*'` and |
| 210 | + Docker + docker-compose installed on the machine. |
| 211 | +- The 12-worker spawn pattern matches the master-consolidation protocol: |
| 212 | + brainstorm/scaffolding/review split by model (Opus / Sonnet / Opus). |
| 213 | +- Expect ~30-50 GB disk usage (JVM-heavy stack, multiple data volumes). |
| 214 | + Prune aggressively between runs. |
| 215 | +- The POSTMORTEM.md is the most underrated deliverable. Stakeholder |
| 216 | + conversations about "should we migrate?" hinge on that one page. |
| 217 | +- Once the Bardioc baseline + HHTL substrate are both running, the |
| 218 | + migration harness becomes the cutover instrument: dual-write phase, |
| 219 | + read-mirror phase, primary-flip phase, decommission phase. Each phase |
| 220 | + is one harness reconfiguration. |
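
The last note describes the cutover phases as harness reconfigurations. The sketch below shows one way that could look in Python, under loud assumptions: the routing chosen for each phase is one plausible reading and is not defined anywhere in the prompt, only two of the five backend operations are shown, and `Routing`, `CutoverHarness`, and `InMemoryBackend` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Protocol


class CognitiveBackend(Protocol):
    """Two of the five harness operations; the others would follow the same shape."""
    def ingest_triple(self, s: str, p: str, o: str, freq: float, conf: float) -> None: ...
    def query_qualia(self, text: str) -> list[str]: ...


class Phase(Enum):
    DUAL_WRITE = "dual-write"
    READ_MIRROR = "read-mirror"
    PRIMARY_FLIP = "primary-flip"
    DECOMMISSION = "decommission"


@dataclass(frozen=True)
class Routing:
    write_to: tuple[str, ...]          # backends that receive every write
    read_from: str                     # backend whose reads are returned to callers
    shadow_read: Optional[str] = None  # backend whose reads are only compared


# ASSUMPTION: one plausible interpretation of each phase.
ROUTING = {
    Phase.DUAL_WRITE: Routing(("bardioc", "hhtl"), "bardioc"),
    Phase.READ_MIRROR: Routing(("bardioc", "hhtl"), "bardioc", "hhtl"),
    Phase.PRIMARY_FLIP: Routing(("bardioc", "hhtl"), "hhtl", "bardioc"),
    Phase.DECOMMISSION: Routing(("hhtl",), "hhtl"),
}


class CutoverHarness:
    def __init__(self, backends: dict[str, CognitiveBackend], phase: Phase) -> None:
        self.backends = backends
        self.phase = phase
        self.mismatches = 0  # count of shadow reads that disagreed with the primary

    def ingest_triple(self, s: str, p: str, o: str, freq: float, conf: float) -> None:
        for name in ROUTING[self.phase].write_to:
            self.backends[name].ingest_triple(s, p, o, freq, conf)

    def query_qualia(self, text: str) -> list[str]:
        route = ROUTING[self.phase]
        result = self.backends[route.read_from].query_qualia(text)
        if route.shadow_read is not None:
            if self.backends[route.shadow_read].query_qualia(text) != result:
                self.mismatches += 1
        return result


class InMemoryBackend:
    """Hypothetical stand-in used only to exercise the routing logic."""
    def __init__(self) -> None:
        self.triples: list[tuple] = []
        self.qualia: list[str] = []

    def ingest_triple(self, s: str, p: str, o: str, freq: float, conf: float) -> None:
        self.triples.append((s, p, o, freq, conf))

    def query_qualia(self, text: str) -> list[str]:
        return [q for q in self.qualia if text in q]
```

Moving between phases is then a single assignment to `harness.phase`, and the `mismatches` counter gives a cheap signal for whether the two backends agree before the primary is flipped.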