Add ARCHITECTURE.md - complete system overview #14
1094 lines total:
- src/core/scent.rs (645 lines) - Implementation
- docs/SCENT_INDEX.md (449 lines) - Architecture documentation

## Core Concept

Scent = compressed representative of a bucket (5 bytes)
256 scents = 1.25 KB = fits in L1 cache

Query path:
1. SIMD scan 1.25 KB of scents (~50 ns)
2. Eliminate 99%+ of corpus
3. Full Hamming only on matching buckets

## Scale

| Depth | Buckets | Scent Index | Coverage/Leaf |
|-------|---------|-------------|---------------|
| 1     | 256     | 1.25 KB     | 27 TB         |
| 2     | 65,536  | 320 KB      | 107 GB        |
| 3     | 16.7M   | 80 MB       | 420 MB        |

7 PB search: ~100 ns to eliminate 99.997%

## Ada Cognitive Integration

Scent nodes carry:
- plasticity (learning rate for region)
- decision (cached classification)
- last_access (attention tracking)

One scent update = millions of fingerprints affected.
Ada thinks in scent regions, not individual fingerprints.

Consciousness layers map to scent depth:
- L0-2 (SUBSTRATE, FELT_CORE, BODY): leaf fingerprints
- L3-6 (QUALIA, VOLITION, GESTALT, META): scent nodes

## Why Not Tree

Tree: 8 levels × pointer chase × cache miss = ~800 ns
Scent: 2 flat scans × L1 cache hit = ~100 ns

Same bucket structure. Headers are free. Scent is just metadata.
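The query path in the commit message (scan the flat scent table, discard non-matching buckets, then run full Hamming only on the survivors) can be sketched roughly as follows. This is a scalar illustration, not the code in src/core/scent.rs: the `Scent` alias, `scent_of`, `candidate_buckets`, `search`, and both distance thresholds are hypothetical names, and it assumes a scent is simply the first 5 bytes of a fingerprint and that scents are matched by Hamming distance. The PR may derive and compare scents differently, and it uses SIMD rather than a plain loop.

```rust
/// Scent size in bytes and level-1 bucket count, as stated in the commit message.
const SCENT_BYTES: usize = 5;
const L1_BUCKETS: usize = 256;

/// Hypothetical scent representation; the real type in scent.rs may differ.
type Scent = [u8; SCENT_BYTES];

/// Hamming distance over the overlapping prefix of two byte slices.
fn hamming(a: &[u8], b: &[u8]) -> u32 {
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}

/// Assumption: a scent is the first 5 bytes of a fingerprint, zero-padded.
fn scent_of(fp: &[u8]) -> Scent {
    let mut s = [0u8; SCENT_BYTES];
    let n = fp.len().min(SCENT_BYTES);
    s[..n].copy_from_slice(&fp[..n]);
    s
}

/// Stage 1: scan the flat 256 x 5-byte table and keep the ids of buckets
/// whose scent is within `max_scent_dist` bits of the query scent.
fn candidate_buckets(
    scents: &[Scent; L1_BUCKETS],
    query: &Scent,
    max_scent_dist: u32,
) -> Vec<usize> {
    scents
        .iter()
        .enumerate()
        .filter(|(_, s)| hamming(&s[..], &query[..]) <= max_scent_dist)
        .map(|(i, _)| i)
        .collect()
}

/// Stage 2: full Hamming comparison, restricted to the surviving buckets.
fn search<'a>(
    scents: &[Scent; L1_BUCKETS],
    buckets: &'a [Vec<Vec<u8>>],
    query_fp: &[u8],
    max_scent_dist: u32,
    max_fp_dist: u32,
) -> Vec<&'a [u8]> {
    let query = scent_of(query_fp);
    candidate_buckets(scents, &query, max_scent_dist)
        .into_iter()
        .filter_map(move |b| buckets.get(b)) // skip ids with no backing bucket
        .flatten()
        .filter(|fp| fp.len() == query_fp.len() && hamming(fp, query_fp) <= max_fp_dist)
        .map(|fp| fp.as_slice())
        .collect()
}
```

The scent table is a single contiguous array of 256 × 5 = 1280 bytes, which is what allows the first stage to run as a linear scan over cache-resident data instead of a pointer chase.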
Unified documentation covering:
- 64-bit CAM index (type namespace + fingerprint prefix)
- Hierarchical scent filtering (petabyte scale)
- Cognition layers 3-6 on scent nodes
- Lance/Arrow integration
- Immutability guarantees
- Query flow examples
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fd21700754
    /// Assign fingerprint to (l1, l2) bucket
    pub fn assign(&self, fp: &[u8]) -> (u8, u8) {
        let l1 = fp[0];
        let l2 = fp[1];
        (l1, l2)
Guard against short fingerprints in assign
Both `assign` methods index into `fp` (`fp[0]`/`fp[1]`) without checking its length, and `ScentIndex::on_append` calls them directly. Because the API accepts `&[u8]` (and `extract_scent` explicitly tolerates short slices), a malformed or external input shorter than the one or two bytes these methods read will panic with an out-of-bounds access instead of returning an error. Consider validating the length or using a fixed-size type to prevent runtime crashes.
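One way to act on this suggestion is to return a `Result` and let a slice pattern do the length check. The sketch below is a free function rather than the `&self` method in the diff, and `try_assign` and `ShortFingerprint` are hypothetical names that do not appear in the PR.

```rust
/// Hypothetical error type: the fingerprint was too short to address a bucket.
#[derive(Debug, PartialEq, Eq)]
pub struct ShortFingerprint {
    pub needed: usize,
    pub got: usize,
}

/// Assign a fingerprint to an (l1, l2) bucket without panicking on short input.
pub fn try_assign(fp: &[u8]) -> Result<(u8, u8), ShortFingerprint> {
    match fp {
        // The slice pattern matches only when at least two bytes are present.
        [l1, l2, ..] => Ok((*l1, *l2)),
        _ => Err(ShortFingerprint { needed: 2, got: fp.len() }),
    }
}
```

The alternative mentioned in the review, a fixed-size type such as `&[u8; N]`, would move the check to the call site and remove the error path from `assign` itself.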
    pub fn flatten(&self) -> u32 {
        match self {
            BucketAddr::L1(a) => *a as u32,
            BucketAddr::L2(a, b) => ((*a as u32) << 8) | (*b as u32),
            BucketAddr::L3(a, b, c) => ((*a as u32) << 16) | ((*b as u32) << 8) | (*c as u32),
Make BucketAddr::flatten depth-safe
`flatten` does not encode the address depth, so `L1(0x12)`, `L2(0x00, 0x12)`, and `L3(0x00, 0x00, 0x12)` all collapse to the same `u32`. The doc comment says it is for hashing/comparison, which means mixing bucket addresses from different depths will collide and be treated as identical keys, leading to incorrect lookups or aggregations.
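A possible fix is to reserve the otherwise unused top byte of the `u32` for the depth. The sketch below assumes the variant fields are `u8` (the diff shows only the casts, not the enum definition), and `flatten_tagged` is a hypothetical name, not part of the PR.

```rust
/// Assumed shape of the enum from the diff; field types are a guess.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BucketAddr {
    L1(u8),
    L2(u8, u8),
    L3(u8, u8, u8),
}

impl BucketAddr {
    /// Depth-tagged flatten: the top byte holds the depth (1 to 3) and the
    /// low 24 bits hold the path, so addresses of different depths never collide.
    pub fn flatten_tagged(&self) -> u32 {
        match *self {
            BucketAddr::L1(a) => (1 << 24) | a as u32,
            BucketAddr::L2(a, b) => (2 << 24) | ((a as u32) << 8) | b as u32,
            BucketAddr::L3(a, b, c) => {
                (3 << 24) | ((a as u32) << 16) | ((b as u32) << 8) | c as u32
            }
        }
    }
}
```

With this encoding the three colliding addresses from the comment map to `0x0100_0012`, `0x0200_0012`, and `0x0300_0012`. If callers only ever compare addresses of the same depth, documenting that restriction on `flatten` would be a lighter alternative.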