Add ARCHITECTURE.md - complete system overview #14

Open
AdaWorldAPI wants to merge 2 commits into main from feature/scent-index

@AdaWorldAPI
Owner

Unified documentation covering:

  • 64-bit CAM index (type namespace + fingerprint prefix)
  • Hierarchical scent filtering (petabyte scale)
  • Cognition layers 3-6 on scent nodes
  • Lance/Arrow integration
  • Immutability guarantees
  • Query flow examples

1094 lines total:
- src/core/scent.rs (645 lines) - Implementation
- docs/SCENT_INDEX.md (449 lines) - Architecture documentation

## Core Concept

A scent is a 5-byte compressed representative of a bucket.
256 scents = 256 × 5 bytes = 1.25 KB, which fits in L1 cache.

Query path:
1. SIMD scan 1.25 KB of scents (~50 ns)
2. Eliminate 99%+ of corpus
3. Full Hamming only on matching buckets
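
The query path above can be sketched roughly as follows. This is a minimal illustration, not the code in src/core/scent.rs: the names `Scent`, `scent_distance`, and `candidate_buckets` are hypothetical, the match criterion (Hamming distance on the 5-byte scent with a threshold) is an assumption, and the loop is scalar rather than SIMD.

```rust
/// A scent is a 5-byte summary of one bucket (size taken from the description).
pub type Scent = [u8; 5];

/// Hamming distance between two scents.
/// Assumption: scents are compared bitwise; the real criterion may differ.
pub fn scent_distance(a: &Scent, b: &Scent) -> u32 {
    a.iter()
        .zip(b.iter())
        .map(|(x, y)| (x ^ y).count_ones())
        .sum()
}

/// Step 1 and 2: scan all 256 scents (1.25 KB) and keep only the bucket ids
/// whose scent is within `max_dist` bits of the query scent.
/// Step 3 (full Hamming on the surviving buckets) is left to the caller.
pub fn candidate_buckets(scents: &[Scent; 256], query: &Scent, max_dist: u32) -> Vec<u8> {
    scents
        .iter()
        .enumerate()
        .filter(|(_, s)| scent_distance(s, query) <= max_dist)
        .map(|(i, _)| i as u8)
        .collect()
}
```

Because the whole table is a contiguous `[[u8; 5]; 256]`, the scan touches 1,280 bytes and involves no pointer chasing, which is the property the description relies on.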

## Scale

| Depth | Buckets | Scent Index | Coverage/Leaf |
|-------|---------|-------------|---------------|
| 1     | 256     | 1.25 KB     | 27 TB         |
| 2     | 65,536  | 320 KB      | 107 GB        |
| 3     | 16.7M   | 80 MB       | 420 MB        |

7 PB search: ~100 ns (two flat scent scans) to eliminate 99.997% of the corpus
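
The table values follow from 256 buckets per level and 5 bytes per scent. A small sketch of that arithmetic is below; the function names are hypothetical, and the coverage figure assumes fingerprints are spread uniformly across buckets. The index sizes in the table appear to use binary units (1.25 KB = 1,280 bytes) while the coverage column appears to use decimal units.

```rust
/// Bytes per scent, from the description.
const SCENT_BYTES: u64 = 5;
/// Fan-out per level: one prefix byte selects one of 256 children.
const FANOUT: u64 = 256;

/// Number of buckets at a given depth (256^depth).
pub fn buckets_at_depth(depth: u32) -> u64 {
    FANOUT.pow(depth)
}

/// Size in bytes of the scent index at a given depth.
pub fn scent_index_bytes(depth: u32) -> u64 {
    buckets_at_depth(depth) * SCENT_BYTES
}

/// Average corpus bytes covered by one leaf bucket.
/// Assumption: data is spread uniformly across buckets.
pub fn coverage_per_leaf(corpus_bytes: u64, depth: u32) -> u64 {
    corpus_bytes / buckets_at_depth(depth)
}
```

For example, `scent_index_bytes(2)` is 65,536 × 5 = 327,680 bytes (320 KB), and `coverage_per_leaf` for a 7 PB corpus at depth 2 is about 107 GB.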

## Ada Cognitive Integration

Scent nodes carry:
- plasticity (learning rate for region)
- decision (cached classification)
- last_access (attention tracking)

One scent update = millions of fingerprints affected.
Ada thinks in scent regions, not individual fingerprints.
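
As a rough illustration of what a scent node carrying these fields could look like, here is a hypothetical sketch. The struct name `ScentNode`, the method `touch`, and all field types (`f32`, `Option<u8>`, `u64`) are assumptions; the description only names the fields.

```rust
pub type Scent = [u8; 5];

/// Hypothetical per-bucket node: the 5-byte scent plus cognitive metadata.
#[derive(Debug, Clone, PartialEq)]
pub struct ScentNode {
    pub scent: Scent,
    /// Learning rate for the region (type assumed).
    pub plasticity: f32,
    /// Cached classification for the whole region, if any (type assumed).
    pub decision: Option<u8>,
    /// Logical timestamp of the last access, for attention tracking (type assumed).
    pub last_access: u64,
}

impl ScentNode {
    pub fn new(scent: Scent) -> Self {
        ScentNode { scent, plasticity: 0.0, decision: None, last_access: 0 }
    }

    /// Record an access and cache a decision on the node.
    /// One write here applies to every fingerprint in the bucket,
    /// since the fingerprints themselves are not touched.
    pub fn touch(&mut self, now: u64, decision: u8) {
        self.last_access = now;
        self.decision = Some(decision);
    }
}
```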

Consciousness layers map to scent depth:
- L0-2 (SUBSTRATE, FELT_CORE, BODY): leaf fingerprints
- L3-6 (QUALIA, VOLITION, GESTALT, META): scent nodes
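
The layer-to-depth mapping could be expressed as a small enum. This is only a sketch: the type names `Layer` and `Placement` are hypothetical, and the assignment of each name to a specific number is assumed from the order in which the layers are listed.

```rust
/// Layers L0..L6, numbered in the order listed in the description (assumed).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Layer {
    Substrate = 0,
    FeltCore = 1,
    Body = 2,
    Qualia = 3,
    Volition = 4,
    Gestalt = 5,
    Meta = 6,
}

/// Where a layer's state lives in the index.
#[derive(Debug, PartialEq, Eq)]
pub enum Placement {
    LeafFingerprint,
    ScentNode,
}

impl Layer {
    /// L0-2 operate on leaf fingerprints, L3-6 on scent nodes.
    pub fn placement(self) -> Placement {
        if (self as u8) <= 2 {
            Placement::LeafFingerprint
        } else {
            Placement::ScentNode
        }
    }
}
```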

## Why Not Tree

Tree: 8 levels × pointer chase × cache miss = ~800 ns
Scent: 2 flat scans × L1 cache hit = ~100 ns

Same bucket structure. Headers are free. Scent is just metadata.
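
A sketch of the "two flat scans" idea, under assumptions: the level-2 scents are stored as one flat, row-major array indexed by the level-1 bucket, and matching uses a Hamming threshold on the 5-byte scent. The type `TwoLevelScents` and its layout are hypothetical and may not match the implementation in this PR.

```rust
pub type Scent = [u8; 5];

/// Bitwise Hamming distance between two scents (assumed match criterion).
fn dist(a: &Scent, b: &Scent) -> u32 {
    a.iter().zip(b.iter()).map(|(x, y)| (x ^ y).count_ones()).sum()
}

/// Hypothetical flat layout: no child pointers, only array offsets.
pub struct TwoLevelScents {
    /// 256 level-1 scents.
    pub l1: Vec<Scent>,
    /// 256 * 256 level-2 scents, row-major by level-1 bucket.
    pub l2: Vec<Scent>,
}

impl TwoLevelScents {
    /// Scan level 1, then scan the 256 contiguous level-2 scents of each
    /// surviving level-1 bucket. Returns the matching (l1, l2) addresses.
    pub fn candidates(&self, q: &Scent, max_dist: u32) -> Vec<(u8, u8)> {
        let mut out = Vec::new();
        for (a, s1) in self.l1.iter().enumerate() {
            if dist(s1, q) > max_dist {
                continue; // whole subtree eliminated by one 5-byte compare
            }
            let row = &self.l2[a * 256..(a + 1) * 256];
            for (b, s2) in row.iter().enumerate() {
                if dist(s2, q) <= max_dist {
                    out.push((a as u8, b as u8));
                }
            }
        }
        out
    }
}
```

Each scan reads 1.25 KB of contiguous memory, so the lookup cost is dominated by two short linear passes rather than a chain of dependent pointer loads.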

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fd21700754

Comment on lines +234 to +238
/// Assign fingerprint to (l1, l2) bucket
pub fn assign(&self, fp: &[u8]) -> (u8, u8) {
    let l1 = fp[0];
    let l2 = fp[1];
    (l1, l2)

P2: Guard against short fingerprints in assign

Both assign methods index into fp (fp[0] and fp[1]) without checking its length, and ScentIndex::on_append calls them directly. Because the API accepts &[u8] (and extract_scent explicitly tolerates short slices), a malformed or external input shorter than the one or two bytes a method reads will panic with an out-of-bounds access instead of returning an error. Consider validating the length or using a fixed-size type to prevent runtime crashes.
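
One possible shape for a length-checked variant, shown as a free function for brevity. The name `assign_checked` and the choice of `Option` over a `Result` with a dedicated error type are assumptions, not part of the PR.

```rust
/// Returns the (l1, l2) bucket for a fingerprint,
/// or None if the slice has fewer than two bytes.
pub fn assign_checked(fp: &[u8]) -> Option<(u8, u8)> {
    match fp {
        // Slice pattern: binds the first two bytes, ignores the rest.
        [l1, l2, ..] => Some((*l1, *l2)),
        _ => None,
    }
}
```

A caller such as on_append could then skip or report a short fingerprint instead of panicking.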

Comment on lines +485 to +489
pub fn flatten(&self) -> u32 {
    match self {
        BucketAddr::L1(a) => *a as u32,
        BucketAddr::L2(a, b) => ((*a as u32) << 8) | (*b as u32),
        BucketAddr::L3(a, b, c) => ((*a as u32) << 16) | ((*b as u32) << 8) | (*c as u32),

P2: Make BucketAddr::flatten depth-safe

flatten does not encode the address depth, so L1(0x12), L2(0x00, 0x12), and L3(0x00, 0x00, 0x12) all collapse to the same u32. The doc comment says it is for hashing/comparison, which means mixing bucket addresses from different depths will collide and be treated as identical keys, leading to incorrect lookups or aggregations.
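
One way to avoid the collision is to store the depth in the unused top byte of the u32. The sketch below assumes the variant fields are u8 (consistent with the shifts in the excerpt); the method name `flatten_tagged` is hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum BucketAddr {
    L1(u8),
    L2(u8, u8),
    L3(u8, u8, u8),
}

impl BucketAddr {
    /// Flatten to a u32 with the depth (1, 2, or 3) in bits 24..32,
    /// so addresses from different depths never produce the same key.
    pub fn flatten_tagged(&self) -> u32 {
        match *self {
            BucketAddr::L1(a) => (1u32 << 24) | a as u32,
            BucketAddr::L2(a, b) => (2u32 << 24) | ((a as u32) << 8) | b as u32,
            BucketAddr::L3(a, b, c) => {
                (3u32 << 24) | ((a as u32) << 16) | ((b as u32) << 8) | c as u32
            }
        }
    }
}
```

With this encoding, `L1(0x12)`, `L2(0x00, 0x12)`, and `L3(0x00, 0x00, 0x12)` flatten to `0x0100_0012`, `0x0200_0012`, and `0x0300_0012` respectively. An alternative is to hash or compare the enum itself, which already distinguishes depths through its derived `Eq` and `Hash`.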
