feat: CAM (Content-Addressable Memory) header for Fingerprint-first search #89

Content-Addressable Memory (CAM) Proposal

Context

ladybug-rs already has core::fingerprint::Fingerprint — a 10,000-bit VSA fingerprint (157 u64 words). This proposal elevates the fingerprint from an inline data type to a fixed-offset header component that enables O(1) dedup and two-phase search across all vector containers.

See also: holograph#1 for the storage-layer counterpart.

The 64-word header

Every stored record gets a fixed 512-byte prefix:

 ┌───────────────┬────────────────┬──────────────────┐
 │ 32 meta       │ 32 fingerprint │ N × 128 content  │
 │ offset 0      │ offset 256 B   │ offset 512 B     │
 └───────────────┴────────────────┴──────────────────┘
 ←  HEADER: always 512 bytes     →←  variable quanta →

Container envelope (1 quantum = 128 words = 8,192 bits):

MONO:   32 + 32 + 128       = 192 words   1.50 KB
DENSE:  32 + 32 + 128 + 128 = 320 words   2.50 KB
HOLO:   32 + 32 + 128×3     = 448 words   3.50 KB
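
A minimal Rust sketch of the header (struct and field names here are illustrative, not existing ladybug-rs items; only the offsets and sizes come from the proposal):

/// Fixed 512-byte CAM header: 32 meta words followed by the
/// 32-word fingerprint sketch. Content quanta start at offset 512 B.
#[repr(C)]
pub struct CamHeader {
    pub meta: [u64; 32],        // offset 0
    pub fingerprint: [u64; 32], // offset 256 B
}

// Pin the layout at compile time.
const _: () = assert!(core::mem::size_of::<CamHeader>() == 512);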

How this maps to existing ladybug-rs types

Existing type                 Role in CAM                        Notes
Fingerprint (10K)             Content region (MONO quantum)      Unchanged; still the core VSA vector
Fingerprint::from_content()   Fingerprint generation             XOR-fold 157 words → 32 words for the header sketch
core::simd                    Hamming on fingerprint + content   Fingerprint scan is 32 words = 4 AVX-512 iterations
core::index::VsaIndex         CAM index integration              Fingerprint hash table for O(1) dedup before ANN
cognitive::collapse_gate      Meta field consumer                Container kind, ANI level, consciousness flags in meta
cognitive::seven_layer        Layer markers in meta              7-layer state fits in meta words 13-16

The 32-word fingerprint as content sketch

/// Generate 2048-bit CAM fingerprint from a full Fingerprint
pub fn cam_sketch(fp: &Fingerprint) -> [u64; 32] {
    let raw = fp.as_raw();  // &[u64; 157]
    let mut sketch = [0u64; 32];

    // XOR-fold: 157 words → 32 words
    for (i, &word) in raw.iter().enumerate() {
        sketch[i % 32] ^= word;
    }
    sketch
}

/// Fast pre-filter: Hamming on 2048-bit sketches
pub fn sketch_distance(a: &[u64; 32], b: &[u64; 32]) -> u32 {
    a.iter().zip(b.iter())
        .map(|(x, y)| (x ^ y).count_ones())
        .sum()
}

The sketch preserves Hamming locality. Because XOR-folding can only merge differing bits (and sometimes cancel them), the sketch distance never exceeds the full Hamming distance: if two full fingerprints are close, their sketches are at least as close. The converse isn't guaranteed (distant fingerprints can fold to similar sketches, producing false positives), but that's fine: the sketch is a pre-filter, not the final answer.
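
A quick property check of that bound, with a standalone fold over raw words so no Fingerprint instance is needed (fold() mirrors cam_sketch() above):

/// XOR-fold on a bare word array, identical to cam_sketch().
fn fold(raw: &[u64; 157]) -> [u64; 32] {
    let mut sketch = [0u64; 32];
    for (i, &word) in raw.iter().enumerate() {
        sketch[i % 32] ^= word;
    }
    sketch
}

/// Full 157-word Hamming distance.
fn full_distance(a: &[u64; 157], b: &[u64; 157]) -> u32 {
    a.iter().zip(b.iter()).map(|(x, y)| (x ^ y).count_ones()).sum()
}

#[test]
fn sketch_distance_never_exceeds_full_distance() {
    let (mut a, mut b) = ([0u64; 157], [0u64; 157]);
    a[0] = 0xFF;  // differs in word 0 ...
    b[32] = 0xFF; // ... and word 32, which folds into the same sketch slot
    // Full distance is 16; the folded differences cancel to 0.
    assert!(sketch_distance(&fold(&a), &fold(&b)) <= full_distance(&a, &b));
}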

Two-phase search integration

Currently VsaIndex::search() does full Hamming on all candidates. With CAM:

Phase 0: CAM sketch scan (32 words per candidate, ~32 cycles)
  → sketch_distance < threshold → promote to Phase 1
  → Rejects ~95% of candidates at 1/5th the cost of full scan

Phase 1: Full Hamming (157 words per candidate, ~157 cycles)
  → Only on Phase 0 survivors
  → Existing search pipeline unchanged

For 1M vectors: Phase 0 touches 32M words instead of 157M words. ~5× throughput improvement on the scan loop.
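
A minimal sketch of the combined loop (the record layout and threshold handling are assumptions; fold(), sketch_distance(), and full_distance() as above):

/// Hypothetical two-phase scan: cheap sketch pre-filter, then full Hamming.
fn two_phase_search(
    query: &[u64; 157],
    records: &[([u64; 32], [u64; 157])], // (CAM sketch, full fingerprint)
    sketch_threshold: u32,
) -> Vec<(usize, u32)> {
    let q_sketch = fold(query);
    records
        .iter()
        .enumerate()
        // Phase 0: 32-word sketch scan rejects most candidates cheaply
        .filter(|(_, (sketch, _))| sketch_distance(&q_sketch, sketch) < sketch_threshold)
        // Phase 1: full 157-word Hamming only on the survivors
        .map(|(i, (_, full))| (i, full_distance(query, full)))
        .collect()
}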

O(1) Dedup

use std::collections::HashMap;

struct CamIndex {
    /// fingerprint hash → record offset
    dedup: HashMap<[u64; 32], usize>,
}

impl CamIndex {
    fn insert(&mut self, sketch: [u64; 32], offset: usize) -> bool {
        // O(1) dedup keyed on the 2048-bit sketch: identical content always
        // maps to the same key; distinct content colliding is vanishingly rare
        self.dedup.insert(sketch, offset).is_none()
    }

    fn lookup(&self, sketch: &[u64; 32]) -> Option<usize> {
        self.dedup.get(sketch).copied()
    }
}
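
Store-time usage could then look like this (the offset handling is whatever the storage layer already assigns; cam_sketch() from above):

/// Hypothetical store path: dedup before allocating a new record.
fn store_or_reuse(cam: &mut CamIndex, fp: &Fingerprint, new_offset: usize) -> usize {
    let sketch = cam_sketch(fp);
    match cam.lookup(&sketch) {
        Some(existing) => existing, // identical content already stored
        None => {
            cam.insert(sketch, new_offset); // first occurrence: index it
            new_offset
        }
    }
}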

Relationship to cognitive layer

The 32-word meta block carries cognitive state that the cognitive/ modules already produce:

  • collapse_gate: FLOW/HOLD/BLOCK decision → meta word 0, bits 8-9
  • seven_layer: Layer activation pattern → meta words 13-16
  • style: ThinkingStyle vector → meta words 5-8 (τ/σ/q compressed)
  • rung: Rung level (3-5) → meta word 1, bits 0-7
  • substrate: Substrate state hash → meta word 2

This means every stored vector carries its cognitive context inline, queryable without touching the content region.
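
As a bit-packing sketch for two of the assignments above (the enum values and helper names are hypothetical; the bit positions follow the list):

/// FLOW/HOLD/BLOCK decision, packed into meta word 0, bits 8-9.
#[derive(Clone, Copy)]
pub enum Gate { Flow = 0, Hold = 1, Block = 2 }

pub fn pack_gate(meta: &mut [u64; 32], gate: Gate) {
    meta[0] = (meta[0] & !(0b11 << 8)) | ((gate as u64) << 8);
}

/// Rung level (3-5), packed into meta word 1, bits 0-7.
pub fn pack_rung(meta: &mut [u64; 32], rung: u8) {
    meta[1] = (meta[1] & !0xFF) | rung as u64;
}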

Implementation plan

  1. Add cam.rs to core/ with CamHeader, CamRecord, cam_sketch(), sketch_distance()
  2. Extend VsaIndex with optional CAM pre-filter
  3. Add CamHeader serialization to Arrow FixedSizeBinary for LanceDB storage (a byte-flattening sketch follows this list)
  4. Wire cognitive layer metadata into meta block at store time
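
For step 3, one Arrow-version-independent approach is to flatten the 64 header words to little-endian bytes and hand those to a FixedSizeBinary(512) column:

/// Flatten the 64 header words into 512 little-endian bytes.
fn header_bytes(meta: &[u64; 32], fingerprint: &[u64; 32]) -> [u8; 512] {
    let mut out = [0u8; 512];
    for (i, &word) in meta.iter().chain(fingerprint.iter()).enumerate() {
        out[i * 8..i * 8 + 8].copy_from_slice(&word.to_le_bytes());
    }
    out
}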

Open questions

  1. Should Fingerprint grow a .cam_sketch() -> [u64; 32] method, or keep it external?
  2. The 32-word sketch is 2048 bits — enough discrimination? Or should we use 16 words (1024 bits) to save space?
  3. Integration with extensions/hologram/ bitchain types — do they get CAM headers too?
