Benchmarks, optimization techniques, and tuning advice.
All benchmarks were run on:
- CPU: AMD EPYC 7763 (64 cores) / Apple M2 (8 cores)
- RAM: 256GB DDR4 / 16GB unified
- OS: Ubuntu 22.04 / macOS 14
- Python: 3.11 with Numba 0.58
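To compare your own measurements with these, it helps to record the same details for your machine. Below is a minimal sketch that uses only the standard library. The field names are illustrative. RAM is left out because the standard library has no portable way to read it. `os.cpu_count()` reports logical CPUs, which can be twice the physical core count when SMT is enabled.

```python
import os
import platform
from importlib.metadata import PackageNotFoundError, version


def environment_summary() -> dict[str, str]:
    """Collect the hardware/software details listed above for the current machine."""

    def pkg_version(name: str) -> str:
        try:
            return version(name)
        except PackageNotFoundError:
            return "not installed"

    return {
        # platform.processor() is often empty on Linux, so fall back to the architecture
        "cpu": platform.processor() or platform.machine(),
        "logical_cores": str(os.cpu_count()),
        "os": f"{platform.system()} {platform.release()}",
        "python": platform.python_version(),
        "numba": pkg_version("numba"),
    }


for key, value in environment_summary().items():
    print(f"{key:>13}: {value}")
```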
Single Query (10K corpus):
| Implementation | Latency | Throughput |
|---|---|---|
| Pure Python | 100 ms | 10 queries/sec |
| NumPy vectorized | 5 ms | 200 queries/sec |
| Numba (no SIMD) | 500 μs | 2,000 queries/sec |
| LadybugDB (AVX-512) | 150 μs | 6,600 queries/sec |
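For a single client that issues one query at a time, throughput is the reciprocal of latency. The sketch below shows that conversion. It assumes no concurrent queries; parallel clients can exceed these figures.

```python
# Per-query latencies from the Single Query table, in seconds
LATENCY_S = {
    "Pure Python": 100e-3,
    "NumPy vectorized": 5e-3,
    "Numba (no SIMD)": 500e-6,
    "LadybugDB (AVX-512)": 150e-6,
}


def sequential_qps(latency_s: float) -> float:
    """Queries/sec for one client that waits for each query before sending the next."""
    return 1.0 / latency_s


for name, latency in LATENCY_S.items():
    print(f"{name:<20} {latency * 1e3:8.3f} ms -> {sequential_qps(latency):8,.0f} queries/sec")
```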
Batch Throughput (comparisons/sec):
| Corpus Size | LadybugDB | NumPy | Speedup |
|---|---|---|---|
| 1K | 60M/sec | 2M/sec | 30x |
| 10K | 65M/sec | 1.8M/sec | 36x |
| 100K | 65M/sec | 1.5M/sec | 43x |
| 1M | 64M/sec | 1.2M/sec | 53x |
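The NumPy baseline in this table compares packed binary fingerprints with XOR followed by a population count. Below is a minimal sketch of that approach. It assumes each fingerprint is stored as 157 `uint64` words, the layout given in the memory section. It illustrates the technique and is not LadybugDB's SIMD kernel.

```python
import numpy as np

WORDS = 157  # assumed layout: 157 x 64-bit words = 1,256 bytes per fingerprint


def hamming_distances(query: np.ndarray, corpus: np.ndarray) -> np.ndarray:
    """Hamming distance from one packed fingerprint to every corpus row.

    query: shape (WORDS,), dtype uint64; corpus: shape (n, WORDS), dtype uint64.
    """
    xored = np.bitwise_xor(corpus, query)  # set bits mark positions that differ
    as_bytes = xored.view(np.uint8)        # reinterpret as (n, WORDS * 8) bytes
    # Expand bytes to bits and count the ones in each row (popcount)
    return np.unpackbits(as_bytes, axis=1).sum(axis=1, dtype=np.int64)


def top_k(query: np.ndarray, corpus: np.ndarray, k: int = 10) -> np.ndarray:
    """Indices of the k nearest corpus rows, closest first."""
    d = hamming_distances(query, corpus)
    idx = np.argpartition(d, k - 1)[:k]  # the k smallest distances, unordered
    return idx[np.argsort(d[idx], kind="stable")]


rng = np.random.default_rng(0)
corpus = rng.integers(0, np.iinfo(np.uint64).max, size=(1_000, WORDS),
                      dtype=np.uint64, endpoint=True)
print(top_k(corpus[0], corpus, k=5))  # row 0 is its own nearest neighbour
```

`np.unpackbits` expands every byte into eight, so this uses a lot of memory on large corpora. A byte-wise lookup table or `np.bitwise_count` (NumPy 2.0+) avoids that cost.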
5-hop path query:
| Nodes | Edges | LadybugDB | Neo4j | Speedup |
|---|---|---|---|---|
| 1K | 5K | 0.5 ms | 2 ms | 4x |
| 10K | 50K | 2 ms | 8 ms | 4x |
| 100K | 500K | 15 ms | 80 ms | 5x |
| 1M | 5M | 120 ms | 800 ms | 7x |
Note: Neo4j benchmark uses Enterprise Edition with warm cache.
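A 5-hop path query expands outward from a start node by at most five edges. The breadth-first sketch below illustrates that query shape over a plain adjacency dict and records the hop count at which each node is first reached. It is not how LadybugDB or Neo4j execute the query. The function name is hypothetical.

```python
from collections import deque


def k_hop_reachable(adj: dict[int, list[int]], start: int, max_hops: int = 5) -> dict[int, int]:
    """Return {node: hop_count} for every node reachable from start within max_hops edges."""
    depth = {start: 0}
    frontier = deque([start])
    while frontier:
        node = frontier.popleft()
        if depth[node] == max_hops:
            continue  # the hop bound: never expand beyond max_hops
        for nbr in adj.get(node, ()):
            if nbr not in depth:  # first visit is the shortest hop count in BFS
                depth[nbr] = depth[node] + 1
                frontier.append(nbr)
    return depth


adj = {0: [1, 2], 1: [3], 2: [3], 3: [4], 4: [5], 5: [6], 6: [7]}
print(k_hop_reachable(adj, 0, max_hops=5))
# {0: 0, 1: 1, 2: 1, 3: 2, 4: 3, 5: 4, 6: 5}; node 7 is 6 hops away and excluded
```

Cypher's variable-length patterns such as `[:KNOWS*1..5]` enumerate paths rather than distinct nodes. On dense graphs they can therefore do far more work than this reachability search. In both cases, the hop bound is what keeps the cost predictable.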
Encoding comparison (1M rows):
| Data Type | Encoding | Compression Ratio | Encode Time | Decode Time |
|---|---|---|---|---|
| String labels (100 unique) | Dictionary | 47x | 50 ms | 20 ms |
| Timestamps | Delta + FOR | 12x | 30 ms | 15 ms |
| Sparse flags (5% set) | RLE | 89x | 20 ms | 10 ms |
| Small integers (0-1000) | FOR | 4x | 25 ms | 12 ms |
| Random floats | Plain | 1x | 5 ms | 5 ms |
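Dictionary, RLE, delta, and frame-of-reference (FOR) are standard columnar encodings. The NumPy sketch below implements two of them to show why sparse flags and regularly spaced timestamps compress so well. The function names are hypothetical. The FOR bit width is computed but not actually bit-packed, so this is not LadybugDB's storage format.

```python
import numpy as np


def rle_encode(values: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Run-length encode a non-empty 1-D array into (run_values, run_lengths)."""
    change = np.concatenate(([True], values[1:] != values[:-1]))  # True where a run starts
    starts = np.flatnonzero(change)
    lengths = np.diff(np.append(starts, values.size))
    return values[starts], lengths


def rle_decode(run_values: np.ndarray, run_lengths: np.ndarray) -> np.ndarray:
    return np.repeat(run_values, run_lengths)


def delta_for_encode(values: np.ndarray) -> tuple[int, int, np.ndarray, int]:
    """Delta + frame-of-reference for an integer array with at least two elements.

    Returns (first value, reference delta, non-negative offsets, bits per offset).
    """
    deltas = np.diff(values)
    reference = int(deltas.min())                # the "frame of reference"
    offsets = deltas - reference                 # small, non-negative integers
    bit_width = int(offsets.max()).bit_length()  # bits each offset would need if packed
    return int(values[0]), reference, offsets, bit_width


def delta_for_decode(first: int, reference: int, offsets: np.ndarray) -> np.ndarray:
    return np.concatenate(([first], first + np.cumsum(offsets + reference)))


flags = np.arange(1_000) % 20 == 0  # sparse flags, 5% set
run_values, run_lengths = rle_encode(flags)
assert np.array_equal(rle_decode(run_values, run_lengths), flags)
print(len(run_lengths))  # 100 runs instead of 1,000 values

rng = np.random.default_rng(0)
# Millisecond timestamps roughly 1 s apart, with up to 49 ms of jitter
ts = 1_700_000_000_000 + np.arange(1_000) * 1_000 + rng.integers(0, 50, 1_000)
first, ref, offsets, bits = delta_for_encode(ts)
assert np.array_equal(delta_for_decode(first, ref, offsets), ts)
print(bits)  # bits per timestamp offset, instead of 64 per raw value
```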
Query latency:
| Query Type | p50 | p99 | Max |
|---|---|---|---|
| SQL simple filter | 0.5 ms | 2 ms | 10 ms |
| SQL aggregation | 2 ms | 8 ms | 50 ms |
| Cypher 1-hop | 1 ms | 5 ms | 20 ms |
| Cypher 3-hop | 5 ms | 20 ms | 100 ms |
| Resonance (10K corpus) | 0.2 ms | 0.5 ms | 2 ms |
| Resonance (1M corpus) | 15 ms | 25 ms | 50 ms |
| Vector ANN (10K corpus) | 1 ms | 3 ms | 10 ms |
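p50, p99, and max are order statistics over many timed executions of the same query. The sketch below collects them for any callable, using `time.perf_counter` and `numpy.percentile`. Here `run_query` is a hypothetical stand-in for whatever you want to measure. The warm-up loop keeps one-off costs, such as JIT compilation, out of the numbers.

```python
import time

import numpy as np


def latency_profile(run_query, n_runs: int = 1_000, warmup: int = 10) -> dict[str, float]:
    """Time run_query() n_runs times and return p50/p99/max latency in milliseconds."""
    for _ in range(warmup):  # exclude JIT compilation and cold caches from the stats
        run_query()
    samples_ms = np.empty(n_runs)
    for i in range(n_runs):
        t0 = time.perf_counter()
        run_query()
        samples_ms[i] = (time.perf_counter() - t0) * 1e3
    return {
        "p50": float(np.percentile(samples_ms, 50)),
        "p99": float(np.percentile(samples_ms, 99)),
        "max": float(samples_ms.max()),
    }


# Stand-in workload; replace it with the query you want to measure
print(latency_profile(lambda: sum(range(10_000))))
```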
```python
from ladybugdb.core import HammingEngine

# Create engine once at startup
engine = HammingEngine()

# Index your corpus
engine.index(corpus)  # Does JIT compilation on first call

# Warm up (triggers Numba compilation)
_ = engine.search(corpus[0], k=1)

# Now searches are fast
for query in queries:
    result = engine.search(query, k=10)  # ~150 μs each
```

```python
# ❌ Slow: Individual searches
for query in queries:
    result = engine.search(query, k=10)

# ✅ Fast: Batch search
results = engine.batch_search(queries, k=10)  # Vectorized
```

```python
from ladybugdb.core import HammingEngine, BufferPool

# For large batches, increase buffer pool
pool = BufferPool(max_batch=100_000)  # Default: 10_000
engine = HammingEngine(pool=pool)
```

```python
# ❌ Too permissive: Returns many results, slow post-processing
results = db.resonate(fp, threshold=0.3)

# ✅ Targeted: Fewer, more relevant results
results = db.resonate(fp, threshold=0.7, limit=10)
```

```python
# For frequently filtered columns, create explicit index
db.sql("CREATE INDEX idx_qidx ON nodes(qidx)")
db.sql("CREATE INDEX idx_label ON nodes(label)")
```

```python
# ❌ Selects all columns (including large fingerprint)
db.sql("SELECT * FROM nodes WHERE label = 'Thought'")

# ✅ Selects only needed columns
db.sql("SELECT id, content FROM nodes WHERE label = 'Thought'")
```

```python
# ❌ Unbounded: Can explode on dense graphs
db.cypher("MATCH (a)-[:KNOWS*]->(b) RETURN b")

# ✅ Bounded: Predictable performance
db.cypher("MATCH (a)-[:KNOWS*1..5]->(b) RETURN b")
```

```python
from ladybugdb.compat import ColumnStore

store = ColumnStore(chunk_size=10000)
store.add_column("timestamp", timestamps)
store.add_column("value", values)

# Only decompresses relevant chunks
chunks = store.get_chunks("value", min_val=100, max_val=200)
```

Each fingerprint uses 1,256 bytes (157 × 8 bytes).
| Corpus Size | Fingerprint Memory |
|---|---|
| 10K | 12 MB |
| 100K | 120 MB |
| 1M | 1.2 GB |
| 10M | 12 GB |
To reduce memory:
```python
# Use memory-mapped files for large corpora
engine = HammingEngine(mmap=True)
engine.index_from_file("corpus.npy")
```

If using dense embeddings alongside fingerprints:
| Corpus Size | Embeddings (1024-D) | Total (incl. fingerprints) |
|---|---|---|
| 10K | 40 MB | 52 MB |
| 100K | 400 MB | 520 MB |
| 1M | 4 GB | 5.2 GB |
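Both memory tables follow from per-vector sizes: 1,256 bytes per fingerprint, and 4,096 bytes per 1024-D embedding. The embedding figure assumes float32 storage (4 bytes per dimension), which matches the 40 MB figure for 10K vectors. The small estimator below works in decimal megabytes and counts raw vector data only. The tables round its results down slightly; for example, it gives 53.5 MB at 10K where the table shows 52 MB.

```python
FINGERPRINT_BYTES = 157 * 8  # 1,256 bytes per fingerprint
EMBEDDING_DIM = 1024
BYTES_PER_FLOAT = 4          # assumption: float32 embeddings


def memory_estimate_mb(n_vectors: int, with_embeddings: bool = True) -> dict[str, float]:
    """Raw vector storage in decimal MB (1 MB = 1e6 bytes), excluding indexes and overhead."""
    fp = n_vectors * FINGERPRINT_BYTES / 1e6
    emb = n_vectors * EMBEDDING_DIM * BYTES_PER_FLOAT / 1e6 if with_embeddings else 0.0
    return {"fingerprints_mb": fp, "embeddings_mb": emb, "total_mb": fp + emb}


for n in (10_000, 100_000, 1_000_000):
    est = memory_estimate_mb(n)
    print(f"{n:>9,}: {est['fingerprints_mb']:8.1f} MB fingerprints + "
          f"{est['embeddings_mb']:8.1f} MB embeddings = {est['total_mb']:8.1f} MB")
```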
Storage savings from encoding:
| Original | Encoded | Savings |
|---|---|---|
| 100 MB strings | 2 MB indices + 1 KB dict | 98% |
| 50 MB integers | 6 MB FOR-encoded | 88% |
| 200 MB timestamps | 16 MB delta-encoded | 92% |
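Dictionary encoding stores each distinct value once and replaces every occurrence with a small integer code. This is why low-cardinality string columns shrink so much. The NumPy sketch below uses `np.unique(..., return_inverse=True)`. The helper names are hypothetical, and the `uint8` codes assume at most 256 distinct values.

```python
import numpy as np


def dict_encode(labels: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Return (dictionary, codes) such that dictionary[codes] == labels."""
    dictionary, codes = np.unique(labels, return_inverse=True)
    if len(dictionary) > 256:
        raise ValueError("more than 256 distinct values; use a wider code dtype")
    return dictionary, codes.astype(np.uint8)


def dict_decode(dictionary: np.ndarray, codes: np.ndarray) -> np.ndarray:
    return dictionary[codes]


rng = np.random.default_rng(0)
vocab = np.array([f"label_{i:03d}" for i in range(100)])  # 100 unique labels
labels = vocab[rng.integers(0, 100, size=100_000)]

dictionary, codes = dict_encode(labels)
assert np.array_equal(dict_decode(dictionary, codes), labels)
print(labels.nbytes, codes.nbytes + dictionary.nbytes)  # raw vs. encoded bytes
```

NumPy stores fixed-width Unicode at 4 bytes per character. The absolute sizes printed here therefore differ from those of an on-disk string column, but the effect is the same: one byte per row plus a tiny dictionary.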
LadybugDB auto-detects CPU capabilities:
```python
from ladybugdb.core import get_simd_level

level = get_simd_level()
# Returns: 'avx512', 'avx2', 'sse4', or 'scalar'
```

```python
import numba

# Set thread count for parallel operations
numba.set_num_threads(8)  # Default: all cores

# For I/O-bound workloads, reduce threads
numba.set_num_threads(4)
```

Fingerprint comparison is memory-bound. For best performance:
- L3 Cache: Keep the working set smaller than the L3 cache (typically 32-64 MB)
- Prefetching: Sequential access lets the CPU's hardware prefetcher stream data ahead of use
- NUMA: Pin threads to cores on the same NUMA node as the memory they read
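For batch workloads, one way to apply the L3 guideline is loop tiling. Each cache-sized block of the corpus is compared against every query before moving to the next block, so the block is read from DRAM once and then reused from cache. The NumPy sketch below assumes 157-word `uint64` fingerprints and a 32 MB budget; both can be adjusted, and the function name is hypothetical. NumPy's temporary arrays limit the real gain, which a fused compiled kernel would avoid.

```python
import numpy as np

FINGERPRINT_BYTES = 157 * 8           # 1,256 bytes per packed fingerprint
L3_BUDGET_BYTES = 32 * 1024 * 1024    # assumption: tune to your CPU's L3 size
POPCOUNT8 = np.array([bin(i).count("1") for i in range(256)], dtype=np.uint8)  # bits set per byte value


def blocked_hamming(queries: np.ndarray, corpus: np.ndarray,
                    budget_bytes: int = L3_BUDGET_BYTES) -> np.ndarray:
    """All-pairs Hamming distances, shape (len(queries), len(corpus)), using corpus tiling.

    Blocks use a quarter of the budget because NumPy allocates temporaries
    (the XOR result and the popcount lookup) of roughly the same size as the block.
    """
    block_rows = max(1, budget_bytes // (4 * FINGERPRINT_BYTES))
    out = np.empty((len(queries), len(corpus)), dtype=np.int64)
    for start in range(0, len(corpus), block_rows):
        block = corpus[start:start + block_rows]
        for qi, q in enumerate(queries):  # reuse the cached block for every query
            diff_bytes = np.bitwise_xor(block, q).view(np.uint8)
            out[qi, start:start + len(block)] = POPCOUNT8[diff_bytes].sum(axis=1, dtype=np.int64)
    return out


rng = np.random.default_rng(0)
queries = rng.integers(0, 2**63, size=(8, 157), dtype=np.uint64)
corpus = rng.integers(0, 2**63, size=(20_000, 157), dtype=np.uint64)
dists = blocked_hamming(queries, corpus)  # shape (8, 20_000), scanned in 3 blocks
```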
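```python
# For NUMA systems. Set these before numba is imported: NUMBA_NUM_THREADS is
# read at import time, and OMP_PROC_BIND when the OpenMP runtime starts.
import os

os.environ["NUMBA_NUM_THREADS"] = "32"  # Cores on one socket (adjust to your machine)
os.environ["OMP_PROC_BIND"] = "close"   # Only affects Numba's OpenMP threading layer
```

```python
from ladybugdb import connect

db = connect("./mydb", profile=True)
result = db.resonate(fp, threshold=0.6)
print(db.last_profile)
# {
#   'parse_time_ms': 0.1,
#   'plan_time_ms': 0.2,
#   'execute_time_ms': 0.15,
#   'total_time_ms': 0.45,
#   'rows_scanned': 10000,
#   'rows_returned': 42
# }
```

```python
import os

# Enable Numba runtime (NRT) allocation counters; set before importing numba
os.environ["NUMBA_NRT_STATS"] = "1"

from numba.core.runtime import rtsys

# Run your code...

# View allocation statistics (allocations and frees made by compiled code)
print(rtsys.get_allocation_stats())
```

```python
from memory_profiler import profile

@profile
def search_benchmark():
    for _ in range(1000):
        engine.search(query, k=10)

search_benchmark()
```

| Corpus Size | RAM Required | Recommended Setup |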
|---|---|---|
| < 100K | 1 GB | Any modern laptop |
| 100K - 1M | 8 GB | Desktop with 16 GB |
| 1M - 10M | 32 GB | Server with 64 GB |
| 10M - 100M | 256 GB | High-memory server |
For corpora larger than 100M vectors, horizontal scaling is planned:
```
┌─────────────────────────────────────────────────┐
│                Query Coordinator                │
└────────────────────────┬────────────────────────┘
                         │
       ┌─────────────────┼─────────────────┐
       │                 │                 │
       ▼                 ▼                 ▼
┌─────────────┐   ┌─────────────┐   ┌─────────────┐
│   Shard 1   │   │   Shard 2   │   │   Shard 3   │
│   (0-33M)   │   │  (33M-66M)  │   │ (66M-100M)  │
└─────────────┘   └─────────────┘   └─────────────┘
```
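In a scatter-gather design like this, the coordinator sends the query to every shard, and each shard returns its local top-k. The coordinator then merges these lists; the global top-k is always contained in the union of the shard-local results. The sketch below is a toy version of that merge, not LadybugDB's planned implementation. `make_toy_shard` and `coordinator_search` are hypothetical, shards are in-process callables, and threads only stand in for concurrent network calls.

```python
import heapq
import random
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical shard interface: (query, k) -> local top-k as (distance, global_id) pairs
Shard = Callable[[int, int], list[tuple[int, int]]]


def make_toy_shard(first_id: int, fingerprints: list[int]) -> Shard:
    """Hypothetical in-memory shard holding integer fingerprints for a contiguous ID range."""
    def search(query: int, k: int) -> list[tuple[int, int]]:
        hits = ((bin(fp ^ query).count("1"), first_id + i) for i, fp in enumerate(fingerprints))
        return heapq.nsmallest(k, hits)
    return search


def coordinator_search(shards: list[Shard], query: int, k: int = 10) -> list[tuple[int, int]]:
    # Scatter: every shard computes its local top-k concurrently
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partial_results = list(pool.map(lambda shard: shard(query, k), shards))
    # Gather: merge the local lists and keep the k overall smallest
    return heapq.nsmallest(k, (hit for part in partial_results for hit in part))


random.seed(0)
data = [random.getrandbits(64) for _ in range(300)]
shards = [make_toy_shard(start, data[start:start + 100]) for start in (0, 100, 200)]
print(coordinator_search(shards, query=data[42], k=3))  # (0, 42) comes first
```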
| Metric | LadybugDB | Pinecone |
|---|---|---|
| Latency (10K) | 0.15 ms | 50 ms |
| Cost | Free | $70/mo |
| Self-hosted | ✅ | ❌ |
| Binary fingerprints | ✅ Native | ❌ |
| Graph queries | ✅ Cypher | ❌ |
| Metric | LadybugDB | Neo4j |
|---|---|---|
| 5-hop query (100K nodes) | 15 ms | 80 ms |
| Resonance search | ✅ Native | ❌ |
| SQL analytics | ✅ DuckDB | ❌ |
| Memory footprint | 500 MB | 4 GB |
| Metric | LadybugDB | Redis Stack |
|---|---|---|
| Hamming distance | 65M/sec | ~1M/sec |
| Persistence | ✅ Lance | ✅ RDB/AOF |
| Graph traversal | ✅ CTE | ❌ |
| Compression | ✅ Auto | ❌ |
Run the benchmarks yourself: `python -m ladybugdb.benchmarks`