A high-performance B+ tree implementation for Python with competitive performance against highly optimized libraries like SortedDict.
Option 1: Install from source (current)
git clone https://github.com/KentBeck/BPlusTree.git
cd BPlusTree/python
pip install -e .Option 2: Install from PyPI (coming soon)
pip install bplustree- Python 3.8 or higher
- C compiler (for C extension, optional)
The library automatically selects the best available implementation:
- C Extension (preferred): 2-4x faster, used automatically if available
- Pure Python: Fallback implementation, no compilation required
Check which implementation is being used:
from bplustree import get_implementation
print(get_implementation()) # "C extension" or "Pure Python"from bplustree import BPlusTreeMap
# Create a B+ tree
tree = BPlusTreeMap(capacity=128) # Higher capacity = better performance
# Insert data
tree[1] = "one"
tree[3] = "three"
tree[2] = "two"
# Lookups
print(tree[2]) # "two"
print(len(tree)) # 3
print(2 in tree) # True
# Range queries
for key, value in tree.range(1, 3):
print(f"{key}: {value}")
# Iteration
for key, value in tree.items():
print(f"{key}: {value}")Our benchmarks against SortedDict show significant advantages in specific scenarios:
| Scenario | B+ Tree Advantage | Use Cases |
|---|---|---|
| Partial Range Scans | Up to 2.5x faster | Database LIMIT queries, pagination |
| Large Dataset Iteration | 1.1x - 1.4x faster | Data export, bulk processing |
| Medium Range Queries | 1.4x faster | Time-series analysis, batch processing |
Partial Range Scans (Early Termination):
Limit 10 items: B+ Tree 1.18x faster
Limit 50 items: B+ Tree 2.50x faster ⭐ Best performance
Limit 100 items: B+ Tree 1.52x faster
Limit 500 items: B+ Tree 1.15x faster
Large Dataset Iteration:
200K items: B+ Tree 1.29x faster
300K items: B+ Tree 1.12x faster
500K items: B+ Tree 1.39x faster ⭐ Scales well
Optimal Configuration:
- Capacity 128 provides best performance (3.3x faster than capacity 4)
- Performance continues improving with larger capacities
Excellent for:
- Database-like workloads with range queries
- Analytics dashboards ("top 100 users")
- Search systems with pagination
- Time-series data processing
- Data export and ETL operations
- Any scenario with "LIMIT" or early termination patterns
Use SortedDict when:
- Random access dominates (37x faster individual lookups)
- Small datasets (< 100K items)
- Memory efficiency is critical
- General-purpose sorted container needs
# Small capacity: More splits, good for testing
tree = BPlusTree(capacity=4)
# Medium capacity: Balanced performance
tree = BPlusTree(capacity=16)
# Large capacity: Optimal for most use cases
tree = BPlusTree(capacity=128) # Recommended!# Run tests
python -m pytest tests/
# Run performance benchmarks
python tests/test_performance_vs_sorteddict.py
# Run specific tests
python -m pytest tests/test_bplustree.py -vtree = BPlusTree(capacity=128)
# Dictionary-like interface
tree[key] = value
value = tree[key] # Raises KeyError if not found
del tree[key] # Raises KeyError if not found
key in tree # Returns bool
len(tree) # Returns int
# Safe operations
tree.get(key, default=None)
tree.pop(key, default=None)# Full iteration
for key, value in tree.items():
pass
for key in tree.keys():
pass
for value in tree.values():
pass
# Range queries
for key, value in tree.range(start_key, end_key):
pass
# Range with None bounds
for key, value in tree.range(start_key, None): # From start_key to end
pass
for key, value in tree.range(None, end_key): # From beginning to end_key
passThe C extension provides iterator safety to prevent segmentation faults during tree modifications:
tree = BPlusTree(capacity=128)
for i in range(10):
tree[i] = f"value_{i}"
# Create iterator
keys_iter = tree.keys()
first_key = next(keys_iter)
# Modify tree during iteration
tree[100] = "new_value"
# Iterator detects modification and raises RuntimeError
try:
next(keys_iter)
except RuntimeError as e:
print(e) # "tree changed size during iteration"Safety Features:
- Modification detection: Iterators track tree changes via internal counter
- Graceful failure: RuntimeError instead of segmentation fault
- Multiple iterator support: All active iterators are invalidated on modification
- Consistent behavior: Matches Python's dict iterator safety model
Safe Patterns:
# ✅ Safe: Complete iteration before modification
keys = list(tree.keys()) # Collect all keys first
for key in keys:
tree[key] = new_value
# ✅ Safe: Use fresh iterator after modifications
tree[new_key] = new_value
for key, value in tree.items(): # New iterator, safe to use
process(key, value)- Arena-based memory management for efficiency
- Linked leaf nodes for fast sequential access
- Optimized rebalancing algorithms
- Hybrid navigation for range queries
- Iterator safety with modification counter tracking
- API Reference - Complete API documentation
- Examples - Comprehensive usage examples:
- Basic Usage - Fundamental operations
- Range Queries - Range query patterns
- Performance Demo - Benchmarks vs alternatives
- Migration Guide - Migrating from dict/SortedDict
- Main Project - Dual Rust/Python implementation
- Rust Implementation - Core Rust library
- Technical Documentation - Architecture and benchmarks
This project is licensed under the MIT License - see the LICENSE file for details.