Name	Name	Last commit message	Last commit date
parent directory ..
benchmarks	benchmarks
bplustree	bplustree
bplustree_c_src	bplustree_c_src
docs	docs
examples	examples
tests	tests
tmp	tmp
CHANGELOG.md	CHANGELOG.md
LICENSE	LICENSE
MANIFEST.in	MANIFEST.in
README.md	README.md
conftest.py	conftest.py
coverage.xml	coverage.xml
py.typed	py.typed
pyproject.toml	pyproject.toml
setup.py	setup.py

BPlusTree - Python Implementation

A high-performance B+ tree implementation for Python with competitive performance against highly optimized libraries like SortedDict.

🚀 Quick Start

Installation

Option 1: Install from source (current)

git clone https://github.com/KentBeck/BPlusTree.git
cd BPlusTree/python
pip install -e .

Option 2: Install from PyPI (coming soon)

pip install bplustree

Requirements

Python 3.8 or higher
C compiler (for C extension, optional)

Implementation Selection

The library automatically selects the best available implementation:

C Extension (preferred): 2-4x faster, used automatically if available
Pure Python: Fallback implementation, no compilation required

Check which implementation is being used:

from bplustree import get_implementation
print(get_implementation())  # "C extension" or "Pure Python"

📖 Basic Usage

from bplustree import BPlusTreeMap

# Create a B+ tree
tree = BPlusTreeMap(capacity=128)  # Higher capacity = better performance

# Insert data
tree[1] = "one"
tree[3] = "three"
tree[2] = "two"

# Lookups
print(tree[2])        # "two"
print(len(tree))      # 3
print(2 in tree)      # True

# Range queries
for key, value in tree.range(1, 3):
    print(f"{key}: {value}")

# Iteration
for key, value in tree.items():
    print(f"{key}: {value}")

⚡ Performance Highlights

Our benchmarks against SortedDict show significant advantages in specific scenarios:

🏆 Where B+ Tree Excels

Scenario	B+ Tree Advantage	Use Cases
Partial Range Scans	Up to 2.5x faster	Database LIMIT queries, pagination
Large Dataset Iteration	1.1x - 1.4x faster	Data export, bulk processing
Medium Range Queries	1.4x faster	Time-series analysis, batch processing

📊 Benchmark Results

Partial Range Scans (Early Termination):

Limit  10 items: B+ Tree 1.18x faster
Limit  50 items: B+ Tree 2.50x faster  ⭐ Best performance
Limit 100 items: B+ Tree 1.52x faster
Limit 500 items: B+ Tree 1.15x faster

Large Dataset Iteration:

200K items: B+ Tree 1.29x faster
300K items: B+ Tree 1.12x faster
500K items: B+ Tree 1.39x faster  ⭐ Scales well

Optimal Configuration:

Capacity 128 provides best performance (3.3x faster than capacity 4)
Performance continues improving with larger capacities

🎯 When to Choose B+ Tree

Excellent for:

Database-like workloads with range queries
Analytics dashboards ("top 100 users")
Search systems with pagination
Time-series data processing
Data export and ETL operations
Any scenario with "LIMIT" or early termination patterns

Use SortedDict when:

Random access dominates (37x faster individual lookups)
Small datasets (< 100K items)
Memory efficiency is critical
General-purpose sorted container needs

🔧 Configuration

# Small capacity: More splits, good for testing
tree = BPlusTree(capacity=4)

# Medium capacity: Balanced performance
tree = BPlusTree(capacity=16)

# Large capacity: Optimal for most use cases
tree = BPlusTree(capacity=128)  # Recommended!

🧪 Testing

# Run tests
python -m pytest tests/

# Run performance benchmarks
python tests/test_performance_vs_sorteddict.py

# Run specific tests
python -m pytest tests/test_bplustree.py -v

📖 API Reference

Basic Operations

tree = BPlusTree(capacity=128)

# Dictionary-like interface
tree[key] = value
value = tree[key]        # Raises KeyError if not found
del tree[key]           # Raises KeyError if not found
key in tree             # Returns bool
len(tree)               # Returns int

# Safe operations
tree.get(key, default=None)
tree.pop(key, default=None)

Iteration and Ranges

# Full iteration
for key, value in tree.items():
    pass

for key in tree.keys():
    pass

for value in tree.values():
    pass

# Range queries
for key, value in tree.range(start_key, end_key):
    pass

# Range with None bounds
for key, value in tree.range(start_key, None):  # From start_key to end
    pass

for key, value in tree.range(None, end_key):    # From beginning to end_key
    pass

🔒 Iterator Safety

The C extension provides iterator safety to prevent segmentation faults during tree modifications:

tree = BPlusTree(capacity=128)
for i in range(10):
    tree[i] = f"value_{i}"

# Create iterator
keys_iter = tree.keys()
first_key = next(keys_iter)

# Modify tree during iteration
tree[100] = "new_value"

# Iterator detects modification and raises RuntimeError
try:
    next(keys_iter)
except RuntimeError as e:
    print(e)  # "tree changed size during iteration"

Safety Features:

Modification detection: Iterators track tree changes via internal counter
Graceful failure: RuntimeError instead of segmentation fault
Multiple iterator support: All active iterators are invalidated on modification
Consistent behavior: Matches Python's dict iterator safety model

Safe Patterns:

# ✅ Safe: Complete iteration before modification
keys = list(tree.keys())  # Collect all keys first
for key in keys:
    tree[key] = new_value

# ✅ Safe: Use fresh iterator after modifications
tree[new_key] = new_value
for key, value in tree.items():  # New iterator, safe to use
    process(key, value)

🏗️ Architecture

Arena-based memory management for efficiency
Linked leaf nodes for fast sequential access
Optimized rebalancing algorithms
Hybrid navigation for range queries
Iterator safety with modification counter tracking

📚 Documentation & Examples

API Reference - Complete API documentation
Examples - Comprehensive usage examples:
- Basic Usage - Fundamental operations
- Range Queries - Range query patterns
- Performance Demo - Benchmarks vs alternatives
- Migration Guide - Migrating from dict/SortedDict

🔗 Links

Main Project - Dual Rust/Python implementation
Rust Implementation - Core Rust library
Technical Documentation - Architecture and benchmarks

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

BPlusTree - Python Implementation

🚀 Quick Start

Installation

Requirements

Implementation Selection

📖 Basic Usage

⚡ Performance Highlights

🏆 Where B+ Tree Excels

📊 Benchmark Results

🎯 When to Choose B+ Tree

🔧 Configuration

🧪 Testing

📖 API Reference

Basic Operations

Iteration and Ranges

🔒 Iterator Safety

🏗️ Architecture

📚 Documentation & Examples

🔗 Links

📄 License

FilesExpand file tree

python

Directory actions

More options

Directory actions

More options

Latest commit

History

python

Folders and files

parent directory

README.md

BPlusTree - Python Implementation

🚀 Quick Start

Installation

Requirements

Implementation Selection

📖 Basic Usage

⚡ Performance Highlights

🏆 Where B+ Tree Excels

📊 Benchmark Results

🎯 When to Choose B+ Tree

🔧 Configuration

🧪 Testing

📖 API Reference

Basic Operations

Iteration and Ranges

🔒 Iterator Safety

🏗️ Architecture

📚 Documentation & Examples

🔗 Links

📄 License