This repo contains a toy database implementation, meant for experimenting with concepts described in the book Designing Data-Intensive Applications.
Implement an LSM-Tree containing:
- SET method.
- GET method.
- DELETE method.
- Memtable.
- SSTable.
- `flush_memtable` method.
- WAL (Write Ahead Log) SET method.
- WAL (Write Ahead Log) DELETE method.
- WAL (Write Ahead Log) `recover` method.
- SSTable `compact` method.
- Bloom Filter.
- gRPC server.
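The list above can be illustrated with a minimal in-memory sketch of how SET, GET and DELETE interact with a memtable and SSTables. It is not the repo's `src.lsm_tree` implementation: the class `ToyLSMTree`, the `memtable_limit` flush threshold and the tombstone sentinel are assumptions for illustration, SSTables are kept as sorted in-memory lists rather than files, and the WAL and Bloom filter are left out.

```python
import bisect
from typing import Optional

# Hypothetical sketch, not the repo's src.lsm_tree.LSMTree.
# Sentinel stored on DELETE so that older values in SSTables are shadowed.
TOMBSTONE = object()


class SSTable:
    """Immutable sorted run of key/value pairs (kept in memory here, not on disk)."""

    def __init__(self, items):
        pairs = sorted(items, key=lambda kv: kv[0])
        self.keys = [k for k, _ in pairs]
        self.values = [v for _, v in pairs]

    def lookup(self, key: str):
        # Binary search over the sorted keys; returns (found, value).
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return True, self.values[i]
        return False, None


class ToyLSMTree:
    def __init__(self, memtable_limit: int = 4):
        self.memtable = {}  # most recent writes, mutable
        self.sstables = []  # oldest first, newest last
        self.memtable_limit = memtable_limit

    def _put(self, key: str, value: object) -> None:
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush_memtable()

    def set(self, key: str, value: str) -> None:
        self._put(key, value)

    def delete(self, key: str) -> None:
        # A delete is just a write of a tombstone.
        self._put(key, TOMBSTONE)

    def get(self, key: str) -> Optional[str]:
        if key in self.memtable:
            value = self.memtable[key]
            return None if value is TOMBSTONE else value
        # Newer SSTables shadow older ones, so search newest first.
        for table in reversed(self.sstables):
            found, value = table.lookup(key)
            if found:
                return None if value is TOMBSTONE else value
        return None  # key not found anywhere

    def flush_memtable(self) -> None:
        # Freeze the memtable into a new (newest) SSTable and start a fresh one.
        if self.memtable:
            self.sstables.append(SSTable(self.memtable.items()))
            self.memtable = {}

    def compact(self) -> None:
        # Merge all SSTables oldest to newest so newer entries win. Tombstones
        # can be dropped here because every SSTable takes part in the merge.
        merged = {}
        for table in self.sstables:
            merged.update(zip(table.keys, table.values))
        live = {k: v for k, v in merged.items() if v is not TOMBSTONE}
        self.sstables = [SSTable(live.items())] if live else []
```

For example, with `memtable_limit=2`, two SETs trigger a flush, a following DELETE only writes a tombstone to the memtable, and `get` on the deleted key returns `None` even though the old value still sits in an SSTable until `compact` runs.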
Implement distributed features:
- Multi-partition support.
- Partition selection algorithm that works for any key.
- Fault tolerance in case one partition is down.
- New node addition and rebalancing.
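The README does not say which partitioning scheme the repo uses. One common way to get partition selection that works for any key, and that moves little data when a node joins or leaves, is consistent hashing, sketched below. `HashRing`, `node_for`, the node names and the 64 virtual nodes per partition are all hypothetical choices made for illustration.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    # Stable across processes (unlike the built-in hash() for str).
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring mapping any key to one partition."""

    def __init__(self, nodes=(), vnodes: int = 64):
        self.vnodes = vnodes
        self._points = []  # sorted hash positions on the ring
        self._owners = {}  # position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each node gets several virtual positions to spread load evenly.
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            self._owners[point] = node
            bisect.insort(self._points, point)

    def remove_node(self, node: str) -> None:
        self._points = [p for p in self._points if self._owners[p] != node]
        self._owners = {p: n for p, n in self._owners.items() if n != node}

    def node_for(self, key: str) -> str:
        if not self._points:
            raise LookupError("no partitions available")
        # First ring position clockwise from the key's hash, wrapping around.
        i = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owners[self._points[i]]
```

Removing a node from the ring (for example when a partition is down) sends only that node's keys to the next position clockwise, and adding a node only takes over the keys that now land on its own positions, which keeps rebalancing incremental.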
Monitor the following metrics:
- Latency of SET, GET and DELETE.
- Latency of gRPC requests.
- Memtable memory usage.
- SSTable memory usage.
- SSTable index memory usage.
- `flush_memtable` latency.
- SSTable `compact` latency.
- Latency to return `None` if a key is not found.
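The last metric, the latency of returning `None` for a missing key, is where a Bloom filter usually helps: if a per-SSTable filter says a key is definitely absent, that SSTable can be skipped without a lookup. Below is a minimal sketch; the `BloomFilter` class, its `might_contain` method, the SHA-256 double-hashing scheme and the 1% default false-positive rate are assumptions, not the repo's code.

```python
import hashlib
import math


class BloomFilter:
    """Hypothetical bit-array Bloom filter: no false negatives, tunable false positives."""

    def __init__(self, expected_items: int, fp_rate: float = 0.01):
        n = max(1, expected_items)
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.size = max(1, math.ceil(-n * math.log(fp_rate) / (math.log(2) ** 2)))
        self.num_hashes = max(1, round(self.size / n * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, key: str):
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step
        # Double hashing: derive k bit positions from two base hashes.
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.size

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))
```

In a GET path, a check like `if not table_filter.might_contain(key): continue` would skip the SSTable's binary search entirely, so a missing key only pays for the occasional false positive.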
To set up a local virtual environment and install the dependencies:

```shell
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

To run in server mode:
```shell
make server
```

To run inside a Docker container with monitoring via Prometheus and visualization with Grafana, run:
```shell
docker compose up -d
```

Alternatively, you can access it via the Python REPL:
```
python -i
>>> from src.lsm_tree import LSMTree
>>> db = LSMTree()
>>> db.get("user:123")
```

To populate a local database with data, run:
```shell
make populate
```

You can clean all local data using:
```shell
make clean
```

To run the unit tests:
```shell
make test-unit
```

To run the load tests, first install the k6 tool, then run:
```shell
# To run only the test for the SET operation.
k6 run test/load/set_load_test.js
# To run only the test for the GET operation.
k6 run test/load/get_load_test.js
```