This project runs reproducible streaming-ingestion benchmarks across multiple
vector databases using VectorDBBench's StreamingPerformanceCase.
The benchmark focuses on a common production pattern: vectors are inserted over time while the system is also serving searches. Instead of measuring only a fully-loaded, read-only index, the script evaluates how each database behaves while data is still being written.
Vector databases often look different under streaming workloads than under offline bulk-load workloads. During continuous ingestion, a database must balance index construction, write throughput, search latency, recall, memory use, and background compaction or persistence work.
This repository wraps VectorDBBench with a simple runner that:
- Downloads the selected VectorDBBench dataset before the run.
- Deploys and health-checks each selected database.
- Runs one database at a time to reduce resource interference.
- Stops each database after its benchmark finishes.
- Collects result JSON files and prints a compact comparison table.
- Saves a CSV summary under /tmp/vectordb_bench_results.
The current benchmark uses HNSW-style index parameters across supported databases where possible:
- M = 16
- ef_construction = 256
- ef_search = 200
The default streaming case uses:
- insert_rate = 500
- search_stages = [0.5, 0.8]
- concurrencies = [5, 10]
- read_dur_after_write = 30
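To make these settings concrete, the sketch below estimates when each search stage would be reached at the configured insert rate. It assumes that insert_rate is measured in rows per second and that each entry in search_stages is the fraction of the dataset already inserted. The 1,000,000-row dataset size and the stage_schedule helper are illustrative assumptions, not values or functions taken from VectorDBBench.

```python
def stage_schedule(total_rows, insert_rate, search_stages):
    """Return (stage, rows_inserted, seconds_elapsed) for each search stage.

    Assumes insert_rate is rows per second and each stage is a fraction
    of the dataset that has been inserted when searches begin.
    """
    schedule = []
    for stage in search_stages:
        rows = round(total_rows * stage)
        seconds = rows / insert_rate
        schedule.append((stage, rows, seconds))
    return schedule


if __name__ == "__main__":
    TOTAL_ROWS = 1_000_000  # assumed dataset size, for illustration only
    for stage, rows, seconds in stage_schedule(TOTAL_ROWS, 500, [0.5, 0.8]):
        print(f"stage {stage:.0%}: {rows} rows after about {seconds / 60:.1f} min")
```

Under these assumptions, the time to reach a stage grows linearly with dataset size, which is why larger datasets take noticeably longer to run.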
The runner currently supports:
- SeekDB
- Elasticsearch
- Milvus
- Chroma
- Qdrant
- LanceDB
Deployment helpers live in deploy/. The runner checks whether a database is
already healthy before starting it. If the database is not healthy and a deploy
script exists, the runner executes the matching deploy script automatically.
LanceDB is embedded and does not require an external service.
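The check-then-deploy behavior described above can be sketched as follows. This is a minimal illustration, not the code in run_bench.py: ensure_healthy and its retries and delay parameters are hypothetical names, and the health check and deploy step are passed in as callables so the logic can be exercised without a real service.

```python
import time


def ensure_healthy(is_healthy, deploy=None, retries=30, delay=2.0):
    """Return True once is_healthy() passes, deploying first if needed."""
    if is_healthy():
        return True  # already running: skip deployment
    if deploy is None:
        return False  # unhealthy and no deploy step is available
    deploy()
    # Poll until the service reports healthy or the retries run out.
    for _ in range(retries):
        if is_healthy():
            return True
        time.sleep(delay)
    return False
```

In practice the deploy callable could wrap a call such as `subprocess.run(["bash", "deploy/deploy_qdrant.sh"], check=True)`, and an embedded database such as LanceDB would use a health check that always returns True.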
Use Linux for full benchmark execution. The deployment scripts assume common
Linux tools and services such as systemctl, curl, wget, tar, yum, and
mysql.
Python requirements:
- Python 3.11 or newer
- A virtual environment
- Project dependencies installed from pyproject.toml
System requirements:
- Sufficient memory for database services. Elasticsearch is configured with a 30 GB heap in deploy/deploy_elasticsearch.sh.
- Sufficient disk space for datasets, database data, and result files.
- Container support for SeekDB. The script uses the docker command. On some Alibaba Cloud Linux images, docker may be provided by podman-docker, which is also acceptable for the current deployment script.
- Network access for downloading Python packages, benchmark datasets, and database binaries or images.
Clone or copy the project to the target Linux machine, then create a Python 3.11 virtual environment from the project root:
cd /root/vdb-bench
python3.11 -m venv .venv
source .venv/bin/activate
python -m pip install -U pip
python -m pip install -e .
Do not use the system Python if it is older than Python 3.11. For example, some Linux distributions still provide Python 3.6 as /usr/bin/python.
This project pins a compatible OpenTelemetry/protobuf dependency set for Chroma, Milvus, and the current VectorDBBench dependency graph.
The important compatibility constraints are:
- protobuf>=5.27.2,<7
- opentelemetry-api==1.41.1
- opentelemetry-sdk==1.41.1
- opentelemetry-proto==1.41.1
- opentelemetry-exporter-otlp-proto-grpc==1.41.1
Chroma also requires SQLite 3.35 or newer. Many enterprise Linux systems ship an
older system SQLite. The project depends on pysqlite3-binary, and
run_bench.py swaps it in before importing Chroma-related modules.
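The swap can be sketched as below. This follows the commonly used pattern of registering pysqlite3 under the name sqlite3 in sys.modules before anything imports Chroma. The prefer_pysqlite3 helper is a hypothetical name and the exact code in run_bench.py may differ. The sketch falls back to the standard module when pysqlite3 is not installed.

```python
import sys


def prefer_pysqlite3():
    """Register pysqlite3 as sqlite3 if it is installed; report whether it was."""
    try:
        import pysqlite3
    except ImportError:
        return False  # keep the standard library sqlite3
    # Later "import sqlite3" statements in this process now get pysqlite3.
    sys.modules["sqlite3"] = pysqlite3
    return True


swapped = prefer_pysqlite3()
import sqlite3  # resolves to pysqlite3 when the swap succeeded

print("pysqlite3 in use:", swapped)
print("SQLite library version:", sqlite3.sqlite_version)
print("meets Chroma minimum (3.35):", sqlite3.sqlite_version_info >= (3, 35, 0))
```

The swap only affects the current Python process, so it has to run before the first import of any module that loads sqlite3 on Chroma's behalf.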
Run all supported databases on the default dataset:
cd /root/vdb-bench
source .venv/bin/activate
python run_bench.py
Run all supported databases on the medium Cohere dataset:
python run_bench.py \
-d seekdb elasticsearch milvus chroma qdrant lancedb \
--dataset CohereMedium
Run only one database:
python run_bench.py -d qdrant --dataset CohereSmall
Run a subset of databases:
python run_bench.py -d milvus qdrant lancedb --dataset CohereMedium
Show command-line help:
python run_bench.py --help
Supported dataset names:
- CohereSmall
- CohereMedium
- CohereLarge
The runner first attempts to prepare the dataset from S3, then falls back to Aliyun OSS if the S3 download fails.
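The source fallback can be sketched as an ordered list of download attempts. This is an illustration only: fetch_with_fallback and the callables passed to it are hypothetical, and the real runner delegates the actual download to VectorDBBench.

```python
def fetch_with_fallback(sources):
    """Try each (name, download) pair in order; return the first name that succeeds."""
    errors = []
    for name, download in sources:
        try:
            download()
            return name
        except Exception as exc:  # any failure moves on to the next source
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all dataset sources failed: " + "; ".join(errors))
```

A caller would pass the S3 download first and the Aliyun OSS download second, so OSS is only contacted when S3 fails.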
The benchmark flow is:
- Load environment variables from .env if present.
- Download or prepare the selected dataset.
- For each selected database:
  - Check whether the database is already healthy.
  - Deploy it if needed.
  - Build the VectorDBBench task config.
  - Run the streaming performance benchmark.
  - Wait until the benchmark finishes.
  - Stop the database service or container.
- Read VectorDBBench result JSON files.
- Print an 80% stage comparison table.
- Save a CSV summary.
Each database is benchmarked sequentially so that one database does not consume CPU, memory, or IO during another database's run.
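A minimal sketch of this sequential loop is shown below. The run_sequentially function and its start, benchmark, and stop callables are hypothetical stand-ins for the runner's own steps. Recording a failure and continuing with the next database is a design choice of this sketch, not confirmed behavior of run_bench.py.

```python
def run_sequentially(databases, start, benchmark, stop):
    """Benchmark one database at a time, always stopping it afterwards."""
    results = {}
    for name in databases:
        start(name)
        try:
            results[name] = benchmark(name)
        except Exception as exc:
            results[name] = exc  # record the failure and keep going
        finally:
            stop(name)  # free CPU, memory, and I/O before the next database
    return results
```

The try/finally block is what guarantees that a failed benchmark cannot leave a service running and skew the next database's results.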
Most defaults are defined in run_bench.py. You can override service addresses
with environment variables or a .env file.
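The override mechanism can be sketched as a lookup that falls back to a built-in default when a variable is unset. The DEFAULTS mapping and load_settings function are hypothetical names. The variable names and default values shown are a subset of those documented in this README.

```python
import os

# Defaults used when the corresponding environment variable is not set.
DEFAULTS = {
    "MILVUS_URI": "http://localhost:19530",
    "ES_HOST": "localhost",
    "ES_PORT": "9200",
    "QDRANT_URL": "http://localhost:6333",
}


def load_settings(env=os.environ):
    """Return each setting from env, falling back to the default value."""
    return {key: env.get(key, default) for key, default in DEFAULTS.items()}


if __name__ == "__main__":
    for key, value in load_settings().items():
        print(f"{key}={value}")
```

Values read from the environment are always strings, so numeric settings such as ports need an explicit int() conversion where they are used.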
Supported environment variables include:
MILVUS_URI=http://localhost:19530
ES_HOST=localhost
ES_PORT=9200
ES_PASSWORD=unused
QDRANT_URL=http://localhost:6333
CHROMA_HOST=localhost
CHROMA_PORT=8000
LANCEDB_URI=./lancedb_data
SEEKDB_HOST=127.0.0.1
SEEKDB_PORT=2881
SEEKDB_USER=bench
SEEKDB_PASSWORD=bench123
SEEKDB_DATABASE=test
The script prints a table similar to:
=== Streaming Benchmark Results (80% stage) ===
The summary includes:
- Concurrent QPS at the 80% stage.
- Serial P99 latency at the 80% stage.
- Concurrent P99 latency at the 80% stage.
- Recall at the 80% stage.
CSV summaries are written to:
/tmp/vectordb_bench_results
The filename format is:
summary_streaming_YYYYMMDD_HHMMSS.csv
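The filename pattern can be reproduced with a timestamp format string, as in the sketch below. The summary_path and write_summary helpers are hypothetical, and the CSV columns are whatever keys the caller supplies. The real column names in run_bench.py are not shown here.

```python
import csv
from datetime import datetime
from pathlib import Path


def summary_path(results_dir, now=None):
    """Build summary_streaming_YYYYMMDD_HHMMSS.csv inside results_dir."""
    now = now or datetime.now()
    return Path(results_dir) / f"summary_streaming_{now:%Y%m%d_%H%M%S}.csv"


def write_summary(rows, results_dir, now=None):
    """Write a non-empty list of dicts as CSV and return the file path."""
    path = summary_path(results_dir, now)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
    return path
```

Because the timestamp has one-second resolution, two runs that finish within the same second would write to the same file.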
VectorDBBench also writes its raw result files under its configured local results
directory. run_bench.py reads those JSON files directly when building the final
summary.
The deployment scripts are:
- deploy/deploy_seekdb.sh
- deploy/deploy_elasticsearch.sh
- deploy/deploy_milvus.sh
- deploy/deploy_chroma.sh
- deploy/deploy_qdrant.sh
The runner maps databases to these scripts automatically. You can also execute a deployment script manually when debugging a single service:
bash deploy/deploy_qdrant.sh
Install the project dependencies inside the virtual environment:
source .venv/bin/activate
python -m pip install -e .
This usually means an incompatible protobuf and OpenTelemetry proto package combination is installed. Reinstall the project dependencies after pulling the latest pyproject.toml:
source .venv/bin/activate
python -m pip install -U -e .
You can verify the relevant versions with:
python -m pip show protobuf opentelemetry-proto opentelemetry-exporter-otlp-proto-grpc
Install dependencies from this project so that pysqlite3-binary is available:
source .venv/bin/activate
python -m pip install -e .
run_bench.py imports pysqlite3 before importing Chroma and replaces the standard sqlite3 module for the current process.
Some machines provide the docker command through podman-docker instead of a
Docker daemon managed by docker.service. Check the runtime with:
docker info
The SeekDB deploy script only needs the docker command to support container operations such as docker run, docker ps, docker rm, and docker stop.
Check the service manually:
curl http://localhost:9200
curl http://localhost:6333/readyz
curl http://localhost:8000/api/v2/heartbeat
curl http://localhost:19530/v1/vector/collections
mysql -h 127.0.0.1 -P 2881 -u bench -pbench123 -e "SELECT 1"
Then inspect the corresponding service logs or deployment logs.
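The HTTP checks above can also be scripted, which helps when probing several services in one pass. The sketch below uses only the Python standard library. The http_ok helper is a hypothetical name, the URLs are the ones listed above, and any non-2xx response (for example a 401 from a service with authentication enabled) is treated as unhealthy.

```python
import http.client
import urllib.error
import urllib.request

ENDPOINTS = {
    "elasticsearch": "http://localhost:9200",
    "qdrant": "http://localhost:6333/readyz",
    "chroma": "http://localhost:8000/api/v2/heartbeat",
    "milvus": "http://localhost:19530/v1/vector/collections",
}


def http_ok(url, timeout=3.0):
    """Return True if the URL answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Refused connections, timeouts, HTTP errors, and malformed URLs
        # all count as "not healthy".
        return False


if __name__ == "__main__":
    for name, url in ENDPOINTS.items():
        print(f"{name}: {'ok' if http_ok(url) else 'unreachable'}")
```

SeekDB is not covered here because it is checked over the MySQL protocol rather than HTTP.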
Longer datasets can take a while. To keep the benchmark running after the SSH session exits:
cd /root/vdb-bench
source .venv/bin/activate
nohup python run_bench.py \
-d seekdb elasticsearch milvus chroma qdrant lancedb \
--dataset CohereMedium \
> bench_CohereMedium.log 2>&1 &
Follow the log:
tail -f bench_CohereMedium.log
.
├── deploy/
│ ├── deploy_chroma.sh
│ ├── deploy_elasticsearch.sh
│ ├── deploy_milvus.sh
│ ├── deploy_qdrant.sh
│ ├── deploy_seekdb.sh
│ └── qdrant_config.yaml
├── pyproject.toml
├── README.md
└── run_bench.py
Benchmark results are sensitive to hardware, kernel settings, filesystem performance, memory pressure, service versions, and dataset download location. For fair comparisons, run all databases on the same machine with the same dataset and avoid other heavy workloads during the benchmark.
Because the runner starts and stops services sequentially, results are intended to compare each database under similar machine-level conditions rather than under multi-database contention.