A growing number of emerging database management systems (such as time-series and streaming databases) have been developed to support specialized workloads with enhanced performance and functionality. However, these systems are often less mature than traditional relational databases, making them more prone to logic bugs and internal errors that affect correctness and reliability.
ACME is a research tool that addresses this through LLM-assisted differential testing. The key insight is that many emerging database systems are conceptually extensions of relational databases, so bugs can be uncovered by comparing query results against a more mature relational database. To bridge the syntax and semantic differences between the two systems, ACME uses Large Language Models to automatically discover supported clauses and generate clause mappings that translate system-specific SQL features into equivalent standard SQL.
ACME proceeds in three steps:
- On-the-fly clause mapping: whenever differential testing encounters a query that fails to translate or execute on the relational side, ACME sends a clause-mapping request to an LLM to repair the translation.
- Mapping verification: generated mappings are validated by executing test queries; verification failures trigger re-prompting, and verified mappings are saved as reusable Python functions.
- Diverse query generation: semantically equivalent but syntactically diverse queries are generated to broaden differential testing coverage.
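The three steps above form a generate-execute-translate-compare loop. A minimal sketch of that loop is shown below; the function names and callback signatures are illustrative, not ACME's actual API:

```python
# Illustrative sketch of ACME's differential-testing loop.
# All names here are hypothetical, not ACME's real interfaces.

def run_differential_test(generate_query, run_emerging, run_relational,
                          translate, request_mapping_from_llm):
    sql = generate_query()                  # emerging-DB-specific SQL
    expected = run_emerging(sql)            # execute on the emerging DB
    try:
        translated = translate(sql)         # apply saved clause mappings
        actual = run_relational(translated) # execute on the relational DB
    except Exception as err:
        # Step 1: on failure, ask the LLM for a new clause mapping;
        # the mapping is then verified (step 2) and persisted for reuse.
        request_mapping_from_llm(sql, err)
        return None
    # A mismatch between the two result sets is a candidate logic bug.
    return expected == actual
```

The comparison result drives bug triage: `False` flags a potential logic bug, while an exception on the relational side feeds the mapping-generation loop instead.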
ACME was applied to four widely used emerging database systems, uncovering 59 previously unknown bugs (17 logic bugs, 42 internal errors), of which 52 have been fixed and 5 confirmed by vendors. Evaluation shows that ACME enhances LLM reliability through query verification, and remains effective under tight constraints, whether using local models or a limited online token budget.
This repository demonstrates ACME on QuestDB (emerging, time-series) vs PostgreSQL (mature, relational) as a concrete example.
```
QueryGenerator ──→ DifferentialTester ──→ QuestDB (emerging DB)
 (emerging DB            │          │
  SQL + schema)      CodeMapper   Results ──→ Comparator
                         │          │             ↑
                  (mappings/*.py)  PostgreSQL ────┘
                         ↑
                 LLM (on mapping failure)
```
- QueryGenerator (`questdb_generator.py`): Generates QuestDB-specific DDL/DML/SELECT covering a broad range of clauses and edge cases.
- CodeMapper (`differential_tester.py`): Applies a chain of Python `transform(sql)` functions from `mappings/` to translate QuestDB SQL into valid PostgreSQL SQL.
- DifferentialTester (`differential_tester.py`): Orchestrates the pipeline: generate → execute on QuestDB → translate via CodeMapper → execute on PostgreSQL → compare result sets.
- LLM Integration: On PostgreSQL execution failure, an LLM generates a new mapping function. Verified functions are saved to `mappings/` for reuse across all future queries.
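Each file in `mappings/` exposes a `transform(sql)` function. A hypothetical example in that style, rewriting QuestDB DDL into PostgreSQL-compatible DDL (illustrative only; the rules ACME actually learns will differ):

```python
# Hypothetical mappings/*.py rule: rewrite one QuestDB construct into
# PostgreSQL-compatible SQL. Illustrative, not an ACME-learned mapping.
import re

# QuestDB column types with no direct PostgreSQL counterpart.
_TYPE_MAP = {"LONG": "BIGINT", "STRING": "TEXT", "SYMBOL": "TEXT"}

def transform(sql: str) -> str:
    # Drop QuestDB's designated-timestamp clause, e.g. "... ) timestamp(ts)".
    sql = re.sub(r"\)\s*timestamp\s*\(\s*\w+\s*\)", ")", sql, flags=re.I)
    # Map QuestDB column types to their PostgreSQL equivalents.
    for qdb, pg in _TYPE_MAP.items():
        sql = re.sub(rf"\b{qdb}\b", pg, sql, flags=re.I)
    return sql
```

Because every rule is just a pure string-to-string function, CodeMapper can chain verified rules in sequence and apply them to any future query without further LLM calls.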
- Docker & Docker Compose
- Python 3.10+
- Ollama (for local LLM inference)
```bash
docker-compose up -d
```

- PostgreSQL: port `5432`
- QuestDB: port `8812` (PG wire protocol), `9000` (web console)
```bash
# Install Ollama from https://ollama.com/
ollama pull qwen2.5-coder:7b-instruct
```

```bash
pip install -r requirements.txt
```

ACME works best in two phases. The mapping library starts empty and needs to be built up before large-scale parallel testing is effective.
The LLM learns to translate QuestDB constructs into PostgreSQL by observing execution errors and generating Python transform rules. Run single-threaded so rule generation is not disrupted by concurrent execution.
```bash
python3 main.py test --count 0 --seed 42
```

With a specific LLM provider:

```bash
# Local (Ollama)
python3 main.py test --offline

# Remote (OpenAI)
python3 main.py test --online
```

Stop when the LLM call rate drops noticeably; this indicates that most common constructs are already covered by saved mappings. All verified mappings are persisted to `mappings/*.py` automatically.
Once a sufficient mapping library exists, disable the LLM and run with many parallel threads for high-throughput bug finding.
```bash
python3 main.py test --count 0 --seed 42 --no-llm --threads 16
```

Any query that fails to map cleanly is logged to `test_logs/results.txt` rather than retried with the LLM, keeping throughput high.
```bash
python3 main.py test [options]
```
| Flag | Default | Description |
|---|---|---|
| `--count N` | 50 | Queries to run; 0 = infinite |
| `--seed N` | 42 | Random seed for reproducibility |
| `--threads N` | 1 | Parallel worker threads |
| `--offline` | — | Use Ollama (`qwen2.5-coder:7b-instruct`) |
| `--online` | — | Use OpenAI (`gpt-4o`) |
| `--llm PROVIDER` | `ollama` | LLM provider: `ollama` or `openai` |
| `--llm-model MODEL` | `qwen2.5-coder:7b-instruct` | Model name |
| `--no-llm` | — | Disable LLM; use saved mappings only |
| `--max-prompts N` | — | Stop after N LLM mapping requests |
| `--verbose` | — | Enable debug-level logging |
The following flags alter testing behaviour and are intended for controlled experiments, not for regular bug-finding runs.
| Flag | Description |
|---|---|
| `--static-data` | Replace random inserts with a fixed 100-row dataset. Use when you need deterministic, reproducible results (e.g. comparing two mapping strategies or measuring LLM effectiveness). Random data generally gives better coverage. |
| `--ablation` | Skip LLM-generated rule verification before applying rules. Use only to measure the impact of the verification step in ablation studies. |
| `--no-strict` | Relax result-set comparison during rule verification. Use in controlled experiments where approximate equivalence is sufficient. |
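The `--no-strict` flag implies the comparator supports two strictness levels. A minimal sketch of what such a comparison might look like (`rows_match` and its tolerance are assumptions for illustration, not ACME's actual comparator):

```python
# Illustrative two-level result-set comparison; not ACME's implementation.
# "Strict" demands exact multiset equality; "relaxed" (cf. --no-strict)
# tolerates small floating-point drift between the two engines.
import math
from collections import Counter

def rows_match(a, b, strict=True, rel_tol=1e-9):
    if strict:
        # Exact, order-insensitive multiset equality over whole rows.
        return Counter(map(tuple, a)) == Counter(map(tuple, b))
    if len(a) != len(b):
        return False
    def cell_eq(x, y):
        # Floats computed by two different engines may differ in the
        # last bits; compare them with a relative tolerance instead.
        if isinstance(x, float) or isinstance(y, float):
            return math.isclose(float(x), float(y), rel_tol=rel_tol)
        return x == y
    return all(len(r) == len(s) and all(cell_eq(x, y) for x, y in zip(r, s))
               for r, s in zip(sorted(a), sorted(b)))
```

Under this sketch, `0.1 + 0.2` vs `0.3` would fail the strict check but pass the relaxed one, which is exactly the class of spurious mismatch `--no-strict` is meant to suppress.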
| Path | Contents |
|---|---|
| `mappings/*.py` | Learned translation rules (persistent across runs) |
| `test_logs/results.txt` | All mismatches and mapping failures |
| `test_logs/token_stats.txt` | Per-call and session LLM token usage and cost estimates |
| `repro_scripts/` | Standalone Python scripts to reproduce each finding |
| File | Role |
|---|---|
| `main.py` | CLI entry point; manages Docker containers |
| `differential_tester.py` | Core engine: CodeMapper + DifferentialTester |
| `questdb_generator.py` | QuestDB SQL generator (emerging DB example) |
| `mappings/*.py` | Learned QDB→PG translation rules |
| `docker-compose.yml` | PostgreSQL and QuestDB service definitions |
| `tests/` | Unit and pipeline integration test suite |
Find more details in our FSE 2026 research paper: ACME: Automated Clause Mapping Engine for Differential Testing of Emerging Database Systems