YuanchengJiang/ACME

ACME: Automated Clause Mapping Engine for Differential Testing of Emerging Database Systems

Overview

A growing number of emerging database management systems (such as time-series and streaming databases) have been developed to support specialized workloads with enhanced performance and functionality. However, these systems are often less mature than traditional relational databases, making them more prone to logic bugs and internal errors that affect correctness and reliability.

ACME is a research tool that addresses this through LLM-assisted differential testing. The key insight is that many emerging database systems are conceptually extensions of relational databases, so bugs can be uncovered by comparing query results against a more mature relational database. To bridge the syntax and semantic differences between the two systems, ACME uses Large Language Models to automatically discover supported clauses and generate clause mappings that translate system-specific SQL features into equivalent standard SQL.

ACME proceeds in three steps:

  1. On-the-fly clause mapping: whenever differential testing encounters an unsuccessful query, ACME constructs a clause-mapping request to an LLM to fix the translation.
  2. Mapping verification: generated mappings are validated by executing test queries; verification failures trigger re-prompting, and verified mappings are saved as reusable Python functions.
  3. Diverse query generation: semantically equivalent but syntactically diverse queries are generated to broaden differential testing coverage.
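The retry loop behind steps 1 and 2 can be sketched as follows. This is a simplified sketch; every function name here is an illustrative stand-in, not ACME's actual API:

```python
MAX_RETRIES = 3

def map_and_compare(query, run_emerging, run_relational, mappings, ask_llm, verify):
    """Differentially test one query, learning a new mapping on failure.

    run_emerging/run_relational execute SQL and return a result set (or
    raise on error); ask_llm returns a candidate transform(sql) function;
    verify checks a candidate against test queries. All of these are
    hypothetical stand-ins for ACME's internals.
    """
    expected = run_emerging(query)
    translated = query
    for t in mappings:                      # apply already-learned mappings
        translated = t(translated)
    for _ in range(MAX_RETRIES):
        try:
            actual = run_relational(translated)
        except Exception as err:
            # Step 1: on-the-fly clause-mapping request to the LLM
            candidate = ask_llm(translated, err)
            # Step 2: verify before accepting; re-prompt on failure
            if verify(candidate):
                mappings.append(candidate)  # persist for reuse
                translated = candidate(translated)
            continue
        return expected == actual           # True = results agree
    return None  # unresolved: logged rather than reported as a bug
```

A verified candidate is appended to the mapping library, so later queries hit the saved function instead of the LLM.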

ACME was applied to four widely used emerging database systems, uncovering 59 previously unknown bugs (17 logic bugs, 42 internal errors), of which 52 have been fixed and 5 confirmed by vendors. Evaluation shows that ACME enhances LLM reliability through query verification, and that it remains effective under tight constraints, whether using local models or a limited online token budget.

This repository demonstrates ACME on QuestDB (emerging, time-series) vs PostgreSQL (mature, relational) as a concrete example.


Architecture

QueryGenerator  ──→  DifferentialTester  ──→  QuestDB (emerging DB)
  (emerging DB             │                        │
    SQL + schema)     CodeMapper               Results  ──→  Comparator
                           │                        │              │
                      (mappings/*.py)          PostgreSQL ────────→
                           ↑
                    LLM (on mapping failure)
  • QueryGenerator (questdb_generator.py): Generates QuestDB-specific DDL/DML/SELECT covering a broad range of clauses and edge cases.
  • CodeMapper (differential_tester.py): Applies a chain of Python transform(sql) functions from mappings/ to translate QuestDB SQL into valid PostgreSQL SQL.
  • DifferentialTester (differential_tester.py): Orchestrates the pipeline: generate → execute on QuestDB → translate via CodeMapper → execute on PostgreSQL → compare result sets.
  • LLM Integration: On PostgreSQL execution failure, an LLM generates a new mapping function. Verified functions are saved to mappings/ for reuse across all future queries.
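Each mapping in mappings/ is a plain transform(sql) function that CodeMapper chains in order. A hypothetical example (not an actual ACME-generated rule) rewriting QuestDB's LONG column type into PostgreSQL's BIGINT:

```python
import re

def transform(sql: str) -> str:
    """Hypothetical mapping: QuestDB's LONG column type -> PostgreSQL BIGINT.

    Real generated mappings live in mappings/*.py and share this
    transform(sql) signature so CodeMapper can chain them.
    """
    return re.sub(r"\bLONG\b", "BIGINT", sql, flags=re.IGNORECASE)

def apply_chain(sql: str, transforms) -> str:
    """Apply a chain of mapping functions, as CodeMapper does."""
    for t in transforms:
        sql = t(sql)
    return sql

print(apply_chain("CREATE TABLE trades (qty long)", [transform]))
# → CREATE TABLE trades (qty BIGINT)
```

Because every mapping is just a function on SQL text, verified rules compose freely and can be inspected or edited by hand.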

Getting Started

Prerequisites

  • Docker & Docker Compose
  • Python 3.10+
  • Ollama (for local LLM inference)

1. Start Databases

docker-compose up -d
  • PostgreSQL: port 5432
  • QuestDB: port 8812 (PG wire protocol), 9000 (web console)
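To confirm both containers are reachable before running tests, a generic TCP check (not part of ACME) can be used:

```python
import socket

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Ports from docker-compose: PostgreSQL 5432, QuestDB 8812/9000
for name, port in [("PostgreSQL", 5432),
                   ("QuestDB PG wire", 8812),
                   ("QuestDB console", 9000)]:
    print(f"{name} ({port}): {'up' if port_open('localhost', port) else 'down'}")
```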

2. Prepare Local LLM

# Install Ollama from https://ollama.com/
ollama pull qwen2.5-coder:7b-instruct

3. Install Python Dependencies

pip install -r requirements.txt

Recommended Workflow

ACME works best in two phases. The mapping library starts empty and needs to be built up before large-scale parallel testing is effective.

Phase 1 — Build the Mapping Library (LLM enabled, single-threaded)

The LLM learns to translate QuestDB constructs into PostgreSQL by observing execution errors and generating Python transform rules. Run single-threaded so rule generation is not disrupted by concurrent execution.

python3 main.py test --count 0 --seed 42

With a specific LLM provider:

# Local (Ollama)
python3 main.py test --offline

# Remote (OpenAI)
python3 main.py test --online

Stop when the LLM call rate drops noticeably; this indicates that most common constructs are already covered by saved mappings. All verified mappings are persisted to mappings/*.py automatically.

Phase 2 — Differential Testing at Scale (no LLM, parallel)

Once a sufficient mapping library exists, disable the LLM and run with many parallel threads for high-throughput bug finding.

python3 main.py test --count 0 --seed 42 --no-llm --threads 16

Any query that fails to map cleanly is logged to test_logs/results.txt rather than retried with the LLM, keeping throughput high.
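After a long run, the log can be triaged with a short script. A sketch, assuming one finding per line tagged by a leading keyword; the actual test_logs/results.txt layout may differ:

```python
from collections import Counter
from pathlib import Path

def summarize(log_path: str) -> Counter:
    """Tally findings by their first word (e.g. MISMATCH, MAP-FAIL).

    The one-finding-per-line format is an assumption; adjust the
    parsing to the real test_logs/results.txt layout.
    """
    counts = Counter()
    path = Path(log_path)
    if not path.exists():
        return counts
    for line in path.read_text().splitlines():
        if line.strip():
            counts[line.split()[0]] += 1
    return counts
```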


All Flags

python3 main.py test [options]
Flag               Default                    Description
--count N          50                         Queries to run; 0 = infinite
--seed N           42                         Random seed for reproducibility
--threads N        1                          Parallel worker threads
--offline                                     Use Ollama (qwen2.5-coder:7b-instruct)
--online                                      Use OpenAI (gpt-4o)
--llm PROVIDER     ollama                     LLM provider: ollama or openai
--llm-model MODEL  qwen2.5-coder:7b-instruct  Model name
--no-llm                                      Disable LLM; use saved mappings only
--max-prompts N                               Stop after N LLM mapping requests
--verbose                                     Enable debug-level logging

Evaluation-Only Flags

The following flags alter testing behaviour and are intended for controlled experiments, not for regular bug-finding runs.

  • --static-data: Replace random inserts with a fixed 100-row dataset. Use when you need deterministic, reproducible results (e.g. comparing two mapping strategies or measuring LLM effectiveness). Random data generally gives better coverage.
  • --ablation: Skip verification of LLM-generated rules before applying them. Use only to measure the impact of the verification step in ablation studies.
  • --no-strict: Relax result-set comparison during rule verification. Use in controlled experiments where approximate equivalence is sufficient.

Output

Path                       Contents
mappings/*.py              Learned translation rules (persistent across runs)
test_logs/results.txt      All mismatches and mapping failures
test_logs/token_stats.txt  Per-call and session LLM token usage and cost estimates
repro_scripts/             Standalone Python scripts to reproduce each finding

Project Structure

File                    Role
main.py                 CLI entry point; manages Docker containers
differential_tester.py  Core engine: CodeMapper + DifferentialTester
questdb_generator.py    QuestDB SQL generator (emerging DB example)
mappings/*.py           Learned QDB→PG translation rules
docker-compose.yml      PostgreSQL and QuestDB service definitions
tests/                  Unit and pipeline integration test suite

Research Paper

Find more details in our FSE 2026 research paper: ACME: Automated Clause Mapping Engine for Differential Testing of Emerging Database Systems
