---
title: SQL Data Analyst Investigation Environment
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
---
A multi-step data investigation environment where an AI agent analyzes a realistic e-commerce database through iterative SQL queries to answer complex analytical questions, simulating a real-world data analyst workflow.
## Features

- **Dual Action Space**: `QueryAction` (explore with SQL) vs `AnswerAction` (submit findings)
- **Multi-Step Investigation**: The agent must plan, hypothesize, and iteratively drill down
- **Rich Reward Shaping**: Partial rewards for productive exploration plus multi-dimensional final grading
- **Realistic Database**: 8-table e-commerce schema with 5K+ rows and planted anomalies
- **5 Investigation Tasks**: From easy lookups to complex root-cause analysis (easy → hard)
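The dual action space can be pictured as two small dataclasses. This is only a sketch: the real definitions live in `sql_data_analyst/models.py`, and only the field names that appear in the usage example below (`sql`, `answer`, `evidence`) are taken from this README.

```python
from dataclasses import dataclass, field

# Sketch of the two action types. The real definitions live in
# sql_data_analyst/models.py; field names beyond sql/answer/evidence
# are assumptions.
@dataclass
class QueryAction:
    sql: str  # read-only SQL to run against the e-commerce database

@dataclass
class AnswerAction:
    answer: str  # the agent's final written findings
    evidence: list[str] = field(default_factory=list)  # supporting facts

print(QueryAction(sql="SELECT COUNT(*) FROM orders"))
```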
## Quick Start

```python
from sql_data_analyst import QueryAction, AnswerAction, SqlDataAnalystEnv

async with SqlDataAnalystEnv(base_url="http://localhost:8000") as env:
    # Start an investigation
    result = await env.reset(task_id="anomaly_diagnosis")
    print(result.observation.task_description)
    print(result.observation.schema_info)

    # Query the database
    result = await env.step(QueryAction(
        sql="SELECT quarter, SUM(revenue) FROM sales GROUP BY quarter"
    ))
    print(result.observation.query_result)

    # Submit your findings
    result = await env.step(AnswerAction(
        answer="Revenue dropped due to Electronics decline in APAC...",
        evidence=["Electronics down 45%", "APAC is the outlier region"]
    ))
    print(f"Score: {result.reward}")
```

## Tasks

| # | Task | Difficulty | Steps | Description |
|---|---|---|---|---|
| 1 | basic_lookup | Easy | 5 | Top 5 products by revenue in Q4 2025 |
| 2 | comparative_analysis | Medium | 7 | Compare return rates across categories |
| 3 | trend_investigation | Medium | 8 | Identify fastest growing customer segment |
| 4 | anomaly_diagnosis | Hard | 10 | Root cause of Q3 2025 revenue drop |
| 5 | strategic_recommendation | Hard | 12 | Recommend 3 products to discontinue |
## Database Schema

8-table e-commerce analytics database:

- `customers` → segment, region, join date
- `products` → category, subcategory, price
- `suppliers` → country, reliability score
- `orders` → date, status, total amount
- `order_items` → quantity, price, discount
- `shipments` → ship/delivery dates, carrier
- `returns` → reason, refund amount
- `reviews` → rating (1-5), review text
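To give a feel for the kind of aggregate a `QueryAction` might run, here is a self-contained sketch against a two-table stand-in for this schema. The column names are assumptions based on the summary above; the real schema lives in `server/database.py`.

```python
import sqlite3

# Minimal in-memory stand-in for two of the eight tables (column names
# are assumptions; see server/database.py for the real schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, category TEXT, price REAL);
CREATE TABLE order_items (order_id INTEGER, product_id INTEGER,
                          quantity INTEGER, price REAL, discount REAL);
""")
conn.executemany("INSERT INTO products VALUES (?, ?, ?, ?)",
                 [(1, "Laptop", "Electronics", 999.0), (2, "Mug", "Home", 9.0)])
conn.executemany("INSERT INTO order_items VALUES (?, ?, ?, ?, ?)",
                 [(100, 1, 2, 999.0, 0.1), (101, 2, 5, 9.0, 0.0)])

# Revenue per category, net of discounts
rows = conn.execute("""
    SELECT p.category,
           ROUND(SUM(oi.quantity * oi.price * (1 - oi.discount)), 2) AS revenue
    FROM order_items oi
    JOIN products p ON p.id = oi.product_id
    GROUP BY p.category
    ORDER BY revenue DESC
""").fetchall()
print(rows)
```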
## Rewards

Per-step (`QueryAction`):

- Information gain: +0.0 to +0.2 (new tables, productive results)
- SQL error: -0.05
- Duplicate query: -0.1

Final (`AnswerAction`):

- Correctness × 0.6 (facts matched against ground truth)
- Completeness × 0.3 (answer depth, evidence, specific numbers)
- Efficiency × 0.1 (steps used vs budget)

Total episode score: clipped to [0.0, 1.0]
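How these pieces might combine can be sketched as a single function. This is an illustration only: the actual grader lives in `server/grader.py`, and the exact way step rewards fold into the episode score is an assumption here.

```python
def final_score(correctness: float, completeness: float, efficiency: float,
                step_rewards: list[float]) -> float:
    """Sketch of the episode score (the real grader is server/grader.py).

    correctness/completeness/efficiency are each assumed in [0, 1];
    the weighted final grade plus accumulated per-step shaping rewards
    is clipped to [0.0, 1.0].
    """
    final = 0.6 * correctness + 0.3 * completeness + 0.1 * efficiency
    total = final + sum(step_rewards)
    return max(0.0, min(1.0, total))

# Strong answer after mostly productive exploration → clipped at 1.0
print(final_score(0.9, 0.8, 0.7, [0.2, 0.15, -0.05]))
```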
## Development

```shell
# Run locally
uvicorn server.app:app --reload --port 8000

# Run tests
python test_local.py

# Run baseline (requires OPENAI_API_KEY)
python baseline.py

# Build Docker image
docker build -t sql-data-analyst:latest -f server/Dockerfile .
```

## Project Structure

```
sql_data_analyst/
├── __init__.py      # Package exports
├── models.py        # QueryAction, AnswerAction, AnalystObservation, AnalystState
├── client.py        # SqlDataAnalystEnv(EnvClient)
├── openenv.yaml     # OpenEnv manifest
├── pyproject.toml   # Dependencies
├── baseline.py      # LLM inference agent
├── test_local.py    # Local test script
├── README.md        # This file
└── server/
    ├── __init__.py
    ├── app.py           # FastAPI server
    ├── environment.py   # Core environment logic
    ├── database.py      # SQLite schema + seed data
    ├── tasks.py         # 5 task definitions
    ├── grader.py        # Multi-dimensional grading
    ├── requirements.txt
    └── Dockerfile
```