Skip to content

Latest commit

Β 

History

History
124 lines (99 loc) Β· 4.2 KB

File metadata and controls

124 lines (99 loc) Β· 4.2 KB
title SQL Data Analyst Investigation Environment
colorFrom blue
colorTo purple
sdk docker
pinned false
app_port 8000
base_path /web
tags
openenv

πŸ” SQL Data Analyst Investigation Environment

A multi-step data investigation environment where an AI agent analyzes a realistic e-commerce database through iterative SQL queries to answer complex analytical questions β€” simulating a real-world data analyst workflow.

🎯 What Makes This Special

  • Dual Action Space: QueryAction (explore with SQL) vs AnswerAction (submit findings)
  • Multi-Step Investigation: Agent must plan, hypothesize, and iteratively drill down
  • Rich Reward Shaping: Partial rewards for productive exploration + multi-dimensional final grading
  • Realistic Database: 8-table e-commerce schema with ~5K+ rows and planted anomalies
  • 5 Investigation Tasks: From easy lookups to complex root cause analysis (easy β†’ hard)

πŸš€ Quick Start

from sql_data_analyst import QueryAction, AnswerAction, SqlDataAnalystEnv

async with SqlDataAnalystEnv(base_url="http://localhost:8000") as env:
    # Start an investigation
    result = await env.reset(task_id="anomaly_diagnosis")
    print(result.observation.task_description)
    print(result.observation.schema_info)

    # Query the database
    result = await env.step(QueryAction(
        sql="SELECT quarter, SUM(revenue) FROM sales GROUP BY quarter"
    ))
    print(result.observation.query_result)

    # Submit your findings
    result = await env.step(AnswerAction(
        answer="Revenue dropped due to Electronics decline in APAC...",
        evidence=["Electronics down 45%", "APAC is the outlier region"]
    ))
    print(f"Score: {result.reward}")

πŸ“Š Investigation Tasks

# Task Difficulty Steps Description
1 basic_lookup Easy 5 Top 5 products by revenue in Q4 2025
2 comparative_analysis Medium 7 Compare return rates across categories
3 trend_investigation Medium 8 Identify fastest growing customer segment
4 anomaly_diagnosis Hard 10 Root cause of Q3 2025 revenue drop
5 strategic_recommendation Hard 12 Recommend 3 products to discontinue

πŸ—οΈ Database Schema

8-table e-commerce analytics database:

  • customers β€” segment, region, join date
  • products β€” category, subcategory, price
  • suppliers β€” country, reliability score
  • orders β€” date, status, total amount
  • order_items β€” quantity, price, discount
  • shipments β€” ship/delivery dates, carrier
  • returns β€” reason, refund amount
  • reviews β€” rating (1-5), review text

🎯 Reward System

Per-step (QueryAction):

  • Information gain: +0.0 to +0.2 (new tables, productive results)
  • SQL error: -0.05
  • Duplicate query: -0.1

Final (AnswerAction):

  • Correctness Γ— 0.6 (facts matched against ground truth)
  • Completeness Γ— 0.3 (answer depth, evidence, specific numbers)
  • Efficiency Γ— 0.1 (steps used vs budget)

Total episode score: clipped to [0.0, 1.0]

πŸ› οΈ Development

# Run locally
uvicorn server.app:app --reload --port 8000

# Run tests
python test_local.py

# Run baseline (requires OPENAI_API_KEY)
python baseline.py

# Build Docker image
docker build -t sql-data-analyst:latest -f server/Dockerfile .

πŸ“ Project Structure

sql_data_analyst/
β”œβ”€β”€ __init__.py              # Package exports
β”œβ”€β”€ models.py                # QueryAction, AnswerAction, AnalystObservation, AnalystState
β”œβ”€β”€ client.py                # SqlDataAnalystEnv(EnvClient)
β”œβ”€β”€ openenv.yaml             # OpenEnv manifest
β”œβ”€β”€ pyproject.toml           # Dependencies
β”œβ”€β”€ baseline.py              # LLM inference agent
β”œβ”€β”€ test_local.py            # Local test script
β”œβ”€β”€ README.md                # This file
└── server/
    β”œβ”€β”€ __init__.py
    β”œβ”€β”€ app.py               # FastAPI server
    β”œβ”€β”€ environment.py        # Core environment logic
    β”œβ”€β”€ database.py           # SQLite schema + seed data
    β”œβ”€β”€ tasks.py              # 5 task definitions
    β”œβ”€β”€ grader.py             # Multi-dimensional grading
    β”œβ”€β”€ requirements.txt
    └── Dockerfile