AI-Assisted Schema Mapping for Education Data

Evaluating agentic AI components for automating schema mapping and transform generation between education data standards (PDP, CDTL) and the Learner Information Framework (LIF).

Owner: Olivier Mills (Baobab Tech)
Client: DataKind

Key Findings

AI language models dramatically outperform traditional methods (+61% improvement)
Smaller models match larger ones at a fraction of the cost (96% accuracy at 24x lower cost)
For transform code generation, Tier 2 models outperform Tier 3 (98% vs 91%)

Reports

Document	Description
Final Report (PDF/DOCX)	Complete findings from both experiments
Interactive HTML Report	Charts and detailed run data

Experiments

Experiment	Description	Status
Experiment 1	Schema mapping: Can AI identify correct field correspondences?	Complete
Experiment 2	Transform generation: Can AI write JSONata transformation code?	Complete
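
To make the Experiment 1 question concrete, here is a minimal sketch of scoring predicted field correspondences against a gold mapping. The field names and the exact-match metric are assumptions for illustration; the repository's own scoring in experiments/exp1 may differ.

```python
def mapping_accuracy(predicted: dict[str, str], gold: dict[str, str]) -> float:
    """Fraction of gold source fields whose predicted target matches exactly."""
    if not gold:
        raise ValueError("gold mapping is empty")
    correct = sum(1 for src, tgt in gold.items() if predicted.get(src) == tgt)
    return correct / len(gold)


# Hypothetical source -> target field names, not taken from the PDP or LIF schemas.
gold = {
    "student_id": "Person.Identifier",
    "first_name": "Person.Name.firstName",
    "cohort_term": "Enrollment.term",
    "gpa": "AcademicRecord.gpa",
}
predicted = {
    "student_id": "Person.Identifier",
    "first_name": "Person.Name.firstName",
    "cohort_term": "Enrollment.startDate",  # wrong target
    "gpa": "AcademicRecord.gpa",
}

accuracy = mapping_accuracy(predicted, gold)
print(f"{accuracy:.0%}")  # 3 of 4 gold fields matched
```

Source fields missing from the prediction count as wrong here, so an empty prediction scores 0 rather than raising.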

Project Structure

ed-schema/
├── experiments/
│   ├── exp1/              # Schema Mapping
│   ├── exp2/              # Transform Expression Generation
│   └── report/            # Final Report (source files)
├── data/
│   ├── schemas/           # PDP, LIF, CDTL schema definitions
│   ├── gold/              # Human-verified ground truth
│   └── silver/            # AI-generated mappings
├── lib/                   # Shared utilities (ai.py)
└── docs/                  # Sprint documentation

Setup

uv sync                    # Install dependencies
cp .env.example .env       # Configure API keys (ANTHROPIC, GOOGLE, GROQ, OPENAI)
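
Before running an experiment it can help to confirm that .env actually contains a value for each provider. The sketch below assumes variable names of the form ANTHROPIC_API_KEY, GOOGLE_API_KEY, GROQ_API_KEY and OPENAI_API_KEY; the source names only the providers, so check .env.example for the real names.

```python
from pathlib import Path

# Assumed variable names; the README lists only the providers.
REQUIRED_KEYS = (
    "ANTHROPIC_API_KEY",
    "GOOGLE_API_KEY",
    "GROQ_API_KEY",
    "OPENAI_API_KEY",
)


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(text: str, required=REQUIRED_KEYS) -> list[str]:
    """Return the required keys that are absent or empty."""
    values = parse_env(text)
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    text = env_path.read_text() if env_path.exists() else ""
    missing = missing_keys(text)
    print("Missing keys:", ", ".join(missing) if missing else "none")
```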

Quick Start

# Run schema mapping experiment
uv run python -m experiments.exp1.code.run_experiment --llm L-03 --batch --schema pdp

# Run transform generation experiment
uv run python -m experiments.exp2.code.run_experiment --model L-03 --schema pdp

# Rebuild final report
uv run python experiments/report/build.py
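
To run both experiments over more than one schema, the commands above can be assembled in a loop. This sketch only builds and prints the command lines, using the flags shown in Quick Start; whether cdtl is accepted as a --schema value is an assumption, and the helper names are hypothetical.

```python
import shlex


def exp1_command(llm: str, schema: str, batch: bool = True) -> list[str]:
    """Build the schema mapping command from Quick Start."""
    cmd = ["uv", "run", "python", "-m", "experiments.exp1.code.run_experiment",
           "--llm", llm]
    if batch:
        cmd.append("--batch")
    cmd += ["--schema", schema]
    return cmd


def exp2_command(model: str, schema: str) -> list[str]:
    """Build the transform generation command from Quick Start."""
    return ["uv", "run", "python", "-m", "experiments.exp2.code.run_experiment",
            "--model", model, "--schema", schema]


commands = []
for schema in ("pdp", "cdtl"):  # "cdtl" as a --schema value is an assumption
    commands.append(exp1_command("L-03", schema))
    commands.append(exp2_command("L-03", schema))

for cmd in commands:
    print(shlex.join(cmd))
```

Each list can be passed to subprocess.run(cmd, check=True) once the printed commands look right.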

Documentation

CLAUDE.md - Development guide and conventions
Experiment 1 Design - Schema mapping methodology
Experiment 2 Design - Transform generation methodology
