Evaluating agentic AI components for automating schema mapping and transform generation between education data standards (PDP, CDTL) and the Learner Information Framework (LIF).
Owner: Olivier Mills (Baobab Tech)
Client: DataKind
- AI language models outperform traditional methods (61% improvement)
- Smaller models match larger ones at a fraction of the cost (96% accuracy at 24x lower cost)
- For transform code generation, Tier 2 models outperform Tier 3 (98% vs 91%)
| Document | Description |
|---|---|
| Final Report (PDF/DOCX) | Complete findings from both experiments |
| Interactive HTML Report | Charts and detailed run data |
| Experiment | Description | Status |
|---|---|---|
| Experiment 1 | Schema mapping: Can AI identify correct field correspondences? | Complete |
| Experiment 2 | Transform generation: Can AI write JSONata transformation code? | Complete |
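To make the Experiment 1 question concrete, here is a minimal sketch of how predicted field correspondences could be scored against a gold mapping. It assumes a simple exact-match accuracy over gold source fields; the metric actually used in the experiments may differ, and the function name `mapping_accuracy` and all field names below are hypothetical, chosen for illustration only.

```python
def mapping_accuracy(predicted: dict[str, str], gold: dict[str, str]) -> float:
    """Fraction of gold source fields whose predicted target matches exactly.

    Assumption: one target field per source field, compared by exact string match.
    """
    if not gold:
        raise ValueError("gold mapping is empty")
    correct = sum(1 for src, tgt in gold.items() if predicted.get(src) == tgt)
    return correct / len(gold)


# Hypothetical source -> target field names (not taken from the real schemas).
gold = {
    "student_id": "Person.Identifier",
    "first_name": "Person.Name.firstName",
    "birth_date": "Person.Demographics.birthDate",
    "gpa": "CourseLearningExperience.gradePointAverage",
}
predicted = {
    "student_id": "Person.Identifier",
    "first_name": "Person.Name.firstName",
    "birth_date": "Person.Demographics.birthDate",
    "gpa": "Person.Identifier",  # deliberately wrong prediction
}

score = mapping_accuracy(predicted, gold)
print(f"accuracy = {score:.2f}")  # prints "accuracy = 0.75"
```

Source fields missing from the prediction count as incorrect, so a model that skips hard fields is not rewarded for doing so.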
ed-schema/
├── experiments/
│ ├── exp1/ # Schema Mapping
│ ├── exp2/ # Transform Expression Generation
│ └── report/ # Final Report (source files)
├── data/
│ ├── schemas/ # PDP, LIF, CDTL schema definitions
│ ├── gold/ # Human-verified ground truth
│ └── silver/ # AI-generated mappings
├── lib/ # Shared utilities (ai.py)
└── docs/ # Sprint documentation
uv sync # Install dependencies
cp .env.example .env # Configure API keys (ANTHROPIC, GOOGLE, GROQ, OPENAI)
# Run schema mapping experiment
uv run python -m experiments.exp1.code.run_experiment --llm L-03 --batch --schema pdp
# Run transform generation experiment
uv run python -m experiments.exp2.code.run_experiment --model L-03 --schema pdp
# Rebuild final report
uv run python experiments/report/build.py
- CLAUDE.md - Development guide and conventions
- Experiment 1 Design - Schema mapping methodology
- Experiment 2 Design - Transform generation methodology