The GAIA benchmark can be used to evaluate agents and workflows built with the Agent Framework. This module includes built-in benchmarks as well as utilities for running custom evaluations.
Note: This module is part of the consolidated `agent-framework-lab` package. Install the package with the `gaia` extra to use this module.
Install from source with GAIA dependencies:

```shell
git clone https://github.com/microsoft/agent-framework.git
cd agent-framework/python/packages/lab
pip install -e ".[gaia]"
```

Set up a Hugging Face token:

```shell
export HF_TOKEN="hf_..."  # must have access to gaia-benchmark/GAIA
```

Create a Python script (e.g., `run_gaia.py`) with the following content:
```python
import asyncio

from agent_framework.lab.gaia import GAIA, GAIATelemetryConfig, Prediction, Task


async def run_task(task: Task) -> Prediction:
    return Prediction(prediction="answer here", messages=[])


async def main() -> None:
    # Optional: enable telemetry for detailed tracing
    telemetry_config = GAIATelemetryConfig(
        enable_tracing=True,
        trace_to_file=True,
        file_path="gaia_traces.jsonl",
    )
    runner = GAIA(telemetry_config=telemetry_config)
    await runner.run(run_task, level=1, max_n=5, parallel=2)


if __name__ == "__main__":
    asyncio.run(main())
```

See `gaia_sample.py` for more detail.
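When tracing to a file is enabled as above, the trace output is plain JSON Lines (one JSON object per line), so it can be inspected with the standard library alone. The sketch below reads such a file; the synthetic records it writes stand in for real trace entries, whose actual field names may differ.

```python
import json
from pathlib import Path


def read_jsonl(path: str) -> list[dict]:
    """Read a JSON Lines file into a list of dicts, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                records.append(json.loads(line))
    return records


# Demo with a synthetic trace file; the "task_id"/"events" fields are
# made up for illustration, not the module's documented trace schema.
Path("demo_traces.jsonl").write_text(
    '{"task_id": "t1", "events": 3}\n{"task_id": "t2", "events": 5}\n',
    encoding="utf-8",
)
traces = read_jsonl("demo_traces.jsonl")
print(len(traces))  # 2
```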
Run the evaluation script using uv:

```shell
uv run python run_gaia.py
```

By default, the script first looks for cached GAIA data in the `data_gaia_hub` directory and downloads it if not found.
The results are saved to `gaia_results_<timestamp>.jsonl`.
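Because the results file is also JSON Lines, a quick per-level pass-rate summary is easy to compute. In the sketch below, the `level` and `is_correct` field names are assumptions for illustration, not the module's documented output schema; check a real results file for the actual keys.

```python
import json
from collections import Counter


def pass_rate_by_level(path: str) -> dict[int, float]:
    """Compute the fraction of correct predictions per GAIA level.

    Assumes each JSONL record carries an integer "level" and a boolean
    "is_correct" field (hypothetical names for illustration).
    """
    total: Counter = Counter()
    correct: Counter = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            rec = json.loads(line)
            total[rec["level"]] += 1
            correct[rec["level"]] += rec["is_correct"]
    return {lvl: correct[lvl] / total[lvl] for lvl in total}
```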
Don't run the script from inside this directory: Python would resolve the local `agent_framework` namespace
package instead of the installed one.
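One generic way to check which copy of a package Python will pick up is `importlib.util.find_spec`; if the reported path points into your working copy rather than `site-packages`, change directory first. The stdlib `json` package stands in here so the snippet runs anywhere; substitute `agent_framework` in practice.

```python
import importlib.util

# find_spec resolves a module the same way "import" would, without
# executing it; spec.origin is the file Python would actually load.
spec = importlib.util.find_spec("json")
print(spec.origin)
```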
We provide a console viewer for reading GAIA results:

```shell
uv run gaia_viewer "gaia_results_<timestamp>.jsonl" --detailed
```