A modular Python runtime for reconfiguring simulation tasks mid-loop, with an EventBus, YAML curriculum, and pluggable adaptation policies.
The goal is not to build a large RL framework. It is to show the core software architecture behind online task reconfiguration: clear task definitions, a runtime controller, observable events, and an environment interface that can accept new task settings during execution.
Adaptive RL and simulation platforms often need a layer between the agent loop and the environment. That layer decides when the task should change, applies the new task, and records why the transition happened. This project models that layer explicitly.
By separating task configuration from environment logic, the design applies to adaptive environments, task reconfiguration, simulation architecture, reinforcement learning infrastructure, and agent-environment interfaces. The same runtime controller could be extended to drive a Gymnasium environment, a game simulation, or a human-adaptive experiment.
- YAML-driven task curriculum in configs/curriculum.yaml
- TaskManager for loading and resolving task definitions
- RuntimeController for running episodes and applying task changes
- EventBus for runtime events such as task_changed and episode_finished
- AdaptationPolicy interface with a working episode-threshold policy and a working PerformanceAdaptivePolicy
- ToyLineWorld, a small plug-in environment for demonstrating reconfiguration
- Console logging of task transitions and episode outcomes
- JSONL transition logs saved to outputs/runtime_log.jsonl
- Tests for task loading, events, and runtime transitions

Install dependencies:

    pip install -r requirements.txt

Run the runtime demo:

    python examples/run_runtime_demo.py

Run tests:

    python -m pytest

The default curriculum changes tasks at episode thresholds:

| Episodes | Task |
|---|---|
| 1-20 | Reach a fixed target |
| 21-40 | Reach a target that changes by episode |
| 41-60 | Reach the target while avoiding obstacles |
| 61+ | Use stronger penalties and a harder obstacle layout |
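
The threshold table above amounts to a sorted lookup from episode number to task. Here is a minimal sketch, assuming the thresholds are held as sorted `(first_episode, task_name)` pairs. `THRESHOLDS` and `task_for_episode` are illustrative names rather than the project's API, the task names are the ones that appear in the demo output, and the real thresholds live in configs/curriculum.yaml.

```python
from bisect import bisect_right

# Hypothetical in-memory form of the curriculum: (first episode, task name),
# sorted by first episode. Not the project's actual data structure.
THRESHOLDS = [
    (1, "fixed_target"),
    (21, "shifting_target"),
    (41, "obstacle_course"),
    (61, "high_penalty_adaptation"),
]


def task_for_episode(episode: int) -> str:
    """Return the task whose first episode is the largest one <= episode."""
    if episode < 1:
        raise ValueError("episodes are numbered from 1")
    starts = [start for start, _ in THRESHOLDS]
    # bisect_right counts the thresholds that are <= episode.
    index = bisect_right(starts, episode) - 1
    return THRESHOLDS[index][1]
```

Because the lookup depends only on the episode number, a policy of this kind can be tested without creating an environment.
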
examples/run_runtime_demo.py emits runtime events as the controller crosses task thresholds:

    [task_changed] episode=21 from=fixed_target to=shifting_target
    [task_changed] episode=41 from=shifting_target to=obstacle_course
    [task_changed] episode=61 from=obstacle_course to=high_penalty_adaptation

The end of the demo prints a compact summary:

    Summary
    - fixed_target: episodes=20, successes=19
    - shifting_target: episodes=20, successes=19
    - obstacle_course: episodes=20, successes=0
    - high_penalty_adaptation: episodes=10, successes=0

Note: The results above use a random policy baseline. Zero success rates on harder stages, such as the obstacle course, are expected. These numbers establish a baseline floor; they do not indicate a bug.
flowchart LR
YAML["configs/curriculum.yaml"] --> TaskManager["TaskManager"]
Policy["AdaptationPolicy"] --> Controller["RuntimeController"]
TaskManager --> Controller
Controller --> Environment["ToyLineWorld / Environment"]
Environment --> Controller
Controller --> EventBus["EventBus"]
EventBus --> Logger["RuntimeLogger"]
The runtime is intentionally split into small modules. TaskManager owns configuration, policies own task-selection decisions, RuntimeController owns orchestration, and EventBus makes transitions observable without hard-coding logging into the controller.
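
The split described above can be illustrated with a small publish/subscribe sketch. This is not the project's code: the method names (`subscribe`, `publish`, `run`), the payload keys, and the choice to skip `task_changed` for the very first task are all assumptions made for illustration. Only the event names `task_changed` and `episode_finished` come from the feature list.

```python
from collections import defaultdict
from typing import Any, Callable, Optional

Event = dict[str, Any]
Handler = Callable[[Event], None]


class EventBus:
    """Minimal publish/subscribe bus (illustrative, not the project's class)."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, name: str, handler: Handler) -> None:
        self._handlers[name].append(handler)

    def publish(self, name: str, **payload: Any) -> None:
        event = {"event": name, **payload}
        # Copy the list so handlers may subscribe others while we iterate.
        for handler in list(self._handlers.get(name, [])):
            handler(event)


class RuntimeController:
    """Asks a policy for the task each episode and announces changes."""

    def __init__(self, select_task: Callable[[int], str], bus: EventBus) -> None:
        self._select_task = select_task
        self._bus = bus
        self._current_task: Optional[str] = None

    def run(self, episodes: int) -> None:
        for episode in range(1, episodes + 1):
            task = self._select_task(episode)
            # Assumption: the initial task assignment is not a "change".
            if self._current_task is not None and task != self._current_task:
                self._bus.publish("task_changed", episode=episode,
                                  previous=self._current_task, current=task)
            self._current_task = task
            # A real controller would reconfigure and step the environment here.
            self._bus.publish("episode_finished", episode=episode, task=task)


# Usage: a logger is just another subscriber; the controller never sees it.
log: list[Event] = []
bus = EventBus()
bus.subscribe("task_changed", log.append)
bus.subscribe("episode_finished", log.append)
RuntimeController(
    lambda ep: "fixed_target" if ep <= 2 else "shifting_target", bus
).run(3)
```

Since the controller only publishes events, a console logger or a JSONL writer can be attached or removed without touching the orchestration code.
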
The toy environment uses a one-dimensional world so the runtime behavior is easy to inspect. That keeps the focus on architecture rather than environment complexity.
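
To make the idea concrete, here is a hedged sketch of what a reconfigurable one-dimensional world might look like. It is not the actual ToyLineWorld: `LineWorldSketch`, `TaskSettings`, `reconfigure`, and the `(position, reached, done)` return shape are hypothetical names and conventions chosen for this example.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskSettings:
    """Hypothetical task description: where the target is, and the step budget."""
    target: int
    max_steps: int = 20


class LineWorldSketch:
    """An agent moves left or right on the integer line toward a target."""

    def __init__(self, settings: TaskSettings, seed: int = 0) -> None:
        self._rng = random.Random(seed)
        self.settings = settings
        self.position = 0
        self.steps = 0

    def reconfigure(self, settings: TaskSettings) -> None:
        # Swap the task in place; takes effect for subsequent steps.
        self.settings = settings

    def reset(self) -> int:
        self.position = 0
        self.steps = 0
        return self.position

    def sample_action(self) -> int:
        # Environment-sampled random action: one step left or right.
        return self._rng.choice((-1, 1))

    def step(self, action: int) -> tuple[int, bool, bool]:
        if action not in (-1, 1):
            raise ValueError("action must be -1 or 1")
        self.position += action
        self.steps += 1
        reached = self.position == self.settings.target
        done = reached or self.steps >= self.settings.max_steps
        return self.position, reached, done


env = LineWorldSketch(TaskSettings(target=2))
env.reset()
first = env.step(1)    # one step right, target not yet reached
second = env.step(1)   # second step right lands on the target
env.reconfigure(TaskSettings(target=-1))  # new task, same environment object
env.reset()
third = env.step(-1)   # one step left reaches the new target
```

The same environment object serves both tasks, so a controller only needs to call `reconfigure` between episodes.
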
- Add a formal environment protocol so Gymnasium environments can be plugged in directly
- Add richer performance-based adaptation policies
- Add checkpointable runtime state for long-running experiments
- Add a small agent interface instead of using environment-sampled actions
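
As a rough illustration of the first roadmap item, a formal environment protocol could be expressed with `typing.Protocol`. Everything in this sketch is hypothetical: `ReconfigurableEnvironment`, its three methods, and the `CountingEnv` stand-in are assumptions, not part of the project or of Gymnasium. A Gymnasium environment would likely need a thin adapter that supplies `reconfigure`.

```python
from typing import Any, Mapping, Protocol, runtime_checkable


@runtime_checkable
class ReconfigurableEnvironment(Protocol):
    """Hypothetical contract a runtime controller could depend on."""

    def reconfigure(self, task_settings: Mapping[str, Any]) -> None: ...
    def reset(self) -> Any: ...
    def step(self, action: Any) -> Any: ...


class CountingEnv:
    """Tiny stand-in used only to show structural typing in action."""

    def __init__(self) -> None:
        self.goal = 0
        self.count = 0

    def reconfigure(self, task_settings: Mapping[str, Any]) -> None:
        self.goal = int(task_settings["goal"])

    def reset(self) -> int:
        self.count = 0
        return self.count

    def step(self, action: Any) -> bool:
        # Ignores the action; reports True once the goal count is reached.
        self.count += 1
        return self.count >= self.goal


env = CountingEnv()
# CountingEnv never inherits from the protocol; matching methods are enough.
conforms = isinstance(env, ReconfigurableEnvironment)
```

Note that `runtime_checkable` only verifies that the methods exist, not their signatures, so a static type checker is still the stronger guarantee.
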