Paper: https://arxiv.org/pdf/2411.01894
Project page: https://sites.google.com/view/rnd-dagger/home

This codebase contains the code used to produce the experiments from the paper.
Create and activate a conda environment:
conda create -n env_imitation_game python=3.10
conda activate env_imitation_game
pip install -e environments/
pip install -e interactive_module/
pip install -e gametrackr/
git clone --recursive https://github.com/DLR-RM/rl-baselines3-zoo.git
pip install -e rl-baselines3-zoo/
git clone https://github.com/araffin/pybullet_envs_gymnasium.git
Replace the script pybullet_envs_gymnasium/robot_locomotors.py with environments/envs/halfcheetah/imitation/robot_locomotors.py.
On Linux/WSL:
mv environments/envs/halfcheetah/imitation/robot_locomotors.py pybullet_envs_gymnasium/robot_locomotors.py
On Windows (PowerShell):
Move-Item -Path .\environments\envs\halfcheetah\imitation\robot_locomotors.py -Destination .\pybullet_envs_gymnasium\robot_locomotors.py -Force
Then install the package:
pip install -e pybullet_envs_gymnasium/
Unzip the executables in the godot_exe folder.
pip install -e .\interactive_module
Edit interactive_module/interactive/configs/abs_path/abs_path.yaml so it points to the correct paths of your working folder.
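For illustration, the file might be shaped like the sketch below; the key names here are placeholders, so check the shipped abs_path.yaml for the actual fields:

```yaml
# Hypothetical layout -- verify the real key names in abs_path.yaml.
working_dir: /home/<user>/imitation_game            # root of your checkout
godot_exe_dir: /home/<user>/imitation_game/godot_exe  # unzipped executables
```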
Download datasets of initial interactions (from which the interactive session is started) from our website https://sites.google.com/view/rnd-dagger
The scripts specific to each environment are located inside environments/envs.
To perform an interaction session:
python environments/envs/<env_name>/interact/run.py
The associated configuration is in interactive_module/interactive/configs/run_interaction_<initials_of_env>.yaml
The configurations of the decision rules are in configs/decision_rules
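As a rough illustration, based only on the override keys used in the commands below, an RND decision-rule config could look like this (the nesting is inferred from the CLI overrides; any field not appearing in those overrides is a guess):

```yaml
# Sketch of configs/decision_rules/rnd.yaml, inferred from the README's
# Hydra overrides -- verify against the actual file.
parameters:
  context_length: 2        # number of stacked past observations
  threshold_factor: 2      # scales the intervention threshold
  padding_type: replicate  # how short contexts are padded
trainer:
  parameters:
    model:
      type: regular
      hidden_size: 32
      n_layers: 0
```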
To run automatic experiments (with oracles), set the oracle argument to true in the YAML files, and to false otherwise (oracles are only available for the RaceCar and Maze environments).
For instance, to launch an interactive session on RaceCar with a Human-Gated interactive approach (i.e. you decide when to take control):
python .\environments\envs\race_car\interact\run.py environment.run.headless=false oracle=false
To take control, use the gamepad (press the joystick button to take over; LT/RT for forward/backward). An older keyboard scheme, which may no longer work: ZQSD to steer the car, Spacebar to take control.
For example, to launch interactive training on Maze with RND-DAgger (tip: reduce max_epoch to a small value to shorten training before the interactive sessions and quickly check that everything works):
python environments/envs/maze/interact/run.py n_initial_episodes=25 sync.session_length=2000 decision_rule=rnd max_epoch=2500 seed=42 decision_rule.parameters.context_length=2 sync.num_sessions=8 sync.n_frame_stable=1 decision_rule.trainer.parameters.model.hidden_size=32 decision_rule.trainer.parameters.model.n_layers=0 decision_rule.parameters.threshold_factor=2 eval.first_to_eval=-8 eval.n_episodes=50 decision_rule.trainer.parameters.model.type=regular decision_rule.parameters.padding_type=replicate sync.max_session_length=100000
On RaceCar (multirun, for cluster launches):
python environments/envs/race_car/interact/run.py -m n_initial_episodes=1 sync.session_length=2000 decision_rule=rnd max_epoch=2500 seed=120,210,420,12,42,21,1200,2100,120,210,420,12,42,21,1200,2100 sync.num_sessions=8 sync.n_frame_stable=1 decision_rule.parameters.padding_type=replicate eval.first_to_eval=-8 decision_rule.parameters.context_length=10 decision_rule.parameters.threshold_factor=2 decision_rule.trainer.parameters.model.hidden_size=32 decision_rule.trainer.parameters.model.n_layers=0 sync.lazy_threshold_divisor=1
The results are available in:
working_dir/map/method/seed/expe_folder/interaction_data for the metrics needed to compute the results
working_dir/map/method/seed/expe_folder/evaluation for the performance of the BC models at each session
© 2026 Ubisoft Entertainment. All Rights Reserved.