
# Efficient Active Imitation Learning with Random Network Distillation

Paper: https://arxiv.org/pdf/2411.01894
Project page: https://sites.google.com/view/rnd-dagger/home

This codebase contains the code used to produce the experiments from the paper.

## Install

Create a conda environment and install the packages:

```
conda create -n env_imitation_game python=3.10
conda activate env_imitation_game
pip install -e environments/
pip install -e interactive_module/
pip install -e gametrackr/
git clone --recursive https://github.com/DLR-RM/rl-baselines3-zoo.git
pip install -e rl-baselines3-zoo/
git clone https://github.com/araffin/pybullet_envs_gymnasium.git
```

Replace the script `pybullet_envs_gymnasium/robot_locomotors.py` with `environments/envs/halfcheetah/imitation/robot_locomotors.py`.

On Linux/WSL:

```
mv environments/envs/halfcheetah/imitation/robot_locomotors.py pybullet_envs_gymnasium/robot_locomotors.py
```

Or on Windows (PowerShell):

```
Move-Item -Path .\environments\envs\halfcheetah\imitation\robot_locomotors.py -Destination .\pybullet_envs_gymnasium\robot_locomotors.py -Force
```

Then install the patched package and reinstall the interactive module:

```
pip install -e pybullet_envs_gymnasium/
pip install -e interactive_module/
```

Finally, unzip the executables into the `godot_exe` folder.

## Prepare the folders

Modify `interactive_module/interactive/configs/abs_path/abs_path.yaml` with the correct paths of your working folder.
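For orientation, an edited `abs_path.yaml` might look like the sketch below. The key names here are hypothetical — keep the keys already present in the file shipped with the repo and only replace the values with absolute paths on your machine:

```yaml
# Hypothetical key names -- keep the keys that already exist in abs_path.yaml
# and only change the values to absolute paths on your machine.
working_dir: /home/<user>/rnd-dagger/working_dir
godot_exe_dir: /home/<user>/rnd-dagger/godot_exe
dataset_dir: /home/<user>/rnd-dagger/datasets
```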

## Datasets

Download the datasets of initial interactions (from which the interactive sessions are started) from our website: https://sites.google.com/view/rnd-dagger

## Interaction session

The scripts specific to each environment are located in `environments/envs`.

To perform an interaction session:

```
python environments/envs/<env_name>/interact/run.py
```

The associated configuration is in `interactive_module/interactive/configs/run_interaction_<initials_of_env>.yaml`.

The configurations of the decision rules are in `configs/decision_rules`.

To perform automatic experiments (with oracles), set the `oracle` argument in the YAML files to `true`; otherwise set it to `false` (only available for the RaceCar and Maze environments).

For instance, to launch an interactive session on RaceCar with the Human-Gated interactive approach (i.e. you decide when to take control):

```
python environments/envs/race_car/interact/run.py environment.run.headless=false oracle=false
```

Control with the gamepad: press the joystick button to take control, and use LT/RT for forward/backward. (Possibly outdated: ZQSD to steer the car, Spacebar to take control.)

For example, to launch interactive training on Maze with RND-DAgger (tip: reduce `max_epoch` to a small value to shorten the training iterations before interactive sessions and quickly check that everything works):

```
python environments/envs/maze/interact/run.py n_initial_episodes=25 sync.session_length=2000 decision_rule=rnd max_epoch=2500 seed=42 decision_rule.parameters.context_length=2 sync.num_sessions=8 sync.n_frame_stable=1 decision_rule.trainer.parameters.model.hidden_size=32 decision_rule.trainer.parameters.model.n_layers=0 decision_rule.parameters.threshold_factor=2 eval.first_to_eval=-8 eval.n_episodes=50 decision_rule.trainer.parameters.model.type=regular decision_rule.parameters.padding_type=replicate sync.max_session_length=100000
```

On RaceCar (multirun, for cluster launches):

```
python environments/envs/race_car/interact/run.py -m n_initial_episodes=1 sync.session_length=2000 decision_rule=rnd max_epoch=2500 seed=120,210,420,12,42,21,1200,2100,120,210,420,12,42,21,1200,2100 sync.num_sessions=8 sync.n_frame_stable=1 decision_rule.parameters.padding_type=replicate eval.first_to_eval=-8 decision_rule.parameters.context_length=10 decision_rule.parameters.threshold_factor=2 decision_rule.trainer.parameters.model.hidden_size=32 decision_rule.trainer.parameters.model.n_layers=0 sync.lazy_threshold_divisor=1
```

## Results

The results are available in:

- `working_dir/map/method/seed/expe_folder/interaction_data`: the metrics necessary to compute the results
- `working_dir/map/method/seed/expe_folder/evaluation`: the performance of the BC models for each session
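To compare methods across seeds, the results tree can be walked and the per-seed evaluations pooled. A minimal sketch, assuming (hypothetically — the actual file names and format in this repo may differ) that each `evaluation` folder holds JSON files with a `mean_return` field:

```python
import json
import tempfile
from collections import defaultdict
from pathlib import Path
from statistics import mean

def aggregate_evaluations(working_dir):
    """Average evaluation scores per (map, method), pooled over seeds/sessions.

    Assumes the layout working_dir/map/method/seed/expe_folder/evaluation/*.json,
    with each file holding {"mean_return": <float>} -- a hypothetical format.
    """
    scores = defaultdict(list)
    for f in Path(working_dir).glob("*/*/*/*/evaluation/*.json"):
        map_name, method = f.parts[-6], f.parts[-5]
        scores[(map_name, method)].append(json.loads(f.read_text())["mean_return"])
    return {key: mean(vals) for key, vals in scores.items()}

# Tiny synthetic tree to demonstrate the expected layout.
root = Path(tempfile.mkdtemp())
for seed, ret in [("42", 1.0), ("43", 3.0)]:
    d = root / "maze" / "rnd" / seed / "expe_0" / "evaluation"
    d.mkdir(parents=True)
    (d / "session_0.json").write_text(json.dumps({"mean_return": ret}))
print(aggregate_evaluations(root))  # {('maze', 'rnd'): 2.0}
```

Swap the glob pattern and the JSON key for whatever the evaluation files actually contain.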

© 2026 Ubisoft Entertainment. All Rights Reserved.
