Agent Training

SafeVerse is not only a real-to-sim reconstruction pipeline; it is also a closed-loop training platform that connects reconstructed digital twins, adversarial scene editing, and online reinforcement learning.

This section describes how to launch the Minecraft-based training environment and enable agents to evolve under dynamic attack scenarios.

Unlike traditional static benchmarks, SafeVerse supports a “Reconstruction → Attack → Immunity Evolution” workflow. Agents are trained directly inside high-fidelity digital twins reconstructed from real-world videos and continuously exposed to dynamically edited adversarial conditions.


1. Environment Preparation

1.1 Download Digital Twin Environments

SafeVerse training requires prebuilt Minecraft environments generated by the reconstruction pipeline.

Please download the environment package from:

https://huggingface.co/datasets/Thegun/SafeVerse_minecraft

After downloading:

  • Extract the dataset
  • Copy the contents into: server/env{0-31}/

Each folder corresponds to a reconstructed, interactive digital twin scene.

These environments are not static maps; they contain:

  • Physically interactive objects
  • Editable layouts
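After copying, a quick check can confirm that every scene folder is in place. The sketch below assumes that `server/env{0-31}/` means 32 folders named `env0` through `env31`; the helper name `missing_envs` is hypothetical and not part of SafeVerse.

```shell
# List which of the expected scene folders are missing under a server directory.
# Assumption: "env{0-31}" means 32 folders named env0 ... env31.
missing_envs() {
  server_dir=$1
  i=0
  while [ "$i" -le 31 ]; do
    # Print the folder name only when the directory does not exist.
    [ -d "$server_dir/env$i" ] || printf 'env%s\n' "$i"
    i=$((i + 1))
  done
}
```

Running `missing_envs ./server` from the repository root prints one line per missing folder and prints nothing when all 32 are present.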

1.2 Optional Visual Enhancement (Recommended)

Some furniture assets achieve better visual realism with the TMEO resource pack.

If you would like enhanced visual fidelity during training visualization or demonstrations, you may optionally purchase and install TMEO:

https://www.minegraph.cn/shaderpacks/15

This step is not mandatory for functionality but improves realism for qualitative evaluation and demos.


2. Server Setup

The Minecraft server acts as the interactive execution engine for SafeVerse.
It hosts reconstructed scenes and enables:

  • Real-time environment manipulation
  • Adversarial scene editing
  • Agent–environment interaction
  • Online RL feedback loop

2.1 Create Conda Environment

cd ./server
conda env create -f server_env.yml

2.2 Activate Environment

conda activate <your_env_name>
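The name to use for `<your_env_name>` is normally the one declared in the environment file, since `conda env create -f` takes the name from the file's top-level `name:` field unless one is given on the command line. The sketch below assumes `server_env.yml` has such a field; the helper `env_name_from_yml` is hypothetical.

```shell
# Print the value of the top-level "name:" field of a conda environment file.
# Assumption: the file declares the environment name on a line starting with "name:".
env_name_from_yml() {
  # Keep only the first whitespace-delimited token after "name:",
  # dropping any trailing comment.
  sed -n 's/^name:[[:space:]]*\([^[:space:]#]*\).*/\1/p' "$1" | head -n 1
}
```

In an interactive shell this could be used as `conda activate "$(env_name_from_yml server_env.yml)"`; `conda env list` shows the same name after the environment is created.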

2.3 Launch Server

cd ./server
bash start_server.sh

After startup, the server becomes the execution backend for training. Record the IP address of the server node; it must be set as SERVER_IP in the training configuration.
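One way to find the address to record on a Linux server node is to take the first non-loopback IPv4 address reported by `hostname -I`. This is a sketch under that assumption; the helper `first_ipv4` is hypothetical, and on hosts with several network interfaces the first address may not be the one reachable from the training machine.

```shell
# Print the first non-loopback IPv4 address from a whitespace-separated list.
# Returns non-zero if the list contains no such address.
first_ipv4() {
  for addr in $1; do
    case "$addr" in
      127.*) continue ;;                               # skip loopback
      *.*.*.*) printf '%s\n' "$addr"; return 0 ;;      # dotted IPv4 form
    esac
  done
  return 1
}

# Usage on Linux (hostname -I lists the host's addresses):
#   first_ipv4 "$(hostname -I)"
```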

3. Training Framework (Online Evolution)

SafeVerse enables online reinforcement learning within reconstructed real-world scenes.

Unlike traditional embodied training pipelines that rely on fixed datasets and static environments, SafeVerse introduces a continuously evolving training loop.

Agents trained in SafeVerse:

  • Operate inside reconstructed real-world scenes
  • Face dynamically edited adversarial conditions
  • Adapt through online reinforcement learning
  • Improve robustness under distribution shifts

Typical adversarial edits include:

  • Blocking critical navigation paths
  • Locking or modifying object interaction properties
  • Rearranging furniture and spatial layouts
  • Changing lighting or visibility conditions

This creates a dynamic curriculum instead of a closed-world benchmark.


3.1 Create Training Environment

cd ./train
conda env create -f train_env.yml

3.2 Activate Training Environment

After successfully creating the training environment, activate it:

conda activate <your_env_name>

Make sure this environment matches the dependencies specified in train_env.yml.

This environment contains all required libraries for reinforcement learning, environment interaction, and experiment logging.

3.3 Start Training

After the Minecraft server is running and the training environment is activated, you can launch the SafeVerse training pipeline:

cd ./train
bash train.sh

Required Configuration (Inside train.sh)

Before launching training, make sure the following variables are properly configured inside train.sh:

export SERVER_IP=""
  • The IP address of the running Minecraft server.
  • The training process connects to this server for real-time environment interaction.
BASE_NAME=""
  • Identifier for the current experiment.
  • Used to name:
    • Training logs
    • Model checkpoints
    • Weights & Biases (WandB) runs
export SAVE_CHECKPOINT_DIR=""
  • Directory where model checkpoints will be saved.
  • Ensure the path exists and has sufficient storage space.
DATA=""
  • Path to the dataset used for training initialization (if applicable).
  • This may include pretrained weights or task configuration files.
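A small pre-flight check can catch an empty variable or a missing checkpoint directory before a run starts. The sketch below is not part of train.sh; `check_train_config` is a hypothetical helper that treats SERVER_IP, BASE_NAME, and SAVE_CHECKPOINT_DIR as required and DATA as optional, following the "if applicable" note above.

```shell
# Verify the required training variables before launching.
# Assumption: DATA is optional, so it is not checked here.
check_train_config() {
  rc=0
  for name in SERVER_IP BASE_NAME SAVE_CHECKPOINT_DIR; do
    # Read the variable whose name is stored in $name (POSIX-compatible).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "unset or empty: $name" >&2
      rc=1
    fi
  done
  # The checkpoint directory must already exist.
  if [ -n "${SAVE_CHECKPOINT_DIR:-}" ] && [ ! -d "$SAVE_CHECKPOINT_DIR" ]; then
    echo "missing directory: $SAVE_CHECKPOINT_DIR" >&2
    rc=1
  fi
  return "$rc"
}
```

If pasted into train.sh after the variable assignments, a line such as `check_train_config || exit 1` would stop the script before training begins. The check does not test free disk space or whether the server is reachable.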