2 changes: 1 addition & 1 deletion .github/workflows/main.yml
@@ -75,7 +75,7 @@ jobs:
- uses: actions/checkout@v4
- name: Run tests
run: |
pip install -e . --extra-index-url http://pyp.open3dv.site:2345/simple/ --trusted-host pyp.open3dv.site
pip install -e .[lerobot] --extra-index-url http://pyp.open3dv.site:2345/simple/ --trusted-host pyp.open3dv.site
echo "Unit test Start"
export HF_ENDPOINT=https://hf-mirror.com
pip uninstall pymeshlab -y
3 changes: 1 addition & 2 deletions configs/agents/rl/push_cube/gym_config.json
@@ -113,9 +113,8 @@
}
},
"extensions": {
"obs_mode": "state",
"action_type": "delta_qpos",
"episode_length": 100,
"joint_limits": 0.5,
"action_scale": 0.1,
"success_threshold": 0.1
}
6 changes: 3 additions & 3 deletions configs/agents/rl/push_cube/train_config.json
@@ -4,15 +4,15 @@
"gym_config": "configs/agents/rl/push_cube/gym_config.json",
"seed": 42,
"device": "cuda:0",
"headless": false,
"headless": true,
"enable_rt": false,
"gpu_id": 0,
"num_envs": 8,
"num_envs": 64,
"iterations": 1000,
"rollout_steps": 1024,
"eval_freq": 200,
"save_freq": 200,
"use_wandb": true,
"use_wandb": false,
"wandb_project_name": "embodychain-push_cube",
"events": {
"eval": {
104 changes: 80 additions & 24 deletions docs/source/overview/gym/env.md
@@ -5,14 +5,19 @@

The {class}`~envs.EmbodiedEnv` is the core environment class in EmbodiChain designed for complex Embodied AI tasks. It adopts a **configuration-driven** architecture, allowing users to define robots, sensors, objects, lighting, and automated behaviors (events) purely through configuration classes, minimizing the need for boilerplate code.

For **Reinforcement Learning** tasks, EmbodiChain provides {class}`~envs.RLEnv`, a specialized subclass that extends {class}`~envs.EmbodiedEnv` with RL-specific utilities such as flexible action preprocessing, goal management, and standardized info structure.

## Core Architecture

Unlike the standard {class}`~envs.BaseEnv`, the {class}`~envs.EmbodiedEnv` integrates several manager systems to handle the complexity of simulation:
EmbodiChain provides a hierarchy of environment classes for different task types:

* **Scene Management**: Automatically loads and manages robots, sensors, and scene objects defined in the configuration.
* **Event Manager**: Handles automated behaviors such as domain randomization, scene setup, and dynamic asset swapping.
* **Observation Manager**: Allows flexible extension of observation spaces without modifying the environment code.
* **Dataset Manager**: Built-in support for collecting demonstration data during simulation steps.
* **{class}`~envs.BaseEnv`**: Minimal environment for simple tasks with custom simulation logic.
* **{class}`~envs.EmbodiedEnv`**: Feature-rich environment for Embodied AI tasks (IL, custom control). Integrates manager systems:
* **Scene Management**: Automatically loads and manages robots, sensors, and scene objects.
* **Event Manager**: Domain randomization, scene setup, and dynamic asset swapping.
* **Observation Manager**: Flexible observation space extensions.
* **Dataset Manager**: Built-in support for demonstration data collection.
* **{class}`~envs.RLEnv`**: Specialized environment for RL tasks, extending {class}`~envs.EmbodiedEnv` with action preprocessing, goal management, and standardized reward/info structure.

## Configuration System

@@ -77,7 +82,7 @@ The {class}`~envs.EmbodiedEnvCfg` class exposes the following additional paramet
Dataset collection settings. Defaults to None, in which case no dataset collection is performed. Please refer to the {class}`~envs.managers.DatasetManager` class for more details.

* **extensions** (Union[Dict[str, Any], None]):
Task-specific extension parameters that are automatically bound to the environment instance. This allows passing custom parameters (e.g., ``episode_length``, ``obs_mode``, ``action_scale``) without modifying the base configuration class. These parameters are accessible as instance attributes after environment initialization. For example, if ``extensions = {"episode_length": 500}``, you can access it via ``self.episode_length``. Defaults to None.
Task-specific extension parameters that are automatically bound to the environment instance. This allows passing custom parameters (e.g., ``episode_length``, ``action_type``, ``action_scale``) without modifying the base configuration class. These parameters are accessible as instance attributes after environment initialization. For example, if ``extensions = {"episode_length": 500}``, you can access it via ``self.episode_length``. Defaults to None.

* **filter_visual_rand** (bool):
Whether to filter out visual randomization functors. Useful for debugging motion and physics issues when visual randomization interferes with the debugging process. Defaults to ``False``.
@@ -108,7 +113,8 @@ class MyTaskEnvCfg(EmbodiedEnvCfg):
    # 4. Task Extensions
    extensions = {  # Task-specific parameters
        "episode_length": 500,
        "obs_mode": "state",
        "action_type": "delta_qpos",
        "action_scale": 0.1,
    }
```
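
After initialization, each extension key becomes an attribute of the environment instance. A minimal sketch of reading them back, assuming a hypothetical ``MyTaskEnv`` environment class built from the config above:

```python
env = MyTaskEnv(cfg=MyTaskEnvCfg())

# Extension parameters are bound as instance attributes during initialization.
print(env.episode_length)  # 500
print(env.action_type)     # "delta_qpos"
print(env.action_scale)    # 0.1
```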

@@ -165,54 +171,104 @@ The manager operates in a single mode ``"save"`` which handles both recording an

The dataset manager is called automatically during {meth}`~envs.Env.step()`, ensuring all observation-action pairs are recorded without additional user code.

## Reinforcement Learning Environment

For RL tasks, EmbodiChain provides {class}`~envs.RLEnv`, a specialized base class that extends {class}`~envs.EmbodiedEnv` with RL-specific utilities:

* **Action Preprocessing**: Flexible action transformation supporting delta_qpos, absolute qpos, joint velocity, joint force, and end-effector pose (with IK).
* **Goal Management**: Built-in goal pose tracking and visualization with axis markers.
* **Standardized Info Structure**: Template methods for computing task-specific success/failure conditions and metrics.
* **Episode Management**: Configurable episode length and truncation logic.

### Configuration Extensions for RL

RL environments use the ``extensions`` field to pass task-specific parameters:

```python
extensions = {
    "action_type": "delta_qpos",   # Action type: delta_qpos, qpos, qvel, qf, eef_pose
    "action_scale": 0.1,           # Scaling factor applied to all actions
    "episode_length": 100,         # Maximum episode length
    "success_threshold": 0.1,      # Task-specific success threshold (optional)
}
```
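
The configured ``action_type`` also determines the key of the action dictionary passed to {meth}`~envs.Env.step()`; the PPO rollout, for example, wraps the policy output as ``{action_type: actions}`` before stepping the environment. A minimal sketch of stepping an already-constructed ``RLEnv`` with these extensions; the joint dimension below is a placeholder, and ``num_envs`` and ``action_type`` are assumed to be exposed as instance attributes (the latter is bound from ``extensions``):

```python
import torch

num_dofs = 7  # hypothetical joint dimension of the configured robot
actions = torch.zeros(env.num_envs, num_dofs)  # one action row per parallel environment

# Actions are passed as a dict keyed by the configured action type.
obs, reward, terminated, truncated, info = env.step({env.action_type: actions})
```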

## Creating a Custom Task

To create a new task, inherit from {class}`~envs.EmbodiedEnv` and implement the task-specific logic.
### For Reinforcement Learning Tasks

Inherit from {class}`~envs.RLEnv` and implement the task-specific logic:

```python
from embodichain.lab.gym.envs import RLEnv, EmbodiedEnvCfg
from embodichain.lab.gym.utils.registration import register_env
import torch

@register_env("MyRLTask-v0", max_episode_steps=100)
class MyRLTaskEnv(RLEnv):
    def __init__(self, cfg: MyTaskEnvCfg, **kwargs):
        super().__init__(cfg, **kwargs)

    def compute_task_state(self, **kwargs):
        # Required: Compute task-specific success/failure and metrics
        # Returns: Tuple[success, fail, metrics]
        #   - success: torch.Tensor of shape (num_envs,) with boolean values
        #   - fail: torch.Tensor of shape (num_envs,) with boolean values
        #   - metrics: Dict of metric tensors for logging

        is_success = ...  # Compute success condition
        is_fail = torch.zeros_like(is_success)
        metrics = {"distance": ..., "angle_error": ...}

        return is_success, is_fail, metrics

    def check_truncated(self, obs, info):
        # Optional: Override to add custom truncation conditions
        # Default: episode_length timeout
        is_timeout = super().check_truncated(obs, info)
        is_fallen = ...  # Custom condition (e.g., robot fell)
        return is_timeout | is_fallen
```

Configure rewards through the {class}`~envs.managers.RewardManager` in your environment config rather than overriding ``get_reward``.

### For Imitation Learning Tasks

Inherit from {class}`~envs.EmbodiedEnv` for IL tasks:

```python
from embodichain.lab.gym.envs import EmbodiedEnv, EmbodiedEnvCfg
from embodichain.lab.gym.utils.registration import register_env

@register_env("MyTask-v0", max_episode_steps=500)
class MyTaskEnv(EmbodiedEnv):
@register_env("MyILTask-v0", max_episode_steps=500)
class MyILTaskEnv(EmbodiedEnv):
    def __init__(self, cfg: MyTaskEnvCfg, **kwargs):
        super().__init__(cfg, **kwargs)

    def create_demo_action_list(self, *args, **kwargs):
        # Optional: Implement for expert demonstration data generation (for Imitation Learning)
        # This method is used to generate scripted demonstrations for IL data collection.
        # Required: Generate scripted demonstrations for data collection
        # Must set self.action_length = len(action_list) if returning actions
        pass

    def is_task_success(self, **kwargs):
        # Optional: Define success criteria (mainly for IL data collection)
        # Required: Define success criteria for filtering successful episodes
        # Returns: torch.Tensor of shape (num_envs,) with boolean values
        return success_tensor

    def get_reward(self, obs, action, info):
        # Optional: Override for RL tasks
        # Returns: torch.Tensor of shape (num_envs,)
        return super().get_reward(obs, action, info)

    def get_info(self, **kwargs):
        # Optional: Override to add custom info fields
        # Should include "success" and "fail" keys for termination
        info = super().get_info(**kwargs)
        info["custom_metric"] = ...
        return info
```

```{note}
The {meth}`~envs.EmbodiedEnv.create_demo_action_list` method is specifically designed for expert demonstration data generation in Imitation Learning scenarios. For Reinforcement Learning tasks, you should override the {meth}`~envs.EmbodiedEnv.get_reward` method instead.
```

For a complete example of a modular environment setup, please refer to the {ref}`tutorial_modular_env` tutorial.

## See Also

- {ref}`tutorial_create_basic_env` - Creating basic environments
- {ref}`tutorial_modular_env` - Advanced modular environment setup
- {doc}`/api_reference/embodichain/embodichain.lab.gym.envs` - Complete API reference for EmbodiedEnv and EmbodiedEnvCfg
- {ref}`tutorial_rl` - Reinforcement learning training guide
- {doc}`/api_reference/embodichain/embodichain.lab.gym.envs` - Complete API reference for EmbodiedEnv, RLEnv, and configurations

```{toctree}
:maxdepth: 1
70 changes: 56 additions & 14 deletions docs/source/tutorial/rl.rst
@@ -78,6 +78,13 @@ The ``env`` section defines the task environment:
- **id**: Environment registry ID (e.g., "PushCubeRL")
- **cfg**: Environment-specific configuration parameters

For RL environments (inheriting from ``RLEnv``), use the ``extensions`` field for RL-specific parameters:

- **action_type**: Action type - "delta_qpos" (default), "qpos", "qvel", "qf", "eef_pose"
- **action_scale**: Scaling factor applied to all actions (default: 1.0)
- **episode_length**: Maximum episode length (default: 1000)
- **success_threshold**: Task-specific success threshold (optional)

Example:

.. code-block:: json
@@ -86,10 +93,12 @@
"id": "PushCubeRL",
"cfg": {
"num_envs": 4,
"obs_mode": "state",
"episode_length": 100,
"action_scale": 0.1,
"success_threshold": 0.1
"extensions": {
"action_type": "delta_qpos",
"action_scale": 0.1,
"episode_length": 100,
"success_threshold": 0.1
}
}
}

@@ -321,41 +330,74 @@ Adding a New Environment

To add a new RL environment:

1. Create an environment class inheriting from ``EmbodiedEnv``
2. Register it with the Gymnasium registry:
1. Create an environment class inheriting from ``RLEnv`` (which provides action preprocessing, goal management, and standardized info structure):

.. code-block:: python

    from embodichain.lab.gym.envs import RLEnv, EmbodiedEnvCfg
    from embodichain.lab.gym.utils.registration import register_env
    import torch

    @register_env("MyTaskRL", max_episode_steps=100, override=True)
    class MyTaskEnv(EmbodiedEnv):
        cfg: MyTaskEnvCfg
        ...
    class MyTaskEnv(RLEnv):
        def __init__(self, cfg: EmbodiedEnvCfg = None, **kwargs):
            super().__init__(cfg, **kwargs)

        def compute_task_state(self, **kwargs):
            """Compute success/failure conditions and metrics."""
            is_success = ...  # Define success condition
            is_fail = torch.zeros_like(is_success)
            metrics = {"distance": ..., "error": ...}
            return is_success, is_fail, metrics

        def check_truncated(self, obs, info):
            """Optional: Add custom truncation conditions."""
            is_timeout = super().check_truncated(obs, info)
            # Add custom conditions if needed
            return is_timeout

3. Use the environment ID in your JSON config:
2. Configure the environment in your JSON config with RL-specific extensions:

.. code-block:: json

"env": {
"id": "MyTaskRL",
"cfg": {
...
"num_envs": 4,
"extensions": {
"action_type": "delta_qpos",
"action_scale": 0.1,
"episode_length": 100,
"success_threshold": 0.05
}
}
}

The ``RLEnv`` base class provides:

- **Action Preprocessing**: Automatically handles different action types (delta_qpos, qpos, qvel, qf, eef_pose)
- **Action Scaling**: Applies ``action_scale`` to all actions
- **Goal Management**: Built-in goal pose tracking and visualization
- **Standardized Info**: Implements ``get_info()`` using the ``compute_task_state()`` template method (see the sketch below)
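
A minimal sketch of the per-step ``info`` layout implied by the template methods and the best practices below; the concrete metric and reward-component names are hypothetical:

.. code-block:: python

    import torch

    num_envs = 4
    info = {
        "success": torch.zeros(num_envs, dtype=torch.bool),   # from compute_task_state()
        "fail": torch.zeros(num_envs, dtype=torch.bool),      # from compute_task_state()
        "metrics": {"distance": torch.zeros(num_envs)},       # logged densely by the trainer
        "rewards": {"reach": torch.zeros(num_envs)},          # individual reward components
    }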

Best Practices
~~~~~~~~~~~~~~

- **Device Management**: Device is single-sourced from ``runtime.cuda``. All components (trainer/algorithm/policy/env) share the same device.
- **Use RLEnv for RL Tasks**: Always inherit from ``RLEnv`` for reinforcement learning tasks. It provides action preprocessing, goal management, and standardized info structure out of the box.

- **Action Type Configuration**: Configure ``action_type`` in the environment's ``extensions`` field. The default is "delta_qpos" (incremental joint positions). Other options: "qpos" (absolute), "qvel" (velocity), "qf" (force), "eef_pose" (end-effector pose with IK).

- **Action Scaling**: Keep action scaling in the environment, not in the policy.
- **Action Scaling**: Use ``action_scale`` in the environment's ``extensions`` field to scale actions. This is applied in ``RLEnv._preprocess_action()`` before robot control.

- **Device Management**: Device is single-sourced from ``runtime.cuda``. All components (trainer/algorithm/policy/env) share the same device.

- **Observation Format**: Environments should provide consistent observation shape/types (torch.float32) and a single ``done = terminated | truncated``.

- **Algorithm Interface**: Algorithms must implement ``initialize_buffer()``, ``collect_rollout()``, and ``update()`` methods. The algorithm completely controls data collection and buffer management. A minimal skeleton is sketched after this list.

- **Reward Components**: Organize reward components in ``info["rewards"]`` dictionary and metrics in ``info["metrics"]`` dictionary. The trainer performs dense per-step logging directly from environment info.
- **Reward Configuration**: Use the ``RewardManager`` in your environment config to define reward components. Organize reward components in ``info["rewards"]`` dictionary and metrics in ``info["metrics"]`` dictionary. The trainer performs dense per-step logging directly from environment info.

- **Template Methods**: Override ``compute_task_state()`` to define success/failure conditions and metrics. Override ``check_truncated()`` for custom truncation logic.

- **Configuration**: Use JSON for all hyperparameters. This makes experiments reproducible and easy to track.
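
A minimal skeleton of that algorithm interface; the method names follow the list above, while the argument lists and bodies are assumptions:

.. code-block:: python

    class MyAlgorithm:
        """Sketch only: the trainer calls these three methods."""

        def initialize_buffer(self) -> None:
            # Allocate whatever rollout storage the algorithm owns.
            self.buffer = []

        def collect_rollout(self, env, policy) -> dict:
            # The algorithm controls data collection, e.g. stepping the env
            # with {action_type: actions} action dictionaries.
            return {}

        def update(self) -> dict:
            # Consume the buffer and return statistics for logging.
            return {}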

6 changes: 5 additions & 1 deletion embodichain/agents/rl/algo/ppo.py
@@ -94,8 +94,12 @@ def collect_rollout(
current_obs, deterministic=False
)

# Wrap action as dict for env processing
action_type = getattr(env, "action_type", "delta_qpos")
action_dict = {action_type: actions}

# Step environment
result = env.step(actions)
result = env.step(action_dict)
next_obs, reward, terminated, truncated, env_info = result
done = terminated | truncated
# Light dtype normalization
34 changes: 31 additions & 3 deletions embodichain/agents/rl/train.py
@@ -39,12 +39,20 @@
from embodichain.lab.gym.envs.managers.cfg import EventCfg


def main():
def parse_args():
"""Parse command line arguments."""
parser = argparse.ArgumentParser()
parser.add_argument("--config", type=str, required=True, help="Path to JSON config")
args = parser.parse_args()
return parser.parse_args()


def train_from_config(config_path: str):
"""Run training from a config file path.

with open(args.config, "r") as f:
Args:
config_path: Path to the JSON config file
"""
with open(config_path, "r") as f:
cfg_json = json.load(f)

trainer_cfg = cfg_json["trainer"]
@@ -274,8 +282,28 @@ def main():
wandb.finish()
except Exception:
pass

# Clean up environments to prevent resource leaks
try:
if env is not None:
env.close()
except Exception as e:
logger.log_warning(f"Failed to close training environment: {e}")

try:
if eval_env is not None:
eval_env.close()
except Exception as e:
logger.log_warning(f"Failed to close evaluation environment: {e}")

logger.log_info("Training finished")


def main():
"""Main entry point for command-line training."""
args = parse_args()
train_from_config(args.config)


if __name__ == "__main__":
main()
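
With ``main()`` split into ``parse_args()`` and ``train_from_config()``, training can also be launched programmatically. A minimal sketch; the module path is inferred from ``embodichain/agents/rl/train.py`` and the config path matches the example config above:

```python
from embodichain.agents.rl.train import train_from_config

# Equivalent to: python embodichain/agents/rl/train.py --config configs/agents/rl/push_cube/train_config.json
train_from_config("configs/agents/rl/push_cube/train_config.json")
```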