Skip to content

Conversation

@yangchen73
Copy link
Collaborator

Description

This PR migrates EmbodiAgent, a LLM-based agent system for autonomous robot task execution in EmbodiChain. The system enables robots to perceive, plan, code, execute, and validate complex manipulation tasks through a closed-loop control cycle, which is developed by https://github.com/Jasonxu1225 .

Key Features

Three specialized agents working in coordination:

  • TaskAgent: High-level reasoning and task decomposition from visual observations
  • CodeAgent: Translates natural language plans into executable Python code using atomic actions
  • ValidationAgent: Validates task execution and provides feedback for refinement

Key Related Files

  • Core agent implementation: embodichain/agents/hierarchy/
  • Agent environments: embodichain/lab/gym/envs/tasks/tableware/
  • Toolkit interfaces: embodichain/toolkits/interfaces.py
  • Motion generation: embodichain/lab/gym/motion_generation/
  • Documentation: embodichain/agents/README.md

Type of change

  • New feature (non-breaking change which adds functionality)

Screenshots

Pour Water

pour_waterr.mp4

Rearrangement

rearrangement.mp4

Dual Arm Pour Water

pour_water_dual.mp4

Checklist

  • I have run the black . command to format the code base.

  • I have made corresponding changes to the documentation

  • I have added tests that prove my fix is effective or that my feature works

  • Dependencies have been updated, if applicable.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 38 out of 39 changed files in this pull request and generated 51 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 17 to 40
if agent_config.get("TaskAgent") is not None:
self.task_agent = TaskAgent(
task_llm,
**agent_config["Agent"],
**agent_config["TaskAgent"],
task_name=task_name,
)
self.code_agent = CodeAgent(
code_llm,
**agent_config["Agent"],
**agent_config.get("CodeAgent"),
task_name=task_name,
)
self.validation_agent = ValidationAgent(
validation_llm,
task_name=task_name,
task_description=self.code_agent.prompt_kwargs.get("task_prompt")[
"content"
],
basic_background=self.code_agent.prompt_kwargs.get("basic_background")[
"content"
],
atom_actions=self.code_agent.prompt_kwargs.get("atom_actions")["content"],
)
Copy link

Copilot AI Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The agent initialization in _init_agents is missing error handling when agent_config does not contain expected keys. If agent_config doesn't have 'Agent', 'TaskAgent', or 'CodeAgent' keys, this will raise a KeyError. Consider adding validation or default values to handle missing configuration gracefully.

Copilot uses AI. Check for mistakes.
Comment on lines +260 to +280
class InjectKwargs(ast.NodeTransformer):
def visit_Call(self, node):
self.generic_visit(node)
# Inject **kwargs if not present
has_kwargs = any(
kw.arg is None
and isinstance(kw.value, ast.Name)
and kw.value.id == "kwargs"
for kw in node.keywords
)
if not has_kwargs:
node.keywords.append(
ast.keyword(
arg=None, value=ast.Name(id="kwargs", ctx=ast.Load())
)
)
return node

# Transform AST to inject kwargs
tree = InjectKwargs().visit(tree)
ast.fix_missing_locations(tree)
Copy link

Copilot AI Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The AST injection of kwargs into function calls (InjectKwargs transformer) modifies all function calls globally within the code, not just drive() calls. This could have unintended side effects if the generated code calls other functions that don't expect **kwargs. Consider making this more targeted to only inject kwargs into specific function calls like drive() and atomic action functions.

Copilot uses AI. Check for mistakes.
Comment on lines +245 to +265
def extract_drive_calls(code_str: str) -> list[str]:
tree = ast.parse(code_str)
lines = code_str.splitlines()

drive_blocks = []

for node in tree.body:
# Match: drive(...)
if (
isinstance(node, ast.Expr)
and isinstance(node.value, ast.Call)
and isinstance(node.value.func, ast.Name)
and node.value.func.id == "drive"
):
# AST line numbers are 1-based
start = node.lineno - 1
end = node.end_lineno
block = "\n".join(lines[start:end])
drive_blocks.append(block)

return drive_blocks
Copy link

Copilot AI Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function extract_drive_calls uses ast.parse on user-provided code_str without any validation or error handling. If the code string is malformed or contains syntax errors, this will raise an exception. Add try-except block to handle parsing errors gracefully.

Copilot uses AI. Check for mistakes.
Comment on lines 1149 to 1217
def drive(
left_arm_action=None,
right_arm_action=None,
env=None,
**kwargs,
):

if left_arm_action is not None and right_arm_action is not None:
len_left = len(left_arm_action)
len_right = len(right_arm_action)

if len_left < len_right:
diff = len_right - len_left
padding = np.repeat(left_arm_action[-1:], diff, axis=0)
left_arm_action = np.concatenate([left_arm_action, padding], axis=0)
elif len_right < len_left:
diff = len_left - len_right
padding = np.repeat(right_arm_action[-1:], diff, axis=0)
right_arm_action = np.concatenate([right_arm_action, padding], axis=0)

left_arm_index = env.left_arm_joints + env.left_eef_joints
right_arm_index = env.right_arm_joints + env.right_eef_joints
actions = np.zeros((len(right_arm_action), len(env.init_qpos)))
actions[:, left_arm_index] = left_arm_action
actions[:, right_arm_index] = right_arm_action

elif left_arm_action is None and right_arm_action is not None:
left_arm_index = env.left_arm_joints + env.left_eef_joints
right_arm_index = env.right_arm_joints + env.right_eef_joints
left_arm_action = finalize_actions(
env.left_arm_current_qpos, env.left_arm_current_gripper_state
)
left_arm_action = np.repeat(
left_arm_action[None, :], len(right_arm_action), axis=0
)

actions = np.zeros(
(len(right_arm_action), len(env.robot.get_qpos().squeeze(0))),
dtype=np.float32,
)
actions[:, left_arm_index] = left_arm_action
actions[:, right_arm_index] = right_arm_action

elif right_arm_action is None and left_arm_action is not None:
left_arm_index = env.left_arm_joints + env.left_eef_joints
right_arm_index = env.right_arm_joints + env.right_eef_joints
right_arm_action = finalize_actions(
env.right_arm_current_qpos, env.right_arm_current_gripper_state
)
right_arm_action = np.repeat(
right_arm_action[None, :], len(left_arm_action), axis=0
)

actions = np.zeros(
(len(left_arm_action), len(env.robot.get_qpos().squeeze(0))),
dtype=np.float32,
)
actions[:, left_arm_index] = left_arm_action
actions[:, right_arm_index] = right_arm_action

else:
log_error("At least one arm action should be provided.")

actions = torch.from_numpy(actions).to(dtype=torch.float32).unsqueeze(1)
actions = list(actions.unbind(dim=0))
for i in tqdm(range(len(actions))):
action = actions[i]
obs, reward, terminated, truncated, info = env.step(action)
return actions
Copy link

Copilot AI Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The drive() function modifies environment state via env.step() in a loop without any error handling. If step() raises an exception (e.g., due to invalid action), the loop will terminate abruptly. Additionally, the return value 'actions' is a list of tensors but the function documentation doesn't specify this return type. Add error handling and document the return type.

Copilot uses AI. Check for mistakes.
Comment on lines +237 to +242
response = view_selection_llm.invoke(messages).content.strip()

if response not in img_dirs:
raise ValueError(f"Invalid camera selection from LLM: {response}")

return response
Copy link

Copilot AI Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The validation_llm.invoke() call in select_best_view_dir doesn't have any error handling. If the LLM call fails or times out, this will raise an unhandled exception. Additionally, if the LLM returns an unexpected value (not in img_dirs), a ValueError is raised, but this might be too strict - consider logging and falling back to a default camera instead.

Copilot uses AI. Check for mistakes.
import time
import cv2
import glob
import json
Copy link

Copilot AI Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import of 'json' is not used.

Suggested change
import json

Copilot uses AI. Check for mistakes.
# ----------------------------------------------------------------------------

import torch
import numpy as np
Copy link

Copilot AI Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import of 'np' is not used.

Suggested change
import numpy as np

Copilot uses AI. Check for mistakes.
ChatPromptTemplate,
HumanMessagePromptTemplate,
)
from embodichain.utils.utility import encode_image, encode_image_from_path
Copy link

Copilot AI Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import of 'encode_image_from_path' is not used.

Suggested change
from embodichain.utils.utility import encode_image, encode_image_from_path
from embodichain.utils.utility import encode_image

Copilot uses AI. Check for mistakes.

from abc import ABCMeta, abstractmethod
import os
import cv2
Copy link

Copilot AI Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import of 'cv2' is not used.

Copilot uses AI. Check for mistakes.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@yangchen73
Copy link
Collaborator Author

@copilot open a new pull request to apply changes based on the comments in this thread

Copilot AI mentioned this pull request Jan 22, 2026
22 tasks
Copy link
Contributor

Copilot AI commented Jan 22, 2026

@yangchen73 I've opened a new pull request, #86, to work on those changes. Once the pull request is ready, I'll request review from you.

@@ -0,0 +1,31 @@
{ "TaskAgent": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change the folder into pour_water_agent

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. I'v renamed this folder.

@@ -0,0 +1,97 @@
# EmbodiAgent System
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add agent related docs into docs/features folder

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. I'll rewrite the docs and move it to docs/features.


```bash
# Set environment variables
export AZURE_OPENAI_ENDPOINT="[https://your-endpoint.openai.azure.com/](https://your-endpoint.openai.azure.com/)"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add docs to show how to change different LLM/VLM API

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I'll add it into the docs which discribe agent system.

@@ -0,0 +1,5 @@
Task:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move prompt file to configs/gym/agent/{specific_task} would be better

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggustion.

@@ -0,0 +1,136 @@
### Atom Functions for Robot Arm Control
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

database folder has been removed. Put these reusable prompts into agent folder

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved it

return "{}_{}_qpos".format(scope, tag)


def get_control_part(env, agent_uid):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These teo functions seems useless. Please removed

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

return dofs > 10


class HandQposNormalizer:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove useless objects in this file

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: yangchen73 <115123709+yangchen73@users.noreply.github.com>
)

# Run main function
main(args, env, gym_config)
Copy link
Collaborator

@yhnsu yhnsu Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from .run_env import main instead of rewrite it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

code_file_path, kwargs, _ = self.generate_code_for_actions(
regenerate=regenerate
)
action_list = self.code_agent.act(code_file_path, **kwargs)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to define self.action_length = len(action_list) for truncate detection inside the environment

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'v defined it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agent Features related to agentic system gym robot learning env and its related features

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants