[WIP] VLN Benchmark: H1 Navigation in Matterport 3D with NaVILA VLM #446

Open: kanghui0204 wants to merge 1 commit into main from feature/vln-benchmark

@kanghui0204 (Collaborator)

Summary

Add Vision-Language Navigation (VLN) benchmark support to IsaacLab Arena, enabling H1 humanoid navigation in Matterport 3D indoor scenes using the NaVILA VLM.

This is a draft PR for early review and feedback. The core pipeline is functional and tested, but some areas may need refinement before merging.

Architecture

Two-level hierarchical policy:

  • High-level: NaVILA VLM generates velocity commands from RGB image history
  • Low-level: RSL-RL locomotion policy converts velocity commands to joint actions

Communication between Isaac Sim (client) and NaVILA (server) uses Arena's ZeroMQ remote-policy framework (merged in #394).
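
The two-level split described above can be sketched as a small control loop. This is only an illustration under stated assumptions, not the code in this PR: `TwoLevelPolicy`, `replan_every`, the `(vx, vy, yaw_rate)` command layout, and the convention that the high-level callable returns `None` for "stop" are all hypothetical. In the real pipeline the high-level call would go over the ZeroMQ remote-policy link to the NaVILA server, and the low-level call would be the RSL-RL locomotion policy.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Assumed command layout: (forward velocity, lateral velocity, yaw rate).
Velocity = Tuple[float, float, float]


class TwoLevelPolicy:
    """Hypothetical composite: a slow high-level planner over a fast low-level controller."""

    def __init__(
        self,
        high_level: Callable[[Sequence[object], str], Optional[Velocity]],
        low_level: Callable[[object, Velocity], List[float]],
        replan_every: int = 10,  # assumed: query the VLM only every N control steps
    ) -> None:
        self.high_level = high_level
        self.low_level = low_level
        self.replan_every = replan_every
        self.frames: List[object] = []  # RGB history for the whole episode
        self.command: Velocity = (0.0, 0.0, 0.0)
        self.stopped = False
        self.step_count = 0

    def step(self, rgb: object, proprio: object, instruction: str) -> List[float]:
        """Record the new frame, replan if due, and return joint actions."""
        self.frames.append(rgb)
        if not self.stopped and self.step_count % self.replan_every == 0:
            result = self.high_level(self.frames, instruction)
            if result is None:
                # The planner signalled "stop": hold a zero velocity command.
                self.stopped = True
                self.command = (0.0, 0.0, 0.0)
            else:
                self.command = result
        self.step_count += 1
        # The low-level policy always runs, tracking the latest command.
        return self.low_level(proprio, self.command)
```

Keeping the latest velocity command between high-level queries lets the locomotion policy run at every control step even though VLM inference is much slower.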

What's Included

| Component | Location | Description |
| --- | --- | --- |
| H1 Embodiment | isaaclab_arena/embodiments/h1/ | Standard H1 + VLN extension (cameras, observations) |
| VLN Task | isaaclab_arena/tasks/vln_r2r_matterport_task.py | R2R episode management, scene filtering, termination |
| VLN Metrics | isaaclab_arena/metrics/vln_metrics.py | SPL, Success, PathLength, DistanceToGoal |
| Matterport Background | isaaclab_arena/assets/matterport_background.py | Scene loading + lighting + ground plane |
| Client Policy | isaaclab_arena/policy/vln/ | VlnVlmLocomotionPolicy (VLM + RSL-RL composite) |
| NaVILA Server | isaaclab_arena_navila/ | NaVilaServerPolicy (LLaVA-based VLM inference) |
| Environment | isaaclab_arena_environments/vln_environment.py | h1_vln_matterport environment registration |
| Docker | docker/Dockerfile.vln_server, docker/run_vln_server.sh | VLM server container |
| Pretrained LL model | isaaclab_arena/policy/vln/pretrained/ | H1 locomotion checkpoint (4.7 MB) |

Key Design Decisions

  • Full image history + uniform sampling: The VLM receives 8 uniformly sampled frames from the entire episode history (not just the last 8). This enables the VLM to determine task completion and output "stop".
  • Scene filtering: Episodes are automatically filtered by the loaded USD scene. --episode_start/end refers to indices within the filtered set.
  • XY metrics: Distance calculations use horizontal (XY) plane only, because robot pelvis height (~0.9m) differs from dataset waypoint height (~0.17m floor level).
  • Modular server: NaVILA server is in a separate package (isaaclab_arena_navila/), following the isaaclab_arena_gr00t pattern. Other VLMs can be added without changing client code.
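
The first two decisions above (uniform sampling over the full history, and indexing into the scene-filtered episode set) can be illustrated with a minimal sketch. The function names, the `scene_id` episode field, and the choice to return short histories unpadded are assumptions made for this example; the actual task and server code may pad or index differently.

```python
from typing import Dict, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def sample_uniform(history: Sequence[T], num_frames: int = 8) -> List[T]:
    """Pick num_frames evenly spaced items spanning the whole history.

    The first and the most recent frame are always included once the history
    is longer than num_frames (an assumption for this sketch).
    """
    n = len(history)
    if n <= num_frames:
        return list(history)  # short history: returned as-is, no padding
    if num_frames == 1:
        return [history[-1]]
    # Integer arithmetic keeps the index choice deterministic.
    indices = [(i * (n - 1)) // (num_frames - 1) for i in range(num_frames)]
    return [history[i] for i in indices]


def filter_episodes_by_scene(
    episodes: Sequence[Dict],
    scene_id: str,
    start: int = 0,
    end: Optional[int] = None,
) -> List[Dict]:
    """Keep episodes for one scene, then slice within the filtered list."""
    matching = [ep for ep in episodes if ep.get("scene_id") == scene_id]
    return matching[start:end]


if __name__ == "__main__":
    # 100 frames of history reduced to 8 that cover the whole episode.
    print(sample_uniform(list(range(100))))
```

Sampling across the whole episode, rather than taking the last 8 frames, is what gives the model a view of overall progress, which the description above ties to its ability to output "stop".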

Test Results

| Episodes | Success | SPL | Avg Distance-to-Goal |
| --- | --- | --- | --- |
| 10 (zsNo4HB9uLZ) | 0.40 | 0.36 | 6.17 m |
| 3 (zsNo4HB9uLZ) | 0.67 | 0.59 | 5.41 m |
| 1 (best case) | 1.00 | 0.77 | 0.77 m |

VLM correctly outputs "stop" when task is complete (e.g., "I think I should stop because I have finished the instruction.").
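
For readers unfamiliar with the metrics in the table, here is a minimal sketch of how per-episode Success, SPL, PathLength, and DistanceToGoal could be computed with XY-only distances. It follows the standard SPL definition (success weighted by shortest path length over the larger of the shortest and the actual path length). The function names and the 3.0 m `success_radius` are assumptions for this example and are not taken from vln_metrics.py.

```python
import math
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, z); z is ignored by the metrics


def xy_distance(a: Point, b: Point) -> float:
    """Horizontal distance, so pelvis height vs. floor-level waypoints does not matter."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def path_length_xy(positions: Sequence[Point]) -> float:
    """Sum of horizontal segment lengths along the executed trajectory."""
    return sum(xy_distance(p, q) for p, q in zip(positions, positions[1:]))


def episode_metrics(
    positions: Sequence[Point],
    goal: Point,
    shortest_path_length: float,
    success_radius: float = 3.0,  # assumed threshold, in metres
) -> Dict[str, float]:
    """Compute per-episode metrics; averaging over episodes gives the table values."""
    distance_to_goal = xy_distance(positions[-1], goal)
    success = 1.0 if distance_to_goal <= success_radius else 0.0
    path_length = path_length_xy(positions)
    denom = max(path_length, shortest_path_length)
    spl = success * shortest_path_length / denom if denom > 0 else 0.0
    return {
        "success": success,
        "spl": spl,
        "path_length": path_length,
        "distance_to_goal": distance_to_goal,
    }


if __name__ == "__main__":
    # Robot positions at pelvis height, goal at floor level.
    trajectory = [(0.0, 0.0, 0.9), (3.0, 0.0, 0.9), (3.0, 4.0, 0.9)]
    print(episode_metrics(trajectory, goal=(3.0, 5.0, 0.17), shortest_path_length=5.0))
```

Because SPL multiplies by the success flag, a failed episode contributes zero regardless of how efficient its path was, so the averaged SPL can never exceed the averaged Success.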

Known Limitations

  • num_envs must be 1 (multi-env VLM instruction tracking not yet implemented)
  • Scene switching requires process restart
  • Uses invisible ground plane instead of Matterport mesh collision (GPU physics limitation)
  • Pretrained checkpoint included for early testing; will be removed before final merge

How to Test

See isaaclab_arena_navila/README.md for full setup instructions (English + Chinese).

Server

bash docker/run_vln_server.sh -m /path/to/navila-model --port 5555

Client (inside Isaac Sim container)

/isaac-sim/python.sh -u -m isaaclab_arena.evaluation.policy_runner \
    --enable_cameras --num_envs 1 \
    --policy_type isaaclab_arena.policy.vln.vln_vlm_locomotion_policy.VlnVlmLocomotionPolicy \
    --remote_host localhost --remote_port 5555 \
    --ll_checkpoint_path isaaclab_arena/policy/vln/pretrained/h1_navila_locomotion.pt \
    --ll_agent_cfg isaaclab_arena/policy/vln/pretrained/h1_navila_agent.yaml \
    --num_episodes 5 \
    h1_vln_matterport \
    --usd_path /datasets/VLN-CE-Isaac/matterport_usd/zsNo4HB9uLZ/zsNo4HB9uLZ.usd \
    --r2r_dataset_path /datasets/VLN-CE-Isaac/vln_ce_isaac_v1.json.gz

Checklist

  • End-to-end pipeline verified (VLM inference → velocity → locomotion → navigation)
  • VLM correctly outputs "stop" for task completion
  • Standard VLN metrics (SPL, Success, PathLength, DTG)
  • Docker server build and launch scripts
  • Documentation (README with English + Chinese)
  • Pretrained H1 locomotion checkpoint included
  • Multi-env support
  • Matterport mesh collision
  • CI integration
  • Performance benchmarking across all 11 scenes

Features:
- VLN evaluation pipeline: VlnVlmLocomotionPolicy (VLM + RSL-RL composite)
- NaVilaServerPolicy with configurable history padding, frame cap, token limit
- Matterport collision support: ground plane, collision overlay, mesh colliders
- Collision proxy tools: USD-to-OBJ export and OBJ-to-USDA conversion
- User-configurable sensors: head/follow camera position, depth, height scanner
- STOP diagnostic flags: --ignore_vlm_stop, --min_vlm_stop_distance
- Updated pre-trained H1 locomotion checkpoint (rough-terrain)
- Comprehensive user-facing README with CLI reference and training guide

Cleanup:
- Remove unused VLN bridge code (rslrl_loader, vln_env_wrapper, vln_client_side_policy)
- Remove dead --ll_* CLI parameters from vln_environment
- Fix configclass default_factory handling in combine_configclass_instances
- Fix policy_runner startup: skip global asset registry for dotted policy paths
- Lazy imports in policy/__init__ to avoid remote asset timeout during training

Made-with: Cursor
@kanghui0204 force-pushed the feature/vln-benchmark branch from 3da5b1c to 5dad0b0 on March 9, 2026 09:56