[WIP] VLN Benchmark: H1 Navigation in Matterport 3D with NaVILA VLM by kanghui0204 · Pull Request #446 · isaac-sim/IsaacLab-Arena

kanghui0204 · 2026-02-26T12:08:17Z

[WIP] VLN Benchmark: H1 Navigation in Matterport 3D with NaVILA VLM

Summary

Add Vision-Language Navigation (VLN) benchmark support to IsaacLab Arena, enabling H1 humanoid navigation in Matterport 3D indoor scenes using the NaVILA VLM.

This is a draft PR for early review and feedback. The core pipeline is functional and tested, but some areas may need refinement before merging.

Architecture

Two-level hierarchical policy:

High-level: NaVILA VLM generates velocity commands from RGB image history
Low-level: RSL-RL locomotion policy converts velocity commands to joint actions

Communication between Isaac Sim (client) and NaVILA (server) uses Arena's ZeroMQ remote-policy framework (merged in #394).
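
The two-level split above can be illustrated with a minimal sketch. This is not the PR's `VlnVlmLocomotionPolicy`: the class name `TwoLevelPolicy`, the `replan_every` parameter, and the `(vx, vy, yaw_rate)` command layout are assumptions made for illustration, and both levels are passed in as plain callables standing in for the remote VLM and the RSL-RL policy.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple

# Assumed command layout: (vx, vy, yaw_rate). The PR does not specify it.
Velocity = Tuple[float, float, float]


@dataclass
class TwoLevelPolicy:
    """Hypothetical composite of a high-level planner and a low-level controller."""

    # High level: image history -> velocity command (stands in for the VLM).
    high_level: Callable[[Sequence[object]], Velocity]
    # Low level: (velocity command, proprioception) -> joint actions.
    low_level: Callable[[Velocity, Sequence[float]], List[float]]
    # Assumption: the planner may be queried less often than the controller.
    replan_every: int = 1
    history: List[object] = field(default_factory=list)
    _command: Velocity = (0.0, 0.0, 0.0)

    def act(self, rgb_frame: object, proprio: Sequence[float]) -> List[float]:
        # Query the high-level planner on every `replan_every`-th step,
        # passing the full history including the newest frame.
        if len(self.history) % self.replan_every == 0:
            self._command = self.high_level(self.history + [rgb_frame])
        self.history.append(rgb_frame)
        # The low-level controller runs on every step with the cached command.
        return self.low_level(self._command, proprio)
```

In the PR the high-level call is a remote request to the NaVILA server rather than a local function; the sketch only shows how the two levels compose.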

What's Included

| Component | Location | Description |
| --- | --- | --- |
| H1 Embodiment | `isaaclab_arena/embodiments/h1/` | Standard H1 + VLN extension (cameras, observations) |
| VLN Task | `isaaclab_arena/tasks/vln_r2r_matterport_task.py` | R2R episode management, scene filtering, termination |
| VLN Metrics | `isaaclab_arena/metrics/vln_metrics.py` | SPL, Success, PathLength, DistanceToGoal |
| Matterport Background | `isaaclab_arena/assets/matterport_background.py` | Scene loading + lighting + ground plane |
| Client Policy | `isaaclab_arena/policy/vln/` | VlnVlmLocomotionPolicy (VLM + RSL-RL composite) |
| NaVILA Server | `isaaclab_arena_navila/` | NaVilaServerPolicy (LLaVA-based VLM inference) |
| Environment | `isaaclab_arena_environments/vln_environment.py` | `h1_vln_matterport` environment registration |
| Docker | `docker/Dockerfile.vln_server`, `docker/run_vln_server.sh` | VLM server container |
| Pretrained LL model | `isaaclab_arena/policy/vln/pretrained/` | H1 locomotion checkpoint (4.7 MB) |

Key Design Decisions

Full image history + uniform sampling: The VLM receives 8 frames sampled uniformly from the entire episode history (not just the last 8). This lets the VLM judge whether the task is complete and output "stop".
Scene filtering: Episodes are automatically filtered by the loaded USD scene. `--episode_start/end` refer to indices within the filtered set.
XY metrics: Distance calculations use the horizontal (XY) plane only, because the robot's pelvis height (~0.9 m) differs from the dataset waypoint height (~0.17 m, floor level).
Modular server: The NaVILA server lives in a separate package (`isaaclab_arena_navila/`), following the `isaaclab_arena_gr00t` pattern. Other VLMs can be added without changing client code.
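
As an illustration of the first design decision, here is a minimal sketch of picking 8 evenly spaced frames from the whole history. The function name `sample_uniform_frames` and its behavior for short histories (returning every frame without padding) are assumptions; the actual server may pad or cap frames differently.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sample_uniform_frames(history: Sequence[T], num_frames: int = 8) -> List[T]:
    """Return up to `num_frames` frames spaced evenly across `history`.

    Hypothetical helper: the first and last frames are always included
    once the history is longer than `num_frames`.
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    n = len(history)
    if n <= num_frames:
        # Assumption: short histories are returned as-is (no padding).
        return list(history)
    if num_frames == 1:
        return [history[-1]]
    # Evenly spaced positions from index 0 to index n - 1, rounded to integers.
    step = (n - 1) / (num_frames - 1)
    indices = [round(i * step) for i in range(num_frames)]
    return [history[i] for i in indices]
```

Because the oldest frame is always part of the sample, the VLM keeps seeing where the episode started, which is what the "last 8 frames" alternative would lose on long episodes.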

Test Results

| Episodes | Success | SPL | Avg Distance-to-Goal |
| --- | --- | --- | --- |
| 10 (zsNo4HB9uLZ) | 0.40 | 0.36 | 6.17 m |
| 3 (zsNo4HB9uLZ) | 0.67 | 0.59 | 5.41 m |
| 1 (best case) | 1.00 | 0.77 | 0.77 m |

VLM correctly outputs "stop" when task is complete (e.g., "I think I should stop because I have finished the instruction.").
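
For readers unfamiliar with the metrics reported above, the following sketch shows how XY-only distance and SPL are commonly computed. It assumes the standard SPL definition (per-episode success times shortest-path length divided by the larger of actual and shortest path length, averaged over episodes); the function names are hypothetical and this is not the code in `vln_metrics.py`.

```python
import math
from typing import Iterable, Sequence, Tuple


def xy_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Horizontal distance between two 3D points; the z component is ignored."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def path_length_xy(positions: Sequence[Sequence[float]]) -> float:
    """Sum of XY distances between consecutive positions along a trajectory."""
    return sum(xy_distance(p, q) for p, q in zip(positions, positions[1:]))


def spl(episodes: Iterable[Tuple[bool, float, float]]) -> float:
    """Success weighted by Path Length over (success, shortest, actual) tuples."""
    scores = []
    for success, shortest, actual in episodes:
        denom = max(actual, shortest)
        if not success:
            scores.append(0.0)
        elif denom == 0.0:
            # Degenerate case: start equals goal and the agent did not move.
            scores.append(1.0)
        else:
            scores.append(shortest / denom)
    return sum(scores) / len(scores) if scores else 0.0
```

Ignoring z means a pelvis at roughly 0.9 m and a waypoint near floor level compare as if they were at the same height, which is the motivation given for the XY-only metrics.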

Known Limitations

num_envs must be 1 (multi-env VLM instruction tracking not yet implemented)
Scene switching requires process restart
Uses invisible ground plane instead of Matterport mesh collision (GPU physics limitation)
Pretrained checkpoint included for early testing; will be removed before final merge

How to Test

See isaaclab_arena_navila/README.md for full setup instructions (English + Chinese).

Server
bash docker/run_vln_server.sh -m /path/to/navila-model --port 5555

Client (inside Isaac Sim container)

/isaac-sim/python.sh -u -m isaaclab_arena.evaluation.policy_runner \
--enable_cameras --num_envs 1 \
--policy_type isaaclab_arena.policy.vln.vln_vlm_locomotion_policy.VlnVlmLocomotionPolicy \
--remote_host localhost --remote_port 5555 \
--ll_checkpoint_path isaaclab_arena/policy/vln/pretrained/h1_navila_locomotion.pt \
--ll_agent_cfg isaaclab_arena/policy/vln/pretrained/h1_navila_agent.yaml \
--num_episodes 5 \
h1_vln_matterport \
--usd_path /datasets/VLN-CE-Isaac/matterport_usd/zsNo4HB9uLZ/zsNo4HB9uLZ.usd \
--r2r_dataset_path /datasets/VLN-CE-Isaac/vln_ce_isaac_v1.json.gz
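
The command above passes one scene USD and one R2R dataset file. The scene filtering described under Key Design Decisions could look roughly like the sketch below. The dataset layout (a top-level `"episodes"` list whose entries carry a `"scene_id"` string containing the scene name) is an assumption based on common VLN-CE files, not something confirmed by this PR, and the function names and `start`/`end` parameters are hypothetical.

```python
import gzip
import json
from pathlib import Path
from typing import Any, Dict, List, Optional


def scene_name_from_usd(usd_path: str) -> str:
    """Derive the scene name from the USD file name, e.g. 'zsNo4HB9uLZ'."""
    return Path(usd_path).stem


def load_filtered_episodes(
    dataset_path: str,
    usd_path: str,
    start: int = 0,
    end: Optional[int] = None,
) -> List[Dict[str, Any]]:
    """Load a gzipped JSON dataset and keep episodes for the loaded scene.

    `start` and `end` index into the filtered list, not the full dataset.
    """
    with gzip.open(dataset_path, "rt", encoding="utf-8") as f:
        data = json.load(f)
    scene = scene_name_from_usd(usd_path)
    # Assumed schema: each episode has a "scene_id" string naming its scene.
    matching = [ep for ep in data["episodes"] if scene in ep["scene_id"]]
    return matching[start:end]
```

Indexing after filtering is what makes an episode range mean the same thing regardless of how many other scenes the dataset file contains.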

Checklist

Features:

- VLN evaluation pipeline: VlnVlmLocomotionPolicy (VLM + RSL-RL composite)
- NaVilaServerPolicy with configurable history padding, frame cap, token limit
- Matterport collision support: ground plane, collision overlay, mesh colliders
- Collision proxy tools: USD-to-OBJ export and OBJ-to-USDA conversion
- User-configurable sensors: head/follow camera position, depth, height scanner
- STOP diagnostic flags: --ignore_vlm_stop, --min_vlm_stop_distance
- Updated pre-trained H1 locomotion checkpoint (rough-terrain)
- Comprehensive user-facing README with CLI reference and training guide

Cleanup:

- Remove unused VLN bridge code (rslrl_loader, vln_env_wrapper, vln_client_side_policy)
- Remove dead --ll_* CLI parameters from vln_environment
- Fix configclass default_factory handling in combine_configclass_instances
- Fix policy_runner startup: skip global asset registry for dotted policy paths
- Lazy imports in policy/__init__ to avoid remote asset timeout during training

Made-with: Cursor

kanghui0204 requested review from alexmillane, viiik-inside and xyao-nv February 26, 2026 12:08

kanghui0204 force-pushed the feature/vln-benchmark branch from 3da5b1c to 5dad0b0 Compare March 9, 2026 09:56

kanghui0204 wants to merge 1 commit into main from feature/vln-benchmark
