docs: converged manifest format — dual component resolution#7

Open
samet-akcay wants to merge 4 commits into main from docs/converged-manifest-v5

Conversation

@samet-akcay samet-akcay commented Mar 27, 2026

Summary

Design documentation update for the converged manifest format used by both PhysicalAI and LeRobot inference exports. This PR documents the dual component resolution approach and removes the _normalize_metadata() migration shim in favor of a clean-cut nested Pydantic design.

Requesting team feedback on the design before implementation begins.

Key Design Decisions

| Decision | Rationale |
| --- | --- |
| Dual resolution (type + class_path) | type + flat params for interoperability (LeRobot writes this). class_path + init_args for full-power PhysicalAI. Both resolve through the same ComponentRegistry and instantiate_component() pipeline. One if-check, not an if-chain per type. |
| No _normalize_metadata() | Since we're writing both the schema and loader simultaneously, there's no intermediate state requiring a flattening shim. Clean cut to nested Pydantic models. |
| from_legacy_metadata() only | Pre-manifest metadata.yaml files (early PhysicalAI exports) are the only backward compat path needed. |
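
To make the "one if-check" claim concrete, here is a minimal sketch of dual resolution using only the standard library. Everything in it is an assumption for illustration: the `REGISTRY` dict, the `register` decorator, the `Normalize` class, and the body of `instantiate_component()` are hypothetical and are not taken from the PhysicalAI code base. The `mode` and `artifact` parameters mirror the manifest snippet quoted in the review thread.

```python
import importlib
from typing import Any, Callable

# Hypothetical registry of built-in components, keyed by short type name.
REGISTRY: dict[str, Callable[..., Any]] = {}


def register(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator that stores a class in REGISTRY under a short name."""
    def wrap(cls: Callable[..., Any]) -> Callable[..., Any]:
        REGISTRY[name] = cls
        return cls
    return wrap


def instantiate_component(spec: dict[str, Any]) -> Any:
    """Build a component from either manifest form with a single branch."""
    if "class_path" in spec:
        # class_path + init_args: import the class from its full dotted path.
        module_name, _, class_name = spec["class_path"].rpartition(".")
        cls = getattr(importlib.import_module(module_name), class_name)
        kwargs = dict(spec.get("init_args", {}))
    else:
        # type + flat params: look up the short name; remaining keys are kwargs.
        cls = REGISTRY[spec["type"]]
        kwargs = {k: v for k, v in spec.items() if k != "type"}
    return cls(**kwargs)


@register("normalize")
class Normalize:
    """Stand-in for a built-in preprocessor (hypothetical)."""

    def __init__(self, mode: str, artifact: str) -> None:
        self.mode = mode
        self.artifact = artifact


# Interop form, as LeRobot would write it.
a = instantiate_component(
    {"type": "normalize", "mode": "mean_std", "artifact": "stats.safetensors"}
)
# Full-path form; a stdlib class stands in for a custom third-party component.
b = instantiate_component(
    {"class_path": "fractions.Fraction", "init_args": {"numerator": 1, "denominator": 2}}
)
```

Both forms end in the same `cls(**kwargs)` call, which is the property the table describes: adding a new built-in type means adding a registry entry, not another branch.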

How to Review

  1. Start with the Executive Summary and Dual Component Resolution table in lerobot.md (top of file)
  2. Review Section 3 (How PhysicalAI Loads) — this is the core design change
  3. Check the PhysicalAI-native manifest example in inferencekit.md → Manifest Format section
  4. Review Section 10 (Migration) — significantly simplified

Questions for Reviewers

  • Does the dual resolution approach (type + class_path) make sense for your use cases?
  • Any concerns about dropping _normalize_metadata() entirely?
  • Is the migration path (Section 10) clear and sufficient?

Related

- Update lerobot.md (v4→v5): dual-path resolution (type + class_path),
  drop _normalize_metadata(), direct Pydantic parsing, rewrite runner
  resolution, processor construction, migration, and comparison table
- Update inferencekit.md (v4→v5): dual resolution in 'How models are
  loaded', add PhysicalAI-native class_path+init_args manifest example

Key design decisions documented:
- type + flat params (LeRobot interop) and class_path + init_args
  (PhysicalAI full-power) both resolve through ComponentRegistry
- No _normalize_metadata() shim — clean cut to nested Pydantic models
- from_legacy_metadata() handles pre-manifest YAML only
@samet-akcay samet-akcay changed the title docs: converged manifest format v5 — dual component resolution docs: converged manifest format — dual component resolution Mar 27, 2026
Comment thread docs/design/components/inferencekit.md Outdated
Comment on lines +548 to +554
"inputs": [
{"name": "observation.image", "dtype": "float32", "shape": ["B", 3, 96, 96]},
{"name": "observation.state", "dtype": "float32", "shape": ["B", 14]}
],
"outputs": [
{"name": "action", "dtype": "float32", "shape": ["B", 100, 14]}
],
Contributor

Do we really need inputs and outputs sections? This will conflict with robots and cameras, where we basically have the same shapes as here.

In case they are absolutely needed, I'd refactor hardware as:

"hardware": {
  "robot_type": "SO-100",
  "cameras": ["top", "wrist"]
}

Contributor

Moreover, are the input/output shapes the ones before or after preprocessing/postprocessing?
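
For reference while discussing the inputs/outputs block, the sketch below shows one way a loader could check runtime shapes against entries such as `["B", 3, 96, 96]`. It is an assumption, not the documented behavior: `shape_matches` and `check_inputs` are hypothetical helpers, and the sketch treats any string dimension as a wildcard without requiring that repeated symbols agree across inputs. It also does not settle whether the shapes are pre- or post-processing; it only checks whatever tensor it is handed.

```python
from typing import Sequence, Union

Dim = Union[int, str]


def shape_matches(spec: Sequence[Dim], actual: Sequence[int]) -> bool:
    """Return True if `actual` fits `spec`; string dims such as "B" match any size."""
    if len(spec) != len(actual):
        return False
    return all(isinstance(s, str) or s == a for s, a in zip(spec, actual))


def check_inputs(io_spec: list[dict], shapes: dict[str, Sequence[int]]) -> list[str]:
    """Collect readable errors for missing or mis-shaped inputs."""
    errors = []
    for entry in io_spec:
        name = entry["name"]
        if name not in shapes:
            errors.append(f"missing input: {name}")
        elif not shape_matches(entry["shape"], shapes[name]):
            errors.append(f"{name}: expected {entry['shape']}, got {list(shapes[name])}")
    return errors


# The "inputs" entries quoted in the comment thread above.
inputs_spec = [
    {"name": "observation.image", "dtype": "float32", "shape": ["B", 3, 96, 96]},
    {"name": "observation.state", "dtype": "float32", "shape": ["B", 14]},
]

ok = check_inputs(
    inputs_spec,
    {"observation.image": (1, 3, 96, 96), "observation.state": (1, 14)},
)
bad = check_inputs(
    inputs_spec,
    {"observation.image": (1, 3, 96, 96), "observation.state": (1, 6)},
)
```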

Comment thread docs/design/components/inferencekit.md Outdated
{
"type": "normalize",
"mode": "mean_std",
"artifact": "stats.safetensors",
Contributor

Suggested change
- "artifact": "stats.safetensors",
+ "stats_path": "stats.safetensors",

Comment thread docs/design/components/inferencekit.md Outdated
{
"type": "denormalize",
"mode": "mean_std",
"artifact": "stats.safetensors",
Contributor

Suggested change
- "artifact": "stats.safetensors",
+ "stats_path": "stats.safetensors",

Comment thread docs/design/components/inferencekit.md Outdated
"inference": {
"n_obs_steps": 1,
"runner": {
"class_path": "action_chunking",
Contributor

I think for class_path + init_args, the path should always be the full Python class path. Otherwise, it makes more sense to use type.

Contributor Author

This is actually a good suggestion, thanks
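
The rule suggested in this thread could be enforced when the manifest is parsed. The following is a rough sketch under stated assumptions: `validate_component` is a hypothetical helper, and "full Python class path" is approximated as a dotted name with at least a module part and a class part. It does not try to import anything.

```python
def validate_component(spec: dict) -> str:
    """Return "type" or "class_path" for a valid spec; raise ValueError otherwise."""
    has_type = "type" in spec
    has_path = "class_path" in spec
    if has_type == has_path:
        raise ValueError("component needs exactly one of 'type' or 'class_path'")
    if has_type:
        return "type"
    path = spec["class_path"]
    parts = path.split(".")
    # A full path has a module part and a class part, each a valid identifier.
    if len(parts) < 2 or not all(p.isidentifier() for p in parts):
        raise ValueError(
            f"class_path {path!r} is not a full dotted path; use 'type' for registry names"
        )
    return "class_path"


# Short registry name: accepted under "type".
kind_a = validate_component({"type": "action_chunking"})
# Full dotted path (taken from the follow-up commit message): accepted under "class_path".
kind_b = validate_component(
    {"class_path": "physicalai.inference.runners.ActionChunkingRunner"}
)
```

With this check, the original example `"class_path": "action_chunking"` would be rejected with a message pointing the author to `type`.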

"inference": {
"n_obs_steps": 1,
"runner": {
"type": "action_chunking",
Contributor

If I understand correctly, the manifest supports two component formats:

  • type + flat params: manifests exported from LeRobot, which PhysicalAI can also read.
  • class_path + init_args: for custom components that aren't in the built-in registry. Only PhysicalAI reads this.

This makes sense: type keeps the system portable, while class_path keeps it open, so third parties can plug in custom runners/processors without needing a PR to PhysicalAI.

Contributor Author

yes this is the idea

samet-akcay and others added 2 commits March 31, 2026 11:42
Address PR #7 review feedback from @maxxgx:

1. Rename top-level 'inference' section to 'model' in manifest format.
   'model.io' is self-documenting (model's I/O shapes), removing the
   ambiguity with hardware I/O that 'inference.io' caused.

2. Fix class_path examples to always use full Python paths:
   - 'action_chunking' → 'physicalai.inference.runners.ActionChunkingRunner'
   - 'normalize' → 'physicalai.inference.preprocessors.StatsNormalizer'
   - 'denormalize' → 'physicalai.inference.postprocessors.StatsDenormalizer'

3. Verified stats_path is already correct in class_path+init_args examples.

Updated both inferencekit.md and lerobot.md to v5.1.

Aggressively trim both design docs for coherency and readability:
- inferencekit.md: 1202 → 742 lines (cut Appendix, API Reference, verbose domain examples, full runner implementations)
- lerobot.md: ~960 → ~420 lines (cut rationale appendices, testing/migration sections, verbose code blocks)

Keep: Architecture, Core Components, Manifest Format (both type and class_path examples), Dual Resolution, Runners, Usage Examples.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Comment thread docs/design/components/inferencekit.md Outdated
Comment on lines +540 to +541
"robots": [],
"cameras": []

Can we include an example of what this looks like?
I'd like to use this as a reference for the naming mapping issue that Ronald spoke about during the weekly.

Contributor Author

Added concrete examples. Both manifest examples now show a filled-in hardware section. I was thinking about something like this, but we need to discuss together to settle on the design. @maxxgx

 hardware: {
   robots: [
     {
       name: main,
       type: SO-100,
       state: {
         shape: [6], dtype: float32,
         order: [shoulder_pan, shoulder_lift, elbow_flex, wrist_flex, wrist_roll, gripper]
       },
       action: { ... same ... }
     }
   ],
   cameras: [
     {name: top, shape: [3, 480, 640], dtype: uint8},
     {name: wrist, shape: [3, 480, 640], dtype: uint8}
   ]
 }

The key thing for the naming mapping: name fields are logical names matching the keys used during training. At deployment, the user maps these to physical devices. The order field declares joint ordering explicitly — the runtime can compare it against the robot's actual joint order to catch mismatches at startup (critical for multi-arm setups).
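
The startup check described above might look roughly like the sketch below. `check_joint_order` is a hypothetical function, and how the runtime obtains the robot's actual joint order is left out because the thread does not specify it.

```python
def check_joint_order(expected: list[str], actual: list[str]) -> None:
    """Raise ValueError if the robot's joint order differs from the manifest's."""
    if expected == actual:
        return
    if len(expected) != len(actual):
        raise ValueError(
            f"joint count differs: manifest has {len(expected)}, robot reports {len(actual)}"
        )
    missing = [j for j in expected if j not in actual]
    extra = [j for j in actual if j not in expected]
    if missing or extra:
        raise ValueError(f"joint sets differ: missing={missing}, unexpected={extra}")
    # Same length and same names, so at least one position must disagree.
    idx = next(i for i, (e, a) in enumerate(zip(expected, actual)) if e != a)
    raise ValueError(
        f"joint order mismatch at index {idx}: "
        f"manifest has {expected[idx]!r}, robot reports {actual[idx]!r}"
    )


# The order list from the hardware example above.
manifest_order = [
    "shoulder_pan", "shoulder_lift", "elbow_flex",
    "wrist_flex", "wrist_roll", "gripper",
]

check_joint_order(manifest_order, list(manifest_order))  # passes silently

# Two wrist joints swapped on the robot side: caught before any action is sent.
swapped = list(manifest_order)
swapped[3], swapped[4] = swapped[4], swapped[3]
try:
    check_joint_order(manifest_order, swapped)
    message = ""
except ValueError as err:
    message = str(err)
```

Failing at startup with a named joint is the point of declaring `order` explicitly: a silent permutation would otherwise only show up as wrong motion.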

Contributor

@maxxgx maxxgx Mar 31, 2026

Yeah I think this is the format we've settled on at the moment. Note that state might have a different shape than action, e.g. state might include additional sensor readings as well.
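
One way to leave room for that difference is to parse `state` and `action` as independent specs instead of assuming they are the same. The sketch below uses standard-library dataclasses as a stand-in for the nested models; `SignalSpec`, `RobotSpec`, `parse_robot`, and the extra `gripper_force` entry are all hypothetical and only illustrate the shape mismatch case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignalSpec:
    shape: tuple[int, ...]
    dtype: str
    order: tuple[str, ...]


@dataclass(frozen=True)
class RobotSpec:
    name: str
    type: str
    state: SignalSpec   # may carry extra sensor readings
    action: SignalSpec  # parsed separately, so its shape can differ from state


def parse_robot(entry: dict) -> RobotSpec:
    """Parse one entry of hardware.robots into typed specs."""
    def signal(d: dict) -> SignalSpec:
        return SignalSpec(tuple(d["shape"]), d["dtype"], tuple(d["order"]))

    return RobotSpec(entry["name"], entry["type"], signal(entry["state"]), signal(entry["action"]))


joints = ["shoulder_pan", "shoulder_lift", "elbow_flex", "wrist_flex", "wrist_roll", "gripper"]
robot = parse_robot({
    "name": "main",
    "type": "SO-100",
    # Hypothetical: state has one more entry than action.
    "state": {"shape": [7], "dtype": "float32", "order": joints + ["gripper_force"]},
    "action": {"shape": [6], "dtype": "float32", "order": joints},
})
```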


3 participants