docs: converged manifest format — dual component resolution #7
samet-akcay wants to merge 4 commits into
- Update lerobot.md (v4→v5): dual-path resolution (type + class_path), drop _normalize_metadata(), direct Pydantic parsing, rewrite runner resolution, processor construction, migration, and comparison table
- Update inferencekit.md (v4→v5): dual resolution in 'How models are loaded', add PhysicalAI-native class_path+init_args manifest example

Key design decisions documented:
- type + flat params (LeRobot interop) and class_path + init_args (PhysicalAI full-power) both resolve through ComponentRegistry
- No _normalize_metadata() shim: clean cut to nested Pydantic models
- from_legacy_metadata() handles pre-manifest YAML only
"inputs": [
  {"name": "observation.image", "dtype": "float32", "shape": ["B", 3, 96, 96]},
  {"name": "observation.state", "dtype": "float32", "shape": ["B", 14]}
],
"outputs": [
  {"name": "action", "dtype": "float32", "shape": ["B", 100, 14]}
],
Do we really need inputs and outputs sections? This will conflict with robots and cameras, where we basically have the same shapes as here.
In case they are absolutely needed, I'd refactor hardware as:
"hardware": {
"robot_type": "SO-100",
"cameras": ["top", "wrist"]
}
Moreover, are these shapes specified before or after preprocessing/postprocessing?
{
  "type": "normalize",
  "mode": "mean_std",
  "artifact": "stats.safetensors",
Suggested change:
- "artifact": "stats.safetensors",
+ "stats_path": "stats.safetensors",
{
  "type": "denormalize",
  "mode": "mean_std",
  "artifact": "stats.safetensors",
Suggested change:
- "artifact": "stats.safetensors",
+ "stats_path": "stats.safetensors",
"inference": {
  "n_obs_steps": 1,
  "runner": {
    "class_path": "action_chunking",
I think for class_path + init_args, the path should always be the full Python class path. Otherwise, it makes more sense to use type.
This is actually a good suggestion, thanks
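The full-path rule suggested in this thread could be enforced along these lines. This is only a sketch under assumptions: `import_class` is a hypothetical helper name, not a confirmed part of physicalai, and a stdlib class stands in for a real runner so the example runs without physicalai installed.

```python
import importlib


def import_class(class_path: str) -> type:
    """Import a class from a fully qualified dotted path ('pkg.module.ClassName')."""
    module_name, _, class_name = class_path.rpartition(".")
    if not module_name:
        # A bare alias such as "action_chunking" has no module part,
        # so it cannot be imported; aliases belong under `type` instead.
        raise ValueError(
            f"class_path must be a full Python path, got {class_path!r}; "
            "use 'type' for registry aliases"
        )
    module = importlib.import_module(module_name)
    obj = getattr(module, class_name)
    if not isinstance(obj, type):
        raise TypeError(f"{class_path!r} does not name a class")
    return obj


# Stdlib class used purely for illustration.
ordered_dict_cls = import_class("collections.OrderedDict")
```

Rejecting bare aliases early gives a clearer error than a failed import would, and it points the manifest author at `type` as the intended place for short names.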
"inference": {
  "n_obs_steps": 1,
  "runner": {
    "type": "action_chunking",
If I understand correctly, the manifest supports two component formats:
- type + flat params: manifest exported from lerobot and can be read by physicalai.
- class_path + init_args: for custom components that aren't in the built-in registry. Only physicalai reads this.
This makes sense: type keeps the system portable, while class_path keeps it open, so third parties can plug in custom runners/processors without needing a PR to physicalai.
yes this is the idea
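A minimal sketch of the two formats resolving through one code path, as described in this thread. The names here are assumptions for illustration: a plain dict stands in for ComponentRegistry, the `chunk_size` parameter is hypothetical, and the `instantiate_component` body is not the actual physicalai implementation. A stdlib class plays the role of a third-party component.

```python
import importlib
from typing import Any, Callable

# Hypothetical stand-in for ComponentRegistry: short `type` aliases -> factories.
_REGISTRY: dict[str, Callable[..., Any]] = {}


def register(alias: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Register a factory under a short alias usable in the `type` field."""
    def decorator(factory: Callable[..., Any]) -> Callable[..., Any]:
        _REGISTRY[alias] = factory
        return factory
    return decorator


@register("action_chunking")
class ActionChunkingRunner:
    """Stand-in runner; `chunk_size` is an illustrative parameter."""

    def __init__(self, chunk_size: int = 1) -> None:
        self.chunk_size = chunk_size


def instantiate_component(spec: dict[str, Any]) -> Any:
    """Build a component from either manifest format with a single if-check."""
    if "class_path" in spec:
        # class_path + init_args: import the class from its full dotted path.
        module_name, _, class_name = spec["class_path"].rpartition(".")
        factory = getattr(importlib.import_module(module_name), class_name)
        kwargs = spec.get("init_args", {})
    else:
        # type + flat params: look up the alias; the remaining keys are kwargs.
        factory = _REGISTRY[spec["type"]]
        kwargs = {k: v for k, v in spec.items() if k != "type"}
    return factory(**kwargs)


by_type = instantiate_component({"type": "action_chunking", "chunk_size": 100})
by_path = instantiate_component(
    {"class_path": "datetime.timedelta", "init_args": {"seconds": 5}}
)
```

Both branches end in the same `factory(**kwargs)` call, so adding a new built-in component only requires a registry entry, and a custom one requires no change to the resolver at all.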
Address PR #7 review feedback from @maxxgx:
1. Rename top-level 'inference' section to 'model' in manifest format. 'model.io' is self-documenting (model's I/O shapes), removing the ambiguity with hardware I/O that 'inference.io' caused.
2. Fix class_path examples to always use full Python paths:
   - 'action_chunking' → 'physicalai.inference.runners.ActionChunkingRunner'
   - 'normalize' → 'physicalai.inference.preprocessors.StatsNormalizer'
   - 'denormalize' → 'physicalai.inference.postprocessors.StatsDenormalizer'
3. Verified stats_path is already correct in class_path+init_args examples.

Updated both inferencekit.md and lerobot.md to v5.1.
Aggressively trim both design docs for coherency and readability:
- inferencekit.md: 1202 → 742 lines (cut Appendix, API Reference, verbose domain examples, full runner implementations)
- lerobot.md: ~960 → ~420 lines (cut rationale appendices, testing/migration sections, verbose code blocks)

Keep: Architecture, Core Components, Manifest Format (both type and class_path examples), Dual Resolution, Runners, Usage Examples.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
"robots": [],
"cameras": []
Can we include an example of what this looks like?
I'd like to use this as a reference for the naming mapping issue that Ronald spoke about during the weekly.
Added concrete examples. Both manifest examples now show a filled-in hardware section. I was thinking about something like this, but we need to discuss together to settle on the design. @maxxgx
hardware: {
robots: [
{
name: main,
type: SO-100,
state: {
shape: [6], dtype: float32,
order: [shoulder_pan, shoulder_lift, elbow_flex, wrist_flex, wrist_roll, gripper]
},
action: { ... same ... }
}
],
cameras: [
{name: top, shape: [3, 480, 640], dtype: uint8},
{name: wrist, shape: [3, 480, 640], dtype: uint8}
]
}
The key thing for the naming mapping: name fields are logical names matching the keys used during training. At deployment, the user maps these to physical devices. The order field declares joint ordering explicitly — the runtime can compare it against the robot's actual joint order to catch mismatches at startup (critical for multi-arm setups).
Yeah I think this is the format we've settled on at the moment. Note it's possible that state might have a different shape compared to action, e.g. state might include additional sensor readings as well
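The startup check on joint ordering mentioned in this thread might look like the sketch below. `check_joint_order` is a hypothetical helper, and the robot-side ordering is invented for the example. Whether the runtime should reorder readings or simply refuse to start on a mismatch is a design choice this thread does not settle; the sketch returns a permutation and raises only when the joint names themselves differ.

```python
def check_joint_order(manifest_order: list[str], robot_order: list[str]) -> list[int]:
    """Return indices that reorder robot readings into the manifest's joint order.

    Raises ValueError if the two sides do not name the same joints.
    """
    if sorted(manifest_order) != sorted(robot_order):
        missing = sorted(set(manifest_order) - set(robot_order))
        extra = sorted(set(robot_order) - set(manifest_order))
        raise ValueError(f"joint mismatch: missing={missing}, unexpected={extra}")
    position = {name: i for i, name in enumerate(robot_order)}
    return [position[name] for name in manifest_order]


# Order declared in the manifest's `state.order` field (from the example above).
manifest_order = [
    "shoulder_pan", "shoulder_lift", "elbow_flex",
    "wrist_flex", "wrist_roll", "gripper",
]
# Hypothetical robot driver that reports the gripper first.
robot_order = [
    "gripper", "shoulder_pan", "shoulder_lift",
    "elbow_flex", "wrist_flex", "wrist_roll",
]

permutation = check_joint_order(manifest_order, robot_order)
# permutation == [1, 2, 3, 4, 5, 0]
```

Because state and action may have different shapes, a real check would likely run once per field, each against its own `order` list.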
Summary
Design documentation update for the converged manifest format used by both PhysicalAI and LeRobot inference exports. This PR documents the dual component resolution approach and removes the _normalize_metadata() migration shim in favor of a clean-cut nested Pydantic design.

Requesting team feedback on the design before implementation begins.
Key Design Decisions
- Dual resolution (type + class_path): type + flat params for interoperability (LeRobot writes this). class_path + init_args for full-power PhysicalAI. Both resolve through the same ComponentRegistry → instantiate_component() pipeline. One if-check, not an if-chain per type.
- No _normalize_metadata(): clean cut to nested Pydantic models, with no migration shim.
- from_legacy_metadata() only: metadata.yaml files (early PhysicalAI exports) are the only backward compat path needed.

How to Review
- lerobot.md (top of file)
- inferencekit.md → Manifest Format section

Questions for Reviewers
_normalize_metadata() entirely?

Related