Description
Hi,
First of all, thank you very much for your excellent work and for open-sourcing the code!
According to the paper and the codebase, during the joint training stage you use ReconthenUndIterableDataset to load the SFT training data for spatial understanding datasets such as SPAR-7M, Omnispatial, Mindcube, and OST-Bench. In addition, SftJSONLIterableDataset is used for general VQA datasets like LLaVA-One-Vision.
I have a couple of questions regarding the data pipeline:
- It seems that ReconthenUndIterableDataset requires additional 3D annotations such as depth maps and camera poses. However, as far as I know, the training sets of Omnispatial and OST-Bench do not provide such annotations. In this case, should these two datasets be loaded through SftJSONLIterableDataset and trained in a purely 2D manner instead?
- Would it be possible to share a complete example of the SFT training data configuration (e.g., dataset config, mixing strategy, or YAML example) for the joint training stage?
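To make the first question concrete, here is a rough sketch of the kind of purely 2D JSONL record I have in mind for SftJSONLIterableDataset; the field names and conversation format are my assumption (borrowed from common LLaVA-style SFT data), not taken from your codebase, so please correct me if the expected schema differs.

```python
import json

# Hypothetical purely 2D SFT sample (no depth maps or camera poses),
# e.g. for Omnispatial or OST-Bench training data.
# Field names are assumed, following common LLaVA-style conventions.
sample = {
    "image": "omnispatial/images/000123.jpg",
    "conversations": [
        {"from": "human",
         "value": "<image>\nHow many chairs are to the left of the table?"},
        {"from": "gpt",
         "value": "Two chairs are to the left of the table."},
    ],
}

# One record per line in the JSONL file.
line = json.dumps(sample)
print(line)
```

If this is roughly the right shape, then question 1 reduces to whether dropping the 3D annotations for these two datasets is what you did in practice, or whether you reconstructed depth/poses for them some other way.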
Thank you very much for your time and help!