Question About Joint Training Data Pipeline and SFT Dataset Configuration #16

@cjfcsjt

Hi,

First of all, thank you very much for your excellent work and for open-sourcing the code!

According to the paper and the codebase, during the joint training stage you use ReconthenUndIterableDataset to load the SFT training data for spatial understanding datasets such as SPAR-7M, Omnispatial, Mindcube, and OST-Bench. In addition, SftJSONLIterableDataset is used for general VQA datasets like LLaVA-One-Vision.

I have a couple of questions regarding the data pipeline:

  1. It seems that ReconthenUndIterableDataset requires additional 3D annotations such as depth maps and camera poses. However, as far as I know, the training sets of Omnispatial and OST-Bench do not provide such annotations. In this case, should these two datasets be loaded through SftJSONLIterableDataset and trained in a purely 2D manner instead?

  2. Would it be possible to share a complete example of the SFT training data configuration (e.g., dataset config, mixing strategy, or YAML example) for the joint training stage?
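To make question 2 concrete, here is a purely hypothetical sketch of the kind of config I have in mind. Every key, path, and weight below is invented for illustration; only the two dataset class names come from the codebase:

```yaml
# Hypothetical joint-training data config (all keys, paths, and weights
# are invented for illustration; only the dataset class names are real).
datasets:
  - name: spar_7m
    type: ReconthenUndIterableDataset   # spatial data with 3D annotations
    path: data/spar7m/annotations.jsonl
    weight: 0.4
  - name: llava_onevision
    type: SftJSONLIterableDataset       # general 2D VQA data
    path: data/llava_ov/sft.jsonl
    weight: 0.6
mixing:
  strategy: weighted_sampling           # sample each example by weight
  seed: 42
```

Even a partial config along these lines, or a pointer to where the actual one lives in the repo, would be very helpful.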

Thank you very much for your time and help!
