
DLIO prints certain output on all the ranks #304

@wolfgang-desalvador

Description


Problem

When the DLIO benchmark runs in multi-process (MPI) mode, every MPI rank independently emits debug and diagnostic messages to stdout. With N ranks, each message appears N times. This buries the actual benchmark results and makes the logs difficult to parse.

Affected Areas

1. Config debug dump after initialization (main.py)

After loading configuration, 13 print() statements dump config values (storage_type, storage_root, data_folder, batch_size, epochs, etc.) to stdout. These use raw print() rather than the structured DLIOLogger, and they fire on every rank unconditionally. This produces duplicated noise before the benchmark starts.
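One possible fix for the config dump is to replace the individual print() calls with a single loop. The loop logs each field at DEBUG level and returns early on non-zero ranks. This sketch uses the standard library logging module as a stand-in, because DLIOLogger's API is not described in this issue. ConfigSubset and log_config are hypothetical names, and they cover only the five fields listed above.

```python
import logging
from dataclasses import dataclass, fields

# Stand-in for DLIOLogger (assumption: its real API is not shown in this issue).
logger = logging.getLogger("dlio")


@dataclass
class ConfigSubset:
    """Hypothetical subset of the values main.py currently prints."""
    storage_type: str
    storage_root: str
    data_folder: str
    batch_size: int
    epochs: int


def log_config(config: ConfigSubset, rank: int, log: logging.Logger = logger) -> None:
    """Log every config field at DEBUG level, from rank 0 only."""
    if rank != 0:
        return  # other ranks stay silent
    for f in fields(config):
        # %-style arguments are only formatted if DEBUG is enabled.
        log.debug("config.%s = %r", f.name, getattr(config, f.name))
```

When the logger level is INFO or higher, the dump disappears entirely. Lowering the level to DEBUG restores it on rank 0 only.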

2. Data generation method banners (config.py)

During derive_configurations(), multi-line diagnostic banners report which data generation method is in use (DGEN or the NumPy fallback). These banners include prominent === border lines and warning messages, and every rank prints them independently.
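The data generation banners can be gated the same way. This sketch assumes a hypothetical log_data_gen_method helper that takes the method strings "dgen" and "numpy". The NumPy fallback is logged as a WARNING so it stays visible at default verbosity. The whole banner, including its === borders, is emitted as a single record. The standard logging module again stands in for DLIOLogger.

```python
import logging

# Stand-in for DLIOLogger (assumption).
logger = logging.getLogger("dlio")

BORDER = "=" * 60


def log_data_gen_method(method: str, rank: int, log: logging.Logger = logger) -> None:
    """Report the data generation method once, from rank 0.

    Hypothetical helper: `method` is assumed to be "dgen" or "numpy".
    """
    if rank != 0:
        return
    if method == "numpy":
        # One WARNING record for the whole banner keeps its lines together.
        log.warning("%s\nData generation: using NumPy fallback instead of DGEN\n%s",
                    BORDER, BORDER)
    else:
        log.info("Data generation method: %s", method.upper())
```

Because the banner is one record, its border lines stay together, and a log level or filter can suppress it as a unit.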

Expected Behavior

Diagnostic and debug messages should be emitted once (from rank 0 only) and routed through the structured logging infrastructure (DLIOLogger) rather than raw print().
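Instead of adding a rank check at every call site, the rank-0 rule can be enforced once, at the log handler. The following is a minimal sketch built on the standard logging module, not on DLIO's actual logger setup. RankZeroFilter and configure_logger are hypothetical names. The choice to let ERROR records through on all ranks is also an assumption, made so that failures on any rank are not hidden.

```python
import logging
import sys


class RankZeroFilter(logging.Filter):
    """Pass every record on rank 0; on other ranks pass only `min_level`+.

    Letting ERROR through on all ranks is an assumption of this sketch, so a
    failure on any rank is not hidden.
    """

    def __init__(self, rank: int, min_level: int = logging.ERROR) -> None:
        super().__init__()
        self.rank = rank
        self.min_level = min_level

    def filter(self, record: logging.LogRecord) -> bool:
        return self.rank == 0 or record.levelno >= self.min_level


def configure_logger(rank: int, name: str = "dlio") -> logging.Logger:
    """Attach one stdout handler guarded by RankZeroFilter (call once per process)."""
    log = logging.getLogger(name)
    handler = logging.StreamHandler(sys.stdout)
    handler.addFilter(RankZeroFilter(rank))
    handler.setFormatter(logging.Formatter(f"[rank {rank}] %(levelname)s %(message)s"))
    log.addHandler(handler)
    log.setLevel(logging.DEBUG)
    return log
```

The filter is attached to the handler rather than to the logger on purpose. In Python's logging module, a logger's filters are consulted only for records logged directly on that logger. A handler's filters see every record the handler processes, including records propagated from child loggers such as "dlio.config". Call configure_logger only once per process, because each call adds another handler.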

Metadata


Assignees: No one assigned
Labels: DLIO or mlpstorage (related to code in mlpstorage or dlio)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
