Add optional progress output for long-running subcommands (mpileup, view, norm, index) #2559

@carstenerickson

Summary

bcftools mpileup, view, norm, and index run silently on multi-hour inputs, so a job that is still working cannot be told apart from one that has hung without external tooling. An opt-in periodic progress line on stderr would close the gap at near-zero cost.

Current workaround

We read /proc/<pid>/io (Linux only) for bytes-read and divide by input file size. Limitations:

  • Linux only; no /proc/<pid>/io on macOS or BSD.
  • Bytes only, no record counts.
  • Fragile child-PID discovery when bcftools sits behind a shell pipe.
  • mpileup needs a special case because the I/O-doing process is the child, not the parent shell.

Our wrapper is ~140 LOC handling these edges.
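
For readers unfamiliar with the workaround, its core is roughly the following. This is a minimal sketch, not our actual wrapper: the function names are made up for illustration, and the choice of the `rchar` counter is an assumption (it counts every byte the process passes through read-style syscalls, so reads of index or reference files inflate the estimate). It omits the child-PID discovery that makes the real wrapper long.

```python
import os


def parse_proc_io(text: str) -> dict:
    """Parse the 'key: value' lines of /proc/<pid>/io into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            counters[key.strip()] = int(value)
    return counters


def fraction_read(pid: int, input_path: str) -> float:
    """Estimate progress as bytes read by `pid` divided by the input file size.

    Linux only: /proc/<pid>/io does not exist on macOS or BSD.
    """
    with open(f"/proc/{pid}/io") as fh:
        counters = parse_proc_io(fh.read())
    size = os.path.getsize(input_path)
    if size == 0:
        return 0.0
    # Clamp, because rchar can exceed the size of the main input file.
    return min(counters["rchar"] / size, 1.0)
```

Even in this reduced form the estimate is bytes only, which is why it cannot report record counts.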

Suggested approach

Either of:

  • Opt-in flag: --progress[=INTERVAL] emits a stderr line every INTERVAL records (default 1M). One extra branch per record; off by default, so scripts are unaffected.
  • TTY auto-enable: isatty(STDERR_FILENO)-gated, like tar --checkpoint=.N. Interactive only, no flag.
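
The gating logic for both options is small. The sketch below shows it in Python purely to illustrate the control flow; the real change would be C inside each subcommand's main loop, and every name here (`progress_enabled`, `should_report`, `run`) is hypothetical. It combines the two options in one predicate for brevity, although they are proposed as alternatives.

```python
import sys

DEFAULT_INTERVAL = 1_000_000  # "default 1M" records, as suggested above


def progress_enabled(flag_given: bool, stream=sys.stderr) -> bool:
    """Option 1: explicit flag. Option 2: the stream is an interactive terminal."""
    return flag_given or stream.isatty()


def should_report(n_records: int, interval: int = DEFAULT_INTERVAL) -> bool:
    """The single extra branch per record: true once every `interval` records."""
    return n_records > 0 and n_records % interval == 0


def run(records, flag_given=False, interval=DEFAULT_INTERVAL, stream=sys.stderr):
    """Stand-in for a subcommand main loop that counts records as it goes."""
    enabled = progress_enabled(flag_given, stream)
    n = 0
    for _ in records:
        n += 1
        if enabled and should_report(n, interval):
            print(f"[demo] {n} records", file=stream)
    return n
```

When stderr is redirected to a file or pipe and no flag is given, `progress_enabled` is false and nothing is written, which is the property that keeps existing scripts unaffected.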

Format suggestion:

[mpileup] 12.3 / 23.0 GB processed (53.5%), 145M records, 1234s elapsed, 18min remaining

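Bytes from htsFile.fp.bgzf->block_address (the compressed-file offset of the current BGZF block; block_offset is only the position within the uncompressed block); records from each subcommand's main loop; ETA from linear extrapolation, omitted when input size is unknown.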
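
To make the ETA arithmetic concrete: with linear extrapolation, remaining time is elapsed × (total − done) / done. The following is a rough sketch of a formatter for the suggested line, written in Python only to show the calculation. The function names, the decimal GB unit, and the rounding choices are all assumptions; `bytes_done` stands for whatever compressed-byte counter the implementation ends up using.

```python
from typing import Optional


def eta_seconds(bytes_done: float, bytes_total: Optional[float],
                elapsed_s: float) -> Optional[float]:
    """Linear extrapolation; None when the input size is unknown or nothing is read yet."""
    if not bytes_total or bytes_done <= 0:
        return None
    remaining = max(bytes_total - bytes_done, 0)
    return elapsed_s * remaining / bytes_done


def format_progress(cmd: str, bytes_done: float, bytes_total: Optional[float],
                    records: int, elapsed_s: float) -> str:
    """Build one progress line in the suggested format."""
    gb = 1e9  # decimal gigabytes (assumption)
    if bytes_total:
        pct = 100.0 * bytes_done / bytes_total
        parts = [f"{bytes_done / gb:.1f} / {bytes_total / gb:.1f} GB processed ({pct:.1f}%)"]
    else:
        # Unknown input size (e.g. stdin): no total, no percentage, no ETA.
        parts = [f"{bytes_done / gb:.1f} GB processed"]
    parts.append(f"{records // 1_000_000}M records")
    parts.append(f"{int(elapsed_s)}s elapsed")
    eta = eta_seconds(bytes_done, bytes_total, elapsed_s)
    if eta is not None:
        parts.append(f"{round(eta / 60)}min remaining")
    return f"[{cmd}] " + ", ".join(parts)
```

Passing `None` as `bytes_total` covers piped input, where the line degrades to bytes, records, and elapsed time only.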

Low priority — the workaround works. Filing because the proliferation of /proc/<pid>/io wrappers across pipelines is a UX signal that upstream should own this.


bcftools 1.23.1.
