Add chunked hFILE input scheme#2018

Open
fabwa wants to merge 1 commit into samtools:develop from fabwa:chunked-hfile-input

fabwa commented May 22, 2026

Summary

This adds a built-in chunked: hFILE input scheme for reading byte-split binary streams such as BAMs whose raw bytes have been stored in ordered chunks.

The scheme takes a manifest file containing one chunk path per line, ignores blank/comment lines, resolves relative chunk paths against the manifest directory, and exposes the chunks as one seekable logical file. BGZF/BAM readers can consume the concatenated byte stream without samtools-specific wrapping, and normal BAM indexes can be built and used against the logical chunked input.
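To make the manifest rules concrete, here is a minimal Python sketch of the parsing described above. It is an illustration only, not the HTSlib code in this PR: the function name `read_manifest` is hypothetical, and treating `#` as the comment marker is an assumption, since the description does not name the marker.

```python
import os


def read_manifest(manifest_path):
    """Return the ordered chunk paths listed in a manifest file."""
    # Relative entries are resolved against the directory holding the manifest.
    base_dir = os.path.dirname(os.path.abspath(manifest_path))
    chunks = []
    with open(manifest_path, encoding="utf-8") as fh:
        for line in fh:
            entry = line.strip()
            # Assumption: '#' starts a comment line (marker not stated in the PR).
            if not entry or entry.startswith("#"):
                continue
            # os.path.join returns an absolute entry unchanged and prefixes
            # a relative one with the manifest directory.
            chunks.append(os.path.normpath(os.path.join(base_dir, entry)))
    return chunks
```

The returned list preserves manifest order, which is what defines the byte order of the logical file.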

Example:

samtools view chunked:chunks.fofn
samtools index chunked:chunks.fofn
samtools view chunked:chunks.fofn chr1:1000-2000

chunks.fofn should list the raw BAM byte chunks, one path per line, in the order their bytes appear in the logical file.

Notes

Chunk files must be seekable so HTSlib can determine chunk sizes and translate logical BAM offsets to the right chunk and intra-chunk offset.
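The offset translation can be sketched as a prefix-sum table plus a binary search. This is a hedged Python illustration of the idea, not the C implementation in this PR; `build_starts` and `locate` are hypothetical names, and the chunk sizes are assumed to be known already.

```python
import bisect
import itertools


def build_starts(chunk_sizes):
    """Logical start offset of each chunk, with the total length appended."""
    return [0] + list(itertools.accumulate(chunk_sizes))


def locate(starts, offset):
    """Map a logical byte offset to (chunk_index, offset_within_chunk)."""
    total = starts[-1]
    if not 0 <= offset < total:
        raise ValueError(f"offset {offset} outside logical file of {total} bytes")
    # The last chunk whose start is <= offset holds the byte; bisect_right
    # also steps past zero-length chunks that share a start offset.
    idx = bisect.bisect_right(starts, offset) - 1
    return idx, offset - starts[idx]
```

With chunk sizes of 100, 50 and 200 bytes, logical offset 120 lands in the second chunk at intra-chunk offset 20. Building the table needs every chunk's size up front, which is why each chunk must be seekable (or otherwise report its length).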

For local manifests, default index names are derived from the manifest path, so samtools index chunked:chunks.fofn writes chunks.fofn.bai and samtools view chunked:chunks.fofn region can discover it.
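As a rough illustration of that naming rule (a sketch under the assumption that the scheme prefix is simply stripped and the index suffix appended; `default_index_name` is a hypothetical helper, not an HTSlib function):

```python
def default_index_name(filename, suffix=".bai"):
    """Derive the default index path for a local chunked: manifest."""
    prefix = "chunked:"
    # Drop the scheme prefix so the index sits next to the manifest itself.
    local = filename[len(prefix):] if filename.startswith(prefix) else filename
    return local + suffix
```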

Tests

make -j4
REF_PATH=: ./test.pl -F test_view
test/hfile
test/test_introspection
git diff --check

Also manually tested manifest-relative chunk names with blank/comment lines in both normal and -@4 threaded reads.

fabwa marked this pull request as ready for review May 23, 2026 14:01