enable path coverage computation on gbz files with vg depth. #4883
Open
glennhickey wants to merge 4 commits into
On a GBZ/GBWT-backed graph, the default for_each_path_handle and
for_each_step_on_handle iterations elide haplotype paths. As a result,
vg depth's path coverage reports zero coverage from haplotypes, which
defeats the typical use case of measuring pangenome coverage along a
reference on a GBZ.
Switch the selection iteration to for_each_path_of_sense with
{REFERENCE, GENERIC, HAPLOTYPE}, and the per-handle step iteration in
path_depths / path_depth_of_bin to for_each_step_of_sense with the same
set. On non-GBZ graphs this is equivalent to the prior behavior (the
default sense-filtered iterators fall back on the unfiltered ones).
Verified on yeast-27 GBZ: depth along S288C#0#chrI now reports ~5x
coverage, matching the 5 haplotype paths present on that contig.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
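The effect of switching from the default iteration to a sense-filtered one can be sketched with a toy model. This is only an illustration under stated assumptions: `ToyGraph`, `ToyPath`, `Sense`, `for_each_default_path`, `for_each_path_of_senses`, `depth_on_node` and `example_graph` are hypothetical stand-ins written for this sketch, not the libhandlegraph or vg API, and the "default skips haplotypes" behavior is hard-coded to mimic what the commit message describes for GBZ-backed graphs.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <string>
#include <vector>

// Toy stand-ins for illustration only; NOT the libhandlegraph API.
enum class Sense { Generic, Reference, Haplotype };

struct ToyPath {
    std::string name;
    Sense sense;
    std::vector<int> nodes;  // node ids visited by the path
};

struct ToyGraph {
    std::vector<ToyPath> paths;

    // Mimics the default iteration described above: haplotype paths are skipped.
    void for_each_default_path(const std::function<void(const ToyPath&)>& fn) const {
        for (const ToyPath& p : paths) {
            if (p.sense != Sense::Haplotype) fn(p);
        }
    }

    // Sense-filtered iteration: visits exactly the requested senses.
    void for_each_path_of_senses(const std::set<Sense>& senses,
                                 const std::function<void(const ToyPath&)>& fn) const {
        for (const ToyPath& p : paths) {
            if (senses.count(p.sense)) fn(p);
        }
    }
};

// Count path steps on `node`, with or without haplotype paths.
int depth_on_node(const ToyGraph& g, int node, bool include_haplotypes) {
    assert(!g.paths.empty());  // the toy graph is expected to carry paths
    int depth = 0;
    auto count = [&](const ToyPath& p) {
        for (int n : p.nodes) {
            if (n == node) ++depth;
        }
    };
    if (include_haplotypes) {
        g.for_each_path_of_senses(
            {Sense::Reference, Sense::Generic, Sense::Haplotype}, count);
    } else {
        g.for_each_default_path(count);
    }
    return depth;
}

// One reference path plus two haplotype paths, all crossing node 2.
ToyGraph example_graph() {
    ToyGraph g;
    g.paths.push_back({"ref", Sense::Reference, {1, 2, 3}});
    g.paths.push_back({"hap1", Sense::Haplotype, {1, 2}});
    g.paths.push_back({"hap2", Sense::Haplotype, {2, 3}});
    return g;
}
```

In this toy, `depth_on_node(example_graph(), 2, false)` only sees the reference path, while passing `true` also counts both haplotype paths. Whether the real tool counts the reference path itself is not something this sketch tries to model.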
…verage
The haplotype-sense expansion added in 20936aa was applied to the shared
path-selection iterator, so it also leaked into -k (pack) mode. Keep
for_each_path_handle for -k so pack coverage behaves as before; only
path coverage (no -k) uses for_each_path_of_sense({REF, GEN, HAP}).
The outer loop over ref_paths is now an OpenMP parallel-for with
schedule(dynamic,1) and an ordered clause. Each iteration writes to a
thread-local ostringstream and flushes under `#pragma omp ordered`,
keeping output in deterministic path order. With multiple paths we cap
active nesting levels so the inner `binned_*_depth` pragmas don't
over-subscribe; with a single path we skip the outer region so inner
parallelism still benefits binned workloads.
yeast-27, 3 chromosome-scale paths, -b 100000:
  -t 1: 2m47s
  -t 8: 1m18s (output byte-identical)
Tests in 49_vg_depth.t verify:
- path coverage on GBZ counts haplotype paths
- pack coverage on GBZ ignores the haplotype selection expansion
- parallel output matches -t 1 byte-for-byte
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
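The ordered-flush pattern from this commit can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: `render_path` and `emit_ordered` are hypothetical names, and the per-path "depth" line is a placeholder for the real computation. Without `-fopenmp` the pragmas are ignored and the loop simply runs serially, producing the same output.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the per-path depth computation.
std::string render_path(const std::string& name) {
    assert(!name.empty());  // this sketch assumes non-empty path names
    std::ostringstream line;
    line << name << "\t" << name.size() << "\n";
    return line.str();
}

// Compute each path in parallel, but append results in loop order.
std::string emit_ordered(const std::vector<std::string>& ref_paths) {
    std::string out;
    const long n = static_cast<long>(ref_paths.size());
    #pragma omp parallel for ordered schedule(dynamic, 1)
    for (long i = 0; i < n; ++i) {
        // Buffer is local to the iteration, so no locking is needed here.
        std::ostringstream local;
        local << render_path(ref_paths[i]);
        // The ordered region runs in iteration order, one thread at a time.
        #pragma omp ordered
        {
            out += local.str();
        }
    }
    return out;
}
```

Because the append happens inside the ordered region, the result does not depend on the thread count; the cost is that a finished iteration waits there until every earlier iteration has flushed.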
…egion
With `#pragma omp parallel for ordered schedule(dynamic, 1)`, a thread
that finished a short iteration had to block at `#pragma omp ordered`
until all earlier iterations had emitted. A thread cannot pick up a new
iteration while blocked there, so with a skewed work distribution (e.g.
augref_* paths of very different lengths on an HPRC GBZ) most threads
piled up idle waiting on one slow iteration -- effective parallelism
dropped to ~2 cores even with -t 12.
Each iteration now writes into its own slot of a pre-sized
vector<string>; we serialize the emission after the parallel region.
Threads grab new work as soon as they finish compute, so a slow iter no
longer stalls the pool. Output remains deterministic across thread
counts.
Memory trade-off: all per-path output buffers live simultaneously. For
the coarse-bin use cases this is negligible; for -b 1 on whole-genome
paths this could be hefty, but that wasn't a practical workload before
either.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
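The slot-per-iteration approach described in this commit might look something like the sketch below. It is an assumption-laden illustration rather than the PR's code: `render_path_line` and `emit_from_slots` are hypothetical names and the per-path output is a placeholder. As before, the pragma is ignored when OpenMP is not enabled and the result is unchanged.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the per-path depth computation.
std::string render_path_line(const std::string& name) {
    std::ostringstream line;
    line << name << "\t" << name.size() << "\n";
    return line.str();
}

// Each iteration fills its own slot; emission happens after the parallel loop.
std::string emit_from_slots(const std::vector<std::string>& ref_paths) {
    // Pre-sized so that no iteration ever resizes the shared vector.
    std::vector<std::string> slots(ref_paths.size());
    const long n = static_cast<long>(ref_paths.size());
    #pragma omp parallel for schedule(dynamic, 1)
    for (long i = 0; i < n; ++i) {
        // Distinct iterations touch distinct elements, so no ordered clause
        // or lock is needed, and a thread never blocks waiting to emit.
        slots[i] = render_path_line(ref_paths[i]);
    }
    assert(slots.size() == ref_paths.size());
    // Serial emission in index order keeps the output deterministic.
    std::string out;
    for (const std::string& s : slots) {
        out += s;
    }
    return out;
}
```

The trade-off matches the one stated above: all buffers are held in memory until the loop finishes, in exchange for threads never idling behind a slow earlier iteration.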
Changelog Entry
To be copied to the draft changelog by merger:
vg depth will now work on .gbz files.
Description