Open
17 commits
a8ad6dd
docs: remove BUG_REVIEW.md; move CLI_MIGRATION.md to docs/
ypriverol May 26, 2026
55cff3f
docs(spec): PR-Q1 quality cleanup design + finalize CLI_MIGRATION refs
ypriverol May 26, 2026
84f8329
chore: scrub 32 dangling .java:LINE references in non-test source
ypriverol May 26, 2026
ba4c6b3
chore: neutralize "port of MS-GF+" framing in headers and CLI help
ypriverol May 26, 2026
f0831b0
chore: rename MSGFRUST_RSS_PROBE -> MSGF_RSS_PROBE (legacy accepted)
ypriverol May 26, 2026
20da1b4
chore: fix all clippy warnings (workspace)
ypriverol May 26, 2026
67316e5
ci: lift clippy gate to required (--all-targets -D warnings)
ypriverol May 26, 2026
2e1b6c7
docs: remove shipped iter39 design+plan; track PR-Q1 plan
ypriverol May 26, 2026
ea1f481
chore: address PR-Q1 final review observations
ypriverol May 26, 2026
28e7a65
docs(spec): PR-V1 value-delivering design (stacks on PR-Q1 cleanup)
ypriverol May 26, 2026
542ab6e
perf(s1): swap HashMap -> FxHashMap on hot scoring tables
ypriverol May 26, 2026
67002e4
feat(s2): MassCalibrator threshold fallback 1e-6 -> 1e-5
ypriverol May 26, 2026
09824bd
Revert "feat(s2): MassCalibrator threshold fallback 1e-6 -> 1e-5"
ypriverol May 26, 2026
9a6607a
perf(scoring): branchless f32/f64 rounding + GF DP inner-loop tightening
ypriverol May 27, 2026
d6a869d
Revert "perf(scoring): branchless f32/f64 rounding + GF DP inner-loop…
ypriverol May 27, 2026
319af81
perf(model): swap HashMap -> FxHashMap on AminoAcidSet hot tables
ypriverol May 27, 2026
096dbca
perf(search): eliminate per-interior-position Vec clone in candidate …
ypriverol May 27, 2026
10 changes: 4 additions & 6 deletions .github/workflows/ci.yml
@@ -75,10 +75,9 @@ jobs:
 lint:
 name: Lint (clippy + rustfmt)
 runs-on: ubuntu-latest
-# Advisory only — the iter1-38 codebase isn't fmt-clean / clippy-clean
-# yet (~11k lines of fmt churn pending). Surfaces the warnings without
-# blocking PRs while that cleanup is sequenced separately.
-continue-on-error: true
+# Clippy is REQUIRED after the PR-Q1 cleanup sweep (2026-05-26).
+# Rustfmt remains advisory until a future fmt-clean sweep lands
+# (~11k lines of cosmetic churn pending; tracked separately).
 steps:
 - name: Checkout
 uses: actions/checkout@v4
@@ -96,5 +95,4 @@ jobs:
 continue-on-error: true
 
 - name: clippy
-run: cargo clippy --workspace --all-targets
-continue-on-error: true
+run: cargo clippy --workspace --all-targets -- -D warnings
72 changes: 0 additions & 72 deletions BUG_REVIEW.md

This file was deleted.

4 changes: 3 additions & 1 deletion Cargo.lock

Generated file; diff not rendered.

6 changes: 3 additions & 3 deletions DOCS.md
@@ -1,6 +1,6 @@
 # msgf-rust documentation
 
-This is the full reference for the `msgf-rust` binary and its outputs. For a quick start and benchmark summary, see [`README.md`](README.md). For porting Java MS-GF+ command lines and numeric legacy flags, see [`CLI_MIGRATION.md`](CLI_MIGRATION.md).
+This is the full reference for the `msgf-rust` binary and its outputs. For a quick start and benchmark summary, see [`README.md`](README.md). For porting Java MS-GF+ command lines and numeric legacy flags, see [`docs/CLI_MIGRATION.md`](docs/CLI_MIGRATION.md).
 
 Run `msgf-rust --help` for auto-generated help derived from the same `Cli` struct documented below.
 
@@ -94,7 +94,7 @@ Only tryptic enzyme models are bundled; other enzymes require `--param-file`.
 |---|---|---|---|---|
 | `--output-tsv` | path | *(off)* | Optional tab-separated PSM report (§3b). Skipped in bench mode (`--max-spectra > 0`). | Java `-outputFormat 1` with output path |
 
-**Environment variable:** set `MSGFRUST_RSS_PROBE=1` on Linux to print `VmRSS` checkpoints to stderr during long runs (debugging memory use).
+**Environment variable:** set `MSGF_RSS_PROBE=1` on Linux to print `VmRSS` checkpoints to stderr during long runs (debugging memory use). The legacy name `MSGFRUST_RSS_PROBE=1` is still accepted with a one-line deprecation warning and will be removed in the next quality cleanup.
 
 ---
 
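The rename keeps the old variable working behind a deprecation warning. A minimal sketch of that fallback, assuming that only the value `1` enables the probe and that the new name takes precedence when both are set; `rss_probe_enabled_with` and `parse_vm_rss_kb` are hypothetical helpers rather than msgf-rust's code, and the warning text is invented. Environment lookup is passed in as a closure so the logic can be checked without mutating the process environment.

```rust
/// Hypothetical helper: is the RSS probe on? Prefers the new name and
/// falls back to the legacy one, printing an (invented) deprecation line.
/// `get` abstracts environment lookup, e.g. `|k| std::env::var(k).ok()`.
pub fn rss_probe_enabled_with<F>(get: F) -> bool
where
    F: Fn(&str) -> Option<String>,
{
    if let Some(v) = get("MSGF_RSS_PROBE") {
        return v == "1";
    }
    if let Some(v) = get("MSGFRUST_RSS_PROBE") {
        // Assumed wording; the real warning text may differ.
        eprintln!("warning: MSGFRUST_RSS_PROBE is deprecated; use MSGF_RSS_PROBE");
        return v == "1";
    }
    false
}

/// Hypothetical helper: pull the `VmRSS` value (kB) out of the text of
/// Linux's `/proc/self/status`, where the line looks like `VmRSS:  123456 kB`.
pub fn parse_vm_rss_kb(status: &str) -> Option<u64> {
    status
        .lines()
        .find_map(|line| line.strip_prefix("VmRSS:"))
        .and_then(|rest| rest.split_whitespace().next())
        .and_then(|num| num.parse().ok())
}
```

In a binary the check would be `rss_probe_enabled_with(|k| std::env::var(k).ok())`, and on Linux a checkpoint value could come from `parse_vm_rss_kb` applied to `std::fs::read_to_string("/proc/self/status")`.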
@@ -459,7 +459,7 @@ msgf-rust accepts **both** canonical kebab-case flags with named enum values **a
 
 ### 8b. Numeric-legacy values
 
-Full legacy 0…N → named-value tables for `--fragmentation`, `--instrument`, `--protocol`, and `--enzyme-specificity` (`--ntt`) live in [`CLI_MIGRATION.md`](CLI_MIGRATION.md). clap accepts named values case-insensitively (`--fragmentation hcd` ≡ `HCD`).
+Full legacy 0…N → named-value tables for `--fragmentation`, `--instrument`, `--protocol`, and `--enzyme-specificity` (`--ntt`) live in [`docs/CLI_MIGRATION.md`](docs/CLI_MIGRATION.md). clap accepts named values case-insensitively (`--fragmentation hcd` ≡ `HCD`).
 
 ### 8c. Behavior differences
 
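Section 8b describes flags that accept either a legacy number or a case-insensitive name, which msgf-rust handles through clap. The same idea without clap, as a std-only sketch: the `Fragmentation` enum is hypothetical, and its numbering (1 = CID, 2 = ETD, 3 = HCD) is assumed from Java MS-GF+'s `-m` option rather than taken from this diff, so `docs/CLI_MIGRATION.md` remains the authoritative table. Only three values are modeled.

```rust
use std::str::FromStr;

/// Hypothetical enum; the real msgf-rust type and variant set may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Fragmentation {
    Cid,
    Etd,
    Hcd,
}

impl FromStr for Fragmentation {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let s = s.trim();
        // Legacy numeric spelling (assumed numbering, see lead-in).
        match s {
            "1" => return Ok(Fragmentation::Cid),
            "2" => return Ok(Fragmentation::Etd),
            "3" => return Ok(Fragmentation::Hcd),
            _ => {}
        }
        // Named spelling, compared ASCII-case-insensitively ("hcd" == "HCD").
        [
            ("cid", Fragmentation::Cid),
            ("etd", Fragmentation::Etd),
            ("hcd", Fragmentation::Hcd),
        ]
        .iter()
        .find(|(name, _)| name.eq_ignore_ascii_case(s))
        .map(|&(_, v)| v)
        .ok_or_else(|| format!("unknown fragmentation value: {s:?}"))
    }
}
```

Keeping both spellings on one `FromStr` path means downstream code never needs to know which form a script used.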
2 changes: 1 addition & 1 deletion README.md
@@ -100,7 +100,7 @@ msgf-rust --spectrum spectra.mzML --database db.fasta \
 
 **[quantms](https://github.com/bigbio/quantms) pipeline integration:**
 
-Point quantms's PSM search step at `msgf-rust` and use the standard quantms post-processing. The `.pin` row format is the same; existing quantms scripts using legacy numeric flag values (`--fragmentation 3 --instrument 3 --protocol 4`) keep working without modification (see `CLI_MIGRATION.md`).
+Point quantms's PSM search step at `msgf-rust` and use the standard quantms post-processing. The `.pin` row format is the same; existing quantms scripts using legacy numeric flag values (`--fragmentation 3 --instrument 3 --protocol 4`) keep working without modification (see [`docs/CLI_MIGRATION.md`](docs/CLI_MIGRATION.md)).
 
 ## CLI summary
 
3 changes: 1 addition & 2 deletions crates/input/src/lib.rs
@@ -1,5 +1,4 @@
-//! Input-side readers for MS-GF+ Rust port: MGF and mzML spectrum files
-//! and `.fasta` protein databases.
+//! Input readers: MGF, mzML, FASTA.
 
 pub mod fasta;
 pub mod mgf;
6 changes: 3 additions & 3 deletions crates/input/src/mzml.rs
@@ -59,8 +59,8 @@ const CV_32BIT: &str = "MS:1000521";
 const CV_ZLIB: &str = "MS:1000574";
 
 // Activation-method CV accessions (inside <precursor><activation>).
-// These mirror Java MS-GF+'s `ActivationMethod.cvTable` in
-// `msutil/ActivationMethod.java` — we map each to one of our five
+// These mirror Java MS-GF+'s `ActivationMethod.cvTable` (Java parity)
+// — we map each to one of our five
 // canonical ActivationMethod variants. Unknown / unhandled child terms
 // fall through and the spectrum's activation_method stays None.
 const CV_CID: &str = "MS:1000133"; // collision-induced dissociation
@@ -348,7 +348,7 @@ impl<R: BufRead> MzMLReader<R> {
 // here, so downstream param routing picks an ETD-trained
 // model when ECD is the only signal.
 //
-// Selection rule (mirrors `StaxMzMLParser.java:595-605`):
+// Selection rule (Java parity for activation-method selection):
 // - ETD always wins (set unconditionally; matches Java's
 // `isETD` short-circuit).
 // - Other methods: first-wins. A spectrum with multiple
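The selection rule in the comment reads as: ETD is assigned whenever it appears, and any other method is kept only if nothing was selected before it. A minimal sketch of that rule, assuming the CV child terms have already been mapped to variants; `Activation` (a three-variant subset, not the five canonical variants) and `select_activation` are hypothetical, and the comment's truncated remainder is not modeled.

```rust
/// Hypothetical subset of activation methods for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Activation {
    Cid,
    Etd,
    Hcd,
}

/// ETD overrides unconditionally; any other method wins only if it is
/// the first one seen (first-wins). Returns None for no recognized terms.
pub fn select_activation(children: &[Activation]) -> Option<Activation> {
    let mut selected: Option<Activation> = None;
    for &method in children {
        if method == Activation::Etd {
            selected = Some(Activation::Etd);
        } else if selected.is_none() {
            selected = Some(method);
        }
    }
    selected
}
```

Because ETD is written unconditionally, its position among the child terms does not matter, while the order of the remaining terms does.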
1 change: 1 addition & 0 deletions crates/model/Cargo.toml
@@ -7,6 +7,7 @@ license.workspace = true
 
 [dependencies]
 thiserror = { workspace = true }
+rustc-hash = "2"
 
 [dev-dependencies]
 tempfile = "3.10"
18 changes: 12 additions & 6 deletions crates/model/src/aa_set.rs
@@ -1,11 +1,12 @@
 //! Heavyweight residue-and-modification set. Built via
 //! `AminoAcidSetBuilder`; queried by the candidate generator.
 
-use std::collections::HashMap;
 use std::fs;
 use std::path::Path;
 use std::sync::Arc;
 
+use rustc_hash::FxHashMap;
+
 use crate::amino_acid::AminoAcid;
 use crate::enzyme::Enzyme;
 use crate::modification::{ModLocation, ModParseError, Modification, ResidueSpec};
@@ -16,10 +17,15 @@ const IMPLAUSIBLE_MASS_THRESHOLD: f64 = 1000.0;
 #[derive(Debug, Clone)]
 pub struct AminoAcidSet {
 /// (residue, location) → all variants (unmodified + modified) at that position.
-table: HashMap<(u8, ModLocation), Vec<AminoAcid>>,
+///
+/// Iter2 perf: switched from `HashMap` (SipHash13, RandomState) to
+/// `FxHashMap` after a flamegraph on the post-PR-V1 binary showed 39%
+/// of Astral CPU in `variants_for` lookups via SipHash. Same hashbrown
+/// internals, faster hasher.
+table: FxHashMap<(u8, ModLocation), Vec<AminoAcid>>,
 /// Per-location flattened AA lists, precomputed at build time. Avoids
 /// per-call rebuild in the GF DP hot path (PrimitiveAaGraph::new).
-aa_lists_cache: HashMap<ModLocation, Vec<AminoAcid>>,
+aa_lists_cache: FxHashMap<ModLocation, Vec<AminoAcid>>,
 has_cterm_mods: bool,
 min_aa_mass: f64,
 max_aa_mass: f64,
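The comment credits the speedup to the hasher alone, with the table structure unchanged. To make that concrete, here is a simplified std-only sketch of a classic Fx-style hasher plugged into `std::collections::HashMap` through `BuildHasherDefault`. `SketchFxHasher` and `SketchFxHashMap` are hypothetical names, and rustc-hash 2.x (the version added to `Cargo.toml`) ships a revised algorithm, so this is not its code. Fx-style hashing is fast but not DoS-resistant, which is generally acceptable for keys such as `(u8, ModLocation)` that are built internally rather than read from untrusted input.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// Simplified sketch of the classic Fx hash: for each word, rotate the
/// state left by 5, XOR the word in, then multiply by a fixed constant.
#[derive(Default, Clone, Copy)]
pub struct SketchFxHasher {
    hash: u64,
}

const SEED: u64 = 0x51_7c_c1_b7_27_22_0a_95;

impl SketchFxHasher {
    #[inline]
    fn add_to_hash(&mut self, word: u64) {
        self.hash = (self.hash.rotate_left(5) ^ word).wrapping_mul(SEED);
    }
}

impl Hasher for SketchFxHasher {
    fn write(&mut self, bytes: &[u8]) {
        // Consume 8 bytes at a time, zero-padding the final chunk.
        for chunk in bytes.chunks(8) {
            let mut buf = [0u8; 8];
            buf[..chunk.len()].copy_from_slice(chunk);
            self.add_to_hash(u64::from_le_bytes(buf));
        }
    }
    // These integer writes become one mixing step each; other widths
    // fall back to `write` through the trait's default methods.
    fn write_u8(&mut self, i: u8) {
        self.add_to_hash(u64::from(i));
    }
    fn write_u64(&mut self, i: u64) {
        self.add_to_hash(i);
    }
    fn write_usize(&mut self, i: usize) {
        self.add_to_hash(i as u64);
    }
    fn finish(&self) -> u64 {
        self.hash
    }
}

/// Same std `HashMap`; only the hasher type changes, and
/// `BuildHasherDefault` means there is no per-map random seed.
pub type SketchFxHashMap<K, V> = HashMap<K, V, BuildHasherDefault<SketchFxHasher>>;
```

Since the map type is only an alias over `HashMap`, call sites change from `HashMap::new()` to `::default()`, which matches the constructor edits further down in this file.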
@@ -266,7 +272,7 @@ impl AminoAcidSetBuilder {
 continue;
 }
 // Take everything after the first `=`. Java accepts whitespace around the value.
-let value = line.splitn(2, '=').nth(1).unwrap_or("").trim();
+let value = line.split_once('=').map(|x| x.1).unwrap_or("").trim();
 let n: u32 = value.parse().map_err(|_| AaSetError::BadNumMods {
 value: value.to_string(),
 })?;
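`split_once('=')` returns `None` when the line has no `=`, just as `splitn(2, '=').nth(1)` does, so the rewrite should not change behavior. Given the clippy sweep in this PR, it is likely the change suggested by clippy's `manual_split_once` lint. A small sketch checking that equivalence on a mods-file style line such as `NumMods=2`; `value_after_eq`, `value_after_eq_splitn`, and `parse_num_mods` are hypothetical helpers, and the `String` error stands in for `AaSetError::BadNumMods`.

```rust
/// Everything after the first `=`, trimmed (the post-change form).
pub fn value_after_eq(line: &str) -> &str {
    line.split_once('=').map(|(_, v)| v).unwrap_or("").trim()
}

/// The pre-change form, kept only for comparison.
pub fn value_after_eq_splitn(line: &str) -> &str {
    line.splitn(2, '=').nth(1).unwrap_or("").trim()
}

/// Parse the numeric value of a line such as `NumMods = 2`.
pub fn parse_num_mods(line: &str) -> Result<u32, String> {
    let value = value_after_eq(line);
    value
        .parse()
        .map_err(|_| format!("bad NumMods value: {value:?}"))
}
```

Both forms split on the first `=` only, so a value that itself contains `=` (for example `a=b=c` yielding `b=c`) is preserved intact.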
@@ -327,7 +333,7 @@ impl AminoAcidSetBuilder {
 .map(Arc::new)
 .collect();
 
-let mut table: HashMap<(u8, ModLocation), Vec<AminoAcid>> = HashMap::new();
+let mut table: FxHashMap<(u8, ModLocation), Vec<AminoAcid>> = FxHashMap::default();
 let locations = [
 ModLocation::Anywhere, ModLocation::NTerm, ModLocation::CTerm,
 ModLocation::ProtNTerm, ModLocation::ProtCTerm,
@@ -404,7 +410,7 @@ impl AminoAcidSetBuilder {
 // 5. Precompute the per-location AA lists used by `aa_list_for` and
 // `cached_aa_list`. Runs once at build time so the GF DP hot path
 // can borrow a slice.
-let mut aa_lists_cache: HashMap<ModLocation, Vec<AminoAcid>> = HashMap::new();
+let mut aa_lists_cache: FxHashMap<ModLocation, Vec<AminoAcid>> = FxHashMap::default();
 let anywhere_list: Vec<AminoAcid> = STANDARD_RESIDUES
 .iter()
 .flat_map(|&r| {
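Step 5 moves list construction out of the hot path: each per-location list is built once, and callers then borrow slices. A minimal sketch of that precompute-and-borrow pattern; `Loc`, `AaListCache`, `build`, and `cached_list` are hypothetical stand-ins (residues are plain `char`s here), not the `aa_list_for` or `cached_aa_list` signatures from the diff.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for `ModLocation`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Loc {
    Anywhere,
    NTerm,
}

pub struct AaListCache {
    lists: HashMap<Loc, Vec<char>>,
}

impl AaListCache {
    /// Build every per-location list once, up front. `nterm_extra`
    /// stands in for hypothetical N-terminal-only variants.
    pub fn build(residues: &[char], nterm_extra: &[char]) -> Self {
        let mut lists = HashMap::new();
        lists.insert(Loc::Anywhere, residues.to_vec());
        let mut nterm = residues.to_vec();
        nterm.extend_from_slice(nterm_extra);
        lists.insert(Loc::NTerm, nterm);
        AaListCache { lists }
    }

    /// Hot-path accessor: returns a borrowed slice, so no per-call
    /// allocation or clone. Unknown locations yield an empty slice.
    pub fn cached_list(&self, loc: Loc) -> &[char] {
        self.lists.get(&loc).map(Vec::as_slice).unwrap_or(&[])
    }
}
```

Returning `&[T]` instead of `Vec<T>` ties the result's lifetime to the cache, which is what lets repeated calls share one buffer.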
2 changes: 1 addition & 1 deletion crates/model/src/amino_acid.rs
@@ -10,7 +10,7 @@
 //! cloned the `Modification`'s `String` `name` (and optional accession),
 //! producing one heap allocation per modified residue per candidate. At
 //! Astral scale that drives `PreparedSearch::prepare` to ~27 GB RSS on a
-//! 31 GB VM (verified by the `MSGFRUST_RSS_PROBE=1` probe in
+//! 31 GB VM (verified by the `MSGF_RSS_PROBE=1` probe in
 //! `msgf-rust.rs`). Wrapping `Modification` in `Arc` makes clones a
 //! refcount bump and shrinks `AminoAcid` from ~96 B to 24 B.
 
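The doc comment attributes the RSS drop to making clones a refcount bump. A sketch of that effect with hypothetical `Modification` and `AminoAcid` types (the real fields are not shown in this diff): cloning a residue copies the `Arc` pointer and increments its count, while the strings stay in one shared allocation.

```rust
use std::mem::size_of;
use std::sync::Arc;

/// Hypothetical modification record that owns heap strings.
#[derive(Debug)]
pub struct Modification {
    pub name: String,
    pub accession: Option<String>,
    pub mass_delta: f64,
}

/// Hypothetical residue type. `clone()` copies `residue` and `mass` and
/// bumps the `Arc` count; `name` and `accession` are never re-allocated.
#[derive(Debug, Clone)]
pub struct AminoAcid {
    pub residue: u8,
    pub mass: f64,
    pub modification: Option<Arc<Modification>>,
}

/// `Option<Arc<T>>` uses the null-pointer niche, so it stays one word.
pub fn shared_mod_field_size() -> usize {
    size_of::<Option<Arc<Modification>>>()
}
```

Cloning three residues that share one modification leaves a single `Modification` allocation whose strong count rises by three.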
2 changes: 1 addition & 1 deletion crates/model/src/lib.rs
@@ -1,4 +1,4 @@
-//! Domain model for MS-GF+ Rust port.
+//! Core domain types: spectra, peptides, modifications, amino-acid sets, masses.
 //!
 //! Pure types: amino acids, modifications, peptides, enzymes,
 //! tolerances, spectra, proteins, masses, activation, instrument,