Merged
5 changes: 5 additions & 0 deletions Dockerfile.dev
@@ -0,0 +1,5 @@
FROM python:3.11-slim
RUN apt-get update && apt-get install -y --no-install-recommends git procps libglib2.0-0t64 && rm -rf /var/lib/apt/lists/*
WORKDIR /src
COPY . .
RUN pip install --no-cache-dir .
Comment on lines +1 to +5

⚠️ Potential issue | 🟠 Major

Run the dev container as a non-root user.

The image never switches away from root. That keeps the whole dev session over-privileged and will also leave bind-mounted files owned by root on the host.

Suggested change
 FROM python:3.11-slim
-RUN apt-get update && apt-get install -y --no-install-recommends git procps libglib2.0-0t64 && rm -rf /var/lib/apt/lists/*
+RUN apt-get update \
+ && apt-get install -y --no-install-recommends git procps libglib2.0-0t64 \
+ && useradd --create-home --shell /bin/bash appuser \
+ && rm -rf /var/lib/apt/lists/*
 WORKDIR /src
 COPY . .
 RUN pip install --no-cache-dir .
+RUN chown -R appuser:appuser /src
+USER appuser
🧰 Tools
🪛 Trivy (0.69.3)

[error] 1-1: Image user should not be 'root'

Specify at least 1 USER command in Dockerfile with non-root user as argument

Rule: DS-0002

(IaC/Dockerfile)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Dockerfile.dev` around lines 1 - 5, The Dockerfile leaves the container
running as root; create and switch to a non-root user: add user/group creation
(e.g., addgroup/adduser or groupadd/useradd), create the project directory if
needed, chown /src to that user, and add a USER <username> line before
installing/running app commands so subsequent steps run as the non-root user;
update relevant RUN/COPY/WORKDIR sequences in the Dockerfile to ensure ownership
is set (references: WORKDIR /src, COPY . ., RUN pip install --no-cache-dir .).

2 changes: 1 addition & 1 deletion pyproject.toml
@@ -31,7 +31,7 @@ packages = [
[tool.poetry.dependencies]
python = "*"
click = "*"
-sdrf-pipelines = "==0.0.33"
+sdrf-pipelines = ">=0.1.1"

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

Which released `sdrf-pipelines` versions expose both `sdrf_pipelines.converters.openms.unimod.UnimodDatabase` and `sdrf_pipelines.sdrf.sdrf.read_sdrf` / `validate_sdrf`, and what version range should downstream projects pin to in order to preserve those APIs?

💡 Result:

No released versions of sdrf-pipelines explicitly documented as exposing both sdrf_pipelines.converters.openms.unimod.UnimodDatabase and sdrf_pipelines.sdrf.sdrf.read_sdrf / validate_sdrf APIs could be identified from available sources. The package has releases from 0.0.2 to 0.1.0 (latest as of 2026-02-04), primarily CLI-focused (parse_sdrf validate-sdrf, convert-openms). Repository structure confirms sdrf_pipelines/ directory with sdrf/ and converters/, suggesting Python APIs exist, but no public API docs, changelogs, or version-specific mentions found. Downstream projects should test specific versions (e.g., >=0.0.33 <0.1.0) or pin to latest stable 0.1.0 after verifying API presence via code inspection, as no breaking change evidence exists.


🏁 Script executed:

#!/bin/bash
# Check sdrf-pipelines versions available on PyPI and GitHub releases

# Try to fetch PyPI JSON API
echo "=== PyPI sdrf-pipelines releases ==="
curl -s "https://pypi.org/pypi/sdrf-pipelines/json" | jq -r '.releases | keys[]' | sort -V

echo -e "\n=== Latest version info ==="
curl -s "https://pypi.org/pypi/sdrf-pipelines/json" | jq '.info | {version, release_url}'

Repository: bigbio/quantms-utils

Length of output: 441


🏁 Script executed:

#!/bin/bash
# Verify the code uses the mentioned sdrf-pipelines APIs

echo "=== Searching for sdrf_pipelines imports in the codebase ==="
rg "from sdrf_pipelines|import sdrf_pipelines" -n

echo -e "\n=== Files that import sdrf_pipelines modules ==="
rg "sdrf_pipelines\." -l | head -20

Repository: bigbio/quantms-utils

Length of output: 433


Bound sdrf-pipelines to a stable version range.

This PR now depends on specific sdrf-pipelines APIs (UnimodDatabase, read_sdrf), but `>=0.1.1` also admits any later pre-1.0 release (0.2.0 and up), where these APIs may change. Add an upper bound so installs stay within the 0.1.x series.

Suggested change
-sdrf-pipelines = ">=0.1.1"
+sdrf-pipelines = ">=0.1.1,<0.2.0"
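To make the difference between the two specifiers concrete, here is a minimal sketch in plain Python. It is not how Poetry or pip resolve versions: it only handles simple dotted releases (no pre-release or post-release tags), and the helper names `parse_version` and `in_range` are made up for this illustration. Versions 0.1.9, 0.2.0, and 1.0.0 are hypothetical future releases, not known ones.

```python
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a plain dotted release such as '0.1.1' into (0, 1, 1)."""
    return tuple(int(part) for part in text.split("."))


def in_range(version: str, lower: str, upper: Optional[str] = None) -> bool:
    """Return True if lower <= version, and version < upper when upper is given."""
    v = parse_version(version)
    if v < parse_version(lower):
        return False
    return upper is None or v < parse_version(upper)


# 0.0.33 and 0.1.1 appear in this PR; the rest are hypothetical releases.
candidates = ["0.0.33", "0.1.1", "0.1.9", "0.2.0", "1.0.0"]

# Open-ended spec from the PR: ">=0.1.1"
open_ended = [v for v in candidates if in_range(v, "0.1.1")]

# Bounded spec from the suggestion: ">=0.1.1,<0.2.0"
bounded = [v for v in candidates if in_range(v, "0.1.1", "0.2.0")]

print("open-ended:", open_ended)
print("bounded:   ", bounded)
```

With the open-ended spec, a resolver is free to pick any later release, including ones that move or rename modules again. The bounded spec restricts it to the 0.1.x series.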
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pyproject.toml` at line 34, The dependency specification for sdrf-pipelines
is too open (sdrf-pipelines = ">=0.1.1") and could allow breaking pre-1.0 API
changes that this PR relies on (UnimodDatabase, read_sdrf); update
pyproject.toml to pin a stable upper bound such as ">=0.1.1,<0.2.0" (or use a
compatible spec like "~=0.1.1") so installs remain reproducible and the expected
APIs (UnimodDatabase, read_sdrf) won't change unexpectedly.

pyopenms = ">=3.3.0"
pandas = "*"
pyarrow = ">=16.1.0"
2 changes: 1 addition & 1 deletion quantmsutils/diann/dianncfg.py
@@ -9,7 +9,7 @@
from typing import List, Tuple
from collections import defaultdict
import click
-from sdrf_pipelines.openms.unimod import UnimodDatabase
+from sdrf_pipelines.converters.openms.unimod import UnimodDatabase

logging.basicConfig(format="%(asctime)s [%(funcName)s] - %(message)s", level=logging.DEBUG)
logger = logging.getLogger(__name__)
236 changes: 89 additions & 147 deletions quantmsutils/sdrf/check_samplesheet.py
@@ -1,192 +1,134 @@
# nf-core: Update the script to check the sdrf
# This script is based on the example at: https://raw.githubusercontent.com/nf-core/test-datasets/viralrecon/samplesheet/samplesheet_test_illumina_amplicon.csv

import errno
import logging
import os
import sys

import click
import pandas as pd
from sdrf_pipelines.sdrf.sdrf import SdrfDataFrame
from sdrf_pipelines.sdrf.sdrf_schema import DEFAULT_TEMPLATE, MASS_SPECTROMETRY

from sdrf_pipelines.sdrf.sdrf import read_sdrf
coderabbitai[bot] marked this conversation as resolved.
Comment on lines +6 to +7

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

sdrf-pipelines 0.0.33 read_sdrf validate_sdrf API

💡 Result:

sdrf-pipelines 0.0.33 is a Python package (https://pypi.org/project/sdrf-pipelines/0.0.33/, released Nov 23, 2025) from the bigbio/sdrf-pipelines GitHub repo (https://github.com/bigbio/sdrf-pipelines). It is the official SDRF (Sample and Data Relationship Format) file validator and converter for proteomics/multiomics, translating SDRF files to pipeline configurations (e.g., OpenMS, MaxQuant, MSstats).

Primary usage is via the CLI tool parse_sdrf:

- validate-sdrf: Validates SDRF structure, formatting, uniqueness, and optionally ontology terms (EFO, CL, MS) using YAML schema templates (default: mass spectrometry).
  Example: parse_sdrf validate-sdrf --sdrf_file path/to/sdrf.tsv --template human --skip-ontology
  Simplified: parse_sdrf validate-sdrf-simple path/to/sdrf.tsv --template default

No explicit Python API documentation for read_sdrf or validate_sdrf functions found in official sources (PyPI, GitHub README, COMMANDS.md referenced but not detailed). These may be internal functions called by the CLI (e.g., in sdrf_pipelines.sdrf.validators or parsers), accessible after pip install sdrf-pipelines by importing the module, but primarily designed for CLI use. Check GitHub source code (sdrf_pipelines/ directory) for programmatic access. Install with pip install sdrf-pipelines==0.0.33 or sdrf-pipelines[ontology] for full validation.


🏁 Script executed:

#!/bin/bash
# Verify sdrf-pipelines 0.0.33 API structure by checking GitHub source
# First, check if we can access the repo and look at the module structure at the tag for 0.0.33

# Try to fetch the raw content from GitHub at the specific version
# Check the structure of sdrf_pipelines.sdrf.sdrf module

curl -s "https://raw.githubusercontent.com/bigbio/sdrf-pipelines/0.0.33/sdrf_pipelines/sdrf/sdrf.py" | head -100

echo "---"

# Also check for SDRF class and validate_sdrf method
curl -s "https://raw.githubusercontent.com/bigbio/sdrf-pipelines/0.0.33/sdrf_pipelines/sdrf/sdrf.py" | grep -A 10 "def read_sdrf\|def validate_sdrf\|class.*SDRF" | head -50

Repository: bigbio/quantms-utils

Length of output: 81


🏁 Script executed:

#!/bin/bash
# Check for dependency specifications in the repository
find . -type f \( -name "setup.py" -o -name "requirements*.txt" -o -name "pyproject.toml" -o -name "setup.cfg" \) | head -20

echo "---"

# Check the actual usage in the check_samplesheet.py file
cat -n quantmsutils/sdrf/check_samplesheet.py | head -50

Repository: bigbio/quantms-utils

Length of output: 1897


🏁 Script executed:

#!/bin/bash
# Check the requirements and pyproject.toml for sdrf-pipelines version
echo "=== requirements.txt ==="
cat requirements.txt | grep -i sdrf

echo ""
echo "=== pyproject.toml ==="
cat pyproject.toml | grep -A 5 -B 5 sdrf

Repository: bigbio/quantms-utils

Length of output: 285


🏁 Script executed:

#!/bin/bash
# Check the actual check_samplesheet.py to see how read_sdrf and validate_sdrf are being used
cat -n quantmsutils/sdrf/check_samplesheet.py

echo ""
echo "=== Checking if there are other files using these imports ==="
rg "read_sdrf|validate_sdrf" --type py

Repository: bigbio/quantms-utils

Length of output: 2914


Fix version conflict: requirements.txt (0.0.33) vs pyproject.toml (>=0.1.1).

The requirements.txt pins `sdrf-pipelines==0.0.33` while pyproject.toml specifies `sdrf-pipelines>=0.1.1`. This creates an installation inconsistency. Verify whether the code targets 0.0.33 specifically or can work with 0.1.1+, then align both files. If the import path `from sdrf_pipelines.sdrf.sdrf import read_sdrf` and the method signature `df.validate_sdrf(template=..., use_ols_cache_only=...)` changed between versions, adjust the code accordingly.
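If both layouts had to be tolerated while the dependency files are being aligned, one option is a fallback import. The sketch below is only an illustration, not something this PR does. The helper `import_first` and the constant `UNIMOD_MODULE_CANDIDATES` are hypothetical names, and the two module paths are taken from the `dianncfg.py` change in this PR, assuming the `converters` path is the newer one.

```python
import importlib
from types import ModuleType
from typing import Optional, Sequence


def import_first(candidates: Sequence[str]) -> ModuleType:
    """Import and return the first module in `candidates` that can be imported."""
    last_error: Optional[ImportError] = None
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError as exc:  # also covers ModuleNotFoundError
            last_error = exc
    raise ImportError(f"none of {list(candidates)} could be imported") from last_error


# Module paths as they appear in the dianncfg.py diff of this PR.
UNIMOD_MODULE_CANDIDATES = (
    "sdrf_pipelines.converters.openms.unimod",  # import used after this PR
    "sdrf_pipelines.openms.unimod",             # import used before this PR
)

# Usage (requires sdrf-pipelines to be installed):
# UnimodDatabase = import_first(UNIMOD_MODULE_CANDIDATES).UnimodDatabase
```

A fallback like this only papers over the module move. It does not help if call signatures also differ between versions, so pinning both dependency files to the same range remains the cleaner fix.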

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@quantmsutils/sdrf/check_samplesheet.py` around lines 5 - 6, Resolve the
sdrf-pipelines version mismatch by choosing the supported upstream version
(either pin to 0.0.33 in pyproject.toml or update requirements.txt to >=0.1.1),
then update the code to match that version's API: verify and, if necessary,
change the import "from sdrf_pipelines.sdrf.sdrf import read_sdrf" to the
correct module path used by the selected version and update calls to
df.validate_sdrf(template=..., use_ols_cache_only=...) to match the actual
parameter names and signature in that version (rename or remove
use_ols_cache_only or template args as required); ensure both dependency files
reference the same version constraint so installs are consistent.


logging.basicConfig(format="%(asctime)s [%(funcName)s] - %(message)s", level=logging.DEBUG)
logger = logging.getLogger(__name__)


def make_dir(path):
if len(path) > 0:
try:
os.makedirs(path)
except OSError as exception:
if exception.errno != errno.EEXIST:
raise exception


def print_error(error, context="Line", context_str=""):
error_str = "ERROR: Please check samplesheet -> {}".format(error)
if context != "" and context_str != "":
error_str = "ERROR: Please check samplesheet -> {}\n{}: '{}'".format(
error, context.strip(), context_str.strip()
)
print(error_str)
sys.exit(1)
# Minimal columns required to run quantms/quantmsdiann pipelines.
# These are checked in --minimal mode instead of full schema validation.
MINIMAL_REQUIRED_COLUMNS = [
"source name",
"assay name",
"comment[data file]",
"comment[label]",
"comment[cleavage agent details]",
"comment[instrument]",
"comment[proteomics data acquisition method]",
"technology type",
]

# Recommended columns: warn if missing but don't fail
MINIMAL_RECOMMENDED_COLUMNS = [
"comment[precursor mass tolerance]",
"comment[fragment mass tolerance]",
"comment[dissociation method]",
"comment[technical replicate]",
"comment[fraction identifier]",
]


def check_sdrf(
input_sdrf: str,
skip_ms_validation: bool = False,
skip_factor_validation: bool = False,
skip_experimental_design_validation: bool = False,
template: str = "ms-proteomics",
minimal: bool = False,
use_ols_cache_only: bool = False,
skip_sdrf_validation: bool = False,
):
"""
Check the SDRF file for errors. If any errors are found, print them and exit with a non-zero status code.
@param input_sdrf: Path to the SDRF file to check
@param skip_ms_validation: Disable the validation of mass spectrometry fields in SDRF (e.g. posttranslational modifications)
@param skip_factor_validation: Disable the validation of factor values in SDRF
@param skip_experimental_design_validation: Disable the validation of experimental design
@param use_ols_cache_only: Use ols cache for validation of the terms and not OLS internet service
@param skip_sdrf_validation: Disable the validation of SDRF
"""
if skip_sdrf_validation:
print("No SDRF validation was performed.")
sys.exit(0)

df = SdrfDataFrame.parse(input_sdrf)
errors = df.validate(DEFAULT_TEMPLATE, use_ols_cache_only)

if not skip_ms_validation:
errors = errors + df.validate(MASS_SPECTROMETRY, use_ols_cache_only)
Check the SDRF file for errors.

if not skip_factor_validation:
errors = errors + df.validate_factor_values()

if not skip_experimental_design_validation:
errors = errors + df.validate_experimental_design()
:param input_sdrf: Path to the SDRF file to check
:param template: Schema template for full validation (e.g. 'ms-proteomics', 'dia-acquisition')
:param minimal: Only validate columns required to run the pipeline (skip organism, etc.)
:param use_ols_cache_only: Use OLS cache instead of live OLS service
"""
if minimal:
errors = _validate_minimal(input_sdrf)
else:
df = read_sdrf(input_sdrf)
errors = df.validate_sdrf(
template=template,
use_ols_cache_only=use_ols_cache_only,
)

for error in errors:
print(error)

sys.exit(bool(errors))


def check_expdesign(expdesign):
"""
Check the expdesign file for errors. If any errors are found, print them and exit with a non-zero status code.
@param expdesign: Path to the expdesign file to check
"""
data = pd.read_csv(expdesign, sep="\t", header=0, dtype=str)
data = data.dropna()
schema_file = ["Fraction_Group", "Fraction", "Spectra_Filepath", "Label", "Sample"]
schema_sample = ["Sample", "MSstats_Condition", "MSstats_BioReplicate"]

# check table format: two table
with open(expdesign, "r") as f:
lines = f.readlines()
try:
empty_row = lines.index("\n")
except ValueError:
print(
"the one-table format parser is broken in OpenMS2.5, please use one-table or sdrf"
)
sys.exit(1)

s_table = [i.replace("\n", "").split("\t") for i in lines[empty_row + 1 :]][1:]
s_header = lines[empty_row + 1].replace("\n", "").split("\t")
s_data_frame = pd.DataFrame(s_table, columns=s_header)

# check missed mandatory column
missed_columns = set(schema_file) - set(data.columns)
if len(missed_columns) != 0:
print("{0} column missed".format(" ".join(missed_columns)))
sys.exit(1)

missed_columns = set(schema_sample) - set(s_data_frame.columns)
if len(missed_columns) != 0:
print("{0} column missed".format(" ".join(missed_columns)))
sys.exit(1)
def _validate_minimal(input_sdrf: str) -> list[str]:
"""Validate only the columns required to run the pipeline.

if len(set(data.Label)) != 1 and "MSstats_Mixture" not in s_data_frame.columns:
print("MSstats_Mixture column missed in ISO experiments")
sys.exit(1)

# check logical problem: may be improved
check_expdesign_logic(data, s_data_frame)
Returns a list of error strings. Only missing required columns
produce errors; missing recommended columns produce warnings (non-blocking).
"""
df_header = pd.read_csv(input_sdrf, sep="\t", nrows=0)
columns_lower = [c.lower() for c in df_header.columns]
errors = []

# Reject header-only files
df_rows = pd.read_csv(input_sdrf, sep="\t", nrows=1)
if len(df_rows) == 0:
errors.append("ERROR: SDRF file contains a header but no data rows.")
return errors

# Check required columns (case-insensitive)
for col in MINIMAL_REQUIRED_COLUMNS:
if col.lower() not in columns_lower:
errors.append(f"ERROR: Required column '{col}' is missing from the SDRF file.")

# Check at least one modification parameters column exists
has_mod_col = any(c.startswith("comment[modification parameters") for c in columns_lower)
if not has_mod_col:
errors.append(
"ERROR: At least one 'comment[modification parameters]' column is required."
)

# Warn about recommended columns (non-blocking)
for col in MINIMAL_RECOMMENDED_COLUMNS:
if col.lower() not in columns_lower:
logger.warning(
f"Recommended column '{col}' is missing. Pipeline will use default parameters."
)

def check_expdesign_logic(f_table, s_table):
fg_ints = f_table["Fraction_Group"].astype(int)
if fg_ints.max() > fg_ints.nunique():
print("Fraction_Group discontinuous!")
sys.exit(1)
f_table_d = f_table.drop_duplicates(["Fraction_Group", "Fraction", "Label", "Sample"])
if f_table_d.shape[0] < f_table.shape[0]:
print("Existing duplicate entries in Fraction_Group, Fraction, Label and Sample")
sys.exit(1)
if len(set(s_table.Sample)) < s_table.shape[0]:
print("Existing duplicate Sample in sample table!")
sys.exit(1)
return errors


@click.command(
"checksamplesheet",
short_help="Reformat nf-core/quantms sdrf file and check its contents.",
short_help="Validate an SDRF file for quantms pipelines.",
)
@click.option("--exp_design", help="SDRF/Expdesign file to be validated")
@click.option("--is_sdrf", help="SDRF file or Expdesign file", is_flag=True)
@click.option("--skip_sdrf_validation", help="Disable the validation of SDRF", is_flag=True)
@click.option("--exp_design", help="SDRF file to be validated", required=True)
@click.option(
"--skip_ms_validation",
help="Disable the validation of mass spectrometry fields in SDRF (e.g. posttranslational modifications)",
is_flag=True,
"--template", "-t",
help="Schema template for full validation (e.g. ms-proteomics, dia-acquisition)",
default="ms-proteomics",
)
@click.option(
"--skip_factor_validation",
help="Disable the validation of factor values in SDRF",
is_flag=True,
)
@click.option(
"--skip_experimental_design_validation",
help="Disable the validation of experimental design",
"--minimal",
help="Only validate columns required to run the pipeline (skip organism, metadata, etc.)",
is_flag=True,
)
@click.option(
"--use_ols_cache_only",
help="Use ols cache for validation of the terms and not OLS internet service",
help="Use OLS cache for ontology validation instead of the live OLS service",
is_flag=True,
)
def checksamplesheet(
exp_design: str,
is_sdrf: bool = False,
skip_sdrf_validation: bool = False,
skip_ms_validation: bool = False,
skip_factor_validation: bool = False,
skip_experimental_design_validation: bool = False,
template: str = "ms-proteomics",
minimal: bool = False,
use_ols_cache_only: bool = False,
):
"""
Reformat nf-core/quantms sdrf file and check its contents.
@param exp_design: SDRF/Expdesign file to be validated
@param is_sdrf: SDRF file or Expdesign file
@param skip_sdrf_validation: Disable the validation of SDRF
@param skip_ms_validation: Disable the validation of mass spectrometry fields in SDRF (e.g. posttranslational modifications)
@param skip_factor_validation: Disable the validation of factor values in SDRF
@param skip_experimental_design_validation: Disable the validation of experimental design
@param use_ols_cache_only: Use ols cache for validation of the terms and not OLS internet service

"""
# TODO validate expdesign file
if is_sdrf:
check_sdrf(
input_sdrf=exp_design,
skip_sdrf_validation=skip_sdrf_validation,
skip_ms_validation=skip_ms_validation,
skip_factor_validation=skip_factor_validation,
skip_experimental_design_validation=skip_experimental_design_validation,
use_ols_cache_only=use_ols_cache_only,
)
else:
check_expdesign(exp_design)
"""Validate an SDRF file for quantms pipelines."""
check_sdrf(
input_sdrf=exp_design,
template=template,
minimal=minimal,
use_ols_cache_only=use_ols_cache_only,
)
9 changes: 3 additions & 6 deletions recipe/meta.yaml
@@ -1,7 +1,7 @@
# recipe/meta.yaml
package:
name: quantms-utils
-version: "0.0.25"
+version: "0.0.26"

source:
path: ../
@@ -20,19 +20,16 @@ requirements:
- python
- pip
- poetry-core >=1.2.0
- setuptools <78

run:
- python >=3.9,<3.13
- click
- setuptools <78
- sdrf-pipelines >=0.0.33,<0.1.0
- sdrf-pipelines >=0.1.1
- pyopenms>=3.3.0
- pandas
- pyarrow>=16.1.0
- scipy
test:
requires:
- setuptools <78
imports:
- quantmsutils
commands: