
Adding annotation file param to bigDIANNtoMSstatsFormat #12

Open
Rudhik1904 wants to merge 1 commit into devel from Updating_CleanDIANN_Annotation


Rudhik1904 (Contributor) commented Feb 14, 2026

Motivation and Context

In the previous issue, we created an MSstats big converter for DIANN, but it did not capture annotation information, so this story adds it.

Changes

Added an optional annotation file parameter to bigDIANNtoMSstatsFormat

Testing

Created tests/testthat/test-clean_DIANN.R

Motivation and Context

The existing MSstats converter for DIANN data files did not capture annotation information. This pull request extends the DIANN conversion pipeline to support optional annotation files, enabling users to add experimental metadata (such as conditions and biological replicates) during the conversion process. The annotation capability is propagated through the chunked data processing workflow to maintain support for large out-of-memory datasets.

Detailed Changes

R/clean_DIANN.R

  • Added optional annotation = NULL parameter to reduceBigDIANN() function signature
  • Added optional annotation = NULL parameter to cleanDIANNChunk() function signature
  • Updated diann_chunk() nested function to pass the annotation parameter to cleanDIANNChunk()
  • Added MSstatsMakeAnnotation to the @importFrom MSstatsConvert declaration in cleanDIANNChunk() documentation
  • Added @param annotation documentation to both function Roxygen blocks
  • Integrated MSstatsMakeAnnotation(input, annotation) call in cleanDIANNChunk() after MSstatsClean() to merge annotation data with cleaned DIANN data prior to file output

R/converters.R

  • Added optional annotation = NULL parameter to bigDIANNtoMSstatsFormat() function signature, positioned after input_file parameter
  • Updated the internal call to reduceBigDIANN() to include and pass through the annotation argument
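The pass-through described above can be illustrated with a small, self-contained base R sketch. It is not the package code: convert_sketch(), reduce_sketch(), clean_chunk_sketch() and make_annotation_sketch() are hypothetical stand-ins for bigDIANNtoMSstatsFormat(), reduceBigDIANN(), cleanDIANNChunk() and MSstatsMakeAnnotation(), and joining the annotation on a Run column is an assumption made only for illustration.

```r
# Hypothetical stand-in for MSstatsMakeAnnotation(): joins run-level metadata
# onto the cleaned rows by Run (assumed key); returns input unchanged for NULL.
make_annotation_sketch <- function(input, annotation = NULL) {
  if (is.null(annotation)) {
    return(input)
  }
  merge(input, annotation, by = "Run", all.x = TRUE, sort = FALSE)
}

# Hypothetical stand-in for cleanDIANNChunk(): clean, annotate, then write.
clean_chunk_sketch <- function(chunk, writer, annotation = NULL) {
  cleaned <- chunk[!is.na(chunk$Intensity), , drop = FALSE]  # placeholder cleaning
  annotated <- make_annotation_sketch(cleaned, annotation)
  writer(annotated)
  invisible(annotated)
}

# Hypothetical stand-ins for reduceBigDIANN() and bigDIANNtoMSstatsFormat():
# each level forwards `annotation` without touching it.
reduce_sketch <- function(chunk, writer, annotation = NULL) {
  clean_chunk_sketch(chunk, writer, annotation = annotation)
}
convert_sketch <- function(chunk, writer, annotation = NULL) {
  reduce_sketch(chunk, writer, annotation = annotation)
}

# Usage: capture the "written" chunk in a variable instead of a file.
written <- NULL
chunk <- data.frame(Run = c("r1", "r2", "r2"), Intensity = c(10, NA, 30))
annotation <- data.frame(Run = c("r1", "r2"), Condition = c("A", "B"),
                         BioReplicate = c(1, 2))
convert_sketch(chunk, function(x) written <<- x, annotation = annotation)
```

Because the outer levels only forward annotation, the merge logic lives in one place (the chunk cleaner), which is also the only place a test needs to inspect.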

tests/testthat/test-clean_DIANN.R

  • Added a new test file with a unit test for annotation handling
  • Test constructs minimal test data including an input chunk and annotation dataframe
  • Uses the mockery package to stub internal functions: MSstatsImport, MSstatsClean, MSstatsMakeAnnotation, and .writeChunkToFile
  • Verifies that MSstatsMakeAnnotation is called exactly once with correct arguments (cleaned data and annotation)
  • Validates that the merged result from annotation processing is passed to the file writing function
  • Checks the data flow through the cleaning and annotation steps of cleanDIANNChunk() (dependencies are mocked, so this is a unit test rather than an end-to-end test)
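The PR's test stubs dependencies with mockery. The same checks (one annotation call receiving the cleaned data and the annotation, merged result forwarded to the writer) can also be expressed in plain base R by injecting dependencies and recording calls. This sketch swaps in that technique; clean_chunk_di() and make_recorder() are hypothetical and do not reflect cleanDIANNChunk()'s real signature.

```r
# Builds a stub that records the arguments of every call and returns a fixed value.
make_recorder <- function(return_value) {
  calls <- list()
  list(
    fn = function(...) {
      calls[[length(calls) + 1]] <<- list(...)
      return_value
    },
    calls = function() calls
  )
}

# Hypothetical cleaner with injectable steps, so each one can be replaced by a stub.
clean_chunk_di <- function(chunk, annotation = NULL,
                           import_fn, clean_fn, annotate_fn, write_fn) {
  imported <- import_fn(chunk)
  cleaned <- clean_fn(imported)
  annotated <- annotate_fn(cleaned, annotation)
  write_fn(annotated)
}

cleaned_df <- data.frame(Run = "r1", Intensity = 10)
merged_df <- data.frame(Run = "r1", Intensity = 10, Condition = "A")
annotation <- data.frame(Run = "r1", Condition = "A")

annotate_stub <- make_recorder(merged_df)
write_stub <- make_recorder(TRUE)

clean_chunk_di(
  chunk = data.frame(raw = 1),
  annotation = annotation,
  import_fn = function(x) x,
  clean_fn = function(x) cleaned_df,
  annotate_fn = annotate_stub$fn,
  write_fn = write_stub$fn
)
```

Injecting the steps as arguments trades a slightly noisier signature for tests that need no stubbing library; mockery::stub avoids changing the production signature, which is likely why the PR uses it.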

Unit Tests

A new test file tests/testthat/test-clean_DIANN.R was added with a single test case:

  • test_that("cleanDIANNChunk passes annotation to MSstatsMakeAnnotation") - Verifies the annotation parameter is correctly propagated through the chunk processing pipeline using mocked dependencies and assertion checks on function call counts and arguments.

Coding Guidelines

The code follows the existing style conventions in the repository. However, there is a pre-existing inconsistency in the codebase regarding function definition syntax: cleanDIANNChunk = function(...) uses the assignment operator = instead of the more idiomatic <-. This same pattern is also present in cleanSpectronautChunk in the same file. While the new code maintains consistency with the existing pattern in its immediate context, this diverges from the general R style guide which recommends using <- for assignments. The R package does not have a configured linter or explicit style guide in its repository, and the Bioconductor CI workflow does not enforce style checks.

coderabbitai bot commented Feb 14, 2026

📝 Walkthrough

The changes introduce an optional annotation parameter that flows through the DIANN cleaning pipeline: bigDIANNtoMSstatsFormat → reduceBigDIANN → cleanDIANNChunk. Within cleanDIANNChunk, the annotation is applied via MSstatsMakeAnnotation before writing output. A test validates the complete annotation propagation.

Changes

| Cohort / File(s) | Summary |
|---|---|
| DIANN Cleaning Functions (R/clean_DIANN.R) | Added optional annotation = NULL parameter to reduceBigDIANN() and cleanDIANNChunk(). Updated diann_chunk() to propagate annotation through the call chain. Invokes MSstatsMakeAnnotation(input, annotation) in cleanDIANNChunk after cleaning and before file write. Updated Roxygen documentation and @importFrom directives. |
| DIANN Converter (R/converters.R) | Added optional annotation = NULL parameter to bigDIANNtoMSstatsFormat() and propagated it downstream to the reduceBigDIANN() call. |
| DIANN Cleaning Tests (tests/testthat/test-clean_DIANN.R) | New test file validating the cleanDIANNChunk() annotation workflow. Mocks dependency functions and verifies that MSstatsMakeAnnotation is called with the correct arguments and that its result is passed to .writeChunkToFile(). |

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant bigDIANNtoMSstatsFormat as bigDIANNtoMSstatsFormat()
    participant reduceBigDIANN as reduceBigDIANN()
    participant cleanDIANNChunk as cleanDIANNChunk()
    participant MSstatsImport as MSstatsImport()
    participant MSstatsClean as MSstatsClean()
    participant MSstatsMakeAnnotation as MSstatsMakeAnnotation()
    participant writeChunkToFile as .writeChunkToFile()

    User->>bigDIANNtoMSstatsFormat: input_file, annotation, ...
    bigDIANNtoMSstatsFormat->>reduceBigDIANN: input_file, annotation, ...
    reduceBigDIANN->>cleanDIANNChunk: chunk_data, annotation, ...
    cleanDIANNChunk->>MSstatsImport: chunk_data
    MSstatsImport-->>cleanDIANNChunk: imported_data
    cleanDIANNChunk->>MSstatsClean: imported_data
    MSstatsClean-->>cleanDIANNChunk: cleaned_data
    cleanDIANNChunk->>MSstatsMakeAnnotation: cleaned_data, annotation
    MSstatsMakeAnnotation-->>cleanDIANNChunk: annotated_data
    cleanDIANNChunk->>writeChunkToFile: annotated_data, output_path
    writeChunkToFile-->>cleanDIANNChunk: success
```

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 Annotations flow like morning dew,
Through channels old and pathways new,
From converter down to cleaning's way,
MSstats grows stronger every day!

🚥 Pre-merge checks | ✅ 2 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Merge Conflict Detection | ⚠️ Warning | ❌ Merge conflicts detected (2 files): ⚔️ R/clean_DIANN.R (content), ⚔️ R/converters.R (content). These conflicts must be resolved before merging into devel. | Resolve conflicts locally and push changes to this branch. |
| Description check | ❓ Inconclusive | The description is largely incomplete. It includes motivation and mentions test creation, but lacks detailed bullet points of changes as required by the template. | Expand the 'Changes' section with detailed bullet points describing how annotation was added to each function (reduceBigDIANN, cleanDIANNChunk, bigDIANNtoMSstatsFormat) and how it flows through the processing pipeline. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title directly matches the main change: adding an annotation parameter to bigDIANNtoMSstatsFormat function across the codebase. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


coderabbitai bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
R/converters.R (2)

168-184: ⚠️ Potential issue | 🔴 Critical

Breaking change: inserting annotation before output_file_name breaks positional callers.

Existing code calling bigDIANNtoMSstatsFormat(input_file, output_file_name, backend, ...) positionally will now silently assign the output file path to annotation and the backend string to output_file_name. Move annotation = NULL after the required positional parameters (or at least after backend) to preserve backward compatibility.

Suggested fix

```diff
-bigDIANNtoMSstatsFormat <- function(input_file, 
-                                    annotation = NULL,
-                                    output_file_name,
+bigDIANNtoMSstatsFormat <- function(input_file,
+                                    output_file_name,
                                     backend,
+                                    annotation = NULL,
                                     MBR = TRUE,
```

187-191: 🛠️ Refactor suggestion | 🟠 Major

Use file.path for intermediate file path construction.

paste0("reduce_output_", output_file_name) will break when output_file_name contains directory components (e.g., "results/output.csv" → "reduce_output_results/output.csv"). Use file.path(dirname(...), paste0(..., basename(...))) instead.

Suggested fix

```diff
-  reduceBigDIANN(input_file, 
-                 paste0("reduce_output_", output_file_name),
+  intermediate_file <- file.path(dirname(output_file_name),
+                                 paste0("reduce_output_", basename(output_file_name)))
+  reduceBigDIANN(input_file, 
+                 intermediate_file,
                  MBR,
                  quantificationColumn,
                  global_qvalue_cutoff, qvalue_cutoff, pg_qvalue_cutoff, annotation)
```

The same fix should be applied on line 195 where the intermediate path is read back. Based on learnings, intermediate file paths should be constructed using file.path(dirname(output_file_name), paste0("prefix_", basename(output_file_name))) to avoid path issues across different working directories.
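The difference between the two constructions is easy to check. intermediate_path() is a hypothetical helper, not part of the package; factoring the expression into one function would let both the write in reduceBigDIANN() and the read-back mentioned above share the same path.

```r
# Hypothetical helper: put the prefixed intermediate file in the same directory
# as the final output, prefixing only the file-name component.
intermediate_path <- function(output_file_name, prefix = "reduce_output_") {
  file.path(dirname(output_file_name), paste0(prefix, basename(output_file_name)))
}

naive <- paste0("reduce_output_", "results/output.csv")  # prefix lands on the directory
safe <- intermediate_path("results/output.csv")
bare <- intermediate_path("output.csv")  # dirname() is "." for a bare file name
```

naive points into a directory named reduce_output_results, which usually does not exist, while safe stays next to the final output file.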

🧹 Nitpick comments (2)
tests/testthat/test-clean_DIANN.R (2)

6-39: Good test coverage for the annotation path; consider adding a NULL annotation test.

The mock-based test correctly validates that annotation flows through to MSstatsMakeAnnotation and that its output reaches .writeChunkToFile.

Consider adding a companion test for the default annotation = NULL case to ensure the no-annotation path doesn't regress — especially important given that MSstatsMakeAnnotation is called unconditionally.
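A companion test for the default case could be sketched as follows. It assumes a hypothetical NULL guard around the annotation step; whether cleanDIANNChunk() needs such a guard, or can rely on MSstatsMakeAnnotation() accepting a NULL annotation, is not established here. clean_chunk_guarded() is a hypothetical stand-in.

```r
# Hypothetical cleaner that skips annotation when none is supplied.
clean_chunk_guarded <- function(chunk, annotation = NULL, annotate_fn, write_fn) {
  result <- if (is.null(annotation)) chunk else annotate_fn(chunk, annotation)
  write_fn(result)
}

annotate_calls <- 0
written <- NULL
chunk <- data.frame(Run = "r1", Intensity = 10)

# Default call: no annotation argument at all.
clean_chunk_guarded(
  chunk,
  annotate_fn = function(x, a) {
    annotate_calls <<- annotate_calls + 1
    x
  },
  write_fn = function(x) written <<- x
)
```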


4-4: context() is deprecated in testthat 3rd edition.

You can simply remove this line; test_that descriptions are sufficient for grouping.


Labels

None yet

Projects

None yet

1 participant