Skip to content

Module/scdblfinder#261

Open
KurayiChawatama wants to merge 30 commits intonf-core:devfrom
KurayiChawatama:module/scdblfinder
Open

Module/scdblfinder#261
KurayiChawatama wants to merge 30 commits intonf-core:devfrom
KurayiChawatama:module/scdblfinder

Conversation

@KurayiChawatama
Copy link

@KurayiChawatama KurayiChawatama commented Mar 12, 2026

PR checklist

  • This comment contains a description of changes (with reason).

  • added an adaptation of my implementation of scDblFinder 's regular random sampling-based doublet detection method as a module to the pipeline. (If this works fine, we can add the clustering-dependent method too as soon as tomorrow)

  • It can be run using the settings --doublet_detection scdblfinder

  • It also takes a new optional parameter doublet_rate via a columnin the samplesheet with an expected float doublet rate between 0 and 1. If not provided, scDblFinder estimates it internally.

  • If you've fixed a bug or added code that should be tested, add tests!
    Super basic tests to run without errors were added to match the other R-based modules

  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
    Done!

  • If necessary, also make a PR on the nf-core/scdownstream branch on the nf-core/test-datasets repository.
    No need, works with the usual fitered testdataset that other doublet detection tools use

  • Make sure your code lints (nf-core pipelines lint).
    It does

  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
    **My module runs fine within the pipeline tests, quit after I saw it complete, retest needed on your end @nictru as you mentioned in the Slack thread **

  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
    **Didn't bother with this either as per the same thread in Slack by @nictru **

  • Usage Documentation in docs/usage.md is updated.

  • Output Documentation in docs/output.md is updated.

  • CHANGELOG.md is updated.

  • README.md is updated (including new tool citations and authors/contributors).

@KurayiChawatama KurayiChawatama requested review from Copilot and nictru and removed request for Copilot March 12, 2026 13:01
@KurayiChawatama
Copy link
Author

@nf-core-bot fix linting

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new doublet-detection option to the nf-core/scdownstream pipeline by integrating an scDblFinder-based method as a local module, exposing it via the existing --doublet_detection parameter and wiring it into the doublet detection subworkflow.

Changes:

  • Added a new local Nextflow module SCDBLFINDER (conda env, container, R implementation, nf-test coverage).
  • Integrated scdblfinder into the DOUBLET_DETECTION subworkflow and pipeline test configs.
  • Updated user-facing documentation and citations to include scDblFinder.

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
subworkflows/local/doublet_detection/main.nf Adds SCDBLFINDER execution path and mixes its outputs into predictions/versions channels.
nextflow_schema.json Extends doublet_detection allowed methods and help text to include scdblfinder.
modules/local/doublet_detection/scdblfinder/main.nf New Nextflow process definition for the scDblFinder module.
modules/local/doublet_detection/scdblfinder/templates/scdblfinder.R Implements scDblFinder execution, multiplet-rate estimation, and prediction export.
modules/local/doublet_detection/scdblfinder/environment.yml Defines conda dependencies for the new module.
modules/local/doublet_detection/scdblfinder/meta.yml Adds module metadata, I/O definitions, and tool citation details.
modules/local/doublet_detection/scdblfinder/tests/main.nf.test Adds nf-test coverage for real and stub runs.
modules/local/doublet_detection/scdblfinder/tests/main.nf.test.snap Adds test snapshots for the module outputs.
conf/modules.config Adds publishDir/prefix handling for SCDBLFINDER outputs.
conf/test.config Enables scdblfinder in the test profile’s doublet_detection methods.
conf/test_full.config Enables scdblfinder in the full test profile’s doublet_detection methods.
docs/output.md Documents scDblFinder as a supported doublet detection tool and output folder.
README.md Lists scDblFinder among supported doublet detection tools.
CITATIONS.md Adds scDblFinder citation entry.
CHANGELOG.md Notes addition of scDblFinder module.
ro-crate-metadata.json Updates embedded project description to include scDblFinder in the tool list.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

…der.R

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copy link
Collaborator

@nictru nictru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good in general. I did not look through the R code in detail yet (will do that in the evening)

@KurayiChawatama
Copy link
Author

@nictru want me to change all the doublet detection tools to use mix in a seperate PR then?

@nictru
Copy link
Collaborator

nictru commented Mar 12, 2026

You can also do it in this PR, I am not very strict about this

@nictru
Copy link
Collaborator

nictru commented Mar 13, 2026

Also please include it here

- Added optional `doublet_rate` column in input samplesheet for per-sample expected doublet rate in `scDblFinder`.
- Updated `scDblFinder` to utilize internal `dbr` estimation when `doublet_rate` is not provided.
- Modified input and output handling in `SCDBLFINDER` process to accommodate new `doublet_rate` parameter.
- Updated relevant documentation including CHANGELOG, README, and usage examples to reflect changes.
- Added tests to validate functionality with provided `doublet_rate`.
@KurayiChawatama
Copy link
Author

@nf-core-bot fix linting

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 19 out of 19 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Copy link
Collaborator

@nictru nictru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think things start to clear up. For you to better understand my rationale:

Generally all modules in the pipeline follow the same schema:

  1. Read input
  2. Apply function
  3. Write output

So that the pipeline modules are simple wrappers around the R/python functions.
Deviations from this schema (additional manipulations etc) are sometimes necessary but require clear justification.

If you look at how complicated the R script was in the beginning vs what it is now, this becomes apparent. Helps with keeping overview, finding bugs, and everything else

Comment on lines +33 to +37
sce <- scDblFinder(
assays(sce)[[1]],
BPPARAM = bp,
dbr = dbr
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found that scDblFinder has an optional samples argument, which is highly relevant here. You should pass the batch_col to it, which you can get via the ch_batch_col in the doublet detection subworkflow

Comment on lines +59 to +64
# Write the updated SingleCellExperiment directly as h5ad, explicitly mapping the
# primary assay to AnnData X so downstream readers see a valid matrix field.
primary_assay <- assayNames(sce)[1]
if (is.na(primary_assay) || primary_assay == "") {
stop("scDblFinder output is missing a primary assay; cannot write h5ad output.")
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is also unnecessary because the sce is created from the anndata in a way that is perfectly prepared for reversing the process

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants