nf-core · nictru · Apr 10, 2026 · Mar 12, 2026
@@ -14,9 +14,13 @@ Initial release of nf-core/scdownstream, created with the [nf-core](https://nf-c
 - Add `singleR` module for automated cell type annotation [[#200](https://github.com/nf-core/scdownstream/pull/200)]
 - Use topics for software versioning [[#252](https://github.com/nf-core/scdownstream/pull/252)]
 - Added `singleR` module for automated cell type annotation.
+- Added `scDblFinder` module for doublet detection.
+- Added an optional `doublet_rate` column to the input samplesheet to provide a per-sample expected doublet rate for `scDblFinder`.
 
 ### `Fixed`
 
+- Updated `scDblFinder` to use its internal `dbr` estimation when `doublet_rate` is not provided, and to use the provided `doublet_rate` when available.
+
 ### `Dependencies`
 
 ### `Deprecated`
@@ -47,6 +47,10 @@
 
   > Cannoodt R, Zappia L, Morgan M, Deconinck L (2025). anndataR: AnnData interoperability in R. R package version 0.99.0
 
+- [scDblFinder](https://pubmed.ncbi.nlm.nih.gov/35118618/)
+
+  > Germain P, Lun A, Garcia Meixide C, Macnair W, Robinson M. Doublet identification in single-cell sequencing data using scDblFinder. F1000Res. 2021;10:979. doi: 10.12688/f1000research.73600.2.
+
 ## Software packaging/containerisation tools
 
 - [Anaconda](https://anaconda.com)

@@ -49,6 +49,7 @@ Steps marked with the boat icon are not yet implemented. For the other steps, th
       - [Scrublet](https://scanpy.readthedocs.io/en/stable/api/generated/scanpy.pp.scrublet.html)
       - [DoubletDetection](https://doubletdetection.readthedocs.io/en/v2.5.2/doubletdetection.doubletdetection.html)
       - [SCDS](https://bioconductor.org/packages/devel/bioc/vignettes/scds/inst/doc/scds.html)
+      - [scDblFinder](https://bioconductor.org/packages/release/bioc/html/scDblFinder.html)
    7. Cell cycle scoring ([Tirosh et al. 2015](https://doi.org/10.1038/nature14590))
 2. Sample aggregation
    1. Merge into a single H5AD file

@@ -122,6 +122,13 @@
                 "errorMessage": "Number of cells expected from the experimental design, used as input to cellbender.",
                 "meta": ["expected_cells"]
             },
+            "doublet_rate": {
+                "type": "number",
+                "minimum": 0,
+                "maximum": 1,
+                "errorMessage": "doublet_rate must be a number between 0 and 1.",
+                "meta": ["doublet_rate"]
+            },
             "ambient_correction": {
                 "type": "boolean",
                 "default": true,

@@ -213,6 +213,16 @@ process {
         ]
     }
 
+    withName: SCDBLFINDER {
+        ext.prefix = { meta.id + '_scdblfinder' }
+        publishDir = [
+            path: { "${params.outdir}/quality_control/doublet_detection/scdblfinder" },
+            mode: params.publish_dir_mode,
+            enabled: params.save_intermediates,
+            saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
+        ]
+    }
+
     withName: DOUBLET_REMOVAL {
         publishDir = [
             path: { "${params.outdir}/quality_control/doublet_detection" },

@@ -25,7 +25,7 @@ params {
     // Input data
     input               = params.pipelines_testdata_base_path + 'samplesheet.csv'
     integration_methods = 'scvi,harmony,bbknn,combat'
-    doublet_detection   = 'solo,scrublet,scds'
+    doublet_detection   = 'solo,scrublet,scds,scdblfinder'
     celltypist_model    = 'Adult_Human_Skin'
     celldex_reference   = 'https://raw.githubusercontent.com/nf-core/test-datasets/scdownstream/singleR/references.csv'
     integration_hvgs    = 500

@@ -25,7 +25,7 @@ params {
     // Input data for full size test
     input               = params.pipelines_testdata_base_path + 'samplesheet.csv'
     integration_methods = 'scvi,harmony,bbknn,combat'
-    doublet_detection   = 'solo,scrublet,doubletdetection,scds'
+    doublet_detection   = 'solo,scrublet,doubletdetection,scds,scdblfinder'
     celltypist_model    = 'Adult_Human_Skin'
     celldex_reference   = 'https://raw.githubusercontent.com/nf-core/test-datasets/scdownstream/singleR/references.csv'
     integration_hvgs    = 500

@@ -25,6 +25,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
       - [Scrublet](https://scanpy.readthedocs.io/en/stable/api/generated/scanpy.pp.scrublet.html)
       - [DoubletDetection](https://doubletdetection.readthedocs.io/en/v2.5.2/doubletdetection.doubletdetection.html)
       - [SCDS](https://bioconductor.org/packages/devel/bioc/vignettes/scds/inst/doc/scds.html)
+      - [scDblFinder](https://bioconductor.org/packages/release/bioc/html/scDblFinder.html)
    7. Cell cycle scoring ([Tirosh et al. 2015](https://doi.org/10.1038/nature14590))
 2. Sample aggregation
    1. Merge into a single H5AD file
@@ -59,7 +60,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
   - `custom_thresholds/`: Results of applying user-defined QC thresholds.
   - `doublet_detection/`: Directories related to doublet detection.
     - `input_rds/`: RDS version of the H5AD file that is used as input to the doublet detection tools.
-    - `(doubletdetection|scds|scrublet|solo)/`: Results of doublet detection.
+    - `(doubletdetection|scdblfinder|scds|scrublet|solo)/`: Results of doublet detection.
       Each directory contains a filtered `h5ad`/`rds` and a `csv`/`pkl` file
       with the doublet annotations.
     - `${sample_id}.h5ad`: The H5AD without doublets.
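
The directory layout above can be sketched as a small helper that maps the comma-separated `doublet_detection` value to the per-method result directories. This is an illustration only: the function name `doublet_output_dirs` is hypothetical, and the `quality_control/doublet_detection/<method>` layout is taken from the `SCDBLFINDER` `publishDir` in this PR and assumed to hold for the other methods. The config in this PR publishes these files only when `params.save_intermediates` is enabled.

```python
from pathlib import PurePosixPath
from typing import List


def doublet_output_dirs(outdir: str, doublet_detection: str) -> List[str]:
    """Return one result directory per requested doublet-detection method.

    Assumption: every method publishes to
    <outdir>/quality_control/doublet_detection/<method>, as the SCDBLFINDER
    publishDir in this PR does.
    """
    # Split the comma-separated parameter and drop empty entries.
    methods = [m.strip() for m in doublet_detection.split(",") if m.strip()]
    base = PurePosixPath(outdir) / "quality_control" / "doublet_detection"
    return [str(base / method) for method in methods]


if __name__ == "__main__":
    # Value used by the updated test profile in this PR.
    for path in doublet_output_dirs("results", "solo,scrublet,scds,scdblfinder"):
        print(path)
```

With the test profile's value, the last entry is the new `scdblfinder` directory.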

@@ -87,10 +87,11 @@ The **Test strategy (this branch)** column describes what the tests on this bran
 
 ### `doublet_detection/`
 
-| Module                              | Description                                                                                                         | Reproducibility                                                                                                          | Test strategy (this branch)                                              |
-| ----------------------------------- | ------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------ | ------------------------------------------------------------------------ |
-| `doublet_detection/doublet_removal` | Removes doublet cells from an AnnData object based on a threshold applied to aggregated doublet-caller predictions. | Fully deterministic                                                                                                      | hash                                                                     |
-| `doublet_detection/scds`            | Scores and calls doublets using bcds + cxds + hybrid (scds R package) with `set.seed(0)`.                           | Seeded / quasi-deterministic — seed is fixed, but internal boosting results may vary slightly across R/package versions. | column names — `versions` YAML + obs column names; no `process.out` MD5s |
+| Module                              | Description                                                                                                                   | Reproducibility                                                                                                                | Test strategy (this branch)                                              |
+| ----------------------------------- | ----------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------ | ------------------------------------------------------------------------ |
+| `doublet_detection/doublet_removal` | Removes doublet cells from an AnnData object based on a threshold applied to aggregated doublet-caller predictions.           | Fully deterministic                                                                                                            | hash                                                                     |
+| `doublet_detection/scdblfinder`     | Scores and calls doublets with scDblFinder (`set.seed(123)` and BiocParallel `RNGseed=123`) and exports per-cell predictions. | Seeded / quasi-deterministic — seeds are fixed, but iterative model fitting can still vary slightly across R/package versions. | hash                                                                     |
+| `doublet_detection/scds`            | Scores and calls doublets using bcds + cxds + hybrid (scds R package) with `set.seed(0)`.                                     | Seeded / quasi-deterministic — seed is fixed, but internal boosting results may vary slightly across R/package versions.       | column names — `versions` YAML + obs column names; no `process.out` MD5s |
 
 ### `hugounifier/`
 
@@ -163,7 +164,7 @@ The **Test strategy (this branch)** column describes what the tests on this bran
 | `cluster`                 | Full clustering pipeline: neighbours → UMAP → Leiden at multiple resolutions → Shannon entropy.                                                                                             | Seeded / quasi-deterministic for UMAP; **non-deterministic** due to unseeded Leiden.                                                            | structural — **`workflow.out.versions` only** (each as YAML); graph / embedding presence asserted in code outside `snapshot`.                                                                                  |
 | `combine`                 | Merges all samples and runs all configured integration methods.                                                                                                                             | Inherits from constituent modules — ranges from fully deterministic (no integration) to seeded/quasi-deterministic (scVI, Harmony, Seurat).     | structural — **`workflow.out.versions` (YAML) + `adata.yaml`** on merged H5AD.                                                                                                                                 |
 | `differential_expression` | Runs rank-genes-groups DE analysis across all combinations of clustering labels, conditions, and cell-type subsets.                                                                         | Fully deterministic for the default wilcoxon/t-test methods.                                                                                    | structural — **`workflow.out.versions` only** (YAML); DE / MultiQC presence asserted outside `snapshot` where needed.                                                                                          |
-| `doublet_detection`       | Runs one or more doublet-detection methods (scds, solo, scrublet, doubletdetection) and removes called doublets.                                                                            | **Non-deterministic** — solo, scrublet, and doubletdetection have stochastic components; scds is seeded.                                        | structural + **range assertion** on **`n_obs`**; snapshot uses **`versions` (YAML) + `adata.yaml`**.                                                                                                           |
+| `doublet_detection`       | Runs one or more doublet-detection methods (scds, scdblfinder, solo, scrublet, doubletdetection) and removes called doublets.                                                               | **Non-deterministic** — solo, scrublet, and doubletdetection have stochastic components; scds and scdblfinder are seeded.                       | structural + **range assertion** on **`n_obs`**; snapshot uses **`versions` (YAML) + `adata.yaml`**.                                                                                                           |
 | `finalize`                | Assembles the final AnnData by extending it with all collected obs/obsm/uns/layers outputs.                                                                                                 | Fully deterministic                                                                                                                             | hash — **`workflow.out.h5ad` + `workflow.out.versions` (YAML) + `adata.yaml`** — not a bare `snapshot(workflow.out)` in non-stub tests.                                                                        |
 | `integrate`               | Applies HVG selection then one or more integration methods (scVI, scANVI, Harmony, BBKNN, ComBat, Seurat, SCimilarity).                                                                     | Seeded / quasi-deterministic for scVI/scANVI/ComBat/Seurat/BBKNN; **non-deterministic** for Harmony.                                            | structural — **`workflow.out.versions` (YAML) + `adata.yaml`** on integration H5AD (e.g. Harmony / BBKNN / ComBat tests).                                                                                      |
 | `load_h5ad`               | Loads input files in H5AD, 10x H5, RDS, or CSV format and converts all to AnnData H5AD.                                                                                                     | Fully deterministic                                                                                                                             | hash — **`snapshot(workflow.out)` only** (passthrough-safe; avoids `anndata().yaml` on unstaged inputs per nf-test rules).                                                                                     |

@@ -38,10 +38,10 @@ sample3,/absolute/path/to/sample3.csv
 There are a couple of optional columns that can be used for more advanced features:
 
 ```csv title="samplesheet.csv"
-sample,filtered,unfiltered,batch_col,label_col,condition_col,unknown_label,min_genes,min_cells,min_counts_cell,min_counts_gene,expected_cells,ambient_correction,ambient_corrected_integration
-sample1,/absolute/path/to/sample1_filtered.h5ad,/absolute/path/to/sample1.h5ad,batch,cell_type,condition,unknown,1,2,3,4,5000,true,false
-sample2,relative/path/to/sample2_filtered.rds,relative/path/to/sample2.rds,batch_id,annotation,condition,unannotated,5,6,7,8,3000,false,
-sample3,/absolute/path/to/sample3_filtered.csv,/absolute/path/to/sample3.csv,,,,,9,10,11,12,,true,true
+sample,filtered,unfiltered,batch_col,label_col,condition_col,unknown_label,min_genes,min_cells,min_counts_cell,min_counts_gene,expected_cells,doublet_rate,ambient_correction,ambient_corrected_integration
+sample1,/absolute/path/to/sample1_filtered.h5ad,/absolute/path/to/sample1.h5ad,batch,cell_type,condition,unknown,1,2,3,4,5000,0.08,true,false
+sample2,relative/path/to/sample2_filtered.rds,relative/path/to/sample2.rds,batch_id,annotation,condition,unannotated,5,6,7,8,3000,,false,
+sample3,/absolute/path/to/sample3_filtered.csv,/absolute/path/to/sample3.csv,,,,,9,10,11,12,,,true,true
 ```
 
 For CSV input files, specifying the `batch_col`, `label_col`, `condition_col`, and `unknown_label` columns will not have any effect, as no additional metadata is available in the CSV file.
@@ -63,6 +63,7 @@ For CSV input files, specifying the `batch_col`, `label_col`, `condition_col`, a
 | `min_counts_cell`               | Minimum number of counts required for a cell to be considered. Defaults to `1`.                                                                                                                                                                                                                                                                                                                                     |
 | `min_counts_gene`               | Minimum number of counts required for a gene to be considered. Defaults to `1`.                                                                                                                                                                                                                                                                                                                                     |
 | `expected_cells`                | Number of expected cells, used as input to CellBender for empty droplet detection.                                                                                                                                                                                                                                                                                                                                  |
+| `doublet_rate`                  | Optional expected doublet rate (0-1) for `scDblFinder`. If not provided, `scDblFinder` estimates it internally.                                                                                                                                                                                                                                                                                                     |
 | `max_mito_percentage`           | Maximum percentage of mitochondrial reads for a cell to be considered. Defaults to `100`.                                                                                                                                                                                                                                                                                                                           |
 | `ambient_correction`            | Whether to perform ambient RNA correction for this sample. Set to `true` to use the globally configured method, `false` to skip ambient correction for this sample. Defaults to `true`.                                                                                                                                                                                                                             |
 | `ambient_corrected_integration` | Whether to use ambient-corrected counts for integration for this sample. Set to `true` to use corrected counts in downstream integration, `false` to store them only as additional layers. Can override the global `--ambient_corrected_integration` parameter. Defaults to global setting.                                                                                                                         |
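
As a rough illustration of the `doublet_rate` semantics described above, the following sketch reads a reduced samplesheet and reports, per sample, whether a rate was provided or is left for `scDblFinder` to estimate. It is not the pipeline's implementation: the helper names `parse_doublet_rate` and `doublet_rates` and the reduced column set are hypothetical, and treating an empty cell as "not provided" is an assumption. The range check and error message mirror the schema entry added in this PR.

```python
import csv
import io
from typing import Dict, Optional

# Reduced, hypothetical samplesheet: only the columns needed for this sketch.
SAMPLESHEET = """\
sample,unfiltered,expected_cells,doublet_rate
sample1,/absolute/path/to/sample1.h5ad,5000,0.08
sample2,relative/path/to/sample2.rds,3000,
sample3,/absolute/path/to/sample3.csv,,
"""


def parse_doublet_rate(raw: Optional[str]) -> Optional[float]:
    """Return the rate as a float, or None when the cell is empty."""
    if raw is None or raw.strip() == "":
        return None  # no rate given: scDblFinder estimates it internally
    try:
        value = float(raw)
    except ValueError:
        value = float("nan")  # non-numeric input fails the range check below
    if not 0 <= value <= 1:  # also rejects NaN
        raise ValueError("doublet_rate must be a number between 0 and 1.")
    return value


def doublet_rates(samplesheet_text: str) -> Dict[str, Optional[float]]:
    """Map each sample ID to its provided doublet rate (or None)."""
    reader = csv.DictReader(io.StringIO(samplesheet_text))
    return {
        row["sample"]: parse_doublet_rate(row.get("doublet_rate"))
        for row in reader
    }


if __name__ == "__main__":
    for sample, rate in doublet_rates(SAMPLESHEET).items():
        source = "provided" if rate is not None else "estimated internally"
        print(sample, source, rate)
```

Only `sample1` carries an explicit rate here; `sample2` and `sample3` leave the cell empty and would fall back to the internal estimate.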

@@ -0,0 +1,11 @@
+name: scdblfinder
+channels:
+  - conda-forge
+  - bioconda
+dependencies:
+  - bioconda::bioconductor-scdblfinder=1.24.0
+  - bioconda::bioconductor-singlecellexperiment=1.32.0
+  - bioconda::bioconductor-biocparallel=1.44.0
+  - bioconda::bioconductor-anndatar=1.0.2
+  - bioconda::bioconductor-rhdf5=2.54.1
+  - conda-forge::r-tidyverse=2.0.0
@@ -0,0 +1,32 @@
+process SCDBLFINDER {
+    tag "$meta.id"
+    label 'process_medium'
+
+    conda "${moduleDir}/environment.yml"
+    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
+        'https://community-cr-prod.seqera.io/docker/registry/v2/blobs/sha256/99/993a012a69d920412b090701eb733ccf35c8655c3d012756ca6b0af1cfcd4780/data' :
+        'community.wave.seqera.io/library/bioconductor-anndatar_bioconductor-biocparallel_bioconductor-rhdf5_bioconductor-scdblfinder_pruned:0f9db6b0855861de' }"
+
+    input:
+    tuple val(meta), path(h5ad), val(dbr), val(batch_col)
+
+    output:
+    tuple val(meta), path("${prefix}.h5ad"), emit: h5ad
+    tuple val(meta), path("${prefix}.csv"), emit: predictions
+    path "versions.yml", emit: versions, topic: versions
+
+    when:
+    task.ext.when == null || task.ext.when
+
+    script:
+    prefix = task.ext.prefix ?: "${meta.id}"
+    template('scdblfinder.R')
+
+    stub:
+    prefix = task.ext.prefix ?: "${meta.id}"
+    """
+    touch ${prefix}.h5ad
+    touch ${prefix}.csv
+    touch versions.yml
+    """
+}