Merged
51 commits
08a8e8c
Add scdblfinder module skeleton generated by nf-core tools
KurayiChawatama Mar 12, 2026
15033b3
Fix scdblfinder: remove mockDoubletSCE and use real SCE object directly
KurayiChawatama Mar 12, 2026
01fa004
Integrate scdblfinder into pipeline configuration and tests
KurayiChawatama Mar 12, 2026
4e205b7
Fix scdblfinder module implementation and tests
KurayiChawatama Mar 12, 2026
ec61a68
Update documentation to include scDblFinder
KurayiChawatama Mar 12, 2026
59af52f
added more documentation for scdblfinder
KurayiChawatama Mar 12, 2026
6f5943c
added scdblfinder citation to citations md
KurayiChawatama Mar 12, 2026
d7ff7b3
removed template comment from meta yml
KurayiChawatama Mar 12, 2026
a2d91e7
moved scdblfinder module to doublet detection directory
KurayiChawatama Mar 12, 2026
177737c
updated docs output to include scdblfinder
KurayiChawatama Mar 12, 2026
90a5a33
[automated] Fix code linting
nf-core-bot Mar 12, 2026
d7ee2e7
Update modules/local/doublet_detection/scdblfinder/templates/scdblfin…
KurayiChawatama Mar 12, 2026
efcb9ed
added https version of the singularity container link
KurayiChawatama Mar 13, 2026
25147f7
refactor(scDblFinder): optimize multiplet rate calculation using find…
KurayiChawatama Mar 13, 2026
188b452
added explanation for column name change
KurayiChawatama Mar 13, 2026
c69f87d
write updated SingleCellExperiment directly as h5ad without explicit …
KurayiChawatama Mar 13, 2026
bea96c1
enhance h5ad writing with validation for cell barcodes and primary assay
KurayiChawatama Mar 13, 2026
9c845b2
add scdblfinder to input methods in doublet detection subworkflow test
KurayiChawatama Mar 13, 2026
63909fa
streamline renaming of scDblFinder columns with less clumsy code
KurayiChawatama Mar 13, 2026
e344b63
removed explicit call of artificial doublet number in scdblfinder func…
KurayiChawatama Mar 13, 2026
5f7ed16
updated test snapshot to match previous commit
KurayiChawatama Mar 13, 2026
3bdba80
Enhance scDblFinder functionality and documentation
KurayiChawatama Mar 13, 2026
61b9c86
[automated] Fix code linting
nf-core-bot Mar 13, 2026
dc6e70d
change other doublet detection methods to use mix
KurayiChawatama Mar 13, 2026
e2f6068
Remove redundant restoration of original cell barcodes in scDblFinder…
KurayiChawatama Mar 13, 2026
64e3250
Remove unnecessary comment about RNG seed parameter in scDblFinder sc…
KurayiChawatama Mar 13, 2026
b10f0c3
Refactor doublet rate handling in scDblFinder to streamline logic and…
KurayiChawatama Mar 13, 2026
2eedc34
Fix regex pattern for doublet detection tool options in nextflow_sche…
KurayiChawatama Mar 13, 2026
ef60cc2
added initial batch processing steps code
KurayiChawatama Mar 17, 2026
cb083f5
add comments for clarity, initialize samples variable to prevent crashes
KurayiChawatama Mar 19, 2026
c03068f
Refactor sample handling in scDblFinder to prevent object not found e…
KurayiChawatama Mar 19, 2026
c97fa29
Simplify h5ad writing process by removing unnecessary primary assay m…
KurayiChawatama Mar 19, 2026
a5b2233
Update process label in SCDBLFINDER from 'process_low' to 'process_me…
KurayiChawatama Mar 19, 2026
a135539
Remove saving of original cell names in scDblFinder script to streaml…
KurayiChawatama Mar 19, 2026
e7d438f
Merge remote-tracking branch 'upstream/dev' into module/scdblfinder
KurayiChawatama Apr 6, 2026
31a0492
Merge remote-tracking branch 'origin/dev' into module/scdblfinder
nictru Apr 9, 2026
c54bfc8
Fix ro-crate
nictru Apr 9, 2026
b6f6b7e
Update test snapshots
nictru Apr 10, 2026
f3e8c17
Align missing batch_col handling with other doublet detection modules
nictru Apr 10, 2026
cb22682
Use topics
nictru Apr 10, 2026
44b8045
Align test structure with new repo-wide testing approach
nictru Apr 10, 2026
eec602f
Add explicit method check to doublet detection subworkflow
nictru Apr 10, 2026
ef5d2da
Change doublet detection threshold type comment to integer
nictru Apr 10, 2026
38660b4
Remove doublet_rate note from readme to align with other samplesheet …
nictru Apr 10, 2026
4b40b5a
Revert "Add explicit method check to doublet detection subworkflow"
nictru Apr 10, 2026
418bb50
Update test inputs in quality control workflow to replace 'none' with…
nictru Apr 10, 2026
4ebfc87
Fix problematic QC test assertions
nictru Apr 10, 2026
5231177
Update QC subworkflow test snapshots
nictru Apr 10, 2026
ef558ca
Move scimilarity existence check location to prevent CI failures
nictru Apr 10, 2026
8ccffc5
Remove scimilarity model existence check from nextflow_schema.json
nictru Apr 10, 2026
1642a05
Fix scimilarity subworkflow tests
nictru Apr 10, 2026
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -14,9 +14,13 @@ Initial release of nf-core/scdownstream, created with the [nf-core](https://nf-c
- Add `singleR` module for automated cell type annotation [[#200](https://github.com/nf-core/scdownstream/pull/200)]
- Use topics for software versioning [[#252](https://github.com/nf-core/scdownstream/pull/252)]
- Added `singleR` module for automated cell type annotation.
- Added `scDblFinder` module for doublet detection.
- Added optional `doublet_rate` column in input samplesheet to provide per-sample expected doublet rate for `scDblFinder`.

### `Fixed`

- Updated `scDblFinder` to use internal `dbr` estimation when `doublet_rate` is not provided, and to use provided `doublet_rate` when available.

### `Dependencies`

### `Deprecated`
4 changes: 4 additions & 0 deletions CITATIONS.md
@@ -47,6 +47,10 @@

> Cannoodt R, Zappia L, Morgan M, Deconinck L (2025). anndataR: AnnData interoperability in R. R package version 0.99.0

- [scDblFinder](https://pubmed.ncbi.nlm.nih.gov/35118618/)

> Germain P, Lun A, Garcia Meixide C, Macnair W, Robinson M. Doublet identification in single-cell sequencing data using scDblFinder. F1000Res. 2022;11:979. doi: 10.12688/f1000research.73600.2.

## Software packaging/containerisation tools

- [Anaconda](https://anaconda.com)
1 change: 1 addition & 0 deletions README.md
@@ -49,6 +49,7 @@ Steps marked with the boat icon are not yet implemented. For the other steps, th
- [Scrublet](https://scanpy.readthedocs.io/en/stable/api/generated/scanpy.pp.scrublet.html)
- [DoubletDetection](https://doubletdetection.readthedocs.io/en/v2.5.2/doubletdetection.doubletdetection.html)
- [SCDS](https://bioconductor.org/packages/devel/bioc/vignettes/scds/inst/doc/scds.html)
- [scDblFinder](https://bioconductor.org/packages/release/bioc/html/scDblFinder.html)
7. Cell cycle scoring ([Tirosh et al. 2015](https://doi.org/10.1038/nature14590))
2. Sample aggregation
1. Merge into a single H5AD file
7 changes: 7 additions & 0 deletions assets/schema_input.json
@@ -122,6 +122,13 @@
"errorMessage": "Number of cells expected from the experimental design, used as input to cellbender.",
"meta": ["expected_cells"]
},
"doublet_rate": {
"type": "number",
"minimum": 0,
"maximum": 1,
"errorMessage": "doublet_rate must be a number between 0 and 1.",
"meta": ["doublet_rate"]
},
"ambient_correction": {
"type": "boolean",
"default": true,
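The `doublet_rate` schema entry above constrains the value to a number in the closed interval [0, 1]. As a rough illustration of what that constraint accepts and rejects, here is a minimal Python sketch. The function name `parse_doublet_rate` is hypothetical and not part of the pipeline; the real validation is done by the schema itself, and treating a blank cell as "not provided" is an assumption based on the usage docs.

```python
from typing import Optional

# Error text copied from the schema's "errorMessage" field.
ERROR_MESSAGE = "doublet_rate must be a number between 0 and 1."


def parse_doublet_rate(raw: Optional[str]) -> Optional[float]:
    """Hypothetical helper: mimic the schema's number/minimum/maximum checks."""
    if raw is None or raw.strip() == "":
        # Assumption: an omitted or blank cell means no rate was supplied.
        return None
    try:
        value = float(raw)
    except ValueError:
        raise ValueError(ERROR_MESSAGE) from None
    # JSON Schema "minimum" and "maximum" are inclusive bounds.
    if not 0 <= value <= 1:
        raise ValueError(ERROR_MESSAGE)
    return value
```

Under this sketch, `parse_doublet_rate("0.08")` yields `0.08`, a blank cell yields `None`, and values such as `"1.5"` or `"abc"` raise `ValueError`.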
10 changes: 10 additions & 0 deletions conf/modules.config
@@ -213,6 +213,16 @@ process {
]
}

withName: SCDBLFINDER {
ext.prefix = { meta.id + '_scdblfinder' }
publishDir = [
path: { "${params.outdir}/quality_control/doublet_detection/scdblfinder" },
mode: params.publish_dir_mode,
enabled: params.save_intermediates,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
]
}

withName: DOUBLET_REMOVAL {
publishDir = [
path: { "${params.outdir}/quality_control/doublet_detection" },
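The `SCDBLFINDER` block in `conf/modules.config` does two things: it appends `_scdblfinder` to the sample ID to form the output prefix, and its `saveAs` closure drops `versions.yml` from the published files. The Python sketch below only mirrors that logic for illustration. The function names are hypothetical, and publishing additionally depends on `params.save_intermediates`, which is not modelled here.

```python
from typing import List, Optional


def output_prefix(sample_id: str) -> str:
    """Mirror of `ext.prefix = { meta.id + '_scdblfinder' }`."""
    return sample_id + "_scdblfinder"


def save_as(filename: str) -> Optional[str]:
    """Mirror of the `saveAs` closure: returning None skips publishing."""
    return None if filename == "versions.yml" else filename


def published_files(sample_id: str) -> List[str]:
    prefix = output_prefix(sample_id)
    # Files the process declares as outputs in main.nf.
    produced = [prefix + ".h5ad", prefix + ".csv", "versions.yml"]
    return [name for name in map(save_as, produced) if name is not None]


print(published_files("sample1"))
# prints ['sample1_scdblfinder.h5ad', 'sample1_scdblfinder.csv']
```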
2 changes: 1 addition & 1 deletion conf/test.config
@@ -25,7 +25,7 @@ params {
// Input data
input = params.pipelines_testdata_base_path + 'samplesheet.csv'
integration_methods = 'scvi,harmony,bbknn,combat'
- doublet_detection = 'solo,scrublet,scds'
+ doublet_detection = 'solo,scrublet,scds,scdblfinder'
celltypist_model = 'Adult_Human_Skin'
celldex_reference = 'https://raw.githubusercontent.com/nf-core/test-datasets/scdownstream/singleR/references.csv'
integration_hvgs = 500
2 changes: 1 addition & 1 deletion conf/test_full.config
@@ -25,7 +25,7 @@ params {
// Input data for full size test
input = params.pipelines_testdata_base_path + 'samplesheet.csv'
integration_methods = 'scvi,harmony,bbknn,combat'
- doublet_detection = 'solo,scrublet,doubletdetection,scds'
+ doublet_detection = 'solo,scrublet,doubletdetection,scds,scdblfinder'
celltypist_model = 'Adult_Human_Skin'
celldex_reference = 'https://raw.githubusercontent.com/nf-core/test-datasets/scdownstream/singleR/references.csv'
integration_hvgs = 500
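Both test configs pass `doublet_detection` as a comma-separated string of method names. The following Python sketch shows one way such a string could be split and checked against the methods listed in the README. It is an assumption-laden illustration: `KNOWN_METHODS` and `parse_methods` are hypothetical names, and the pipeline's actual check is a regex pattern in `nextflow_schema.json` that is not shown in this diff.

```python
from typing import List

# Assumption: the method names documented in the README after this PR.
KNOWN_METHODS = {"solo", "scrublet", "doubletdetection", "scds", "scdblfinder"}


def parse_methods(value: str) -> List[str]:
    """Split a comma-separated method string and reject unknown names."""
    methods = [item.strip() for item in value.split(",") if item.strip()]
    unknown = [name for name in methods if name not in KNOWN_METHODS]
    if unknown:
        raise ValueError("Unknown doublet detection method(s): " + ", ".join(unknown))
    return methods


print(parse_methods("solo,scrublet,scds,scdblfinder"))
# prints ['solo', 'scrublet', 'scds', 'scdblfinder']
```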
3 changes: 2 additions & 1 deletion docs/output.md
@@ -25,6 +25,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
- [Scrublet](https://scanpy.readthedocs.io/en/stable/api/generated/scanpy.pp.scrublet.html)
- [DoubletDetection](https://doubletdetection.readthedocs.io/en/v2.5.2/doubletdetection.doubletdetection.html)
- [SCDS](https://bioconductor.org/packages/devel/bioc/vignettes/scds/inst/doc/scds.html)
- [scDblFinder](https://bioconductor.org/packages/release/bioc/html/scDblFinder.html)
7. Cell cycle scoring ([Tirosh et al. 2015](https://doi.org/10.1038/nature14590))
2. Sample aggregation
1. Merge into a single H5AD file
@@ -59,7 +60,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
- `custom_thresholds/`: Results of applying user-defined QC thresholds.
- `doublet_detection/`: Directories related to doublet detection.
- `input_rds/`: RDS version of the H5AD file that is used as input to the doublet detection tools.
- - `(doubletdetection|scds|scrublet|solo)/`: Results of doublet detection.
+ - `(doubletdetection|scdblfinder|scds|scrublet|solo)/`: Results of doublet detection.
Each directory contains a filtered `h5ad`/`rds` and a `csv`/`pkl` file
with the doublet annotations.
- `${sample_id}.h5ad`: The H5AD without doublets.
11 changes: 6 additions & 5 deletions docs/reproducibility.md
@@ -87,10 +87,11 @@ The **Test strategy (this branch)** column describes what the tests on this bran

### `doublet_detection/`

- | Module | Description | Reproducibility | Test strategy (this branch) |
- | ----------------------------------- | ------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------ | ------------------------------------------------------------------------ |
- | `doublet_detection/doublet_removal` | Removes doublet cells from an AnnData object based on a threshold applied to aggregated doublet-caller predictions. | Fully deterministic | hash |
- | `doublet_detection/scds` | Scores and calls doublets using bcds + cxds + hybrid (scds R package) with `set.seed(0)`. | Seeded / quasi-deterministic — seed is fixed, but internal boosting results may vary slightly across R/package versions. | column names — `versions` YAML + obs column names; no `process.out` MD5s |
+ | Module | Description | Reproducibility | Test strategy (this branch) |
+ | ----------------------------------- | ----------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------ | ------------------------------------------------------------------------ |
+ | `doublet_detection/doublet_removal` | Removes doublet cells from an AnnData object based on a threshold applied to aggregated doublet-caller predictions. | Fully deterministic | hash |
+ | `doublet_detection/scdblfinder` | Scores and calls doublets with scDblFinder (`set.seed(123)` and BiocParallel `RNGseed=123`) and exports per-cell predictions. | Seeded / quasi-deterministic — seeds are fixed, but iterative model fitting can still vary slightly across R/package versions. | hash |
+ | `doublet_detection/scds` | Scores and calls doublets using bcds + cxds + hybrid (scds R package) with `set.seed(0)`. | Seeded / quasi-deterministic — seed is fixed, but internal boosting results may vary slightly across R/package versions. | column names — `versions` YAML + obs column names; no `process.out` MD5s |
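The `doublet_removal` row describes a threshold applied to aggregated doublet-caller predictions. The diff does not show how the aggregation is implemented, so the Python sketch below assumes a simple vote count: a cell is removed when at least `threshold` methods call it a doublet. The function name `cells_to_keep` and the example calls are hypothetical.

```python
from typing import Dict, List


def cells_to_keep(predictions: Dict[str, List[bool]], threshold: int) -> List[bool]:
    """Return a per-cell keep mask.

    `predictions` maps a method name to its per-cell calls (True = doublet).
    Assumption: a cell is dropped when `threshold` or more methods agree.
    """
    n_cells = len(next(iter(predictions.values())))
    keep = []
    for i in range(n_cells):
        votes = sum(calls[i] for calls in predictions.values())
        keep.append(votes < threshold)
    return keep


calls = {
    "scds":        [True, False, False, True],
    "scrublet":    [True, False, True,  False],
    "scdblfinder": [True, False, False, False],
}
print(cells_to_keep(calls, threshold=2))
# prints [False, True, True, True]
```

Given fixed inputs, a vote count like this always produces the same mask, which fits the "Fully deterministic" label in the table.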

### `hugounifier/`

@@ -163,7 +164,7 @@ The **Test strategy (this branch)** column describes what the tests on this bran
| `cluster` | Full clustering pipeline: neighbours → UMAP → Leiden at multiple resolutions → Shannon entropy. | Seeded / quasi-deterministic for UMAP; **non-deterministic** due to unseeded Leiden. | structural — **`workflow.out.versions` only** (each as YAML); graph / embedding presence asserted in code outside `snapshot`. |
| `combine` | Merges all samples and runs all configured integration methods. | Inherits from constituent modules — ranges from fully deterministic (no integration) to seeded/quasi-deterministic (scVI, Harmony, Seurat). | structural — **`workflow.out.versions` (YAML) + `adata.yaml`** on merged H5AD. |
| `differential_expression` | Runs rank-genes-groups DE analysis across all combinations of clustering labels, conditions, and cell-type subsets. | Fully deterministic for the default wilcoxon/t-test methods. | structural — **`workflow.out.versions` only** (YAML); DE / MultiQC presence asserted outside `snapshot` where needed. |
- | `doublet_detection` | Runs one or more doublet-detection methods (scds, solo, scrublet, doubletdetection) and removes called doublets. | **Non-deterministic** — solo, scrublet, and doubletdetection have stochastic components; scds is seeded. | structural + **range assertion** on **`n_obs`**; snapshot uses **`versions` (YAML) + `adata.yaml`**. |
+ | `doublet_detection` | Runs one or more doublet-detection methods (scds, scdblfinder, solo, scrublet, doubletdetection) and removes called doublets. | **Non-deterministic** — solo, scrublet, and doubletdetection have stochastic components; scds and scdblfinder are seeded. | structural + **range assertion** on **`n_obs`**; snapshot uses **`versions` (YAML) + `adata.yaml`**. |
| `finalize` | Assembles the final AnnData by extending it with all collected obs/obsm/uns/layers outputs. | Fully deterministic | hash — **`workflow.out.h5ad` + `workflow.out.versions` (YAML) + `adata.yaml`** — not a bare `snapshot(workflow.out)` in non-stub tests. |
| `integrate` | Applies HVG selection then one or more integration methods (scVI, scANVI, Harmony, BBKNN, ComBat, Seurat, SCimilarity). | Seeded / quasi-deterministic for scVI/scANVI/ComBat/Seurat/BBKNN; **non-deterministic** for Harmony. | structural — **`workflow.out.versions` (YAML) + `adata.yaml`** on integration H5AD (e.g. Harmony / BBKNN / ComBat tests). |
| `load_h5ad` | Loads input files in H5AD, 10x H5, RDS, or CSV format and converts all to AnnData H5AD. | Fully deterministic | hash — **`snapshot(workflow.out)` only** (passthrough-safe; avoids `anndata().yaml` on unstaged inputs per nf-test rules). |
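The table distinguishes seeded steps, which can be compared exactly, from non-deterministic ones, which are tested with a range assertion on `n_obs`. The Python sketch below illustrates that distinction with a toy simulation. It does not reproduce any of the actual doublet callers or nf-test assertions; `simulated_n_obs`, `assert_in_range`, and the chosen bounds are all hypothetical.

```python
import random
from typing import Optional


def simulated_n_obs(n_cells: int, doublet_rate: float, seed: Optional[int] = None) -> int:
    """Toy stand-in for a stochastic caller: count cells that survive removal."""
    rng = random.Random(seed)
    return sum(rng.random() >= doublet_rate for _ in range(n_cells))


def assert_in_range(value: int, lower: int, upper: int) -> None:
    """Range assertion, suitable when exact values differ between runs."""
    assert lower <= value <= upper, f"{value} not in [{lower}, {upper}]"


# Seeded runs are repeatable, so an exact comparison is meaningful.
assert simulated_n_obs(1000, 0.08, seed=123) == simulated_n_obs(1000, 0.08, seed=123)

# Unseeded runs vary, so only a plausible range is checked.
assert_in_range(simulated_n_obs(1000, 0.08), 800, 1000)
```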
9 changes: 5 additions & 4 deletions docs/usage.md
@@ -38,10 +38,10 @@ sample3,/absolute/path/to/sample3.csv
There are a couple of optional columns that can be used for more advanced features:

```csv title="samplesheet.csv"
- sample,filtered,unfiltered,batch_col,label_col,condition_col,unknown_label,min_genes,min_cells,min_counts_cell,min_counts_gene,expected_cells,ambient_correction,ambient_corrected_integration
- sample1,/absolute/path/to/sample1_filtered.h5ad,/absolute/path/to/sample1.h5ad,batch,cell_type,condition,unknown,1,2,3,4,5000,true,false
- sample2,relative/path/to/sample2_filtered.rds,relative/path/to/sample2.rds,batch_id,annotation,condition,unannotated,5,6,7,8,3000,false,
- sample3,/absolute/path/to/sample3_filtered.csv,/absolute/path/to/sample3.csv,,,,,9,10,11,12,,true,true
+ sample,filtered,unfiltered,batch_col,label_col,condition_col,unknown_label,min_genes,min_cells,min_counts_cell,min_counts_gene,expected_cells,doublet_rate,ambient_correction,ambient_corrected_integration
+ sample1,/absolute/path/to/sample1_filtered.h5ad,/absolute/path/to/sample1.h5ad,batch,cell_type,condition,unknown,1,2,3,4,5000,0.08,true,false
+ sample2,relative/path/to/sample2_filtered.rds,relative/path/to/sample2.rds,batch_id,annotation,condition,unannotated,5,6,7,8,3000,,false,
+ sample3,/absolute/path/to/sample3_filtered.csv,/absolute/path/to/sample3.csv,,,,,9,10,11,12,,,true,true
```

For CSV input files, specifying the `batch_col`, `label_col`, `condition_col`, and `unknown_label` columns will not have any effect, as no additional metadata is available in the CSV file.
@@ -63,6 +63,7 @@ For CSV input files, specifying the `batch_col`, `label_col`, `condition_col`, a
| `min_counts_cell` | Minimum number of counts required for a cell to be considered. Defaults to `1`. |
| `min_counts_gene` | Minimum number of counts required for a gene to be considered. Defaults to `1`. |
| `expected_cells` | Number of expected cells, used as input to CellBender for empty droplet detection. |
| `doublet_rate` | Optional expected doublet rate (0-1) for `scDblFinder`. If not provided, `scDblFinder` estimates it internally. |
| `max_mito_percentage` | Maximum percentage of mitochondrial reads for a cell to be considered. Defaults to `100`. |
| `ambient_correction` | Whether to perform ambient RNA correction for this sample. Set to `true` to use the globally configured method, `false` to skip ambient correction for this sample. Defaults to `true`. |
| `ambient_corrected_integration` | Whether to use ambient-corrected counts for integration for this sample. Set to `true` to use corrected counts in downstream integration, `false` to store them only as additional layers. Can override the global `--ambient_corrected_integration` parameter. Defaults to global setting. |
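Per the updated usage docs, `doublet_rate` is an optional per-sample column, and a blank cell leaves the rate to be estimated by `scDblFinder` internally. The Python sketch below shows how such a column could be read so that blank cells map to `None`. It uses a shortened, hypothetical samplesheet with only a few of the documented columns, and `read_doublet_rates` is an illustrative name rather than pipeline code.

```python
import csv
import io
from typing import Dict, Optional

# Shortened example samplesheet; the real one has more columns.
SAMPLESHEET = """sample,filtered,expected_cells,doublet_rate
sample1,/absolute/path/to/sample1_filtered.h5ad,5000,0.08
sample2,relative/path/to/sample2_filtered.rds,3000,
"""


def read_doublet_rates(text: str) -> Dict[str, Optional[float]]:
    """Map each sample to its doublet rate, or None when the cell is blank."""
    rates: Dict[str, Optional[float]] = {}
    for row in csv.DictReader(io.StringIO(text)):
        cell = (row.get("doublet_rate") or "").strip()
        # None stands for "not provided": the tool estimates the rate itself.
        rates[row["sample"]] = float(cell) if cell else None
    return rates


print(read_doublet_rates(SAMPLESHEET))
# prints {'sample1': 0.08, 'sample2': None}
```

Using `row.get` also covers samplesheets that omit the column entirely, in which case every sample maps to `None`.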
11 changes: 11 additions & 0 deletions modules/local/doublet_detection/scdblfinder/environment.yml
@@ -0,0 +1,11 @@
name: scdblfinder
channels:
- conda-forge
- bioconda
dependencies:
- bioconda::bioconductor-scdblfinder=1.24.0
- bioconda::bioconductor-singlecellexperiment=1.32.0
- bioconda::bioconductor-biocparallel=1.44.0
- bioconda::bioconductor-anndatar=1.0.2
- bioconda::bioconductor-rhdf5=2.54.1
- conda-forge::r-tidyverse=2.0.0
32 changes: 32 additions & 0 deletions modules/local/doublet_detection/scdblfinder/main.nf
@@ -0,0 +1,32 @@
process SCDBLFINDER {
    tag "$meta.id"
    label 'process_medium'

    conda "${moduleDir}/environment.yml"
    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
        'https://community-cr-prod.seqera.io/docker/registry/v2/blobs/sha256/99/993a012a69d920412b090701eb733ccf35c8655c3d012756ca6b0af1cfcd4780/data' :
        'community.wave.seqera.io/library/bioconductor-anndatar_bioconductor-biocparallel_bioconductor-rhdf5_bioconductor-scdblfinder_pruned:0f9db6b0855861de' }"

    input:
    tuple val(meta), path(h5ad), val(dbr), val(batch_col)

    output:
    tuple val(meta), path("${prefix}.h5ad"), emit: h5ad
    tuple val(meta), path("${prefix}.csv"), emit: predictions
    path "versions.yml", emit: versions, topic: versions

    when:
    task.ext.when == null || task.ext.when

    script:
    prefix = task.ext.prefix ?: "${meta.id}"
    template('scdblfinder.R')

    stub:
    prefix = task.ext.prefix ?: "${meta.id}"
    """
    touch ${prefix}.h5ad
    touch ${prefix}.csv
    touch versions.yml
    """
}