Normalization factors and DESeqDataSetFromTximport equivalent #412
maltekuehl wants to merge 4 commits into scverse:main
Hi @maltekuehl, sorry I haven't had the time to review your PR yet. To answer your last remark: I'm all for switching to uv for package management. I don't see much of an issue in dropping support for python 3.10 starting from v0.5.3 as many packages (e.g., numpy) have also stopped supporting it in their latest releases. Please go ahead if you wish to handle those changes! I'd just suggest you make them in a separate PR. I'll try to have a look at this PR ASAP.
[pre-commit.ci] pre-commit autoupdate (scverse#415)
Pull request overview
This PR adds support for normalization factors from pytximport, implementing functionality equivalent to DESeq2's DESeqDataSetFromTximport. It enables gene-length correction for transcript-level quantification data (e.g., from Salmon/Kallisto/RSEM).
Changes:
- Adds `from_pytximport` parameter to `DeseqDataSet` with validation, normalization factor computation in `fit_size_factors`, and propagation through IRLS/Wald test/LFC shrinkage
- Adds `estimate_norm_factors` function in preprocessing and updates `irls_solver` to accept per-gene normalization factors
- Adds tests, example notebook, and documentation for the pytximport integration
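
For readers unfamiliar with the DESeq2 side, here is a minimal NumPy sketch of what a gene-length-aware normalization factor computation can look like. It assumes the usual DESeq2 recipe (median-of-ratios size factors on length-corrected counts, multiplied back by the lengths, then centered per gene) and a samples x genes layout. The names `median_of_ratios` and `norm_factors_sketch` are hypothetical and are not the functions added in this PR.

```python
import numpy as np


def median_of_ratios(counts: np.ndarray) -> np.ndarray:
    """Per-sample size factors for a (samples x genes) matrix."""
    with np.errstate(divide="ignore"):
        log_counts = np.log(counts)
    # Log geometric mean of each gene across samples; -inf if any count is 0.
    log_geo_means = log_counts.mean(axis=0)
    usable = np.isfinite(log_geo_means)
    log_ratios = log_counts[:, usable] - log_geo_means[usable]
    return np.exp(np.median(log_ratios, axis=1))


def norm_factors_sketch(counts: np.ndarray, lengths: np.ndarray) -> np.ndarray:
    """Hypothetical helper: gene-length-aware normalization factors."""
    # Size factors estimated on length-corrected counts.
    size_factors = median_of_ratios(counts / lengths)
    # Combine per-sample library size with per-sample, per-gene length.
    factors = lengths * size_factors[:, None]
    # Center each gene so its factors have geometric mean 1 across samples.
    return factors / np.exp(np.log(factors).mean(axis=0))


# Toy data, for illustration only.
rng = np.random.default_rng(0)
counts = rng.poisson(50, size=(4, 6)).astype(float) + 1.0
lengths = rng.uniform(500.0, 3000.0, size=(4, 6))
norm_factors = norm_factors_sketch(counts, lengths)
```

When every sample has the same length for a gene, the length term cancels and the result reduces to the ordinary size factors (up to a constant), which is why the length matrix only matters when effective lengths differ between samples.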
Reviewed changes
Copilot reviewed 12 out of 17 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| pydeseq2/dds.py | Adds from_pytximport flag, validation, normalization factor computation in fit_size_factors, and passes norm factors to IRLS |
| pydeseq2/ds.py | Updates Wald test and LFC shrinkage to use normalization factors when available |
| pydeseq2/preprocessing.py | Adds estimate_norm_factors function implementing DESeq2's estimateNormFactors |
| pydeseq2/utils.py | Updates irls_solver to accept and use per-gene normalization factors |
| pydeseq2/default_inference.py | Passes per-gene normalization factors through to irls_solver |
| pydeseq2/inference.py | Adds normalization_factors parameter to abstract irls method |
| tests/test_pytximport.py | Tests for detection, validation, normalization computation, and full pipeline |
| examples/plot_pytximport_example.py | Example notebook demonstrating pytximport integration |
| docs/source/index.rst | Documents pytximport support |
| docs/source/refs.bib | Adds pytximport citation |
| pyproject.toml | Updates ruff config to newer tool.ruff.lint format |
| docs/source/.DS_Store | Accidentally committed macOS metadata file |
| .gitignore | Adds sphinx auto_examples and ruff cache |
    # Store sample-wise geometric mean of norm factors as size_factors
    # This maintains compatibility with existing code while incorporating
    # the gene-length correction
    # with np.errstate(divide="ignore"):
    #     log_norm_factors = np.log(norm_factors)
    # log_geom_mean_per_sample = np.mean(log_norm_factors, axis=1)
    # self.obs["size_factors"] = np.exp(log_geom_mean_per_sample)

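The commented-out lines above collapse a samples x genes matrix of normalization factors into one value per sample by taking the geometric mean across genes. A tiny standalone sketch of that step, with made-up numbers and a hypothetical variable name `sample_geo_means`:

```python
import numpy as np

# Toy (samples x genes) normalization factors; values are illustrative only.
norm_factors = np.array([[1.0, 4.0], [2.0, 8.0]])

with np.errstate(divide="ignore"):
    log_norm_factors = np.log(norm_factors)
# One summary value per sample: the geometric mean across genes.
sample_geo_means = np.exp(log_norm_factors.mean(axis=1))
```

Such a summary discards the per-gene variation, so it can only stand in for size factors in code paths that do not need the gene-length correction.
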
    # integer counts and that pytximport was used with counts_from_abundance=None
    # (raw counts) to generate the AnnData object.

    adata = ad.read_h5ad("../tests/data/pytximport/test_pytximport.h5ad")

    # When pytximport data is used, PyDESeq2 computes normalization factors
    # that account for both library size and gene length differences.

    dds_explicit.fit_size_factors()

    self.from_pytximport = from_pytximport

    if self.from_pytximport:
        print("Detected pytximport data with length offsets.")

    offset = np.log(self.dds.obs["size_factors"]).values

    if "normalization_factors" in self.dds.obsm:
        offset = np.log(self.dds.obsm["normalization_factors"])

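The two offset variants above differ mainly in shape: size factors give one offset per sample, while normalization factors give one per sample and gene. A minimal sketch of the difference, assuming a log-link model in which the expected count is `exp(design @ beta + offset)`. All arrays here are toy values, not PyDESeq2 internals.

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples, n_genes = 4, 3
# Intercept plus one binary condition column.
design = np.column_stack([np.ones(n_samples), [0.0, 0.0, 1.0, 1.0]])
beta = rng.normal(size=(2, n_genes))  # one coefficient column per gene

size_factors = np.array([0.8, 1.0, 1.2, 1.1])
# Per-sample offset, shape (n_samples,): identical for every gene.
offset_sf = np.log(size_factors)
mu_sf = np.exp(design @ beta + offset_sf[:, None])

# Per-sample, per-gene offset, shape (n_samples, n_genes).
norm_factors = size_factors[:, None] * rng.uniform(
    0.5, 2.0, size=(n_samples, n_genes)
)
offset_nf = np.log(norm_factors)
mu_nf = np.exp(design @ beta + offset_nf)
```

Because the matrix offset already has the full samples x genes shape, code that previously broadcast a per-sample vector needs to index the offset by gene as well, which is presumably why the solver signature has to change.
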
Fixes #305 and closes #359. Adds support for normalization factors based on length provided by pytximport. Very much still a draft.
Open issues:
Aside: Tests pass locally but fail here because of an older AnnData version on Python 3.10. Would you be open to this PR also including an update of pre-commit and GitHub Actions, a full move to uv/hatch/ruff like other scverse ecosystem packages, and targeting Python 3.11 - 3.13?
CC @BorisMuzellec