Releases: settylab/kompot
v0.7.0
[0.7.0] - 2026-04-13
Breaking changes
- Drop Python 3.9 support: kompot now requires Python ≥ 3.10 (driven by mellon ≥ 1.7.0 dependency).
New simplified API
- `kompot.de()`, `kompot.da()`, and `kompot.smooth_expression()` now use Settings dataclasses (`GPSettings`, `FDRSettings`, `FilterSettings`, `StorageSettings`, `OutputSettings`) so the common case stays simple while advanced options remain discoverable. The old `compute_differential_*` and `compute_smoothed_expression()` functions still work but emit a deprecation warning.
- `dry_run=True` on `de()` prints a resource plan (memory, disk, field overwrites) without running the analysis. Replaces the standalone `dry_run_differential_expression()`.
- `ModelSettings` lets you inject pre-fitted predictors into `de()`, `da()`, and `smooth_expression()` to skip fitting or reuse models across runs.
New features
- Null distribution inspection: `return_full_results=True` now includes a `"null"` key in the result dict exposing all null gene data: Mahalanobis distances, smoothed expression, fold changes, z-scores, and standard deviations. A lightweight alternative (`OutputSettings(return_null_data=True)`) returns only the summary table and metadata (gene indices, names, seed, provenance) without the full expression matrices.
- External null distributions for FDR: supply your own null distribution instead of relying on column-shuffled null genes.
- `FDRSettings(null_mahalanobis=...)`: pre-computed null Mahalanobis distances (e.g., from a control-vs-control run).
- `FDRSettings(null_expression=(expr1, expr2))`: raw null expression matrices fitted through the same GP model.
- `FDRSettings(combine_with_internal=True)`: concatenate external and internal null distributions.
- `kompot.compute_fdr(real_mahal, null_mahal)`: standalone FDR computation from Mahalanobis distances (no AnnData needed). Returns a DataFrame with `mahalanobis`, `pvalue`, `local_fdr`, `tail_fdr`, `is_de`.
- `kompot.extract_null_distribution(adata)`: extract Mahalanobis distances from a DE run for reuse as a null distribution elsewhere.
- `kompot.recompute_fdr(adata, null_mahalanobis)`: recompute FDR on existing DE results with a new null distribution, updating `adata.var` in place.
- `DifferentialExpression.compute_fdr(null_mahal)`: sklearn-like method to compute FDR after `predict(compute_mahalanobis=True)`.
- Empirical variance (`GPSettings(use_empirical_variance=True)`): estimates per-gene heteroscedastic noise from GP residuals and adjusts Mahalanobis distances accordingly. Works with or without biological replicates.
- `CenteredLinear` kernel for better extrapolation at cell-state boundaries (opt-in via `cov_func`; default remains Matern52).
- More accurate uncertainty: density estimators now use mellon 1.7.1's default Laplacian optimizer instead of ADVI.
Run history and reproducibility
- Run parameters are now stored grouped by Settings dataclass, making them directly reconstructible.
- `RunInfo.call_args()` returns a kwargs dict that reproduces the run; edit it and pass to `de()`/`da()` to re-run with tweaked parameters.
- `RunInfo.to_settings()` returns the Settings objects from a previous run for inspection.
Improvements
- Input validation at construction time: all Settings dataclasses now validate fields in `__post_init__`. Invalid values like `GPSettings(sigma=-1)` or `FDRSettings(threshold=1.5)` raise immediately with a clear message instead of failing deep inside mellon or JAX. The public API functions (`de()`, `da()`, `smooth_expression()`) also validate AnnData inputs upfront (obsm key shape, condition existence, `condition1 != condition2`, gene names, landmarks dimensions).
- Plotting functions return `Optional[plt.Figure]` (controlled by `return_fig`) instead of `(fig, ax)` tuples, and no longer call `plt.show()`.
- Consistent parameter naming across plot functions: `background_color_key` → `color`, `de_column` → `direction_column`, `embedding_key` → `basis`.
- `RunInfo` HTML display now shows parameters hierarchically by Settings group (`gp.sigma`, `fdr.threshold`, …) instead of a flat list.
- `RunComparison` shows individual changed fields (e.g. `gp.ls_factor: 10.0 → 5.0`) instead of opaque dict diffs.
- `kompot smooth` CLI command for single-condition GP smoothing from the command line, matching the full Python API (condition selection, gene subsetting, empirical variance, sample variance).
- `--no-progress` flag added to the DA CLI; progress bars can now be fully suppressed in both DA and DE.
- DA CLI now exposes `--store-arrays-on-disk`, `--disk-storage-dir`, and `--max-memory-ratio`, matching the DE CLI's StorageSettings coverage.
- FDR is disabled by default when `sample_col` is provided (not yet calibrated for sample variance). Override with `FDRSettings(null_genes=...)`.
- Remove `statsmodels` dependency.
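The construction-time validation described above can be illustrated with a minimal sketch. The classes below are hypothetical stand-ins, not kompot's actual `GPSettings` or `FDRSettings`; the exact bounds (non-negative `sigma`, `threshold` within [0, 1]) are assumptions chosen only so that the two invalid examples named in the notes are rejected.

```python
from dataclasses import dataclass


@dataclass
class ExampleGPSettings:
    """Hypothetical stand-in for a GP settings dataclass (not kompot's class)."""

    sigma: float

    def __post_init__(self):
        # Assumed rule: a noise scale cannot be negative.
        if self.sigma < 0:
            raise ValueError(f"sigma must be non-negative, got {self.sigma}")


@dataclass
class ExampleFDRSettings:
    """Hypothetical stand-in for an FDR settings dataclass (not kompot's class)."""

    threshold: float

    def __post_init__(self):
        # Assumed rule: an FDR threshold is a probability.
        if not 0.0 <= self.threshold <= 1.0:
            raise ValueError(f"threshold must lie in [0, 1], got {self.threshold}")


def try_build(factory, **kwargs):
    """Return None if construction succeeds, else the validation message."""
    try:
        factory(**kwargs)
        return None
    except ValueError as err:
        return str(err)


messages = [
    try_build(ExampleGPSettings, sigma=-1),
    try_build(ExampleFDRSettings, threshold=1.5),
]
for message in messages:
    print(message)
```

Because the check runs in `__post_init__`, the error surfaces where the settings object is created rather than later inside a numerical routine.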
Bug fixes
- Restore shared-landmark precomputation in DE (requires mellon ≥ 1.7.1). Mellon's `compute_landmarks` had a silent string-vs-enum bug where `gp_type="fixed"` did not match `GaussianProcessType.FIXED`, causing the function to return `None` instead of the documented fall-through. Kompot's shared-landmark precomputation in `DifferentialExpression.fit()` and the per-condition fallback in `ExpressionModel.fit()` both routed through this code path, so on every DE call kompot was silently dropping the cross-condition shared landmark grid (each condition ended up with an independent full GP) and ignoring the user-supplied `random_state` for landmark selection (mellon's internal `_compute_landmarks` fell back to the hardcoded `DEFAULT_RANDOM_SEED=42`). Pinning `mellon>=1.7.1` enables the fix transparently; no kompot code changes were required.
- Shared landmarks across conditions in DA. `DifferentialAbundance.fit()` now passes `gp_type="fixed"` to `compute_landmarks` and forwards `gp_type="fixed"` to the per-condition `DensityEstimator`s. Previously, when either condition had fewer cells than `n_landmarks`, mellon's auto-selection fell back to `gp_type=FULL` for that estimator, silently discarding the shared-landmark grid that DA had just computed on the combined data. The two density predictors then used independent full GPs, breaking the symmetry assumption behind the Mahalanobis-style abundance comparison. This brings DA into structural parity with DE.
- Fix local FDR numerical instability (Grenander estimator replaces statsmodels Poisson GLM).
- Fix tail FDR: replace Benjamini-Hochberg on empirical p-values (which breaks when `n_null << n_genes`) with fdrtool-style survival function ratio `Fdr(d) = S_null(d) / S_mix(d)`.
- Fix `cell_filter` docs: parameter includes matching cells, not excludes.
- Fix missing `field_mapping` in DA run history: `append_to_run_history` was called before `field_mapping` was computed, so DA history entries never recorded which fields were written.
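The survival-function ratio named in the tail FDR fix can be sketched in pure Python. This is an illustration of the formula `Fdr(d) = S_null(d) / S_mix(d)` only, not kompot's implementation: it assumes `S_mix` is the empirical survival function of the observed gene distances, takes the null proportion as 1, clips the ratio at 1, and applies no monotonicity correction. The function names and the toy distances are made up for the example.

```python
from bisect import bisect_left


def survival(sorted_values, d):
    """Empirical survival function S(d) = fraction of the sample >= d."""
    n = len(sorted_values)
    return (n - bisect_left(sorted_values, d)) / n


def tail_fdr(real, null):
    """Tail FDR per observed distance: S_null(d) / S_mix(d), clipped to 1."""
    null_sorted = sorted(null)
    mix_sorted = sorted(real)
    result = []
    for d in real:
        s_null = survival(null_sorted, d)
        # d is itself in the observed sample, so s_mix >= 1 / len(real) > 0.
        s_mix = survival(mix_sorted, d)
        result.append(min(1.0, s_null / s_mix))
    return result


# Toy Mahalanobis distances (illustrative values only).
null = [0.5, 1.0, 1.5, 2.0]
real = [0.8, 1.2, 3.0, 4.0]
fdr = tail_fdr(real, null)
print(fdr)
```

Each value depends only on the fraction of null distances at or beyond `d`, so the estimate stays usable when there are far fewer null genes than real genes; its resolution is limited to steps of `1 / n_null`.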
v0.6.3
- fix condition extraction across all plotting functions: condition names are now extracted from `run_info` params (authoritative source) instead of fragile `_extract_conditions_from_key()` string-splitting, which was broken for multi-word condition names (e.g. "Pre-treatment", "Wild Type"). Affected functions: `plot_gene_expression`, `volcano_da`, `volcano_de`, `multi_volcano_da`, `direction_barplot`
- silent fallback to pattern-matched layers/keys from potentially wrong runs has been replaced with explicit warnings in `plot_gene_expression` and `volcano_de` (FDR/PTP key inference)
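To show why parsing condition names out of a key is fragile, here is a small hypothetical sketch. The key layout (`<cond1>_to_<cond2>` with spaces replaced by underscores), the naive parser, and the `run_info` dictionary shape are all assumptions made for illustration; they are not kompot's actual key format or the body of `_extract_conditions_from_key()`.

```python
def conditions_from_key(key):
    """Naive parser: assumes each condition name is a single '_'-free token."""
    tokens = key.split("_")
    i = tokens.index("to")
    return tokens[i - 1], tokens[i + 1]


def conditions_from_params(run_info):
    """Read the names from stored run parameters instead of parsing a key."""
    params = run_info["params"]
    return params["condition1"], params["condition2"]


# Hypothetical stored run: the condition "Wild Type" contains a space,
# which this sketch assumes is written as "_" in the field key.
run_info = {
    "key": "Wild_Type_to_Treated",
    "params": {"condition1": "Wild Type", "condition2": "Treated"},
}

parsed = conditions_from_key(run_info["key"])
stored = conditions_from_params(run_info)
print(parsed)
print(stored)
```

The parser truncates the first condition to its last token, while the stored parameters return both names unchanged, which is the motivation for treating the run parameters as the authoritative source.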
v0.6.2
v0.6.1
v.0.6.0
v0.5.2
- CSR→LIL→CSR layer conversion for faster appending of partial differential expression results
- same argument order in `dry_run_differential_expression` and `compute_differential_expression`
- bugfix: fdr computation when all p-values are 0
- increase testing coverage
- smaller pypi package
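The CSR→LIL→CSR conversion listed first in the v0.5.2 notes follows a common SciPy pattern, sketched below with `scipy.sparse`. The matrix sizes, the batch layout, and the variable names are assumptions for illustration; this is not kompot's code.

```python
import numpy as np
from scipy import sparse

n_cells, n_genes = 6, 4

# Start from an empty CSR layer, as a sparse AnnData layer might be stored.
layer = sparse.csr_matrix((n_cells, n_genes), dtype=np.float64)

# Hypothetical partial results: (row start, row stop, dense block of values).
batches = [
    (0, 3, np.arange(12, dtype=np.float64).reshape(3, 4)),
    (3, 6, np.ones((3, 4))),
]

# Changing the sparsity structure of a CSR matrix is expensive, so convert
# once to LIL, which supports cheap incremental row assignment.
lil = layer.tolil()
for start, stop, block in batches:
    lil[start:stop, :] = block

# Convert back to CSR once all partial results are written.
layer = lil.tocsr()
print(layer.toarray())
```

Doing the two conversions once, around the whole loop, avoids rebuilding the CSR index arrays on every partial write.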
v0.5.1
- make DOI on zenodo.org
v0.5.0
- comprehensive FDR implementation for differential expression analysis
- FDR-based visualization in `volcano_de` plots: support for local/tail FDR y-axes and coloring
- posterior tail probability for differential expression
- introduction of "is_de" boolean column in `adata.var` to indicate differential expression based on significance threshold
- more flexible `volcano_de` plot with FDR/PTP-based thresholding and y-axis options
- "signal" and "strength" columns in stringDB gene-set enrichment analysis
- expand testing
- rename fields to include comparison, e.g., "A_to_B", before statistic name
- make de significance measures tail fdr, ptp, and zscore optional
- implement cleanup function
- bugfix: Prevent silent failure of `compute_differential_abundance` with sample variance by making sure enough space is available on disk for covariance tensor.
- dry run for differential expression
- split tutorials in 3 parts
- reduce memory demand when using batching and reflect this in dry run
- fix disk space checking to respect TMPDIR environment variable consistently
- include all computed results in full results dictionaries (std, field names, etc.)
v0.4.0
- StringDBReport class for gene set visualization and reporting
- make sure da directions categories are always retained and ordered correctly
- more flexible `volcano_de` plot
- `fold_change_mode` parameter for heatmap to only show fold-change instead of split tiles
- implement `RunInfo` utility to fetch information about previous runs
- bugfix passing `ax` to `kompot.plot.embedding`
- implemented `mgroups` in `kompot.plot.embedding` to plot multiple groupings
- implement group-wise differential expression through `groups` parameter in `kompot.compute_differential_expression`
- also return and store uncertainty estimates (stds) in de analysis
- also return and store z-scores in de analysis
- implement underrepresentation filtering for de analysis
- `plot.embedding` scanpy wrapper can now plot multiple layers
- make sure modified anndata is writable (use JSON for run info in `.uns`)
- option to store posterior covariance matrix in differential expression anndata function