shimlab/PALAVA

PALAVA: a neural-network model for inferring genetic pathways from single-cell data

Built using scvi-tools.

Performs nonlinear factor analysis and incorporates gene set information to pre-annotate factors with gene sets.

Overview

PALAVA is a nonlinear factor analysis method that incorporates prior knowledge through gene sets. It provides interpretable dimension reduction for analysing biological signals in the data. The model uses annotated latent variables: each annotated latent variable takes a gene set as prior knowledge, so the biological meaning of the gene set is associated with the corresponding latent variable. This gives a more meaningful latent space, because the latent variables are pre-annotated with biological meaning.

We assume it is not known in advance which gene sets (or biological processes) are relevant to the data, so a reasonable surplus of gene sets can be provided. The method provides factor importance scores that rank the factors by importance. The model is flexible enough to infer nonlinear relationships between genes, which lets it capture more complicated biological processes in the data.

The design of the annotated decoder also accounts for errors in the gene sets. Consequently, interpretability techniques can be used to refine the gene sets based on information from the data: the method can introduce relevant genes into a gene set, or leave gene set genes unused if the count data contain no such signal.

Installation

  • Create a conda environment with Python 3.10: conda create --name palava-env python=3.10
  • Activate the conda environment: conda activate palava-env
  • Then run: pip install git+ssh://git@github.com/shimlab/PALAVA.git

For an editable installation, or to install from a cloned repo:

  • Clone this repository
  • After cloning, navigate to the repo root (the directory containing setup.py) and run pip install -e . for an editable installation, or pip install . for a normal installation.

Test run

This notebook runs the method as a test and visualises its output. It requires the palava_on_sim_data_a_test.h5ad data file in the directory. Using a GPU is highly recommended: training takes less than 10 minutes on a GPU (approximately 1 hour on a CPU).

Required input of PALAVA

  • The raw counts of the single-cell RNA-seq data to analyse (should not be log-transformed)
  • A collection of gene sets that you think could be relevant to the data at hand (example: the 50 Hallmark gene sets)
  • An example on simulated data can be found in example_notebooks.
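The two inputs above can be sanity-checked before training. The following is a minimal sketch in plain Python, not part of PALAVA's API: the function names looks_like_raw_counts and membership_mask, the toy counts, and the gene sets SET_A and SET_B are all hypothetical. It tests whether a count matrix contains only non-negative whole numbers (log-transformed data would fail this check) and shows how gene sets map onto the genes present in the data.

```python
from typing import Dict, List, Sequence


def looks_like_raw_counts(matrix: Sequence[Sequence[float]]) -> bool:
    """Return True if every entry is a non-negative whole number."""
    return all(
        value >= 0 and float(value).is_integer()
        for row in matrix
        for value in row
    )


def membership_mask(
    genes: Sequence[str], gene_sets: Dict[str, Sequence[str]]
) -> List[List[int]]:
    """Build a 0/1 matrix with one row per gene set and one column per gene."""
    mask = []
    for members in gene_sets.values():
        member_set = set(members)
        mask.append([1 if gene in member_set else 0 for gene in genes])
    return mask


# Toy data, invented for illustration only.
genes = ["TP53", "MYC", "CDK1", "GAPDH"]
counts = [[0, 3, 1, 12], [2, 0, 0, 7]]  # cells x genes
gene_sets = {"SET_A": ["TP53", "CDK1"], "SET_B": ["MYC", "XYZ"]}  # XYZ is absent from the data

print(looks_like_raw_counts(counts))
print(membership_mask(genes, gene_sets))
```

Gene set members that are absent from the data (XYZ here) simply have no column in the mask, which is one way to spot gene sets with little overlap with the measured genes.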

Output generated by PALAVA

  • Factor importance scores: rank the factors by importance
  • Factor activations: the representation of the data in terms of factors (factor-to-cell relationship, analogous to the PCA representation of the data)
  • Factor scores: the factor loadings of the data (factor-to-gene relationship, analogous to factor loadings). These can be used to refine the gene sets.
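As an illustration of how these outputs might be used downstream, here is a minimal sketch in plain Python. It is not PALAVA's own procedure: the dictionary layout of the scores, the function names rank_factors and refine_gene_set, and the fixed absolute-value threshold are all assumptions made for this example. It ranks factors by importance and then compares one factor's per-gene scores against its gene set, to suggest genes to add or drop.

```python
from typing import Dict, List, Set, Tuple


def rank_factors(importance: Dict[str, float]) -> List[str]:
    """Return factor names ordered from most to least important."""
    return sorted(importance, key=importance.get, reverse=True)


def refine_gene_set(
    scores: Dict[str, float], gene_set: Set[str], threshold: float
) -> Tuple[Set[str], Set[str]]:
    """Suggest genes to add to or drop from a gene set.

    Assumed rule (hypothetical): a gene is relevant to the factor when the
    absolute value of its score is at least `threshold`.
    """
    to_add = {g for g, s in scores.items() if abs(s) >= threshold and g not in gene_set}
    to_drop = {g for g in gene_set if abs(scores.get(g, 0.0)) < threshold}
    return to_add, to_drop


# Toy values, invented for illustration only.
importance = {"SET_A": 0.7, "SET_B": 0.1, "SET_C": 0.4}
set_a_scores = {"G1": 0.9, "G2": 0.05, "G3": 0.6, "G4": 0.0}
set_a_genes = {"G1", "G2"}

print(rank_factors(importance))
print(refine_gene_set(set_a_scores, set_a_genes, threshold=0.5))
```

A single global threshold is only one possible choice; in practice the cut-off would need to be chosen with the scale of the factor scores in mind.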

TODO

  • resolve warnings

Notes

  • If using a Mac, set accelerator='cpu'; the package is not compatible with MPS.
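The note above can be turned into a small helper that picks the accelerator value automatically. This is a minimal sketch under stated assumptions: choose_accelerator is a hypothetical helper, not part of PALAVA, and the 'gpu' string assumes the training call accepts the same accelerator values as the underlying trainer. The cuda_available flag is passed in explicitly so the helper has no hard dependency; in practice it could come from torch.cuda.is_available().

```python
import platform
from typing import Optional


def choose_accelerator(cuda_available: bool, system: Optional[str] = None) -> str:
    """Pick an accelerator string, forcing 'cpu' on macOS (no MPS support)."""
    system = platform.system() if system is None else system
    if system == "Darwin":  # platform.system() reports macOS as "Darwin"
        return "cpu"
    return "gpu" if cuda_available else "cpu"


# The system name is passed explicitly here so the example is reproducible.
print(choose_accelerator(cuda_available=True, system="Darwin"))
print(choose_accelerator(cuda_available=True, system="Linux"))
print(choose_accelerator(cuda_available=False, system="Linux"))
```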
