NIFty

Never Impute Features (thank you).

The pre-print manuscript associated with this tool can be found here: Classification with Missing Data - A NIFty Pipeline for Single-Cell Proteomics

NIFty is a Python program for data-driven cell annotation (classification). NIFty can be used for top-scoring pairs (TSP)-based rule generation and feature selection, classification model generation, and model application on unlabeled data. NIFty is unique in that it does not require missing-value imputation, avoids common circular analysis pitfalls by default, and overcomes batch effects. Its primary application is classifying large molecular datasets, such as proteomics data.

NIFty uses a minimum of two user-provided tables as input for feature selection and model generation:

  1. a table with quantification data, proteins (or some other molecular data type) as the columns and samples as the rows; and
  2. a table that has the label (class) for each sample.

For model application, a minimum of one user-provided table is used as input:

  1. a table with quantification data, proteins (or some other molecular data type) as the columns and samples as the rows.

Quantitation measurements can come from any search tool, and any number of measurements (minimum of 2) can be provided.
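As a rough illustration of the table layout described above, the sketch below builds a tiny quantification table and a matching label table with pandas. The protein names, sample names, and the `label` column name are all made up for this example, and reading "minimum of 2" as "at least two feature columns" is an assumption. The required file formats are described under File Formats and Descriptions.

```python
import numpy as np
import pandas as pd

# Hypothetical quantification table: samples as rows, proteins as columns.
# Missing measurements are left as NaN; nothing is imputed.
quant = pd.DataFrame(
    {
        "P1": [10.2, np.nan, 8.7, 9.9],
        "P2": [5.1, 6.3, np.nan, 4.8],
        "P3": [np.nan, 7.7, 7.1, np.nan],
    },
    index=["cell_1", "cell_2", "cell_3", "cell_4"],
)

# Hypothetical label table: one class label per sample.
# The column name "label" is an assumption made for this sketch.
labels = pd.DataFrame({"label": ["A", "B", "A", "B"]}, index=quant.index)

# Sanity checks before handing tables to any downstream tool.
assert quant.shape[1] >= 2  # assumed reading of "minimum of 2"
assert quant.index.equals(labels.index)  # every sample has a label

# Fraction of missing values per protein, useful for a quick look at sparsity.
missing_fraction = quant.isna().mean()
print(missing_fraction)
```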

The output from our program depends on which functionality the user would like to run. In the 'find_features' mode, the output is a list of the k best TSP-based features/rules that can be used to train a machine learning classifier for sample annotation. In the 'train_model' mode, the output is a machine learning model trained on the selected features. In the 'apply_model' mode, the output is a list of predicted sample labels or label probabilities from applying the trained model to experimental, unlabeled data.

The important thing is that we never impute; we can deal with null values.
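To make the idea of a pair-based rule that tolerates null values concrete, here is a minimal sketch of the classic top-scoring pairs score. For a rule "feature a < feature b", it takes the difference between classes in how often the rule holds, using only samples where both features were observed. This is a generic textbook-style illustration, not NIFty's implementation. The function name `tsp_score`, the toy data, and the choice to skip samples with a missing value are all assumptions.

```python
from itertools import combinations

import numpy as np
import pandas as pd


def tsp_score(quant, labels, a, b):
    """Score the rule 'a < b' using only samples where both features are observed."""
    observed = quant[a].notna() & quant[b].notna()
    rule = quant.loc[observed, a] < quant.loc[observed, b]
    # Fraction of samples in each class for which the rule holds.
    rates = rule.groupby(labels[observed]).mean()
    if len(rates) < 2:
        return 0.0  # the rule cannot separate classes if only one class is observed
    return float(rates.max() - rates.min())


# Toy data (hypothetical): six samples, three proteins, NaN marks a missing value.
quant = pd.DataFrame(
    {
        "P1": [1.0, 2.0, np.nan, 9.0, 8.0, 7.0],
        "P2": [5.0, 6.0, 4.0, 3.0, np.nan, 2.0],
        "P3": [2.0, np.nan, 3.0, 9.5, 3.5, np.nan],
    },
    index=[f"cell_{i}" for i in range(1, 7)],
)
labels = pd.Series(["A", "A", "A", "B", "B", "B"], index=quant.index)

# Score every feature pair and rank the rules from most to least discriminative.
scores = {
    (a, b): tsp_score(quant, labels, a, b)
    for a, b in combinations(quant.columns, 2)
}
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
for (a, b), score in ranked:
    print(f"{a} < {b}: {score:.2f}")
```

Because each rule only compares two values within the same sample, no value ever needs to be filled in. A sample with a missing value simply does not contribute to that pair's score in this sketch.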

After downloading this repository, run NIFty on your own data from the command line. If config.toml exists in the same directory, use:

python nifty.py

To run NIFty with a configuration file at a custom path, use:

python nifty.py -c <config/file/path>

Requirements

NIFty requires Python (>= 3.11) and the following Python packages to be installed:

  • cloudpickle
  • numpy
  • pandas
  • scikit-learn
  • statsmodels
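A quick way to confirm an environment meets these requirements is sketched below. It checks the Python version and whether each listed package can be imported. The helper name `missing_modules` is made up for this example. Note that scikit-learn is imported under the module name `sklearn`.

```python
import importlib.util
import sys

# Import names for the packages listed above ("scikit-learn" installs as "sklearn").
REQUIRED_MODULES = ["cloudpickle", "numpy", "pandas", "sklearn", "statsmodels"]


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


python_ok = sys.version_info >= (3, 11)
missing = missing_modules(REQUIRED_MODULES)

print(f"Python >= 3.11: {python_ok}")
print(f"Missing packages: {missing if missing else 'none'}")
```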

Codebase Structure

The codebase functions as shown in the NIFty flowchart (figure: NIFty Flowchart).

Run Modes

NIFty can be executed in several modes depending on which steps of the pipeline you want to run:

  1. find_features: Generate and score rules to find the best k features for classification.
  2. train_model: Train a machine learning classifier using the selected features.
  3. apply_model: Apply the trained classifier on experimental, unlabeled data.

You can control this behavior using a .toml configuration file.

File Formats and Descriptions

  • A description of all necessary input files and their required formats can be found here.
  • A description of all output files can be found here.

Use Cases

Each of the use-case documents below contains the following information: (1) a brief description of when to run a particular use case of NIFty; and (2) the changes to the default configuration needed to run that use case (for a default configuration file, see File Formats and Descriptions).

Citing NIFty

Classification with Missing Data - A NIFty Pipeline for Single-Cell Proteomics

Alyssa A Nitz, Benjamin Echarry, Blake McGee, Samuel H Payne

bioRxiv 2026.03.06.710179; doi: https://doi.org/10.64898/2026.03.06.710179
