+Many researchers in single-cell biology and bioinformatics treat a count matrix (e.g., a gene-by-cell count matrix for single-cell RNA-seq or a peak-by-cell count matrix for single-cell ATAC-seq) as the starting point of their analysis. Yet the preprocessing methods that generate these matrices from raw sequencing data must solve several difficult problems, and differences in their underlying assumptions and computational procedures can meaningfully change the results of downstream analysis. I will describe some of the challenges facing single-cell preprocessing methods and present alevin-fry and alevin-fry-atac, efficient open-source methods for preprocessing single-cell data developed in our lab. I will argue for the importance of accurate, efficient, and fully open-source methods in building single-cell processing pipelines. Finally, I will describe the simpleaf framework, which is designed to simplify raw-data processing, codify best practices, and enhance computational reproducibility and provenance tracking.