A single command-line tool that wraps STAR in STARsolo mode for uniform processing of scRNA-seq data across multiple platforms.
| Subcommand | Platform | Barcode type | Chemistry auto-detection |
|---|---|---|---|
| 10x | 10x Genomics (v1 – v4, multiome) | CB_UMI_Simple | ✅ |
| smartseq | Smart-seq / Smart-seq2 | SmartSeq (plate-based, no UMIs) | — |
| dropseq | Drop-seq | CB_UMI_Simple (no whitelist) | — |
| rhapsody | BD Rhapsody | CB_UMI_Complex (3 segments) | — |
| indrops | inDrops | CB_UMI_Complex (2 segments + adapter) | — |
| strt | STRT-seq | CB_UMI_Simple (96-barcode list) | — |
| qc | Aggregate QC stats across runs | — | — |
The following tools must be available in $PATH before running starsolo:
| Tool | Tested version | Purpose |
|---|---|---|
| STAR | 2.7.10a | Alignment & quantification |
| samtools | 1.15.1 | BAM indexing |
| seqtk | 1.3+ | Read subsampling (10x chemistry detection) |
| pbzip2 | 1.1+ | Compressing unmapped reads |
| BBMap | 38.97 | Adapter trimming (optional, via bbduk.sh) |
Tip: The included Dockerfile builds all of the above into a single container image.
STAR must be compiled with make STAR CXXFLAGS_SIMD="-msse4.2" to support --clipAdapterType CellRanger4 (see STAR#1218).
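A quick pre-flight check along the following lines can confirm that the required tools from the table are on $PATH. This is a minimal sketch and not part of starsolo itself; the function name `missing_tools` is hypothetical, and the optional BBMap dependency is deliberately left out of the check.

```shell
# Hypothetical helper: print each tool name that cannot be found on $PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Required tools from the table above (bbduk.sh is optional, so it is not checked).
missing=$(missing_tools STAR samtools seqtk pbzip2)
if [ -n "$missing" ]; then
    echo "Missing required tools:" $missing >&2
fi
```

The check only reports presence on $PATH; it does not verify versions or how STAR was compiled.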
git clone <this-repo> && cd STARsolo
# Default: symlinks into ~/.local/bin
./install.sh
# Or specify a directory:
./install.sh /usr/local/bin
# Verify
starsolo --version

The installer creates a single starsolo symlink. All library code is resolved relative to the repo, so you can git pull to update in-place.
Use a standard STAR genome index. Cell Ranger-filtered references are recommended for comparability with Cell Ranger output.
STAR --runThreadN 16 --runMode genomeGenerate --genomeDir STAR --genomeFastaFiles $FA --sjdbGTFfile $GTF

By default, starsolo resolves references via --species:
$STARSOLO_REF_BASE/<species>/2020A/index
Override with --ref /your/path/to/index on any subcommand, or change STARSOLO_REF_BASE in etc/defaults.conf.
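The lookup described above can be sketched as a small shell function. This is an illustration of the documented behaviour rather than the code starsolo actually runs; `resolve_ref` is a hypothetical name, and /my/references is only an example value.

```shell
# Hypothetical helper mirroring the documented lookup:
# an explicit index path wins; otherwise the path is built from the species name.
# Usage: resolve_ref <species> [explicit_index_path]
resolve_ref() {
    local species="$1" ref="${2:-}"
    if [ -n "$ref" ]; then
        printf '%s\n' "$ref"
    elif [ -n "$species" ]; then
        printf '%s/%s/2020A/index\n' "$STARSOLO_REF_BASE" "$species"
    else
        echo "error: give either a species or an explicit index path" >&2
        return 1
    fi
}

STARSOLO_REF_BASE=/my/references          # example value
resolve_ref human                         # prints /my/references/human/2020A/index
resolve_ref human /your/path/to/index     # prints /your/path/to/index
```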
Download 10x barcode whitelists from Cell Ranger and place them in the whitelist directory (default: /nfs/cellgeni/STAR/whitelists, configurable via --whitelist-dir or STARSOLO_WL_DIR).
starsolo <platform> <args…> [options]
| Flag | Description | Default |
|---|---|---|
| -s, --species <name> | Species name (human, mouse, …); resolves the reference automatically | — |
| -r, --ref <path> | Explicit STAR index path (overrides --species) | — |
| -w, --whitelist-dir <dir> | Barcode whitelist directory | from config |
| -c, --cpus <N> | Number of threads | 16 |
| --bam | Output coordinate-sorted BAM | off |
| --no-bam | Suppress BAM output (default) | ✅ |
| -h, --help | Show help (global or per-platform) | — |
| --version | Print version | — |
Automatically detects chemistry (v1–v4, multiome), strand specificity, and paired-end mode.
starsolo 10x /data/fastqs SAMPLE1 --species human
starsolo 10x /data/fastqs SAMPLE1 --ref /path/to/index --bam

The script subsamples 200,000 reads and matches barcodes against all known whitelists:
| 10x version | Whitelist | CB len | UMI len | Strand |
|---|---|---|---|---|
| 3' v1 | 737K-april-2014_rc.txt | 14 | 10 | Forward |
| 3' v2 | 737K-august-2016.txt | 16 | 10 | Forward |
| 3' v3/v3.1 | 3M-february-2018.txt | 16 | 12 | Forward |
| 3' v4 | 3M-3pgex-may-2023.txt | 16 | 12 | Forward |
| 5' v1.1/v2 | 737K-august-2016.txt | 16 | 10 | Reverse |
| 5' v3 | 737K-august-2016.txt | 16 | 12 | Reverse |
| 5' v4 | 3M-5pgex-jan-2023.txt | 16 | 12 | Reverse |
| multiome | 737K-arc-v1.txt | 16 | 12 | Forward |
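The whitelist-matching idea can be illustrated with a few lines of awk. This is a sketch under stated assumptions, not the detection code shipped with starsolo: `whitelist_match_fraction` is a hypothetical name, the input is assumed to be one barcode per line (already cut from the subsampled reads), and the 4 bp barcodes are toy data.

```shell
# Hypothetical helper: fraction of barcodes (one per line in $1)
# that appear in a whitelist file ($2, assumed non-empty).
whitelist_match_fraction() {
    awk 'NR == FNR { wl[$1] = 1; next }      # first file: load the whitelist
         { total++; if ($1 in wl) hit++ }    # second file: count matches
         END { if (total) printf "%.2f\n", hit / total; else print "0.00" }' "$2" "$1"
}

tmp=$(mktemp -d)
printf 'AAAA\nCCCC\nGGGG\n' > "$tmp/whitelist.txt"          # toy whitelist
printf 'AAAA\nCCCC\nTTTT\nAAAA\n' > "$tmp/barcodes.txt"     # toy observed barcodes
whitelist_match_fraction "$tmp/barcodes.txt" "$tmp/whitelist.txt"   # prints 0.75
rm -r "$tmp"
```

Running the same comparison against each whitelist in the table and keeping the best-scoring one is one plausible way to pick a chemistry; the threshold and tie-breaking rules starsolo uses are not specified here.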
5' libraries with long R1 reads (>50 bp) are automatically processed in paired-end mode with --soloBarcodeMate 1 --clip5pNbases 39 0. 3' libraries are always single-end.
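The single-end versus paired-end rule can be written down as follows. This is a hedged sketch: both function names are hypothetical, the use of the mean R1 length is an assumption (the text only says "long R1 reads"), and the input is assumed to be uncompressed four-line FASTQ on stdin.

```shell
# Hypothetical helper: mean read length of a four-line-per-record FASTQ on stdin.
mean_read_length() {
    awk 'NR % 4 == 2 { sum += length($0); n++ }
         END { if (n) print int(sum / n); else print 0 }'
}

# Hypothetical helper: choose the mode from library type and R1 length.
# Usage: alignment_mode <3p|5p> <r1_length_bp>
alignment_mode() {
    if [ "$1" = "5p" ] && [ "$2" -gt 50 ]; then
        echo "paired-end"
    else
        echo "single-end"
    fi
}

alignment_mode 5p 150   # prints paired-end
alignment_mode 5p 26    # prints single-end
alignment_mode 3p 150   # prints single-end
```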
Requires a manifest file — a tab-separated file with three columns: R1_path, R2_path, cell_name.
starsolo smartseq manifest.tsv --species mouse

Aliases: smart-seq, ss2
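Before launching a Smart-seq run, the three-column manifest format can be sanity-checked with a short awk filter. This is a sketch only: `check_manifest` is a hypothetical helper, and it assumes the manifest has no header row, which the description above does not state either way.

```shell
# Hypothetical helper: succeed only if every line of the manifest ($1)
# has exactly three non-empty tab-separated fields.
check_manifest() {
    awk -F '\t' '
        NF != 3 || $1 == "" || $2 == "" || $3 == "" {
            printf "line %d: expected R1_path<TAB>R2_path<TAB>cell_name\n", NR > "/dev/stderr"
            bad = 1
        }
        END { exit bad }' "$1"
}

manifest=$(mktemp)
printf 'cellA_R1.fastq.gz\tcellA_R2.fastq.gz\tcellA\n' > "$manifest"   # example row
check_manifest "$manifest" && echo "manifest OK"
rm "$manifest"
```

The check covers the column layout only; it does not test whether the FASTQ paths exist.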
12 bp cell barcode, 8 bp UMI, no whitelist.
starsolo dropseq /data/fastqs SAMPLE1 --species human

Alias: drop-seq
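For orientation, a barcode layout like the Drop-seq one (12 bp cell barcode followed by an 8 bp UMI) maps onto STARsolo's CB_UMI_Simple geometry options roughly as shown below. The STAR option names are real, but whether starsolo passes exactly this set is an assumption, and `simple_barcode_flags` is a hypothetical helper that only prints the flags.

```shell
# Hypothetical helper: print STARsolo geometry flags for a read that starts with
# the cell barcode, immediately followed by the UMI.
# Usage: simple_barcode_flags <cb_len> <umi_len>
simple_barcode_flags() {
    local cb_len="$1" umi_len="$2"
    echo "--soloType CB_UMI_Simple --soloCBstart 1 --soloCBlen $cb_len --soloUMIstart $((cb_len + 1)) --soloUMIlen $umi_len"
}

simple_barcode_flags 12 8
# prints --soloType CB_UMI_Simple --soloCBstart 1 --soloCBlen 12 --soloUMIstart 13 --soloUMIlen 8
```

With no whitelist, STAR additionally expects `--soloCBwhitelist None`.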
3 barcode segments + 8 bp UMI. Requires Rhapsody_bc1/2/3.txt in the whitelist directory.
starsolo rhapsody /data/fastqs SAMPLE1 --species human

Aliases: bd-rhapsody, bd
2 barcode segments + adapter + 6 bp UMI. Requires inDrops_Ambrose2_bc1/2.txt in the whitelist directory.
starsolo indrops /data/fastqs SAMPLE1 --species human
starsolo indrops /data/fastqs SAMPLE1 --species human --adapter GAGTGATTGCTTGTGACGCCAA

Default adapter: GAGTGATTGCTTGTGACGCCTT
96-barcode whitelist, 8 bp cell barcode, 8 bp UMI. Requires 96_barcodes.list in the whitelist directory.
starsolo strt /data/fastqs SAMPLE1 --species human --strand Forward

Extra option: --strand <Forward|Reverse> (default: Forward)
Alias: strt-seq
Run from the parent directory containing per-sample output folders:
starsolo qc | column -t

Checks for:
- Incomplete runs (_STARtmp still present)
- Barcode cross-contamination between samples
- Low mapping percentages
Outputs a tab-separated table with read counts, mapping rates, cell counts, and configuration for each sample.
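The incomplete-run check from the list above can be approximated by hand. This is a minimal sketch, not the logic in scripts/solo_qc.sh: `incomplete_runs` is a hypothetical name, and it assumes `_STARtmp` sits directly inside each per-sample folder.

```shell
# Hypothetical helper: list sample folders under $1 that still contain _STARtmp.
incomplete_runs() {
    local dir
    for dir in "$1"/*/; do
        [ -d "${dir}_STARtmp" ] && basename "$dir"
    done
    return 0
}

# Example layout: one finished sample and one that still has its temp directory.
runs=$(mktemp -d)
mkdir -p "$runs/SAMPLE1" "$runs/SAMPLE2/_STARtmp"
incomplete_runs "$runs"    # prints SAMPLE2
rm -r "$runs"
```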
Edit etc/defaults.conf to change default paths and parameters. All values can also be set as environment variables:
export STARSOLO_REF_BASE=/my/references
export STARSOLO_WL_DIR=/my/whitelists
export STARSOLO_CPUS=8
export STARSOLO_BAM_MODE=bam

CLI flags always take precedence over environment variables, which take precedence over config file defaults.
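The precedence order (CLI flag, then environment variable, then config default) amounts to taking the first non-empty value. The sketch below illustrates that rule; `pick_setting` and `config_cpus` are hypothetical names, and how starsolo parses etc/defaults.conf is not shown.

```shell
# Hypothetical helper: return the first non-empty value in precedence order.
# Usage: pick_setting <cli_value> <env_value> <config_default>
pick_setting() {
    if [ -n "$1" ]; then printf '%s\n' "$1"
    elif [ -n "$2" ]; then printf '%s\n' "$2"
    else printf '%s\n' "$3"
    fi
}

config_cpus=16     # default thread count from the options table
STARSOLO_CPUS=8    # environment override, as in the export lines above
pick_setting ""  "${STARSOLO_CPUS:-}" "$config_cpus"   # prints 8  (no CLI flag given)
pick_setting 32  "${STARSOLO_CPUS:-}" "$config_cpus"   # prints 32 (CLI flag wins)
```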
| Script | Purpose |
|---|---|
| scripts/bbduk_trim.sh | Adapter/polyA trimming via BBMap |
| scripts/bsub_submit.sh | Submit any starsolo command as an LSF job |
./scripts/bsub_submit.sh starsolo 10x /data/fastqs SAMPLE1 --species human

This submits with 16 CPUs / 64 GB RAM to the normal queue.
Default: 16 CPUs, 64–128 GB RAM. Some datasets require 128 GB; if jobs fail with OOM errors, increase RAM in your job submission.
The included Dockerfile builds all required tools:
docker build -t starsolo .
docker run -v /data:/data starsolo starsolo 10x /data/fastqs SAMPLE1 --species human

STARsolo/
├── bin/
│ └── starsolo # CLI entrypoint (add to PATH)
├── lib/
│ ├── common.sh # Shared functions (FASTQ discovery, compression, post-processing)
│ ├── platform_10x.sh # 10x Genomics logic
│ ├── platform_smartseq.sh # Smart-seq2 logic
│ ├── platform_dropseq.sh # Drop-seq logic
│ ├── platform_rhapsody.sh # BD Rhapsody logic
│ ├── platform_indrops.sh # inDrops logic
│ └── platform_strt.sh # STRT-seq logic
├── etc/
│ └── defaults.conf # Default configuration
├── scripts/
│ ├── solo_qc.sh # QC aggregation
│ ├── bbduk_trim.sh # Adapter trimming helper
│ └── bsub_submit.sh # LSF job submission helper
├── Dockerfile
├── install.sh
├── LICENSE
└── README.md
See LICENSE.