germs-lab/interbrc-core-analysis

Inter-BRC Core Analysis Singularity Container

This Docker/Singularity container enables reproducible execution of microbiome core analysis workflows (scripts 004-007) in HPC environments. These scripts can also be deployed individually on an HPC or powerful local machine.

Purpose & Scope

What This Container Does

  • Reproducible execution of microbiome analysis workflows
  • Isolation of R package dependencies using renv
  • Compatibility with HPC environments that prefer Singularity
  • Focused analysis on core microbiome selection and ordination analyses

Container Scope

This container is designed specifically for the 004-007 script series and includes:

  • Core microbiome extraction algorithms
  • Ordination analysis functions
  • Required phyloseq objects and ASV matrices
  • Custom BRC analysis functions

What's Included

  • R 4.4+ with bioinformatics packages
  • Pre-computed phyloseq objects and ASV matrices
  • Custom BRC analysis functions
  • All required system dependencies (cmake, git, libcurl4-openssl-dev, libssl-dev, libxml2-dev, etc.)
  • renv for reproducible R package management

Analysis Workflow

Core Microbiome Selection Pipeline

The analysis workflow follows a sequential pipeline from quality control through publication-ready figures:

R/analysis
├── 001_quality_control.R
├── 002_phyloseq_import_qc.R
├── 003_summary_stats.R
├── 004_BC_core_selection.R
├── 004_core_selection_workflow.R
├── 005_cluster_threshold_analysis_SABR.R
├── 006_save_main_physeqs.R
├── 007_ordinations.R
├── 008_community_tests.R
└── 009_paper_figures.R

Quality Control & Data Preparation (001-003)

  • 001_quality_control.R - Initial quality filtering and sample screening
  • 002_phyloseq_import_qc.R - Import ASV data into phyloseq objects with quality checks
  • 003_summary_stats.R - Generate descriptive statistics and exploratory visualizations

Core Microbiome Identification (004-005)

  • 004_BC_core_selection.R - Core microbiome identification using Bray-Curtis dissimilarity contribution

    • Performs rarefaction normalization (5000 reads/sample, 50 iterations)
    • Uses identify_core() from the BRCore package to select ASVs contributing >2% to BC dissimilarity
    • Configurable parameters: priority_var, increase_value, abundance_weight, max_otus
    • Outputs: braycore_summary_004.rda
  • 004_core_selection_workflow.R - Extended core selection workflow

    • Comparative analysis of Bray-Curtis vs. prevalence-based core selection methods
    • Multi-threshold prevalence filtering (40%, 50%, 60%)
    • Overlays abundance-occupancy plots across selection methods
    • Calculates shared vs. unique ASV contributions to mean relative abundance
    • Neutral community model fitting with configurable starting parameters
    • Identifies methodological differences in core microbiome definition
  • 005_cluster_threshold_analysis_SABR.R - Clustering parameter sensitivity analysis

    • Evaluates core sequence proportion across OTU clustering identity thresholds (85-100%)
    • Tests occurrence thresholds (50-100%)

Data Export & Ordination (006-007)

  • 006_save_main_physeqs.R - Export phyloseq objects and corresponding FASTA files for downstream use
  • 007_ordinations.R - Community ordination analysis
    • Principal Coordinates Analysis (PCoA) with Bray-Curtis dissimilarity
    • Non-metric Multidimensional Scaling (NMDS)
    • Distance-based Redundancy Analysis (dbRDA)
    • Stratified by BRC institution and crop type

Statistical Testing & Publication (008-009)

  • 008_community_tests.R - Statistical hypothesis testing

    • PERMANOVA via adonis2() for community composition differences
    • Beta-dispersion analysis
    • Pairwise comparisons across metadata factors
  • 009_paper_figures.R - Publication-ready figure generation

    • Multi-panel layouts using patchwork
    • Consistent theming with custom BRC color schemes
    • Combined abundance-occupancy plots with core comparison metrics
    • Ordination plots with statistical overlays

Scripts for HPC Deployment

The following scripts are optimized for HPC execution via Singularity:

  • 004_BC_core_selection.R - Memory-intensive rarefaction and core identification
  • 004_core_selection_workflow.R - Multi-threshold analysis requiring parallel processing
  • 007_ordinations.R - Large-scale ordination across full dataset
  • 005_cluster_threshold_analysis_SABR.R - Computationally expensive parameter grid search

See Example 4 (HPC SLURM Batch Submission) below for SLURM integration.
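The four HPC-oriented scripts listed above could be run back to back with a small wrapper. The sketch below is an illustration, not part of the repository: the function name run_hpc_scripts, the RUN dry-run switch, and the OUT default are assumptions, and it assumes the scripts can be run in numeric order against the same output directory.

```shell
#!/bin/sh
# Sketch: run the HPC-oriented scripts one after another in the container.
IMAGE="${IMAGE:-interbrc-lite-container_v6.sif}"
OUT="${OUT:-$PWD/data/output}"   # host directory that receives the results
RUN="${RUN-echo}"                # prints each command by default; set RUN= (empty) to execute

run_hpc_scripts() {
  for script in 004_BC_core_selection.R \
                004_core_selection_workflow.R \
                005_cluster_threshold_analysis_SABR.R \
                007_ordinations.R; do
    # Stop at the first failing script so later steps do not run on missing output.
    $RUN singularity exec \
      --bind "$OUT:/opt/interbrc-core-analysis/data/output" \
      --no-home --pwd /opt/interbrc-core-analysis \
      "$IMAGE" Rscript "R/analysis/$script" || return 1
  done
}

run_hpc_scripts
```

With the default settings the wrapper only prints the four commands, which makes it easy to review paths before committing cluster time. Running it as `RUN= sh wrapper.sh` (a hypothetical file name) executes them.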

Quick Start

  1. Download the container:
singularity pull docker://ghcr.io/germs-lab/interbrc-core-analysis/interbrc-lite-container:v6
  2. Run an analysis:
singularity exec --no-home --pwd /opt/interbrc-core-analysis interbrc-lite-container_v6.sif Rscript "R/analysis/004_BC_core_selection.R"
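Before pulling the image, it can help to confirm that the singularity command is on your PATH. The helper below is a minimal sketch. The name require_cmd is hypothetical, and the `module load singularity` hint assumes a module-based cluster like the one used in the SLURM example.

```shell
#!/bin/sh
# Sketch: fail early with a clear message when a required command is missing.
require_cmd() {
  command -v "$1" >/dev/null 2>&1 || {
    echo "required command not found: $1" >&2
    return 1
  }
}

# On clusters that use environment modules, loading the module is usually the fix.
require_cmd singularity || echo "hint: try 'module load singularity' first"
```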

Usage Instructions

Docker and Singularity differ in how they handle the file system and R library paths. By default, Singularity binds several host directories (including your home directory) into the container, which can cause R to look for packages on the host system. To keep the analyses isolated, we avoid mounting the home directory of the local machine or HPC account.

Instead, we bind only a host output directory to the container's output directory so that the analysis results are saved on the host system. The --no-home flag prevents Singularity from mounting your host home directory, so everything (R executable, scripts, libraries) comes from the container's file system. See the Singularity documentation on the --bind and --no-home flags for details. To run a modified R script, either rebuild the container with the updated contents or bind a host copy of the R directory over the container's, as Example 4 does.
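The pieces described above (one output bind, --no-home, and --pwd set to the project root) can be assembled by a small helper. This is a sketch under stated assumptions: exec_cmd is a hypothetical function, not something shipped with the repository, and it only prints the command so you can review it before running.

```shell
#!/bin/sh
# Sketch: assemble the isolated `singularity exec` call for one analysis script.
ROOT=/opt/interbrc-core-analysis   # project root inside the container

exec_cmd() {
  # $1 = host output directory, $2 = .sif image, $3 = script path relative to ROOT
  printf 'singularity exec --bind %s:%s/data/output --no-home --pwd %s %s Rscript %s\n' \
    "$1" "$ROOT" "$ROOT" "$2" "$3"
}

# Print the command for review before running it.
exec_cmd /path/to/your/output interbrc-lite-container_v6.sif R/analysis/004_BC_core_selection.R
```

The printed line matches the form used in the examples in this section. The helper does not quote its arguments, so paths containing spaces would need extra care.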

Example 1: Core Selection Analysis

singularity exec \
  --bind /path/to/your/output:/opt/interbrc-core-analysis/data/output \
  --no-home --pwd /opt/interbrc-core-analysis \
  interbrc-lite-container_v6.sif Rscript "R/analysis/004_BC_core_selection.R"

Example 2: Full Ordination Analysis (HPC)

singularity exec \
  --bind /path/to/your/output:/opt/interbrc-core-analysis/data/output \
  --no-home --pwd /opt/interbrc-core-analysis \
  interbrc-lite-container_v6.sif Rscript "R/analysis/007_ordinations.R"

Example 3: Multiple Bind Mounts

singularity exec \
  --bind /path/to/your/output:/opt/interbrc-core-analysis/data/output \
  --bind /path/to/your/plots:/opt/interbrc-core-analysis/data/output/plots \
  --no-home --pwd /opt/interbrc-core-analysis \
  interbrc-lite-container_v6.sif Rscript "R/analysis/007_ordinations_full.R"

Example 4: HPC SLURM Batch Submission

#!/bin/bash

#SBATCH --nodes=1   # Number of nodes to use
#SBATCH --ntasks-per-node=16   # Use 16 processor cores per node
#SBATCH --time=3-0:0:0   # Walltime limit (DD-HH:MM:SS)
#SBATCH --mem=256G   # Maximum memory per node
#SBATCH --job-name="interbrc_core_sel"   # Job name to display in squeue
#SBATCH --mail-user=YOUREMAIL@iastate.edu   # Email address
#SBATCH --mail-type=ALL
#SBATCH --output="YOURPATH/slurm-%j-004_core_selection.out"   # Job standard output file (%j will be replaced by the slurm job id)
#SBATCH --error="YOURPATH/slurm-%j-004_core_selection.error"   # Job standard error file (%j will be replaced by the slurm job id)


#export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK # Set OMP_NUM_THREADS to the number of CPUs per task we asked for.

##Modules/Singularity
module purge
#module load micromamba/1.4.2-lcemqbe # Latest in Nova
module load singularity


# Basic session info
echo Start Job
echo nodes: $SLURM_JOB_NODELIST
echo job id: $SLURM_JOB_ID
echo Number of tasks: $SLURM_NTASKS

# Run the R script in the Singularity CE container
IMAGE_SIF=interbrc-lite-container_v6.sif
R_SCRIPT="R/analysis/004_core_selection_HPC.R"

# Bind the host R directory (so edited scripts are used) and the output directory
singularity exec \
  --bind "$PWD/R:/opt/interbrc-core-analysis/R" \
  --bind "$PWD/data/output:/opt/interbrc-core-analysis/data/output" \
  --no-home \
  --pwd /opt/interbrc-core-analysis \
  "$IMAGE_SIF" Rscript "$R_SCRIPT"


module purge

echo End Job

Key Point: The .sif image is read-only. Without a bind mount to a writable host directory, output written inside the container is not saved to the host and is lost when the container exits.
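Because results only persist through the bind mount, it is worth checking the host side before submitting a long job. The following is a minimal pre-flight sketch. The function name check_bind_dir and the OUT default are assumptions, and it assumes Singularity expects the bind source to already exist on the host.

```shell
#!/bin/sh
# Sketch: verify that a host directory can serve as a writable bind source.
check_bind_dir() {
  if [ ! -d "$1" ]; then
    echo "missing bind source: $1" >&2
    return 1
  fi
  if [ ! -w "$1" ]; then
    echo "bind source not writable: $1" >&2
    return 1
  fi
}

OUT="${OUT:-$PWD/data/output}"
mkdir -p "$OUT"                      # create the directory if it is not there yet
check_bind_dir "$OUT" && echo "ok: $OUT"
```

Placing a check like this near the top of a batch script means a mistyped output path fails immediately instead of after hours of computation.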

Build Process

You can either build the Docker container and then convert it to Singularity (preferable if you want to modify components) or build the Singularity container directly.

Method 1: Pre-built Container (Recommended for Users)

Direct download:

singularity pull docker://ghcr.io/germs-lab/interbrc-core-analysis/interbrc-lite-container:v6

With authentication (if required):

export SINGULARITY_DOCKER_USERNAME=username
export SINGULARITY_DOCKER_PASSWORD=password
singularity build interbrc-lite-container_v6.sif docker://ghcr.io/germs-lab/interbrc-core-analysis/interbrc-lite-container:v6

Method 2: Docker → Singularity (For Customization)

Local build:

# Clone repository first
docker build -t interbrc-lite-container:v6 .
# Convert to Singularity
singularity build interbrc-lite-container_v6.sif docker-daemon://interbrc-lite-container:v6

Remote build:

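docker pull ghcr.io/germs-lab/interbrc-core-analysis/interbrc-lite-container:v6
# The pulled image keeps its full registry name in the local Docker daemon
singularity build interbrc-lite-container_v6.sif docker-daemon://ghcr.io/germs-lab/interbrc-core-analysis/interbrc-lite-container:v6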

Method 3: Direct Singularity Build

Remember to log in to Singularity first (see Singularity remote login & build environment).

export SINGULARITY_DOCKER_USERNAME=username
export SINGULARITY_DOCKER_PASSWORD=password
singularity build interbrc-lite-container_v6.sif docker://ghcr.io/germs-lab/interbrc-core-analysis/interbrc-lite-container:v6

Container Structure

/opt/interbrc-core-analysis/
├── Dockerfile
├── renv.lock
├── data/output/
│   ├── asv_matrices.rda
│   └── phyloseq_objects/filtered_phyloseq.rda
├── R/
│   ├── analysis/
│   ├── functions/
│   ├── references/
│   └── utils/
└── renv/
/opt/renvcache/          # renv package cache
/opt/Rlibsymlinks/       # R library symlinks

Troubleshooting

Library Path Issues

If R cannot find packages, ensure you're using isolation flags:

singularity exec --no-home --pwd /opt/interbrc-core-analysis container.sif Rscript script.R

Environment Variables (if needed)

SINGULARITYENV_RENV_PATHS_CACHE=/opt/renvcache \
SINGULARITYENV_R_LIBS=/opt/Rlibsymlinks \
singularity exec --no-home --pwd /opt/interbrc-core-analysis container.sif Rscript script.R
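To see which library paths R actually uses inside the container, you can ask R directly. The sketch below is illustrative: check_r_paths and the RUN dry-run switch are hypothetical names, and the image file name is taken from the examples earlier in this document.

```shell
#!/bin/sh
# Sketch: inspect R's library search path and renv cache setting inside the container.
IMAGE="${IMAGE:-interbrc-lite-container_v6.sif}"
RUN="${RUN-echo}"   # prints the commands by default; set RUN= (empty) to execute them

check_r_paths() {
  # Library search path as seen by R in the container.
  $RUN singularity exec --no-home --pwd /opt/interbrc-core-analysis \
    "$IMAGE" Rscript -e 'print(.libPaths())'
  # renv cache location as seen by R (an empty string means the variable is unset).
  $RUN singularity exec --no-home --pwd /opt/interbrc-core-analysis \
    "$IMAGE" Rscript -e 'print(Sys.getenv("RENV_PATHS_CACHE"))'
}

check_r_paths
```

If the first command reports host paths rather than locations under /opt, the isolation flags or the environment variables above are the first things to revisit. When copying the printed commands by hand, restore the single quotes around the R expression.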

Authentication Issues

For private repositories, ensure proper Docker credentials are set before building.
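One way to avoid typing a password into your shell history is to load it from a file that only you can read. This is a sketch, not the project's documented procedure: load_registry_credentials and the token file location are hypothetical, and for ghcr.io the password is normally a GitHub personal access token.

```shell
#!/bin/sh
# Sketch: export Singularity's Docker credentials from a private token file.
load_registry_credentials() {
  # $1 = registry username, $2 = path to a file containing only the token
  if [ ! -r "$2" ]; then
    echo "cannot read token file: $2" >&2
    return 1
  fi
  SINGULARITY_DOCKER_USERNAME="$1"
  SINGULARITY_DOCKER_PASSWORD=$(cat "$2")
  export SINGULARITY_DOCKER_USERNAME SINGULARITY_DOCKER_PASSWORD
}

# Example call (hypothetical file location), followed by the usual pull:
# load_registry_credentials username "$HOME/.ghcr_token" && \
#   singularity pull docker://ghcr.io/germs-lab/interbrc-core-analysis/interbrc-lite-container:v6
```

Restricting the token file with `chmod 600` keeps it unreadable to other users on a shared cluster.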

Important Notes

  • The container is read-only once built as a Singularity .sif file
  • It is designed for specific analysis workflows, not as a general-purpose R environment
  • Ordination analyses require adequate computational resources
  • It is best suited for HPC environments with Singularity support
  • The Docker container runs isolated with its own file system
  • Converting from Docker to Singularity changes how the file system is exposed, so the renv cache and R library paths must be managed explicitly

Last updated: 2025-07-30
Maintained by: @jibarozzo

About

Analysis for Inter-BRC core manuscript
