pridepy

pridepy is a Python client and CLI for the PRIDE Archive API.

You can:

download public and private PRIDE files
download by category (RAW, SEARCH, RESULT, etc.)
stream project and file metadata
search projects by keyword and filters
download raw files from ProteomeXchange XML metadata

The downloader supports ftp, aspera, s3, and globus.
By default it starts with FTP, falls back across the remaining protocols when needed, and validates downloaded files (non-empty, and checksum validation when enabled).
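The fallback behaviour described above can be pictured with a small sketch. This is not pridepy's implementation: the `downloaders` mapping and both function names are hypothetical stand-ins, and only the protocol order and the non-empty check come from this README.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, Optional

# Order as stated in this README; everything else here is illustrative.
PROTOCOL_ORDER = ("ftp", "aspera", "s3", "globus")


def is_valid_download(path: Path) -> bool:
    """Treat a file as valid if it exists and is non-empty."""
    return path.is_file() and path.stat().st_size > 0


def download_with_fallback(
    target: Path,
    downloaders: Dict[str, Callable[[Path], None]],
    order: Iterable[str] = PROTOCOL_ORDER,
) -> Optional[str]:
    """Try each protocol in order; return the first that yields a valid file."""
    for protocol in order:
        fetch = downloaders.get(protocol)
        if fetch is None:
            continue  # no (hypothetical) downloader registered for this protocol
        try:
            fetch(target)
        except Exception:
            continue  # any failure moves on to the next protocol
        if is_valid_download(target):
            return protocol
    return None
```

An empty file counts as a failed attempt here, so the loop moves on to the next protocol rather than accepting it.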

Requirements

Python >=3.9

Installation

Option 1: Install from PyPI with uv (recommended)

Install as a CLI tool:

uv tool install pridepy
pridepy --help

Or run without installing globally:

uvx pridepy --help

Option 2: Install from PyPI with pip

pip install --upgrade pridepy
pridepy --help

Option 3: Install from source (development)

git clone https://github.com/PRIDE-Archive/pridepy
cd pridepy
uv sync --extra dev
uv run pridepy --help

Quick Start (New Users)

1) Download all raw files for a project (robust mode)

pridepy download-all-public-raw-files \
  -a PXD008644 \
  -o ./downloads/PXD008644 \
  --checksum-check

What this does:

the default protocol (ftp) is tried first, then the downloader falls back through the remaining protocols (ftp -> aspera -> s3 -> globus)
--checksum-check downloads project checksums and validates files
empty/corrupt files are retried automatically
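To make the checksum step concrete, here is a minimal sketch of validating one local file against an expected digest. The function names are hypothetical, and the SHA-1 default is an assumption; check the project's checksum file for the algorithm PRIDE actually uses.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, algorithm: str = "sha1", chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large raw files are never loaded whole."""
    digest = hashlib.new(algorithm)  # "sha1" is an assumed default
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: Path, expected_hex: str, algorithm: str = "sha1") -> bool:
    """Compare the local digest with the expected one, ignoring case and whitespace."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()
```

A file for which `matches_checksum` returns False would be a candidate for a retry.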

2) Continue interrupted downloads safely

pridepy download-all-public-raw-files \
  -a PXD008644 \
  -o ./downloads/PXD008644 \
  --skip-if-downloaded-already \
  --checksum-check
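The idea behind resuming can be sketched as a filter over the expected file names. This assumes "already downloaded" means "present and non-empty in the output directory"; pridepy may apply stricter checks, and `files_still_needed` is a hypothetical helper, not part of the package.

```python
from pathlib import Path
from typing import Iterable, List


def files_still_needed(file_names: Iterable[str], output_dir: Path) -> List[str]:
    """Return the names that are missing or empty locally, in their original order."""
    pending = []
    for name in file_names:
        local = output_dir / name
        if local.is_file() and local.stat().st_size > 0:
            continue  # assumed to be complete; a checksum pass can confirm this
        pending.append(name)
    return pending
```

Combining this kind of skip with a checksum pass guards against files that exist but were truncated by the interruption.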

3) Download only selected categories

pridepy download-all-public-category-files \
  -a PXD022105 \
  -o ./downloads/PXD022105 \
  -c RAW,SEARCH

4) Download one file by name

pridepy download-file-by-name \
  -a PXD022105 \
  -f checksum.txt \
  -o ./downloads/PXD022105 \
  --checksum-check

5) Download raw files from ProteomeXchange

pridepy download-px-raw-files \
  -a PXD039236 \
  -o ./downloads/PXD039236

6) Download a named subset of files (manifest)

pridepy download-files-by-list \
  -a PXD001819 \
  -F files.txt \
  -o ./downloads/PXD001819 \
  --checksum-check

files.txt is one filename per line (blank lines and # comments are ignored). Internally each filename is resolved against the project metadata API and downloaded via the same batch + protocol-fallback engine as download-all-public-raw-files. Use -f a.raw,b.raw,c.raw instead of -F for a small inline list.
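The manifest rules above (one filename per line, blank lines and # comments ignored) can be sketched as follows. `parse_manifest` is a hypothetical helper, and treating only whole-line comments as comments is an assumption.

```python
from typing import Iterable, List


def parse_manifest(lines: Iterable[str]) -> List[str]:
    """Return filenames from manifest lines, skipping blanks and # comment lines."""
    names = []
    for raw in lines:
        entry = raw.strip()
        if not entry or entry.startswith("#"):
            continue  # blank line or whole-line comment
        names.append(entry)
    return names
```

For a file on disk, `parse_manifest(open("files.txt", encoding="utf-8"))` would yield the list of names to request.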

Useful options:

-p globus — use the globus download strategy (HTTP Range + resume)
-w 3 — download up to 3 files in parallel (globus only, max 3)
--checksum-check — validate files against PRIDE checksums after download
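As an illustration of bounded parallelism like the -w option, the sketch below caps a thread pool at three workers. The cap of 3 comes from this README; `effective_workers`, `run_parallel`, and the `fetch` callable are hypothetical and do not reflect how pridepy schedules downloads.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable

MAX_PARALLEL = 3  # upper bound stated for -w in this README


def effective_workers(requested: int) -> int:
    """Clamp the requested worker count to the range 1..MAX_PARALLEL."""
    return max(1, min(requested, MAX_PARALLEL))


def run_parallel(
    names: Iterable[str],
    fetch: Callable[[str], bool],
    workers: int = 1,
) -> Dict[str, bool]:
    """Run fetch(name) for every name with a capped pool; map name -> success flag."""
    ordered = list(names)
    with ThreadPoolExecutor(max_workers=effective_workers(workers)) as pool:
        outcomes = list(pool.map(fetch, ordered))
    return dict(zip(ordered, outcomes))
```

Here `fetch` is expected to return True or False rather than raise, so one failed file does not abort the rest of the batch.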

7) Download files from raw URLs

pridepy download-files-by-url \
  -F urls.txt \
  -o ./downloads/urls

urls.txt is one fully-qualified URL per line. Schemes http, https, and ftp are dispatched to the matching downloader. Use -u/--urls for one or more comma-separated URLs, e.g. --urls https://a.com/x.raw,ftp://b.com/y.raw. Note: URLs containing literal commas are not supported with --urls; use a manifest file (-F) instead.
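The comma limitation follows directly from how a comma-separated value is split. The sketch below shows the splitting and scheme grouping in plain Python; the helper names are hypothetical, and raising on an unsupported scheme is a choice made for this example, not documented pridepy behaviour.

```python
from typing import Dict, Iterable, List
from urllib.parse import urlparse

SUPPORTED_SCHEMES = ("http", "https", "ftp")  # schemes listed in this README


def split_urls(value: str) -> List[str]:
    """Split a comma-separated --urls style value into individual URLs."""
    return [part.strip() for part in value.split(",") if part.strip()]


def group_by_scheme(urls: Iterable[str]) -> Dict[str, List[str]]:
    """Group URLs by scheme so each group can go to a matching downloader."""
    groups: Dict[str, List[str]] = {}
    for url in urls:
        scheme = urlparse(url).scheme.lower()
        if scheme not in SUPPORTED_SCHEMES:
            raise ValueError(f"unsupported scheme in {url!r}")
        groups.setdefault(scheme, []).append(url)
    return groups
```

A URL such as `https://a.com/x,1.raw` would be cut into two pieces by `split_urls`, which is why a manifest file is the safer route for such URLs.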

Useful options:

-p globus — use globus download strategy for http/https URLs (resume-capable)
-w 3 — download up to 3 files in parallel (globus only, max 3)
--checksum-check — validate against PRIDE checksums (accession inferred from PRIDE URL paths; only PRIDE archive URLs are supported)

CLI Command Overview

pridepy --help

Main commands:

download-all-public-raw-files
download-all-public-category-files
download-file-by-name
download-files-by-list
download-files-by-url
download-px-raw-files
list-private-files
stream-files-metadata
stream-projects-metadata
search-projects-by-keywords-and-filters

More CLI Examples

Search projects

pridepy search-projects-by-keywords-and-filters \
  -k human \
  -f projectTags==ProteomeTools,organismsPart==Pancreas \
  -sd DESC \
  -sf accession \
  -sf submissionDate

Stream all project metadata to JSON

pridepy stream-projects-metadata -o all_pride_projects.json

Stream all file metadata for one accession

pridepy stream-files-metadata -a PXD005011 -o PXD005011_files.json

Download private files

List files:

pridepy list-private-files -a PXD022105 -u YOUR_USER -p YOUR_PASSWORD

Download a private file:

pridepy download-file-by-name \
  -a PXD022105 \
  -f checksum.txt \
  -o ./downloads/private \
  --username YOUR_USER \
  --password YOUR_PASSWORD
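To keep credentials out of scripts, one option is to read them from environment variables and build the same command shown above. This is a sketch: the variable names PRIDE_USER and PRIDE_PASSWORD and the helper itself are hypothetical; only the command name and flags are taken from this README.

```python
import os
from typing import List, Mapping


def private_download_command(
    accession: str,
    file_name: str,
    output_dir: str,
    env: Mapping[str, str] = os.environ,
) -> List[str]:
    """Build the argument list for download-file-by-name with credentials from env."""
    user = env.get("PRIDE_USER")          # hypothetical variable name
    password = env.get("PRIDE_PASSWORD")  # hypothetical variable name
    if not user or not password:
        raise RuntimeError("set PRIDE_USER and PRIDE_PASSWORD first")
    return [
        "pridepy", "download-file-by-name",
        "-a", accession,
        "-f", file_name,
        "-o", output_dir,
        "--username", user,
        "--password", password,
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. The password still appears in the process arguments while the command runs, so this only keeps it out of the script itself.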

Python API Examples

Example: get raw files for a project

from pridepy.files.files import Files

files = Files()
raw_files = files.get_all_raw_file_list("PXD008644")
print(f"RAW files: {len(raw_files)}")
print(raw_files[0]["fileName"])
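A list like `raw_files` can be turned into a manifest for download-files-by-list. The sketch below relies only on the "fileName" key used in the example above; the helper name and the suffix filter are illustrative assumptions.

```python
from typing import Iterable, List, Mapping


def manifest_lines(records: Iterable[Mapping[str, str]], suffix: str = ".raw") -> List[str]:
    """Collect unique, sorted file names whose extension matches suffix (case-insensitive)."""
    names = {
        record["fileName"]
        for record in records
        if record["fileName"].lower().endswith(suffix.lower())
    }
    return sorted(names)
```

Writing `"\n".join(manifest_lines(raw_files))` to files.txt gives one filename per line, the format expected by -F.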

Example: search projects

from pridepy.project.project import Project

project = Project()
results = project.search_by_keywords_and_filters(
    keyword="PXD009476",
    query_filter="",
    page_size=25,
    page=0,
    sort_direction="DESC",
    sort_fields="accession",
)
print(f"Hits: {len(results)}")

Development and Release (uv)

Run tests:

uv run pytest

Lint:

uv run flake8 .

Build distributions:

uv build

pridepy is published via GitHub Actions (.github/workflows/python-publish.yml) using uv build and a PyPI API token secret (PYPI_API_TOKEN).

White Paper

A white paper is available in paper/paper.md.

Contributing

Fork the repository
Create a branch (git checkout -b feature/my-change)
Install dev dependencies (uv sync --extra dev)
Run tests and lint (uv run pytest, uv run flake8 .)
Commit and push your branch
Open a pull request

Citation

Kamatchinathan, S., Hewapathirana, S., Bandla, C., Insua, S., Vizcaíno, J. A., & Perez-Riverol, Y. (2025). pridepy: A Python package to download and search data from PRIDE database. Journal of Open Source Software, 10(107), 7563. doi:10.21105/joss.07563
