pridepy is a Python client and CLI for the PRIDE Archive API.
You can:
- download public and private PRIDE files
- download by category (RAW, SEARCH, RESULT, etc.)
- stream project and file metadata
- search projects by keyword and filters
- download raw files from ProteomeXchange XML metadata
The downloader supports ftp, aspera, s3, and globus.
By default it starts with FTP, falls back across the remaining protocols when needed, and validates downloaded files (non-empty, and checksum validation when enabled).
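
The fallback behaviour described above can be sketched roughly as follows. This is a minimal illustration, not pridepy's actual implementation: only the protocol order comes from this README, while `download_with_fallback` and the `downloaders` mapping are hypothetical names introduced for the example.

```python
from pathlib import Path
from typing import Callable, Dict

# Fallback order as stated in this README; everything else is hypothetical.
PROTOCOL_ORDER = ("ftp", "aspera", "s3", "globus")

# A downloader takes a file name and a target path and writes the file there.
Downloader = Callable[[str, Path], None]


def download_with_fallback(
    file_name: str,
    target: Path,
    downloaders: Dict[str, Downloader],
    start: str = "ftp",
) -> str:
    """Try `start` first, then the remaining protocols in order.

    Returns the name of the first protocol that produced a non-empty file.
    """
    order = [start] + [p for p in PROTOCOL_ORDER if p != start]
    errors: Dict[str, Exception] = {}
    for protocol in order:
        fetch = downloaders.get(protocol)
        if fetch is None:
            continue  # protocol not available in this sketch
        try:
            fetch(file_name, target)
        except Exception as exc:  # any failure triggers the next protocol
            errors[protocol] = exc
            continue
        # Minimal validation: the file must exist and be non-empty.
        if target.exists() and target.stat().st_size > 0:
            return protocol
        errors[protocol] = ValueError("downloaded file is empty")
    raise RuntimeError(f"all protocols failed for {file_name}: {errors}")
```

Passing the downloaders in as plain callables keeps the ordering logic separate from any one transport, which also makes it easy to exercise with stubs.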
- Python >=3.9
Install as a CLI tool:
uv tool install pridepy
pridepy --help

Or run without installing globally:
uvx pridepy --help

Or install with pip:
pip install --upgrade pridepy
pridepy --help

For development, install from source:
git clone https://github.com/PRIDE-Archive/pridepy
cd pridepy
uv sync --extra dev
uv run pridepy --help

pridepy download-all-public-raw-files \
-a PXD008644 \
-o ./downloads/PXD008644 \
--checksum-check

What this does:
- the default protocol, ftp, starts with FTP and falls back (ftp -> aspera -> s3 -> globus)
- --checksum-check downloads project checksums and validates files
- empty/corrupt files are retried automatically

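
The validate-and-retry step can be pictured with the sketch below. It is an illustration under stated assumptions rather than pridepy's code: the SHA-1 default, the retry count of 3, and the helper names (`file_digest`, `is_valid`, `fetch_until_valid`) are all hypothetical choices made for the example.

```python
import hashlib
from pathlib import Path
from typing import Callable, Optional


def file_digest(path: Path, algorithm: str = "sha1") -> str:
    """Hash a file in 1 MiB chunks so large RAW files are not loaded into memory."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_valid(path: Path, expected: Optional[str], algorithm: str = "sha1") -> bool:
    """A file is valid if it is non-empty and, when a checksum is given, matches it."""
    if not path.exists() or path.stat().st_size == 0:
        return False
    if expected is None:  # checksum validation not enabled
        return True
    return file_digest(path, algorithm) == expected.lower()


def fetch_until_valid(
    fetch: Callable[[Path], None],
    path: Path,
    expected: Optional[str],
    attempts: int = 3,  # assumed retry budget, not a documented pridepy value
) -> bool:
    """Re-run `fetch` until the file validates or the attempts are used up."""
    for _ in range(attempts):
        fetch(path)
        if is_valid(path, expected):
            return True
    return False
```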
pridepy download-all-public-raw-files \
-a PXD008644 \
-o ./downloads/PXD008644 \
--skip-if-downloaded-already \
--checksum-check

pridepy download-all-public-category-files \
-a PXD022105 \
-o ./downloads/PXD022105 \
-c RAW,SEARCH

pridepy download-file-by-name \
-a PXD022105 \
-f checksum.txt \
-o ./downloads/PXD022105 \
--checksum-check

pridepy download-px-raw-files \
-a PXD039236 \
-o ./downloads/PXD039236

pridepy download-files-by-list \
-a PXD001819 \
-F files.txt \
-o ./downloads/PXD001819 \
--checksum-check

files.txt is one filename per line (blank lines and # comments are
ignored). Internally each filename is resolved against the project metadata
API and downloaded via the same batch + protocol-fallback engine as
download-all-public-raw-files. Use -f a.raw,b.raw,c.raw instead of
-F for a small inline list.
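
As a rough picture of the manifest rules above, the sketch below parses both input forms. It assumes that `#` comments occupy a whole line (inline comments are not handled), and the function names `parse_manifest` and `parse_inline` are hypothetical, not part of pridepy.

```python
from typing import Iterable, List


def parse_manifest(lines: Iterable[str]) -> List[str]:
    """Return one filename per line, skipping blank lines and # comment lines."""
    names: List[str] = []
    for line in lines:
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue
        names.append(entry)
    return names


def parse_inline(value: str) -> List[str]:
    """Split a comma-separated -f value such as 'a.raw,b.raw,c.raw'."""
    return [part.strip() for part in value.split(",") if part.strip()]


if __name__ == "__main__":
    manifest = ["# batch 1", "a.raw", "", "  b.raw  ", "# done"]
    print(parse_manifest(manifest))
    print(parse_inline("a.raw,b.raw,c.raw"))
```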
Useful options:
- -p globus: use the globus download strategy (HTTP Range + resume)
- -w 3: download up to 3 files in parallel (globus only, max 3)
- --checksum-check: validate files against PRIDE checksums after download

pridepy download-files-by-url \
-F urls.txt \
-o ./downloads/urls

urls.txt is one fully-qualified URL per line. Schemes http, https, and
ftp are dispatched to the matching downloader. Use -u/--urls for one or
more comma-separated URLs, e.g. --urls https://a.com/x.raw,ftp://b.com/y.raw.
Note: URLs containing literal commas are not supported with --urls; use a
manifest file (-F) instead.
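
The scheme dispatch and the comma limitation can be illustrated with a small sketch. The helper names `split_urls` and `group_by_scheme` are hypothetical, and the real command may route URLs differently; only the three supported schemes come from this README.

```python
from collections import defaultdict
from typing import Dict, List
from urllib.parse import urlsplit

# Schemes listed in this README as supported by download-files-by-url.
SUPPORTED_SCHEMES = ("http", "https", "ftp")


def split_urls(value: str) -> List[str]:
    """Split a comma-separated --urls value.

    A URL that itself contains a comma is cut into pieces here, which is
    why such URLs need a manifest file instead.
    """
    return [part.strip() for part in value.split(",") if part.strip()]


def group_by_scheme(urls: List[str]) -> Dict[str, List[str]]:
    """Group URLs by scheme so each group can go to a matching downloader."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for url in urls:
        scheme = urlsplit(url).scheme.lower()
        if scheme not in SUPPORTED_SCHEMES:
            raise ValueError(f"unsupported URL scheme: {url!r}")
        groups[scheme].append(url)
    return dict(groups)


if __name__ == "__main__":
    urls = split_urls("https://a.com/x.raw,ftp://b.com/y.raw")
    print(group_by_scheme(urls))
```

Rejecting unknown schemes up front means a URL fragment produced by a stray comma fails early instead of being passed to a downloader.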
Useful options:
- -p globus: use globus download strategy for http/https URLs (resume-capable)
- -w 3: download up to 3 files in parallel (globus only, max 3)
- --checksum-check: validate against PRIDE checksums (accession inferred from PRIDE URL paths; only PRIDE archive URLs are supported)

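
One way to picture how an accession might be inferred from a PRIDE URL path is the sketch below. The pattern (`PXD` followed by six digits as a whole path segment), the function name `infer_accession`, and the example URL layout are assumptions made for illustration; pridepy's actual matching rules may differ.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# Assumed accession shape: "PXD" plus six digits, matching the examples in this README.
ACCESSION_PATTERN = re.compile(r"^PXD\d{6}$")


def infer_accession(url: str) -> Optional[str]:
    """Return the first path segment that looks like a PXD accession, else None."""
    for segment in urlsplit(url).path.split("/"):
        if ACCESSION_PATTERN.match(segment):
            return segment
    return None


if __name__ == "__main__":
    # Illustrative URL only; the path layout is an assumption.
    url = "https://ftp.pride.ebi.ac.uk/pride/data/archive/2018/10/PXD008644/sample.raw"
    print(infer_accession(url))
    print(infer_accession("https://a.com/x.raw"))
```

A URL that yields `None` here has no accession to look checksums up under, which is consistent with checksum validation being limited to PRIDE archive URLs.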
pridepy --help

Main commands:
- download-all-public-raw-files
- download-all-public-category-files
- download-file-by-name
- download-files-by-list
- download-files-by-url
- download-px-raw-files
- list-private-files
- stream-files-metadata
- stream-projects-metadata
- search-projects-by-keywords-and-filters

pridepy search-projects-by-keywords-and-filters \
-k human \
-f projectTags==ProteomeTools,organismsPart==Pancreas \
-sd DESC \
-sf accession \
-sf submissionDate

pridepy stream-projects-metadata -o all_pride_projects.json

pridepy stream-files-metadata -a PXD005011 -o PXD005011_files.json

List files:
pridepy list-private-files -a PXD022105 -u YOUR_USER -p YOUR_PASSWORD

Download a private file:
pridepy download-file-by-name \
-a PXD022105 \
-f checksum.txt \
-o ./downloads/private \
--username YOUR_USER \
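--password YOUR_PASSWORD

from pridepy.files.files import Files

# List the RAW files registered for a public project.
files = Files()
raw_files = files.get_all_raw_file_list("PXD008644")
print(f"RAW files: {len(raw_files)}")
# Guard against an empty result before indexing the first entry.
if raw_files:
    print(raw_files[0]["fileName"])

from pridepy.project.project import Project
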
# Search projects by keyword, sorted by accession in descending order.
project = Project()
results = project.search_by_keywords_and_filters(
    keyword="PXD009476",
    query_filter="",
    page_size=25,
    page=0,
    sort_direction="DESC",
    sort_fields="accession",
)
print(f"Hits: {len(results)}")

Run tests:
uv run pytest

Lint:
uv run flake8 .

Build distributions:
uv build

pridepy is published via GitHub Actions (.github/workflows/python-publish.yml) using uv build and a PyPI API token secret (PYPI_API_TOKEN).
A white paper is available in paper/paper.md.
- Fork the repository
- Create a branch (git checkout -b feature/my-change)
- Install dev dependencies (uv sync --extra dev)
- Run tests and lint (uv run pytest, uv run flake8 .)
- Commit and push your branch
- Open a pull request
Kamatchinathan, S., Hewapathirana, S., Bandla, C., Insua, S., Vizcaíno, J. A., & Perez-Riverol, Y. (2025). pridepy: A Python package to download and search data from PRIDE database. Journal of Open Source Software, 10(107), 7563. doi:10.21105/joss.07563