Audit Proteomics Exchange (PRIDE) study metadata from the command line.
PXAudit fetches a PRIDE dataset's project metadata and file list, classifies every file with a deterministic FileTypeClassifier, then places the dataset on a 7-tier FAIR ladder and assigns it a quantification-readiness tier. Results are written to a local SQLite database.
Requires Python >= 3.12. uv is the recommended runner.
git clone https://github.com/LangeLab/PXAudit.git
cd PXAudit
uv sync
uv run pxaudit --help
uv run pxaudit check PXD000001
On first run, PXAudit fetches project metadata and file lists from the PRIDE REST API and caches both responses under ~/.pxaudit_cache/. Subsequent runs for the same accession are instant (cache hits skip the network entirely). Audit results are written to pxaudit_results.db in the current directory.
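The cache-then-fetch behaviour described above can be sketched roughly as follows. This is a minimal illustration, not PXAudit's actual cache module: the function name load_or_fetch, the one-JSON-file-per-accession layout, and the fetch callback are all assumptions made for the example.

```python
import json
from pathlib import Path
from typing import Any, Callable


def load_or_fetch(
    accession: str,
    fetch: Callable[[str], dict[str, Any]],
    cache_dir: Path = Path.home() / ".pxaudit_cache",
) -> tuple[dict[str, Any], bool]:
    """Return (payload, was_cache_hit) for one accession.

    Hypothetical helper: `fetch` stands in for whatever performs the
    network request; the file naming scheme is assumed, not documented.
    """
    cache_file = cache_dir / f"{accession}.json"
    if cache_file.exists():
        # Cache hit: the stored response is returned and no request is made.
        return json.loads(cache_file.read_text(encoding="utf-8")), True

    # Cache miss: fetch once, then store the response for later runs.
    payload = fetch(accession)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(payload), encoding="utf-8")
    return payload, False
```

Calling the helper twice for the same accession invokes fetch only on the first call, which is the property that makes repeat runs fast.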
Audit a single Proteomics Exchange accession.
uv run pxaudit check PXD004683
uv run pxaudit check PXD004683 --no-cache # bypass local cache
uv run pxaudit check PXD004683 --db ~/audits/lab.db
Options: --refresh (re-fetch, update cache), --no-cache (skip cache reads), --db PATH (SQLite output path, default pxaudit_results.db).
Non-PRIDE accessions (MSV, JPST, IPX) are accepted without error and
assigned the Unverifiable tier; PXAudit only has access to the PRIDE API.
Audit multiple accessions in batch.
uv run pxaudit bulk-audit --input accessions.txt
uv run pxaudit bulk-audit --input accessions.txt --format tsv --output results.tsv
cat accessions.txt | uv run pxaudit bulk-audit --input -
Options: --format tsv|json|csv, --output PATH, --delay SECONDS, --continue-on-error, --overwrite.
List files for an accession from the audit database.
uv run pxaudit manifest PXD004683
uv run pxaudit manifest PXD004683 --format json
Options: --db PATH (default pxaudit_results.db), --format tsv|json.
Accession : PXD000001
Tier : Silver
Quant Tier: Partial
------------------------------------------------
Metadata
✔ Title TMT proteomics of human cell lines
✔ Organism Homo sapiens (9606)
✔ Instrument LTQ Orbitrap Velos
✘ Organism part annotated
✔ Publication linked
✘ Quant metadata (CV methods)
------------------------------------------------
Files (142 total)
✔ Result/Search files present
✔ PSI-standard results (mzIdentML / mzTab-ID)
✔ Open spectra (mzML / MGF)
✘ SDRF file present
✔ mzTab summary present
✘ Tabular quant table (proteinGroups / evidence)
------------------------------------------------
PXAudit scores each dataset on a 7-tier FAIR ladder. Each tier adds FAIR requirements to the one before it; a dataset must satisfy all criteria up to and including the tier it claims.
| Tier | Requirements |
|---|---|
| None | Missing a mandatory metadata field (title, organism, or instrument). |
| Raw | Mandatory metadata present; no processed result files found. |
| Bronze | Result/search files present, but none are PSI-standard (mzIdentML / mzTab). |
| Silver | PSI-standard results present; no SDRF experimental-design file. |
| Gold | SDRF present; open spectra (mzML / MGF) or organism-part annotation missing. |
| Platinum | Open spectra + organism-part annotation present; no linked PubMed publication. |
| Diamond | All FAIR criteria met: PSI results, SDRF, open spectra, organism part, and a publication. |
Tier logic is version-stamped (tier_logic_version = "v2.0") and stored in the database, so datasets scored under an older logic version can be identified and re-scored after a logic update.
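Read top to bottom, the ladder amounts to a chain of early returns: the first unmet requirement decides the tier. The sketch below follows the table above and is not the code in tier_engine.py; the Flags dataclass and its field names are hypothetical stand-ins for the has_* flags stored in the database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flags:
    """Hypothetical flag set; field names are assumed for this sketch."""

    has_title: bool
    has_organism: bool
    has_instrument: bool
    has_result_files: bool
    has_psi_results: bool
    has_sdrf: bool
    has_open_spectra: bool
    has_organism_part: bool
    has_publication: bool


def fair_tier(f: Flags) -> str:
    """Return the highest tier whose requirements are all satisfied."""
    if not (f.has_title and f.has_organism and f.has_instrument):
        return "None"  # a mandatory metadata field is missing
    if not f.has_result_files:
        return "Raw"  # metadata only, no processed results
    if not f.has_psi_results:
        return "Bronze"  # results exist, but none are PSI-standard
    if not f.has_sdrf:
        return "Silver"  # PSI results, no SDRF
    if not (f.has_open_spectra and f.has_organism_part):
        return "Gold"  # SDRF present, spectra or organism part missing
    if not f.has_publication:
        return "Platinum"  # everything except a linked publication
    return "Diamond"
```

With the flags from the PXD000001 example output (PSI results and open spectra present, SDRF and organism part missing), the function stops at the SDRF check and returns Silver, matching the report.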
The quant tier is independent of the FAIR tier and indicates quantification readiness.
| Quant Tier | Meaning |
|---|---|
| Unverifiable | Non-PRIDE accession; cannot be evaluated. |
| No Quant | No PSI-standard results and no tabular quant files. |
| Partial | Either PSI-standard IDs or a quant table, but not both. |
| Quant-Ready | PSI IDs + tabular quant table present; CV-term quantification metadata missing. |
| Quant-Complete | PSI IDs + tabular quant table + CV-term method metadata are fully described. |
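The quant tier table can be read as a small decision function over three signals plus the PRIDE check. The following is a sketch under that reading, with hypothetical parameter names; it is not taken from PXAudit's source.

```python
def quant_tier(
    *,
    is_pride: bool,
    has_psi_results: bool,
    has_quant_table: bool,
    has_quant_metadata: bool,
) -> str:
    """Map the signals from the table above to a quant tier label.

    Parameter names are assumed for illustration only.
    """
    if not is_pride:
        return "Unverifiable"  # cannot be evaluated at all
    if not has_psi_results and not has_quant_table:
        return "No Quant"
    if not (has_psi_results and has_quant_table):
        return "Partial"  # exactly one of the two is present
    if not has_quant_metadata:
        return "Quant-Ready"  # both present, CV-term metadata missing
    return "Quant-Complete"
```

For PXD000001, which has PSI-standard results but no tabular quant table, this yields Partial.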
The following scores were last verified against the live PRIDE REST API on 2026-03-21 and are included in the integration test suite.
| Accession | Tier | Quant Tier |
|---|---|---|
| PXD057701 | Raw | No Quant |
| PXD002244 | Bronze | No Quant |
| PXD000001 | Silver | Partial |
| PXD073444 | Platinum | Partial |
| PXD075811 | Platinum | Partial |
| PXD004683 | Diamond | Partial |
Every check run upserts rows into three tables in the SQLite database:
| Table | Description |
|---|---|
| study | One row per accession: title, organism, instrument, submission year and type, keywords. |
| study_files | One row per file: name, PRIDE category, extension, FTP URL, size in bytes. |
| audit | One row per accession: computed tier, quant tier, 13 has_* quality flags, files_fetch_failed, is_unverifiable, and tier_logic_version. |
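Because the results are stored in plain SQLite, they can also be read from Python with the standard sqlite3 module. The sketch below assumes only the audit columns that appear in this README's example queries (accession, tier, tier_logic_version); the helper names are hypothetical.

```python
import sqlite3


def tier_distribution(conn: sqlite3.Connection) -> list[tuple[str, int]]:
    """Count audited accessions per FAIR tier, most common first."""
    rows = conn.execute(
        "SELECT tier, COUNT(*) AS n FROM audit "
        "GROUP BY tier ORDER BY n DESC, tier"
    ).fetchall()
    return [(tier, n) for tier, n in rows]


def stale_accessions(
    conn: sqlite3.Connection, current_version: str = "v2.0"
) -> list[str]:
    """List accessions scored under a different tier logic version."""
    rows = conn.execute(
        "SELECT accession FROM audit "
        "WHERE tier_logic_version != ? ORDER BY accession",
        (current_version,),  # bound parameter instead of string formatting
    ).fetchall()
    return [row[0] for row in rows]


# Usage, assuming a database produced by an earlier check run:
#   conn = sqlite3.connect("pxaudit_results.db")
#   print(tier_distribution(conn))
#   conn.close()
```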
Example queries
-- Tier distribution across all audited datasets
SELECT tier, COUNT(*) AS n FROM audit GROUP BY tier ORDER BY n DESC;
-- All Diamond datasets
SELECT accession, quant_tier FROM audit WHERE tier = 'Diamond';
-- Datasets ready for re-scoring after a logic update
SELECT accession FROM audit WHERE tier_logic_version != 'v2.0';
-- File-type breakdown for a single accession
SELECT file_category, COUNT(*) AS n
FROM study_files
WHERE accession = 'PXD004683'
GROUP BY file_category;
uv sync
uv run pre-commit install
Pre-commit runs ruff (lint + format, line-length 100) and mypy (strict mode) on every commit. See the wiki for detailed reference documentation.
src/pxaudit/
├── cli.py # click entry points (check, bulk-audit, manifest)
├── tier_engine.py # 7-tier FAIR ladder + quant tier logic
├── file_classifier.py # deterministic FileClass assignment for every file type
├── pride_client.py # PRIDE REST API v3 client with pagination + retry/backoff
├── db.py # SQLite schema + upsert helpers + migrations
└── cache.py # local JSON response cache (~/.pxaudit_cache/)
# Unit tests (default, no network required)
uv run pytest
# With coverage report
uv run pytest --cov=pxaudit --cov-report=term-missing
# Live integration tests against the real PRIDE API (requires network)
uv run pytest -m integration -v --no-cov
The default run excludes integration tests (-m 'not integration' is set in pyproject.toml). The test suite has 455 unit tests with 100% branch coverage across all modules, plus 12 live integration tests covering six real PRIDE accessions.
- Reporting: pxaudit report --db results.db, generating tier distributions, SDRF adoption trends, metadata completeness over time, and an exemplar shortlist as a Quarto-rendered HTML report.
- Multi-repository: plugin adapters for MassIVE, jPOST, and iProX so non-PRIDE accessions are audited rather than marked Unverifiable.
Contributions and issue reports are welcome.
If you use PXAudit in your research, please cite it as:
@software{ergin_pxaudit_2026,
author = {Ergin, Enes Kemal},
title = {{PXAudit}: A command-line tool for auditing {Proteomics Exchange} study metadata},
year = {2026},
version = {0.3.0},
url = {https://github.com/LangeLab/PXAudit},
license = {MIT},
}
A CITATION.cff file is included in the repository root for tools that parse it automatically (e.g. GitHub's Cite this repository button, Zenodo).
MIT License. See LICENSE for details.