LangeLab/PXAudit: Audit PRIDE proteomics study metadata from the command line. Scores each dataset on a 7-tier FAIR ladder and a quantification-readiness axis. Results land in a local SQLite database.

Audit Proteomics Exchange (PRIDE) study metadata from the command line.

PXAudit fetches a PRIDE dataset's project metadata and file list, classifies every file with a deterministic FileTypeClassifier, then assigns the dataset a tier on a 7-tier FAIR ladder and a separate quantification-readiness tier. Results are written to a local SQLite database.

Installation

Requires Python >= 3.12. uv is the recommended runner.

git clone https://github.com/LangeLab/PXAudit.git
cd PXAudit
uv sync
uv run pxaudit --help

Quick Start

uv run pxaudit check PXD000001

On first run, PXAudit fetches project metadata and file lists from the PRIDE REST API and caches both responses under ~/.pxaudit_cache/. Subsequent runs for the same accession read from that cache and skip the network entirely. Audit results are written to pxaudit_results.db in the current directory.
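
The caching behaviour described above can be pictured with a minimal sketch. This is not PXAudit's actual cache.py: the function name, the one-file-per-accession naming, and the choice that a no-cache run also skips cache writes are all assumptions made for illustration.

```python
import json
from pathlib import Path
from typing import Callable


def fetch_with_cache(
    accession: str,
    fetch: Callable[[str], dict],
    cache_dir: Path,
    *,
    refresh: bool = False,
    no_cache: bool = False,
) -> dict:
    """Return the JSON payload for an accession, preferring the local cache."""
    cache_file = cache_dir / f"{accession}.json"  # hypothetical file naming

    # Cache hit: return the stored response without touching the network.
    if cache_file.exists() and not (refresh or no_cache):
        return json.loads(cache_file.read_text(encoding="utf-8"))

    payload = fetch(accession)

    # Assumption: a no-cache run leaves the cache untouched,
    # while a refresh run overwrites the stored response.
    if not no_cache:
        cache_dir.mkdir(parents=True, exist_ok=True)
        cache_file.write_text(json.dumps(payload), encoding="utf-8")
    return payload
```

Passing the network call in as `fetch` keeps the sketch testable offline; a real client would supply a function that queries the PRIDE REST API.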

Usage

`pxaudit check`

Audit a single Proteomics Exchange accession.

uv run pxaudit check PXD004683
uv run pxaudit check PXD004683 --no-cache   # bypass local cache
uv run pxaudit check PXD004683 --db ~/audits/lab.db

Options: --refresh (re-fetch, update cache), --no-cache (skip cache reads), --db PATH (SQLite output path, default pxaudit_results.db).

Non-PRIDE accessions (MSV, JPST, IPX) are accepted without error and assigned the Unverifiable tier; PXAudit only has access to the PRIDE API.
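
One way to picture the routing of non-PRIDE accessions is a prefix check. The sketch below is illustrative only: the function names, the regular expression, and the decision to raise on unknown prefixes are assumptions, and it treats every PXD accession as a PRIDE dataset.

```python
import re

# Prefixes named in the text above, mapped to their repositories.
_PREFIX_TO_REPOSITORY = {
    "PXD": "PRIDE",
    "MSV": "MassIVE",
    "JPST": "jPOST",
    "IPX": "iProX",
}
_ACCESSION_RE = re.compile(r"^(PXD|MSV|JPST|IPX)(\d+)$")


def repository_for(accession: str) -> str:
    """Map an accession to its repository based on the prefix alone."""
    match = _ACCESSION_RE.match(accession.strip().upper())
    if match is None:
        # Assumption: unknown shapes are rejected rather than guessed at.
        raise ValueError(f"unrecognised accession: {accession!r}")
    return _PREFIX_TO_REPOSITORY[match.group(1)]


def is_unverifiable(accession: str) -> bool:
    """True for accessions that cannot be scored through the PRIDE API."""
    return repository_for(accession) != "PRIDE"
```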

`pxaudit bulk-audit`

Audit multiple accessions in batch.

uv run pxaudit bulk-audit --input accessions.txt
uv run pxaudit bulk-audit --input accessions.txt --format tsv --output results.tsv
cat accessions.txt | uv run pxaudit bulk-audit --input -

Options: --format tsv|json|csv, --output PATH, --delay SECONDS, --continue-on-error, --overwrite.
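
The `--input -` form in the examples above reads the accession list from standard input. A minimal sketch of that convention follows; the helper names are hypothetical, and skipping blank lines, skipping `#` comments, and dropping duplicates are assumptions rather than documented behaviour.

```python
import sys
from typing import Iterable, TextIO


def parse_accessions(lines: Iterable[str]) -> list[str]:
    """Collect unique accessions, one per line, preserving input order."""
    seen: dict[str, None] = {}
    for line in lines:
        token = line.strip()
        # Assumption: blank lines and '#' comments are ignored.
        if not token or token.startswith("#"):
            continue
        seen.setdefault(token, None)
    return list(seen)


def read_accessions(source: str, stdin: TextIO = sys.stdin) -> list[str]:
    """Read accessions from a file path, or from stdin when source is '-'."""
    if source == "-":
        return parse_accessions(stdin)
    with open(source, encoding="utf-8") as handle:
        return parse_accessions(handle)
```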

`pxaudit manifest`

List files for an accession from the audit database.

uv run pxaudit manifest PXD004683
uv run pxaudit manifest PXD004683 --format json

Options: --db PATH (default pxaudit_results.db), --format tsv|json.

Example Output

Accession : PXD000001
Tier      : Silver
Quant Tier: Partial
------------------------------------------------
Metadata
  ✔ Title         TMT proteomics of human cell lines
  ✔ Organism      Homo sapiens (9606)
  ✔ Instrument    LTQ Orbitrap Velos
  ✘ Organism part annotated
  ✔ Publication   linked
  ✘ Quant metadata (CV methods)
------------------------------------------------
Files (142 total)
  ✔ Result/Search files present
  ✔ PSI-standard results (mzIdentML / mzTab-ID)
  ✔ Open spectra (mzML / MGF)
  ✘ SDRF file present
  ✔ mzTab summary present
  ✘ Tabular quant table (proteinGroups / evidence)
------------------------------------------------

Tier System

PXAudit scores each dataset on a 7-tier FAIR ladder. Each tier adds requirements to the one below it; a dataset is assigned the highest tier for which it satisfies every criterion up to and including that tier.

Tier	Requirements
None	Missing a mandatory metadata field (title, organism, or instrument).
Raw	Mandatory metadata present; no processed result files found.
Bronze	Result/search files present, but none are PSI-standard (mzIdentML / mzTab).
Silver	PSI-standard results present; no SDRF experimental-design file.
Gold	SDRF present; open spectra (mzML / MGF) or organism-part annotation missing.
Platinum	Open spectra + organism-part annotation present; no linked PubMed publication.
Diamond	All FAIR criteria met: PSI results, SDRF, open spectra, organism part, and a publication.

Tier logic is version-stamped (tier_logic_version = "v2.0") and stored in the database so that re-scoring after a logic update can be detected.
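
The ladder in the table above can be read as a chain of early returns: walk upward and stop at the first unmet requirement. The sketch below follows the table row by row; it is not the code in tier_engine.py, and the `Flags` field names are hypothetical stand-ins for the real `has_*` flags.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flags:
    """Hypothetical flag names; the real has_* columns may differ."""

    has_title: bool = False
    has_organism: bool = False
    has_instrument: bool = False
    has_result_files: bool = False
    has_psi_results: bool = False
    has_sdrf: bool = False
    has_open_spectra: bool = False
    has_organism_part: bool = False
    has_publication: bool = False


def fair_tier(f: Flags) -> str:
    """Return the highest tier whose cumulative requirements are all met."""
    if not (f.has_title and f.has_organism and f.has_instrument):
        return "None"
    if not f.has_result_files:
        return "Raw"
    if not f.has_psi_results:
        return "Bronze"
    if not f.has_sdrf:
        return "Silver"
    if not (f.has_open_spectra and f.has_organism_part):
        return "Gold"
    if not f.has_publication:
        return "Platinum"
    return "Diamond"
```

With the flags shown in Example Output (PSI-standard results present, SDRF missing), this sketch stops at the SDRF check and returns Silver, matching the tier reported there.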

Quant Tier (secondary axis)

The quant tier is independent of the FAIR tier and indicates quantification readiness.

Quant Tier	Meaning
Unverifiable	Non-PRIDE accession; cannot be evaluated.
No Quant	No PSI-standard results and no tabular quant files.
Partial	Either PSI-standard IDs or a quant table, but not both.
Quant-Ready	PSI IDs + tabular quant table present; CV-term quantification metadata missing.
Quant-Complete	PSI IDs + tabular quant table + CV-term method metadata are fully described.
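
The quant-tier table above reduces to a few boolean checks. The following is a sketch under stated assumptions, not the project's implementation: the function and parameter names are hypothetical.

```python
def quant_tier(
    *,
    is_pride: bool,
    has_psi_results: bool,
    has_quant_table: bool,
    has_quant_metadata: bool,
) -> str:
    """Map three quantification flags onto the quant tiers in the table."""
    if not is_pride:
        return "Unverifiable"
    if not has_psi_results and not has_quant_table:
        return "No Quant"
    # Exactly one of the two file-level requirements is met.
    if not (has_psi_results and has_quant_table):
        return "Partial"
    if not has_quant_metadata:
        return "Quant-Ready"
    return "Quant-Complete"
```

For the dataset in Example Output (PSI-standard results present, no tabular quant table), this returns Partial.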

Validated Results

The following scores were last verified against the live PRIDE REST API on 2026-03-21 and are included in the integration test suite.

Accession	Tier	Quant Tier
PXD057701	Raw	No Quant
PXD002244	Bronze	No Quant
PXD000001	Silver	Partial
PXD073444	Platinum	Partial
PXD075811	Platinum	Partial
PXD004683	Diamond	Partial

Output Database

Every check run upserts rows into three tables in the SQLite database:

Table	Description
`study`	One row per accession: title, organism, instrument, submission year and type, keywords.
`study_files`	One row per file: name, PRIDE category, extension, FTP URL, size in bytes.
`audit`	One row per accession: computed tier, quant tier, 13 `has_*` quality flags, `files_fetch_failed`, `is_unverifiable`, and `tier_logic_version`.
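
An upsert means that re-auditing an accession updates its existing row instead of adding a second one. The sketch below shows the pattern with Python's sqlite3 module; the table definition is a cut-down assumption (the real `audit` table has many more columns), and `upsert_audit` is a hypothetical helper, not a function from db.py.

```python
import sqlite3


def create_audit_table(conn: sqlite3.Connection) -> None:
    """Create a simplified audit table (assumed schema, for illustration)."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS audit (
            accession          TEXT PRIMARY KEY,
            tier               TEXT NOT NULL,
            quant_tier         TEXT NOT NULL,
            tier_logic_version TEXT NOT NULL
        )
        """
    )


def upsert_audit(
    conn: sqlite3.Connection,
    accession: str,
    tier: str,
    quant_tier: str,
    version: str = "v2.0",
) -> None:
    """Insert a row, or overwrite the existing row for the same accession."""
    conn.execute(
        """
        INSERT INTO audit (accession, tier, quant_tier, tier_logic_version)
        VALUES (?, ?, ?, ?)
        ON CONFLICT(accession) DO UPDATE SET
            tier = excluded.tier,
            quant_tier = excluded.quant_tier,
            tier_logic_version = excluded.tier_logic_version
        """,
        (accession, tier, quant_tier, version),
    )
    conn.commit()
```

The `ON CONFLICT ... DO UPDATE` clause relies on the primary key on `accession`, which is what keeps the table at one row per accession.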

Example queries

-- Tier distribution across all audited datasets
SELECT tier, COUNT(*) AS n FROM audit GROUP BY tier ORDER BY n DESC;

-- All Diamond datasets
SELECT accession, quant_tier FROM audit WHERE tier = 'Diamond';

-- Datasets that need re-scoring because they were scored under older tier logic
SELECT accession FROM audit WHERE tier_logic_version != 'v2.0';

-- File-type breakdown for a single accession
SELECT file_category, COUNT(*) AS n
FROM study_files
WHERE accession = 'PXD004683'
GROUP BY file_category;
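
The same queries can be run from Python with the standard sqlite3 module. This is a minimal sketch: the helper names are hypothetical, and the in-memory table is a simplified stand-in for the real schema, populated with the rows from Validated Results. Against a real audit you would connect to pxaudit_results.db instead.

```python
import sqlite3


def tier_distribution(conn: sqlite3.Connection) -> dict[str, int]:
    """Count audited datasets per FAIR tier."""
    rows = conn.execute(
        "SELECT tier, COUNT(*) AS n FROM audit GROUP BY tier ORDER BY n DESC"
    ).fetchall()
    return dict(rows)


def stale_accessions(conn: sqlite3.Connection, current: str = "v2.0") -> list[str]:
    """List accessions scored under a different tier logic version."""
    rows = conn.execute(
        "SELECT accession FROM audit WHERE tier_logic_version != ? ORDER BY accession",
        (current,),
    ).fetchall()
    return [accession for (accession,) in rows]


# Demo database: assumed minimal schema, rows taken from Validated Results.
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE audit (accession TEXT PRIMARY KEY, tier TEXT, "
    "quant_tier TEXT, tier_logic_version TEXT)"
)
demo.executemany(
    "INSERT INTO audit VALUES (?, ?, ?, 'v2.0')",
    [
        ("PXD057701", "Raw", "No Quant"),
        ("PXD002244", "Bronze", "No Quant"),
        ("PXD000001", "Silver", "Partial"),
        ("PXD073444", "Platinum", "Partial"),
        ("PXD075811", "Platinum", "Partial"),
        ("PXD004683", "Diamond", "Partial"),
    ],
)

print(tier_distribution(demo))
print(stale_accessions(demo))
```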

Development Setup

uv sync
uv run pre-commit install

Pre-commit runs ruff (lint + format, line-length 100) and mypy (strict mode) on every commit. See the wiki for detailed reference documentation.

Project Layout

src/pxaudit/
├── cli.py              # click entry points (check, bulk-audit, manifest)
├── tier_engine.py      # 7-tier FAIR ladder + quant tier logic
├── file_classifier.py  # deterministic FileClass assignment for every file type
├── pride_client.py     # PRIDE REST API v3 client with pagination + retry/backoff
├── db.py               # SQLite schema + upsert helpers + migrations
└── cache.py            # local JSON response cache (~/.pxaudit_cache/)

Testing

# Unit tests (default, no network required)
uv run pytest

# With coverage report
uv run pytest --cov=pxaudit --cov-report=term-missing

# Live integration tests against the real PRIDE API (requires network)
uv run pytest -m integration -v --no-cov

The default run excludes integration tests (-m 'not integration' is set in pyproject.toml). The test suite has 455 unit tests with 100% branch coverage across all modules, plus 12 live integration tests covering six real PRIDE accessions.

Roadmap

Reporting: pxaudit report --db results.db generating tier distributions, SDRF adoption trends, metadata completeness over time, and an exemplar shortlist as a Quarto-rendered HTML report.
Multi-repository: plugin adapters for MassIVE, jPOST, and iProX so non-PRIDE accessions are audited rather than marked Unverifiable.

Contributions and issue reports are welcome.

Citation

If you use PXAudit in your research, please cite it as:

@software{ergin_pxaudit_2026,
  author   = {Ergin, Enes Kemal},
  title    = {{PXAudit}: A command-line tool for auditing {Proteomics Exchange} study metadata},
  year     = {2026},
  version  = {0.3.0},
  url      = {https://github.com/LangeLab/PXAudit},
  license  = {MIT},
}

A CITATION.cff file is included in the repository root for tools that parse it automatically (e.g. GitHub's Cite this repository button, Zenodo).

License

MIT License. See LICENSE for details.
