PyPeakRanker is a Python package for extracting quantitative features from a predefined set of ATAC-seq peaks and assembling them into a reproducible, analysis-ready table.
The resulting peak × feature matrix enables systematic ranking and comparison of regulatory elements across cell types, conditions, or species.
PyPeakRanker does not perform peak calling. Instead, it standardizes feature extraction so that peak prioritization can be performed reproducibly and transparently using downstream ranking or modeling approaches.
Given a fixed set of genomic peaks, PyPeakRanker:
- Extracts multiple quantitative features per peak from ATAC-seq data
- Aggregates features consistently across cell types or groups
- Produces a unified table where rows represent peaks and columns represent features
- Enables reproducible ranking of peaks across biological contexts
This design separates feature generation from ranking logic, allowing users to apply custom scoring functions, statistical tests, or machine-learning models downstream.
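As an illustration of this separation, the sketch below ranks peaks from an already-built feature table with a simple weighted sum of z-scored features. It is only one possible downstream scoring function, not part of PyPeakRanker itself. The column names `specificity` and `conservation`, the weights, and the example values are all hypothetical.

```python
import csv
import io
from statistics import mean, pstdev

# Hypothetical feature table; real column names depend on how the table was built.
EXAMPLE_TSV = """chrom\tstart\tend\tspecificity\tconservation
chr1\t100\t600\t0.90\t1.2
chr1\t2000\t2500\t0.40\t0.3
chr2\t500\t1000\t0.70\t2.1
"""


def zscores(values):
    """Standardize values to zero mean and unit variance (population SD)."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


def rank_peaks(rows, weights):
    """Return rows sorted by a weighted sum of z-scored features, best first."""
    scores = [0.0] * len(rows)
    for feature, weight in weights.items():
        z = zscores([float(row[feature]) for row in rows])
        scores = [s + weight * zi for s, zi in zip(scores, z)]
    ranked = sorted(zip(rows, scores), key=lambda pair: pair[1], reverse=True)
    return [dict(row, score=score) for row, score in ranked]


rows = list(csv.DictReader(io.StringIO(EXAMPLE_TSV), delimiter="\t"))
ranked = rank_peaks(rows, {"specificity": 0.5, "conservation": 0.5})
for peak in ranked:
    print(peak["chrom"], peak["start"], peak["end"], round(peak["score"], 3))
```

Because the scoring lives outside the feature table, the weights or the whole scoring function can be swapped without re-extracting any features.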
ATAC-seq experiments generate large sets of candidate regulatory regions. However, peak prioritization across cell types or conditions is often performed using ad hoc scripts with inconsistent feature definitions and normalization strategies. This limits reproducibility and cross-study comparability.
Existing tools primarily focus on:
- Peak calling
- Differential accessibility testing
- Genomic annotation
However, these tools typically lack a standardized framework for reproducible peak-level feature extraction.
PyPeakRanker addresses this gap by providing a Python package that:
- Systematically aggregates quantitative features for predefined ATAC-seq peaks
- Produces a single, analysis-ready feature table
- Enables transparent, reproducible peak ranking and comparative analyses
PyPeakRanker currently supports extraction of the following peak-level features:
- ATAC specificity
- Sequence conservation (PhyloP score)
- GC content
- TSS distance
- Peak skewness
- Peak kurtosis
- Peak bimodality
- Gene marker score
The framework is modular and designed to be easily extended with additional peak-level features.
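To make a few of these features concrete, here is a minimal sketch of how GC content and the shape statistics could be defined. These are assumptions for illustration and may differ from the definitions PyPeakRanker uses: GC content is computed as a fraction over unambiguous bases, the shape moments treat the per-base signal as weights over positions within the peak, and bimodality uses the population form of Sarle's bimodality coefficient. The function names are hypothetical.

```python
import math


def gc_content(sequence):
    """Fraction of G/C among unambiguous A/C/G/T bases (N and others ignored)."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else float("nan")


def shape_moments(signal):
    """Skewness and excess kurtosis of a per-base signal profile.

    Assumption: positions 0..n-1 are weighted by the signal at each position.
    """
    total = sum(signal)
    if total <= 0:
        return float("nan"), float("nan")
    positions = range(len(signal))
    mu = sum(p * w for p, w in zip(positions, signal)) / total
    var = sum(w * (p - mu) ** 2 for p, w in zip(positions, signal)) / total
    if var == 0:
        return float("nan"), float("nan")
    sd = math.sqrt(var)
    skew = sum(w * ((p - mu) / sd) ** 3 for p, w in zip(positions, signal)) / total
    kurt = sum(w * ((p - mu) / sd) ** 4 for p, w in zip(positions, signal)) / total
    return skew, kurt - 3.0


def bimodality(signal):
    """Sarle's bimodality coefficient (population form): (skew^2 + 1) / kurtosis."""
    skew, excess_kurt = shape_moments(signal)
    return (skew ** 2 + 1.0) / (excess_kurt + 3.0)


print(gc_content("ACGTNN"))            # N bases are ignored
print(shape_moments([1, 2, 3, 2, 1]))  # symmetric, single summit
print(bimodality([5, 0, 0, 0, 5]))     # two separated summits
```

Under this definition, a uniform profile has a bimodality coefficient of 5/9, so values well above that suggest two summits within the peak.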
Install from source:

    git clone https://github.com/AllenInstitute/PyPeakRankR
    cd PyPeakRankR
    pip install -e .

Or install directly from GitHub:

    pip install git+https://github.com/AllenInstitute/PyPeakRankR.git

Initialize a feature table from a predefined peak set:

    pypeakranker init \
      --peaks peaks.bed \
      --out features.tsv

Add signal summaries from BigWig files:

    pypeakranker add-signal \
      --table features.tsv \
      --bigwig-files sample1.bigWig sample2.bigWig \
      --stat sum \
      --suffix summary \
      --out features.tsv

Add GC content from a reference genome:

    pypeakranker add-gc \
      --table features.tsv \
      --reference-fasta genome.fa \
      --out features.tsv

The resulting features.tsv will contain:
- Original peak coordinates and columns
- One column per BigWig summary
- A GC_content column
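
A quick way to check the result is to load the table and list which columns were added. The sketch below uses only the Python standard library and an in-memory stand-in for features.tsv. It assumes the table has a header row and three leading coordinate columns; the signal column names (`sample1_summary`, `sample2_summary`) and all values are placeholders, since the actual names depend on the input files and the `--suffix` option.

```python
import csv
import io

# Hypothetical contents of features.tsv; signal column names and values are placeholders.
EXAMPLE_TABLE = (
    "chrom\tstart\tend\tsample1_summary\tsample2_summary\tGC_content\n"
    "chr1\t100\t600\t812.0\t95.5\t0.58\n"
    "chr2\t500\t1000\t40.0\t610.2\t0.41\n"
)


def load_feature_table(handle):
    """Read a tab-separated feature table; return (column names, list of row dicts)."""
    reader = csv.DictReader(handle, delimiter="\t")
    rows = list(reader)
    return list(reader.fieldnames), rows


def split_columns(header, n_coordinate_columns=3):
    """Split the header into coordinate columns and feature columns."""
    return header[:n_coordinate_columns], header[n_coordinate_columns:]


# For a real file, replace the StringIO with: open("features.tsv", newline="")
header, rows = load_feature_table(io.StringIO(EXAMPLE_TABLE))
coords, features = split_columns(header)

print("coordinate columns:", coords)
print("feature columns:", features)
print("number of peaks:", len(rows))
```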
Saroja Somasundaram
Development was assisted by AI-based coding tools.