Reproducible and extensible feature extraction for ATAC-seq peak ranking.

AllenInstitute/PyPeakRankR

PyPeakRanker

PyPeakRanker is a Python package for extracting quantitative features from a predefined set of ATAC-seq peaks and assembling them into a reproducible, analysis-ready table.
The resulting peak × feature matrix enables systematic ranking and comparison of regulatory elements across cell types, conditions, or species.

PyPeakRanker does not perform peak calling. Instead, it standardizes feature extraction so that peak prioritization can be performed reproducibly and transparently using downstream ranking or modeling approaches.

Given a fixed set of genomic peaks, PyPeakRanker:

  • Extracts multiple quantitative features per peak from ATAC-seq data
  • Aggregates features consistently across cell types or groups
  • Produces a unified table where rows represent peaks and columns represent features
  • Enables reproducible ranking of peaks across biological contexts

This design separates feature generation from ranking logic, allowing users to apply custom scoring functions, statistical tests, or machine-learning models downstream.
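As an illustration of the kind of downstream scoring this separation allows, here is a minimal sketch that ranks peaks by a weighted sum of z-scored features. It is not part of PyPeakRanker: the function `rank_peaks`, the column names `peak_id`, `specificity`, and `phylop`, and the weights are all hypothetical, and the toy table stands in for a real feature table.

```python
import csv
import io
import statistics

def rank_peaks(rows, weights):
    """Rank peaks by a weighted sum of z-scored features (highest score first)."""
    scores = [0.0] * len(rows)
    for feature, weight in weights.items():
        values = [float(row[feature]) for row in rows]
        mean = statistics.fmean(values)
        sd = statistics.pstdev(values)
        for i, value in enumerate(values):
            # A constant feature carries no ranking information, so it contributes 0.
            z = (value - mean) / sd if sd > 0 else 0.0
            scores[i] += weight * z
    order = sorted(range(len(rows)), key=lambda i: scores[i], reverse=True)
    return [(rows[i]["peak_id"], scores[i]) for i in order]

# Toy stand-in for a feature table; real column names may differ (assumption).
toy_table = io.StringIO(
    "peak_id\tspecificity\tphylop\n"
    "peak1\t0.9\t1.2\n"
    "peak2\t0.2\t0.1\n"
    "peak3\t0.5\t2.0\n"
)
rows = list(csv.DictReader(toy_table, delimiter="\t"))
ranking = rank_peaks(rows, {"specificity": 1.0, "phylop": 0.5})
for peak_id, score in ranking:
    print(f"{peak_id}\t{score:.3f}")
```

Z-scoring puts features with different units on a comparable scale before they are combined. Any other scoring rule, statistical test, or model could be swapped in at the same point.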


Statement of Need

ATAC-seq experiments generate large sets of candidate regulatory regions. However, peak prioritization across cell types or conditions is often performed using ad hoc scripts with inconsistent feature definitions and normalization strategies. This limits reproducibility and cross-study comparability.

Existing tools primarily focus on:

  • Peak calling
  • Differential accessibility testing
  • Genomic annotation

However, these tools typically lack a standardized framework for reproducible peak-level feature extraction.

PyPeakRanker addresses this gap by providing a Python package that:

  • Systematically aggregates quantitative features for predefined ATAC-seq peaks
  • Produces a single, analysis-ready feature table
  • Enables transparent, reproducible peak ranking and comparative analyses

Features

PyPeakRanker currently supports extraction of the following peak-level features:

  • ATAC specificity
  • Sequence conservation (PhyloP score)
  • GC content
  • TSS distance
  • Peak skewness
  • Peak kurtosis
  • Peak bimodality
  • Gene marker score

The framework is modular and designed to be easily extended with additional peak-level features.
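To give a sense of what shape features such as skewness and kurtosis can capture, the sketch below computes them from a per-base signal profile by treating the signal as weights over positions. This is one common definition, shown under that assumption only. PyPeakRanker's own definitions may differ, and the function name `profile_shape` is hypothetical.

```python
import math

def profile_shape(signal):
    """Return (skewness, excess kurtosis) of a per-base signal profile.

    The signal is treated as weights over positions 0..n-1, so a profile
    with a long right tail yields positive skewness.
    """
    total = sum(signal)
    if total <= 0:
        return (math.nan, math.nan)
    mean = sum(i * w for i, w in enumerate(signal)) / total
    var = sum(w * (i - mean) ** 2 for i, w in enumerate(signal)) / total
    if var == 0:
        # All signal sits at one position, so shape moments are undefined.
        return (math.nan, math.nan)
    m3 = sum(w * (i - mean) ** 3 for i, w in enumerate(signal)) / total
    m4 = sum(w * (i - mean) ** 4 for i, w in enumerate(signal)) / total
    return (m3 / var ** 1.5, m4 / var ** 2 - 3.0)

symmetric = [1, 4, 9, 4, 1]      # summit in the centre of the peak
right_tailed = [9, 4, 2, 1, 1]   # summit at the left edge, tail to the right
print(profile_shape(symmetric))
print(profile_shape(right_tailed))
```

A symmetric profile has skewness of zero, while a profile whose summit sits near one edge has a nonzero value whose sign indicates the direction of the tail.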


Installation

Install from source:

git clone https://github.com/AllenInstitute/PyPeakRankR
cd PyPeakRankR
pip install -e .

Or install directly from GitHub:

pip install git+https://github.com/AllenInstitute/PyPeakRankR.git

Quick Example

Initialize a feature table from a predefined peak set:

pypeakranker init \
  --peaks peaks.bed \
  --out features.tsv

Add signal summaries from BigWig files:

pypeakranker add-signal \
  --table features.tsv \
  --bigwig-files sample1.bigWig sample2.bigWig \
  --stat sum \
  --suffix summary \
  --out features.tsv

Add GC content from a reference genome:

pypeakranker add-gc \
  --table features.tsv \
  --reference-fasta genome.fa \
  --out features.tsv
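For reference, GC content of a peak sequence can be computed as in the minimal sketch below. It assumes that ambiguous bases such as N are excluded from the denominator. Whether `add-gc` handles them this way is not stated here, and the function name `gc_fraction` is hypothetical.

```python
def gc_fraction(sequence):
    """Fraction of G/C among unambiguous bases (A, C, G, T); None if there are none."""
    seq = sequence.upper()  # soft-masked (lowercase) bases are counted too
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    called = gc + at
    return gc / called if called else None

# 4 G/C bases out of 6 unambiguous bases; the two N bases are ignored.
print(gc_fraction("ACGTNNgc"))
```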

The resulting features.tsv will contain:

  • Original peak coordinates and columns
  • One column per BigWig summary
  • A GC_content column

Author

Saroja Somasundaram

Acknowledgements

Development was assisted by AI-based coding tools.
