FlagSparse

GPU sparse operations package (SpMV, SpMM, SpGEMM, SDDMM, SpSV, SpSM, gather, scatter, sparse formats).

Install

pip install . --no-deps --no-build-isolation

Pass --no-build-isolation to avoid downloading build dependencies when offline; --no-deps skips the runtime dependencies listed below.

Runtime dependencies (install when needed):

pip install torch triton cupy-cuda12x

Layout

src/flagsparse/ - core package (sparse_operations/ is emitted as several .py modules from string literals in flagsparse.py)
tests/ - pytest tests
benchmark/ - performance benchmarks

Tests

Run the commands below from the project root. Alternatively, cd into tests and run the scripts from there, adjusting relative paths (for example, ../matrix for the .mtx directory).

The commands below are the repository's documented invocation standard. CPU-only install, build, help-text, and smoke paths are checked in CI; GPU-specific examples are documented but not executed there unless you opt into the triton smoke job locally.

pytest accuracy suite - small synthetic CUDA cases, selectable by operator marker:

pytest tests/pytest --mode quick
pytest tests/pytest --mode normal -m "spmv_csr or spmm_csr"
python run_flagsparse_pytest.py --mode quick --ops gather,spmv_csr,spmm_csr --gpus 0
python run_flagsparse_pytest.py --op-list ops.txt --gpus 0,1 --results-dir pytest_results

The runner writes per-operator accuracy.log files plus summary.json, summary.csv, and summary.xlsx when openpyxl is installed.

test_spmv.py - CSR SpMV (SuiteSparse .mtx, synthetic, or CSR CSV export):

python tests/test_spmv.py <dir_or_file.mtx>              # batch run, default float32
python tests/test_spmv.py <dir/> --dtype float64         # optional: --index-dtype int32|int64, --warmup, --iters, --no-cusparse
python tests/test_spmv.py --synthetic                    # synthetic benchmark
python tests/test_spmv.py <dir/> --csv-csr results.csv   # all value×index dtypes -> one CSV (per-matrix lines while running)
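
For orientation, the operation exercised by test_spmv.py can be sketched on the CPU in a few lines. This is a minimal pure-Python illustration of CSR SpMV (y = A x), not the package's Triton kernel; the function name csr_spmv and the argument names are hypothetical.

```python
def csr_spmv(indptr, indices, data, x):
    """Return y = A @ x for a matrix A stored in CSR form (hypothetical helper)."""
    n_rows = len(indptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        # indptr[row]:indptr[row + 1] delimits the stored entries of this row.
        for k in range(indptr[row], indptr[row + 1]):
            acc += data[k] * x[indices[k]]
        y[row] = acc
    return y


# 3x3 example matrix: [[1, 0, 2], [0, 3, 0], [4, 0, 5]]
indptr = [0, 2, 3, 5]
indices = [0, 2, 1, 0, 2]
data = [1.0, 2.0, 3.0, 4.0, 5.0]
y = csr_spmv(indptr, indices, data, [1.0, 1.0, 1.0])
print(y)  # [3.0, 3.0, 9.0]
```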

test_spmv_coo.py - COO SpMV (requires --synthetic or --csv-coo; no standalone .mtx batch):

python tests/test_spmv_coo.py --synthetic
python tests/test_spmv_coo.py <dir/> --csv-coo out.csv

test_spmv_opt.py - SpMV baseline vs optimised A/B (float32 / float64 only):

python tests/test_spmv_opt.py <dir_or_file.mtx> [...]
python tests/test_spmv_opt.py <dir/> --csv out.csv

test_spmm.py - CSR SpMM (.mtx batch, synthetic, or --csv):

python tests/test_spmm.py <dir_or_file.mtx>
python tests/test_spmm.py --synthetic                    # optional: --skip-api-checks, --skip-alg1-coverage
python tests/test_spmm.py <dir/> --csv results.csv      # float32/float64 + int32 in CSV; per-matrix console output
# common options: --dtype, --index-dtype, --dense-cols, --block-n, --block-nnz, --max-segments, --warmup, --iters, --no-cusparse
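
As a rough reference for what the CSR SpMM tests compute, the sketch below multiplies a CSR matrix by a dense row-major matrix B. It assumes that --dense-cols corresponds to the number of columns of B; that reading, and the helper name csr_spmm, are assumptions rather than facts taken from the test script.

```python
def csr_spmm(indptr, indices, data, B):
    """Return C = A @ B with A in CSR form and B a dense list of rows (hypothetical helper)."""
    n_rows = len(indptr) - 1
    n_dense_cols = len(B[0])  # assumed to match the --dense-cols setting
    C = [[0.0] * n_dense_cols for _ in range(n_rows)]
    for row in range(n_rows):
        for k in range(indptr[row], indptr[row + 1]):
            a = data[k]
            b_row = B[indices[k]]
            # Each stored entry A[row, col] scales row `col` of B into row `row` of C.
            for j in range(n_dense_cols):
                C[row][j] += a * b_row[j]
    return C


# Same 3x3 matrix as a CSR example: [[1, 0, 2], [0, 3, 0], [4, 0, 5]]
indptr = [0, 2, 3, 5]
indices = [0, 2, 1, 0, 2]
data = [1.0, 2.0, 3.0, 4.0, 5.0]
B = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
C = csr_spmm(indptr, indices, data, B)
print(C)  # [[11.0, 14.0], [9.0, 12.0], [29.0, 38.0]]
```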

test_spmm_opt.py - CSR SpMM baseline vs optimised A/B:

python tests/test_spmm_opt.py <dir_or_file.mtx> --dense-cols 32
python tests/test_spmm_opt.py <dir/> --csv spmm_opt.csv  # optional: --dtype float32|float64, --dense-cols
# common options: --dtype, --dense-cols, --warmup, --iters

test_spmm_coo.py - native COO SpMM:

python tests/test_spmm_coo.py <dir_or_file.mtx>
python tests/test_spmm_coo.py --synthetic                # optional: --route rowrun|atomic|compare, --skip-api-checks, --skip-coo-coverage
python tests/test_spmm_coo.py <dir/> --csv out.csv      # only --route rowrun or atomic (not compare)
# same tuning flags as CSR SpMM where applicable: --dense-cols, --block-n, --block-nnz, --warmup, --iters, --no-cusparse

test_sddmm.py - CSR SDDMM (.mtx batch or --csv):

python tests/test_sddmm.py <dir_or_file.mtx> --k 64
python tests/test_sddmm.py <dir/> --csv out.csv          # optional: --dtype float32|float64, --acc_mode f32|f64, --k 64
# common options: --dtype, --index-dtype, --acc_mode, --k, --alpha, --beta, --warmup, --iters, --no-cupy-ref, --skip-api-checks
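
The --k, --alpha, and --beta options suggest the usual sampled dense-dense product, where the dense product is evaluated only at the stored positions of a sparse pattern. The sketch below assumes the cuSPARSE-style definition out = alpha * (A @ B) sampled at the pattern + beta * C; whether FlagSparse follows exactly this convention is an assumption, and csr_sddmm is a hypothetical name.

```python
def csr_sddmm(indptr, indices, c_values, A, B, alpha=1.0, beta=0.0):
    """For each stored (i, j): out = alpha * dot(A[i, :], B[:, j]) + beta * C[i, j]."""
    inner = len(B)  # shared inner dimension (assumed to be what --k controls)
    out = [0.0] * len(c_values)
    for i in range(len(indptr) - 1):
        for p in range(indptr[i], indptr[i + 1]):
            j = indices[p]
            dot = sum(A[i][t] * B[t][j] for t in range(inner))
            out[p] = alpha * dot + beta * c_values[p]
    return out


# 2x2 pattern with stored entries at (0, 0) and (1, 1)
indptr = [0, 1, 2]
indices = [0, 1]
c_values = [10.0, 20.0]
A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
out = csr_sddmm(indptr, indices, c_values, A, B, alpha=2.0, beta=0.5)
print(out)  # [43.0, 110.0]
```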

test_spgemm.py - CSR SpGEMM (.mtx batch or --csv):

python tests/test_spgemm.py <dir_or_file.mtx> --input-mode auto
python tests/test_spgemm.py <dir/> --csv results.csv     # optional: --dtype float32|float64, --input-mode auto|a_equals_b|a_at, --compare-device cpu|gpu
# common options: --dtype, --index-dtype, --warmup, --iters, --input-mode, --adaptive-loops, --no-cusparse, --ref-blocked-retry, --ref-isolated-retry, --ref-block-rows, --compare-device, --run-api-checks

test_spsv.py - SpSV (triangular solve; square matrices only). CSR and COO share this script; there is no test_spsv_coo.py.

python tests/test_spsv.py --synthetic
python tests/test_spsv.py <dir/> --csv-csr spsv.csv
python tests/test_spsv.py <dir/> --csv-coo out.csv      # same CSV columns as CSR
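
To make the triangular solve concrete, here is a minimal forward-substitution sketch for a lower-triangular CSR matrix with an explicit, non-unit diagonal. It only illustrates the math behind SpSV; the function name and the choice to ignore upper-triangular entries are assumptions, not the package's behaviour.

```python
def csr_lower_solve(indptr, indices, data, b):
    """Solve L x = b by forward substitution, with L lower-triangular in CSR form."""
    n = len(indptr) - 1
    x = [0.0] * n
    for row in range(n):
        acc = b[row]
        diag = None
        for k in range(indptr[row], indptr[row + 1]):
            col = indices[k]
            if col == row:
                diag = data[k]
            elif col < row:
                acc -= data[k] * x[col]  # x[col] is already known
            # Entries with col > row are ignored in this sketch.
        if diag is None or diag == 0.0:
            raise ValueError(f"missing or zero diagonal in row {row}")
        x[row] = acc / diag
    return x


# L = [[2, 0, 0], [1, 3, 0], [0, 4, 5]], b = [2, 7, 23]
indptr = [0, 1, 3, 5]
indices = [0, 0, 1, 1, 2]
data = [2.0, 1.0, 3.0, 4.0, 5.0]
x = csr_lower_solve(indptr, indices, data, [2.0, 7.0, 23.0])
print(x)  # [1.0, 2.0, 3.0]
```

The row-by-row dependency on earlier entries of x is what makes the solve defined only for square matrices.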

test_spsm.py - SpSM (triangular matrix-matrix solve; square matrices only):

python tests/test_spsm.py --synthetic --n 512 --rhs 32
python tests/test_spsm.py <dir/> --csv-csr spsm_csr.csv --rhs 32
python tests/test_spsm.py <dir/> --csv-coo spsm_coo.csv --rhs 32

test_gather.py / test_scatter.py - gather/scatter benchmarks (run through pytest or directly, e.g. python tests/test_gather.py).

Accuracy suites should use tests/pytest/accuracy_utils.py for FlagGems-style golden reference and tolerance policy. Numeric compute operators compare against CPU-FP64 golden references cast back to the dtype under test, while exact/logical outputs compare against CPU int32 references.
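
The golden-reference policy described above can be illustrated without any GPU. The sketch below computes a reference in FP64 (Python floats), casts it to FP32, and compares a stand-in result against it with a combined relative/absolute tolerance. The helper names and the tolerance values are hypothetical; the real policy lives in tests/pytest/accuracy_utils.py.

```python
import struct


def to_float32(value):
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack("f", struct.pack("f", value))[0]


def allclose(actual, golden, rtol, atol):
    """True if every |a - g| <= atol + rtol * |g| (tolerances are illustrative)."""
    return all(abs(a - g) <= atol + rtol * abs(g) for a, g in zip(actual, golden))


# Golden values computed in FP64, then cast back to the dtype under test (FP32).
golden64 = [0.1 + 0.2, 1.0 / 3.0, 1e-3]
golden32 = [to_float32(v) for v in golden64]

# Stand-in for a device result: FP32 values with a small relative perturbation.
result = [to_float32(v * (1.0 + 1e-6)) for v in golden64]

ok = allclose(result, golden32, rtol=1e-4, atol=1e-6)
print(ok)
```

Casting the reference to the tested dtype first means the comparison measures the operator's error rather than the unavoidable FP64-to-FP32 rounding of the reference itself.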

CI/CD

.github/workflows/ci.yml is CPU-only and runs compile, format checks, lint, source-critical static checks, build, install, and smoke tests on GitHub-hosted runners.
The smoke set now covers installed-wheel validation, packaging metadata, public API surface, operator registry consistency, shared runtime policy helpers, CLI --help, and README command snippets.
conf/operators.yaml is the FlagGems-style operator interface registry for public FlagSparse sparse operators and sparse-format helpers.
.github/workflows/nightly-cpu.yml is a main-branch-only nightly CPU check that repeats the package, lint, and shared-runtime smoke tests.
.github/workflows/release.yml builds source and wheel artifacts, then attaches them to GitHub Releases on v* tags.
.github/workflows/triton-smoke.yml is a manual opt-in job for triton-dependent smoke checks.
.github/workflows/gpu-ci.yml is a manual GPU accuracy smoke workflow for a self-hosted runner labeled self-hosted, linux, and gpu.
.github/workflows/gpu-benchmark.yml adds an Actions button for synthetic GPU benchmark runs on a self-hosted runner labeled self-hosted, linux, and gpu.
.github/workflows/release-drafter.yml keeps draft release notes current from merged PRs.
make help lists the local entry points.
make ci / make check run the same CPU-only pipeline used by CI.
make format-check, make lint, and make lint-src are the non-GPU quality gates for CI formatting, CI helper lint, and critical package-source static checks.
make smoke is the CPU smoke stage alias.
make release-check / make release build, validate, and checksum release artifacts.
make triton-smoke and make triton-deps are opt-in local targets for the triton-dependent runtime checks.
make gpu-env-check validates CUDA visibility through tools/ci/check_gpu_environment.py on a GPU runner.
make gpu-benchmark runs the quick synthetic benchmark suite on a CUDA machine.
python tools/ci/run_gpu_benchmark.py --suite quick mirrors the manual GPU benchmark workflow locally on a CUDA machine.
python tools/ci/run_gpu_benchmark.py --suite full --matrix-dir tests/data runs the full benchmark matrix, including .mtx-backed SpGEMM and SDDMM suites against the repository test matrices.
tools/ci/requirements-ci.lock.txt and tools/ci/requirements-triton-smoke.lock.txt are the pinned local dependency bundles behind those make targets.
.github/dependabot.yml keeps GitHub Actions and Python dependency updates visible.
.github/ISSUE_TEMPLATE/ keeps issue entry points structured for bugs and feature requests.
The CI dependency bundle now stays on packaging and test tooling only; triton-dependent smoke is opt-in through FLAGSPARSE_TRITON_SMOKE=1.
Release artifacts now ship with a generated SHA256SUMS manifest and a matching checksum verification step in CI.
PR quality gates are implemented through the default CPU CI workflow; configure branch protection in GitHub to require the CI / Build and smoke test check before merge.
GPU accuracy and benchmark scripts still require CUDA hardware; the GPU workflows are manual and only run on a self-hosted GPU runner.
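
The SHA256SUMS manifest mentioned above for release artifacts can be reproduced and checked with the standard library. The following is a sketch assuming the common sha256sum text format (hex digest, two spaces, file name); the function names are hypothetical and this is not the repository's release tooling.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(artifact_dir, manifest_name="SHA256SUMS"):
    """Write '<digest>  <name>' lines for every file in artifact_dir."""
    artifact_dir = Path(artifact_dir)
    lines = [
        f"{sha256_of(p)}  {p.name}"
        for p in sorted(artifact_dir.iterdir())
        if p.is_file() and p.name != manifest_name
    ]
    manifest = artifact_dir / manifest_name
    manifest.write_text("\n".join(lines) + "\n")
    return manifest


def verify_manifest(manifest_path):
    """Return the names of files whose digest no longer matches the manifest."""
    manifest_path = Path(manifest_path)
    mismatched = []
    for line in manifest_path.read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split("  ", 1)
        if sha256_of(manifest_path.parent / name) != expected:
            mismatched.append(name)
    return mismatched


def demo():
    """Build a manifest in a temp dir, verify it, then tamper and verify again."""
    with tempfile.TemporaryDirectory() as tmp:
        artifact = Path(tmp) / "pkg.tar.gz"  # placeholder artifact name
        artifact.write_bytes(b"release payload")
        manifest = write_manifest(tmp)
        clean = verify_manifest(manifest)
        artifact.write_bytes(b"tampered payload")
        tampered = verify_manifest(manifest)
    return clean, tampered


clean, tampered = demo()
print(clean, tampered)  # [] ['pkg.tar.gz']
```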

Performance

benchmark/performance_utils.py defines the pytest-style performance base class, default metrics (latency_base, latency, speedup), median timing, warmup/iteration controls, CUDA synchronization, CSV record helpers, and the two-level average speedup rule.
benchmark/attri_util.py and benchmark/core_shapes.yaml keep default and special shape grids centralized.
benchmark/summary_for_plot.py reads recorded benchmark CSV files and reports the two-level speedup summary.
benchmark/test_sparse_perf.py is an opt-in pytest entry point; real GPU runs remain manual or self-hosted because GitHub-hosted runners do not provide CUDA GPUs.
tests/data/*.mtx can be used as the default MatrixMarket smoke dataset for mtx-backed GPU benchmark suites.
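
The timing controls listed for benchmark/performance_utils.py (warmup, iterations, median, synchronization) follow a common pattern that can be sketched as below. This is an illustration under assumptions: the function names are hypothetical, speedup is assumed to mean latency_base / latency, and the sync hook stands in for something like a CUDA device synchronization call.

```python
import statistics
import time


def median_latency(fn, warmup=3, iters=10, sync=None):
    """Median wall-clock time of fn() in seconds, after untimed warmup runs."""
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()  # drain pending asynchronous work before timing starts
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # make sure the timed work has actually finished
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def speedup(latency_base, latency):
    """Assumed definition: baseline latency divided by the measured latency."""
    return latency_base / latency


# CPU stand-ins for a baseline and an optimised implementation.
latency_base = median_latency(lambda: sum(range(20000)))
latency = median_latency(lambda: sum(range(1000)))
print(f"speedup: {speedup(latency_base, latency):.2f}x")
```

The median is less sensitive than the mean to occasional slow iterations, which is the usual reason to prefer it for short GPU kernels.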

License

This project is licensed under the Apache License, Version 2.0.
