NPDES Permit Analysis Tools

These tools extract unit process data from National Pollutant Discharge Elimination System (NPDES) permits for California.

Description

Researchers have used the Clean Watersheds Needs Survey (CWNS) to quantify greenhouse gas emissions. However, CWNS data is collected infrequently, reported voluntarily, and sparse. To address these limitations, we extract unit process data from NPDES permits. The following Python tools collect and process that data:

Build CWNS process tables
- wwtp_process_extraction/step0_build_cwns_table.py: creates cwns_processes_by_facility.csv from CWNS 2004/2008/2012 data. The 2022 survey does not include California.
Scrape permits and site metadata
- wwtp_process_extraction/step1_scrape_npdes.py: downloads NPDES permit PDFs and writes site_data.csv and matched_cwns_npdes_ca.csv
  - uses npdes_detection.py helpers to detect which downloaded files are actually NPDES permits
Detect treatment processes in permits with keyword search
- wwtp_process_extraction/step2_search_npdes_text.py: scans PDFs against unitprocess_keywords and writes kw_unit_processes.csv with present/future status
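
As a rough illustration of the keyword search idea, the sketch below matches process phrases in already-extracted permit text and labels each hit as present or future. It is not the code in step2_search_npdes_text.py: the keyword map, the cue words, and the function name `find_processes` are all hypothetical, and the real unitprocess_keywords may be structured differently.

```python
import re

# Hypothetical keyword map: process name -> phrases that indicate it.
UNIT_PROCESS_KEYWORDS = {
    "activated sludge": ["activated sludge"],
    "uv disinfection": ["uv disinfection", "ultraviolet disinfection"],
}

# Hypothetical cue words suggesting a process is planned rather than installed.
FUTURE_CUES = ("proposed", "planned", "will be", "future")


def find_processes(text, keywords=UNIT_PROCESS_KEYWORDS, window=80):
    """Return {process: "present" | "future"} for each process mentioned in text."""
    lowered = text.lower()
    found = {}
    for process, phrases in keywords.items():
        for phrase in phrases:
            for match in re.finditer(re.escape(phrase), lowered):
                # Look at the characters surrounding the match for a future cue.
                start = max(0, match.start() - window)
                context = lowered[start:match.end() + window]
                status = "future" if any(cue in context for cue in FUTURE_CUES) else "present"
                # A "present" mention anywhere outranks a "future" one.
                if found.get(process) != "present":
                    found[process] = status
    return found
```

A fixed character window is a crude way to attach status to a mention; it can mislabel a process when an unrelated "proposed" appears nearby, which is one reason to cross-check against the LLM-based steps.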
Detect treatment processes in permits with LLM search
- wwtp_process_extraction/step3a_llm_ontology.py: runs the LLM extraction using the ontology format
  - use --init_ontology to regenerate the ontology as an up-to-date .txt file under wwtp_process_extraction/data
  - use --model "model_name" --pdf_folder "path_to_pdf_folder" --facilities_information "path_to_facilities_csv" to run the LLM extraction on one PDF per facility (the first PDF_File listed for each Facility Name); results are saved as JSON under output/date/llm_search_ontology
- wwtp_process_extraction/step3b_llm_list.py: runs the LLM extraction using the unitprocess_list format
  - use --model "model_name" --pdf "pdf_file_or_pdf_folder_path" to run the LLM extraction with the given model on the given PDF(s); results are saved as JSON under output/date/llm_search_list
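
The one-PDF-per-facility selection used by step3a can be pictured with the short sketch below. The column names `Facility Name` and `PDF_File` come from the description above; the function name, the sample rows, and the use of the standard `csv` module are assumptions made for illustration, not the script's actual implementation.

```python
import csv
import io


def first_pdf_per_facility(rows):
    """Map each facility name to the first PDF listed for it, preserving row order."""
    selected = {}
    for row in rows:
        name = row["Facility Name"]
        if name not in selected:
            selected[name] = row["PDF_File"]
    return selected


# Made-up sample standing in for the facilities CSV.
sample = io.StringIO(
    "Facility Name,PDF_File\n"
    "Plant A,a_2019.pdf\n"
    "Plant A,a_2024.pdf\n"
    "Plant B,b_2020.pdf\n"
)
print(first_pdf_per_facility(csv.DictReader(sample)))
# prints {'Plant A': 'a_2019.pdf', 'Plant B': 'b_2020.pdf'}
```

Because only the first row per facility is kept, the order of rows in the facilities CSV determines which permit is analyzed.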
Post-process LLM output back to CWNS format
- wwtp_process_extraction/step4_postprocess_llm_output.py: post-processes the LLM outputs using the ontology and writes llm_ontology_cwns_processes_by_facility.csv with present/planned/past status.
Compare NPDES text extraction vs CWNS survey data
- wwtp_process_extraction/step5a_compare_aggregate_results.py: compares llm_unit_processes.csv to cwns_processes_by_facility.csv and plots the comparison as bar charts
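
To make the aggregate comparison concrete, the sketch below counts, for each unit process, how many facilities report it in CWNS, in the NPDES extraction, and in both. It assumes each source has already been reduced to (facility, process) pairs; that layout, the function name, and the output shape are hypothetical, and the plotting done by step5a is left out.

```python
from collections import Counter


def compare_process_counts(cwns_pairs, npdes_pairs):
    """Count facilities reporting each process in CWNS, in NPDES, and in both."""
    cwns, npdes = set(cwns_pairs), set(npdes_pairs)
    counts = {}
    for source, pairs in (("cwns", cwns), ("npdes", npdes), ("both", cwns & npdes)):
        per_process = Counter(process for _, process in pairs)
        for process, n in per_process.items():
            # Create the row on first sight so every process has all three keys.
            row = counts.setdefault(process, {"cwns": 0, "npdes": 0, "both": 0})
            row[source] = n
    return counts
```

The "both" column is what separates agreement between the two sources from processes that merely occur at similar frequencies in each.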

How to Run

Run the following from the repository root directory:

python wwtp_process_extraction/step0_build_cwns_table.py
python wwtp_process_extraction/step1_scrape_npdes.py
python wwtp_process_extraction/step2_search_npdes_text.py
python wwtp_process_extraction/step3a_llm_ontology.py --init_ontology
python wwtp_process_extraction/step3a_llm_ontology.py --model gemini-2.0-flash-001 --pdf_folder wwtp_process_extraction/output/2026-2-18/npdes --facilities_information wwtp_process_extraction/data/test_set_npdes_manual.csv
python wwtp_process_extraction/step4_postprocess_llm_output.py
python wwtp_process_extraction/step5a_compare_aggregate_results.py
python wwtp_process_extraction/step5b_compare_facility_results.py

Known issues and limitations

When first running permit_scrape.py, a timeout error may appear. Rerun until the program successfully opens ChromeDriver, and delete the "MM-DD-YYYY" folder before each rerun.
permit_scrape.py is slow at two distinct points:
1. After region selection
2. When selecting the "ALL" display range
...
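
The manual rerun-and-delete routine for timeouts could be automated along the lines of the sketch below. Everything here is an assumption for illustration: `scrape` stands in for whatever callable launches the scraper, `output_dir` for the dated folder, and the built-in `TimeoutError` for the error actually raised (a browser automation library may raise its own timeout exception class, which would need to be caught instead).

```python
import shutil
import time
from pathlib import Path


def run_with_retries(scrape, output_dir, max_attempts=5, wait_seconds=10):
    """Call scrape(); on TimeoutError, delete output_dir and try again."""
    for attempt in range(1, max_attempts + 1):
        try:
            return scrape()
        except TimeoutError:
            # Remove the partially written dated folder before the next attempt.
            shutil.rmtree(Path(output_dir), ignore_errors=True)
            if attempt == max_attempts:
                raise
            time.sleep(wait_seconds)
```

Capping the number of attempts avoids looping forever if the timeout has a persistent cause, such as the permit site being down.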

Contact

Constance Rouffet - rouffetc@stanford.edu

Ashley Ramirez - ashlecr3@uci.edu

Daly Wettermark - dalyw@stanford.edu

Fletcher Chapin - fchapin@stanford.edu

Acknowledgements

This work is funded in part by the Stanford SURGE program and the National Alliance for Water Innovation.
