A modular system that tracks deals from online marketplaces. It periodically checks for new deals and monitors changes like price drops in existing listings.
- Modular Scraping: Extensible scrapers for different websites
- Smart Filtering: Configurable filters for specific criteria
- Price Tracking: Monitors price changes over time
- PostgreSQL Storage: Robust data storage with history tracking
- File System Cache: All listings cached locally for offline analysis and filter development
- Change Detection: Automatic versioning when listings are modified
- Automated Execution: Designed for periodic execution via cron
- Auto Listings (Bazos.sk)
  - BMW E36, E46, E39 vehicles
  - Filters: 6-cylinder petrol engine, manual transmission
- Real Estate (Bazos.sk)
  - Land plots, houses, and cottages
  - Filters: ≥40,000 m² (4 hectares), price <400,000 EUR
- Python 3.10 or higher
- PostgreSQL 12 or higher
- pip (Python package manager)
- Clone the repository
cd deal_watcher
- Create virtual environment
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies
pip install -r requirements.txt
- Set up PostgreSQL database
# Create database
createdb deal_watcher
# Or using psql:
psql -U postgres
CREATE DATABASE deal_watcher;
\q
- Initialize database schema
psql -U postgres -d deal_watcher -f deal_watcher/database/schema.sql
- Configure environment variables
cp .env.example .env
# Edit .env and set your database connection string
Example .env file:
DB_CONNECTION_STRING=postgresql://username:password@localhost:5432/deal_watcher
LOG_LEVEL=INFO
- Customize configuration (optional)
Edit deal_watcher/config/config.json to customize:
- Search criteria
- Max pages to scrape
- Request delays
- Filter parameters
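As a rough illustration of how the two `.env` values from the setup steps might be consumed, here is a minimal sketch that reads them from the process environment. The `Settings` class and `load_settings` function are hypothetical names, not taken from the project, and the sketch assumes the `.env` file has already been exported into the environment by whatever mechanism the project uses.

```python
import os
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the two variables shown in the example .env."""
    db_connection_string: str
    log_level: str = "INFO"


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env

    conn = env.get("DB_CONNECTION_STRING", "").strip()
    if not conn:
        raise RuntimeError("DB_CONNECTION_STRING is not set; check your .env file")

    # Only a cheap sanity check on the URL scheme, not a connection test.
    scheme = urlsplit(conn).scheme
    if scheme not in ("postgresql", "postgres"):
        raise RuntimeError(f"Unexpected database scheme: {scheme!r}")

    # Fall back to INFO when LOG_LEVEL is missing or empty.
    level = env.get("LOG_LEVEL", "INFO").strip().upper() or "INFO"
    return Settings(db_connection_string=conn, log_level=level)
```

Failing early on a missing connection string tends to give a clearer error than a database driver exception raised mid-run, which matters for unattended cron jobs.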
python -m deal_watcher.main
[2025-11-12 10:30:15] INFO - Starting scraper: BMW E-Series Manual
[2025-11-12 10:30:18] INFO - Scraping page 1/10
[2025-11-12 10:30:18] INFO - Found 20 listings on page 1
[2025-11-12 10:30:45] INFO - ✓ NEW: BMW E46 330i Manual - 12,500€ (ID: 184779117)
[2025-11-12 10:31:02] INFO - ↓ PRICE CHANGE: BMW E39 528i - 8,200€ (ID: 183456789)
[2025-11-12 10:45:30] INFO - Scraper complete: 15 new deals, 3 price changes
Add to crontab (crontab -e):
# Run every 6 hours
0 */6 * * * cd /path/to/deal_watcher && /path/to/venv/bin/python -m deal_watcher.main >> /var/log/deal_watcher.log 2>&1
deal_watcher/
├── deal_watcher/
│ ├── config/
│ │ └── config.json # Configuration file
│ ├── database/
│ │ ├── models.py # SQLAlchemy models
│ │ ├── repository.py # Database operations
│ │ ├── schema.sql # Database schema
│ │ └── migrations/ # Database migrations
│ ├── cache/
│ │ ├── cache_manager.py # File system cache manager
│ │ └── __init__.py
│ ├── scrapers/
│ │ ├── base_scraper.py # Abstract base scraper
│ │ ├── bazos_scraper.py # Bazos.sk common logic
│ │ ├── auto_scraper.py # Auto listings scraper
│ │ └── reality_scraper.py # Real estate scraper
│ ├── filters/
│ │ ├── base_filter.py # Abstract base filter
│ │ ├── auto_filter.py # Auto listings filter
│ │ └── reality_filter.py # Real estate filter
│ ├── utils/
│ │ ├── logger.py # Logging utilities
│ │ └── http_client.py # HTTP client with retries
│ └── main.py # Main CLI application
├── cache/ # File system cache (gitignored)
│ └── bazos/ # Organized by source
│ ├── auto/ # And category
│ └── reality/
├── docs/ # Documentation
│ ├── setup/ # Setup guides
│ ├── architecture/ # Design and technical docs
│ ├── development/ # Development notes
│ ├── _work.md # Work summary & future plans
│ └── _quick_summary.md # Quick reference guide
├── requirements.txt # Python dependencies
├── .env.example # Environment variables template
└── README.md # This file
- categories: Scraping categories (auto, reality)
- deals: Main deals table
- price_history: Price change tracking
- deal_images: Deal images
- scraping_runs: Execution history and statistics
- Price history tracking
- Deal lifecycle tracking (first seen, last seen, active status)
- Flexible metadata storage (JSONB)
- Comprehensive indexing for performance
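The interplay between the deals table and price_history can be sketched as follows. This is not the project's schema: it uses Python's built-in `sqlite3` with an in-memory database instead of PostgreSQL so that it runs without setup, and every column name, as well as the `upsert_deal` helper, is an assumption made for illustration.

```python
import sqlite3
from datetime import datetime, timezone

# Simplified, assumed schema; the real one lives in schema.sql.
SCHEMA = """
CREATE TABLE deals (
    id INTEGER PRIMARY KEY,
    external_id TEXT UNIQUE NOT NULL,
    title TEXT NOT NULL,
    price REAL,
    first_seen TEXT NOT NULL,
    last_seen TEXT NOT NULL,
    is_active INTEGER NOT NULL DEFAULT 1
);
CREATE TABLE price_history (
    id INTEGER PRIMARY KEY,
    deal_id INTEGER NOT NULL REFERENCES deals(id),
    old_price REAL,
    new_price REAL,
    changed_at TEXT NOT NULL
);
"""


def upsert_deal(conn, external_id, title, price):
    """Insert or refresh a deal; return 'new', 'price_change', or 'unchanged'."""
    now = datetime.now(timezone.utc).isoformat()
    row = conn.execute(
        "SELECT id, price FROM deals WHERE external_id = ?", (external_id,)
    ).fetchone()

    if row is None:
        conn.execute(
            "INSERT INTO deals (external_id, title, price, first_seen, last_seen) "
            "VALUES (?, ?, ?, ?, ?)",
            (external_id, title, price, now, now),
        )
        return "new"

    deal_id, old_price = row
    # Every sighting refreshes last_seen, which drives lifecycle tracking.
    conn.execute(
        "UPDATE deals SET last_seen = ?, is_active = 1 WHERE id = ?", (now, deal_id)
    )
    if old_price != price:
        conn.execute(
            "INSERT INTO price_history (deal_id, old_price, new_price, changed_at) "
            "VALUES (?, ?, ?, ?)",
            (deal_id, old_price, price, now),
        )
        conn.execute("UPDATE deals SET price = ? WHERE id = ?", (price, deal_id))
        return "price_change"
    return "unchanged"


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    print(upsert_deal(conn, "183456789", "BMW E39 528i", 8500))
    print(upsert_deal(conn, "183456789", "BMW E39 528i", 8200))
```

Keeping the current price on the deal row and appending old/new pairs to a separate table makes "latest price" queries cheap while preserving the full change log.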
Edit deal_watcher/config/config.json:
{
  "scrapers": [
    {
      "name": "BMW E-Series Manual",
      "enabled": true,
      "category_id": 1,
      "url": "https://auto.bazos.sk/bmw/",
      "type": "auto",
      "max_pages": 10,
      "filters": {
        "keywords_any": ["E36", "E46", "E39"],
        "keywords_all": ["benzin", "manuál"],
        "keywords_engine": ["6 valec", "6-valec"],
        "keywords_excluded": ["havarovan", "automat"]
      }
    }
  ]
}
Auto Filter:
- keywords_any: At least one must match (models)
- keywords_all: All must be present (fuel, transmission)
- keywords_engine: At least one engine type must match
- keywords_excluded: None should be present
- price_min, price_max: Price range
Reality Filter:
- area_min: Minimum area in m²
- price_max: Maximum price
- keywords_excluded: Exclude specific terms
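The Auto Filter rules above could be evaluated roughly like this. The sketch assumes case-insensitive substring matching on the listing text and treats a listing without a price as passing the price range; the actual logic in `auto_filter.py` may differ, and `matches_auto` is a hypothetical name.

```python
def matches_auto(text, price, filters):
    """Return True when a listing text and price satisfy the auto filter rules."""
    haystack = text.casefold()

    def has(keyword):
        # Substring match, so a stem like "havarovan" also catches inflected forms.
        return keyword.casefold() in haystack

    any_kw = filters.get("keywords_any", [])
    if any_kw and not any(has(k) for k in any_kw):
        return False  # none of the wanted models is mentioned

    if not all(has(k) for k in filters.get("keywords_all", [])):
        return False  # a required term (fuel, transmission) is missing

    engine_kw = filters.get("keywords_engine", [])
    if engine_kw and not any(has(k) for k in engine_kw):
        return False  # no accepted engine description found

    if any(has(k) for k in filters.get("keywords_excluded", [])):
        return False  # an excluded term is present

    if price is not None:
        price_min = filters.get("price_min")
        price_max = filters.get("price_max")
        if price_min is not None and price < price_min:
            return False
        if price_max is not None and price > price_max:
            return False
    return True


if __name__ == "__main__":
    filters = {
        "keywords_any": ["E36", "E46", "E39"],
        "keywords_all": ["benzin", "manuál"],
        "keywords_engine": ["6 valec", "6-valec"],
        "keywords_excluded": ["havarovan", "automat"],
    }
    print(matches_auto("BMW E46 330i, benzin, manuál, 6 valec", 12500, filters))
```

Plain substring matching is simple but blunt: a keyword written with different diacritics or spacing in a listing will be missed, so the keyword lists usually need a few spelling variants.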
- Create scraper class inheriting from BaseScraper
- Implement required abstract methods
- Add configuration to config.json
- Create corresponding filter class

- Create filter class inheriting from BaseFilter
- Implement matches() method
- Add filter factory logic in main.py
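The filter steps above can be sketched as a small class hierarchy. The source names `BaseFilter` and `matches()` but does not show their signatures, so the constructor, the listing dictionary keys (`area_m2`, `price`, `title`, `description`), and the `create_filter` factory below are all assumptions for illustration.

```python
from abc import ABC, abstractmethod


class BaseFilter(ABC):
    """Assumed shape of the abstract base: holds the 'filters' config block."""

    def __init__(self, config):
        self.config = config

    @abstractmethod
    def matches(self, listing) -> bool:
        """Return True when the listing should be kept."""


class RealityFilter(BaseFilter):
    """Keeps listings with enough area, a low enough price, and no excluded terms."""

    def matches(self, listing) -> bool:
        area_min = self.config.get("area_min")
        price_max = self.config.get("price_max")
        area = listing.get("area_m2")
        price = listing.get("price")

        # Assumption: a listing missing a required number is rejected.
        if area_min is not None and (area is None or area < area_min):
            return False
        if price_max is not None and (price is None or price > price_max):
            return False

        text = f"{listing.get('title', '')} {listing.get('description', '')}".casefold()
        excluded = self.config.get("keywords_excluded", [])
        return not any(k.casefold() in text for k in excluded)


# Hypothetical factory: maps the scraper "type" from config.json to a filter class.
FILTER_TYPES = {"reality": RealityFilter}


def create_filter(scraper_config):
    filter_cls = FILTER_TYPES[scraper_config["type"]]
    return filter_cls(scraper_config.get("filters", {}))
```

With this layout, supporting a new category would mean writing one subclass and adding one entry to the mapping, which keeps the factory logic free of category-specific branches.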
# Test PostgreSQL connection
psql -U username -d deal_watcher -c "SELECT 1;"
# Ensure you're in the project root
export PYTHONPATH="${PYTHONPATH}:$(pwd)"
- Set LOG_LEVEL=DEBUG in .env for detailed logs
- Verify website structure hasn't changed
- Check network connectivity
- Review rate limiting settings
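When requests fail intermittently, a retry wrapper with growing delays is one common way to combine retries with gentle rate limiting. The sketch below is not the project's `http_client.py`: `fetch_with_retries` and its parameters are hypothetical, and the actual fetch function is passed in so the example stays free of network calls.

```python
import time


def fetch_with_retries(fetch, url, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying on OSError with exponentially growing pauses."""
    if retries < 1:
        raise ValueError("retries must be at least 1")

    last_error = None
    for attempt in range(retries):
        try:
            return fetch(url)
        except OSError as exc:  # urllib.error.URLError is a subclass of OSError
            last_error = exc
            if attempt < retries - 1:
                # Pause for base_delay, then 2x, 4x, ... before the next attempt.
                sleep(base_delay * 2 ** attempt)
    raise last_error


# Possible usage with the standard library (performs a real request, so it is
# left as a comment here):
#
#   from urllib.request import urlopen
#   html = fetch_with_retries(lambda u: urlopen(u, timeout=10).read(), url)
```

Injecting `sleep` makes the backoff schedule easy to inspect in tests without actually waiting.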
- This tool is for personal use only
- Respect website Terms of Service
- Implement appropriate rate limiting
- Do not republish scraped content
- Review robots.txt compliance
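For the robots.txt point, Python's standard library ships `urllib.robotparser`, which can evaluate rules before a URL is requested. The sketch below parses rules from a string; the sample rules and the `example.com` URLs are made up for illustration and say nothing about any real site's policy.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Invented sample rules, not copied from any real website.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    print(is_allowed(SAMPLE_ROBOTS, "deal-watcher", "https://example.com/listing/1"))
    print(is_allowed(SAMPLE_ROBOTS, "deal-watcher", "https://example.com/private/x"))
```

In a live run, the rules would typically be downloaded once per site and reused for every URL in that run rather than fetched per request.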
All scraped listings are automatically cached to the file system, enabling:
- Offline Analysis: Test filters without re-scraping
- Historical Tracking: Multiple versions saved when content changes
- Fast Iteration: Develop filters on cached data
- Data Persistence: Keep records even after listings are removed
See docs/architecture/cache_system.md for detailed documentation.
cache/bazos/auto/184779117/
├── 2025-11-15_143020.json # Initial scrape
└── 2025-11-16_091234.json # After price change
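The versioned layout shown above could be produced by a helper along these lines. It is a sketch under stated assumptions, not the project's `cache_manager.py`: `save_version` is a hypothetical name, and change detection is approximated by hashing the canonical JSON of the listing and comparing it with the most recent cached file.

```python
import hashlib
import json
from datetime import datetime
from pathlib import Path


def _digest(listing):
    """Stable hash of a listing: keys are sorted so ordering does not matter."""
    payload = json.dumps(listing, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def save_version(cache_dir, source, category, listing_id, listing, now=None):
    """Write a new timestamped JSON file only if the listing content changed.

    Returns the path of the new file, or None when nothing changed.
    """
    listing_dir = Path(cache_dir) / source / category / str(listing_id)
    listing_dir.mkdir(parents=True, exist_ok=True)

    # File names like 2025-11-15_143020.json sort chronologically as strings.
    versions = sorted(listing_dir.glob("*.json"))
    if versions:
        latest = json.loads(versions[-1].read_text(encoding="utf-8"))
        if _digest(latest) == _digest(listing):
            return None

    stamp = (now or datetime.now()).strftime("%Y-%m-%d_%H%M%S")
    path = listing_dir / f"{stamp}.json"
    path.write_text(json.dumps(listing, ensure_ascii=False, indent=2), encoding="utf-8")
    return path
```

Because unchanged listings produce no new file, repeated runs leave one file per distinct version instead of one per scrape.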
{
  "cache": {
    "enabled": true,
    "cache_dir": "cache",
    "save_all_listings": true
  }
}
- Email/Telegram notifications for new matches
- Web dashboard for browsing deals
- More sophisticated NLP filtering
- Additional website modules
- Cache analytics and visualization
- Deal similarity detection
- Market analytics and trends
This project is for educational and personal use only.
For issues, questions, or contributions, please refer to the project documentation:
- Quick Reference: docs/_quick_summary.md
- Work Summary: docs/_work.md
- Architecture: docs/architecture/design_document.md
- Setup Guide: docs/setup/setup_guide.md