Time series forecasting project using the Store Sales - Time Series Forecasting dataset from Kaggle. Predicts daily unit sales for Corporacion Favorita, an Ecuadorian grocery chain with 54 stores and 33 product families.
Forecast daily sales for 1,782 store-family combinations (54 stores × 33 product families) using historical sales data (2013-2017), oil prices, holidays, store metadata, and transaction counts. Evaluation metric: RMSLE (Root Mean Squared Logarithmic Error).
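For reference, RMSLE can be computed as in the minimal sketch below. The function name `rmsle` and the clipping of negative forecasts to zero are assumptions for illustration, not necessarily how the notebooks implement it.

```python
import numpy as np


def rmsle(y_true, y_pred):
    """Root Mean Squared Logarithmic Error for non-negative targets."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Assumption: negative forecasts are clipped to zero so log1p stays defined.
    y_pred = np.clip(y_pred, 0.0, None)
    log_diff = np.log1p(y_pred) - np.log1p(y_true)
    return float(np.sqrt(np.mean(log_diff ** 2)))
```

Because the metric compares `log1p` values, it penalizes relative rather than absolute errors, which suits series whose sales volumes differ by orders of magnitude.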
| Phase | Notebook | Description |
|---|---|---|
| EDA | 01_data_exploration.ipynb | Sales distributions, temporal patterns, oil/holiday/store analysis |
| Features | 02_feature_engineering.ipynb | Lag/rolling/cyclical features, holiday processing, temporal split |
| Baselines | 03_baseline_models.ipynb | Global Mean, Naive, Seasonal Naive, SMA, WMA |
| Statistical | 04_statistical_models.ipynb | Decomposition, Exp. Smoothing, ARIMA, SARIMA |
| ML | 05_ml_models.ipynb | Random Forest, XGBoost, LightGBM |
| Evaluation | 06_evaluation_and_insights.ipynb | Grand comparison, error analysis, business insights |
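As an illustration of the Seasonal Naive baseline listed above, here is a minimal sketch for a single store-family series. The helper name `seasonal_naive` and the 7-day season length are assumptions chosen to match the weekly pattern in the data; the notebook's own implementation may differ.

```python
import pandas as pd


def seasonal_naive(history: pd.Series, horizon: int, season: int = 7) -> pd.Series:
    """Forecast by repeating the last `season` observed values."""
    if len(history) < season:
        raise ValueError("history must contain at least one full season")
    last_season = history.iloc[-season:].to_numpy()
    # Cycle through the last observed season until the horizon is covered.
    values = [last_season[i % season] for i in range(horizon)]
    future_index = pd.date_range(
        history.index[-1] + pd.Timedelta(days=1), periods=horizon, freq="D"
    )
    return pd.Series(values, index=future_index, name="forecast")


# Usage with a toy daily series (values are illustrative only).
history = pd.Series(
    range(1, 15), index=pd.date_range("2017-08-01", periods=14, freq="D")
)
forecast = seasonal_naive(history, horizon=10)
```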
- Seasonal Naive is the strongest baseline (strong weekly seasonality)
- Holt-Winters and SARIMA capture weekly patterns, beating simple baselines
- LightGBM delivers the best overall performance across all store-family combinations
- Top features: lag features (lag_1, lag_7) > day_of_week > rolling means > promotions
- Log-transform target (`log1p`) aligns MSE loss with RMSLE metric
- Leakage prevention: `shift(1)` before all rolling features; temporal-only splits
- Holiday handling: separate flags for national, regional, and local holidays
- Oil prices: forward-filled for non-trading days
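
The leakage-prevention and oil-price points above can be sketched as follows. This is an illustrative example, not the notebook code: the helper names `add_lag_features` and `fill_oil` are hypothetical, and the column names (`date`, `store_nbr`, `family`, `sales`, `dcoilwtico`) are assumed to match the Kaggle CSVs.

```python
import numpy as np
import pandas as pd


def add_lag_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add a log target, lags, and a leakage-safe rolling mean per series."""
    df = df.sort_values(["store_nbr", "family", "date"]).copy()
    df["sales_log"] = np.log1p(df["sales"])
    grouped = df.groupby(["store_nbr", "family"])["sales_log"]
    df["lag_1"] = grouped.shift(1)
    df["lag_7"] = grouped.shift(7)
    # shift(1) runs before rolling, so the window for day t ends at day t-1
    # and never includes the value being predicted.
    df["roll_mean_7"] = grouped.transform(
        lambda s: s.shift(1).rolling(7, min_periods=1).mean()
    )
    return df


def fill_oil(oil: pd.DataFrame, start: str, end: str) -> pd.DataFrame:
    """Expand oil prices to a daily calendar and forward-fill the gaps."""
    calendar = pd.date_range(start, end, freq="D")
    prices = oil.assign(date=pd.to_datetime(oil["date"])).set_index("date")
    filled = prices["dcoilwtico"].reindex(calendar).ffill()
    # Days before the first observed price stay NaN; ffill cannot fill them.
    return filled.rename_axis("date").reset_index()
```

Grouping by store and family before shifting keeps one series from borrowing history from another, and forward-filling only ever copies a past price onto a later day, so neither step looks ahead.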
demand-forecasting/
├── data/raw/ # Original CSVs (not tracked)
├── data/processed/ # Parquet feature files
├── notebooks/ # 6 Jupyter notebooks
├── figures/ # Saved plots
├── models/ # Saved model artifacts
├── app/ # Streamlit dashboard
├── requirements.txt
└── .gitignore
# 1. Create venv
python3 -m venv --system-site-packages .venv
source .venv/bin/activate
# 2. Install dependencies
pip install -r requirements.txt
# 3. Download data (requires Kaggle API key)
mkdir -p data/raw
kaggle competitions download -c store-sales-time-series-forecasting -p data/raw
unzip data/raw/store-sales-time-series-forecasting.zip -d data/raw
# 4. Register Jupyter kernel
python -m ipykernel install --user --name=demand-forecasting
# 5. Run notebooks in order (01 -> 06)
jupyter lab
# 6. Optional: run Streamlit dashboard
streamlit run app/streamlit_dashboard.py

Python 3.14 | pandas | NumPy | Matplotlib | Seaborn | scikit-learn | statsmodels | XGBoost | LightGBM | Streamlit