felipefl142/Store-Sales-Forecasting

Demand Forecasting — Store Sales (Favorita - Ecuador)

Time series forecasting project using the Store Sales - Time Series Forecasting dataset from Kaggle. It predicts daily unit sales for Corporación Favorita, an Ecuadorian grocery chain with 54 stores and 33 product families.

Problem Statement

Forecast daily sales for 1,782 store-family combinations (54 stores × 33 product families) using historical sales data (2013-2017), oil prices, holidays, store metadata, and transaction counts. Evaluation metric: RMSLE (Root Mean Squared Logarithmic Error).
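For reference, the metric can be computed with a few lines of NumPy. This is a minimal sketch, not code taken from the notebooks; the function name `rmsle` and the clipping of negative predictions to zero are assumptions made for illustration.

```python
import numpy as np


def rmsle(y_true, y_pred):
    """Root Mean Squared Logarithmic Error for non-negative targets."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Assumption: negative predictions are clipped to 0 so log1p stays defined.
    y_pred = np.clip(y_pred, 0.0, None)
    # log1p(x) = log(1 + x), which handles zero sales without special cases.
    squared_log_error = (np.log1p(y_pred) - np.log1p(y_true)) ** 2
    return float(np.sqrt(np.mean(squared_log_error)))
```

Because the error is measured on a log scale, RMSLE penalizes relative rather than absolute deviations, which suits a mix of low-volume and high-volume product families.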

Methodology

| Phase       | Notebook                           | Description                                                        |
|-------------|------------------------------------|--------------------------------------------------------------------|
| EDA         | 01_data_exploration.ipynb          | Sales distributions, temporal patterns, oil/holiday/store analysis |
| Features    | 02_feature_engineering.ipynb       | Lag/rolling/cyclical features, holiday processing, temporal split  |
| Baselines   | 03_baseline_models.ipynb           | Global Mean, Naive, Seasonal Naive, SMA, WMA                       |
| Statistical | 04_statistical_models.ipynb        | Decomposition, Exp. Smoothing, ARIMA, SARIMA                       |
| ML          | 05_ml_models.ipynb                 | Random Forest, XGBoost, LightGBM                                   |
| Evaluation  | 06_evaluation_and_insights.ipynb   | Grand comparison, error analysis, business insights                |
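The cyclical features and temporal split from the feature engineering phase could look roughly like the sketch below. It is an illustration under assumptions, not the notebook's code: the `date` column name, the helper names, and the `dow_sin`/`dow_cos` feature names are hypothetical.

```python
import numpy as np
import pandas as pd


def add_cyclical_day_of_week(df, date_col="date"):
    """Encode day of week on the unit circle so Sunday and Monday are neighbors."""
    out = df.copy()
    dow = out[date_col].dt.dayofweek  # Monday=0 ... Sunday=6
    out["dow_sin"] = np.sin(2 * np.pi * dow / 7)
    out["dow_cos"] = np.cos(2 * np.pi * dow / 7)
    return out


def temporal_split(df, cutoff, date_col="date"):
    """Split by time: rows before the cutoff train, rows on or after it validate."""
    cutoff = pd.Timestamp(cutoff)
    train = df[df[date_col] < cutoff]
    valid = df[df[date_col] >= cutoff]
    return train, valid
```

A time-based cutoff keeps every validation row strictly later than every training row, which a random shuffle would not guarantee.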

Key Results

  • Seasonal Naive is the strongest baseline (strong weekly seasonality)
  • Holt-Winters and SARIMA capture weekly patterns, beating simple baselines
  • LightGBM delivers the best overall performance across all store-family combinations
  • Top features: lag features (lag_1, lag_7) > day_of_week > rolling means > promotions
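To make the Seasonal Naive baseline concrete: it repeats the most recent full season of observations into the future. The sketch below assumes a weekly season of 7 days and a daily `DatetimeIndex`; the function name and signature are hypothetical and not taken from the notebooks.

```python
import pandas as pd


def seasonal_naive_forecast(history, horizon, season_length=7):
    """Forecast by repeating the last observed season (weekly by default)."""
    if len(history) < season_length:
        raise ValueError("history must cover at least one full season")
    last_season = history.iloc[-season_length:].to_numpy()
    # Day h of the forecast copies the value from the same weekday last week.
    values = [last_season[i % season_length] for i in range(horizon)]
    index = pd.date_range(
        history.index[-1] + pd.Timedelta(days=1), periods=horizon, freq="D"
    )
    return pd.Series(values, index=index, name="forecast")
```

When a series is dominated by a weekly pattern, this forecast is hard to beat without features that explain deviations from the usual week, such as promotions or holidays.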

Key Technical Decisions

  • Log-transform target (log1p): minimizing MSE on log1p(sales) is equivalent to minimizing RMSLE on the original scale
  • Leakage prevention: shift(1) before all rolling features, so each window sees only past values; temporal-only splits
  • Holiday handling: separate flags for national, regional, and local holidays
  • Oil prices: forward-filled for non-trading days
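The decisions above can be sketched in pandas as follows. This is a hedged illustration rather than the notebook's implementation: the column names `store_nbr`, `family`, `sales`, and `date`, the 7-day window, the `rolling_mean_7` name, and both helper function names are assumptions.

```python
import numpy as np
import pandas as pd


def add_leakage_safe_features(df, group_cols=("store_nbr", "family"), target="sales"):
    """Add log target, lags, and a rolling mean built only from past values."""
    out = df.sort_values([*group_cols, "date"]).copy()
    out["log_sales"] = np.log1p(out[target])
    grouped = out.groupby(list(group_cols))["log_sales"]
    # Lags are computed within each store-family series, never across series.
    out["lag_1"] = grouped.shift(1)
    out["lag_7"] = grouped.shift(7)
    # shift(1) first: the window for day t then ends at day t-1.
    out["rolling_mean_7"] = grouped.transform(
        lambda s: s.shift(1).rolling(7, min_periods=1).mean()
    )
    return out


def forward_fill_oil(oil, calendar):
    """Reindex oil prices to a full daily calendar, carrying the last price forward."""
    return oil.reindex(calendar).ffill()
```

Without the `shift(1)`, the rolling window for a given day would include that day's own sales, so the model would effectively see the target it is asked to predict.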

Project Structure

demand-forecasting/
├── data/raw/              # Original CSVs (not tracked)
├── data/processed/        # Parquet feature files
├── notebooks/             # 6 Jupyter notebooks
├── figures/               # Saved plots
├── models/                # Saved model artifacts
├── app/                   # Streamlit dashboard
├── requirements.txt
└── .gitignore

Reproduction

# 1. Create venv
python3 -m venv --system-site-packages .venv
source .venv/bin/activate

# 2. Install dependencies
pip install -r requirements.txt

# 3. Download data (requires Kaggle API key)
mkdir -p data/raw
kaggle competitions download -c store-sales-time-series-forecasting -p data/raw
unzip data/raw/store-sales-time-series-forecasting.zip -d data/raw

# 4. Register Jupyter kernel
python -m ipykernel install --user --name=demand-forecasting

# 5. Run notebooks in order (01 -> 06)
jupyter lab

# 6. Optional: run Streamlit dashboard
streamlit run app/streamlit_dashboard.py

Tech Stack

Python 3.14 | pandas | NumPy | Matplotlib | Seaborn | scikit-learn | statsmodels | XGBoost | LightGBM | Streamlit
