
MACHINE LEARNING OPERATIONS PIPELINE


Built with the following tools and technologies:

Flask · scikit-learn · XGBoost · DVC · MLflow · Docker · Kubernetes · Apache Airflow · GitHub Actions · Prometheus · Grafana


📊 About the Project

This repository demonstrates a comprehensive MLOps pipeline that showcases industry-standard practices for end-to-end machine learning workflow automation. The project implements a production-ready ML system with automated training, validation, deployment, and monitoring capabilities.

🎯 Key Features

  • 6-Stage DVC Pipeline: Data ingestion → Validation → Feature engineering → Transformation → Training → Evaluation
  • XGBoost Model: Achieved 92.15% accuracy with automated hyperparameter tuning
  • MLflow Integration: Experiment tracking and model registry for version control
  • Production Monitoring: 15+ ML metrics with Prometheus, Grafana dashboards, and health endpoints
  • Container Orchestration: Docker containerization with Kubernetes deployment
  • CI/CD Automation: GitHub Actions for testing, security scanning, and deployment

🚀 Getting Started

Prerequisites

  • Python 3.11+
  • Docker & Docker Compose
  • Git & DVC
  • Kaggle Account (for data access)

Installation

1. Clone the repository

```shell
git clone https://github.com/Abeshith/MLOps_PipeLine.git
cd MLOps_PipeLine
```

2. Set up the Python environment

```shell
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
```

3. Configure Kaggle credentials by creating `kaggle.json` in the `~/.kaggle/` directory:

```json
{
  "username": "your_kaggle_username",
  "key": "your_kaggle_key"
}
```
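The credentials step can also be scripted. The values below are placeholders you must replace with your own; `chmod 600` matches the restrictive permissions the Kaggle client expects for this file.

```shell
# Create the Kaggle credentials file with the permissions the Kaggle client expects.
mkdir -p ~/.kaggle
cat > ~/.kaggle/kaggle.json <<'EOF'
{
  "username": "your_kaggle_username",
  "key": "your_kaggle_key"
}
EOF
chmod 600 ~/.kaggle/kaggle.json  # keep credentials unreadable by other users
```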

🔄 Pipeline Execution

Complete Pipeline

```shell
# Run all stages from the repository root (with the venv activated)
PYTHONPATH="$(pwd)/src" python main.py

# Or use DVC
dvc repro
```
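`dvc repro` is driven by the stage definitions in `dvc.yaml`. A hypothetical sketch of what two of the six stages might look like is below; the dependency and output paths are assumptions, so consult the repository's actual `dvc.yaml` for the real ones.

```yaml
stages:
  data_ingestion:
    cmd: python -m src.mlpipeline.pipeline.stage_01_data_ingestion
    outs:
      - artifacts/data_ingestion
  model_trainer:
    cmd: python -m src.mlpipeline.pipeline.stage_05_model_trainer
    deps:
      - artifacts/data_transformation
    outs:
      - artifacts/model_trainer
```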

Individual Stages

```shell
# Run from the repository root with the virtual environment activated
export PYTHONPATH="$(pwd)/src"

# Stage 1: Data Ingestion
python -m src.mlpipeline.pipeline.stage_01_data_ingestion

# Stage 2: Data Validation
python -m src.mlpipeline.pipeline.stage_02_data_validation

# Stage 3: Feature Engineering
python -m src.mlpipeline.pipeline.stage_03_feature_engineering

# Stage 4: Data Transformation
python -m src.mlpipeline.pipeline.stage_04_data_transformation

# Stage 5: Model Training
python -m src.mlpipeline.pipeline.stage_05_model_trainer

# Stage 6: Model Evaluation
python -m src.mlpipeline.pipeline.stage_06_model_evaluation
```

Flask Application

```shell
# Start the web interface (from the repository root, venv activated)
PYTHONPATH="$(pwd)/src" python app.py
# Access at: http://localhost:5000

# Production app with monitoring
PYTHONPATH="$(pwd)/src" python production_app.py
# Metrics at: http://localhost:5000/metrics
```
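Once the production app is up, a small script can poll its endpoints. This is a sketch that assumes a `/health` path exists alongside `/metrics` (based on the health-endpoint feature above); adjust the paths to whatever `production_app.py` actually serves.

```shell
# Print a one-line status for each monitored endpoint.
BASE_URL="${BASE_URL:-http://localhost:5000}"
check() {
  if command -v curl >/dev/null 2>&1 \
     && curl -fsS --max-time 2 "$BASE_URL$1" >/dev/null 2>&1; then
    echo "OK   $1"
  else
    echo "DOWN $1"
  fi
}
{ check /health; check /metrics; } | tee /tmp/mlapp_health.txt
```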

📊 Monitoring & Observability

Start Monitoring Stack

```shell
cd observability
docker compose up -d

# Access monitoring tools:
# Prometheus: http://localhost:9090
# Grafana:    http://localhost:3000
# Kibana:     http://localhost:5601
```

Available Metrics

  • Model Performance: accuracy, precision, recall, F1-score
  • Prediction Analytics: confidence scores, class distribution
  • System Health: error rates, response times, resource usage

📖 For detailed observability setup and configuration, see Observability.md


☸️ Kubernetes Deployment

```shell
# Start a local cluster
minikube start

# Deploy the application
kubectl apply -f k8s/deployment.yaml
kubectl apply -f k8s/service.yaml

# Access the application
kubectl port-forward svc/mlapp-service 8000:80
# Navigate to: http://localhost:8000
```

🔧 Apache Airflow Pipeline

```shell
# Set up Airflow (Linux/WSL)
export AIRFLOW_HOME=~/airflow
mkdir -p ~/airflow/dags
cp model_dag.py ~/airflow/dags/

# Start Airflow
airflow standalone

# Check the DAG file for import errors
python ~/airflow/dags/model_dag.py

# Access the UI at http://localhost:8080 and trigger: ml_pipeline_dag
```

📈 Model Performance

  • Algorithm: XGBoost Classifier
  • Accuracy: 92.15%
  • Precision: 91.47%
  • Recall: 92.00%
  • F1-Score: 91.64%
  • AUC: 94.04%

📁 Project Structure

```
MLOps_PipeLine/
├── src/mlpipeline/          # Core ML pipeline components and stages
├── config/                  # Configuration files for pipeline settings
├── artifacts/               # Generated model artifacts and data (DVC tracked)
├── k8s/                     # Kubernetes deployment manifests
├── observability/           # Complete monitoring stack with Prometheus, Grafana
├── .github/workflows/       # CI/CD automation pipelines
├── dvc.yaml                 # DVC pipeline definition and stages
├── Dockerfile               # Container definition for deployment
├── model_dag.py             # Apache Airflow DAG for pipeline orchestration
├── app.py                   # Basic Flask web application
├── production_app.py        # Production Flask app with monitoring
└── main.py                  # Main pipeline execution script
```

🎯 Key Achievements

End-to-End Automation: From data ingestion to model deployment
Scalable Infrastructure: Kubernetes orchestration with monitoring
Quality Assurance: Automated testing, validation, and security scanning
Observability: Comprehensive metrics, logging, and tracing
Continuous Integration: GitHub Actions for automated workflows
Model Governance: Version control, experiment tracking, performance monitoring


⭐ Star this repository if you found it helpful!
