Planning
End-to-End Machine Learning Pipeline with DVC, MLflow, and FastAPI
Description
This project builds a fully automated MLOps pipeline that covers model development, data and model versioning, experiment tracking, and production deployment. It uses the UCI Adult Census Income dataset to predict whether a person's annual income exceeds $50K (≤50K vs >50K). The pipeline combines DVC for data version control, MLflow for experiment tracking, and FastAPI for model serving, so that every run is reproducible, traceable, and ready for production. The workflow covers data preprocessing, model training, evaluation, and real-time prediction serving.
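DVC decides whether a pipeline stage must rerun by comparing content hashes of the stage's dependencies (MD5 by default) against the hashes recorded in `dvc.lock`. The plain-Python sketch below illustrates that idea. It is not DVC's implementation. `file_md5` and `stage_is_stale` are hypothetical helpers, and a simple dict stands in for the lock file.

```python
import hashlib
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def stage_is_stale(deps: list[Path], lock: dict[str, str]) -> bool:
    """A stage must rerun if any dependency's current hash differs from the
    hash recorded for it in the (hypothetical) lock mapping of path -> md5."""
    return any(lock.get(str(p)) != file_md5(p) for p in deps)
```

Because versions are identified by content rather than by filename or timestamp, touching a file without changing its bytes does not trigger a rerun. Any real edit to the data does.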
Objectives
- Develop a modular and reproducible ML pipeline with clear separation of data processing, training, evaluation, and serving stages.
- Version and track all datasets, model artifacts, and training experiments using DVC and MLflow for complete lineage tracking.
- Implement automated model deployment with FastAPI endpoints for real-time predictions and health monitoring.
- Automate report generation and metric logging to enable operational insights and data-driven model improvement decisions.
- Demonstrate integration of popular open-source MLOps tools for real-world production deployment scenarios.
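Reproducibility also depends on deterministic data splits. The sketch below shows a seeded stratified split, which keeps the proportion of the less frequent >50K class the same in both partitions and returns the same indices on every run. It uses the standard library only, and `stratified_split` is a hypothetical helper. In practice, scikit-learn's `train_test_split(..., stratify=y, random_state=...)` covers the same need.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx) so that each class keeps roughly the same
    proportion in both parts. A fixed seed makes the split reproducible."""
    rng = random.Random(seed)  # local RNG; leaves the global random state untouched
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, test_idx = [], []
    for label in sorted(by_class):  # sorted so iteration order is deterministic
        idxs = by_class[label]
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)
```

Recording the seed and `test_fraction` as pipeline parameters (for example, alongside the MLflow run) lets anyone regenerate the same split later.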
Deliverables
- Data ingestion and preprocessing scripts with feature engineering and data validation
- Model training, evaluation, and prediction scripts with comprehensive MLflow integration for experiment tracking
- Production-ready REST API with FastAPI serving predictions, health checks, and model information endpoints
- DVC pipeline files (dvc.yaml and dvc.lock) orchestrating all stages from data download to model evaluation
- Versioned datasets and model artifacts tracked via DVC and MLflow with complete reproducibility
- Visual and numerical performance reports including confusion matrices, ROC curves, and feature importance plots
- Command-line interface (CLI) for pipeline operations, model serving, and development workflows
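The numerical side of the evaluation report reduces to a few quantities: the confusion matrix, precision, recall, and ROC AUC. In the project these would more likely come from scikit-learn (`confusion_matrix`, `roc_auc_score`), with the scalar values logged through `mlflow.log_metrics`. The stdlib-only sketch below spells out what those numbers mean. It assumes labels are encoded as 0 (≤50K) and 1 (>50K), and the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp); assumes labels are 0 (<=50K) or 1 (>50K)."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def roc_auc(y_true, scores):
    """ROC AUC as the probability that a random positive outscores a random
    negative (ties count as half). O(n_pos * n_neg), fine for a sketch."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs both classes present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def summarize(y_true, scores, threshold=0.5):
    """Threshold the scores, then compute the report's headline metrics."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tn, fp, fn, tp = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "roc_auc": roc_auc(y_true, scores),
        # Same [[tn, fp], [fn, tp]] layout that scikit-learn uses for labels [0, 1].
        "confusion_matrix": [[tn, fp], [fn, tp]],
    }
```

For example, `summarize([0, 0, 1, 1], [0.1, 0.6, 0.4, 0.9])` gives accuracy 0.5 and ROC AUC 0.75. Three of the four positive/negative pairs are ranked correctly, even though the 0.5 threshold misclassifies two of the four rows.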
Together, these components provide a foundation for maintaining reliable, traceable ML models in production, built entirely from open-source tools.
Dataset: UCI Adult Census Income dataset, with 48,842 instances and 14 demographic and employment features for binary income classification.
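The raw UCI files (`adult.data` and `adult.test`) are headerless, comma-separated records: 14 features followed by the income label. Missing values are written as `?`, and labels in `adult.test` end with a trailing period (`>50K.`). The stdlib sketch below shows how the preprocessing stage might parse and validate one record. `parse_record` and `NUMERIC` are hypothetical names, the column names follow the UCI attribute list, and the caller is assumed to skip the non-record first line of `adult.test`.

```python
COLUMNS = [
    "age", "workclass", "fnlwgt", "education", "education-num",
    "marital-status", "occupation", "relationship", "race", "sex",
    "capital-gain", "capital-loss", "hours-per-week", "native-country",
]
NUMERIC = {"age", "fnlwgt", "education-num", "capital-gain",
           "capital-loss", "hours-per-week"}


def parse_record(line):
    """Parse one comma-separated Adult record into (features, label).

    Returns None for blank lines. Missing values ("?") become None.
    The label is 1 for >50K and 0 for <=50K.
    """
    fields = [f.strip() for f in line.split(",")]
    if fields == [""]:
        return None
    if len(fields) != len(COLUMNS) + 1:
        raise ValueError(f"expected {len(COLUMNS) + 1} fields, got {len(fields)}")
    *values, raw_label = fields
    features = {}
    for name, value in zip(COLUMNS, values):
        if value == "?":
            features[name] = None  # keep missingness explicit for later imputation
        elif name in NUMERIC:
            features[name] = int(value)
        else:
            features[name] = value
    label = raw_label.rstrip(".")  # adult.test labels carry a trailing period
    if label not in ("<=50K", ">50K"):
        raise ValueError(f"unexpected label {raw_label!r}")
    return features, int(label == ">50K")
```

Raising on malformed rows, instead of silently dropping them, is what lets the data-validation step fail the DVC stage early.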