Customer Churn Prevention Project

Keeping Our Customers Happy & Loyal
Situation

Our company was losing customers at an alarming rate, and we didn't know who was at risk of leaving until it was too late. The customer data was messy and scattered across multiple systems, making it impossible to identify warning signs.

The Challenge: Customer information was incomplete, data lived in separate systems, and only a small percentage of customers actually churned, which made patterns hard to spot.
Task

My goal was to build a system that could:

  • Predict which customers were likely to leave before they actually cancelled
  • Give our retention team enough time to reach out and save the relationship
  • Work with our messy, real-world data
  • Be accurate enough to trust for business decisions
Action

I took a systematic approach to solve this problem:

Step 1: Clean Up The Data
I built automated systems to fix missing information, combine data from different sources, and create a single, reliable customer database.
Step 2: Build A Smart Prediction Model
I created a predictive model that learned from past customer behavior to identify early warning signs of churn. I made sure the model paid extra attention to at-risk customers (since they're rare but important).
Step 3: Test & Validate
I thoroughly tested the model to ensure it was both accurate and reliable before rolling it out to the team.
Step 4: Deploy & Monitor
I worked with engineering to integrate this into our daily operations and created dashboards so teams could act on the insights immediately.
Results

The impact was significant and measurable:

  • 82% precision: when we predict churn, we're right 82% of the time
  • 78% catch rate: we catch 78% of customers who are about to leave
  • 15% reduction in churn

Customer Retention Impact:
  • Before: baseline churn rate, with customers lost without early warning
  • After: churn rate down 15%, as proactive retention saves relationships

Business Impact:
  • Retention team can now focus on the right customers at the right time
  • 15% reduction in customer churn translates to significant revenue savings
  • Faster response times: we identify at-risk customers one to two weeks (7-14 days) before they would cancel
  • Data-driven approach replaced gut feelings and guesswork

Customer Churn Prediction Model

ML-Driven Retention Analytics
Situation

The organization faced challenges in predicting customer churn due to:

  • High missingness rates across key customer attributes (20-30% null values in engagement features)
  • Data fragmentation across CRM, billing, and product usage systems with inconsistent schemas
  • Severe class imbalance: a churn rate of approximately 3-5%, which creates a highly skewed target distribution
  • No existing predictive infrastructure for proactive retention
Task

Design and operationalize an end-to-end churn prediction system that:

  • Handles missing data and inconsistent sources without significant information loss
  • Addresses class imbalance to optimize for both precision and recall
  • Achieves production-grade performance suitable for business decision-making
  • Integrates into existing retention workflows with minimal latency
Action

Data Engineering & Feature Pipeline:

  • Built robust ETL pipelines using Python and SQL to consolidate data from CRM (Salesforce), billing systems, and product analytics
  • Implemented several imputation strategies: MICE (multiple imputation by chained equations) for continuous variables, mode imputation for categoricals, and missingness indicators added as features
  • Created time-based features (recency, frequency, monetary value) and behavioral signals (engagement decay, feature adoption drop-off)
  • Applied feature engineering including interaction terms, ratio features, and temporal aggregations
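
The imputation and recency/frequency/monetary steps above can be illustrated with a small, dependency-free sketch. It is not the production pipeline: median imputation stands in for MICE to keep the example short, and the function names, column names, and row layout (`impute_with_indicators`, `rfm_features`, dicts with `None` for missing values) are assumptions made for illustration.

```python
from statistics import median, mode


def impute_with_indicators(rows, numeric_cols, categorical_cols):
    """Fill gaps and add a <col>_missing flag for every imputed column.

    Assumption: rows are dicts and missing values are None. Median
    imputation is a simple stand-in for MICE on the numeric columns.
    """
    fill = {}
    for col in numeric_cols:
        fill[col] = median(r[col] for r in rows if r[col] is not None)
    for col in categorical_cols:
        fill[col] = mode(r[col] for r in rows if r[col] is not None)

    imputed = []
    for r in rows:
        new = dict(r)
        for col, value in fill.items():
            # Keep the fact that the value was missing as its own feature.
            new[f"{col}_missing"] = int(r[col] is None)
            if r[col] is None:
                new[col] = value
        imputed.append(new)
    return imputed


def rfm_features(purchases, as_of):
    """Recency, frequency, and monetary value for one customer.

    Assumption: purchases is a list of (date, amount) tuples and as_of
    is the scoring date.
    """
    if not purchases:
        return {"recency_days": None, "frequency": 0, "monetary": 0.0}
    last_purchase = max(d for d, _ in purchases)
    return {
        "recency_days": (as_of - last_purchase).days,
        "frequency": len(purchases),
        "monetary": sum(amount for _, amount in purchases),
    }
```

Keeping the `_missing` flags lets a tree-based model learn from the absence of data itself, which can be informative when engagement fields go unpopulated for inactive customers.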

Modeling Approach:

  • Implemented class-weighted XGBoost to handle imbalance by assigning higher loss penalties to the minority class
  • Used stratified k-fold cross-validation to ensure representative sampling across folds
  • Optimized hyperparameters using Bayesian optimization (Optuna) with custom objective balancing precision and recall
  • Evaluated on AUC-PR, F1-score, and custom business metrics rather than accuracy
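
To make the class weighting and the AUC-PR evaluation concrete, here is a dependency-free sketch of what those ideas compute. It is not the XGBoost training code: the negative-to-positive ratio is the usual starting value for XGBoost's `scale_pos_weight` parameter, the weighted log loss shows how churners are penalized more heavily, and average precision is one common estimate of the area under the precision-recall curve. Function names are illustrative, and the code assumes at least one churner and probabilities strictly between 0 and 1.

```python
import math


def scale_pos_weight(labels):
    """Ratio of negatives to positives (1 = churned, 0 = retained)."""
    positives = sum(labels)
    negatives = len(labels) - positives
    return negatives / positives


def weighted_log_loss(labels, probs, pos_weight):
    """Log loss where each churner counts pos_weight times as much."""
    total, weight_sum = 0.0, 0.0
    for y, p in zip(labels, probs):
        w = pos_weight if y == 1 else 1.0
        total += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
        weight_sum += w
    return total / weight_sum


def average_precision(labels, scores):
    """Mean of precision at each churner's rank (ties not handled specially)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, precision_sum = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits
```

With a 3:97 class split, `scale_pos_weight` returns roughly 32, so a missed churner costs about as much as 32 misclassified retained customers. This also shows why accuracy is a poor metric here: predicting "no churn" for everyone would already be about 97% accurate.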

Deployment & Operationalization:

  • Deployed model using MLflow for versioning and Docker for containerization
  • Integrated scoring pipeline into nightly batch jobs with Airflow orchestration
  • Built monitoring dashboards to track prediction drift, feature drift, and model performance degradation
  • Implemented A/B testing framework to measure retention campaign effectiveness
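
One common way to quantify the prediction and feature drift mentioned above is the Population Stability Index (PSI). The sketch below is an assumption about how such a check could look, not the dashboards' actual logic. The equal-width binning, the `eps` smoothing, and the function name are illustrative choices.

```python
import math


def population_stability_index(expected, actual, n_bins=10, eps=1e-6):
    """PSI between a baseline sample and a recent sample of one feature.

    Bins are equal-width over the baseline range; values outside that
    range fall into the first or last bin. eps avoids log(0) for empty bins.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant feature

    def proportions(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), n_bins - 1)
            counts[idx] += 1
        return [max(c / len(values), eps) for c in counts]

    base = proportions(expected)
    recent = proportions(actual)
    return sum((a - e) * math.log(a / e) for e, a in zip(base, recent))
```

A frequently quoted rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift. Those cutoffs are conventions rather than guarantees, so alert thresholds are best tuned per feature.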
Results

Model Performance Metrics:

  • Precision (PPV): 82%
  • Recall (sensitivity): 78%
  • AUC-PR: 0.89
  • F1-score: 0.80
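
As a quick consistency check on the figures above, F1 is the harmonic mean of precision and recall, so it can be recomputed from the two reported values. The helper name is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Reported precision and recall from the results above.
print(round(f1_score(0.82, 0.78), 2))  # 0.8
```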

Business Impact:

  • 15% reduction in customer churn rate, measured via an A/B test over a 3-month period
  • 7-14 day early warning window enabling proactive retention interventions
  • Improved retention team efficiency by 40% by focusing outreach on high-probability churn cases
  • ROI of 5:1 on retention campaigns guided by model predictions
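
For readers who want to see how a churn reduction from an A/B test can be checked for significance, here is a minimal sketch of a two-proportion z-test using only the standard library. It assumes the 15% figure is a relative reduction (treated churn rate versus control). The function name is illustrative, and any counts passed to it in an example are hypothetical rather than the project's actual data.

```python
import math
from statistics import NormalDist


def churn_ab_test(control_churned, control_total, treated_churned, treated_total):
    """Relative churn reduction and a two-sided two-proportion z-test."""
    p_control = control_churned / control_total
    p_treated = treated_churned / treated_total
    pooled = (control_churned + treated_churned) / (control_total + treated_total)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / control_total + 1 / treated_total))
    z = (p_control - p_treated) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return {
        "relative_reduction": (p_control - p_treated) / p_control,
        "z": z,
        "p_value": p_value,
    }


# Hypothetical counts chosen only to illustrate a 15% relative reduction.
result = churn_ab_test(400, 10_000, 340, 10_000)
```

At churn rates of only a few percent, fairly large groups are needed before a 15% relative difference becomes statistically distinguishable from noise, which is one reason to run the test over a multi-month window.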

Technical Achievements:

  • Successfully handled class imbalance (3:97 ratio) through weighted loss and careful threshold selection
  • Built production-ready data pipelines processing 500K+ customer records daily
  • Achieved <50ms inference latency for real-time scoring use cases
  • Implemented automated retraining pipeline with monthly model refresh and performance monitoring
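
The "careful threshold selection" mentioned above can be sketched as a search over candidate cutoffs. This is an assumption about one reasonable policy, not the rule used in the project: it picks the cutoff with the highest recall among those that meet a minimum precision. The function name and return shape are illustrative, and the labels are assumed to contain at least one churner.

```python
def pick_threshold(labels, scores, min_precision):
    """Return (threshold, precision, recall) or None if no cutoff qualifies.

    A customer is flagged when score >= threshold. Among thresholds that
    reach min_precision, the one with the highest recall wins.
    """
    best = None
    for t in sorted(set(scores)):
        flagged = [s >= t for s in scores]
        tp = sum(1 for y, f in zip(labels, flagged) if f and y == 1)
        fp = sum(1 for y, f in zip(labels, flagged) if f and y == 0)
        fn = sum(1 for y, f in zip(labels, flagged) if not f and y == 1)
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        if precision >= min_precision and (best is None or recall > best[2]):
            best = (t, precision, recall)
    return best
```

Choosing the cutoff on a held-out validation set, rather than on the training data, keeps the reported precision and recall from being optimistic.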