Customer Churn Prevention Project

Situation

Our company was losing customers at an alarming rate, and we didn't know who was at risk of leaving until it was too late. The customer data was messy and scattered across multiple systems, making it impossible to identify warning signs.

The Challenge: We had incomplete customer information, data stored in different places, and only a small percentage of customers actually churned, which made patterns hard to spot.

Task

My goal was to build a system that could:

Predict which customers were likely to leave before they actually cancelled
Give our retention team enough time to reach out and save the relationship
Work with our messy, real-world data
Be accurate enough to trust for business decisions

Action

I took a systematic approach to solve this problem:

Step 1: Clean Up The Data

I built automated systems to fix missing information, combine data from different sources, and create a single, reliable customer database.

Step 2: Build A Smart Prediction Model

I created a predictive model that learned from past customer behavior to identify early warning signs of churn. I made sure the model paid extra attention to at-risk customers (since they're rare but important).

Step 3: Test & Validate

I thoroughly tested the model to ensure it was both accurate and reliable before rolling it out to the team.

Step 4: Deploy & Monitor

I worked with engineering to integrate this into our daily operations and created dashboards so teams could act on the insights immediately.

Results

The impact was significant and measurable:

82% Prediction Precision: when we predict churn, we're right 82% of the time
78% Catch Rate: we catch 78% of customers who are about to leave
15% Churn Reduction

Customer Retention Impact (chart): baseline churn rate, when customers were lost without early warning, versus the new churn rate after proactive retention, 15% lower.

Business Impact:

Retention team can now focus on the right customers at the right time
15% reduction in customer churn translates to significant revenue savings, with a 5:1 return on retention campaigns
Faster response times: we identify at-risk customers 7-14 days before they would cancel
Data-driven approach replaced gut feelings and guesswork

Situation

The organization faced challenges in predicting customer churn due to:

High missingness rates across key customer attributes (20-30% null values in engagement features)
Data fragmentation across CRM, billing, and product usage systems with inconsistent schemas
Severe class imbalance—churn rate of approximately 3-5%, creating a highly skewed target distribution
No existing predictive infrastructure for proactive retention
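
To make the imbalance problem concrete, consider a baseline that never predicts churn. The sketch below assumes a 4% churn rate (inside the 3-5% range above) and an illustrative population of 10,000 customers; the function name and the numbers are hypothetical, not taken from the project.

```python
def baseline_metrics(n_customers: int, churn_rate: float) -> dict:
    """Score a trivial model that predicts 'no churn' for every customer."""
    churners = round(n_customers * churn_rate)
    non_churners = n_customers - churners
    # Every non-churner counts as a correct prediction; every churner is missed.
    accuracy = non_churners / n_customers
    recall = 0.0  # no churner is ever flagged
    return {"accuracy": accuracy, "recall": recall, "missed": churners}


# Assumed, illustrative inputs: 10,000 customers at a 4% churn rate.
metrics = baseline_metrics(n_customers=10_000, churn_rate=0.04)
print(metrics)
```

Under these assumptions the do-nothing baseline is 96% accurate while catching no churners at all, which is why accuracy alone says little about a churn model.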

Task

Design and operationalize an end-to-end churn prediction system that:

Handles missing data and inconsistent sources without significant information loss
Addresses class imbalance to optimize for both precision and recall
Achieves production-grade performance suitable for business decision-making
Integrates into existing retention workflows with minimal latency

Action

Data Engineering & Feature Pipeline:

Built robust ETL pipelines using Python and SQL to consolidate data from CRM (Salesforce), billing systems, and product analytics
Implemented multiple imputation strategies: MICE for continuous variables, mode imputation for categoricals, and missingness indicators as features
Created time-based features (recency, frequency, monetary value) and behavioral signals (engagement decay, feature adoption drop-off)
Applied feature engineering including interaction terms, ratio features, and temporal aggregations
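
The imputation idea in the list above can be sketched in a few lines. This is a simplified illustration, not the project's pipeline: it swaps in median imputation where the project used MICE, keeps mode imputation for categoricals, and adds a missingness flag per column. The column names (`logins_30d`, `plan`) and the function name are hypothetical.

```python
from collections import Counter
from statistics import median


def impute_with_indicators(rows, numeric_cols, categorical_cols):
    """Fill missing values (None) and add a <col>_missing flag per column."""
    fills = {}
    for col in numeric_cols:
        observed = [r[col] for r in rows if r[col] is not None]
        fills[col] = median(observed)  # stand-in for MICE in this sketch
    for col in categorical_cols:
        observed = [r[col] for r in rows if r[col] is not None]
        fills[col] = Counter(observed).most_common(1)[0][0]  # mode

    imputed = []
    for r in rows:
        out = dict(r)
        for col, fill in fills.items():
            missing = r[col] is None
            out[f"{col}_missing"] = int(missing)  # keep the signal that it was absent
            if missing:
                out[col] = fill
        imputed.append(out)
    return imputed


# Hypothetical customer records with gaps.
customers = [
    {"logins_30d": 12, "plan": "pro"},
    {"logins_30d": None, "plan": "basic"},
    {"logins_30d": 4, "plan": None},
    {"logins_30d": 8, "plan": "basic"},
]
clean = impute_with_indicators(customers, ["logins_30d"], ["plan"])
```

In a real pipeline the fill values would be learned on the training split only and reused at scoring time, so that validation data does not leak into the imputation.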

                    

Modeling Approach:

Implemented class-weighted XGBoost to handle imbalance by assigning higher loss penalties to minority class
Used stratified k-fold cross-validation to ensure representative sampling across folds
Optimized hyperparameters using Bayesian optimization (Optuna) with custom objective balancing precision and recall
Evaluated on AUC-PR, F1-score, and custom business metrics rather than accuracy
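
Two of the steps above, class weighting and stratified folds, are easy to illustrate without any ML library. The sketch below assumes a 3:97 label split on 1,000 synthetic rows; the negative-to-positive ratio it computes is a common starting value for XGBoost's `scale_pos_weight` parameter, though the weight actually used in the project is not stated. Function names are hypothetical.

```python
import random
from collections import defaultdict


def positive_class_weight(labels):
    """Ratio of negatives to positives, a common starting point for scale_pos_weight."""
    positives = sum(labels)
    negatives = len(labels) - positives
    return negatives / positives


def stratified_folds(labels, k, seed=0):
    """Assign row indices to k folds so each fold keeps a similar churn rate."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal indices round-robin so each class is spread evenly across folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


# Synthetic labels: 30 churners (1) and 970 non-churners (0).
labels = [1] * 30 + [0] * 970
weight = positive_class_weight(labels)
folds = stratified_folds(labels, k=5)
churners_per_fold = [sum(labels[i] for i in fold) for fold in folds]
print(round(weight, 2), churners_per_fold)
```

With plain random splitting, a fold could end up with very few churners by chance; dealing each class out separately avoids that.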

                    

Deployment & Operationalization:

Deployed model using MLflow for versioning and Docker for containerization
Integrated scoring pipeline into nightly batch jobs with Airflow orchestration
Built monitoring dashboards to track prediction drift, feature drift, and model performance degradation
Implemented A/B testing framework to measure retention campaign effectiveness
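
As an illustration of what a prediction-drift check can look like, the sketch below computes a population stability index (PSI) between a reference score distribution and a later batch. PSI is an assumption here: the source does not say which drift statistic the dashboards used, and the scores, bin edges, and function name are made up for the example.

```python
import math


def population_stability_index(expected, actual, bin_edges, eps=1e-6):
    """PSI between two samples, binned on shared edges."""

    def bin_shares(values):
        counts = [0] * (len(bin_edges) + 1)
        for v in values:
            # The number of edges at or below v is the index of v's bin.
            counts[sum(v >= edge for edge in bin_edges)] += 1
        # Floor each share at eps so empty bins do not produce log(0).
        return [max(c / len(values), eps) for c in counts]

    e_shares = bin_shares(expected)
    a_shares = bin_shares(actual)
    return sum((a - e) * math.log(a / e) for e, a in zip(e_shares, a_shares))


# Hypothetical churn scores: reference batch vs. a later nightly batch.
training_scores = [0.05] * 70 + [0.30] * 20 + [0.80] * 10
tonight_scores = [0.05] * 50 + [0.30] * 20 + [0.80] * 30
edges = [0.2, 0.5]

psi_none = population_stability_index(training_scores, training_scores, edges)
psi_shift = population_stability_index(training_scores, tonight_scores, edges)
print(psi_none, round(psi_shift, 3))
```

Identical distributions give a PSI of zero, and the value grows as the later batch moves away from the reference. The same function can be applied per feature to track feature drift.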

                    

Results

Model Performance Metrics:

Precision (PPV): 82%
Recall (sensitivity): 78%
AUC-PR: 0.89
F1-score: 0.80
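
The reported F1-score can be checked directly from the precision and recall figures, since F1 is their harmonic mean. The second function is an illustrative extra: it shows what those rates imply for an assumed group of 100 actual churners (a made-up count, chosen only to make the arithmetic easy to read).

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def implied_counts(n_churners: float, precision: float, recall: float) -> dict:
    """Expected outcomes for a given number of actual churners."""
    true_positives = recall * n_churners   # churners the model flags
    flagged = true_positives / precision   # everyone the model flags
    return {
        "true_positives": true_positives,
        "false_positives": flagged - true_positives,
        "missed": n_churners - true_positives,
    }


f1 = f1_score(precision=0.82, recall=0.78)
counts = implied_counts(n_churners=100, precision=0.82, recall=0.78)
print(round(f1, 2))
print({name: round(value, 1) for name, value in counts.items()})
```

For every 100 churners, these rates correspond to 78 caught, 22 missed, and roughly 17 customers flagged who would not have left.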

Business Impact:

15% relative reduction in customer churn rate, measured via A/B test over a 3-month period
7-14 day early warning window enabling proactive retention interventions
Improved retention team efficiency by 40%—focused outreach on high-probability churn cases
ROI of 5:1 on retention campaigns guided by model predictions
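
A churn reduction measured by A/B test is typically read as a relative difference between the control and treated groups, checked with a significance test. The sketch below uses a two-proportion z-test and treats the 15% as a relative reduction, the only reading consistent with a 3-5% baseline. All counts are invented for illustration; the source reports only the 15% figure, not the group sizes.

```python
from math import sqrt
from statistics import NormalDist


def compare_churn(control_n, control_churned, treated_n, treated_churned):
    """Relative churn reduction plus a two-sided two-proportion z-test."""
    p_control = control_churned / control_n
    p_treated = treated_churned / treated_n
    relative_reduction = (p_control - p_treated) / p_control
    # Pooled churn rate under the null hypothesis of no difference.
    pooled = (control_churned + treated_churned) / (control_n + treated_n)
    std_err = sqrt(pooled * (1 - pooled) * (1 / control_n + 1 / treated_n))
    z = (p_control - p_treated) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return relative_reduction, z, p_value


# Hypothetical groups: 4.0% churn in control vs. 3.4% with model-guided outreach.
reduction, z, p_value = compare_churn(
    control_n=50_000, control_churned=2_000,
    treated_n=50_000, treated_churned=1_700,
)
print(f"relative reduction: {reduction:.0%}, z = {z:.2f}, p = {p_value:.1e}")
```

At low base rates a 15% relative change is a small absolute gap (0.6 percentage points in this example), so the group sizes largely determine whether the result is statistically distinguishable from noise.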

Technical Achievements:

Successfully handled class imbalance (3:97 ratio) through weighted loss and careful threshold selection
Built production-ready data pipelines processing 500K+ customer records daily
Achieved <50ms inference latency for real-time scoring use cases
Implemented automated retraining pipeline with monthly model refresh and performance monitoring
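
Threshold selection can be illustrated as a search over candidate cutoffs on validation scores. The sketch below picks the cutoff that maximizes F1; that criterion is an assumption, since the source does not state which objective the project's threshold was tuned against. The scores and labels are made up.

```python
def best_f1_threshold(scores, labels):
    """Return (threshold, f1) maximizing F1 when flagging score >= threshold."""
    best_threshold, best_f1 = None, -1.0
    for threshold in sorted(set(scores)):
        pairs = list(zip(scores, labels))
        tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
        fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
        fn = sum(1 for s, y in pairs if s < threshold and y == 1)
        if tp == 0:
            continue  # precision and F1 are undefined or zero here
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best_f1:
            best_threshold, best_f1 = threshold, f1
    return best_threshold, best_f1


# Hypothetical validation scores and true labels (1 = churned).
scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.60, 0.70, 0.90]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
threshold, f1 = best_f1_threshold(scores, labels)
print(threshold, round(f1, 3))
```

In this toy data the best cutoff sits below the conventional 0.5, which shows why the default threshold is worth revisiting when classes are imbalanced or the loss is reweighted.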
