School of Engineering and Technology: Naga Nikhil Kaushik A
Submitted By
Naga Nikhil Kaushik A (18BBTCS072)
Submitted to
Ramachandra H.V
Professor
Dept. of CSE, SoET, CMRU
Bengaluru - 562149
CERTIFICATE
Certified that the Project Work entitled “Telecom Churn”, carried out by NAGA NIKHIL
KAUSHIK A (18BBTCS072), a bona fide student of the SCHOOL OF ENGINEERING AND
TECHNOLOGY, in partial fulfilment for the award of the degree of BACHELOR OF
TECHNOLOGY in 8th Semester Computer Science and Engineering of CMR UNIVERSITY,
Bengaluru, during the year 2022. It is certified that all corrections/suggestions indicated for the
Internal Assessment have been incorporated in the report. The project has been approved as it
satisfies the academic requirements regarding project work prescribed for the said degree.
1. …………………… ……………………
2. …………………… ……………………
DECLARATION
I further declare that the work reported in this mini project has not been
submitted, and will not be submitted, either in part or in full, for the award of any other
degree in this or any other university or institution.
This project aims to investigate the main reasons for churn in the telecommunication sector. In
the telecom industry, customers can choose from multiple service providers and actively
switch from one operator to another. In this highly competitive market, the
telecommunications industry experiences an average annual churn rate of 15-25%. Given that
it costs 5-10 times more to acquire a new customer than to retain an existing one, customer
retention has become even more important than customer acquisition.
The proposed methodology for churn-prediction analysis covers several phases:
understanding the business; data selection, analysis, and processing; implementing various
classification algorithms; evaluating the classifiers; and choosing the best one for prediction.
Churn prediction is essentially a binary classification problem: will a given customer churn
or not?
The satisfaction that accompanies the successful completion of this project would be
incomplete without mentioning the people who made it possible, whose constant guidance
and encouragement kept these efforts from going in vain.
I express my sincere gratitude to my project guide Ramachandra H.V, Professor,
Dept. of CSE, without whose constant guidance and support the project would not have been
successful.
I also express my sincere thanks to my project co-guide Dr. R. Sathyaraj, Assistant
Professor, Dept. of CSE, whose constant guidance and support ensured the successful
completion of the project.
I would like to express my thanks to Dr. Rubini P, Professor and Head of the
Department of Computer Science and Engineering, School of Engineering and Technology,
CMR University, Bangalore, for the encouragement that motivated me towards the successful
completion of the project work.
I express my heartfelt sincere gratitude to Dr. C. Prabhakar Reddy, Dean,
School of Engineering and Technology, CMR University for his support.
CHAPTER I
INTRODUCTION TO TELECOM CHURN
    Background
    Problem Statement
    Objective
    Challenges
CHAPTER II
LITERATURE SURVEY
    Survey
    Existing Work
    Summary
    Points to Ponder
CHAPTER III
PROPOSED SYSTEM
    Algorithms Used
    Unique Features of the Project
    Advantages
CHAPTER IV
SOFTWARE REQUIREMENT SPECIFICATION
    Requirements
    Tools Used
        Hardware
        Software
CHAPTER V
SYSTEM DESIGN
    Architecture
    Process Flow
    Template for Predictive Models
    Tips
CHAPTER VI
BUILD (DEVELOPMENT)
    Loading Libraries
    Data Gathering
        Loading Data
        Data Set
        Data Cleaning
        Check for Duplicates
    Data Pre-processing
    Feature Selection
        Numerical Features Distribution
        Categorical Feature Distribution
    Exploratory Data Analysis (EDA)
    Model Selection
        Train-Test Split
        Feature Scaling
    Model Comparison
    Model Evaluation
    Model Improvement (Hyperparameter Tuning)
        Compare the Model Predictions Against the Test Set
CHAPTER VII
RESULTS
CHAPTER VIII
CONCLUSION
CHAPTER IX
FUTURE ENHANCEMENTS
CHAPTER X
REFERENCES
CHAPTER I
Introduction to Telecom Churn
In the telecom industry, customers can choose from multiple service providers and actively
switch from one operator to another. Technical progress and the increasing number of
operators have raised the level of competition, so proactively predicting the customers who
are at high risk of churning has become important for telecom companies. Comparing
retention strategies by their return on investment (RoI) shows that increasing the retention
period of customers is the most profitable strategy. In this highly competitive market, the
telecom industry experiences an average annual churn rate of 15-25%. Given that it costs
5-10 times more to acquire a new customer than to retain an existing one, customer
retention has now become even more important than customer acquisition for most telecom
operators.
Background
A primary goal for any service is to grow by adding customers or users through marketing
and sales. (This is true for both for-profit and non-profit enterprises.) Most service providers
focus on acquisitions, but to be successful, a service must also work to minimize churn. When
customers leave, people talk about churn; those who prefer to see the glass as half full talk
about customer retention. If churn is not addressed in an ongoing, proactive way, the gains
made through acquisition are steadily eroded.
Problem Statement
Customer churn means that a customer is leaving (attrition) the service of the existing
company or service provider and moving to another company or service provider. The main
cause of churn differs and varies from customer to customer. To predict when a customer
will leave and what the triggering point for leaving is, we should identify the actionable
factors behind churn.
Objective
This project aims to identify the reasons for customer churn. For this purpose, we are going to
build and compare classification models on historical customer data.
Challenges
Gathering data is a big challenge. The main reason is that customer data falls under
PCI/PII regulations, and obtaining such data involves commercial agreements. So here, for
this project, a publicly available dataset has been used.
CHAPTER II
Literature Survey
Survey
Retaining existing customers is vital for organizations looking to grow their business without
relying too heavily on the significantly higher cost of acquiring new customers. Marketing,
sales, and customer retention departments need to work to make sure customers are satisfied,
provide them with incentives, and present offers at the right time to reduce churn.
Existing Work
There is no one-size-fits-all algorithm for predictive analytics, as different models have their
own strengths and weaknesses. While the implementations of these algorithms are complex,
the underlying idea can be very simple. There are two major types of prediction algorithms,
classification and regression. Classification refers to predicting a discrete value such as a
label, while regression refers to predicting a continuous number such as a price.
Summary
A simple explanation is that predictive analytics leverages past experiences captured in data
to find patterns associated with the underlying problem and then makes an educated guess.
The best churn model is not the one with the best statistical precision; it is the one that
provides the best insights for preventing churn behaviour. A predictive analytics solution
tries to mimic human learning behaviour through the use of advanced statistics and machine
learning.
Points to Ponder
• For marketers and management, a predictive model is not the objective, it is a medium to
reach an objective
• The objective in this case is to reduce churn, i.e. to make customers stay longer
(and continue paying)
• To reduce churn, you have to know the actionable factors related to churn, and act to
prevent or change those factors.
• The best churn model will include these actionable factors as components of the model,
to be able to manage the churn prevention programs.
CHAPTER III
Proposed System
Algorithms Used
In order to find a possible solution to the problem of churn prediction, i.e. to successfully
apply a machine learning technique to the available data, one needs a deep understanding of
the business rules of the telecom company and their importance. This knowledge enables
selection of the attributes suitable for the problem at hand.
The scope of this project is to create a model to predict customer churn, compare several
models, make a final up-or-down prediction on whether a customer will churn, and propose a
solution to retain them.
Advantages
The solution is developed in a Python Jupyter Notebook. It is organised into two models as
follows:
Model 1: It will be used to predict whether a high-value customer will churn or not, in near
future (i.e. churn phase). By knowing this, the company can take action steps such as
providing special plans, discounts on recharge etc.
Model 2: It will be used to identify important variables that are strong predictors of churn.
These variables may also indicate why customers choose to switch to other networks.
CHAPTER IV
Software Requirement Specification
Requirements
In general, every machine learning project has certain important steps to follow.
Tools Used
Hardware
Software
CHAPTER V
System Design
System design follows an SDLC with a combination of the Agile and Waterfall models. The
main work starts from requirements gathering. After that, the requirements are shared between
the different teams, and accordingly all the teams work towards preparing the High Level
Design, Low Level Design, Test Cases, etc.
Architecture
The overall architecture is based on predictive analytics and the algorithms applied to the
training data.
Flow diagram of spot-check models:
Predictive analytics is a category of data analytics aimed at making predictions about future
outcomes based on historical data and analytics techniques. Predictive analytics uses a variety
of statistical techniques (including data mining, machine learning, and predictive modelling)
to understand future occurrences.
In supervised learning, we are given the data sets and already know what our correct output
should look like. Supervised learning problems are categorized into Classification and
Regression.
• Regression: In regression we fit the training data to a continuous function and try
to predict results within a continuous output, meaning that we are trying to map input
variables to some continuous function. E.g., predicting a customer's expected tenure or
monthly charges from their usage of the telecom services is a regression problem,
because the output is a continuous number.
• Classification: In classification the output is predicted as a discrete value such as yes or
no, true or false, 0 or 1, churn or not, diabetes or not, male or female, positive or
negative, etc. E.g., in the given telecom data, predicting whether a customer will churn
or not is a classification problem.
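The distinction above can be sketched with scikit-learn on made-up toy data (the tenure and charge values below are illustrative, not taken from the project's dataset):

```python
# Regression predicts a continuous number; classification predicts a discrete label.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Toy feature: tenure in months
tenure = np.array([[1], [5], [12], [24], [48], [60]])

# Regression target: a continuous value (e.g. total charges)
total_charges = np.array([70.0, 350.0, 840.0, 1680.0, 3360.0, 4200.0])
reg = LinearRegression().fit(tenure, total_charges)
print(reg.predict([[36]]))       # a continuous number

# Classification target: a discrete label (1 = churn, 0 = no churn)
churn = np.array([1, 1, 1, 0, 0, 0])
clf = LogisticRegression().fit(tenure, churn)
print(clf.predict([[36]]))       # a discrete class, 0 or 1
```

Here the short-tenure customers churn, so the classifier learns that a longer tenure maps to the "no churn" class.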
Process Flow
The data we got was mostly categorical. We then began with data cleaning,
pre-processing, removing unwanted columns, feature selection, and label encoding.
Template for Predictive Models:
In the real world, prediction models should go through the following major stages to successfully
predict customer churn:
1. Define Problem
2. Data Processing
3. Data Evaluation
4. Model Selection (Evaluate Algorithms)
5. Model Evaluation (Improve Accuracy Algorithms)
6. Model Improvement (Optimize Models)
7. Future Predictions
8. Present Results
9. Model Deployment
Python Project Template
1. Prepare Problem
a) Load libraries
b) Load dataset (reduce dataset if necessary; the model should build in ~30 sec. and can
always be scaled up later)
2. Summarize Data
a) Descriptive statistics (summaries)
b) Data visualizations
3. Prepare Data (start simple, revisit and cycle with next step until algorithm and presentation
of data is accurate enough)
a) Data cleansing (remove duplicates, mark/impute missing values)
b) Feature Selection (remove redundant features, develop new features)
c) Data Transforms (scale/redistribute attributes to best expose structure of problem)
4. Evaluate Algorithms (Model)
a) Split-out validation dataset
b) Test options and evaluation metric (cross validation and evaluation metric)
c) Spot-Check Algorithms
d) Compare Algorithms
5. Improve Accuracy
a) Algorithm tuning (search for a combination of parameters for each algorithm that yields
the best results)
b) Ensembles (combine prediction of multiple models into an ensemble prediction)
6. Finalize Model
a) Predictions on validation dataset
b) Create standalone model on entire training dataset
c) Save model for later use
Tips
1. Fast first pass: go through the project as fast as possible. This gives confidence that all the
parts are there and a baseline to improve on.
2. Cycles: loop through 3-4, 3-4-5
3. Attempt every step
4. Ratchet Accuracy: treat changes as experiments with goal to increase accuracy
5. Adapt as Needed: blur the edges of tasks such as 4-5 to best serve model accuracy
CHAPTER VI
Build (Development)
Loading Libraries
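The report does not reproduce the exact import cell; a typical set for a churn notebook of this kind would look like the following (an assumed list, not the project's verbatim code):

```python
# Core data-handling and plotting libraries
import numpy as np                    # numerical arrays
import pandas as pd                   # data loading and manipulation
import matplotlib.pyplot as plt       # histograms, bar charts, heat maps

# scikit-learn pieces used throughout the later sections
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, log_loss
```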
Data Gathering
Loading Data
Data Set:
We took this telecom dataset from an online source and derived all insights about the data
from it. Attributes of the dataset:
► Demographic attributes: contain the primary features of the customer such as sex, age,
nationality, place of residence, etc.
► Contract attributes: contain the attributes associated with the customer contract for a
particular service such as type of service, date of conclusion of the contract, price of
the service etc.
► Customer behaviour attributes: describe the customer activities.
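In the notebook the data would be loaded from a CSV (e.g. `pd.read_csv(...)`; the filename is not given in the report). As a sketch, a miniature hand-made frame with the same kinds of columns, using attribute names that appear later in the report:

```python
# A tiny stand-in for the telecom dataset; values are illustrative only.
import pandas as pd

df = pd.DataFrame({
    "gender":         ["Male", "Female", "Female", "Male"],
    "SeniorCitizen":  [0, 1, 0, 0],
    "tenure":         [1, 34, 2, 45],
    "Contract":       ["Month-to-month", "One year", "Month-to-month", "Two year"],
    "MonthlyCharges": [29.85, 56.95, 53.85, 42.30],
    "TotalCharges":   ["29.85", "1889.5", "108.15", " "],  # note the blank string
    "Churn":          ["No", "No", "Yes", "No"],
})
print(df.shape)    # (4, 7)
print(df.dtypes)   # TotalCharges loads as a string column because of blanks
```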
Data cleaning:
We have 2 types of features in the dataset: categorical (two or more values and without any
order) and numerical. Most of the feature names are self-explanatory, except for:
Data pre-processing is an important task in machine learning: it converts raw data into clean
data. The following techniques were applied to the data:
Missing Values: We had missing values in the TotalCharges feature, which we imputed with
the mean value. These blank entries, if not handled, would later lead to errors when
converting the data type, since the column stores a string value for empty spaces.
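The cleanup described above might look like this (a sketch; the column values are made up):

```python
# Blank strings become NaN when the column is coerced to numeric,
# and the NaNs are then filled with the column mean.
import pandas as pd

df = pd.DataFrame({"TotalCharges": ["29.85", "1889.5", " ", "108.15"]})

df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")
df["TotalCharges"] = df["TotalCharges"].fillna(df["TotalCharges"].mean())

print(df["TotalCharges"].isna().sum())   # 0
```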
Label Encoder: For categorical variables this is a convenient method of conversion into
numeric values, and it works well when there are multiple categories. We converted the
various categorical values into numeric form for further use in the algorithms.
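A minimal sketch of this step with scikit-learn's LabelEncoder (the sample values are illustrative):

```python
# LabelEncoder maps each category to an integer (classes are sorted alphabetically).
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({
    "Contract": ["Month-to-month", "Two year", "One year", "Month-to-month"],
    "Churn":    ["Yes", "No", "No", "Yes"],
})

le = LabelEncoder()
for col in ["Contract", "Churn"]:
    df[col] = le.fit_transform(df[col])

print(df["Churn"].tolist())   # "No" -> 0, "Yes" -> 1
```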
Drop Columns: From the insights we took from the data, we found that some of the features
were of little importance, so we dropped them to reduce the number of features.
Feature Selection
► Feature selection:
o Drop the attributes that provide no useful information for the task.
► Feature engineering, where appropriate:
o Discretize continuous features.
Numerical features distribution
Numeric summarizing techniques (mean, standard deviation, etc.) do not show us spikes or
the shapes of distributions, and it is hard to observe outliers with them. That is the reason we
use histograms.
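The same information a histogram plot conveys can be computed directly; a sketch with numpy.histogram on made-up tenure values:

```python
# Bin counts expose the spikes and skew that a mean/std summary hides.
import numpy as np

tenure = np.array([1, 1, 2, 3, 5, 12, 24, 30, 45, 60, 70, 71, 72, 72])

counts, edges = np.histogram(tenure, bins=6, range=(0, 72))
print(counts)   # spikes at both ends: many 1-month and many ~72-month customers
```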
Categorical feature distribution
To analyze categorical features, we use bar charts. We observe that Senior citizens and
customers without phone service are less represented in the data.
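The bar charts reduce to value counts; a minimal sketch (the sample below is illustrative, not the real distribution):

```python
# value_counts(normalize=True) gives the category shares a bar chart would show.
import pandas as pd

phone = pd.Series(["Yes", "Yes", "No", "Yes", "Yes", "Yes"], name="PhoneService")
shares = phone.value_counts(normalize=True)
print(shares["No"])   # the under-represented category, as noted above
```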
o Add promising transformations of features (e.g., log(x), sqrt(x), x^2, etc.).
o Aggregate features into promising new features.
In this phase, we will look at those features which were not considered in the selection but
are contributing factors for prediction.
Here we can see that customers on month-to-month contracts with fibre-optic internet
churned, whether male or female.
Month-to-month contracts, absence of online security, and lack of tech support seem to be
positively correlated with churn, while tenure and two-year contracts seem to be negatively
correlated with churn.
Interestingly, services such as Online security, streaming TV, online backup, tech support, etc.
without internet connection seem to be negatively related to churn.
We will explore the patterns for the above correlations below before we delve into modeling and
identifying the important variables.
A.) Demographics - Let us first understand the gender, age range, partner and dependent status of the
customers
1. Gender Distribution - About half of the customers in our data set are male while the other half are
female
2. % Senior Citizens - There are only 16% of the customers who are senior citizens. Thus most of our
customers in the data are younger people.
3. Partner and dependent status- About 50% of the customers have a partner, while only 30% of the total
customers have dependents.
What would be interesting is to look at the % of customers who have partners and also have
dependents. We will explore this next.
4. Interestingly, among the customers who have a partner, only about half also have a
dependent, while the other half do not have any dependents. Additionally, as expected, among
the customers who do not have a partner, a majority (80%) of them do not have any
dependents.
B.) Customer Account Information: Let us now look at the tenure and contract information.
1. Tenure: From the histogram below we can see that a lot of customers have been with the
telecom company for just a month, while quite a few have been there for about 72 months.
This could potentially be because different customers have different contracts; based on the
contract they are on, it could be more or less easy for customers to stay with or leave the
telecom company.
2. Contracts: To understand the above graph, let's first look at the # of customers by different contracts.
3. Below we will understand the tenure of customers based on their contract type.
Interestingly most of the monthly contracts last for 1-2 months, while the 2-year contracts tend to last
for about 70 months. This shows that the customers taking a longer contract are more loyal to the
company and tend to stay with it for a longer period.
This is also what we saw in the earlier chart on correlation with the churn rate.
4. Also visualized all the features within the dataset & came to know the distributions.
Now let's take a quick look at the relationship between monthly and total charges
We will observe that the total charges increase as the monthly bill for a customer increases.
I also looked at any differences between the % of customers with/without dependents and
partners by gender. There is no difference in their distribution by gender. Additionally, there
is no difference in senior-citizen status by gender.
Finally, let's take a look at our predictor variable (Churn) and understand its interaction with
the other important variables found in the correlation plot.
In our data, 74% of the customers do not churn. Clearly, the data is imbalanced as we would expect a
large majority of the customers to not churn. This is important to keep in mind for our modeling as
skewness could lead to a lot of false negatives. We will see in the modeling section how to avoid
skewness in the data.
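One common way to counter this skewness, and the upsampling referred to later in the report, can be sketched with sklearn.utils.resample (toy data; not the real class counts):

```python
# Upsample the minority (churn) class to match the majority class size.
import pandas as pd
from sklearn.utils import resample

df = pd.DataFrame({"tenure": [1, 2, 3, 40, 50, 60, 70, 72],
                   "Churn":  [1, 1, 0, 0, 0, 0, 0, 0]})

majority = df[df["Churn"] == 0]
minority = df[df["Churn"] == 1]
minority_up = resample(minority, replace=True,
                       n_samples=len(majority), random_state=42)

balanced = pd.concat([majority, minority_up])
print(balanced["Churn"].value_counts().tolist())   # [6, 6]
```

Note that upsampling should be applied only to the training split, otherwise duplicated rows can leak into the evaluation.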
Let's now explore the churn rate by tenure, seniority, contract type, monthly charges, and total charges
to see how it varies by these variables.
i.) Churn vs Tenure: As we can see from the below plot, the customers who do not churn, tend to stay
for longer tenure with the telecom company.
ii.) Churn by Contract Type: Similar to what we saw in the correlation plot, the customers who have a
month to month contract have a very high churn rate.
iii.) Churn by Seniority: Senior Citizens have almost double the churn rate than younger population.
iv.) Churn by Monthly Charges: A higher % of customers churn when the monthly charges
are high.
v.) Churn by Total Charges: It seems that there is higher churn when the total charges are lower.
Model Selection
Train-Test Split:
To create the model, we train our dataset while testing data set is used to test the
performance. So, in our data, we have split into 80% for training data and 20% for testing
data because it makes the classification model better whilst more test data makes the error
estimate more accurate.
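The 80/20 split described above, as it would typically be written (X and y below are synthetic stand-ins for the encoded feature matrix and the Churn column):

```python
# Stratified 80/20 split so both sets keep the original churn proportion.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(50, 2)
y = np.array([0] * 37 + [1] * 13)      # roughly the 74/26 churn split noted earlier

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

print(len(X_train), len(X_test))   # 40 10
```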
Feature Scaling
Compare the baseline algorithms for the second iteration.
The following models were applied to check which gives better accuracy:
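A sketch of feature scaling followed by a spot-check of baseline classifiers; the data here is synthetic, so the scores are illustrative rather than the report's:

```python
# Scale features (where the model benefits from it) and compare baselines
# with cross-validation.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=42)

models = {
    "LR": make_pipeline(StandardScaler(), LogisticRegression()),
    "DT": DecisionTreeClassifier(random_state=42),
    "NB": GaussianNB(),
}
scores = {name: cross_val_score(m, X, y, cv=5).mean() for name, m in models.items()}
print(scores)
```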
Decision Tree:
A decision tree is a non-parametric supervised learning method used for both classification
and regression problems. It is a flowchart-like structure in which each internal node
represents a “test” on an attribute, each branch represents the outcome of the test, and each
leaf node represents a class label. The path between root and leaf represents classification
rules. It creates a comprehensive analysis along each branch and identifies decision nodes
that need further analysis.
Random Forest:
Random Forest is a meta-estimator that fits a number of decision trees on various sub-samples
drawn from the original dataset; data can also be drawn with replacement as per the
requirements.
As we had a number of features, most of them of great importance, we used feature
importances to learn which of them contribute to the accuracy of the model.
We used Decision Tree and Random Forest for feature selection: using the decision tree we
got an accuracy of [80] and using a random forest we got [80%], and the random forest gave
us four top features:
Index (['tenure', 'Contract', 'MonthlyCharges', 'TotalCharges'], dtype='object')
[(0.2251735641431145, 'Contract'), (0.1687558104226648, 'tenure'),
(0.12539865168020692, 'OnlineSecurity'), (0.1128092761196452, 'TechSupport'),
(0.10731999001345587, 'TotalCharges'), (0.08573112448285626, 'MonthlyCharges'),
Here we can see that Contract has the highest importance as a resulting factor for churn.
For the decision tree we got an accuracy of [77%]. A heat map was used for correlation checking:
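The importance ranking above would be produced along these lines (the data below is synthetic, with column names mirroring the report's features, so the numbers will not match):

```python
# feature_importances_ sums to 1.0 across features; higher means more influential.
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X = pd.DataFrame(X, columns=["tenure", "Contract", "MonthlyCharges", "TotalCharges"])

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
ranking = sorted(zip(rf.feature_importances_, X.columns), reverse=True)
print(ranking)   # (importance, feature name), most important first
```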
Naïve Bayes:
A Naive Bayes classifier is a supervised machine learning algorithm based on Bayes'
Theorem, with the assumption that features are statistically independent. It finds many uses
in probability theory and statistics. It suits simple machine learning problems where we need
to learn a model from a given set of attributes (in training examples) and then form a
hypothesis, i.e. a relation to the response variable.
Logistic Regression:
Logistic regression models the probability of a binary outcome (here, churn vs. no churn) by
applying the logistic function to a linear combination of the input features.
Compare the baseline classification algorithms (second iteration) using the Accuracy,
Precision, Recall, F1 and F2 score metrics.
Model Evaluation:
Train & evaluate Chosen Model: Let’s fit the selected model (Logistic Regression in this case)
on the training dataset and evaluate the results.
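Fitting the chosen model and scoring it on the metrics named above (Accuracy, Precision, Recall, F1 and F2) might look like this sketch; the data is synthetic and the hold-out split is illustrative.

```python
# Sketch: fit Logistic Regression and report the five comparison metrics.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, fbeta_score)
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=8, random_state=7)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=7)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
pred = clf.predict(X_test)

scores = {
    'accuracy':  accuracy_score(y_test, pred),
    'precision': precision_score(y_test, pred),
    'recall':    recall_score(y_test, pred),
    'f1':        f1_score(y_test, pred),
    'f2':        fbeta_score(y_test, pred, beta=2),  # weights recall over precision
}
```

F2 (fbeta with beta=2) is useful in churn work because missing a churner (low recall) is usually costlier than a false alarm.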
Model Improvement: (Hyperparameter tuning)
In Machine Learning, there are a couple of ways to get better performance from your model. One
method is called Hyperparameter Tuning. The idea behind Hyperparameter Tuning is to find a
combination of hyperparameters that result in the best solution to the Machine Learning problem.
Sort of like the knobs used to tweak sound equalizers. The model above used mostly default
settings of the various hyperparameters.
Based on both the original and the upsampled data, hyperparameters were tuned on the
training data for all three models (i.e., for all the classifiers used).
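One common way to turn those "knobs" systematically is a grid search with cross-validation; the sketch below uses an illustrative parameter grid, not the exact grid used in the project.

```python
# Hedged sketch of hyperparameter tuning with GridSearchCV.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=400, n_features=6, random_state=3)

param_grid = {'C': [0.01, 0.1, 1, 10]}  # regularisation strength "knob"
search = GridSearchCV(LogisticRegression(max_iter=1000),
                      param_grid, cv=5, scoring='accuracy')
search.fit(X, y)  # tries every combination with 5-fold cross-validation

best_params = search.best_params_
```

The same pattern applies to each classifier: supply that model's grid and GridSearchCV returns the best-scoring combination.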
Comparison
During the hyperparameter tuning process, I obtained the best parameters for each model. Using
these parameters and the cross_val_score method, I compared different metrics and tried to find
the best threshold value with a lower log-loss score.
Intuition process -
Because the target variable is binary, log loss is a better metric for measuring the
uncertainty of the model. Once I find a model with a lower log-loss score, I tune the
threshold to improve the F1/Recall/Precision score.
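The two-step intuition above (pick the model by log loss, then tune the decision threshold) can be sketched as follows; the data, model choice and threshold grid are all illustrative.

```python
# Sketch: score a model by log loss, then sweep a threshold to maximise F1.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_classification(n_samples=500, n_features=6, random_state=5)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=5)

model = LogisticRegression(max_iter=1000)
# neg_log_loss is negated so higher is better; flip the sign back
log_loss_cv = -cross_val_score(model, X_train, y_train,
                               cv=5, scoring='neg_log_loss').mean()

model.fit(X_train, y_train)
proba = model.predict_proba(X_test)[:, 1]      # churn probability per customer

thresholds = np.linspace(0.1, 0.9, 17)
f1_scores = [f1_score(y_test, proba >= t) for t in thresholds]
best_threshold = thresholds[int(np.argmax(f1_scores))]
```

Moving the threshold below 0.5 trades precision for recall, which is often the right trade when churners are the minority class.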
Model conclusion
Based on the model comparison and evaluation process, the upsampled data works better
during training, but not on unseen data (based on the log-loss score). One of the
reasons could be data leakage in the cross_val_score step.
However, the log-loss score for the original dataset remains consistent between the
training and testing datasets.
From the above analysis, gradient boosting on the original dataset has the most stable and
best score, so it was used for the prediction process.
Save Model
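A minimal sketch of persisting the final model so it can be reloaded at prediction time, using joblib (which ships alongside scikit-learn). The gradient boosting model and the file name here are illustrative.

```python
# Sketch: serialise the trained model to disk and reload it.
import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=200, n_features=4, random_state=2)
model = GradientBoostingClassifier(random_state=2).fit(X, y)

joblib.dump(model, 'churn_model.pkl')       # write the fitted model to disk
restored = joblib.load('churn_model.pkl')   # reload for later predictions
```

The restored object behaves identically to the original, so the prediction service never needs to retrain.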
CHAPTER VII
Results
Conclusion
A comparison study was made between prominent classifiers, namely Logistic Regression and
the boosting algorithms, to improve the accuracy of customer churn prediction.
The boosting classifiers also perform well. Next, the work focused on identifying
the attribute with the strongest association with churn using the XGBoost classifier. The
experimental results show that fibre-optic customers with greater monthly charges have a
higher propensity to churn. A future direction is to predict churn with a hybrid of
classifiers that yields high accuracy and desirable results.
Here we had past records of customers who churned, and using that data we predicted
whether new customers would tend to churn. This helps companies understand customer
behaviour and how to maintain their interest in the company's services. Further, the
company can use a recommender system to retain customers and avoid further churn. Among
the algorithms we tried, Logistic Regression gave the highest accuracy, closely followed
by Random Forest and SVM.
The dataset did not contain records indicating which services customers switched to, which
would help in recommending new services. A next step is to build a recommender system to
avoid churn and retain existing customers.
CHAPTER IX
Future Enhancements
References
In the current project, the literature related to customer churn prediction was considered.
The following journals, websites and books were referred to throughout the project.
[1] https://www.researchgate.net/publication/329893308_Customer_Churn_Warning_with_M
achine_Learning
[2] https://machinelearningmastery.com/how-to-fix-futurewarning-messages-in-scikit-learn/
[3] Hands-on Scikit-Learn for Machine Learning Applications: Data Science Fundamentals
with Python
[4] https://www.folio3.ai/customer-churn-prediction/
[5] https://medium.com/@b.khaleghi/what-makes-predicting-customer-churn-a-challenge-
be195f35366e
[6] Fighting Churn with Data (The Science and Strategy of Customer Retention) - Manning
Publications