

SCHOOL OF ENGINEERING AND TECHNOLOGY

CAPSTONE PROJECT REPORT


FOR BUILD
ON
TELECOM CHURN

For the requirement of the 8th Semester
of B.Tech. in Computer Science and Engineering

Submitted By

Naga Nikhil Kaushik A (18BBTCS072)

Under the Guidance of

Ramachandra H.V
Professor
Dept. Of CSE, SoET, CMRU
Bengaluru - 562149

Submitted to

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING


CMR University
Off Hennur - Bagalur Main Road,
Near Kempegowda International Airport,
Chagalahatti, Bengaluru, Karnataka-562149
Academic Year - 2020-21
SCHOOL OF ENGINEERING AND TECHNOLOGY

Department of Computer Science and Engineering

CERTIFICATE
Certified that the Project Work entitled “Telecom Churn” carried out by NAGA NIKHIL
KAUSHIK A (18BBTCS072), a bonafide student of the SCHOOL OF ENGINEERING AND
TECHNOLOGY, in partial fulfillment for the award of BACHELOR OF TECHNOLOGY
in 8th Semester Computer Science and Engineering of CMR UNIVERSITY, Bengaluru,
during the year 2022. It is certified that all corrections/suggestions indicated for the Internal
Assessment have been incorporated in the report. The project has been approved as it satisfies
the academic requirements regarding project work prescribed for the said degree.

Signature of Course In-charge: ……………………
Dept. of CSE, SoET, CMRU, Bangalore

Signature of HOD: ……………………
Dept. of CSE, SoET, CMRU, Bangalore

Signature of Dean: ……………………
SoET, CMRU, Bangalore

Name of the Examiners: Signature with Date:

1. …………………… ……………………

2. …………………… ……………………
DECLARATION

I, NAGA NIKHIL KAUSHIK A, bearing USN 18BBTCS072, student of Bachelor of
Technology, Computer Science and Engineering, CMR University, Bengaluru, hereby
declare that the Project Work entitled “Telecom Churn” submitted by me, for the award of
the Bachelor’s degree in Computer Science and Engineering to CMR University, is a record
of bonafide work carried out independently by me under the supervision and guidance of
Ramachandra H.V, Professor, Dept. of CSE, CMR University.

I further declare that the work reported in this project has not been submitted and will
not be submitted, either in part or in full, for the award of any other degree in this university
or any other institute or university.

Place: Bengaluru (Naga Nikhil Kaushik A)


Date: 09/12/2021 (18BBTCS072)
Abstract

This project aims to investigate the main reasons for churn in the telecommunication sector.
In the telecom industry, customers can choose from multiple service providers and actively
switch from one operator to another. In this highly competitive market, the
telecommunications industry experiences an average annual churn rate of 15-25%. Given that
it costs 5-10 times more to acquire a new customer than to retain an existing one, customer
retention has become even more important than customer acquisition.

The proposed methodology for analysis of churn prediction covers several phases:
understanding the business; selection, analysis, and processing of the data; implementing
various classification algorithms; evaluating the classifiers; and choosing the best one for
prediction. Churn prediction is essentially a binary classification problem: whether a
customer will churn or not.

Broad Academic Area of Work: Knowledge representation, Machine learning.

However, it is important to measure the accuracy of the models. The churn prediction
literature shows that the best performing algorithms are neural networks, decision trees and
logistic regression, but reasonable results can also be obtained with a support vector machine.
ACKNOWLEDGEMENT

The satisfaction that accompanies the successful completion of this project would be
incomplete without mentioning the people who made it possible, and without whose constant
guidance and encouragement all efforts would have gone in vain.
I express my sincere gratitude to my project guide Ramachandra H.V, Professor,
Dept. of CSE, without whose constant guidance and support the project would not have been
successful.
I also express my sincere thanks to my project co-guide Dr. R. Sathyaraj, Assistant
Professor, Dept. of CSE, for his constant guidance and support towards the successful
completion of the project.
I would like to express my thanks to Dr. Rubini P, Professor and Head of the
Department of Computer Science and Engineering, School of Engineering and Technology,
CMR University, Bangalore, for the encouragement that motivated me towards the successful
completion of the project work.
I express my heartfelt and sincere gratitude to Dr. C. Prabhakar Reddy, Dean,
School of Engineering and Technology, CMR University, for his support.
CHAPTER I: INTRODUCTION TO TELECOM CHURN
    Background
    Problem Statement
    Objective
    Challenges
CHAPTER II: LITERATURE SURVEY
    Survey
    Existing Work
    Summary
        Points to Ponder
CHAPTER III: PROPOSED SYSTEM
    Algorithms Used
    Unique Features of the Project
    Advantages
CHAPTER IV: SOFTWARE REQUIREMENT SPECIFICATION
    Requirements
    Tools Used
        Hardware
        Software
CHAPTER V: SYSTEM DESIGN
    Architecture
    Process Flow
    Template for Predictive Models
        Tips
CHAPTER VI: BUILD (DEVELOPMENT)
    Loading Libraries
    Data Gathering
        Loading Data
        Data Set
        Data Cleaning
        Check for Duplicates
        Data Pre-processing
    Feature Selection
        Numerical Features Distribution
        Categorical Feature Distribution
    Exploratory Data Analysis (EDA)
    Model Selection
        Train-Test Split
        Feature Scaling
    Model Comparison
    Model Evaluation
    Model Improvement (Hyperparameter Tuning)
        Compare the Model Predictions Against the Test Set
CHAPTER VII: RESULTS
CHAPTER VIII: CONCLUSION
CHAPTER IX: FUTURE ENHANCEMENTS
CHAPTER X: REFERENCES
CHAPTER I

Introduction to Telecom Churn

In the telecom industry, customers can choose from multiple service providers and actively
switch from one operator to another. Technical progress and an increasing number of
operators have raised the level of competition. So, for telecom companies, proactively
predicting the customers who have a high risk of churning has become important.

Telecom companies follow three main strategies to generate more revenue:

• Acquire new customers
• Upsell to existing customers
• Increase the retention period of customers

However, comparing the above strategies by taking the return on investment (RoI) of each
into account has shown that increasing the retention period of customers is the most
profitable strategy. In this highly competitive market, the telecom industry experiences an
average annual churn rate of 15-25%. Given that it costs 5-10 times more to acquire a new
customer than to retain an existing one, for most telecom operators customer retention has
now become even more important than customer acquisition.

Background

A primary goal for any service is to grow by adding customers or users through marketing
and sales. (This is true for both for-profit and non-profit enterprises.) When customers leave,
it counteracts the company’s growth and can even lead to contraction.

Churn is when a customer quits using a service or cancels their subscription.

Most service providers focus on acquisition. But to be successful, a service must also
work to minimize churn. If churn is not addressed in an ongoing, proactive way, the
product or service won’t reach its full potential.


Customers not churning from a service can also be framed in a positive sense, if you prefer
to see the glass as half full. In that case, people talk about customer retention.

Problem Statement

Customer churn means a customer is leaving (attrition) the service of the existing company
or service provider and moving to another company or service provider. If the churn rate is
high, the company will move into losses.

Customer churn could be any of the following acts:

• Closing an account with an existing service provider

• Not extending the existing service contract

• Moving to another vendor or service provider

The main cause of churn differs and varies from customer to customer. To predict when a
customer will leave and what triggers that decision, we should identify the cause of leaving
and try to retain the customer.

Objective

The objectives of this project are as follows:

• Analyze the given data
• Cleanse the given data
• Sanitize the given data
• Create a model to identify whether a customer will churn or not
• Compare the models
• Help the telecom company by providing insights to retain customers who are about to leave

This project aims to identify the reason for customer churn. For this purpose, we are going to

use a prediction model.


Challenges

Gathering data is a big challenge, mainly because customer data falls under PCI/PII

regulations, and obtaining it involves commercial agreements. For this capstone project,

therefore, the data used is from Kaggle and UCI.


CHAPTER II
Literature Survey

Survey

Retaining existing customers is vital for organizations looking to grow their business without
relying too heavily on the significantly higher cost of acquiring new customers. Marketing,
sales, and customer retention departments need to work to make sure customers are satisfied,
provide them with incentives, and present offers at the right time to reduce churn.

Predictive analytics is growing in popularity because of its potential to be leveraged for


significant business success. In general, these predictions do not need to be overwhelmingly
accurate to see good success. You might not be able to know every move your customers will
make before they make them, but there’s no doubt you can provide meaningful, timely
insights that drive more informed decisions and, ultimately, happier customers.

Existing Work

There is no one-size-fits-all algorithm for predictive analytics, as different models have their
own strengths and weaknesses. While the implementations of these algorithms are complex,
the underlying idea can be very simple. There are two major types of prediction algorithms,
classification and regression. Classification refers to predicting a discrete value such as a
label, while regression refers to predicting a continuous number such as a price.

Summary

A simple explanation is that it leverages past experiences captured in data to find patterns
associated with the underlying problem and then makes an educated guess. The best churn
model is not the one with the best statistical precision; the best churn model is the one that
provides the best insights to further prevent churn behaviour. A predictive analytics solution
will try to mimic human learning behaviour through the use of advanced statistics and
machine learning.
Points to Ponder

• For marketers and management, a predictive model is not the objective, it is a medium to
reach an objective

• The objective in this case is to reduce churn – to make customers stay longer
(and continue paying)

• To reduce churn, you have to know the actionable factors related to churn, and act to
prevent or change those factors.

• The best churn model will include these actionable factors as components of the model,
to be able to manage the churn prevention programs.
CHAPTER III

Proposed System

Algorithms Used

In order to find a possible solution to the problem of churn prediction, i.e. successfully apply
a machine learning technique to the available data, one needs a deep understanding of the
business rules of the telecom company and their importance. This knowledge enables
selection of the attributes suitable for the problem at hand.
The scope of this project is to create a model to predict customer churn, compare several
models, make a final churn/no-churn prediction, and propose a solution to retain customers
at risk.

I apply some of the general classification algorithms:

► Linear Classifiers: Logistic Regression, Naive Bayes Classifier.


► Nearest Neighbor.
► Support Vector Machines.
► Decision Trees.
► Boosted Trees.
► Random Forest.
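As a sketch (not part of the original report's code), the classifiers listed above all map onto scikit-learn estimators with the same uniform fit/predict interface, which is what makes spot-checking them straightforward:

```python
# Instantiate the candidate classifiers through scikit-learn's uniform API.
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Naive Bayes": GaussianNB(),
    "Nearest Neighbor": KNeighborsClassifier(),
    "Support Vector Machine": SVC(),
    "Decision Tree": DecisionTreeClassifier(),
    "Boosted Trees": GradientBoostingClassifier(),
    "Random Forest": RandomForestClassifier(),
}

for name, model in models.items():
    print(name, "->", type(model).__name__)
```

Because every estimator exposes the same fit/predict methods, the later model-comparison loop can treat them interchangeably.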

Unique features of the project

At its core, predictive analytics encompasses a variety of statistical techniques

(including machine learning, predictive modeling and data mining) and uses statistics (both
historical and current) to estimate, or ‘predict’, future outcomes. These outcomes might
be behaviors a customer is likely to exhibit or possible changes in the market, for
example. Predictive analytics helps us to understand possible future occurrences by analyzing
the past.
The same concept can be applied in other fields such as banking, insurance, security and retail.

Advantages

Business analysts, rather than only technical people, can play a key role.

Predictive analytics solutions help organizations turn their data into timely insights for
better, faster decision making.
Data governance solutions help organizations maintain high-quality data, as well as
align operations across the business and pinpoint data problems within the same environment.
CHAPTER IV

Software Requirement Specification

The solution is developed using Python Jupyter Notebook. It is classified into two models as
follows:

Model 1: It will be used to predict whether a high-value customer will churn or not in the
near future (i.e., the churn phase). By knowing this, the company can take action steps such
as providing special plans, discounts on recharge, etc.

Model 2: It will be used to identify important variables that are strong predictors of churn.
These variables may also indicate why customers choose to switch to other networks.

Requirements

In general, every machine learning project has certain important steps to follow:

• Understand the business problem.

• Gather data: in this process, we need to gather data to resolve the
business problem.
• The gathered data should be clean and well-formatted.
o Analyse the data using visual exploratory data analysis (charts, bars, etc.).
o Convert the data where needed (e.g., YES to 1, NO to 0).
• Select the features (attributes), excluding the target variable.
• Divide the data into training data and test data.
• Identify suitable machine learning algorithms.
• Run the algorithms against the test data and study the results.
• While running, gather metrics and use them to compare the models.
• Select the best model based on the comparison results.
A predictive churn model: machine learning algorithms fall into different categories, namely
supervised, unsupervised, and reinforcement learning. Our predictive model falls under
supervised learning. This kind of classification works on given historical data: a classification
tool goes through the user activity in the historical data and identifies which customers are
going to leave. Without this information, it is difficult to take proper measures to retain a
customer or to make proper decisions.

The appropriate development environment is required in order to achieve the objectives of


this project. The following hardware and software tools are needed to set up appropriate
environments for development of this application.

Tools Used

Hardware

The appropriate hardware environment for application development requires a personal

computer/laptop with the following specifications:

► Processor : Pentium Intel, Xeon, X86, X64


► RAM : 16 GB
► Hard Disk : 512 GB

Software

► Microsoft Windows 10 pro or later version


► Mac OS X 10.5.8 or later version with Intel chip
► Jupyter Lab (Desktop)
This project assumes Python version 3.8. The following Python libraries are used:
• SciPy – a Python library for engineering, maths and science
• NumPy – a Python library for scientific computing
• Matplotlib – a Python library for plotting 2-dimensional charts
• Pandas – a Python library for data analysis
• scikit-learn – a Python library containing the machine learning algorithms
• Seaborn – a Python visualization package based on Matplotlib
• Graphviz – for graphs
CHAPTER V

System Design
System design follows an SDLC with a combination of Agile and Waterfall models. The
main work starts with requirements gathering. After that, requirements are shared between
different teams, and accordingly all the teams work towards preparing the High Level
Design, Low Level Design, Test Cases, etc.

Architecture

The total architecture is based on predictive analytics and the algorithms applied against the
data trained.
Flow diagram of spot check models:

Predictive analytics is a category of data analytics aimed at making predictions about future
outcomes based on historical data and analytics techniques. Predictive analytics uses a variety
of statistical techniques (including data mining, machine learning, and predictive modelling)
to understand future occurrences.

Machine learning has three types of algorithms: supervised learning, unsupervised learning,

and reinforcement learning.

In supervised learning, we are given the data sets and already know what our correct output
should look like. Supervised learning problems are categorized into classification and
regression.

• Regression: in regression we fit the training data to a continuous function and try
to predict results within a continuous output, meaning that we are trying to map input
variables to some continuous function. E.g., predicting how much a customer will
spend on telecom services next month is a regression problem, because the output is
a continuous amount.
• Classification: in classification the output is predicted as a discrete value such as yes
or no, true or false, 0 or 1, churn or not, diabetes or not, male or female, positive or
negative, etc. E.g., predicting from the given telecom data whether a person will
churn or not is a classification problem.
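The distinction can be illustrated with two tiny scikit-learn models; the numbers below are made up for illustration and are not from the project dataset:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Regression: predict a continuous value (e.g. a monthly bill) from tenure.
X = np.array([[1], [12], [24], [48]])       # tenure in months
bill = np.array([20.0, 35.0, 50.0, 80.0])   # continuous target
reg = LinearRegression().fit(X, bill)
print(float(reg.predict([[36]])[0]))        # a continuous number

# Classification: predict a discrete label (churn = 1, stay = 0).
churned = np.array([1, 1, 0, 0])            # short-tenure customers churned
clf = LogisticRegression().fit(X, churned)
print(int(clf.predict([[36]])[0]))          # a discrete 0 or 1
```

The regressor returns an arbitrary real number, while the classifier can only return one of the labels it was trained on.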

Process Flow

The data we got was mostly balanced and categorical, so we began with data cleaning and
pre-processing: removing unwanted columns, feature selection, and label encoding.
Template for Predictive Models:
In real-world, prediction models should go through following major stages to successfully predict
customer churn:

1. Define Problem
2. Data Processing
3. Data Evaluation
4. Model Selection (Evaluate Algorithms)
5. Model Evaluation (Improve Accuracy Algorithms)
6. Model Improvement (Optimize Models)
7. Future Predictions
8. Present Results
9. Model Deployment
Python Project Template
1. Prepare Problem
a) Load libraries
b) Load dataset (reduce dataset if necessary; the model should build in ~30 sec. and can
always be scaled up later)
2. Summarize Data
a) Descriptive statistics (summaries)
b) Data visualizations
3. Prepare Data (start simple, revisit and cycle with next step until algorithm and presentation
of data is accurate enough)
a) Data cleansing (remove duplicates, mark/impute missing values)
b) Feature Selection (removes redundant features, develop new features)
c) Data Transforms (scale/redistribute attributes to best expose structure of problem)
4. Evaluate Algorithms (Model)
a) Split-out validation dataset
b) Test options and evaluation metric (cross validation and evaluation metric)
c) Spot-Check Algorithms
d) Compare Algorithms
5. Improve Accuracy
a) Algorithm tuning (search for a combination of parameters for each algorithm that yields
the best results)
b) Ensembles (combine prediction of multiple models into an ensemble prediction)
6. Finalize Model
a) Predictions on validation dataset
b) Create standalone model on entire training dataset
c) Save model for later use
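Steps 4c and 4d (spot-check and compare algorithms) can be sketched as a loop over candidate models; synthetic data stands in here for the churn dataset, which real code would load from its CSV:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the churn dataset.
X, y = make_classification(n_samples=300, n_features=10, random_state=42)

# Spot-check each candidate with 5-fold cross-validation and record its mean accuracy.
results = {}
for name, model in [("LR", LogisticRegression(max_iter=1000)),
                    ("CART", DecisionTreeClassifier(random_state=42))]:
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    results[name] = scores.mean()
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```

The model with the best cross-validated score is then carried forward to the tuning stage.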
Tips
1. Fast first pass: go through the project as fast as possible. This gives confidence that all
the parts are there and a baseline to improve on.
2. Cycles: loop through steps 3-4, then 3-4-5.
3. Attempt every step.
4. Ratchet accuracy: treat changes as experiments with the goal of increasing accuracy.
5. Adapt as needed: blur the edges of tasks such as 4-5 to best serve model accuracy.
CHAPTER VI

Build (Development)

Loading Libraries
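The original notebook cells were not preserved in this copy; a typical import cell for the libraries named in Chapter IV might look like the following sketch:

```python
# Core scientific stack used throughout the notebook.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# scikit-learn pieces used in the later pre-processing and modelling sections.
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

print("pandas", pd.__version__)
```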
Data Gathering

Loading Data

Data Set:

We took this telecom dataset from an online source and drew all insights about the data
from it. Attributes of the dataset:

► Demographic attributes: contain the primary features of the customer such as sex, age,
nationality, place of residence, etc.
► Contract attributes: contain the attributes associated with the customer contract for a
particular service such as type of service, date of conclusion of the contract, price of
the service etc.
► Customer behaviour attributes: describe the customer activities.

Data cleaning:

► Fix or remove outliers (optional).
► Fill in missing values (e.g., with zero, mean, median...) or drop their rows (or columns).
► Check for duplicates.
► Data Types

We have 2 types of features in the dataset: categorical (two or more values and without any
order) and numerical. Most of the feature names are self-explanatory, except for:

• Partner: whether the customer has a partner or not (Yes, No),


• Dependents: whether the customer has dependents or not (Yes, No),
• OnlineBackup: whether the customer has online backup or not (Yes, No, No
internet service),
• tenure: number of months the customer has stayed with the company,
• MonthlyCharges: the amount charged to the customer monthly,
• TotalCharges: the total amount charged to the customer.
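A sketch of the initial inspection and duplicate check; a tiny inline sample stands in for the real CSV, whose filename is not given in the report, so real code would start with `pd.read_csv(...)`:

```python
import pandas as pd

# Tiny stand-in for the telecom churn CSV (column names from the dataset description).
df = pd.DataFrame({
    "customerID": ["0001", "0002", "0002"],
    "tenure": [1, 34, 34],
    "MonthlyCharges": [29.85, 56.95, 56.95],
    "Churn": ["No", "Yes", "Yes"],
})

print(df.dtypes)                # categorical vs. numerical features
print(df.duplicated().sum())    # count duplicate rows before cleaning
df = df.drop_duplicates()       # remove them
print(len(df))                  # rows remaining
```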
Data Pre-processing:

Data pre-processing is an important task in machine learning. It converts raw data into clean
data. The following techniques were applied to the data:

Missing Values – there were missing values in the TotalCharges feature, which we adjusted
with the mean value. These missing row values, if not handled, would later lead to errors
when converting the data type, as the column takes a string value for empty spaces.
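The TotalCharges fix described above can be sketched as follows; the blank-string entries reproduce the situation the text describes, and the numbers are illustrative:

```python
import pandas as pd

# Blank strings in TotalCharges make the whole column a string dtype.
df = pd.DataFrame({"TotalCharges": ["29.85", " ", "1889.5", " "]})

# Coerce the blanks to NaN while converting the column to numeric.
df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")

# Replace the resulting missing values with the column mean.
df["TotalCharges"] = df["TotalCharges"].fillna(df["TotalCharges"].mean())

print(df["TotalCharges"].tolist())
```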

Label Encoder – for categorical variables, this is a convenient method to convert them into
numeric values, and it is best used when there are multiple categories. We converted the
various categorical values into numeric ones for further use in the algorithms.
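A minimal sketch of the label-encoding step with scikit-learn's LabelEncoder; the Partner column values are taken from the dataset description above:

```python
from sklearn.preprocessing import LabelEncoder

partner = ["Yes", "No", "No", "Yes"]

le = LabelEncoder()
# Classes are sorted alphabetically before being mapped: No -> 0, Yes -> 1.
encoded = le.fit_transform(partner)

print(list(le.classes_))   # ['No', 'Yes']
print(list(encoded))       # [1, 0, 0, 1]
```

The same encoder instance can later invert the mapping with `le.inverse_transform`, which is useful when presenting results.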

Drop Columns – as we took insights from the data, we found that some of the features were
of little importance, so we dropped them to reduce the number of features.
Feature Selection

► Feature selection:
o Drop the attributes that provide no useful information for the task.

► Feature engineering, where appropriate:
o Discretize continuous features.
o Decompose features (e.g., categorical, date/time, etc.).
Numerical features distribution

Numeric summary statistics (mean, standard deviation, etc.) don't show spikes or the shapes of distributions, and outliers are hard to spot with them. That is why we use histograms.
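The counts a histogram plots can be computed directly; this sketch uses made-up tenure values (months), not the report's data:

```python
import numpy as np

# Toy tenure values (months); np.histogram returns the per-bin counts
# that a histogram plot would draw as bars.
tenure = np.array([1, 1, 2, 5, 24, 60, 71, 72, 72, 72])
counts, edges = np.histogram(tenure, bins=4)
```

In the actual workflow one would typically call `df.hist()` on the numerical columns to draw all histograms at once.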
Categorical feature distribution
To analyze categorical features, we use bar charts. We observe that Senior citizens and
customers without phone service are less represented in the data.
o Add promising transformations of features (e.g., log(x), sqrt(x), x^2, etc.).
o Aggregate features into promising new features.

► Feature scaling: standardize or normalize features.
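The scaling step above can be sketched with MinMaxScaler on toy (tenure, MonthlyCharges) rows; StandardScaler would be the analogous choice for standardization:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy numeric columns (tenure, MonthlyCharges); MinMaxScaler maps each
# column independently onto the range [0, 1].
X = np.array([[1.0, 18.0], [36.0, 70.0], [72.0, 118.0]])
X_scaled = MinMaxScaler().fit_transform(X)
```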


Exploratory Data Analysis (EDA)

In this phase, we will look at those features that were not considered during selection but are still contributing factors for prediction.
Here we can see that customers who took fiber optic service on month-to-month contracts, whether male or female, churned.

Month-to-month contracts, absence of online security, and tech support seem to be positively
correlated with churn. While tenure and two-year contracts seem to be negatively correlated with
churn.
Interestingly, services such as Online security, streaming TV, online backup, tech support, etc.
without internet connection seem to be negatively related to churn.
We will explore the patterns for the above correlations below before we delve into modeling and
identifying the important variables.
A.) Demographics - Let us first understand the gender, age range, partner and dependent status of the
customers

1. Gender Distribution - About half of the customers in our data set are male while the other half are
female
2. % Senior Citizens - There are only 16% of the customers who are senior citizens. Thus most of our
customers in the data are younger people.
3. Partner and dependent status- About 50% of the customers have a partner, while only 30% of the total
customers have dependents.

What would be interesting is to look at the % of customers who have partners and also have dependents. We will explore this next.
4. Interestingly, among the customers who have a partner, only about half of them also have a dependent, while the other half do not have any dependents. Additionally, as expected, among the customers who do not have a partner, a majority (80%) do not have any dependents.

B.) Customer Account Information: Let us now look at the tenure and contract

1. Tenure: Looking at the histogram below, we can see that a lot of customers have been with the telecom company for just a month, while quite a few have been there for about 72 months. This is potentially because different customers have different contracts; depending on the contract they are on, it could be more or less easy for them to stay with or leave the telecom company.
2. Contracts: To understand the above graph, let's first look at the # of customers by different contracts.

3. Below we will understand the tenure of customers based on their contract type.
Interestingly most of the monthly contracts last for 1-2 months, while the 2-year contracts tend to last
for about 70 months. This shows that the customers taking a longer contract are more loyal to the
company and tend to stay with it for a longer period.

This is also what we saw in the earlier chart on correlation with the churn rate.

4. We also visualized all the features in the dataset to understand their distributions.
Now let's take a quick look at the relationship between monthly and total charges.
We observe that the total charges increase as the monthly bill for a customer increases.
I also looked for differences by gender in the % of customers with/without dependents and partners. There is no difference in their distribution by gender. Additionally, there is no difference in senior citizen status by gender.

Finally, let's take a look at our predictor variable (Churn) and understand its interaction with the other important variables, as found in the correlation plot.
In our data, 74% of the customers do not churn. Clearly, the data is imbalanced as we would expect a
large majority of the customers to not churn. This is important to keep in mind for our modeling as
skewness could lead to a lot of false negatives. We will see in the modeling section how to avoid
skewness in the data.
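One common remedy for this skew, touched on later in the modeling section, is to upsample the minority class. A minimal sketch with a toy target mimicking the ~74/26 split (not the report's real counts):

```python
import pandas as pd
from sklearn.utils import resample

# Toy imbalanced target: 6 'No' vs 2 'Yes'.
df = pd.DataFrame({"Churn": ["No"] * 6 + ["Yes"] * 2})
majority = df[df["Churn"] == "No"]
minority = df[df["Churn"] == "Yes"]

# Upsample the minority class with replacement until the classes balance.
minority_up = resample(minority, replace=True,
                       n_samples=len(majority), random_state=42)
balanced = pd.concat([majority, minority_up])
```

Note that upsampling should be applied only to the training split, or it leaks duplicated rows into evaluation.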
Let's now explore the churn rate by tenure, seniority, contract type, monthly charges, and total charges
to see how it varies by these variables.
i.) Churn vs Tenure: As we can see from the below plot, the customers who do not churn, tend to stay
for longer tenure with the telecom company.
ii.) Churn by Contract Type: Similar to what we saw in the correlation plot, the customers who have a
month to month contract have a very high churn rate.

iii.) Churn by Seniority: Senior citizens have almost double the churn rate of the younger population.
iv.) Churn by Monthly Charges: A higher % of customers churn when the monthly charges are high.

v.) Churn by Total Charges: It seems that there is higher churn when the total charges are lower.
Model Selection
Train-Test Split:

To create the model, we train on the training set, while the test set is used to measure its performance. We split our data into 80% training data and 20% testing data: more training data makes the classification model better, whilst more test data makes the error estimate more accurate.
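The 80/20 split can be sketched with scikit-learn on toy arrays; `stratify=y` (an addition here, not stated in the report) keeps the churn ratio equal in both parts:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)   # 10 toy samples, 2 features
y = np.array([0, 1] * 5)           # toy churn labels

# 80% train / 20% test, preserving the class ratio in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
```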

Feature Scaling

Compare the baseline algorithms for the second iteration. The following are the models we applied to check which gives better accuracy:

Support Vector Classifier (SVC):

This algorithm is used for classification problems. The main objective of SVC is to fit the data we provide, returning a "best fit" hyperplane that divides, or categorizes, the data. Once the hyperplane is obtained, we can feed new observations to the classifier to see what the predicted class is.
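A minimal SVC sketch on two clearly separated toy clusters (not the churn data), showing the fit-then-predict flow described above:

```python
import numpy as np
from sklearn.svm import SVC

# Two well-separated toy clusters; SVC fits a separating hyperplane.
X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear").fit(X, y)
```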

Decision Tree:
A decision tree is a non-parametric supervised learning method used for both classification and regression problems. It is a flowchart-like structure in which each internal node represents a "test" on an attribute, each branch represents an outcome of the test, and each leaf node represents a class label. The path from the root to a leaf represents a classification rule. It creates a comprehensive analysis along each branch and identifies decision nodes that need further analysis.

Random Forest:

Random Forest is a meta-estimator that fits a number of decision trees on various sub-samples drawn from the original dataset. We can also draw the data with replacement as per the requirements.
As we had a number of features, most of them of great importance, we used feature importances to find out which of them contribute to the accuracy of the model. We used Decision Tree and Random Forest for feature selection: using the decision tree we got an accuracy of [80] and using a random forest we got [80%], and the random forest gave us four features:
Index (['tenure', 'Contract', 'MonthlyCharges', 'TotalCharges'], dtype='object')
[(0.2251735641431145, 'Contract'), (0.1687558104226648, 'tenure'),
(0.12539865168020692, 'OnlineSecurity'), (0.1128092761196452, 'TechSupport'),
(0.10731999001345587, 'TotalCharges'), (0.08573112448285626, 'MonthlyCharges'),
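A ranking like the listing above comes from the forest's `feature_importances_` attribute. This sketch uses synthetic data where only the first feature drives the label, and the feature names are placeholders borrowed from the listing, not the report's actual run:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(0)
# Synthetic data: only feature 0 determines the label, so it should rank first.
X = rng.rand(200, 3)
y = (X[:, 0] > 0.5).astype(int)

rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Pair each importance with its (placeholder) name and sort descending.
names = ["Contract", "tenure", "TotalCharges"]
ranked = sorted(zip(rf.feature_importances_, names), reverse=True)
```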
Here we can see that Contract has the highest importance as a contributing factor for churn.

And for the decision tree we got an accuracy of [77%]. We used a heat map to check correlations:

K-Nearest Neighbours (KNN):

K-Nearest Neighbours (KNN) is a supervised learning algorithm used to solve both regression and classification problems, where 'K' is the number of nearest neighbours. It is simple to implement and easy to understand, and it is a lazy algorithm: it does not build an explicit model from the training data points; all of the training data is used in the testing phase.
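A minimal KNN sketch on a toy 1-D feature (not the churn data); with k=3, a query is labelled by the majority vote of its three nearest training points:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Two toy groups of points on a line.
X = np.array([[0], [1], [2], [10], [11], [12]])
y = np.array([0, 0, 0, 1, 1, 1])

# fit() just stores the data; the neighbour search happens at predict time.
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
```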

Naïve Bayes:
A Naive Bayes classifier is a supervised machine learning algorithm based on Bayes' theorem, with the assumption that features are statistically independent. It finds many uses in probability theory and statistics. It suits simple machine learning problems where we learn a model from a given set of attributes (in the training examples) and then form a hypothesis relating them to a response variable.
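A minimal Gaussian Naive Bayes sketch on toy data (the Gaussian variant is one common choice for numeric features; the report does not say which variant was used):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# GaussianNB models each feature, per class, as an independent normal
# distribution and classifies via Bayes' theorem.
X = np.array([[1.0], [1.2], [0.8], [5.0], [5.2], [4.8]])
y = np.array([0, 0, 0, 1, 1, 1])

nb = GaussianNB().fit(X, y)
```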

Logistic Regression:

Logistic regression is a classification algorithm used to assign observations to a discrete set of classes. It is a predictive analysis algorithm based on the concept of probability.
It is important to scale the variables in logistic regression so that all of them are within a range of 0 to 1. This helped me improve the accuracy from 79.7% to 80.7%.
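The 0-to-1 scaling step can be bundled with the classifier in a pipeline so that scaling is always applied consistently; a sketch on toy (tenure, MonthlyCharges) rows where, by construction, short-tenure customers churn:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Toy rows: (tenure, MonthlyCharges); label 1 = churn.
X = np.array([[1, 20], [2, 25], [60, 110], [70, 115]], dtype=float)
y = np.array([1, 1, 0, 0])

# MinMaxScaler maps each feature to [0, 1] before the logistic fit.
model = make_pipeline(MinMaxScaler(), LogisticRegression()).fit(X, y)
```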
Model Comparison:

Compare the baseline classification algorithms – second iteration – using Accuracy, Precision, Recall, F1 and F2 score metrics.
Model Evaluation:
Train & evaluate Chosen Model: Let’s fit the selected model (Logistic Regression in this case)
on the training dataset and evaluate the results.
Model Improvement: (Hyperparameter tuning)

In Machine Learning, there are a couple of ways to get better performance from your model. One
method is called Hyperparameter Tuning. The idea behind Hyperparameter Tuning is to find a
combination of hyperparameters that result in the best solution to the Machine Learning problem.
Sort of like the knobs used to tweak sound equalizers. The model above used mostly default
settings of the various hyperparameters.

Here we tune the LR model to increase model performance without overfitting the model.
Compare the model predictions against the test set
Hyperparameter Tuning

Based on the original and upsampled data, we tuned the hyperparameters on our training data for all three models (i.e., for all the classifiers used).
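A typical way to do this tuning is a cross-validated grid search; this sketch tunes LR's `C` on synthetic data, and the grid values are illustrative, not the report's actual grid:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the training data.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Try several regularization strengths, scored by (negated) log loss.
grid = GridSearchCV(LogisticRegression(max_iter=1000),
                    param_grid={"C": [0.01, 0.1, 1, 10]},
                    cv=5, scoring="neg_log_loss")
grid.fit(X, y)
best_C = grid.best_params_["C"]
```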

Comparison

During the hyperparameter tuning process, I got the best parameters for each model. Using these parameters and the cross_val_score method, I compared different metrics and tried to find the best threshold value with a lower log loss score.

Intuition process -

Because the target variable is binary, log loss is a better metric for measuring the uncertainty of the model. Once I find a better model with a lower log loss score, I will tune the threshold to improve the F1/Recall/Precision score.
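The two steps above can be sketched on toy labels and predicted churn probabilities: compute the log loss, then sweep decision thresholds and keep the one with the best F1:

```python
import numpy as np
from sklearn.metrics import f1_score, log_loss

# Toy true labels and predicted probabilities of churn.
y_true = np.array([0, 0, 1, 1, 1, 0])
proba = np.array([0.1, 0.4, 0.8, 0.6, 0.9, 0.3])

loss = log_loss(y_true, proba)  # uncertainty metric for the binary target

# Sweep thresholds and keep the one that maximizes F1.
thresholds = np.arange(0.1, 0.9, 0.1)
best_t = max(thresholds,
             key=lambda t: f1_score(y_true, (proba >= t).astype(int)))
```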

Model conclusion

Based on the model comparison and evaluation process, the upsampled data works better during training but not on unseen data (based on the log loss score). One of the reasons could be data leakage in the cross_val_score step.

However, the log loss score for the original dataset remains the same on the training dataset as well as the testing dataset.

From the above analysis, gradient boosting on the original dataset has the most stable and best score, so I used it for the prediction process.

Exploratory Data Analysis Concluding Remarks:

Summary of the key findings from this EDA:


► The dataset does not have any missing or erroneous data values.
► The strongest positive correlations with the target feature are Monthly Charges and Age, whilst the strongest negative correlations are with Partners, Dependents, and Tenure.
► The dataset is imbalanced, with most customers being active.
► There is multicollinearity between Monthly Charges and Total Charges. Dropping Total Charges decreased the VIF values considerably.
► Most of the customers in the dataset are younger people.
► There are a lot of new customers in the organization (less than 10 months old), followed by a loyal customer base that is above 70 months old.
► Most of the customers seem to have phone service, with monthly charges spanning between $18 and $118 per customer.
► Customers with a month-to-month connection have a very high probability of churning, even more so if they pay via electronic check.

Save Model
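Persisting the trained model is typically done with joblib for scikit-learn estimators; a sketch with a small toy model (the filename is illustrative):

```python
import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression

# Fit a small toy model to persist.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
model = LogisticRegression().fit(X, y)

joblib.dump(model, "churn_model.joblib")      # save to disk
restored = joblib.load("churn_model.joblib")  # load back later for prediction
```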
CHAPTER VII

Results

Recommendation and Request


► We should pay more attention to customers who meet the criteria below:
o Contract: Month-to-month
o Tenure: Short tenure
o Internet service: Fiber optic
o Payment method: Electronic check

► Please evaluate the service, especially internet service (fiber optic) and payment method (electronic check).

► Can we give more benefits to new customers? New customers have a high probability of churning.
CHAPTER VIII

Conclusion
A comparison study has been made between prominent classifiers, namely LR and the Boosting algorithms, to improve the accuracy of customer churn prediction.

LR outperforms all other models, so we consider LR here.

Secondly, the Boosting classifiers also perform well. Next, the work focused on identifying the attribute with the highest influence on churn using the XGBoost classifier. The experimental results show that fiber optic customers with greater monthly charges have a higher influence on churn. An anticipated direction is to predict with a hybrid of classifiers that gives high accuracy and desirable results.
Here we had past records of customers who had churned, and using that data we predicted whether new customers would tend to churn or not. This will help companies understand the behaviour of customers and how to maintain their interest in the services of the company. Further, the company can also use a recommender system to retain customers and avoid further churn. We used various algorithms, wherein logistic regression gave us high accuracy; close to this accuracy were Random Forest and SVM.
The dataset did not contain records that would tell us whether the customer has switched services, which would help in recommending new services further. We are now going to build a recommender system to avoid churn and retain old customers.
CHAPTER IX

Future Enhancements

► Recommendations to the telecom company on how to retain customers
► Tableau integration for data visualization and end-user usage
► Cloud deployment (Heroku)
► Generalizing the ML pipeline for any kind of churn (e.g., employee churn, insurance churn, banking churn, etc.)
CHAPTER X

References
In the current project, literature on customer churn prediction is considered. The following journals, websites and books were referred to throughout the project.

[1] https://www.researchgate.net/publication/329893308_Customer_Churn_Warning_with_Machine_Learning

[2] https://machinelearningmastery.com/how-to-fix-futurewarning-messages-in-scikit-learn/

[3] Hands-on Scikit-Learn for Machine Learning Applications: Data Science Fundamentals
with Python

[4] https://www.folio3.ai/customer-churn-prediction/

[5] https://medium.com/@b.khaleghi/what-makes-predicting-customer-churn-a-challenge-be195f35366e

In addition to the above, a few other resources:

Fighting Churn with Data (The science and strategy of Customer Retention) – Manning
Publication

Customer Retention – www.kaggle.com

UCI Machine learning repository - https://archive.ics.uci.edu/ml/index.php

Hands-On Machine Learning with Scikit-Learn (Concepts, Tools, Techniques to Build Intelligent Systems) – O'Reilly Publications

An Introduction to Statistical Learning – Springer Publications

Datacamp – Beginner tutorial for data science coding
