
Data Mining

Project

Business Report

27.06.2021

Rakshit Tibrewal

Index

Contents

Problem 1
Problem 2
Thank you

Problem 1:

A leading bank wants to develop a customer segmentation to give promotional offers to its
customers. They collected a sample that summarizes the activities of users during the past few
months. You are given the task to identify the segments based on credit card usage.

1.1 Read the data, do the necessary initial steps, and exploratory data analysis (univariate,
bivariate, and multivariate analysis).

Solution:

Refer to the Python file ‘Problem 1 Solution’.

1.2 Do you think scaling is necessary for clustering in this case? Justify

Solution:

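Yes. Scaling is necessary because clustering relies on distances between observations, so variables
measured on larger scales would otherwise dominate the result. Standardizing the bank dataset brings
all variables onto the same scale.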
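
To illustrate why scaling matters for distance-based clustering, here is a minimal sketch using z-score standardization. The two features (monthly spending and probability of full payment) and all values are hypothetical and are not taken from the bank dataset; the helper names `zscore` and `euclidean` are likewise illustrative.

```python
import math
import statistics


def zscore(values):
    """Standardize a list of numbers to mean 0 and standard deviation 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    return [(v - mean) / std for v in values]


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical customers: (monthly spending, probability of full payment).
spending = [12000.0, 15000.0, 12500.0, 20000.0]
prob_full_payment = [0.90, 0.88, 0.10, 0.85]

raw = list(zip(spending, prob_full_payment))
scaled = list(zip(zscore(spending), zscore(prob_full_payment)))

# Nearest neighbour of customer 0 among customers 1, 2 and 3.
raw_nearest = min((1, 2, 3), key=lambda i: euclidean(raw[0], raw[i]))
scaled_nearest = min((1, 2, 3), key=lambda i: euclidean(scaled[0], scaled[i]))

# Unscaled, spending dominates the distance, so customer 2 looks closest
# despite very different payment behaviour. After scaling, both features
# contribute and customer 1 becomes the nearest neighbour.
print(raw_nearest, scaled_nearest)
```

In this toy example the nearest neighbour changes once both features are on the same scale, which is the effect scaling is meant to guard against before hierarchical or K-Means clustering.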

1.3 Apply hierarchical clustering to scaled data. Identify the number of optimum clusters using
Dendrogram and briefly describe them

Solution:

Refer to the Python file ‘Problem 1 Solution’.

1.4 Apply K-Means clustering on scaled data and determine optimum clusters. Apply elbow curve
and silhouette score. Explain the results properly. Interpret and write inferences on the finalized
clusters.

Solution:

1.5 Describe cluster profiles for the clusters defined. Recommend different promotional strategies
for different clusters.

Solution:

Problem 2:

An Insurance firm providing tour insurance is facing higher claim frequency. The management
decides to collect data from the past few years. You are assigned the task to make a model which
predicts the claim status and provide recommendations to management. Use CART, RF & ANN and
compare the models' performances in train and test sets.

2.1 Read the data, do the necessary initial steps, and exploratory data analysis (univariate,
bivariate, and multivariate analysis).

Solution:

Refer to the file ‘Problem 2 Solution ANN’.

2.2 Data Split: Split the data into test and train, build classification model CART, Random Forest,
Artificial Neural Network

Solution:

CART Model

Refer to the file ‘Problem 2 Solution CART’.

Random Forest Model

Refer to the file ‘Problem 2 Solution_RF’.

Artificial Neural Network

Refer to the file ‘Problem 2 Solution ANN’.

2.3 Performance Metrics: Comment and Check the performance of Predictions on Train and Test sets
using Accuracy, Confusion Matrix, Plot ROC curve and get ROC_AUC score, classification reports for
each model. 

Solution:

Set     Metric     CART     Random Forest   Artificial Neural Network
Test    AUC        79.2%    81.4%           81.4%
Test    Accuracy   76%      76%             77%
Train   AUC        86.4%    83.4%           79.3%
Train   Accuracy   81%      79%             81%
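
For readers who want to see how the metrics named in 2.3 are obtained, the following is a minimal sketch that computes a confusion matrix, accuracy, and ROC AUC from scratch. The labels and predicted probabilities are hypothetical and do not come from the insurance dataset, and the 0.5 classification threshold is an assumption; the solution files may compute these metrics differently.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels coded as 0 and 1."""
    pairs = list(zip(y_true, y_pred))
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    return tn, fp, fn, tp


def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred)
    return (tn + tp) / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive case is scored
    higher than a random negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical claim labels (1 = claimed) and predicted claim probabilities.
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70, 0.60, 0.55]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print("Confusion matrix (tn, fp, fn, tp):", confusion_matrix(y_true, y_pred))
print("Accuracy:", accuracy(y_true, y_pred))
print("ROC AUC:", roc_auc(y_true, scores))
```

Accuracy depends on the chosen threshold, while AUC summarizes ranking quality across all thresholds, which is why the two metrics can order the models differently.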

2.4 Final Model: Compare all the models and write an inference which model is best/optimized.

Solution:

The Random Forest and ANN models gave the highest test performance, each with a test AUC of 81.4%
compared with 79.2% for the CART model. The CART model also shows the largest deviation between the
training and testing datasets (81% versus 76% accuracy), so it is not an ideal model to use. The ANN
model ranks second to the Random Forest model: its accuracy percentages are comparable, but its
deviation between training and testing accuracy is larger (4 percentage points). The Random Forest
model is the best fit: it combines high accuracy with a deviation between training and testing
accuracy of 3 percentage points, the smallest of the three models.
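
The train/test deviation used in this comparison can be checked with a short calculation. The sketch below uses only the figures reported in the table in 2.3; the dictionary layout and the helper name `gaps` are illustrative assumptions.

```python
# Train and test metrics in percent, as reported in the table in 2.3.
metrics = {
    "CART": {"train_auc": 86.4, "test_auc": 79.2, "train_acc": 81, "test_acc": 76},
    "Random Forest": {"train_auc": 83.4, "test_auc": 81.4, "train_acc": 79, "test_acc": 76},
    "ANN": {"train_auc": 79.3, "test_auc": 81.4, "train_acc": 81, "test_acc": 77},
}


def gaps(m):
    """Train minus test, in percentage points, for AUC and accuracy."""
    return {
        "auc_gap": round(m["train_auc"] - m["test_auc"], 1),
        "acc_gap": m["train_acc"] - m["test_acc"],
    }


model_gaps = {name: gaps(m) for name, m in metrics.items()}

# Model with the smallest accuracy deviation between train and test.
smallest_acc_gap = min(model_gaps, key=lambda name: model_gaps[name]["acc_gap"])

for name, g in model_gaps.items():
    print(f"{name}: AUC gap {g['auc_gap']} pts, accuracy gap {g['acc_gap']} pts")
print("Smallest accuracy gap:", smallest_acc_gap)
```

A large positive gap suggests overfitting to the training data. By this measure the CART model deviates most and the Random Forest model least on accuracy.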

2.5 Inference: Based on the whole Analysis, what are the business insights and recommendations?

Solution:

The purpose of this case is to develop a predictive model of claim status for an insurance firm that
provides tour insurance and is facing a higher claim frequency. Based on the models' predictions and
their ROC curves and AUC scores, there is a probability that more claims will be made this year.
Customer response is likely to increase if the insurance policy is offered through promotions and
advertisements.

Thank You
