Palash Bhai - Machine Learning Assignment

This document analyzes and models a political dataset. It includes: 1. Descriptive statistics and null-value checks, which found no missing data but some duplicate rows. 2. Univariate and bivariate analysis, along with outlier detection. 3. Several models, including logistic regression, LDA, KNN, naive Bayes, and random forest. Random forest performed best, with an AUC of 99.7% on training data and 89.6% on test data. 4. A comparison of the models, in which random forest was judged the best/most optimized model based on AUC scores.

Uploaded by PalashKulshrestha

Problem 1

1. Read the dataset. Do the descriptive statistics and do null value condition check. Write an inference
on it.

Head of the data is as follows

Information about the data type of various columns is given below

Description of the data set is given below. Few interpretations are as follows:

1. The mean and the median (50th percentile) differ noticeably for Blair, Hague, Europe and political knowledge.
2. The count is the same for all columns; therefore, there are no missing values.

Checking for null values to confirm our interpretation

There are 8 duplicate rows in the dataset. However, they cannot be confirmed as repeated records, as the age for the various duplicate items is different, so we will not eliminate these rows.
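
The checks described above can be sketched with pandas. This is a minimal illustration, not the report's actual code: the toy rows are made up, and only the column names (age, Blair, Hague, Europe) are taken from the text.

```python
import pandas as pd

# Hypothetical toy frame; the values are invented for illustration only.
df = pd.DataFrame({
    "age":    [43, 36, 35, 24, 43],
    "Blair":  [4, 4, 5, 2, 4],
    "Hague":  [1, 4, 2, 1, 1],
    "Europe": [2, 5, 3, 4, 2],
})

summary = df.describe()                     # count, mean, std, min, quartiles, max
null_counts = df.isnull().sum()             # missing values per column
n_duplicates = int(df.duplicated().sum())   # rows identical to an earlier row in every column

# Compare the mean with the median ("50%") and confirm equal counts per column.
print(summary.loc[["count", "mean", "50%"]])
print(null_counts)
print("duplicate rows:", n_duplicates)
```

Note that `DataFrame.duplicated()` with default arguments flags a row only when every column matches an earlier row.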

2. Perform Univariate and Bivariate Analysis. Do exploratory data analysis. Check for Outliers.

Checking for outliers
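
The report does not say which outlier rule was used. One common choice, assumed here, is the 1.5 × IQR rule that box plots use. The helper name `iqr_outliers` and the sample ages are hypothetical.

```python
import pandas as pd

def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return a boolean mask marking values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (series < lower) | (series > upper)

# Hypothetical ages; 130 is planted as an obvious outlier.
ages = pd.Series([24, 35, 36, 43, 44, 47, 52, 130], name="age")
mask = iqr_outliers(ages)
print(ages[mask].tolist())
```
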
Data Preparation

1. Encode the data (having string values) for Modelling. Is Scaling necessary here or not? Data Split: Split
the data into train and test (70:30).
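
A minimal sketch of the encoding and the 70:30 split, assuming scikit-learn. The column names `vote` and `gender`, the toy values, and the `random_state` are hypothetical and are not taken from the report.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data with two string columns and one numeric column.
df = pd.DataFrame({
    "vote":   ["Labour", "Conservative"] * 5,
    "gender": ["female", "male", "male", "female", "male"] * 2,
    "age":    [43, 36, 35, 24, 41, 47, 57, 77, 39, 70],
})

# Encode each string column as integer category codes.
for col in ["vote", "gender"]:
    df[col] = pd.Categorical(df[col]).codes

X = df.drop(columns="vote")
y = df["vote"]

# 70:30 split; stratify keeps the class proportions similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=1, stratify=y
)
print(X_train.shape, X_test.shape)
```

On scaling: distance-based models such as KNN are sensitive to feature scale, so if scaling is applied, the scaler is usually fitted on the training split only and then applied to the test split.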

Modelling

1. Apply Logistic Regression and LDA (linear discriminant analysis).

Logistic Regression

Model accuracy score for training data – 83.97%

AUC for training data – 88.9%

Model accuracy score for test data – 82.31%

AUC for training data – 88.9%

Confusion Matrix for training data

Confusion matrix for test data

Linear Discriminant Analysis

AUC for the Training Data: 88.9%

AUC for the Test Data: 88.4%
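
The sketch below shows how accuracy, AUC and the confusion matrix can be obtained for both models with scikit-learn. It runs on synthetic data from `make_classification`, not the report's dataset, so its numbers will not match those above. The `max_iter` and `random_state` values are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic binary classification data standing in for the real dataset.
X, y = make_classification(n_samples=500, n_features=8, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=1
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "LDA": LinearDiscriminantAnalysis(),
}
splits = {"train": (X_train, y_train), "test": (X_test, y_test)}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    for split, (Xs, ys) in splits.items():
        pred = model.predict(Xs)                # hard class labels
        proba = model.predict_proba(Xs)[:, 1]   # probability of the positive class
        results[(name, split)] = {
            "accuracy": accuracy_score(ys, pred),
            "auc": roc_auc_score(ys, proba),
            "confusion": confusion_matrix(ys, pred),
        }

for key, metrics in results.items():
    print(key, f"accuracy={metrics['accuracy']:.3f}", f"auc={metrics['auc']:.3f}")
```

Accuracy is computed from the predicted labels, while AUC needs the predicted probabilities, which is why the two can differ for the same model.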

2. Apply KNN Model and Naïve Bayes Model. Interpret the results.

KNN Model

AUC for the Training Data: 91.1%

AUC for the Test Data: 86.1%

KNN_train_precision: 0.87

KNN_train_recall: 0.88

KNN_train_f1: 0.88

KNN_test_precision: 0.86

KNN_test_recall: 0.87

KNN_test_f1: 0.86

Naïve Bayes Model

AUC for the Training Data: 88.6%

AUC for the Test Data: 88.5%
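
Precision, recall and F1 figures like the KNN values above can be computed as in the sketch below, assuming scikit-learn and synthetic data. The report does not state the number of neighbours, so `n_neighbors=5` (the library default) is an assumption, as is scaling the features before KNN.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for the real dataset.
X, y = make_classification(n_samples=500, n_features=8, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=1
)

models = {
    # KNN is distance-based, so features are standardized first (an assumption).
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "Naive Bayes": GaussianNB(),
}

report = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    for split, Xs, ys in [("train", X_train, y_train), ("test", X_test, y_test)]:
        pred = model.predict(Xs)
        report[f"{name}_{split}"] = {
            "precision": round(precision_score(ys, pred), 2),
            "recall": round(recall_score(ys, pred), 2),
            "f1": round(f1_score(ys, pred), 2),
        }

for key, metrics in report.items():
    print(key, metrics)
```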

3. Model Tuning, Bagging (Random Forest should be applied for Bagging) and Boosting.

Random Forest

AUC for the Training Data: 99.7%

AUC for the Test Data: 89.6%
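
A hedged sketch of tuning, bagging and boosting with scikit-learn on synthetic data. The parameter grid, `n_estimators`, `cv` and `random_state` are illustrative assumptions, and gradient boosting is used only as one possible boosting method; the report does not say which settings or booster were used.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic data standing in for the real dataset.
X, y = make_classification(n_samples=500, n_features=8, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=1
)

# Tuning: a deliberately tiny grid over tree depth, scored by cross-validated AUC.
grid = GridSearchCV(
    RandomForestClassifier(n_estimators=100, random_state=1),
    param_grid={"max_depth": [3, None]},
    scoring="roc_auc",
    cv=3,
)
grid.fit(X_train, y_train)

models = {
    "Random Forest (untuned)": RandomForestClassifier(
        n_estimators=100, random_state=1
    ).fit(X_train, y_train),
    "Random Forest (tuned)": grid.best_estimator_,
    "Gradient Boosting": GradientBoostingClassifier(random_state=1).fit(X_train, y_train),
}

# (train AUC, test AUC) for each fitted model.
auc = {
    name: (
        roc_auc_score(y_train, model.predict_proba(X_train)[:, 1]),
        roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]),
    )
    for name, model in models.items()
}
for name, (train_auc, test_auc) in auc.items():
    print(f"{name:25s} train AUC={train_auc:.3f} test AUC={test_auc:.3f}")
```

A random forest with fully grown trees typically scores close to 1.0 on its own training data, so the test AUC is the more informative figure when comparing it with other models.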

4. Performance Metrics: Check the performance of Predictions on Train and Test sets using
Accuracy, Confusion Matrix, Plot ROC curve and get ROC_AUC score for each model. Final
Model: Compare the models and write inference which model is best/optimized.

Based on the results of the various models, we can infer that Random Forest is the most optimized model, as it has the highest area under the curve (AUC) on both the training and the test data.

Particulars            Training Data   Test Data

Logistic Regression    83.9%           82.37%
LDA                    88.9%           88.4%
KNN                    91.1%           86.1%
Naïve Bayes            88.6%           88.5%
Random Forest          99.7%           89.6%
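
The comparison can also be read as a train-minus-test gap per model. The short sketch below uses only the figures from the table above; a larger gap is commonly read as a sign that a model fits the training data more closely than it generalizes.

```python
# Scores copied from the comparison table above, in percent: (training, test).
scores = {
    "Logistic Regression": (83.9, 82.37),
    "LDA": (88.9, 88.4),
    "KNN": (91.1, 86.1),
    "Naive Bayes": (88.6, 88.5),
    "Random Forest": (99.7, 89.6),
}

# Gap between training and test score, in percentage points.
gaps = {name: round(train - test, 2) for name, (train, test) in scores.items()}

# Model with the highest test score.
best_test = max(scores, key=lambda name: scores[name][1])

for name, gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(f"{name:20s} train-test gap: {gap:5.2f} points")
print("highest test score:", best_test)
```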

Inference

1. Based on these predictions, what are the insights?

Accuracy on Test data is 99% and on Train data is 80%.

AUC is greater than 89% for both.

Recall and precision are low and similar on both data sets.

The model results on the training and test sets are similar, indicating no underfitting or overfitting issues.

Problem 2

1. Find the number of characters, words and sentences for the mentioned documents.

For 1941 Roosevelt speech

The number of sentences in the text is: 68

The number of words in the text is: 1,360

The number of characters in the text is: 7,571

For 1961 Kennedy speech

The number of sentences in the text is: 52

The number of words in the text is: 1,390

The number of characters in the text is: 7,618

For 1973 Nixon speech

The number of sentences in the text is: 68

The number of words in the text is: 1,819

The number of characters in the text is: 9,991
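
Counts of this kind can be approximated with the standard library alone. The sketch below is an assumption about method, not the report's code: it uses simple regular expressions, so its results on the real speeches may differ from counts produced by a dedicated tokenizer. The function name `text_counts` and the sample text are made up.

```python
import re

def text_counts(text: str) -> dict:
    """Count characters, words and sentences using simple regex rules."""
    # Split after ., ! or ? when followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # A word is a run of letters or apostrophes.
    words = re.findall(r"[A-Za-z']+", text)
    return {
        "characters": len(text),
        "words": len(words),
        "sentences": len(sentences),
    }

# Made-up sample text, not taken from any of the speeches.
sample = "Let us begin. Ask what you can do! Is that clear?"
print(text_counts(sample))
```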

2. Remove all the stopwords from all the three speeches.

3. Which word occurs most often in each president's inaugural address? Mention the top three words (after removing the stopwords).

Top three words for Roosevelt are: nation, know, spirit

Top three words for Kennedy are: let, us, world

Top three words for Nixon are: us, let, America
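
Stopword removal followed by a frequency count can be sketched as follows. The stopword set here is a small hypothetical list for illustration; a full list from an NLP library would normally be used, and the result depends on which list is chosen. The sample sentence is made up and is not a quotation from any speech.

```python
import re
from collections import Counter

# Small hypothetical stopword list; real lists are much longer.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "we", "our", "that", "for"}

def top_words(text: str, n: int = 3, stopwords: set = STOPWORDS) -> list:
    """Return the n most frequent lowercase words that are not stopwords."""
    tokens = re.findall(r"[a-z']+", text.lower())
    kept = [token for token in tokens if token not in stopwords]
    return Counter(kept).most_common(n)

# Made-up sample sentence for illustration.
sample = (
    "The nation is strong. We know the nation and the spirit of the nation. "
    "We know our spirit."
)
print(top_words(sample))
```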

4. Plot the word cloud of each of the speeches of the variable. (after removing the stopwords)

Word cloud for Roosevelt

Word cloud for Kennedy after cleaning

Word cloud for Nixon after cleaning
