2nd Review


SRM INSTITUTE OF SCIENCE AND TECHNOLOGY

SCHOOL OF COMPUTING
DEPARTMENT OF COMPUTING TECHNOLOGIES
18CSP107L - MINOR PROJECT

Early Heart Disease Prediction using Machine Learning Algorithms

Batch ID: B031

Guide name: Dr. A. Anbarasi
Designation: Assistant Professor
Department: C-Tech

Student 1 Reg. No: RA2011003010389
Student 1 Name: Mehul Saini
Student 2 Reg. No: RA2011003010386
Student 2 Name: Shivank

Abstract

Cardiovascular disease refers to any critical condition that affects the heart.
Because heart disease can be life-threatening, researchers are designing smart
systems that diagnose it accurately from electronic health data with the aid of
machine learning algorithms. This work presents several machine learning
approaches for predicting heart disease from data on patients' major health
factors. The project demonstrates four classification methods for building the
prediction models: K-Nearest Neighbour (KNN), Support Vector Machine (SVM),
Random Forest (RF), and Naive Bayes (NB). Data pre-processing and feature
selection are carried out before the models are built. The models are evaluated
on accuracy, precision, recall, and F1-score (the harmonic mean of precision and
recall). The SVM model is projected to perform best, with 91.67% accuracy.

26-8-2023 2
Introduction
• Globally, cardiovascular disease (CVD) is the primary cause of morbidity and
mortality, accounting for more than 70% of all fatalities. This project focuses on
coronary artery disease (damage to the heart's major blood vessels), high blood
pressure, and cardiac arrest. According to the 2017 Global Burden of Disease study,
cardiovascular disease is responsible for about 43% of all fatalities. Common risk
factors for heart disease in high-income nations include poor diet, cigarette use,
excessive sugar consumption, and obesity or excess body fat. However, low- and
middle-income nations are also seeing a rise in chronic illness prevalence.
• In addition, technologies such as electrocardiograms and CT scans, which are critical
for diagnosing coronary heart disease (reduced blood flow to the heart), are sometimes
too costly and impractical for patients. This alone has resulted in the deaths of
17 million people. Twenty-five to thirty percent of firms' annual medical expenses
were attributable to employees with cardiovascular disease.
• Therefore, early detection of heart disease is essential to lessen its physical and
monetary cost to people and institutions. According to a WHO estimate, the overall
number of deaths from CVDs will rise to 23.6 million by 2030, with heart disease and
stroke being the leading causes.

Problem Statement
Create a predictive model that uses machine learning to determine the
presence or absence of cardiovascular ailments (including coronary artery
disease (damage to the heart's major blood vessels), high blood pressure,
and cardiac arrest) in patient cohorts, based on clinical and biometric
attributes. The objective is to give medical practitioners a precise
diagnostic tool that enhances prognostic capabilities and optimizes
healthcare resource allocation.

The aim is to construct a machine learning solution that optimizes
predictive accuracy and clinical utility for cardiovascular disease
diagnosis while adhering rigorously to healthcare regulations and ethical
considerations.

Motivation
• The main motivation for this research is to present a model that
predicts the occurrence of heart disease. The work also aims to identify
the best classification algorithm for assessing the possibility of heart
disease in a patient.
• This work is justified by a comparative study and analysis of three
classification algorithms, namely Naïve Bayes, Decision Tree, and Random
Forest, at different levels of evaluation.
• Although these are commonly used machine learning algorithms, heart
disease prediction is a vital task that demands the highest possible
accuracy. Hence, the three algorithms are evaluated at numerous levels
and with several types of evaluation strategy.
• This will provide researchers and medical practitioners to establish a
better.
Literature Review

1.) Effective Heart Disease Prediction Using Machine Learning Techniques
(Pandit Deendayal Energy University; Institute of Applied Sciences and
Intelligent Systems, National Research Council of Italy, 73100 Lecce, Italy)
By: Chintan M. Bhatt, Parth Patel, Tarang Ghetia and Pier Luigi Mazzeo

The primary objective of this study was to classify heart disease using different models and a real-world
dataset. The k-modes clustering algorithm was applied to a dataset of patients with heart disease to
predict the presence of the disease. The dataset was preprocessed by converting the age attribute to
years and dividing it into bins of 5-year intervals, and by dividing the diastolic and systolic blood
pressure data into bins of 10 intervals. The dataset was also split by gender to account for the
different characteristics and progression of heart disease in men and women.
The study used the Kaggle cardiovascular disease dataset with 70,000 instances, and all algorithms were
implemented on Google Colab. The accuracies of all algorithms were above 86%: decision trees gave the
lowest accuracy at 86.37%, and the multilayer perceptron (MLP) gave the highest at 87.23%. The authors
conclude that k-modes clustering has the potential to predict heart disease accurately and could be a
valuable tool in the development of targeted diagnostic and treatment strategies for the disease.
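
The binning step described above can be sketched in a few lines of plain Python. This is only an illustration, not the study's code: the helper names `to_years` and `bin_label`, the field names `age_days`, `ap_hi` and `ap_lo`, and the sample values are assumptions, as is the reading that age is stored in days and that blood pressure bins are 10 units wide.

```python
# Hypothetical helpers; the 5-year and 10-unit bin widths follow the summary above.

def to_years(age_days):
    """Convert an age in days to whole years (assumes 365.25 days per year)."""
    return int(age_days / 365.25)


def bin_label(value, width):
    """Return a label such as '50-54' for the fixed-width bin that contains value."""
    start = int(value // width) * width
    return f"{start}-{start + width - 1}"


# Illustrative records only, not rows taken from the real dataset.
patients = [
    {"age_days": 18393, "ap_hi": 110, "ap_lo": 80},
    {"age_days": 20228, "ap_hi": 140, "ap_lo": 90},
]

for p in patients:
    p["age_bin"] = bin_label(to_years(p["age_days"]), 5)   # 5-year age bins
    p["ap_hi_bin"] = bin_label(p["ap_hi"], 10)              # systolic pressure bins
    p["ap_lo_bin"] = bin_label(p["ap_lo"], 10)              # diastolic pressure bins
    print(p["age_bin"], p["ap_hi_bin"], p["ap_lo_bin"])
```

Turning continuous values into categories in this way is what allows a categorical method such as k-modes to be applied to them.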

Literature Review

2.) Heart Disease Prediction Using Machine Learning Algorithms
(Bharati Vidyapeeth's College of Engineering, New Delhi)
By: Harshit Jindal, Sarthak Agrawal, Rishabh Khera, Rachna Jain and Preeti Nagrath

The authors developed a cardiovascular disease detection model using three ML classification
techniques. The project predicts which people have cardiovascular disease from a dataset of patients'
medical history, including attributes such as chest pain, sugar level, and blood pressure. The system
assesses a patient based on clinical information about previous heart disease diagnoses. The algorithms
used to build the model are logistic regression, random forest classifier, and KNN. The reported average
accuracy of the model is 87.5%, which the authors state is better than previous models with an accuracy
of 85%. Of the three algorithms, KNN achieved the highest accuracy, 88.52%. The authors note that more
training data raises the chance that the model correctly predicts whether a person has heart disease,
and that computer-aided techniques of this kind can make prediction faster and much cheaper. They also
argue that, with a number of medical databases available to work on, machine learning techniques can in
some cases predict better than a human being, which helps patients as well as doctors.

Literature Review

3.) Heart Disease Prediction Using Machine Learning
(University of Sharjah)
By: Chaimaa Boukhatem, Heba Yahia Youssef, Ali Bou Nassif

This work aims to predict the existence of heart disease in patients according to specific
health measurements. The paper demonstrated four classification methods for building the
prediction model. The data was collected and cleaned of missing values and extreme
outliers. It was then preprocessed to fit the model requirements through several phases:
visualizing the class imbalances, obtaining the correlation matrix, applying
dimensionality reduction techniques, and finally splitting the data with the hold-out method.
A model was trained and tested for each machine learning algorithm. The SVM algorithm with a linear kernel
had the best results with a 91.67% accuracy, 92.31% precision, 88.89% recall, and F1 Score
of 90.56%. The algorithms used were able to extract the complex relations between the
symptoms and the disease. Machine learning algorithms can also be applied to other types of
diseases, especially with the generation of more accurate datasets in the medical field in the
future. This work can be enhanced by applying more extensive data analysis and trying
additional algorithms to reach the maximum possible accuracy.
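
As a quick consistency check on the figures quoted above, the F1 score can be recomputed from the reported precision and recall, since F1 is the harmonic mean of the two. The function name `f1_from` is a hypothetical helper; the two input values are the ones reported in the summary.

```python
def f1_from(precision, recall):
    """F1 score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


reported_precision = 0.9231  # 92.31% precision, as reported for the SVM model
reported_recall = 0.8889     # 88.89% recall, as reported for the SVM model

f1 = f1_from(reported_precision, reported_recall)
print(f"F1 = {f1:.4f}")
```

The result agrees with the reported F1 score of 90.56% to within rounding of the inputs.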

Existing System
• Heart disease is often described as a silent killer because it can lead to death
without obvious symptoms. This nature of the disease is the cause of growing
anxiety about it and its consequences.
• Hence, continued efforts are being made to predict the possibility of this deadly disease
in advance, and various tools and techniques are regularly being tried to suit
present-day health needs. Electrocardiograms and CT scans are the
current existing systems.
• Machine learning techniques can be a boon in this regard. Even though heart disease can
occur in different forms, there is a common set of core risk factors that influence whether
someone will ultimately be at risk. By collecting data from various sources, classifying
it under suitable headings, and analysing it to extract the desired information, we can
draw conclusions. This technique is well suited to the prediction of heart disease.
• As the well-known saying goes, “Prevention is better than cure”: early prediction and
control can help prevent heart disease and decrease the death rate due to it.

Problem statement and Objectives

The major challenge in heart disease is its early detection.
Instruments that can predict heart disease are available, but they
are either expensive or not efficient at calculating the chance of
heart disease in a person. Early detection of cardiac disease can
decrease the mortality rate and overall complications. However,
it is not possible to monitor patients accurately every day in all
cases, and round-the-clock consultation with a doctor is not
available because it requires more insight, time, and expertise.
Since a large amount of data is available today, we can use
various machine learning algorithms to analyze it for hidden
patterns. These hidden patterns can be used for health diagnosis
in medical data.

Innovation Idea

By incorporating multiple data sources and applying current machine
learning techniques, this project aims to create a more robust and
accurate heart disease prediction system, with the goal of better
patient outcomes and improved healthcare decision-making.

Rather than relying solely on traditional health data such as medical
records and clinical measurements, the project can incorporate a
multi-modal data fusion approach, which integrates various datasets
to build a more comprehensive and accurate heart disease prediction
model.

We are also using algorithm fusion in our project, which is expected
to lead to better results.
Scope and Application
The scope of the project, "Heart Disease Prediction using Machine Learning," is to develop a robust and accurate
predictive system for identifying individuals at risk of heart disease. The project encompasses various
components, including data collection, pre-processing, model development, evaluation, and deployment.

The "Heart Disease Prediction using Machine Learning" project has several practical applications and potential
benefits, including:
i. Early Diagnosis: The project can assist in the early detection of heart disease, enabling timely interventions and
treatments to improve patient outcomes.
ii. Preventive Healthcare: By identifying individuals at risk, the project can support preventive measures such as
lifestyle changes and medication to reduce the risk of heart disease.
iii. Personalized Treatment: The project can provide tailored treatment recommendations based on individual risk
factors, optimizing healthcare delivery.
iv. Remote Monitoring: The real-time monitoring and telemedicine components facilitate remote patient care,
especially relevant in situations where physical visits to healthcare facilities are challenging.
v. Research and Insights: The project's data analysis can contribute to a better understanding of heart disease risk
factors and trends, assisting researchers in the field.
vi. Improved Decision Support: Healthcare providers can make more informed decisions with the support of
predictive models, improving patient care.
vii. Patient Empowerment: Patients can actively engage in their healthcare by monitoring their health data and
following personalized recommendations.

Architecture Model

Proposed Modules and Description

• We used ensemble learning for this project, including bagging and boosting.
• Machine learning models can be merged in a variety of ways; the most
common methods are bagging, boosting, and stacking.
• Bagging involves training multiple models on different subsets of the same data
set and then aggregating their predictions. It is often used with decision trees
(a random forest is itself a bagged ensemble of decision trees) to reduce variance
and overfitting.
• Boosting involves training multiple models sequentially, where each model
attempts to correct the errors of the previous one.
• Lastly, stacking involves training multiple models on the same or different data
sets and then using their predictions as inputs for another model. It is often
used with neural networks or linear models to increase accuracy and
generalization.
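
The three strategies above can be compared side by side with scikit-learn. This is a minimal sketch rather than the project's implementation: the data is a synthetic stand-in generated with `make_classification` (13 features is an arbitrary choice), and the estimator settings are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    BaggingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a patient table with a binary target (assumption).
X, y = make_classification(n_samples=300, n_features=13, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    # Bagging: many trees, each trained on a bootstrap sample, predictions aggregated.
    "bagging": BaggingClassifier(DecisionTreeClassifier(), n_estimators=25, random_state=0),
    # Boosting: weak learners trained in sequence, each focusing on earlier mistakes.
    "boosting": AdaBoostClassifier(n_estimators=50, random_state=0),
    # Stacking: base model predictions become the inputs of a final model.
    "stacking": StackingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
            ("ada", AdaBoostClassifier(n_estimators=50, random_state=0)),
        ],
        final_estimator=LogisticRegression(max_iter=1000),
    ),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # accuracy on the held-out split
    print(f"{name}: test accuracy = {scores[name]:.3f}")
```

Scores on synthetic data say nothing about performance on real patient records; the sketch only shows how the three ensemble types are assembled.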

Proposed Modules and Description
We first used a single algorithm (linear regression) to learn all the components at once. It is also
possible to use one algorithm for some of the components and another algorithm for the rest, so
that the best algorithm can be chosen for each component. We are going with the AdaBoost and
Random Forest Classifier algorithms. To do this, we use the first algorithm to fit the original
series and then the second algorithm to fit the residual series.
• In detail, the process is this:
# 1. Train and predict with first model
model_1.fit(X_train_1, y_train)
y_pred_1 = model_1.predict(X_train_1)
# 2. Train and predict with second model on residuals
model_2.fit(X_train_2, y_train - y_pred_1)
y_pred_2 = model_2.predict(X_train_2)
# 3. Add to get overall predictions
y_pred = y_pred_1 + y_pred_2
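
A runnable version of the three steps might look as follows. It is a sketch under stated assumptions: the data is synthetic, both models see the same features (the slide uses separate `X_train_1` and `X_train_2`), and both models are regressors, because residuals are real-valued. Applying the same idea to the classifiers named above would need an adaptation that the slides do not describe.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

# Synthetic data (assumption): a linear trend plus a nonlinear part and noise.
rng = np.random.default_rng(0)
X_train = rng.uniform(-3, 3, size=(200, 1))
y_train = 2.0 * X_train[:, 0] + np.sin(3 * X_train[:, 0]) + rng.normal(0, 0.1, 200)

model_1 = LinearRegression()
model_2 = RandomForestRegressor(n_estimators=100, random_state=0)

# 1. Train and predict with the first model.
model_1.fit(X_train, y_train)
y_pred_1 = model_1.predict(X_train)

# 2. Train the second model on the residuals left by the first.
model_2.fit(X_train, y_train - y_pred_1)
y_pred_2 = model_2.predict(X_train)

# 3. Add the two predictions to get the overall prediction.
y_pred = y_pred_1 + y_pred_2

mse_first = float(np.mean((y_train - y_pred_1) ** 2))
mse_hybrid = float(np.mean((y_train - y_pred) ** 2))
print(f"training MSE, first model only: {mse_first:.4f}")
print(f"training MSE, hybrid:           {mse_hybrid:.4f}")
```

The errors here are measured on the training data, so they show only that the second model captures structure the first one misses; a held-out set is needed to judge generalization.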

Attributes of Datasets

Intermediate Results and Discussion

• The attributes expected to lead to heart disease in patients are available
in the dataset, which contains 76 features; 14 important features that
are useful for evaluating the system are selected from them. If all the
features are taken into consideration, the efficiency of the system is
considerably lower.
• To increase efficiency, attribute selection is done: n features are
selected for evaluating the model so that it gives higher accuracy. Some
features in the dataset have almost the same correlation, so they are
removed.
• The accuracies of all the machine learning methods are compared, and
one prediction model is generated on that basis.
• Hence, the aim is to combine various algorithms and to assess them with
several evaluation tools, such as the confusion matrix, accuracy,
precision, recall, and F1-score, so that the disease is predicted
efficiently.
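
One common reading of the attribute selection step is to drop a feature when it is almost perfectly correlated with a feature that has already been kept. The sketch below illustrates that reading only; the function name `drop_correlated`, the 0.9 threshold, and the synthetic columns are assumptions, not the project's actual selection procedure.

```python
import numpy as np


def drop_correlated(X, names, threshold=0.9):
    """Return the names of features kept after removing near-duplicate columns."""
    corr = np.abs(np.corrcoef(X, rowvar=False))  # feature-by-feature correlation
    keep = []
    for j in range(X.shape[1]):
        # Keep column j only if it is not too correlated with any column already kept.
        if all(corr[j, k] < threshold for k in keep):
            keep.append(j)
    return [names[j] for j in keep]


# Synthetic columns (assumption): chol_copy is a noisy near-duplicate of chol.
rng = np.random.default_rng(0)
age = rng.normal(54, 9, 200)
chol = rng.normal(246, 50, 200)
chol_copy = chol * 1.01 + rng.normal(0, 0.5, 200)

X = np.column_stack([age, chol, chol_copy])
kept = drop_correlated(X, ["age", "chol", "chol_copy"])
print(kept)
```

The first feature of each correlated pair is the one that survives, so the column order matters; a different tie-breaking rule would be an equally valid choice.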

Engineering Reference
Data mining plays a crucial role in heart disease prediction using
machine learning projects. It involves extracting valuable patterns,
insights, and knowledge from large datasets to aid in the accurate
prediction and early detection of heart disease.

Data mining in heart disease prediction projects helps healthcare
professionals identify individuals at risk of heart disease early,
allowing for timely interventions and improved patient outcomes. It
also contributes to ongoing research in cardiovascular health by
uncovering valuable insights from the data.

Every step, from data collection and attribute selection to data
pre-processing and model training, is done with the help of data mining.
Proposed System

The system starts with the collection of data and the selection of
the important attributes. The selected data is then pre-processed
into the required format and divided into two parts: training data
and testing data. The algorithms are applied and the model is
trained using the training data. The accuracy of the system is
obtained by testing it on the testing data.

This system is implemented using the following modules.

1.) Collection of Dataset
2.) Selection of attributes
3.) Data Pre-Processing
4.) Balancing of Data
5.) Disease Prediction
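
The five modules can be strung together as a small scikit-learn pipeline. This is a sketch under assumptions, not the project's code: the data is synthetic, pre-processing is reduced to feature scaling, balancing is done with class weights (the slides do not say which balancing method is used), and a random forest stands in for the final model.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# 1-2. Collection and attribute selection: synthetic, imbalanced stand-in data.
X, y = make_classification(
    n_samples=400, n_features=13, weights=[0.7, 0.3], random_state=0
)

# Split into training and testing data before anything is fitted.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# 3-4. Pre-processing (scaling) and balancing (class weights) in one pipeline.
model = make_pipeline(
    StandardScaler(),
    RandomForestClassifier(n_estimators=100, class_weight="balanced", random_state=0),
)

# 5. Disease prediction, evaluated on the testing data only.
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

cm = confusion_matrix(y_test, y_pred)
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred, zero_division=0),
    "recall": recall_score(y_test, y_pred, zero_division=0),
    "f1": f1_score(y_test, y_pred, zero_division=0),
}
print(cm)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Keeping the scaler inside the pipeline means it is fitted on the training split only, so no information from the testing data leaks into training.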
Improvement
• Expand the scope of the project:- The original project may have focused on
predicting heart disease in a specific population, such as men or women.
You could expand the scope to include other populations, such as children or the
elderly. You could also expand the scope to include other types of heart disease, such
as heart attack or stroke.
• Improve the accuracy of the model:- The original project may have used a simple
algorithm, such as logistic regression. You could improve the accuracy of the model
by using a more complex algorithm, such as random forest or deep learning. You
could also improve the accuracy of the model by collecting more data or by using
better feature selection techniques.
• Make the model more interpretable:- The original project may have produced a
black box model, which is a model that is difficult to understand.
• Deploy the model in a real-world setting:- The original project may have only
tested the model on a dataset of historical data. You could deploy the model in a
real-world setting by testing it on new data or by making it available to doctors
or patients.

