
Week 8

Regression and Linear Regression Analysis

Dr. Nehad Ramaha,
Computer Engineering Department
Karabük University
The class notes are a compilation and edition from many sources. The instructor does not claim intellectual property or ownership of the lecture notes.
 Supervised Learning:
◦ kNN (k Nearest Neighbors)
◦ Linear Regression
◦ Naïve Bayes
◦ Logistic Regression
◦ Support Vector Machines
◦ Random Forests
 Unsupervised Learning:
◦ Clustering
◦ Factor analysis
◦ Topic Models
 Regression analysis is a statistical method for
modelling the relationship between a dependent
(target) variable and one or more independent
(predictor) variables.
 Regression analysis helps us understand how the
value of the dependent variable changes with
respect to one independent variable while the
other independent variables are held fixed.
 It predicts continuous/real values such as
temperature, age, salary, price, etc.

 Suppose a marketing company A runs various
advertisements every year and earns sales from them.
The list below shows the advertisements made by the
company in the last 5 years and the corresponding sales:
 Now, the company wants to run a
$200 advertisement in the year
2024 and wants to predict the
sales for this year. To solve such
prediction problems in machine
learning, we need regression
analysis.
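This kind of prediction can be sketched in a few lines of NumPy. Note that the spend/sales figures below are purely hypothetical placeholders (the slide's actual table is not reproduced here); only the workflow is the point:

```python
import numpy as np

# Hypothetical advertisement/sales history; illustrative values only.
ad_spend = np.array([90.0, 120.0, 150.0, 100.0, 130.0])      # ad cost ($)
sales = np.array([1000.0, 1300.0, 1800.0, 1200.0, 1380.0])   # resulting sales ($)

# Fit a degree-1 polynomial (a straight line) by least squares.
slope, intercept = np.polyfit(ad_spend, sales, 1)

# Predict the sales for a $200 advertisement budget.
predicted = slope * 200 + intercept
print(f"predicted sales for $200 ad spend: {predicted:.0f}")
```

Because $200 lies above every spend value in the toy history, the prediction extrapolates along the fitted line, which is exactly what the company in the example is asking for.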

 Regression is a supervised learning technique
which helps in finding the correlation
between variables and enables us to predict a
continuous output variable based on one or
more predictor variables.
 It is mainly used for prediction, forecasting,
time series modeling, and determining cause-
effect relationships between variables.

 Regression shows a line or curve fitted through the
datapoints on the target-predictor graph in such a
way that the vertical distances between the datapoints
and the regression line are minimized.
 The distances between the datapoints and the line tell
whether the model has captured a strong relationship
or not.

 Some examples of regression are:
◦ Prediction of rain using temperature and other
factors
◦ Determining market trends
◦ Prediction of road accidents due to rash driving

 Dependent (target) variable: The main factor in regression analysis which
we want to predict or understand.
 Independent (predictor) variable: The factors which affect (are used to
predict) the dependent variable.
 Outliers: An outlier is an observation with either a very low or a very
high value in comparison to the other observed values. An outlier may
distort the result, so it should be handled carefully.
 Multicollinearity: If the independent variables are highly correlated with
each other, the condition is called multicollinearity. It should not be
present in the dataset, because it creates problems when ranking the
most influential variables.
 Underfitting and overfitting: If our algorithm works well on the training
dataset but not on the test dataset, the problem is called overfitting.
If our algorithm does not perform well even on the training dataset,
the problem is called underfitting.
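The effect of an outlier on a fitted line is easy to demonstrate. In this small sketch (toy values, not from the slides), five points lie exactly on y = 2x; adding one extreme observation visibly pulls the least-squares slope away from 2:

```python
import numpy as np

# Clean data lying exactly on the line y = 2x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2 * x

slope_clean, _ = np.polyfit(x, y, 1)   # recovers slope 2 exactly

# Append one extreme observation (an outlier) and refit.
x_out = np.append(x, 6.0)
y_out = np.append(y, 40.0)             # far above the y = 2x trend
slope_out, _ = np.polyfit(x_out, y_out, 1)

print(slope_clean, slope_out)          # the outlier pulls the slope upward
```

A single distorted point triples the slope here, which is why the slide warns that outliers can hamper the result.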

 Regression analysis helps in the prediction of a continuous
variable.
 There are various real-world scenarios where we need future
predictions, such as weather conditions, sales, and marketing
trends; for such cases we need a technique that can make
predictions accurately.
 Below are some other reasons for using regression analysis:
◦ Regression estimates the relationship between the target and the
independent variables.
◦ It is used to find trends in data.
◦ It helps to predict real/continuous values.
◦ By performing regression, we can determine the most important
factor, the least important factor, and how each factor affects
the others.

 Linear regression is a statistical regression method which
is used for predictive analysis.
 It is one of the simplest regression algorithms and
models the relationship between continuous variables.
 Linear regression shows the linear relationship between
the independent variable (X-axis) and the dependent
variable (Y-axis), hence the name linear regression.
 If there is only one input variable (x), it is called
simple linear regression. If there is more than one
input variable, it is called multiple linear regression.

 The relationship between variables in the linear regression
model can be explained using the image below.
 Here we are predicting the salary of an employee on the
basis of years of experience.

 Linear regression assumes the relationship
between variables can be modelled by a linear
equation, i.e. the equation of a line:
y = b0 + b1x, where b0 is the intercept and
b1 is the slope.
 In the real world a data point has various important
attributes which need to be catered for while developing
a regression model (many independent variables and
one dependent variable).

 Regression uses a line to show the trend of the distribution.
 There can be many lines that try to fit the data points in
the scatter diagram.
 The aim is to find the best fit line.

 The best fit line tries to explain the variance in the given data
(minimize the total residual/error).
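"Minimizing the total residual" can be made concrete: among all candidate lines, the least-squares fit is the one with the smallest residual sum of squares. A minimal sketch with made-up points:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

def rss(slope, intercept):
    """Total squared vertical distance between the points and a line."""
    residuals = y - (slope * x + intercept)
    return float(np.sum(residuals ** 2))

# The least-squares best fit line.
best_slope, best_intercept = np.polyfit(x, y, 1)

# Any other candidate line leaves a larger total residual.
print(rss(best_slope, best_intercept))   # minimal RSS
print(rss(1.5, 1.0))                     # some other line: larger RSS
print(rss(2.5, -1.0))                    # another candidate: also larger
```

Trying a few candidate slopes and intercepts by hand shows why a search criterion is needed: the best fit line is defined as the unique minimizer of this residual sum.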

 Least Squares
 Gradient Descent
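The two approaches above reach the same line: least squares solves for the coefficients in closed form, while gradient descent iteratively reduces the mean squared error. A minimal sketch comparing them on toy data (learning rate and iteration count are assumptions chosen for convergence):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

# Closed-form least squares fit.
ls_slope, ls_intercept = np.polyfit(x, y, 1)

# Gradient descent on the mean squared error.
b0, b1 = 0.0, 0.0           # initial intercept and slope
lr = 0.01                   # learning rate (small enough to converge here)
for _ in range(20000):
    err = (b0 + b1 * x) - y
    b0 -= lr * 2 * err.mean()         # partial derivative of MSE w.r.t. b0
    b1 -= lr * 2 * (err * x).mean()   # partial derivative of MSE w.r.t. b1

print(ls_slope, b1)         # the two slope estimates agree closely
print(ls_intercept, b0)     # and so do the intercepts
```

Closed-form least squares is exact and cheap for small problems; gradient descent scales to settings where solving the normal equations directly is impractical.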
