
COE101
Introductory Artificial Intelligence
College of Engineering

Supervised Learning:
Regression
Classical Machine Learning
Supervised Learning
What is Supervised Learning?
Definition

• The machine has a "supervisor" or a "teacher" who gives the machine all the answers
• The teacher has already labeled the data into classes
• The machine learns faster with a teacher
• More commonly used in real life
• Example: a student is given 50 problems together with their solutions
• You choose the form of the function that you think connects the input x to the output y in y = f(x), but it is incomplete; you want the AI to complete it for you
Types of Supervised Learning

Regression
• Prediction of a specific point on a numeric axis

Classification
• Prediction of an object's category

Regression vs. Classification
Supervised Learning - Regression
What is Regression
Definition
• Find a function (e.g., a line) to model the data (go through it)
• In regression we predict a number instead of a category
• Examples:
  • Voltage → Temperature
  • Processes, memory → Power consumption
  • Protein structure → Energy
  • Robot arm controls → Torque at effector
  • Location, industry, past losses → Premium
Types of Regression

Linear Regression:
• When we believe the function is a straight line
• The basic idea in linear regression is to add up the effects of each of the
feature variables to produce the predicted value.

Nonlinear Regression:
• When we believe the function is not a line
• For example, the function is curved like a polynomial
Linear Regression
What is Linear Regression - Simplified
Thinking of linear regression as a shopping bill

• Suppose you go to the grocery store and buy 2.5 kg of potatoes, 1.0 kg of carrots, and two bottles of milk.
• Shopping bill = amount of potatoes (kg) × 2 € + amount of carrots (kg) × 4 € + bottles of milk × 3 €
• Total = 2.5 × 2 € + 1.0 × 4 € + 2 × 3 € = 15 €
• In linear regression, the amounts of potatoes, carrots, and milk are the inputs in the data.
• The output is the cost of your shopping, which clearly depends on both the price and how much of each product you buy.
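To make the weighted-sum idea concrete, here is a minimal Python sketch (ours, not from the slides) of the bill, using the prices and amounts from the example above:

```python
# A linear model is just a weighted sum of inputs. Here the prices play the
# role of the coefficients and the amounts are the input features.
prices = {"potatoes": 2.0, "carrots": 4.0, "milk": 3.0}   # euros per unit
amounts = {"potatoes": 2.5, "carrots": 1.0, "milk": 2}    # kg, kg, bottles

total = sum(prices[item] * amounts[item] for item in prices)
print(total)  # 15.0 euros, as in the worked example
```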
Linear Regression
Simple vs Multiple Regression

• Simple linear regression uses one independent variable to explain or predict the outcome of the dependent variable Y
  • Y = A X + B
• Multiple linear regression uses two or more independent variables to predict the outcome
  • Y = A X1 + B X2 + C
Linear Regression
Equation

Y = A + BX
(the same form as the familiar y = mx + b, with the intercept written first)

x = independent variable (known value)
y = dependent variable (predicted value)
A = y-axis intercept
B = slope of the line
Linear Regression
Example #1
Revenue = 20,000 + 3 (Ad Spending)
Y = 20,000 + 3 X
The coefficient A would represent total expected revenue when ad spending is zero.
The coefficient B would represent the average change in total revenue when ad spending is increased by one unit (e.g., one dollar).
If B is negative, it would mean that more ad spending is associated with less revenue.
If B is close to zero, it would mean that ad spending has little effect on revenue.
Depending on the value of B, a company may decide to either decrease or increase its ad spending.
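A hedged sketch of Example #1 as code; the function name and the sample inputs below are ours, the coefficients are the ones from the example:

```python
# Example #1 as a one-line model: Revenue = 20,000 + 3 * AdSpending.
def predicted_revenue(ad_spending):
    A, B = 20_000, 3   # A: revenue at zero ad spend, B: change per extra dollar
    return A + B * ad_spending

print(predicted_revenue(0))      # 20000: expected revenue with no ads
print(predicted_revenue(1_000))  # 23000: each extra dollar adds B = 3
```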
Linear Regression
Example #2
Blood Pressure = 120 + 1000 (Dosage (mL))

The coefficient A would represent the expected blood pressure when dosage is zero.
The coefficient B would represent the average change in blood pressure when dosage is increased by one unit.
If B is negative, it would mean that an increase in dosage is associated with a decrease in blood pressure.
If B is close to zero, it would mean that an increase in dosage is associated with no change in blood pressure.
If B is positive, it would mean that an increase in dosage is associated with an increase in blood pressure.
Depending on the value of B, researchers may decide to change the dosage given to a patient.
Linear Regression
Example #3
Crop yield = A + B(amount of fertilizer ~ X1) + C(amount of water ~ X2)

The coefficient A would represent the expected crop yield with no fertilizer or water.
The coefficient B would represent the average change in crop yield when fertilizer is increased by one
unit, assuming the amount of water remains unchanged.
The coefficient C would represent the average change in crop yield when water is increased by one
unit, assuming the amount of fertilizer remains unchanged.
Depending on the values of B and C, the scientists may change the amount of fertilizer and water used
to maximize the crop yield.
Linear Regression
Example #4
Points Scored = A + B(yoga sessions~ X1) + C(weightlifting sessions~X2)

The coefficient A would represent the expected points scored for a player who participates in zero
yoga sessions and zero weightlifting sessions.
The coefficient B would represent the average change in points scored when the number of weekly yoga sessions is increased by one, assuming the number of weekly weightlifting sessions remains unchanged.
The coefficient C would represent the average change in points scored when the number of weekly weightlifting sessions is increased by one, assuming the number of weekly yoga sessions remains unchanged.
Depending on the values of B and C, the data scientists may recommend that a player participate in more or fewer weekly yoga and weightlifting sessions in order to maximize their points scored.
Evaluation
Regression
Loss Functions
In Machine Learning, our main goal is to minimize the error, which is defined by the Loss Function.

Loss Functions
• Sum of Errors (SE)
• Sum of Absolute Errors (SAE)
• Sum of Squared Errors (SSE)
• Mean Squared Errors (MSE)
• Root Mean Squared Errors (RMSE)
Linear Regression
Sum of Errors (SE)
The error is the difference between the predicted value and the actual value. For example, if the data table (the "golden truth") says X = 5 and Y_actual = 7, and the fitted line (the AI) predicts Y_predicted = 2(5) + 10 = 20, the error is 20 - 7 = 13.

SE = (40 – 45) + (50 – 50) + (65 – 60)
SE = -5 + 0 + 5 = 0 (Loss) → misleading me to believe my AI is great when it's not
Linear Regression
Sum of Absolute Errors (SAE)
Take the absolute values of the errors for all data points.

SAE = |40 – 45| + |50 – 50| + |65 – 60|
SAE = 5 + 0 + 5 = 10
Linear Regression
Sum of Squared Errors (SSE)
Take the squares instead of the absolutes. The loss function now becomes:

SSE = (40 – 45)^2 + (50 – 50)^2 + (65 – 60)^2
SSE = 25 + 0 + 25 = 50
Linear Regression
Mean Squared Errors (MSE)
We take the average (mean) of the SSE, so the aggregated error no longer grows just because we have more data points.

MSE = (1/3) [ (40 – 45)^2 + (50 – 50)^2 + (65 – 60)^2 ]
MSE = (25 + 0 + 25)/3 = 50/3 = 16.67
Linear Regression
Root Mean Squared Error (RMSE)
We take the square root of the MSE, which gives the Root Mean Squared Error.

RMSE = sqrt( (1/3) [ (40 – 45)^2 + (50 – 50)^2 + (65 – 60)^2 ] )
RMSE = sqrt((25 + 0 + 25)/3) = sqrt(50/3) = sqrt(16.67) = 4.08
Always use this one!
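A minimal Python sketch (ours, not from the slides) computing all five losses on the example's (predicted, actual) pairs, reproducing the numbers above:

```python
import math

# The three (predicted, actual) pairs used in the slides: (40, 45), (50, 50), (65, 60).
predicted = [40, 50, 65]
actual = [45, 50, 60]
errors = [p - a for p, a in zip(predicted, actual)]

se = sum(errors)                       # 0: signed errors cancel, misleading!
sae = sum(abs(e) for e in errors)      # 10
sse = sum(e ** 2 for e in errors)      # 50
mse = sse / len(errors)                # 16.67
rmse = math.sqrt(mse)                  # 4.08
print(se, sae, sse, round(mse, 2), round(rmse, 2))
```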
Linear Regression – 1. Representation
Linear Regression Example

#    X (Age)   Y (Cats)
1       25        2
2       30        2
3       19        1
4        5        1
5       80        5
6       70        6
7       65        4
8       28        2
9       42        3
10      39        3
11      12        2
12      55        4
13      13        1
14      45        2
15      22        1
Linear Regression – 2. Optimization
Linear Regression Example

1. Age (independent variable) → x
2. Cats (dependent variable) → y
3. Product of "x" and "y" values → XY
4. Square of "x" value → X^2
5. Square of "y" value → Y^2
6. Total of the values in each column → Sum

#     X (Age)  Y (Cats)    XY     X^2    Y^2
1        25       2         50     625     4
2        30       2         60     900     4
3        19       1         19     361     1
4         5       1          5      25     1
5        80       5        400    6400    25
6        70       6        420    4900    36
7        65       4        260    4225    16
8        28       2         56     784     4
9        42       3        126    1764     9
10       39       3        117    1521     9
11       12       2         24     144     4
12       55       4        220    3025    16
13       13       1         13     169     1
14       45       2         90    2025     4
15       22       1         22     484     1
Sum     550      39       1882   27352   135
Linear Regression Example

Using the column sums from the table above, fit the model:

y = A + Bx
Linear Regression - Optimization
Linear Regression Example

From the column sums (n = 15, ΣX = 550, ΣY = 39, ΣXY = 1882, ΣX^2 = 27352):

A = 0.29344962
B = 0.0629059

y = A + Bx
y = 0.293 + 0.0629x
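The values of A and B come from the standard least-squares closed form, B = (n·ΣXY − ΣX·ΣY) / (n·ΣX^2 − (ΣX)^2) and A = (ΣY − B·ΣX) / n. A minimal Python sketch (ours) reproducing them from the table:

```python
# Closed-form least-squares fit of y = A + Bx on the age/cats data.
ages = [25, 30, 19, 5, 80, 70, 65, 28, 42, 39, 12, 55, 13, 45, 22]
cats = [2, 2, 1, 1, 5, 6, 4, 2, 3, 3, 2, 4, 1, 2, 1]

n = len(ages)
sum_x, sum_y = sum(ages), sum(cats)                # 550, 39
sum_xy = sum(x * y for x, y in zip(ages, cats))    # 1882
sum_x2 = sum(x * x for x in ages)                  # 27352

B = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
A = (sum_y - B * sum_x) / n
print(A, B)  # ~0.29344962 and ~0.0629059, as on the slide
```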
Goodness of Fit:
Overfitting,
Underfitting, and
Generalization
Linear Regression – 3. Evaluation
Linear Regression Example

y = A + Bx
y = 0.293 + 0.0629x

#    X (Age)  Y (Cats, Actual)  y (Predicted)  Squared Error
1       25          2              1.8655        0.018090
2       30          2              2.1800        0.032400
3       19          1              1.4881        0.238242
4        5          1              0.6075        0.154056
5       80          5              5.3250        0.105625
6       70          6              4.6960        1.700416
7       65          4              4.3815        0.145542
8       28          2              2.0542        0.002938
9       42          3              2.9348        0.004251
10      39          3              2.7461        0.064465
11      12          2              1.0478        0.906685
12      55          4              3.7525        0.061256
13      13          1              1.1107        0.012254
14      45          2              3.1235        1.262252
15      22          1              1.6768        0.458058

MSE (mean of the squared errors) = 0.344435
RMSE = sqrt(0.344435) = 0.586885849 ≈ 0.59
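A short sketch (ours) that recomputes the table's MSE and RMSE by applying the fitted line to the data:

```python
import math

# Evaluate the fitted line y = 0.293 + 0.0629x on the training data;
# this reproduces the slide's MSE and RMSE.
ages = [25, 30, 19, 5, 80, 70, 65, 28, 42, 39, 12, 55, 13, 45, 22]
cats = [2, 2, 1, 1, 5, 6, 4, 2, 3, 3, 2, 4, 1, 2, 1]

predictions = [0.293 + 0.0629 * x for x in ages]
mse = sum((y - p) ** 2 for y, p in zip(cats, predictions)) / len(cats)
rmse = math.sqrt(mse)
print(round(mse, 6), round(rmse, 2))  # ~0.344435, ~0.59
```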
Overfitting
Which one is the best?

[Three scatter plots of "Age Versus Cat Ownership" (x: Age (yrs), y: Number of Cats), each with a different fitted curve.]
Is it the one with the best fit to the data?
Good Fit | Underfitting | Overfitting
[The same three "Age Versus Cat Ownership" plots, labeled left to right: a good fit, an underfitted line, and an overfitted curve.]
Goodness of Fit
How well is it going to predict future data?

[The same three "Age Versus Cat Ownership" plots: the question is which model will predict future data best.]
Generalization
What is Generalization

• For any real-world problem with input and output data, we can map the inputs to the output using a function
• The goal of a supervised machine learning model is to produce a model that
  • understands the function between input and output for the training data, but also
  • generalizes this function so that it can work with new, unseen data with good accuracy
Generalization
Example
What is Model Prediction Error?

• High model prediction error means the model has created a function that fails to capture the relationship between input and output data
• Low model prediction error means the model has created a function that has captured the relationship between input and output data

What is Performance Variance?

• The variance of the machine learning model is the amount by which its performance varies in the future
• Low performance variance means the performance of the machine learning model does not vary much across different data sets
• High performance variance means the performance of the machine learning model varies considerably across different data sets
Good Fit Model
Well-trained model
• A well-trained model should have low variance and low error. This is also known as Good Fit
• A good fit function is actually very close to the true function that generates the data distribution
• This means a good fit model should be generalized enough to work with any unseen data (low performance variance) and at the same time should produce low prediction error (low model prediction error)
• A good fit model is what we try to achieve during the training phase
Goodness of Fit?
Two questions: Model Prediction Error ("How badly did I score myself at home?") and Performance Variance ("How different is my own scoring at home from my teacher's in the exam?")

Good Fit (our target)
• Model prediction error: I solve problems. Many times I get the right answers; sometimes I make some errors. → Low model prediction error
• Performance variance: I go to the exam. From my own assessment at home I believed I could score 80, and I get 78 in the test. → Low performance variance

Overfitting (dangerous because it is undetectable: misleading, I only discover it after the product is out)
• Model prediction error: I memorized 10 problems and their solutions. I make 0 mistakes on any of these problems. → Low model prediction error
• Performance variance: I go to the exam. I thought I would get 100%. I got 60 because I saw problems I did not memorize. → High performance variance

Underfitting (not a big deal since it is detectable: I can discover it and fix it before the product is out)
• Model prediction error: I seem to have studied and developed some picture, but I still make a lot of mistakes. I think I will get 60%. → High model prediction error
• Performance variance: I go to the exam and perform the same as at home (same mistakes); I get 58%. → Low performance variance
Overfitting & Underfitting
How to detect them? How to handle them?

• Overfitting introduces the problem of high variance, and underfitting results in high model prediction error; both result in a bad model.
• We can identify these models during the training and testing phases themselves:
  • If a model shows high accuracy during the training phase but fails to show similar accuracy during the testing phase, it indicates overfitting.
  • If a model fails to show satisfactory accuracy during the training phase itself, the model is underfitting.

Handle Overfitting:
• Increase training data
• Reduce model complexity
• Remove noise from data

Handle Underfitting:
• Increase training data
• Increase complexity of model
Overfitting vs. Underfitting
Example: A class consisting of 3 students & a Professor
Overfitting vs. Underfitting
Detection through Visual Inspection
Cross Validation for Regression
Cross Validation for Regression
Techniques

• Test Set Method
• Leave One Out Cross Validation
• K-Fold Cross Validation
Test Set Method
Cross Validation Methods
Test Set Method

1. Randomly choose 30% of the data to be in a test set.


2. The remainder is a training set.
3. Perform your regression on the training set.
4. Estimate your future performance with the test set.
Cross Validation Methods
Test Set Method – the Wrong Way – Why?

1. Choose 30% of the data to be in a test set.
2. The remainder is a training set.
3. Perform your regression on the training set.
4. Estimate your future performance with the test set.

Here the test set is simply the last rows of the table; because the split is not random, the test set may not be representative:

#    X (Age)  Y (Cats)  Split
1       25       2      Training
2       30       2      Training
3       19       1      Training
4        5       1      Training
5       80       5      Training
6       70       6      Training
7       65       4      Training
8       28       2      Training
9       42       3      Training
10      39       3      Training
11      12       2      Training
12      55       4      Test
13      13       1      Test
14      45       2      Test
15      22       1      Test
Cross Validation Methods
Test Set Method

Fitting on the training rows (1–11):

A = 0.504657584
B = 0.061322329

y = A + Bx
y = 0.505 + 0.0613x
Cross Validation Methods
Test Set Method

Evaluating y = 0.505 + 0.0613x on the test rows (12–15):

#    X (Age)  Y (Cats)  y (Predicted)  Squared Error
12      55       4        3.877386       0.015034
13      13       1        1.301848       0.091112
14      45       2        3.264162       1.598107
15      22       1        1.853749       0.728887

Mean Squared Error (MSE) = 0.608285
Cross Validation Methods
Test Set Method – the Right Way

1. Randomly choose 30% of the data to be in a test set.
2. The remainder is a training set.
3. Perform your regression on the training set.
4. Estimate your future performance with the test set.

The rows are shuffled first, so the test set is a random sample:

#    X (Age)  Y (Cats)  Split
2       30       2      Training
9       42       3      Training
4        5       1      Training
7       65       4      Training
6       70       6      Training
15      22       1      Training
5       80       5      Training
8       28       2      Training
13      13       1      Training
11      12       2      Training
3       19       1      Training
1       25       2      Test
12      55       4      Test
10      39       3      Test
14      45       2      Test
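A minimal Python sketch of the right way, with a fit_line helper (our name, implementing the closed-form fit used earlier):

```python
import random

# Test-set method: randomly hold out ~30% of the rows, fit on the rest,
# and report the test MSE.
def fit_line(xs, ys):
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2 = sum(x * x for x in xs)
    b = (n * sxy - sx * sy) / (n * sx2 - sx ** 2)
    return (sy - b * sx) / n, b          # (intercept A, slope B)

data = [(25, 2), (30, 2), (19, 1), (5, 1), (80, 5), (70, 6), (65, 4),
        (28, 2), (42, 3), (39, 3), (12, 2), (55, 4), (13, 1), (45, 2), (22, 1)]

random.seed(42)                          # seed only for reproducibility
random.shuffle(data)                     # the "right way": shuffle first
split = int(0.7 * len(data))             # ~70% train / ~30% test
train, test = data[:split], data[split:]

a, b = fit_line([x for x, _ in train], [y for _, y in train])
test_mse = sum((y - (a + b * x)) ** 2 for x, y in test) / len(test)
print(round(a, 3), round(b, 4), round(test_mse, 3))  # varies with the shuffle
```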
Test Set Method
Pros & Cons

Pros
• Very simple
• Can then simply choose the method with the best test-set score

Cons
• Wastes data: we get an estimate of the best method to apply to 30% less data
• If we don't have much data, our test set might just be lucky or unlucky ("the test-set estimator of performance has high variance")
K-Fold
Cross Validation Methods
K-Fold Cross Validation
Cross Validation Methods – Fold #1
5-Fold Cross Validation

1. Split the data into 5 groups.
2. For each unique group:
   1. Take the group as a hold-out or test data set
   2. Take the remaining groups as a training data set
   3. Perform your regression on the training set and evaluate it on the test set

Fold #1: rows 1–3 are the test set; rows 4–15 are the training set.
Cross Validation Methods – Fold #1
5-Fold Cross Validation

A = 0.39759
B = 0.061405
y = A + Bx
y = 0.39759 + 0.061405x

#    X (Age)  Y (Cats)  y (Predicted)  Squared Error
1       25       2        1.932722       0.004526
2       30       2        2.239749       0.057479
3       19       1        1.564291       0.318424

Mean Squared Error (MSE) = 0.126809
Cross Validation Methods – Fold #2
5-Fold Cross Validation
Fold #2: rows 4–6 are the test set; the remaining rows are the training set.

A = 0.41913
B = 0.055621
y = A + Bx
y = 0.41913 + 0.055621x

#    X (Age)  Y (Cats)  y (Predicted)  Squared Error
4        5       1        0.697237       0.091666
5       80       5        4.868839       0.017203
6       70       6        4.312626       2.847232

Mean Squared Error (MSE) = 0.985367
Cross Validation Methods – Fold #3
5-Fold Cross Validation
Fold #3: rows 7–9 are the test set; the remaining rows are the training set.
Cross Validation Methods – Fold #3
5-Fold Cross Validation

A = 0.264577
B = 0.064639
y = A + Bx
y = 0.264577 + 0.064639x

#    X (Age)  Y (Cats)  y (Predicted)  Squared Error
7       65       4        4.466095       0.217244
8       28       2        2.074462       0.005545
9       42       3        2.979404       0.000424

Mean Squared Error (MSE) = 0.074404
Cross Validation Methods – Fold #4
5-Fold Cross Validation
Fold #4: rows 10–12 are the test set; the remaining rows are the training set.
Cross Validation Methods – Fold #4
5-Fold Cross Validation

A = 0.060635
B = 0.065929
y = A + Bx
y = 0.060635 + 0.065929x

#    X (Age)  Y (Cats)  y (Predicted)  Squared Error
10      39       3        2.631858       0.135529
11      12       2        0.851781       1.318408
12      55       4        3.686718       0.098146

Mean Squared Error (MSE) = 0.517361
Cross Validation Methods – Fold #5
5-Fold Cross Validation
Fold #5: rows 13–15 are the test set; the remaining rows are the training set.
Cross Validation Methods – Fold #5
5-Fold Cross Validation

A = 0.50274
B = 0.061632
y = A + Bx
y = 0.50274 + 0.061632x

#    X (Age)  Y (Cats)  y (Predicted)  Squared Error
13      13       1        1.303958       0.092391
14      45       2        3.276188       1.628655
15      22       1        1.858648       0.737276

Mean Squared Error (MSE) = 0.81944
Cross Validation Methods
Overall Test MSE

Fold   Test Rows   Test MSE
1      1–3         0.127
2      4–6         0.985
3      7–9         0.074
4      10–12       0.517
5      13–15       0.819

Overall test MSE (average over the five folds) ≈ 0.504
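A minimal sketch of the 5-fold procedure on the same data; fit_line is the same closed-form helper as before (our naming), and the folds are taken in row order, as in the slides:

```python
# 5-fold cross validation on the age/cats data.
def fit_line(xs, ys):
    n = len(xs)
    sx, sy, sxy = sum(xs), sum(ys), sum(x * y for x, y in zip(xs, ys))
    sx2 = sum(x * x for x in xs)
    b = (n * sxy - sx * sy) / (n * sx2 - sx ** 2)
    return (sy - b * sx) / n, b

data = [(25, 2), (30, 2), (19, 1), (5, 1), (80, 5), (70, 6), (65, 4),
        (28, 2), (42, 3), (39, 3), (12, 2), (55, 4), (13, 1), (45, 2), (22, 1)]

k = 5
fold_size = len(data) // k               # 3 rows per fold
fold_mses = []
for i in range(k):
    test = data[i * fold_size:(i + 1) * fold_size]          # hold out fold i
    train = data[:i * fold_size] + data[(i + 1) * fold_size:]
    a, b = fit_line([x for x, _ in train], [y for _, y in train])
    mse = sum((y - (a + b * x)) ** 2 for x, y in test) / len(test)
    fold_mses.append(mse)

print([round(m, 3) for m in fold_mses])  # ~[0.127, 0.985, 0.074, 0.517, 0.819]
print(round(sum(fold_mses) / k, 3))      # overall test MSE ~0.504
```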
Leave One Out Cross Validation: n-Fold CV

Each model is trained on all data points except one, then scored on the point it has not seen:

Model 1 (A1, B1) → Performance RMSE 1
Model 2 (A2, B2) → Performance RMSE 2
Model 3 (A3, B3) → Performance RMSE 3
Model 4 (A4, B4) → Performance RMSE 4
Model 5 (A5, B5) → Performance RMSE 5
Model 6 (A6, B6) → Performance RMSE 6
Model 7 (A7, B7) → Performance RMSE 7
Model 8 (A8, B8) → Performance RMSE 8
Model 9 (A9, B9) → Performance RMSE 9
Model 10 (A10, B10) → Performance RMSE 10

→ RMSE (Average, Standard Deviation)
LOOCV
Activity
A bookstore needs to know how many books to buy for the new students, based on previously available data:

Number of students   Books
5                    6
3                    4
7                    10
6                    8
4                    4

Perform LOOCV on the dataset shown here.
Find out A & B at each iteration.
LOOCV
Activity
Leave out (5, 6):

A = -1.5
B = 1.6
y = A + Bx
y = -1.5 + 1.6x

Number of students   Actual Books (Y)   Predicted Books (y)   |Error|
5                    6                  6.5                   0.5
LOOCV
Activity
Leave out (3, 4):

A = -4
B = 2
y = A + Bx
y = -4 + 2x

Number of students   Actual Books (Y)   Predicted Books (y)   |Error|
3                    4                  2                     2
LOOCV
Activity
Leave out (6, 8):

A = -1.6
B = 1.6
y = A + Bx
y = -1.6 + 1.6x

Number of students   Actual Books (Y)   Predicted Books (y)   |Error|
6                    8                  8                     0
LOOCV
Activity
Leave out (7, 10):

A = -0.8
B = 1.4
y = A + Bx
y = -0.8 + 1.4x

Number of students   Actual Books (Y)   Predicted Books (y)   |Error|
7                    10                 9                     1
LOOCV
Activity
Leave out (4, 4):

A = -0.8
B = 1.48 (more precisely, 52/35 ≈ 1.486)
y = A + Bx
y = -0.8 + 1.48x

Number of students   Actual Books (Y)   Predicted Books (y)   |Error|
4                    4                  5.14                  1.14
LOOCV
Activity
Summary of all five iterations:

Number of students   Actual Books (Y)   Predicted Books (y)   |Error|
5                    6                  6.5                   0.5
3                    4                  2                     2
7                    10                 9                     1
6                    8                  8                     0
4                    4                  5.14                  1.14

RMSE Mean = 0.928
RMSE Std  = 0.749
Cross Validation Methods
LOOCV

For k = 1 to N:
1. Let (x_k, y_k) be the kth record.
2. Temporarily remove (x_k, y_k) from the dataset.
3. Train on the remaining N-1 data points.
4. Note your error on the held-out record.

When you've done all points, report the mean error.
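A minimal LOOCV sketch (helper name ours) on the bookstore data; it reproduces the activity's per-iteration A, B, and errors up to rounding:

```python
# LOOCV: each iteration leaves one (students, books) pair out, refits the
# line on the remaining four points, and scores the held-out point.
def fit_line(xs, ys):
    n = len(xs)
    sx, sy, sxy = sum(xs), sum(ys), sum(x * y for x, y in zip(xs, ys))
    sx2 = sum(x * x for x in xs)
    b = (n * sxy - sx * sy) / (n * sx2 - sx ** 2)
    return (sy - b * sx) / n, b

data = [(5, 6), (3, 4), (7, 10), (6, 8), (4, 4)]
fold_rmses = []
for i, (x_test, y_test) in enumerate(data):
    train = data[:i] + data[i + 1:]
    a, b = fit_line([x for x, _ in train], [y for _, y in train])
    fold_rmses.append(abs(y_test - (a + b * x_test)))  # RMSE of one point = |error|
    print(round(a, 2), round(b, 2), round(fold_rmses[-1], 2))

mean_rmse = sum(fold_rmses) / len(fold_rmses)
print(round(mean_rmse, 2))  # ~0.93, matching the slides' 0.928 up to rounding
```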


LOOCV
[Three plots of candidate models compared with LOOCV: MSE = 2.12, MSE = 0.962, and MSE = 3.33. The model with the lowest LOOCV MSE (0.962) would be selected.]
Cross Validation Methods
Test Set vs. LOOCV vs. K-Fold

Method          Disadvantages                              Advantages                    When to Use?
Test Set        Variance: unreliable estimate of           Cheap                         Lots of data available for testing
                future performance
Leave One Out   Expensive                                  Doesn't waste data            Very limited data available
K-Fold          –                                          Less expensive than LOOCV;    Somewhere in between
                                                           doesn't waste data
Nonlinear Regression
Nonlinear Regression
Popular nonlinear regression models

Exponential Model
y = a e^(bx)

Power Model
y = a x^b

Saturation Growth Model
y = a x / (b + x)

Polynomial Model
y = a_0 + a_1 x + ... + a_m x^m
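The slides do not fit any of these models, but as an illustrative sketch (with made-up data), a polynomial model can be fit with numpy.polyfit:

```python
import numpy as np

# Fit a degree-2 polynomial y = a0 + a1*x + a2*x^2 to noisy synthetic data.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 30)
y = 1.0 + 0.5 * x + 0.3 * x ** 2 + rng.normal(0, 1, size=x.shape)

coeffs = np.polyfit(x, y, deg=2)           # highest-degree coefficient first
print(coeffs)                              # ~[0.3, 0.5, 1.0]

y_hat = np.polyval(coeffs, x)
print(np.sqrt(np.mean((y - y_hat) ** 2)))  # RMSE of the polynomial fit
```

Choosing the degree too high would overfit the noise; too low would underfit, which is exactly the prediction-error/variance tradeoff discussed below.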
Regression Models
Advantages & Disadvantages

Linear Regression
Advantages:
• Works well irrespective of the dataset size
• Gives information about the relevance of features
• Simple to implement, and the output coefficients are easy to interpret
Disadvantages:
• The assumptions of linear regression must hold
• Linear regression is susceptible to over-fitting

Polynomial Regression
Advantages:
• Works on any size of dataset
• Works very well on nonlinear problems
Disadvantages:
• We need to choose the right polynomial degree for a good model prediction error / variance tradeoff
Breakout Session
Linear Regression
Class Activity
Suppose that an extensive study is carried out, and it is found that in a particular country, the life expectancy (the average number of years that people live) among non-smoking women who don't eat any vegetables is 80 years. Suppose further that, on average, men live 5 years less. Also take the numbers mentioned above: every cigarette per day reduces the life expectancy by half a year, and a handful of veggies per day increases it by one year.
Calculate the life expectancies for the following example cases:
For example, the first case is a male (subtract 5 years), who smokes 8 cigarettes per day (subtract 8 × 0.5 = 4 years) and eats two handfuls of veggies per day (add 2 × 1 = 2 years), so the predicted life expectancy is 80 - 5 - 4 + 2 = 73 years.

Male Life Expectancy = 75 - 0.5 × NoCig + 1 × HandfulVeggies
Female Life Expectancy = 80 - 0.5 × NoCig + 1 × HandfulVeggies
Linear Regression
Class Activity
Your task: Predict the correct value as an integer (whole number) for the missing sections A, B, and C.

Gender   Smoking (cigarettes per day)   Vegetables (handfuls per day)   Life expectancy (years)
male     8                              2                               73
male     0                              6                               A (75 - 0 + 1×6 = 81)
female   16                             1                               B (80 - 0.5×16 + 1×1 = 73)
female   0                              4                               C (80 - 0 + 1×4 = 84)

Male Life Expectancy = 75 - 0.5 × NoCig + 1 × HandfulVeggies
Female Life Expectancy = 80 - 0.5 × NoCig + 1 × HandfulVeggies
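A small sketch of the activity model as a function (the function name is ours; the coefficients come straight from the two formulas above):

```python
# The activity's two formulas differ only in the intercept (80 vs. 75).
def life_expectancy(gender, cigarettes_per_day, veggie_handfuls_per_day):
    base = 80 if gender == "female" else 75
    return base - 0.5 * cigarettes_per_day + 1 * veggie_handfuls_per_day

print(life_expectancy("male", 8, 2))     # 73, the worked example
print(life_expectancy("male", 0, 6))     # A = 81
print(life_expectancy("female", 16, 1))  # B = 73
print(life_expectancy("female", 0, 4))   # C = 84
```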
Breakout Session
Linear Regression
Activity
A bookstore needs to know how many books to buy for the new students, based on previously available data:

Number of students   Books
5                    6
3                    4
7                    10
6                    8
4                    4
5                    7
Linear Regression
Activity

Students (x)   Books (y)   x^2         xy
5              6           5^2 = 25    5×6 = 30
3              4           3^2 = 9     3×4 = 12
7              10          7^2 = 49    7×10 = 70
6              8           6^2 = 36    6×8 = 48
4              4           4^2 = 16    4×4 = 16
5              7           5^2 = 25    5×7 = 35

Σx = 30, Σy = 39, Σx^2 = 160, Σxy = 211 (n = 6, mean x = 5, mean y = 6.5)

B = (n·Σxy - Σx·Σy) / (n·Σx^2 - (Σx)^2) = (6×211 - 30×39) / (6×160 - 30^2) = 96 / 60 = 1.6
A = (Σy - B·Σx) / n = (39 - 1.6×30) / 6 = -1.5

y = -1.5 + 1.6x
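A short sketch (ours) checking the activity's sums and coefficients:

```python
# Verify the bookstore activity: the column sums give B = 96/60 = 1.6 and
# A = -1.5, i.e. y = -1.5 + 1.6x.
students = [5, 3, 7, 6, 4, 5]
books = [6, 4, 10, 8, 4, 7]

n = len(students)
sx, sy = sum(students), sum(books)                  # 30, 39
sxy = sum(x * y for x, y in zip(students, books))   # 211
sx2 = sum(x * x for x in students)                  # 160

B = (n * sxy - sx * sy) / (n * sx2 - sx ** 2)       # 1.6
A = (sy - B * sx) / n                               # -1.5
print(A, B, A + B * 5)                              # -1.5 1.6 6.5
```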
