Lecture 2
September 3, 2020
Machine Learning
❑ Classification
Machine Learning Definition
❑ Supervised
❑ Unsupervised
Illustrating Classification Task
[Figure: a training set of records (Tid, Attrib1, Attrib2, Attrib3, Class) is fed to a learning algorithm ("Learn Model") to induce a model; the model is then applied ("Apply Model") to a test set whose Class values are unknown (marked "?").]
Classification Function
f: X -> Y
f(x) = a*x + b
y = +1 if f(x) > 0
y = -1 if f(x) <= 0
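A minimal Python sketch of this thresholded linear classifier (the values of a and b are illustrative, not learned):

# A linear classification function f: X -> Y for 1-D inputs.
# The parameters a and b are placeholders; in practice they are learned.

def f(x, a=2.0, b=-1.0):
    return a * x + b

def classify(x):
    """Map the real-valued score to a label y in {+1, -1}."""
    return 1 if f(x) > 0 else -1

print(classify(0.9))   # f(0.9) = 0.8 > 0   -> +1
print(classify(0.2))   # f(0.2) = -0.6 <= 0 -> -1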
Training and Test Sets of Data
❑ Train/Test Split
❑ Designate a percentage of the data as training data
❑ Designate a percentage as validation data if needed
❑ The rest is test data (see the split sketch below)
❑ Cross Validation
❑ N-fold cross validation
❑ (N-1)/N of the data for training
❑ 1/N for testing
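As a sketch, such a split might look like this with scikit-learn's train_test_split (the 60/20/20 proportions are one common choice, not mandated above):

from sklearn.model_selection import train_test_split
import numpy as np

X = np.random.rand(100, 4)              # toy feature matrix
y = np.random.randint(0, 2, size=100)   # toy binary labels

# First carve off 20% as the test set ...
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# ... then 25% of the remainder (20% overall) as the validation set.
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # 60 20 20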
Training and Test Sets of Data Creation Cont.
❑ Holdout
❑ Reserve 2/3 for training and 1/3 for testing
❑ Random subsampling
❑ Repeated holdout
❑ Cross validation
❑ Partition data into k disjoint subsets
❑ k-fold: train on k-1 partitions, test on the remaining one
❑ Leave-one-out: k=n
❑ Bootstrap
❑ Sampling with replacement
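A sketch of k-fold partitioning and bootstrap sampling (toy data; scikit-learn's KFold is one way to produce the k disjoint subsets):

import numpy as np
from sklearn.model_selection import KFold

X = np.arange(12).reshape(6, 2)  # six toy records

# k-fold: k disjoint subsets; train on k-1 partitions, test on the remaining one.
for train_idx, test_idx in KFold(n_splits=3).split(X):
    print("train:", train_idx, "test:", test_idx)

# Bootstrap: draw n records with replacement; on average about 63.2%
# of the original records appear in each bootstrap sample.
rng = np.random.default_rng(0)
boot_idx = rng.integers(0, len(X), size=len(X))
print("bootstrap sample indices:", boot_idx)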
Classification Error
[Figure: decision tree for the cheat example — MarSt = Married → NO; MarSt = Single or Divorced → test TaxInc: < 80K → NO, ≥ 80K → YES.]
Apply Model to Training Data
[Figure, repeated across several animation slides: each record is routed through the tree above; the final record reaches the Married branch, so assign Cheat to "No".]
f(x) = y
Classification Error
With n = 2 binary attributes there are 2^2 = 4 possible input rows; each row can independently be labeled T or F, so there are 2^4 = 16 distinct Boolean classification functions.
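A quick sketch enumerating these functions explicitly:

from itertools import product

n = 2
rows = list(product([False, True], repeat=n))          # 2^n = 4 input rows
# Each function is one assignment of T/F to the 4 rows: 2^(2^n) = 16 total.
functions = list(product([False, True], repeat=len(rows)))
print(len(rows), len(functions))  # 4 16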
❑ Decision Tree Classifiers
❑ Acknowledgment: based on the notes by Dr. Faisal Shafait, German Research Center for Artificial Intelligence (DFKI)
Decision Tree Classifiers
[Figure: two-class points in the unit square (axes x and y from 0 to 1) and the corresponding decision tree — the root tests x < 0.43; each branch then applies a further Yes/No test on y, yielding leaves with class counts such as 4:0 and 0:4.]
Oblique Decision Trees
[Figure: a single oblique split x + y < 1 separates Class = + from Class = −.]
Confusion matrix:

                        PREDICTED CLASS
                     Class=Yes   Class=No
ACTUAL   Class=Yes     a (TP)      b (FN)
CLASS    Class=No      c (FP)      d (TN)

a: TP (true positive), b: FN (false negative), c: FP (false positive), d: TN (true negative)
Metrics for Performance Evaluation
Example 1:
                 PREDICTED CLASS
                    +       -
ACTUAL     +       150      40
CLASS      -        60     250

Example 2:
                 PREDICTED CLASS
                    +       -
ACTUAL     +       250      45
CLASS      -         5     200
Accuracy = (a + d)/N, where N = a + b + c + d is the total number of records
Cost matrix (p = cost of a correct prediction, q = cost of an error):

                        PREDICTED CLASS
                     Class=Yes   Class=No
ACTUAL   Class=Yes       p           q
CLASS    Class=No        q           p

Cost = p(a + d) + q(b + c)
     = p(a + d) + q(N - a - d)
     = qN - (q - p)(a + d)
     = N[q - (q - p) * Accuracy]
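A sketch checking the closed form on the two example matrices above (the values p = -1 and q = 1 are assumed for illustration):

def cost(a, b, c, d, p, q):
    """Total cost = p*(a + d) + q*(b + c) for the symmetric cost matrix."""
    return p * (a + d) + q * (b + c)

def accuracy(a, b, c, d):
    return (a + d) / (a + b + c + d)

p, q = -1, 1  # reward correct predictions, penalize errors (assumed values)
for (a, b, c, d) in [(150, 40, 60, 250), (250, 45, 5, 200)]:
    N = a + b + c + d
    closed_form = N * (q - (q - p) * accuracy(a, b, c, d))
    assert abs(cost(a, b, c, d, p, q) - closed_form) < 1e-9
    print(accuracy(a, b, c, d), cost(a, b, c, d, p, q))  # 0.8 -300 / 0.9 -400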
Model Complexity
[Figure: two decision trees built from the same 24 training records, with per-leaf class counts C1 vs. C2; their training errors are e(T) = 7/24 ≈ 0.29 and e(T) = 4/24 ≈ 0.167.]
Let Ω(t) = 0.5 per leaf node. With the pessimistic estimate error'(T) = (training errors + Ω(t) × number of leaves)/N, the estimated generalization error becomes error'(T) = 0.33, penalizing the more complex tree for its extra leaves.
Precision (p) = a / (a + c)
Recall (r) = a / (a + b)
F-measure (F) = 2rp / (r + p) = 2a / (2a + b + c)

a: TP (true positive), b: FN (false negative), c: FP (false positive), d: TN (true negative)
https://towardsdatascience.com/multi-class-metrics-made-simple-part-ii-the-f1-score-ebe8b2c2ca1
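A sketch computing these three measures from raw counts (reusing the first example confusion matrix above):

def prf(a, b, c):
    """Precision, recall, and F-measure from TP=a, FN=b, FP=c."""
    p = a / (a + c)          # precision
    r = a / (a + b)          # recall
    f = 2 * r * p / (r + p)  # harmonic mean; equals 2a / (2a + b + c)
    return p, r, f

print(prf(a=150, b=40, c=60))  # (0.714..., 0.789..., 0.75)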
Cost-Sensitive Measures, Multiclass
Cost-Sensitive Measures, Multiclass Cont.
Macro vs Micro Measures
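Following the linked article, a sketch contrasting the two averaging modes with scikit-learn (the three-class labels are made up):

from sklearn.metrics import f1_score

y_true = [0, 0, 0, 1, 1, 2, 2, 2, 2, 2]
y_pred = [0, 1, 0, 1, 2, 2, 2, 0, 2, 2]

# Macro: compute F1 per class, then average -- every class counts equally.
print(f1_score(y_true, y_pred, average="macro"))
# Micro: pool all TP/FP/FN first -- dominated by the frequent classes;
# for single-label multiclass data it equals plain accuracy.
print(f1_score(y_true, y_pred, average="micro"))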
Generalization Error
[Figure: points not seen during training.]
❑ Missing Values
❑ Costs of Classification
Underfitting and Overfitting
[Figure: training and test error vs. model complexity; past a point the test error rises again while training error keeps falling (overfitting).]
Underfitting: when the model is too simple, both training and test errors are large.
Overfitting due to Noise
HW Assignment Report Tips
❑ Report content:
❑ What do you see?
❑ Why is it the case?
❑ Is it important?
❑ Does it help to understand the problem you are working on?
❑ Does it help to understand the results that you get with your approach?
❑ What did you learn?
❑ What do you want others to learn?
❑ Analysis and Conclusions are the most important parts of your report.
❑ Decision Stump:
❑ A model consisting of a one-level decision tree.
❑ A single internal node (the root) is immediately connected to the terminal nodes.
❑ Predicts based on a single input feature.
❑ For a continuous feature, a threshold value is selected to split on (see the sketch after this list).
❑ SimpleCart:
❑ Can produce a multi-level decision tree.
❑ Uses only binary splits on attributes.
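A minimal decision-stump sketch for one continuous feature, fitting the threshold by exhaustive search (toy data, binary labels assumed):

import numpy as np

def fit_stump(x, y):
    """Exhaustively search thresholds on one continuous feature and
    return (errors, threshold, left_label, right_label) minimizing
    training error; each side predicts its majority label."""
    best = None
    for t in np.unique(x):
        left, right = y[x <= t], y[x > t]
        pred_l = int(left.mean() >= 0.5) if left.size else 0
        pred_r = int(right.mean() >= 0.5) if right.size else 0
        errors = int((left != pred_l).sum() + (right != pred_r).sum())
        if best is None or errors < best[0]:
            best = (errors, t, pred_l, pred_r)
    return best

x = np.array([1.2, 1.9, 2.3, 2.6, 3.1, 4.4])
y = np.array([0, 0, 0, 1, 1, 1])
print(fit_stump(x, y))  # (0, 2.3, 0, 1): zero errors splitting at 2.3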
Iris Dataset
❑ Size of tree = 4
❑ Number of leaf nodes = 3
❑ Accuracy = 66.66%
❑ 10-fold cross-validation used.
❑ In each cross-validation iteration: size of training set = 135 records, test set = 15 records.
❑ The model uses only PetalLength for classification, split at the threshold value 2.45.
❑ No record is classified as Iris-virginica.
❑ Relatively poor performance in terms of accuracy (see the sketch below).
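The experiment can be approximated with scikit-learn, using a depth-1 tree in place of Weka's DecisionStump; the cross-validated accuracy should come out near the 66.66% reported above:

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
stump = DecisionTreeClassifier(max_depth=1)   # one internal node
scores = cross_val_score(stump, X, y, cv=10)  # 10-fold cross-validation
print(scores.mean())  # roughly 2/3: a single split can cleanly
                      # separate only one of the three classes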
Ensemble Methods
❑ Random forest classifier
[Figure: general ensemble procedure —
Step 1: from the original training data D, create multiple data sets D1, D2, ..., Dt-1, Dt.
Step 2: build a classifier on each data set: C1, C2, ..., Ct-1, Ct.
Step 3: combine the classifiers into a single classifier C*.]
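A sketch of the three steps using bagging, i.e. bootstrap samples combined by majority vote (the depth-1 base learner is an arbitrary choice):

import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)
t = 25  # number of data sets / classifiers

# Steps 1 and 2: create data sets D1..Dt by bootstrap and build C1..Ct.
classifiers = []
for _ in range(t):
    idx = rng.integers(0, len(X), size=len(X))
    classifiers.append(DecisionTreeClassifier(max_depth=1).fit(X[idx], y[idx]))

# Step 3: combine into C* by majority vote over the t predictions.
votes = np.stack([c.predict(X) for c in classifiers])
combined = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
print((combined == y).mean())  # ensemble training accuracy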
Why does it work?