Lecture 7
Machine Learning
Backpropagation for Logistic Regression
Last time:
• We minimize the risk on the training set (the sample).
• What we'd really like: to minimize the population risk.
• But we only have the sample, so we minimize the training risk, using Gradient Descent.
• LLN: by the Law of Large Numbers, the empirical (sample) risk converges to the population risk as the sample size grows.
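The objects above can be written out; a sketch in generic notation (the lecture's own symbols may differ):

$$\hat{R}(h) = \frac{1}{N}\sum_{i=1}^{N}\ell(h(x_i), y_i) \qquad R_{\mathrm{out}}(h) = \mathbb{E}_{(x,y)\sim P}\left[\ell(h(x), y)\right]$$

By the Law of Large Numbers, for a fixed $h$, $\hat{R}(h) \to R_{\mathrm{out}}(h)$ as $N \to \infty$.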
• the test set does not have an optimistic bias like the training set (that's why the larger effective M factor)
• once you start fitting for things on the test set, you can't call it a test set any more, since we lose the tight guarantee.
• the test set has a cost: less data in the training set, so we must fit a less complex model.
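The "effective M factor" in the first bullet refers to the Hoeffding bound; a sketch of its standard form (generic symbols, not necessarily the lecture's):

$$P\left(\,\left|\hat{R}(h) - R_{\mathrm{out}}(h)\right| > \epsilon\,\right) \le 2M e^{-2\epsilon^2 N}$$

where $M$ counts the hypotheses considered: on the test set the hypothesis is already fixed, so $M = 1$, while the training set is searched over an effectively much larger $M$.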
VALIDATION
• train-test is not enough, as we fit hyperparameters on the test set and contaminate it
• thus do train-validate-test
If we don't fit a hyperparameter
• first assume that the validation set is acting like a test set.
• the validation risk or error is an unbiased estimate of the out-of-sample risk.
• the Hoeffding bound for a validation set is then identical to that of the test set.
Usually we want to fit a hyperparameter
Here we find the hyperparameter value that minimizes the validation risk.
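Choosing a hyperparameter this way can be sketched in code. A minimal K-fold cross-validation example, with polynomial degree as a hypothetical stand-in for the hyperparameter (the data, function names, and fold count are illustrative assumptions, not the lecture's):

```python
# Sketch: K-fold cross-validation to pick a hyperparameter (polynomial degree).
import numpy as np

def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and yield k (train, validation) index pairs."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, val

def cv_risk(x, y, degree, k=5):
    """Validation MSE averaged over the k folds."""
    errs = []
    for train, val in kfold_indices(len(x), k):
        coeffs = np.polyfit(x[train], y[train], degree)  # fit on k-1 folds
        pred = np.polyval(coeffs, x[val])                 # score on held-out fold
        errs.append(np.mean((pred - y[val]) ** 2))
    return np.mean(errs)

# Toy data: quadratic truth plus noise.
rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 60)
y = 1.0 - 2.0 * x + 3.0 * x**2 + 0.1 * rng.standard_normal(60)

# Pick the degree with the smallest cross-validated risk; per the slides,
# the final error should then be reported on a separate held-out test set.
best = min(range(1, 8), key=lambda d: cv_risk(x, y, d))
print(best)
```

Note that every candidate degree is evaluated on data it was not fit on, which is what keeps the test set uncontaminated.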
Cross Validation considerations
complexity penalty exactly 0.
MLE for Logistic Regression
• example of a Generalized Linear Model (GLM)
• "Squeeze" linear regression through a Sigmoid function
• this bounds the output to be a probability
• What is the sampling Distribution?
Sigmoid function
BERNOULLI!!
Multiplying over the samples we get:
Gradient:
Write as:
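The equations here can be sketched in standard notation (the lecture's own symbols may differ). With the sigmoid $\sigma(z) = 1/(1+e^{-z})$ and $p_i = \sigma(w \cdot x_i)$, the Bernoulli likelihood, multiplied over the samples, is

$$\mathcal{L}(w) = \prod_{i=1}^{N} p_i^{y_i} (1 - p_i)^{1 - y_i}$$

so the negative log-likelihood (the cross-entropy) is

$$-\ell(w) = -\sum_{i=1}^{N}\left[\, y_i \log p_i + (1 - y_i)\log(1 - p_i)\,\right]$$

with gradient

$$\nabla_w(-\ell) = \sum_{i=1}^{N} \left(\sigma(w \cdot x_i) - y_i\right) x_i.$$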
From Reverse Mode to Backpropagation
• Recursive Structure
• Always a vector times a Jacobian
• We add a "cost layer" to $z^4$. The derivative of this layer with respect to $z^4$ will always be 1.
• We then propagate this derivative back.
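The recursion in these bullets can be sketched numerically: start at the cost layer with derivative 1, then repeatedly multiply the running vector by the next layer's Jacobian. The 3-layer chain below is a made-up example, not the lecture's network:

```python
# Sketch of reverse mode: the running gradient is always a vector
# multiplied by a Jacobian, propagated back from the cost layer.
import numpy as np

# Hypothetical chain: z1 = W x ; z2 = tanh(z1) ; cost = v . z2 (scalar)
rng = np.random.default_rng(0)
x = rng.standard_normal(4)
W = rng.standard_normal((3, 4))
v = rng.standard_normal(3)

# Forward pass, keeping intermediates
z1 = W @ x
z2 = np.tanh(z1)
cost = v @ z2

# Backward pass: the cost layer's derivative w.r.t. itself is 1
delta = np.array(1.0)
delta = delta * v                       # times Jacobian of cost w.r.t. z2
delta = delta * (1 - np.tanh(z1)**2)    # times (diagonal) Jacobian of tanh
grad_x = delta @ W                      # times Jacobian of z1 w.r.t. x

# Check against a finite-difference approximation
eps = 1e-6
num = np.array([(v @ np.tanh(W @ (x + eps * e)) - cost) / eps
                for e in np.eye(4)])
print(np.allclose(grad_x, num, atol=1e-4))
```

At every step the object flowing backward stays a vector, never a full matrix-matrix product, which is what makes reverse mode cheap for scalar costs.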
Layer Cake
Backpropagation
RULE 3: PARAMETERS
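A numerical sketch of the parameter rule, reusing a small hypothetical chain (the network and names are illustrative assumptions): the gradient with respect to a layer's weights combines the vector backpropagated to that layer's output with the layer's own input.

```python
# Sketch of the parameter rule: grad w.r.t. a layer's weight matrix is the
# backpropagated vector at that layer's output, outer-product with its input.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(4)
W = rng.standard_normal((3, 4))
v = rng.standard_normal(3)

# Forward: z1 = W x ; z2 = tanh(z1) ; cost = v . z2
z1 = W @ x
cost = v @ np.tanh(z1)

# Backpropagate to z1 via the chain rule, then apply the parameter rule:
delta = v * (1 - np.tanh(z1)**2)   # d cost / d z1
grad_W = np.outer(delta, x)        # d cost / d W: outer product with the input

# Finite-difference check on one entry of W
eps = 1e-6
Wp = W.copy()
Wp[1, 2] += eps
num = (v @ np.tanh(Wp @ x) - cost) / eps
print(np.allclose(grad_W[1, 2], num, atol=1e-4))
```
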