HW 1 Solutions
Part (a): What is the loss function for linear regression? Describe by words and formula.
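Solution: The loss function for linear regression is the sum of squared errors (SSE) or the mean squared error (MSE): the squared differences between the true values $y_i$ and the predictions $\hat{y}_i$, summed or averaged over all $m$ samples. With the $\frac{1}{2m}$ scaling used in Problem 2, $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} (y_i - \hat{y}_i)^2$, where $\hat{y}_i = \theta_0 + \theta_1 x_i$.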
Part (b): Why would we use an iterative algorithm for the linear regression problem?
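Solution: A closed-form equation can be used for a simple linear regression problem, but it is
hard to apply to a multiple linear regression problem: with many features it requires inverting a large matrix, which becomes computationally expensive, whereas an iterative algorithm only needs gradients.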
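As a rough illustration of the closed-form route for the simple (one-feature) case, the sketch below fits a line to the three points given in Problem 2. The function name `fit_simple_linear_regression` is a hypothetical helper, not part of the assignment.

```python
def fit_simple_linear_regression(xs, ys):
    """Closed-form least-squares fit of y = theta0 + theta1 * x (hypothetical helper)."""
    m = len(xs)
    x_mean = sum(xs) / m
    y_mean = sum(ys) / m
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    theta1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    theta0 = y_mean - theta1 * x_mean
    return theta0, theta1


# Data from Problem 2.
theta0, theta1 = fit_simple_linear_regression([1, 2, 3], [1.5, 2, 2.5])
print(theta0, theta1)
```

For one feature this is only a handful of sums; the cost of the closed form grows quickly as the number of features increases, which is where gradient descent becomes attractive.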
Part (c): What can happen if the learning rate is too high or too low?
Solution: If the learning rate is too small, gradient descent takes many iterations to converge
and can be very slow.
If the learning rate is too high, gradient descent can overshoot the minimum. You might jump
across the valley and end up on the other side, possibly even higher up than you were
before. As a result, the algorithm may fail to converge, or may even diverge.
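The effect of the learning rate can be seen on a toy problem. The sketch below is an assumption for illustration only: it minimizes the hypothetical function f(θ) = θ², whose gradient is 2θ, rather than the regression loss from this homework, and the function name `gradient_descent` is made up.

```python
def gradient_descent(alpha, theta=1.0, steps=20):
    """Run gradient descent on the toy function f(theta) = theta**2."""
    for _ in range(steps):
        grad = 2 * theta              # derivative of theta**2
        theta = theta - alpha * grad  # standard gradient descent update
    return theta


too_small = gradient_descent(alpha=0.001)  # barely moves away from the start
moderate = gradient_descent(alpha=0.1)     # approaches the minimum at 0
too_large = gradient_descent(alpha=1.1)    # each step overshoots and grows
print(too_small, moderate, too_large)
```

With the largest step size each update lands farther from the minimum than the previous one, which is the divergence described above.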
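Part (d): How does the gradient descent algorithm update the $\theta$'s?
Solution: Each $\theta_j$ is moved a step of size $\alpha$ against the gradient of the loss, and all parameters are updated simultaneously: $\theta_j^{new} = \theta_j^{old} - \alpha \frac{\partial J}{\partial \theta_j}$. For linear regression this gives the updates used in Problem 2, Part (c).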
Problem 2 (Linear Regression and Gradient Descent): You are given a vector of
measurements $x$ and true values $y$:
$x = [1, 2, 3]^T$, $y = [1.5, 2, 2.5]^T$
Part (b): If we start with $\theta_0 = 0$ and $\theta_1 = 0$, what is the initial value of the loss function?
Solution: $\frac{1}{2m} \sum_{i=1}^{m} (y_i - \hat{y}_i)^2 = \frac{1}{2 \cdot 3} (1.5^2 + 2^2 + 2.5^2) = 2.083$
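The arithmetic can be checked with a few lines of Python. This is a minimal sketch; the function name `loss` is an assumption, and it uses the $\frac{1}{2m}$ scaling from the solution.

```python
def loss(theta0, theta1, xs, ys):
    """Half mean squared error: (1 / 2m) * sum of squared residuals."""
    m = len(xs)
    return sum((y - (theta0 + theta1 * x)) ** 2 for x, y in zip(xs, ys)) / (2 * m)


xs = [1, 2, 3]
ys = [1.5, 2, 2.5]
initial_loss = loss(0.0, 0.0, xs, ys)
print(round(initial_loss, 3))  # 2.083
```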
Part (c): Compute the next estimate of $\theta_0$ and $\theta_1$ after one iteration of gradient descent.
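Solution: Selecting the step size $\alpha = 0.1$:
$\theta_0^{new} = \theta_0^{old} - \alpha \left[ \frac{1}{m} \sum_{i=1}^{m} (\theta_0 + \theta_1 x_i - y_i) \right] = 0 - \frac{0.1}{3} [-1.5 - 2 - 2.5] = 0.2$
$\theta_1^{new} = \theta_1^{old} - \alpha \left[ \frac{1}{m} \sum_{i=1}^{m} (\theta_0 + \theta_1 x_i - y_i) x_i \right] = 0 - \frac{0.1}{3} [-1.5 \cdot 1 - 2 \cdot 2 - 2.5 \cdot 3] = 0.43$
The model after one iteration is $y = 0.2 + 0.43x$.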
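The same single iteration can be reproduced in code. The sketch below assumes the step size 0.1 chosen in the solution; `gradient_step` is a hypothetical helper name.

```python
def gradient_step(theta0, theta1, xs, ys, alpha):
    """One gradient descent update for the model y = theta0 + theta1 * x."""
    m = len(xs)
    # Residuals of the current model: prediction minus true value.
    errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
    grad0 = sum(errors) / m
    grad1 = sum(e * x for e, x in zip(errors, xs)) / m
    # Both parameters are updated from the same (old) residuals.
    return theta0 - alpha * grad0, theta1 - alpha * grad1


new_theta0, new_theta1 = gradient_step(0.0, 0.0, [1, 2, 3], [1.5, 2, 2.5], alpha=0.1)
print(round(new_theta0, 2), round(new_theta1, 2))  # 0.2 0.43
```

Calling `gradient_step` repeatedly with its own output would continue the descent from this point.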
Problem 3 (Data Concepts): What are a data object, a data label, and a data attribute? Describe
in words and give an example.
Solution: Data object: represents an entity (also called a sample, example, instance, or data
point). Data label: the tag or category assigned to a data object. Data attribute: a data field
representing a characteristic or feature of a data object. For example, in a table of patients, each patient is a data object, age and blood pressure are attributes, and a recorded diagnosis is a label.
Part (a): What are Q1, Q3, the median, the smallest value, the largest value, and the IQR?
Solution:
1st method: Q1 = $\frac{2+2}{2} = 2$; Median = $\frac{6.8+7.2}{2} = 7$; Q3 = $\frac{9+10}{2} = 9.5$; Max = 11.5; Min = 1; IQR = Q3 - Q1 = 7.5
Part (b): Use a box plot to identify whether there are any outliers.
Solution: There are no outliers.
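One way to check this numerically is with box-plot fences. The sketch below assumes the common 1.5 × IQR rule, which the solution does not state explicitly, and uses only the summary values from Part (a); the variable names are illustrative.

```python
# Summary values taken from Part (a).
q1, q3 = 2.0, 9.5
smallest, largest = 1.0, 11.5

iqr = q3 - q1
# Assumed rule: points beyond 1.5 * IQR from the quartiles count as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# If the extremes lie inside the fences, every other point does too.
has_outlier = smallest < lower_fence or largest > upper_fence
print(lower_fence, upper_fence, has_outlier)
```

Under that assumption both the smallest and largest values fall inside the fences, which agrees with the answer above.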
Problem 5 (Data Dissimilarity): Consider the following data matrix. Define a 3-by-3
data dissimilarity matrix.
Problem 6 (Principal Component Analysis): Consider the following data matrix. Use
PCA to reduce the dimension by 1.
Part (c): Compute the eigenvalues and eigenvectors of the covariance matrix.
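$\lambda_1 = 998.9$, $v_1 = [1.22, 1]^T$; we then normalize the eigenvector to unit length: $v_1 = \left[ \frac{1.22}{\sqrt{1.22^2 + 1^2}}, \frac{1}{\sqrt{1.22^2 + 1^2}} \right]^T = [0.77339, 0.63393]^T$
$\lambda_2 = 81.1$, $v_2 = [-0.82, 1]^T$; we then normalize the eigenvector to unit length: $v_2 = \left[ \frac{-0.82}{\sqrt{(-0.82)^2 + 1^2}}, \frac{1}{\sqrt{(-0.82)^2 + 1^2}} \right]^T = [-0.63408, 0.77327]^T$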
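Scaling an eigenvector to unit length can be sketched in a few lines of Python. The helper name `normalize` is an assumption; the input vectors are the unscaled eigenvectors given above. Note that the negative component is squared as a whole, so (-0.82)² contributes a positive term to the norm.

```python
import math


def normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(c * c for c in v))  # c * c is positive even for negative c
    return [c / norm for c in v]


u1 = normalize([1.22, 1.0])
u2 = normalize([-0.82, 1.0])
print([round(c, 5) for c in u1])
print([round(c, 5) for c in u2])
```

A quick sanity check on any result is that the squared components of a unit-length vector sum to 1.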