STAT3301 - Term Exam 2 - CH11 Study Package
An LPM says that the change in the predicted probability for a given change in X is the same for all
values of X (i.e., the effect is linear), but that does not make sense.
o Using the HMDA example from your textbook, a change in the P/I ratio from 0.3 to 0.4 might
have a large effect on the probability of denial.
o However, once the P/I ratio is so large that the loan is very likely to be denied, increasing the
P/I ratio further will have little effect.
LPM predicted probabilities can be less than 0 or greater than 1, which is nonsensical for a probability!
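Both problems can be seen numerically. The sketch below compares an LPM and a probit model for the probability of denial as a function of the P/I ratio. The coefficients (−0.080 and 0.604 for the LPM, −2.19 and 2.97 for the probit index) are illustrative values assumed for this sketch to resemble the HMDA example; check them against your textbook before relying on them. The function names are hypothetical.

```python
from math import erf, sqrt

# Illustrative coefficients (assumed for this sketch, not taken from this study package).
LPM_INTERCEPT, LPM_SLOPE = -0.080, 0.604
PROBIT_INTERCEPT, PROBIT_SLOPE = -2.19, 2.97


def std_normal_cdf(z: float) -> float:
    """Standard normal CDF, Phi(z), written in terms of the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def lpm_prob(pi_ratio: float) -> float:
    """LPM 'probability': a straight line in the P/I ratio, so it is unbounded."""
    return LPM_INTERCEPT + LPM_SLOPE * pi_ratio


def probit_prob(pi_ratio: float) -> float:
    """Probit probability: Phi of a linear index, so it always lies in (0, 1)."""
    return std_normal_cdf(PROBIT_INTERCEPT + PROBIT_SLOPE * pi_ratio)


if __name__ == "__main__":
    # Effect of raising the P/I ratio by 0.1 from a low and from a high starting point.
    for low, high in [(0.3, 0.4), (1.5, 1.6)]:
        print(f"P/I {low}->{high}: "
              f"LPM change = {lpm_prob(high) - lpm_prob(low):.4f}, "
              f"probit change = {probit_prob(high) - probit_prob(low):.4f}")
    # LPM predictions fall outside [0, 1] at extreme P/I ratios; probit ones do not.
    for x in (0.0, 2.0):
        print(f"P/I = {x}: LPM = {lpm_prob(x):.3f}, probit = {probit_prob(x):.3f}")
```

With these values the LPM change is 0.0604 at both starting points, while the probit change is about 0.06 from 0.3 to 0.4 but under 0.01 from 1.5 to 1.6, once denial is already very likely. At P/I = 0 and P/I = 2 the LPM returns −0.080 and 1.128.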
3. Which of the following regressions with a binary dependent variable uses the standard normal
cumulative distribution function to ensure that all predicted probabilities lie between 0 and 1?
a. Linear Probability Model
b. Logit Regression Model
c. Probit Regression Model
d. Two Stage Least Squares (TSLS)
Solution = c. Probit Regression Model
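Probit and logit both pass a linear index z = β0 + β1X through a cumulative distribution function, which is what keeps the predictions between 0 and 1; they differ only in which CDF is used. The sketch below (hypothetical function names) evaluates the standard normal CDF Φ(z) used by probit next to the standard logistic CDF 1/(1 + e^(−z)) used by logit.

```python
from math import erf, exp, sqrt


def probit_link(z: float) -> float:
    """Probit: standard normal CDF, Phi(z)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def logit_link(z: float) -> float:
    """Logit: standard logistic CDF, 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + exp(-z))


if __name__ == "__main__":
    # Both columns stay strictly between 0 and 1 for every index value z.
    for z in (-3.0, -1.0, 0.0, 1.0, 3.0):
        print(f"z = {z:+.1f}  probit = {probit_link(z):.4f}  logit = {logit_link(z):.4f}")
```

Both functions equal 0.5 at z = 0 and approach, but never reach, 0 and 1 in the tails. The LPM applies no such transformation, which is why it can produce values outside [0, 1].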
4. Describe the fraction correctly predicted measure used to assess the fit for the linear probability
model, probit, and/or logit regression models.
The fraction correctly predicted is the fraction of observations for which the predicted probability is > 50% when $Y_i = 1$,
or < 50% when $Y_i = 0$. An advantage of this measure of fit is that it is easy to understand. However,
a (major) disadvantage is that it does not reflect the quality of the prediction: if $Y_i = 1$, the observation is
treated as correctly predicted whether the predicted probability is 51% or 90%.
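A minimal sketch of the calculation, using made-up outcomes `y` and predicted probabilities `p_hat` (hypothetical toy data, not from any regression in this package). The first two observations both count as correct even though one is predicted at 51% and the other at 90%, which is exactly the weakness described above.

```python
from typing import Sequence


def fraction_correctly_predicted(y: Sequence[int], p_hat: Sequence[float]) -> float:
    """Share of observations classified correctly with a 50% cutoff.

    An observation counts as correct if p_hat > 0.5 when y == 1,
    or p_hat < 0.5 when y == 0.
    """
    if len(y) != len(p_hat) or len(y) == 0:
        raise ValueError("y and p_hat must be non-empty and the same length")
    correct = sum(
        1 for yi, pi in zip(y, p_hat)
        if (yi == 1 and pi > 0.5) or (yi == 0 and pi < 0.5)
    )
    return correct / len(y)


if __name__ == "__main__":
    y = [1, 1, 0, 0]
    p_hat = [0.51, 0.90, 0.20, 0.60]  # hypothetical predicted probabilities
    print(fraction_correctly_predicted(y, p_hat))  # 3 of 4 correct -> 0.75
```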
5. In your textbook you learned that the R² and R̄² measures of fit do not make sense with LPM, probit,
and/or logit models. What two other specialized measures are used in their place for these types of
models?
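In the textbook's treatment the two measures are typically the fraction correctly predicted (question 4) and the pseudo-R². The pseudo-R² compares the maximized log likelihood of the fitted model with that of a model containing only an intercept (a Bernoulli distribution with success probability Ȳ): pseudo-R² = 1 − ln(f_max)/ln(f_Bernoulli). The sketch below assumes you already have fitted probabilities `p_hat` from a probit or logit fit; the function names and toy data are hypothetical.

```python
from math import log
from typing import Sequence


def bernoulli_log_likelihood(y: Sequence[int], p: Sequence[float]) -> float:
    """Sum of y*ln(p) + (1 - y)*ln(1 - p); every p must lie strictly in (0, 1)."""
    return sum(yi * log(pi) + (1 - yi) * log(1 - pi) for yi, pi in zip(y, p))


def pseudo_r2(y: Sequence[int], p_hat: Sequence[float]) -> float:
    """Pseudo-R^2: 1 - lnL(fitted model) / lnL(intercept-only Bernoulli model)."""
    y_bar = sum(y) / len(y)  # the intercept-only model predicts y_bar for everyone
    if not 0.0 < y_bar < 1.0:
        raise ValueError("y must contain both 0s and 1s")
    ll_model = bernoulli_log_likelihood(y, p_hat)
    ll_null = bernoulli_log_likelihood(y, [y_bar] * len(y))
    return 1.0 - ll_model / ll_null


if __name__ == "__main__":
    y = [1, 1, 1, 0, 0]
    p_hat = [0.9, 0.8, 0.6, 0.3, 0.1]  # hypothetical fitted probabilities
    print(round(pseudo_r2(y, p_hat), 3))
```

A model that predicts Ȳ for everyone gets a pseudo-R² of 0, and better-fitting predictions push it toward 1.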
6. Draw a graph of the standard normal probability density function and its associated cumulative
distribution function Φ(z). Which regression model uses this distribution?
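Without the drawing, the shape of both curves can be checked numerically. The sketch below tabulates the standard normal density φ(z) and the CDF Φ(z) on a grid; plotting the two columns against z gives the bell curve and the S-shaped curve the question asks for. It also confirms that Φ(z) is the area under φ up to z, using a trapezoid-rule integration chosen for this sketch. As in question 3, this is the distribution behind the probit model. Function names are hypothetical.

```python
from math import erf, exp, pi, sqrt


def std_normal_pdf(z: float) -> float:
    """Standard normal density: phi(z) = exp(-z^2 / 2) / sqrt(2*pi)."""
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)


def std_normal_cdf(z: float) -> float:
    """Standard normal CDF: Phi(z) = P(Z <= z)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def cdf_by_integration(z: float, lower: float = -8.0, steps: int = 20_000) -> float:
    """Approximate Phi(z) as the area under phi from `lower` to z (trapezoid rule)."""
    h = (z - lower) / steps
    area = 0.5 * (std_normal_pdf(lower) + std_normal_pdf(z))
    area += sum(std_normal_pdf(lower + k * h) for k in range(1, steps))
    return area * h


if __name__ == "__main__":
    # Table of (z, phi(z), Phi(z)) from -3 to 3 in steps of 0.5.
    print(f"{'z':>5} {'phi(z)':>8} {'Phi(z)':>8}")
    for k in range(-6, 7):
        z = k * 0.5
        print(f"{z:5.1f} {std_normal_pdf(z):8.4f} {std_normal_cdf(z):8.4f}")
```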
First, let's determine whether the coefficient on the white variable is statistically
significant at the 5% level of significance. This is determined by the following hypothesis
test:
$H_0: \beta_{white} = 0$
$H_1: \beta_{white} \neq 0$
Looking at the regression output table, we see that the p-value for this hypothesis
test is 0.000, which is less than the 0.05 (5%) level of significance. Therefore, we REJECT
the null hypothesis and conclude that the coefficient on the white variable is
statistically significant at the 5% level of significance.
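The p-value in the output comes from the z-statistic (coefficient divided by its standard error). The regression table is not reproduced in this package, so the sketch below uses a hypothetical z-value; it computes the two-sided p-value 2[1 − Φ(|z|)] and compares it with the 5% level. The function names are hypothetical.

```python
from math import erfc, sqrt


def two_sided_p_value(z_stat: float) -> float:
    """Two-sided p-value 2 * [1 - Phi(|z|)], written as erfc(|z| / sqrt(2)) for accuracy."""
    return erfc(abs(z_stat) / sqrt(2.0))


def reject_h0(z_stat: float, alpha: float = 0.05) -> bool:
    """Reject H0: beta = 0 in favour of H1: beta != 0 when the p-value is below alpha."""
    return two_sided_p_value(z_stat) < alpha


if __name__ == "__main__":
    z_hypothetical = 6.0  # hypothetical z-statistic, not taken from the regression output
    print(f"p-value = {two_sided_p_value(z_hypothetical):.6f}, "
          f"reject H0: {reject_h0(z_hypothetical)}")
```

Any |z| above about 3.5 already gives a p-value that rounds to 0.000 at three decimals, which is how such values appear in regression tables.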
Since the coefficient on the white variable is positive, we can also conclude that being white
appears to increase the probability of getting your loan approved.
However, probit coefficients are not marginal effects the way OLS slope coefficients are,
so the next step is needed to interpret the results appropriately.
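b. To determine the effect of white on approve, we first calculate the predicted probability of
approval for a white applicant using the cumulative distribution function Φ of the standard
normal distribution:
$\Pr(approve = 1 \mid white = 1) = \Phi(\hat{\beta}_{cons} + \hat{\beta}_{white} \cdot 1) = \Phi(1.3181)$
Looking up the probability for a z-value of 1.32 in Table 1 of the Appendix yields:
$\Pr(approve = 1 \mid white = 1) \approx \Phi(1.32) = 0.9066$
To determine the predicted probability of approval for a non-white applicant, we again use the
cumulative distribution function of the standard normal distribution:
$\Pr(approve = 1 \mid white = 0) = \Phi(\hat{\beta}_{cons}) = \Phi(0.5764)$
Looking up the probability for a z-value of 0.58 in Table 1 of the Appendix yields:
$\Pr(approve = 1 \mid white = 0) \approx \Phi(0.58) = 0.7190$
The estimated effect of being white is the difference between these two probabilities:
0.9066 − 0.7190 = 0.1876, i.e., about 18.8 percentage points.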
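The table lookups can be checked by evaluating Φ directly instead of rounding z to two decimals. The sketch below uses Python's `statistics.NormalDist`; the probit index values 1.3181 (white) and 0.5764 (non-white) are the ones used in the calculation above, and the effect of white is the difference between the two predicted probabilities.

```python
from statistics import NormalDist

PHI = NormalDist(mu=0.0, sigma=1.0).cdf  # standard normal CDF

# Probit index values from the worked example above.
INDEX_WHITE = 1.3181      # constant + white coefficient (white = 1)
INDEX_NON_WHITE = 0.5764  # constant only (white = 0)

p_white = PHI(INDEX_WHITE)
p_non_white = PHI(INDEX_NON_WHITE)
effect_of_white = p_white - p_non_white

if __name__ == "__main__":
    print(f"Pr(approve=1 | white=1)   = {p_white:.4f}")
    print(f"Pr(approve=1 | white=0)   = {p_non_white:.4f}")
    print(f"Estimated effect of white = {effect_of_white:.4f}")
```

The unrounded probabilities (about 0.9063 and 0.7178) differ from the two-decimal table values only in the third or fourth decimal place, and their difference is very close to the LPM coefficient on white (0.1884).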
c. Estimating the same regression as a linear probability model gives:
$\hat{Y}_i = \hat{\beta}_{cons} + \hat{\beta}_{white} \cdot white = 0.7178 + 0.1884 \cdot white$
If you are non-white, the predicted probability of getting a loan equals the constant
(0.7178), while if you are white it is 0.7178 + 0.1884 × 1 = 0.9062. Thus, for this very
simple specification the LPM and probit estimation methods yield very similar results.
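The near-identical results are expected here: with white as the only regressor, both models are saturated, so each reproduces the sample approval rate within each group (up to rounding). The sketch below takes the two LPM fitted probabilities and inverts Φ with `statistics.NormalDist.inv_cdf` to recover the probit index values that would give the same probabilities; they come out very close to the 0.5764 and 1.3181 used in the probit calculation.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

# LPM estimates from the output above.
B_CONS, B_WHITE = 0.7178, 0.1884

p_non_white = B_CONS            # white = 0
p_white = B_CONS + B_WHITE * 1  # white = 1

# A probit that fits the same two group rates must satisfy Phi(index) = p.
probit_const = STD_NORMAL.inv_cdf(p_non_white)
probit_white = STD_NORMAL.inv_cdf(p_white) - probit_const

if __name__ == "__main__":
    print(f"LPM: non-white {p_non_white:.4f}, white {p_white:.4f}")
    print(f"Implied probit: constant {probit_const:.4f}, white coefficient {probit_white:.4f}")
```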
d. Adding more variables to the probit model yields the following:
We see that the coefficient on white has decreased in magnitude, but it is still highly
significant (as evidenced by the p-value of 0.000 for the test of the null hypothesis that this
coefficient is equal to 0 against the alternative hypothesis that it is not).
Therefore, even when we control for other characteristics, discrimination still appears to
be present.
9. Suppose the probability of getting a student loan is determined by a student's grade point average
(GPA), age, sex, and level of study (undergraduate, master's, or PhD).
a. Identify the population logit model that could be used to represent this.
b. How would you estimate the probability that a 23-year-old male undergraduate student with
a GPA of 3.2 will obtain a loan?
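Substitute the student's characteristics into the estimated logit regression function:
$\widehat{\Pr}(loan = 1) = \dfrac{1}{1 + e^{-(\hat{\beta}_0 + \hat{\beta}_1 \cdot 3.2 + \hat{\beta}_2 \cdot 23 + \hat{\beta}_3)}}$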
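A minimal sketch of part b, assuming one possible specification for part a: $\Pr(loan = 1) = 1/(1 + e^{-(\beta_0 + \beta_1 GPA + \beta_2 Age + \beta_3 Male + \beta_4 Masters + \beta_5 PhD)})$, where Male, Masters and PhD are dummy variables and undergraduate is the omitted category, so that only $\hat{\beta}_3$ enters alongside GPA and age for this student. The coefficient values below are hypothetical placeholders, not estimates, and the function names are illustrative.

```python
from math import exp

# Hypothetical coefficient values (placeholders only; not estimated from any data).
COEFS = {
    "const": -4.0,
    "gpa": 1.5,
    "age": 0.02,
    "male": -0.1,
    "masters": 0.3,
    "phd": 0.5,
}


def logistic(z: float) -> float:
    """Standard logistic CDF, F(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + exp(-z))


def loan_probability(gpa: float, age: float, male: int, masters: int, phd: int,
                     b: dict = COEFS) -> float:
    """Predicted Pr(loan = 1) under the assumed logit specification (undergraduate = base)."""
    z = (b["const"] + b["gpa"] * gpa + b["age"] * age + b["male"] * male
         + b["masters"] * masters + b["phd"] * phd)
    return logistic(z)


if __name__ == "__main__":
    # 23-year-old male undergraduate (masters = phd = 0) with a GPA of 3.2.
    p = loan_probability(gpa=3.2, age=23, male=1, masters=0, phd=0)
    print(f"Predicted probability of obtaining a loan: {p:.3f}")
```

With real estimates, you would replace the placeholder values with the coefficients from the fitted logit regression; the dummy coding (which category is omitted) must match the one used in estimation.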