Chapter 7: Heteroscedasticity
Overview
This chapter begins with a general discussion of homoscedasticity and
heteroscedasticity: the meanings of the terms, the reasons why the
distribution of a disturbance term may be subject to heteroscedasticity,
and the consequences of the problem for OLS estimators. It continues by
presenting several tests for heteroscedasticity and methods of alleviating
the problem. It shows how apparent heteroscedasticity may be caused
by model misspecification. It concludes with a description of the use of
heteroscedasticity-consistent standard errors.
Learning outcomes
After working through the corresponding chapter in the textbook, studying
the corresponding slideshows, and doing the starred exercises in the
textbook and the additional exercises in this guide, you should be able to:
• explain the concepts of homoscedasticity and heteroscedasticity
• describe how the problem of heteroscedasticity may arise
• explain the consequences of heteroscedasticity for OLS estimators, their
standard errors, and t and F tests
• perform the Goldfeld–Quandt test for heteroscedasticity
• perform the White test for heteroscedasticity
• explain how the problem of heteroscedasticity may be alleviated
• explain why a mathematical misspecification of the regression model
may give rise to a problem of apparent heteroscedasticity
• explain the use of heteroscedasticity-consistent standard errors.
Additional exercises
A7.1
Is the disturbance term in your CES expenditure function heteroscedastic?
Sort the data by EXPPC, regress CATPC on EXPPC and SIZE, and perform
a Goldfeld–Quandt test to test for heteroscedasticity in the EXPPC
dimension. Repeat using the variables in logarithmic form.
A7.2
The observations for the occupational schools (see Chapter 5 in the
textbook) in the figure below suggest that a simple linear regression of
cost on number of students, restricted to the subsample of these schools,
would be subject to heteroscedasticity. Download the data set from the
heteroscedastic data sets folder on the website and use a Goldfeld–
Quandt test to investigate whether this is the case. If the relationship is
heteroscedastic, what could be done to alleviate the problem?
20 Elements of econometrics
[Figure: scatter diagram of COST against N, with separate markers for occupational schools and regular schools]
A7.3
A researcher hypothesises that larger economies should be more self-
sufficient than smaller ones and that M/G, the ratio of imports, M, to gross
domestic product, G, should be negatively related to G:
$$\frac{M}{G} = \beta_1 + \beta_2 G + u$$
with β2 < 0. Using data for a sample of 42 countries, with M and G
both measured in US$ billion, he fits the regression (standard errors in
parentheses):
$$\widehat{\left(\frac{M}{G}\right)} = \underset{(0.03)}{0.37} - \underset{(0.000036)}{0.000086}\,G \qquad R^2 = 0.12 \qquad (1)$$
He plots a scatter diagram, reproduced as Figure 7.1, and notices that the
ratio M/G tends to have relatively high variance when G is small. He also
plots a scatter diagram for M and G, reproduced as Figure 7.2. Defining
GSQ as the square of G, he regresses M on G and GSQ:
$$\hat{M} = \underset{(10.77)}{7.27} + \underset{(0.03)}{0.30}\,G - \underset{(0.000009)}{0.000049}\,GSQ \qquad R^2 = 0.86 \qquad (2)$$
Finally, he plots a scatter diagram for log M and log G, reproduced as
Figure 7.3, and regresses log M on log G:
$$\widehat{\log M} = \underset{(0.37)}{-0.14} + \underset{(0.07)}{0.80}\,\log G \qquad R^2 = 0.78 \qquad (3)$$
Having sorted the data by G, he tests for heteroscedasticity by regressing
specifications (1) – (3) first for the 16 countries with smallest G, and then
for the 16 countries with the greatest G. RSS1 and RSS2, the residual sums
of squares for these regressions, are summarised in the following table.
[Figure 7.1: scatter diagram of M/G against G]
[Figure 7.2: scatter diagram of M against G]
[Figure 7.3: scatter diagram of log M against log G]
A7.4
A researcher has data on the number of children attending, N, and annual
recurrent expenditure, EXP, measured in US$, for 50 nursery schools in a
US city for 2006 and hypothesises that the cost function is of the quadratic
form
EXP = β1 + β2N + β3NSQ + u
where NSQ is the square of N, anticipating that economies of scale will
cause β3 to be negative. He fits the following equation:
EXPˆ = 17,999 + 1,060 N – 1.29 NSQ R2=0.74 (1)
(12,908) (133) (0.30)
Suspecting that the regression was subject to heteroscedasticity, the
researcher runs the regression twice more, first with the 19 schools with
lowest enrolments, then with the 19 schools with the highest enrolments.
The residual sums of squares in the two regressions are 8.0 million and
64.0 million, respectively.
The researcher defines a new variable, EXPN, expenditure per student, as
EXPN = EXP/N, and fits the equation
EXPNˆ = 1,080 – 1.25 N + 16,114 NREC R2=0.65 (2)
(90) (0.25) (6,000)
where NREC = 1/N. He again runs regressions with the 19 smallest
schools and the 19 largest schools and the residual sums of squares are
900,000 and 600,000.
• Perform a Goldfeld–Quandt test for heteroscedasticity on both of the
regression specifications.
• Explain why the researcher ran the second regression.
• R2 is lower in regression (2) than in regression (1). Does this mean that
regression (1) is preferable?
A7.5
This is a continuation of Exercise A6.5.
• When the researcher presents her results at a seminar, one of the
participants says that, since I and G have been divided by Y, (2) is
less likely to be subject to heteroscedasticity than (1). Evaluate this
suggestion.
A7.6
A researcher has data on annual household expenditure on food, F, and
total annual household expenditure, E, both measured in dollars, for 400
households in the United States for 2010. The scatter plot for the data is
shown as Figure 7.4. The basic model of the researcher is
$$F = \beta_1 + \beta_2 E + u \qquad (1)$$
where u is a disturbance term. The researcher suspects heteroscedasticity
and performs a Goldfeld–Quandt test and a White test. For the Goldfeld–
Quandt test, she sorts the data by size of E and fits the model for the
subsample with the 150 smallest values of E and for the subsample
with the 150 largest values. The residual sums of squares (RSS) for
these regressions are shown in column (1) of the table. She also fits the
regression for the entire sample, saves the residuals, and then fits an
auxiliary regression of the squared residuals on E and its square. R2 for this
regression is also shown in column (1) in the table. She performs parallel
tests of heteroscedasticity for two alternative models:
$$\frac{F}{A} = \beta_1 \frac{1}{A} + \beta_2 \frac{E}{A} + v \qquad (2)$$
$$\log F = \beta_1 + \beta_2 \log E + w \qquad (3)$$
A is household size in terms of equivalent adults, giving each adult a
weight of 1 and each child a weight of 0.7. The scatter plot for F / A and
E / A is shown as Figure 7.5, and that for log F and log E as Figure 7.6.
The data for the heteroscedasticity tests for models (2) and (3) are shown
in columns (2) and (3) of the table.
• Perform the Goldfeld–Quandt test for each model and state your
conclusions.
• Explain why the researcher thought that model (2) might be an
improvement on model (1).
• Explain why the researcher thought that model (3) might be an
improvement on model (1).
• When models (2) and (3) are tested for heteroscedasticity using the
White test, auxiliary regressions must be fitted. State the specification
of this auxiliary regression for model (2).
• Perform the White test for the three models.
• Explain whether the results of the tests seem reasonable, given the
scatter plots of the data.
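The White test described in this exercise compares n times R2 from the auxiliary regression with a chi-squared critical value. The sketch below covers only the case where the auxiliary regression has two regressors (a variable and its square, as for model (1)), so the statistic has 2 degrees of freedom. The function names are illustrative, the critical values are the standard table values for 2 degrees of freedom, and the R2 used is hypothetical, not a value from the exercise table.

```python
# Chi-squared critical values with 2 degrees of freedom (standard tables),
# listed from the least to the most stringent significance level.
CHI2_2DF_CRITICAL = {"5%": 5.99, "1%": 9.21, "0.1%": 13.82}


def white_statistic(n_obs, r2_aux):
    """White test statistic: n times R^2 from the auxiliary regression."""
    return n_obs * r2_aux


def strongest_rejection(stat, critical=CHI2_2DF_CRITICAL):
    """Most stringent level at which stat exceeds the critical value, or None."""
    rejected = [level for level, value in critical.items() if stat > value]
    return rejected[-1] if rejected else None


# Hypothetical R^2, for illustration only; it is not taken from the table.
n = 400
r2_hypothetical = 0.02
stat = white_statistic(n, r2_hypothetical)
print(f"nR^2 = {stat:.2f}; homoscedasticity rejected at: {strongest_rejection(stat)}")
```

An auxiliary regression with more regressors, such as the one for model (2), would need critical values for the corresponding larger number of degrees of freedom.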
[Figure 7.4: household expenditure on food against total household expenditure]
[Figure 7.5: household expenditure on food per equivalent adult ($) against total household expenditure per equivalent adult ($)]
[Figure 7.6: log household expenditure on food against log total household expenditure]
A7.7
Explain what is correct, mistaken, confused or in need of further
explanation in the following statements relating to heteroscedasticity in a
regression model:
• ‘Heteroscedasticity occurs when the disturbance term in a regression
model is correlated with one of the explanatory variables.’
• ‘In the presence of heteroscedasticity ordinary least squares (OLS) is an
inefficient estimation technique and this causes t tests and F tests to be
invalid.’
• ‘OLS remains unbiased but it is inconsistent.’
• ‘Heteroscedasticity can be detected with a Chow test.’
• ‘Alternatively one can compare the residuals from a regression using
half of the observations with those from a regression using the other
half and see if there is a significant difference. The test statistic is the
same as for the Chow test.’
• ‘One way of eliminating the problem is to make use of a restriction
involving the variable correlated with the disturbance term.’
• ‘If you can find another variable related to the one responsible for the
heteroscedasticity, you can use it as a proxy and this should eliminate
the problem.’
• ‘Sometimes apparent heteroscedasticity can be caused by a
mathematical misspecification of the regression model. This can
happen, for example, if the dependent variable ought to be logarithmic,
but a linear regression is run.’
                     26 smallest      26 largest
First regression     7.8 × 10^10      54.4 × 10^10
Second regression    6.7 × 10^10      13.8 × 10^10
Answer:
For both regressions RSS will be denoted RSS1 for the 26 smallest schools
and RSS2 for the 26 largest schools. In the first regression, RSS2/RSS1
= (54.4 × 10^10)/(7.8 × 10^10) = 6.97. There are 24 degrees of freedom in
each subsample (26 observations, 2 parameters estimated). The critical
value of F(24,24) is approximately 3.7 at the 0.1 per cent level, and so we
reject the null hypothesis of homoscedasticity at that level. In the second
regression, RSS2/RSS1 = (13.8 × 10^10)/(6.7 × 10^10) = 2.06. There are 22
degrees of freedom in each subsample (26 observations, 4 parameters
estimated). The critical value of F(22,22) is 2.05 at the 5 per cent level,
and so we (just) do not reject the null hypothesis of homoscedasticity at
that significance level.
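The arithmetic of the two Goldfeld–Quandt statistics above can be reproduced with a few lines of Python. This is a minimal sketch: the function name is illustrative, the RSS values are those quoted in the answer, and critical values still have to be looked up in F tables because the sketch does not compute them.

```python
def goldfeld_quandt_from_rss(rss_a, rss_b, n_sub, k):
    """Return (F, df) for a Goldfeld-Quandt test from two subsample RSS values.

    n_sub is the number of observations in each subsample and k the number
    of parameters estimated in each subsample regression. The larger RSS is
    placed over the smaller, so F is at least 1.
    """
    df = n_sub - k
    f_stat = max(rss_a, rss_b) / min(rss_a, rss_b)
    return f_stat, df


# RSS values for the 26 smallest and 26 largest schools, as quoted above.
f1, df1 = goldfeld_quandt_from_rss(7.8e10, 54.4e10, n_sub=26, k=2)
f2, df2 = goldfeld_quandt_from_rss(6.7e10, 13.8e10, n_sub=26, k=4)

print(f"First regression:  F({df1},{df1}) = {f1:.2f}")
print(f"Second regression: F({df2},{df2}) = {f2:.2f}")
```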
Why is the problem of heteroscedasticity less severe in the second
regression? The figure in Exercise A7.2 reveals that the cost function is
much steeper for the occupational schools than for the regular schools,
reflecting their higher marginal cost. As a consequence the two sets of
observations diverge as the number of students increases and the scatter is
bound to appear heteroscedastic, irrespective of whether the disturbance
term is truly heteroscedastic or not. The first regression takes no account
of this and the Goldfeld–Quandt test therefore indicates significant
heteroscedasticity. In the second regression the problem of apparent
heteroscedasticity does not arise because the intercept and slope dummy
variables allow separate implicit regression lines for the two types of
school.
Looking closely at the diagram, the observations for the occupational
schools exhibit a classic pattern of true heteroscedasticity, and this would
be confirmed by a Goldfeld–Quandt test confined to the subsample of
those schools (see Exercise A7.2). However the observations for the
regular schools appear to be homoscedastic and this accounts for the fact
that we did not (quite) reject the null hypothesis of homoscedasticity for
the combined sample.
7.6
The file educ.dta in the heteroscedastic data sets folder on the website
contains international cross-sectional data on aggregate expenditure on
education, EDUC, gross domestic product, GDP, and population, POP, for
a sample of 38 countries in 1997. EDUC and GDP are measured in US$
million and POP is measured in thousands. Download the data set, plot a
scatter diagram of EDUC on GDP, and comment on whether the data set
appears to be subject to heteroscedasticity. Sort the data set by GDP and
perform a Goldfeld–Quandt test for heteroscedasticity, running regressions
using the subsamples of 14 countries with the smallest and greatest GDP.
Answer:
The figure plots expenditure on education, EDUC, and gross domestic
product, GDP, for the 38 countries in the sample, measured in $ billion
rather than $ million. The observations exhibit heteroscedasticity. Sorting
them by GDP and regressing EDUC on GDP for the subsamples of 14
countries with smallest and greatest GDP, the residual sums of squares for
the first and second subsamples, denoted RSS1 and RSS2, respectively, are
1,660,000 and 63,113,000, respectively. Hence
$$F(12,12) = \frac{RSS_2}{RSS_1} = \frac{63{,}113{,}000}{1{,}660{,}000} = 38.02$$
The critical value of F(12,12) at the 0.1 per cent level is 7.00, and so we
reject the null hypothesis of homoscedasticity.
[Figure: expenditure on education ($ billion) against GDP ($ billion)]
7.9
Repeat Exercise 7.6, using the Goldfeld–Quandt test to investigate whether
scaling by population or by GDP, or whether running the regression in
logarithmic form, would eliminate the heteroscedasticity. Compare the results
of regressions using the entire sample and the alternative specifications.
Answer:
Dividing through by population, POP, the model becomes
$$\frac{EDUC}{POP} = \beta_1 \frac{1}{POP} + \beta_2 \frac{GDP}{POP} + \frac{u}{POP}$$
Thus the model is still subject to heteroscedasticity at the 0.1 per cent
level. This is evident in Figure 7.8.
[Figure 7.8: Expenditure on education per capita (EDUC/POP) against GDP per capita (GDP/POP), $ per capita]
[Figure: axes labelled EDUC/GDP and GDP]
The critical value of F(12,12) at the 5 per cent level is 2.69, so we do not
reject the null hypothesis of homoscedasticity. Could one tell this from
Figure 7.9? It is a little difficult to say.
Finally, we will consider a logarithmic specification. If the true relationship
is logarithmic, and homoscedastic, it would not be surprising that the
linear model appeared heteroscedastic. Sorting the sample by GDP, RSS1
and RSS2 are 2.733 and 3.438 for the subsamples of 14 countries with
smallest and greatest GDP. The F statistic is
$$F(12,12) = \frac{RSS_2}{RSS_1} = \frac{3.438}{2.733} = 1.26$$
[Figure: log EDUC against log GDP]
reg EDUCGDP GDPREC
[Stata regression output: numerical values not recoverable; dependent variable EDUCGDP, regressors GDPREC and _cons]
reg LGEE LGGDP
[Stata regression output: numerical values not recoverable; dependent variable LGEE, regressors LGGDP and _cons]
sensible interpretation. We will compare this with the output from an OLS
regression that makes no attempt to eliminate heteroscedasticity:
reg EDUC GDP
[Stata regression output: numerical values not recoverable; dependent variable EDUC, regressors GDP and _cons]
7.10
It was reported above that the heteroscedasticity-consistent estimate of
the standard error of the coefficient of GDP in equation (7.13) was 0.18.
Explain why the corresponding standard error in equation (7.15) ought to
be lower and comment on the fact that it is not.
Answer:
(7.15), unlike (7.13), appears to be free from heteroscedasticity and
therefore should provide more efficient estimates of the coefficients,
reflected in lower standard errors when computed correctly. However, the
sample may be too small for the heteroscedasticity-consistent estimator to
be a good guide.
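To make the comparison of standard errors concrete, the sketch below computes both the conventional standard error and a heteroscedasticity-consistent one for the slope in a simple regression. It uses White's basic formula without a small-sample correction, which is an assumption here since the answer does not say which variant was used. The function names and the four data points are made up for illustration.

```python
import math


def simple_ols(x, y):
    """Return (intercept, slope, residuals) for a regression of y on x with a constant."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    return intercept, slope, residuals


def slope_standard_errors(x, y):
    """Return (conventional s.e., heteroscedasticity-consistent s.e.) of the slope."""
    n = len(x)
    _, _, e = simple_ols(x, y)
    x_bar = sum(x) / n
    dev = [xi - x_bar for xi in x]
    sxx = sum(d ** 2 for d in dev)
    # Conventional formula: assumes a common disturbance variance.
    s2 = sum(ei ** 2 for ei in e) / (n - 2)
    conventional = math.sqrt(s2 / sxx)
    # White formula: each squared residual is weighted by its squared x deviation.
    robust = math.sqrt(sum(d ** 2 * ei ** 2 for d, ei in zip(dev, e))) / sxx
    return conventional, robust


# Made-up data, for illustration only.
conventional_se, robust_se = slope_standard_errors([1, 2, 3, 4], [1, 3, 2, 5])
print(f"conventional: {conventional_se:.4f}, heteroscedasticity-consistent: {robust_se:.4f}")
```

With so few observations the two figures can differ widely for reasons unrelated to heteroscedasticity, which is the small-sample caveat made in the answer.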
7.11
A health economist plans to evaluate whether screening patients on
arrival or spending extra money on cleaning is more effective in reducing
the incidence of infections by the MRSA bacterium in hospitals. She
hypothesises the following model:
$$MRSA_i = \beta_1 + \beta_2 S_i + \beta_3 C_i + u_i$$
where n1 and n2 are the number of available observations and k is the number
of parameters in the regression specification. However this procedure does
not work well for those categories with many zero observations because there
is a tendency for the number of zero observations to be relatively great for
low EXPPC (LOCT being an understandable exception). It would have been
better to have saved the data set under a new name for this exercise, with the
zero observations dropped, and to have identified the smallest and largest
three-eighths properly. However it is doubtful that the outcome would have
been much different.
sort EXPPC
reg FDHOPC EXPPC SIZE in … if FDHO>0
[Stata regression output: numerical values not recoverable; dependent variable FDHOPC, regressors EXPPC, SIZE and _cons]
reg FDHOPC EXPPC SIZE in … if FDHO>0
[Stata regression output: numerical values not recoverable; dependent variable FDHOPC, regressors EXPPC, SIZE and _cons]
Goldfeld–Quandt tests
                    linear                                logarithmic
        n1    n2    RSS1 × 10^–6   RSS2 × 10^–6   F        RSS1     RSS2     F
FDHO    326   325   65.76          221.57         3.38     40.07    61.95    1.54
FDAW    292   324   5.76           280.94         43.96    240.91   219.69   1.01
HOUS    324   326   192.79         2097.6         10.81    260.60   146.59   1.77*
TELE    320   324   6.05           75.29          12.29    134.51   112.27   1.18*
DOM     136   189   11.74          491.32         30.11    357.60   536.39   2.08
TEXT    151   206   0.13           15.86          89.43    163.28   284.78   2.38
FURN    86    155   7.64           69.07          5.02     175.07   301.58   3.10
MAPP    70    97    0.93           16.60          12.88    79.55    104.63   1.82
SAPP    141   203   0.30           1.09           2.52     172.05   190.50   1.59
CLOT    308   325   12.11          179.26         14.03    299.14   223.20   1.27*
FOOT    246   273   0.28           2.40           7.72     235.30   210.13   1.01*
GASO    283   311   12.20          59.98          4.47     163.61   110.68   1.35*
TRIP    59    173   0.90           122.07         46.26    125.87   250.34   5.83
LOCT    82    52    2.09           2.39           1.80     199.72   126.57   2.49*
HEAL    293   318   68.92          375.78         5.02     536.75   428.11   1.16*
ENT     298   323   15.48          861.87         51.37    298.52   251.60   1.09*
FEES    216   289   1.00           296.56         221.65   310.61   502.18   2.16
TOYS    206   237   3.49           20.25          5.04     298.88   303.10   1.17
READ    255   313   0.37           4.15           9.14     292.09   340.67   1.43
EDUC    107   106   2.98           300.44         101.77   233.77   337.45   1.43
TOB     146   125   4.38           9.09           2.42     148.74   122.19   1.42*
* indicates RSS2 < RSS1
A7.2
Having sorted by N, the number of students, RSS1 and RSS2 are
2.02 × 10^10 and 22.59 × 10^10, respectively, for the subsamples of the 13
smallest and largest schools. The F statistic is 11.18. The critical value of
F(11,11) at the 0.1 per cent level must be a little below 8.75, the critical
value for F(10,10), and so the null hypothesis of homoscedasticity is
rejected at that significance level.
One possible way of alleviating the heteroscedasticity is by scaling through
by the number of students. The dependent variable now becomes the
unit cost per student year, and this is likely to be more uniform than total
reg UNITCOST NREC
[Stata regression output: numerical values not recoverable; dependent variable UNITCOST, regressors NREC and _cons]
$$\widehat{UNITCOST} = \underset{(53.9)}{524.8} + \underset{(12741)}{10976}\,\frac{1}{N} \qquad R^2 = 0.03$$
Multiplying through by N, it may be rewritten
$$\widehat{COST} = 10976 + 524.8N.$$
The estimate of the marginal cost is somewhat higher than the estimate of
436 obtained using OLS in Section 5.3 of the textbook.
A second possible way of alleviating the heteroscedasticity is to
hypothesise that the true relationship is logarithmic, in which case the
use of an inappropriate linear specification would give rise to apparent
heteroscedasticity. Scaling through by N, and regressing LGCOST, the
(natural) logarithm of COST, on LGN, the logarithm of N, RSS1 and RSS2
are 2.16 and 1.58. The F statistic is therefore 1.37, and again this is not
significant even at the 5 per cent level. The regression output for this
specification using the full sample is shown.
reg LGCOST LGN
[Stata regression output: numerical values not recoverable; dependent variable LGCOST, regressors LGN and _cons]
$$t = \frac{0.909 - 1.000}{0.091} = -1.0.$$
A7.3
• Discuss whether (1) appears to be an acceptable specification, given the
data in the table and Figure 7.1.
Using the Goldfeld–Quandt test to test specification (1) for
heteroscedasticity assuming that the standard deviation of u is
inversely proportional to G, we have F(14,14) = … .
The critical value of F(14,14) at the 5 per cent level is 2.48, so we just
reject the null hypothesis of homoscedasticity at that level. Figure 7.1
does strongly suggest heteroscedasticity. Thus (1) does not appear to be
an acceptable specification.
The critical value of F(13,13) at the 0.1 per cent level is about 6.4, so
the null hypothesis of homoscedasticity is rejected. Figure 7.2 confirms
the heteroscedasticity.
• Explain what the researcher hoped to achieve by running regression (3).
Heteroscedasticity can appear to be present in a regression in natural
units if the true relationship is logarithmic. The disturbance term in
a logarithmic regression is effectively increasing or decreasing the
value of the dependent variable by random proportions. Its effect in
absolute terms will therefore tend to be greater, the larger the value
of G. The researcher is checking to see if this is the reason for the
heteroscedasticity in the second specification.
• Discuss whether (3) appears to be an acceptable specification, given the
data in the table and Figure 7.3.
Obviously there is no problem with the Goldfeld–Quandt test, since
F(14,14) = … . Figure 7.3 looks free from heteroscedasticity.
• What are your conclusions concerning the researcher’s hypothesis?
Evidence in support of the hypothesis is provided by (3) where, with
t = (0.80 – 1.00)/0.07 = –2.86, the elasticity is significantly lower than 1. Figures
7.1 and 7.2 also strongly suggest that on balance larger economies
have lower import ratios than smaller ones.
A7.4
• Perform a Goldfeld–Quandt test for heteroscedasticity on both of the
regression specifications.
The F statistics for the G–Q test for the two specifications are
F(16,16) = 64.0/8.0 = 8.00 and F(16,16) = 900,000/600,000 = 1.50.
The critical value of F(16,16) is 2.33 at the 5 per cent level and 5.20 at
the 0.1 per cent level. Hence one would reject the null hypothesis of
homoscedasticity at the 0.1 per cent level for regression 1 and one
would not reject it even at the 5 per cent level for regression 2.
• Explain why the researcher ran the second regression.
He hypothesised that the standard deviation of the disturbance term in
observation i was proportional to Ni: σi = λNi for some λ. If this is the
case, dividing through by Ni makes the specification homoscedastic,
since
$$\operatorname{Var}\left(\frac{u_i}{N_i}\right) = \frac{1}{N_i^2}\operatorname{Var}(u_i) = \frac{1}{N_i^2}(\lambda N_i)^2 = \lambda^2$$
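The variance argument above can be checked with a small simulation. This is a sketch under stated assumptions: the enrolments are drawn from a uniform distribution, λ is set to 2, and the sample is made very large so that sampling noise is small. None of these values come from the exercise.

```python
import random
import statistics

random.seed(12345)  # fixed seed so the sketch is reproducible

LAMBDA = 2.0  # assumed constant of proportionality: sigma_i = LAMBDA * N_i

# Hypothetical enrolments, sorted so that the first half are the smaller schools.
enrolments = sorted(random.uniform(20, 400) for _ in range(20000))
# Disturbance term with standard deviation proportional to N_i.
u = [random.gauss(0, LAMBDA * n_i) for n_i in enrolments]
# Scaled disturbance term u_i / N_i.
u_scaled = [u_i / n_i for u_i, n_i in zip(u, enrolments)]

half = len(enrolments) // 2


def variance_ratio(values):
    """Variance in the large-N half divided by variance in the small-N half."""
    return statistics.pvariance(values[half:]) / statistics.pvariance(values[:half])


ratio_raw = variance_ratio(u)
ratio_scaled = variance_ratio(u_scaled)
var_scaled = statistics.pvariance(u_scaled)
print(f"variance ratio, unscaled: {ratio_raw:.2f}")
print(f"variance ratio, scaled:   {ratio_scaled:.2f}")
print(f"variance of u/N: {var_scaled:.2f} (lambda squared = {LAMBDA ** 2:.2f})")
```

The unscaled ratio should be well above 1, while the scaled ratio should be close to 1 and the variance of the scaled term close to λ squared.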
A7.5
• When the researcher presents her results at a seminar, one of the
participants says that, since I and G have been divided by Y, (2) is less
likely to be subject to heteroscedasticity than (1). Evaluate this suggestion.
If the restriction is valid, imposing it will have no implications for
the disturbance term and so it could not lead to any mitigation
of a potential problem of heteroscedasticity. [If there were
heteroscedasticity, and if the specification were linear, scaling through
by a variable proportional in observation i to the standard deviation of
ui in observation i would lead to the elimination of heteroscedasticity.
The present specification is logarithmic and dividing I and G by Y does
not affect the disturbance term.]
A7.6
• Perform the Goldfeld–Quandt test for each model and state your
conclusions.
The ratios are 4.1, 6.0, and 1.05. In each case we should look for the
critical value of F(148,148). The critical values of F(150,150) at the 5
per cent, 1 per cent, and 0.1 per cent levels are 1.31, 1.46, and 1.66,
respectively. Hence we reject the null hypothesis of homoscedasticity at
the 0.1 per cent level (1 per cent is OK) for models (1) and (2). We do
not reject it even at the 5 per cent level for model (3).
A7.7
• ‘Heteroscedasticity occurs when the disturbance term in a regression model
is correlated with one of the explanatory variables.’
This is false. Heteroscedasticity occurs when the variance of the
disturbance term is not the same for all observations.
• ‘In the presence of heteroscedasticity ordinary least squares (OLS) is an
inefficient estimation technique and this causes t tests and F tests to be
invalid.’
It is true that OLS is inefficient and that the t and F tests are invalid,
but ‘and this causes’ is wrong.
• ‘OLS remains unbiased but it is inconsistent.’
It is true that OLS is unbiased, but false that it is inconsistent.
• ‘Heteroscedasticity can be detected with a Chow test.’
This is false.
• ‘Alternatively one can compare the residuals from a regression using half
of the observations with those from a regression using the other half and
see if there is a significant difference. The test statistic is the same as for
the Chow test.’
The first sentence is basically correct with the following changes
and clarifications: one is assuming that the standard deviation
of the disturbance term is proportional to one of the explanatory
variables; the sample should first be sorted according to the size of
the explanatory variable; rather than split the sample in half, it would
be better to compare the first three-eighths (or one third) of the
observations with the last three-eighths (or one third); ‘comparing
the residuals’ is too vague: the F statistic is F(n’ – k,n’ – k) = RSS2/
RSS1 assuming n’ observations and k parameters in each subsample
regression, and placing the larger RSS over the smaller.
The second sentence is false.
• ‘One way of eliminating the problem is to make use of a restriction
involving the variable correlated with the disturbance term.’
This is nonsense.
• ‘If you can find another variable related to the one responsible for the
heteroscedasticity, you can use it as a proxy and this should eliminate the
problem.’
This is more nonsense.
• ‘Sometimes apparent heteroscedasticity can be caused by a mathematical
misspecification of the regression model. This can happen, for example, if
the dependent variable ought to be logarithmic, but a linear regression is
run.’
True. A homoscedastic disturbance term in a logarithmic regression,
which is responsible for proportional changes in the dependent
variable, may appear to be heteroscedastic in a linear regression
because the absolute changes in the dependent variable will be
proportional to its size.
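The last point, together with the Goldfeld–Quandt procedure described earlier in this answer (sort by the explanatory variable, then compare the first and last three-eighths), can be illustrated with simulated data. In this sketch the true model is logarithmic with a homoscedastic disturbance term. The parameter values, sample size, and function names are assumptions made for the illustration.

```python
import math
import random

random.seed(2024)  # fixed seed so the sketch is reproducible


def rss_simple(x, y):
    """Residual sum of squares from an OLS regression of y on x with a constant."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))


def gq_ratio(x, y, share=3 / 8):
    """RSS2/RSS1 for the last and first `share` of the observations, sorted by x."""
    pairs = sorted(zip(x, y))
    m = int(len(pairs) * share)
    low, high = pairs[:m], pairs[-m:]
    rss1 = rss_simple([p[0] for p in low], [p[1] for p in low])
    rss2 = rss_simple([p[0] for p in high], [p[1] for p in high])
    return rss2 / rss1


# Hypothetical data: log Y = 1 + 0.8 log X + u, with u homoscedastic.
n = 400
x = [random.uniform(1, 100) for _ in range(n)]
log_y = [1 + 0.8 * math.log(xi) + random.gauss(0, 0.3) for xi in x]
y = [math.exp(v) for v in log_y]

ratio_linear = gq_ratio(x, y)                               # Y regressed on X
ratio_log = gq_ratio([math.log(xi) for xi in x], log_y)     # log Y on log X
print(f"RSS2/RSS1, linear: {ratio_linear:.2f}; logarithmic: {ratio_log:.2f}")
```

The linear specification should give a ratio far above 1 even though the disturbance term in the true model is homoscedastic, while the logarithmic specification should give a ratio near 1.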