
ACADEMIA DE STUDII ECONOMICE

BUCUREȘTI
FACULTATEA DE ADMINISTRAREA AFACERILOR
cu predare în limbi străine

Econometrics project
Coordinating professor: Serban Daniela, PhD

Vâlceanu Letiția-Gabriela

FABIZ, GRP. 131


Contents
Introduction
Hypothesis Testing
Simple Regression Model
    Identify the variables
    Identify the equation
    Studying the intensity of the relationship between the two variables
    Regression coefficient and intercept interpretation
    The validity of the model
    Making the inference
    Interpreting the assumptions
Multiple regression model
    Identify the variables
    Identify the equation
    Studying the intensity of the relationship between the three variables
    Regression coefficient and intercept interpretation
    The validity of the model
    Making the inference
    Interpreting the assumptions
    Correlation between the regressors
Conclusion
Appendix
Introduction
This project studies real consumption expenditure and the level of income in the USA over a
period of 31 years, using data collected from the FRED website, https://fred.stlouisfed.org/.
This website presents and tracks over 424,000 indicators, both for the USA and for the rest of
the world, which makes it easier for visitors to carry out economic research.
The purpose of this project is to determine whether there is a correlation between the level of
real personal income and consumption expenditure, using the simple regression model. After
adding another regressor, in our case the CPI, we will determine the relationship between all
three variables using the multiple regression model.
To describe the data more clearly, we have the following variables:

Dependent variable Yd: Real Consumption Expenditure. Expenditure is the amount of money
spent, in our situation, on goods and services for a living;
The first independent variable Xd: Real Personal Income. Personal income refers to an
individual's total earnings from wages, investment enterprises, and other ventures
(Personal Income, 2016);
The second independent variable: CPI, an index which measures changes in the price level
of a market basket of consumer goods and services purchased by households.

The goal of the project is to discover the relationship between these three variables. We
expect that if an individual has a higher income, his or her level of consumption expenditure
will also increase, and that the CPI also influences the level of consumption.
The first chapter of this case study consists of two hypothesis tests, the second describes
the simple regression model, and the third presents the multiple regression model. Finally,
the residuals of each model are analysed.

Hypothesis Testing
1) According to the FRED website, the average Real Personal Income in 1970 was
2921.1 billion $. However, according to the OECD website, a study conducted for
31 countries found a mean of 3000.1 billion $ with a standard deviation of
785.2 billion $. We would like to find out whether the data collected from the FRED
website are reliable or whether the OECD figures are to be trusted.

1.1) Define the hypotheses

H0, the null hypothesis: the average income in the year 1970 was 2921.1 billion $;
H1, the alternative hypothesis: the average income recorded in the year 1970 was higher
than 2921.1 billion $, as the OECD mean of 3000.1 billion $ suggests.

1.2) Formulate the hypotheses

H0: \( \mu = 2921.1 \);
H1: \( \mu > 2921.1 \).
1.3) Establish the significance level
We choose \( \alpha = 5\% \).
1.4) Establish the rejection region

RR = \( (+1.645; +\infty) \).

1.5) We compute Zcalculated

\[ Z_{calc} = \frac{\bar{x} - \mu_0}{\sqrt{s^2 / n}} = \frac{3000.1 - 2921.1}{\sqrt{(785.2)^2 / 31}} = \frac{79}{141.03} \approx 0.56 \]

1.6) Comment on the results

As one can see, the result lies outside the rejection region, falling into the non-rejection
region. Therefore we do not have enough evidence to reject H0.
1.7) Conclusion
Because Zcalculated falls into the non-rejection region (NRR), we keep the value reported by
the FRED website.
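
The steps of this first test can be retraced with a short script. This is only a sketch of the computation described above; the function name `one_sample_z` is our own choice for illustration, and the critical value 1.645 is the one used in step 1.4.

```python
import math

def one_sample_z(sample_mean, mu0, s, n):
    """Z statistic for H0: mu = mu0, using the sample standard deviation s."""
    return (sample_mean - mu0) / (s / math.sqrt(n))

# Critical value from step 1.4 (alpha = 5%, right-tailed test).
Z_CRIT = 1.645

# Figures quoted in the text: OECD mean and standard deviation, FRED value as mu0.
z = one_sample_z(sample_mean=3000.1, mu0=2921.1, s=785.2, n=31)
reject_h0 = z > Z_CRIT

print(f"Z = {z:.2f}, reject H0: {reject_h0}")
```

Since the statistic is far below 1.645, the script reaches the same decision as step 1.6.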

2) According to the FRED website, between the years 1980 and 1984 the average
consumption was 4836.64 billion $, with a standard deviation of 142.5 billion $. For the
period 1985 to 1989, the average consumption was 5817.4 billion $, with a standard
deviation of 170.4 billion $. In both cases the sample size is 5 years. We shall test
whether the average consumption in the first period was lower than in the second
period.

2.1) Define the hypotheses

H0, the null hypothesis: there is no difference between the two periods;
H1, the alternative hypothesis: the average consumption in the first period is lower than
the one in the second period.

2.2) Formulate the hypotheses

H0: \( \mu_1 = \mu_2 \), i.e. \( \mu_1 - \mu_2 = 0 \);
H1: \( \mu_1 < \mu_2 \), i.e. \( \mu_1 - \mu_2 < 0 \).
2.3) Establish the significance level
We choose \( \alpha = 5\% \).
2.4) Establish the rejection region
RR = \( (-\infty; -1.645) \).

2.5) We compute Zcalculated

\[ Z_{calc} = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{4836.64 - 5817.4}{\sqrt{\frac{20306.25 + 29036.16}{5}}} \approx -9.87 \]

2.6) Comment on the result

The result falls into the rejection region, so we reject H0 at the 5% significance level.
2.7) Conclusion
Because the result falls into the rejection region, the average consumption recorded in the
first period is significantly lower than the one recorded in the second period.
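
The second test can be checked in the same way. The sketch below assumes the two periods are treated as independent samples, as in step 2.5; the function name `two_sample_z` and the parameter `delta0` (the difference of means under H0) are illustrative names, not part of the original project.

```python
import math

def two_sample_z(mean1, mean2, s1, s2, n1, n2, delta0=0.0):
    """Z statistic for H0: mu1 - mu2 = delta0 with independent samples."""
    standard_error = math.sqrt(s1**2 / n1 + s2**2 / n2)
    return ((mean1 - mean2) - delta0) / standard_error

# Critical value from step 2.4 (alpha = 5%, left-tailed test).
Z_CRIT_LEFT = -1.645

# 1980-1984 versus 1985-1989, figures quoted in the text.
z2 = two_sample_z(mean1=4836.64, mean2=5817.4, s1=142.5, s2=170.4, n1=5, n2=5)
reject_h0_left = z2 < Z_CRIT_LEFT

print(f"Z = {z2:.2f}, reject H0: {reject_h0_left}")
```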

Simple Regression Model


Identify the variables:
Dependent variable Yd: Real Consumption Expenditure.
Independent variable Xd: Real Personal Income.
Identify the equation:
The equation is of first degree: \( Y = \beta_0 + \beta_1 X + \varepsilon \).

[Figure 1 - Scatter diagram: consumption expenditure (Xd) against income (Yd), both expressed in billion $, with the trendline label f(x) = 1.11x, R² = 1]

                                     Coefficients  Standard Error  t Stat    P-value   Lower 95%  Upper 95%  Lower 95.0%  Upper 95.0%
Intercept                            491.8085      52.41985        9.382104  2.75E-10  384.5979   599.0191   384.5979     599.0191
Income (Yd), expressed in billion $  1.021649      0.010051        101.6458  1.41E-38  1.001093   1.042206   1.001093     1.042206
Table 1

From the table above we can extract the equation:

\[ \text{Real Consumption} = 491.8 + 1.02 \cdot \text{Income} + \varepsilon \]
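
For readers who want to retrace how such an intercept and slope are obtained, the following sketch applies the closed-form least-squares formulas to the income and consumption columns of the Appendix. It is not the Excel procedure used in the project; the function name `ols_simple` is our own, and the result should be close to Table 1 only if the Appendix values are exactly the data that were fed to Excel.

```python
def ols_simple(x, y):
    """Closed-form least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Income column of the Appendix (billion $), 31 yearly observations.
income = [
    2921.1, 2968.9, 3170.2, 3353.4, 3326.2, 3385.7, 3572.4, 3755.2,
    3833.8, 3932, 4025, 4050.8, 4077.6, 4224.8, 4589.9, 4878.3,
    5097.1, 5168.6, 5371.8, 5530.9, 5699.3, 5711.6, 5991.1, 6189.1,
    6260.1, 6442.9, 6740.1, 7056.1, 7437.5, 7934.1, 8211.3,
]
# Consumption expenditure column of the Appendix (billion $).
consumption = [
    3349.5, 3505.4, 3624.1, 3902, 3981.7, 3906.1, 4086, 4185.4,
    4384.1, 4555.5, 4618.8, 4664, 4757.1, 4872.3, 5271, 5578,
    5571, 5759.2, 5963.8, 6215.1, 6366.2, 6390.6, 6679.4, 6825.9,
    6908.4, 7190.3, 7394.6, 7751.8, 8067, 8381.7, 8800.6,
]

b0, b1 = ols_simple(income, consumption)
print(f"intercept = {b0:.4f}, slope = {b1:.6f}")
```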


Studying the intensity of the relationship between the two variables:

Regression Statistics
Multiple R 0.9986
R Square 0.997201
Adjusted R Square 0.997104
Standard Error 83.56681
Observations 31
Table 2

With a Multiple R of 0.9986, we can say that there is a strong relationship between
consumption and income. It is also a direct relationship: the scatter diagram slopes upward
and the estimated slope is positive.
The R Square, or coefficient of determination, represents the proportion of the variance
in the dependent variable that is predictable from the independent variable. It ranges from 0
to 1, and our value of 0.9972 indicates that about 99.7% of the variation in the consumption
level can be predicted from the level of income.
The Adjusted R Square is 0.9971, meaning that the factors not included in the model
account for less than 1% of the variation in the dependent variable. To conclude, the other
factors do not have a high degree of influence over the level of consumption.
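
The values in Table 2 can be recovered from the sums of squares reported in the ANOVA table (Table 3). The sketch below shows the arithmetic; the function name `fit_summary` and the dictionary keys are illustrative choices of ours.

```python
import math

def fit_summary(ss_regression, ss_residual, n_obs, n_regressors):
    """Recompute the Excel 'Regression Statistics' from the ANOVA sums of squares."""
    ss_total = ss_regression + ss_residual
    r2 = 1 - ss_residual / ss_total
    df_residual = n_obs - n_regressors - 1
    adj_r2 = 1 - (1 - r2) * (n_obs - 1) / df_residual
    return {
        "multiple_r": math.sqrt(r2),
        "r2": r2,
        "adj_r2": adj_r2,
        "std_error": math.sqrt(ss_residual / df_residual),
    }

# Simple regression model: SS values from Table 3, 31 observations, 1 regressor.
stats = fit_summary(ss_regression=72151672, ss_residual=202518.9,
                    n_obs=31, n_regressors=1)
for name, value in stats.items():
    print(f"{name}: {value:.6f}")
```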
Regression coefficient and intercept interpretation

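Studying the first table, we can extract the following results:

Intercept, equal to 491.8;
Slope, equal to 1.02.

Thus, the regression function is:

\[ \text{Real Consumption} = 491.8 + 1.02 \cdot \text{Income} + \varepsilon \]

which has the form \( Y = \beta_0 + \beta_1 \cdot \text{Income} + \varepsilon \).

The slope is positive, meaning that there is a positive relationship between the two
variables: if the level of income increases by 1 billion $, the level of consumption
increases, on average, by 1.02 billion $. The intercept of 491.8 billion $ is the level of
consumption that the model predicts for an income of zero.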

The validity of the model

In order to check the validity of the model, one must establish two hypotheses.
H0, the null hypothesis: the model is not valid, i.e. the slope is zero, \( \beta_1 = 0 \);
H1, the alternative hypothesis: the model is valid, i.e. \( \beta_1 \neq 0 \).

The decision between the two hypotheses is based on the F test from the ANOVA table.

ANOVA
            df  SS        MS        F         Significance F
Regression  1   72151672  72151672  10331.87  1.4096E-38
Residual    29  202518.9  6983.412
Total       30  72354191
Table 3

Using ANOVA, we can extract the SSresidual (residual sum of squares), which is equal
to 202518.9, the MSregression, which has the value 72151672, and the MSresidual, which has
the value 6983.412.
The Fisher test (MSregression / MSresidual) gives an F value of 10331.87.
Also, one can observe that the Significance F, which is the p-value of the F test, is
1.4096E-38, a value very close to 0. Therefore the probability of making an error by
rejecting H0 is very small. Since this probability is lower than the significance level
\( \alpha = 0.05 \), we can reject H0 in favor of H1 and conclude that the model is valid.
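
The F value can be recomputed directly from the ANOVA figures. In the sketch below, the critical value 4.18 is an assumption of ours: it is the tabulated value of F(0.05; 1, 29) from a standard F table and is not quoted in the project. The function name `f_statistic` is also illustrative.

```python
def f_statistic(ss_regression, df_regression, ss_residual, df_residual):
    """F = MSregression / MSresidual, each mean square being SS / df."""
    ms_regression = ss_regression / df_regression
    ms_residual = ss_residual / df_residual
    return ms_regression / ms_residual

# Assumption: tabulated critical value F(0.05; 1, 29), not taken from the project.
F_CRIT = 4.18

f_value = f_statistic(ss_regression=72151672, df_regression=1,
                      ss_residual=202518.9, df_residual=29)
model_is_valid = f_value > F_CRIT

print(f"F = {f_value:.2f}, model valid at the 5% level: {model_is_valid}")
```

Comparing F with a critical value leads to the same decision as comparing the Significance F with the significance level.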

Making the inference

By looking at the two limits of the 95% confidence interval for the slope, which are
1.001093 and 1.042206, one can conclude that the inference can be extended to the whole
population, because the interval does not contain the value zero. Also, the p-value of the
slope is lower than the significance level of 0.05, so the chance of being wrong when stating
that the slope is different from 0 is less than 5%.
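
The two limits can be retraced from the slope and its standard error in Table 1. The sketch below assumes the tabulated Student critical value t(0.975; 29) = 2.045, which does not appear in the project; the function name `confidence_interval` is illustrative.

```python
def confidence_interval(estimate, std_error, t_crit):
    """Two-sided confidence interval: estimate +/- t_crit * std_error."""
    margin = t_crit * std_error
    return estimate - margin, estimate + margin

# Assumption: tabulated t(0.975; 29), for 31 observations and 1 regressor.
T_CRIT_29 = 2.045

lower, upper = confidence_interval(estimate=1.021649, std_error=0.010051,
                                   t_crit=T_CRIT_29)
slope_is_significant = not (lower <= 0 <= upper)

print(f"95% CI for the slope: ({lower:.6f}, {upper:.6f})")
print(f"zero outside the interval: {slope_is_significant}")
```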

Interpreting the assumptions

We have three assumptions to check, in order to make sure that this model can safely be used
for forecasting:

The assumption of the normal distribution of residuals;
The assumption of the constant variance of residuals;
The assumption of the independence of residuals.

We study these assumptions with the help of the three graphs generated by Excel.

[Figure - Normal Probability Plot: consumption expenditure (Xd), expressed in billion $, against sample percentile]
[Figure - Income (Yd) Line Fit Plot: actual and predicted consumption expenditure (Xd) against income (Yd), both expressed in billion $]
[Figure - Income (Yd) Residual Plot: residuals against income (Yd), expressed in billion $]

The first graph indicates a left skewness, but the points lie close to the trendline.
The second graph shows that the points are, again, evenly spread around the mean,
which suggests that the model is homoskedastic, meaning that the errors have a constant
variance.
The third is used to assess the independence of residuals. We use the Durbin-Watson
statistic. Given that the result is 1.272344, which is below 2, there is a positive correlation
between the residuals, meaning that each error tends to be followed by an error of the same
sign. Compared with the limits in the Durbin-Watson table, the result falls outside the
interval that indicates independence, so the errors are autocorrelated.
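
For reference, the Durbin-Watson statistic is the sum of squared differences between consecutive residuals divided by the sum of squared residuals. The sketch below implements this definition; the residual series in it are invented for illustration only and are not the residuals of our model.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: values near 2 suggest no first-order autocorrelation."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Hypothetical residuals that drift slowly, as with positive autocorrelation.
slow_drift = [1.0, 0.8, 0.5, -0.2, -0.6, -0.9]
# Hypothetical residuals that flip sign every period (negative autocorrelation).
alternating = [1.0, -1.0, 1.0, -1.0]

print(f"slow drift:  DW = {durbin_watson(slow_drift):.3f}")
print(f"alternating: DW = {durbin_watson(alternating):.3f}")
```

A slowly drifting series gives a value below 2 and an alternating series gives a value above 2, which is the reading applied to our result of 1.272344.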
Multiple regression model

Identify the variables:

Dependent variable Y: Real Consumption Expenditure;
Independent variable X2: Real Personal Income;
Second independent variable X3: CPI.
Identify the equation:
The equation is again linear, now with two regressors: \( Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + \varepsilon \).

                                     Coefficients  Standard Error  t Stat    P-value   Lower 95%
Intercept                            682.271       57.36494        11.89352  1.84E-12  564.7642
Income (Yd), expressed in billion $  0.875825      0.032335        27.0862   1.25E-21  0.809591
CPI, expressed in $                  5.201491      1.120287        4.643001  7.36E-05  2.906688

From the table above we can extract the equation:

\[ \text{Real Consumption} = 682.271 + 0.88 \cdot \text{Income} + 5.20 \cdot \text{CPI} + \varepsilon \]
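
With more than one regressor, the coefficients come from solving the normal equations (X'X) b = X'y. The sketch below shows one way to do this with the standard library only; it is not the Excel procedure used in the project, and the names `solve_linear` and `ols_multiple` are our own. The small data set in it is made up, so that the exact answer is known in advance.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols_multiple(rows, y):
    """rows: one tuple of regressor values per observation.
    Returns [intercept, coefficient of regressor 1, coefficient of regressor 2, ...]."""
    design = [[1.0, *r] for r in rows]  # column of ones for the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear(xtx, xty)

# Made-up data generated from y = 1 + 2*x2 + 3*x3, so the exact answer is known.
demo_rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
demo_y = [1, 3, 4, 6, 8]
coefficients = ols_multiple(demo_rows, demo_y)
print([round(c, 6) for c in coefficients])
```

To apply it to this project, `rows` would hold the (income, CPI) pairs and `y` the consumption values from the Appendix.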

Studying the intensity of the relationship between the three variables:


Regression Statistics
Multiple R 0.999209
R Square 0.998419
Adjusted R Square 0.998306
Standard Error 63.92612
Observations 31

With a Multiple R of 0.9992, we can say that there is a strong relationship between
consumption, income and the CPI. It is also a direct relationship, because both estimated
slopes are positive.
The R Square, or coefficient of determination, represents the proportion of the variance
in the dependent variable that is predictable from the independent variables. It ranges from 0
to 1, and our value of 0.9984 indicates that about 99.8% of the variation in the consumption
level can be predicted from the level of income and the CPI.
The Adjusted R Square is 0.9983, meaning that the factors not included in the model
account for less than 1% of the variation in the dependent variable. To conclude, the other
factors do not have a high degree of influence over the level of consumption.
Regression coefficient and intercept interpretation

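Studying the coefficients table, we can extract the following results:

Intercept, equal to 682.271;
Slopes, which are equal to 0.88 and 5.20.
Thus, the regression function is:

\[ \text{Real Consumption} = 682.271 + 0.88 \cdot \text{Income} + 5.20 \cdot \text{CPI} + \varepsilon \]

which has the form \( Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + \varepsilon \).

The slopes are both positive, meaning that there is a positive relationship between
consumption and each of the two factors. In other words, holding the CPI constant, an increase
of 1 billion $ in income raises consumption by about 0.88 billion $ on average, and holding
income constant, an increase of one unit in the CPI raises consumption by about 5.20 billion $
on average.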
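
As an illustration of how the fitted equation is used, the sketch below evaluates it for the first observation in the Appendix (income 2921.1, CPI 38.1, consumption 3349.5) with the unrounded coefficients from the coefficients table. The function name `predict_consumption` is our own.

```python
def predict_consumption(income, cpi, b1=682.271, b2=0.875825, b3=5.201491):
    """Fitted value of the multiple regression model, in billion $."""
    return b1 + b2 * income + b3 * cpi

# First row of the Appendix.
first_prediction = predict_consumption(income=2921.1, cpi=38.1)
first_residual = 3349.5 - first_prediction

# Effect of one extra billion $ of income, holding the CPI fixed.
income_effect = predict_consumption(2922.1, 38.1) - first_prediction

print(f"predicted = {first_prediction:.2f}, residual = {first_residual:.2f}")
print(f"effect of +1 billion $ income = {income_effect:.6f}")
```

The last line reproduces the income slope, which is the "holding the other regressor constant" interpretation given above.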

The validity of the model

In order to check the validity of the model, one must establish two hypotheses.
H0, the null hypothesis: the model is not valid, i.e. both slopes are zero, \( \beta_2 = \beta_3 = 0 \);
H1, the alternative hypothesis: the model is valid, i.e. at least one slope is different from zero.

The decision between the two hypotheses is based on the F test from the ANOVA table.

            df  SS        MS        F         Significance F
Regression  2   72239767  36119884  8838.727  6.12E-40
Residual    28  114423.4  4086.548
Total       30  72354191

Using ANOVA, we can extract the SSregression (regression sum of squares), which is equal
to 72239767, and the SSresidual (residual sum of squares), which is equal to 114423.4; the
MSregression has the value 36119884 and the MSresidual has the value 4086.548.
The Fisher test (MSregression / MSresidual) gives an F value of 8838.727.
Also, one can observe that the Significance F, which is the p-value of the F test, is
6.12E-40, a value very close to 0. Therefore the probability of making an error by rejecting
H0 is very small. Since this probability is lower than the significance level
\( \alpha = 0.05 \), we can reject H0 in favor of H1 and conclude that the model is valid.

Making the inference


By looking at the two limits of the 95% confidence interval for the CPI slope, which are
2.906688 and 7.496294, one can conclude that the inference can be extended to the whole
population, because the interval does not contain the value zero. Also, the p-values of both
slopes are lower than the significance level of 0.05, so the chance of being wrong when
stating that the slopes are different from 0 is less than 5%.
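
The same conclusion can be read from the t statistics, each being a coefficient divided by its standard error. The sketch below recomputes them from the coefficients table; the critical value 2.048 is our assumption (the tabulated t(0.975; 28), not quoted in the project), and the dictionary names are illustrative.

```python
# (estimate, standard error) pairs copied from the coefficients table.
coefficient_table = {
    "Intercept": (682.271, 57.36494),
    "Income": (0.875825, 0.032335),
    "CPI": (5.201491, 1.120287),
}

# Assumption: tabulated t(0.975; 28), for 31 observations and 2 regressors.
T_CRIT_28 = 2.048

t_stats = {name: est / se for name, (est, se) in coefficient_table.items()}
significant = {name: abs(t) > T_CRIT_28 for name, t in t_stats.items()}

for name, t in t_stats.items():
    print(f"{name}: t = {t:.4f}, significant at 5%: {significant[name]}")
```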

Interpreting the assumptions

We have three assumptions to check, in order to make sure that this model can safely be used
for forecasting:

The assumption of the normal distribution of residuals;
The assumption of the constant variance of residuals;
The assumption of the independence of residuals.

We study these assumptions with the help of the three graphs generated by Excel.

[Figure - Normal Probability Plot: consumption expenditure (Xd), expressed in billion $, against sample percentile]
[Figure - CPI Residual Plot: residuals against CPI, expressed in $]
[Figure - CPI Line Fit Plot: actual and predicted consumption expenditure (Xd), expressed in billion $, against CPI, expressed in $]

The first graph indicates a right skewness, but the points lie close to the trendline.
The second graph shows that the points are, again, evenly spread around the mean,
which suggests that the model is homoskedastic, meaning that the errors have a constant
variance.
The third is used to assess the independence of residuals. We use the Durbin-Watson
statistic. The result is 2.01361, only marginally above 2, so there is practically no
correlation between consecutive residuals. Compared with the limits in the Durbin-Watson
table, the result lies in the interval around 2 that indicates independence, so the residuals
are independent.

Correlation between the regressors

                                     Income (Yd), expressed in billion $  CPI, expressed in $
Income (Yd), expressed in billion $  1
CPI, expressed in $                  0.971317                             1

The correlation between these two regressors is 0.97, which means that they have a strong
relationship and indicates collinearity between them. However, the overall fit of the model is
not affected by this aspect.
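
One common way to express the strength of this collinearity is the variance inflation factor (VIF). With exactly two regressors it depends only on their correlation, VIF = 1 / (1 - r²). The sketch below is an addition of ours and not part of the original analysis; the threshold of 10 is a widely used convention, not a value taken from the project.

```python
def variance_inflation_factor(r):
    """VIF for a model with exactly two regressors whose correlation is r."""
    return 1.0 / (1.0 - r ** 2)

# Correlation between income and CPI from the table above.
vif = variance_inflation_factor(0.971317)

# Assumption: conventional rule of thumb, VIF above 10 signals strong collinearity.
RULE_OF_THUMB = 10
print(f"VIF = {vif:.2f}, above rule of thumb: {vif > RULE_OF_THUMB}")
```

A high VIF mainly widens the standard errors of the individual slopes; it does not reduce the R Square or the F value of the model as a whole.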

Conclusion

With the help of both the simple and the multiple regression models, we can say that there
is a positive relationship between the three variables presented previously. We discovered
that, over the years, the level of income increases and so does the level of consumption,
while factors outside the models account for less than 1% of its variation.

Appendix
Income (Yd), expressed in billion $  CPI, expressed in $  Consumption expenditure (Xd), expressed in billion $
2921.1                               38.100               3349.5
2968.9                               39.900               3505.4
3170.2                               41.700               3624.1
3353.4                               41.9                 3902
3326.2                               43.700               3981.7
3385.7                               52.600               3906.1
3572.4                               55.800               4086
3755.2                               58.700               4185.4
3833.8                               62.700               4384.1
3932                                 69.200               4555.5
4025                                 83.200               4618.8
4050.8                               89.700               4664
4077.6                               94.700               4757.1
4224.8                               99.400               4872.3
4589.9                               104.100              5271
4878.3                               106.300              5578
5097.1                               109.400              5571
5168.6                               114.700              5759.2
5371.8                               116.500              5963.8
5530.9                               121.600              6215.1
5699.3                               128.600              6366.2
5711.6                               136.000              6390.6
5991.1                               139.400              6679.4
6189.1                               143.100              6825.9
6260.1                               147.100              6908.4
6442.9                               151.800              7190.3
6740.1                               155.500              7394.6
7056.1                               159.900              7751.8
7437.5                               162.800              8067
7934.1                               166.000              8381.7
8211.3                               173.600              8800.6
