Regression
KRISHNA PRASAD
------------------------------------------------------------------------
REGRESSION
Definition:
Regression is the measure of the average relationship
between two or more variables in terms of the original units of the
data.
Types of Regression:
The regression analysis can be classified into:
a) Simple and Multiple
b) Linear and Non-Linear
c) Total and Partial
Linear Regression Equation:
If two variables have a linear relationship, then as the
independent variable (X) changes, the dependent variable (Y) also
changes. If the different values of X and Y are plotted, then two
straight lines of best fit can be made to pass through the plotted
points. These two lines are known as regression lines. Again, these
a, b are constants.
Dependent Variable
The variable whose value is to be predicted for a given
independent variable(s) is called the dependent variable, denoted by
Y. For example, if advertising (X) and sales (Y) are correlated, we
could estimate the expected sales (Y) for a given advertising
expenditure (X). So in this case Y is the dependent variable.
Independent Variable
The variable which is used for prediction is called an
independent variable. For example, it is possible to estimate the
to the value of one variable for any specific value of the other
variable.
Thus the line of Regression is the line of best fit and is
regression equation.
Two Regression Lines
Similarly, when Y is treated as the independent variable and
X as the dependent variable, the line of regression of X on Y is given by
$(X - \bar{X}) = b_{xy}(Y - \bar{Y})$
where $b_{xy} = r\frac{\sigma_x}{\sigma_y} = \frac{\Sigma xy}{\Sigma y^2}$, here $x = X - \bar{X}$; $y = Y - \bar{Y}$
Note
The two regression equations are not reversible or
interchangeable because of the simple reason that the basis and
Y : 40 38 43 45 37 43
Solution :
X     Y     $x = X - \bar{X}$   $y = Y - \bar{Y}$   $x^2$   $y^2$   $xy$
10    40    -3    -1     9     1      3
12    38    -1    -3     1     9      3
13    43     0     2     0     4      0
12    45    -1     4     1    16     -4
16    37     3    -4     9    16    -12
15    43     2     2     4     4      4
78   246     0     0    24    50     -6   (totals)
$\bar{X} = \frac{\Sigma X}{n} = \frac{78}{6} = 13$, $\bar{Y} = \frac{\Sigma Y}{n} = \frac{246}{6} = 41$
$b_{xy} = \frac{\Sigma xy}{\Sigma y^2} = \frac{-6}{50} = -0.12$
∴ Regression equation of X on Y is $(X - \bar{X}) = b_{xy}(Y - \bar{Y})$
$X - 13 = -0.12(Y - 41) \Rightarrow X = 17.92 - 0.12Y$
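REGRESSION COEFFICIENTS:
The regression equation of Y on X is
$Y_e = \bar{Y} + r\frac{\sigma_y}{\sigma_x}(X - \bar{X})$
Here, the regression coefficient of Y on X is
$b_1 = b_{yx} = r\frac{\sigma_y}{\sigma_x}$
$Y_e = \bar{Y} + b_1(X - \bar{X})$
The regression equation of X on Y is
$X_e = \bar{X} + r\frac{\sigma_x}{\sigma_y}(Y - \bar{Y})$
$X_e = \bar{X} + b_2(Y - \bar{Y})$
$b_1 = b_{yx} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} = \frac{\sum xy}{\sum x^2}$ and
$b_2 = b_{xy} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (Y - \bar{Y})^2} = \frac{\sum xy}{\sum y^2}$
where $x = X - \bar{X}$, $y = Y - \bar{Y}$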
If the deviations are taken from any arbitrary values of X and Y
(short-cut method):
$b_1 = b_{yx} = \frac{n\sum uv - \sum u \sum v}{n\sum u^2 - (\sum u)^2}$
$b_2 = b_{xy} = \frac{n\sum uv - \sum u \sum v}{n\sum v^2 - (\sum v)^2}$
where $u = X - A$, $v = Y - B$
A = any value in X
B = any value in Y
Properties of Regression Coefficients:
1. Both regression coefficients must have the same sign, i.e., either
both are positive or both are negative.
3. The correlation coefficient will have the same sign as that of the
regression coefficients.
4. If one regression coefficient is greater than unity, then the other
regression coefficient must be less than unity.
become perpendicular to each other.
8. If r = +1, the two lines of regression either coincide or are
parallel to each other.
9. The angle between the two regression lines is
$\theta = \tan^{-1}\left(\frac{m_1 - m_2}{1 + m_1 m_2}\right)$
where $m_1$ and $m_2$ are the slopes of the regression lines X on Y
and Y on X respectively.
10. The angle between the regression lines indicates the degree of
dependence between the variables.
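Property 9 can be checked numerically. The sketch below is illustrative only: the function name `angle_between_regression_lines` is hypothetical, the slope of the X-on-Y line is taken as 1/bxy (both lines drawn with X on the horizontal axis), and the absolute value, which reports the acute angle, is an added assumption that the property itself does not state. It also assumes bxy is non-zero.

```python
import math


def angle_between_regression_lines(b_yx, b_xy):
    """Acute angle (in degrees) between the two regression lines.

    Assumes both lines are drawn with X on the horizontal axis and
    that b_xy is non-zero (hypothetical helper, not from the notes).
    """
    m1 = 1.0 / b_xy  # slope of the X-on-Y line: Y - Ybar = (1/b_xy)(X - Xbar)
    m2 = b_yx        # slope of the Y-on-X line
    return math.degrees(math.atan(abs((m1 - m2) / (1.0 + m1 * m2))))


# b_yx * b_xy = 1 means r = +/-1, so the slopes are equal and the lines coincide.
print(angle_between_regression_lines(2.0, 0.5))
# A weaker relationship gives a wider angle between the lines.
print(angle_between_regression_lines(0.9, 0.9))
```

When the product of the coefficients equals 1 the angle is zero, which agrees with property 8; as the product falls towards zero the angle widens.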
Example 2:
If the two regression coefficients are $b_1 = \frac{4}{5}$ and $b_2 = \frac{9}{20}$, what would be
the value of r?
Solution:
The correlation coefficient, $r = \pm\sqrt{b_1 b_2}$
$= \sqrt{\frac{4}{5} \times \frac{9}{20}}$
$= \sqrt{\frac{36}{100}} = \frac{6}{10} = 0.6$
Example 3:
Given $b_1 = \frac{15}{8}$ and $b_2 = \frac{3}{5}$, find r.
Solution:
$r = \pm\sqrt{b_1 b_2}$
$= \sqrt{\frac{15}{8} \times \frac{3}{5}}$
$= \sqrt{\frac{9}{8}} = 1.06$
This is not possible, since r cannot be greater than one. So the given
values are wrong.
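The reasoning of Examples 2 and 3 can be sketched in a few lines of Python. This is a minimal illustration, not part of the original notes: the function name `correlation_from_coefficients` is hypothetical, and the two checks simply restate the properties that the coefficients share a sign and that r cannot exceed one.

```python
import math


def correlation_from_coefficients(b_yx, b_xy):
    """Return r = +/- sqrt(b_yx * b_xy), carrying the sign of the coefficients.

    Hypothetical helper: raises ValueError when the pair cannot be
    a valid pair of regression coefficients.
    """
    if b_yx * b_xy < 0:
        raise ValueError("regression coefficients must have the same sign")
    r = math.sqrt(b_yx * b_xy)
    if r > 1:
        raise ValueError("sqrt(b_yx * b_xy) exceeds 1, so the values are inconsistent")
    return -r if b_yx < 0 else r


# Example 2: b1 = 4/5, b2 = 9/20
r_example_2 = correlation_from_coefficients(4 / 5, 9 / 20)
print(round(r_example_2, 2))

# Example 3: b1 = 15/8, b2 = 3/5 is rejected because sqrt(9/8) > 1
try:
    correlation_from_coefficients(15 / 8, 3 / 5)
    example_3_valid = True
except ValueError:
    example_3_valid = False
print(example_3_valid)
```

The sign rule is applied last so that a pair of negative coefficients yields a negative r, in line with property 3.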
The regression equation of Y on X is
$Y_e = \bar{Y} + r\frac{\sigma_y}{\sigma_x}(X - \bar{X})$   (1)
(or)
$Y_e = \bar{Y} + b_1(X - \bar{X})$
The regression equation of X on Y is
$X_e = \bar{X} + r\frac{\sigma_x}{\sigma_y}(Y - \bar{Y})$   (2)
(or)
$X_e = \bar{X} + b_2(Y - \bar{Y})$
These two regression equations represent two entirely
different lines. In other words, equation (1) is a function of X,
which can be written as $Y_e = F(X)$, and equation (2) is a function of
Y, which can be written as $X_e = F(Y)$.
The variables X and Y are not interchangeable. This is mainly
due to the fact that in equation (1) Y is the dependent variable, X is
Solution:
X     Y     $x = X - \bar{X}$   $y = Y - \bar{Y}$   $x^2$   $y^2$   $xy$
1     2     -2    -2     4     4     4
2     3     -1    -1     1     1     1
3     5      0     1     0     1     0
4     4      1     0     1     0     0
5     6      2     2     4     4     4
15    20     0     0    10    10     9   (totals)
$\bar{X} = \frac{\sum X}{n} = \frac{15}{5} = 3$
$\bar{Y} = \frac{\sum Y}{n} = \frac{20}{5} = 4$
Regression coefficient of Y on X
$b_{yx} = \frac{\sum xy}{\sum x^2} = \frac{9}{10} = 0.9$
Hence the regression equation of Y on X is
$Y = \bar{Y} + b_{yx}(X - \bar{X})$
= 4 + 0.9 (X – 3)
= 4 + 0.9X – 2.7
= 1.3 + 0.9X
When X = 2.5,
Y = 1.3 + 0.9 × 2.5
= 3.55
Regression coefficient of X on Y
$b_{xy} = \frac{\sum xy}{\sum y^2} = \frac{9}{10} = 0.9$
Hence the regression equation of X on Y is
$X = \bar{X} + b_{xy}(Y - \bar{Y})$
= 3 + 0.9 (Y – 4)
= 3 + 0.9Y – 3.6
= 0.9Y – 0.6
Short-cut method
Example 5:
Obtain the equations of the two lines of regression for the data
given below:
X 46 42 44 43 41 45 43 40
Y 40 38 36 35 38 39 37 41
Taking A = 43 and B = 38:
X     Y     $u = X - A$   $u^2$   $v = Y - B$   $v^2$   $uv$
46    40     3     9     2     4     6
42    38    -1     1     0     0     0
44    36     1     1    -2     4    -2
43    35     0     0    -3     9     0
41    38    -2     4     0     0     0
45    39     2     4     1     1     2
43    37     0     0    -1     1     0
40    41    -3     9     3     9    -9
             0    28     0    28    -3   (totals)
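$\bar{X} = A + \frac{\sum u}{n} = 43 + \frac{0}{8} = 43$
$\bar{Y} = B + \frac{\sum v}{n} = 38 + \frac{0}{8} = 38$
$b_1 = b_{yx} = \frac{n\sum uv - \sum u \sum v}{n\sum u^2 - (\sum u)^2} = \frac{8(-3) - (0)(0)}{8(28) - (0)^2} = \frac{-24}{224} = -0.11$
$b_2 = b_{xy} = \frac{n\sum uv - \sum u \sum v}{n\sum v^2 - (\sum v)^2} = \frac{8(-3) - (0)(0)}{8(28) - (0)^2} = \frac{-24}{224} = -0.11$
The regression equation of Y on X is
$Y_e = \bar{Y} + b_1(X - \bar{X})$
= 38 – 0.11 (X – 43)
= 38 – 0.11X + 4.73
= 42.73 – 0.11X
The regression equation of X on Y is
$X_e = \bar{X} + b_2(Y - \bar{Y})$
= 43 – 0.11 (Y – 38)
= 43 – 0.11Y + 4.18
= 47.18 – 0.11Y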
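The short-cut formulas lend themselves to a quick numerical check. The sketch below is a hedged illustration: `shortcut_coefficients` is a hypothetical name, and the X values are taken as they appear in the solution table (first value 46). It also shows that a different choice of A and B gives the same means and coefficients.

```python
def shortcut_coefficients(xs, ys, a, b):
    """Short-cut method: deviations u = X - A and v = Y - B from assumed values."""
    n = len(xs)
    u = [x - a for x in xs]
    v = [y - b for y in ys]
    sum_u, sum_v = sum(u), sum(v)
    sum_uv = sum(ui * vi for ui, vi in zip(u, v))
    sum_u2 = sum(ui * ui for ui in u)
    sum_v2 = sum(vi * vi for vi in v)
    numerator = n * sum_uv - sum_u * sum_v
    b_yx = numerator / (n * sum_u2 - sum_u ** 2)
    b_xy = numerator / (n * sum_v2 - sum_v ** 2)
    x_bar = a + sum_u / n
    y_bar = b + sum_v / n
    return x_bar, y_bar, b_yx, b_xy


X = [46, 42, 44, 43, 41, 45, 43, 40]
Y = [40, 38, 36, 35, 38, 39, 37, 41]

result = shortcut_coefficients(X, Y, a=43, b=38)   # A and B as in Example 5
other = shortcut_coefficients(X, Y, a=40, b=35)    # a different arbitrary origin
print(result)
print(other)
```

The unrounded coefficient is -24/224, which the worked solution rounds to -0.11.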
Example 6:
In a correlation study, the following values are obtained:
        X      Y
Mean    65     67
S.D     2.5    3.5
Find the two regression equations that are associated with the
above values.
Solution:
Given,
$\bar{X} = 65$, $\bar{Y} = 67$, $\sigma_x = 2.5$, $\sigma_y = 3.5$, $r = 0.8$
The regression coefficient of Y on X is
$b_{yx} = b_1 = r\frac{\sigma_y}{\sigma_x} = 0.8 \times \frac{3.5}{2.5} = 1.12$
The regression coefficient of X on Y is
$b_{xy} = b_2 = r\frac{\sigma_x}{\sigma_y} = 0.8 \times \frac{2.5}{3.5} = 0.57$
Hence, the regression equation of Y on X is
$Y_e = \bar{Y} + b_1(X - \bar{X})$
= 67 + 1.12 (X – 65)
= 67 + 1.12X – 72.8
= 1.12X – 5.8
The regression equation of X on Y is
$X_e = \bar{X} + b_2(Y - \bar{Y})$
= 65 + 0.57 (Y – 67)
= 65 + 0.57Y – 38.19
= 26.81 + 0.57Y
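When only the means, standard deviations and r are known, as in Example 6, both lines follow directly from the coefficient formulas. The sketch below is illustrative; `lines_from_summary` is a hypothetical name and each line is returned as an (intercept, slope) pair.

```python
def lines_from_summary(x_bar, y_bar, sd_x, sd_y, r):
    """Return ((a_y, b_yx), (a_x, b_xy)) for Y = a_y + b_yx*X and X = a_x + b_xy*Y."""
    b_yx = r * sd_y / sd_x
    b_xy = r * sd_x / sd_y
    y_on_x = (y_bar - b_yx * x_bar, b_yx)
    x_on_y = (x_bar - b_xy * y_bar, b_xy)
    return y_on_x, x_on_y


# Values given in Example 6
y_on_x, x_on_y = lines_from_summary(x_bar=65, y_bar=67, sd_x=2.5, sd_y=3.5, r=0.8)
print(y_on_x)
print(x_on_y)
```

Because the code keeps the unrounded value of bxy (about 0.5714), its X-on-Y intercept differs slightly from the 26.81 obtained in the worked solution, which rounds bxy to 0.57 before substituting.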
Note:
Suppose we are given two regression equations and we are
not told which is the regression equation of Y on X and which is X
on Y. To identify them, always assume that the first equation is Y on X
Example 7:
Given 8X – 10Y + 66 = 0 and 40X – 18Y = 214. Find the
correlation coefficient, r.
Solution:
Taking the first equation, 8X – 10Y + 66 = 0, as the regression equation of Y on X:
–10Y = –66 – 8X
10Y = 66 + 8X
$Y = \frac{66}{10} + \frac{8}{10}X$
Now the coefficient attached to X is $b_{yx}$,
i.e., $b_{yx} = \frac{8}{10} = \frac{4}{5}$
Taking the second equation, 40X – 18Y = 214, as the regression equation of X on Y,
keep X on the left side and write the other terms on the right side,
i.e., 40X = 214 + 18Y
i.e., $X = \frac{214}{40} + \frac{18}{40}Y$
Now, the coefficient attached to Y is $b_{xy}$,
i.e., $b_{xy} = \frac{18}{40} = \frac{9}{20}$
Here $b_{yx}$ and $b_{xy}$ satisfy the properties of regression
coefficients, so our assumption is correct.
Correlation coefficient, $r = \sqrt{b_{yx} b_{xy}}$
$= \sqrt{\frac{4}{5} \times \frac{9}{20}}$
$= \sqrt{\frac{36}{100}}$
$= \frac{6}{10}$
= 0.6
Example 8:
Regression equations of two correlated variables X and Y
coefficient.
Solution:
Let 5X – 6Y + 90 = 0 represent the regression equation of X
on Y and the other equation that of Y on X.
Now $X = \frac{6}{5}Y - \frac{90}{5}$
$b_{xy} = b_2 = \frac{6}{5}$
For 15X – 8Y – 130 = 0,
$Y = \frac{15}{8}X - \frac{130}{8}$
$b_{yx} = b_1 = \frac{15}{8}$
$r = \pm\sqrt{b_1 b_2}$
$= \sqrt{\frac{15}{8} \times \frac{6}{5}}$
$= \sqrt{2.25}$
= 1.5 > 1
This is not possible, so our assumption is wrong. So let us take the
first equation as Y on X and the second equation as X on Y.
From the equation 5X – 6Y + 90 = 0,
$Y = \frac{5}{6}X + \frac{90}{6}$
$b_{yx} = \frac{5}{6}$
From the equation 15X – 8Y – 130 = 0,
$X = \frac{8}{15}Y + \frac{130}{15}$
$b_{xy} = \frac{8}{15}$
Correlation coefficient, $r = \pm\sqrt{b_1 b_2}$
$= \sqrt{\frac{5}{6} \times \frac{8}{15}}$
$= \sqrt{\frac{40}{90}}$
$= \frac{2}{3} = 0.67$
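The trial-and-error identification used in Examples 7 and 8 can be sketched as follows. This is an illustration under stated assumptions: the function name `identify_regression_lines` is hypothetical, each line is passed as a tuple (a, b, c) standing for aX + bY + c = 0, all X and Y coefficients are assumed non-zero, and exact fractions are used to avoid rounding.

```python
import math
from fractions import Fraction


def identify_regression_lines(line1, line2):
    """Try both assignments and keep the one with 0 <= b_yx * b_xy <= 1.

    Returns (b_yx, b_xy, r). Each line is (a, b, c) for aX + bY + c = 0.
    """
    for y_on_x, x_on_y in ((line1, line2), (line2, line1)):
        b_yx = Fraction(-y_on_x[0], y_on_x[1])  # solve the line for Y
        b_xy = Fraction(-x_on_y[1], x_on_y[0])  # solve the line for X
        product = b_yx * b_xy
        if 0 <= product <= 1:
            r = math.sqrt(product)
            return b_yx, b_xy, (-r if b_yx < 0 else r)
    raise ValueError("neither assignment gives valid regression coefficients")


# Example 8: 5X - 6Y + 90 = 0 and 15X - 8Y - 130 = 0
b_yx_8, b_xy_8, r_8 = identify_regression_lines((5, -6, 90), (15, -8, -130))
print(b_yx_8, b_xy_8, round(r_8, 2))

# Example 7: 8X - 10Y + 66 = 0 and 40X - 18Y - 214 = 0
b_yx_7, b_xy_7, r_7 = identify_regression_lines((8, -10, 66), (40, -18, -214))
print(b_yx_7, b_xy_7, round(r_7, 2))
```

The two candidate products are reciprocals of each other, so at most one assignment can pass the check unless the product is exactly 1.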
Example 9:
The lines of regression of Y on X and X on Y are
respectively Y = X + 5 and 16X = 9Y – 94. Find the variance of X
if the variance of Y is 16. Also find the covariance of X and Y.
Solution:
From the regression line of Y on X,
Y = X + 5
we get $b_{yx} = 1$.
From the regression line of X on Y,
16X = 9Y – 94
$X = \frac{9}{16}Y - \frac{94}{16}$,
we get $b_{xy} = \frac{9}{16}$.
$r = \pm\sqrt{b_1 b_2} = \sqrt{1 \times \frac{9}{16}} = \frac{3}{4}$
Again, $b_{yx} = r\frac{\sigma_y}{\sigma_x}$
i.e., $1 = \frac{3}{4} \times \frac{4}{\sigma_x} \Rightarrow \sigma_x = 3$
Variance of X $= \sigma_x^2 = 9$
Again, $b_{yx} = \frac{\mathrm{cov}(x, y)}{\sigma_x^2}$
i.e., $1 = \frac{\mathrm{cov}(x, y)}{9}$, or cov(x, y) = 9.
Example 10:
Marks in Economics  : 25 28 35 32 31 36 29 38 34 32
Marks in Statistics : 43 46 49 41 36 32 31 30 33 39
Find (i) the regression equation of Y on X
(ii) estimate the marks in Statistics when the marks
in Economics is 30.
Solution :
Let the marks in Economics be denoted by X and Statistics
by Y.
X     Y     $x = X - \bar{X}$   $y = Y - \bar{Y}$   $x^2$   $y^2$   $xy$
25    43    -7     5    49    25    -35
28    46    -4     8    16    64    -32
35    49     3    11     9   121     33
32    41     0     3     0     9      0
31    36    -1    -2     1     4      2
36    32     4    -6    16    36    -24
29    31    -3    -7     9    49     21
38    30     6    -8    36    64    -48
34    33     2    -5     4    25    -10
32    39     0     1     0     1      0
320   380    0     0   140   398    -93   (totals)
$\bar{X} = \frac{\Sigma X}{n} = \frac{320}{10} = 32$, $\bar{Y} = \frac{\Sigma Y}{n} = \frac{380}{10} = 38$
$b_{yx} = \frac{\Sigma xy}{\Sigma x^2} = \frac{-93}{140} = -0.664$
(i) Regression equation of Y on X is
$Y - \bar{Y} = b_{yx}(X - \bar{X})$
Y − 38 = −0.664 (X − 32)
⇒ Y = 59.25 − 0.664X
(ii) To estimate the marks in Statistics (Y) for given marks in
Economics (X), put X = 30 in the above equation. We get
Y = 59.25 − 0.664(30)
= 59.25 − 19.92 = 39.33 or 39
Example 11:
Is it possible for two regression lines to be as follows:
Y = -1.5X + 7, X = 0.6Y + 9? Give reasons.
Solution:
The regression coefficient of Y on X is $b_1 = b_{yx} = -1.5$
The regression coefficient of X on Y is $b_2 = b_{xy} = 0.6$
The two regression coefficients have different signs, which is a
contradiction. So the given equations cannot be regression lines.
Example 12:
In the estimation of regression equations of two variables X
and Y the following results were obtained.
Solution:
Here, x, y are the deviations from the arithmetic mean.
$b_1 = b_{yx} = \frac{\Sigma xy}{\Sigma x^2} = \frac{3900}{6360} = 0.61$
$b_2 = b_{xy} = \frac{\Sigma xy}{\Sigma y^2} = \frac{3900}{2860} = 1.36$
Regression equation of Y on X is
$Y_e = \bar{Y} + b_1(X - \bar{X})$
= 70 + 0.61 (X – 90)
= 70 + 0.61X – 54.90
= 15.1 + 0.61X
Regression equation of X on Y is
$X_e = \bar{X} + b_2(Y - \bar{Y})$
= 90 + 1.36 (Y – 70)
= 90 + 1.36Y – 95.2
= 1.36Y – 5.2
$r^2 = 1 - \frac{S_{xy}^2}{\sigma_x^2}$ and $r^2 = 1 - \frac{S_{yx}^2}{\sigma_y^2}$
$S_{xy} = \sigma_x\sqrt{1 - r^2}$ and $S_{yx} = \sigma_y\sqrt{1 - r^2}$
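following data :
$n = 10$, $\Sigma y^2 = 90$, $\Sigma xy = 120$, $\Sigma x^2 = 200$   ...(*)
where $x = X - \bar{X}$, $y = Y - \bar{Y}$.
[Delhi Univ. B.A. (Econ. Hons.), 1991]
Solution. The standard error of the estimate of Y from the regression of Y on X is given by :
$S_{yx} = \sigma_y(1 - r^2)^{1/2}$; $r = r(X, Y)$   ...(**)
$\sigma_y^2 = \frac{1}{n}\Sigma(Y - \bar{Y})^2 = \frac{1}{n}\Sigma y^2 = \frac{90}{10} = 9$ [From (*)] $\Rightarrow \sigma_y = 3$
$r = \frac{\Sigma(X - \bar{X})(Y - \bar{Y})}{\sqrt{\Sigma(X - \bar{X})^2 \, \Sigma(Y - \bar{Y})^2}} = \frac{\Sigma xy}{\sqrt{\Sigma x^2 \cdot \Sigma y^2}} = \frac{120}{\sqrt{90 \times 200}} = \frac{120}{\sqrt{5 \times 3600}} = \frac{120}{60\sqrt{5}} = \frac{2}{\sqrt{5}}$ [Using (*)]
Substituting in (**), we get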
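The substitution into (**) can be carried out numerically. The sketch below is only an illustration of the formula $S_{yx} = \sigma_y\sqrt{1 - r^2}$ with the sums given in (*); the function name `standard_error_of_estimate` is hypothetical.

```python
import math


def standard_error_of_estimate(n, sum_x2, sum_y2, sum_xy):
    """Standard error of the estimate of Y on X from deviation sums.

    sum_x2, sum_y2 and sum_xy are sums of squares and products of
    deviations taken from the means.
    """
    sigma_y = math.sqrt(sum_y2 / n)
    r = sum_xy / math.sqrt(sum_x2 * sum_y2)
    return sigma_y * math.sqrt(1 - r * r)


# Sums from (*): n = 10, sum y^2 = 90, sum xy = 120, sum x^2 = 200
s_yx = standard_error_of_estimate(n=10, sum_x2=200, sum_y2=90, sum_xy=120)
print(round(s_yx, 2))
```

With $\sigma_y = 3$ and $r^2 = 4/5$ this evaluates to $3/\sqrt{5}$, roughly 1.34.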
(i) the coefficient of determination and (ii) standard error of the estimate of Y on X.
$\Sigma(y - \bar{y})^2 = 60 \Rightarrow \sigma_y^2 = \frac{1}{n}\Sigma(y - \bar{y})^2 = \frac{60}{9} = \frac{20}{3}$
(i) $r^2 = \frac{\text{Explained Variation}}{\text{Total Variation}} = \frac{24}{60} = 0.4$
(ii) The standard error (S.E.) of the estimate of Y on X, denoted by $S_y$, is given by :
$S_y^2 = \sigma_y^2(1 - r^2) = \frac{20}{3} \times 0.6 = 4 \Rightarrow S_y = 2$
Example 9.25. In a partially destroyed record, for the estimation of the two lines of regression from a
bivariate data (X, Y), the following results were available :
Coefficient of regression of Y on X = -1.6
Coefficient of regression of X on Y = 0.4
Standard error of the estimate of Y on X = 3
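the Karl Pearson's coefficient of correlation.
$\Sigma X = 250$; $\Sigma Y = 300$; $\Sigma XY = 7{,}900$; $\Sigma X^2 = 6{,}500$; $\Sigma Y^2 = 10{,}000$; and N = 10.
Solution. We have :
$\bar{X} = \frac{\Sigma X}{N} = \frac{250}{10} = 25$, $\bar{Y} = \frac{\Sigma Y}{N} = \frac{300}{10} = 30$
$b_{YX}$ = Coefficient of regression of Y on X $= \frac{N\Sigma XY - (\Sigma X)(\Sigma Y)}{N\Sigma X^2 - (\Sigma X)^2}$
$= \frac{10 \times 7900 - 250 \times 300}{10 \times 6500 - (250)^2} = \frac{79000 - 75000}{65000 - 62500} = \frac{4000}{2500} = 1.6$
Regression equation of Y on X:
$Y - 30 = 1.6(X - 25) \Rightarrow Y = 1.6X - 40 + 30 \Rightarrow Y = 1.6X - 10$
Regression equation of X on Y:
$X - \bar{X} = b_{XY}(Y - \bar{Y}) \Rightarrow X - 25 = 0.4(Y - 30) \Rightarrow X = 0.4Y - 12 + 25 \Rightarrow X = 0.4Y + 13$
Example 9.12. In the estimation of regression equations of two variables X and Y the following results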
1. Regression analysis helps in establishing a functional
relationship between two or more variables.
2. Since most of the problems of economic analysis are based on
cause and effect relationships, regression analysis is a highly
valuable tool in economic and business research.
3. Regression analysis predicts the values of dependent variables
from the values of independent variables.
4. We can calculate the coefficient of correlation (r) and the coefficient of
determination ($r^2$) with the help of regression coefficients.
5. In statistical analysis of demand curves, supply curves,
production functions, cost functions, consumption functions etc.,
regression analysis is widely used.
9.8 Difference between Correlation and Regression:
S.No | Correlation | Regression
1. | Correlation is the relationship between two or more variables, | Regression means going back and it is a
fixed variable. Sometimes both the variables may be random variables.
verifying the relation between two variables and gives limited information. | is used for the prediction of one value, in relationship to the other given value.
5. | The coefficient of correlation is a relative measure. The range of relationship lies between –1 and +1. | Regression coefficient is an absolute figure. If we know the value of the independent variable, we can find the value of the dependent variable.
6. | There may be spurious correlation between two variables. | In regression there is no such spurious regression.
7. | It has limited application, because it is confined only to | It has wider application, as it
treatment.
9. | If the coefficient of correlation is positive, then the two variables are positively correlated and | The regression coefficient explains that the decrease in one
Exercise
I. Choose the correct answer:
1. When the correlation coefficient r = +1, then the two regression
lines
a) are perpendicular to each other b) coincide
c) are parallel to each other d) none of these
other must be
a) greater than unity b) equal to unity
c) less than unity d) none of these
3. Regression equation is also named as
a) prediction equation b) estimating equation
c) line of average relationship d) all the above
4. The lines of regression intersect at the point
a) (X, Y) b) ($\bar{X}$, $\bar{Y}$) c) (0, 0) d) (1, 1)
5. If r = 0, the lines of regression
a) coincide b) are perpendicular to each other
c) are parallel to each other d) none of the above
6. Regression coefficient is independent of
a) origin b) scale c) both origin and scale
d) neither origin nor scale.
7. The geometric mean of the two regression coefficients $b_{yx}$
and $b_{xy}$ is equal to
a) r b) $r^2$ c) 1 d) $\sqrt{r}$
8. Given the two lines of regression as 3X – 4Y + 8 = 0 and
4X – 3Y = 1, the means of X and Y are
a) $\bar{X}$ = 4, $\bar{Y}$ = 5 b) $\bar{X}$ = 3, $\bar{Y}$ = 4
c) $\bar{X}$ = 2, $\bar{Y}$ = 2 d) $\bar{X}$ = 4/3, $\bar{Y}$ = 5/3
9. If the two lines of regression are
X + 2Y – 5 = 0 and
2X + 3Y – 8 = 0, the means of X and Y are
a) $\bar{X}$ = -3, $\bar{Y}$ = 4 b) $\bar{X}$ = 2, $\bar{Y}$ = 4
c) $\bar{X}$ = 1, $\bar{Y}$ = 2 d) $\bar{X}$ = -1, $\bar{Y}$ = 2
be the degree of correlation.
15. When one regression coefficient is positive, the other would
also be _____.
16. The sign of regression coefficient is ____ as that of correlation
coefficient.
III. Answer the following:
17. Define regression and write down the two regression
equations.
18. Describe different types of regression.
19. Explain the principle of least squares.
20. Explain (i) graphic method, (ii) algebraic method.
21. What are regression coefficients?
22. State the properties of regression coefficients.
23. Why are there two regression equations?
24. What are the uses of regression analysis?
25. Distinguish between correlation and regression.
26. What do you mean by regression line of Y on X and
regression line of X on Y?
27. From the following data, find the regression equation
Y 11 30 25 44 38 25 20 27
29. Find the two regression equations from the following data.
X 25 22 28 26 35 20 22 40 20 18
Y 18 15 20 17 22 14 16 21 15 14
30. Find S.D (Y), given that variance of X = 36, $b_{xy}$ = 0.8,
r = 0.5
        X     Y
Mean    12    15
S.D.    2     3
r = 0.5. Find the two regression equations.
33. The correlation coefficient of bivariate X and Y is r = 0.6,
variances of X and Y are respectively 2.25 and 4.00, $\bar{X}$ = 10,
$\bar{Y}$ = 20. From the above data, find the two regression lines.
34. For the following lines of regression find the mean values of
X and Y and the two regression coefficients
8X – 10Y + 66 = 0
40X – 18Y = 214
35. Given $\bar{X}$ = 90, $\bar{Y}$ = 70, $b_{xy}$ = 1.36, $b_{yx}$ = 0.61
Find (i) the most probable values of X, when Y = 50 and
(ii) the coefficient of correlation between X and Y
36. You are supplied with the following data:
4X – 5Y + 33 = 0 and 20X – 9Y – 107 = 0
variance of Y = 4. Calculate
(I) Mean values of X and Y
(II) S.D. of X
Answers:
I. 1. b 2. c 3. d 4. b 5. b 6. a 7. a 8. a 9. c 10. b
II. 11. dependence 12. dependence 13. more than, less than
III. 27. Y = 0.498X + 1.366 28. Y = 1.98X + 12.9; Y = 42.6 30. 3.75