Answer2 PDF
QUESTION – A
Study the Lungcap data set and answer the following questions.
2. Suppose it is given that 20% of the male smokers and 15% of the female smokers were born by
caesarean section. Using the data, verify these statements, giving sufficient reasons for your
answers.
3. Plot the histogram of the distribution of Lungcap amongst smokers.
4. Plot the histogram of the distribution of Height amongst smokers.
5. Are height and Lungcap independent?
6. Are the variances of Lungcap for male smokers and female smokers equal?
7. Are the average Lungcap values of smokers and non-smokers equal?
8. Plot the histogram of the age amongst smokers.
9. What percentage of people below 16 years smoke?
10. What percentage of people above 17 years smoke?
11. Test if smoking habit and age are dependent.
12. Test if smoking habit and Lungcap are dependent.
13. Fit a suitable distribution to height and also to Lungcap. Test the goodness of fit.
QUESTION – B
Study the car data set and answer the following questions.
1. Find the average and variance of price and mileage separately. Comment on the results. How will
you interpret the result statistically?
2. Test whether the mean mileages of different car manufacturers within a given price range are equal.
Clearly specify all the assumptions and the null and alternative hypotheses.
3. Find a 90% confidence interval for the mean price of Chevrolet cars.
4. Find a 90% confidence interval for the variance of prices of Pontiac cars.
5. Calculate the correlation coefficient between mileage and Liter for each company.
6. Comment on the results.
7. Suppose a car has a Liter value of 3.8. How confident would you be that its mileage is more than 20,000?
8. Is there any correlation between prices and mileage?
QUESTION – C
1. Let X̄ be the mean of a random sample of size n1 from N(μ, σ² = 10). Find n1 such that … is
approximately 0.954.
2. Let Ȳ be the mean of a random sample of size n2 from N(μ, σ² = 9). Find n2 such that … is
approximately 0.90.
3. Draw 200 random samples, each of size n1 (found above), from a normal distribution with
mean 5 and variance 3.
4. Write down the distribution of the sample mean. Test using the data obtained in Q3 above, if the
sample means follow that distribution.
5. Draw 200 random samples, each of size n2 (found above), from a normal distribution with
mean 7 and variance 3.
6. Compute a 95% confidence interval for the difference of means from each of the 200 pairs of samples.
Draw a graph showing all 200 confidence intervals and comment.
QUESTION – D
1. Collect stock prices for 5 companies from 1st Jan 2016 to 30th June 2016.
2. Plot the histogram of the returns for each company. Describe the histograms.
3. Test whether the average returns for 5 companies are equal. State clearly the assumptions
required, null and alternative hypotheses.
4. Test whether the average returns for each pair of companies are equal.
5. Comment on the results.
QUESTION – E
1. The income distribution of a very large population is exponential with average income ₹40,000
per annum. Draw 500 samples (from the income distribution) of size 100 each. Sketch the
distribution of the sample average income. Comment.
2. The age distribution of a very large population is given below:
Age Group (years)   15-18   18-21   21-23   23-25   25-27   27-29   29-31   31-33   33-35
Proportion          0.1     0.1     0.1     0.1     0.2     0.1     0.1     0.1     0.1
Draw 100 samples (from the age distribution) of size 50 each. Sketch the distribution of sample
average age. Comment.
Section-A
Q.1.i.
Two-Way Table
                 Smoking Habit
Gender           Smoker   Non-Smoker   Grand Total
Male             33       334          367
Female           44       314          358
Grand Total      77       648          725
Q.1.ii. Marginal Probabilities (the inner cells are joint probabilities)
                        Smoking Habit
Gender                  Smoker   Non-Smoker   Marginal Probability
Male                    0.046    0.461        0.506
Female                  0.061    0.433        0.494
Marginal Probability    0.106    0.894        1.000
Q.1.iii Given that a randomly selected person is a smoker, the probability that the person is
female:
P(Female | Smoker) = (# of female smokers)/(# of smokers) = 44/77 = 0.571
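As a cross-check, the joint, marginal and conditional probabilities can be recomputed directly from the counts in the two-way table. This is a minimal sketch; the dictionary keyed by (gender, habit) is just an illustrative layout, not part of the original analysis.

```python
# Counts from the two-way table in Q.1.i (Gender x Smoking Habit).
counts = {
    ("Male", "Smoker"): 33,
    ("Male", "Non-Smoker"): 334,
    ("Female", "Smoker"): 44,
    ("Female", "Non-Smoker"): 314,
}
total = sum(counts.values())  # grand total of the table

# Joint probability of each cell = cell count / grand total.
joint = {cell: n / total for cell, n in counts.items()}

# Marginal probability of smoking = Smoker column total / grand total.
smokers = sum(n for (gender, habit), n in counts.items() if habit == "Smoker")
p_smoker = smokers / total

# Conditional probability P(Female | Smoker) = female smokers / all smokers.
p_female_given_smoker = counts[("Female", "Smoker")] / smokers

print(f"P(Smoker) = {p_smoker:.3f}")                        # 0.106
print(f"P(Female | Smoker) = {p_female_given_smoker:.3f}")  # 0.571
```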
Q.2 It is given that 20% (= m) of male smokers and 15% (= f) of female smokers were born by caesarean section.
a) As per the sample,
# of male smokers = 33, # of male smokers born by caesarean = 10
Proportion of male smokers born by caesarean, Pm = 10/33 = 30.3%
Sample size, Nm = 33
Since the sample size exceeds 30, by the CLT the sample proportion is approximately normal: Pm ~ N(m, SDm)
Standard deviation, SDm = sqrt(Pm x (1 - Pm)/Nm) = 0.08
Zcal = (Pm - m)/SDm = (30.30% - 20%)/0.08 = 1.29
Critical values: Z+cri = Z0.975 = 1.96; Z-cri = Z0.025 = -1.96
Hypothesis Statement:
H0: Proportion of male smokers born by caesarean, m = 20%
HA: Proportion of male smokers born by caesarean, m ≠ 20%
Rejection Rule
Reject H0 if Zcal > Z+cri or Zcal < Z-cri
Since Z-cri < Zcal (= 1.29) < Z+cri, there is not enough
evidence to reject H0.
Hence the data are consistent with the statement that 20% of the male smokers were born by caesarean section.
b) As per the sample,
# of female smokers = 44, # of female smokers born by caesarean = 11
Proportion of female smokers born by caesarean, Pf = 11/44 = 25%
Sample size, Nf = 44
Since the sample size exceeds 30, by the CLT the sample proportion is approximately normal: Pf ~ N(f, SDf)
Standard deviation, SDf = sqrt(Pf x (1 - Pf)/Nf) = 0.065
Zcal = (Pf - f)/SDf = (25% - 15%)/0.065 = 1.53
Critical values: Z+cri = Z0.975 = 1.96; Z-cri = Z0.025 = -1.96
Hypothesis Statement:
H0: Proportion of female smokers born by caesarean, f = 15%
HA: Proportion of female smokers born by caesarean, f ≠ 15%
Rejection Rule
Reject H0 if Zcal > Z+cri or Zcal < Z-cri
Since Z-cri < Zcal (= 1.53) < Z+cri, there is not enough
evidence to reject H0.
Hence the data are consistent with the statement that 15% of the female smokers were born by caesarean section.
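Both parts of Q.2 are the same one-sample proportion z-test, so they can be wrapped in one helper. This is a sketch: by default the standard error uses the sample proportion, as in the calculations above; the optional `use_null_se` flag (a name introduced here for illustration) switches to the common textbook alternative sqrt(p0(1 - p0)/n).

```python
from statistics import NormalDist

def proportion_z_test(successes, n, p0, use_null_se=False):
    """Two-sided z-test of H0: p = p0 using the normal approximation.

    By default the standard error uses the sample proportion p_hat;
    with use_null_se=True it uses the hypothesised proportion p0.
    """
    p_hat = successes / n
    p_for_se = p0 if use_null_se else p_hat
    se = (p_for_se * (1 - p_for_se) / n) ** 0.5
    z = (p_hat - p0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_hat, se, z, p_value

z_crit = NormalDist().inv_cdf(0.975)  # approx 1.96 for a 5% two-sided test
for label, x, n, p0 in [("male", 10, 33, 0.20), ("female", 11, 44, 0.15)]:
    p_hat, se, z, p = proportion_z_test(x, n, p0)
    decision = "reject H0" if abs(z) > z_crit else "do not reject H0"
    print(f"{label}: p_hat={p_hat:.3f}, SE={se:.3f}, Z={z:.2f}, p={p:.3f} -> {decision}")
```

With `use_null_se=True` the statistics become about 1.48 (male) and 1.86 (female); both still lie inside ±1.96, so the conclusions do not change.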
Q3. & Q4.
Q5.
                 Height
Lungcap          <63     >63     Total
<7               229     34      263
>7               54      408     462
Total            283     442     725
Hypothesis Statement:
H0: Height and Lungcap are independent (for the ranges above)
HA: Height and Lungcap are dependent
Rejection Rule:
Reject H0 if the p-value of χ²cal is less than 5%, i.e. if χ²cal > χ²1,0.05.
Expected frequency of each cell: Eij = (row i total x column j total)/grand total, e.g. E11 = 263 x 283/725 = 102.66.
Cell   Observed (Fij)   Expected (Eij)   Fij - Eij   (Fij - Eij)²/Eij
11     229              102.66           126.34      155.479
12     34               160.34           -126.34     99.549
21     54               180.34           -126.34     88.509
22     408              281.66           126.34      56.670
Degrees of freedom = (2-1)(2-1) = 1;   χ²cal = 400.207;   χ²1,0.05 = 3.841
Since χ²cal > χ²1,0.05, the p-value of χ²cal (≈ 0%) is less than 5%. Hence we reject H0 and state that Height and Lungcap are dependent.
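The chi-square test of independence for the 2x2 table can be reproduced in a few lines. This sketch uses only the standard library: for one degree of freedom the chi-square tail probability equals erfc(sqrt(x/2)), so no statistics package is needed.

```python
import math

# Observed 2x2 table from Q5: rows = Lungcap (<7, >7), columns = Height (<63, >63).
observed = [[229, 34],
            [54, 408]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_sq = 0.0
for i, row in enumerate(observed):
    for j, f_ij in enumerate(row):
        # Expected count under independence: row total x column total / grand total.
        e_ij = row_totals[i] * col_totals[j] / grand_total
        chi_sq += (f_ij - e_ij) ** 2 / e_ij

df = (len(observed) - 1) * (len(observed[0]) - 1)  # (2-1)(2-1) = 1
# For df = 1, P(chi-square > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi_sq / 2))
print(f"chi-square = {chi_sq:.3f}, df = {df}, p-value = {p_value:.3g}")
```

Like the hand calculation, this applies no Yates continuity correction.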
Q6. Since Fcal (= 1.19) < Fcrit (= 1.96), there is not enough evidence to reject H0. Hence we conclude
that the variances of Lungcap for male smokers and female smokers can be taken as equal.
Q7. Let μ1 and μ2 be the average Lungcap of smokers and non-smokers, and let σ1² and σ2² be the
corresponding population variances.
X̄1 = random variable for the sample average Lungcap of smokers ~ N(μ1, σ1²/n1)
X̄2 = random variable for the sample average Lungcap of non-smokers ~ N(μ2, σ2²/n2)
As per data,
No. of smokers, n1 = 77; No. of non-smokers, n2 = 648
Average Lungcap of smokers, x̄1 = 8.645; Average Lungcap of non-smokers, x̄2 = 7.77
Sample Lungcap variance of smokers, s1² = 3.545; Sample Lungcap variance of non-smokers, s2² = 7.432
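One way the Q7 comparison could be carried out from these summary figures is the large-sample two-sample z statistic with unpooled variances, using s1² and s2² in place of σ1² and σ2². This is a sketch under that large-sample assumption, not necessarily the method used in the original answer.

```python
from statistics import NormalDist

def two_sample_z(mean1, var1, n1, mean2, var2, n2):
    """Large-sample z statistic for H0: mu1 = mu2 with unpooled variances."""
    se = (var1 / n1 + var2 / n2) ** 0.5
    z = (mean1 - mean2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Summary statistics quoted in Q7 (sample variances stand in for sigma^2).
z, p = two_sample_z(8.645, 3.545, 77, 7.77, 7.432, 648)
print(f"Z = {z:.2f}, two-sided p-value = {p:.4f}")
```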
Q.9. # people below 16 years = 548
# people below 16 years who smoke= 42
Percentage of people below 16 who smoke = 42/548 = 7.66%
Q.10. # people above 17 years = 80
# people above 17 years who smoke= 15
Percentage of people above 17 who smoke = 15/80 = 18.75%
Q13. As per the data, we have the following descriptive statistics for Lungcap and Height:
Statistic            LungCap     Height
Mean                 7.863148    64.83628
Standard Deviation   2.662008    7.202144
Count                725         725
Distribution for Lungcap
H0: The Lungcap distribution of the population follows a normal distribution with mean 7.863 and
standard deviation 2.662, i.e. N(7.863, 2.662)
HA: The Lungcap distribution does not follow N(7.863, 2.662)
We construct the frequency distribution below, choosing the bin limits (7.863 + z x 2.662) so that
each bin has an expected frequency of 10% under the fitted normal distribution.
Cumulative %   Z-value   Bin limit   Observed (fi)   Expected (ei)   fi-ei   (fi-ei)²   (fi-ei)²/ei
10%            -1.28     4.456       83              72.5            10.5    110.250    1.521
20%            -0.84     5.627       62              72.5            -10.5   110.250    1.521
30%            -0.52     6.479       67              72.5            -5.5    30.250     0.417
40%            -0.25     7.198       61              72.5            -11.5   132.250    1.824
50%            0         7.863       72              72.5            -0.5    0.250      0.003
60%            0.25      8.529       74              72.5            1.5     2.250      0.031
70%            0.52      9.247       79              72.5            6.5     42.250     0.583
80%            0.84      10.099      71              72.5            -1.5    2.250      0.031
90%            1.28      11.271      88              72.5            15.5    240.250    3.314
More                                 68              72.5            -4.5    20.250     0.279
χ²cal = 9.524
For a 5% significance level and 7 (= 10 bins - 2 estimated parameters - 1) degrees of freedom, χ²7,0.05 = 14.067.
Since χ²cal < χ²7,0.05, the p-value is greater than 5%. Hence we do not reject H0: the Lungcap distribution is
consistent with N(7.863, 2.662).
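The equal-probability binning and the goodness-of-fit statistic can be sketched as follows. The observed counts are copied from the table (the raw Lungcap data are not reproduced here), and the bin edges come from exact normal quantiles, so they differ slightly from the table, which used z rounded to two decimals. The critical value 14.067 is a tabulated constant, since Python's standard library has no chi-square quantile function.

```python
from statistics import NormalDist

mu, sigma = 7.863148, 2.662008          # sample mean and SD of LungCap (Q13)
fitted = NormalDist(mu, sigma)

# Bin upper limits at the 10%, 20%, ..., 90% quantiles of the fitted normal,
# so each of the 10 bins has expected probability 0.10.
edges = [fitted.inv_cdf(k / 10) for k in range(1, 10)]

observed = [83, 62, 67, 61, 72, 74, 79, 71, 88, 68]   # counts from the table
n = sum(observed)                                     # 725 observations
expected = [n / len(observed)] * len(observed)        # 72.5 per bin

chi_sq = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
df = len(observed) - 2 - 1   # bins - estimated parameters (mu, sigma) - 1
CHI2_CRIT_7_005 = 14.067     # upper 5% point of chi-square with 7 df, from tables

print([round(x, 3) for x in edges])
print(f"chi-square = {chi_sq:.3f}, df = {df}, reject H0: {chi_sq > CHI2_CRIT_7_005}")
```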
Section-B
Q1. We observe that the sample variance of price is larger than that of mileage, i.e. prices are spread
more widely around their average than mileages are. So cars across a wide range of prices have mileage
fairly close to the average mileage of 19831.93.
Q2.
Price Range   Average Mileage (x̄i)   Variance (σi²)   Sample Size (ni)
<20,000       20241.52               64394503         467
20k-40k       19759.26               65564947         297
>40k          15589.65               95651556         40
Let μ1, μ2 and μ3 be the average mileage of cars in the price ranges given in the table,
and σ1², σ2² and σ3² the variances of the respective price ranges.
Hypothesis Statement
H0: μ1 = μ2 = μ3
HA: At least one μi differs from the others
We conducted an ANOVA test. Since the p-value is less than 0.05, we reject H0 and state that the average
mileages of the cars in the above price ranges are not all equal.
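The one-way ANOVA can be recomputed from the summary table alone. This sketch assumes the "Variance" column holds sample variances (divisor n - 1) and relies on the usual ANOVA assumptions: independent groups, approximately normal mileage within each range, and equal variances. Because there are three groups (df1 = 2), the F tail probability has the closed form (1 + 2F/df2)^(-df2/2), which avoids needing an F-distribution library.

```python
def one_way_anova_from_summary(means, variances, sizes):
    """One-way ANOVA F statistic from group means, sample variances and sizes."""
    k, n_total = len(means), sum(sizes)
    grand_mean = sum(m * n for m, n in zip(means, sizes)) / n_total
    ss_between = sum(n * (m - grand_mean) ** 2 for m, n in zip(means, sizes))
    ss_within = sum((n - 1) * v for v, n in zip(variances, sizes))
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

f_stat, df1, df2 = one_way_anova_from_summary(
    means=[20241.52, 19759.26, 15589.65],
    variances=[64394503, 65564947, 95651556],
    sizes=[467, 297, 40],
)
# Closed-form F survival function, valid only when df1 == 2.
p_value = (1 + 2 * f_stat / df2) ** (-df2 / 2)
print(f"F({df1}, {df2}) = {f_stat:.2f}, p-value = {p_value:.4f}")
```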
Q3.
Price - Chevrolet
Mean                          16427.6
Standard Deviation            6901.439
Sample Variance               47629867
Count                         320
Confidence Level (90.0%)      636.4364
t0.05,319 ≈ 1.6496, so the margin is t0.05,319 x 6901.439/sqrt(320) ≈ 636.44.
90% confidence interval for the mean price: 16427.6 ± 636.4364 = (15791.16, 17064.04)
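The reported "Confidence Level(90.0%)" margin of 636.4364 is t0.05,319 x SD/sqrt(n) with t0.05,319 ≈ 1.6496. A short sketch reproducing that margin and the interval; the t critical value is passed in from tables because Python's standard library has no t quantile function.

```python
import math

def t_confidence_interval(mean, sd, n, t_crit):
    """Two-sided CI for a mean: mean +/- t_crit * sd / sqrt(n)."""
    margin = t_crit * sd / math.sqrt(n)
    return mean - margin, mean + margin, margin

# t_crit = upper 5% point of t with 319 df (approx. 1.6496, from tables).
low, high, margin = t_confidence_interval(16427.6, 6901.439, 320, t_crit=1.6496)
print(f"margin = {margin:.2f}, 90% CI = ({low:.2f}, {high:.2f})")
```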
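Q.4. Pontiac prices
n = 150
Sample Variance, s² = 16708238
χ²149,0.1 = 171.507
90% lower confidence limit for the variance = (n-1)s²/χ²149,0.1 = 149 x 16708238/171.507 = 14515607
i.e. with 90% confidence, the variance of Pontiac prices is at least 14515607.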
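A single upper chi-square point gives a one-sided bound for σ²; a two-sided 90% interval would use both the upper and lower 5% points of χ² with 149 degrees of freedom. A minimal sketch of the one-sided calculation, with the chi-square value taken from the table above:

```python
def variance_lower_bound(sample_var, n, chi2_upper):
    """One-sided lower confidence bound for sigma^2: (n - 1) s^2 / chi2_upper."""
    return (n - 1) * sample_var / chi2_upper

# chi2_upper = upper 10% point of chi-square with 149 df (171.507, from tables).
bound = variance_lower_bound(16708238, 150, 171.507)
print(f"90% lower confidence bound for the variance: {bound:,.0f}")
```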
Q.5. (M = mileage, L = Liter; Corr(M, L) = Cov(M, L)/(SD(M) x SD(L)))
Manufacturer   Cov(M, L)   SD(M)      SD(L)   Corr(M, L)
Buick          162.323     6932.136   0.230   0.102
Cadillac       594.100     8964.292   0.803   0.083
Chevrolet      -285.829    8203.571   1.151   -0.030
Pontiac        959.280     8110.435   1.098   0.108
SAAB           -9.525      8404.288   0.162   -0.007
Saturn         -501.661    8479.994   0.301   -0.197
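The correlation column follows from the covariance and the two standard deviations. A short sketch recomputing it from the table values:

```python
# Cov(M, L), SD(M), SD(L) for each manufacturer, copied from the Q.5 table.
table = {
    "Buick": (162.323, 6932.136, 0.230),
    "Cadillac": (594.100, 8964.292, 0.803),
    "Chevrolet": (-285.829, 8203.571, 1.151),
    "Pontiac": (959.280, 8110.435, 1.098),
    "SAAB": (-9.525, 8404.288, 0.162),
    "Saturn": (-501.661, 8479.994, 0.301),
}

def pearson_r(cov_xy, sd_x, sd_y):
    """Pearson correlation: covariance divided by the product of the SDs."""
    return cov_xy / (sd_x * sd_y)

for maker, (cov, sd_m, sd_l) in table.items():
    print(f"{maker:10s} r = {pearson_r(cov, sd_m, sd_l):+.3f}")
```

The same function applied to the price and mileage figures in Q8 (-11589868.158, 9884.853, 8196.320) gives about -0.143.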
Q.6. The correlation coefficients between mileage and Liter are close to zero for every manufacturer
(from -0.197 for Saturn to 0.108 for Pontiac). Hence there is at most a weak linear relationship between
mileage and Liter.
Q8.
Cov(P, M)         SD(P)      SD(M)      Corr(P, M)
-11589868.158     9884.853   8196.320   -0.143
As the correlation between price and mileage (-0.143) is close to zero, there is only a weak (negative)
linear relationship between them.
Section-C:
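Q.4. Each of the 200 samples has size 159 and is drawn from the normal distribution N(5, 3) (mean 5,
variance 3). Hence the sample mean follows the normal distribution N(5, 3/159); because the population
itself is normal, this holds exactly and not only approximately via the CLT. This can be verified from the
descriptive statistics and the histogram of the 200 sample means:
Mean of the sample means = 5.003317 ≈ 5
Variance of the sample means ≈ 0.020, close to the theoretical value 3/159 = 0.0189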
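The experiment in Q.3 and Q.4 can be simulated with the standard library. This sketch uses an arbitrary fixed seed, so its numbers will not match the figures above exactly; it only illustrates that the sample means centre on 5 with variance near 3/159. Note that `random.gauss` takes the standard deviation, hence `VAR ** 0.5`.

```python
import random
import statistics

random.seed(1)                     # arbitrary seed for reproducibility
MU, VAR, N_PER_SAMPLE, N_SAMPLES = 5, 3, 159, 200

# Draw 200 samples of size 159 from N(5, 3) and keep each sample's mean.
sample_means = [
    statistics.fmean(random.gauss(MU, VAR ** 0.5) for _ in range(N_PER_SAMPLE))
    for _ in range(N_SAMPLES)
]

print(f"mean of sample means     = {statistics.fmean(sample_means):.4f} (theory {MU})")
print(f"variance of sample means = {statistics.variance(sample_means):.4f} "
      f"(theory {VAR / N_PER_SAMPLE:.4f})")
```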
[Figure: 95% confidence intervals for the difference of means from the 200 samples; legend: Difference of Mean, Upper Limit of CI (+0.679), Lower Limit of CI (-0.679)]
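The interval construction and coverage check can be sketched as below. The value of n2 is not stated in this part of the document, so `N2 = 30` is a hypothetical placeholder to be replaced by the value found earlier. Both population variances are known (3), so a z-based interval with a fixed half-width is used; about 95% of the 200 intervals should contain the true difference 5 - 7 = -2.

```python
import random
import statistics
from statistics import NormalDist

random.seed(2)                     # arbitrary seed for reproducibility
N1 = 159                           # sample size from N(5, 3), as used in Q.4
N2 = 30                            # HYPOTHETICAL placeholder for n2
VAR = 3                            # both populations have variance 3
TRUE_DIFF = 5 - 7                  # true mu1 - mu2
z = NormalDist().inv_cdf(0.975)    # approx 1.96 for a 95% interval

# Known variances, so every interval has the same half-width.
margin = z * (VAR / N1 + VAR / N2) ** 0.5

intervals = []
for _ in range(200):
    x_bar = statistics.fmean(random.gauss(5, VAR ** 0.5) for _ in range(N1))
    y_bar = statistics.fmean(random.gauss(7, VAR ** 0.5) for _ in range(N2))
    diff = x_bar - y_bar
    intervals.append((diff - margin, diff + margin))

covered = sum(lo <= TRUE_DIFF <= hi for lo, hi in intervals)
print(f"half-width = {margin:.3f}; {covered} of 200 intervals contain {TRUE_DIFF}")
```

The `intervals` list can then be plotted as vertical bars to reproduce a chart of all 200 intervals.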
Section D:
Q.1. I collected the stock prices of Monnet Ispat & Energy Ltd, GAIL (India) Ltd, Alstom India Ltd, ABB India
Ltd and Siemens Ltd from 01.01.2016 to 30.06.2016.
Q.3. H0: The average stock returns of the five companies are equal (R1 = R2 = R3 = R4 = R5)
HA: At least one average stock return differs from the others.
Reject H0 if p-value < 0.05.
Since the p-value is greater than 0.05, we do not reject H0 and conclude that there is no significant
difference between the average returns of these companies.
Section E:
Q.1
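Since the sample size (100) exceeds 30, by the CLT the average income of each sample approximately follows a
normal distribution, N(40000, SD = 40000/sqrt(100) = 4000). (For an exponential distribution the standard
deviation equals the mean, 40000.)
Mean of the 500 sample means = 40260.59 ≈ 40000
Standard deviation of the sample means = 3983.923 ≈ 4000 (= 40000/10)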
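A sketch of the income simulation, using an arbitrary seed (so the numbers will differ slightly from those above). `random.expovariate` takes the rate λ = 1/mean, and the spread of the 500 sample means should be close to 40000/sqrt(100) = 4000.

```python
import random
import statistics

random.seed(3)                     # arbitrary seed for reproducibility
MEAN_INCOME = 40_000               # exponential mean (so the SD is also 40,000)
N_PER_SAMPLE, N_SAMPLES = 100, 500

# random.expovariate takes the rate lambda = 1 / mean.
sample_means = [
    statistics.fmean(random.expovariate(1 / MEAN_INCOME) for _ in range(N_PER_SAMPLE))
    for _ in range(N_SAMPLES)
]

print(f"mean of sample means = {statistics.fmean(sample_means):.0f} (theory {MEAN_INCOME})")
print(f"SD of sample means   = {statistics.stdev(sample_means):.0f} "
      f"(theory {MEAN_INCOME / N_PER_SAMPLE ** 0.5:.0f})")
```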
Q.2 Since the sample size (50) exceeds 30, the average age of the samples should approximately follow a
normal distribution as per the CLT.
Population average age = 25.8 and population standard deviation = 5.216 (using the group midpoints), so the
sample average age should have mean 25.8 (observed: 25.09) and standard deviation 5.216/sqrt(50) = 0.738 (observed: 0.64).
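The population figures 25.8 and 5.216 can be reproduced by placing everyone in an age group at the group midpoint; this midpoint treatment is an assumption of the sketch. The simulation part draws midpoints with the given proportions, using an arbitrary seed.

```python
import math
import random
import statistics

# Age groups (years) and proportions from the table in QUESTION - E, Q2.
groups = [(15, 18), (18, 21), (21, 23), (23, 25), (25, 27),
          (27, 29), (29, 31), (31, 33), (33, 35)]
props = [0.1, 0.1, 0.1, 0.1, 0.2, 0.1, 0.1, 0.1, 0.1]

# Assumption: each group is represented by its midpoint.
mids = [(lo + hi) / 2 for lo, hi in groups]
mean = sum(p * m for p, m in zip(props, mids))
sd = math.sqrt(sum(p * (m - mean) ** 2 for p, m in zip(props, mids)))
se = sd / math.sqrt(50)            # theoretical SD of the mean of 50 ages

# Simulation: 100 samples of size 50 drawn from the midpoints with these weights.
random.seed(4)
sim_means = [statistics.fmean(random.choices(mids, weights=props, k=50))
             for _ in range(100)]

print(f"population mean = {mean:.2f}, SD = {sd:.3f}, SD of sample mean = {se:.3f}")
print(f"simulated: mean = {statistics.fmean(sim_means):.2f}, "
      f"SD = {statistics.stdev(sim_means):.3f}")
```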