
UNIT ONE: Introduction to Biostatistics

Part I true or false

1. If the lower quartile is farther from the median than the upper quartile, then the
distribution is negatively skewed. True
2. A sample consists of fixed numbers whose values are usually unknown. False
3. The mid-range is a measure of dispersion and is most commonly affected by extreme values.
False
4. A continuous random variable has a measurable probability associated with each value.
False
5. The degrees of freedom of a paired t-test are n-2. False
6. The dependent variable for linear regression should be numeric. True
7. The variance and standard deviation are the superior and most widely used measures
of dispersion. True
8. The larger the sample size, the narrower the confidence interval and the more precise
our estimate. True
9. A large p-value implies that the probability of the observed value occurring just by
chance, when the null hypothesis is true, is low. False
10. If there are real differences among the group means, the between-groups variation will be
larger than the within-groups variation. True
11. When the population standard deviation is not known and the sample size is small, the t-test
is used, whereas the F-test is used to compare two population variances. True
12. A follow-up time is right-censored if we know that the event of interest took place at an
unknown time prior to the actual observed time. False
Part II Matching

13. Wilcoxon rank sum test                  A) Randomness 20
14. Kruskal-Wallis test                     B) Fisher exact test 17
15. Wilcoxon signed-rank test               C) Two independent-samples t-test 13
16. Single-sample sign test                 D) Two dependent-samples (paired) t-test 15
17. Chi-square test                         E) Pearson’s correlation coefficient 19
18. Friedman test                           F) One-sample t-test 16
19. Spearman rank correlation coefficient   G) Repeated-measures ANOVA 18
20. Runs test                               H) One-way ANOVA 14

Part III Multiple choice (choose the best answer)


21. The best measures of central tendency and dispersion (variation) for skewed data are, respectively:
A) Mean & SD B) Mode & Variance C) Median & Range D) Median & SD
22. Classification of HTN: ≤89/59 mmHg hypotension, 90/60-139/89 mmHg normal,
≥140/90 mmHg hypertension. This classification uses which type of measurement scale?

A) Ratio B) nominal C) ordinal D) interval

23. In an unpaired samples t-test with sample sizes n1 = 11 and n2 = 11, the tabulated value
of t should be obtained for:

(a) 10 degrees of freedom (b) 21 degrees of freedom

(c) 22 degrees of freedom (d) 20 degrees of freedom

24. The purpose of statistical inference is:

(a) To collect sample data and use them to formulate hypotheses about a population

(b) To draw conclusion about populations and then collect sample data to support the
conclusions

(c) To draw conclusions about populations from sample data

(d) To draw conclusions about the known value of population parameter

25. Which one is mismatched?

A) Degree (level) of confidence -> 1 – α B) Power of the test -> 1 – β C) Probability of type II error -> β

D) Probability of type I error -> α E) One-tailed significance level -> α / 2

26. Given that IQ scores are approximately normally distributed with a mean of 100 and a
standard deviation of 15, the proportion of people with IQs above 130 is:
a. 95% b. 68% c. 5% d. 2.5%
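A quick numerical check, assuming the Normal(100, 15) model stated in the question and using Python's standard-library `statistics.NormalDist`: a score of 130 lies two standard deviations above the mean, so the upper tail is about 2.3%. Option d (2.5%) is the empirical-rule approximation (half of the roughly 5% lying outside ±2 SD).

```python
from statistics import NormalDist

# IQ modelled as Normal(mean=100, sd=15), as stated in question 26
iq = NormalDist(mu=100, sigma=15)

z = (130 - iq.mean) / iq.stdev   # z-score for an IQ of 130
p_above_130 = 1 - iq.cdf(130)    # upper-tail area beyond 130

print(f"z = {z:.1f}, P(IQ > 130) = {p_above_130:.4f}")
```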

27. Suppose you conduct a significance test for the population proportion and your p-value
is 0.184. Given a 0.10 level of significance, which of the following should be your
conclusion?

A) Accept H0 B) Accept HA C) Fail to reject HA D) Fail to reject H0

28. Loosely speaking, what does the Central Limit Theorem say?

(a) The area under a normal density curve is one.

(b) Measures of central tendency should always be computed with and without outliers.

(c) The sampling distribution of x̄ is approximately normal.


(d) Confidence intervals have zero margin of error for large sample sizes

29. What is a sampling distribution?

(a) It describes all the non-sampling errors that occur when sampling from a population.

(b) It describes how a statistic's value will change from sample to sample.

(c) It describes how randomization is used to guarantee unbiased estimates.

(d) It summarizes all sources of variability not explained by a least-squares regression model.

30. One example we looked at in class involved HIV-positive rates among rural Chinese
farmers. One of the variables studied was education level, which was recorded as
"illiterate," "primary," and "secondary." What type of variable is education level in this
example?

(a) A quantitative variable (b) A categorical variable

(c) A response variable (d) A lurking variable

31. A researcher divided subjects into two groups according to gender and then selected
members from each group for her sample. What sampling method was the researcher
using?

a. Cluster b. Random c. Systematic d. Stratified

32. A survey is to be conducted on household water supply in a district comprising 20,000
households, of which 20% are urban and 80% rural. It is suspected that in urban areas the
access to a safe water source is much more satisfactory. A decision is made to include 100
urban households (out of 4,000) and 200 rural households (out of 16,000). Which sampling
method should be used to determine the access to safe water for all the district
households?

a. Simple random sampling b. Cluster sampling

c. Systematic sampling d. Convenience sampling e. stratified sampling

33. Assume that the population we want to study is patients who are following ART in
Gondar. From the information we have, we assume these patients have similar
characteristics with respect to the study variable. To select a sample from all 5,000
patients on follow-up, which sampling technique is most appropriate?

A. Simple random sampling b. Systematic random sampling


c. stratified random sampling d. Cluster random sampling

34. If most of the measurements in a data set are of approximately similar magnitude
except for a few measurements that are quite a bit larger, how would the mean and median
of the data set compare, and what shape would a histogram of the data set have?

a. The mean would be smaller than the median and the histogram would be skewed with a
long left tail.

b. The mean would be larger than the median and the histogram would be skewed with a
long right tail.

c. The mean would be larger than the median and the graph would be skewed with a long
left tail.

d. The mean would be smaller than the median and the histogram would be skewed with a
long right tail.

e. The mean would be equal to the median and the histogram would be symmetrical.

35. Many professional schools require applicants to take a standardized test. Suppose that
1000 students write the test, and you find that your mark of 63 (out of 100) was the 73rd
percentile. This means:

a. At least 73% of the people got 63 or better. b. At least 270 people got 73 or better.

c. At least 270 people got 63 or better. d. At least 27% of the people got 73 or
worse.

e. At least 730 people got 73 or better

36. If a null hypothesis is rejected at the 0.05 level of significance for a two-tailed test, you

A. will always reject it at the 99 percent level of confidence

B. will always reject it at the 90 percent level of confidence

C. will always not reject it at the 99 percent level of confidence

D. will always not reject it at the 96 percent level of confidence

37. The relationship between level of significance, degree of freedom and the tabulated
value of t from the t-distribution is

A. the value of t has direct relation with degree of freedom and inverse relation with level
of significance ( α )
B. when we increase degree of freedom and keep level of significance ( α ) constant then
the value of t will decrease

C. the value of t has direct relation with both degree of freedom and level of significance (α)

d. When we increase level of significance ( α ) and keep degree of freedom constant then the
value of t will decrease

38. In general, which of the following statements is FALSE?

a. The sample mean is more sensitive to extreme values than the median.

b. The sample range is more sensitive to extreme values than the standard deviation.

c. The sample standard deviation is a measure of spread around the sample mean.

d. The sample standard deviation is a measure of central tendency around the median.

e. If a distribution is symmetric, then the mean will be equal to the median.

39. Which one of the following is true about positively skewed distribution?

A. Mean is smaller than median B. Majority of scores is at the right end of the curve

C. Median is greater than mode D. Few extreme scores are scattered at the left end

40. In hypothesis testing, the level of significance is the probability of

A. failing to reject a true null hypothesis B. failing to reject a false null hypothesis

C. rejecting a false null hypothesis d. rejecting a true null hypothesis.

41. Which one of the following statements is true about a 95% confidence interval for the
population proportion?

a. Contains the sample proportion with 95% certainty.

B. Is more likely to contain the population proportion than the 99% confidence interval.

c. Contains 95% of the observations in the population.

d. Can be used to give an indication of whether the sample proportion is a precise estimate
of the population proportion.

42. A study was conducted to check whether there is an association between asthma and
smoking. At the end, the p-value for smoking was found to be 0.03. What conclusion
should the researcher draw?
a. There is no association between smoking and asthma

b. There is a weak association between smoking and asthma

c. There is significant association between smoking and asthma

d. We can’t say anything from the given information unless additional information is given

43. The following are survival times in months recorded for six tumor-bearing rats
observed after radiation therapy: 1, 4, 1.7, 2.3, 2.9 and 3.2. If the observed value of 3.2
months is mistakenly recorded as 32 months, what will be the effect on the summary
statistics of this study?

A. An increase in the median B. An increase in the mode

C. An increase in the mean and median D. An increase in the mean E. c and d
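The effect in question 43 can be checked directly. This sketch uses the six survival times as listed and Python's `statistics` module: the median stays at 2.6 months in both versions, while the mean jumps from about 2.5 to about 7.3 months, so only the mean is affected by the miscoded value.

```python
from statistics import mean, median

times = [1, 4, 1.7, 2.3, 2.9, 3.2]                  # survival times (months) as listed in question 43
miscoded = [32 if t == 3.2 else t for t in times]   # 3.2 mistakenly entered as 32

for label, data in (("correct", times), ("miscoded", miscoded)):
    print(f"{label:>8}: mean = {mean(data):.2f}, median = {median(data):.2f}")
```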

44. For constructing a grouped frequency distribution table, which of the following is not
true?

A. The classes must be mutually exclusive B. The number of classes should not be
too many

C. The intervals of the classes can be open-ended D. The classes should be of equal width

45. In simple random sampling,

A. Each unit in the sampling frame has an equal chance of being selected

B. The starting point should be chosen at random from the first sampling interval

C. Groups of study units are selected

D. The distribution of the characteristics to be studied is strongly affected by a certain variable

46. In a study of the prevalence of HIV among adolescents in Ethiopia, a random sample of
adolescents in Lideta Kifle Ketema was included. Which one of the following statements is
correct?

A. The target population is all adolescents in Ethiopia B. The study population is all adolescents in Addis Ababa

C. The sample is adolescents in Lideta Kifle Ketema D. All E. None

47. Which of the following provides the best definition of a frequency when the term is
applied to a dataset?
a. The number of occurrences for a range of values that a variable takes in a data set

b. The number of occurrences for zero values that a variable takes in a data set

c. The number of occurrences for one, or a range of values that a variable takes in a data
set

d. The number of occurrences for the mean value that a variable takes in a data set

e. The number of occurrences of inappropriate values that a variable takes in a data set

48. The interquartile range includes which of the following scores?

A. 50% of the unranked scores B. 25% of the ranked scores C. 75% of the ranked scores

D. 50% of the ranked scores E. 75% of the unranked scores

49. For a set of data that follow a normal distribution, what percentage of scores can one
expect to find within one standard deviation on each side of the mean (that is, two standard
deviations in total)?

a. 54% b. 99% c. 95% d. 88% e. 68%
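The 68% figure comes from the area under the standard normal curve. A short check with the standard-library `statistics.NormalDist` reproduces the familiar 68-95-99.7 rule for 1, 2 and 3 standard deviations:

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Proportion of a normal distribution lying within k SDs on each side of the mean
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, share in within.items():
    print(f"within ±{k} SD: {share:.1%}")
```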

50. The standard error of the mean (SEM) is related to sample size, specifically:

a. As sample size increases, SEM increases b. As sample size increases, SEM stays constant

c. As sample size increases, SEM becomes less stable d. As sample size increases, SEM decreases

e. None of the above

51. The paired-samples t statistic is suitable in the following situation:

a. Comparison of a sample proportion to that of a population proportion of 0.5

b. Comparison of a sample mean to that of a population one, where the sampling distribution is exponential

c. Comparison of a sample distribution to that of a population

d. Comparison of a sample mean of zero to that of a population one over a time period

e. Comparison of a sample mean to that of a population mean of zero

52. The two independent-samples t-statistic is suitable in the following situation:
a. Comparison of two independent sample means where the samples are <30

b. Comparison of two independent sample means where the samples are >30 or normally
distributed

c. Comparison of two independent sample means where the samples are exponentially
distributed

d. Comparison of a sample distribution to that of an independent population

e. Comparison of a specified mean to that of a population one over a time period

53. The p-value (two-sided) associated with the two independent-samples t statistic assumes
the following:

A. Means of the two samples are identical B. Mean of sample one is not equal to that of sample two

C. Mean of sample one is less than that of sample two D. Mean of sample one is greater
than that of sample two

e. None of the above

54. When carrying out a Wilcoxon matched-pairs statistic on a small dataset (i.e., n < 50),
which method of p-value computation is most appropriate?

a. Asymptotic b. Bootstrapped c. Simulated d. Z-score approximation e. Exact method

55. The one parameter model in simple linear regression attempts to?

a. Fit the data to the mean value of the dependent variable

b. Fit the data to the mean value of the independent variable

c. Fit the data within the 95% CI limit, by transforming the x values

d. Fit the data, by transforming the x values to z scores

e. Fit the data by using both intercept and slope parameters

55. The coefficient of determination can be interpreted in a number of ways. Which of the
following is one of them?

a. Proportion of explained variation b. Proportion of unexplained variation (i.e., residual)

c. Proportion of mean variation d. Proportion of variance variation e. Proportion of points on the line
Questions 56 and 57 are based on the following information. Diastolic blood pressure (DBP)
readings for 1,500 males aged 30-69 are shown in the table. The mean DBP is 84 mmHg.

DBP (mmHg)   Frequency
<65          60
65-74        270
75-84        540
85-94        420
95-104       150
105-114      45
≥115         15

56. Those with a DBP of 95 mmHg or above were considered to be hypertensive. Thus,
hypertensive men fell:

A. above the 86th percentile B. below the 14th percentile C. below the 86th percentile D. none
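The percentile in question 56 follows from the frequency table: 210 of the 1,500 men (14%) had a DBP of 95 mmHg or more, so they occupy the top 14% of the distribution, i.e. lie above the 86th percentile. A small sketch of that arithmetic (class labels copied from the table; the last class, printed as >115 in the source, is written here as ">=115"):

```python
# DBP classes (mmHg) and frequencies for the 1,500 men in the table
freq = {
    "<65": 60, "65-74": 270, "75-84": 540, "85-94": 420,
    "95-104": 150, "105-114": 45, ">=115": 15,
}
total = sum(freq.values())
hypertensive = freq["95-104"] + freq["105-114"] + freq[">=115"]  # DBP >= 95 mmHg
share = hypertensive / total

print(f"{hypertensive}/{total} = {share:.0%} hypertensive, "
      f"i.e. above the {1 - share:.0%} point of the distribution")
```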

57. The distribution of DBP for 800 females in this population is nearly symmetric, with the
same standard deviation as that for the males. The mean for females was 79 mmHg. Thus,
we may conclude:

A. the median will be the same for both sexes B. the proportion that is hypertensive is
the same for both sexes

C. the variability of DBP is higher for females D. the variability of DBP is smaller for
females E. none

58. Classification of data according to location is which type of classification?

a) Chronological b) Quantitative c) Qualitative d) Geographical e) Spatial f) d & e

59. Suppose we compared 2 random samples taken from the California all-discharge
database, described as follows: Sample A is a random sample of 100 discharges; Sample B is a
random sample of 2,000 discharges. What can be said about the sample standard error of the
length-of-stay values in Sample A (SEA) relative to the sample standard error in Sample B (SEB)?

A) SEA< SEB b) SEA> SEB c) SEA is exactly equal to SEB

d) Not enough information given to determine relationship between the two standard
errors.
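The answer to question 59 rests on SE = s/√n. Assuming both samples come from the same population and so have roughly the same standard deviation (the value of 5 days below is hypothetical, not from the question), the larger sample has the smaller standard error, so SEA > SEB. The same formula explains question 50: SEM decreases as n increases.

```python
import math

def standard_error(sd: float, n: int) -> float:
    """Standard error of the mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)

sd = 5.0  # hypothetical SD of length of stay (days), assumed equal in both samples
se_a = standard_error(sd, 100)    # Sample A, n = 100
se_b = standard_error(sd, 2000)   # Sample B, n = 2,000

print(f"SE_A = {se_a:.3f}, SE_B = {se_b:.3f}, SE_A / SE_B = {se_a / se_b:.2f}")
```

Because SE scales with 1/√n, the ratio SE_A / SE_B is √(2000/100) ≈ 4.47 whatever common SD is assumed.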

60. The term 'simple' in simple linear regression is because?

a. There is no independent variables b. There is one independent variable

c. There is more than one independent variable d. There are multiple dependent and
independent variables

e. The dependent variable is dichotomous

61. Statistical power is affected by several factors. Which of the following is false?

a. Effect size, increasing it, increases power b. Sample size, increasing it, increases
power

c. Type one error (α), increasing it, increases power d. Type two errors (β), increasing it,
increases power

e. Variance decreasing it, increases power

62. Censored observations are . . .?

a. More important than non-censored ones in survival analysis

b. Are assumed to be normally distributed over time

c. Are assumed to have the same survival chances as uncensored observations

D. Are essential to allow calculation of the Kaplan Meier plot

e. Are allocated to the baseline survival curve

63. A Cox regression analysis . . . (one correct choice)

a. Is used to analyze survival data when individuals in the study are followed for varying
lengths of time.

b. Can only be used when there are censored data

c. Always assumes that the relative hazard for a particular variable is constant at all times
d. Uses the log rank statistic to compare two survival curves

e. Relies on the assumption that the explanatory variables (covariates) in the model are
normally distributed.

64. Simple linear regression has a number of sample data assumptions, what are they?

a. Linearity, Independence, Normality, Unequal variance

b. Linearity, Independence, non-normality, Equal variance

c. Linearity, Independence, Normality, Equal variance

d. Linearity, Independence, Normality, Unequal range between x and y variables

e. Linearity, Independence, Normality, Unequal variance

Answer questions 65-69.

A fisheries researcher wishes to test for a difference in mean weights of a single species of
fish caught by fishermen in three different lakes in Nova Scotia. The significance level for
the test will be 0.05. Complete the following partial ANOVA table and use it to answer
questions 65-69.

Source            Df     SS      MS     F
Between groups           17.03
Within groups     9
Total                    31.23

65. The null hypothesis for this analysis is:

a. Not all the fish populations have the same mean.

b. At least one of the fish populations has a different mean.

c. µ1 = µ2 = µ3 d. µ1 = µ2 = µ3 = 0 e. None of these.

66. The value of FDATA for this test is: a. 8.52 b. 5.39 c. 2.00 d. 0.1854

67. The value of FCRIT for this test is: a. 3.5874 b. 3.8625 c. 3.9824 d. 4.2565

68. If you pooled all the individuals from all three lakes into a single group, they would
have a standard deviation of: a. 1.257 b. 1.580 c. 3.767 d. 14.19
69. What is the appropriate interpretation of this test?

a. Reject H0: All three fish populations have different mean weights.

b. Reject H0: Exactly two of the three fish populations have the same means.

c. Reject H0: At least one of the fish populations differs from the others in terms of their
mean weight.

d. Fail to reject H0: The mean weights of the fish in these three populations are the same

e. Fail to reject H0: There is insufficient evidence for differences in mean weights of the
fish from these three populations.
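A sketch of how the ANOVA table for questions 65-69 can be completed. It assumes that 17.03 is the between-groups SS and 9 the within-groups df: with three lakes the between-groups df must be k - 1 = 2, and this reading is the one consistent with the answer choices. The F statistic comes out near 5.4, above the tabulated F(0.05; 2, 9) ≈ 4.26 listed in question 67, so H0 is rejected (option c of question 69).

```python
# Assumed reading of the partial table: k = 3 lakes,
# SS(between) = 17.03, df(within) = 9, SS(total) = 31.23.
k = 3
df_within = 9
n_total = df_within + k          # df(within) = N - k  ->  N = 12 fish
df_between = k - 1               # 2
df_total = n_total - 1           # 11

ss_between = 17.03
ss_total = 31.23
ss_within = ss_total - ss_between    # SS(total) = SS(between) + SS(within)

ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_data = ms_between / ms_within

F_CRIT = 4.2565  # tabulated F(0.05; 2, 9), the value listed in question 67
print(f"MS_between = {ms_between:.3f}, MS_within = {ms_within:.3f}")
print(f"F = {f_data:.2f} vs F_crit = {F_CRIT}; reject H0: {f_data > F_CRIT}")
```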

70. Having two sets of data, we can compare their scattering as follows:

A. For approximately equal average values, the one with a higher standard deviation is
more scattered

B. For approximately equal standard deviation values, the one with a higher average is
more scattered

C. For approximately equal standard deviation values, the one with a lower average is
more scattered

D. If both the averages and standard deviations differ much between the series, we can
compare scattering using the coefficient of variation E. all

71. The relationship between the number of beers consumed (x) and blood alcohol content (y)
was studied in 16 male college students using least squares regression. The following
regression equation was obtained from this study: ŷ = -0.0127 + 0.0180x. The above
equation implies that:

A. each beer consumed increases blood alcohol by 1.27%

B. on average it takes 1.8 beers to increase blood alcohol content by 1%

C. each beer consumed increases blood alcohol by an average of 1.8%

D. each beer consumed increases blood alcohol by exactly 0.018

72. The coefficient of correlation

A. is the square of the coefficient of determination B. is the square root of the coefficient of determination

C. is the same as r-squared D. can never be negative


73. Two lists of numbers, X and Y, have a correlation of 0.3; X and Z have a correlation of
-0.7. We know that:

A. the stronger correlation is the correlation of X and Y, since it is positive.

B. the stronger correlation is the correlation of X and Z.

C. the two correlations are equally strong, since 1.0 - 0.7 = 0.3

D. We cannot tell which is stronger without more information

74. If a regression has the problem of heteroscedasticity,

A. The predictions it makes will be wrong on average.

B. The predictions it makes will be correct on average, but we will not be certain of the
RMSE

C. It will also have the problem of an omitted variable or variables.

D. It will also be based on a non-linear equation.

75. Which of the following is true about statistical data?

A) Always denote figures B) May be a single fact C) Are only mathematically correct

D) Are used only by experts E) A and D F) B and C

76. All of the following are true about measures of fertility except:

A. the numerator of both the CBR and the GFR is the number of live births in a year

B. NRR is always lower than GRR and less than half of TFR

C. if NRR = 1.00, then CBR = CDR and the population growth rate is zero

D. GRR, like TFR, assumes a hypothetical cohort of women passing from birth through reproductive
life without experiencing mortality E. GRR measures the production of females

77. Sample space:

A. is any process that generates well-defined outcomes B. is any subset of an experiment

C. is the conduct of any random experiment D. is the set of all possible outcomes of an
experiment

E. is an experiment that can be repeated any number of times under the same conditions with
the same results
78. All are true about the Poisson probability distribution except:

A. it is always right-skewed B. it is nearly symmetric when the mean is small

C. the mean, variance and lambda are equal D. events happen randomly and independently
in time at a constant rate

E. it has no theoretical maximum value, but the probabilities tend towards zero

79. Suppose X can take on any of the values 1, 2, 3 and 4. If P(1) = 0.002, P(2) = 0.1021
and P(3) = 0.8801, then P(4) is equal to:

A. 0.0158 B. 0.1058 C. 0.0237 D. 0.0957 E. 0.158
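Because the probabilities of all possible values of a discrete random variable must sum to 1, P(4) is whatever is left over; a one-line check:

```python
# Given probabilities for X = 1, 2, 3 (question 79)
p = {1: 0.002, 2: 0.1021, 3: 0.8801}

p4 = 1 - sum(p.values())  # the four probabilities must sum to 1
print(f"P(4) = {p4:.4f}")
```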

80. Suppose that X has a Poisson distribution with parameter λ = 4.7. Which of the following
is true?

A. P(X = 5) = 0.174 B. P(X ≥ 2) = 0.848 C. P(X < 2) = 0.052 D. SD = 2.168 E. all
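Each option in question 80 can be checked from the Poisson formula P(X = x) = e^(-λ) λ^x / x!. With λ = 4.7 this sketch gives P(X = 5) ≈ 0.174, P(X < 2) ≈ 0.052 and SD = √λ ≈ 2.168, but P(X ≥ 2) ≈ 0.948 rather than the 0.848 printed in option B, so option B does not hold as printed.

```python
import math

lam = 4.7  # Poisson parameter from question 80

def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for a Poisson(lam) random variable."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

p5 = poisson_pmf(5, lam)
p_lt2 = poisson_pmf(0, lam) + poisson_pmf(1, lam)  # P(X < 2) = P(0) + P(1)
p_ge2 = 1 - p_lt2                                   # complement rule
sd = math.sqrt(lam)                                 # Poisson variance equals lambda

print(f"P(X=5) = {p5:.3f}, P(X<2) = {p_lt2:.3f}, P(X>=2) = {p_ge2:.3f}, SD = {sd:.3f}")
```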

81. Which one of the following is not true for the standard normal distribution (z > 0)?

A. P(Z ≤ -z) = P(Z ≥ z) B. P(Z = x) = 0 C. P(Z ≥ 0) = P(Z ≤ 0) = 0.5

D. P(-z ≤ Z ≤ z) = 1 - 2P(Z ≤ z) E. P(0 ≤ Z ≤ z) = ½ P(-z ≤ Z ≤ z)
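The identities in question 81 can be spot-checked numerically for one positive cut-off (z = 1.5 here, chosen arbitrarily) with `statistics.NormalDist`. Options A, C and E hold, while D as written fails: the central area equals 1 - 2P(Z ≥ z), not 1 - 2P(Z ≤ z). Option B holds for any continuous distribution, since a single point has zero probability. Option E is read here as P(0 ≤ Z ≤ z) = ½ P(-z ≤ Z ≤ z).

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal
z = 1.5           # arbitrary positive cut-off used for illustration

central = Z.cdf(z) - Z.cdf(-z)  # P(-z <= Z <= z)
checks = {
    "A": math.isclose(Z.cdf(-z), 1 - Z.cdf(z)),                           # P(Z <= -z) = P(Z >= z)
    "C": math.isclose(1 - Z.cdf(0), 0.5) and math.isclose(Z.cdf(0), 0.5),  # symmetric about 0
    "D": math.isclose(central, 1 - 2 * Z.cdf(z)),                         # as written in option D
    "E": math.isclose(Z.cdf(z) - Z.cdf(0), central / 2),                  # half the central area
}
print(checks)
```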

82. Sampling unit

A. Is the same as the study unit B. Is the unit on which information is collected

C. Is the unit of selection in the sampling process

D. is the list of all the units in population from which a sample is to be picked

E. is the ratio of total size of population to the size of sample

83. The design effect is common in:

A. simple random sample B. stratified sample C. systematic sample D. cluster sample
E. multiple sample F. D & E

84. Which of the following is true about the chi-square distribution and test?

A. It has only a right tail B. The χ² curves become more bell-shaped as the degrees of freedom increase
C. Observations are measured independently of each other

D. Degrees of freedom equal (R-1)(C-1) E. The expected frequency must be at least five
F. All
85. All are true about the t-distribution test except:

A. σ, as in the Z-distribution, is known B. It is more spread out than the Z-distribution

C. It approaches the standard normal distribution as the degrees of freedom increase

D. n should be less than 30 and t less than 5 E. All are true except A

86. All are true about hypothesis testing except:

a. H0 is always about a parameter of the sample b. HA is an equality rather than an inequality

c. The normality of the population distribution, equality of variances and independence of
samples are the assumptions

d. The hypothesis has an intervention effect e. A type I (alpha) error is made when HA is
false but accepted f. c & d

87. All are true about the probability sampling methods except

A. usually more expensive than non -probability sampling methods

B. Unlike simple random sampling, systematic sampling can be conducted without a sampling
frame

C. Unlike stratified sampling, cluster sampling requires the clusters to be homogeneous

D. Like multistage random sampling, cluster sampling involves picking a random sample of all
units in the clusters

E. The main disadvantage of stratified sampling is that it requires more administrative effort than
simple random sampling

88. If Var(2X + 3) = 16, then SD(3X) is equal to: A. 4 B. 6 C. 2 D. 8 E. 6√2
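Question 88 uses two rules for linear transformations: Var(aX + b) = a²Var(X) and SD(aX) = |a|SD(X). Working them through:

```python
import math

var_2x_plus_3 = 16

# Var(aX + b) = a^2 Var(X): adding 3 shifts the values but does not change their spread
var_x = var_2x_plus_3 / 2 ** 2   # Var(X) = 16 / 4 = 4
sd_3x = 3 * math.sqrt(var_x)     # SD(aX) = |a| SD(X) = 3 * 2

print(f"Var(X) = {var_x}, SD(3X) = {sd_3x}")
```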

89. All of the following are incorrect about a histogram except:

A. It is used to display qualitative or quantitative discrete data

B. It uses non-overlapping intervals that cover the entire range of data values

C. It consists of one or more variables D. It is inferior to frequency polygons for comparing two
or more sets of data

E. It is important in depicting the shape and location of central tendency F. B & E

90. All of the following are untrue about the interval scale of measurement except:
A) All mathematical operations are possible B) Comparison is possible C) There is a true
zero point

D) It is qualitative data E) It is the most precise scale

91. All are true about measures of association except:

A. They only tell the association between categories B. If RR < 1, then the risk is preventable

C. RR estimates are good measures of public health impact

D. The odds ratio is calculated from cross-sectional, case-control and retrospective cohort studies

E. RR is calculated from prospective cohort and experimental studies F. A & C

92. If the mean, median and mode of a distribution are 5, 6, 7 respectively, then the
distribution is:

A. skewed negatively B. not skewed C. skewed positively D. symmetrical E. Bimodal

93. Which of the following measures of variability is not dependent on the exact value of
each observation?

A. Range B. Variance C. standard deviation D. coefficient of variation

94. The median of a frequency distribution is found graphically with the help of:

a) Histogram b) Frequency curve c) Frequency polygon d) Ogive

94. The mean deviation about the median for the data 340, 150, 210, 240, 300, 310 and 320
is:

a) 51.6 b) 51.8 c) 52 d) 52.8
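The mean deviation about the median is the average of |x - median|. Sorting the seven values gives a median of 300, and the absolute deviations sum to 370, so the result is 370/7 ≈ 52.86 (option d lists it truncated as 52.8):

```python
from statistics import median

data = [340, 150, 210, 240, 300, 310, 320]

m = median(data)                                      # middle value of the sorted data
mean_dev = sum(abs(x - m) for x in data) / len(data)  # average absolute deviation from the median

print(f"median = {m}, mean deviation about the median = {mean_dev:.3f}")
```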

95. The classes in which the lower limit or the upper limit is not specified are known as:

a) Open end classes b) Close end classes c) Inclusive classes d) Exclusive classes

Answer questions 96 and 97 based on the following data which were extracted from the
annual death records of three adjacent districts (1998 Eth. Cal)

Name of district X Y Z

Total no. of deaths from all causes 935 1650 1080

No of deaths due to malaria 187 495 270

96. Which of the following is true about the above data?

a) The highest crude death rate was observed in district Y

b) The lowest crude death rate was observed in district X

c) The crude death rate of district Z was less than the crude death rate of district Y d) All
of the above

e) None of the above

97. Which of the following best describes the proportion of deaths due to malaria in District Y?

a) The proportionate mortality ratio was 30%. b) The cause specific mortality rate was
30%.

c) The fatality rate was about 30%. d) District Y had seen the highest infant mortality
rate.

e) None of the above
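The table gives deaths only, with no population denominators, so crude death rates (question 96) cannot be computed from it. What it does support is the proportionate mortality ratio: deaths from one cause divided by deaths from all causes. For District Y this is 495/1650 = 30%:

```python
deaths_all = {"X": 935, "Y": 1650, "Z": 1080}
deaths_malaria = {"X": 187, "Y": 495, "Z": 270}

# Proportionate mortality ratio: deaths from malaria / deaths from all causes
pmr = {d: deaths_malaria[d] / deaths_all[d] for d in deaths_all}

for district, value in pmr.items():
    print(f"District {district}: PMR (malaria) = {value:.0%}")
```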

98. Which of the following is correctly matched?

A. Kurtosis/Modality => peakedness of a distribution B. Skewness => shape/symmetry of a
distribution

C. Dispersion => spread of the values of the variable D. Location => average value of the
variable E. All

99. Of the 140 children, 20 lived in owner occupied houses, 70 lived in council houses and
50 lived in private rented accommodation. Appropriate graphical presentation will be:

a. Simple bar chart b. multiple bar chart c. Component bar chart d. Histogram e.
b and c

100. Which of the following is NOT a possible value of the correlation coefficient?

A. negative 0.9 B. zero C. positive 0.15 D. positive 1.5 E. negative .05

101. The regression line is drawn so that:

A. The line goes through more points than any other possible line, straight or curved

B. The line goes through more points than any other possible straight line.

C. The same number of points is below and above the regression line.

D. The sum of the absolute errors is as small as possible.


E. The sum of the squared errors is as small as possible

102. A residual plot:

A. displays residuals of the explanatory variable versus residuals of the response variable.

B. displays residuals of the explanatory variable versus the response variable.

C. displays explanatory variable versus residuals of the response variable.

D. displays the explanatory variable versus the response variable.

E. displays the explanatory variable on the x axis versus the response variable on the y axis.

Answer questions 103-108. A variety of summary statistics were collected for a small sample
(n = 10) of bivariate data, where the dependent variable was y and an independent variable
was x.

ΣX = 90, Σ(Y − Ȳ)(X − X̄) = 466

ΣY = 170, Σ(X − X̄)² = 234

n = 10, Σ(Y − Ȳ)² = 1434

SSE = 505.98

103. The sample correlation coefficient: a. 0.8045 b. -0.8045 c. 0 d. 1

104. The least squares estimate of b1 equals a. 0.923 b. 1.991 c. -1.991 d. -0.923

105. The least squares estimate of b0 equals a. 0.923 b. 1.991 c. -1.991 d. -0.923

106. The sum of squares due to regression (SSR) is a. 1434 b. 505.98 c. 50.598 d. 928.02

107. The coefficient of determination equals a. 0.6471 b. -0.6471 c. 0 d. 1

108. The point estimate of y when x = 0.55 is a. 0.17205 b. 2.018 c. 1.0905 d. -2.018 e. -0.17205
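All six answers for questions 103-108 follow from the given sums using the standard least-squares formulas: b1 = Sxy/Sxx, b0 = ȳ - b1x̄, r = Sxy/√(Sxx·Syy), SSR = SST - SSE and R² = SSR/SST, where Sxy = Σ(X − X̄)(Y − Ȳ), Sxx = Σ(X − X̄)² and Syy = SST = Σ(Y − Ȳ)². Carrying unrounded coefficients gives ŷ ≈ 0.172 at x = 0.55; option a (0.17205) comes from rounding b0 and b1 to three decimals first.

```python
import math

# Summary statistics given for questions 103-108
n, sum_x, sum_y = 10, 90, 170
sxy = 466    # Σ(X - X̄)(Y - Ȳ)
sxx = 234    # Σ(X - X̄)²
syy = 1434   # Σ(Y - Ȳ)², the total sum of squares (SST)
sse = 505.98

x_bar, y_bar = sum_x / n, sum_y / n   # 9.0 and 17.0
r = sxy / math.sqrt(sxx * syy)        # Q103: sample correlation
b1 = sxy / sxx                        # Q104: slope
b0 = y_bar - b1 * x_bar               # Q105: intercept
ssr = syy - sse                       # Q106: SST = SSR + SSE
r_squared = ssr / syy                 # Q107: coefficient of determination
y_hat = b0 + b1 * 0.55                # Q108: prediction at x = 0.55

print(f"r = {r:.4f}, b1 = {b1:.3f}, b0 = {b0:.3f}")
print(f"SSR = {ssr:.2f}, R^2 = {r_squared:.4f}, y_hat(0.55) = {y_hat:.4f}")
```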

109. The purpose of simple linear regression analysis is to:

(a) Predict one variable from another variable

(b) Replace points on a scatter diagram by a straight-line

(c) Measure the degree to which two variables are linearly associated
(d) Obtain the expected value of the independent random variable for a given value of the
dependent variable

110. When the possible outcomes of an experiment are equally likely to occur, we
apply:

(a) Relative probability (b) Subjective probability (c) Conditional probability (d) Classical
probability

111. The result of a statistical test, denoted p, shall be interpreted as follows:

A. the null hypothesis H0 is rejected if p <0.05 B. the null hypothesis H0 is rejected if p>
0.05

C. the alternate hypothesis H1 is rejected if p> 0.05 D. the null hypothesis H0 is accepted
if p <0.05

112. Which of the following is not a property of a binomial experiment?

A. the experiment consists of a sequence of n identical trials

B. each outcome can be referred to as a success or a failure

C. the probabilities of the two outcomes can change from one trial to the next

D. the trials are independent

113. In point estimation

A. data from the population is used to estimate the population parameter

B. data from the sample is used to estimate the population parameter

C. data from the sample is used to estimate the sample statistic

D. the mean of the population equals the mean of the sample

114. A variable that takes on the values of 0 or 1 and is used to incorporate the effect of
qualitative variables in a regression model is called

A. an interaction B. a constant variable C. a dummy variable

D. None of these alternatives is correct.

115. The ANOVA procedure is a statistical approach for determining whether or not
a. the means of two samples are equal b. the means of two or more samples are equal

c. the means of more than two samples are equal d. the means of two or more populations are equal

116. An important application of the chi-square distribution is

a. making inferences about a single population variance b. testing for goodness of fit

c. testing for the independence of two variables d. All of these alternatives are correct.

117. Parametric tests differ from nonparametric tests in that:

A. parametric tests are distribution-based, whereas nonparametric tests are distribution-free (arbitrary).

B. the mean is used in parametric tests, while the median can be used in nonparametric tests.

C. complete information about the population can be obtained from parametric tests but not from nonparametric tests.

D. Spearman’s rank correlation is used in parametric tests, while Pearson’s coefficient of correlation is used in nonparametric tests, for measuring the degree of association.

E. none

Answer question 118 based on the following table describing leukemia survival by X.

118. Which of the following is true?

A. The COR is 7.88 B. The 95% CI of β is between 0.302 and 3.824

C. The 95% CI of exp(β) is between 1.353 and 45.832 D. The equation of the fitted model is logit p(x) = -1.946 + 2.046x E. All of the above

119. In a simple random sample (SRS) of n = 100 X residents, 38 said that they had attended a football game this year. Which value is closest to the margin of error for a 95 percent confidence interval for the proportion of X residents who have attended a game this year?

(a) 0.01 (b) 0.10 (c) 0.38 (d) 0.62
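A minimal sketch of the arithmetic behind question 119, assuming the usual large-sample (Wald) margin of error z·sqrt(p̂(1 − p̂)/n); the helper name `proportion_margin_of_error` is hypothetical and used only for illustration.

```python
import math
from statistics import NormalDist


def proportion_margin_of_error(successes: int, n: int, confidence: float = 0.95) -> float:
    """Wald margin of error z * sqrt(p_hat * (1 - p_hat) / n) for a sample proportion."""
    p_hat = successes / n
    # Two-sided critical value, about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


moe = proportion_margin_of_error(38, 100)
print(round(moe, 3))
```

The result is about 0.095, so option (b) 0.10 is the closest value.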

120. The mean of a binomial distribution is 10 and the number of trials is 30. The probability of failure is A. 0.333 B. 0.666 C. 0.9 D. 0.25
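For question 120, the binomial mean is np, so p = 10/30 and the failure probability is q = 1 − p. A short sketch using exact fractions:

```python
from fractions import Fraction

mean, n = 10, 30
p = Fraction(mean, n)  # success probability, because mean = n * p
q = 1 - p              # probability of failure
print(q, float(q))
```

This gives q = 2/3 ≈ 0.667, which corresponds to option B (0.666).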

121. Bar charts may be distinguished from histograms at a glance because:

A. bar charts are not used for time series data B. histograms are used to display discrete data

C. bar charts are based on area under the curve D. histograms do not have spaces between consecutive columns

E. none of the above

Answer question 122 based on the following life table of parathyroid cancer survival (in years).

Year (x) | Number at start of year (nx) | Withdrawn during year (wx) | At risk (rx) | Deaths (dx) | Prob. of death (qx) | Prob. of surviving year x (px) | Cumulative prob. of surviving x years (Px)
1 0 1
2 1 0
3 1 1
4 2 1
5 1 0
122. Among the following which one is wrong?

A. The median survival time is 0.472727273

B. The Probability of death at year 5 is 0

C. The Probability of surviving year 2 is 1

D. The Cumulative probability of surviving at 3 year is 0.709090909


E. none

123. All of the following are limitations of the Kaplan-Meier method except:

A. Mainly descriptive B. Doesn’t control for covariates C. Requires categorical predictors

D. Can’t accommodate time-dependent variables E. None

124. Investigators studied a cohort of individuals who joined a weight-loss program by tracking their weight loss over 1 year. Which of the following statistical tests is likely the most appropriate for evaluating the effectiveness of the weight-loss program?

a. A two-sample t-test. b. ANOVA c. Chi-square d. Kaplan-Meier methods

125. Which property of a good estimator refers to an estimator that uses information from all observations in the sample? A. Sufficiency B. Efficiency C. Consistency D. Unbiasedness E. All

126. All of the following are possible values of probability except: a. 0.99 b. 0.1
c. -0.77 d. 0

127. A Type I error is

a. Rejecting a true null hypothesis b. Rejecting a false null hypothesis

c. Accepting a true null hypothesis d. Accepting a false null hypothesis

128. All of the following are true about the arithmetic mean except:

A. it is greatly affected by extreme values B. it is easily computed from open-ended intervals

C. it is the center of gravity of the data D. it is unique and measures the central location

E. All are correct except B

129. A wider CI from a sample is associated with:

A. a larger margin of error B. a smaller sample size

C. more variation among individual values D. a less precise estimate E. All of the above

130. The total fertility rate=5 and the ratio of male to female =500:1, then the GRR is equal
to

a. 2.5 b. 4.99 c. 5 d. 3.5 e. 6


131. Among older coronary patients, serum cholesterol levels were found to be approximately normally distributed: 10% of the group had a cholesterol level below 182.3 mg per 100 ml, whereas 5% had values above 359.0 mg per 100 ml. The mean and standard deviation (in mg per 100 ml), respectively, are

a. 260, 60 b. 180, 90 c. 320, 60 d. 380, 80 e. 260, 45
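Question 131 can be solved by turning each percentile into a standard normal z-value and solving the two linear equations μ + z·σ = x for μ and σ. A small sketch, assuming exact normality as the question states:

```python
from statistics import NormalDist

# z below which 10% of a standard normal lies (about -1.2816)
z_low = NormalDist().inv_cdf(0.10)
# z above which 5% lies, i.e. the 95th percentile (about 1.6449)
z_high = NormalDist().inv_cdf(0.95)

# Solve mu + z_low * sigma = 182.3 and mu + z_high * sigma = 359.0
sigma = (359.0 - 182.3) / (z_high - z_low)
mu = 182.3 - z_low * sigma
print(round(mu, 1), round(sigma, 1))
```

The solution is roughly μ ≈ 260 and σ ≈ 60 (option a).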

Answer questions 132 and 133 based on the following data. Suppose that in a certain malarious area, past experience indicates that the probability that a person with a high-grade fever will be positive for malaria is 0.7. Consider three randomly selected patients in the same area.

132. The probability that at most two patients are positive for malaria is a. 0.441 b. 0.343 c. 0.657 d. 0.189 e. 0.027

133. The mean and standard deviation of the number of patients positive for malaria, respectively, are

A. 2.1, 0.794 B. 4.2, 0.8 C. 0.193, 0.794 D. 2.3, 0.795 E. none
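Questions 132 and 133 treat the number of positive patients as binomial with n = 3 and p = 0.7. A minimal sketch that computes P(X ≤ 2) from the probability mass function, together with the mean np and standard deviation sqrt(np(1 − p)):

```python
import math

n, p = 3, 0.7


def binom_pmf(k: int) -> float:
    """P(X = k) for a binomial(n, p) count."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


# P(X <= 2), equivalently 1 - P(X = 3) = 1 - 0.7**3
p_at_most_two = sum(binom_pmf(k) for k in range(0, 3))
mean = n * p
sd = math.sqrt(n * p * (1 - p))
print(round(p_at_most_two, 3), round(mean, 1), round(sd, 3))
```

This gives about 0.657 for question 132 and a mean of 2.1 with a standard deviation of about 0.794 for question 133.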

166. A researcher is interested in the travel time of UOG students to college. A group of 50 students is interviewed. Their mean travel time is 16.7 minutes. For this study, the mean of 16.7 minutes is an example of a(n) A. Parameter B. Statistic C. Population D. Sample

167. A sports psychologist was interested in the effects of a six-week imagery intervention on an athlete’s ability to execute a sport-specific skill such as penalty taking in football. How might you define the imagery variable?

A. Independent variable B. Dependent variable C. Outcome variable D. Resultant variable

168. A population has a mean of μ=35 and a standard deviation of σ=5. After 3 points are
added to every score in the population, what are the new values for the mean and standard
deviation?

A. μ=35 and σ=5 B. μ=35 and σ=8 C. μ=38 and σ=5 D. μ=38 and σ=8
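The effect asked about in question 168 can be checked on any population. The sketch below uses a hypothetical four-value population (not from the question) chosen so that μ = 35 and σ = 5; adding a constant shifts the mean by that constant but leaves the standard deviation unchanged.

```python
from statistics import mean, pstdev

# Hypothetical population with mu = 35 and sigma = 5 (illustration only)
population = [30, 40, 30, 40]
shifted = [x + 3 for x in population]  # add 3 points to every score

print(mean(population), pstdev(population))  # original mean and SD
print(mean(shifted), pstdev(shifted))        # mean rises by 3, SD unchanged
```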

169. What is a definition of the standard error?

A. Standard deviation of the sample. B. Squared standard deviation. C. Standard deviation of sample means.

D. Standard deviation of the population mean.

169. A research report summarizes the results of a t-test by stating: t(35)=5.2, p<0.05.
Which of the following is a correct interpretation of this report?
A. The H0 was not rejected and the probability of a Type I error is less than .05.

B. The H0 was not rejected and the probability of a Type II error is less than .05.

C. The H0 was rejected and the probability of a Type I error is less than .05.

D. The H0 was rejected and the probability of a Type II error is less than .05

170. Which of the following is true about a 95% confidence interval of the mean of a given
sample?

A. 95 out of 100 sample means will fall within the limits of the confidence interval.

B. There is a 95% chance that the population mean will fall within the limits of the
confidence interval.

C. 95 out of 100 population means will fall within the limits of the confidence interval.

D. There is a .05 probability that the population mean falls within the limits of the
confidence interval.

171. In an independent t-test output of SPSS, the Levene’s test result is p = .006. What can
we infer from this number?

A. The means of both groups are assumed to be unequal.

B. The means of both groups are assumed to be equal.

C. The variances of both groups are assumed to be unequal.

D. The variances of both groups are assumed to be equal

172. Which of the following statements is the most accurate description for the concept of
standard deviation?

A. The total distance from the smallest score to the highest score.

B. The square root of the total distance from the smallest score to the highest score.

C. The squared average distance between all scores and the mean.

D. The average distance between a score and the mean.

173. All are correctly matched except?

A. Coefficient of Variation=>ratio measurement of inferential statistics

B. ANOVA=>nominal measurement of inferential statistics


C. Geometric Mean=>ratio measurement of descriptive statistics

D. Chi-square=> nominal measurement of descriptive statistics

E. Percentages=>ordinal measurement of descriptive statistics

174. A distribution that is more peaked than the normal distribution is termed

A. platykurtic B. leptokurtic C. mesokurtic D. right skewed E. Moment

175. For the data set 4, 4, 4, 4, 4, 4, which of the following is true?

A. mean, mode, median and midrange are equal.

B. Quartile deviation, variance, SD and relative range are equal.

C. range, relative range, midrange and Interquartile range are equal

D. Arithmetic, geometric and harmonic mean are equal

E. none

176. The following are percentages of fat found in 5 samples of each of two brands of baby
food:

A: 5.7, 4.5, 6.2, 6.3, 7.3

B: 6.3, 5.7, 5.9, 6.4, 5.1

Which of the following procedures is appropriate to test the hypothesis of equal average fat content in the two brands of baby food?

A) Paired t-test with 5 df B) Two sample t-test with 8 df

C) Paired t-test with 4 df D) Two sample t-test with 9 df E) Sign test
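Because the two brands in question 176 are independent samples of 5 each, a pooled two-sample t-test has n1 + n2 − 2 = 8 degrees of freedom. A sketch of the pooled-variance statistic, assuming equal population variances:

```python
import math
from statistics import mean, variance

brand_a = [5.7, 4.5, 6.2, 6.3, 7.3]
brand_b = [6.3, 5.7, 5.9, 6.4, 5.1]

n1, n2 = len(brand_a), len(brand_b)
df = n1 + n2 - 2  # degrees of freedom for the pooled two-sample t-test

# Pooled variance: weighted average of the two sample variances (n - 1 denominators)
pooled_var = ((n1 - 1) * variance(brand_a) + (n2 - 1) * variance(brand_b)) / df
t_stat = (mean(brand_a) - mean(brand_b)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
print(df, round(t_stat, 3))
```

A paired test would not apply here, since no sample of brand A is matched to a particular sample of brand B.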

177. A process by which we estimate the value of a dependent variable on the basis of one or more independent variables is called: (a) Correlation (b) Regression (c) Residual (d) Slope

178. The dividing point between the region where the null hypothesis is rejected and the region where it is not rejected is called the: (a) Critical region (b) Critical value (c) Acceptance region (d) Significant region
179. Suppose we had a much simpler regression, obtained with the command >>> regress(gdp85 ~ gdp60)

Coefficients:   Estimate   t-stat   95% CI
(Intercept)     3.83       2.4      (0.63, 7.03)
gdp60           1.04       21.9     (0.94, 1.13)

SE of regression (RMSE) = 10.7684   R-squared = 0.8267
If GDP in 1960 were 20 percent of US GDP (so gdp60 were 20), our prediction for gdp85
would be closest to:
A. 20 B. 25 C. 30 D. 40 E. 50
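The prediction in question 179 is just the fitted line ŷ = b0 + b1·x evaluated at gdp60 = 20, using the estimates from the output above:

```python
intercept, slope = 3.83, 1.04  # estimates from the regression output
gdp60 = 20

prediction = intercept + slope * gdp60  # 3.83 + 1.04 * 20
print(round(prediction, 2))
```

The predicted gdp85 is about 24.6, closest to option B (25).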

180. An absolute measure of dispersion which expresses variation in the same units as the
original data is the:

A. Standard deviation B. Coefficient of variation C. Variance D. All of the above

181. How does the computation of a sample variance differ from the computation of a
population variance?

A. μ is replaced by x̄ B. N is replaced by n - 1 C. N is replaced by n D. A and C but not B E. A and B but not C
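Python's `statistics` module illustrates the difference asked about in question 181: `pvariance` divides the squared deviations from μ by N, while `variance` centers on the sample mean x̄ and divides by n − 1. The data values below are hypothetical.

```python
from statistics import pvariance, variance

# Hypothetical data, used only to compare the two formulas
data = [2, 4, 4, 4, 5, 5, 7, 9]

pop_var = pvariance(data)  # sum of squared deviations / N
samp_var = variance(data)  # sum of squared deviations from x-bar / (n - 1)
print(pop_var, round(samp_var, 3))
```

Here the sum of squared deviations is 32, so the population formula gives 32/8 = 4 and the sample formula gives 32/7 ≈ 4.571.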

182. The algebraic sum of the deviations of a set of n values from their mean is a) 0 b) n – 1 c) n d) n + 1 e) none

Answer question 183 based on the following table, in which six paintings were ranked by two judges.

Painting First judge (X) Second judge (Y)

A 2 2
B 1 3
C 4 4
D 5 6
E 6 5
F 3 1
183. The Spearman’s rank correlation coefficient is A. 0.71 B. 0.29 C. 0.42 D. 0.58
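With no tied ranks, Spearman's coefficient for question 183 can be computed from the rank differences as r_s = 1 − 6Σd² / (n(n² − 1)). A short sketch using the judges' ranks from the table:

```python
judge_x = [2, 1, 4, 5, 6, 3]  # first judge, paintings A-F
judge_y = [2, 3, 4, 6, 5, 1]  # second judge, paintings A-F

n = len(judge_x)
# Sum of squared rank differences: 0 + 4 + 0 + 1 + 1 + 4
d_squared = sum((x - y) ** 2 for x, y in zip(judge_x, judge_y))
rho = 1 - 6 * d_squared / (n * (n**2 - 1))
print(d_squared, round(rho, 2))
```

This gives Σd² = 10 and r_s ≈ 0.71 (option A).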

184. Select the correct statement concerning the p-value:

a. The p-value provides insight on how clinically important the study result is.

b. The p-value is the probability that the null hypothesis is correct.


c. The p-value can be used to get a general estimate of the study’s power.

d. Assessing multiple comparisons will not change the value of each individually calculated
p-value.

e. The p-value is the probability of avoiding a Type I error

185. A _____________ is a suitable measure of central tendency (MCT) when the data pertain to rates and time.

A. Weighted mean B. Arithmetic mean C. Harmonic mean D. geometric mean

Part III. Short answer

186. What factors affect the power of a test, and what is the effect of each?

187. What criteria do we use to assess the validity of a chi-squared test? When do you use a correction for continuity in a chi-squared test?

188. What is the p-value of a test? What does "statistically significant" mean?

189. Why do we sometimes need to transform data before doing further analysis? List the different techniques of data transformation and the conditions on the data under which each is used.

190. What are the differences between paired and independent samples?

PART III Workout

191. A cross sectional survey was carried out among women of age group 20-60 yrs. to
determine whether there is an association between history of multiple sexual partners and
cervical cancer. Can you conclude from the survey results shown below that there is no
association between the two?

A. State the null and alternative hypothesis

B. calculate the test statistic

C. State your decision

D. Make a conclusion
192. Answer the following questions based on the two SPSS outputs below, which investigate whether there is a difference in dietary sodium level between men and women.

a) What is the outcome variable? b) What type of variable is it?

c) What is the null hypothesis for this research question?

d) What sort of graph could be used to investigate this research question?

e) The mean sodium intake is 3073.7 for males (95% CI 3006.6 to 3140.9) and 2692.1 for females (95% CI 2634.7 to 2729.7). Is there a difference in sodium levels between males and females? Explain your answer.

f) Using the SPSS output below, describe the relationship between sex and sodium intake?

g) As a result of the SPSS output, do you accept or reject your null hypothesis?

193. Answer the following questions based on this cancer drug trial: 37 patients were randomized to the treatment group and 32 patients to the control group. Their survival times (until death) were measured in months, and some observations are censored. (Variables: group, sex, and age)

Explanatory variable   Hazard ratio   95% CI of β   95% CI of exp(β)   P-value
Group                                                                  <0.0001
  Control              1.0
  Treatment            0.1052         A             B
Sex                                                                    0.4342
  Male                 1.0
  Female               C              D             0.732-1.366
Age                    1.127          E             F                  0.002
A) What are the values of A and B?
B) What are the values of C and D?
C) What are the values of E and F?
D) Identify the significant variable(s) and interpret them.
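In a Cox model the hazard ratio is exp(β), so a CI for β maps to a CI for the hazard ratio by exponentiation and back by taking logs. The sketch below assumes a Wald-type CI that is symmetric on the log scale; under that assumption the female hazard ratio (C) is the geometric midpoint of 0.732 and 1.366, and D is the log of those limits. A, B, E and F also require standard errors that the table does not show, so the sketch only converts the given hazard ratios to β.

```python
import math

# Female row: only the 95% CI of exp(beta) is given
lower, upper = 0.732, 1.366
ci_beta_female = (math.log(lower), math.log(upper))  # candidate value for D
beta_female = sum(ci_beta_female) / 2                # midpoint on the log scale
hr_female = math.exp(beta_female)                    # candidate value for C (about 1.0)

# beta = ln(hazard ratio) for the rows where the hazard ratio is given
beta_treatment = math.log(0.1052)  # about -2.25
beta_age = math.log(1.127)         # about 0.12

print(round(hr_female, 3), [round(b, 3) for b in ci_beta_female])
print(round(beta_treatment, 3), round(beta_age, 3))
```

A hazard ratio close to 1 for females is consistent with the non-significant p-value of 0.4342 for sex.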

194. A study was conducted to investigate the possible cause of a gastroenteritis outbreak following a lunch served in a high school cafeteria. Among the 225 students who ate the sandwiches, 109 became ill, while among the 38 students who did not eat the sandwiches, 4 became ill.

A. Describe the data in a 2x2 contingency table.

B. State the null and alternative hypotheses.
C. Make a decision at α = 0.05, given the critical value χ²(1, 0.05) = 3.84.
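The test for question 194 can be sketched as a Pearson chi-square test of independence on the 2x2 table (rows: ate / did not eat sandwiches; columns: ill / not ill). This sketch assumes the uncorrected statistic. All expected counts are above 5 (the smallest is about 16.3), so the chi-square approximation is reasonable.

```python
# Rows: ate sandwiches, did not eat; columns: ill, not ill
observed = [[109, 225 - 109], [4, 38 - 4]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi2 = 0.0
for i in range(2):
    for j in range(2):
        # Expected count under independence: row total * column total / grand total
        expected = row_totals[i] * col_totals[j] / grand_total
        chi2 += (observed[i][j] - expected) ** 2 / expected

critical_value = 3.84  # chi-square, 1 df, alpha = 0.05
reject_h0 = chi2 > critical_value
print(round(chi2, 2), reject_h0)
```

The statistic is about 19.07, which exceeds 3.84, so the null hypothesis of no association between eating the sandwiches and illness would be rejected at α = 0.05.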

195. A survey is planned to determine what proportion of the medical students have
regularly chewed khat. If no estimate of p is available and a pilot sample cannot be drawn,
what sample size would be required if a 95% confidence is desired, and d=0.04 is to be
used?
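For question 195, the usual sample size formula for estimating a proportion is n = z²p(1 − p)/d². When no estimate of p is available, p = 0.5 is used because it maximizes p(1 − p). A minimal sketch:

```python
import math
from statistics import NormalDist

z = NormalDist().inv_cdf(0.975)  # about 1.96 for 95% confidence
p = 0.5                          # no prior estimate: most conservative choice
d = 0.04                         # desired margin of error

n_required = math.ceil(z**2 * p * (1 - p) / d**2)  # round up to a whole subject
print(n_required)
```

The formula gives about 600.2, which is rounded up to 601 students.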

196. A physical therapist wished to estimate, with 99% confidence, the mean maximal strength of a particular muscle in a certain group of individuals. He assumes that strength scores are normally distributed with a variance of 144. A sample of 15 subjects who participated in the experiment yielded a mean of 84.3. What is the 95% CI of the population mean strength?
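Because the population variance in question 196 is known (σ² = 144, so σ = 12), a z-interval x̄ ± z·σ/√n applies. The stem mentions 99% confidence while the question asks for 95%, so this sketch computes both. The helper `z_interval` is a hypothetical name.

```python
import math
from statistics import NormalDist

sigma = math.sqrt(144)   # known population standard deviation
n, xbar = 15, 84.3
se = sigma / math.sqrt(n)  # standard error of the mean, about 3.098


def z_interval(confidence: float) -> tuple[float, float]:
    """Two-sided z confidence interval for the mean with known sigma."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return xbar - z * se, xbar + z * se


ci95 = z_interval(0.95)
ci99 = z_interval(0.99)
print([round(v, 2) for v in ci95], [round(v, 2) for v in ci99])
```

The 95% interval is about (78.23, 90.37), and the 99% interval is about (76.32, 92.28).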

197. Answer the following questions based on the culture and Gonodectin (GD) test results for 240 urethral discharge specimens.

                  Culture result
GD test result    Gonorrhea   No gonorrhea   Total
Positive          175         9              184
Negative          8           48             56
Total             183         57             240

a. What is the probability that a man has gonorrhea?
b. What is the probability that a man has a positive GD test?
c. What is the probability that a man has a positive GD test and gonorrhea?
d. What is the probability that a man has a negative GD test and does not have gonorrhea?
e. What is the probability that a man with gonorrhea has a positive GD test?
f. What is the probability that a man who does not have gonorrhea has a negative GD test?
g. What is the probability that a man who does not have gonorrhea has a positive GD test?
h. What is the probability that a man with a positive GD test has gonorrhea?
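Each part of question 197 is a marginal, joint, or conditional proportion from the 2x2 table. Parts e, f, g and h are the sensitivity, specificity, false-positive rate and positive predictive value of the GD test. A small sketch:

```python
tp, fp = 175, 9  # GD positive: gonorrhea, no gonorrhea
fn, tn = 8, 48   # GD negative: gonorrhea, no gonorrhea
total = tp + fp + fn + tn  # 240 specimens

answers = {
    "a": (tp + fn) / total,  # P(gonorrhea) = 183/240
    "b": (tp + fp) / total,  # P(GD positive) = 184/240
    "c": tp / total,         # P(GD positive and gonorrhea)
    "d": tn / total,         # P(GD negative and no gonorrhea)
    "e": tp / (tp + fn),     # sensitivity: P(positive | gonorrhea)
    "f": tn / (tn + fp),     # specificity: P(negative | no gonorrhea)
    "g": fp / (tn + fp),     # false-positive rate: P(positive | no gonorrhea)
    "h": tp / (tp + fp),     # positive predictive value: P(gonorrhea | positive)
}
for part, value in answers.items():
    print(part, round(value, 3))
```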

198. Suppose that a cohort study of 400 smokers and 600 non-smokers documented the
incidence of hypertension over a period of 10 years. The following table summarizes the
data at the end of the study period:

                  Hypertension
                  Yes     No      Total
Smoking   Yes     120     280     400
          No      30      570     600
Total             150     850     1000
Based on the above information, calculate and interpret the following measures of
association:

A. Relative risk (RR)


B. Attributable risk (AR) /or preventive fraction (PF)
C. Attributable risk percent (AR %)
D. Population attributable risk (PAR)
E. Population attributable risk percent (PAR %)
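The measures in question 198 follow from the cumulative incidence in smokers, non-smokers and the whole cohort. A sketch using the standard cohort-study definitions:

```python
a, b = 120, 280  # smokers: hypertension yes, no
c, d = 30, 570   # non-smokers: hypertension yes, no

risk_exposed = a / (a + b)               # incidence in smokers (0.30)
risk_unexposed = c / (c + d)             # incidence in non-smokers (0.05)
risk_total = (a + c) / (a + b + c + d)   # incidence in the whole cohort (0.15)

rr = risk_exposed / risk_unexposed       # relative risk
ar = risk_exposed - risk_unexposed       # attributable risk (risk difference)
ar_pct = 100 * ar / risk_exposed         # attributable risk percent
par = risk_total - risk_unexposed        # population attributable risk
par_pct = 100 * par / risk_total         # population attributable risk percent

print(round(rr, 2), round(ar, 2), round(ar_pct, 1), round(par, 2), round(par_pct, 1))
```

This gives RR = 6, AR = 0.25 (25 extra cases per 100 smokers over 10 years), AR% ≈ 83.3%, PAR = 0.10 and PAR% ≈ 66.7%.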

199. The following table is reproduced from a paper comparing the health and lifestyle of
people who live in traditional houses with those who live in improved ‘Habitat’ housing in
Malawi. Use the table to answer the following questions.

                                   Habitat house   Traditional house   P value
Education
  No education                     13 (13%)        15 (13%)
  Primary (age 6-13)               63 (64%)        73 (64%)            0.80
  Secondary (14-17)                22 (22%)        26 (23%)
Work status
  Farmer                           58 (59%)        71 (62%)            0.68
  Wage earner                      40 (41%)        43 (38%)
Median area of land owned          2               2                   0.63
Any illness in the past 4 weeks
  No                               60 (62%)        57 (50%)            0.1
  Yes                              37 (38%)        57 (50%)
Safe water source
  No                               38 (39%)        52 (46%)            0.3
  Yes                              60 (61%)        62 (54%)

a) What statistical test would have been most appropriate for looking at the association
between education and type of housing?
b) Why do you think the authors have given the median instead of the mean as a summary
of average area of land?
c) What should the authors have given with the median to describe the variability in land
ownership?
d) What test would have been used to obtain the P value for comparing land ownership
between those with habitat and traditional houses?
e) Calculate the odds ratio for the effect of type of housing upon whether people have had
an illness in the past 4 weeks. Show your working
f) The authors continued on to do a multivariate analysis to look at the effect of type of
housing on illness after adjustment for confounders such as water source. The adjusted
odds ratio for type of housing was 0.55 (95% confidence interval 0.34 to 0.75). Interpret
this result i.e. what does this mean?
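For question 199e, the odds ratio compares the odds of illness in the past 4 weeks among habitat-house residents with the odds among traditional-house residents, using the counts from the table. A minimal sketch, assuming traditional housing as the reference group:

```python
habitat_ill, habitat_well = 37, 60          # habitat house: illness yes, no
traditional_ill, traditional_well = 57, 57  # traditional house: illness yes, no

odds_habitat = habitat_ill / habitat_well              # 37/60
odds_traditional = traditional_ill / traditional_well  # 57/57 = 1
odds_ratio = odds_habitat / odds_traditional           # equivalently (37*57)/(60*57)
print(round(odds_ratio, 2))
```

The crude odds ratio is about 0.62, meaning the odds of recent illness were roughly 38% lower among habitat-house residents.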

200. The following table shows the association between urban or rural site of residence and
having trichuris infection in children in Jimma, using CROSSTABS in SPSS. Use it to
answer the following questions.
a) What is the exposure variable in this study?

b) What type of variable is it?


c) What percentage of urban children are infected with Trichuris?

d) Which hypothesis (or statistical) test would you use to compare the percentage with
Trichuris infection in urban and rural children?

e) From the SPSS output, what is the P value for this association?

f) Interpret this P value i.e. explain what this P value means

g) From the SPSS output, what are the odds ratio and the 95% confidence interval for
Trichuris infection in rural compared to urban children?

h) What does this confidence interval tell us?
