Final Note
Regression is a technique for determining the statistical relationship between two or more
variables, where a change in a dependent variable is associated with, and depends on, a change
in one or more independent variables. This relationship can be linear or non-linear.
Example: Income and expenditure, where income is the independent variable and expenditure is
the dependent variable.
Multiple Regression Equation:
y = α + β₁x₁ + β₂x₂ + ⋯ + βₖxₖ + ε
Properties of regression
i. It measures the linear or non-linear relationship between at least two quantitative
variables.
ii. The numerical measurement is asymmetrical.
iii. There is at least one dependent variable.
Example: Suppose the following data represent the expenditure and income of a company.
(i) Identify the independent and dependent variables.
(ii) Calculate the regression coefficients and comment on them.
(iii) Fit the regression model. Estimate expenditure for an income of 13 Tk.
Expenditure (Tk) | 4 | 6 | 9 | 13
Income (Tk) | 8 | 10 | 12 | 15
Solution:
(i) Independent variable → income (x)
Dependent variable → expenditure (y)
Total no. of units, n = 4
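(ii)
Expenditure, y | Income, x | xy | x²
4 | 8 | 32 | 64
6 | 10 | 60 | 100
9 | 12 | 108 | 144
13 | 15 | 195 | 225
∑y = 32 | ∑x = 45 | ∑xy = 395 | ∑x² = 533
x̄ = ∑x / n = 45 / 4 = 11.25, ȳ = ∑y / n = 32 / 4 = 8
b = (∑xy − n x̄ ȳ) / (∑x² − n x̄²) = (395 − 4 × 11.25 × 8) / (533 − 4 × 11.25²) = 35 / 26.75
= 1.31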
Comment on b: If income (the independent variable) increases by 1 Tk, expenditure (the
dependent variable) increases by 1.31 Tk on average.
and a = ȳ − b x̄ = 8 − 1.31 × 11.25
= −6.74
Comment on a: If the independent variable is zero (x = 0), the dependent variable y will be
−6.74 Tk (when x = 0, y = a).
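The least-squares calculation above can be checked with a short Python sketch. The function name `fit_simple_regression` is illustrative and not part of the notes. Because the sketch keeps b unrounded, the intercept comes out as about −6.72 rather than the −6.74 obtained above by rounding b to 1.31 first. It also covers part (iii), the estimate of expenditure for an income of 13 Tk.

```python
def fit_simple_regression(x, y):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    # Slope: b = (sum(xy) - n*x_bar*y_bar) / (sum(x^2) - n*x_bar^2)
    b = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar ** 2)
    # Intercept: a = y_bar - b*x_bar
    a = y_bar - b * x_bar
    return a, b


income = [8, 10, 12, 15]        # independent variable x
expenditure = [4, 6, 9, 13]     # dependent variable y
a, b = fit_simple_regression(income, expenditure)
estimate_13 = a + b * 13        # part (iii): fitted expenditure at x = 13
print(f"b = {b:.2f}, a = {a:.2f}, estimate at x = 13: {estimate_13:.2f}")
```

With either the rounded or the unrounded coefficients, the estimated expenditure at x = 13 rounds to 10.29 Tk.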
Example 8.1: The following data show the duration of experience of machine operators and
their performance ratings given by the number of good parts turned out per 100 pieces.
Experience (Years) Performance Ratings
16 87
12 84
18 89
10 80
BASIS FOR COMPARISON | CORRELATION | REGRESSION
Usage | To represent the linear relationship between two variables. | To fit a best line and estimate one variable on the basis of another variable.
Probability
• Experiment:
Any task or phenomenon, which gives us some outcome or result when it is performed, is called
an experiment. There are two types of an experiment as follows:
• Trial:
A single performance of an experiment is known as a trial, so a trial is a special case of an
experiment. An experiment may consist of one trial or of two or more trials.
• Sample space:
The set or the collection of all possible outcomes of the random experiment is called the sample
space. It is denoted by the notation S or Ω. For example, if the random experiment is tossing a
coin, then Ω = {H, T}.
• Outcome:
In probability theory, an outcome is a possible result of an experiment. Example: if we toss a
coin and a head appears, then head is our outcome. Likewise, {1} is an outcome of rolling a die
and {T} is an outcome of tossing a coin.
• Event:
Any subset of the sample space is called an event. Events are usually denoted by A, B, C, etc.
Example: suppose a fair coin is tossed twice; then the sample space of the experiment is
Ω = {HH, HT, TH, TT}, so {HH} is an event.
Probability:
If there are n mutually exclusive, equally likely and exhaustive outcomes of an experiment and if
m of these outcomes are favorable to an event A, then the probability of the event A which is
denoted by P( A) is defined by
P(A) = (number of outcomes favorable to the event A) / (total number of outcomes of the experiment) = m / n
• Axioms/Laws of probability:
i) 0 ≤ P(x) ≤ 1 ii) ∑p(x)=1
Example 1: A bag contains 2 black, 3 white and 4 blue balls. If one ball is drawn randomly
from the bag, what is the probability that it is i) blue, ii) black, iii) white.
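Solution: The total number of balls is n(S) = 2 + 3 + 4 = 9.
i) The probability that the selected ball is blue, P(b) = n(b) / n(S) = 4/9
ii) The probability that the selected ball is black, P(bl) = n(bl) / n(S) = 2/9
iii) The probability that the selected ball is white, P(w) = n(w) / n(S) = 3/9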
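As a quick check of Example 1, the classical definition P(A) = m / n can be sketched with exact fractions in Python. The dictionary layout and variable names are illustrative only. Note that `Fraction` reduces 3/9 to 1/3.

```python
from fractions import Fraction

# Number of balls of each colour in the bag (from Example 1).
balls = {"black": 2, "white": 3, "blue": 4}
total = sum(balls.values())  # n(S) = 9

# Classical probability: favorable outcomes / total outcomes.
prob = {colour: Fraction(count, total) for colour, count in balls.items()}

for colour, p in prob.items():
    print(colour, p)
```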
Example 2: A coin is tossed and the probability of obtaining a head is 0.47. Calculate the
probability of obtaining a tail in the next trial.
Solution: The sample space for a coin is S = {H, T}, so from the 2nd axiom,
P(H) + P(T) = 1
Probability of obtaining a tail in the next trial = 1 − probability of obtaining a head = 1 − 0.47 = 0.53
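Example 3: Calculate the mean, variance and standard deviation for the following data set.
Value of x | 1 | 2 | 3 | 4
Probability of x, P(x) | 0.29 | 0.35 | * | 0.13
Since ∑P(x) = 1, the missing probability is 1 − (0.29 + 0.35 + 0.13) = 0.23:
Value of x | 1 | 2 | 3 | 4
Probability of x, P(x) | 0.29 | 0.35 | 0.23 | 0.13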
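The worked solution to Example 3 is not reproduced in these notes, so the following Python sketch is offered only as one way to carry it out, using the standard formulas mean = ∑x·P(x) and variance = ∑x²·P(x) − mean². The variable names are illustrative.

```python
import math

x_values = [1, 2, 3, 4]
probs = [0.29, 0.35, 0.23, 0.13]   # completed table; probabilities sum to 1

# Mean: E(X) = sum of x * P(x)
mean = sum(x * p for x, p in zip(x_values, probs))
# Second moment: E(X^2) = sum of x^2 * P(x)
second_moment = sum(x * x * p for x, p in zip(x_values, probs))
# Variance: E(X^2) - [E(X)]^2
variance = second_moment - mean ** 2
std_dev = math.sqrt(variance)

print(round(mean, 2), round(variance, 2), round(std_dev, 2))
```

For this table the mean is 2.20, and both the variance and the standard deviation are 1.00.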
Joint probability:
Joint probability is the probability of two events in conjunction. That is, it is the probability of
both events together. The joint probability of A and B is written P(A and B) or P(A ∩ B).
Marginal probability:
Marginal probability is the probability of A, regardless of whether event B did or did not occur.
If B can be thought of as the event of a random variable X having a given outcome, the marginal
probability of A can be obtained by summing the joint probabilities over all outcomes for X.
Conditional probability
Let A and B be two events. The conditional probability of event A, given that B has occurred,
is denoted by P(A | B) and is given by:
P(A | B) = P(A ∩ B) / P(B); provided P(B) ≠ 0.
Similarly, P(B | A) = P(A ∩ B) / P(A); provided P(A) ≠ 0.
Ans. (i) P(B) = n(B) / n(S) = 150/200
(ii) P(M) = n(M) / n(S) = 50/200
(iii) P(M | over 40) = n(M ∩ over 40) / n(over 40) = 10/50
(iv) P(under 30 | B) = n(under 30 ∩ B) / n(B) = 90/150
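The conditional probabilities in answers (iii) and (iv) can be reproduced with a small Python sketch. The problem statement for this exercise is not included in the notes, so the counts below are taken directly from the answers above, and the helper name `conditional` is hypothetical.

```python
from fractions import Fraction


def conditional(n_joint, n_given):
    """P(A | B) = n(A and B) / n(B), for equally likely outcomes."""
    if n_given == 0:
        raise ValueError("the conditioning event must have at least one outcome")
    return Fraction(n_joint, n_given)


# Counts copied from the answers above.
p_m_given_over40 = conditional(10, 50)     # n(M and over 40) / n(over 40)
p_under30_given_b = conditional(90, 150)   # n(under 30 and B) / n(B)

print(p_m_given_over40, p_under30_given_b)
```

The guard against a zero denominator mirrors the condition P(B) ≠ 0 in the definition.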
Probability Distribution
Random Variable
A variable whose values are associated with probabilities is called a random variable.
Example- The average height of Bangladeshi boys is 5’6” with probability 0.71.
➔ Distinguish between probability and frequency distribution
Frequency distribution | Probability distribution
1. Frequency is how many times things of a certain category happened; these frequencies presented in tabular form are called a frequency distribution. | 1. Probability is how many times one thinks something will happen; these probabilities presented in tabular form are called a probability distribution.
P(X = x) = nCx p^x q^(n−x); x = 0, 1, 2, …, n
Where,
X → random variable, x → value of the random variable
n → total no. of trials
p → probability of success
q → probability of failure
p + q = 1.
** X ~ B(n, p)
Example: In a community, the probability that a newly born child is a boy is 2/5. Among 4
newly born children in that community, what is the probability that (a) all 4 are boys, (b) none
is a boy, (c) exactly one is a boy, (d) at least two are boys, (e) at most two are boys? Also
calculate the mean and variance.
Solution:
There are two possible outcomes, baby boy and baby girl, and the probability that a boy is
born is 2/5, which is the probability of success. The total number of newly born children, 4,
is fixed. So this problem follows the binomial distribution.
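(d) P(at least two boys) = P(X ≥ 2) = 1 − P(X = 0) − P(X = 1) = 328/625 = 0.5248
(e) P(at most two boys) = P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2) = 513/625 = 0.8208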
For binomial distribution mean = np = 4*(2/5) =1.6 and
Variance = npq = 4*(2/5)*(3/5) =0.96
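All five parts of the example, together with the mean and variance, can be verified with exact fractions. This is a minimal sketch; `binomial_pmf` is an illustrative helper name, and `math.comb` requires Python 3.8 or later.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p): C(n, x) * p^x * q^(n - x), with q = 1 - p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


n, p = 4, Fraction(2, 5)
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

all_boys = pmf[4]                          # (a)
no_boy = pmf[0]                            # (b)
exactly_one = pmf[1]                       # (c)
at_least_two = 1 - pmf[0] - pmf[1]         # (d)
at_most_two = pmf[0] + pmf[1] + pmf[2]     # (e)
mean = n * p                               # np
variance = n * p * (1 - p)                 # npq

print(all_boys, no_boy, exactly_one, at_least_two, at_most_two, mean, variance)
```

Parts (d) and (e) reproduce the values 328/625 and 513/625 given in the solution.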
2. Poisson Distribution
Poisson distribution is applied in situations where there are a large number of independent
Bernoulli trials with a very small probability of success in any trial say p.
P(X = x) = e^(−λ) λ^x / x!; x = 0, 1, 2, …
Where
X → random variable, x → value of the random variable
λ → the average rate of occurrence
** X ~ P(λ)
Example: Suppose that the number of emergency patients on a given day at a certain hospital is a
Poisson variable with parameter 20. What is the probability that on a given day there will be (a)
15 emergency patients, (b) no emergency patients, (c) more than 20 but fewer than 25 patients?
Calculate the mean and variance.
Solution:
The rate of emergency patients, λ = 20 (the parameter of the Poisson distribution is the rate of
occurrence, λ).
(b) The probability that no emergency patients will come, P(X = 0) = e^(−20) × 20^0 / 0! = 2.06×10^(−9) [0! = 1]
(c) The probability that more than 20 but fewer than 25 emergency patients will come, P(20 < X < 25) = P(X = 21) + P(X = 22) + P(X = 23) + P(X = 24).
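The Poisson probabilities in this example can be evaluated numerically with a short Python sketch. The helper name `poisson_pmf` is illustrative. For a Poisson variable the mean and the variance both equal λ, so here both are 20.

```python
import math


def poisson_pmf(x, lam):
    """P(X = x) for X ~ P(lam): e^(-lam) * lam^x / x!"""
    return math.exp(-lam) * lam ** x / math.factorial(x)


lam = 20
p_15 = poisson_pmf(15, lam)                                   # (a)
p_0 = poisson_pmf(0, lam)                                     # (b)
p_21_to_24 = sum(poisson_pmf(x, lam) for x in range(21, 25))  # (c): x = 21..24
mean = variance = lam

print(f"(a) {p_15:.4f}  (b) {p_0:.2e}  (c) {p_21_to_24:.4f}")
```

Part (c) sums the four values x = 21, 22, 23, 24, because "more than 20 but fewer than 25" excludes both endpoints.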
Properties of standard normal distribution:
i) The normal curve is symmetrical about the mean μ = 0 and the mean, median, and mode are
equal.
ii) The mean is at the middle and divides the area into halves.
iii) The total area under the curve is equal to 1.
iv) It is completely determined by its mean μ = 0 and standard deviation σ = 1 (or variance σ² = 1).
v) The normal curve approaches, but never touches, the x-axis.
Example: Let height be a random variable that follows a normal distribution with mean 5 feet and
standard deviation 0.14 feet. Calculate the following probabilities:
(i) P(X<2.91) (ii) P(X>6) (iii) P(2.5<X<3.5)
(iv) P(Z<1.21) (v) P(Z>2) (vi) P(1<Z<2)
Solution:
Height is a random variable, X ~ N(µ, σ²)
where µ → mean and σ² → variance [σ → standard deviation]
Z ~ N(0, 1) [Z → standard normal variate, a variable that follows the standard normal distribution]
Z = (X − µ) / σ
(i) P(X < 2.91) = P((X − µ)/σ < (2.91 − 5)/0.14) = P(Z < −14.93) = 0
(ii) P(X > 6) = P((X − µ)/σ > (6 − 5)/0.14) = P(Z > 7.14) = 1 − P(Z < 7.14) = 1 − 1 = 0
(vi) P(1<Z<2) = P(Z < 2) - P(Z < 1.00)= 0.9772 – 0.8413 = 0.1359
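Instead of reading a Z table, the standard normal probabilities can be computed from the error function in Python's standard library. The sketch below is one possible check of parts (i), (ii), (iv), (v) and (vi); the function names are illustrative, and part (ii) uses X > 6 as stated in the question.

```python
import math


def std_normal_cdf(z):
    """P(Z < z) for Z ~ N(0, 1), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_cdf(x, mu, sigma):
    """P(X < x) for X ~ N(mu, sigma^2), by standardising to Z = (X - mu) / sigma."""
    return std_normal_cdf((x - mu) / sigma)


mu, sigma = 5.0, 0.14
p_i = normal_cdf(2.91, mu, sigma)                   # (i)  P(X < 2.91)
p_ii = 1.0 - normal_cdf(6.0, mu, sigma)             # (ii) P(X > 6)
p_iv = std_normal_cdf(1.21)                         # (iv) P(Z < 1.21)
p_v = 1.0 - std_normal_cdf(2.0)                     # (v)  P(Z > 2)
p_vi = std_normal_cdf(2.0) - std_normal_cdf(1.0)    # (vi) P(1 < Z < 2)

print(f"{p_i:.4f} {p_ii:.4f} {p_iv:.4f} {p_v:.4f} {p_vi:.4f}")
```

Parts (i) and (ii) are zero to four decimal places because 2.91 and 6 lie many standard deviations away from the mean.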
Q. Let weight be a random variable that follows a normal distribution with mean 64 kg and
standard deviation 1.43 kg. Calculate the following probabilities:
(i) P(X<64) (ii) P(X>77) (iii) P(60<X<72)
(iv) P(Z<0.08) (v) P(Z> - 2.74) (vi) P( - 1.58 <Z< 2.49)
Testing of Hypothesis
Hypothesis:
A hypothesis is an assumption that we make about a population parameter. Hypothesis testing is
a statistical method used to make statistical decisions from experimental data.
Types of hypothesis:
(i) Null hypothesis: The statement that assumes no difference between the population and
sample results is called the null hypothesis. It is denoted by H0.
(ii) Alternative hypothesis: Contrary to the null hypothesis, the statement that there is a
difference between the population and sample results is called the alternative hypothesis.
It is denoted by H1 or HA.
Critical value:
In hypothesis testing, a critical value is a point on the test distribution that is compared to the test
statistic to determine whether to reject the null hypothesis.
Test statistic
A test statistic is a single measure of some attribute of a sample used in statistical hypothesis
testing.
There are different types of test statistics
1. z- test
2. t- test
3. Chi square test
4. F-test.
Assumptions of Z-test:
t-distribution
The t distribution (also called Student's t distribution) is a family of distributions that look
almost identical to the normal distribution curve, only a bit shorter and fatter. The t distribution is
used instead of the normal distribution when you have small samples. The larger the sample size,
the more the t distribution looks like the normal distribution.
Assumptions of t-test:
• All data points are independent.
• The sample size is small. Generally, a sample of more than 30 units is regarded as large and
otherwise as small, but the sample should contain at least 5 units for the t-test to be applied.
• Sample values are to be taken and recorded accurately.
The test statistic is:
Applications of Z-tests:
1. Test of hypothesis of the population mean
2. Test of Hypothesis of the Difference between Two Means
3. Test of hypothesis of the proportion.
Applications of t-tests:
1. Test of hypothesis of the population mean
2. Test of hypothesis of the difference between two means
3. Test of hypothesis of the difference between two means with dependent samples
4. Test of hypothesis about the coefficient of correlation
Comparison Chart
BASIS FOR COMPARISON | T-TEST | Z-TEST
When Z or t-test will be used?
One of the important conditions for adopting the t-test is that the population variance is unknown.
Conversely, the population variance should be known, or assumed to be known, in the case of a z-test.
The z-test is used when the sample size is large, i.e. n > 30, and the t-test is appropriate when the
sample size is small, in the sense that n < 30.
Example-1. A simple random sample of 11 observations is selected from a population with
mean 25 and variance 6.82; the sample mean is found to be 20.5 and the sample variance 8.75.
Do you think that the sample is selected from a population having mean 25? Also find the 95%
confidence interval for µ.
Solution:
1st step: H0: µ = 25
H1: µ ≠ 25
3rd step: Test statistic
Z = |x̄ − µ| / (σ / √n)
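The Z statistic for Example-1 can be evaluated with the short Python sketch below. The remaining steps of the solution are not included in these notes, so two things are assumptions here: a 5% significance level (chosen to match the requested 95% confidence interval) and the two-sided critical value 1.96. The population variance 6.82 is used because it is given as known.

```python
import math

n = 11                 # sample size
mu_0 = 25.0            # population mean under H0
pop_variance = 6.82    # known population variance
sample_mean = 20.5

# Standard error of the mean: sigma / sqrt(n)
std_error = math.sqrt(pop_variance / n)

# Test statistic: Z = |x_bar - mu| / (sigma / sqrt(n))
z = abs(sample_mean - mu_0) / std_error

z_critical = 1.96      # assumed: two-sided critical value at the 5% level
reject_h0 = z > z_critical

# 95% confidence interval for mu: x_bar +/- 1.96 * sigma / sqrt(n)
ci_low = sample_mean - z_critical * std_error
ci_high = sample_mean + z_critical * std_error

print(f"Z = {z:.3f}, reject H0: {reject_h0}, 95% CI = ({ci_low:.3f}, {ci_high:.3f})")
```

Under these assumptions Z is about 5.715, which exceeds 1.96, so H0: µ = 25 would be rejected; the interval, roughly (18.957, 22.043), does not contain 25, which agrees with that decision.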
Sampling
Sampling is simply the process of learning about the population on the basis of a sample drawn
from it. In the sampling technique, instead of every unit of the universe, only a part of the universe
is studied, and the conclusion is drawn on that basis for the entire universe. Although the theory of
sampling has developed only in recent years, the idea of sampling is quite old. Since time immemorial,
people have examined a handful of grains to ascertain the quality of the entire lot. A housewife examines
only two or three grains of boiling rice to know whether the pot of rice is ready or not. A
businessman places orders for material by examining only a small sample of the same.
Why is sampling necessary?
The following are the reasons for sampling:
1. To bring the population to a manageable number
2. To reduce cost
3. To help minimize errors arising from the large number of respondents in the population
4. Sampling helps the researcher to meet up with the challenge of time.
An overview of sampling technique:
Two types of sampling-
i. Probability sampling
ii. Non-probability sampling
i. Probability Sampling is a method wherein each member of the population has the
same probability of being a part of the sample.
(Simple random Sampling, Stratified random sampling, Systematic sampling, Cluster
sampling)
ii. Non-probability Sampling is a method wherein each member of the population does
not have an equal chance of being selected. When the researcher desires to choose
members selectively, non-probability sampling is considered. Both sampling
techniques are frequently utilized. However, one works better than others depending
on research needs.
(Quota sampling, Purposive sampling, Snowball sampling, Convenience sampling)
• The number of observations within each stratum, N_h, is known, and N = N_1 + N_2 + N_3 +
... + N_(H−1) + N_H.
• The researcher obtains a probability sample from each stratum.
Simple random sampling | Stratified random sampling
1. Population is not divided into several groups. | 1. Population is divided into several groups.
2. Each unit has an equal probability of being selected. | 2. Each unit does not have an equal probability of being selected.
3. It is less efficient. | 3. It is more efficient than simple random sampling.
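The two designs compared above can be illustrated with Python's standard `random` module. This is only a sketch: the population, the stratum names "urban" and "rural", and the use of proportional allocation (sampling each stratum in proportion to its size) are assumptions made for the example, not something specified in the notes.

```python
import random


def simple_random_sample(units, k, rng):
    """Draw k units without replacement; every unit is equally likely to be chosen."""
    return rng.sample(units, k)


def stratified_sample(strata, k, rng):
    """Take a simple random sample inside each stratum (proportional allocation)."""
    total = sum(len(units) for units in strata.values())   # N = N_1 + ... + N_H
    sample = {}
    for name, units in strata.items():
        # Stratum sample size; rounding may make the overall total differ from k.
        k_h = round(k * len(units) / total)
        sample[name] = rng.sample(units, k_h)
    return sample


rng = random.Random(0)   # fixed seed so the draw is repeatable
strata = {
    "urban": [f"U{i}" for i in range(60)],   # hypothetical stratum, N_1 = 60
    "rural": [f"R{i}" for i in range(40)],   # hypothetical stratum, N_2 = 40
}
population = strata["urban"] + strata["rural"]

srs = simple_random_sample(population, 10, rng)
strat = stratified_sample(strata, 10, rng)
print(srs)
print(strat)
```

With a total sample of 10, proportional allocation takes 6 units from the stratum of 60 and 4 from the stratum of 40, whereas the simple random sample may contain any mix of the two groups.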
Census:
A census is the procedure of systematically acquiring and recording information about the
members of a given population. Under this method, data are collected for each and every unit
(person, household, field, shop, factory, etc., as the case may be) of the population or universe.
For example, if the average wage of workers in a sugar factory is to be calculated, the wage
figure would be obtained by dividing the total wages received by all the workers by the number
of workers in the factory.
Differences between census and sampling:
Census | Sampling
Census measures everyone in the whole country. | A sample is a portion of the whole country.
In a census, each and every unit of the population is studied. | Only a few units of the population are studied in sampling.
Census refers to the periodic collection of information about the populace from the entire population. | However, if the next census is far away, sampling is the most convenient method of obtaining data about the population.
The census method demands a large amount of finance, time and labor. | Relatively less finance, time and labor are required for sampling.
Results obtained by census are quite reliable. | Results obtained by sampling are less reliable.
It is more suitable to use the census method if the population is heterogeneous in nature. | It is more suitable to use the sampling method if the population is homogeneous in nature.
Margin of error is not present in a census, as each and every part of the geographical area has to be approached for data collection. | Samples have a margin of error, which gets lower as the sample size increases.
Study question:
1. Define
a) Census
b) Sampling technique
c) Population
2. What are the necessities of sampling?
3. Write down the classification of sampling.
4. Write down the steps
a) Simple random sampling
b) Stratified random sampling