
Regression

A technique for determining the statistical relationship between two or more variables, where a
change in a dependent variable is associated with, and depends on, a change in one or more
independent variables. This relationship can be linear or non-linear.
Example: income and expenditure, where income is the independent variable and expenditure is
the dependent variable.

Independent and Dependent variables:


➢ Independent variable- It is a variable that stands alone and isn't changed by the other
variables you are trying to measure. For example, someone's age might be an independent
variable.
➢ Dependent variable- It is something that depends on other factors. For example, a test
score could be a dependent variable because it could change depending on several factors
such as how much you studied, how much sleep you got the night before you took the
test, or even how hungry you were when you took it.
Assumptions of linear regression model:
1. The dependent variable is assumed to be normally distributed.
2. The values of the dependent variable are statistically independent. This means that, in
selecting the sample, the value observed for one unit does not depend on the value for any other unit.
3. The explanatory variables are uncorrelated.
4. The error term ε is normally and independently distributed with mean zero and
variance 𝜎 2 .
Types of Regression Model:
(i) Simple Linear Regression Model: When we have one independent and one
dependent variable, the regression is called simple regression. Example:
income (independent) and expenditure (dependent).
(ii) Multiple Linear Regression Model: When we have more than one independent
variable and one dependent variable, the regression is called multiple
regression. Example: income and savings (independent variables) and expenditure
(dependent variable).

Simple Regression Equation:- y=mx+c


𝑦 = 𝛼 + 𝛽𝑥 + 𝜀 ; where,

x indicates independent variable.


y indicates dependent variable.
α or a represents intercept. It is the mean value of y when x is equal to 0.
β or b represents the slope, called the regression coefficient. It measures the change in y
associated with a one-unit increase in x. If b is positive, the mean value of y increases as x
increases; if b is negative, the mean value of y decreases as x increases.
𝜀 represents stochastic error term that describes the effects of all factors on y other than the
value of independent variable x.

Multiple Regression Equation:-
𝑦 = 𝛼 + 𝛽1𝑥1 + 𝛽2 𝑥2 + ⋯ + 𝛽𝑘 𝑥𝑘 + 𝜀

Properties of regression
i. It measures the (linear or non-linear) relationship between at least two quantitative
variables.
ii. The numerical measurement is asymmetrical.
iii. There is at least one dependent variable.

Example: Suppose the following data represents expenditure and income of a company.
(i) Identify independent and dependent variables.
(ii) Calculate regression coefficients and make comment on it.
(iii) Fit the regression model. Estimate expenditure for 13 Tk income.
Expenditure (Tk) 4 6 9 13
Income (Tk) 8 10 12 15

Solution:
(i) Independent variable → income (x)
Dependent variable → expenditure (y)
Total no. of units, n = 4
(ii)
Expenditure, y Income, x xy x2
4 8 32 64
6 10 60 100
9 12 108 144
13 15 195 225
∑y = 32 ∑x = 45 ∑xy = 395 ∑x2 = 533

x̄ = ∑x/n = 45/4 = 11.25 ,  ȳ = ∑y/n = 32/4 = 8

Regression coefficients are

b = (∑xy − n x̄ ȳ) / (∑x² − n x̄²) = (395 − (4 × 11.25 × 8)) / (533 − (4 × 11.25 × 11.25)) = 1.31

Comment on b: If the independent variable increases by 1 Tk, the dependent variable increases by
1.31 Tk on average.

and a = ȳ − b x̄ = 8 − (1.31 × 11.25) = −6.74
Comment on a: When the independent variable has no effect (x = 0), the mean value of the
dependent variable y is −6.74 Tk (at x = 0, y = a).

(iii) The fitted regression equation is


ŷ = a + bx = −6.74 + 1.31x
For income x = 13 Tk, the estimated expenditure is ŷ = −6.74 + (1.31 × 13) = 10.29 Tk
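
As a quick numerical check, here is a minimal Python sketch of the calculation above (Python and the variable names are illustrative choices, not part of the handout):

```python
# Least-squares slope and intercept for the income/expenditure example above.
x = [8, 10, 12, 15]   # income (independent variable)
y = [4, 6, 9, 13]     # expenditure (dependent variable)
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# b = (sum(x*y) - n*x_bar*y_bar) / (sum(x^2) - n*x_bar^2)
b = (sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar) / (
    sum(xi ** 2 for xi in x) - n * x_bar ** 2
)
a = y_bar - b * x_bar

print(round(b, 2), round(a, 2))  # 1.31 -6.72 (the handout rounds b first and gets a = -6.74)
print(round(a + b * 13, 2))      # 10.29, the estimated expenditure for income 13 Tk
```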

Example 8.1: The following data show the duration of experience of machine operators and
their performance ratings given by the number of good parts turned out per 100 pieces.
Experience (Years) Performance Ratings
16 87
12 84
18 89
10 80

i. Determine the fitted regression model for the given data.


ii. Interpret the values of α=70.3 and β=1.05.
iii. Estimate the performance ratings when an operator has 20 years’ experience. Y= 91.3

Difference between correlation and regression:-


• Meaning: Correlation is a statistical measure which determines the co-relationship or
association of two variables, whereas regression describes how an independent variable is
numerically related to the dependent variable.
• Usage: Correlation is used to represent the linear relationship between two variables;
regression is used to fit a best line and estimate one variable on the basis of another variable.
• Dependent and independent variables: In correlation there is no such distinction between the
variables; in regression the two variables are different (one dependent, one independent).
• Indicates: The correlation coefficient indicates the extent to which two variables move
together; regression indicates the impact of a unit change in the known variable (x) on the
estimated variable (y).
• Objective: Correlation seeks a numerical value expressing the relationship between variables;
regression seeks to estimate values of a random variable on the basis of the values of a fixed
variable.

Probability

• Experiment:
Any task or phenomenon which gives us some outcome or result when it is performed is called
an experiment. An experiment may be of two types: deterministic (the outcome is certain in
advance) and random (the outcome is uncertain).
• Trial:
A single performance (unit) of an experiment is known as a trial, so a trial is a special case of an
experiment. An experiment may consist of one trial or of two or more trials.
• Sample space:
The set or the collection of all possible outcomes of the random experiment is called the sample
space. It is denoted by the notation S or Ω. For example, if the random experiment is tossing a
coin, then Ω = {H, T}.
• Outcome:
In probability theory, an outcome is a possible result of an experiment. Example: if we toss a
coin and a head appears, then head is our outcome. Similarly, a single roll of a die might give
the outcome {1}, or a coin toss the outcome {T}.
• Event:
Any subset of the sample space is called an event. Events are usually denoted by A, B, C, etc.
For example, suppose a fair coin is tossed twice; then the sample space of the experiment is
S = {HH, HT, TH, TT}, and {HH} is one possible event.

Probability:
If there are n mutually exclusive, equally likely and exhaustive outcomes of an experiment and if
m of these outcomes are favorable to an event A, then the probability of the event A which is
denoted by P( A) is defined by
P(A) = (favorable outcomes of the event A) / (total number of outcomes of the experiment) = m / n

• Axioms/Laws of probability:
i) 0 ≤ P(x) ≤ 1 ii) ∑p(x)=1

Example 1: A bag contains 2 black, 3 white and 4 blue balls. If one ball is drawn randomly
from the bag, what is the probability that it is i) blue, ii) black, iii) white.

Solution: The sample space consists of the 9 balls: 2 black, 3 white and 4 blue.

Total no. of balls in the sample space, n(S) = 9

i) The probability that the selected ball is blue, P(blue) = n(blue)/n(S) = 4/9

ii) The probability that the selected ball is black, P(black) = n(black)/n(S) = 2/9

iii) The probability that the selected ball is white, P(white) = n(white)/n(S) = 3/9

Example 2: A coin is tossed and the probability to obtain head is 0.47. Calculate the probability
to obtain tail in the next trial.
Solution: From the 2nd axiom we know, Sample space for a coin, S={H,T} and so
P(H)+P(T) = 1

Probability to obtain tail in the next trial = 1- the probability to obtain head = 1-0.47 = 0.53

Example 3: Calculate mean, variance and standard deviation for the following data-set.
Value of x 1 2 3 4
Probability of x, P(x) 0.29 0.35 * 0.13

Solution: From the 2nd axiom of probability, ∑p(x) = 1


=> p(1)+ p(2) + p(3) + p(4) = 1
=> 0.29+0.35+*+0.13 = 1
=> * = 1 – 0.77 = 0.23

Value of x 1 2 3 4
Probability of x, P(x) 0.29 0.35 0.23 0.13

Expectation of x or mean value of x, E(x) = ∑x p(x) = (1 × 0.29) + (2 × 0.35) + (3 × 0.23) + (4 × 0.13) = 2.2

Variance of x, V(x) = ∑x² p(x) − [E(x)]² = (1² × 0.29 + 2² × 0.35 + 3² × 0.23 + 4² × 0.13) − (2.2)² = 1

Standard deviation of x, SD(x) = √V(x) = 1
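
The same mean, variance and standard deviation can be verified with a short Python sketch (illustrative only; the names are mine, not the handout's):

```python
# Mean, variance and standard deviation of the discrete distribution above.
values = [1, 2, 3, 4]
probs = [0.29, 0.35, 0.23, 0.13]  # the recovered probability 0.23 makes them sum to 1

mean = sum(x * p for x, p in zip(values, probs))               # E(X) = sum of x * p(x)
var = sum(x**2 * p for x, p in zip(values, probs)) - mean**2   # E(X^2) - [E(X)]^2
sd = var ** 0.5

print(mean, round(var, 2), round(sd, 2))  # 2.2 1.0 1.0
```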

Joint probability:
Joint probability is the probability of two events occurring together, that is, the probability of
both events happening in conjunction. The joint probability of A and B is written P(A and B) or P(A ∩ B).

Marginal probability:
Marginal probability is the probability of A, regardless of whether event B did or did not occur.
If B can be thought of as the event of a random variable X having a given outcome, the marginal
probability of A can be obtained by summing the joint probabilities over all outcomes for X.

Conditional probability
Let A and B be two events. The conditional probability of event A given that B has occurred is
denoted by P(A | B) and is defined as

P(A | B) = P(A ∩ B) / P(B) ; provided P(B) ≠ 0.

Similarly, P(B | A) = P(A ∩ B) / P(A) ; provided P(A) ≠ 0.

1. Math on Marginal, Joint and Conditional probability:


Q. The personnel department of a company has records which show the following analysis of its
200 engineers.
Age Bachelor’s degree Master’s degree Total
Under 30 90 10 100
30 to 40 20 30 50
over 40 40 10 50
Total 150 50 200

If one engineer is selected at random then calculate the probability that –


(i) he has only bachelor’s degree.
(ii) he has both degrees.
(iii) he has a master’s degree, given that he is over 40.
(iv) he is under 30, given that he has only bachelor’s degree.

Ans. (i) P(B) = n(B)/n(S) = 150/200

(ii) P(M) = n(M)/n(S) = 50/200

(iii) P(M | over 40) = n(M ∩ over 40)/n(over 40) = 10/50

(iv) P(under 30 | B) = n(under 30 ∩ B)/n(B) = 90/150
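
The four answers can be reproduced with a small Python sketch of the contingency table (a rough illustration; the dictionary layout is my own):

```python
# Marginal and conditional probabilities from the 200-engineer table above.
counts = {
    "under 30": {"bachelor": 90, "master": 10},
    "30 to 40": {"bachelor": 20, "master": 30},
    "over 40": {"bachelor": 40, "master": 10},
}
total = sum(sum(row.values()) for row in counts.values())  # 200 engineers

p_bachelor = sum(row["bachelor"] for row in counts.values()) / total   # (i)  150/200
p_master = sum(row["master"] for row in counts.values()) / total       # (ii)  50/200
p_master_given_over40 = counts["over 40"]["master"] / sum(counts["over 40"].values())                # (iii) 10/50
p_under30_given_bach = counts["under 30"]["bachelor"] / sum(r["bachelor"] for r in counts.values())  # (iv) 90/150

print(p_bachelor, p_master, p_master_given_over40, p_under30_given_bach)  # 0.75 0.25 0.2 0.6
```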

Probability Distribution

Random Variable
A variable whose values occur with certain probabilities is called a random variable.
Example- The average height of Bangladeshi boys is 5’6” with probability 0.71.

Types of random variable:


i. A discrete random variable (qualitative →count/frequency, quantitative →integer)
is a variable that represents numbers found by counting. For example: number of
marbles in a jar, number of students present or number of heads when tossing two
coins.
ii. When we have to use intervals for our random variable or all values in an interval are
possible, we call it a continuous random variable (quantitative →integer +
fractional). Example- height of a group of people or distance traveled while grocery
shopping or students test scores.

What is Probability Distribution?


Distribution of a random variable is called Probability Distribution.

➔ Types of probability distribution:


A probability distribution will be either discrete or continuous according as the random variable
is discrete or continuous.
Discrete Probability distribution:
1. Bernoulli distribution
2. Binomial distribution
3. Poisson distribution
4. Hypergeometric distribution
5. Geometric distribution
6. Negative binomial Distribution
Continuous probability distribution:
1. Normal distribution
2. Exponential distribution
3. Gamma distribution
4. Beta distribution
5. Log normal distribution

➔ Distinguish between probability and frequency distribution
Frequency distribution: Frequency is how many times outcomes in a certain category actually
occurred; when these frequencies are presented in tabular form, the table is called a frequency
distribution.
Probability distribution: Probability is how often one expects an outcome to happen; when these
probabilities are presented in tabular form, the table is called a probability distribution.

Discrete Probability distribution:


1. Binomial Distribution
A binomial distribution can be thought of as simply the probability of a SUCCESS or FAILURE
outcome in an experiment or survey that is repeated multiple times. The binomial is a type of
distribution that has two possible outcomes (the prefix “bi” means two, or twice).

P(X = x) = (𝑛𝐶𝑥 )𝑝 𝑥 𝑞 𝑛−𝑥 ; x = 0, 1, 2, … , n

Where,
X-> random variable , x -> value of random variable
n-> total no. of trials
p-> probability of success
q-> probability of failure
p+q=1.
** X~B (n,p)

Conditions/assumptions of binomial distribution:


i. An experiment will have two outcomes only.
ii. Each and every trial is independent and identically distributed.
iii. There must be a fixed number of trials.
iv. The probability of a success must remain the same for each trial.

Applications of binomial distribution


Many instances of binomial distributions can be found in real life. For example, if a new drug is
introduced to cure a disease, it either cures the disease (it’s successful) or it doesn’t cure the
disease (it’s a failure). If you purchase a lottery ticket, you’re either going to win money, or you
aren’t. Basically, anything you can think of that can only be a success or a failure can be
represented by a binomial distribution. A coin toss has only two possible outcomes, heads or
tails, and taking a test could have two possible outcomes: pass or fail.

Example: In a community the probability that a newly born child will be a boy is 2/5. Among 4
newly born children in that community, what is the probability that (a) all 4 are boys, (b) there is
no boy, (c) there is exactly one boy, (d) there are at least two boys, (e) there are at most two boys?
Also calculate the mean and variance.

Solution:
There are two possible outcomes, baby boy and baby girl, where the probability that a baby boy
is born is 2/5; this is the probability of success. The total number of newly born children is 4,
which is fixed. So this problem follows the binomial distribution.

Probability of success (baby boy), p = 2/5


Probability of failure (baby girl), q = 1- 2/5 = 3/5 [p+q=1]
Total no. of trials/sample size/total no. of newly born children, n = 4
The no. of baby boys is a random variable, X

The binomial distribution is


𝑷(𝑿 = 𝒙) = (𝒏𝑪𝒙 )𝒑𝒙 𝒒𝒏−𝒙

(a) Probability that all 4 are boys,

𝑃(𝑋 = 4) = (4C4)(2/5)^4 (3/5)^(4−4) = 16/625 = 0.0256

(b) Probability of no boys,

𝑃(𝑋 = 0) = (4C0)(2/5)^0 (3/5)^(4−0) = 81/625 = 0.1296

(c) Probability of exactly 1 boy,

𝑃(𝑋 = 1) = (4C1)(2/5)^1 (3/5)^(4−1) = 216/625 = 0.3456

(d) Probability of at least (minimum) two boys,

𝑃(𝑋 ≥ 2) = 𝑃(𝑋 = 2) + 𝑃(𝑋 = 3) + 𝑃(𝑋 = 4)
= (4C2)(2/5)^2 (3/5)^(4−2) + (4C3)(2/5)^3 (3/5)^(4−3) + (4C4)(2/5)^4 (3/5)^(4−4)
= 328/625 = 0.5248

(e) Probability of at most (maximum) two boys,

𝑃(𝑋 ≤ 2) = 𝑃(𝑋 = 2) + 𝑃(𝑋 = 1) + 𝑃(𝑋 = 0)
= (4C2)(2/5)^2 (3/5)^(4−2) + (4C1)(2/5)^1 (3/5)^(4−1) + (4C0)(2/5)^0 (3/5)^(4−0)
= 513/625 = 0.8208

For binomial distribution mean = np = 4*(2/5) =1.6 and
Variance = npq = 4*(2/5)*(3/5) =0.96

[Note: Use of calculator: (4𝐶2 )= 4 → shift → ÷ (𝑛𝐶𝑟 ) →2 → = Result]
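
The whole example can also be checked with a few lines of Python using the same pmf (a sketch, not part of the handout; `math.comb` plays the role of nCx):

```python
from math import comb

# Binomial example above: n = 4 births, probability of a boy p = 2/5.
n, p = 4, 2 / 5
q = 1 - p

def binom_pmf(x):
    """P(X = x) = (nCx) * p^x * q^(n - x)."""
    return comb(n, x) * p**x * q**(n - x)

print(binom_pmf(4))                            # (a) all four boys   = 0.0256
print(binom_pmf(0))                            # (b) no boy          = 0.1296
print(binom_pmf(1))                            # (c) exactly one boy = 0.3456
print(sum(binom_pmf(x) for x in range(2, 5)))  # (d) at least two    = 0.5248
print(sum(binom_pmf(x) for x in range(0, 3)))  # (e) at most two     = 0.8208
print(n * p, n * p * q)                        # mean = 1.6, variance = 0.96
```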

2. Poisson Distribution
Poisson distribution is applied in situations where there are a large number of independent
Bernoulli trials with a very small probability of success in any trial say p.

P(X = x) = e^(−λ) λ^x / x! ;  x = 0, 1, 2, …

Where
X->random variable, x -> value of random variable
λ -> the rate of occurrence (the average number of events per interval)
**X~P(λ)

Conditions/assumptions for poisson distribution:


i. The total number of trials, n, is large.
ii. The probability of occurrence, p, of the event in each trial is very small.

Applications of poisson distribution


i) The number of calls coming per minute into a hotel’s reservation center.
ii) The number of automobiles arriving at a traffic light within the hour.
iii) The number of births expected during the night in a hospital.
iv) The number of typographical errors found in a book.
v) The number of home runs hit in Major League Baseball games.
vi) The number of hurricanes in a given season.

Example- Suppose that the number of emergency patients in a given day at a certain hospital is a
poisson variable with parameter 20. What is the probability that in a given day there will be (a)
15 emergency patients, (b) no emergency patients (c) more than 20 but less than 25 patients.
Calculate mean and variance.

Solution:
The rate of emergency patients, λ = 20. (The parameter of the Poisson distribution is the rate of
occurrence, λ.)

The poisson distribution is


P(X = x) = e^(−λ) λ^x / x!
The number of emergency patients → X
(a) The probability that 15 emergency patients will come, P(X = 15) = e^(−20) 20^15 / 15! = 0.0516

(b) The probability that no emergency patients will come, P(X = 0) = e^(−20) 20^0 / 0! = 2.06 × 10^(−9)  [0! = 1]

(c) The probability that more than 20 but less than 25 emergency patients will come,

P(20 < X < 25) = e^(−20) 20^21/21! + e^(−20) 20^22/22! + e^(−20) 20^23/23! + e^(−20) 20^24/24! ≈ 0.284

For poisson distribution, Mean = variance = λ = 20
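
A short Python sketch of the same Poisson calculation (illustrative only; `lam` stands for the rate λ = 20):

```python
from math import exp, factorial

# Poisson example above: emergency patients per day with rate lam = 20.
lam = 20

def poisson_pmf(x):
    """P(X = x) = e^(-lam) * lam^x / x!"""
    return exp(-lam) * lam**x / factorial(x)

print(poisson_pmf(15))                             # (a) about 0.0516
print(poisson_pmf(0))                              # (b) about 2.06e-09
print(sum(poisson_pmf(x) for x in range(21, 25)))  # (c) P(20 < X < 25), about 0.284
print(lam, lam)                                    # mean = variance = 20
```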

Continuous probability distribution


Normal Distribution
A variable is said to be normally distributed or have a normal distribution if its distribution has
the shape of a normal curve - a special bell-shaped curve.
f(X = x) = (1 / (σ√(2π))) e^(−(1/2)((x − µ)/σ)²) ;  −∞ < x < ∞,  −∞ < µ < ∞,  σ > 0
** X ~ N(µ, σ²)

Properties of normal distribution


i) The normal curve is symmetrical about the mean μ and the mean, median, and mode are equal.
ii) The mean is at the middle and divides the area into halves.
iii) The total area under the curve is equal to 1.
iv) It is completely determined by its mean and standard deviation σ (or variance 𝜎 2 ).
v) The normal curve approaches, but never touches, the x-axis.

Applications of Normal distribution:


One reason the normal distribution is important is that many psychological and educational
variables are distributed approximately normally. Measures of reading ability, introversion, job
satisfaction, and memory are among the many psychological variables approximately normally
distributed.

Standard Normal Distribution


The standard normal distribution is a normal distribution with mean of 0, standard deviation of 1.
f(Z = z) = (1/√(2π)) e^(−z²/2)

Properties of standard normal distribution:
i) The normal curve is symmetrical about the mean μ = 0 and the mean, median, and mode are
equal.
ii) The mean is at the middle and divides the area into halves.
iii) The total area under the curve is equal to 1.
iv) It is completely determined by its mean and standard deviation σ = 1 (or variance 𝜎 2 ).
v) The normal curve approaches, but never touches, the x-axis.

Example: Let height be a random variable following a normal distribution with mean 5 feet and
standard deviation 0.14 feet. Now calculate the probability for the following:
(i) P(X<2.91) (ii) P(X>6) (iii) P(2.5<X<3.5)
(iv) P(Z<1.21) (v) P(Z>2) (vi) P(1<Z<2)

Solution:
Height is a random variable, X ~ N(µ ,𝜎 2 )
Where, µ → mean and 𝜎 2 → variance [𝜎 → standard deviation]
Z ~ N(0 ,1) [Z → standard normal variate. A variable follows standard normal distribution]

Z = (X − µ) / σ

(i) P(X < 2.91) = P((X − µ)/σ < (2.91 − 5)/0.14) = P(Z < −14.93) = 0

(ii) P(X > 6) = P((X − µ)/σ > (6 − 5)/0.14) = P(Z > 7.14) = 1 − P(Z < 7.14) ≈ 1 − 1 = 0

(iii) P(2.5 < X < 3.5) = P((2.5 − 5)/0.14 < (X − µ)/σ < (3.5 − 5)/0.14) = P(−17.86 < Z < −10.71)
= P(Z < −10.71) − P(Z < −17.86) = 0 − 0 = 0

(iv) P(Z < 1.21) = 0.8869

(v) P(Z > 2) = 1 − P(Z < 2.00) = 1 − 0.9772 = 0.0228

(vi) P(1 < Z < 2) = P(Z < 2) − P(Z < 1.00) = 0.9772 − 0.8413 = 0.1359
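
These probabilities can be checked in Python with the standard normal CDF written via the error function (a sketch with my own helper name `phi`; no statistics library is assumed):

```python
from math import erf, sqrt

# Standard normal CDF: P(Z < z) = 0.5 * (1 + erf(z / sqrt(2))).
def phi(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma = 5, 0.14  # height ~ N(5, 0.14^2) as in the example above

print(phi((2.91 - mu) / sigma))                           # (i)   P(X < 2.91)      ~ 0
print(1 - phi((6 - mu) / sigma))                          # (ii)  P(X > 6)         ~ 0
print(phi((3.5 - mu) / sigma) - phi((2.5 - mu) / sigma))  # (iii) P(2.5 < X < 3.5) ~ 0
print(phi(1.21))                                          # (iv)  ~ 0.8869
print(1 - phi(2))                                         # (v)   ~ 0.0228
print(phi(2) - phi(1))                                    # (vi)  ~ 0.1359
```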

Q. Let weight be a random variable following a normal distribution with mean 64 Kg and
standard deviation 1.43 Kg. Now calculate the probability for the following:
(i) P(X<64) (ii) P(X>77) (iii) P(60<X<72)
(iv) P(Z<0.08) (v) P(Z> - 2.74) (vi) P( - 1.58 <Z< 2.49)

Testing of Hypothesis

Hypothesis:
A hypothesis is basically an assumption that we make about a population parameter. Hypothesis
testing is a statistical method used to make statistical decisions about the population using
experimental (sample) data.

Types of hypothesis:
(i) Null Hypothesis: The statement that there is no difference between the population and
sample results is called the null hypothesis. It is denoted by H0.
(ii) Alternative hypothesis: Contrary to the null hypothesis, the statement that there is a
difference between the population and sample results is called the alternative hypothesis.
It is denoted by H1 or HA.

Two types of errors:


(i) Type I error / Level of significance: rejecting the null hypothesis although it is true.
The probability of a Type I error is denoted by alpha (α). In hypothesis testing, the
region of the normal curve that shows the critical region is called the α region.
(ii) Type II error: accepting the null hypothesis although it is false. The probability of a
Type II error is denoted by beta (β). In hypothesis testing, the region of the normal
curve that shows the acceptance region is called the β region.

Critical value:
In hypothesis testing, a critical value is a point on the test distribution that is compared to the test
statistic to determine whether to reject the null hypothesis.

Why do we need hypothesis test?


Hypothesis testing is an essential procedure in statistics. A hypothesis test evaluates two
mutually exclusive statements about a population to determine which statement is best supported
by the sample data. When we test based on sample, we want to find out an approximate answer
and for that we need to test using statistical hypothesis.
According to the San Jose State University Statistics Department, hypothesis testing is one of the
most important concepts in statistics because it is how you decide if something really happened,
or if certain treatments have positive effects, or if groups differ from each other or if one variable
predicts another. In short, you want to show whether your result is statistically significant and
unlikely to have occurred by chance alone. In essence, then, a hypothesis test is a test of significance.

Test statistic
A test statistic is a single measure of some attribute of a sample used in statistical hypothesis
testing.
There are different types of test statistics

1. z- test
2. t- test
3. Chi square test
4. F-test.

Mention some tests of hypothesis:


i. Single mean test
ii. Comparison of two sample mean
iii. Paired t-test
iv. Variance test
v. Proportion test etc.

Standard Normal Distribution


The standard normal distribution is a normal distribution with mean of 0, standard deviation of 1.
f(Z = z) = (1/√(2π)) e^(−z²/2)

Assumptions of Z-test:

• All sample observations are independent


• Sample size should be more than 30.
• Distribution of Z is normal, with a mean zero and variance 1.

The test statistic is

Z = (x̄ − µ) / (σ/√n)

where x̄ is the sample mean, σ is the population standard deviation, n is the sample size and µ is
the population mean.

t-distribution
The t distribution (also called Student’s t Distribution) is a family of distributions that look
almost identical to the normal distribution curve, only a bit shorter and fatter. The t distribution is
used instead of the normal distribution when you have small samples (for more on this, see: t-
score vs. z-score). The larger the sample size, the more the t distribution looks like the normal
distribution.

Assumptions of t-test:
• All data points are independent.
• The sample size is small. Generally, a sample of more than 30 units is regarded as large;
otherwise it is small, but it should not be fewer than 5 units for the t-test to be applied.
• Sample values are to be taken and recorded accurately.

The test statistic is

t = (x̄ − µ) / (s/√n)

where x̄ is the sample mean, s is the sample standard deviation, n is the sample size and µ is the
population mean.

Properties of the t Distribution


The t distribution has the following properties:
1. The mean of the distribution is equal to 0 .
2. The variance is equal to v / ( v - 2 ), where v is the degrees of freedom (see last section) and v
> 2.
3. The variance is always greater than 1, although it is close to 1 when there are many degrees of
freedom. With infinite degrees of freedom, the t distribution is the same as the standard normal
distribution.

Applications of Z-tests:
1. Test of hypothesis of the population mean
2. Test of Hypothesis of the Difference between Two Means
3. Test of hypothesis of the proportion.

Applications of t-tests:
1. Test of hypothesis of the population mean
2. Test of hypothesis of the difference between two means
3. Test of hypothesis of the difference between two means with dependent samples
4. Test of hypothesis about the coefficient of correlation

Comparison Chart
• Meaning: A t-test is a type of parametric test applied to identify how the means of two sets of
data differ from one another when the variance is not given, whereas a Z-test is a hypothesis
test which ascertains whether the means of two datasets differ from each other when the
variance is given.
• Based on: the t-test is based on the Student's t distribution; the Z-test on the normal distribution.
• Population variance: unknown for the t-test, known for the Z-test.
• Sample size: small (n < 30) for the t-test, large for the Z-test.

When Z or t-test will be used?
One of the important conditions for adopting t-test is that population variance is unknown.
Conversely, population variance should be known or assumed to be known in case of a z-test.
The Z-test is used when the sample size is large, i.e. n > 30, and the t-test is appropriate when the
sample size is small, in the sense that n < 30.

Q. Write down the steps to hypothesis test of a single mean.


Solution: 1st step: H0: µ = c
H1: µ ≠ c

2nd step: At α level of significance we conduct test.


3rd step: Test statistics
(If population variance (σ2) is known → Z-test OR if unknown but sample size is
greater than or equal to 30 → Z-test) OR
(If population variance (σ2) is unknown but sample size is less than 30 → t-test)

Population variance (σ²) or population standard deviation (σ):

Case I: Known. Use the Z-test, whatever the sample size.

Case II: Unknown. If the sample size n ≥ 30, use the Z-test; if n < 30, use the t-test.

4th step: Calculation


where, population mean → µ
population variance → σ2
population standard deviation → σ
sample mean→𝑥̅
sample size → n
sample variance → s2
sample standard deviation → s
using the above information we will calculate Zcal or tcal.

5th step: Critical value identification


If Zcal < Ztab then we may accept H0 (otherwise Zcal > Ztab then we may not accept H0 )
OR, If tcal < ttab then we may accept H0 (otherwise tcal > ttab then we may not accept H0 )

6th step: Decision


Make a valid conclusion based on step-5.
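
The choice of test statistic in steps 3 to 5 can be summarised in a tiny Python helper (a sketch of the handout's decision rule; the function name is mine):

```python
# Decision rule from step 3: which test statistic to use for a single-mean test.
def choose_test(population_variance_known: bool, n: int) -> str:
    if population_variance_known:
        return "Z-test"                       # sigma^2 known, any sample size
    return "Z-test" if n >= 30 else "t-test"  # sigma^2 unknown: depends on n

print(choose_test(True, 11))    # Z-test
print(choose_test(False, 11))   # t-test  (sigma^2 unknown, n < 30)
print(choose_test(False, 30))   # Z-test  (sigma^2 unknown, n >= 30)
```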

Example-1. A simple random sample of 11 observations is selected from a population with
mean 25 and variance 6.82 and found the sample mean 20.5 and sample variance 8.75. Do you
think that the sample is selected from a population having mean 25? Also find 95% confidence
interval for µ.
Solution:
1st step: H0: µ = 25
H1: µ ≠ 25

2nd step: At α = 0.05 level of significance we conduct test.

3rd step: Test statistics


Z = |x̄ − µ| / (σ/√n)

4th step: Calculation


where, population mean → µ =25
population variance → σ2 = 6.82
population standard deviation → σ = 2.61
sample mean→𝑥̅ = 20.5
sample size → n = 11
sample variance → s2 = 8.75
sample standard deviation → s = 2.96

Zcal = |x̄ − µ| / (σ/√n) = |20.5 − 25| / (2.61/√11) = 5.7

5th step: Critical region identification


Here Zcal = 5.7 > Ztab = 1.96, so we may not accept H0

6th step: Decision


We conclude that the sample is not drawn from a population with mean 25.
95% confidence interval for µ:
Pr(x̄ − Z_(α/2) σ/√n ≤ µ ≤ x̄ + Z_(α/2) σ/√n) = 1 − α
=> Pr(20.5 − 1.96(2.61/√11) ≤ µ ≤ 20.5 + 1.96(2.61/√11)) = 1 − 0.05
=> Pr(18.96 ≤ µ ≤ 22.04) = 0.95
So, C.I. = (18.96, 22.04)
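
Example 1 can be reproduced with a few lines of Python (a sketch; 1.96 is the usual two-sided critical value at α = 0.05):

```python
from math import sqrt

# Example 1: Z-test for a single mean with known population variance.
mu0, sigma = 25, sqrt(6.82)   # hypothesised mean, known population SD
x_bar, n = 20.5, 11           # sample mean, sample size

z_cal = abs(x_bar - mu0) / (sigma / sqrt(n))
z_tab = 1.96                  # two-sided critical value at alpha = 0.05

print(round(z_cal, 1), z_cal > z_tab)  # 5.7 True -> do not accept H0

half_width = z_tab * sigma / sqrt(n)
print(round(x_bar - half_width, 2), round(x_bar + half_width, 2))  # 95% CI: 18.96 22.04
```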

Example-2. A simple random sample of 11 observations is selected from a population with


mean 25 and found the sample mean 20.5 and sample variance 8.75. Do you think that the
sample is selected from a population having mean 25? Also find the confidence interval for µ.

Solution:
1st step: H0: µ = 25
H1: µ ≠ 25

2nd step: At α = 0.05 level of significance we conduct test.

3rd step: Test statistics


t = |x̄ − µ| / (s/√n)

4th step: Calculation


where, population mean → µ =25
sample mean→𝑥̅ = 20.5
sample size → n = 11
sample variance → s2 = 8.75
sample standard deviation → s = 2.96

tcal = |x̄ − µ| / (s/√n) = |20.5 − 25| / (2.96/√11) = 5.04

5th step: Critical region identification


Here tcal > ttab = 2.228 so, we may not accept H0

6th step: Decision


We conclude that the sample is not drawn from a population with mean 25.

95% confidence interval for µ:


Pr(x̄ − t_(α/2, n−1) s/√n ≤ µ ≤ x̄ + t_(α/2, n−1) s/√n) = 1 − α
=> Pr(20.5 − 2.228(2.96/√11) ≤ µ ≤ 20.5 + 2.228(2.96/√11)) = 1 − 0.05
=> Pr(18.51 ≤ µ ≤ 22.49) = 0.95
So, C.I. = (18.51, 22.49)
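
Example 2 in the same style (a sketch; 2.228 is the two-sided t critical value for 10 degrees of freedom at α = 0.05, taken from a t-table):

```python
from math import sqrt

# Example 2: t-test for a single mean, population variance unknown, n < 30.
mu0 = 25
x_bar, n = 20.5, 11
s = sqrt(8.75)                # sample standard deviation

t_cal = abs(x_bar - mu0) / (s / sqrt(n))
t_tab = 2.228                 # t critical value, alpha = 0.05, df = n - 1 = 10

print(round(t_cal, 2), t_cal > t_tab)  # about 5.05 True -> do not accept H0 (5.04 if s is rounded to 2.96 first)

half_width = t_tab * s / sqrt(n)
print(round(x_bar - half_width, 2), round(x_bar + half_width, 2))  # 95% CI: about 18.51 22.49
```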

Example-3. A simple random sample of 30 observations is selected from a population with


mean 25 and found the sample mean 20.5 and sample variance 8.75. Do you think that the
sample is selected from a population having mean 25? Also find the confidence interval for µ.

Solution:
1st step: H0: µ = 25
H1: µ ≠ 25

2nd step: At α = 0.05 level of significance we conduct test.

3rd step: Test statistics
Z = |x̄ − µ| / (s/√n)

4th step: Calculation


where, population mean → µ =25
sample mean→𝑥̅ = 20.5
sample size → n = 30
sample variance → s2 = 8.75
sample standard deviation → s = 2.96

Zcal = |x̄ − µ| / (s/√n) = |20.5 − 25| / (2.96/√30) = 8.33

5th step: Critical region identification


Here Zcal = 8.33 > Ztab = 1.96, so we may not accept H0

6th step: Decision


We conclude that the sample is not drawn from a population with mean 25.

95% confidence interval for µ:


Pr(x̄ − Z_(α/2) s/√n ≤ µ ≤ x̄ + Z_(α/2) s/√n) = 1 − α
=> Pr(20.5 − 1.96(2.96/√30) ≤ µ ≤ 20.5 + 1.96(2.96/√30)) = 1 − 0.05
=> Pr(19.44 ≤ µ ≤ 21.56) = 0.95
So, C.I. = (19.44, 21.56)

Sampling
Sampling is simply the process of learning about the population on the basis of a sample drawn
from it. In the sampling technique, instead of every unit of the universe, only a part of the universe
is studied, and conclusions are drawn on that basis for the entire universe. Although the theory of
sampling has developed only in recent years, the idea of sampling is quite old. Since time
immemorial people have examined a handful of grains to ascertain the quality of an entire lot. A
housewife examines only two or three grains of boiling rice to know whether the pot of rice is
ready or not. A businessman places orders for material after examining only a small sample of it.
Why is sampling necessary?
The following are the reasons for sampling:
1. To bring the population to a manageable number
2. To reduce cost
3. To help in minimizing errors arising from the respondents due to the large number in the population
4. Sampling helps the researcher to meet up with the challenge of time.
An overview of sampling technique:
Two types of sampling-
i. Probability sampling
ii. Non-probability sampling
i. Probability Sampling is a method wherein each member of the population has the
same probability of being a part of the sample.
(Simple random Sampling, Stratified random sampling, Systematic sampling, Cluster
sampling)
ii. Non-probability Sampling is a method wherein each member of the population does
not have an equal chance of being selected. When the researcher desires to choose
members selectively, non-probability sampling is considered. Both sampling
techniques are frequently utilized. However, one works better than others depending
on research needs.
(Quota sampling, Purposive sampling, Snowball sampling, Convenience sampling)

Simple Random Sampling:


A simple random sample is a set of n objects in a population of N objects where all possible
samples are equally likely to happen. Here’s a basic example of how to get a simple random
sample: put 100 numbered bingo balls into a bowl (this is the population N). Select 10 balls from
the bowl without looking (this is your sample n). Note that it’s important not to look as you could
(unknowingly) bias the sample. While the “lottery bowl” method can work fine for smaller
populations, in reality you’ll be dealing with much larger populations.
Stratified Random Sampling:
• The population consists of N elements.
• The population is divided into H groups, called strata.
• Each element of the population can be assigned to one, and only one, stratum/group.

• The number of observations within each stratum Nh is known, and N = N1 + N2 + N3 +
... + NH-1 + NH.
• The researcher obtains a probability sample from each stratum.
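
A rough Python sketch of stratified random sampling under proportional allocation; the strata, their sizes and the 10% sampling fraction below are purely illustrative:

```python
import random

# Stratified random sampling: draw a simple random sample independently from each stratum.
strata = {
    "under 30": list(range(100)),  # N1 = 100 units (illustrative)
    "30 to 40": list(range(50)),   # N2 = 50
    "over 40": list(range(50)),    # N3 = 50
}
fraction = 0.10                    # proportional allocation: sample 10% of each stratum

stratified_sample = {
    name: random.sample(units, max(1, int(fraction * len(units))))
    for name, units in strata.items()
}
print({name: len(s) for name, s in stratified_sample.items()})  # sizes 10, 5, 5
```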

What are the differences between probability and non-probability sampling?


• Meaning: Probability sampling is a sampling technique in which the subjects of the population
get an equal opportunity to be selected as a representative sample, whereas non-probability
sampling is a method of sampling wherein it is not known which individual from the population
will be selected as a sample.
• Alternately known as: random sampling (probability) and non-random sampling (non-probability).
• Basis of selection: random for probability sampling, arbitrary for non-probability sampling.
• Opportunity of selection: fixed and known, versus not specified and unknown.
• Research: conclusive versus exploratory.
• Result: unbiased versus biased.
• Method: objective versus subjective.
• Inferences: statistical versus analytical.
• Hypothesis: tested versus generated.

Differences between simple random sampling and stratified sampling


Simple random sampling:
1. The population is not divided into groups.
2. Each unit has an equal probability of being selected.
3. It is less efficient.

Stratified random sampling:
1. The population is divided into several groups (strata).
2. Each unit does not have an equal probability of being selected.
3. It is more efficient than simple random sampling.

Census:
A census is the procedure of systematically acquiring and recording information about the
members of a given population. Under this method, data is collected for each and every unit viz.
person, household, field, shop, factory etc…, as the case may be of the population or universe.
For example, if the average wage of workers in a sugar factory is to be calculated, the figure
would be obtained by dividing the total wages received by all the workers by the number of
workers in the factory.

Differences between census and sampling:
• A census measures everyone in the whole country, whereas a sample is a portion of the whole country.
• In a census, each and every unit of the population is studied; in sampling, only a few units of
the population are studied.
• A census refers to the periodic collection of information from the entire population; if the next
census is far away, sampling is the most convenient method of obtaining data about the population.
• The census method demands a large amount of finance, time and labor, while relatively less
finance, time and labor are required for sampling.
• Results obtained by a census are quite reliable, while results obtained by sampling are less reliable.
• The census method is more suitable when the population is heterogeneous in nature, while
sampling is more suitable when the population is homogeneous in nature.
• There is no margin of error in a census, as each and every part of the geographical area has to
be approached for data collection; samples have a margin of error, though it gets lower as the
sample size increases.

Study question:
1. Define
a) Census
b) Sampling technique
c) Population
2. What are the necessities of sampling?
3. Write down the classification of sampling.
4. Write down the steps
a) Simple random sampling
b) Stratified random sampling

