
Probability and Statistics

Gujarat Technological University 2019


About the Authors

Ravish R Singh is presently Director at Thakur Ramnarayan College
of Arts and Commerce, Mumbai. He obtained a BE degree from
University of Mumbai in 1991, an MTech degree from IIT Bombay
in 2001, and a PhD degree from Faculty of Technology, University
of Mumbai, in 2013. He has published several books with McGraw
Hill Education (India) on varied subjects like Engineering
Mathematics, Applied Mathematics, Electrical Networks, Network
Analysis and Synthesis, Electrical Engineering, Basic Electrical and
Electronics Engineering, etc., for all-India curricula as well as regional curricula of
some universities like Gujarat Technological University, Mumbai University, Pune
University, Jawaharlal Nehru Technological University, Anna University, Uttarakhand
Technical University, and Dr A P J Abdul Kalam Technical University. Dr Singh is a
member of IEEE, ISTE, and IETE, and has published research papers in national and
international journals. His fields of interest include Circuits, Signals and Systems, and
Engineering Mathematics.

Mukul Bhatt is presently Assistant Professor at Department of
Mathematics and Statistics, Thakur Ramnarayan College of Arts and
Commerce, Mumbai. She obtained her MSc (Mathematics) degree
from H N B Garhwal University in 1992, and a PhD degree from
Faculty of Science, PAHER University, Udaipur, Rajasthan in 2017.
She has published several books with McGraw Hill Education (India)
Private Limited on Engineering Mathematics and Applied Mathematics
for all-India curricula as well as regional curricula of some universities like Gujarat
Technological University, Mumbai University, Pune University, Jawaharlal Nehru
Technological University, Anna University, Uttarakhand Technical University, and
Dr A P J Abdul Kalam Technical University. Dr Bhatt has twenty-six years of teaching
experience at various levels in engineering colleges and her fields of interest include
Integral Calculus, Complex Analysis, and Operations Research. She is a member of
ISTE.
Probability and Statistics
Gujarat Technological University 2019

Ravish R Singh
Director
Thakur Ramnarayan College of Arts & Commerce
Mumbai, Maharashtra

Mukul Bhatt
Assistant Professor
Thakur Ramnarayan College of Arts & Commerce
Mumbai, Maharashtra

McGraw Hill Education (India) Private Limited


Published by McGraw Hill Education (India) Private Limited
444/1, Sri Ekambara Naicker Industrial Estate, Alapakkam, Porur, Chennai 600 116
Probability and Statistics, GTU–2019
Copyright © 2019 by McGraw Hill Education (India) Private Limited.
No part of this publication may be reproduced or distributed in any form or by any means, electronic,
mechanical, photocopying, recording, or otherwise or stored in a database or retrieval system without the
prior written permission of the publishers. The program listings (if any) may be entered, stored and executed
in a computer system, but they may not be reproduced for publication.
This edition can be exported from India only by the publishers,
McGraw Hill Education (India) Private Limited.
ISBN (13): 978-93-5316-755-4
ISBN (10): 93-5316-755-8
1 2 3 4 5 6 7 8 9 D103074 23 22 21 20 19  
Printed and bound in India.
Managing Director: Lalit Singh
Senior Portfolio Manager—Higher Education: Hemant K Jha
Portfolio Manager—Higher Education: Navneet Kumar
Production Head: Satinder S Baveja
Assistant Manager—Production: Anuj K Shriwastava
General Manager—Production: Rajender P Ghansela
Manager—Production: Reji Kumar

Information contained in this work has been obtained by McGraw Hill Education (India), from sources
believed to be reliable. However, neither McGraw Hill Education (India) nor its authors guarantee the accuracy
or completeness of any information published herein, and neither McGraw Hill Education (India) nor its
authors shall be responsible for any errors, omissions, or damages arising out of use of this information. This
work is published with the understanding that McGraw Hill Education (India) and its authors are supplying
information but are not attempting to render engineering or other professional services. If such services are
required, the assistance of an appropriate professional should be sought.

Visit us at: www.mheducation.co.in


Write to us at: [email protected]
CIN: U80302TN2010PTC111532
Toll Free Number: 1800 103 5875
Dedicated
to
Aman and Aditri
Ravish R Singh

Soumya and Siddharth


Mukul Bhatt
Contents
Preface
Roadmap to the Syllabus
1. Probability
1.1 Introduction
1.2 Some Important Terms and Concepts
1.3 Definitions of Probability
1.4 Theorems on Probability
1.5 Conditional Probability
1.6 Multiplicative Theorem for Independent Events
1.7 Bayes’ Theorem
2. Random Variables
2.1 Introduction
2.2 Random Variables
2.3 Probability Mass Function
2.4 Discrete Distribution Function
2.5 Probability Density Function
2.6 Continuous Distribution Function
2.7 Two-Dimensional Discrete Random Variables
2.8 Two-Dimensional Continuous Random Variables
3. Basic Statistics
3.1 Introduction
3.2 Measures of Central Tendency
3.3 Measures of Dispersion
3.4 Moments
3.5 Skewness
3.6 Kurtosis
3.7 Measures of Statistics for Continuous Random Variables
3.8 Expected Values of Two Dimensional Random Variables
3.9 Bounds on Probabilities
3.10 Chebyshev’s Inequality
4. Correlation and Regression
4.1 Introduction
4.2 Correlation
4.3 Types of Correlations
4.4 Methods of Studying Correlation
4.5 Scatter Diagram
4.6 Simple Graph
4.7 Karl Pearson’s Coefficient of Correlation
4.8 Properties of Coefficient of Correlation
4.9 Rank Correlation
4.10 Regression
4.11 Types of Regression
4.12 Methods of Studying Regression
4.13 Lines of Regression
4.14 Regression Coefficients
4.15 Properties of Regression Coefficients
4.16 Properties of Lines of Regression (Linear Regression)
5. Some Special Probability Distributions
5.1 Introduction
5.2 Binomial Distribution
5.3 Poisson Distribution
5.4 Normal Distribution
5.5 Exponential Distribution
5.6 Gamma Distribution
6. Applied Statistics: Test of Hypothesis
6.1 Introduction
6.2 Terms Related to Tests of Hypothesis
6.3 Procedure for Testing of Hypothesis
6.4 Test of Significance for Large Samples
6.5 Test of Significance for Single Proportion – Large Samples
6.6 Test of Significance for Difference of Proportions – Large Samples
6.7 Test of Significance for Single Mean – Large Samples
6.8 Test of Significance for Difference of Means – Large Samples
6.9 Test of Significance for Difference of Standard Deviations – Large Samples
6.10 Small Sample Tests
6.11 Student’s t-distribution
6.12 t-test: Test of Significance for Single Mean
6.13 t-test: Test of Significance for Difference of Means
6.14 t-test: Test of Significance for Correlation Coefficients
6.15 Snedecor’s F-test for Ratio of Variances
6.16 Chi-square ($\chi^2$) Test
6.17 Chi-square Test: Goodness of Fit
6.18 Chi-square Test for Independence of Attributes
7. Curve Fitting
7.1 Introduction
7.2 Least Square Method
7.3 Fitting of Linear Curves
7.4 Fitting of Quadratic Curves
7.5 Fitting of Exponential and Logarithmic Curves

Appendix

Index
Preface
Probability and Statistics is a key area of study in any engineering course. A sound
knowledge of this subject will help engineering students develop analytical skills, and
thus enable them to solve numerical problems encountered in real life, as well as apply
mathematical principles to physical problems, particularly in the field of engineering.

Users
This book is designed for GTU engineering students pursuing the course
Probability and Statistics (Subject Code: 3130006) in their 3rd Semester. It covers the
complete GTU syllabus for the course on Probability and Statistics.

Objective
The crisp and complete explanation of topics will help students easily understand the basic
concepts. The tutorial approach (i.e., teach by example) followed in the text will enable
students to develop a logical perspective on solving problems.

Features
Each topic has been explained from the examination point of view, wherein the theory
is presented in an easy-to-understand, student-friendly style. Full coverage of concepts is
supported by numerous solved examples of varied complexity levels, aligned
to the latest GTU syllabus. Fundamental and sequential explanation of topics is well
aided by examples and exercises. The solutions of examples are set following a ‘tutorial’
approach, which will make it easy for students from any background to grasp the
concepts. Exercises with answers immediately follow the solved examples, reinforcing a
practice-based approach. We hope that the students will gain logical understanding from
solved problems and then reinforce it by solving similar exercise problems themselves.
The unique blend of theory and application caters to the requirements of both the students
and the faculty.

Highlights
• Crisp content strictly as per the latest GTU syllabus of Probability and Statistics
• Comprehensive coverage with lucid presentation style
• Each section concludes with an exercise to test understanding of topics
• Rich exam-oriented pedagogy:
  - Solved examples within chapters: 360+
  - Unsolved exercises: 330+

Chapter Organization
The content spans the following 7 chapters, which wholly and sequentially cover each
module of the syllabus.
• Chapter 1 introduces Probability.
• Chapter 2 discusses Random Variables.
• Chapter 3 presents Basic Statistics.
• Chapter 4 covers Correlation and Regression.
• Chapter 5 deals with Some Special Probability Distributions.
• Chapter 6 presents Applied Statistics: Test of Hypothesis.
• Chapter 7 presents Curve Fitting.

Acknowledgements
We are grateful to the following reviewers who reviewed sample chapters of the book and
generously shared their valuable comments:

D.M. Diwan                         Government Engineering College, Gandhinagar
Vijay Makwana                      Government Engineering College, Patan
Manokamana Pawan Kumar Agarwal     Silver Oak College of Engineering and Technology, Ahmedabad
Vijaykumar Ramanlal Visavaliya     Government Engineering College, Valsad
Ankit Rawal                        Shree Swaminarayan Institute of Technology, Ahmedabad

We would also like to thank all the staff at McGraw Hill Education (India), especially Navneet
Kumar, Hemant K Jha, Satinder Singh Baveja and Anuj Shriwastava for coordinating with
us during the editorial, copyediting, and production stages of this book.
Our acknowledgements would be incomplete without a mention of the contribution of
all our family members. We extend our heartfelt thanks to them for always motivating and
supporting us throughout the project.
Constructive suggestions for the improvement of the book will always be welcome.
Ravish R Singh
Mukul Bhatt

Publisher’s Note
Remember to write to us. We look forward to receiving your feedback, comments and
ideas to enhance the quality of this book. You can reach us at [email protected].
Please mention the title and authors’ name as the subject. In case you spot piracy of this
book, please do let us know.
Roadmap to the Syllabus
Probability and Statistics
Subject Code: 3130006

Unit-I: Basic Probability


Experiment, definition of probability, conditional probability, independent
events, Bayes’ rule, Bernoulli trials, Random variables, discrete random variable,
probability mass function, continuous random variable, probability density function,
cumulative distribution function, properties of cumulative distribution function,
Two dimensional random variables and their distribution functions, Marginal
probability function, Independent random variables.

Go To CHAPTER 1: Probability and CHAPTER 2: Random Variables

Unit-2: Some Special Probability Distributions


Binomial distribution, Poisson distribution, Poisson approximation to the binomial
distribution, Normal, Exponential and Gamma densities, Evaluation of statistical
parameters for these distributions.

Go To CHAPTER 5: Some Special Probability Distributions

Unit-3: Basic Statistics


Measure of central tendency: Moments, Expectation, dispersion, skewness,
kurtosis, expected value of two dimensional random variable, Linear Correlation,
correlation coefficient, rank correlation coefficient, Regression, Bounds on
probability, Chebyshev’s Inequality

Go To CHAPTER 3: Basic Statistics and CHAPTER 4: Correlation and Regression

Unit-4: Applied Statistics


Formation of Hypothesis, Test of significance: Large sample test for single
proportion, Difference of proportions, Single mean, Difference of means, and
Difference of standard deviations. Test of significance for Small samples: t-test for
single mean, difference of means, t-test for correlation coefficients, F-test for ratio
of variances, Chi-square test for goodness of fit and independence of attributes.

Go To CHAPTER 6: Applied Statistics: Test of Hypothesis

Unit-5: Curve Fitting by the Numerical Method


Curve fitting by the method of least squares, fitting of straight lines, second degree
parabola and more general curves

Go To CHAPTER 7: Curve Fitting


CHAPTER 1
Probability
Chapter Outline
1.1 Introduction
1.2 Some Important Terms and Concepts
1.3 Definitions of Probability
1.4 Theorems on Probability
1.5 Conditional Probability
1.6 Multiplicative Theorem for Independent Events
1.7 Bayes’ Theorem

1.1 Introduction

The concept of probability originated from the analysis of the games of chance. Even
today, a large number of problems exist which are based on the games of chance, such
as tossing of a coin, throwing of dice, and playing of cards. The utility of probability
in business and economics is most emphatically revealed in the field of predictions for
the future. Probability is a concept which measures the degree of uncertainty and that
of certainty as a corollary.
The word probability or ‘chance’ is used commonly in day-to-day life. Daily, we come
across sentences like ‘it may rain today’, ‘India may win the forthcoming cricket
match against Sri Lanka’, ‘the chances of making profits by investing in shares of
Company A are very bright’, etc. Each of the above sentences involves an element
of uncertainty. A numerical measure of uncertainty is provided by a very important
branch of mathematics called the theory of probability. Before we study probability
theory in detail, it is appropriate to explain certain terms which are essential for the
study of the theory of probability.

1.2 Some Important Terms and Concepts


1. Random Experiment If an experiment is conducted, any number of times,
under identical conditions, there is a set of all possible outcomes associated with it.
If the outcome is not unique but may be any one of the possible outcomes, the
experiment is called a random experiment, e.g., tossing a coin, throwing a dice.

2. Outcome The result of a random experiment is called an outcome. For example,
consider the following:
(a) Suppose a random experiment is ‘a coin is tossed’. This experiment gives two
possible outcomes: head or tail.
(b) Suppose a random experiment is ‘a dice is thrown’. This experiment gives six
possible outcomes (1, 2, 3, 4, 5 or 6) on the uppermost face of a dice.
3. Trial and Event Any particular performance of a random experiment is called
a trial, and an outcome or a combination of outcomes is called an event. For example,
consider the following:
(a) Tossing of a coin is a trial, and getting a head or tail is an event.
(b) Throwing of a dice is a trial, and getting 1 or 2 or 3 or 4 or 5 or 6 is an event.
4. Exhaustive Event The total number of possible outcomes of a random experiment
is called an exhaustive event. For example, consider the following:
(a) In tossing of a coin, there are two exhaustive events, viz., head and tail.
(b) In throwing of a dice, there are six exhaustive events, getting 1 or 2 or 3 or 4 or
5 or 6.
5. Mutually Exclusive Events Events are said to be mutually exclusive if the
occurrence of one of them precludes the occurrence of all others in the same trial, i.e.,
they cannot occur simultaneously. For example, consider the following:

(a) In tossing a coin, the events head or tail are mutually exclusive since both head
and tail cannot occur at the same time.
(b) In throwing a dice, all the six events, i.e., getting 1 or 2 or 3 or 4 or 5 or 6 are
mutually exclusive events.
6. Equally Likely Events The outcomes of a random experiment are said to be
equally likely if the occurrence of none of them is expected in preference to others. For
example, consider the following:

(a) In tossing a coin, head or tail are equally likely events.


(b) In throwing a dice, all the six faces are equally likely events.
7. Independent Events Events are said to be independent if the occurrence of an
event does not have any effect on the occurrence of other events. For example,
consider the following:
(a) In tossing a coin, the event of getting a head in the first toss is independent of
getting a head in the second, third, and subsequent tosses.
(b) In throwing a dice, the result of the first throw does not affect the result of the
second throw.
8. Favourable Events The favourable events in a random experiment are the
outcomes which entail the occurrence of the event. For example, consider
the following:

In throwing of two dice, the favourable events of getting the sum 5 are (1, 4), (4, 1),
(2, 3), (3, 2), i.e., 4 in number.

1.3 Definitions of Probability

1.3.1 Classical Definition of Probability


Let n be the number of equally likely, mutually exclusive, and exhaustive outcomes
of a random experiment. Let m be the number of outcomes which are favourable to
the occurrence of an event A. The probability of event A occurring, denoted by P(A),
is given by
$P(A) = \frac{\text{Number of outcomes favourable to } A}{\text{Number of exhaustive outcomes}} = \frac{m}{n}$
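The classical definition can be sketched in Python. This is only an illustration, assuming a finite sample space whose outcomes are equally likely; the helper name `classical_probability` and the dice event are hypothetical and not part of the text.

```python
from fractions import Fraction

def classical_probability(sample_space, event):
    """Return m/n, where m counts favourable and n counts exhaustive outcomes."""
    favourable = [outcome for outcome in sample_space if outcome in event]
    return Fraction(len(favourable), len(sample_space))

# Illustrative case: an even number on one throw of a fair dice.
dice = [1, 2, 3, 4, 5, 6]
p_even = classical_probability(dice, {2, 4, 6})
print(p_even)  # 1/2
```

Using `Fraction` keeps the result as an exact ratio m/n, which matches the way the solved examples report their answers.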

1.3.2 Empirical or Statistical Definition of Probability


If an experiment is repeated a large number of times under identical conditions, the
limiting value of the ratio of the number of times the event A occurs to the total number
of trials of the experiment, as the number of trials increases indefinitely, is called the
probability of occurrence of the event A.
Let P(A) be the probability of occurrence of the event A. Let m be the number of times
the event A occurs in a series of n trials. Then
$P(A) = \lim_{n \to \infty} \frac{m}{n}$, provided the limit is finite and unique.
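The empirical definition can be illustrated with a small simulation. The sketch below assumes a fair coin simulated with Python's pseudo-random generator; the function name `relative_frequency`, the fixed seed, and the trial counts are illustrative choices, and a simulation can only suggest the limiting behaviour, not prove it.

```python
import random

def relative_frequency(n_trials, seed=0):
    """Ratio m/n: heads observed (m) in n_trials simulated fair-coin tosses."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    heads = sum(rng.random() < 0.5 for _ in range(n_trials))
    return heads / n_trials

# The ratio m/n tends to settle near 1/2 as n grows.
for n in (10, 1_000, 100_000):
    print(n, relative_frequency(n))
```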

1.3.3 Axiomatic Definition of Probability


Before discussing the axiomatic definition of probability, it is necessary to explain
certain concepts that are necessary to its understanding.
1. Sample Space A set of all possible outcomes of a random experiment is called
a sample space. Each element of the set is called a sample point or a simple event or
an elementary event.
The sample space of a random experiment is denoted by S. For example, consider the
following:

(a) In a random experiment of tossing of a coin, the sample space consists of two
elementary events.
S = {H, T}

(b) In a random experiment of throwing of a dice, the sample space consists of six
elementary events.
S = {1, 2, 3, 4, 5, 6}
  The elements of S can either be single elements or ordered pairs. If two coins are
tossed, each element of the sample space consists of the following ordered pairs:
S = {(H, H), (H, T), (T, H), (T, T)}
2. Event Any subset of a sample space is called an event. In the experiment of
throwing of a dice, the sample space is S = {1, 2, 3, 4, 5, 6}. Let A be the event that
an odd number appears on the dice. Then A = {1, 3, 5} is a subset of S. Similarly, let
B be the event of getting a number greater than 3. Then B = {4, 5, 6} is another subset
of S.

Definition of Probability Let S be a sample space of an experiment and A be
any event of this sample space. The probability P(A) of the event A is defined as a
real-valued set function which associates a real value with each subset A of the
sample space S. The probability P(A) satisfies the following three axioms.

Axiom I: P(A) ≥ 0, i.e., the probability of an event is a nonnegative number.


Axiom II: P(S) = 1, i.e., the probability of an event that is certain to occur must be
equal to unity.
Axiom III: If $A_1, A_2, \ldots, A_n$ are a finite number of mutually exclusive events, then
$P(A_1 \cup A_2 \cup \cdots \cup A_n) = P(A_1) + P(A_2) + \cdots + P(A_n) = \sum_{i=1}^{n} P(A_i)$

i.e., the probability of a union of mutually exclusive events is the sum of probabilities
of the events themselves.

Example 1
What is the probability that a leap year selected at random will have
53 Sundays?
Solution
A leap year has 366 days, i.e., 52 weeks and 2 days. These 2 days can occur in the
following possible ways:
(i) Monday and Tuesday (ii) Tuesday and Wednesday
(iii) Wednesday and Thursday (iv) Thursday and Friday
(v) Friday and Saturday (vi) Saturday and Sunday
(vii) Sunday and Monday
Number of exhaustive cases n = 7
Number of favourable cases m = 2
Let A be the event of getting 53 Sundays in a leap year.
$P(A) = \frac{m}{n} = \frac{2}{7}$
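The counting in Example 1 can be checked by listing the seven possible pairs of extra days. This is a minimal sketch; the variable names are illustrative, and it assumes, as the solution does, that the seven pairs are equally likely.

```python
from fractions import Fraction

days = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

# The 2 extra days are consecutive, so there are 7 possible pairs.
pairs = [(days[i], days[(i + 1) % 7]) for i in range(7)]
favourable = [pair for pair in pairs if "Sunday" in pair]

p_53_sundays = Fraction(len(favourable), len(pairs))
print(p_53_sundays)  # 2/7
```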

Example 2
Three unbiased coins are tossed. Find the probability of getting
(i) exactly two heads, (ii) at least one tail, (iii) at most two heads, (iv) a
head on the second coin, and (v) exactly two heads in succession.
Solution
When three coins are tossed, the sample space S is given by
  S = {HHH, HTH, THH, HHT, TTT, THT, TTH, HTT}
n(S) = 8
(i) Let A be the event of getting exactly two heads.
A = {HTH, THH, HHT}
n(A) = 3
$P(A) = \frac{n(A)}{n(S)} = \frac{3}{8}$
(ii) Let B be the event of getting at least one tail.
B = {HTH, THH, HHT, TTT, THT, TTH, HTT}
n(B) = 7
$P(B) = \frac{n(B)}{n(S)} = \frac{7}{8}$
(iii) Let C be the event of getting at most two heads.
C = {HTH, THH, HHT, TTT, THT, TTH, HTT}
n(C) = 7
$P(C) = \frac{n(C)}{n(S)} = \frac{7}{8}$
(iv) Let D be the event of getting a head on the second coin.
D = {HHH, THH, HHT, THT}
n(D) = 4
$P(D) = \frac{n(D)}{n(S)} = \frac{4}{8} = \frac{1}{2}$
(v) Let E be the event of getting two heads in succession.
E = {HHH, THH, HHT}
n(E) = 3
$P(E) = \frac{n(E)}{n(S)} = \frac{3}{8}$
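The sample space of Example 2 is small enough to enumerate directly. The sketch below is illustrative only: the helper `prob` and the variable names are assumptions, and part (v) is read as "two adjacent heads somewhere in the outcome", which is the reading that reproduces the set E used in the solution.

```python
from fractions import Fraction
from itertools import product

# All 8 outcomes of tossing three coins, written as strings such as "HTH".
S = ["".join(t) for t in product("HT", repeat=3)]

def prob(condition):
    """Classical probability of the outcomes in S that satisfy condition."""
    return Fraction(sum(1 for s in S if condition(s)), len(S))

p_i = prob(lambda s: s.count("H") == 2)    # exactly two heads
p_ii = prob(lambda s: "T" in s)            # at least one tail
p_iii = prob(lambda s: s.count("H") <= 2)  # at most two heads
p_iv = prob(lambda s: s[1] == "H")         # head on the second coin
p_v = prob(lambda s: "HH" in s)            # two adjacent heads, as in set E

print(p_i, p_ii, p_iii, p_iv, p_v)  # 3/8 7/8 7/8 1/2 3/8
```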

Example 3
A fair dice is thrown. Find the probability of getting (i) an even number,
(ii) a perfect square, and (iii) an integer greater than or equal to 3.
Solution
When a dice is thrown, the sample space S is given by
  S = {1, 2, 3, 4, 5, 6}
n(S) = 6
(i) Let A be the event of getting an even number.
A = {2, 4, 6}
n(A) = 3
$P(A) = \frac{n(A)}{n(S)} = \frac{3}{6} = \frac{1}{2}$
(ii) Let B be the event of getting a perfect square.
B = {1, 4}
n(B) = 2
$P(B) = \frac{n(B)}{n(S)} = \frac{2}{6} = \frac{1}{3}$
(iii) Let C be the event of getting an integer greater than or equal to 3.
C = {3, 4, 5, 6}
n(C) = 4
$P(C) = \frac{n(C)}{n(S)} = \frac{4}{6} = \frac{2}{3}$

Example 4
A card is drawn from a well-shuffled pack of 52 cards. Find the probability
of (i) getting a king card, (ii) getting a face card, (iii) getting a red card,
(iv) getting a card between 2 and 7, both inclusive, and (v) getting a
card between 2 and 8, both exclusive.
Solution
Total number of cards = 52
One card out of 52 cards can be drawn in 52C1 ways.
n(S) = 52C1 = 52
(i) Let A be the event of getting a king card. There are 4 king cards and one of them
can be drawn in 4C1 ways.
n(A) = 4C1 = 4
$P(A) = \frac{n(A)}{n(S)} = \frac{4}{52} = \frac{1}{13}$
(ii) Let B be the event of getting a face card. There are 12 face cards and one of them
can be drawn in 12C1 ways.
n(B) = 12C1 = 12
$P(B) = \frac{n(B)}{n(S)} = \frac{12}{52} = \frac{3}{13}$
(iii) Let C be the event of getting a red card. There are 26 red cards and one of them
can be drawn in 26C1 ways.
n(C) = 26C1 = 26
$P(C) = \frac{n(C)}{n(S)} = \frac{26}{52} = \frac{1}{2}$
(iv) Let D be the event of getting a card between 2 and 7, both inclusive. There are
6 such cards in each suit giving a total of 6 × 4 = 24 cards. One of them can be
drawn in 24C1 ways.
n(D) = 24C1 = 24
$P(D) = \frac{n(D)}{n(S)} = \frac{24}{52} = \frac{6}{13}$
(v) Let E be the event of getting a card between 2 and 8, both exclusive. There are 5
such cards in each suit giving a total of 5 × 4 = 20 cards. One of them can be drawn
in 20C1 ways.
n(E) = 20C1 = 20
$P(E) = \frac{n(E)}{n(S)} = \frac{20}{52} = \frac{5}{13}$
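The card counts in Example 4 can be confirmed by building the pack explicitly. This is a sketch under stated assumptions: the rank and suit labels, the `number` lookup, and the helper `prob` are illustrative names, and an ace is treated as having no numeric value between 2 and 10.

```python
from fractions import Fraction
from itertools import product

ranks = ["A"] + [str(k) for k in range(2, 11)] + ["J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]
deck = list(product(ranks, suits))  # 52 (rank, suit) pairs

# Numeric value of the number cards 2..10; other ranks map to 0 below.
number = {str(k): k for k in range(2, 11)}

def prob(condition):
    """Classical probability of drawing one card that satisfies condition."""
    return Fraction(sum(1 for card in deck if condition(card)), len(deck))

p_king = prob(lambda c: c[0] == "K")
p_face = prob(lambda c: c[0] in {"J", "Q", "K"})
p_red = prob(lambda c: c[1] in {"hearts", "diamonds"})
p_2_to_7 = prob(lambda c: 2 <= number.get(c[0], 0) <= 7)  # both inclusive
p_3_to_7 = prob(lambda c: 2 < number.get(c[0], 0) < 8)    # both exclusive

print(p_king, p_face, p_red, p_2_to_7, p_3_to_7)  # 1/13 3/13 1/2 6/13 5/13
```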

Example 5
A bag contains 2 black, 3 red, and 5 blue balls. Three balls are drawn
at random. Find the probability that the three balls drawn (i) are blue
(ii) consist of 2 blue and 1 red ball, and (iii) consist of exactly one black
ball.
Solution
Total number of balls = 10
3 balls out of 10 balls can be drawn in 10C3 ways.
n(S) = 10C3 = 120
(i) Let A be the event that the three balls drawn are blue. 3 blue balls out of 5 blue
balls can be drawn in 5C3 ways.
n(A) = 5C3 = 10
$P(A) = \frac{n(A)}{n(S)} = \frac{10}{120} = \frac{1}{12}$
(ii) Let B be the event that the three balls drawn consist of 2 blue and 1 red ball.
2 blue balls out of 5 blue balls can be drawn in 5C2 ways. 1 red ball out of 3 red
balls can be drawn in 3C1 ways.
n(B) = 5C2 × 3C1 = 30
$P(B) = \frac{n(B)}{n(S)} = \frac{30}{120} = \frac{1}{4}$
(iii) Let C be the event that three balls drawn consist of exactly one black ball, i.e.,
remaining two balls can be drawn from 3 red and 5 blue balls. One black ball can
be drawn from 2 black balls in 2C1 ways and the remaining 2 balls can be drawn
from 8 balls in 8C2 ways.
n(C) = 2C1 × 8C2 = 56
$P(C) = \frac{n(C)}{n(S)} = \frac{56}{120} = \frac{7}{15}$
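The combination counts in Example 5 map directly onto `math.comb` from the Python standard library. The sketch below is illustrative; the variable names are assumptions.

```python
from fractions import Fraction
from math import comb

n_S = comb(10, 3)  # ways to draw 3 balls out of 10

p_all_blue = Fraction(comb(5, 3), n_S)                 # 3 of the 5 blue balls
p_2blue_1red = Fraction(comb(5, 2) * comb(3, 1), n_S)  # 2 blue and 1 red
p_one_black = Fraction(comb(2, 1) * comb(8, 2), n_S)   # 1 black, 2 non-black

print(n_S, p_all_blue, p_2blue_1red, p_one_black)  # 120 1/12 1/4 7/15
```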

Example 6
A class consists of 6 girls and 10 boys. If a committee of three is chosen
at random from the class, find the probability that (i) three boys are
selected, and (ii) exactly two girls are selected.
Solution
Total number of students = 16
A committee of 3 students from 16 students can be selected in 16C3 ways.
n(S) = 16C3 = 560
(i) Let A be the event that 3 boys are selected.
n(A) = 10C3 = 120
$P(A) = \frac{n(A)}{n(S)} = \frac{120}{560} = \frac{3}{14}$
(ii) Let B be the event that exactly 2 girls are selected. 2 girls from 6 girls can be
selected in 6C2 ways and one boy from 10 boys can be selected in 10C1 ways.
n(B) = 6C2 × 10C1 = 150
$P(B) = \frac{n(B)}{n(S)} = \frac{150}{560} = \frac{15}{56}$
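As a cross-check on Example 6, every possible committee can be enumerated rather than counted by formula. This is a brute-force sketch; the labels "girl" and "boy" and the helper `prob` are illustrative.

```python
from fractions import Fraction
from itertools import combinations

students = ["girl"] * 6 + ["boy"] * 10
# Each committee is a set of 3 distinct student positions.
committees = list(combinations(range(len(students)), 3))

def prob(condition):
    """Probability that a randomly chosen committee satisfies condition."""
    hits = sum(1 for c in committees if condition([students[i] for i in c]))
    return Fraction(hits, len(committees))

p_three_boys = prob(lambda members: members.count("boy") == 3)
p_two_girls = prob(lambda members: members.count("girl") == 2)

print(len(committees), p_three_boys, p_two_girls)  # 560 3/14 15/56
```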

Example 7
From a collection of 10 bulbs, of which 4 are defective, 3 bulbs are
selected at random and fitted into lamps. Find the probability that (i) all
three bulbs glow, and (ii) the room is lit.
Solution
Total number of bulbs = 10
3 bulbs can be selected from 10 bulbs in 10C3 ways.
n(S) = 10C3 = 120
(i) Let A be the event that all three bulbs glow. This event will occur when 3 bulbs are
selected from 6 nondefective bulbs in 6C3 ways.
n(A) = 6C3 = 20
$P(A) = \frac{n(A)}{n(S)} = \frac{20}{120} = \frac{1}{6}$
(ii) Let B be the event that the room is lit. Let $\bar{B}$ be the event that the room is dark.
The event $\bar{B}$ will occur when 3 bulbs are selected from 4 defective bulbs in 4C3
ways.
$n(\bar{B}) = {}^{4}C_{3} = 4$
$P(\bar{B}) = \frac{n(\bar{B})}{n(S)} = \frac{4}{120} = \frac{1}{30}$
$\therefore P(B) = 1 - P(\bar{B}) = 1 - \frac{1}{30} = \frac{29}{30}$
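Part (ii) of Example 7 works through the complementary event, and the same route can be followed in code. This is a minimal sketch with illustrative variable names.

```python
from fractions import Fraction
from math import comb

n_S = comb(10, 3)  # ways to pick 3 bulbs out of 10

p_all_glow = Fraction(comb(6, 3), n_S)  # all 3 from the 6 good bulbs
p_dark = Fraction(comb(4, 3), n_S)      # all 3 from the 4 defective bulbs
p_lit = 1 - p_dark                      # at least one bulb glows

print(p_all_glow, p_dark, p_lit)  # 1/6 1/30 29/30
```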

Example 8
There are 20 tickets numbered 1, 2, ..., 20. One ticket is drawn at random.
Find the probability that the ticket bears a number which is (i) even,
(ii) a perfect square, and (iii) multiple of 3.
Solution
There are 20 tickets numbered from 1 to 20.
n(S) = 20
(i) Let A be the event that a ticket bears a number which is even.
A = {2, 4, 6, 8, 10, 12, 14, 16, 18, 20}
n(A) = 10
$P(A) = \frac{n(A)}{n(S)} = \frac{10}{20} = \frac{1}{2}$
(ii) Let B be the event that a ticket bears a number which is a perfect square.
B = {1, 4, 9, 16}
n(B) = 4
$P(B) = \frac{n(B)}{n(S)} = \frac{4}{20} = \frac{1}{5}$
(iii) Let C be the event that a ticket bears a number which is a multiple of 3.
C = {3, 6, 9, 12, 15, 18}
n(C) = 6
$P(C) = \frac{n(C)}{n(S)} = \frac{6}{20} = \frac{3}{10}$
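The three events of Example 8 can be written as filters over the ticket numbers. The sketch below is illustrative; the helper `prob` and the use of `math.isqrt` to test for perfect squares are implementation choices, not part of the text.

```python
from fractions import Fraction
from math import isqrt

tickets = range(1, 21)  # tickets numbered 1 to 20

def prob(condition):
    """Classical probability that a randomly drawn ticket satisfies condition."""
    return Fraction(sum(1 for t in tickets if condition(t)), len(tickets))

p_even = prob(lambda t: t % 2 == 0)
p_square = prob(lambda t: isqrt(t) ** 2 == t)  # perfect squares: 1, 4, 9, 16
p_mult3 = prob(lambda t: t % 3 == 0)

print(p_even, p_square, p_mult3)  # 1/2 1/5 3/10
```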

Example 9
Four letters of the word ‘THURSDAY’ are arranged in all possible ways.
Find the probability that the word formed is ‘HURT’.
Solution
Total number of letters in the word ‘THURSDAY’ = 8
Four letters from 8 letters can be arranged in 8P4 ways.
n(S) = 8P4 = 1680
Let A be the event that the word formed is ‘HURT’. The word ‘HURT’ can be formed
in one way only.
n( A) = 1
$P(A) = \frac{n(A)}{n(S)} = \frac{1}{1680}$
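Example 9 can be verified by generating every four-letter arrangement. This sketch relies on the fact that the eight letters of THURSDAY are all distinct, so each arrangement appears exactly once; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import permutations
from math import perm

arrangements = ["".join(p) for p in permutations("THURSDAY", 4)]

# The enumeration agrees with the formula 8P4.
n_S = len(arrangements)
p_hurt = Fraction(arrangements.count("HURT"), n_S)

print(n_S, perm(8, 4), p_hurt)  # 1680 1680 1/1680
```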

Example 10
A bag contains 5 red, 4 blue, and m green balls. If the probability of
getting two green balls when two balls are selected at random is $\frac{1}{7}$,
find m.

Solution
Total number of balls = 5 + 4 + m = 9 + m
2 balls out of 9 + m balls can be drawn in ${}^{9+m}C_2$ ways.
$n(S) = {}^{9+m}C_2$
Let A be the event that both the balls drawn are green.
2 green balls out of m green balls can be drawn in ${}^{m}C_2$ ways.
$n(A) = {}^{m}C_2$
$P(A) = \frac{n(A)}{n(S)} = \frac{{}^{m}C_2}{{}^{9+m}C_2}$
But $P(A) = \frac{1}{7}$
$\frac{{}^{m}C_2}{{}^{9+m}C_2} = \frac{1}{7}$
$\frac{m(m-1)}{(m+9)(m+8)} = \frac{1}{7}$
$(m+9)(m+8) = 7m(m-1)$
$m^2 + 17m + 72 = 7m^2 - 7m$
$6m^2 - 24m - 72 = 0$
$3m^2 - 12m - 36 = 0$
$3m^2 - 18m + 6m - 36 = 0$
$3m(m-6) + 6(m-6) = 0$
$(3m+6)(m-6) = 0$
$3m + 6 = 0$ or $m - 6 = 0$
$m = -2$ or $m = 6$
But $m \neq -2$
$\therefore m = 6$
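The value of m in Example 10 can also be found by direct search instead of solving the quadratic. This is a sketch only: the function name `p_two_green` and the search bound of 100 are arbitrary illustrative choices.

```python
from fractions import Fraction
from math import comb

def p_two_green(m):
    """Probability that both balls are green when the bag has m green balls."""
    return Fraction(comb(m, 2), comb(9 + m, 2))

# Try every candidate m from 2 up to an arbitrary bound.
solutions = [m for m in range(2, 101) if p_two_green(m) == Fraction(1, 7)]
print(solutions)  # [6]
```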

Exercise 1.1

1. A card is drawn at random from a pack of 52 cards. Find the probability
that the card drawn is (i) an ace card, and (ii) a club card.
[Ans.: (i) $\frac{1}{13}$ (ii) $\frac{1}{4}$]


2. An unbiased coin is tossed twice. Find the probability of (i) exactly one
head, (ii) at most one head, (iii) at least one head, and (iv) same face on
both the coins.
[Ans.: (i) $\frac{1}{2}$ (ii) $\frac{3}{4}$ (iii) $\frac{3}{4}$ (iv) $\frac{1}{2}$]

3. A fair dice is thrown thrice. Find the probability that the sum of the
numbers obtained is 10.
[Ans.: $\frac{1}{8}$]

4. A ball is drawn at random from a box containing 12 red, 18 white, 10 blue,
and 15 orange balls. Find the probability that (i) it is red or blue, and
(ii) it is white, blue, or orange.
[Ans.: (i) $\frac{2}{5}$ (ii) $\frac{43}{55}$]

5. Eight boys and three girls are to sit in a row for a photograph. Find the
probability that no two girls are together.
[Ans.: $\frac{28}{55}$]

6. If four persons are chosen from a group of 3 men, 2 women, and 4
children, find the probability that exactly two of them will be children.
[Ans.: $\frac{10}{21}$]

7. A box contains 2 white, 3 red, and 5 black balls. Three balls are drawn at
random. What is the probability that they will be of different colours?
[Ans.: $\frac{1}{4}$]

8. Two cards are drawn from a well-shuffled pack of 52 cards. Find the
probability of getting (i) 2 king cards, (ii) 1 king card and 1 queen card,
and (iii) 1 king card and 1 spade card.
[Ans.: (i) $\frac{1}{221}$ (ii) $\frac{8}{663}$ (iii) $\frac{1}{26}$]

9. A four-digit number is to be formed using the digits 0, 1, 2, 3, 4, 5. All the
digits are to be different. Find the probability that the number formed is
(i) odd, (ii) greater than 4000, (iii) greater than 3400, and (iv) a multiple
of 5.
[Ans.: (i) $\frac{12}{25}$ (ii) $\frac{2}{5}$ (iii) $\frac{12}{25}$ (iv) $\frac{9}{25}$]


10. 3 books of physics, 4 books of chemistry, and 5 books of mathematics
are arranged in a shelf. Find the probability that (i) no physics books are
together, (ii) chemistry books are always together, and (iii) books of the
same subjects are together.
[Ans.: (i) $\frac{6}{11}$ (ii) $\frac{1}{55}$ (iii) $\frac{1}{4620}$]

11. 8 boys and 2 girls are to be seated at random in a row for a photograph.
Find the probability that (i) the girls sit together, and (ii) the girls occupy
3rd and 7th seats.
[Ans.: (i) $\frac{1}{5}$ (ii) $\frac{1}{45}$]

12. A committee of 4 is to be formed from 15 boys and 3 girls. Find the
probability that the committee contains (i) 2 boys and 2 girls, (ii) exactly
one girl, (iii) one particular girl, and (iv) two particular girls.
[Ans.: (i) $\frac{7}{68}$ (ii) $\frac{91}{204}$ (iii) $\frac{2}{9}$ (iv) $\frac{2}{51}$]

13. If the letters of the word REGULATIONS are arranged at random, what
is the probability that there will be exactly four letters between R and
E?
[Ans.: $\frac{6}{55}$]

14. Find the probability that there will be 5 Sundays in the month of
October.
[Ans.: $\frac{3}{7}$]


1.4 Theorems on Probability

Theorem 1 The probability of an impossible event is zero, i.e., $P(\phi) = 0$, where $\phi$
is a null set.
Proof An event which has no sample points is called an impossible event and is
denoted by $\phi$.

For a sample space S of an experiment,
$S \cup \phi = S$
Taking probability of both the sides,
$P(S \cup \phi) = P(S)$
Since S and $\phi$ are mutually exclusive events,
$P(S) + P(\phi) = P(S)$ [Using Axiom III]
$\therefore P(\phi) = 0$

Theorem 2 The probability of the complementary event $\bar{A}$ of $A$ is
\[ P(\bar{A}) = 1 - P(A) \]
Proof Let $A$ be an event in the sample space $S$.
\[ A \cup \bar{A} = S \]
\[ P(A \cup \bar{A}) = P(S) \]
Since $A$ and $\bar{A}$ are mutually exclusive events,
\[ P(A) + P(\bar{A}) = P(S) \]
\[ P(A) + P(\bar{A}) = 1 \qquad [\because P(S) = 1] \]
\[ \therefore\ P(\bar{A}) = 1 - P(A) \]

Note Since $A$ and $\bar{A}$ are mutually exclusive events,
\[ A \cup \bar{A} = S \quad \text{and} \quad A \cap \bar{A} = \phi \]
Corollary Probability of an event is always less than or equal to one, i.e., $P(A) \le 1$

Proof
\[ P(A) = 1 - P(\bar{A}) \]
\[ P(A) \le 1 \qquad [\because P(\bar{A}) \ge 0 \text{ by Axiom I}] \]
De Morgan’s Laws Since an event is a subset of a sample space, De Morgan’s laws
are applicable to events.
\[ P(\overline{A \cup B}) = P(\bar{A} \cap \bar{B}) \]
\[ P(\overline{A \cap B}) = P(\bar{A} \cup \bar{B}) \]

Theorem 3 For any two events $A$ and $B$ in a sample space $S$,
\[ P(\bar{A} \cap B) = P(B) - P(A \cap B) \]
Proof From the Venn diagram (Fig. 1.1),
\[ B = (A \cap B) \cup (\bar{A} \cap B) \]
\[ P(B) = P[(A \cap B) \cup (\bar{A} \cap B)] \]
[Fig. 1.1: Venn diagram of the events A and B]
Since $(A \cap B)$ and $(\bar{A} \cap B)$ are mutually exclusive events,
\[ P(B) = P(A \cap B) + P(\bar{A} \cap B) \]
\[ P(\bar{A} \cap B) = P(B) - P(A \cap B) \]
Similarly, it can be shown that
\[ P(A \cap \bar{B}) = P(A) - P(A \cap B) \]

Theorem 4 Additive Law of Probability (Addition Theorem)
The probability that at least one of the events $A$ and $B$ will occur is given by
\[ P(A \cup B) = P(A) + P(B) - P(A \cap B) \]
Proof From the Venn diagram (Fig. 1.1),
\[ A \cup B = A \cup (\bar{A} \cap B) \]
\[ P(A \cup B) = P[A \cup (\bar{A} \cap B)] \]
Since $A$ and $(\bar{A} \cap B)$ are mutually exclusive events,
\[
\begin{aligned}
P(A \cup B) &= P(A) + P(\bar{A} \cap B) && [\text{Using Axiom III}] \\
&= P(A) + P(B) - P(A \cap B) && [\text{Using Theorem 3}]
\end{aligned}
\]

Remarks

1. If $A$ and $B$ are mutually exclusive events, i.e., $A \cap B = \phi$, then $P(A \cap B) = 0$
according to Theorem 1.
Hence, $P(A \cup B) = P(A) + P(B)$
2. The event $A \cup B$ (i.e., A or B) denotes the occurrence of either A or B or both.
Alternately, it implies the occurrence of at least one of the two events.
\[ A \cup B = A + B \]
3. The event $A \cap B$ (i.e., A and B) is a compound or joint event that denotes the
simultaneous occurrence of the two events.
\[ A \cap B = AB \]
Corollary 1 From the Venn diagram (Fig. 1.1),
\[ P(A \cup B) = 1 - P(\bar{A} \cap \bar{B}) \]
where $P(\bar{A} \cap \bar{B})$ is the probability that neither of the events A and B occurs.

Corollary 2
\[
\begin{aligned}
P(\text{exactly one of } A \text{ and } B \text{ occurs}) &= P[(A \cap \bar{B}) \cup (\bar{A} \cap B)] \\
&= P(A \cap \bar{B}) + P(\bar{A} \cap B) && [\because (A \cap \bar{B}) \cap (\bar{A} \cap B) = \phi] \\
&= P(A) - P(A \cap B) + P(B) - P(A \cap B) && [\text{Using Theorem 3}] \\
&= P(A) + P(B) - 2P(A \cap B) \\
&= P(A \cup B) - P(A \cap B) && [\text{Using Theorem 4}]
\end{aligned}
\]
   = P(at least one of the two events occurs) – P(the two events occur simultaneously)

Corollary 3 The addition theorem can be applied for more than two events. If A,
B, and C are three events of a sample space S then the probability of occurrence of at
least one of them is given by
\[
\begin{aligned}
P(A \cup B \cup C) &= P[A \cup (B \cup C)] \\
&= P(A) + P(B \cup C) - P[A \cap (B \cup C)] \\
&= P(A) + P(B \cup C) - P[(A \cap B) \cup (A \cap C)] \\
&= P(A) + P(B) + P(C) - P(B \cap C) - P(A \cap B) - P(A \cap C) + P(A \cap B \cap C)
\end{aligned}
\]
[Applying Theorem 4 on second and third terms]
Alternately, the probability of occurrence of at least one of the three events can also
be written as
\[ P(A \cup B \cup C) = 1 - P(\bar{A} \cap \bar{B} \cap \bar{C}) \]
If A, B, and C are mutually exclusive events,
\[ P(A \cup B \cup C) = P(A) + P(B) + P(C) \]
Corollary 4 The probability of occurrence of at least two of the three events is
given by
\[
\begin{aligned}
P[(A \cap B) \cup (B \cap C) \cup (A \cap C)] &= P(A \cap B) + P(B \cap C) + P(A \cap C) - 3P(A \cap B \cap C) \\
&\quad + P(A \cap B \cap C) \qquad [\text{Using Corollary 3}] \\
&= P(A \cap B) + P(B \cap C) + P(A \cap C) - 2P(A \cap B \cap C)
\end{aligned}
\]

Corollary 5 The probability of occurrence of exactly two of the three events is
given by
\[
\begin{aligned}
&P[(A \cap B \cap \bar{C}) \cup (A \cap \bar{B} \cap C) \cup (\bar{A} \cap B \cap C)] \\
&\quad = P[(A \cap B) \cup (B \cap C) \cup (A \cap C)] - P(A \cap B \cap C) && [\text{Using Corollary 2}] \\
&\quad = P(A \cap B) + P(B \cap C) + P(A \cap C) - 3P(A \cap B \cap C) && [\text{Using Corollary 4}]
\end{aligned}
\]

Corollary 6 The probability of occurrence of exactly one of the three events is
given by
\[
\begin{aligned}
&P[(A \cap \bar{B} \cap \bar{C}) \cup (\bar{A} \cap B \cap \bar{C}) \cup (\bar{A} \cap \bar{B} \cap C)] \\
&\quad = P(\text{at least one of the three events occurs}) - P(\text{at least two of the three events occur}) \\
&\quad = P(A) + P(B) + P(C) - 2P(A \cap B) - 2P(B \cap C) - 2P(A \cap C) + 3P(A \cap B \cap C)
\end{aligned}
\]
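The formulas in Corollaries 3 to 6 can be checked numerically by brute-force counting. The sketch below is only an illustration: the sample space (two fair dice) and the three events A, B, C are assumptions chosen for the check, not taken from the text.

```python
from fractions import Fraction
from itertools import product

# Assumed sample space: all 36 equally likely outcomes of two fair dice.
S = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(event) for a set of equally likely sample points taken from S."""
    return Fraction(len(event), len(S))

# Three illustrative events (assumptions made for this check only).
A = {s for s in S if s[0] % 2 == 0}      # first die is even
B = {s for s in S if s[0] + s[1] >= 8}   # sum is at least 8
C = {s for s in S if s[1] > 4}           # second die shows 5 or 6

pAB, pBC, pAC, pABC = prob(A & B), prob(B & C), prob(A & C), prob(A & B & C)

def count_hits(s):
    """Number of the events A, B, C that contain the sample point s."""
    return (s in A) + (s in B) + (s in C)

# Direct counts over the sample space.
at_least_one = prob({s for s in S if count_hits(s) >= 1})
at_least_two = prob({s for s in S if count_hits(s) >= 2})
exactly_two = prob({s for s in S if count_hits(s) == 2})
exactly_one = prob({s for s in S if count_hits(s) == 1})

# The same quantities from the formulas of Corollaries 3, 4, 5 and 6.
f_at_least_one = prob(A) + prob(B) + prob(C) - pAB - pBC - pAC + pABC
f_at_least_two = pAB + pBC + pAC - 2 * pABC
f_exactly_two = pAB + pBC + pAC - 3 * pABC
f_exactly_one = prob(A) + prob(B) + prob(C) - 2 * (pAB + pBC + pAC) + 3 * pABC

print(at_least_one == f_at_least_one, at_least_two == f_at_least_two,
      exactly_two == f_exactly_two, exactly_one == f_exactly_one)
```

Exact fractions are used instead of floating-point numbers so that the comparisons are not affected by rounding.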

Example 1
A card is drawn from a well-shuffled pack of cards. What is the probability
that it is either a spade or an ace?
Solution
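Let A and B be the events of getting a spade and an ace card respectively.
\[ P(A) = \frac{{}^{13}C_1}{{}^{52}C_1} = \frac{13}{52}, \quad P(B) = \frac{{}^{4}C_1}{{}^{52}C_1} = \frac{4}{52}, \quad P(A \cap B) = \frac{{}^{1}C_1}{{}^{52}C_1} = \frac{1}{52} \]
Probability of getting either a spade or an ace card
\[
\begin{aligned}
P(A \cup B) &= P(A) + P(B) - P(A \cap B) \\
&= \frac{13}{52} + \frac{4}{52} - \frac{1}{52} \\
&= \frac{4}{13}
\end{aligned}
\]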

Example 2
Two cards are drawn from a pack of cards. Find the probability that they
will be both red or both pictures.
Solution
Let A and B be the events that both cards drawn are red and pictures respectively.
\[ P(A) = \frac{{}^{26}C_2}{{}^{52}C_2} = \frac{325}{1326}, \quad P(B) = \frac{{}^{12}C_2}{{}^{52}C_2} = \frac{66}{1326}, \quad P(A \cap B) = \frac{{}^{6}C_2}{{}^{52}C_2} = \frac{15}{1326} \]
Probability that both cards drawn are red or pictures
\[
\begin{aligned}
P(A \cup B) &= P(A) + P(B) - P(A \cap B) \\
&= \frac{325}{1326} + \frac{66}{1326} - \frac{15}{1326} \\
&= \frac{188}{663}
\end{aligned}
\]
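The results of Examples 1 and 2 can be cross-checked by listing the deck explicitly and counting favourable cases. This is a minimal sketch; the card encoding (rank and suit strings) and the helper names `is_red` and `is_picture` are assumptions made for illustration, with picture cards taken to be jacks, queens, and kings.

```python
from fractions import Fraction
from itertools import combinations, product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["spades", "hearts", "diamonds", "clubs"]
DECK = list(product(RANKS, SUITS))  # 52 (rank, suit) pairs

def is_red(card):
    """Hearts and diamonds are the red suits."""
    return card[1] in ("hearts", "diamonds")

def is_picture(card):
    """Picture cards are assumed to be J, Q, K."""
    return card[0] in ("J", "Q", "K")

# Example 1: one card, spade or ace.
favourable_1 = [c for c in DECK if c[1] == "spades" or c[0] == "A"]
p_spade_or_ace = Fraction(len(favourable_1), len(DECK))

# Example 2: two cards, both red or both pictures.
hands = list(combinations(DECK, 2))
favourable_2 = [h for h in hands
                if all(is_red(c) for c in h) or all(is_picture(c) for c in h)]
p_both_red_or_pictures = Fraction(len(favourable_2), len(hands))

print(p_spade_or_ace, p_both_red_or_pictures)
```

Counting the union directly avoids having to subtract the overlap by hand, which is what the addition theorem does analytically.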

Example 3
The probability that a contractor will get a plumbing contract is $\frac{2}{3}$
and the probability that he will not get an electric contract is $\frac{5}{9}$. If the
probability of getting any one contract is $\frac{4}{5}$, what is the probability that
he will get both the contracts?
Solution
Let A and B be the events that the contractor will get plumbing and electric contracts
respectively.
\[ P(A) = \frac{2}{3}, \quad P(\bar{B}) = \frac{5}{9}, \quad P(A \cup B) = \frac{4}{5} \]
\[ P(B) = 1 - P(\bar{B}) = 1 - \frac{5}{9} = \frac{4}{9} \]
Probability that the contractor will get any one contract
\[ P(A \cup B) = P(A) + P(B) - P(A \cap B) \]
Probability that the contractor will get both the contracts
\[
\begin{aligned}
P(A \cap B) &= P(A) + P(B) - P(A \cup B) \\
&= \frac{2}{3} + \frac{4}{9} - \frac{4}{5} \\
&= \frac{14}{45}
\end{aligned}
\]

Example 4
A person applies for a job in two firms A and B, the probability of his
being selected in the firm A is 0.7 and being rejected in the firm B is 0.5.
The probability of at least one of the applications being rejected is 0.6.
What is the probability that he will be selected in one of the two firms?
Solution
Let A and B be the events that the person is selected in firms A and B respectively.
\[ P(A) = 0.7, \quad P(\bar{B}) = 0.5, \quad P(\bar{A} \cup \bar{B}) = 0.6 \]
\[ P(\bar{A}) = 1 - P(A) = 1 - 0.7 = 0.3 \]
\[ P(B) = 1 - P(\bar{B}) = 1 - 0.5 = 0.5 \]
\[ P(\bar{A} \cup \bar{B}) = P(\bar{A}) + P(\bar{B}) - P(\bar{A} \cap \bar{B}) \qquad \ldots (1) \]
Probability that the person will be selected in one of the two firms
\[
\begin{aligned}
P(A \cup B) &= 1 - P(\bar{A} \cap \bar{B}) \\
&= 1 - [P(\bar{A}) + P(\bar{B}) - P(\bar{A} \cup \bar{B})] && [\text{Using Eq. (1)}] \\
&= 1 - (0.3 + 0.5 - 0.6) \\
&= 0.8
\end{aligned}
\]

Example 5
In a group of 1000 persons, there are 650 who can speak Hindi, 400 can
speak English, and 150 can speak both Hindi and English. If a person is
selected at random, what is the probability that he speaks (i) Hindi only,
(ii) English only, (iii) only one of the two languages, and (iv) at least one of
the two languages?
Solution
Let A and B be the events that a person selected at random speaks Hindi and English
respectively.
\[ P(A) = \frac{650}{1000}, \quad P(B) = \frac{400}{1000}, \quad P(A \cap B) = \frac{150}{1000} \]
(i) Probability that a person selected at random speaks Hindi only
\[ P(A \cap \bar{B}) = P(A) - P(A \cap B) = \frac{650}{1000} - \frac{150}{1000} = \frac{1}{2} \]
(ii) Probability that a person selected at random speaks English only
\[ P(\bar{A} \cap B) = P(B) - P(A \cap B) = \frac{400}{1000} - \frac{150}{1000} = \frac{1}{4} \]
(iii) Probability that a person selected at random speaks only one of the languages
\[ P[(A \cap \bar{B}) \cup (\bar{A} \cap B)] = P(A) + P(B) - 2P(A \cap B) = \frac{650}{1000} + \frac{400}{1000} - 2\left(\frac{150}{1000}\right) = \frac{3}{4} \]
(iv) Probability that a person selected at random speaks at least one of the two
languages
\[ P(A \cup B) = P(A) + P(B) - P(A \cap B) = \frac{650}{1000} + \frac{400}{1000} - \frac{150}{1000} = \frac{9}{10} \]

Example 6
A box contains 4 white, 6 red, 5 black balls, and 5 balls of other colours.
Two balls are drawn from the box at random. Find the probability that
(i) both are white or both are red, and (ii) both are red or both are
black.
Solution
Let A, B, and C be the events that both balls drawn from the box are white, red, and
black respectively.
\[ P(A) = \frac{{}^{4}C_2}{{}^{20}C_2} = \frac{3}{95}, \quad P(B) = \frac{{}^{6}C_2}{{}^{20}C_2} = \frac{3}{38}, \quad P(C) = \frac{{}^{5}C_2}{{}^{20}C_2} = \frac{1}{19} \]
The events A, B, and C are mutually exclusive, so $P(A \cap B) = 0$ and $P(B \cap C) = 0$.
(i) Probability that both balls are white or both are red
\[ P(A \cup B) = P(A) + P(B) - P(A \cap B) = \frac{3}{95} + \frac{3}{38} - 0 = \frac{21}{190} \]
(ii) Probability that both balls are red or both are black
\[ P(B \cup C) = P(B) + P(C) - P(B \cap C) = \frac{3}{38} + \frac{1}{19} - 0 = \frac{5}{38} \]

Example 7
Three students A, B, C are in a running race. A and B have the same
probability of winning and each is twice as likely to win as C. Find the
probability that B or C wins.
Solution
Let A, B, and C be the events that students A, B, and C win the race respectively.
\[ P(A) = P(B) = 2P(C) \]
\[ P(A) + P(B) + P(C) = 1 \]
\[ 2P(C) + 2P(C) + P(C) = 1 \]
\[ P(C) = \frac{1}{5} \]
\[ \therefore\ P(A) = \frac{2}{5} \quad \text{and} \quad P(B) = \frac{2}{5} \]
Probability that student B or C wins
\[ P(B \cup C) = P(B) + P(C) - P(B \cap C) = \frac{2}{5} + \frac{1}{5} - 0 = \frac{3}{5} \]

Example 8
A card is drawn from a pack of 52 cards. Find the probability of getting
a king or a heart or a red card.
Solution
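Let A, B and C be the events that the card drawn is a king, a heart and a red card
respectively.
\[ P(A) = \frac{{}^{4}C_1}{{}^{52}C_1} = \frac{4}{52}, \quad P(B) = \frac{{}^{13}C_1}{{}^{52}C_1} = \frac{13}{52}, \quad P(C) = \frac{{}^{26}C_1}{{}^{52}C_1} = \frac{26}{52} \]
\[ P(A \cap B) = \frac{{}^{1}C_1}{{}^{52}C_1} = \frac{1}{52}, \quad P(B \cap C) = \frac{{}^{13}C_1}{{}^{52}C_1} = \frac{13}{52}, \quad P(A \cap C) = \frac{{}^{2}C_1}{{}^{52}C_1} = \frac{2}{52} \]
\[ P(A \cap B \cap C) = \frac{{}^{1}C_1}{{}^{52}C_1} = \frac{1}{52} \]
Probability that the card drawn is a king or a heart or a red card
\[
\begin{aligned}
P(A \cup B \cup C) &= P(A) + P(B) + P(C) - P(A \cap B) - P(B \cap C) - P(A \cap C) + P(A \cap B \cap C) \\
&= \frac{4}{52} + \frac{13}{52} + \frac{26}{52} - \frac{1}{52} - \frac{13}{52} - \frac{2}{52} + \frac{1}{52} \\
&= \frac{28}{52} \\
&= \frac{7}{13}
\end{aligned}
\]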

Example 9
From a city, 3 newspapers A, B, C are being published. A is read by
20%, B is read by 16%, C is read by 14%, both A and B are read by 8%,
both A and C are read by 5%, both B and C are read by 4% and all three
A, B, C are read by 2%. What is the probability that a randomly chosen
person (i) reads at least one of these newspapers, and (ii) reads none of
these newspapers?
Solution
Let A, B, and C be the events that the person reads newspapers A, B, and C respectively.
\[ P(A) = 0.2, \quad P(B) = 0.16, \quad P(C) = 0.14 \]
\[ P(A \cap B) = 0.08, \quad P(A \cap C) = 0.05, \quad P(B \cap C) = 0.04 \]
\[ P(A \cap B \cap C) = 0.02 \]
(i) Probability that the person reads at least one of these newspapers
\[
\begin{aligned}
P(A \cup B \cup C) &= P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C) \\
&= 0.2 + 0.16 + 0.14 - 0.08 - 0.05 - 0.04 + 0.02 \\
&= 0.35
\end{aligned}
\]
(ii) Probability that the person reads none of these newspapers
\[ P(\bar{A} \cap \bar{B} \cap \bar{C}) = 1 - P(A \cup B \cup C) = 1 - 0.35 = 0.65 \]
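Alternatively, the problem can be solved by a Venn diagram (Fig. 1.2).
[Fig. 1.2: Venn diagram of A, B, and C with percentages in each region: only A 9, only B 6, only C 7, A and B only 6, A and C only 3, B and C only 2, all three 2, none 65]
(i) P(the person reads at least one paper) $= 1 - \frac{65}{100} = 0.35$
(ii) P(the person reads none of these papers) $= \frac{65}{100} = 0.65$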
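The region values in the Venn diagram of Example 9 follow mechanically from the given percentages. The sketch below shows one way to compute them; the function name `venn_regions` and the dictionary keys are hypothetical names introduced for illustration, and percentages are kept as integers to avoid rounding.

```python
def venn_regions(a, b, c, ab, ac, bc, abc):
    """Split three overlapping events into disjoint Venn regions.

    All arguments are percentages of the population: a, b, c for the
    single events, ab, ac, bc for the pairwise overlaps, abc for all three.
    """
    only_ab = ab - abc          # in A and B but not C
    only_ac = ac - abc          # in A and C but not B
    only_bc = bc - abc          # in B and C but not A
    only_a = a - only_ab - only_ac - abc
    only_b = b - only_ab - only_bc - abc
    only_c = c - only_ac - only_bc - abc
    at_least_one = (only_a + only_b + only_c
                    + only_ab + only_ac + only_bc + abc)
    return {
        "only_A": only_a, "only_B": only_b, "only_C": only_c,
        "A_and_B_only": only_ab, "A_and_C_only": only_ac,
        "B_and_C_only": only_bc, "all_three": abc,
        "at_least_one": at_least_one, "none": 100 - at_least_one,
    }

# Data of Example 9, in per cent.
regions = venn_regions(a=20, b=16, c=14, ab=8, ac=5, bc=4, abc=2)
print(regions)
```

The "at_least_one" and "none" entries correspond to parts (i) and (ii) of the example once divided by 100.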
Exercise 1.2

1. The probability that a student passes a Physics test is $\frac{2}{3}$ and the
probability that he passes both Physics and English tests is $\frac{14}{45}$. The
probability that he passes at least one test is $\frac{4}{5}$. What is the probability
that the student passes the English test?
[ans.: $\frac{4}{9}$]

2. What is the probability of drawing a black card or a king from a well-
shuffled pack of playing cards?
[ans.: $\frac{7}{13}$]

3. A pair of unbiased dice is thrown. Find the probability that (i) the sum
of spots is either 5 or 10, and (ii) either there is a doublet or a sum less
than 6.
[ans.: (i) $\frac{7}{36}$ (ii) $\frac{7}{18}$]


4. From a pack of well-shuffled cards, a card is drawn at random. What is
the probability that the card drawn is a diamond card or a king card?
[ans.: $\frac{4}{13}$]

5. A bag contains 6 red, 5 blue, 3 white, and 4 black balls. A ball is drawn
at random. Find the probability that the ball is (i) red or black, and
(ii) neither red nor black.
[ans.: (i) $\frac{5}{9}$ (ii) $\frac{4}{9}$]

6. There are 100 lottery tickets, numbered from 1 to 100. One of them
is drawn at random. What is the probability that the number on it is a
multiple of 5 or 7?
[ans.: $\frac{8}{25}$]

7. From a group of 6 boys and 4 girls, a committee of 3 is to be formed. Find
the probability that the committee will include (i) all three boys or all
three girls, (ii) at most two girls, and (iii) at least one girl.
[ans.: (i) $\frac{1}{5}$ (ii) $\frac{29}{30}$ (iii) $\frac{5}{6}$]

8. From a pack of 52 cards, three cards are drawn at random. Find the
probability that (i) all three will be aces or all three kings, (ii) all three
are pictures or all three are aces, (iii) none is a picture, (iv) at least one
is a picture, (v) none is a spade, (vi) at most two are spades, and (vii) at
least one is a spade.
[ans.: (i) $\frac{2}{5525}$ (ii) $\frac{56}{5525}$ (iii) $\frac{38}{85}$ (iv) $\frac{47}{85}$ (v) $\frac{703}{1700}$ (vi) $\frac{839}{850}$ (vii) $\frac{997}{1700}$]

9. From a set of 16 cards numbered 1 to 16, one card is drawn at random.
Find the probability that (i) the number obtained is divisible by 3 or 7,
and (ii) not divisible by 3 and 7.
[ans.: (i) $\frac{7}{16}$ (ii) $\frac{9}{16}$]


10. There are 12 bulbs in a basket of which 4 are working. A person tries
to fit them in 3 sockets choosing 3 of the bulbs at random. What is
the probability that there will be (i) some light, and (ii) no light in the
room?
[ans.: (i) $\frac{41}{55}$ (ii) $\frac{14}{55}$]


1.5 Conditional Probability

For any two events A and B in a sample space S, the probability of their simultaneous
occurrence, i.e., both the events occurring simultaneously, is given by
\[ P(A \cap B) = P(A)\, P(B/A) \]
or
\[ P(A \cap B) = P(B)\, P(A/B) \]
where $P(B/A)$ is the conditional probability of B given that A has already occurred, and
$P(A/B)$ is the conditional probability of A given that B has already occurred.
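For equally likely outcomes, the conditional probability can be computed by counting, since $P(B/A) = P(A \cap B)/P(A)$ whenever $P(A) > 0$. The following sketch is illustrative only: the two-dice sample space and the events A (sum is 8) and B (doublet) are assumptions, and `cond_prob` is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import product

# Assumed sample space: two fair dice, 36 equally likely outcomes.
S = set(product(range(1, 7), repeat=2))

def prob(event):
    """P(event) for a set of equally likely sample points taken from S."""
    return Fraction(len(event), len(S))

def cond_prob(b, a):
    """P(B/A) = P(A and B) / P(A); undefined when P(A) = 0."""
    if not a:
        raise ValueError("conditioning event has zero probability")
    return prob(a & b) / prob(a)

A = {s for s in S if s[0] + s[1] == 8}   # the sum of the spots is 8
B = {s for s in S if s[0] == s[1]}       # a doublet is thrown

p_B_given_A = cond_prob(B, A)
p_A_given_B = cond_prob(A, B)

# Both factorisations give the same joint probability.
joint_1 = prob(A) * p_B_given_A
joint_2 = prob(B) * p_A_given_B
print(p_B_given_A, p_A_given_B, joint_1, joint_2)
```

Here only (4, 4) lies in both events, so the two products agree with the directly counted value of $P(A \cap B)$.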

1.6 Multiplicative Theorem for Independent Events

If A and B are two independent events, the probability of their simultaneous occurrence
is given by
\[ P(A \cap B) = P(A)\, P(B) \]
Proof For any two events A and B,
\[ P(A \cap B) = P(B)\, P(A/B) \qquad \ldots (1.1) \]
\[ A = (A \cap B) \cup (A \cap \bar{B}) \]
Since $(A \cap B)$ and $(A \cap \bar{B})$ are mutually exclusive events,
\[
\begin{aligned}
P(A) &= P(A \cap B) + P(A \cap \bar{B}) && [\text{Using Axiom III}] \\
&= P(B)\, P(A/B) + P(\bar{B})\, P(A/\bar{B})
\end{aligned}
\]
If A and B are independent events, the proportion of A’s in B is equal to the proportion of
A’s in $\bar{B}$, i.e., $P(A/B) = P(A/\bar{B})$.
\[
\begin{aligned}
P(A) &= P(A/B)\,[P(B) + P(\bar{B})] \\
&= P(A/B)
\end{aligned}
\]
Substituting in Eq. (1.1),
\[ \therefore\ P(A \cap B) = P(A)\, P(B) \]

Remark The additive law is used to find the probability of A or B, i.e., $P(A \cup B)$.
The multiplicative law is used to find the probability of A and B, i.e., $P(A \cap B)$.

Corollary 1 If A, B and C are three events then
\[ P(A \cap B \cap C) = P(A)\, P(B/A)\, P[C/(A \cap B)] \]
If A, B and C are independent events,
\[ P(A \cap B \cap C) = P(A)\, P(B)\, P(C) \]
Corollary 2 If A and B are independent events then $A$ and $\bar{B}$, $\bar{A}$ and $B$, $\bar{A}$ and $\bar{B}$ are
also independent.

Corollary 3 The probability of occurrence of at least one of the events A, B, C is
given by
\[ P(A \cup B \cup C) = 1 - P(\bar{A} \cap \bar{B} \cap \bar{C}) \]
If A, B, and C are independent events, their complements will also be independent.
\[ P(A \cup B \cup C) = 1 - P(\bar{A})\, P(\bar{B})\, P(\bar{C}) \]

Pairwise Independence and Mutual Independence The events A, B and C are
mutually independent if the following conditions are satisfied simultaneously:
\[ P(A \cap B) = P(A)\, P(B) \]
\[ P(B \cap C) = P(B)\, P(C) \]
\[ P(A \cap C) = P(A)\, P(C) \]
\[ \text{and} \quad P(A \cap B \cap C) = P(A)\, P(B)\, P(C) \]
If the first three conditions hold but the last one does not, the events are only pairwise
independent. Hence, mutually independent events are always pairwise independent but not vice
versa.

Example 1
If A and B are two events such that $P(A) = \frac{2}{3}$, $P(\bar{A} \cap B) = \frac{1}{6}$ and
$P(A \cap B) = \frac{1}{3}$, find $P(B)$, $P(A \cup B)$, $P(A/B)$, $P(B/A)$, $P(\bar{A} \cup B)$ and
$P(\bar{B})$. Also, examine whether the events A and B are (i) equally likely,
(ii) exhaustive, (iii) mutually exclusive, and (iv) independent.
Solution
\[ P(B) = P(\bar{A} \cap B) + P(A \cap B) = \frac{1}{6} + \frac{1}{3} = \frac{1}{2} \]
\[ P(A \cup B) = P(A) + P(B) - P(A \cap B) = \frac{2}{3} + \frac{1}{2} - \frac{1}{3} = \frac{5}{6} \]
\[ P(A/B) = \frac{P(A \cap B)}{P(B)} = \frac{1/3}{1/2} = \frac{2}{3} \]
\[ P(B/A) = \frac{P(A \cap B)}{P(A)} = \frac{1/3}{2/3} = \frac{1}{2} \]
\[ P(\bar{A} \cup B) = P(\bar{A}) + P(B) - P(\bar{A} \cap B) = \frac{1}{3} + \frac{1}{2} - \frac{1}{6} = \frac{2}{3} \]
\[ P(\bar{A} \cap \bar{B}) = 1 - P(A \cup B) = 1 - \frac{5}{6} = \frac{1}{6} \]
\[ P(\bar{B}) = 1 - P(B) = 1 - \frac{1}{2} = \frac{1}{2} \]
(i) Since $P(A) \ne P(B)$, A and B are not equally likely events.
(ii) Since $P(A \cup B) \ne 1$, A and B are not exhaustive events.
(iii) Since $P(A \cap B) \ne 0$, A and B are not mutually exclusive events.
(iv) Since $P(A \cap B) = P(A)\, P(B)$, A and B are independent events.

Example 2
If A and B are two events such that $P(A) = 0.3$, $P(B) = 0.4$,
$P(A \cap B) = 0.2$, find (i) $P(A \cup B)$, (ii) $P(\bar{A}/B)$, and (iii) $P(A/\bar{B})$.
Solution
(i)
\[ P(A \cup B) = P(A) + P(B) - P(A \cap B) = 0.3 + 0.4 - 0.2 = 0.5 \]
(ii)
\[ P(\bar{A}/B) = \frac{P(\bar{A} \cap B)}{P(B)} = \frac{P(B) - P(A \cap B)}{P(B)} = \frac{0.4 - 0.2}{0.4} = 0.5 \]
(iii)
\[ P(A/\bar{B}) = \frac{P(A \cap \bar{B})}{P(\bar{B})} = \frac{P(A) - P(A \cap B)}{1 - P(B)} = \frac{0.3 - 0.2}{1 - 0.4} = \frac{1}{6} \]

Example 3
If A and B are two events with $P(A) = \frac{1}{3}$, $P(B) = \frac{1}{4}$, $P(A \cap B) = \frac{1}{12}$.
Find (i) $P(A/B)$, (ii) $P(B/A)$, (iii) $P(B/\bar{A})$, and (iv) $P(A \cap \bar{B})$.
Solution
(i)
\[ P(A/B) = \frac{P(A \cap B)}{P(B)} = \frac{1/12}{1/4} = \frac{1}{3} \]
(ii)
\[ P(B/A) = \frac{P(A \cap B)}{P(A)} = \frac{1/12}{1/3} = \frac{1}{4} \]
(iii)
\[ P(B/\bar{A}) = \frac{P(B \cap \bar{A})}{P(\bar{A})} = \frac{P(B) - P(B \cap A)}{1 - P(A)} = \frac{\frac{1}{4} - \frac{1}{12}}{1 - \frac{1}{3}} = \frac{1}{4} \]
(iv)
\[ P(A \cap \bar{B}) = P(A) - P(A \cap B) = \frac{1}{3} - \frac{1}{12} = \frac{1}{4} \]

Example 4
Find the probability of drawing a queen and a king from a pack of cards
in two consecutive draws, the cards drawn not being replaced.
Solution
Let A be the event that the first card drawn is a queen.
\[ P(A) = \frac{{}^{4}C_1}{{}^{52}C_1} = \frac{4}{52} = \frac{1}{13} \]
Let B be the event that the card drawn in the second draw is a king given that the
first card drawn is a queen.
\[ P(B/A) = \frac{{}^{4}C_1}{{}^{51}C_1} = \frac{4}{51} \]
Probability that the cards drawn are a queen and a king
\[ P(A \cap B) = P(A)\, P(B/A) = \frac{4}{52} \times \frac{4}{51} = \frac{4}{663} \]

Example 5
A bag contains 3 red and 4 white balls. Two draws are made without
replacement. What is the probability that both the balls are red?
Solution
Let A be the event that the ball drawn is red in the first draw.
\[ P(A) = \frac{3}{7} \]
Let B be the event that the ball drawn is red in the second draw given that the first ball
drawn is red.
\[ P(B/A) = \frac{2}{6} \]
Probability that both the balls are red
\[ P(A \cap B) = P(A)\, P(B/A) = \frac{3}{7} \times \frac{2}{6} = \frac{1}{7} \]

Example 6
A bag contains 8 red and 5 white balls. Two successive draws of 3 balls
each are made such that (i) the balls are replaced before the second
trial, and (ii) the balls are not replaced before the second trial. Find the
probability that the first draw will give 3 white and the second, 3 red balls.
Solution
Let A be the event that all 3 balls obtained at the first draw are white, and B be the event
that all the 3 balls obtained at the second draw are red.
(i) When balls are replaced before the second trial,
\[ P(A) = \frac{{}^{5}C_3}{{}^{13}C_3} = \frac{5}{143}, \quad P(B) = \frac{{}^{8}C_3}{{}^{13}C_3} = \frac{28}{143} \]
   Probability that the first draw will give 3 white and the second, 3 red balls
\[ P(A \cap B) = P(A)\, P(B) = \frac{5}{143} \times \frac{28}{143} = \frac{140}{20449} \]
(ii) When the balls are not replaced before the second trial,
\[ P(B/A) = \frac{{}^{8}C_3}{{}^{10}C_3} = \frac{7}{15} \]
   Probability that the first draw will give 3 white and the second, 3 red balls
\[ P(A \cap B) = P(A)\, P(B/A) = \frac{5}{143} \times \frac{7}{15} = \frac{7}{429} \]
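The two cases of Example 6 differ only in the number of balls left in the bag before the second draw. A minimal sketch using the standard-library `math.comb` (Python 3.8 or later) makes the contrast explicit; the variable names are chosen here for illustration.

```python
from fractions import Fraction
from math import comb

RED, WHITE = 8, 5
TOTAL = RED + WHITE
DRAW = 3  # balls taken in each draw

# First draw: all three balls are white.
p_first_white = Fraction(comb(WHITE, DRAW), comb(TOTAL, DRAW))

# Second draw, three red balls, with the first three balls replaced:
# the bag again holds all 13 balls, so the draws are independent.
p_red_replaced = Fraction(comb(RED, DRAW), comb(TOTAL, DRAW))

# Second draw without replacement: three white balls are gone,
# so only 10 balls remain, 8 of them red.
p_red_not_replaced = Fraction(comb(RED, DRAW), comb(TOTAL - DRAW, DRAW))

with_replacement = p_first_white * p_red_replaced          # P(A) P(B)
without_replacement = p_first_white * p_red_not_replaced   # P(A) P(B/A)
print(with_replacement, without_replacement)
```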

Example 7
From a bag containing 4 white and 6 black balls, two balls are drawn at
random. If the balls are drawn one after the other without replacement,
find the probability that the first ball is white and the second ball is
black.
Solution
Let A be the event that the first ball drawn is white and B be the event that the second
ball drawn is black given that the first ball drawn is white.
\[ P(A) = \frac{4}{10}, \quad P(B/A) = \frac{6}{9} \]
Probability that the first ball is white and the second ball is black
\[ P(A \cap B) = P(A)\, P(B/A) = \frac{4}{10} \times \frac{6}{9} = \frac{4}{15} \]

Example 8
Data on readership of a certain magazine show that the proportion of
male readers under 35 is 0.40 and that over 35 is 0.20. If the proportion
of readers under 35 is 0.70, find the probability of subscribers that are
females over 35 years. Also, calculate the probability that a randomly
selected male subscriber is under 35 years of age.
Solution
Let A be the event that the reader of the magazine is a male. Let B be the event that
reader of the magazine is over 35 years of age.
\[ P(A \cap \bar{B}) = 0.40, \quad P(A \cap B) = 0.20, \quad P(\bar{B}) = 0.7 \]
\[ P(B) = 1 - P(\bar{B}) = 1 - 0.7 = 0.3 \]
(i) Probability of subscribers that are females over 35 years
\[ P(\bar{A} \cap B) = P(B) - P(A \cap B) = 0.3 - 0.2 = 0.1 \]
(ii) Probability that a randomly selected male subscriber is under 35 years of age
\[ P(\bar{B}/A) = \frac{P(A \cap \bar{B})}{P(A)} = \frac{P(A \cap \bar{B})}{P(A \cap B) + P(A \cap \bar{B})} = \frac{0.4}{0.2 + 0.4} = \frac{0.4}{0.6} = \frac{2}{3} \]

Example 9
From a city population, the probability of selecting (a) a male or a
smoker is $\frac{7}{10}$, (b) a male smoker is $\frac{2}{5}$, and (c) a male, if a smoker is
already selected, is $\frac{2}{3}$. Find the probability of selecting (i) a nonsmoker,
(ii) a male, and (iii) a smoker, if a male is first selected.
Solution
Let A be the event that a male is selected. Let B be the event that a smoker is
selected.
\[ P(A \cup B) = \frac{7}{10}, \quad P(A \cap B) = \frac{2}{5}, \quad P(A/B) = \frac{2}{3} \]
(i) Probability of selecting a nonsmoker
\[ P(\bar{B}) = 1 - P(B) = 1 - \frac{P(A \cap B)}{P(A/B)} = 1 - \frac{2/5}{2/3} = \frac{2}{5} \]
(ii)
\[ P(B) = 1 - P(\bar{B}) = 1 - \frac{2}{5} = \frac{3}{5} \]
\[ P(A \cup B) = P(A) + P(B) - P(A \cap B) \qquad \ldots (1) \]
   Probability of selecting a male
\[ P(A) = P(A \cup B) + P(A \cap B) - P(B) = \frac{7}{10} + \frac{2}{5} - \frac{3}{5} = \frac{1}{2} \qquad [\text{Using Eq. (1)}] \]
(iii) Probability of selecting a smoker if a male is first selected
\[ P(B/A) = \frac{P(A \cap B)}{P(A)} = \frac{2/5}{1/2} = \frac{4}{5} \]

Example 10
Sixty per cent of the employees of the XYZ corporation are college
graduates. Of these, ten per cent are in sales. Of the employees who
did not graduate from college, eighty per cent are in sales. What is the
probability that
(i) an employee selected at random is in sales?
(ii) an employee selected at random is neither in sales nor a college
graduate?
Solution
Let A be the event that an employee is a college graduate. Let B be the event that an
employee is in sales.
\[ P(A) = 0.6, \quad P(B/A) = 0.10, \quad P(B/\bar{A}) = 0.8 \]
\[ P(\bar{A}) = 1 - P(A) = 1 - 0.60 = 0.40 \]
(i) Probability that an employee is in sales
\[
\begin{aligned}
P(B) &= P(A \cap B) + P(\bar{A} \cap B) \\
&= P(A)\, P(B/A) + P(\bar{A})\, P(B/\bar{A}) \\
&= (0.6 \times 0.1) + (0.40 \times 0.80) \\
&= 0.38
\end{aligned}
\]
(ii) Probability that an employee is neither in sales nor a college graduate
\[
\begin{aligned}
P(\bar{A} \cap \bar{B}) &= 1 - P(A \cup B) \\
&= 1 - [P(A) + P(B) - P(A \cap B)] \\
&= 1 - [P(A) + P(B) - P(A)\, P(B/A)] \\
&= 1 - [0.60 + 0.38 - (0.60 \times 0.10)] \\
&= 0.08
\end{aligned}
\]

Example 11
If A and B are two events such that $P(A) = \frac{3}{8}$, $P(B) = \frac{5}{8}$ and
$P(A \cup B) = \frac{3}{4}$, find $P(A/B)$ and $P(B/A)$. Show whether A and B are
independent.
Solution
\[ P(A \cup B) = P(A) + P(B) - P(A \cap B) \]
\[ \frac{3}{4} = \frac{3}{8} + \frac{5}{8} - P(A \cap B) \]
\[ P(A \cap B) = \frac{1}{4} \]
\[ P(A/B) = \frac{P(A \cap B)}{P(B)} = \frac{1/4}{5/8} = \frac{2}{5} \]
\[ P(B/A) = \frac{P(A \cap B)}{P(A)} = \frac{1/4}{3/8} = \frac{2}{3} \]
\[ P(A)\, P(B) = \frac{3}{8} \times \frac{5}{8} = \frac{15}{64} \]
\[ P(A \cap B) \ne P(A)\, P(B) \]
Hence, the events A and B are not independent.

Example 12
The probability that a student A solves a mathematics problem is $\frac{2}{5}$ and
the probability that a student B solves it is $\frac{2}{3}$. What is the probability
that (i) the problem is not solved, (ii) the problem is solved, and (iii) both
A and B, working independently of each other, solve the problem?
Solution
Let A and B be events that students A and B solve the problem respectively.
\[ P(A) = \frac{2}{5}, \quad P(B) = \frac{2}{3} \]
Events A and B are independent.
Probability that the student A does not solve the problem
\[ P(\bar{A}) = 1 - P(A) = 1 - \frac{2}{5} = \frac{3}{5} \]
Probability that the student B does not solve the problem
\[ P(\bar{B}) = 1 - P(B) = 1 - \frac{2}{3} = \frac{1}{3} \]
(i) Probability that the problem is not solved
\[ P(\bar{A} \cap \bar{B}) = P(\bar{A})\, P(\bar{B}) = \frac{3}{5} \times \frac{1}{3} = \frac{1}{5} \]
(ii) Probability that the problem is solved
\[ P(A \cup B) = 1 - P(\bar{A} \cap \bar{B}) = 1 - \frac{1}{5} = \frac{4}{5} \]
(iii) Probability that both A and B solve the problem
\[ P(A \cap B) = P(A)\, P(B) = \frac{2}{5} \times \frac{2}{3} = \frac{4}{15} \]

Example 13
The probability that the machine A will perform a usual function in
5 years’ time is $\frac{1}{4}$, while the probability that the machine B will perform
the function in 5 years’ time is $\frac{1}{3}$. Find the probability that both machines
will perform the usual function.
Solution
Let A and B be the events that machines A and B will perform the usual function
respectively.
\[ P(A) = \frac{1}{4}, \quad P(B) = \frac{1}{3} \]
Events A and B are independent.
Probability that both machines will perform the usual function
\[ P(A \cap B) = P(A)\, P(B) = \frac{1}{4} \times \frac{1}{3} = \frac{1}{12} \]

Example 14
A person A is known to hit a target in 3 out of 4 shots, whereas another
person B is known to hit the same target in 2 out of 3 shots. Find the
probability of the target being hit at all when they both try.
 [Summer 2015]
Solution
Let A and B be the events that the persons A and B hit the target respectively.
\[ P(A) = \frac{3}{4}, \quad P(B) = \frac{2}{3} \]
Events A and B are independent.
Probability that the person A will not hit the target
\[ P(\bar{A}) = 1 - P(A) = 1 - \frac{3}{4} = \frac{1}{4} \]
Probability that the person B will not hit the target
\[ P(\bar{B}) = 1 - P(B) = 1 - \frac{2}{3} = \frac{1}{3} \]
Probability that the target is not hit at all
\[ P(\bar{A} \cap \bar{B}) = P(\bar{A})\, P(\bar{B}) = \frac{1}{4} \times \frac{1}{3} = \frac{1}{12} \]
Probability that the target is hit at all when they both try
\[ P(A \cup B) = 1 - P(\bar{A} \cap \bar{B}) = 1 - \frac{1}{12} = \frac{11}{12} \]
Aliter
\[
\begin{aligned}
P(A \cup B) &= P(A) + P(B) - P(A \cap B) \\
&= P(A) + P(B) - P(A)\, P(B) && [\because A \text{ and } B \text{ are independent}] \\
&= \frac{3}{4} + \frac{2}{3} - \frac{3}{4} \times \frac{2}{3} \\
&= \frac{11}{12}
\end{aligned}
\]

Example 15
The odds against A speaking the truth are 4 : 6 while the odds in favour
of B speaking the truth are 7 : 3. What is the probability that A and B
contradict each other in stating the same fact?
Solution
Let A and B be events that A and B speak the truth respectively.
\[ P(A) = \frac{6}{10}, \quad P(B) = \frac{7}{10} \]
Events A and B are independent.
Probability that A speaks a lie
\[ P(\bar{A}) = 1 - P(A) = 1 - \frac{6}{10} = \frac{4}{10} \]
Probability that B speaks a lie
\[ P(\bar{B}) = 1 - P(B) = 1 - \frac{7}{10} = \frac{3}{10} \]
Probability that A and B contradict each other
\[
\begin{aligned}
P[(A \cap \bar{B}) \cup (\bar{A} \cap B)] &= P(A \cap \bar{B}) + P(\bar{A} \cap B) && [\because (A \cap \bar{B}) \text{ and } (\bar{A} \cap B) \text{ are mutually exclusive events}] \\
&= P(A)\, P(\bar{B}) + P(\bar{A})\, P(B) \\
&= \frac{6}{10} \times \frac{3}{10} + \frac{4}{10} \times \frac{7}{10} \\
&= \frac{23}{50}
\end{aligned}
\]

Example 16
An urn contains 10 red, 5 white and 5 blue balls. Two balls are drawn at
random. Find the probability that they are not of the same colour.
Solution
Let A, B, and C be the events that two balls drawn at random be of the same colour, i.e.,
red, white, and blue respectively.
\[ P(A) = \frac{{}^{10}C_2}{{}^{20}C_2} = \frac{9}{38}, \quad P(B) = \frac{{}^{5}C_2}{{}^{20}C_2} = \frac{1}{19}, \quad P(C) = \frac{{}^{5}C_2}{{}^{20}C_2} = \frac{1}{19} \]
Events A, B, and C are mutually exclusive.
Probability that both balls drawn are of same colour
\[ P(A \cup B \cup C) = P(A) + P(B) + P(C) = \frac{9}{38} + \frac{1}{19} + \frac{1}{19} = \frac{13}{38} \]
Probability that both balls drawn are not of the same colour
\[ P(\bar{A} \cap \bar{B} \cap \bar{C}) = 1 - P(A \cup B \cup C) = 1 - \frac{13}{38} = \frac{25}{38} \]

Example 17
A problem in statistics is given to three students A, B and C, whose
chances of solving it independently are $\frac{1}{2}$, $\frac{1}{3}$, and $\frac{1}{4}$ respectively. Find
the probability that
(i) the problem is solved
(ii) at least two of them are able to solve the problem
(iii) exactly two of them are able to solve the problem
(iv) exactly one of them is able to solve the problem
Solution
Let A, B, and C be the events that students A, B, and C solve the problem respectively.
\[ P(A) = \frac{1}{2}, \quad P(B) = \frac{1}{3}, \quad P(C) = \frac{1}{4} \]
Events A, B, and C are independent.
(i) The probability that the problem is solved is the same as the probability that at least
one of them is able to solve the problem.
\[
\begin{aligned}
P(A \cup B \cup C) &= P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C) \\
&= P(A) + P(B) + P(C) - P(A)P(B) - P(A)P(C) - P(B)P(C) + P(A)P(B)P(C) \\
&= \frac{1}{2} + \frac{1}{3} + \frac{1}{4} - \left(\frac{1}{2} \times \frac{1}{3}\right) - \left(\frac{1}{2} \times \frac{1}{4}\right) - \left(\frac{1}{3} \times \frac{1}{4}\right) + \left(\frac{1}{2} \times \frac{1}{3} \times \frac{1}{4}\right) \\
&= \frac{3}{4}
\end{aligned}
\]
(ii) Probability that at least two of them are able to solve the problem
\[
\begin{aligned}
P[(A \cap B) \cup (B \cap C) \cup (A \cap C)] &= P(A \cap B) + P(B \cap C) + P(A \cap C) - 2P(A \cap B \cap C) \\
&= P(A)P(B) + P(B)P(C) + P(A)P(C) - 2P(A)P(B)P(C) \\
&= \left(\frac{1}{2} \times \frac{1}{3}\right) + \left(\frac{1}{3} \times \frac{1}{4}\right) + \left(\frac{1}{2} \times \frac{1}{4}\right) - 2\left(\frac{1}{2} \times \frac{1}{3} \times \frac{1}{4}\right) \\
&= \frac{7}{24}
\end{aligned}
\]
(iii) Probability that exactly two of them are able to solve the problem
\[
\begin{aligned}
&P[(A \cap B \cap \bar{C}) \cup (A \cap \bar{B} \cap C) \cup (\bar{A} \cap B \cap C)] \\
&\quad = P(A \cap B) + P(B \cap C) + P(A \cap C) - 3P(A \cap B \cap C) \\
&\quad = P(A)P(B) + P(B)P(C) + P(A)P(C) - 3P(A)P(B)P(C) \\
&\quad = \left(\frac{1}{2} \times \frac{1}{3}\right) + \left(\frac{1}{3} \times \frac{1}{4}\right) + \left(\frac{1}{2} \times \frac{1}{4}\right) - 3\left(\frac{1}{2} \times \frac{1}{3} \times \frac{1}{4}\right) \\
&\quad = \frac{1}{4}
\end{aligned}
\]
(iv) Probability that exactly one of them is able to solve the problem
\[
\begin{aligned}
&P[(A \cap \bar{B} \cap \bar{C}) \cup (\bar{A} \cap B \cap \bar{C}) \cup (\bar{A} \cap \bar{B} \cap C)] \\
&\quad = P(A) + P(B) + P(C) - 2P(A \cap B) - 2P(B \cap C) - 2P(A \cap C) + 3P(A \cap B \cap C) \\
&\quad = \frac{1}{2} + \frac{1}{3} + \frac{1}{4} - 2\left(\frac{1}{2} \times \frac{1}{3}\right) - 2\left(\frac{1}{3} \times \frac{1}{4}\right) - 2\left(\frac{1}{2} \times \frac{1}{4}\right) + 3\left(\frac{1}{2} \times \frac{1}{3} \times \frac{1}{4}\right) \\
&\quad = \frac{11}{24}
\end{aligned}
\]
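The four answers of Example 17 can also be obtained without the corollaries, by listing the $2^3$ solved/not-solved patterns and multiplying the individual probabilities, which is valid because the three events are independent. This is a sketch under that assumption; `prob_of_outcome` and `prob_solvers` are hypothetical helper names.

```python
from fractions import Fraction
from itertools import product

# Individual chances of A, B and C solving the problem.
p = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 4)]

def prob_of_outcome(outcome):
    """Probability of one solved/not-solved pattern, using independence."""
    result = Fraction(1)
    for solved, p_i in zip(outcome, p):
        result *= p_i if solved else 1 - p_i
    return result

def prob_solvers(predicate):
    """Total probability of the patterns whose number of solvers satisfies predicate."""
    return sum(prob_of_outcome(o)
               for o in product([True, False], repeat=3)
               if predicate(sum(o)))

solved = prob_solvers(lambda k: k >= 1)         # (i) at least one solver
at_least_two = prob_solvers(lambda k: k >= 2)   # (ii)
exactly_two = prob_solvers(lambda k: k == 2)    # (iii)
exactly_one = prob_solvers(lambda k: k == 1)    # (iv)
print(solved, at_least_two, exactly_two, exactly_one)
```

The enumeration scales poorly with the number of events, so for many events the closed-form corollaries remain the practical route.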

Example 18
A husband and wife appeared in an interview for two vacancies in an
office. The probability of the husband’s selection is $\frac{1}{7}$ and that of the
wife’s selection is $\frac{1}{5}$. Find the probability that (i) both of them are
selected, (ii) at least one of them is selected, (iii) none of them is selected,
and (iv) only one of them is selected.
Solution
Let A and B be the events that the husband and wife are selected respectively.
\[ P(A) = \frac{1}{7}, \quad P(B) = \frac{1}{5} \]
Events A and B are independent.
(i) Probability that both of them are selected
\[ P(A \cap B) = P(A)\, P(B) = \frac{1}{7} \times \frac{1}{5} = \frac{1}{35} \]
(ii) Probability that at least one of them is selected
\[ P(A \cup B) = P(A) + P(B) - P(A \cap B) = \frac{1}{7} + \frac{1}{5} - \frac{1}{35} = \frac{11}{35} \]
(iii) Probability that none of them is selected
\[ P(\bar{A} \cap \bar{B}) = 1 - P(A \cup B) = 1 - \frac{11}{35} = \frac{24}{35} \]
(iv) Probability that only one of them is selected
\[ P[(A \cap \bar{B}) \cup (\bar{A} \cap B)] = P(A \cup B) - P(A \cap B) = \frac{11}{35} - \frac{1}{35} = \frac{10}{35} = \frac{2}{7} \]

Example 19
There are two bags. The first contains 2 red and 1 white ball, whereas
the second bag has only 1 red and 2 white balls. One ball is taken out at
random from the first bag and put in the second. Then a ball is chosen at
random from the second bag. What is the probability that this last ball
is red?
Solution
There are two mutually exclusive cases.
Case I: A red ball is transferred from the first bag to the second bag and a red ball is
drawn from it.
Case II: A white ball is transferred from the first bag to the second bag and then a red
ball is drawn from it.
Let A be the event of transferring a red ball from the first bag, and B be the event of
transferring a white ball from the first bag.
\[ P(A) = \frac{2}{3}, \quad P(B) = \frac{1}{3} \]
Let E be the event of drawing a red ball from the second bag.
\[ P(E/A) = \frac{2}{4}, \quad P(E/B) = \frac{1}{4} \]
\[ P(\text{Case I}) = P(A \cap E) = P(A)\, P(E/A) = \frac{2}{3} \times \frac{2}{4} = \frac{1}{3} \]
\[ P(\text{Case II}) = P(B \cap E) = P(B)\, P(E/B) = \frac{1}{3} \times \frac{1}{4} = \frac{1}{12} \]
\[ P[(A \cap E) \cup (B \cap E)] = P(A \cap E) + P(B \cap E) = \frac{1}{3} + \frac{1}{12} = \frac{5}{12} \]
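The two-case argument of Example 19 generalises to any pair of bags. The sketch below wraps it in a small function; the name `red_after_transfer` and the (red, white) tuple encoding of a bag are assumptions made for illustration.

```python
from fractions import Fraction

def red_after_transfer(bag1, bag2):
    """Probability that a ball drawn from bag2 is red after one random
    ball has been moved from bag1 to bag2.

    Each bag is a (red, white) pair of counts.
    """
    r1, w1 = bag1
    r2, w2 = bag2
    n1 = r1 + w1
    n2 = r2 + w2 + 1  # the second bag gains one ball

    # Case I: a red ball is transferred, then a red ball is drawn.
    case_red = Fraction(r1, n1) * Fraction(r2 + 1, n2)
    # Case II: a white ball is transferred, then a red ball is drawn.
    case_white = Fraction(w1, n1) * Fraction(r2, n2)

    # The two cases are mutually exclusive, so their probabilities add.
    return case_red + case_white

p_last_red = red_after_transfer(bag1=(2, 1), bag2=(1, 2))
print(p_last_red)
```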

Example 20
An urn contains four tickets marked with numbers 112, 121, 211, and
222, and one ticket is drawn. Let Ai (i = 1, 2, 3) be the event that the ith
digit of the ticket drawn is 1. Show that the events A1, A2, A3 are pairwise
independent but not mutually independent.
Solution
\[ A_1 = \{112, 121\}, \quad A_2 = \{112, 211\}, \quad A_3 = \{121, 211\} \]
\[ A_1 \cap A_2 = \{112\}, \quad A_1 \cap A_3 = \{121\}, \quad A_2 \cap A_3 = \{211\} \]
\[ P(A_1) = \frac{2}{4} = \frac{1}{2} = P(A_2) = P(A_3) \]
\[ P(A_1 \cap A_2) = \frac{1}{4} = P(A_1 \cap A_3) = P(A_2 \cap A_3) \]
\[ P(A_1 \cap A_2) = P(A_1)\,P(A_2) = \frac{1}{4} \]
\[ P(A_2 \cap A_3) = P(A_2)\,P(A_3) = \frac{1}{4} \]
\[ P(A_1 \cap A_3) = P(A_1)\,P(A_3) = \frac{1}{4} \]
Hence, events A1, A2, and A3 are pairwise independent.
\[ P(A_1 \cap A_2 \cap A_3) = P(\emptyset) = 0 \]
\[ P(A_1 \cap A_2 \cap A_3) \ne P(A_1)\,P(A_2)\,P(A_3) \]
Hence, events A1, A2, and A3 are not mutually independent.
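
Because the sample space has only four tickets, both claims can be verified by direct enumeration. The sketch below assumes each ticket is equally likely; the helper `prob` and the list `A` are illustrative names.

```python
from fractions import Fraction
from itertools import combinations

tickets = ["112", "121", "211", "222"]

def prob(event):
    """Probability of a set of tickets when each ticket is equally likely."""
    return Fraction(len(event), len(tickets))

# A[i] holds the tickets whose (i + 1)-th digit is 1.
A = [{t for t in tickets if t[i] == "1"} for i in range(3)]

pairwise = all(
    prob(A[i] & A[j]) == prob(A[i]) * prob(A[j])
    for i, j in combinations(range(3), 2)
)
mutual = prob(A[0] & A[1] & A[2]) == prob(A[0]) * prob(A[1]) * prob(A[2])

print(pairwise, mutual)
```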

Exercise 1.3

1. Find the probability of drawing 2 red balls in succession from a bag
containing 4 red and 5 black balls when the ball that is drawn first is
(i) not replaced, and (ii) replaced.
[ans.: (i) $\frac{1}{6}$ (ii) $\frac{16}{81}$]

2. Two aeroplanes bomb a target in succession. The probability of each
correctly scoring a hit is 0.3 and 0.2 respectively. The second will bomb
only if the first misses the target. Find the probability that (i) the target
is hit, and (ii) both fail to score hits.

 [ans.: (i) 0.44 (ii) 0.56 ]


3. Box A contains 5 red and 3 white marbles and Box B contains 2 red
and 6 white marbles. If a marble is drawn from each box, what is the
probability that they are both of the same colour?
[ans.: 0.4375]
4. Two marbles are drawn in succession from a box containing 10 red, 30
white, 20 blue, and 15 orange marbles, with replacement being made
after each draw. Find the probability that (i) both are white, and (ii) the
first is red and the second is white.
[ans.: (i) $\frac{4}{25}$ (ii) $\frac{4}{75}$]

5. A, B, C are aiming to shoot a balloon. A will succeed 4 times out of 5
attempts. The chance of B to shoot the balloon is 3 out of 4, and that
of C is 2 out of 3. If the three aim at the balloon simultaneously, find the
probability that at least two of them hit the balloon.
[ans.: $\frac{5}{6}$]

6. There are 12 cards numbered 1 to 12 in a box. If two cards are selected,
what is the probability that the sum is odd (i) with replacement, and
(ii) without replacement?
[ans.: (i) $\frac{1}{2}$ (ii) $\frac{6}{11}$]

7. Two cards are drawn from a well-shuffled pack of 52 cards. Find the
probability that they are both aces if the first card is (i) replaced, and
(ii) not replaced.
[ans.: (i) $\frac{1}{169}$ (ii) $\frac{1}{221}$]

8. A can hit a target 2 times in 5 shots; B, 3 times in 4 shots; and C, 2 times
in 3 shots. They fire a volley. What is the probability that at least 2 shots
hit the target?
[ans.: $\frac{2}{3}$]

9. There are two bags. The first bag contains 5 red and 7 white balls and
the second bag contains 3 red and 12 white balls. One ball is taken out
at random from the first bag and is put in the second bag. Now, a ball is
drawn from the second bag. What is the probability that this last ball is
red?
[ans.: $\frac{41}{192}$]

10. In a shooting competition, the probability of A hitting the target is $\frac{1}{2}$;
of B, is $\frac{2}{3}$; and of C, is $\frac{3}{4}$. If all of them fire at the target, find the
probability that (i) none of them hits the target, and (ii) at least one of
them hits the target.
[ans.: (i) $\frac{1}{24}$ (ii) $\frac{23}{24}$]

11. The odds against a student X solving a statistics problem are 12 to 10
and the odds in favour of a student Y solving the problem are 6 to 9.
What is the probability that the problem will be solved when both try
independently of each other?
[ans.: $\frac{37}{55}$]

12. A bag contains 6 white and 9 black balls. Four balls are drawn at random
twice. Find the probability that the first draw will give 4 white balls
and the second draw will give 4 black balls if (i) the balls are replaced,
and (ii) the balls are not replaced before the second draw.
[ans.: (i) $\frac{6}{5915}$ (ii) $\frac{3}{715}$]

13. An urn contains 10 white and 3 black balls. Another urn contains 3
white and 5 black balls. Two balls are transferred from the first urn to
the second urn and then one ball is drawn from the latter. What is the
probability that the ball drawn is white?
[ans.: $\frac{59}{130}$]

14. A man wants to marry a girl having the following qualities: fair
complexion (the probability of getting such a girl is $\frac{1}{20}$), handsome
dowry (the probability is $\frac{1}{50}$), and westernized manners and etiquette
(the probability of this is $\frac{1}{100}$). Find the probability of his getting
married to such a girl when the three attributes are possessed
independently of one another.
[ans.: $\frac{1}{100000}$]

15. A small town has one fire engine and one ambulance available for
emergencies. The probability that the fire engine is available when
needed is 0.98 and the probability that the ambulance is available when
called is 0.92. In the event of an injury resulting from a burning building,
find the probability that both the fire engine and ambulance will be
available.
 [ans.: 0.9016 ]
16. In a certain community, 36% of the families own a dog and 22% of the
families that own a dog also own a cat. In addition, 30% of the families
own a cat. What is the probability that (i) a randomly selected family
owns both a dog and a cat, and (ii) a randomly selected family owns a
dog given that it owns a cat?
[ans.: (i) 0.0792 (ii) 0.264]

1.7 Bayes’ Theorem

Let $A_1, A_2, \ldots, A_n$ be n mutually exclusive and exhaustive events with $P(A_i) \ne 0$ for
$i = 1, 2, \ldots, n$ in a sample space S. Let B be an event that can occur in combination with
any one of the events $A_1, A_2, \ldots, A_n$ with $P(B) \ne 0$. The probability of the event $A_i$ when
the event B has actually occurred is given by
\[ P(A_i/B) = \frac{P(A_i)\,P(B/A_i)}{\sum_{i=1}^{n} P(A_i)\,P(B/A_i)} \]

Proof Since $A_1, A_2, \ldots, A_n$ are n mutually exclusive and exhaustive events of the
sample space S,
\[ S = A_1 \cup A_2 \cup \cdots \cup A_n \]
Since B is another event that can occur in combination with any of the mutually
exclusive and exhaustive events $A_1, A_2, \ldots, A_n$,
\[ B = (A_1 \cap B) \cup (A_2 \cap B) \cup \cdots \cup (A_n \cap B) \]
The events $(A_1 \cap B)$, $(A_2 \cap B)$, etc., are mutually exclusive. Taking the probability of both sides,
\[ P(B) = P(A_1 \cap B) + P(A_2 \cap B) + \cdots + P(A_n \cap B) \]
\[ P(B) = \sum_{i=1}^{n} P(A_i \cap B) = \sum_{i=1}^{n} P(A_i)\,P(B/A_i) \]
The conditional probability of the event $A_i$ given that B has already occurred is given
by
\[ P(A_i/B) = \frac{P(A_i \cap B)}{P(B)} = \frac{P(A_i)\,P(B/A_i)}{P(B)} = \frac{P(A_i)\,P(B/A_i)}{\sum_{i=1}^{n} P(A_i)\,P(B/A_i)} \]

Example 1
A company has two plants to manufacture hydraulic machines. Plant I
manufactures 70% of the hydraulic machines, and Plant II manufactures
30%. At Plant I, 80% of hydraulic machines are rated standard quality;
and at Plant II, 90% of hydraulic machines are rated standard quality.
A machine is picked up at random and is found to be of standard quality.
What is the chance that it has come from Plant I? [Summer 2015]
Solution
Let A1 and A2 be the events that the hydraulic machines are manufactured in Plant I
and Plant II respectively. Let B be the event that the machine picked up is found to be
of standard quality.
\[ P(A_1) = \frac{70}{100} = 0.7 \]
\[ P(A_2) = \frac{30}{100} = 0.3 \]
[Fig. 1.3]
Probability that the machine is of standard quality given that it is manufactured in
Plant I
\[ P(B/A_1) = \frac{80}{100} = 0.8 \]
Probability that the machine is of standard quality given that it is manufactured in
Plant II
\[ P(B/A_2) = \frac{90}{100} = 0.9 \]
Probability that a machine is manufactured in Plant I given that it is of standard quality
\[ P(A_1/B) = \frac{P(A_1)\,P(B/A_1)}{P(A_1)\,P(B/A_1) + P(A_2)\,P(B/A_2)} = \frac{0.7 \times 0.8}{0.7 \times 0.8 + 0.3 \times 0.9} = 0.6747 \]
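
Bayes' theorem translates into a few lines of code: multiply each prior by its likelihood, then divide by the total. The sketch below is one possible arrangement, not the only one; the function name `posterior` is hypothetical, and the numbers are those of Example 1.

```python
def posterior(priors, likelihoods):
    """Return P(A_i/B) for every i, given the lists P(A_i) and P(B/A_i)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(A_i) P(B/A_i)
    evidence = sum(joint)                                 # P(B), total probability
    return [j / evidence for j in joint]

# Example 1: Plant I and Plant II, standard-quality machines.
plant_posteriors = posterior([0.7, 0.3], [0.8, 0.9])
print(round(plant_posteriors[0], 4))
```

The same function applies unchanged to problems with three or more events, since the denominator is simply the sum of all the products.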

Example 2
A bag A contains 2 white and 3 red balls, and a bag B contains 4 white
and 5 red balls. One ball is drawn at random from one of the bags and
it is found to be red. Find the probability that the red ball is drawn from
the bag B.
Solution
Let A1 and A2 be the events that the ball is drawn from bags A and B respectively. Let
B be the event that the ball drawn is red.
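\[ P(A_1) = \frac{1}{2}, \qquad P(A_2) = \frac{1}{2} \]
[Fig. 1.4: tree diagram in which the branch $A_1$ (probability $\frac{1}{2}$) leads to B with probability $\frac{3}{5}$, and the branch $A_2$ (probability $\frac{1}{2}$) leads to B with probability $\frac{5}{9}$]
Probability that the ball drawn is red given that it is drawn from the bag A
\[ P(B/A_1) = \frac{3}{5} \]
Probability that the ball drawn is red given that it is drawn from the bag B
\[ P(B/A_2) = \frac{5}{9} \]
Probability that the ball is drawn from the bag B given that it is red
\[ P(A_2/B) = \frac{P(A_2)\,P(B/A_2)}{P(A_1)\,P(B/A_1) + P(A_2)\,P(B/A_2)} = \frac{\frac{1}{2} \times \frac{5}{9}}{\left(\frac{1}{2} \times \frac{3}{5}\right) + \left(\frac{1}{2} \times \frac{5}{9}\right)} = \frac{25}{52} \]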

Example 3
The chance that Doctor A will diagnose a disease X correctly is
60%. The chance that a patient will die under his treatment after a correct
diagnosis is 40%, and the chance of death after a wrong diagnosis is 70%.
A patient of Doctor A, who had the disease X, died. What is the chance
that his disease was diagnosed correctly?
Solution
Let A1 be the event that the disease X is diagnosed correctly by Doctor A. Let A2 be the
event that the disease X is not diagnosed correctly by Doctor A. Let B be the event that
a patient of Doctor A who has the disease X, dies.
\[ P(A_1) = \frac{60}{100} = 0.6 \]
\[ P(A_2) = P(\bar{A}_1) = 1 - P(A_1) = 0.4 \]
[Fig. 1.5: tree diagram in which the branch $A_1$ (probability 0.6) leads to B with probability 0.4, and the branch $A_2$ (probability 0.4) leads to B with probability 0.7]
Probability that the patient of Doctor A who has the disease X dies given that the
disease X is diagnosed correctly
\[ P(B/A_1) = \frac{40}{100} = 0.4 \]
Probability that the patient of Doctor A who has the disease X dies given that the
disease X is not diagnosed correctly
\[ P(B/A_2) = \frac{70}{100} = 0.7 \]
Probability that the disease X is diagnosed correctly given that a patient of Doctor A
who has the disease X dies
\[ P(A_1/B) = \frac{P(A_1)\,P(B/A_1)}{P(A_1)\,P(B/A_1) + P(A_2)\,P(B/A_2)} = \frac{0.6 \times 0.4}{(0.6 \times 0.4) + (0.4 \times 0.7)} = \frac{6}{13} \]

Example 4
In a bolt factory, machines A, B, C manufacture 25%, 35%, and 40%
of the total output, and 5%, 4%, and 2% of their respective outputs
are defective bolts. A bolt is drawn at random from the product and is
found to be defective. Find the probabilities that it was manufactured by
(i) Machine A, (ii) Machine B, and (iii) Machine C.
Solution
Let A1, A2 and A3 be the events that bolts are manufactured by machines A, B, and C
respectively. Let B be the event that the bolt drawn is defective.
\[ P(A_1) = \frac{25}{100} = 0.25 \]
\[ P(A_2) = \frac{35}{100} = 0.35 \]
\[ P(A_3) = \frac{40}{100} = 0.4 \]
[Fig. 1.6]
Probability that the bolt drawn is defective given that it is manufactured from
Machine A
\[ P(B/A_1) = \frac{5}{100} = 0.05 \]
Probability that the bolt drawn is defective given that it is manufactured from
Machine B
\[ P(B/A_2) = \frac{4}{100} = 0.04 \]
Probability that the bolt drawn is defective given that it is manufactured from
Machine C
\[ P(B/A_3) = \frac{2}{100} = 0.02 \]
(i) Probability that a bolt is manufactured from Machine A given that it is defective
\[ P(A_1/B) = \frac{P(A_1)\,P(B/A_1)}{P(A_1)\,P(B/A_1) + P(A_2)\,P(B/A_2) + P(A_3)\,P(B/A_3)} = \frac{0.25 \times 0.05}{(0.25 \times 0.05) + (0.35 \times 0.04) + (0.4 \times 0.02)} = 0.3623 \]
(ii) Probability that a bolt is manufactured from Machine B given that it is defective
\[ P(A_2/B) = \frac{P(A_2)\,P(B/A_2)}{P(A_1)\,P(B/A_1) + P(A_2)\,P(B/A_2) + P(A_3)\,P(B/A_3)} = \frac{0.35 \times 0.04}{(0.25 \times 0.05) + (0.35 \times 0.04) + (0.4 \times 0.02)} = 0.4058 \]
(iii) Probability that a bolt is manufactured from Machine C given that it is defective
\[ P(A_3/B) = \frac{P(A_3)\,P(B/A_3)}{P(A_1)\,P(B/A_1) + P(A_2)\,P(B/A_2) + P(A_3)\,P(B/A_3)} = \frac{0.4 \times 0.02}{(0.25 \times 0.05) + (0.35 \times 0.04) + (0.4 \times 0.02)} = 0.2319 \]

Example 5
A businessman goes to hotels X, Y, Z for 20%, 50%, 30% of the time
respectively. It is known that 5%, 4%, 8% of the rooms in X, Y, Z hotels
have faulty plumbing. If the businessman is assigned a room with faulty
plumbing, what is the probability that the room is in Hotel Z?
Solution
Let $A_1$, $A_2$ and $A_3$ be the events that the businessman goes to hotels X, Y, Z respectively.
Let B be the event that the room has faulty plumbing.
\[ P(A_1) = \frac{20}{100} = 0.2 \]
\[ P(A_2) = \frac{50}{100} = 0.5 \]
\[ P(A_3) = \frac{30}{100} = 0.3 \]
[Fig. 1.7]
Probability that the room has faulty plumbing given that it belongs to Hotel X
\[ P(B/A_1) = \frac{5}{100} = 0.05 \]
Probability that the room has faulty plumbing given that it belongs to Hotel Y
\[ P(B/A_2) = \frac{4}{100} = 0.04 \]
Probability that the room has faulty plumbing given that it belongs to Hotel Z
\[ P(B/A_3) = \frac{8}{100} = 0.08 \]
Probability that the businessman's room belongs to Hotel Z given that the room has
faulty plumbing
\[ P(A_3/B) = \frac{P(A_3)\,P(B/A_3)}{P(A_1)\,P(B/A_1) + P(A_2)\,P(B/A_2) + P(A_3)\,P(B/A_3)} = \frac{0.3 \times 0.08}{(0.2 \times 0.05) + (0.5 \times 0.04) + (0.3 \times 0.08)} = \frac{4}{9} \]

Example 6
Of three persons the chances that a politician, a businessman, or an
academician would be appointed the Vice Chancellor (VC) of a university
are 0.5, 0.3, 0.2 respectively. Probabilities that research is promoted by
these persons if they are appointed as VC are 0.3, 0.7, 0.8 respectively.
(i) Determine the probability that research is promoted.
(ii) If research is promoted, what is the probability that the VC is an
academician?
Solution
Let $A_1$, $A_2$ and $A_3$ be the events that a politician, a businessman or an academician will
be appointed as the VC respectively. Let B be the event that research is promoted.
\[ P(A_1) = 0.5, \qquad P(A_2) = 0.3, \qquad P(A_3) = 0.2 \]
[Fig. 1.8]
Probability that research is promoted given that a politician is appointed as VC
\[ P(B/A_1) = 0.3 \]
Probability that research is promoted given that a businessman is appointed as VC
\[ P(B/A_2) = 0.7 \]
Probability that research is promoted given that an academician is appointed as VC
\[ P(B/A_3) = 0.8 \]
(i) Probability that research is promoted
\[ P(B) = P(A_1)\,P(B/A_1) + P(A_2)\,P(B/A_2) + P(A_3)\,P(B/A_3) = (0.5 \times 0.3) + (0.3 \times 0.7) + (0.2 \times 0.8) = 0.52 \]
(ii) Probability that the VC is an academician given that research is promoted
\[ P(A_3/B) = \frac{P(A_3)\,P(B/A_3)}{P(A_1)\,P(B/A_1) + P(A_2)\,P(B/A_2) + P(A_3)\,P(B/A_3)} = \frac{0.2 \times 0.8}{0.52} = \frac{4}{13} \]

Example 7
The contents of urns I, II, and III are as follows:
1 white, 2 red, and 3 black balls,
2 white, 3 red, and 1 black ball, and
3 white, 1 red, and 2 black balls.
One urn is chosen at random and two balls are drawn. They happen
to be white and red. Find the probability that they came from (i) Urn I,
(ii) Urn II, and (iii) Urn III.
Solution
Let A1, A2, and A3 be the events that urns I, II and III are chosen respectively. Let B be
the event that 2 balls drawn are white and red.
\[ P(A_1) = \frac{1}{3}, \qquad P(A_2) = \frac{1}{3}, \qquad P(A_3) = \frac{1}{3} \]
[Fig. 1.9]
Probability that 2 balls drawn are white and red given that they are chosen from the
urn I
\[ P(B/A_1) = \frac{{}^{1}C_{1} \times {}^{2}C_{1}}{{}^{6}C_{2}} = \frac{1 \times 2}{15} = \frac{2}{15} \]
Probability that 2 balls drawn are white and red given that they are chosen from the
urn II
\[ P(B/A_2) = \frac{{}^{2}C_{1} \times {}^{3}C_{1}}{{}^{6}C_{2}} = \frac{2 \times 3}{15} = \frac{6}{15} \]
Probability that 2 balls drawn are white and red given that they are chosen from the
urn III
\[ P(B/A_3) = \frac{{}^{3}C_{1} \times {}^{1}C_{1}}{{}^{6}C_{2}} = \frac{3 \times 1}{15} = \frac{3}{15} \]
(i) Probability that 2 balls came from the urn I given that they are white and red
\[ P(A_1/B) = \frac{P(A_1)\,P(B/A_1)}{P(A_1)\,P(B/A_1) + P(A_2)\,P(B/A_2) + P(A_3)\,P(B/A_3)} = \frac{\frac{1}{3} \times \frac{2}{15}}{\left(\frac{1}{3} \times \frac{2}{15}\right) + \left(\frac{1}{3} \times \frac{6}{15}\right) + \left(\frac{1}{3} \times \frac{3}{15}\right)} = \frac{2}{11} \]
(ii) Probability that 2 balls came from the urn II given that they are white and red
\[ P(A_2/B) = \frac{P(A_2)\,P(B/A_2)}{P(A_1)\,P(B/A_1) + P(A_2)\,P(B/A_2) + P(A_3)\,P(B/A_3)} = \frac{\frac{1}{3} \times \frac{6}{15}}{\left(\frac{1}{3} \times \frac{2}{15}\right) + \left(\frac{1}{3} \times \frac{6}{15}\right) + \left(\frac{1}{3} \times \frac{3}{15}\right)} = \frac{6}{11} \]
(iii) Probability that 2 balls came from the urn III given that they are white and red
\[ P(A_3/B) = \frac{P(A_3)\,P(B/A_3)}{P(A_1)\,P(B/A_1) + P(A_2)\,P(B/A_2) + P(A_3)\,P(B/A_3)} = \frac{\frac{1}{3} \times \frac{3}{15}}{\left(\frac{1}{3} \times \frac{2}{15}\right) + \left(\frac{1}{3} \times \frac{6}{15}\right) + \left(\frac{1}{3} \times \frac{3}{15}\right)} = \frac{3}{11} \]
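
When the likelihoods come from counting, as in Example 7, they can be computed with binomial coefficients and then passed through Bayes' theorem. The sketch below assumes the two balls are drawn together without replacement; `urns` and `likelihood` are illustrative names.

```python
from fractions import Fraction
from math import comb

# (white, red, black) counts for urns I, II, III as given in Example 7.
urns = {"I": (1, 2, 3), "II": (2, 3, 1), "III": (3, 1, 2)}

def likelihood(white, red, black):
    """P(one white and one red | urn) when two balls are drawn together."""
    total = white + red + black
    return Fraction(comb(white, 1) * comb(red, 1), comb(total, 2))

prior = Fraction(1, len(urns))   # each urn is equally likely to be chosen
joint = {name: prior * likelihood(*counts) for name, counts in urns.items()}
evidence = sum(joint.values())   # P(B)
posteriors = {name: j / evidence for name, j in joint.items()}

for name, p in posteriors.items():
    print(name, p)
```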

Exercise 1.4

1. There are 4 boys and 2 girls in Room A and 5 boys and 3 girls in Room B.
A girl from one of the two rooms laughed loudly. What is the probability
the girl who laughed was from Room B?
[ans.: $\frac{9}{17}$]

2. The probabilities of X, Y, and Z becoming managers are $\frac{4}{9}$, $\frac{2}{9}$, and $\frac{1}{3}$
respectively. The probabilities that the bonus scheme will be introduced
if X, Y, and Z become managers are $\frac{3}{10}$, $\frac{1}{2}$, and $\frac{4}{5}$ respectively. (i) What
is the probability that the bonus scheme will be introduced? (ii) If the
bonus scheme has been introduced, what is the probability that the
manager appointed was X?
[ans.: (i) $\frac{23}{45}$ (ii) $\frac{6}{23}$]

3. A factory has two machines, A and B. Past records show that the machine A
produces 30% of the total output and the machine B, the remaining 70%.
Machine A produces 5% defective articles and Machine B produces 1%
defective items. An item is drawn at random and found to be defective.
What is the probability that it was produced (i) by the machine A, and
(ii) by the Machine B?
 [ans.: (i) 0.682 (ii) 0.318]
4. A company has two plants to manufacture scooters. Plant I manufactures
80% of the scooters, and Plant II manufactures 20%. At Plant I, 85 out
of 100 scooters are rated standard quality or better. At Plant II, only
65 out of 100 scooters are rated standard quality or better. What is the
probability that a scooter selected at random came from (i) Plant I, and
(ii) Plant II if it is known that the scooter is of standard quality?
 [ans.: (i) 0.84 (ii) 0.16 ]
5. A new pregnancy test was given to 100 pregnant women and 100
non-pregnant women. The test indicated pregnancy in 92 of the 100 pregnant
women and in 12 of the 100 non-pregnant women. If a randomly selected
woman takes this test and the test indicates she is pregnant, what is the
probability she was not pregnant?
[ans.: $\frac{3}{26}$]

6. An insurance company insured 2000 scooter drivers, 4000 car drivers,
and 6000 truck drivers. The probability of an accident is 0.01, 0.03, and
0.15 in the respective category. One of the insured drivers meets with an
accident. What is the probability that he is a scooter driver?
[ans.: $\frac{1}{52}$]

7. Consider a population of consumers consisting of two types. The upper-
income class of consumers comprise 35% of the population and each
member has a probability of 0.8 of purchasing Brand A of a product.
Each member of the rest of the population has a probability of 0.3 of
purchasing Brand A of the product. A consumer, chosen at random, is
found to be the buyer of Brand A. What is the probability that the buyer
belongs to the middle-income and lower-income classes of consumers?
[ans.: $\frac{39}{95}$]

8. There are two boxes of identical appearance, each containing 4 spark
plugs. It is known that the box I contains only one defective spark plug,
while all the four spark plugs of the box II are non-defective. A spark plug
drawn at random from a box, selected at random, is found to be
non-defective. What is the probability that it came from the box I?
[ans.: $\frac{3}{7}$]

9. Vijay has 5 one-rupee coins and one of them is known to have two heads.
He takes out a coin at random and tosses it 5 times—it always falls head
upward. What is the probability that it is a coin with two heads?
[ans.: $\frac{8}{9}$]

10. Stores A, B, and C have 50, 75, and 100 employees and, respectively,
50, 60, and 70 per cent of these are women. Resignations are equally likely
among all employees, regardless of sex. One employee resigns and this
is a woman. What is the probability that she works in Store C?
 [ans.: 0.5]
CHAPTER 2

Random Variables

Chapter Outline
2.1 Introduction
2.2 Random Variables
2.3 Probability Mass Function
2.4 Discrete Distribution Function
2.5 Probability Density Function
2.6 Continuous Distribution Function
2.7 Two-Dimensional Discrete Random Variables
2.8 Two-Dimensional Continuous Random Variables

2.1 Introduction

The outcomes of random experiments are, in general, abstract quantities or, in other
words, most of the time they are not in any numerical form. However, the outcomes
of a random experiment can be expressed in quantitative terms, in particular, by
means of real numbers. Hence, a function can be defined that takes a definite real
value corresponding to each outcome of an experiment. This gives a rationale for
the concept of random variables about which probability statements can be made.
In probability and statistics, a probability distribution assigns a probability to each
measurable subset of the possible outcomes of a random experiment. Important and
commonly encountered probability distributions include binomial distribution, Poisson
distribution, and normal distribution.

2.2 Random Variables

A random variable X is a real-valued function of the elements of the sample space of a
random experiment. In other words, a variable that takes real values depending
on the outcome of a random experiment is called a random variable, e.g.,
(i) When a fair coin is tossed, S = {H, T}. If X is the random variable denoting the
number of heads,
X(H) = 1 and X(T) = 0
Hence, the random variable X can take values 0 and 1.
(ii) When two fair coins are tossed, S = {HH, HT, TH, TT}. If X is the random
variable denoting the number of heads,
X(HH) = 2, X(HT) = 1, X(TH) = 1, X(TT) = 0.
Hence, the random variable X can take values 0, 1, and 2.
(iii) When a fair die is tossed, S = {1, 2, 3, 4, 5, 6}.
If X is the random variable denoting the square of the number obtained,
X(1) = 1, X(2) = 4, X(3) = 9, X(4) = 16, X(5) = 25, X(6) = 36
Hence, the random variable X can take values 1, 4, 9, 16, 25, and 36.
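
The idea that a random variable is a function on the sample space can be made concrete in code. The following sketch reproduces case (ii), the toss of two coins; the names `sample_space` and `number_of_heads` are illustrative.

```python
from itertools import product

# Sample space for tossing two coins: ('H', 'H'), ('H', 'T'), ('T', 'H'), ('T', 'T')
sample_space = list(product("HT", repeat=2))

def number_of_heads(outcome):
    """The random variable X: maps an outcome to a real number."""
    return outcome.count("H")

# Tabulate X(outcome) for every outcome of the experiment.
X = {"".join(outcome): number_of_heads(outcome) for outcome in sample_space}
values = sorted(set(X.values()))

print(X)
print(values)
```

The dictionary `X` plays the role of the mapping X(HH) = 2, X(HT) = 1, and so on, while `values` lists the distinct values the random variable can take.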

Types of Random Variables


There are two types of random variables:
(i) Discrete random variables
(ii) Continuous random variables
Discrete Random Variables A random variable X is said to be discrete if it takes
either a finite or a countably infinite number of values. Thus, a discrete random
variable takes only isolated values, e.g.,
(i) Number of children in a family
(ii) Number of cars sold by different companies in a year
(iii) Number of days of rainfall in a city
(iv) Number of stars in the sky
(v) Profit made by an investor in a day
Continuous Random Variables A random variable X is said to be continuous
if it takes any values in a given interval. Thus, a continuous random variable takes
uncountably infinite values, e.g.,
(i) Height of a person in cm
(ii) Weight of a bag in kg
(iii) Temperature of a city in degree Celsius
(iv) Life of an electric bulb in hours
(v) Volume of a gas in cc.

Example 1
Identify the random variables as either discrete or continuous in each
of the following cases:
(i) A page in a book can have at most 300 words
X = Number of misprints on a page
(ii) Number of students present in a class of 50 students
(iii) A player goes to the gymnasium regularly
X = Reduction in his weight in a month
(iv) Number of attempts required by a candidate to clear the IAS
examination
(v) Height of a skyscraper
Solution
(i) X = Number of misprints on a page
The page may have no misprint or 1 misprint or 2 misprints … or 300 misprints.
Thus, X takes values 0, 1, 2, …, 300. Hence, X is a discrete random variable.
(ii) Let X be the random variable denoting the number of students present in a
class. X takes values 0, 1, 2, …, 50. Hence, X is a discrete random variable.
(iii) Reduction in weight cannot take isolated values 0, 1, 2, etc., but it takes any
continuous value.
Hence, X is a continuous random variable.
(iv) Let X be a random variable denoting the number of attempts required by a
candidate. Thus, X takes values 1, 2, 3, …. Hence, X is a discrete random
variable.
(v) Since height can have any fractional value, it is a continuous random variable.

2.3 Probability Mass Function

The probability distribution of a random variable is the set of its possible values together
with their respective probabilities. Let X be a discrete random variable which takes
the values $x_1, x_2, \ldots, x_n$. The probability of each possible outcome $x_i$ is
$p_i = p(x_i) = P(X = x_i)$ for $i = 1, 2, \ldots, n$. The numbers $p(x_i)$, $i = 1, 2, \ldots, n$, must satisfy the following
conditions:
(i) $p(x_i) \ge 0$ for all values of i
(ii) $\sum_{i=1}^{n} p(x_i) = 1$
The function $p(x_i)$ is called the probability function or probability mass function of the
random variable X. The set of pairs $\{x_i, p(x_i)\}$, $i = 1, 2, \ldots, n$, is called the probability
distribution of the random variable, which can be displayed in the form of a table as
shown below:
\[
\begin{array}{c|ccccccc}
X = x_i & x_1 & x_2 & x_3 & \cdots & x_i & \cdots & x_n \\ \hline
p(x_i) = P(X = x_i) & p(x_1) & p(x_2) & p(x_3) & \cdots & p(x_i) & \cdots & p(x_n)
\end{array}
\]

2.4 Discrete Distribution Function

Let X be a discrete random variable which takes the values $x_1, x_2, \ldots$ such that
$x_1 < x_2 < \cdots$ with probabilities $p(x_1), p(x_2), \ldots$ such that $p(x_i) \ge 0$ for all values of i and
\[ \sum_{i=1}^{\infty} p(x_i) = 1. \]
The distribution function F(x) of the discrete random variable X is defined by
\[ F(x) = P(X \le x) = \sum_{x_i \le x} p(x_i) \]
where the sum runs over all values $x_i$ that do not exceed x. The function F(x) is also called the cumulative distribution
function. The set of pairs $\{x_i, F(x_i)\}$, $i = 1, 2, \ldots$, is called the cumulative probability
distribution.
\[
\begin{array}{c|ccc}
X & x_1 & x_2 & \cdots \\ \hline
F(x) & p(x_1) & p(x_1) + p(x_2) & \cdots
\end{array}
\]

Example 1
A fair die is tossed once. If the random variable X takes the value 1 when an
even number is obtained and 0 otherwise, find the probability distribution of X.
Solution
When a fair die is tossed,
S = {1, 2, 3, 4, 5, 6}
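Let X be the random variable that takes the value 1 when an even number is obtained
and 0 otherwise. Hence, X can take the values 0 and 1.
\[ P(X = 0) = P(1, 3, 5) = \frac{3}{6} = \frac{1}{2} \]
\[ P(X = 1) = P(2, 4, 6) = \frac{3}{6} = \frac{1}{2} \]
Hence, the probability distribution of X is
\[
\begin{array}{c|cc}
X = x & 0 & 1 \\ \hline
P(X = x) & \frac{1}{2} & \frac{1}{2}
\end{array}
\]
Also,
\[ \sum P(X = x) = \frac{1}{2} + \frac{1}{2} = 1 \]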

Example 2
Find the probability distribution of the number of heads when three
coins are tossed.
Solution
When three coins are tossed,
S = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}
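Let X be the random variable denoting the number of heads obtained in tossing three
coins. Hence, X can take the values 0, 1, 2, 3.
\[ P(X = 0) = P(\text{no head}) = P(TTT) = \frac{1}{8} \]
\[ P(X = 1) = P(\text{one head}) = P(HTT, THT, TTH) = \frac{3}{8} \]
\[ P(X = 2) = P(\text{two heads}) = P(HHT, THH, HTH) = \frac{3}{8} \]
\[ P(X = 3) = P(\text{three heads}) = P(HHH) = \frac{1}{8} \]
Hence, the probability distribution of X is
\[
\begin{array}{c|cccc}
X = x & 0 & 1 & 2 & 3 \\ \hline
P(X = x) & \frac{1}{8} & \frac{3}{8} & \frac{3}{8} & \frac{1}{8}
\end{array}
\]
Also,
\[ \sum P(X = x) = \frac{1}{8} + \frac{3}{8} + \frac{3}{8} + \frac{1}{8} = 1 \]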
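
The distribution above can also be obtained by listing the eight outcomes and counting heads. This is a minimal sketch assuming fair coins, so that all outcomes are equally likely; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

outcomes = list(product("HT", repeat=3))   # 8 equally likely outcomes
counts = Counter(outcome.count("H") for outcome in outcomes)

# pmf[x] = P(X = x), where X is the number of heads
pmf = {x: Fraction(n, len(outcomes)) for x, n in sorted(counts.items())}

for x, p in pmf.items():
    print(x, p)
```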

Example 3
State with reasons whether the following represent the probability mass
function of a random variable:
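(i)
\[
\begin{array}{c|cccc}
X = x & 0 & 1 & 2 & 3 \\ \hline
P(X = x) & 0.4 & 0.3 & 0.2 & 0.1
\end{array}
\]
(ii)
\[
\begin{array}{c|cccc}
X = x & 0 & 1 & 2 & 3 \\ \hline
P(X = x) & \frac{1}{2} & \frac{1}{3} & \frac{1}{6} & \frac{1}{4}
\end{array}
\]
(iii)
\[
\begin{array}{c|cccc}
X = x & 0 & 1 & 2 & 3 \\ \hline
P(X = x) & -\frac{1}{2} & \frac{1}{2} & \frac{1}{4} & \frac{3}{4}
\end{array}
\]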

Solution
(i) Here, $0 \le P(X = x) \le 1$ is satisfied for all values of X.
\[ \sum P(X = x) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) = 0.4 + 0.3 + 0.2 + 0.1 = 1 \]
Since $\sum P(X = x) = 1$, it represents a probability mass function.
(ii) Here, $0 \le P(X = x) \le 1$ is satisfied for all values of X.
\[ \sum P(X = x) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) = \frac{1}{2} + \frac{1}{3} + \frac{1}{6} + \frac{1}{4} = \frac{5}{4} > 1 \]
Since $\sum P(X = x) > 1$, it does not represent a probability mass function.
(iii) Here, $0 \le P(X = x) \le 1$ is not satisfied for all the values of X as
$P(X = 0) = -\frac{1}{2}$.
Hence, P(X = x) does not represent a probability mass function.

Example 4
Verify whether the following functions can be regarded as the probability
mass function for the given values of X:
(i) $P(X = x) = \frac{1}{5}$ for $x = 0, 1, 2, 3, 4$, and $P(X = x) = 0$ otherwise
(ii) $P(X = x) = \frac{x - 2}{5}$ for $x = 1, 2, 3, 4, 5$, and $P(X = x) = 0$ otherwise
(iii) $P(X = x) = \frac{x^2}{30}$ for $x = 0, 1, 2, 3, 4$, and $P(X = x) = 0$ otherwise
Solution
(i) $P(X = 0) = P(X = 1) = P(X = 2) = P(X = 3) = P(X = 4) = \frac{1}{5}$
$P(X = x) \ge 0$ for all values of x
\[ \sum P(X = x) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4) = \frac{1}{5} + \frac{1}{5} + \frac{1}{5} + \frac{1}{5} + \frac{1}{5} = 1 \]
Hence, P(X = x) is a probability mass function.
(ii) $P(X = 1) = \frac{1 - 2}{5} = -\frac{1}{5} < 0$
Hence, P(X = x) is not a probability mass function.
(iii) $P(X = 0) = 0$, $P(X = 1) = \frac{1}{30}$, $P(X = 2) = \frac{4}{30}$, $P(X = 3) = \frac{9}{30}$, $P(X = 4) = \frac{16}{30}$
$P(X = x) \ge 0$ for all values of x
\[ \sum P(X = x) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4) = 0 + \frac{1}{30} + \frac{4}{30} + \frac{9}{30} + \frac{16}{30} = 1 \]
Hence, P(X = x) is a probability mass function.
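
The two conditions for a probability mass function (non-negative values that sum to 1) are easy to test mechanically. The sketch below applies a hypothetical helper `is_pmf` to the three functions of Example 4, using exact fractions to avoid rounding issues.

```python
from fractions import Fraction

def is_pmf(probabilities):
    """True if every value is non-negative and the values sum to 1."""
    probabilities = list(probabilities)
    return all(p >= 0 for p in probabilities) and sum(probabilities) == 1

# The three candidate functions of Example 4, evaluated on their supports.
case_i = [Fraction(1, 5) for x in range(0, 5)]          # 1/5 for x = 0..4
case_ii = [Fraction(x - 2, 5) for x in range(1, 6)]     # (x - 2)/5 for x = 1..5
case_iii = [Fraction(x * x, 30) for x in range(0, 5)]   # x^2/30 for x = 0..4

print(is_pmf(case_i), is_pmf(case_ii), is_pmf(case_iii))
```

Note that case (ii) sums to 1 but still fails, because one of its values is negative; both conditions must hold.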

Example 5
A random variable X has the probability mass function given by
\[
\begin{array}{c|cccc}
X & 1 & 2 & 3 & 4 \\ \hline
P(X = x) & 0.1 & 0.2 & 0.5 & 0.2
\end{array}
\]
Find (i) $P(2 \le X < 4)$, (ii) $P(X > 2)$, (iii) P(X is odd), and (iv) P(X is
even).
Solution
(i) $P(2 \le X < 4) = P(X = 2) + P(X = 3) = 0.2 + 0.5 = 0.7$
(ii) $P(X > 2) = P(X = 3) + P(X = 4) = 0.5 + 0.2 = 0.7$
(iii) $P(X \text{ is odd}) = P(X = 1) + P(X = 3) = 0.1 + 0.5 = 0.6$
(iv) $P(X \text{ is even}) = P(X = 2) + P(X = 4) = 0.2 + 0.2 = 0.4$

Example 6
If the random variable X takes the values 1, 2, 3, and 4 such that
2P(X = 1) = 3P(X = 2) = P(X = 3) = 5P(X = 4), find the probability
distribution.
Solution
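Let 2P(X = 1) = 3P(X = 2) = P(X = 3) = 5P(X = 4) = k
\[ P(X = 1) = \frac{k}{2}, \quad P(X = 2) = \frac{k}{3}, \quad P(X = 3) = k, \quad P(X = 4) = \frac{k}{5} \]
Since $\sum P(X = x) = 1$,
\[ \frac{k}{2} + \frac{k}{3} + k + \frac{k}{5} = 1 \]
\[ k = \frac{30}{61} \]
Hence, the probability distribution is
\[
\begin{array}{c|cccc}
X & 1 & 2 & 3 & 4 \\ \hline
P(X = x) & \frac{15}{61} & \frac{10}{61} & \frac{30}{61} & \frac{6}{61}
\end{array}
\]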

Example 7
A random variable X has the following probability distribution:
\[
\begin{array}{c|cccccccc}
X & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ \hline
P(X = x) & a & 4a & 3a & 7a & 8a & 10a & 6a & 9a
\end{array}
\]
(i) Find the value of a.
(ii) Find P(X < 3).
(iii) Find the smallest value of m for which $P(X \le m) \ge 0.6$.
Solution
(i) Since P(X = x) is a probability distribution function,
\[ \sum P(X = x) = 1 \]
\[ P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4) + P(X = 5) + P(X = 6) + P(X = 7) = 1 \]
\[ a + 4a + 3a + 7a + 8a + 10a + 6a + 9a = 1 \]
\[ a = \frac{1}{48} \]
(ii) $P(X < 3) = P(X = 0) + P(X = 1) + P(X = 2) = a + 4a + 3a = 8a = 8\left(\frac{1}{48}\right) = \frac{1}{6}$
(iii) $P(X \le 4) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4) = a + 4a + 3a + 7a + 8a = 23a = 23\left(\frac{1}{48}\right) = 0.479$
$P(X \le 5) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4) + P(X = 5) = a + 4a + 3a + 7a + 8a + 10a = 33a = 33\left(\frac{1}{48}\right) = 0.69$
Hence, the smallest value of m for which $P(X \le m) \ge 0.6$ is 5.
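
The three parts of Example 7 follow one pattern: normalise the coefficients, accumulate them, and search the cumulative values. The sketch below uses exact fractions; the names `weights`, `cdf`, and `smallest_m` are illustrative.

```python
from fractions import Fraction
from itertools import accumulate

weights = [1, 4, 3, 7, 8, 10, 6, 9]    # coefficients of a for X = 0, 1, ..., 7
a = Fraction(1, sum(weights))          # the probabilities must sum to 1
pmf = [w * a for w in weights]
cdf = list(accumulate(pmf))            # cdf[m] = P(X <= m)

p_less_than_3 = sum(pmf[:3])           # P(X = 0) + P(X = 1) + P(X = 2)
smallest_m = next(m for m, F in enumerate(cdf) if F >= Fraction(3, 5))

print(a, p_less_than_3, smallest_m)
```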

Example 8
The probability mass function of a random variable X is zero
except at the points X = 0, 1, 2. At these points, it has the values
$P(X = 0) = 3c^3$, $P(X = 1) = 4c - 10c^2$, $P(X = 2) = 5c - 1$.
Find (i) c, (ii) $P(X < 1)$, (iii) $P(1 < X \le 2)$, and (iv) $P(0 < X \le 2)$.
Solution
(i) Since P(X = x) is a probability mass function,
\[ \sum P(X = x) = 1 \]
\[ P(X = 0) + P(X = 1) + P(X = 2) = 1 \]
\[ 3c^3 + 4c - 10c^2 + 5c - 1 = 1 \]
\[ 3c^3 - 10c^2 + 9c - 2 = 0 \]
\[ (3c - 1)(c - 2)(c - 1) = 0 \]
\[ c = \frac{1}{3},\ 2,\ 1 \]
But c < 1, otherwise the given probabilities will be greater than one or less than
zero.
\[ \therefore\ c = \frac{1}{3} \]
Hence, the probability distribution is
\[
\begin{array}{c|ccc}
X & 0 & 1 & 2 \\ \hline
P(X = x) & \frac{1}{9} & \frac{2}{9} & \frac{2}{3}
\end{array}
\]
(ii) $P(X < 1) = P(X = 0) = \frac{1}{9}$
(iii) $P(1 < X \le 2) = P(X = 2) = \frac{2}{3}$
(iv) $P(0 < X \le 2) = P(X = 1) + P(X = 2) = \frac{2}{9} + \frac{2}{3} = \frac{8}{9}$

Example 9
From a lot of 10 items containing 3 defectives, a sample of 4 items is
drawn at random. Let the random variable X denote the number of
defective items in the sample. Find the probability distribution of X.
Solution
The random variable X can take the value 0, 1, 2, or 3.
Total number of items = 10
Number of good items = 7
Number of defective items = 3
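\[ P(X = 0) = P(\text{no defective}) = \frac{{}^{7}C_{4}}{{}^{10}C_{4}} = \frac{1}{6} \]
\[ P(X = 1) = P(\text{one defective and three good items}) = \frac{{}^{3}C_{1}\,{}^{7}C_{3}}{{}^{10}C_{4}} = \frac{1}{2} \]
\[ P(X = 2) = P(\text{two defectives and two good items}) = \frac{{}^{3}C_{2}\,{}^{7}C_{2}}{{}^{10}C_{4}} = \frac{3}{10} \]
\[ P(X = 3) = P(\text{three defectives and one good item}) = \frac{{}^{3}C_{3}\,{}^{7}C_{1}}{{}^{10}C_{4}} = \frac{1}{30} \]
Hence, the probability distribution of the random variable is
\[
\begin{array}{c|cccc}
X & 0 & 1 & 2 & 3 \\ \hline
P(X = x) & \frac{1}{6} & \frac{1}{2} & \frac{3}{10} & \frac{1}{30}
\end{array}
\]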
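
The four probabilities above share one counting formula, which can be written once and reused for other lot and sample sizes. This is a sketch assuming sampling without replacement; the function name and parameter names are hypothetical.

```python
from fractions import Fraction
from math import comb

def defectives_pmf(lot_size, defectives, sample_size):
    """P(X = x) for the number of defectives in a sample drawn without replacement."""
    good = lot_size - defectives
    total = comb(lot_size, sample_size)   # number of equally likely samples
    return {
        x: Fraction(comb(defectives, x) * comb(good, sample_size - x), total)
        for x in range(min(defectives, sample_size) + 1)
    }

pmf = defectives_pmf(lot_size=10, defectives=3, sample_size=4)
for x, p in pmf.items():
    print(x, p)
```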

Example 10
Construct the distribution function of the discrete random variable X
whose probability distribution is as given below:
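\[
\begin{array}{c|ccccccc}
X & 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ \hline
P(X = x) & 0.1 & 0.15 & 0.25 & 0.2 & 0.15 & 0.1 & 0.05
\end{array}
\]

Solution
Distribution function of X
\[
\begin{array}{c|c|c}
X & P(X = x) & F(x) \\ \hline
1 & 0.1 & 0.1 \\
2 & 0.15 & 0.25 \\
3 & 0.25 & 0.5 \\
4 & 0.2 & 0.7 \\
5 & 0.15 & 0.85 \\
6 & 0.1 & 0.95 \\
7 & 0.05 & 1
\end{array}
\]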
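
The F(x) column is a running total of the P(X = x) column, and the resulting function can be evaluated at any real x, not only at the tabulated values. The sketch below is one way to build it; the use of `bisect_right` and the name `F` are choices made here for illustration.

```python
from bisect import bisect_right
from fractions import Fraction
from itertools import accumulate

values = [1, 2, 3, 4, 5, 6, 7]
probs = [Fraction(s) for s in ("0.1", "0.15", "0.25", "0.2", "0.15", "0.1", "0.05")]
cumulative = list(accumulate(probs))   # running totals of the probabilities

def F(x):
    """Distribution function P(X <= x), evaluated at any real x."""
    k = bisect_right(values, x)        # number of values x_i with x_i <= x
    return cumulative[k - 1] if k else Fraction(0)

print([float(F(v)) for v in values])
```

Between two consecutive values the function stays constant, so for instance F(3.5) equals F(3).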

Example 11
A random variable X has the probability function given below:
\[
\begin{array}{c|ccc}
X & 0 & 1 & 2 \\ \hline
P(X = x) & k & 2k & 3k
\end{array}
\]
Find (i) k, (ii) $P(X < 2)$, $P(X \le 2)$, $P(0 < X < 2)$, and (iii) the distribution
function.
Solution:
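(i) Since P(X = x) is a probability mass function,
\[ \sum P(X = x) = 1 \]
\[ k + 2k + 3k = 1 \]
\[ 6k = 1 \]
\[ k = \frac{1}{6} \]
Hence, the probability distribution is
\[
\begin{array}{c|ccc}
X & 0 & 1 & 2 \\ \hline
P(X = x) & \frac{1}{6} & \frac{2}{6} & \frac{3}{6}
\end{array}
\]
(ii) $P(X < 2) = P(X = 0) + P(X = 1) = \frac{1}{6} + \frac{2}{6} = \frac{1}{2}$
$P(X \le 2) = P(X = 0) + P(X = 1) + P(X = 2) = \frac{1}{6} + \frac{2}{6} + \frac{3}{6} = 1$
$P(0 < X < 2) = P(X = 1) = \frac{1}{3}$
(iii) Distribution function
\[
\begin{array}{c|c|c}
X & P(X = x) & F(x) \\ \hline
0 & \frac{1}{6} & \frac{1}{6} \\
1 & \frac{2}{6} & \frac{1}{2} \\
2 & \frac{3}{6} & 1
\end{array}
\]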

Example 12
A random variable X takes the values –3, –2, –1, 0, 1, 2, 3, such that
P(X = 0) = P(X > 0) = P(X < 0),
P(X = –3) = P(X = –2) = P(X = –1) = P(X = 1) = P(X = 2) = P(X = 3).
Obtain the probability distribution and the distribution function of X.
Solution
Let $P(X = 0) = P(X > 0) = P(X < 0) = k_1$.
Since $\sum P(X = x) = 1$,
\[ k_1 + k_1 + k_1 = 1 \quad\therefore\quad k_1 = \frac{1}{3} \]
$P(X = 0) = P(X > 0) = P(X < 0) = \frac{1}{3}$
Let $P(X = 1) = P(X = 2) = P(X = 3) = k_2$.
\[ P(X > 0) = P(X = 1) + P(X = 2) + P(X = 3) \;\Rightarrow\; \frac{1}{3} = k_2 + k_2 + k_2 \quad\therefore\quad k_2 = \frac{1}{9} \]
$P(X = 1) = P(X = 2) = P(X = 3) = \frac{1}{9}$
Similarly, $P(X = -3) = P(X = -2) = P(X = -1) = \frac{1}{9}$
Probability distribution and distribution function
X     P(X = x)    F(x)
-3    1/9         1/9
-2    1/9         2/9
-1    1/9         3/9
0     1/3         6/9
1     1/9         7/9
2     1/9         8/9
3     1/9         1

Example 13
A discrete random variable X has the following distribution function:
\[ F(x) = \begin{cases} 0, & x < 1 \\ \frac{1}{3}, & 1 \le x < 4 \\ \frac{1}{2}, & 4 \le x < 6 \\ \frac{5}{6}, & 6 \le x < 10 \\ 1, & x \ge 10 \end{cases} \]
Find (i) $P(2 < X \le 6)$, (ii) $P(X = 5)$, (iii) $P(X = 4)$, (iv) $P(X \le 6)$, and
(v) $P(X = 6)$.

Solution
(i) $P(2 < X \le 6) = F(6) - F(2) = \frac{5}{6} - \frac{1}{3} = \frac{3}{6} = \frac{1}{2}$
(ii) $P(X = 5) = P(X \le 5) - P(X < 5) = F(5) - P(X < 5) = \frac{1}{2} - \frac{1}{2} = 0$
(iii) $P(X = 4) = P(X \le 4) - P(X < 4) = F(4) - P(X < 4) = \frac{1}{2} - \frac{1}{3} = \frac{1}{6}$
(iv) $P(X \le 6) = F(6) = \frac{5}{6}$
(v) $P(X = 6) = P(X \le 6) - P(X < 6) = F(6) - P(X < 6) = \frac{5}{6} - \frac{1}{2} = \frac{1}{3}$
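
The rule used in this example, P(X = x) = F(x) - P(X < x), can be sketched in code by storing the jump points of the step function. This is only an illustration under the assumption that F is given by the five levels above; the helper names `F`, `F_left` and `point_mass` are not from the text.

```python
from bisect import bisect_left, bisect_right
from fractions import Fraction as Fr

jumps = [1, 4, 6, 10]                                   # points where F jumps
levels = [Fr(0), Fr(1, 3), Fr(1, 2), Fr(5, 6), Fr(1)]   # value of F between jumps

def F(x):
    """F(x) = P(X <= x); the step function is right-continuous."""
    return levels[bisect_right(jumps, x)]

def F_left(x):
    """P(X < x), i.e. the value of F just to the left of x."""
    return levels[bisect_left(jumps, x)]

def point_mass(x):
    """P(X = x) is the size of the jump of F at x."""
    return F(x) - F_left(x)

print(F(6) - F(2))      # (i)   P(2 < X <= 6)
print(point_mass(5))    # (ii)  P(X = 5)
print(point_mass(4))    # (iii) P(X = 4)
print(F(6))             # (iv)  P(X <= 6)
print(point_mass(6))    # (v)   P(X = 6)
```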

Exercise 2.1
1. Verify whether the following functions can be considered as probability
mass functions:
  (i) $P(X = x) = \dfrac{x^2 + 1}{18}$, x = 0, 1, 2, 3  [Ans.: Yes]
  (ii) $P(X = x) = \dfrac{x^2 - 2}{8}$, x = 1, 2, 3  [Ans.: No]
  (iii) $P(X = x) = \dfrac{2x + 1}{18}$, x = 0, 1, 2, 3  [Ans.: No]
2. The probability mass function of a random variable X is
X           0    1     2     3     4     5      6
P(X = x)    k    3k    5k    7k    9k    11k    13k

  Find $P(X < 4)$ and $P(3 < X \le 6)$.
  [Ans.: 16/49, 33/49]
3. A random variable X has the following probability distribution:
X           1    2     3     4       5          6        7
P(X = x)    k    2k    3k    $k^2$   $k^2 + k$  $2k^2$   $4k^2$

  Find (i) k, (ii) $P(X < 5)$, (iii) $P(X > 5)$, and (iv) $P(0 \le X \le 5)$.
  [Ans.: (i) 1/8 (ii) 49/64 (iii) 3/32 (iv) 29/32]

4. A discrete random variable X has the following probability distribution:


X –2 –1 0 1 2 3

P(X = x) 0.1 k 0.2 2k 0.3 3k

  Find (i) k, (ii) $P(X \ge 2)$, and (iii) $P(-2 < X < 2)$.
  [Ans.: (i) 1/15 (ii) 1/2 (iii) 2/5]
5. Given the following probability function of a discrete random variable X:

X 0 1 2 3 4 5 6 7

P(X = x)    0    c    2c    2c    3c    $c^2$    $2c^2$    $7c^2 + c$

  Find (i) c, (ii) $P(X \ge 6)$, (iii) $P(X < 6)$, and (iv) k if $P(X \le k) > \frac{1}{2}$,
  where k is a positive integer.

  [Ans.: (i) 0.1 (ii) 0.19 (iii) 0.81 (iv) 4]


6. A random variable X assumes four values with probabilities
$\dfrac{1 + 3x}{4}$, $\dfrac{1 - x}{4}$, $\dfrac{1 + 2x}{4}$, and $\dfrac{1 - 4x}{4}$. For what values of x do these
represent the probability distribution of X?
  [Ans.: $-\frac{1}{3} \le x \le \frac{1}{4}$]

7. Let X denote the number of heads in a single toss of 4 fair coins.
  Determine (i) $P(X < 2)$, and (ii) $P(1 < X \le 3)$.
  [Ans.: (i) 5/16 (ii) 5/8]
8. If 3 cars are selected from a lot of 6 cars containing 2 defective cars, find
the probability distribution of the number of defective cars.

Ans.:
X           0      1      2
P(X = x)    1/5    3/5    1/5

9. Five defective bolts are accidentally mixed with 20 good ones. Find the
probability distribution of the number of defective bolts, if four bolts are
drawn at random from this lot.
Ans.:
X           0           1            2           3          4
P(X = x)    969/2530    1140/2530    380/2530    40/2530    1/2530

10. Two dice are rolled at once. Find the probability distribution of the
sum of the numbers on them.

Ans.:
X           2     3     4     5     6     7     8     9     10    11    12
P(X = x)    1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

11. A random variable X takes three values 0, 1, and 2 with probabilities
$\frac{1}{3}$, $\frac{1}{6}$, and $\frac{1}{2}$ respectively. Obtain the distribution function of X.
  [Ans.: $F(0) = \frac{1}{3}$, $F(1) = \frac{1}{2}$, $F(2) = 1$]
12. A random variable X has the following probability function:
x 0 1 2 3 4
P(X = x) k 3k 5k 7k 9k
  Find (i) the value of k, (ii) $P(X < 3)$, $P(X \ge 3)$, $P(0 < X < 4)$, and
  (iii) the distribution function of X.
  [Ans.: (i) 1/25 (ii) 9/25, 16/25, 3/5
  (iii) F(0) = 1/25, F(1) = 4/25, F(2) = 9/25, F(3) = 16/25, F(4) = 1]
13. A random variable X has the probability function
X –2 –1 0 1 2 3
P(X = x) 0.1 k 0.2 2k 0.3 k
  Find (i) k, (ii) $P(X \le 1)$, (iii) $P(-2 < X < 1)$, and (iv) obtain the
  distribution function of X.
  [Ans.: (i) 0.1 (ii) 0.6 (iii) 0.3]
14. The following is the distribution function F(x) of a discrete random
variable X:
X       -3      -2     -1     0       1      2      3
F(x)    0.08    0.2    0.4    0.65    0.8    0.9    1

  Find (i) the probability distribution of X, (ii) $P(-2 \le X \le 1)$, and
  (iii) $P(X \ge 1)$.

Ans.: (i)
X           -3      -2      -1     0       1       2      3
P(X = x)    0.08    0.12    0.2    0.25    0.15    0.1    0.1
(ii) 0.72 (iii) 0.35


2.5 Probability Density Function

Let X be a continuous random variable such that the probability of the variable X
falling in the small interval $x - \frac{1}{2}dx$ to $x + \frac{1}{2}dx$ is $f(x)\,dx$, i.e.,
\[ P\left(x - \frac{1}{2}dx \le X \le x + \frac{1}{2}dx\right) = f(x)\,dx \]
The function f(x) is called the probability density function of the random variable X
and the continuous curve y = f(x) is called the probability curve.
Properties of Probability Density Function
(i) $f(x) \ge 0$, $-\infty < x < \infty$
(ii) $\int_{-\infty}^{\infty} f(x)\,dx = 1$
(iii) $P(a < X < b) = \int_{a}^{b} f(x)\,dx$

2.6 Continuous Distribution Function

If X is a continuous random variable having the probability density function f(x) then
the function
\[ F(x) = P(X \le x) = \int_{-\infty}^{x} f(x)\,dx, \quad -\infty < x < \infty \]
is called the distribution function or cumulative distribution function of the random
variable X.
Properties of Cumulative Distribution Function
(i) $F(-\infty) = 0$
(ii) $F(\infty) = 1$
(iii) $0 \le F(x) \le 1$, $-\infty < x < \infty$
(iv) $P(a < X < b) = F(b) - F(a)$
(v) $F'(x) = \dfrac{d}{dx}F(x) = f(x)$, $f(x) \ge 0$

Example 1
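Show that the function f(x) defined by
\[ f(x) = \begin{cases} \frac{1}{7}, & 1 < x < 8 \\ 0, & \text{otherwise} \end{cases} \]
is a probability density function for a random variable. Hence, find
P(3 < X < 10).
Solution
$f(x) \ge 0$ in $1 < x < 8$
\[ \begin{aligned} \int_{-\infty}^{\infty} f(x)\,dx &= \int_{-\infty}^{1} f(x)\,dx + \int_{1}^{8} f(x)\,dx + \int_{8}^{\infty} f(x)\,dx \\ &= 0 + \int_{1}^{8} \frac{1}{7}\,dx + 0 = \frac{1}{7}\big[x\big]_{1}^{8} = \frac{1}{7}(8 - 1) = 1 \end{aligned} \]
Hence, f(x) is a probability density function.
\[ \begin{aligned} P(3 < X < 10) &= \int_{3}^{10} f(x)\,dx = \int_{3}^{8} f(x)\,dx + \int_{8}^{10} f(x)\,dx \\ &= \int_{3}^{8} \frac{1}{7}\,dx + 0 = \frac{1}{7}\big[x\big]_{3}^{8} = \frac{1}{7}(8 - 3) = \frac{5}{7} \end{aligned} \]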

Example 2
Is the function f(x) defined by
\[ f(x) = \begin{cases} e^{-x}, & x \ge 0 \\ 0, & x < 0 \end{cases} \]
a probability density function? If so, find the probability that a variate
having this density falls in the interval (1, 2).
Solution
$f(x) \ge 0$ in $(0, \infty)$
\[ \begin{aligned} \int_{-\infty}^{\infty} f(x)\,dx &= \int_{-\infty}^{0} f(x)\,dx + \int_{0}^{\infty} f(x)\,dx \\ &= 0 + \int_{0}^{\infty} e^{-x}\,dx = \big[-e^{-x}\big]_{0}^{\infty} = -e^{-\infty} + 1 = 1 \end{aligned} \]
Hence, f(x) is a probability density function.
\[ P(1 \le X \le 2) = \int_{1}^{2} f(x)\,dx = \int_{1}^{2} e^{-x}\,dx = \big[-e^{-x}\big]_{1}^{2} = -e^{-2} + e^{-1} = 0.233 \]

Example 3
If a random variable has the probability density function f(x) as
\[ f(x) = \begin{cases} 2e^{-2x}, & x > 0 \\ 0, & x \le 0 \end{cases} \]
Find the probabilities that it will take on a value (i) between 1 and 3,
and (ii) greater than 0.5.

Solution
(i) Probability that the variable will take a value between 1 and 3
\[ P(1 < X < 3) = \int_{1}^{3} f(x)\,dx = \int_{1}^{3} 2e^{-2x}\,dx = 2\left[\frac{e^{-2x}}{-2}\right]_{1}^{3} = -(e^{-6} - e^{-2}) = e^{-2} - e^{-6} \]
(ii) Probability that the variable will take a value greater than 0.5
\[ P(X > 0.5) = \int_{0.5}^{\infty} f(x)\,dx = \int_{0.5}^{\infty} 2e^{-2x}\,dx = 2\left[\frac{e^{-2x}}{-2}\right]_{0.5}^{\infty} = -(e^{-\infty} - e^{-1}) = e^{-1} \]

Example 4
Find the constant k such that the function
\[ f(x) = \begin{cases} kx^2, & 0 < x < 3 \\ 0, & \text{otherwise} \end{cases} \]
is a probability density function and compute (i) P(1 < X < 2),
(ii) P(X < 2), and (iii) $P(X \ge 2)$.
Solution
Since f(x) is a probability density function,
\[ \begin{aligned} \int_{-\infty}^{\infty} f(x)\,dx &= 1 \\ \int_{-\infty}^{0} f(x)\,dx + \int_{0}^{3} f(x)\,dx + \int_{3}^{\infty} f(x)\,dx &= 1 \\ 0 + \int_{0}^{3} kx^2\,dx + 0 &= 1 \\ k\left[\frac{x^3}{3}\right]_{0}^{3} &= 1 \\ \frac{k}{3}(27 - 0) &= 1 \\ 9k &= 1 \\ k &= \frac{1}{9} \end{aligned} \]
Hence,
\[ f(x) = \begin{cases} \frac{1}{9}x^2, & 0 < x < 3 \\ 0, & \text{otherwise} \end{cases} \]
(i) $P(1 < X < 2) = \int_{1}^{2} f(x)\,dx = \int_{1}^{2} \frac{1}{9}x^2\,dx = \frac{1}{9}\left[\frac{x^3}{3}\right]_{1}^{2} = \frac{1}{27}(8 - 1) = \frac{7}{27}$
(ii) $P(X < 2) = \int_{-\infty}^{2} f(x)\,dx = \int_{-\infty}^{0} f(x)\,dx + \int_{0}^{2} f(x)\,dx = 0 + \frac{1}{9}\int_{0}^{2} x^2\,dx = \frac{1}{9}\left[\frac{x^3}{3}\right]_{0}^{2} = \frac{1}{27}(8 - 0) = \frac{8}{27}$
(iii) $P(X \ge 2) = 1 - P(X < 2) = 1 - \frac{8}{27} = \frac{19}{27}$
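
The normalisation step and the three probabilities of Example 4 can be reproduced with exact arithmetic. The sketch below assumes only the antiderivative x^3/3 of x^2; the helper name `integral_x2` is illustrative.

```python
from fractions import Fraction

def integral_x2(a, b):
    """Exact value of the integral of x**2 over [a, b]."""
    return (Fraction(b) ** 3 - Fraction(a) ** 3) / 3

# k is fixed by requiring the total probability over (0, 3) to be 1
k = 1 / integral_x2(0, 3)

p_i = k * integral_x2(1, 2)    # P(1 < X < 2)
p_ii = k * integral_x2(0, 2)   # P(X < 2)
p_iii = 1 - p_ii               # P(X >= 2), by the complement rule

print(k, p_i, p_ii, p_iii)
```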

Example 5
If the probability density function of a random variable is given by
\[ f(x) = \begin{cases} k(1 - x^2), & 0 < x < 1 \\ 0, & \text{otherwise} \end{cases} \]
Find the value of k and the probabilities that a random variable having
this probability density will take on a value (i) between 0.1 and 0.2, and
(ii) greater than 0.5.
Solution
Since f(x) is a probability density function,
\[ \begin{aligned} \int_{-\infty}^{\infty} f(x)\,dx &= 1 \\ \int_{-\infty}^{0} f(x)\,dx + \int_{0}^{1} f(x)\,dx + \int_{1}^{\infty} f(x)\,dx &= 1 \\ 0 + \int_{0}^{1} k(1 - x^2)\,dx + 0 &= 1 \\ k\left[x - \frac{x^3}{3}\right]_{0}^{1} &= 1 \\ k\left(1 - \frac{1}{3}\right) &= 1 \\ k &= \frac{3}{2} \end{aligned} \]
Hence,
\[ f(x) = \begin{cases} \frac{3}{2}(1 - x^2), & 0 < x < 1 \\ 0, & \text{otherwise} \end{cases} \]
(i) Probability that the variable will take on a value between 0.1 and 0.2
\[ \begin{aligned} P(0.1 < X < 0.2) &= \int_{0.1}^{0.2} \frac{3}{2}(1 - x^2)\,dx = \frac{3}{2}\left[x - \frac{x^3}{3}\right]_{0.1}^{0.2} \\ &= \frac{3}{2}\left[\left(0.2 - \frac{0.008}{3}\right) - \left(0.1 - \frac{0.001}{3}\right)\right] = 0.1465 \end{aligned} \]
(ii) Probability that the variable will take on a value greater than 0.5
\[ \begin{aligned} P(X > 0.5) &= \int_{0.5}^{\infty} f(x)\,dx = \int_{0.5}^{1} f(x)\,dx + \int_{1}^{\infty} f(x)\,dx \\ &= \int_{0.5}^{1} \frac{3}{2}(1 - x^2)\,dx + 0 = \frac{3}{2}\left[x - \frac{x^3}{3}\right]_{0.5}^{1} \\ &= \frac{3}{2}\left[\left(1 - \frac{1}{3}\right) - \left(0.5 - \frac{0.125}{3}\right)\right] = 0.3125 \end{aligned} \]

Example 6
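If X is a continuous random variable with pdf
\[ f(x) = \begin{cases} x^2, & 0 \le x \le 1 \\ 0, & \text{otherwise} \end{cases} \]
If $P(a \le X \le 1) = \frac{19}{81}$, find the value of a.
Solution
\[ \begin{aligned} P(a \le X \le 1) &= \frac{19}{81} \\ \int_{a}^{1} f(x)\,dx &= \frac{19}{81} \\ \int_{a}^{1} x^2\,dx &= \frac{19}{81} \\ \left[\frac{x^3}{3}\right]_{a}^{1} &= \frac{19}{81} \\ \frac{1}{3}(1 - a^3) &= \frac{19}{81} \\ 1 - a^3 &= \frac{19}{27} \\ a^3 &= \frac{8}{27} \\ a &= \frac{2}{3} \end{aligned} \]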

Example 7
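Let X be a continuous random variable with pdf
$f(x) = kx(1 - x)$, $0 \le x \le 1$
Find k and determine a number b such that $P(X \le b) = P(X \ge b)$.
Solution
Since f(x) is a probability density function,
\[ \begin{aligned} \int_{-\infty}^{\infty} f(x)\,dx &= 1 \\ \int_{-\infty}^{0} f(x)\,dx + \int_{0}^{1} f(x)\,dx + \int_{1}^{\infty} f(x)\,dx &= 1 \\ 0 + \int_{0}^{1} kx(1 - x)\,dx + 0 &= 1 \\ k\int_{0}^{1} (x - x^2)\,dx &= 1 \\ k\left[\frac{x^2}{2} - \frac{x^3}{3}\right]_{0}^{1} &= 1 \\ k\left[\left(\frac{1}{2} - \frac{1}{3}\right) - (0 - 0)\right] &= 1 \\ k\left(\frac{1}{6}\right) &= 1 \\ k &= 6 \end{aligned} \]
Hence, $f(x) = 6(x - x^2)$, $0 \le x \le 1$
Since total probability is 1 and $P(X \le b) = P(X \ge b)$,
\[ \begin{aligned} P(X \le b) &= \frac{1}{2} \\ \int_{0}^{b} f(x)\,dx &= \frac{1}{2} \\ 6\int_{0}^{b} (x - x^2)\,dx &= \frac{1}{2} \\ 6\left[\frac{x^2}{2} - \frac{x^3}{3}\right]_{0}^{b} &= \frac{1}{2} \\ \frac{b^2}{2} - \frac{b^3}{3} &= \frac{1}{12} \\ 6b^2 - 4b^3 &= 1 \\ 4b^3 - 6b^2 + 1 &= 0 \\ (2b - 1)(2b^2 - 2b - 1) &= 0 \end{aligned} \]
\[ b = \frac{1}{2} \quad\text{or}\quad b = \frac{1 \pm \sqrt{3}}{2} \]
Since b lies in (0, 1),
\[ \therefore\; b = \frac{1}{2} \]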
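
When the equation for b cannot be factorised by hand, the same value can be located numerically. The following is a rough sketch using plain bisection on F(b) - 1/2, where F(b) = 3b^2 - 2b^3 is the integral of 6(x - x^2) from 0 to b; the function names and the tolerance are assumptions made for illustration.

```python
def cdf(b):
    """F(b) = 6 * (b**2 / 2 - b**3 / 3) for 0 <= b <= 1."""
    return 3 * b**2 - 2 * b**3

def bisect_root(g, lo, hi, tol=1e-12):
    """Locate a root of g in [lo, hi], assuming g(lo) < 0 <= g(hi)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid   # root lies to the right of mid
        else:
            hi = mid   # root lies at or to the left of mid
    return (lo + hi) / 2

# b splits the total probability into two equal halves
b = bisect_root(lambda t: cdf(t) - 0.5, 0.0, 1.0)
print(round(b, 6))
```

Bisection is chosen here only because F is increasing on [0, 1], so the sign change of F(b) - 1/2 brackets exactly one root.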

Example 8
The length of time (in minutes) that a certain lady speaks on the
telephone is found to be a random phenomenon, with a probability
function specified by the function
\[ f(x) = \begin{cases} A e^{-x/5}, & x \ge 0 \\ 0, & \text{otherwise} \end{cases} \]
(i) Find the value of A that makes f(x) a probability density function.
(ii) What is the probability that she will talk over the phone for more
than 10 minutes?
Solution
(i) For f(x) to be a probability density function,
\[ \begin{aligned} \int_{-\infty}^{\infty} f(x)\,dx &= 1 \\ \int_{-\infty}^{0} f(x)\,dx + \int_{0}^{\infty} f(x)\,dx &= 1 \\ 0 + \int_{0}^{\infty} A e^{-x/5}\,dx &= 1 \\ A\left[\frac{e^{-x/5}}{-\frac{1}{5}}\right]_{0}^{\infty} &= 1 \\ -5A\,(e^{-\infty} - e^{0}) &= 1 \\ -5A\,(0 - 1) &= 1 \\ 5A &= 1 \\ A &= \frac{1}{5} \end{aligned} \]
Hence,
\[ f(x) = \begin{cases} \frac{1}{5} e^{-x/5}, & x \ge 0 \\ 0, & \text{otherwise} \end{cases} \]
(ii)
\[ \begin{aligned} P(X > 10) &= \int_{10}^{\infty} f(x)\,dx = \int_{10}^{\infty} \frac{1}{5} e^{-x/5}\,dx = \frac{1}{5}\left[\frac{e^{-x/5}}{-\frac{1}{5}}\right]_{10}^{\infty} \\ &= -(e^{-\infty} - e^{-2}) = -(0 - e^{-2}) = \frac{1}{e^2} \end{aligned} \]
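
As a sanity check on part (ii), the closed-form tail probability can be compared with a crude numerical integral of the density. This is a sketch only: the midpoint rule, the cut-off at 200 minutes and the number of strips are arbitrary choices, not part of the example.

```python
import math

def pdf(x, mean=5.0):
    """f(x) = (1/5) * exp(-x/5) for x >= 0, and 0 otherwise."""
    return math.exp(-x / mean) / mean if x >= 0 else 0.0

def midpoint_integral(f, a, b, n=100_000):
    """Approximate the integral of f over [a, b] with n midpoint strips."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

exact = math.exp(-2)                             # P(X > 10) = e^(-2)
numeric = midpoint_integral(pdf, 10.0, 200.0)    # the tail beyond 200 is negligible

print(round(exact, 4), round(numeric, 4))
```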

Example 9
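A continuous random variable X has a pdf $f(x) = 3x^2$, $0 \le x \le 1$. Find
a and b such that
(i) $P(X \le a) = P(X > a)$ and
(ii) $P(X > b) = 0.05$
Solution
(i) Since total probability is 1 and $P(X \le a) = P(X > a)$,
\[ \begin{aligned} P(X \le a) &= \frac{1}{2} \\ \int_{0}^{a} f(x)\,dx &= \frac{1}{2} \\ \int_{0}^{a} 3x^2\,dx &= \frac{1}{2} \\ 3\left[\frac{x^3}{3}\right]_{0}^{a} &= \frac{1}{2} \\ a^3 &= \frac{1}{2} \\ a &= \left(\frac{1}{2}\right)^{1/3} \end{aligned} \]
(ii)
\[ \begin{aligned} P(X > b) &= 0.05 \\ \int_{b}^{1} f(x)\,dx &= 0.05 \\ \int_{b}^{1} 3x^2\,dx &= 0.05 \\ 3\left[\frac{x^3}{3}\right]_{b}^{1} &= 0.05 \\ 1 - b^3 &= 0.05 \\ b^3 &= \frac{19}{20} \\ b &= \left(\frac{19}{20}\right)^{1/3} \end{aligned} \]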

Example 10
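Let the continuous random variable X have the probability density
function
\[ f(x) = \begin{cases} \dfrac{2}{x^3}, & 1 < x < \infty \\ 0, & \text{otherwise} \end{cases} \]
Find F(x).

Solution
\[ \begin{aligned} F(x) &= \int_{-\infty}^{x} f(x)\,dx = \int_{-\infty}^{1} f(x)\,dx + \int_{1}^{x} f(x)\,dx \\ &= 0 + \int_{1}^{x} \frac{2}{x^3}\,dx = 2\left[\frac{x^{-2}}{-2}\right]_{1}^{x} = -\left[\frac{1}{x^2}\right]_{1}^{x} \\ &= -\left(\frac{1}{x^2} - 1\right) = 1 - \frac{1}{x^2} \end{aligned} \]
Hence,
\[ F(x) = \begin{cases} 1 - \dfrac{1}{x^2}, & 1 < x < \infty \\ 0, & \text{otherwise} \end{cases} \]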

Example 11
Verify that the function F(x) is a distribution function.
\[ F(x) = \begin{cases} 0, & x < 0 \\ 1 - e^{-x/4}, & x \ge 0 \end{cases} \]
Also, find the probabilities $P(X \le 4)$, $P(X \ge 8)$, $P(4 \le X \le 8)$.
Solution
For the function F(x),
(i) $F(-\infty) = 0$
(ii) $F(\infty) = 1 - e^{-\infty} = 1 - 0 = 1$
(iii) $0 \le F(x) \le 1$, $-\infty < x < \infty$
If f(x) is the corresponding probability density function,
\[ f(x) = F'(x) = \begin{cases} 0, & x < 0 \\ \frac{1}{4} e^{-x/4}, & x \ge 0 \end{cases} \]
\[ \begin{aligned} \int_{-\infty}^{\infty} f(x)\,dx &= \int_{-\infty}^{0} f(x)\,dx + \int_{0}^{\infty} f(x)\,dx = 0 + \int_{0}^{\infty} \frac{1}{4} e^{-x/4}\,dx \\ &= \frac{1}{4}\left[\frac{e^{-x/4}}{-\frac{1}{4}}\right]_{0}^{\infty} = -\big[e^{-x/4}\big]_{0}^{\infty} = -(0 - 1) = 1 \end{aligned} \]
Hence, F(x) is a distribution function.
\[ P(X \le 4) = F(4) = 1 - e^{-1} = 1 - \frac{1}{e} = \frac{e - 1}{e} \]
\[ P(X \ge 8) = 1 - P(X \le 8) = 1 - F(8) = 1 - (1 - e^{-2}) = e^{-2} = \frac{1}{e^2} \]
\[ P(4 \le X \le 8) = F(8) - F(4) = (1 - e^{-2}) - (1 - e^{-1}) = e^{-1} - e^{-2} = \frac{1}{e} - \frac{1}{e^2} = \frac{e - 1}{e^2} \]

Example 12
The troubleshooting capacity of an IC chip in a circuit is a random
variable X whose distribution function is given by
\[ F(x) = \begin{cases} 0, & x \le 3 \\ 1 - \dfrac{9}{x^2}, & x > 3 \end{cases} \]
where x denotes the number of years. Find the probability that the IC chip
will work properly (i) less than 8 years, (ii) beyond 8 years, (iii) between
5 and 7 years, and (iv) anywhere from 2 to 5 years.
Solution
(i) $P(X \le 8) = F(8) = 1 - \dfrac{9}{8^2} = 0.8594$
(ii) $P(X > 8) = 1 - P(X \le 8) = 1 - F(8) = 1 - 0.8594 = 0.1406$
(iii) $P(5 \le X \le 7) = F(7) - F(5) = \left(1 - \dfrac{9}{7^2}\right) - \left(1 - \dfrac{9}{5^2}\right) = 0.1763$
(iv) $P(2 \le X \le 5) = F(5) - F(2) = \left(1 - \dfrac{9}{5^2}\right) - 0 = 0.64$

Example 13
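The probability density function of a continuous random variable X is
given by
\[ f(x) = \begin{cases} ax, & 0 \le x \le 1 \\ a, & 1 \le x \le 2 \\ 3a - ax, & 2 \le x \le 3 \\ 0, & \text{otherwise} \end{cases} \]
(i) Find the value of a, and (ii) find the cdf of X.

Solution
(i) Since f(x) is a probability density function,
\[ \begin{aligned} \int_{-\infty}^{\infty} f(x)\,dx &= 1 \\ \int_{-\infty}^{0} f(x)\,dx + \int_{0}^{1} f(x)\,dx + \int_{1}^{2} f(x)\,dx + \int_{2}^{3} f(x)\,dx + \int_{3}^{\infty} f(x)\,dx &= 1 \\ 0 + \int_{0}^{1} ax\,dx + \int_{1}^{2} a\,dx + \int_{2}^{3} (3a - ax)\,dx + 0 &= 1 \\ a\left[\frac{x^2}{2}\right]_{0}^{1} + a\big[x\big]_{1}^{2} + \left[3ax - \frac{ax^2}{2}\right]_{2}^{3} &= 1 \\ a\left(\frac{1}{2} - 0\right) + a(2 - 1) + \left[\left(9a - \frac{9a}{2}\right) - (6a - 2a)\right] &= 1 \\ \frac{1}{2}a + a + \frac{9a}{2} - 4a &= 1 \\ 2a &= 1 \\ a &= \frac{1}{2} \end{aligned} \]
(ii) $F(x) = \int_{-\infty}^{x} f(x)\,dx$
For $0 \le x \le 1$,
\[ F(x) = \int_{-\infty}^{0} f(x)\,dx + \int_{0}^{x} f(x)\,dx = 0 + \int_{0}^{x} ax\,dx = a\left[\frac{x^2}{2}\right]_{0}^{x} = \frac{ax^2}{2} \]
For $1 \le x \le 2$,
\[ \begin{aligned} F(x) &= \int_{-\infty}^{0} f(x)\,dx + \int_{0}^{1} f(x)\,dx + \int_{1}^{x} f(x)\,dx = 0 + \int_{0}^{1} ax\,dx + \int_{1}^{x} a\,dx \\ &= a\left[\frac{x^2}{2}\right]_{0}^{1} + a\big[x\big]_{1}^{x} = a\left(\frac{1}{2} - 0\right) + a(x - 1) = \frac{a}{2} + ax - a = ax - \frac{a}{2} \end{aligned} \]
For $2 \le x \le 3$,
\[ \begin{aligned} F(x) &= \int_{-\infty}^{0} f(x)\,dx + \int_{0}^{1} f(x)\,dx + \int_{1}^{2} f(x)\,dx + \int_{2}^{x} f(x)\,dx \\ &= 0 + \int_{0}^{1} ax\,dx + \int_{1}^{2} a\,dx + \int_{2}^{x} (3a - ax)\,dx \\ &= a\left[\frac{x^2}{2}\right]_{0}^{1} + a\big[x\big]_{1}^{2} + \left[3ax - \frac{ax^2}{2}\right]_{2}^{x} \\ &= a\left(\frac{1}{2} - 0\right) + a(2 - 1) + \left[\left(3ax - \frac{ax^2}{2}\right) - (6a - 2a)\right] \\ &= \frac{a}{2} + a + 3ax - \frac{ax^2}{2} - 4a = 3ax - \frac{ax^2}{2} - \frac{5a}{2} \end{aligned} \]
Hence, with $a = \frac{1}{2}$,
\[ F(x) = \begin{cases} 0, & x < 0 \\ \dfrac{ax^2}{2}, & 0 \le x \le 1 \\ ax - \dfrac{a}{2}, & 1 \le x \le 2 \\ 3ax - \dfrac{ax^2}{2} - \dfrac{5a}{2}, & 2 \le x \le 3 \\ 1, & x > 3 \end{cases} \]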
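
A quick way to check a piecewise cdf such as the one in Example 13 is to confirm that the pieces agree where they meet and that F reaches 1 at the right end. The sketch below substitutes a = 1/2; the values F(x) = 0 for x < 0 and F(x) = 1 for x > 3 are the standard completion of the cdf outside the range of X and are assumed here.

```python
from fractions import Fraction

A = Fraction(1, 2)   # value of a found in part (i)

def F(x):
    """Piecewise cdf of Example 13, evaluated exactly."""
    x = Fraction(x)
    if x < 0:
        return Fraction(0)
    if x <= 1:
        return A * x**2 / 2
    if x <= 2:
        return A * x - A / 2
    if x <= 3:
        return 3 * A * x - A * x**2 / 2 - 5 * A / 2
    return Fraction(1)

# Neighbouring pieces give the same value at the break points x = 1 and x = 2
left_at_1, right_at_1 = A * 1**2 / 2, A * 1 - A / 2
left_at_2, right_at_2 = A * 2 - A / 2, 3 * A * 2 - A * 2**2 / 2 - 5 * A / 2

print(F(1), F(2), F(3))
```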

Example 14
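The pdf of a continuous random variable X is
\[ f(x) = \frac{1}{2} e^{-|x|} \]
Find cdf F(x).

Solution
\[ f(x) = \begin{cases} \frac{1}{2} e^{x}, & -\infty < x < 0 \\ \frac{1}{2} e^{-x}, & 0 < x < \infty \end{cases} \]
\[ F(x) = \int_{-\infty}^{x} f(x)\,dx \]
For $x \le 0$,
\[ F(x) = \int_{-\infty}^{x} \frac{1}{2} e^{x}\,dx = \frac{1}{2}\big[e^{x}\big]_{-\infty}^{x} = \frac{1}{2}(e^{x} - e^{-\infty}) = \frac{1}{2} e^{x} \]
For $x > 0$,
\[ \begin{aligned} F(x) &= \int_{-\infty}^{0} f(x)\,dx + \int_{0}^{x} f(x)\,dx = \int_{-\infty}^{0} \frac{1}{2} e^{x}\,dx + \int_{0}^{x} \frac{1}{2} e^{-x}\,dx \\ &= \frac{1}{2}\big[e^{x}\big]_{-\infty}^{0} + \frac{1}{2}\big[-e^{-x}\big]_{0}^{x} = \frac{1}{2}(1 - e^{-\infty}) + \frac{1}{2}(-e^{-x} + e^{0}) \\ &= \frac{1}{2} - \frac{1}{2} e^{-x} + \frac{1}{2} = 1 - \frac{1}{2} e^{-x} \end{aligned} \]
Hence,
\[ F(x) = \begin{cases} \frac{1}{2} e^{x}, & x \le 0 \\ 1 - \frac{1}{2} e^{-x}, & x > 0 \end{cases} \]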

Example 15
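Find the value of k and the distribution function F(x) given the probability
density function of a random variable X as
\[ f(x) = \frac{k}{x^2 + 1}, \quad -\infty < x < \infty \]
Solution
Since f(x) is the probability density function,
\[ \begin{aligned} \int_{-\infty}^{\infty} f(x)\,dx &= 1 \\ \int_{-\infty}^{\infty} \frac{k}{x^2 + 1}\,dx &= 1 \\ k\int_{-\infty}^{\infty} \frac{1}{x^2 + 1}\,dx &= 1 \\ k\big[\tan^{-1} x\big]_{-\infty}^{\infty} &= 1 \\ k\big[\tan^{-1}\infty - \tan^{-1}(-\infty)\big] &= 1 \\ k\left[\frac{\pi}{2} - \left(-\frac{\pi}{2}\right)\right] &= 1 \\ k\pi &= 1 \\ k &= \frac{1}{\pi} \end{aligned} \]
Hence, $f(x) = \dfrac{1}{\pi}\,\dfrac{1}{x^2 + 1}$, $-\infty < x < \infty$
\[ \begin{aligned} F(x) &= \int_{-\infty}^{x} f(x)\,dx = \frac{1}{\pi}\int_{-\infty}^{x} \frac{1}{x^2 + 1}\,dx = \frac{1}{\pi}\big[\tan^{-1} x\big]_{-\infty}^{x} \\ &= \frac{1}{\pi}\big[\tan^{-1} x - \tan^{-1}(-\infty)\big] = \frac{1}{\pi}\left(\tan^{-1} x + \frac{\pi}{2}\right) \end{aligned} \]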
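
The distribution function obtained in Example 15 is easy to evaluate numerically. The sketch below checks a few values one would expect from the symmetry of f(x) about 0; the sample points are illustrative choices.

```python
import math

def F(x):
    """F(x) = (1/pi) * (arctan(x) + pi/2), the cdf found in Example 15."""
    return (math.atan(x) + math.pi / 2) / math.pi

print(F(0))            # half of the probability lies below 0, by symmetry
print(F(1) - F(-1))    # P(-1 < X < 1)
print(F(1e9))          # tends to 1 as x grows
```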

Example 16
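Find the constant k such that
\[ f(x) = \begin{cases} kx^2, & 0 < x < 3 \\ 0, & \text{otherwise} \end{cases} \]
is a probability density function. Also, find the distribution function F(x) and
$P(1 < X \le 2)$.
Solution
Since f(x) is a probability density function,
\[ \begin{aligned} \int_{-\infty}^{\infty} f(x)\,dx &= 1 \\ \int_{-\infty}^{0} f(x)\,dx + \int_{0}^{3} f(x)\,dx + \int_{3}^{\infty} f(x)\,dx &= 1 \\ 0 + \int_{0}^{3} kx^2\,dx + 0 &= 1 \\ k\left[\frac{x^3}{3}\right]_{0}^{3} &= 1 \\ k(9 - 0) &= 1 \\ k &= \frac{1}{9} \end{aligned} \]
Hence,
\[ f(x) = \begin{cases} \frac{1}{9}x^2, & 0 < x < 3 \\ 0, & \text{otherwise} \end{cases} \]
For $0 < x < 3$,
\[ F(x) = \int_{-\infty}^{x} f(x)\,dx = \int_{-\infty}^{0} f(x)\,dx + \int_{0}^{x} f(x)\,dx = 0 + \int_{0}^{x} \frac{1}{9}x^2\,dx = \frac{1}{9}\left[\frac{x^3}{3}\right]_{0}^{x} = \frac{1}{27}x^3 \]
Hence,
\[ F(x) = \begin{cases} 0, & x \le 0 \\ \frac{1}{27}x^3, & 0 < x < 3 \\ 1, & x \ge 3 \end{cases} \]
\[ P(1 < X \le 2) = \int_{1}^{2} f(x)\,dx = \int_{1}^{2} \frac{1}{9}x^2\,dx = \frac{1}{9}\left[\frac{x^3}{3}\right]_{1}^{2} = \frac{1}{27}(8 - 1) = \frac{7}{27} \]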

Exercise 2.2
1. Verify whether the following functions are probability density functions:
  (i) $f(x) = k e^{-kx}$, $x \ge 0$, $k > 0$
  (ii) $f(x) = \frac{1}{2} e^{-|x|}$, $-\infty < x < \infty$
  (iii) $f(x) = \frac{2}{9}\,x\left(2 - \frac{x}{2}\right)$, $0 \le x \le 3$
  [Ans.: (i) Yes (ii) Yes (iii) Yes]

2. Find the value of k if the following are probability density functions:
  (i) $f(x) = k(1 + x)$, $2 \le x \le 5$
  (ii) $f(x) = k(x - x^2)$, $0 \le x \le 1$
  (iii) $f(x) = kx\,e^{-4x^2}$, $0 \le x < \infty$
  (iv) $f(x) = kx\,e^{-x^2/4}$, $0 \le x < \infty$
  [Ans.: (i) 2/27 (ii) 6 (iii) 8 (iv) 1/2]

3. A function is defined as
\[ f(x) = \begin{cases} 0, & x < 2 \\ \dfrac{2x + 3}{18}, & 2 \le x \le 4 \\ 0, & x > 4 \end{cases} \]
Show that f(x) is a probability density function and find P(2 < X < 3).
  [Ans.: 4/9]
4. Let X be a continuous random variable with probability distribution
\[ f(x) = \begin{cases} \dfrac{x}{6} + k, & 0 \le x \le 3 \\ 0, & \text{otherwise} \end{cases} \]
  Find k, and $P(1 \le X \le 2)$.
  [Ans.: 1/12, 1/3]
5. Find the value of k such that f(x) is a probability density function. Find
also, $P(X \le 1.5)$.
\[ f(x) = \begin{cases} kx, & 0 \le x \le 1 \\ k, & 1 \le x \le 2 \\ k(3 - x), & 2 \le x \le 3 \end{cases} \]
  [Ans.: 1/2, 1/2]
6. If X is a continuous random variable whose probability density function is
given by
\[ f(x) = \begin{cases} k(4x - 2x^2), & 0 < x < 2 \\ 0, & \text{otherwise} \end{cases} \]
  Find (i) the value of k, and (ii) P(X > 1).
  [Ans.: (i) 3/8 (ii) 1/2]
7. If a random variable has the probability density function
\[ f(x) = \begin{cases} k(x^2 - 1), & -1 \le x \le 3 \\ 0, & \text{otherwise} \end{cases} \]
Find (i) the value of k, and (ii) $P\left(\frac{1}{2} \le X \le \frac{5}{2}\right)$.
  [Ans.: (i) 3/28 (ii) 19/56]
8. The probability density function is
\[ f(x) = \begin{cases} k(3x^2 - 1), & -1 \le x \le 2 \\ 0, & \text{otherwise} \end{cases} \]
Find (i) the value of k, and (ii) $P(-1 \le X \le 0)$.
  [Ans.: (i) 1/6 (ii) 0]
9. Is the function defined by
\[ f(x) = \begin{cases} 0, & x < 2 \\ \frac{1}{18}(2x + 3), & 2 \le x \le 4 \\ 0, & x > 4 \end{cases} \]
a probability density function? Find the probability that a variate having
f(x) as density function will fall in the interval $2 \le X \le 3$.
  [Ans.: Yes, 4/9]
10. A random variable X gives measurements x between 0 and 1 with a
probability function
\[ f(x) = \begin{cases} 12x^3 - 21x^2 + 10x, & 0 \le x \le 1 \\ 0, & \text{otherwise} \end{cases} \]
  (i) Find $P\left(X \le \frac{1}{2}\right)$ and $P\left(X > \frac{1}{2}\right)$.
  (ii) Find a number k such that $P(X \le k) = \frac{1}{2}$.
  [Ans.: (i) 7/16 (ii) 0.452]
11. The distribution function of a random variable X is given by
\[ F(x) = \begin{cases} 1 - e^{-x^2}, & x > 0 \\ 0, & \text{otherwise} \end{cases} \]
Find the probability density function.
  [Ans.: $f(x) = 2x e^{-x^2}$ for $x > 0$, and $f(x) = 0$ otherwise]
12. The cdf of a continuous random variable X is given by
\[ F(x) = \begin{cases} 0, & x < 0 \\ x^2, & 0 \le x \le 1 \\ 1, & x > 1 \end{cases} \]
  Find the pdf and $P\left(\frac{1}{2} \le X \le \frac{4}{5}\right)$.
  [Ans.: 0.195]

13. Find the distribution function corresponding to the following probability
density functions:
  (i) $f(x) = \frac{1}{2}x^2 e^{-x}$ for $0 \le x < \infty$, and 0 otherwise
  (ii) $f(x) = x$ for $0 \le x \le 1$; $f(x) = 2 - x$ for $1 \le x \le 2$; and 0 otherwise
  (iii) $f(x) = \lambda(x - 1)^4$ for $1 \le x \le 3$, $\lambda > 0$; and 0 otherwise
Ans.:
  (i) \[ F(x) = \begin{cases} 1 - e^{-x}\left(1 + x + \dfrac{x^2}{2}\right), & x \ge 0 \\ 0, & \text{otherwise} \end{cases} \]
  (ii) \[ F(x) = \begin{cases} 0, & x < 0 \\ \dfrac{x^2}{2}, & 0 \le x \le 1 \\ 2x - 0.5x^2 - 1, & 1 \le x \le 2 \\ 1, & x > 2 \end{cases} \]
  (iii) $\lambda = \dfrac{5}{32}$, \[ F(x) = \begin{cases} 0, & x \le 1 \\ \dfrac{1}{32}(x - 1)^5, & 1 \le x \le 3 \\ 1, & x \ge 3 \end{cases} \]

14. A continuous random variable X has the following probability density
function
\[ f(x) = \frac{a}{x^5}, \quad 2 \le x \le 10 \]
Determine the constant a, distribution function of X, and find the
probability of the event $4 \le X \le 7$.
  [Ans.: $a = \dfrac{2500}{39}$, $F(x) = \dfrac{625}{39}\left(\dfrac{1}{16} - \dfrac{1}{x^4}\right)$, 0.056]


2.7 Two-Dimensional Discrete Random Variables

For a one-dimensional random variable, the outcome of an experiment has only one
characteristic. In many situations, the outcome of a random experiment depends on
two or more characteristics, e.g., both voltage and current are measured in a certain
experiment.
Let X and Y be two random variables defined on the same sample space S. Then
the function (X, Y) that assigns a point in $R^2$ to each outcome is called a
two-dimensional random variable.
A two-dimensional random variable is said to be discrete if it takes at most a
countable number of points in $R^2$. When (X, Y) is a two-dimensional discrete random
variable, the possible values of (X, Y) may be represented as $(x_i, y_j)$, i = 1, 2, ..., m, ...;
j = 1, 2, ..., n, ...

2.7.1 Joint Probability Mass Function


If (X, Y) is a two-dimensional discrete random variable, then the joint discrete function
of X, Y, also called the joint probability mass function of X, Y, denoted by $p_{XY}$, is
defined by
\[ p_{XY}(x_i, y_j) = P(X = x_i, Y = y_j) \quad \text{for a value } (x_i, y_j) \text{ of } (X, Y) \]
and $p_{XY}(x_i, y_j) = 0$, otherwise.
The following conditions should be satisfied for a function to be a probability mass
function:
(i) $p_{XY}(x_i, y_j) \ge 0$, for all i and j
(ii) $\sum_{j=1}^{n} \sum_{i=1}^{m} p_{XY}(x_i, y_j) = 1$

2.7.2 Cumulative Distribution Function


If (X, Y) is a two-dimensional discrete random variable, then $F_{XY}(x, y) = P(X \le x, Y \le y)$
is called the cumulative distribution function (cdf) of (X, Y) and is given by
\[ F_{XY}(x, y) = \sum_{y_j \le y} \sum_{x_i \le x} p_{XY}(x_i, y_j) = \sum_{y_j \le y} \sum_{x_i \le x} p_{ij} \]

Properties of cdf
(i) $F(-\infty, y) = 0 = F(x, -\infty)$ and $F(\infty, \infty) = 1$
(ii) $P(a < X \le b, Y \le y) = F(b, y) - F(a, y)$
(iii) $P(X \le x, c < Y \le d) = F(x, d) - F(x, c)$
(iv) $P(a < X \le b, c < Y \le d) = F(b, d) - F(a, d) - F(b, c) + F(a, c)$
(v) For a discrete random variable, $F_{XY}(x, y)$ will have step discontinuities. Derivatives
at such discontinuities are not defined. At points of continuity,
\[ \frac{\partial^2 F}{\partial x\,\partial y} = f(x, y) \]

2.7.3 Marginal Probability function


Let (X, Y) be a two-dimensional discrete random variable which takes a countable
number of values $(x_i, y_j)$, i = 1, 2, ..., m; j = 1, 2, ..., n. Then the probability
distribution of X is given by
\[ \begin{aligned} p_X(x_i) &= P(X = x_i) \\ &= P(X = x_i, Y = y_1) + P(X = x_i, Y = y_2) + \cdots + P(X = x_i, Y = y_n) \\ &= p_{i1} + p_{i2} + \cdots + p_{ij} + \cdots + p_{in} \\ &= \sum_{j=1}^{n} p_{ij} = \sum_{j=1}^{n} p(x_i, y_j) = p_{i*} \end{aligned} \]
and is known as the marginal probability mass function or discrete marginal density
function of X.
Similarly,
\[ p_Y(y_j) = P(Y = y_j) = \sum_{i=1}^{m} p_{ij} = \sum_{i=1}^{m} p(x_i, y_j) = p_{*j} \]
is the marginal probability mass function of Y.
2.7.4 Conditional Probability Function

Let (X, Y) be a two-dimensional discrete random variable. Then the conditional discrete
density function or conditional probability mass function of X, given Y = y, denoted by
$p_{X/Y}(x/y)$, is defined as
\[ p_{X/Y}(x/y) = P(X = x / Y = y) = \frac{P(X = x, Y = y)}{P(Y = y)}, \quad \text{provided } P(Y = y) \ne 0. \]
The conditional probability mass function of Y, given X = x, denoted by $p_{Y/X}(y/x)$, is
defined as:
\[ p_{Y/X}(y/x) = P(Y = y / X = x) = \frac{P(X = x, Y = y)}{P(X = x)}, \quad \text{provided } P(X = x) \ne 0. \]
A necessary and sufficient condition for the discrete random variables X and Y to be
independent is
$P(X = x_i, Y = y_j) = P(X = x_i)\,P(Y = y_j)$ for all values $(x_i, y_j)$ of (X, Y)
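
The definitions of the marginal and conditional probability mass functions translate directly into sums over a joint table. The sketch below uses a small made-up joint distribution (the four probabilities are hypothetical, chosen only for illustration), stored as a dictionary keyed by (x, y); the function names are likewise illustrative.

```python
from collections import defaultdict
from fractions import Fraction as Fr

# Hypothetical joint pmf p(x, y); not taken from the text
joint = {(0, 0): Fr(1, 8), (0, 1): Fr(3, 8), (1, 0): Fr(2, 8), (1, 1): Fr(2, 8)}

def marginals(joint):
    """Return (pX, pY): row sums over y and column sums over x."""
    px, py = defaultdict(Fr), defaultdict(Fr)
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    return dict(px), dict(py)

def conditional_x_given_y(joint, y):
    """P(X = x / Y = y) = P(X = x, Y = y) / P(Y = y), assuming P(Y = y) != 0."""
    _, py = marginals(joint)
    return {x: p / py[y] for (x, yy), p in joint.items() if yy == y}

px, py = marginals(joint)
cond = conditional_x_given_y(joint, 1)
print(px, py, cond)
```

Each conditional distribution sums to 1, which is a convenient check on the division by P(Y = y).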

Example 1
From the following table for bivariate distribution of (X, Y), find
(i) $P(X \le 1)$ (ii) $P(Y \le 3)$ (iii) $P(X \le 1, Y \le 3)$ (iv) $P(X \le 1 / Y \le 3)$
(v) $P(Y \le 3 / X \le 1)$ (vi) $P(X + Y \le 4)$

X \ Y    1       2       3       4       5       6
0        0       0       1/32    2/32    2/32    3/32
1        1/16    1/16    1/8     1/8     1/8     1/8
2        1/32    1/32    1/64    1/64    0       2/64

Solution
Marginal distributions

X \ Y    1       2       3        4        5       6        pX(x)
0        0       0       1/32     2/32     2/32    3/32     8/32
1        1/16    1/16    1/8      1/8      1/8     1/8      10/16
2        1/32    1/32    1/64     1/64     0       2/64     8/64
pY(y)    3/32    3/32    11/64    13/64    6/32    16/64    Σp(x) = Σp(y) = 1

(i) $P(X \le 1) = P(X = 0) + P(X = 1) = \frac{8}{32} + \frac{10}{16} = \frac{7}{8}$

(ii) $P(Y \le 3) = P(Y = 1) + P(Y = 2) + P(Y = 3) = \frac{3}{32} + \frac{3}{32} + \frac{11}{64} = \frac{23}{64}$

(iii)
\[ \begin{aligned} P(X \le 1, Y \le 3) &= P(X = 0, Y = 1) + P(X = 0, Y = 2) + P(X = 0, Y = 3) \\ &\quad + P(X = 1, Y = 1) + P(X = 1, Y = 2) + P(X = 1, Y = 3) \\ &= 0 + 0 + \frac{1}{32} + \frac{1}{16} + \frac{1}{16} + \frac{1}{8} = \frac{9}{32} \end{aligned} \]

(iv) $P(X \le 1 / Y \le 3) = \dfrac{P(X \le 1, Y \le 3)}{P(Y \le 3)} = \dfrac{9/32}{23/64} = \dfrac{18}{23}$

(v) $P(Y \le 3 / X \le 1) = \dfrac{P(X \le 1, Y \le 3)}{P(X \le 1)} = \dfrac{9/32}{7/8} = \dfrac{9}{28}$

(vi)
\[ \begin{aligned} P(X + Y \le 4) &= P(X = 0, Y = 1) + P(X = 0, Y = 2) + P(X = 0, Y = 3) \\ &\quad + P(X = 0, Y = 4) + P(X = 1, Y = 1) + P(X = 1, Y = 2) \\ &\quad + P(X = 1, Y = 3) + P(X = 2, Y = 1) + P(X = 2, Y = 2) \\ &= 0 + 0 + \frac{1}{32} + \frac{2}{32} + \frac{1}{16} + \frac{1}{16} + \frac{1}{8} + \frac{1}{32} + \frac{1}{32} = \frac{13}{32} \end{aligned} \]
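
Every part of Example 1 is a sum of joint probabilities over the cells that satisfy some condition, so all six answers can be cross-checked with one small helper. This is a sketch under the assumption that the table is read as X = 0, 1, 2 down the rows and Y = 1, ..., 6 across the columns; the helper name `prob` is illustrative.

```python
from fractions import Fraction as Fr

rows = {
    0: [Fr(0), Fr(0), Fr(1, 32), Fr(2, 32), Fr(2, 32), Fr(3, 32)],
    1: [Fr(1, 16), Fr(1, 16), Fr(1, 8), Fr(1, 8), Fr(1, 8), Fr(1, 8)],
    2: [Fr(1, 32), Fr(1, 32), Fr(1, 64), Fr(1, 64), Fr(0), Fr(2, 64)],
}
# joint[(x, y)] = P(X = x, Y = y), with y running from 1 to 6
joint = {(x, y): p for x, row in rows.items() for y, p in enumerate(row, start=1)}

def prob(event):
    """Sum the joint probabilities of all cells (x, y) for which event(x, y) holds."""
    return sum(p for (x, y), p in joint.items() if event(x, y))

p_i = prob(lambda x, y: x <= 1)
p_ii = prob(lambda x, y: y <= 3)
p_iii = prob(lambda x, y: x <= 1 and y <= 3)
p_iv = p_iii / p_ii      # P(X <= 1 / Y <= 3)
p_v = p_iii / p_i        # P(Y <= 3 / X <= 1)
p_vi = prob(lambda x, y: x + y <= 4)

print(p_i, p_ii, p_iii, p_iv, p_v, p_vi)
```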

Example 2
For the following joint distribution of X and Y, find the marginal
distributions:

Y \ X    0       1       2
0        3/28    9/28    3/28
1        3/14    3/14    0
2        1/28    0       0

Solution
Marginal distributions

Y \ X    0        1        2       pY(y)
0        3/28     9/28     3/28    15/28
1        3/14     3/14     0       6/14
2        1/28     0        0       1/28
pX(x)    10/28    15/28    3/28    Σp(x) = Σp(y) = 1

Marginal distributions of X
$P(X = 0) = P(X = 0, Y = 0) + P(X = 0, Y = 1) + P(X = 0, Y = 2) = \frac{3}{28} + \frac{3}{14} + \frac{1}{28} = \frac{10}{28}$
$P(X = 1) = P(X = 1, Y = 0) + P(X = 1, Y = 1) + P(X = 1, Y = 2) = \frac{9}{28} + \frac{3}{14} + 0 = \frac{15}{28}$
$P(X = 2) = P(X = 2, Y = 0) + P(X = 2, Y = 1) + P(X = 2, Y = 2) = \frac{3}{28} + 0 + 0 = \frac{3}{28}$
Marginal distributions of Y
$P(Y = 0) = P(X = 0, Y = 0) + P(X = 1, Y = 0) + P(X = 2, Y = 0) = \frac{3}{28} + \frac{9}{28} + \frac{3}{28} = \frac{15}{28}$
$P(Y = 1) = P(X = 0, Y = 1) + P(X = 1, Y = 1) + P(X = 2, Y = 1) = \frac{3}{14} + \frac{3}{14} + 0 = \frac{6}{14}$
$P(Y = 2) = P(X = 0, Y = 2) + P(X = 1, Y = 2) + P(X = 2, Y = 2) = \frac{1}{28} + 0 + 0 = \frac{1}{28}$

Example 3
The joint distribution of X and Y is given by
\[ f(x, y) = \frac{x + y}{21}, \quad x = 1, 2, 3;\; y = 1, 2 \]
Find the marginal distributions.

Solution
Marginal distributions

Y \ X    1       2       3       pY(y)
1        2/21    3/21    4/21    9/21
2        3/21    4/21    5/21    12/21
pX(x)    5/21    7/21    9/21    Σp(x) = Σp(y) = 1

Marginal distributions of X
$P(X = 1) = P(X = 1, Y = 1) + P(X = 1, Y = 2) = \frac{2}{21} + \frac{3}{21} = \frac{5}{21}$
$P(X = 2) = P(X = 2, Y = 1) + P(X = 2, Y = 2) = \frac{3}{21} + \frac{4}{21} = \frac{7}{21}$
$P(X = 3) = P(X = 3, Y = 1) + P(X = 3, Y = 2) = \frac{4}{21} + \frac{5}{21} = \frac{9}{21}$
Marginal distributions of Y
$P(Y = 1) = P(X = 1, Y = 1) + P(X = 2, Y = 1) + P(X = 3, Y = 1) = \frac{2}{21} + \frac{3}{21} + \frac{4}{21} = \frac{9}{21}$
$P(Y = 2) = P(X = 1, Y = 2) + P(X = 2, Y = 2) + P(X = 3, Y = 2) = \frac{3}{21} + \frac{4}{21} + \frac{5}{21} = \frac{12}{21}$

Example 4
Given is the joint distribution of X and Y

Y \ X    0       1       2
0        0.02    0.08    0.1
1        0.05    0.2     0.25
2        0.03    0.12    0.15

Find (i) marginal distributions (ii) the conditional distributions of
X given Y = 0.
Solution
Marginal distributions

Y \ X    0       1       2       pY(y)
0        0.02    0.08    0.1     0.2
1        0.05    0.2     0.25    0.5
2        0.03    0.12    0.15    0.3
pX(x)    0.1     0.4     0.5     Σp(x) = Σp(y) = 1

Marginal distributions of X
P(X = 0) = P(X = 0, Y = 0) + P(X = 0, Y = 1) + P(X = 0, Y = 2) = 0.02 + 0.05 + 0.03 = 0.1
P(X = 1) = P(X = 1, Y = 0) + P(X = 1, Y = 1) + P(X = 1, Y = 2) = 0.08 + 0.2 + 0.12 = 0.4
P(X = 2) = P(X = 2, Y = 0) + P(X = 2, Y = 1) + P(X = 2, Y = 2) = 0.1 + 0.25 + 0.15 = 0.5

Marginal distributions of Y
P(Y = 0) = P(X = 0, Y = 0) + P(X = 1, Y = 0) + P(X = 2, Y = 0) = 0.02 + 0.08 + 0.1 = 0.2
P(Y = 1) = P(X = 0, Y = 1) + P(X = 1, Y = 1) + P(X = 2, Y = 1) = 0.05 + 0.2 + 0.25 = 0.5
P(Y = 2) = P(X = 0, Y = 2) + P(X = 1, Y = 2) + P(X = 2, Y = 2) = 0.03 + 0.12 + 0.15 = 0.3

Conditional distributions of X for Y = 0
$P(X = 0 / Y = 0) = \dfrac{P(X = 0, Y = 0)}{P(Y = 0)} = \dfrac{0.02}{0.2} = 0.1$
$P(X = 1 / Y = 0) = \dfrac{P(X = 1, Y = 0)}{P(Y = 0)} = \dfrac{0.08}{0.2} = 0.4$
$P(X = 2 / Y = 0) = \dfrac{P(X = 2, Y = 0)}{P(Y = 0)} = \dfrac{0.1}{0.2} = 0.5$

X = x                0      1      2
P(X = x / Y = 0)     0.1    0.4    0.5

Example 5
The joint probability distribution of two random variables X and Y is
given by
$P(X = 0, Y = 1) = \frac{1}{3}$, $P(X = 1, Y = -1) = \frac{1}{3}$ and $P(X = 1, Y = 1) = \frac{1}{3}$.
Find (i) marginal distributions of X and Y and (ii) the conditional
probability distributions of X given Y = 1.

Solution
Marginal distributions

Y \ X               -1    0      1      Marginal Y, pY(y)
-1                  0     0      1/3    1/3
0                   0     0      0      0
1                   0     1/3    1/3    2/3
Marginal X, pX(x)   0     1/3    2/3    Σp(x) = Σp(y) = 1

Marginal distributions of X
$P(X = -1) = P(X = -1, Y = -1) + P(X = -1, Y = 0) + P(X = -1, Y = 1) = 0$
$P(X = 0) = P(X = 0, Y = -1) + P(X = 0, Y = 0) + P(X = 0, Y = 1) = 0 + 0 + \frac{1}{3} = \frac{1}{3}$
$P(X = 1) = P(X = 1, Y = -1) + P(X = 1, Y = 0) + P(X = 1, Y = 1) = \frac{1}{3} + 0 + \frac{1}{3} = \frac{2}{3}$

Marginal distributions of Y
$P(Y = -1) = P(X = -1, Y = -1) + P(X = 0, Y = -1) + P(X = 1, Y = -1) = 0 + 0 + \frac{1}{3} = \frac{1}{3}$
$P(Y = 0) = P(X = -1, Y = 0) + P(X = 0, Y = 0) + P(X = 1, Y = 0) = 0$
$P(Y = 1) = P(X = -1, Y = 1) + P(X = 0, Y = 1) + P(X = 1, Y = 1) = 0 + \frac{1}{3} + \frac{1}{3} = \frac{2}{3}$
The conditional probability distribution of X given Y = 1 follows from
\[ P(X = x / Y = y) = \frac{P(X = x, Y = y)}{P(Y = y)} \]
$P(X = -1 / Y = 1) = \dfrac{P(X = -1, Y = 1)}{P(Y = 1)} = 0$
$P(X = 0 / Y = 1) = \dfrac{P(X = 0, Y = 1)}{P(Y = 1)} = \dfrac{1/3}{2/3} = \dfrac{1}{2}$
$P(X = 1 / Y = 1) = \dfrac{P(X = 1, Y = 1)}{P(Y = 1)} = \dfrac{1/3}{2/3} = \dfrac{1}{2}$

Example 6
If the joint probability mass function of (X, Y) is given by
P(x, y) = k(2x + 3y), x = 0, 1, 2; y = 1, 2, 3
Find all the marginal probability distributions. Also, find the probability
distribution of (X + Y).
Solution
P(x, y) = k(2x + 3y)
Marginal distributions

Y \ X    0      1      2      pY(y)
1        3k     5k     7k     15k
2        6k     8k     10k    24k
3        9k     11k    13k    33k
pX(x)    18k    24k    30k    72k

Since $\sum p(x) = \sum p(y) = 1$,
\[ 72k = 1 \;\Rightarrow\; k = \frac{1}{72} \]
Marginal distributions of X and Y

Y \ X    0        1        2        pY(y)
1        3/72     5/72     7/72     15/72
2        6/72     8/72     10/72    24/72
3        9/72     11/72    13/72    33/72
pX(x)    18/72    24/72    30/72    1

Probability distribution of (X + Y), where $p_{xy} = P(X = x, Y = y)$

X + Y    P
1        $p_{01} = \frac{3}{72}$
2        $p_{02} + p_{11} = \frac{11}{72}$
3        $p_{03} + p_{12} + p_{21} = \frac{24}{72}$
4        $p_{13} + p_{22} = \frac{21}{72}$
5        $p_{23} = \frac{13}{72}$
         Total = 1
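
The distribution of X + Y is obtained by grouping the joint probabilities by the value of the sum. A minimal sketch of that grouping for Example 6 follows; the variable names are illustrative.

```python
from collections import defaultdict
from fractions import Fraction

xs, ys = (0, 1, 2), (1, 2, 3)

# Unnormalised weights 2x + 3y; k is chosen so that they sum to 1
weights = {(x, y): 2 * x + 3 * y for x in xs for y in ys}
k = Fraction(1, sum(weights.values()))

# Add up the joint probabilities of all cells with the same value of x + y
sum_dist = defaultdict(Fraction)
for (x, y), w in weights.items():
    sum_dist[x + y] += k * w

for s in sorted(sum_dist):
    print(s, sum_dist[s])
```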

Example 7
Let X and Y have the following joint probability distribution, shown with its
marginal distributions. Check whether X and Y are independent.

X \ Y    0      1       2       pX(x)
0        0.1    0.04    0.06    0.2
1        0.2    0.08    0.12    0.4
2        0.2    0.08    0.12    0.4
pY(y)    0.5    0.2     0.3     Σp(x) = Σp(y) = 1

Solution
X and Y are independent if $p_{ij} = p_{i*}\,p_{*j}$ for all i and j.
$p_{0*} = 0.1 + 0.04 + 0.06 = 0.2$
$p_{1*} = 0.2 + 0.08 + 0.12 = 0.4$
$p_{2*} = 0.2 + 0.08 + 0.12 = 0.4$
$p_{*0} = 0.1 + 0.2 + 0.2 = 0.5$
$p_{*1} = 0.04 + 0.08 + 0.08 = 0.2$
$p_{*2} = 0.06 + 0.12 + 0.12 = 0.3$
Now,
$p_{0*}\,p_{*0} = (0.2)(0.5) = 0.1 = p_{00}$
$p_{0*}\,p_{*1} = (0.2)(0.2) = 0.04 = p_{01}$
$p_{0*}\,p_{*2} = (0.2)(0.3) = 0.06 = p_{02}$
Similarly, it can be verified that
$p_{1*}\,p_{*0} = p_{10}$; $p_{1*}\,p_{*1} = p_{11}$; $p_{1*}\,p_{*2} = p_{12}$
$p_{2*}\,p_{*0} = p_{20}$; $p_{2*}\,p_{*1} = p_{21}$; $p_{2*}\,p_{*2} = p_{22}$
Hence, the random variables X and Y are independent.
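
The nine products in this example can be checked in one pass. The sketch below reads the table entries as exact fractions (so that 0.2 x 0.2 compares equal to 0.04 without round-off) and tests the condition that every joint probability equals the product of its marginals; the names are illustrative.

```python
from fractions import Fraction as Fr

table = {
    0: ["0.1", "0.04", "0.06"],
    1: ["0.2", "0.08", "0.12"],
    2: ["0.2", "0.08", "0.12"],
}
# joint[(x, y)] = P(X = x, Y = y) for x, y in {0, 1, 2}
joint = {(x, y): Fr(p) for x, row in table.items() for y, p in enumerate(row)}

px = {x: sum(joint[x, y] for y in range(3)) for x in range(3)}   # row sums
py = {y: sum(joint[x, y] for x in range(3)) for y in range(3)}   # column sums

# Independence requires p_ij = p_i* p_*j for every cell
is_independent = all(joint[x, y] == px[x] * py[y] for x in range(3) for y in range(3))
print(is_independent)
```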

Exercise 2.3
1. Find the marginal distributions of X and Y from the bivariate distribution
of (X, Y) given below:

X \ Y    1      2
1        0.1    0.2
2        0.3    0.4

Ans.:
X = x       1      2          Y = y       1      2
P(X = x)    0.3    0.7        P(Y = y)    0.4    0.6

2. For the joint probability distribution of two random variables X and Y
given below:

X \ Y    1        2       3       4       Total
1        4/36     3/36    2/36    1/36    10/36
2        1/36     3/36    3/36    2/36    9/36
3        5/36     1/36    1/36    1/36    8/36
4        1/36     2/36    1/36    5/36    9/36
Total    11/36    9/36    7/36    9/36    1

Find (i) marginal distributions of X and Y.
(ii) Conditional distributions of X given the value of Y = 1 and that of Y
given the value of X = 2.

Ans.: (i)
X = x       1        2       3       4
P(X = x)    10/36    9/36    8/36    9/36

Y = y       1        2       3       4
P(Y = y)    11/36    9/36    7/36    9/36

(ii)
X = x               1       2       3       4
P(X = x / Y = 1)    4/11    1/11    5/11    1/11

Y = y               1       2       3       4
P(Y = y / X = 2)    1/9     1/3     1/3     2/9

3. A two-dimensional random variable (X, Y) has the joint probability mass
function $p(x, y) = \dfrac{2x + y}{27}$, where x and y can assume only the integer
values 0, 1 and 2. Find the conditional distributions of Y for X = x.

Ans.: P(Y = y / X = x), one row for each value of x
x \ y    0       1       2
0        0       1/3     2/3
1        2/9     3/9     4/9
2        4/15    5/15    6/15

4. Let X and Y have the following joint probability distribution:


X
0 1
Y
0 0.1 0.15
1 0.2 0.3
2 0.1 0.15

Show that X and Y are independent.

5. The joint probability distribution of X and Y is given below:


X
1 2 3
Y

1 1 1
2
8 24 12

1 1
4 0
4 4

1 1 1
6
8 24 12

Find P(X < 4), P(Y > 1), P(X < 4/Y > 1), P(2 ≤ X ≤ 5, Y > 1), P(Y = 3/X = 2),
P(X + Y ≤ 7).
[Ans.: 1/4, 1/2, 1/4, 3/8, 1/6, 19/24]

6. For the following joint probability distribution of X and Y, find (i) the marginal
distributions of X and Y, (ii) the conditional distribution of X given Y = 2.
(iii) Are X and Y independent?

Y \ X    1      2      3
1       0.1    0.1    0.2
2       0.2    0.3    0.1

Ans.:
(i)  X = x       1      2      3        Y = y       1      2
     P(X = x)   0.3    0.4    0.3       P(Y = y)   0.4    0.6

(ii) X = x             1      2      3
     P(X = x/Y = 2)   1/3    1/2    1/6

(iii) No

7. For the following distribution, find (i) the marginal distributions of X and Y,
(ii) the conditional distribution of X given Y = 1, (iii) the distribution of (X + Y).

X \ Y    1      2
1       0.1    0.2
2       0.3    0.4

Ans.:
(i)  X = x       1      2         Y = y       1      2
     P(X = x)   0.3    0.7        P(Y = y)   0.4    0.6

(ii) X = x             1       2
     P(X = x/Y = 1)   0.25    0.75

(iii) X + Y       2      3      4
      P(X + Y)   0.1    0.5    0.4

2.8 Two-Dimensional Continuous Random Variables

A two-dimensional random variable is said to be continuous if it takes all the values in
a specified region R in the xy-plane.

2.8.1 Joint Probability Density Function


If (X, Y) is a two-dimensional random variable such that
\[ P\left\{ x - \frac{dx}{2} \le X \le x + \frac{dx}{2} \ \text{and}\ y - \frac{dy}{2} \le Y \le y + \frac{dy}{2} \right\} = f(x, y)\,dx\,dy, \]
then f(x, y) is called the joint probability density function of (X, Y).
Properties of Joint Probability Density Function
(i) $f(x, y) \ge 0$ for all x, y
(ii) $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,dx\,dy = 1$
(iii) $P(a \le X \le b,\ c \le Y \le d) = \int_{c}^{d}\int_{a}^{b} f(x, y)\,dx\,dy$

2.8.2 Cumulative Distribution Function


If (X, Y) is a two-dimensional continuous random variable, then
\[ F_{XY}(x, y) = \int_{-\infty}^{y}\int_{-\infty}^{x} f(x, y)\,dx\,dy \]
is called the cumulative distribution function (cdf) of (X, Y).

2.8.3 Marginal Probability Function


Let (X, Y) be a two-dimensional continuous random variable which assumes all the
values in a specified region R in the xy-plane. Then the probability distribution of X
is given by
\[ f_X(x) = \int_{-\infty}^{\infty} f(x, y)\,dy \]
and is called the marginal probability density function of X.

Similarly,
\[ f_Y(y) = \int_{-\infty}^{\infty} f(x, y)\,dx \]
is the marginal probability density function of Y.

2.8.4 Conditional Probability Function


Let (X, Y) be a two-dimensional continuous random variable. Then the conditional
probability density function of X, given Y = y, denoted by f(x/y), is defined as
\[ f(x/y) = \frac{f(x, y)}{f_Y(y)} \]
The conditional probability density function of Y, given X = x, denoted by f(y/x), is
defined as
\[ f(y/x) = \frac{f(x, y)}{f_X(x)} \]
A necessary and sufficient condition for the continuous random variables X and Y to
be independent is
\[ f(x, y) = f_X(x)\, f_Y(y) \]

Example 1
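The joint probability density function of a two-dimensional random variable is
\[ f(x, y) = \begin{cases} \dfrac{1}{2}\, x e^{-y}, & 0 < x < 2,\ y > 0 \\ 0, & \text{otherwise} \end{cases} \]
Find the cumulative distribution function.
Solution
For 0 < x < 2 and y > 0, the cumulative distribution function is given by
\[
\begin{aligned}
F(x, y) &= \int_{-\infty}^{y}\int_{-\infty}^{x} f(x, y)\,dx\,dy \\
&= \int_{0}^{y}\int_{0}^{x} \frac{1}{2}\, x e^{-y}\,dx\,dy \\
&= \frac{1}{2}\int_{0}^{y} \left[\frac{x^2}{2}\right]_{0}^{x} e^{-y}\,dy \\
&= \frac{1}{4}\, x^2 \left[-e^{-y}\right]_{0}^{y} \\
&= \frac{1}{4}\, x^2 \left(-e^{-y} + e^{0}\right) \\
&= \frac{1}{4}\, x^2 \left(1 - e^{-y}\right)
\end{aligned}
\]
For x ≥ 2 and y > 0, the inner integral covers the whole range 0 < x < 2, so F(x, y) = 1 − e^{−y}. Hence,
\[ F(x, y) = \begin{cases} \dfrac{1}{4}\, x^2 \left(1 - e^{-y}\right), & 0 < x < 2,\ y > 0 \\ 1 - e^{-y}, & x \ge 2,\ y > 0 \\ 0, & \text{otherwise} \end{cases} \]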
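
A closed-form cdf of this kind can be sanity-checked by integrating the density numerically. The sketch below uses a plain midpoint rule; the function names, the grid size `n` and the test point (1.5, 1.0) are arbitrary illustrative choices.

```python
import math

def joint_pdf(x, y):
    """Joint pdf of Example 1: f(x, y) = (1/2) x e^(-y) on 0 < x < 2, y > 0."""
    return 0.5 * x * math.exp(-y) if 0 < x < 2 and y > 0 else 0.0

def cdf_numeric(x, y, n=400):
    """Midpoint-rule approximation of F(x, y) over the rectangle (0, x) x (0, y)."""
    hx, hy = x / n, y / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += joint_pdf((i + 0.5) * hx, (j + 0.5) * hy)
    return total * hx * hy

def cdf_closed_form(x, y):
    """Closed form (1/4) x^2 (1 - e^(-y)), valid for 0 < x < 2 and y > 0."""
    return 0.25 * x**2 * (1 - math.exp(-y))

print(cdf_numeric(1.5, 1.0), cdf_closed_form(1.5, 1.0))
```

The two printed values should agree to several decimal places; a large gap would point to a slip in the limits or in the antiderivative.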

Example 2
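The joint probability density function of a two-dimensional random
variable (X, Y) is $f(x, y) = x e^{-x(y+1)}$, x > 0, y > 0. Examine whether the
variables X and Y are independent.
Solution
\[
\begin{aligned}
f_X(x) &= \int_{-\infty}^{\infty} f(x, y)\,dy \\
&= \int_{0}^{\infty} x e^{-x(y+1)}\,dy \\
&= x \left[\frac{e^{-x(y+1)}}{-x}\right]_{0}^{\infty} \\
&= -\left(e^{-\infty} - e^{-x}\right) \\
&= e^{-x}, \quad x > 0
\end{aligned}
\]
\[
\begin{aligned}
f_Y(y) &= \int_{-\infty}^{\infty} f(x, y)\,dx \\
&= \int_{0}^{\infty} x e^{-x(y+1)}\,dx \\
&= \left[ x\, \frac{e^{-x(y+1)}}{-(y+1)} - 1 \cdot \frac{e^{-x(y+1)}}{(y+1)^2} \right]_{0}^{\infty} \\
&= \frac{1}{(y+1)^2}, \quad y > 0
\end{aligned}
\]
\[ f_X(x) \cdot f_Y(y) = e^{-x}\, \frac{1}{(y+1)^2}, \qquad f(x, y) = x e^{-x(y+1)} \]
\[ f(x, y) \ne f_X(x) \cdot f_Y(y) \]
Hence, X and Y are not independent.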

Example 3
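Two random variables X and Y have the joint pdf
\[ f(x, y) = \begin{cases} A e^{-(2x+y)}, & x, y \ge 0 \\ 0, & \text{otherwise} \end{cases} \]
Find (i) A, (ii) the marginal pdfs of X and Y, (iii) f(y/x).
Solution
(i) Since f(x, y) is a pdf,
\[
\begin{aligned}
\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,dx\,dy &= 1 \\
\int_{0}^{\infty}\int_{0}^{\infty} A e^{-(2x+y)}\,dx\,dy &= 1 \\
A \int_{0}^{\infty} \left[\int_{0}^{\infty} e^{-2x}\,dx\right] e^{-y}\,dy &= 1 \\
A \int_{0}^{\infty} \left[\frac{e^{-2x}}{-2}\right]_{0}^{\infty} e^{-y}\,dy &= 1 \\
\frac{A}{-2} \int_{0}^{\infty} \left(e^{-\infty} - e^{0}\right) e^{-y}\,dy &= 1 \\
\frac{A}{-2} \int_{0}^{\infty} (-1)\, e^{-y}\,dy &= 1 \\
\frac{A}{2} \left[-e^{-y}\right]_{0}^{\infty} &= 1 \\
\frac{A}{2} \left(-e^{-\infty} + e^{0}\right) &= 1 \\
A &= 2
\end{aligned}
\]
(ii)
\[
\begin{aligned}
f_X(x) &= \int_{-\infty}^{\infty} f(x, y)\,dy = \int_{0}^{\infty} A e^{-(2x+y)}\,dy \\
&= 2 \left[-e^{-(2x+y)}\right]_{0}^{\infty} = 2\left(-e^{-\infty} + e^{-2x}\right) = 2e^{-2x}, \quad x \ge 0
\end{aligned}
\]
\[ \therefore\ f_X(x) = \begin{cases} 2e^{-2x}, & x \ge 0 \\ 0, & x < 0 \end{cases} \]
\[
\begin{aligned}
f_Y(y) &= \int_{-\infty}^{\infty} f(x, y)\,dx = \int_{0}^{\infty} A e^{-(2x+y)}\,dx \\
&= 2 \left[\frac{e^{-(2x+y)}}{-2}\right]_{0}^{\infty} = -\frac{2}{2}\left(e^{-\infty} - e^{-y}\right) = -(0 - e^{-y}) = e^{-y}, \quad y \ge 0
\end{aligned}
\]
\[ \therefore\ f_Y(y) = \begin{cases} e^{-y}, & y \ge 0 \\ 0, & y < 0 \end{cases} \]
(iii)
\[ f(y/x) = \frac{f(x, y)}{f_X(x)} = \frac{2e^{-(2x+y)}}{2e^{-2x}} = e^{-y}, \quad y \ge 0 \]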

Example 4
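The joint probability distribution of X and Y is given by
\[ f(x, y) = \begin{cases} \dfrac{6 - x - y}{8}, & 0 < x < 2,\ 2 < y < 4 \\ 0, & \text{otherwise} \end{cases} \]
Find f(y/x = 2).
Solution
\[ f(y/x) = \frac{f(x, y)}{f_X(x)} \]
\[
\begin{aligned}
f_X(x) &= \int_{-\infty}^{\infty} f(x, y)\,dy \\
&= \int_{2}^{4} \frac{6 - x - y}{8}\,dy \\
&= \frac{1}{8} \left[6y - xy - \frac{y^2}{2}\right]_{2}^{4} \\
&= \frac{1}{8} \left[(24 - 4x - 8) - (12 - 2x - 2)\right] \\
&= \frac{1}{8}(6 - 2x)
\end{aligned}
\]
\[ f(y/x) = \frac{6 - x - y}{6 - 2x} \]
Putting x = 2,
\[ f(y/x = 2) = \frac{4 - y}{2}, \quad 2 < y < 4 \]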

Example 5
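The joint pdf of a two-dimensional variable (X, Y) is given by
\[ f(x, y) = kxy\, e^{-(x^2 + y^2)}, \quad x > 0,\ y > 0. \]
Find the value of k and prove that X and Y are independent.
Solution
Since f(x, y) is a pdf,
\[
\begin{aligned}
\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,dx\,dy &= 1 \\
\int_{0}^{\infty}\int_{0}^{\infty} kxy\, e^{-(x^2 + y^2)}\,dx\,dy &= 1 \\
k \int_{0}^{\infty} y e^{-y^2}\,dy \cdot \int_{0}^{\infty} x e^{-x^2}\,dx &= 1 \qquad \ldots(1)
\end{aligned}
\]
Putting $x^2 = t$, $x = \sqrt{t}$, $dx = \dfrac{1}{2\sqrt{t}}\,dt$
When x = 0, t = 0
When x = ∞, t = ∞
\[
\begin{aligned}
\int_{0}^{\infty} x e^{-x^2}\,dx &= \int_{0}^{\infty} \sqrt{t}\, e^{-t}\, \frac{1}{2\sqrt{t}}\,dt \\
&= \frac{1}{2} \left[-e^{-t}\right]_{0}^{\infty} \\
&= \frac{1}{2}\left(-e^{-\infty} + e^{0}\right) \\
&= \frac{1}{2}
\end{aligned}
\]
Similarly, $\int_{0}^{\infty} y e^{-y^2}\,dy = \dfrac{1}{2}$
Putting both integral values in Eq. (1),
\[ k \cdot \frac{1}{2} \cdot \frac{1}{2} = 1, \qquad k = 4 \]
If X and Y are independent,
\[ f_X(x) \cdot f_Y(y) = f(x, y) \]
\[
\begin{aligned}
f_X(x) &= \int_{-\infty}^{\infty} f(x, y)\,dy = \int_{0}^{\infty} kxy\, e^{-(x^2 + y^2)}\,dy \\
&= kx\, e^{-x^2} \int_{0}^{\infty} y e^{-y^2}\,dy = 4x\, e^{-x^2} \cdot \frac{1}{2} = 2x\, e^{-x^2}, \quad x > 0
\end{aligned}
\]
\[
\begin{aligned}
f_Y(y) &= \int_{-\infty}^{\infty} f(x, y)\,dx = \int_{0}^{\infty} kxy\, e^{-(x^2 + y^2)}\,dx \\
&= ky\, e^{-y^2} \int_{0}^{\infty} x e^{-x^2}\,dx = 4y\, e^{-y^2} \cdot \frac{1}{2} = 2y\, e^{-y^2}, \quad y > 0
\end{aligned}
\]
\[
\begin{aligned}
f_X(x) \cdot f_Y(y) &= 2x\, e^{-x^2} \cdot 2y\, e^{-y^2}, \quad x, y > 0 \\
&= 4xy\, e^{-(x^2 + y^2)} \\
&= f(x, y), \quad x > 0,\ y > 0
\end{aligned}
\]
Hence, X and Y are independent.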
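
The factorisation f(x, y) = fX(x) fY(y) can also be spot-checked numerically. The sketch below evaluates both sides on a grid of sample points; the grid itself is an arbitrary illustrative choice, and agreement on sample points is a sanity check, not a proof.

```python
import math

K = 4  # normalising constant found in Example 5

def joint_pdf(x, y):
    """f(x, y) = k x y e^(-(x^2 + y^2)) for x > 0, y > 0."""
    return K * x * y * math.exp(-(x**2 + y**2)) if x > 0 and y > 0 else 0.0

def marginal_x(x):
    """f_X(x) = 2 x e^(-x^2) for x > 0."""
    return 2 * x * math.exp(-x**2) if x > 0 else 0.0

def marginal_y(y):
    """f_Y(y) = 2 y e^(-y^2) for y > 0."""
    return 2 * y * math.exp(-y**2) if y > 0 else 0.0

# Sample points (0.1, 0.1), (0.1, 0.2), ..., (3.0, 3.0).
points = [(0.1 * i, 0.1 * j) for i in range(1, 31) for j in range(1, 31)]
factorises = all(
    math.isclose(joint_pdf(x, y), marginal_x(x) * marginal_y(y), rel_tol=1e-12)
    for x, y in points
)
print(factorises)
```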

Example 6
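The joint probability density function of a two-dimensional random variable (X, Y) is
\[ f(x, y) = \begin{cases} kx(x - y), & 0 < x < 2,\ -x < y < x \\ 0, & \text{elsewhere} \end{cases} \]
Find (i) k (ii) fX(x) (iii) fY(y) (iv) f(y/x)
Solution
(i) Since f(x, y) is a probability density function,
\[
\begin{aligned}
\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,dx\,dy &= 1 \\
\int_{0}^{2}\int_{-x}^{x} kx(x - y)\,dy\,dx &= 1 \\
k \int_{0}^{2} x \left[xy - \frac{y^2}{2}\right]_{-x}^{x} dx &= 1 \\
k \int_{0}^{2} x \left[\left(x^2 - \frac{x^2}{2}\right) - \left(-x^2 - \frac{x^2}{2}\right)\right] dx &= 1 \\
k \int_{0}^{2} 2x^3\,dx &= 1 \\
k \left[\frac{2x^4}{4}\right]_{0}^{2} &= 1 \\
k(8) &= 1 \\
k &= \frac{1}{8}
\end{aligned}
\]
(ii) The region of integration is ΔOAB (Fig. 2.1).
In ΔOAB, along vertical strip RS,
Limits of y : y = –x to y = x
and x varies from x = 0 to x = 2.
\[
\begin{aligned}
f_X(x) &= \int_{-\infty}^{\infty} f(x, y)\,dy \\
&= \int_{-x}^{x} kx(x - y)\,dy \\
&= kx \left[xy - \frac{y^2}{2}\right]_{-x}^{x} \\
&= kx \left[\left(x^2 - \frac{x^2}{2}\right) - \left(-x^2 - \frac{x^2}{2}\right)\right] \\
&= kx\,(2x^2) \\
&= \frac{1}{8}(2x^3) \\
&= \frac{x^3}{4}, \quad 0 < x < 2
\end{aligned}
\]
(iii) For limits of x, ΔOAB is divided into two parts, ΔOBC and ΔOAC.
In ΔOBC, along horizontal strip PQ,
Limits of x: x = –y to x = 2 and y varies from y = –2 to y = 0.
In ΔOAC, along horizontal strip P′Q′,
Limits of x: x = y to x = 2 and y varies from y = 0 to y = 2.
\[ f_Y(y) = \int_{-\infty}^{\infty} f(x, y)\,dx = \begin{cases} \displaystyle\int_{-y}^{2} kx(x - y)\,dx, & -2 \le y \le 0 \\[2ex] \displaystyle\int_{y}^{2} kx(x - y)\,dx, & 0 \le y \le 2 \end{cases} \]
Now,
\[
\begin{aligned}
\int_{-y}^{2} kx(x - y)\,dx &= k \left[\frac{x^3}{3} - \frac{x^2 y}{2}\right]_{-y}^{2} \\
&= k \left[\left(\frac{8}{3} - 2y\right) - \left(-\frac{y^3}{3} - \frac{y^3}{2}\right)\right] \\
&= \frac{1}{8}\left(\frac{8}{3} - 2y + \frac{5y^3}{6}\right) \\
&= \frac{1}{3} - \frac{y}{4} + \frac{5}{48}\, y^3
\end{aligned}
\]
Also,
\[
\begin{aligned}
\int_{y}^{2} kx(x - y)\,dx &= k \left[\frac{x^3}{3} - \frac{x^2 y}{2}\right]_{y}^{2} \\
&= k \left[\left(\frac{8}{3} - 2y\right) - \left(\frac{y^3}{3} - \frac{y^3}{2}\right)\right] \\
&= \frac{1}{8}\left(\frac{8}{3} - 2y + \frac{y^3}{6}\right) \\
&= \frac{1}{3} - \frac{y}{4} + \frac{y^3}{48}
\end{aligned}
\]
Hence,
\[ f_Y(y) = \begin{cases} \dfrac{1}{3} - \dfrac{y}{4} + \dfrac{5}{48}\, y^3, & -2 \le y \le 0 \\[2ex] \dfrac{1}{3} - \dfrac{y}{4} + \dfrac{y^3}{48}, & 0 \le y \le 2 \end{cases} \]
(iv)
\[ f(y/x) = \frac{f(x, y)}{f_X(x)} = \frac{\frac{1}{8}\, x(x - y)}{\frac{x^3}{4}} = \frac{x - y}{2x^2}, \quad -x < y < x \]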
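
Since a marginal density must integrate to 1, the piecewise result for fY(y) can be checked numerically. The following is a minimal sketch using a midpoint rule; the helper name `integrate` and the number of subintervals are illustrative assumptions.

```python
def marginal_x(x):
    """f_X(x) = x^3 / 4 on 0 < x < 2, as obtained in part (ii)."""
    return x**3 / 4 if 0 < x < 2 else 0.0

def marginal_y(y):
    """Piecewise f_Y(y) obtained in part (iii)."""
    if -2 <= y <= 0:
        return 1 / 3 - y / 4 + 5 * y**3 / 48
    if 0 < y <= 2:
        return 1 / 3 - y / 4 + y**3 / 48
    return 0.0

def integrate(func, a, b, n=20000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

print(integrate(marginal_x, 0, 2))    # should be close to 1
print(integrate(marginal_y, -2, 2))   # should be close to 1
```

The left piece of fY contributes 3/4 and the right piece 1/4 of the total, which reflects the larger density values where y is negative.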

Example 7
The joint pdf of a two-dimensional random variable (X, Y) is given by
\[ f_{XY}(x, y) = \begin{cases} \dfrac{8}{9}\, xy, & 1 \le y \le 2,\ 1 \le x \le y \\ 0, & \text{otherwise} \end{cases} \]
Find the marginal density functions of X and Y.
Solution
The region of integration is ΔABC (Fig. 2.2).
In ΔABC, along vertical strip PQ,
limits of y: y = x to y = 2
and x varies from x = 1 to x = 2.
Marginal density function of X is
\[
\begin{aligned}
f_X(x) &= \int_{-\infty}^{\infty} f(x, y)\,dy \\
&= \int_{x}^{2} \frac{8}{9}\, xy\,dy \\
&= \frac{8}{9}\, x \left[\frac{y^2}{2}\right]_{x}^{2} \\
&= \frac{4}{9}\, x(4 - x^2), \quad 1 \le x \le 2
\end{aligned}
\]
In ΔABC, along horizontal strip P′Q′, limits of x : x = 1 to x = y and y varies from
y = 1 to y = 2.
Marginal density function of Y is
\[
\begin{aligned}
f_Y(y) &= \int_{-\infty}^{\infty} f(x, y)\,dx \\
&= \int_{1}^{y} \frac{8}{9}\, xy\,dx \\
&= \frac{8}{9}\, y \left[\frac{x^2}{2}\right]_{1}^{y} \\
&= \frac{4}{9}\, y(y^2 - 1), \quad 1 \le y \le 2
\end{aligned}
\]

Example 8
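If the joint distribution function of X and Y is given by
\[ F(x, y) = \begin{cases} (1 - e^{-x})(1 - e^{-y}), & x > 0,\ y > 0 \\ 0, & \text{otherwise} \end{cases} \]
(i) Find fX(x), fY(y). (ii) Are X and Y independent? (iii) Find P(1 < X < 3,
1 < Y < 2).
Solution
\[ F(x, y) = (1 - e^{-x})(1 - e^{-y}) \]
The joint pdf is given by
\[
\begin{aligned}
f(x, y) &= \frac{\partial^2 F}{\partial x\, \partial y} \\
&= \frac{\partial}{\partial x}\left(\frac{\partial F}{\partial y}\right) \\
&= \frac{\partial}{\partial x}\left[\frac{\partial}{\partial y}(1 - e^{-x})(1 - e^{-y})\right] \\
&= \frac{\partial}{\partial x}\left[(1 - e^{-x})(e^{-y})\right] \\
&= e^{-x} e^{-y} \\
&= e^{-(x+y)}, \quad x > 0,\ y > 0
\end{aligned}
\]
\[ \therefore\ f(x, y) = \begin{cases} e^{-(x+y)}, & x > 0,\ y > 0 \\ 0, & \text{otherwise} \end{cases} \]
(i)
\[
\begin{aligned}
f_X(x) &= \int_{-\infty}^{\infty} f(x, y)\,dy = \int_{0}^{\infty} e^{-(x+y)}\,dy \\
&= \left[-e^{-(x+y)}\right]_{0}^{\infty} = \left(-e^{-\infty} + e^{-x}\right) = e^{-x}, \quad x > 0
\end{aligned}
\]
\[
\begin{aligned}
f_Y(y) &= \int_{-\infty}^{\infty} f(x, y)\,dx = \int_{0}^{\infty} e^{-(x+y)}\,dx \\
&= \left[-e^{-(x+y)}\right]_{0}^{\infty} = \left(-e^{-\infty} + e^{-y}\right) = e^{-y}, \quad y > 0
\end{aligned}
\]
(ii)
\[ f_X(x) \cdot f_Y(y) = e^{-x} \cdot e^{-y} = e^{-(x+y)} = f(x, y), \quad x > 0,\ y > 0 \]
Hence, X and Y are independent.
(iii) Since X and Y are independent,
\[
\begin{aligned}
P(1 < X < 3,\ 1 < Y < 2) &= P(1 < X < 3) \cdot P(1 < Y < 2) \\
&= \int_{1}^{3} f_X(x)\,dx \cdot \int_{1}^{2} f_Y(y)\,dy \\
&= \int_{1}^{3} e^{-x}\,dx \cdot \int_{1}^{2} e^{-y}\,dy \\
&= \left[-e^{-x}\right]_{1}^{3} \cdot \left[-e^{-y}\right]_{1}^{2} \\
&= \left(-e^{-3} + e^{-1}\right) \cdot \left(-e^{-2} + e^{-1}\right) \\
&= e^{-5} - e^{-4} - e^{-3} + e^{-2}
\end{aligned}
\]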
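
To attach a number to part (iii), the product (e^{-1} − e^{-3})(e^{-1} − e^{-2}) can be evaluated directly. The helper name `prob_interval_exp` below is an illustrative choice; it encodes P(a < T < b) = e^{-a} − e^{-b} for a unit-rate exponential density.

```python
import math

def prob_interval_exp(a, b, rate=1.0):
    """P(a < T < b) for the density rate * e^(-rate * t), t > 0, with 0 <= a < b."""
    return math.exp(-rate * a) - math.exp(-rate * b)

# X and Y are independent with marginals e^(-x) and e^(-y), so the joint
# probability is the product of two one-dimensional probabilities.
p = prob_interval_exp(1, 3) * prob_interval_exp(1, 2)
print(round(p, 4))
```

The result is roughly 0.074, so only about 7% of the probability mass falls in the rectangle 1 < x < 3, 1 < y < 2.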

Example 9
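The joint probability density of two random variables is given by
\[ f(x, y) = \begin{cases} 15 e^{-3x - 5y}, & x > 0,\ y > 0 \\ 0, & \text{elsewhere} \end{cases} \]
Find (i) P(1 < X < 2, 0.2 < Y < 0.3) (ii) P(X < 2, Y > 0.2) (iii) marginal
probability density functions of X and Y.
Solution
(i)
\[
\begin{aligned}
P(1 < X < 2,\ 0.2 < Y < 0.3) &= \int_{0.2}^{0.3}\int_{1}^{2} f(x, y)\,dx\,dy \\
&= \int_{0.2}^{0.3}\int_{1}^{2} 15 e^{-3x - 5y}\,dx\,dy \\
&= 15 \int_{0.2}^{0.3} e^{-5y} \left[\int_{1}^{2} e^{-3x}\,dx\right] dy \\
&= 15 \int_{0.2}^{0.3} e^{-5y} \left[\frac{e^{-3x}}{-3}\right]_{1}^{2} dy \\
&= -5 \int_{0.2}^{0.3} e^{-5y} \left(e^{-6} - e^{-3}\right) dy \\
&= -5 \left(e^{-6} - e^{-3}\right) \left[\frac{e^{-5y}}{-5}\right]_{0.2}^{0.3} \\
&= \left(e^{-6} - e^{-3}\right)\left(e^{-1.5} - e^{-1.0}\right) \\
&\approx 6.85 \times 10^{-3}
\end{aligned}
\]
(ii)
\[
\begin{aligned}
P(X < 2,\ Y > 0.2) &= \int_{0.2}^{\infty}\int_{0}^{2} f(x, y)\,dx\,dy \\
&= \int_{0.2}^{\infty}\int_{0}^{2} 15 e^{-3x - 5y}\,dx\,dy \\
&= 15 \int_{0.2}^{\infty} \left[\int_{0}^{2} e^{-3x}\,dx\right] e^{-5y}\,dy \\
&= 15 \int_{0.2}^{\infty} \left[\frac{e^{-3x}}{-3}\right]_{0}^{2} e^{-5y}\,dy \\
&= -5 \int_{0.2}^{\infty} \left(e^{-6} - 1\right) e^{-5y}\,dy \\
&= -5 \left(e^{-6} - 1\right) \left[\frac{e^{-5y}}{-5}\right]_{0.2}^{\infty} \\
&= \left(e^{-6} - 1\right)\left(e^{-\infty} - e^{-1.0}\right) \\
&= \left(e^{-6} - 1\right)\left(-e^{-1.0}\right) \\
&= 0.367
\end{aligned}
\]
(iii) The region of integration is the first quadrant (Fig. 2.3).
Hence, x and y both vary from 0 to ∞.
\[
\begin{aligned}
f_X(x) &= \int_{-\infty}^{\infty} f(x, y)\,dy = \int_{0}^{\infty} 15 e^{-3x - 5y}\,dy \\
&= 15 e^{-3x} \left[\frac{e^{-5y}}{-5}\right]_{0}^{\infty} = -3 e^{-3x}\left(e^{-\infty} - e^{0}\right) = 3e^{-3x}, \quad x > 0
\end{aligned}
\]
\[
\begin{aligned}
f_Y(y) &= \int_{-\infty}^{\infty} f(x, y)\,dx = \int_{0}^{\infty} 15 e^{-3x - 5y}\,dx \\
&= 15 e^{-5y} \left[\frac{e^{-3x}}{-3}\right]_{0}^{\infty} = -5 e^{-5y}\left(e^{-\infty} - e^{0}\right) = 5e^{-5y}, \quad y > 0
\end{aligned}
\]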

Example 10
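The joint pdf of (X, Y) is given by
\[ f(x, y) = \frac{1}{4}\, e^{-|x| - |y|}, \quad -\infty < x < \infty,\ -\infty < y < \infty \]
(i) Are X and Y independent?
(ii) Find the probability that X ≤ 1 and Y < 0.
Solution
\[ |x| = \begin{cases} -x, & -\infty < x \le 0 \\ x, & 0 \le x < \infty \end{cases} \qquad \text{Similarly,}\quad |y| = \begin{cases} -y, & -\infty < y \le 0 \\ y, & 0 \le y < \infty \end{cases} \]
\[ \therefore\ f(x, y) = \frac{1}{4}\, e^{-|x| - |y|} = \begin{cases} \dfrac{1}{4}\, e^{x + y}, & -\infty < x \le 0,\ -\infty < y \le 0 \\[2ex] \dfrac{1}{4}\, e^{-x - y}, & 0 \le x < \infty,\ 0 \le y < \infty \end{cases} \]
with the analogous expressions in the remaining two quadrants.
(i)
\[
\begin{aligned}
f_X(x) &= \int_{-\infty}^{\infty} f(x, y)\,dy \\
&= \int_{-\infty}^{\infty} \frac{1}{4}\, e^{-|x| - |y|}\,dy \\
&= \frac{1}{4}\, e^{-|x|} \int_{-\infty}^{\infty} e^{-|y|}\,dy \\
&= \frac{1}{4}\, e^{-|x|} \left[\int_{-\infty}^{0} e^{y}\,dy + \int_{0}^{\infty} e^{-y}\,dy\right] \\
&= \frac{1}{4}\, e^{-|x|} \left\{\left[e^{y}\right]_{-\infty}^{0} + \left[-e^{-y}\right]_{0}^{\infty}\right\} \\
&= \frac{1}{4}\, e^{-|x|}\,(1 + 1) \\
&= \frac{1}{2}\, e^{-|x|}, \quad -\infty < x < \infty
\end{aligned}
\]
\[
\begin{aligned}
f_Y(y) &= \int_{-\infty}^{\infty} f(x, y)\,dx \\
&= \frac{1}{4}\, e^{-|y|} \int_{-\infty}^{\infty} e^{-|x|}\,dx \\
&= \frac{1}{4}\, e^{-|y|} \left[\int_{-\infty}^{0} e^{x}\,dx + \int_{0}^{\infty} e^{-x}\,dx\right] \\
&= \frac{1}{4}\, e^{-|y|}\,(1 + 1) \\
&= \frac{1}{2}\, e^{-|y|}, \quad -\infty < y < \infty
\end{aligned}
\]
\[
\begin{aligned}
f_X(x) \cdot f_Y(y) &= \frac{1}{2}\, e^{-|x|} \cdot \frac{1}{2}\, e^{-|y|} \\
&= \frac{1}{4}\, e^{-|x| - |y|}, \quad -\infty < x < \infty,\ -\infty < y < \infty \\
&= f(x, y)
\end{aligned}
\]
Hence, X and Y are independent.
(ii)
\[
\begin{aligned}
P(X \le 1,\ Y < 0) &= \int_{-\infty}^{0}\int_{-\infty}^{1} f(x, y)\,dx\,dy \\
&= \int_{-\infty}^{0}\int_{-\infty}^{1} \frac{1}{4}\, e^{-|x| - |y|}\,dx\,dy \\
&= \frac{1}{4} \int_{-\infty}^{0} e^{-|y|} \left[\int_{-\infty}^{0} e^{x}\,dx + \int_{0}^{1} e^{-x}\,dx\right] dy \\
&= \frac{1}{4} \int_{-\infty}^{0} e^{-|y|} \left\{\left[e^{x}\right]_{-\infty}^{0} + \left[-e^{-x}\right]_{0}^{1}\right\} dy \\
&= \frac{1}{4} \int_{-\infty}^{0} e^{-|y|} \left(1 - e^{-1} + 1\right) dy \\
&= \frac{1}{4}\left(2 - e^{-1}\right) \int_{-\infty}^{0} e^{y}\,dy \\
&= \frac{1}{4}\left(2 - e^{-1}\right) \left[e^{y}\right]_{-\infty}^{0} \\
&= \frac{1}{4}\left(2 - e^{-1}\right)
\end{aligned}
\]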

Example 11
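The joint pdf of (X, Y) is given by
\[ f(x, y) = \begin{cases} k e^{-x} \cos y, & 0 \le x \le 2,\ 0 \le y \le \dfrac{\pi}{2} \\ 0, & \text{otherwise} \end{cases} \]
Find (i) k (ii) $P\left(X + Y \ge \dfrac{\pi}{2}\right)$.
Solution
(i) Since f(x, y) is a pdf,
\[
\begin{aligned}
\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,dx\,dy &= 1 \\
\int_{0}^{\pi/2}\int_{0}^{2} k e^{-x} \cos y\,dx\,dy &= 1 \\
k \int_{0}^{\pi/2} \cos y \left[-e^{-x}\right]_{0}^{2} dy &= 1 \\
k \int_{0}^{\pi/2} \cos y \left(-e^{-2} + 1\right) dy &= 1 \\
k \left(1 - e^{-2}\right) \left[\sin y\right]_{0}^{\pi/2} &= 1 \\
k \left(1 - e^{-2}\right)(1) &= 1 \\
k &= \frac{1}{1 - e^{-2}}
\end{aligned}
\]
(ii)
\[ P\left(X + Y \ge \frac{\pi}{2}\right) = 1 - P\left(X + Y < \frac{\pi}{2}\right) \]
The region of integration $x + y < \dfrac{\pi}{2}$ is ΔOAB, with A(π/2, 0) and B(0, π/2) (Fig. 2.4).
In ΔOAB, along horizontal strip P′Q′,
Limits of x : x = 0 to x = π/2 − y
Limits of y : y = 0 to y = π/2
\[
\begin{aligned}
P\left(X + Y < \frac{\pi}{2}\right) &= \int_{0}^{\pi/2}\int_{0}^{\pi/2 - y} k e^{-x} \cos y\,dx\,dy \\
&= k \int_{0}^{\pi/2} \cos y \left[-e^{-x}\right]_{0}^{\pi/2 - y} dy \\
&= k \int_{0}^{\pi/2} \cos y \left[-e^{-\left(\frac{\pi}{2} - y\right)} + e^{0}\right] dy \\
&= k \int_{0}^{\pi/2} \cos y \left[-e^{-\pi/2} e^{y} + 1\right] dy \\
&= k \left[-e^{-\pi/2} \int_{0}^{\pi/2} e^{y} \cos y\,dy + \int_{0}^{\pi/2} \cos y\,dy\right] \\
&= k \left\{-e^{-\pi/2} \left[\frac{e^{y}}{1 + 1}(\cos y + \sin y)\right]_{0}^{\pi/2} + \left[\sin y\right]_{0}^{\pi/2}\right\} \\
&= k \left[-e^{-\pi/2} \left\{\frac{e^{\pi/2}}{2}\left(\cos\frac{\pi}{2} + \sin\frac{\pi}{2}\right) - \frac{1}{2}\right\} + \sin\frac{\pi}{2}\right] \\
&= k \left[-e^{-\pi/2} \left\{\frac{e^{\pi/2}}{2}(1) - \frac{1}{2}\right\} + 1\right] \\
&= k \left(-\frac{1}{2} + \frac{e^{-\pi/2}}{2} + 1\right) \\
&= \frac{k}{2}\left(1 + e^{-\pi/2}\right)
\end{aligned}
\]
\[
\begin{aligned}
P\left(X + Y \ge \frac{\pi}{2}\right) &= 1 - \frac{k}{2}\left(1 + e^{-\pi/2}\right) \\
&= 1 - \frac{1 + e^{-\pi/2}}{2\left(1 - e^{-2}\right)}
\end{aligned}
\]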
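
The closed form can be checked against a one-dimensional numerical integration of the expression left after the inner x-integral, namely k cos y (1 − e^{−(π/2 − y)}) over 0 ≤ y ≤ π/2. The sketch below assumes a midpoint rule; the function names and the step count are illustrative choices.

```python
import math

K = 1 / (1 - math.exp(-2))  # normalising constant from part (i)

def inner_integrated(y):
    """k cos(y) (1 - e^(-(pi/2 - y))): the integrand after integrating over x."""
    return K * math.cos(y) * (1 - math.exp(-(math.pi / 2 - y)))

def midpoint(func, a, b, n=20000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

p_below = midpoint(inner_integrated, 0, math.pi / 2)   # P(X + Y < pi/2)
closed_form = 1 - (1 + math.exp(-math.pi / 2)) / (2 * (1 - math.exp(-2)))
print(1 - p_below, closed_form)
```

Both values come out at roughly 0.30, so about 30% of the probability lies on or above the line x + y = π/2.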

Example 12
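The joint pdf of a two-dimensional random variable (X, Y) is given by
\[ f(x, y) = \begin{cases} \dfrac{1}{8}(6 - x - y), & 0 < x < 2,\ 2 < y < 4 \\ 0, & \text{otherwise} \end{cases} \]
Find (i) P(X < 1, Y < 3) (ii) P(X < 1/Y < 3).
Solution
(i)
\[
\begin{aligned}
P(X < 1,\ Y < 3) &= \int_{2}^{3}\int_{0}^{1} f(x, y)\,dx\,dy \\
&= \int_{2}^{3}\int_{0}^{1} \frac{1}{8}(6 - x - y)\,dx\,dy \\
&= \frac{1}{8} \int_{2}^{3} \left[6x - \frac{x^2}{2} - xy\right]_{0}^{1} dy \\
&= \frac{1}{8} \int_{2}^{3} \left(6 - \frac{1}{2} - y\right) dy \\
&= \frac{1}{8} \int_{2}^{3} \left(\frac{11}{2} - y\right) dy \\
&= \frac{1}{8} \left[\frac{11}{2}\, y - \frac{y^2}{2}\right]_{2}^{3} \\
&= \frac{1}{8} \left[\left(\frac{33}{2} - \frac{9}{2}\right) - (11 - 2)\right] \\
&= \frac{3}{8}
\end{aligned}
\]
(ii)
\[ P(X < 1/Y < 3) = \frac{P(X < 1,\ Y < 3)}{P(Y < 3)} \qquad \ldots(1) \]
\[
\begin{aligned}
P(Y < 3) &= \int_{2}^{3}\int_{0}^{2} f(x, y)\,dx\,dy \\
&= \int_{2}^{3}\int_{0}^{2} \frac{1}{8}(6 - x - y)\,dx\,dy \\
&= \frac{1}{8} \int_{2}^{3} \left[6x - \frac{x^2}{2} - xy\right]_{0}^{2} dy \\
&= \frac{1}{8} \int_{2}^{3} (12 - 2 - 2y)\,dy \\
&= \frac{1}{8} \int_{2}^{3} (10 - 2y)\,dy \\
&= \frac{1}{8} \left[10y - y^2\right]_{2}^{3} \\
&= \frac{1}{8} \left[(30 - 9) - (20 - 4)\right] \\
&= \frac{5}{8}
\end{aligned}
\]
Substituting in Eq. (1),
\[ P(X < 1/Y < 3) = \frac{3/8}{5/8} = \frac{3}{5} \]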

Example 13
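The joint pdf of a two-dimensional random variable (X, Y) is given by
\[ f_{XY}(x, y) = \begin{cases} xy^2 + \dfrac{x^2}{8}, & 0 < x < 2,\ 0 < y < 1 \\ 0, & \text{otherwise} \end{cases} \]
Find (i) P(X > 1) (ii) P(Y < 1/2) (iii) P(X > 1 / Y < 1/2)
(iv) P(Y < 1/2 / X > 1) (v) P(X < Y) (vi) P(X + Y ≤ 1)
Solution
(i) (Fig. 2.5)
\[
\begin{aligned}
P(X > 1) &= \int_{0}^{1}\int_{1}^{2} f(x, y)\,dx\,dy \\
&= \int_{0}^{1}\int_{1}^{2} \left(xy^2 + \frac{x^2}{8}\right) dx\,dy \\
&= \int_{0}^{1} \left[\frac{x^2 y^2}{2} + \frac{x^3}{24}\right]_{1}^{2} dy \\
&= \int_{0}^{1} \left[\left(2y^2 + \frac{1}{3}\right) - \left(\frac{y^2}{2} + \frac{1}{24}\right)\right] dy \\
&= \int_{0}^{1} \left(\frac{3y^2}{2} + \frac{7}{24}\right) dy \\
&= \left[\frac{y^3}{2} + \frac{7y}{24}\right]_{0}^{1} \\
&= \frac{1}{2} + \frac{7}{24} = \frac{19}{24}
\end{aligned}
\]
(ii) (Fig. 2.6)
\[
\begin{aligned}
P\left(Y < \frac{1}{2}\right) &= \int_{0}^{1/2}\int_{0}^{2} f(x, y)\,dx\,dy \\
&= \int_{0}^{1/2}\int_{0}^{2} \left(xy^2 + \frac{x^2}{8}\right) dx\,dy \\
&= \int_{0}^{1/2} \left[\frac{x^2 y^2}{2} + \frac{x^3}{24}\right]_{0}^{2} dy \\
&= \int_{0}^{1/2} \left(2y^2 + \frac{1}{3}\right) dy \\
&= \left[\frac{2y^3}{3} + \frac{1}{3}\, y\right]_{0}^{1/2} \\
&= \frac{1}{12} + \frac{1}{6} = \frac{1}{4}
\end{aligned}
\]
(iii) (Fig. 2.7)
\[
\begin{aligned}
P\left(X > 1,\ Y < \frac{1}{2}\right) &= \int_{0}^{1/2}\int_{1}^{2} f(x, y)\,dx\,dy \\
&= \int_{0}^{1/2}\int_{1}^{2} \left(xy^2 + \frac{x^2}{8}\right) dx\,dy \\
&= \int_{0}^{1/2} \left[\frac{x^2 y^2}{2} + \frac{x^3}{24}\right]_{1}^{2} dy \\
&= \int_{0}^{1/2} \left[\left(2y^2 + \frac{1}{3}\right) - \left(\frac{y^2}{2} + \frac{1}{24}\right)\right] dy \\
&= \int_{0}^{1/2} \left(\frac{3y^2}{2} + \frac{7}{24}\right) dy \\
&= \left[\frac{y^3}{2} + \frac{7y}{24}\right]_{0}^{1/2} \\
&= \frac{1}{16} + \frac{7}{48} = \frac{5}{24}
\end{aligned}
\]
\[ P\left(X > 1 / Y < \frac{1}{2}\right) = \frac{P\left(X > 1,\ Y < \frac{1}{2}\right)}{P\left(Y < \frac{1}{2}\right)} = \frac{5/24}{1/4} = \frac{5}{6} \]
(iv)
\[ P\left(Y < \frac{1}{2} / X > 1\right) = \frac{P\left(X > 1,\ Y < \frac{1}{2}\right)}{P(X > 1)} = \frac{5/24}{19/24} = \frac{5}{19} \]
(v)
\[
\begin{aligned}
P(X < Y) &= \int_{0}^{1}\int_{0}^{y} f(x, y)\,dx\,dy \\
&= \int_{0}^{1}\int_{0}^{y} \left(xy^2 + \frac{x^2}{8}\right) dx\,dy \\
&= \int_{0}^{1} \left[\frac{x^2 y^2}{2} + \frac{x^3}{24}\right]_{0}^{y} dy \\
&= \int_{0}^{1} \left(\frac{y^4}{2} + \frac{y^3}{24}\right) dy \\
&= \left[\frac{y^5}{10} + \frac{y^4}{96}\right]_{0}^{1} \\
&= \frac{1}{10} + \frac{1}{96} = \frac{53}{480}
\end{aligned}
\]
(vi)
\[
\begin{aligned}
P(X + Y \le 1) &= \int_{0}^{1}\int_{0}^{1-y} f(x, y)\,dx\,dy \\
&= \int_{0}^{1}\int_{0}^{1-y} \left(xy^2 + \frac{x^2}{8}\right) dx\,dy \\
&= \int_{0}^{1} \left[\frac{x^2 y^2}{2} + \frac{x^3}{24}\right]_{0}^{1-y} dy \\
&= \int_{0}^{1} \left\{\frac{(1 - y)^2 y^2}{2} + \frac{(1 - y)^3}{24}\right\} dy \\
&= \int_{0}^{1} \left\{\frac{(1 - 2y + y^2)\, y^2}{2} + \frac{(1 - y)^3}{24}\right\} dy \\
&= \int_{0}^{1} \left\{\frac{1}{2}\left(y^2 - 2y^3 + y^4\right) + \frac{1}{24}(1 - y)^3\right\} dy \\
&= \left[\frac{1}{2}\left(\frac{y^3}{3} - \frac{y^4}{2} + \frac{y^5}{5}\right) + \frac{1}{24} \cdot \frac{(1 - y)^4}{(-4)}\right]_{0}^{1} \\
&= \frac{1}{2}\left(\frac{1}{3} - \frac{1}{2} + \frac{1}{5}\right) + \frac{1}{24} \cdot \frac{1}{4} \\
&= \frac{13}{480}
\end{aligned}
\]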
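
With six related probabilities in one example, a numerical cross-check is a cheap way to catch a wrong limit. The sketch below estimates P((X, Y) ∈ event) with a midpoint rule over the support 0 < x < 2, 0 < y < 1; the function `probability`, its `event` predicate argument and the grid size are illustrative assumptions.

```python
def joint_pdf(x, y):
    """Joint pdf of Example 13: x y^2 + x^2 / 8 on 0 < x < 2, 0 < y < 1."""
    return x * y**2 + x**2 / 8 if 0 < x < 2 and 0 < y < 1 else 0.0

def probability(event, n=600):
    """Midpoint-rule estimate of P((X, Y) in event) over the support."""
    hx, hy = 2 / n, 1 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * hx
        for j in range(n):
            y = (j + 0.5) * hy
            if event(x, y):
                total += joint_pdf(x, y)
    return total * hx * hy

p_x_gt_1 = probability(lambda x, y: x > 1)                 # part (i)
p_y_lt_half = probability(lambda x, y: y < 0.5)            # part (ii)
p_both = probability(lambda x, y: x > 1 and y < 0.5)       # joint event in (iii)
print(p_x_gt_1, p_y_lt_half, p_both, p_both / p_y_lt_half)
```

The printed values should be close to 19/24, 1/4, 5/24 and 5/6 respectively. Events with slanted boundaries, such as X < Y, can be passed the same way but converge more slowly on a rectangular grid.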

Example 14
The joint pdf of a two-dimensional random variable (X, Y) is given by
\[ f(x, y) = \frac{1}{2\pi a^2}\, e^{-\frac{x^2 + y^2}{2a^2}}, \quad -\infty < x, y < \infty \]
Find $P(X^2 + Y^2 \le 4)$.
Solution
\[
\begin{aligned}
P(X^2 + Y^2 \le 4) &= \iint_{x^2 + y^2 \le 4} f(x, y)\,dx\,dy \\
&= \iint_{x^2 + y^2 \le 4} \frac{1}{2\pi a^2}\, e^{-\frac{x^2 + y^2}{2a^2}}\,dx\,dy
\end{aligned}
\]
The region of integration is the interior of the circle x² + y² = 4 (Fig. 2.8).
Converting to polar coordinates by putting
x = r cos θ, y = r sin θ, dx dy = r dr dθ, the equation of the
circle x² + y² = 4 reduces to r = 2.
In the region, along elementary radial strip OA,
Limits of r : r = 0 to r = 2
and in the region
Limits of θ : θ = 0 to θ = 2π
\[
\begin{aligned}
P(X^2 + Y^2 \le 4) &= \int_{0}^{2\pi}\int_{0}^{2} \frac{1}{2\pi a^2}\, e^{-\frac{r^2}{2a^2}} \cdot r\,dr\,d\theta \\
&= \frac{1}{2\pi} \int_{0}^{2\pi}\int_{0}^{2} -e^{-\frac{r^2}{2a^2}} \left(-\frac{r}{a^2}\right) dr\,d\theta \\
&= \frac{1}{2\pi} \int_{0}^{2\pi} \left[-e^{-\frac{r^2}{2a^2}}\right]_{0}^{2} d\theta \qquad \left[\because \int e^{f(x)} f'(x)\,dx = e^{f(x)}\right] \\
&= \frac{1}{2\pi} \int_{0}^{2\pi} \left(-e^{-\frac{2}{a^2}} + 1\right) d\theta \\
&= \frac{1}{2\pi} \left(1 - e^{-\frac{2}{a^2}}\right) \left[\theta\right]_{0}^{2\pi} \\
&= \frac{1}{2\pi} \left(1 - e^{-\frac{2}{a^2}}\right)(2\pi) \\
&= 1 - e^{-\frac{2}{a^2}}
\end{aligned}
\]

Example 15
A gun is aimed at a certain point (origin of the coordinate system).
Because of random factors, the actual hit point can be any point
(X, Y) in a circle of radius a about the origin. Assume that the joint
density of X and Y is constant in this circle and is given by
\[ f(x, y) = \begin{cases} c, & x^2 + y^2 \le a^2 \\ 0, & \text{otherwise} \end{cases} \]
Find (i) c (ii) fX(x).
Solution
(i) Since f(x, y) is a probability density function,
\[
\begin{aligned}
\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,dx\,dy &= 1 \\
\iint_{x^2 + y^2 \le a^2} c\,dx\,dy &= 1 \\
c \iint_{x^2 + y^2 \le a^2} dx\,dy &= 1 \\
c\,(\text{area of circle } x^2 + y^2 = a^2) &= 1 \\
c\,(\pi a^2) &= 1 \\
c &= \frac{1}{\pi a^2}
\end{aligned}
\]
(ii) The region of integration is the interior of the circle x² + y² = a² (Fig. 2.9).
In the region, along the vertical strip AB, limits of y : $y = -\sqrt{a^2 - x^2}$ to $y = \sqrt{a^2 - x^2}$
and x varies from x = –a to x = a.
\[
\begin{aligned}
f_X(x) &= \int_{-\infty}^{\infty} f(x, y)\,dy \\
&= \int_{-\sqrt{a^2 - x^2}}^{\sqrt{a^2 - x^2}} c\,dy \\
&= c \left[y\right]_{-\sqrt{a^2 - x^2}}^{\sqrt{a^2 - x^2}} \\
&= c \left(\sqrt{a^2 - x^2} + \sqrt{a^2 - x^2}\right) \\
&= \frac{1}{\pi a^2} \left(2\sqrt{a^2 - x^2}\right) \\
&= \frac{2}{\pi a^2} \sqrt{a^2 - x^2}, \quad -a \le x \le a
\end{aligned}
\]

Exercise 2.4
1. The joint pdf of a two-dimensional random variable (X, Y) is given by
f(x, y) = 2, 0 < y < x < 1
        = 0, otherwise

Find (i) the marginal density functions (ii) the conditional density functions.

[Ans.: (i) fX(x) = 2x, 0 < x < 1
                 = 0, otherwise
           fY(y) = 2(1 − y), 0 < y < 1
                 = 0, otherwise
      (ii) f(y/x) = 1/x, 0 < y < x < 1
           f(x/y) = 1/(1 − y), 0 < y < x < 1]
2. The joint pdf of a two-dimensional random variable (X, Y) is given by
f(x, y) = 1/(ab), 0 < x < a, 0 < y < b
        = 0, otherwise

Find the joint probability distribution function and marginal density
functions. Are X and Y independent?
[Ans.: F(x, y) = xy/(ab), 0 < x ≤ a, 0 < y ≤ b
               = 0, otherwise
       fX(x) = 1/a, fY(y) = 1/b, Yes]
3. The joint pdf of (X, Y) is given by f(x, y) = k, 0 ≤ x < y ≤ 2.
Find (i) k (ii) fX(x), fY(y) (iii) f(y/x), f(x/y)
[Ans.: (i) 1/2, (ii) fX(x) = (1/2)(2 − x), 0 ≤ x ≤ 2; fY(y) = (1/2) y, 0 ≤ y ≤ 2
       (iii) f(y/x) = 1/(2 − x), x < y < 2; f(x/y) = 1/y, 0 < x < y]

4. The joint pdf of (X, Y) is given by
f(x, y) = k(x³y + xy³), 0 ≤ x ≤ 2, 0 ≤ y ≤ 2
Find (i) k (ii) fX(x), fY(y) (iii) f(y/x), f(x/y)
[Ans.: (i) k = 1/16, (ii) fX(x) = (1/8)(x³ + 2x), 0 ≤ x ≤ 2; fY(y) = (1/8)(y³ + 2y), 0 ≤ y ≤ 2
       (iii) f(y/x) = y(x² + y²)/[2(x² + 2)], 0 ≤ y ≤ 2; f(x/y) = x(x² + y²)/[2(y² + 2)], 0 ≤ x ≤ 2]
5. The joint pdf of (X, Y) is given by
f(x, y) = (1/3)(3x² + xy), 0 < x ≤ 1, 0 < y ≤ 2
        = 0, otherwise

Find P(X + Y ≥ 1).
[Ans.: 65/72]
6. The joint pdf of (X, Y) is given by
f(x, y) = k(6 – x – y), 0 < x < 2, 2 < y < 4
Find (i) k (ii) P(X < 1, Y < 3), (iii) P(X + Y < 3), (iv) P(X < 1/Y < 3)
[Ans.: (i) 1/8, (ii) 3/8, (iii) 5/24, (iv) 3/5]
7. The joint pdf of (X, Y) is given by f(x, y) = e^{−y}, x > 0, y > x
                                              = 0, otherwise
Find (i) P(X > 1 / Y < 5) (ii) marginal distributions of X and Y.
[Ans.: (i) (e⁴ − 5)/(e⁵ − 6), (ii) fX(x) = e^{−x}, x > 0; fY(y) = y e^{−y}, y > 0]
8. The joint pdf of (X, Y) is given by f(x, y) = (1/12) e^{−x/4 − y/3}, x ≥ 0, y ≥ 0
                                              = 0, otherwise
(i) Find conditional density functions of X and Y.
(ii) Are X, Y independent?
[Ans.: (i) f(y/x) = (1/3) e^{−y/3}, y ≥ 0; f(x/y) = (1/4) e^{−x/4}, x ≥ 0, (ii) Yes]

9. The joint pdf of (X, Y) is given by f(x, y) = 2/(1 + x + y)³, x > 0, y > 0
                                              = 0, otherwise
Find (i) F(x, y) (ii) fX(x) (iii) f(y/x).
[Ans.: (i) F(x, y) = 1 − 1/(1 + x) + 1/(1 + x + y) − 1/(1 + y)
       (ii) fX(x) = 1/(1 + x)², x > 0
                  = 0, otherwise
       (iii) f(y/x) = 2(1 + x)²/(1 + x + y)³]

10. The joint pdf of (X, Y) is given by f(x, y) = x² + xy/3, 0 < x < 1, 0 < y < 2
                                               = 0, otherwise

Find (i) P(X > 1/2) (ii) P(Y < X) (iii) P(Y < 1/2 / X < 1/2)

[Ans.: (i) 5/6, (ii) 7/24, (iii) 5/32]

11. The joint pdf of (X, Y) is given by f(x, y) = (1/4)(1 + xy), |x| < 1, |y| < 1
                                               = 0, otherwise
Show that X and Y are not independent. (Here fX(x) = 1/2 and fY(y) = 1/2, so
fX(x) fY(y) = 1/4 ≠ f(x, y) whenever xy ≠ 0.)
12. The joint pdf of (X, Y) is given by f(x, y) = A e^{−x − 2y}. Show that X and Y
are independent.
CHAPTER
3
Basic Statistics

Chapter Outline
3.1 Introduction
3.2 Measures of Central Tendency
3.3 Measures of Dispersion
3.4 Moments
3.5 Skewness
3.6 Kurtosis
3.7 Measures of Statistics for Continuous Random Variables
3.8 Expected Values of Two Dimensional Random Variables
3.9 Bounds on Probabilities
3.10 Chebyshev’s Inequality

3.1 Introduction

A discrete random variable is described by its probability function or probability mass
function. Similarly, a continuous random variable is described by its probability density
function. Instead of a function, a more compact description can be made by a few
parameters, known as statistical measures, that are representative of the distribution. In
descriptive statistics, statistical measures are used to summarize a set of observations
in order to communicate the information as simply as possible. The observations are
described by
(i) a measure of location or central tendency, such as the arithmetic mean
(ii) a measure of statistical dispersion, such as the standard deviation
(iii) a measure of the shape of the distribution, such as skewness or kurtosis
(iv) if more than one variable is measured, a measure of statistical dependence,
such as the correlation coefficient.

3.2 Measures of Central Tendency

In statistics, a central tendency or measure of central tendency is a central or typical
value of a probability distribution. It is also called a center or location of the distribution.
Measures of central tendency are often called averages. An average is a single value
which can be taken as a representative of the whole distribution. There are five types
of measures of central tendency or averages which are commonly used.
(i) Arithmetic mean or mean or expectation
(ii) Median
(iii) Mode
(iv) Geometric mean
(v) Harmonic mean
1. Mean The mean or average value (μ) of the probability distribution of a discrete
random variable X is called the expectation and is denoted by E(X).
\[ \mu = E(X) = \sum_{i \ge 1} x_i\, p(x_i) = \sum x\, p(x) \]
where p(x) is the probability mass function of the discrete random variable X.
Expectation of any function φ(x) of a random variable X is given by
\[ E[\varphi(X)] = \sum_{i \ge 1} \varphi(x_i)\, p(x_i) = \sum \varphi(x)\, p(x) \]
Some important results on expectation:
(i) E(X + k) = E(X) + k
(ii) E(aX ± b) = aE(X) ± b
(iii) E(X + Y) = E(X) + E(Y), provided E(X) and E(Y) exist.
(iv) E(XY) = E(X) E(Y) if X and Y are two independent random variables.
2. Median The median is the point which divides the entire distribution into two
equal parts. If X is a random variable, the value X = x for which the cumulative
distribution function F(x) = 1/2 is called the median of X. For a discrete random variable
X, if there exists no x such that F(x) = 1/2, then the median M of the probability distribution
is given by
\[ M = \frac{1}{2}\left(x_k + x_{k+1}\right) \]
where F(x_k) < 1/2 and F(x_{k+1}) > 1/2, and x_k and x_{k+1} are two consecutive values of X.

3. Mode The mode is the value of the discrete random variable X for which the
probability is maximum.

4. Geometric Mean The geometric mean G of a random variable X is defined by
log G = E(log X). The geometric mean of the probability distribution of a discrete
random variable X is given by
\[ \log G = \sum_{i \ge 1} (\log x_i)\, p(x_i) = \sum (\log x)\, p(x) \]
where p(x) is the probability mass function of the discrete random variable X.
5. Harmonic Mean The harmonic mean H of a random variable X is defined by
1/H = E(1/X). The harmonic mean of the probability distribution of a discrete random
variable X is given by
\[ \frac{1}{H} = \sum_{i \ge 1} \frac{1}{x_i}\, p(x_i) = \sum \frac{1}{x}\, p(x) \]
where p(x) is the probability mass function of the discrete random variable X.
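
All three means above are expectations of a function of X, so one small helper covers them. The sketch below is illustrative only: the pmf is a hypothetical fair six-sided die (not a distribution from the text), and the names `expectation`, `die`, `geometric_mean` and `harmonic_mean` are assumptions made for the example.

```python
import math
from fractions import Fraction

def expectation(pmf, func=lambda x: x):
    """E[func(X)] for a discrete pmf given as {value: probability}."""
    return sum(func(x) * p for x, p in pmf.items())

# Hypothetical pmf, used only for illustration: a fair six-sided die.
die = {x: Fraction(1, 6) for x in range(1, 7)}

mean = expectation(die)                                          # E(X)
geometric_mean = math.exp(expectation(die, math.log))            # log G = E(log X)
harmonic_mean = 1 / expectation(die, lambda x: Fraction(1, x))   # 1/H = E(1/X)

print(mean, geometric_mean, harmonic_mean)
print(expectation(die, lambda x: 3 * x + 2) == 3 * mean + 2)     # E(aX + b) = aE(X) + b
```

For a positive random variable the three values are ordered as harmonic mean ≤ geometric mean ≤ arithmetic mean, which the printed numbers illustrate.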

3.3 Measures of Dispersion

A measure of central tendency is a representative value of the random variable. But it
is important to know how the values are clustered around or scattered away from the
measure of central tendency. The property of the random variable or its distribution
by which its values are clustered around or scattered away from the central value is
called dispersion. There are three types of measures of dispersion which are commonly
used.
(i) Quartile Deviation
(ii) Mean Deviation
(iii) Standard Deviation
1. Quartile Deviation Quartile deviation or semi-interquartile range of the
probability distribution of a discrete random variable X is given by
\[ Q = \frac{1}{2}\left(Q_3 - Q_1\right) \]
where Q1 and Q3 are the first and third quartiles of the distribution respectively.
2. Mean Deviation Mean deviation of the probability distribution of a discrete
random variable X is given by
\[
\begin{aligned}
\text{MD} &= E\{|X - \mu|\} \\
&= \sum_{i \ge 1} |x_i - \mu|\, p(x_i) \\
&= \sum |x - \mu|\, p(x)
\end{aligned}
\]
where p(x) is the probability mass function of the discrete random variable X.

3. Standard Deviation Standard deviation is the positive square root of the
arithmetic mean of the squares of the deviations of the given values from their arithmetic
mean. It is denoted by the Greek letter σ.
\[
\begin{aligned}
\text{SD} = \sigma &= \sqrt{\sum_{i \ge 1} x_i^2\, p(x_i) - \mu^2} \\
&= \sqrt{E(X^2) - \mu^2} \\
&= \sqrt{E(X^2) - [E(X)]^2}
\end{aligned}
\]
Variance Variance characterizes the variability in the distributions since two
distributions with same mean can still have different dispersion of data about their means.
Variance of the probability distribution of a discrete random variable X is given by
\[
\begin{aligned}
\text{Var}(X) = \sigma^2 &= E(X - \mu)^2 \\
&= E(X^2 - 2X\mu + \mu^2) \\
&= E(X^2) - E(2X\mu) + E(\mu^2) \\
&= E(X^2) - 2\mu E(X) + \mu^2 \qquad [\because E(\text{constant}) = \text{constant}] \\
&= E(X^2) - 2\mu\mu + \mu^2 \\
&= E(X^2) - \mu^2 \\
&= E(X^2) - [E(X)]^2
\end{aligned}
\]
Some important results on variance:
(i) Var (k) = 0
(ii) Var (kX) = k2 Var (X)
(iii) Var (X + k) = Var (X)
(iv) Var (aX + b) = a2 Var(X)
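
Result (iv) can be illustrated by building the pmf of aX + b explicitly and comparing variances. The sketch below is a minimal example under stated assumptions: the pmf is an illustrative five-point distribution, and the helper names `variance` and `affine` are not from the text.

```python
from fractions import Fraction

def variance(pmf):
    """Var(X) = E(X^2) - [E(X)]^2 for a pmf given as {value: probability}."""
    mean = sum(x * p for x, p in pmf.items())
    return sum(x * x * p for x, p in pmf.items()) - mean**2

def affine(pmf, a, b):
    """pmf of aX + b, assuming a != 0 so distinct x map to distinct values."""
    return {a * x + b: p for x, p in pmf.items()}

# Illustrative pmf on {-2, -1, 0, 1, 2}; the probabilities sum to 1.
pmf = {
    -2: Fraction(1, 5), -1: Fraction(1, 10), 0: Fraction(3, 10),
    1: Fraction(3, 10), 2: Fraction(1, 10),
}

print(variance(pmf))                  # Var(X)
print(variance(affine(pmf, 2, -3)))   # Var(2X - 3)
print(4 * variance(pmf))              # a^2 Var(X) with a = 2
```

The shift b drops out because it moves every value and the mean by the same amount, while the scale a is squared because deviations are squared.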

Example 1
A random variable X has the following distribution:

X            1       2       3       4       5       6
P(X = x)    1/36    3/36    5/36    7/36    9/36    11/36

Find (i) mean, (ii) variance, and (iii) P(1 < X < 6).
Solution
(i) Mean = μ = Σ x p(x)
\[ = 1\left(\frac{1}{36}\right) + 2\left(\frac{3}{36}\right) + 3\left(\frac{5}{36}\right) + 4\left(\frac{7}{36}\right) + 5\left(\frac{9}{36}\right) + 6\left(\frac{11}{36}\right) = \frac{161}{36} = 4.47 \]
(ii) Variance = σ² = Σ x² p(x) – μ²
\[
\begin{aligned}
&= 1\left(\frac{1}{36}\right) + 4\left(\frac{3}{36}\right) + 9\left(\frac{5}{36}\right) + 16\left(\frac{7}{36}\right) + 25\left(\frac{9}{36}\right) + 36\left(\frac{11}{36}\right) - (4.47)^2 \\
&= \frac{791}{36} - 19.98 \\
&= 1.99
\end{aligned}
\]
(Carrying the exact mean 161/36 instead of the rounded value 4.47 gives σ² = 2555/1296 ≈ 1.97.)
(iii) P(1 < X < 6) = P(X = 2) + P(X = 3) + P(X = 4) + P(X = 5)
\[ = \frac{3}{36} + \frac{5}{36} + \frac{7}{36} + \frac{9}{36} = \frac{24}{36} = 0.67 \]

Example 2
The probability distribution of a random variable X is given below. Find
(i) E(X), (ii) Var(X), (iii) E(2X – 3), and (iv) Var (2X – 3)

X           –2     –1      0      1      2
P(X = x)    0.2    0.1    0.3    0.3    0.1

Solution
(i) E(X) = Σ x p(x)
= –2(0.2) – 1(0.1) + 0(0.3) + 1(0.3) + 2(0.1)
= 0
(ii) Var(X) = Σ x² p(x) – [E(X)]²
= 4(0.2) + 1(0.1) + 0 + 1(0.3) + 4(0.1) – 0
= 1.6
(iii) E(2X – 3) = 2E(X) – 3
= 2(0) – 3
= –3
(iv) Var (2X – 3) = (2)² Var (X)
= 4(1.6)
= 6.4

Example 3
Mean and standard deviation of a random variable X are 5 and 4
respectively. Find E(X²) and standard deviation of (5 – 3X).
Solution
E(X) = μ = 5
SD = σ = 4
∴ Var(X) = σ² = 16
Var(X) = E(X²) – [E(X)]²
16 = E(X²) – (5)²
∴ E(X²) = 41
Var (5 – 3X) = Var (5) + (–3)² Var (X)
= 0 + 9(16)
= 144
SD (5 – 3X) = √Var (5 – 3X)
= √144
= 12

Example 4
A machine produces an average of 500 items during the first week of the
month and an average of 400 items during the last week of the month,
the probability for these being 0.68 and 0.32 respectively. Determine the
expected value of the production. [Summer 2015]
Solution
Let X be the random variable which denotes the items produced by the machine. The
probability distribution is

X           500     400
P(X = x)    0.68    0.32

Expected value of the production E(X) = Σ x p(x)
= 500(0.68) + 400(0.32)
= 468

Example 5
The monthly demand for Allwyn watches is known to have the following
probability distribution:

Demand (x)          1      2      3      4      5      6      7      8
Probability p(x)   0.08   0.12   0.19   0.24   0.16   0.10   0.07   0.04

Find the expected demand for watches. Also, compute the variance.
Solution
E(X) = Σ x p(x)
= 1(0.08) + 2(0.12) + 3(0.19) + 4(0.24) + 5(0.16)
+ 6(0.10) + 7(0.07) + 8(0.04)
= 4.06
Var(X) = E(X²) – [E(X)]²
= Σ x² p(x) – [E(X)]²
= 1(0.08) + 4(0.12) + 9(0.19) + 16(0.24) + 25(0.16)
+ 36(0.10) + 49(0.07) + 64(0.04) – (4.06)²
= 19.70 – 16.48
= 3.22

Example 6
A discrete random variable has the probability mass function given
below:

X           –2     –1      0      1      2      3
P(X = x)    0.2     k     0.1    2k     0.1    2k

Find k, mean, and variance.

Solution
Since P(X = x) is a probability mass function,
Σ P(X = x) = 1
0.2 + k + 0.1 + 2k + 0.1 + 2k = 1
5k + 0.4 = 1
5k = 0.6
k = 0.6/5 = 3/25
Hence, the probability distribution is

X           –2      –1      0       1       2       3
P(X = x)    2/10    3/25    1/10    6/25    1/10    6/25

\[
\begin{aligned}
\text{Mean} = E(X) &= \sum x\, p(x) \\
&= (-2)\left(\frac{2}{10}\right) + (-1)\left(\frac{3}{25}\right) + 0\left(\frac{1}{10}\right) + 1\left(\frac{6}{25}\right) + 2\left(\frac{1}{10}\right) + 3\left(\frac{6}{25}\right) \\
&= -0.4 - 0.12 + 0 + 0.24 + 0.2 + 0.72 \\
&= \frac{16}{25}
\end{aligned}
\]
\[
\begin{aligned}
\text{Variance} = \text{Var}(X) &= E(X^2) - [E(X)]^2 \\
&= \sum x^2 p(x) - [E(X)]^2 \\
&= 4\left(\frac{2}{10}\right) + 1\left(\frac{3}{25}\right) + 0 + 1\left(\frac{6}{25}\right) + 4\left(\frac{1}{10}\right) + 9\left(\frac{6}{25}\right) - \left(\frac{16}{25}\right)^2 \\
&= \frac{93}{25} - \frac{256}{625} \\
&= \frac{2069}{625} \approx 3.31
\end{aligned}
\]

Example 7
A random variable X has the following probability function:
x 0 1 2 3 4 5 6 7

p(x) 0 k 2k 2k 3k k2 2k2 7k2 + k

(i) Determine k. (ii) Evaluate P(X < 6), P(X ≥ 6), P(0 < X < 5) and
P(0 ≤ X ≤ 4). (iii) Determine the distribution function of X. (iv) Find the
mean. (v) Find the variance.
Solution
(i) Since p(x) is a probability mass function,
\[
\begin{aligned}
\sum p(x) &= 1 \\
0 + k + 2k + 2k + 3k + k^2 + 2k^2 + 7k^2 + k &= 1 \\
10k^2 + 9k - 1 &= 0 \\
(10k - 1)(k + 1) &= 0
\end{aligned}
\]
\[
k = \frac{1}{10} \quad \text{or} \quad k = -1
\]
Since p(x) ≥ 0, k ≠ –1. Hence,
\[
k = \frac{1}{10} = 0.1
\]

Hence, the probability function is


X 0 1 2 3 4 5 6 7

P(X = x) 0 0.1 0.2 0.2 0.3 0.01 0.02 0.17

(ii) P( X < 6) = P( X = 0) + P( X = 1) + P( X = 2) + P( X = 3) + P( X = 4) + P( X = 5)
= 0 + 0.1 + 0.2 + 0.2 + 0.3 + 0.01
= 0.81
P( X ≥ 6) = 1 - P( X < 6)
= 1 - 0.81
= 0.19
P (0 < X < 5) = P ( X = 1) + P( X = 2) + P( X = 3) + P( X = 4)
= 0.1 + 0.2 + 0.2 + 0.3
= 0.8
P(0 ≤ X ≤ 4) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4)
= 0 + 0.1 + 0.2 + 0.2 + 0.3
= 0.8
(iii) Distribution function of X
x p(x) F(x)
0 0 0
1 0.1 0.1
2 0.2 0.3
3 0.2 0.5
4 0.3 0.8
5 0.01 0.81
6 0.02 0.83
7 0.17 1

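(iv) Mean:
\[
\begin{aligned}
\mu &= \sum x\,p(x) \\
&= 0 + 1(0.1) + 2(0.2) + 3(0.2) + 4(0.3) + 5(0.01) + 6(0.02) + 7(0.17) \\
&= 3.66
\end{aligned}
\]
(v) Variance:
\[
\begin{aligned}
\operatorname{Var}(X) = \sigma^2 &= \sum x^2 p(x) - \mu^2 \\
&= 0 + 1(0.1) + 4(0.2) + 9(0.2) + 16(0.3) + 25(0.01) + 36(0.02) + 49(0.17) - (3.66)^2 \\
&= 3.4044
\end{aligned}
\]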
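
The whole of this example can be retraced numerically. The sketch below is illustrative only: it assumes the normalisation condition 10k² + 9k = 1 obtained from the table of p(x), picks the admissible root, and then builds the distribution function with a running sum. All variable names are assumptions of this sketch.

```python
import math
from itertools import accumulate

# Normalisation: the probabilities sum to 10k^2 + 9k, which must equal 1,
# i.e. 10k^2 + 9k - 1 = 0.
a, b, c = 10.0, 9.0, -1.0
disc = math.sqrt(b * b - 4 * a * c)
roots = [(-b + disc) / (2 * a), (-b - disc) / (2 * a)]
k = next(r for r in roots if r > 0)  # a negative k would give negative probabilities

# p(x) for x = 0, 1, ..., 7 as given in the problem statement
pmf = [0, k, 2 * k, 2 * k, 3 * k, k ** 2, 2 * k ** 2, 7 * k ** 2 + k]
cdf = list(accumulate(pmf))          # distribution function F(x)

p_less_than_6 = sum(pmf[:6])         # P(X < 6)
mean = sum(x * p for x, p in enumerate(pmf))
variance = sum(x * x * p for x, p in enumerate(pmf)) - mean ** 2
print(round(k, 4), [round(v, 2) for v in cdf], round(mean, 2), round(variance, 4))
```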

Example 8
A fair dice is tossed. Let the random variable X denote twice the
number appearing on the dice. Write the probability distribution of X.
Calculate mean and variance.
Solution
Let X be the random variable which denotes twice the number appearing on the dice.
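(i) Probability distribution of X
x      2     4     6     8     10    12
p(x)   1/6   1/6   1/6   1/6   1/6   1/6

(ii) Mean:
\[
\begin{aligned}
\mu &= \sum x\,p(x) \\
&= 2\left(\frac{1}{6}\right) + 4\left(\frac{1}{6}\right) + 6\left(\frac{1}{6}\right) + 8\left(\frac{1}{6}\right) + 10\left(\frac{1}{6}\right) + 12\left(\frac{1}{6}\right) \\
&= 7
\end{aligned}
\]
(iii) Variance:
\[
\begin{aligned}
\sigma^2 &= \sum x^2 p(x) - \mu^2 \\
&= 4\left(\frac{1}{6}\right) + 16\left(\frac{1}{6}\right) + 36\left(\frac{1}{6}\right) + 64\left(\frac{1}{6}\right) + 100\left(\frac{1}{6}\right) + 144\left(\frac{1}{6}\right) - (7)^2 \\
&= 11.67
\end{aligned}
\]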

Example 9
Two unbiased dice are thrown at random. Find the probability distribution
of the sum of the numbers on them. Also, find mean and variance.
Solution
Let X be the random variable which denotes the sum of the numbers on two unbiased
dice. The random variable X can take values 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12. The
probability distribution is
X          2      3      4      5      6      7      8      9      10     11     12
P(X = x)   1/36   2/36   3/36   4/36   5/36   6/36   5/36   4/36   3/36   2/36   1/36

\[
\begin{aligned}
\text{Mean} = \mu &= \sum x\,p(x) \\
&= \frac{1}{36}\left[2(1) + 3(2) + 4(3) + 5(4) + 6(5) + 7(6) + 8(5) + 9(4) + 10(3) + 11(2) + 12(1)\right] \\
&= \frac{252}{36} \\
&= 7
\end{aligned}
\]
\[
\begin{aligned}
\text{Variance} = \sigma^2 &= \sum x^2 p(x) - \mu^2 \\
&= \frac{1}{36}\left[4(1) + 9(2) + 16(3) + 25(4) + 36(5) + 49(6) + 64(5) + 81(4) + 100(3) + 121(2) + 144(1)\right] - (7)^2 \\
&= \frac{1974}{36} - 49 \\
&= 5.83
\end{aligned}
\]
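
The distribution of the sum can also be obtained by enumerating the 36 equally likely outcomes. The sketch below is an illustration with exact fractions; the variable names are assumptions, not notation from the text.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 outcomes (a, b) give each sum a + b
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
pmf = {s: Fraction(n, 36) for s, n in sorted(counts.items())}

mean = sum(s * p for s, p in pmf.items())
variance = sum(s * s * p for s, p in pmf.items()) - mean ** 2
print(mean, variance)  # 7 35/6
```

The exact variance 35/6 agrees with the rounded value 5.83.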

Example 10
A sample of 3 items is selected at random from a box containing 10
items of which 4 are defective. Find the expected number of defective
items.
Solution
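Let X be the random variable which denotes the number of defective items in the sample.
Total number of items = 10
Number of good items = 6
Number of defective items = 4
\[
\begin{aligned}
P(X = 0) &= P(\text{no defective item}) = \frac{{}^{6}C_{3}}{{}^{10}C_{3}} = \frac{1}{6} \\
P(X = 1) &= P(\text{one defective item}) = \frac{{}^{6}C_{2}\;{}^{4}C_{1}}{{}^{10}C_{3}} = \frac{1}{2} \\
P(X = 2) &= P(\text{two defective items}) = \frac{{}^{6}C_{1}\;{}^{4}C_{2}}{{}^{10}C_{3}} = \frac{3}{10} \\
P(X = 3) &= P(\text{three defective items}) = \frac{{}^{4}C_{3}}{{}^{10}C_{3}} = \frac{1}{30}
\end{aligned}
\]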

Hence, the probability distribution is


X          0     1     2      3
P(X = x)   1/6   1/2   3/10   1/30

Expected number of defective items:
\[
\begin{aligned}
E(X) &= \sum x\,p(x) \\
&= 0 + 1\left(\frac{1}{2}\right) + 2\left(\frac{3}{10}\right) + 3\left(\frac{1}{30}\right) \\
&= 1.2
\end{aligned}
\]
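
The counting argument above (choose d defective items from 4 and the rest from the 6 good ones) can be sketched in Python using `math.comb`. The function name `defective_pmf` and its parameters are hypothetical names chosen for this illustration.

```python
from fractions import Fraction
from math import comb


def defective_pmf(total, defective, sample):
    """P(X = d) when `sample` items are drawn without replacement."""
    good = total - defective
    return {
        d: Fraction(comb(defective, d) * comb(good, sample - d), comb(total, sample))
        for d in range(min(defective, sample) + 1)
        if sample - d <= good  # cannot draw more good items than exist
    }


pmf = defective_pmf(total=10, defective=4, sample=3)
expected = sum(d * p for d, p in pmf.items())
print(expected)  # 6/5
```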

Example 11
A player tosses two fair coins. He wins ₹ 100 if one head appears and
₹ 200 if two heads appear. On the other hand, he loses ₹ 500 if no
head appears. Determine the expected value of the game. Is the game
favourable to the player?
Solution
Let X be the random variable which denotes the number of heads appearing in tosses
of two fair coins.
S = {HH, HT, TH, TT}
\[
\begin{aligned}
p(x_1) &= P(X = 0) = P(\text{no heads}) = \frac{1}{4} \\
p(x_2) &= P(X = 1) = P(\text{one head}) = \frac{2}{4} = \frac{1}{2} \\
p(x_3) &= P(X = 2) = P(\text{two heads}) = \frac{1}{4}
\end{aligned}
\]
Amount to be lost if no head appears = x1 = – ₹ 500
Amount to be won if one head appears = x2 = ₹ 100
Amount to be won if two heads appear = x3 = ₹ 200
\[
\begin{aligned}
\text{Expected value of the game} = \mu &= \sum x\,p(x) \\
&= x_1 p(x_1) + x_2 p(x_2) + x_3 p(x_3) \\
&= -500\left(\frac{1}{4}\right) + 100\left(\frac{1}{2}\right) + 200\left(\frac{1}{4}\right) \\
&= -25
\end{aligned}
\]
The expected value is – ₹ 25. Hence, the game is not favourable to the player.

Example 12
Amit plays a game of tossing a dice. If a number less than 3 appears, he
gets ₹ a, otherwise he has to pay ₹ 10. If the game is fair, find a.
Solution
Let X be the random variable which denotes Amit’s gain from one toss of the dice.
Probability of getting a number less than 3, i.e., 1 or 2: p(x1) = 2/6 = 1/3
Probability of getting a number more than or equal to 3, i.e., 3, 4, 5, or 6: p(x2) = 4/6 = 2/3
Amount to be received for a number less than 3 = x1 = ₹ a
Amount to be paid for a number more than or equal to 3 = x2 = – ₹ 10
\[
\begin{aligned}
E(X) &= \sum x\,p(x) \\
&= x_1 p(x_1) + x_2 p(x_2) \\
&= a\left(\frac{1}{3}\right) + (-10)\left(\frac{2}{3}\right) \\
&= \frac{a}{3} - \frac{20}{3}
\end{aligned}
\]
For a fair game, E(X) = 0.
\[
\frac{a}{3} - \frac{20}{3} = 0 \quad\Rightarrow\quad a = 20
\]

Example 13
A man draws 2 balls from a bag containing 3 white and 5 black balls.
If he is to receive ₹ 14 for every white ball which he draws and ₹ 7 for
every black ball, what is his expectation?
Solution
Let X be the random variable which denotes the amount received for the 2 balls drawn
from the bag. The 2 balls drawn may be either (i) both white, or (ii) both black, or
(iii) one white and one black.
\[
\begin{aligned}
\text{Probability of drawing 2 white balls} &= p(x_1) = \frac{{}^{3}C_{2}}{{}^{8}C_{2}} = \frac{3}{28} \\
\text{Probability of drawing 2 black balls} &= p(x_2) = \frac{{}^{5}C_{2}}{{}^{8}C_{2}} = \frac{10}{28} \\
\text{Probability of drawing 1 white and 1 black ball} &= p(x_3) = \frac{{}^{3}C_{1}\;{}^{5}C_{1}}{{}^{8}C_{2}} = \frac{15}{28}
\end{aligned}
\]
Amount to be received for 2 white balls = x1 = ₹ 14 × 2 = ₹ 28
Amount to be received for 2 black balls = x2 = ₹ 7 × 2 = ₹ 14
Amount to be received for 1 white and 1 black ball = x3 = ₹ 14 + ₹ 7 = ₹ 21
\[
\begin{aligned}
\text{Expectation} = E(X) &= \sum x\,p(x) \\
&= x_1 p(x_1) + x_2 p(x_2) + x_3 p(x_3) \\
&= 28\left(\frac{3}{28}\right) + 14\left(\frac{10}{28}\right) + 21\left(\frac{15}{28}\right) \\
&= 19.25
\end{aligned}
\]
His expectation is ₹ 19.25.

Example 14
The probability that there is at least one error in an account statement
prepared by A is 0.2 and for B and C, they are 0.25 and 0.4 respectively.
A, B, and C prepared 10, 16, and 20 statements respectively. Find the
expected number of correct statements in all.
Solution
Let p(x1), p(x2) and p(x3) be the probabilities of the events that there is no error in the
account statements prepared by A, B, and C respectively.
p( x1 ) = 1 - (Probability of at least one error in the account
statement prepared by A)
= 1 - 0.2
= 0.8
Similarly, p(x2) = 1 – 0.25 = 0.75
p(x3) = 1 – 0.4 = 0.6
Also, x1 = 10,   x2 = 16,   x3 = 20
Expected number of correct statements:
\[
\begin{aligned}
E(X) &= \sum x\,p(x) \\
&= x_1 p(x_1) + x_2 p(x_2) + x_3 p(x_3) \\
&= 10(0.8) + 16(0.75) + 20(0.6) \\
&= 32
\end{aligned}
\]

Example 15
A man has the choice of running either a hot-snack stall or an ice-cream
stall at a seaside resort during the summer season. If it is a fairly cool
summer, he should make ₹ 5000 by running the hot-snack stall, but if
the summer is quite hot, he can only expect to make ₹ 1000. On the
other hand, if he operates the ice-cream stall, his profit is estimated at
₹ 6500, if the summer is hot, but only ₹ 1000 if it is cool. There is a 40
percent chance of the summer being hot. Should he opt for running the
hot-snack stall or the ice-cream stall?
Solution
Let X and Y be the random variables which denote the income from the hot-snack and
ice-cream stalls respectively.

Probability of hot summer = p1 = 40% = 0.4
Probability of cool summer = p2 = 1 – p1 = 1 – 0.4 = 0.6
x1 = 1000,   x2 = 5000,   y1 = 6500,   y2 = 1000
\[
\begin{aligned}
\text{Expected income from hot-snack stall} = E(X) &= x_1 p_1 + x_2 p_2 \\
&= 1000(0.4) + 5000(0.6) \\
&= 3400
\end{aligned}
\]
\[
\begin{aligned}
\text{Expected income from ice-cream stall} = E(Y) &= y_1 p_1 + y_2 p_2 \\
&= 6500(0.4) + 1000(0.6) \\
&= 3200
\end{aligned}
\]
The expected incomes are ₹ 3400 and ₹ 3200 respectively.
Hence, he should opt for running the hot-snack stall.
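
The comparison made here is an instance of choosing the option with the larger expected payoff. A small Python sketch of that rule follows; the dictionary layout and the names `payoffs`, `probs`, and `best` are assumptions made for illustration.

```python
p_hot = 0.4
probs = {"hot": p_hot, "cool": 1 - p_hot}

# Income (in rupees) of each stall under each kind of summer
payoffs = {
    "hot-snack stall": {"hot": 1000, "cool": 5000},
    "ice-cream stall": {"hot": 6500, "cool": 1000},
}

# Expected income of each option: sum of payoff x probability over the states
expected = {
    name: sum(probs[state] * income for state, income in table.items())
    for name, table in payoffs.items()
}
best = max(expected, key=expected.get)
print({name: round(value, 2) for name, value in expected.items()}, best)
```

Changing `p_hot` shows how sensitive the decision is to the assumed chance of a hot summer.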

Exercise 3.1
1. The probability distribution of a random variable X is given by
X –2 –1 0 1 2 3
P(X = x) 0.1 k 0.2 2k 0.3 k

  Find k, the mean, and variance.
 [Ans.: 0.1, 0.8, 2.16]

2. Find the mean and variance of the following distribution:


X 4 5 6 8
P(X = x) 0.1 0.3 0.4 0.2
 [Ans.: 5.9, 1.49]


3. Find the value of k from the following data:

X          0           10     15
P(X = x)   (k – 6)/5   2/k    14/(5k)

  Also, find the distribution function and expectation of X.

 [Ans.: k = 8; E(X) = 31/4;
  X      0     10      15
  F(X)   2/5   13/20   1]

4. For the following distribution,

X –3 –2 –1 0 1 2

P(X = x) 0.05 0.1 0.2 0.3 0.2 0.15

  Find (i) P(X ≥ 1), (ii) P(X < 0), (iii) E(X), and (iv) Var(X)
 [Ans.: (i) 0.35 (ii) 0.35 (iii) –0.05 (iv) 1.8475]

5. A random variable X has the following probability function:

X          0      1      2     3     4       5       6       7       8
P(X = x)   k/45   k/15   k/9   k/5   2k/45   6k/45   7k/45   8k/45   4k/45

  Determine (i) k, (ii) mean, (iii) variance, and (iv) SD.

 [Ans.: (i) 1 (ii) 4.6222 (iii) 4.9906 (iv) 2.23]


6. A fair coin is tossed until a head or five tails appear. Find (i) discrete
probability distribution, and (ii) mean of the distribution.

 [Ans.: (i)
  X          1     2     3     4      5
  P(X = x)   1/2   1/4   1/8   1/16   1/16
  (ii) 1.9]
7. Let X denote the minimum of two numbers that appear when a pair
of fair dice is thrown once. Determine (i) probability distribution,
(ii) expectation, and (iii) variance.

 [Ans.: (i)
  X          1       2      3      4      5      6
  P(X = x)   11/36   9/36   7/36   5/36   3/36   1/36
  (ii) 2.5278 (iii) 1.9713]

8. For the following probability distribution,


X –3 –2 –1 0 1 2 3

P(X = x) 0.001 0.01 0.1 ? 0.1 0.01 0.001

  Find (i) missing probability, (ii) mean, and (iii) variance.

 [Ans.: (i) 0.778 (ii) 0 (iii) 0.298]

9. A discrete random variable can take all integer values from 1 to k, each with
the probability of 1/k. Show that its mean and variance are (k + 1)/2 and
(k² – 1)/12 respectively.

10. An urn contains 6 white and 4 black balls; 3 balls are drawn without
replacement. What is the expected number of black balls that will be
obtained?
 [Ans.: 6/5]
11. A six-faced dice is tossed. If a prime number occurs, Anil wins that
number of rupees but if a nonprime number occurs, he loses that number
of rupees. Determine whether the game is favourable to the player.

 [Ans.: The expected gain is – ₹ 1/6, so the game is unfavourable to Anil]


12. A man runs an ice-cream parlour at a holiday resort. If the summer is
mild, he can sell 2500 cups of ice cream; if it is hot, he can sell 4000
cups; if it is very hot, he can sell 5000 cups. It is known that for any year,
the probability of summer to be mild is 1/7 and to be hot is 4/7. A cup of
ice cream costs ₹ 2 and is sold for ₹ 3.50. What is his expected profit?
 [Ans.: ₹ 6107.14]

13. A player tosses two fair coins. He wins ₹ 1 or ₹ 2 as 1 tail or 1 head
appears. On the other hand, he loses ₹ 5 if no head appears. Find the
expected gain or loss of the player.
 [Ans.: Loss of ₹ 0.25]


14. A bag contains 2 white balls and 3 black balls. Four persons A, B, C, D in
the order named each draws one ball and does not replace it. The first
to draw a white ball receives ₹ 20. Determine their expectations.
 [Ans.: ₹ 8, ₹ 6, ₹ 4, ₹ 2]


3.4 Moments

A moment is the arithmetic mean of a power of the deviations of items from
their assumed mean or actual mean. If the deviations of the items are taken from the
arithmetic mean of the distribution, the moment is known as a central moment. The mean
of the first power of the deviations gives the first moment about the mean and
is denoted by $\mu_1$. The mean of the second power of the deviations gives the second
moment about the mean and is denoted by $\mu_2$. Similarly, the mean of the cubes of the
deviations gives the third moment about the mean and is denoted by $\mu_3$. The mean of
the fourth power of the deviations from the mean gives the fourth moment about the
mean and is denoted by $\mu_4$. Thus, the mean of the rth power of the deviations gives the rth
moment about the mean or rth central moment and is denoted by $\mu_r$.

3.4.1 Central Moments or Moments about Actual Mean


The moments about the mean value $\mu = E(X)$ are called central moments and are denoted
by $\mu_r$.
\[
\mu_r = E\{(X - \mu)^r\} = \sum_{i} (x_i - \mu)^r\, p(x_i) = \sum (x - \mu)^r\, p(x)
\]
If a frequency distribution is given and $N = \sum f$, then $p(x_i) = \dfrac{f_i}{N}$ and
\[
\mu_r = \frac{\sum_{i} f_i (x_i - \mu)^r}{N}
\]

3.4.2 Properties of Central Moments


(i) The first moment about the mean is always zero, i.e., $\mu_1 = 0$.
(ii) The second moment about the mean measures variance, i.e.,
\[
\mu_2 = \sigma^2 \quad \text{or} \quad \text{SD} = \sigma = \sqrt{\mu_2}
\]
(iii) The third moment about the mean measures skewness.
If $\mu_3 > 0$, the distribution is positively skewed.
If $\mu_3 < 0$, the distribution is negatively skewed.
If $\mu_3 = 0$, the distribution is symmetrical.
\[
\text{Skewness} \quad \beta_1 = \frac{\mu_3^2}{\mu_2^3}
\]
(iv) The fourth moment about the mean measures kurtosis. It gives information on
the peakedness or height of the peak of a frequency distribution, i.e., whether
it is more peaked or more flat topped than a normal curve.
\[
\text{Kurtosis} \quad \beta_2 = \frac{\mu_4}{\mu_2^2}
\]
(v) In a symmetric distribution, all odd central moments are zero, i.e.,
$\mu_1 = \mu_3 = \mu_5 = \dots = \mu_{2r+1} = 0$.

3.4.3 Raw Moments or Moments about Arbitrary Origin


When the actual mean of a distribution is a fraction, it is tedious to calculate central
moments. In such cases, moments about an arbitrary origin ‘a’ are calculated and then
these moments are converted into the moments about the actual mean. The moments
about the arbitrary origin are known as raw moments and are denoted by $\mu'_r$. Thus,
$\mu'_1$ denotes the first moment about an arbitrary origin, $\mu'_2$ denotes the second moment
about an arbitrary origin, and so on.
\[
\mu'_r = E\{(X - a)^r\} = \sum_{i} (x_i - a)^r\, p(x_i) = \sum (x - a)^r\, p(x)
\]
If a frequency distribution is given and $N = \sum f$, then $p(x_i) = \dfrac{f_i}{N}$ and
\[
\mu'_r = \frac{\sum_{i} f_i (x_i - a)^r}{N}
\]
When a = 0, $\mu'_r$ is called the rth order simple moment.
\[
\mu'_r = E\{X^r\} = \sum_{i} x_i^r\, p(x_i) = \sum x^r\, p(x) = \frac{\sum f x^r}{N}
\]

3.4.4 Relation between Central Moments and Raw Moments

The moments about the actual mean, i.e., central moments, and the moments about the
arbitrary origin, i.e., raw moments, are related to each other by the following equations:
\[
\begin{aligned}
\text{First central moment} \quad \mu_1 &= \mu'_1 - \mu'_1 = 0 \\
\text{Second central moment} \quad \mu_2 &= \mu'_2 - (\mu'_1)^2 \\
\text{Third central moment} \quad \mu_3 &= \mu'_3 - 3\mu'_2 \mu'_1 + 2(\mu'_1)^3 \\
\text{Fourth central moment} \quad \mu_4 &= \mu'_4 - 4\mu'_3 \mu'_1 + 6\mu'_2 (\mu'_1)^2 - 3(\mu'_1)^4
\end{aligned}
\]
Similarly, the raw moments can be expressed in terms of central moments.
\[
\begin{aligned}
\text{First raw moment} \quad \mu'_1 &= \mu - a \\
\text{Second raw moment} \quad \mu'_2 &= \mu_2 + (\mu'_1)^2 \\
\text{Third raw moment} \quad \mu'_3 &= \mu_3 + 3\mu_2 \mu'_1 + (\mu'_1)^3 \\
\text{Fourth raw moment} \quad \mu'_4 &= \mu_4 + 4\mu_3 \mu'_1 + 6\mu_2 (\mu'_1)^2 + (\mu'_1)^4
\end{aligned}
\]
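
The two sets of conversion formulas are inverses of each other, which can be checked with a short Python sketch. The function names `central_from_raw` and `raw_from_central` are assumptions of this illustration, and the raw moments 1, 2.5, 5.5, and 16 are chosen purely as sample input.

```python
def central_from_raw(m1, m2, m3, m4):
    """Central moments (mu1..mu4) from raw moments about an arbitrary origin."""
    mu2 = m2 - m1 ** 2
    mu3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3
    mu4 = m4 - 4 * m3 * m1 + 6 * m2 * m1 ** 2 - 3 * m1 ** 4
    return 0.0, mu2, mu3, mu4


def raw_from_central(m1, mu2, mu3, mu4):
    """Raw moments (mu'1..mu'4) from the first raw moment and mu2, mu3, mu4."""
    m2 = mu2 + m1 ** 2
    m3 = mu3 + 3 * mu2 * m1 + m1 ** 3
    m4 = mu4 + 4 * mu3 * m1 + 6 * mu2 * m1 ** 2 + m1 ** 4
    return m1, m2, m3, m4


central = central_from_raw(1, 2.5, 5.5, 16)
print(central)                             # (0.0, 1.5, 0.0, 6.0)
print(raw_from_central(1, *central[1:]))   # (1, 2.5, 5.5, 16.0)
```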

Example 1
Calculate the first four moments from the following data:
x 0 1 2 3 4 5 6 7 8

f 5 10 15 20 25 20 15 10 5

Also, calculate the values of β1 and β2.


Solution
\[
N = \sum f = 125, \qquad \mu = \bar{x} = \frac{\sum f x}{N} = \frac{500}{125} = 4
\]

x    f     fx    x – μ   f(x – μ)   f(x – μ)²   f(x – μ)³   f(x – μ)⁴
0    5     0     –4      –20        80          –320        1280
1    10    10    –3      –30        90          –270        810
2    15    30    –2      –30        60          –120        240
3    20    60    –1      –20        20          –20         20
4    25    100   0       0          0           0           0
5    20    100   1       20         20          20          20
6    15    90    2       30         60          120         240
7    10    70    3       30         90          270         810
8    5     40    4       20         80          320         1280
Totals: Σf = 125, Σfx = 500, Σf(x – μ) = 0, Σf(x – μ)² = 500, Σf(x – μ)³ = 0, Σf(x – μ)⁴ = 4700

Moments about the actual mean:
\[
\begin{aligned}
\mu_1 &= \frac{\sum f(x - \mu)}{N} = \frac{0}{125} = 0 \\
\mu_2 &= \frac{\sum f(x - \mu)^2}{N} = \frac{500}{125} = 4 \\
\mu_3 &= \frac{\sum f(x - \mu)^3}{N} = \frac{0}{125} = 0 \\
\mu_4 &= \frac{\sum f(x - \mu)^4}{N} = \frac{4700}{125} = 37.6
\end{aligned}
\]
\[
\beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{0}{64} = 0, \qquad
\beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{37.6}{16} = 2.35
\]
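
The tabular computation above can be reproduced with a few lines of Python. This is a minimal sketch for a frequency table; the function name `central_moments` and its return layout are assumptions made for illustration.

```python
def central_moments(values, freqs, orders=(1, 2, 3, 4)):
    """Return (mean, [mu_r for r in orders]) for a frequency distribution."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    moments = [
        sum(f * (x - mean) ** r for x, f in zip(values, freqs)) / n
        for r in orders
    ]
    return mean, moments


x = list(range(9))                         # 0, 1, ..., 8
f = [5, 10, 15, 20, 25, 20, 15, 10, 5]
mean, (mu1, mu2, mu3, mu4) = central_moments(x, f)

beta1 = mu3 ** 2 / mu2 ** 3                # skewness coefficient
beta2 = mu4 / mu2 ** 2                     # kurtosis coefficient
print(mean, mu2, mu3, round(mu4, 2), round(beta2, 2))
```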

Example 2
Calculate the first four moments of the following distribution about the
mean:
x 0 1 2 3 4 5 6 7 8

f 1 8 28 56 70 56 28 8 1

Also, evaluate β1 and β2.


Solution
Let a = 4 be the arbitrary origin.

x    f     x – a   f(x – a)   f(x – a)²   f(x – a)³   f(x – a)⁴
0    1     –4      –4         16          –64         256
1    8     –3      –24        72          –216        648
2    28    –2      –56        112         –224        448
3    56    –1      –56        56          –56         56
4    70    0       0          0           0           0
5    56    1       56         56          56          56
6    28    2       56         112         224         448
7    8     3       24         72          216         648
8    1     4       4          16          64          256
Totals: Σf = 256, Σf(x – a) = 0, Σf(x – a)² = 512, Σf(x – a)³ = 0, Σf(x – a)⁴ = 2816

N = Σf = 256
Moments about the arbitrary origin:
\[
\begin{aligned}
\mu'_1 &= \frac{\sum f(x - a)}{N} = \frac{0}{256} = 0 \\
\mu'_2 &= \frac{\sum f(x - a)^2}{N} = \frac{512}{256} = 2 \\
\mu'_3 &= \frac{\sum f(x - a)^3}{N} = \frac{0}{256} = 0 \\
\mu'_4 &= \frac{\sum f(x - a)^4}{N} = \frac{2816}{256} = 11
\end{aligned}
\]
Moments about the actual mean:
\[
\begin{aligned}
\mu_1 &= 0 \\
\mu_2 &= \mu'_2 - (\mu'_1)^2 = 2 - 0 = 2 \\
\mu_3 &= \mu'_3 - 3\mu'_2 \mu'_1 + 2(\mu'_1)^3 = 0 - 3(2)(0) + 2(0)^3 = 0 \\
\mu_4 &= \mu'_4 - 4\mu'_3 \mu'_1 + 6\mu'_2 (\mu'_1)^2 - 3(\mu'_1)^4 = 11 - 4(0)(0) + 6(2)(0)^2 - 3(0)^4 = 11
\end{aligned}
\]
\[
\beta_1 = \frac{\mu_3^2}{\mu_2^3} = 0, \qquad
\beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{11}{(2)^2} = 2.75
\]

Example 3
The first four moments of a distribution about x = 2 are 1, 2.5, 5.5, and 16.
Calculate the four moments about the mean μ.
Solution
\[
\mu'_1 = 1, \quad \mu'_2 = 2.5, \quad \mu'_3 = 5.5, \quad \mu'_4 = 16
\]
Moments about the mean:
\[
\begin{aligned}
\mu_1 &= 0 \\
\mu_2 &= \mu'_2 - (\mu'_1)^2 = 2.5 - (1)^2 = 1.5 \\
\mu_3 &= \mu'_3 - 3\mu'_2 \mu'_1 + 2(\mu'_1)^3 = 5.5 - 3(2.5)(1) + 2(1)^3 = 0 \\
\mu_4 &= \mu'_4 - 4\mu'_3 \mu'_1 + 6\mu'_2 (\mu'_1)^2 - 3(\mu'_1)^4 = 16 - 4(5.5)(1) + 6(2.5)(1)^2 - 3(1)^4 = 6
\end{aligned}
\]

Example 4
The first three moments of a distribution about the value 2 of the
variable are 1, 16, and –40. Show that the mean = 3, variance = 15,
and μ3 = –86.
Solution
\[
a = 2, \quad \mu'_1 = 1, \quad \mu'_2 = 16, \quad \mu'_3 = -40
\]
\[
\mu'_1 = \mu - a \quad\Rightarrow\quad 1 = \mu - 2 \quad\Rightarrow\quad \mu = 3
\]
Mean = 3
\[
\mu_2 = \mu'_2 - (\mu'_1)^2 = 16 - (1)^2 = 15
\]
Variance = μ2 = 15
\[
\mu_3 = \mu'_3 - 3\mu'_2 \mu'_1 + 2(\mu'_1)^3 = -40 - 3(16)(1) + 2(1)^3 = -86
\]

Exercise 3.2
1. Calculate the first four moments about the mean from the following
data:
x 1 2 3 4 5

f 2 3 5 4 1

 [Ans.: 0, 1.262, 0.722, 3.795]


2. Calculate the first four moments about the mean and also the value of β2
from the following table:
x 0 1 2 3 4 5 6 7 8

f 1 8 28 156 170 56 28 8 1

 [Ans.: 0, 1.294, 0.642, 0.582, 3.93]


3. The first four moments of a distribution about the value 4 of the variable
are 1, 4, 10, and 45. Show that the mean = 5, variance = 3, and μ3 = 0.
4. The first four central moments of a distribution are 0, 2.5, 0.7, and 18.75.
Calculate β1 and β2.
 [Ans.: 0.031, 3]
5. The values of μ1, μ2, μ3 and μ4 are 0, 9.2, 3.6, and 1.22 respectively. Find
skewness and kurtosis of the distribution.
 [Ans.: 0.129, 1.4]

6. The first four moments about the working mean 28.5 of a distribution
are 0.294, 7.144, 14.409, and 454.98. Calculate the moments about the
mean. Also, evaluate β1 and β2.
 [Ans.: 28.794, 7.058, 36.151, 408.738, 3.717, 8.205]

3.5 Skewness

Skewness is a measure that refers to the extent of symmetry or asymmetry in a
distribution. A distribution is said to be symmetrical when its mean, median, and
mode are equal, and the frequencies are symmetrically distributed about the mean.
A symmetrical single-peaked distribution, when plotted on a graph, gives a bell-shaped
curve, of which the normal curve is the standard example (Fig. 3.1).

Fig. 3.1

A distribution is said to be asymmetrical or skewed when the mean, median, and
mode are not equal, i.e., the mean, median, and mode do not coincide. If the curve
has a longer tail towards the left, it is said to be a negatively skewed distribution
(Fig. 3.2a). If the curve has a longer tail towards the right, it is said to be positively skewed
(Fig. 3.2b).

Fig. 3.2

Skewness gives an idea of the nature and degree of concentration of observations about
the mean.

3.5.1 Measures of Skewness


A measure of skewness gives the extent and direction of skewness of a distribution.
These measures can be absolute or relative. The absolute measures are also known as
measures of skewness.
Absolute skewness = Mean – Mode
If the value of the mean is greater than the mode, the skewness will be positive and if
the value of the mean is less than the mode, the skewness will be negative.
The relative measure of skewness is called the coefficient of skewness.

3.5.2 Karl Pearson’s Coefficient of Skewness


Karl Pearson’s coefficient of skewness, denoted by $S_k$, is given by
\[
S_k = \frac{\text{Mean} - \text{Mode}}{\text{Standard Deviation}} = \frac{\text{Mean} - \text{Mode}}{\sigma}
\]
When the mode is ill-defined and the distribution is moderately skewed, the averages
have the following relationship:
\[
\text{Mode} = 3\,\text{Median} - 2\,\text{Mean}
\]
\[
S_k = \frac{\text{Mean} - (3\,\text{Median} - 2\,\text{Mean})}{\text{Standard Deviation}}
= \frac{3(\text{Mean} - \text{Median})}{\text{Standard Deviation}}
= \frac{3(\text{Mean} - \text{Median})}{\sigma}
\]
The coefficient of skewness usually lies between –1 and 1.
For a positively skewed distribution, Sk > 0.
For a negatively skewed distribution, Sk < 0.
For a symmetrical distribution, Sk = 0.
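
The two forms of the coefficient can be sketched in Python as follows. The function names and the sample figures (mean 50, median 48, SD 6) are assumptions chosen only to illustrate that both forms agree when the mode is obtained from the empirical relation.

```python
def pearson_skew_mode(mean, mode, sd):
    """Sk = (Mean - Mode) / SD."""
    return (mean - mode) / sd


def pearson_skew_median(mean, median, sd):
    """Sk = 3 (Mean - Median) / SD, used when the mode is ill-defined."""
    return 3 * (mean - median) / sd


def empirical_mode(mean, median):
    """Mode = 3 Median - 2 Mean (moderately skewed distributions)."""
    return 3 * median - 2 * mean


# Hypothetical summary figures, not taken from the text
mean, median, sd = 50.0, 48.0, 6.0
mode = empirical_mode(mean, median)
sk_from_mode = pearson_skew_mode(mean, mode, sd)
sk_from_median = pearson_skew_median(mean, median, sd)
print(mode, sk_from_mode, sk_from_median)
```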

3.6 Kurtosis

Measures of central tendency, dispersion, and skewness of a random variable cannot give
a complete idea about the probability distribution. In order to analyse the probability
distribution completely, another characteristic, kurtosis, is required. Kurtosis means
the convexity of the probability curve of the distribution. It measures the degree of
peakedness of the distribution and is given by
\[
\beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{\mu_4}{\sigma^4}
\]

Fig. 3.3 (a) Leptokurtic (b) Platykurtic (c) Mesokurtic

Curves with β2 > 3 are called leptokurtic and those with β2 < 3 are called platykurtic.
The normal curve, for which β2 = 3, is called mesokurtic.
As $\beta_1 = \dfrac{\mu_3^2}{\mu_2^3}$ and $\beta_2 = \dfrac{\mu_4}{\mu_2^2}$ determine the shape of the probability curve, they are
called Pearson’s shape coefficients.

Example 1
From the marks scored by 100 students in Section A and 100 students in
Section B of a class, the following measures were obtained:
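Section A   μ_A = 55   σ_A = 15.4   Mode = 58.72
Section B   μ_B = 53   σ_B = 15.4   Mode = 48.83

Determine which distribution of marks is more skewed.

Solution
\[
\begin{aligned}
S_{kA} &= \frac{\text{Mean} - \text{Mode}}{\sigma_A} = \frac{55 - 58.72}{15.4} = -0.24 \\
S_{kB} &= \frac{\text{Mean} - \text{Mode}}{\sigma_B} = \frac{53 - 48.83}{15.4} = 0.27
\end{aligned}
\]
The degree of skewness is judged by the magnitude of the coefficient, and |0.27| > |–0.24|.
Hence, the distribution of marks of Section B is more skewed.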

Example 2
For a group of 10 items, Σx = 452, Σx² = 24270, and mode = 43.7. Find
Karl Pearson’s coefficient of skewness.
Solution
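\[
n = 10, \quad \sum x = 452, \quad \sum x^2 = 24270, \quad \text{mode} = 43.7
\]
\[
\mu = \frac{\sum x}{n} = \frac{452}{10} = 45.2
\]
\[
\sigma = \sqrt{\frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2}
= \sqrt{\frac{24270}{10} - \left(\frac{452}{10}\right)^2} = 19.59
\]
\[
S_k = \frac{\text{Mean} - \text{Mode}}{\sigma} = \frac{45.2 - 43.7}{19.59} = 0.077
\]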

Example 3
In a distribution, the mean = 65, median = 70, coefficient of skewness =
–0.6. Find the mode and coefficient of variation.
Solution
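μ = 65, Median = 70, Sk = –0.6
Mode = 3 Median – 2 Mean = 3(70) – 2(65) = 80
\[
S_k = \frac{\text{Mean} - \text{Mode}}{\sigma}
\quad\Rightarrow\quad -0.6 = \frac{65 - 80}{\sigma}
\quad\Rightarrow\quad \sigma = 25
\]
\[
\text{CV} = \frac{\sigma}{\bar{x}} \times 100 = \frac{25}{65} \times 100 = 38.46\%
\]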

Example 4
The following information was obtained from the records of a factory
relating to wages:
Arithmetic mean = ₹ 56.8, Median = ₹ 59.5, Standard deviation = ₹ 12.4
Give the information about the distribution of wages.
Solution
μ = 56.8, Median = 59.5, σ = 12.4
\[
S_k = \frac{3(\text{Mean} - \text{Median})}{\sigma} = \frac{3(56.8 - 59.5)}{12.4} = -0.65
\]
\[
\text{Mode} = 3\,\text{Median} - 2\,\text{Mean} = 3(59.5) - 2(56.8) = 64.9
\]
Hence, the most common (modal) wage is ₹ 64.9.
There is a negative skewness in wages.

Example 5
For a moderately skewed distribution of retail price for men’s shoes,
it is found that the mean price is ₹ 20 and the median price is ₹ 17.
If the coefficient of variation is 20%, find Pearson’s coefficient of
skewness.
Solution
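μ = 20, Median = 17, CV = 20%
\[
\text{CV} = \frac{\sigma}{\bar{x}} \times 100
\quad\Rightarrow\quad 20 = \frac{\sigma}{20} \times 100
\quad\Rightarrow\quad \sigma = 4
\]
\[
S_k = \frac{3(\text{Mean} - \text{Median})}{\sigma} = \frac{3(20 - 17)}{4} = 2.25
\]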

Example 6
Find the mean, SD, quartiles, median and Karl Pearson’s coefficient of
skewness for the following probability distribution:
X=x 1 2 3 4 5 6 7 8
p(x) 0.008 0.032 0.142 0.216 0.240 0.206 0.143 0.013

Solution
(i) Mean:
\[
\begin{aligned}
\mu &= \sum x\,p(x) \\
&= 1(0.008) + 2(0.032) + 3(0.142) + 4(0.216) + 5(0.240) + 6(0.206) + 7(0.143) + 8(0.013) \\
&= 4.903
\end{aligned}
\]
(ii) Variance and SD:
\[
\begin{aligned}
\operatorname{Var}(X) = \sigma^2 &= \sum x^2 p(x) - \mu^2 \\
&= 1(0.008) + 4(0.032) + 9(0.142) + 16(0.216) + 25(0.240) + 36(0.206) + 49(0.143) + 64(0.013) - (4.903)^2 \\
&= 2.086
\end{aligned}
\]
\[
\text{SD} = \sqrt{\operatorname{Var}(X)} = \sqrt{2.086} = 1.444
\]
(iii) Quartiles:
F(3) = 0.008 + 0.032 + 0.142 = 0.182 < 0.25
F(4) = 0.008 + 0.032 + 0.142 + 0.216 = 0.398 > 0.25
\[
Q_1 = \frac{1}{2}(3 + 4) = 3.5
\]
F(4) = 0.398 < 0.5
F(5) = 0.008 + 0.032 + 0.142 + 0.216 + 0.240 = 0.638 > 0.5
\[
Q_2 = \frac{1}{2}(4 + 5) = 4.5
\]
F(5) = 0.638 < 0.75
F(6) = 0.008 + 0.032 + 0.142 + 0.216 + 0.240 + 0.206 = 0.844 > 0.75
\[
Q_3 = \frac{1}{2}(5 + 6) = 5.5
\]
(iv) Median = Q2 = 4.5
(v) Karl Pearson’s coefficient of skewness (median form):
\[
S_k = \frac{3(\text{Mean} - \text{Median})}{\text{SD}} = \frac{3(4.903 - 4.5)}{1.444} = 0.837
\]

Example 7
Find the mean, median, QD, MD, SD, β1 and β2 of the following
probability distribution:
X=x 0 1 2 3 4 5 6 7 8

p(x) 0.004 0.036 0.1 0.232 0.280 0.204 0.112 0.028 0.004

Solution
(i) Mean:
\[
\begin{aligned}
\mu &= \sum x\,p(x) \\
&= 0 + 1(0.036) + 2(0.1) + 3(0.232) + 4(0.280) + 5(0.204) + 6(0.112) + 7(0.028) + 8(0.004) \\
&= 3.972
\end{aligned}
\]
(ii) Median:
F(3) = 0.004 + 0.036 + 0.1 + 0.232 = 0.372 < 0.5
F(4) = 0.004 + 0.036 + 0.1 + 0.232 + 0.280 = 0.652 > 0.5
\[
\text{Median } M = \frac{1}{2}(3 + 4) = 3.5
\]
(iii) Mode is the value of X for which P(X = x) is maximum.
Mode = 4   [∵ P(X = 4) = 0.280 is the maximum probability]
(iv) Variance and SD:
\[
\begin{aligned}
\sigma^2 &= \sum x^2 p(x) - \mu^2 \\
&= 0 + 1(0.036) + 4(0.1) + 9(0.232) + 16(0.28) + 25(0.204) + 36(0.112) + 49(0.028) + 64(0.004) - (3.972)^2 \\
&= 1.987
\end{aligned}
\]
\[
\text{SD} = \sqrt{\operatorname{Var}(X)} = \sqrt{1.987} = 1.41
\]
(v) Quartile deviation:
F(2) = 0.004 + 0.036 + 0.1 = 0.14 < 0.25
F(3) = 0.372 > 0.25
\[
Q_1 = \frac{1}{2}(2 + 3) = 2.5
\]
F(4) = 0.652 < 0.75
F(5) = 0.004 + 0.036 + 0.1 + 0.232 + 0.280 + 0.204 = 0.856 > 0.75
\[
Q_3 = \frac{1}{2}(4 + 5) = 4.5
\]
\[
\text{QD} = \frac{1}{2}(Q_3 - Q_1) = \frac{1}{2}(4.5 - 2.5) = 1
\]
(vi) Mean deviation:
\[
\begin{aligned}
\text{MD} &= \sum |x - \mu|\, p(x) \\
&= 3.972(0.004) + 2.972(0.036) + 1.972(0.1) + 0.972(0.232) + 0.028(0.28) \\
&\quad + 1.028(0.204) + 2.028(0.112) + 3.028(0.028) + 4.028(0.004) \\
&= 1.091
\end{aligned}
\]
(vii) Moments, β1 and β2:
\[
\begin{aligned}
\mu'_1 &= \mu = 3.972 \\
\mu'_2 &= E(X^2) = \sum x^2 p(x) \\
&= 0 + 1(0.036) + 4(0.1) + 9(0.232) + 16(0.280) + 25(0.204) + 36(0.112) + 49(0.028) + 64(0.004) = 17.764 \\
\mu'_3 &= E(X^3) = \sum x^3 p(x) \\
&= 0 + 1(0.036) + 8(0.1) + 27(0.232) + 64(0.280) + 125(0.204) + 216(0.112) + 343(0.028) + 512(0.004) = 86.364 \\
\mu'_4 &= E(X^4) = \sum x^4 p(x) \\
&= 0 + 1(0.036) + 16(0.1) + 81(0.232) + 256(0.280) + 625(0.204) + 1296(0.112) + 2401(0.028) + 4096(0.004) = 448.372
\end{aligned}
\]
\[
\begin{aligned}
\mu_2 &= \sigma^2 = 1.987 \\
\mu_3 &= \mu'_3 - 3\mu'_2 \mu'_1 + 2(\mu'_1)^3 = 86.364 - 3(17.764)(3.972) + 2(3.972)^3 = 0.019 \\
\mu_4 &= \mu'_4 - 4\mu'_3 \mu'_1 + 6\mu'_2 (\mu'_1)^2 - 3(\mu'_1)^4 \\
&= 448.372 - 4(86.364)(3.972) + 6(17.764)(3.972)^2 - 3(3.972)^4 = 11.053
\end{aligned}
\]
\[
\beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{(0.019)^2}{(1.987)^3} = 0.00005, \qquad
\beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{11.053}{(1.987)^2} = 2.8
\]
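
The long hand computation in this example is a good candidate for a machine check. The following Python sketch recomputes the mean, central moments, mean deviation, β1, and β2 directly from the probability table; the helper names `raw_moment` and `central_moment` are assumptions made for illustration.

```python
def raw_moment(pmf, r):
    """r-th moment about the origin, E(X^r)."""
    return sum(x ** r * p for x, p in pmf.items())


def central_moment(pmf, r):
    """r-th moment about the mean, E[(X - mu)^r]."""
    mu = raw_moment(pmf, 1)
    return sum((x - mu) ** r * p for x, p in pmf.items())


pmf = {0: 0.004, 1: 0.036, 2: 0.1, 3: 0.232, 4: 0.280,
       5: 0.204, 6: 0.112, 7: 0.028, 8: 0.004}

mu = raw_moment(pmf, 1)
mu2, mu3, mu4 = (central_moment(pmf, r) for r in (2, 3, 4))
mean_deviation = sum(abs(x - mu) * p for x, p in pmf.items())
beta1 = mu3 ** 2 / mu2 ** 3
beta2 = mu4 / mu2 ** 2
print(round(mu, 3), round(mu2, 3), round(mu3, 3), round(mu4, 3),
      round(mean_deviation, 3), round(beta2, 2))
```

Computing the central moments directly, rather than through the raw moments, avoids the loss of significant figures that occurs when large nearly equal terms are subtracted.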

Exercise 3.3

1. Karl Pearson’s measure of skewness of a distribution is 0.5. Its median and
mode are respectively 42 and 36. Find the coefficient of variation.
 [Ans.: 40]
2. From the marks scored by 120 students in Section A and 120 students in
Section B of a class, the following measures are obtained:

Section A   x̄ = 46.83   SD = 14.8   Mode = 51.67
Section B   x̄ = 47.83   SD = 14.8   Mode = 47.07

Determine which distribution of marks is more skewed.


 [ans.: Section A]
3. For a moderately skewed data, the arithmetic mean is 200, the coefficient
of variation is 8, and Karl Pearson’s coefficient of skewness is 0.3. Find
the mode and median.
 [ans.: 195.2, 198.4 ]
4. Karl Pearson’s coefficient of skewness of a distribution is 0.32. Its standard
deviation is 6.5 and the mean is 29.6. Find the mode and median for the
distribution.
 [ans.: 27.52, 28.9]
5. The median, mode and coefficient of skewness for a certain distribution
are respectively 17.4, 15.3, and 0.35. Find the coefficient of variation.
 [ans.: 48.78%]
6. In a distribution, mean = 65, median = 70, coefficient of skewness = –6.
Find the mode and coefficient of variation.
 [ans.: 80, 39.78%]

3.7 Measures of Statistics for Continuous Random Variables

1. Mean The mean or average value (μ) of the probability distribution of a continuous
random variable X is called the expectation and is denoted by E(X).
\[
\mu = E(X) = \int_{-\infty}^{\infty} x f(x)\,dx
\]
Expectation of any function φ(x) of a continuous random variable X is given by
\[
E[\phi(x)] = \int_{-\infty}^{\infty} \phi(x) f(x)\,dx
\]
2. Median The median is the point which divides the entire distribution into two
equal parts. In case of a continuous distribution, the median is the point which divides
the total area into two equal parts. Thus, if a continuous random variable X is defined
from a to b and M is the median,
\[
\int_{a}^{M} f(x)\,dx = \int_{M}^{b} f(x)\,dx = \frac{1}{2}
\]
By solving either of these equations, the median is obtained.

3. Mode The mode is the value of x for which f(x) is maximum. The mode is given by
\[
f'(x) = 0 \quad \text{and} \quad f''(x) < 0 \quad \text{for } a < x < b
\]

4. Geometric Mean The geometric mean of the probability distribution of a continuous
random variable X is given by
\[
\log G = \int_{-\infty}^{\infty} (\log x) f(x)\,dx
\]
5. Harmonic Mean The harmonic mean of the probability distribution of a continuous
random variable X is given by
\[
\frac{1}{H} = \int_{-\infty}^{\infty} \frac{1}{x} f(x)\,dx
\]
6. Quartile Deviation The rth quartile of the probability distribution of a continuous
random variable X, denoted by $Q_r$, is given by
\[
\int_{-\infty}^{Q_r} f(x)\,dx = \frac{r}{4}, \qquad r = 1, 2, 3
\]
Q1 and Q3 are called the first (lower) and the third (upper) quartiles respectively. Q2 is
the median (middle or second quartile). Quartile deviation or semi-interquartile range
of the probability distribution of a continuous random variable X is given by
\[
Q = \frac{1}{2}(Q_3 - Q_1)
\]

7. Mean Deviation Mean deviation of the probability distribution of a continuous
random variable X is given by
\[
\text{MD} = \int_{-\infty}^{\infty} |x - \mu|\, f(x)\,dx
\]
8. Standard Deviation The standard deviation of the probability distribution of a
continuous random variable X is given by
\[
\text{SD} = \sqrt{\operatorname{Var}(X)} = \sigma
\]
9. Variance The variance of the probability distribution of a continuous random
variable X is given by
\[
\operatorname{Var}(X) = \sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx
= \int_{-\infty}^{\infty} x^2 f(x)\,dx - \mu^2
\]

10. Moments Central moments or moments about the actual mean of the probability
distribution of a continuous random variable X are given by
\[
\mu_r = \int_{-\infty}^{\infty} (x - \mu)^r f(x)\,dx
\]
Raw moments or moments about an arbitrary origin of the probability distribution of a
continuous random variable X are given by
\[
\mu'_r = \int_{-\infty}^{\infty} (x - a)^r f(x)\,dx
\]
When a = 0, $\mu'_r$ is called the rth order simple moment.
\[
\mu'_r = \int_{-\infty}^{\infty} x^r f(x)\,dx
\]
11. Skewness Skewness of the probability distribution of a continuous random
variable X is given by
\[
\beta_1 = \frac{\mu_3^2}{\mu_2^3}
\]
12. Kurtosis Kurtosis of the probability distribution of a continuous random
variable X is given by
\[
\beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{\mu_4}{\sigma^4}
\]
Note The formulae of various measures of central tendency, dispersion, skewness
and kurtosis of discrete probability distribution can be easily extended to the case of
continuous probability distribution by simply replacing p(x) by f(x)dx and the
summation by integration over the specified range of the variable X.
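
The replacement of sums by integrals can be illustrated numerically. The sketch below uses a hand-written composite Simpson's rule and a bisection search for the median; the density f(x) = 4x³ on [0, 1] and all function names are assumptions chosen for this illustration, and the results are approximations rather than exact values.

```python
def simpson(func, a, b, n=1000):
    """Composite Simpson's rule on [a, b] using an even number n of subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3


def pdf(x):
    # Illustrative density (an assumption of this sketch): f(x) = 4x^3 on [0, 1]
    return 4 * x ** 3


lo, hi = 0.0, 1.0
area = simpson(pdf, lo, hi)                                    # should be close to 1
mean = simpson(lambda x: x * pdf(x), lo, hi)                   # integral of x f(x)
variance = simpson(lambda x: x * x * pdf(x), lo, hi) - mean ** 2

# Median M: solve F(M) = 1/2 by bisection, where F is the integral of f from lo to M
left, right = lo, hi
for _ in range(50):
    mid = (left + right) / 2
    if simpson(pdf, lo, mid) < 0.5:
        left = mid
    else:
        right = mid
median = (left + right) / 2
print(round(area, 6), round(mean, 6), round(variance, 6), round(median, 4))
```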

Example 1
For the continuous random variable having pdf
\[
f(x) = \begin{cases} 4x^3 & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}
\]
find the mean and variance of X.
Solution

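\[
\begin{aligned}
\text{Mean} = \mu &= \int_{-\infty}^{\infty} x f(x)\,dx \\
&= \int_{-\infty}^{0} x f(x)\,dx + \int_{0}^{1} x f(x)\,dx + \int_{1}^{\infty} x f(x)\,dx \\
&= 0 + \int_{0}^{1} x\,(4x^3)\,dx + 0 \\
&= 4 \int_{0}^{1} x^4\,dx \\
&= 4 \left| \frac{x^5}{5} \right|_{0}^{1} \\
&= 4\left(\frac{1}{5} - 0\right) \\
&= \frac{4}{5}
\end{aligned}
\]
\[
\begin{aligned}
\operatorname{Var}(X) &= \int_{-\infty}^{\infty} x^2 f(x)\,dx - \mu^2 \\
&= \int_{-\infty}^{0} x^2 f(x)\,dx + \int_{0}^{1} x^2 f(x)\,dx + \int_{1}^{\infty} x^2 f(x)\,dx - \mu^2 \\
&= 0 + \int_{0}^{1} x^2 (4x^3)\,dx + 0 - \left(\frac{4}{5}\right)^2 \\
&= 4 \int_{0}^{1} x^5\,dx - \frac{16}{25} \\
&= 4 \left| \frac{x^6}{6} \right|_{0}^{1} - \frac{16}{25} \\
&= \frac{4}{6} - \frac{16}{25} \\
&= \frac{2}{75}
\end{aligned}
\]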

Example 2
For the triangular distribution
\[
f(x) = \begin{cases} x & 0 < x \le 1 \\ 2 - x & 1 \le x \le 2 \\ 0 & \text{otherwise} \end{cases}
\]
find the mean and variance.
Solution

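\[
\begin{aligned}
\mu &= \int_{-\infty}^{\infty} x f(x)\,dx \\
&= \int_{-\infty}^{0} x f(x)\,dx + \int_{0}^{1} x f(x)\,dx + \int_{1}^{2} x f(x)\,dx + \int_{2}^{\infty} x f(x)\,dx \\
&= 0 + \int_{0}^{1} x \cdot x\,dx + \int_{1}^{2} x(2 - x)\,dx + 0 \\
&= \int_{0}^{1} x^2\,dx + \int_{1}^{2} (2x - x^2)\,dx \\
&= \left| \frac{x^3}{3} \right|_{0}^{1} + \left| 2\,\frac{x^2}{2} - \frac{x^3}{3} \right|_{1}^{2} \\
&= \left(\frac{1}{3} - 0\right) + \left[\left(4 - \frac{8}{3}\right) - \left(1 - \frac{1}{3}\right)\right] \\
&= \frac{1}{3} + \frac{4}{3} - \frac{2}{3} \\
&= 1
\end{aligned}
\]
\[
\begin{aligned}
\operatorname{Var}(X) &= \int_{-\infty}^{\infty} x^2 f(x)\,dx - \mu^2 \\
&= \int_{-\infty}^{0} x^2 f(x)\,dx + \int_{0}^{1} x^2 f(x)\,dx + \int_{1}^{2} x^2 f(x)\,dx + \int_{2}^{\infty} x^2 f(x)\,dx - \mu^2 \\
&= 0 + \int_{0}^{1} x^2 \cdot x\,dx + \int_{1}^{2} x^2 (2 - x)\,dx + 0 - 1 \\
&= \int_{0}^{1} x^3\,dx + \int_{1}^{2} (2x^2 - x^3)\,dx - 1 \\
&= \left| \frac{x^4}{4} \right|_{0}^{1} + \left| \frac{2x^3}{3} - \frac{x^4}{4} \right|_{1}^{2} - 1 \\
&= \left(\frac{1}{4} - 0\right) + \left[\left(\frac{16}{3} - \frac{16}{4}\right) - \left(\frac{2}{3} - \frac{1}{4}\right)\right] - 1 \\
&= \frac{7}{6} - 1 \\
&= \frac{1}{6}
\end{aligned}
\]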

Example 3
If the probability density function of X is given by
\[
f(x) = \begin{cases}
\dfrac{x}{2} & 0 < x \le 1 \\[4pt]
\dfrac{1}{2} & 1 < x \le 2 \\[4pt]
\dfrac{3 - x}{2} & 2 < x < 3 \\[4pt]
0 & \text{otherwise}
\end{cases}
\]
find the expected value of φ(x) = x² – 5x + 3.
Solution

E [Ef ( x )] = Ú f ( x) f ( x) dx
-•

E ( x 2 - 5 x + 3) = Ú (x
2
- 5 x + 3) f ( x ) dx
-•

1 2
x 1
= Ú ( x 2 - 5 x + 3) dx + Ú ( x 2 - 5 x + 3) dx +
0
2 1
2
3
Ê 3- xˆ
Ú (x
2
- 5 x + 3) Á dx
Ë 2 ˜¯
2
1 2
1 1
=
20Ú ( x 3 - 5 x 2 + 3 x ) dx + Ú ( x 2 - 5 x + 3) dx
21
3
1
2 Ú2
+ (- x 3 + 8 x 2 - 18 x + 9) dx
   
1 2 3
1 x 4 5x3 3x2 1 x3 5x2 1 x 4 8 x 3 18 x 2
= - + + - + 3x + - + - + 9x
2 4 3 2 0 2 3 2 1 2 4 3 2 2
1 Ê 1 5 3ˆ 1 Ê 8 1 5 ˆ
= Á - + ˜ + Á - 10 + 6 - + - 3˜¯
2 Ë 4 3 2¯ 2 Ë 3 3 2
1 Ê 81 216 162 16 64 72 ˆ
+ Á- + - + 27 + - + - 18˜
2Ë 4 3 2 4 3 2 ¯
1 13 19
= - -
24 12 24
11
=-
6

Example 4
A continuous random variable has the probability density function
f ( x ) = kxe - l x x ≥ 0, l > 0
=0 otherwise
Determine (i) k, (ii) mean, and (iii) variance.
Solution
Since f (x) is a probability density function,

Ú f ( x ) dx = 1
-•
0 •

Ú f ( x ) dx + Ú f ( x ) dx = 1
-• 0

0 + Ú k x e - l x dx = 1
0

Ú xe
-l x
k dx = 1
0

e- l x e- l x
k x -1 2 =1
-l l 0

È Ê 1 ˆ˘
k Í(0 - 0) - Á 0 - 2 ˜ ˙ = 1
Î Ë l ¯˚
k = l2
Hence, $f(x) = \lambda^2 x e^{-\lambda x}$ for $x \ge 0$, $\lambda > 0$, and $f(x) = 0$ otherwise.

(ii) Mean = m = Ú x f ( x ) dx
-•
0 •
= Ú x f ( x ) dx + Ú x f ( x ) dx
-• 0

= 0 + Ú x l 2 x e - l x dx
0

= l 2 Ú x 2 e - l x dx
0

Ê e- l x ˆ
2 2
Ê e- l x ˆ Ê e- l x ˆ
=l x Á ˜ - 2x Á 2 ˜ + 2 Á
Ë -l ¯ Ë l ¯ Ë - l 3 ˜¯ 0

È Ê 2 ˆ˘
= l 2 Í(0 - 0 + 0) - Á 0 - 0 - 3 ˜ ˙
Î Ë l ¯˚
2
=
l

(iii) Variance = s 2 =
Ú x 2 f ( x ) dx - m 2
-•
0 •
= Ú x 2 f ( x ) dx + Ú x 2 f ( x ) dx - m 2
-• 0
• 2
Ê 2ˆ
= 0 + Ú x 2 l 2 x e - l x dx - Á ˜
Ë l¯
0

4
= l 2 Ú x 3 e - l x dx -
0 l2


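\[ = \lambda^2 \left[ x^3\left(\frac{e^{-\lambda x}}{-\lambda}\right) - 3x^2\left(\frac{e^{-\lambda x}}{\lambda^2}\right) + 6x\left(\frac{e^{-\lambda x}}{-\lambda^3}\right) - 6\left(\frac{e^{-\lambda x}}{\lambda^4}\right) \right]_0^{\infty} - \frac{4}{\lambda^2} \]
\[ = \lambda^2\left[(0 - 0 + 0 - 0) - \left(0 - 0 + 0 - \frac{6}{\lambda^4}\right)\right] - \frac{4}{\lambda^2} \]
\[ = \frac{6}{\lambda^2} - \frac{4}{\lambda^2} = \frac{2}{\lambda^2} \]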

Example 5
The probability density f(x) of a continuous random variable is given by $f(x) = k e^{-|x|}$, $-\infty < x < \infty$. (i) Show that $k = \frac{1}{2}$, (ii) find the mean and variance of the distribution, and (iii) find the probability that the variate lies between 0 and 4.
Solution
(i) Since f (x) is a probability density function,

Ú f ( x ) dx = 1
-•

Ú ke
-| x |
dx = 1
-•

Úe
-| x |
k dx = 1
-•

2 k Ú e - | x | dx = 1 ÈÎ∵ e -| x| is an even function ˘˚
0

2 k Ú e - x dx = 1 [∵ |x|= x 0 £ x £ •]
0

2k -e- x 0 = 1
-2 k (0 - 1) = 1
1
k=
2

1 -| x |
Hence, f ( x ) = e -• < x < •
2

(ii) m = Ú x f ( x ) dx
-•

1
2 -Ú•
= x e - | x | dx

=0 [∵ the integrand is an odd function ]



\[ \operatorname{Var}(X) = \sigma^2 = \int_{-\infty}^{\infty} x^2 f(x)\,dx - \mu^2 \]

1
2 -Ú•
= x 2 e - | x | dx - 0


Ê 1ˆ
= 2 Á ˜ Ú x 2 e -| x| dx [∵ the integrand is an even function]
Ë 2¯
0

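\[ = \int_0^{\infty} x^2 e^{-x}\,dx \qquad [\because |x| = x \text{ for } x \ge 0] \]
\[ = \left[ x^2\left(\frac{e^{-x}}{-1}\right) - 2x\left(\frac{e^{-x}}{1}\right) + 2\left(\frac{e^{-x}}{-1}\right) \right]_0^{\infty} \]
\[ = 0 - (-2) = 2 \]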
(iii) Probability that the variate lies between 0 and 4
4
P(0 < X < 4) = Ú f ( x ) dx
0
1 4
= Ú e -| x| dx
2 0
1 4
= Ú e - x dx ÎÈ∵ x =x 0 < x < 4 ˚˘
2 0
1 4
= - e- x 0
2
1
= - (e -4 - 1)
2
= 0.4908
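The probability in part (iii) can also be read off the distribution function. The sketch below assumes a helper named `laplace_cdf`, introduced here only for illustration; it is the closed-form integral of $\frac{1}{2}e^{-|x|}$ from $-\infty$ to x.

```python
import math


def laplace_cdf(x):
    """CDF of f(x) = (1/2) e^{-|x|}: integrate the density from -infinity to x."""
    if x < 0:
        return 0.5 * math.exp(x)
    return 1 - 0.5 * math.exp(-x)


# P(0 < X < 4) = F(4) - F(0) = (1 - e^{-4}) / 2
prob = laplace_cdf(4) - laplace_cdf(0)
print(round(prob, 4))
```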

Example 6
The daily consumption of electric power is a random variable X with probability density function
\[ f(x) = \begin{cases} k x e^{-x/3}, & x > 0 \\ 0, & x \le 0 \end{cases} \]
Find the value of k, the expectation of X, and the probability that on a given day, the electric consumption is more than the expected value.
Solution
Since f (x) is a probability density function,

Ú f ( x ) dx = 1
-•
0 •

Ú f ( x ) dx + Ú f ( x ) dx = 1
-• 0
• x
-
0+ Ú k xe 3 dx = 1
0


Ê -x ˆ Ê -x ˆ
Áe 3 ˜ Áe 3 ˜
k xÁ - (1) =1
1˜ Á 1 ˜
ÁË - ˜¯ ÁË ˜¯
3 9 0
k [(0 - 0) - (0 - 9)] = 1
9k = 1
1
k=
9
x
1 -3
Hence, f ( x ) = xe x>0
9
=0 x£0

E( X ) = Ú x f ( x ) dx
-•
0 •
= Ú x f ( x ) dx + Ú x f ( x ) dx
-• 0
• x
1 -
= 0+ Ú x◊ x e 3 dx
0
9
• x
1 2 -3
9 Ú0
= x e dx


Ê -x ˆ Ê -x ˆ Ê -x ˆ
1 2Áe 3 ˜ Áe 3 ˜ Áe 3 ˜
= x Á - 2 x + 2
9 1˜ Á 1 ˜ Á 1 ˜
ÁË - ˜¯ ÁË ˜¯ ÁË - ˜¯
3 9 27 0
1
= (0 - 0 + 0 + 54)
9
     = 6
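\[ P(X > 6) = \int_6^{\infty} f(x)\,dx = \frac{1}{9}\int_6^{\infty} x e^{-x/3}\,dx \]
\[ = \frac{1}{9}\left[ x\left(\frac{e^{-x/3}}{-\frac{1}{3}}\right) - (1)\left(\frac{e^{-x/3}}{\frac{1}{9}}\right) \right]_6^{\infty} \]
\[ = \frac{1}{9}\left[(0 - 0) - \left(-18e^{-2} - 9e^{-2}\right)\right] \]
\[ = 3e^{-2} = 0.406 \]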
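As a quick check of the last step, the tail probability has a closed form obtained by the same integration by parts. In the sketch below, `survival` is a hypothetical helper name; the formula $(1 + t/3)e^{-t/3}$ is derived from the density with $k = \frac{1}{9}$ and is not stated in the text.

```python
import math


def survival(t):
    """P(X > t) for f(x) = (1/9) x e^{-x/3}, x > 0, via integration by parts."""
    if t <= 0:
        return 1.0
    return (1 + t / 3) * math.exp(-t / 3)


mean = 6.0                       # E(X) obtained in the example
p_above_mean = survival(mean)    # reduces to 3 e^{-2}
print(round(p_above_mean, 3))
```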

Example 7
Let X be a random variable with E(X) = 10 and Var(X) = 25. Find the
positive values of a and b such that Y = aX – b has an expectation of
0 and a variance of 1.
Solution
E (Y ) = E (aX - b)
0 = aE ( X ) - b
= a(10) - b
10 a - b = 0
Var(Y ) = Var(aX - b)
1 = a 2 Var( X )
= a 2 (25)

25a 2 = 1
1
a=
5
   b = 2
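The two conditions used above, $aE(X) - b = 0$ and $a^2\operatorname{Var}(X) = 1$, can be wrapped in a small function. This is a minimal sketch; the name `standardizing_constants` is an assumption made for illustration.

```python
import math


def standardizing_constants(mean, variance):
    """Return positive (a, b) such that Y = a*X - b has E(Y) = 0 and Var(Y) = 1."""
    a = 1 / math.sqrt(variance)   # from a^2 Var(X) = 1, taking the positive root
    b = a * mean                  # from a E(X) - b = 0
    return a, b


a, b = standardizing_constants(10, 25)
print(a, b)
```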

Example 8
A continuous random variable X is distributed over the interval [0, 1]
with pdf f (x) = ax2 + bx, where a, b are constants. If the mean of X is 0.5,
find the values of a and b.
Solution
Since f (x) is probability density function,

Ú f ( x ) dx = 1
-•
0 1 •

Ú f ( x ) dx + Ú f ( x ) dx + Ú f ( x ) dx = 1
-• 0 1
1
0 + Ú (ax 2 + bx ) dx + 0 = 1
0
1
ax 3 bx 2
+ =1
3 2 0
a b
+ =1
3 2
2 a + 3b = 6  ...(1)
Also, m = 0.5
1

Ú x f ( x) dx = 0.5
0
1

Ú x (ax
2
+ bx ) dx = 0.5
0
1

Ú (ax
3
+ bx 2 ) dx = 0.5
0
1
ax 4 bx 3
+ = 0.5
4 3 0
a b
+ = 0.5
4 3
3a + 4b = 6  ...(2)

Solving Eqs (1) and (2),


a = –6,    b = 6
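Equations (1) and (2) form a 2 × 2 linear system, which can be solved exactly with rational arithmetic. The sketch below uses Cramer's rule; the function name `solve_2x2` is an assumption introduced for illustration.

```python
from fractions import Fraction


def solve_2x2(a1, b1, c1, a2, b2, c2):
    """Solve a1*x + b1*y = c1 and a2*x + b2*y = c2 by Cramer's rule."""
    det = Fraction(a1 * b2 - a2 * b1)
    if det == 0:
        raise ValueError("system is singular")
    x = Fraction(c1 * b2 - c2 * b1) / det
    y = Fraction(a1 * c2 - a2 * c1) / det
    return x, y


# Eq. (1): 2a + 3b = 6,  Eq. (2): 3a + 4b = 6
a, b = solve_2x2(2, 3, 6, 3, 4, 6)

# Substitute back into the two original conditions.
total_probability = a / 3 + b / 2   # should equal 1
mean = a / 4 + b / 3                # should equal 1/2
```

Substituting back confirms that the resulting density $f(x) = 6x - 6x^2$ integrates to 1 and has mean 0.5.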

Example 9
A continuous random variable X has the pdf defined by f(x) = A + Bx, 0 ≤ x ≤ 1. If the mean of the distribution is $\frac{1}{3}$, find A and B.
Solution
Since f (x) is a probability density function,

Ú f ( x ) dx = 1
-•
0 1 •

Ú f ( x ) dx + Ú f ( x ) dx + Ú f ( x ) dx = 1
-• 0 1
1
0 + Ú ( A + Bx ) dx + 0 = 1
0
1
Bx 2
Ax + =1
2 0
B
A+ = 1 ...(1)
2
1
Also,       m=
3

1
Ú x f ( x ) dx =
3
-•
0 1
1
Ú x f ( x ) dx + Ú x f ( x ) dx =
3
-• 0
1
1
0 + Ú x ( A + Bx ) dx =
0
3
1
1
Ú ( Ax + Bx
2
) dx =
0
3
1
Ax 2 Bx 3 1
+ =
2 3 0 3
A B 1
+ =
2 3 3
3A + 2B = 2  ...(2)

Solving Eqs (1) and (2),


A = 2,   B = –2

Example 10
A continuous random variable has probability density function
f (x) = 6(x –x2)   0 £ x £ 1.
Find the (i) mean, (ii) variance, (iii) median, and (iv) mode.
Solution

(i) m = Ú x f ( x ) dx
-•
0 1 •
= Ú x f ( x ) dx + Ú x f ( x ) dx + Ú x f ( x ) dx
-• 0 1
1
= 0 + Ú x 6( x - x 2 ) dx + 0
0
1
= 6 Ú ( x 2 - x 3 ) dx
0
1
x3 x 4
=6 -
3 4 0

Ê 1 1ˆ
= 6Á - ˜
Ë 3 4¯
1
=
2

(ii) Var ( X ) = Ú x 2 f ( x ) dx - m 2
-•
0 1 •
= Ú x 2 f ( x ) dx + Ú x 2 f ( x ) dx + Ú x 2 f ( x ) dx - m 2
-• 0 1
1 2 1
= 0+Ú x 6 ( x - x 2 ) dx + 0 -
0 4
1 1
= 6 Ú ( x 3 - x 4 ) dx -
0 4
1
x4 x5 1
=6 - -
4 5 0 4
Ê 1 1ˆ 1
= 6Á - ˜ -
Ë 4 5¯ 4

6 1
= -
20 4
1
=
20

M b
1
(iii) Ú f ( x ) dx = Ú f ( x) dx = 2
a M
M
1
Ú 6( x - x
2
) dx =
0
2
M
x2 x3 1
6 - =
2 3 0 2
Ê M2 M3 ˆ 1
6Á - ˜=
Ë 2 3 ¯ 2
1
3M 2 - 2 M 3 =
2
\[ 4M^3 - 6M^2 + 1 = 0 \]
\[ (2M - 1)(2M^2 - 2M - 1) = 0 \]
\[ M = \frac{1}{2} \quad \text{or} \quad M = \frac{1 \pm \sqrt{3}}{2} \]
Only $M = \frac{1}{2}$ lies in (0, 1).
Hence, median $M = \frac{1}{2}$
(iv) Mode is the value of x for which f (x) is maximum. For f (x) to be maximum,
f ¢( x ) = 0 and f ¢¢( x ) < 0.
f ¢( x ) = 0
6 (1 - 2 x ) = 0
1
x=
2
\[ f''(x) = -12 \]
At $x = \frac{1}{2}$, $f''(x) = -12 < 0$
1
Hence, f (x) is maximum at x = .
2
1
Mode =
2
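When the cubic for the median does not factor as conveniently, the median can be located numerically from the distribution function. The sketch below applies bisection to $F(x) = 3x^2 - 2x^3$, the integral of $f(x) = 6(x - x^2)$ on [0, 1]; the names `cdf` and `quantile` are assumptions made for illustration.

```python
def cdf(x):
    # F(x) = 3x^2 - 2x^3 on [0, 1], obtained by integrating f(x) = 6(x - x^2).
    return 3 * x**2 - 2 * x**3


def quantile(p, lo=0.0, hi=1.0, tol=1e-12):
    """Bisection for the point where the increasing CDF equals p."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


median = quantile(0.5)
print(median)
```

Calling `quantile(0.25)` or `quantile(0.75)` gives the quartiles in the same way.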

Example 11
The probability density function of a random variable X is
\[ f(x) = \begin{cases} \dfrac{1}{2}\sin x, & 0 \le x \le \pi \\ 0, & \text{otherwise} \end{cases} \]
Find the mean, mode, and median of the distribution and also find the probability between 0 and $\frac{\pi}{2}$.
Solution

(i) $\mu = \int_{-\infty}^{\infty} x f(x)\,dx$
0 p •
= Ú x f ( x ) dx + Ú x f ( x ) dx + Ú x f ( x ) dx
-• 0 p
p
Ê1 ˆ
= 0 + Ú x Á sin x ˜ dx + 0
Ë2 ¯
0
p
1
2 Ú0
= x sin x dx

1 p
= - x cos x + sin x 0
2
p
=
2
(ii) Mode is the value of x for which f (x) is maximum. For f (x) to be maximum,
f ¢( x ) = 0 and f ¢¢( x ) < 0.
f ¢( x ) = 0
cos x = 0
p
x=
2
1
f ¢¢( x ) = - sin x
2
p 1
At x = , f ¢¢( x ) = - < 0
2 2
Hence, f(x) is maximum at $x = \frac{\pi}{2}$.
p
Mode =
2

M b
1
(iii)
Ú f ( x ) dx = Ú f ( x) dx = 2
a M
M p
1 1 1
Ú 2
sin x dx = Ú sin x dx =
2 2
0 M
M
1 1
Ú 2 sin x dx = 2
0
1 1M
- cos x 0 =
2 2
\[ -\frac{1}{2}(\cos M - 1) = \frac{1}{2} \]
\[ 1 - \cos M = 1 \]
\[ \cos M = 0 \]
\[ M = \frac{\pi}{2} \]
Hence, median $M = \frac{\pi}{2}$
p
(iv) P ÊÁ 0 < X < p ˆ˜ = 2 f ( x ) dx
Ë 2 ¯ Ú0
p
1
=Ú2 sin x dx
0 2
p
1 2
= - cos x 0
2
1
= - (0 - 1)
2
1
=
2

Example 12
The cumulative distribution function of a continuous random variable X is
\[ F(x) = \begin{cases} 1 - e^{-2x}, & x \ge 0 \\ 0, & x < 0 \end{cases} \]
Find (i) the probability density function, (ii) the mean, and (iii) the variance.

Solution
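(i) \[ f(x) = \frac{d}{dx}F(x) \]
\[ f(x) = \begin{cases} 2e^{-2x}, & x \ge 0 \\ 0, & x < 0 \end{cases} \]
(ii) \[ \mu = \int_{-\infty}^{\infty} x f(x)\,dx = \int_{-\infty}^{0} x f(x)\,dx + \int_0^{\infty} x f(x)\,dx \]
\[ = 0 + \int_0^{\infty} x \cdot 2e^{-2x}\,dx = 2\int_0^{\infty} x e^{-2x}\,dx \]
\[ = 2\left[ x\left(\frac{e^{-2x}}{-2}\right) - (1)\left(\frac{e^{-2x}}{4}\right) \right]_0^{\infty} \]
\[ = 2\left[(0 - 0) - \left(0 - \frac{1}{4}\right)\right] = \frac{1}{2} \]
(iii) \[ \operatorname{Var}(X) = \int_{-\infty}^{\infty} x^2 f(x)\,dx - \mu^2 = \int_{-\infty}^{0} x^2 f(x)\,dx + \int_0^{\infty} x^2 f(x)\,dx - \mu^2 \]
\[ = 0 + \int_0^{\infty} x^2 \cdot 2e^{-2x}\,dx - \left(\frac{1}{2}\right)^2 = 2\int_0^{\infty} x^2 e^{-2x}\,dx - \frac{1}{4} \]
\[ = 2\left[ x^2\left(\frac{e^{-2x}}{-2}\right) - 2x\left(\frac{e^{-2x}}{4}\right) + 2\left(\frac{e^{-2x}}{-8}\right) \right]_0^{\infty} - \frac{1}{4} \]
\[ = 2\left[(0 - 0 - 0) - \left(0 - 0 - \frac{1}{4}\right)\right] - \frac{1}{4} \]
\[ = \frac{1}{2} - \frac{1}{4} = \frac{1}{4} \]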

Example 13
A continuous random variable X has the distribution function
F ( x) = 0 x £1
= k ( x - 1)4 1< x £ 3
=1 x>3
Determine (i) f (x), (ii) k, and (iii) mean.
Solution
d
(i) f ( x ) = F ( x)
dx
f ( x) = 0 x £1
3
= 4 k ( x - 1) 1< x £ 3
=0 x>3
(ii) Since f (x) is a probability density function,

Ú-• f ( x) dx = 1
1 3 •
Ú-• f ( x) dx + Ú1 f ( x ) dx + Ú f ( x ) dx = 1
3
3
0 + Ú 4 k ( x - 1)3 dx + 0 = 1
1
3
( x - 1)4
4k =1
4 1
k (16 - 0) = 1
1
k=
16
Hence, f ( x ) = 0 x £1
1
= ( x - 1)3 1< x £ 3
4
=0 x>3

(iii) m = Ú x f ( x ) dx
-•
1 3 •
=Ú x f ( x ) dx + Ú x f ( x ) dx + Ú x f ( x ) dx
-• 1 3
3 1
= 0+Ú x ◊ ( x - 1)3 dx + 0
1 4
1 3
Ú1 x ( x - 1)
3
= dx
4

ÈPutting x - 1 = t ˘
1 2 Í ˙
4 Ú0
= (t + 1) t 3 dt Í When x = 1, t = 0 ˙
ÍÎ When x = 3, t = 2 ˙˚
1 2
Ú0 (t
4
= + t 3 ) dt
4
2
1 t5 t4
= +
4 5 4 0

1 ÈÊ 25 2 4 ˆ ˘
= ÍÁ + ˜ - (0)˙
4 ÎË 5 4¯ ˚
= 2.6

Example 14
If the density function of a random variable X is given by
f(x) = kx (1 – x), 0 £ x £ 1,
find (i) AM, (ii) HM, (iii) Median, (iv) Mode, (v) SD, (vi) MD about the
mean.
Solution
(i) Since f(x) is a probability density function,

Ú f ( x )dx = 1
-•
1

Ú kx(1 - x)dx = 1
0
1
k Ú ( x - x 2 ) dx = 1
0
1
x2 x3
k - =1
2 3
0

Ê 1 1ˆ
kÁ - ˜ =1
Ë 2 3¯
k =6
Hence, f ( x ) = 6 x(1 - x ), 0 £ x £ 1

(ii) AM = m = E(x) = Ú xf ( x) dx
-•

1
= Ú x ◊ 6 x(1 - x ) dx
0
1
= 6 Ú ( x 2 - x 3 ) dx
0
1
x3 x 4
=6 -
3 4
0

Ê 1 1ˆ
= 6Á - ˜
Ë 3 4¯
1
=
    2

1 1
(iii) = Ú f ( x ) dx
H -• x
1
1
=Ú ◊ 6 x(1 - x ) dx
0
x
1
= 6 Ú (1 - x ) dx
0
1
x2
=6 x-
2
0

Ê 1ˆ
= 6 Á1 - ˜
Ë 2¯
=3
1
H=
3
M
1
(iv) Ú f ( x) dx = 2
0
M
1
Ú 6 x(1 - x) dx = 2
0
M
1
6 Ú ( x - x 2 )dx =
0
2
M
x2 x3 1
6 - =
2 3 2
0

Ê M2 M3 ˆ 1
6Á - =
Ë 2 3 ˜¯ 2
1
3M 2 - 2 M 3 =
2
6M 2 - 4M 3 = 1
4M 3 - 6M 2 + 1 = 0
1 1 3
M= , ±
2 2 2
1 3
The values M = ± lie outside (0, 1).
2 2
1
Hence, M =
2
(v) Mode is the value of x for which f(x) is maximum. For f(x) to be maximum,
f¢(x) = 0 and f¢¢ (x) < 0.
f ¢( x ) = 0
6 - 12 x = 0
1
x=
2
f ¢¢( x ) = -12 < 0
1
Hence, f(x) is maximum at x =
2
1
Mode =
2
As the mean, median and mode are equal, the distribution is symmetrical.

(vi) $E(X^2) = \int_{-\infty}^{\infty} x^2 f(x)\,dx$
1
= Ú x 2 ◊ 6 x(1 - x ) dx
0
1
= 6 Ú ( x 3 - x 4 ) dx
0
1
x4 x5
=6 -
4 5
0

Ê 1 1ˆ
= 6Á - ˜
Ë 4 5¯
3
=
10

Var ( X ) = E ( X 2 ) - {E ( X )}2
2
3 Ê 1ˆ
= -
10 ÁË 2 ˜¯
1
=
20
1 1
SD = Var( X ) = =
20 2 5
(vii) Mean deviation about the mean

MD = Ú x - m f ( x ) dx
-•
1
1
= Ú x- 6 x(1 - x ) dx
0
2
1
2 1
Ê1 ˆ Ê 1ˆ
= Ú Á - x ˜ 6 x (1 - x ) dx + Ú Á x - ˜ 6 x (1 - x ) dx
Ë2 ¯ 1
Ë 2¯
0
2
1
2 1
= Ú (3 x - 9 x 2 + 6 x 3 )dx + Ú (-3 x + 9 x 2 - 6 x 3 ) dx
0 1
2
1
1
3x2 3x4 2 3x2 3x4
= - 3x3 + +- + 3x3 -
2 2 2 2 1
0
2

Ê3 3 3 ˆ Ê 3 3ˆ Ê 3 3 3 ˆ
= Á - + ˜ +Á- +3- ˜ -Á- + - ˜
Ë 8 8 32 ¯ Ë 2 2 ¯ Ë 8 8 32 ¯
3
=
16

Example 15
Prove that geometric mean G of the distribution
f(x) = 6(2 – x) (x – 1), 1 £ x £ 2
is given by 6 log(16G) = 19.

Solution

log G = Ú (log x) f ( x) dx
-•
2
= Ú (log x ) 6(2 - x ) ( x - 1) dx
1
2
= -6 Ú ( x 2 - 3 x + 2) log x dx
1

ÈÊ 3 ˆ
2
2
Ê x3 3 x2 ˆ 1 ˘˙
Í x 3x2
= -6 Á - + 2 x˜ log x - Ú Á - + 2 x ˜ dx
ÍË 3 2 ¯ 3 2 ¯x ˙
ÍÎ 1 1Ë ˙˚
ÈÊ 8 ˆ
2
Ê x2 3x ˆ ˘
= -6 ÍÁ - 6 + 4˜ log 2 - Ú Á - + 2 ˜ dx ˙
ÍÎË 3 ¯ 1Ë
3 2 ¯ ˙˚
È 2˘
Í 2 x3 3 x2
= -6 log 2 - - + 2x ˙
Í3 9 4 ˙
Î 1˚

È2 Ê8 ˆ Ê1 3 ˆ˘
= -6 Í log 2 - Á - 3 + 4˜ + Á - + 2˜ ˙
Î3 Ë9 ¯ Ë9 4 ¯˚
È2 17 49 ˘
= -6 Í log 2 - + ˙
Î3 9 36 ˚
19
= -4 log 2 +
6
19
log G + 4 log 2 =
6
19
log(G ¥ 2 4 ) =
6
19
log(16G ) =
6

Example 16
The probability distribution of a random variable X is
\[ f(x) = k \sin\frac{\pi}{5}x, \quad 0 \le x \le 5 \]
Determine the constant k and obtain the median and quartiles of the distribution.

Solution
Since f(x) is a probability distribution,

Ú f ( x ) dx = 1
-•
5
p
Ú k sin 5 x dx = 1
0
5
p
- cos x
k 5 =1
p
5 0
5k
(- cos p + cos 0) = 1
p
5k
[ -(-1) + 1] = 1
p
10 k
=1
p
p
k=
10
p p
Hence,           f ( x ) = sin x, 0£ x£5
10 5
The rth quartile Qr is given by
Qr
r
Ú f ( x ) dx =
4
, r = 1, 2, 3
-•
Qr
p p r
Ú 10 sin 5 x dx = 4
   0

Qr
p
- cos x
p 5 r
=
10 p 4
5 0
1Ê p ˆ r
Á - cos Qr + cos 0˜ =
2 Ë 5 ¯ 4
p r
- cos Qr + 1 =
       5 2

p r
cos Qr = 1 -
5 2
p Ê rˆ
Qr = cos-1 Á 1 - ˜
5 Ë 2¯
5 Ê rˆ
Qr = cos-1 Á 1 - ˜
p Ë 2¯
5 Ê 1ˆ 5 Ê 1ˆ 5 Ê p ˆ 5
Q1 = cos-1 Á 1 - ˜ = cos-1 Á ˜ = Á ˜ =
p Ë 2¯ p Ë 2¯ p Ë 3 ¯ 3
5 5 5 Êpˆ 5
Q2 = cos-1 (1 - 1) = cos-1 (0) = Á ˜ =
p p p Ë 2¯ 2
5 Ê 3ˆ 5 Ê 1 ˆ 5 Ê 2p ˆ 10
Q3 = cos-1 Á 1 - ˜ = cos-1 Á - ˜ = Á ˜ =
p Ë 2¯ p Ë 2¯ p Ë 3 ¯ 3

5
         Median = Q2 =
2
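The quartile formula $Q_r = \frac{5}{\pi}\cos^{-1}\left(1 - \frac{r}{2}\right)$ is easy to evaluate and to verify against the distribution function. In the sketch below the names `quartile` and `cdf` are assumptions made for illustration; `cdf` is the integral of the density from 0 to x.

```python
import math


def quartile(r):
    """r-th quartile of f(x) = (pi/10) sin(pi x / 5) on [0, 5], r = 1, 2, 3."""
    return (5 / math.pi) * math.acos(1 - r / 2)


def cdf(x):
    # F(x) = (1 - cos(pi x / 5)) / 2 for 0 <= x <= 5
    return (1 - math.cos(math.pi * x / 5)) / 2


q1, q2, q3 = (quartile(r) for r in (1, 2, 3))
print(q1, q2, q3)  # close to 5/3, 5/2 and 10/3
```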

Example 17
Find the median, mode and quartile deviation of a continuous random variable X, given that its density function is
\[ f(x) = \frac{k}{1 + x^2}, \quad -\infty < x < \infty. \]
Solution
(i) Since f(x) is a probability density function,

Ú f ( x )dx = 1
-•

k
Ú 1 + x 2 dx = 1
-•

1 È a a ˘
2k Ú 2
dx = 1 Í ∵ Ú f ( x ) dx = 2 Ú f ( x ) dx , if f ( x ) is even function ˙
0 1+ x ÍÎ - a 0 ˙˚

2k tan -1 x =1
0

2 k (tan -1 • - tan -1 0) = 1
Êpˆ
2k Á ˜ = 1
Ë 2¯
1
k=
p
1
Hence, f ( x) = ,-• < x < •
    p (1 + x 2 )

(ii) The rth quartile Qr is given by


Qr
r
Ú f ( x ) dx =
4
, r = 1, 2, 3
-•
Qr
1 r
Ú p (1 + x 2 ) dx = 4
-•
1 Qr r
tan -1 x =
p -• 4
1 È -1 r
Î tan Qr - tan -1 (-•)˘˚ =
p 4
1 È -1 Ê p ˆ ˘ r
Ítan Qr - ÁË - ˜¯ ˙ =
pÎ 2 ˚ 4
p
tan -1 Qr = (r - 2)
4
Ïp ¸
Qr = tan Ì (r - 2) ˝
Ó4 ˛
Ê pˆ
Q1 = tan Á - ˜ = -1
Ë 4¯
Q2 = tan(0) = 0
Êpˆ
Q3 = tan Á ˜ = 1
Ë 4¯
1
QD = (Q - Q1 )
2 3
1
= [1 - (-1)]
2
=1
(iii) Median Q2 = 0
(iv) Mode is the value of x for which f(x) is maximum. For f(x) to be maximum
f ¢(x) = 0 and f ¢¢(x) < 0.

f ¢( x ) = 0
2x
- =0
p (1 + x 2 )2
x=0
2È (1 + x 2 )2 - x ◊ 2(1 + x 2 ) ◊ 2 x ˘
f ¢¢( x ) = - Í ˙
pÍÎ (1 + x 2 )4 ˙˚
2 È 3x2 - 1 ˘
= Í ˙
p ÎÍ (1 + x 2 )3 ˚˙
2
f ¢¢(0) = -
<0
p
Hence, f(x) is maximum at x = 0.
Mode = 0

Example 18
Find the mean, variance and the coefficients b1, b2 of the distribution
f(x) = kx2e–x, 0 < x < •
Solution
Since f(x) is a probability density function,

Ú f ( x )dx = 1
-•

Ú kx
2 -x
e dx = 1
   0


k x 2 (-e - x ) - 2 x e - x + 2(-e - x ) =1
0

k (2 e 0 ) = 1
1
k=
2
1 2 -x
Hence, f ( x) = x e ,0 < x < •
2

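The rth moment about the origin is
\[ \mu_r' = \int_{-\infty}^{\infty} x^r f(x)\,dx = \int_0^{\infty} x^r \cdot \frac{1}{2}x^2 e^{-x}\,dx = \frac{1}{2}\int_0^{\infty} e^{-x} x^{r+2}\,dx \]
\[ = \frac{1}{2}\,\Gamma(r + 3) = \frac{1}{2}(r + 2)! \]
\[ \mu_1' = \frac{1}{2}(3!) = 3, \quad \mu_2' = \frac{1}{2}(4!) = 12, \quad \mu_3' = \frac{1}{2}(5!) = 60, \quad \mu_4' = \frac{1}{2}(6!) = 360 \]
\[ \mu_2 = \mu_2' - (\mu_1')^2 = 12 - (3)^2 = 3 \]
\[ \mu_3 = \mu_3' - 3\mu_2'\mu_1' + 2(\mu_1')^3 = 60 - 3(12)(3) + 2(3)^3 = 6 \]
\[ \mu_4 = \mu_4' - 4\mu_3'\mu_1' + 6\mu_2'(\mu_1')^2 - 3(\mu_1')^4 = 360 - 4(60)(3) + 6(12)(3)^2 - 3(3)^4 = 45 \]
Mean $= \mu_1' = 3$
Variance $= \mu_2 = 3$
\[ \beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{(6)^2}{(3)^3} = \frac{4}{3} \]
\[ \beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{45}{(3)^2} = 5 \]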
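The conversion from moments about the origin to central moments is mechanical and can be checked with exact arithmetic. The sketch below assumes the raw moments $\mu_r' = \frac{1}{2}(r + 2)!$ from this example; the function names `raw_moment` and `central_moments` are introduced here only for illustration.

```python
from fractions import Fraction
from math import factorial


def raw_moment(r):
    # mu'_r = (r + 2)! / 2 for f(x) = (1/2) x^2 e^{-x}, x > 0
    return Fraction(factorial(r + 2), 2)


def central_moments(m1, m2, m3, m4):
    """Convert moments about the origin into central moments mu_2, mu_3, mu_4."""
    mu2 = m2 - m1**2
    mu3 = m3 - 3 * m2 * m1 + 2 * m1**3
    mu4 = m4 - 4 * m3 * m1 + 6 * m2 * m1**2 - 3 * m1**4
    return mu2, mu3, mu4


mu2, mu3, mu4 = central_moments(*(raw_moment(r) for r in (1, 2, 3, 4)))
beta1 = mu3**2 / mu2**3   # skewness coefficient
beta2 = mu4 / mu2**2      # kurtosis coefficient
print(mu2, mu3, mu4, beta1, beta2)
```

Only `raw_moment` depends on the particular density, so the same conversion applies to any distribution whose raw moments are known.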

Example 19
The probability density function of a random variable X is given by
f(x) = kx (2 – x), 0 ≤ x ≤ 2. Find mean, variance b1 and b2.
Solution
Since f(x) is a probability density function,

Ú f ( x ) dx = 1
-•

\[ \int_0^2 kx(2 - x)\,dx = 1 \]
\[ k\int_0^2 (2x - x^2)\,dx = 1 \]
\[ k\left[x^2 - \frac{x^3}{3}\right]_0^2 = 1 \]

Ê 8ˆ
kÁ4- ˜ =1
Ë 3¯
3
k=
4
3
Hence,     f ( x ) = x(2 - x ) , 0 ≤ x ≤ 2
4

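\[ \mu_r' = \int_{-\infty}^{\infty} x^r f(x)\,dx = \int_0^2 x^r \cdot \frac{3}{4}x(2 - x)\,dx = \frac{3}{4}\int_0^2 x^{r+1}(2 - x)\,dx \]
\[ = \frac{3}{4}\left[\frac{2x^{r+2}}{r + 2} - \frac{x^{r+3}}{r + 3}\right]_0^2 = \frac{3(2^{r+1})}{(r + 2)(r + 3)} \]
\[ \mu_1' = \frac{3(2^2)}{(3)(4)} = 1 \]
\[ \mu_2' = \frac{3(2^3)}{(4)(5)} = \frac{6}{5} \]
\[ \mu_3' = \frac{3(2^4)}{(5)(6)} = \frac{8}{5} \]
\[ \mu_4' = \frac{3(2^5)}{(6)(7)} = \frac{16}{7} \]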
6 1
m2 = m2¢ - ( m1¢ )2 = -1 =
5 5
8 Ê 6ˆ
m3 = m3¢ - 3m2¢ m1¢ + 2( m1¢ )3 = - 3 Á ˜ (1) + 2 = 0
5 Ë 5¯
m 4 = m 4¢ - 4 m3¢ m1¢ + 6 m2¢ ( m1¢ )2 - 3( m1¢ )4


16 Ê 8ˆ Ê 6ˆ
= - 4 Á ˜ (1) + 6 Á ˜ (1)2 - 3(1)4
7 Ë 5¯ Ë 5¯
3
=
35
Mean = m1¢ = 1
1
Variance = m2 =
5
m32
b1 = =0
m23
3
m435 15
b2 = 2 = =
m2 Ê 1 ˆ 2 7
ÁË 5 ˜¯

Example 20
Show that for the symmetrical distribution
\[ f(x) = \frac{2a}{\pi}\left(\frac{1}{a^2 + x^2}\right), \quad -a \le x \le a \]
\[ \mu_2 = \frac{a^2(4 - \pi)}{\pi} \quad \text{and} \quad \mu_4 = a^4\left(1 - \frac{8}{3\pi}\right) \]

Solution
• a
2a Ê 1 ˆ
Ú f ( x ) dx = Ú p ÁË a 2 + x 2 ˜¯
dx
-• -a
a
2a 1 x
= tan -1
p a a -a
a
2 x
= tan -1
p a -a

\[ = \frac{2}{\pi}\left[\tan^{-1}(1) - \tan^{-1}(-1)\right] = \frac{2}{\pi}\left(\frac{\pi}{4} + \frac{\pi}{4}\right) = 1 \]
Hence, f(x) represents a probability density function.


m1¢ = Ú x f ( x) dx
-•
a
2a Ê 1 ˆ
= Úxp ÁË 2 ˜ dx
a + x2 ¯
-a
a
2a x
= Ú
p -a a + x2
2
dx

a
2a 1
= log(a 2 + x 2 )
p 2 -a
=0 [∵ integrand is an odd function of x ]


\[ \mu_2' = \int_{-\infty}^{\infty} x^2 f(x)\,dx \]
a
2a Ê 1 ˆ
Úx
2
= dx
-a
p ÁË a 2 + x 2 ˜¯
a
2a x2
p -Úa a 2 + x 2
= dx

a
4a x 2 + a2 - a 2
p Ú0 a 2 + x 2
= dx

4a Ê a2 ˆ
a
= Ú 1- 2
p 0 Ë a + x 2 ˜¯
Á dx

a
4a x
= x - a tan -1
p a
0

4a
= (a - a tan -1 1)
p
4a Ê pˆ
= Á a-a ˜
p Ë 4¯
a 2 (4 - p )
=
p
a 2 (4 - p ) a 2 (4 - p )
m2 = m2¢ - ( m1¢ )2 = -0 =
p p
m 4 = m 4¢ (∵ m1¢ = 0)

Úx
4
m4 = ◊ f ( x ) dx
        -•

a
2a Ê 1 ˆ
Úx
4
= ◊ dx
-a
p ÁË a 2 + x 2 ˜¯
a
2a x4
= Ú
p - a a2 + x2
dx

4a Ê 2 a4 ˆ
a

p Ú0 ÁË
2
= x - a + ˜ dx
a2 + x2 ¯
a
4a 1 3 x
= x - a 2 x + a 3 tan -1
p 3 a 0

4a Ê a3 ˆ
= Á - a 3 + a 3 tan -1 1˜
p Ë 3 ¯
4a Ê a3 pˆ
= Á - a3 + a3 ˜
p Ë 3 4¯
Ê 8 ˆ
= a4 Á1 - ˜
Ë 3p ¯

Exercise 3.4
1. If the probability density function is given by
f (x) = kx 2 (1 - x 3 ) 0 £ x £1
=0 otherwise
Ê 1ˆ
  Find (i) k, (ii) P Á 0 < X < ˜ , (iii) X , and (iv) s 2 .
Ë 2¯

È 15 9 9 ˘
Í ans.: (i) 6 (ii) 64 (iii) 14 (iv) 245 ˙
 Î ˚
2. If the probability density function of a random variable is given by
f (x) = kx 0£x£2
= 2k 2£x£4
= 6 k - kx 4£x£6


  Find (i) k, (ii) P(1 £ X £ 3), and (iii) X .
È 1 1 383 ˘
Í ans.: (i) 2 (ii) 3 (iii) 36 ˙
 Î ˚

3. If the probability density of a random variable is given by


x
-
f ( x) = k x e 3
x>0
=0 x£0


  Find (i) k, (ii) X , and (iii) s2.
È 1 ˘
Í ans.: (i) 9 (ii) 6 (iii) 18 ˙
 Î ˚
4. A continuous random variable has the probability density function
f (x) = 2 e -2 x x>0
=0 x£0

  Find (i) E(X), (ii) E(X ), (iii) Var (X), and (iv) SD of X.
È 1 1 1 1˘
Í ans.: (i) 2 (ii) 2 (iii) 4 (iv) 2 ˙
 Î ˚
5. A random variable X has the pdf
k
f ( x) = ,-• < x < •
1+ x 2
  Determine (i) k, (ii) P(X ≥ 0), (iii) mean, and (iv) variance.
È 1 1 ˘
Í ans.: (i) p (ii) 2 (iii) 0 (iv) does not exist ˙
 Î ˚
6. The distribution function of a continuous random variable X is given by
F (x) = 1 - (1 + x) e - x , x ≥ 0. Find (i) pdf, (ii) mean, and (iii) variance.
ÈÎ ans.: (i) f (x) = x e - x , x ≥ 0 (ii) 2 (iii) 2 ˘˚

7. If f(x) is the probability density function of a continuous random variable,
find k, mean, and variance.
f (x) = kx 2 0 £ x £1
2
= (2 - x) 1£ x £ 2

È 11 ˘
Í ans.: 2, 12 , 0.626 ˙
 Î ˚
8. A continuous random variable X has the probability density function given
by
f (x) = 2ax + b 0£x£2
=0 otherwise

  If the mean of the distribution is 3, find the constants a and b.


È 3 5˘
Í ans.: 2 , - 2 ˙
 Î ˚
9. If X is a continuous random variable with probability density function
given by
f (x) = k (x - x 3 ) 0 £ x £1
=0 otherwise

  Find (i) k, (ii) mean, (iii) variance, and (iv) median.


È 1 ˘
Í ans.: (i) 2 (ii) 0.06 (iii) 0.04 (iv) 2 ˙
 Î ˚
10. The probability density function of a random variable is given by
f ( x) = 0 x<2
2x + 3
= 2£x£4
18
=0 x>4

  Find the mean and variance.


È 83 ˘
Í ans.: (i) 27 , 0.33˙
 Î ˚
11. A continuous random variable X has the probability density function
f ( x) = x 3 0 £ x £1
3
= (2 - x) 1£ x £ 2
=0 otherwise

  Find P(0.5 £ X £ 1.5) and mean of the distribution.
È 15 1 ˘
Í ans.: 32 , 2 ˙
 Î ˚
12. The probability density function of a continuous random variable X is
given by
f(x) = kx (2 – x)    0 £ x £ 2
  Find k, mean, and variance.
È 3 1˘
Í ans.: 4 , 1, 5 ˙
 Î ˚
13. If the density function of a continuous random variable X is given by f(x)
= le-l(x - a), a ≤ x < •, show that b1 = 4 and b2 = 9.

14. If the continuous random variable has the density function
kx
f ( x) = , x ≥ 0, find the value of k, median and mode.
(1 + x)3
È 1˘
Í ans.: 2, 1 + 2, 2 ˙
Î ˚
15. The density function of a continuous random variable X is given by
3
f (x) = x(2 - x),0 £ x £ 2. Find the mean, median, mode, harmonic
4
mean, MD about mean and SD.
È 2 3 1 ˘
Í ans.: 1, 1, 1, , , ˙
Î 3 8 5˚

16. The density function of a continuous random variable X is given by


\[ f(x) = \begin{cases} \dfrac{1}{16}(3 + x)^2, & -3 \le x \le -1 \\ \dfrac{1}{16}(6 - 2x^2), & -1 \le x \le 1 \\ \dfrac{1}{16}(3 - x)^2, & 1 \le x \le 3 \end{cases} \]
Find the mean, SD and MD about the mean.
È 13 ˘
Í ans.: 0, 1, 16 ˙
Î ˚

3.8 Expected Values of Two Dimensional Random Variables

If (X, Y) is a two-dimensional discrete random variable with joint probability mass function $P(x_i, y_j) = p_{ij}$, then the mathematical expectation of a function g(X, Y) is given by
\[ E[g(X, Y)] = \sum_{j=1}^{\infty}\sum_{i=1}^{\infty} g(x_i, y_j)\,p_{ij} = \sum_x \sum_y g(x, y) f(x, y) \]

If (X, Y) is a two-dimensional continuous random variable with joint probability density function f(x, y), then the mathematical expectation of a function g(X, Y) is given by
\[ E[g(X, Y)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x, y) f(x, y)\,dx\,dy \]
3.8 Expected Values of Two Dimensional Random Variables 3.69

3.8.1 Properties of Expected Values of Two Dimensional Random Variables
(i) If X and Y are random variables, then E(X+Y) = E(X) + E(Y) provided all the
expectations exist.
(ii) If X and Y are independent random variables then E(XY) = E(X) . E(Y).

3.8.2 Conditional Expectation and Conditional Variance


If (X, Y) is a two-dimensional discrete random variable with joint probability mass function $p_{ij}$, then the conditional expectation of g(X, Y) is given by
\[ E\{g(X, Y)/Y = y_j\} = \sum_{i=1}^{\infty} g(x_i, y_j)\,P(X = x_i / Y = y_j) \]
\[ = \sum_{i=1}^{\infty} \frac{g(x_i, y_j)\,P(X = x_i, Y = y_j)}{P(Y = y_j)} = \sum_{i=1}^{\infty} g(x_i, y_j)\,\frac{p_{ij}}{p_{*j}} \]

In particular, the conditional expectation of a discrete random variable X given $Y = y_j$ is given by
\[ E(X / Y = y_j) = \sum_{i=1}^{\infty} x_i\,P(X = x_i / Y = y_j) \]

The conditional variance of X given $Y = y_j$ is given by
\[ \operatorname{Var}(X / Y = y_j) = E\left[\{X - E(X / Y = y_j)\}^2 / Y = y_j\right] \]

If (X, Y) is a two-dimensional continuous random variable with joint probability density function f(x, y), then the conditional expectation of g(X, Y) is given by
\[ E\{g(X, Y)/Y = y\} = \int_{-\infty}^{\infty} g(x, y) f(x/y)\,dx = \frac{\int_{-\infty}^{\infty} g(x, y) f(x, y)\,dx}{f_Y(y)} \]

In particular, the conditional expectation of X given Y = y is given by
\[ E(X / Y = y) = \frac{\int_{-\infty}^{\infty} x f(x, y)\,dx}{f_Y(y)} \]
Similarly,
\[ E(Y / X = x) = \frac{\int_{-\infty}^{\infty} y f(x, y)\,dy}{f_X(x)} \]

The conditional variance of X is given by


Var (X/Y = y) = E[{X – E(X/Y = y)}2/ Y = y]
Similarly, Var(Y/X = x) = E[{Y – E(Y/X = x)}2 /X = x]

3.8.3 Properties of Conditional Expectation


(i) If X and Y are independent random variables, then
E(Y/X) = E(Y)
and E(X/Y) = E(X)
(ii) $E(XY) = E[X \cdot E(Y/X)]$
(iii) $E(X^2Y^2) = E[X^2 \cdot E(Y^2/X)]$

Example 1
Given a pair of discrete random variables X and Y whose joint probability distribution is given by

Y \ X      2       4
1          0.1     0.15
2          0.2     0.3
3          0.1     0.15

Find the expected value of the function g(X, Y) given that g(X, Y) = 2X + Y.
Solution
E[ g( x, y)] = ÂÂ g( x, y) f ( x, y)
x y

= ÂÂ (2 x + y) f ( x, y)
x y

= {2(2) + 1}0.1 + {2(4) + 1}0.15


+ {2(2) + 2}0.2 + {2(4) + 2}0.3
+ {2(2) + 3}0.1 + {2(4) + 3}0.15
= 8.4

Example 2
Let X and Y be two random variables each taking values –1, 0 and 1 and having the joint probability distribution as given below:

Y \ X          –1      0       1       Total p(y)
–1             0       0.1     0.1     0.2
0              0.2     0.2     0.2     0.6
1              0       0.1     0.1     0.2
Total p(x)     0.2     0.4     0.4     1.0

(i) Show that X and Y have different expectations.
(ii) Find E(XY).
(iii) Find Var(X) and Var(Y).
(iv) Given that Y = 0, what is the conditional probability distribution of X?
Solution
(i) E ( X ) = Â xp( x )
= -1(0.2) + 0(0.4) + 1(0.4)
= 0.2
E (Y ) = Â yp( y)
= -1(0.2) + 0(0.6) + 1(0.2)
=0
E ( X ) π E (Y )
(ii) E ( XY ) = Â xi y j pij
= (-1)(-1)(0) + (0)(-1)(0.1) + (1)(-1)(0.1)
+ (-1)(0)0.2 + (0)(0)(0.2) + (1)(0)(0.2)
+ (-1)(1)(0) + (0)(1)(0.1) + (1)(1)(0.1)
=0

(iii) E ( X 2 ) = Â x 2 p( x )
= (-1)2 (0.2) + 0(0.4) + (1)2 (0.4)
= 0.6
Var( X ) = E ( X 2 ) - {E ( X )}2
= 0.6 - (0.2)2
= 0.56

E (Y 2 ) = Â y 2 p( y)
= (-1)2 (0.2) + 0(0.6) + (1)2 (0.2)
= 0.4

Var(Y ) = E (Y 2 ) - {E (Y )}2
= 0.4 - 0
= 0.4
P ( X = -1, Y = 0)
(iv) P ( X = -1 / Y = 0) =
P (Y = 0)
0.2 1
= =
0.6 3
P ( X = 0, Y = 0)
P ( X = 0 / Y = 0) =
P (Y = 0)
0.2 1
= =
0.6 3
P ( X = 1, Y = 0)
P ( X = 1 / Y = 0) =
P (Y = 0)
0.2 1
= =
0.6 3
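Every quantity in this example is a weighted sum over the nine cells of the joint table, so the whole solution can be reproduced with one helper. The sketch below stores the table as a dictionary keyed by (x, y); the names `pmf` and `expect` are assumptions made for illustration.

```python
from fractions import Fraction as F

# Joint pmf of Example 2, keyed by (x, y).
pmf = {
    (-1, -1): F(0),     (0, -1): F(1, 10), (1, -1): F(1, 10),
    (-1, 0):  F(2, 10), (0, 0):  F(2, 10), (1, 0):  F(2, 10),
    (-1, 1):  F(0),     (0, 1):  F(1, 10), (1, 1):  F(1, 10),
}


def expect(g):
    """E[g(X, Y)] as a double sum over the joint pmf."""
    return sum(g(x, y) * p for (x, y), p in pmf.items())


e_x = expect(lambda x, y: x)
e_y = expect(lambda x, y: y)
e_xy = expect(lambda x, y: x * y)
var_x = expect(lambda x, y: x * x) - e_x**2
var_y = expect(lambda x, y: y * y) - e_y**2

# Conditional distribution of X given Y = 0: joint cell divided by P(Y = 0).
p_y0 = sum(p for (x, y), p in pmf.items() if y == 0)
cond_x_given_y0 = {x: pmf[(x, 0)] / p_y0 for x in (-1, 0, 1)}
```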

Example 3
If the joint pdf of (X, Y) is given by
\[ f(x, y) = \begin{cases} \dfrac{16y}{x^3}, & x > 2,\ 0 < y < 1 \\ 0, & \text{elsewhere} \end{cases} \]
then find E(XY).
Solution
\[ E(XY) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy f(x, y)\,dx\,dy \]
1•
Ê 16 y ˆ
= Ú Ú xy Á 3 ˜ dx dy
Ë x ¯
02
1•
Ê y2 ˆ
= 16 Ú Ú Á 2 ˜ dx dy
0 2Ë x ¯
1 •
1
= 16 Ú y 2 - dy
0
x 2
1
1 2
= 16 Ú y dy
0
2


1
y3
=8
3
0
8
= (1 - 0)
3
8
=
           3

Example 4
The joint PDF of (X,Y) is given by
f(x, y) = 24xy ,   x > 0, y > 0, x + y £ 1
    = 0   ,   elsewhere
Find the conditional mean and variance of Y, given X.
Solution
The region of integration is ΔOAB (Fig. 3.4), bounded by the axes and the line x + y = 1.
In ΔOAB, along the vertical strip PQ, y varies from y = 0 to y = 1 – x and x varies from x = 0 to x = 1.
\[ f_X(x) = \int_{-\infty}^{\infty} f(x, y)\,dy = \int_0^{1-x} 24xy\,dy = 24x\left[\frac{y^2}{2}\right]_0^{1-x} = 12x(1 - x)^2, \quad 0 \le x \le 1 \]


f ( x, y )
f ( y /x ) =
fX ( x)
24 xy
=
12 x(1 - x )2
2y
=
(1 - x )2

E (Y / X = x ) = Ú yf ( y /x)dy
-•
1- x
2y
= Ú y.
(1 - x )2
dy
      0

1- x
2 y3
=
(1 - x )2 3
0
2
= (1 - x )
3

E (Y 2 / x ) = Úy
2
f ( y /x )dy
-•
1- x
2y
= Ú y2
(1 - x )2
dy
     0
\[ = \frac{2}{(1 - x)^2}\left[\frac{y^4}{4}\right]_0^{1-x} = \frac{1}{2}(1 - x)^2 \]
\[ \operatorname{Var}(Y/X = x) = E(Y^2/x) - \{E(Y/x)\}^2 \]
\[ = \frac{1}{2}(1 - x)^2 - \left\{\frac{2}{3}(1 - x)\right\}^2 = \frac{1}{2}(1 - x)^2 - \frac{4}{9}(1 - x)^2 = \frac{1}{18}(1 - x)^2 \]

Example 5
Two random variables X and Y have the following joint probability density function:
\[ f(x, y) = \begin{cases} 2 - x - y, & 0 \le x \le 1,\ 0 \le y \le 1 \\ 0, & \text{otherwise} \end{cases} \]
Find (i) Marginal probability density functions of X and Y
(ii) Conditional density functions
(iii) Var(X) and Var(Y)
(iv) Covariance between X and Y
Solution
(i) \[ f_X(x) = \int_{-\infty}^{\infty} f(x, y)\,dy = \int_0^1 (2 - x - y)\,dy = \left[2y - xy - \frac{y^2}{2}\right]_0^1 = \frac{3}{2} - x \]
\[ \therefore f_X(x) = \begin{cases} \dfrac{3}{2} - x, & 0 < x < 1 \\ 0, & \text{otherwise} \end{cases} \]
Similarly,
\[ f_Y(y) = \begin{cases} \dfrac{3}{2} - y, & 0 < y < 1 \\ 0, & \text{otherwise} \end{cases} \]
(ii) \[ f_{X/Y}(x/y) = \frac{f(x, y)}{f_Y(y)} = \frac{2 - x - y}{\frac{3}{2} - y}, \quad 0 < x < 1,\ 0 < y < 1 \]
\[ f_{Y/X}(y/x) = \frac{f(x, y)}{f_X(x)} = \frac{2 - x - y}{\frac{3}{2} - x} \]
(iii) \[ E(X) = \int_{-\infty}^{\infty} x f_X(x)\,dx = \int_0^1 x\left(\frac{3}{2} - x\right)dx = \left[\frac{3x^2}{4} - \frac{x^3}{3}\right]_0^1 = \frac{3}{4} - \frac{1}{3} = \frac{5}{12} \]
\[ E(Y) = \int_{-\infty}^{\infty} y f_Y(y)\,dy = \int_0^1 y\left(\frac{3}{2} - y\right)dy = \left[\frac{3y^2}{4} - \frac{y^3}{3}\right]_0^1 = \frac{5}{12} \]
\[ E(X^2) = \int_{-\infty}^{\infty} x^2 f_X(x)\,dx = \int_0^1 x^2\left(\frac{3}{2} - x\right)dx = \left[\frac{x^3}{2} - \frac{x^4}{4}\right]_0^1 = \frac{1}{2} - \frac{1}{4} = \frac{1}{4} \]
\[ \operatorname{Var}(X) = E(X^2) - \{E(X)\}^2 = \frac{1}{4} - \left(\frac{5}{12}\right)^2 = \frac{11}{144} \]
Similarly, $\operatorname{Var}(Y) = \frac{11}{144}$
(iv) \[ E(XY) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy f(x, y)\,dx\,dy = \int_0^1\int_0^1 xy(2 - x - y)\,dx\,dy \]
\[ = \int_0^1 y\left[x^2 - \frac{x^3}{3} - \frac{x^2 y}{2}\right]_0^1 dy = \int_0^1 y\left(1 - \frac{1}{3} - \frac{y}{2}\right)dy = \int_0^1 \left(\frac{2y}{3} - \frac{y^2}{2}\right)dy \]
\[ = \left[\frac{y^2}{3} - \frac{y^3}{6}\right]_0^1 = \frac{1}{3} - \frac{1}{6} = \frac{1}{6} \]
\[ \operatorname{Cov}(X, Y) = E(XY) - E(X)E(Y) = \frac{1}{6} - \left(\frac{5}{12}\right)\left(\frac{5}{12}\right) = -\frac{1}{144} \]

Example 6
If the joint pdf of (X, Y) is given by
f(x, y) = 24y(1 – x), 0 ≤ y ≤ x ≤ 1,
then find E(XY).
Solution
The region of integration is ΔOAB (Fig. 3.5), bounded by y = 0, x = 1 and y = x. In ΔOAB, along the horizontal strip P′Q′, x varies from x = y to x = 1 and y varies from y = 0 to y = 1.
\[ E(XY) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy f(x, y)\,dx\,dy = \int_0^1\int_y^1 xy \cdot 24y(1 - x)\,dx\,dy = 24\int_0^1\int_y^1 xy^2(1 - x)\,dx\,dy \]
\[ = 24\int_0^1 y^2\left[\frac{x^2}{2} - \frac{x^3}{3}\right]_y^1 dy = 24\int_0^1 y^2\left(\frac{1}{2} - \frac{1}{3} - \frac{y^2}{2} + \frac{y^3}{3}\right)dy \]
\[ = 24\int_0^1 \left(\frac{y^2}{6} - \frac{y^4}{2} + \frac{y^5}{3}\right)dy = 24\left[\frac{y^3}{18} - \frac{y^5}{10} + \frac{y^6}{18}\right]_0^1 \]
\[ = 24\left(\frac{1}{18} - \frac{1}{10} + \frac{1}{18}\right) = \frac{4}{15} \]

Example 7
Two random variables have joint pdf
\[ f(x, y) = \begin{cases} \dfrac{xy}{96}, & 0 < x < 4,\ 1 < y < 5 \\ 0, & \text{elsewhere} \end{cases} \]
Find (i) E(X) (ii) E(Y) (iii) E(XY) (iv) E(2X + 3Y) (v) Var(X) (vi) Var(Y) (vii) Cov(X, Y)
Solution
(i) \[ E(X) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x f(x, y)\,dx\,dy = \int_1^5\int_0^4 x\left(\frac{xy}{96}\right)dx\,dy = \frac{1}{96}\int_1^5 y\left[\frac{x^3}{3}\right]_0^4 dy \]
\[ = \frac{1}{96}\int_1^5 y\left(\frac{64}{3}\right)dy = \frac{2}{9}\left[\frac{y^2}{2}\right]_1^5 = \frac{2}{9}\left(\frac{25}{2} - \frac{1}{2}\right) = \frac{8}{3} \]
(ii) \[ E(Y) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} y f(x, y)\,dx\,dy = \int_1^5\int_0^4 y\left(\frac{xy}{96}\right)dx\,dy = \frac{1}{96}\int_1^5 y^2\left[\frac{x^2}{2}\right]_0^4 dy \]
\[ = \frac{1}{96}\int_1^5 8y^2\,dy = \frac{1}{12}\left[\frac{y^3}{3}\right]_1^5 = \frac{1}{36}(125 - 1) = \frac{31}{9} \]
(iii) \[ E(XY) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy f(x, y)\,dx\,dy = \int_1^5\int_0^4 xy\left(\frac{xy}{96}\right)dx\,dy = \frac{1}{96}\int_1^5\int_0^4 x^2y^2\,dx\,dy \]
\[ = \frac{1}{96}\int_1^5 y^2\left[\frac{x^3}{3}\right]_0^4 dy = \frac{1}{288}\int_1^5 64y^2\,dy = \frac{2}{9}\left[\frac{y^3}{3}\right]_1^5 = \frac{2}{27}(125 - 1) = \frac{248}{27} \]
(iv) \[ E(2X + 3Y) = 2E(X) + 3E(Y) = 2\left(\frac{8}{3}\right) + 3\left(\frac{31}{9}\right) = \frac{47}{3} \]
(v) \[ E(X^2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x^2 f(x, y)\,dx\,dy = \int_1^5\int_0^4 x^2\left(\frac{xy}{96}\right)dx\,dy = \frac{1}{96}\int_1^5 y\left[\frac{x^4}{4}\right]_0^4 dy \]
\[ = \frac{1}{384}\int_1^5 256y\,dy = \frac{2}{3}\left[\frac{y^2}{2}\right]_1^5 = \frac{1}{3}(25 - 1) = 8 \]
\[ \operatorname{Var}(X) = E(X^2) - \{E(X)\}^2 = 8 - \left(\frac{8}{3}\right)^2 = \frac{8}{9} \]
(vi) \[ E(Y^2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} y^2 f(x, y)\,dx\,dy = \int_1^5\int_0^4 y^2\left(\frac{xy}{96}\right)dx\,dy = \frac{1}{96}\int_1^5 y^3\left[\frac{x^2}{2}\right]_0^4 dy \]
\[ = \frac{1}{192}\int_1^5 16y^3\,dy = \frac{1}{12}\left[\frac{y^4}{4}\right]_1^5 = \frac{1}{48}(625 - 1) = 13 \]
\[ \operatorname{Var}(Y) = E(Y^2) - \{E(Y)\}^2 = 13 - \left(\frac{31}{9}\right)^2 = \frac{92}{81} \]
(vii) \[ \operatorname{Cov}(X, Y) = E(XY) - E(X)E(Y) = \frac{248}{27} - \left(\frac{8}{3}\right)\left(\frac{31}{9}\right) = 0 \]

Example 8
Let f(x, y) = 8xy, 0<x<y<1
      = 0 , elsewhere
Find (i) E (Y/X = x) (ii) E(XY/X = x) (iii) Var(Y/X = x).
Solution
The region of integration is ΔOAB (Fig. 3.6), bounded by x = 0, y = 1 and y = x. In ΔOAB, along the vertical strip PQ, y varies from y = x to y = 1 and x varies from x = 0 to x = 1.
\[ f_X(x) = \int_{-\infty}^{\infty} f(x, y)\,dy = \int_x^1 8xy\,dy = 8x\left[\frac{y^2}{2}\right]_x^1 = 4x(1 - x^2), \quad 0 < x < 1 \]
In ΔOAB, along the horizontal strip P′Q′,



Limits of x : x = 0 to x = y and y varies from y = 0 to y = 1



fY ( y) = Ú f ( x, y) dx
-•
y
= Ú 8 xy dx
0
y
x2
= 8y
2
0
3
= 4y , 0 < y <1
f ( x, y )
f X /Y ( x / y) =
fY ( y)
8 xy
=
4 y3
2x
=
y2
f ( x, y )
fY / X ( y / x ) =
fX ( x)
8 xy
=
4 x(1 - x 2 )
2y
=
1 - x2

(i) E (Y / X = x ) = Ú y fY / X ( y / x) dy
-•
1
Ê 2y ˆ
= Ú yÁ dy
Ë 1 - x 2 ˜¯
x
1
2 y3
=
1 - x2 3
x

\[ = \frac{2}{3}\left(\frac{1 - x^3}{1 - x^2}\right) = \frac{2}{3}\left(\frac{1 + x + x^2}{1 + x}\right) \]

(ii) E ( XY / X = x ) = x E (Y / X = x )
2 x(1 + x + x 2 )
=
3 (1 + x )


(iii) E (Y 2 / X = x ) = Úy
2
fY / X ( y / x ) dy
-•
1
Ê 2y ˆ
= Ú y2 Á dy
Ë 1 - x 2 ˜¯
x
1
2 y4
=
1 - x2 4
x

1 Ê 1- x ˆ 4
=
2 ÁË 1 - x 2 ˜¯
Var(Y /X = x ) = E (Y 2 /X = x ) - {E (Y /X = x )}2
2
1 + x2 È 2 Ê 1 + x + x2 ˆ ˘
= -Í Á ˜˙
2 ÍÎ 3 Ë 1 + x ¯ ˙˚
1 + x 2 4 (1 + x + x 2 )2
= -
2 9 (1 + x )2
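As a rough numerical check of the conditional mean and variance obtained above, the conditional density 2y/(1 - x²) on x < y < 1 can be integrated with a midpoint rule and compared with the closed forms. This is a sketch only; the helper names and the test point x = 0.5 are assumptions chosen for illustration.

```python
# Compare midpoint-rule values of E(Y | X = x) and Var(Y | X = x)
# with the closed forms derived for f(x, y) = 8xy, 0 < x < y < 1.

def conditional_pdf(y, x):
    """Conditional density of Y given X = x, valid for x < y < 1."""
    return 2.0 * y / (1.0 - x * x)

def conditional_moment(power, x, n=20000):
    """Midpoint-rule approximation of E(Y**power | X = x)."""
    h = (1.0 - x) / n
    total = 0.0
    for k in range(n):
        y = x + (k + 0.5) * h
        total += (y ** power) * conditional_pdf(y, x)
    return total * h

def closed_form_mean(x):
    return 2.0 * (1.0 + x + x * x) / (3.0 * (1.0 + x))

def closed_form_variance(x):
    return (1.0 + x * x) / 2.0 - closed_form_mean(x) ** 2

x0 = 0.5  # illustrative value of x
mean_num = conditional_moment(1, x0)
var_num = conditional_moment(2, x0) - mean_num ** 2

print(mean_num, closed_form_mean(x0))
print(var_num, closed_form_variance(x0))
```

Each printed pair should agree to several decimal places for any x strictly between 0 and 1.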

Exercise 3.5
1. If the pdf of (X, Y) is given by
f(x, y) = 2 - x - y, 0 ≤ x ≤ 1, 0 ≤ y ≤ 1,
find E(X) and E(Y).
[Ans.: 5/12, 5/12]

2. If
\[
f(x, y) = \begin{cases} \dfrac{1}{\pi}, & 0 < x^{2} + y^{2} < 1 \\ 0, & x^{2} + y^{2} > 1 \end{cases}
\]
find the covariance of X, Y. [Ans.: 0]
3. Joint pdf of X and Y is given by
f(x, y) = 3(x + y), 0 ≤ x ≤ 1, 0 ≤ y ≤ 1, x + y ≤ 1
Find E(Y/X = x) and Cov(X, Y).
[Ans.: (1 - x)(x + 2) / {3(1 + x)}, -13/320]

4. Let f_XY(x, y) = e^{-(x+y)}, 0 ≤ x < ∞, 0 < y < ∞.
Find Cov(X, Y).
[Ans.: 0]
5. If the joint pdf of (X, Y) is given by
f(x, y) = 2, 0 ≤ x < y ≤ 1,
find the conditional mean and conditional variance of X given that Y = y.
[Ans.: y/2, y²/12]

6. If the joint pdf of (X, Y) is given by
f(x, y) = 21x²y³, 0 ≤ x < y ≤ 1,
find the conditional mean and conditional variance of X, given that
Y = y, 0 < y < 1.
[Ans.: 3y/4, 3y²/80]

7. If the joint pdf of (X, Y) is given by
f(x, y) = 3xy(x + y), 0 < x ≤ y ≤ 1,
verify that E{E(Y/X)} = E(Y) = 17/24.

3.9 Bounds on Probabilities

If the probability distribution of a random variable is known, E(X) and Var(X) can be
computed. Conversely, if only E(X) and Var(X) are known, the probability distribution of X
cannot be reconstructed, and quantities such as P{|X - E(X)| ≤ k} cannot be evaluated exactly.
Several approximation techniques have been developed to yield upper and/or lower
bounds to such probabilities. The most important of such techniques is Chebyshev’s
inequality.

3.10 Chebyshev’s Inequality

If X is a random variable with mean μ and variance σ², then for any positive number k,
\[
P\{|X - \mu| \ge k\sigma\} \le \frac{1}{k^{2}}
\]
or
\[
P\{|X - \mu| < k\sigma\} \ge 1 - \frac{1}{k^{2}}
\]
Proof
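Let X be a continuous random variable with pdf f(x).
\[
\begin{aligned}
\sigma^{2} &= E[X - E(X)]^{2} \\
&= E[X - \mu]^{2} \qquad [\because \mu = E(X)] \\
&= \int_{-\infty}^{\infty} (x - \mu)^{2} f(x)\,dx \\
&= \int_{-\infty}^{\mu - k\sigma} (x - \mu)^{2} f(x)\,dx + \int_{\mu - k\sigma}^{\mu + k\sigma} (x - \mu)^{2} f(x)\,dx + \int_{\mu + k\sigma}^{\infty} (x - \mu)^{2} f(x)\,dx \\
&\ge \int_{-\infty}^{\mu - k\sigma} (x - \mu)^{2} f(x)\,dx + \int_{\mu + k\sigma}^{\infty} (x - \mu)^{2} f(x)\,dx \qquad \ldots(1)
\end{aligned}
\]
In the first integral x ≤ μ - kσ and in the second x ≥ μ + kσ, so in both of them
|x - μ| ≥ kσ, i.e., (x - μ)² ≥ k²σ².
Substituting in Eq. (1),
\[
\begin{aligned}
\sigma^{2} &\ge \int_{-\infty}^{\mu - k\sigma} k^{2}\sigma^{2} f(x)\,dx + \int_{\mu + k\sigma}^{\infty} k^{2}\sigma^{2} f(x)\,dx \\
&= k^{2}\sigma^{2}\left[\int_{-\infty}^{\mu - k\sigma} f(x)\,dx + \int_{\mu + k\sigma}^{\infty} f(x)\,dx\right] \\
&= k^{2}\sigma^{2}\,[P(X \le \mu - k\sigma) + P(X \ge \mu + k\sigma)] \\
&= k^{2}\sigma^{2}\,[P(X - \mu \le -k\sigma) + P(X - \mu \ge k\sigma)] \\
&= k^{2}\sigma^{2}\,P\{|X - \mu| \ge k\sigma\}
\end{aligned}
\]
\[
\therefore\quad P\{|X - \mu| \ge k\sigma\} \le \frac{1}{k^{2}}
\]
Since P{|X - μ| ≥ kσ} + P{|X - μ| < kσ} = 1,
\[
P\{|X - \mu| < k\sigma\} = 1 - P\{|X - \mu| \ge k\sigma\} \ge 1 - \frac{1}{k^{2}}
\]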

Note
1. If kσ = c > 0,
\[
P\{|X - \mu| \ge c\} \le \frac{\sigma^{2}}{c^{2}} \quad\text{and}\quad P\{|X - \mu| < c\} \ge 1 - \frac{\sigma^{2}}{c^{2}}
\]
2. To find a lower bound of a probability, the following form of Chebyshev’s inequality is used:
\[
P\{|X - \mu| < k\sigma\} \ge 1 - \frac{1}{k^{2}} \quad\text{or}\quad P\{|X - \mu| < c\} \ge 1 - \frac{\sigma^{2}}{c^{2}}
\]
3. To find an upper bound of a probability, the following form of Chebyshev’s inequality is used:
\[
P\{|X - \mu| \ge k\sigma\} \le \frac{1}{k^{2}} \quad\text{or}\quad P\{|X - \mu| \ge c\} \le \frac{\sigma^{2}}{c^{2}}
\]
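The two forms of the inequality in the note translate directly into small helper functions. The sketch below is illustrative only: the function names are hypothetical, and the capping of the bounds to the interval [0, 1] is an added convenience that is not part of the statement above.

```python
# Chebyshev bounds expressed in terms of the variance and the half-width c.

def chebyshev_upper(variance, c):
    """Upper bound on P{|X - mu| >= c}, i.e. variance / c**2, capped at 1."""
    if c <= 0:
        raise ValueError("c must be positive")
    return min(1.0, variance / (c * c))

def chebyshev_lower(variance, c):
    """Lower bound on P{|X - mu| < c}, i.e. 1 - variance / c**2, floored at 0."""
    return 1.0 - chebyshev_upper(variance, c)

# Illustrative inputs: variance 9 with c = 6, and variance 4 with c = 10.
print(chebyshev_lower(9, 6))
print(chebyshev_upper(4, 10))
```

The bounds depend only on the variance and on c, not on the shape of the distribution, which is why they are often loose for a specific distribution.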

Example 1
A random variable X has a mean μ = 12 and a variance σ² = 9 and
unknown probability distribution. Find a lower bound for P(6 < X < 18).
Solution
μ = 12, σ² = 9
σ = 3
By Chebyshev’s inequality,
\[
\begin{aligned}
P\{|X - \mu| < k\sigma\} &\ge 1 - \frac{1}{k^{2}} \\
P\{-k\sigma < X - \mu < k\sigma\} &\ge 1 - \frac{1}{k^{2}} \\
P\{\mu - k\sigma < X < \mu + k\sigma\} &\ge 1 - \frac{1}{k^{2}} \\
P\{12 - 3k < X < 12 + 3k\} &\ge 1 - \frac{1}{k^{2}}
\end{aligned}
\]
Comparing with P(6 < X < 18),
12 - 3k = 6
12 + 3k = 18
∴ k = 2
\[
P\{6 < X < 18\} \ge 1 - \frac{1}{4} = \frac{3}{4}
\]

Example 2
A random variable X has a mean 10 and a variance 4 and
unknown probability distribution. Find the value of c such that
P{|X - 10| ≥ c} ≤ 0.04.
Solution
μ = 10, σ² = 4
σ = 2
By Chebyshev’s inequality,
\[
P\{|X - \mu| \ge k\sigma\} \le \frac{1}{k^{2}}
\]
Comparing with P{|X - 10| ≥ c} ≤ 0.04,
\[
\frac{1}{k^{2}} = 0.04 \quad\Rightarrow\quad k = 5
\]
and kσ = c
c = 5(2) = 10

Example 3
A random variable X has pdf f(x) = e^{-x}, x ≥ 0. Use Chebyshev’s inequality
to show that P{|X - 1| > 2} ≤ 1/4 and also show that the actual probability
is given by e^{-3}.
Solution
f(x) = e^{-x}
The random variable X follows exponential distribution with parameter λ = 1.
\[
E(X) = \mu = \frac{1}{\lambda} = 1, \qquad \mathrm{Var}(X) = \sigma^{2} = \frac{1}{\lambda^{2}} = 1
\]
By Chebyshev’s inequality,
\[
P\{|X - \mu| > k\sigma\} \le \frac{1}{k^{2}}
\]
Comparing with P{|X - 1| > 2},
kσ = 2
k(1) = 2
k = 2
\[
\therefore\quad P\{|X - 1| > 2\} \le \frac{1}{4}
\]
The actual probability is given by
\[
\begin{aligned}
P\{|X - 1| > 2\} &= 1 - P\{|X - 1| \le 2\} \\
&= 1 - P\{-1 \le X \le 3\} \\
&= 1 - P\{0 \le X \le 3\} \\
&= 1 - \int_{0}^{3} e^{-x}\,dx \\
&= 1 - \left[-e^{-x}\right]_{0}^{3} \\
&= 1 - (1 - e^{-3}) \\
&= e^{-3}
\end{aligned}
\]

Example 4
A random variable X is exponentially distributed with parameter 1. Use
Chebyshev’s inequality to show that P{-1 ≤ X ≤ 3} ≥ 3/4. Find the actual
probability also.
Solution
For an exponential distribution with parameter λ = 1,
\[
E(X) = \mu = \frac{1}{\lambda} = 1, \qquad \mathrm{Var}(X) = \sigma^{2} = \frac{1}{\lambda^{2}} = 1, \qquad \sigma = 1
\]
By Chebyshev’s inequality,
\[
\begin{aligned}
P\{|X - \mu| < k\sigma\} &\ge 1 - \frac{1}{k^{2}} \\
P\{-k\sigma < X - \mu < k\sigma\} &\ge 1 - \frac{1}{k^{2}} \\
P\{\mu - k\sigma < X < \mu + k\sigma\} &\ge 1 - \frac{1}{k^{2}} \\
P\{1 - k < X < 1 + k\} &\ge 1 - \frac{1}{k^{2}}
\end{aligned}
\]
Comparing with P{-1 ≤ X ≤ 3} ≥ 3/4,
1 - k = -1
k = 2
\[
\therefore\quad P\{-1 \le X \le 3\} \ge 1 - \frac{1}{4} = \frac{3}{4}
\]
The actual probability is given by
\[
\begin{aligned}
P\{-1 \le X \le 3\} &= P\{0 \le X \le 3\} \qquad [\because x > 0 \text{ for exponential distribution}] \\
&= \int_{0}^{3} f(x)\,dx \\
&= \int_{0}^{3} e^{-x}\,dx \\
&= \left[-e^{-x}\right]_{0}^{3} \\
&= -e^{-3} + e^{0} \\
&= 1 - e^{-3} \\
&= 0.9502
\end{aligned}
\]
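The gap between the Chebyshev bound and the actual probability for the exponential distribution can be reproduced in a few lines. This is a minimal sketch; the helper `exponential_interval_probability` is a hypothetical name, and it relies only on the exponential cdf 1 - e^{-λx} for x ≥ 0.

```python
import math

def exponential_interval_probability(a, b, rate=1.0):
    """P(a <= X <= b) for an exponential variable with the given rate."""
    a = max(a, 0.0)          # the density is zero for negative x
    if b <= a:
        return 0.0
    return math.exp(-rate * a) - math.exp(-rate * b)

mu = 1.0      # mean 1/lambda with lambda = 1
sigma = 1.0   # standard deviation 1/lambda
k = 2.0

bound = 1.0 - 1.0 / k ** 2
actual = exponential_interval_probability(mu - k * sigma, mu + k * sigma)

print("Chebyshev lower bound:", bound)
print("Actual probability   :", round(actual, 4))
```

The bound 3/4 holds for every distribution with this mean and variance, so it is necessarily weaker than the exact value for the exponential case.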

Example 5
A fair dice is tossed 720 times. Use Chebyshev’s inequality to find a
lower bound for the probability of getting 80 to 160 sixes.
Solution
Let X be the random variable which denotes the number of sixes obtained when a fair dice
is tossed 720 times.
n = 720
Probability of getting 6 in a single toss
\[
p = \frac{1}{6}, \qquad q = 1 - p = 1 - \frac{1}{6} = \frac{5}{6}
\]
X follows a binomial distribution.
\[
\mu = np = (720)\left(\frac{1}{6}\right) = 120
\]
\[
\sigma^{2} = npq = (720)\left(\frac{1}{6}\right)\left(\frac{5}{6}\right) = 100, \qquad \sigma = 10
\]
By Chebyshev’s inequality,
\[
\begin{aligned}
P\{|X - \mu| < k\sigma\} &\ge 1 - \frac{1}{k^{2}} \\
P\{-k\sigma < X - \mu < k\sigma\} &\ge 1 - \frac{1}{k^{2}} \\
P\{\mu - k\sigma < X < \mu + k\sigma\} &\ge 1 - \frac{1}{k^{2}} \\
P\{120 - 10k < X < 120 + 10k\} &\ge 1 - \frac{1}{k^{2}}
\end{aligned}
\]
Comparing with P{80 < X < 160},
120 - 10k = 80
k = 4
\[
P\{80 < X < 160\} \ge 1 - \frac{1}{4^{2}} = \frac{15}{16}
\]
Hence, the lower bound for the probability = 15/16
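For comparison, the exact binomial probability of the symmetric interval μ ± 4σ, i.e. 80 < X < 160, used in the solution can be summed directly. The sketch below is illustrative: `binomial_pmf` is a hypothetical helper that works in log space (via `math.lgamma`) to avoid overflow in the binomial coefficients.

```python
import math

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p), computed through logarithms."""
    log_coeff = math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
    return math.exp(log_coeff + x * math.log(p) + (n - x) * math.log(1.0 - p))

n, p = 720, 1.0 / 6.0
mu = n * p                             # 120
sigma = math.sqrt(n * p * (1.0 - p))   # 10
k = 4

chebyshev_bound = 1.0 - 1.0 / k ** 2
# Strict inequality 80 < X < 160 means x runs over 81, ..., 159.
exact = sum(binomial_pmf(x, n, p) for x in range(81, 160))

print("Chebyshev lower bound:", chebyshev_bound)
print("Exact probability    :", exact)
```

The exact value is much closer to 1 than the bound 15/16, which illustrates how conservative Chebyshev’s inequality can be when the distribution is actually known.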

Example 6
Two dice are thrown once. If X is the sum of the numbers showing up,
prove that P{|X - 7| ≥ 3} ≤ 35/54. Compare this value with the exact
probability.
Solution
Let X1 and X2 be the random variables which denote the outcomes of the first and second
dice.
\[
E(X_1) = E(X_2) = \frac{1}{6}(1 + 2 + 3 + 4 + 5 + 6) = \frac{7}{2}
\]
\[
E(X) = E(X_1) + E(X_2) = \mu = \frac{7}{2} + \frac{7}{2} = 7
\]
\[
E(X_1^{2}) = E(X_2^{2}) = \frac{1}{6}(1^{2} + 2^{2} + 3^{2} + 4^{2} + 5^{2} + 6^{2}) = \frac{91}{6}
\]
\[
\mathrm{Var}(X_1) = \mathrm{Var}(X_2) = \frac{91}{6} - \left(\frac{7}{2}\right)^{2} = \frac{35}{12}
\]
\[
\mathrm{Var}(X) = \mathrm{Var}(X_1 + X_2) = (1)^{2}\,\mathrm{Var}(X_1) + (1)^{2}\,\mathrm{Var}(X_2)
\]
\[
\sigma^{2} = \frac{35}{12} + \frac{35}{12} = \frac{35}{6}, \qquad \sigma = \sqrt{\frac{35}{6}}
\]
By Chebyshev’s inequality,
\[
P\{|X - \mu| \ge k\sigma\} \le \frac{1}{k^{2}}
\]
Comparing with P{|X - 7| ≥ 3},
μ = 7
kσ = 3
\[
k\sqrt{\frac{35}{6}} = 3 \quad\Rightarrow\quad k = 3\sqrt{\frac{6}{35}}
\]
\[
\therefore\quad P\{|X - 7| \ge 3\} \le \frac{1}{\left(3\sqrt{\dfrac{6}{35}}\right)^{2}} = \frac{35}{54}
\]
Actual probability is given by
\[
\begin{aligned}
P\{|X - 7| \ge 3\} &= P\{X = 2, 3, 4, 10, 11, 12\} \\
&= \frac{1}{36} + \frac{2}{36} + \frac{3}{36} + \frac{3}{36} + \frac{2}{36} + \frac{1}{36} \\
&= \frac{1}{3}
\end{aligned}
\]
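The mean, variance, Chebyshev bound and exact probability for the sum of two dice can be confirmed by enumerating all 36 equally likely outcomes. The following is a small sketch using exact fractions; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely sums of two fair dice.
outcomes = [a + b for a, b in product(range(1, 7), repeat=2)]
n_outcomes = len(outcomes)

mean = Fraction(sum(outcomes), n_outcomes)
variance = Fraction(sum(s * s for s in outcomes), n_outcomes) - mean ** 2

c = 3
bound = variance / c ** 2
exact = Fraction(sum(1 for s in outcomes if abs(s - mean) >= c), n_outcomes)

print("mean =", mean, " variance =", variance)
print("Chebyshev upper bound =", bound)
print("exact P{|X - 7| >= 3} =", exact)
```

The enumeration counts only the sums 2, 3, 4, 10, 11 and 12, since a sum of 1 cannot occur with two dice.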

Example 7
Use Chebyshev’s inequality to find how many times a fair coin must be
tossed in order that the probability that the ratio of the number of heads
to the number of tosses will lie between 0.45 and 0.55 will be at least
0.95.
Solution
Let X be the random variable which denotes the number of heads obtained when a fair
coin is tossed n times.
\[
p = q = \frac{1}{2}
\]
X follows a binomial distribution.
Mean = np and Var(X) = npq
\[
\text{Mean of required ratio } \frac{X}{n} = E\left(\frac{1}{n}X\right) = \frac{1}{n}E(X) = \frac{1}{n}\,np = p = \frac{1}{2}
\]
\[
\therefore\quad \mu = \frac{1}{2}
\]
\[
\mathrm{Var}\left(\frac{X}{n}\right) = \left(\frac{1}{n}\right)^{2}\mathrm{Var}(X) = \frac{1}{n^{2}}\,npq = \frac{pq}{n}
\]
\[
\sigma = \sqrt{\frac{pq}{n}} = \sqrt{\frac{\frac{1}{2}\cdot\frac{1}{2}}{n}} = \frac{1}{2\sqrt{n}}
\]
By Chebyshev’s inequality,
\[
\begin{aligned}
P\left\{\left|\frac{X}{n} - \mu\right| < k\sigma\right\} &\ge 1 - \frac{1}{k^{2}} \\
P\left\{-k\sigma < \frac{X}{n} - \mu < k\sigma\right\} &\ge 1 - \frac{1}{k^{2}} \\
P\left\{\mu - k\sigma < \frac{X}{n} < \mu + k\sigma\right\} &\ge 1 - \frac{1}{k^{2}}
\end{aligned}
\]
But
\[
P\left\{0.45 < \frac{X}{n} < 0.55\right\} \ge 0.95
\]
\[
1 - \frac{1}{k^{2}} = 0.95 \quad\Rightarrow\quad \frac{1}{k^{2}} = 0.05 \quad\Rightarrow\quad k = \sqrt{20}
\]
μ - kσ = 0.45
\[
0.5 - \sqrt{20}\left(\frac{1}{2\sqrt{n}}\right) = 0.45
\]
n = 2000
Hence, the fair coin must be tossed 2000 times.
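The same calculation can be packaged for other tolerances. Writing c for the half-width of the interval around p, the requirement 1 - pq/(nc²) ≥ (required probability) gives n ≥ pq / {(1 - required probability) c²}. The sketch below uses exact fractions to avoid rounding problems; `required_tosses` is a hypothetical helper name.

```python
import math
from fractions import Fraction

def required_tosses(p, half_width, confidence):
    """Smallest n for which Chebyshev guarantees
    P{|X/n - p| < half_width} >= confidence, with X ~ Binomial(n, p)."""
    p = Fraction(p)
    half_width = Fraction(half_width)
    confidence = Fraction(confidence)
    q = 1 - p
    return math.ceil(p * q / ((1 - confidence) * half_width ** 2))

# Ratio of heads between 0.45 and 0.55 with probability at least 0.95.
print(required_tosses("1/2", "0.05", "0.95"))
# A wider interval (0.4 to 0.6) with probability at least 0.9.
print(required_tosses("1/2", "0.1", "0.9"))
```

Passing the arguments as strings lets `Fraction` represent 0.05 and 0.95 exactly, so the ceiling is not disturbed by floating-point error.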

Example 8
If X is the number on a dice when it is thrown, prove that
P{|X - μ| ≥ 2.5} ≤ 0.47, where μ is the mean.
Solution
Let X be the random variable which denotes the number on a dice. The probability
function is

x           1     2     3     4     5     6
P(X = x)   1/6   1/6   1/6   1/6   1/6   1/6

\[
\begin{aligned}
E(X) = \mu &= \sum x\,p(x) \\
&= 1\left(\frac{1}{6}\right) + 2\left(\frac{1}{6}\right) + 3\left(\frac{1}{6}\right) + 4\left(\frac{1}{6}\right) + 5\left(\frac{1}{6}\right) + 6\left(\frac{1}{6}\right) \\
&= \frac{7}{2}
\end{aligned}
\]
\[
\begin{aligned}
\mathrm{Var}(X) = \sigma^{2} &= \sum x^{2}p(x) - \mu^{2} \\
&= 1\left(\frac{1}{6}\right) + 4\left(\frac{1}{6}\right) + 9\left(\frac{1}{6}\right) + 16\left(\frac{1}{6}\right) + 25\left(\frac{1}{6}\right) + 36\left(\frac{1}{6}\right) - \left(\frac{7}{2}\right)^{2} \\
&= 2.9167
\end{aligned}
\]
σ = 1.707

By Chebyshev’s inequality,
\[
P\{|X - \mu| \ge k\sigma\} \le \frac{1}{k^{2}}
\]
Comparing with P{|X - μ| ≥ 2.5},
kσ = 2.5
k(1.707) = 2.5
k = 1.46
\[
\therefore\quad P\{|X - \mu| \ge 2.5\} \le \frac{1}{(1.46)^{2}}
\]
\[
P\{|X - \mu| \ge 2.5\} \le 0.47
\]

Example 9
The number of planes landing at an airport in a 30-minute interval
obeys the Poisson law with mean 25. Use Chebyshev’s inequality to find
the least chance that the number of planes landing within a given 30-minute
interval will be between 15 and 35.
Solution
Let X be a random variable which denotes the number of planes landing at an airport.
For Poisson distribution,
E(X) = μ = 25
Var(X) = σ² = μ = 25
σ = 5
By Chebyshev’s inequality,
\[
\begin{aligned}
P\{|X - \mu| < k\sigma\} &\ge 1 - \frac{1}{k^{2}} \\
P\{-k\sigma < X - \mu < k\sigma\} &\ge 1 - \frac{1}{k^{2}} \\
P\{\mu - k\sigma < X < \mu + k\sigma\} &\ge 1 - \frac{1}{k^{2}} \\
P\{25 - 5k < X < 25 + 5k\} &\ge 1 - \frac{1}{k^{2}}
\end{aligned}
\]
Comparing with P{15 < X < 35},
25 - 5k = 15 and 25 + 5k = 35
k = 2
\[
\therefore\quad P\{15 < X < 35\} \ge 1 - \frac{1}{(2)^{2}} = \frac{3}{4}
\]

Exercise 3.6
1. A discrete random variable takes the values -1, 0, 1 with probabilities
1/8, 3/4, 1/8 respectively. Find P{|X - μ| ≥ 2σ}.
[Ans.: 1/4]
2. Use Chebyshev’s inequality to prove that P{X = μ} = 1 if Var(X) = 0.
3. If X is a random variable with E(X) = 3 and E(X²) = 13, find the lower bound
for P(-2 < X < 8) using Chebyshev’s inequality.
[Ans.: 21/25]
4. Can we find a random variable for which P{μ - 2σ < X < μ + 2σ} = 0.6?
[Ans.: No]
5. If X denotes the sum of the numbers obtained when 2 dice are thrown, obtain
an upper bound for P{|X - 7| ≥ 4}. Compare with the actual probability.
[Ans.: 35/96, 1/6]
6. A fair dice is tossed 720 times. Use Chebyshev’s inequality to find a lower
bound for the probability of getting 100 to 140 sixes.
[Ans.: 3/4]
7. A pair of dice is rolled 900 times and X denotes the number of times a
total of 9 occurs. Find a lower bound for P(80 ≤ X ≤ 120) using Chebyshev’s inequality.
[Ans.: 7/9]
8. A discrete random variable X can assume the values x = 1, 2, 3, … with
probability 2^{-x}. Show that P{|X - 2| ≥ 2} ≤ 1/2, while the actual probability
is 1/8.
9. A random variable X has the pmf P(X = 1) = 1/18, P(X = 2) = 16/18,
P(X = 3) = 1/18. Show that there is a value of c such that P{|X - μ| ≥ c} = σ²/c²,
so that, in general, the bound given by Chebyshev’s inequality cannot be
improved.
10. Using Chebyshev’s inequality, find how many times a fair coin must be
tossed in order that the probability that the ratio of the number of heads to the
number of tosses will lie between 0.4 and 0.6 will be at least 0.9.
[Ans.: 250]
11. Suppose that the number of articles produced in a factory during a week is a
random variable with mean 500 and variance 100. What can be said about
the probability that a week’s production will lie between 400 and 600?
[Ans.: At least 0.99]
CHAPTER 4
Correlation and Regression

Chapter Outline
4.1 Introduction
4.2 Correlation
4.3 Types of Correlations
4.4 Methods of Studying Correlation
4.5 Scatter Diagram
4.6 Simple Graph
4.7 Karl Pearson’s Coefficient of Correlation
4.8 Properties of Coefficient of Correlation
4.9 Rank Correlation
4.10 Regression
4.11 Types of Regression
4.12 Methods of Studying Regression
4.13 Lines of Regression
4.14 Regression Coefficients
4.15 Properties of Regression Coefficients
4.16 Properties of Lines of Regression (Linear Regression)

4.1 Introduction

Correlation and regression are the most commonly used techniques for investigating the
relationship between two quantitative variables. Correlation refers to the relationship
of two or more variables. It measures the closeness of the relationship between the
variables. Regression establishes a functional relationship between the variables. In
correlation, both the variables x and y are random variables, whereas in regression, x is
a random variable and y is a fixed variable. The coefficient of correlation is a relative
measure whereas the regression coefficient is an absolute figure.

4.2 Correlation

Correlation is the relationship that exists between two or more variables. Two variables
are said to be correlated if a change in one variable is accompanied by a change in the
other variable. Data connecting two such variables are called bivariate data. Thus,
correlation is a statistical analysis which measures and analyses the degree or extent to
which two variables fluctuate with reference to each other. Some examples of such a
relationship are as follows:
1. Relationship between heights and weights.
2. Relationship between price and demand of commodity.
3. Relationship between rainfall and yield of crops.
4. Relationship between age of husband and age of wife.

4.3 Types of Correlations

Correlation is classified into four types:


1. Positive and negative correlations
2. Simple and multiple correlations
3. Partial and total correlations
4. Linear and nonlinear correlations

4.3.1 Positive and Negative Correlations


Depending on the variation in the variables, correlation may be positive or negative.
1. Positive Correlation If both the variables vary in the same direction, the
correlation is said to be positive. In other words, if the value of one variable increases,
the value of the other variable also increases, or, if value of one variable decreases, the
value of the other variable decreases, e.g., the correlation between heights and weights
of group of persons is a positive correlation.

Height (cm) 150 152 155 160 162 165


Weight (kg) 60 62 64 65 67 69

2. Negative Correlation If both the variables vary in the opposite direction,


correlation is said to be negative. In other words, if the value of one variable increases,
the value of the other variable decreases, or, if the value of one variable decreases,
the value of the other variable increases, e.g., the correlation between the price and
demand of a commodity is a negative correlation.

Price (₹ per unit) 10 8 6 5 4 1


Demand (units) 100 200 300 400 500 600

4.3.2 Simple and Multiple Correlations


Depending upon the study of the number of variables, correlation may be simple or
multiple.
1. Simple Correlation When only two variables are studied, the relationship is
described as simple correlation, e.g., the quantity of money and price level, demand
and price, etc.

2. Multiple Correlation When more than two variables are studied, the relationship
is described as multiple correlation, e.g., relationship of price, demand, and supply of
a commodity.

4.3.3 Partial and Total Correlations


Multiple correlation may be either partial or total.
1. Partial Correlation When more than two variables are studied excluding some
other variables, the relationship is termed as partial correlation.
2. Total Correlation When more than two variables are studied without excluding
any variables, the relationship is termed total correlation.

4.3.4 Linear and Nonlinear Correlations


Depending upon the ratio of change between two variables, the correlation may be
linear or nonlinear.
1. Linear Correlation If the ratio of change between two variables is constant, the
correlation is said to be linear. If such variables are plotted on a graph paper, a straight
line is obtained, e.g.,

Milk (l) 5 10 15 20 25 30
Curd (kg) 2 4 6 8 10 12

2. Nonlinear Correlation If the ratio of change between two variables is not
constant, the correlation is said to be nonlinear. The graph of a nonlinear or curvilinear
relationship will be a curve, e.g.,

Advertising expenses (₹ in lacs) 3 6 9 12 15
Sales (₹ in lacs) 10 12 15 15 16

4.4 Methods of Studying Correlation

There are two different methods of studying correlation: (1) graphic methods, and
(2) mathematical methods.
Graphic methods are (a) scatter diagram, and (b) simple graph.
Mathematical methods are (a) Karl Pearson’s coefficient of correlation, and
(b) Spearman’s rank coefficient of correlation.

4.5 Scatter Diagram

The scatter diagram is a diagrammatic representation of bivariate data to find the
correlation between two variables. The various types of correlation between two
variables are represented by the following scatter diagrams.
1. Perfect Positive Correlation If all the plotted points lie on a straight line rising
from the lower left-hand corner to the upper right-hand corner, the correlation is said to
be perfectly positive (Fig. 4.1).

2. Perfect Negative Correlation If all the plotted points lie on a straight line falling
from the upper left-hand corner to the lower right-hand corner, the correlation is said to
be perfectly negative (Fig. 4.2).

3. High Degree of Positive Correlation If all the plotted points lie in a narrow strip,
rising from the lower left-hand corner to the upper right-hand corner, it indicates a high
degree of positive correlation (Fig. 4.3).

4. High Degree of Negative Correlation If all the plotted points lie in a narrow strip,
falling from the upper left-hand corner to the lower right-hand corner, it indicates the
existence of a high degree of negative correlation (Fig. 4.4).

5. No Correlation If all the plotted points lie on a straight line parallel to the x-axis or
y-axis or in a haphazard manner, it indicates the absence of any relationship between
the variables (Fig. 4.5).

Merits of a Scatter Diagram

1. It is a simple and nonmathematical method to find out the correlation between the
variables.
2. It gives an indication of the degree of linear correlation between the variables.
3. It is easy to understand.
4. It is not influenced by the size of extreme items.

4.6 Simple Graph

A simple graph is a diagrammatic representation of bivariate data to find the correlation


between two variables. The values of the two variables are plotted on a graph paper.
Two curves are obtained, one for the variable x and the other for the variable y. If both
the curves move in the same direction, the correlation is said to be positive. If both
the curves move in the opposite direction, the correlation is said to be negative. This
method is used in the case of a time series. It does not reveal the extent to which the
variables are related.

4.7 Karl Pearson’s Coefficient of Correlation

The coefficient of correlation is the measure of correlation between two random
variables X and Y, and is denoted by r.
\[
r = \frac{\mathrm{cov}(X, Y)}{\sigma_X\,\sigma_Y}
\]
where cov(X, Y) is the covariance of variables X and Y,
σ_X is the standard deviation of variable X,
and σ_Y is the standard deviation of variable Y.
This expression is known as Karl Pearson’s coefficient of correlation or Karl Pearson’s
product-moment coefficient of correlation.
\[
\mathrm{cov}(X, Y) = \frac{1}{n}\sum (x - \bar{x})(y - \bar{y}), \qquad
\sigma_X = \sqrt{\frac{\sum (x - \bar{x})^{2}}{n}}, \qquad
\sigma_Y = \sqrt{\frac{\sum (y - \bar{y})^{2}}{n}}
\]
\[
\therefore\quad r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^{2}}\,\sqrt{\sum (y - \bar{y})^{2}}}
\]
The above expression can be further modified.
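Expanding the terms,
\[
\begin{aligned}
r &= \frac{\sum (xy - x\bar{y} - \bar{x}y + \bar{x}\bar{y})}{\sqrt{\sum (x^{2} - 2x\bar{x} + \bar{x}^{2})}\,\sqrt{\sum (y^{2} - 2y\bar{y} + \bar{y}^{2})}} \\
&= \frac{\sum xy - \bar{y}\sum x - \bar{x}\sum y + \bar{x}\bar{y}\sum 1}{\sqrt{\sum x^{2} - 2\bar{x}\sum x + \bar{x}^{2}\sum 1}\,\sqrt{\sum y^{2} - 2\bar{y}\sum y + \bar{y}^{2}\sum 1}} \\
&= \frac{\sum xy - \dfrac{\sum y}{n}\sum x - \dfrac{\sum x}{n}\sum y + \dfrac{\sum x}{n}\cdot\dfrac{\sum y}{n}\cdot n}{\sqrt{\sum x^{2} - 2\dfrac{\sum x}{n}\sum x + \left(\dfrac{\sum x}{n}\right)^{2} n}\,\sqrt{\sum y^{2} - 2\dfrac{\sum y}{n}\sum y + \left(\dfrac{\sum y}{n}\right)^{2} n}} \\
&= \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{\sum x^{2} - \dfrac{\left(\sum x\right)^{2}}{n}}\,\sqrt{\sum y^{2} - \dfrac{\left(\sum y\right)^{2}}{n}}}
\end{aligned}
\]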
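The final form of the formula needs only the five sums Σx, Σy, Σx², Σy² and Σxy, which makes it convenient to compute. The following is a minimal sketch of that computation; the function name `pearson_r` and the sample values are illustrative and not part of the text above.

```python
import math

def pearson_r(xs, ys):
    """Karl Pearson's coefficient of correlation from raw sums."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = sum_xy - sum_x * sum_y / n
    denominator = math.sqrt((sum_xx - sum_x ** 2 / n) * (sum_yy - sum_y ** 2 / n))
    return numerator / denominator

# Illustrative data.
xs = [2, 4, 5, 6, 8, 11]
ys = [18, 12, 10, 8, 7, 5]
r = pearson_r(xs, ys)
print(round(r, 4))
```

If either series is constant, the denominator is zero and r is undefined; the sketch lets the resulting `ZeroDivisionError` propagate rather than returning an arbitrary value.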

4.8 Properties of Coefficient of Correlation

1. The coefficient of correlation lies between -1 and 1, i.e., -1 ≤ r ≤ 1.

Proof Let x̄ and ȳ be the means of the x and y series and σ_x and σ_y be their respective
standard deviations.
Since a sum of squares of real quantities cannot be negative,
\[
\sum \left(\frac{x - \bar{x}}{\sigma_x} \pm \frac{y - \bar{y}}{\sigma_y}\right)^{2} \ge 0
\]
\[
\frac{\sum (x - \bar{x})^{2}}{\sigma_x^{2}} + \frac{\sum (y - \bar{y})^{2}}{\sigma_y^{2}} \pm \frac{2\sum (x - \bar{x})(y - \bar{y})}{\sigma_x\,\sigma_y} \ge 0
\]
\[
\begin{aligned}
n + n \pm 2nr &\ge 0 \\
2n \pm 2nr &\ge 0 \\
2n(1 \pm r) &\ge 0 \\
1 \pm r &\ge 0
\end{aligned}
\]
i.e., 1 + r ≥ 0 and 1 - r ≥ 0
r ≥ -1 and r ≤ 1
Hence, the coefficient of correlation lies between -1 and 1, i.e., -1 ≤ r ≤ 1.

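2. Correlation coefficient is independent of change of origin and change
of scale.
Proof Let
\[
d_x = \frac{x - a}{h}, \qquad d_y = \frac{y - b}{k}
\]
\[
x = a + h\,d_x, \qquad y = b + k\,d_y
\]
where a, b, h (> 0) and k (> 0) are constants.
\[
x = a + h\,d_x \;\Rightarrow\; \bar{x} = a + h\,\bar{d}_x \;\Rightarrow\; x - \bar{x} = h(d_x - \bar{d}_x)
\]
\[
y = b + k\,d_y \;\Rightarrow\; \bar{y} = b + k\,\bar{d}_y \;\Rightarrow\; y - \bar{y} = k(d_y - \bar{d}_y)
\]
\[
\begin{aligned}
r_{xy} &= \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^{2}}\,\sqrt{\sum (y - \bar{y})^{2}}} \\
&= \frac{\sum h(d_x - \bar{d}_x)\,k(d_y - \bar{d}_y)}{\sqrt{\sum h^{2}(d_x - \bar{d}_x)^{2}}\,\sqrt{\sum k^{2}(d_y - \bar{d}_y)^{2}}} \\
&= \frac{\sum (d_x - \bar{d}_x)(d_y - \bar{d}_y)}{\sqrt{\sum (d_x - \bar{d}_x)^{2}}\,\sqrt{\sum (d_y - \bar{d}_y)^{2}}} \\
&= r_{d_x d_y}
\end{aligned}
\]
Hence, the correlation coefficient is independent of change of origin and change of
scale.
Note Since correlation coefficient is independent of change of origin and change
of scale,
\[
r = \frac{\sum d_x d_y - \dfrac{\sum d_x \sum d_y}{n}}{\sqrt{\sum d_x^{2} - \dfrac{\left(\sum d_x\right)^{2}}{n}}\,\sqrt{\sum d_y^{2} - \dfrac{\left(\sum d_y\right)^{2}}{n}}}
\]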

3. Two independent variables are uncorrelated.

Proof If random variables X and Y are independent,
\[
\sum (x - \bar{x})(y - \bar{y}) = 0 \quad\text{or}\quad \mathrm{cov}(X, Y) = 0
\]
∴ r = 0

Thus, if X and Y are independent variables, they are uncorrelated.


Note The converse of the above property is not true, i.e., two uncorrelated variables
may not be independent.
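A standard illustration of the note is a variable paired with its own square over values symmetric about zero: the second variable is completely determined by the first, yet the covariance vanishes. The sketch below uses illustrative values; the variable names are not from the text.

```python
# y is a function of x (so the two are clearly dependent),
# but the covariance, and hence r, is zero.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Covariance with divisor n, as in the definition used in this chapter.
cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

print("mean of x =", mean_x)
print("mean of y =", mean_y)
print("cov(x, y) =", cov_xy)
```

The correlation coefficient measures only linear association, so a purely quadratic relationship of this kind goes undetected by r.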

Example 1
Calculate the correlation coefficient between x and y using the following
data:
x 2 4 5 6 8 11
y 18 12 10 8 7 5

Solution
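n = 6

   x      y      x²     y²     xy
   2     18       4    324     36
   4     12      16    144     48
   5     10      25    100     50
   6      8      36     64     48
   8      7      64     49     56
  11      5     121     25     55
Σx = 36   Σy = 60   Σx² = 266   Σy² = 706   Σxy = 293

\[
\begin{aligned}
r &= \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{\sum x^{2} - \dfrac{\left(\sum x\right)^{2}}{n}}\,\sqrt{\sum y^{2} - \dfrac{\left(\sum y\right)^{2}}{n}}} \\
&= \frac{293 - \dfrac{(36)(60)}{6}}{\sqrt{266 - \dfrac{(36)^{2}}{6}}\,\sqrt{706 - \dfrac{(60)^{2}}{6}}} \\
&= -0.9203
\end{aligned}
\]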

Note Σx, Σy, Σx², Σy², Σxy can be directly obtained with the help of a scientific
calculator.

Example 2
Calculate the coefficient of correlation from the following data:
x 12 9 8 10 11 13 7
y 14 8 6 9 11 12 3
Solution
n = 7

   x      y      x²     y²     xy
  12     14     144    196    168
   9      8      81     64     72
   8      6      64     36     48
  10      9     100     81     90
  11     11     121    121    121
  13     12     169    144    156
   7      3      49      9     21
Σx = 70   Σy = 63   Σx² = 728   Σy² = 651   Σxy = 676

\[
\begin{aligned}
r &= \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{\sum x^{2} - \dfrac{\left(\sum x\right)^{2}}{n}}\,\sqrt{\sum y^{2} - \dfrac{\left(\sum y\right)^{2}}{n}}} \\
&= \frac{676 - \dfrac{(70)(63)}{7}}{\sqrt{728 - \dfrac{(70)^{2}}{7}}\,\sqrt{651 - \dfrac{(63)^{2}}{7}}} \\
&= 0.949
\end{aligned}
\]

Example 3
Calculate the coefficient of correlation for the following data:
x 9 8 7 6 5 4 3 2 1
y 15 16 14 13 11 12 10 8 9
Solution
n = 9

   x      y      x²     y²     xy
   9     15      81    225    135
   8     16      64    256    128
   7     14      49    196     98
   6     13      36    169     78
   5     11      25    121     55
   4     12      16    144     48
   3     10       9    100     30
   2      8       4     64     16
   1      9       1     81      9
Σx = 45   Σy = 108   Σx² = 285   Σy² = 1356   Σxy = 597

\[
\begin{aligned}
r &= \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{\sum x^{2} - \dfrac{\left(\sum x\right)^{2}}{n}}\,\sqrt{\sum y^{2} - \dfrac{\left(\sum y\right)^{2}}{n}}} \\
&= \frac{597 - \dfrac{(45)(108)}{9}}{\sqrt{285 - \dfrac{(45)^{2}}{9}}\,\sqrt{1356 - \dfrac{(108)^{2}}{9}}} \\
&= 0.95
\end{aligned}
\]

Example 4
Calculate the correlation coefficient between the following data:
x 5 9 13 17 21
y 12 20 25 33 35
Solution
n = 5
\[
\bar{x} = \frac{\sum x}{n} = \frac{65}{5} = 13, \qquad \bar{y} = \frac{\sum y}{n} = \frac{125}{5} = 25
\]

   x      y    x - x̄   y - ȳ   (x - x̄)²   (y - ȳ)²   (x - x̄)(y - ȳ)
   5     12     -8     -13       64        169          104
   9     20     -4      -5       16         25           20
  13     25      0       0        0          0            0
  17     33      4       8       16         64           32
  21     35      8      10       64        100           80
Σx = 65   Σy = 125   Σ(x - x̄) = 0   Σ(y - ȳ) = 0   Σ(x - x̄)² = 160   Σ(y - ȳ)² = 358   Σ(x - x̄)(y - ȳ) = 236

\[
\begin{aligned}
r &= \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^{2}}\,\sqrt{\sum (y - \bar{y})^{2}}} \\
&= \frac{236}{\sqrt{160}\,\sqrt{358}} \\
&= 0.986
\end{aligned}
\]

Note Since Σx, Σy, Σx², Σy², Σxy can be directly obtained with the help of a scientific
calculator, the correlation coefficient can be calculated without using the mean.

Example 5
Calculate the correlation coefficient for the following values of
demand and the corresponding price of a commodity:
Demand in Quintals 65 66 67 67 68 69 70 72
Price in rupees per kg 67 68 65 68 72 72 69 71

Solution
Let the demand in quintals be denoted by x and the price in rupees per kg be denoted
by y.
n = 8
\[
\bar{x} = \frac{\sum x}{n} = \frac{544}{8} = 68, \qquad \bar{y} = \frac{\sum y}{n} = \frac{552}{8} = 69
\]

   x      y    x - x̄   y - ȳ   (x - x̄)²   (y - ȳ)²   (x - x̄)(y - ȳ)
  65     67     -3      -2        9          4            6
  66     68     -2      -1        4          1            2
  67     65     -1      -4        1         16            4
  67     68     -1      -1        1          1            1
  68     72      0       3        0          9            0
  69     72      1       3        1          9            3
  70     69      2       0        4          0            0
  72     71      4       2       16          4            8
Σx = 544   Σy = 552   Σ(x - x̄) = 0   Σ(y - ȳ) = 0   Σ(x - x̄)² = 36   Σ(y - ȳ)² = 44   Σ(x - x̄)(y - ȳ) = 24

\[
\begin{aligned}
r &= \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^{2}}\,\sqrt{\sum (y - \bar{y})^{2}}} \\
&= \frac{24}{\sqrt{36}\,\sqrt{44}} \\
&= 0.603
\end{aligned}
\]

Example 6
Calculate the coefficient of correlation for the following pairs of
x and y:
x 17 19 21 26 20 28 26 27
y 23 27 25 26 27 25 30 33
Solution
Let a = 23 and b = 27 be the assumed means of x and y series respectively.
d_x = x - a = x - 23
d_y = y - b = y - 27
n = 8

   x      y     d_x    d_y    d_x²   d_y²   d_x d_y
  17     23     -6     -4      36     16      24
  19     27     -4      0      16      0       0
  21     25     -2     -2       4      4       4
  26     26      3     -1       9      1      -3
  20     27     -3      0       9      0       0
  28     25      5     -2      25      4     -10
  26     30      3      3       9      9       9
  27     33      4      6      16     36      24
Σd_x = 0   Σd_y = 0   Σd_x² = 124   Σd_y² = 70   Σd_x d_y = 48

\[
\begin{aligned}
r &= \frac{\sum d_x d_y - \dfrac{\sum d_x \sum d_y}{n}}{\sqrt{\sum d_x^{2} - \dfrac{\left(\sum d_x\right)^{2}}{n}}\,\sqrt{\sum d_y^{2} - \dfrac{\left(\sum d_y\right)^{2}}{n}}} \\
&= \frac{48 - 0}{\sqrt{124 - 0}\,\sqrt{70 - 0}} \\
&= 0.515
\end{aligned}
\]

Note Since Σx, Σy, Σx², Σy², Σxy can be directly obtained with the help of a scientific
calculator, the correlation coefficient can be calculated without using the assumed mean.

Example 7
Calculate the correlation coefficient from the following data:
x 23 27 28 29 30 31 33 35 36 39
y 18 22 23 24 25 26 28 29 30 32
Solution
Let a = 30 and b = 25 be the assumed means of x and y series respectively.
d_x = x - a = x - 30
d_y = y - b = y - 25
n = 10

   x      y     d_x    d_y    d_x²   d_y²   d_x d_y
  23     18     -7     -7      49     49      49
  27     22     -3     -3       9      9       9
  28     23     -2     -2       4      4       4
  29     24     -1     -1       1      1       1
  30     25      0      0       0      0       0
  31     26      1      1       1      1       1
  33     28      3      3       9      9       9
  35     29      5      4      25     16      20
  36     30      6      5      36     25      30
  39     32      9      7      81     49      63
Σd_x = 11   Σd_y = 7   Σd_x² = 215   Σd_y² = 163   Σd_x d_y = 186

\[
\begin{aligned}
r &= \frac{\sum d_x d_y - \dfrac{\sum d_x \sum d_y}{n}}{\sqrt{\sum d_x^{2} - \dfrac{\left(\sum d_x\right)^{2}}{n}}\,\sqrt{\sum d_y^{2} - \dfrac{\left(\sum d_y\right)^{2}}{n}}} \\
&= \frac{186 - \dfrac{(11)(7)}{10}}{\sqrt{215 - \dfrac{(11)^{2}}{10}}\,\sqrt{163 - \dfrac{(7)^{2}}{10}}} \\
&= 0.996
\end{aligned}
\]

Example 8
Calculate the coefficient of correlation between the ages of cars and
annual maintenance costs.
Age of cars (year) 2 4 6 7 8 10 12
Annual maintenance cost (₹) 1600 1500 1800 1900 1700 2100 2000

Solution
Let the ages of cars in years be denoted by x and annual maintenance costs in rupees
be denoted by y.
Let a = 7 and b = 1800 be the assumed means of x and y series respectively.
Let h = 1, k = 100
\[
d_x = \frac{x - a}{h} = \frac{x - 7}{1} = x - 7, \qquad d_y = \frac{y - b}{k} = \frac{y - 1800}{100}
\]
n = 7

   x      y      d_x    d_y    d_x²   d_y²   d_x d_y
   2    1600     -5     -2      25      4      10
   4    1500     -3     -3       9      9       9
   6    1800     -1      0       1      0       0
   7    1900      0      1       0      1       0
   8    1700      1     -1       1      1      -1
  10    2100      3      3       9      9       9
  12    2000      5      2      25      4      10
Σd_x = 0   Σd_y = 0   Σd_x² = 70   Σd_y² = 28   Σd_x d_y = 37

\[
\begin{aligned}
r &= \frac{\sum d_x d_y - \dfrac{\sum d_x \sum d_y}{n}}{\sqrt{\sum d_x^{2} - \dfrac{\left(\sum d_x\right)^{2}}{n}}\,\sqrt{\sum d_y^{2} - \dfrac{\left(\sum d_y\right)^{2}}{n}}} \\
&= \frac{37 - 0}{\sqrt{70 - 0}\,\sqrt{28 - 0}} \\
&= 0.836
\end{aligned}
\]

Example 9
Calculate Karl Pearson’s coefficient of correlation for the data given
below:
x 10 14 18 22 26 30
y 18 12 24 6 30 36
Solution
Let a = 22 and b = 24 be the assumed means of x and y series respectively.
Let h = 4, k = 6
\[
d_x = \frac{x - a}{h} = \frac{x - 22}{4}, \qquad d_y = \frac{y - b}{k} = \frac{y - 24}{6}
\]
n = 6

   x      y     d_x    d_y    d_x²   d_y²   d_x d_y
  10     18     -3     -1       9      1       3
  14     12     -2     -2       4      4       4
  18     24     -1      0       1      0       0
  22      6      0     -3       0      9       0
  26     30      1      1       1      1       1
  30     36      2      2       4      4       4
Σd_x = -3   Σd_y = -3   Σd_x² = 19   Σd_y² = 19   Σd_x d_y = 12

\[
\begin{aligned}
r &= \frac{\sum d_x d_y - \dfrac{\sum d_x \sum d_y}{n}}{\sqrt{\sum d_x^{2} - \dfrac{\left(\sum d_x\right)^{2}}{n}}\,\sqrt{\sum d_y^{2} - \dfrac{\left(\sum d_y\right)^{2}}{n}}} \\
&= \frac{12 - \dfrac{(-3)(-3)}{6}}{\sqrt{19 - \dfrac{(-3)^{2}}{6}}\,\sqrt{19 - \dfrac{(-3)^{2}}{6}}} \\
&= 0.6
\end{aligned}
\]

Example 10
The coefficient of correlation between two variables X and Y is 0.48. The
covariance is 36. The variance of X is 16. Find the standard deviation
of Y.
Solution
r = 0.48, cov(X, Y) = 36, σ_X² = 16
∴ σ_X = 4
\[
r = \frac{\mathrm{cov}(X, Y)}{\sigma_X\,\sigma_Y}
\]
\[
0.48 = \frac{36}{4\,\sigma_Y}
\]
∴ σ_Y = 18.75

Example 11
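Given n = 10, σ_X = 5.4, σ_Y = 6.2, and sum of the product of deviations
from the mean of x and y is 66. Find the correlation coefficient.
Solution
n = 10, σ_X = 5.4, σ_Y = 6.2
Σ(x - x̄)(y - ȳ) = 66
\[
\sigma_X = \sqrt{\frac{\sum (x - \bar{x})^{2}}{n}} \;\Rightarrow\; 5.4 = \sqrt{\frac{\sum (x - \bar{x})^{2}}{10}} \;\Rightarrow\; \sum (x - \bar{x})^{2} = 291.6
\]
\[
\sigma_Y = \sqrt{\frac{\sum (y - \bar{y})^{2}}{n}} \;\Rightarrow\; 6.2 = \sqrt{\frac{\sum (y - \bar{y})^{2}}{10}} \;\Rightarrow\; \sum (y - \bar{y})^{2} = 384.4
\]
\[
\begin{aligned}
r &= \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^{2}}\,\sqrt{\sum (y - \bar{y})^{2}}} \\
&= \frac{66}{\sqrt{291.6}\,\sqrt{384.4}} \\
&= 0.197
\end{aligned}
\]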

Example 12
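From the following information, calculate the value of n.
Σx = 4, Σy = 4, Σx² = 44, Σy² = 44, Σxy = -40, r = -1

Solution
\[
r = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{\sum x^{2} - \dfrac{\left(\sum x\right)^{2}}{n}}\,\sqrt{\sum y^{2} - \dfrac{\left(\sum y\right)^{2}}{n}}}
\]
\[
-1 = \frac{-40 - \dfrac{(4)(4)}{n}}{\sqrt{44 - \dfrac{(4)^{2}}{n}}\,\sqrt{44 - \dfrac{(4)^{2}}{n}}}
\]
∴ n = 8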

Example 13
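From the following data, find the number of items n.
r = 0.5, Σ(x - x̄)(y - ȳ) = 120, σ_Y = 8, Σ(x - x̄)² = 90

Solution
\[
\sigma_Y = \sqrt{\frac{\sum (y - \bar{y})^{2}}{n}} \;\Rightarrow\; 8 = \sqrt{\frac{\sum (y - \bar{y})^{2}}{n}} \;\Rightarrow\; \sum (y - \bar{y})^{2} = 64n
\]
\[
r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^{2}}\,\sqrt{\sum (y - \bar{y})^{2}}}
\]
\[
0.5 = \frac{120}{\sqrt{90}\,\sqrt{64n}}
\]
∴ n = 10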

Example 14
Calculate the correlation coefficient between x and y from the following
data:
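n = 10, Σx = 140, Σy = 150, Σ(x - 10)² = 180,
Σ(y - 15)² = 215, Σ(x - 10)(y - 15) = 60

Solution
Σd_x² = Σ(x - 10)² = 180
Σd_y² = Σ(y - 15)² = 215
Σd_x d_y = Σ(x - 10)(y - 15) = 60
a = 10
b = 15
n = 10
\[
\bar{x} = \frac{\sum x}{n} = \frac{140}{10} = 14, \qquad \bar{y} = \frac{\sum y}{n} = \frac{150}{10} = 15
\]
\[
\bar{x} = a + \frac{\sum d_x}{n} \;\Rightarrow\; 14 = 10 + \frac{\sum d_x}{10} \;\Rightarrow\; \sum d_x = 40
\]
\[
\bar{y} = b + \frac{\sum d_y}{n} \;\Rightarrow\; 15 = 15 + \frac{\sum d_y}{10} \;\Rightarrow\; \sum d_y = 0
\]
\[
\begin{aligned}
r &= \frac{\sum d_x d_y - \dfrac{\sum d_x \sum d_y}{n}}{\sqrt{\sum d_x^{2} - \dfrac{\left(\sum d_x\right)^{2}}{n}}\,\sqrt{\sum d_y^{2} - \dfrac{\left(\sum d_y\right)^{2}}{n}}} \\
&= \frac{60 - \dfrac{(40)(0)}{10}}{\sqrt{180 - \dfrac{(40)^{2}}{10}}\,\sqrt{215 - \dfrac{(0)^{2}}{10}}} \\
&= 0.915
\end{aligned}
\]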

Example 15
A computer operator, while calculating the correlation coefficient between two
variates x and y for 25 pairs of observations, obtained the following
constants:
n = 25, Σx = 125, Σx² = 650, Σy = 100,
Σy² = 460, Σxy = 508
It was later discovered at the time of checking that he had copied down
two pairs as (6, 14) and (8, 6) while the correct pairs were (8, 12) and
(6, 8). Obtain the correct value of the correlation coefficient.
Solution
n = 25
Corrected Σx = Incorrect Σx - (Sum of incorrect x) + (Sum of correct x)
= 125 - (6 + 8) + (8 + 6)
= 125

Similarly,
Corrected Σy = 100 - (14 + 6) + (12 + 8) = 100
Corrected Σx² = 650 - (6² + 8²) + (8² + 6²) = 650
Corrected Σy² = 460 - (14² + 6²) + (12² + 8²) = 436
Corrected Σxy = 508 - (84 + 48) + (96 + 48) = 520

Correct value of correlation coefficient
\[
\begin{aligned}
r &= \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{\sum x^{2} - \dfrac{\left(\sum x\right)^{2}}{n}}\,\sqrt{\sum y^{2} - \dfrac{\left(\sum y\right)^{2}}{n}}} \\
&= \frac{520 - \dfrac{(125)(100)}{25}}{\sqrt{650 - \dfrac{(125)^{2}}{25}}\,\sqrt{436 - \dfrac{(100)^{2}}{25}}} \\
&= 0.67
\end{aligned}
\]
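The correction procedure used in this example (subtract the contribution of each wrongly copied pair from every sum, then add the contribution of the correct pair) can be sketched as follows. The function names and the dictionary keys are hypothetical, chosen only for this illustration.

```python
import math

def correct_sums(sums, wrong_pairs, right_pairs):
    """Return a copy of the sums with the wrong pairs removed and the right pairs added."""
    fixed = dict(sums)
    for sign, pairs in ((-1, wrong_pairs), (+1, right_pairs)):
        for x, y in pairs:
            fixed["x"] += sign * x
            fixed["y"] += sign * y
            fixed["xx"] += sign * x * x
            fixed["yy"] += sign * y * y
            fixed["xy"] += sign * x * y
    return fixed

def r_from_sums(n, s):
    """Correlation coefficient computed from the five sums."""
    numerator = s["xy"] - s["x"] * s["y"] / n
    denominator = math.sqrt((s["xx"] - s["x"] ** 2 / n) * (s["yy"] - s["y"] ** 2 / n))
    return numerator / denominator

n = 25
reported = {"x": 125, "y": 100, "xx": 650, "yy": 460, "xy": 508}
fixed = correct_sums(reported, wrong_pairs=[(6, 14), (8, 6)], right_pairs=[(8, 12), (6, 8)])
r = r_from_sums(n, fixed)

print(fixed)
print(round(r, 2))
```

Only the five sums need to be adjusted; the remaining 23 pairs never have to be re-entered.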

Exercise 4.1

1. Draw a scatter diagram to represent the following data:
x 2 4 5 6 8 11
y 18 12 10 8 7 5
Calculate the coefficient of correlation between x and y.
[Ans.: -0.92]
2. Find the coefficient of correlation between x and y for the following
data:
x 10 12 18 24 23 27
y 13 18 12 25 30 10
 [Ans.: 0.223]
3. From the following information relating to the stock exchange quotations
for two shares A and B, ascertain by using Pearson’s coefficient of
correlation how shares A and B are correlated in their prices.
Price share (A) ₹ 160 164 172 182 166 170 178
Price share (B) ₹ 292 280 260 234 266 254 230
[Ans.: -0.96]
4. Find the correlation coefficient between the income and expenditure
of a wage earner.
Month Jan Feb Mar Apr May Jun Jul
Income 46 54 56 56 58 60 62
Expenditure 36 40 44 54 42 58 54
 [Ans.: 0.769]
5. From the following data, examine whether the input of oil and output
of electricity can be said to be correlated.
Input of oil 6.9 8.2 7.8 4.8 9.6 8.0 7.7
Output of Electricity 1.9 3.5 6.5 1.3 5.5 3.5 2.2
 [Ans.: 0.696]
6. For the following data, show that cov(x, x²) = 0.
x -3 -2 -1 0 1 2 3
x² 9 4 1 0 1 4 9
7. Find the coefficient of correlation between x and y for the following
data:

x 62 64 65 69 70 71 72 74
y 126 125 139 145 165 152 180 208
 [Ans.: 0.9032]
8. The following data gave the growth of employment in lacs in the
organized sector in India between 1988 and 1995:

Year 1988 1989 1990 1991 1992 1993 1994 1995


Public sector 98 101 104 107 113 120 125 128
Private sector 65 65 67 68 68 69 68 68
 Find the correlation coefficient between the employment in public and private sectors.
 [Ans.: 0.77]
9. Calculate Karl Pearson’s coefficient of correlation from the following
data, using 20 as the working mean for price and 70 as working mean
for demand.
Price 14 16 17 18 19 20 21 22 23
Demand 84 78 70 75 66 67 62 58 60
 [Ans.: -0.954]
10. A sample of 25 pairs of values x and y leads to the following results:
$$\sum x = 127,\quad \sum y = 100,\quad \sum x^2 = 760,\quad \sum y^2 = 449,\quad \sum xy = 500$$
Later on, it was found that two pairs of values were taken as (8, 14) and (8, 6) instead of the correct values (8, 12) and (6, 8). Find the corrected coefficient of correlation between x and y.
 [Ans.: -0.31]

4.9 Rank Correlation


Let a group of n individuals be arranged in order of merit with respect to some
characteristics. The same group would give a different order (rank) for different
characteristics. Considering the orders corresponding to two characteristics A and B,
the correlation between these n pairs of ranks is called the rank correlation in the
characteristics A and B for that group of individuals.

4.9.1 Spearman’s Rank Correlation Coefficient


Let x, y be the ranks of the ith individual in two characteristics A and B respectively, where i = 1, 2, ..., n. Assuming that no two individuals have the same rank either for x or y, each of the variables x and y takes the values 1, 2, ..., n.
$$\bar{x} = \bar{y} = \frac{1 + 2 + 3 + \cdots + n}{n} = \frac{n(n+1)}{2n} = \frac{n+1}{2}$$
$$\begin{aligned}
\sum (x - \bar{x})^2 &= \sum (x^2 - 2x\bar{x} + \bar{x}^2) \\
&= \sum x^2 - 2\bar{x}\sum x + \bar{x}^2 \sum 1 \\
&= \sum x^2 - 2n\bar{x}^2 + n\bar{x}^2 \qquad \left[\because \sum x = n\bar{x} \text{ and } \sum 1 = n\right] \\
&= \sum x^2 - n\bar{x}^2 \\
&= (1^2 + 2^2 + \cdots + n^2) - n\left(\frac{n+1}{2}\right)^2 \\
&= \frac{n(n+1)(2n+1)}{6} - \frac{n(n+1)^2}{4} \\
&= \frac{1}{12}(n^3 - n)
\end{aligned}$$
Similarly, $\sum (y - \bar{y})^2 = \frac{1}{12}(n^3 - n)$
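If d denotes the difference between the ranks of the ith individual in the two variables,
$$d = x - y = (x - \bar{x}) - (y - \bar{y}) \qquad [\because \bar{x} = \bar{y}]$$
Squaring and summing over i from 1 to n,
$$\begin{aligned}
\sum d^2 &= \sum \left[(x - \bar{x}) - (y - \bar{y})\right]^2 \\
&= \sum (x - \bar{x})^2 + \sum (y - \bar{y})^2 - 2\sum (x - \bar{x})(y - \bar{y})
\end{aligned}$$
$$\begin{aligned}
\sum (x - \bar{x})(y - \bar{y}) &= \frac{1}{2}\left[\sum (x - \bar{x})^2 + \sum (y - \bar{y})^2 - \sum d^2\right] \\
&= \frac{1}{12}(n^3 - n) - \frac{1}{2}\sum d^2
\end{aligned}$$
Hence, the coefficient of correlation between these variables is
$$\begin{aligned}
r &= \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}} \\
&= \frac{\frac{1}{12}(n^3 - n) - \frac{1}{2}\sum d^2}{\frac{1}{12}(n^3 - n)} \\
&= 1 - \frac{6\sum d^2}{n^3 - n} \\
&= 1 - \frac{6\sum d^2}{n(n^2 - 1)}
\end{aligned}$$
This is called Spearman's rank correlation coefficient and is denoted by r.
Note $\sum d = \sum (x - y) = \sum x - \sum y = n(\bar{x} - \bar{y}) = 0$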
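The derivation above says that, for untied ranks, Spearman's formula is simply the ordinary correlation coefficient applied to the ranks. The following Python sketch illustrates this under stated assumptions: the function names `spearman_from_ranks` and `pearson` are hypothetical, and the ranks are illustrative rankings of ten participants by two judges.

```python
from math import sqrt

def spearman_from_ranks(x, y):
    """Spearman's coefficient 1 - 6*sum(d^2) / (n(n^2 - 1)) for untied ranks."""
    n = len(x)
    sum_d2 = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

def pearson(x, y):
    """Ordinary product-moment correlation, used here only as a cross-check."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

x = [1, 3, 7, 5, 4, 6, 2, 10, 9, 8]   # ranks by the first judge
y = [3, 1, 4, 5, 6, 9, 7, 8, 10, 2]   # ranks by the second judge
rho = spearman_from_ranks(x, y)
print(round(rho, 3), round(pearson(x, y), 3))
```

Both printed values agree, which is the content of the derivation; the shortcut formula only saves the work of computing means and squared deviations.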

Example 1
Ten participants in a contest are ranked by two judges as follows:
x 1 3 7 5 4 6 2 10 9 8
y 3 1 4 5 6 9 7 8 10 2

Calculate the rank correlation coefficient.


4.24 Chapter 4 Correlation and Regression

Solution
n = 10
Let x and y denote the ranks given by the first and the second judge respectively.

x     y     d = x - y    $d^2$
1     3     -2           4
3     1      2           4
7     4      3           9
5     5      0           0
4     6     -2           4
6     9     -3           9
2     7     -5           25
10    8      2           4
9     10    -1           1
8     2      6           36

$\sum d = 0,\quad \sum d^2 = 96$

$$r = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6(96)}{10\left[(10)^2 - 1\right]} = 0.418$$

Example 2
Ten competitors in a musical test were ranked by the three judges A, B,
and C in the following order:
Rank by A 1 6 5 10 3 2 4 9 7 8
Rank by B 3 5 8 4 7 10 2 1 6 9
Rank by C 6 4 9 8 1 2 3 10 5 7

Using the rank correlation method, find which pair of judges has the
nearest approach to common liking in music. [Summer 2015]
Solution
n = 10
Let x, y and z denote the ranks given by judges A, B and C respectively, and let $d_1 = x - y$, $d_2 = y - z$, $d_3 = z - x$.

x     y     z     $d_1$   $d_2$   $d_3$   $d_1^2$   $d_2^2$   $d_3^2$
1     3     6     -2      -3       5      4         9         25
6     5     4      1       1      -2      1         1         4
5     8     9     -3      -1       4      9         1         16
10    4     8      6      -4      -2      36        16        4
3     7     1     -4       6      -2      16        36        4
2     10    2     -8       8       0      64        64        0
4     2     3      2      -1      -1      4         1         1
9     1     10     8      -9       1      64        81        1
7     6     5      1       1      -2      1         1         4
8     9     7     -1       2      -1      1         4         1

$\sum d_1 = 0,\quad \sum d_2 = 0,\quad \sum d_3 = 0,\quad \sum d_1^2 = 200,\quad \sum d_2^2 = 214,\quad \sum d_3^2 = 60$

$$r(x, y) = 1 - \frac{6\sum d_1^2}{n(n^2 - 1)} = 1 - \frac{6(200)}{10\left[(10)^2 - 1\right]} = -0.21$$
$$r(y, z) = 1 - \frac{6\sum d_2^2}{n(n^2 - 1)} = 1 - \frac{6(214)}{10\left[(10)^2 - 1\right]} = -0.296$$
$$r(z, x) = 1 - \frac{6\sum d_3^2}{n(n^2 - 1)} = 1 - \frac{6(60)}{10\left[(10)^2 - 1\right]} = 0.64$$
Since r(z, x) is maximum, the pair of judges A and C has the nearest approach to common liking in music.

Example 3
Ten students got the following percentage of marks in mathematics and
physics:

Mathematics (x) 8 36 98 25 75 82 92 62 65 35
Physics (y) 84 51 91 60 68 62 86 58 35 49

Find the rank correlation coefficient.


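Solution
n = 10
The highest mark in each subject is given rank 1, and d is the rank in mathematics minus the rank in physics.

x     y     Rank in mathematics    Rank in physics    d     $d^2$
8     84    10                     3                   7    49
36    51    7                      8                  -1    1
98    91    1                      1                   0    0
25    60    9                      6                   3    9
75    68    4                      4                   0    0
82    62    3                      5                  -2    4
92    86    2                      2                   0    0
62    58    6                      7                  -1    1
65    35    5                      10                 -5    25
35    49    8                      9                  -1    1

$\sum d = 0,\quad \sum d^2 = 90$

$$r = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6(90)}{10\left[(10)^2 - 1\right]} = 0.455$$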
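When raw scores are given instead of ranks, the ranking step can be automated. The sketch below is a hedged illustration, assuming no tied scores and the convention used above that the highest score receives rank 1; the helper names `ranks_desc` and `spearman` are hypothetical.

```python
def ranks_desc(values):
    """Rank 1 for the largest value, rank n for the smallest (assumes no ties)."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

def spearman(xs, ys):
    """Spearman's rank correlation from raw, untied scores."""
    n = len(xs)
    rx, ry = ranks_desc(xs), ranks_desc(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

maths = [8, 36, 98, 25, 75, 82, 92, 62, 65, 35]
physics = [84, 51, 91, 60, 68, 62, 86, 58, 35, 49]
print(ranks_desc(maths))
print(round(spearman(maths, physics), 3))
```

Ranking in ascending order instead would give the same coefficient, provided both series are ranked in the same direction.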

Example 4
The coefficient of rank correlation of the marks obtained by 10 students in physics and chemistry was found to be 0.5. It was later discovered that the difference in ranks in the two subjects obtained by one of the students was wrongly taken as 3 instead of 7. Find the correct coefficient of rank correlation.
Solution
n = 10
$$r = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$
$$0.5 = 1 - \frac{6\sum d^2}{10(100 - 1)}$$
$$\therefore \sum d^2 = 82.5$$
Correct $\sum d^2$ = Incorrect $\sum d^2$ - (Incorrect rank difference)$^2$ + (Correct rank difference)$^2$
$$= 82.5 - (3)^2 + (7)^2 = 122.5$$
Correct coefficient of rank correlation
$$r = 1 - \frac{6(122.5)}{10(100 - 1)} = 0.26$$

4.9.2 Tied Ranks


If there is a tie between the ranks of two or more individuals, the ranks they occupy are divided equally among them. For example, if two items share the fourth rank, the 4th and 5th ranks are divided between them equally and each is given the rank $\frac{4+5}{2} = 4.5$. If three items share the 4th rank, each of them is given the rank $\frac{4+5+6}{3} = 5$. As a result of this, the following adjustment or correction is made in the rank correlation formula. If m is the number of items having equal ranks, then the factor $\frac{1}{12}(m^3 - m)$ is added to $\sum d^2$. If there is more than one such case, this factor is added for each case.
$$r = 1 - \frac{6\left[\sum d^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2) + \cdots\right]}{n(n^2 - 1)}$$

Example 1
Obtain the rank correlation coefficient from the following data:
x 10 12 18 18 15 40
y 12 18 25 25 50 25

Solution
Here, n = 6
In the table, d is the rank of x minus the rank of y.

x     y     Rank x    Rank y    d      $d^2$
10    12    1         1         0      0
12    18    2         2         0      0
18    25    4.5       4         0.5    0.25
18    25    4.5       4         0.5    0.25
15    50    3         6        -3      9
40    25    6         4         2      4

$\sum d^2 = 13.5$

There are two items in the x series having equal values at the rank 4. Each is given the rank 4.5. Similarly, there are three items in the y series at the rank 3. Each of them is given the rank 4.
$$m_1 = 2,\quad m_2 = 3$$
$$\begin{aligned}
r &= 1 - \frac{6\left[\sum d^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2)\right]}{n(n^2 - 1)} \\
&= 1 - \frac{6\left[13.50 + \frac{1}{12}(8 - 2) + \frac{1}{12}(27 - 3)\right]}{6\left[(6)^2 - 1\right]} \\
&= 0.5429
\end{aligned}$$
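The tie adjustment can be sketched in Python as follows. This is an illustrative sketch only: the helper names `average_ranks` and `spearman_tied` are hypothetical, ranks are assigned in ascending order as in the example above, and tied values share the mean of the positions they occupy.

```python
from collections import Counter

def average_ranks(values):
    """Ascending ranks; tied values share the mean of the positions they occupy."""
    first_position = {}
    for position, value in enumerate(sorted(values), start=1):
        first_position.setdefault(value, position)
    counts = Counter(values)
    # A group of m tied values starting at position p gets rank p + (m - 1) / 2.
    return [first_position[v] + (counts[v] - 1) / 2 for v in values]

def spearman_tied(xs, ys):
    """Rank correlation with (m^3 - m) / 12 added to sum(d^2) for each tied group."""
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    correction = sum((m ** 3 - m) / 12
                     for series in (xs, ys)
                     for m in Counter(series).values() if m > 1)
    return 1 - 6 * (sum_d2 + correction) / (n * (n ** 2 - 1))

x = [10, 12, 18, 18, 15, 40]
y = [12, 18, 25, 25, 50, 25]
print(average_ranks(x), average_ranks(y))
print(round(spearman_tied(x, y), 4))
```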

Exercise 4.2

1. Compute Spearman’s rank correlation coefficient from the following


data:
x 18 20 34 52 12
y 39 23 35 18 46
 [Ans.: -0.9]
2. Two judges gave the following ranks to a series of eight one-act plays
in a drama competition. Examine the relationship between their
judgements.
Judge A 8 7 6 3 2 1 5 4
Judge B 7 5 4 1 3 2 6 8
 [Ans.: 0.62]
3. From the following data, calculate Spearman’s rank correlation between
x and y.

x 36 56 20 42 33 44 50 15 60
y 50 35 70 58 75 60 45 80 38
 [Ans.: 0.92]
4. Ten competitors in a voice test are ranked by three judges in the
following order:
Rank by First Judge 6 10 2 9 8 1 5 3 4 7
Rank by Second Judge 5 4 10 1 9 3 8 7 2 6
Rank by Third Judge 4 8 2 10 7 6 9 1 3 6
Use the method of rank correlation to gauge which pairs of judges has
the nearest approach to common liking in voice.
 [Ans.: The first and third judge]
5. The following table gives the scores obtained by 11 students in English
and Tamil translation. Find the rank correlation coefficient.
Scores in English 40 46 54 60 70 80 82 85 85 90 95
Scores in Tamil 45 45 50 43 40 75 55 72 65 42 70
 [Ans.: 0.36]
6. Calculate Spearman’s coefficient of rank correlation for the following
data:
x 53 98 95 81 75 71 59 55
y 47 25 32 37 30 40 39 45
 [Ans.: -0.905]
7. Following are the scores of ten students in a class and their IQ:
Score 35 40 25 55 85 90 65 55 45 50
IQ 100 100 110 140 150 130 100 120 140 110
  Calculate the rank correlation coefficient between the score and IQ.
 [Ans.: 0.47]

4.10 Regression

Regression is defined as a method of estimating the value of one variable when that
of the other is known and the variables are correlated. Regression analysis is used to
predict or estimate one variable in terms of the other variable. It is a highly valuable tool
for prediction purpose in economics and business. It is useful in statistical estimation
of demand curves, supply curves, production function, cost function, consumption
function, etc.

4.11 Types of Regression

Regression is classified into two types:


1. Simple and multiple regressions
2. Linear and nonlinear regressions

4.11.1 Simple and Multiple Regressions


Depending upon the study of the number of variables, regression may be simple or
multiple.
1. Simple Regression The regression analysis for studying only two variables at a
time is known as simple regression.

2. Multiple Regression The regression analysis for studying more than two
variables at a time is known as multiple regression.

4.11.2 Linear and Nonlinear Regressions


Depending upon the regression curve, regression may be linear or nonlinear.
1. Linear Regression If the regression curve is a straight line, the regression is
said to be linear.

2. Nonlinear Regression If the regression curve is not a straight line i.e., not a
first-degree equation in the variables x and y, the regression is said to be nonlinear
or curvilinear. In this case, the regression equation will have a functional relation
between the variables x and y involving terms in x and y of the degree higher than one,
i.e., involving terms of the type $x^2$, $y^2$, $x^3$, $y^3$, $xy$, etc.

4.12 Methods of Studying Regression

There are two methods of studying regression:


(i) Method of scatter diagram
(ii) Method of least squares

4.12.1 Method of Scatter Diagram


It is the simplest method of obtaining the lines of regression. The data are plotted on graph paper by taking the independent variable on the x-axis and the dependent variable on the y-axis. The plotted points are generally scattered in a narrow strip. If the correlation is perfect, i.e., if r = ±1, the points will lie on a line, which is the line of regression.

4.12.2 Method of Least Squares


This is a mathematical method which gives an objective treatment to find a line of regression. It is used for obtaining the equation of the curve which best fits a given set of observations. It is based on the principle that the sum of the squares of the differences between the estimated values and the actual observed values is a minimum.
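For a straight line y = a + bx, the least-squares principle leads to closed-form expressions for the slope and intercept. The following Python sketch is an illustration under stated assumptions rather than the text's own procedure: the function names `least_squares_line` and `sum_squared_error` are hypothetical, and the small data set is chosen only for demonstration.

```python
def least_squares_line(xs, ys):
    """Return (a, b) of the line y = a + b*x minimising the sum of squared residuals."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # the fitted line passes through (x_bar, y_bar)
    return a, b

def sum_squared_error(a, b, xs, ys):
    """Sum of squared differences between observed y and estimated a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4, 5]
ys = [160, 180, 140, 180, 200]
a, b = least_squares_line(xs, ys)
print(a, b)
# Any other line gives a larger sum of squares, for example:
print(sum_squared_error(a, b, xs, ys) < sum_squared_error(a + 1, b, xs, ys))
print(sum_squared_error(a, b, xs, ys) < sum_squared_error(a, b + 0.5, xs, ys))
```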

4.13 Lines of Regression

If the variables, which are highly correlated, are plotted on a graph, then the points lie in a narrow strip. If all the points in the scatter diagram cluster around a straight line, the line is called the line of regression. The line of regression is the line of best fit and is obtained by the principle of least squares.
Line of Regression of y on x
It is the line which gives the best estimate for the values of y for any given values of x. The regression equation of y on x is given by
$$y - \bar{y} = r\frac{\sigma_y}{\sigma_x}(x - \bar{x})$$
It is also written as
$$y = a + bx$$
Line of Regression of x on y
It is the line which gives the best estimate for the values of x for any given values of y. The regression equation of x on y is given by
$$x - \bar{x} = r\frac{\sigma_x}{\sigma_y}(y - \bar{y})$$
It is also written as
$$x = a + by$$
where $\bar{x}$ and $\bar{y}$ are means of x series and y series respectively, $\sigma_x$ and $\sigma_y$ are standard deviations of x series and y series respectively, and r is the correlation coefficient between x and y.

4.14 Regression Coefficients

The slope b of the line of regression of y on x is also called the coefficient of regression of y on x. It represents the increment in the value of y corresponding to a unit change in the value of x.
$$b_{yx} = \text{Regression coefficient of } y \text{ on } x = r\frac{\sigma_y}{\sigma_x}$$
Similarly, the slope b of the line of regression of x on y is called the coefficient of regression of x on y. It represents the increment in the value of x corresponding to a unit change in the value of y.
$$b_{xy} = \text{Regression coefficient of } x \text{ on } y = r\frac{\sigma_x}{\sigma_y}$$

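Expressions for Regression Coefficients

(i) We know that
$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2}\ \sqrt{\sum (y - \bar{y})^2}}$$
$$\sigma_x = \sqrt{\frac{\sum (x - \bar{x})^2}{n}},\qquad \sigma_y = \sqrt{\frac{\sum (y - \bar{y})^2}{n}}$$
$$b_{yx} = r\frac{\sigma_y}{\sigma_x} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$$
and
$$b_{xy} = r\frac{\sigma_x}{\sigma_y} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2}$$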
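(ii) We know that
$$r = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{\sum x^2 - \dfrac{(\sum x)^2}{n}}\ \sqrt{\sum y^2 - \dfrac{(\sum y)^2}{n}}}$$
$$\sigma_x = \sqrt{\frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2},\qquad \sigma_y = \sqrt{\frac{\sum y^2}{n} - \left(\frac{\sum y}{n}\right)^2}$$
$$b_{yx} = r\frac{\sigma_y}{\sigma_x} = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum x^2 - \dfrac{(\sum x)^2}{n}}$$
and
$$b_{xy} = r\frac{\sigma_x}{\sigma_y} = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum y^2 - \dfrac{(\sum y)^2}{n}}$$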
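(iii) We know that, if $d_x = x - a$ and $d_y = y - b$ are the deviations from assumed means a and b,
$$r = \frac{\sum d_x d_y - \dfrac{\sum d_x \sum d_y}{n}}{\sqrt{\sum d_x^2 - \dfrac{(\sum d_x)^2}{n}}\ \sqrt{\sum d_y^2 - \dfrac{(\sum d_y)^2}{n}}}$$
$$\sigma_x = \sqrt{\frac{\sum d_x^2}{n} - \left(\frac{\sum d_x}{n}\right)^2},\qquad \sigma_y = \sqrt{\frac{\sum d_y^2}{n} - \left(\frac{\sum d_y}{n}\right)^2}$$
$$b_{yx} = r\frac{\sigma_y}{\sigma_x} = \frac{\sum d_x d_y - \dfrac{\sum d_x \sum d_y}{n}}{\sum d_x^2 - \dfrac{(\sum d_x)^2}{n}}$$
and
$$b_{xy} = r\frac{\sigma_x}{\sigma_y} = \frac{\sum d_x d_y - \dfrac{\sum d_x \sum d_y}{n}}{\sum d_y^2 - \dfrac{(\sum d_y)^2}{n}}$$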
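The expressions in terms of raw sums and in terms of deviations from assumed means give the same regression coefficients. A minimal Python sketch of this, assuming a hypothetical helper `regression_coefficients` and an illustrative data set of paired ages with assumed means 26 and 17:

```python
def regression_coefficients(xs, ys, a=0, b=0):
    """Return (b_yx, b_xy) using deviations d_x = x - a, d_y = y - b.

    With a = b = 0 this is the raw-sum form; any other a, b is the
    assumed-mean form. Both give the same coefficients.
    """
    n = len(xs)
    dx = [x - a for x in xs]
    dy = [y - b for y in ys]
    s_dx, s_dy = sum(dx), sum(dy)
    s_dxdy = sum(u * v for u, v in zip(dx, dy))
    s_dx2 = sum(u * u for u in dx)
    s_dy2 = sum(v * v for v in dy)
    numerator = s_dxdy - s_dx * s_dy / n
    b_yx = numerator / (s_dx2 - s_dx ** 2 / n)
    b_xy = numerator / (s_dy2 - s_dy ** 2 / n)
    return b_yx, b_xy

x = [25, 22, 28, 26, 35, 20, 22, 40, 20, 18]
y = [18, 15, 20, 17, 22, 14, 16, 21, 15, 14]
raw = regression_coefficients(x, y)
shifted = regression_coefficients(x, y, a=26, b=17)
print([round(v, 3) for v in raw], [round(v, 3) for v in shifted])
```

Working with deviations from an assumed mean only keeps the numbers small for hand calculation; it does not change the result.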

4.15 Properties of Regression Coefficients

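1. The coefficient of correlation is the geometric mean of the coefficients of regression, i.e., $r = \sqrt{b_{yx} b_{xy}}$.

Proof We know that
$$b_{yx} = r\frac{\sigma_y}{\sigma_x},\qquad b_{xy} = r\frac{\sigma_x}{\sigma_y}$$
$$b_{yx} b_{xy} = r\frac{\sigma_y}{\sigma_x} \cdot r\frac{\sigma_x}{\sigma_y} = r^2$$
$$r = \sqrt{b_{yx} b_{xy}}$$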

2. If one of the regression coefficients is greater than one, the other must be less than one.
Proof Let $b_{yx} > 1$

We know that $r^2 \le 1$ and $r^2 = b_{yx} b_{xy}$
$$b_{yx} b_{xy} \le 1$$
$$b_{xy} \le \frac{1}{b_{yx}} < 1$$

Hence, if $b_{yx} > 1$, then $b_{xy} < 1$.

3. The arithmetic mean of regression coefficients is greater than or equal to the coefficient of correlation.
Proof We have to prove that
$$\frac{1}{2}(b_{yx} + b_{xy}) \ge r$$
i.e.,
$$\frac{1}{2}\left(r\frac{\sigma_y}{\sigma_x} + r\frac{\sigma_x}{\sigma_y}\right) \ge r$$
i.e., dividing by r (taken to be positive),
$$\frac{\sigma_y}{\sigma_x} + \frac{\sigma_x}{\sigma_y} \ge 2$$
i.e.,
$$\sigma_y^2 + \sigma_x^2 - 2\sigma_x\sigma_y \ge 0$$
i.e.,
$$(\sigma_y - \sigma_x)^2 \ge 0$$
which is always true, since the square of a real quantity is $\ge 0$.
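4. Regression coefficients are independent of the change of origin but not of scale.

Proof Let $d_x = \dfrac{x - a}{h}$, $d_y = \dfrac{y - b}{k}$
$$x = a + h d_x,\qquad y = b + k d_y$$
where a, b, h (> 0) and k (> 0) are constants.
$$r_{d_x d_y} = r_{xy},\qquad \sigma_{d_x}^2 = \frac{1}{h^2}\sigma_x^2,\qquad \sigma_{d_y}^2 = \frac{1}{k^2}\sigma_y^2$$
$$\begin{aligned}
b_{d_x d_y} &= r_{d_x d_y}\frac{\sigma_{d_x}}{\sigma_{d_y}} \\
&= r_{xy}\frac{\sigma_x}{h}\cdot\frac{k}{\sigma_y} \\
&= \frac{k}{h}\, r_{xy}\frac{\sigma_x}{\sigma_y} \\
&= \frac{k}{h}\, b_{xy}
\end{aligned}$$
Similarly, $b_{d_y d_x} = \dfrac{h}{k}\, b_{yx}$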
5. Both regression coefficients will have the same sign, i.e., either both are positive or both are negative.
6. The sign of correlation is same as that of the regression coefficients, i.e., r > 0 if
bxy > 0 and byx > 0; and r < 0 if bxy < 0 and byx < 0.
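The properties above can be checked numerically on a small data set. The sketch below is illustrative only: the function name `coefficients`, the data, and the constants a = 6, b = 6, h = 2, k = 3 are all assumptions made for the demonstration.

```python
from math import sqrt, isclose

def coefficients(xs, ys):
    """Return (slope of ys on xs, slope of xs on ys, correlation coefficient)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sxx, sxy / syy, sxy / sqrt(sxx * syy)

xs = [7, 4, 8, 6, 5]
ys = [6, 5, 9, 8, 2]
b_yx, b_xy, r = coefficients(xs, ys)

# Property 1: |r| is the geometric mean of the regression coefficients.
print(isclose(abs(r), sqrt(b_yx * b_xy)))
# Property 2: here b_yx > 1, so b_xy must be < 1.
print(b_yx > 1 and b_xy < 1)
# Property 3: the arithmetic mean is at least r.
print((b_yx + b_xy) / 2 >= r)

# Property 4: shift the origin to (a, b) and rescale by h and k.
a, b, h, k = 6, 6, 2, 3
dx = [(x - a) / h for x in xs]
dy = [(y - b) / k for y in ys]
b_dydx, b_dxdy, r_d = coefficients(dx, dy)
print(isclose(r_d, r), isclose(b_dxdy, (k / h) * b_xy), isclose(b_dydx, (h / k) * b_yx))
```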

4.16 Properties of Lines of Regression (Linear Regression)

1. The two regression lines x on y and y on x always intersect at their means $(\bar{x}, \bar{y})$.
2. Since $r^2 = b_{yx} b_{xy}$, i.e., $r = \sqrt{b_{yx} b_{xy}}$, therefore, r, $b_{yx}$, $b_{xy}$ all have the same sign.
3. If r = 0, the regression coefficients are zero.
4. The regression lines become identical if r = ±1. If r = 0, it follows from the regression equations that $y = \bar{y}$ and $x = \bar{x}$, so the two lines are perpendicular to each other.

Example 1
The regression lines of a sample are x + 6y = 6 and 3x + 2y = 10. Find
(i) sample means x and y , and
(ii) the coefficient of correlation between x and y.
(iii) Also estimate y when x = 12.
Solution
(i) The regression lines pass through the point $(\bar{x}, \bar{y})$.
$$\bar{x} + 6\bar{y} = 6 \qquad ...(1)$$
$$3\bar{x} + 2\bar{y} = 10 \qquad ...(2)$$
Solving Eqs (1) and (2),
$$\bar{x} = 3,\qquad \bar{y} = \frac{1}{2}$$
(ii) Let the line x + 6y = 6 be the line of regression of y on x.
$$6y = -x + 6$$
$$y = -\frac{1}{6}x + 1$$
$$\therefore b_{yx} = -\frac{1}{6}$$
Let the line 3x + 2y = 10 be the line of regression of x on y.
$$3x = -2y + 10$$
$$x = -\frac{2}{3}y + \frac{10}{3}$$
$$\therefore b_{xy} = -\frac{2}{3}$$
$$\sqrt{b_{yx} b_{xy}} = \sqrt{\left(-\frac{1}{6}\right)\left(-\frac{2}{3}\right)} = \frac{1}{3}$$
Since $b_{yx}$ and $b_{xy}$ are negative, r is negative.
$$r = -\frac{1}{3}$$
(iii) Estimated value of y when x = 12 is
$$y = -\frac{1}{6}(12) + 1 = -1$$

Example 2
If the two lines of regression are 4x – 5y + 30 = 0 and 20x – 9y – 107 = 0,
which of these are lines of regression of x on y and y on x? Find rxy and
sy when sx = 3.
Solution
For the line 4x - 5y + 30 = 0,
$$-5y = -4x - 30$$
$$y = 0.8x + 6$$
$$\therefore b_{yx} = 0.8$$
For the line 20x - 9y - 107 = 0,
$$20x = 9y + 107$$
$$x = 0.45y + 5.35$$
$$\therefore b_{xy} = 0.45$$
Both $b_{yx}$ and $b_{xy}$ are positive and their product, 0.36, does not exceed 1.
Hence, the line 4x - 5y + 30 = 0 is the line of regression of y on x and the line 20x - 9y - 107 = 0 is the line of regression of x on y.
$$r = \sqrt{b_{yx} b_{xy}} = \sqrt{(0.8)(0.45)} = 0.6$$
$$b_{yx} = r\frac{\sigma_y}{\sigma_x}$$
$$0.8 = 0.6\left(\frac{\sigma_y}{3}\right)$$
$$\therefore \sigma_y = 4$$
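Deciding which of two given lines is the regression of y on x can be mechanised: the correct assignment is the one for which the product of the two regression coefficients lies between 0 and 1, since that product equals $r^2$. The following Python sketch illustrates this; the function name `identify_lines` is hypothetical, and each line is assumed to be supplied as coefficients (a, b, c) of ax + by + c = 0 with a and b nonzero.

```python
from math import sqrt

def identify_lines(line1, line2):
    """Return (y_on_x, x_on_y, b_yx, b_xy, r) for two regression lines.

    Each line is a tuple (a, b, c) representing a*x + b*y + c = 0.
    """
    for y_on_x, x_on_y in ((line1, line2), (line2, line1)):
        b_yx = -y_on_x[0] / y_on_x[1]   # slope after solving for y
        b_xy = -x_on_y[1] / x_on_y[0]   # slope after solving for x
        if 0 <= b_yx * b_xy <= 1:       # the product must equal r^2
            r = sqrt(b_yx * b_xy)
            if b_yx < 0:                # r takes the sign of the coefficients
                r = -r
            return y_on_x, x_on_y, b_yx, b_xy, r
    raise ValueError("the two lines cannot be a pair of regression lines")

y_on_x, x_on_y, b_yx, b_xy, r = identify_lines((4, -5, 30), (20, -9, -107))
sigma_x = 3
sigma_y = b_yx * sigma_x / r            # from b_yx = r * sigma_y / sigma_x
print(y_on_x, round(b_yx, 2), round(b_xy, 2), round(r, 2), round(sigma_y, 2))
```

The two possible assignments give products that are reciprocals of each other, so at most one of them can be less than 1.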

Example 3
The following data regarding the heights (y) and weights (x) of 100 college students are given:
$$\sum x = 15000,\quad \sum x^2 = 2272500,\quad \sum y = 6800,\quad \sum y^2 = 463025,\quad \sum xy = 1022250$$
Find the coefficient of correlation between height and weight and also the equations of the lines of regression of height on weight and of weight on height.
Solution
n = 100
$$b_{yx} = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum x^2 - \dfrac{(\sum x)^2}{n}}
= \frac{1022250 - \dfrac{(15000)(6800)}{100}}{2272500 - \dfrac{(15000)^2}{100}} = 0.1$$
$$b_{xy} = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum y^2 - \dfrac{(\sum y)^2}{n}}
= \frac{1022250 - \dfrac{(15000)(6800)}{100}}{463025 - \dfrac{(6800)^2}{100}} = 3.6$$
$$r = \sqrt{b_{yx} b_{xy}} = \sqrt{(0.1)(3.6)} = 0.6$$
$$\bar{x} = \frac{\sum x}{n} = \frac{15000}{100} = 150,\qquad \bar{y} = \frac{\sum y}{n} = \frac{6800}{100} = 68$$
The equation of the line of regression of y on x is
$$y - \bar{y} = b_{yx}(x - \bar{x})$$
$$y - 68 = 0.1(x - 150)$$
$$y = 0.1x + 53$$
The equation of the line of regression of x on y is
$$x - \bar{x} = b_{xy}(y - \bar{y})$$
$$x - 150 = 3.6(y - 68)$$
$$x = 3.6y - 94.8$$

Example 4
For a bivariate data, the mean value of x is 20 and the mean value of y is 45. The regression coefficient of y on x is 4 and that of x on y is $\frac{1}{9}$.
Find
(i) the coefficient of correlation, and
(ii) the standard deviation of x if the standard deviation of y is 12.
(iii) Also write down the equations of regression lines.
Solution
$$\bar{x} = 20,\quad \bar{y} = 45,\quad b_{yx} = 4,\quad b_{xy} = \frac{1}{9}$$
(i)
$$r = \sqrt{b_{yx} b_{xy}} = \sqrt{(4)\left(\frac{1}{9}\right)} = \frac{2}{3} = 0.667$$
(ii)
$$b_{yx} = r\frac{\sigma_y}{\sigma_x}$$
$$4 = \frac{2}{3}\left(\frac{12}{\sigma_x}\right)$$
$$\therefore \sigma_x = 2$$
(iii) The equation of the regression line of y on x is
$$y - \bar{y} = b_{yx}(x - \bar{x})$$
$$y - 45 = 4(x - 20)$$
$$y = 4x - 35$$
The equation of the regression line of x on y is
$$x - \bar{x} = b_{xy}(y - \bar{y})$$
$$x - 20 = \frac{1}{9}(y - 45)$$
$$x = \frac{1}{9}y + 15$$

Example 5
From the following results, obtain the two regression equations and
estimate the yield when the rainfall is 29 cm and the rainfall, when the
yield is 600 kg:

Yield in kg Rainfall in cm
Mean 508.4 26.7
SD 36.8 4.6
The coefficient of correlation between yield and rainfall is 0.52.
Solution
Let rainfall in cm be denoted by x and yield in kg be denoted by y.
$$\bar{x} = 26.7,\quad \bar{y} = 508.4,\quad \sigma_x = 4.6,\quad \sigma_y = 36.8,\quad r = 0.52$$
$$b_{yx} = r\frac{\sigma_y}{\sigma_x} = 0.52\left(\frac{36.8}{4.6}\right) = 4.16$$
$$b_{xy} = r\frac{\sigma_x}{\sigma_y} = 0.52\left(\frac{4.6}{36.8}\right) = 0.065$$
The equation of the line of regression of y on x is
$$y - \bar{y} = b_{yx}(x - \bar{x})$$
$$y - 508.4 = 4.16(x - 26.7)$$
$$y = 4.16x + 397.328$$
The equation of the line of regression of x on y is
$$x - \bar{x} = b_{xy}(y - \bar{y})$$
$$x - 26.7 = 0.065(y - 508.4)$$
$$x = 0.065y - 6.346$$
Estimated yield when the rainfall is 29 cm is
$$y = 4.16(29) + 397.328 = 517.968 \text{ kg}$$
Estimated rainfall when the yield is 600 kg is
$$x = 0.065(600) - 6.346 = 32.654 \text{ cm}$$
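When only the means, standard deviations and the correlation coefficient are known, both regression lines and the two estimates follow directly. A short Python sketch under that assumption (the function name `regression_from_summary` is hypothetical):

```python
def regression_from_summary(x_bar, y_bar, sd_x, sd_y, r):
    """Build both regression lines from summary statistics.

    Returns (b_yx, b_xy, estimate_y, estimate_x), where the last two are
    functions giving the estimate of y for a given x and of x for a given y.
    """
    b_yx = r * sd_y / sd_x
    b_xy = r * sd_x / sd_y

    def estimate_y(x):
        return y_bar + b_yx * (x - x_bar)

    def estimate_x(y):
        return x_bar + b_xy * (y - y_bar)

    return b_yx, b_xy, estimate_y, estimate_x

# x = rainfall in cm, y = yield in kg
b_yx, b_xy, estimate_yield, estimate_rainfall = regression_from_summary(
    x_bar=26.7, y_bar=508.4, sd_x=4.6, sd_y=36.8, r=0.52)
print(round(b_yx, 3), round(b_xy, 3))
print(round(estimate_yield(29), 3), round(estimate_rainfall(600), 3))
```

Note that each estimate uses its own line: y is estimated from the line of y on x, and x from the line of x on y.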

Example 6
Find the regression coefficients byx and bxy and hence, find the correlation
coefficient between x and y for the following data:

x 4 2 3 4 2

y 2 3 2 4 4

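Solution
n = 5

x     y     $x^2$   $y^2$   xy
4     2     16      4       8
2     3     4       9       6
3     2     9       4       6
4     4     16      16      16
2     4     4       16      8

$\sum x = 15,\quad \sum y = 15,\quad \sum x^2 = 49,\quad \sum y^2 = 49,\quad \sum xy = 44$

$$b_{yx} = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum x^2 - \dfrac{(\sum x)^2}{n}}
= \frac{44 - \dfrac{(15)(15)}{5}}{49 - \dfrac{(15)^2}{5}} = -0.25$$
$$b_{xy} = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum y^2 - \dfrac{(\sum y)^2}{n}}
= \frac{44 - \dfrac{(15)(15)}{5}}{49 - \dfrac{(15)^2}{5}} = -0.25$$
$$\sqrt{b_{yx} b_{xy}} = \sqrt{(-0.25)(-0.25)} = 0.25$$
Since $b_{yx}$ and $b_{xy}$ are negative, r is negative.
$$r = -0.25$$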
Note $\sum x$, $\sum y$, $\sum x^2$, $\sum y^2$, $\sum xy$ can be directly obtained with the help of a scientific calculator.

Example 7
The following data give the experience of machine operators and their
performance rating as given by the number of good parts turned out per
100 pieces.
Operator 1 2 3 4 5 6
Performance rating (x) 23 43 53 63 73 83
Experience (y) 5 6 7 8 9 10

Calculate the regression line of performance rating on experience and


also estimate the probable performance if an operator has 11 years of
experience. [Summer 2015]

Solution
n = 6

x     y     $y^2$   xy
23    5     25      115
43    6     36      258
53    7     49      371
63    8     64      504
73    9     81      657
83    10    100     830

$\sum x = 338,\quad \sum y = 45,\quad \sum y^2 = 355,\quad \sum xy = 2735$

$$b_{xy} = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum y^2 - \dfrac{(\sum y)^2}{n}}
= \frac{2735 - \dfrac{(338)(45)}{6}}{355 - \dfrac{(45)^2}{6}} = 11.429$$
$$\bar{x} = \frac{\sum x}{n} = \frac{338}{6} = 56.33,\qquad \bar{y} = \frac{\sum y}{n} = \frac{45}{6} = 7.5$$
The equation of regression line of x on y is
$$x - \bar{x} = b_{xy}(y - \bar{y})$$
$$x - 56.33 = 11.429(y - 7.5)$$
$$x = 11.429y - 29.3875$$
Estimated performance if y = 11 is
$$x = 11.429(11) - 29.3875 = 96.3315$$

Example 8
The number of bacterial cells (y) per unit volume in a culture at different
hours (x) is given below:
x 0 1 2 3 4 5 6 7 8 9
y 43 46 82 98 123 167 199 213 245 272

Fit lines of regression of y on x and x on y. Also, estimate the number of


bacterial cells after 15 hours.
Solution
n = 10

x     y     $x^2$   xy      $y^2$
0     43    0       0       1849
1     46    1       46      2116
2     82    4       164     6724
3     98    9       294     9604
4     123   16      492     15129
5     167   25      835     27889
6     199   36      1194    39601
7     213   49      1491    45369
8     245   64      1960    60025
9     272   81      2448    73984

$\sum x = 45,\quad \sum y = 1488,\quad \sum x^2 = 285,\quad \sum xy = 8924,\quad \sum y^2 = 282290$

$$b_{yx} = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum x^2 - \dfrac{(\sum x)^2}{n}}
= \frac{8924 - \dfrac{(45)(1488)}{10}}{285 - \dfrac{(45)^2}{10}} = 27.0061$$
$$b_{xy} = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum y^2 - \dfrac{(\sum y)^2}{n}}
= \frac{8924 - \dfrac{(45)(1488)}{10}}{282290 - \dfrac{(1488)^2}{10}} = 0.0366$$
$$\bar{x} = \frac{\sum x}{n} = \frac{45}{10} = 4.5,\qquad \bar{y} = \frac{\sum y}{n} = \frac{1488}{10} = 148.8$$
The equation of the line of regression of y on x is
$$y - \bar{y} = b_{yx}(x - \bar{x})$$
$$y - 148.8 = 27.0061(x - 4.5)$$
$$y = 27.0061x + 27.2726$$
The equation of the line of regression of x on y is
$$x - \bar{x} = b_{xy}(y - \bar{y})$$
$$x - 4.5 = 0.0366(y - 148.8)$$
$$x = 0.0366y - 0.9461$$
At x = 15 hours,
$$y = 27.0061(15) + 27.2726 = 432.3641$$
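The whole calculation, from the raw table to the estimate at 15 hours, can be reproduced with a few lines of Python. This is a hedged sketch with a hypothetical helper `fit_from_sums`; note also that 15 hours lies outside the observed range 0 to 9, so the estimate is an extrapolation of the fitted line.

```python
def fit_from_sums(xs, ys):
    """Return (b_yx, b_xy, x_bar, y_bar) computed from the raw sums."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    numerator = sxy - sx * sy / n
    b_yx = numerator / (sxx - sx ** 2 / n)
    b_xy = numerator / (syy - sy ** 2 / n)
    return b_yx, b_xy, sx / n, sy / n

hours = list(range(10))
cells = [43, 46, 82, 98, 123, 167, 199, 213, 245, 272]
b_yx, b_xy, x_bar, y_bar = fit_from_sums(hours, cells)
estimate = y_bar + b_yx * (15 - x_bar)   # line of regression of y on x at x = 15
print(round(b_yx, 4), round(b_xy, 4), round(estimate, 2))
```

Small differences in the last decimal places relative to a hand calculation come from rounding the coefficients before substituting them.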

Example 9
Find the regression coefficient of y on x for the following data:
x 1 2 3 4 5
y 160 180 140 180 200

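Solution
n = 5
$$\bar{x} = \frac{\sum x}{n} = \frac{15}{5} = 3,\qquad \bar{y} = \frac{\sum y}{n} = \frac{860}{5} = 172$$

x     y      $x - \bar{x}$   $y - \bar{y}$   $(x - \bar{x})^2$   $(x - \bar{x})(y - \bar{y})$
1     160    -2              -12             4                   24
2     180    -1               8              1                   -8
3     140     0              -32             0                   0
4     180     1               8              1                   8
5     200     2               28             4                   56

$\sum x = 15,\quad \sum y = 860,\quad \sum (x - \bar{x}) = 0,\quad \sum (y - \bar{y}) = 0,\quad \sum (x - \bar{x})^2 = 10,\quad \sum (x - \bar{x})(y - \bar{y}) = 80$

$$b_{yx} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{80}{10} = 8$$

Note Since $\sum x$, $\sum y$, $\sum x^2$, $\sum y^2$, $\sum xy$ can be directly obtained with the help of a scientific calculator, the regression coefficient can be calculated without using the mean.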

Example 10
Calculate the two regression coefficients from the data and find
correlation coefficient.
x 7 4 8 6 5
y 6 5 9 8 2

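Solution
n = 5
$$\bar{x} = \frac{\sum x}{n} = \frac{30}{5} = 6,\qquad \bar{y} = \frac{\sum y}{n} = \frac{30}{5} = 6$$

x     y     $x - \bar{x}$   $y - \bar{y}$   $(x - \bar{x})^2$   $(y - \bar{y})^2$   $(x - \bar{x})(y - \bar{y})$
7     6      1               0              1                   0                   0
4     5     -2              -1              4                   1                   2
8     9      2               3              4                   9                   6
6     8      0               2              0                   4                   0
5     2     -1              -4              1                   16                  4

$\sum x = 30,\quad \sum y = 30,\quad \sum (x - \bar{x}) = 0,\quad \sum (y - \bar{y}) = 0,\quad \sum (x - \bar{x})^2 = 10,\quad \sum (y - \bar{y})^2 = 30,\quad \sum (x - \bar{x})(y - \bar{y}) = 12$

$$b_{yx} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{12}{10} = 1.2$$
$$b_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2} = \frac{12}{30} = 0.4$$
$$r = \sqrt{b_{yx} b_{xy}} = \sqrt{(1.2)(0.4)} = 0.693$$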

Example 11
Obtain the two regression lines from the following data and hence, find
the correlation coefficient.
x 6 2 10 4 8

y 9 11 5 8 7

 [Summer 2015]

Solution
n = 5
$$\bar{x} = \frac{\sum x}{n} = \frac{30}{5} = 6,\qquad \bar{y} = \frac{\sum y}{n} = \frac{40}{5} = 8$$

x     y     $x - \bar{x}$   $y - \bar{y}$   $(x - \bar{x})^2$   $(y - \bar{y})^2$   $(x - \bar{x})(y - \bar{y})$
6     9      0               1              0                   1                   0
2     11    -4               3              16                  9                   -12
10    5      4              -3              16                  9                   -12
4     8     -2               0              4                   0                   0
8     7      2              -1              4                   1                   -2

$\sum x = 30,\quad \sum y = 40,\quad \sum (x - \bar{x}) = 0,\quad \sum (y - \bar{y}) = 0,\quad \sum (x - \bar{x})^2 = 40,\quad \sum (y - \bar{y})^2 = 20,\quad \sum (x - \bar{x})(y - \bar{y}) = -26$

$$b_{yx} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{-26}{40} = -0.65$$
$$b_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2} = \frac{-26}{20} = -1.3$$
The equation of regression line of y on x is
$$y - \bar{y} = b_{yx}(x - \bar{x})$$
$$y - 8 = -0.65(x - 6)$$
$$y = -0.65x + 11.9$$
The equation of regression line of x on y is
$$x - \bar{x} = b_{xy}(y - \bar{y})$$
$$x - 6 = -1.3(y - 8)$$
$$x = -1.3y + 16.4$$
$$\sqrt{b_{yx} b_{xy}} = \sqrt{(-0.65)(-1.3)} = 0.9192$$
Since $b_{yx}$ and $b_{xy}$ are negative, r is negative.
$$r = -0.9192$$

Example 12
Calculate the regression coefficients and find the two lines of regression
from the following data:
x 57 58 59 59 60 61 62 64
y 67 68 65 68 72 72 69 71

Find the value of y when x = 66.

Solution
n = 8
$$\bar{x} = \frac{\sum x}{n} = \frac{480}{8} = 60,\qquad \bar{y} = \frac{\sum y}{n} = \frac{552}{8} = 69$$

x     y     $x - \bar{x}$   $y - \bar{y}$   $(x - \bar{x})^2$   $(y - \bar{y})^2$   $(x - \bar{x})(y - \bar{y})$
57    67    -3              -2              9                   4                   6
58    68    -2              -1              4                   1                   2
59    65    -1              -4              1                   16                  4
59    68    -1              -1              1                   1                   1
60    72     0               3              0                   9                   0
61    72     1               3              1                   9                   3
62    69     2               0              4                   0                   0
64    71     4               2              16                  4                   8

$\sum x = 480,\quad \sum y = 552,\quad \sum (x - \bar{x}) = 0,\quad \sum (y - \bar{y}) = 0,\quad \sum (x - \bar{x})^2 = 36,\quad \sum (y - \bar{y})^2 = 44,\quad \sum (x - \bar{x})(y - \bar{y}) = 24$

$$b_{yx} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{24}{36} = 0.667$$
$$b_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2} = \frac{24}{44} = 0.545$$
The equation of regression line of y on x is
$$y - \bar{y} = b_{yx}(x - \bar{x})$$
$$y - 69 = 0.667(x - 60)$$
$$y = 0.667x + 28.98$$
The equation of regression line of x on y is
$$x - \bar{x} = b_{xy}(y - \bar{y})$$
$$x - 60 = 0.545(y - 69)$$
$$x = 0.545y + 22.395$$
Value of y when x = 66 is
$$y = 0.667(66) + 28.98 = 73.002$$

Example 13
The following data represents rainfall (x) and yield of paddy per hectare
(y) in a particular area. Find the linear regression of x on y.
x 113 102 95 120 140 130 125
y 1.8 1.5 1.3 1.9 1.1 2.0 1.7

Solution
Let a = 120 and b = 1.8 be the assumed means of x and y series respectively.
$$d_x = x - a = x - 120$$
$$d_y = y - b = y - 1.8$$
n = 7

x      y      $d_x$   $d_y$   $d_y^2$   $d_x d_y$
113    1.8    -7       0      0         0
102    1.5    -18     -0.3    0.09      5.4
95     1.3    -25     -0.5    0.25      12.5
120    1.9     0       0.1    0.01      0
140    1.1     20     -0.7    0.49      -14
130    2.0     10      0.2    0.04      2.0
125    1.7     5      -0.1    0.01      -0.5

$\sum x = 825,\quad \sum y = 11.3,\quad \sum d_x = -15,\quad \sum d_y = -1.3,\quad \sum d_y^2 = 0.89,\quad \sum d_x d_y = 5.4$

$$b_{xy} = \frac{\sum d_x d_y - \dfrac{\sum d_x \sum d_y}{n}}{\sum d_y^2 - \dfrac{(\sum d_y)^2}{n}}
= \frac{5.4 - \dfrac{(-15)(-1.3)}{7}}{0.89 - \dfrac{(-1.3)^2}{7}} = 4.03$$
$$\bar{x} = \frac{\sum x}{n} = \frac{825}{7} = 117.86,\qquad \bar{y} = \frac{\sum y}{n} = \frac{11.3}{7} = 1.614$$
The equation of the regression line of x on y is
$$x - \bar{x} = b_{xy}(y - \bar{y})$$
$$x - 117.86 = 4.03(y - 1.614)$$
$$x = 4.03y + 111.36$$

Note Since $\sum x$, $\sum y$, $\sum x^2$, $\sum y^2$, $\sum xy$ can be directly obtained with the help of a scientific calculator, the regression coefficient can be calculated without using the assumed mean.

Example 14
Find the two lines of regression from the following data:
Age of husband (x) 25 22 28 26 35 20 22 40 20 18
Age of wife (y) 18 15 20 17 22 14 16 21 15 14

Hence, estimate (i) the age of the husband when the age of the wife is 19,
and (ii) the age of the wife when the age of the husband is 30.

Solution
Let a = 26 and b = 17 be the assumed means of x and y series respectively.
$$d_x = x - a = x - 26$$
$$d_y = y - b = y - 17$$
n = 10

x     y     $d_x$   $d_y$   $d_x^2$   $d_y^2$   $d_x d_y$
25    18    -1       1      1         1         -1
22    15    -4      -2      16        4         8
28    20     2       3      4         9         6
26    17     0       0      0         0         0
35    22     9       5      81        25        45
20    14    -6      -3      36        9         18
22    16    -4      -1      16        1         4
40    21     14      4      196       16        56
20    15    -6      -2      36        4         12
18    14    -8      -3      64        9         24

$\sum x = 256,\quad \sum y = 172,\quad \sum d_x = -4,\quad \sum d_y = 2,\quad \sum d_x^2 = 450,\quad \sum d_y^2 = 78,\quad \sum d_x d_y = 172$

$$b_{yx} = \frac{\sum d_x d_y - \dfrac{\sum d_x \sum d_y}{n}}{\sum d_x^2 - \dfrac{(\sum d_x)^2}{n}}
= \frac{172 - \dfrac{(-4)(2)}{10}}{450 - \dfrac{(-4)^2}{10}} = 0.385$$
$$b_{xy} = \frac{\sum d_x d_y - \dfrac{\sum d_x \sum d_y}{n}}{\sum d_y^2 - \dfrac{(\sum d_y)^2}{n}}
= \frac{172 - \dfrac{(-4)(2)}{10}}{78 - \dfrac{(2)^2}{10}} = 2.227$$
$$\bar{x} = \frac{\sum x}{n} = \frac{256}{10} = 25.6,\qquad \bar{y} = \frac{\sum y}{n} = \frac{172}{10} = 17.2$$
The equation of the regression line of y on x is
$$y - \bar{y} = b_{yx}(x - \bar{x})$$
$$y - 17.2 = 0.385(x - 25.6)$$
$$y = 0.385x + 7.344$$
The equation of the regression line of x on y is
$$x - \bar{x} = b_{xy}(y - \bar{y})$$
$$x - 25.6 = 2.227(y - 17.2)$$
$$x = 2.227y - 12.704$$
Estimated age of the husband when the age of the wife is 19 is
$$x = 2.227(19) - 12.704 = 29.609 \text{ or 30 nearly}$$
Age of the husband = 30 years
Estimated age of the wife when the age of the husband is 30 is
$$y = 0.385(30) + 7.344 = 18.894 \text{ or 19 nearly}$$
Age of the wife = 19 years

Example 15
From the following data, obtain the two regression lines and correlation
coefficient.
Sales (x) 100 98 78 85 110 93 80

Purchase (y) 85 90 70 72 95 81 74

Solution
Let a = 93 and b = 81 be the assumed means of x and y series respectively.
$$d_x = x - a = x - 93$$
$$d_y = y - b = y - 81$$
n = 7

x      y     $d_x$   $d_y$   $d_x^2$   $d_y^2$   $d_x d_y$
100    85     7       4      49        16        28
98     90     5       9      25        81        45
78     70    -15     -11     225       121       165
85     72    -8      -9      64        81        72
110    95     17      14     289       196       238
93     81     0       0      0         0         0
80     74    -13     -7      169       49        91

$\sum x = 644,\quad \sum y = 567,\quad \sum d_x = -7,\quad \sum d_y = 0,\quad \sum d_x^2 = 821,\quad \sum d_y^2 = 544,\quad \sum d_x d_y = 639$

$$b_{yx} = \frac{\sum d_x d_y - \dfrac{\sum d_x \sum d_y}{n}}{\sum d_x^2 - \dfrac{(\sum d_x)^2}{n}}
= \frac{639 - \dfrac{(-7)(0)}{7}}{821 - \dfrac{(-7)^2}{7}} = 0.785$$
$$b_{xy} = \frac{\sum d_x d_y - \dfrac{\sum d_x \sum d_y}{n}}{\sum d_y^2 - \dfrac{(\sum d_y)^2}{n}}
= \frac{639 - \dfrac{(-7)(0)}{7}}{544 - \dfrac{(0)^2}{7}} = 1.1746$$
$$\bar{x} = \frac{\sum x}{n} = \frac{644}{7} = 92,\qquad \bar{y} = \frac{\sum y}{n} = \frac{567}{7} = 81$$
The equation of regression line of y on x is
$$y - \bar{y} = b_{yx}(x - \bar{x})$$
$$y - 81 = 0.785(x - 92)$$
$$y = 0.785x + 8.78$$
The equation of regression line of x on y is
$$x - \bar{x} = b_{xy}(y - \bar{y})$$
$$x - 92 = 1.1746(y - 81)$$
$$x = 1.1746y - 3.1426$$
$$r = \sqrt{b_{yx} b_{xy}} = \sqrt{(0.785)(1.1746)} = 0.9602$$

Exercise 4.3

1. The following are the lines of regression: 4y = x + 38 and 9y = x + 288. Estimate y when x = 99 and x when y = 30. Also, find the means of x and y.
 [Ans.: y = 43, x = 82, $\bar{x} = 162$, $\bar{y} = 50$]
2. The equations of the two lines of regression are x = 19.13 - 0.87 y and y = 11.64 - 0.50 x. Find (i) the means of x and y, and (ii) the coefficient of correlation between x and y.
 [Ans.: (i) $\bar{x} = 15.79$, $\bar{y} = 3.74$, (ii) r = -0.66, $b_{yx} = -0.5$, $b_{xy} = -0.87$]
3. Given var(x) = 25. The equations of the two lines of regression are 5x - y = 22 and 64x - 45y = 24. Find (i) $\bar{x}$ and $\bar{y}$, (ii) r, and (iii) $\sigma_y$.
 [Ans.: (i) $\bar{x} = 6$, $\bar{y} = 8$, (ii) r = 0.53, (iii) $\sigma_y = 13.33$]

4. In a partially destroyed laboratory record of analysis of correlation data, the following results are legible: variance of x = 9, and the equations of the lines of regression 4x - 5y + 33 = 0, 20x - 9y - 107 = 0. Find (i) the mean values of x and y, (ii) the standard deviation of y, and (iii) the coefficient of correlation between x and y.
 [Ans.: (i) $\bar{x} = 13$, $\bar{y} = 17$, (ii) $\sigma_y = 4$, (iii) r = 0.6]

5. From a sample of 200 pairs of observations, the following quantities were calculated:
$$\sum x = 11.34,\quad \sum y = 20.78,\quad \sum x^2 = 12.16,\quad \sum y^2 = 84.96,\quad \sum xy = 22.13$$
From the above data, show how to compute the coefficients of the equation y = a + bx.
 [Ans.: a = 0.0005, b = 1.82]
6. In the estimation of regression equations of two variables x and y, the
following results were obtained:
$$\bar{x} = 90,\quad \bar{y} = 70,\quad n = 10,\quad \sum (x - \bar{x})^2 = 6360,\quad \sum (y - \bar{y})^2 = 2860,\quad \sum (x - \bar{x})(y - \bar{y}) = 3900$$
Obtain the two lines of regression.
 [Ans.: x = 1.361 y - 5.27, y = 0.613 x + 14.812]
7. Find the likely production corresponding to a rainfall of 40 cm from the
following data:
Rainfall (in cm) Output (in quintals)
mean 30 50
SD 5 10
r = 0.8
 [Ans.: 66 quintals]
8. The following table gives the age of a car of a certain make and annual
maintenance cost. Obtain the equation of the line of regression of cost
on age.
Age of a car 2 4 6 8
Maintenance 1 2 2.5 3
 [Ans.: x = 0.325 y + 0.5]
9. Obtain the equation of the line of regression of y on x from the following
data and estimate y for x = 73.
x 70 72 74 76 78 80
y 163 170 179 188 196 220
 [Ans.: y = 5.31 x - 212.57, y = 175.37]
10. The heights in cm of fathers (x) and of the eldest sons (y) are given
below:
x 165 160 170 163 173 158 178 168 173 170 175 180
y 173 168 173 165 175 168 173 165 180 170 173 178

 Estimate the height of the eldest son if the height of the father is
172 cm and the height of the father if the height of the eldest son is
173 cm. Also, find the coefficient of correlation between the heights of
fathers and sons.
 [Ans.: (i) y = 1.016 x - 5.123 (ii) x = 0.476 y + 98.98
 (iii) 169.97, 173.45 (iv) r = 0.696]
11. Find (i) the lines of regression, and (ii) coefficient of correlation for
the following data:
x 65 66 67 67 68 69 70 72
y 67 68 65 66 72 72 69 71
 [Ans.: (i) y = 19.64 + 0.72 x, x = 33.29 + 0.5 y, (ii) r = 0.604]
12. Find the line of regression for the following data and estimate y
corresponding to x = 15.5.
x 10 12 13 16 17 20 25
y 19 22 24 27 29 33 37
 [Ans.: y = 1.21x + 7.71, y = 26.465]
13. The following data give the heights in inches (x) and weights in lbs (y)
of a random sample of 10 students:
x 61 68 68 64 65 70 63 62 64 67
y 112 123 130 115 110 125 100 113 116 126
Estimate the weight of a student of height 59 inches.
 [Ans.: 126.4 lbs]
14. Find the regression equations of y on x from the data given below
taking deviations from actual mean of x and y.
Price in rupees (x) 10 12 13 12 16 15
Demand (y) 40 38 43 45 37 43
Estimate the demand when the price is ₹20.
 [Ans.: y = -0.25 x + 44.25, y = 39.25]
CHAPTER 5
Some Special Probability Distributions

Chapter Outline
5.1 Introduction
5.2 Binomial Distribution
5.3 Poisson Distribution
5.4 Normal Distribution
5.5 Exponential Distribution
5.6 Gamma Distribution

5.1 Introduction

Certain probability distributions arise so often in practice that they are studied in
their own right. Behind each of them is a random experiment, and because these
experiments model many real-life phenomena, the corresponding distributions are used
frequently in applications. Often the random variable X associated with an experiment
encountered in practice follows one of these standard distributions.
This chapter discusses these special random variables and their distributions: the
binomial, Poisson, normal, exponential, and gamma distributions.

5.2 Binomial Distribution

Consider n independent trials of a random experiment, each of which results in either
success or failure. Let p be the probability of success, which remains constant from trial
to trial, and let q = 1 - p be the probability of failure. The probability of x successes
and n - x failures in a specified order is $p^x q^{n-x}$ (multiplication theorem of
probability). These x successes and n - x failures can occur in any of ${}^{n}C_{x}$ orders,
each of which has the same probability. Hence, the probability of x successes is
${}^{n}C_{x}\, p^x q^{n-x}$.
\[ P(X = x) = {}^{n}C_{x}\, p^x q^{n-x}, \quad x = 0, 1, 2, \ldots, n, \quad \text{where } p + q = 1 \]
A random variable X is said to follow the binomial distribution if the probability of x
is given by
\[ P(X = x) = p(x) = {}^{n}C_{x}\, p^x q^{n-x}, \quad x = 0, 1, 2, \ldots, n \ \text{ and } \ q = 1 - p \]
The two constants n and p are called the parameters of the distribution.
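The definition above can be evaluated directly. The following is a minimal Python sketch, not part of the original text; the function name `binomial_pmf` is chosen here for illustration, and `math.comb` supplies the binomial coefficient.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial variate with parameters n and p."""
    q = 1 - p
    return comb(n, x) * p**x * q**(n - x)

# Probabilities of 0, 1, ..., 4 heads in n = 4 tosses of a fair coin
probs = [binomial_pmf(x, 4, 0.5) for x in range(5)]
print(probs)       # [0.0625, 0.25, 0.375, 0.25, 0.0625]
print(sum(probs))  # 1.0
```

The probabilities add up to 1, as they must, because they are the terms in the expansion of $(q + p)^n$.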

5.2.1 Examples of Binomial Distribution


(i) Number of defective bolts in a box containing n bolts.
(ii) Number of post-graduates in a group of n people.
(iii) Number of oil wells yielding natural gas in a group of n wells test drilled.
(iv) Number of machines lying idle in a factory having n machines.

5.2.2 Conditions for Binomial Distribution


The binomial distribution holds under the following conditions:
(i) The number of trials n is finite.
(ii) There are only two possible outcomes, success or failure.
(iii) The trials are independent of each other.
(iv) The probability of success p is constant for each trial.

5.2.3 Constants of the Binomial Distribution


1. Mean of the Binomial Distribution
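\[
\begin{aligned}
E(X) &= \sum x\, p(x) \\
&= \sum_{x=0}^{n} x\, {}^{n}C_{x}\, p^x q^{n-x} \\
&= 0 \cdot {}^{n}C_{0}\, p^0 q^n + 1 \cdot {}^{n}C_{1}\, p\, q^{n-1} + 2 \cdot {}^{n}C_{2}\, p^2 q^{n-2} + \cdots + n\, p^n \\
&= np \left[ q^{n-1} + {}^{(n-1)}C_{1}\, q^{n-2} p + {}^{(n-1)}C_{2}\, q^{n-3} p^2 + \cdots + p^{n-1} \right] \\
&= np\, (q + p)^{n-1} \\
&= np \qquad [\because\ p + q = 1]
\end{aligned}
\]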

2. Variance of the Binomial Distribution


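\[
\begin{aligned}
\operatorname{Var}(X) &= E(X^2) - \mu^2 \\
&= \sum_{x=0}^{n} x^2\, p(x) - \mu^2 \\
&= \sum_{x=0}^{n} x^2\, {}^{n}C_{x}\, p^x q^{n-x} - \mu^2 \\
&= \sum_{x=0}^{n} [x + x(x-1)]\, {}^{n}C_{x}\, p^x q^{n-x} - \mu^2 \\
&= \sum_{x=0}^{n} x\, {}^{n}C_{x}\, p^x q^{n-x} + \sum_{x=0}^{n} x(x-1)\, {}^{n}C_{x}\, p^x q^{n-x} - \mu^2 \\
&= np + \sum_{x=2}^{n} x(x-1)\, \frac{n(n-1)}{x(x-1)} \cdot {}^{(n-2)}C_{x-2}\, p^x q^{n-x} - \mu^2 \qquad [\text{the terms for } x = 0 \text{ and } x = 1 \text{ vanish}] \\
&= np + \sum_{x=2}^{n} n(n-1) \cdot {}^{(n-2)}C_{x-2}\, p^2\, p^{x-2} q^{n-x} - \mu^2 \\
&= np + n(n-1)\, p^2 \sum_{x=2}^{n} {}^{(n-2)}C_{x-2}\, p^{x-2} q^{n-x} - \mu^2 \\
&= np + n(n-1)\, p^2\, (q + p)^{n-2} - \mu^2 \\
&= np + n(n-1)\, p^2 - \mu^2 \qquad [\because\ p + q = 1] \\
&= np\, [1 + (n-1)p] - \mu^2 \\
&= np\, [1 - p + np] - \mu^2 \\
&= np\, [q + np] - \mu^2 \qquad [\because\ 1 - p = q] \\
&= np\, (q + np) - (np)^2 \\
&= npq
\end{aligned}
\]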
   
3. Standard Deviation of the Binomial Distribution

\[ \text{SD} = \sqrt{\text{Variance}} = \sqrt{npq} \]

4. Mode of the Binomial Distribution

Mode of the binomial distribution is the value of x at which p(x) has maximum value.
Mode = integral part of (n + 1)p, if (n + 1)p is not an integer
    = (n +1) p and (n + 1) p – 1, if (n + 1) p is an integer.
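The constants above can be checked numerically by summing over the probability function. The sketch below is illustrative only; the helper name `binomial_constants` and the choice n = 9, p = 0.4 are assumptions made for this example. With these values (n + 1)p = 4 is an integer, so two modes are expected.

```python
from math import comb, sqrt

def binomial_constants(n, p):
    """Return (mean, variance, SD, modes) computed directly from the pmf."""
    q = 1 - p
    pmf = [comb(n, x) * p**x * q**(n - x) for x in range(n + 1)]
    mean = sum(x * px for x, px in enumerate(pmf))
    var = sum(x * x * px for x, px in enumerate(pmf)) - mean**2
    peak = max(pmf)
    # Collect every x whose probability equals the maximum (within rounding)
    modes = [x for x, px in enumerate(pmf) if abs(px - peak) < 1e-12]
    return mean, var, sqrt(var), modes

n, p = 9, 0.4
mean, var, sd, modes = binomial_constants(n, p)
print(mean, n * p)             # both close to 3.6
print(var, n * p * (1 - p))    # both close to 2.16
print(modes)                   # [3, 4]
```

The two modes agree with the rule stated above: (n + 1)p - 1 = 3 and (n + 1)p = 4.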

5.2.4 Recurrence Relation for the Binomial Distribution


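For the binomial distribution,
\[ P(X = x) = {}^{n}C_{x}\, p^x q^{n-x} \]
\[ P(X = x + 1) = {}^{n}C_{x+1}\, p^{x+1} q^{n-x-1} \]
\[
\begin{aligned}
\frac{P(X = x + 1)}{P(X = x)} &= \frac{{}^{n}C_{x+1}\, p^{x+1} q^{n-x-1}}{{}^{n}C_{x}\, p^x q^{n-x}} \\
&= \frac{n!}{(x+1)!\,(n-x-1)!} \times \frac{x!\,(n-x)!}{n!} \cdot \frac{p}{q} \\
&= \frac{(n-x)\,(n-x-1)!\; x!}{(x+1)\, x!\,(n-x-1)!} \cdot \frac{p}{q} \\
&= \frac{n-x}{x+1} \cdot \frac{p}{q}
\end{aligned}
\]
\[ P(X = x + 1) = \frac{n-x}{x+1} \cdot \frac{p}{q} \cdot P(X = x) \]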
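The recurrence relation gives a convenient way to build the whole distribution from $P(X = 0) = q^n$ without evaluating any factorials. A minimal sketch follows; the function name is hypothetical and the code assumes 0 < p < 1 so that the division by q is defined.

```python
def binomial_pmf_by_recurrence(n, p):
    """List of P(X = 0), ..., P(X = n) built with the recurrence relation."""
    q = 1 - p
    probs = [q**n]  # starting value P(X = 0) = q^n
    for x in range(n):
        # P(X = x + 1) = (n - x)/(x + 1) * (p/q) * P(X = x)
        probs.append(probs[-1] * (n - x) / (x + 1) * p / q)
    return probs

probs = binomial_pmf_by_recurrence(4, 0.5)
print(probs)   # [0.0625, 0.25, 0.375, 0.25, 0.0625]
```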

5.2.5 Binomial Frequency Distribution


If n independent trials constitute one experiment and this experiment is repeated N
times, the frequency of x successes is N P(X = x), i.e., $N\, {}^{n}C_{x}\, p^x q^{n-x}$. This is called
the expected or theoretical frequency f(x) of x successes.
\[ \sum_{x=0}^{n} f(x) = N \sum_{x=0}^{n} P(X = x) = N \qquad \left[ \because\ \sum_{x=0}^{n} P(X = x) = 1 \right] \]
The expected or theoretical frequencies f(0), f(1), f(2), ..., f(n) of 0, 1, 2, ..., n successes
are respectively the first, second, third, ..., (n + 1)th terms in the expansion of $N(q + p)^n$.
The possible numbers of successes together with their frequencies is called a binomial
frequency distribution. In practice, the expected frequencies differ from the observed
frequencies due to chance.

Example 1
The mean and standard deviation of a binomial distribution are 5 and 2.
Determine the distribution.
Solution
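\[
\begin{aligned}
\mu = np &= 5 \\
\text{SD} = \sqrt{npq} &= 2 \\
npq &= 4 \\
\frac{npq}{np} &= \frac{4}{5} \\
\therefore\quad q &= \frac{4}{5} \\
p = 1 - q &= 1 - \frac{4}{5} = \frac{1}{5} \\
np &= 5 \\
n \left( \frac{1}{5} \right) &= 5 \\
\therefore\quad n &= 25
\end{aligned}
\]
Hence, the binomial distribution is
\[ P(X = x) = {}^{n}C_{x}\, p^x q^{n-x} = {}^{25}C_{x} \left( \frac{1}{5} \right)^x \left( \frac{4}{5} \right)^{25-x}, \quad x = 0, 1, 2, \ldots, 25 \]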

Example 2
The mean and variance of a binomial variate are 8 and 6. Find
P(X ≥ 2).
Solution
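\[
\begin{aligned}
\mu = np &= 8 \\
\sigma^2 = npq &= 6 \\
\frac{npq}{np} &= \frac{6}{8} = \frac{3}{4} \\
\therefore\quad q &= \frac{3}{4} \\
p = 1 - q &= 1 - \frac{3}{4} = \frac{1}{4} \\
np &= 8 \\
n \left( \frac{1}{4} \right) &= 8 \\
\therefore\quad n &= 32
\end{aligned}
\]
\[ P(X = x) = {}^{n}C_{x}\, p^x q^{n-x} = {}^{32}C_{x} \left( \frac{1}{4} \right)^x \left( \frac{3}{4} \right)^{32-x}, \quad x = 0, 1, 2, \ldots, 32 \]
\[
\begin{aligned}
P(X \ge 2) &= 1 - P(X < 2) \\
&= 1 - [P(X = 0) + P(X = 1)] \\
&= 1 - \sum_{x=0}^{1} P(X = x) \\
&= 1 - \sum_{x=0}^{1} {}^{32}C_{x} \left( \frac{1}{4} \right)^x \left( \frac{3}{4} \right)^{32-x} \\
&= 0.9988
\end{aligned}
\]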
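The steps used in Examples 1 and 2 (divide the variance by the mean to get q, then recover p and n) can be sketched in Python as follows. The helper name `parameters_from_moments` is an assumption made for this illustration, and exact fractions are used so that n comes out as a whole number.

```python
from fractions import Fraction
from math import comb

def parameters_from_moments(mean, variance):
    """Recover (n, p) of a binomial distribution from its mean np and variance npq."""
    q = Fraction(variance) / Fraction(mean)   # npq / np = q
    p = 1 - q
    n = Fraction(mean) / p                    # np / p = n
    return int(n), p

n, p = parameters_from_moments(8, 6)
q = 1 - p
# P(X >= 2) = 1 - [P(X = 0) + P(X = 1)]
tail = 1 - sum(comb(n, x) * p**x * q**(n - x) for x in range(2))
print(n, p)                    # 32 1/4
print(round(float(tail), 4))   # 0.9988
```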

Example 3
Suppose P(X = 0) = 1 – P(X = 1). If E(X) = 3 Var (X), find P(X = 0).
Solution
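\[
\begin{aligned}
E(X) &= 3 \operatorname{Var}(X) \\
np &= 3\, npq \\
1 &= 3q \\
\therefore\quad q &= \frac{1}{3} \\
p &= 1 - q = 1 - \frac{1}{3} = \frac{2}{3}
\end{aligned}
\]
Since P(X = 0) + P(X = 1) = 1, X takes only the values 0 and 1, i.e., n = 1.
Hence, P(X = 1) = p.
\[
\begin{aligned}
P(X = 0) &= 1 - P(X = 1) \\
&= 1 - p \\
&= 1 - \frac{2}{3} \\
&= \frac{1}{3}
\end{aligned}
\]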

Example 4
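The mean and variance of a binomial distribution are 4 and $\frac{4}{3}$
respectively. Find P(X ≥ 1).

Solution
\[
\begin{aligned}
\mu = np &= 4 \\
\sigma^2 = npq &= \frac{4}{3} \\
\frac{npq}{np} &= \frac{4/3}{4} = \frac{1}{3} \\
\therefore\quad q &= \frac{1}{3} \\
p = 1 - q &= 1 - \frac{1}{3} = \frac{2}{3} \\
np &= 4 \\
n \left( \frac{2}{3} \right) &= 4 \\
\therefore\quad n &= 6
\end{aligned}
\]
\[ P(X = x) = {}^{n}C_{x}\, p^x q^{n-x} = {}^{6}C_{x} \left( \frac{2}{3} \right)^x \left( \frac{1}{3} \right)^{6-x}, \quad x = 0, 1, 2, \ldots, 6 \]
\[
\begin{aligned}
P(X \ge 1) &= 1 - P(X < 1) \\
&= 1 - P(X = 0) \\
&= 1 - {}^{6}C_{0} \left( \frac{2}{3} \right)^0 \left( \frac{1}{3} \right)^6 \\
&= 0.9986
\end{aligned}
\]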

Example 5
A discrete random variable X has mean 6 and variance 2. If it is assumed
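that the distribution is binomial, find the probability that 5 ≤ X ≤ 7.
Solution
\[
\begin{aligned}
\mu = np &= 6 \\
\sigma^2 = npq &= 2 \\
\frac{npq}{np} &= \frac{2}{6} = \frac{1}{3} \\
\therefore\quad q &= \frac{1}{3} \\
p = 1 - q &= 1 - \frac{1}{3} = \frac{2}{3} \\
np &= 6 \\
n \left( \frac{2}{3} \right) &= 6 \\
\therefore\quad n &= 9
\end{aligned}
\]
\[ P(X = x) = {}^{n}C_{x}\, p^x q^{n-x} = {}^{9}C_{x} \left( \frac{2}{3} \right)^x \left( \frac{1}{3} \right)^{9-x}, \quad x = 0, 1, 2, \ldots, 9 \]
\[
\begin{aligned}
P(5 \le X \le 7) &= P(X = 5) + P(X = 6) + P(X = 7) \\
&= \sum_{x=5}^{7} P(X = x) \\
&= \sum_{x=5}^{7} {}^{9}C_{x} \left( \frac{2}{3} \right)^x \left( \frac{1}{3} \right)^{9-x} \\
&= \frac{4672}{6561} \\
&= 0.7121
\end{aligned}
\]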
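Sums of this kind are tedious by hand but easy to verify exactly. The sketch below uses Python's `fractions.Fraction` so that the result is the exact fraction rather than a rounded decimal; the helper name `binomial_interval` is hypothetical.

```python
from fractions import Fraction
from math import comb

def binomial_interval(n, p, lo, hi):
    """Exact P(lo <= X <= hi) for a binomial variate with parameters n and p."""
    p = Fraction(p)
    q = 1 - p
    return sum(comb(n, x) * p**x * q**(n - x) for x in range(lo, hi + 1))

prob = binomial_interval(9, Fraction(2, 3), 5, 7)
print(prob)                    # 4672/6561
print(round(float(prob), 4))   # 0.7121
```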

Example 6
With the usual notation, find p for a binomial distribution if n = 6 and
9P(X = 4) = P(X = 2).
Solution
For the binomial distribution,
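\[ P(X = x) = {}^{n}C_{x}\, p^x q^{n-x}, \quad x = 0, 1, 2, \ldots, n \]
\[
\begin{aligned}
n &= 6 \\
9\, P(X = 4) &= P(X = 2) \\
9\; {}^{6}C_{4}\, p^4 q^2 &= {}^{6}C_{2}\, p^2 q^4 \\
9 p^2 &= q^2 = (1 - p)^2 \\
9 p^2 &= 1 - 2p + p^2 \\
8 p^2 + 2p - 1 &= 0 \\
p &= \frac{-2 \pm \sqrt{4 + 32}}{2 \times 8} = \frac{-2 \pm 6}{16} = -\frac{1}{2},\ \frac{1}{4}
\end{aligned}
\]
Since probability cannot be negative, $p = \frac{1}{4}$.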

Example 7
In a binomial distribution consisting of 5 independent trials, the
probabilities of 1 and 2 successes are 0.4096 and 0.2048 respectively.
Find the parameter p of the distribution.

Solution
n = 5, P(X = 1) = 0.4096, P(X = 2) = 0.2048

Probability of getting x successes out of 5 trials
\[ P(X = x) = {}^{n}C_{x}\, p^x q^{n-x} = {}^{5}C_{x}\, p^x q^{5-x}, \quad x = 0, 1, 2, \ldots, 5 \]
\[ P(X = 1) = {}^{5}C_{1}\, p\, q^4 = 0.4096 \qquad \ldots(1) \]
\[ P(X = 2) = {}^{5}C_{2}\, p^2 q^3 = 0.2048 \qquad \ldots(2) \]
Dividing Eq. (2) by Eq. (1),
\[
\begin{aligned}
\frac{{}^{5}C_{2}\, p^2 q^3}{{}^{5}C_{1}\, p\, q^4} &= \frac{0.2048}{0.4096} \\
\frac{10 p}{5 q} &= \frac{1}{2} \\
\frac{p}{q} &= \frac{1}{4} \\
4p &= q = 1 - p \\
5p &= 1 \\
p &= \frac{1}{5}
\end{aligned}
\]

Example 8
In a binomial distribution, the sum and product of the mean and variance
are $\frac{25}{3}$ and $\frac{50}{3}$ respectively. Determine the distribution.
Solution
For the binomial distribution,
\[ np + npq = \frac{25}{3} \]
\[ np\, (1 + q) = \frac{25}{3} \qquad \ldots(1) \]
and
\[ np\, (npq) = \frac{50}{3} \]
\[ n^2 p^2 q = \frac{50}{3} \qquad \ldots(2) \]
Squaring Eq. (1) and then dividing by Eq. (2),
\[
\begin{aligned}
\frac{n^2 p^2 (1 + q)^2}{n^2 p^2 q} &= \frac{625/9}{50/3} \\
\frac{1 + 2q + q^2}{q} &= \frac{25}{6} \\
6\, (q^2 + 2q + 1) &= 25 q \\
6 q^2 - 13 q + 6 &= 0 \\
(2q - 3)(3q - 2) &= 0 \\
q = \frac{3}{2} \ &\text{or} \ q = \frac{2}{3}
\end{aligned}
\]
Since q cannot be greater than 1,
\[ q = \frac{2}{3}, \qquad p = 1 - q = 1 - \frac{2}{3} = \frac{1}{3} \]
From Eq. (1),
\[ n \left( \frac{1}{3} \right) \left( 1 + \frac{2}{3} \right) = \frac{25}{3} \]
\[ \therefore\quad n = 15 \]
Hence, the binomial distribution is
\[ P(X = x) = {}^{15}C_{x} \left( \frac{1}{3} \right)^x \left( \frac{2}{3} \right)^{15-x}, \quad x = 0, 1, 2, \ldots, 15 \]

Example 9
If the probability of a defective bolt is $\frac{1}{8}$, find the (i) mean, and
(ii) variance for the distribution of defective bolts in a total of 640 bolts.
Solution
\[ p = \frac{1}{8}, \qquad n = 640 \]
\[ \mu = np = \frac{640}{8} = 80 \]
\[ q = 1 - p = 1 - \frac{1}{8} = \frac{7}{8} \]
\[ \text{Variance of the distribution} = npq = 640 \left( \frac{1}{8} \right) \left( \frac{7}{8} \right) = 70 \]

Example 10
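In eight throws of a die, 5 or 6 is considered as a success. Find the mean
number of successes and the standard deviation.
Solution
Let p be the probability of success.
\[ p = \frac{1}{6} + \frac{1}{6} = \frac{1}{3}, \qquad q = 1 - p = 1 - \frac{1}{3} = \frac{2}{3}, \qquad n = 8 \]
\[ \mu = np = 8 \left( \frac{1}{3} \right) = \frac{8}{3} \]
\[ \text{SD} = \sqrt{npq} = \sqrt{8 \left( \frac{1}{3} \right) \left( \frac{2}{3} \right)} = \frac{4}{3} \]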

Example 11
4 coins are tossed simultaneously. What is the probability of getting
(i) 2 heads? (ii) at least 2 heads? (iii) at most 2 heads?
Solution
Let p be the probability of getting a head in the toss of a coin.
\[ p = \frac{1}{2}, \qquad q = 1 - p = 1 - \frac{1}{2} = \frac{1}{2}, \qquad n = 4 \]
The probability of getting x heads when 4 coins are tossed
\[ P(X = x) = {}^{n}C_{x}\, p^x q^{n-x} = {}^{4}C_{x} \left( \frac{1}{2} \right)^x \left( \frac{1}{2} \right)^{4-x}, \quad x = 0, 1, 2, 3, 4 \]
(i) Probability of getting 2 heads when 4 coins are tossed
\[ P(X = 2) = {}^{4}C_{2} \left( \frac{1}{2} \right)^2 \left( \frac{1}{2} \right)^2 = \frac{3}{8} \]
(ii) Probability of getting at least 2 heads when 4 coins are tossed
\[
\begin{aligned}
P(X \ge 2) &= P(X = 2) + P(X = 3) + P(X = 4) \\
&= \sum_{x=2}^{4} {}^{4}C_{x} \left( \frac{1}{2} \right)^x \left( \frac{1}{2} \right)^{4-x} \\
&= \frac{11}{16}
\end{aligned}
\]
(iii) Probability of getting at most 2 heads when 4 coins are tossed
\[
\begin{aligned}
P(X \le 2) &= P(X = 0) + P(X = 1) + P(X = 2) \\
&= \sum_{x=0}^{2} {}^{4}C_{x} \left( \frac{1}{2} \right)^x \left( \frac{1}{2} \right)^{4-x} \\
&= \frac{11}{16}
\end{aligned}
\]

Example 12
Two dice are thrown five times. Find the probability of getting the sum as
7 (i) at least once, (ii) two times, and (iii) P(1 < X < 5).
Solution
In a single throw of two dice, a sum of 7 can occur in 6 ways out of 6 × 6 = 36 ways:
(1, 6), (6, 1), (2, 5), (5, 2), (3, 4), (4, 3)
Let p be the probability of getting the sum as 7 in a single throw of a pair of dice.
\[ p = \frac{6}{36} = \frac{1}{6}, \qquad q = 1 - p = 1 - \frac{1}{6} = \frac{5}{6}, \qquad n = 5 \]
Probability of getting the sum as 7 x times in 5 throws of a pair of dice
\[ P(X = x) = {}^{n}C_{x}\, p^x q^{n-x} = {}^{5}C_{x} \left( \frac{1}{6} \right)^x \left( \frac{5}{6} \right)^{5-x}, \quad x = 0, 1, 2, \ldots, 5 \]
(i) Probability of getting the sum as 7 at least once in 5 throws of two dice
\[
\begin{aligned}
P(X \ge 1) &= 1 - P(X = 0) \\
&= 1 - {}^{5}C_{0} \left( \frac{1}{6} \right)^0 \left( \frac{5}{6} \right)^5 \\
&= 1 - \frac{3125}{7776} \\
&= \frac{4651}{7776}
\end{aligned}
\]
(ii) Probability of getting the sum as 7 two times in 5 throws of two dice
\[ P(X = 2) = {}^{5}C_{2} \left( \frac{1}{6} \right)^2 \left( \frac{5}{6} \right)^3 = \frac{625}{3888} \]
(iii) Probability of getting the sum as 7 more than once but fewer than 5 times in 5 throws of two dice
\[
\begin{aligned}
P(1 < X < 5) &= P(X = 2) + P(X = 3) + P(X = 4) \\
&= \sum_{x=2}^{4} {}^{5}C_{x} \left( \frac{1}{6} \right)^x \left( \frac{5}{6} \right)^{5-x} \\
&= \frac{1525}{7776}
\end{aligned}
\]
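As a cross-check on this example, the value of p can be obtained by enumerating all 36 outcomes of a throw instead of listing the favourable pairs by hand. This is an illustrative sketch only; the variable and function names are chosen here and are not part of the text.

```python
from fractions import Fraction
from itertools import product
from math import comb

# Enumerate the 36 equally likely outcomes of one throw of two dice
outcomes = list(product(range(1, 7), repeat=2))
p = Fraction(sum(1 for a, b in outcomes if a + b == 7), len(outcomes))
q = 1 - p
n = 5  # number of throws

def pmf(x):
    """Exact probability of getting the sum 7 exactly x times in n throws."""
    return comb(n, x) * p**x * q**(n - x)

print(p)                                  # 1/6
print(1 - pmf(0))                         # 4651/7776
print(pmf(2))                             # 625/3888
print(sum(pmf(x) for x in range(2, 5)))   # 1525/7776
```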

Example 13
If 10% of the screws produced by a machine are defective, find the
probability that out of 5 screws chosen at random, (i) none is defective,
(ii) one is defective, and (iii) at most two are defective.
Solution
Let p be the probability that a screw is defective.
p = 0.1,   q = 1 - p = 1 - 0.1 = 0.9,   n = 5
Probability that x screws out of 5 screws are defective
\[ P(X = x) = {}^{n}C_{x}\, p^x q^{n-x} = {}^{5}C_{x}\, (0.1)^x (0.9)^{5-x}, \quad x = 0, 1, 2, \ldots, 5 \]
(i) Probability that none of the screws out of 5 screws is defective
\[ P(X = 0) = {}^{5}C_{0}\, (0.1)^0 (0.9)^5 = 0.5905 \]
(ii) Probability that one screw out of 5 screws is defective
\[ P(X = 1) = {}^{5}C_{1}\, (0.1)^1 (0.9)^4 = 0.3281 \]
(iii) Probability that at most 2 screws out of 5 screws are defective
\[
\begin{aligned}
P(X \le 2) &= P(X = 0) + P(X = 1) + P(X = 2) \\
&= \sum_{x=0}^{2} {}^{5}C_{x}\, (0.1)^x (0.9)^{5-x} \\
&= 0.9914
\end{aligned}
\]

Example 14
A multiple-choice test consists of 8 questions with 3 answers to each
question (of which only one is correct). A student answers each question
by rolling a balanced die and checking the first answer if he gets 1 or 2,
the second answer if he gets 3 or 4, and the third answer if he gets
5 or 6. To get a distinction, the student must secure at least 75% correct
answers. If there is no negative marking, what is the probability that the
student secures a distinction? [Summer 2015]
Solution
Let p be the probability of answering a question correctly. There are three
answers to each question, out of which only one is correct.
\[ p = \frac{1}{3}, \qquad q = 1 - p = 1 - \frac{1}{3} = \frac{2}{3}, \qquad n = 8 \]
Probability of getting x correct answers in a test of 8 questions
\[ P(X = x) = {}^{n}C_{x}\, p^x q^{n-x} = {}^{8}C_{x} \left( \frac{1}{3} \right)^x \left( \frac{2}{3} \right)^{8-x}, \quad x = 0, 1, 2, \ldots, 8 \]
Probability of securing a distinction, i.e., getting at least 6 correct answers out of the
8 questions
\[
\begin{aligned}
P(X \ge 6) &= P(X = 6) + P(X = 7) + P(X = 8) \\
&= \sum_{x=6}^{8} {}^{8}C_{x} \left( \frac{1}{3} \right)^x \left( \frac{2}{3} \right)^{8-x} \\
&= \frac{43}{2187} \\
&= 0.0197
\end{aligned}
\]

Example 15
A and B play a game in which their chances of winning are in the ratio
3:2. Find A’s chance of winning at least three games out of the five
games played.
Solution
Let p be the probability that A wins the game.
\[ p = \frac{3}{3 + 2} = \frac{3}{5}, \qquad q = 1 - p = 1 - \frac{3}{5} = \frac{2}{5}, \qquad n = 5 \]
Probability that A wins x games out of 5 games
\[ P(X = x) = {}^{n}C_{x}\, p^x q^{n-x} = {}^{5}C_{x} \left( \frac{3}{5} \right)^x \left( \frac{2}{5} \right)^{5-x}, \quad x = 0, 1, 2, \ldots, 5 \]
Probability that A wins at least 3 games
\[
\begin{aligned}
P(X \ge 3) &= P(X = 3) + P(X = 4) + P(X = 5) \\
&= \sum_{x=3}^{5} {}^{5}C_{x} \left( \frac{3}{5} \right)^x \left( \frac{2}{5} \right)^{5-x} \\
&= \frac{2133}{3125} \\
&= 0.6826
\end{aligned}
\]

Example 16
It has been claimed that in 60% of all solar heat installations the
utility bill is reduced by at least one-third. Accordingly, what are the
probabilities that the utility bill will be reduced by at least one third in
(i) four of five installations? (ii) at least four of five installations?
Solution
Let p be the probability that the utility bill is reduced by at least one-third in a solar
heat installation.
p = 60% = 0.6,   q = 1 - p = 1 - 0.6 = 0.4,   n = 5
Probability that the utility bill is reduced by at least one-third in x installations out
of 5 installations
\[ P(X = x) = {}^{n}C_{x}\, p^x q^{n-x} = {}^{5}C_{x}\, (0.6)^x (0.4)^{5-x}, \quad x = 0, 1, 2, \ldots, 5 \]
(i) Probability that the utility bill is reduced by at least one-third in 4 of 5 installations
\[ P(X = 4) = {}^{5}C_{4}\, (0.6)^4 (0.4)^1 = \frac{162}{625} \]
(ii) Probability that the utility bill is reduced by at least one-third in at least 4 of 5
installations
\[
\begin{aligned}
P(X \ge 4) &= P(X = 4) + P(X = 5) \\
&= \sum_{x=4}^{5} {}^{5}C_{x}\, (0.6)^x (0.4)^{5-x} \\
&= \frac{1053}{3125} \\
&= 0.337
\end{aligned}
\]

Example 17
The incidence of an occupational disease in an industry is such that the
workers have a 20% chance of suffering from it. What is the probability
that out of 6 workers chosen at random, four or more will suffer from
the disease?
Solution
Let p be the probability of a worker suffering from the disease.
p = 0.2,   q = 1 - p = 1 - 0.2 = 0.8,   n = 6
Probability that x workers will suffer from the disease
\[ P(X = x) = {}^{n}C_{x}\, p^x q^{n-x} = {}^{6}C_{x}\, (0.2)^x (0.8)^{6-x}, \quad x = 0, 1, 2, \ldots, 6 \]
Probability that 4 or more workers will suffer from the disease
\[
\begin{aligned}
P(X \ge 4) &= P(X = 4) + P(X = 5) + P(X = 6) \\
&= \sum_{x=4}^{6} {}^{6}C_{x}\, (0.2)^x (0.8)^{6-x} \\
&= \frac{53}{3125} \\
&= 0.017
\end{aligned}
\]

Example 18
The probability that a man aged 60 will live up to 70 is 0.65. What is
the probability that out of 10 such men now at 60 at least 7 will live up
to 70?
Solution
Let p be the probability that a man will live up to 70.
p = 0.65,   q = 1 - p = 1 - 0.65 = 0.35,   n = 10
Probability that x men out of 10 will live up to 70
\[ P(X = x) = {}^{n}C_{x}\, p^x q^{n-x} = {}^{10}C_{x}\, (0.65)^x (0.35)^{10-x}, \quad x = 0, 1, 2, \ldots, 10 \]
Probability that at least 7 men out of 10 will live up to 70
\[
\begin{aligned}
P(X \ge 7) &= P(X = 7) + P(X = 8) + P(X = 9) + P(X = 10) \\
&= \sum_{x=7}^{10} {}^{10}C_{x}\, (0.65)^x (0.35)^{10-x} \\
&= 0.5138
\end{aligned}
\]

Example 19
In a multiple-choice examination, there are 20 questions. Each question
has 4 alternative answers following it and the student must select one
correct answer. 4 marks are given for a correct answer and 1 mark is
deducted for a wrong answer. A student must secure at least 50% of the
maximum possible marks to pass the examination. Suppose a student
has not studied at all, so that he answers the questions by guessing only.
What is the probability that he will pass the examination?
Solution
Since there are 20 questions and each carries 4 marks, the maximum marks are
80. If the student answers 12 questions correctly and 8 questions wrongly, he gets
48 - 8 = 40 marks, which is the minimum required for passing. If he gets more than
12 correct answers, he gets more than 40 marks. Let p be the probability of getting a
correct answer.
\[ p = \frac{1}{4}, \qquad q = 1 - p = 1 - \frac{1}{4} = \frac{3}{4}, \qquad n = 20 \]
Probability of getting x correct answers out of 20 answers
\[ P(X = x) = {}^{n}C_{x}\, p^x q^{n-x} = {}^{20}C_{x} \left( \frac{1}{4} \right)^x \left( \frac{3}{4} \right)^{20-x}, \quad x = 0, 1, 2, \ldots, 20 \]
Probability of passing the examination, i.e., probability of getting at least 12 correct
answers out of 20 answers
\[
\begin{aligned}
P(X \ge 12) &= \sum_{x=12}^{20} P(X = x) \\
&= \sum_{x=12}^{20} {}^{20}C_{x} \left( \frac{1}{4} \right)^x \left( \frac{3}{4} \right)^{20-x} \\
&= 9.3539 \times 10^{-4}
\end{aligned}
\]
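A nine-term sum like this one is best left to a short program. The sketch below first finds the smallest number of correct answers that reaches 40 marks and then adds up the tail of the distribution; the helper name `binomial_tail` is an assumption made for this illustration.

```python
from math import comb

def binomial_tail(n, p, k):
    """P(X >= k) for a binomial variate with parameters n and p."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k, n + 1))

# Marks for x correct answers out of 20: 4x - (20 - x) = 5x - 20; passing needs 40
needed = min(x for x in range(21) if 5 * x - 20 >= 40)
print(needed)                            # 12
print(binomial_tail(20, 0.25, needed))   # about 9.35e-04
```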

Example 20
The probability of a man hitting a target is $\frac{1}{3}$. (i) If he fires 5 times, what
is the probability of his hitting the target at least twice? (ii) How many
times must he fire so that the probability of his hitting the target at least
once is more than 90%?
Solution
Let p be the probability of hitting the target.
\[ p = \frac{1}{3}, \qquad q = 1 - p = 1 - \frac{1}{3} = \frac{2}{3}, \qquad n = 5 \]
Probability of hitting the target x times out of 5 times
\[ P(X = x) = {}^{n}C_{x}\, p^x q^{n-x} = {}^{5}C_{x} \left( \frac{1}{3} \right)^x \left( \frac{2}{3} \right)^{5-x}, \quad x = 0, 1, 2, \ldots, 5 \]
(i) Probability of hitting the target at least twice out of 5 times
\[
\begin{aligned}
P(X \ge 2) &= P(X = 2) + P(X = 3) + P(X = 4) + P(X = 5) \\
&= \sum_{x=2}^{5} {}^{5}C_{x} \left( \frac{1}{3} \right)^x \left( \frac{2}{3} \right)^{5-x} \\
&= \frac{131}{243} \\
&= 0.5391
\end{aligned}
\]
(ii) Probability of hitting the target at least once out of n times
\[
\begin{aligned}
P(X \ge 1) &> 0.9 \\
1 - P(X = 0) &> 0.9 \\
1 - {}^{n}C_{0} \left( \frac{1}{3} \right)^0 \left( \frac{2}{3} \right)^n &> 0.9 \\
1 - \left( \frac{2}{3} \right)^n &> 0.9
\end{aligned}
\]
For n = 5, $1 - \left( \frac{2}{3} \right)^5 = 0.8683$, which is less than 0.9.
For n = 6, $1 - \left( \frac{2}{3} \right)^6 = 0.9122$.
Hence, the man must fire 6 times so that the probability of hitting the target at least once
is more than 90%.
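Part (ii) is a search for the smallest n that satisfies an inequality, which can be sketched as a simple loop. The variable names are chosen for this illustration, and exact fractions are used so that the comparison with 0.9 is free of rounding error.

```python
from fractions import Fraction

p = Fraction(1, 3)         # probability of hitting the target in one shot
target = Fraction(9, 10)   # required probability of at least one hit

n = 1
# Increase n until P(X >= 1) = 1 - q^n exceeds the target
while 1 - (1 - p)**n <= target:
    n += 1

print(n)                       # 6
print(float(1 - (1 - p)**n))   # about 0.9122
```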

Example 21
In sampling a large number of parts manufactured by a machine, the
mean number of defectives in a sample of 20 is 2. Out of 1000 such
samples, how many would be expected to contain exactly two defective
parts? [Summer 2015]
Solution
Let p be the probability of a part being defective.
μ = np = 2,   n = 20,   N = 1000
\[
\begin{aligned}
np &= 2 \\
20\, p &= 2 \\
\therefore\quad p &= 0.1 \\
q &= 1 - p = 1 - 0.1 = 0.9
\end{aligned}
\]
Probability that a sample contains x defective parts out of 20 parts
\[ P(X = x) = {}^{n}C_{x}\, p^x q^{n-x} = {}^{20}C_{x}\, (0.1)^x (0.9)^{20-x}, \quad x = 0, 1, 2, \ldots, 20 \]
Probability that a sample contains exactly 2 defective parts
\[ P(X = 2) = {}^{20}C_{2}\, (0.1)^2 (0.9)^{18} = 0.2852 \]
Expected number of samples containing exactly 2 defective parts = N P(X = 2)
= 1000 (0.2852)
= 285.2
≈ 285

Example 22
An irregular 6-faced die is thrown such that the probability that it gives
3 even numbers in 5 throws is twice the probability that it gives 2 even
numbers in 5 throws. How many sets of exactly 5 trials can be expected
to give no even number out of 2500 sets?
Solution
Let p be the probability of getting an even number in a throw of the die.
n = 5,   N = 2500
Probability of getting x even numbers in 5 throws of the die
\[ P(X = x) = {}^{n}C_{x}\, p^x q^{n-x} = {}^{5}C_{x}\, p^x q^{5-x}, \quad x = 0, 1, 2, \ldots, 5 \]
\[
\begin{aligned}
P(X = 3) &= 2\, P(X = 2) \\
{}^{5}C_{3}\, p^3 q^2 &= 2 \left( {}^{5}C_{2}\, p^2 q^3 \right) \\
10\, p^3 q^2 &= 20\, p^2 q^3 \\
p &= 2q \\
p &= 2(1 - p) = 2 - 2p \\
\therefore\quad p &= \frac{2}{3} \\
q &= 1 - p = 1 - \frac{2}{3} = \frac{1}{3}
\end{aligned}
\]
Probability of getting no even number in 5 throws of the die
\[ P(X = 0) = {}^{5}C_{0} \left( \frac{2}{3} \right)^0 \left( \frac{1}{3} \right)^5 = \frac{1}{243} \]
Expected number of sets = N P(X = 0)
\[ = \frac{2500}{243} \approx 10 \]

Example 23
Out of 800 families with 5 children each, how many would you expect
to have (i) 3 boys? (ii) 5 girls? (iii) either 2 or 3 boys? (iv) at least one
boy? Assume equal probabilities for boys and girls.
Solution
Let p be the probability of a child being a boy.
\[ p = \frac{1}{2}, \qquad q = 1 - p = 1 - \frac{1}{2} = \frac{1}{2}, \qquad n = 5, \qquad N = 800 \]
Probability of having x boys out of 5 children in each family
\[ P(X = x) = {}^{n}C_{x}\, p^x q^{n-x} = {}^{5}C_{x} \left( \frac{1}{2} \right)^x \left( \frac{1}{2} \right)^{5-x}, \quad x = 0, 1, 2, \ldots, 5 \]
(i) Probability of having 3 boys out of 5 children in each family
\[ P(X = 3) = {}^{5}C_{3} \left( \frac{1}{2} \right)^3 \left( \frac{1}{2} \right)^2 = \frac{5}{16} \]
Expected number of families having 3 boys out of 5 children = N P(X = 3)
\[ = 800 \left( \frac{5}{16} \right) = 250 \]
(ii) Probability of having 5 girls, i.e., no boys out of 5 children in each family
\[ P(X = 0) = {}^{5}C_{0} \left( \frac{1}{2} \right)^0 \left( \frac{1}{2} \right)^5 = \frac{1}{32} \]
Expected number of families having 5 girls out of 5 children = N P(X = 0)
\[ = 800 \left( \frac{1}{32} \right) = 25 \]
(iii) Probability of having either 2 or 3 boys out of 5 children in each family
\[
\begin{aligned}
P(X = 2) + P(X = 3) &= \sum_{x=2}^{3} {}^{5}C_{x} \left( \frac{1}{2} \right)^x \left( \frac{1}{2} \right)^{5-x} \\
&= \frac{5}{8}
\end{aligned}
\]
Expected number of families having either 2 or 3 boys out of 5 children
\[ = N\, [P(X = 2) + P(X = 3)] = 800 \left( \frac{5}{8} \right) = 500 \]
(iv) Probability of having at least one boy out of 5 children in each family
\[
\begin{aligned}
P(X \ge 1) &= P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4) + P(X = 5) \\
&= \sum_{x=1}^{5} {}^{5}C_{x} \left( \frac{1}{2} \right)^x \left( \frac{1}{2} \right)^{5-x} \\
&= \frac{31}{32}
\end{aligned}
\]
Expected number of families having at least one boy out of 5 children
\[ = N\, P(X \ge 1) = 800 \left( \frac{31}{32} \right) = 775 \]

Example 24
If hens of a certain breed lay eggs on 5 days a week on an average, find
on how many days during a season of 100 days a poultry keeper with
5 hens of this breed will expect to receive at least 4 eggs.
Solution
Let p be the probability of a hen laying an egg on any given day.
\[ p = \frac{5}{7}, \qquad q = 1 - p = 1 - \frac{5}{7} = \frac{2}{7}, \qquad n = 5, \qquad N = 100 \]
Probability of x hens laying eggs on any given day
\[ P(X = x) = {}^{n}C_{x}\, p^x q^{n-x} = {}^{5}C_{x} \left( \frac{5}{7} \right)^x \left( \frac{2}{7} \right)^{5-x}, \quad x = 0, 1, 2, \ldots, 5 \]
Probability of receiving at least 4 eggs on any given day
\[
\begin{aligned}
P(X \ge 4) &= P(X = 4) + P(X = 5) \\
&= \sum_{x=4}^{5} {}^{5}C_{x} \left( \frac{5}{7} \right)^x \left( \frac{2}{7} \right)^{5-x} \\
&= 0.5578
\end{aligned}
\]
Expected number of days during a season of 100 days on which a poultry keeper with
5 hens of this breed will receive at least 4 eggs = N P(X ≥ 4)
= 100 (0.5578)
= 55.78
≈ 56

Example 25
Seven unbiased coins are tossed 128 times and the number of heads
obtained is noted as given below:
No. of heads 0 1 2 3 4 5 6 7
Frequency 7 6 19 35 30 23 7 1
Fit a binomial distribution to the data.
Solution
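Since the coins are unbiased,
\[ p = \frac{1}{2}, \qquad q = \frac{1}{2}, \qquad n = 7, \qquad N = 128 \]
For binomial distribution,
\[ P(X = x) = {}^{n}C_{x}\, p^x q^{n-x} = {}^{7}C_{x} \left( \frac{1}{2} \right)^x \left( \frac{1}{2} \right)^{7-x}, \quad x = 0, 1, 2, \ldots, 7 \]
Theoretical or expected frequency f(x) = N P(X = x)
\[ f(x) = 128\; {}^{7}C_{x} \left( \frac{1}{2} \right)^x \left( \frac{1}{2} \right)^{7-x} = 128\; {}^{7}C_{x} \left( \frac{1}{2} \right)^7 \]
\[
\begin{aligned}
f(0) &= 128\; {}^{7}C_{0} \left( \tfrac{1}{2} \right)^7 = 1 \\
f(1) &= 128\; {}^{7}C_{1} \left( \tfrac{1}{2} \right)^7 = 7 \\
f(2) &= 128\; {}^{7}C_{2} \left( \tfrac{1}{2} \right)^7 = 21 \\
f(3) &= 128\; {}^{7}C_{3} \left( \tfrac{1}{2} \right)^7 = 35 \\
f(4) &= 128\; {}^{7}C_{4} \left( \tfrac{1}{2} \right)^7 = 35 \\
f(5) &= 128\; {}^{7}C_{5} \left( \tfrac{1}{2} \right)^7 = 21 \\
f(6) &= 128\; {}^{7}C_{6} \left( \tfrac{1}{2} \right)^7 = 7 \\
f(7) &= 128\; {}^{7}C_{7} \left( \tfrac{1}{2} \right)^7 = 1
\end{aligned}
\]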

Binomial distribution
No. of heads x 0 1 2 3 4 5 6 7

Expected binomial frequency f (x) 1 7 21 35 35 21 7 1
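When p is known in advance, as it is for unbiased coins, fitting the distribution reduces to evaluating f(x) = N P(X = x) for each x. A minimal sketch, with the function name `expected_frequencies` chosen for illustration:

```python
from math import comb

def expected_frequencies(N, n, p):
    """Expected frequencies f(x) = N * P(X = x) for x = 0, 1, ..., n."""
    q = 1 - p
    return [N * comb(n, x) * p**x * q**(n - x) for x in range(n + 1)]

observed = [7, 6, 19, 35, 30, 23, 7, 1]   # frequencies given in the example
expected = expected_frequencies(N=sum(observed), n=7, p=0.5)
print(expected)   # [1.0, 7.0, 21.0, 35.0, 35.0, 21.0, 7.0, 1.0]
```

The expected frequencies are symmetric about the middle because p = q, whereas the observed frequencies need not be.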

Example 26
Fit a binomial distribution to the following data:
x 0 1 2 3 4 5
f 2 14 20 34 22 8

Solution

\[
\begin{aligned}
\text{Mean} &= \frac{\sum f x}{\sum f} \\
&= \frac{2(0) + 14(1) + 20(2) + 34(3) + 22(4) + 8(5)}{2 + 14 + 20 + 34 + 22 + 8} \\
&= \frac{284}{100} \\
&= 2.84
\end{aligned}
\]
For binomial distribution,
\[
\begin{aligned}
n &= 5 \\
\mu = np &= 2.84 \\
5p &= 2.84 \\
\therefore\quad p &= 0.568 \\
q &= 1 - p = 1 - 0.568 = 0.432
\end{aligned}
\]
\[ P(X = x) = {}^{n}C_{x}\, p^x q^{n-x} = {}^{5}C_{x}\, (0.568)^x (0.432)^{5-x}, \quad x = 0, 1, 2, \ldots, 5 \]
\[ N = \sum f = 100 \]
Theoretical or expected frequency f(x) = N P(X = x)
\[ f(x) = 100\; {}^{5}C_{x}\, (0.568)^x (0.432)^{5-x} \]
\[
\begin{aligned}
f(0) &= 100\; {}^{5}C_{0}\, (0.568)^0 (0.432)^5 = 1.505 \approx 1.5 \\
f(1) &= 100\; {}^{5}C_{1}\, (0.568)^1 (0.432)^4 = 9.89 \approx 10 \\
f(2) &= 100\; {}^{5}C_{2}\, (0.568)^2 (0.432)^3 = 26.01 \approx 26 \\
f(3) &= 100\; {}^{5}C_{3}\, (0.568)^3 (0.432)^2 = 34.2 \approx 34 \\
f(4) &= 100\; {}^{5}C_{4}\, (0.568)^4 (0.432)^1 = 22.48 \approx 22 \\
f(5) &= 100\; {}^{5}C_{5}\, (0.568)^5 (0.432)^0 = 5.91 \approx 6
\end{aligned}
\]
Binomial Distribution
x 0 1 2 3 4 5

Expected binomial frequency 1.5 10 26 34 22 6
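When p is not given, it is estimated from the data by equating the sample mean to np, as done above. The following sketch reproduces that procedure; the variable names are chosen here for illustration, and n is taken as the largest value of x in the table, as in the example.

```python
from math import comb

xs = [0, 1, 2, 3, 4, 5]
fs = [2, 14, 20, 34, 22, 8]

N = sum(fs)                                     # total frequency
n = max(xs)                                     # number of trials
mean = sum(x * f for x, f in zip(xs, fs)) / N   # sample mean
p = mean / n                                    # from mean = np
q = 1 - p

fitted = [N * comb(n, x) * p**x * q**(n - x) for x in xs]
print(mean, p)                         # 2.84 and 0.568, up to floating-point rounding
print([round(v, 1) for v in fitted])   # [1.5, 9.9, 26.0, 34.2, 22.5, 5.9]
```

Rounded to whole numbers, these values match the expected frequencies tabulated above.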

Exercise 5.1
1. Find the fallacy if any in the following statements:
  (a) The mean of a binomial distribution is 6 and SD is 4.
  (b) The mean of a binomial distribution is 9 and its SD is 4.
 [Ans.: (a) False, q = 8/3 is impossible
 (b) False, q = 16/9 is impossible]

2. The mean and variance of a binomial distribution are 3 and 1.2
respectively. Find n, p, and P(X < 4).
 [Ans.: 5, 0.6, 2072/3125]

3. Find the binomial distribution if the mean is 5 and the variance is $\frac{10}{3}$.
Find P(X = 2).
 [Ans.: $P(X = x) = {}^{15}C_{x} \left( \frac{1}{3} \right)^x \left( \frac{2}{3} \right)^{15-x}$, 0.0599]

4. In a binomial distribution, the mean and variance are 4 and 3 respectively.
Find P(X ≥ 1).
 [Ans.: 0.9899]

5. The odds in favour of X winning a game against Y are 4:3. Find the
probability of Y winning 3 games out of 7 played.
 [Ans.: 0.0929]
6. On an average, 3 out of 10 students fail in an examination. What is the
probability that out of 10 students that appear for the examination
none will fail?
 [Ans.: 0.0282]
7. If on the average rain falls on 10 days in every thirty, find the probability
(i) that the first three days of a week will be fine and the remaining wet,
and (ii) that rain will fall on just three days of a week.
 [Ans.: (i) 8/2187 (ii) 280/2187]
8. Two unbiased dice are thrown three times. Find the probability that the
sum nine would be obtained (i) once, and (ii) twice.
 [Ans.: (i) 0.26 (ii) 0.03]


9. For special security in a certain protected area, it was decided to put
three lightbulbs on each pole. If each bulb has probability p of burning
out in the first 100 hours of service, calculate the probability that at
least one of them is still good after 100 hours. If p = 0.3, how many
bulbs would be needed on each pole to ensure with 99% safety that at
least one is good after 100 hours?
 [Ans.: (i) $1 - p^3$ (ii) 4]

10. It is known from past records that 80% of the students in a school do
their homework. Find the probability that during a random check of
10 students, (i) all have done their homework, (ii) at the most two
have not done their homework, and (iii) at least one has not done the
homework.
 [Ans.: (i) 0.1074 (ii) 0.6778 (iii) 0.8926]

11. An insurance salesman sells policies to 5 men, all of identical age and
good health. According to the actuarial tables, the probability that a
man of this particular age will be alive 30 years hence is $\frac{2}{3}$. Find the
probability that 30 years hence (i) at least 1 man will be alive, (ii) at
least 3 men will be alive, and (iii) all 5 men will be alive.
 [Ans.: (i) 242/243 (ii) 64/81 (iii) 32/243]
12. A company has appointed 10 new secretaries out of which 7 are trained.
If a particular executive is to get three secretaries selected at random,
what is the chance that at least one of them will be untrained?
 [Ans.: 0.7083]

13. The overall pass rate in a university examination is 70%. Four candidates
take up such an examination. What is the probability that (i) at least
one of them will pass? (ii) all of them will pass the examination?
 [Ans.: (i) 0.9919 (ii) 0.7599]

14. The normal rate of infection of a certain disease in animals is known
to be 25%. In an experiment with a new vaccine, it was observed that
none of the animals caught the infection. Calculate the probability of
the observed result.
 [Ans.: 729/4096]
15. Suppose that weather records show that on the average, 5 out of 31
days in October are rainy days. Assuming a binomial distribution with
each day of October as an independent trial, find the probability that
the next October will have at most three rainy days.
 [Ans.: 0.2403]

16. Assuming that half the population of a village is female and assuming
that 100 samples each of 10 individuals are taken, how many samples
would you expect to have 3 or fewer females?
 [Ans.: 17]

17. Assuming that half the population of a town is vegetarian so that the
chance of an individual being vegetarian is $\frac{1}{2}$, and assuming that 100
investigators can take a sample of 10 individuals to see whether they
are vegetarians, how many investigators would you expect to report
that three people or fewer in the sample were vegetarians?
 [Ans.: 17]

18. The probability of failure in a physics practical examination is 20%.
If 25 batches of 6 students each take the examination, in how many
batches would 4 or more students pass?
 [Ans.: 23]

19. A lot contains 1% defective items. What should be the number of items
in a lot so that the probability of finding at least one defective item in
it is at least 0.95?
 [Ans.: 299]
20. The probability that a bomb will hit the target is 0.2. Two bombs
are required to destroy the target. If six bombs are used, find the
probability that the target will be destroyed.
 [Ans.: 0.3447]

21. Out of 1000 families with 4 children each, how many would you expect
to have (i) 2 boys and 2 girls? (ii) at least one boy? (iii) no girl? (iv) at
most 2 girls?
 [Ans.: (i) 375 (ii) 938 (iii) 63 (iv) 69]

22. In a sampling of a large number of parts produced by a machine, the
mean number of defectives in a sample of 20 is 2. Out of 1000 such
samples, how many samples would you expect to contain at least 3
defectives?
 [Ans.: 323]

23. Five fair coins are tossed 3200 times. Find the frequency distribution
of the number of heads obtained. Also, find the mean and SD.
 [Ans.: (i) 100, 500, 1000, 1000, 500, 100 (ii) 1600 (iii) 28.28]

24. Fit a binomial distribution to the following data:

x 0 1 2 3 4

f 12 66 109 59 10

 [Ans.: 17, 67, 96, 61, 15]

5.3 Poisson Distribution

A random variable X is said to follow the Poisson distribution if the probability of x is
given by
\[ P(X = x) = p(x) = \frac{e^{-\lambda} \lambda^x}{x!}, \quad x = 0, 1, 2, \ldots \]
where λ is called the parameter of the distribution.

5.3.1 Poisson Approximation to the Binomial Distribution


The Poisson distribution is a limiting case of the binomial distribution under the following
conditions:
(i) The number of trials should be infinitely large, i.e., n → ∞.
(ii) The probability of success p for each trial should be very small, i.e., p → 0.
(iii) np = λ should be finite, where λ is a constant.
The binomial distribution is
$$P(X = x) = {}^{n}C_{x}\, p^{x} q^{n-x} = {}^{n}C_{x} \left(\frac{p}{q}\right)^{x} q^{n} = {}^{n}C_{x} \left(\frac{p}{1-p}\right)^{x} (1-p)^{n}$$
Putting $p = \dfrac{\lambda}{n}$,
$$P(X = x) = \frac{n(n-1)(n-2)\cdots(n-x+1)}{x!} \left(\frac{\lambda/n}{1-\lambda/n}\right)^{x} \left(1-\frac{\lambda}{n}\right)^{n}$$
$$= \frac{n(n-1)(n-2)\cdots(n-x+1)}{x!}\, \frac{\lambda^{x}}{n^{x}}\, \frac{1}{\left(1-\frac{\lambda}{n}\right)^{x}} \left(1-\frac{\lambda}{n}\right)^{n}$$
$$= \frac{n(n-1)(n-2)\cdots(n-x+1)}{x!}\, \frac{\lambda^{x}}{n^{x}} \left(1-\frac{\lambda}{n}\right)^{n-x}$$
$$= \frac{1\left(1-\frac{1}{n}\right)\left(1-\frac{2}{n}\right)\cdots\left[1-\left(\frac{x-1}{n}\right)\right]}{x!}\, \lambda^{x} \left(1-\frac{\lambda}{n}\right)^{n-x}$$
Since
$$\lim_{n\to\infty}\left(1-\frac{\lambda}{n}\right)^{n-x} = e^{-\lambda}$$
and
$$\lim_{n\to\infty}\left(1-\frac{1}{n}\right) = \lim_{n\to\infty}\left(1-\frac{2}{n}\right) = \cdots = 1$$
taking the limits of both the sides as n → ∞,
$$P(X = x) = \frac{\lambda^{x} e^{-\lambda}}{x!}, \quad x = 0, 1, 2, \ldots$$
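The limiting behaviour can be checked numerically. The following is a minimal Python sketch, not part of the original text; the function names `binomial_pmf` and `poisson_pmf` and the values λ = 2, x = 3 are illustrative assumptions. It keeps np = λ fixed while n grows and prints the binomial probability next to the Poisson probability.

```python
from math import comb, exp, factorial

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial distribution with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson distribution with parameter lam."""
    return exp(-lam) * lam**x / factorial(x)

lam, x = 2.0, 3  # illustrative values, not taken from the text
for n in (10, 100, 1000, 10000):
    p = lam / n  # keep np = lam fixed while n grows and p shrinks
    print(n, round(binomial_pmf(x, n, p), 6), round(poisson_pmf(x, lam), 6))
```

As n increases, the first printed probability should approach the second, which is the statement of the limit derived above.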

5.3.2 Examples of Poisson Distribution


(i) Number of defective bulbs produced by a reputed company
(ii) Number of telephone calls per minute at a switchboard
(iii) Number of cars passing a certain point in one minute
(iv) Number of printing mistakes per page in a large text
(v) Number of persons born blind per year in a large city

5.3.3 Conditions of Poisson Distribution


The Poisson distribution holds under the following conditions:
(i) The random variable X should be discrete.
(ii) The number of trials n is very large.
(iii) The probability of success p is very small (very close to zero).
(iv) λ = np is finite.
(v) The occurrences are rare.

5.3.4 Constants of the Poisson Distribution


1. Mean of the Poisson Distribution

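$$E(X) = \sum_{x=0}^{\infty} x\, p(x)$$
$$= \sum_{x=0}^{\infty} x\, \frac{e^{-\lambda}\lambda^{x}}{x!}$$
$$= \sum_{x=0}^{\infty} \frac{x\, e^{-\lambda}\, \lambda\, \lambda^{x-1}}{x!}$$
$$= e^{-\lambda} \cdot \lambda \sum_{x=1}^{\infty} \frac{x\, \lambda^{x-1}}{x!}$$
$$= \lambda e^{-\lambda} \sum_{x=1}^{\infty} \frac{\lambda^{x-1}}{(x-1)!} \qquad \left[\because \frac{x}{x!} = \frac{1}{(x-1)!}\right]$$
$$= \lambda e^{-\lambda} \left(1 + \lambda + \frac{\lambda^{2}}{2!} + \cdots\right)$$
$$= \lambda e^{-\lambda} e^{\lambda}$$
$$= \lambda$$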

2. Variance of the Poisson Distribution

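$$\mathrm{Var}(X) = E(X^{2}) - \mu^{2}$$
$$= \sum_{x=0}^{\infty} x^{2} p(x) - \mu^{2}$$
$$= \sum_{x=0}^{\infty} x^{2}\, \frac{e^{-\lambda}\lambda^{x}}{x!} - \lambda^{2}$$
$$= \sum_{x=0}^{\infty} \left[x(x-1) + x\right] \frac{e^{-\lambda}\lambda^{x}}{x!} - \lambda^{2}$$
$$= \sum_{x=0}^{\infty} \frac{x(x-1)\, e^{-\lambda}\lambda^{x}}{x!} + \sum_{x=0}^{\infty} \frac{x\, e^{-\lambda}\lambda^{x}}{x!} - \lambda^{2}$$
$$= \sum_{x=2}^{\infty} \frac{x(x-1)\, e^{-\lambda}\lambda^{x-2}\lambda^{2}}{x(x-1)(x-2)\cdots 1} + \lambda - \lambda^{2}$$
$$= e^{-\lambda}\lambda^{2} \sum_{x=2}^{\infty} \frac{\lambda^{x-2}}{(x-2)!} + \lambda - \lambda^{2}$$
$$= e^{-\lambda}\lambda^{2} \left(1 + \lambda + \frac{\lambda^{2}}{2!} + \cdots\right) + \lambda - \lambda^{2}$$
$$= e^{-\lambda}\lambda^{2} e^{\lambda} + \lambda - \lambda^{2}$$
$$= \lambda^{2} + \lambda - \lambda^{2}$$
$$= \lambda$$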

3. Standard Deviation of the Poisson Distribution

$$\mathrm{SD} = \sqrt{\text{Variance}} = \sqrt{\lambda}$$
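The constants derived above can be verified numerically by truncating the infinite sums. This is a minimal sketch under stated assumptions: the helper names and the truncation point of 100 terms are hypothetical choices, adequate only for moderate values of λ.

```python
from math import exp, factorial, sqrt

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson distribution with parameter lam."""
    return exp(-lam) * lam**x / factorial(x)

def poisson_moments(lam, terms=100):
    """Approximate mean, variance and SD by truncating the infinite sums at `terms`."""
    mean = sum(x * poisson_pmf(x, lam) for x in range(terms))
    second_moment = sum(x * x * poisson_pmf(x, lam) for x in range(terms))
    variance = second_moment - mean**2  # Var(X) = E(X^2) - mu^2
    return mean, variance, sqrt(variance)

mean, variance, sd = poisson_moments(3.0)
print(mean, variance, sd)
```

For λ = 3, the printed mean and variance should both be very close to 3, and the SD close to √3.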

4. Mode of the Poisson Distribution


Mode is the value of x for which the probability p(x) is maximum, i.e.,
$$p(x) \ge p(x+1) \quad \text{and} \quad p(x) \ge p(x-1)$$
When p(x) ≥ p(x + 1),
$$\frac{e^{-\lambda}\lambda^{x}}{x!} \ge \frac{e^{-\lambda}\lambda^{x+1}}{(x+1)!}$$
$$1 \ge \frac{\lambda}{x+1}$$
$$(x+1) \ge \lambda$$
$$x \ge \lambda - 1 \qquad \ldots(5.1)$$
Similarly, for p(x) ≥ p(x – 1),
$$x \le \lambda \qquad \ldots(5.2)$$
Combining Eqs (5.1) and (5.2),
$$\lambda - 1 \le x \le \lambda$$
Hence, the mode of the Poisson distribution lies between λ – 1 and λ.
Case I If λ is an integer then λ – 1 is also an integer. The distribution is bimodal and
the two modes are λ – 1 and λ.

Case II If λ is not an integer, the distribution is unimodal and the mode of the Poisson
distribution is the integral part of λ, i.e., the only integer between λ – 1 and λ.
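The two cases can be expressed as a short function. This is an illustrative sketch; the name `poisson_modes` is an assumption and not part of the text, and λ is assumed to be positive.

```python
from math import floor

def poisson_modes(lam):
    """Return the mode(s) of a Poisson distribution with parameter lam > 0."""
    if float(lam).is_integer():
        # Case I: lam is an integer, so the distribution is bimodal
        return [int(lam) - 1, int(lam)]
    # Case II: lam is not an integer, so the mode is the integral part of lam
    return [floor(lam)]

print(poisson_modes(3))    # Case I
print(poisson_modes(1.8))  # Case II
```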

5.3.5 Recurrence Relation for the Poisson Distribution


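For the Poisson distribution,
$$p(x) = \frac{e^{-\lambda}\lambda^{x}}{x!}$$
$$p(x+1) = \frac{e^{-\lambda}\lambda^{x+1}}{(x+1)!}$$
$$\frac{p(x+1)}{p(x)} = \frac{e^{-\lambda}\lambda^{x+1}}{(x+1)!} \cdot \frac{x!}{e^{-\lambda}\lambda^{x}} = \frac{\lambda}{x+1}$$
$$p(x+1) = \frac{\lambda}{x+1}\, p(x)$$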
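The recurrence relation gives a convenient way to tabulate successive probabilities starting from p(0) = e^{-λ}. The following minimal sketch assumes a hypothetical helper name `poisson_table`; it is one possible way to apply the relation, not a prescribed method.

```python
from math import exp

def poisson_table(lam, x_max):
    """Return [p(0), ..., p(x_max)] using p(x + 1) = lam / (x + 1) * p(x)."""
    probs = [exp(-lam)]  # p(0) = e^{-lam}
    for x in range(x_max):
        probs.append(probs[-1] * lam / (x + 1))
    return probs

for x, p in enumerate(poisson_table(1.0, 5)):
    print(x, round(p, 4))
```

Each probability is obtained from the previous one with a single multiplication, so no factorials or powers need to be recomputed.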

Example 1
Find the fallacy, if any, in the statement: “The mean of a Poisson
distribution is 2 and the variance is 3.”
Solution
In a Poisson distribution, the mean and the variance are equal. Hence, the above statement
is false.

Example 2
If the mean of the Poisson distribution is 4, find
$P(\lambda - 2\sigma < X < \lambda + 2\sigma)$.
Solution
For a Poisson distribution,
Mean = Variance = λ = 4, $\sigma = \sqrt{\lambda} = 2$
$$P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!} = \frac{e^{-4}4^{x}}{x!}, \quad x = 0, 1, 2, \ldots$$
$$P(\lambda - 2\sigma < X < \lambda + 2\sigma) = P(0 < X < 8) = \sum_{x=1}^{7} P(X = x) = \sum_{x=1}^{7} \frac{e^{-4}4^{x}}{x!} = 0.9306$$

Example 3
If the mean of a Poisson variable is 1.8, find (i) P(X > 1), (ii) P(X = 5),
and (iii) P(0 < X < 5).
Solution
For a Poisson distribution,
λ = 1.8
$$P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!} = \frac{e^{-1.8}\,1.8^{x}}{x!}, \quad x = 0, 1, 2, \ldots$$
(i) $P(X > 1) = 1 - P(X \le 1) = 1 - [P(X = 0) + P(X = 1)]$
$$= 1 - \sum_{x=0}^{1} \frac{e^{-1.8}\,1.8^{x}}{x!} = 0.5372$$
(ii) $P(X = 5) = \dfrac{e^{-1.8}\,1.8^{5}}{5!} = 0.026$
(iii) $P(0 < X < 5) = P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4)$
$$= \sum_{x=1}^{4} \frac{e^{-1.8}\,1.8^{x}}{x!} = 0.7983$$
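Sums of Poisson probabilities of this kind are easy to check by machine. The sketch below recomputes the three parts of Example 3; the helper names `poisson_pmf` and `poisson_between` are assumptions introduced only for illustration.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson distribution with parameter lam."""
    return exp(-lam) * lam**x / factorial(x)

def poisson_between(lo, hi, lam):
    """P(lo <= X <= hi) for integer bounds lo <= hi."""
    return sum(poisson_pmf(x, lam) for x in range(lo, hi + 1))

lam = 1.8
p_gt_1 = 1 - poisson_between(0, 1, lam)  # (i)   P(X > 1)
p_eq_5 = poisson_pmf(5, lam)             # (ii)  P(X = 5)
p_1_to_4 = poisson_between(1, 4, lam)    # (iii) P(0 < X < 5)
print(round(p_gt_1, 4), round(p_eq_5, 4), round(p_1_to_4, 4))
```

The same two helpers can be reused for the remaining examples of this section by changing λ and the bounds.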

Example 4
If a random variable has a Poisson distribution such that P(X = 1) =
P(X = 2), find (i) the mean of the distribution, (ii) P(X = 4), (iii) P(X ≥ 1),
and (iv) P(1 < X < 4).
Solution
For a Poisson distribution,
$$P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!}, \quad x = 0, 1, 2, \ldots$$
(i) $P(X = 1) = P(X = 2)$
$$\frac{e^{-\lambda}\lambda^{1}}{1!} = \frac{e^{-\lambda}\lambda^{2}}{2!}$$
$$\lambda^{2} = 2\lambda$$
$$\lambda^{2} - 2\lambda = 0$$
$$\lambda(\lambda - 2) = 0$$
$$\lambda = 0 \ \text{or} \ \lambda = 2$$
Since λ ≠ 0, λ = 2
Hence, $P(X = x) = \dfrac{e^{-\lambda}\lambda^{x}}{x!} = \dfrac{e^{-2}2^{x}}{x!}, \quad x = 0, 1, 2, \ldots$
(ii) $P(X = 4) = \dfrac{e^{-2}2^{4}}{4!} = 0.0902$
(iii) $P(X \ge 1) = 1 - P(X < 1) = 1 - P(X = 0) = 1 - \dfrac{e^{-2}2^{0}}{0!} = 0.8647$
(iv) $P(1 < X < 4) = P(X = 2) + P(X = 3) = \displaystyle\sum_{x=2}^{3} \frac{e^{-2}2^{x}}{x!} = 0.4511$

Example 5
If X is a Poisson variate such that P(X = 0) = P(X = 1), find P(X = 0)
and using recurrence relation formula, find the probabilities at x = 1, 2,
3, 4, and 5.
Solution
For a Poisson distribution,
$$P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!}, \quad x = 0, 1, 2, \ldots$$
$$P(X = 0) = P(X = 1)$$
$$\frac{e^{-\lambda}\lambda^{0}}{0!} = \frac{e^{-\lambda}\lambda^{1}}{1!}$$
$$\lambda = 1$$
Hence, $P(X = x) = \dfrac{e^{-1}1^{x}}{x!}, \quad x = 0, 1, 2, \ldots$
(i) $P(X = 0) = \dfrac{e^{-1}1^{0}}{0!} = 0.3679$
(ii) By recurrence relation,
$$p(x+1) = \frac{\lambda}{x+1}\, p(x) = \frac{1}{x+1}\, p(x) \qquad [\because \lambda = 1]$$
$$p(1) = p(0) = 0.3679$$
$$p(2) = \frac{1}{2}\, p(1) = 0.1839$$
$$p(3) = \frac{1}{3}\, p(2) = 0.0613$$
$$p(4) = \frac{1}{4}\, p(3) = 0.0153$$
$$p(5) = \frac{1}{5}\, p(4) = 0.0031$$

Example 6
If the variance of a Poisson variate is 3, find the probability that (i) X = 0,
(ii) 0 < X ≤ 3, and (iii) 1 ≤ X < 4.
Solution
For a Poisson distribution,
Variance = Mean = λ = 3
$$P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!} = \frac{e^{-3}3^{x}}{x!}, \quad x = 0, 1, 2, \ldots$$
(i) $P(X = 0) = \dfrac{e^{-3}3^{0}}{0!} = 0.0498$
(ii) $P(0 < X \le 3) = P(X = 1) + P(X = 2) + P(X = 3) = \displaystyle\sum_{x=1}^{3} \frac{e^{-3}3^{x}}{x!} = 0.5974$
(iii) $P(1 \le X < 4) = P(X = 1) + P(X = 2) + P(X = 3) = \displaystyle\sum_{x=1}^{3} \frac{e^{-3}3^{x}}{x!} = 0.5974$

Example 7
If a Poisson distribution is such that $\frac{3}{2} P(X = 1) = P(X = 3)$, find
(i) P(X ≥ 1), (ii) P(X ≤ 3), and (iii) P(2 ≤ X ≤ 5).
Solution
For a Poisson distribution,
$$P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!}, \quad x = 0, 1, 2, \ldots$$
$$\frac{3}{2}\, P(X = 1) = P(X = 3)$$
$$\frac{3}{2}\, \frac{e^{-\lambda}\lambda^{1}}{1!} = \frac{e^{-\lambda}\lambda^{3}}{3!}$$
$$\frac{3}{2}\lambda = \frac{\lambda^{3}}{6}$$
$$\lambda^{3} - 9\lambda = 0$$
$$\lambda(\lambda^{2} - 9) = 0$$
$$\lambda = 0, 3, -3$$
Since λ > 0, λ = 3
Hence, $P(X = x) = \dfrac{e^{-3}3^{x}}{x!}, \quad x = 0, 1, 2, \ldots$
(i) $P(X \ge 1) = 1 - P(X < 1) = 1 - P(X = 0) = 1 - \dfrac{e^{-3}3^{0}}{0!} = 0.9502$
(ii) $P(X \le 3) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) = \displaystyle\sum_{x=0}^{3} \frac{e^{-3}3^{x}}{x!} = 0.6472$
(iii) $P(2 \le X \le 5) = P(X = 2) + P(X = 3) + P(X = 4) + P(X = 5) = \displaystyle\sum_{x=2}^{5} \frac{e^{-3}3^{x}}{x!} = 0.7169$

Example 8
If X is a Poisson variate such that
$$P(X = 2) = 9\,P(X = 4) + 90\,P(X = 6)$$
find (i) the mean of X, (ii) the variance of X, (iii) P(X < 2), (iv) P(X > 4),
and (v) P(X ≥ 1).
Solution
For a Poisson distribution,
$$P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!}, \quad x = 0, 1, 2, \ldots$$
$$P(X = 2) = 9\,P(X = 4) + 90\,P(X = 6)$$
$$\frac{e^{-\lambda}\lambda^{2}}{2!} = 9\, \frac{e^{-\lambda}\lambda^{4}}{4!} + 90\, \frac{e^{-\lambda}\lambda^{6}}{6!} = e^{-\lambda}\lambda^{2} \left(\frac{9\lambda^{2}}{4!} + \frac{90\lambda^{4}}{6!}\right)$$
$$\frac{1}{2} = \frac{9\lambda^{2}}{4!} + \frac{90\lambda^{4}}{6!}$$
$$\frac{1}{2} = \frac{3\lambda^{2}}{8} + \frac{\lambda^{4}}{8}$$
$$\lambda^{4} + 3\lambda^{2} - 4 = 0$$
$$\lambda^{2} = \frac{-3 \pm \sqrt{9 + 16}}{2} = \frac{-3 \pm 5}{2} = 1, -4$$
Since λ is real and λ > 0, λ² = 1, i.e., λ = 1
(i) Mean = λ = 1
(ii) Variance = λ = 1
$$P(X = x) = \frac{e^{-1}1^{x}}{x!}, \quad x = 0, 1, 2, \ldots$$
(iii) $P(X < 2) = P(X = 0) + P(X = 1) = \displaystyle\sum_{x=0}^{1} \frac{e^{-1}1^{x}}{x!} = 0.7358$
(iv) $P(X > 4) = 1 - P(X \le 4) = 1 - [P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4)]$
$$= 1 - \sum_{x=0}^{4} \frac{e^{-1}1^{x}}{x!} = 0.00366$$
(v) $P(X \ge 1) = 1 - P(X = 0) = 1 - \dfrac{e^{-1}1^{0}}{0!} = 0.6321$

Example 9
If a Poisson distribution is such that $\frac{3}{2} P(X = 1) = P(X = 3)$, find
(i) P(X ≥ 1), (ii) P(X ≤ 3), and (iii) P(2 ≤ X ≤ 5).
Solution
$$\frac{3}{2}\, P(X = 1) = P(X = 3)$$
$$\frac{3}{2}\, \frac{e^{-\lambda}\lambda^{1}}{1!} = \frac{e^{-\lambda}\lambda^{3}}{3!}$$
$$\frac{3}{2} = \frac{\lambda^{2}}{6}$$
$$\lambda^{2} = 9$$
$$\lambda = \pm 3$$
Since λ > 0, λ = 3
$$P(X = x) = \frac{e^{-3}3^{x}}{x!}, \quad x = 0, 1, 2, \ldots$$
(i) $P(X \ge 1) = 1 - P(X < 1) = 1 - P(X = 0) = 1 - \dfrac{e^{-3}3^{0}}{0!} = 0.9502$
(ii) $P(X \le 3) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) = \displaystyle\sum_{x=0}^{3} \frac{e^{-3}3^{x}}{x!} = 0.6472$
(iii) $P(2 \le X \le 5) = P(X = 2) + P(X = 3) + P(X = 4) + P(X = 5) = \displaystyle\sum_{x=2}^{5} \frac{e^{-3}3^{x}}{x!} = 0.7169$

Example 10
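If X is a Poisson variate such that
$$3\,P(X = 4) = \frac{1}{2}\,P(X = 2) + P(X = 0)$$
find (i) the mean of X, and (ii) P(X ≤ 2).
Solution
(i) For a Poisson distribution,
$$P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!}, \quad x = 0, 1, 2, \ldots$$
$$3\,P(X = 4) = \frac{1}{2}\,P(X = 2) + P(X = 0)$$
$$3\, \frac{e^{-\lambda}\lambda^{4}}{4!} = \frac{1}{2}\, \frac{e^{-\lambda}\lambda^{2}}{2!} + \frac{e^{-\lambda}\lambda^{0}}{0!}$$
$$\frac{\lambda^{4}}{8} = \frac{\lambda^{2}}{4} + 1$$
$$\lambda^{4} - 2\lambda^{2} - 8 = 0$$
$$(\lambda^{2} - 4)(\lambda^{2} + 2) = 0$$
$$\lambda = \pm 2 \qquad (\because \lambda \text{ is real})$$
$$\lambda = 2 \qquad (\because \lambda > 0)$$
Mean = λ = 2
Hence, $P(X = x) = \dfrac{e^{-2}2^{x}}{x!}, \quad x = 0, 1, 2, \ldots$
(ii) $P(X \le 2) = P(X = 0) + P(X = 1) + P(X = 2) = \displaystyle\sum_{x=0}^{2} \frac{e^{-2}2^{x}}{x!} = 0.6767$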

Example 11
A manufacturer of cotterpins knows that 5% of his products are defective.
If he sells cotterpins in boxes of 100 and guarantees that not more than
10 pins will be defective, what is the approximate probability that a box
will fail to meet the guaranteed quality?
Solution
Let p be the probability of a pin being defective.
p = 5% = 0.05, n = 100
Since p is very small and n is large, Poisson distribution is used.
λ = np = 100 × 0.05 = 5
Let X be the random variable which denotes the number of defective pins in a box of
100.
Probability of x defective pins in a box of 100
$$P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!} = \frac{e^{-5}5^{x}}{x!}, \quad x = 0, 1, 2, \ldots$$
Probability that a box will fail to meet the guaranteed quality
$$P(X > 10) = 1 - P(X \le 10) = 1 - \sum_{x=0}^{10} \frac{e^{-5}5^{x}}{x!} = 0.0137$$
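As a rough check on how good the approximation is in a problem of this type, the Poisson tail can be compared with the exact binomial tail for n = 100 and p = 0.05. The sketch below is illustrative only; the variable names are assumptions and the comparison is not part of the original solution.

```python
from math import comb, exp, factorial

n, p = 100, 0.05
lam = n * p  # lam = np = 5

# Poisson approximation: P(X > 10) = 1 - sum of p(x) for x = 0..10
poisson_tail = 1 - sum(exp(-lam) * lam**x / factorial(x) for x in range(11))

# Exact binomial value of the same tail probability
binomial_tail = 1 - sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(11))

print(round(poisson_tail, 4), round(binomial_tail, 4))
```

The two printed values should agree to about two decimal places, which is the sense in which the probability obtained is approximate.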

Example 12
A car-hire firm has two cars, which it hires out day by day. The number
of demands for a car on each day follows a Poisson distribution
with a mean of 1.5. Calculate the proportion of days on which (i) neither
car is used, and (ii) some demand is refused.
Solution
λ = 1.5
Let X be the random variable which denotes the number of demands for a car on each
day.
Probability of days on which there are x demands for a car
$$P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!} = \frac{e^{-1.5}\,1.5^{x}}{x!}, \quad x = 0, 1, 2, \ldots$$
(i) Proportion or probability of days on which neither car is used
$$P(X = 0) = \frac{e^{-1.5}\,1.5^{0}}{0!} = 0.2231$$
(ii) Proportion or probability of days on which some demand is refused, i.e., on which there are more than 2 demands
$$P(X > 2) = 1 - P(X \le 2) = 1 - \sum_{x=0}^{2} \frac{e^{-1.5}\,1.5^{x}}{x!} = 0.1912$$

Example 13
Six coins are tossed 6400 times. Using the Poisson distribution, what is
the approximate probability of getting six heads 10 times?
Solution
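Let p be the probability of getting a head with one coin.
$$p = \frac{1}{2}$$
Probability of getting 6 heads with 6 coins $= \left(\dfrac{1}{2}\right)^{6} = \dfrac{1}{64}$
n = 6400
$$\lambda = np = 6400\left(\frac{1}{64}\right) = 100$$
Let X be the random variable which denotes the number of times six heads are obtained.
Probability of getting six heads x times
$$P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!} = \frac{e^{-100}\,100^{x}}{x!}, \quad x = 0, 1, 2, \ldots$$
Probability of getting six heads 10 times
$$P(X = 10) = \frac{e^{-100}\,100^{10}}{10!} = 1.025 \times 10^{-30}$$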

Example 14
If 2% of lightbulbs are defective, find the probability that (i) at least one
is defective, and (ii) exactly 7 are defective. Also, find P(1 < X < 8) in a
sample of 100.
Solution
Let p be the probability of a defective bulb.
p = 2% = 0.02
n = 100
Since p is very small and n is large, Poisson distribution is used.
λ = np = 100(0.02) = 2
Let X be the random variable which denotes the number of defective bulbs in a sample
of 100.
Probability of x defective bulbs in a sample of 100
$$P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!} = \frac{e^{-2}2^{x}}{x!}, \quad x = 0, 1, 2, \ldots$$
(i) Probability that at least one bulb is defective
$$P(X \ge 1) = 1 - P(X = 0) = 1 - \frac{e^{-2}2^{0}}{0!} = 0.8647$$
(ii) Probability that exactly 7 bulbs are defective
$$P(X = 7) = \frac{e^{-2}2^{7}}{7!} = 0.0034$$
(iii) $P(1 < X < 8) = P(X = 2) + P(X = 3) + P(X = 4) + P(X = 5) + P(X = 6) + P(X = 7)$
$$= \sum_{x=2}^{7} \frac{e^{-2}2^{x}}{x!} = 0.5929$$

Example 15
An insurance company insured 4000 people against loss of both eyes
in a car accident. Based on previous data, the rates were computed on
the assumption that on the average, 10 persons in 100000 will have
car accidents each year that result in this type of injury. What is the
probability that more than 3 of the insured will collect on their policy in
a given year?
Solution
Let p be the probability of loss of both eyes in a car accident.
$$p = \frac{10}{100000} = 0.0001$$
n = 4000
Since p is very small and n is large, Poisson distribution is used.
λ = np = 4000 (0.0001) = 0.4
Let X be the random variable which denotes the number of such car accidents in a group of
4000 people.
Probability of x such car accidents in a group of 4000 people
$$P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!} = \frac{e^{-0.4}\,0.4^{x}}{x!}, \quad x = 0, 1, 2, \ldots$$
Probability that more than 3 of the insured will collect on their policy, i.e., probability
of more than 3 such car accidents in a group of 4000 people
$$P(X > 3) = 1 - P(X \le 3) = 1 - [P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)]$$
$$= 1 - \sum_{x=0}^{3} \frac{e^{-0.4}\,0.4^{x}}{x!} = 0.00078$$

Example 16
Two cards are drawn from a pack of 52 cards. Using the Poisson distribution,
find the probability of getting two diamonds
at least 3 times in 51 consecutive trials of drawing two cards each
time.
Solution
Let p be the probability of getting two diamonds from a pack of 52 cards.
$$p = \frac{{}^{13}C_{2}}{{}^{52}C_{2}} = \frac{3}{51}, \quad n = 51$$
Since p is small and n is large, Poisson distribution is used.
$$\lambda = np = 51\left(\frac{3}{51}\right) = 3$$
Let X be the random variable which denotes the number of trials in which two diamond cards are drawn.
Probability of x trials of drawing two diamond cards in 51 trials
$$P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!} = \frac{e^{-3}3^{x}}{x!}, \quad x = 0, 1, 2, \ldots$$
Probability of getting two diamond cards at least 3 times in 51 trials
$$P(X \ge 3) = 1 - P(X < 3) = 1 - [P(X = 0) + P(X = 1) + P(X = 2)]$$
$$= 1 - \sum_{x=0}^{2} \frac{e^{-3}3^{x}}{x!} = 0.5768$$

Example 17
Suppose a book of 585 pages contains 43 typographical errors. If
these errors are randomly distributed throughout the book, what is the
probability that 10 pages, selected at random, will be free from errors?
Solution
Let p be the average number of errors per page.
$$p = \frac{43}{585} = 0.0735, \quad n = 10$$
Since the errors are rare and randomly distributed, Poisson distribution is used.
λ = np = 10(0.0735) = 0.735
Let X be the random variable which denotes the number of errors in 10 pages.
Probability of x errors in 10 pages of a book of 585 pages
$$P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!} = \frac{e^{-0.735}\,0.735^{x}}{x!}, \quad x = 0, 1, 2, \ldots$$
Probability that a random sample of 10 pages will contain no error
$$P(X = 0) = \frac{e^{-0.735}\,0.735^{0}}{0!} = 0.4795$$

Example 18
A hospital switchboard receives an average of 4 emergency calls in a
10-minute interval. What is the probability that (i) there are at most
2 emergency calls? (ii) there are exactly 3 emergency calls in an interval
of 10 minutes?
Solution
Let p be the average number of emergency calls received per minute.
$$p = \frac{4}{10} = 0.4, \quad n = 10$$
λ = np = 10 (0.4) = 4
Let X be the random variable which denotes the number of emergency calls in a
10-minute interval.
Probability of x emergency calls in a 10-minute interval
$$P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!} = \frac{e^{-4}4^{x}}{x!}, \quad x = 0, 1, 2, \ldots$$
(i) Probability that there are at most 2 emergency calls
$$P(X \le 2) = P(X = 0) + P(X = 1) + P(X = 2) = \sum_{x=0}^{2} \frac{e^{-4}4^{x}}{x!} = 0.238$$
(ii) Probability that there are exactly 3 emergency calls
$$P(X = 3) = \frac{e^{-4}4^{3}}{3!} = 0.1954$$

Example 19
A manufacturer, who produces medicine bottles, finds that 0.1% of
the bottles are defective. The bottles are packed in boxes containing
500 bottles. A drug manufacturer buys 100 boxes from the producer of
bottles. Using Poisson distribution, find how many boxes will contain
(i) no defective bottles and (ii) at least 2 defective bottles.
Solution
Let p be the probability of a defective bottle.
p = 0.1% = 0.001
n = 500
λ = np = 500 (0.001) = 0.5
Let X be the random variable which denotes the number of defective bottles in a box
of 500.
Probability of x defective bottles in a box of 500
$$P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!} = \frac{e^{-0.5}\,0.5^{x}}{x!}, \quad x = 0, 1, 2, \ldots$$
(i) Probability of no defective bottles in a box
$$P(X = 0) = \frac{e^{-0.5}\,0.5^{0}}{0!} = 0.6065$$
Number of boxes containing no defective bottles
f (x) = N P(X = 0) = 100(0.6065) ≈ 61
(ii) Probability of at least 2 defective bottles
$$P(X \ge 2) = 1 - P(X < 2) = 1 - [P(X = 0) + P(X = 1)]$$
$$= 1 - \sum_{x=0}^{1} \frac{e^{-0.5}\,0.5^{x}}{x!} = 0.0902$$
Number of boxes containing at least 2 defective bottles
f (x) = N P(X ≥ 2) = 100 (0.0902) ≈ 9

Example 20
In a certain factory turning out blades, there is a small chance of $\frac{1}{500}$
for any blade to be defective. The blades are supplied in packets of 10.
Use the Poisson distribution to calculate the approximate number of
packets containing no defective, one defective, and two defective blades
in a consignment of 10000 packets.
Solution
Let p be the probability of a blade being defective.
$$p = \frac{1}{500}, \quad n = 10, \quad N = 10000$$
$$\lambda = np = 10\left(\frac{1}{500}\right) = 0.02$$
Let X be the random variable which denotes the number of defective blades in a
packet.
Probability of x defective blades in a packet
$$P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!} = \frac{e^{-0.02}\,0.02^{x}}{x!}, \quad x = 0, 1, 2, \ldots$$
(i) Probability of no defective blades in a packet
$$P(X = 0) = \frac{e^{-0.02}\,0.02^{0}}{0!} = 0.9802$$
Number of packets with no defective blades
f (x) = N P(X = 0) = 10000(0.9802) = 9802
(ii) Probability of one defective blade in a packet
$$P(X = 1) = \frac{e^{-0.02}\,0.02^{1}}{1!} = 0.0196$$
Number of packets with one defective blade
f (x) = N P(X = 1) = 10000 (0.0196) = 196
(iii) Probability of two defective blades in a packet
$$P(X = 2) = \frac{e^{-0.02}\,0.02^{2}}{2!} = 1.96 \times 10^{-4}$$
Number of packets with 2 defective blades
f (x) = N P(X = 2) = 10000 (1.96 × 10⁻⁴) = 1.96 ≈ 2

Example 21
The number of accidents in a year attributed to taxi drivers in a city
follows Poisson distribution with a mean of 3. Out of 1000 taxi drivers,
find approximately the number of drivers with (i) no accidents in a year,
and (ii) more than 3 accidents in a year.
Solution
For a Poisson distribution,
λ = 3, N = 1000
Probability of x accidents in a year
$$P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!} = \frac{e^{-3}3^{x}}{x!}, \quad x = 0, 1, 2, \ldots$$
(i) Probability of no accidents in a year
$$P(X = 0) = \frac{e^{-3}3^{0}}{0!} = 0.0498$$
Number of drivers with no accidents
f (x) = N P(X = 0) = 1000(0.0498) = 49.8 ≈ 50
(ii) Probability of more than 3 accidents in a year
$$P(X > 3) = 1 - P(X \le 3) = 1 - [P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)]$$
$$= 1 - \sum_{x=0}^{3} \frac{e^{-3}3^{x}}{x!} = 0.3528$$
Number of drivers with more than 3 accidents
f (x) = N P(X > 3) = 1000 (0.3528) = 352.8 ≈ 353

Example 22
Fit a Poisson distribution to the following data:

Number of deaths (x)    0    1    2    3    4
Frequency (f)         122   60   15    2    1

Solution
$$\text{Mean} = \frac{\sum fx}{\sum f} = \frac{122(0) + 60(1) + 15(2) + 2(3) + 1(4)}{122 + 60 + 15 + 2 + 1} = \frac{100}{200} = 0.5$$
For a Poisson distribution,
λ = 0.5
$$P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!} = \frac{e^{-0.5}\,0.5^{x}}{x!}, \quad x = 0, 1, 2, 3, 4$$
$$N = \sum f = 200$$
Theoretical or expected frequency f (x) = N P(X = x)
$$f(x) = \frac{200\, e^{-0.5}\,0.5^{x}}{x!}$$
$$f(0) = \frac{200\, e^{-0.5}\,0.5^{0}}{0!} = 121.31 \approx 121$$
$$f(1) = \frac{200\, e^{-0.5}\,0.5^{1}}{1!} = 60.65 \approx 61$$
$$f(2) = \frac{200\, e^{-0.5}\,0.5^{2}}{2!} = 15.16 \approx 15$$
$$f(3) = \frac{200\, e^{-0.5}\,0.5^{3}}{3!} = 2.53 \approx 3$$
$$f(4) = \frac{200\, e^{-0.5}\,0.5^{4}}{4!} = 0.32 \approx 0$$
Poisson Distribution

Number of deaths (x)                 0    1    2    3    4
Expected Poisson frequency f (x)   121   61   15    3    0
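The fitting procedure used in this example (estimate λ by the sample mean, then multiply each Poisson probability by N) can be sketched in Python as follows. The function name `fit_poisson` is a hypothetical helper introduced for illustration; the data are those of Example 22.

```python
from math import exp, factorial

def fit_poisson(xs, freqs):
    """Return (lam, expected frequencies) for observed frequencies `freqs` at values `xs`."""
    n_total = sum(freqs)                                    # N = sum of f
    lam = sum(x * f for x, f in zip(xs, freqs)) / n_total   # mean = sum(fx) / sum(f)
    expected = [n_total * exp(-lam) * lam**x / factorial(x) for x in xs]
    return lam, expected

lam, expected = fit_poisson([0, 1, 2, 3, 4], [122, 60, 15, 2, 1])
rounded = [round(e) for e in expected]
print(lam, rounded)
```

Because each expected frequency is rounded separately, the rounded frequencies need not add up exactly to N.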

Example 23
Assuming that the typing mistakes per page committed by a typist follow
a Poisson distribution, find the expected frequencies for the following
distribution of typing mistakes:

Number of mistakes per page    0    1    2    3    4    5
Number of pages               40   30   20   15   10    5

Solution
$$\text{Mean} = \frac{\sum fx}{\sum f} = \frac{40(0) + 30(1) + 20(2) + 15(3) + 10(4) + 5(5)}{40 + 30 + 20 + 15 + 10 + 5} = \frac{180}{120} = 1.5$$
For a Poisson distribution,
λ = 1.5
$$P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!} = \frac{e^{-1.5}\,1.5^{x}}{x!}, \quad x = 0, 1, 2, 3, 4, 5$$
$$N = \sum f = 120$$
Expected frequency f (x) = N P(X = x)
$$f(x) = \frac{120\, e^{-1.5}\,1.5^{x}}{x!}$$
$$f(0) = \frac{120\, e^{-1.5}\,1.5^{0}}{0!} = 26.78 \approx 27$$
$$f(1) = \frac{120\, e^{-1.5}\,1.5^{1}}{1!} = 40.16 \approx 40$$
$$f(2) = \frac{120\, e^{-1.5}\,1.5^{2}}{2!} = 30.12 \approx 30$$
$$f(3) = \frac{120\, e^{-1.5}\,1.5^{3}}{3!} = 15.06 \approx 15$$
$$f(4) = \frac{120\, e^{-1.5}\,1.5^{4}}{4!} = 5.65 \approx 6$$
$$f(5) = \frac{120\, e^{-1.5}\,1.5^{5}}{5!} = 1.69 \approx 2$$

Exercise 5.2
1. The mean and variance of a Poisson distribution are both 2. Write down
the distribution.
[Ans.: $P(X = x) = \dfrac{e^{-2}2^{x}}{x!}, \ x = 0, 1, 2, \ldots$]

2. In a Poisson distribution, the probability P(X = 0) is 20 per cent. Find the
mean of the distribution.
[Ans.: 1.6094]
3. If X is a Poisson variate and P(X = 0) = 6 P(X = 3), find P(X = 2).
[Ans.: 0.1839]
4. The standard deviation of a Poisson distribution is 3. Find the probability
of getting 3 successes.
[Ans.: 0.0149]

5. The probability that a Poisson variable X takes a positive value is
1 – e^(–1.5). Find the variance and the probability that X lies between –1.5
and 1.5.
[Ans.: 1.5, 0.5578]

6. If 2 per cent of bulbs are known to be defective, find the probability
that in a lot of 300 bulbs, there will be 2 or 3 defective bulbs using
Poisson distribution.
[Ans.: 0.1338]

7. In a certain manufacturing process, 5% of the tools produced turn out
to be defective. Find the probability that in a sample of 40 tools, at
most 2 will be defective.
[Ans.: 0.675]

8. If the probability that an individual suffers a bad reaction from a
particular injection is 0.001, determine the probability that out of 2000
individuals (i) exactly three, and (ii) more than two individuals suffer a
bad reaction.
[Ans.: (i) 0.1804 (ii) 0.3233]

9. It is known from past experience that in a certain plant, there are on
the average 4 industrial accidents per year. Find the probability that
in a given year, there will be less than 4 accidents. Assume Poisson
distribution.
[Ans.: 0.43]

10. Find the probability that at most 5 defective fuses will be found in
a box of 200 fuses, if experience shows that 2% of such fuses are
defective.
[Ans.: 0.7851]

11. Assume that the probability of an individual coal miner being killed
in a mine accident during a year is 1/2400. Use appropriate statistical
distribution to calculate the probability that in a mine employing 200
miners, there will be at least one fatal accident in a year.
[Ans.: 0.08]

12. Between the hours of 2 and 4 p.m., the average number of phone calls
per minute coming into the switchboard of a company is 2.5. Find the
probability that during a particular minute, there will be (i) no phone
call at all, (ii) 4 or less calls, and (iii) more than 6 calls.
[Ans.: (i) 0.0821 (ii) 0.8909 (iii) 0.0145]

13. Suppose that a local appliances shop has found from experience that
the demand for tubelights is roughly distributed as Poisson with a
mean of 4 tubelights per week. If the shop keeps 6 tubelights during a
particular week, what is the probability that the demand will exceed
the supply during that week?
[Ans.: 0.1106]

14. The distribution of the number of road accidents per day in a city is
Poisson with a mean of 4. Find the number of days out of 100 days
when there will be (i) no accident, (ii) at least 2 accidents, and (iii) at
most 3 accidents.
[Ans.: (i) 2 (ii) 91 (iii) 44]

15. A manufacturer of electric bulbs sends out 500 lots each consisting of
100 bulbs. If 5% bulbs are defective, in how many lots can we expect
(i) 97 or more good bulbs? (ii) less than 96 good bulbs?
[Ans.: (i) 62 (ii) 132]

16. A firm produces articles, 0.1 per cent of which are defective. It packs
them in cases containing 500 articles. If a wholesaler purchases 100
such cases, how many cases can be expected (i) to be free from
defects? (ii) to have one defective article?
[Ans.: (i) 61 (ii) 30]

17. In a certain factory producing certain articles, the probability that an
article is defective is 1/500. The articles are supplied in packets of 20.
Find approximately the number of packets containing no defective,
one defective, two defectives in a consignment of 20000 packets.
[Ans.: 19200, 768, 15]

18. In a certain factory manufacturing razor blades, there is a small
chance, 1/50, for any blade to be defective. The blades are placed in
packets, each containing 10 blades. Using the Poisson distribution,
calculate the approximate number of packets containing not more
than 2 defective blades in a consignment of 10000 packets.
[Ans.: 9988]

19. It is known that 0.5% of ballpen refills produced by a factory are
defective. These refills are dispatched in packings of equal numbers.
Using a Poisson distribution, determine the number of refills in a
packing to be sure that at least 95% of the packings contain no defective
refills.
[Ans.: 10]

20. A manufacturer finds that the average demand per day for the
mechanics to repair his new product is 1.5 over a period of one
year and the demand per day is distributed as a Poisson variate. He
employs two mechanics. On how many days in one year (i) would both
mechanics be free? (ii) would some demand be refused?
[Ans.: (i) 81.4 days (ii) 69.8 days]


21. Fit a Poisson distribution to the following data:

X     0    1    2    3    4
f   211   90   19    5    0

[Ans.: λ = 0.44, Frequencies: 209, 92, 20, 3, 1]

22. Fit a Poisson distribution to the following data:

No. of defects per piece    0    1    2    3    4
No. of pieces              43   40   25   10    2

[Ans.: Frequencies: 42, 44, 24, 8, 2]

23. Fit a Poisson distribution to the following data:

X     0    1    2    3    4    5
f   142  156   69   27    5    1

[Ans.: Frequencies: 147, 147, 74, 24, 6, 2]

24. Fit a Poisson distribution to the following data:

X    0    1    2    3    4    5    6    7    8
f   56  156  132   92   37   22    4    0    1

[Ans.: Frequencies: 70, 137, 135, 89, 44, 17, 6, 2, 0]



5.4 Normal Distribution

A continuous random variable X is said to follow normal distribution with mean μ and
variance σ², if its probability density function is given by
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}, \quad -\infty < x < \infty, \ -\infty < \mu < \infty, \ \sigma > 0$$
where μ and σ are called parameters of the normal distribution. The curve representing
the normal distribution is called the normal curve (Fig. 5.1).

Fig. 5.1

5.4.1 Properties of the Normal Distribution


A normal probability curve, or normal curve, has the following properties:
(i) It is a bell-shaped curve, symmetrical about the ordinate X = μ. The ordinate is
maximum at X = μ.
(ii) It is a unimodal curve and its tails extend infinitely in both the directions, i.e.,
the curve is asymptotic to the X-axis in both the directions.
(iii) All the three measures of central tendency coincide, i.e.,
mean = median = mode
(iv) The total area under the curve gives the total probability of the random variable
X taking values between –∞ and ∞. Mathematically,
$$P(-\infty < X < \infty) = \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}} dx = 1$$
(v) The ordinate at X = μ divides the area under the normal curve into two equal
parts, i.e.,
$$\int_{-\infty}^{\mu} f(x)\, dx = \int_{\mu}^{\infty} f(x)\, dx = \frac{1}{2}$$
(vi) The value of f (x) is always positive for all values of X, i.e., the whole curve
lies above the X-axis.
(vii) The points of inflexion (the points at which the curvature changes) of the curve are
at X = μ ± σ. The curve is concave downward between X = μ – σ and X = μ + σ, and
concave upward outside this interval.
(viii) The area under the normal curve (Fig. 5.2) is distributed as follows:
(a) The area between the ordinates at μ – σ and μ + σ is 68.27%
(b) The area between the ordinates at μ – 2σ and μ + 2σ is 95.45%
(c) The area between the ordinates at μ – 3σ and μ + 3σ is 99.73%

Fig. 5.2
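The percentages quoted in property (viii) can be reproduced with the error function available in Python's standard library. The sketch below is illustrative; the function name `area_within` is an assumption, and it uses the standard identity P(|X – μ| < kσ) = erf(k/√2) rather than a normal table.

```python
from math import erf, sqrt

def area_within(k):
    """P(mu - k*sigma < X < mu + k*sigma) for any normal distribution."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(k, round(100 * area_within(k), 2))  # area in per cent
```

The result does not depend on μ or σ, which is why a single table of the standard normal curve is sufficient for all normal distributions.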

5.4.2 Constants of the Normal Distribution

1. Mean of the Normal Distribution



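$$E(X) = \int_{-\infty}^{\infty} x\, f(x)\, dx = \int_{-\infty}^{\infty} x\, \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}} dx$$
Putting $\dfrac{x-\mu}{\sigma} = t$, $dx = \sigma\, dt$
$$E(X) = \int_{-\infty}^{\infty} (\mu + \sigma t)\, \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}t^{2}} dt$$
$$= \mu \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}t^{2}} dt + \sigma \int_{-\infty}^{\infty} \frac{t}{\sqrt{2\pi}}\, e^{-\frac{1}{2}t^{2}} dt$$
Putting t² = u in the second integral,
2t dt = du
When t → ∞, u → ∞
When t → –∞, u → ∞
$$E(X) = \mu \cdot \frac{1}{\sqrt{2\pi}} \cdot \sqrt{2\pi} + \sigma \int_{\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}u}\, \frac{du}{2} \qquad \left[\because \int_{-\infty}^{\infty} e^{-\frac{1}{2}t^{2}} dt = \sqrt{2\pi}\right]$$
$$= \mu + 0 \qquad [\because \text{the limits of integration are the same}]$$
$$= \mu$$
Equivalently, the second integral vanishes because its integrand is an odd function of t.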

2. Variance of the Normal Distribution

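$$\mathrm{Var}(X) = E(X - \mu)^{2} = \int_{-\infty}^{\infty} (x - \mu)^{2} f(x)\, dx = \int_{-\infty}^{\infty} (x - \mu)^{2}\, \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}} dx$$
Putting $\dfrac{x-\mu}{\sigma} = t$, $dx = \sigma\, dt$
$$\mathrm{Var}(X) = \int_{-\infty}^{\infty} \sigma^{2}t^{2}\, \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}t^{2}} dt$$
$$= \frac{\sigma^{2}}{\sqrt{2\pi}} \int_{-\infty}^{\infty} t^{2} e^{-\frac{1}{2}t^{2}} dt$$
$$= \frac{2\sigma^{2}}{\sqrt{2\pi}} \int_{0}^{\infty} t^{2} e^{-\frac{1}{2}t^{2}} dt \qquad [\because \text{the integrand is an even function}]$$
Putting $\dfrac{t^{2}}{2} = u$,
$$t = \sqrt{2u}, \quad dt = \sqrt{2}\, \frac{1}{2\sqrt{u}}\, du = \frac{1}{\sqrt{2u}}\, du$$
When t = 0, u = 0
When t = ∞, u = ∞
$$\mathrm{Var}(X) = \frac{2\sigma^{2}}{\sqrt{2\pi}} \int_{0}^{\infty} 2u\, e^{-u}\, \frac{1}{\sqrt{2u}}\, du$$
$$= \frac{2\sigma^{2}}{\sqrt{\pi}} \int_{0}^{\infty} e^{-u} u^{\frac{1}{2}}\, du$$
$$= \frac{2\sigma^{2}}{\sqrt{\pi}}\, \Gamma\!\left(\frac{3}{2}\right) \qquad \left[\because \int_{0}^{\infty} e^{-x} x^{n-1}\, dx = \Gamma(n)\right]$$
$$= \frac{2\sigma^{2}}{\sqrt{\pi}} \cdot \frac{1}{2}\, \Gamma\!\left(\frac{1}{2}\right)$$
$$= \frac{2\sigma^{2}}{\sqrt{\pi}} \cdot \frac{1}{2} \sqrt{\pi}$$
$$= \sigma^{2}$$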
3. Standard Deviation of the Normal Distribution
SD = s

4. Mode of the Normal Distribution

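Mode is the value of x for which f (x) is maximum. Mode is given by
$$f'(x) = 0 \quad \text{and} \quad f''(x) < 0$$
For normal distribution,
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$$
Differentiating w.r.t. x,
$$f'(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}} \left[-\left(\frac{x-\mu}{\sigma^{2}}\right)\right] = -\frac{x-\mu}{\sigma^{2}}\, f(x)$$
When f ′(x) = 0, x – μ = 0
x = μ
$$f''(x) = -\frac{1}{\sigma^{2}} \left[(x - \mu) f'(x) + f(x)\right]$$
$$= -\frac{1}{\sigma^{2}} \left[(x - \mu) \left\{-\frac{(x - \mu)}{\sigma^{2}}\, f(x)\right\} + f(x)\right]$$
$$= -\frac{1}{\sigma^{2}}\, f(x) \left[1 - \frac{(x - \mu)^{2}}{\sigma^{2}}\right]$$
At x = μ,
$$f''(\mu) = -\frac{f(\mu)}{\sigma^{2}} = -\frac{1}{\sigma^{3}\sqrt{2\pi}} < 0$$
Hence, x = μ is the mode of the normal distribution.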
5. Median of the Normal Distribution

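If M is the median of the normal distribution,
$$\int_{-\infty}^{M} f(x)\, dx = \frac{1}{2}$$
$$\frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{M} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}} dx = \frac{1}{2}$$
$$\frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\mu} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}} dx + \frac{1}{\sigma\sqrt{2\pi}} \int_{\mu}^{M} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}} dx = \frac{1}{2} \qquad \ldots(5.3)$$
Putting $\dfrac{x-\mu}{\sigma} = t$ in the first integral,
dx = σ dt
When x = –∞, t = –∞
When x = μ, t = 0
$$\frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\mu} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}} dx = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{0} e^{-\frac{1}{2}t^{2}} \sigma\, dt$$
$$= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{0} e^{-\frac{1}{2}t^{2}} dt$$
$$= \frac{1}{\sqrt{2\pi}} \int_{0}^{\infty} e^{-\frac{1}{2}t^{2}} dt \qquad [\text{By symmetry}]$$
$$= \frac{1}{\sqrt{2\pi}} \sqrt{\frac{\pi}{2}}$$
$$= \frac{1}{2} \qquad \ldots(5.4)$$
From Eqs (5.3) and (5.4),
$$\frac{1}{2} + \frac{1}{\sigma\sqrt{2\pi}} \int_{\mu}^{M} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}} dx = \frac{1}{2}$$
$$\frac{1}{\sigma\sqrt{2\pi}} \int_{\mu}^{M} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}} dx = 0$$
$$\int_{\mu}^{M} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}} dx = 0$$
$$\mu = M \qquad \left[\because \text{if } \int_{a}^{b} f(x)\, dx = 0 \text{ where } f(x) > 0, \text{ then } a = b\right]$$
Hence, mean = median for the normal distribution.
Note For normal distribution,
mean = median = mode = μ
Hence, the normal distribution is symmetrical.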

5.4.3 Probability of a Normal Random Variable in an Interval

Let X be a normal random variable with P(X)
mean m and standard deviation s. The
probability of X lying in the interval
(x1, x2) (Fig. 5.3) is given by
2
x2 1 Ê x-m ˆ
1 - Á ˜
2Ë s ¯ X
P( x1 £ X £ x2 ) = Ús 2p
e dx O x1 x2
x1
Fig. 5.3
Hence, the probability is equal to the
area under the normal curve between the ordinates X = x1 and X = x2 respectively.
P(x1 < X < x2) can be evaluated easily by converting a normal random variable into
another random variable.
X-m
Let Z= be a new random variable.
s
Ê X - mˆ 1
E(Z ) = E Á = [E ( X ) - m ] = 0
Ë s ˜¯ s
Ê X - mˆ 1 1
Var ( Z ) = Var Á = Var ( X - m ) = 2 Var ( X ) = 1
Ë s ˜¯ s 2 s
The distribution of Z is also normal. Thus, if X is a normal random variable with mean
X-m
m and standard deviation s then Z = is a normal random variable with mean 0
s
and standard deviation 1. Since the parameters of the distribution of Z are fixed, it is a
known distribution and is termed standard normal distribution. Further, Z is termed as
a standard normal variate. Thus, the distribution of any normal variate X can always
be transformed into the distribution of the standard normal variate Z.

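\[
\begin{aligned}
P(x_1 \le X \le x_2) &= P\left[\left(\frac{x_1-\mu}{\sigma}\right) \le \left(\frac{X-\mu}{\sigma}\right) \le \left(\frac{x_2-\mu}{\sigma}\right)\right] \\
&= P(z_1 \le Z \le z_2)
\end{aligned}
\]
where $z_1 = \frac{x_1-\mu}{\sigma}$ and $z_2 = \frac{x_2-\mu}{\sigma}$.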
This probability is equal to the area under the standard normal curve between the
ordinates at Z = z1 and Z = z2.
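The standardization above can be sketched numerically. The snippet below is a minimal illustration, not part of the original text: the helper names `phi` and `normal_interval_prob` are hypothetical, the standard normal distribution function is computed from Python's `math.erf` instead of being read from a table, and the values μ = 30, σ = 5 and the interval (26, 40) are chosen only for illustration.

```python
import math

def phi(z):
    """Standard normal distribution function P(Z <= z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_interval_prob(x1, x2, mu, sigma):
    """P(x1 <= X <= x2) for X normal with mean mu and standard deviation sigma."""
    # Convert both end points to standard normal variates.
    z1 = (x1 - mu) / sigma
    z2 = (x2 - mu) / sigma
    # Area under the standard normal curve between z1 and z2.
    return phi(z2) - phi(z1)

# Illustrative values (assumed): mean 30, SD 5, interval 26 to 40.
p = normal_interval_prob(26, 40, mu=30, sigma=5)
print(round(p, 4))
```

Because the function works with the full distribution function, it needs no separate treatment of the sign cases; a table lookup should agree with it up to the rounding of the table entries.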
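Case I If both z1 and z2 are positive (Fig. 5.4),
\[
\begin{aligned}
P(x_1 \le X \le x_2) &= P(z_1 \le Z \le z_2) \\
&= P(0 \le Z \le z_2) - P(0 \le Z \le z_1) \\
&= (\text{Area under the normal curve from 0 to } z_2) - (\text{Area under the normal curve from 0 to } z_1)
\end{aligned}
\]
If both z1 and z2 are negative, the same argument applies by symmetry with the areas taken from 0 to |z1| and from 0 to |z2|, so that $P(z_1 \le Z \le z_2) = P(0 \le Z \le |z_1|) - P(0 \le Z \le |z_2|)$.

Case II If z1 < 0 and z2 > 0 (Fig. 5.5),
\[
\begin{aligned}
P(x_1 \le X \le x_2) &= P(z_1 \le Z \le z_2) \\
&= P(z_1 \le Z \le 0) + P(0 \le Z \le z_2) \\
&= P(0 \le Z \le |z_1|) + P(0 \le Z \le z_2) \qquad [\text{By symmetry}] \\
&= (\text{Area under the normal curve from 0 to } |z_1|) + (\text{Area under the normal curve from 0 to } z_2)
\end{aligned}
\]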


When X > x1, Z > z1, the probability P(Z > z1) can be found for two cases as follows:

Case I If z1 > 0 (Fig. 5.6),
\[
\begin{aligned}
P(X > x_1) &= P(Z > z_1) \\
&= 0.5 - P(0 \le Z \le z_1) \\
&= 0.5 - (\text{Area under the curve from 0 to } z_1)
\end{aligned}
\]
Case II If z1 < 0 (Fig. 5.7),
\[
\begin{aligned}
P(X > x_1) &= P(Z > z_1) \\
&= 0.5 + P(z_1 < Z < 0) \\
&= 0.5 + P(0 < Z < |z_1|) \qquad [\text{By symmetry}] \\
&= 0.5 + (\text{Area under the curve from 0 to } |z_1|)
\end{aligned}
\]
When X < x1, Z < z1, the probability P(Z < z1) can be found for two cases as follows:
Case I If z1 > 0 (Fig. 5.8),
\[
\begin{aligned}
P(X < x_1) &= P(Z < z_1) \\
&= 1 - P(Z \ge z_1) \\
&= 1 - \left[0.5 - P(0 < Z < z_1)\right] \\
&= 0.5 + P(0 < Z < z_1) \\
&= 0.5 + (\text{Area under the curve from 0 to } z_1)
\end{aligned}
\]
Case II If z1 < 0 (Fig. 5.9),
\[
\begin{aligned}
P(X < x_1) &= P(Z < z_1) \\
&= 1 - P(Z \ge z_1) \\
&= 1 - \left[0.5 + P(z_1 \le Z \le 0)\right] \\
&= 1 - \left[0.5 + P(0 \le Z \le |z_1|)\right] \qquad [\text{By symmetry}] \\
&= 0.5 - P(0 \le Z \le |z_1|) \\
&= 0.5 - (\text{Area under the curve from 0 to } |z_1|)
\end{aligned}
\]

Note
(i) $P(X < x_1) = F(x_1) = \int_{-\infty}^{x_1} f(x)\,dx$
Hence, P(X < x1) represents the area under the curve from X = –∞ to X = x1.
(ii) If P(X < x1) < 0.5, the point x1 lies to the left of X = μ and the corresponding value of the standard normal variate will be negative (Fig. 5.10).
(iii) If P(X < x1) > 0.5, the point x1 lies to the right of X = μ and the corresponding value of the standard normal variate will be positive (Fig. 5.11).

Standard Normal (Z) Table, Area between 0 and z

Z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879
0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549
0.7 0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.8 0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133
0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706
1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767
2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817
2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857
2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890
2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916
2.4 0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936
2.5 0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952
2.6 0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964
2.7 0.4965 0.4966 0.4967 0.4968 0.4969 0.4970 0.4971 0.4972 0.4973 0.4974
2.8 0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.4980 0.4981
2.9 0.4981 0.4982 0.4982 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986
3.0 0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.4990 0.4990
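
Entries of a table of this kind can be regenerated numerically, which is a convenient way to check a printed value. The sketch below is an illustration under stated assumptions: the helper names `area_0_to_z` and `table_row` are hypothetical, and the area between 0 and z is computed as one half of `math.erf(z / sqrt(2))`, which equals the standard normal distribution function minus 0.5.

```python
import math

def area_0_to_z(z):
    """Area under the standard normal curve between 0 and z (for z >= 0)."""
    return 0.5 * math.erf(z / math.sqrt(2.0))

def table_row(z_tenths):
    """One table row: areas for z = z_tenths/10 + 0.00, 0.01, ..., 0.09, rounded to 4 places."""
    base = z_tenths / 10.0
    return [round(area_0_to_z(base + j / 100.0), 4) for j in range(10)]

# Rows for z = 1.2x and z = 1.3x, printed in the same layout as the table.
for tenths in (12, 13):
    row = table_row(tenths)
    print(f"{tenths / 10:.1f} " + " ".join(f"{a:.4f}" for a in row))
```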

5.4.4 Uses of Normal Distribution


(i) The normal distribution can be used to approximate binomial and Poisson
distributions.
(ii) It is used extensively in sampling theory. It helps to estimate parameters from
statistics and to find confidence limits of the parameter.
(iii) It is widely used in testing statistical hypotheses and in tests of significance,
in which it is assumed that the population from which the samples have
been drawn is normally distributed.
(iv) It serves as a guiding instrument in the analysis and interpretation of statistical
data.
(v) It can be used for smoothing and graduating a distribution which is not normal
simply by contracting a normal curve.

Example 1
What is the probability that a standard normal variate Z will be (i) greater
than 1.09? (ii) less than –1.65? (iii) lying between –1 and 1.96?
(iv) lying between 1.25 and 2.75?
Solution
(i) Z > 1.09 (Fig. 5.12)
\[
\begin{aligned}
P(Z > 1.09) &= 0.5 - P(0 \le Z \le 1.09) \\
&= 0.5 - 0.3621 \\
&= 0.1379
\end{aligned}
\]
(ii) Z ≤ –1.65 (Fig. 5.13)
\[
\begin{aligned}
P(Z \le -1.65) &= 1 - P(Z > -1.65) \\
&= 1 - \left[0.5 + P(-1.65 < Z < 0)\right] \\
&= 1 - \left[0.5 + P(0 < Z < 1.65)\right] \qquad [\text{By symmetry}] \\
&= 0.5 - P(0 < Z < 1.65) \\
&= 0.5 - 0.4505 \\
&= 0.0495
\end{aligned}
\]
(iii) –1 < Z < 1.96 (Fig. 5.14)
\[
\begin{aligned}
P(-1 < Z < 1.96) &= P(-1 < Z < 0) + P(0 < Z < 1.96) \\
&= P(0 < Z < 1) + P(0 < Z < 1.96) \qquad [\text{By symmetry}] \\
&= 0.3413 + 0.4750 \\
&= 0.8163
\end{aligned}
\]
(iv) 1.25 < Z < 2.75 (Fig. 5.15)
\[
\begin{aligned}
P(1.25 < Z < 2.75) &= P(0 < Z < 2.75) - P(0 < Z < 1.25) \\
&= 0.4970 - 0.3944 \\
&= 0.1026
\end{aligned}
\]

Example 2
If X is a normal variate with a mean of 30 and an SD of 5, find the
probabilities that (i) 26 ≤ X ≤ 40, and (ii) X ≥ 45.
Solution
\[
\mu = 30, \quad \sigma = 5, \quad Z = \frac{X-\mu}{\sigma}
\]
(i) When X = 26, $Z = \frac{26-30}{5} = -0.8$
When X = 40, $Z = \frac{40-30}{5} = 2$
\[
\begin{aligned}
P(26 \le X \le 40) &= P(-0.8 \le Z \le 2) \qquad (\text{Fig. 5.16}) \\
&= P(-0.8 \le Z \le 0) + P(0 \le Z \le 2) \\
&= P(0 \le Z \le 0.8) + P(0 \le Z \le 2) \qquad [\text{By symmetry}] \\
&= 0.2881 + 0.4772 \\
&= 0.7653
\end{aligned}
\]
(ii) When X = 45, $Z = \frac{45-30}{5} = 3$
\[
\begin{aligned}
P(X \ge 45) &= P(Z \ge 3) \qquad (\text{Fig. 5.17}) \\
&= 0.5 - P(0 < Z < 3) \\
&= 0.5 - 0.4987 \\
&= 0.0013
\end{aligned}
\]

Example 3
X is normally distributed with a mean of 12 and an SD of 4. Find
the probability of the following:
(i) X ≥ 20 (ii) X ≤ 20 (iii) 0 ≤ X ≤ 12.

Solution
\[
\mu = 12, \quad \sigma = 4, \quad Z = \frac{X-\mu}{\sigma}
\]
(i) When X = 20, $Z = \frac{20-12}{4} = 2$
\[
\begin{aligned}
P(X \ge 20) &= P(Z \ge 2) \qquad (\text{Fig. 5.18}) \\
&= 0.5 - P(0 < Z < 2) \\
&= 0.5 - 0.4772 \\
&= 0.0228
\end{aligned}
\]
(ii)
\[
\begin{aligned}
P(X \le 20) &= 1 - P(X > 20) \\
&= 1 - 0.0228 \\
&= 0.9772
\end{aligned}
\]
(iii) When X = 0, $Z = \frac{0-12}{4} = -3$
When X = 12, $Z = \frac{12-12}{4} = 0$
\[
\begin{aligned}
P(0 \le X \le 12) &= P(-3 \le Z \le 0) \qquad (\text{Fig. 5.19}) \\
&= P(0 \le Z \le 3) \qquad [\text{By symmetry}] \\
&= 0.4987
\end{aligned}
\]

Example 4
If X is normally distributed with a mean of 2 and an SD of 0.1, find
P(|X – 2| ≥ 0.01).

Solution
\[
\mu = 2, \quad \sigma = 0.1, \quad Z = \frac{X-\mu}{\sigma}
\]
When X = 1.99, $Z = \frac{1.99-2}{0.1} = -0.1$
When X = 2.01, $Z = \frac{2.01-2}{0.1} = 0.1$
\[
\begin{aligned}
P(|X-2| \le 0.01) &= P(1.99 \le X \le 2.01) \qquad (\text{Fig. 5.20}) \\
&= P(-0.1 \le Z \le 0.1) \\
&= P(-0.1 \le Z \le 0) + P(0 \le Z \le 0.1) \\
&= P(0 \le Z \le 0.1) + P(0 \le Z \le 0.1) \qquad [\text{By symmetry}] \\
&= 2P(0 \le Z \le 0.1) \\
&= 2(0.0398) \\
&= 0.0796
\end{aligned}
\]
\[
\begin{aligned}
P(|X-2| \ge 0.01) &= 1 - P(|X-2| < 0.01) \\
&= 1 - 0.0796 \\
&= 0.9204
\end{aligned}
\]

Example 5
If X is a normal variate with a mean of 120 and a standard deviation of
10, find c such that (i) P(X > c) = 0.02, and (ii) P(X < c) = 0.05.
Solution
For normal variate X,
\[
\mu = 120, \quad \sigma = 10, \quad Z = \frac{X-\mu}{\sigma}
\]
(i) P(X > c) = 0.02
\[
P(X < c) = 1 - P(X \ge c) = 1 - 0.02 = 0.98
\]
Since P(X < c) > 0.5, the corresponding value of Z will be positive. Let $z_1 = \frac{c-120}{10}$.
\[
\begin{aligned}
P(X > c) &= P(Z > z_1) \qquad (\text{Fig. 5.21}) \\
0.02 &= 0.5 - P(0 \le Z \le z_1) \\
P(0 \le Z \le z_1) &= 0.48 \\
\therefore\ z_1 &= 2.05 \qquad [\text{From normal table}] \\
\frac{c-120}{10} &= 2.05 \\
c &= 2.05(10) + 120 = 140.5
\end{aligned}
\]
(ii) Since P(X < c) < 0.5, the corresponding value of Z will be negative. Let $\frac{c-120}{10} = -z_1$, where $z_1 > 0$.
\[
\begin{aligned}
P(X < c) &= P(Z < -z_1) \qquad (\text{Fig. 5.22}) \\
0.05 &= 1 - P(Z \ge -z_1) \\
0.05 &= 1 - \left[0.5 + P(-z_1 \le Z \le 0)\right] \\
0.05 &= 1 - \left[0.5 + P(0 \le Z \le z_1)\right] \qquad [\text{By symmetry}] \\
0.05 &= 0.5 - P(0 \le Z \le z_1) \\
P(0 \le Z \le z_1) &= 0.5 - 0.05 = 0.45 \\
\therefore\ z_1 &= 1.64 \qquad [\text{From normal table}] \\
\frac{c-120}{10} &= -z_1 = -1.64 \\
c &= 10(-1.64) + 120 = 103.6
\end{aligned}
\]
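
The reverse lookup used in this example (from a given probability back to a value of X) can also be sketched in code. This is a hedged illustration, not the book's method: it assumes Python 3.8 or later, where the standard library provides `statistics.NormalDist` with an `inv_cdf` method, and the function names `upper_tail_cutoff` and `lower_tail_cutoff` are hypothetical. Because no table rounding is involved, the results differ slightly from values obtained with z read to two decimals.

```python
from statistics import NormalDist

def upper_tail_cutoff(p_upper, mu, sigma):
    """Value c such that P(X > c) = p_upper for X normal(mu, sigma)."""
    # P(X > c) = p_upper is the same as P(X <= c) = 1 - p_upper.
    return NormalDist(mu, sigma).inv_cdf(1.0 - p_upper)

def lower_tail_cutoff(p_lower, mu, sigma):
    """Value c such that P(X < c) = p_lower for X normal(mu, sigma)."""
    return NormalDist(mu, sigma).inv_cdf(p_lower)

c_upper = upper_tail_cutoff(0.02, mu=120, sigma=10)
c_lower = lower_tail_cutoff(0.05, mu=120, sigma=10)
print(round(c_upper, 2), round(c_lower, 2))
```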

Example 6
A manufacturer knows from his experience that the resistances of the
resistors he produces are normally distributed with μ = 100 ohms and SD σ = 2 ohms.
What percentage of resistors will have resistances between 98 ohms and
102 ohms?
Solution
Let X be the random variable which denotes the resistances of the resistors.
\[
\mu = 100, \quad \sigma = 2, \quad Z = \frac{X-\mu}{\sigma}
\]
When X = 98, $Z = \frac{98-100}{2} = -1$
When X = 102, $Z = \frac{102-100}{2} = 1$
\[
\begin{aligned}
P(98 \le X \le 102) &= P(-1 \le Z \le 1) \qquad (\text{Fig. 5.23}) \\
&= P(-1 \le Z \le 0) + P(0 \le Z \le 1) \\
&= P(0 \le Z \le 1) + P(0 \le Z \le 1) \qquad [\text{By symmetry}] \\
&= 2P(0 \le Z \le 1) \\
&= 2(0.3413) \\
&= 0.6826
\end{aligned}
\]
Hence, the percentage of resistors having resistances between 98 ohms and 102 ohms =
68.26%.

Example 7
The average seasonal rainfall in a place is 16 inches with an SD of
4 inches. What is the probability that the rainfall in that place will be
between 20 and 24 inches in a year?

Solution
Let X be the random variable which denotes the seasonal rainfall in a year.
\[
\mu = 16, \quad \sigma = 4, \quad Z = \frac{X-\mu}{\sigma}
\]
When X = 20, $Z = \frac{20-16}{4} = 1$
When X = 24, $Z = \frac{24-16}{4} = 2$
\[
\begin{aligned}
P(20 < X < 24) &= P(1 < Z < 2) \qquad (\text{Fig. 5.24}) \\
&= P(0 < Z < 2) - P(0 < Z < 1) \\
&= 0.4772 - 0.3413 \\
&= 0.1359
\end{aligned}
\]

Example 8
The lifetime of a certain kind of battery has a mean of 400 hours
and a standard deviation of 45 hours. Assuming the distribution of
lifetime to be normal, find (i) the percentage of batteries with a lifetime
of at least 470 hours, (ii) the proportion of batteries with a lifetime
between 385 and 415 hours, and (iii) the minimum life of the best 5%
of batteries.
Solution
Let X be the random variable which denotes the lifetime of a battery of this kind.
\[
\mu = 400, \quad \sigma = 45, \quad Z = \frac{X-\mu}{\sigma}
\]
(i) When X = 470, $Z = \frac{470-400}{45} = 1.56$
\[
\begin{aligned}
P(X \ge 470) &= P(Z \ge 1.56) \qquad (\text{Fig. 5.25}) \\
&= 0.5 - P(0 < Z < 1.56) \\
&= 0.5 - 0.4406 \\
&= 0.0594
\end{aligned}
\]
Hence, the percentage of batteries with a lifetime of at least 470 hours = 5.94%.
(ii) When X = 385, $Z = \frac{385-400}{45} = -0.33$
When X = 415, $Z = \frac{415-400}{45} = 0.33$
\[
\begin{aligned}
P(385 < X < 415) &= P(-0.33 < Z < 0.33) \qquad (\text{Fig. 5.26}) \\
&= P(-0.33 < Z < 0) + P(0 < Z < 0.33) \\
&= P(0 < Z < 0.33) + P(0 < Z < 0.33) \qquad [\text{By symmetry}] \\
&= 2P(0 < Z < 0.33) \\
&= 2(0.1293) \\
&= 0.2586
\end{aligned}
\]
Hence, the proportion of batteries with a lifetime between 385 and 415 hours
= 25.86%.
(iii) P(X > x1) = 0.05 (Fig. 5.27)
\[
\begin{aligned}
P(X > x_1) &= P(Z > z_1) \\
0.05 &= 0.5 - P(0 \le Z \le z_1) \\
P(0 \le Z \le z_1) &= 0.5 - 0.05 = 0.45 \\
\therefore\ z_1 &= 1.65 \qquad [\text{From normal table}] \\
\frac{x_1 - 400}{45} &= z_1 = 1.65 \\
\therefore\ x_1 &= 1.65(45) + 400 = 474.25 \text{ hours}
\end{aligned}
\]

Example 9
If the weights of 300 students are normally distributed with a mean of
68 kg and a standard deviation of 3 kg, how many students have weights
(i) greater than 72 kg? (ii) less than or equal to 64 kg? (iii) between
65 kg and 71 kg inclusive?
Solution
Let X be the random variable which denotes the weight of a student.
\[
\mu = 68, \quad \sigma = 3, \quad N = 300, \quad Z = \frac{X-\mu}{\sigma}
\]
(i) When X = 72, $Z = \frac{72-68}{3} = 1.33$
\[
\begin{aligned}
P(X > 72) &= P(Z > 1.33) \qquad (\text{Fig. 5.28}) \\
&= 0.5 - P(0 \le Z \le 1.33) \\
&= 0.5 - 0.4082 \\
&= 0.0918
\end{aligned}
\]
Number of students with weights more than 72 kg = N P(X > 72) = 300(0.0918) = 27.54 ≈ 28
(ii) When X = 64, $Z = \frac{64-68}{3} = -1.33$
\[
\begin{aligned}
P(X \le 64) &= P(Z \le -1.33) \qquad (\text{Fig. 5.29}) \\
&= P(Z \ge 1.33) \qquad [\text{By symmetry}] \\
&= 0.5 - P(0 < Z < 1.33) \\
&= 0.5 - 0.4082 \\
&= 0.0918
\end{aligned}
\]
Number of students with weights less than or equal to 64 kg = N P(X ≤ 64) = 300(0.0918) = 27.54 ≈ 28
(iii) When X = 65, $Z = \frac{65-68}{3} = -1$
When X = 71, $Z = \frac{71-68}{3} = 1$
\[
\begin{aligned}
P(65 \le X \le 71) &= P(-1 \le Z \le 1) \qquad (\text{Fig. 5.30}) \\
&= P(-1 \le Z \le 0) + P(0 \le Z \le 1) \\
&= P(0 \le Z \le 1) + P(0 \le Z \le 1) \qquad [\text{By symmetry}] \\
&= 2P(0 \le Z \le 1) \\
&= 2(0.3413) \\
&= 0.6826
\end{aligned}
\]
Number of students with weights between 65 and 71 kg = N P(65 ≤ X ≤ 71) = 300(0.6826) = 204.78 ≈ 205

Example 10
The mean yield for a one-acre plot is 662 kg with an SD of 32 kg.
Assuming normal distribution, how many one-acre plots in a batch of
1000 plots would you expect to have yields (i) over 700 kg? (ii) below
650 kg? (iii) What is the lowest yield of the best 100 plots?
Solution
Let X be the random variable which denotes the yield for the one-acre plot.
\[
\mu = 662, \quad \sigma = 32, \quad N = 1000, \quad Z = \frac{X-\mu}{\sigma}
\]
(i) When X = 700, $Z = \frac{700-662}{32} = 1.19$
\[
\begin{aligned}
P(X > 700) &= P(Z > 1.19) \qquad (\text{Fig. 5.31}) \\
&= 0.5 - P(0 \le Z \le 1.19) \\
&= 0.5 - 0.3830 \\
&= 0.1170
\end{aligned}
\]
Expected number of plots with yields over 700 kg = N P(X > 700) = 1000(0.1170) = 117
(ii) When X = 650, $Z = \frac{650-662}{32} = -0.38$
\[
\begin{aligned}
P(X < 650) &= P(Z < -0.38) \qquad (\text{Fig. 5.32}) \\
&= P(Z > 0.38) \qquad [\text{By symmetry}] \\
&= 0.5 - P(0 \le Z \le 0.38) \\
&= 0.5 - 0.1480 \\
&= 0.352
\end{aligned}
\]
Expected number of plots with yields below 650 kg = N P(X < 650) = 1000(0.352) = 352
(iii) The lowest yield, say, x1 of the best 100 plots is given by
\[
P(X > x_1) = \frac{100}{1000} = 0.1
\]
When X = x1, $Z = \frac{x_1 - 662}{32} = z_1$
\[
\begin{aligned}
P(X > x_1) &= P(Z > z_1) \\
0.1 &= 0.5 - P(0 \le Z \le z_1) \\
P(0 \le Z \le z_1) &= 0.4 \\
\therefore\ z_1 &= 1.28 \text{ (approx.)} \qquad [\text{From normal table}] \\
\frac{x_1 - 662}{32} &= 1.28 \\
x_1 &= 702.96
\end{aligned}
\]
Hence, the best 100 plots have yields over 702.96 kg.

Example 11
Assume that the mean height of Indian soldiers is 68.22 inches with a
variance of 10.8 square inches. How many soldiers in a regiment of 1000 would
you expect to be over 6 feet tall?
Solution
Let X be the continuous random variable which denotes the heights of Indian
soldiers.
\[
\mu = 68.22, \quad \sigma^{2} = 10.8, \quad N = 1000, \quad \sigma = \sqrt{10.8} = 3.29, \quad Z = \frac{X-\mu}{\sigma}
\]
When X = 6 feet = 72 inches, $Z = \frac{72-68.22}{3.29} = 1.15$
\[
\begin{aligned}
P(X > 72) &= P(Z > 1.15) \qquad (\text{Fig. 5.33}) \\
&= 0.5 - P(0 \le Z \le 1.15) \\
&= 0.5 - 0.3749 \\
&= 0.1251
\end{aligned}
\]
Expected number of Indian soldiers having heights over 6 feet (72 inches)
= N P(X > 72) = 1000(0.1251) = 125.1 ≈ 125

Example 12
The marks obtained by students in a college are normally distributed
with a mean of 65 and a variance of 25. If 3 students are selected at
random from this college, what is the probability that at least one of
them would have scored more than 75 marks?
Solution
Let X be the continuous random variable which denotes the marks of a student.
\[
\mu = 65, \quad \sigma^{2} = 25, \quad \sigma = 5, \quad Z = \frac{X-\mu}{\sigma}
\]
When X = 75, $Z = \frac{75-65}{5} = 2$
\[
\begin{aligned}
P(X > 75) &= P(Z > 2) \qquad (\text{Fig. 5.34}) \\
&= 0.5 - P(0 \le Z \le 2) \\
&= 0.5 - 0.4772 \\
&= 0.0228
\end{aligned}
\]
If p is the probability of scoring more than 75 marks,
p = 0.0228, q = 1 – p = 1 – 0.0228 = 0.9772
P(at least one student would have scored more than 75 marks)
\[
\begin{aligned}
&= \sum_{x=1}^{3} {}^{3}C_{x}\, p^{x} q^{3-x} \\
&= \sum_{x=1}^{3} {}^{3}C_{x}\, (0.0228)^{x} (0.9772)^{3-x} \\
&= 0.0668
\end{aligned}
\]
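
The binomial sum in this example can be checked with a few lines of code. The sketch below is illustrative only: the function name `at_least_one` is hypothetical, and it relies on `math.comb`, which is available in Python 3.8 and later. It also compares the sum with the shorter complement form 1 − qⁿ, which gives the same probability.

```python
from math import comb

def at_least_one(p, n):
    """P(at least one success in n independent trials), summed term by term."""
    q = 1.0 - p
    return sum(comb(n, x) * p**x * q**(n - x) for x in range(1, n + 1))

p = 0.0228                      # P(X > 75) taken from the normal table
prob_sum = at_least_one(p, 3)
prob_complement = 1.0 - (1.0 - p) ** 3   # 1 - P(no student scores above 75)
print(round(prob_sum, 4), round(prob_complement, 4))
```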

Example 13
Find the mean and standard deviation of a normal distribution in which 7% of items are under
35 and 89% are under 63.
Solution
Let μ be the mean and σ be the standard deviation of the normal curve.
\[
\begin{aligned}
P(X < 35) &= 0.07 \\
P(X < 63) &= 0.89 \\
P(X > 63) &= 1 - P(X < 63) = 1 - 0.89 = 0.11
\end{aligned}
\]
\[
Z = \frac{X-\mu}{\sigma}
\]
Since P(X < 35) < 0.5, the corresponding value of Z will be negative.
When X = 35, $Z = \frac{35-\mu}{\sigma} = -z_1$ (say)
Since P(X < 63) > 0.5, the corresponding value of Z will be positive.
When X = 63, $Z = \frac{63-\mu}{\sigma} = z_2$ (say)
From Fig. 5.35,
\[
\begin{aligned}
P(Z < -z_1) &= 0.07 \\
P(Z > z_2) &= 0.11 \\
P(0 < Z < z_1) &= P(-z_1 < Z < 0) = 0.5 - P(Z \le -z_1) = 0.5 - 0.07 = 0.43 \\
z_1 &= 1.48 \qquad [\text{From normal table}] \\
P(0 < Z < z_2) &= 0.5 - P(Z \ge z_2) = 0.5 - 0.11 = 0.39 \\
z_2 &= 1.23 \qquad [\text{From normal table}]
\end{aligned}
\]
Hence,
\[
\frac{35-\mu}{\sigma} = -1.48, \qquad -1.48\,\sigma + \mu = 35 \qquad ...(1)
\]
and
\[
\frac{63-\mu}{\sigma} = 1.23, \qquad 1.23\,\sigma + \mu = 63 \qquad ...(2)
\]
Solving Eqs (1) and (2),
μ = 50.29,   σ = 10.33
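
The two-equation solution above follows a general pattern: two known cumulative probabilities give two values of z, and the pair of linear equations in the mean and standard deviation is then solved. A minimal sketch of that pattern is given below; the function name `fit_from_two_percentiles` is hypothetical and `statistics.NormalDist().inv_cdf` (Python 3.8+) stands in for the table lookup, so the results agree with the table-based values only up to rounding.

```python
from statistics import NormalDist

def fit_from_two_percentiles(x_a, p_a, x_b, p_b):
    """Mean and SD of a normal distribution with P(X < x_a) = p_a and P(X < x_b) = p_b."""
    std_normal = NormalDist()          # mean 0, standard deviation 1
    z_a = std_normal.inv_cdf(p_a)      # negative when p_a < 0.5
    z_b = std_normal.inv_cdf(p_b)      # positive when p_b > 0.5
    # Solve  x_a = mu + z_a * sigma  and  x_b = mu + z_b * sigma.
    sigma = (x_b - x_a) / (z_b - z_a)
    mu = x_a - z_a * sigma
    return mu, sigma

mu, sigma = fit_from_two_percentiles(35, 0.07, 63, 0.89)
print(round(mu, 2), round(sigma, 2))
```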

Example 14
In an examination, it is laid down that a student passes if he secures 40%
or more. He is placed in the first, second, and third division according to
whether he secures 60% or more marks, between 50% and 60% marks,
and between 40% and 50% marks respectively. He gets a distinction in
case he secures 75% or more. It is noticed from the result that 10% of
the students failed in the examination, whereas 5% of them obtained
distinction. Calculate the percentage of students placed in the second
division. (Assume normal distribution of marks.)
Solution
Let X be the random variable which denotes the marks of students in the examination.
Let μ be the mean and σ be the standard deviation of the normal distribution of
marks.
\[
\begin{aligned}
P(X < 40) &= 0.10 \\
P(X \ge 75) &= 0.05 \\
P(X < 75) &= 1 - P(X \ge 75) = 1 - 0.05 = 0.95
\end{aligned}
\]
\[
Z = \frac{X-\mu}{\sigma}
\]
Since P(X < 40) < 0.5, the corresponding value of Z will be negative.
When X = 40, $Z = \frac{40-\mu}{\sigma} = -z_1$ (say)
Since P(X < 75) > 0.5, the corresponding value of Z will be positive.
When X = 75, $Z = \frac{75-\mu}{\sigma} = z_2$ (say)
From Fig. 5.36,
\[
\begin{aligned}
P(Z < -z_1) &= 0.10 \\
P(Z > z_2) &= 0.05 \\
P(0 < Z < z_1) &= P(-z_1 < Z < 0) = 0.5 - P(Z \le -z_1) = 0.5 - 0.10 = 0.40 \\
z_1 &= 1.28 \qquad [\text{From normal table}] \\
P(0 < Z < z_2) &= 0.5 - P(Z \ge z_2) = 0.5 - 0.05 = 0.45 \\
z_2 &= 1.64 \qquad [\text{From normal table}]
\end{aligned}
\]
Hence,
\[
\frac{40-\mu}{\sigma} = -1.28, \qquad \mu - 1.28\,\sigma = 40 \qquad ...(1)
\]
and
\[
\frac{75-\mu}{\sigma} = 1.64, \qquad \mu + 1.64\,\sigma = 75 \qquad ...(2)
\]
Solving Eqs (1) and (2),
μ = 55.34 ≈ 55
σ = 11.98 ≈ 12
Probability that a student is placed in the second division is equal to the probability
that his score lies between 50 and 60.
When X = 50, $Z = \frac{50-55}{12} = -0.42$
When X = 60, $Z = \frac{60-55}{12} = 0.42$
\[
\begin{aligned}
P(50 < X < 60) &= P(-0.42 < Z < 0.42) \\
&= P(-0.42 < Z < 0) + P(0 < Z < 0.42) \\
&= P(0 < Z < 0.42) + P(0 < Z < 0.42) \qquad [\text{By symmetry}] \\
&= 2P(0 < Z < 0.42) \\
&= 2(0.1628) \\
&= 0.3256 \\
&\approx 0.32
\end{aligned}
\]
Hence, the percentage of students placed in the second division = 32%.

5.4.5 Fitting a Normal Distribution


Fitting a normal distribution or a normal curve to the data means finding the equation
of the curve in the form
\[
f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}
\]
which will be as close as possible to the points given. There are two purposes of fitting a normal curve:
(i) To judge whether the normal curve is the best fit to the sample data.
(ii) To use the normal curve to estimate the characteristics of a population.
The area method for fitting a normal curve is given by the following steps:
(i) Find the mean μ and standard deviation σ for the given data if not given.
(ii) Write the class intervals and lower limits X of class intervals in two columns.
(iii) Find $Z = \frac{X-\mu}{\sigma}$ for the lower limit of each class interval.
(iv) Find the area corresponding to each Z from the normal table.
(v) Find the area under the normal curve between the successive values of Z. These
are obtained by subtracting the successive areas when the corresponding Z’s
have the same sign and adding them when the corresponding Z’s have
opposite signs.
(vi) Find the expected frequencies by multiplying the relative frequencies (the areas found in step (v)) by the
number of observations.
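
The steps of the area method can be sketched in code as follows. This is an illustration under assumptions, not the book's procedure verbatim: the names `phi` and `expected_frequencies` are hypothetical, the class boundaries, mean, standard deviation, and total frequency are example inputs, and the areas are computed from `math.erf` with unrounded Z values, so the expected frequencies can differ slightly from those obtained with a four-figure table.

```python
import math

def phi(z):
    """Standard normal distribution function P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_frequencies(boundaries, mu, sigma, n):
    """Expected frequency of each class, given consecutive class boundaries."""
    # Steps (iii)-(iv): standardize every boundary and take the cumulative area.
    cumulative = [phi((b - mu) / sigma) for b in boundaries]
    # Step (v): area in each class = difference of successive cumulative areas.
    # Working with cumulative areas removes the same-sign/opposite-sign cases.
    areas = [hi - lo for lo, hi in zip(cumulative, cumulative[1:])]
    # Step (vi): multiply each class area by the number of observations.
    return [n * a for a in areas]

# Example inputs (assumed): boundaries 10.5, 20.5, ..., 80.5 with mean 43.7, SD 14.8, N = 200.
boundaries = [10.5, 20.5, 30.5, 40.5, 50.5, 60.5, 70.5, 80.5]
freqs = expected_frequencies(boundaries, mu=43.7, sigma=14.8, n=200)
print([round(f, 2) for f in freqs])
```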
Example 1
Fit a normal curve to the following distribution. It is given that the
mean of the distribution is 43.7 and its standard deviation is 14.8.

Class interval   11–20   21–30   31–40   41–50   51–60   61–70   71–80
Frequency        20      28      40      60      32      20      8

Solution
μ = 43.7,   σ = 14.8,   N = Σf = 200
The inclusive series is first converted into an exclusive (continuous) series.

Class interval   Lower class limit X   Z = (X – μ)/σ   Area from 0 to Z   Area in class interval   Expected frequency
10.5–20.5        10.5                  –2.24           0.4875             0.0457                   9.14 ≈ 9
20.5–30.5        20.5                  –1.57           0.4418             0.1285                   25.7 ≈ 26
30.5–40.5        30.5                  –0.89           0.3133             0.2262                   45.24 ≈ 45
40.5–50.5        40.5                  –0.22           0.0871             0.2643                   52.86 ≈ 53
50.5–60.5        50.5                  0.46            0.1772             0.1957                   39.14 ≈ 39
60.5–70.5        60.5                  1.14            0.3729             0.0920                   18.4 ≈ 18
70.5–80.5        70.5                  1.81            0.4649             0.0287                   5.74 ≈ 6
                 80.5                  2.49            0.4936

Example 2
Fit a normal distribution to the following data:

X   125   135   145   155   165   175   185   195   205
Y   1     1     14    22    25    19    13    3     2

It is given that μ = 165.5 and σ = 15.26.

Solution
μ = 165.5,   σ = 15.26,   N = Σf = 100
The data is first converted into class intervals (an exclusive series) with the given values of X as mid-points.

Class interval   Lower class limit X   Z = (X – μ)/σ   Area from 0 to Z   Area in class interval   Expected frequency
120–130          120                   –2.98           0.4986             0.0085                   0.85 ≈ 1
130–140          130                   –2.33           0.4901             0.0376                   3.76 ≈ 4
140–150          140                   –1.67           0.4525             0.1064                   10.64 ≈ 11
150–160          150                   –1.02           0.3461             0.2055                   20.55 ≈ 21
160–170          160                   –0.36           0.1406             0.2547                   25.47 ≈ 25
170–180          170                   0.29            0.1141             0.2148                   21.48 ≈ 21
180–190          180                   0.95            0.3289             0.1174                   11.74 ≈ 12
190–200          190                   1.61            0.4463             0.0418                   4.18 ≈ 4
200–210          200                   2.26            0.4881             0.0101                   1.01 ≈ 1
210–220          210                   2.92            0.4982

Exercise 5.3
1. If X is normally distributed with a mean of 4 and a standard deviation of 4,
find (i) P(5 ≤ X ≤ 10), (ii) P(X ≥ 15), (iii) P(10 ≤ X ≤ 15), and (iv) P(X ≤ 5).
[Ans.: (i) 0.3345 (ii) 0.003 (iii) 0.0638 (iv) 0.4013]

2. A normal distribution has a mean of 5 and a standard deviation of 3.
What is the probability that the deviation from the mean of an item
taken at random will be negative?
[Ans.: 0.0575]

3. If X is a normal variate with a mean of 30 and an SD of 6, find the value
of X = x1 such that P(X ≥ x1) = 0.05.
[Ans.: 39.84]

4. If X is a normal variate with a mean of 25 and an SD of 5, find the value of
X = x1 such that P(X ≤ x1) = 0.01.
[Ans.: 11.02]

5. The weights of 4000 students are found to be normally distributed with
a mean of 50 kg and an SD of 5 kg. Find the probability that a student
selected at random will have weight (i) less than 45 kg, and (ii) between
45 and 60 kg.
[Ans.: (i) 0.1587 (ii) 0.8185]

6. The daily sales of a firm are normally distributed with a mean of Rs 8000
and a variance of Rs 10000. (i) What is the probability that on a certain
day the sales will be less than Rs 8210? (ii) What is the percentage of
days on which the sales will be between Rs 8100 and Rs 8200?
[Ans.: (i) 0.482 (ii) 14%]

7. The mean height of Indian soldiers is 68.22 inches with a variance of 10.8 square inches.
Find the expected number of soldiers in a regiment of 1000 whose
height will be more than 6 feet.
[Ans.: 125]

8. The life of army shoes is normally distributed with a mean of 8 months
and a standard deviation of 2 months. If 5000 pairs are issued, how
many pairs would be expected to need replacement after 12 months?
[Ans.: 2386]

9. In an intelligence test administered to 1000 students, the average was
42 and the standard deviation was 24. Find (i) the number of students
exceeding 50, (ii) the number of students between 30 and 54, and (iii) the least score of the top
100 students.
[Ans.: (i) 129 (ii) 383 (iii) 72.72]

10. In a test of 2000 electric bulbs, it was found that the life of a
particular make was normally distributed with an average life
of 2040 hours and a standard deviation of 60 hours. Estimate the
number of bulbs likely to burn for (i) more than 2150 hours, and
(ii) less than 1950 hours.
[Ans.: (i) 67 (ii) 184]

11. The marks of 1000 students of a university are found to be normally
distributed with a mean of 70 and a standard deviation of 5. Estimate
the number of students whose marks will be (i) between 60 and 75,
(ii) more than 75, and (iii) less than 68.
[Ans.: (i) 910 (ii) 23 (iii) 37]

12. In a normal distribution, 31% items are under 45 and 8% are over 64.
Find the mean and standard deviation. Find also the percentage of
items lying between 30 and 75.
[Ans.: 50, 10, 0.957]

13. Of a large group of men, 5% are under 60 inches in height and 40% are
between 60 and 65 inches. Assuming a normal distribution, find the
mean and standard deviation of the distribution.
[Ans.: 65.42, 3.27]

14. The marks obtained by students in an examination follow a normal
distribution. If 30% of the students got marks below 35 and 10% got
marks above 60, find the mean, the standard deviation, and the percentage of students who got
marks between 40 and 50.
[Ans.: 42.23, 13.88, 28%]

15. Fit a normal distribution to the following data:

Class       60–65   65–70   70–75   75–80   80–85   85–90   90–95   95–100
Frequency   3       21      150     335     326     135     26      4

[Ans.: Expected frequency: 3, 31, 148, 322, 319, 144, 30, 3]


5.5 Exponential Distribution

A continuous random variable X is said to follow the exponential distribution if its
probability density function is given by
\[
f(x) =
\begin{cases}
\lambda e^{-\lambda x}, & x > 0 \\
0, & x \le 0
\end{cases}
\]
where λ > 0 is called the rate of the distribution.

5.5.1 Memoryless Property of the Exponential Distribution


The exponential distribution has the memoryless (forgetfulness) property. This property
indicates that the distribution is independent of its past, which means that the future happening
of an event has no relation to whether or not this event has happened in the past. This
property is as follows:
If X is exponentially distributed, and s, t are two positive real numbers then
\[
P(X > s + t \mid X > s) = P(X > t)
\]
Proof:
\[
\begin{aligned}
P(X > s + t \mid X > s) &= \frac{P[(X > s + t) \cap (X > s)]}{P(X > s)} \qquad [\text{using conditional probability}] \\
&= \frac{P(X > s + t)}{P(X > s)} \\
&= \frac{\int_{s+t}^{\infty} \lambda e^{-\lambda x}\,dx}{\int_{s}^{\infty} \lambda e^{-\lambda x}\,dx} \\
&= \frac{\left|\lambda \frac{e^{-\lambda x}}{-\lambda}\right|_{s+t}^{\infty}}{\left|\lambda \frac{e^{-\lambda x}}{-\lambda}\right|_{s}^{\infty}} \\
&= \frac{e^{-\lambda(s+t)}}{e^{-\lambda s}} \\
&= e^{-\lambda t} \qquad ...(5.5)
\end{aligned}
\]
\[
\begin{aligned}
P(X > t) &= \int_{t}^{\infty} \lambda e^{-\lambda x}\,dx \\
&= \left|\lambda \frac{e^{-\lambda x}}{-\lambda}\right|_{t}^{\infty} \\
&= e^{-\lambda t} \qquad ...(5.6)
\end{aligned}
\]
From Eq. (5.5) and Eq. (5.6),
\[
P(X > s + t \mid X > s) = P(X > t), \qquad \text{for } s, t > 0
\]
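
The memoryless property can be checked numerically for particular values. The sketch below is an illustration with assumed numbers (λ = 0.5, s = 3, t = 2); the function names `survival` and `conditional_survival` are hypothetical, and the survival probability P(X > t) = e^(−λt) is the closed form obtained in Eq. (5.6).

```python
import math

def survival(t, lam):
    """P(X > t) for an exponential random variable with rate lam."""
    return math.exp(-lam * t)

def conditional_survival(s, t, lam):
    """P(X > s + t given X > s), computed as a ratio of survival probabilities."""
    return survival(s + t, lam) / survival(s, lam)

lam, s, t = 0.5, 3.0, 2.0        # illustrative values (assumed)
lhs = conditional_survival(s, t, lam)
rhs = survival(t, lam)
print(lhs, rhs)                  # the two values should agree up to rounding error
```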

5.5.2 Constants of the Exponential Distribution


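1. Mean of the Exponential Distribution
\[
\begin{aligned}
E(X) &= \int_{-\infty}^{\infty} x f(x)\,dx \\
&= \int_{0}^{\infty} x \lambda e^{-\lambda x}\,dx \\
&= \lambda \left| x \cdot \frac{e^{-\lambda x}}{-\lambda} - 1 \cdot \frac{e^{-\lambda x}}{\lambda^{2}} \right|_{0}^{\infty} \\
&= \lambda \cdot \frac{1}{\lambda^{2}} \\
&= \frac{1}{\lambda}
\end{aligned}
\]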

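2. Variance of the Exponential Distribution
\[
\mathrm{Var}(X) = E(X^{2}) - [E(X)]^{2} \qquad ...(5.7)
\]
\[
\begin{aligned}
E(X^{2}) &= \int_{-\infty}^{\infty} x^{2} f(x)\,dx \\
&= \int_{0}^{\infty} x^{2} \lambda e^{-\lambda x}\,dx \\
&= \lambda \left| x^{2}\,\frac{e^{-\lambda x}}{-\lambda} - 2x\,\frac{e^{-\lambda x}}{\lambda^{2}} + 2\,\frac{e^{-\lambda x}}{-\lambda^{3}} \right|_{0}^{\infty} \\
&= \lambda \left(\frac{2}{\lambda^{3}}\right) \\
&= \frac{2}{\lambda^{2}}
\end{aligned}
\]
Substituting in Eq. (5.7),
\[
\mathrm{Var}(X) = \frac{2}{\lambda^{2}} - \frac{1}{\lambda^{2}} = \frac{1}{\lambda^{2}} \qquad \left[\because \mu = \frac{1}{\lambda}\right]
\]
3. Standard Deviation of the Exponential Distribution
\[
\mathrm{SD} = \sqrt{\mathrm{Var}(X)} = \sqrt{\frac{1}{\lambda^{2}}} = \frac{1}{\lambda}
\]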

4. Mode of the Exponential Distribution

Mode is the value of x for which f(x) is maximum.
\[
f(x) =
\begin{cases}
\lambda e^{-\lambda x}, & x > 0 \\
0, & x \le 0
\end{cases}
\]
f(x) will be maximum when $e^{-\lambda x}$ is maximum.
Maximum value of $e^{-\lambda x}$ = 1, which is at x = 0.
Hence, x = 0 is the mode of the exponential distribution.

5. Median of the Exponential Distribution

If M is the median of the exponential distribution,
\[
\begin{aligned}
\int_{-\infty}^{M} f(x)\,dx &= \frac{1}{2} \\
\int_{0}^{M} \lambda e^{-\lambda x}\,dx &= \frac{1}{2} \\
\left| \lambda \frac{e^{-\lambda x}}{-\lambda} \right|_{0}^{M} &= \frac{1}{2} \\
-(e^{-\lambda M} - 1) &= \frac{1}{2} \\
-e^{-\lambda M} &= \frac{1}{2} - 1 = -\frac{1}{2} \\
e^{-\lambda M} &= \frac{1}{2} \\
-\lambda M \log e &= \log \frac{1}{2} = -\log 2 \\
\lambda M &= \log 2 \\
M &= \frac{1}{\lambda} \log 2
\end{aligned}
\]
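
The constants derived above can be cross-checked by numerical integration. The sketch below is only an illustration: the rate λ = 0.2 is an assumed value, the helper names `pdf` and `integrate` are hypothetical, a simple midpoint rule replaces the integration by parts, and the infinite upper limit is truncated at 50/λ, where the remaining tail is negligible.

```python
import math

def pdf(x, lam):
    """Exponential density lam * exp(-lam * x) for x > 0."""
    return lam * math.exp(-lam * x)

def integrate(g, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

lam = 0.2                        # illustrative rate (assumed)
upper = 50.0 / lam               # truncation point standing in for infinity

mean = integrate(lambda x: x * pdf(x, lam), 0.0, upper)
second_moment = integrate(lambda x: x * x * pdf(x, lam), 0.0, upper)
variance = second_moment - mean ** 2
median = math.log(2.0) / lam
area_to_median = integrate(lambda x: pdf(x, lam), 0.0, median)

print(round(mean, 3), round(variance, 3), round(area_to_median, 3))
```

With λ = 0.2 the printed values should be close to 1/λ = 5, 1/λ² = 25, and 0.5 respectively.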

Example 1
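Let X be a random variable with pdf
\[
f(x) =
\begin{cases}
\frac{1}{5} e^{-\frac{x}{5}}, & x > 0 \\
0, & \text{otherwise}
\end{cases}
\]
Find (i) P(X > 5) (ii) P(3 ≤ X ≤ 6) (iii) mean (iv) variance.
Solution
\[
\lambda = \frac{1}{5}
\]
(i)
\[
\begin{aligned}
P(X > 5) &= \int_{5}^{\infty} f(x)\,dx \\
&= \int_{5}^{\infty} \frac{1}{5} e^{-\frac{x}{5}}\,dx \\
&= \frac{1}{5} \left| \frac{e^{-\frac{x}{5}}}{-\frac{1}{5}} \right|_{5}^{\infty} \\
&= -\left| e^{-\frac{x}{5}} \right|_{5}^{\infty} \\
&= -(e^{-\infty} - e^{-1}) \\
&= e^{-1} \\
&= 0.3679
\end{aligned}
\]
(ii)
\[
\begin{aligned}
P(3 \le X \le 6) &= \int_{3}^{6} f(x)\,dx \\
&= \int_{3}^{6} \frac{1}{5} e^{-\frac{x}{5}}\,dx \\
&= \frac{1}{5} \left| \frac{e^{-\frac{x}{5}}}{-\frac{1}{5}} \right|_{3}^{6} \\
&= -\left| e^{-\frac{x}{5}} \right|_{3}^{6} \\
&= -\left( e^{-\frac{6}{5}} - e^{-\frac{3}{5}} \right) \\
&= e^{-\frac{3}{5}} - e^{-\frac{6}{5}} \\
&= 0.2476
\end{aligned}
\]
(iii) Mean $\mu = \frac{1}{\lambda} = \frac{1}{\left(\frac{1}{5}\right)} = 5$
(iv) Variance $= \mathrm{Var}(X) = \frac{1}{\lambda^{2}} = \frac{1}{\left(\frac{1}{5}\right)^{2}} = 25$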

Example 2
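A random variable has pdf $f(x) = ce^{-2x}$ for x > 0. Find (i) P(X > 2)
(ii) $P\left(X < \frac{1}{c}\right)$.

Solution
Since f(x) is a probability density function,
\[
\begin{aligned}
\int_{-\infty}^{\infty} f(x)\,dx &= 1 \\
\int_{0}^{\infty} ce^{-2x}\,dx &= 1 \\
\left| \frac{ce^{-2x}}{-2} \right|_{0}^{\infty} &= 1 \\
-\frac{c}{2} \left| e^{-2x} \right|_{0}^{\infty} &= 1 \\
-\frac{c}{2} (e^{-\infty} - e^{0}) &= 1 \\
\frac{c}{2} &= 1 \\
c &= 2
\end{aligned}
\]
\[
\therefore\ f(x) = 2e^{-2x}, \qquad x > 0
\]
(i)
\[
\begin{aligned}
P(X > 2) &= \int_{2}^{\infty} f(x)\,dx \\
&= \int_{2}^{\infty} 2e^{-2x}\,dx \\
&= 2 \left| \frac{e^{-2x}}{-2} \right|_{2}^{\infty} \\
&= -\left| e^{-2x} \right|_{2}^{\infty} \\
&= -(e^{-\infty} - e^{-4}) \\
&= e^{-4} \\
&= 0.0183
\end{aligned}
\]
(ii)
\[
\begin{aligned}
P\left(X < \frac{1}{c}\right) &= P\left(X < \frac{1}{2}\right) \\
&= \int_{0}^{\frac{1}{2}} f(x)\,dx \\
&= \int_{0}^{\frac{1}{2}} 2e^{-2x}\,dx \\
&= 2 \left| \frac{e^{-2x}}{-2} \right|_{0}^{\frac{1}{2}} \\
&= -\left| e^{-2x} \right|_{0}^{\frac{1}{2}} \\
&= -(e^{-1} - e^{0}) \\
&= -e^{-1} + 1 \\
&= 0.6321
\end{aligned}
\]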

Example 3
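If X is a random variable which follows an exponential distribution with
parameter λ with P(X ≤ 1) = P(X > 1), find Var(X).
Solution
Since X is a random variable which follows an exponential distribution,
\[
f(x) = \lambda e^{-\lambda x}, \qquad x \ge 0
\]
\[
\begin{aligned}
P(X \le 1) &= P(X > 1) \\
1 - P(X > 1) &= P(X > 1) \\
2P(X > 1) &= 1 \\
P(X > 1) &= \frac{1}{2} \\
\int_{1}^{\infty} f(x)\,dx &= \frac{1}{2} \\
\int_{1}^{\infty} \lambda e^{-\lambda x}\,dx &= \frac{1}{2} \\
\left| \lambda \frac{e^{-\lambda x}}{-\lambda} \right|_{1}^{\infty} &= \frac{1}{2} \\
-\left| e^{-\lambda x} \right|_{1}^{\infty} &= \frac{1}{2} \\
-(e^{-\infty} - e^{-\lambda}) &= \frac{1}{2} \\
e^{-\lambda} &= \frac{1}{2} \\
\frac{1}{e^{\lambda}} &= \frac{1}{2} \\
e^{\lambda} &= 2 \\
\lambda &= \log_{e} 2
\end{aligned}
\]
\[
\mathrm{Var}(X) = \frac{1}{\lambda^{2}} = \frac{1}{(\log_{e} 2)^{2}}
\]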

Example 4
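If X is an exponentially distributed random variable with parameter λ,
find the value of k such that $\frac{P(X > k)}{P(X \le k)} = a$.

Solution
\[
\begin{aligned}
\frac{P(X > k)}{P(X \le k)} &= a \\
\frac{P(X > k)}{1 - P(X > k)} &= a \\
P(X > k) &= a\,[1 - P(X > k)] \\
P(X > k)(1 + a) &= a \\
P(X > k) &= \frac{a}{1 + a} \\
\int_{k}^{\infty} f(x)\,dx &= \frac{a}{1 + a} \\
\int_{k}^{\infty} \lambda e^{-\lambda x}\,dx &= \frac{a}{1 + a} \\
\left| \lambda \frac{e^{-\lambda x}}{-\lambda} \right|_{k}^{\infty} &= \frac{a}{1 + a} \\
-\left| e^{-\lambda x} \right|_{k}^{\infty} &= \frac{a}{1 + a} \\
-(e^{-\infty} - e^{-\lambda k}) &= \frac{a}{1 + a} \\
e^{-\lambda k} &= \frac{a}{1 + a} \\
\frac{1}{e^{\lambda k}} &= \frac{a}{1 + a} \\
e^{\lambda k} &= \frac{1 + a}{a} \\
\lambda k &= \log\left(\frac{1 + a}{a}\right) \\
k &= \frac{1}{\lambda} \log\left(\frac{1 + a}{a}\right)
\end{aligned}
\]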

Example 5
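If the density function of a continuous random variable X is
$f(x) = ce^{-b(x-a)}$, a ≤ x, where a, b, c are constants, show that $b = c = \frac{1}{\sigma}$
and a = μ – σ, where μ = E(X) and σ² = Var(X).

Solution
Since f(x) is a density function,
\[
\begin{aligned}
\int_{-\infty}^{\infty} f(x)\,dx &= 1 \\
\int_{a}^{\infty} ce^{-b(x-a)}\,dx &= 1 \\
c \left| \frac{e^{-b(x-a)}}{-b} \right|_{a}^{\infty} &= 1 \\
-\frac{c}{b} \left| e^{-b(x-a)} \right|_{a}^{\infty} &= 1 \\
-\frac{c}{b} (e^{-\infty} - e^{0}) &= 1 \\
\frac{c}{b} &= 1 \\
b &= c \qquad ...(1)
\end{aligned}
\]
\[
\begin{aligned}
\mu = E(X) &= \int_{a}^{\infty} bx e^{-b(x-a)}\,dx \\
&= be^{ab} \left| x \left(\frac{e^{-bx}}{-b}\right) - \frac{e^{-bx}}{b^{2}} \right|_{a}^{\infty} \\
&= be^{ab} \left( \frac{a}{b} e^{-ab} + \frac{1}{b^{2}} e^{-ab} \right) \\
&= a + \frac{1}{b} \qquad ...(2)
\end{aligned}
\]
\[
\begin{aligned}
E(X^{2}) &= \int_{a}^{\infty} bx^{2} e^{-b(x-a)}\,dx \\
&= be^{ab} \left| x^{2} \left(\frac{e^{-bx}}{-b}\right) - 2x \left(\frac{e^{-bx}}{b^{2}}\right) + 2 \left(\frac{e^{-bx}}{-b^{3}}\right) \right|_{a}^{\infty} \\
&= b \left( \frac{a^{2}}{b} + \frac{2a}{b^{2}} + \frac{2}{b^{3}} \right) \\
&= \frac{1}{b^{2}} (a^{2}b^{2} + 2ab + 2)
\end{aligned}
\]
\[
\begin{aligned}
\mathrm{Var}(X) &= E(X^{2}) - [E(X)]^{2} \\
\sigma^{2} &= \frac{1}{b^{2}} (a^{2}b^{2} + 2ab + 2) - \left( a^{2} + \frac{2a}{b} + \frac{1}{b^{2}} \right) \\
&= \frac{1}{b^{2}} \\
\sigma &= \frac{1}{b} \qquad ...(3)
\end{aligned}
\]
From Eqs (1) and (3),
\[
b = c = \frac{1}{\sigma}
\]
Subtracting Eq. (3) from Eq. (2),
\[
\mu - \sigma = a
\]
\[
\therefore\ a = \mu - \sigma
\]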

Example 6
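The mileage which car owners get with a certain kind of radial tire
is a random variable having an exponential distribution with mean
4000 km. Find the probabilities that one of these tires will last (i) at least
2000 km (ii) at most 3000 km.
Solution
Let X be the random variable which denotes the mileage obtained with the tire.
\[
\text{Mean } \mu = \frac{1}{\lambda} = 4000 \text{ km}
\]
\[
f(x) = \lambda e^{-\lambda x} = \frac{1}{4000} e^{-\frac{1}{4000}x}, \qquad x > 0
\]
(i)
\[
\begin{aligned}
P(X \ge 2000) &= \int_{2000}^{\infty} f(x)\,dx \\
&= \int_{2000}^{\infty} \frac{1}{4000} e^{-\frac{1}{4000}x}\,dx \\
&= \frac{1}{4000} \left| \frac{e^{-\frac{1}{4000}x}}{-\frac{1}{4000}} \right|_{2000}^{\infty} \\
&= -\left| e^{-\frac{1}{4000}x} \right|_{2000}^{\infty} \\
&= -(e^{-\infty} - e^{-0.5}) \\
&= e^{-0.5} \\
&= 0.6065
\end{aligned}
\]
(ii)
\[
\begin{aligned}
P(X \le 3000) &= \int_{0}^{3000} f(x)\,dx \\
&= \int_{0}^{3000} \frac{1}{4000} e^{-\frac{1}{4000}x}\,dx \\
&= \frac{1}{4000} \left| \frac{e^{-\frac{1}{4000}x}}{-\frac{1}{4000}} \right|_{0}^{3000} \\
&= -\left| e^{-\frac{1}{4000}x} \right|_{0}^{3000} \\
&= -(e^{-0.75} - e^{0}) \\
&= -e^{-0.75} + 1 \\
&= 0.5276
\end{aligned}
\]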
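
Probabilities of this kind reduce to evaluating e^(−λx), so they are easy to check numerically. The sketch below is a minimal illustration with hypothetical function names (`prob_at_least`, `prob_at_most`), using the closed forms P(X ≥ x) = e^(−λx) and P(X ≤ x) = 1 − e^(−λx) for an exponential variable with mean 1/λ.

```python
import math

def prob_at_least(x, mean):
    """P(X >= x) for an exponential random variable with the given mean."""
    lam = 1.0 / mean
    return math.exp(-lam * x)

def prob_at_most(x, mean):
    """P(X <= x) for an exponential random variable with the given mean."""
    return 1.0 - prob_at_least(x, mean)

p_i = prob_at_least(2000, mean=4000)    # tire lasts at least 2000 km
p_ii = prob_at_most(3000, mean=4000)    # tire lasts at most 3000 km
print(round(p_i, 4), round(p_ii, 4))
```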

Example 7
If the number of kilometers that a car can run before its battery wears
out is exponentially distributed with an average value of 10000 km and
if the owner desires to take a 5000 km trip, what is the probability that
he will be able to complete his trip without having to replace the car
battery? Assume that the car has been used for some time.
Solution
Let X be the random variable which denotes the number of kilometers that a car can
run before its battery wears out.
\[
\text{Mean } \mu = \frac{1}{\lambda} = 10000
\]
\[
f(x) = \lambda e^{-\lambda x} = \frac{1}{10000} e^{-\frac{1}{10000}x}, \qquad x > 0
\]
\[
\begin{aligned}
P(X > 5000) &= \int_{5000}^{\infty} f(x)\,dx \\
&= \int_{5000}^{\infty} \frac{1}{10000} e^{-\frac{1}{10000}x}\,dx \\
&= \frac{1}{10000} \left| \frac{e^{-\frac{1}{10000}x}}{-\frac{1}{10000}} \right|_{5000}^{\infty} \\
&= -\left| e^{-\frac{1}{10000}x} \right|_{5000}^{\infty} \\
&= -(e^{-\infty} - e^{-0.5}) \\
&= e^{-0.5} \\
&= 0.6065
\end{aligned}
\]

Example 8
The average time it takes to serve a customer at a petrol pump is 6 min-
utes. The service time follows exponential distribution. Calculate the
probability that
(i) A customer will take less than 2 minutes to complete the service.
(ii) A customer will take between 4 and 5 minutes to get the service.
(iii) A customer will take more than 10 minutes for his service.
Solution
Let X be the random variable which denotes the service time.

$$\text{Mean } \mu = \frac{1}{\lambda} = 6$$
$$f(x) = \lambda e^{-\lambda x} = \frac{1}{6}\, e^{-\frac{x}{6}}, \quad x > 0$$

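(i)
$$\begin{aligned}
P(X < 2) &= \int_{0}^{2} f(x)\,dx \\
&= \int_{0}^{2} \frac{1}{6}\, e^{-\frac{x}{6}}\,dx \\
&= \frac{1}{6}\left[\frac{e^{-\frac{x}{6}}}{-\frac{1}{6}}\right]_{0}^{2} \\
&= -\left[e^{-\frac{x}{6}}\right]_{0}^{2} \\
&= -\left(e^{-\frac{1}{3}} - e^{0}\right) \\
&= 1 - e^{-\frac{1}{3}} \\
&= 0.2835
\end{aligned}$$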

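(ii)
$$\begin{aligned}
P(4 < X < 5) &= \int_{4}^{5} f(x)\,dx \\
&= \int_{4}^{5} \frac{1}{6}\, e^{-\frac{x}{6}}\,dx \\
&= \frac{1}{6}\left[\frac{e^{-\frac{x}{6}}}{-\frac{1}{6}}\right]_{4}^{5} \\
&= -\left[e^{-\frac{x}{6}}\right]_{4}^{5} \\
&= -\left(e^{-\frac{5}{6}} - e^{-\frac{2}{3}}\right) \\
&= 0.0788
\end{aligned}$$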

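(iii)
$$\begin{aligned}
P(X > 10) &= \int_{10}^{\infty} f(x)\,dx \\
&= \int_{10}^{\infty} \frac{1}{6}\, e^{-\frac{x}{6}}\,dx \\
&= \frac{1}{6}\left[\frac{e^{-\frac{x}{6}}}{-\frac{1}{6}}\right]_{10}^{\infty} \\
&= -\left[e^{-\frac{x}{6}}\right]_{10}^{\infty} \\
&= -\left(e^{-\infty} - e^{-\frac{10}{6}}\right) \\
&= e^{-\frac{10}{6}} \\
&= 0.1889
\end{aligned}$$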

Example 9
The length of time X to complete a job is exponentially distributed with
$E(X) = \mu = \frac{1}{\lambda} = 10$ hours. (i) Compute the probability of job
completion between two consecutive jobs exceeding 20 hours. (ii) The cost of
job completion is given by $C = 4 + 2X + 2X^2$. Find the expected value
of C.
Solution
Let X be a random variable which denotes the length of time to complete a job.
$$E(X) = \mu = \frac{1}{\lambda} = 10$$
$$f(x) = \lambda e^{-\lambda x} = \frac{1}{10}\, e^{-\frac{x}{10}}$$

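(i)
$$\begin{aligned}
P(X > 20) &= \int_{20}^{\infty} f(x)\,dx \\
&= \int_{20}^{\infty} \frac{1}{10}\, e^{-\frac{x}{10}}\,dx \\
&= \frac{1}{10}\left[\frac{e^{-\frac{x}{10}}}{-\frac{1}{10}}\right]_{20}^{\infty} \\
&= -\left[e^{-\frac{x}{10}}\right]_{20}^{\infty} \\
&= -\left(e^{-\infty} - e^{-2}\right) \\
&= e^{-2} \\
&= 0.1353
\end{aligned}$$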
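
(ii) For an exponential random variable,
$$E(X) = \mu = \frac{1}{\lambda} = 10$$
$$\operatorname{Var}(X) = \frac{1}{\lambda^2}$$
$$\operatorname{Var}(X) = E(X^2) - \mu^2$$
$$\begin{aligned}
E(X^2) &= \operatorname{Var}(X) + \mu^2 \\
&= \frac{1}{\lambda^2} + \frac{1}{\lambda^2} \\
&= \frac{2}{\lambda^2} \\
&= 200
\end{aligned}$$
$$\begin{aligned}
E(C) &= E(4 + 2X + 2X^2) \\
&= E(4) + 2E(X) + 2E(X^2) \\
&= 4 + 2(10) + 2(200) \\
&= 424
\end{aligned}$$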
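
The step $E(X^2) = 2/\lambda^2$ can be checked numerically. The sketch below is an illustration only: it assumes a simple midpoint-rule integration over a truncated range (the helper `midpoint_second_moment` and the cut-off of 600 hours are choices made here, not part of the text) and then applies the linearity of expectation.

```python
import math

mean = 10.0                      # E(X) in hours
rate = 1.0 / mean                # lambda

# Closed form used in the solution: E(X^2) = Var(X) + mu^2 = 2 / lambda^2
second_moment = 1.0 / rate**2 + mean**2
expected_cost = 4 + 2 * mean + 2 * second_moment   # E(4 + 2X + 2X^2)

def midpoint_second_moment(rate: float, upper: float, steps: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of x^2 * rate * e^(-rate x) on (0, upper)."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += x * x * rate * math.exp(-rate * x)
    return total * h

numeric_second_moment = midpoint_second_moment(rate, upper=600.0)
print(round(second_moment, 2), round(numeric_second_moment, 2), round(expected_cost, 2))
```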

Example 10
The time (in hours) required to repair a machine is exponentially
distributed with parameter $\lambda = \frac{1}{2}$.
(i) What is the probability that the repair time exceeds 2 hours?
(ii) What is the conditional probability that a repair takes at least
11 hours given that its duration exceeds 8 hours?
Solution
Let X be the random variable which denotes the time to repair the machine.
$$\lambda = \frac{1}{2}$$
$$f(x) = \lambda e^{-\lambda x} = \frac{1}{2}\, e^{-\frac{x}{2}}, \quad x > 0$$
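
(i)
$$\begin{aligned}
P(X > 2) &= \int_{2}^{\infty} f(x)\,dx \\
&= \int_{2}^{\infty} \frac{1}{2}\, e^{-\frac{x}{2}}\,dx \\
&= \frac{1}{2}\left[\frac{e^{-\frac{x}{2}}}{-\frac{1}{2}}\right]_{2}^{\infty} \\
&= -\left[e^{-\frac{x}{2}}\right]_{2}^{\infty} \\
&= -\left(e^{-\infty} - e^{-1}\right) \\
&= e^{-1} \\
&= 0.3679
\end{aligned}$$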

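(ii) By the memoryless property, $P(X \ge s + t \mid X > s) = P(X > t)$. With $s = 8$ and $t = 3$,
$$\begin{aligned}
P(X \ge 11 \mid X > 8) &= P(X > 3) \\
&= \int_{3}^{\infty} f(x)\,dx \\
&= \int_{3}^{\infty} \frac{1}{2}\, e^{-\frac{x}{2}}\,dx \\
&= \frac{1}{2}\left[\frac{e^{-\frac{x}{2}}}{-\frac{1}{2}}\right]_{3}^{\infty} \\
&= -\left[e^{-\frac{x}{2}}\right]_{3}^{\infty} \\
&= -\left(e^{-\infty} - e^{-1.5}\right) \\
&= e^{-1.5} \\
&= 0.2231
\end{aligned}$$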
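
The memoryless property used in part (ii) can be confirmed directly from the definition of conditional probability. This is a small sketch under the assumption $\lambda = 1/2$ from the example; the helper name `exp_tail` is illustrative.

```python
import math

def exp_tail(x: float, rate: float) -> float:
    """P(X > x) = e^(-rate x) for an exponential variable, x >= 0."""
    return math.exp(-rate * x)

rate = 0.5
# Conditional probability from the definition: P(X >= 11 and X > 8) / P(X > 8)
conditional = exp_tail(11, rate) / exp_tail(8, rate)
# Memoryless shortcut: P(X > 11 - 8)
shortcut = exp_tail(3, rate)
print(round(conditional, 4), round(shortcut, 4))
```

Both expressions evaluate to $e^{-1.5}$, so the time already elapsed does not affect the remaining repair time.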

Example 11
The daily consumption of milk in excess of 20000 gallons is approximately
exponentially distributed with $\lambda = \frac{1}{3000}$. The city has a daily
stock of 35000 gallons. What is the probability that of 2 days selected at
random, the stock is insufficient for both the days?
Solution
Let Y be a random variable which denotes the daily consumption of milk.
The random variable X = Y – 20000 has an exponential distribution.
$$\lambda = \frac{1}{3000}$$
$$f(x) = \lambda e^{-\lambda x} = \frac{1}{3000}\, e^{-\frac{x}{3000}}, \quad x > 0$$

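Probability that the stock is insufficient on a particular day
$$\begin{aligned}
P(Y > 35000) &= P(X > 15000) \\
&= \int_{15000}^{\infty} f(x)\,dx \\
&= \int_{15000}^{\infty} \frac{1}{3000}\, e^{-\frac{x}{3000}}\,dx \\
&= \frac{1}{3000}\left[\frac{e^{-\frac{x}{3000}}}{-\frac{1}{3000}}\right]_{15000}^{\infty} \\
&= -\left[e^{-\frac{x}{3000}}\right]_{15000}^{\infty} \\
&= -\left(e^{-\infty} - e^{-5}\right) \\
&= e^{-5} \\
&= 0.0067
\end{aligned}$$
Taking the consumption on the 2 randomly selected days to be independent, the probability that the stock is insufficient for both days
$$= \left(e^{-5}\right)^2 = e^{-10} \approx 0.0000454$$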

Exercise 5.4
1. If X is exponentially distributed, prove that probability that X exceeds
its expected value is less than 0.5.
2. The amount of time that a watch will run without having to be reset
is a random variable having an exponential distribution with mean 120
days. Find the probability that such a watch will
(a) have to be set in less than 24 days.
(b) not have to be reset in at least 180 days.
 [Ans.: (a) 0.1813, (b) 0.2231]
3. The length of the shower on a tropical island during rainy season has
an exponential distribution with parameter 2, time being measured
in minutes. What is the probability that a shower will last more than
3 minutes? If a shower has already lasted for 2 minutes, what is the
probability that it will last for at least one more minute?
 [Ans.: (a) 0.0025, (b) 0.1353]
4. If X is exponentially distributed with parameter $\lambda$, find the value of k
such that $P(X > k)/P(X \le k) = a$.
$$\left[\text{Ans.: } \frac{1}{\lambda}\log\left(1 + \frac{1}{a}\right)\right]$$

5. The life length X of an electronic component follows an exponential
distribution. There are 2 processes by which the component may be
manufactured. The expected life length of the component is 100 hrs
if process I is used to manufacture, while it is 150 hrs if process II is
used. The cost of manufacturing a single component by process I is
₹10, while it is ₹20 for process II. Moreover, if the component lasts less
than the guaranteed life of 200 hrs, a loss of ₹50 is to be borne by the
manufacturer. Which process is advantageous to the manufacturer?
 [Ans.: Process I is advantageous to the manufacturer]
6. The life of an electronic component follows exponential distribution
with a mean of 4 years. The manufacturer of this component gives a
replacement warranty of 3 years.
(a) What proportion of components will be replaced in the period
of warranty?
(b) What is the probability that a randomly selected component
will have life within two standard deviations of the mean life?
 [Ans.: (a) 0.5276, (b) 0.9502]

5.6 Gamma Distribution

A continuous random variable X is said to follow gamma distribution if its
probability function is given by
$$f(x) = \begin{cases} \dfrac{\lambda^r}{\Gamma(r)}\, x^{r-1} e^{-\lambda x}, & x > 0 \\ 0, & x \le 0 \end{cases}$$
5.6.1 Constants of the Gamma Distribution

1. Mean of the Gamma Distribution



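$$\begin{aligned}
E(X) &= \int_{-\infty}^{\infty} x f(x)\,dx \\
&= \int_{0}^{\infty} x\, \frac{\lambda^r}{\Gamma(r)}\, x^{r-1} e^{-\lambda x}\,dx \\
&= \frac{\lambda^r}{\Gamma(r)} \int_{0}^{\infty} e^{-\lambda x} x^{r}\,dx \\
&= \frac{\lambda^r}{\Gamma(r)} \cdot \frac{\Gamma(r+1)}{\lambda^{r+1}} \qquad \left[\because \int_{0}^{\infty} e^{-kx} x^{n-1}\,dx = \frac{\Gamma(n)}{k^n}\right] \\
&= \frac{\lambda^r\, r\,\Gamma(r)}{\Gamma(r) \cdot \lambda^{r+1}} \\
&= \frac{r}{\lambda}
\end{aligned}$$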

2. Variance of the Gamma Distribution

    Var(X) = E(X2) – [E(X)]2 ...(5.8)



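$$\begin{aligned}
E(X^2) &= \int_{-\infty}^{\infty} x^2 f(x)\,dx \\
&= \int_{0}^{\infty} x^2\, \frac{\lambda^r}{\Gamma(r)}\, x^{r-1} e^{-\lambda x}\,dx \\
&= \frac{\lambda^r}{\Gamma(r)} \int_{0}^{\infty} e^{-\lambda x} x^{r+1}\,dx \\
&= \frac{\lambda^r}{\Gamma(r)} \cdot \frac{\Gamma(r+2)}{\lambda^{r+2}} \qquad \left[\because \int_{0}^{\infty} e^{-kx} x^{n-1}\,dx = \frac{\Gamma(n)}{k^n}\right] \\
&= \frac{(r+1)\, r\,\Gamma(r)}{\Gamma(r) \cdot \lambda^{2}} \\
&= \frac{r^2 + r}{\lambda^2}
\end{aligned}$$

Substituting in Eq. (5.8),

$$\begin{aligned}
\operatorname{Var}(X) &= \frac{r^2 + r}{\lambda^2} - \frac{r^2}{\lambda^2} \\
&= \frac{r}{\lambda^2}
\end{aligned}$$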
3. Standard Deviation of the Gamma Distribution

$$\text{SD} = \sqrt{\operatorname{Var}(X)} = \sqrt{\frac{r}{\lambda^2}} = \frac{\sqrt{r}}{\lambda}$$

4. Mode of the Gamma Distribution


Mode is the value of x for which f(x) is maximum.

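$$f(x) = \begin{cases} \dfrac{\lambda^r}{\Gamma(r)}\, x^{r-1} e^{-\lambda x}, & x > 0 \\ 0, & x \le 0 \end{cases}$$
Differentiating w.r.t. x,
$$\begin{aligned}
f'(x) &= \frac{\lambda^r}{\Gamma(r)}\left[(r-1)x^{r-2}e^{-\lambda x} + x^{r-1}e^{-\lambda x}(-\lambda)\right] \\
&= \frac{\lambda^r}{\Gamma(r)}\, x^{r-2}e^{-\lambda x}\left[(r-1) - \lambda x\right]
\end{aligned}$$
For maximum value of f(x),
$$f'(x) = 0$$
$$(r-1) - \lambda x = 0$$
$$x = \frac{r-1}{\lambda}$$
Differentiating f′(x) w.r.t. x,
$$\begin{aligned}
f''(x) &= \frac{\lambda^r}{\Gamma(r)}\left[(r-2)x^{r-3}e^{-\lambda x}(r-1-\lambda x) + x^{r-2}e^{-\lambda x}(-\lambda)(r-1-\lambda x) + x^{r-2}e^{-\lambda x}(-\lambda)\right] \\
&= \frac{\lambda^r}{\Gamma(r)}\, x^{r-3}e^{-\lambda x}\left[(r-2)(r-1-\lambda x) - \lambda x(r-1-\lambda x) - \lambda x\right]
\end{aligned}$$
Putting $x = \frac{r-1}{\lambda}$,
$$\begin{aligned}
f''(x) &= \frac{\lambda^r}{\Gamma(r)}\, x^{r-3}e^{-\lambda x}\left[(r-2)(r-1-r+1) - \lambda x(r-1-r+1) - (r-1)\right] \\
&= \frac{\lambda^r}{\Gamma(r)}\, x^{r-3}e^{-\lambda x}(1-r)
\end{aligned}$$
f(x) is maximum when $x = \frac{r-1}{\lambda}$, if f′′(x) < 0.
$$f''(x) < 0 \ \text{ if } \ 1 - r < 0, \ \text{ i.e., } \ r > 1$$
Hence, $x = \frac{r-1}{\lambda}$ is the mode of the gamma distribution for r > 1.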
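
The four constants derived in this section can be checked numerically for particular parameter values. The sketch below is illustrative only: the values r = 3 and λ = 2, the midpoint-rule integration, and the helper names are assumptions made for the check, not part of the derivation.

```python
import math

def gamma_pdf(x: float, r: float, lam: float) -> float:
    """Density lam^r / Gamma(r) * x^(r-1) * e^(-lam x) for x > 0, and 0 otherwise."""
    if x <= 0:
        return 0.0
    return lam ** r / math.gamma(r) * x ** (r - 1) * math.exp(-lam * x)

def gamma_constants(r: float, lam: float) -> dict:
    """Closed forms from Section 5.6.1 (the mode is defined only for r > 1)."""
    mode = (r - 1) / lam if r > 1 else None
    return {"mean": r / lam, "var": r / lam ** 2, "sd": math.sqrt(r) / lam, "mode": mode}

def midpoint_moment(k: int, r: float, lam: float, upper: float, steps: int = 200_000) -> float:
    """Midpoint-rule approximation of E(X^k) over (0, upper)."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += x ** k * gamma_pdf(x, r, lam)
    return total * h

r, lam = 3, 2                                   # illustrative parameters
constants = gamma_constants(r, lam)
m1 = midpoint_moment(1, r, lam, upper=40.0)     # numerical E(X)
m2 = midpoint_moment(2, r, lam, upper=40.0)     # numerical E(X^2)
grid = [i * 0.001 for i in range(1, 10_001)]
grid_mode = max(grid, key=lambda x: gamma_pdf(x, r, lam))
print(constants, round(m1, 4), round(m2 - m1 ** 2, 4), round(grid_mode, 3))
```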

Example 1
Given a Gamma random variable X with r = 3 and $\lambda$ = 2. Compute
E(X), Var(X) and P(X ≤ 1.5 years).
Solution
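$$\lambda = 2, \quad r = 3$$
$$f(x) = \frac{\lambda^r}{\Gamma(r)}\, x^{r-1} e^{-\lambda x}, \quad x > 0$$
(a) $E(X) = \dfrac{r}{\lambda} = \dfrac{3}{2} = 1.5$ years

(b) $\operatorname{Var}(X) = \dfrac{r}{\lambda^2} = \dfrac{3}{(2)^2} = 0.75$

(c)
$$\begin{aligned}
P(X \le 1.5 \text{ years}) &= \int_{0}^{1.5} f(x)\,dx \\
&= \int_{0}^{1.5} \frac{2^3}{\Gamma(3)}\, x^{2} e^{-2x}\,dx \\
&= 4\left[x^2\left(\frac{e^{-2x}}{-2}\right) - 2x\left(\frac{e^{-2x}}{4}\right) + 2\left(\frac{e^{-2x}}{-8}\right)\right]_{0}^{1.5} \\
&= 4\left[(1.5)^2\left(\frac{e^{-3}}{-2}\right) - 2(1.5)\left(\frac{e^{-3}}{4}\right) + 2\left(\frac{e^{-3}}{-8}\right) + \frac{1}{4}\right] \\
&= 0.5768
\end{aligned}$$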

Example 2
The daily consumption of milk in a city, in excess of 20000 litres, is
approximately distributed as a Gamma variate with parameters
$\lambda = \frac{1}{10000}$ and r = 2. The city has a daily stock of 30000 litres.
What is the probability that the stock is insufficient on a particular day?
Solution
Let Y be the random variable which denotes the daily consumption of milk (in litres)
in a city. The random variable X = Y – 20000 has a gamma distribution.
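$$\lambda = \frac{1}{10000}, \quad r = 2$$
$$\begin{aligned}
f(x) &= \frac{\lambda^r}{\Gamma(r)}\, x^{r-1} e^{-\lambda x}, \quad x > 0 \\
&= \frac{\left(\frac{1}{10000}\right)^2}{\Gamma(2)}\, x^{2-1} e^{-\frac{x}{10000}} \\
&= \frac{x\, e^{-\frac{x}{10000}}}{(10000)^2}
\end{aligned}$$
Probability that the stock is insufficient on a particular day
$$\begin{aligned}
P(Y > 30000) &= P(X > 10000) \\
&= \int_{10000}^{\infty} f(x)\,dx \\
&= \int_{10000}^{\infty} \frac{x\, e^{-\frac{x}{10000}}}{(10000)^2}\,dx \\
&= \frac{1}{10^8}\int_{10^4}^{\infty} x\, e^{-10^{-4}x}\,dx \\
&= \frac{1}{10^8}\left[\frac{x\, e^{-10^{-4}x}}{-10^{-4}} - \frac{e^{-10^{-4}x}}{(-10^{-4})^2}\right]_{10^4}^{\infty} \\
&= \frac{1}{10^8}\left(\frac{e^{-1}}{10^{-8}} + \frac{e^{-1}}{10^{-8}}\right) \\
&= e^{-1} + e^{-1} \\
&= 2e^{-1} \\
&= 0.7358
\end{aligned}$$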
Example 3
In a certain city, the daily consumption of electric power in millions
of kilowatt-hours can be treated as a random variable having gamma
distribution with parameters $\lambda = \frac{1}{2}$ and r = 3. If the power plant of this
city has a daily capacity of 12 million kilowatt-hours, what is the
probability that this power supply will be inadequate on any given day?
Solution
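Let X be a random variable which denotes the daily consumption of electric power in
millions of kilowatt-hours.
$$\begin{aligned}
f(x) &= \frac{\lambda^r}{\Gamma(r)}\, x^{r-1} e^{-\lambda x}, \quad x > 0 \\
&= \frac{\left(\frac{1}{2}\right)^3}{\Gamma(3)}\, x^{2} e^{-\frac{x}{2}}
\end{aligned}$$
$$\begin{aligned}
P(\text{power supply is inadequate}) &= P(X > 12) \\
&= \int_{12}^{\infty} f(x)\,dx \\
&= \int_{12}^{\infty} \frac{1}{\Gamma(3)} \cdot \frac{1}{2^3}\, x^2 e^{-\frac{x}{2}}\,dx \\
&= \frac{1}{16}\left[x^2\left(\frac{e^{-\frac{x}{2}}}{-\frac{1}{2}}\right) - 2x\left(\frac{e^{-\frac{x}{2}}}{\frac{1}{4}}\right) + 2\left(\frac{e^{-\frac{x}{2}}}{-\frac{1}{8}}\right)\right]_{12}^{\infty} \\
&= \frac{1}{16}\, e^{-6}(288 + 96 + 16) \\
&= 25e^{-6} \\
&= 0.062
\end{aligned}$$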
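
For integer r, repeated integration by parts as in Examples 1 to 3 always ends in the same pattern, $P(X > x) = e^{-\lambda x}\sum_{k=0}^{r-1} (\lambda x)^k / k!$. This closed form is a standard result stated here as a shortcut rather than the method used in the solutions above. A minimal sketch, with the function name `gamma_sf_integer_shape` chosen for illustration:

```python
import math

def gamma_sf_integer_shape(x: float, r: int, lam: float) -> float:
    """P(X > x) for a gamma variable with integer shape r and rate lam (valid for x >= 0)."""
    t = lam * x
    return math.exp(-t) * sum(t ** k / math.factorial(k) for k in range(r))

# Example 1: P(X <= 1.5) with r = 3, lam = 2
p_example_1 = 1.0 - gamma_sf_integer_shape(1.5, 3, 2.0)
# Example 2: P(X > 10000) with r = 2, lam = 1/10000
p_example_2 = gamma_sf_integer_shape(10000, 2, 1 / 10000)
# Example 3: P(X > 12) with r = 3, lam = 1/2
p_example_3 = gamma_sf_integer_shape(12, 3, 0.5)
print(round(p_example_1, 4), round(p_example_2, 4), round(p_example_3, 4))
```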

Example 4
If a company employs n salespersons, its gross sales in thousands
of rupees may be regarded as a random variable having a gamma
distribution with $\lambda = \frac{1}{2}$ and $r = 80\sqrt{n}$. If the sales cost is ₹8000 per
salesperson, how many salespersons should the company employ to
maximise the expected profit?
Solution
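Let X be the random variable which denotes the gross sales in rupees by n salespersons.
$$\lambda = \frac{1}{2}, \quad r = 80000\sqrt{n}$$
$$f(x) = \frac{\lambda^r}{\Gamma(r)}\, x^{r-1} e^{-\lambda x}, \quad x > 0$$
$$E(X) = \frac{r}{\lambda} = \frac{80000\sqrt{n}}{\frac{1}{2}} = 160000\sqrt{n}$$
If y denotes the total expected profit of the company,
$$\begin{aligned}
y &= \text{total expected sales} - \text{total sales cost} \\
&= 160000\sqrt{n} - 8000n
\end{aligned}$$
$$\frac{dy}{dn} = \frac{80000}{\sqrt{n}} - 8000$$
For maximum profits,
$$\frac{dy}{dn} = 0$$
$$\frac{80000}{\sqrt{n}} - 8000 = 0$$
$$\frac{80000}{\sqrt{n}} = 8000$$
$$\sqrt{n} = 10$$
$$n = 100$$
$$\frac{d^2y}{dn^2} = -\frac{40000}{n^{3/2}}$$
When n = 100, $\dfrac{d^2y}{dn^2} = -40 < 0$
∴ y is maximum when n = 100.
Hence, the company should employ 100 salespersons to maximise the expected
profit.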
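
The calculus result can be cross-checked by evaluating the expected profit for each whole number of salespersons. The sketch below assumes the profit function $y = 160000\sqrt{n} - 8000n$ from the solution; the search range of 1 to 1000 is an arbitrary choice for illustration.

```python
import math

def expected_profit(n: int) -> float:
    """Expected profit in rupees: expected sales 160000*sqrt(n) minus cost 8000*n."""
    return 160000 * math.sqrt(n) - 8000 * n

# Brute-force search over a range of staff sizes (range chosen for illustration)
best_n = max(range(1, 1001), key=expected_profit)
print(best_n, expected_profit(best_n))
```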
Example 5
Consumer demand for milk in a certain locality, per month, is known to
be a general gamma random variable. If the average demand is ‘a’ litres
and the most likely demand is ‘b’ litres (b < a), what is the variance of
the demand?
Solution
Let X be the random variable which denotes the monthly consumer demand of milk.
Average demand is the value of E(X). Most likely demand is the value of the mode of
X or the value of X for which its probability density function is maximum.
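$$f(x) = \frac{\lambda^r}{\Gamma(r)}\, x^{r-1} e^{-\lambda x}, \quad x > 0$$
$$\begin{aligned}
f'(x) &= \frac{\lambda^r}{\Gamma(r)}\left[(r-1)x^{r-2}e^{-\lambda x} - \lambda x^{r-1}e^{-\lambda x}\right] \\
&= \frac{\lambda^r}{\Gamma(r)}\, x^{r-2}e^{-\lambda x}\left[(r-1) - \lambda x\right]
\end{aligned}$$
For maximum value of f(x),
$$f'(x) = 0$$
$$(r-1) - \lambda x = 0$$
$$x = \frac{r-1}{\lambda}$$
Differentiating f′(x) w.r.t. x,
$$f''(x) = \frac{\lambda^r}{\Gamma(r)}\left[-\lambda x^{r-2}e^{-\lambda x} + \{(r-1) - \lambda x\}\,\frac{d}{dx}\left\{x^{r-2}e^{-\lambda x}\right\}\right]$$
When $x = \frac{r-1}{\lambda}$,
$$f''(x) = \frac{\lambda^r}{\Gamma(r)}\, x^{r-3}e^{-\lambda x}(1-r)$$
f(x) is maximum when $x = \frac{r-1}{\lambda}$ if f′′(x) < 0.
$$f''(x) < 0 \ \text{ if } \ 1 - r < 0, \ \text{ i.e., } \ r > 1$$
$$\text{Most likely demand} = \frac{r-1}{\lambda} = b, \quad r > 1$$
$$\frac{r}{\lambda} = b + \frac{1}{\lambda} \qquad \ldots(1)$$
$$\text{Average demand} = E(X) = \frac{r}{\lambda} = a \qquad \ldots(2)$$
Putting in Eq. (1),
$$a = b + \frac{1}{\lambda}$$
$$\frac{1}{\lambda} = a - b \qquad \ldots(3)$$
$$\begin{aligned}
\operatorname{Var}(X) &= \frac{r}{\lambda^2} = \frac{r}{\lambda} \cdot \frac{1}{\lambda} \\
&= a(a - b) \qquad [\text{from Eq. (2) and (3)}]
\end{aligned}$$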

Exercise 5.5
1. Find the probabilities that the value of a random variable will exceed
4, if it has gamma distribution with
(a) $\lambda = \frac{1}{3}$, r = 2  (b) $\lambda = \frac{1}{4}$, r = 3

[Ans.: (a) 0.6151 (b) 0.9197]


2. If X follows the gamma distribution with parameters $\lambda$ and r, prove that
the expected value of the positive square root of X is
$$\frac{\Gamma\left(r + \frac{1}{2}\right)}{\sqrt{\lambda}\;\Gamma(r)}.$$

3. A random sample of size n is taken from a population which is
exponentially distributed with parameter $\lambda$. If $\bar{X}$ is the sample mean,
show that $n\lambda\bar{X}$ follows a simple gamma distribution with parameter
n.
CHAPTER 6
Applied Statistics: Test of Hypothesis

Chapter Outline
6.1 Introduction
6.2 Terms Related to Tests of Hypothesis
6.3 Procedure for Testing of Hypothesis
6.4 Test of Significance for Large Samples
6.5 Test of Significance for Single Proportion – Large Samples
6.6 Test of Significance for Difference between Two Proportions – Large
Samples
6.7 Test of Significance for Single Mean – Large Samples
6.8 Test of Significance for Difference between Two Means – Large Samples
6.9 Test of Significance for Difference of Standard Deviations – Large Samples
6.10 Small Sample Tests
6.11 Student’s t-distribution
6.12 t-test: Test of Significance for Single Mean
6.13 t-test: Test of Significance for Difference of Means
6.14 t-test: Test of Significance for Correlation Coefficients
6.15 Snedecor’s F-test for Ratio of Variances
6.16 Chi-square ($\chi^2$) Test
6.17 Chi-square Test: Goodness of Fit
6.18 Chi-square Test for Independence of Attributes

6.1 Introduction

The main purpose behind the sampling theory is the study of the Tests of Hypothesis
or Tests of significance. In many situations, assumptions are made about the population
parameters involved in order to arrive at decisions related to population on the basis
of sample information. Such an assumption is called statistical hypothesis which
may or may not be true. The procedure which enables us to decide on the basis of
sample results whether a hypothesis is true or not, is called test of hypothesis or test
of significance.

6.2 Terms Related to Tests of Hypothesis

(1) Parameters: The statistical constants of population such as mean ($\mu$), standard
deviation ($\sigma$), correlation coefficient ($\rho$), population proportion (P) etc. are
called the parameters. Greek letters are used to denote the population parameters.
(2) Statistic: The statistical constants for the sample drawn from the given population
such as mean ($\bar{x}$), standard deviation (s), correlation coefficient (r), sample
proportion (p) etc., are called statistics. Roman letters are used to denote
the sample statistics.
(3) Sampling Distribution: Consider all possible samples of size ‘n’ which can be
drawn from a population of size ‘N’. These samples will give different values
of a statistic. The means of the samples will not be identical. If these different
means are arranged according to their frequencies, the frequency distribution
formed is called sampling distribution of mean. Similarly, the sampling dis-
tribution of other statistics can be defined.
(4) Standard Error: The standard deviation of the sampling distribution of a statis-
tic is known as its standard error SE. Standard error plays a very important role
in the large sample theory and forms the basis of the testing of hypothesis.
(5) Null Hypothesis: Null hypothesis is the hypothesis which is tested for possible
rejection under the assumption that it is true. It is denoted by H0. It asserts that
there is no significant difference between the statistic and the population param-
eter and whatever observed difference exists, is merely due to the fluctuations in
sampling from the same population.
(6) Alternative Hypothesis: Any hypothesis which is complementary to the null
hypothesis is called an alternative hypothesis. It is denoted by $H_1$. It is set
in such a way that the rejection of null hypothesis implies the acceptance of
alternative hypothesis. For example, if the null hypothesis is that the average
height of the students of a college is 166 cm, i.e., $\mu_0 = 166$ cm, then the null
hypothesis is
$$H_0: \mu = 166\ (= \mu_0)$$
and the alternative hypothesis could be
    (i) $H_1: \mu \ne \mu_0$ (i.e., $\mu > \mu_0$ or $\mu < \mu_0$)
    (ii) $H_1: \mu > \mu_0$
    (iii) $H_1: \mu < \mu_0$
Thus, there can be more than one alternative hypothesis.
(7) Test Statistic: After setting up the null hypothesis and alternative hypothesis,
test statistic is calculated. The test statistic is a statistic based on appropriate
probability distribution. It is used to test whether the null hypothesis should
be accepted or rejected. Different probability distribution values are used in
appropriate cases while testing the null hypothesis.
For Z-distribution under normal curve for large samples (n > 30), the Z-statistic
for a statistic t is defined by
$$Z = \frac{t - E(t)}{SE(t)}$$
(8) Errors in Hypothesis Testing: The main objective in sampling theory is to draw
valid inferences about the population parameters on the basis of the sample
results. A decision regarding a null hypothesis may be correct or may not be
correct. There are two types of errors.
    (i) Type I error: It is the error of rejecting the null hypothesis $H_0$, when it is
true. It occurs when a null hypothesis is true, but the difference of means is
significant and the hypothesis is rejected. If the probability of making a type I
error is denoted by $\alpha$, the level of significance, then the probability of making
a correct decision is $(1 - \alpha)$.
    (ii) Type II error: It is the error of accepting the null hypothesis $H_0$, when it is
false. It occurs when a null hypothesis is false, but the difference of means
is insignificant and the hypothesis is accepted. The probability of making
a type II error is denoted by $\beta$.
(9) Level of Significance: The level of significance is the maximum probability of
making a type I error and is denoted by $\alpha$, i.e., P(Rejecting $H_0$ when $H_0$ is true)
= $\alpha$. The commonly used levels of significance in practice are 5% (0.05) and 1%
(0.01). For 5% level of significance ($\alpha$ = 0.05), the probability of making type
I error is 0.05 or 5%, i.e., P(Rejecting $H_0$ when $H_0$ is true) = 0.05. This means
that a true null hypothesis is wrongly rejected in about 5 out of 100 cases. Similarly,
1% level of significance ($\alpha$ = 0.01) means that a true null hypothesis is wrongly
rejected in about 1 out of 100 cases. If no level of significance is given, $\alpha$ is taken as 0.05.
(10) Critical Region: The critical region or rejection region is the region of the
standard normal curve corresponding to a predetermined level of significance
$\alpha$. The region under the normal curve which is not covered by the rejection
region is known as acceptance region. Thus, the values of the statistic which lead to
rejection of null hypothesis $H_0$ give the rejection region or critical region. The
value of the test statistic which separates the rejection region from the
acceptance region is known as the critical value.
(11) Two Tailed Test and One Tailed Test: When the test of hypothesis is made
on the basis of rejection region represented by both the sides of the standard
normal curve, it is called a two tailed test. A test of statistical hypothesis, where
the alternative hypothesis $H_1$ is two sided or two tailed such as:
Null Hypothesis $H_0: \mu = \mu_0$
Alternative Hypothesis $H_1: \mu \ne \mu_0$ ($\mu > \mu_0$ or $\mu < \mu_0$), is called two tailed
test or two sided test.

Fig. 6.1 Two tailed test (curve of P(Z) against Z: rejection regions in both tails beyond −a and a, acceptance region between them)

A test of statistical hypothesis, where the alternative hypothesis is one sided is
called one tailed test or one sided test. There are two types of one tailed tests.
(i) Right Tailed Test: In the right tailed test, the rejection region or critical
region lies entirely on the right tail of the normal curve (Fig. 6.2).
(ii) Left Tailed Test: In the left tailed test, the rejection region or critical
region lies entirely on the left tail of the normal curve (Fig. 6.3).
Fig. 6.2 Right tailed test (curve of P(Z) against Z: rejection region in the right tail beyond a, acceptance region to its left)
Fig. 6.3 Left tailed test (curve of P(Z) against Z: rejection region in the left tail beyond −a, acceptance region to its right)

For example, in a test for testing the mean ($\mu$) of the population
Null Hypothesis $H_0: \mu = \mu_0$
Alternative Hypothesis $H_1: \mu > \mu_0$ (Right tailed)
or $H_1: \mu < \mu_0$ (Left tailed)
A two tailed test is applied in such cases when the difference between the
sample mean and population mean is tending to reject the null hypothesis
$H_0$, whether the difference is positive or negative.
A one tailed test is applied in such cases when the population mean is at
least as large as some specified value of the mean (right tailed test) or at
least as small as some specified value of the mean (left tailed test).

Critical value (Zα)      Level of significance α
                         1%               5%               10%
Two tailed test          |Zα| = 2.58      |Zα| = 1.96      |Zα| = 1.645
Right tailed test        Zα = 2.33        Zα = 1.645       Zα = 1.28
Left tailed test         Zα = –2.33       Zα = –1.645      Zα = –1.28

(12) Confidence Limits: The limits within which a hypothesis should lie with
specified probability are called confidence limits or fiducial limits. Generally,
the confidence limits are set up with 5% or 1% level of significance.
If the sample value lies between the confidence limits, the hypothesis is
accepted; if it does not, then the hypothesis is rejected at the specified level
of significance. Suppose that the sampling distribution of a statistic S is
normal with mean $\mu$ and standard deviation $\sigma$. The sample statistic S can
be expected to lie in the interval ($\mu - 1.96\sigma$, $\mu + 1.96\sigma$) 95% of the time
(Fig. 6.4). Because of this, ($S - 1.96\sigma$, $S + 1.96\sigma$) is called the 95% confidence
interval for estimation of $\mu$. The ends of this interval, i.e., $S \pm 1.96\sigma$
are called 95% confidence limits for $\mu$. Similarly, $S \pm 2.58\sigma$ are 99% confidence
limits. The numbers 1.96, 2.58 etc. are called confidence coefficients.
Fig. 6.4 Confidence Limits (curve of P(X) against X centred at μ: critical regions beyond μ − 1.96σ and μ + 1.96σ)
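
The tabulated critical values and confidence coefficients come from the inverse of the standard normal distribution function. A minimal sketch using Python's standard library is shown below; the function names `critical_value` and `confidence_coefficient` are illustrative and not part of the text.

```python
from statistics import NormalDist

def critical_value(alpha: float, tail: str) -> float:
    """Critical value of Z for level of significance alpha and tail 'two', 'right' or 'left'."""
    std_normal = NormalDist()            # mean 0, standard deviation 1
    if tail == "two":
        return std_normal.inv_cdf(1 - alpha / 2)
    if tail == "right":
        return std_normal.inv_cdf(1 - alpha)
    if tail == "left":
        return std_normal.inv_cdf(alpha)
    raise ValueError("tail must be 'two', 'right' or 'left'")

def confidence_coefficient(level: float) -> float:
    """Multiplier of the standard error for a two-sided confidence level such as 0.95."""
    return NormalDist().inv_cdf(0.5 + level / 2)

for alpha in (0.01, 0.05, 0.10):
    print(alpha, [round(critical_value(alpha, t), 3) for t in ("two", "right", "left")])
print(round(confidence_coefficient(0.95), 2), round(confidence_coefficient(0.99), 2))
```

Rounded to the precision of the table, the computed values agree with the tabulated ones.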

6.3 Procedure for Testing of Hypothesis

The various steps in testing of a statistical hypothesis are as follows:
(i) Null Hypothesis: Set up the Null Hypothesis $H_0$.
(ii) Alternative Hypothesis: Set up the Alternative Hypothesis $H_1$.
This will decide the use of a one tailed (right or left) or two tailed test.
(iii) Level of Significance: Select the appropriate level of significance ($\alpha$)
depending on the reliability of the estimates and permissible risk. If no level
of significance is given, $\alpha$ is selected as 0.05.
(iv) Test Statistic: Calculate the test statistic
$$Z = \frac{t - E(t)}{SE(t)} \quad \text{under } H_0$$
(v) Critical Value: Find the significant value (tabulated value) $Z_\alpha$ of Z at the
given level of significance $\alpha$.
(vi) Decision: Compare the calculated value of Z with the tabulated value $Z_\alpha$.
If $|Z| < Z_\alpha$, i.e., if the calculated value of Z is less than tabulated value $Z_\alpha$ at
the level of significance $\alpha$, the null hypothesis is accepted. If $|Z| > Z_\alpha$, i.e.,
if the calculated value of Z is more than tabulated value $Z_\alpha$ at the level of
significance $\alpha$, the null hypothesis is rejected.
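
Steps (v) and (vi) can be sketched as a small decision helper. This is an illustration under stated assumptions: the function name `z_decision` is hypothetical, and for one tailed tests it compares the signed value of Z with the signed critical value, which gives the same decision as comparing magnitudes when Z falls in the tail named by $H_1$.

```python
from statistics import NormalDist

def z_decision(z: float, alpha: float = 0.05, tail: str = "two"):
    """Return (critical value, True if H0 is rejected) for a calculated Z."""
    std_normal = NormalDist()
    if tail == "two":
        critical = std_normal.inv_cdf(1 - alpha / 2)
        return critical, abs(z) > critical
    if tail == "right":
        critical = std_normal.inv_cdf(1 - alpha)
        return critical, z > critical
    if tail == "left":
        critical = std_normal.inv_cdf(alpha)
        return critical, z < critical
    raise ValueError("tail must be 'two', 'right' or 'left'")

# Illustrative calculated values of Z
print(z_decision(-19.17, 0.05, "two"))    # far in the tail: H0 rejected
print(z_decision(2.08, 0.01, "two"))      # inside the acceptance region: H0 accepted
print(z_decision(-1.41, 0.05, "left"))    # left tailed test: H0 accepted
```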

6.4 Test of Significance for Large Samples

If a sample consists of more than 30 items, i.e., n > 30, it is considered as large sample.
The following assumptions are applied for significance tests of large samples:
(i) The random sampling distribution of statistic has the properties of the normal
curve.
(ii) Values (i.e., statistics) given by the samples are sufficiently close to the
population values (i.e., parameters) and can be used in their place for calculating the
standard error (SE) of the estimate.
For example, if SD of the population is not known, SE can be calculated by SD of the
sample.
Suppose the hypothesis to be tested is that the probability of success in each trial is
p. Assuming it to be true, the mean $\mu$ and the standard deviation $\sigma$ of the sampling
distribution of the number of successes are $np$ and $\sqrt{npq}$ respectively, where q = 1 – p, as the
sampling distribution of number of successes follows a binomial probability distribution.
If x is the observed number of successes in the sample and Z is the standard normal
variate then
$$Z = \frac{x - \mu}{\sigma}$$
The tests of significance are as follows:
(i) If |Z| < 1.96, the difference between the observed and expected number of
successes is not significant.
(ii) If |Z| > 1.96, the difference is significant at 5% level of significance.
(iii) If |Z| > 2.58, the difference is significant at 1% level of significance.

Example 1
A coin was tossed 960 times and returned heads 183 times. Test the
hypothesis that the coin is unbiased. Use a 0.05 level of significance.
Solution
$$n = 960$$
$$p = \text{probability of getting head} = \frac{1}{2}$$
$$q = 1 - p = 1 - \frac{1}{2} = \frac{1}{2}$$
$$\mu = np = 960\left(\frac{1}{2}\right) = 480$$
$$\sigma = \sqrt{npq} = \sqrt{960 \times \frac{1}{2} \times \frac{1}{2}} = 15.49$$
$$x = \text{number of successes} = 183$$
(i) Null Hypothesis $H_0$: The coin is unbiased.
(ii) Alternative Hypothesis $H_1$: The coin is biased.
(iii) Level of significance: $\alpha$ = 0.05
(iv) Test statistic: $Z = \dfrac{x - \mu}{\sigma} = \dfrac{183 - 480}{15.49} = -19.17$
|Z| = 19.17
(v) Critical value: $|Z_{0.05}|$ = 1.96
(vi) Decision: Since $|Z| > |Z_{0.05}|$, the null hypothesis is rejected at 5% level of
significance, i.e., the coin is biased.

Example 2
A dice is tossed 960 times and it falls with 5 upwards 184 times. Is the
dice unbiased at a level of significance of 0.01?
Solution
$$n = 960$$
$$p = \text{Probability of throwing 5 with one die} = \frac{1}{6}$$
$$q = 1 - p = 1 - \frac{1}{6} = \frac{5}{6}$$
$$\mu = np = 960\left(\frac{1}{6}\right) = 160$$
$$\sigma = \sqrt{npq} = \sqrt{960 \times \frac{1}{6} \times \frac{5}{6}} = 11.55$$
$$x = \text{number of successes} = 184$$
(i) Null Hypothesis $H_0$: The dice is unbiased.
(ii) Alternative Hypothesis $H_1$: The dice is biased.
(iii) Level of significance: $\alpha$ = 0.01
(iv) Test statistic: $Z = \dfrac{x - \mu}{\sigma} = \dfrac{184 - 160}{11.55} = 2.08$
|Z| = 2.08
(v) Critical value: $|Z_{0.01}|$ = 2.58
(vi) Decision: Since $|Z| < |Z_{0.01}|$, the null hypothesis is accepted at 1% level of
significance, i.e., the dice is unbiased.
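
The test statistic used in Examples 1 and 2 of this section follows the same three lines of arithmetic each time. A minimal sketch, with the function name `successes_z` chosen for illustration:

```python
import math

def successes_z(x: int, n: int, p: float) -> float:
    """Z = (x - np) / sqrt(npq) for x observed successes in n trials under H0: P(success) = p."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    return (x - mu) / sigma

z_coin = successes_z(183, 960, 1 / 2)    # coin tossed 960 times, 183 heads
z_die = successes_z(184, 960, 1 / 6)     # die thrown 960 times, 184 fives
print(round(z_coin, 2), round(z_die, 2))
```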

6.5 Test of Significance for Single Proportion – Large Samples

Let p be the sample proportion in a large random sample of size n drawn from a
population having proportion P. Also, the population proportion P has a specified
value $P_0$.
Working Rule
(i) Null Hypothesis $H_0$: $P = P_0$, i.e., the population proportion P has a specified
value $P_0$.
(ii) Alternative Hypothesis $H_1$: $P \ne P_0$ (i.e., $P > P_0$ or $P < P_0$)
or $H_1$: $P > P_0$
or $H_1$: $P < P_0$
(iii) Level of significance: Select the level of significance $\alpha$.
(iv) Test statistic: $Z = \dfrac{p - P}{\sqrt{\dfrac{PQ}{n}}}$, where Q = 1 – P
(v) Critical Value: Find the critical value (tabulated value) $Z_\alpha$ of Z at the given
level of significance.
(vi) Decision: If $|Z| < Z_\alpha$ at the level of significance $\alpha$, the null hypothesis is
accepted. If $|Z| > Z_\alpha$ at the level of significance $\alpha$, the null hypothesis is
rejected.
Note
1. Null Hypothesis $H_0$ is rejected when |Z| > 3 without mentioning any level of
significance.
2. Confidence limits:
(i) 95% confidence limits = $p \pm 1.96\sqrt{\dfrac{PQ}{n}}$
(ii) 99% confidence limits = $p \pm 2.58\sqrt{\dfrac{PQ}{n}}$
If the population proportions P and Q are not known, p and q are used in equations.
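
Step (iv) of the working rule can be written as a one-line function. The sketch below is illustrative; the name `proportion_z` and the sample figures in the usage lines (a sample proportion of 0.48 from 1000 observations and one of 0.7 from 50 observations) are chosen only to exercise the formula.

```python
import math

def proportion_z(p_sample: float, p_null: float, n: int) -> float:
    """Z = (p - P) / sqrt(PQ / n), with Q = 1 - P taken from the null hypothesis."""
    q_null = 1 - p_null
    return (p_sample - p_null) / math.sqrt(p_null * q_null / n)

z_a = proportion_z(0.48, 0.5, 1000)   # sample proportion below the hypothesised value
z_b = proportion_z(0.7, 0.6, 50)      # sample proportion above the hypothesised value
print(round(z_a, 3), round(z_b, 3))
```

The sign of Z shows on which side of the hypothesised proportion the sample falls, which matters when the alternative hypothesis is one sided.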

Example 1
A manufacturer claimed that at least 95% of the equipment which he
supplied to a factory conformed to specification. An examination of a
sample of 200 pieces of equipment revealed that 18 were faulty. Test his
claim at 5% level of significance.
Solution
n = 200
Number of pieces conforming to specification = 200 – 18 = 182
p = Sample proportion of pieces conforming to specification = $\dfrac{182}{200}$ = 0.91
P = Population proportion of pieces conforming to specification = 0.95
Q = 1 – P = 1 – 0.95 = 0.05
(i) Null Hypothesis $H_0$: P = 0.95, i.e., the proportion of pieces conforming to
specification is 95%.
(ii) Alternative Hypothesis $H_1$: P < 0.95 (Left tailed test)
(iii) Level of significance: $\alpha$ = 0.05
(iv) Test statistic: $Z = \dfrac{p - P}{\sqrt{\dfrac{PQ}{n}}} = \dfrac{0.91 - 0.95}{\sqrt{\dfrac{(0.95)(0.05)}{200}}} = -2.60$
|Z| = 2.60
(v) Critical value: $|Z_{0.05}|$ = 1.645
(vi) Decision: Since $|Z| > |Z_{0.05}|$, the null hypothesis is rejected at 5% level of
significance, i.e., the manufacturer’s claim is rejected.

Example 2
In a hospital 480 female and 520 male babies were born in a week. Do
these figures confirm the hypothesis that males and females were born
in equal numbers?
Solution
n = Total number of births = 480 + 520 = 1000
p = Sample proportion of females born = $\dfrac{480}{1000}$ = 0.48
P = Population proportion of females born = 0.5
Q = 1 – P = 1 – 0.5 = 0.5
(i) Null Hypothesis $H_0$: P = 0.5, i.e., the males and females were born in equal
numbers.
(ii) Alternative Hypothesis $H_1$: P ≠ 0.5 (Two tailed test)
(iii) Level of significance: $\alpha$ = 0.05 (assumption)
(iv) Test statistic: $Z = \dfrac{p - P}{\sqrt{\dfrac{PQ}{n}}} = \dfrac{0.48 - 0.5}{\sqrt{\dfrac{(0.5)(0.5)}{1000}}} = -1.265$
|Z| = 1.265
(v) Critical value: $|Z_{0.05}|$ = 1.96
(vi) Decision: Since $|Z| < |Z_{0.05}|$, the null hypothesis is accepted at 5% level of
significance, i.e., males and females were born in equal proportions.
Example 3
In a study designed to investigate whether certain detonators used with
explosives in coal mining meet the requirement that at least 90% will
ignite the explosive when charged, it is found that 174 of 200 detonators
function properly. Test the null hypothesis P = 0.9 against the alternative
hypothesis P < 0.9 at the 0.05 level of significance.
Solution
n = 200
p = Sample proportion of detonators functioning properly = $\dfrac{174}{200}$ = 0.87
P = Population proportion of detonators functioning properly = 0.9
Q = 1 – P = 1 – 0.9 = 0.1
(i) Null Hypothesis $H_0$: P = 0.9
(ii) Alternative Hypothesis $H_1$: P < 0.9 (Left tailed test)
(iii) Level of significance: $\alpha$ = 0.05
(iv) Test statistic: $Z = \dfrac{p - P}{\sqrt{\dfrac{PQ}{n}}} = \dfrac{0.87 - 0.9}{\sqrt{\dfrac{(0.9)(0.1)}{200}}} = -1.41$
|Z| = 1.41
(v) Critical value: $|Z_{0.05}|$ = 1.645
(vi) Decision: Since $|Z| < |Z_{0.05}|$, the null hypothesis is accepted at 5% level of
significance.

Example 4
A salesman in a departmental store claims that at most 60 percent of the
shoppers entering the store leave without making a purchase. A random
sample of 50 shoppers showed that 35 of them left without making a
purchase. Are these sample results consistent with the claim of the
salesman? Use a level of significance of 0.05.
Solution
n = 50
p = Sample proportion of shoppers not making a purchase = $\dfrac{35}{50}$ = 0.7
P = Population proportion of shoppers not making a purchase = 0.6
Q = 1 – P = 1 – 0.6 = 0.4
(i) Null Hypothesis $H_0$: P = 0.6, i.e., the proportion of shoppers not making a
purchase is 60%.
(ii) Alternative Hypothesis $H_1$: P > 0.6 (Right tailed test)
(iii) Level of significance: $\alpha$ = 0.05
(iv) Test statistic: $Z = \dfrac{p - P}{\sqrt{\dfrac{PQ}{n}}} = \dfrac{0.7 - 0.6}{\sqrt{\dfrac{(0.6)(0.4)}{50}}} = 1.443$
|Z| = 1.443
(v) Critical value: $|Z_{0.05}|$ = 1.645
(vi) Decision: Since $|Z| < |Z_{0.05}|$, the null hypothesis is accepted, i.e., the sample
results are consistent with the claim of the salesman.

Example 5
The fatality rate of typhoid patients is believed to be 17.26%. In a certain
year 640 patients suffering from typhoid were treated in a metropolitan
hospital and only 63 patients died. Can you consider the hospital efficient
at 1% level of significance?
Solution
n = 640
p = Sample proportion of typhoid patients who died = $\dfrac{63}{640}$ = 0.0984
P = Population proportion of typhoid patients who died = 0.1726
Q = 1 – P = 1 – 0.1726 = 0.8274
(i) Null Hypothesis $H_0$: P = 0.1726, i.e., the fatality rate in the hospital is the same
as the believed rate and the hospital is not efficient.
(ii) Alternative Hypothesis $H_1$: P < 0.1726 (Left tailed test)
(iii) Level of significance: $\alpha$ = 0.01
(iv) Test statistic: $Z = \dfrac{p - P}{\sqrt{\dfrac{PQ}{n}}} = \dfrac{0.0984 - 0.1726}{\sqrt{\dfrac{(0.1726)(0.8274)}{640}}} = -4.97$
|Z| = 4.97
(v) Critical value: $|Z_{0.01}|$ = 2.33
(vi) Decision: Since $|Z| > |Z_{0.01}|$, the null hypothesis is rejected at 1% level of
significance, i.e., the hospital is efficient.

Example 6
In a big city, 325 men out of 600 were found to be smokers. Does this
information support the conclusion that the majority of men in this city
are smokers?
Solution
n = 600
p = Sample proportion of smokers in city = 325/600 = 0.542
P = Population proportion of smokers in city = 0.5
Q = 1 – P = 1 – 0.5 = 0.5
(i) Null Hypothesis H0: P = 0.5, i.e., the proportion of smokers in the city is
50%.
(ii) Alternative Hypothesis H1: P > 0.5 (Right tailed test)
(iii) Level of significance: α = 0.05 (assumption)
(iv) Test statistic:
$$Z = \frac{p - P}{\sqrt{\dfrac{PQ}{n}}} = \frac{0.542 - 0.5}{\sqrt{\dfrac{(0.5)(0.5)}{600}}} = 2.06$$
|Z| = 2.06
(v) Critical value: $Z_{0.05} = 1.645$
(vi) Decision: Since $|Z| > Z_{0.05}$, the null hypothesis is rejected at 5% level of
significance, i.e., the proportion of smokers in the city is more than 50% and the majority of
men in the city are smokers.

Example 7
In a random sample of 160 workers exposed to a certain amount of
radiation, 24 experienced some ill effects. Construct a 95% confidence
interval for the corresponding true percentage.
Solution
n = 160
p = Sample proportion of workers who experienced ill effects = 24/160 = 0.15
q = 1 – p = 1 – 0.15 = 0.85
The 95% confidence interval for the true proportion is
$$\left(p - 1.96\sqrt{\frac{pq}{n}},\; p + 1.96\sqrt{\frac{pq}{n}}\right)$$
i.e.,
$$\left(0.15 - 1.96\sqrt{\frac{(0.15)(0.85)}{160}},\; 0.15 + 1.96\sqrt{\frac{(0.15)(0.85)}{160}}\right)$$
i.e., (0.0947, 0.2053), so the true percentage lies between 9.47% and 20.53%.
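
As a cross-check on the arithmetic, here is a short Python sketch (an addition, not from the original text) that computes the same interval. The helper name `proportion_ci` is hypothetical, and the multiplier is taken from `statistics.NormalDist` (about 1.96 for a 95% level) rather than from a table.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, level=0.95):
    """Large-sample confidence interval p ± z * sqrt(pq/n) for a proportion."""
    p = successes / n
    q = 1 - p
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 when level = 0.95
    half_width = z * sqrt(p * q / n)
    return p - half_width, p + half_width

# Data of Example 7: 24 of 160 workers experienced ill effects
low, high = proportion_ci(24, 160)
print(f"95% confidence interval: ({low:.4f}, {high:.4f})")
```

Passing `level=0.99` gives the corresponding 99% limits, for which the multiplier is about 2.58.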

6.6 Test of Significance for Difference of Proportions — Large Samples

Let p1 and p2 be the sample proportions in two large samples of sizes n1 and n2 drawn
from two populations having proportions P1 and P2.
Working Rule
(i) Null Hypothesis H0: P1 = P2, i.e., there is no significant difference in the two
population proportions P1 and P2.
(ii) Alternative Hypothesis H1: P1 ≠ P2
or H1: P1 > P2
or H1: P1 < P2
(iii) Level of significance: Select the level of significance α.
(iv) Test statistic: There are two cases:
(a) When the population proportions P1 and P2 are known
$$Z = \frac{P_1 - P_2}{\sqrt{\dfrac{P_1 Q_1}{n_1} + \dfrac{P_2 Q_2}{n_2}}}$$
where Q1 = 1 – P1 and Q2 = 1 – P2.
(b) When the population proportions P1 and P2 are not known but sample
proportions p1 and p2 are known
There are two methods to estimate P1 and P2.
Method of Substitution: In this method, sample proportions p1 and p2
are substituted for P1 and P2.
$$Z = \frac{p_1 - p_2}{\sqrt{\dfrac{p_1 q_1}{n_1} + \dfrac{p_2 q_2}{n_2}}}$$
Method of Pooling: In this method, the estimated value of the two population
proportions is obtained by pooling the two sample proportions p1
and p2 into a single proportion p.
$$p = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}, \qquad q = 1 - p$$
$$Z = \frac{p_1 - p_2}{\sqrt{pq\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
(v) Critical value: Find the critical value (tabulated value) $Z_\alpha$ of Z at the given level of
significance.
(vi) Decision: If $|Z| < Z_\alpha$ at the level of significance α, the null hypothesis is
accepted. If $|Z| > Z_\alpha$ at the level of significance α, the null hypothesis is
rejected.
Note
1. Null Hypothesis H0 is rejected when |Z| > 3 without mentioning any level of
significance.
2. Confidence limits:
(i) 95% confidence limits = $(p_1 - p_2) \pm 1.96\sqrt{\dfrac{P_1 Q_1}{n_1} + \dfrac{P_2 Q_2}{n_2}}$
(ii) 99% confidence limits = $(p_1 - p_2) \pm 2.58\sqrt{\dfrac{P_1 Q_1}{n_1} + \dfrac{P_2 Q_2}{n_2}}$

If the population proportions P1 and P2 are not known, p1, p2, q1 and q2 are used in
these expressions.
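
The two ways of estimating the standard error when P1 and P2 are unknown can be compared side by side. The following Python sketch is an illustration added here, not part of the original text; the function names are hypothetical, and the counts used (180 of 900 and 296 of 1600) correspond to the percentages given in Example 2 of this section.

```python
from math import sqrt

def z_pooled(x1, n1, x2, n2):
    """Method of pooling: one common proportion p estimates both P1 and P2."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)          # same as (n1*p1 + n2*p2) / (n1 + n2)
    q = 1 - p
    return (p1 - p2) / sqrt(p * q * (1 / n1 + 1 / n2))

def z_substitution(x1, n1, x2, n2):
    """Method of substitution: p1 and p2 replace P1 and P2 separately."""
    p1, p2 = x1 / n1, x2 / n2
    return (p1 - p2) / sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

# 20% of 900 boys in city A and 18.5% of 1600 boys in city B have the defect
z_pool = z_pooled(180, 900, 296, 1600)
z_sub = z_substitution(180, 900, 296, 1600)
print(f"pooled Z = {z_pool:.3f}, substitution Z = {z_sub:.3f}")
```

With large samples the two statistics are usually close, so both methods tend to lead to the same decision.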

Example 1
Random samples of 400 men and 600 women were asked whether they
would like to have a flyover near their residence. 200 men and 325
women were in favour of the proposal. Test the hypothesis that propor-
tions of men and women in favour of the proposal are same at 5% level
of significance.
Solution
n1 = 400, n2 = 600
p1 = Proportion of men in favour = 200/400 = 0.5
p2 = Proportion of women in favour = 325/600 = 0.5417
$$p = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = \frac{200 + 325}{400 + 600} = 0.525$$
q = 1 – p = 1 – 0.525 = 0.475
(i) Null Hypothesis H0: P1 = P2, i.e., there is no significant difference in the
proportions of men and women in favour of the proposal.
(ii) Alternative Hypothesis H1: P1 ≠ P2 (Two tailed test)
(iii) Level of significance: α = 0.05
(iv) Test statistic:
$$Z = \frac{p_1 - p_2}{\sqrt{pq\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{0.5 - 0.5417}{\sqrt{(0.525)(0.475)\left(\dfrac{1}{400} + \dfrac{1}{600}\right)}} = -1.29$$
|Z| = 1.29
(v) Critical value: $|Z_{0.05}| = 1.96$
(vi) Decision: Since $|Z| < |Z_{0.05}|$, the null hypothesis is accepted at 5% level of
significance, i.e., there is no significant difference of opinion between
men and women in favour of the proposal.

Example 2
In a city A, 20% of a random sample of 900 school boys has a certain
slight physical defect. In another city B, 18.5% of a random sample
of 1600 school boys has the same defect. Is the difference between the
proportions significant at 0.05 level of significance?
Solution
n1 = 900, n2 = 1600
p1 = Proportion of school boys with the defect in city A = 0.2
p2 = Proportion of school boys with the defect in city B = 0.185
$$p = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = \frac{(900)(0.2) + (1600)(0.185)}{900 + 1600} = 0.1904$$
q = 1 – p = 1 – 0.1904 = 0.8096
(i) Null Hypothesis H0: P1 = P2, i.e., there is no significant difference in the
proportions for the school boys of the two cities.
(ii) Alternative Hypothesis H1: P1 ≠ P2 (Two tailed test)
(iii) Level of significance: α = 0.05
(iv) Test statistic:
$$Z = \frac{p_1 - p_2}{\sqrt{pq\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{0.2 - 0.185}{\sqrt{(0.1904)(0.8096)\left(\dfrac{1}{900} + \dfrac{1}{1600}\right)}} = 0.916$$
|Z| = 0.916
(v) Critical value: $|Z_{0.05}| = 1.96$
(vi) Decision: Since $|Z| < |Z_{0.05}|$, the null hypothesis is accepted at 5% level of
significance, i.e., there is no significant difference between the proportions for the
school boys of the two cities.

Example 3
Before an increase in excise duty on tea, 800 people out of a sample of
1000 were consumers of tea. After an increase in excise duty, 800 people
were consumers of tea in a sample of 1200 persons. Find whether there
is significant decrease in the consumption of tea after the increase in
duty.
Solution
n1 = 1000, n2 = 1200
p1 = Proportion of consumers of tea before increase in excise duty = 800/1000 = 0.8
p2 = Proportion of consumers of tea after increase in excise duty = 800/1200 = 0.67
$$p = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = \frac{(1000)(0.8) + (1200)(0.67)}{1000 + 1200} = 0.73$$
q = 1 – p = 1 – 0.73 = 0.27
(i) Null Hypothesis H0: P1 = P2, i.e., there is no significant decrease in the
consumption of tea after the increase in duty.
(ii) Alternative Hypothesis H1: P1 > P2 (Right tailed test)
(iii) Level of significance: α = 0.05 (assumption)
(iv) Test statistic:
$$Z = \frac{p_1 - p_2}{\sqrt{pq\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{0.8 - 0.67}{\sqrt{(0.73)(0.27)\left(\dfrac{1}{1000} + \dfrac{1}{1200}\right)}} = 6.84$$
|Z| = 6.84
(v) Critical value: $Z_{0.05} = 1.645$
(vi) Decision: Since $|Z| > Z_{0.05}$, the null hypothesis is rejected at 5% level of
significance, i.e., there is a significant decrease in the consumption of tea after the
increase in duty.

Example 4
15.5% of a random sample of 1600 undergraduates were smokers, whereas
20% of a random sample of 900 postgraduates were smokers in a state.
Can we conclude that a smaller proportion of undergraduates than
postgraduates are smokers?
Solution
n1 = 1600, n2 = 900
p1 = Proportion of undergraduate smokers = 0.155
p2 = Proportion of postgraduate smokers = 0.2
$$p = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = \frac{(1600)(0.155) + (900)(0.2)}{1600 + 900} = 0.1712$$
q = 1 – p = 1 – 0.1712 = 0.8288
(i) Null Hypothesis H0: P1 = P2, i.e., there is no significant difference in the
proportions of undergraduate and postgraduate smokers.
(ii) Alternative Hypothesis H1: P1 < P2 (Left tailed test)
(iii) Level of significance: α = 0.05 (assumption)
(iv) Test statistic:
$$Z = \frac{p_1 - p_2}{\sqrt{pq\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{0.155 - 0.2}{\sqrt{(0.1712)(0.8288)\left(\dfrac{1}{1600} + \dfrac{1}{900}\right)}} = -2.87$$
|Z| = 2.87
(v) Critical value: $Z_{0.05} = 1.645$
(vi) Decision: Since $|Z| > Z_{0.05}$, the null hypothesis is rejected at 5% level of
significance, i.e., the proportion of smokers among undergraduates is less than that among postgraduates.

Example 5
A machine produced 20 defective articles in a batch of 400. After
overhauling it produced 10 defective articles in a batch of 300. Has the
machine improved?
Solution
n1 = 400, n2 = 300
p1 = Proportion of defective articles before overhauling = 20/400 = 0.05
p2 = Proportion of defective articles after overhauling = 10/300 = 0.033
$$p = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = \frac{(400)(0.05) + (300)(0.033)}{400 + 300} = 0.043$$
q = 1 – p = 1 – 0.043 = 0.957
(i) Null Hypothesis H0: P1 = P2, i.e., the proportions of defective articles before
and after overhauling are equal.
(ii) Alternative Hypothesis H1: P1 > P2 (Right tailed test)
(iii) Level of significance: α = 0.05 (assumption)
(iv) Test statistic:
$$Z = \frac{p_1 - p_2}{\sqrt{pq\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{0.05 - 0.033}{\sqrt{(0.043)(0.957)\left(\dfrac{1}{400} + \dfrac{1}{300}\right)}} = 1.097$$
|Z| = 1.097
(v) Critical value: $Z_{0.05} = 1.645$
(vi) Decision: Since $|Z| < Z_{0.05}$, the null hypothesis is accepted at 5% level of
significance, i.e., the proportions of defective articles before and after overhauling do not
differ significantly and the machine has not improved.

Example 6
In two large populations, there are 30% and 25% fair haired people
respectively. Is this difference likely to be hidden in samples of 1200 and
900 respectively from the two populations?
Solution
     n1 = 1200, n2 = 900
P1 = Proportion of fair haired people in the first population = 0.3
Q1 = 1 – P1 = 1 – 0.3 = 0.7
P2 = Proportion of fair haired people in the second population = 0.25
Q2 = 1 – P2 = 1 – 0.25 = 0.75
(i) Null Hypothesis H0: P1 = P2, i.e., the difference in population proportions is
likely to be hidden in sampling.
(ii) Alternative Hypothesis H1: P1 ≠ P2 (Two tailed test)
(iii) Level of significance: α = 0.05 (assumption)
(iv) Test statistic:
$$Z = \frac{P_1 - P_2}{\sqrt{\dfrac{P_1 Q_1}{n_1} + \dfrac{P_2 Q_2}{n_2}}} = \frac{0.3 - 0.25}{\sqrt{\dfrac{(0.3)(0.7)}{1200} + \dfrac{(0.25)(0.75)}{900}}} = 2.55$$
|Z| = 2.55
(v) Critical value: $|Z_{0.05}| = 1.96$
(vi) Decision: Since $|Z| > |Z_{0.05}|$, the null hypothesis is rejected at 5% level of
significance, i.e., the difference in population proportions is not likely to be
hidden in sampling.

Example 7
A random sample of 300 shoppers at a supermarket includes 204 who
regularly use cents-off coupons. Another sample of 500 shoppers at
a supermarket includes 75 who regularly use cents-off coupons. Obtain
95% confidence limits for the difference in the population proportions.
Solution
n1 = 300, n2 = 500
p1 = Proportion of shoppers who use cents-off coupons in the first sample = 204/300 = 0.68
q1 = 1 – p1 = 1 – 0.68 = 0.32
p2 = Proportion of shoppers who use cents-off coupons in the second sample = 75/500 = 0.15
q2 = 1 – p2 = 1 – 0.15 = 0.85
$$SE = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}} = \sqrt{\frac{(0.68)(0.32)}{300} + \frac{(0.15)(0.85)}{500}} = 0.031$$
The 95% confidence limits for the difference in population proportions are
$$(p_1 - p_2) - 1.96\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}, \quad (p_1 - p_2) + 1.96\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$$
i.e., (0.68 – 0.15) – 1.96 (0.031), (0.68 – 0.15) + 1.96 (0.031)
i.e., (0.469, 0.591)

Exercise 6.1
1. A manufacturer claims at least 95% of the items he produces are failure
free. Examinations of a random sample of 600 items showed 39 to be
defective. Test the claim at a significance level of 0.05.
[Ans.: Claim is rejected]
2. In a sample of 400 parts manufactured by a factory, the number of
defective parts was found to be 30. The company, however, claims that
only 5% of their product is defective. Is the claim tenable?
[Ans.: Claim is rejected]
3. A sample of 600 persons selected at random from a large city shows
that the percentage of males in the sample is 53%. It is believed that
the ratio of males to the total population in the city is 1/2. Test whether this
belief is confirmed by the observation.
[Ans.: Belief is confirmed by the observation]
4. In a sample of 1000 people in Karnataka, 540 are rice eaters and the
rest are wheat eaters. Can we assume that both rice and wheat are
equally popular in this state at 1% level of significance?
[Ans.: Both rice and wheat are equally popular in state]
5. In a big city 325 men out of 600 men were found to be smokers. Does
this information support the conclusion that the majority of men in this
city are smokers?
[Ans.: Majority of men in the city are smokers]
6. A die was thrown 400 times and ‘six’ resulted 80 times. Do the data
justify the hypothesis of an unbiased die?
[Ans.: The die is unbiased]
7. In a random sample of 125 cold drinkers, 68 said they prefer ‘Thumsup’
to ‘Pepsi’. Test the null hypothesis P = 0.5 against the alternative
hypothesis P > 0.5.
[Ans.: Null hypothesis is accepted]
8. A social worker believes that fewer than 25% of the couples in a certain
area have ever used any form of birth control. A random sample of 120
couples was contacted. Twenty of them said they have used. Test the
belief of the social worker at 0.05 level.
[Ans.: Belief of the social worker is true]
9. 20 people were attacked by a disease and only 18 survived. Will you
reject the hypothesis that the survival rate, if attacked by this disease,
is 85% in favour of the hypothesis that it is more, at 5% level?
[Ans.: The hypothesis is accepted]
10. A manufacturer of electronic equipment subjects samples of two
competing brands of transistors to an accelerated performance test. If
45 of 180 transistors of the first kind and 34 of 120 transistors of the second
kind fail the test, what can we conclude at the level of significance
α = 0.05 about the difference between the corresponding sample
proportions?
[Ans.: The difference between the proportions is not significant]
11. On the basis of their total scores, 200 candidates of a civil service
examination are divided into two groups, the upper 30% and the
remaining 70%. Consider the first question of the examination. Among
the first group, 40 had the correct answer, whereas among the second
group, 80 had the correct answer. On the basis of these results, can one
conclude that the first question is not good at discriminating ability of
the type being examined here?
[Ans.: The first question is good enough at discriminating
ability of the type being examined]

12. A company wanted to introduce a new plan of work and a survey was
conducted for this purpose. Out of a sample of 500 workers in one
group, 62% favoured the new plan, and in another sample of
400 workers, 41% were against the new plan. Is there any significant
difference between the two groups in their attitude towards the new
plan at 5% level of significance?
[Ans.: There is no significant difference between the
two groups in their attitude towards the new plan]
13. In a random sample of 1000 persons from town A, 400 are found to be
consumers of wheat. In a sample of 800 from town B, 400 are found to
be consumers of wheat. Do these data reveal a significant difference
between town A and town B, so far as the proportion of wheat consumers
is concerned?
[Ans.: There is significant difference between town A and town B
as the proportion of wheat consumers is concerned]
14. 100 articles from a factory are examined and 10 are found to be
defective. Out of 500 similar articles from a second factory, 15 are
found to be defective. Test the significance of the difference between the
two proportions at 5% level.
[Ans.: There is a significant difference between the two proportions]

6.7 Test of Significance for Single Mean — Large Samples
Let a random sample of size n (n > 30) have the sample mean x̄, and let the population
have the mean μ. Also, suppose the population mean μ has a specified value μ0.
Working Rule
(i) Null Hypothesis H0: μ = μ0, i.e., the population mean μ has the specified
value μ0.
(ii) Alternative Hypothesis H1: μ ≠ μ0 (Two tailed test)
or H1: μ > μ0 (One tailed test)
or H1: μ < μ0 (One tailed test)
(iii) Level of significance: Select the level of significance α.
(iv) Test statistic: There are two cases for calculating the test statistic Z.
(a) When the standard deviation σ of the population is known
$$Z = \frac{\bar{x} - \mu}{\left(\dfrac{\sigma}{\sqrt{n}}\right)}$$
(b) When the standard deviation σ of the population is not known
$$Z = \frac{\bar{x} - \mu}{\left(\dfrac{s}{\sqrt{n}}\right)}$$
where s is the sample SD.
(v) Critical value: Find the critical value (tabulated value) $Z_\alpha$ of Z at the given level
of significance α.
(vi) Decision: If $|Z| < Z_\alpha$ at the level of significance α, the null hypothesis is
accepted. If $|Z| > Z_\alpha$ at the level of significance α, the null hypothesis is
rejected.
Note
1. Null Hypothesis H0 is rejected when |Z| > 3 without mentioning any level of
significance.
2. Confidence limits:
(i) 95% confidence limits = $\bar{x} \pm 1.96\left(\dfrac{\sigma}{\sqrt{n}}\right)$
(ii) 99% confidence limits = $\bar{x} \pm 2.58\left(\dfrac{\sigma}{\sqrt{n}}\right)$
If the standard deviation σ of the population is not known, the sample SD s is used in these expressions.
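
The working rule and the confidence limits above translate directly into a few lines of code. The sketch below is an addition for illustration only (the names `one_mean_z` and `mean_ci` are hypothetical); it uses the data of Examples 1 and 4 of this section and takes the standard normal quantiles from `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

def one_mean_z(xbar, mu0, sd, n):
    """Z statistic for H0: mu = mu0; sd is sigma if known, else the sample SD s."""
    return (xbar - mu0) / (sd / sqrt(n))

def mean_ci(xbar, sd, n, level=0.95):
    """Confidence limits xbar ± z * sd / sqrt(n) for a large sample."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 when level = 0.95
    half_width = z * sd / sqrt(n)
    return xbar - half_width, xbar + half_width

# Example 1: n = 100, sample mean 71.8 years, SD 8.9 years, H0: mu = 70
z = one_mean_z(71.8, 70, 8.9, 100)
z_crit = NormalDist().inv_cdf(1 - 0.05 / 2)     # two tailed test at alpha = 0.05
print(f"Z = {z:.2f}; reject H0: {abs(z) > z_crit}")

# Example 4: n = 900, sample mean 3.4 cm, SD 2.61 cm
low, high = mean_ci(3.4, 2.61, 900)
print(f"95% limits: {low:.4f} and {high:.4f}")
```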

Example 1
A random sample of 100 Indians has an average life span of 71.8 years
with standard deviation of 8.9 years. Can it be concluded that the
average life span of an Indian is 70 years?
Solution
n = 100, x̄ = 71.8 years, μ = 70 years, s = 8.9 years
(i) Null Hypothesis H0: μ = 70 years, i.e., the average life span of an Indian is
70 years.
(ii) Alternative Hypothesis H1: μ ≠ 70 years (Two tailed test)
(iii) Level of significance: α = 0.05 (assumption)
(iv) Test statistic:
$$Z = \frac{\bar{x} - \mu}{\left(\dfrac{s}{\sqrt{n}}\right)} = \frac{71.8 - 70}{\left(\dfrac{8.9}{\sqrt{100}}\right)} = 2.02$$
|Z| = 2.02
(v) Critical value: $|Z_{0.05}| = 1.96$
(vi) Decision: Since $|Z| > |Z_{0.05}|$, the null hypothesis is rejected at 5% level of
significance, i.e., the average life span of an Indian is not 70 years.

Example 2
A random sample of 50 items gives the mean 6.2 and variance 10.24.
Can it be regarded as drawn from a normal population with mean 5.4 at
5% level of significance?

Solution
n = 50, x̄ = 6.2, μ = 5.4, s = √10.24 = 3.2
(i) Null Hypothesis H0: μ = 5.4, i.e., the sample is drawn from a normal
population with mean 5.4.
(ii) Alternative Hypothesis H1: μ ≠ 5.4 (Two tailed test)
(iii) Level of significance: α = 0.05
(iv) Test statistic:
$$Z = \frac{\bar{x} - \mu}{\left(\dfrac{s}{\sqrt{n}}\right)} = \frac{6.2 - 5.4}{\left(\dfrac{3.2}{\sqrt{50}}\right)} = 1.77$$
|Z| = 1.77
(v) Critical value: $|Z_{0.05}| = 1.96$
(vi) Decision: Since $|Z| < |Z_{0.05}|$, the null hypothesis is accepted at 5% level of
significance, i.e., the sample is drawn from a normal population with mean 5.4.

Example 3
A random sample of 400 members is found to have a mean of 4.45 cm.
Can it be reasonably regarded as a sample from a large population
whose mean is 5 cm and variance is 4 cm²?
Solution
n = 400, x̄ = 4.45 cm, μ = 5 cm, σ = √4 = 2 cm
(i) Null Hypothesis H0: μ = 5 cm, i.e., the sample is drawn from a large population
with mean 5 cm.
(ii) Alternative Hypothesis H1: μ ≠ 5 cm (Two tailed test)
(iii) Level of significance: α = 0.05 (assumption)
(iv) Test statistic:
$$Z = \frac{\bar{x} - \mu}{\left(\dfrac{\sigma}{\sqrt{n}}\right)} = \frac{4.45 - 5}{\left(\dfrac{2}{\sqrt{400}}\right)} = -5.5$$
|Z| = 5.5
(v) Critical value: $|Z_{0.05}| = 1.96$
(vi) Decision: Since $|Z| > |Z_{0.05}|$, the null hypothesis is rejected at 5% level of
significance, i.e., the sample is not drawn from the large population with mean 5 cm.

Example 4
A sample of 900 members has a mean of 3.4 cm and SD 2.61 cm. Is the
sample from a large population of mean 3.25 cm and SD 2.61 cm? If
the population is normal and its mean is unknown, find the 95% fiducial
limits of its true mean.
Solution
n = 900, x̄ = 3.4 cm, s = 2.61 cm, μ = 3.25 cm, σ = 2.61 cm
(i) Null Hypothesis H0: μ = 3.25 cm, i.e., the sample has been drawn from the
population with mean μ = 3.25 cm and SD = 2.61 cm.
(ii) Alternative Hypothesis H1: μ ≠ 3.25 cm (Two tailed test)
(iii) Level of significance: α = 0.05
(iv) Test statistic:
$$Z = \frac{\bar{x} - \mu}{\left(\dfrac{\sigma}{\sqrt{n}}\right)} = \frac{3.4 - 3.25}{\left(\dfrac{2.61}{\sqrt{900}}\right)} = 1.72$$
|Z| = 1.72
(v) Critical value: $|Z_{0.05}| = 1.96$
(vi) Decision: Since $|Z| < |Z_{0.05}|$, the null hypothesis is accepted at 5% level of
significance, i.e., the sample has been drawn from the population with mean
μ = 3.25 cm.
95% fiducial limits:
$$\bar{x} \pm 1.96\left(\frac{s}{\sqrt{n}}\right) = 3.4 \pm 1.96\left(\frac{2.61}{\sqrt{900}}\right) = 3.4 \pm 0.1705$$
i.e., 3.2295 and 3.5705

Example 5
A tyre company claims that the lives of tyres have mean 42000 km
with s.d. of 4000 km. A change in the production process is believed to
result in better product. A test sample of 81 new tyres has a mean life of
42500 km. Test at 5% level of significance that the new product is
significantly better than the old one.
Solution
n = 81, x̄ = 42500 km, μ = 42000 km, σ = 4000 km
(i) Null Hypothesis H0: μ = 42000 km, i.e., the new product is not significantly
better than the old one.
(ii) Alternative Hypothesis H1: μ > 42000 km (Right tailed test)
(iii) Level of significance: α = 0.05
(iv) Test statistic:
$$Z = \frac{\bar{x} - \mu}{\left(\dfrac{\sigma}{\sqrt{n}}\right)} = \frac{42500 - 42000}{\left(\dfrac{4000}{\sqrt{81}}\right)} = 1.125$$
|Z| = 1.125
(v) Critical value: $Z_{0.05} = 1.645$
(vi) Decision: Since $|Z| < Z_{0.05}$, the null hypothesis is accepted at 5% level of
significance, i.e., the new product is not significantly better than the old one.

Example 6
The mean breaking strength of cables supplied by a manufacturer is 1800
with standard deviation 100. By a new technique in the manufacturing
process it is claimed that the breaking strength of the cable has increased.
In order to test the claim a sample of 50 cables is tested. It is found that
the mean breaking strength is 1850. Can we support the claim at 1%
level of significance?
Solution
n = 50, x̄ = 1850, μ = 1800, σ = 100
(i) Null Hypothesis H0: μ = 1800, i.e., the mean breaking strength of cables
supplied by the manufacturer is 1800.
(ii) Alternative Hypothesis H1: μ > 1800 (Right tailed test)
(iii) Level of significance: α = 0.01
(iv) Test statistic:
$$Z = \frac{\bar{x} - \mu}{\left(\dfrac{\sigma}{\sqrt{n}}\right)} = \frac{1850 - 1800}{\left(\dfrac{100}{\sqrt{50}}\right)} = 3.54$$
|Z| = 3.54
(v) Critical value: $Z_{0.01} = 2.33$
(vi) Decision: Since $|Z| > Z_{0.01}$, the null hypothesis is rejected at 1% level of
significance, i.e., the mean breaking strength of cables supplied is more than
1800.

Example 7
An ambulance service claims that it takes on the average 10 minutes
to reach its destination in emergency calls. A sample of 36 calls has a
mean of 11 minutes and a variance of 16 minutes². Test the claim at
0.05 level of significance.
Solution
n = 36, x̄ = 11 minutes, μ = 10 minutes, s = √16 = 4 minutes
(i) Null Hypothesis H0: μ = 10 minutes, i.e., the ambulance service takes 10 minutes
to reach the destination.
(ii) Alternative Hypothesis H1: μ > 10 minutes (Right tailed test)
(iii) Level of significance: α = 0.05
(iv) Test statistic:
$$Z = \frac{\bar{x} - \mu}{\left(\dfrac{s}{\sqrt{n}}\right)} = \frac{11 - 10}{\left(\dfrac{4}{\sqrt{36}}\right)} = 1.5$$
|Z| = 1.5
(v) Critical value: $Z_{0.05} = 1.645$
(vi) Decision: Since $|Z| < Z_{0.05}$, the null hypothesis is accepted at 5% level of
significance, i.e., the ambulance service takes on the average 10 minutes to reach its
destination.

6.8 Test of Significance for Difference of Means — Large Samples

Let x̄1 and x̄2 be the sample means of two independent large random samples with sizes
n1 and n2 (n1 > 30, n2 > 30) drawn from two populations with means μ1 and μ2 and
standard deviations σ1 and σ2.
Working Rule
(i) Null Hypothesis H0: μ1 = μ2, i.e., the two samples have been drawn from
two different populations having the same means and equal standard deviations.
(ii) Alternative Hypothesis H1: μ1 ≠ μ2 (Two tailed test)
or H1: μ1 < μ2 (One tailed test)
or H1: μ1 > μ2 (One tailed test)
(iii) Level of significance: Select the level of significance α.
(iv) Test statistic: There are two cases for calculating the test statistic.
(a) When the population standard deviations σ1 and σ2 are known
$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
(b) When the population standard deviations σ1 and σ2 are not known
$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
where s1 and s2 are sample standard deviations.
(v) Critical value: Find the critical value (tabulated value) $Z_\alpha$ of Z at the given
level of significance.
(vi) Decision: If $|Z| < Z_\alpha$ at the level of significance α, the null hypothesis is
accepted. If $|Z| > Z_\alpha$ at the level of significance α, the null hypothesis is
rejected.
Note
1. Null Hypothesis H0 is rejected when |Z| > 3 without mentioning any level of
significance.
2. Confidence limits:
(i) 95% confidence limits = $(\bar{x}_1 - \bar{x}_2) \pm 1.96\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$
(ii) 99% confidence limits = $(\bar{x}_1 - \bar{x}_2) \pm 2.58\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$
If the population standard deviations σ1 and σ2 are not known, s1 and s2 are used in
these expressions.
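
A compact way to see the working rule in action is to code the test statistic once and reuse it. The following Python sketch is an added illustration, not part of the original text; `two_mean_z` is a hypothetical name, and the data are those of Example 4 of this section (32 boys with mean 72 and SD 8, 36 girls with mean 70 and SD 6).

```python
from math import sqrt
from statistics import NormalDist

def two_mean_z(x1, sd1, n1, x2, sd2, n2):
    """Z statistic for H0: mu1 = mu2 using sqrt(sd1^2/n1 + sd2^2/n2) as standard error."""
    se = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (x1 - x2) / se

# Boys: mean 72, SD 8, n = 32. Girls: mean 70, SD 6, n = 36.
z = two_mean_z(72, 8, 32, 70, 6, 36)
z_crit = NormalDist().inv_cdf(1 - 0.01)   # right tailed critical value at alpha = 0.01

print(f"Z = {z:.4f}, critical value = {z_crit:.2f}")
print("reject H0" if z > z_crit else "accept H0")
```

The arguments `sd1` and `sd2` stand for σ1 and σ2 when these are known and for the sample values s1 and s2 otherwise, mirroring cases (a) and (b) of the working rule.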

Example 1
Test the significance of the difference between the means of two normal
populations with the same standard deviation from the following data:
             Size   Mean   SD
Sample I     100    64     6
Sample II    200    67     8

Solution
n1 = 100, n2 = 200, x̄1 = 64, x̄2 = 67, s1 = 6, s2 = 8
(i) Null Hypothesis H0: μ1 = μ2, i.e., there is no significant difference between
the two means.
(ii) Alternative Hypothesis H1: μ1 ≠ μ2 (Two tailed test)
(iii) Level of significance: α = 0.05 (assumption)
(iv) Test statistic: Since the two populations have the same standard deviation, the
common variance is estimated by pooling, $\hat{\sigma}^2 = \dfrac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2}$, so that
$\hat{\sigma}^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right) = \dfrac{s_1^2}{n_2} + \dfrac{s_2^2}{n_1}$ and
$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_2} + \dfrac{s_2^2}{n_1}}} = \frac{64 - 67}{\sqrt{\dfrac{(6)^2}{200} + \dfrac{(8)^2}{100}}} = -3.31$$
|Z| = 3.31
(v) Critical value: $|Z_{0.05}| = 1.96$
(vi) Decision: Since $|Z| > |Z_{0.05}|$, the null hypothesis is rejected at 5% level of
significance, i.e., the samples do not support the hypothesis that the two
populations have the same mean although they may have the same standard
deviation.

Example 2
The means of simple samples of sizes 1000 and 2000 are 67.5 cm and 68 cm
respectively. Can the samples be regarded as drawn from the same
population of SD 2.5 cm?
Solution
n1 = 1000, n2 = 2000, x̄1 = 67.5 cm, x̄2 = 68 cm, σ = 2.5 cm
(i) Null Hypothesis H0: μ1 = μ2, i.e., the samples have been drawn from the
same population of SD 2.5 cm.
(ii) Alternative Hypothesis H1: μ1 ≠ μ2 (Two tailed test)
(iii) Level of significance: α = 0.05 (assumption)
(iv) Test statistic:
$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma^2}{n_1} + \dfrac{\sigma^2}{n_2}}} = \frac{67.5 - 68}{\sqrt{\dfrac{(2.5)^2}{1000} + \dfrac{(2.5)^2}{2000}}} = -5.16$$
|Z| = 5.16
(v) Critical value: $|Z_{0.05}| = 1.96$
(vi) Decision: Since $|Z| > |Z_{0.05}|$, the null hypothesis is rejected at 5% level of
significance, i.e., the samples cannot be regarded as drawn from the same
population of SD 2.5 cm.

Example 3
The mean life of a sample of 10 electric bulbs was found to be 1456
hours with SD of 423 hours. A second sample of 17 bulbs chosen from
a different batch showed a mean life of 1280 with SD of 398 hours. Is
there a significant difference between the means of two batches?
Solution
n1 = 10, n2 = 17, x̄1 = 1456 hours, x̄2 = 1280 hours, s1 = 423 hours, s2 = 398 hours
(i) Null Hypothesis H0: μ1 = μ2, i.e., there is no significant difference between
the means of two batches.
(ii) Alternative Hypothesis H1: μ1 ≠ μ2 (Two tailed test)
(iii) Level of significance: α = 0.05 (assumption)
(iv) Test statistic:
$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{1456 - 1280}{\sqrt{\dfrac{(423)^2}{10} + \dfrac{(398)^2}{17}}} = 1.07$$
|Z| = 1.07
(v) Critical value: $|Z_{0.05}| = 1.96$
(vi) Decision: Since $|Z| < |Z_{0.05}|$, the null hypothesis is accepted at 5% level of
significance, i.e., there is no significant difference between the means of
two batches.

Example 4
The average of marks scored by 32 boys is 72 with standard deviation 8
while that of 36 girls is 70 with standard deviation 6. Test at 1% level of
significance whether the boys perform better than the girls.
Solution
n1 = 32, n2 = 36, x̄1 = 72, x̄2 = 70, s1 = 8, s2 = 6
(i) Null Hypothesis H0: μ1 = μ2, i.e., there is no significant difference between the
performance of boys and girls.
(ii) Alternative Hypothesis H1: μ1 > μ2 (Right tailed test)
(iii) Level of significance: α = 0.01
(iv) Test statistic:
$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{72 - 70}{\sqrt{\dfrac{(8)^2}{32} + \dfrac{(6)^2}{36}}} = 1.1547$$
|Z| = 1.1547
(v) Critical value: $Z_{0.01} = 2.33$
(vi) Decision: Since $|Z| < Z_{0.01}$, the null hypothesis is accepted at 1% level of
significance, i.e., the boys do not perform better than the girls.

Example 5
A simple sample of heights of 6400 English men has a mean of 170 cm
and a s.d. of 6.4 cm, while a simple sample of heights of 1600 Americans
has a mean of 172 cm and a s.d. of 6.3 cm. Do the data indicate that
Americans are, on the average, taller than the English men?
Solution
n1 = 1600, n2 = 6400, x̄1 = 172 cm, x̄2 = 170 cm, s1 = 6.3 cm, s2 = 6.4 cm
(i) Null Hypothesis H0: μ1 = μ2, i.e., there is no significant difference in heights of
Americans and English men.
(ii) Alternative Hypothesis H1: μ1 > μ2 (Right tailed test)
(iii) Level of significance: α = 0.01 (assumption)
(iv) Test statistic:
$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{172 - 170}{\sqrt{\dfrac{(6.3)^2}{1600} + \dfrac{(6.4)^2}{6400}}} = 11.32$$
|Z| = 11.32
(v) Critical value: $Z_{0.01} = 2.33$
(vi) Decision: Since $|Z| > Z_{0.01}$, the null hypothesis is rejected at 1% level of
significance, i.e., Americans are, on the average, taller than English men.

Example 6
In a certain factory there are two different processes of manufacturing
the same item. The average weight in a sample of 250 items produced
from one process is found to be 120 gm with a s.d. of 12 gm; the
corresponding figures in a sample of 400 items from the other process
are 124 gm and 14 gm. Is this difference between the two sample means
significant?
Solution
n1 = 250, n2 = 400, x̄1 = 120 gm, x̄2 = 124 gm, s1 = 12 gm, s2 = 14 gm
(i) Null Hypothesis H0: μ1 = μ2, i.e., there is no significant difference between the
two sample means.
(ii) Alternative Hypothesis H1: μ1 ≠ μ2 (Two tailed test)
(iii) Level of significance: α = 0.05 (assumption)
(iv) Test statistic:
$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{120 - 124}{\sqrt{\dfrac{(12)^2}{250} + \dfrac{(14)^2}{400}}} = -3.87$$
|Z| = 3.87
(v) Critical value: $|Z_{0.05}| = 1.96$
(vi) Decision: Since $|Z| > |Z_{0.05}|$, the null hypothesis is rejected at 5% level of
significance, i.e., there is a significant difference between the two sample means.

Example 7
The mean height of 50 male students who participate in sports is 68.2
inches with a s.d. of 2.5 inches. The mean height of 50 male students who
have not participated in sport is 67.2 inches with a s.d. of 2.8 inches.
Test the hypothesis that the height of students who have participated in
sports is more than the students who have not participated in sports.
Solution
n1 = 50, n2 = 50, x̄1 = 68.2 inch, x̄2 = 67.2 inch, s1 = 2.5 inch, s2 = 2.8 inch
(i) Null Hypothesis H0: μ1 = μ2, i.e., there is no significant difference in heights of
students who have participated in sports or not.
(ii) Alternative Hypothesis H1: μ1 > μ2 (Right tailed test)
(iii) Level of significance: α = 0.05 (assumption)
(iv) Test statistic:
$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{68.2 - 67.2}{\sqrt{\dfrac{(2.5)^2}{50} + \dfrac{(2.8)^2}{50}}} = 1.88$$
|Z| = 1.88
(v) Critical value: $Z_{0.05} = 1.645$
(vi) Decision: Since $|Z| > Z_{0.05}$, the null hypothesis is rejected at 5% level of
significance, i.e., the height of students who have participated in sports is more
than that of the students who have not participated in sports.

6.9 Test of Significance for Difference of Standard Deviations — Large Samples

Let s1 and s2 be the standard deviations of two independent large random samples with
sizes n1 and n2 (n1 > 30, n2 > 30) drawn from two populations with standard deviations
s1 and s2.
Working Rule

(i) Null Hypothesis $H_0: \sigma_1 = \sigma_2$, i.e., the two samples have been drawn from two populations having the same standard deviation.
(ii) Alternative Hypothesis $H_1: \sigma_1 \neq \sigma_2$ (Two tailed test)
or $H_1: \sigma_1 < \sigma_2$ (One tailed test)
or $H_1: \sigma_1 > \sigma_2$ (One tailed test)
(iii) Level of significance: Select the level of significance $\alpha$.
(iv) Test statistic: There are two cases for calculating the test statistic.
(a) When the population standard deviations $\sigma_1$ and $\sigma_2$ are known
$$Z = \frac{s_1 - s_2}{\sqrt{\dfrac{\sigma_1^2}{2n_1} + \dfrac{\sigma_2^2}{2n_2}}}$$
(b) When the population standard deviations $\sigma_1$ and $\sigma_2$ are not known
$$Z = \frac{s_1 - s_2}{\sqrt{\dfrac{s_1^2}{2n_1} + \dfrac{s_2^2}{2n_2}}}$$
where $s_1$ and $s_2$ are the sample standard deviations.
(v) Critical value: Find the critical value (tabulated value) $Z_\alpha$ of Z at the given level of significance.
(vi) Decision: If $|Z| < Z_\alpha$ at the level of significance $\alpha$, the null hypothesis is accepted. If $|Z| > Z_\alpha$ at the level of significance $\alpha$, the null hypothesis is rejected.
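
The working rule for case (b) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `sd_difference_z` and `decide` are hypothetical, and the default critical value 1.96 assumes a two tailed test at the 5% level.

```python
import math

def sd_difference_z(s1, n1, s2, n2):
    """Large-sample Z for the difference of two sample SDs (population SDs unknown)."""
    standard_error = math.sqrt(s1**2 / (2 * n1) + s2**2 / (2 * n2))
    return (s1 - s2) / standard_error

def decide(z, z_critical=1.96):
    """Two tailed decision: compare |Z| with the tabulated critical value."""
    return "reject H0" if abs(z) > z_critical else "accept H0"

# Illustrative inputs: sample SDs 5 and 7 from samples of sizes 100 and 200
z = sd_difference_z(5, 100, 7, 200)
print(round(z, 2), decide(z))
```

For a one tailed test, the critical value passed to `decide` would have to be replaced by the corresponding one tailed tabulated value.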

Example 1
The SD of a random sample of 1000 is found to be 2.6 and the SD
of another random sample of 500 is 2.7. Assuming the samples to
be independent, find whether the two samples could have come from
populations with the same SD.
Solution
$n_1 = 1000$, $n_2 = 500$, $s_1 = 2.6$, $s_2 = 2.7$

(i) Null Hypothesis $H_0: \sigma_1 = \sigma_2$, i.e., there is no significant difference between two standard deviations.
(ii) Alternative Hypothesis $H_1: \sigma_1 \neq \sigma_2$ (Two tailed test)
(iii) Level of significance: $\alpha = 0.05$ (assumption)
(iv) Test statistic:
$$Z = \frac{s_1 - s_2}{\sqrt{\dfrac{s_1^2}{2n_1} + \dfrac{s_2^2}{2n_2}}} = \frac{2.6 - 2.7}{\sqrt{\dfrac{(2.6)^2}{2(1000)} + \dfrac{(2.7)^2}{2(500)}}} = -0.97$$
$|Z| = 0.97$
(v) Critical value: $Z_{0.05} = 1.96$
(vi) Decision: Since $|Z| < Z_{0.05}$, the null hypothesis $H_0$ is accepted at 5% level of significance, i.e., there is no significant difference between two standard deviations and the two samples could have come from populations with the same SD.

Example 2
Random samples drawn from two countries gave the following data
relating to the heights of adult males:
                                  Country A    Country B
Standard deviation (in inches)    2.58         2.50
Number in samples                 1000         1200

Is the difference between the standard deviations significant?


Solution
$n_1 = 1000$, $n_2 = 1200$, $s_1 = 2.58$ inch, $s_2 = 2.50$ inch

(i) Null Hypothesis $H_0: \sigma_1 = \sigma_2$, i.e., there is no significant difference between two standard deviations.
(ii) Alternative Hypothesis $H_1: \sigma_1 \neq \sigma_2$ (Two tailed test)
(iii) Level of significance: $\alpha = 0.05$ (assumption)
(iv) Test statistic:
$$Z = \frac{s_1 - s_2}{\sqrt{\dfrac{s_1^2}{2n_1} + \dfrac{s_2^2}{2n_2}}} = \frac{2.58 - 2.50}{\sqrt{\dfrac{(2.58)^2}{2(1000)} + \dfrac{(2.50)^2}{2(1200)}}} = \frac{0.08}{0.077} = 1.04$$
$|Z| = 1.04$
(v) Critical value: $Z_{0.05} = 1.96$

(vi) Decision: Since $|Z| < Z_{0.05}$, the null hypothesis is accepted at 5% level of significance, i.e., there is no significant difference between the standard deviations.

Example 3
Examine whether the two samples for which the data are given in the
following table could have been drawn from populations with the same
SD.
             Size    SD
Sample I     100     5
Sample II    200     7

Solution
$n_1 = 100$, $n_2 = 200$, $s_1 = 5$, $s_2 = 7$

(i) Null Hypothesis $H_0: \sigma_1 = \sigma_2$, i.e., the two samples could have been drawn from populations with the same SD.
(ii) Alternative Hypothesis $H_1: \sigma_1 \neq \sigma_2$ (Two tailed test)
(iii) Level of significance: $\alpha = 0.05$ (assumption)
(iv) Test statistic:
$$Z = \frac{s_1 - s_2}{\sqrt{\dfrac{s_1^2}{2n_1} + \dfrac{s_2^2}{2n_2}}} = \frac{5 - 7}{\sqrt{\dfrac{(5)^2}{2(100)} + \dfrac{(7)^2}{2(200)}}} = -4.02$$
$|Z| = 4.02$
(v) Critical value: $Z_{0.05} = 1.96$

(vi) Decision: Since $|Z| > Z_{0.05}$, the null hypothesis is rejected at 5% level of significance, i.e., the two samples could not have been drawn from populations with the same SD.

Exercise 6.2
1. A random sample of 100 students gave a mean weight of 58 kg with a
SD of 4 kg. Test the hypothesis that the mean weight in the population
is 60 kg.
[Ans.: The mean weight in the population is not 60 kg]
2. A sample of 400 items is taken from a normal population whose mean
is 4 and whose variance is also 4. If the sample mean is 4.45, can the
sample be regarded as truly random sample?
[Ans.: Sample cannot be regarded as truly random sample]
3. The mean IQ of a sample of 1600 children was 99. Is it likely that this
was a random sample from a population with mean IQ 100 and SD 15?
[Ans.: Sample was not drawn from a
population with mean 100 and SD 15]
4. In a random sample of 60 workers, the average time taken by them to
get to work is 33.8 minutes with a standard deviation of 6.1 minutes. Can
we reject the null hypothesis $\mu = 32.6$ minutes in favour of alternative
hypothesis $\mu > 32.6$ at $\alpha = 0.025$ level of significance?
[Ans.: The null hypothesis is accepted]
5. It is claimed that a random sample of 49 tyres has a mean life of 15200
km. This sample was drawn from a population whose mean is 15150 km
and a standard deviation of 1200 km. Test the significance at 0.05 level.
[Ans.: The null hypothesis is accepted]
6. An ambulance service claims that it takes on the average less than 10
minutes to reach its destination in emergency calls. A sample of 36
calls has a mean of 11 minutes and a variance of 16 (minutes squared). Test the
claim at 0.05 level of significance.
[Ans.: The null hypothesis is accepted]
7. Samples of students were drawn from two universities and from their
weights in kilograms, the mean and standard deviations are calculated.
Make a large sample test to test the significance of the difference
between the means.
                Mean    SD    Size of the sample
University A    55      10    400
University B    57      15    100
[Ans.: There is no significant difference between the means]
8. A researcher wants to know the intelligence of students in a school.
He selected two groups of students. In the first group, there are 150
students having mean IQ of 75 with a SD of 15. In the second group
there are 250 students having mean IQ of 70 with SD of 20. Test the
significance that the groups have come from same population.
[Ans.: The groups have not come from same population]
9. Random samples drawn from two places gave the following data relating
to the heights of children:

           Mean height in cm    SD in cm    No. of children in sample
Place A    68.50                2.5         1200
Place B    68.58                3.0         1500

Test at 5% level of significance that the mean height is the same for
children at two places.
[Ans.: The mean height is same for children at two places]
10. The mean life of a sample of 10 electric bulbs was found to be 1456
hours with SD of 423 hours. A second sample of 17 bulbs chosen from a
different batch showed a mean life of 1280 hours with SD of 398 hours.
Is there a significant difference between the means of two batches?
[Ans.: There is no difference between
the mean life of two batches]
11. The SD of a random sample of 900 members is 4.6 and that of another
independent sample of 1600 members is 4.8. Examine if the two samples
could have been drawn from a population with SD 4.
[Ans.: Two samples could have been drawn
from a population with SD 4]

12. The variability of two sets of plots is as given below:

               Set of 40 plots    Set of 60 plots
SD per plot    34 kg              28 kg

Examine whether the difference in the variability in yields is significant.
[Ans.: The difference in the variability in yields is significant]

6.10 Small sample Tests

If the samples are large (n > 30) then the sampling distribution of a statistic is approximately normal.
But if the samples are small ($n \le 30$) then the above result does not hold good. For
estimation of the parameter as well as for testing a hypothesis, the following distributions
are used:
(i) Student’s t-distribution
(ii) Snedecor’s F-distribution
(iii) Chi-square ($\chi^2$) distribution

6.11 Student’s t-distribution

The theory of small or exact samples was developed by the English statistician William S. Gosset, who wrote under the pen name "Student". The quantity t is defined as
$$t = \frac{\text{Difference of population parameter and the corresponding statistic}}{\text{Standard error of statistic}}$$
with (n – 1) degrees of freedom if the sample size is n.

Let $x_1, x_2, \ldots, x_n$ be a random sample of size n ($n \le 30$) drawn from a normal population with mean $\mu$ and SD $\sigma$. The Student’s t statistic is defined by
$$t = \frac{\bar{x} - \mu}{\left(\dfrac{S}{\sqrt{n}}\right)} \quad \text{or} \quad t = \frac{\bar{x} - \mu}{\left(\dfrac{s}{\sqrt{n - 1}}\right)}$$
where $\bar{x}$ is the sample mean, $s = \sqrt{\dfrac{\Sigma (x - \bar{x})^2}{n}}$ is the sample standard deviation, and $S^2 = \dfrac{\Sigma (x - \bar{x})^2}{n - 1} = \dfrac{n s^2}{n - 1}$ is an unbiased estimate of $\sigma^2$. The two forms are equal because $S/\sqrt{n} = s/\sqrt{n - 1}$. The test statistic $t = \dfrac{\bar{x} - \mu}{s/\sqrt{n - 1}}$ is a random variable having t-distribution with $\nu = n - 1$ degrees of freedom and with probability density function
$$f(t) = c \left(1 + \frac{t^2}{\nu}\right)^{-\frac{(\nu + 1)}{2}}$$
where $\nu = n - 1$ and c is a constant required to make the area under the curve unity, i.e.,
$$\int_{-\infty}^{\infty} f(t)\,dt = 1.$$

The t-distribution is used when (i) the sample size is less than or equal to 30, and
(ii) population standard deviation is not known.

6.11.1 Assumptions for t-test


(i) Samples are drawn from normal population and are random.
(ii) The population standard deviation may not be known.
(iii) For testing the equality of two population means, the population variances are
regarded as equal.
(iv) In case of two samples, some adjustments in degrees of freedom for t are
made.

6.11.2 Properties of t-distribution


(i) The t-distribution is asymptotic to the x-axis, i.e., it extends to infinity on
either side.
(ii) The t-distribution is symmetrical about the mean.
(iii) The shape of the curve varies with the degrees of freedom.
(iv) The larger the number of degrees of freedom, the more closely t-distribution
resembles standard normal distribution.
(v) Sampling distribution of t does not depend on population parameter but it
depends only on degree of freedom $\nu$, i.e., on the sample size.
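
Properties (ii) to (iv) can be checked numerically. The sketch below evaluates the density of Section 6.11 using the standard closed form of the normalising constant, $c = \Gamma\left(\frac{\nu + 1}{2}\right) / \left(\sqrt{\nu \pi}\, \Gamma\left(\frac{\nu}{2}\right)\right)$, which is not stated in the text above and is supplied here as an assumption of the illustration; the function names are hypothetical.

```python
import math

def t_pdf(t, nu):
    """Density of Student's t-distribution with nu degrees of freedom."""
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return c * (1 + t**2 / nu) ** (-(nu + 1) / 2)

def normal_pdf(z):
    """Density of the standard normal distribution, for comparison."""
    return math.exp(-z**2 / 2) / math.sqrt(2 * math.pi)

# The curve changes with the degrees of freedom and approaches the normal curve
for nu in (2, 9, 30, 100):
    print(nu, round(t_pdf(0.0, nu), 4), round(t_pdf(2.0, nu), 4))
print("normal", round(normal_pdf(0.0), 4), round(normal_pdf(2.0), 4))
```

For small $\nu$ the peak at t = 0 is lower and the tails are heavier than those of the normal curve, which is why the critical values of t exceed the corresponding values of Z.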

6.11.3 Applications of t-distribution


The t-distribution has the following applications in testing of hypotheses for small samples:
(i) To test the significance of the sample mean, when the population variance $\sigma^2$ is not known
(ii) To test the significance of the mean of the sample, i.e., to test if the sample mean differs significantly from the population mean
(iii) To test the significance of the difference between two sample means, the population variances being equal and unknown
(iv) To test the significance of an observed sample correlation coefficient

[Fig. 6.5 t-distribution curve: f(t) plotted against t, with origin O]

6.12 t-test: Test of Significance for Single Mean

If $x_1, x_2, \ldots, x_n$ is a random sample of size n ($n \le 30$) drawn from a normal population
with mean $\mu$ and SD $\sigma$, then to test whether the sample mean $\bar{x}$ differs significantly from the
population mean $\mu$, the Student’s t statistic is given by
$$t = \frac{\bar{x} - \mu}{\left(\dfrac{s}{\sqrt{n - 1}}\right)}, \quad \text{where } s = \sqrt{\frac{\Sigma (x - \bar{x})^2}{n}} \text{ with } \nu = n - 1$$

Note: Confidence Limits
(i) 95% confidence limits $= \bar{x} \pm t_{0.05} \left(\dfrac{s}{\sqrt{n - 1}}\right)$
where $t_{0.05}$ is the 5% critical value of t for $\nu = n - 1$ degrees of freedom for a
Two tailed test.
(ii) 99% confidence limits $= \bar{x} \pm t_{0.01} \left(\dfrac{s}{\sqrt{n - 1}}\right)$
where $t_{0.01}$ is the 1% critical value of t for $\nu = n - 1$ degrees of freedom for a
Two tailed test.
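
The statistic and the confidence limits above can be sketched as follows. The function names are hypothetical, the critical value must still be read from a t-table, and the inputs (mean 0.742, hypothesised mean 0.7, s = 0.04, n = 10, tabulated value 2.262 for 9 degrees of freedom) are illustrative.

```python
import math

def one_sample_t(mean, mu0, s, n):
    """t = (mean - mu0) / (s / sqrt(n - 1)), where s uses divisor n."""
    standard_error = s / math.sqrt(n - 1)
    return (mean - mu0) / standard_error, n - 1

def confidence_limits(mean, s, n, t_critical):
    """mean +/- t_critical * s / sqrt(n - 1); t_critical comes from a t-table."""
    margin = t_critical * s / math.sqrt(n - 1)
    return mean - margin, mean + margin

t, dof = one_sample_t(mean=0.742, mu0=0.7, s=0.04, n=10)
low, high = confidence_limits(mean=0.742, s=0.04, n=10, t_critical=2.262)
print(round(t, 2), dof)              # test statistic and degrees of freedom
print(round(low, 3), round(high, 3)) # 95% confidence limits
```

If the standard deviation has been computed with divisor n – 1 instead, the standard error is that value divided by $\sqrt{n}$, which gives the same t.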

Example 1
A machinist is making engine parts with axle diameter of 0.7 cm. A
random sample of 10 parts shows a mean diameter of 0.742 cm with a
standard deviation of 0.04 cm. Compute the statistic you would use to
test whether work is meeting the specification at 0.05 level of signifi-
cance.
Solution
$n = 10$, $\bar{x} = 0.742$ cm, $s = 0.04$ cm, $\mu = 0.7$ cm
(i) Null Hypothesis $H_0: \mu = 0.7$ cm, i.e., the product is meeting the specification.
(ii) Alternative Hypothesis $H_1: \mu \neq 0.7$ cm (Two tailed test)
(iii) Level of significance: $\alpha = 0.05$
(iv) Test statistic:
$$t = \frac{\bar{x} - \mu}{\left(\dfrac{s}{\sqrt{n - 1}}\right)} = \frac{0.742 - 0.7}{\left(\dfrac{0.04}{\sqrt{10 - 1}}\right)} = 3.15$$
$|t| = 3.15$
(v) Critical value: $\nu = n - 1 = 10 - 1 = 9$
$t_{0.05}(\nu = 9) = 2.262$
(vi) Decision: Since $|t| > t_{0.05}$, the null hypothesis is rejected at 5% level of
significance, i.e., the product is not meeting the specification.
Example 2
Ten objects are chosen at random from a large population and their
weights are found to be in grams: 63, 63, 64, 65, 66, 69, 69, 70, 70, 71.
Discuss the suggestion that the mean weight is 65 g.
Solution
$n = 10$, $\mu = 65$ g
$\bar{x} = 67$ g, $s = 2.966$ g (from calculator)
(i) Null Hypothesis $H_0: \mu = 65$ g, i.e., there is no significant difference in the
mean weight of sample and population.
(ii) Alternative Hypothesis $H_1: \mu \neq 65$ g (Two tailed test)
(iii) Level of significance: $\alpha = 0.05$ (assumption)
(iv) Test statistic:
$$t = \frac{\bar{x} - \mu}{\left(\dfrac{s}{\sqrt{n - 1}}\right)} = \frac{67 - 65}{\left(\dfrac{2.966}{\sqrt{10 - 1}}\right)} = 2.023$$
$|t| = 2.023$
(v) Critical value: $\nu = n - 1 = 10 - 1 = 9$
$t_{0.05}(\nu = 9) = 2.262$
(vi) Decision: Since $|t| < t_{0.05}$, the null hypothesis is accepted at 5% level of
significance, i.e., the mean weight is 65 g.

Example 3
The mean lifetime of a sample of 25 bulbs is found as 1550 hours with a
SD of 120 hours. The company manufacturing the bulbs claims that the
average life of their bulbs is 1600 hours. Is the claim acceptable at 5%
level of significance?
Solution

$n = 25$, $\bar{x} = 1550$ hours, $s = 120$ hours, $\mu = 1600$ hours

(i) Null Hypothesis $H_0: \mu = 1600$ hours, i.e., the average life of bulbs is 1600
hours.
(ii) Alternative Hypothesis $H_1: \mu < 1600$ hours (One tailed test)
(iii) Level of significance: $\alpha = 0.05$
(iv) Test statistic:
$$t = \frac{\bar{x} - \mu}{\left(\dfrac{s}{\sqrt{n - 1}}\right)} = \frac{1550 - 1600}{\left(\dfrac{120}{\sqrt{25 - 1}}\right)} = -2.04$$
$|t| = 2.04$
(v) Critical value: $\nu = n - 1 = 25 - 1 = 24$
$t_{0.05}(\nu = 24) = 1.711$
(vi) Decision: Since $|t| > t_{0.05}$, the null hypothesis is rejected at 5% level of significance, i.e., the average life of bulbs is less than 1600 hours and the claim is
unacceptable.

Example 4
A soap manufacturing company was distributing a particular brand of
soap through a large number of retail shops. Before a heavy advertisement
campaign, the mean sales per week per shop was 140 dozens. After the
campaign, a sample of 26 shops was taken and the mean sales was
found to be 147 dozens with standard deviation 16. Can you consider
the advertisement effective?
Solution
$n = 26$, $\bar{x} = 147$ dozens, $s = 16$ dozens, $\mu = 140$ dozens
(i) Null Hypothesis $H_0: \mu = 140$ dozens, i.e., the advertisement is not effective.
(ii) Alternative Hypothesis $H_1: \mu > 140$ dozens (One tailed test)
(iii) Level of significance: $\alpha = 0.05$ (assumption)
(iv) Test statistic:
$$t = \frac{\bar{x} - \mu}{\left(\dfrac{s}{\sqrt{n - 1}}\right)} = \frac{147 - 140}{\left(\dfrac{16}{\sqrt{26 - 1}}\right)} = 2.1875$$
$|t| = 2.1875$
(v) Critical value: $\nu = n - 1 = 26 - 1 = 25$
$t_{0.05}(\nu = 25) = 1.708$
(vi) Decision: Since $|t| > t_{0.05}$, the null hypothesis is rejected at 5% level of significance, i.e., the advertisement is effective.

Example 5
A random sample of size 16 from a normal population showed a mean of
103.75 cm and sum of squares of deviations from the mean 843.75 cm2.
Can we say that the population has a mean of 108.75 cm?
Solution
$n = 16$, $\bar{x} = 103.75$ cm, $\Sigma (x - \bar{x})^2 = 843.75$ cm$^2$, $\mu = 108.75$ cm
$$s = \sqrt{\frac{\Sigma (x - \bar{x})^2}{n}} = \sqrt{\frac{843.75}{16}} = 7.26 \text{ cm}$$
(i) Null Hypothesis $H_0: \mu = 108.75$ cm, i.e., the population has a mean of 108.75 cm.
(ii) Alternative Hypothesis $H_1: \mu \neq 108.75$ cm (Two tailed test)
(iii) Level of significance: $\alpha = 0.05$ (assumption)
(iv) Test statistic:
$$t = \frac{\bar{x} - \mu}{\left(\dfrac{s}{\sqrt{n - 1}}\right)} = \frac{103.75 - 108.75}{\left(\dfrac{7.26}{\sqrt{16 - 1}}\right)} = -2.67$$
$|t| = 2.67$
(v) Critical value: $\nu = n - 1 = 16 - 1 = 15$
$t_{0.05}(\nu = 15) = 2.131$
(vi) Decision: Since $|t| > t_{0.05}$, the null hypothesis is rejected at 5% level of significance, i.e., the population does not have a mean of 108.75 cm.

Example 6
A random sample of 10 boys had the following IQs:
70, 120, 110, 101, 88, 83, 95, 98, 107 and 100.
(a) Do these data support the assumption of a population mean IQ of
100?
(b) Find 95% confidence limits for the mean IQ.
Solution
$n = 10$
$\bar{x} = 97.2$, $s = 13.54$ (from calculator)

(i) Null Hypothesis $H_0: \mu = 100$, i.e., the population has mean IQ of 100.
(ii) Alternative Hypothesis $H_1: \mu \neq 100$ (Two tailed test)
(iii) Level of significance: $\alpha = 0.05$ (assumption)
(iv) Test statistic:
$$t = \frac{\bar{x} - \mu}{\left(\dfrac{s}{\sqrt{n - 1}}\right)} = \frac{97.2 - 100}{\left(\dfrac{13.54}{\sqrt{10 - 1}}\right)} = -0.62$$
$|t| = 0.62$
(v) Critical value: $\nu = n - 1 = 10 - 1 = 9$
$t_{0.05}(\nu = 9) = 2.262$
(vi) Decision: Since $|t| < t_{0.05}$, the null hypothesis is accepted at 5% level of significance, i.e., population has mean IQ of 100.
$$\text{95\% confidence limits} = \bar{x} \pm t_{0.05} \left(\frac{s}{\sqrt{n - 1}}\right) = 97.2 \pm 2.262 \left(\frac{13.54}{\sqrt{10 - 1}}\right) = 97.2 \pm 10.21$$
i.e., 86.99 and 107.41
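
The "from calculator" values in this example can be reproduced from the raw data. The sketch below is illustrative only; it uses `statistics.pstdev`, which divides by n and therefore matches the definition of s used in this section, and it takes the tabulated value 2.262 as given.

```python
import math
import statistics

iqs = [70, 120, 110, 101, 88, 83, 95, 98, 107, 100]
mu0 = 100          # hypothesised population mean
t_critical = 2.262 # two tailed 5% value for 9 degrees of freedom (from the t-table)

n = len(iqs)
mean = statistics.fmean(iqs)
s = statistics.pstdev(iqs)              # standard deviation with divisor n
standard_error = s / math.sqrt(n - 1)
t = (mean - mu0) / standard_error

low = mean - t_critical * standard_error
high = mean + t_critical * standard_error
print(round(mean, 2), round(s, 2), round(t, 2))
print(round(low, 2), round(high, 2))
```

The same standard error is obtained from `statistics.stdev(iqs) / math.sqrt(n)`, since `stdev` uses divisor n – 1.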
         

Example 7
The heights of 10 males of a given locality are found to be 175, 168, 155,
170, 152, 170, 175, 160, 160 and 165 cm. Based on this sample, find the
95% confidence limits for the heights of males in that locality.
Solution
$n = 10$
$\bar{x} = 165$, $s = 7.6$ (from calculator)

$\nu = n - 1 = 10 - 1 = 9$
From t-table
$t_{0.05}(\nu = 9) = 2.262$ (Two tailed test)
The 95% confidence limits for $\mu$ are
$$\left[\bar{x} - t_{0.05} \left(\frac{s}{\sqrt{n - 1}}\right),\ \bar{x} + t_{0.05} \left(\frac{s}{\sqrt{n - 1}}\right)\right]$$
i.e., $\left[165 - \dfrac{2.262(7.6)}{\sqrt{10 - 1}},\ 165 + \dfrac{2.262(7.6)}{\sqrt{10 - 1}}\right]$
i.e., [159.27, 170.73]
i.e., the mean height of males in the locality is likely to lie between 159.27 cm and 170.73 cm.

6.13 t-test: Test of Significance for Difference of Means

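Let $x_1, x_2, \ldots, x_{n_1}$ and $y_1, y_2, \ldots, y_{n_2}$ be two independent samples of sizes $n_1$ and
$n_2$ ($n_1 \le 30$, $n_2 \le 30$) with means $\bar{x}$ and $\bar{y}$ and standard deviations $s_1$ and $s_2$ from
normal populations with means $\mu_1$ and $\mu_2$ and the same standard deviation. The Student’s
t statistic is given by
$$t = \frac{\bar{x} - \bar{y}}{s \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \quad \text{with } \nu = n_1 + n_2 - 2$$
where
$$\bar{x} = \frac{\Sigma x}{n_1}, \qquad \bar{y} = \frac{\Sigma y}{n_2}$$
and
$$s = \sqrt{\frac{\Sigma (x - \bar{x})^2 + \Sigma (y - \bar{y})^2}{n_1 + n_2 - 2}}$$
In terms of the standard deviations $s_1$ and $s_2$,
$$t = \frac{\bar{x} - \bar{y}}{s \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$
where
$$s = \sqrt{\frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}}$$
and
$$s_1 = \sqrt{\frac{\Sigma (x - \bar{x})^2}{n_1}}, \qquad s_2 = \sqrt{\frac{\Sigma (y - \bar{y})^2}{n_2}}$$
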

Note
1. If $n_1 = n_2 = n$ and the samples are independent, i.e., the observations in the two
samples are not at all related, then the test statistic is given by
$$t = \frac{\bar{x} - \bar{y}}{\sqrt{\dfrac{s_1^2 + s_2^2}{n - 1}}} \quad \text{with } \nu = 2n - 2$$
2. If $n_1 = n_2 = n$ and if the pairs of values of x and y are associated or correlated
in some way (or not independent), the above formula for testing of hypothesis
cannot be used.
Let $d_i = x_i - y_i$ denote the difference (with proper sign) in the values of x and y
for the ith pair (i = 1, 2, ..., n).
The test statistic is given by
$$t = \frac{\bar{d}}{\left(\dfrac{s}{\sqrt{n - 1}}\right)} \quad \text{with } \nu = n - 1$$
where $\bar{d}$ and s denote the mean and standard deviation of the differences $d_i$
respectively, i.e.,
$$\bar{d} = \frac{\Sigma d_i}{n}, \qquad s = \sqrt{\frac{\Sigma (d_i - \bar{d})^2}{n}} = \sqrt{\frac{\Sigma d_i^2}{n} - \left(\frac{\Sigma d_i}{n}\right)^2}$$

3. Confidence Limits
(i) 95% confidence limits $= (\bar{x} - \bar{y}) \pm t_{0.05} \left(s \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}\right)$
where $t_{0.05}$ is the 5% critical value of t for $\nu = n_1 + n_2 - 2$ degrees of freedom for a Two tailed test.
(ii) 99% confidence limits $= (\bar{x} - \bar{y}) \pm t_{0.01} \left(s \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}\right)$
where $t_{0.01}$ is the 1% critical value of t for $\nu = n_1 + n_2 - 2$ degrees of freedom for a Two tailed test.

Example 1
The means of two random samples of size 9 and 7 are 196.42 and 198.82
respectively. The sum of squares of the deviation from the mean are
26.94 and 18.73 respectively. Can the sample be considered to have
been drawn from the same population?
Solution
$n_1 = 9$, $n_2 = 7$, $\bar{x} = 196.42$, $\bar{y} = 198.82$
$\Sigma (x - \bar{x})^2 = 26.94$, $\Sigma (y - \bar{y})^2 = 18.73$
$$s = \sqrt{\frac{\Sigma (x - \bar{x})^2 + \Sigma (y - \bar{y})^2}{n_1 + n_2 - 2}} = \sqrt{\frac{26.94 + 18.73}{9 + 7 - 2}} = 1.806$$

(i) Null Hypothesis $H_0: \mu_1 = \mu_2$, i.e., the samples are drawn from the same population.
(ii) Alternative Hypothesis $H_1: \mu_1 \neq \mu_2$ (Two tailed test)
(iii) Level of significance: $\alpha = 0.05$ (assumption)
(iv) Test statistic:
$$t = \frac{\bar{x} - \bar{y}}{s \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{196.42 - 198.82}{1.8061 \sqrt{\dfrac{1}{9} + \dfrac{1}{7}}} = -2.6368$$
$|t| = 2.6368$
(v) Critical value: $\nu = n_1 + n_2 - 2 = 9 + 7 - 2 = 14$
$t_{0.05}(\nu = 14) = 2.145$
(vi) Decision: Since $|t| > t_{0.05}$, the null hypothesis is rejected at 5% level of significance, i.e., the samples are not drawn from the same population.

Example 2
Samples of two types of electric bulbs were tested for length of life and
the following data were obtained.
            Size    Mean       SD
Sample 1    8       1234 hr    36 hr
Sample 2    7       1036 hr    40 hr

Is the difference in the means sufficient to warrant that type 1 bulbs are
superior to type 2 bulbs?
Solution
$n_1 = 8$, $n_2 = 7$, $\bar{x}_1 = 1234$ hr, $\bar{x}_2 = 1036$ hr
$s_1 = 36$ hr, $s_2 = 40$ hr
$$s = \sqrt{\frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}} = \sqrt{\frac{(8)(36)^2 + (7)(40)^2}{8 + 7 - 2}} = 40.73 \text{ hr}$$
(i) Null Hypothesis $H_0: \mu_1 = \mu_2$, i.e., the type 1 bulbs are not superior to type 2
bulbs.
(ii) Alternative Hypothesis $H_1: \mu_1 > \mu_2$ (One tailed test)
(iii) Level of significance: $\alpha = 0.05$ (assumption)
(iv) Test statistic:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{s \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{1234 - 1036}{40.73 \sqrt{\dfrac{1}{8} + \dfrac{1}{7}}} = 9.39$$
$|t| = 9.39$
(v) Critical value: $\nu = n_1 + n_2 - 2 = 8 + 7 - 2 = 13$
$t_{0.05}(\nu = 13) = 1.771$
(vi) Decision: Since $|t| > t_{0.05}$, the null hypothesis is rejected at 5% level of significance, i.e., the type 1 bulbs are superior to type 2 bulbs.

Example 3
The mean height and SD height of 8 randomly chosen soldiers are
166.9 cm and 8.29 cm respectively. The corresponding values of 6
randomly chosen sailors are 170.3 cm and 8.50 cm respectively. Based
on this data, can we conclude that soldiers are, in general, shorter than
sailors?
Solution
$n_1 = 8$, $n_2 = 6$, $\bar{x}_1 = 166.9$ cm, $\bar{x}_2 = 170.3$ cm
$s_1 = 8.29$ cm, $s_2 = 8.50$ cm
$$s = \sqrt{\frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}} = \sqrt{\frac{(8)(8.29)^2 + (6)(8.50)^2}{8 + 6 - 2}} = 9.05 \text{ cm}$$
(i) Null Hypothesis $H_0: \mu_1 = \mu_2$, i.e., there is no significant difference between the
heights of soldiers and sailors.
(ii) Alternative Hypothesis $H_1: \mu_1 < \mu_2$ (One tailed test)
(iii) Level of significance: $\alpha = 0.05$ (assumption)
(iv) Test statistic:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{s \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{166.9 - 170.3}{9.05 \sqrt{\dfrac{1}{8} + \dfrac{1}{6}}} = -0.696$$
$|t| = 0.696$
(v) Critical value: $\nu = n_1 + n_2 - 2 = 8 + 6 - 2 = 12$
$t_{0.05}(\nu = 12) = 1.782$
(vi) Decision: Since $|t| < t_{0.05}$, the null hypothesis is accepted at 5% level of significance, i.e., there is no significant difference between the heights of soldiers
and sailors and we cannot conclude that soldiers are, in general, shorter than
sailors.

Example 4
Two types of batteries are tested for their length of life and the following
data are obtained:
          No. of Samples    Mean life in hours    Variance
Type A    9                 600                   121
Type B    8                 640                   144

Is there a significant difference in the two means? Find 95% confidence
limits for the difference in means.
Solution
$n_1 = 9$, $n_2 = 8$, $\bar{x}_1 = 600$ hours, $\bar{x}_2 = 640$ hours
$s_1 = \sqrt{121} = 11$ hours, $s_2 = \sqrt{144} = 12$ hours
$$s = \sqrt{\frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}} = \sqrt{\frac{(9)(121) + (8)(144)}{9 + 8 - 2}} = 12.22 \text{ hours}$$

(i) Null Hypothesis $H_0: \mu_1 = \mu_2$, i.e., there is no significant difference in two
means.
(ii) Alternative Hypothesis $H_1: \mu_1 \neq \mu_2$ (Two tailed test)
(iii) Level of significance: $\alpha = 0.05$ (assumption)
(iv) Test statistic:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{s \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{600 - 640}{12.22 \sqrt{\dfrac{1}{9} + \dfrac{1}{8}}} = -6.74$$
$|t| = 6.74$
(v) Critical value: $\nu = n_1 + n_2 - 2 = 9 + 8 - 2 = 15$
$t_{0.05}(\nu = 15) = 2.131$
(vi) Decision: Since $|t| > t_{0.05}$, the null hypothesis is rejected at 5% level of significance, i.e., there is significant difference in the two means.
$$\text{95\% confidence limits for } (\mu_1 - \mu_2) = (\bar{x}_1 - \bar{x}_2) \pm t_{0.05} \left(s \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}\right) = (600 - 640) \pm 2.131 \left(12.22 \sqrt{\frac{1}{9} + \frac{1}{8}}\right) = -40 \pm 12.65$$
i.e., –27.35 and –52.65

Example 5
A group of 5 patients treated with medicine A weigh 42, 39, 48, 60 and
41 kg. Second group of 7 patients from the same hospital treated with
medicine B weigh 38, 42, 56, 64, 68, 69 and 62 kg. Do you agree with
the claim that medicine B increases the weight significantly?
Solution
$n_1 = 5$, $n_2 = 7$
$\bar{x} = 46$ kg, $\bar{y} = 57$ kg, $s_1 = 7.62$ kg, $s_2 = 11.5$ kg (from calculator)
$$s = \sqrt{\frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}} = \sqrt{\frac{(5)(7.62)^2 + (7)(11.5)^2}{5 + 7 - 2}} = 11.03 \text{ kg}$$
(i) Null Hypothesis $H_0: \mu_1 = \mu_2$, i.e., there is no significant difference between
the medicines A and B as regards their effect on the increase in weight.
(ii) Alternative Hypothesis $H_1: \mu_1 < \mu_2$ (One tailed test), since the claim is that medicine B gives the larger mean weight.
(iii) Level of significance: $\alpha = 0.05$ (assumption)
(iv) Test statistic:
$$t = \frac{\bar{x} - \bar{y}}{s \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{46 - 57}{11.03 \sqrt{\dfrac{1}{5} + \dfrac{1}{7}}} = -1.7$$
$|t| = 1.7$
(v) Critical value: $\nu = n_1 + n_2 - 2 = 5 + 7 - 2 = 10$
$t_{0.05}(\nu = 10) = 1.812$
(vi) Decision: Since $|t| < t_{0.05}$, the null hypothesis is accepted at 5% level of significance, i.e., the medicines A and B do not differ significantly as regards
their effect on increase in weight.

Example 6
The following data represent the biological values of protein from cow’s
milk and buffalo’s milk at a certain level:
Cow’s milk        1.82    2.02    1.88    1.61    1.81    1.54
Buffalo’s milk    2.00    1.83    1.86    2.03    2.19    1.88

Examine if the average values of protein in the two samples significantly
differ.
Solution
Here, $n_1 = n_2 = 6$ and the two samples are independent.
$n = 6$
$\bar{x}_1 = 1.78$, $\bar{x}_2 = 1.965$, $s_1 = 0.16$, $s_2 = 0.124$ (from calculator)

(i) Null Hypothesis $H_0: \mu_1 = \mu_2$, i.e., there is no significant difference in average
values of proteins in two milk samples.
(ii) Alternative Hypothesis $H_1: \mu_1 \neq \mu_2$ (Two tailed test)
(iii) Level of significance: $\alpha = 0.05$ (assumption)
(iv) Test statistic:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2 + s_2^2}{n - 1}}} = \frac{1.78 - 1.965}{\sqrt{\dfrac{(0.16)^2 + (0.124)^2}{6 - 1}}} = -2.04$$
$|t| = 2.04$
(v) Critical value: $\nu = 2n - 2 = 2(6) - 2 = 10$
$t_{0.05}(\nu = 10) = 2.228$
(vi) Decision: Since $|t| < t_{0.05}$, the null hypothesis is accepted at 5% level of significance, i.e., there is no significant difference in average values of proteins in
two milk samples.

Example 7
A certain injection administered to 12 patients resulted in the following
changes of blood pressure:
5, 2, 8, –1, 3, 0, 6, –2, 1, 5, 0, 4
Can it be concluded that the injection will be in general accompanied by
an increase in blood pressure?
Solution
Here, ‘the changes’ d = x – y in blood pressure are given, i.e., x is the final blood pressure
after administering the injection and y is the initial blood pressure. It is required to test
whether the mean blood pressure has increased, i.e., $\mu_1$ is greater than $\mu_2$.
$n = 12$, $\Sigma d_i = 31$, $\Sigma d_i^2 = 185$
$$\bar{d} = \frac{\Sigma d_i}{n} = \frac{31}{12} = 2.58$$
$$s = \sqrt{\frac{\Sigma d_i^2}{n} - \left(\frac{\Sigma d_i}{n}\right)^2} = \sqrt{\frac{185}{12} - \left(\frac{31}{12}\right)^2} = 2.96$$

(i) Null Hypothesis $H_0: \mu_1 = \mu_2$, i.e., mean blood pressure has not increased.
(ii) Alternative Hypothesis $H_1: \mu_1 > \mu_2$ (One tailed test)
(iii) Level of significance: $\alpha = 0.05$ (assumption)
(iv) Test statistic:
$$t = \frac{\bar{d}}{\left(\dfrac{s}{\sqrt{n - 1}}\right)} = \frac{2.58}{\left(\dfrac{2.96}{\sqrt{12 - 1}}\right)} = 2.89$$
$|t| = 2.89$
(v) Critical value: $\nu = n - 1 = 12 - 1 = 11$
$t_{0.05}(\nu = 11) = 1.796$

(vi) Decision: Since $|t| > t_{0.05}$, the null hypothesis is rejected, i.e., the injection will, in
general, be accompanied by an increase in blood pressure.

Example 8
Scores obtained in a shooting competition by 10 soldiers before and
after intensive training are given below:
Score before training    67    24    57    55    63    54    56    68    33    43
Score after training     70    38    58    58    56    67    68    75    42    38

Test whether the intensive training is useful at 0.05 level of significance.


Solution
Since both sets of scores belong to the same set of soldiers, the scores can be regarded as
correlated and no longer independent. Paired t-test is applied to check the hypothesis.
$n_1 = n_2 = n = 10$
Calculation of paired-t
x            67    24    57    55    63    54    56    68    33    43
y            70    38    58    58    56    67    68    75    42    38
d = x – y    –3   –14    –1    –3     7   –13   –12    –7    –9     5
d²            9   196     1     9    49   169   144    49    81    25

$\Sigma d_i = -50$, $\Sigma d_i^2 = 732$
$$\bar{d} = \frac{\Sigma d_i}{n} = \frac{-50}{10} = -5$$
$$s = \sqrt{\frac{\Sigma d_i^2}{n} - \left(\frac{\Sigma d_i}{n}\right)^2} = \sqrt{\frac{732}{10} - \left(\frac{-50}{10}\right)^2} = 6.94$$

(i) Null Hypothesis $H_0: \mu_1 = \mu_2$, i.e., there is no significant effect of intensive
training.
(ii) Alternative Hypothesis $H_1: \mu_1 < \mu_2$ (One tailed test)
(iii) Level of significance: $\alpha = 0.05$
(iv) Test statistic:
$$t = \frac{\bar{d}}{\left(\dfrac{s}{\sqrt{n - 1}}\right)} = \frac{-5}{\left(\dfrac{6.94}{\sqrt{10 - 1}}\right)} = -2.16$$
$|t| = 2.16$
(v) Critical value: $\nu = n - 1 = 10 - 1 = 9$
$t_{0.05}(\nu = 9) = 1.833$ (One tailed test)
(vi) Decision: Since $|t| > t_{0.05}$, the null hypothesis is rejected at 5% level of significance, i.e., intensive training is useful.
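
The paired calculation for this example can be sketched directly from the two score lists. The variable names are illustrative, and `statistics.pstdev` (divisor n) is used so that s agrees with the formula of Note 2 in Section 6.13.

```python
import math
import statistics

before = [67, 24, 57, 55, 63, 54, 56, 68, 33, 43]
after = [70, 38, 58, 58, 56, 67, 68, 75, 42, 38]

# Differences d = x - y for each soldier (before minus after)
d = [x - y for x, y in zip(before, after)]
n = len(d)

d_bar = statistics.fmean(d)          # mean of the differences
s = statistics.pstdev(d)             # SD of the differences, divisor n
t = d_bar / (s / math.sqrt(n - 1))   # paired t with n - 1 degrees of freedom
print(sum(d), sum(v * v for v in d))
print(round(d_bar, 2), round(s, 2), round(t, 2), n - 1)
```

The sign of t depends only on the order of subtraction; the decision is based on |t| compared with the tabulated one tailed value.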

6.14 t-test: Test of Significance for Correlation Coefficients

Let $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ be n pairs of observations of a random sample from
a bivariate normal population and let r be the observed correlation coefficient in the
sample. It is required to test if this sample correlation coefficient is significant of any
correlation in the population, i.e., whether the value of the population correlation
coefficient $\rho$ is zero and the observed value of r has arisen due to fluctuation of
sampling. The Student’s t statistic is given by
$$t = \frac{r \sqrt{n - 2}}{\sqrt{1 - r^2}} \quad \text{with } \nu = n - 2$$
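
A minimal sketch of this statistic is given below; the function name `correlation_t` is hypothetical and the inputs (r = 0.3 from n = 18 pairs) are illustrative. The critical value still has to be read from a t-table for n – 2 degrees of freedom.

```python
import math

def correlation_t(r, n):
    """t statistic for H0: population correlation is zero, with n - 2 d.f."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r**2), n - 2

t, dof = correlation_t(0.3, 18)
print(round(t, 2), dof)
```

For a fixed r, the value of t grows with n, so the same sample correlation may be significant in a larger sample and not in a smaller one.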

Example 1
A random sample of 18 pairs of observations from a bivariate normal
population gives a correlation coefficient of 0.3. Is it likely that vari-
ables are uncorrelated in the population?
Solution
$n = 18$, $r = 0.3$
(i) Null Hypothesis $H_0: \rho = 0$, i.e., the variables are uncorrelated.
(ii) Alternative Hypothesis $H_1: \rho \neq 0$ (Two tailed test)
(iii) Level of significance: $\alpha = 0.05$ (assumption)
(iv) Test statistic:
$$t = \frac{r \sqrt{n - 2}}{\sqrt{1 - r^2}} = \frac{0.3 \sqrt{18 - 2}}{\sqrt{1 - (0.3)^2}} = 1.26$$
$|t| = 1.26$
(v) Critical value: $\nu = n - 2 = 18 - 2 = 16$
$t_{0.05}(\nu = 16) = 2.12$
(vi) Decision: Since $|t| < t_{0.05}$, the null hypothesis is accepted at 5% level of significance, i.e., the variables are uncorrelated in the population.

Example 2
A random sample of 10 nations gives a correlation coefficient of 0.5
between literacy rate and political stability. Is the relationship signifi-
cant?
Solution
$n = 10$, $r = 0.5$
(i) Null Hypothesis $H_0: \rho = 0$, i.e., there is no relationship between literacy rate
and political stability.
(ii) Alternative Hypothesis $H_1: \rho \neq 0$ (Two tailed test)
(iii) Level of significance: $\alpha = 0.05$ (assumption)
(iv) Test statistic:
$$t = \frac{r \sqrt{n - 2}}{\sqrt{1 - r^2}} = \frac{0.5 \sqrt{10 - 2}}{\sqrt{1 - (0.5)^2}} = 1.63$$
$|t| = 1.63$
(v) Critical value: $\nu = n - 2 = 10 - 2 = 8$
$t_{0.05}(\nu = 8) = 2.306$
(vi) Decision: Since $|t| < t_{0.05}$, the null hypothesis is accepted at 5% level of
significance, i.e., there is no significant relationship between literacy rate and political
stability.

Example 3
Find the least value of r in samples of 18 pairs of observations from a
bivariate normal population, which is significant at 5% level.
Solution
The value of r for n = 18 will be significant at 5% level if
$$\frac{|r| \sqrt{n - 2}}{\sqrt{1 - r^2}} \ge t_{0.05}(\nu = 16)$$
$$\frac{|r| \sqrt{n - 2}}{\sqrt{1 - r^2}} \ge 2.12$$
Squaring both the sides and putting n = 18,
$$\frac{r^2 (18 - 2)}{1 - r^2} \ge 4.5$$
$$16 r^2 \ge 4.5 - 4.5 r^2$$
$$20.5 r^2 \ge 4.5$$
$$r^2 \ge 0.22$$
$$|r| \ge 0.47$$
Hence, the least value of r is 0.47 (numerically).
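
The algebra above has a closed form: solving $r^2 (n - 2) = t_c^2 (1 - r^2)$ gives $|r| = t_c / \sqrt{t_c^2 + n - 2}$, where $t_c$ is the tabulated critical value. A short sketch follows (the function name `least_significant_r` is hypothetical):

```python
import math

def least_significant_r(t_critical, n):
    """Smallest |r| whose t statistic reaches t_critical with n - 2 d.f."""
    return t_critical / math.sqrt(t_critical**2 + n - 2)

r_min = least_significant_r(2.12, 18)
print(round(r_min, 2))

# Check: substituting r_min back into the t statistic recovers the critical value
t_back = r_min * math.sqrt(18 - 2) / math.sqrt(1 - r_min**2)
print(round(t_back, 2))
```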

Exercise 6.3
1. A sample of 26 bulbs gives a mean life of 990 hours with a SD of 20
hours. The manufacturer claims that the mean life of bulbs is 1000
hours. Is the sample not up to standard?
[Ans.: The sample is not up to the standard]

2. The average breaking strength of the steel rods is specified to be 18.5
thousand pounds. To test this, a sample of 14 rods was tested. The mean
and SD obtained were 17.85 and 1.955 respectively. Is the result of
experiment significant?
[Ans.: The result of experiment is not significant]

3. A random sample of six steel beams has a mean compressive strength
of 58392 psi (pounds per square inch) with a SD of 648 psi. Use this
information and level of significance $\alpha = 0.05$ to test whether the true
average compressive strength of the steel from which this sample came
is 58000 psi. Assume normality.
[Ans.: The average compressive strength of the
steel beam is not equal to 58000 psi]
4. A sample of 155 members has a mean of 67 and SD of 52. Has this sample
been taken from a large population of mean 70?
[Ans.: The sample has not been taken from the given population]
5. The heights of 10 males of a given locality are found to be 70, 67, 62,
68, 61, 68, 70, 64, 64, 66 inches. Is it reasonable to believe that the
average height is greater than 64 inches? Test at 5% significance level
assuming that for 9 degrees of freedom t = 1.833 at $\alpha = 0.05$.
[Ans.: The average height is greater than 64 inches]
6. A random sample from a company’s very extensive files shows that the
orders for a certain kind of machinery were filled respectively in 10, 12,
19, 14, 15, 18, 11 and 13 days. Use the level of significance $\alpha = 0.01$ to
test the claim that on the average such orders are filled in 10.5 days.
Choose the alternative hypothesis so that rejection of null hypothesis
$\mu = 10.5$ days implies that it takes longer than indicated.
[Ans.: The orders on average are filled in more than 10.5 days]
7. Producer of gutkha claims that the nicotine content in his gutkha on the
average is 1.83 mg. Can this claim be accepted if a random sample of 8
gutkha of this type have the nicotine contents of 2, 1.7, 2.1, 1.9, 2.2,
2.1, 2, 1.6 mg? Use a 0.05 level of significance.
[Ans.: The null hypothesis is accepted]
8. Two horses A and B were tested according to the time (in seconds) to
run a particular track with the following results:
Horse A 28 30 32 33 33 29 34

Horse B 29 30 30 24 27 29

Test whether the two horses have the same running capacity.
[Ans.: The two horses do not have the same running capacity]
9. To examine the hypothesis that the husbands are more intelligent than
the wives, an investigator took a sample of 10 couples and administered
them a test which measures the IQ. The results are as follows:
Husbands 117 105 97 105 123 109 86 78 103 107
Wives 106 98 87 104 116 95 90 69 108 85
Test the hypothesis with a reasonable test at the level of significance of
0.05.
[Ans.: There is no significant difference in IQs]
10. Two independent samples of 8 and 7 items respectively had the following
values:
Sample I 11 11 13 11 15 9 12 14
Sample II 9 11 10 13 9 8 10 –
Is the difference between the means of samples significant?
[Ans.: The difference between the mean of samples is not significant]
11. Random samples of specimens of coal from two mines A and B are
drawn and their heat-producing capacity (in millions of calories/ton)
were measured yielding the following results:
Mine A 8350 8070 8340 8130 8260 –
Mine B 7900 8140 7920 7840 7890 7950
Is there significant difference between the means of these two samples
at 0.01 level of significance?
[Ans.: There is significant difference between
the means of two samples]
12. A random sample of 27 pairs of observations from a bivariate normal
population gives a correlation coefficient of 0.42. Is it likely that the
variables are uncorrelated in the population?
[Ans.: correlated]
13. Find the least value of r in a sample of 27 pairs from a bivariate normal
population which is significant at 5% level.
[Ans.: |r| = 0.487]

6.15 Snedecor’s F-test for Ratio of Variances

Let $x_1, x_2, \ldots, x_{n_1}$ and $y_1, y_2, \ldots, y_{n_2}$ be the values of two
independent random samples of sizes $n_1$ and $n_2$ ($n_1 \le 30$, $n_2 \le 30$) with means
$\bar{x}$ and $\bar{y}$, drawn from a normal population with mean $\mu$ and standard
deviation $\sigma$. The test statistic of Snedecor's F-test, in terms of the unbiased
estimates $S_1^2$ and $S_2^2$ of the population variance, is given by

\[ F = \frac{S_1^2}{S_2^2}, \qquad \text{where } S_1^2 > S_2^2 \]

and

\[ S_1^2 = \frac{\sum (x - \bar{x})^2}{n_1 - 1}, \qquad S_2^2 = \frac{\sum (y - \bar{y})^2}{n_2 - 1} \]

with numerator degrees of freedom $\nu_1 = n_1 - 1$ and denominator degrees of freedom
$\nu_2 = n_2 - 1$.
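If $s_1$ and $s_2$ are the standard deviations of the samples, then

\[ s_1^2 = \frac{\sum (x - \bar{x})^2}{n_1}, \qquad s_2^2 = \frac{\sum (y - \bar{y})^2}{n_2} \]

\[ \therefore \quad \sum (x - \bar{x})^2 = n_1 s_1^2, \qquad \sum (y - \bar{y})^2 = n_2 s_2^2 \]

Substituting in $S_1^2$ and $S_2^2$,

\[ S_1^2 = \frac{n_1 s_1^2}{n_1 - 1}, \qquad S_2^2 = \frac{n_2 s_2^2}{n_2 - 1} \]

[Fig. 6.6 F-distribution curve: P(F) plotted against F]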
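The conversion from a sample standard deviation (divisor n) to the unbiased variance estimate (divisor n - 1) can be sketched in a few lines of Python. The function name `unbiased_variance` is an illustrative choice and not part of the text. The numbers are those of Example 2 of this section.

```python
def unbiased_variance(n, s):
    """Return S^2 = n * s^2 / (n - 1), where s is the sample SD computed with divisor n."""
    if n < 2:
        raise ValueError("at least two observations are needed")
    return n * s ** 2 / (n - 1)


# Sample sizes and sample SDs from Example 2: n1 = 9, s1 = 2.1 and n2 = 13, s2 = 1.8
S1_sq = unbiased_variance(9, 2.1)
S2_sq = unbiased_variance(13, 1.8)
print(round(S1_sq, 2), round(S2_sq, 2))
```

The larger of the two estimates then goes in the numerator of F.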
       
Snedecor's F-distribution is defined by

\[ P(F) = c \, F^{\frac{\nu_1 - 2}{2}} \left( 1 + \frac{\nu_1}{\nu_2} F \right)^{-\frac{\nu_1 + \nu_2}{2}} \]

where the constant $c$ depends on $\nu_1$ and $\nu_2$. It is so chosen that the area under
the curve is unity.

6.15.1 Properties of F-distribution


(i) The F-distribution curve lies entirely in the first quadrant and is unimodal.
(ii) The F-distribution is independent of the population variance $\sigma^2$ and depends
on $\nu_1$ and $\nu_2$ only.
(iii) The mode of the F-distribution is less than unity.
(iv) $F_{1-\alpha}(\nu_1, \nu_2) = \dfrac{1}{F_{\alpha}(\nu_2, \nu_1)}$
where $F_{\alpha}(\nu_2, \nu_1)$ is the value of F with $\nu_2$ and $\nu_1$ degrees of freedom
such that the area under the F-distribution curve to the right of $F_{\alpha}$ is $\alpha$.
(v) The F-test is a one-tailed test (right-tailed test).

6.15.2 Test of Significance for Ratio of Variances


The test of significance is performed by means of Snedecor's F-table, which provides
the 5% and 1% points of significance for F. The 5% point of F is the value of F for
which the area under the F-curve to the right of the ordinate at that value is 0.05.
Further, the F-table gives only a single-tail test. The F-distribution is very useful
for testing the equality of population means by comparing sample variances.
Working Rule
(i) Null Hypothesis $H_0: \sigma_1^2 = \sigma_2^2$
(ii) Alternative Hypothesis $H_1: \sigma_1^2 > \sigma_2^2$
(iii) Level of significance: Select the level of significance $\alpha$.
(iv) Test statistic: $F = \dfrac{S_1^2}{S_2^2}$ where $S_1^2 > S_2^2$
(v) Critical value: Find the critical value (tabulated value) $F_{\alpha}$ at the given
level of significance for the degrees of freedom
$\nu_1 = n_1 - 1$
$\nu_2 = n_2 - 1$
(vi) Decision: If $F < F_{\alpha}$ at the level of significance $\alpha$, the null hypothesis is accepted.
If $F > F_{\alpha}$ at the level of significance $\alpha$, the null hypothesis is rejected.
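The working rule above can be sketched as a small Python function. This is only an illustration: the name `f_test` and its argument layout are assumptions. The critical value is not computed here; it is read from the F-table and passed in by the caller. The larger variance estimate is placed in the numerator, as the rule requires.

```python
def f_test(var_a, n_a, var_b, n_b, f_critical):
    """Apply the working rule of the F-test.

    var_a, var_b : unbiased variance estimates S^2 of the two samples
    n_a, n_b     : sample sizes
    f_critical   : tabulated F_alpha for the degrees of freedom returned below
    Returns (F, (nu1, nu2), reject_H0).
    """
    if var_a >= var_b:
        f_value, dof = var_a / var_b, (n_a - 1, n_b - 1)
    else:
        # Swap roles so that the larger estimate is in the numerator
        f_value, dof = var_b / var_a, (n_b - 1, n_a - 1)
    return f_value, dof, f_value > f_critical


# Data of Example 1: sums of squared deviations 84.4 and 102.6, sizes 8 and 10,
# tabulated F_0.05(7, 9) = 3.29
F, dof, reject = f_test(84.4 / 7, 8, 102.6 / 9, 10, 3.29)
print(round(F, 3), dof, reject)
```

Because the degrees of freedom depend on which estimate is larger, it is safest to look up the tabulated value only after checking the returned pair.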

Example 1
In two independent samples of sizes 8 and 10, the sum of squares of
deviations of the sample values from the respective means were 84.4
and 102.6. Test whether the difference of variances of the population is
significant or not. Use a 0.05 level of significance.
Solution
$n_1 = 8$, $n_2 = 10$
$\sum (x - \bar{x})^2 = 84.4$, $\sum (y - \bar{y})^2 = 102.6$

\[ S_1^2 = \frac{\sum (x - \bar{x})^2}{n_1 - 1} = \frac{84.4}{8 - 1} = 12.057 \]

\[ S_2^2 = \frac{\sum (y - \bar{y})^2}{n_2 - 1} = \frac{102.6}{10 - 1} = 11.4 \]

(i) Null Hypothesis $H_0: \sigma_1^2 = \sigma_2^2$, i.e., the variances of the two populations are equal.
(ii) Alternative Hypothesis $H_1: \sigma_1^2 > \sigma_2^2$
(iii) Level of significance: $\alpha = 0.05$
(iv) Test statistic: $F = \dfrac{S_1^2}{S_2^2} = \dfrac{12.057}{11.4} = 1.057$
(v) Critical value: $\nu_1 = n_1 - 1 = 8 - 1 = 7$
$\nu_2 = n_2 - 1 = 10 - 1 = 9$
$F_{0.05}(\nu_1 = 7, \nu_2 = 9) = 3.29$
(vi) Decision: Since $F < F_{0.05}$, the null hypothesis is accepted at 0.05 level of
significance, i.e., there is no significant difference in the variances of the populations.

Example 2
The standard deviations calculated from two random samples of sizes
9 and 13 are 2.1 and 1.8 respectively. Can the samples be regarded as
drawn from normal populations with the same SD?
Solution
$n_1 = 9$, $n_2 = 13$, $s_1 = 2.1$, $s_2 = 1.8$

\[ S_1^2 = \frac{n_1 s_1^2}{n_1 - 1} = \frac{9(2.1)^2}{9 - 1} = 4.96 \]

\[ S_2^2 = \frac{n_2 s_2^2}{n_2 - 1} = \frac{13(1.8)^2}{13 - 1} = 3.51 \]

(i) Null Hypothesis $H_0: \sigma_1^2 = \sigma_2^2$, i.e., variances of the two populations are equal.
(ii) Alternative Hypothesis $H_1: \sigma_1^2 > \sigma_2^2$
(iii) Level of significance: $\alpha = 0.05$ (assumption)
(iv) Test statistic: $F = \dfrac{S_1^2}{S_2^2} = \dfrac{4.96}{3.51} = 1.41$
(v) Critical value: $\nu_1 = n_1 - 1 = 9 - 1 = 8$
$\nu_2 = n_2 - 1 = 13 - 1 = 12$
$F_{0.05}(\nu_1 = 8, \nu_2 = 12) = 2.85$
(vi) Decision: Since $F < F_{0.05}$, the null hypothesis is accepted at 5% level of
significance, i.e., the samples can be regarded as drawn from normal populations with
the same SD.

Example 3
Two random samples are drawn from two populations and the following
results were obtained:
Sample I 16 17 18 19 20 21 22 24 26 27
Sample II 19 22 25 25 26 28 29 30 31 32 35 36

Find the variances of the two samples and test whether the two popula-
tions have the same variances.
Solution
$n_1 = 10$, $n_2 = 12$
From calculator: $\bar{x}_1 = 21$, $\bar{x}_2 = 28$, $s_1 = 3.55$, $s_2 = 4.98$

\[ S_1^2 = \frac{n_1 s_1^2}{n_1 - 1} = \frac{10(3.55)^2}{10 - 1} = 14 \]

\[ S_2^2 = \frac{n_2 s_2^2}{n_2 - 1} = \frac{12(4.98)^2}{12 - 1} = 27.05 \]

(i) Null Hypothesis $H_0: \sigma_1^2 = \sigma_2^2$, i.e., the two populations have the same variances.
(ii) Alternative Hypothesis $H_1: \sigma_1^2 > \sigma_2^2$
(iii) Level of significance: $\alpha = 0.05$ (assumption)
(iv) Test statistic: Since $S_2^2 > S_1^2$,

\[ F = \frac{S_2^2}{S_1^2} = \frac{27.05}{14} = 1.93 \]

(v) Critical value: $\nu_1 = n_1 - 1 = 10 - 1 = 9$
$\nu_2 = n_2 - 1 = 12 - 1 = 11$
$F_{0.05}(\nu_2 = 11, \nu_1 = 9) = 3.10$
(vi) Decision: Since $F < F_{0.05}$, the null hypothesis is accepted at 5% level of
significance, i.e., the two populations have the same variances.

Example 4
In a test given to two groups of students drawn from two normal
populations, the marks obtained were as follows:
Group I 18 20 36 50 49 36 34 49 41

Group II 29 28 26 35 30 44 46

Examine at 5% level, whether the two populations have the same


variances.
Solution
$n_1 = 9$, $n_2 = 7$
From calculator: $\bar{x} = 37$, $\bar{y} = 34$, $s_1 = 11.225$, $s_2 = 7.426$

\[ S_1^2 = \frac{n_1 s_1^2}{n_1 - 1} = \frac{9(11.225)^2}{9 - 1} = 141.75 \]

\[ S_2^2 = \frac{n_2 s_2^2}{n_2 - 1} = \frac{7(7.426)^2}{7 - 1} = 64.33 \]

(i) Null Hypothesis $H_0: \sigma_1^2 = \sigma_2^2$, i.e., the two populations have the same variances.
(ii) Alternative Hypothesis $H_1: \sigma_1^2 \ne \sigma_2^2$
(iii) Level of significance: $\alpha = 0.05$
(iv) Test statistic: $F = \dfrac{S_1^2}{S_2^2} = \dfrac{141.75}{64.33} = 2.203$
(v) Critical value: $\nu_1 = n_1 - 1 = 9 - 1 = 8$
$\nu_2 = n_2 - 1 = 7 - 1 = 6$
$F_{0.05}(\nu_1 = 8, \nu_2 = 6) = 4.15$
(vi) Decision: Since $F < F_{0.05}$, the null hypothesis is accepted at 5% level of
significance, i.e., the two populations have the same variances.

Example 5
A group of 10 rats fed on diet A and another group of 8 rats fed on diet
B recorded following increase in weight:
Diet A 5 6 8 1 12 4 3 9 6 10 gm
Diet B 2 3 6 8 1 10 2 8 gm

Test whether the variances are significantly different.


Solution
$n_1 = 10$, $n_2 = 8$
From calculator: $s_1 = 3.2$, $s_2 = 3.23$

\[ S_1^2 = \frac{n_1 s_1^2}{n_1 - 1} = \frac{10(3.2)^2}{10 - 1} = 11.38 \]

\[ S_2^2 = \frac{n_2 s_2^2}{n_2 - 1} = \frac{8(3.23)^2}{8 - 1} = 11.92 \]

(i) Null Hypothesis $H_0: \sigma_1^2 = \sigma_2^2$, i.e., there is no significant difference in
variances.
(ii) Alternative Hypothesis $H_1: \sigma_1^2 > \sigma_2^2$
(iii) Level of significance: $\alpha = 0.05$ (assumption)
(iv) Test statistic: Since $S_2^2 > S_1^2$,

\[ F = \frac{S_2^2}{S_1^2} = \frac{11.92}{11.38} = 1.05 \]

(v) Critical value: $\nu_1 = n_1 - 1 = 10 - 1 = 9$
$\nu_2 = n_2 - 1 = 8 - 1 = 7$
$F_{0.05}(\nu_2 = 7, \nu_1 = 9) = 3.29$
(vi) Decision: Since $F < F_{0.05}$, the null hypothesis is accepted at 5% level of
significance, i.e., the two variances are not significantly different.

Example 6
Two random samples gave the following data:
Size Mean Variance
Sample I 8 9.6 1.2
Sample II 11 16.5 2.5

Can we conclude that the two samples have been drawn from the same
normal population?
Solution
A normal distribution has two parameters, mean $\mu$ and variance $\sigma^2$. To conclude
that the two samples have been drawn from the same normal population, we have to test for
(i) Equality of two means $H_0$ ($\mu_1 = \mu_2$) by t-test
(ii) Equality of two variances $H_0$ ($\sigma_1^2 = \sigma_2^2$) by F-test.

F-test:
$n_1 = 8$, $n_2 = 11$, $\bar{x}_1 = 9.6$, $\bar{x}_2 = 16.5$, $s_1^2 = 1.2$, $s_2^2 = 2.5$

\[ S_1^2 = \frac{n_1 s_1^2}{n_1 - 1} = \frac{8(1.2)}{8 - 1} = 1.37 \]

\[ S_2^2 = \frac{n_2 s_2^2}{n_2 - 1} = \frac{11(2.5)}{11 - 1} = 2.75 \]

(i) Null Hypothesis $H_0: \sigma_1^2 = \sigma_2^2$, i.e., variances of the two populations are equal.
(ii) Alternative Hypothesis $H_1: \sigma_1^2 > \sigma_2^2$
(iii) Level of significance: $\alpha = 0.05$ (assumption)
(iv) Test statistic: Since $S_2^2 > S_1^2$,

\[ F = \frac{S_2^2}{S_1^2} = \frac{2.75}{1.37} = 2.007 \]

(v) Critical value: $\nu_1 = n_1 - 1 = 8 - 1 = 7$
$\nu_2 = n_2 - 1 = 11 - 1 = 10$
$F_{0.05}(\nu_2 = 10, \nu_1 = 7) = 3.64$
(vi) Decision: Since $F < F_{0.05}$, the null hypothesis is accepted at 5% level of
significance, i.e., the two populations have the same variances.

t-test:

\[ s = \sqrt{\frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}} = \sqrt{\frac{8(1.2) + 11(2.5)}{8 + 11 - 2}} = 1.48 \]

(i) Null Hypothesis $H_0: \mu_1 = \mu_2$, i.e., means of the two populations are equal.
(ii) Alternative Hypothesis $H_1: \mu_1 \ne \mu_2$ (Two-tailed test)
(iii) Level of significance: $\alpha = 0.05$ (assumption)
(iv) Test statistic:

\[ t = \frac{\bar{x}_1 - \bar{x}_2}{s \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{9.6 - 16.5}{1.48 \sqrt{\dfrac{1}{8} + \dfrac{1}{11}}} = -10.03 \]

$|t| = 10.03$
(v) Critical value: $\nu = n_1 + n_2 - 2 = 8 + 11 - 2 = 17$
$t_{0.05}(\nu = 17) = 2.11$
(vi) Decision: Since $|t| > t_{0.05}$, the null hypothesis is rejected at 5% level of
significance, i.e., the two populations do not have the same means.
Hence, the two samples could not have been drawn from the same normal
population.
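The pooled t-statistic used in the second half of this example can be checked with a short Python sketch. The function name `pooled_t` is illustrative and not from the text. It takes sample variances computed with divisor n, as given in the data table. Because it does not round s to 1.48 before dividing, the value of t differs slightly in the second decimal from the hand calculation.

```python
import math


def pooled_t(mean1, var1, n1, mean2, var2, n2):
    """Return (s, t, dof) for the two-sample t-test with pooled standard deviation.

    var1 and var2 are sample variances computed with divisor n (not n - 1).
    """
    dof = n1 + n2 - 2
    s = math.sqrt((n1 * var1 + n2 * var2) / dof)
    t = (mean1 - mean2) / (s * math.sqrt(1 / n1 + 1 / n2))
    return s, t, dof


# Data of Example 6: sizes 8 and 11, means 9.6 and 16.5, variances 1.2 and 2.5
s, t, dof = pooled_t(9.6, 1.2, 8, 16.5, 2.5, 11)
print(round(s, 2), round(t, 2), dof)
# |t| is then compared with the tabulated t_0.05 for the returned degrees of freedom.
```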

Example 7
The nicotine contents in two random samples of tobacco are given
below:
Sample I 21 24 25 26 27
Sample II 22 27 28 30 31 36

Can we say that two samples came from the same population?
Solution
F-test:
$n_1 = 5$, $n_2 = 6$
From calculator: $\bar{x}_1 = 24.6$, $\bar{x}_2 = 29$, $s_1 = 2.06$, $s_2 = 4.24$

\[ S_1^2 = \frac{n_1 s_1^2}{n_1 - 1} = \frac{5(2.06)^2}{5 - 1} = 5.30 \]

\[ S_2^2 = \frac{n_2 s_2^2}{n_2 - 1} = \frac{6(4.24)^2}{6 - 1} = 21.57 \]

(i) Null Hypothesis $H_0: \sigma_1^2 = \sigma_2^2$, i.e., variances of the two populations are equal.
(ii) Alternative Hypothesis $H_1: \sigma_1^2 > \sigma_2^2$
(iii) Level of significance: $\alpha = 0.05$ (assumption)
(iv) Test statistic: Since $S_2^2 > S_1^2$,

\[ F = \frac{S_2^2}{S_1^2} = \frac{21.57}{5.30} = 4.07 \]

(v) Critical value: $\nu_1 = n_1 - 1 = 5 - 1 = 4$
$\nu_2 = n_2 - 1 = 6 - 1 = 5$
$F_{0.05}(\nu_2 = 5, \nu_1 = 4) = 6.26$
(vi) Decision: Since $F < F_{0.05}$, the null hypothesis is accepted at 5% level of
significance, i.e., the two populations have the same variances.
t-test:

\[ s = \sqrt{\frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}} = \sqrt{\frac{5(2.06)^2 + 6(4.24)^2}{5 + 6 - 2}} = \sqrt{14.34} = 3.79 \]

(i) Null Hypothesis $H_0: \mu_1 = \mu_2$, i.e., means of the two populations are equal.
(ii) Alternative Hypothesis $H_1: \mu_1 \ne \mu_2$ (Two-tailed test)
(iii) Level of significance: $\alpha = 0.05$ (assumption)
(iv) Test statistic:

\[ t = \frac{\bar{x}_1 - \bar{x}_2}{s \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{24.6 - 29}{3.79 \sqrt{\dfrac{1}{5} + \dfrac{1}{6}}} = -1.92 \]

$|t| = 1.92$
(v) Critical value: $\nu = n_1 + n_2 - 2 = 5 + 6 - 2 = 9$
$t_{0.05}(\nu = 9) = 2.262$
(vi) Decision: Since $|t| < t_{0.05}$, the null hypothesis is accepted at 5% level of
significance, i.e., the two populations have the same means.
Hence, the two samples came from the same population.

Exercise 6.4
1. Two independent samples of sizes n1 = 13 and n2 = 7 are taken from
a normal population. What is the probability that the variance of the
first sample will be at least four times as large as that of the second
sample?
[Ans.: 0.05]
2. The standard deviations calculated from two random samples of sizes
9 and 13 are 2 and 1.9 respectively. Can the samples be regarded as
drawn from normal populations with the same standard deviation?
[Ans.: The samples can be regarded as drawn from
the normal populations with the same standard deviation]
3. Two samples are drawn from two normal populations. From the
following data test whether the two samples have the same variance at
5% level?
Sample I 60 65 71 74 76 82 85 87
Sample II 61 66 67 85 78 63 85 86 88 91
[Ans.: Two samples have the same variances]

4. The time taken by workers in performing a job by method I and method II
is given below.
Method I 20 16 26 27 22
Method II 27 33 42 35 32 34 38
Do the data show that the variances of time distribution in the populations
from which these samples are drawn do not differ significantly?
[Ans.: The variances of time distribution in the populations
from which the samples are drawn do not differ significantly]
5. The following results were obtained from two samples, each drawn from
one of two different populations A and B:
Population A B
Sample I II
Sample size 25 17
Sample SD 3 2

Test the hypothesis that the variance of population A is more than that of B.
[Ans.: Variance of population A is not more than the variance of population B]
6. In a laboratory experiment two samples gave the following results:

Sample   Size   Sample mean   Sum of squares of deviation from the mean
1        10     15            90
2        12     14            108

Test the equality of sample variances at 5% level of significance.
[Ans.: The two populations have the same variances]
6.16 Chi-square ($\chi^2$) Test

The chi-square ($\chi^2$) test is a useful measure for comparing experimentally obtained
results with those expected theoretically on the basis of a hypothesis. It is used as a
test statistic in testing a hypothesis that provides a set of theoretical frequencies
with which observed frequencies are compared. The magnitude of the discrepancy between
observed and theoretical frequencies is given by the quantity $\chi^2$ (pronounced as
chi-square). If $\chi^2 = 0$, the observed and expected frequencies completely coincide.
As the value of $\chi^2$ increases, the discrepancy between the observed and theoretical
frequencies increases.
If $f_{o_1}, f_{o_2}, \ldots, f_{o_n}$ is a set of observed frequencies and
$f_{e_1}, f_{e_2}, \ldots, f_{e_n}$ is the corresponding set of expected (or theoretical)
frequencies then $\chi^2$ is defined by

\[ \chi^2 = \frac{(f_{o_1} - f_{e_1})^2}{f_{e_1}} + \frac{(f_{o_2} - f_{e_2})^2}{f_{e_2}} + \cdots + \frac{(f_{o_n} - f_{e_n})^2}{f_{e_n}} = \sum \frac{(f_o - f_e)^2}{f_e} \]

with $n - 1$ degrees of freedom.
Note
If the data is given in a series of n numbers then degrees of freedom $\nu = n - 1$
In case of binomial distribution, $\nu = n - 1$
In case of Poisson distribution, $\nu = n - 2$
In case of normal distribution, $\nu = n - 3$

6.16.1 Chi–Square Distribution
If $x_1, x_2, \ldots, x_n$ are n independent normal variates with mean zero and standard
deviation unity then $x_1^2 + x_2^2 + \cdots + x_n^2$ is a random variate having $\chi^2$
distribution with probability density function given by

\[ P(\chi^2) = y_0 \, (\chi^2)^{\frac{\nu - 1}{2}} \, e^{-\frac{\chi^2}{2}} \]

where $\nu$ = degrees of freedom = $n - 1$ and $y_0$ = constant depending on the degrees of
freedom.

[Fig. 6.7 Chi-square distribution curve: P($\chi^2$) plotted against $\chi^2$ for $\nu$ = 1, 3 and 5]


6.16.2 Properties of $\chi^2$-Distribution
(i) The chi-square distribution curve is always positively skewed.
(ii) The mean of the chi-square distribution is the number of degrees of freedom.
(iii) The standard deviation of the chi-square distribution is $\sqrt{2\nu}$.
(iv) Chi-square values increase with the increase in degrees of freedom.
(v) The value of $\chi^2$ lies between zero and infinity.
(vi) For different values of degrees of freedom, the shape of the curve will be
different.

6.17 Chi-square Test: Goodness of Fit


The value of $\chi^2$ is used to test whether the deviations of the observed frequencies
from the expected frequencies are significant or not. It is also used to fit a set of
observations to a given distribution. Hence, the chi-square test provides a test of
goodness of fit and may be used to examine the validity of some hypothesis about an
observed frequency distribution.

Test of Significance
Let $f_{o_1}, f_{o_2}, \ldots, f_{o_n}$ be a set of observed frequencies and
$f_{e_1}, f_{e_2}, \ldots, f_{e_n}$ be the corresponding set of expected or theoretical
frequencies. The $\chi^2$ statistic is given by

\[ \chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} \]

Working Rule
(i) Set up a null hypothesis.
(ii) Set up an alternative hypothesis.
(iii) Set a level of significance $\alpha$.
(iv) Calculate $\chi^2$.
(v) Find the degrees of freedom and the corresponding tabulated value of $\chi^2$ at the
given level of significance $\alpha$.
(vi) If the calculated value of $\chi^2$ is less than the tabulated value of $\chi^2$ at the
level of significance $\alpha$, the null hypothesis is accepted. If the calculated value of
$\chi^2$ is more than the tabulated value of $\chi^2$ at the level of significance $\alpha$, the
null hypothesis is rejected.
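Step (iv) of the working rule can be sketched in Python as follows. The function name `chi_square_statistic` is an illustrative choice. The tabulated critical value is taken from the chi-square table, not computed. The data are the die frequencies of Example 1 below. The unrounded sum differs slightly from a hand total of rounded terms.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (fo - fe)^2 / fe over all classes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))


# Die thrown 132 times (Example 1); an unbiased die gives equal expected frequencies
observed = [15, 20, 25, 15, 29, 28]
expected = [sum(observed) / len(observed)] * len(observed)

chi2 = chi_square_statistic(observed, expected)
dof = len(observed) - 1
critical = 11.07  # tabulated chi-square value at alpha = 0.05 for 5 degrees of freedom
print(round(chi2, 2), dof, "reject H0" if chi2 > critical else "accept H0")
```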

Example 1
A dice was thrown 132 times and the following frequencies were observed:
No. obtained 1 2 3 4 5 6 Total
Frequency 15 20 25 15 29 28 132
Test the hypothesis that the dice is unbiased.
Solution
n = 6
(i) Null Hypothesis $H_0$: The dice is unbiased.
(ii) Alternative Hypothesis $H_1$: The dice is biased.
(iii) Level of significance: $\alpha = 0.05$ (assumption)
(iv) Test statistic:
Expected frequency of each number $f_e = \dfrac{132}{6} = 22$

No. obtained   Observed frequency, fo   Expected frequency, fe   (fo - fe)^2 / fe
1              15                       22                       2.23
2              20                       22                       0.18
3              25                       22                       0.41
4              15                       22                       2.23
5              29                       22                       2.23
6              28                       22                       1.64

\[ \chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 8.92 \]

(v) Critical value: $\nu = n - 1 = 6 - 1 = 5$
$\chi^2_{0.05}(\nu = 5) = 11.07$
(vi) Decision: Since $\chi^2 < \chi^2_{0.05}$, the null hypothesis is accepted at 5% level of
significance, i.e., the dice is unbiased.

Example 2
The number of car accidents in a metropolitan city was found to be 20,
17, 12, 6, 7, 15, 8, 5, 16 and 14 per month respectively. Use c2 test to
check whether these frequencies are in agreement with the belief that the
occurrence of accidents was the same during 10 months period. Test at
5% level of significance.
Solution
n = 10
(i) Null Hypothesis $H_0$: Occurrence of accidents was the same during the 10 months
period.
(ii) Alternative Hypothesis $H_1$: Occurrence of accidents was not the same during the
10 months period.
(iii) Level of significance: $\alpha = 0.05$
(iv) Test statistic: If the occurrence of accidents is the same, the expected frequency
of accidents per month is

\[ f_e = \frac{20 + 17 + 12 + 6 + 7 + 15 + 8 + 5 + 16 + 14}{10} = 12 \]

Observed frequency, fo   Expected frequency, fe   (fo - fe)^2 / fe
20                       12                       5.33
17                       12                       2.08
12                       12                       0
6                        12                       3
7                        12                       2.08
15                       12                       0.75
8                        12                       1.33
5                        12                       4.08
16                       12                       1.33
14                       12                       0.33

\[ \chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 20.31 \]

(v) Critical value: $\nu = n - 1 = 10 - 1 = 9$
$\chi^2_{0.05}(\nu = 9) = 16.92$
(vi) Decision: Since $\chi^2 > \chi^2_{0.05}$, the null hypothesis is rejected at 5% level of
significance, i.e., occurrence of accidents was not the same during the 10 months period.

Example 3
200 digits were chosen at random from a set of tables. The frequencies of
the digits are shown below:
Digits 0 1 2 3 4 5 6 7 8 9
Frequency 18 19 23 21 16 25 22 20 21 15

Use the $\chi^2$-test to assess the correctness of the hypothesis that the digits
were distributed in equal number in the tables from which these were
chosen.
Solution
n = 10
(i) Null Hypothesis $H_0$: The digits were distributed in equal number in the tables.
(ii) Alternative Hypothesis $H_1$: The digits were not distributed in equal number in
the tables.
(iii) Level of significance: $\alpha = 0.05$ (assumption)
(iv) Test statistic: Expected frequency of each digit $f_e = \dfrac{200}{10} = 20$

Observed frequency, fo   Expected frequency, fe   (fo - fe)^2 / fe
18                       20                       0.2
19                       20                       0.05
23                       20                       0.45
21                       20                       0.05
16                       20                       0.8
25                       20                       1.25
22                       20                       0.2
20                       20                       0
21                       20                       0.05
15                       20                       1.25

\[ \chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 4.3 \]

(v) Critical value: $\nu = n - 1 = 10 - 1 = 9$
$\chi^2_{0.05}(\nu = 9) = 16.92$
(vi) Decision: Since $\chi^2 < \chi^2_{0.05}$, the null hypothesis is accepted at 5% level of
significance, i.e., the digits were distributed in equal number in the tables.

Example 4
Theory predicts that the proportion of beans in the four groups A, B, C, D
should be 9 : 3 : 3 : 1. In an experiment among 1600 beans, the numbers
in the four groups were 882, 313, 287 and 118. Do the experimental
results support the theory?
Solution
n = 4
(i) Null Hypothesis $H_0$: The proportion of the beans in the four groups A, B, C, D
is 9 : 3 : 3 : 1.
(ii) Alternative Hypothesis $H_1$: The proportion of the beans in the four groups A,
B, C, D is not 9 : 3 : 3 : 1.
(iii) Level of significance: $\alpha = 0.05$ (assumption)
(iv) Test statistic:

Group   Observed frequency, fo   Expected frequency, fe    (fo - fe)^2 / fe
A       882                      (9/16) × 1600 = 900       0.36
B       313                      (3/16) × 1600 = 300       0.56
C       287                      (3/16) × 1600 = 300       0.56
D       118                      (1/16) × 1600 = 100       3.24

\[ \chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 4.72 \]

(v) Critical value: $\nu = n - 1 = 4 - 1 = 3$
$\chi^2_{0.05}(\nu = 3) = 7.81$
(vi) Decision: Since $\chi^2 < \chi^2_{0.05}$, the null hypothesis is accepted at 5% level of
significance, i.e., experimental results support the theory and the proportion of the
beans is 9 : 3 : 3 : 1.

Example 5
The following mistakes per page were observed in a book:
No. of mistakes per page 0 1 2 3 4
No. of pages 211 90 19 5 0

Fit a Poisson distribution and test the goodness of fit.


Solution
(i) Null Hypothesis $H_0$: The mistakes follow Poisson distribution and Poisson
distribution can be fitted to the data.
(ii) Alternative Hypothesis $H_1$: The mistakes do not follow Poisson distribution.
(iii) Level of significance: $\alpha = 0.05$ (assumption)
(iv) Test statistic: The expected frequencies by Poisson distribution are given by

\[ f_e = Np = N \left( \frac{e^{-\lambda} \lambda^x}{x!} \right), \quad x = 0, 1, 2, 3, 4 \]

\[ \lambda = \frac{\sum f x}{N} = \frac{211(0) + 90(1) + 19(2) + 5(3) + 0(4)}{211 + 90 + 19 + 5 + 0} = 0.44 \]

\[ f_e = Np = 325 \left( \frac{e^{-0.44} \, 0.44^x}{x!} \right), \quad x = 0, 1, 2, 3, 4 \]

Expected or theoretical frequency
x    0        1      2       3      4
fe   209.31   92.1   20.26   2.97   0.33

When expected frequencies are less than 10, classes are grouped together.

No. of mistakes   Observed frequency fo   Expected frequency fe   fo - fe   (fo - fe)^2 / fe
0                 211                     209.31                  1.69      0.014
1                 90                      92.10                   -2.1      0.048
2, 3, 4           19 + 5 + 0 = 24         20.26 + 2.97 + 0.33     0.44      0.008
                                          = 23.56

\[ \chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 0.07 \]

(v) Critical value: The number of degrees of freedom is 1 for each class. There
are 5 classes originally. Hence, the degrees of freedom originally is 5. Since
the classes are reduced by 2, the degrees of freedom is reduced by 2. Further,
while calculating the parameter $\lambda$, two sums $\sum fx$ and $\sum f$ are used. Hence, the
degrees of freedom is again reduced by 2.
Hence, the number of degrees of freedom $\nu = 5 - (2 + 2) = 1$
$\chi^2_{0.05} = 3.84$
(vi) Decision: Since $\chi^2 < \chi^2_{0.05}$, the null hypothesis is accepted at 5% level of
significance, i.e., the mistakes follow Poisson distribution.
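The fitting step of this example, estimating λ from the data and computing the expected Poisson frequencies, can be reproduced with a brief Python sketch. The function name `poisson_expected` is an assumption made for illustration. Grouping of small classes is left to the reader, as in the solution above.

```python
import math


def poisson_expected(values, freqs):
    """Return (lam, expected) where lam = sum(f*x)/N and expected[i] = N * e^-lam * lam^x / x!."""
    total = sum(freqs)
    lam = sum(x * f for x, f in zip(values, freqs)) / total
    expected = [total * math.exp(-lam) * lam ** x / math.factorial(x) for x in values]
    return lam, expected


# Mistakes per page and number of pages from Example 5
lam, expected = poisson_expected([0, 1, 2, 3, 4], [211, 90, 19, 5, 0])
print(round(lam, 2), [round(fe, 2) for fe in expected])
```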

Example 6
A set of five similar coins is tossed 320 times and the result is obtained as
follows:
No. of heads 0 1 2 3 4 5
Frequency 6 27 72 112 71 32
Test the hypothesis that the data follow a binomial distribution.
Solution
(i) Null Hypothesis $H_0$: The data follow a binomial distribution.
(ii) Alternative Hypothesis $H_1$: The data do not follow a binomial distribution.
(iii) Level of significance: $\alpha = 0.05$
(iv) Test statistic: Probability of getting a head $p = \dfrac{1}{2}$
Probability of getting a tail $q = \dfrac{1}{2}$
By binomial distribution,

\[ p(x) = {}^{n}C_x \, p^x q^{n-x} = {}^{5}C_x \left( \frac{1}{2} \right)^x \left( \frac{1}{2} \right)^{5-x}, \quad x = 0, 1, 2, 3, 4, 5 \]

N = 320

\[ f_e = N p(x) = 320 \left[ {}^{5}C_x \left( \frac{1}{2} \right)^x \left( \frac{1}{2} \right)^{5-x} \right], \quad x = 0, 1, 2, 3, 4, 5 \]

Expected or theoretical frequency
x    0    1    2     3     4    5
fe   10   50   100   100   50   10

No. of heads   Observed frequency fo   Expected frequency fe   fo - fe   (fo - fe)^2 / fe
0              6                       10                      -4        1.6
1              27                      50                      -23       10.58
2              72                      100                     -28       7.84
3              112                     100                     12        1.44
4              71                      50                      21        8.82
5              32                      10                      22        48.4

\[ \chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 78.68 \]

(v) Critical value: $\nu = n - 1 = 6 - 1 = 5$
$\chi^2_{0.05} = 11.07$
(vi) Decision: Since $\chi^2 > \chi^2_{0.05}$ at 5% level of significance, the null hypothesis is
rejected, i.e., the data do not follow the binomial distribution.
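The expected binomial frequencies and the resulting statistic can be verified with a short Python sketch. The function name `binomial_expected` is illustrative only. `math.comb` requires Python 3.8 or later.

```python
import math


def binomial_expected(n, p, total):
    """Return the expected frequencies total * nCx * p^x * (1-p)^(n-x) for x = 0..n."""
    return [total * math.comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)]


# Five fair coins tossed 320 times (Example 6)
observed = [6, 27, 72, 112, 71, 32]
expected = binomial_expected(5, 0.5, 320)
chi2 = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))
print(expected, round(chi2, 2))
```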
Example 7
Fit the equation of the best fitting normal curve to the following data:
x 135 145 155 165 175 185 195 205 Total
f 2 14 22 25 19 13 3 2 100
Compare the theoretical and observed frequencies. Using the $\chi^2$ test find the
goodness of fit. Given that $\mu = 165.6$ and $\sigma = 15.02$.
Solution
$\mu = 165.6$, $\sigma = 15.02$, $N = \sum f = 100$
The data is first converted into class intervals with inclusive series.

Class      Lower class   Z = (X - μ)/σ   Area from   Area in class   Expected
interval   limit X                       0 to Z      interval        frequencies
130–140    130           -2.37           0.4911      0.0357          3.57 ≈ 4
140–150    140           -1.70           0.4554      0.1046          10.46 ≈ 11
150–160    150           -1.04           0.3508      0.2065          20.65 ≈ 21
160–170    160           -0.37           0.1443      0.2584          25.84 ≈ 26
170–180    170           0.29            0.1141      0.2174          21.74 ≈ 21
180–190    180           0.96            0.3315      0.1159          11.59 ≈ 12
190–200    190           1.62            0.4474      0.0416          4.16 ≈ 4
200–210    200           2.29            0.4890      0.0095          0.95 ≈ 1
210–220    210           2.96            0.4985

Calculation of $\chi^2$
When expected frequencies are less than 10, classes are grouped together.

x               Observed frequency fo   Expected frequency fe   fo - fe   (fo - fe)^2 / fe
135, 145        2 + 14 = 16             4 + 11 = 15             1         0.067
155             22                      21                      1         0.048
165             25                      26                      -1        0.038
175             19                      21                      -2        0.19
185, 195, 205   13 + 3 + 2 = 18         12 + 4 + 1 = 17         1         0.0588

\[ \chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 0.4018 \]

Critical value: There are 5 frequencies. While calculating mean and standard deviation,
three sums $\sum f$, $\sum fx$, and $\sum fx^2$ are used. Hence, the number of degrees of freedom
$\nu = 5 - 3 = 2$
$\chi^2_{0.05} = 5.99$
Since $\chi^2 < \chi^2_{0.05}$ at 5% level of significance, the fit is good and the distribution
is nearly normal.
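The expected frequencies of the fitted normal curve can be cross-checked with Python's standard `statistics.NormalDist` instead of a table of areas. This is a sketch under that substitution, and the function name `normal_expected` is illustrative. Because Z is not rounded to two decimals here, the frequencies differ slightly from the tabulated ones before rounding to whole numbers.

```python
from statistics import NormalDist


def normal_expected(boundaries, mean, sd, total):
    """Return total * P(lower < X < upper) for each pair of consecutive class boundaries."""
    dist = NormalDist(mean, sd)
    cdf = [dist.cdf(b) for b in boundaries]
    return [total * (upper - lower) for lower, upper in zip(cdf, cdf[1:])]


# Class boundaries 130, 140, ..., 210 with mu = 165.6, sigma = 15.02, N = 100 (Example 7)
boundaries = list(range(130, 211, 10))
expected = normal_expected(boundaries, 165.6, 15.02, 100)
print([round(fe, 2) for fe in expected])
```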

6.18 Chi-square Test for Independence of Attributes

In statistics, sometimes we have to deal with attributes or qualitative characters,
which cannot be measured accurately, although items can be divided into two or more
categories w.r.t. the attributes. Let A and B be two attributes of the population. A can be
divided into m categories $A_1, A_2, \ldots, A_m$ and B can be divided into n categories
$B_1, B_2, \ldots, B_n$. The data can be shown in the form of a two-way table with m rows and
n columns, as in a bivariate frequency distribution. This two-way frequency table for
attributes is known as an m × n contingency table. The frequency of observations belonging
to both the categories $A_i$ and $B_j$ simultaneously is shown in the cell at the i-th row and
j-th column and denoted by $(A_i B_j)$. Similarly $(A_i)$ and $(B_j)$ denote the frequencies of
items belonging to categories $A_i$ and $B_j$ respectively, and N is the total frequency.

(3 × 4) contingency table
                          Attribute B
Attribute A   B1        B2        B3        B4        Total
A1            (A1 B1)   (A1 B2)   (A1 B3)   (A1 B4)   (A1)
A2            (A2 B1)   (A2 B2)   (A2 B3)   (A2 B4)   (A2)
A3            (A3 B1)   (A3 B2)   (A3 B3)   (A3 B4)   (A3)
Total         (B1)      (B2)      (B3)      (B4)      N

Independence of Attributes
Two attributes A and B are said to be independent if they are not related to each other.
If two attributes A and B are not independent, they are said to be associated. Whether
the two attributes A and B are associated or not is tested on the basis of cell
frequencies. Under the null hypothesis $H_0$ (attributes are independent), the expected
frequency $f_e$ of any cell is given by

\[ f_e = \frac{(\text{Row total}) \times (\text{Column total})}{\text{Total frequency}} = \frac{(A_i)(B_j)}{N} \]
The test statistic is then given by

\[ \chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} \]

with degrees of freedom $\nu$ = (number of rows – 1)(number of columns – 1).
If the calculated value of $\chi^2$ is less than the tabulated value of $\chi^2$ at the given
level of significance $\alpha$ for $\nu$ degrees of freedom, the null hypothesis is accepted and
the attributes are said to be independent. If the calculated value of $\chi^2$ is more than
the tabulated value of $\chi^2$ at the given level of significance $\alpha$ for $\nu$ degrees of
freedom, the null hypothesis is rejected.
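The computation described above, expected cell frequencies from row and column totals followed by the chi-square sum, can be sketched in Python. The function name `chi_square_independence` is an illustrative choice. The table is that of Example 2 of this section (opinion about autonomous colleges). The unrounded value differs slightly from a hand total of rounded terms.

```python
def chi_square_independence(table):
    """Return (chi2, dof) for an r x c contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, fo in enumerate(row):
            # Expected frequency under independence: (row total)(column total) / N
            fe = row_totals[i] * col_totals[j] / grand_total
            chi2 += (fo - fe) ** 2 / fe
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof


# Rows: undergraduate, postgraduate; columns: favoured, not favoured
chi2, dof = chi_square_independence([[290, 110], [310, 90]])
print(round(chi2, 2), dof)
```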

Yates' Correction
In a 2 × 2 table, there is only one degree of freedom. If any of the expected frequencies
is less than 10, Yates' correction is applied in the chi-square formula:

\[ \chi^2 = \sum \left[ \frac{\left( |f_o - f_e| - 0.5 \right)^2}{f_e} \right] \]
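A minimal Python sketch of the corrected statistic for a 2 × 2 table is given below. The function name `chi_square_yates` and the small table used are hypothetical. The numbers are invented only to show a case with expected frequencies below 10 and are not taken from the text.

```python
def chi_square_yates(table):
    """Return chi-square with Yates' correction for a 2 x 2 table [[a, b], [c, d]]."""
    if len(table) != 2 or any(len(row) != 2 for row in table):
        raise ValueError("Yates' correction is applied to 2 x 2 tables only")
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            fe = row_totals[i] * col_totals[j] / grand_total
            # Reduce each absolute deviation by 0.5 before squaring
            chi2 += (abs(table[i][j] - fe) - 0.5) ** 2 / fe
    return chi2


# Hypothetical table: expected frequencies are 4.5 and 5.5, all below 10
print(round(chi_square_yates([[2, 8], [7, 3]]), 3))
```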

Example 1
A total of 3759 individuals were interviewed in a public opinion survey on
a political proposal. Of them 1872 were men and the rest were women.
A total of 2257 individuals were in favour of the proposal and 917 were
opposed to it. A total of 243 men were undecided and 442 women were
opposed to it. Do you justify or contradict the hypothesis that there is no
association between sex and attitude at 5% level of significance?
Solution
N = 3759
Opinion about political proposal
         Favoured   Opposed   Undecided   Total
Men      1154       475       243         1872
Women    1103       442       342         1887
Total    2257       917       585         3759

(i) Null Hypothesis $H_0$: There is no association between sex and attitude, i.e., sex
and attitude are independent.
(ii) Alternative Hypothesis $H_1$: There is association between sex and attitude.
(iii) Level of significance: $\alpha = 0.05$
(iv) Test statistic:
Calculation of $\chi^2$

Observed frequency fo   Expected frequency fe = (Ai)(Bj)/N   (fo - fe)^2 / fe
1154                    (1872 × 2257)/3759 ≈ 1124            0.8
475                     (1872 × 917)/3759 ≈ 457              0.71
243                     (1872 × 585)/3759 ≈ 291              7.92
1103                    (1887 × 2257)/3759 ≈ 1133            0.79
442                     (1887 × 917)/3759 ≈ 460              0.70
342                     (1887 × 585)/3759 ≈ 294              7.84

\[ \chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 18.76 \]

(v) Critical value: $\nu = (r - 1)(c - 1) = (2 - 1)(3 - 1) = 2$
$\chi^2_{0.05}(\nu = 2) = 5.99$
(vi) Decision: Since $\chi^2 > \chi^2_{0.05}$, the null hypothesis is rejected at 5% level of
significance, i.e., there is association between sex and attitude.

Example 2
A sample of 400 students of undergraduate and 400 students of postgraduate
classes was taken to know their opinion about autonomous
colleges. 290 of the undergraduate and 310 of the postgraduate students
favoured the autonomous status. Present these facts in the form of a
table and test at 5% level of significance, that the opinion regarding
autonomous status of colleges is independent of the level of classes of
students.
Solution
N = 800
Opinion about autonomous colleges
                Favoured   Not favoured   Total
Undergraduate   290        110            400
Postgraduate    310        90             400
Total           600        200            800

(i) Null Hypothesis $H_0$: There is no relation between the classes of students and
opinion, i.e., the two attributes are independent.
(ii) Alternative Hypothesis $H_1$: There is relation between the classes of students
and opinion.
(iii) Level of significance: $\alpha = 0.05$
(iv) Test statistic:

Observed frequency fo   Expected frequency fe = (Ai)(Bj)/N   (fo - fe)^2 / fe
290                     (400 × 600)/800 = 300                0.33
110                     (400 × 200)/800 = 100                1.00
310                     (400 × 600)/800 = 300                0.33
90                      (400 × 200)/800 = 100                1.00

\[ \chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 2.66 \]

(v) Critical value: $\nu = (r - 1)(c - 1) = (2 - 1)(2 - 1) = 1$
$\chi^2_{0.05}(\nu = 1) = 3.84$
(vi) Decision: Since $\chi^2 < \chi^2_{0.05}$, the null hypothesis is accepted at 5% level of
significance, i.e., there is no relation between the classes of students and opinion.

Example 3
In an experiment on immunisation of cattle from tuberculosis the follow-
ing results were obtained:

Affected Not affected Total


Inoculated 267 27 294
Not inoculated 757 155 912
Total 1024 182 1206

Use the $\chi^2$-test to determine the efficacy of the vaccine in preventing
tuberculosis.
Solution
N = 1206
(i) Null Hypothesis H0: There is no relation between inoculation and effect on
disease, i.e., two attributes are independent.
(ii) Alternative Hypothesis H1: There is relation between inoculation and effect on
disease.
(iii) Level of significance: $\alpha = 0.05$ (assumption)
(iv) Test statistic:

Observed frequency   Expected frequency               $(f_o - f_e)^2 / f_e$
$f_o$                $f_e = (A_i)(B_j)/N$
267                  294 × 1024 / 1206 ≈ 250          1.156
27                   294 × 182 / 1206 ≈ 44            6.568
757                  912 × 1024 / 1206 ≈ 774          0.37
155                  912 × 182 / 1206 ≈ 138           2.09

$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 10.19$

(v) Critical value: $\nu = (r - 1)(c - 1) = (2 - 1)(2 - 1) = 1$

$\chi^2_{0.05}(\nu = 1) = 3.84$
(vi) Decision: Since $\chi^2 > \chi^2_{0.05}$, the null hypothesis is rejected at 5% level of
significance, i.e., vaccine is effective in preventing tuberculosis.

Example 4
Given the following contingency table for hair colour and eye colour.
Find the value of $\chi^2$. Is there good association between the two?

Eye colour    Hair colour                 Total
              Fair    Brown    Black
Blue          15      5        20         40
Grey          20      10       20         50
Brown         25      15       20         60
Total         60      30       60         150

Solution
N = 150
(i) Null Hypothesis H0: There is no association between two attributes, hair and
eye colours.
(ii) Alternative Hypothesis H1: There is association between two attributes, hair
and eye colours.
(iii) Level of significance: $\alpha = 0.05$ (assumption)
(iv) Test statistic:

Observed frequency   Expected frequency               $(f_o - f_e)^2 / f_e$
$f_o$                $f_e = (A_i)(B_j)/N$
15                   40 × 60 / 150 = 16               0.0625
5                    40 × 30 / 150 = 8                1.125
20                   40 × 60 / 150 = 16               1
20                   50 × 60 / 150 = 20               0
10                   50 × 30 / 150 = 10               0
20                   50 × 60 / 150 = 20               0
25                   60 × 60 / 150 = 24               0.0417
15                   60 × 30 / 150 = 12               0.75
20                   60 × 60 / 150 = 24               0.6667

$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 3.6458$

(v) Critical value: $\nu = (r - 1)(c - 1) = (3 - 1)(3 - 1) = 4$

$\chi^2_{0.05}(\nu = 4) = 9.49$

(vi) Decision: Since $\chi^2 < \chi^2_{0.05}$, the null hypothesis is accepted at 5% level of
significance, i.e., there is no association between two attributes, hair and eye
colours.

Example 5
Two researchers adopted different sampling techniques while investi-
gating some group of students to find the number of students falling into
different intelligence level. The results are as follows:
Researcher   Below average   Average   Above average   Genius   Total
X            86              60        44              10       200
Y            40              33        25              2        100
Total        126             93        69              12       300

Would you say that the sampling techniques adopted by the two research-
ers are significantly different?
Solution
N = 300
(i) Null Hypothesis H0: There is no significant difference in the sampling
techniques adopted by the two researchers.
(ii) Alternative Hypothesis H1: There is significant difference in the sampling
techniques adopted by the two researchers.
(iii) Level of significance: $\alpha = 0.05$ (assumption)
(iv) Test statistic:

Observed frequency   Expected frequency               $(f_o - f_e)^2 / f_e$
$f_o$                $f_e = (A_i)(B_j)/N$
86                   200 × 126 / 300 = 84             0.0476
60                   200 × 93 / 300 = 62              0.0645
44                   200 × 69 / 300 = 46              0.0869
10                   200 × 12 / 300 = 8               0.5
40                   100 × 126 / 300 = 42             0.0952
33                   100 × 93 / 300 = 31              0.129
25                   100 × 69 / 300 = 23              0.1739
2                    100 × 12 / 300 = 4               1

$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 2.0971$

(v) Critical value: $\nu = (r - 1)(c - 1) = (2 - 1)(4 - 1) = 3$

$\chi^2_{0.05}(\nu = 3) = 7.81$
(vi) Decision: Since $\chi^2 < \chi^2_{0.05}$, the null hypothesis is accepted at 5% level of
significance, i.e., there is no significant difference in the sampling techniques
adopted by the two researchers.

Example 6
The following table gives the level of education and the marriage
adjustment score for a sample of married women:
Level of          Marriage adjustment                      Total
education         Very low   Low    High   Very high
College           24         97     62     58              241
High school       22         28     30     41              121
Middle school     32         10     11     20              73
Total             78         135    103    119             435

Can you conclude from the above data that the higher the level of education,
the greater is the degree of adjustment in marriage?
Solution
N = 435
(i) Null Hypothesis H0: There is no relation between the level of education and
adjustment in marriage, i.e., two attributes are independent.
(ii) Alternative Hypothesis H1: There is relation between level of education and
adjustment in marriage.
(iii) Level of significance: $\alpha = 0.05$ (assumption)
(iv) Test statistic:

Observed frequency   Expected frequency               $(f_o - f_e)^2 / f_e$
$f_o$                $f_e = (A_i)(B_j)/N$
24                   241 × 78 / 435 ≈ 43              8.3953
97                   241 × 135 / 435 ≈ 75             6.4533
62                   241 × 103 / 435 ≈ 57             0.4386
58                   241 × 119 / 435 ≈ 66             0.9697
22                   121 × 78 / 435 ≈ 22              0
28                   121 × 135 / 435 ≈ 37             2.1892
30                   121 × 103 / 435 ≈ 29             0.0345
41                   121 × 119 / 435 ≈ 33             1.9394
32                   73 × 78 / 435 ≈ 13               27.7692
10                   73 × 135 / 435 ≈ 23              7.3478
11                   73 × 103 / 435 ≈ 17              2.1176
20                   73 × 119 / 435 ≈ 20              0

$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 57.6546$

(v) Critical value: $\nu = (r - 1)(c - 1) = (3 - 1)(4 - 1) = 6$

$\chi^2_{0.05}(\nu = 6) = 12.59$

(vi) Decision: Since $\chi^2 > \chi^2_{0.05}$, the null hypothesis is rejected at 5% level of significance,
i.e., level of education and adjustment in marriage are related and higher the level
of education, the greater is the degree of adjustment in marriage.
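
In this example the expected frequencies are rounded to whole numbers before the statistic is computed. The following Python sketch compares that approach with unrounded expected frequencies. The function name `chi_square` and the flag `round_expected` are illustrative assumptions, and Python's `round` may round an individual cell differently from the hand-worked table, so the rounded total need not match it digit for digit.

```python
def chi_square(observed, round_expected=False):
    """Chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(observed):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / n
            if round_expected:
                fe = round(fe)  # whole-number expected frequency, as in hand calculation
            total += (fo - fe) ** 2 / fe
    return total


# Rows: college, high school, middle school; columns: very low ... very high
marriage = [
    [24, 97, 62, 58],
    [22, 28, 30, 41],
    [32, 10, 11, 20],
]
chi2_exact = chi_square(marriage)
chi2_rounded = chi_square(marriage, round_expected=True)
critical = 12.59  # tabulated chi-square value at alpha = 0.05 for 6 degrees of freedom
print(chi2_exact > critical, chi2_rounded > critical)
```

Both versions exceed the critical value by a wide margin, so the decision to reject the null hypothesis does not depend on the rounding.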

Example 7
Two batches each of 12 animals are taken for test of inoculation. One
batch was inoculated and the other batch was not inoculated. The
number of dead and surviving animals are given in the following table
for both the cases. Can the inoculation be regarded as effective against
the disease? Make Yates' correction for continuity of $\chi^2$.
Dead Survived Total
Inoculated 2 10 12
Not inoculated 8 4 12
Total 10 14 24

Solution
N = 24
(i) Null hypothesis H0: There is no relation between inoculation and death i.e.,
inoculation and effect on disease are independent.
(ii) Alternative Hypothesis H1: There is relation between inoculation and death.
(iii) Level of significance: a = 0.05 (assumption)
(iv) Test statistic: Yates' correction is used only when $\nu = 1$ and when some
expected frequencies are small, i.e., less than 10. Here, expected frequencies are
less than 10 each.

Observed frequency   Expected frequency               $(|f_o - f_e| - 0.5)^2 / f_e$
$f_o$                $f_e = (A_i)(B_j)/N$
2                    12 × 10 / 24 = 5                 1.25
10                   12 × 14 / 24 = 7                 0.89
8                    12 × 10 / 24 = 5                 1.25
4                    12 × 14 / 24 = 7                 0.89

$\chi^2 = \sum \frac{(|f_o - f_e| - 0.5)^2}{f_e} = 4.28$

(v) Critical value: $\nu = (2 - 1)(2 - 1) = 1$

$\chi^2_{0.05}(\nu = 1) = 3.84$
(vi) Decision: Since $\chi^2 > \chi^2_{0.05}$, the null hypothesis is rejected at 5% level of
significance, i.e., there is association between inoculation and death and
inoculation is regarded as effective against the disease.
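
The corrected statistic for a 2 × 2 table can be checked with a few lines of Python. This is a sketch only; the function name `yates_chi_square` and its argument order (cells read row by row) are assumptions made for illustration.

```python
def yates_chi_square(a, b, c, d):
    """Yates-corrected chi-square for the 2 x 2 table [[a, b], [c, d]]."""
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    n = a + b + c + d
    total = 0.0
    for i in range(2):
        for j in range(2):
            fe = row_totals[i] * col_totals[j] / n
            # Reduce each absolute deviation by 0.5 before squaring
            total += (abs(observed[i][j] - fe) - 0.5) ** 2 / fe
    return total


# Inoculated: 2 dead, 10 survived; not inoculated: 8 dead, 4 survived
chi2_corrected = yates_chi_square(2, 10, 8, 4)
print(round(chi2_corrected, 4))
```

The unrounded value is 4.2857; summing the rounded cell contributions by hand gives 4.28.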

Exercise 6.5
1. A die is thrown 264 times with the following results. Show that the
die is biased. [Given $\chi^2_{0.05} = 11.07$ for 5 df]
No. appeared on the dice 1 2 3 4 5 6
Frequency 40 32 28 58 54 52

2. A pair of dice are thrown 360 times and frequency of each sum is given
below:
Sum 2 3 4 5 6 7 8 9 10 11 12
Frequency 8 24 35 37 44 65 51 42 26 14 14

Would you say that the dice are fair on the basis of the chi-square test
at 0.05 level of significance?
 [Ans.: The dice are fair]
3. 4 coins are tossed 160 times and the following results were obtained:
No. of heads 0 1 2 3 4
Observed frequencies 17 52 54 31 6
Under the assumption that coins are balanced, find the expected
frequencies of 0, 1, 2, 3 or 4 heads, and test the goodness of fit
(a = 0.05).
[Ans.: Expected frequencies: 10, 40, 60, 40, 10,
the data do not follow binomial distribution]
4. Fit a Poisson distribution to the following data and test its goodness of
fit at level of significance 0.05:
x 0 1 2 3 4
f 419 352 154 56 19

5. The following table gives the number of accidents in a city during a week.
Find whether the accidents are uniformly distributed over a week.
Day Sun Mon Tue Wed Thu Fri Sat Total
No. of accidents 13 15 9 11 12 10 14 84

[Ans.: The accidents are uniformly distributed over a week]



6. Weights in kilograms of 10 students are given below: 38, 40, 45, 53, 47,
43, 55, 48, 52, 49
Can we say that the variance of the normal distribution from which the
above sample is drawn is 20 kg²?
[Ans.: The sample is drawn from the
normal population with variance 20]
7. Five dice are thrown 192 times and the number of times 4, 5 or 6 are
obtained are as follows:
No. of dice showing 4, 5, 6 5 4 3 2 1 0
Frequency 6 46 70 48 20 2
Calculate $\chi^2$. [Ans.: 16.94]
8. The distribution of defects in printed circuit board is hypothesised to
follow Poisson distribution. A random sample of 60 printed boards shows
the following data:
No. of defects 0 1 2 3
Observed frequency 32 15 9 4
Is the hypothesis of Poisson distribution appropriate?
[Ans.: The defects follow Poisson distribution]
9. Based on the following data, determine if there is a relation between
literacy and smoking.
Smokers Non-smokers
Literates 83 57
Illiterates 45 68
[Ans.: $\chi^2$ = 9.19, yes]
10. Table below shows the performances of students in mathematics and
physics. Test the hypothesis that the performance in mathematics is
independent of performance in physics.
Grades in     Grades in Mathematics
Physics       High    Medium    Low
High          56      71        12
Medium        47      163       38
Low           14      42        81
[Ans.: $\chi^2$ = 132.31, Hypothesis is rejected]
11. Investigate the association between the darkness of eye colour in father
and son from the following data:
Colour of son’s     Colour of father’s eyes     Total
eyes                Dark     Not dark
Dark                48       90                 138
Not dark            80       782                862
Total               128      872                1000
[Ans.: $\chi^2$ = 3.84, There is association between two attributes]
12. From the following data, find whether there is any significant linking in
the habit of taking soft drinks among the categories of employees.
Soft drink    Employees
              Clerks    Teachers    Officers
Pepsi         10        25          65
Thumsup       15        30          65
Fanta         50        60          30
[Ans.: $\chi^2$ = 60.24, Two attributes are not independent]
13. 1000 students at college level were graded according to their IQ and
the economic conditions of their home. Use the $\chi^2$-test to find out whether
there is any association between condition at home and IQ.
Economic      IQ
condition     High    Low     Total
Rich          460     140     600
Poor          240     160     400
Total         700     300     1000
[Ans.: $\chi^2$ = 31.733, There is association between two attributes]
14. A random sample of 500 students were classified according to economic
condition of their family and also according to merit as shown below:
Merit              Economic condition              Total
                   Rich    Middle class    Poor
Meritorious        42      137             61      240
Not-meritorious    58      113             89      260
Total              100     250             150     500

Test whether the two attributes merit and economic condition are
associated or not.
[Ans.: $\chi^2$ = 9.30, The two attributes are associated]
CHAPTER 7

Curve Fitting

Chapter Outline
7.1 Introduction
7.2 Least Square Method
7.3 Fitting of Linear Curves
7.4 Fitting of Quadratic Curves
7.5 Fitting of Exponential and Logarithmic Curves

7.1 Introduction

Curve fitting is the process of finding the ‘best-fit’ curve for a given set of data. It is
the representation of the relationship between two variables by means of an algebraic
equation. On the basis of this mathematical equation, predictions can be made in many
statistical problems.
Suppose a set of n points of values (x1, y1), (x2, y2), …, (xn, yn) of the two variables
x and y are given. These values are plotted on a rectangular coordinate system, i.e.,
the xy-plane. The resulting set of points is known as a scatter diagram (Fig. 7.1).
The scatter diagram exhibits the trend and it is possible to visualize a smooth curve
approximating the data. Such a curve is known as an approximating curve.

Fig. 7.1 (two scatter diagrams, each drawn on x and y axes)

7.2 Least Square Method

From a scatter diagram, generally, more than one curve may be seen to be
appropriate to the given set of data. The method of least squares is used to find
the curve for which the sum of the squares of the deviations of the data points
from the curve is minimum.
Let P(x_i, y_i) be a point on the scatter diagram (Fig. 7.2). Let the ordinate at P
meet the curve y = f(x) at Q(x_i, y) and the x-axis at M.

Fig. 7.2 (point P(x_i, y_i), point Q(x_i, y) on the curve, and M on the x-axis)

Distance QP = MP − MQ
            = y_i − y
            = y_i − f(x_i)

The distance QP is known as deviation, error, or residual and is denoted by di. It may
be positive, negative, or zero depending upon whether P lies above, below, or on the
curve. Similar residuals or errors corresponding to the remaining (n – 1) points may be
obtained. The sum of squares of residuals, denoted by E, is given as
$E = \sum_{i=1}^{n} d_i^2 = \sum_{i=1}^{n} [y_i - f(x_i)]^2$

If E = 0 then all the n points will lie on y = f(x). If E ≠ 0, f(x) is chosen such that E is
minimum, i.e., the best fitting curve to the set of points is that for which E is minimum.
This method is known as the least square method. This method does not attempt to
determine the form of the curve y = f (x) but it determines the values of the parameters
of the equation of the curve.
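
As a small numerical illustration of the quantity E, the Python sketch below evaluates the sum of squared residuals for two candidate straight lines on one data set. The function name `sum_squared_residuals` and the second candidate line are assumptions chosen for illustration; the data and the first line are taken from Example 2 of Section 7.3.

```python
def sum_squared_residuals(points, f):
    """E = sum of [y_i - f(x_i)]^2 over all data points."""
    return sum((y - f(x)) ** 2 for x, y in points)


points = [(0, 1.0), (1, 1.8), (2, 3.3), (3, 4.5), (4, 6.3)]


def least_squares_line(x):
    return 0.72 + 1.33 * x  # line obtained from the normal equations


def other_line(x):
    return 1.0 + 1.0 * x  # an arbitrary alternative, for comparison only


E_best = sum_squared_residuals(points, least_squares_line)
E_other = sum_squared_residuals(points, other_line)
print(round(E_best, 4), round(E_other, 4))
```

The least squares line gives the smaller E (about 0.259 against 2.07), which is exactly the criterion by which it is chosen.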

7.3 Fitting of Linear Curves

Let (xi, yi), i = 1, 2, …, n be the set of n values and let the relation between x and y be
y = a + bx. The constants a and b are selected such that the straight line is the best fit to
the data.
The residual at $x = x_i$ is
$d_i = y_i - f(x_i) = y_i - (a + bx_i)$,  i = 1, 2, ..., n

$E = \sum_{i=1}^{n} d_i^2 = \sum_{i=1}^{n} [y_i - (a + bx_i)]^2 = \sum_{i=1}^{n} (y_i - a - bx_i)^2$

For E to be minimum,
(i) $\frac{\partial E}{\partial a} = 0$
$\sum_{i=1}^{n} 2(y_i - a - bx_i)(-1) = 0$
$\sum_{i=1}^{n} (y_i - a - bx_i) = 0$
$\sum_{i=1}^{n} y_i = a\sum_{i=1}^{n} 1 + b\sum_{i=1}^{n} x_i$
$\sum y = na + b\sum x$

(ii) $\frac{\partial E}{\partial b} = 0$
$\sum_{i=1}^{n} 2(y_i - a - bx_i)(-x_i) = 0$
$\sum_{i=1}^{n} (x_i y_i - ax_i - bx_i^2) = 0$
$\sum_{i=1}^{n} x_i y_i = a\sum_{i=1}^{n} x_i + b\sum_{i=1}^{n} x_i^2$
$\sum xy = a\sum x + b\sum x^2$
These two equations are known as normal equations. These equations can be solved
simultaneously to give the best values of a and b. The best fitting straight line is
obtained by substituting the values of a and b in the equation y = a + bx .
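
The two normal equations form a 2 × 2 linear system in a and b, so they can be solved directly. The Python sketch below does this by elimination; the function name `fit_line` is an assumption, and the data are those of Example 1 that follows.

```python
def fit_line(xs, ys):
    """Solve the normal equations for y = a + b*x and return (a, b)."""
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx  # zero only when all x values coincide
    if det == 0:
        raise ValueError("all x values are equal; the line is not determined")
    b = (n * sxy - sx * sy) / det
    a = (sy - b * sx) / n
    return a, b


a, b = fit_line([1, 2, 3, 4, 6, 8], [2.4, 3, 3.6, 4, 5, 6])
print(round(a, 4), round(b, 4))
```

The closed-form expressions for a and b are simply the result of eliminating a between the two normal equations; no library routine is required.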

Example 1
Fit a straight line to the following data:
x 1 2 3 4 6 8
y 2.4 3 3.6 4 5 6

Solution
Let the straight line to be fitted to the data be
y = a + bx

The normal equations are
$\sum y = na + b\sum x$  …(1)
$\sum xy = a\sum x + b\sum x^2$  …(2)
Here, n = 6

x      y      $x^2$    xy
1      2.4    1        2.4
2      3      4        6
3      3.6    9        10.8
4      4      16       16
6      5      36       30
8      6      64       48
$\sum x = 24$   $\sum y = 24$   $\sum x^2 = 130$   $\sum xy = 113.2$

Substituting these values in Eqs (1) and (2),
24 = 6a + 24b  …(3)
113.2 = 24a + 130b  …(4)
Solving Eqs (3) and (4),
a = 1.9764
b = 0.5059
Hence, the required equation of the straight line is
y = 1.9764 + 0.5059x

Note: $\sum x$, $\sum y$, $\sum x^2$, $\sum xy$ can be directly obtained with the help of a scientific calculator.

Example 2
Fit a straight line to the following data. Also, estimate the value of y at
x = 2.5.
x 0 1 2 3 4
y 1 1.8 3.3 4.5 6.3

Solution
Let the straight line to be fitted to the data be
y = a + bx
The normal equations are
$\sum y = na + b\sum x$  …(1)
$\sum xy = a\sum x + b\sum x^2$  …(2)
Here, n = 5

x      y      $x^2$    xy
0      1      0        0
1      1.8    1        1.8
2      3.3    4        6.6
3      4.5    9        13.5
4      6.3    16       25.2
$\sum x = 10$   $\sum y = 16.9$   $\sum x^2 = 30$   $\sum xy = 47.1$

Substituting these values in Eqs (1) and (2),
16.9 = 5a + 10b  …(3)
47.1 = 10a + 30b  …(4)
Solving Eqs (3) and (4),
a = 0.72
b = 1.33
Hence, the required equation of the straight line is
y = 0.72 + 1.33x

At x = 2.5,
y (2.5) = 0.72 + 1.33 (2.5) = 4.045

Example 3
A simply supported beam carries a concentrated load P(lb) at its
midpoint. Corresponding to various values of P, the maximum deflection
Y(in) is measured. The data is given below:
P 100 120 140 160 180 200
Y 0.45 0.55 0.60 0.70 0.80 0.85

Find a law of the form Y = a + bP using the least square method.


 [Summer 2015]
Solution
Let the straight line to be fitted to the data be
Y = a + bP
The normal equations are
$\sum Y = na + b\sum P$  ...(1)
$\sum PY = a\sum P + b\sum P^2$  ...(2)
Here, n = 6

P      Y       $P^2$     PY
100    0.45    10000     45
120    0.55    14400     66
140    0.60    19600     84
160    0.70    25600     112
180    0.80    32400     144
200    0.85    40000     170
$\sum P = 900$   $\sum Y = 3.95$   $\sum P^2 = 142000$   $\sum PY = 621$

Substituting these values in Eqs (1) and (2),
3.95 = 6a + 900b  ...(3)
621 = 900a + 142000b  ...(4)
Solving Eqs (3) and (4),
a = 0.0476
b = 0.0041
Hence, the required equation of the straight line is
Y = 0.0476 + 0.0041 P

Example 4
Fit a straight line to the following data. Also, estimate the value of y at
x = 70.
x 71 68 73 69 67 65 66 67
y 69 72 70 70 68 67 68 64

Solution
Since the values of x and y are large, we choose the origin for x and y at 69 and 67
respectively.
Let X = x - 69 and Y = y - 67
Let the straight line to be fitted to the data be
Y = a + bX
The normal equations are
$\sum Y = na + b\sum X$  …(1)
$\sum XY = a\sum X + b\sum X^2$  …(2)
Here, n = 8

x      y      X      Y      $X^2$    XY
71     69     2      2      4        4
68     72     −1     5      1        −5
73     70     4      3      16       12
69     70     0      3      0        0
67     68     −2     1      4        −2
65     67     −4     0      16       0
66     68     −3     1      9        −3
67     64     −2     −3     4        6
$\sum X = -6$   $\sum Y = 12$   $\sum X^2 = 54$   $\sum XY = 12$

Substituting these values in Eqs (1) and (2),
12 = 8a − 6b  …(3)
12 = −6a + 54b  …(4)
Solving Eqs (3) and (4),
a = 1.8182
b = 0.4242
Hence, the required equation of the straight line is
Y = 1.8182 + 0.4242 X
y - 67 = 1.8182 + 0.4242( x - 69)
y = 0.4242 x + 39.5484
y( x = 70) = 0.4242(70) + 39.5484 = 69.2424

Note: Since $\sum x$, $\sum y$, $\sum x^2$, $\sum xy$ can be directly obtained with the help of a scientific
calculator, the problem can be solved without shifting the origin.
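
That shifting the origin does not change the fitted line can be checked numerically. The Python sketch below fits the data of this example once on the raw values and once on the shifted values, then compares the two predictions at x = 70. The helper name `fit_line` and the variable names are illustrative assumptions.

```python
def fit_line(xs, ys):
    """Solve the normal equations for y = a + b*x and return (a, b)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return a, b


xs = [71, 68, 73, 69, 67, 65, 66, 67]
ys = [69, 72, 70, 70, 68, 67, 68, 64]

# Fit directly on the raw data
a_raw, b_raw = fit_line(xs, ys)

# Fit on data shifted to the origin (69, 67), then shift the intercept back
x0, y0 = 69, 67
a_shift, b_shift = fit_line([x - x0 for x in xs], [y - y0 for y in ys])
a_back = y0 + a_shift - b_shift * x0

y70_raw = a_raw + b_raw * 70
y70_shifted = a_back + b_shift * 70
print(round(y70_raw, 4), round(y70_shifted, 4))
```

Both routes give the same slope and the same estimate at x = 70 (about 69.2424); the shift only keeps the hand arithmetic small.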

Example 5
Fit a straight line to the following data taking x as the dependent variable.
x 1 3 4 6 8 9 11 14
y 1 2 4 4 5 7 8 9

Solution
If x is considered the dependent variable and y the independent variable, the equation
of the straight line to be fitted to the data is
x = a + by
The normal equations are
$\sum x = na + b\sum y$  …(1)
$\sum xy = a\sum y + b\sum y^2$  …(2)
Here, n = 8

x      y      $y^2$    xy
1      1      1        1
3      2      4        6
4      4      16       16
6      4      16       24
8      5      25       40
9      7      49       63
11     8      64       88
14     9      81       126
$\sum x = 56$   $\sum y = 40$   $\sum y^2 = 256$   $\sum xy = 364$

Substituting these values in Eqs (1) and (2),
56 = 8a + 40b  …(3)
364 = 40a + 256b  …(4)
Solving Eqs (3) and (4),
a = − 0.5
b = 1.5
Hence, the required equation of the straight line is
x = - 0.5 + 1.5 y
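
Fitting x on y is not the same as fitting y on x and then solving for x, because the two fits minimise deviations measured in different directions. The Python sketch below illustrates this on the data of this example; the helper name `fit_line` and the variable names are assumptions.

```python
def fit_line(us, vs):
    """Least squares line v = a + b*u; returns (a, b)."""
    n = len(us)
    su, sv = sum(us), sum(vs)
    suu = sum(u * u for u in us)
    suv = sum(u * v for u, v in zip(us, vs))
    b = (n * suv - su * sv) / (n * suu - su * su)
    a = (sv - b * su) / n
    return a, b


xs = [1, 3, 4, 6, 8, 9, 11, 14]
ys = [1, 2, 4, 4, 5, 7, 8, 9]

# x as the dependent variable: x = a + b*y
a_x_on_y, b_x_on_y = fit_line(ys, xs)

# y as the dependent variable: y = a + b*x, rearranged to x = (y - a)/b
a_y_on_x, b_y_on_x = fit_line(xs, ys)
slope_after_inverting = 1 / b_y_on_x

print(a_x_on_y, b_x_on_y, round(slope_after_inverting, 4))
```

The direct fit reproduces x = −0.5 + 1.5y, while inverting the y-on-x line gives a visibly different slope, so the choice of dependent variable matters.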

Example 6
If P is the pull required to lift a load W by means of a pulley block, find
a linear law of the form P = mW + c connecting P and W using the
following data:
P 12 15 21 25

W 50 70 100 120

where P and W are taken in kg-wt. Compute P when W = 150 kg.


Solution
Let the linear curve (straight line) fitted to the data be
P = mW + c = c + mW
The normal equations are
$\sum P = nc + m\sum W$  ...(1)
$\sum PW = c\sum W + m\sum W^2$  ...(2)
Here, n = 4

P      W      $W^2$     PW
12     50     2500      600
15     70     4900      1050
21     100    10000     2100
25     120    14400     3000
$\sum P = 73$   $\sum W = 340$   $\sum W^2 = 31800$   $\sum PW = 6750$

Substituting these values in Eqs (1) and (2),
73 = 4c + 340m  ...(3)
6750 = 340c + 31800m  ...(4)
Solving Eqs (3) and (4),
c = 2.2759
m = 0.1879
Hence, the required equation of the straight line is
P = 0.1879 W + 2.2759
When W = 150 kg,
P = 0.1879(150) + 2.2759 = 30.4609

Exercise 7.1
1. Fit the line of best fit to the following data:

x 0 5 10 15 20 25
y 12 15 17 22 24 30

[Ans.: y = 0.7x + 11.28]


2. The results of a measurement of electric resistance R of a copper bar at
various temperatures t°C are listed below:

t°C 19 25 30 36 40 45 50
R 76 77 79 80 82 83 85

Find a relation R = a + bt where a and b are constants to be determined.


[Ans.: R = 70.0534 + 0.2924t]

3. Fit a straight line to the following data:

x 1.53 1.78 2.60 2.95 3.42


y 33.50 36.30 40.00 45.85 53.40

[Ans.: y = 19 + 9.7x]

4. Fit a straight line to the following data:

x 100 120 140 160 180 200


y 0.45 0.55 0.60 0.70 0.80 0.85

[Ans.: y = 0.0475 + 0.00407x]

5. Find the relation of the type R = aV + b, when some values of R and V
obtained from an experiment are

V 60 65 70 75 80 85 90
R 109 114 118 123 127 130 133

[Ans.: R = 0.8071V + 61.4675]

7.4 Fitting of Quadratic Curves

Let $(x_i, y_i)$, i = 1, 2, …, n be the set of n values and let the relation between x and y be
$y = a + bx + cx^2$. The constants a, b, and c are selected such that the parabola is the
best fit to the data. The residual at $x = x_i$ is
$d_i = y_i - f(x_i) = y_i - (a + bx_i + cx_i^2)$

$E = \sum_{i=1}^{n} d_i^2 = \sum_{i=1}^{n} [y_i - (a + bx_i + cx_i^2)]^2 = \sum_{i=1}^{n} (y_i - a - bx_i - cx_i^2)^2$

For E to be minimum,
(i) $\frac{\partial E}{\partial a} = 0$
$\sum_{i=1}^{n} 2(y_i - a - bx_i - cx_i^2)(-1) = 0$
$\sum_{i=1}^{n} (y_i - a - bx_i - cx_i^2) = 0$
$\sum_{i=1}^{n} y_i = a\sum_{i=1}^{n} 1 + b\sum_{i=1}^{n} x_i + c\sum_{i=1}^{n} x_i^2$
$\sum y = na + b\sum x + c\sum x^2$

(ii) $\frac{\partial E}{\partial b} = 0$
$\sum_{i=1}^{n} 2(y_i - a - bx_i - cx_i^2)(-x_i) = 0$
$\sum_{i=1}^{n} (x_i y_i - ax_i - bx_i^2 - cx_i^3) = 0$
$\sum_{i=1}^{n} x_i y_i = a\sum_{i=1}^{n} x_i + b\sum_{i=1}^{n} x_i^2 + c\sum_{i=1}^{n} x_i^3$
$\sum xy = a\sum x + b\sum x^2 + c\sum x^3$

(iii) $\frac{\partial E}{\partial c} = 0$
$\sum_{i=1}^{n} 2(y_i - a - bx_i - cx_i^2)(-x_i^2) = 0$
$\sum_{i=1}^{n} (x_i^2 y_i - ax_i^2 - bx_i^3 - cx_i^4) = 0$
$\sum_{i=1}^{n} x_i^2 y_i = a\sum_{i=1}^{n} x_i^2 + b\sum_{i=1}^{n} x_i^3 + c\sum_{i=1}^{n} x_i^4$
$\sum x^2 y = a\sum x^2 + b\sum x^3 + c\sum x^4$

These equations are known as normal equations. These equations can be solved
simultaneously to give the best values of a, b, and c. The best fitting parabola is obtained by
substituting the values of a, b, and c in the equation $y = a + bx + cx^2$.
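
The three normal equations form a 3 × 3 linear system, which can be solved by Gaussian elimination. The Python sketch below is one possible implementation; the function names `solve_linear_system` and `fit_parabola` are assumptions, and the data are those of Example 1 that follows.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(aug[i][j] * solution[j] for j in range(i + 1, n))
        solution[i] = (aug[i][n] - tail) / aug[i][i]
    return solution


def fit_parabola(xs, ys):
    """Return (a, b, c) of the least squares parabola y = a + b*x + c*x^2."""
    def s(p):  # sum of x^p
        return sum(x ** p for x in xs)

    def t(p):  # sum of x^p * y
        return sum((x ** p) * y for x, y in zip(xs, ys))

    matrix = [[len(xs), s(1), s(2)], [s(1), s(2), s(3)], [s(2), s(3), s(4)]]
    return tuple(solve_linear_system(matrix, [t(0), t(1), t(2)]))


a, b, c = fit_parabola([1, 2, 3, 4], [1.7, 1.8, 2.3, 3.2])
print(round(a, 4), round(b, 4), round(c, 4))
```

The coefficient matrix is symmetric and is built only from the sums of powers of x, which is why the same sums reappear in each worked example.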

Example 1
Fit a least squares quadratic curve to the following data:
x 1 2 3 4
y 1.7 1.8 2.3 3.2

Estimate y(2.4).

Solution
Let the equation of the least squares quadratic curve (parabola) be $y = a + bx + cx^2$.
The normal equations are
$\sum y = na + b\sum x + c\sum x^2$  …(1)
$\sum xy = a\sum x + b\sum x^2 + c\sum x^3$  …(2)
$\sum x^2 y = a\sum x^2 + b\sum x^3 + c\sum x^4$  …(3)
Here, n = 4

x      y      $x^2$    $x^3$    $x^4$    xy      $x^2 y$
1      1.7    1        1        1        1.7     1.7
2      1.8    4        8        16       3.6     7.2
3      2.3    9        27       81       6.9     20.7
4      3.2    16       64       256      12.8    51.2
$\sum x = 10$   $\sum y = 9$   $\sum x^2 = 30$   $\sum x^3 = 100$   $\sum x^4 = 354$   $\sum xy = 25$   $\sum x^2 y = 80.8$

Substituting these values in Eqs (1), (2), and (3),
9 = 4a + 10b + 30c  …(4)
25 = 10a + 30b + 100c  …(5)
80.8 = 30a + 100b + 354c  …(6)
Solving Eqs (4), (5), and (6),
a = 2
b = −0.5
c = 0.2
Hence, the required equation of least squares quadratic curve is
$y = 2 - 0.5x + 0.2x^2$
$y(2.4) = 2 - 0.5(2.4) + 0.2(2.4)^2 = 1.952$

Note: $\sum x$, $\sum y$, $\sum x^2$, $\sum x^3$, $\sum x^4$, $\sum xy$, $\sum x^2 y$ can be directly obtained with the help of a
scientific calculator.

Example 2
Fit a second-degree polynomial using least square method to the
following data:
x 0 1 2 3 4
y 1 1.8 1.3 2.5 6.3
 [Summer 2015]

Solution
Let the equation of the least squares quadratic curve be $y = a + bx + cx^2$. The normal
equations are
$\sum y = na + b\sum x + c\sum x^2$  ...(1)
$\sum xy = a\sum x + b\sum x^2 + c\sum x^3$  ...(2)
$\sum x^2 y = a\sum x^2 + b\sum x^3 + c\sum x^4$  ...(3)
Here, n = 5

x      y      $x^2$    $x^3$    $x^4$    xy      $x^2 y$
0      1      0        0        0        0       0
1      1.8    1        1        1        1.8     1.8
2      1.3    4        8        16       2.6     5.2
3      2.5    9        27       81       7.5     22.5
4      6.3    16       64       256      25.2    100.8
$\sum x = 10$   $\sum y = 12.9$   $\sum x^2 = 30$   $\sum x^3 = 100$   $\sum x^4 = 354$   $\sum xy = 37.1$   $\sum x^2 y = 130.3$

Substituting these values in Eqs (1), (2), and (3),
12.9 = 5a + 10b + 30c  ...(4)
37.1 = 10a + 30b + 100c  ...(5)
130.3 = 30a + 100b + 354c  ...(6)
Solving Eqs (4), (5), and (6),
a = 1.42
b = −1.07
c = 0.55
Hence, the required equation of the least squares quadratic curve is
$y = 1.42 - 1.07x + 0.55x^2$

Example 3
By the method of least squares, fit a parabola to the following data:
x 1 2 3 4 5

y 5 12 26 60 97

Also, estimate y at x = 6.
Solution
Let the equation of the parabola be y = a + bx + cx2. The normal equations are
Ây = na + bÂx + cÂx2 ...(1)
7.14 Chapter 7 Curve Fitting

Âxy = aÂx + bÂx2 + cÂx3 ...(2)

Âx2y = aÂx2 + bÂx3 + cÂx4 ...(3)


Here, n = 5

x y x2 x3 x4 xy x2y
1 5 1 1 1 5 5
2 12 4 8 16 24 48
3 26 9 27 81 78 234
4 60 16 64 256 240 960
5 97 25 125 625 485 2425
Âx = 15 Ây = 200 Âx = 55
2
Âx = 225 Âx = 979 Âxy = 832
3 4
Âx y = 3672
2

Substituting these values in Eqs (1), (2), and (3),


200 = 5a + 15b + 55 c ...(4)
832 = 15a + 55b + 225c ...(5)
3672 = 55a + 225b + 979c ...(6)
Solving Eqs (4), (5), and (6),
a = 10.4
b = –11.0857
c = 5.7143
Hence, the required equation of the parabola is
y = 10.4 – 11.0857 x + 5.7143 x2
y(6) = 10.4 – 11.0857(6) + 5.7143(6)2 = 149.6006

Example 4
Fit a second-degree parabolic curve to the following data.
x 1 2 3 4 5 6 7 8 9
y 2 6 7 8 10 11 11 10 9

Solution
Let X = x − 5
    Y = y − 10
Let the equation of the parabola be $Y = a + bX + cX^2$.
The normal equations are
$\sum Y = na + b\sum X + c\sum X^2$  …(1)
$\sum XY = a\sum X + b\sum X^2 + c\sum X^3$  …(2)
$\sum X^2 Y = a\sum X^2 + b\sum X^3 + c\sum X^4$  …(3)
Here, n = 9

x      y      X      Y      $X^2$    $X^3$    $X^4$    XY     $X^2 Y$
1      2      −4     −8     16       −64      256      32     −128
2      6      −3     −4     9        −27      81       12     −36
3      7      −2     −3     4        −8       16       6      −12
4      8      −1     −2     1        −1       1        2      −2
5      10     0      0      0        0        0        0      0
6      11     1      1      1        1        1        1      1
7      11     2      1      4        8        16       2      4
8      10     3      0      9        27       81       0      0
9      9      4      −1     16       64       256      −4     −16
$\sum X = 0$   $\sum Y = -16$   $\sum X^2 = 60$   $\sum X^3 = 0$   $\sum X^4 = 708$   $\sum XY = 51$   $\sum X^2 Y = -189$

Substituting these values in Eqs (1), (2), and (3),
−16 = 9a + 60c  …(4)
51 = 60b  …(5)
−189 = 60a + 708c  …(6)
Solving Eqs (4), (5), and (6),
a = 0.0043
b = 0.85
c = −0.2673
Hence, the required equation of the parabola is
$Y = 0.0043 + 0.85X - 0.2673X^2$
$y - 10 = 0.0043 + 0.85(x - 5) - 0.2673(x - 5)^2$
$y = 10 + 0.0043 + 0.85(x - 5) - 0.2673(x^2 - 10x + 25)$
$= 10 + 0.0043 + 0.85x - 4.25 - 0.2673x^2 + 2.673x - 6.6825$
$= -0.9282 + 3.523x - 0.2673x^2$

Note: Since $\sum x$, $\sum y$, $\sum x^2$, $\sum x^3$, $\sum x^4$, $\sum xy$, $\sum x^2 y$ can be directly obtained with the help
of a scientific calculator, the problem can be solved without shifting the origin.

Example 5
Fit a second-degree parabola $y = a + bx^2$ to the following data:
x 1 2 3 4 5
y 1.8 5.1 8.9 14.1 19.8

Solution
Let the curve to be fitted to the data be
$y = a + bx^2$
The normal equations are
$\sum y = na + b\sum x^2$  ...(1)
$\sum x^2 y = a\sum x^2 + b\sum x^4$  ...(2)
Here, n = 5

x      y       $x^2$    $x^4$    $x^2 y$
1      1.8     1        1        1.8
2      5.1     4        16       20.4
3      8.9     9        81       80.1
4      14.1    16       256      225.6
5      19.8    25       625      495
$\sum y = 49.7$   $\sum x^2 = 55$   $\sum x^4 = 979$   $\sum x^2 y = 822.9$

Substituting these values in Eqs (1) and (2),
49.7 = 5a + 55b  ...(3)
822.9 = 55a + 979b  ...(4)
Solving Eqs (3) and (4),
a = 1.8165
b = 0.7385
Hence, the required equation of the curve is
$y = 1.8165 + 0.7385x^2$
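
Since the model has no term in x, substituting u = x² turns it into a straight line y = a + bu, and the two normal equations above are exactly those of a linear fit in u. The Python sketch below uses that observation; the helper name `fit_line` and the variable `us` are assumptions.

```python
def fit_line(us, vs):
    """Least squares line v = a + b*u; returns (a, b)."""
    n = len(us)
    su, sv = sum(us), sum(vs)
    suu = sum(u * u for u in us)
    suv = sum(u * v for u, v in zip(us, vs))
    b = (n * suv - su * sv) / (n * suu - su * su)
    a = (sv - b * su) / n
    return a, b


xs = [1, 2, 3, 4, 5]
ys = [1.8, 5.1, 8.9, 14.1, 19.8]

us = [x * x for x in xs]  # substitute u = x^2
a, b = fit_line(us, ys)   # fits y = a + b*u, i.e. y = a + b*x^2
print(round(a, 4), round(b, 4))
```

The same substitution works for any model that is linear in its constants, which is why only the sums of u = x² and u² = x⁴ appear in the table.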

Example 6
Fit a curve $y = ax + bx^2$ for the following data:
x 1 2 3 4 5 6
y 2.51 5.82 9.93 14.84 20.55 27.06

Solution
Let the curve to be fitted to the data be
$y = ax + bx^2$
The normal equations are
$\sum xy = a\sum x^2 + b\sum x^3$  …(1)
$\sum x^2 y = a\sum x^3 + b\sum x^4$  …(2)

x      y        $x^2$    $x^3$    $x^4$    xy        $x^2 y$
1      2.51     1        1        1        2.51      2.51
2      5.82     4        8        16       11.64     23.28
3      9.93     9        27       81       29.79     89.37
4      14.84    16       64       256      59.36     237.44
5      20.55    25       125      625      102.75    513.75
6      27.06    36       216      1296     162.36    974.16
$\sum x^2 = 91$   $\sum x^3 = 441$   $\sum x^4 = 2275$   $\sum xy = 368.41$   $\sum x^2 y = 1840.51$

Substituting these values in Eqs (1) and (2),
368.41 = 91a + 441b  …(3)
1840.51 = 441a + 2275b  …(4)
Solving Eqs (3) and (4),
a = 2.11
b = 0.4
Hence, the required equation of the curve is
$y = 2.11x + 0.4x^2$
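
For a curve with no constant term, the normal equations come from differentiating E with respect to a and b only, which gives the two equations used in this example. A short Python sketch solving them by Cramer's rule follows; the function name `fit_ax_plus_bx2` is an assumption.

```python
def fit_ax_plus_bx2(xs, ys):
    """Least squares fit of y = a*x + b*x^2 (no constant term); returns (a, b)."""
    s2 = sum(x ** 2 for x in xs)
    s3 = sum(x ** 3 for x in xs)
    s4 = sum(x ** 4 for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2y = sum(x * x * y for x, y in zip(xs, ys))
    # Normal equations: sxy = a*s2 + b*s3 and sx2y = a*s3 + b*s4
    det = s2 * s4 - s3 * s3
    a = (sxy * s4 - s3 * sx2y) / det
    b = (s2 * sx2y - s3 * sxy) / det
    return a, b


a, b = fit_ax_plus_bx2(
    [1, 2, 3, 4, 5, 6],
    [2.51, 5.82, 9.93, 14.84, 20.55, 27.06],
)
print(round(a, 4), round(b, 4))
```

For this data set the fitted curve passes through every point, so E is zero up to rounding.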

Exercise 7.2
1. Fit a parabola to the following data:

x −2 −1 0 1 2
y 1.0 1.8 1.3 2.5 6.3

[Ans.: $y = 1.48 + 1.13x + 0.55x^2$]

2. Fit a curve $y = ax + bx^2$ to the following data:

x −2 −1 0 1 2
y −72 −46 −12 35 93

[Ans.: $y = 41.1x + 2.147x^2$]

3. Fit a parabola $y = a + bx + cx^2$ to the following data:

x 0 2 5 10
y 4 7 6.4 −6

[Ans.: $y = 4.1 + 1.979x - 0.299x^2$]

4. Fit a curve $y = a_0 + a_1 x + a_2 x^2$ for the given data:

x 3 5 7 9 11 13
y 2 3 4 6 5 8

[Ans.: $y = 0.7897 + 0.4004x + 0.0089x^2$]

7.5 Fitting of Exponential and Logarithmic Curves

Let $(x_i, y_i)$, i = 1, 2, …, n be the set of n values and let the relation between x and y be
$y = ab^x$.
Taking logarithm on both the sides of the equation $y = ab^x$,
$\log_e y = \log_e a + x \log_e b$

Putting $\log_e y = Y$, $\log_e a = A$, $x = X$, and $\log_e b = B$,

$Y = A + BX$
This is a linear equation in X and Y. The normal equations are
$\sum Y = nA + B\sum X$
$\sum XY = A\sum X + B\sum X^2$
Solving these equations, A and B, and, hence, a and b can be found. The best fitting
exponential curve is obtained by substituting the values of a and b in the equation
$y = ab^x$.
Similarly, the best fitting exponential curves for the relation $y = ax^b$ and $y = ae^{bx}$ can be
obtained.
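
The reduction of $y = ab^x$ to a straight line can be sketched in Python as follows. The function names `fit_line` and `fit_ab_power_x` are assumptions, and the data are those of Example 1 that follows; small differences from the hand calculation in the last decimal place are possible because the logarithms there are rounded to four places.

```python
import math


def fit_line(us, vs):
    """Least squares line v = A + B*u; returns (A, B)."""
    n = len(us)
    su, sv = sum(us), sum(vs)
    suu = sum(u * u for u in us)
    suv = sum(u * v for u, v in zip(us, vs))
    B = (n * suv - su * sv) / (n * suu - su * su)
    A = (sv - B * su) / n
    return A, B


def fit_ab_power_x(xs, ys):
    """Fit y = a * b**x by a straight-line fit of ln(y) against x."""
    A, B = fit_line(xs, [math.log(y) for y in ys])
    return math.exp(A), math.exp(B)  # a = e^A, b = e^B


a, b = fit_ab_power_x(
    [1, 2, 3, 4, 5, 6, 7, 8],
    [1, 1.2, 1.8, 2.5, 3.6, 4.7, 6.6, 9.1],
)
print(round(a, 4), round(b, 4))
```

All y values must be positive for the logarithm to exist, which is an implicit requirement of this method.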

Example 1
Find the law of the form $y = ab^x$ to the following data:
x 1 2 3 4 5 6 7 8
y 1 1.2 1.8 2.5 3.6 4.7 6.6 9.1

Solution
$y = ab^x$
Taking logarithm on both the sides,
$\log_e y = \log_e a + x \log_e b$
Putting $\log_e y = Y$, $\log_e a = A$, $x = X$ and $\log_e b = B$,
$Y = A + BX$
The normal equations are
$\sum Y = nA + B\sum X$  …(1)
$\sum XY = A\sum X + B\sum X^2$  …(2)
Here, n = 8

x      y      X      Y         $X^2$    XY
1      1      1      0.0000    1        0.0000
2      1.2    2      0.1823    4        0.3646
3      1.8    3      0.5878    9        1.7634
4      2.5    4      0.9163    16       3.6652
5      3.6    5      1.2809    25       6.4045
6      4.7    6      1.5476    36       9.2856
7      6.6    7      1.8871    49       13.2097
8      9.1    8      2.2083    64       17.6664
$\sum X = 36$   $\sum Y = 8.6103$   $\sum X^2 = 204$   $\sum XY = 52.3594$

Substituting these values in Eqs (1) and (2),
8.6103 = 8A + 36B  …(3)
52.3594 = 36A + 204B  …(4)
Solving Eqs (3) and (4),
A = −0.3823
B = 0.3241
$\log_e a = A$
$\log_e a = -0.3823$
a = 0.6823
$\log_e b = B$
$\log_e b = 0.3241$
b = 1.3828
Hence, the required law is
$y = 0.6823 (1.3828)^x$

Example 2
Fit a curve of the form $y = ab^x$ to the following data by the method of
least squares:
x 1 2 3 4 5 6 7

y 87 97 113 129 202 195 193

Solution
$y = ab^x$
Taking logarithm on both the sides,
$\log_e y = \log_e a + x \log_e b$
Putting $\log_e y = Y$, $\log_e a = A$, $x = X$ and $\log_e b = B$,
$Y = A + BX$
The normal equations are
$\sum Y = nA + B\sum X$  ...(1)
$\sum XY = A\sum X + B\sum X^2$  ...(2)
Here, n = 7

x      y      X      Y         $X^2$    XY
1      87     1      4.4659    1        4.4659
2      97     2      4.5747    4        9.1494
3      113    3      4.7274    9        14.1822
4      129    4      4.8598    16       19.4392
5      202    5      5.3083    25       26.5415
6      195    6      5.2730    36       31.6380
7      193    7      5.2627    49       36.8389
$\sum X = 28$   $\sum Y = 34.4718$   $\sum X^2 = 140$   $\sum XY = 142.2551$

Substituting these values in Eqs (1) and (2),
34.4718 = 7A + 28B  ...(3)
142.2551 = 28A + 140B  ...(4)
Solving Eqs (3) and (4),
A = 4.3006
B = 0.156
$\log_e a = A$
$\log_e a = 4.3006$
a = 73.744
$\log_e b = B$
$\log_e b = 0.156$
b = 1.1688
Hence, the required curve is
$y = 73.744 (1.1688)^x$

Example 3
Fit a curve of the form $y = ax^b$ to the following data:
x 20 16 10 11 14
y 22 41 120 89 56

Solution
$y = ax^b$
Taking logarithm on both the sides,
$\log_e y = \log_e a + b \log_e x$
Putting $\log_e y = Y$, $\log_e a = A$, $b = B$ and $\log_e x = X$,
$Y = A + BX$
The normal equations are
$\sum Y = nA + B\sum X$  …(1)
$\sum XY = A\sum X + B\sum X^2$  …(2)
Here, n = 5

x      y      X         Y         $X^2$     XY
20     22     2.9957    3.0910    8.9742    9.2597
16     41     2.7726    3.7136    7.6873    10.2963
10     120    2.3026    4.7875    5.3019    11.0237
11     89     2.3979    4.4886    5.7499    10.7632
14     56     2.6391    4.0254    6.9648    10.6234
$\sum X = 13.1079$   $\sum Y = 20.1061$   $\sum X^2 = 34.6781$   $\sum XY = 51.9663$

Substituting these values in Eqs (1) and (2),
20.1061 = 5A + 13.1079B  …(3)
51.9663 = 13.1079A + 34.6781B  …(4)
Solving Eqs (3) and (4),
A = 10.2146
B = −2.3624
$\log_e a = A$
$\log_e a = 10.2146$
a = 27298.8539
and b = B = −2.3624
Hence, the required equation of the curve is
$y = 27298.8539 x^{-2.3624}$
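
For the power curve both variables are transformed, so the straight line is fitted to ln y against ln x. The Python sketch below illustrates this on the data of this example; the function names are assumptions. Because a = e^A is very sensitive to small changes in A, the computed a may differ slightly from the hand-calculated value, which uses logarithms rounded to four places.

```python
import math


def fit_line(us, vs):
    """Least squares line v = A + B*u; returns (A, B)."""
    n = len(us)
    su, sv = sum(us), sum(vs)
    suu = sum(u * u for u in us)
    suv = sum(u * v for u, v in zip(us, vs))
    B = (n * suv - su * sv) / (n * suu - su * su)
    A = (sv - B * su) / n
    return A, B


def fit_power_curve(xs, ys):
    """Fit y = a * x**b by a straight-line fit of ln(y) against ln(x)."""
    A, B = fit_line([math.log(x) for x in xs], [math.log(y) for y in ys])
    return math.exp(A), B  # a = e^A, b = B


a, b = fit_power_curve([20, 16, 10, 11, 14], [22, 41, 120, 89, 56])
print(round(a, 1), round(b, 4))
```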

Example 4
Fit a curve of the form $y = ae^{bx}$ to the following data:
x 1 3 5 7 9
y 115 105 95 85 80

Solution
$y = ae^{bx}$
Taking logarithm on both the sides,
$\log_e y = \log_e a + bx \log_e e = \log_e a + bx$
Putting $\log_e y = Y$, $\log_e a = A$, $b = B$ and $x = X$,
$Y = A + BX$
The normal equations are
$\sum Y = nA + B\sum X$  …(1)
$\sum XY = A\sum X + B\sum X^2$  …(2)
Here, n = 5

x      y      X      Y         $X^2$    XY
1      115    1      4.7449    1        4.7449
3      105    3      4.6539    9        13.9617
5      95     5      4.5539    25       22.7695
7      85     7      4.4427    49       31.0989
9      80     9      4.3820    81       39.438
$\sum X = 25$   $\sum Y = 22.7774$   $\sum X^2 = 165$   $\sum XY = 112.013$

Substituting these values in Eqs (1) and (2),
22.7774 = 5A + 25B  …(3)
112.013 = 25A + 165B  …(4)
Solving Eqs (3) and (4),
A = 4.7897
B = −0.0469
$\log_e a = A$
$\log_e a = 4.7897$
a = 120.2653
and b = B = −0.0469
Hence, the required equation of the curve is
$y = 120.2653 e^{-0.0469x}$

Example 5
Fit the exponential curve $y = ae^{bx}$ to the following data:
x 0 2 4 6 8
y 150 63 28 12 5.6

 [Summer 2015]
Solution
$y = ae^{bx}$
Taking logarithms on both sides,
$\log_e y = \log_e a + bx \log_e e = \log_e a + bx$
Putting $\log_e y = Y$, $\log_e a = A$, $b = B$ and $x = X$,
$Y = A + BX$
The normal equations are
$\sum Y = nA + B \sum X$ ...(1)

$\sum XY = A \sum X + B \sum X^2$ ...(2)

Here, n = 5
$x$ $y$ $X = x$ $Y = \log_e y$ $X^2$ $XY$
0 150 0 5.0106 0 0
2 63 2 4.1431 4 8.2862
4 28 4 3.3322 16 13.3288
6 12 6 2.4849 36 14.9094
8 5.6 8 1.7228 64 13.7824

$\sum X = 20$, $\sum Y = 16.6936$, $\sum X^2 = 120$, $\sum XY = 50.3068$

Substituting these values in Eqs (1) and (2),


16.6936 = 5 A + 20 B ...(3)
50.3068 = 20 A + 120 B ...(4)
Solving Eqs (3) and (4),
$A = 4.9855$
$B = -0.4117$
$\log_e a = A = 4.9855$
$a = e^{4.9855} = 146.28$
and $b = B = -0.4117$
Hence, the required equation of the curve is
$y = 146.28\, e^{-0.4117x}$
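
The step "Solving Eqs (3) and (4)" can be written out by elimination. The following worked sketch uses only the numbers in Eqs (3) and (4), with results rounded to four decimal places.

```latex
\begin{aligned}
4 \times (3):\quad & 66.7744 = 20A + 80B \\
(4) - 4 \times (3):\quad & 50.3068 - 66.7744 = 40B \\
& B = \frac{-16.4676}{40} = -0.4117 \\
\text{From (3)}:\quad & 5A = 16.6936 - 20(-0.4117) = 24.9276 \\
& A = \frac{24.9276}{5} = 4.9855
\end{aligned}
```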

Example 6
The pressure and volume of a gas are related by the equation $PV^\gamma = c$.
Fit this curve to the following data:
P 0.5 1.0 1.5 2.0 2.5 3.0

V 1.62 1.00 0.75 0.62 0.52 0.46

Solution
$PV^\gamma = c$

Taking logarithms on both sides,

$\log_e P + \gamma \log_e V = \log_e c$
$\log_e V = \frac{1}{\gamma} \log_e c - \frac{1}{\gamma} \log_e P$

Putting $\log_e V = y$, $\frac{1}{\gamma} \log_e c = a$, $\log_e P = x$, $-\frac{1}{\gamma} = b$,
$y = a + bx$

The normal equations are

$\sum y = na + b \sum x$ ...(1)
$\sum xy = a \sum x + b \sum x^2$ ...(2)
Here, n = 6

$P$ $V$ $x = \log_e P$ $y = \log_e V$ $x^2$ $xy$
0.5 1.62 -0.6931 0.4824 0.4804 -0.3343
1.0 1.00 0 0 0 0
1.5 0.75 0.4055 -0.2877 0.1644 -0.1166
2.0 0.62 0.6931 -0.4780 0.4804 -0.3313
2.5 0.52 0.9163 -0.6539 0.8396 -0.5992
3.0 0.46 1.0986 -0.7765 1.2069 -0.8531

$\sum x = 2.4204$, $\sum y = -1.7137$, $\sum x^2 = 3.1717$, $\sum xy = -2.2345$

Substituting these values in Eqs (1) and (2),


–1.7137 = 6a + 2.4204 b ...(3)
–2.2345 = 2.4204a + 3.1717 b ...(4)
Solving Eqs (3) and (4),
$a = -0.002$
$b = -0.7029$
$-\frac{1}{\gamma} = b$
$\gamma = 1.4227$
$\frac{1}{\gamma} \log_e c = a$
$\frac{1}{1.4227} \log_e c = -0.002$
$c = 0.9972$
Hence, the required equation of the curve is
$PV^{1.4227} = 0.9972$
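
The same substitution can be sketched in code. This is an illustrative check rather than the book's procedure: `fit_gas_law` is a hypothetical name, and the function follows the solution above in treating $\log_e V$ as the dependent variable, recovering $\gamma = -1/b$ and $c = e^{\gamma a}$ from the fitted line.

```python
import math

def fit_gas_law(P, V):
    """Fit P * V**gamma = c via log V = a + b*log P, where b = -1/gamma and a = log(c)/gamma."""
    n = len(P)
    x = [math.log(p) for p in P]
    y = [math.log(v) for v in V]
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    sxy = sum(u * v for u, v in zip(x, y))
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    gamma = -1.0 / b
    c = math.exp(gamma * a)
    return gamma, c

P = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
V = [1.62, 1.00, 0.75, 0.62, 0.52, 0.46]
gamma, c = fit_gas_law(P, V)
print(f"gamma = {gamma:.4f}, c = {c:.4f}")
```

Regressing $\log_e V$ on $\log_e P$ is a modelling choice; regressing $\log_e P$ on $\log_e V$ instead would generally give a somewhat different estimate of $\gamma$.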

Exercise 7.3
1. Fit the curve $y = ab^x$ to the following data:

x 2 3 4 5 6
y 144 172.3 207.4 248.8 298.5
[Ans.: $y = 100(1.2)^x$]

2. Fit the curve $y = ae^{bx}$ to the following data:

x 0 2 4
y 5.012 10 31.62
[Ans.: $y = 4.642e^{0.46x}$]
3. Fit the curve $y = ax^b$ to the following data:

x 1 2 3 4
y 2.50 8.00 19.00 50.00
[Ans.: $y = 2.227x^{2.09}$]
4. Estimate $\gamma$ by fitting the ideal gas law $PV^\gamma = c$ to the following data:
P 16.6 39.7 78.5 115.5 195.3 546.1
V 50 30 20 15 10 5
[Ans.: $\gamma = 1.504$]
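
The form $y = ab^x$ in Exercise 1 is not worked in the examples of this section, but it linearises in the same way: $\log_e y = \log_e a + x \log_e b$, so $A = \log_e a$ and $B = \log_e b$. The sketch below is an assumption-labelled illustration (the name `fit_ab_power_x` is hypothetical) that can be used to check an answer to that exercise.

```python
import math

def fit_ab_power_x(x, y):
    """Fit y = a * b**x by least squares on (x, log y): log y = log a + x*log b."""
    n = len(x)
    Y = [math.log(v) for v in y]
    sx, sy = sum(x), sum(Y)
    sxx = sum(v * v for v in x)
    sxy = sum(u * v for u, v in zip(x, Y))
    B = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    A = (sy - B * sx) / n
    return math.exp(A), math.exp(B)  # a = e^A, b = e^B

x = [2, 3, 4, 5, 6]
y = [144, 172.3, 207.4, 248.8, 298.5]
a, b = fit_ab_power_x(x, y)
print(f"a = {a:.2f}, b = {b:.3f}")
```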
Appendix
Standard Normal Distribution Table

Z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879
0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549
0.7 0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.8 0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133
0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706
1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767
2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817
2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857
2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890
2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916
2.4 0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936
2.5 0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952
2.6 0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964
2.7 0.4965 0.4966 0.4967 0.4968 0.4969 0.4970 0.4971 0.4972 0.4973 0.4974
2.8 0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.4980 0.4981
2.9 0.4981 0.4982 0.4982 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986
3.0 0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.4990 0.4990
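
The entries above start at 0.0000 for $z = 0$ and approach 0.5, which is consistent with each entry being the area under the standard normal curve between 0 and $z$. Under that reading, an entry can be reproduced from the error function in Python's standard library. This is a minimal sketch; the function name `area_zero_to_z` is illustrative.

```python
import math

def area_zero_to_z(z):
    """Area under the standard normal curve between 0 and z (for z >= 0)."""
    return 0.5 * math.erf(z / math.sqrt(2.0))

for z in (1.00, 1.27, 1.96):
    print(f"z = {z:.2f}: area = {area_zero_to_z(z):.4f}")
```

To look up a value by hand, take the row for the first decimal of $z$ and the column for the second decimal; for example, $z = 1.96$ is row 1.9, column 0.06.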

t-Distribution Table

Significance level = $\alpha$. The first column gives the degrees of freedom ($\nu$).
1-tail $\alpha$: .005 .01 .025 .05 .10 .25
2-tails $\alpha$: .01 .02 .05 .10 .20 .50
1 63.657 31.821 12.706 6.314 3.078 1.000
2 9.925 6.965 4.303 2.920 1.886 0.816
3 5.841 4.541 3.182 2.353 1.638 0.765
4 4.604 3.747 2.776 2.132 1.533 0.741
5 4.032 3.365 2.571 2.015 1.476 0.727
6 3.707 3.143 2.447 1.943 1.440 0.718
7 3.500 2.998 2.365 1.895 1.415 0.711
8 3.355 2.896 2.306 1.860 1.397 0.706
9 3.250 2.821 2.262 1.833 1.383 0.703
10 3.169 2.764 2.228 1.812 1.372 0.700
11 3.106 2.718 2.201 1.796 1.363 0.697
12 3.054 2.681 2.179 1.782 1.356 0.696
13 3.012 2.650 2.160 1.771 1.350 0.694
14 2.977 2.625 2.145 1.761 1.345 0.692
15 2.947 2.602 2.132 1.753 1.341 0.691
16 2.921 2.584 2.120 1.746 1.337 0.690
17 2.898 2.567 2.110 1.740 1.333 0.689
18 2.878 2.552 2.101 1.734 1.330 0.688
19 2.861 2.540 2.093 1.729 1.328 0.688
20 2.845 2.528 2.086 1.725 1.325 0.687
21 2.831 2.518 2.080 1.721 1.323 0.686
22 2.819 2.508 2.074 1.717 1.321 0.686
23 2.807 2.500 2.069 1.714 1.320 0.685
24 2.797 2.492 2.064 1.711 1.318 0.685
25 2.787 2.485 2.060 1.708 1.316 0.684
26 2.779 2.479 2.056 1.706 1.315 0.684
27 2.771 2.473 2.052 1.703 1.314 0.684
28 2.763 2.467 2.048 1.701 1.313 0.683
29 2.756 2.462 2.045 1.699 1.311 0.683
Large 2.575 2.327 1.960 1.645 1.282 0.675

Chi-Square Distribution Table

n 0.995 0.990 0.975 0.950 0.900 0.10 0.05 0.025 0.010 0.005
1 0.000039 0.00016 0.00098 0.0039 0.0158 2.71 3.84 5.02 6.63 7.88
2 0.0100 0.0201 0.0506 0.103 0.211 4.61 5.99 7.38 9.21 10.60
3 0.0717 0.115 0.216 0.352 0.584 6.25 7.81 9.35 11.34 12.84
4 0.207 0.297 0.484 0.711 1.06 7.78 9.49 11.14 13.28 14.86
5 0.412 0.554 0.831 1.15 1.61 9.24 11.07 12.83 15.09 16.75
6 0.676 0.872 1.24 1.64 2.20 10.64 12.59 14.45 16.81 18.55
7 0.989 1.24 1.69 2.17 2.83 12.02 14.07 16.01 18.48 20.28
8 1.34 1.65 2.18 2.73 3.49 13.36 15.51 17.53 20.09 21.96
9 1.73 2.09 2.70 3.33 4.17 14.68 16.92 19.02 21.67 23.59
10 2.16 2.56 3.25 3.94 4.87 15.99 18.31 20.48 23.21 25.19
11 2.60 3.05 3.82 4.57 5.58 17.28 19.68 21.92 24.73 26.76
12 3.07 3.57 4.40 5.23 6.30 18.55 21.03 23.34 26.22 28.30
13 3.57 4.11 5.01 5.89 7.04 19.81 22.36 24.74 27.69 29.82
14 4.07 4.66 5.63 6.57 7.79 21.06 23.68 26.12 29.14 31.32
15 4.60 5.23 6.26 7.26 8.55 22.31 25.00 27.49 30.58 32.80
16 5.14 5.81 6.91 7.96 9.31 23.54 26.30 28.85 32.00 34.27
17 5.70 6.41 7.56 8.67 10.08 24.77 27.59 30.19 33.41 35.72
18 6.26 7.01 8.23 9.39 10.86 25.99 28.87 31.53 34.81 37.16
19 6.84 7.63 8.91 10.12 11.65 27.20 30.14 32.85 36.19 38.58
20 7.43 8.26 9.59 10.85 12.44 28.41 31.41 34.17 37.57 40.00
21 8.03 8.90 10.28 11.59 13.24 29.62 32.67 35.48 38.93 41.40
22 8.64 9.54 10.98 12.34 14.04 30.81 33.92 36.78 40.29 42.80
23 9.26 10.20 11.69 13.09 14.85 32.01 35.17 38.08 41.64 44.18
24 9.89 10.86 12.40 13.85 15.66 33.20 36.42 39.36 42.98 45.56
25 10.52 11.52 13.12 14.61 16.47 34.38 37.65 40.65 44.31 46.93
26 11.16 12.20 13.84 15.38 17.29 35.56 38.88 41.92 45.64 48.29
27 11.81 12.88 14.57 16.15 18.11 36.74 40.11 43.19 46.96 49.64
28 12.46 13.56 15.31 16.93 18.94 37.92 41.34 44.46 48.28 50.99
29 13.12 14.26 16.05 17.71 19.77 39.09 42.56 45.72 49.59 52.34
30 13.79 14.95 16.79 18.49 20.60 40.26 43.77 46.98 50.89 53.67
40 20.71 22.16 24.43 26.51 29.05 51.81 55.76 59.34 63.69 66.77
60 35.53 37.48 40.48 43.19 46.46 74.40 79.08 83.30 88.38 91.95
120 83.85 86.92 91.58 95.70 100.62 140.23 146.57 152.21 158.95 163.65

F-Distribution Table

$\nu_2 \backslash \nu_1$ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 16
1 161 200 216 225 230 234 237 239 241 242 243 244 245 245 246

2 18.5 19.0 19.2 19.2 19.3 19.3 19.4 19.4 19.4 19.4 19.4 19.4 19.4 19.4 19.4
3 10.1 9.55 9.28 9.12 9.01 8.94 8.89 8.85 8.81 8.79 8.76 8.74 8.73 8.71 8.69
4 7.71 6.94 6.59 6.39 6.26 6.16 6.09 6.04 6.00 5.96 5.94 5.91 5.89 5.87 5.84
5 6.61 5.79 5.41 5.19 5.05 4.95 4.88 4.82 4.77 4.73 4.70 4.68 4.66 4.64 4.60
6 5.99 5.14 4.76 4.53 4.39 4.28 4.21 4.15 4.10 4.06 4.03 4.00 3.98 3.96 3.92
7 5.59 4.74 4.35 4.12 3.97 3.87 3.79 3.73 3.68 3.64 3.60 3.57 3.55 3.53 3.49
8 5.32 4.46 4.07 3.84 3.69 3.58 3.50 3.44 3.39 3.35 3.31 3.28 3.26 3.24 3.20
9 5.12 4.26 3.86 3.63 3.48 3.37 3.29 3.23 3.18 3.14 3.10 3.07 3.05 3.03 2.99
10 4.96 4.10 3.71 3.48 3.33 3.22 3.14 3.07 3.02 2.98 2.94 2.91 2.89 2.86 2.83
11 4.84 3.98 3.59 3.36 3.20 3.09 3.01 2.95 2.90 2.85 2.82 2.79 2.76 2.74 2.70
12 4.75 3.89 3.49 3.25 3.11 3.00 2.91 2.85 2.80 2.75 2.72 2.69 2.66 2.64 2.60
13 4.67 3.81 3.41 3.18 3.03 2.92 2.83 2.77 2.71 2.67 2.63 2.60 2.58 2.55 2.51
14 4.60 3.74 3.35 3.11 2.96 2.85 2.76 2.70 2.65 2.60 2.57 2.53 2.51 2.48 2.44
16 4.49 3.63 3.24 3.01 2.85 2.74 2.66 2.59 2.54 2.49 2.46 2.42 2.40 2.37 2.33
18 4.41 3.55 3.16 2.93 2.77 2.66 2.58 2.51 2.46 2.41 2.37 2.34 2.31 2.29 2.25
20 4.35 3.49 3.10 2.87 2.71 2.60 2.51 2.45 2.39 2.35 2.31 2.28 2.25 2.22 2.18
22 4.30 3.44 3.05 2.82 2.66 2.55 2.46 2.40 2.34 2.30 2.26 2.23 2.20 2.17 2.13
24 4.26 3.40 3.01 2.78 2.62 2.51 2.42 2.36 2.30 2.25 2.21 2.18 2.15 2.13 2.09
26 4.23 3.37 2.98 2.74 2.59 2.47 2.39 2.32 2.27 2.22 2.18 2.15 2.12 2.09 2.05
28 4.20 3.34 2.95 2.71 2.56 2.45 2.36 2.29 2.24 2.19 2.15 2.12 2.09 2.06 2.02
30 4.17 3.32 2.92 2.69 2.53 2.42 2.33 2.27 2.21 2.16 2.13 2.09 2.06 2.04 1.99
40 4.08 3.23 2.84 2.61 2.45 2.34 2.25 2.18 2.12 2.08 2.04 2.00 1.97 1.95 1.90
50 4.03 3.18 2.79 2.56 2.40 2.29 2.20 2.13 2.07 2.03 1.99 1.95 1.92 1.89 1.85
60 4.00 3.15 2.76 2.53 2.37 2.25 2.17 2.10 2.04 1.99 1.95 1.92 1.89 1.86 1.82
80 3.96 3.11 2.72 2.49 2.33 2.21 2.13 2.06 2.00 1.95 1.91 1.88 1.84 1.82 1.77
100 3.94 3.09 2.70 2.46 2.31 2.19 2.10 2.03 1.97 1.93 1.89 1.85 1.82 1.79 1.75
200 3.89 3.04 2.65 2.42 2.26 2.14 2.06 1.98 1.93 1.88 1.84 1.80 1.77 1.74 1.69
500 3.86 3.01 2.62 2.39 2.23 2.12 2.03 1.96 1.90 1.85 1.81 1.77 1.74 1.71 1.66
$\infty$ 3.84 3.00 2.60 2.37 2.21 2.10 2.01 1.94 1.88 1.83 1.79 1.75 1.72 1.69 1.64
