INTRODUCTORY
PROBABILITY THEORY
A FIRST COURSE IN PROBABILITY
THEORY – VOLUME I
Introductory Probability Theory: A First Course in Probability Theory – Volume I
2nd edition
© 2018 Nicholas N.N. Nsowah-Nuamah & bookboon.com
ISBN 978-87-403-1969-9
Peer review by Prof. Bashiru I.I. Saeed
CONTENTS
Preface to the second edition 8
Acknowledgement 11
Greek alphabets 13
1 Set Theory 16
1.1 Introduction 16
1.2 Sets 17
1.3 Set Operations 28
1.4 Classes Of Sets 33
1.5 Laws Of Algebra Of Sets 35
1.6 Venn Diagrams 38
2 Counting principles 46
2.1 Introduction 46
2.2 Sampling with or without replacement 46
2.3 Addition and multiplication principles of counting 47
2.4 Permutations: ordered selection 52
2.5 Combinations: unordered selection 58
2.6 Binomial theorem 65
2.7 Multinomial theorem 76
Bibliography 287
Endnotes 292
Index 294
DEDICATION
To all students, both past and present, in the Department of Statistics and Institute of
Statistical, Social and Economic Research (I.S.S.E.R) of the University of Ghana, without
whose help and encouragement this book would never have been a reality.
PREFACE TO THE SECOND EDITION
Volume I deals with the basic mathematical tools for the understanding of probability,
basic probability concepts, probability calculus, laws and theorems in probability, random
variables and probability distributions. Determination of central and non-central location
of distributions as well as their spread are extensively discussed. The volume concludes with
discussion of moments and moment-generating functions.
The extension of univariate random variable to multivariate random variables with emphasis
on Bivariate Distributions, and also Statistical Inequalities, Limit Laws and Sampling
Distributions are in another e-book entitled “Advanced Topics in Basic Probability Theory”.
While the typesetting was done by myself, the images were produced by Mr Joshua
Appiah, a former staff member of mine at Kumasi Technical University, where I was
Interim Vice-Chancellor.
N.N.N. Nsowah-Nuamah
Accra, Ghana
August, 2017
PREFACE TO THE FIRST EDITION
This text has been written in a systematic way such that the student needs only to be
introduced to the topic to understand the specified chapter. In view of this, the rigour of
other books has been avoided. I have also tried very hard to use notation that is at once
consistent and intuitive. The only mathematical prerequisite for this Volume is knowledge
of an elementary course in calculus. An exposure to differentiation, integration, series and some
ideas of convergence should be sufficient. Within the chapters, important concepts are set
out as definitions (enclosed in double boxes) while the main theoretical results of concepts
and meanings are set out as theorems (enclosed in single boxes). Results which follow almost
immediately from theorems with very little additional argument are stated as corollaries. Even
though the book does not provide a formal rigorous treatment, at times we adopt a “theorem-
proof ” format. Proofs have been provided to almost all theorems and corollaries stated. We
implore the reader to clearly understand the definitions and implications of the theorems
and corollaries although it is not always necessary to understand the proof of each theorem.
Many instructors and students will prefer to skim through these proofs, especially in a first
course, although we hope that most will choose not to omit them completely. The notes, a
common feature of the book, are an important part in three respects. Firstly, they illuminate
points made in the text. Secondly, they present material that does not quite fit anywhere else
and thirdly, they discuss details whose exposition would disturb the flow of the argument.
In this Volume more than 200 worked examples have been provided and it is essential that
the student goes through all of them. Probability theory requires a lot of practice from
the student. I have therefore provided a set of questions at the end of each chapter. These
exercises, numbering more than 300, form an integral part of the text, and it is essential
that the student works through many of them.
The Undergraduate programmes in Statistics may vary among countries and even among
departments in the same country. Besides, the ability of students to grasp probability theory
may vary from year to year. In view of this, there is purposely more material here than
may be required. This will give the instructor the opportunity to choose topics to satisfy
his/her own needs and taste.
Now a brief tour through this Volume. In Chapter 1 a brief account of Set Theory is given.
Chapter 2 describes a number of counting tricks that are very useful in solving probability
problems. These first two chapters do not really discuss probability at all. Instead, they
present the essential mathematical tools that are needed in the rest of the book. Although
you may have dealt with these concepts in high school, it is important that these two
chapters be utilised (at least by the independent reader) to assure a sound base for the
applications in probability. Together with this, a detailed treatment of basic concepts in
probability theory, such as experiments, sample spaces, and events in Chapter 3 make the
book self-contained both as a reference and a text. The materials in Chapters 4–12 form
the heart of any introductory course in probability theory. Chapters 4, 5 and 6 give the
basic theory of probability while Chapters 7 to 12 treat special probability distributions.
We shall go through the basic content of Volume II. Chapters 1–4 extend the concept of
a univariate random variable to multivariate random variables, even though emphasis is on
Bivariate Distributions. Chapter 1 discusses the Joint, Marginal and Conditional Probability
Distributions, Joint Cumulative Distribution Functions and Independence of Bivariate
Random Variables. Chapter 2 is on Sums, Differences, Products and Quotients of Bivariate
Random Variables. Chapter 3 treats Expectation and Variance of Random Variables and
their properties. Chapter 4 discusses various forms of Moments of bivariate distributions,
and goes on to discuss Covariance, Correlation, Conditional Expectations and Regression
Curves. Chapter 5 is on Statistical Inequalities (Markov’s and Chebyshev’s Inequalities in
their various forms) and Limit Laws (the Law of Large Numbers and the Central Limit
Theorem, also in their various forms). Sampling Distributions are discussed in Chapters
6 (Basic Concepts) and 7 (Sampling Distributions of a Statistic). We conclude Volume II
with the Distributions derived from the Normal Distribution, namely, the χ2, the t and
the F Distributions.
We believe that the two Volumes of the book may also be used as a revision text for students
starting their Master’s programme.
N.N.N. Nsowah-Nuamah
Accra, Ghana
November, 1997
ACKNOWLEDGEMENT
First and foremost, I express my sincere thanks to the Almighty God both for the gift of
writing He has endowed me with and for the stamina I had to complete this particular book.
It is possible that a book as detailed and technical as this one may not be free from errors.
If there are only a few, it is because I had the assistance of a great many people in reading
the preliminary drafts.
I owe special thanks to two anonymous reviewers who spent substantial time reviewing the
manuscript and who provided numerous helpful comments. On the quality of this text, I
owe so much to the Swedish Agency for Research Cooperation with Developing Countries
(SAREC) for financial support during my visit at the International Centre for Theoretical
Physics. I am also very grateful to UNESCO, the International Atomic Energy Agency and
the International Center for Theoretical Physics in two respects. Firstly, they provided me
conducive conditions including an office space, and a good library and computer facilities
which accorded me the opportunity to reassess the draft of the manuscript and prepare
the camera-ready copy for the publishers. Secondly, they provided an opportunity for me
to meet other scientists who commented on the manuscript. Prominent among them was
Professor A.A.K. Majumdar of Mathematics Department, Jahangirnagar University in
Bangladesh, who critically read the manuscript at its final stage and made constructive
suggestions in both substance and style. Most of his and Dr. Atsem’s comments have given
this book its current shape. I am grateful to the University of Ghana for permission to use
examination questions from the Department of Statistics and I.S.S.E.R. I also acknowledge
that some of the questions in the exercises have been taken from some of the books listed
in the bibliography.
The book was typeset using LaTeX, whose authors I would like to thank for a superior
product. I did the typesetting myself but I do not know how I could have completed it
on time if David Mensah had not aided me. I say thank you to Prof. G.O.S Ekhaguere of
Ibadan University who showed me a few more tricks in LaTeX when he visited Ghana. I
also acknowledge Ato Kwamena Otu (who I call Leftie) for sketching some of the diagrams
in the text (especially the Normal curves) which I could not immediately use the computer
to draw.
And finally, I appreciate very much the care, cooperation and attention devoted to the
preparation of the book by the staff of Ghana Universities Press, especially the Director,
Mr. K.M. Ganu, and Mr. V.K. Boadu, who personally went through the scripts to check
for consistencies and typographical errors.
GREEK ALPHABETS
A α alpha N ν nu
B β beta Ξ ξ xi
Γ γ gamma O o omicron
∆ δ delta Π π pi
E ε epsilon P ρ rho
Z ζ zeta Σ σ sigma
H η eta T τ tau
Θ θ theta Υ υ upsilon
I ι iota Φ φ, ϕ phi
K κ kappa X χ chi
Λ λ lambda Ψ ψ psi
M µ mu Ω ω omega
LIST OF MATHEMATICAL NOTATIONS AND CONVENTIONS
= equal to
≠ not equal to
< is less than
≤ less than or equal to
> greater than
≥ greater than or equal to
≈ approximately equal to
∼ asymptotically equal to (or distributed as)
± plus or minus
∅ the empty (null) set
∈ is a member of (belongs to)
∉ is not a member of (does not belong to)
⊆ is a subset of (is contained in)
⊈ is not a subset of (is not contained in)
⊇ contains
⊉ does not contain
⊂ is a proper subset of
⊄ is not a proper subset of
⇒ implies
⇏ does not imply
⇔ equivalent to
∪ the union of
∩ the intersection of
Ac, Ā, A′ complement of A
A\B, A − B difference of A and B
A∆B symmetric difference of A and B
P(A) power set of A
n! n factorial
nCr, C(n, r) binomial coefficient (n combination r)
nPr, P(n, r) n permutation r
Σ sum of
Π product of
n(·), #(·) number of elements in
∞ infinity
P (A) probability of A
P (A|B) conditional probability of A given B
p(xi ) probability mass function
f (x) probability density function
F (x) cumulative distribution function
E(X) expectation of X
Var(X), σ² variance of X
σ standard deviation
Cov(X, Y ) covariance of X and Y
Corr(X, Y ) correlation of X and Y
MX (t) moment-generating function of X
a3 skewness
a4 kurtosis
b(x; n, p) Binomial probability distribution
B(x; n, p) Binomial distribution function
G(x; p) Geometric probability distribution
b− (x; n, p) Negative binomial probability distribution
p(x; λ) Poisson probability distribution
P (x; λ) Poisson distribution function
h(x; N, n, M ) Hypergeometric probability distribution
H(x; N, n, M ) Hypergeometric distribution function
M (xi ; n, pi ) Multinomial probability distribution
U (a, b) Uniform probability distribution
Exp(x; λ) Exponential probability distribution
Γ(α) Gamma function
Γx (α) Incomplete gamma function
Gamma(x; α, β) General Gamma probability distribution
B(α, β) Beta function
Bx (α, β) Incomplete beta function
Beta(x; α; β) Beta probability distribution
N (µ, σ) General Normal probability distribution
N (0, 1) Standard Normal probability distribution
→p converges in probability to
→a.s. converges almost surely to
→d converges in distribution to
w.p.1 converges with probability 1
1 SET THEORY
1.1 INTRODUCTION
In everyday language we often hear expressions such as a class of children, a fleet of cars, a
crate of soft drinks, etc. All these express the basic idea of a collection of objects with some
common properties. The idea of a collection of objects is the basic idea behind the notion
of sets. Set theory in itself has few direct applications in probability theory; rather, it plays
the same role in present-day probability as numbers did in earlier times.
a) It provides a sufficiently sensitive, precise, and flexible language for the accurate
expression of probability ideas.
b) It helps to clarify ideas that traditionally seem to be confused in probability.
In this chapter, we shall introduce some basic concepts in set theory that will be sufficient
for the understanding of the probability theory presented in this book. Standard symbols
for the more important sets of numbers are desirable and we introduce them first.
a) N = the set of natural (counting) numbers: 1, 2, 3, · · ·
b) Z = the set of integers: · · · , −2, −1, 0, 1, 2, · · ·
c) Q = the set of rational or fractional numbers p/q, where p and q are any integers
and q ≠ 0. Every rational number, when expressed as a decimal, will be either a
terminating1 or a repeating decimal2. An irrational number is a number that
cannot be written as a simple fraction: when written as a decimal, its decimal
part is non-terminating and non-repeating. Examples of irrational numbers
are √2 and π.
d) R = the set of real numbers (the collection of rational and irrational numbers).
1.2 SETS
1.2.1 DEFINITION OF SET
A set is a well-defined collection of distinct objects
“Objects” may be things, people or symbols. “Well-defined” means that there must be no
doubt whatsoever about whether or not a given item belongs to the set under consideration.
“Distinct” is used in the sense that no two identical objects should be contained in the
same set.
Example 1.1
Examples of sets include the following:
1.2.2 ELEMENTS
Example 1.2
If a list of students is compiled, each student is an element of the set of students.
A set may be described in three ways:
a) by definition;
b) by the roster method; and
c) by the property method.
The definition method of describing a set has been illustrated in Example 1.1.
The roster method (or sometimes referred to as the tabular form) specifies a set by actually
listing its elements. The simplest notation for a set is to list its members inside the curly
brackets {}. The braces are an essential part of the notation, since they identify the contents
as a set.
Example 1.3
A is the set of integers between zero and ten. The set A may be written as
A = {1, 2, 3, 4, 5, 6, 7, 8, 9}
or simply as
A = {1, 2, · · · , 9}
Note
a) The order in which the elements in a set are written does not matter. For example,
the set A = {1, 2, 3} is the same as the set B = {2, 1, 3}.
b) Each element of a set is listed only once.
Although the roster method is a common practice, the danger exists that the pattern the
reader sees may not be the one the writer had in mind. Also, sometimes, it is not possible
to use the roster method; for example, the set of all points in a square.
The property method (sometimes called the set-builder form or the set-generator notation)
specifies a set by stating properties which characterize the elements of the set. A property
of membership may be given in words. The following example illustrates the use of the
property method.
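The roster and property methods have direct analogues in programming: listing elements corresponds to a set literal, and a membership property corresponds to a set comprehension. A minimal sketch in Python, using the set of Example 1.3:

```python
# Roster method: list the elements explicitly inside braces.
A_roster = {1, 2, 3, 4, 5, 6, 7, 8, 9}

# Property method: build the set from a membership condition,
# "x is an integer strictly between zero and ten".
A_property = {x for x in range(1, 10)}

# Both descriptions specify the same set; the order of listing is irrelevant.
print(A_roster == A_property)  # True
```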
Example 1.4
The set A consists of integers between zero and ten. This may be written as: “the set of all
elements x such that x is an integer between zero and ten”.
In a more precise and compact form, this set may be described as
A = {x : x is an integer between zero and ten}
or
A = {x : x ∈ Z, 0 < x < 10}
Note
Sometimes, a vertical line “|” may also be used instead of the colon “:”. Thus, the set A in
Example 1.4 could be written as
A = {x | x ∈ Z, 0 < x < 10}
Example 1.5
a) If A = the set of the months in the year, then:
A set is countable if it is finite or if its elements can be arranged in the form of a sequence, in
which case it is said to be countably infinite or denumerable; otherwise, the set is uncountable
or non-denumerable.
Example 1.6
For the information in Example 1.5, determine the sets that are countable, countably infinite
and uncountable.
Solution
Example 1.5(a) is a countable set; Example 1.5(b) is a countable set; Example 1.5(c) is a
countably infinite set; Example 1.5(d) is an uncountable set.
Cardinality of a Set
The cardinality of a set is the number of elements in the set
A synonym for cardinality is cardinal number. The cardinality of, say, set A is denoted by
the symbol n(A) or |A| or sometimes #A, and read as “the number of elements in set A”.
Note
Certain infinite sets also have a cardinality, but we cannot simply use the symbol ∞
to denote the cardinality of any infinite set. The subject of cardinality of infinite sets is
important and interesting in its own right, but we will not pursue it further here.
Example 1.7
The set A = {1, 2, 7, 9} has cardinal number 4, because the set contains four elements,
hence, n(A) = 4.
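For finite sets, the cardinality n(A) corresponds to Python's `len`. The sketch below also illustrates the earlier note that each element of a set is listed only once:

```python
A = {1, 2, 7, 9}
print(len(A))   # n(A) = 4

# Duplicates collapse automatically, since a set contains distinct elements.
B = set([1, 2, 2, 3, 3, 3])
print(len(B))   # n(B) = 3, not 6
```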
Singleton Sets
A set which contains exactly one element is called a singleton set
Example 1.8
Consider the set {all months with fewer than 30 days}. This is {February}, a singleton
set. Other examples include {2}, {D} and {b}.
Empty Sets
The set which contains no element at all is called the empty set
A synonym for the empty set is the null set, and it is denoted by {} or ∅. Formally, we may
define the empty set as
∅ = {x | x ≠ x}
Example 1.9
a) The set of coins with three faces is an empty set since no coin has three faces.
b) The set of months with thirty-two days is an empty set since there is no such month.
Non-Empty Sets
A set which contains at least one element is called a non-empty set
Example 1.10
The sets {0}, {a}, {15, 6} and {∅} are all non-empty sets since each contains at least one element.
Equal Sets
Two sets which have exactly the same elements are called equal sets
Example 1.11
If
Unequal Sets
If A and B do not have exactly the same elements we say that A is not equal to B , written as
A ≠ B
Example 1.12
If
then A ≠ B because they do not contain exactly the same elements: 4 ∈ A but 4 ∉ B; also
5 ∈ B but 5 ∉ A.
Note
The set {0} is not an empty set because it has zero as its element.
Equivalent Sets
Two sets A and B are said to be equivalent if they contain the same number of elements
Equivalent sets are denoted by “⇔”; that is, if set A is equivalent to set B , we write A ⇔ B .
Example 1.13
If
It is important to remember that all equal sets are equivalent but not all equivalent sets
are equal.
Thus, in Example 1.11, the sets A and B are equal and, therefore, they are also equivalent
sets. However, in Example 1.13, because they contain the same number of elements the two
sets are equivalent but they are not equal, because they do not contain the same elements.
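The distinction between equal and equivalent sets can be checked mechanically: equality compares the elements themselves, while equivalence compares only the cardinalities. A short sketch with illustrative sets:

```python
A = {1, 2, 3}
B = {3, 1, 2}   # same elements in a different order: equal (hence also equivalent)
C = {4, 5, 6}   # same number of elements only: equivalent but not equal

print(A == B)             # True:  equal sets
print(A == C)             # False: not equal
print(len(A) == len(C))   # True:  equivalent (same cardinality)
```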
Note
The following statements all say the same thing:
Universal Set
In discussing sets in a particular context, all such sets may be viewed relative to some
particular set called the universal set.
The universal set is the set of all elements relevant to a particular problem or discussion
Synonyms for a universal set are the universe of discourse (or simply universe) or the
population set, and it is denoted as U or as E in some texts. In this text, we shall adopt the
former notation.
Example 1.14
Examples of universal sets include:
1.2.7 SUBSETS
If every element of a set A is also an element of a set B , then A is called a subset of B , written
A ⊆ B or B ⊇ A
Example 1.15
If
If at least one element of A does not belong to B , then A is not a subset of B , and in such
a case, we write
A ⊈ B or B ⊉ A
Example 1.16
If
Theorem 1.1
a) ∅ ⊆ A
b) A ⊆ A
c) If A ⊆ ∅ then A = ∅
d) If A ⊆ B and B ⊆ A then A = B
e) If A ⊆ B and B ⊆ C then A ⊆ C
In Theorem 1.1, part (a) implies that the null set is a subset of every set; part (b) means
that every set is a subset of itself and part (d) is another way of stating equality of two sets
A and B .
Example 1.17
Let A = {1, 2, 3, 4} and B = {1, 2, 3, 4}.
Then A ⊆ B and B ⊆ A . Hence A = B .
Example 1.18
Let A = {1, 2, 3, 4} and B = {1, 2, 3, 4, 5}.
Then A ⊆ B but A ≠ B since B ⊈ A.
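Parts of Theorem 1.1, and the situation of Examples 1.17 and 1.18, can be verified directly with Python's subset operators (`<=` for ⊆, `<` for proper subset):

```python
A = {1, 2, 3, 4}
B = {1, 2, 3, 4, 5}
empty = set()

# Theorem 1.1(a): the null set is a subset of every set.
print(empty <= A)              # True
# Theorem 1.1(b): every set is a subset of itself.
print(A <= A)                  # True
# Example 1.18: A is a subset of B, but B is not a subset of A,
# so A and B are not equal; in fact A is a proper subset of B.
print(A <= B, B <= A, A < B)   # True False True
```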
Proper Subsets
If every element in A is an element in B , and also B has at least one other element which is
not in A , then A is called a proper subset of B .
A⊂B or B ⊃ A
Example 1.19
Let
then A is a proper subset of B . This is because, apart from the fact that all the elements in
A are in B , there is at least one element in B , namely 4, that is not in A ; that is, B ⊈ A.
Example 1.15 also shows a proper subset.
Improper Subsets
If every element of set A is an element of set B , but B does not have any other elements that
are not in A , then set A is said to be an improper subset of set B
Example 1.20
Let
Note
a) Some authors use the symbol ⊂ for both proper and improper subsets.
b) If we know only that A is a subset of B but do not know if it is proper or improper,
then we write A ⊆ B .
c) We should not confuse the symbol “ ⊂ ”, which stands for “is a proper subset of ”
with “∈” which means “is an element of ”.
Example 1.21
1.3 SET OPERATIONS
The union of A and B , denoted by A ∪ B , is the set of all elements which belong to either A
or B or both.
A ∪ B = {x|x ∈ A or x ∈ B}
Example 1.22
Let
Find A ∪ B .
Solution
The union of A and B is:
A ∪ B = {1, 2, 3, 4, 5, 6, 7, 8, 9}
and
A1 ∪ A2 ∪ · · · = ⋃_{i=1}^{∞} Ai
The intersection of two sets A and B , denoted by A ∩ B, is the set of all elements which belong
to both A and B
A ∩ B = {x|x ∈ A and x ∈ B}
Example 1.23
Refer to Example 1.22. Find A ∩ B.
Solution
The only element that belongs to both A and B is 3 . Hence
A ∩ B = {3}
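Union and intersection correspond to Python's `|` and `&` operators. The sets below are an illustrative choice (the sets of Example 1.22 were lost in this copy), picked so that the union is {1, ..., 9} and the intersection is {3}:

```python
A = {1, 2, 3, 4, 5}
B = {3, 6, 7, 8, 9}

print(A | B)   # union: every element in A or B or both
print(A & B)   # intersection: elements common to A and B

# Two sets with no elements in common are disjoint: empty intersection.
C = {10, 11}
print(A & C == set())   # True
```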
If two sets A and B have no elements in common, that is, if A∩B = ∅ , then A and B are said
to be disjoint
Example 1.24
If
then
A∩B =∅
The complement of A , denoted by Ac , is the set of all elements in the universal set which are
not in A
In symbol, we have
Ac = {x | x ∈ U, x ∉ A}
Note
Ac may also be denoted as Ā or A′.
Example 1.25
Let U = {1, 2, 3, · · · , 10} and A = {1, 2, 3, 4, 5}
Find Ac .
Solution
Ac is the set of all elements in U that are not in A , so that,
Ac = {6, 7, 8, 9, 10}
Complement of B Relative to A
If B ⊆ A , then the set consisting of all elements in A which do not belong to B is called the
complement of B relative to A and is denoted by B^c_A
Note
B^c_A and B are disjoint, that is, B^c_A ∩ B = ∅
Example 1.26
Let
A = {1, 2, 3, 4, 5}
B = {1, 2, 3, 6, 7}
Then
B^c_A = {4, 5}
B^c_A ∩ B = ∅
The set consisting of all elements in A which do not belong to B is called the difference of A
and B , denoted by A − B or A\B
In symbol,
A\B = {x | x ∈ A, x ∉ B} = {x | x ∈ A, x ∈ Bc} = A ∩ Bc
Example 1.27
Let A = {1, 2, 3, 4, 5} and B = {1, 2, 3}
Find A\B .
Solution
A\B consists of the elements in A that are not in B , that is,
A\B = {4, 5}
Note
a) If A ≠ B , then
A − B ≠ B − A
If A = B , then
A − B = B − A = ∅
b) U\A = Ac
The symmetric difference of the sets A and B , denoted by AΔB , is the set of elements which
belong to exactly one of the two sets
In symbol
AΔB = (A ∩ B c ) ∪ (Ac ∩ B)
= (A\B) ∪ (B\A)
= (A ∪ B)\(A ∩ B)
Example 1.28
Refer to Example 1.26. Find AΔB .
Solution
A ∪ B = {1, 2, 3, 4, 5, 6, 7}
A ∩ B = {1, 2, 3}
Hence
AΔB = (A ∪ B)\(A ∩ B) = {4, 5, 6, 7}
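The three equivalent forms of the symmetric difference can be checked with Python's set operators, using the sets of Example 1.26 and an assumed universal set U (needed only to form the complements):

```python
A = {1, 2, 3, 4, 5}
B = {1, 2, 3, 6, 7}
U = set(range(1, 11))         # an assumed universal set containing A and B

Ac, Bc = U - A, U - B         # complements relative to U

form1 = (A & Bc) | (Ac & B)   # (A ∩ Bc) ∪ (Ac ∩ B)
form2 = (A - B) | (B - A)     # (A\B) ∪ (B\A)
form3 = (A | B) - (A & B)     # (A ∪ B)\(A ∩ B)

# Python's ^ operator computes the symmetric difference directly.
print(form1 == form2 == form3 == (A ^ B))   # True: all four forms agree
print(sorted(form1))                        # [4, 5, 6, 7]
```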
Example 1.29
What are the members of the following class of sets:
Solution
The members of class A are the sets
Power Set
The power set is the set of all possible subsets of an original set
Example 1.30
Let A = {a, b}. Find the power set of A .
Solution
P(A) = {{a, b}, {a}, {b}, ∅}
Example 1.31
A = {1, 2, 3} . Find the power set of A .
Solution
P(A) = {∅, {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3}}
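The power set can be generated programmatically by collecting the subsets of every size from 0 up to n(A); a set with n elements has 2^n subsets. A sketch using `itertools.combinations`:

```python
from itertools import combinations

def power_set(s):
    """Return all subsets of s, each as a frozenset."""
    elems = list(s)
    return {frozenset(c)
            for r in range(len(elems) + 1)
            for c in combinations(elems, r)}

P = power_set({1, 2, 3})
print(len(P))                      # 8 = 2**3 subsets
print(frozenset() in P)            # True: the null set is a subset of every set
print(frozenset({1, 2, 3}) in P)   # True: the set is a subset of itself
```

Frozensets are used because ordinary sets are mutable and cannot themselves be elements of a set.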
Note
a) The set A is a subset of itself.
b) The null set ∅ is a subset of every set.
Partition of Sets
A partition of set A is a subdivision of A into non-empty subsets which are disjoint and whose
union is A
34
INTRODUCTORY
PROBABILITY THEORY SET THEORY
Cells
The non-empty subsets in a partition are called the cells of the partition.
Thus, the class of non-empty sets {P1 , P2 , · · · , Pn } forms a partition of the set A if and only
if the following two conditions hold:
a) Pi ∩ Pj = ∅, if i ≠ j
b) ⋃_{i=1}^{n} Pi = A
Example 1.32
Let A = {1, 2, 3, 4, 5, 6}. Which of the following form a partition of the set?
Solution
a) This is not a partition of A since 2 ∈ A and belongs to both {2, 4} and {2}.
b) This is a partition of the set A since each element of A belongs to exactly one cell.
c) This is not a partition of A since 4 ∈ A but 4 does not belong to any of the cells.
d) This is not a partition of A since 7 ∈ {4, 7} but 7 ∉ A.
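The two partition conditions translate directly into a checking function. The sketch below (the candidate partitions are illustrative, since the list in Example 1.32 was lost in this copy) tests non-emptiness, pairwise disjointness and coverage of A:

```python
def is_partition(cells, A):
    """Check the partition conditions: non-empty, pairwise disjoint cells
    whose union is A."""
    if any(len(cell) == 0 for cell in cells):
        return False
    union = set().union(*cells)
    # Cells are pairwise disjoint exactly when their sizes sum to the union's size.
    if sum(len(cell) for cell in cells) != len(union):
        return False
    return union == A

A = {1, 2, 3, 4, 5, 6}
print(is_partition([{1, 3, 5}, {2, 4, 6}], A))   # True: each element in exactly one cell
print(is_partition([{1, 2, 4}, {2, 3}], A))      # False: 2 lies in two cells; 5, 6 in none
print(is_partition([{1, 2}, {3, 4}], A))         # False: 5 and 6 lie in no cell
```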
1.5 LAWS OF ALGEBRA OF SETS
1. Idempotent A ∪ A = A A ∩ A = A
2. Commutative A ∪ B = B ∪ A A ∩ B = B ∩ A
3. Associative (A ∪ B) ∪ C = A ∪ (B ∪ C) (A ∩ B) ∩ C = A ∩ (B ∩ C)
4. Distributive A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C) A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
5. Identity A ∪ ∅ = A A ∩ U = A
6. Complement A ∪ Ac = U A ∩ Ac = ∅
7. De Morgan’s (A ∪ B)c = Ac ∩ Bc (A ∩ B)c = Ac ∪ Bc
Apart from the above seven laws, we have the following ones.
8. (Ac)c = A; Uc = ∅; ∅c = U
9. A = (A\B) ∪ (A ∩ B)
A ∪ B = (A\B) ∪ B
(A\B) ∩ B = ∅
(A\B) ∩ (A ∩ B) = ∅
Note
The null set ∅ is the identity for union while U is the identity for intersection in the sense
of Law 5 above.
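Since Venn diagrams are not accepted as proofs, identities such as Laws 8 and 9 can at least be spot-checked mechanically. The sketch below verifies them on arbitrarily chosen sample sets (a sanity check, not a proof):

```python
U = set(range(1, 11))   # an assumed universal set
A, B, C = {1, 2, 3, 4, 5}, {4, 5, 6, 7}, {2, 5, 8}

def comp(S):
    return U - S        # complement relative to U

# Law 8: (Ac)c = A, Uc is the null set, and the null set's complement is U.
assert comp(comp(A)) == A
assert comp(U) == set() and comp(set()) == U

# Law 9: A = (A\B) ∪ (A ∩ B), A ∪ B = (A\B) ∪ B, and the parts are disjoint.
assert A == (A - B) | (A & B)
assert A | B == (A - B) | B
assert (A - B) & B == set()
assert (A - B) & (A & B) == set()

# Distributive law: A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
assert A & (B | C) == (A & B) | (A & C)

print("all identities hold for these sample sets")
```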
Many of the laws of the algebra of sets can be generalized to arbitrary unions and intersections.
For example,
A1 ∪ A2 ∪ · · · ∪ An = ⋃_{i=1}^{n} Ai
and
A1 ∩ A2 ∩ · · · ∩ An = ⋂_{i=1}^{n} Ai
Formal proofs of these laws are lengthy. The reader is asked in Exercise 1.19 to illustrate these
statements with Venn diagrams, discussed in the next section. We must emphasise, however,
that Venn diagrams are slightly unreliable and are, therefore, not considered sufficient proof
of a theorem in set theory. The following theorem will be very useful in Chapter 4.
Theorem 1.2
If A ∩ B = ∅ then
(A ∩ C) ∩ (B ∩ C) = ∅
1.6 VENN DIAGRAMS
A Venn diagram is a geometrical representation of sets and is useful for visualising the
relationships among sets. A universal set U is represented geometrically by a rectangle, such
as the one shown below. All other sets are drawn as circles within the rectangle.
[Figure: set A drawn as a circle within the rectangle U; set A is shaded]
Subsets
Set A is a proper subset of set B
[Figure: circle A drawn inside circle B within U; A ⊂ B]
We shall now illustrate set operations using Venn diagrams.
Intersection
a) A ∩ B ≠ ∅
[Figure: overlapping circles A and B within U; A ∩ B is shaded]
b) A ∩ B = ∅
[Figure: disjoint circles A and B within U; nothing is shaded]
c) A ∩ B when A is a subset of B
[Figure: circle A inside circle B within U; A ∩ B (= A) is shaded]
Union
a) A ∪ B when A and B intersect.
[Figure: overlapping circles A and B within U; A ∪ B is shaded]
b) A ∪ B when A and B are disjoint.
[Figure: disjoint circles A and B within U; A ∪ B is shaded]
c) A ∪ B when A is a subset of B
[Figure: circle A inside circle B within U; A ∪ B (= B) is shaded]
Complement
a) Complement of A
[Figure: circle A within U; the region outside A, that is Ac, is shaded]
b) Complement of B relative to A
[Figure: B^c_A is shaded]
Difference of Sets
[Figure: overlapping circles A and B within U; A − B is shaded]
Symmetric Difference
[Figure: A ∆ B is shaded]
This chapter has reviewed the basic concepts of set theory that are crucial to probability
theory as presented in this book: the definition and description of sets of objects, operations
on sets to obtain other sets, special relationships among sets, and so on. These concepts
will help us simplify word problems in probability. As we shall see in later chapters, the
mathematics of probability is expressed most naturally in terms of sets.
EXERCISES
1.1 Let A be the set of all positive odd numbers less than 10 . Describe A by
(a) the roster method (b) the property method.
1.2 Let A be the set of real numbers that satisfy the following equation: 2x = 10. Show
how to describe A by
(a) the roster method (b) the property method.
1.8 In Exercise 1.6, which of the sets are equivalent as well as equal?
1.13 Let
Find
1.14 Draw a Venn diagram and shade the following when no set is a subset of the other.
(a) Ac (b) (A ∩ B)c (c) A ∩ B = Ac
(d) A ∪ B = ∅ (e) A ∪ B = A ∩ B
1.17 Show that (a) A − B = A ∩ Bc; (b) A − B ⊂ A ∪ B; (c) Ac − Bc = B ∩ Ac.
1.19 Use the Venn diagrams to illustrate the laws of the algebra of sets.
1.21 Find the power set P(A) of the set A = {a, b, c}.
1.22 Let A = {a, b, c, d}. Which of the following form a partition of A ?
(a) {{a, b}, {c}, {c, d}} (b) {{a}, {b, d}, {c}}
(c) {{a, b, c}, {d}} (d) {{a, c}, {d}, {b}}
(e) {{a, d}, {c}} (f ) {a, b, c, d}
A ∩ (B − C) = (A ∩ B) − (A ∩ C)
(A − B) ∪ (B − A) = (A ∪ B) − (A ∩ B)
2 COUNTING PRINCIPLES
2.1 INTRODUCTION
Counting is the basis of probability and statistics. Often the number of elements in a
particular set or the number of possible outcomes of a particular experiment, as we shall
see later, is not very large and so direct enumeration or counting is not difficult. However,
problems arise in probability where direct counting becomes a practical difficulty. For
example, if a team of five players, each with a specified position, is to be formed from ten
players, then each choice of five players (elements) from the ten players (the universal set) is
a 5-tuple. The total number of all possible such selections from the universal set would be
very difficult to list4. A more sophisticated way of counting such large numbers of outcomes
falls within the realm of a branch of mathematics referred to as combinatorics or combinatorial
analysis or combinatorial mathematics.
The problem of counting begins with drawing r objects5 of a specified type from a class
or group of n objects, a technique known as sampling. The number of ways in which this
can be done depends on whether the objects are drawn with or without replacement, and
on whether the order in which they are drawn matters.
Here in this text, we shall consider only cases having applications in probability problems.
Other aspects found in the analysis of algorithms, coding theory, medicine, quality control,
communications, agriculture, genetics, quantum mechanics, and so on, are treated in books
devoted to combinatorial theory.
If, after an object is drawn and a note is taken of it, it is put back into the population before
another object is drawn, then it is sampling with replacement.
In this case, the same object may appear several times in the sample.
If, after an object is drawn, it is put aside before the next one is drawn, until all the r objects
are drawn, it is sampling without replacement.
When drawing r objects one at a time, the order in which the objects appear in the sample
may or may not be important. We therefore have to distinguish between ordered and unordered samples.
Consider a population set (or simply the population) of n objects, a1 , a2 , · · · , an . We
define an ordered sample of size r drawn from this population as any ordered arrangement
aj1 , aj2 , · · · , ajr of r objects. In combinatorics, whether order matters is the principal factor
that distinguishes permutations and combinations, the two main techniques for counting.
If order is important in the selection of the r objects, it is a permutation problem and
if not, a combination problem. These two main techniques for counting are both in turn
based on a simple concept known as the principle of multiplication which we discuss in the
next section.
Example 2.1
A class of students consists of 6 Ghanaians, 8 Nigerians and 2 Italians. In how many ways
can a Ghanaian or an Italian be drawn from the class if none of them has dual citizenship?
Solution
Let A denote the set of students who are Ghanaians or Italians, B1 the set of Ghanaians
and B2 the set of Italians. Then B1 and B2 are disjoint, since no student can be a Ghanaian
and an Italian at the same time. Hence, by the addition principle,
n(A) = n(B1) + n(B2) = 6 + 2 = 8
Whenever we have a single action to perform that must satisfy one condition or another,
where the conditions cannot hold together, we normally use the addition principle.
The inclusion-exclusion principle tells us that when we add together the number of elements
in A and the number of elements in B , the elements in A ∩ B are counted twice. To find
n(A ∪ B), we must add n(A) to n(B) and subtract n(A ∩ B) .
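Both principles can be checked directly with Python's built-in set type. The sketch below is our own illustration, not part of the text; the student labels g1…, i1… are hypothetical.

```python
# Hypothetical labels for the class in Example 2.1 (our own illustration):
ghanaians = {f"g{k}" for k in range(1, 7)}   # 6 Ghanaian students
italians = {f"i{k}" for k in range(1, 3)}    # 2 Italian students

# Addition principle: the sets are disjoint, so the sizes simply add.
assert len(ghanaians & italians) == 0
assert len(ghanaians | italians) == len(ghanaians) + len(italians)  # 8

# Inclusion-exclusion for overlapping sets:
# n(A union B) = n(A) + n(B) - n(A intersect B)
A = {1, 2, 3, 4}
B = {3, 4, 5}
assert len(A | B) == len(A) + len(B) - len(A & B)  # 5 = 4 + 3 - 2
```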
Sometimes we shall be faced with situations that involve processes consisting of successive
actions. In such situations we are more likely to use the multiplication principle. We shall
first introduce the tree diagram which is the type of reasoning behind the multiplication
principle of counting.
The tree diagram is a device used to enumerate all the possible outcomes of sequence of
processes where each process can occur in a finite number of ways. It is constructed from
left to right. Each complete path is called a branch of the tree. The number of branches at
each point corresponds to the number of possible outcomes of the next process.
Example 2.2
If a woman has 2 blouses and 3 skirts, how many outfits can she put together?
Solution
The following is the tree diagram. There are 6 outfits she can put together.
[Tree diagram: from the root, 2 branches (Blouse 1, Blouse 2); each blouse branch splits into 3 branches (Skirt 1, Skirt 2, Skirt 3), giving 6 complete paths.]
If one thing can be accomplished in n1 different ways and after this a second thing can be
accomplished in n2 different ways, …, and finally a k th thing can be accomplished in nk
different ways, then all k things can be accomplished, in the specified order, in
n1 × n2 × · · · × nk different ways
The multiplication principle of counting is also called the fundamental principle of counting.
Example 2.3
Use the multiplication principle of counting to solve the problem in Example 2.2.
Solution
There are 2 ways the woman can put on the two blouses and for each of these ways there
are 3 ways she can put on the three skirts. Hence, by the fundamental principle of counting,
there are 2×3 = 6 ways of putting outfits together.
Example 2.4
Six dice are rolled. In how many ways may the faces of the dice show up?
Solution
Each of the six dice may show its face in six different ways, so by the fundamental principle
of counting there are
6 × 6 × 6 × 6 × 6 × 6 = 66 = 46656 ways
Example 2.5
Given the numbers 1, 2, 3, 4, how many different numbers of three digits can be formed
from them if repetitions are (a) allowed (b) not allowed?
Solution
a) Since repetitions are allowed, each digit can be chosen in four different ways.
Hence there are
4 × 4 × 4 = 64 such numbers
b) Since repetitions are not allowed, the first digit can be chosen in four ways, the
second in three ways, and the third in two ways. Hence there are
4 × 3 × 2 = 24 numbers in all
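Counts like these can be confirmed by brute-force enumeration with Python's itertools (an illustrative check, not part of the text):

```python
from itertools import product, permutations

digits = [1, 2, 3, 4]

# (a) Repetitions allowed: every 3-tuple over the four digits is a number.
with_rep = list(product(digits, repeat=3))
assert len(with_rep) == 4 ** 3          # 64

# (b) Repetitions not allowed: ordered selections of 3 distinct digits.
without_rep = list(permutations(digits, 3))
assert len(without_rep) == 4 * 3 * 2    # 24
```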
2.3.3 FACTORIAL
Another concept that we need to know before starting discussions on the main counting
techniques is the factorial.
Given the positive integer n , the product of all the whole numbers from n down to 1 is called
n factorial
n! = n(n − 1)(n − 2) · · · 2 · 1
1! = 1
2! = 2 · 1 = 2
3! = 3 · 2 · 1 = 6
4! = 4 · 3 · 2 · 1 = 24
Example 2.6
Evaluate 10!.
Solution
10! = 10 × 9 × 8 × 7 × 6 × 5 × 4 × 3 × 2 × 1 = 3,628,800
For large n , n! is very large and a convenient approximation can be obtained by
Stirling's formula, suggested by the Scottish mathematician James Stirling (1692 − 1770)6:
n! ∼ √(2πn) (n/e)^n
where ∼ is read “asymptotically equal” and means that the ratio of the two sides approaches
1 as n approaches infinity; π = 3.14159 and e = 2.71828 is the base of natural (Napierian)
logarithms.
For example
40! ≈ √(2(3.14159)(40)) (40/2.71828)^40 = 8.142 × 10^47
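A quick numerical check of Stirling's formula against the exact factorial (an illustrative sketch, not from the text):

```python
import math

def stirling(n: int) -> float:
    """Stirling's approximation: n! ~ sqrt(2*pi*n) * (n/e)**n."""
    return math.sqrt(2 * math.pi * n) * (n / math.e) ** n

exact = math.factorial(40)
approx = stirling(40)

# The ratio approaches 1 as n grows; at n = 40 it is already ~0.998.
assert abs(approx / exact - 1) < 0.003
assert abs(approx - 8.142e47) / 8.142e47 < 1e-3  # the value quoted above
```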
A special case of the fundamental principle of counting occurs when we make successive
choices from a single set of n objects. The first choice may be made in n ways, and for
each of these ways there are n − 1 ways for the second choice, then n − 2 ways for the
third, and so on.
An arrangement of n distinct objects in a given order taking all of them at a time is called an
n -permutation
Theorem 2.4
The number of permutations of n distinct objects taken all at a time is
nPn = n!
Note
Theorem 2.4 may equivalently be stated as “the number of arrangements of n objects in
a row”.
Example 2.7
Consider the set of letters a , b and c . In how many ways can the three letters be arranged
taking all of them at a time? Indicate the arrangements.
Solution
Since order is important in this case, it is a permutation problem. There are 3 letters and
all of them are taken at a time, so
3P3 = 3! = 3 × 2 × 1 = 6
The six arrangements are abc, acb, bac, bca, cab and cba.
Theorem 2.5
The number of permutations of n distinct objects taken r at a time, without repetition, is
nPr = n(n − 1)(n − 2) · · · (n − r + 1)
or equivalently,
nPr = n!/(n − r)!
Note
If r = n, then n Pr =n Pn = n!
Example 2.8
In how many different ways can 4 people be chosen from a set of 6 and be seated in a
row of four chairs?
Solution
n = 6, r = 4.
Hence, the desired number of different ways is
6P4 = 6!/(6 − 4)! = (6 × 5 × 4 × 3 × 2 × 1)/(2 × 1) = 360
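Python's math module exposes this count directly as math.perm (available from Python 3.8), which makes results such as Example 2.8 easy to verify (an illustrative check):

```python
import math

# nPr = n! / (n - r)!  -- math.perm computes this directly.
assert math.perm(6, 4) == 360                                  # Example 2.8
assert math.perm(6, 4) == math.factorial(6) // math.factorial(2)
assert math.perm(4, 4) == math.factorial(4)                    # r = n gives n!
```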
Theorem 2.6
The number of permutations of n distinct objects taken r at a time, when repetition is
allowed, is n^r.
Example 2.9
In how many different ways can a three-letter word be formed from the letters of the word
BEST if any letter may be repeated?
Solution
n = 4, r = 3.
Hence, there are 43 = 64 different ways
One important variation of r -permutations is when objects are arranged in a circle. Typically
in this case the actual positions do not matter, but the relative positions (that is, which
objects are next to one another) matter. For example, if six people are sitting in a circle we
do not get a new permutation if they all move one position in a clockwise (or anti-clockwise)
direction. The first object can be placed anywhere. Starting from this object and moving
around clockwise or anti-clockwise, the remaining n − 1 objects can be arranged in (n − 1)! ways.
Theorem 2.7
The number of circular permutations of n distinct objects is
(n − 1)!
Example 2.10
Five Executives attend a round-table meeting. How many different arrangements are possible?
Solution
There are (n − 1)! = (5 − 1)! = 24 circular permutations.
Example 2.11
Refer to Example 2.10. Suppose each of the Executives was accompanied by a Secretary
to take minutes of the meeting.
a) How many arrangements are possible that alternate the Executives and the Secretaries?
b) If each Secretary should sit by his/her Executive, how many arrangements are possible
that alternate the Executives and the Secretaries?
Solution
Any person (an Executive or a Secretary) can be placed anywhere at the start.
a) Suppose the first to sit down is an Executive. Then there are (5 − 1)! = 4! different
arrangements for the remaining Executives. The five Secretaries can be seated
in the next 5 alternating seats. Thus, there are 5! possibilities for them. By the
multiplication principle, there are
4!5! = 2880 different arrangements
b) Suppose the first to sit down is an Executive. Then there are (5 − 1)! different
arrangements. There are two ways the first Secretary can sit, either at the left or
the right of her Executive. Once she sits, all other places are automatic for the rest
of the Secretaries. Hence, there are
4!(2) = 48 possible arrangements
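The circular-permutation count (n − 1)! can also be verified by brute force, treating seatings that differ only by rotation as identical (an illustrative sketch; the helper circular_count is our own):

```python
from itertools import permutations
from math import factorial

def circular_count(n: int) -> int:
    """Count seatings of n people at a round table, where seatings that
    differ only by rotation are considered the same arrangement."""
    seen = set()
    for p in permutations(range(n)):
        i = p.index(0)                  # rotate so person 0 comes first
        seen.add(p[i:] + p[:i])
    return len(seen)

assert circular_count(5) == factorial(4)   # 24, as in Example 2.10

# Example 2.11(a): seat the executives in a circle ((5-1)! ways), then
# the 5 secretaries in the alternating gaps (5! ways).
assert factorial(4) * factorial(5) == 2880
```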
So far, we have been assuming that all objects considered are distinct. What happens if
they are not?
Theorem 2.8
The number of permutations (without repetition) of n objects of which n1 are alike, n2 are
alike, · · · , nk are alike is
Mn = nPn1,n2,···,nk = n!/(n1! n2! · · · nk!)
where n = n1 + n2 + · · · + nk
Example 2.12
In how many ways can the letters of the word N E C E S S I T I E S be arranged?
Solution
The total number of letters in the word is 11. The letters E and S each occurs 3 times.
The letter I occurs twice and the letters N, C and T each occurs once. Hence the total
number of arrangements is
11!/(3!3!2!1!1!1!) = 554,400
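Theorem 2.8 is easy to mechanize; the helper below (our own illustration, not from the text) computes n!/(n1! n2! · · · nk!) from a word's letter counts:

```python
from math import factorial
from collections import Counter
from functools import reduce

def multiset_permutations(word: str) -> int:
    """n! / (n1! n2! ... nk!) for the letter multiplicities of `word`."""
    counts = Counter(word)
    denom = reduce(lambda acc, c: acc * factorial(c), counts.values(), 1)
    return factorial(len(word)) // denom

assert multiset_permutations("NECESSITIES") == 554_400  # Example 2.12
assert multiset_permutations("ABC") == 6  # all letters distinct: 3!
```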
Permutation Identities
a) nPn = n!
b) nP0 = 1
c) nP1 = n
d) nPr = n[P (n − 1, r − 1)], n > 0, r > 0
An r -combination of a set of n objects is a selection of r of the objects without regard to order,
n ≥ r
Similar to permutations, there are two types of combinations, without repetition and
with repetition.
Example 2.13
How many combinations are there of three letters a, b, c taken two letters at a time, without
repetitions?
Solution
There are 3 such combinations: ab, ac, bc.
Suppose 3 letters are chosen from 4 letters. If order is important then it is a permutation
problem. The number of permutations of 3 letters chosen from 4 letter is
4P3 = 4!/(4 − 3)! = 4 × 3 × 2 = 24
However, any set of 3 letters will appear 6 (or 3! ) times in the list of all possible arrangements.
For instance, the set containing letters a, b, and c will appear as abc, acb, bac, bca, cab and cba.
Hence the total number of ways of selecting 3 letters is
4!/(3!(4 − 3)!) = (4 × 3 × 2 × 1)/(3!1!) = 4
In general, the number of ways of selecting r objects from n distinct objects where order
is not important is
nCr = nPr/r!
Theorem 2.9
The number of combinations of a set of n distinct objects taken r at a time without repetition,
n ≥ r , is given by
C(n, r) = n!/(r!(n − r)!), for 0 ≤ r ≤ n
C(n, r) is also called the binomial coefficient, for reasons that will become clear in Section 2.6.
Usually, when we talk of an r -combination, we refer to combination without repetition. The
number of r -combinations is usually denoted by nCr or C(n, r) , and is read as "n combination r"
or "n choose r". Throughout this text we shall adopt the notation C(n, r).
Example 2.14
A school basketball squad for the inter-school competition has ten players. The coach must
select a team for the first tournament.
a) How many different teams of five players can be constituted for this tournament?
b) If, in constituting the team, the coach also has to designate positions, how many
different teams of five players can be constituted?
Solution
a) Here, we are not interested in the positions each of the five players in the team
will take. It is, therefore, a problem of combination, and
C(10, 5) = 10!/(5!(10 − 5)!)
= (10 × 9 × 8 × 7 × 6 × 5!)/(5 × 4 × 3 × 2 × 1 × 5!)
= 252 combinations.
b) Since the order counts in this case, the problem is one of permutation. Hence,
10P5 = 10!/(10 − 5)!
= (10 × 9 × 8 × 7 × 6 × 5!)/5!
= 30,240 permutations.
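Both parts of Example 2.14 can be checked with math.comb and math.perm (an illustrative sketch, not part of the text):

```python
import math

# Example 2.14: choosing 5 of 10 players.
assert math.comb(10, 5) == 252     # (a) unordered: a combination
assert math.perm(10, 5) == 30_240  # (b) positions matter: a permutation

# The two counts are linked by nCr = nPr / r!.
assert math.comb(10, 5) == math.perm(10, 5) // math.factorial(5)
```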
Example 2.15
A Committee of 5 is to be formed from 12 men and 8 women. In how many ways can
the Committee be chosen so that there are 3 men and 2 women on it?
Solution
Number of ways of choosing 3 men from 12 is
C(12, 3) = 12!/(3!9!) = (12 × 11 × 10)/(3 × 2 × 1) = 220
Similarly, the number of ways of choosing 2 women from 8 is C(8, 2) = 28. By the
multiplication principle, the total number of ways of forming the Committee is
C(12, 3) × C(8, 2) = 220 × 28 = 6160
The number of combinations of n distinct objects, taken r at a time, with repetitions is the
number of arrangements that can be made up of the r objects chosen from the given objects,
each being used as often as desired
Example 2.16
How many combinations are there of three letters a, b, c taken two letters at a time,
with repetitions?
Solution
There are 6 such combinations: ab, ac, bc, aa, bb, cc.
Theorem 2.10
The number of different combinations of a set of n distinct objects taken r at a time with
repetitions (n ≥ r) , is given by
C(n + r − 1, r)
Note
In general,
C(n + r, r) = C(n + r, n)
Example 2.17
Refer to Example 2.16. Use Theorem 2.10 to rework it.
Solution
n = 3, r = 2.
Hence
C(3 + 2 − 1, 2) = C(4, 2) = 6
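Theorem 2.10 can be cross-checked against direct enumeration with itertools.combinations_with_replacement (an illustrative check):

```python
import math
from itertools import combinations_with_replacement

# Theorem 2.10: combinations with repetition number C(n + r - 1, r).
n, r = 3, 2
assert math.comb(n + r - 1, r) == 6            # Example 2.17

# Brute-force check: choose 2 of the letters a, b, c with repetition.
choices = list(combinations_with_replacement("abc", r))
assert len(choices) == 6                       # ab, ac, bc, aa, bb, cc
```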
Theorem 2.10 can be used to solve a peculiar type of counting problem: the selection of
r objects from a set of n distinct objects when each object may be selected more than
once, so that r may even exceed n .
Example 2.18
Three types of beverages, tea, coffee and cocoa, are to be served at the Academic Board
meeting. There are 12 members present at the meeting. How many different beverage orders
are possible?
Solution
Here the set S contains three elements, namely, tea, coffee and cocoa; and r = 12 selections
are to be made from S (the twelve members, each selecting one kind of beverage). Thus
n = 3 and r = 12 , and the number of different beverage orders is
C(3 + 12 − 1, 12) = C(14, 12) = 91
One way of stating this problem is "distribute r similar balls in n distinct boxes". If, in
placing the balls in the boxes, no box should be empty, then there are
C(r − 1, n − 1)
ways of doing so.
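Both counts, the beverage orders of Example 2.18 and the no-empty-box variant, can be verified numerically (an illustrative sketch, not part of the text):

```python
import math
from itertools import combinations_with_replacement

# Example 2.18: 12 members each order one of 3 beverages; only the number
# of orders per beverage matters, so this is combination with repetition.
n, r = 3, 12
assert math.comb(n + r - 1, r) == 91
assert len(list(combinations_with_replacement(range(n), r))) == 91

# Balls-in-boxes variant with no empty box: C(r - 1, n - 1).
assert math.comb(r - 1, n - 1) == 55   # 12 balls, 3 boxes, none empty
```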
Combination Identities
a) C(n, 0) = 1
b) C(n, 1) = n
c) C(n, n) = 1
d) C(0, 0) = 1
e) C(n + 1, r) = C(n, r − 1) + C(n, r)
f) C(n, r) = 0, if r > n or r < 0
g) C(n, r) = C(n, n − r)
h) C(n, r) = C(n − 1, r) + C(n − 1, r − 1), r = 1, 2, · · · , n − 1
i) C(n, r) = C(n + 1, r + 1) − C(n, r + 1), r = 1, 2, · · · , n − 1
j) C(n, r) = (n/(n − r)) C(n − 1, r), if r < n
k) C(n, r) = ((n − r + 1)/r) C(n, r − 1), if r > 0
l) C(n, r) = (n/r) C(n − 1, r − 1), n > 0, r > 0
In Table 2.1, we have summarized what we call the basic rules of counting.

Table 2.1 Basic rules of counting

                          With repetition      Without repetition
Permutation (ordered)     n^r                  n!/(n − r)!
Combination (unordered)   C(n + r − 1, r)      n!/(r!(n − r)!)

In addition to the basic rules summarized in Table 2.1, the number of ordered pairs that
can be formed when there are m choices for the first element and n choices for the second
is mn .
Example 2.19
Consider the expansions of (1 + x)^n for successive values of n :
n = 0, (1 + x)0 = 1,
n = 1, (1 + x)1 = 1 + x,
n = 2, (1 + x)2 = (1 + x)(1 + x)1 = 1 + 2x + x2
n = 3, (1 + x)3 = (1 + x)(1 + x)2 = 1 + 3x + 3x2 + x3
n = 4, (1 + x)4 = (1 + x)(1 + x)3 = 1 + 4x + 6x2 + 4x3 + x4
n = 5, (1 + x)5 = (1 + x)(1 + x)4 = 1 + 5x + 10x2 + 10x3 + 5x4 + x5
Similarly, to obtain the expansion (1 + x)6 we multiply the expansion of (1 + x)5 by (1 + x).
This process could be continued indefinitely. The expansions on the right hand side of
Example 2.19 have coefficients which correspond to the symmetrical array:
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
known as Pascal’s triangle, named after Blaise Pascal (1623–1662), a French mathematician,
and one of the founders of the science of probability. There are two obvious characteristics
of Pascal's triangle: each row begins and ends with 1, and each interior entry is the sum
of the two entries immediately above it.
Example 2.20
Construct the row of the Pascal triangle corresponding to n = 6
Solution
Each entry in the new row (other than the end 1's) is the sum of the two adjacent entries
in the row for n = 5 . That is
1 6 15 20 15 6 1
The process of multiplying out, as in Example 2.19, or using Pascal's triangle is not a
satisfactory method of obtaining the expansion of (1 + x)n for large values of n . A better
method is to use what is called the Binomial theorem.
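The row-by-row rule of Pascal's triangle translates directly into a short function (our own illustration, not from the text):

```python
def pascal_row(n: int) -> list:
    """Row n of Pascal's triangle: each interior entry is the sum of the
    two entries immediately above it; the ends are 1."""
    row = [1]
    for _ in range(n):
        row = [1] + [a + b for a, b in zip(row, row[1:])] + [1]
    return row

assert pascal_row(5) == [1, 5, 10, 10, 5, 1]
assert pascal_row(6) == [1, 6, 15, 20, 15, 6, 1]   # Example 2.20
```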
d) Σ (r = 0 to n) C(r + k, k) = C(n + k + 1, k + 1)
e) Σ (r = 0 to n) (−1)^r C(n, r) = 0
f) Σ (r = 0 to n) C(n, r)(a − 1)^r = a^n
g) Σ (r = 0 to n) r C(n, r) = n 2^(n−1)
Example 2.21
Use the binomial theorem to expand (a) (1 + x)5 , (b) (1 − y)5
Solution
a) From Theorem 2.11,
(1 + x)^5 = C(5, 0) + C(5, 1)x + C(5, 2)x^2 + · · · + C(5, 5)x^5
= 1 + 5x + 10x^2 + 10x^3 + 5x^4 + x^5
as in Example 2.19.
b) (1 − y)^5 = [1 + (−y)]^5
= 1 + 5(−y) + 10(−y)^2 + 10(−y)^3 + 5(−y)^4 + (−y)^5
= 1 − 5y + 10y^2 − 10y^3 + 5y^4 − y^5
A more general form for the binomial theorem when n is a positive integer is
(y + z)^n = C(n, 0) y^n z^0 + C(n, 1) y^(n−1) z^1 + C(n, 2) y^(n−2) z^2 + · · ·
+ C(n, n − 1) y^1 z^(n−1) + C(n, n) y^0 z^n
= Σ (r = 0 to n) C(n, r) y^(n−r) z^r
Example 2.22
Determine the coefficient of x^8 in (x^2 + 1/x)^10.
Solution
(x^2 + 1/x)^10 = x^20 (1 + 1/x^3)^10
= x^20 [1 + C(10, 1)(1/x^3) + C(10, 2)(1/x^3)^2 + C(10, 3)(1/x^3)^3 + C(10, 4)(1/x^3)^4 + · · ·]
The term in x^8 comes from x^20 · C(10, 4)(1/x^3)^4. Therefore the coefficient is
C(10, 4) = 210.
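The same answer can be obtained by tracking exponents term by term (an illustrative check, not part of the text):

```python
from math import comb
from collections import defaultdict

# General term of (x**2 + 1/x)**10: C(10, k) * (x**2)**(10-k) * (1/x)**k,
# i.e. coefficient C(10, k) on the exponent 20 - 3k.
coeffs = defaultdict(int)
for k in range(11):
    coeffs[20 - 3 * k] += comb(10, k)

assert coeffs[8] == 210                  # Example 2.22: 20 - 3k = 8 at k = 4
assert coeffs[20] == 1                   # leading term x**20
assert sum(coeffs.values()) == 2 ** 10   # coefficients sum to (1 + 1)**10
```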
It is usual, when expanding by the binomial theorem for large values of n , to leave the
coefficients in the form C(n, r) or in factorial form, rather than evaluating them explicitly,
since the numbers may become very large.
In the binomial coefficient C(n, r) , both n and r are required to be nonnegative integers.
But even though the expression on the right-hand side of (i) does not make sense for
negative integers, the same cannot be said about the expression on the left-hand side: the
symbol C(α, r) can be defined for any real number α , where r is still a nonnegative integer.
If r is not an integer we shall never use the symbol C(α, r) .
c) C(α, r) = (−1)^r C(−α + r − 1, r)
d) C(−α, r) = (−1)^r C(α + r − 1, r)
Example 2.23
Evaluate the following:
(a) C(−5, 4) (b) C(−2, 5) (c) C(1/3, 4) (d) C(−1/2, 3) (e) C(4, 6)
Solution
a) C(−5, 4) = (−5)(−5 − 1)(−5 − 2)(−5 − 3)/4! = 70
b) C(−2, 5) = (−2)(−2 − 1)(−2 − 2)(−2 − 3)(−2 − 4)/5! = −6
c) C(1/3, 4) = (1/3)(1/3 − 1)(1/3 − 2)(1/3 − 3)/4! = −10/243
d) C(−1/2, 3) = (−1/2)(−1/2 − 1)(−1/2 − 2)/3! = −5/16
e) C(4, 6) = 0, since 6 > 4
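The defining product can be coded once and reused for all such evaluations (gbinom is our own illustrative helper; exact arithmetic via fractions avoids rounding error):

```python
from fractions import Fraction
from math import factorial

def gbinom(alpha, r):
    """Generalized binomial coefficient C(alpha, r) =
    alpha * (alpha - 1) * ... * (alpha - r + 1) / r!  for integer r >= 0."""
    num = Fraction(1)
    for k in range(r):
        num *= Fraction(alpha) - k
    return num / factorial(r)

assert gbinom(-5, 4) == 70
assert gbinom(-2, 5) == -6
assert gbinom(Fraction(1, 3), 4) == Fraction(-10, 243)
assert gbinom(Fraction(-1, 2), 3) == Fraction(-5, 16)
assert gbinom(4, 6) == 0   # a zero factor appears when r > alpha >= 0
```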
Theorem 2.12 is also called Newton's Binomial Expansion and the coefficient C(α, r) is
called Newton's binomial coefficient or the generalized binomial coefficient.
Example 2.24
Use the generalized binomial theorem to expand (1 + x)^(1/2) up to the term in x^3, and
hence determine (1.08)^(1/2) to 5 decimal places.
Solution
Here, α = 1/2, so that
(1 + x)^(1/2) = 1 + (1/2)x/1! + (1/2)(1/2 − 1)x^2/2! + (1/2)(1/2 − 1)(1/2 − 2)x^3/3! + · · ·
= 1 + (1/2)x − (1/8)x^2 + (1/16)x^3 − · · ·
To obtain an approximate value for (1.08)^(1/2), we set x = 0.08 in the expansion for
(1 + x)^(1/2). That is
(1 + 0.08)^(1/2) = 1 + (1/2)(0.08) − (1/8)(0.08)^2 + (1/16)(0.08)^3
= 1 + 0.04 − 0.0008 + 0.000032 = 1.03923 (5 decimal places)
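The truncated series is easily checked against the true square root (an illustrative sketch, not part of the text):

```python
# Truncated generalized binomial series for (1 + x)**(1/2) at x = 0.08.
x = 0.08
approx = 1 + x / 2 - x**2 / 8 + x**3 / 16

assert round(approx, 5) == 1.03923        # the value found above
assert abs(approx - 1.08 ** 0.5) < 1e-5   # close to the true square root
```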
Note that the expansion contains an infinite number of terms when α is not a positive
integer, and the restriction |x| < 1 is necessary to ensure that the terms decrease in value
quickly enough as the number of terms increases. Since it is not possible to write
down all the terms in the expansion, it is usual to present the first few terms, the number
of terms being chosen to give the required accuracy. If we substitute (−x) for x and (−α)
for α in Theorem 2.12, the result is the next theorem.
Theorem 2.13
For any real number α and |x| < 1,
(1 − x)^(−α) = Σ (r = 0 to ∞) C(α + r − 1, r) x^r
Example 2.25
Expand (1 + x)^(−2) up to the term in x^4 and state its general term.
Solution
(1 + x)^(−2) = C(−2, 0) + C(−2, 1)x + C(−2, 2)x^2 + C(−2, 3)x^3 + C(−2, 4)x^4 + · · ·
= 1 − 2x + (6/2!)x^2 − (24/3!)x^3 + (120/4!)x^4 − · · ·
= 1 − 2x + 3x^2 − 4x^3 + 5x^4 − · · ·
The general term is (−1)^r (r + 1) x^r.
Example 2.26
Use the generalized binomial theorem to expand (1 − x)^(−1/3) up to the term in x^3 and
state its general term.
Solution
(1 − x)^(−1/3) = 1 + (−1/3)(−x)/1! + (−1/3)(−1/3 − 1)(−x)^2/2! + (−1/3)(−1/3 − 1)(−1/3 − 2)(−x)^3/3! + · · ·
= 1 + (1/3)x + (1·4/3^2)(x^2/2!) + (1·4·7/3^3)(x^3/3!) + · · · + (1·4·7 · · · (3n − 2)/3^n)(x^n/n!) + · · ·
Expansion of (y + z)^α
In order to expand (y + z)^α when α is a real number, it is necessary to write
y + z = y(1 + z/y)
so that
(y + z)^α = y^α (1 + z/y)^α = y^α (1 + x)^α
where x = z/y with |x| < 1
Example 2.27
Find the expansion of the expression 1/(4 − x)^(1/2) up to the term in x^3.
Solution
We may write 4 − x as 4 − x = 4(1 − x/4)
Let y = −x/4. Then 4 − x = 4(1 + y), so that (4 − x)^(1/2) = 4^(1/2) (1 + y)^(1/2). Now
1/(4 − x)^(1/2) = (1/2)(1 + y)^(−1/2)
But
(1 + y)^(−1/2) = 1 + (−1/2)y/1! + (−1/2)(−3/2)y^2/2! + (−1/2)(−3/2)(−5/2)y^3/3! + · · ·
= 1 − (1/2)y + (3/8)y^2 − (5/16)y^3 + · · · , |y| < 1
Replacing y by −x/4, we obtain
1/(4 − x)^(1/2) = (1/2)(1 − x/4)^(−1/2)
= (1/2)[1 + (1/8)x + (3/128)x^2 + (5/1024)x^3 + · · ·], |x| < 4
Expansion of (y + z/w)^α
The following theorem provides a general formula for determining any term of the binomial
expansion.
Theorem 2.14
(y + z)^n = Σ (k = 0 to n) C(n, k) y^k z^(n−k)
so that any required term of the expansion can be written down directly.
Example 2.28
Use Theorem 2.14 to expand (x − 1/x^2)^4.
Solution
(x − 1/x^2)^4 = [x + (−1/x^2)]^4 = Σ (k = 0 to 4) C(4, k) x^k (−1/x^2)^(4−k)
For k = 0: (4!/(0!4!)) x^0 (−1/x^2)^4 = 1/x^8
For k = 1: (4!/(1!3!)) x^1 (−1/x^2)^3 = −4/x^5
For k = 2: (4!/(2!2!)) x^2 (−1/x^2)^2 = 6/x^2
For k = 3: (4!/(3!1!)) x^3 (−1/x^2)^1 = −4x
For k = 4: (4!/(4!0!)) x^4 (−1/x^2)^0 = x^4
Therefore
(x − 1/x^2)^4 = x^(−8) − 4x^(−5) + 6x^(−2) − 4x + x^4
where
C(n; n1, n2, · · · , nk) = n!/(n1! n2! · · · nk!)
b) For k = 2 ,
C(n; n1, n2) = C(n, n1) = C(n, n2)
c) For k = n ,
C(n; 1, 1, · · · , 1) = n!
d) The number of terms in the multinomial expansion is C(k + n − 1, n)
e) Σ C(n; n1, n2, · · · , nk) = k^n, the sum being taken over all nonnegative integers
n1, n2, · · · , nk with n1 + n2 + · · · + nk = n
Example 2.29
Use the multinomial theorem to expand (x + y + z)2
Solution
We expect to have C(3 + 2 − 1, 2) = C(4, 2) = 6 terms of the form x^n1 y^n2 z^n3, where
n1 + n2 + n3 = 2. Therefore
(x + y + z)^2 = (2!/(2!0!0!)) x^2 y^0 z^0 + (2!/(0!2!0!)) x^0 y^2 z^0 + (2!/(0!0!2!)) x^0 y^0 z^2
+ (2!/(1!1!0!)) x^1 y^1 z^0 + (2!/(1!0!1!)) x^1 y^0 z^1 + (2!/(0!1!1!)) x^0 y^1 z^1
= x^2 + y^2 + z^2 + 2xy + 2xz + 2yz
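The expansion can be generated mechanically by enumerating the solutions of n1 + n2 + n3 = 2 (an illustrative sketch, not part of the text):

```python
from math import factorial
from itertools import product

# One term of (x + y + z)**2 per solution of n1 + n2 + n3 = 2 in
# nonnegative integers, with coefficient 2!/(n1! n2! n3!).
terms = {}
for n1, n2, n3 in product(range(3), repeat=3):
    if n1 + n2 + n3 == 2:
        terms[(n1, n2, n3)] = factorial(2) // (
            factorial(n1) * factorial(n2) * factorial(n3))

assert len(terms) == 6                 # C(3 + 2 - 1, 2) distinct terms
assert sum(terms.values()) == 3 ** 2   # coefficients sum to k**n = 9
assert terms[(2, 0, 0)] == 1 and terms[(1, 1, 0)] == 2
```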
The multinomial coefficient may also be interpreted as:
a) the number of n -letter words formed with r distinct letters used n1 , n2 , · · · , nr times;
b) the number of ways of distributing n distinct objects into k distinct boxes, with
n1 objects in the first box, n2 objects in the second box, and so on.
c) the number of ways to split n distinct objects into r distinct groups, of sizes
n1 , n2 , · · · , nr , respectively.
We have now gone through some of the basic mathematics required to understand and
solve probability problems. If we have understood the material in this part, we are ready
for the rest of the book, especially Parts 2 and 3.
EXERCISES
2.1 In Ghana, vehicle license numbers follow the system of two letters followed by four
digits and then by a letter. How many license plates are possible?
2.2 How many six-digit telephone numbers are possible if the first digit cannot be a 0?
2.4 How many different three-digit numbers can be formed using 0, 1, 2, 3, 4 (excluding
numbers which begin with 0 ) if
2.6 Refer to Exercise 2.5. Suppose the books consist of two Mathematics books, three
Statistics books, and 1 Physics book. In how many ways can these books be arranged
on a shelf if
a) the different books on a subject are not distinguished?
b) two particular statistics books will be together?
c) books on the same subject are placed together?
2.8 A lady has 8 house plants. In how many ways can she arrange 6 of them in a line
on a window sill?
2.9 The Management Committee of a certain football team consists of nine members.
If 5 are to fill the positions of the president, vice-president, secretary, treasurer and
publicity officer, how many different slates of officers are possible?
2.10 There are 25 entrants in a gymnastics competition. In how many different ways can
the gold, silver and bronze medals be won?
2.11 In how many ways could we arrange four guests in a row of four numbered chairs
at a concert? Show all the arrangements.
2.12 In how many ways can 3 boys and 3 girls sit in a row if
a) all of them can sit anywhere?
b) the boys and the girls are each to sit together?
c) only the boys must sit together?
d) no two people of the same sex are allowed to sit together?
2.13 Refer to Exercise 2.11. Suppose two of the guests are not on talking terms. How
many different arrangements are possible with the two
a) sitting together?
b) not sitting together?
2.15 The Managing Director of a reputable company wants to fill four vacant positions
in four regional capitals, namely, Kumasi, Tamale, Koforidua and Ho. There is a
pool of ten officers at the headquarters in Accra from which to fill these positions.
In how many different ways can this be done?
2.16 Find the number of different ways in which the letters of the following words can
be arranged:
(a) N U M B E R; (b) P O S S I B L E;
(c) P E P P E R; (d) S T A T I S T I C S.
2.17 Refer to Exercise 2.16 (d). Rework it if all the three T’s should be together.
2.18 A child has 12 blocks of which 6 are black, 4 are red, 1 is white and 1 is blue. If
the child puts the blocks in a line, how many arrangements are possible?
2.21 Evaluate
(a) C(7, 4) (b) C(5, 4) (c) C(6, 1) (d) C(8, 0) (e) C(8, 6)
(f) C(9, 4) (g) C(50, 4) (h) C(9, 0) (i) C(50, 46) (j) C(4, 5)
2.22 The Reverend Minister of a church always insists that every member shakes hands
with every other member exactly once. On one particular Sunday, ninety members
were present in the church. How many handshakes occurred?
2.24 A Committee of 6 is to be formed from 13 men and 7 women. In how many ways
can the Committee be selected given that it must have
a) 4 men and 2 women?
b) at least one member of each sex?
2.26 16 people, 4 from each of 4 groups A, B, C and D , have to select 6 of their members
to represent them on a Committee. How many selections can be made if
a) each group must be represented?
b) no group can have more than two representatives?
2.27 The Managing Director of a company decides on one Monday to accompany one
of his ten Executives on each of the remaining four days of the week to negotiate
contracts. To decide whom to accompany, he writes the name of each of the Executives
on a separate card. He puts all the cards in a small container, mixes them up and
draws one. How many different sequences are there, when by a sequence we mean
a list of four names, the first being the name of the individual he accompanies on
Tuesday, the second the individual he accompanies on Wednesday, and so on?
2.28 There are four female and eight male senior officers of an establishment. The Chief
Executive plans to form a Committee on which there are two females and four males.
How many different Committees can be formed with this composition?
2.29 How many different ways are there to place 10 indistinguishable balls in four boxes?
2.30 Roll five dice once. How many different outcomes are there if the dice are
(a) distinguishable (b) indistinguishable?
2.38 Use the binomial theorem to find the first four terms, and give the range of x for
which the full expansion is valid, for:
2.39 Use the expansion obtained in (e) of Exercise 2.38 above to evaluate (3.8)^(1/2) to
four decimal places.
2.42 Use Theorem 2.14 to rework parts (a) and (c) of Exercise 2.34.
INTRODUCTORY
PROBABILITY THEORY BASIC CONCEPTS IN PROBABILITY
3.2 EXPERIMENTS
3.2.1 DEFINITION OF CONCEPTS
Uncertainty refers to the outcome of some process of change. If a process of change can
lead to two or more possible results, the results are said to be uncertain.
Example 3.1
If a coin is tossed7 we observe whether a head or a tail is obtained. We might describe this
as the experiment of tossing a coin. Another example of an experiment is rolling a die.
Example 3.2
If a coin is tossed four times, each single toss is a trial.
Example 3.3
A die is rolled once and a 5 showed on the face. The outcome is the number 5.
When any one outcome of an experiment has the same chance of occurrence as any other
outcome when the experiment is performed, the outcomes are said to be equally likely
Example 3.4
When a die is tossed once the outcomes 1, 2, 3, 4, 5, 6 are all equally likely as long as
the die is fair.
Example 3.5
If a die is thrown once there are six possible outcomes: 1, 2, 3, 4, 5, 6. Each of these
outcomes cannot be further subdivided. Hence each outcome is simple.
Example 3.6
Consider an experiment E of tossing a coin twice and observing the sequence of heads H
and tails T . The four possible simple outcomes are
TT TH HT HH
where T H for example, indicates a tail on the first throw and a head on the second throw.
Note
The experiment of tossing a coin twice is the same as tossing two identical coins once.
A composite outcome is an outcome which can be further broken down into simple outcomes
Example 3.7
Refer to Example 3.5. There can be two possible outcomes: odd numbers 1, 3, 5 or even
numbers 2, 4, 6. Each of these outcomes can further be subdivided into simple outcomes 1,
3 and 5 and 2, 4 and 6. Hence, the two outcomes “odd” and “even” are composite outcomes.
Experiments may be classified into two types:
a) Deterministic experiments,
b) Random experiments.
Example 3.8
If we measure the distance in, say, kilometres between town A and town B many times
under the same conditions, we expect to have the same result.
Do you like cars? Would you like to be a part of a successful brand? Send us your CV on
We will appreciate and reward both your enthusiasm and talent. www.employerforlife.com
Send us your CV. You will be surprised where it can take you.
87
INTRODUCTORY
PROBABILITY THEORY BASIC CONCEPTS IN PROBABILITY
Example 3.9
Tossing a coin or rolling a die is a random experiment since in each case, the process can
lead to more than one possible outcome. In the case of the coin tossing experiment, the
result will come up tail ( T ) or head ( H ); and in the case of the die, the result will be one
of the numbers 1, 2, 3, 4, 5 and 6 .
Example 3.10
Picking a ball from a box containing 50 numbered balls is a random experiment since the
process can lead to one of the many possible outcomes, that is, any of the 50 balls may
be chosen.
A sample space S is the set of all possible outcomes of some given random experiment E
Note
A sample space is synonymous to outcome set or an outcome space.
The following three examples are experiments related to the coin, the die and cards. For
each of them we shall define the sample space. Such and similar experiments are performed
even more often in probability textbooks than they are in real life.
88
INTRODUCTORY
PROBABILITY THEORY BASIC CONCEPTS IN PROBABILITY
Example 3.11
A coin is tossed once. There are only two possible outcomes. Either it falls Head, H or
Tail, T , so the sample space is
S = {H, T }
Example 3.12
A die is rolled once. There are six faces of a die so the sample space is
S = {1, 2, 3, 4, 5, 6}
Example 3.13
A deck of cards has fifty-two cards. There are four suits: clubs (♣), hearts (♥), diamond
(♦) and spade (♠). Hearts and diamond are red; clubs and spade are black. Each suit has
thirteen cards. The picture cards for each of the four suits are the Jack (J ), the Queen (Q),
and the King (K). Hence there are 12 picture cards in the deck of 52 cards. In addition,
each of the suits has an Ace (A) so there are 4 Aces in the deck.
An experiment may have several sample spaces depending on the problem of interest. This
is illustrate in the following examples.
Example 3.14
Toss a coin three times and observe the sequence of heads and tails.
Solution
Let H be Head and T be Tail. Then the sample space S would be
where T HT , for example, indicates a tail on the first throw, a head on the second throw
and a tail on the third throw.
Example 3.15
Toss a coin three times and observe the total number of heads that occur.
Solution
When a coin is tossed three times, it is likely none of them will show a head (T T T ) or
only one coin will show a head (either HT T or T HT or T T H ) or all the three will show
a head (HHH). Hence the sample space S in this case is
S = {0, 1, 2, 3}
89
INTRODUCTORY
PROBABILITY THEORY BASIC CONCEPTS IN PROBABILITY
The number of sample points in S may be denoted as n(S) and is called the size of S or
cardinality of S . In Example 3.12, there are six sample points in the sample space, hence
n(S) = 6.
Example 3.16
Throw two dice once8 and observe the numbers that appear on their faces.
AXA Global
Graduate Program
Find out more and apply
90
INTRODUCTORY
PROBABILITY THEORY BASIC CONCEPTS IN PROBABILITY
Solution
Let the pair of integers (i, j) (1 ≤ i, j ≤ 6) represent the numbers that appear on Die 1
and Die 2 respectively, where the dice are numbered arbitrary. For example, the pair (2,
3) means that the number 2 appear on Die 1 and 3 on Die 2. In order that we may not
miss any pair, it is always advisable to fix the number that appears on the face of one of
the dice while varying the number on the other die.
Die 2
If we classify sample spaces according to the number of points that they contain, then there
are three distinct kinds:
If a sample space has a finite number of points, it is called a finite sample space
Example 3.17
Toss a coin twice and count the number of heads. The sample space which is S = {0, 1, 2}
is finite.
91
INTRODUCTORY
PROBABILITY THEORY BASIC CONCEPTS IN PROBABILITY
If a sample space has as many point as there are natural numbers 1, 2, 3, · · · , then it is called a
countably infinite sample space
Example 3.18
Toss a coin until a head appears and then count the number of times the coin was tossed.
A coin may be tossed once and a head appears, or it may require two, three, …, fiftieth, …,
and for all we know it may require thousands of tosses before a head appears. Not knowing
how many times we may have to toss the coin, it is appropriate in an example like this to
take as the sample space the whole set of natural numbers, of which there is a countable
infinity before a head appears, and so on. Thus, the sample space S is
S = {1, 2, 3, · · ·}
If a sample has as many points as there are in some interval on the x -axis, it is called an
uncountably infinite sample space
Example 3.19
The interval
[0, 1] = {0 ≤ x ≤ 1}
Sample spaces may also be distinguished according to whether they are discrete or continuous.
92
INTRODUCTORY
PROBABILITY THEORY BASIC CONCEPTS IN PROBABILITY
A sample space which is finite or countably infinite is often called a discrete sample space.
Elements in a discrete sample space can be separated and counted. It can be finite or
countably infinite
Example 3.20
The number of students in a school can be considered a finite outcome of a discrete space.
The first success in trials (Example 3.18) can be a countably infinite outcome of discrete
sample space.
�e Graduate Programme
I joined MITAS because for Engineers and Geoscientists
I wanted real responsibili� www.discovermitas.com
Maersk.com/Mitas �e G
I joined MITAS because for Engine
I wanted real responsibili� Ma
Month 16
I was a construction Mo
supervisor ina const
I was
the North Sea super
advising and the No
Real work he
helping foremen advis
International
al opportunities
Internationa
�ree wo
work
or placements ssolve problems
Real work he
helping fo
International
Internationaal opportunities
�ree wo
work
or placements ssolve pr
93
INTRODUCTORY
PROBABILITY THEORY BASIC CONCEPTS IN PROBABILITY
Continuous sample spaces have elements that cannot be separated and counted. They are
required whenever the outcomes of experiments are measurements of physical properties
such as length, temperature, time, weight, mass, etc, which are measured on a continuous
scale. The number of sample points is always uncountably infinite.
Example 3.21
The time it takes for light-bulb to burn out. The outcome of this experiment could be any
real number from zero to a certain upper limit (such as 100,000 hours or more).
3.4 EVENTS
3.4.1 DEFINITION OF CONCEPTS
The term “event” is vital in understanding the basics of probability and it is important that
after understanding the concept of “sample space”, we grasp this concept.
Example 3.22
Roll a die once. Write down the following events as sets:
a) the number 4;
b) a number greater than 4;
c) an odd number.
Solution
Let A represent an event. Then
a) The event of rolling a 4 can be satisfied by only one outcome, the 4 itself, hence
A = {4}
b) The event of rolling a number greater than 4 can be satisfied by any one of two
outcomes: the number 5 or 6 , hence
A = {5, 6}
94
INTRODUCTORY
PROBABILITY THEORY BASIC CONCEPTS IN PROBABILITY
c) The event of rolling an odd number can be satisfied by any one of three outcomes:
the number 1, 3, or 5 hence
A = {1, 3, 5}
Note
a) An event may be defined also as a subcollection of the outcomes of an experiment.
b) If the outcome ω of an experiment is an element of event A , that is, ω ∈ A, then
the event A is said to occur. It is important to remember this idea. The set of
outcomes not in A is called the complement (or negation) of A , and is denoted
by A or A or Ac (see Chapter 1).
c) If a sample space contains n sample points, then there are a total of 2n subsets of
event (see Chapter 1).
The empty set ∅ and the sample space S itself are particular events. The sample space S is
called the “sure” or “certain” or “definite” event since an element of S is bound to occur
in one trial of the experiment. An event containing no outcome is an “impossible” event
and is denoted as ∅ . An element of ∅ cannot occur in any of the trials of the experiment.
Obviously, therefore, ∅ = S c .
Synonyms for simple event are elementary event of fundamental event. The letter e with
a subscript will be used to denote a simple event or the corresponding sample point.
Example 3.23
Toss a die once. The individual event
95
INTRODUCTORY
PROBABILITY THEORY BASIC CONCEPTS IN PROBABILITY
Example 3.24
Toss a coin twice and obtain, say, 1 head and 1 tail. This involves a sequence of two
elementary events of tossing a head and a tail.
Note
In a string event, the order is of crucial importance. Thus, the events HT is not the same
as T H when a coin is tossed twice.
Example 3.25
Throw a balance die once. Is the event A = {number ≤ 2} an example of a composite event?
Explain.
EL 93%
DE LOS ALUMNOS DEL MIM ESTÁN
TRABAJANDO EN SU SECTOR
C
A LOS 3 MESES DE GRADUARSE
M
MASTER IN MANAGEMENT
CM
MY
96
INTRODUCTORY
PROBABILITY THEORY BASIC CONCEPTS IN PROBABILITY
Solution
Let ei denote a simple event i (i = 1, 2, · · · , 6). Then
A = {e2 , e3 , e4 , e5 , e6 }
is a composite event. This is because the set A can be decomposed further into the following
simple events: e2 , e3 , e4 , e5 , e6 .
Since events are sets, it is clear that statements concerning events can be translated into
the language of set theory and conversely. In particular, we have an algebra for events
corresponding to the algebra of sets. We can, therefore, combine events using the various
set operations. Thus if A and B are events in a sample space S , then
Example 3.26
A die is rolled once and the numbers on its face recorded.
A ∪ B, A, A ∪ C, A ∩ B, A/C, AΔC
Solution
The sample space consists of the six possible numbers
S = {1, 2, 3, 4, 5, 6}
97
INTRODUCTORY
PROBABILITY THEORY BASIC CONCEPTS IN PROBABILITY
b) The set A ∪ B is the event that either even number occurs or odd number occurs:
A ∪ B = {2, 4, 6, 1, 3, 5} = S
The set A is the event that an even number does not occur:
A = S\A = {1, 3, 5}
The set A ∩ C is the event that the numbers that occur are even as well as prime numbers:
A ∩ C = {2}
The set A ∩ B is the event that the numbers that occur are even as well as odd numbers:
A∩B =∅
The set A\C is the event that the numbers that occur are either even and not prime
numbers:
A\C = {4, 6}
The set AΔC is the event that the numbers that occur are either even numbers or prime
numbers but not both:
AΔC = {3, 4, 5, 6}
Note
If A1 , A2 , · · · , An is a sequence of sets, then:
n
A = the event that at least one of the events Ai occurred;
i=1
n
A = the event that all the events Ai occurred.
i=1
98
INTRODUCTORY
PROBABILITY THEORY BASIC CONCEPTS IN PROBABILITY
Two events A and B are said to be mutually exclusive if they cannot occur together
That is, two events, A and B are mutually exclusive if the occurrence of A implies the
non-occurrence of B and vice versa. Synonyms for mutually exclusive events are disjoint
events, incompatible events or non-overlapping events.
Example 3.27
When a die is rolled once, the numbers 4 and 5 cannot occur together and hence the event
A = {4} and B = {5} are mutually exclusive.
99
INTRODUCTORY
PROBABILITY THEORY BASIC CONCEPTS IN PROBABILITY
Two events A and B are called mutually inclusive if they can occur together
Example 3.28
Suppose
Solution
a) A and B are not mutually inclusive (they are mutually exclusive) since they cannot
occur together.
b) C and D are mutually inclusive since they overlap:
C ∩ D = {4, 6} �= ∅
Two or more events defined on the same sample space are said to be collectively exhaustive if
their union is equal to the sample space S
In order words, n events are said to be collectively exhaustive if their union equals the
sample space, that is
A1 ∪ A2 ∪ · · · ∪ An = S
Example 3.29
When a die is thrown once, the events 1, 2, 3, 4, 5, 6 are collectively exhaustive because
their union equals the sample space.
100
INTRODUCTORY
PROBABILITY THEORY BASIC CONCEPTS IN PROBABILITY
Note
a) If two or more events are mutually exclusive, they cannot occur together, so that
at most one of the events will occur.
b) If two or more events are collectively exhaustive, then at least one of them will occur.
c) If a set of events is mutually exclusive and collectively exhaustive, then exactly one
of the events will occur.
d) For any two events, A and B , defined over the same sample space S ,
i) (A ∩ B) and (A ∩ B),
ii) (A ∩ B) and (A ∩ B),
iii) (A ∩ B) and (A ∩ B),
a) Ai �= ∅ for all i = 1, 2, · · · n
b) A1 ∩ Aj = ∅ for all i �= j; i, j = 1, 2, · · · n
n
c) Ai = S
i=1
In other words, the n events A1 , A2 , · · · , An form a partition of the sample space S if the
n events are (a) nonempty, (b) mutually exclusive and (c) collective exhaustive. Fig. 3.1 is
an example of a partition.
A1 A3 An
A2 ...
101
INTRODUCTORY
PROBABILITY THEORY BASIC CONCEPTS IN PROBABILITY
Example 3.30
A coin is tossed three times. Partition the sample space according to the number of heads
in the outcome.
Solution
The sample space is
S = {T T T , T T H, T HT , T HH, HT H, HHT , HHH}
The partitions are
A1 = {HHH}
A2 = {HHT , HT H, T HH}
A3 = {T T H, T HT , HT T }
A4 = {T T T }
These subsets satisfy the definition of a partition, namely, no nonempty event, no overlapping
events and all the possible events add up to the sample space S .
CLICK HERE
to discover why both socially
and academically the University
of Groningen is one of the best
places for a student to be
www.rug.nl/feb/education
102
INTRODUCTORY
PROBABILITY THEORY BASIC CONCEPTS IN PROBABILITY
Two events A and B are said to be independent if the occurrence (or non-occurrence) of one
of them is not affected by the occurrences (or non-occurrence) of the other
Example 3.31
Toss two coins. The events “Head” on the first coin and “Tail” on the second coin are independent.
Two events A and B are said to be dependent if the occurrence (or non-occurrence) of one is
affected by the occurrence (or non-occurrence) of the other
Example 3.32
A box contains two red pens and three blue pens. Two pens are picked at random successively.
The events “blue pen” in the second picking and the “red pen picked in the first round”
are dependent.
That is, if A and B defined over the same sample space S are independent events, then
Property 2
If A, B and C are independent events, then
That is, if A, B and C are independent, then C will be independent of any event that can
be formed from A and B using set union, intersection and complementation.
103
INTRODUCTORY
PROBABILITY THEORY BASIC CONCEPTS IN PROBABILITY
The word “probability” is frequently used in everyday speech. We say, for instance, “It
will probably rain tomorrow” or “He is probably guilty of the offence” or “The train will
probably be late”. What does the word “probable” mean? Here are a few examples of the
definition from the point of view of philosophy.
The “probably” is something which lies mid-way between truth and error (Thomasius).
An assertion, of which the contrary is not completely self-contradictory or impossible is
called probable (Reimarus).
That which, if it were held as truth would be more than half certain, is called probable (Kent).
Some phenomena can give different results when repeated trials of an experiment are
performed. If a die is thrown in the air, the fact that it will come down, is deterministic
but the observation of say, a “6”, is uncertain (synonyms are random or nondeterministic).
Probability is concerned with the study of such non-deterministic experiments.
The origin of the theory of probability goes back to the middle of the seventeenth century
and is connected to the mathematician Fermat (1601–1665), B. Pascal (1623–1662), and
Huygens (1629–1695)10. Among later scholars who contributed significantly to the development
of the theory of probability included Jakob Bernoulli (1654–1705), Abraham De Moivre
(1667–1754), Thomas Bayes (1702–1761), P. Laplace (1749–1827), S.D. Poisson (1781–1840),
Karl Gauss (1856–1922), P.L. Chebyshev (1821–1894), A.A. Markov (1856–1922), A.M.
Lyapunov (1857–1918), A. Khinchin (1894–1959) and A.N. Kolmogorov (1903–1987).
Historically, probability originated from the study of games of chance and early applications
of the theory of probability were in such games. In the middle of the seventeenth century, a
French courtier, the Chavelier de Méré wanted to know how to adjust the stakes in gambling
so that in the long run, the advantage would be his. He presented his problem to Blaise
Pascal, his countryman. It was in the correspondence between Pascal and Pierre Fermat,
another French mathematician (a friend of Pascal’s father), that the theory of probability
has its beginning. Many of the probability calculations were, therefore, based on objects of
gambling: the coin, the die, and the cards. Even though the use of probability in gambling
today is just one of its minor applications, the use of such objects is a convention.
104
INTRODUCTORY
PROBABILITY THEORY BASIC CONCEPTS IN PROBABILITY
In the previous chapter, these objects have been used in some examples. In most probability
problems, unless otherwise specified, a coin and a die are considered to be fair and a deck
of cards are assumed to be well shuffled.
Historically, there are three main schools of thought in defining and interpreting the
probability of an event: the classical approach, the frequency approach, and the subjective
approach. The first two are referred to as objective approaches.
105
INTRODUCTORY
PROBABILITY THEORY BASIC CONCEPTS IN PROBABILITY
If there are a possible outcomes favourable to the occurrence of an event A , and b possible
outcomes unfavourable to the occurrence of A , and all possible outcomes are equally likely
and mutually exclusive, then the probability that event A will occur is
a Number of Favourable outcomes
P (A) = =
a+b Number of all Possible outcomes
or equivalently
If a statistical experiment or simply an experiment E can lead to n mutually exclusive and equally
likely simple outcomes, and if a of these outcomes have attribute A , then
a
P (A) =
n
To explain how the classical probability formula arises, suppose that S contains n points.
1
Then the probability of each point is . Suppose also that the event A contains a points.
n
Then, its probability is
1 a
a× =
n n
That is
That is
Number of outcomes comprising the event
P (event) = Total number of equally likely outcomes in the sample space
Example 3.33
In a well-shuffled deck of cards, a card is selected at random. What is the probability of
(a) a queen? (b) a 5? (c) a spade? (d) a picture card?
106
INTRODUCTORY
PROBABILITY THEORY BASIC CONCEPTS IN PROBABILITY
Solution
The sample space for selecting a well-shuffled deck of cards was first provided in Example 3.13.
There are 52 possible outcomes.
4
P (A) =
52
4
P (5) =
52
13
P (spade) =
52
12
P (picture cards) =
52
Example 3.34
A pair of dice is rolled once. Find the probability of rolling
(a) a sum of 7 (b) a sum of 7 or 11 (c) a double.
Solution
The sample space for rolling a pair of dice was first provided in Example 3.16. There are
36 possible outcomes.
(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)
Hence
6
P (sum of 7) =
36
107
INTRODUCTORY
PROBABILITY THEORY BASIC CONCEPTS IN PROBABILITY
(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1), (6, 5), (5, 6)
Hence
8
P (sum of 7 of 11) =
36
(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6).
Hence
6
P (double) =
36
American online
LIGS University
is currently enrolling in the
Interactive Online BBA, MBA, MSc,
DBA and PhD programs:
108
INTRODUCTORY
PROBABILITY THEORY BASIC CONCEPTS IN PROBABILITY
If there are n objects, m of which are of a type A . The probability of selecting r objects of
that type A is
m
r
P (A) =
n
r
Example 3.35
In a well-shuffled deck of playing cards, four cards are drawn at random. What is the
probability that they are all hearts.
Solution
The number of all possible outcomes is the number of ways we can select 4 cards from 52
cards, namely
52
4
The number of favourable outcomes is the number of ways we can select 4 hearts from a
total of 13 hearts, namely
13
4
Example 3.36
A box contains 10 marbles of which 6 are red and 4 are blue. Two marbles are chosen at
random. Find the probability that both are red.
109
INTRODUCTORY
PROBABILITY THEORY BASIC CONCEPTS IN PROBABILITY
Solution
There are 10 ways to choose 2 marbles from 10 marbles. 6 of the marbles are red
= 45
2
so there are 6 ways to choose red marbles. Hence
= 15
2
6
number of ways 2 red marbles can be chosen 2
P (A) = =
number of ways two marbles can be chosen 10
2
15
=
45
The classical probability is objective in the sense that it is based on deductions from a set
of assumptions. It is very useful not only in games of chance but also in a great variety of
situations where gambling devices are used to make random selection.
b) Limited Applicability
The classical approach is all very well when dealing with the games-of-chance type of
problems, but its field of application is limited.
i) Even in the theory of games, although the outcome is unpredictable, some results
are not equally likely. For instance, if we are tossing a coin which is weighted
so that a head (H) is three times as likely to appear as a tail (T ), then a head
is much more a likely outcome of a throw than a tail.
ii) Furthermore, there are many more situations in real-world where the possibilities
that arise cannot be considered as equally likely. For example, the probability
that a ninety-year-old will live the next twenty years is not as equally likely as
a twenty-year-old person surviving the next twenty years.
110
INTRODUCTORY
PROBABILITY THEORY BASIC CONCEPTS IN PROBABILITY
c) Restriction of Definition
The classical approach to probability can only be applied to situations where
In more complex situations, such as the ones we often meet in the real-world, we often
cannot assign probabilities a priori and the classical approach cannot be used. Again, the
real-world has not such thing as a perfectly balanced coin, perfect die and so on, and
therefore, the assumption of perfection will cause slightly wrong probability assumptions.
The relative frequency approach overcomes the disadvantages of the classical approach by
using the relative frequencies of past occurrences as probabilities. This approach claims that
the only valid procedure for determining event probabilities is through repetitive experiments.
111
INTRODUCTORY
PROBABILITY THEORY BASIC CONCEPTS IN PROBABILITY
Suppose that an experiment is repeated a large number of times under the same conditions
with each of the n trials being independent of one another. If an event A is observed to occur
in n(A) of the trials, the probability of the event A is
n(A) m
R(A) = =
n n
where m = n(A) is the number of times event A occurs
Sometimes, if we want to explicitly indicate that the event A depends on the number of
trials, we write it as Rn (A). The different probabilities for different number of trials stabilize
or approach a limit as the number of trials or experiments increases. A mathematician may
write this as
n(A)
lim Rn = lim = P (A)
n→∞ n→∞ n
where “n→∞
lim �� means “when the number of trials increases without bound”.
This type of “limit” is called a probability limit and P (A) is a unique number to which
the outcome ultimately settles. That is, when n is very large, this ratio may be taken as an
approximation to the “true” probability function that underlies the experiment in the sense
that Rn (A) will become close to P (A). This assertion is not about mathematical limits but
about empirical phenomena. Of course, there is no way actually to repeat an experiment
an infinite number of times to evaluate the limit, but the intuitive idea is that such a limit
must exist.
It is obvious that with the relative frequency approach, there is just no way of estimating
the probability without empirically drawing a sample or performing an experiment and
it is for this reason that this approach has also been called the empirical approach. It is
also sometimes called a posteriori approach because probability values are determined only
after events are observed. We may, therefore, describe it as “being wise after the event”.
The relative-frequency definition is also objective because the probability of an event is
determined by repeated empirical observations.
Note
n(A)
We must keep in mind that strictly speaking is only an estimate of P (A).
n
112
INTRODUCTORY
PROBABILITY THEORY BASIC CONCEPTS IN PROBABILITY
0 ≤ Rn (A) ≤ 1
where
Rn (A) = 0, if A occurs in none of the n trials;
Rn (A) = 1 if A occurs in all the n trials.
This property is intuitively clear. If n(A) is the number of trials in which A occurs out of
n trials, then
0 ≤ n(A) ≤ n
and we have
0 n(A) n
0= ≤ Rn (A) = ≤ =1
n n n
Property 2
a) If A and B are any two events, then
and hence
Example 3.37
A coin is tossed 1,000 times and tail comes up 508 times. Estimates the probability of
the tail.
113
INTRODUCTORY
PROBABILITY THEORY BASIC CONCEPTS IN PROBABILITY
Solution
n = 1, 000 n(A) = 5508
Hence
n(A) 508
P (Tail) = = = 0.508
n 1, 000
508
The ratio is rather an estimate but we would be very reluctant to state that the
1000
probability of getting a tail on that coin is 0.508, because if we had stopped at some other
number of trials the ratio would have been different. What we could say is that if we toss
the coin many times it will come up tail 50.8% of the times. Intuitively, we feel that as
the number of trials increases, the relative frequency will settle down to some stable value,
greater than zero and less than unity.
Maastricht
University is
the best specialist
university in the
Visit us and find out why we are the best! Netherlands
(Elsevier)
Master’s Open Day: 22 February 2014
www.mastersopenday.nl
114
INTRODUCTORY
PROBABILITY THEORY BASIC CONCEPTS IN PROBABILITY
c) The expression “large number” of trials in the definition is vague and this creates
even more practical problems. In an attempt to reach this number, we may face
the problem of cost, in terms of money and time. This may discourage the good-
intended experimenter to use an “insufficient” number of trials.
d) The concepts of the relative frequency approach in defining probability is somewhat
intuitive which is a plus for understanding but it is a minus for a rigorous development.
In spite of these limitations, the frequency approach provides a link between the outside
world and the mathematical theory of probability we study. It enables us to construct
probability models of natural phenomena and study them. If past is any guide to the
future and if by and large nature is stable, the method of assigning probabilities to real-life
situations should be satisfactory.
Subjective probability is the degree of rational belief by an individual that the event will occur,
based on all evidence available to that individual.
Equivalently,
The subjective probability of A is a measure of confidence that a reasonable person assigns to
the event A
115
INTRODUCTORY
PROBABILITY THEORY BASIC CONCEPTS IN PROBABILITY
As an example, we might say that we are “almost sure” that it will rain now; in which case
“almost sure” is a non-quantitative measure of subjective probability. We might wish to
quantify the subjective probability and say, perhaps, “I am 90 percent certain that it will
rain now”, but the 90 percent is still a degree of belief or subjective probability.
a) where there is little or no direct evidence (either the events have not yet occurred,
or have occurred only once) so that there is no choice but to consider collateral
(indirect) evidence, educated guesses, and perhaps intuition and other subjective factors;
b) in situations that do not require
i) an experiment with a large number of trials, or
ii) the assumption of statistical regularity.
c) in situations where from experience, there has been massive evidence in favour
of the event; for example, a person who has seen the sun appearing in the sky
everyday will conclude subjectively that the probability that the sun will appear
everyday is one.
This approach to probability, also known as Bayesian approach, has been developed
relatively recently and is related to Bayesian decision analysis. Although the subjective view
of probability has enjoyed increased attention over the years, it has not been fully accepted
by statisticians who have traditional orientations.
The three distinct approaches to probabilities, namely, the classical, relative frequency and
subjective approaches, raise interesting philosophical problems. We know that P (A) = 0 can
never be less than “zero” nor exceed “one” but do P (A) = 0 imply impossibility and P (A) = 1
absolute certainty? Let us now look at probability values as interpreted in these approaches.
116
INTRODUCTORY
PROBABILITY THEORY BASIC CONCEPTS IN PROBABILITY
P (A) = 0
To the classical probabilist, this means that event A cannot occur. However, to the frequency
probabilist (frequentist), event A has never occurred. Of course this does not mean it cannot
occur in the future. The subjective probabilist (subjectivist) will say I think that the event
A will not occur and which also does not mean that it cannot occur.
P (A) = 1
To the classical probabilist, it means that event A must always occur. To the frequentist,
it means that event A has always occurred, which does not imply that it must occur in
the future. The subjectivist will say that I think that event A will occur and this also does
not imply it must occur.
117
INTRODUCTORY
PROBABILITY THEORY BASIC CONCEPTS IN PROBABILITY
Without doubt, the empirical and subjective approaches are more interesting and more
useful than a priori approach. However, in an introductory text such as this, it is preferable
to concentrate attention on classical probability, and unless we state to the contrary, an a
priori approach is used.
Kolmogorov gave a simple set of three axioms13 or postulates which a probability function
is assumed to obey. This approach is what is considered the modern approach or axiomatic
definition of probability.
Axiom 1 (Non-negativity)
For every event A
0 ≤ P (A) ≤ 1
Axiom 2 (Normed)
P (S) = 1
P (A ∪ B) = P (A) + P (B)
Note
Recall that [0, 1] denotes {x ∈ R|0 ≤ x ≤ 1}
Axiom 1 states that the probability of an event A exists and can never be less than zero
(which corresponds to an “impossibility” of an event) nor greater than one (which corresponds
to a “certainty” of an event).
118
INTRODUCTORY
PROBABILITY THEORY BASIC CONCEPTS IN PROBABILITY
Axiom 2 states that the events Ai s comprising the sample space must be exhaustive: it
must be a certainty that at least one of the mutually exclusive events in the sample space
will take place in each trial of the experiment (and when all are taken together their total
probability is 1).
Axiom 3 implies that if the probability can be assigned to each of the sample points in S ,
then we can obtain the probability of any event defined on S merely by summing the separate
probabilities of all the sample points that are members of the event set under consideration.
These three axioms satisfy our intuitive notions of what we mean by a probability function
or probability measure and seem motivated by the properties of the relative frequency of an
event. The axiomatic approach launched probability as a separately identifiable subfield of
mathematics and provided a mathematical foundation to the classical theory of probability,
hence this approach is sometimes called the mathematical approach to defining probability.
Example 3.38
Suppose in a certain experiment, the sample space S consists of 3 elements: S = {e1 , e2 , e3 }.
Which of the following functions defines a probability function of S ?
Solution
a) The function does define a probability function on S since each value is non-negative and the sum of the values is one; that is, 1/5 + 1/2 + 3/10 = 1.
b) The function does not define a probability function on S since the sum of the values on the points is greater than 1; 1/2 + 1/3 + 1/4 = 13/12 > 1.
c) The function does not define a probability function on S , since P (e2 ) = −1/2, a negative number.
d) The function does not define a probability function on S since the sum of the values is less than 1; 1/4 + 1/3 = 7/12 < 1.
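On a finite sample space the two checks used in Example 3.38 (non-negativity of every value and a total of exactly one) can be automated. A minimal Python sketch with exact fractions; the text does not reproduce the full assignments for candidates (c) and (d), so the unspecified values below are filled in only for illustration:

```python
from fractions import Fraction

def is_probability_function(p):
    """Axioms 1 and 2 on a finite sample space: every value lies in [0, 1]
    and the values sum to exactly 1."""
    return all(0 <= v <= 1 for v in p.values()) and sum(p.values()) == 1

F = Fraction
# Candidate assignments from Example 3.38; entries not listed in the text
# (the remaining values of (c), a third value for (d)) are hypothetical.
candidates = {
    "a": {"e1": F(1, 5), "e2": F(1, 2), "e3": F(3, 10)},   # sums to 1
    "b": {"e1": F(1, 2), "e2": F(1, 3), "e3": F(1, 4)},    # sums to 13/12 > 1
    "c": {"e1": F(1, 2), "e2": F(-1, 2), "e3": F(1, 2)},   # P(e2) < 0
    "d": {"e1": F(1, 4), "e2": F(1, 3)},                   # sums to 7/12 < 1
}
for name, p in candidates.items():
    print(name, is_probability_function(p))
```

Only assignment (a) passes both checks, matching the solution above.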
This chapter has laid a good foundation upon which we can understand the basic theory
of probability which is developed systematically in the next chapter. It is advisable that
the reader goes through most of the exercises to ensure that the materials presented in this
chapter are well assimilated.
119
INTRODUCTORY
PROBABILITY THEORY BASIC CONCEPTS IN PROBABILITY
EXERCISES
3.1 A fair die is thrown twice. List the elements of the following events:
3.3 Let A, B and C be three events. For each of the following statements, write in set
notation the event specified by the statement:
a) Only A occurs:
b) Exactly one of A, B, C occurs
c) At least one of the events occurs;
d) Both A and B but not C occur;
e) Exactly two of the events occur;
f ) At least two of the events occur;
g) All three occur;
h) Either A or B occurs, but not C ;
i) C occurs and either A or B , but not both
3.4 Refer to Exercise 3.3. Draw a Venn diagram and on it shade the area corresponding
to the event specified by the statement in set notation:
3.5 Two dice are rolled. Let
E = the event that the sum of the numbers that appear on the faces of the dice is odd;
F = at least one of the dice shows 1; and
G = the sum is 5.
Describe the events E ∪ F, E ∩ F, F ∩ G, E ∩ Fᶜ, and E ∩ F ∪ G.
3.6 A box contains 3 marbles, 1 red, 1 green and 1 blue. Consider an experiment that
consists of taking 1 marble from the box, then replacing it in the box and drawing
a second marble from the box. Describe the sample space.
3.7 Repeat the experiment in Exercise 3.6 but this time the second marble is drawn
without first replacing the first marble.
3.8 A box contains 10 pairs of shoes. If 8 shoes are randomly selected, what is the
probability that there will be
a) no complete pair;
b) exactly one complete pair.
3.9 If 4 married couples are arranged in a row, find the probability that no husband sits
next to his wife.
3.10 If 4 married couples are arranged in a circle, find the probability that
3.11 Eight married couples enter a supermarket. If the owner of the supermarket selects 2
persons at random to give them free lottery coupons find the probability that
3.12 Refer to Exercise 3.11. If 6 persons are chosen at random, find the probability that
3.13 Refer to Exercise 3.11. If 16 persons are divided into 8 pairs, find the probability that
3.14 Refer to Exercise 2.6. Find the probability that books on the same subject are together.
3.15 A bus starts with 6 people and stops at 10 different stops. Assuming that passengers
are equally likely to depart at any stop, find the probability that,
3.16 At Kwame’s birthday party, there were 22 people present. Find the probability that
3.17 A table consists of four-digit random numbers. What is the probability that four consecutive random digits are all different?
3.18 Two balls are drawn with replacement from a box containing 3 white and 2 black
balls. Calculate the probability that
3.19 A box contains 40 good and 10 defective fuses. If 10 fuses are selected, what is the probability that they will all be good?
3.22 A coin is weighted so that a head H is three times as likely to appear as a tail T .
Find P (T ) and P (H).
3.23 Three athletes, Yaa, Ama, and Afua run a 100 metre race. Yaa is twice as likely to
win as Ama and Ama is thrice as likely to win as Afua.
3.24 Three men, Fosu, Yeboah, and Abu and two women, Attaa and Adjoa are in a spelling
competition. Those of the same sex have equal probabilities of winning, but each
woman is twice as likely to win as any man.
3.25 If n distinguishable balls are distributed at random into r boxes, what is the probability that box 1 has exactly j balls, 0 ≤ j ≤ n?
3.26 A box has b black balls and r red balls. Their colours are distinguishable but balls
of the same colours are not distinguishable. Balls are drawn from the box one at a
time without replacement. Find the probability that the first black ball selected is
drawn at the nth trial.
3.27 Repeat Exercise 3.26 for the case when each ball drawn is replaced before the next draw.
3.28 Suppose r objects are drawn from a set of n objects without replacement. Find the
probability that k given objects are selected.
3.29 Suppose n objects are permuted at random among themselves. Find the probability
that k specified objects occupy k specified positions.
3.30 Two dice are used, each loaded so that the probabilities of throwing 1, 2, 3, 4, 5, and 6 are
(1 − x)/6, (1 + 2x)/6, (1 − x)/6, (1 + x)/6, (1 − 2x)/6, (1 + x)/6
respectively. Compute the probability that, in one rolling of two such dice, we obtain a total of (a) a seven (b) a six.
3.31 Refer to Exercise 2.11. What is the probability that the two guests who are not on talking terms will sit next to each other?
3.32 Refer to Exercise 2.16 (d). Find the probability that all T’s are together.
125
INTRODUCTORY
PROBABILITY THEORY BASIC PROBABILITY LAWS AND THEOREMS
Theorem 4.1
P (∅) = 0
Proof
Let S be the sample space. Then
S = S ∪ ∅ (from the Identity Law in Table 1.1)
P (S) = P (S ∪ ∅)
= P (S) + P (∅) (from Axiom 3 of Definition 3.31)
since S and ∅ are disjoint. Subtracting P (S) from both sides, we get
P (∅) = 0
This theorem states that if the event set is the null set, then the event is an “impossibility”
and the probability of occurrence is zero. This is obvious. The null set contains no sample
point; hence no weight can be assigned to it.
Example 4.1
A fair die is tossed once. What is the probability of obtaining a 7?
Solution
The sample space for this experiment is
S = {1, 2, 3, 4, 5, 6}
The event A of obtaining a 7 contains no sample point, that is, A = ∅. Hence
P (A) = P (∅) = 0
Theorem 4.2
If A and B are events defined over the same sample space S and if they overlap (A ∩ B ≠ ∅),
then the probability that either A or B (or both) will occur is the sum of their separate probabilities
less the probability of their joint occurrence:
P (A ∪ B) = P (A) + P (B) − P (A ∩ B)
Proof
From Section 1.5, A ∪ B can be decomposed into the mutually exclusive events A\B and B :
A ∪ B = (A\B) ∪ B
so that
P (A ∪ B) = P (A\B) + P (B)
Also,
A = (A\B) ∪ (A ∩ B)
Therefore,
P (A) = P (A\B) + P (A ∩ B)
so that
P (A\B) = P (A) − P (A ∩ B)
Hence,
P (A ∪ B) = P (A) + P (B) − P (A ∩ B)
Aliter
Let us represent A ∪ B in the diagram presented in Figure 4.1.
[Figure 4.1: Venn diagrams showing A ∪ B partitioned into the disjoint regions A ∩ Bᶜ, A ∩ B and Aᶜ ∩ B]
From the diagram,
A = (A ∩ Bᶜ) ∪ (A ∩ B)
so that14
P (A) = P (A ∩ Bᶜ) + P (A ∩ B)   (i)
Similarly,
P (B) = P (Aᶜ ∩ B) + P (A ∩ B)   (ii)
Adding (i) and (ii),
P (A) + P (B) = P (A ∩ B) + P (A ∩ Bᶜ) + P (A ∩ B) + P (Aᶜ ∩ B)   (iii)
Also,
A ∪ B = (A ∩ Bᶜ) ∪ (A ∩ B) ∪ (Aᶜ ∩ B)
so that
P (A ∪ B) = P (A ∩ Bᶜ) + P (A ∩ B) + P (Aᶜ ∩ B)   (iv)
The expression on the right-hand side of Eq. (iv) is the same as the last three terms of the expression on the right-hand side of Eq. (iii). Hence,
P (A) + P (B) = P (A ∩ B) + P (A ∪ B)
so that
P (A ∪ B) = P (A) + P (B) − P (A ∩ B)
Example 4.2
The student union of the Regent University of Science and Technology decided to elect
a representative from five of its members to represent it on an Exchange Programme
Committee. Profiles of the five students were as follows: a male Accounting student, a male
Theology student, a female Psychology student, a female Computer Science student, and a
male Engineering student.
The Union decided to elect the representative by drawing a name from a hat. What is the probability that the representative will be either a female or from the Faculty of Science and Engineering?
Solution
Let the event A represent a female student and event B represent a student from the Faculty
of Science and Engineering. Then
Example 4.3
A school offers its 200 students three science subjects: Mathematics (M), Physics (P) and Chemistry (C). The following are data on the number of students who are offered the various subjects:
A student is selected at random from this school. What is the probability that he is studying at least one of the three science subjects?
Solution
P (M ∪ P ∪ C) = 60/200 + 40/200 + 30/200 − 10/200 − 5/200 − 3/200 + 1/200 = 113/200
Corollary 4.2
For any events A1 , A2 , ..., An defined over the same sample space, S ,
P (A1 ∪ A2 ∪ · · · ∪ An ) = Σᵢ P (Ai ) − Σᵢ<ⱼ P (Ai ∩ Aj ) + Σᵢ<ⱼ<ₖ P (Ai ∩ Aj ∩ Ak ) − · · · + (−1)ⁿ⁻¹ P (A1 ∩ A2 ∩ · · · ∩ An )
That is, the sum of the second term on the right-hand side is over all distinct pairs of sets, that of the third over all distinct triples of sets, and so forth.
For three events A, B and C, for instance, the result can be obtained by applying Theorem 4.2 twice: write A ∪ B ∪ C as A ∪ Z and expand
P (A ∪ Z)
where Z = B ∪ C.
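Corollary 4.2 can be verified mechanically when events are given as sets of equally likely sample points. A small Python sketch; the die events A and B below are illustrative, not taken from the text:

```python
from fractions import Fraction
from itertools import combinations

def prob_union(events, sample_size):
    """Inclusion-exclusion (in the spirit of Corollary 4.2): alternately add and
    subtract the probabilities of all r-fold intersections, r = 1, ..., n."""
    total = Fraction(0)
    for r in range(1, len(events) + 1):
        for combo in combinations(events, r):
            inter = set.intersection(*combo)
            total += (-1) ** (r - 1) * Fraction(len(inter), sample_size)
    return total

# Toy check with one roll of a die: A = even number, B = number greater than 3.
A = {2, 4, 6}
B = {4, 5, 6}
print(prob_union([A, B], 6))   # 2/3, agreeing with the direct count |A ∪ B|/6 = 4/6
```

For pairwise disjoint events every intersection term vanishes and the formula collapses to the special law of addition.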
Example 4.4
St. Augustine College is presenting six athletes for the forthcoming Inter-School Games
and each of the athletes has been promised a laptop computer by the school's authority if at
least one of them wins a Gold medal. Find the probability that each athlete will be given
a laptop computer after the games if
a) each of the athletes has a probability of winning of 2/5;
b) the probability that the first athlete would win is 1/4, the second is 2/5, the third is 1/5, the fourth is 2/7, the fifth is 1/6, and the sixth is 4/9.
Solution
Let A1 , A2 , · · · , A6 be the events that the 1st , 2nd , · · · , 6th athlete, respectively, wins a Gold medal. We are required to calculate P (A1 ∪ A2 ∪ · · · ∪ A6 ).
a) P (A1 ) = P (A2 ) = · · · = P (A6 ) = 2/5
(there are C(6, 1) = 6 in all);
P (A1 ∩ A2 ) = P (A1 ∩ A3 ) = · · · = P (A5 ∩ A6 ) = P (A1 )P (A2 ) = (2/5)²
(there are C(6, 2) = 15 pairs in all); and similarly for the triples, quadruples, quintuples and the single sextuple.
Hence
P (A1 ∪ A2 ∪ · · · ∪ A6 )
= C(6, 1)P (A1 ) − C(6, 2)P (A1 ∩ A2 ) + C(6, 3)P (A1 ∩ A2 ∩ A3 )
− C(6, 4)P (A1 ∩ A2 ∩ A3 ∩ A4 ) + C(6, 5)P (A1 ∩ A2 ∩ A3 ∩ A4 ∩ A5 )
− C(6, 6)P (A1 ∩ A2 ∩ A3 ∩ A4 ∩ A5 ∩ A6 )
= C(6, 1)(2/5) − C(6, 2)(2/5)² + C(6, 3)(2/5)³ − C(6, 4)(2/5)⁴ + C(6, 5)(2/5)⁵ − C(6, 6)(2/5)⁶
= 12/5 − 60/25 + 160/125 − 240/625 + 192/3125 − 64/15625 = 0.953344
b) P (A1 ) = 1/4, P (A2 ) = 2/5, P (A3 ) = 1/5, P (A4 ) = 2/7, P (A5 ) = 1/6, P (A6 ) = 4/9   (i)
Since the events Ai , (i = 1, 2, · · · , 6) are not mutually exclusive we use Corollary 4.2:
P (A1 ∪ A2 ∪ · · · ∪ A6 ) = P (A1 ) + P (A2 ) + · · · + P (A6 ) − P (A1 ∩ A2 )
− P (A1 ∩ A3 ) − · · · − P (A5 ∩ A6 )
+ P (A1 ∩ A2 ∩ A3 ) + P (A1 ∩ A2 ∩ A4 )
+ · · · + P (A4 ∩ A5 ∩ A6 ) − · · ·
− P (A1 ∩ A2 ∩ A3 ∩ A4 ∩ A5 ∩ A6 )   (ii)
Substituting the probabilities from (i) into (ii), the reader should verify that the result is
P (A1 ∪ A2 ∪ · · · ∪ A6 ) = 37/42
We realise that this approach is indeed cumbersome. A shorter approach will be shown in
Section 4.4 when we discuss independence.
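The quoted value 37/42 can be checked in a few lines of Python. Both the cumbersome Corollary 4.2 route and the shorter complement route (which anticipates Section 4.4 and assumes the athletes win independently, so that every joint probability is the product of individual ones) give the same answer:

```python
from fractions import Fraction
from itertools import combinations

p = [Fraction(1, 4), Fraction(2, 5), Fraction(1, 5),
     Fraction(2, 7), Fraction(1, 6), Fraction(4, 9)]

# Corollary 4.2 route: alternating sum over all non-empty subsets of athletes.
total = Fraction(0)
for r in range(1, 7):
    for subset in combinations(p, r):
        prod = Fraction(1)
        for x in subset:
            prod *= x
        total += (-1) ** (r - 1) * prod
print(total)            # 37/42

# Shorter route: nobody wins with probability (1 - p1)...(1 - p6),
# so at least one athlete wins with probability 1 minus that product.
none_win = Fraction(1)
for pi in p:
    none_win *= 1 - pi
print(1 - none_win)     # also 37/42
```

Exact fractions avoid the rounding that would creep in with floating-point arithmetic over so many terms.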
Theorem 4.2, together with its corollaries, is called the general law (rule) of addition,
because it can be applied to any events.
If the events A and B are mutually exclusive, the last term in Theorem 4.2, P (A ∩ B), equals 0, and Theorem 4.2 and Axiom 3 in Definition 3.31 coincide. Thus, the formula in Theorem 4.2 is true whether or not A and B are disjoint (mutually exclusive).
Theorem 4.3
If two events, A and B , are mutually exclusive, then the probability that either of the events will occur is the sum of their individual probabilities:
P (A ∪ B) = P (A) + P (B)
Corollary 4.3
For a set of mutually exclusive events, Ai , i = 1, 2, · · · , n, the probability of occurrence of any of the Ai is the sum of their individual probabilities:
P (A1 ∪ A2 ∪ · · · ∪ An ) = P (A1 ) + P (A2 ) + · · · + P (An )
or simply
P ( ⋃ᵢ₌₁ⁿ Ai ) = Σᵢ₌₁ⁿ P (Ai )
Theorem 4.3, together with its corollary (Corollary 4.3), is sometimes called the special
law (rule) of addition, since it is concerned with the special case when the events are
mutually exclusive.
Example 4.5
A box contains 40 identical balls of which 10 are red, 25 are black and 5 are white. A ball is selected at random from the box. What is the probability that it is red or black?
Solution
Let event R be selecting a red ball, B selecting a black ball, and W selecting a white ball. Now
n(R) = 10; n(B) = 25; n(W) = 5; n = 40.
P (R) = 10/40; P (B) = 25/40; P (W) = 5/40
Hence the probability of selecting a red or a black ball is
P (R ∪ B) = P (R) + P (B)   (since R ∩ B = ∅)
= 10/40 + 25/40
= 35/40 = 7/8
Understanding probabilities involving "at least", "at most" and "or" for mutually exclusive events is very important in probability. We shall see from the example that follows that they all use the idea of the special law of addition.
Example 4.6
A die is rolled once. What is the probability that the outcome will be:
a) a 3;
b) less than 3;
c) at least 3;
d) more than 3;
e) at most 3;
f ) either 4 or 6.
Solution
Let x represent the outcome on the die. The probability of each of the outcomes is
P (x) = 1/6
Then:
a) P (x = 3) = 1/6
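All six parts of Example 4.6 reduce to counting the favourable outcomes and adding 1/6 per outcome, as the special law of addition prescribes. A Python sketch of the whole example (the remaining parts follow the same pattern as part a):

```python
from fractions import Fraction

S = range(1, 7)   # sample space of one roll of a die

def prob(event):
    """Special addition law on an equiprobable space: 1/6 per favourable outcome."""
    return Fraction(sum(1 for x in S if event(x)), 6)

print(prob(lambda x: x == 3))        # a) 1/6
print(prob(lambda x: x < 3))         # b) less than 3: 1/3
print(prob(lambda x: x >= 3))        # c) at least 3: 2/3
print(prob(lambda x: x > 3))         # d) more than 3: 1/2
print(prob(lambda x: x <= 3))        # e) at most 3: 1/2
print(prob(lambda x: x in (4, 6)))   # f) either 4 or 6: 1/3
```

Note how "at least 3" and "more than 3" differ by exactly the single outcome x = 3.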
Theorem 4.4
If A ⊆ B , then
P (A ∪ B) = P (B)
Proof
If A ⊆ B , then
A ∩ B = A and A ∪ B = (A ∩ B) ∪ B = B
so that
P (A ∪ B) = P (B)
Theorem 4.5
Let Aᶜ be the complement of A with respect to the same sample space S ; then
P (Aᶜ) = 1 − P (A)
Proof
The sample space S can be decomposed into the mutually exclusive events A and Aᶜ , that is,
S = A ∪ Aᶜ
Hence,
P (S) = P (A ∪ Aᶜ) = 1
P (A ∪ Aᶜ) = P (A) + P (Aᶜ)
Hence
P (A) + P (Aᶜ) = 1
giving
P (Aᶜ) = 1 − P (A)
Example 4.7
Suppose a fair die is rolled twice. What is the probability of not getting a sum of five?
Solution
Let A be "the event of getting a sum of five". Then "the event of not getting a sum of five" is Aᶜ , the complement of A .
The sample space of this experiment has 36 equally likely outcomes (refer to Example 3.17). Among them are 4 points that correspond to the event "getting a sum of five", namely,
(1, 4), (2, 3), (3, 2), (4, 1)
where the first number represents the number shown on the first die and the second number the one shown on the second die.
Hence
P (A) = 4/36
so that, by Theorem 4.5,
P (Aᶜ) = 1 − P (A) = 1 − 4/36 = 32/36 = 8/9
Theorem 4.6
If A ⊆ B , then
P (A) ≤ P (B)
Proof
We may decompose B into two mutually exclusive events as follows:
B = A ∪ (B ∩ Aᶜ)
Hence
P (B) = P (A) + P (B ∩ Aᶜ)
≥ P (A) [since P (B ∩ Aᶜ) ≥ 0]
Note
Theorem 4.6 is intuitively appealing, for it says that if B must occur whenever A occurs, then B is at least as probable as A .
Boole’s Inequality
Theorem 4.7
The probability of the occurrence of at least one of the two events A and B never exceeds the
sum of the probabilities of these events:
P (A ∪ B) ≤ P (A) + P (B)
Proof
Since P (A ∩ B) ≥ 0, the result follows from Theorem 4.2.
This inequality can easily be extended to any number of events, as the following
corollary shows.
Corollary 4.4
The probability of the occurrence of at least one of several events never exceeds the sum of
the probabilities of these events:
P ( ⋃ᵢ₌₁ⁿ Ai ) ≤ Σᵢ₌₁ⁿ P (Ai )
The equality sign in Corollary 4.4 holds only in the case when each pair of the given events
is mutually exclusive.
We shall now consider some additional rules of probability, introducing first what is called
conditional probability.
So far, in defining the probability of an event A we have assumed that the outcome
corresponds to some point in the given sample space S . Thus, when we use the symbol
P (A) for the probability of A , we really mean the probability of A given some sample space
S . We use the notation P (A|S) to make clear that we are referring to a particular sample
space S and read it as "the conditional probability of A relative to S ". Of course, we usually use the conventional notation P (A) whenever the choice of S is clearly understood.
Suppose we have some additional information that the outcome of a trial is contained in
a subset B of the sample space S , with P (B) ≠ 0 . The knowledge of the occurrence of the
event B in effect reduces the original sample space S to one of its subsets and this may
change the probability of the occurrence of the event A . The resulting probability is what
is known as conditional probability.
Example 4.8
A class consists of 15 Science students and 25 Arts students. A student has broken a chair.
a) Find the probability that Kofi, who is one of the students in the class, broke the chair.
b) If it is known that the student who broke the chair is a Science student, and Kofi
is a Science student, what is the probability that it was Kofi?
Solution
a) n(S) = 15 + 25 = 40.
1
P (Kofi broke the chair) =
40
since each of the 40 students has an equal chance of breaking the chair.
b) Given the extra information that the student who broke the chair is a Science
student reduces the sample space to only the science students so that n (Science
students) = 15. Therefore,
1
P (Kofi broke the chair) =
15
This is usually written as: P (Kofi broke the chair | a Science student has broken the chair)
and described as “the probability that Kofi broke the chair given that a Science student has
broken the chair”. This is an example of a conditional probability.
Definition 4.2 (Conditional Probability)
Let A and B be two events in the same sample space S with P (B) > 0 . The probability assigned to A , or that would be assigned to A when it is known that B has already occurred, is called the conditional probability of A given B and is denoted by P (A|B) .
Note
a) P (A|B) is undefined if P (B) = 0.
b) Instead of using the longer statement given in the formal definition (Definition
4.2), we generally call P (A|B) “the probability of A given B ”.
c) Order is of no significance in the intersection set, since A ∩ B = B ∩ A. This property
of intersection yields the following results:
P (A ∩ B) = P (B ∩ A)
A1 ∩ A2 ∩ · · · ∩ An
may be represented by
A 1 A 2 · · · An
and
P (A1 ∩ A2 ∩ · · · ∩ An )
is represented by
P (A1 A2 · · · An )
The conditional probability satisfies all the axioms required for it to be a probability function
(see Definition 3.31).
Let us consider the two extreme cases, namely, when the probability of an event is 0 or 1.
a) If A and B are disjoint events (A ∩ B = ∅), then
P (A|B) = P (A ∩ B)/P (B) = 0
This is intuitively clear. If A and B are disjoint events, then whenever B occurs A cannot occur and so P (A|B) = 0
b) If B ⊆ A , then
P (A|B) = P (A ∩ B)/P (B) = P (B)/P (B) = 1
c) If A ⊆ B , then
P (A|B) = P (A ∩ B)/P (B) = P (A)/P (B)
Example 4.9
A fair die is thrown once. Find the probability that the number shown is greater than 2, given that it is even.
Solution
The sample space is
S = {1, 2, 3, 4, 5, 6}
Let A be the event "the number shown is greater than 2" and B the event "the number shown is even". Then
A = {3, 4, 5, 6}
Hence,
P (A) = 4/6
B = {2, 4, 6}, so that
P (B) = 3/6
A ∩ B = {4, 6}
P (A ∩ B) = 2/6
Hence,
P (A|B) = P (A ∩ B)/P (B) = (2/6)/(3/6) = 2/3
Both probability and area can be thought of as measures of size. As can be seen in Fig. 4.2,
P (A|B), in a certain sense, measures the probability of A relative to the reduced sample
space B . It measures the size of P (A ∩ B) in comparison to P (B).
[Fig. 4.2: Venn diagram of events A and B within the sample space]
Geometrically, the analogous idea would be to compare the area of A ∩ B to the area of B .
In both the probability setting and the geometric figure we are in the same sense measuring
what fraction of B happens to lie also in the set A . Hence, we can view B as our new
sample space, that is, knowing the outcomes in B reduces our sample space from S to B .
Theorem 4.8
If the sample points of a finite sample space S are equally likely, then
P (A|B) = n(A ∩ B)/n(B)
Proof
From the relative frequency definition of probability (Definition 3.29),
P (A ∩ B) = n(A ∩ B)/n(S)
P (B) = n(B)/n(S)
so that
P (A|B) = P (A ∩ B)/P (B)
= [n(A ∩ B)/n(S)] / [n(B)/n(S)]
= n(A ∩ B)/n(B)
The intuitive idea behind this theorem using equiprobable measure is that, if we want to
find the probability of an event A and we have no additional information, we must find the
size of the sample space S , the size of A and divide the latter by the former quantity. But
suppose we have more information, namely, that another event, B , has definitely occurred.
Then we no longer need to count all elements in S , but only those in B . Also, we do
not need to count every element in A , but only those in A ∩ B . Hence, with additional
information, we can obtain n(A ∩ B)/n(B) .
This is the conditional probability of A given B ; that is, P (A|B) is the measure of A ∩ B
divided by the measure of B .
The conditional probability plays a major role in probability. It has given birth to rules in
probability theory which are of great theoretical and practical importance. These are the
multiplication rule, the total probability law and the Bayes’ theorem.
Theorem 4.9
If A and B are two events in the same sample space S , then the probability of the joint occurrence of A and B is given by
P (A ∩ B) = P (A)P (B|A)
= P (B)P (A|B)
That is, the multiplication rule states that the probability of the simultaneous occurrence
of two events equals the product of the probability of the first event and the conditional
probability of the second event given that the first event has already occurred.
Proof
The first relation is obtained from the conditional probability formula in Definition 4.2,
by cross-multiplying. The second relation follows from the first by interchanging the letters
A and B and using the fact that P (A∩B) = P (B ∩ A)
Aliter
Refer to Fig 4.2. Let S have a finite number of points, say n(S) = n . We define n(A|B) as the number of elements x in A such that x also belongs to the reduced sample space B . By this definition, n(A|B) = n(A ∩ B) . Let n(B) = m and n(A ∩ B) = k , so that
P (B) = m/n
The conditional probability that the event A will happen given that B has already happened is
P (A|B) = k/m
P (A ∩ B) = k/n
= (m/n)(k/m)
= P (B)P (A|B)
Example 4.10
A box contains 10 balls, of which 6 are red and 4 are blue. If 2 balls are randomly selected
from the box without replacement, what is the probability that both are red?
Solution
We may think of the balls as being drawn one at a time. (This just means that we are
going to label them “first” and “second”. It does not really matter at all whether the balls
are drawn one at a time or all together.)
Let A be the event that the first ball drawn is ‘red’, and
B , the event that the second is ‘red’.
P (A) = 6/10 = 3/5
After the first ball has been drawn, we shall be left with 9 balls from which to draw the
second ball. If we know that the first ball drawn is red, then what it means is that there
are 5 red balls left in the box, one of which might be drawn the second time, so
P (B|A) = 5/9
Hence,
P (A ∩ B) = P (A)P (B|A)
= (3/5)(5/9) = 1/3
Corollary 4.5
The probability of the simultaneous occurrence of three events, A, B and C is given by15
P (A ∩ B ∩ C) = P (A)P (B|A)P (C|A ∩ B)
Proof
By the associative law,
A ∩ B ∩ C = (A ∩ B) ∩ C
P (A ∩ B ∩ C) = P [(A ∩ B) ∩ C]
= P (A ∩ B)P (C|A ∩ B)
= P (A)P (B|A) · P (C|A ∩ B)
Example 4.11
In a consignment of 40 manufactured items, 8 are known to be defective. Suppose three
items are drawn at random without replacement. What is the probability that all three in
the sample are defective?
Solution
Let Ai be the event "getting a defective on the ith draw". Then
the probability of selecting a defective item on the 1st draw is
P (A1 ) = 8/40
the probability of a defective on the 2nd draw, given a defective on the 1st, is
P (A2 |A1 ) = 7/39
and the probability of a defective on the 3rd draw, given defectives on the first two, is
P (A3 |A1 ∩ A2 ) = 6/38
Hence, by Corollary 4.5,
P (A1 ∩ A2 ∩ A3 ) = (8/40)(7/39)(6/38) = 7/1235
Corollary 4.6
For any events A1 , A2 , · · · , An with P (A1 ∩ A2 ∩ · · · ∩ An−1 ) > 0 ,
P (A1 ∩ A2 ∩ · · · ∩ An ) = P (A1 )P (A2 |A1 )P (A3 |A1 ∩ A2 ) · · · P (An |A1 ∩ A2 ∩ · · · ∩ An−1 )
Theorem 4.9, together with its corollaries, is called the general multiplication law (rule)
or the chain rule or Bayes’ sequential formula.
Example 4.12
Refer to Example 4.11.
Solution
The sample space for this problem is the set of all possible selections of 3 items from the 40 items, so that the sample space consists of C(40, 3) equally likely simple events.
a) There are 8 defective items, and so 3 defective items can be selected in C(8, 3) ways. Hence
P (all three are defective items) = C(8, 3)/C(40, 3) = 7/1235
b) Exactly 1 defective item means 1 defective from the 8 defectives and 2 good items from the remaining 32. Hence
P (exactly 1 defective item) = C(8, 1)C(32, 2)/C(40, 3) = 496/1235
The preceding ‘combinatorial’ solution comes under what is generally called the Hypergeometric
Probability Distribution which will be taken up in the next volume.
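Both parts of Example 4.12 are instances of the combinatorial formula just described; a Python sketch of that hypergeometric-style computation using math.comb:

```python
from fractions import Fraction
from math import comb

N, D, n = 40, 8, 3   # total items, defective items, sample size

def hyper(k):
    """P(exactly k defectives in the sample): C(D,k)C(N-D,n-k)/C(N,n)."""
    return Fraction(comb(D, k) * comb(N - D, n - k), comb(N, n))

print(hyper(3))   # all three defective: 7/1235
print(hyper(1))   # exactly one defective: 496/1235
```

As a sanity check, the probabilities for k = 0, 1, 2, 3 sum to 1, since the events partition the sample space.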
Theorem 4.10
If A and B are two independent events in the same sample space S , then the probability of
the joint occurrence of A and B is given by
P (A ∩ B) = P (A)P (B)
Corollary 4.7
For a set of n independent events, Ai , i = 1, 2, · · · , n, the probability of the joint occurrence of A1 , A2 , · · · , An is the product of their individual probabilities:
P (A1 ∩ A2 ∩ · · · ∩ An ) = P (A1 )P (A2 ) · · · P (An )
or simply
P ( ⋂ᵢ₌₁ⁿ Ai ) = ∏ᵢ₌₁ⁿ P (Ai )
Theorem 4.10, together with its corollary, is called the special multiplication rule because it is only applicable to the case when the events are independent, as discussed in the sequel.
Example 4.13
Two dice are thrown once. What is the probability that the first die will show a 4 and the
second one will show a 6?
Solution
Let
A be the event that the first die will show a 4;
B be the event that the second die will show a 6.
Then
P (A) = 1/6 and P (B) = 1/6
Since A and B are independent,
P (A ∩ B) = P (A)P (B) = (1/6)(1/6) = 1/36
The addition and multiplication laws are the basic tools of probability. The application of
these laws may be demonstrated in the total probability law, also called the formula of
incompatible and exhaustive causes.
Theorem 4.11
If A1 , A2 , ..., An form a partition of the sample space S , then for any event B ⊆ S with P (B) > 0 ,
P (B) = Σᵢ₌₁ⁿ P (Ai )P (B|Ai )
Proof
Fig. 4.3 represents a partition of the sample space S :
[Fig. 4.3: a partition of the sample space S into A1 , A2 , A3 , ..., An ]
S = {A1 , A2 , ..., An }
By Definition 3.23, the events are pairwise mutually exclusive and their union is S . Hence
B = (A1 ∩ B) ∪ (A2 ∩ B) ∪ ... ∪ (An ∩ B)
and since
Ai ∩ Aj = ∅, for i ≠ j
(Ai ∩ B) ∩ (Aj ∩ B) = ∅, for i ≠ j (by Theorem 1.2)
Suppose A1 , A2 , ..., An is a partition of sample space S and B an event defined on the same sample space S such that P (B) > 0. Then
P (B) = Σᵢ₌₁ⁿ P (Ai )P (B|Ai )
= P (A1 )P (B|A1 ) + P (A2 )P (B|A2 ) + ... + P (An )P (B|An )
Proof
From Theorem 4.9,
P (B) = Σᵢ₌₁ⁿ P (Ai ∩ B) = Σᵢ₌₁ⁿ P (Ai )P (B|Ai )
Example 4.14
A group of visitors to the University of Ghana consisted of 15 students from the University
of Oxford and 20 students from the University of Ibadan. Among the students from the
University of Oxford were 8 females and among the students from the University of Ibadan
were 5 females. A student was selected (at random) to give a vote of thanks at the end of
the visit. What is the probability that the student is a female?
Solution
Let H1 be the event that the student came from the University of Oxford, H2 the event that the student came from the University of Ibadan, and F the event that the student selected is a female.
Then either the student came from the University of Oxford and was a female or she came from the University of Ibadan and was a female. This is the union of the two disjoint events H1 ∩ F and H2 ∩ F . Hence
P (F) = P (H1 )P (F|H1 ) + P (H2 )P (F|H2 ) = (15/35)(8/15) + (20/35)(5/20) = 8/35 + 5/35 = 13/35
Directly: the number of female students from the two universities is 8 + 5 = 13 out of 35 students in all.
Hence,
P (female) = 13/35
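The total probability computation in Example 4.14 takes only a few lines in Python when the priors and conditionals are kept as exact fractions:

```python
from fractions import Fraction

# Priors: which university the selected student came from.
priors = {"Oxford": Fraction(15, 35), "Ibadan": Fraction(20, 35)}
# Conditionals: probability of a female given the university.
cond_female = {"Oxford": Fraction(8, 15), "Ibadan": Fraction(5, 20)}

# Total probability law: sum of prior x conditional over the partition.
p_female = sum(priors[u] * cond_female[u] for u in priors)
print(p_female)   # 13/35
```

The priors sum to one, as they must for a partition of the sample space.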
Many simple problems that can be solved with total probability can also be solved with
the probability tree diagrams. In fact, the tree diagram can be extremely useful in more
complex problems.
A probability tree diagram is a form of graphical display which combines the addition and
multiplication laws. It enables us to
In the probability diagram, the first event is represented by a dot from where branches
are drawn to represent all possible outcomes of the event. The first level of branching
corresponds to the hypothesis (prior) and the second refers to the event outcome (condition).
The probability of each outcome is written on its branch where the ones on the first level
of the branches are the prior probabilities and those on the second level are conditional
probabilities. In fact, the tree diagram will show more information than is required to answer
the question being asked. The probability that any particular path of the tree occurs is, by
the multiplication rule, the product of the probabilities written on its segments (that is, the
path of the branch from the root to the end of the path). The probability of any outcome
is found by adding the probabilities of all branches that are part of that event. Fig. 4.4
shows an example of a probability tree diagram.
[Fig. 4.4: a probability tree diagram; the first-level branches lead to the hypotheses H1 and H2 with prior probabilities P(H1 ) and P(H2 ), and the second-level branches lead to the outcome E with conditional probabilities P(E|H1 ) and P(E|H2 )]
A good understanding of the tree diagram will give us the ability to "reverse" the process
or compute reverse probabilities discussed in the following section. Thomas Bayes created
a formula for this reverse process, which has come to be called Bayes' Theorem.
Example 4.15
Use the probability tree diagram to solve Example 4.14.
Solution
In this example, the hypothesis is that the student came from either the University of Oxford
or the University of Ibadan. The second branching refers to the event outcome which is
either a male or a female.
[Tree diagram of counts: from the root, the branch Oxford (15) splits into Male (7) and Female (8); the branch Ibadan (20) splits into Male (15) and Female (5)]
From this tree diagram, we obtain the probability tree diagram by dividing the values on the branch by their corresponding totals. The result is presented below:
[Probability tree diagram: the branch Oxford (15/35) splits into Male (7/15) and Female (8/15); the branch Ibadan (20/35) splits into Male (15/20) and Female (5/20)]
The selected female student could come from either the University of Oxford or the University of Ibadan.
Now from the diagram, the probability of selecting a female student who came from the
University of Oxford is found by following the University of Oxford's branch through to
the female branch and multiplying the probabilities along the path. This gives:

P(H1 ∩ F) = P(H1)P(F|H1) = (15/35)(8/15) = 8/35
Similarly, the probability of selecting a female student who came from the University of
Ibadan is found by following the University of Ibadan's branch through to the female
branch and multiplying the probabilities. This gives:

P(H2 ∩ F) = P(H2)P(F|H2) = (20/35)(5/20) = 5/35
Hence the probability of selecting a female student is the probability that she came either
from the University of Oxford or from the University of Ibadan. That is:
P(Female student) = 8/35 + 5/35 = 13/35
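The tree computation above can be sketched in code (a minimal illustration; the dictionary names and the use of exact fractions are our own choices):

```python
from fractions import Fraction as F

# Priors: probability the selected student's university is Oxford or Ibadan.
priors = {"Oxford": F(15, 35), "Ibadan": F(20, 35)}
# Conditional probabilities of a female within each university's branch.
cond_female = {"Oxford": F(8, 15), "Ibadan": F(5, 20)}

# Multiplication rule along each path gives the joint probabilities.
joint = {u: priors[u] * cond_female[u] for u in priors}
# Total probability of a female: sum over all paths ending at "Female".
p_female = sum(joint.values())
```

Each joint probability is the product along one path of the tree, and the total probability of selecting a female is their sum, 13/35.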
The calculations of the total probability and of Bayes' Theorem are complicated enough to
create many opportunities for errors and incorrect substitution of the probabilities involved.
Fortunately, they can be presented in tabular form.
Let each row of a table correspond to one hypothesis, with columns for (1) the hypothesis
Hi, (2) its prior probability P(Hi), (3) the conditional probability P(B|Hi), and (4) the
product P(Hi)P(B|Hi). Then, the probability of the event outcome B is the sum of
column 4. This value is the total probability.
(1) Hypothesis Hi | (2) Prior P(Hi) | (3) Conditional P(B|Hi) | (4) = (2) × (3): Joint P(Hi)P(B|Hi)
Example 4.16
Present the probability problem in Example 4.13 in a tabular form and find the probability
that a student selected is a female.
Solution
Let M and F be males and females respectively. The problem may be summarised as in
the table below.
University | M  | F  | Total
Oxford     | 7  | 8  | 15
Ibadan     | 15 | 5  | 20
Total      | 22 | 13 | 35
Let H1 and H2 be the hypotheses that the student selected is from the University of Oxford
and the University of Ibadan, respectively. Then, the results can be summarised in the table
below.
The probability of selecting a female is the probability of selecting a female from either the
University of Oxford (8/35) or the University of Ibadan (5/35). The total probability is, therefore,

P(Female) = 8/35 + 5/35 = 13/35
which is what is shown in the last row of the table below.
(1) Hypothesis | (2) Event | (3) Prior P(Hi) | (4) Conditional P(F|Hi) | (5) = (3) × (4): Joint
H1: Oxford | F | 15/35 | 8/15 | 8/35
H2: Ibadan | F | 20/35 | 5/20 | 5/35
Total | | 1 | | 13/35
In Example 4.14, a student was selected and we computed the probability that the student
was a female. As noticed earlier, that female could come from either the University of
Oxford or the University of Ibadan. Recollect that it was given as P(F) = 13/35.
Suppose that we now have the additional information that a female student was selected,
and we are interested in computing the probability that she was from Oxford, that is,
P(Oxford|Female).
This process of reversing the order of the conditioning is called Bayes' Theorem or Bayes'
rule or Bayes' law, after the English theologian and mathematician, Reverend Thomas
Bayes (1702–1761). Bayes' Theorem describes the probability of an event based on
conditions that are related to the event.
Suppose there is a hypothesis H and some observed evidence E , then Bayes’ Theorem is the
relationship between the probability of the hypothesis after getting the evidence P (H|E) and
the probability of the hypothesis before getting the evidence P (H):
P(H|E) = P(H) × P(E|H) / P(E)
In Definition 4.3,
P (H) is known as the prior probability and it is the (initial) probability of
the event originally obtained before any new or additional evidence or
information is collected;
P (H|E) is referred to as the posterior probability and it is the probability of the
event that has been revised after new or additional evidence or information
has been collected;
P(E|H)/P(E) is the factor that relates the posterior and the prior probabilities and is called
the likelihood ratio.
Theorem 4.13 (Bayes' Theorem)
Suppose A1, A2, ..., An are mutually exclusive and collectively exhaustive events in sample
space S with prior probabilities P(Ai) such that P(Ai) ≠ 0 for i = 1, 2, ..., n. Let B be
any event in S for which the conditional probabilities of B given Ai are P(B|Ai), with
P(B) ≠ 0.
Assume that P(Ai) and P(B|Ai) for i = 1, 2, ..., n are known; then the posterior probability
of Ai given that B has occurred is

P(Ai|B) = P(Ai)P(B|Ai) / Σ_{j=1}^{n} P(Aj)P(B|Aj)
Proof
From the definition of conditional probability (Definition 4.2),
P(Ai|B) = P(Ai ∩ B) / P(B)
We now replace the numerator by its equivalent expression in Theorem 4.9 and the
denominator by its equivalent expression in Theorem 4.12 to obtain:
P(Ai|B) = P(Ai)P(B|Ai) / Σ_{j=1}^{n} P(Aj)P(B|Aj)
In the Bayes’ Theorem, Ai is a hypothesis (causes) and B an event based on this hypothesis
(consequences). Hence,
P (Ai |B) = the probability of the hypothesis Ai , given the occurrence of event B ;
P (Ai |B) = the probability of the event B , given the occurrence of hypothesis Ai .
Bayes’ Theorem is applicable in situations where quantities of the form P (B|Ai ) and P (Ai )
are known and we wish to determine P (Ai |B) The probabilities, P (Ai ) are the prior
probabilities of Ai , P (Ai |B) are the posterior probabilities of Ai , given that the event B
has occurred and P (B|Ai ) are the likelihoods.
The Bayes’ Theorem enables us to find the probabilities of the various events A1 , A2 , ..., An
which could cause A to occur, given the consequence B . For this reason, Bayes’ Theorem is
often referred to as a theorem on the probability of causes or Bayes’ retrodiction formula.
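Under the assumptions of the theorem, the posterior computation can be sketched as follows (a minimal illustration; the function name and the use of exact fractions are our own):

```python
from fractions import Fraction as F

def posteriors(priors, likelihoods):
    """Bayes' Theorem over a partition A1, ..., An:
    P(Ai|B) = P(Ai)P(B|Ai) / sum_j P(Aj)P(B|Aj)."""
    joints = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joints)              # total probability of B
    return [j / total for j in joints]

# Oxford/Ibadan data from Example 4.14: priors 15/35 and 20/35,
# likelihoods P(F|H1) = 8/15 and P(F|H2) = 5/20.
post = posteriors([F(15, 35), F(20, 35)], [F(8, 15), F(5, 20)])
```

The resulting posteriors are 8/13 and 5/13, matching Example 4.17.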
Note
The terms on the right-hand side of the Bayes’ Theorem (Theorem 4.13) are all conditioned
on the events Ai (hypothesis), while that on the left-hand side is conditioned on B
(event outcome).
Example 4.17
Suppose, in Example 4.14, the student selected is a female. What is the probability that
the student came from
a) University of Oxford?
b) University of Ibadan?
Solution
All the probabilities have been calculated in Example 4.14. We use the notations in that
example.
P(H1|F) = P(H1)P(F|H1) / [P(H1)P(F|H1) + P(H2)P(F|H2)]
        = (15/35)(8/15) / [(15/35)(8/15) + (20/35)(5/20)] = 8/13
P(H2|F) = P(H2)P(F|H2) / [P(H1)P(F|H1) + P(H2)P(F|H2)]
        = (20/35)(5/20) / [(15/35)(8/15) + (20/35)(5/20)] = 5/13
Note
If the only information concerning the group of visitors consisted of the fact that 13 were
females and 22 were males, and we asked "What is the chance that a student selected at
random to give the vote of thanks is a female?", we should reply, "The probability is 13/35".
However, if we increase our knowledge by means of the additional fact that the randomly
selected student was found to be from the University of Oxford, then our probability value
rises from 13/35 to 8/13. On the other hand, if we increase our knowledge by means of the
additional fact that the randomly selected student was found to be from the University of
Ibadan, then our probability value also rises, from 13/35 to 5/13.
This illustrates the reason for the assertion that "If our probability is a measure of the
importance of our state of ignorance it must change its value whenever we add new knowledge".
The probability tree diagram constructed in Fig. 4.4 can be used to compute posterior
probabilities, that is, in solving problems relating to Bayes' Theorem. The posterior
probability of any outcome is found by multiplying the probabilities along all branches that
are part of that event and dividing the product by the total probability related to that event.
Example 4.18
Use the probability tree diagram to solve Example 4.17.
Solution
For convenience, we reproduce the diagram in Example 4.15 here.
[Probability tree:
Oxford (15/35): Male 7/15, Female 8/15
Ibadan (20/35): Male 15/20, Female 5/20]
a) From the diagram, knowing that a female has been selected, the probability that
she came from the University of Oxford is found as follows:
i) Follow the Oxford branch through to the female branch and multiply their
probabilities:
(15/35)(8/15) = 8/35
ii) Compute the total probability for the female branches of both universities:

P(F) = (15/35)(8/15) + (20/35)(5/20) = 13/35
iii) Divide the joint probability by the total probability:

P(H1|F) = (8/35) / (13/35) = 8/13
b) We ask the reader to find the probability that the female student came from the
University of Ibadan.
We advise readers to adopt this type of table in solving problems relating to Bayes’ Theorem.
Table 4.2
Hypothesis | Prior P(Hi) | Conditional P(E|Hi) | Joint P(Hi)P(E|Hi) | Posterior P(Hi|E)
H1 | P(H1) | P(E|H1) | P(H1)P(E|H1) | P(H1)P(E|H1)/P(T)
… | … | … | … | …
Total | 1 | | P(T) | 1
Example 4.19
Rework Example 4.18 using Table 4.2.
Solution
The following table is an extension of the table in Example 4.15 by adding a column for
the posterior probabilities:
Hypothesis | Prior | Conditional | Joint | Posterior
H1: Oxford | 15/35 | 8/15 | 8/35 | 8/13
H2: Ibadan | 20/35 | 5/20 | 5/35 | 5/13
Total | 1 | | 13/35 | 1
From the last column of the table, the probability that a female student selected came from
Oxford is 8/13, which is the same as in Example 4.17.
We have discussed the concept of mutually exclusive events, that is, the situation when
events A and B cannot occur together. We have noted earlier that if A and B are mutually
exclusive, then P (A|B) = 0, since the given occurrence of B precludes the occurrence of A
. The other extreme situation is when B ⊆ A in which case P (A|B) = 1. In both situations,
we noted that if we know that B has occurred then we have very definite information
about the probability of the occurrence of A . However, there are many situations when
the knowledge of the occurrence of some event B does not have any bearing whatsoever
on the occurrence or non-occurrence of A .
Two events A and B are said to be independent if the occurrence of one does not affect the
probability that the other occurs.
Frequently, we will postulate that two events are independent or it will be clear from the
nature of the experiment that the two events are independent.
Example 4.20
Throw two dice once. Is the probability of obtaining a six (or any other point) on the first
die independent of obtaining a six (or any other point) on the second die?
Solution
Our intuition says the two dice should not exercise any control over each other; that is, the
outcome on one die should be "independent" of the outcome on the other.
Let A be the event of a six on the first die and B the event of a six on the second die. With
36 equally likely outcomes, P(A) = 6/36 = 1/6, P(B) = 6/36 = 1/6 and P(A ∩ B) = 1/36 = P(A)P(B).
This shows that A and B are independent according to the condition, just as our intuition
would indicate.
Note
One cannot prove mathematically whether two real-world dice behave independently or
not. All we are illustrating is that if the sample space described in Example 4.20 is used
to model a pair of dice, with the assumption that the outcomes are equally likely, then
the theoretical dice of the model behave independently. In other cases we need to show
mathematically that the events are independent.
Definition 4.5
Events A and B on the same sample space S are said to be (statistically) independent if
the probability of the joint occurrence of A and B is equal to the product of their individual
probabilities:

P(A ∩ B) = P(A)P(B)
Note
If both P (A) > 0 and P (B) > 0 and if A and B are mutually exclusive, then the two events
cannot be independent (that is, they are dependent). In such a case, the left-hand side of
Definition 4.5 is zero, but the right-hand side is positive.
Example 4.21
A box contains ten numbered identical balls. A ball is picked at random with replacement
from the box. Consider the events A, that the number on the ball is at most 4, and B,
that the number is even. Are A and B independent?
Solution
S = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
A = {1, 2, 3, 4}
B = {2, 4, 6, 8, 10}
A ∩ B = {2, 4}
So

P(A ∩ B) = 2/10
P(A) = 4/10
P(B) = 5/10
P(A)P(B) = (4/10)(5/10) = 2/10 = P(A ∩ B)

Hence A and B are independent.
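The check in this example can be carried out mechanically (an illustrative sketch; the helper `prob` assumes equally likely outcomes):

```python
from fractions import Fraction as F

S = set(range(1, 11))        # ten equally likely numbered balls
A = {1, 2, 3, 4}             # event A: number at most 4
B = {2, 4, 6, 8, 10}         # event B: number is even

def prob(event):
    # Classical probability: favourable outcomes over total outcomes.
    return F(len(event), len(S))

# Statistical independence (Definition 4.5): P(A ∩ B) = P(A)P(B).
independent = prob(A & B) == prob(A) * prob(B)
```

Here `prob(A & B)` is 2/10 and `prob(A) * prob(B)` is (4/10)(5/10) = 2/10, so the test succeeds.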
Definition 4.6
For n > 2, events A1 , A2 , · · · , An are said to be mutually independent if and only if the
probability of the intersection of any 2, 3, · · · , n of these events is equal to the product of their
respective probabilities
It follows, therefore, that there are 2ⁿ − n − 1 such subcollections, and the corresponding
product conditions must all hold for the mutual independence of the n events to be established.
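The count 2ⁿ − n − 1 can be confirmed by enumerating the subcollections directly (an illustrative sketch using the standard library):

```python
from itertools import combinations

def num_conditions(n):
    """Number of product conditions required for mutual independence of n
    events: one per subcollection of size 2, 3, ..., n."""
    return sum(1 for k in range(2, n + 1)
                 for _ in combinations(range(n), k))

# Matches the closed form 2**n - n - 1 for small n.
checks = [num_conditions(n) == 2**n - n - 1 for n in range(2, 8)]
```

For n = 3 this gives the four conditions of Theorem 4.14 below.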
Theorem 4.14
That is, three events A, B and C are independent if and only if all four (2³ − 3 − 1 = 4)
of the following conditions are satisfied:

P(A ∩ B) = P(A)P(B)
P(A ∩ C) = P(A)P(C)
P(B ∩ C) = P(B)P(C)
P(A ∩ B ∩ C) = P(A)P(B)P(C)
Example 4.22
Consider the sample space

S = {x | x is a positive integer, x ≤ 10}

with equally likely outcomes, and the events A = {1, 2, 3, 4}, B = {2, 4, 6, 8, 10} and
C = {1, 3, 6, 10}. Are A, B and C mutually independent?
Solution
A = {1, 2, 3, 4}; B = {2, 4, 6, 8, 10}; C = {1, 3, 6, 10}
A ∩ B = {2, 4}, A ∩ C = {1, 3}, B ∩ C = {6, 10}, A ∩ B ∩ C = ∅

Then

P(A ∩ B) = 2/10
P(A ∩ C) = 2/10
P(B ∩ C) = 2/10
P(A ∩ B ∩ C) = 0

But

P(A)P(B)P(C) = (4/10) × (5/10) × (4/10) ≠ 0

so P(A ∩ B ∩ C) ≠ P(A)P(B)P(C), and the events are not mutually independent.
Note
a) Three events can satisfy some or all of the pairwise conditions and yet fail to be
mutually independent; Example 4.22 illustrates a failure of the triple condition.
b) If P(A ∩ B ∩ C) = P(A)P(B)P(C), it does not necessarily imply that the events
A, B, and C are pairwise independent.
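A standard textbook illustration of note (a), separate from Example 4.22 and chosen here for concreteness, uses two fair coin tosses:

```python
from fractions import Fraction as F

S = {"HH", "HT", "TH", "TT"}   # two fair coin tosses, equally likely
A = {"HH", "HT"}               # first toss is a head
B = {"HH", "TH"}               # second toss is a head
C = {"HT", "TH"}               # exactly one head

def prob(e):
    return F(len(e), len(S))

# All three pairwise conditions hold...
pairwise = all(prob(x & y) == prob(x) * prob(y)
               for x, y in [(A, B), (A, C), (B, C)])
# ...but the triple condition fails: P(A ∩ B ∩ C) = 0 while P(A)P(B)P(C) = 1/8.
mutual = prob(A & B & C) == prob(A) * prob(B) * prob(C)
```

So A, B and C are pairwise independent yet not mutually independent.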
Corollary 4.8
Events A1, A2, ..., An are jointly independent (mutually independent) if and only if, for
any subcollection of k events (2 ≤ k ≤ n), Ai1, Ai2, ..., Aik,

P(Ai1 ∩ Ai2 ∩ ··· ∩ Aik) = P(Ai1)P(Ai2) ··· P(Aik)
In general, n > 2 events can be pairwise independent, but this does not necessarily imply
that they are mutually independent.
Theorem 4.15
Proof
By Definition 4.5,
P (A ∩ B) = P (A)P (B)
Let

Z = A ∩ B

Then

P(A ∩ B ∩ C) = P(Z)P(C)
Note
a) If A is independent of B and A is independent of C then B is not necessarily
independent of C .
b) If A is independent of B and A is independent of C , then A is not necessarily
independent of B ∪ C .
Independence could alternatively be defined through the condition P(A|B) = P(A); but then
independence would not be defined if P(B) = 0, since P(A|B) is defined only when P(B) ≠ 0.
Theorem 4.16
Suppose A and B are events defined on the same sample space S . If they are independent, then
P (A|B) = P (A)
Proof
From Definition 4.2,
P(A|B) = P(A ∩ B) / P(B)
       = P(A)P(B) / P(B)   (by Definition 4.5)
       = P(A)
Similarly
P (B|A) = P (B)
Example 4.23
Suppose A and B are independent events. If P(A) = 2/5 and P(B) = 2/3, calculate
(a) P(A|B) and (b) P(B|A).
Solution
Since the two events are independent,
a) By Theorem 4.16,

P(A|B) = P(A) = 2/5

b) Similarly,

P(B|A) = P(B) = 2/3
Definition 4.5 has the further advantage that it is applicable even when P(B) = 0 in
determining the independence of A and B, and it is for this reason that virtually all works
in the field of probability and statistics make use of Theorem 4.9 and Corollary 4.6.¹⁷
Example 4.24
The Regent University College of Science and Technology wanted to employ staff to
participate in the teaching of a newly introduced programme. The following table gives the
proportion of the candidates who were selected for employment18. Are being a PhD holder
and a male independent?
Solution
Let
A be the event that a PhD holder is employed;
B be the event that a male is employed.
P (A) = 0.58
P (B) = 0.63
P (A ∩ B) = 0.38
Hence,

P(B|A) = P(A ∩ B)/P(A) = 0.38/0.58 ≈ 0.66 ≠ P(B)
P(A|B) = P(A ∩ B)/P(B) = 0.38/0.63 ≈ 0.60 ≠ P(A)

Since P(B|A) ≠ P(B) and P(A|B) ≠ P(A), being a PhD holder and being a male are not
independent.
Theorem 4.17
If two events A and B defined on the same sample space S are statistically independent, then
so are (a) A and Bᶜ, (b) Aᶜ and B, and (c) Aᶜ and Bᶜ.
Proof
We shall prove the case of the pair Aᶜ and Bᶜ.

P(Aᶜ ∩ Bᶜ) = P((A ∪ B)ᶜ)
= 1 − P(A ∪ B)
= 1 − [P(A) + P(B) − P(A ∩ B)]
= 1 − P(A) − P(B) + P(A)P(B)
= [1 − P(A)] − P(B)[1 − P(A)]
= [1 − P(A)][1 − P(B)]
= P(Aᶜ)P(Bᶜ)
This proof shows that if two events are independent, then their complements are also
independent. The other two cases of Theorem 4.17 say that if two events are independent,
then each event is independent of the complement of the other. The reader is asked to
prove them in Exercise 4.33.
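All three cases of the theorem can be checked numerically on a small sample space (our own illustrative choice of two fair coin tosses):

```python
from fractions import Fraction as F

S = {"HH", "HT", "TH", "TT"}   # two fair coin tosses
A = {"HH", "HT"}               # first toss is a head
B = {"HH", "TH"}               # second toss is a head

def prob(e):
    return F(len(e), len(S))

Ac, Bc = S - A, S - B          # complements of A and B

# A and B are independent, and so are every pair involving complements.
case_0 = prob(A & B) == prob(A) * prob(B)
case_a = prob(A & Bc) == prob(A) * prob(Bc)
case_b = prob(Ac & B) == prob(Ac) * prob(B)
case_c = prob(Ac & Bc) == prob(Ac) * prob(Bc)
```

This is only a check on one example, of course, not a substitute for the proof.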
Corollary 4.9
If events A1, A2, ..., An defined on the same sample space S are statistically independent,
then their complements A1ᶜ, A2ᶜ, ..., Anᶜ are also independent events.
Theorem 4.18
If events A1, A2, ..., An defined on the same sample space S are statistically independent, then

P(A1 ∪ A2 ∪ ··· ∪ An) = 1 − P(A1ᶜ)P(A2ᶜ) ··· P(Anᶜ)

Proof
By Corollary 4.9, the events A1ᶜ, A2ᶜ, ..., Anᶜ are independent.
Hence

P(A1ᶜ ∩ A2ᶜ ∩ ··· ∩ Anᶜ) = P(A1ᶜ)P(A2ᶜ) ··· P(Anᶜ)

By Theorem 4.5,

P(A1 ∪ A2 ∪ ··· ∪ An) + P(A1ᶜ ∩ A2ᶜ ∩ ··· ∩ Anᶜ) = 1

so that

P(A1 ∪ A2 ∪ ··· ∪ An) = 1 − P(A1ᶜ ∩ A2ᶜ ∩ ··· ∩ Anᶜ)
= 1 − P(A1ᶜ)P(A2ᶜ) ··· P(Anᶜ)
Corollary 4.10
If all the events A1, A2, ..., An have the same probability p, then

P(A1 ∪ A2 ∪ ··· ∪ An) = 1 − (1 − p)ⁿ = 1 − qⁿ

where q = 1 − p.
Example 4.25
Refer to Example 4.5. Use Theorem 4.18 and its corollary to calculate the probabilities.
Solution
a) Let Ai (i = 1, 2, ..., 6) denote the event that athlete i wins a Gold medal and
P(Ai) its corresponding probability. Then the probability that athlete i does
not win a Gold medal is

q = P(A1ᶜ) = P(A2ᶜ) = ··· = P(A6ᶜ) = 1 − 2/5 = 3/5
By Corollary 4.10, the probability that at least one of the athletes wins a Gold medal is

P(A1 ∪ A2 ∪ ··· ∪ A6) = 1 − q⁶ = 1 − (3/5)⁶ = 0.953344
b) P(A1 ∪ A2 ∪ ··· ∪ An)
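The corollary's formula is a one-liner in code (values from Example 4.25; ordinary floating-point arithmetic, so the result matches 0.953344 up to rounding):

```python
# Corollary 4.10: if n independent events each occur with probability p,
# then P(at least one occurs) = 1 - (1 - p)**n.
p, n = 2 / 5, 6                      # six athletes, each wins with probability 2/5
p_at_least_one = 1 - (1 - p) ** n    # = 1 - (3/5)**6
```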
Beginners often confuse the concept of mutually exclusive events with statistical
independence. In general, mutual exclusiveness has to do with the simultaneous occurrence
or non-occurrence of events, while statistical independence has to do with the effect of the
occurrence of one event on the occurrence or non-occurrence of another event.
a) Mutual exclusiveness is associated with the addition rule, while statistical independence
is associated with the multiplication rule.
b) Two mutually exclusive events are strongly dependent; that is, if two events A and
B are mutually exclusive, then if A occurs, B cannot occur and vice versa, and this
is the very opposite of independent events. Independence means that the outcome
of one event does not influence the outcome of the other.
c) For two mutually exclusive events A and B to be independent, at least one of them
should have a probability of zero.
d) Unlike mutually exclusive events, independent events cannot be spotted in Venn
diagrams.
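The contrast in points (b) and (c) can be made concrete with a single die roll (our own illustrative events):

```python
from fractions import Fraction as F

S = set(range(1, 7))    # one roll of a fair die
A = {1, 2}              # "a 1 or a 2"
B = {5, 6}              # "a 5 or a 6": A and B are mutually exclusive

def prob(e):
    return F(len(e), len(S))

exclusive   = prob(A & B) == 0                      # they can never occur together
product_pos = prob(A) * prob(B) > 0                 # yet P(A)P(B) = (1/3)(1/3) > 0
independent = prob(A & B) == prob(A) * prob(B)      # so they cannot be independent
```

Since both events have positive probability, mutual exclusiveness forces dependence, exactly as the note states.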
Mutually exclusive events | Independent events
P(A ∩ B) = 0 | P(A ∩ B) = P(A)P(B)
This chapter concludes discussions on the fundamental theory of probability. In the next
two chapters we shall introduce the concept of a random variable and its basic numerical
characterisation.
EXERCISES
4.1 Show that the probability that exactly one of the events A and B occurs equals
4.2 If A and B are mutually exclusive events and it is known that P (A) = 0.20 while
P (B) = 0.30 . Estimate
4.3 Suppose that in Exercise 4.2, the events A and B are not mutually exclusive. Re-
evaluate the probabilities, assuming A and B are independent.
4.7 If A and B are disjoint events and P (A) = 0.3 and P (B) = 0.6, calculate
(a) P (A ∪ B) (b) P (A ∪ B) (c) P (A ∪ B) (d) P (A ∩ B)
4.8 A, B, and C are three events in a sample space such that
P (A) = x, P (B) = y, P (A ∩ B) = z.
4.9 A certain town of population size 100,000 has 3 newspapers, A, B , and C . The
proportion of the inhabitants that read these papers are as follows:
4.10 The following data were given in a study of a group of 1,000 subscribers to a certain
magazine. In reference to sex, marital status, and education, there were 312 males,
470 married persons, 525 college graduates, 86 married males, and 25 married male
college graduates. Show that the numbers reported in the study must be incorrect.
4.11 If you hold 5 tickets to a lottery for which n tickets were sold and 3 prizes are to
be given, what is the probability that you will win at least 1 prize?
4.13 A man has n keys, of which one will open his door. If he tries the keys at random,
what is the probability that he will open the door on his kth try?
4.14 Roll a pair of fair dice once. Given that the two numbers that occur are not the
same, what is the probability that the sum is 7, or that the sum is 4, or that the sum
is 12?
4.15 Suppose a fair coin is tossed four times and on each toss a head comes up. What
is the probability that it will come up head on the fifth toss as well? Comment on
the result.
4.16 Two balls are selected at random without replacement from a box which contains 4
white and 8 black balls. Compute the probability that
(a) both balls are white (b) the second ball is white.
4.17 A box contains 8 bulbs, of which 5 are good and 3 are defective. If 3 bulbs are
randomly taken from the box, what is the probability that all are good?
4.18 Box 1 contains 4 defective and 16 non-defective light bulbs. Box 2 contains 1
defective and 1 non-defective light bulb. Roll a fair die once. If you get a 1 or a 2,
select a bulb at random from box 1; otherwise select a bulb from box 2. What is
the probability that the selected bulb will be defective?
4.19 A box contains 8 red, 3 white and 9 blue balls. If 3 balls are drawn at random
without replacement, determine the probability that
4.20 Suppose that in Exercise 4.18 a bulb selected is defective. What is the probability
that the bulb comes from box 2?
4.21 Two fair dice are thrown once. Find the probability that the sum is 5, given that
one of the dice came up with a 3.
4.22 Refer to Exercise 4.21. Find the probability that the sum is 5, given that neither die
came up with a 1.
4.23 A pair of dice, one blue and one white, is rolled once and the sum of the numbers
that show is 7.
a) What is the probability that the blue die shows the number 3?
b) If it is known that the first number is 3 what is the probability that it was on
the blue die?
4.24 A manufacturing company has two plants, 1 and 2. Plant 1 produces 40% of the
company’s output and plant 2 produces the other 60% . Of the output produced
by plant 1, 95% are good and of that produced by plant 2, 10% are defective. If a
product is randomly selected from the output of this company, what is the probability
that the output will be good?
4.25 Three shots are fired at a target in succession. The probability of a hit with the first
shot is p1 = 0.3, with the second, p2 = 0.6, and with the third, p3 = 0.8. The probability
of destroying the target with the first hit is λ1 = 0.4, with the second hit, λ2 = 0.7, and
with the third hit, λ3 = 1.0. What is the probability that the target will be
a) destroyed
b) destroyed by three shots?
4.26 From past experience with the illness of his patients, a doctor has gathered the following
information: 5% feel that they have cancer and do have cancer, 45% feel that they
have cancer and don’t have cancer, 10% do not feel that they have cancer and do
have it, and finally 40% feel that they do not have cancer and really do not have it.
A patient is randomly selected from this doctor’s practice. What is the probability
4.27
Refer to Exercise 4.24. If a product selected at random was good, what is the
probability that it comes from plant 1.
4.28 Suppose that you are a political prisoner in Ghana and are to be sent to one of
two prisons Nsawam or James Fort. The probabilities of being sent to these two
places are 0.7 and 0.3, respectively. Suppose it is known that 85% of the residents
of Nsawam wear a white shirt or blouse, whereas in James Fort it is 80%. Late one
night you are blindfolded and thrown on a truck. Two weeks later (you estimate) the
truck stops, you are told you have arrived at your place of exile, and your blindfold
is removed. The first person you see is not wearing a white shirt or blouse. What is
the probability that the prison you are being sent to is Nsawam?
4.29 Of 100 patients in a hospital with a certain disease, ten are chosen to undergo a drug
treatment that increases the cure rate from 50 percent to 75 percent. If a doctor
later encounters a cured patient, what is the probability that the patient received
the drug treatment?
4.30 Three boxes each contain two coins. In one box, B1, both coins are gold, in another,
B2 , both are silver, and in the third, B3 , one is gold and the other is silver. A box
is chosen at random and from it, a coin is chosen at random. If this coin is gold,
what is the probability that it came from the box containing two gold coins?
4.31 Suppose it is known that 3% of cigarette smokers develop lung cancer, whereas only
0.5% of non-smokers develop lung cancer. Furthermore, suppose that 30% of adults
smoke. If it is known that a randomly chosen adult had developed lung cancer, what
is the probability that the person is a smoker?
4.32 Two hunters independently fired at a bird. The probability that the first hunter will
kill the bird is p1 = 0.8 and that of the second is p2 = 0.4. Suppose the bird is
killed by a single hit. What is the probability that it was killed by the first hunter.
4.34
A box contains ten numbered identical balls. A ball is picked at random with
replacement from the box. Consider the events:
4.37 Suppose A and B are independent with P (A) = 0.54 and P (B) = 0.4 . Find P (A ∪ Bc ).
INTRODUCTORY
PROBABILITY THEORY RANDOM VARIABLES
5 RANDOM VARIABLES
5.1 INTRODUCTION
The foundation of probability theory was set forth in the preceding chapters. The ideas of
an experiment, a sample space corresponding to the experiment and events in a sample space
were introduced. The axioms of a probability measure on these events were also postulated. In
this chapter, the concept of a random variable and its probability and distribution functions
are introduced. This unifies the study of probabilistic situations: for any experiment, the
original sample space is mapped to the real line, so that we need only study one sample
space, the real line.
Let S be the sample space associated with some experiment E . A random variable X is a
function that assigns a real number X(s) to each sample point s ∈ S
Schematically, we may present this concept of a random variable as in Figure 5.1. A random
variable (r.v.) is sometimes called a stochastic variable, a random function or a stochastic
function and usually denoted by a capital letter such as X, Y, S, T, Z and the corresponding
lower case letters, x, y, s, t, z, to denote particular values in its range.
[Fig. 5.1: A random variable X maps each sample point s in S to a real number X(s) on the real line.]
One of the essential things that the notion of random variables does is to provide us with
the power of abstraction, thus enabling us to avoid dealing with unnecessarily complicated
sample spaces. Suppose X is a random variable and x a real number. We define

Ax = {s ∈ S | X(s) = x}

where the event Ax is the subset of S consisting of all sample points s to which the random
variable X assigns the value x. Clearly,

Ax ∩ Ay = ∅ if x ≠ y, and ⋃x Ax = S

Thus, the collection of events Ax for all x defines an event space, which we may find more
convenient to work with than the original sample space, provided our only interest in
performing the experiment is the resulting experimental value of the random variable X.
Example 5.1
Let E be an experiment of tossing a coin twice and let us be interested in the number of
heads H which come up. Then the sample space S associated with this experiment is
S = {HH, HT , T H, T T }
A random variable X and a sample space S can be defined such that for s ∈ S, X(s) is
the number of heads in the point s ∈ S . Pictorially the random variable can be viewed as
a mapping (see Fig. 5.2 ).
[Fig. 5.2: TT ↦ 0; HT, TH ↦ 1; HH ↦ 2]
Sample point | HH | HT | TH | TT
X | 2 | 1 | 1 | 0
In this table the event {s : X(s) = 1} is simply the set {HT , T H}. The notation {s : X(s) = 1}
is often shortened to {X = 1}. In general {X = x} will be used to represent the event AX .
Similarly {s : X(s) < 1} is shortened to {X < 1}. And finally, the probabilities of the two
events (X = 1) and (X < 1) are usually written as P (X = 1) or simply p(1) and P (X < 1)
respectively.
From Example 5.1, the original sample space comprised four sample points but the event
space has three event points, namely, 0, 1, 2. In Example 5.4 we shall observe that when
a coin is tossed three times, the original sample space contains eight sample points but the
event space defined by X contains four event points. In general, for a sequence of n coin-
tossing experiments, say, there are 2ⁿ sample points in the original sample space, whereas
the event space defined by X will have (n + 1) event points.
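The mapping from sample points to event points can be sketched directly (an illustrative snippet; the function name is our own):

```python
from itertools import product

def coin_event_space(n):
    """Map each outcome of n coin tosses (a sample point) to the number of
    heads it contains: the random variable X of Example 5.1."""
    outcomes = list(product("HT", repeat=n))               # the 2**n sample points
    values = sorted({seq.count("H") for seq in outcomes})  # the event points of X
    return outcomes, values

outcomes, values = coin_event_space(3)
```

For three tosses there are 2³ = 8 sample points but only 3 + 1 = 4 event points, 0 through 3.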
It is important to note that not every mapping from S to the real line will be considered
as a random variable. For example, a function that assigns more than one real number to
any element of S (a one-to-many function), is unacceptable. Only functions that assign
exactly one real number to each element of S are acceptable. Of course, the same real
number may be assigned to many elements of S (that is, a many-to-one function is also
satisfactory). The domain of a random variable X is the sample space S and the range space
Rx is a subset of the real line.
Example 5.2
Refer to Example 5.1. Which of the following relations is a random variable?
i) Suppose the sample point b corresponds to “no head”, c to “one head” and also
to “two heads”, and a to “two heads”. The mapping is given below:
[Diagram: b ↦ 0; c ↦ 1 and c ↦ 2; a ↦ 2]
ii) Suppose the sample point c corresponds to “no head”, b to “one head” and a to
“two heads”:
[Diagram: c ↦ 0; b ↦ 1; a ↦ 2]
iii) Suppose the sample point a corresponds to “no head”, b to “no head” and c to
“no head”.
[Diagram: a ↦ 0; b ↦ 0; c ↦ 0]
Solution
The function defined by (i) is not a random variable because the element c corresponds to
both 1 and 2 .
The function defined by (ii) is a random variable because it is a one-to-one function.
The function defined by (iii) is a random variable because it is a many-to-one function.
Just as there are often many sample spaces that can be of interest in an experiment, there
can also be many different random variables of interest for the sample space.
Example 5.3
We may define the following other random variables for the sample space for Example 5.1:
We shall distinguish between two basic types of random variables: the discrete and the
continuous random variables.
A random variable which takes on a finite or countably infinite number of values is called a
discrete random variable
That is, the possible values of X may be listed as x1, x2, ..., xn, .... In the finite case the list
terminates at xn, and in the countably infinite case the list continues indefinitely.
A random variable which takes on an uncountably infinite number of values is called a continuous
random variable
A random variable may also be partly discrete and partly continuous but such random
variables will not be considered in this book.
When such information is available we say that the distribution law or the probability
distribution or simply the distribution of the random variable X is known.
The values xi may be finite or countably infinite in number and may be listed in any order, though for convenience they are usually listed in increasing order of magnitude. The probability distribution of a discrete random variable is more often called a probability function (p.f.) or a probability mass function (p.m.f.) and is denoted by p(xi) or P(X = xi). It is the probability that the random variable X assumes the specific value xi; that is, for a specific value xi, p(xi) = P(X = xi).
Tabular Form
It is convenient to specify the probability distribution of a random variable by means of a
table having two rows: the upper row contains the possible values the random variable assumes
and the lower row contains the corresponding probabilities of the values (see Table 5.1).
xi      x1      x2      …      xn
p(xi)   p(x1)   p(x2)   …      p(xn)
or

X   x1      x2      …      xn
P   p(x1)   p(x2)   …      p(xn)
When the set of possible values of a random variable X is countably infinite then the
probability distribution of the random variable X is given in the form of the following table:
xi      x1      x2      …
p(xi)   p(x1)   p(x2)   …
Graphical Form
The probability distribution may also be given graphically. The graph represents chance
and not data.
To obtain a probability graph, vertical lines or bars are drawn above the possible values xi
of the random variable X on the horizontal axis. The height of each line or bar (rectangle)
equals the probabilities of the corresponding values of the random variable (see Fig. 5.3).
[Fig. 5.3 Probability graphs of p(x) over the values x1, …, x5: (a) Vertical Lines; (b) Bar Chart]
[Probability histogram over the values x1, …, x11]
Similar to the probability bar chart, the height of each rectangle of a probability histogram is proportional to the probability that the random variable X takes on the value which corresponds to the midpoint of its base. The values 0, 1, 2, · · · are the midpoints of rectangles whose bases run from −0.5 to 0.5, 0.5 to 1.5, 1.5 to 2.5, · · ·, and whose heights are proportional to the corresponding probabilities. In using the midpoints we are in a sense “spreading” the values of the given discrete random variable over a continuous scale. We can, therefore, construct a polygon using these midpoints and then approximate the graph of a discrete random variable with a continuous curve.
If each of the rectangles of the histogram has unit width, then the areas, rather than the heights, equal the corresponding probabilities. When the rectangles of the histogram do not all have unit width, we adjust the heights of the rectangles or modify the vertical scale.
Formula
Representing a discrete probability distribution by means of a formula is the most useful
method. For example
P(X = x) = 1/n,   x = 1, 2, · · · , n
Note
This is a special case of a discrete distribution known as the discrete uniform probability
distribution.
We must point out immediately that not all functions of a discrete random variable qualify
to be called probability mass functions. We shall now give a formal definition of a probability
mass function.
Any function p defined on all possible values of the discrete random variable X = xi , i = 1, 2, · · ·
is called a probability mass function if it satisfies the following properties:
Property 1: p(xi) ≥ 0, i = 1, 2, · · ·

Property 2: Σi p(xi) = 1

where the summation is over all possible values of the random variable X.
This definition is obvious. In one experiment a random variable X takes only one of its n possible values, each with a corresponding non-negative probability. In other words, in an experiment, exactly one of the pairwise mutually exclusive events

X = x1, X = x2, · · · , X = xn

is certain to happen. Such events, as pointed out earlier, form a partition of the sample space, and their union is a sure event. Consequently, the sum of the probabilities of these events is unity.
Example 5.4
A fair coin is tossed three times. Let X represent the number of heads which come up.
Solution
a) The sample space is

S = {TTT, TTH, THT, HTT, HHT, HTH, THH, HHH}
The probability of each outcome is 1/8, since all the outcomes are equally likely simple events.
With each sample point we can associate a number for X, as shown in the table below.
Sample point, s       TTT  TTH  THT  HTT  HHT  HTH  THH  HHH
Number of heads, X     0    1    1    1    2    2    2    3
The table above shows that the random variable can take the values 0, 1, 2, 3. Our next task
is to compute the probability distribution p(xi ) of X.
P(X = 0) = P(TTT) = 1/8

P(X = 1) = P({TTH} ∪ {THT} ∪ {HTT})
         = P(TTH) + P(THT) + P(HTT)
         = 1/8 + 1/8 + 1/8 = 3/8

P(X = 2) = P({HHT} ∪ {HTH} ∪ {THH})
         = P(HHT) + P(HTH) + P(THH)
         = 1/8 + 1/8 + 1/8 = 3/8

P(X = 3) = P(HHH) = 1/8
xi       0    1    2    3
p(xi)   1/8  3/8  3/8  1/8
Note
This probability distribution satisfies the properties given in Definition 5.6, namely,
p(xi) ≥ 0 and Σi p(xi) = 1
[Probability graph for Example 5.4: vertical lines of heights 1/8, 3/8, 3/8, 1/8 at x = 0, 1, 2, 3]
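The distribution just obtained can be checked by brute-force enumeration of the sample space; a minimal sketch in Python (the variable names are illustrative):

```python
from itertools import product
from fractions import Fraction

# All 8 equally likely outcomes of tossing a fair coin three times
outcomes = list(product("HT", repeat=3))

# p(x) = (number of outcomes with exactly x heads) / 8
pmf = {x: Fraction(sum(1 for s in outcomes if s.count("H") == x), len(outcomes))
       for x in range(4)}

assert pmf == {0: Fraction(1, 8), 1: Fraction(3, 8),
               2: Fraction(3, 8), 3: Fraction(1, 8)}
assert sum(pmf.values()) == 1  # Property 2 of a p.m.f.
```

The same enumeration idea extends to any small finite experiment with equally likely outcomes.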
Example 5.5
A discrete function is given by
p(x) = { (1/21)(2x + 3),  x = 1, 2, 3
       { 0,               elsewhere
Verify that it is a probability function of some random variable.
Solution
Clearly, p(xi) ≥ 0, and

Σ_{x=1}^{3} (1/21)(2x + 3) = (1/21)[(2 + 3) + (4 + 3) + (6 + 3)] = (1/21)(5 + 7 + 9) = 1
Example 5.6
A discrete random variable X has a p.m.f.
p(x) = { k(x − 1),  x = 3, 4, 5
       { 0,         elsewhere
Find the constant k for which p(x) is a probability function.
Solution
Since the function is a p.m.f.
Σ_{x=3}^{5} k(x − 1) = 1

Now,

Σ_{x=3}^{5} k(x − 1) = k[(3 − 1) + (4 − 1) + (5 − 1)] = 9k = 1

from which k = 1/9.
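The normalising step in Example 5.6 — summing the unnormalised weights and inverting — is mechanical, as this small Python sketch shows:

```python
from fractions import Fraction

# Unnormalised weights w(x) = x - 1 on the support x = 3, 4, 5
support = [3, 4, 5]
total = sum(x - 1 for x in support)   # 2 + 3 + 4 = 9
k = Fraction(1, total)                # k = 1/9

pmf = {x: k * (x - 1) for x in support}
assert sum(pmf.values()) == 1         # p.m.f. Property 2 now holds
```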
The probability distribution for a continuous random variable is more often called a
probability density function (p.d.f.) or simply density function and is denoted by f (x).
A function f defined on the real numbers is called a probability density function (p.d.f.) if it satisfies the following properties:

Property 1: f(x) ≥ 0 for all x

Property 2: ∫_{−∞}^{∞} f(x) dx = 1
The first property indicates that, unlike the p.m.f p(xi ), the value of the p.d.f. f (x) at the
point x is not a probability and that it is perfectly acceptable if f (x) > 1 at the point x . It
is when the function is integrated over some interval that a probability is obtained, hence
the name “density”.
The second property is a mathematical statement of the fact that a real-valued random variable must certainly lie between −∞ and ∞. That is, the event that a random variable X takes some value between −∞ and ∞ is a sure event, with probability 1.
If f (x) is a p.d.f for a continuous random variable X then we can represent y = f (x)
graphically by a curve as in Figure 5.5. By Property 1, f (x) is non-negative so the curve
cannot fall below the x -axis. By Property 2 the total area under the curve (i.e bounded by
the curve and the x -axis) must be 1.
[Fig. 5.5 Graph of a probability density function y = f(x)]
The following definition gives the relationship between a random variable and its density
function.
Definition 5.8
X is said to be a continuous random variable if there exists a function f , called the probability
density function of X such that
a) f(x) ≥ 0

b) ∫_{−∞}^{∞} f(x) dx = 1

c) P(a ≤ X ≤ b) = ∫_a^b f(x) dx,   −∞ ≤ a ≤ b ≤ ∞
Geometrically, relation (c) in Definition 5.8 means the following: the probability that a
random variable X takes the value in the interval (a, b) equals the area of the region defined
by the curve of the probability distribution y = f (x), the straight lines x = a and x = b ,
and the x -axis (see shaded region in Figure 5.6).
[Fig. 5.6 Area of Probability Density Function: the shaded region under y = f(x) between x = a and x = b]
If all non-zero possible values of the random variable X lie in the interval (a, b) then they
lie in the interval (−∞, ∞) as well. Thus
P(a ≤ X ≤ b) = ∫_a^b f(x) dx = ∫_{−∞}^{∞} f(x) dx = 1
Note
a) It is not actually necessary that f(x) be continuous everywhere. It is only necessary that the derivative

d/dx F(x) = f(x)

exists everywhere except at possibly a finite number of points, in which case the function f(x) is said to be piecewise continuous.
b) f(x) need not be less than unity for all values of X. It need only be nonnegative, piecewise continuous and have unit area; and any such function is the p.d.f. of some random variable.
Theorem 5.1
If X is a continuous random variable having p.d.f. f (x), then for any number a ,
P (X = a) = 0
Proof
P(X = a) = P(a ≤ X ≤ a) = ∫_a^a f(x) dx = 0
This theorem does not imply that the set {s|X(s) = a} is empty but that the probability
assigned to this set is zero. That is, the probability that a continuous random variable X
takes one definite value, say a , is zero even though the probability density may not be zero.
That is, if x0 ∈ X, the fact that P (X = x0 ) = 0 does not mean that x0 cannot occur. For
example, choose a point in the interval (0, 1) at random. Each point has a probability of
zero of being chosen on any particular trial but on each trial some point is chosen.
Theorem 5.2
P(a ≤ X ≤ b) = P(a ≤ X < b) = P(a < X ≤ b) = P(a < X < b)
Proof
P (a ≤ X < b) = P (X = a) + P (a < X < b)
= 0 + P (a < X < b) (by Theorem 5.1)
= P (a < X < b)
The other parts may be proved in a similar way.
That is, the probability that X is in the interval from a to b is the same whether a itself
is included or excluded or whether b also is included or excluded. This situation of course
is quite different from that where a random variable is discrete.
Example 5.7
Let X be a continuous random variable such that
f(x) = { (1/8)x,  0 < x < 4
       { 0,       elsewhere
Solution
a) It is clear f (x) ≥ 0 so Property 1 is satisfied. For the f (x) to be a p.d.f., it must
also satisfy Property 2, that is
∫_{−∞}^{∞} f(x) dx = 1

Now,

∫_0^4 f(x) dx = ∫_0^4 (1/8)x dx = 1
[Graph of f(x) = x/8 on 0 < x < 4: a straight line rising from 0 at x = 0 to 4/8 at x = 4]
Example 5.8
A random variable X has the p.d.f.
f(x) = { ax,  0 < x < 4
       { 0,   elsewhere
where a is a constant.
Solution
a) Since the function is a p.d.f.,

∫_0^4 f(x) dx = a ∫_0^4 x dx = 8a = 1

Hence a = 1/8.

b) P(2 < X < 3) = ∫_2^3 (1/8)x dx = 5/16
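Since ∫ x dx has a simple closed form, both parts of Example 5.8 reduce to a little exact arithmetic; a sketch in Python (the helper `prob` is illustrative):

```python
from fractions import Fraction

# From the normalisation ∫_0^4 a x dx = 8a = 1:
a = Fraction(1, 8)

def prob(lo, hi):
    """P(lo < X < hi) = a (hi^2 - lo^2)/2 for 0 <= lo <= hi <= 4."""
    return a * Fraction(hi * hi - lo * lo, 2)

assert prob(0, 4) == 1                 # total probability
assert prob(2, 3) == Fraction(5, 16)   # part (b)
```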
The cumulative distribution function (c.d.f.) or simply the distribution function is the most
universal characteristic of a random variable. It exists for all random variables whether they
are discrete or continuous.
Let X be a random variable and x any real number. The cumulative distribution function of X
is a function F defined as the probability that the random variable X takes a value less than
or equal to x :
F (x) = P (X ≤ x)
Let X be a discrete random variable with probability p(xi ), then the cumulative distribution
function is given by
F(x) = Σ_{xi ≤ x} p(xi)
Fig. 5.7 depicts the graph of F(x), which is discontinuous at the possible values x1, x2, · · · , xn. At these points F(x) is continuous from the right but discontinuous from the left. Because of the appearance of its graph, the cumulative distribution function of a discrete random variable is also called a staircase function or a step function: it has a jump discontinuity at each possible value X = xi, with a step at xi of height p(xi). The graph increases only through these jumps at x1, x2, · · · , xn; everywhere else, on each interval [xi, xi+1), the cumulative distribution function F(x) is constant.
[Fig. 5.7 Staircase graph of F(x): steps of heights p(x1), p(x1) + p(x2), …, rising to Σ_{i=1}^{n} p(xi) = 1 at xn]
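The staircase behaviour is easy to see by implementing F(x) = Σ_{xi ≤ x} p(xi) directly for the distribution of Example 5.4; a minimal sketch:

```python
from fractions import Fraction

# p.m.f. of the number of heads in three tosses (Example 5.4)
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8),
       2: Fraction(3, 8), 3: Fraction(1, 8)}

def cdf(x):
    """F(x) = P(X <= x): add the masses at support points not exceeding x."""
    return sum(p for xi, p in pmf.items() if xi <= x)

# F is constant between support points and jumps by p(xi) at each xi
assert cdf(0.5) == cdf(0) == Fraction(1, 8)
assert cdf(1) == Fraction(4, 8)
assert cdf(3) == 1
```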
Example 5.9
Refer to Example 5.4. Find the cumulative distribution function of X.
Solution
Step 1
We find the probability distribution of the random variable X. The probability distribution
of this example has been found already in Example 5.4. We reproduce the result here
for convenience.
xi       0    1    2    3
p(xi)   1/8  3/8  3/8  1/8
Step 2
We find the cumulative distribution function:
F(0) = P(X ≤ 0) = P(X < 0) + P(X = 0)
     = 0 + p(0) = 1/8

F(1) = P(X ≤ 1) = P(X < 0) + P(X = 0) + P(X = 1)
     = 0 + p(0) + p(1)
     = 0 + 1/8 + 3/8 = 4/8

F(2) = P(X ≤ 2) = P(X < 0) + P(X = 0) + P(X = 1) + P(X = 2)
     = 0 + p(0) + p(1) + p(2)
     = 0 + 1/8 + 3/8 + 3/8 = 7/8
F(3) = P(X ≤ 3) = P(X < 0) + P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)
     = 0 + p(0) + p(1) + p(2) + p(3)
     = 0 + 1/8 + 3/8 + 3/8 + 1/8 = 1
[Graph of F(x) for Example 5.9: a staircase rising through 1/8, 4/8 (= 1/8 + 3/8), 7/8 and 1 at x = 0, 1, 2, 3]
Example 5.10
Suppose we were given the distribution function of Example 5.9; find its probability distribution.

Solution
Note from the graph of this distribution function (see Example 5.9) that the magnitudes or heights (that is, p(xi)) of the jumps (steps) at 0, 1, 2, 3 are 1/8, 3/8, 3/8, 1/8, respectively; hence
p(x) = { 1/8,  x = 0
       { 3/8,  x = 1
       { 3/8,  x = 2
       { 1/8,  x = 3
Note
We can obtain this result without the graph by finding the difference in the adjacent values
of F (x).
Let X be a continuous random variable with probability density function f(x). Then the cumulative distribution function F(x) is given by:

F(x) = ∫_{−∞}^{x} f(t) dt
Proof
We know that
P(a < X < b) = ∫_a^b f(x) dx
By Definition 5.9
F(x) = P(X ≤ x) = P(−∞ < X ≤ x) = ∫_{−∞}^{x} f(t) dt
The typical graph of the cumulative distribution of the continuous random variable is
shown in Fig. 5.8.
[Fig. 5.8 Typical graph of the cumulative distribution function of a continuous random variable: a continuous curve rising from 0 to 1]
The graph of F(x) is continuous. Its slope need not be everywhere continuous, but where it is, the slope equals the probability density function. That is to say,

F′(x) = f(x)
Example 5.11
The probability density function of X is given by
f(x) = { 0,    x < 0
       { x/2,  0 ≤ x ≤ 2
       { 0,    x > 2

Find the cumulative distribution function F(x).
Solution
If x < 0, then

F(x) = ∫_{−∞}^{x} f(t) dt = 0
If 0 ≤ x ≤ 2, then

F(x) = 0 + ∫_0^x f(t) dt = ∫_0^x (t/2) dt = x²/4
If x > 2, then

F(x) = ∫_{−∞}^{0} f(t) dt + ∫_0^2 f(t) dt + ∫_2^x f(t) dt = 0 + 1 + 0 = 1
Thus,

F(x) = { 0,      x < 0
       { x²/4,   0 ≤ x ≤ 2
       { 1,      x > 2
[Graph of F(x) for Example 5.11: zero for x < 0, the parabola x²/4 on 0 ≤ x ≤ 2, and 1 for x > 2]
Note
The c.d.f. F (x) for a continuous random variable will always be a continuous function but
the p.d.f f (x) may or may not be.
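For Example 5.11 the closed form F(x) = x²/4 on [0, 2] can be cross-checked against a crude numerical integration of the density; a sketch (the midpoint rule and step count are arbitrary choices):

```python
def f(x):
    # density of Example 5.11
    return x / 2 if 0 <= x <= 2 else 0.0

def cdf(x, n=100_000):
    # midpoint-rule approximation of the integral of f from 0 to x
    if x <= 0:
        return 0.0
    h = x / n
    return sum(f((i + 0.5) * h) for i in range(n)) * h

assert abs(cdf(1.0) - 0.25) < 1e-6   # F(1) = 1/4
assert abs(cdf(2.0) - 1.0) < 1e-6    # total area under the density is 1
```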
Property 1
The function F(x) is a probability; consequently,
0 ≤ F (x) ≤ 1
Property 2
F(x) is a nondecreasing function of x; that is, for any two values x1 and x2 with x1 ≤ x2, F(x1) ≤ F(x2).
Property 3
The probability that a random variable X takes a value within an interval (a, b] is equal to the increment of the distribution function over that interval:

P(a < X ≤ b) = F(b) − F(a)
This means that all probabilities of interest can be computed once the cumulative function
F (x) is known.
Property 4
a) F(+∞) = lim_{x→+∞} F(x) = 1

b) F(−∞) = lim_{x→−∞} F(x) = 0
Property 5
F(x) is always right continuous; that is,

lim_{x→x0+} F(x) = F(x0)

Note
It is not true in general that F(x) is continuous from the left; that is,

lim_{x→x0−} F(x) = F(x0)

need not hold for all points x0. This is because we have defined F(x) = P(X ≤ x). If F(x) had been defined as P(X < x) (strict inequality), it would have been continuous on the left, but not on the right.
Any function satisfying all the five properties stated above is the c.d.f. of some random
variable.
Example 5.12
A random variable X has c.d.f.
F(x) = { 0,         x ≤ −5
       { x/6 + 1,   −5 < x ≤ 0
       { 1,         x > 0

Find P(−1 < X < 0).
Solution
Now

F(0) = 0/6 + 1 = 1

F(−1) = −1/6 + 1 = 5/6
Hence
P(−1 < X < 0) = F(0) − F(−1) = 1 − 5/6 = 1/6
Theorem 5.5

f(x) = F′(x)

Note
The derivative of F(x) is f(x) at all points where f(x) is continuous, that is, everywhere except at the points where f(x) is discontinuous.
Example 5.13
A random variable X has a cumulative distribution function:
F(x) = { 0,      x < 0
       { x/10,   0 ≤ x < 10
       { 1,      x ≥ 10

Find its probability density function.
Solution
f(x) = F′(x)
That is,
f(x) = F′(x) = { 0,     x < 0
              { 1/10,  0 ≤ x < 10
              { 0,     x ≥ 10
or
f(x) = { 1/10,  0 < x < 10
       { 0,     elsewhere
Example 5.14
A random variable X has the following probability density function
f(x) = { x + 1,  −1 ≤ x < 0
       { 1 − x,  0 ≤ x < 1
       { 0,      elsewhere

Find the cumulative distribution function F(x).
Solution
F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(t) dt
If x < −1, then f(x) = 0; consequently,

F(x) = 0
If −1 ≤ x < 0,

F(x) = ∫_{−∞}^{x} f(t) dt = ∫_{−1}^{x} (t + 1) dt = (x + 1)²/2
If 0 ≤ x < 1, then

F(x) = ∫_{−∞}^{x} f(t) dt = ∫_{−1}^{0} (t + 1) dt + ∫_0^x (1 − t) dt = 1 − (1 − x)²/2
If x ≥ 1,

F(x) = ∫_{−1}^{0} (t + 1) dt + ∫_0^1 (1 − t) dt = 1
Example 5.15
Consider a random variable X with probability density function
f(x) = { 1/30,  0 < x < 30
       { 0,     elsewhere
Find
(a) P (X > 25|X > 15) (b) P (X < 20|X > 15)
(c) P (X > 15|X < 22) (d) P (X < 13|X < 18)
Solution

a) P(X > 25 | X > 15) = P(X > 25)/P(X > 15) = (5/30)/(15/30) = 1/3

The reader will be asked to solve the remaining questions in Exercise 5.28.
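For a uniform density, each conditional probability in Example 5.15 reduces to a ratio of interval lengths; a sketch (the helper `prob` is illustrative):

```python
def prob(lo, hi, a=0.0, b=30.0):
    """P(lo < X < hi) for X uniform on (a, b): overlap length over total length."""
    overlap = max(0.0, min(hi, b) - max(lo, a))
    return overlap / (b - a)

# (a) P(X > 25 | X > 15) = P(X > 25) / P(X > 15) = (5/30)/(15/30)
assert abs(prob(25, 30) / prob(15, 30) - 1 / 3) < 1e-12

# (b) P(X < 20 | X > 15) = P(15 < X < 20) / P(X > 15) = (5/30)/(15/30)
assert abs(prob(15, 20) / prob(15, 30) - 1 / 3) < 1e-12
```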
The probability function and the probability distribution are complete characterisations of the probabilistic behaviour of a random variable. Nevertheless, there are other useful though relatively weak characterisations. These are the means and variances, which give an indication of the location and of the width or dispersion of a distribution. These are the topics of discussion in the next chapter.
EXERCISES
5.1 Two fair coins are tossed once. Let X represent the number of tails.
a) Find the probability distribution of X;
b) Construct a probability graph;
c) Find the distribution function of X and,
d) Sketch the graph.
5.2 A fair die is thrown once. Let X represent the number that shows up.
a) Find the probability distribution of X;
b) Construct a probability graph;
c) Find the distribution function of X and,
d) Sketch the graph.
f(x) = a/(1 + x²),   −∞ < x < ∞
5.10 Two fair dice are thrown once. Let X represent the sum of the results.
a) Find the probability distribution of X;
b) Construct a probability graph;
c) Find the distribution function of X and,
d) Sketch the graph.
Find
5.21 Let X be a continuous random variable with probability density function given by
f(x) = { 3x²,  0 ≤ x ≤ 1
       { 0,    elsewhere.

Find
a) F(x);
b) P(0.12 ≤ X < 0.98);
c) P(X > 0.5);
d) Graph both f(x) and F(x).
5.22 The length of time to failure (in hundreds of hours) for a certain transistor is a
random variable X with distribution function given by
F(x) = { 1 − e^(−x²),  x ≥ 0
       { 0,            x < 0.
5.23 A family on a University campus has a 150-gallon tank that is filled at the beginning
of each week. The weekly demand of the family shows a relative frequency behaviour
that increases steadily up to 100 gallons and then levels off between 100 and 150
gallons. If X denotes weekly demand in hundreds of gallons, the relative frequency
of demand can be modelled by
f(x) = { x,  0 ≤ x ≤ 1
       { 1,  1 < x < 1.5
       { 0,  elsewhere
a) Find F (x)
b) Find P (0 ≤ X < 0.5)
c) P (0.5 < X ≤ 1.2)
d) P (X ≥ 1|X ≤ 1.4)
5.24 As a measure of intelligence, mice are timed when going through a maze to reach a reward of food. The time (in seconds) required by any mouse is a random variable X with density function given by
f(x) = { b/x²,  x ≥ b
       { 0,     elsewhere,
5.27 The percentage of alcohol in a certain solvent is 100X %, where X may be regarded as a continuous random variable with probability density function

f(x) = { 12x²(1 − x),  0 ≤ x < 1
       { 0,            elsewhere
Find
a) P(X < 1/3)
b) P(1/3 < X < 2/3)
c) P(X ≥ 2/3)
6 NUMERICAL CHARACTERISTICS
OF RANDOM VARIABLES
6.1 INTRODUCTION
In the previous chapter, we considered the probability distributions of random variables.
In a number of cases, we need to know much less about the random variable itself. We
may merely want to have a general idea about its behaviour. That is, we may want to give
a general quantitative description of the random variable by obtaining a single value from
its values and the corresponding probabilities of these values. Usually, the first task is to
determine the location of the distribution, that is, a typical value to describe or summarise
the entire set of values.
A measure of location is a single value located in a set of values which can be used to describe
a distribution
Two general approaches may be taken to compute a measure of central location, and these
two approaches lead to two different classes of averages. The first approach is to define a
mathematical operation which might yield a very useful summary value. This approach
gives us what is known as a calculated average or “mean” or mathematical expectation.
The second approach to compute a measure of central location is to describe a special place
or location in the data and then try to find methods for giving a value for this location.
Such averages are called “positionary averages”. There are two such averages – the median
and the mode. These various measures of central location are discussed in Sections 6.3 to
6.5. The measures of “non-central” location in describing a distribution are called quantiles
or fractiles.
Density and distribution functions can be used to obtain simple descriptive measures of
the location.
229
INTRODUCTORY
PROBABILITY THEORY NUMERICAL CHARACTERISTICS OF RANDOM VARIABLES
6.3 MODE
The mode or the modal value of a distribution is a value xmod of the random variable at which
the probability distribution function takes its maximum value
Example 6.1
Toss a coin four times and define the random variable X as the number of heads. What is
the mode of the distribution?
Solution
When a coin is tossed four times, the following are all the possible combinations:

HHHH — 4 heads
HHHT, HHTH, HTHH, THHH — 3 heads
HHTT, HTHT, HTTH, THTH, THHT, TTHH — 2 heads
TTTH, TTHT, THTT, HTTT — 1 head
TTTT — no head
P(X = 4) = 1/16;   P(X = 3) = 4/16;   P(X = 2) = 6/16;
P(X = 1) = 4/16;   P(X = 0) = 1/16
The value which has the highest probability, 6/16, is 2 heads. Hence the mode is xmod = 2.
Example 6.2
The p.d.f. of a continuous random variable X is given by
f(x) = { 2x e^(−x²),  0 ≤ x < ∞
       { 0,           otherwise

Find the mode.
Solution
The derivative of f (x) is
f′(x) = 2e^(−x²) − 4x²e^(−x²) = 2e^(−x²)(1 − 2x²)

Setting f′(x) = 0,

2e^(−x²)(1 − 2x²) = 0

which yields the solution x = 1/√2. Hence the mode is 1/√2.
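The calculus answer can be sanity-checked by scanning the density on a fine grid; a minimal sketch:

```python
import math

def f(x):
    # density of Example 6.2
    return 2 * x * math.exp(-x * x) if x >= 0 else 0.0

# Locate the maximiser of f on a fine grid over [0, 3]
step = 1e-5
x_mode = max((i * step for i in range(300001)), key=f)

assert abs(x_mode - 1 / math.sqrt(2)) < 1e-4   # mode is 1/sqrt(2) ~ 0.7071
```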
The mode may not exist for some distributions and there may also be more than one
mode for a distribution. When a distribution has one mode it is said to be unimodal, two
modes is bimodal, three modes is trimodal. In general if a distribution has many modes
it is called multimodal.
Example 6.3
Roll a die once. Let X be the number of spots that show up. Find the mode.
Solution
All the six possible numbers 1, 2, 3, 4, 5, 6 have equal probabilities of 1/6. Hence the mode does not exist.
6.4 MEDIAN
The median is a value above and below which half the probability lies
Theorem 6.1
The median of the distribution of a discrete random variable (xmed) is a number x0 such that

P(X ≤ x0) ≥ 1/2   and   P(X ≥ x0) ≥ 1/2
Example 6.4
Refer to Example 6.1. Find the median for the number of heads.
Solution
P(X ≤ 2) = 11/16 > 1/2

P(X ≥ 2) = 11/16 > 1/2

Hence the median is xmed = 2.
Note
a) From Theorem 6.1, the median is given in terms of the cumulative distribution and,
hence, it can be readily determined from the graph of a cumulative distribution function.
b) The median may or may not exist for the case of a discrete random variable.
c) Theorem 6.1 may lead to two possible values. In such a case, we take the midpoint of the values as the median.
Example 6.5
Refer to Example 6.3. Find the median.
Solution
For x0 = 3,

P(X ≤ 3) = 3/6 = 0.5,   P(X ≥ 3) = 4/6 > 0.5

which satisfies Theorem 6.1. Therefore 3 may be considered a median of the distribution. However, for x0 = 4,

P(X ≤ 4) = 4/6 > 0.5,   P(X ≥ 4) = 3/6 = 0.5
Here too, Theorem 6.1 is satisfied which means that 4 may also be considered a median
of the distribution.
As a matter of fact, any number in the range from 3 to 4 inclusive satisfies Theorem 6.1
and is a median. The midpoint of the values which is 3.5 is taken as the median.
Theorem 6.2
The median of the distribution of a continuous random variable (xmed) is a number x0 such that

P(X ≤ x0) = F(x0) = 1/2   and   P(X ≥ x0) = 1 − F(x0) = 1/2
Proof
If F(x0) ≥ 1/2 and 1 − F(x0) ≥ 1/2, then F(x0) ≥ 1/2 and F(x0) ≤ 1/2, which implies that F(x0) = 1/2.
Example 6.6
Refer to Example 6.2. Find the median.
Solution
F(x0) = P(X ≤ x0) = ∫_0^{x0} 2t e^(−t²) dt = 1 − e^(−x0²)

Now equating F(x0) to 1/2 (by Theorem 6.2) we obtain

1 − e^(−x0²) = 1/2

x0 = √(ln 2)
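Equivalently, the median is the root of F(x) = 1/2, which bisection finds without the closed form; a sketch:

```python
import math

def F(x):
    # c.d.f. of the density in Example 6.2
    return 1 - math.exp(-x * x) if x >= 0 else 0.0

def solve(target, lo=0.0, hi=10.0, tol=1e-12):
    # F is increasing, so bisection on F(x) - target converges
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if F(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

x_med = solve(0.5)
assert abs(x_med - math.sqrt(math.log(2))) < 1e-9   # median = sqrt(ln 2)
```

The same routine with targets 1/4 and 3/4 recovers the quartiles discussed next.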
6.5 QUANTILES
The median is but one of a family of descriptive measures called quantiles or fractiles.
Theorem 6.3
F (xk ) = P (X ≤ xk ) = k
Note
Like the median
Example 6.7
Refer to Example 6.2. Find the lower and upper quartiles.
Solution
a) From Example 6.6, F(xp) = 1 − e^(−xp²). For the lower quartile,

1 − e^(−xp²) = 1/4

from which

xp = √(ln(4/3))

b) For the upper quartile,

1 − e^(−xp²) = 3/4

from which

xp = √(ln 4)
We have just discussed the mode, the median and the quantiles of a distribution as simple
descriptive measures of the centre. The other measure of the centre is the mean value known
as mathematical expectation in probability theory. Some other single values of particular
importance in Statistics are variance and moments of various orders. We shall discuss the
moments in the next chapter.
The term “expectation” is used here in a special, statistical sense and not as we might
understand it in ordinary everyday sense. We introduce the concept of mathematical
expectation with an example.
Example 6.8
Suppose there are 1,000 students in a school. Suppose also that on the first day of re-opening, 600 of them can spend Gh¢5 each19; 200 students can spend Gh¢50 each; 150 students can spend Gh¢100 each; 40 students can spend Gh¢200 each; and 10 students can spend Gh¢500 each. What is the expected expenditure of students on a re-opening day?
Solution
The mean amount that could be spent on the re-opening day is found by dividing the total sum

Gh¢41,000 (= 600 × 5 + 200 × 50 + 150 × 100 + 40 × 200 + 10 × 500)

by the total number of students, 1,000, giving Gh¢41.

Now, let us define a random variable X to represent the amount a student spends on a re-opening day, which takes the values

Gh¢5, Gh¢50, Gh¢100, Gh¢200 and Gh¢500
Therefore, the expected expenditure of students on a re-opening day is equal to the sum of the products of the values of the expenditure and their corresponding probabilities, namely,

(5 × 0.6) + (50 × 0.2) + (100 × 0.15) + (200 × 0.04) + (500 × 0.01) = Gh¢41
We shall now give a general definition of the mathematical expectation of a random variable.
Let x1 , x2 , · · ·, be the range of a discrete random variable X which assumes the value of xi
with the probability p(xi ), i = 1, 2 · · ·, then the mathematical expectation of X is given by
E(X) = Σ_{i=1}^{∞} xi p(xi)

provided that the series converges
If the infinite series diverges, the corresponding mathematical expectation does not exist.
Note
a) The mathematical expectation of a random variable X is often simply referred to
as the expectation or the expected value of X. It is usually used for the mean of
probability distribution as the average value in the long run.
b) The expectation of a random variable takes the unit of measurement of that random
variable. For example, if the random variable assumes its values in kilometre, its
expectation is also in kilometre.
Example 6.9
Given the following data, find E(X).
xi       0    1    2    3
p(xi)   1/8  3/8  3/8  1/8
Solution
xi         0     1     2     3    Total
p(xi)     1/8   3/8   3/8   1/8     1
xi p(xi)   0    3/8   6/8   3/8   12/8

Hence,

E(X) = Σ_{i=1}^{4} xi p(xi) = 0 + 3/8 + 6/8 + 3/8 = 3/2
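The computation in Example 6.9 is just the sum of value-times-probability products; a minimal sketch:

```python
from fractions import Fraction

xs = [0, 1, 2, 3]
ps = [Fraction(1, 8), Fraction(3, 8), Fraction(3, 8), Fraction(1, 8)]

# E(X) = sum of x_i p(x_i) over the support
E = sum(x * p for x, p in zip(xs, ps))
assert E == Fraction(3, 2)
```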
Before we discuss the expectation of a continuous random variable, we shall explain how
the arithmetic mean of a finite set of real numbers is an example of an “expectation”.
Suppose the value xi occurs with frequency fi in a set of N observations; then

p(xi) = P(X = xi) = fi/N

and

E(X) = Σ xi p(xi) = (1/N) Σ fi xi

This is the formula of the arithmetic mean for grouped data. When each value of X is equally likely and is separately listed, all fi are equal to 1, and

E(X) = (1/N) Σ_{i=1}^{N} xi
Similar to the discrete case, the integral may or may not converge and the expectation may
or may not exist.
Example 6.10
A random variable X has the p.d.f
f(x) = { (1/8)x,  0 ≤ x ≤ 4
       { 0,       elsewhere

Find E(X).
Solution
E(X) = ∫_0^4 x · (1/8)x dx = (1/8) ∫_0^4 x² dx = (1/8) [x³/3]_0^4 = 8/3
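The integral above can be approximated numerically as a sanity check. A minimal Python sketch using a midpoint Riemann sum (the density f(x) = x/8 on [0, 4] is the one from Example 6.10; the exact answer is 8/3):

```python
# Sketch: E(X) = ∫ x f(x) dx for f(x) = x/8 on [0, 4],
# approximated with a composite midpoint rule.

def f(x):
    return x / 8 if 0 <= x <= 4 else 0.0

def expectation(f, a, b, n=100_000):
    h = (b - a) / n
    return sum((a + (i + 0.5) * h) * f(a + (i + 0.5) * h) * h for i in range(n))

print(expectation(f, 0, 4))  # ≈ 2.6667 = 8/3
```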
In discussing the properties of mathematical expectation, we shall assume that the expectation
itself exists. The expectation E(X) is said to exist (that is, it is finite or E(X) < ∞ ) if X is
a bounded random variable20.
Theorem 6.4
E(c) = c
Proof
A constant c may be considered as a random variable which takes the single value c with probability 1. Its expectation is therefore:
E(c) = c × 1 = c
Theorem 6.5
E(cX) = cE(X)
The proof of this theorem is left to the reader, see Exercise 6.3. The theorem means that if
we change the units of measurement of X, then the expectation of X changes in the same
way as X.
Example 6.11
For the data in Example 6.9, find the expectation for 2X and verify that it is equal to 2E(X) .
Solution

2xi 0 2 4 6 Total
p(xi) 1/8 3/8 3/8 1/8 1

E(2X) = Σ_{i=1}^{4} 2xi p(xi)
      = 0(1/8) + 2(3/8) + 4(3/8) + 6(1/8)
      = 0 + 6/8 + 12/8 + 6/8 = 3

From Example 6.9,

E(X) = 3/2

so that

2E(X) = 2 × (3/2) = 3

Hence

E(2X) = 2E(X)
Theorem 6.6
E(aX + b) = aE(X) + b
Proof
Suppose X is a discrete random variable. Then

E(aX + b) = Σ_x (ax + b) p(x)
          = Σ_x ax p(x) + Σ_x b p(x)
          = a Σ_x x p(x) + b Σ_x p(x)
          = aE(X) + b   (since by Definition 5.6, Σ_x p(x) = 1)

If X is continuous,

E(aX + b) = ∫_{−∞}^{∞} (ax + b) f(x) dx
          = a ∫_{−∞}^{∞} x f(x) dx + b ∫_{−∞}^{∞} f(x) dx
          = aE(X) + b   (since by Definition 5.7, ∫_{−∞}^{∞} f(x) dx = 1)
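The linearity property E(aX + b) = aE(X) + b is easy to confirm numerically. A short Python check on a small discrete distribution (values and probabilities are illustrative):

```python
# Numeric check of Theorem 6.6, E(aX + b) = aE(X) + b,
# on an illustrative discrete distribution.
values = [0, 1, 2, 3]
probs = [1/8, 3/8, 3/8, 1/8]
a, b = 3, 2

E_X = sum(x * p for x, p in zip(values, probs))
E_aXb = sum((a * x + b) * p for x, p in zip(values, probs))
print(E_aXb, a * E_X + b)  # both 6.5
```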
Example 6.12
Suppose the distribution of X is given by the following table:

x -2 4
p(x) 0.4 0.6

Show that E(3X + 2) = 3E(X) + 2.

Solution

If a = 3 and b = 2, then we are required to show that

E(3X + 2) = 3E(X) + 2

Now, 3X + 2 takes the values 3(−2) + 2 = −4 and 3(4) + 2 = 14 with probabilities 0.4 and 0.6, so

E(3X + 2) = (−4)(0.4) + (14)(0.6) = 6.8

But

E(X) = (−2)(0.4) + (4)(0.6) = 1.6

and

3E(X) + 2 = 3(1.6) + 2 = 6.8

Hence E(3X + 2) = 3E(X) + 2.
Theorem 6.6 states that the expectation of a linear function of a random variable is that same linear function of the expectation. This is not generally true for nonlinear functions. For example, it can be illustrated that

a) E(X²) ≠ [E(X)]²
b) E(ln X) ≠ ln E(X)
c) E(1/X) ≠ 1/E(X)
Example 6.13

Given the following table, show that E(X²) ≠ [E(X)]².

x -1 1
p(x) 1/2 1/2

Solution

E(X) = (−1)(1/2) + (1)(1/2) = 0

Now,

E(X²) = (−1)²(1/2) + (1)²(1/2) = 1

Thus,

E(X²) = 1 ≠ [E(X)]² = 0
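A quick numeric version of this example. The two-point distribution below is the one from Example 6.13:

```python
# Numeric illustration that E(g(X)) is generally not g(E(X)): for X taking
# -1 and 1 with probability 1/2 each, E(X) = 0 but E(X^2) = 1.
values = [-1, 1]
probs = [0.5, 0.5]

E_X = sum(x * p for x, p in zip(values, probs))
E_X2 = sum(x**2 * p for x, p in zip(values, probs))
print(E_X, E_X2)  # 0.0 and 1.0
```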
Theorem 6.7
Theorem 6.8
|E(X)| ≤ E(|X|)
Theorem 6.9
Theorem 6.10
The expectation of the deviation of the random variable X from its expectation is zero:
E(X − μ) = 0
where μ = E(X)
Proof

From Theorems 6.4 and 6.6 (with a = 1 and b = −μ),

E(X − μ) = E(X) − μ = μ − μ = 0
6.7 VARIANCE
6.7.1 DEFINITION OF VARIANCE
The expected value of a random variable X is its average value which can be viewed as an
indication of the central value of the density or frequency function. The expected value is,
therefore, sometimes referred to as the location parameter. To describe the behaviour of a
random variable adequately we also need to have some idea of how the values are dispersed
about the mean. One such measure of dispersion is the variance of the random variable,
denoted by V ar(X) or σ 2 .
The variance of a random variable X is the expectation of the square of the deviation of the random variable from its expected value:

Var(X) = E{[X − E(X)]²}

or

Var(X) = E[(X − μ)²], where μ = E(X)
The unit of measurement of the variance is the unit of the random variable squared so the
frequently considered measure of dispersion is the standard deviation which has the same
unit of measurement as the random variable itself.
The positive square root of the variance, denoted σ, is called the standard deviation:

σ = √Var(X)
Even though it is the standard deviation that measures how dispersed the probability distribution is about its mean (that is, how spread out, on average, the values of the random variable are about its expectation), the dispersion is usually defined in terms of the variance.
Theorem 6.11

Var(X) = E(X²) − [E(X)]²

or

Var(X) = E(X²) − μ²

where μ = E(X)

Proof

Using Definition 6.3 and Theorem 6.6, we have21

Var(X) = E[(X − μ)²] = E(X² − 2μX + μ²) = E(X²) − 2μE(X) + μ² = E(X²) − 2μ² + μ² = E(X²) − μ²
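The equivalence of the defining form and the computational form of the variance is easy to confirm numerically. A minimal Python sketch (the distribution is illustrative):

```python
# Sketch: Var(X) = E[(X - mu)^2] and the computational form
# E(X^2) - [E(X)]^2 agree, as Theorem 6.11 states.

def variance_by_definition(values, probs):
    mu = sum(x * p for x, p in zip(values, probs))
    return sum((x - mu) ** 2 * p for x, p in zip(values, probs))

def variance_computational(values, probs):
    e_x = sum(x * p for x, p in zip(values, probs))
    e_x2 = sum(x * x * p for x, p in zip(values, probs))
    return e_x2 - e_x ** 2

values = [0, 1, 2, 3]
probs = [1/8, 3/8, 3/8, 1/8]
print(variance_by_definition(values, probs),
      variance_computational(values, probs))  # both 0.75
```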
Definition 6.9 VARIANCE
(Discrete Variable)
Let X be a discrete random variable which takes on values x1 , x2 , · · · with respective probabilities
p(x1 ), p(x2 ), · · · Then
Var(X) = Σ_i (xi − μ)² p(xi)
where μ = E(X)
Example 6.14
For the data in Example 6.9, find
a) V ar(X);
b) the standard deviation of X.
Solution

From Example 6.9, E(X) = μ = 3/2. Now,

x 0 1 2 3 Total
p(x) 1/8 3/8 3/8 1/8 1
(x − μ)² p(x) 9/32 3/32 3/32 9/32 3/4

Hence,

a) Var(X) = Σ_i (xi − μ)² p(xi) = 3/4 = 0.75

b) σ = √0.75 = 0.866
As we have realized, it is cumbersome to use the formula in Definition 6.9 to calculate the variance. For computational purposes, it is advisable to use Theorem 6.11, which leads to the following theorem.
Theorem 6.12
Var(X) = Σ_{i=1}^{n} xi² p(xi) − [Σ_{i=1}^{n} xi p(xi)]²
Proof
Var(X) = Σ_i (xi − μ)² p(xi)   (from Definition 6.9)
       = Σ_i [xi² p(xi) − 2μ xi p(xi) + μ² p(xi)]
       = Σ_i xi² p(xi) − 2μ Σ_i xi p(xi) + μ² Σ_i p(xi)
       = Σ_i xi² p(xi) − 2μ² + μ²   (since Σ_i p(xi) = 1)
       = Σ_i xi² p(xi) − μ²
       = Σ_i xi² p(xi) − [Σ_i xi p(xi)]²
Example 6.15
Work out Example 6.14 using Theorem 6.12.
Solution
x p(x) xp(x) x²p(x)
0 1/8 0 0
1 3/8 3/8 3/8
2 3/8 6/8 12/8
3 1/8 3/8 9/8
Total 1 3/2 3

Hence

Var(X) = Σ_{i=1}^{n} xi² p(xi) − [Σ_{i=1}^{n} xi p(xi)]² = 3 − (3/2)² = 3/4
Example 6.16
Refer to Example 6.10. Find V ar(X).
Solution
From Example 6.10, E(X) = 8/3.

Now

E(X²) = ∫_0^4 x² · (1/8)x dx = (1/8) [x⁴/4]_0^4 = 8
Hence

Var(X) = E(X²) − [E(X)]² = 8 − (8/3)² = 8/9
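The continuous-case computation can also be checked numerically. A Python sketch approximating the first and second moments of the density f(x) = x/8 on [0, 4] with a midpoint rule (exact variance is 8/9):

```python
# Numeric check of Example 6.16: for f(x) = x/8 on [0, 4],
# Var(X) = E(X^2) - [E(X)]^2 = 8 - (8/3)^2 = 8/9.

def moment(k, n=200_000):
    a, b = 0.0, 4.0
    h = (b - a) / n
    return sum(((a + (i + 0.5) * h) ** k) * ((a + (i + 0.5) * h) / 8) * h
               for i in range(n))

mean, second = moment(1), moment(2)
print(second - mean**2)  # ≈ 0.8889 = 8/9
```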
Theorem 6.13
V ar(c) = 0
Proof

Var(c) = E{[c − E(c)]²} = E[(c − c)²] = E(0) = 0
Theorem 6.14
V ar(aX) = a2 V ar(X)
Proof

Applying Definition 6.3 and Theorem 6.5,

Var(aX) = E{[aX − E(aX)]²} = E{[aX − aE(X)]²} = E{a²[X − E(X)]²} = a²E{[X − E(X)]²}

But by Definition 6.3, the expression E{[X − E(X)]²} is the variance of the random variable X. Hence

Var(aX) = a²Var(X)
Example 6.17
For the data in Example 6.9, calculate Var(2X) and check whether Theorem 6.14 is satisfied.
Solution
From Example 6.14, Var(X) = 0.75.

We shall now calculate Var(2X) from the following table.

2x (2x)² p(x) (2x)²p(x)
0 0 1/8 0
2 4 3/8 12/8
4 16 3/8 48/8
6 36 1/8 36/8
Total 96/8 = 12

Thus E[(2X)²] = 12, and from Example 6.11, E(2X) = 3, so that

[E(2X)]² = 9
so that

Var(2X) = E[(2X)²] − [E(2X)]² = 12 − 9 = 3 = 4 × 0.75 = 2²Var(X)
Theorem 6.15
V ar(a X + b) = a2 V ar(X)
Proof

Using the definition of variance (Definition 6.3) and Theorem 6.6,

Var(aX + b) = E{[aX + b − E(aX + b)]²} = E{[aX + b − aE(X) − b]²} = E{a²[X − E(X)]²} = a²Var(X)
Example 6.18
Refer to Example 6.12. Verify whether Theorem 6.15 is valid or not.
Solution

From Example 6.12, a = 3 and b = 2, so Y = 3X + 2 takes the values −4 and 14 with probabilities 0.4 and 0.6.

E[(3X + 2)²] = 16(0.4) + 196(0.6) = 124

E(3X + 2) = (−4)(0.4) + (14)(0.6) = 6.8

so that

Var(3X + 2) = 124 − (6.8)² = 124 − 46.24 = 77.76

Now,

E(X²) = (−2)²(0.4) + (4)²(0.6) = 1.6 + 9.6 = 11.2

and E(X) = 1.6, so

Var(X) = 11.2 − (1.6)² = 11.2 − 2.56 = 8.64

Hence

Var(3X + 2) = 77.76 = 9 × 8.64 = 3²Var(X)
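The invariance of the variance under a shift, and its scaling by a², can be checked numerically with the data of Example 6.12 (X taking −2 and 4 with probabilities 0.4 and 0.6):

```python
# Numeric check of Theorem 6.15, Var(aX + b) = a^2 Var(X),
# with the data of Example 6.12 and Y = 3X + 2.
values = [-2, 4]
probs = [0.4, 0.6]
a, b = 3, 2

def var(vals, ps):
    mu = sum(x * p for x, p in zip(vals, ps))
    return sum((x - mu) ** 2 * p for x, p in zip(vals, ps))

vx = var(values, probs)
vy = var([a * x + b for x in values], probs)
print(vx, vy)  # 8.64 and 77.76 = 9 * 8.64
```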
EXERCISES
6.5 The probability distribution of a random variable X is represented in the table below.
X -2 -1 0 0 4
Find
X 1 2 3 4 5
Find
6.11 If

f(x) = λe^{−λx}, x > 0
       0, x ≤ 0

determine E(X^k).
INTRODUCTORY
PROBABILITY THEORY MOMENTS AND MOMENT-GENERATING FUNCTIONS
In this chapter, we shall consider a set of numerical descriptive measures that under rather
general conditions, uniquely determine the general probability distribution p(x). These measures
can be defined in terms of “moments” of a probability distribution of a random variable.
Strictly speaking, moments are associated with the distribution of X rather than with X
itself. Thus, though we may be speaking of the moment of X, we mean the moment of the
distribution of X.
There are also what are called the factorial moment and the absolute moment. We should note that, unlike the moments about the origin, about the mean and about a point, the factorial moment in particular has theoretical application in advanced statistical theory but is of less practical use.
The most basic moment of a random variable X is the one in relation to the origin of the
coordinate.
The kth moment of a random variable X about the origin is defined to be the expectation

E(X^k)

The kth moment about the origin is also called the ordinary moment and is denoted by μ′k.
In our discussion, we shall always assume that the moment exists.
Theorem 7.1

The kth moment about the origin of a random variable X exists if

E(|X|^k) < ∞

The moment about the origin of order zero always exists and equals 1.
The k th moment about the origin of the distribution of the discrete random variable X whose
probability mass function is p(x) is given by
μ′k = E(X^k) = Σ_{i=1}^{∞} xi^k p(xi)
The k th moment about the origin of the distribution of the continuous random variable X whose
probability density function is f (x) is given by
μ′k = E(X^k) = ∫_{−∞}^{∞} x^k f(x) dx
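For the discrete case, the moments about the origin can be computed directly from the definition. A short Python sketch (the distribution is illustrative):

```python
# Sketch: k-th moments about the origin, E(X^k) = sum_i x_i^k p(x_i),
# for an illustrative discrete distribution.

def raw_moment(k, values, probs):
    return sum(x ** k * p for x, p in zip(values, probs))

values = [0, 1, 2, 3]
probs = [1/8, 3/8, 3/8, 1/8]
print([raw_moment(k, values, probs) for k in range(4)])  # [1.0, 1.5, 3.0, 6.75]
```

The zeroth moment is 1 (the probabilities sum to one) and the first moment is the mean, as stated above.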
When k = 1,

μ′1 = Σ_i xi p(xi) = E(X)   (discrete case)

μ′1 = ∫_{−∞}^{∞} x f(x) dx = E(X)   (continuous case)

That is, the first moment about the origin is the mean of the distribution. Since the moment μ′1 occurs so often in Probability and Statistics, it is given the symbol μ; that is,

E(X) = μ′1 = μ
Before giving the definition of the moment about the mean, we shall introduce the concept
of centred random variable. Consider a random variable X with an expectation μ .
261
INTRODUCTORY
PROBABILITY THEORY MOMENTS AND MOMENT-GENERATING FUNCTIONS
The centred random variable corresponding to X is defined as

X* = X − μ
Theorem 7.2
E(X ∗ ) = E(X − μ) = 0
Proof
See the proof of Theorem 6.10.
The centred random variable is obviously equivalent to moving the origin of the co-ordinate
to the mean (central point) along the horizontal axis. The moment of the centred random
variable is called the central moment or the moment about the mean.
The kth moment of the centred random variable is therefore

E[(X − μ)^k]
The k th central moment of a discrete random variable X whose probability mass function is
p(xi ) is given by
μk = Σ_{i=1}^{∞} (xi − μ)^k p(xi)
The k th central moment of a continuous random variable X whose probability density function
is f (x) is given by
μk = ∫_{−∞}^{∞} (x − μ)^k f(x) dx
Theorem 7.3
μ1 = E[(X − μ)] = 0
Proof
See the proof of Theorem 6.10.
Theorem 7.4

μ2 = μ′2 − μ²

where

μ′2 = E(X²) is the second moment about the origin;
μ = E(X) is the first moment about the origin, which is the mean.

Proof

From Theorem 6.11,

μ2 = E[(X − μ)²] = E(X²) − μ² = μ′2 − μ²
Theorem 7.5

μ3 = μ′3 − 3μμ′2 + 2μ³

Proof

μ3 = E[(X − μ)³]
   = E[X³ − 3X²μ + 3Xμ² − μ³]
   = E(X³) − 3μE(X²) + 3μ²E(X) − μ³   (by the linearity of expectation)
   = μ′3 − 3μμ′2 + 3μ² · μ − μ³
   = μ′3 − 3μμ′2 + 2μ³
Central moments have an advantage over the other moments: the first central moment, as has been seen earlier (Theorem 7.3), is always equal to zero, and the second central moment is the minimum value of the second moment of a random variable about any arbitrary point (see Theorem 6.18 in the sequel).
In general, moments may be considered not only in relation to the origin of the coordinate
(moment about origin) or in relation to the expectation (central moments) but also in
relation to any arbitrary point a .
Theorem 7.6

The kth factorial moment of X is defined as

mf = E[X(X − 1) ··· (X − k + 1)]
The kth absolute moment of X is defined as

ma = E(|X|^k)
Theorem 7.7
If an absolute moment of a random variable X of order k exists, then all moments (ordinary, central, factorial and absolute) of order r < k exist.
Approximation of Distributions
Moments are used to approximate the probability distribution of a random variable (usually an estimator). Under some fairly general conditions, moments can be used to show that two random variables X and Y have identical probability distributions.
Variance
The second central moment, μ2 , is frequently used in practice. It is denoted by the special
symbol σ 2 and as noted earlier, is called the variance of the distribution which is used to
determine the degree of concentration of the distribution about the mean μ . For central
moments of higher than second order, interpretations in terms of shape may not be accurate.
However, the third and the fourth moments are sometimes used in statistics.
Skewness
The third central moment, μ3, can be used to determine the symmetry of a distribution. A distribution is symmetrical about its mean μ if

f(μ + x) = f(μ − x)

where μ = E(X)
It can be shown that for a symmetrical distribution, the odd number central moments
vanish. This suggests that the odd number central moments of a distribution will measure
its departure from symmetry, that is, its asymmetry.
The simplest of the odd central moments is the third central moment. Since all deviations are cubed, negative and positive deviations will tend to cancel each other, giving μ3 = 0 if the distribution is symmetrical about μ.
Theorem 7.8

If the distribution is symmetrical about its mean, then

μ3 = E[(X − E(X))³] = 0
If the distribution is skewed to the right, then μ3 > 0 . On the other hand, given a left
(negatively) skewed distribution, we would have μ3 < 0. We should note that μ3 alone is a rather poor measure of skewness, since its size is influenced by the units used to measure the values of X.
267
INTRODUCTORY
PROBABILITY THEORY MOMENTS AND MOMENT-GENERATING FUNCTIONS
To make this measure dimensionless, and thereby allow us to compare the symmetry of two or more distributions whose units of measurement are different, we may form a relative measure by dividing μ3 (the third moment about the mean) by the cube of the standard deviation (that is, the skewness of the distribution relative to its degree of spread). The ratio

a3 = μ3/σ³

is known as the coefficient of skewness and is used to measure lack of symmetry.
It can be shown that |a3| ≤ 1. The table below gives the interpretation of the value of a3.

a3 = 0 Distribution is symmetric
a3 > 0 Distribution is positively skewed (skewed to the right)
a3 < 0 Distribution is negatively skewed (skewed to the left)
The table, however, cannot be taken as a rule of thumb. It must be noted that although a3 = 0 for symmetrical distributions, the converse is not true; it is possible for this quantity to vanish for non-symmetrical distributions. For this reason, the use of a3 as a measure of lack of symmetry is limited.
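The coefficient of skewness can be computed directly from the central moments. A minimal Python sketch on an illustrative, mildly right-skewed discrete distribution (the values and probabilities are assumptions for illustration only):

```python
# Sketch of the coefficient of skewness a3 = mu_3 / sigma^3 for a
# discrete distribution (illustrative, right-skewed example).

def central_moment(k, values, probs):
    mu = sum(x * p for x, p in zip(values, probs))
    return sum((x - mu) ** k * p for x, p in zip(values, probs))

values = [0, 1, 2]
probs = [0.5, 0.3, 0.2]
sigma = central_moment(2, values, probs) ** 0.5
a3 = central_moment(3, values, probs) / sigma ** 3
print(round(a3, 3))  # positive: the distribution is skewed to the right
```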
Kurtosis
The fourth central moment μ4 = E[(X − μ)⁴] is always non-negative and can be used to measure the degree of peakedness (sharpness of the spike) of a unimodal p.d.f. The relative measure of peakedness,

a4 = μ4/σ⁴

is known as the coefficient of kurtosis.
Moments higher than μ4 are only of theoretical interest and are usually difficult to interpret.

The moment-generating function of a random variable X is defined as

MX(t) = E(e^{tX})
Example 7.1
A discrete random variable X has probability mass function

P(X = x) = e^{−λ}λ^x / x!,  x = 0, 1, 2, ...

Find the moment-generating function of X.
Solution
MX(t) = Σ_{x=0}^{∞} e^{tx} · e^{−λ}λ^x / x!
      = e^{−λ} Σ_{x=0}^{∞} (λe^t)^x / x!

Using the series e^y = Σ_{x=0}^{∞} y^x / x! with y = λe^t, we have

MX(t) = e^{−λ} e^{λe^t} = e^{λ(e^t − 1)}
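The closed form just derived can be checked against a partial sum of the defining series. A Python sketch (the parameter values λ = 2 and t = 0.3 are illustrative):

```python
# Numeric check of Example 7.1: for a Poisson(lam) variable, the series
# sum_x e^{tx} e^{-lam} lam^x / x! should match the closed form
# e^{lam (e^t - 1)}.
import math

lam, t = 2.0, 0.3
series = sum(math.exp(t * x) * math.exp(-lam) * lam ** x / math.factorial(x)
             for x in range(100))
closed = math.exp(lam * (math.exp(t) - 1))
print(series, closed)  # agree to machine precision
```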
If X is a continuous random variable with probability density function f (x), then the moment-
generating function is defined as
MX(t) = ∫_{−∞}^{∞} e^{tx} f(x) dx
Example 7.2
A continuous random variable X has probability density function

f(x) = 2e^{−2x}, x ≥ 0
       0, elsewhere

Find the moment-generating function of X.
Solution
MX(t) = E(e^{tX})
      = ∫_0^∞ e^{tx} · 2e^{−2x} dx
      = ∫_0^∞ 2e^{−(2−t)x} dx
      = 2 [e^{−(2−t)x} / −(2 − t)]_0^∞
      = 2/(2 − t),  t < 2
Theorem 7.9
A moment-generating function for X exists if there exists a positive constant a such that MX (t)
is finite for |t| ≤ a
The moment-generating function of a random variable X exists only when the series is finite (in the discrete case) or the improper integral has a definite value (in the continuous case). This may not always be true because, even if the moments are all finite and have definite values, the generating function may not converge for any value of t other than 0. It is for this reason that more advanced books on probability theory tend to use what is called the characteristic function (instead of the moment-generating function), which always exists for all random variables22. This is defined as in Definition 7.16 but with t replaced by it, where i = √−1. This is also so in all the proofs of the corresponding theorems. Advanced readers will recognize the moment-generating function as related to the Laplace transform of the density f, and the characteristic function as related to its Fourier transform.

In this text, whenever we make use of the moment-generating function, we shall always assume it exists.
The moment-generating function is chiefly used for:

a) Finding any of the moments for X. If we can find E(e^{tX}), then we can find any of the moments for X.
b) Proving that a random variable possesses a particular probability distribution p(x). If MX(t) exists for a probability distribution p(x), it is unique (see Theorem 7.16).
Recall the series expansion

e^x = 1 + x + x²/2! + x³/3! + ··· + x^n/n! + ···

Thus

e^{tX} = 1 + tX + (tX)²/2! + (tX)³/3! + ··· + (tX)^n/n! + ···

Now,

MX(t) = E(e^{tX}) = E[1 + tX + (tX)²/2! + (tX)³/3! + ··· + (tX)^n/n! + ···]
For a finite sum, the expected value of the sum equals the sum of the expected values (shown
in my subsequent books). However, here we are dealing with an infinite series and hence
cannot, immediately, apply such a result. It turns out, however, that under fairly general
conditions, this operation is still valid. We shall assume that the required conditions are
satisfied and proceed accordingly.
Hence, for

k = 1, the coefficient of t is E(X),
k = 2, the coefficient of t²/2! is E(X²),
k = 3, the coefficient of t³/3! is E(X³),
...
k = n, the coefficient of t^n/n! is E(X^n).
Theorem 7.10

MX^{(k)}(0) = E(X^k)

where MX^{(k)}(t) = d^k MX(t)/dt^k is the kth derivative of MX(t) with respect to t (since MX is a function of the real variable t).

Proof

MX(t) = E(e^{tX}) = 1 + tE(X) + t²E(X²)/2! + ··· + t^n E(X^n)/n! + ···

Then

M′X(t) = E(X) + tE(X²) + ··· + t^{n−1} E(X^n)/(n − 1)! + ···

Setting t = 0,

M′X(0) = E(X)

Similarly,

M″X(t) = E(X²) + tE(X³) + ··· + t^{n−2} E(X^n)/(n − 2)! + ···

Setting t = 0,

M″X(0) = E(X²)

and, in general,

MX^{(n)}(0) = E(X^n)

Thus, from a knowledge of the function MX(t), the moments may be “generated”. Hence the name “moment-generating function”.
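Theorem 7.10 can be illustrated numerically by differentiating an m.g.f. with finite differences. A Python sketch using MX(t) = 2/(2 − t), the m.g.f. found in Example 7.2, for which M′X(0) = 1/2 = E(X) and M″X(0) = 1/2 = E(X²):

```python
# Sketch: "generating" moments by numerically differentiating the m.g.f.
# M_X(t) = 2/(2 - t) from Example 7.2.

def M(t):
    return 2.0 / (2.0 - t)

h = 1e-5
first = (M(h) - M(-h)) / (2 * h)             # central difference ≈ M'(0)
second = (M(h) - 2 * M(0) + M(-h)) / h**2    # second difference ≈ M''(0)
print(first, second)  # ≈ 0.5 and ≈ 0.5
```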
Example 7.3
For the density function of Example 7.2, find
a) the mean;
b) the variance.
Solution
From Example 7.2, the moment-generating function was found to be

MX(t) = 2/(2 − t)
      = (1 − t/2)^{−1}
      = 1 + t/2 + t²/4 + t³/8 + ···   (i)

a) M′X(t) = 1/2 + 2t/4 + 3t²/8 + ···,

so that

E(X) = M′X(0) = 1/2

b) M″X(t) = 2/4 + 6t/8 + ···,

so that

E(X²) = M″X(0) = 2/4 = 1/2

Hence,

Var(X) = E(X²) − [E(X)]² = 1/2 − (1/2)² = 1/4

In fact, we could have obtained E(X) and E(X²) directly from (i). Recall that the coefficient of t^k/k! is the kth moment about the origin. Now for

k = 1, the coefficient of t, which is E(X), is 1/2;
k = 2, the coefficient of t²/2!, which is E(X²), is 1/2, since t²/4 = (t²/2!)(1/2).
The moment-generating function has a number of useful properties. They are stated in the theorems below, and the reader is encouraged to understand their import very well.
Property 1
Theorem 7.11

MX(0) = 1

Proof

MX(0) = E(e^{0·X}) = E(1) = 1
Property 2
Theorem 7.12
Suppose that the random variable X has a moment-generating function MX(t). Let Y = αX (α is a constant). Then MY(t), the moment-generating function of the random variable Y, is given by

MY(t) = MX(αt)

Proof

MY(t) = E(e^{tY}) = E(e^{tαX}) = E(e^{(αt)X}) = MX(αt)

That is, to find the moment-generating function of Y = αX, evaluate the moment-generating function of X at αt (instead of at t).
Property 3
Theorem 7.13

Suppose that the random variable X has moment-generating function MX(t). Let Y = αX + β (α, β are constants). Then MY(t), the moment-generating function of the random variable Y, is given by

MY(t) = e^{βt} MX(αt)

Proof

MY(t) = E(e^{t(αX+β)}) = e^{βt} E(e^{(αt)X}) = e^{βt} MX(αt)
Example 7.4

The moment-generating function of a probability distribution is given by

MX(t) = 1/(1 − t/μ)

Find the moment-generating function of Y, where Y = 2X − 3.
Solution

By Theorem 7.13 with α = 2 and β = −3,

MY(t) = e^{−3t} MX(2t) = e^{−3t} / (1 − 2t/μ)
Property 4
Theorem 7.14

Suppose that the random variable X has moment-generating function MX(t). Let W = α(X + β). Then MW(t), the moment-generating function of the random variable W, is given by

MW(t) = e^{αβt} MX(αt)

Proof

Replacing β by αβ in Theorem 7.13, the result follows.

A special case of Theorem 7.14 is when β = −μ and α = 1/σ.
Property 5
Theorem 7.15
Suppose that the random variable X has moment-generating function MX(t). Let β = −μ and α = 1/σ. Then

M_{(X−μ)/σ}(t) = e^{−μt/σ} MX(t/σ)
Theorem 7.16
Let X and Y be two random variables with moment-generating functions MX(t) and MY(t), respectively. If

MX(t) = MY(t)

for all values of t, then X and Y have the same probability distribution.
Note
It is very important to understand exactly what the theorem says.
a) The theorem says that if two random variables have the same moment-generating
function, then they have the same probability distribution. That is, the moment-
generating function uniquely determines the probability distribution of the random
variable. What this implies is that it is impossible for random variables with different
probability distributions to have the same moment-generating functions.
b) What the theorem does not imply is that if two distributions have the same moments, then they are identical at all points. This is because in some cases, even though the moments all exist, the moment-generating function does not, because the limit

lim_{n→∞} Σ_{i=0}^{n} t^i m_i / i!

may fail to exist for any t other than 0.
Theorem 7.17
Suppose that X and Y are independent random variables. Let Z = X + Y. Let MX(t), MY(t) and MZ(t) be the moment-generating functions of the random variables X, Y and Z, respectively. Then

MZ(t) = MX(t)MY(t)
Proof
MZ (t) = E(etZ )
= E(et(X+Y ) )
= E(etX etY )
= E(etX )E(etY ) by independence of X and Y
= MX (t)MY (t)
Note
This theorem may be generalized as follows.
Corollary 7.1
If X1 , X2 , · · · , Xn are independent random variables with moment-generating functions
MXi(t), i = 1, 2, ···, n, then MZ(t), the moment-generating function of

Z = X1 + X2 + ··· + Xn

is given by

MZ(t) = MX1(t) MX2(t) ··· MXn(t)

That is, the moment-generating function of the sum of independent random variables is equal to the product of their moment-generating functions.
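The product property for sums of independent random variables can be verified exactly by enumeration on a small discrete example. A Python sketch with two independent Bernoulli(p) variables (the choice of Bernoulli and of p and t is illustrative):

```python
# Numeric check of Theorem 7.17: for independent X and Y,
# M_{X+Y}(t) = M_X(t) M_Y(t). Two independent Bernoulli(p) variables
# are enumerated exactly.
import math

p, t = 0.3, 0.7
px = {0: 1 - p, 1: p}          # distribution of X (and of Y)

M_X = sum(math.exp(t * x) * q for x, q in px.items())
M_Z = sum(math.exp(t * (x + y)) * qx * qy
          for x, qx in px.items() for y, qy in px.items())
print(M_Z, M_X * M_X)  # equal up to rounding
```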
From Chapter 5 to this chapter, we have been discussing in general terms the probability
distributions of discrete and continuous random variables. In Volume II, we shall discuss
special probability distributions. Under discrete probability distributions we shall consider
the Bernoulli, Binomial, Geometric, Negative Binomial, Poisson, Hypergeometric and Multinomial distributions, in that order. Under continuous distributions, the Uniform, Exponential, Gamma, Beta and Normal distributions will be considered.
EXERCISES
7.2 The time in hours between arrivals T of customers in a store, has probability
density function
fT(t) = 10 exp(−10t), t ≥ 0
        0, t < 0
7.7 Refer to Exercise 5.14. Using the moment-generating function, find (a) E(X) (b)
V ar(X).
a) a and b ,
b) the variance of X.
7.9 If X and Y are independent random variables with respective probability mass
functions
p(x) = e^{−μp}(μp)^x / x!,  x = 0, 1, 2, ···

and

p(y) = e^{−μp}(μp)^y / y!,  y = 0, 1, 2, ···

find the moment-generating function of X + Y.
7.11 If X and Y are independent random variables with respective probability mass
functions
p(x) = C(n, x) p^x q^{n−x},  x = 0, 1, 2, ···, n

and

p(y) = C(n, y) p^y q^{n−y},  y = 0, 1, 2, ···, n

find the moment-generating function of X + Y.
ANSWERS TO ODD-NUMBERED
EXERCISES
Chapter 1
1.1 (a) {1, 3, 5, 7, 9} (b) {x | x is a positive odd number less than 10} 1.3 No 1.5 (c) 1.7 (a), (b), (c) 1.9 ∅ 1.11 (a) finite (c) infinite (e) finite (f) infinite (g) infinite
1.13 (a) {1, 2, 3, 4, 5, 6, 7} (c) {1, 2, 3, 4, 5, 6, 7, 8, 10} (e) {7, 8, 9, 10} (g) {1, 2, 3, 7, 8, 9, 10}
(i) {1, 2, 3, 7} (k) {4, 5, 6, 7, 8, 9, 10} 1.21 {{a, b, c}, {a, b}, {a, c}, {b, c}, {a}, {b}, {c}, ∅}
1.23 {{1}, {2}, {3}}; {{1, 2}, {3}}; {{1, 3}, {2}}; {{1}, {2, 3}}; {1, 2, 3}
Chapter 2
2.1 175,760,000 2.3 (a) 4 (c) 336 2.5 720 2.9 15120 2.15 5040 2.17 3360 2.19 (a) 120
(b) 72 2.21 (a) 35 (c) 6 (e) 28 (g) 230300 (i) 230300 2.23 (a) 1140 (b) 12 2.25 210 2.27
576 2.29 286
2.31
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
1 6 15 20 15 6 1
2.33 (a) x^4 + 4x^3y + 6x^2y^2 + 4xy^3 + y^4 2.35 (a) 35 (c) −1760 (e) 1120 2.37 35x^4y^9
2.39 1.9494
Chapter 3
3.1 A = {(2, 6), (3, 5), (5, 3), (6, 2)} D = {(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6)}
E = {(3, 3), (3, 6), (6, 3), (6, 6)} 3.3 (a) A ∩ (B ∪ C) (c) A ∪ B ∪ C
(e) [A ∩ B ∩ C̄] ∪ [A ∩ B̄ ∩ C] ∪ [Ā ∩ B ∩ C] (g) A ∩ B ∩ C (i) (A ∪ B) ∩ C̄
3.5 E ∩ F = {(1, 2), (1, 4)(1, 6)(2, 1)(4, 1), (6, 1)} F ∩ G = {(1, 4), (4, 1)}
3.7 {(R, G), (R, B), (G, R), (G, B), (B, R), (B, G)} 3.9 0.75 3.11 (a) 0.0044 (b) 0.03516
3.13 (a) 4.933 × 10^{−7} (b) 0.0199 3.15 0.1512 3.17 (a) 0.1074 (b) 0.0619 3.19 (a) 7/15
3.21 (a) 6/10, 3/10, 1/10 3.25 (a) 2/5 (b) 9/10 3.27 (n−k)!/n! 3.29 1/2
Chapter 4
4.3 (a) 0.8 (b) 0.7 (c) 0.44 (d) 0.56 4.5 5/8 4.7 (a) 0.9 (b) 1 (c) 0.7 (d) 0.6 4.9 (a) 20,000 (b) 12,000 (c) 11,000 (d) 10,000 4.11 C(n−3, k−1)/C(n, k) 4.13 (a) 1/(n(n−k)) (b) 1/((n−1)k) 4.15 5/12
4.17 5/28 4.19 (a) 14/285 (b) 2/95 4.21 3/11 4.23 (a) 1/6 (b) 1/6 4.25 0.6044 4.27 0.413
4.29 0.1429 4.31 0.72 4.35 not independent
Chapter 5
5.1
X 0 1 2
P(X = xi) 1/4 2/4 1/4
1
9
, 0≤x<2 0,elsewhere
2 x−1
5.3 (b) f (x) = 9
, 2 ≤ x < 6 5.5 (a) 4 (b) f (x) = 4
,
x = 2, 3, 5
2
6≤x<8 1,
3
,
x>5
0, x<0
5.7 (a) a = 3/4 (b) 1/4 5.9 1/π 5.11 (a) 1 (b) 1 − e^{−x} 5.13 (a) F(x) = x²/4, 0 ≤ x ≤ 2; 1, x > 2 (c) 1 (d) 1
5.15 −0.02 5.17 (a) f(x) = αe^{−αx}, x ≥ 0; 0, x < 0 (b) 1 − e^{−2α} (c) e^{−3α} (d) 1 − e^{−3α}
5.21 (a) F(x) = 0, x < 0; x³, 0 ≤ x ≤ 1; 1, x > 1 (b) 0.94 (c) 0.88
5.23 (a) f(x) = 0, x < 0; x²/2, 0 ≤ x ≤ 1; x − 1/2, 1 < x < 1.5; 1, x ≥ 1.5 (b) 0.125 (c) 0.575 (d) 4/9
5.25 (a) f(x) = 1/8, 0 < x < 2; x/8, 2 ≤ x ≤ 4; 0, elsewhere (b) 7/16 (c) 13/16 (d) 9/16
5.27 (a) f(x) = 0, x < 0; 4(x³/3 − x⁴/4), 0 ≤ x ≤ 1; 1, x > 1 (c) F(−1) = 0, F(0) = 0, F(1) = 1/3 5.29 (a) 0.0453 (c) 0.5390 5.31 (b) 0.2535
Chapter 6
6.1 (a) 1 (c) 7/8, 5/8 6.5 (a) 1.05 (c) 0.525 (e) 4.5475 (g) 1.1369 6.9 (a) 7 (b) 5.83 6.11 k!/λ^k, λ > 0 6.13 E(X) = 2/3 (b) Var(X) = 2/63
Chapter 7
7.1 (a) E(X) = 2/3, Var(X) = 1/18 7.3 (a) 1/(1 − t)², |t| < 1 7.5 (a) λe^{aλ}/(λ − t), |t| < λ (b) (1/λ)e^{aλ} 7.7 E(X) = 2/3 (b) Var(X) = 2/63 7.9 e^{2μp(e^t − 1)} 7.11 (pe^t + q)^{2n}
BIBLIOGRAPHY
Applebaum, D., (1996), Probability and Information: An Integrated Approach , Cambridge
University Press, Cambridge.
Bajpai, A.C., Calus, I.M. and Fairley, J.A. (1978), Statistical Methods for Engineers and
Scientists, John Wiley and Sons, New York.
Barlow, R. (1989), Statistics, A Guide to the Use of Statistical Methods in the Physical Sciences,
John Wiley and Sons, New York.
Beaumont, G.P. (1986), Probability and Random Variables, Ellis Horwood Ltd, Chichester.
Billingsley, P. (1979), Probability and Measure, John Wiley and Sons, New York.
Birnbaum, Z.W. (1962), Introduction to Probability and Mathematical Statistics, Harper and
Brothers Publishers, New York.
Blake, Ian F.B. (1979), An Introduction to Applied Probability, John Wiley and Sons, New
York.
Breiman L., (1972), Society for Industrial and Applied Maths, Siam, Philadelphia.
Brémaud, Pierre (1994), An Introduction to Probabilistic Modeling, Springer – Verlag, New York.
David F.N. (1951), Probability Theory for Statistical Methods, Cambridge University Press,
Cambridge.
Drake, A.W. (1967), Fundamentals of Applied Probability Theory, McGraw-Hill, New York.
Dudewicz, E.J. and Mishra, S.N. (1988), Modern Mathematical Statistics, John Wiley and
Sons, New York.
Feller, W. (1970) An Introduction to Probability Theory and its Applications Vol. I, Second
Edition, John Wiley and Sons, New York.
Feller, W. (1971) An Introduction to Probability Theory and its Applications Vol. II, Second
Edition, John Wiley and Sons, New York.
Freund, J.E. and Walpole, R.E. (1971), Mathematical Statistics, Prentice – Hall, Inc,
Englewood Cliffs, New Jersey.
Galambos, J. (1984), Introductory Probability Theory, Marcel Dekker, Inc., New York and Basel.
Gnedenko, B.V. and Khinchin, A.Ya. (1962), An Elementary Introduction to the Theory of
Probability, Dover Publications, Inc., New York.
Guttman, I., Wilks, S.S. and Hunter, J.S. (1982), Introductory Engineering Statistics, John
Wiley and Sons, New York.
Hahn, G.J. and Shapiro, S.S. (1967), Statistical Models in Engineering, John Wiley and Sons.
Hoel, P.G. (1984), Mathematical Statistics, Fifth ed., John Wiley and Sons, New York.
Hoel, P.G., Port, S.C, and Stone, J.S. (1971), Introduction to Probability Theory, Houghton
Mifflin.
Hogg, R.V. and Craig, A.T. (1978), Introduction to Mathematical Statistics, Fourth Edition,
Macmillan, New York.
Johnson, N.L. and Kotz, S.(1969) Distributions in Statistics: Discrete Distributions , John
Wiley and Sons, New York.
Johnson, N.L. and Kotz, S.(1970) Distributions in Statistics: Continuous Univariate Distributions
1 & 2, John Wiley and Sons, New York.
Kendall, M.G. and Stuart, A. (1963), The Advanced Theory of Statistics, Vol. 1, 2nd ed., Charles Griffin & Company Ltd., London.
Kendall, M.G. and Stuart, A. (1976), The Advanced Theory of Statistics, Vol. 3, 4th ed., Charles Griffin & Company Ltd., London.
Kendall, M.G. and Stuart, A. (1979), The Advanced Theory of Statistics, Vol. 2, 4th ed., Charles Griffin & Company Ltd., London.
Chung, Kai Lai (1975), Elementary Probability Theory with Stochastic Processes, Springer-Verlag, New York.
Lamperti, J. (1966), Probability: A Survey of the Mathematical Theory, W.A. Benjamin, INC.,
New York.
Lupton, R. (1993), Statistics in Theory and Practice, Princeton University Press, Princeton.
Mendenhall, W. and Scheaffer, R.L. (1990), Mathematical Statistics With Applications, PWS-
KENT Publishing Company, Mass.
Meyer, P.L. (1970), Introductory Probability and Statistical Applications, 2nd Edition, Addison-Wesley Publishing Company.
Miller, I. and Freund, J.E. (1987), Probability and Statistics for Engineers, Third Edition, Prentice-Hall, Englewood Cliffs, New Jersey.
Mood, A.M., Graybill, F.A. and Boes, D.C. (1974), Introduction to the Theory of Statistics, Third Edition, McGraw-Hill, New York.
Page, L.B. (1989), Probability for Engineering with Applications to Reliability, Computer
Science Press.
Prohorov, Yu.V. and Rozanov, Yu.A. (1969) Probability Theory, Springer-Verlag, New York.
Robinson, E.A. (1985), Probability Theory and Applications, International Human Resources Development Corporation.
Ross, S. (1984), A First Course in Probability, 2nd Edition, Macmillan Publishing Company.
Spiegel, M.R. (1980), Probability and Statistics, McGraw-Hill Book Company, New York.
Stoyanov, J. et al. (1989), Exercise Manual in Probability Theory, Kluwer Academic Publishers, London.
Studies in the History of Statistics and Probability, Vol. 1, edited by Pearson, E.S. and Kendall, M. (1970), Charles Griffin & Company Ltd., London.
Studies in the History of Statistics and Probability, Vol. 2, edited by Pearson, E.S. and Kendall, M. (1977), Charles Griffin & Company Ltd., London.
Trivedi, K.S. (1988), Probability and Statistics with Reliability, Queuing, and Computer Science
Applications, Prentice-Hall of India, New Delhi.
Wayne, W.D. (1991), Biostatistics: A Foundation for Analysis in the Health Sciences, Fifth
Edition, John Wiley and Sons, Singapore.
Wilks, S.S. (1962), Mathematical Statistics, John Wiley and Sons, New York.
ENDNOTES
1. A terminating decimal is a decimal that ends. It is a decimal with a finite number of digits. For example, if we divide 1 by 8, we get 0.125.
2. A repeating (or recurring) decimal is a number whose decimal part eventually has the same sequence of digits repeating indefinitely. For example, 1/3 = 0.333...
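A small sketch (not from the text) of the distinction drawn in endnotes 1 and 2: a fraction in lowest terms has a terminating decimal expansion exactly when its denominator has no prime factors other than 2 and 5. The helper name below is my own.

```python
from fractions import Fraction

def terminates(frac: Fraction) -> bool:
    """Return True if frac has a terminating decimal expansion."""
    d = frac.denominator
    # Strip out every factor of 2 and 5 from the denominator.
    for p in (2, 5):
        while d % p == 0:
            d //= p
    return d == 1

print(terminates(Fraction(1, 8)))  # → True  (1/8 = 0.125 terminates)
print(terminates(Fraction(1, 3)))  # → False (1/3 = 0.333... repeats)
```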
3. The name power set is motivated by the fact that if set A is finite and has n distinct elements, then its power set P(A) contains exactly 2^n elements.
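This count can be checked for a small set. The helper `power_set` below is a sketch written for this note, not code from the text:

```python
from itertools import chain, combinations

def power_set(a):
    """Return the power set P(A) of a finite set A as a list of frozensets."""
    s = list(a)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(s, r) for r in range(len(s) + 1))]

A = {1, 2, 3}
P = power_set(A)
print(len(P))  # → 8, i.e. 2^3, as the endnote states
```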
4. See Example 2.15.
5. Objects may be persons, towns, numbers, or anything of interest.
6. With the factorial function available on any simple scientific calculator, this approximation is, with time, becoming more and more obsolete. However, it is still widely used in advanced probability theory.
7. In an experiment, a coin is tossed or flipped, while a die is rolled or thrown.
8. The experiment of throwing two dice once is equivalent to the experiment of throwing a die twice. Both experiments lead to the same sample space.
9. In general, any events defined on non-overlapping sets of trials are independent.
10. It is believed that the first mathematician to calculate a theoretical probability correctly was Girolamo Cardano, an Italian who lived from 1501 to 1576.
11. A finite probability space is obtained by assigning to each point ei ∈ S = {e1, e2, · · · , en} a real number pi, called the probability of ei, which satisfies the following properties: (a) pi ≥ 0; (b) p1 + p2 + · · · + pn = 1.
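A minimal sketch of checking conditions (a) and (b) for a proposed assignment (the function name is hypothetical, written for this note):

```python
def is_probability_assignment(p):
    """Check that p = (p1, ..., pn) defines a finite probability space:
    each pi >= 0 and p1 + ... + pn = 1 (up to floating-point tolerance)."""
    return all(pi >= 0 for pi in p) and abs(sum(p) - 1.0) < 1e-12

print(is_probability_assignment([0.2, 0.3, 0.5]))   # → True
print(is_probability_assignment([0.6, 0.6, -0.2]))  # → False: one pi is negative
```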
12. A probability function is a real-valued function defined over the events in a sample space with three
fundamental properties given as axioms in Definition 3.31. A synonym of a probability function is a
probability measure.
13. Axioms are basic assumptions that one takes to develop ideas.
14. We can interchange ∪ with +.
15. To determine the formula for the multiplication rule of probability for several events, we have to condition each event on the occurrence of all of the preceding events.
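As a small illustration of this conditioning (the card-drawing example is mine, not from the text), the chance of drawing three aces in a row from a 52-card deck without replacement multiplies the successive conditional probabilities:

```python
from fractions import Fraction

# P(A1 ∩ A2 ∩ A3) = P(A1) P(A2|A1) P(A3|A1 ∩ A2):
# 4 aces out of 52 cards, then 3 of 51, then 2 of 50.
p = Fraction(4, 52) * Fraction(3, 51) * Fraction(2, 50)
print(p)  # → 1/5525
```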
16. In the table, recall that P(Hi ∩ F) = P(Hi)P(F|Hi). Note also that P(F) = ∑i P(F ∩ Hi) = P(H1)P(F|H1) + P(H2)P(F|H2).
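The total-probability and Bayes' relations used in the table can be illustrated numerically; the priors and likelihoods below are hypothetical values chosen for this sketch:

```python
# Two hypotheses H1, H2 and an event F (illustrative numbers only):
prior = {"H1": 0.6, "H2": 0.4}       # P(Hi)
likelihood = {"H1": 0.2, "H2": 0.5}  # P(F | Hi)

# Total probability: P(F) = Σ P(Hi) P(F|Hi) = 0.6·0.2 + 0.4·0.5 ≈ 0.32
p_f = sum(prior[h] * likelihood[h] for h in prior)

# Bayes' rule: P(Hi | F) = P(Hi) P(F|Hi) / P(F); P(H1|F) ≈ 0.12/0.32 = 0.375
posterior = {h: prior[h] * likelihood[h] / p_f for h in prior}
```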
17. Recollect from Definition 4.2 that P(A|B) is defined only when P(B) > 0, and P(B|A) is defined only when P(A) > 0. But in some cases we would like to determine the independence of events whose probabilities are zero.
18. The probabilities which appear in the margins of the table are called marginal probabilities.
19. ₵ is the symbol of the currency of Ghana, called the cedi.
20. A random variable X is said to be bounded if |X| ≤ M < ∞. This implies that P(|X| ≤ M) = 1.
21. We also use the theorem that the expected value of a finite sum of random variables is the sum of the expected values of the random variables. This will be proved later in Volume III.
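As a quick empirical illustration of this theorem (a simulation of my own, not part of the text), the sample mean of X + Y for two simulated dice matches the sum of the individual sample means:

```python
import random

random.seed(0)
n = 100_000
# Simulate n throws of two fair dice X and Y.
xs = [random.randint(1, 6) for _ in range(n)]
ys = [random.randint(1, 6) for _ in range(n)]

mean = lambda v: sum(v) / len(v)
# E[X + Y] = E[X] + E[Y]; both empirical sides should be close to 3.5 + 3.5 = 7.
print(mean([x + y for x, y in zip(xs, ys)]), mean(xs) + mean(ys))
```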
22. The characteristic function always exists for all random variables because it is the integral of a bounded function on a space of finite measure.
23. Sums of random variables will be treated in another book.
24. Two random variables X and Y are said to be independent if F (x, y) = FX (x)FY (y) (to be shown
in another book).
INDEX
Symbols
-combination 58, 60
-permutation 53
-permutations 53, 54, 55

A
absolute moment 259, 266
Addition law
  general 17, 59, 62, 68, 73, 74, 75, 292
  special 42, 52, 134, 135, 152, 196, 229, 236, 266, 278, 281
addition principle 47, 48
Addition principle
  of counting 46, 49, 50, 52, 63, 65
Addition rule
  special 42, 52, 134, 135, 152, 196, 229, 236, 266, 278, 281
Approach to defining probability
  classical 105, 106, 110, 111, 116, 117, 118, 119
  mathematical 28, 106, 112, 115, 118, 119, 131, 142, 150, 201, 229, 236, 238, 240, 241
  relative frequency approach 111, 112, 115
  subjective 105, 115, 116, 117, 118
Arithmetic mean
  for grouped data 239
  for ungrouped data 239
asymmetry 267

B
Bayes' probabilities
  using Table 167
Bayes' theorem 146, 162
  rule 133, 134, 146, 150, 152, 157, 161, 179, 268, 292
bimodal 232
binomial coefficient 57, 60, 70, 72
Binomial Coefficient
  Identities 58, 64, 67, 70, 77
Binomial theorem 67
Binomial Theorem
  for Non-Negative Integer 65
  for Rational Index 70
  generalized 37, 72, 74, 280
Boole's Inequality 139

C
Calculated average, see mean
cardinal number 20, 21, 24
cardinality of infinite sets 20
cells 35
central location 229
Central moment, see moment about mean
  continuous case 261, 272
  discrete case 240, 261, 272
  expectation of 238, 239, 240, 241, 245, 247, 248, 250, 259, 262
central tendency 229
centred random variable 261, 262
characteristic function 272, 293
Classical approach
  disadvantages of 111
coefficient of kurtosis 268
coefficient of skewness 268
collectively exhaustive events 163
Combination
  identities 79
  without repetition 54, 55, 57, 58, 60
  with repetition 55, 58
combinatorial analysis 46, 150
Combinatorial Analysis 109
combinatorics 46, 47
Conditional probability
  definition 18, 42, 54, 102, 104, 110, 112, 114, 115, 118
  Graphical Interpretation of 144
  of random variables 187, 188, 191, 229, 292, 293
  uses 266
  238, 239, 240, 241, 242, 243, 245, 246, 247, 248, 249, 250, 251, 253, 254, 257, 259, 260, 261, 262, 263, 264, 265, 266, 267, 269, 271, 272, 274, 277, 278, 280, 281, 283, 292
  continuous 92, 93, 94, 191, 192, 196, 200, 202, 203, 204, 205, 206, 208, 212, 214, 215, 217, 218, 223, 224, 226, 227, 230, 231, 233, 239, 240, 243, 251, 260, 261, 263, 271, 272, 281
  discrete 46, 92, 93, 191, 192, 193, 196, 197, 199, 200, 205, 206, 208, 211, 221, 222, 223, 230, 232, 233, 235, 238, 240, 243, 249, 250, 260, 261, 263, 269, 272, 281
  probability distribution of 192, 193, 194, 196, 200, 209, 221, 223, 257, 259, 266, 280

S
sampling without replacement 46, 47
sampling with replacement 46, 47
Set
  definition of 102, 114, 118, 142, 145, 163, 197, 238, 239, 254, 261
Set operations
  intersection 29, 37, 97, 103, 142, 170
  symmetric difference 32, 97
  union 28, 29, 34, 37, 97, 100, 103, 129, 154, 155, 197
Sets
  cardinality of 20, 44, 90
  classes of 229
  complement 30, 31, 41, 95, 97, 103, 120, 137, 138, 177
  countable 20, 43, 92
  countably infinite 20, 92, 93, 191, 192, 193, 194
  denumerable 20, 92
  description of 229
  disjoint 29, 30, 31, 34, 40, 47, 48, 99, 126, 133, 143, 155, 181
  element of 17, 18, 25, 27, 28, 35, 95, 190
  empty 19, 22, 23, 34, 35, 46, 64, 95, 126, 204
  equal 22, 23, 32, 43, 52, 100, 124, 131, 141, 169, 170, 207, 211, 212, 215, 232, 238, 239, 241, 262, 265, 281
  equivalent 23, 24, 43, 163, 262, 292
  finite 19, 20, 43, 46, 47, 49, 91, 92, 93, 106, 145, 147, 191, 192, 193, 203, 208, 238, 239, 241, 271, 272, 273, 284, 292, 293
  infinite 19, 20, 43, 73, 91, 92, 93, 94, 112, 191, 192, 193, 194, 238, 273, 284
  laws of
    complement 30, 31, 41, 95, 97, 103, 120, 137, 138, 177
    De Morgan's 36, 37
    identity 37
  non-empty 22, 34, 35
  partition of 34, 35, 101, 153, 154, 197
  power 34, 44, 65, 188, 269, 292
  singleton 21
  uncountable 20, 91
  Unequal 23
  universal 24, 25, 30, 38, 46, 47, 49, 206
set theory 16, 37, 42, 97
standard deviation 247, 248, 249, 259, 268
statistical independence 178, 179
string event 95, 96
Subjective approach
  disadvantages of 111
Subsets 25, 26, 27, 38
  improper 27, 28, 272
  proper 26, 28, 38
Sums of Binomial coefficients
  identities 79

T
Techniques for counting
  combinations 47, 58, 59, 60, 62, 230
  permutations 47, 53, 54, 55, 56, 57, 58, 59
Theorem
  Bayes' 146, 150, 156, 157, 159, 161, 162, 163, 164, 165, 166, 167
trimodal 232
Types of Experiments 86