Probability Made Easy by College
This chapter reviews basic notions of probability (or stochastic variability), the formal study of the laws of chance, i.e., of processes where the ambiguity in outcome is inherent in the nature of the process itself. Both primary views of probability, namely the frequentist (or classical) and the Bayesian, are covered, and some of the important probability distributions are presented. Finally, an effort is made to explain how probability differs from statistics, and to present different views of probability concepts such as absolute, relative and subjective probabilities.
2.1 Introduction
p(E) = lim (n → ∞) [number of occurrences of E]/[number of trials n]   (2.1)
T. Agami Reddy, Applied Data Analysis and Modeling for Energy Engineers and Scientists,
DOI 10.1007/978-1-4419-9613-8_2, Springer Science+Business Media, LLC 2011
The frequentist (or classical) view interprets probability as the long-run frequency of an event. This has a nice intuitive interpretation, hence its appeal. However, it has been argued that most processes are unique events that do not occur repeatedly, which calls into question the validity of the frequentist (or objective) probability viewpoint. Moreover, even when one has some basic preliminary idea of the probability associated with a certain event, the frequentist view excludes such subjective insights from the determination of probability.
The Bayesian approach, however, recognizes such issues
by allowing one to update assessments of probability that
integrate prior knowledge with observed events, thereby allowing better conclusions to be reached. Both the classical
and the Bayesian approaches converge to the same results
as increasingly more data (or information) is gathered. It is when data sets are small that the Bayesian approach becomes advantageous. Thus, the Bayesian view is not an approach at odds with the frequentist one; rather, it adds (or allows the addition of) refinement to it. This can be a great benefit in many types of analysis, and therein lies its appeal. The Bayes
theorem and its application to discrete and continuous probability variables are discussed in Sect.2.5, while Sect.4.6
(of Chap.4) presents its application to estimation and hypothesis problems.
(a) Permutations P(n, k) is the number of ways that k objects can be selected from n objects with the order of selection being important:

P(n, k) = n!/(n − k)!   (2.2a)

For the special case k = n, this reduces to P(n, n) = n!   (2.2b)
(b) Combinations C(n, k) is the number of ways that k objects can be selected from n objects with the order not being
important. It is given by:
C(n, k) = n!/[(n − k)! k!]   (2.3)
Note that the same equation also defines the binomial coefficients, since the expansion of (a + b)^n according to the Binomial theorem is:

(a + b)^n = Σ (k = 0 to n) C(n, k) a^(n−k) b^k   (2.4)
For example, with n = 7 and k = 3:

P(7, 3) = 7!/(7 − 3)! = (7)(6)(5) = 210

C(7, 3) = 7!/[(7 − 3)! 3!] = (7)(6)(5)/[(3)(2)(1)] = 35
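The permutation and combination counts above can be checked numerically. The following is an illustrative sketch (not from the text) using only the standard library:

```python
from math import factorial

def permutations(n, k):
    """Number of ordered selections of k objects from n (Eq. 2.2a)."""
    return factorial(n) // factorial(n - k)

def combinations(n, k):
    """Number of unordered selections of k objects from n (Eq. 2.3)."""
    return factorial(n) // (factorial(n - k) * factorial(k))

print(permutations(7, 3))  # 210
print(combinations(7, 3))  # 35
```

Note that each combination corresponds to k! permutations, so C(n, k) = P(n, k)/k!.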
Another type of combinatorial problem is the factorial problem, to be discussed in Chap. 6 while dealing with design of experiments. Consider a specific example involving equipment scheduling at the physical plant of a large campus, which includes primemovers (diesel engines or turbines which produce electricity), boilers and chillers (vapor compression and absorption machines). Such equipment needs a certain amount of time to come online, and so operators typically keep some units idling so that they can start supplying electricity, heating or cooling at a moment's notice. The operating states can be designated by a binary variable, say 1 for on-status and 0 for off-status. Extensions of this concept include cases where, instead of two states, one could have m states. An example of three states arises when, say, two identical boilers are to be scheduled. One could have three states altogether: (i) both off (00), (ii) both on (11), and (iii) only one on (10). Since the boilers are identical, state (iii) is identical to 01. In case the two boilers are of different sizes, there would be four possible states. The number of combinations possible for n such equipment, where each one can assume m states, is given by m^n. Some simple cases for scheduling four different types of energy equipment in a physical plant are shown in Table 2.1.
Table 2.1 Possible operating states for scheduling four types of energy equipment

Equipment                One of each   Two of each,         Two of each, non-identical
                                       assumed identical    (except for boilers)
Primemovers              0, 1          00, 01, 11           00, 01, 10, 11
Boilers                  0, 1          00, 01, 11           00, 01, 11
Chillers (vapor compr.)  0, 1          00, 01, 11           00, 01, 10, 11
Chillers (absorption)    0, 1          00, 01, 11           00, 01, 10, 11
Number of combinations   2^4 = 16      3^4 = 81             4^3 × 3 = 192
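The state counts in the last row of Table 2.1 can be reproduced by brute-force enumeration; the sketch below (an illustration, with state labels assumed as in the table) uses `itertools.product`:

```python
from itertools import product

# Per-equipment state sets as in Table 2.1: a single unit is on/off; a pair of
# identical units has 3 distinguishable states; a non-identical pair has 4.
one_each = ['0', '1']
two_identical = ['00', '01', '11']
two_nonidentical = ['00', '01', '10', '11']

# Four equipment types (primemovers, boilers, and two chiller types):
n_one = len(list(product(one_each, repeat=4)))         # 2^4
n_ident = len(list(product(two_identical, repeat=4)))  # 3^4
# Non-identical pairs everywhere except the boilers, which remain identical:
n_mixed = len(list(product(two_nonidentical, two_identical,
                           two_nonidentical, two_nonidentical)))
print(n_one, n_ident, n_mixed)  # 16 81 192
```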
[Fig. 2.1: Venn diagrams (a)-(d) of events within the sample space S, including the intersection of two events]
30
The axioms and basic relations of probability can be summarized as follows:
(i) the probability of any event A is non-negative: p(A) ≥ 0   (2.5)
(ii) the probability of the entire sample space is unity: p(S) = 1   (2.6)
(iii) for two mutually exclusive events A and B: p(A ∪ B) = p(A) + p(B)   (2.7)
(iv) the probability of the complement of A: p(Ā) = 1 − p(A)   (2.8)
(v) the probability for either A or B (when they are not mutually exclusive) to occur: p(A ∪ B) = p(A) + p(B) − p(A ∩ B)   (2.9)
(vi) for two independent events: p(A ∩ B) = p(A) × p(B)   (2.10)
(vii) the marginal probability of A in terms of the disjoint parts of B: p(A) = p(A ∩ B) + p(A ∩ B̄)   (2.11)
Relations such as Eq. 2.10 are called product models. Consider a die-tossing experiment. If event A is the occurrence of an even number, then p(A) = 1/2. If event B is that the number is less than or equal to 4, then p(B) = 2/3. The probability that both events occur when a die is rolled is p(A ∩ B) = (1/2) × (2/3) = 1/3. This is consistent with intuition, since the outcomes {2, 4} satisfy both events.
(b) Marginal probability of an event A refers to the probability of A in a joint probability setting. For example, consider a sample space containing two events A and B. Since S can be taken to be the sum of event space B and its complement B̄, the probability of A can be expressed as the sum over the disjoint parts of B (Eq. 2.11). For example, for a card drawn from a deck:

p(A) = 14/52 + 12/52 = 1/2
(c) Conditional probability: the probability of occurrence of event B given that event A has occurred is:

p(B/A) = p(A ∩ B)/p(A)   (2.12)

Consider a batch of 10 light bulbs of which 8 are good.
(a) The probability that two bulbs drawn in sequence (with replacement) are both good is:

p = (8/10) × (8/10) = 64/100 = 0.64
(b) What is the probability that two bulbs drawn in sequence (i.e., not replaced) are good where the status of the
bulb can be checked after the first draw?
From Eq. 2.12, p(both bulbs drawn are good):

p(A ∩ B) = p(A) × p(B/A) = (8/10) × (7/9) = 28/45 = 0.622
Example 2.2.5: Two events A and B have the following probabilities: p(A) = 0.3, p(B) = 0.4 and p(Ā ∩ B) = 0.28.
(a) Determine whether the events A and B are independent or not.
From Eq. 2.8, p(Ā) = 1 − p(A) = 0.7. Next, one verifies whether Eq. 2.10 holds, i.e., whether p(Ā ∩ B) = p(Ā) × p(B), or whether 0.28 is equal to (0.7 × 0.4). Since this is correct, one can state that the events are independent.
(b) Find p(A ∪ B).
From Eqs. 2.9 and 2.10:

p(A ∪ B) = p(A) + p(B) − p(A ∩ B)
         = p(A) + p(B) − p(A) × p(B)
         = 0.3 + 0.4 − (0.3)(0.4) = 0.58
31
Fig. 2.2 The forward probability tree for the residential air-conditioner when two outcomes are possible (S satisfactory or NS not satisfactory) for each of three day-types (VH very hot, H hot and NH not hot). The day-type probabilities are 0.1 (VH), 0.3 (H) and 0.6 (NH); the conditional probabilities of S/NS are 0.2/0.8 for VH, 0.9/0.1 for H, and 1.0/0.0 for NH
Example 2.2.6: Generating a probability tree for a residential air-conditioning (AC) system.
Assume that the AC is slightly under-sized for the house it
serves. There are two possible outcomes (S- satisfactory and
NS- not satisfactory) depending on whether the AC is able
to maintain the desired indoor temperature. The outcomes
depend on the outdoor temperature, and for simplicity, its
annual variability is grouped into three categories: very hot
(VH), hot (H) and not hot (NH). The probabilities for outcomes S and NS to occur in each of the three day-type categories are shown in the probability tree diagram (Fig.2.2)
while the joint probabilities computed following Eq.2.10 are
assembled in Table2.2.
Note that the probabilities of the three branches in the first stage, as well as those of the two outcomes within each branch, add to unity (for example, under Very Hot, the S and NS outcomes add to 1.0, and so on). Further, note that the joint probabilities shown in the table must also sum to unity (it is advisable to perform such verification checks). The probability of the indoor conditions being satisfactory is determined as p(S) = 0.02 + 0.27 + 0.60 = 0.89, while p(NS) = 0.08 + 0.03 + 0 = 0.11. It is wise to verify that p(S) + p(NS) = 1.0.
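The two-stage tree computation just described can be sketched as follows (an illustrative snippet, not from the text):

```python
# Day-type probabilities and conditional probabilities of "satisfactory" (Fig. 2.2)
day_prob = {'VH': 0.1, 'H': 0.3, 'NH': 0.6}
sat_given_day = {'VH': 0.2, 'H': 0.9, 'NH': 1.0}

# Joint probabilities via the product rule (Eq. 2.10 applied along each branch)
joint = {(d, 'S'): day_prob[d] * sat_given_day[d] for d in day_prob}
joint.update({(d, 'NS'): day_prob[d] * (1 - sat_given_day[d]) for d in day_prob})

p_S = sum(v for (d, o), v in joint.items() if o == 'S')
p_NS = sum(v for (d, o), v in joint.items() if o == 'NS')
print(round(p_S, 2), round(p_NS, 2))  # 0.89 0.11
```

The verification check suggested in the text corresponds to asserting that all joint probabilities sum to unity.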
Example 2.2.7: Consider a problem where there are two boxes with marbles as specified:
Box 1: 1 red and 1 white, and Box 2: 3 red and 1 green
A box is chosen at random and a marble drawn from it.
What is the probability of getting a red marble?
One is tempted to say that since there are 4 red marbles in
total out of 6 marbles, the probability is 2/3. However, this
is incorrect, and the proper analysis approach requires that
one frame this problem as a two-stage experiment. The first
stage is the selection of the box, and the second the drawing
Table 2.2 Joint probabilities of various outcomes
p(VH ∩ S) = 0.1 × 0.2 = 0.02    p(VH ∩ NS) = 0.1 × 0.8 = 0.08
p(H ∩ S) = 0.3 × 0.9 = 0.27     p(H ∩ NS) = 0.3 × 0.1 = 0.03
p(NH ∩ S) = 0.6 × 1.0 = 0.60    p(NH ∩ NS) = 0.6 × 0.0 = 0.00
Fig. 2.3 The first stage of the forward probability tree diagram involves selecting a box (either A or B), while the second stage involves drawing a marble which can be red (R), white (W) or green (G) in color. The branch probabilities are p(A ∩ R) = 1/4, p(A ∩ W) = 1/4, p(A ∩ G) = 1/2 × 0 = 0, p(B ∩ R) = 3/8, p(B ∩ G) = 1/8 and p(B ∩ W) = 1/2 × 0 = 0. The total probability of drawing a red marble is 1/4 + 3/8 = 5/8
[Fig. 2.4: PDF f(x) and CDF F(x) for a discrete random variable]
Fig. 2.5 Probability density function and its association with probability for a continuous random variable involving the outcomes of hourly outdoor temperatures at Philadelphia, PA during a year. The probability that the temperature will be between 55 and 60°F is given by the shaded area. a Density function. b Probability interpreted as an area
A probability density function must satisfy:

f(x) ≥ 0   for −∞ < x < ∞   (2.13)

∫ (−∞ to ∞) f(x) dx = 1   (2.14)

The cumulative distribution function (CDF), F(a), represents the area under f(x) enclosed in the range −∞ < x ≤ a:

F(a) = p{X ≤ a} = ∫ (−∞ to a) f(x) dx   (2.15)

[Fig. 2.6: Cumulative distribution function of the dry bulb temperature data of Fig. 2.5]

Conversely, the PDF is recovered from the CDF by differentiation:

f(x) = dF(x)/dx   (2.16)

The probability that X falls within an interval [a, b] is:

p{a ≤ X ≤ b} = ∫ (−∞ to b) f(x) dx − ∫ (−∞ to a) f(x) dx = F(b) − F(a)   (2.17)

For two random variables X and Y, the joint density f(x, y) must satisfy:

f(x, y) ≥ 0   for all (x, y)   (2.18)

∫∫ f(x, y) dx dy = 1   (2.19)

and the probability that the point (X, Y) falls within a region A is:

p[(X, Y) ∈ A] = ∫∫ (over A) f(x, y) dx dy   (2.20)
The marginal distributions of X and Y are obtained by integrating the joint density over the other variable:

g(x) = ∫ (−∞ to ∞) f(x, y) dy   (2.21)

h(y) = ∫ (−∞ to ∞) f(x, y) dx   (2.22)

while the conditional distribution of Y given X = x is:

f(y/x) = f(x, y)/g(x)   for g(x) > 0   (2.23)

Example 2.3.1: A continuous random variable X has the PDF f(x) = ax² over its range of variation.
(a) From the normalization condition (Eq. 2.14), ∫ ax² dx = 1, i.e., [ax³/3] evaluated over the range of x equals 1, resulting in a = 1/3.
(b) One uses Eq. 2.14 modified for the limiting range in x, which yields c = 1/30.

Example 2.3.2: The operating life in weeks of a high efficiency air filter in an industrial plant is a random variable X having the PDF:

f(x) = 20/(x + 100)³   for x > 0

Find the probability that the filter will have an operating life of:
(a) at least 20 weeks
(b) anywhere between 80 and 120 weeks
First, determine the expression for the CDF from Eq. 2.15. Since the operating life would decrease with time, one needs to be careful about the limits of integration applicable to this case. Thus:

CDF = ∫ (0 to x) 20/(t + 100)³ dt = [−10/(t + 100)²] (0 to x)

(a) p(20 < X < ∞) = [−10/(x + 100)²] (20 to ∞) = 10/(120)² = 0.000694

(b) For this case, the limits of integration are simply modified as follows:

p(80 < X < 120) = [−10/(x + 100)²] (80 to 120) = 10/(180)² − 10/(220)² = 0.000102

Example 2.3.3: Two random variables have the joint PDF:

f(x, y) = (2/5)(2x + 3y)   for 0 ≤ x ≤ 1, 0 ≤ y ≤ 1, and 0 elsewhere

The normalization condition (Eq. 2.19) is verified as:

∫ (0 to 1) ∫ (0 to 1) (2/5)(2x + 3y) dx dy = ∫ (0 to 1) [2x²/5 + 6xy/5] (x = 0 to 1) dy
                                           = ∫ (0 to 1) (2/5 + 6y/5) dy = 2/5 + 3/5 = 1

Further, the probability of the event {0 < X < 1/2, 1/4 < Y < 1/2} is:

∫ (1/4 to 1/2) ∫ (0 to 1/2) (2/5)(2x + 3y) dx dy = 13/160
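The two filter-life probabilities of Example 2.3.2 follow directly from the survival function S(x) = 10/(x + 100)², obtained by integrating the PDF from x to infinity. A minimal sketch (illustrative, not from the text):

```python
# Survival function S(x) = 10/(x+100)^2 for the air-filter PDF
# f(x) = 20/(x+100)^3 of Example 2.3.2
def survival(x):
    return 10.0 / (x + 100.0) ** 2

p_at_least_20 = survival(20)                 # part (a)
p_80_to_120 = survival(80) - survival(120)   # part (b)
print(round(p_at_least_20, 6), round(p_80_to_120, 6))  # 0.000694 0.000102
```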
Table 2.4 Joint and marginal probabilities of income (X) and age (Y)

Age (Y)            <$40,000   $40,000-90,000   >$90,000   Marginal prob. of Y
Under 25           0.15       0.09             0.05       0.29
Between 25-40      0.10       0.16             0.12       0.38
Above 40           0.08       0.20             0.05       0.33
Marginal
prob. of X         0.33       0.45             0.22       (should sum to 1.00 both ways)
Example 2.3.4: Percentage data of annual income versus age have been gathered from a large population living in a certain region (see Table 2.4). Let X be the income and Y the age. The marginal probability of X for each class is simply the sum of the probabilities in each column, and that of Y the sum of those in each row. Thus, p(X < $40,000) = 0.15 + 0.10 + 0.08 = 0.33, and so on. Also, verify that the marginal probabilities of X and of Y each sum to 1.00 (so as to satisfy the normalization condition).
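The row and column sums of Table 2.4 can be computed mechanically; the sketch below (labels are illustrative shorthand for the table's classes) sums the joint probabilities over the other variable:

```python
# Joint probabilities (income class, age class) from Table 2.4
joint = {
    ('<40k', 'under25'): 0.15, ('40-90k', 'under25'): 0.09, ('>90k', 'under25'): 0.05,
    ('<40k', '25-40'):   0.10, ('40-90k', '25-40'):   0.16, ('>90k', '25-40'):   0.12,
    ('<40k', 'above40'): 0.08, ('40-90k', 'above40'): 0.20, ('>90k', 'above40'): 0.05,
}

def marginal(index):
    """Sum the joint probabilities over the other variable (Eq. 2.21/2.22, discrete form)."""
    out = {}
    for key, p in joint.items():
        out[key[index]] = out.get(key[index], 0.0) + p
    return out

marg_x = marginal(0)  # income
marg_y = marginal(1)  # age
print({k: round(v, 2) for k, v in marg_x.items()})
print({k: round(v, 2) for k, v in marg_y.items()})
```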
E[X] = μ = ∫ (−∞ to ∞) x f(x) dx   (2.24)

var[X] = σ² = E[(X − μ)²]   (2.25a)

var[X] = E[X²] − μ²   (2.25b)

CV = 100 × (σ/μ)   (2.26)

For a linear combination of random variables

X = a₀ + a₁X₁ + a₂X₂ + …   (2.27)

the expected value is E[X] = a₀ + a₁E[X₁] + a₂E[X₂] + …   (2.28)

while var[a₀] = 0 and var[a₁X₁] = a₁² var[X₁]   (2.29)

Alternatively, it can be shown that for any discrete distribution:

μ = Σᵢ xᵢ p(xᵢ)   (2.30)

σ² = Σᵢ (xᵢ − μ)² p(xᵢ)   (2.31)
The expected value of a function g(X) of the random variable is:

E[g(X)] = ∫ (−∞ to ∞) g(x) f(x) dx   (2.32)

For the air filter of Example 2.3.2, the mean operating life is:

E[X] = ∫ (0 to ∞) 20x/(x + 100)³ dx = 0.1

Further, for a discrete random variable with p(0) = 0.51, p(1) = 0.38, p(2) = 0.10 and p(3) = 0.01:

E[X²] = (0²)(0.51) + (1²)(0.38) + (2²)(0.10) + (3²)(0.01) = 0.87
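The discrete moment formulas (Eqs. 2.25b, 2.30, 2.31) applied to the distribution just given can be sketched as follows (an illustration, not from the text):

```python
# Discrete distribution with p(0)=0.51, p(1)=0.38, p(2)=0.10, p(3)=0.01
pdf = {0: 0.51, 1: 0.38, 2: 0.10, 3: 0.01}

mean = sum(x * p for x, p in pdf.items())          # Eq. 2.30
e_x2 = sum(x**2 * p for x, p in pdf.items())       # E[X^2]
variance = e_x2 - mean**2                          # Eq. 2.25b
print(round(mean, 2), round(e_x2, 2), round(variance, 4))  # 0.61 0.87 0.4979
```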
Fig. 2.8 Unimodal and bi-modal distributions. a Unimodal, b Bi-modal
[Fig. 2.9: Relationships among the important probability distributions. Bernoulli trials (two outcomes, success probability p) lead to: the Binomial B(n,p) for n trials with replacement; the Hypergeometric for n trials without replacement; the Multinomial when there are more than two outcomes; and the Geometric G(n,p) for the number of trials before a success. As n → ∞ with p → 0 and μ = np, the Binomial tends to the Poisson P(λt); the time between Poisson events follows the Exponential E(λ), which generalizes to the Gamma G(λ,k) and is related to the Weibull W(α,β). For large n the Binomial tends to the Normal N(μ,σ), which is in turn related to the Lognormal L(μ,σ), the Student-t t(μ,s,n) for n < 30, and the Chi-square χ²(ν); the ratio [χ²(m)/m]/[χ²(n)/n] gives the F-distribution F(m,n)]
[Fig. 2.10: PDFs of the Binomial distributions B(15, 0.1), B(15, 0.9) and B(100, 0.1), and the CDF of B(100, 0.1)]

B(x; n, p) = C(n, x) p^x (1 − p)^(n−x)   (2.33a)

with mean μ = np and variance σ² = np(1 − p)   (2.33b)

The cumulative distribution is:

B(x ≤ r; n, p) = Σ (x = 0 to r) B(x; n, p)   (2.33c)

For instance, in a worked case, p(X ≥ 5) = 1 − Σ (x = 0 to 4) B(x; n, p) = 1 − 0.0094 = 0.9906, while P(5 ≤ X ≤ 8) = Σ (x = 0 to 8) B(x; n, p) − Σ (x = 0 to 4) B(x; n, p).
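Eq. 2.33a is straightforward to evaluate with `math.comb`; the sketch below (illustrative, not from the text) also checks the normalization and the mean of Eq. 2.33b for B(15, 0.1):

```python
from math import comb

def binom_pmf(x, n, p):
    """Binomial PDF, Eq. 2.33a."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 15, 0.1
total = sum(binom_pmf(x, n, p) for x in range(n + 1))
mean = sum(x * binom_pmf(x, n, p) for x in range(n + 1))
print(round(total, 6), round(mean, 6))  # 1.0 1.5 (i.e., np)
```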
Example 2.4.1: Let k be the number of heads in n = 4 independent tosses of a coin. Then the mean of the distribution = (4)(1/2) = 2, and the variance σ² = (4)(1/2)(1 − 1/2) = 1. From Eq. 2.33a, the probability of two successes in four tosses is:

B(2; 4, 0.5) = C(4, 2) (1/2)² (1/2)² = [(4)(3)/2] (1/4)(1/4) = 6/16 = 3/8

(c) Geometric Distribution. Rather than considering the number of successful outcomes, there are several physical instances where one would like to ascertain the time interval for a certain probability event to occur the first time (which could very well destroy the physical system). This probability is given by the geometric distribution, which can be derived from the Binomial distribution. Consider N to be the random variable representing the number of trials until the event does occur. Note that if an event occurs the first time during the nth trial, then it did not occur during the previous (n − 1) trials. Then, the geometric distribution is given by:

G(n; p) = p (1 − p)^(n−1)   n = 1, 2, 3, …   (2.34a)
An extension of the above concept relates to the time between two successive occurrences of the same event, called the recurrence time. Since the events are assumed independent, the mean recurrence time, denoted by the random variable T between two consecutive events, is simply the expected value of the geometric distribution:

T̄ = E(T) = Σ (t = 1 to ∞) t p(1 − p)^(t−1) = p[1 + 2(1 − p) + 3(1 − p)² + …] = 1/p   (2.34b)
Example 2.4.3:¹ Using the geometric PDF for the 50-year design wind problem
The design code for buildings in a certain coastal region specifies the 50-year wind as the design wind, i.e., a wind velocity with a return period of 50 years, or one which may be expected to occur once every 50 years. What are the probabilities that:
(a) the design wind is encountered in any given year? From Eq. 2.34b, p = 1/T̄ = 1/50 = 0.02
(b) the design wind is encountered during the fifth year of a newly constructed building? From Eq. 2.34a: G(5; 0.02) = (0.02)(1 − 0.02)⁴ = 0.018
(c) the design wind is encountered within the first 5 years?

G(n ≤ 5; p) = Σ (t = 1 to 5) (0.02)(0.98)^(t−1) = 1 − (0.98)⁵ = 0.096

Figure 2.11 depicts the PDF and the CDF for the geometric function corresponding to this example.
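The three design-wind probabilities can be reproduced directly from Eq. 2.34a; a minimal sketch (illustrative, not from the text):

```python
# Geometric distribution for the 50-year design wind (Example 2.4.3)
p = 1 / 50  # Eq. 2.34b: p = 1 / (mean recurrence time)

def geom_pmf(n, p):
    """Probability the event first occurs on trial n (Eq. 2.34a)."""
    return p * (1 - p) ** (n - 1)

p_fifth_year = geom_pmf(5, p)
p_within_5 = sum(geom_pmf(t, p) for t in range(1, 6))
print(round(p_fifth_year, 3), round(p_within_5, 3))  # 0.018 0.096
```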
From Ang and Tang (2007) by permission of John Wiley and Sons.
(d) Hypergeometric Distribution. The Binomial distribution applies in the case of independent trials or when sampling from a batch of items is done with replacement. Another type of dependence arises when sampling is done without
replacement. This case occurs frequently in areas such as acceptance sampling, electronic testing and quality assurance
where the item is destroyed during the process of testing. If
n items are to be selected without replacement from a set of
N items which contain k items that pass a success criterion,
the PDF of the number X of successful items is given by the
hypergeometric distribution:
H(x; N, n, k) = [C(k, x) × C(N − k, n − x)] / C(N, n)   x = 0, 1, 2, 3, …   (2.35a)

with mean μ = nk/N and variance σ² = [(N − n)/(N − 1)] × n × (k/N) × (1 − k/N)   (2.35b)
Note that C(k, x) is the number of ways x items can be chosen from the k successful set, while C(N − k, n − x) is the number of ways that the remaining (n − x) items can be chosen from the unsuccessful set of (N − k) items. Their product, divided by the total number of combinations of selecting equally likely samples of size n from N items, is represented by Eq. 2.35a.
Example 2.4.4: Lots of 10 computers each are called acceptable if they contain no more than 2 defectives. The procedure for sampling the lot is to select 5 computers at random and test for defectives. What is the probability that exactly one defective is found in the sample if there are 2 defectives in the entire lot?
Using the hypergeometric distribution given by Eq. 2.35a with n = 5, N = 10, k = 2 and x = 1:

H(1; 10, 5, 2) = [C(2, 1) × C(8, 4)] / C(10, 5) = (2)(70)/252 = 0.556
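Eq. 2.35a can be evaluated directly from the combination counts; a minimal sketch (illustrative, not from the text):

```python
from math import comb

def hypergeom_pmf(x, N, n, k):
    """Hypergeometric PDF, Eq. 2.35a."""
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)

# Example 2.4.4: one defective in a sample of 5 from a lot of 10 with 2 defectives
p_one_defective = hypergeom_pmf(1, N=10, n=5, k=2)
print(round(p_one_defective, 3))  # 0.556
```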
(e) Multinomial Distribution. A logical extension of Bernoulli experiments, where the result is a two-way outcome (success/good or failure/defective), is the multinomial experiment, where k outcomes are possible. An example with k = 5 is a student grade of A, B, C, D or F. The issue here is to find the number of ways n items can be partitioned into k independent groups (a student can only get a single grade for the same class), with x₁ being in the first group, x₂ in the second, and so on. This is represented by:

(n; x₁, x₂, …, x_k) = n!/(x₁! x₂! … x_k!)   (2.36a)

with the conditions that (x₁ + x₂ + … + x_k) = n and that all partitions are mutually exclusive and occur with equal probability from one trial to the next. It is intuitively obvious that, when N is large and the sample size n relatively small, the hypergeometric distribution tends to closely approximate the Binomial.
Just like Bernoulli trials lead to the Binomial distribution, the multinomial experiment leads to the multinomial distribution, which gives the joint probability of k random variables x₁, x₂, … x_k in n independent trials occurring with probabilities p₁, p₂, … p_k:

f(x₁, x₂, …, x_k) = [n!/(x₁! x₂! … x_k!)] p₁^x₁ p₂^x₂ … p_k^x_k   (2.36b)

with Σᵢ xᵢ = n and Σᵢ pᵢ = 1.
Example 2.4.5: Consider an examination given to 10 students. The instructor, based on previous years' experience, expects the grade distribution given in Table 2.6.

Table 2.6 PDF of student grades for a class
X       A     B     C     D     F
p(X)    0.2   0.3   0.3   0.1   0.1

On grading the exam, he finds that 5 students got an A, 3 got a B and 2 got a C, and no one got either D or F. What is the probability that such an event could have occurred purely by chance?
The answer is directly provided by Eq. 2.36b, which yields the corresponding probability of the above event taking place:

f(A, B, C, D, F) = [10!/(5! 3! 2! 0! 0!)] (0.2)⁵ (0.3)³ (0.3)² (0.1)⁰ (0.1)⁰ ≈ 0.002
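The grade-distribution probability of Example 2.4.5 can be computed from Eq. 2.36b as follows (an illustrative sketch, not from the text):

```python
from math import factorial

def multinomial_pmf(xs, ps):
    """Multinomial PDF, Eq. 2.36b."""
    n = sum(xs)
    coef = factorial(n)
    for x in xs:
        coef //= factorial(x)   # n! / (x1! x2! ... xk!)
    prob = float(coef)
    for x, p in zip(xs, ps):
        prob *= p ** x
    return prob

# 5 A's, 3 B's, 2 C's, no D's or F's among 10 students
p_grades = multinomial_pmf([5, 3, 2, 0, 0], [0.2, 0.3, 0.3, 0.1, 0.1])
print(round(p_grades, 5))  # 0.00196
```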
p(x; λt) = [e^(−λt) (λt)^x] / x!   x = 0, 1, 2, 3, …   (2.37a)

where λ is called the mean occurrence rate, i.e., the average number of occurrences of the event per unit time (or space) interval t. A special feature of this distribution is that its mean μ, or average number of outcomes per time t, and its variance σ² are such that:

μ(X) = σ²(X) = λt = np   (2.37b)

Akin to the Binomial distribution, tables for certain combinations of the two parameters allow the cumulative Poisson distribution to be read off directly (see Table A2), with the latter being defined as:

P(r; λt) = Σ (x = 0 to r) p(x; λt)   (2.37c)
[Fig. 2.12: PDF and CDF of the Poisson distribution for the storm example with λt = 4]

For a location experiencing an average of four storms per year, the probability of exactly four storms in a given year is:

p(X = 4) = (4)⁴ e⁻⁴ / 4! = 0.195

while the probability of no storms is (4)⁰ e⁻⁴ / 0! = 0.018, and, using the cumulative value P(6; 4), the corresponding upper-tail probability is 1 − 0.9513 = 0.0487.

Note that though the average is four, the probability of actually encountering four storms in a year is less than 20%. Figure 2.12 represents the PDF and CDF for different values of X for this example.
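The point probabilities of the storm example can be checked against Eq. 2.37a; a minimal sketch (illustrative, not from the text):

```python
from math import exp, factorial

def poisson_pmf(x, lam_t):
    """Poisson PDF, Eq. 2.37a, with lam_t the mean number of events."""
    return exp(-lam_t) * lam_t**x / factorial(x)

p0 = poisson_pmf(0, 4)   # no storms in a year
p4 = poisson_pmf(4, 4)   # exactly four storms
print(round(p0, 3), round(p4, 3))  # 0.018 0.195
```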
N(x; μ, σ) = [1/(σ (2π)^(1/2))] exp[−(x − μ)²/(2σ²)]   (2.38a)
where μ and σ are the mean and standard deviation respectively of the random variable X. Its name stems from an erroneous earlier perception that it was the natural pattern followed by distributions and that any deviation from it required investigation. Nevertheless, it has numerous applications in practice and is the most important of all distributions studied in statistics. Further, it is the parent distribution for several important continuous distributions, as can be seen from Fig. 2.9. It is used to model events which occur by chance, such as variation of dimensions of mass-produced items during manufacturing, experimental errors, and variability in measurable biological characteristics such as people's height or weight. Of great practical import is that normal distributions apply in situations where the random variable is the result of a sum of several other variable quantities acting independently on the system.
The shape of the normal distribution is unimodal and symmetrical about the mean, with its maximum value at x = μ and points of inflexion at x = μ ± σ. Figure 2.13 illustrates its shape for two different cases of μ and σ. Further, the normal distribution given by Eq. 2.38a provides a convenient approximation for computing binomial probabilities for a large number of trials (which is otherwise tedious), provided [np(1 − p)] > 10.
In problems where the normal distribution is used, it is more convenient to standardize the random variable into a new random variable z = (x − μ)/σ with mean zero and variance of unity. This results in the standard normal curve or z-curve:

N(z; 0, 1) = [1/(2π)^(1/2)] exp(−z²/2)   (2.38b)
[Fig. 2.13: Normal distributions N(10, 2.5) and N(10, 5)]
N(z₁ ≤ z ≤ z₂) = ∫ (z₁ to z₂) [1/(2π)^(1/2)] exp(−z²/2) dz   (2.38c)

The shaded area in Table A3 permits evaluating the above integral, i.e., determining the associated probability assuming z₁ = −∞. Note that for z = 0, the probability given by the shaded area is equal to 0.5. Since not all texts adopt the same format in which to present these tables, the user is urged to use caution in interpreting the values shown in such tables.
Fig. 2.14 Figures meant to illustrate that the shaded areas are the physical representations of the tabulated standardized probability values in Table A3. a Lower limit: p(z ≤ −1.2) = 0.1151. b Upper limit: p(z ≥ 0.8) = 0.2119
(c) Lognormal Distribution. This distribution is appropriate for non-negative outcomes which are the product of a
number of quantities. In such cases, the data are skewed and
the symmetrical normal distribution is no longer appropriate.
If a variate X is such that log(X) is normally distributed, then
[Fig. 2.15: Comparison of the normal (or Gaussian) z-curve N(0,1) to two Student-t curves with degrees of freedom (d.f.) of 5 and 10. As the d.f. decrease, the PDF of the Student-t distribution flattens out and deviates increasingly from the normal distribution]
[Fig. 2.16: Lognormal distributions L(1,1), L(2,2) and L(3,3) for different mean and standard deviation values]
L(x; μ, σ) = [1/(σ x (2π)^(1/2))] exp[−(ln x − μ)²/(2σ²)]   when x > 0
           = 0   elsewhere   (2.39)
The lognormal curves are a family of skewed curves, as illustrated in Fig. 2.16. Lognormal failure laws apply when the degradation in lifetime is proportional to the previous amount of degradation. Typical applications involve flood frequency in civil engineering, crack growth and mechanical wear in mechanical engineering, and pollutants produced by chemical plants and threshold values for drug dosage in environmental engineering.

Example 2.4.11: Using lognormal distributions for pollutant concentrations
Concentration of pollutants produced by chemical plants is known to resemble the lognormal distribution and is used to evaluate issues regarding compliance with government regulations. The concentration of a certain pollutant, in parts per million (ppm), is assumed lognormal with parameters μ = 4.6 and σ = 1.5. What is the probability that the concentration exceeds 10 ppm?
One can use Eq. 2.39, or, simpler still, use the z-tables (Table A3) after a suitable transformation of the random variable:

p(X ≤ 10) = N[(ln(10) − 4.6)/1.5] = N(−1.531) = 0.0630

so that p(X > 10) = 1 − 0.0630 = 0.937.
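The z-table lookup of Example 2.4.11 can be replaced by the error function available in the standard library; a minimal sketch (illustrative, not from the text):

```python
from math import erf, log, sqrt

def std_normal_cdf(z):
    """Standard normal CDF expressed via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

mu, sigma = 4.6, 1.5
z = (log(10.0) - mu) / sigma
p_below = std_normal_cdf(z)       # p(X <= 10 ppm)
p_exceed = 1.0 - p_below          # p(X > 10 ppm)
print(round(z, 3), round(p_below, 3), round(p_exceed, 3))  # -1.532 0.063 0.937
```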
(d) Gamma Distribution. There are several processes where distributions other than the normal are warranted. A distribution which is useful, since it is versatile in the shapes it can generate, is the gamma distribution (also called the Erlang distribution). It is a good candidate for modeling random phenomena which can only be positive and are unimodal. The gamma distribution is derived from the gamma function for positive values of λ, which, one may recall from mathematics, is defined by the integral:

Γ(λ) = ∫ (0 to ∞) x^(λ−1) e^(−x) dx   (2.40a)

with the property that, for integer k:

Γ(k + 1) = k!   (2.40b)

The continuous random variable X has a gamma distribution with positive parameters λ and k if its density function is given by:

G(x; λ, k) = λ e^(−λx) (λx)^(k−1)/(k − 1)!   x > 0
           = 0   elsewhere   (2.40c)

with mean μ = k/λ and variance σ² = k/λ²   (2.40d)

The distribution is represented by a family of curves for different parameter values (see Fig. 2.17).

(e) Exponential Distribution. A special case of the gamma distribution for k = 1 is the exponential distribution. It is the continuous distribution analogue to the geometric distribution, which applied to the discrete case. It is used to model the interval between two occurrences, e.g. the distance between consecutive faults in a cable, the time between chance failures of a component (such as a fuse) or a system, the time between consecutive emissions of α-particles, or the time between successive arrivals at a service facility. Its PDF is given by:

E(x; λ) = λ e^(−λx)   if x > 0
        = 0   otherwise   (2.41a)

with mean μ = 1/λ and variance σ² = 1/λ²   (2.41b)

The distribution is represented by a family of curves for different values of λ (see Fig. 2.18). Exponential failure laws apply to products whose current age does not have much effect on their remaining lifetimes; hence, this distribution is said to be memoryless. Notice the relationship between the exponential and the Poisson distributions: while the latter represents the number of failures per unit time, the exponential represents the time between successive failures. Its CDF is given by:

CDF[E(a, λ)] = ∫ (0 to a) λ e^(−λx) dx = 1 − e^(−λa)   (2.41c)
[Fig. 2.17: Gamma distributions for different parameter combinations: G(0.5,1), G(1,1) and G(3,1) in one panel, and G(3,0.2), G(3,0.33) and G(3,1) in the other]
[Fig. 2.18: Exponential distributions E(0.5), E(1) and E(2)]

(f) Weibull Distribution. The Weibull PDF is:

W(x; α, β) = (α/β^α) x^(α−1) exp[−(x/β)^α]   for x > 0
           = 0   elsewhere   (2.42a)

with mean μ = β Γ(1 + 1/α)   (2.42b)

Figure 2.19 shows the versatility of this distribution for different sets of α and β values; the special case α = 1 reduces to the exponential distribution.

[Fig. 2.19: Weibull distributions W(1,1), W(2,1), W(2,5), W(10,0.5), W(10,1) and W(10,2)]

For example, for a wind speed application with mean wind speed μ = 7.9 m/s and α = 2: Γ(1 + 1/2) = 0.8862, from which β = 7.9/0.8862 = 8.9 m/s.
(a) The probability that the wind speed exceeds the stated value is p = 1 − [1 − e^(−0.4(2))] = e^(−0.8) = 0.4493.
(b) Using the PDF given by Eq. 2.42, it is left to the reader to compute the probability of the wind speed being equal to 10 m/s (and verify the solution against the figure, which indicates a value of 0.064).
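The Weibull scale parameter computed in the wind example follows from Eq. 2.42b with `math.gamma`; a minimal sketch (illustrative, not from the text):

```python
from math import gamma

# Weibull mean (Eq. 2.42b): mu = beta * Gamma(1 + 1/alpha)
alpha, mean_speed = 2.0, 7.9
g = gamma(1.0 + 1.0 / alpha)   # Gamma(1.5)
beta = mean_speed / g
print(round(g, 4), round(beta, 1))  # 0.8862 8.9
```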
[Figure: F-distributions F(6,5) and F(6,24)]

[Figure: Chi-square distributions χ²(1), χ²(4) and χ²(6)]
(g) Chi-square Distribution. The chi-square PDF with ν degrees of freedom is:

χ²(x; ν) = [1/(2^(ν/2) Γ(ν/2))] x^(ν/2−1) e^(−x/2)   x > 0
         = 0   elsewhere   (2.43a)

while the mean and variance values are:

μ = ν and σ² = 2ν   (2.43b)

(h) F-Distribution. While the t-distribution allows comparison between two sample means, the F-distribution allows comparison between two or more sample variances. It is defined as the ratio of two independent chi-square random variables, each divided by its degrees of freedom. The F-distribution is also represented by a family of plots (see Fig. 2.22), where each plot is specific to a set of numbers representing the degrees of freedom of the two random variables (v₁, v₂). Table A6 assembles critical values of the F-distribution.

(i) Uniform Distribution. The uniform probability distribution is the simplest of all PDFs and applies to both continuous and discrete data whose outcomes are all equally likely, i.e. have equal probabilities. Flipping a coin for heads/tails or rolling a die for numbers between 1 and 6 are examples which come readily to mind. The probability density function for the discrete case, where X can assume values x₁, x₂, … x_k, is given by:

U(x; k) = 1/k   (2.44a)

with mean μ = Σ (i = 1 to k) xᵢ/k and variance σ² = Σ (i = 1 to k) (xᵢ − μ)²/k   (2.44b)

For the continuous case over an interval (c, d):

U(x) = 1/(d − c)   when c < x < d
     = 0   otherwise   (2.44c)

[Fig. 2.23: PDF of the continuous uniform distribution: f(x) = 1/(d − c) between c and d, with mean (c + d)/2]

with mean μ = (c + d)/2 and variance σ² = (d − c)²/12   (2.44d)
The probability that a uniform random variable falls within an interval [x₁, x₂] is:

U(x₁ ≤ X ≤ x₂) = (x₂ − x₁)/(d − c)

Example 2.4.14: A random variable X has a uniform distribution with c = −5 and d = 10 (see Fig. 2.23). Determine:
(a) On average, what proportion will have a negative value? (Answer: 1/3)
(b) On average, what proportion will fall between −2 and 2? (Answer: 4/15)

(j) Beta Distribution. The Beta PDF, defined over the interval [0, 1], is:

Beta(x; p, q) = [(p + q − 1)!/((p − 1)!(q − 1)!)] x^(p−1) (1 − x)^(q−1)   (2.45a)

The mean of the Beta distribution is μ = p/(p + q) and its variance σ² = pq/[(p + q)²(p + q + 1)]   (2.45b)

[Fig. 2.24: Beta distributions for various shape parameters: Beta(0.5,0.5), Beta(0.5,1), Beta(0.5,2), Beta(0.5,3), Beta(1,0.5), Beta(1,1), Beta(1,2), Beta(1,3), Beta(2,0.5), Beta(2,1), Beta(2,2) and Beta(2,3)]

This distribution originates from the Binomial distribution, and one can detect the obvious similarity of a two-outcome affair with specified probabilities. The usefulness of this distribution will become apparent in Sect. 2.5.3, dealing with the Bayesian approach to probability problems.
It was stated in Sect. 2.1.4 that the Bayesian viewpoint can be used to update probability estimates as new information is acquired.² The conditional probability relation (Eq. 2.12) can be written as:

p(B/A) = p(A ∩ B)/p(A)   (2.46)

² There are several texts which deal with Bayesian statistics; for example, Bolstad (2004).
p(B/A) = [p(A/B) × p(B)]/p(A)   (2.47)

Bayes' theorem, superficially, appears to be simply a restatement of the conditional probability equation given by Eq. 2.12. The question is: why is this reformulation so insightful or advantageous? First, the probability p(A) is now re-expressed in terms of its disjoint parts {B, B̄}, and second, the probabilities have been flipped, i.e., p(B/A) is now expressed in terms of p(A/B). Consider the two events A and B. If event A is observed while event B is not, this expression allows one to infer the "flip" probability, i.e. the probability of occurrence of B from that of the observed event A. In Bayesian terminology, Eq. 2.47 can be written as:

Posterior probability of event B given event A
   = [(Likelihood of A given B) × (Prior probability of B)] / (Prior probability of A)   (2.48)
The law of total probability states that:

p(A) = Σ (j = 1 to n) p(A ∩ B_j) = Σ (j = 1 to n) p(A/B_j) × p(B_j)   (2.49)

so that:

p(B_i/A) = p(A ∩ B_i)/p(A) = [p(A/B_i) × p(B_i)] / [Σ (j = 1 to n) p(A/B_j) × p(B_j)]   (2.50)

which is known as Bayes' theorem for multiple events. As before, the marginal or prior probabilities p(B_i) for i = 1, …, n are assumed to be known in advance, and the intention is to update or revise our "belief" on the basis of the observed evidence of event A having occurred. This is captured by the probability p(B_i/A) for i = 1, …, n, called the posterior probability, or the weight one can attach to each event B_i after event A is known to have occurred.

Fig. 2.25 Bayes' theorem for multiple events depicted on a Venn diagram. In this case, the sample space is assumed to be partitioned into four discrete events B1-B4. If an observable event A has already occurred, the conditional probability of B3 is p(B3/A) = p(B3 ∩ A)/p(A). This is the ratio of the hatched area to the total area inside the ellipse

[Fig. 2.26: Reverse probability tree for the two-box marble example. Given that a red marble was drawn (total probability 5/8), the posterior probabilities of having selected box A or box B are 2/5 and 3/5 respectively]
The posterior probabilities of the boxes, given that a red marble (R) has been drawn, are relevant after the experiment has been performed. Thus, from Bayes' theorem and the law of total probability (Eqs. 2.49 and 2.50):

p(B/R) = [(1/2)(3/4)] / [(1/2)(1/2) + (1/2)(3/4)] = (3/8)/(5/8) = 3/5

and

p(A/R) = [(1/2)(1/2)] / [(1/2)(1/2) + (1/2)(3/4)] = (1/4)/(5/8) = 2/5

[Fig. 2.27: Forward probability tree for the fault detection example. The equipment is fault-free (A) with probability 0.99 or faulty (B) with probability 0.01. Fault-free equipment triggers no alarm with probability 0.95 and a (false) alarm with probability 0.05; faulty equipment triggers an alarm with probability 0.90 and no alarm (a missed opportunity) with probability 0.10]

The joint probabilities of the four outcomes are:

Probability   Diagnosis   State of equipment
0.9405        Fine        Fine
0.0495        Faulty      Fine (false alarm)
0.0090        Faulty      Faulty
0.0010        Fine        Faulty (missed opportunity)
p(A/A2) = [(0.99)(0.05)] / [(0.99)(0.05) + (0.01)(0.90)] = 0.0495/(0.0495 + 0.009) = 0.846

This false alarm rate is very high for practical situations and could well result in the operator disabling the fault detection system altogether. One way of reducing it, and thereby enhancing robustness, is to alter the detection threshold so that the device is less prone to triggering. This would result in a higher missed opportunity rate, which one has to accept as the price of reduced false alarms. For example, the current missed opportunity rate is:

p(B/B1) = [(0.01)(0.10)] / [(0.01)(0.10) + (0.99)(0.95)] = 0.001/(0.001 + 0.9405) = 0.001
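Both diagnostics above are single applications of Eq. 2.50; a minimal sketch (illustrative, not from the text):

```python
# Bayes' theorem for the fault-detection example (Eq. 2.50)
p_fine, p_faulty = 0.99, 0.01
p_alarm_given_fine = 0.05    # false-trigger rate of the device
p_alarm_given_faulty = 0.90  # sensitivity of the device

# False alarm rate: probability the equipment is actually fine given an alarm
p_alarm = p_fine * p_alarm_given_fine + p_faulty * p_alarm_given_faulty
false_alarm = p_fine * p_alarm_given_fine / p_alarm

# Missed opportunity rate: probability the equipment is faulty given no alarm
p_no_alarm = 1.0 - p_alarm
missed = p_faulty * (1 - p_alarm_given_faulty) / p_no_alarm
print(round(false_alarm, 3), round(missed, 3))  # 0.846 0.001
```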
(In the limiting case where every tested pile is found defective, the posterior probabilities of all proportions tend to 0.0 except for x = 1.0, which tends to 1.0.)
From Ang and Tang (2007) by permission of John Wiley and Sons.
Fig. 2.29 Illustration of how the prior discrete PDF ("Prior to Testing") is progressively modified by data collection following Bayes' theorem
(0.2)(0.3) + (0.4)(0.4) + (0.6)(0.15) + (0.8)(0.10) + (1.0)(0.05) = 0.44 (as shown in the last row under the second column). This is the prior probability.
Suppose the first pile tested is found to be defective. How should the engineer revise his prior probability of the proportion of piles likely to be defective? This is given by Bayes theorem (Eq. 2.50). For proportion x = 0.2, the posterior probability is:

p(x = 0.2) = (0.2)(0.3) / [(0.2)(0.3) + (0.4)(0.4) + (0.6)(0.15) + (0.8)(0.10) + (1.0)(0.05)] = 0.06/0.44 = 0.136
This is the value which appears in the first row under the third column. Similarly, the posterior probabilities for the other values of x can be computed. If a second pile is tested and also found to be defective, the probabilities are revised once more:

p(x = 0.2) = (0.2)(0.136) / [(0.2)(0.136) + (0.4)(0.364) + (0.6)(0.204) + (0.8)(0.182) + (1.0)(0.114)] = 0.049
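The successive revisions above amount to repeated application of Bayes theorem with likelihood x; a minimal sketch, using the prior probabilities of the example:

```python
# Hedged sketch of the discrete Bayes updating in the pile example:
# candidate defective proportions x with their prior probabilities,
# updated each time a tested pile turns out defective (likelihood = x).

x_values = [0.2, 0.4, 0.6, 0.8, 1.0]
prior = [0.30, 0.40, 0.15, 0.10, 0.05]

def update_on_defective(x_values, belief):
    """One Bayes update with likelihood x for each candidate proportion."""
    joint = [x * p for x, p in zip(x_values, belief)]
    total = sum(joint)  # normalizing constant (total probability)
    return [j / total for j in joint]

after_first = update_on_defective(x_values, prior)
after_second = update_on_defective(x_values, after_first)
print([round(p, 3) for p in after_first])  # first entry ~0.136
print(round(after_second[0], 3))           # ~0.049
```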
There are instances when no previous knowledge or information is available about the behavior of the random variable; this is sometimes referred to as a prior of pure ignorance. It can be shown that this assumption of the prior leads to results identical to those of the traditional probability approach (see Examples 2.5.5 and 2.5.6).
Example 2.5.4: Consider a machine whose prior pdf of the proportion x of defectives is given by Table 2.8.
If a random sample of size 2 is selected, and one defective
is found, the Bayes estimate of the proportion of defectives
produced by the machine is determined as follows.
Let y be the number of defectives in the sample. The probability that the random sample of size 2 yields one defective is given by the Binomial distribution, since this is a two-outcome situation:

f(y/x) = B(y; n, x) = \binom{2}{y} x^y (1 − x)^{2−y},  y = 0, 1, 2

If x = 0.1, then

f(1/0.1) = B(1; 2, 0.1) = \binom{2}{1} (0.1)^1 (0.9)^{2−1} = 0.18
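This computation uses nothing beyond the Binomial pmf, which can be checked with the standard library (`math.comb` supplies the binomial coefficient):

```python
# Hedged check of the Binomial likelihood in Example 2.5.4:
# B(y; n, x) = C(n, y) * x**y * (1 - x)**(n - y).
from math import comb

def binom_pmf(y, n, x):
    """Probability of y 'defective' outcomes in n trials, each with prob x."""
    return comb(n, y) * x**y * (1 - x)**(n - y)

print(round(binom_pmf(1, 2, 0.1), 2))  # 0.18
```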
p(x) ∝ x^a (1 − x)^b   (2.51)
L(x) = \binom{n}{y} x^y (1 − x)^{n−y},  0 ≤ x ≤ 1   (2.53)
Notice that the Beta distribution is of the same form as the likelihood function. Consequently, the posterior distribution assumes the form:

f(x/y) ∝ x^{a+y} (1 − x)^{b+n−y}   (2.54)
Example 2.5.5: Repeat Example 2.5.4 assuming that no information is known about the prior. In this case, assume a uniform distribution.
The likelihood of observing one defective in a sample of size 2 follows from the Binomial distribution:

f(y/x) = B(1; 2, x) = \binom{2}{1} x^1 (1 − x)^{2−1} = 2x(1 − x)  for 0 ≤ x ≤ 1

With the uniform prior, the marginal probability of this observation is

∫₀¹ 2x(1 − x) dx = 1/3

so that the posterior distribution becomes

f(x/y = 1) = 2x(1 − x)/(1/3) = 6x(1 − x)

The Bayes estimate of the proportion of defectives is the posterior mean:

x̂ = 6 ∫₀¹ x²(1 − x) dx = 0.5

which can be compared to the value of 0.5 given by the classical method.

The likelihood function for the case of the single tested pile turning out to be defective is x, i.e., L(x) = x. With the uniform prior, the posterior distribution is then:

f(x/y) = k · x · (1.0)  for 0 ≤ x ≤ 1
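A hedged numerical check of this example (simple midpoint quadrature, standard library only) confirms both the marginal value of 1/3 and the Bayes estimate of 0.5:

```python
# Hedged check of Example 2.5.5: uniform prior, one defective observed
# out of two. The posterior is 6x(1-x); its mean should equal 0.5.

def integrate(f, a, b, n=100_000):
    """Simple midpoint-rule quadrature."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

likelihood = lambda x: 2 * x * (1 - x)       # B(1; 2, x)
marginal = integrate(likelihood, 0.0, 1.0)   # should be ~1/3
posterior = lambda x: likelihood(x) / marginal
bayes_estimate = integrate(lambda x: x * posterior(x), 0.0, 1.0)
print(round(marginal, 4), round(bayes_estimate, 4))  # 0.3333 0.5
```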
Example 2.5.7: Enhancing historical records of wind velocity using the Bayesian approach
Buildings are designed to withstand a maximum wind speed which depends on the location. The probability x that the wind speed will not exceed 120 km/h more than once in 5 years is to be determined. Past records of wind speeds at a nearby location indicated that the following Beta distribution would be an acceptable prior for the probability distribution (Eq. 2.45):

p(x) = 20x³(1 − x)  for 0 ≤ x ≤ 1
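As a quick check, this prior is the Beta(4, 2) density: its normalizing constant Γ(6)/(Γ(4)Γ(2)) equals the coefficient 20, and the density integrates to unity:

```python
# Hedged check that p(x) = 20 x^3 (1 - x) is the Beta(a=4, b=2) density:
# its normalizing constant Gamma(a+b)/(Gamma(a)*Gamma(b)) = 120/6 = 20.
from math import gamma

a, b = 4, 2
coefficient = gamma(a + b) / (gamma(a) * gamma(b))
print(coefficient)  # 20.0

# The density should integrate to 1 over [0, 1] (midpoint rule).
n = 100_000
h = 1.0 / n
area = sum(20 * x**3 * (1 - x)
           for x in ((i + 0.5) * h for i in range(n))) * h
print(round(area, 4))  # 1.0
```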
[Figure: prior, likelihood, and posterior distributions f(p) plotted over the range 0.2 to 1.0]
Table 2.9 Estimates of absolute probabilities for different natural disasters in the United States. (Adapted from Barton and Nishenko 2008)

Disaster      20 years   20 years
Earthquakes   0.89       0.26
Hurricanes    >0.99      0.71
Floods        >0.99      0.08
Tornadoes     >0.99      0.11
[Fig. 2.31: 10th, 50th and 90th percentile estimates plotted for each individual respondent]
Fig. 2.31 Example illustrating large differences in subjective probability. A group of prominent economists, ecologists and natural scientists were polled so as to get their estimates of the loss of gross world product due to a doubling of atmospheric carbon dioxide (which is likely to occur by the end of the twenty-first century when mean global temperatures increase by 3°C). The two ecologists predicted the highest adverse impact while the lowest four individuals were economists. (From Nordhaus 1994)
Problems
Pr. 2.1 An experiment consists of tossing two dice.
(a) List all events in the sample space
(b) What is the probability that both dice show the same number?
(c) What is the probability that the sum of both numbers
equals 10?
Pr. 2.2 Expand Eq. 2.9, valid for two outcomes, to three outcomes: p(A ∪ B ∪ C) = ....
Pr. 2.3 A solar company has an inspection system for batches of photovoltaic (PV) modules purchased from different
vendors. A batch typically contains 20 modules, while the
inspection system involves taking a random sample of 5 modules and testing all of them. Suppose there are 2 faulty modules in the batch of 20.
(a) What is the probability that for a given sample, there
will be one faulty module?
(b) What is the probability that both faulty modules will be
discovered by inspection?
Pr. 2.4 A county office determined that of the 1000 homes
in their area, 600 were older than 20 years (event A), that
500 were constructed of wood (event B), and that 400 had
central air conditioning (AC) (event C). Further, it is found
that events A and B occur in 300 homes, that all three events
occur in 150 homes, and that no event occurs in 225 homes.
Pr. 2.5 A university researcher has submitted three research proposals to three different agencies. Let E1, E2 and
E3 be the events that the first, second and third bids are
successful with probabilities: p(E1)=0.15, p(E2)=0.20,
p(E3)=0.10. Assuming independence, find the following
probabilities:
(a) that all three bids are successful
(b) that at least two bids are successful
(c) that at least one bid is successful
Pr. 2.6 Consider two electronic components A and B with failure probabilities of p(A)=0.1 and p(B)=0.25. What is the failure probability of a system which involves connecting the two components in (a) series and (b) parallel?
Pr. 2.7 A particular automatic sprinkler system for a high-rise apartment has two different types of activation devices for each sprinkler head. Reliability of such devices is a measure of the probability of success, i.e., that the device will activate when called upon to do so. Type A and Type B devices have reliability values of 0.90 and 0.85 respectively. In case a fire does start, calculate:
(a) the probability that the sprinkler head will be activated
(i.e., at least one of the devices works),
(b) the probability that the sprinkler will not be activated at
all, and
(c) the probability that both activation devices will work
properly.
Pr. 2.8 Consider the two system schematics shown in
Fig.2.32. At least one pump must operate when one chiller is
operational, and both pumps must operate when both chillers
are on. Assume that both chillers have identical reliabilities
of 0.90 and that both pumps have identical reliabilities of
0.95.
(a) Without any computation, make an educated guess as to
which system would be more reliable overall when (i)
one chiller operates, and (ii) when both chillers operate.
(b) Compute the overall system reliability for each of the
configurations separately under cases (i) and (ii) defined
above.
8 From McClave and Benson (1988) by permission of Pearson Education.
[Fig. 2.32: schematics of the two chiller-pump configurations (chillers C1, C2; pumps P1, P2), with the water flow paths shown for system 1 and system 2]
f(x, y) = 10xy²  for 0 < x < y < 1, and 0 elsewhere

f(x, y):
        X = 1   X = 2   X = 3
Y = 1   0.05    0.05    0
Y = 2   0.05    0.10    0.20
Y = 3   0.10    0.35    0.10

Pr. 2.12 Consider the data given in Example 2.2.6 for the case of a residential air conditioner. You will use the same data to calculate the flip problem using Bayes law.
(a) During a certain day, it was found that the air-conditioner was operating satisfactorily. Calculate the probability that this was a not hot (NH) day.
(b) Draw the reverse tree diagram for this case.

Pr. 2.13 Consider a medical test for a disease. The test has a probability of 0.95 of correctly (positively) detecting an infected person (this is the sensitivity), while it has a probability of 0.90 of correctly identifying a healthy person (this is called the specificity). In the population, only 3% of the people have the disease.
(a) What is the probability that a person testing positive is actually infected?
(b) What is the probability that a person testing negative is actually infected?

Pr. 2.14 A large industrial firm purchases several new computers at the end of each year, the number depending on the frequency of repairs in the previous year. Suppose that the number of computers X purchased each year has the following PDF:

X      0     1     2     3
f(x)   0.2   0.3   0.2   0.1

X      0     1     2     3
f(x)   0.4   0.3   0.2   0.1
a random variable. Further, global radiation has an underlying annual pattern due to the orbital rotation of the earth around the sun. A widely adopted technique to filter out this deterministic trend is:
(i) to select the random variable not as the daily radiation itself but as the daily clearness index K, defined as the ratio of the daily global radiation on the earth's surface for the location in question to that outside the atmosphere for the same latitude and day of the year, and
(ii) to truncate the year into 12 monthly time scales, since the random variable K for a location changes appreciably on a seasonal basis.
Gordon and Reddy (1988) proposed an expression for the PDF of the normalized random variable of the form P(X) = A X^n (1 − X/X_max) with

X_max = (n + 3)/(n + 1)
A = (n + 1)(n + 2)/X_max^{n+1}

Note that because of the manner of normalization, the random variable selected can assume values greater than unity. Figure 2.33 shows the proposed distribution for a number of different variance values.
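A hedged numerical check, assuming the form P(X) = A Xⁿ(1 − X/X_max) given above: with X_max = (n + 3)/(n + 1) and A = (n + 1)(n + 2)/X_max^(n+1), the distribution should have unit area and unit mean for any n:

```python
# Hedged check: for the Gordon-Reddy form P(X) = A * X**n * (1 - X/Xmax),
# the stated constants should give unit area and unit mean on [0, Xmax].

def check(n):
    xmax = (n + 3) / (n + 1)
    A = (n + 1) * (n + 2) / xmax ** (n + 1)
    steps = 200_000
    h = xmax / steps
    area = mean = 0.0
    for i in range(steps):          # midpoint-rule quadrature
        x = (i + 0.5) * h
        p = A * x**n * (1 - x / xmax)
        area += p * h
        mean += x * p * h
    return area, mean

area, mean = check(n=2.0)
print(round(area, 4), round(mean, 4))  # 1.0 1.0
```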
Pr. 2.22 The life in years of a certain type of electrical switch has an exponential distribution with an average life of μ = 5 years. If 100 of these switches are installed in different systems,
(a) what is the probability that at most 20 will fail during the first year?
(b) How many are likely to have failed at the end of 3 years?
[Fig. 2.33: the proposed distribution plotted against the random variable X (0 to 1.5) for var(X) = 0.01, 0.02, 0.04, 0.06, 0.1 and 0.15]
Pr. 2.24 Cumulative distribution and utilizability functions for horizontal solar radiation
DD = Σ_{d=1}^{N} (18.3 − T_d)^+  in °C-day

Q_Load = (UA)_Bldg · DD · (86,400 s/day) · (10⁻⁶ MJ/J)

F(X') = probability(X ≤ X') = ∫_{X_min}^{X'} P(X) dX   (2.56a)

φ(X_C) = ∫_{F_C}^{1} (X − X_C) dF = ∫_{X_C}^{X_max} [1 − F(X)] dX   (2.56b)

The value of the utilizability function for such a critical radiation level X_C is shown in Fig. 2.34c.

Fig. 2.34 Relation between different distributions. a Probability density curve (shaded area represents the cumulative distribution value F(X')). b Cumulative distribution function (shaded area represents utilizability fraction at X_C). c Utilizability curve. (From Reddy 1987)
Pr. 2.25 Generating cumulative distribution curves and utilizability curves from measured data.
The previous two problems involved probability distributions of solar radiation and ambient temperature, and how
these could be used to derive functions for quantities of interest such as the solar utilizability or the Degree-Days. If
monitored data is available, there is no need to delve into
such considerations of probability distributions, and one can
calculate these functions numerically.
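A minimal sketch of such a numerical calculation (the radiation-ratio sample and the critical ratio below are hypothetical, chosen only for illustration):

```python
# Hedged sketch: empirical cumulative distribution and utilizability
# computed directly from monitored data. The sample of radiation ratios
# below is hypothetical. Evaluated from data, the utilizability integral
# reduces to the sample average of max(X - Xc, 0).

def empirical_cdf(sample, x):
    """Fraction of observations not exceeding x."""
    return sum(1 for v in sample if v <= x) / len(sample)

def utilizability(sample, xc):
    """Sample average of the excess of X over the critical ratio xc."""
    return sum(max(v - xc, 0.0) for v in sample) / len(sample)

ratios = [0.2, 0.5, 0.8, 1.0, 1.1, 1.3, 0.7, 0.9, 1.2, 0.3]  # hypothetical
print(empirical_cdf(ratios, 1.0))            # 0.7
print(round(utilizability(ratios, 0.8), 3))  # 0.15
```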
[Fig. 2.35: curves of the radiation ratios H/H̄ and I/Ī_m plotted against fractional time F, and against the critical ratios X_c and X̄_c]
Fig.2.35 Distribution for Quezon City, Manila during October 1980. (From Reddy 1987)
References
Ang, A.H.S. and W.H. Tang, 2007. Probability Concepts in Engineering, 2nd Ed., John Wiley and Sons, USA.
Barton, C. and S. Nishenko, 2008. Natural Disasters: Forecasting Economic and Life Losses, U.S. Geological Survey, Marine and Coastal Geology Program.