FOL Uncertainty
© CSE AI Faculty
What’s on our menu today?
Propositional Logic
• Resolution
• WalkSAT
Reasoning with First-Order Logic
• Unification
• Forward/Backward Chaining
• Resolution
• Wumpus again
Uncertainty
• Bayesian networks
2
Recall from Last Time:
Inference/Proof Techniques
Two kinds (roughly):
Successive application of inference rules
– Generate new sentences from old in a sound way
– Proof = a sequence of inference rule applications
– Use inference rules as successor function in a
standard search algorithm
– E.g., Resolution
Model checking
– Done by checking satisfiability: the SAT problem
– Recursive depth-first enumeration of models using
heuristics: DPLL algorithm (sec. 7.6.1 in text)
– Local search algorithms (sound but incomplete)
e.g., randomized hill-climbing (WalkSAT)
3
Understanding Resolution
IDEA: To show KB ╞ α, use proof by
contradiction,
i.e., show KB ∧ ¬ α unsatisfiable
4
Generating new clauses
General Resolution inference rule (for CNF):
l1 ∨ … ∨ lk,   m1 ∨ … ∨ mn
⊢  l1 ∨ … ∨ li−1 ∨ li+1 ∨ … ∨ lk ∨ m1 ∨ … ∨ mj−1 ∨ mj+1 ∨ … ∨ mn
where li and mj are complementary literals (li = ¬mj)
5
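As a minimal sketch of the rule above (the frozenset-of-strings clause encoding and the `~` negation marker are my own), resolution over propositional CNF clauses can be implemented directly:

```python
def resolve(ci, cj):
    """Return all resolvents of two CNF clauses.
    A clause is a frozenset of string literals; '~p' is the negation of 'p'."""
    resolvents = []
    for li in ci:
        neg = li[1:] if li.startswith('~') else '~' + li
        if neg in cj:  # complementary pair found: drop both, union the rest
            resolvents.append(frozenset((ci - {li}) | (cj - {neg})))
    return resolvents

# (p ∨ q) and (¬p ∨ r ∨ s) resolve to (q ∨ r ∨ s)
print(resolve(frozenset({'p', 'q'}), frozenset({'~p', 'r', 's'})))
```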
Why this is sound
Proof of soundness of resolution inference rule:
6
Resolution example
KB ¬α
7
Back to Inference/Proof Techniques
Can’t get
¬satisfaction
9
Why Satisfiability?
Recall: KB ╞ α iff KB ∧ ¬α is unsatisfiable
Thus, algorithms for satisfiability can be used for
inference by showing KB ∧ ¬α is unsatisfiable
10
Satisfiability Examples
11
The WalkSAT algorithm
Local hill climbing search algorithm
• Incomplete: may not always find a
satisfying assignment even if one exists
Evaluation function?
= Number of satisfied clauses
WalkSAT tries to maximize this function
12
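A sketch of the algorithm as just described (the clause encoding and parameter names are mine): on each step it picks an unsatisfied clause and flips either a random symbol in it, or the symbol that maximizes the evaluation function.

```python
import random

def walksat(clauses, p=0.5, max_flips=10_000):
    """WalkSAT: local search for a satisfying assignment of CNF clauses.
    clauses: list of clauses; each clause is a list of (symbol, polarity)
    literals. Incomplete: may return None even if a model exists."""
    symbols = {v for clause in clauses for v, _ in clause}
    model = {v: random.choice([True, False]) for v in symbols}

    def sat(clause):
        return any(model[v] == pos for v, pos in clause)

    for _ in range(max_flips):
        unsat = [c for c in clauses if not sat(c)]
        if not unsat:
            return model  # evaluation function maximized: all clauses satisfied
        clause = random.choice(unsat)
        if random.random() < p:
            var = random.choice(clause)[0]  # randomness: flip a random symbol
        else:
            def score(v):  # greed: try a flip, count satisfied clauses, undo
                model[v] = not model[v]
                n = sum(sat(c) for c in clauses)
                model[v] = not model[v]
                return n
            var = max((v for v, _ in clause), key=score)
        model[var] = not model[var]
    return None

# (a ∨ b) ∧ (¬a ∨ b): every model must set b = True
model = walksat([[('a', True), ('b', True)], [('a', False), ('b', True)]])
print(model)
```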
The WalkSAT algorithm
(figure: WalkSAT pseudocode, balancing greedy flips with random flips)
13
Hard Satisfiability Problems
Consider random 3-CNF sentences. e.g.,
(¬D ∨ ¬B ∨ C) ∧ (B ∨ ¬A ∨ ¬C) ∧ (¬C ∨ ¬B ∨ E) ∧
(E ∨ ¬D ∨ B) ∧ (B ∨ E ∨ ¬C)
m = number of clauses
n = number of symbols
14
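The hardness phenomenon can be observed empirically. This sketch (my own encoding; a brute-force check, so only tiny n is feasible) generates random 3-CNF sentences and measures what fraction are satisfiable as the clause/symbol ratio m/n grows past the known threshold near 4.3:

```python
import random
from itertools import product

def random_3cnf(m, n):
    """m random 3-literal clauses over n symbols (a clause may repeat a symbol)."""
    return [[(random.randrange(n), random.choice([True, False]))
             for _ in range(3)] for _ in range(m)]

def satisfiable(clauses, n):
    """Brute-force satisfiability check; only viable for small n."""
    return any(all(any(bits[v] == pos for v, pos in c) for c in clauses)
               for bits in product([True, False], repeat=n))

# Fraction of satisfiable instances drops sharply near the m/n ≈ 4.3 threshold
n, trials = 10, 30
fracs = [sum(satisfiable(random_3cnf(int(r * n), n), n)
             for _ in range(trials)) / trials
         for r in (2.0, 4.3, 6.0)]
print(fracs)  # high at m/n = 2, low at m/n = 6
```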
Hard Satisfiability Problems
15
Hard Satisfiability Problems
16
What about me?
Putting it all together:
Logical Wumpus Agents
A wumpus-world agent using propositional logic:
¬P1,1
¬W1,1
For x = 1, 2, 3, 4 and y = 1, 2, 3, 4, add
(with appropriate boundary conditions):
Bx,y ⇔ (Px,y+1 ∨ Px,y-1 ∨ Px+1,y ∨ Px-1,y)
Sx,y ⇔ (Wx,y+1 ∨ Wx,y-1 ∨ Wx+1,y ∨ Wx-1,y)
W1,1 ∨ W1,2 ∨ … ∨ W4,4 At least 1 wumpus
¬W1,1 ∨ ¬W1,2
At most 1 wumpus
¬W1,1 ∨ ¬W1,3
…
⇒ 64 distinct proposition symbols, 155 sentences!
18
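The symbol and sentence counts above can be verified mechanically by generating the KB (the string encoding of sentences is mine): 4 symbol types over 16 squares gives 64 symbols, and 2 + 32 + 1 + C(16,2) = 155 sentences.

```python
from itertools import combinations

# Symbols: P, W, B, S for each of the 16 squares
squares = [(x, y) for x in range(1, 5) for y in range(1, 5)]
symbols = [f"{k}{x},{y}" for k in "PWBS" for x, y in squares]

def neighbors(x, y):
    """Adjacent squares, respecting the 4x4 boundary conditions."""
    return [(a, b) for a, b in ((x, y + 1), (x, y - 1), (x + 1, y), (x - 1, y))
            if 1 <= a <= 4 and 1 <= b <= 4]

sentences = ["~P1,1", "~W1,1"]
for x, y in squares:  # breeze and stench "physics" per square
    adj = neighbors(x, y)
    sentences.append(f"B{x},{y} <=> ({' | '.join(f'P{a},{b}' for a, b in adj)})")
    sentences.append(f"S{x},{y} <=> ({' | '.join(f'W{a},{b}' for a, b in adj)})")
sentences.append(" | ".join(f"W{x},{y}" for x, y in squares))  # at least 1 wumpus
for (x1, y1), (x2, y2) in combinations(squares, 2):  # at most 1 wumpus
    sentences.append(f"~W{x1},{y1} | ~W{x2},{y2}")

print(len(symbols), len(sentences))  # 64 155
```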
Limitations of propositional logic
KB contains "physics" sentences for every single
square
19
What we’d like is a way to talk about
objects and groups of objects, and to
define relationships between them
Enter…First-Order Logic
(aka “Predicate logic”)
Propositional vs. First-Order
Propositional logic
Facts: p, q, ¬r, ¬P1,1, ¬W1,1 etc.
(p ∧ q) ∨ (¬r ∨ q ∧ p)
First-order logic
Objects: George, Monkey2, Raj, 573Student1, etc.
Relations:
Curious(George), Curious(573Student1), …
Smarter(573Student1, Monkey2)
Smarter(Monkey2, Raj)
Stooges(Larry, Moe, Curly)
PokesInTheEyes(Moe, Curly)
PokesInTheEyes(573Student1, Raj)
21
FOL Definitions
Constants: George, Monkey2, etc.
• Name a specific object.
Variables: X, Y.
• Refer to an object without naming it.
Functions: banana-of, grade-of, etc.
• Mapping from objects to objects.
Terms: banana-of(George), grade-of(stdnt1)
• Logical expressions referring to objects
Relations (predicates): Curious, PokesInTheEyes, etc.
• Properties of/relationships between objects.
22
More Definitions
Logical connectives: and, or, not, ⇒, ⇔
Quantifiers:
• ∀ For all (Universal quantifier)
• ∃ There exists (Existential quantifier)
Examples
• George is a monkey and he is curious
Monkey(George) ∧ Curious(George)
• All monkeys are curious
∀m: Monkey(m) ⇒ Curious(m)
• There is a curious monkey
∃m: Monkey(m) ∧ Curious(m)
23
Quantifier / Connective
Interaction
M(x) == “x is a monkey”
C(x) == “x is curious”
∀x: M(x) ∧ C(x)
“Everything is a curious monkey”
∀x: M(x) ⇒ C(x)
“All monkeys are curious”
∃x: M(x) ∧ C(x)
“There exists a curious monkey”
∃x: M(x) ⇒ C(x)
“There exists an object that is either a curious
monkey, or not a monkey at all”
24
Nested Quantifiers:
Order matters!
∀x ∃y P(x,y) ≠ ∃y ∀x P(x,y)
Examples
∀m ∃t has(m, t): Every monkey has a tail
∃t ∀m has(m, t): Every monkey shares a tail!
Try:
∀x ∃y loves(x, y): Everybody loves somebody
∃y ∀x loves(x, y): Someone is loved by everyone
25
Semantics
Semantics = what the arrangement of symbols means in
the world
Propositional logic
• Basic elements are variables
(refer to facts about the world)
• Possible worlds: mappings from variables to T/F
First-order logic
• Basic elements are terms
(logical expressions that refer to objects)
• Interpretations: mappings from terms to real-world elements.
26
Example: A World of Kings and Legs
Syntactic elements:
• Constants: Richard, John
• Functions: LeftLeg(p)
• Relations: On(x, y), King(p)
27
Interpretation I
Interpretations map syntactic tokens to model elements
• Constants: Richard, John
• Functions: LeftLeg(p)
• Relations: On(x, y), King(p)
28
Interpretation II
• Constants: Richard, John
• Functions: LeftLeg(p)
• Relations: On(x, y), King(p)
29
How Many Interpretations?
Two constants (and 5 objects in world)
• Richard, John (R, J, crown, RL, JL)
5² = 25 object mappings
One unary relation
King(x)
Infinite number of values for x → infinite mappings
Even if we restricted x to: R, J, crown, RL, JL:
2⁵ = 32 unary truth mappings
Two binary relations
• Leg(x, y); On(x, y)
Infinite. But even restricting x, y to five objects
still yields 2²⁵ mappings for each binary relation
30
Satisfiability, Validity, &
Entailment
S is valid if it is true in all interpretations
31
Propositional Logic vs. First-Order
Ontology: Facts (P, Q, …) vs. Objects, Properties, Relations
33
Wumpus World: Squares
• Each square as an object:
Square1,1, Square1,2, …,
Square3,4, Square4,4
•Square topology relations?
Adjacent(Square1,1, Square2,1)
…
Adjacent(Square3,4, Square4,4)
Better: Squares as lists:
[1, 1], [1,2], …, [4, 4]
Square topology relations:
∀x, y, a, b: Adjacent([x, y], [a, b]) ⇔
[a, b] ∈ {[x+1, y], [x-1, y], [x, y+1], [x, y-1]}
34
Wumpus World: Pits
•Each pit as an object:
Pit1,1, Pit1,2, …,
Pit3,4, Pit4,4
• Problem?
Not all squares have pits
List only the pits we have?
Pit3,1, Pit3,3, Pit4,4
Problem?
No reason to distinguish pits (same properties)
Better: pit as unary predicate
Pit(x)
Pit([3,1]); Pit([3,3]); Pit([4,4]) will be true
35
Wumpus World: Breezes
36
Wumpus World: Wumpuses
• Wumpus as object:
Wumpus
37
FOL Reasoning: Outline
Basics of FOL reasoning
Classes of FOL reasoning methods
• Forward & Backward Chaining
• Resolution
• Compilation to SAT
38
Basics: Universal Instantiation
Universally quantified sentence:
• ∀x: Monkey(x) ⇒ Curious(x)
Intuitively, x can be anything:
• Monkey(George) ⇒ Curious(George)
• Monkey(573Student1) ⇒ Curious(573Student1)
• Monkey(DadOf(George)) ⇒ Curious(DadOf(George))
Formally:
∀x S
Subst({x/p}, S)
Example:
∀x Monkey(x) ⇒ Curious(x)
Monkey(George) ⇒ Curious(George)
39
Basics: Existential Instantiation
Existentially quantified sentence:
• ∃x: Monkey(x) ∧ ¬Curious(x)
Intuitively, x must name something. But what?
• Monkey(George) ∧ ¬Curious(George) ???
• No! S might not be true for George!
Formally:
∃x S
Subst({x/K}, S)
K is called a Skolem constant
40
Basics: Generalized Skolemization
What if our existential variable is nested?
• ∀x ∃y: Monkey(x) ⇒ HasTail(x, y)
Replace y with a “tail-of” Skolem function of x:
• ∀x: Monkey(x) ⇒ HasTail(x, tail-of(x))
41
Motivation for Unification
What if we want to use modus ponens?
Propositional Logic:
a ∧ b, a ∧ b ⇒ c
c
In First-Order Logic?
Monkey(x) ⇒ Curious(x)
Monkey(George)
????
Must “unify” x with George:
Need to substitute {x/George} in Monkey(x) ⇒ Curious(x) to
infer Curious(George)
42
What is Unification?
Examples:
• Unify(city(x), city(seattle)) returns {x/seattle}
• Unify(PokesInTheEyes(Moe,x), PokesInTheEyes(y,z))
returns {y/Moe,z/x}
– {y/Moe,x/Moe,z/Moe} possible but not MGU
44
Unification and Substitution
Unification produces a mapping from variables to
values (e.g., {x/kent,y/seattle})
Substitution: Subst(mapping,sentence) returns new
sentence with variables replaced by values
• Subst({x/kent, y/seattle}, connected(x, y))
returns connected(kent, seattle)
45
Unification Examples I
Unify(road(x, kent), road(seattle, y))
• Returns {x / seattle, y / kent}
• When substituted in both expressions, the
resulting expressions match:
• Each is (road(seattle, kent))
46
Unification Examples II
Unify(f(g(x, dog), y), f(g(cat, y), dog))
• {x / cat, y / dog}
Unify(f(g(x)), f(x))
• Fails: no substitution makes them identical.
• E.g. {x / g(x) } yields f(g(g(x))) and f(g(x))
which are not identical!
47
Unification Examples III
Unify(f(g(cat, y), y), f(x, dog))
• {x / g(cat, dog), y / dog}
Unify(f(g(y)), f(x))
• {x / g(y)}
49
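The examples above can be checked with a standard unification algorithm. This sketch is my own encoding (compound terms as tuples, variables as strings starting with `?`), and it includes the occurs check that makes Unify(f(g(x)), f(x)) fail:

```python
def is_var(t):
    return isinstance(t, str) and t.startswith('?')

def substitute(s, t):
    """Apply substitution s (a dict) to term t, resolving chains."""
    if is_var(t):
        return substitute(s, s[t]) if t in s else t
    if isinstance(t, tuple):
        return tuple(substitute(s, a) for a in t)
    return t

def occurs(v, t):
    """Does variable v occur anywhere inside term t?"""
    return t == v or (isinstance(t, tuple) and any(occurs(v, a) for a in t))

def unify(x, y, s=None):
    """Return a most general unifier of x and y, or None on failure."""
    if s is None:
        s = {}
    x, y = substitute(s, x), substitute(s, y)
    if x == y:
        return s
    if is_var(x):
        return None if occurs(x, y) else {**s, x: y}  # occurs check
    if is_var(y):
        return unify(y, x, s)
    if isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
        for xi, yi in zip(x, y):  # unify argument lists left to right
            s = unify(xi, yi, s)
            if s is None:
                return None
        return s
    return None  # distinct constants or mismatched structure

# Unify(f(g(x, dog), y), f(g(cat, y), dog)) -> {x/cat, y/dog}
print(unify(('f', ('g', '?x', 'dog'), '?y'), ('f', ('g', 'cat', '?y'), 'dog')))
# Unify(f(g(x)), f(x)) fails by the occurs check
print(unify(('f', ('g', '?x')), ('f', '?x')))
```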
Inference II: Backward
Chaining
The algorithm:
• Start with KB and goal.
• Find all rules whose results unify with goal:
Add the premises of these rules to the goal list
Remove the corresponding result from the goal list
• Stop when:
Goal list is empty (SUCCEED) or
Progress halts (FAIL)
50
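The goal-list algorithm above can be sketched in a simplified, ground (variable-free) form; a full first-order version would unify each goal against rule conclusions rather than test equality. Rule and fact names here are invented for illustration:

```python
def backward_chain(rules, facts, goal):
    """Prove goal by backward chaining over ground Horn rules.
    rules: list of (premises, conclusion) pairs; facts: set of known atoms."""
    if goal in facts:
        return True  # goal already known: it leaves the goal list
    for premises, conclusion in rules:
        if conclusion == goal:  # a rule whose result matches the goal
            # replace the goal with the rule's premises and recurse
            if all(backward_chain(rules, facts, p) for p in premises):
                return True
    return False  # progress halts: FAIL

rules = [(['Monkey'], 'Curious'), (['Curious', 'HasBanana'], 'Happy')]
print(backward_chain(rules, {'Monkey', 'HasBanana'}, 'Happy'))  # True
```

Note the sketch has no cycle detection, so mutually recursive rules would loop; real implementations track visited goals.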
Inference III: Resolution
[Robinson 1965]
{ (p ∨ q), (¬p ∨ r ∨ s) }  ⊢  (q ∨ r ∨ s)
51
First-Order Resolution
[Robinson 1965]
Substitute the MGU {x/B} in all literals to get the resolvent:
(q(A) ∨ r(B) ∨ s(y))
Method
• Let S = KB ∧ ¬goal
• Convert S to clausal form
– Standardize apart variables (change names if needed)
– Move quantifiers to front, skolemize to remove ∃
– Replace ⇒ with ∨ and ¬
– DeMorgan’s laws to get CNF (ands-of-ors)
• Resolve clauses in S until empty clause
(unsatisfiable) or no new clauses added
53
First-Order Resolution
Given
Example
• ∀x man(x) ⇒ human(x)
• ∀x woman(x) ⇒ human(x)
• ∀x singer(x) ⇒ man(x) ∨ woman(x)
• singer(M)
Prove
• human(M)
54
FOL Resolution Example
[¬m(x),h(x)] [¬w(y), h(y)] [¬s(z),m(z),w(z)] [s(M)] [¬h(M)]
[m(M),w(M)]
[w(M), h(M)]
[h(M)]
[]
55
Back To the Wumpus World
Recall description:
• Squares as lists: [1,1] [3,4] etc.
• Square adjacency as binary predicate.
• Pits, breezes, stenches as unary predicates:
Pit(x)
• Wumpus, gold, homes as functions:
Home(Wumpus)
56
Back To the Wumpus World
“Squares next to pits are breezy”:
∀x, y, a, b:
Pit([x, y]) ∧ Adjacent([x, y], [a, b]) ⇒
Breezy([a, b])
57
That’s nice but these algorithms
assume complete knowledge of the
world!
60
Leaving time before 6pm P(arrive-in-time)
20 min 0.05
30 min 0.25
45 min 0.50
60 min 0.75
120 min 0.98
1 day 0.99999
61
What Is Probability?
Probability: Calculus for dealing with nondeterminism and
uncertainty
62
Why Should You Care?
The world is full of uncertainty
• Logic is not enough
• Computers need to be able to handle uncertainty
63
Logic vs. Probability
Symbol: Q, R … Random variable: Q …
65
Axioms of Probability Theory
Just 3 are enough to build entire theory!
1. All probabilities between 0 and 1
0 ≤ P(A) ≤ 1
2. P(true) = 1 and P(false) = 0
3. Probability of disjunction of events is:
P(A ∨ B) = P(A) + P(B) − P(A ∧ B)
(Venn diagram: regions A and B overlapping in A ∧ B, inside the space of all worlds)
66
Prior and Joint Probability
(Venn diagram: regions A, A ∧ B, B, with an example probability of 0.2)
69
Dilemma at the Dentist’s
70
Probabilistic Inference by Enumeration
P(toothache) = ?
P(toothache) = .108 + .012 + .016 + .064 = .20, or 20%
71
Inference by Enumeration
P(toothache ∨ cavity) = ?
= .2 + (.108 + .012 + .072 + .008) − (.108 + .012)
= .28
72
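Both computations above can be reproduced from the full joint table. The sketch below uses the slide's numbers; the .144 and .576 entries complete the standard textbook table and are assumed, since the slide does not show them:

```python
# Full joint distribution for the dentist domain, P(cavity, toothache, catch)
joint = {
    (True,  True,  True):  .108, (True,  True,  False): .012,
    (True,  False, True):  .072, (True,  False, False): .008,
    (False, True,  True):  .016, (False, True,  False): .064,
    (False, False, True):  .144, (False, False, False): .576,
}
NAMES = ('cavity', 'toothache', 'catch')

def P(event):
    """Sum the joint entries consistent with event, e.g. {'toothache': True}."""
    return sum(p for world, p in joint.items()
               if all(world[NAMES.index(k)] == v for k, v in event.items()))

print(P({'toothache': True}))  # mathematically .108+.012+.016+.064 = 0.2
p_or = (P({'toothache': True}) + P({'cavity': True})
        - P({'toothache': True, 'cavity': True}))
print(p_or)  # mathematically .2 + .2 - .12 = 0.28
```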
Inference by Enumeration
73
Problems with Enumeration
Worst case time: O(dⁿ)
where d = max arity of random variables
e.g., d = 2 for Boolean (T/F)
and n = number of random variables
Space complexity also O(dⁿ)
• Size of joint distribution
Problem: Hard/impossible to estimate all O(dⁿ)
entries for large problems
74
Independence
A and B are independent iff:
P(A ∧ B) = P(A) P(B)
75
Independence
77
Conditional Independence II
P(catch | toothache, cavity) = P(catch | cavity)
P(catch | toothache,¬cavity) = P(catch |¬cavity)
78
Power of Cond. Independence
Often, using conditional independence reduces the
storage complexity of the joint distribution from
exponential to linear!!
79
Thomas Bayes
Reverend Thomas Bayes (1702-1761), Nonconformist minister
Publications:
• Divine Benevolence, or an Attempt to Prove That the Principal End of the Divine Providence and Government is the Happiness of His Creatures (1731)
• An Introduction to the Doctrine of Fluxions (1736)
• An Essay Towards Solving a Problem in the Doctrine of Chances (1764)
80
Recall: Conditional Probability
P(x | y) is the probability of x given y
Assumes that y is the only info known.
Defined as:
P(x | y) = P(x, y) / P(y)
P(y | x) = P(y, x) / P(x) = P(x, y) / P(x)
Therefore?
81
Bayes’ Rule
P(x, y) = P(x | y) P(y) = P(y | x) P(x)
⇒ P(x | y) = P(y | x) P(x) / P(y)
What is this useful for?
82
Bayes’ rule is used to Compute Diagnostic
Probability from Causal Probability
P(M|S) =
83
Normalization in Bayes’ Rule
P(x | y) = P(y | x) P(x) / P(y) = α P(y | x) P(x)
where α = 1/P(y) = 1 / Σₓ P(y, x) = 1 / Σₓ P(y | x) P(x)
84
Cond. Independence and the Naïve Bayes Model
85
Example 1: State Estimation
86
Causal vs. Diagnostic Reasoning
P(open|z) is diagnostic.
P(z|open) is causal.
Often causal knowledge is easier to obtain.
Bayes rule allows us to use causal knowledge:
count frequencies!
P(open | z) = P(z | open) P(open) / P(z)
87
State Estimation Example
P(open | z) = P(z | open) P(open) / (P(z | open) P(open) + P(z | ¬open) P(¬open))
= (0.6 · 0.5) / (0.6 · 0.5 + 0.3 · 0.5) = 2/3 ≈ 0.67
88
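Plugging the slide's numbers into Bayes' rule with normalization (the function name is mine) reproduces the result:

```python
def bayes(prior, p_z_given_h, p_z_given_not_h):
    """P(h | z) via Bayes' rule, normalizing over h and ¬h."""
    num = p_z_given_h * prior
    return num / (num + p_z_given_not_h * (1 - prior))

# P(open) = 0.5, P(z | open) = 0.6, P(z | ¬open) = 0.3
p_open = bayes(0.5, 0.6, 0.3)
print(round(p_open, 2))  # 0.67
```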
Combining Evidence
89
Recursive Bayesian Updating
P(x | z1, …, zn) = P(zn | x, z1, …, zn−1) P(x | z1, …, zn−1) / P(zn | z1, …, zn−1)

P(open | z2, z1) = P(z2 | open) P(open | z1) /
(P(z2 | open) P(open | z1) + P(z2 | ¬open) P(¬open | z1))
= (1/2 · 2/3) / (1/2 · 2/3 + 3/5 · 1/3) = 5/8 = 0.625
Yes!
92
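The same computation, folded into a recursive update that feeds each posterior back in as the next prior (function name mine), reproduces 0.625:

```python
def update(belief, p_z_true, p_z_false):
    """Fold one observation z into the running belief P(open | evidence)."""
    num = p_z_true * belief
    return num / (num + p_z_false * (1 - belief))

belief = 0.5                       # prior P(open)
belief = update(belief, 0.6, 0.3)  # after z1: 2/3
belief = update(belief, 0.5, 0.6)  # after z2: 5/8
print(round(belief, 3))  # 0.625
```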
Enter…Bayesian networks
93
What are Bayesian networks?
Simple, graphical notation for conditional independence
assertions
Syntax:
• a set of nodes, one per random variable
• a directed, acyclic graph (link ≈ "directly influences")
• a conditional distribution for each node given its parents:
P (Xi | Parents (Xi))
95
Example 2: Burglars and Earthquakes
You are at a “Done with the AI class” party.
Neighbor John calls to say your home alarm has gone
off (but neighbor Mary doesn't).
Sometimes your alarm is set off by minor earthquakes.
97
Compact Representation of Probabilities in
Bayesian Networks
99
Probabilistic Inference in BNs
The graphical independence representation yields
efficient inference schemes
We generally want to compute
• P(X|E) where E is evidence from sensory measurements etc.
(known values for variables)
100
P(B | J=true, M=true)
Earthquake Burglary
Alarm
John Mary
101
Computing P(B | J=true, M=true)
Earthquake Burglary
Alarm
John Mary
P(J) = ΣA P(J | A) f3(A) = f4(J)
105
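A sketch of exact inference by enumeration for this query. The CPT values (P(B) = .001, P(E) = .002, etc.) are the standard textbook numbers for this example and are assumed here, since the slide shows only the graph structure:

```python
from itertools import product

P_B, P_E = 0.001, 0.002
P_A = {(True, True): .95, (True, False): .94,
       (False, True): .29, (False, False): .001}  # P(Alarm | B, E)
P_J = {True: .90, False: .05}                     # P(JohnCalls | Alarm)
P_M = {True: .70, False: .01}                     # P(MaryCalls | Alarm)

def pr(p, outcome):
    """Probability of a Boolean outcome given P(outcome = True)."""
    return p if outcome else 1 - p

def joint(b, e, a, j, m):
    """Chain-rule factorization along the network structure."""
    return (pr(P_B, b) * pr(P_E, e) * pr(P_A[(b, e)], a)
            * pr(P_J[a], j) * pr(P_M[a], m))

# P(B | j, m): sum out the hidden variables E and A, then normalize
hidden = list(product([True, False], repeat=2))
num = sum(joint(True, e, a, True, True) for e, a in hidden)
den = num + sum(joint(False, e, a, True, True) for e, a in hidden)
posterior = num / den
print(round(posterior, 3))  # 0.284
```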
Other Inference Algorithms
Direct Sampling:
• Repeat N times:
– Use random number generator to generate sample values for each
node
– Start with nodes with no parents
– Condition on sampled parent values for other nodes
• Count frequencies of samples to get an approximation to
joint distribution
Other variants: Rejection sampling, likelihood weighting, Gibbs sampling
and other MCMC methods (see text)
107
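The direct-sampling loop above can be sketched on a tiny hypothetical two-node network (the CPT values are invented for illustration): sample the parentless node first, condition the child on the sampled parent value, then count frequencies.

```python
import random

# Hypothetical network: Cloudy -> Rain
P_CLOUDY = 0.5
P_RAIN = {True: 0.8, False: 0.2}   # P(Rain | Cloudy)

def sample():
    cloudy = random.random() < P_CLOUDY      # node with no parents first
    rain = random.random() < P_RAIN[cloudy]  # condition on the sampled parent
    return cloudy, rain

# Count frequencies to approximate P(Rain) = .5*.8 + .5*.2 = 0.5
N = 100_000
estimate = sum(sample()[1] for _ in range(N)) / N
print(estimate)
```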
Next Time
Guest lecture by Dieter Fox on
Applications of Probabilistic Reasoning
To Do: Work on homework #2
Bayes
rules!
108