FOL Uncertainty
© CSE AI Faculty
What’s on our menu today?
Propositional Logic
• Resolution
• WalkSAT
Reasoning with First-Order Logic
• Unification
• Forward/Backward Chaining
• Resolution
• Wumpus again
Uncertainty
• Bayesian networks
2
Recall from Last Time:
Inference/Proof Techniques
Two kinds (roughly):
Successive application of inference rules
– Generate new sentences from old in a sound way
– Proof = a sequence of inference rule applications
– Use inference rules as successor function in a
standard search algorithm
– E.g., Resolution
Model checking
– Done by checking satisfiability: the SAT problem
– Recursive depth-first enumeration of models using
heuristics: DPLL algorithm (sec. 7.6.1 in text)
– Local search algorithms (sound but incomplete)
e.g., randomized hill-climbing (WalkSAT)
3
Understanding Resolution
IDEA: To show KB ╞ α, use proof by
contradiction,
i.e., show KB ∧ ¬ α unsatisfiable
4
Generating new clauses
General Resolution inference rule (for CNF):
l1 ∨ … ∨ lk,   m1 ∨ … ∨ mn
⊢  l1 ∨ … ∨ li−1 ∨ li+1 ∨ … ∨ lk ∨ m1 ∨ … ∨ mj−1 ∨ mj+1 ∨ … ∨ mn
where li and mj are complementary literals (li = ¬mj)
5
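As a minimal sketch of the rule above (the frozenset-of-strings clause encoding and the `~` negation marker are my own), resolution over propositional CNF clauses can be implemented directly:

```python
def resolve(ci, cj):
    """Return all resolvents of two CNF clauses.
    A clause is a frozenset of string literals; '~p' is the negation of 'p'."""
    resolvents = []
    for li in ci:
        neg = li[1:] if li.startswith('~') else '~' + li
        if neg in cj:  # complementary pair found: drop both, union the rest
            resolvents.append(frozenset((ci - {li}) | (cj - {neg})))
    return resolvents

# (p ∨ q) and (¬p ∨ r ∨ s) resolve to (q ∨ r ∨ s)
print(resolve(frozenset({'p', 'q'}), frozenset({'~p', 'r', 's'})))
```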
Why this is sound
Proof of soundness of resolution inference rule:
6
Resolution example
KB ¬α
7
Back to Inference/Proof Techniques
Can’t get
¬satisfaction
9
Why Satisfiability?
Recall: KB ╞ α iff KB ∧ ¬α is unsatisfiable
Thus, algorithms for satisfiability can be used for
inference by showing KB ∧ ¬α is unsatisfiable
10
Satisfiability Examples
11
The WalkSAT algorithm
Local hill climbing search algorithm
• Incomplete: may not always find a
satisfying assignment even if one exists
Evaluation function?
= Number of satisfied clauses
WalkSAT tries to maximize this function
12
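A sketch of the algorithm as just described (the clause encoding and parameter names are mine): on each step it picks an unsatisfied clause and flips either a random symbol in it, or the symbol that maximizes the evaluation function.

```python
import random

def walksat(clauses, p=0.5, max_flips=10_000):
    """WalkSAT: local search for a satisfying assignment of CNF clauses.
    clauses: list of clauses; each clause is a list of (symbol, polarity)
    literals. Incomplete: may return None even if a model exists."""
    symbols = {v for clause in clauses for v, _ in clause}
    model = {v: random.choice([True, False]) for v in symbols}

    def sat(clause):
        return any(model[v] == pos for v, pos in clause)

    for _ in range(max_flips):
        unsat = [c for c in clauses if not sat(c)]
        if not unsat:
            return model  # evaluation function maximized: all clauses satisfied
        clause = random.choice(unsat)
        if random.random() < p:
            var = random.choice(clause)[0]  # randomness: flip a random symbol
        else:
            def score(v):  # greed: try a flip, count satisfied clauses, undo
                model[v] = not model[v]
                n = sum(sat(c) for c in clauses)
                model[v] = not model[v]
                return n
            var = max((v for v, _ in clause), key=score)
        model[var] = not model[var]
    return None

# (a ∨ b) ∧ (¬a ∨ b): every model must set b = True
model = walksat([[('a', True), ('b', True)], [('a', False), ('b', True)]])
print(model)
```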
The WalkSAT algorithm
(figure: WalkSAT pseudocode, balancing greedy flips with random flips)
13
Hard Satisfiability Problems
Consider random 3-CNF sentences. e.g.,
(¬D ∨ ¬B ∨ C) ∧ (B ∨ ¬A ∨ ¬C) ∧ (¬C ∨ ¬B ∨ E) ∧
(E ∨ ¬D ∨ B) ∧ (B ∨ E ∨ ¬C)
m = number of clauses
n = number of symbols
14
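The hardness phenomenon can be observed empirically. This sketch (my own encoding; a brute-force check, so only tiny n is feasible) generates random 3-CNF sentences and measures what fraction are satisfiable as the clause/symbol ratio m/n grows past the known threshold near 4.3:

```python
import random
from itertools import product

def random_3cnf(m, n):
    """m random 3-literal clauses over n symbols (a clause may repeat a symbol)."""
    return [[(random.randrange(n), random.choice([True, False]))
             for _ in range(3)] for _ in range(m)]

def satisfiable(clauses, n):
    """Brute-force satisfiability check; only viable for small n."""
    return any(all(any(bits[v] == pos for v, pos in c) for c in clauses)
               for bits in product([True, False], repeat=n))

# Fraction of satisfiable instances drops sharply near the m/n ≈ 4.3 threshold
n, trials = 10, 30
fracs = [sum(satisfiable(random_3cnf(int(r * n), n), n)
             for _ in range(trials)) / trials
         for r in (2.0, 4.3, 6.0)]
print(fracs)  # high at m/n = 2, low at m/n = 6
```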
Hard Satisfiability Problems
15
Hard Satisfiability Problems
16
What about me?
Putting it all together:
Logical Wumpus Agents
A wumpus-world agent using propositional logic:
¬P1,1
¬W1,1
For x = 1, 2, 3, 4 and y = 1, 2, 3, 4, add
(with appropriate boundary conditions):
Bx,y ⇔ (Px,y+1 ∨ Px,y-1 ∨ Px+1,y ∨ Px-1,y)
Sx,y ⇔ (Wx,y+1 ∨ Wx,y-1 ∨ Wx+1,y ∨ Wx-1,y)
W1,1 ∨ W1,2 ∨ … ∨ W4,4 At least 1 wumpus
¬W1,1 ∨ ¬W1,2
At most 1 wumpus
¬W1,1 ∨ ¬W1,3
…
⇒ 64 distinct proposition symbols, 155 sentences!
18
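The symbol and sentence counts above can be verified mechanically by generating the KB (the string encoding of sentences is mine): 4 symbol types over 16 squares gives 64 symbols, and 2 + 32 + 1 + C(16,2) = 155 sentences.

```python
from itertools import combinations

# Symbols: P, W, B, S for each of the 16 squares
squares = [(x, y) for x in range(1, 5) for y in range(1, 5)]
symbols = [f"{k}{x},{y}" for k in "PWBS" for x, y in squares]

def neighbors(x, y):
    """Adjacent squares, respecting the 4x4 boundary conditions."""
    return [(a, b) for a, b in ((x, y + 1), (x, y - 1), (x + 1, y), (x - 1, y))
            if 1 <= a <= 4 and 1 <= b <= 4]

sentences = ["~P1,1", "~W1,1"]
for x, y in squares:  # breeze and stench "physics" per square
    adj = neighbors(x, y)
    sentences.append(f"B{x},{y} <=> ({' | '.join(f'P{a},{b}' for a, b in adj)})")
    sentences.append(f"S{x},{y} <=> ({' | '.join(f'W{a},{b}' for a, b in adj)})")
sentences.append(" | ".join(f"W{x},{y}" for x, y in squares))  # at least 1 wumpus
for (x1, y1), (x2, y2) in combinations(squares, 2):  # at most 1 wumpus
    sentences.append(f"~W{x1},{y1} | ~W{x2},{y2}")

print(len(symbols), len(sentences))  # 64 155
```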
Limitations of propositional logic
KB contains "physics" sentences for every single
square
19
What we’d like is a way to talk about
objects and groups of objects, and to
define relationships between them
Enter…First-Order Logic
(aka “Predicate logic”)
Propositional vs. First-Order
Propositional logic
Facts: p, q, ¬r, ¬P1,1, ¬W1,1 etc.
(p ∧ q) ∨ (¬r ∨ q ∧ p)
First-order logic
Objects: George, Monkey2, Raj, 573Student1, etc.
Relations:
Curious(George), Curious(573Student1), …
Smarter(573Student1, Monkey2)
Smarter(Monkey2, Raj)
Stooges(Larry, Moe, Curly)
PokesInTheEyes(Moe, Curly)
PokesInTheEyes(573Student1, Raj)
21
FOL Definitions
Constants: George, Monkey2, etc.
• Name a specific object.
Variables: X, Y.
• Refer to an object without naming it.
Functions: banana-of, grade-of, etc.
• Mapping from objects to objects.
Terms: banana-of(George), grade-of(stdnt1)
• Logical expressions referring to objects
Relations (predicates): Curious, PokesInTheEyes, etc.
• Properties of/relationships between objects.
22
More Definitions
Logical connectives: and, or, not, ⇒, ⇔
Quantifiers:
• ∀ For all (Universal quantifier)
• ∃ There exists (Existential quantifier)
Examples
• George is a monkey and he is curious
Monkey(George) ∧ Curious(George)
• All monkeys are curious
∀m: Monkey(m) ⇒ Curious(m)
• There is a curious monkey
∃m: Monkey(m) ∧ Curious(m)
23
Quantifier / Connective
Interaction
M(x) == “x is a monkey”
C(x) == “x is curious”
∀x: M(x) ∧ C(x)
“Everything is a curious monkey”
∀x: M(x) ⇒ C(x)
“All monkeys are curious”
∃x: M(x) ∧ C(x)
“There exists a curious monkey”
∃x: M(x) ⇒ C(x)
“There exists an object that is either a curious
monkey, or not a monkey at all”
24
Nested Quantifiers:
Order matters!
∀x ∃y P(x,y) ≠ ∃y ∀x P(x,y)
Examples
∀m ∃t has(m, t): Every monkey has a tail
∃t ∀m has(m, t): Every monkey shares a tail!
Try:
∀x ∃y loves(x, y): Everybody loves somebody
∃y ∀x loves(x, y): Someone is loved by everyone
25
Semantics
Semantics = what the arrangement of symbols means in
the world
Propositional logic
• Basic elements are variables
(refer to facts about the world)
• Possible worlds: mappings from variables to T/F
First-order logic
• Basic elements are terms
(logical expressions that refer to objects)
• Interpretations: mappings from terms to real-world elements.
26
Example: A World of Kings and Legs
Syntactic elements:
• Constants: Richard, John
• Functions: LeftLeg(p)
• Relations: On(x, y), King(p)
27
Interpretation I
Interpretations map syntactic tokens to model elements
• Constants: Richard, John
• Functions: LeftLeg(p)
• Relations: On(x, y), King(p)
28
Interpretation II
• Constants: Richard, John
• Functions: LeftLeg(p)
• Relations: On(x, y), King(p)
29
How Many Interpretations?
Two constants (and 5 objects in world)
• Richard, John (R, J, crown, RL, JL)
5² = 25 object mappings
One unary relation
King(x)
Infinite number of values for x → infinite mappings
Even if we restricted x to: R, J, crown, RL, JL:
2⁵ = 32 unary truth mappings
Two binary relations
• Leg(x, y); On(x, y)
Infinite. But even restricting x, y to five objects
still yields 2²⁵ mappings for each binary relation
30
Satisfiability, Validity, &
Entailment
S is valid if it is true in all interpretations
31
Propositional Logic vs. First-Order
Ontology: Facts (P, Q, …) vs. Objects, Properties, Relations
33
Wumpus World: Squares
• Each square as an object:
Square1,1, Square1,2, …,
Square3,4, Square4,4
•Square topology relations?
Adjacent(Square1,1, Square2,1)
…
Adjacent(Square3,4, Square4,4)
Better: Squares as lists:
[1, 1], [1,2], …, [4, 4]
Square topology relations:
∀x, y, a, b: Adjacent([x, y], [a, b]) ⇔
[a, b] ∈ {[x+1, y], [x-1, y], [x, y+1], [x, y-1]}
34
Wumpus World: Pits
•Each pit as an object:
Pit1,1, Pit1,2, …,
Pit3,4, Pit4,4
• Problem?
Not all squares have pits
List only the pits we have?
Pit3,1, Pit3,3, Pit4,4
Problem?
No reason to distinguish pits (same properties)
Better: pit as unary predicate
Pit(x)
Pit([3,1]); Pit([3,3]); Pit([4,4]) will be true
35
Wumpus World: Breezes
36
Wumpus World: Wumpuses
• Wumpus as object:
Wumpus
37
FOL Reasoning: Outline
Basics of FOL reasoning
Classes of FOL reasoning methods
• Forward & Backward Chaining
• Resolution
• Compilation to SAT
38
Basics: Universal Instantiation
Universally quantified sentence:
• ∀x: Monkey(x) ⇒ Curious(x)
Intuitively, x can be anything:
• Monkey(George) ⇒ Curious(George)
• Monkey(573Student1) ⇒ Curious(573Student1)
• Monkey(DadOf(George)) ⇒ Curious(DadOf(George))
Formally:
∀x S
Subst({x/p}, S)
Example:
∀x Monkey(x) ⇒ Curious(x)
Monkey(George) ⇒ Curious(George)
39
Basics: Existential Instantiation
Existentially quantified sentence:
• ∃x: Monkey(x) ∧ ¬Curious(x)
Intuitively, x must name something. But what?
• Monkey(George) ∧ ¬Curious(George) ???
• No! S might not be true for George!
Formally:
∃x S
Subst({x/K}, S)
K is called a Skolem constant
40
Basics: Generalized Skolemization
What if our existential variable is nested?
• ∀x ∃y: Monkey(x) ⇒ HasTail(x, y)
Replace y with a “tail-of” Skolem function of x:
• ∀x: Monkey(x) ⇒ HasTail(x, tail-of(x))
41
Motivation for Unification
What if we want to use modus ponens?
Propositional Logic:
a ∧ b, a ∧ b ⇒ c
c
In First-Order Logic?
Monkey(x) ⇒ Curious(x)
Monkey(George)
????
Must “unify” x with George:
Need to substitute {x/George} in Monkey(x) ⇒ Curious(x) to
infer Curious(George)
42
What is Unification?
Examples:
• Unify(city(x), city(seattle)) returns {x/seattle}
• Unify(PokesInTheEyes(Moe,x), PokesInTheEyes(y,z))
returns {y/Moe,z/x}
– {y/Moe,x/Moe,z/Moe} possible but not MGU
44
Unification and Substitution
Unification produces a mapping from variables to
values (e.g., {x/kent,y/seattle})
Substitution: Subst(mapping,sentence) returns new
sentence with variables replaced by values
• Subst({x/kent, y/seattle}, connected(x, y))
returns connected(kent, seattle)
45
Unification Examples I
Unify(road(x, kent), road(seattle, y))
• Returns {x / seattle, y / kent}
• When substituted in both expressions, the
resulting expressions match:
• Each is (road(seattle, kent))
46
Unification Examples II
Unify(f(g(x, dog), y), f(g(cat, y), dog))
• {x / cat, y / dog}
Unify(f(g(x)), f(x))
• Fails: no substitution makes them identical.
• E.g. {x / g(x) } yields f(g(g(x))) and f(g(x))
which are not identical!
47
Unification Examples III
Unify(f(g(cat, y), y), f(x, dog))
• {x / g(cat, dog), y / dog}
Unify(f(g(y)), f(x))
• {x / g(y)}
49
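The examples above can be checked with a standard unification algorithm. This sketch is my own encoding (compound terms as tuples, variables as strings starting with `?`), and it includes the occurs check that makes Unify(f(g(x)), f(x)) fail:

```python
def is_var(t):
    return isinstance(t, str) and t.startswith('?')

def substitute(s, t):
    """Apply substitution s (a dict) to term t, resolving chains."""
    if is_var(t):
        return substitute(s, s[t]) if t in s else t
    if isinstance(t, tuple):
        return tuple(substitute(s, a) for a in t)
    return t

def occurs(v, t):
    """Does variable v occur anywhere inside term t?"""
    return t == v or (isinstance(t, tuple) and any(occurs(v, a) for a in t))

def unify(x, y, s=None):
    """Return a most general unifier of x and y, or None on failure."""
    if s is None:
        s = {}
    x, y = substitute(s, x), substitute(s, y)
    if x == y:
        return s
    if is_var(x):
        return None if occurs(x, y) else {**s, x: y}  # occurs check
    if is_var(y):
        return unify(y, x, s)
    if isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
        for xi, yi in zip(x, y):  # unify argument lists left to right
            s = unify(xi, yi, s)
            if s is None:
                return None
        return s
    return None  # distinct constants or mismatched structure

# Unify(f(g(x, dog), y), f(g(cat, y), dog)) -> {x/cat, y/dog}
print(unify(('f', ('g', '?x', 'dog'), '?y'), ('f', ('g', 'cat', '?y'), 'dog')))
# Unify(f(g(x)), f(x)) fails by the occurs check
print(unify(('f', ('g', '?x')), ('f', '?x')))
```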
Inference II: Backward
Chaining
The algorithm:
• Start with KB and goal.
• Find all rules whose results unify with goal:
Add the premises of these rules to the goal list
Remove the corresponding result from the goal list
• Stop when:
Goal list is empty (SUCCEED) or
Progress halts (FAIL)
50
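The goal-list algorithm above can be sketched in a simplified, ground (variable-free) form; a full first-order version would unify each goal against rule conclusions rather than test equality. Rule and fact names here are invented for illustration:

```python
def backward_chain(rules, facts, goal):
    """Prove goal by backward chaining over ground Horn rules.
    rules: list of (premises, conclusion) pairs; facts: set of known atoms."""
    if goal in facts:
        return True  # goal already known: it leaves the goal list
    for premises, conclusion in rules:
        if conclusion == goal:  # a rule whose result matches the goal
            # replace the goal with the rule's premises and recurse
            if all(backward_chain(rules, facts, p) for p in premises):
                return True
    return False  # progress halts: FAIL

rules = [(['Monkey'], 'Curious'), (['Curious', 'HasBanana'], 'Happy')]
print(backward_chain(rules, {'Monkey', 'HasBanana'}, 'Happy'))  # True
```

Note the sketch has no cycle detection, so mutually recursive rules would loop; real implementations track visited goals.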
Inference III: Resolution
[Robinson 1965]
{ (p ∨ q), (¬p ∨ r ∨ s) }  ⊢  (q ∨ r ∨ s)
51
First-Order Resolution
[Robinson 1965]
Substitute the MGU {x/B} in all literals to get the resolvent:
(q(A) ∨ r(B) ∨ s(y))
Method
• Let S = KB ∧ ¬goal
• Convert S to clausal form
– Standardize apart variables (change names if needed)
– Move quantifiers to front, skolemize to remove ∃
– Replace ⇒ with ∨ and ¬
– DeMorgan’s laws to get CNF (ands-of-ors)
• Resolve clauses in S until empty clause
(unsatisfiable) or no new clauses added
53
First-Order Resolution
Given
Example
• ∀x man(x) ⇒ human(x)
• ∀x woman(x) ⇒ human(x)
• ∀x singer(x) ⇒ man(x) ∨ woman(x)
• singer(M)
Prove
• human(M)
54
FOL Resolution Example
[¬m(x),h(x)] [¬w(y), h(y)] [¬s(z),m(z),w(z)] [s(M)] [¬h(M)]
[m(M),w(M)]
[w(M), h(M)]
[h(M)]
[]
55
Back To the Wumpus World
Recall description:
• Squares as lists: [1,1] [3,4] etc.
• Square adjacency as binary predicate.
• Pits, breezes, stenches as unary predicates:
Pit(x)
• Wumpus, gold, homes as functions:
Home(Wumpus)
56
Back To the Wumpus World
“Squares next to pits are breezy”:
∀x, y, a, b:
Pit([x, y]) ∧ Adjacent([x, y], [a, b]) ⇒
Breezy([a, b])
57
That’s nice but these algorithms
assume complete knowledge of the
world!
60
Leaving time before 6pm P(arrive-in-time)
20 min 0.05
30 min 0.25
45 min 0.50
60 min 0.75
120 min 0.98
1 day 0.99999
61
What Is Probability?
Probability: Calculus for dealing with nondeterminism and
uncertainty
62
Why Should You Care?
The world is full of uncertainty
• Logic is not enough
• Computers need to be able to handle uncertainty
63
Logic vs. Probability
Symbol: Q, R … Random variable: Q …
65
Axioms of Probability Theory
Just 3 are enough to build entire theory!
1. All probabilities between 0 and 1
0 ≤ P(A) ≤ 1
2. P(true) = 1 and P(false) = 0
3. Probability of disjunction of events is:
P(A ∨ B) = P(A) + P(B) − P(A ∧ B)
(Venn diagram: regions A and B overlapping in A ∧ B, inside the space of all worlds)
66
Prior and Joint Probability
(Venn diagram: regions A, A ∧ B, B, with an example probability of 0.2)
69
Dilemma at the Dentist’s
70
Probabilistic Inference by Enumeration
P(toothache) = ?
P(toothache) = .108 + .012 + .016 + .064 = .20, or 20%
71
Inference by Enumeration
P(toothache ∨ cavity) = ?
= .2 + (.108 + .012 + .072 + .008) − (.108 + .012)
= .28
72
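Both computations above can be reproduced from the full joint table. The sketch below uses the slide's numbers; the .144 and .576 entries complete the standard textbook table and are assumed, since the slide does not show them:

```python
# Full joint distribution for the dentist domain, P(cavity, toothache, catch)
joint = {
    (True,  True,  True):  .108, (True,  True,  False): .012,
    (True,  False, True):  .072, (True,  False, False): .008,
    (False, True,  True):  .016, (False, True,  False): .064,
    (False, False, True):  .144, (False, False, False): .576,
}
NAMES = ('cavity', 'toothache', 'catch')

def P(event):
    """Sum the joint entries consistent with event, e.g. {'toothache': True}."""
    return sum(p for world, p in joint.items()
               if all(world[NAMES.index(k)] == v for k, v in event.items()))

print(P({'toothache': True}))  # mathematically .108+.012+.016+.064 = 0.2
p_or = (P({'toothache': True}) + P({'cavity': True})
        - P({'toothache': True, 'cavity': True}))
print(p_or)  # mathematically .2 + .2 - .12 = 0.28
```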
Inference by Enumeration
73
Problems with Enumeration
Worst case time: O(dⁿ)
where d = max arity of random variables
e.g., d = 2 for Boolean (T/F)
and n = number of random variables
Space complexity also O(dⁿ)
• Size of joint distribution
Problem: Hard/impossible to estimate all O(dⁿ)
entries for large problems
74
Independence
A and B are independent iff:
P(A ∧ B) = P(A) P(B)
75
Independence
77
Conditional Independence II
P(catch | toothache, cavity) = P(catch | cavity)
P(catch | toothache,¬cavity) = P(catch |¬cavity)
78
Power of Cond. Independence
Often, using conditional independence reduces the
storage complexity of the joint distribution from
exponential to linear!!
79
Thomas Bayes
Reverend Thomas Bayes (1702-1761), Nonconformist minister
Publications:
• Divine Benevolence, or an Attempt to Prove That the Principal End of the Divine Providence and Government is the Happiness of His Creatures (1731)
• An Introduction to the Doctrine of Fluxions (1736)
• An Essay Towards Solving a Problem in the Doctrine of Chances (1764)
80
Recall: Conditional Probability
P(x | y) is the probability of x given y
Assumes that y is the only info known.
Defined as:
P(x | y) = P(x, y) / P(y)
P(y | x) = P(y, x) / P(x) = P(x, y) / P(x)
Therefore?
81
Bayes’ Rule
P(x, y) = P(x | y) P(y) = P(y | x) P(x)
⇒ P(x | y) = P(y | x) P(x) / P(y)
What is this useful for?
82
Bayes’ rule is used to Compute Diagnostic
Probability from Causal Probability
P(M|S) =
83
Normalization in Bayes’ Rule
P(x | y) = P(y | x) P(x) / P(y) = α P(y | x) P(x)
where α = 1/P(y) = 1 / Σₓ P(y, x) = 1 / Σₓ P(y | x) P(x)
84
Cond. Independence and the Naïve Bayes Model
85
Example 1: State Estimation
86
Causal vs. Diagnostic Reasoning
P(open|z) is diagnostic.
P(z|open) is causal.
Often causal knowledge is easier to obtain.
Bayes rule allows us to use causal knowledge:
count frequencies!
P(open | z) = P(z | open) P(open) / P(z)
87
State Estimation Example
P(open | z) = P(z | open) P(open) / (P(z | open) P(open) + P(z | ¬open) P(¬open))
= (0.6 · 0.5) / (0.6 · 0.5 + 0.3 · 0.5) = 2/3 ≈ 0.67
88
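Plugging the slide's numbers into Bayes' rule with normalization (the function name is mine) reproduces the result:

```python
def bayes(prior, p_z_given_h, p_z_given_not_h):
    """P(h | z) via Bayes' rule, normalizing over h and ¬h."""
    num = p_z_given_h * prior
    return num / (num + p_z_given_not_h * (1 - prior))

# P(open) = 0.5, P(z | open) = 0.6, P(z | ¬open) = 0.3
p_open = bayes(0.5, 0.6, 0.3)
print(round(p_open, 2))  # 0.67
```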
Combining Evidence
89
Recursive Bayesian Updating
P(x | z1, …, zn) = P(zn | x, z1, …, zn−1) P(x | z1, …, zn−1) / P(zn | z1, …, zn−1)

P(open | z2, z1) = P(z2 | open) P(open | z1) /
(P(z2 | open) P(open | z1) + P(z2 | ¬open) P(¬open | z1))
= (1/2 · 2/3) / (1/2 · 2/3 + 3/5 · 1/3) = 5/8 = 0.625
Yes!
92
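The same computation, folded into a recursive update that feeds each posterior back in as the next prior (function name mine), reproduces 0.625:

```python
def update(belief, p_z_true, p_z_false):
    """Fold one observation z into the running belief P(open | evidence)."""
    num = p_z_true * belief
    return num / (num + p_z_false * (1 - belief))

belief = 0.5                       # prior P(open)
belief = update(belief, 0.6, 0.3)  # after z1: 2/3
belief = update(belief, 0.5, 0.6)  # after z2: 5/8
print(round(belief, 3))  # 0.625
```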
Enter…Bayesian networks
93
What are Bayesian networks?
Simple, graphical notation for conditional independence
assertions
Syntax:
• a set of nodes, one per random variable
• a directed, acyclic graph (link ≈ "directly influences")
• a conditional distribution for each node given its parents:
P (Xi | Parents (Xi))
95
Example 2: Burglars and Earthquakes
You are at a “Done with the AI class” party.
Neighbor John calls to say your home alarm has gone
off (but neighbor Mary doesn't).
Sometimes your alarm is set off by minor earthquakes.
97
Compact Representation of Probabilities in
Bayesian Networks
99
Probabilistic Inference in BNs
The graphical independence representation yields
efficient inference schemes
We generally want to compute
• P(X|E) where E is evidence from sensory measurements etc.
(known values for variables)
100
P(B | J=true, M=true)
Earthquake Burglary
Alarm
John Mary
101
Computing P(B | J=true, M=true)
Earthquake Burglary
Alarm
John Mary
P(J) = ΣA P(J | A) f3(A) = f4(J)
105
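A sketch of exact inference by enumeration for this query. The CPT values (P(B) = .001, P(E) = .002, etc.) are the standard textbook numbers for this example and are assumed here, since the slide shows only the graph structure:

```python
from itertools import product

P_B, P_E = 0.001, 0.002
P_A = {(True, True): .95, (True, False): .94,
       (False, True): .29, (False, False): .001}  # P(Alarm | B, E)
P_J = {True: .90, False: .05}                     # P(JohnCalls | Alarm)
P_M = {True: .70, False: .01}                     # P(MaryCalls | Alarm)

def pr(p, outcome):
    """Probability of a Boolean outcome given P(outcome = True)."""
    return p if outcome else 1 - p

def joint(b, e, a, j, m):
    """Chain-rule factorization along the network structure."""
    return (pr(P_B, b) * pr(P_E, e) * pr(P_A[(b, e)], a)
            * pr(P_J[a], j) * pr(P_M[a], m))

# P(B | j, m): sum out the hidden variables E and A, then normalize
hidden = list(product([True, False], repeat=2))
num = sum(joint(True, e, a, True, True) for e, a in hidden)
den = num + sum(joint(False, e, a, True, True) for e, a in hidden)
posterior = num / den
print(round(posterior, 3))  # 0.284
```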
Other Inference Algorithms
Direct Sampling:
• Repeat N times:
– Use random number generator to generate sample values for each
node
– Start with nodes with no parents
– Condition on sampled parent values for other nodes
• Count frequencies of samples to get an approximation to
joint distribution
Other variants: Rejection sampling, likelihood weighting, Gibbs sampling
and other MCMC methods (see text)
107
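The direct-sampling loop above can be sketched on a tiny hypothetical two-node network (the CPT values are invented for illustration): sample the parentless node first, condition the child on the sampled parent value, then count frequencies.

```python
import random

# Hypothetical network: Cloudy -> Rain
P_CLOUDY = 0.5
P_RAIN = {True: 0.8, False: 0.2}   # P(Rain | Cloudy)

def sample():
    cloudy = random.random() < P_CLOUDY      # node with no parents first
    rain = random.random() < P_RAIN[cloudy]  # condition on the sampled parent
    return cloudy, rain

# Count frequencies to approximate P(Rain) = .5*.8 + .5*.2 = 0.5
N = 100_000
estimate = sum(sample()[1] for _ in range(N)) / N
print(estimate)
```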
Next Time
Guest lecture by Dieter Fox on
Applications of Probabilistic Reasoning
To Do: Work on homework #2
Bayes
rules!
108