Bayesian Reinforcement Learning in Continuous POMDPs with Application to Robot Navigation

Stéphane Ross¹, Brahim Chaib-draa², Joelle Pineau¹
¹ School of Computer Science, McGill University, Canada
² Department of Computer Science, Laval University, Canada
Outline: Motivation · POMDP · Bayesian RL · Experiments · Conclusion
Motivation
Continuous POMDP
States: S ⊆ ℝ^m
Actions: A ⊆ ℝ^n
Observations: Z ⊆ ℝ^p
Rewards: R(s, a) ∈ ℝ
Example
Transition model (move a distance v at heading θ; X = (X1, X2) is the motion noise, X ~ N(μ_X, Σ_X)):

    (x', y')ᵀ = (x, y)ᵀ + v R(θ) (X1, X2)ᵀ,   where R(θ) = [cos θ, −sin θ; sin θ, cos θ]

Observation model (noisy position reading; Y = (Y1, Y2) is the sensor noise, Y ~ N(μ_Y, Σ_Y)):

    (z_x, z_y)ᵀ = (x, y)ᵀ + (Y1, Y2)ᵀ
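The dynamics above can be sketched as a small simulator. The noise parameters below are illustrative placeholders; in the talk's setting, μ_X, Σ_X, μ_Y, Σ_Y are exactly the quantities the agent does not know:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (not from the talk) noise parameters: the agent must learn these.
mu_X, Sigma_X = np.array([1.0, 0.0]), 0.01 * np.eye(2)   # motion noise X
mu_Y, Sigma_Y = np.zeros(2), 0.05 * np.eye(2)            # sensor noise Y

def step(pos, v, theta):
    """One transition: move distance v at heading theta, perturbed by X."""
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    X = rng.multivariate_normal(mu_X, Sigma_X)
    return pos + v * R @ X

def observe(pos):
    """Noisy position reading: z = (x, y) + Y."""
    return pos + rng.multivariate_normal(mu_Y, Sigma_Y)

pos = np.zeros(2)
for _ in range(5):
    pos = step(pos, v=0.1, theta=np.pi / 4)
z = observe(pos)
```

Since the mean of X is roughly the unit forward vector, five steps of length 0.1 at heading π/4 drift toward (0.35, 0.35), plus noise.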
Problem
In practice: μ_X, Σ_X, μ_Y, Σ_Y are unknown.
Need to trade off between:
Learning the model
Identifying the state
Gathering rewards
[Diagram: belief-update cycle: current prior/posterior → action → observation → new posterior]
Belief Update
Particle filter:
Use particles of the form (s, φ, ψ)
φ, ψ: Normal-Wishart posterior parameters for X, Y
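A minimal sketch of the conjugate machinery inside one particle, assuming the standard Normal-Wishart parameterization (μ0, κ, ν, T); the class name and prior values here are illustrative, not from the talk:

```python
import numpy as np

class NormalWishart:
    """Conjugate posterior over the (mu, Sigma) of a Gaussian noise source.

    One such object plays the role of phi (for the motion noise X) or
    psi (for the sensor noise Y) inside a particle (s, phi, psi).
    """
    def __init__(self, mu0, kappa, nu, T):
        self.mu0, self.kappa, self.nu, self.T = mu0, kappa, nu, T

    def update(self, x):
        """Incorporate one observed noise sample x (standard conjugate update)."""
        d = x - self.mu0
        T = self.T + (self.kappa / (self.kappa + 1.0)) * np.outer(d, d)
        mu0 = (self.kappa * self.mu0 + x) / (self.kappa + 1.0)
        return NormalWishart(mu0, self.kappa + 1.0, self.nu + 1.0, T)

# A particle pairs a state hypothesis with posteriors over both noise models.
prior = NormalWishart(np.zeros(2), kappa=1.0, nu=4.0, T=np.eye(2))
particle = (np.zeros(2), prior, prior)        # (s, phi, psi)
s, phi, psi = particle
phi = phi.update(np.array([0.9, 0.1]))        # one inferred motion-noise sample
```

Because the update is conjugate, each particle carries only the four hyperparameters per noise source rather than a full distribution over models.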
Particle Filter
Online Planning
Monte Carlo online planning (receding-horizon control):

[Diagram: lookahead tree rooted at belief b0, branching on actions a1, a2, …, an, then on observations o1, o2, …, on, reaching successor beliefs b1, b2, b3, …]
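The lookahead tree above can be sketched as a depth-limited Monte Carlo search. Here `simulate` is an assumed helper that samples one (reward, next belief) pair, with the belief update done by the particle filter; all names and the toy usage are illustrative:

```python
def mc_plan(belief, actions, simulate, depth, n_samples=8, gamma=0.95):
    """Depth-limited Monte Carlo lookahead (receding-horizon sketch).

    Estimates Q(b, a) at each node by averaging sampled one-step
    outcomes plus the discounted value of the successor belief, then
    acts greedily at the root.
    """
    def value(b, d):
        if d == 0:
            return 0.0
        return max(q(b, a, d) for a in actions)

    def q(b, a, d):
        total = 0.0
        for _ in range(n_samples):
            r, b2 = simulate(b, a)
            total += r + gamma * value(b2, d - 1)
        return total / n_samples

    return max(actions, key=lambda a: q(belief, a, depth))

# Toy usage: the "belief" is a scalar position, actions shift it,
# and the reward favors reaching 0.
def simulate(b, a):
    b2 = b + a
    return -abs(b2), b2

best = mc_plan(5.0, actions=[-1.0, 0.0, 1.0], simulate=simulate, depth=2)
```

The cost is O((|A| · n_samples)^depth) simulator calls, which is why the talk lists more efficient planning algorithms as future work.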
[Figure: Average Return (0 to 1) vs. Training Steps (0 to 250), comparing the Learning agent against the Prior model and Exact Model baselines]
[Figure: WL1 vs. Training Steps (0 to 250)]
Conclusion
Future Work
What if gT, gO are unknown?
What if (μ, Σ) change over time?
More efficient planning algorithms.
Apply to a real robot.
Thank you! Questions?