5 - HJB
Contents

1 The Hamilton-Jacobi-Bellman Equation
  1.1 The optimal control problem
  1.2 Deriving the HJB equation
  1.3 Solution of the HJB equation by examples
  1.4 Exercises
1 The Hamilton-Jacobi-Bellman Equation

1.1 The optimal control problem

In the previous description of the dynamic programming method, we approximated continuous systems by discrete systems. That approach leads to a recurrence relation suited for digital implementation. In this chapter we shall consider another approach, for continuous systems, that leads to a nonlinear partial differential equation: the Hamilton-Jacobi-Bellman (HJB) equation. We shall show how the optimal performance measure, if it satisfies the Hamilton-Jacobi-Bellman equation, determines an optimal control.

Consider the following continuous optimal control problem. The process is described by the state equation

  ẋ(t) = f(x(t), u(t), t)    (1.1)

and the performance measure to be minimized is

  J = h(x(tf), tf) + ∫_{t0}^{tf} g(x(τ), u(τ), τ) dτ    (1.2)

where h is the terminal cost and g is the running cost.
1.2 Deriving the HJB equation

The HJB equation will be obtained using the principle of optimality and the results obtained for dynamic programming. The performance measure of the process starting from state x(t) at time t is

  J(x(t), t, u) = h(x(tf), tf) + ∫_t^{tf} g(x(τ), u(τ), τ) dτ    (1.3)
where t can be any value less than or equal to tf and x(t) can be any admissible state value. Notice that the performance measure will depend on
the numerical values for x(t) and t and on the optimal control history in the
interval [t, tf] (Kirk, 2004).
Let us now attempt to determine the controls that minimize (1.3) for all admissible x(t) and all t ≤ tf. The minimum cost function is then:

  J*(x(t), t) = min_u { h(x(tf), tf) + ∫_t^{tf} g(x(τ), u(τ), τ) dτ }    (1.4)
Splitting the integral into two parts:

  J*(x(t), t) = min_u { ∫_t^{t+Δt} g(x, u, τ) dτ + ∫_{t+Δt}^{tf} g(x, u, τ) dτ + h(x(tf), tf) }    (1.5)

The principle of optimality requires that the control over the interval [t + Δt, tf] be optimal for the process starting at x(t + Δt), so

  J*(x(t), t) = min_u { ∫_t^{t+Δt} g(x, u, τ) dτ + J*(x(t + Δt), t + Δt) }    (1.6)

where J*(x(t + Δt), t + Δt) is the minimum cost of the process for the time interval t + Δt ≤ τ ≤ tf with initial state x(t + Δt).
Assuming J* has bounded second derivatives in both arguments, we can expand J*(x(t + Δt), t + Δt) in a Taylor series about the point (x(t), t) (truncated after the first-order terms) to obtain:

  J*(x(t), t) = min_u { ∫_t^{t+Δt} g(x, u, τ) dτ + J*(x(t), t) + [∂J*/∂t (x(t), t)] Δt + [∂J*/∂x (x(t), t)]^T [x(t + Δt) − x(t)] }    (1.7)

For small Δt,

  x(t + Δt) − x(t) ≈ ẋ(t) Δt = f(x, u, t) Δt  and  ∫_t^{t+Δt} g(x, u, τ) dτ ≈ g(x, u, t) Δt.

Substituting these approximations and canceling J*(x(t), t) from both sides gives

  0 = min_u { g(x, u, t) Δt + [∂J*/∂t (x(t), t)] Δt + [∂J*/∂x (x(t), t)]^T f(x, u, t) Δt }    (1.8)
Dividing by Δt and taking ∂J*/∂t out of the minimization (it does not depend on u), we obtain

  0 = ∂J*/∂t (x(t), t) + min_u { g(x, u, t) + [∂J*/∂x (x(t), t)]^T f(x, u, t) }    (1.9)

Defining the Hamiltonian

  H(x(t), u(t), ∂J*/∂x, t) = g(x(t), u(t), t) + [∂J*/∂x (x(t), t)]^T f(x(t), u(t), t)    (1.10)

the HJB equation becomes

  0 = ∂J*/∂t (x(t), t) + min_u H(x(t), u(t), ∂J*/∂x, t)    (1.11)

To find the boundary value for this partial differential equation set t = tf; from (1.4) we have:

  J*(x(tf), tf) = h(x(tf), tf)    (1.12)
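Equation (1.6) is exactly a backward dynamic-programming recursion, and it can be checked numerically on a grid. The sketch below uses a toy problem of our own choosing (not from the text): ẋ = u, g = x² + u², h = 0, for which the HJB equation has the closed-form solution J*(x, t) = tanh(T − t) x². All grid sizes are illustrative.

```python
import numpy as np

# Toy problem (our own illustration, not from the text):
#   dx/dt = u,  g = x^2 + u^2,  h = 0,  horizon T.
# The HJB equation gives J*(x, t) = tanh(T - t) x^2 in closed form.
T, nt = 1.0, 400
dt = T / nt
xs = np.linspace(-3.0, 3.0, 241)        # state grid
us = np.linspace(-4.0, 4.0, 161)        # control grid

J = np.zeros_like(xs)                   # boundary condition: J*(x, T) = h(x) = 0
for _ in range(nt):
    # one backward step of (1.6): min over u of (running cost + cost-to-go)
    xnext = xs[:, None] + us[None, :] * dt        # x(t + dt) for every (x, u)
    Jnext = np.interp(xnext, xs, J)               # interpolate J*(x(t+dt), t+dt)
    J = ((xs[:, None] ** 2 + us[None, :] ** 2) * dt + Jnext).min(axis=1)

i = int(np.argmin(np.abs(xs - 1.0)))
print(J[i], np.tanh(T))                 # both close to tanh(1), about 0.76
```

The agreement improves as the grids are refined, which is precisely the discrete-to-continuous passage used in the derivation above.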
1.3 Solution of the HJB equation by examples

Example 1.1 Consider the first-order system

  ẋ(t) = x(t) + u(t)

It is desired to find the control law that minimizes the cost function

  J = ¼ x²(T) + ∫_0^T ¼ u²(t) dt

The final time T is specified and the admissible state and control values are not constrained by any boundaries.
In this problem

  g(x(t), u(t), t) = ¼ u²(t),   f(x(t), u(t), t) = x(t) + u(t),   h(x(T), T) = ¼ x²(T)

The Hamiltonian is:

  H(x(t), u(t), ∂J*/∂x, t) = ¼ u²(t) + ∂J*/∂x (x(t) + u(t))

Minimizing with respect to u (∂H/∂u = ½ u + ∂J*/∂x = 0) gives

  u*(t) = −2 ∂J*/∂x
which, when substituted into the HJB equation

  0 = ∂J*/∂t + min_u H

gives:

  0 = ∂J*/∂t + ¼ (−2 ∂J*/∂x)² + ∂J*/∂x x(t) − 2 (∂J*/∂x)²    (1.13)

  0 = ∂J*/∂t − (∂J*/∂x)² + ∂J*/∂x x(t)    (1.14)
One way to solve the HJB equation is to guess a form for the solution and see if it can be made to satisfy the differential equation and the boundary conditions. Since J*(x(T), T) is quadratic in x(T), guess

  J*(x(t), t) = ½ p(t) x²(t)    (1.15)

Then

  ∂J*/∂t = ½ ṗ(t) x²(t)    (1.16)

  ∂J*/∂x = p(t) x(t)    (1.17)

Substituting into (1.14) gives 0 = ½ ṗ(t) x²(t) − p²(t) x²(t) + p(t) x²(t), which must hold for all x(t), so

  ṗ(t) = 2p²(t) − 2p(t)    (1.18)

with the final condition p(T) = 1/2. p(t) is a scalar function of t; therefore the solution can be obtained using the transformation z(t) = 1/p(t), with the result:

  p(t) = 1 / (1 + e^{2(t−T)})    (1.19)
Obs. The solution of (1.18) is obtained as follows. Let

  z(t) = 1/p(t),   z(T) = 1/p(T) = 2

Then ṗ = −ż/z², and (1.18) becomes −ż/z² = 2/z² − 2/z, i.e.

  ż − 2z + 2 = 0

whose solution with z(T) = 2 is z(t) = e^{2(t−T)} + 1, so that

  p(t) = 1 / (e^{2(t−T)} + 1)
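These closed-form results are easy to verify numerically: p(t) should satisfy (1.18) with p(T) = 1/2, and J* = ½ p(t) x² should satisfy (1.14) for every x. A small sketch, taking T = 1 for concreteness:

```python
import numpy as np

T = 1.0                                      # horizon assumed for the check
p = lambda t: 1.0 / (1.0 + np.exp(2.0 * (t - T)))

t = np.linspace(0.0, T, 401)
h = 1e-6
pdot = (p(t + h) - p(t - h)) / (2.0 * h)     # central-difference derivative

# p solves (1.18): pdot = 2 p^2 - 2 p, with final condition p(T) = 1/2
assert np.allclose(pdot, 2.0 * p(t) ** 2 - 2.0 * p(t), atol=1e-6)
assert abs(p(T) - 0.5) < 1e-12

# and J* = (1/2) p(t) x^2 satisfies the HJB equation (1.14) for any x
for x in (-2.0, 0.5, 3.0):
    Jt = 0.5 * pdot * x ** 2
    Jx = p(t) * x
    assert np.allclose(Jt - Jx ** 2 + Jx * x, 0.0, atol=1e-5)
```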
The optimal control is then

  u*(t) = −2 ∂J*/∂x = −2 p(t) x(t) = −2 x(t) / (1 + e^{2(t−T)})
The controller is a time-varying one and the block diagram is shown in Figure
1.1.
[Figure 1.1: Optimal control for example 1.1. Block diagram: the process output x(t) is fed back through the time-varying gain −2/(1 + e^{2(t−T)}) to form u*(t), which drives the process.]
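As a further check, simulating the closed loop ẋ = x + u* with u*(t) = −2 x(t)/(1 + e^{2(t−T)}) and accumulating the cost should reproduce the predicted optimal cost J*(x(0), 0) = ½ p(0) x²(0). A minimal sketch, assuming T = 1 and x(0) = 1:

```python
import numpy as np

T, x0 = 1.0, 1.0                       # horizon and initial state (assumed)
p = lambda t: 1.0 / (1.0 + np.exp(2.0 * (t - T)))

n = 20000                              # Euler steps (illustrative resolution)
dt = T / n
x, cost = x0, 0.0
for k in range(n):
    u = -2.0 * p(k * dt) * x           # optimal feedback u*(t) = -2 p(t) x(t)
    cost += 0.25 * u * u * dt          # running cost (1/4) u^2
    x += (x + u) * dt                  # plant dx/dt = x + u
cost += 0.25 * x * x                   # terminal cost (1/4) x^2(T)

print(cost, 0.5 * p(0.0) * x0 ** 2)    # the two numbers should agree closely
```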
Notice that as T → ∞, the linear time-varying feedback approaches the constant feedback p(t) = 1, and the controlled system becomes ẋ(t) = x(t) − 2x(t) = −x(t), which is asymptotically stable.

Example 1.2 A person's wealth x(t) evolves according to

  ẋ(t) = αx(t) − u(t)

where α > 0 and u(t) is his rate of expenditure. He wishes to maximize

  J = ∫_0^T e^{−at} √(u(t)) dt

so the value function is

  J*(x(t), t) = max_u ∫_t^T e^{−aτ} √(u(τ)) dτ
Let a = 1 and α = 4. We solve this problem using the Hamilton-Jacobi-Bellman equation:

  0 = ∂J*/∂t + max_u H,  i.e.  Jt + H* = 0

where Jt and Jx denote ∂J*/∂t and ∂J*/∂x, and H* denotes the maximized Hamiltonian. Then:

  f(x, u, t) = 4x(t) − u(t),   g(x, u, t) = e^{−t} √(u(t))

and the Hamiltonian is:

  H = g(x, u, t) + Jx f(x, u, t) = e^{−t} √u + Jx (4x − u)
Maximizing with respect to u:

  ∂H/∂u = ½ e^{−t} u^{−1/2} − Jx = 0  ⟹  u* = ¼ e^{−2t} Jx^{−2} = e^{−2t} / (4 Jx²)

and the maximized Hamiltonian is

  H* = e^{−2t}/(2Jx) + Jx (4x − e^{−2t}/(4Jx²)) = e^{−2t}/(4Jx) + 4xJx = (e^{−2t} + 16xJx²) / (4Jx)
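The maximization step is easy to spot-check numerically: for fixed t, x and a positive value of Jx, the maximum of H = e^{−t}√u + Jx(4x − u) over a fine grid of u should agree with the closed-form H*. The sampling ranges below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
for _ in range(5):
    t = rng.uniform(0.0, 1.0)
    x = rng.uniform(0.5, 2.0)
    Jx = rng.uniform(0.2, 1.0)               # dJ*/dx is positive here
    u = np.linspace(1e-8, 50.0, 200_001)     # fine grid of admissible u
    H = np.exp(-t) * np.sqrt(u) + Jx * (4.0 * x - u)
    Hstar = np.exp(-2.0 * t) / (4.0 * Jx) + 4.0 * x * Jx
    assert abs(H.max() - Hstar) < 1e-5       # grid maximum matches H*
```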
Guess a solution of the form

  J*(x(t), t) = p(t) √(x(t))

Then

  Jt = ṗ(t) √(x(t)),   Jx = p(t) / (2 √(x(t)))
Substituting into Jt + H* = 0, and noting that e^{−2t}/(4Jx) = e^{−2t} √x/(2p) and 4xJx = 2p √x, gives

  ṗ √x + e^{−2t} √x / (2p) + 2p √x = 0

It has to be satisfied for any x(t); thus we can obtain p(t) from:

  ṗ + (e^{−2t} + 4p²) / (2p) = 0
With the substitution q(t) = p²(t), this becomes the linear equation q̇ + 4q + e^{−2t} = 0, with solution

  p(t) = ½ √(4Ce^{−4t} − 2e^{−2t})

The constant C is calculated from the final condition p(T) = 0 (there is no terminal reward) and has the value:

  C = ½ e^{2T}

so that

  p(t) = ½ √(2e^{2T−4t} − 2e^{−2t})
The optimal control is then

  u*(t) = e^{−2t} / (4Jx²) = 4x(t) e^{−2t} / (2e^{2T−4t} − 2e^{−2t}) = 2x(t) / (e^{2(T−t)} − 1)
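As with Example 1.1, the closed-form p(t) can be verified numerically against its differential equation and final condition. A sketch, assuming T = 1:

```python
import numpy as np

T = 1.0                                      # horizon assumed for the check
p = lambda t: 0.5 * np.sqrt(2.0 * np.exp(2.0 * T - 4.0 * t)
                            - 2.0 * np.exp(-2.0 * t))

t = np.linspace(0.0, 0.95 * T, 400)          # stay clear of p(T) = 0
h = 1e-6
pdot = (p(t + h) - p(t - h)) / (2.0 * h)     # central-difference derivative
residual = pdot + (np.exp(-2.0 * t) + 4.0 * p(t) ** 2) / (2.0 * p(t))

assert np.max(np.abs(residual)) < 1e-5       # ODE satisfied along [0, 0.95 T]
assert abs(p(T)) < 1e-12                     # final condition p(T) = 0
```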
1.4 Exercises

Exercise 1.1 Consider a first-order system described by the state equation

  ẋ(t) = ax(t) + bu(t)

with the associated cost index:

  J = ½ f x²(tf) + ½ ∫_{t0}^{tf} u²(t) dt

where the initial time t0 = 0, the final time tf < ∞ is fixed, and the final state x(tf) is free. Find a control u*(t) that minimizes J.
Exercise 1.2 Consider a first-order system described by the state equation

  ẋ(t) = x(t) + u(t)

Determine the optimal control u*(t) that minimizes the performance measure:

  J = ½ x²(1) + ∫_0^1 u²(t) dt
Exercise 1.3 Consider a first-order system described by the state equation

  ẋ(t) = x(t) − u(t)

Determine the optimal control u*(t) that minimizes the performance measure:

  J = x²(1) + ∫_0^1 (x(t) − u(t))² dt
Bibliography

Kirk, D. E. (2004). Optimal Control Theory: An Introduction. Dover Publications, Mineola, NY.