Data-conforming data-driven control: avoiding premature generalizations beyond data

Mohammad S. Ramadan    \IEEEmembershipMember, IEEE     Evan Toler     Mihai Anitescu    \IEEEmembershipMember, IEEE The authors are with the Mathematics and Computer Science Division, Argonne National Laboratory, Lemont, IL 60439, USA, [email protected], [email protected], [email protected].
Abstract

Data-driven and adaptive control approaches face the problem of introducing sudden distributional shifts beyond the distribution of data encountered during learning. Therefore, they are prone to invalidating the very assumptions used in their own construction. This is due to the linearity of the underlying system, inherently assumed and formulated in most data-driven control approaches, which may falsely generalize the behavior of the system beyond the behavior experienced in the data. This paper seeks to mitigate these problems by enforcing consistency of the newly designed closed-loop systems with data and slow down any distributional shifts in the joint state-input space. This is achieved through incorporating affine regularization terms and linear matrix inequality constraints to data-driven approaches, resulting in convex semi-definite programs that can be efficiently solved by standard software packages. We discuss the optimality conditions of these programs and then conclude the paper with a numerical example that further highlights the problem of premature generalization beyond data and shows the effectiveness of our proposed approaches in enhancing the safety of data-driven control methods.

{IEEEkeywords}

Data-driven control, adaptive control, system identification, robust control, offline reinforcement learning.

1 Introduction

The development of adaptive control systems, while marked by significant theoretical advancements, has historically faced skepticism from practitioners, rooted in concerns over the reliability and robustness of such systems in real-world applications. Brian Anderson, in his article [4], explains reasons for this distrust and the inherent dangers of adaptive control algorithms. In this paper we highlight another fundamental vulnerability in adaptive learning and data-driven control algorithms: premature and often false generalization beyond the seen data. That is, hastily generalizing the behavior of the system beyond its behavior that was observed in the data. This is pervasive in modern data-driven control methods and can lead to catastrophic results in real-world applications. We propose practical methods for overcoming this vulnerability in a computationally efficient manner and using standard off-the-shelf software packages.

Data has become a cornerstone across every scientific domain, and it is foundational for the advancements in artificial intelligence [23]. In control theory, the reliance on data in control design is not new; and the subfields of system identification [26], robust control [17], and adaptive control [6], pioneered decades ago, have long been central to the field of control. The historical insights these subfields have generated are still crucial to the development of modern data-driven control and reinforcement learning (RL) algorithms. Take, for example, the “exciting” input for the stability of adaptive control [5, 27], which is analogous to the role of stochastic policies in RL algorithms [35], or the problem of the chaos phenomenon resulting from learning and control loops operating in comparable time scales, discovered in adaptive control [28] and later in RL [39]. These insights have also been crucial to the development of dual control, and its automatic experiment design aspect [32, 21], which have their connections to the exploration vs exploitation trade-off in RL [36].

The aforementioned insights, particularly the difficulties they are coupled with, hindered the spread of adaptive control algorithms in real-world applications, and instead, practitioners and control theorists alike found refuge in the more established and guaranteed robust control algorithms [34]. The importance of adaptive control cannot be excluded, however; and the recent surge in the application of data-driven control and RL algorithms (in power systems, for instance [18, 41] ) and RL’s role in the development of large language models [7] revive this importance. Thus, we are motivated to further understand the problems of adaptive control and expand their remedies.

In this paper we highlight an extra vulnerability of adaptive and data-driven control, not addressed explicitly by Anderson in [4]. This vulnerability is the premature and possibly false generalizations beyond data, resulting from extrapolating the behavior of the system beyond what was experienced in the data. This in turn can lead to compromising safety and performance under the existence of unmodeled nonlinearities in the true underlying system.

Modern data-driven control methods, whether direct (model-free) [13] or indirect (model-based) [40], mostly assume the linearity of the data-generating dynamics (although a few approaches extend direct methods to very limited forms of nonlinearities [15]). The problem with the linearity assumption is that it imposes the universality of the data, in the sense that the underlying dynamics must “behave similarly” (according to the same linear dynamics), beyond data and in any region of the state space. This assumption is often invalid, however, because various systems admit nonlinear effects beyond their engineered operating conditions and typical control design does not account for this situation.

We emphasize that in a system identification and data collection experiment, an identified system (or recorded data in the direct approaches case) is not necessarily valid under different experimental and data collection conditions. For example, a controller designed to be stabilizing/optimal with respect to the identified model (or recorded data) is guaranteed to be stabilizing/optimal only with respect to this specific identified model. However, eventually this controller will be connected to the actual data-generating system. This connection likely will change the operating conditions of the system and therefore may invalidate the exact assumptions this controller was based on. This can possibly activate unmodeled nonlinearities that can result in instability or severely degraded performance. Therefore, even with batch learning and iterative identification in a much smaller time scale than control [2], a change in the controller may still result in a rapid shift in the system’s operating conditions, compromising safety and performance.

The methods we propose in this paper enforce the consistency between the designed closed-loop system and the data encountered during learning. We achieve this consistency by augmenting the standard linear quadratic regulator (LQR) problem with affine regularization terms and linear matrix inequalities (LMIs) in the corresponding decision variables. The result is an affine semi-definite program (SDP) with LMI constraints that can be solved, even for high-dimensional systems, efficiently and using off-the-shelf software packages. Furthermore, we introduce a single hyperparameter that enforces the consistency level; in other words, this hyperparameter is an exploration vs exploitation balance factor. Our method allows for an iterative identification (data collection) and control that prevents rapid distributional shifts and the related sudden violations of data consistency.

In mathematical terms, this augmentation of the standard LQR problem includes working with the parametrization given by the controllability-type Gramian [25], which is also the steady-state covariance matrix of the system state. This makes it possible to relate the closed-loop state-input distribution to that of the data, using LMI constraints and/or regularization terms such as the Frobenius norm on covariance matrices or the Kullback–Liebler divergence between distributions. Throughout the paper we call a control design approach data-conforming if it enforces the consistency with data according to our aforementioned methods.

Inspired by the problems addressed in [4], our problem is closest in spirit to the so-called Rohrs’ counterexample [33], which sought to invalidate adaptive control algorithms in the 1980s. Part of Rohrs’ argument is that every physical system has unmodeled (parasitic) high-frequency dynamics. Because contemporary adaptive control led to highly nonlinear loops even for simple linear systems, these loops could internally generate high-frequency signals, exciting the unmodeled dynamics and possibly leading to instability. Our concern regarding modern data-driven control is analogous to that of Rohrs’ with adaptive control, but we target unmodeled nonlinear dynamics instead of unmodeled high-frequency ones. Through iterative identification and carefully expanding the closed-loop bandwidth, one can avoid exciting high-frequency dynamics [3]. Analogously, through iterative identification and careful data-conforming control design, one can avoid sudden exposures to new areas of the state space, which potentially contain harmful, unmodeled nonlinearities.

Our data-conforming control design approach also bears resemblance to the unfalsified control approach [12] in enforcing consistency with past data. Contrasting with this approach, which categorizes consistency through Boolean values, our approach is built around consistency in distances between distributions, allowing for generalized formulations that can cope with stochastic dynamics. Moreover, the method of [12] relies on model reference adaptive control, whereas our approach augments modern optimal control design via extending the standard LQR problem with affine regularization terms and LMI constraints. Our approach readily integrates with many modern control design formulations, allows for multivariate state-space systems, and can be extended to handle receding-horizon and input-output formulations.

The idea of consistency with data is also fundamental in the field of offline RL [1], but there the main motivation is data efficiency by initially limiting exploration. Offline RL algorithms are generally complex [20] and rely on stochastic nonlinear programming methods. Instead, our approach insists on computationally efficient methods that blend with modern control design techniques.

One can argue that modern data-driven policy gradient methods [42, 19] with small enough step sizes can prevent sudden distributional shifts and therefore enhance consistency with data. This is indeed a plausible argument. However, it is difficult to relate between the control gain step size and the corresponding distributional shift beyond a simple (ϵδitalic-ϵ𝛿\epsilon-\deltaitalic_ϵ - italic_δ) continuity argument. Instead, our approach deals explicitly with distributions and directly dampens their shifts. Moreover, a very dampened policy gradient procedure may be slow and hence inadequate for many time-varying dynamics, limiting its applicability to constant or pseudo-constant dynamics. On the other hand, our approach decouples distributional shifts from learning system variations and therefore can accomplish both tasks effectively and simultaneously.

We conclude the paper with a simple yet telling numerical example that explains the premature generalization problem inherent in modern data-driven approaches and shows how our proposed solutions can mitigate this problem.

2 Problem Formulation

Consider the dynamic system

xk+1subscript𝑥𝑘1\displaystyle x_{k+1}italic_x start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT =f(xk,uk,wk),absent𝑓subscript𝑥𝑘subscript𝑢𝑘subscript𝑤𝑘\displaystyle=f(x_{k},u_{k},w_{k}),= italic_f ( italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , italic_u start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) , (1)

where xkrxsubscript𝑥𝑘superscriptsubscript𝑟𝑥x_{k}\in\mathbb{R}^{r_{x}}italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_r start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT end_POSTSUPERSCRIPT is the state and ukrusubscript𝑢𝑘superscriptsubscript𝑟𝑢u_{k}\in\mathbb{R}^{r_{u}}italic_u start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_r start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT end_POSTSUPERSCRIPT is the control input. The function f𝑓fitalic_f is bounded but unknown and possibly nonlinear. The exogenous disturbance wkrxsubscript𝑤𝑘superscriptsubscript𝑟𝑥w_{k}\in\mathbb{R}^{r_{x}}italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_r start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT end_POSTSUPERSCRIPT is independent and identically distributed. It is also independent from x0subscript𝑥0x_{0}italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT, the initial condition, which is also random and has finite mean and covariance.

The goal of this paper is to design a control law that minimizes the cost function

J=limk𝔼{xkQxk+ukRuk},𝐽subscript𝑘𝔼superscriptsubscript𝑥𝑘top𝑄subscript𝑥𝑘superscriptsubscript𝑢𝑘top𝑅subscript𝑢𝑘\displaystyle J=\lim_{k\to\infty}\mathbb{E\,}\left\{x_{k}^{\top}Qx_{k}+u_{k}^{% \top}Ru_{k}\right\},italic_J = roman_lim start_POSTSUBSCRIPT italic_k → ∞ end_POSTSUBSCRIPT blackboard_E { italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_Q italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT + italic_u start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_R italic_u start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT } , (2)

where Q0succeeds-or-equals𝑄0Q\succeq 0italic_Q ⪰ 0 and R0succeeds𝑅0R\succ 0italic_R ≻ 0 are positive semi-definite and positive definite, respectively111In this paper all positive semi-definite and positive definite matrices are also symmetric., and the expectation averages over all the possible realizations of x0subscript𝑥0x_{0}italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT and wksubscript𝑤𝑘w_{k}italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT.

Since f𝑓fitalic_f is unknown, we assume that we have access to it only through experiments. Hence, we rely on the data-driven paradigm for the control design. During the first data collection experiment, we assume an initial control law of the form

uk=K0xk+vk,subscript𝑢𝑘subscript𝐾0subscript𝑥𝑘subscript𝑣𝑘\displaystyle u_{k}=K_{0}x_{k}+v_{k},italic_u start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = italic_K start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT + italic_v start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , (3)

where this law is locally stabilizing (not necessarily optimal) and vksubscript𝑣𝑘v_{k}italic_v start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT is a persistently exciting (PE) signal, in the following sense.

Assumption 1.

(The PE assumption [14, condition (6)]). The signal vksubscript𝑣𝑘v_{k}italic_v start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT in (3) is PE for (1). That is, given a natural number222This lower bound is the minimum number of measurements required to identify a linear model of the system [40]. N(ru+1)rx+ru𝑁subscript𝑟𝑢1subscript𝑟𝑥subscript𝑟𝑢N\geq(r_{u}+1)r_{x}+r_{u}italic_N ≥ ( italic_r start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT + 1 ) italic_r start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT + italic_r start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT, every control realization {uk}k=0Nsuperscriptsubscriptsubscript𝑢𝑘𝑘0𝑁\{u_{k}\}_{k=0}^{N}{ italic_u start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT } start_POSTSUBSCRIPT italic_k = 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT and the corresponding state realization {xk}k=0Nsuperscriptsubscriptsubscript𝑥𝑘𝑘0𝑁\{x_{k}\}_{k=0}^{N}{ italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT } start_POSTSUBSCRIPT italic_k = 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT produce the matrix

D:=[XU]=[x0xNu0uN]assign𝐷matrix𝑋𝑈matrixsubscript𝑥0subscript𝑥𝑁subscript𝑢0subscript𝑢𝑁{D}:=\begin{bmatrix}X\\ U\end{bmatrix}=\begin{bmatrix}x_{0}&\ldots&x_{N}\\ u_{0}&\ldots&u_{N}\end{bmatrix}italic_D := [ start_ARG start_ROW start_CELL italic_X end_CELL end_ROW start_ROW start_CELL italic_U end_CELL end_ROW end_ARG ] = [ start_ARG start_ROW start_CELL italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT end_CELL start_CELL … end_CELL start_CELL italic_x start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_u start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT end_CELL start_CELL … end_CELL start_CELL italic_u start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] (4)

with full row rank rx+rusubscript𝑟𝑥subscript𝑟𝑢r_{x}+r_{u}italic_r start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT + italic_r start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT.

The PE assumption is typical in system identification and data-driven control design and equates to an effective experiment design and a minimal descriptive state definition.

Problem statement

Starting from the initial control law of the form (3) and given access to the resulting input and state data (4), with the possibility of iteratively changing the control law and collecting new data, design a linear state feedback gain K𝐾Kitalic_K that minimizes the cost J𝐽Jitalic_J in (2).

Assumption 2.

The data (4) is centered around zero, and the state-input joint distribution can be approximated by a multivariate Gaussian with a positive definite covariance.

This assumption is true if the underlying system is linear Gaussian. For nonlinear systems, however, this point can be achieved, for instance, if the data was collected locally within a specified region of the state-input space under some operational limits. In other words, this assumption is satisfied if there is a “good enough” linear approximation of the underlying system within each operating conditions (4).

One may ask about the purpose of considering a general nonlinear system in (1) if the data can be predicted by an approximate linear model. The reason is to remember that once a new controller is designed, the resulting closed-loop system may operate in regions of the state-input space not adequately explored by the learning data and over which this approximate linear model is no longer a valid approximation. This point can be easily forgotten if we start with a linear system in (1).

3 Background

In this section we start by presenting the standard LQR problem. Although the problem formulation targets the nonlinear dynamics (1), our derivations in the subsequent sections assume the existence of a valid linear approximation under each operating condition.

3.1 Standard LQR

Suppose, in this subsection only, that the state-space system (1) is linear,

xk+1=f(xk,uk,wk)=Axk+Buk+wk,subscript𝑥𝑘1𝑓subscript𝑥𝑘subscript𝑢𝑘subscript𝑤𝑘𝐴subscript𝑥𝑘𝐵subscript𝑢𝑘subscript𝑤𝑘\displaystyle x_{k+1}=f(x_{k},u_{k},w_{k})=Ax_{k}+Bu_{k}+w_{k},italic_x start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT = italic_f ( italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , italic_u start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) = italic_A italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT + italic_B italic_u start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT + italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ,

where x0,wksubscript𝑥0subscript𝑤𝑘x_{0},w_{k}italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT are random variables and their statistics are as in (1), the disturbance wksubscript𝑤𝑘w_{k}italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT has zero mean and a known (in this subsection only, it will be approximated from data in the later subsections) covariance W0succeeds-or-equals𝑊0W\succeq 0italic_W ⪰ 0, and the pair (A,B)𝐴𝐵(A,B)( italic_A , italic_B ) is known and controllable [25]. The LQR problem is obtaining a controller of the form uk=Kxk+vksubscript𝑢𝑘𝐾subscript𝑥𝑘subscript𝑣𝑘u_{k}=Kx_{k}+v_{k}italic_u start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = italic_K italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT + italic_v start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT, where vksubscript𝑣𝑘v_{k}italic_v start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT is a white noise of zero mean and a fixed user-defined covariance V0succeeds𝑉0V\succ 0italic_V ≻ 0, to minimize (2). The signal vksubscript𝑣𝑘v_{k}italic_v start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT is to preserve Assumption 1 for future or iterative control design.

The cost function (2), up to an additive constant in K𝐾Kitalic_K, can be rewritten as (for a detailed derivation, check Appendix A.1)

J=tr(P[W+BVB]),𝐽tr𝑃delimited-[]𝑊𝐵𝑉superscript𝐵top\displaystyle J=\textit{tr}\left(P\left[W+BVB^{\top}\right]\right),italic_J = tr ( italic_P [ italic_W + italic_B italic_V italic_B start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ] ) , (5)

where P𝑃Pitalic_P is the observability-type Gramian [25], given by

P=k=0[A+BK]k[Q+KRK][A+BK]k,𝑃superscriptsubscript𝑘0superscriptdelimited-[]𝐴𝐵𝐾limit-from𝑘topdelimited-[]𝑄superscript𝐾top𝑅𝐾superscriptdelimited-[]𝐴𝐵𝐾𝑘\displaystyle P=\sum_{k=0}^{\infty}[A+BK]^{k\,\top}\left[Q+K^{\top}RK\right][A% +BK]^{k},italic_P = ∑ start_POSTSUBSCRIPT italic_k = 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∞ end_POSTSUPERSCRIPT [ italic_A + italic_B italic_K ] start_POSTSUPERSCRIPT italic_k ⊤ end_POSTSUPERSCRIPT [ italic_Q + italic_K start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_R italic_K ] [ italic_A + italic_B italic_K ] start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT , (6)

where Q0succeeds-or-equals𝑄0Q\succeq 0italic_Q ⪰ 0 and R0succeeds𝑅0R\succ 0italic_R ≻ 0 are the user-defined weighting matrices, as in (2), and (Q,A)𝑄𝐴(Q,A)( italic_Q , italic_A ) is assumed detectable. The optimal control gain K𝐾Kitalic_K (which minimizes J𝐽Jitalic_J) is then given by the solution to the following problem.

Problem 1.

Standard LQR (Observability-type Gramian) [8, Sec. 4.1]:

Evaluate K=[R+BPB]1BPA,Evaluate 𝐾superscriptdelimited-[]𝑅superscript𝐵top𝑃𝐵1superscript𝐵top𝑃𝐴\displaystyle\text{Evaluate }K=-\left[R+B^{\top}PB\right]^{-1}B^{\top}PA,Evaluate italic_K = - [ italic_R + italic_B start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_P italic_B ] start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_B start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_P italic_A ,

when P𝑃Pitalic_P satisfies the algebraic Riccati equation

0=Q+APAPAPB[R+BPB]1BPA.0𝑄superscript𝐴top𝑃𝐴𝑃superscript𝐴top𝑃𝐵superscriptdelimited-[]𝑅superscript𝐵top𝑃𝐵1superscript𝐵top𝑃𝐴\displaystyle 0=Q+A^{\top}PA-P-A^{\top}PB\left[R+B^{\top}PB\right]^{-1}B^{\top% }PA.0 = italic_Q + italic_A start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_P italic_A - italic_P - italic_A start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_P italic_B [ italic_R + italic_B start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_P italic_B ] start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_B start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_P italic_A .

The problem above results from the square completion in K𝐾Kitalic_K of the Lyapunov equation

0=[A+BK]P[A+BK]P+Q+KRK,0superscriptdelimited-[]𝐴𝐵𝐾top𝑃delimited-[]𝐴𝐵𝐾𝑃𝑄superscript𝐾top𝑅𝐾\displaystyle 0=[A+BK]^{\top}P[A+BK]-P+Q+K^{\top}RK,0 = [ italic_A + italic_B italic_K ] start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_P [ italic_A + italic_B italic_K ] - italic_P + italic_Q + italic_K start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_R italic_K , (7)

then chosing K=[R+BPB]1BPA𝐾superscriptdelimited-[]𝑅superscript𝐵top𝑃𝐵1superscript𝐵top𝑃𝐴K=-\left[R+B^{\top}PB\right]^{-1}B^{\top}PAitalic_K = - [ italic_R + italic_B start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_P italic_B ] start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_B start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_P italic_A, which corresponds to the minimum P𝑃Pitalic_P (in the Loewner (succeeds-or-equals\succeq) ordering sense).

There is an equivalent characterization to solving the LQR problem. Using the linearity and the cyclic property of the trace, the cost can be written as (check Appendix A.1 for more details)

J𝐽\displaystyle Jitalic_J =tr([Q+KRK]Σ),absenttrdelimited-[]𝑄superscript𝐾top𝑅𝐾Σ\displaystyle=\textit{tr}\left(\left[Q+K^{\top}RK\right]\Sigma\right),= tr ( [ italic_Q + italic_K start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_R italic_K ] roman_Σ ) , (8)
=tr(QΣ)+tr(RKΣK),absenttr𝑄Σtr𝑅𝐾Σsuperscript𝐾top\displaystyle=\textit{tr}\left(Q\Sigma\right)+\textit{tr}\left(RK\Sigma K^{% \top}\right),= tr ( italic_Q roman_Σ ) + tr ( italic_R italic_K roman_Σ italic_K start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ) ,

where ΣΣ\Sigmaroman_Σ is the controllability-type Gramian, given by

ΣΣ\displaystyle\Sigmaroman_Σ =limk𝔼{xkxk},absentsubscript𝑘𝔼subscript𝑥𝑘superscriptsubscript𝑥𝑘top\displaystyle=\lim_{k\to\infty}\mathbb{E\,}\left\{x_{k}x_{k}^{\top}\right\},= roman_lim start_POSTSUBSCRIPT italic_k → ∞ end_POSTSUBSCRIPT blackboard_E { italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT } , (9)
=k=0[A+BK]k[W+BVB][A+BK]k,absentsuperscriptsubscript𝑘0superscriptdelimited-[]𝐴𝐵𝐾𝑘delimited-[]𝑊𝐵𝑉superscript𝐵topsuperscriptdelimited-[]𝐴𝐵𝐾limit-from𝑘top\displaystyle=\sum_{k=0}^{\infty}\left[A+BK\right]^{k}\left[W+BVB^{\top}\right% ]\left[A+BK\right]^{k\,\top},= ∑ start_POSTSUBSCRIPT italic_k = 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∞ end_POSTSUPERSCRIPT [ italic_A + italic_B italic_K ] start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT [ italic_W + italic_B italic_V italic_B start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ] [ italic_A + italic_B italic_K ] start_POSTSUPERSCRIPT italic_k ⊤ end_POSTSUPERSCRIPT ,

and is also the solution of the Lyapunov equation

Σ=[A+BK]Σ[A+BK]+W+BVB.Σdelimited-[]𝐴𝐵𝐾Σsuperscriptdelimited-[]𝐴𝐵𝐾top𝑊𝐵𝑉superscript𝐵top\displaystyle\Sigma=\left[A+BK\right]\Sigma\left[A+BK\right]^{\top}+W+BVB^{% \top}.roman_Σ = [ italic_A + italic_B italic_K ] roman_Σ [ italic_A + italic_B italic_K ] start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT + italic_W + italic_B italic_V italic_B start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT . (10)

Since V0succeeds𝑉0V\succ 0italic_V ≻ 0 and the pair (A,B)𝐴𝐵(A,B)( italic_A , italic_B ) is controllable, the controllability Gramian satisfies Σ0succeedsΣ0\Sigma\succ 0roman_Σ ≻ 0.

Using this parametrization, we first define the extra variable Z0subscript𝑍0Z_{0}italic_Z start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT, such that Z0KΣKsucceeds-or-equalssubscript𝑍0𝐾Σsuperscript𝐾topZ_{0}\succeq K\Sigma K^{\top}italic_Z start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ⪰ italic_K roman_Σ italic_K start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT, which is equivalent to (the Schur complement corresponding to) the LMI333Since congruence preserves definiteness, that is, for any full rank matrix T𝑇Titalic_T, TΣT0succeeds𝑇Σsuperscript𝑇top0T\Sigma T^{\top}\succ 0italic_T roman_Σ italic_T start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ≻ 0 if and only if Σ0succeedsΣ0\Sigma\succ 0roman_Σ ≻ 0. T𝑇Titalic_T is taken to be T=[IK0I]𝑇matrix𝐼𝐾0𝐼T=\begin{bmatrix}I&-K\\ 0&I\end{bmatrix}italic_T = [ start_ARG start_ROW start_CELL italic_I end_CELL start_CELL - italic_K end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL italic_I end_CELL end_ROW end_ARG ] in the above instance, and hence, both Z0KΣKsubscript𝑍0𝐾Σsuperscript𝐾topZ_{0}-K\Sigma K^{\top}italic_Z start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT - italic_K roman_Σ italic_K start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT and ΣΣ\Sigmaroman_Σ appear on the block diagonal of TΣT𝑇Σsuperscript𝑇topT\Sigma T^{\top}italic_T roman_Σ italic_T start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT. Since Σ0succeedsΣ0\Sigma\succ 0roman_Σ ≻ 0, it must be true that Z0KΣK0succeeds-or-equalssubscript𝑍0𝐾Σsuperscript𝐾top0Z_{0}-K\Sigma K^{\top}\succeq 0italic_Z start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT - italic_K roman_Σ italic_K start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ⪰ 0.

[Z0KΣΣKΣ]0, or, [Z0LLΣ]0,formulae-sequencesucceeds-or-equalsmatrixsubscript𝑍0𝐾ΣΣsuperscript𝐾topΣ0succeeds-or-equals or, matrixsubscript𝑍0𝐿superscript𝐿topΣ0\displaystyle\begin{bmatrix}Z_{0}&K\Sigma\\ \Sigma K^{\top}&\Sigma\end{bmatrix}\succeq 0,\text{ or, }\begin{bmatrix}Z_{0}&% L\\ L^{\top}&\Sigma\end{bmatrix}\succeq 0,[ start_ARG start_ROW start_CELL italic_Z start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT end_CELL start_CELL italic_K roman_Σ end_CELL end_ROW start_ROW start_CELL roman_Σ italic_K start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT end_CELL start_CELL roman_Σ end_CELL end_ROW end_ARG ] ⪰ 0 , or, [ start_ARG start_ROW start_CELL italic_Z start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT end_CELL start_CELL italic_L end_CELL end_ROW start_ROW start_CELL italic_L start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT end_CELL start_CELL roman_Σ end_CELL end_ROW end_ARG ] ⪰ 0 ,

using the change of variables L=KΣ𝐿𝐾ΣL=K\Sigmaitalic_L = italic_K roman_Σ. The stability-imposing inequality relaxation of (10) (we discuss the consequences of this relaxation in Section 5) can also be described as an LMI, that is,

Σ[A+BK]Σ[A+BK]+W+BVBsucceeds-or-equalsΣdelimited-[]𝐴𝐵𝐾Σsuperscriptdelimited-[]𝐴𝐵𝐾top𝑊𝐵𝑉superscript𝐵top\displaystyle\Sigma\succeq\left[A+BK\right]\Sigma\left[A+BK\right]^{\top}+W+% BVB^{\top}roman_Σ ⪰ [ italic_A + italic_B italic_K ] roman_Σ [ italic_A + italic_B italic_K ] start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT + italic_W + italic_B italic_V italic_B start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT
Σ[AΣ+BL]Σ1[AΣ+BL]+W+BVBiffabsentsucceeds-or-equalsΣdelimited-[]𝐴Σ𝐵𝐿superscriptΣ1superscriptdelimited-[]𝐴Σ𝐵𝐿top𝑊𝐵𝑉superscript𝐵top\displaystyle\iff\Sigma\succeq\left[A\Sigma+BL\right]\Sigma^{-1}\left[A\Sigma+% BL\right]^{\top}+W+BVB^{\top}⇔ roman_Σ ⪰ [ italic_A roman_Σ + italic_B italic_L ] roman_Σ start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT [ italic_A roman_Σ + italic_B italic_L ] start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT + italic_W + italic_B italic_V italic_B start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT
[ΣWBVBAΣ+BLΣA+LBΣ]0.iffabsentsucceeds-or-equalsmatrixΣ𝑊𝐵𝑉superscript𝐵top𝐴Σ𝐵𝐿Σsuperscript𝐴topsuperscript𝐿topsuperscript𝐵topΣ0\displaystyle\iff\begin{bmatrix}\Sigma-W-BVB^{\top}&A\Sigma+BL\\ \Sigma A^{\top}+L^{\top}B^{\top}&\Sigma\end{bmatrix}\succeq 0.⇔ [ start_ARG start_ROW start_CELL roman_Σ - italic_W - italic_B italic_V italic_B start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT end_CELL start_CELL italic_A roman_Σ + italic_B italic_L end_CELL end_ROW start_ROW start_CELL roman_Σ italic_A start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT + italic_L start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_B start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT end_CELL start_CELL roman_Σ end_CELL end_ROW end_ARG ] ⪰ 0 .

Therefore, the LQR problem can be described as an SDP with affine cost and LMI constraints [9].

Problem 2.

Standard LQR (Controllability-type Gramian):444From a numerical perspective, to avoid open sets in the domain of the optimization problem, Σ0succeedsΣ0\Sigma\succ 0roman_Σ ≻ 0 can be replaced by Σϵ0Isucceeds-or-equalsΣsubscriptitalic-ϵ0𝐼\Sigma\succeq\epsilon_{0}Iroman_Σ ⪰ italic_ϵ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT italic_I, where ϵ0>0subscriptitalic-ϵ00\epsilon_{0}>0italic_ϵ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT > 0 is very small.

minΣ,L,Z0tr(QΣ)+tr(RZ0)subscriptΣ𝐿subscript𝑍0tr𝑄Σtr𝑅subscript𝑍0\displaystyle\min_{\Sigma,\,L,\,Z_{0}}\textit{tr}\left(Q\Sigma\right)+\textit{% tr}\left(RZ_{0}\right)roman_min start_POSTSUBSCRIPT roman_Σ , italic_L , italic_Z start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT end_POSTSUBSCRIPT tr ( italic_Q roman_Σ ) + tr ( italic_R italic_Z start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT )
s.t. Σ0,[Z0LLΣ]0,formulae-sequencesucceedss.t. Σ0succeeds-or-equalsmatrixsubscript𝑍0𝐿superscript𝐿topΣ0\displaystyle\text{s.t. }\Sigma\succ 0,\quad\begin{bmatrix}Z_{0}&L\\ L^{\top}&\Sigma\end{bmatrix}\succeq 0,s.t. roman_Σ ≻ 0 , [ start_ARG start_ROW start_CELL italic_Z start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT end_CELL start_CELL italic_L end_CELL end_ROW start_ROW start_CELL italic_L start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT end_CELL start_CELL roman_Σ end_CELL end_ROW end_ARG ] ⪰ 0 ,
[ΣWBVBAΣ+BLΣA+LBΣ]0,succeeds-or-equalsmatrixΣ𝑊𝐵𝑉superscript𝐵top𝐴Σ𝐵𝐿Σsuperscript𝐴topsuperscript𝐿topsuperscript𝐵topΣ0\displaystyle\begin{bmatrix}\Sigma-W-BVB^{\top}&A\Sigma+BL\\ \Sigma A^{\top}+L^{\top}B^{\top}&\Sigma\end{bmatrix}\succeq 0,[ start_ARG start_ROW start_CELL roman_Σ - italic_W - italic_B italic_V italic_B start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT end_CELL start_CELL italic_A roman_Σ + italic_B italic_L end_CELL end_ROW start_ROW start_CELL roman_Σ italic_A start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT + italic_L start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_B start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT end_CELL start_CELL roman_Σ end_CELL end_ROW end_ARG ] ⪰ 0 ,

where the change of variables L=KΣ𝐿𝐾ΣL=K\Sigmaitalic_L = italic_K roman_Σ has been used and the optimal control is then recovered from the optimal values ΣsubscriptΣ\Sigma_{\star}roman_Σ start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT and Lsubscript𝐿L_{\star}italic_L start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT through K=LΣ1subscript𝐾subscript𝐿superscriptsubscriptΣ1K_{\star}=L_{\star}\Sigma_{\star}^{-1}italic_K start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT = italic_L start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT roman_Σ start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT.

The choice of using the controllability-type parametrization of the cost (8) is motivated by (9); the matrix ΣΣ\Sigmaroman_Σ is the steady-state state covariance matrix and thus has a statistical interpretation. Since this paper is in the context of data-driven control, the matrix ΣΣ\Sigmaroman_Σ is of great importance, and having it appear explicitly is a key. We show this in Section 4.

3.2 Data-driven certainty equivalence

In the case of unknown system matrices A𝐴Aitalic_A and B𝐵Bitalic_B, conventional (indirect) data-driven approaches identify a model (A^,B^)^𝐴^𝐵(\widehat{A},\widehat{B})( over^ start_ARG italic_A end_ARG , over^ start_ARG italic_B end_ARG ) and W^^𝑊\widehat{W}over^ start_ARG italic_W end_ARG, then solve the standard LQR problem with the identified model assumed to represent the true system exactly, a procedure termed certainty equivalence LQR.

The estimate model (A^,B^)^𝐴^𝐵(\widehat{A},\widehat{B})( over^ start_ARG italic_A end_ARG , over^ start_ARG italic_B end_ARG ) and W^^𝑊\widehat{W}over^ start_ARG italic_W end_ARG can be obtained as a solution to the least-squares problem [42]

=argminA,BXerrF,W^=1NXerrXerr,Xerr=X1[A,B][X0U0],\displaystyle\begin{aligned} &=\operatorname*{arg\,min}_{A,B}\left\lVert X_{% err}\right\rVert_{F},\\ \widehat{W}&=\frac{1}{N}X_{err}X_{err}^{\top},\\ X_{err}&=X_{1}-[A,B]\begin{bmatrix}X_{0}\\ U_{0}\end{bmatrix},\end{aligned}start_ROW start_CELL end_CELL start_CELL = start_OPERATOR roman_arg roman_min end_OPERATOR start_POSTSUBSCRIPT italic_A , italic_B end_POSTSUBSCRIPT ∥ italic_X start_POSTSUBSCRIPT italic_e italic_r italic_r end_POSTSUBSCRIPT ∥ start_POSTSUBSCRIPT italic_F end_POSTSUBSCRIPT , end_CELL end_ROW start_ROW start_CELL over^ start_ARG italic_W end_ARG end_CELL start_CELL = divide start_ARG 1 end_ARG start_ARG italic_N end_ARG italic_X start_POSTSUBSCRIPT italic_e italic_r italic_r end_POSTSUBSCRIPT italic_X start_POSTSUBSCRIPT italic_e italic_r italic_r end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT , end_CELL end_ROW start_ROW start_CELL italic_X start_POSTSUBSCRIPT italic_e italic_r italic_r end_POSTSUBSCRIPT end_CELL start_CELL = italic_X start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT - [ italic_A , italic_B ] [ start_ARG start_ROW start_CELL italic_X start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_U start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] , end_CELL end_ROW (11)

where Fsubscriptdelimited-∥∥𝐹\lVert\cdot\rVert_{F}∥ ⋅ ∥ start_POSTSUBSCRIPT italic_F end_POSTSUBSCRIPT is the Frobenius norm. Using the acquired data (4), we let X0:=[x0,,xN1]assignsubscript𝑋0subscript𝑥0subscript𝑥𝑁1X_{0}:=[x_{0},\ldots,x_{N-1}]italic_X start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT := [ italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , … , italic_x start_POSTSUBSCRIPT italic_N - 1 end_POSTSUBSCRIPT ], U0:=[u0,,uN1]assignsubscript𝑈0subscript𝑢0subscript𝑢𝑁1U_{0}:=[u_{0},\ldots,u_{N-1}]italic_U start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT := [ italic_u start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , … , italic_u start_POSTSUBSCRIPT italic_N - 1 end_POSTSUBSCRIPT ], and X1:=[x1,,xN]assignsubscript𝑋1subscript𝑥1subscript𝑥𝑁X_{1}:=[x_{1},\ldots,x_{N}]italic_X start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT := [ italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_x start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT ].

The certainty equivalence LQR problem can be decomposed into an identification problem and then a control design problem. We describe this two-level approach as the following problem.

Problem 3.

Certainty equivalence LQR:

minΣ,L,Z0tr(QΣ)+tr(RZ0)subscriptΣ𝐿subscript𝑍0tr𝑄Σtr𝑅subscript𝑍0\displaystyle\min_{\Sigma,\,L,\,Z_{0}}\textit{tr}\left(Q\Sigma\right)+\textit{% tr}\left(RZ_{0}\right)roman_min start_POSTSUBSCRIPT roman_Σ , italic_L , italic_Z start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT end_POSTSUBSCRIPT tr ( italic_Q roman_Σ ) + tr ( italic_R italic_Z start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT )
s.t. Σ0,[Z0LLΣ]0,formulae-sequencesucceedss.t. Σ0succeeds-or-equalsmatrixsubscript𝑍0𝐿superscript𝐿topΣ0\displaystyle\text{s.t. }\Sigma\succ 0,\quad\begin{bmatrix}Z_{0}&L\\ L^{\top}&\Sigma\end{bmatrix}\succeq 0,s.t. roman_Σ ≻ 0 , [ start_ARG start_ROW start_CELL italic_Z start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT end_CELL start_CELL italic_L end_CELL end_ROW start_ROW start_CELL italic_L start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT end_CELL start_CELL roman_Σ end_CELL end_ROW end_ARG ] ⪰ 0 ,
[ΣW^B^VB^A^Σ+B^LΣA^+LB^Σ]0,succeeds-or-equalsmatrixΣ^𝑊^𝐵𝑉superscript^𝐵top^𝐴Σ^𝐵𝐿Σsuperscript^𝐴topsuperscript𝐿topsuperscript^𝐵topΣ0\displaystyle\begin{bmatrix}\Sigma-\widehat{W}-\widehat{B}V\widehat{B}^{\top}&% \widehat{A}\Sigma+\widehat{B}L\\ \Sigma\widehat{A}^{\top}+L^{\top}\widehat{B}^{\top}&\Sigma\end{bmatrix}\succeq 0,[ start_ARG start_ROW start_CELL roman_Σ - over^ start_ARG italic_W end_ARG - over^ start_ARG italic_B end_ARG italic_V over^ start_ARG italic_B end_ARG start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT end_CELL start_CELL over^ start_ARG italic_A end_ARG roman_Σ + over^ start_ARG italic_B end_ARG italic_L end_CELL end_ROW start_ROW start_CELL roman_Σ over^ start_ARG italic_A end_ARG start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT + italic_L start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT over^ start_ARG italic_B end_ARG start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT end_CELL start_CELL roman_Σ end_CELL end_ROW end_ARG ] ⪰ 0 ,
A^,B^,W^ as in (11).^𝐴^𝐵^𝑊 as in (11)\displaystyle\widehat{A},\widehat{B},\widehat{W}\text{ as in \eqref{eq:linear % state-space ID}}.over^ start_ARG italic_A end_ARG , over^ start_ARG italic_B end_ARG , over^ start_ARG italic_W end_ARG as in ( ) .

The optimal control is recovered from the optimal values ΣsubscriptΣ\Sigma_{\star}roman_Σ start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT and Lsubscript𝐿L_{\star}italic_L start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT through K=LΣ1subscript𝐾subscript𝐿superscriptsubscriptΣ1K_{\star}=L_{\star}\Sigma_{\star}^{-1}italic_K start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT = italic_L start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT roman_Σ start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT.

This data-driven approach is a basic certainty equivalence control design approach. For a discussion of the robustness and the sample efficiency, and to see how to enforce robustness under the existence of noise (wk0subscript𝑤𝑘0w_{k}\neq 0italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ≠ 0) when the underlying model (1) is linear, an interested reader can consult [16] and [38]. In this paper, however, we tackle the problem of the false generalization of nonlinear systems by linear ones, beyond data, which is implicit in data-driven control algorithms.

Notice if the underlying system is indeed linear, the region of the state-input space from which the data (4) has been acquired is irrelevant, as long as the PE assumption, Assumption 1, is met. The reason is that if the system is linear, predicting its behavior in any region equates to predicting its behavior universally across the state-input space. This generally does not hold if the underlying system is nonlinear.

One shortcoming of using Problem 3 as the optimal control model is that once a linear model is identified and the resulting gain matrix K𝐾Kitalic_K is applied in feedback, the region in the state space, where the new closed-loop system is operating, may not be within or close to the region explored by the data (4) in the identification step. Therefore, if the underlying system is nonlinear555This is typically the realistic case, as many systems that are considered linear are not so beyond some operating conditions., the new controller may force the system to go to regions of the domain of f𝑓fitalic_f over which the system admits behaviors that are very different from what was experienced in the data. This implicit and inherent false universal generalization may in turn result in serious violations of safety or lead to a degraded performance.

4 Data-conforming data-driven control

In practice, for a controller to conform to the learning data, a measure of similarity between distributions has to be adopted and augmented to the cost/constraints of the LQR problem. For this purpose, several approaches and simplifications can be considered. We explore some of them in this section.

4.1 Conforming to the state data

Notice that Problem 3 is a convex optimization problem and that the decision variable ΣΣ\Sigmaroman_Σ, the steady-state state covariance matrix, can be enforced to be similar to the state empirical covariance from data, via convex constraints or regularization.

Using (4), let ΣdatasubscriptΣ𝑑𝑎𝑡𝑎\Sigma_{data}roman_Σ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT be the state empirical covariance:

Σdata=1N+1XX.subscriptΣ𝑑𝑎𝑡𝑎1𝑁1𝑋superscript𝑋top\displaystyle\Sigma_{data}=\frac{1}{N+1}{X}{X}^{\top}.roman_Σ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT = divide start_ARG 1 end_ARG start_ARG italic_N + 1 end_ARG italic_X italic_X start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT .

The data-conforming (in the state) version of Problem 3 can now be stated as follows.

Problem 4.

Data-conforming (in the state) LQR via a hard constraint:

minΣ,L,Z0tr(QΣ)+tr(RZ0)subscriptΣ𝐿subscript𝑍0tr𝑄Σtr𝑅subscript𝑍0\displaystyle\min_{\Sigma,\,L,\,Z_{0}}\textit{tr}\left(Q\Sigma\right)+\textit{% tr}\left(RZ_{0}\right)roman_min start_POSTSUBSCRIPT roman_Σ , italic_L , italic_Z start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT end_POSTSUBSCRIPT tr ( italic_Q roman_Σ ) + tr ( italic_R italic_Z start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT )
s.t. Σ0,[Z0LLΣ]0,Σ=Σdata,formulae-sequencesucceedss.t. Σ0formulae-sequencesucceeds-or-equalsmatrixsubscript𝑍0𝐿superscript𝐿topΣ0ΣsubscriptΣ𝑑𝑎𝑡𝑎\displaystyle\text{s.t. }\Sigma\succ 0,\quad\begin{bmatrix}Z_{0}&L\\ L^{\top}&\Sigma\end{bmatrix}\succeq 0,\quad\Sigma=\Sigma_{data},s.t. roman_Σ ≻ 0 , [ start_ARG start_ROW start_CELL italic_Z start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT end_CELL start_CELL italic_L end_CELL end_ROW start_ROW start_CELL italic_L start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT end_CELL start_CELL roman_Σ end_CELL end_ROW end_ARG ] ⪰ 0 , roman_Σ = roman_Σ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT ,
[ΣW^B^VB^A^Σ+B^LΣA^+LB^Σ]0,succeeds-or-equalsmatrixΣ^𝑊^𝐵𝑉superscript^𝐵top^𝐴Σ^𝐵𝐿Σsuperscript^𝐴topsuperscript𝐿topsuperscript^𝐵topΣ0\displaystyle\begin{bmatrix}\Sigma-\widehat{W}-\widehat{B}V\widehat{B}^{\top}&% \widehat{A}\Sigma+\widehat{B}L\\ \Sigma\widehat{A}^{\top}+L^{\top}\widehat{B}^{\top}&\Sigma\end{bmatrix}\succeq 0,[ start_ARG start_ROW start_CELL roman_Σ - over^ start_ARG italic_W end_ARG - over^ start_ARG italic_B end_ARG italic_V over^ start_ARG italic_B end_ARG start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT end_CELL start_CELL over^ start_ARG italic_A end_ARG roman_Σ + over^ start_ARG italic_B end_ARG italic_L end_CELL end_ROW start_ROW start_CELL roman_Σ over^ start_ARG italic_A end_ARG start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT + italic_L start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT over^ start_ARG italic_B end_ARG start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT end_CELL start_CELL roman_Σ end_CELL end_ROW end_ARG ] ⪰ 0 ,
A^,B^,W^ as in (11),^𝐴^𝐵^𝑊 as in (11)\displaystyle\widehat{A},\widehat{B},\widehat{W}\text{ as in \eqref{eq:linear % state-space ID}},over^ start_ARG italic_A end_ARG , over^ start_ARG italic_B end_ARG , over^ start_ARG italic_W end_ARG as in ( ) ,

and, similarly, the optimal control is recovered from the optimal values ΣsubscriptΣ\Sigma_{\star}roman_Σ start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT and Lsubscript𝐿L_{\star}italic_L start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT through K=LΣ1subscript𝐾subscript𝐿superscriptsubscriptΣ1K_{\star}=L_{\star}\Sigma_{\star}^{-1}italic_K start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT = italic_L start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT roman_Σ start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT.

The new hard linear constraint Σ=ΣdataΣsubscriptΣ𝑑𝑎𝑡𝑎\Sigma=\Sigma_{data}roman_Σ = roman_Σ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT has two shortcomings: (i) possible numerical instabilities and/or feasibility issues and (ii) limits to exploration beyond learning data. That is, potential future improvements via some exploration is not possible because the new controller will always seek to generate the same state distribution. This can be relaxed by using some margin, say, ΣΣdataϵ𝕀succeeds-or-equalsΣsubscriptΣ𝑑𝑎𝑡𝑎italic-ϵ𝕀\Sigma\succeq\Sigma_{data}-\epsilon\mathbb{I}roman_Σ ⪰ roman_Σ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT - italic_ϵ blackboard_I and ΣΣdata+ϵ𝕀precedes-or-equalsΣsubscriptΣ𝑑𝑎𝑡𝑎italic-ϵ𝕀\Sigma\preceq\Sigma_{data}+\epsilon\mathbb{I}roman_Σ ⪯ roman_Σ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT + italic_ϵ blackboard_I, for some ϵ>0italic-ϵ0\epsilon>0italic_ϵ > 0, predetermined or possibly included in the cost as a slack variable. This flexibility in the constraint, resembled in ϵitalic-ϵ\epsilonitalic_ϵ, can determine the exploration vs exploitation balance of the new control design.

One can also design a convex regularization term using some norm on the covariance matrices. For example, the squared Frobenius norm ΣΣdataF2=tr([ΣΣdata][ΣΣdata])superscriptsubscriptdelimited-∥∥ΣsubscriptΣ𝑑𝑎𝑡𝑎𝐹2trdelimited-[]ΣsubscriptΣ𝑑𝑎𝑡𝑎superscriptdelimited-[]ΣsubscriptΣ𝑑𝑎𝑡𝑎top\lVert\Sigma-\Sigma_{data}\rVert_{F}^{2}=\textit{tr}\left(\left[\Sigma-\Sigma_% {data}\right]\left[\Sigma-\Sigma_{data}\right]^{\top}\right)∥ roman_Σ - roman_Σ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT ∥ start_POSTSUBSCRIPT italic_F end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT = tr ( [ roman_Σ - roman_Σ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT ] [ roman_Σ - roman_Σ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT ] start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ) can be used by minimizing tr(Z)trsuperscript𝑍\textit{tr}\left(Z^{\prime}\right)tr ( italic_Z start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ), where Z[ΣΣdata][ΣΣdata]succeeds-or-equalssuperscript𝑍delimited-[]ΣsubscriptΣ𝑑𝑎𝑡𝑎superscriptdelimited-[]ΣsubscriptΣ𝑑𝑎𝑡𝑎topZ^{\prime}\succeq\left[\Sigma-\Sigma_{data}\right]\left[\Sigma-\Sigma_{data}% \right]^{\top}italic_Z start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ⪰ [ roman_Σ - roman_Σ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT ] [ roman_Σ - roman_Σ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT ] start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT, or equivalently as an LMI [10],

[ZΣΣdataΣΣdataI]0.succeeds-or-equalsmatrixsuperscript𝑍ΣsubscriptΣ𝑑𝑎𝑡𝑎ΣsubscriptΣ𝑑𝑎𝑡𝑎𝐼0\displaystyle\begin{bmatrix}Z^{\prime}&\Sigma-\Sigma_{data}\\ \Sigma-\Sigma_{data}&I\end{bmatrix}\succeq 0.[ start_ARG start_ROW start_CELL italic_Z start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_CELL start_CELL roman_Σ - roman_Σ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL roman_Σ - roman_Σ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT end_CELL start_CELL italic_I end_CELL end_ROW end_ARG ] ⪰ 0 .

We use this regularization term in the following modified problem.

Problem 5.

Data-conforming (in the state) LQR via regularization:

minΣ,L,Z0,Ztr(QΣ)+tr(RZ0)+γtr(Z)subscriptΣ𝐿subscript𝑍0superscript𝑍tr𝑄Σtr𝑅subscript𝑍0superscript𝛾trsuperscript𝑍\displaystyle\min_{\Sigma,\,L,\,Z_{0},\,Z^{\prime}}\textit{tr}\left(Q\Sigma% \right)+\textit{tr}\left(RZ_{0}\right)+\gamma^{\prime}\textit{tr}\left(Z^{% \prime}\right)roman_min start_POSTSUBSCRIPT roman_Σ , italic_L , italic_Z start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_Z start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT tr ( italic_Q roman_Σ ) + tr ( italic_R italic_Z start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) + italic_γ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT tr ( italic_Z start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT )
s.t. Σ0,[Z0LLΣ]0,formulae-sequencesucceedss.t. Σ0succeeds-or-equalsmatrixsubscript𝑍0𝐿superscript𝐿topΣ0\displaystyle\text{s.t. }\Sigma\succ 0,\quad\begin{bmatrix}Z_{0}&L\\ L^{\top}&\Sigma\end{bmatrix}\succeq 0,s.t. roman_Σ ≻ 0 , [ start_ARG start_ROW start_CELL italic_Z start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT end_CELL start_CELL italic_L end_CELL end_ROW start_ROW start_CELL italic_L start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT end_CELL start_CELL roman_Σ end_CELL end_ROW end_ARG ] ⪰ 0 ,
[ΣW^B^VB^A^Σ+B^LΣA^+LB^Σ]0,succeeds-or-equalsmatrixΣ^𝑊^𝐵𝑉superscript^𝐵top^𝐴Σ^𝐵𝐿Σsuperscript^𝐴topsuperscript𝐿topsuperscript^𝐵topΣ0\displaystyle\begin{bmatrix}\Sigma-\widehat{W}-\widehat{B}V\widehat{B}^{\top}&% \widehat{A}\Sigma+\widehat{B}L\\ \Sigma\widehat{A}^{\top}+L^{\top}\widehat{B}^{\top}&\Sigma\end{bmatrix}\succeq 0,[ start_ARG start_ROW start_CELL roman_Σ - over^ start_ARG italic_W end_ARG - over^ start_ARG italic_B end_ARG italic_V over^ start_ARG italic_B end_ARG start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT end_CELL start_CELL over^ start_ARG italic_A end_ARG roman_Σ + over^ start_ARG italic_B end_ARG italic_L end_CELL end_ROW start_ROW start_CELL roman_Σ over^ start_ARG italic_A end_ARG start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT + italic_L start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT over^ start_ARG italic_B end_ARG start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT end_CELL start_CELL roman_Σ end_CELL end_ROW end_ARG ] ⪰ 0 ,
[ZΣΣdataΣΣdataI]0,succeeds-or-equalsmatrixsuperscript𝑍ΣsubscriptΣ𝑑𝑎𝑡𝑎ΣsubscriptΣ𝑑𝑎𝑡𝑎𝐼0\displaystyle\begin{bmatrix}Z^{\prime}&\Sigma-\Sigma_{data}\\ \Sigma-\Sigma_{data}&I\end{bmatrix}\succeq 0,[ start_ARG start_ROW start_CELL italic_Z start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_CELL start_CELL roman_Σ - roman_Σ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL roman_Σ - roman_Σ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT end_CELL start_CELL italic_I end_CELL end_ROW end_ARG ] ⪰ 0 ,
A^,B^,W^ as in (11),^𝐴^𝐵^𝑊 as in (11)\displaystyle\widehat{A},\widehat{B},\widehat{W}\text{ as in \eqref{eq:linear % state-space ID}},over^ start_ARG italic_A end_ARG , over^ start_ARG italic_B end_ARG , over^ start_ARG italic_W end_ARG as in ( ) ,

where γ>0superscript𝛾0\gamma^{\prime}>0italic_γ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT > 0 and the optimal control is recovered from the optimal values ΣsubscriptΣ\Sigma_{\star}roman_Σ start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT and Lsubscript𝐿L_{\star}italic_L start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT through K=LΣ1subscript𝐾subscript𝐿superscriptsubscriptΣ1K_{\star}=L_{\star}\Sigma_{\star}^{-1}italic_K start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT = italic_L start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT roman_Σ start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT.

The user-defined weight γsuperscript𝛾\gamma^{\prime}italic_γ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT plays an analogous role to ϵitalic-ϵ\epsilonitalic_ϵ in the discussion succeeding Problem 4, in determining the exploration vs exploitation balance of the new control design.

Only the state data distribution is used in Problems 4 and 5. This puts no emphasis on the input data distribution and carries the implicit assumption that the underlying true system (1) is linear with constant coefficients in uksubscript𝑢𝑘u_{k}italic_u start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT. That is, the seen inputs have the same effect regardless of the system state at their occurrence. In the following subsection, however, we enforce data conformation in the state-input joint distribution instead, which is more suited to general nonlinear systems.

4.2 Conforming to the joint state-input data

We denote the steady-state state-input joint density of the new design and the one estimated from the data by666The notation 𝒩(μ,Σ)𝒩superscript𝜇superscriptΣ\mathcal{N}(\mu^{\prime},\Sigma^{\prime})caligraphic_N ( italic_μ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT , roman_Σ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) denotes a Gaussian density of mean μsuperscript𝜇\mu^{\prime}italic_μ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT and covariance ΣsuperscriptΣ\Sigma^{\prime}roman_Σ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT. 𝒩des=𝒩(μdes,Γdes)subscript𝒩𝑑𝑒𝑠𝒩subscript𝜇𝑑𝑒𝑠subscriptΓ𝑑𝑒𝑠\mathcal{N}_{des}=\mathcal{N}(\mu_{des},\Gamma_{des})caligraphic_N start_POSTSUBSCRIPT italic_d italic_e italic_s end_POSTSUBSCRIPT = caligraphic_N ( italic_μ start_POSTSUBSCRIPT italic_d italic_e italic_s end_POSTSUBSCRIPT , roman_Γ start_POSTSUBSCRIPT italic_d italic_e italic_s end_POSTSUBSCRIPT ) and 𝒩data=𝒩(μdata,Γdata)subscript𝒩𝑑𝑎𝑡𝑎𝒩subscript𝜇𝑑𝑎𝑡𝑎subscriptΓ𝑑𝑎𝑡𝑎\mathcal{N}_{data}=\mathcal{N}(\mu_{data},\Gamma_{data})caligraphic_N start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT = caligraphic_N ( italic_μ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT , roman_Γ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT ), respectively. The Gaussian property of these densities and μdes=μdata=0subscript𝜇𝑑𝑒𝑠subscript𝜇𝑑𝑎𝑡𝑎0\mu_{des}=\mu_{data}=0italic_μ start_POSTSUBSCRIPT italic_d italic_e italic_s end_POSTSUBSCRIPT = italic_μ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT = 0 follow from Assumption 2. Therefore, these densities are fully characterized by their covariance matrices.

Since uk=Kxk+vksubscript𝑢𝑘𝐾subscript𝑥𝑘subscript𝑣𝑘u_{k}=Kx_{k}+v_{k}italic_u start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = italic_K italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT + italic_v start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT, the design covariance matrix satisfies

ΓdessubscriptΓ𝑑𝑒𝑠\displaystyle\Gamma_{des}roman_Γ start_POSTSUBSCRIPT italic_d italic_e italic_s end_POSTSUBSCRIPT =[limk𝔼xkxklimk𝔼xkuklimk𝔼ukxklimk𝔼ukuk]absentmatrixsubscript𝑘𝔼subscript𝑥𝑘superscriptsubscript𝑥𝑘topsubscript𝑘𝔼subscript𝑥𝑘superscriptsubscript𝑢𝑘topsubscript𝑘𝔼subscript𝑢𝑘superscriptsubscript𝑥𝑘topsubscript𝑘𝔼subscript𝑢𝑘superscriptsubscript𝑢𝑘top\displaystyle=\begin{bmatrix}\lim_{k\to\infty}\mathbb{E\,}x_{k}x_{k}^{\top}&% \lim_{k\to\infty}\mathbb{E\,}x_{k}u_{k}^{\top}\\ \lim_{k\to\infty}\mathbb{E\,}u_{k}x_{k}^{\top}&\lim_{k\to\infty}\mathbb{E\,}u_% {k}u_{k}^{\top}\end{bmatrix}= [ start_ARG start_ROW start_CELL roman_lim start_POSTSUBSCRIPT italic_k → ∞ end_POSTSUBSCRIPT blackboard_E italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT end_CELL start_CELL roman_lim start_POSTSUBSCRIPT italic_k → ∞ end_POSTSUBSCRIPT blackboard_E italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_u start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT end_CELL end_ROW start_ROW start_CELL roman_lim start_POSTSUBSCRIPT italic_k → ∞ end_POSTSUBSCRIPT blackboard_E italic_u start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT end_CELL start_CELL roman_lim start_POSTSUBSCRIPT italic_k → ∞ end_POSTSUBSCRIPT blackboard_E italic_u start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_u start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT end_CELL end_ROW end_ARG ]
=[ΣΣKKΣKΣK+V],absentmatrixΣΣsuperscript𝐾top𝐾Σ𝐾Σsuperscript𝐾top𝑉\displaystyle=\begin{bmatrix}\Sigma&\Sigma K^{\top}\\ K\Sigma&K\Sigma K^{\top}+V\end{bmatrix},= [ start_ARG start_ROW start_CELL roman_Σ end_CELL start_CELL roman_Σ italic_K start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT end_CELL end_ROW start_ROW start_CELL italic_K roman_Σ end_CELL start_CELL italic_K roman_Σ italic_K start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT + italic_V end_CELL end_ROW end_ARG ] , (12)

where ΣΣ\Sigmaroman_Σ is as defined in (9). The empirical covariance matrix satisfies

Γdata1N+1[Dμdata][Dμdata]=[ΣdataHdataHdataMdata],subscriptΓ𝑑𝑎𝑡𝑎absent1𝑁1delimited-[]𝐷subscript𝜇𝑑𝑎𝑡𝑎superscriptdelimited-[]𝐷subscript𝜇𝑑𝑎𝑡𝑎topmissing-subexpressionabsentmatrixsubscriptΣ𝑑𝑎𝑡𝑎subscript𝐻𝑑𝑎𝑡𝑎superscriptsubscript𝐻𝑑𝑎𝑡𝑎topsubscript𝑀𝑑𝑎𝑡𝑎\displaystyle\begin{aligned} \Gamma_{data}&\approx\frac{1}{N+1}\left[D-\mu_{% data}\right]\left[D-\mu_{data}\right]^{\top}\\ &=\begin{bmatrix}\Sigma_{data}&H_{data}\\ H_{data}^{\top}&M_{data}\end{bmatrix},\end{aligned}start_ROW start_CELL roman_Γ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT end_CELL start_CELL ≈ divide start_ARG 1 end_ARG start_ARG italic_N + 1 end_ARG [ italic_D - italic_μ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT ] [ italic_D - italic_μ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT ] start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL = [ start_ARG start_ROW start_CELL roman_Σ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT end_CELL start_CELL italic_H start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_H start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT end_CELL start_CELL italic_M start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] , end_CELL end_ROW (13)

where Hdata=(N+1)1XUsubscript𝐻𝑑𝑎𝑡𝑎superscript𝑁11𝑋superscript𝑈topH_{data}=(N+1)^{-1}XU^{\top}italic_H start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT = ( italic_N + 1 ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_X italic_U start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT and Mdata=(N+1)1UUsubscript𝑀𝑑𝑎𝑡𝑎superscript𝑁11𝑈superscript𝑈topM_{data}=(N+1)^{-1}UU^{\top}italic_M start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT = ( italic_N + 1 ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_U italic_U start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT.

It is not clear whether one can enforce the closeness of ΓdatasubscriptΓ𝑑𝑎𝑡𝑎\Gamma_{data}roman_Γ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT to ΓdessubscriptΓ𝑑𝑒𝑠\Gamma_{des}roman_Γ start_POSTSUBSCRIPT italic_d italic_e italic_s end_POSTSUBSCRIPT, as was done in Problems 4 and 5, via LMIs in ΣΣ\Sigmaroman_Σ and K𝐾Kitalic_K (or L𝐿Litalic_L). Instead, we use the Kullback–Leibler divergence, notated dKL()d_{KL}(\cdot\mid\mid\cdot)italic_d start_POSTSUBSCRIPT italic_K italic_L end_POSTSUBSCRIPT ( ⋅ ∣ ∣ ⋅ ), between the densities 𝒩dessubscript𝒩𝑑𝑒𝑠\mathcal{N}_{des}caligraphic_N start_POSTSUBSCRIPT italic_d italic_e italic_s end_POSTSUBSCRIPT and 𝒩datasubscript𝒩𝑑𝑎𝑡𝑎\mathcal{N}_{data}caligraphic_N start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT, which reduces to an expression in terms of their covariances. This expression, under some relaxation, can be posed as an affine regularization term and LMI constraints in ΣΣ\Sigmaroman_Σ and L𝐿Litalic_L.

Although the KL divergence is not a full-fledged metric, it is non-negative, and zero if and only if the two distributions are equal (almost everywhere) [37, Sec. 8.6]. It is a measure of the inefficiency of assuming one distribution when the true distribution is another, and it has been used in the contexts of exploration vs. exploitation in artificial intelligence and RL [35] and in constructing ambiguity sets in distributionally robust optimization techniques [29].

Lemma 1.

[30] The Kullback–Leibler divergence metric between 𝒩dessubscript𝒩𝑑𝑒𝑠\mathcal{N}_{des}caligraphic_N start_POSTSUBSCRIPT italic_d italic_e italic_s end_POSTSUBSCRIPT and 𝒩datasubscript𝒩𝑑𝑎𝑡𝑎\mathcal{N}_{data}caligraphic_N start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT is

dKL(𝒩des𝒩data)=rx+ru𝒩des(x)log𝒩des(x)𝒩data(x)dx\displaystyle d_{KL}(\mathcal{N}_{des}\mid\mid\mathcal{N}_{data})=\int_{% \mathbb{R}^{r_{x}+r_{u}}}\mathcal{N}_{des}(x)\log\frac{\mathcal{N}_{des}(x)}{% \mathcal{N}_{data}(x)}dxitalic_d start_POSTSUBSCRIPT italic_K italic_L end_POSTSUBSCRIPT ( caligraphic_N start_POSTSUBSCRIPT italic_d italic_e italic_s end_POSTSUBSCRIPT ∣ ∣ caligraphic_N start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT ) = ∫ start_POSTSUBSCRIPT blackboard_R start_POSTSUPERSCRIPT italic_r start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT + italic_r start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT end_POSTSUPERSCRIPT end_POSTSUBSCRIPT caligraphic_N start_POSTSUBSCRIPT italic_d italic_e italic_s end_POSTSUBSCRIPT ( italic_x ) roman_log divide start_ARG caligraphic_N start_POSTSUBSCRIPT italic_d italic_e italic_s end_POSTSUBSCRIPT ( italic_x ) end_ARG start_ARG caligraphic_N start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT ( italic_x ) end_ARG italic_d italic_x
=12[tr(Γdata1Γdes)+(μdesμdata)Γdata1(μdesμdata)\displaystyle=\frac{1}{2}\Bigg{[}\textit{tr}(\Gamma_{data}^{-1}\Gamma_{des})+(% \mu_{des}-\mu_{data})^{\top}\Gamma_{data}^{-1}(\mu_{des}-\mu_{data})= divide start_ARG 1 end_ARG start_ARG 2 end_ARG [ tr ( roman_Γ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT roman_Γ start_POSTSUBSCRIPT italic_d italic_e italic_s end_POSTSUBSCRIPT ) + ( italic_μ start_POSTSUBSCRIPT italic_d italic_e italic_s end_POSTSUBSCRIPT - italic_μ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT roman_Γ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( italic_μ start_POSTSUBSCRIPT italic_d italic_e italic_s end_POSTSUBSCRIPT - italic_μ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT )
(rx+ru)+logdet Γdatadet Γdes].\displaystyle\hskip 42.67912pt-(r_{x}+r_{u})+\log\frac{\text{det }\Gamma_{data% }}{\text{det }\Gamma_{des}}\Bigg{]}.- ( italic_r start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT + italic_r start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT ) + roman_log divide start_ARG det roman_Γ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT end_ARG start_ARG det roman_Γ start_POSTSUBSCRIPT italic_d italic_e italic_s end_POSTSUBSCRIPT end_ARG ] . (14)

We drop the second term in (14), since μdes=μdata=0subscript𝜇𝑑𝑒𝑠subscript𝜇𝑑𝑎𝑡𝑎0\mu_{des}=\mu_{data}=0italic_μ start_POSTSUBSCRIPT italic_d italic_e italic_s end_POSTSUBSCRIPT = italic_μ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT = 0, and the additive constant (rx+ru)subscript𝑟𝑥subscript𝑟𝑢-(r_{x}+r_{u})- ( italic_r start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT + italic_r start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT ), since it does not alter the optimal control. The modified KL-based regularization term is then

d¯KL(𝒩des𝒩data)=12[tr(Γdata1Γdes)+logdet Γdatadet Γdes].\displaystyle\bar{d}_{KL}(\mathcal{N}_{des}\mid\mid\mathcal{N}_{data})=\frac{1% }{2}\Bigg{[}\textit{tr}(\Gamma_{data}^{-1}\Gamma_{des})+\log\frac{\text{det }% \Gamma_{data}}{\text{det }\Gamma_{des}}\Bigg{]}.over¯ start_ARG italic_d end_ARG start_POSTSUBSCRIPT italic_K italic_L end_POSTSUBSCRIPT ( caligraphic_N start_POSTSUBSCRIPT italic_d italic_e italic_s end_POSTSUBSCRIPT ∣ ∣ caligraphic_N start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT ) = divide start_ARG 1 end_ARG start_ARG 2 end_ARG [ tr ( roman_Γ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT roman_Γ start_POSTSUBSCRIPT italic_d italic_e italic_s end_POSTSUBSCRIPT ) + roman_log divide start_ARG det roman_Γ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT end_ARG start_ARG det roman_Γ start_POSTSUBSCRIPT italic_d italic_e italic_s end_POSTSUBSCRIPT end_ARG ] .

We now employ an approximation to reshape d¯KL(𝒩des𝒩data)\bar{d}_{KL}(\mathcal{N}_{des}\mid\mid\mathcal{N}_{data})over¯ start_ARG italic_d end_ARG start_POSTSUBSCRIPT italic_K italic_L end_POSTSUBSCRIPT ( caligraphic_N start_POSTSUBSCRIPT italic_d italic_e italic_s end_POSTSUBSCRIPT ∣ ∣ caligraphic_N start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT ) into a form amenable to the linear control design machinery, using the following results.

Lemma 2.

The algebraic equality

logdet Γdatadet Γdes=trlog(ΓdataΓdes1)det subscriptΓ𝑑𝑎𝑡𝑎det subscriptΓ𝑑𝑒𝑠trsubscriptΓ𝑑𝑎𝑡𝑎superscriptsubscriptΓ𝑑𝑒𝑠1\displaystyle\log\frac{\text{det }\Gamma_{data}}{\text{det }\Gamma_{des}}=% \textit{tr}\log\left(\Gamma_{data}\Gamma_{des}^{-1}\right)roman_log divide start_ARG det roman_Γ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT end_ARG start_ARG det roman_Γ start_POSTSUBSCRIPT italic_d italic_e italic_s end_POSTSUBSCRIPT end_ARG = tr roman_log ( roman_Γ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT roman_Γ start_POSTSUBSCRIPT italic_d italic_e italic_s end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ) (15)

holds.777The function log\logroman_log to the left is the real-valued logarithmic function while log\logroman_log to the right is the logarithm of a matrix, in the sense that a matrix exponential of the log\logroman_log of a matrix equals the matrix itself.

Proof.

This is immediate from a result in linear algebra for positive definite matrices. We can write

logdet Γdatadet Γdesdet subscriptΓ𝑑𝑎𝑡𝑎det subscriptΓ𝑑𝑒𝑠\displaystyle\log\frac{\text{det }\Gamma_{data}}{\text{det }\Gamma_{des}}roman_log divide start_ARG det roman_Γ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT end_ARG start_ARG det roman_Γ start_POSTSUBSCRIPT italic_d italic_e italic_s end_POSTSUBSCRIPT end_ARG =logdet Γdatalogdet Γdes,absentdet subscriptΓ𝑑𝑎𝑡𝑎det subscriptΓ𝑑𝑒𝑠\displaystyle=\log\text{det }\Gamma_{data}-\log\text{det }\Gamma_{des},= roman_log det roman_Γ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT - roman_log det roman_Γ start_POSTSUBSCRIPT italic_d italic_e italic_s end_POSTSUBSCRIPT ,
=(1)logdet Γdata+logdet Γdes1,superscript1absentdet subscriptΓ𝑑𝑎𝑡𝑎det superscriptsubscriptΓ𝑑𝑒𝑠1\displaystyle=^{(1)}\log\text{det }\Gamma_{data}+\log\text{det }\Gamma_{des}^{% -1},= start_POSTSUPERSCRIPT ( 1 ) end_POSTSUPERSCRIPT roman_log det roman_Γ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT + roman_log det roman_Γ start_POSTSUBSCRIPT italic_d italic_e italic_s end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ,
=(2)trlogΓdata+trlogΓdes1,superscript2absenttrsubscriptΓ𝑑𝑎𝑡𝑎trsuperscriptsubscriptΓ𝑑𝑒𝑠1\displaystyle=^{(2)}\textit{tr}\log\Gamma_{data}+\textit{tr}\log\Gamma_{des}^{% -1},= start_POSTSUPERSCRIPT ( 2 ) end_POSTSUPERSCRIPT tr roman_log roman_Γ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT + tr roman_log roman_Γ start_POSTSUBSCRIPT italic_d italic_e italic_s end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ,
=(3)trlog(ΓdataΓdes1),superscript3absenttrsubscriptΓ𝑑𝑎𝑡𝑎superscriptsubscriptΓ𝑑𝑒𝑠1\displaystyle=^{(3)}\textit{tr}\log\left(\Gamma_{data}\Gamma_{des}^{-1}\right),= start_POSTSUPERSCRIPT ( 3 ) end_POSTSUPERSCRIPT tr roman_log ( roman_Γ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT roman_Γ start_POSTSUBSCRIPT italic_d italic_e italic_s end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ) ,

where (1),(2),(3)123(1),(2),(3)( 1 ) , ( 2 ) , ( 3 ) are implied by the positive definiteness of ΓdessubscriptΓ𝑑𝑒𝑠\Gamma_{des}roman_Γ start_POSTSUBSCRIPT italic_d italic_e italic_s end_POSTSUBSCRIPT and ΓdatasubscriptΓ𝑑𝑎𝑡𝑎\Gamma_{data}roman_Γ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT, and the identity logdet=trlog\log\text{det}\,\cdot=\textit{tr}\log\cdotroman_log det ⋅ = tr roman_log ⋅ for nonsingular matrices. ∎

Toward an approximation of (15) that is convex in ΓdessubscriptΓ𝑑𝑒𝑠\Gamma_{des}roman_Γ start_POSTSUBSCRIPT italic_d italic_e italic_s end_POSTSUBSCRIPT, suppose ΓdataΓdes1IsubscriptΓ𝑑𝑎𝑡𝑎superscriptsubscriptΓ𝑑𝑒𝑠1𝐼\Gamma_{data}\Gamma_{des}^{-1}\approx Iroman_Γ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT roman_Γ start_POSTSUBSCRIPT italic_d italic_e italic_s end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ≈ italic_I. That is, suppose ΓdataΓdes1=I+ΘsubscriptΓ𝑑𝑎𝑡𝑎superscriptsubscriptΓ𝑑𝑒𝑠1𝐼Θ\Gamma_{data}\Gamma_{des}^{-1}=I+\Thetaroman_Γ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT roman_Γ start_POSTSUBSCRIPT italic_d italic_e italic_s end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT = italic_I + roman_Θ for a small perturbation matrix ΘΘ\Thetaroman_Θ. The matrix logarithm can be written in terms of its Taylor expansion as

log(I+Θ)=Θ12Θ2+13Θ3+,𝐼ΘΘ12superscriptΘ213superscriptΘ3\displaystyle\log\left(I+\Theta\right)=\Theta-\frac{1}{2}\Theta^{2}+\frac{1}{3% }\Theta^{3}+\ldots,roman_log ( italic_I + roman_Θ ) = roman_Θ - divide start_ARG 1 end_ARG start_ARG 2 end_ARG roman_Θ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + divide start_ARG 1 end_ARG start_ARG 3 end_ARG roman_Θ start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT + … ,

which, for a first-order truncation, satisfies

log(ΓdataΓdes1)=log(I+Θ)subscriptΓ𝑑𝑎𝑡𝑎superscriptsubscriptΓ𝑑𝑒𝑠1𝐼Θ\displaystyle\log\left(\Gamma_{data}\Gamma_{des}^{-1}\right)=\log\left(I+% \Theta\right)roman_log ( roman_Γ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT roman_Γ start_POSTSUBSCRIPT italic_d italic_e italic_s end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ) = roman_log ( italic_I + roman_Θ ) Θ=ΓdataΓdes1I.absentΘsubscriptΓ𝑑𝑎𝑡𝑎superscriptsubscriptΓ𝑑𝑒𝑠1𝐼\displaystyle\approx\Theta=\Gamma_{data}\Gamma_{des}^{-1}-I.≈ roman_Θ = roman_Γ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT roman_Γ start_POSTSUBSCRIPT italic_d italic_e italic_s end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT - italic_I . (16)

Substituting the approximation (16), after dropping the identity (additive constant), in (15), we can use a surrogate version of d¯KLsubscript¯𝑑𝐾𝐿\bar{d}_{KL}over¯ start_ARG italic_d end_ARG start_POSTSUBSCRIPT italic_K italic_L end_POSTSUBSCRIPT, call it F𝐹Fitalic_F, as a regularization term. For a fixed ΓdatasubscriptΓ𝑑𝑎𝑡𝑎\Gamma_{data}roman_Γ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT,

F(Γdes)=tr(Γdata1Γdes+ΓdataΓdes1),𝐹subscriptΓ𝑑𝑒𝑠trsuperscriptsubscriptΓ𝑑𝑎𝑡𝑎1subscriptΓ𝑑𝑒𝑠subscriptΓ𝑑𝑎𝑡𝑎superscriptsubscriptΓ𝑑𝑒𝑠1\displaystyle F(\Gamma_{des})=\textit{tr}\left(\Gamma_{data}^{-1}\Gamma_{des}+% \Gamma_{data}\Gamma_{des}^{-1}\right),italic_F ( roman_Γ start_POSTSUBSCRIPT italic_d italic_e italic_s end_POSTSUBSCRIPT ) = tr ( roman_Γ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT roman_Γ start_POSTSUBSCRIPT italic_d italic_e italic_s end_POSTSUBSCRIPT + roman_Γ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT roman_Γ start_POSTSUBSCRIPT italic_d italic_e italic_s end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ) , (17)

where the 1/2121/21 / 2 factor in d¯KLsubscript¯𝑑𝐾𝐿\bar{d}_{KL}over¯ start_ARG italic_d end_ARG start_POSTSUBSCRIPT italic_K italic_L end_POSTSUBSCRIPT is dropped and will be replaced by a hyperparameter γ>0𝛾0\gamma>0italic_γ > 0. The new version of the cost (8), with F𝐹Fitalic_F as a regularization term and γ>0𝛾0\gamma>0italic_γ > 0 a user-defined hyperparameter, is now designated by

JFsubscript𝐽𝐹\displaystyle J_{F}italic_J start_POSTSUBSCRIPT italic_F end_POSTSUBSCRIPT =tr(QΣ)+tr(RKΣK)+γF.absenttr𝑄Σtr𝑅𝐾Σsuperscript𝐾top𝛾𝐹\displaystyle=\textit{tr}\left(Q\Sigma\right)+\textit{tr}\left(RK\Sigma K^{% \top}\right)+\gamma F.= tr ( italic_Q roman_Σ ) + tr ( italic_R italic_K roman_Σ italic_K start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ) + italic_γ italic_F . (18)

Next we show that the regularization term F𝐹Fitalic_F, although an approximation of the KL divergence, has favorable properties for our purposes.

4.3 Properties of F𝐹Fitalic_F

We show that Γdes=ΓdatasubscriptΓ𝑑𝑒𝑠subscriptΓ𝑑𝑎𝑡𝑎\Gamma_{des}=\Gamma_{data}roman_Γ start_POSTSUBSCRIPT italic_d italic_e italic_s end_POSTSUBSCRIPT = roman_Γ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT is the unique minimizer of the regularization term F𝐹Fitalic_F, or equivalently of JFsubscript𝐽𝐹J_{F}italic_J start_POSTSUBSCRIPT italic_F end_POSTSUBSCRIPT as γ𝛾\gamma\to\inftyitalic_γ → ∞. That is, the regularization term enforces conforming to the data distribution, even though F𝐹Fitalic_F is not the KL divergence, after adopting the approximation (16). We take as given the arithmetic and geometric means inequality, which we state as the following lemma.

Lemma 3.

Let λ>0𝜆0\lambda>0italic_λ > 0. Then λ+1/λ2𝜆1𝜆2\lambda+1/\lambda\geq 2italic_λ + 1 / italic_λ ≥ 2, with equality if and only if λ=1𝜆1\lambda=1italic_λ = 1. ∎

From this, we conclude the subsequent lemma about positive definite matrices.

Lemma 4.

Let E𝐸Eitalic_E be an n×n𝑛𝑛n\times nitalic_n × italic_n symmetric positive definite matrix. Then E=𝕀𝐸𝕀E=\mathbb{I}italic_E = blackboard_I is the unique minimizer of F¯(E)=tr(E+E1)¯𝐹𝐸tr𝐸superscript𝐸1\bar{F}(E)=\textit{tr}\left(E+E^{-1}\right)over¯ start_ARG italic_F end_ARG ( italic_E ) = tr ( italic_E + italic_E start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ).

Proof.

Since E𝐸Eitalic_E is symmetric positive definite, we may write the unitary diagonalization E=ΨΛΨT𝐸ΨΛsuperscriptΨ𝑇E=\Psi\Lambda\Psi^{T}italic_E = roman_Ψ roman_Λ roman_Ψ start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT, where Λ0succeedsΛ0\Lambda\succ 0roman_Λ ≻ 0 is diagonal and ΨΨ\Psiroman_Ψ is unitary. Therefore, by algebraic identities F¯(E)=tr(Ψ(Λ+Λ1)ΨT)=tr((Λ+Λ1)ΨTΨ)=tr(Λ+Λ1)¯𝐹𝐸trΨΛsuperscriptΛ1superscriptΨ𝑇trΛsuperscriptΛ1superscriptΨ𝑇ΨtrΛsuperscriptΛ1\bar{F}(E)=\textit{tr}\left(\Psi\left(\Lambda+\Lambda^{-1}\right)\Psi^{T}% \right)=\textit{tr}\left(\left(\Lambda+\Lambda^{-1}\right)\Psi^{T}\Psi\right)=% \textit{tr}\left(\Lambda+\Lambda^{-1}\right)over¯ start_ARG italic_F end_ARG ( italic_E ) = tr ( roman_Ψ ( roman_Λ + roman_Λ start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ) roman_Ψ start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT ) = tr ( ( roman_Λ + roman_Λ start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ) roman_Ψ start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT roman_Ψ ) = tr ( roman_Λ + roman_Λ start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ). That is,

F¯(E)=tr(Λ+Λ1)=in(λi+1λi),¯𝐹𝐸trΛsuperscriptΛ1superscriptsubscript𝑖𝑛subscript𝜆𝑖1subscript𝜆𝑖\displaystyle\bar{F}(E)=\textit{tr}\left(\Lambda+\Lambda^{-1}\right)=\sum_{i}^% {n}\left(\lambda_{i}+\frac{1}{\lambda_{i}}\right),over¯ start_ARG italic_F end_ARG ( italic_E ) = tr ( roman_Λ + roman_Λ start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ) = ∑ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT ( italic_λ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT + divide start_ARG 1 end_ARG start_ARG italic_λ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG ) ,

where λi>0,i=1,,n,formulae-sequencesubscript𝜆𝑖0𝑖1𝑛\lambda_{i}>0,\,i=1,\ldots,n,italic_λ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT > 0 , italic_i = 1 , … , italic_n , are the eigenvalues of E𝐸Eitalic_E. The unique minimum happens when each λi=1subscript𝜆𝑖1\lambda_{i}=1italic_λ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = 1, per Lemma 3. That is, when E=Λ=𝕀𝐸Λ𝕀E=\Lambda=\mathbb{I}italic_E = roman_Λ = blackboard_I (𝕀𝕀\mathbb{I}blackboard_I is the only positive definite matrix of eigenvalues of 1111), it is the unique minimizer of F¯¯𝐹\bar{F}over¯ start_ARG italic_F end_ARG. ∎

Theorem 1.

Γdes=ΓdatasubscriptΓ𝑑𝑒𝑠subscriptΓ𝑑𝑎𝑡𝑎\Gamma_{des}=\Gamma_{data}roman_Γ start_POSTSUBSCRIPT italic_d italic_e italic_s end_POSTSUBSCRIPT = roman_Γ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT is the global minimizer of

F(Γdes)=tr(Γdata1Γdes+ΓdataΓdes1).𝐹subscriptΓ𝑑𝑒𝑠trsuperscriptsubscriptΓ𝑑𝑎𝑡𝑎1subscriptΓ𝑑𝑒𝑠subscriptΓ𝑑𝑎𝑡𝑎superscriptsubscriptΓ𝑑𝑒𝑠1\displaystyle F(\Gamma_{des})=\textit{tr}\left(\Gamma_{data}^{-1}\Gamma_{des}+% \Gamma_{data}\Gamma_{des}^{-1}\right).italic_F ( roman_Γ start_POSTSUBSCRIPT italic_d italic_e italic_s end_POSTSUBSCRIPT ) = tr ( roman_Γ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT roman_Γ start_POSTSUBSCRIPT italic_d italic_e italic_s end_POSTSUBSCRIPT + roman_Γ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT roman_Γ start_POSTSUBSCRIPT italic_d italic_e italic_s end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ) .
Proof.

By the linearity and the cyclic property of the trace, we can rewrite F𝐹Fitalic_F as follows:

F(Γdes)=tr(Γdata12ΓdesΓdata12+Γdata12Γdes1Γdata12).𝐹subscriptΓ𝑑𝑒𝑠trsuperscriptsubscriptΓ𝑑𝑎𝑡𝑎12subscriptΓ𝑑𝑒𝑠superscriptsubscriptΓ𝑑𝑎𝑡𝑎12superscriptsubscriptΓ𝑑𝑎𝑡𝑎12superscriptsubscriptΓ𝑑𝑒𝑠1superscriptsubscriptΓ𝑑𝑎𝑡𝑎12\displaystyle F(\Gamma_{des})=\textit{tr}\left(\Gamma_{data}^{-\frac{1}{2}}% \Gamma_{des}\Gamma_{data}^{-\frac{1}{2}}+\Gamma_{data}^{\frac{1}{2}}\Gamma_{% des}^{-1}\Gamma_{data}^{\frac{1}{2}}\right).italic_F ( roman_Γ start_POSTSUBSCRIPT italic_d italic_e italic_s end_POSTSUBSCRIPT ) = tr ( roman_Γ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - divide start_ARG 1 end_ARG start_ARG 2 end_ARG end_POSTSUPERSCRIPT roman_Γ start_POSTSUBSCRIPT italic_d italic_e italic_s end_POSTSUBSCRIPT roman_Γ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - divide start_ARG 1 end_ARG start_ARG 2 end_ARG end_POSTSUPERSCRIPT + roman_Γ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT start_POSTSUPERSCRIPT divide start_ARG 1 end_ARG start_ARG 2 end_ARG end_POSTSUPERSCRIPT roman_Γ start_POSTSUBSCRIPT italic_d italic_e italic_s end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT roman_Γ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT start_POSTSUPERSCRIPT divide start_ARG 1 end_ARG start_ARG 2 end_ARG end_POSTSUPERSCRIPT ) .

If we let E=Γdata12ΣdesΣdata12𝐸superscriptsubscriptΓ𝑑𝑎𝑡𝑎12subscriptΣ𝑑𝑒𝑠superscriptsubscriptΣ𝑑𝑎𝑡𝑎12E=\Gamma_{data}^{-\frac{1}{2}}\Sigma_{des}\Sigma_{data}^{-\frac{1}{2}}italic_E = roman_Γ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - divide start_ARG 1 end_ARG start_ARG 2 end_ARG end_POSTSUPERSCRIPT roman_Σ start_POSTSUBSCRIPT italic_d italic_e italic_s end_POSTSUBSCRIPT roman_Σ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - divide start_ARG 1 end_ARG start_ARG 2 end_ARG end_POSTSUPERSCRIPT, then we have

F¯(E)=tr(E+E1).¯𝐹𝐸tr𝐸superscript𝐸1\displaystyle\bar{F}(E)=\textit{tr}\left(E+E^{-1}\right).over¯ start_ARG italic_F end_ARG ( italic_E ) = tr ( italic_E + italic_E start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ) .

Showing that Γdes=ΓdatasubscriptΓ𝑑𝑒𝑠subscriptΓ𝑑𝑎𝑡𝑎\Gamma_{des}=\Gamma_{data}roman_Γ start_POSTSUBSCRIPT italic_d italic_e italic_s end_POSTSUBSCRIPT = roman_Γ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT is the unique global minimizer of F𝐹Fitalic_F equates to showing that E=𝕀𝐸𝕀E=\mathbb{I}italic_E = blackboard_I is the unique global minimizer of F¯¯𝐹\bar{F}over¯ start_ARG italic_F end_ARG, which is true by Lemma 4. ∎

Next we show that the regularization term F𝐹Fitalic_F not only has a unique global minimizer that we seek to enforce, that is, Γdes=ΓdatasubscriptΓ𝑑𝑒𝑠subscriptΓ𝑑𝑎𝑡𝑎\Gamma_{des}=\Gamma_{data}roman_Γ start_POSTSUBSCRIPT italic_d italic_e italic_s end_POSTSUBSCRIPT = roman_Γ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT, but is also computationally appealing, convex in particular.

Theorem 2.

F𝐹Fitalic_F is convex in ΓdessubscriptΓ𝑑𝑒𝑠\Gamma_{des}roman_Γ start_POSTSUBSCRIPT italic_d italic_e italic_s end_POSTSUBSCRIPT.

Proof.

The domain—the set of all positive definite matrices—is convex. The first term of F(Γdes)𝐹subscriptΓ𝑑𝑒𝑠F(\Gamma_{des})italic_F ( roman_Γ start_POSTSUBSCRIPT italic_d italic_e italic_s end_POSTSUBSCRIPT ) is linear in ΓdessubscriptΓ𝑑𝑒𝑠\Gamma_{des}roman_Γ start_POSTSUBSCRIPT italic_d italic_e italic_s end_POSTSUBSCRIPT, while the second term

tr(ΓdataΓdes1)trsubscriptΓ𝑑𝑎𝑡𝑎superscriptsubscriptΓ𝑑𝑒𝑠1\displaystyle\textit{tr}\left(\Gamma_{data}\Gamma_{des}^{-1}\right)tr ( roman_Γ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT roman_Γ start_POSTSUBSCRIPT italic_d italic_e italic_s end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ) =tr(Γdata12Γdes1Γdata12)=iξiTΓdes1ξi,absenttrsuperscriptsubscriptΓ𝑑𝑎𝑡𝑎12superscriptsubscriptΓ𝑑𝑒𝑠1superscriptsubscriptΓ𝑑𝑎𝑡𝑎12subscript𝑖superscriptsubscript𝜉𝑖𝑇superscriptsubscriptΓ𝑑𝑒𝑠1subscript𝜉𝑖\displaystyle=\textit{tr}\left(\Gamma_{data}^{\frac{1}{2}}\Gamma_{des}^{-1}% \Gamma_{data}^{\frac{1}{2}}\right)=\sum_{i}\xi_{i}^{T}\Gamma_{des}^{-1}\xi_{i},= tr ( roman_Γ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT start_POSTSUPERSCRIPT divide start_ARG 1 end_ARG start_ARG 2 end_ARG end_POSTSUPERSCRIPT roman_Γ start_POSTSUBSCRIPT italic_d italic_e italic_s end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT roman_Γ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT start_POSTSUPERSCRIPT divide start_ARG 1 end_ARG start_ARG 2 end_ARG end_POSTSUPERSCRIPT ) = ∑ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_ξ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT roman_Γ start_POSTSUBSCRIPT italic_d italic_e italic_s end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_ξ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ,

where ξisubscript𝜉𝑖\xi_{i}italic_ξ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT is the i𝑖iitalic_ith column of Γdata12superscriptsubscriptΓ𝑑𝑎𝑡𝑎12\Gamma_{data}^{\frac{1}{2}}roman_Γ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT start_POSTSUPERSCRIPT divide start_ARG 1 end_ARG start_ARG 2 end_ARG end_POSTSUPERSCRIPT. Each entry in the sum is convex in ΓdessubscriptΓ𝑑𝑒𝑠\Gamma_{des}roman_Γ start_POSTSUBSCRIPT italic_d italic_e italic_s end_POSTSUBSCRIPT, as shown in [11, p. 76]. ∎

Notice that 12F12𝐹\frac{1}{2}Fdivide start_ARG 1 end_ARG start_ARG 2 end_ARG italic_F is the Jeffreys divergence, a discrepancy metric between distributions [30] (Gaussians in this case). The above derivations highlight its relation to the KL divergence, which is more familiar in the exploration vs exploitation context.

The above results are concerned with the regularization term as a function of ΓdessubscriptΓ𝑑𝑒𝑠\Gamma_{des}roman_Γ start_POSTSUBSCRIPT italic_d italic_e italic_s end_POSTSUBSCRIPT. However, the original LQR problem, whether Problem 2 or 3, has ΣΣ\Sigmaroman_Σ and L𝐿Litalic_L (L=KΣ𝐿𝐾ΣL=K\Sigmaitalic_L = italic_K roman_Σ) as decision variables. We show next that, under some relaxation, F𝐹Fitalic_F can be related back to the original decision variables affinely in the cost together with LMI constraints.

4.4 Representing F𝐹Fitalic_F in terms of ΣΣ\Sigmaroman_Σ and L𝐿Litalic_L

The first term of F𝐹Fitalic_F, tr(Γdata1Γdes)trsuperscriptsubscriptΓ𝑑𝑎𝑡𝑎1subscriptΓ𝑑𝑒𝑠\textit{tr}\left(\Gamma_{data}^{-1}\Gamma_{des}\right)tr ( roman_Γ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT roman_Γ start_POSTSUBSCRIPT italic_d italic_e italic_s end_POSTSUBSCRIPT ), can be upper-bounded by tr(Γdata1Z1)trsuperscriptsubscriptΓ𝑑𝑎𝑡𝑎1subscript𝑍1\textit{tr}\left(\Gamma_{data}^{-1}Z_{1}\right)tr ( roman_Γ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_Z start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ), where Z1Γdessucceeds-or-equalssubscript𝑍1subscriptΓ𝑑𝑒𝑠Z_{1}\succeq\Gamma_{des}italic_Z start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ⪰ roman_Γ start_POSTSUBSCRIPT italic_d italic_e italic_s end_POSTSUBSCRIPT, or equivalently, committing to the change of variables L=KΣ𝐿𝐾ΣL=K\Sigmaitalic_L = italic_K roman_Σ,

Z1Γdessucceeds-or-equalssubscript𝑍1subscriptΓ𝑑𝑒𝑠\displaystyle Z_{1}\succeq\Gamma_{des}italic_Z start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ⪰ roman_Γ start_POSTSUBSCRIPT italic_d italic_e italic_s end_POSTSUBSCRIPT =[ΣΣKKΣKΣK+V],absentmatrixΣΣsuperscript𝐾top𝐾Σ𝐾Σsuperscript𝐾top𝑉\displaystyle=\begin{bmatrix}\Sigma&\Sigma K^{\top}\\ K\Sigma&K\Sigma K^{\top}+V\end{bmatrix},= [ start_ARG start_ROW start_CELL roman_Σ end_CELL start_CELL roman_Σ italic_K start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT end_CELL end_ROW start_ROW start_CELL italic_K roman_Σ end_CELL start_CELL italic_K roman_Σ italic_K start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT + italic_V end_CELL end_ROW end_ARG ] ,
=[ΣL]Σ1[ΣL]+𝒱,absentmatrixΣ𝐿superscriptΣ1matrixΣsuperscript𝐿top𝒱\displaystyle=\begin{bmatrix}\Sigma\\ L\end{bmatrix}\Sigma^{-1}\begin{bmatrix}\Sigma&L^{\top}\end{bmatrix}+\mathcal{% V},= [ start_ARG start_ROW start_CELL roman_Σ end_CELL end_ROW start_ROW start_CELL italic_L end_CELL end_ROW end_ARG ] roman_Σ start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT [ start_ARG start_ROW start_CELL roman_Σ end_CELL start_CELL italic_L start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT end_CELL end_ROW end_ARG ] + caligraphic_V ,

which can be equivalently described by the LMI

[Z1𝒱[ΣL][ΣL]Σ]0,succeeds-or-equalsmatrixsubscript𝑍1𝒱matrixΣ𝐿matrixΣsuperscript𝐿topΣ0\displaystyle\begin{bmatrix}Z_{1}-\mathcal{V}&\begin{bmatrix}\Sigma\\ L\end{bmatrix}\\ \begin{bmatrix}\Sigma&L^{\top}\end{bmatrix}&\Sigma\end{bmatrix}\succeq 0,[ start_ARG start_ROW start_CELL italic_Z start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT - caligraphic_V end_CELL start_CELL [ start_ARG start_ROW start_CELL roman_Σ end_CELL end_ROW start_ROW start_CELL italic_L end_CELL end_ROW end_ARG ] end_CELL end_ROW start_ROW start_CELL [ start_ARG start_ROW start_CELL roman_Σ end_CELL start_CELL italic_L start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT end_CELL end_ROW end_ARG ] end_CELL start_CELL roman_Σ end_CELL end_ROW end_ARG ] ⪰ 0 , (19)

where 𝒱=block-diag(0rx×rx,V)𝒱block-diagsubscript0subscript𝑟𝑥subscript𝑟𝑥𝑉\mathcal{V}=\text{block-diag}(0_{r_{x}\times r_{x}},V)caligraphic_V = block-diag ( 0 start_POSTSUBSCRIPT italic_r start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT × italic_r start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT end_POSTSUBSCRIPT , italic_V ).

The second term is more involved since it contains the inverse of ΓdessubscriptΓ𝑑𝑒𝑠\Gamma_{des}roman_Γ start_POSTSUBSCRIPT italic_d italic_e italic_s end_POSTSUBSCRIPT. Applying the inverse of a partitioned matrix [24], we have

Γdes1=[Σ1+KV1KKV1V1KV1],superscriptsubscriptΓ𝑑𝑒𝑠1matrixsuperscriptΣ1superscript𝐾topsuperscript𝑉1𝐾superscript𝐾topsuperscript𝑉1superscript𝑉1𝐾superscript𝑉1\displaystyle\Gamma_{des}^{-1}=\begin{bmatrix}\Sigma^{-1}+K^{\top}V^{-1}K&-K^{% \top}V^{-1}\\ -V^{-1}K&V^{-1}\end{bmatrix},roman_Γ start_POSTSUBSCRIPT italic_d italic_e italic_s end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT = [ start_ARG start_ROW start_CELL roman_Σ start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT + italic_K start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_V start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_K end_CELL start_CELL - italic_K start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_V start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT end_CELL end_ROW start_ROW start_CELL - italic_V start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_K end_CELL start_CELL italic_V start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT end_CELL end_ROW end_ARG ] ,

which exists and is positive definite, since Γdes0succeedssubscriptΓ𝑑𝑒𝑠0\Gamma_{des}\succ 0roman_Γ start_POSTSUBSCRIPT italic_d italic_e italic_s end_POSTSUBSCRIPT ≻ 0. Hence, the second term of F𝐹Fitalic_F

tr(ΓdataΓdes1)trsubscriptΓ𝑑𝑎𝑡𝑎superscriptsubscriptΓ𝑑𝑒𝑠1\displaystyle\textit{tr}\left(\Gamma_{data}\Gamma_{des}^{-1}\right)tr ( roman_Γ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT roman_Γ start_POSTSUBSCRIPT italic_d italic_e italic_s end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ) =tr(ΣdataΣ1+ΣdataKV1K)+absentlimit-fromtrsubscriptΣ𝑑𝑎𝑡𝑎superscriptΣ1subscriptΣ𝑑𝑎𝑡𝑎superscript𝐾topsuperscript𝑉1𝐾\displaystyle=\textit{tr}\left(\Sigma_{data}\Sigma^{-1}+\Sigma_{data}K^{\top}V% ^{-1}K\right)+= tr ( roman_Σ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT roman_Σ start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT + roman_Σ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT italic_K start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_V start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_K ) +
tr(MdataV12HdataKV1),trsubscript𝑀𝑑𝑎𝑡𝑎superscript𝑉12subscriptsuperscript𝐻top𝑑𝑎𝑡𝑎superscript𝐾topsuperscript𝑉1\displaystyle\hskip-28.45274pt\textit{tr}\left(M_{data}V^{-1}-2H^{\top}_{data}% K^{\top}V^{-1}\right),tr ( italic_M start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT italic_V start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT - 2 italic_H start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT italic_K start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_V start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ) ,
=tr(ΣdataΣ1)+tr(V1KΣdataK)+absenttrsubscriptΣ𝑑𝑎𝑡𝑎superscriptΣ1limit-fromtrsuperscript𝑉1𝐾subscriptΣ𝑑𝑎𝑡𝑎superscript𝐾top\displaystyle=\textit{tr}\left(\Sigma_{data}\Sigma^{-1}\right)+\textit{tr}% \left(V^{-1}K\Sigma_{data}K^{\top}\right)+= tr ( roman_Σ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT roman_Σ start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ) + tr ( italic_V start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_K roman_Σ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT italic_K start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ) +
tr(MdataV1)+tr(2KV1Hdata).trsubscript𝑀𝑑𝑎𝑡𝑎superscript𝑉1tr2superscript𝐾topsuperscript𝑉1subscriptsuperscript𝐻top𝑑𝑎𝑡𝑎\displaystyle\hskip-28.45274pt\textit{tr}\left(M_{data}V^{-1}\right)+\textit{% tr}\left(-2K^{\top}V^{-1}H^{\top}_{data}\right).tr ( italic_M start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT italic_V start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ) + tr ( - 2 italic_K start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_V start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_H start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT ) . (20)
Lemma 5.

We can rewrite the following term as

tr(ΣdataKV1K2KV1Hdata)=trsubscriptΣ𝑑𝑎𝑡𝑎superscript𝐾topsuperscript𝑉1𝐾2superscript𝐾topsuperscript𝑉1subscriptsuperscript𝐻top𝑑𝑎𝑡𝑎absent\displaystyle\textit{tr}\left(\Sigma_{data}K^{\top}V^{-1}K-2K^{\top}V^{-1}H^{% \top}_{data}\right)=tr ( roman_Σ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT italic_K start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_V start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_K - 2 italic_K start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_V start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_H start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT ) = (21)
tr(V1[KΣHdataΣdata1Σ]Σ1ΣdataΣ1×\displaystyle\textit{tr}\Big{(}V^{-1}\left[K\Sigma-H_{data}^{\top}\Sigma_{data% }^{-1}\Sigma\right]\Sigma^{-1}\Sigma_{data}\Sigma^{-1}\timestr ( italic_V start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT [ italic_K roman_Σ - italic_H start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT roman_Σ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT roman_Σ ] roman_Σ start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT roman_Σ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT roman_Σ start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ×
[KΣHdataΣdata1Σ]V1HdataΣdata1Hdata).\displaystyle\hskip-28.45274pt\left[K\Sigma-H_{data}^{\top}\Sigma_{data}^{-1}% \Sigma\right]^{\top}-V^{-1}H_{data}^{\top}\Sigma_{data}^{-1}H_{data}\Big{)}.[ italic_K roman_Σ - italic_H start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT roman_Σ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT roman_Σ ] start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT - italic_V start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_H start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT roman_Σ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_H start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT ) .
Proof.

Using the linearity and the cyclic property of the trace, we reduce the right-hand side to the left one. ∎

After dropping the additive constants in ΣΣ\Sigmaroman_Σ and K𝐾Kitalic_K, from (4.4) and (LABEL:eq:_completing_the_squares), and toward forming LMIs, we relax the right-hand side of (LABEL:eq:_completing_the_squares) by assuming ΣΣdataΣsubscriptΣ𝑑𝑎𝑡𝑎\Sigma\approx\Sigma_{data}roman_Σ ≈ roman_Σ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT; thus, Σ1ΣdataΣ1Σ1superscriptΣ1subscriptΣ𝑑𝑎𝑡𝑎superscriptΣ1superscriptΣ1\Sigma^{-1}\Sigma_{data}\Sigma^{-1}\approx\Sigma^{-1}roman_Σ start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT roman_Σ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT roman_Σ start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ≈ roman_Σ start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT. We use the extra variable Z2subscript𝑍2Z_{2}italic_Z start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT such that

Z2[KΣHdataΣdata1Σ]Σ1[KΣHdataΣdata1Σ],succeeds-or-equalssubscript𝑍2delimited-[]𝐾Σsuperscriptsubscript𝐻𝑑𝑎𝑡𝑎topsuperscriptsubscriptΣ𝑑𝑎𝑡𝑎1ΣsuperscriptΣ1superscriptdelimited-[]𝐾Σsuperscriptsubscript𝐻𝑑𝑎𝑡𝑎topsuperscriptsubscriptΣ𝑑𝑎𝑡𝑎1Σtop\displaystyle Z_{2}\succeq\left[K\Sigma-H_{data}^{\top}\Sigma_{data}^{-1}% \Sigma\right]\Sigma^{-1}\left[K\Sigma-H_{data}^{\top}\Sigma_{data}^{-1}\Sigma% \right]^{\top},italic_Z start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ⪰ [ italic_K roman_Σ - italic_H start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT roman_Σ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT roman_Σ ] roman_Σ start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT [ italic_K roman_Σ - italic_H start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT roman_Σ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT roman_Σ ] start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ,

which, equivalently, as an LMI, is

[Z2LHdataΣdata1Σ[LHdataΣdata1Σ]Σ]0.succeeds-or-equalsmatrixsubscript𝑍2𝐿superscriptsubscript𝐻𝑑𝑎𝑡𝑎topsuperscriptsubscriptΣ𝑑𝑎𝑡𝑎1Σsuperscriptdelimited-[]𝐿superscriptsubscript𝐻𝑑𝑎𝑡𝑎topsuperscriptsubscriptΣ𝑑𝑎𝑡𝑎1ΣtopΣ0\displaystyle\begin{bmatrix}Z_{2}&L-H_{data}^{\top}\Sigma_{data}^{-1}\Sigma\\ \left[L-H_{data}^{\top}\Sigma_{data}^{-1}\Sigma\right]^{\top}&\Sigma\end{% bmatrix}\succeq 0.[ start_ARG start_ROW start_CELL italic_Z start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_CELL start_CELL italic_L - italic_H start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT roman_Σ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT roman_Σ end_CELL end_ROW start_ROW start_CELL [ italic_L - italic_H start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT roman_Σ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT roman_Σ ] start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT end_CELL start_CELL roman_Σ end_CELL end_ROW end_ARG ] ⪰ 0 . (22)

We show empirically in Section 6 that, even after the above relaxation, the consistency with state-input joint distribution is still effectively enforced.

The term tr(ΣdataΣ1)trsubscriptΣ𝑑𝑎𝑡𝑎superscriptΣ1\textit{tr}\left(\Sigma_{data}\Sigma^{-1}\right)tr ( roman_Σ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT roman_Σ start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ) can also be described by an LMI by using an extra variable Z3Σ1succeeds-or-equalssubscript𝑍3superscriptΣ1Z_{3}\succeq\Sigma^{-1}italic_Z start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT ⪰ roman_Σ start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT. Hence,

[Z3IIΣ]0.succeeds-or-equalsmatrixsubscript𝑍3𝐼𝐼Σ0\displaystyle\begin{bmatrix}Z_{3}&I\\ I&\Sigma\end{bmatrix}\succeq 0.[ start_ARG start_ROW start_CELL italic_Z start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT end_CELL start_CELL italic_I end_CELL end_ROW start_ROW start_CELL italic_I end_CELL start_CELL roman_Σ end_CELL end_ROW end_ARG ] ⪰ 0 . (23)

4.5 The joint state-input data-conforming data-driven LQR

Toward a state-input data-conforming LQR control, we adjust Problem 3 by including the regularization term F𝐹Fitalic_F and the accompanying extra variables and LMIs defined in (19), (22), and (23).

Problem 6.

Data-conforming (jointly in the state-input) data-driven LQR:

minΣ,L,Z0,Z1,Z2,Z3tr(QΣ)+tr(RZ0)+subscriptΣ𝐿subscript𝑍0subscript𝑍1subscript𝑍2subscript𝑍3tr𝑄Σlimit-fromtr𝑅subscript𝑍0\displaystyle\min_{\Sigma,\,L,\,Z_{0},\,Z_{1},\,Z_{2},\,Z_{3}}\textit{tr}\left% (Q\Sigma\right)+\textit{tr}\left(RZ_{0}\right)+roman_min start_POSTSUBSCRIPT roman_Σ , italic_L , italic_Z start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_Z start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_Z start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_Z start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT end_POSTSUBSCRIPT tr ( italic_Q roman_Σ ) + tr ( italic_R italic_Z start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) +
γ{tr(Γdata1Z1)+tr(V1Z2)+tr(ΣdataZ3)}𝛾trsuperscriptsubscriptΓ𝑑𝑎𝑡𝑎1subscript𝑍1trsuperscript𝑉1subscript𝑍2trsubscriptΣ𝑑𝑎𝑡𝑎subscript𝑍3\displaystyle\hskip 14.22636pt\gamma\Big{\{}\textit{tr}\left(\Gamma_{data}^{-1% }Z_{1}\right)+\textit{tr}\left(V^{-1}Z_{2}\right)+\textit{tr}\left(\Sigma_{% data}Z_{3}\right)\Big{\}}italic_γ { tr ( roman_Γ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_Z start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ) + tr ( italic_V start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_Z start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) + tr ( roman_Σ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT italic_Z start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT ) }
s.t. Σ0,[Z0LLΣ]0,formulae-sequencesucceedss.t. Σ0succeeds-or-equalsmatrixsubscript𝑍0𝐿superscript𝐿topΣ0\displaystyle\text{s.t. }\Sigma\succ 0,\quad\begin{bmatrix}Z_{0}&L\\ L^{\top}&\Sigma\end{bmatrix}\succeq 0,s.t. roman_Σ ≻ 0 , [ start_ARG start_ROW start_CELL italic_Z start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT end_CELL start_CELL italic_L end_CELL end_ROW start_ROW start_CELL italic_L start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT end_CELL start_CELL roman_Σ end_CELL end_ROW end_ARG ] ⪰ 0 ,
[ΣW^B^VB^A^Σ+B^LΣA^+LB^Σ]0,succeeds-or-equalsmatrixΣ^𝑊^𝐵𝑉superscript^𝐵top^𝐴Σ^𝐵𝐿Σsuperscript^𝐴topsuperscript𝐿topsuperscript^𝐵topΣ0\displaystyle\begin{bmatrix}\Sigma-\widehat{W}-\widehat{B}V\widehat{B}^{\top}&% \widehat{A}\Sigma+\widehat{B}L\\ \Sigma\widehat{A}^{\top}+L^{\top}\widehat{B}^{\top}&\Sigma\end{bmatrix}\succeq 0,[ start_ARG start_ROW start_CELL roman_Σ - over^ start_ARG italic_W end_ARG - over^ start_ARG italic_B end_ARG italic_V over^ start_ARG italic_B end_ARG start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT end_CELL start_CELL over^ start_ARG italic_A end_ARG roman_Σ + over^ start_ARG italic_B end_ARG italic_L end_CELL end_ROW start_ROW start_CELL roman_Σ over^ start_ARG italic_A end_ARG start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT + italic_L start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT over^ start_ARG italic_B end_ARG start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT end_CELL start_CELL roman_Σ end_CELL end_ROW end_ARG ] ⪰ 0 ,
[Z1𝒱[ΣL][ΣL]Σ]0,[Z3IIΣ]0,formulae-sequencesucceeds-or-equalsmatrixsubscript𝑍1𝒱matrixΣ𝐿matrixΣsuperscript𝐿topΣ0succeeds-or-equalsmatrixsubscript𝑍3𝐼𝐼Σ0\displaystyle\begin{bmatrix}Z_{1}-\mathcal{V}&\begin{bmatrix}\Sigma\\ L\end{bmatrix}\\ \begin{bmatrix}\Sigma&L^{\top}\end{bmatrix}&\Sigma\end{bmatrix}\succeq 0,\quad% \begin{bmatrix}Z_{3}&I\\ I&\Sigma\end{bmatrix}\succeq 0,[ start_ARG start_ROW start_CELL italic_Z start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT - caligraphic_V end_CELL start_CELL [ start_ARG start_ROW start_CELL roman_Σ end_CELL end_ROW start_ROW start_CELL italic_L end_CELL end_ROW end_ARG ] end_CELL end_ROW start_ROW start_CELL [ start_ARG start_ROW start_CELL roman_Σ end_CELL start_CELL italic_L start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT end_CELL end_ROW end_ARG ] end_CELL start_CELL roman_Σ end_CELL end_ROW end_ARG ] ⪰ 0 , [ start_ARG start_ROW start_CELL italic_Z start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT end_CELL start_CELL italic_I end_CELL end_ROW start_ROW start_CELL italic_I end_CELL start_CELL roman_Σ end_CELL end_ROW end_ARG ] ⪰ 0 ,
[Z2LHdataΣdata1Σ[LHdataΣdata1Σ]Σ]0,succeeds-or-equalsmatrixsubscript𝑍2𝐿superscriptsubscript𝐻𝑑𝑎𝑡𝑎topsuperscriptsubscriptΣ𝑑𝑎𝑡𝑎1Σsuperscriptdelimited-[]𝐿superscriptsubscript𝐻𝑑𝑎𝑡𝑎topsuperscriptsubscriptΣ𝑑𝑎𝑡𝑎1ΣtopΣ0\displaystyle\begin{bmatrix}Z_{2}&L-H_{data}^{\top}\Sigma_{data}^{-1}\Sigma\\ \left[L-H_{data}^{\top}\Sigma_{data}^{-1}\Sigma\right]^{\top}&\Sigma\end{% bmatrix}\succeq 0,[ start_ARG start_ROW start_CELL italic_Z start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_CELL start_CELL italic_L - italic_H start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT roman_Σ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT roman_Σ end_CELL end_ROW start_ROW start_CELL [ italic_L - italic_H start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT roman_Σ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT roman_Σ ] start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT end_CELL start_CELL roman_Σ end_CELL end_ROW end_ARG ] ⪰ 0 ,
A^,B^,W^ as in (11).^𝐴^𝐵^𝑊 as in (11)\displaystyle\widehat{A},\widehat{B},\widehat{W}\text{ as in \eqref{eq:linear % state-space ID}}.over^ start_ARG italic_A end_ARG , over^ start_ARG italic_B end_ARG , over^ start_ARG italic_W end_ARG as in ( ) .

The optimal control is recovered from the optimal values ΣsubscriptΣ\Sigma_{\star}roman_Σ start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT and Lsubscript𝐿L_{\star}italic_L start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT through K=LΣ1subscript𝐾subscript𝐿superscriptsubscriptΣ1K_{\star}=L_{\star}\Sigma_{\star}^{-1}italic_K start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT = italic_L start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT roman_Σ start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT.

Remark 1.

Notice that if the zero mean condition is to be relaxed in Assumption 2, the corresponding term with μdes,μdata0subscript𝜇𝑑𝑒𝑠subscript𝜇𝑑𝑎𝑡𝑎0\mu_{des},\mu_{data}\neq 0italic_μ start_POSTSUBSCRIPT italic_d italic_e italic_s end_POSTSUBSCRIPT , italic_μ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT ≠ 0 in (14) and the cost (2) are quadratic in the variables x¯¯𝑥\bar{x}over¯ start_ARG italic_x end_ARG and u¯¯𝑢\bar{u}over¯ start_ARG italic_u end_ARG, where μdes=[x¯,u¯]subscript𝜇𝑑𝑒𝑠superscriptsuperscript¯𝑥topsuperscript¯𝑢toptop\mu_{des}=\left[\bar{x}^{\top},\,\bar{u}^{\top}\right]^{\top}italic_μ start_POSTSUBSCRIPT italic_d italic_e italic_s end_POSTSUBSCRIPT = [ over¯ start_ARG italic_x end_ARG start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT , over¯ start_ARG italic_u end_ARG start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ] start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT, and subject to the linear constraint given by steady-state mean dynamics

x¯=(A+BK)x¯+Bu¯.¯𝑥𝐴𝐵𝐾¯𝑥𝐵¯𝑢\displaystyle\bar{x}=(A+BK)\bar{x}+B\bar{u}.over¯ start_ARG italic_x end_ARG = ( italic_A + italic_B italic_K ) over¯ start_ARG italic_x end_ARG + italic_B over¯ start_ARG italic_u end_ARG .

One therefore can solve Problem 6 to find the optimal gain K𝐾Kitalic_K and covariance ΣΣ\Sigmaroman_Σ and then, as a second stage, form a quadratic program in μdessubscript𝜇𝑑𝑒𝑠\mu_{des}italic_μ start_POSTSUBSCRIPT italic_d italic_e italic_s end_POSTSUBSCRIPT.

This is common in scenarios where the current operating point during the data collection μdata=[x¯data,u¯data]subscript𝜇𝑑𝑎𝑡𝑎superscriptsubscriptsuperscript¯𝑥top𝑑𝑎𝑡𝑎subscriptsuperscript¯𝑢top𝑑𝑎𝑡𝑎top\mu_{data}=\left[\bar{x}^{\top}_{data},\,\bar{u}^{\top}_{data}\right]^{\top}italic_μ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT = [ over¯ start_ARG italic_x end_ARG start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT , over¯ start_ARG italic_u end_ARG start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT ] start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT is different from the intended one. In such a case, we center the data and the system (1) around the latter and incorporate the quadratic program discussed above.

Remark 2.

As mentioned previously, the identification step, inside Problems 3, 4, 5, and 6, resembles a naive certainty equivalence approach [16]. One can enhance the robustness of these problems by incorporating a convex hull or a ball, according to some systems’ norm, centered around (A^,B^)^𝐴^𝐵(\widehat{A},\widehat{B})( over^ start_ARG italic_A end_ARG , over^ start_ARG italic_B end_ARG ), then search for a controller that stabilizes them all, while minimizing the cost and satisfying other constraints. This adjustment, say, to Problem 6, can be done through more LMI constraints [10] in the same decision variables. One also can, again, through LMI constraints, enforce robustness under state measurement noise (state is measured through ζksubscript𝜁𝑘\zeta_{k}italic_ζ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT, where ζk=xk+ηksubscript𝜁𝑘subscript𝑥𝑘subscript𝜂𝑘\zeta_{k}=x_{k}+\eta_{k}italic_ζ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT + italic_η start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT, for some white noise ηksubscript𝜂𝑘\eta_{k}italic_η start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT) as in [14] for when the covariance of the state evolution disturbance W0𝑊0W\neq 0italic_W ≠ 0, as in [38].

Remark 3.

The robustness results referred to in Remark 2 hold when the underlying true system (1) is linear. This motivates a new interpretation of our data-conforming framework.

Suppose data has been collected, a system has been identified, and a convex hull or a ball 𝔹𝔹\mathbb{B}blackboard_B has been constructed around this system’s estimate. When the new robust (e.g., according to [10]) controller is applied, new regions of the joint state-input space might be visited, due to the distributional shift, and new modes of nonlinearity activated. If the new system is to be identified and a new convex hull or ball to be constructed, they might be different from 𝔹𝔹\mathbb{B}blackboard_B used in the design process.

In other words, because of the nonlinearity of the underlying model in our case, the application of the new control law equates to the transformation of 𝔹𝔹\mathbb{B}blackboard_B. Hence, this invalidates the premise on which these robust control design approaches were built.

Our data-conforming framework helps in “damping” this transformation in 𝔹𝔹\mathbb{B}blackboard_B; and therefore, when incorporated to the data-driven robust design approaches [10, 38], it increases their validity via solidifying their premises.

5 On the optimality conditions of the proposed problems

In the proposed SDPs, we relaxed the Lyapunov equality (10) into an inequality such that we can represent it as an LMI in the decision variables. This relaxation is of no effect to the standard LQR SDP, as we will show shortly, but may have some consequences to the proposed data-conforming approaches.

Notice that Problems 3, 5 and 6, before the change of variables and before applying the equivalent LMI formulations, can be put in the following form (we use A,B,W𝐴𝐵𝑊A,B,Witalic_A , italic_B , italic_W instead of A^,B^,W^^𝐴^𝐵^𝑊\widehat{A},\widehat{B},\widehat{W}over^ start_ARG italic_A end_ARG , over^ start_ARG italic_B end_ARG , over^ start_ARG italic_W end_ARG for ease of notation)

minΣ,Ktr(QΣ)+tr(RKΣK)+Ξ(Σ,K)subscriptΣ𝐾tr𝑄Σtr𝑅𝐾Σsuperscript𝐾topΞΣ𝐾\displaystyle\min_{\Sigma,\,K}\textit{tr}\left(Q\Sigma\right)+\textit{tr}\left% (RK\Sigma K^{\top}\right)+\Xi\left(\Sigma,K\right)roman_min start_POSTSUBSCRIPT roman_Σ , italic_K end_POSTSUBSCRIPT tr ( italic_Q roman_Σ ) + tr ( italic_R italic_K roman_Σ italic_K start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ) + roman_Ξ ( roman_Σ , italic_K )
s.t.
Σ[A+BK]Σ[A+BK]+W+BVB,succeeds-or-equalsΣdelimited-[]𝐴𝐵𝐾Σsuperscriptdelimited-[]𝐴𝐵𝐾top𝑊𝐵𝑉superscript𝐵top\displaystyle\Sigma\succeq\left[A+BK\right]\Sigma\left[A+BK\right]^{\top}+W+% BVB^{\top},roman_Σ ⪰ [ italic_A + italic_B italic_K ] roman_Σ [ italic_A + italic_B italic_K ] start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT + italic_W + italic_B italic_V italic_B start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT , (24)

where Ξ(Σ,K)=0ΞΣ𝐾0\Xi\left(\Sigma,K\right)=0roman_Ξ ( roman_Σ , italic_K ) = 0 in Problem 3,

Ξ(Σ,K)=γtr([ΣΣdata][ΣΣdata])ΞΣ𝐾superscript𝛾trdelimited-[]ΣsubscriptΣ𝑑𝑎𝑡𝑎superscriptdelimited-[]ΣsubscriptΣ𝑑𝑎𝑡𝑎top\Xi\left(\Sigma,K\right)=\gamma^{\prime}\textit{tr}\left(\left[\Sigma-\Sigma_{% data}\right]\left[\Sigma-\Sigma_{data}\right]^{\top}\right)roman_Ξ ( roman_Σ , italic_K ) = italic_γ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT tr ( [ roman_Σ - roman_Σ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT ] [ roman_Σ - roman_Σ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT ] start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT )

in Problem 5, and

Ξ(Σ,K)=γ(tr(ΣdataΣ1)+tr(V1KΣdataK)\displaystyle\Xi\left(\Sigma,K\right)=\gamma\Big{(}\textit{tr}\left(\Sigma_{% data}\Sigma^{-1}\right)+\textit{tr}\left(V^{-1}K\Sigma_{data}K^{\top}\right)roman_Ξ ( roman_Σ , italic_K ) = italic_γ ( tr ( roman_Σ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT roman_Σ start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ) + tr ( italic_V start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_K roman_Σ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT italic_K start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT )
+tr(2KV1Hdata)tr2superscript𝐾topsuperscript𝑉1subscriptsuperscript𝐻top𝑑𝑎𝑡𝑎\displaystyle\hskip 28.45274pt+\textit{tr}\left(-2K^{\top}V^{-1}H^{\top}_{data% }\right)+ tr ( - 2 italic_K start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_V start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_H start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT )
+tr(𝒜Σ+KΣ)+tr(ΣK+𝒞KΣK))\displaystyle+\textit{tr}\left(\mathcal{A}\Sigma+\mathcal{B}K\Sigma\right)+% \textit{tr}\left(\mathcal{B}^{\top}\Sigma K^{\top}+\mathcal{C}K\Sigma K^{\top}% \right)\Big{)}+ tr ( caligraphic_A roman_Σ + caligraphic_B italic_K roman_Σ ) + tr ( caligraphic_B start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT roman_Σ italic_K start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT + caligraphic_C italic_K roman_Σ italic_K start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ) )

(up to an additive constant in K,Σ𝐾ΣK,\Sigmaitalic_K , roman_Σ) in Problem 6, where the matrices in calligraphic font denote the block matrices of the appropriate size of Γdata1superscriptsubscriptΓ𝑑𝑎𝑡𝑎1\Gamma_{data}^{-1}roman_Γ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT, that is,

Γdata1=[𝒜𝒞].superscriptsubscriptΓ𝑑𝑎𝑡𝑎1matrix𝒜superscripttop𝒞\displaystyle\Gamma_{data}^{-1}=\begin{bmatrix}\mathcal{A}&\mathcal{B}\\ \mathcal{B}^{\top}&\mathcal{C}\end{bmatrix}.roman_Γ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT = [ start_ARG start_ROW start_CELL caligraphic_A end_CELL start_CELL caligraphic_B end_CELL end_ROW start_ROW start_CELL caligraphic_B start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT end_CELL start_CELL caligraphic_C end_CELL end_ROW end_ARG ] .

In the above problems, the Langrangian is given by

(Σ,K,Υ)=tr(QΣ)+tr(RKΣK)+Ξ(Σ,K)+Σ𝐾Υtr𝑄Σtr𝑅𝐾Σsuperscript𝐾toplimit-fromΞΣ𝐾\displaystyle\mathcal{L}(\Sigma,K,\Upsilon)=\textit{tr}\left(Q\Sigma\right)+% \textit{tr}\left(RK\Sigma K^{\top}\right)+\Xi\left(\Sigma,K\right)+caligraphic_L ( roman_Σ , italic_K , roman_Υ ) = tr ( italic_Q roman_Σ ) + tr ( italic_R italic_K roman_Σ italic_K start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ) + roman_Ξ ( roman_Σ , italic_K ) +
tr(Υ[[A+BK]Σ[A+BK]Σ+W+BVB]),trΥdelimited-[]delimited-[]𝐴𝐵𝐾Σsuperscriptdelimited-[]𝐴𝐵𝐾topΣ𝑊𝐵𝑉superscript𝐵top\displaystyle\textit{tr}\Bigg{(}\Upsilon\Big{[}\left[A+BK\right]\Sigma\left[A+% BK\right]^{\top}-\Sigma+W+BVB^{\top}\Big{]}\Bigg{)},tr ( roman_Υ [ [ italic_A + italic_B italic_K ] roman_Σ [ italic_A + italic_B italic_K ] start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT - roman_Σ + italic_W + italic_B italic_V italic_B start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ] ) ,

where ΥΥ\Upsilonroman_Υ is the dual rx×rxsubscript𝑟𝑥subscript𝑟𝑥r_{x}\times r_{x}italic_r start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT × italic_r start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT matrix variable corresponding to the Lyapunov inequality constraint. From the dual feasibility and the zero gradient Karush-Kuhn-Tucker (KKT) [11] optimality conditions (derivatives of traces of matrices [31]), the optimal solution (K,Σ,Υ)subscript𝐾subscriptΣsubscriptΥ(K_{\star},\Sigma_{\star},\Upsilon_{\star})( italic_K start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT , roman_Σ start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT , roman_Υ start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT ) has to satisfy [22]:

Υ0,succeeds-or-equalssubscriptΥ0\displaystyle\Upsilon_{\star}\succeq 0,roman_Υ start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT ⪰ 0 ,
Σ:Q+KRK+[A+BK]Υ[A+BK]Υ:Σ𝑄superscriptsubscript𝐾top𝑅subscript𝐾superscriptdelimited-[]𝐴𝐵subscript𝐾topsubscriptΥdelimited-[]𝐴𝐵subscript𝐾subscriptΥ\displaystyle\frac{\partial}{\partial\Sigma}:\,Q+K_{\star}^{\top}RK_{\star}+% \left[A+BK_{\star}\right]^{\top}\Upsilon_{\star}\left[A+BK_{\star}\right]-% \Upsilon_{\star}divide start_ARG ∂ end_ARG start_ARG ∂ roman_Σ end_ARG : italic_Q + italic_K start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_R italic_K start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT + [ italic_A + italic_B italic_K start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT ] start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT roman_Υ start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT [ italic_A + italic_B italic_K start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT ] - roman_Υ start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT
+ΣΞ(Σ,K)=0.ΣΞsubscriptΣsubscript𝐾0\displaystyle\hskip 56.9055pt+\frac{\partial}{\partial\Sigma}\Xi\left(\Sigma_{% \star},K_{\star}\right)=0.+ divide start_ARG ∂ end_ARG start_ARG ∂ roman_Σ end_ARG roman_Ξ ( roman_Σ start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT , italic_K start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT ) = 0 .

Notice that in the standard LQR formulation, Problem 3, when Ξ(Σ,K)=0ΞΣ𝐾0\Xi\left(\Sigma,K\right)=0roman_Ξ ( roman_Σ , italic_K ) = 0, ΥsubscriptΥ\Upsilon_{\star}roman_Υ start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT is nothing but the observability-type Gramian P𝑃Pitalic_P in (7), that is, Υ=P0subscriptΥ𝑃succeeds0\Upsilon_{\star}=P\succ 0roman_Υ start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT = italic_P ≻ 0. The positive definiteness of ΥsubscriptΥ\Upsilon_{\star}roman_Υ start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT then implies that in the complementary slackness condition,

Υ[[A+BK]Σ[A+BK]Σ+W+BVB]=0,subscriptΥdelimited-[]delimited-[]𝐴𝐵subscript𝐾subscriptΣsuperscriptdelimited-[]𝐴𝐵subscript𝐾topsubscriptΣ𝑊𝐵𝑉superscript𝐵top0\displaystyle\Upsilon_{\star}\Big{[}\left[A+BK_{\star}\right]\Sigma_{\star}% \left[A+BK_{\star}\right]^{\top}-\Sigma_{\star}+W+BVB^{\top}\Big{]}=0,roman_Υ start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT [ [ italic_A + italic_B italic_K start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT ] roman_Σ start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT [ italic_A + italic_B italic_K start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT ] start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT - roman_Σ start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT + italic_W + italic_B italic_V italic_B start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ] = 0 ,

the constraint has to be active to satisfy the above condition, that is, (Σ,K)subscriptΣsubscript𝐾(\Sigma_{\star},K_{\star})( roman_Σ start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT , italic_K start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT ) satisfy equation (10),

[A+BK]Σ[A+BK]Σ+W+BVB=0,delimited-[]𝐴𝐵subscript𝐾subscriptΣsuperscriptdelimited-[]𝐴𝐵subscript𝐾topsubscriptΣ𝑊𝐵𝑉superscript𝐵top0\displaystyle\left[A+BK_{\star}\right]\Sigma_{\star}\left[A+BK_{\star}\right]^% {\top}-\Sigma_{\star}+W+BVB^{\top}=0,[ italic_A + italic_B italic_K start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT ] roman_Σ start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT [ italic_A + italic_B italic_K start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT ] start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT - roman_Σ start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT + italic_W + italic_B italic_V italic_B start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT = 0 ,

or in other words, ΣsubscriptΣ\Sigma_{\star}roman_Σ start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT is exactly the steady-state covariance of the system state we expect to see in future data when applying the law uk=Kxk+vksubscript𝑢𝑘subscript𝐾subscript𝑥𝑘subscript𝑣𝑘u_{k}=K_{\star}x_{k}+v_{k}italic_u start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = italic_K start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT + italic_v start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT.

This, however, is not generally true for Problems 5 and 6, where the Lyapunov inequality may be inactive in certain situations. The zero gradient condition for these two problems, respectively,

Q+KRK+[A+BK]Υ[A+BK]Υ𝑄superscriptsubscript𝐾top𝑅subscript𝐾superscriptdelimited-[]𝐴𝐵subscript𝐾topsubscriptΥdelimited-[]𝐴𝐵subscript𝐾subscriptΥ\displaystyle Q+K_{\star}^{\top}RK_{\star}+\left[A+BK_{\star}\right]^{\top}% \Upsilon_{\star}\left[A+BK_{\star}\right]-\Upsilon_{\star}italic_Q + italic_K start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_R italic_K start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT + [ italic_A + italic_B italic_K start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT ] start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT roman_Υ start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT [ italic_A + italic_B italic_K start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT ] - roman_Υ start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT
+2γ[ΣΣdata]=0,2superscript𝛾delimited-[]subscriptΣsubscriptΣ𝑑𝑎𝑡𝑎0\displaystyle\hskip 56.9055pt+2\gamma^{\prime}\left[\Sigma_{\star}-\Sigma_{% data}\right]=0,+ 2 italic_γ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT [ roman_Σ start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT - roman_Σ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT ] = 0 ,

and

Q+γ𝒜+K[R+γ𝒞]K+γK+γK𝑄𝛾𝒜superscriptsubscript𝐾topdelimited-[]𝑅𝛾𝒞subscript𝐾𝛾superscriptsubscript𝐾topsuperscripttop𝛾subscript𝐾\displaystyle Q+\gamma\mathcal{A}+K_{\star}^{\top}\left[R+\gamma\mathcal{C}% \right]K_{\star}+\gamma K_{\star}^{\top}\mathcal{B}^{\top}+\gamma\mathcal{B}K_% {\star}italic_Q + italic_γ caligraphic_A + italic_K start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT [ italic_R + italic_γ caligraphic_C ] italic_K start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT + italic_γ italic_K start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT caligraphic_B start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT + italic_γ caligraphic_B italic_K start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT
γΣ1ΣdataΣ1+[A+BK]Υ[A+BK]Υ=0.𝛾superscriptsubscriptΣ1subscriptΣ𝑑𝑎𝑡𝑎superscriptsubscriptΣ1superscriptdelimited-[]𝐴𝐵subscript𝐾topsubscriptΥdelimited-[]𝐴𝐵subscript𝐾subscriptΥ0\displaystyle\hskip 0.0pt-\gamma\Sigma_{\star}^{-1}\Sigma_{data}\Sigma_{\star}% ^{-1}+\left[A+BK_{\star}\right]^{\top}\Upsilon_{\star}\left[A+BK_{\star}\right% ]-\Upsilon_{\star}=0.- italic_γ roman_Σ start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT roman_Σ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT roman_Σ start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT + [ italic_A + italic_B italic_K start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT ] start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT roman_Υ start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT [ italic_A + italic_B italic_K start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT ] - roman_Υ start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT = 0 .

Using the complementary slackness condition again, to guarantee the satisfaction of the equality (10) in each problem, the terms

Q+KRK+2γ[ΣΣdata],𝑄superscriptsubscript𝐾top𝑅subscript𝐾2superscript𝛾delimited-[]subscriptΣsubscriptΣ𝑑𝑎𝑡𝑎\displaystyle Q+K_{\star}^{\top}RK_{\star}+2\gamma^{\prime}\left[\Sigma_{\star% }-\Sigma_{data}\right],italic_Q + italic_K start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_R italic_K start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT + 2 italic_γ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT [ roman_Σ start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT - roman_Σ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT ] , (25)

and

Q+γ𝒜+K[R+γ𝒞]K+γK+γK𝑄𝛾𝒜superscriptsubscript𝐾topdelimited-[]𝑅𝛾𝒞subscript𝐾𝛾superscriptsubscript𝐾topsuperscripttop𝛾subscript𝐾\displaystyle Q+\gamma\mathcal{A}+K_{\star}^{\top}\left[R+\gamma\mathcal{C}% \right]K_{\star}+\gamma K_{\star}^{\top}\mathcal{B}^{\top}+\gamma\mathcal{B}K_% {\star}italic_Q + italic_γ caligraphic_A + italic_K start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT [ italic_R + italic_γ caligraphic_C ] italic_K start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT + italic_γ italic_K start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT caligraphic_B start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT + italic_γ caligraphic_B italic_K start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT (26)
γΣ1ΣdataΣ1,𝛾superscriptsubscriptΣ1subscriptΣ𝑑𝑎𝑡𝑎superscriptsubscriptΣ1\displaystyle-\gamma\Sigma_{\star}^{-1}\Sigma_{data}\Sigma_{\star}^{-1},- italic_γ roman_Σ start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT roman_Σ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT roman_Σ start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ,

have to be positive definite, which might not be the case if ΣdatasubscriptΣ𝑑𝑎𝑡𝑎\Sigma_{data}roman_Σ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT is significantly larger than ΣsubscriptΣ\Sigma_{\star}roman_Σ start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT. However, we show next that even if the terms (25) and (26) are not positive definite, it is still beneficial to have the inequality (24) inactive.

By the contrapositive argument, if the inequality constraint (24) is inactive, the terms (25) and (26) are either indefinite, or negative semi-definite. This can be implied when some eigenvalues of ΣdatasubscriptΣ𝑑𝑎𝑡𝑎\Sigma_{data}roman_Σ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT are significantly larger than some of ΣsubscriptΣ\Sigma_{\star}roman_Σ start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT’s. In such case, two important observations can be made: (i) when ΣdatasubscriptΣ𝑑𝑎𝑡𝑎\Sigma_{data}roman_Σ start_POSTSUBSCRIPT italic_d italic_a italic_t italic_a end_POSTSUBSCRIPT has larger eigenvalues than ΣsubscriptΣ\Sigma_{\star}roman_Σ start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT, the data show that aggressive exploration has been done along the direction of the corresponding eigenvectors, while less aggressive (conservative) control design is achieved across the same directions, (ii) suppose ΣactualsubscriptΣ𝑎𝑐𝑡𝑢𝑎𝑙\Sigma_{actual}roman_Σ start_POSTSUBSCRIPT italic_a italic_c italic_t italic_u italic_a italic_l end_POSTSUBSCRIPT is the solution of the Lyapunov equation (10) when K=K𝐾subscript𝐾K=K_{\star}italic_K = italic_K start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT, then ΣactualΣprecedes-or-equalssubscriptΣ𝑎𝑐𝑡𝑢𝑎𝑙subscriptΣ\Sigma_{actual}\preceq\Sigma_{\star}roman_Σ start_POSTSUBSCRIPT italic_a italic_c italic_t italic_u italic_a italic_l end_POSTSUBSCRIPT ⪯ roman_Σ start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT, that is, the actual achieved design is more conservative in its exploration tendencies than the intended design. The inequality ΣactualΣprecedes-or-equalssubscriptΣ𝑎𝑐𝑡𝑢𝑎𝑙subscriptΣ\Sigma_{actual}\preceq\Sigma_{\star}roman_Σ start_POSTSUBSCRIPT italic_a italic_c italic_t italic_u italic_a italic_l end_POSTSUBSCRIPT ⪯ roman_Σ start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT holds because when the Lyapunov inequality (24) is inactive, it is implied that the optimal solution ΣsubscriptΣ\Sigma_{\star}roman_Σ start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT to the SDP is not the fixed point solution of the Lyapunov equation (10), hence, ΣsubscriptΣ\Sigma_{\star}roman_Σ start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT is not the steady-state covariance, but rather an upper bound. That is, ΣactualsubscriptΣ𝑎𝑐𝑡𝑢𝑎𝑙\Sigma_{actual}roman_Σ start_POSTSUBSCRIPT italic_a italic_c italic_t italic_u italic_a italic_l end_POSTSUBSCRIPT is the smallest covariance that satisfies the Lyapunov inequality (24). This can also be seen when writing the optimality conditions of the SDP with the cost tr(Σ)trΣ\textit{tr}\left(\Sigma\right)tr ( roman_Σ ) and subject to the Lyapunov inequality (24).

The above observations show that the case of inactive constraints corresponds to a more conservative achieved control. We show empirically in the next section that the Lyapunov inequality constraint (24) is mostly active in our simulations, and that our approaches still yield favorable results for nonlinear systems, over the certainty equivalence LQR case, even when the Lyapunov constraint is inactive.

6 Numerical simulations

The scalability of our proposed data-conforming formulations is self-evident; as convex SDPs with affine costs and LMI constraints, they can be scaled to handle problems with high state-input dimensions. Instead, we choose to work with a simple, yet telling problem, crafted to further clarify and illustrate our data-conforming paradigm.

Suppose the data-generating system (1) is of the form888The results of this section can be reproduced using our open-source Julia code found at https://github.com/msramada/data-conforming-control.

(x1,k+1x2,k+1)=xk+1=[.98.10.95]xk+[0.1]uk+wk,matrixsubscript𝑥1𝑘1subscript𝑥2𝑘1subscript𝑥𝑘1matrix.98.10.95subscript𝑥𝑘matrix0.1subscript𝑢𝑘subscript𝑤𝑘\displaystyle\begin{pmatrix}x_{1,k+1}\\ x_{2,k+1}\end{pmatrix}=x_{k+1}=\begin{bmatrix}.98&.1\\ 0&.95\end{bmatrix}x_{k}+\begin{bmatrix}0\\ .1\end{bmatrix}u_{k}+w_{k},( start_ARG start_ROW start_CELL italic_x start_POSTSUBSCRIPT 1 , italic_k + 1 end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_x start_POSTSUBSCRIPT 2 , italic_k + 1 end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ) = italic_x start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT = [ start_ARG start_ROW start_CELL .98 end_CELL start_CELL .1 end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL .95 end_CELL end_ROW end_ARG ] italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT + [ start_ARG start_ROW start_CELL 0 end_CELL end_ROW start_ROW start_CELL .1 end_CELL end_ROW end_ARG ] italic_u start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT + italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , (27)

with the noise wksubscript𝑤𝑘w_{k}italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT having a covariance W=diag(0.2, 0.1)𝑊diag0.20.1W=\text{diag}(0.2,\,0.1)italic_W = diag ( 0.2 , 0.1 ). Toward solving Problems 3 and 5, an experimental simulation was conducted using N=2000𝑁2000N=2000italic_N = 2000 collected state/input samples. The control uk=K0xk+vksubscript𝑢𝑘subscript𝐾0subscript𝑥𝑘subscript𝑣𝑘u_{k}=K_{0}x_{k}+v_{k}italic_u start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = italic_K start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT + italic_v start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT, where K0=[0.2,9.0]subscript𝐾00.29.0K_{0}=\left[-0.2,\,-9.0\right]italic_K start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT = [ - 0.2 , - 9.0 ] is stabilizing, and vksubscript𝑣𝑘v_{k}italic_v start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT is a white noise of zero mean and covariance V=0.5𝑉0.5V=0.5italic_V = 0.5 (meeting the PE condition, Assumption 1).

Using these simulation samples to form the data matrices (4), we then solve Problems 3 and 5 and show their results in Figure 1. Each control gain resulting from these problems is used in a control law of the form uk=Kxk+vksubscript𝑢𝑘𝐾subscript𝑥𝑘subscript𝑣𝑘u_{k}=Kx_{k}+v_{k}italic_u start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = italic_K italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT + italic_v start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT and is run in a new simulation of the same number of time steps (2000200020002000). Figure 1 shows the resulting design distributions compared with the learning data. Notice that when γsuperscript𝛾\gamma^{\prime}italic_γ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT starts having bigger and bigger values, the state design distribution converges to the state data distribution because data conformity is more heavily weighted in the control design.

Notice also in Figure 1 (a), that the (empirical) support of the state distribution of the new designed closed-loop system goes beyond that of the data. Suppose, for instance, that these not well-explored regions contain issues such as nonlinearities, discontinuities, or some safety or design condition violations. These issues, if not known and modeled by the operator in the control design, are also not discovered by the initial experiment. Hence, a non-data-conforming control, as the one in Figure 1 (a), can lead to activating these issues due to its inherent generalization beyond data. On the other hand, in (b), (c) and (d), the new controller conforms to the learning data to different levels dictated by the values of γsuperscript𝛾\gamma^{\prime}italic_γ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT in each.

Refer to caption
Figure 1: The x-axis and y-axis of each figure correspond to the state space, x1,ksubscript𝑥1𝑘x_{1,k}italic_x start_POSTSUBSCRIPT 1 , italic_k end_POSTSUBSCRIPT and x2,ksubscript𝑥2𝑘x_{2,k}italic_x start_POSTSUBSCRIPT 2 , italic_k end_POSTSUBSCRIPT, respectively. The orange squares are the state data resulting from the initial simulation experiment, using K0subscript𝐾0K_{0}italic_K start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT. The blue x-crosses are the state data resulting from solving Problem 3 (or, equivalently, Problem 5 with γ=0superscript𝛾0\gamma^{\prime}=0italic_γ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT = 0) in (a), and Problem 5 with γ=5superscript𝛾5\gamma^{\prime}=5italic_γ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT = 5 in (b), γ=20superscript𝛾20\gamma^{\prime}=20italic_γ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT = 20 in (c), and γ=100superscript𝛾100\gamma^{\prime}=100italic_γ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT = 100 in (d).

To better illustrate the effect of unknown/unmodeled nonlinearities, we adjust the dynamics (27) such that it is now

xk+1=[.98.10.95]xk+(θx2,k20)+[0.1]uk+wk,subscript𝑥𝑘1matrix.98.10.95subscript𝑥𝑘matrix𝜃superscriptsubscript𝑥2𝑘20matrix0.1subscript𝑢𝑘subscript𝑤𝑘\displaystyle x_{k+1}=\begin{bmatrix}.98&.1\\ 0&.95\end{bmatrix}x_{k}+\begin{pmatrix}\theta x_{2,k}^{2}\\ 0\end{pmatrix}+\begin{bmatrix}0\\ .1\end{bmatrix}u_{k}+w_{k},italic_x start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT = [ start_ARG start_ROW start_CELL .98 end_CELL start_CELL .1 end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL .95 end_CELL end_ROW end_ARG ] italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT + ( start_ARG start_ROW start_CELL italic_θ italic_x start_POSTSUBSCRIPT 2 , italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_CELL end_ROW start_ROW start_CELL 0 end_CELL end_ROW end_ARG ) + [ start_ARG start_ROW start_CELL 0 end_CELL end_ROW start_ROW start_CELL .1 end_CELL end_ROW end_ARG ] italic_u start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT + italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , (28)

where θ=1/9𝜃19\theta=1/9italic_θ = 1 / 9. Notice now that the nonlinearity θx2,k2𝜃superscriptsubscript𝑥2𝑘2\theta x_{2,k}^{2}italic_θ italic_x start_POSTSUBSCRIPT 2 , italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT is relatively larger in the not well-explored regions. Naively applying Problem 3 can lead to unexpected behavior and possibly to instability.

Again, we simulate the new nonlinear system with an equivalent control uk=K0xk+vksubscript𝑢𝑘subscript𝐾0subscript𝑥𝑘subscript𝑣𝑘u_{k}=K_{0}x_{k}+v_{k}italic_u start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = italic_K start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT + italic_v start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT as in the previous case and for N=2000𝑁2000N=2000italic_N = 2000 time steps. Then, at the last time step, we evaluate the new control laws, using the recorded past 2000200020002000 data points, and apply each control law immediately in feedback for a new 2000200020002000 time steps. The results of each simulation are illustrated in Figure 2. The instability resulting from relying on Problem 3 (or, equivalently, Problem 5 with γ=0superscript𝛾0\gamma^{\prime}=0italic_γ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT = 0) is shown in Figure 2: (1). In contrast, Figure 2: (2), (3), and (4) correspond to Problem 5 with γ=5, 20, 100superscript𝛾520100\gamma^{\prime}=5,\,20,\,100italic_γ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT = 5 , 20 , 100, respectively. The latter three cases enforce conforming to data and hence avoid unexplored regions, in particular those where the nonlinearity is dominant and dangerous.

Refer to caption
Figure 2: Values of x1,ksubscript𝑥1𝑘x_{1,k}italic_x start_POSTSUBSCRIPT 1 , italic_k end_POSTSUBSCRIPT for the N=2000𝑁2000N=2000italic_N = 2000 samples of the initial experiment then under the control law resulting from solving (1) Problem 3 (or, equivalently, Problem 5 with γ=0superscript𝛾0\gamma^{\prime}=0italic_γ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT = 0), and Problem 5 with (2) γ=5superscript𝛾5\gamma^{\prime}=5italic_γ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT = 5, (3) γ=20superscript𝛾20\gamma^{\prime}=20italic_γ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT = 20, and (4) γ=100superscript𝛾100\gamma^{\prime}=100italic_γ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT = 100.

The steps of the above procedure, from (i) an initial experiment of N=2000𝑁2000N=2000italic_N = 2000 bounded data, to (ii) the control design using the above four problems (Problems 3, and Problem 5 with γ=5, 20, 100superscript𝛾520100\gamma^{\prime}=5,\,20,\,100italic_γ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT = 5 , 20 , 100) and then (iii) apply each control in feedback for 2000200020002000 time steps, are repeated ((i)–(iii)) for 1000100010001000 repetitions. Then, we calculate the percentage of the stable simulations resulting from each procedure. The stability of each simulation is decided by the boundedness within some high threshold (50505050) in each state coordinate and pointwise in the 2000200020002000 time steps of step (iii). The Lyapunov inequality constraint (24) was active in almost all of these simulations. The percentages of the stable simulations are shown in Table 1, further highlighting the importance of data-conforming control design.

Table 1: Percentages of stable simulations (out of 1,000 simulations)
Problem 3 Problem 5 Problem 5 Problem 5
(Prob. 5, γ=0superscript𝛾0\gamma^{\prime}=0italic_γ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT = 0) (γ=5superscript𝛾5\gamma^{\prime}=5italic_γ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT = 5) (γ=20superscript𝛾20\gamma^{\prime}=20italic_γ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT = 20) (γ=100superscript𝛾100\gamma^{\prime}=100italic_γ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT = 100)
21.7% 99.2% 99.9% 100.0%

The examples in (27) and (28) share, in their structure, that the input enters the system linearly and with constant coefficients. Therefore, the effect of the input is the same on the state evolution, regardless of the contemporaneous state of the system. This is not true in general, and the effect of the input may be coupled with the current state—for instance, in bilinear systems. The following is a modification of (28) with a coupled state-input term:

xk+1subscript𝑥𝑘1\displaystyle x_{k+1}italic_x start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT =[.98.10.95]xk+(θx2,k20)+absentmatrix.98.10.95subscript𝑥𝑘limit-frommatrix𝜃superscriptsubscript𝑥2𝑘20\displaystyle=\begin{bmatrix}.98&.1\\ 0&.95\end{bmatrix}x_{k}+\begin{pmatrix}\theta x_{2,k}^{2}\\ 0\end{pmatrix}+= [ start_ARG start_ROW start_CELL .98 end_CELL start_CELL .1 end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL .95 end_CELL end_ROW end_ARG ] italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT + ( start_ARG start_ROW start_CELL italic_θ italic_x start_POSTSUBSCRIPT 2 , italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_CELL end_ROW start_ROW start_CELL 0 end_CELL end_ROW end_ARG ) + (29)
[00.1+θtanhx1,k]uk+wk.matrix00.1𝜃subscript𝑥1𝑘subscript𝑢𝑘subscript𝑤𝑘\displaystyle\hskip 42.67912pt\begin{bmatrix}0\\ 0.1+\theta\tanh{x_{1,k}}\end{bmatrix}u_{k}+w_{k}.[ start_ARG start_ROW start_CELL 0 end_CELL end_ROW start_ROW start_CELL 0.1 + italic_θ roman_tanh italic_x start_POSTSUBSCRIPT 1 , italic_k end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] italic_u start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT + italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT .

Similar to the previous two examples, we run an experimental simulation of this system for N=2000𝑁2000N=2000italic_N = 2000 time steps and collect the input and state data as in (4). At the final time step k=N=2000𝑘𝑁2000k=N=2000italic_k = italic_N = 2000, we use the collected data in solving Problems 3, 5, and 6, then use the resulting control laws for another 2000200020002000 samples. The results of these simulations are shown in Figure 3, showing the importance of introducing Problem 6 to account for the coupled state-input effect.

Similar to the construction of Table 1, we run the above procedure for 1000100010001000 repetitions of bounded experiments and calculate the percentages of stable simulations resulting from each of the four control design procedures (Problem 3, Problem 5 (with γ=20, 100superscript𝛾20100\gamma^{\prime}=20,\,100italic_γ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT = 20 , 100) and Problem 6 (with γ=10𝛾10\gamma=10italic_γ = 10)). The results are recorded in Table 2, which shows the importance of using the joint state-input data distribution when the system’s nonlinearities contain a coupled state-input effect, even though the Lyapunov inequality constraint (24) was mostly inactive.

Table 1: Percentages of stable simulations (out of 1000 simulations)
Problem 3 Problem 5 Problem 5 Problem 6
(Prob. 5, γ=0superscript𝛾0\gamma^{\prime}=0italic_γ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT = 0) (γ=20superscript𝛾20\gamma^{\prime}=20italic_γ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT = 20) (γ=100superscript𝛾100\gamma^{\prime}=100italic_γ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT = 100) (γ=10𝛾10\gamma=10italic_γ = 10)
0.07% 22.9% 22.5% 97.9%

We note that in all of these simulations, none of the tested problems returned any feasibility issues. Moreover, the computation time of each problem (about six milliseconds for Problem 6 on an Apple M1 Max MacBook Pro with 64 GB or RAMs) is comparable to solving a standard LQR problem.

These examples show how the concept of data-conforming control can enhance the safety of the data-driven approaches. In practice, a control engineer can start with a high value of γ𝛾\gammaitalic_γ (or γsuperscript𝛾\gamma^{\prime}italic_γ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT) then gradually reduce it, if needed, to allow some exploration beyond data. One of the important recommendations of the adaptive control literature [5, 4] is to model as much as possible, and leave any model reduction or abstraction for later. Our data-conforming framework can provide the safety and the time necessary for the control engineer to observe, model/interpret, and react to any potential activation of unknown/unmodeled nonlinearities.

Refer to caption
Figure 3: Value of x1,ksubscript𝑥1𝑘x_{1,k}italic_x start_POSTSUBSCRIPT 1 , italic_k end_POSTSUBSCRIPT for the N=2000𝑁2000N=2000italic_N = 2000 samples of the initial experiment then under the control law resulting from solving Problem 3, Problem 5 with γ=10, 100superscript𝛾10100\gamma^{\prime}=10,\,100italic_γ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT = 10 , 100, and Problem 6 (γ=10𝛾10\gamma=10italic_γ = 10). The last results in the only stabilizing control since it accounts for the joint state-input data.

7 Conclusion

This paper addresses a problem inherent in many modern data-driven and adaptive control approaches, namely, the problem of the premature and possibly false generalization beyond data, together with its consequences in terms of sudden distributional shifts in the state-input space. We present methods for mitigating this problem through enforcing consistency with the learning data, and we numerically test them. Because of the formulation of our methods as solutions to SDPs of affine costs and LMI constraints, they are computationally efficient, can scale up to systems with hundreds in dimension, and can be easily integrated with modern control design approaches.

Further work is being pursued to (i) apply the data-conformation concept in the problem of robust data-driven control, in the sense described in Remarks 2 and 3; and (ii) investigate the possibility of developing data-conforming policy gradient surrogates to dampen distributional shifts resulting from standard policy gradient methods.

Acknowledgment

This material was based upon work supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research (ASCR) under Contract DE-AC02-06CH11347.

References

  • [1] R. Agarwal, D. Schuurmans, and M. Norouzi, “An optimistic perspective on offline reinforcement learning,” in International conference on machine learning.   PMLR, 2020, pp. 104–114.
  • [2] P. Albertos and A. S. Piqueras, Iterative identification and control: advances in theory and applications.   Springer Science & Business Media, 2012.
  • [3] B. D. Anderson, “Windsurfing approach to iterative control design,” in Iterative Identification and Control: Advances in Theory and Applications.   Springer, 2002, pp. 143–166.
  • [4] Anderson, Brian DO, “Failures of adaptive control theory and their resolution,” 2005.
  • [5] K. Astrom, “A commentary on the C.E. Rohrs et al. paper ”robustness of continuous-time adaptive control algorithms in the presence of unmodeled dynamics”,” IEEE Transactions on Automatic Control, vol. 30, no. 9, pp. 889–889, 1985.
  • [6] K. Åström and B. Wittenmark, Adaptive Control: Second Edition, ser. Dover Books on Electrical Engineering.   Dover Publications, 2013. [Online]. Available: https://books.google.com/books?id=4CLCAgAAQBAJ
  • [7] Y. Bai, A. Jones, K. Ndousse, A. Askell, A. Chen, N. DasSarma, D. Drain, S. Fort, D. Ganguli, T. Henighan et al., “Training a helpful and harmless assistant with reinforcement learning from human feedback,” arXiv preprint arXiv:2204.05862, 2022.
  • [8] D. Bertsekas, Dynamic programming and optimal control: Volume I.   Athena scientific, 2012, vol. 4.
  • [9] S. Boyd, V. Balakrishnan, E. Feron, and L. ElGhaoui, “Control system analysis and synthesis via linear matrix inequalities,” in 1993 American Control Conference.   IEEE, 1993, pp. 2147–2154.
  • [10] S. Boyd, L. El Ghaoui, E. Feron, and V. Balakrishnan, Linear matrix inequalities in system and control theory.   SIAM, 1994.
  • [11] S. P. Boyd and L. Vandenberghe, Convex optimization.   Cambridge University Press, 2004.
  • [12] F. B. Cabral and M. G. Safonov, “Unfalsified model reference adaptive control using the ellipsoid algorithm,” International Journal of Adaptive Control and Signal Processing, vol. 18, no. 8, pp. 683–696, 2004.
  • [13] J. Coulson, J. Lygeros, and F. Dörfler, “Data-enabled predictive control: In the shallows of the DeePC,” in 2019 18th European Control Conference (ECC).   IEEE, 2019, pp. 307–312.
  • [14] C. De Persis and P. Tesi, “Formulas for data-driven control: Stabilization, optimality, and robustness,” IEEE Transactions on Automatic Control, vol. 65, no. 3, pp. 909–924, 2019.
  • [15] De Persis, Claudio and Tesi, Pietro, “Learning controllers for nonlinear systems from data,” Annual Reviews in Control, p. 100915, 2023.
  • [16] S. Dean, H. Mania, N. Matni, B. Recht, and S. Tu, “On the sample complexity of the linear quadratic regulator,” Foundations of Computational Mathematics, vol. 20, no. 4, pp. 633–679, 2020.
  • [17] P. Dorato, “A historical review of robust control,” IEEE Control Systems Magazine, vol. 7, no. 2, pp. 44–47, 1987.
  • [18] E. Ekomwenrenren, J. W. Simpson-Porco, E. Farantatos, M. Patel, A. Haddadi, and L. Zhu, “Data-driven fast frequency control using inverter-based resources,” IEEE Transactions on Power Systems, 2023.
  • [19] M. Fazel, R. Ge, S. Kakade, and M. Mesbahi, “Global convergence of policy gradient methods for the linear quadratic regulator,” in International conference on machine learning.   PMLR, 2018, pp. 1467–1476.
  • [20] S. Fujimoto and S. S. Gu, “A minimalist approach to offline reinforcement learning,” Advances in Neural Information Processing Systems, vol. 34, pp. 20 132–20 145, 2021.
  • [21] T. A. N. Heirung, B. E. Ydstie, and B. Foss, “Dual adaptive model predictive control,” Automatica, vol. 80, pp. 340–348, 2017.
  • [22] C. Helmberg, “Semidefinite programming,” European Journal of Operational Research, vol. 137, no. 3, pp. 461–482, 2002.
  • [23] T. Hey, S. Tansley, K. M. Tolle et al., The fourth paradigm: data-intensive scientific discovery.   Microsoft research Redmond, WA, 2009, vol. 1.
  • [24] R. A. Horn and C. R. Johnson, Matrix analysis.   Cambridge University Press, 2012.
  • [25] T. Kailath, Linear systems.   Prentice-Hall Englewood Cliffs, NJ, 1980, vol. 156.
  • [26] L. Ljung, “System identification,” in Signal analysis and prediction.   Springer, 1998, pp. 163–173.
  • [27] G. Marafioti, R. R. Bitmead, and M. Hovd, “Persistently exciting model predictive control,” International Journal of Adaptive Control and Signal Processing, vol. 28, no. 6, pp. 536–552, 2014.
  • [28] I. M. Mareels and R. R. Bitmead, “Non-linear dynamics in adaptive control: Periodic and chaotic stabilization — II. analysis,” Automatica, vol. 24, no. 4, pp. 485–497, 1988.
  • [29] P. Mohajerin Esfahani and D. Kuhn, “Data-driven distributionally robust optimization using the Wasserstein metric: Performance guarantees and tractable reformulations,” Mathematical Programming, vol. 171, no. 1, pp. 115–166, 2018.
  • [30] L. Pardo, Statistical inference based on divergence measures.   Chapman and Hall/CRC, 2018.
  • [31] K. B. Petersen, M. S. Pedersen et al., “The matrix cookbook,” Technical University of Denmark, vol. 7, no. 15, p. 510, 2008.
  • [32] M. S. Ramadan and M. Anitescu, “Extended Kalman filter – Koopman operator for tractable stochastic optimal control,” IEEE Control Systems Letters, 2024.
  • [33] C. Rohrs, L. Valavani, M. Athans, and G. Stein, “Robustness of continuous-time adaptive control algorithms in the presence of unmodeled dynamics,” IEEE Transactions on Automatic Control, vol. 30, no. 9, pp. 881–889, 1985.
  • [34] M. G. Safonov, “Origins of robust control: Early history and future speculations,” Annual Reviews in Control, vol. 36, no. 2, pp. 173–181, 2012.
  • [35] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal policy optimization algorithms,” arXiv preprint arXiv:1707.06347, 2017.
  • [36] R. S. Sutton and A. G. Barto, Reinforcement learning: An introduction.   MIT Press, 2018.
  • [37] M. Thomas and A. T. Joy, Elements of information theory.   Wiley-Interscience, 2006.
  • [38] H. J. van Waarde and M. K. Camlibel, “A matrix Finsler’s lemma with applications to data-driven control,” in 2021 60th IEEE Conference on Decision and Control (CDC).   IEEE, 2021, pp. 5777–5782.
  • [39] T. Wang, S. Herbert, and S. Gao, “Fractal landscapes in policy optimization,” Advances in Neural Information Processing Systems, vol. 36, 2024.
  • [40] J. C. Willems, P. Rapisarda, I. Markovsky, and B. L. De Moor, “A note on persistency of excitation,” Systems & Control Letters, vol. 54, no. 4, pp. 325–329, 2005.
  • [41] Z. Yuan, C. Zhao, and J. Cortés, “Reinforcement learning for distributed transient frequency control with stability and safety guarantees,” Systems & Control Letters, vol. 185, p. 105753, 2024.
  • [42] F. Zhao, F. Dörfler, A. Chiuso, and K. You, “Data-enabled policy optimization for direct adaptive learning of the LQR,” arXiv preprint arXiv:2401.14871, 2024.

Government License: The submitted manuscript has been created by UChicago Argonne, LLC, Operator of Argonne National Laboratory (“Argonne”). Argonne, a U.S. Department of Energy Office of Science laboratory, is operated under Contract No. DE-AC02-06CH11357. The U.S. Government retains for itself, and others acting on its behalf, a paid-up nonexclusive, irrevocable worldwide license in said article to reproduce, prepare derivative works, distribute copies to the public, and perform publicly and display publicly, by or on behalf of the Government. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan. http://energy.gov/downloads/doe-public-access-plan.

Appendix A Appendix

A.1 The cost in the Plimit-from𝑃P-italic_P -parametrization

Starting from the cost (2),

J=limk𝔼{xkQxk+ukRuk},𝐽subscript𝑘𝔼superscriptsubscript𝑥𝑘top𝑄subscript𝑥𝑘superscriptsubscript𝑢𝑘top𝑅subscript𝑢𝑘\displaystyle J=\lim_{k\to\infty}\mathbb{E\,}\left\{x_{k}^{\top}Qx_{k}+u_{k}^{% \top}Ru_{k}\right\},italic_J = roman_lim start_POSTSUBSCRIPT italic_k → ∞ end_POSTSUBSCRIPT blackboard_E { italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_Q italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT + italic_u start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_R italic_u start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT } ,

substituting uk=Kxk+vksubscript𝑢𝑘𝐾subscript𝑥𝑘subscript𝑣𝑘u_{k}=Kx_{k}+v_{k}italic_u start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = italic_K italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT + italic_v start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT, and using the fact that vksubscript𝑣𝑘v_{k}italic_v start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT is a zero mean white noise,

J=limk𝔼{xk[Q+KRK]xk+vkRvk}.𝐽subscript𝑘𝔼superscriptsubscript𝑥𝑘topdelimited-[]𝑄superscript𝐾top𝑅𝐾subscript𝑥𝑘superscriptsubscript𝑣𝑘top𝑅subscript𝑣𝑘\displaystyle J=\lim_{k\to\infty}\mathbb{E\,}\left\{x_{k}^{\top}\left[Q+K^{% \top}RK\right]x_{k}+v_{k}^{\top}Rv_{k}\right\}.italic_J = roman_lim start_POSTSUBSCRIPT italic_k → ∞ end_POSTSUBSCRIPT blackboard_E { italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT [ italic_Q + italic_K start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_R italic_K ] italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT + italic_v start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_R italic_v start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT } .

Using the cyclic property of the trace and the linearity of 𝔼𝔼\mathbb{E\,}blackboard_E, 𝔼{vkRvk}=tr(RV)𝔼superscriptsubscript𝑣𝑘top𝑅subscript𝑣𝑘tr𝑅𝑉\mathbb{E\,}\left\{v_{k}^{\top}Rv_{k}\right\}=\textit{tr}\left(RV\right)blackboard_E { italic_v start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_R italic_v start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT } = tr ( italic_R italic_V ), which is an additive constant (for a fixed V𝑉Vitalic_V) and can be omitted from the cost. Now, using the cyclic property of the trace and the linearity of 𝔼𝔼\mathbb{E\,}blackboard_E again, we have

J𝐽\displaystyle Jitalic_J limk𝔼{xk[Q+KRK]xk},proportional-toabsentsubscript𝑘𝔼superscriptsubscript𝑥𝑘topdelimited-[]𝑄superscript𝐾top𝑅𝐾subscript𝑥𝑘\displaystyle\propto\lim_{k\to\infty}\mathbb{E\,}\left\{x_{k}^{\top}\left[Q+K^% {\top}RK\right]x_{k}\right\},∝ roman_lim start_POSTSUBSCRIPT italic_k → ∞ end_POSTSUBSCRIPT blackboard_E { italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT [ italic_Q + italic_K start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_R italic_K ] italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT } ,
=tr(limk𝔼{[Q+KRK]xkxk}),absenttrsubscript𝑘𝔼delimited-[]𝑄superscript𝐾top𝑅𝐾subscript𝑥𝑘superscriptsubscript𝑥𝑘top\displaystyle=\textit{tr}\left(\lim_{k\to\infty}\mathbb{E\,}\left\{\left[Q+K^{% \top}RK\right]x_{k}x_{k}^{\top}\right\}\right),= tr ( roman_lim start_POSTSUBSCRIPT italic_k → ∞ end_POSTSUBSCRIPT blackboard_E { [ italic_Q + italic_K start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_R italic_K ] italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT } ) ,
=tr([Q+KRK]limk𝔼{xkxk}),absenttrdelimited-[]𝑄superscript𝐾top𝑅𝐾subscript𝑘𝔼subscript𝑥𝑘superscriptsubscript𝑥𝑘top\displaystyle=\textit{tr}\left(\left[Q+K^{\top}RK\right]\lim_{k\to\infty}% \mathbb{E\,}\left\{x_{k}x_{k}^{\top}\right\}\right),= tr ( [ italic_Q + italic_K start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_R italic_K ] roman_lim start_POSTSUBSCRIPT italic_k → ∞ end_POSTSUBSCRIPT blackboard_E { italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT } ) ,
=tr([Q+KRK]Σ).absenttrdelimited-[]𝑄superscript𝐾top𝑅𝐾Σ\displaystyle=\textit{tr}\left(\left[Q+K^{\top}RK\right]\Sigma\right).= tr ( [ italic_Q + italic_K start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_R italic_K ] roman_Σ ) . (30)

The steady-state covariance of the state is designated by Σ=limk𝔼{xkxk}Σsubscript𝑘𝔼subscript𝑥𝑘superscriptsubscript𝑥𝑘top\Sigma=\lim_{k\to\infty}\mathbb{E\,}\left\{x_{k}x_{k}^{\top}\right\}roman_Σ = roman_lim start_POSTSUBSCRIPT italic_k → ∞ end_POSTSUBSCRIPT blackboard_E { italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT }, which is also known as the controllability-type Gramian. Since

xk=[A+BK]kx0+i=0k1[A+BK]k1i(wi+Bvi),subscript𝑥𝑘superscriptdelimited-[]𝐴𝐵𝐾𝑘subscript𝑥0superscriptsubscript𝑖0𝑘1superscriptdelimited-[]𝐴𝐵𝐾𝑘1𝑖subscript𝑤𝑖𝐵subscript𝑣𝑖\displaystyle x_{k}=\left[A+BK\right]^{k}x_{0}+\sum_{i=0}^{k-1}\left[A+BK% \right]^{k-1-i}\left(w_{i}+Bv_{i}\right),italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = [ italic_A + italic_B italic_K ] start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT + ∑ start_POSTSUBSCRIPT italic_i = 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_k - 1 end_POSTSUPERSCRIPT [ italic_A + italic_B italic_K ] start_POSTSUPERSCRIPT italic_k - 1 - italic_i end_POSTSUPERSCRIPT ( italic_w start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT + italic_B italic_v start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) ,

and using the fact that vi,wisubscript𝑣𝑖subscript𝑤𝑖v_{i},\,w_{i}italic_v start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_w start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT are white, of zero mean and mutually independent from each other and from x0subscript𝑥0x_{0}italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT, this covariance can be described by

ΣΣ\displaystyle\Sigmaroman_Σ =limk{[A+BK]kΣx0[A+BK]k,+\displaystyle=\lim_{k\to\infty}\Big{\{}\left[A+BK\right]^{k}\Sigma_{x_{0}}% \left[A+BK\right]^{k,\,\top}+= roman_lim start_POSTSUBSCRIPT italic_k → ∞ end_POSTSUBSCRIPT { [ italic_A + italic_B italic_K ] start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT roman_Σ start_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ italic_A + italic_B italic_K ] start_POSTSUPERSCRIPT italic_k , ⊤ end_POSTSUPERSCRIPT +
i=0k1[A+BK]k1i[W+BVB][A+BK]k1i,},\displaystyle\hskip-14.22636pt\sum_{i=0}^{k-1}\left[A+BK\right]^{k-1-i}\left[W% +BVB^{\top}\right]\left[A+BK\right]^{k-1-i,\,\top}\Big{\}},∑ start_POSTSUBSCRIPT italic_i = 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_k - 1 end_POSTSUPERSCRIPT [ italic_A + italic_B italic_K ] start_POSTSUPERSCRIPT italic_k - 1 - italic_i end_POSTSUPERSCRIPT [ italic_W + italic_B italic_V italic_B start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ] [ italic_A + italic_B italic_K ] start_POSTSUPERSCRIPT italic_k - 1 - italic_i , ⊤ end_POSTSUPERSCRIPT } ,

or, if A+BK𝐴𝐵𝐾A+BKitalic_A + italic_B italic_K is hurwitz, the transient term vanishes and we have

Σ=k=0[A+BK]k[W+BVB][A+BK]k,.Σsuperscriptsubscript𝑘0superscriptdelimited-[]𝐴𝐵𝐾𝑘delimited-[]𝑊𝐵𝑉superscript𝐵topsuperscriptdelimited-[]𝐴𝐵𝐾𝑘top\displaystyle\Sigma=\sum_{k=0}^{\infty}\left[A+BK\right]^{k}\left[W+BVB^{\top}% \right]\left[A+BK\right]^{k,\,\top}.roman_Σ = ∑ start_POSTSUBSCRIPT italic_k = 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∞ end_POSTSUPERSCRIPT [ italic_A + italic_B italic_K ] start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT [ italic_W + italic_B italic_V italic_B start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ] [ italic_A + italic_B italic_K ] start_POSTSUPERSCRIPT italic_k , ⊤ end_POSTSUPERSCRIPT .

After substituting this description of ΣΣ\Sigmaroman_Σ in (30), and using the cyclic property of the trace operator, we have

J𝐽\displaystyle Jitalic_J tr([Q+KRK]×\displaystyle\propto\textit{tr}\Big{(}\left[Q+K^{\top}RK\right]\times∝ tr ( [ italic_Q + italic_K start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_R italic_K ] ×
k=0[A+BK]k[W+BVB][A+BK]k,),\displaystyle\hskip 14.22636pt\sum_{k=0}^{\infty}\left[A+BK\right]^{k}\left[W+% BVB^{\top}\right]\left[A+BK\right]^{k,\,\top}\Big{)},∑ start_POSTSUBSCRIPT italic_k = 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∞ end_POSTSUPERSCRIPT [ italic_A + italic_B italic_K ] start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT [ italic_W + italic_B italic_V italic_B start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ] [ italic_A + italic_B italic_K ] start_POSTSUPERSCRIPT italic_k , ⊤ end_POSTSUPERSCRIPT ) ,
=tr(k=0[A+BK]k,[Q+KRK][A+BK]k×\displaystyle=\textit{tr}\Big{(}\sum_{k=0}^{\infty}\left[A+BK\right]^{k,\,\top% }\left[Q+K^{\top}RK\right]\left[A+BK\right]^{k}\times= tr ( ∑ start_POSTSUBSCRIPT italic_k = 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∞ end_POSTSUPERSCRIPT [ italic_A + italic_B italic_K ] start_POSTSUPERSCRIPT italic_k , ⊤ end_POSTSUPERSCRIPT [ italic_Q + italic_K start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_R italic_K ] [ italic_A + italic_B italic_K ] start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT ×
[W+BVB]),\displaystyle\hskip 14.22636pt\left[W+BVB^{\top}\right]\Big{)},[ italic_W + italic_B italic_V italic_B start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ] ) ,
=tr(P[W+BVB]),absenttr𝑃delimited-[]𝑊𝐵𝑉superscript𝐵top\displaystyle=\textit{tr}\left(P\left[W+BVB^{\top}\right]\right),= tr ( italic_P [ italic_W + italic_B italic_V italic_B start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ] ) ,

where P𝑃Pitalic_P is the observability-type Gramian, as in (6).