Data-conforming data-driven control: avoiding premature generalizations beyond data
Abstract
Data-driven and adaptive control approaches face the problem of introducing sudden distributional shifts beyond the distribution of data encountered during learning. Therefore, they are prone to invalidating the very assumptions used in their own construction. This is due to the linearity of the underlying system, inherently assumed and formulated in most data-driven control approaches, which may falsely generalize the behavior of the system beyond the behavior experienced in the data. This paper seeks to mitigate these problems by enforcing consistency of the newly designed closed-loop systems with data and slowing down any distributional shifts in the joint state-input space. This is achieved by incorporating affine regularization terms and linear matrix inequality constraints into data-driven approaches, resulting in convex semi-definite programs that can be efficiently solved by standard software packages. We discuss the optimality conditions of these programs and then conclude the paper with a numerical example that further highlights the problem of premature generalization beyond data and shows the effectiveness of our proposed approaches in enhancing the safety of data-driven control methods.
Keywords: Data-driven control, adaptive control, system identification, robust control, offline reinforcement learning.
1 Introduction
The development of adaptive control systems, while marked by significant theoretical advancements, has historically faced skepticism from practitioners, rooted in concerns over the reliability and robustness of such systems in real-world applications. Brian Anderson, in his article [4], explains reasons for this distrust and the inherent dangers of adaptive control algorithms. In this paper we highlight another fundamental vulnerability in adaptive learning and data-driven control algorithms: premature and often false generalization beyond the seen data. That is, hastily generalizing the behavior of the system beyond its behavior that was observed in the data. This is pervasive in modern data-driven control methods and can lead to catastrophic results in real-world applications. We propose practical methods for overcoming this vulnerability in a computationally efficient manner and using standard off-the-shelf software packages.
Data has become a cornerstone across every scientific domain, and it is foundational for the advancements in artificial intelligence [23]. In control theory, the reliance on data in control design is not new; and the subfields of system identification [26], robust control [17], and adaptive control [6], pioneered decades ago, have long been central to the field of control. The historical insights these subfields have generated are still crucial to the development of modern data-driven control and reinforcement learning (RL) algorithms. Take, for example, the “exciting” input for the stability of adaptive control [5, 27], which is analogous to the role of stochastic policies in RL algorithms [35], or the problem of the chaos phenomenon resulting from learning and control loops operating in comparable time scales, discovered in adaptive control [28] and later in RL [39]. These insights have also been crucial to the development of dual control, and its automatic experiment design aspect [32, 21], which have their connections to the exploration vs exploitation trade-off in RL [36].
The aforementioned insights, particularly the difficulties they are coupled with, hindered the spread of adaptive control algorithms in real-world applications; instead, practitioners and control theorists alike found refuge in the more established and guaranteed robust control algorithms [34]. The importance of adaptive control cannot be dismissed, however, and the recent surge in the application of data-driven control and RL algorithms (in power systems, for instance [18, 41]) and RL’s role in the development of large language models [7] revive this importance. Thus, we are motivated to further understand the problems of adaptive control and expand their remedies.
In this paper we highlight an extra vulnerability of adaptive and data-driven control, not addressed explicitly by Anderson in [4]. This vulnerability is the premature and possibly false generalizations beyond data, resulting from extrapolating the behavior of the system beyond what was experienced in the data. This in turn can lead to compromising safety and performance under the existence of unmodeled nonlinearities in the true underlying system.
Modern data-driven control methods, whether direct (model-free) [13] or indirect (model-based) [40], mostly assume the linearity of the data-generating dynamics (although a few approaches extend direct methods to very limited forms of nonlinearities [15]). The problem with the linearity assumption is that it imposes the universality of the data, in the sense that the underlying dynamics must “behave similarly” (according to the same linear dynamics), beyond data and in any region of the state space. This assumption is often invalid, however, because various systems admit nonlinear effects beyond their engineered operating conditions and typical control design does not account for this situation.
We emphasize that in a system identification and data collection experiment, an identified system (or recorded data in the direct approaches case) is not necessarily valid under different experimental and data collection conditions. For example, a controller designed to be stabilizing/optimal with respect to the identified model (or recorded data) is guaranteed to be stabilizing/optimal only with respect to this specific identified model. However, eventually this controller will be connected to the actual data-generating system. This connection likely will change the operating conditions of the system and therefore may invalidate the exact assumptions this controller was based on. This can possibly activate unmodeled nonlinearities that can result in instability or severely degraded performance. Therefore, even with batch learning and iterative identification in a much smaller time scale than control [2], a change in the controller may still result in a rapid shift in the system’s operating conditions, compromising safety and performance.
The methods we propose in this paper enforce the consistency between the designed closed-loop system and the data encountered during learning. We achieve this consistency by augmenting the standard linear quadratic regulator (LQR) problem with affine regularization terms and linear matrix inequalities (LMIs) in the corresponding decision variables. The result is an affine semi-definite program (SDP) with LMI constraints that can be solved, even for high-dimensional systems, efficiently and using off-the-shelf software packages. Furthermore, we introduce a single hyperparameter that enforces the consistency level; in other words, this hyperparameter is an exploration vs exploitation balance factor. Our method allows for an iterative identification (data collection) and control that prevents rapid distributional shifts and the related sudden violations of data consistency.
In mathematical terms, this augmentation of the standard LQR problem includes working with the parametrization given by the controllability-type Gramian [25], which is also the steady-state covariance matrix of the system state. This makes it possible to relate the closed-loop state-input distribution to that of the data, using LMI constraints and/or regularization terms such as the Frobenius norm on covariance matrices or the Kullback–Leibler divergence between distributions. Throughout the paper we call a control design approach data-conforming if it enforces the consistency with data according to our aforementioned methods.
Inspired by the problems addressed in [4], our problem is closest in spirit to the so-called Rohrs’ counterexample [33], which sought to invalidate adaptive control algorithms in the 1980s. Part of Rohrs’ argument is that every physical system has unmodeled (parasitic) high-frequency dynamics. Because contemporary adaptive control led to highly nonlinear loops even for simple linear systems, these loops could internally generate high-frequency signals, exciting the unmodeled dynamics and possibly leading to instability. Our concern regarding modern data-driven control is analogous to that of Rohrs’ with adaptive control, but we target unmodeled nonlinear dynamics instead of unmodeled high-frequency ones. Through iterative identification and carefully expanding the closed-loop bandwidth, one can avoid exciting high-frequency dynamics [3]. Analogously, through iterative identification and careful data-conforming control design, one can avoid sudden exposures to new areas of the state space, which potentially contain harmful, unmodeled nonlinearities.
Our data-conforming control design approach also bears resemblance to the unfalsified control approach [12] in enforcing consistency with past data. Contrasting with this approach, which categorizes consistency through Boolean values, our approach is built around consistency in distances between distributions, allowing for generalized formulations that can cope with stochastic dynamics. Moreover, the method of [12] relies on model reference adaptive control, whereas our approach augments modern optimal control design via extending the standard LQR problem with affine regularization terms and LMI constraints. Our approach readily integrates with many modern control design formulations, allows for multivariate state-space systems, and can be extended to handle receding-horizon and input-output formulations.
The idea of consistency with data is also fundamental in the field of offline RL [1], but there the main motivation is data efficiency by initially limiting exploration. Offline RL algorithms are generally complex [20] and rely on stochastic nonlinear programming methods. Instead, our approach insists on computationally efficient methods that blend with modern control design techniques.
One can argue that modern data-driven policy gradient methods [42, 19] with small enough step sizes can prevent sudden distributional shifts and therefore enhance consistency with data. This is indeed a plausible argument. However, it is difficult to relate the control gain step size to the corresponding distributional shift beyond a simple continuity argument. Instead, our approach deals explicitly with distributions and directly dampens their shifts. Moreover, a heavily dampened policy gradient procedure may be slow and hence inadequate for many time-varying dynamics, limiting its applicability to constant or pseudo-constant dynamics. On the other hand, our approach decouples distributional shifts from learning system variations and therefore can accomplish both tasks effectively and simultaneously.
We conclude the paper with a simple yet telling numerical example that explains the premature generalization problem inherent in modern data-driven approaches and shows how our proposed solutions can mitigate this problem.
2 Problem Formulation
Consider the dynamic system
$$x_{k+1} = f(x_k, u_k) + w_k, \tag{1}$$
where $x_k$ is the state and $u_k$ is the control input. The function $f$ is bounded but unknown and possibly nonlinear. The exogenous disturbance $w_k$ is independent and identically distributed. It is also independent from $x_0$, the initial condition, which is also random and has finite mean and covariance.
The goal of this paper is to design a control law that minimizes the cost function
(2) |
where the weighting matrices $Q$ and $R$ are positive semi-definite and positive definite, respectively (in this paper, all positive semi-definite and positive definite matrices are also symmetric), and the expectation averages over all the possible realizations of the initial condition and the disturbance.
Since is unknown, we assume that we have access to it only through experiments. Hence, we rely on the data-driven paradigm for the control design. During the first data collection experiment, we assume an initial control law of the form
(3) |
where this law is locally stabilizing (not necessarily optimal) and is a persistently exciting (PE) signal, in the following sense.
Assumption 1.
(The PE assumption [14, condition (6)]). The signal in (3) is PE for (1). That is, given a sufficiently large natural number of samples (the lower bound is the minimum number of measurements required to identify a linear model of the system [40]), every control realization and the corresponding state realization produce the matrix
(4) |
with full row rank .
The PE assumption is typical in system identification and data-driven control design and equates to an effective experiment design and a minimal descriptive state definition.
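To make the rank condition concrete, the following minimal sketch checks it for a scalar state and a scalar input, where the data matrix has two rows (states stacked over inputs) and full row rank is equivalent to a positive Gram determinant. The plant parameters, the initial gain, and the sinusoidal excitation are hypothetical values chosen for illustration only, and the rollout is noise-free.

```python
import math

def simulate(a, b, k0, excitation, x0=1.0):
    """Roll out x_{t+1} = a*x_t + b*u_t with u_t = k0*x_t + e_t (hypothetical scalar plant)."""
    xs, us, x = [], [], x0
    for e in excitation:
        u = k0 * x + e
        xs.append(x)
        us.append(u)
        x = a * x + b * u
    return xs, us

def has_full_row_rank(xs, us, tol=1e-9):
    """Check rank([xs; us]) == 2 through the determinant of the 2x2 Gram matrix D D^T."""
    sxx = sum(x * x for x in xs)
    suu = sum(u * u for u in us)
    sxu = sum(x * u for x, u in zip(xs, us))
    gram_det = sxx * suu - sxu * sxu
    # Compare against a scale-aware threshold to absorb floating-point round-off.
    return gram_det > tol * sxx * suu

N = 50
a, b, k0 = 0.9, 0.5, -0.4

# Without excitation the input is an exact multiple of the state: rank deficient.
xs_plain, us_plain = simulate(a, b, k0, [0.0] * N)

# With an added excitation signal the two rows become linearly independent.
xs_pe, us_pe = simulate(a, b, k0, [math.sin(1.7 * t) for t in range(N)])

print(has_full_row_rank(xs_plain, us_plain), has_full_row_rank(xs_pe, us_pe))
```

The first rollout shows why pure state feedback cannot satisfy the assumption on its own: the input row is a scalar multiple of the state row, so an added excitation signal is needed.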
Problem statement
Starting from the initial control law of the form (3) and given access to the resulting input and state data (4), with the possibility of iteratively changing the control law and collecting new data, design a linear state feedback gain that minimizes the cost in (2).
Assumption 2.
The data (4) is centered around zero, and the state-input joint distribution can be approximated by a multivariate Gaussian with a positive definite covariance.
This assumption is true if the underlying system is linear Gaussian. For nonlinear systems, however, it can be satisfied, for instance, if the data were collected locally within a specified region of the state-input space under some operational limits. In other words, this assumption is satisfied if there is a “good enough” linear approximation of the underlying system within each set of operating conditions under which the data (4) are collected.
One may ask about the purpose of considering a general nonlinear system in (1) if the data can be predicted by an approximate linear model. The reason is to remember that once a new controller is designed, the resulting closed-loop system may operate in regions of the state-input space not adequately explored by the learning data and over which this approximate linear model is no longer a valid approximation. This point can be easily forgotten if we start with a linear system in (1).
3 Background
In this section we start by presenting the standard LQR problem. Although the problem formulation targets the nonlinear dynamics (1), our derivations in the subsequent sections assume the existence of a valid linear approximation under each operating condition.
3.1 Standard LQR
Suppose, in this subsection only, that the state-space system (1) is linear,
$$x_{k+1} = A x_k + B u_k + w_k,$$
where $x_k$, $u_k$, and $w_k$ are random variables whose statistics are as in (1), the disturbance $w_k$ has zero mean and a known covariance (known in this subsection only; it will be approximated from data in the later subsections), and the pair $(A, B)$ is known and controllable [25]. The LQR problem is to obtain a controller of the form $u_k = K x_k + e_k$, where $e_k$ is a white noise of zero mean and a fixed user-defined covariance, to minimize (2). The signal $e_k$ is included to preserve Assumption 1 for future or iterative control design.
The cost function (2), up to an additive constant in , can be rewritten as (for a detailed derivation, check Appendix A.1)
(5) |
where is the observability-type Gramian [25], given by
(6) |
where and are the user-defined weighting matrices, as in (2), and is assumed detectable. The optimal control gain (which minimizes ) is then given by the solution to the following problem.
Problem 1.
Standard LQR (Observability-type Gramian) [8, Sec. 4.1]:
when satisfies the algebraic Riccati equation
The problem above results from the square completion in of the Lyapunov equation
(7) |
then choosing the gain that zeroes the completed square, which corresponds to the minimum Gramian (in the Loewner ordering sense).
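As an illustration of the Riccati-based solution, the sketch below solves a scalar instance by fixed-point iteration and checks that the resulting gain also satisfies the Lyapunov equation for the closed loop. It assumes the sign convention u = k*x; the symbols `a`, `b`, `q`, `r` and their numerical values are hypothetical, and the exploration signal is ignored.

```python
def scalar_lqr(a, b, q, r, iters=500):
    """Iterate the scalar discrete-time Riccati recursion to its fixed point.

    Assumes the plant x_{t+1} = a*x_t + b*u_t, stage cost q*x^2 + r*u^2,
    and the feedback convention u_t = k*x_t.
    """
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    k = -(a * b * p) / (r + b * b * p)
    return p, k

a, b, q, r = 1.2, 1.0, 1.0, 1.0  # open-loop unstable example (|a| > 1)
p, k = scalar_lqr(a, b, q, r)
closed_loop = a + b * k

# Residual of the algebraic Riccati equation at the computed fixed point.
riccati_residual = p - (q + a * a * p - (a * b * p) ** 2 / (r + b * b * p))

# The same p solves the Lyapunov equation of the closed loop under the optimal gain.
lyapunov_residual = p - (q + r * k * k + closed_loop ** 2 * p)

print(p, k, closed_loop)
```

The two residuals vanishing together reflects the square-completion argument: at the optimal gain, the Lyapunov equation collapses to the Riccati equation.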
There is an equivalent characterization to solving the LQR problem. Using the linearity and the cyclic property of the trace, the cost can be written as (check Appendix A.1 for more details)
(8) | ||||
where is the controllability-type Gramian, given by
(9) | ||||
and is also the solution of the Lyapunov equation
(10) |
Since and the pair is controllable, the controllability Gramian satisfies .
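The equivalence between the observability-type and controllability-type expressions of the cost can be checked numerically in the scalar case. The sketch below is a simplified illustration under stated assumptions: a fixed stabilizing gain `k`, the convention u = k*x, no exploration signal, and hypothetical parameter values. Both Gramians are computed by iterating their Lyapunov recursions.

```python
def steady_state_variance(acl, w, iters=2000):
    """Fixed point of sigma = acl^2 * sigma + w (controllability-type Gramian, scalar case)."""
    sigma = 0.0
    for _ in range(iters):
        sigma = acl * acl * sigma + w
    return sigma

def cost_to_go(acl, q, r, k, iters=2000):
    """Fixed point of p = q + r*k^2 + acl^2 * p (observability-type Gramian, scalar case)."""
    p = 0.0
    for _ in range(iters):
        p = q + r * k * k + acl * acl * p
    return p

a, b, q, r, w = 1.2, 1.0, 1.0, 1.0, 0.3
k = -0.8                      # any stabilizing gain, not necessarily optimal
acl = a + b * k               # closed-loop pole, here about 0.4

sigma = steady_state_variance(acl, w)
p = cost_to_go(acl, q, r, k)

cost_observability = p * w                    # trace(P W) in the matrix case
cost_controllability = (q + r * k * k) * sigma  # trace((Q + K^T R K) Sigma)

print(cost_observability, cost_controllability)
```

The second expression is the one that exposes the steady-state state variance directly, which is what makes it convenient for comparing against an empirical covariance.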
Using this parametrization, we first define the extra variable , such that , which is equivalent to (the Schur complement corresponding to) the LMI (footnote: since congruence preserves definiteness, that is, for any full rank matrix , if and only if . is taken to be in the above instance, and hence, both and appear on the block diagonal of . Since , it must be true that .)
using the change of variables . The stability-imposing inequality relaxation of (10) (we discuss the consequences of this relaxation in Section 5) can also be described as an LMI, that is,
Therefore, the LQR problem can be described as an SDP with affine cost and LMI constraints [9].
Problem 2.
Standard LQR (Controllability-type Gramian) (from a numerical perspective, to avoid open sets in the domain of the optimization problem, each strict inequality $\succ 0$ can be replaced by $\succeq \epsilon I$, where $\epsilon > 0$ is very small):
where the change of variables has been used and the optimal control is then recovered from the optimal values and through .
The choice of using the controllability-type parametrization of the cost (8) is motivated by (9); the controllability-type Gramian is the steady-state state covariance matrix and thus has a statistical interpretation. Since this paper is in the context of data-driven control, this matrix is of great importance, and having it appear explicitly is key. We show this in Section 4.
3.2 Data-driven certainty equivalence
In the case of unknown system matrices and , conventional (indirect) data-driven approaches identify a model and , then solve the standard LQR problem with the identified model assumed to represent the true system exactly, a procedure termed certainty equivalence LQR.
The model estimates can be obtained as a solution to the least-squares problem [42]
(11) |
where is the Frobenius norm. Using the acquired data (4), we let , , and .
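For a scalar state and input, the least-squares identification step reduces to a 2x2 system of normal equations. The sketch below is an illustration under assumptions, not the authors' implementation: the plant parameters, the initial gain, and the excitation are hypothetical, and the data are noise-free so that the estimates match the true parameters.

```python
import math

def fit_linear_model(xs, us, xs_next):
    """Least-squares estimate (a_hat, b_hat) of x_next ~ a*x + b*u via the normal equations."""
    sxx = sum(x * x for x in xs)
    sxu = sum(x * u for x, u in zip(xs, us))
    suu = sum(u * u for u in us)
    sxy = sum(x * y for x, y in zip(xs, xs_next))
    suy = sum(u * y for u, y in zip(us, xs_next))
    det = sxx * suu - sxu * sxu
    if abs(det) < 1e-12:
        raise ValueError("data matrix is rank deficient (not persistently exciting)")
    a_hat = (suu * sxy - sxu * suy) / det
    b_hat = (sxx * suy - sxu * sxy) / det
    return a_hat, b_hat

# Hypothetical true plant and data-collection policy u_t = k0*x_t + e_t.
a_true, b_true, k0 = 0.9, 0.5, -0.4
x, xs, us, xs_next = 1.0, [], [], []
for t in range(100):
    u = k0 * x + math.sin(1.7 * t)
    x_new = a_true * x + b_true * u
    xs.append(x)
    us.append(u)
    xs_next.append(x_new)
    x = x_new

a_hat, b_hat = fit_linear_model(xs, us, xs_next)
print(a_hat, b_hat)
```

The determinant guard mirrors the PE assumption: without excitation the normal equations are singular and the model is not identifiable.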
The certainty equivalence LQR problem can be decomposed into an identification problem and then a control design problem. We describe this two-level approach as the following problem.
Problem 3.
Certainty equivalence LQR:
The optimal control is recovered from the optimal values and through .
This data-driven approach is a basic certainty equivalence control design approach. For a discussion of the robustness and the sample efficiency, and to see how to enforce robustness under the existence of noise () when the underlying model (1) is linear, an interested reader can consult [16] and [38]. In this paper, however, we tackle the problem of the false generalization of nonlinear systems by linear ones, beyond data, which is implicit in data-driven control algorithms.
Notice that if the underlying system is indeed linear, the region of the state-input space from which the data (4) has been acquired is irrelevant, as long as the PE assumption, Assumption 1, is met. The reason is that if the system is linear, predicting its behavior in any region equates to predicting its behavior universally across the state-input space. This generally does not hold if the underlying system is nonlinear.
One shortcoming of using Problem 3 as the optimal control model is that once a linear model is identified and the resulting gain matrix is applied in feedback, the region of the state space in which the new closed-loop system operates may not be within or close to the region explored by the data (4) in the identification step. Therefore, if the underlying system is nonlinear (this is typically the realistic case, as many systems that are considered linear are not so beyond some operating conditions), the new controller may force the system into regions of the domain of the dynamics over which the system admits behaviors that are very different from what was experienced in the data. This implicit and inherent false universal generalization may in turn result in serious violations of safety or lead to degraded performance.
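The false-generalization issue can be reproduced with a toy example. In the sketch below, a hypothetical scalar plant with a cubic term is excited only near the origin, a linear model is fitted by least squares, and its one-step prediction error is compared inside and far outside the explored region. The plant, gain, and excitation amplitude are all assumptions made for illustration.

```python
import math

def plant(x, u):
    """Hypothetical true dynamics: nearly linear close to the origin, cubic far from it."""
    return 0.9 * x + 0.5 * u + 0.2 * x ** 3

def fit_linear_model(xs, us, ys):
    """Least-squares fit of y ~ a*x + b*u through the 2x2 normal equations."""
    sxx = sum(x * x for x in xs)
    sxu = sum(x * u for x, u in zip(xs, us))
    suu = sum(u * u for u in us)
    sxy = sum(x * y for x, y in zip(xs, ys))
    suy = sum(u * y for u, y in zip(us, ys))
    det = sxx * suu - sxu * sxu
    return (suu * sxy - sxu * suy) / det, (sxx * suy - sxu * sxy) / det

# Data collected locally: small initial condition and small excitation.
x, xs, us, ys = 0.02, [], [], []
for t in range(200):
    u = -0.4 * x + 0.05 * math.sin(1.7 * t)
    y = plant(x, u)
    xs.append(x)
    us.append(u)
    ys.append(y)
    x = y

a_hat, b_hat = fit_linear_model(xs, us, ys)

def prediction_error(x, u):
    """Absolute one-step error of the identified linear model against the true plant."""
    return abs(plant(x, u) - (a_hat * x + b_hat * u))

local_error = prediction_error(0.02, 0.0)  # inside the explored region
far_error = prediction_error(2.0, 0.0)     # far outside the explored region
print(local_error, far_error)
```

The identified model is an excellent local predictor, yet nothing in the data warns that it fails at states a controller might later drive the system to.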
4 Data-conforming data-driven control
In practice, for a controller to conform to the learning data, a measure of similarity between distributions has to be adopted and augmented to the cost/constraints of the LQR problem. For this purpose, several approaches and simplifications can be considered. We explore some of them in this section.
4.1 Conforming to the state data
Notice that Problem 3 is a convex optimization problem and that the decision variable , the steady-state state covariance matrix, can be enforced to be similar to the state empirical covariance from data, via convex constraints or regularization.
Using (4), let be the state empirical covariance:
The data-conforming (in the state) version of Problem 3 can now be stated as follows.
Problem 4.
Data-conforming (in the state) LQR via a hard constraint:
and, similarly, the optimal control is recovered from the optimal values and through .
The new hard linear constraint has two shortcomings: (i) possible numerical instabilities and/or feasibility issues and (ii) limits to exploration beyond the learning data. That is, potential future improvements via some exploration are not possible because the new controller will always seek to generate the same state distribution. This can be relaxed by bounding the design covariance above and below within some margin of the empirical covariance, with the margin predetermined or possibly included in the cost as a slack variable. This flexibility in the constraint, represented by the margin, can determine the exploration vs exploitation balance of the new control design.
One can also design a convex regularization term using some norm on the covariance matrices. For example, the squared Frobenius norm can be used by minimizing , where , or equivalently as an LMI [10],
We use this regularization term in the following modified problem.
Problem 5.
Data-conforming (in the state) LQR via regularization:
where and the optimal control is recovered from the optimal values and through .
The user-defined weight plays an analogous role to in the discussion succeeding Problem 4, in determining the exploration vs exploitation balance of the new control design.
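The role of the regularization weight can be illustrated on a scalar system, where the squared Frobenius norm reduces to a squared difference of variances. The sketch below replaces the SDP with a brute-force grid search over the gain, which is a swapped-in technique used only for illustration. All parameter values, the name `lam` for the weight, and the convention u = k*x are assumptions.

```python
def closed_loop_variance(a, b, k, w):
    """Steady-state state variance under u = k*x; valid only when |a + b*k| < 1."""
    acl = a + b * k
    return w / (1.0 - acl * acl)

def regularized_design(a, b, q, r, w, sigma_data, lam, grid):
    """Grid search for the gain minimizing LQR cost + lam * (sigma - sigma_data)^2."""
    best = None
    for k in grid:
        if abs(a + b * k) >= 1.0:
            continue  # skip destabilizing gains
        sigma = closed_loop_variance(a, b, k, w)
        objective = (q + r * k * k) * sigma + lam * (sigma - sigma_data) ** 2
        if best is None or objective < best[0]:
            best = (objective, k, sigma)
    return best[1], best[2]

a, b, q, r, w = 0.9, 0.5, 1.0, 0.1, 0.3
k_data = -0.4                                        # gain used during data collection
sigma_data = closed_loop_variance(a, b, k_data, w)   # stands in for the empirical variance
grid = [-3.7 + 0.001 * i for i in range(3800)]

weights = (0.0, 1.0, 100.0)
gaps = []
for lam in weights:
    k_opt, sigma_opt = regularized_design(a, b, q, r, w, sigma_data, lam, grid)
    gaps.append(abs(sigma_opt - sigma_data))
    print(lam, k_opt, sigma_opt)
```

With a zero weight the search returns the unregularized LQR gain, whose state variance is far from the data; increasing the weight pulls the design back toward the variance seen in the data.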
Only the state data distribution is used in Problems 4 and 5. This puts no emphasis on the input data distribution and carries the implicit assumption that the underlying true system (1) is linear with constant coefficients in . That is, the seen inputs have the same effect regardless of the system state at their occurrence. In the following subsection, however, we enforce data conformation in the state-input joint distribution instead, which is more suited to general nonlinear systems.
4.2 Conforming to the joint state-input data
We model the steady-state state-input joint density of the new design and the one estimated from the data as zero-mean Gaussian densities (the notation $\mathcal{N}(\mu, \Sigma)$ denotes a Gaussian density of mean $\mu$ and covariance $\Sigma$). The Gaussian property of these densities and their zero means follow from Assumption 2. Therefore, these densities are fully characterized by their covariance matrices.
Since , the design covariance matrix satisfies
(12) |
where is as defined in (9). The empirical covariance matrix satisfies
(13) |
where and .
It is not clear whether one can enforce the closeness of the design covariance to the empirical covariance, as was done in Problems 4 and 5, via LMIs in the decision variables. Instead, we use the Kullback–Leibler (KL) divergence between the design and data densities, which reduces to an expression in terms of their covariances. This expression, under some relaxation, can be posed as an affine regularization term and LMI constraints in the decision variables.
Although the KL divergence is not a full-fledged metric, it is non-negative, and zero if and only if the two distributions are equal (almost everywhere) [37, Sec. 8.6]. It is a measure of the inefficiency of assuming one distribution when the true distribution is another, and it has been used in the contexts of exploration vs. exploitation in artificial intelligence and RL [35] and in constructing ambiguity sets in distributionally robust optimization techniques [29].
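The properties just listed can be verified on the simplest case of two zero-mean scalar Gaussians, for which the KL divergence has a well-known closed form in terms of the two variances. The function name and the test values below are illustrative choices.

```python
import math

def kl_zero_mean_gaussians(var_p, var_q):
    """KL(p || q) for p = N(0, var_p) and q = N(0, var_q), scalar case."""
    ratio = var_p / var_q
    return 0.5 * (ratio - 1.0 - math.log(ratio))

same = kl_zero_mean_gaussians(1.0, 1.0)      # identical distributions
forward = kl_zero_mean_gaussians(2.0, 1.0)   # KL(p || q)
backward = kl_zero_mean_gaussians(1.0, 2.0)  # KL(q || p)

print(same, forward, backward)
```

The divergence vanishes only for equal variances and is positive otherwise, while the two directions differ, which is why it is not a metric.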
Lemma 1.
We drop the second term in (14), since , and the additive constant , since it does not alter the optimal control. The modified KL-based regularization term is then
We now employ an approximation to reshape into a form amenable to the linear control design machinery, using the following results.
Lemma 2.
The algebraic equality
(15) |
holds. (The function $\log$ on the left is the real-valued logarithm, while the one on the right is the matrix logarithm, in the sense that the matrix exponential of the logarithm of a matrix equals the matrix itself.)
Proof.
This is immediate from a result in linear algebra for positive definite matrices. We can write
where are implied by the positive definiteness of and , and the identity for nonsingular matrices. ∎
Toward an approximation of (15) that is convex in , suppose . That is, suppose for a small perturbation matrix . The matrix logarithm can be written in terms of its Taylor expansion as
which, for a first-order truncation, satisfies
(16) |
Substituting the approximation (16), after dropping the identity (additive constant), in (15), we can use a surrogate version of , call it , as a regularization term. For a fixed ,
(17) |
where the factor in is dropped and will be replaced by a hyperparameter . The new version of the cost (8), with as a regularization term and a user-defined hyperparameter, is now designated by
(18) |
Next we show that the regularization term , although an approximation of the KL divergence, has favorable properties for our purposes.
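A scalar comparison gives some intuition for the surrogate. The sketch below assumes the divergence is taken from the design density to the data density and writes both quantities in terms of the variance ratio (design over data). For comparability it keeps the one-half factor and the additive constant that the surrogate in the text drops; the function names are hypothetical.

```python
import math

def kl_design_to_data(ratio):
    """Scalar zero-mean Gaussian KL divergence as a function of the variance ratio."""
    return 0.5 * (ratio - 1.0 - math.log(ratio))

def surrogate(ratio):
    """Surrogate obtained by replacing log(1/ratio) with its first-order term 1/ratio - 1."""
    return 0.5 * (ratio + 1.0 / ratio - 2.0)

ratios = [0.5, 0.9, 1.0, 1.1, 2.0]
for y in ratios:
    print(y, kl_design_to_data(y), surrogate(y))
```

In this scalar setting the surrogate upper-bounds the KL divergence, since the logarithm never exceeds its first-order expansion, and both vanish exactly when the two variances coincide.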
4.3 Properties of
We show that is the unique minimizer of the regularization term , or equivalently of as . That is, the regularization term enforces conforming to the data distribution, even though is not the KL divergence, after adopting the approximation (16). We take as given the arithmetic and geometric means inequality, which we state as the following lemma.
Lemma 3.
Let $x > 0$. Then $x + \frac{1}{x} \geq 2$, with equality if and only if $x = 1$. ∎
From this, we conclude the subsequent lemma about positive definite matrices.
Lemma 4.
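Let $X$ be an $n \times n$ symmetric positive definite matrix. Then $X = I$ is the unique minimizer of $\operatorname{tr}(X + X^{-1})$.
Proof.
Since $X$ is symmetric positive definite, we may write the unitary diagonalization $X = U \Lambda U^\top$, where $\Lambda$ is diagonal and $U$ is unitary. Therefore, by algebraic identities, $X^{-1} = U \Lambda^{-1} U^\top$. That is,
$$\operatorname{tr}(X + X^{-1}) = \sum_{i=1}^{n} \left( \lambda_i + \frac{1}{\lambda_i} \right),$$
where $\lambda_i > 0$ are the eigenvalues of $X$. The unique minimum happens when each $\lambda_i = 1$, per Lemma 3. That is, $X = I$ (the only positive definite matrix whose eigenvalues all equal $1$) is the unique minimizer of $\operatorname{tr}(X + X^{-1})$. ∎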
Theorem 1.
is the global minimizer of
Proof.
By the linearity and the cyclic property of the trace, we can rewrite as follows:
If we let , then we have
Showing that is the unique global minimizer of equates to showing that is the unique global minimizer of , which is true by Lemma 4. ∎
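The minimizer property can be spot-checked numerically for 2x2 covariances. The sketch below assumes the regularization term has the two-trace form used in the proof (data-inverse times design, plus design-inverse times data); the matrices and helper names are hypothetical. It is a numerical illustration on a few candidates, not a proof.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def mul2(m, n):
    """Product of two 2x2 matrices."""
    return [[sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def trace2(m):
    return m[0][0] + m[1][1]

def regularizer(sigma, sigma_data):
    """tr(sigma_data^{-1} sigma) + tr(sigma^{-1} sigma_data), the assumed surrogate form."""
    return trace2(mul2(inv2(sigma_data), sigma)) + trace2(mul2(inv2(sigma), sigma_data))

sigma_data = [[2.0, 0.5], [0.5, 1.0]]
candidates = [
    [[2.2, 0.5], [0.5, 1.0]],   # slightly inflated first variance
    [[2.0, 0.3], [0.3, 1.0]],   # weaker correlation
    [[1.0, 0.0], [0.0, 1.0]],   # identity
    [[8.0, 2.0], [2.0, 4.0]],   # data covariance scaled by four
]

value_at_data = regularizer(sigma_data, sigma_data)   # equals 2n = 4 for n = 2
values = [regularizer(c, sigma_data) for c in candidates]
print(value_at_data, values)
```

Every candidate other than the data covariance itself yields a strictly larger value, consistent with the eigenvalue argument of Lemma 4.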
Next we show that the regularization term not only has a unique global minimizer that we seek to enforce, that is, , but is also computationally appealing, convex in particular.
Theorem 2.
is convex in .
Proof.
The domain—the set of all positive definite matrices—is convex. The first term of is linear in , while the second term
where is the th column of . Each entry in the sum is convex in , as shown in [11, p. 76]. ∎
Notice that is the Jeffreys divergence, a discrepancy metric between distributions [30] (Gaussians in this case). The above derivations highlight its relation to the KL divergence, which is more familiar in the exploration vs exploitation context.
The above results are concerned with the regularization term as a function of . However, the original LQR problem, whether Problem 2 or 3, has and () as decision variables. We show next that, under some relaxation, can be related back to the original decision variables affinely in the cost together with LMI constraints.
4.4 Representing in terms of and
The first term of , , can be upper-bounded by , where , or equivalently, committing to the change of variables ,
which can be equivalently described by the LMI
(19) |
where .
The second term is more involved since it contains the inverse of . Applying the inverse of a partitioned matrix [24], we have
which exists and is positive definite, since . Hence, the second term of
(20) |
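The partitioned-inverse step is easiest to see with a scalar state and a scalar input. The sketch below assumes the joint covariance structure implied by a law of the form u = k*x + e with e independent of x, and checks the closed-form inverse by direct multiplication. The symbols `s` (state variance), `k` (gain), and `e_var` (exploration variance) and their values are hypothetical.

```python
def joint_covariance(s, k, e_var):
    """Covariance of (x, u) when u = k*x + e, var(x) = s, var(e) = e_var, e independent of x."""
    return [[s, s * k], [k * s, k * k * s + e_var]]

def joint_covariance_inverse(s, k, e_var):
    """Closed-form inverse from the partitioned-matrix formula.

    The Schur complement of the state block is e_var, so the inverse depends on the
    state variance only through the single entry 1/s.
    """
    return [[1.0 / s + k * k / e_var, -k / e_var], [-k / e_var, 1.0 / e_var]]

def mul2(m, n):
    """Product of two 2x2 matrices."""
    return [[sum(m[i][j] * n[j][l] for j in range(2)) for l in range(2)] for i in range(2)]

s, k, e_var = 0.6, -0.8, 0.05
product = mul2(joint_covariance(s, k, e_var), joint_covariance_inverse(s, k, e_var))
print(product)
```

The structure of the inverse shows where the difficulty in the second term comes from: the gain enters quadratically and the state variance enters through its inverse.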
Lemma 5.
We can rewrite the following term as
(21) | ||||
Proof.
Using the linearity and the cyclic property of the trace, we reduce the right-hand side to the left one. ∎
After dropping the additive constants in and , from (4.4) and (21), and toward forming LMIs, we relax the right-hand side of (21) by assuming ; thus, . We use the extra variable such that
which, equivalently, as an LMI, is
(22) |
We show empirically in Section 6 that, even after the above relaxation, the consistency with state-input joint distribution is still effectively enforced.
The term can also be described by an LMI by using an extra variable . Hence,
(23) |
4.5 The joint state-input data-conforming data-driven LQR
Toward a state-input data-conforming LQR control, we adjust Problem 3 by including the regularization term and the accompanying extra variables and LMIs defined in (19), (22), and (23).
Problem 6.
Data-conforming (jointly in the state-input) data-driven LQR:
The optimal control is recovered from the optimal values and through .
Remark 1.
Notice that if the zero mean condition is to be relaxed in Assumption 2, the corresponding term with in (14) and the cost (2) are quadratic in the variables and , where , and subject to the linear constraint given by steady-state mean dynamics
One therefore can solve Problem 6 to find the optimal gain and covariance and then, as a second stage, form a quadratic program in .
This is common in scenarios where the current operating point during the data collection is different from the intended one. In such a case, we center the data and the system (1) around the latter and incorporate the quadratic program discussed above.
Remark 2.
As mentioned previously, the identification step inside Problems 3, 4, 5, and 6 resembles a naive certainty equivalence approach [16]. One can enhance the robustness of these problems by constructing a convex hull or a ball, according to some system norm, centered around the identified model, and then searching for a controller that stabilizes every model in this set while minimizing the cost and satisfying the other constraints. This adjustment, say, to Problem 6, can be done through additional LMI constraints [10] in the same decision variables. One can also, again through LMI constraints, enforce robustness under state measurement noise (state is measured through , where , for some white noise ) as in [14] for when the covariance of the state evolution disturbance , as in [38].
Remark 3.
The robustness results referred to in Remark 2 hold when the underlying true system (1) is linear. This motivates a new interpretation of our data-conforming framework.
Suppose data has been collected, a system has been identified, and a convex hull or a ball has been constructed around this system’s estimate. When the new robust (e.g., according to [10]) controller is applied, new regions of the joint state-input space might be visited, due to the distributional shift, and new modes of nonlinearity activated. If the new system is then identified and a new convex hull or ball constructed, they might differ from the ones used in the design process.
In other words, because of the nonlinearity of the underlying model in our case, the application of the new control law equates to the transformation of . Hence, this invalidates the premise on which these robust control design approaches were built.
5 On the optimality conditions of the proposed problems
In the proposed SDPs, we relaxed the Lyapunov equality (10) into an inequality so that we can represent it as an LMI in the decision variables. This relaxation has no effect on the standard LQR SDP, as we will show shortly, but may have some consequences for the proposed data-conforming approaches.
Notice that Problems 3, 5 and 6, before the change of variables and before applying the equivalent LMI formulations, can be put in the following form (we use instead of for ease of notation)
s.t. | |||
(24) |
where in Problem 3,
in Problem 5, and
(up to an additive constant in ) in Problem 6, where the matrices in calligraphic font denote the block matrices of the appropriate size of , that is,
In the above problems, the Lagrangian is given by
where is the dual matrix variable corresponding to the Lyapunov inequality constraint. From the dual feasibility and the zero gradient Karush-Kuhn-Tucker (KKT) [11] optimality conditions (derivatives of traces of matrices [31]), the optimal solution has to satisfy [22]:
Notice that in the standard LQR formulation, Problem 3, when , is nothing but the observability-type Gramian in (7), that is, . The positive definiteness of then implies that in the complementary slackness condition,
the constraint has to be active to satisfy the above condition, that is, satisfy equation (10),
or in other words, is exactly the steady-state covariance of the system state we expect to see in future data when applying the law .
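The step from a positive definite dual variable to an active constraint can be spelled out as follows. This is a short sketch in generic notation: $\Lambda$ (the dual variable) and $S$ (the slack of the Lyapunov inequality) are placeholder symbols introduced here for illustration, not the paper's notation.

```latex
\Lambda \succ 0,\quad S \succeq 0,\quad \operatorname{tr}(\Lambda S) = 0
\;\Longrightarrow\;
\operatorname{tr}\!\big(\Lambda^{1/2} S\, \Lambda^{1/2}\big) = 0
\;\Longrightarrow\;
\Lambda^{1/2} S\, \Lambda^{1/2} = 0
\;\Longrightarrow\;
S = 0 .
```

The second implication uses the fact that a positive semi-definite matrix with zero trace is the zero matrix, and the third uses the invertibility of $\Lambda^{1/2}$. A zero slack means the Lyapunov inequality holds with equality.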
This, however, is not generally true for Problems 5 and 6, where the Lyapunov inequality may be inactive in certain situations. The zero-gradient conditions for these two problems are, respectively,
and
Using the complementary slackness condition again, to guarantee the satisfaction of the equality (10) in each problem, the terms
(25) |
and
(26) | |||
have to be positive definite, which might not be the case if is significantly larger than . However, we show next that even if the terms (25) and (26) are not positive definite, it is still beneficial to have the inequality (24) inactive.
By the contrapositive argument, if the inequality constraint (24) is inactive, the terms (25) and (26) are either indefinite or negative semi-definite. This can happen when some eigenvalues of are significantly larger than some of ’s. In such a case, two important observations can be made: (i) when has larger eigenvalues than , the data show that aggressive exploration has been done along the direction of the corresponding eigenvectors, while less aggressive (conservative) control design is achieved across the same directions; (ii) suppose is the solution of the Lyapunov equation (10) when , then , that is, the actual achieved design is more conservative in its exploration tendencies than the intended design. The inequality holds because when the Lyapunov inequality (24) is inactive, it is implied that the optimal solution to the SDP is not the fixed point solution of the Lyapunov equation (10), hence, is not the steady-state covariance, but rather an upper bound. That is, is the smallest covariance that satisfies the Lyapunov inequality (24). This can also be seen when writing the optimality conditions of the SDP with the cost and subject to the Lyapunov inequality (24).
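The upper-bound property can be checked numerically. The following is a minimal sketch, assuming the Lyapunov equation has the standard discrete-time form Sigma = A_K Sigma A_K^T + W; the matrices `A_K`, `W`, and `Delta` are hypothetical values chosen only for illustration, and the variable names are not the paper's notation.

```python
# Sketch: the fixed point of the Lyapunov equation lower-bounds any solution
# of the Lyapunov inequality. All numerical values below are hypothetical.

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def add(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def sub(X, Y):
    return [[a - b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def lyapunov_fixed_point(A_K, W, iters=500):
    """Iterate S <- A_K S A_K^T + W; converges when A_K is Schur stable."""
    S = [row[:] for row in W]
    A_Kt = transpose(A_K)
    for _ in range(iters):
        S = add(matmul(matmul(A_K, S), A_Kt), W)
    return S

def is_psd_2x2(M, tol=1e-9):
    """A symmetric 2x2 matrix is PSD iff its diagonal and determinant are >= 0."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return M[0][0] >= -tol and M[1][1] >= -tol and det >= -tol

A_K = [[0.5, 0.2], [0.0, 0.6]]      # hypothetical Schur-stable closed loop
W = [[1.0, 0.0], [0.0, 1.0]]        # hypothetical disturbance covariance
Delta = [[0.3, 0.0], [0.0, 0.1]]    # hypothetical PSD slack

Sigma = lyapunov_fixed_point(A_K, W)                 # satisfies the equality
Sigma_up = lyapunov_fixed_point(A_K, add(W, Delta))  # satisfies the inequality

def lyapunov_slack(S):
    """Return S - (A_K S A_K^T + W); zero means the constraint is active."""
    return sub(S, add(matmul(matmul(A_K, S), transpose(A_K)), W))

residual = lyapunov_slack(Sigma)     # approximately zero
slack = lyapunov_slack(Sigma_up)     # approximately Delta
gap = sub(Sigma_up, Sigma)           # PSD: Sigma_up upper-bounds Sigma
```

Here `Sigma_up` satisfies the inequality with slack `Delta` but not the equality, and `gap` being positive semi-definite illustrates that such a solution overestimates the steady-state covariance.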
The above observations show that the case of inactive constraints corresponds to a more conservative achieved control. We show empirically in the next section that the Lyapunov inequality constraint (24) is mostly active in our simulations, and that our approaches still yield favorable results for nonlinear systems, over the certainty equivalence LQR case, even when the Lyapunov constraint is inactive.
6 Numerical simulations
The scalability of our proposed data-conforming formulations is self-evident; as convex SDPs with affine costs and LMI constraints, they can be scaled to handle problems with high state-input dimensions. Instead, we choose to work with a simple, yet telling problem, crafted to further clarify and illustrate our data-conforming paradigm.
The results of this section can be reproduced using our open-source Julia code found at https://github.com/msramada/data-conforming-control. Suppose the data-generating system (1) is of the form
(27) |
with the noise having a covariance . Toward solving Problems 3 and 5, an experimental simulation was conducted using collected state/input samples. The control , where is stabilizing, and is a white noise of zero mean and covariance (meeting the PE condition, Assumption 1).
Using these simulation samples to form the data matrices (4), we then solve Problems 3 and 5 and show their results in Figure 1. Each control gain resulting from these problems is used in a control law of the form and is run in a new simulation of the same number of time steps (). Figure 1 shows the resulting design distributions compared with the learning data. Notice that as takes larger and larger values, the state design distribution converges to the state data distribution because data conformity is more heavily weighted in the control design.
Notice also in Figure 1 (a) that the (empirical) support of the state distribution of the newly designed closed-loop system goes beyond that of the data. Suppose, for instance, that these poorly explored regions contain issues such as nonlinearities, discontinuities, or some safety or design condition violations. These issues, if not known and modeled by the operator in the control design, are also not discovered by the initial experiment. Hence, a non-data-conforming control, such as the one in Figure 1 (a), can activate these issues because of its inherent generalization beyond data. On the other hand, in (b), (c) and (d), the new controller conforms to the learning data to different levels dictated by the value of in each.
To better illustrate the effect of unknown/unmodeled nonlinearities, we adjust the dynamics (27) such that it is now
(28) |
where . Notice now that the nonlinearity is relatively larger in the not well-explored regions. Naively applying Problem 3 can lead to unexpected behavior and possibly to instability.
Again, we simulate the new nonlinear system with an equivalent control as in the previous case and for time steps. Then, at the last time step, we evaluate the new control laws, using the recorded past data points, and apply each control law immediately in feedback for a new time steps. The results of each simulation are illustrated in Figure 2. The instability resulting from relying on Problem 3 (or, equivalently, Problem 5 with ) is shown in Figure 2: (1). In contrast, Figure 2: (2), (3), and (4) correspond to Problem 5 with , respectively. The latter three cases enforce conforming to data and hence avoid unexplored regions, in particular those where the nonlinearity is dominant and dangerous.
The steps of the above procedure, from (i) an initial experiment of bounded data, to (ii) the control design using the above four problems (Problem 3, and Problem 5 with ), and then (iii) applying each control in feedback for time steps, are repeated ((i)–(iii)) for repetitions. We then calculate the percentage of stable simulations resulting from each procedure. A simulation is declared stable if every state coordinate stays within some high threshold () at every time step of step (iii). The Lyapunov inequality constraint (24) was active in almost all of these simulations. The percentages of the stable simulations are shown in Table 1, further highlighting the importance of data-conforming control design.
Table 1: Percentages of stable simulations (out of 1,000 simulations)
Problem 3 | Problem 5 | Problem 5 | Problem 5 |
(Prob. 5, ) | () | () | () |
21.7% | 99.2% | 99.9% | 100.0% |
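The stability criterion and the percentage computation behind such a table can be sketched as below. This is an illustrative sketch only: the two scalar closed-loop maps, the threshold `1e3`, and the run counts are hypothetical stand-ins and are not the systems (27)–(28) or the settings used in the paper.

```python
import random

def run_is_stable(step, x0, n_steps, threshold, rng):
    """Return True if every state coordinate stays within the threshold
    at every time step (pointwise boundedness check)."""
    x = list(x0)
    for _ in range(n_steps):
        x = step(x, rng)
        # NaN or inf also fail this comparison, so blow-ups count as unstable.
        if not all(abs(xi) <= threshold for xi in x):
            return False
    return True

def stable_percentage(step, n_runs, n_steps, threshold, seed=0):
    """Repeat the closed-loop simulation and report the share of stable runs."""
    rng = random.Random(seed)
    n_stable = 0
    for _ in range(n_runs):
        x0 = [rng.gauss(0.0, 1.0)]  # hypothetical random initial state
        n_stable += run_is_stable(step, x0, n_steps, threshold, rng)
    return 100.0 * n_stable / n_runs

# Hypothetical closed-loop maps standing in for "system + designed gain".
def contracting_step(x, rng):
    return [0.5 * x[0] + 0.1 * rng.gauss(0.0, 1.0)]

def diverging_step(x, rng):
    return [1.5 * x[0] + 0.1 * rng.gauss(0.0, 1.0)]

pct_contracting = stable_percentage(contracting_step, n_runs=200,
                                    n_steps=200, threshold=1e3)
pct_diverging = stable_percentage(diverging_step, n_runs=200,
                                  n_steps=200, threshold=1e3)
print(pct_contracting, pct_diverging)
```

In an actual study, `step` would wrap the nonlinear plant together with the gain returned by each design problem, and the experiment and design steps would be repeated inside the loop.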
The examples in (27) and (28) share, in their structure, that the input enters the system linearly and with constant coefficients. Therefore, the effect of the input is the same on the state evolution, regardless of the contemporaneous state of the system. This is not true in general, and the effect of the input may be coupled with the current state—for instance, in bilinear systems. The following is a modification of (28) with a coupled state-input term:
(29) | ||||
Similar to the previous two examples, we run an experimental simulation of this system for time steps and collect the input and state data as in (4). At the final time step , we use the collected data in solving Problems 3, 5, and 6, then use the resulting control laws for another samples. The results of these simulations are shown in Figure 3, showing the importance of introducing Problem 6 to account for the coupled state-input effect.
Similar to the construction of Table 1, we run the above procedure for repetitions of bounded experiments and calculate the percentages of stable simulations resulting from each of the four control design procedures (Problem 3, Problem 5 (with ) and Problem 6 (with )). The results are recorded in Table 2, which shows the importance of using the joint state-input data distribution when the system’s nonlinearities contain a coupled state-input effect, even though the Lyapunov inequality constraint (24) was mostly inactive.
Table 2: Percentages of stable simulations (out of 1,000 simulations)
Problem 3 | Problem 5 | Problem 5 | Problem 6 |
(Prob. 5, ) | () | () | () |
0.07% | 22.9% | 22.5% | 97.9% |
We note that in all of these simulations, none of the tested problems returned any feasibility issues. Moreover, the computation time of each problem (about six milliseconds for Problem 6 on an Apple M1 Max MacBook Pro with 64 GB of RAM) is comparable to solving a standard LQR problem.
These examples show how the concept of data-conforming control can enhance the safety of the data-driven approaches. In practice, a control engineer can start with a high value of (or ) then gradually reduce it, if needed, to allow some exploration beyond data. One of the important recommendations of the adaptive control literature [5, 4] is to model as much as possible, and leave any model reduction or abstraction for later. Our data-conforming framework can provide the safety and the time necessary for the control engineer to observe, model/interpret, and react to any potential activation of unknown/unmodeled nonlinearities.
7 Conclusion
This paper addresses a problem inherent in many modern data-driven and adaptive control approaches, namely, the problem of the premature and possibly false generalization beyond data, together with its consequences in terms of sudden distributional shifts in the state-input space. We present methods for mitigating this problem through enforcing consistency with the learning data, and we numerically test them. Because our methods are formulated as SDPs with affine costs and LMI constraints, they are computationally efficient, can scale up to systems with dimensions in the hundreds, and can be easily integrated with modern control design approaches.
Further work is being pursued to (i) apply the data-conformation concept in the problem of robust data-driven control, in the sense described in Remarks 2 and 3; and (ii) investigate the possibility of developing data-conforming policy gradient surrogates to dampen distributional shifts resulting from standard policy gradient methods.
Acknowledgment
This material was based upon work supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research (ASCR) under Contract DE-AC02-06CH11357.
References
- [1] R. Agarwal, D. Schuurmans, and M. Norouzi, “An optimistic perspective on offline reinforcement learning,” in International conference on machine learning. PMLR, 2020, pp. 104–114.
- [2] P. Albertos and A. S. Piqueras, Iterative identification and control: advances in theory and applications. Springer Science & Business Media, 2012.
- [3] B. D. Anderson, “Windsurfing approach to iterative control design,” in Iterative Identification and Control: Advances in Theory and Applications. Springer, 2002, pp. 143–166.
- [4] B. D. O. Anderson, “Failures of adaptive control theory and their resolution,” 2005.
- [5] K. Astrom, “A commentary on the C.E. Rohrs et al. paper ‘Robustness of continuous-time adaptive control algorithms in the presence of unmodeled dynamics’,” IEEE Transactions on Automatic Control, vol. 30, no. 9, pp. 889–889, 1985.
- [6] K. Åström and B. Wittenmark, Adaptive Control: Second Edition, ser. Dover Books on Electrical Engineering. Dover Publications, 2013. [Online]. Available: https://books.google.com/books?id=4CLCAgAAQBAJ
- [7] Y. Bai, A. Jones, K. Ndousse, A. Askell, A. Chen, N. DasSarma, D. Drain, S. Fort, D. Ganguli, T. Henighan et al., “Training a helpful and harmless assistant with reinforcement learning from human feedback,” arXiv preprint arXiv:2204.05862, 2022.
- [8] D. Bertsekas, Dynamic programming and optimal control: Volume I. Athena scientific, 2012, vol. 4.
- [9] S. Boyd, V. Balakrishnan, E. Feron, and L. ElGhaoui, “Control system analysis and synthesis via linear matrix inequalities,” in 1993 American Control Conference. IEEE, 1993, pp. 2147–2154.
- [10] S. Boyd, L. El Ghaoui, E. Feron, and V. Balakrishnan, Linear matrix inequalities in system and control theory. SIAM, 1994.
- [11] S. P. Boyd and L. Vandenberghe, Convex optimization. Cambridge University Press, 2004.
- [12] F. B. Cabral and M. G. Safonov, “Unfalsified model reference adaptive control using the ellipsoid algorithm,” International Journal of Adaptive Control and Signal Processing, vol. 18, no. 8, pp. 683–696, 2004.
- [13] J. Coulson, J. Lygeros, and F. Dörfler, “Data-enabled predictive control: In the shallows of the DeePC,” in 2019 18th European Control Conference (ECC). IEEE, 2019, pp. 307–312.
- [14] C. De Persis and P. Tesi, “Formulas for data-driven control: Stabilization, optimality, and robustness,” IEEE Transactions on Automatic Control, vol. 65, no. 3, pp. 909–924, 2019.
- [15] C. De Persis and P. Tesi, “Learning controllers for nonlinear systems from data,” Annual Reviews in Control, p. 100915, 2023.
- [16] S. Dean, H. Mania, N. Matni, B. Recht, and S. Tu, “On the sample complexity of the linear quadratic regulator,” Foundations of Computational Mathematics, vol. 20, no. 4, pp. 633–679, 2020.
- [17] P. Dorato, “A historical review of robust control,” IEEE Control Systems Magazine, vol. 7, no. 2, pp. 44–47, 1987.
- [18] E. Ekomwenrenren, J. W. Simpson-Porco, E. Farantatos, M. Patel, A. Haddadi, and L. Zhu, “Data-driven fast frequency control using inverter-based resources,” IEEE Transactions on Power Systems, 2023.
- [19] M. Fazel, R. Ge, S. Kakade, and M. Mesbahi, “Global convergence of policy gradient methods for the linear quadratic regulator,” in International conference on machine learning. PMLR, 2018, pp. 1467–1476.
- [20] S. Fujimoto and S. S. Gu, “A minimalist approach to offline reinforcement learning,” Advances in Neural Information Processing Systems, vol. 34, pp. 20 132–20 145, 2021.
- [21] T. A. N. Heirung, B. E. Ydstie, and B. Foss, “Dual adaptive model predictive control,” Automatica, vol. 80, pp. 340–348, 2017.
- [22] C. Helmberg, “Semidefinite programming,” European Journal of Operational Research, vol. 137, no. 3, pp. 461–482, 2002.
- [23] T. Hey, S. Tansley, K. M. Tolle et al., The fourth paradigm: data-intensive scientific discovery. Microsoft research Redmond, WA, 2009, vol. 1.
- [24] R. A. Horn and C. R. Johnson, Matrix analysis. Cambridge University Press, 2012.
- [25] T. Kailath, Linear systems. Prentice-Hall Englewood Cliffs, NJ, 1980, vol. 156.
- [26] L. Ljung, “System identification,” in Signal analysis and prediction. Springer, 1998, pp. 163–173.
- [27] G. Marafioti, R. R. Bitmead, and M. Hovd, “Persistently exciting model predictive control,” International Journal of Adaptive Control and Signal Processing, vol. 28, no. 6, pp. 536–552, 2014.
- [28] I. M. Mareels and R. R. Bitmead, “Non-linear dynamics in adaptive control: Periodic and chaotic stabilization — II. analysis,” Automatica, vol. 24, no. 4, pp. 485–497, 1988.
- [29] P. Mohajerin Esfahani and D. Kuhn, “Data-driven distributionally robust optimization using the Wasserstein metric: Performance guarantees and tractable reformulations,” Mathematical Programming, vol. 171, no. 1, pp. 115–166, 2018.
- [30] L. Pardo, Statistical inference based on divergence measures. Chapman and Hall/CRC, 2018.
- [31] K. B. Petersen, M. S. Pedersen et al., “The matrix cookbook,” Technical University of Denmark, vol. 7, no. 15, p. 510, 2008.
- [32] M. S. Ramadan and M. Anitescu, “Extended Kalman filter – Koopman operator for tractable stochastic optimal control,” IEEE Control Systems Letters, 2024.
- [33] C. Rohrs, L. Valavani, M. Athans, and G. Stein, “Robustness of continuous-time adaptive control algorithms in the presence of unmodeled dynamics,” IEEE Transactions on Automatic Control, vol. 30, no. 9, pp. 881–889, 1985.
- [34] M. G. Safonov, “Origins of robust control: Early history and future speculations,” Annual Reviews in Control, vol. 36, no. 2, pp. 173–181, 2012.
- [35] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal policy optimization algorithms,” arXiv preprint arXiv:1707.06347, 2017.
- [36] R. S. Sutton and A. G. Barto, Reinforcement learning: An introduction. MIT Press, 2018.
- [37] T. M. Cover and J. A. Thomas, Elements of information theory. Wiley-Interscience, 2006.
- [38] H. J. van Waarde and M. K. Camlibel, “A matrix Finsler’s lemma with applications to data-driven control,” in 2021 60th IEEE Conference on Decision and Control (CDC). IEEE, 2021, pp. 5777–5782.
- [39] T. Wang, S. Herbert, and S. Gao, “Fractal landscapes in policy optimization,” Advances in Neural Information Processing Systems, vol. 36, 2024.
- [40] J. C. Willems, P. Rapisarda, I. Markovsky, and B. L. De Moor, “A note on persistency of excitation,” Systems & Control Letters, vol. 54, no. 4, pp. 325–329, 2005.
- [41] Z. Yuan, C. Zhao, and J. Cortés, “Reinforcement learning for distributed transient frequency control with stability and safety guarantees,” Systems & Control Letters, vol. 185, p. 105753, 2024.
- [42] F. Zhao, F. Dörfler, A. Chiuso, and K. You, “Data-enabled policy optimization for direct adaptive learning of the LQR,” arXiv preprint arXiv:2401.14871, 2024.
Government License: The submitted manuscript has been created by UChicago Argonne, LLC, Operator of Argonne National Laboratory (“Argonne”). Argonne, a U.S. Department of Energy Office of Science laboratory, is operated under Contract No. DE-AC02-06CH11357. The U.S. Government retains for itself, and others acting on its behalf, a paid-up nonexclusive, irrevocable worldwide license in said article to reproduce, prepare derivative works, distribute copies to the public, and perform publicly and display publicly, by or on behalf of the Government. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan. http://energy.gov/downloads/doe-public-access-plan.
Appendix A
A.1 The cost in the parametrization
Starting from the cost (2),
substituting , and using the fact that is a zero mean white noise,
Using the cyclic property of the trace and the linearity of , , which is an additive constant (for a fixed ) and can be omitted from the cost. Now, using the cyclic property of the trace and the linearity of again, we have
(30) |
The steady-state covariance of the state is designated by , which is also known as the controllability-type Gramian. Since
and using the fact that are white, of zero mean and mutually independent from each other and from , this covariance can be described by
or, if is Hurwitz, the transient term vanishes and we have
After substituting this description of in (30), and using the cyclic property of the trace operator, we have
where is the observability-type Gramian, as in (6).
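The equivalence between the two cost expressions, one through the controllability-type Gramian and one through the observability-type Gramian, can be verified numerically. The sketch below assumes the standard discrete-time LQR setting with the sign convention u = K x, a state covariance satisfying Sigma = A_K Sigma A_K^T + W, and a Gramian satisfying P = A_K^T P A_K + Q + K^T R K; all matrices and names are hypothetical and chosen only for illustration.

```python
# Sketch: tr((Q + K^T R K) Sigma) equals tr(P W) for a stable closed loop.
# All numerical values are hypothetical; u = K x is an assumed convention.

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def add(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def trace(X):
    return sum(X[i][i] for i in range(len(X)))

def gramian(M, C, iters=2000):
    """Fixed point of G = M G M^T + C by iteration (M must be Schur stable)."""
    G = [row[:] for row in C]
    Mt = transpose(M)
    for _ in range(iters):
        G = add(matmul(matmul(M, G), Mt), C)
    return G

A = [[1.0, 0.1], [0.0, 1.0]]
B = [[0.0], [0.1]]
K = [[-5.0, -6.0]]                 # hypothetical stabilizing gain
Q = [[1.0, 0.0], [0.0, 1.0]]
R = [[0.5]]
W = [[0.2, 0.0], [0.0, 0.1]]       # hypothetical disturbance covariance

A_K = add(A, matmul(B, K))         # closed loop, eigenvalues 0.9 and 0.5
Q_K = add(Q, matmul(matmul(transpose(K), R), K))

Sigma = gramian(A_K, W)            # controllability-type Gramian
P = gramian(transpose(A_K), Q_K)   # observability-type Gramian

cost_via_sigma = trace(matmul(Q_K, Sigma))
cost_via_P = trace(matmul(P, W))
print(cost_via_sigma, cost_via_P)
```

The two printed values agree up to floating-point rounding, which reflects the cyclic property of the trace applied term by term to the series expansions of the two Gramians.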