Contra Recht on Optimal Control
Ok, not really, but, you know ...
In a recent post, Ben Recht takes aim at optimal control as a normative framework for engineering design. He makes two related points. The first is that optimal control is the meeting point of the maximalist reinforcement learning camp (“reward is all you need”, scale everything everywhere all at once) and the minimalist modern control camp (insert the “so it’s all state space? always has been” astronaut meme here). The second is that the tidy mathematics of the maximum principle and dynamic programming offers an illusion of safety that evaporates as soon as we leave behind the small world of linear models, quadratic costs, and low uncertainty.
I won’t argue with Ben’s second point, as I largely agree with it. However, I do want to bring some nuance to the first point. Indeed, modern optimal control is a Cold War creation, and as such it proudly wears the mantle of instrumental rationality and ruthless cost-benefit accounting. Many of the key figures of optimal control were gainfully employed by the military-industrial complex — Richard Bellman, John M. Danskin, and Rufus Isaacs put in a lot of time at the RAND Corporation, Lev Semenovich Pontryagin was deeply embedded in the Soviet mathematical establishment and consulted for the military, and, of course, everyone knows about John von Neumann, whose early foundational work on game theory percolated into optimal control first through Isaacs’ work on differential games and, much later, through Tamer Başar and Pierre Bernhard’s game-theoretic treatment of H∞-optimal control. One common theme we can discern in all these works is that the model comes first.
From the vantage point of a historian, this makes sense. Modern optimal control grew out of the classical calculus of variations, which in turn grew out of classical mechanics. The excellent overview article by Héctor Sussmann and Jan Willems designates 1697 as the birth year of optimal control — when several solutions of the brachistochrone problem, posed about a year earlier by Johann Bernoulli as a challenge to the best mathematicians of the time, were published in Acta Eruditorum. This was the time of ascendant rationalism, which in the realm of natural philosophy put a lot of emphasis on mathematical models (given by systems of first-order differential equations relating positions, momenta, and forces) and on optimality (phrased in terms of least action principles and backed up by metaphysical commitments to pre-established harmony and the like). Subsequent developments by Euler, Lagrange, Hamilton, Jacobi, Weierstrass and others would later find new life in the mathematical theory of optimal control.
However, control theory first had to take an empiricist detour. While continental rationalists busied themselves with deciphering the book of nature written in the language of mathematics, the Industrial Revolution was taking root in 18th-century England. When James Watt invented the centrifugal governor to stabilize the steam engine, he was not concerned with optimality; he wanted the shaft of the engine to rotate at a steady angular speed. Feedback control arose out of tinkering and hacking; models came later. Some of the first models and new mathematical concepts, such as stability, were put forward by James Clerk Maxwell. Maxwell’s 1868 treatise “On governors” worked out the criteria of stability for 2nd- and 3rd-order systems and posed the general case as an open question. It was solved independently by Routh in England (apocryphally, Maxwell’s academic rival during their Cambridge days) and by Hurwitz in Germany.
Until the 1960s, the mathematical toolbox of control engineering was imbued with robust empiricism. Systems were modeled in the frequency domain by transfer functions (rational functions of one complex variable), and, more often than not, exact models were not available. Instead, engineers relied on measurements and would fit various canonical models to them based on intuition and “clinical” experience. Once again, hacking and experimentation were the order of the day. The trusty three-term PID controller, invented in industry after a great deal of guesswork and hacking, owed more to Fordism and Taylorism than to model-based theorizing (see Stuart Bennett’s historical account). Various methods for tuning PID controllers based on empirical plant measurements were developed by industry insiders, such as the Ziegler-Nichols method, which we don’t really teach to undergrads anymore. Tuning PID or lead-lag compensators based on frequency response measurements summarized in Bode plots is a subtle art, which in principle can be made entirely data-driven — just bust out the oscilloscope and go at it. Before control returned to models and rationalism in the 1950s and 1960s, it went through a data-driven empiricist phase spurred on by rapid industrialization and the rise of automation. We had to go through Bell Labs before making it to the RAND Corporation.
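To see just how model-free this way of working is, here is a minimal sketch of the classic Ziegler-Nichols ultimate-sensitivity recipe: crank up the proportional gain on the real plant until the loop oscillates with constant amplitude, record that gain and the oscillation period, and read the PID gains off a lookup table. The numbers in the example are hypothetical, and the 0.6/0.5/0.125 constants are the textbook Ziegler-Nichols values.

```python
def ziegler_nichols_pid(ku, tu):
    """Classic Ziegler-Nichols ultimate-sensitivity tuning.

    ku: ultimate gain, i.e. the proportional gain at which the closed
        loop sustains a constant-amplitude oscillation (measured on the
        actual plant, no model required)
    tu: period of that oscillation, in seconds
    Returns (kp, ki, kd) for a parallel-form PID controller.
    """
    kp = 0.6 * ku      # proportional gain
    ti = 0.5 * tu      # integral (reset) time
    td = 0.125 * tu    # derivative time
    return kp, kp / ti, kp * td

# Hypothetical plant that sustains oscillation at Ku = 10, Tu = 2 s:
kp, ki, kd = ziegler_nichols_pid(10.0, 2.0)
# kp = 6.0, ki = 6.0, kd = 1.5
```

The entire "design" consists of two numbers measured with a stopwatch and a strip chart — an oscilloscope-and-notepad procedure, not a model-based one.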
This empiricist view has much in common with modern RL. Policy optimization is closer in spirit to PID controller tuning, even if both theorists and practitioners feel the urge to phrase things in terms of approximate dynamic programming and Q-functions. Just like with PID controller tuning, we can only have assured performance locally; once we start nesting feedback loops inside other feedback loops, or add more stages to the dynamic programming recursion, things get out of hand rather quickly. And it’s even worse once uncertainty sets in. Quantitative Feedback Theory was developed by Isaac Horowitz as a valiant attempt to reinvent PID control for the age of uncertainty. QFT has its diehard adherents, but, again, nobody really teaches it to engineering students. However, at least here we have a clear recognition that optimality is a mirage. Instead, we set various performance specs, try to meet them as best as we can, and hit the best trade-offs between them given available resources.
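The kinship between PID tuning and policy optimization can be made concrete: parameterize the "policy" by a couple of gains, score each candidate by rolling out the closed loop, and search — no model appears inside the optimizer, only a simulator (or, in the lab, the plant itself). Below is a toy sketch under assumptions of my own: a double-integrator plant, a PD policy, and plain random search, all chosen for illustration.

```python
import random

def simulate(kp, kd, steps=200, dt=0.05):
    """Closed-loop cost of a PD "policy" on a toy double integrator
    x'' = u, starting from x = 1 and regulating to the origin."""
    x, v, cost = 1.0, 0.0, 0.0
    for _ in range(steps):
        u = -kp * x - kd * v          # the policy is just two gains
        v += u * dt                   # Euler step of the dynamics
        x += v * dt
        cost += (x * x + 0.01 * u * u) * dt
        if abs(x) > 1e6:              # divergence guard for bad gains
            return float("inf")
    return cost

def random_search(iters=300, seed=0):
    """Policy optimization in miniature: sample gains, keep the best.
    The optimizer never sees the dynamics, only rollout costs."""
    rng = random.Random(seed)
    best_cost, best_gains = float("inf"), None
    for _ in range(iters):
        kp, kd = rng.uniform(0.0, 20.0), rng.uniform(0.0, 20.0)
        c = simulate(kp, kd)
        if c < best_cost:
            best_cost, best_gains = c, (kp, kd)
    return best_cost, best_gains

cost, gains = random_search()
```

Everything RL-flavored is already here — rollouts, a scalar return, a derivative-free search — and, just as with Ziegler-Nichols, the guarantees extend no further than the scenarios we happened to simulate.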
The LQR problem, as Ben points out, is one point of contact between old-school loopshaping craft and the modern optimal control framework. It is, however, an uneasy union of thoroughgoing empiricism and strict rationalism. The empiricist spirit is the heritage of the data-driven industrial control of the 1930s. The complementary rationalist methodology of 1950s-1960s optimal control is patterned on physics, which is not surprising given its roots in the calculus of variations. But physics cannot handle organized complexity well, and, anyway, what works isn’t optimal and what is optimal doesn’t work. Maybe, though, we are not looking for optimality in the right place. Instead of trying to impose optimality on systems we are putting together, perhaps we could channel our creativity into optimizing our process — could we do better with fewer resources, for example?
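That uneasy union is easy to exhibit in the scalar case. The sketch below (toy numbers, pure Python) iterates the standard discrete-time Riccati recursion to its fixed point: the rationalist side supplies the model (a, b) and the recursion, while the empiricist side sneaks back in through q and r, which in practice are tuned by trial and error like any other knob.

```python
def scalar_lqr(a, b, q, r, iters=500):
    """Discrete-time scalar LQR: minimize the sum of q*x^2 + r*u^2
    subject to x_{t+1} = a*x_t + b*u_t, by iterating the Riccati
    recursion  p <- q + a^2*p - (a*b*p)^2 / (r + b^2*p)
    to its fixed point."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    k = a * b * p / (r + b * b * p)   # optimal feedback gain: u = -k*x
    return k, p

# Hypothetical unstable plant a = 1.2, b = 1. A heavier control penalty r
# yields a gentler gain -- r is just another tuning parameter to tweak.
k_cheap, _ = scalar_lqr(1.2, 1.0, 1.0, 0.1)
k_pricey, _ = scalar_lqr(1.2, 1.0, 1.0, 10.0)
```

Both gains stabilize the plant (|a − b·k| < 1); which one is "optimal" depends entirely on a cost we made up.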


One particular nuance in this discussion that I think is worth mentioning is that many different applications walk under the banner of "optimal control", but they're all over the place on the spectrum between "optimal" and "control". In some applications, you're basically doing optimal decision-making over a long horizon of a defined scenario, the objective function corresponds to tangible quantities (such as cost, or mass, or time), and sometimes you don't even care about stability because it's either the concern of another engineer (in other words, you're doing more "optimal guidance" than optimal control) or the system itself is already stable. In other applications, you're basically using "optimal control" to design a controller, the scenario is sometimes just representative, and the objective function (e.g. the Q and R matrices in LQR) is kind of arbitrary, just a different tuning parameter for the engineer to tweak. Again, using the LQR example, rarely do we actually care about "the square of the control or control deviation" except as a vague proxy for output behavior; in many instances, the actual cost is linear in the integral of the control magnitude — but the associated control problem is then harder (switching between saturated control arcs), more brittle against uncertainty, and does not allow "solve once" solutions as in LQR. Of course, the more "optimal" you go, the more optimization pathologies you will encounter.
Max: I really enjoyed this piece. It reminded me of stories I've heard of interactions between control theorists and power systems engineers in the 70s-90s — suffice it to say there was a lot of sneering from each side at the other. Felix Wu tried to fix it, but the other side can also be deeply rooted in not changing things unless absolutely needed. Lead-lag compensators are so closely related to primary frequency control in power systems. Feels a bit like black magic, but they do have differential equations ;). I am often amazed by how LQR and LQG systems were largely enough to help us fly spacecraft! But I wonder if one should compare such seemingly "model-able" dynamical systems to where RL is presently used. On the other hand, every time I see a Waymo (which in AZ is every day and many times a day), I have to wonder if there is any RL at all in it or if it is just Kalman at its best.