Tyche is the Greek goddess of chance and fortune. Charles Sanders Peirce, one of the founders of the American pragmatist tradition in philosophy, coined the term tychism for his philosophical doctrine that stipulates a fundamental and irreducible role of “arbitrary determination or chance” in the workings of the universe. Metaphysics aside, it is hard to overstate the prominent role that probabilistic and statistical ideas play in science and in engineering. However, a heavy dose of skepticism is needed to counteract what I am tempted to call the tendency towards mindless tychism: throwing probabilistic models around without subjecting them to careful scrutiny. As Sunny Auyang writes in her excellent book Foundations of Complex-System Theories in Economics, Evolutionary Biology, and Statistical Physics,
Probabilistic propositions are not empirically testable, even in principle. When the cards are dealt, the poker hand is either a flush or not, without any sign of probability. When we examine the result of many hands, we have quietly switched from probability to statistics and changed our subject matter. The untestability greatly limits the utility of probabilistic propositions in scientific theories.
And further:
Statistical theories employing the probability calculus give coarse-grained descriptions of composite systems that overlook much information about individual behaviors. Probabilistic propositions express our desire to know some of the missing information. Since the information is unavailable from statistical theories, probabilistic statements appear as an appraisal of propositions such as “This hand is a flush.” Unlike statistical propositions, probabilistic propositions do not belong to the object language, in which we talk about objects and the world. They belong to the metalanguage, in which we talk about propositions and theories.
This perspicuous distinction between statistics as object language and probability theory as metalanguage aligns nicely with David Slepian’s distinction between Facet A (engineering practice) and Facet B (theory), and in particular with his admonition not to mix the two without proper care. But what does “proper care” mean? At the very least, one has to justify the use of probabilistic models by appealing to something more substantial than entrenched tradition, habit, and ritual.
Let’s consider something presumably uncontroversial, namely filtering in communications and control.[1] We have the familiar textbook set-up: there is a signal x, which we would like to estimate on the basis of observations y subject to the causality constraint that, at each time t, our estimate of x(t) can only make use of observations available up to time t. The use of probability in this context goes back to the foundational works of Andrey Kolmogorov and Norbert Wiener; the key assumption here is that the pair (x(t), y(t)) is a stationary stochastic process with known first- and second-order moments. This allows us to phrase the filtering problem as least-squares estimation, which is particularly clean in the so-called whitening filter formulation due to Bode and Shannon. In today’s parlance, Bode and Shannon assumed a certain generative model for all the signals under consideration, namely that they are the result of passing white noise (itself an idealization of wideband device noise) through a known minimum-phase linear filter. To their credit, Bode and Shannon were remarkably candid about the pragmatic status of their modeling assumptions and about the need for careful model checking:
A result in applied mathematics is only as reliable as the assumptions from which it is derived. The theory developed above is especially subject to misapplication because of the difficulty in deciding, in any particular instance, whether the basic assumptions are a reasonable description of the physical situation. Anyone using the theory should carefully consider each of the three main assumptions [stationarity; least-squares optimality; linearity — MR] with regard to the particular smoothing or prediction problem involved.
The assumption that the signal and noise are stationary is perhaps the most innocuous of the three, for it is usually evident from the general nature of the problem when this assumption is violated. The determination of the required power spectra … will often disclose any time variation of the statistical structure of the time series. If the variation is slow compared to the other time constants involved, such nonstationary problems may still be solvable on a quasi-stationary basis. A linear predictor may be designed whose transfer function varies slowly in such a way as to be optimal for the “local” statistics.
The least square assumption is more troublesome, for it involves questions of values rather than questions of fact. When we minimize the mean-square error we are, in effect, paying principal attention to the very large errors. The prediction chosen is one which, on the whole, makes these errors as small as possible, without much regard to relatively minor errors. In many circumstances, however, it is more important to make as many very accurate predictions as possible, even if we make occasional gross errors as a consequence. When the distribution of future events is Gaussian, it does not matter which criterion is used since the most probable event is also the one with respect to which the mean-square error is the least. With lopsided or multimodal distributions, however, a real question is involved.
The third assumption, that of linearity, is neither a question of fact, nor of evaluation, but a self-imposed limitation on the types of operations or devices to be used in prediction. The mathematical reason for this assumption is clear; linear problems are always much simpler than their nonlinear generalizations. In certain applications the linear assumption may be justified for one or another of the following reasons:
1. The linear predictor may be an absolute optimal method, as in the Gaussian time series mentioned above.
2. Linear prediction may be dictated by the simplicity of mechanization. Linear filters are easy to synthesize and there is an extensive relevant theory, with no corresponding theory for nonlinear systems.
3. One may use the linear theory merely because of the lack of any better approach. An incomplete solution is better than none at all.
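To see what this recipe looks like in practice, here is a minimal sketch in Python of a finite-length causal Wiener filter. Everything in it is an illustrative assumption on my part, not something taken from the Bode-Shannon paper: the signal is an AR(1) process (white noise passed through the minimum-phase filter 1/(1 - a z^{-1})), the observations are the signal plus independent white noise, and the second-order statistics are treated as known, exactly as the theory requires.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

# Toy model (all parameters invented for illustration):
#   signal:       x(t) = a x(t-1) + w(t),  w white with variance s2_w
#   observations: y(t) = x(t) + v(t),      v white with variance s2_v
a, s2_w, s2_v, N = 0.9, 1.0, 0.5, 32  # AR coefficient, variances, filter length

# Second-order statistics, which the theory assumes are *known*:
# R_x(k) = s2_w / (1 - a^2) * a^{|k|} for the AR(1) signal
k = np.arange(N)
R_x = s2_w / (1 - a**2) * a**k
R_y = R_x + s2_v * (k == 0)   # observation autocorrelation
r_xy = R_x                    # cross-correlation E[x(t) y(t-k)]

# Causal FIR Wiener filter: solve the Toeplitz normal equations
h = solve_toeplitz(R_y, r_xy)

# Apply it causally: x_hat(t) uses only y(t), y(t-1), ..., y(t-N+1)
rng = np.random.default_rng(0)
T = 1000
w = rng.normal(0.0, np.sqrt(s2_w), size=T)
x = np.zeros(T)
for t in range(1, T):
    x[t] = a * x[t - 1] + w[t]
y = x + rng.normal(0.0, np.sqrt(s2_v), size=T)
x_hat = np.convolve(y, h)[:T]

print("empirical MSE:", np.mean((x - x_hat)[N:] ** 2))
```

The filter is optimal only relative to the assumed spectra; nothing in the computation checks whether the data actually came from such a generative model, which is precisely why Bode and Shannon insist on scrutinizing the assumptions above.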
The Kalman filter, introduced in 1960, was likewise based on probabilistic principles, yet the state-space formalism underlying it was a genuine paradigm shift in the sense of Thomas Kuhn. Like the Bode-Shannon approach, Kalman’s state-space framework relied on a generative model, this time involving linear dynamics and additive Gaussian noise. However, it could be easily generalized to nonstationary situations and, owing to its recursive nature, lent itself nicely to efficient computer implementations (a sketch of the recursion appears after the quotation below). It is then rather remarkable that, thirty-four years later, Kalman wrote, with his characteristic bluntness, the following words in “Randomness reexamined”:
Having become a household name and being supposedly responsible for a huge number of ‘applications’ of probability to the real world, I may be permitted to say a few words, not necessarily in an unkind way, about the emotional, sociological and philosophical aspects of what in the current fashion is called ‘applied probability’.
I see enormous activity, seemingly aimless, I see fanatical devotion to ideas and principles which have grown into a quasi-religion, but not into a scientific discipline; I see younger generation spinning its wheels at problems my contemporaries have not felt worthwhile to devote their lives to. In short, I see a horrible, total, unimaginable mess.
…
Acting on rumors that probability has become the main theme of prestigious sciences (physics?), less ambitious sciences (economics!) seem to be content to describe the world in probabilistic terms as the best (?) that can be done. This road leads to pseudo-science or, more accurately, ersatz-science. This is not working hard enough.
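Kalman’s bluntness notwithstanding, the recursion he introduced is strikingly compact. Here is a minimal sketch in Python, with hypothetical matrices A, C, Q, R standing in for the assumed linear dynamics and noise covariances; nothing here is taken from Kalman’s paper beyond the standard linear-Gaussian model.

```python
import numpy as np

def kalman_filter(ys, A, C, Q, R, x0, P0):
    """Minimal Kalman filter sketch (illustrative, not production code).

    Assumed generative model, with w(t) ~ N(0, Q) and v(t) ~ N(0, R):
        x(t+1) = A x(t) + w(t)    # linear dynamics
        y(t)   = C x(t) + v(t)    # noisy observations
    Returns the filtered estimates of x(t) given y(0), ..., y(t).
    """
    x_hat, P = x0, P0
    estimates = []
    for y in ys:
        # Measurement update: fold the new observation into the estimate
        K = P @ C.T @ np.linalg.inv(C @ P @ C.T + R)  # Kalman gain
        x_hat = x_hat + K @ (y - C @ x_hat)
        P = P - K @ C @ P
        estimates.append(x_hat.copy())
        # Time update: propagate estimate and uncertainty through the dynamics
        x_hat = A @ x_hat
        P = A @ P @ A.T + Q
    return np.array(estimates)
```

Note that the model enters only through A, C, Q, R; the recursion is entirely indifferent to whether those matrices bear any relation to the underlying physics, which is exactly the issue of “getting the physics right” taken up below.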
Jan Willems, somewhat more diplomatically, wrote that it is of paramount importance to “get the physics right”:
Why is the physics of models not more prominently present in areas as Systems and Control? Why are probability, inputs, outputs and signal flow graphs used without analyzing the physical situations to which they claim to pertain? The explanation, in my opinion, lies in the sociology of science. Normal science uses an established paradigm in which to operate. When a problem is cast in an input/output setting with disturbances modeled as stochastic processes, we are operating in a clear and often sophisticated mathematical framework, with results that may be difficult to obtain and hard to prove and that are verifiable mathematically. The results are judged by their mathematical depth and difficulty. In other words, the explanation lies in the Lure of Mathematics. There is no other explanation.
Ironically, he also puts a Kuhnian spin on the underlying state of affairs: what constituted revolutionary science in 1960 is now just good old normal science, comfortably numb in its ritualized mindless tychism. Nevertheless, there are some interesting attempts to get the physics right; see, for example, what John Wyatt and Geoffrey Coram do for noise in nonlinear devices (perhaps a topic for another post?).
[1] Jan Willems’ 1972 survey “Recursive filtering” gives a good summary and a concise historical overview of the key ideas underlying both the Wiener and Kalman filters.