The tragedy of Ireneo Funes, the subject of Borges’ short story “Funes the Memorious,” is that, being blessed (or, more accurately, cursed) with perfect memory after suffering a head injury, he is incapable of abstraction and categorization. Every sequence of events or sensations is unique for him. He is unable to form stable associations, to make projectable distinctions, to anticipate, and to plan.
We take our ability to form and remember compressed summaries of the past for granted. Yet it is precisely this ability that underlies the concept of state in systems theory. The history of this idea is interesting in its own right. It was present in Turing’s 1936 paper on computable numbers (his machines had internal states that, together with the contents of the tape and the position of the read/write head, determined their behavior). Shannon, in his 1948 paper that introduced information theory, formalized the operation of encoders and decoders in communication systems using the idea of a discrete transducer, i.e., a finite-state automaton whose internal state was determined by the incoming stream of input symbols and would, together with the input, determine the outgoing stream of output symbols. Although Shannon did not cite Turing in 1948, he had an article on the relative capabilities of deterministic and probabilistic Turing machines in Automata Studies, which he edited in 1956 together with John McCarthy. In addition to Shannon and McCarthy themselves, the list of contributors was a who’s-who in the theory of automata and computation (Davis, Kleene, Minsky, Moore, von Neumann) and cybernetics (Ashby, Culbertson, MacKay, Uttley), so the idea of state as encapsulation of system memory was already in the air. When Rudolf Kalman introduced it into the theory of control systems in 1960, he did not yet seem to be aware of these important predecessors (more on this a bit later). As we trace the history and the evolution of the concept of state, we see parallels to certain philosophical ideas.
The concept of state from the rationalist perspective
Most of the early works that invoked the idea of state as the encapsulation of system memory were motivated either by introspection or by practical engineering experience. As an example of introspection, take Turing’s 1936 paper. He was interested in Hilbert’s Entscheidungsproblem, the question of whether there exists a mechanical procedure for solving every mathematical problem, provided it is expressed in a suitable formal language. Turing’s answer in the negative was based on abstracting the thought process of a mathematician into such a mechanical procedure. This led to the idea of the internal state, e.g., for storing intermediate results of a computation. Shannon’s idea of the finite-state transducer, on the other hand, was most likely inspired by his extensive experience with modeling, building, debugging, and operating computing machines, from Vannevar Bush’s differential analyzer (the subject of his undergraduate thesis at MIT) to Boolean logic implemented using relays (the subject of his master’s thesis). In both of these examples, the idea of the state arose from an internalist, rule-driven model of either the working mathematician or the building blocks of a communication system. Framed this way, it fits into the rationalist tradition along the lines of Leibniz or Descartes. The state is instantiated in the system by design.
The concept of state from the empiricist perspective
On the other hand, one can take a complementary, externalist perspective: Suppose we could observe the operation of a Turing machine from the outside. Suppose we could only keep track of the contents of its tape and the motion of its read/write head. Could we then reconstruct the state of the Turing machine based on such sense data? We could even take a skeptical, Humean tack and ask how we could be sure in the first place that the device scurrying to and fro along the tape is built internally like a Turing machine instead of just memorizing everything it had seen so far, Funes-style. Unlike the above rationalist concept of state, which was firmly tied to system building from first principles, this is an empiricist perspective linked to system modeling from data. Viewed from this angle, the state is any attribute of the system which, in conjunction with the current input, is sufficient for determining the future output.
In systems theory this idea is known as Nerode equivalence, after a 1958 paper by Anil Nerode. In broad strokes, it can be described as follows. Suppose we have some system that takes in a stream of input symbols and produces a stream of output symbols in a causal manner, i.e., the output symbol at each time is determined only by the past history of input symbols. Then we say that two histories of inputs at time t are equivalent if the future output behavior of the system starting with either of the two input histories is the same. For the purposes of determining what the system is going to do next, we only need to know the Nerode equivalence class of its past history. Moreover, knowing how the system maps its inputs to outputs, we can write down an update rule that will tell us how to propagate the Nerode equivalence classes from time t to time t+1. The Nerode equivalence class is the system state we are after, and, for the purpose of predicting the behavior of the system, this is all we need to keep track of. More importantly, we may be able to avoid the curse of Funes if the size of the state space grows very slowly with time (or, in the best-case scenario, stays constant). We don’t need to remember everything about the past, just enough to make useful and actionable distinctions. In fact, Nerode’s construction has the key property of minimality: It gives us the coarsest partition of past histories that allows for accurately predicting the system’s future behavior.
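To make the construction concrete, here is a minimal sketch in Python. It is not anything from Nerode’s paper: the toy causal system (the running parity of the 1s seen so far), the binary alphabet, and the finite horizon used to approximate the quantifier over all future inputs are all illustrative assumptions.

```python
# A minimal sketch of Nerode equivalence for a toy causal system.
# Assumptions: binary input alphabet, outputs given by the running parity of
# the 1s seen so far, and "all future inputs" approximated by a finite horizon.
from itertools import product

ALPHABET = (0, 1)
HORIZON = 4  # how far into the future we compare behaviors

def output(history):
    """Causal input-output map: emit the parity of the 1s seen so far."""
    return sum(history) % 2

def future_behavior(history):
    """Outputs produced along every continuation of `history` up to HORIZON steps."""
    behavior = []
    for future in product(ALPHABET, repeat=HORIZON):
        trajectory = list(history)
        outs = []
        for symbol in future:
            trajectory.append(symbol)
            outs.append(output(trajectory))
        behavior.append(tuple(outs))
    return tuple(behavior)

def nerode_classes(max_len):
    """Group all input histories of length <= max_len by their future behavior."""
    classes = {}
    for n in range(max_len + 1):
        for history in product(ALPHABET, repeat=n):
            classes.setdefault(future_behavior(history), []).append(history)
    return list(classes.values())

if __name__ == "__main__":
    for i, cls in enumerate(nerode_classes(max_len=3)):
        print(f"class {i}: {cls}")
```

Running this groups the fifteen histories of length at most three into exactly two classes, those with an even and those with an odd number of 1s, and the count stays at two no matter how long the histories get: a single bit of state already captures every distinction that matters for prediction.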
Interestingly, as pointed out by Dan Davies, this idea of the state as an abstraction of past history can also be found in Ashby’s Introduction to Cybernetics:
Thus, suppose I am in a friend’s house and, as a car goes past, his dog rushes to a corner of the room and cringes. To me the behaviour is causeless and inexplicable. Then my friend says ‘He was run over by a car six months ago’. […] Memory is not an objective something that a system either does or does not possess; it is a concept that the observer invokes to fill in the gap caused when part of the system is unobservable. The fewer the observable variables, the more will the observer be forced to regard events of the past as playing a part in the system’s behaviour [emphasis mine].
Ashby’s book came out in 1956, two years before Nerode’s paper. By the end of the 1960s, control theorists were broadly aware of Nerode equivalence and its relation to their formulation of state-space systems. Researchers like Michael Arbib (who eventually switched from control theory to neuroscience) started speaking of the rapprochement between automata theory and control, and books like Topics in Mathematical Systems Theory by Kalman, Falb, and Arbib (1969) or General Systems Theory: Mathematical Foundations by Mesarovic and Takahara (1975) were synthesizing the ideas from automata, computation, and control into a unified discipline of mathematical systems theory.
The state as an abstraction
Now we see, either from the formal definition of Nerode equivalence or from Ashby’s vivid example, that the state is often an abstraction which may be useful for the purposes of modeling or prediction, but not necessarily internally instantiated in that particular system. Jan Willems, in his behavioral approach to systems theory, makes a psychoanalysis-inspired distinction between manifest content and latent content of system behavior. Manifest content is what we can measure from the outside; latent content (the hidden system state) is something we construct or postulate in order to explain these measurements. As Ashby says, the more we can observe, the more compressed our state representation will be. Cautious observers will be wary of reifying the state, i.e., raising it from the status of a theoretical entity introduced only in order to save the phenomena to something that is real in the sense of Ian Hacking, i.e., an entity with causal powers that we can manipulate. This sounds somewhat like behaviorism, but behaviorists like B.F. Skinner were simply hedging their bets: All talk of neural correlates of external behavior has to wait until we have made sufficient progress in neurophysiology.
The realization problem
The problem of constructing a state-space description of a system based on an input-output description is known as the problem of realization. Conceptually, it presents a synthesis of the internalist-rationalist and the externalist-empiricist concepts of state. If we wish to build a system with given input-output behavior, then we would like to know how to realize it as a state machine in the most economical way. Electrical engineers certainly knew this well before Kalman, even if they didn’t necessarily think of branch currents and node voltages as state variables. On the other hand, having a compact state-space model of a given phenomenon may be valuable for making predictions or forecasts even if our model does not accurately describe the actual internal workings of the phenomenon. Philosophers of mind and cognitive scientists have long been aware of this as the problem of multiple realizability. Even Dennett’s intentional stance can be seen as a form of state-space reconstruction based on observed behavior (although belief and desire states in that framework carry additional semantics).
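To make the multiple-realizability point concrete, here is a small sketch (again with a made-up parity system, not anything from the realization literature): two state-space realizations, one minimal and one not, that an outside observer cannot tell apart from input-output behavior alone.

```python
# A minimal sketch of multiple realizability: two different state machines
# with identical input-output behavior. The machines are illustrative
# assumptions, not taken from the post.
from itertools import product

def run(step, init, inputs):
    """Drive a state machine and collect its outputs."""
    state, outs = init, []
    for x in inputs:
        state, y = step(state, x)
        outs.append(y)
    return outs

# Realization A: two states, tracking the parity of 1s directly (minimal).
def step_a(state, x):
    state = (state + x) % 2
    return state, state

# Realization B: four states, tracking the count of 1s mod 4; the output only
# ever uses the parity, so half of the internal distinctions are wasted.
def step_b(state, x):
    state = (state + x) % 4
    return state, state % 2

# Externally the two realizations are indistinguishable.
for inputs in product((0, 1), repeat=6):
    assert run(step_a, 0, inputs) == run(step_b, 0, inputs)
print("same input-output behavior, different internal state spaces")
```

The second machine is a perfectly good realization of the behavior, just not the most economical one, which is exactly the gap between having some state-space model and having the minimal one.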
These ideas are relevant in the context of large language models. Unlike recurrent neural nets, LLMs based on the transformer architecture do not have an internal state which is updated recursively. On the other hand, since LLMs are autoregressive models, the contents of their finite context window together with the next generated token can be thought of as a state. However, given GPT-4 context lengths of 10k to 100k tokens and the statistical regularities of natural language, this state representation is highly unlikely to be minimal. It would be very interesting to see what form the Nerode equivalence classes take in LLMs and what they tell us about the internal architecture of language.
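For what it’s worth, here is a purely schematic sketch of that reading of autoregressive generation. The window length K and the stand-in next_token function below are hypothetical placeholders, not anything about an actual LLM.

```python
# A schematic sketch of the "context window as state" reading of autoregressive
# generation. K and next_token are hypothetical stand-ins, not a real model.
K = 8  # assumed context length

def next_token(window):
    """Stand-in for the language model: any deterministic map from window to token."""
    return (sum(window) * 31 + 7) % 101  # toy 'vocabulary' of 101 tokens

def step(window):
    """One autoregressive step: the new state is the old window with the freshly
    generated token appended and the oldest token dropped."""
    token = next_token(window)
    return window[1:] + (token,), token

state = (0,) * K  # initial context, e.g., padding
for _ in range(5):
    state, token = step(state)
    print(token, state)
```

Even in this caricature the point stands: the full window is a valid state for predicting what comes next, but nothing guarantees it is a minimal one; many windows may well be Nerode-equivalent.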
Since the comments have veered into philosophy, I can ask a question which I originally held back.
Maxim, your philosophical interests typically hop between logical empiricist philosophy and continental metaphysics. Is there a philosophical map in your head that reconciles the metaphysics denial (to put it crudely) of the logical positivists with the sort of continental philosophy that figures in certain other kinds of arguments you make (Yuk Hui's _Recursivity and Contingency_, which I read on your recommendation, comes to mind as a concrete example)?
Very interesting indeed!
Since you already brought up the cybernetics connection, I think (?) another point of contact is the emphasis, from the very beginning, on systems with feedback loops -- both from the Bigelow-Wiener-Rosenblueth paper and from McCulloch and Pitts. The stress was that purposeful systems aren't merely a collection of simple reflex arcs / simple input-output devices, but act based on some internal states as well, etc.
What I find interesting in light of your analysis is that, in fact, McCulloch and Pitts were almost explicit about relating the idea of the system's state to the notion of abstraction (from the section "consequences" in their paper, emphasis mine):
> Causality, which requires description of states and a law of necessary connection relating them, has appeared in several forms in several sciences, but never, except in statistics, has it been as irreciprocal as in this theory. Specification for any one time of afferent stimulation and of the activity of all constituent neurons, each an “all-or-none” affair, determines the state. Specification of the nervous net provides the law of necessary connection whereby one can compute from the description of any state that of the succeeding state, but the inclusion of disjunctive relations prevents complete determination of the one before. Moreover, the regenerative activity of constituent circles renders reference indefinite as to time past. Thus our knowledge of the world, including ourselves, is incomplete as to space and indefinite as to time. **This ignorance, implicit in all our brains, is the counterpart of the abstraction which renders our knowledge useful**. The role of brains in determining the epistemic relations of our theories to our observations and of these to the facts is all too clear, for it is apparent that every idea and every sensation is realized by activity within that net, and by no such activity are the actual afferents fully determined.
Even more broadly, from the neuro side, and though not cited directly in the famous M-P paper, the idea of "reverberating activity" in loopy neural circuits (in modern terms, recurrent) had been suggested and discussed at about the same time and earlier by Lorente de Nó and Lashley, and later by Hebb, etc. McCulloch himself, in his 1949 "The brain computing machine" [1], attributes the idea to Lorente de Nó and to Kubie (there's no explicit citation, but a relevant work of Kubie -- who was a psychoanalyst -- can be found in [2]; there are some nice quotes there about feed-forward vs. recurrent connectivity, again if we were to modernize the terminology).
[1] McCulloch, Warren S. "The brain computing machine." Electrical Engineering 68.6 (1949): 492-497.
[2] Kubie, Lawrence S. "A theoretical application to some neurological problems of the properties of excitation waves which move in closed circuits." Brain 53.2 (1930): 166-177.