The (Metro)logic of Machine Learning
Is it all just measurement systems?
In his book Simplexity: Simplifying Principles for a Complex World, neuroscientist Alain Berthoz opens one of the chapters with the following quote from psychophysicist Jan Koenderink:
The world is infinitely complicated, yet the only way to investigate it is to ask questions. . . . The questions imply the answers and act as a “format” for converting simple structures into meaningful information. In this way, intention and meaning are imposed by the observer. To be a good observer requires being able to ask relevant questions of nature or, to put it differently, to have an interface that makes the world appear sufficiently simple to ensure survival. We need simple and effective geometries for interfaces, in contrast to the theories inevitably offered by physics.
This is a statement about the importance of measurement and, more immediately, about the importance of making the right measurements at the right level of complexity. Through the long and slow process of evolution, and then at a much faster pace after the emergence of culture, we have arrived at a wide variety of measurement strategies and interfaces:
using our senses in feedback with locomotion and other outwardly projected behavior to query our environment and to build models of it in order to succeed in it (J.J. Gibson’s oft-quoted “we must perceive in order to move, but we must also move in order to perceive”);
using language to probe our world, to organize our knowledge of it, and to develop shared intentionality through a system of evolving linguistic distinctions, concepts, speech acts, and webs of semantic relations connecting our senses, our descriptions of what the senses tell us, and our models of the world phrased in natural language;
designing a wide variety of inquiring systems (or social/cultural technologies) for extracting, organizing, storing, and retrieving knowledge about the world that would not be easily accessible otherwise, distributed as it is in time and space and among multiple minds and different perspectives (see, e.g., Friedrich von Hayek’s view of spontaneous order in both markets and minds construed as information processors and pattern classifiers).
All of these systems and strategies, vastly different in scale and complexity, are observer interfaces in Koenderink’s sense. Each is equipped with its own “simple and effective geometries” that are, indeed, very different from the way physics apprehends the world. Extrapolating this framing to LLMs is very natural, as we already know from the writings of Henry Farrell, Alison Gopnik, Cosma Shalizi, and others. Thus, it is worthwhile to see how exactly LLMs fit the bill in light of Koenderink’s quote, whether they enhance our capabilities as observers, and whether they in turn can act as observers in feedback with us. This will be the theme of this post, and I will eventually come back from LLMs to markets, languages, and other distributed information interfaces.
The trouble with physics
Koenderink’s mention of physics is not incidental. John Archibald Wheeler’s “it from bit” is the motto of the informational reconstruction of the physical world. We can read it in two ways. One is idealist: “in the beginning was the bit,” and everything else flows from information. This seems to have been Wheeler’s view. The other is more realist and points toward measurement as a mechanism of getting a purchase on the material world by collecting information about it, by interacting with it. As Koenderink says, we must be able to ask questions in order to investigate the world. Carl Friedrich von Weizsäcker, a renowned German physicist and philosopher, attempted to do precisely this through his theory of Ur-alternatives, binary measurements that yield only yes-or-no answers. With these building blocks, von Weizsäcker wanted to derive the structure of classical and quantum physics. Mathematicians also have a way of reconstructing various complicated objects (such as algebraic varieties, differentiable manifolds, or probability spaces): one starts with a sufficiently rich algebra of observables that can be used to peer at (measure!) the object of interest, and then shows that the points making up this object are in one-to-one correspondence with certain abstract assignments of a numeric outcome to each feasible observable.
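To make that last construction concrete, here is a minimal sketch of one textbook instance, Gelfand duality (my choice of illustration; no particular example is singled out above):

```latex
% Gelfand duality: recovering a space from its algebra of observables.
% (Illustrative example; notation is standard.)
Let $X$ be a compact Hausdorff space and let $C(X)$ be the algebra of
continuous complex-valued functions on $X$ --- the ``observables.''
Each point $x \in X$ defines an evaluation homomorphism
\[
  \operatorname{ev}_x \colon C(X) \to \mathbb{C},
  \qquad \operatorname{ev}_x(f) = f(x),
\]
which assigns a numeric outcome to every feasible observable. Gelfand
duality states that the map
\[
  x \longmapsto \operatorname{ev}_x
\]
is a one-to-one correspondence between the points of $X$ and the
nonzero algebra homomorphisms (characters) $C(X) \to \mathbb{C}$;
the space is thus recovered, up to homeomorphism, from its algebra
of measurements alone.
```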
But this is not what Koenderink is talking about. At the end of the quote, when he contrasts “simple and effective geometries for interfaces” with “theories … offered by physics,” he is talking about what Wilfrid Sellars referred to as the manifest image of the world, the arena of ordinary everyday experience, in contrast to von Weizsäcker’s informational (re)construction of the scientific image using an elaborate game of twenty questions played against Nature. The interfaces Koenderink has in mind are our senses and the way our minds structure the experience derived from them. This is far removed from the pristine world of carefully controlled scientific experiments, where the links from perception to interpretable observations are extremely indirect and highly complex. Koenderink spent his life studying visual perception, and that is what he mostly had in mind, but exactly the same ideas apply to all our other senses, including (as Berthoz would hasten to emphasize) interoception and proprioception. We couldn’t function in the world if we didn’t have a means for mapping regularities in our environment to structural information keyed to the pragmatics of living. Some of these strategies are slow, accurate, and expensive; others are fast, approximate, and frugal. Either way, it all comes down to measurement systems.
Machine learning as measurement
A recent paper by Fintan Mallory suggests viewing LLMs as stochastic measuring devices. Fintan defines a measuring device as “an artefact that has been produced to alter its internal states in response to interactions with its environment in such a way that one can gain information about the environment from it,” and he argues that this is a novel and useful framing for LLMs. Along the way, he draws a contrast between LLMs and traditional measuring devices (such as, say, thermometers). He writes the following about deep neural nets in general and about LLMs in particular:
Unlike traditional measuring devices, they have the memory resources to store models of the domain they measure and, in their generative capacity, are capable of running computational simulations of that domain.
I think, however, that the difference between LLMs and other measurement devices is not as significant as Mallory’s paper suggests, and that this framing applies more broadly to machine learning systems in general. Still, viewing machine learning through the lens of measurement might not be very intuitive even to specialists, so we should be a bit more concrete about it.
Fintan’s definition of a measuring device agrees with other definitions of measurement systems one can find in the literature, such as in An Introduction to The Informational Theory of Measurement, a 1974 text by G. Kavalerov and S. Mandelstam, two Soviet researchers in the area of metrology, information processing, and control systems. Kavalerov and Mandelstam want to distinguish the carefully controlled world of high-precision physics experiments from the fast-paced world of everyday technology, where uncertainty is everywhere, time is of the essence, and a typical measurement system is a cascade of many functional units that convert either sensor data or outputs of other units to structured information. To them, all measuring devices are inherently stochastic. Moreover, the measurement systems Kavalerov and Mandelstam study also have memory because, in their theory, measurement is a dynamic process that generates and makes use of large datasets consisting of collected sensor measurements and their processed summaries.
In the context of machine learning, the process of collecting training data is, in fact, a process of measurement. In Kavalerov and Mandelstam’s classification, it involves primary perception or sensing of the quantities being measured, followed by selection of relevant attributes and features and by transformation of these into an appropriate data format. It is important to keep in mind, however, that there is a complicated cascade of transformations that takes us from measurements collected in the wild to a CSV file. Then the measured dataset is used to train a given machine learning system. The dataset can be viewed as the environment in which the training takes place, and the training process sets up an interaction between the environment and the machine learning system. In the course of training, the internal state of the machine learning system is altered in response to this interaction. The measurements acquired as training data are a finite sample from the domain of interest, and the adjustment of the model parameters during training can be seen as a process of transforming these measurements into information about the phenomenon being measured.
The internal organization of a trained machine learning model reflects the history of past measurements, i.e., the training data. When the model is used for inference or generation, it is also interacting with its environment, and the effect is also a change in its internal state that can be read off by the users of the model through an appropriate interface (e.g., text for LLMs). Again, taking a cue from Kavalerov and Mandelstam, it is entirely natural to interpret the overall transformation from inputs (e.g., prompts) to outputs as a cascade of measurement-like transformations starting with some primary perceptions and resulting in an information-bearing output signal.
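To make this framing concrete, here is a toy sketch of the cascade just described, with sensing, attribute selection, formatting, training, and querying as explicit stages. Everything in it (the names, the linear model, the synthetic environment) is my own illustrative choice, not anything prescribed by Kavalerov and Mandelstam or by Mallory:

```python
# A toy measurement cascade in the spirit of Kavalerov and Mandelstam:
# sensing -> attribute selection/formatting -> training -> querying.
# All names and the choice of model are illustrative assumptions.
import random

def sense(environment, n=1000):
    """Primary perception: noisy readings of a hidden regularity."""
    readings = []
    for _ in range(n):
        x = random.uniform(0, 10)
        readings.append((x, environment(x) + random.gauss(0, 0.5)))
    return readings

def select_and_format(raw):
    """Attribute selection and conversion to a fixed data format."""
    return [(x, y) for x, y in raw if 0.0 <= x <= 10.0]

class LinearModel:
    """Internal state (slope, intercept) altered by interaction with data."""
    def __init__(self):
        self.a, self.b = 0.0, 0.0

    def train(self, data):
        # Least-squares fit: the dataset is the "environment" with which
        # the model interacts, and training transforms those measurements
        # into information about the underlying phenomenon.
        n = len(data)
        mx = sum(x for x, _ in data) / n
        my = sum(y for _, y in data) / n
        self.a = (sum((x - mx) * (y - my) for x, y in data)
                  / sum((x - mx) ** 2 for x, _ in data))
        self.b = my - self.a * mx

    def query(self, x):
        # Inference: reading off the stored measurement history
        # through the model's interface.
        return self.a * x + self.b

hidden_law = lambda x: 2.0 * x + 1.0     # the regularity being measured
model = LinearModel()
model.train(select_and_format(sense(hidden_law)))
print(model.query(3.0))                  # close to 7.0
```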
Going back to Fintan Mallory’s paper, then, I would argue that the novelty of LLMs lies not so much in their ability to store models of the domain being measured as in what they are measuring, both during training and during generation or inference.
What are we measuring?
So, what do LLMs measure? As models of language, they measure informational regularities in language. Yet, keeping Koenderink’s quote in mind, we see that language itself is a stochastic measurement system coupled to the world through the members of linguistic communities. The evolution of language involves forming direct ties to the world by posing questions to it, followed by the formation and stabilization of various concepts and distinctions with a rich network of internal relations between them. This is not a new point at all; it goes back at least to Zellig Harris’ informational theory of language. A given language, as a whole, is a dynamic system that operates in feedback with a community of speakers and comes to reflect their views, practices, and beliefs. Stable and frequent regularities in the speakers’ environment get coded as more compact linguistic utterances, sometimes sacrificing accuracy for speed, and there is also capacity for generating novel utterances, inventing new words, or completely changing the usage and meaning of existing words. This was also Shannon’s insight: language captures and stores the speakers’ implicit knowledge of the statistics of that language, and these statistics reflect (if imperfectly) various regularities of the speakers’ shared environment. For our everyday needs as observers and agents, language is a good interface in Koenderink’s sense. With this understanding of language itself as a measurement system coupled to the world, we can now see what it is that LLMs measure: they measure the structures and patterns stored in (measured by!) language.
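Shannon made this point operational with his n-gram experiments. The toy sketch below (my own minimal rendition, not Shannon’s original procedure) treats a tiny corpus as a batch of measurements of the language and reads off the conditional statistics stored in it:

```python
# A Shannon-style toy: a corpus is a batch of measurements of the
# language, and n-gram counts are the stored summary of those
# measurements. (Minimal illustrative rendition.)
from collections import Counter, defaultdict

corpus = "the cat sat on the mat and the cat slept".split()

bigrams = defaultdict(Counter)
for w1, w2 in zip(corpus, corpus[1:]):
    bigrams[w1][w2] += 1              # measure which word follows which

# Conditional statistics P(next word | "the"), read off the measurements:
total = sum(bigrams["the"].values())
print({w: c / total for w, c in bigrams["the"].items()})
# -> {'cat': 0.666..., 'mat': 0.333...}
```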
The limits of my language mean the limits of my world.
— Ludwig Wittgenstein, Tractatus Logico-Philosophicus
LLMs, by virtue of the mediated nature of their access to the world via language, can only measure certain kinds of information about the world, namely the kind encoded in the network of relations between concepts and other linguistic entities. They lack access to the direct sensory links between language and the world. Some researchers argue that language is not enough and that learning systems will need to be endowed with perception modules to address this limitation.
Now we come to the second aspect emphasized by Fintan Mallory, that of computational simulation. Simulation is used by scientists, engineers, and policy-makers as an investigative tool, so it too is a form of measurement whose goal is to obtain an answer to some question we wish to pose about the world. Indeed, we can easily come up with other examples of systems that both store models of their domain and can be used to simulate it. Markets are one obvious example and, in fact, this is exactly how Friedrich von Hayek conceptualized them. To Hayek, markets are world-coupled information processors endowed with memory and with capabilities for pattern classification and abstraction. This is the sense in which LLMs are similar to markets and other inquiring systems. And, just as we can view the training of LLMs as making measurements of measurements (the ones encoded in the snapshot of language at the time of training), the use of LLMs as generative systems is also tantamount to making measurements of measurements as the generative process unfolds. Here, though, the crucial difference is the presence of human users in the loop. Users pose questions in the form of prompts and use the LLM-generated responses to direct their activities. They have their own local knowledge which, in the best-case scenario, they can combine advantageously with the massive trove of knowledge stored and structured in language models. And, just as with markets, blind trust in the system may prove detrimental; the end user of the simulation should have enough discernment to “trust but verify.” This is what being a good observer requires.
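As a tiny illustration of generation as a measurement of measurements, the sketch below (again my own toy, reusing the bigram idea from the earlier snippet) samples text by repeatedly querying the stored statistics:

```python
# Generation as simulation of measured statistics: each sampling step is
# a fresh measurement of the stored measurements. (Illustrative toy.)
import random
from collections import Counter, defaultdict

corpus = "the cat sat on the mat and the cat slept".split()
bigrams = defaultdict(Counter)
for w1, w2 in zip(corpus, corpus[1:]):
    bigrams[w1][w2] += 1

def generate(start, length=8, seed=0):
    """Sample a short text by repeatedly querying the bigram statistics."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length):
        counts = bigrams.get(out[-1])
        if not counts:                # dead end: no recorded successor
            break
        words, weights = zip(*counts.items())
        out.append(rng.choices(words, weights=weights)[0])
    return " ".join(out)

print(generate("the"))   # e.g., "the cat sat on the mat and the cat"
```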

