In the preface to his 1974 Theory of Probability, Bruno De Finetti proclaims provocatively:
PROBABILITY DOES NOT EXIST
The abandonment of superstitious beliefs about the existence of the Phlogiston, the Cosmic Ether, Absolute Space and Time, . . . or Fairies and Witches was an essential step along the road to scientific thinking. Probability, too, if regarded as something endowed with some kind of objective existence, is no less a misleading misconception, an illusory attempt to exteriorize or materialize our true probabilistic beliefs.
De Finetti, just like Frank Ramsey and Leonard J. Savage, advocated a subjectivist interpretation of probabilities as individual beliefs backed up by a person’s willingness to place bets or, in Savage’s more general interpretation, to take actions contingent on these beliefs. My own view is that, indeed, probability does not exist in what David Slepian called Facet A (the realm of observations on, and manipulations of, “the real world”), but it has legitimate existence in Slepian’s Facet B (the realm of mathematical models and theories).1 In other words, probabilities are not objective properties of physical systems, but rather numerical assessments attached to propositions about these systems.
A while ago, I wrote about the important distinction between probability and statistics as operating, respectively, on the meta level and on the object level; as Sunny Auyang put it in her book on complex systems,
[s]tatistical theories employing the probability calculus give coarse-grained descriptions of composite systems that overlook much information about individual behaviors. Probabilistic propositions express our desire to know some of the missing information. Since the information is unavailable from statistical theories, probabilistic statements appear as an appraisal of propositions such as “This hand is a flush.” Unlike statistical propositions, probabilistic propositions do not belong to the object language, in which we talk about objects and the world. They belong to the metalanguage, in which we talk about propositions and theories.
In this interpretation, which is softer than De Finetti’s radical subjectivism, probability statements can be intersubjective, that is, publicly shared and agreed on by a community of scientists or engineers. The main point is that they refer not to any empirically accessible entities in Facet A, but to propositions and theories about these entities that reside in Facet B. When we stay in Facet A, we can make statements about individuals, such as “this run of our Monte Carlo algorithm generated such and such an output” (recorded as a finite-precision rational number), or “in 900 out of 1,000 runs of stochastic gradient descent, the empirical loss was between 0.01 and 0.03”, but never anything like “the probability that the output of the Monte Carlo algorithm is within 0.01 of the unknown true target is 0.95”.
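To make the contrast concrete, here is a minimal Python sketch; the distribution, sample size, and tolerance are my own illustrative choices, not anything from the post. The only statements the program can print are Facet A records of what happened in particular runs; the corresponding probability statement lives on the meta level.

import random

random.seed(0)

# A toy "Monte Carlo algorithm": estimate the mean (0.5) of a Gaussian with
# standard deviation 0.15 from 900 samples; all numbers here are illustrative.
def monte_carlo_estimate(n_samples=900):
    return sum(random.gauss(0.5, 0.15) for _ in range(n_samples)) / n_samples

runs = 1_000
hits = sum(abs(monte_carlo_estimate() - 0.5) <= 0.01 for _ in range(runs))

# Facet A (object-level) statement: a record of what actually happened.
print(f"In {hits} out of {runs} runs, the output was within 0.01 of 0.5.")

# The Facet B (meta-level) statement -- "the probability that a single run lands
# within 0.01 of the target is about 0.95" -- is not an observation the program
# can print; it is an assessment attached to propositions about such runs.

Running it reports something on the order of 950 hits out of 1,000; attaching the number 0.95 to a proposition about future runs of the same program is a further, Facet B, move.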
Ben Recht wrote about these issues in a series of posts starting with this one. They are well worth reading, and the key takeaway is that, as Auyang writes, “probabilistic propositions are not empirically testable, even in principle.” As she points out, the primary reason why so many scientific and engineering disciplines make use of the calculus of probability is not because it somehow encodes the “laws of chance” (if only because there is no universal agreement on what constitutes “chance”). It is the structure of the probability calculus that renders it so useful, apart from any interpretation of what the probabilities mean.
When Kolmogorov introduced his axioms of probability in 1933, his main innovation was not the use of measure theory—that was already suggested by others, e.g., by Fréchet. Even the machinery of sigma-algebras, which allows one to form propositional sentences about a system consisting of multiple parts using countably many Boolean operations, was not the main innovation, although here we can already see the Facet B notions of theory-building using formal models. The genuinely new ideas brought in by Kolmogorov were those of independence and conditioning (and also the use of measure-theoretic tools like the Radon-Nikodym theorem to define conditional expectations rigorously).

Structurally, probability theory allows us to associate with a given system the notions of part and whole and to work with relative magnitudes of the parts, as long as the consistency conditions required by the axioms are satisfied. Both Ramsey and De Finetti emphasized this aspect as well, including the use of logical operations to form composite statements and the importance of internal consistency.2 The view of probabilities as relative magnitudes, as emphasized by Auyang, is purely pragmatic and agnostic with respect to interpretation. The notion of independence refers to the (absence of) relation between certain parts of the system (for example, as pointed out by Kolmogorov, the digits of a decimal expansion of a number between 0 and 1 may be “independent” in this sense, but one does not need to invoke probabilistic notions); the machinery of conditioning allows us to refine our assessment of relative magnitudes of some parts given knowledge of some other parts.
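To see the digit example as a purely structural statement about relative magnitudes (stated here for binary rather than decimal digits, which only simplifies the bookkeeping): writing $d_k(x)$ for the $k$-th binary digit of $x \in [0,1)$ and $\lambda$ for length (Lebesgue measure), the set $\{x : d_k(x) = a\}$ is a finite union of intervals of total length $1/2$ for either value of $a$, and for distinct positions the lengths multiply:

\[
\lambda\bigl(\{d_j = a\} \cap \{d_k = b\}\bigr) = \tfrac{1}{4} = \lambda\bigl(\{d_j = a\}\bigr)\,\lambda\bigl(\{d_k = b\}\bigr), \qquad j \neq k.
\]

Nothing in this computation refers to chance; “independence” here is a relation among the relative sizes of parts of the unit interval.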
Now, Kolmogorov was an intuitionist, along the lines of Brouwer, Heyting, and Weyl. As such, he viewed events formed by taking infinitely many unions or intersections of simpler events as ideal; only those that can be obtained using finitely many operations were deemed empirically accessible:
[W]e consider these sets in general only as "ideal events" to which nothing corresponds in the world of experience. But if a deduction uses the probabilities of such events, and if it leads to the determination of the probability of a real event, this determination will obviously be unobjectionable also from an empirical point of view.
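A standard example of such an “ideal event” (my illustration, not Kolmogorov’s wording): the event that the limiting frequency of heads in an infinite sequence of coin tosses equals 1/2. Writing $X_i \in \{0,1\}$ for the outcome of the $i$-th toss and $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$ for the running frequency, each event $\{|\bar{X}_n - \tfrac{1}{2}| < \tfrac{1}{k}\}$ is empirically accessible, since it concerns finitely many tosses, but the limiting event requires countably many unions and intersections:

\[
\Bigl\{\lim_{n\to\infty} \bar{X}_n = \tfrac{1}{2}\Bigr\}
= \bigcap_{k=1}^{\infty} \bigcup_{N=1}^{\infty} \bigcap_{n=N}^{\infty}
\Bigl\{\bigl|\bar{X}_n - \tfrac{1}{2}\bigr| < \tfrac{1}{k}\Bigr\}.
\]

No finite record of tosses can verify or falsify the left-hand side; it belongs to the ideal part of the theory, which is harmless so long as the deductions it enables terminate in probabilities of finitely describable events.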
Philosophically, this is quite close to Bas van Fraassen’s constructive empiricism. In his 1980 book The Scientific Image, van Fraassen says that a scientific theory should be empirically adequate, which is the case whenever the statements it makes about observable things and events are true. Or, in other words, an empirically adequate theory “saves the phenomena.” Further,
[t]o present a theory is to specify a family of structures, its models; and secondly, to specify certain parts of those models (the empirical substructures) as candidates for the direct representation of observable phenomena.
In Kolmogorov’s view, then, the empirical substructures are probability statements about events that admit finite descriptions and finite recipes for verification (we thus ‘save the phenomena’), and all the other parts of the theory that involve infinite operations are acceptable as long as the goal of empirical adequacy is met by the world-facing parts of the theory as they project out into Facet A. But even here one must tread carefully because only statistical statements are admissible in Facet A. At the end of the day, a probabilistic theory should only make empirically verifiable statements. If we were to write out a statement like “with 95% probability, a previously unseen image will be misclassified by our neural net with probability of at most 2%” precisely, using appropriate quantifiers, we would realize that it refers to several kinds of entities:
individuals, i.e., the trained neural net classifier and the new image that will be presented to it;
systems of individuals, i.e., the collections of all possible neural nets and images; and
ensembles of systems, i.e., all the possible random (random in what sense?) realizations of the training images and of the initial weights in the neural net before training.
Only the individuals in this scheme are subject to empirically verifiable claims, which take the form of subject-predicate propositions: the subjects are neural net classifiers and images, and the predicates pick out a particular group of individuals according to a universal, e.g., all the classifier-image pairs in which a misclassification occurs. A probabilistic statement then tells us how likely a given group of individuals is to be picked out by this universal.

In the above example, there are two nested probabilistic statements: one referring to the trained neural net and the test image (the misclassification probability of at most 2%), and the other referring to the ensemble of neural nets corresponding to all possible realizations of the training data (the 95% probability of ending up in a situation where the misclassification probability is indeed at most 2%). The only sense in which either of these two statements is empirically verifiable is on the object level, by doing statistics. The probability statements operate on the meta level and are attached to statements about systems and ensembles of systems, not about particular individuals.
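For concreteness, the statement can be written out in the style of a PAC-type generalization bound (the notation is mine, not from the post, and the randomness of the initial weights is folded into the training sample $S$ for brevity):

\[
\Pr_{S \sim D^{n}}\Bigl[\,\Pr_{(x,y) \sim D}\bigl[h_{S}(x) \neq y\bigr] \le 0.02\,\Bigr] \ge 0.95,
\]

where $D$ is the (unknown) data distribution, $S$ the training set of size $n$, and $h_S$ the classifier obtained by training on $S$. The inner probability is attached to the system of classifier-image pairs, the outer one to the ensemble of training realizations; neither refers to the particular trained net and the particular image in front of us.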
The probabilist Krzysztof Burdzy, in his 2009 book The Search for Certainty: On the Clash of Science and Philosophy of Probability (which was just as provocative3 as De Finetti’s), argued that, while any use of the probability calculus is allowed internally subject to Kolmogorov’s axioms, the only external output of a probabilistic theory should be a statement of the form “the probability of such-and-such an event is zero” or “the probability of such-and-such an event is one.” This, in my view, is close to van Fraassen’s notion of empirical adequacy, since we can subject such zero-one statements to empirical verification, assuming that the events they refer to map faithfully to Facet A. And, just as with unobservables in van Fraassen’s framework, any consistent use of probabilistic constructs is fine, as long as we refrain from making any truth claims about the events they represent.
So, Ramsey, De Finetti, and Savage were definitely on to something when they refused to treat probabilities as objective features of the world or as logically necessary consequences of fixed beliefs about the world (this was the “logicist” view advocated by John Maynard Keynes and by Rudolf Carnap). However, they also went too far in the subjective direction. Indeed, any probabilistic theory, as a Facet B construct, must be coherent according to Kolmogorov’s axioms; but we also demand that any claims it makes about Facet A entities be empirically verifiable, and thus be “direct representations of” (i.e., correspond to) observable phenomena. The need to base scientific theories on both coherence and correspondence theories of truth was phrased nicely by Jacob Bronowski in his 1978 book The Origins of Knowledge and Imagination:
[T]he "inferred units," the theoretical concepts of a science, … form the links in the system. I have said of the "grammar of explanation," the axiom system of a science, that it is a truth (in so far as there is truth) which we judge by its coherence; whereas through the "dictionary of translation," which relates abstract theorems to specific problems, we judge that truth by its correspondence. And the fact is that once you think of science as that kind of language, then the usual dilemma of philosophy—the coherence theory of truth versus the correspondence theory of truth—disappears. Why? Because you cannot make a theory of scientific explanation which does not involve both, if you believe the universe to be totally connected. And if you believe that, then you have a very good reason for preferring one axiomatic system to another: namely, that it is more richly connected.
And, as far as correspondence goes, I am in full agreement with Burdzy when he says that
the role of the probability theory is to identify events of probability 0 or 1 because these are the only probability values which can be verified or falsified by observations. In other words, knowing some probabilities between 0 and 1 has some value only to the extent that such probabilities can be used as input in calculations leading to the identification of events of probability 0 or 1.
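The strong law of large numbers is perhaps the cleanest instance of this (the gloss is mine, not Burdzy’s): for i.i.d. indicator variables $X_i$ with $\Pr[X_i = 1] = p$, the intermediate value $p$ enters as an input, and the theory’s external output is a probability-one statement about limiting frequencies, the same kind of ideal event written out earlier:

\[
\Pr\Bigl[\,\lim_{n\to\infty} \tfrac{1}{n}\textstyle\sum_{i=1}^{n} X_i = p\,\Bigr] = 1.
\]

The number $p$ strictly between 0 and 1 earns its keep only by feeding into a statement whose probability is exactly one.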
Quantum mechanics does not contradict this: Its substantive content is about deterministic evolution of the wavefunction according to Schrödinger’s equation. The probabilistic interpretation of the wavefunction à la Born, and the associated notion of collapse, is an interface between the quantum and the classical descriptions of the world, and not a core postulate of quantum mechanics.
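In symbols (standard textbook notation, not specific to the post): the wavefunction evolves deterministically according to

\[
i\hbar\,\frac{\partial \psi}{\partial t} = H\psi,
\]

while the Born rule, $\Pr[\text{outcome } a] = |\langle a \mid \psi\rangle|^{2}$ for a measurement with nondegenerate eigenstate $|a\rangle$, is where probabilities enter: it attaches numbers to propositions about measurement outcomes, whereas the evolution of $\psi$ itself involves no probabilities at all.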
De Finetti introduced the notion of coherence of a system of bets, or subjective probability assessments, although he famously insisted on allowing only finite additivity in contrast to Kolmogorov’s axioms that admit countably (or sigma-) additive measures. Chapters 1-2 and Appendix A in the PhD thesis of Peter B. Jones contain an interesting discussion of coherence in the works of Ramsey, De Finetti, and Savage.
It certainly ruffled some Bayesian feathers.