A Dutch Book Theorem and Converse Dutch Book Theorem for Kolmogorov Conditionalization

Michael Rescorla

Abstract: This paper discusses how to update one’s credences based on evidence that has initial probability 0. I advance a diachronic norm, Kolmogorov Conditionalization, that governs credal reallocation in many such learning scenarios. The norm is based upon Kolmogorov’s theory of conditional probability. I prove a Dutch book theorem and converse Dutch book theorem for Kolmogorov Conditionalization. The two theorems establish Kolmogorov Conditionalization as the unique credal reallocation rule that avoids a sure loss in the relevant learning scenarios.

§1. Dutch book arguments for conditionalization

How should you update your credences in light of new evidence? The most widely discussed norm is Conditionalization, which requires that:

If you assign credence P(H) to a proposition H, and you gain new evidence that is exhausted by knowledge of E, then you respond to your new evidence by assigning credence P(H | E) to H.

Here P(H | E) is the conditional probability of H given E. Conditionalization traces back to Bayes’s seminal discussion (Bayes and Price, 1763). It is a linchpin of Bayesian decision theory.

Philosophers have pursued various strategies for justifying Conditionalization. One prominent strategy builds upon the classic Dutch book arguments advanced by Ramsey (1931) and de Finetti (1937/1980). A Dutch book is a collection of acceptable bets that inflict a sure loss. An agent is Dutch bookable when it is possible to rig a Dutch book against her. Dutch bookability is a very undesirable property, because a sufficiently devious bookie can pump a Dutch bookable agent for money by offering her bets that she gladly accepts. Ramsey and de Finetti independently noted that one can rig a Dutch book against an agent whose credences violate the probability calculus axioms. They argued that such an agent’s credal allocation is rationally defective. They concluded that credences should conform to the probability calculus axioms. This is a synchronic Dutch book argument, because it addresses credences at a moment rather than credal evolution over time. Subsequent authors have pursued diachronic Dutch book arguments concerning how to reallocate credence in light of new evidence. In particular, Lewis proved a Dutch book theorem for Conditionalization. Teller (1973) publicized the theorem (with full credit to Lewis), and Lewis (1999) eventually published his own treatment.

Lewis’s theorem concerns an idealized agent who learns a proposition drawn from some mutually exclusive, jointly exhaustive set of propositions E₁, …, Eₙ, with P(Eᵢ) > 0 for each i. When Eᵢ has non-zero probability, one defines conditional probability through the ratio formula:

$$P(H \mid E_i) = \frac{P(H \,\&\, E_i)}{P(E_i)}.$$

For the proposed learning scenario, we may reformulate Conditionalization as a norm that I will call Ratio Conditionalization. This norm requires that:

If you assign credence P(H) to each proposition H, and E₁, …, Eₙ are mutually exclusive, jointly exhaustive propositions with P(Eᵢ) > 0 for all i, and you gain new evidence that is exhausted by knowledge of one particular proposition Eᵢ, then you respond to your new evidence by assigning credence P(H & Eᵢ)/P(Eᵢ) to H.
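
Here is a minimal sketch of Ratio Conditionalization. It assumes a finite outcome space and models propositions as sets of outcomes. The Python below updates credences in a hypothetical fair-die example. The function names (`prob`, `ratio_conditionalize`) and the example are illustrative assumptions. They are not part of Lewis’s or Skyrms’s treatment.

```python
from fractions import Fraction

# Hypothetical example: outcomes are the faces of a fair die, and a
# proposition is modeled as the set of outcomes at which it is true.
prior = {w: Fraction(1, 6) for w in range(1, 7)}

def prob(event, p):
    """Credence in an event (a set of outcomes) under the credence function p."""
    return sum((p[w] for w in event), Fraction(0))

def ratio_conditionalize(p, evidence):
    """New credences over outcomes after learning `evidence`, via the ratio formula.

    Each outcome w gets P({w} & E) / P(E); this is defined only when P(E) > 0.
    """
    p_e = prob(evidence, p)
    if p_e == 0:
        raise ValueError("ratio formula undefined: P(E) = 0")
    return {w: (p[w] / p_e if w in evidence else Fraction(0)) for w in p}

evens = {2, 4, 6}          # one cell of the partition {odd, even}
H = {4, 5, 6}              # "the die shows at least 4"
posterior = ratio_conditionalize(prior, evens)
print(prob(H, posterior))  # 2/3
```

The explicit check on P(E) marks exactly where Ratio Conditionalization falls silent. For evidence with probability 0, the ratio formula gives no answer.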

Lewis proved that one can rig a diachronic Dutch book (a Dutch book involving bets offered at different times) against an agent who violates Ratio Conditionalization. Skyrms (1987) proved a converse theorem: one cannot rig a diachronic Dutch book against an agent who conforms to the probability calculus axioms and who obeys Ratio Conditionalization. These two theorems establish Ratio Conditionalization as the unique update rule that avoids diachronic Dutch books in the specified learning scenario. Lewis and Skyrms argue on that basis that one should conform to Ratio Conditionalization.

When P(Eᵢ) = 0, the ratio formula is not well-defined and Ratio Conditionalization does not apply. Ratio Conditionalization does not say how an agent should update her credences based upon evidence that has probability 0. I call such cases null updating scenarios. A simple example is conditioning on the value of a continuous random variable X, i.e. a random variable with continuum many values. Orthodox probability theory demands that P(X = x) = 0 for all but countably many values x. When P(X = x) = 0, the ratio formula does not yield well-defined conditional probabilities P(H | X = x). Thus, Ratio Conditionalization does not specify how to reallocate credence if you learn that X = x. How should you reallocate credence under these circumstances? More generally, what diachronic norms govern null updating?

Kolmogorov (1933/1956) offered an influential mathematical framework that bears directly upon these questions. He advanced a theory of conditional probability general enough to cover scenarios where the conditioning evidence has probability 0. Kolmogorov’s treatment plays a foundational role within probability theory. It informs every standard graduate-level textbook. A few philosophers have explored its potential to illuminate credal reallocation (e.g. Easwaran, 2008; Gyenis and Rédei, 2017; Huttegger, 2015). More commonly, though, philosophers either reject Kolmogorov’s approach (e.g. Hájek, 2003, 2011; Howson, 2014; Myrvold, 2015; Seidenfeld, 2001) or else ignore it altogether.

I think that the philosophical community has paid Kolmogorov’s insights into conditional probability far less attention than they merit. One can extract from Kolmogorov’s discussion a plausible diachronic norm that governs many important null updating scenarios. I call this norm Kolmogorov Conditionalization. I will prove a Dutch book theorem and converse Dutch book theorem for Kolmogorov Conditionalization. The theorems establish Kolmogorov Conditionalization as the unique update rule that avoids diachronic Dutch books in relevant learning scenarios.¹

Considerable stage-setting is needed before I state and prove the two main theorems. §§2-3 offer background remarks on null updating and Dutch book arguments. §4 presents Kolmogorov’s theory of conditional probability. §5 introduces Kolmogorov Conditionalization and constructs a diachronic Dutch book for agents who violate it. §6 more rigorously states and proves a strengthened Dutch book theorem. §7 proves a converse Dutch book theorem. §8 discusses the significance of the two theorems. I assume throughout that credences should be countably additive. This assumption is controversial (de Finetti, 1972), (Howson, 2014), (Kadane, Schervish, and Seidenfeld, 1986), (Savage, 1954). However, exploring the two main theorems will keep us busy enough without wading into controversies over countable additivity.

¹ McGee (1994) generalizes Lewis’s setup so as to analyze a special case of null updating. He considers an agent who learns a proposition drawn from some mutually exclusive, jointly exhaustive set E₁, …, Eₙ, where P(Eᵢ) may be 0. For this special case, McGee proves a Dutch book theorem and converse Dutch book theorem involving Popper’s (1959) theory of conditional probability. McGee’s setup is not general enough to cover even our motivating example of conditioning on a continuous random variable, since a continuous random variable induces an infinite partition of the probability space. The theorems I prove below cover conditioning on a continuous random variable and many other learning scenarios unaddressed by previous Dutch book results.

§2. Null updating

Why study null updating? One reason is that it figures prominently in scientific practice.

Our world is populated by continuously varying quantities: shapes, sizes, colors, locations, and so on. Bayesian agents must frequently estimate these quantities. Accordingly, continuous random variables are fixtures within scientific applications of the Bayesian paradigm, including Bayesian statistics (Lindley, 1982), probabilistic robotics (Thrun, Burgard, and Fox, 2006), and game theory (Fudenberg and Tirole, 1991). They also play a large role within Bayesian cognitive science (Chater and Oaksford, 2008), which provides Bayesian models of core mental processes such as perception (Knill and Richards, 1996), motor control (Wolpert, 2007), and navigation (Madl et al., 2014). All these fields routinely adduce credences over a continuous hypothesis space. For example, probabilistic robotics aims to construct a robot that estimates its own location. Similarly, Bayesian perceptual psychology hypothesizes that the perceptual system uses Bayesian inference to estimate the shapes, sizes, colors, and locations of observable objects. Scientific applications of Bayesianism cannot get far without a continuous probability space.

Standard probability theory requires that all but countably many values of a continuous random variable receive probability 0. Scientific applications of Bayesianism therefore assign a central role to null updating. For example, one might estimate the mass of some star by measuring perturbations in the orbit of a nearby planet; or one might estimate the location of an underwater missile by taking sonar readings; or one might estimate the color of a distal surface by measuring the light spectrum emitted by the surface. All these learning scenarios, and numerous others that arise in scientific applications, require conditioning on the value of a continuous random variable (or random vector).


Nevertheless, null updating receives surprisingly little philosophical discussion. One reason is that many philosophers endorse regularity: the doctrine that one should assign positive probability to metaphysically possible propositions (Jeffrey, 1992), (Kemeny, 1955), (Shimony, 1955), (Skyrms, 1980), (Stalnaker, 1970). Assuming regularity, a rational agent cannot learn a proposition to which she formerly assigned probability 0. However, regularity is implausible (Hájek, 2003, 2012). It conflicts with the aforementioned scientific applications of Bayesianism. It also conflicts with Ratio Conditionalization: conditionalizing on Eᵢ leads you to assign credence 1 to Eᵢ and credence 0 to conflicting propositions, even when those propositions are metaphysically possible. Anyone sympathetic to Ratio Conditionalization should reject regularity.


Jeffrey (1983) argues that Conditionalization has limited applicability to real-world agents, since experience rarely authorizes credence 1 for an empirical proposition Eᵢ. More typically, experience authorizes you to reallocate credence across certain select propositions E₁, …, Eₙ, with no single proposition receiving credence 1. On that basis, you must assign new credences to all remaining propositions H. Jeffrey formulates an update rule (Jeffrey Conditionalization) tailored to this learning scenario.

Although Jeffrey focuses his critique on Ratio Conditionalization, a similar worry arises even more forcefully for null updating. We are finite beings with limited representational and discriminative capacities. Our perceptual organs and measuring instruments rarely if ever specify the value of a random variable X with infinite precision (Borel, 1909/1956), (Myrvold, 2015). At best, we learn that X has value x plus or minus some margin of error, i.e. that X’s value falls in some interval that has positive probability. Thus, one might argue that we rarely if ever learn null propositions with anything approaching complete confidence. From this viewpoint, null updating looks like an overly fanciful learning scenario that idealizes away crucial features of real-world statistical inference.

To a certain extent, I sympathize with such worries. Null updating scenarios are heavily idealized. There is value in studying less idealized learning scenarios. But I insist that there is also value in studying null updating. In field after field (from statistics to economics to robotics to cognitive science), researchers have found it explanatorily or pragmatically fruitful to consider null updating scenarios. These scenarios form an essential starting point for inquiry into less idealized scenarios. Normative models of null updating persist as benchmarks against which we can compare less idealized models. Thus, current scientific practice establishes null updating as a theoretically important phenomenon that merits intensive foundational investigation. For further argument that we should study null updating, see (McGee, 1994).

§3. Dutch book theorems versus Dutch book arguments

Even philosophers who agree that null updating merits investigation may doubt that Dutch books shed much light upon it. Although Dutch book arguments are still fairly popular, they have attracted increasingly harsh criticism over the past few decades.

We must carefully distinguish here between Dutch book theorems and Dutch book arguments. Dutch book theorems are mathematical results that admit definitive proof. For example, Lewis proves that one can rig a diachronic Dutch book against anyone who violates Ratio Conditionalization. Dutch book arguments use Dutch book theorems to defend a philosophical conclusion: namely, that one should conform to certain norms governing credal allocation. Over the past few decades, philosophers have grown increasingly skeptical that one can convert Dutch book theorems into compelling Dutch book arguments (Hájek, 2009). Most fundamentally, Dutch book arguments seem to elide pragmatic and epistemic factors. If you are Dutch bookable, then you are vulnerable to a guaranteed net loss. This vulnerability is a pragmatic defect, so it indicates that your credences are defective from a pragmatic viewpoint. But why conclude that your credences are defective from an epistemic viewpoint? How can Dutch book considerations establish a failure of epistemic rationality?²

The present paper focuses on Dutch book theorems rather than Dutch book arguments. I will prove a Dutch book theorem and converse Dutch book theorem for Kolmogorov Conditionalization. I will not use the theorems to argue that Kolmogorov Conditionalization is epistemically privileged. Even without an accompanying Dutch book argument, the theorems are useful and informative. Specifically, they show that Kolmogorov’s theory of conditional probability captures fundamental links between conditional probability, credal reallocation, and […]

² Another prominent worry focuses more specifically on diachronic Dutch book arguments. van Fraassen (1984) proves a diachronic Dutch book theorem for the Principle of Reflection, which is widely regarded as implausible. Some authors conclude that diachronic Dutch book arguments are suspect (Christensen, 1991). van Fraassen himself concludes that we should accept both Conditionalization and Reflection. Others try to isolate a principled difference between Lewis’s diachronic Dutch book theorem for Conditionalization and van Fraassen’s diachronic Dutch book theorem for Reflection, so that the former may yield a compelling Dutch book argument even though the latter does not (Briggs, 2009), (Mahtani, 2015).

§4. Kolmogorov’s theory of conditional probability

I now present the basic elements of Kolmogorov’s theory. Billingsley’s (1995, pp. 427-440) more detailed exposition serves as a partial basis for my own exposition. In §4.1, I introduce the learning scenarios modeled by Kolmogorov. In §4.2, I describe how Kolmogorov delineates conditional probabilities tailored to these learning scenarios.

§4.1 Kolmogorov learning scenarios

Kolmogorov assumes an idealized agent whose credences are modeled by a probability space (Ω, ℱ, P), where Ω is a set, ℱ is a σ-field over Ω, and P is a probability measure on ℱ. Elements of Ω are outcomes. Elements of ℱ are events. For some purposes, we might regard outcomes as possible worlds and events as propositions. However, Kolmogorov’s theory does not presuppose this philosophical gloss.

Kolmogorov considers a learning setup much more general than that considered by Lewis. In both setups, an idealized agent awaits partial information about the true outcome ω. In Lewis’s setup, the agent learns where ω falls within some finite partition of Ω. In Kolmogorov’s setup, the agent learns whether ω belongs to each G ∈ 𝒢, where 𝒢 ⊆ ℱ is itself a σ-field. Intuitively, the sub-σ-field 𝒢 serves as an “information filter.” The agent does not learn everything about outcome ω, but she learns about ω as filtered through 𝒢. I call learning scenarios of this kind Kolmogorov learning scenarios. Thus, a Kolmogorov learning scenario is one where the agent gains full membership knowledge for a sub-σ-field 𝒢 ⊆ ℱ regarding the true outcome ω. If we view events as propositions, then we can say that 𝒢 contains all the new propositions learned by our agent.

To illustrate Kolmogorov learning scenarios, consider a continuous random variable X: Ω → ℝ. When you learn that X(ω) = r, you thereby learn many additional facts about the events to which ω belongs. Assuming that the expression “r” is suitably informative, knowledge that X(ω) = r allows you to affirm or deny each proposition

$$X(\omega) \in (a, b), \quad a, b \in \mathbb{R}.$$

In that sense, you gain membership knowledge for all sets

$$X^{-1}(a, b), \quad a, b \in \mathbb{R}.$$

Let σ(X) be the σ-field generated by these sets, i.e. the result of starting with the sets X⁻¹(a, b) and closing under complementation and countable unions. If your knowledge were closed under complementation and countable union, then you could extrapolate from membership knowledge for the sets X⁻¹(a, b) to full membership knowledge for σ(X). Thus, the sub-σ-field σ(X) helps us model the knowledge of an idealized superhuman who learns that X = r and whose knowledge is closed under complementation and countable union.
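
Uncountable σ-fields such as σ(X) cannot be enumerated, but a finite analog may help fix ideas. On a finite outcome set, σ(X) consists of the preimages of all sets of values of X, i.e. all unions of the level sets {ω : X(ω) = x}. The following Python sketch computes this finite analog for a hypothetical X. It is an illustration under that finiteness assumption, not a construction of σ(X) for a continuous random variable.

```python
from itertools import combinations

def generated_sigma_field(omega, X):
    """Finite analog of sigma(X): preimages X^{-1}(B) for every set B of values of X.

    On a finite outcome set these preimages are exactly the unions of the
    level sets {w : X(w) = x}; they are closed under complement and union.
    """
    values = sorted({X(w) for w in omega})
    events = set()
    for k in range(len(values) + 1):
        for B in combinations(values, k):
            events.add(frozenset(w for w in omega if X(w) in B))
    return events

def X(w):
    """Hypothetical random variable on outcomes 0..7, taking values 0, 1, 2."""
    return w % 3

omega = range(8)
sigma_X = generated_sigma_field(omega, X)
print(len(sigma_X))   # 8, i.e. 2**3 unions of the three level sets
```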

Of course, the explicit knowledge of an ordinary finite human is not usually closed under complementation and countable union. However, there is a natural sense in which full membership knowledge for σ(X) is implicit in an ability to affirm or deny each proposition

$$X(\omega) \in (a, b), \quad a, b \in \mathbb{R}.$$

In that sense, complete membership information for σ(X) models the implicit knowledge gained by an agent who learns that X = r.

Even if we use σ(X) to model implicit rather than explicit knowledge, the envisaged learning scenario assumes superhuman mental capacities. If σ(X) models possible implicit knowledge that an agent might acquire, then the agent has uncountably many possible doxastic states, one corresponding to each value of X. Finite beings such as ourselves do not have uncountably many possible doxastic states. In particular, we do not have the capacity to represent arbitrary real numbers with infinite precision. Thus, Kolmogorov’s model, even viewed as a model of implicit knowledge, assumes an idealized superhuman with infinitary mental capacities. As indicated in §2, idealized models of this kind play an important role in current science. They serve as benchmarks against which we can compare less idealized models. Scientific applications of Bayesian modeling have repeatedly demonstrated the explanatory and pragmatic benefits that accrue when we take these idealized benchmarks as a starting point.


Many important cases of rational credal reallocation can be fruitfully modeled (in an idealized way) as Kolmogorov learning scenarios. I do not say that all cases of rational credal reallocation can be so modeled. Some learning scenarios do not naturally fit the Kolmogorov template: scenarios of the kind highlighted by Jeffrey, where experience authorizes you to reallocate credence across certain select propositions E₁, …, Eₙ, with no single proposition receiving credence 1; scenarios involving memory loss or the threat of memory loss (Arntzenius, 2003); scenarios involving conceptual discoveries that add new propositions to your cognitive repertoire (Lewis, 1999); and so on. Thus, Kolmogorov learning scenarios are only one type of learning scenario one might study. Still, they are very important. We also understand them relatively well, thanks in large part to Kolmogorov’s efforts. I henceforth focus exclusively on Kolmogorov learning scenarios. (Cf. Easwaran, 2013, pp. 122-123.)

§4.2 Regular conditional distributions

We want to delineate probabilities conditional on information about whether ω belongs to each G ∈ 𝒢. Intuitively, these conditional probabilities constitute a plan for updating credences after gaining full membership knowledge for 𝒢 regarding ω. Formally, Kolmogorov isolates a function P_𝒢 : ℱ × Ω → ℝ. We write P_𝒢(A | ω) to denote the value that this function assumes on inputs A ∈ ℱ and ω ∈ Ω. Think of P_𝒢(A | ω) as the probability of A given ω as filtered through 𝒢.

Kolmogorov places three constraints on P_𝒢:

Regularity: For each ω ∈ Ω, P_𝒢 induces a one-place function P_𝒢( · | ω): ℱ → ℝ. Say that P_𝒢 is regular just in case:

P_𝒢( · | ω) is a probability measure for each ω ∈ Ω.

Intuitively: conditioning on new evidence should always carry you to a probability measure. Note that this is a totally different notion of regularity than the notion rejected in §2. It is an unfortunate fact that the literature associates these two completely different notions with the same word “regularity.”

𝒢-measurability: For each A ∈ ℱ, P_𝒢 induces a one-place function P_𝒢(A | · ): Ω → ℝ. This function reflects a policy for updating the credence assigned to A upon receiving partial information about the true outcome. Kolmogorov requires that

P_𝒢(A | · ) is 𝒢-measurable for each A ∈ ℱ.

In other words, he requires that

$$P_{\mathcal{G}}(A \mid \cdot\,)^{-1}(-\infty, a] \in \mathcal{G} \quad \text{for every } A \in \mathcal{F},\ a \in \mathbb{R}.$$

𝒢-measurability reflects the assumption that credences are updated based solely upon membership information for 𝒢. Intuitively: the agent must learn whether the updated credence for A is ≤ a, so one of the propositions she implicitly learns should be either the proposition that

$$P_{\mathcal{G}}(A \mid \omega) \le a$$

or else the proposition that

$$P_{\mathcal{G}}(A \mid \omega) > a.$$

If P_𝒢(A | · )⁻¹(−∞, a] ∈ 𝒢, then someone who learns whether ω ∈ G for each G ∈ 𝒢 can (at least in principle) determine whether P_𝒢(A | ω) ≤ a. She need simply examine whether ω ∈ P_𝒢(A | · )⁻¹(−∞, a]. In contrast, suppose that P_𝒢(A | · )⁻¹(−∞, a] ∉ 𝒢. Then our agent’s newly acquired membership knowledge about 𝒢 does not include knowledge whether ω ∈ P_𝒢(A | · )⁻¹(−∞, a]. She does not acquire even implicit knowledge whether P_𝒢(A | ω) ≤ a. Thus, 𝒢-measurability captures the intuitive idea that someone who gains full membership knowledge for 𝒢 regarding ω thereby gains implicit knowledge whether P_𝒢(A | ω) ≤ a.
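
In the finite case, 𝒢-measurability has a simple test. Suppose 𝒢 is generated by a finite partition of Ω into nonempty cells. Then the members of 𝒢 are exactly the unions of cells, so P_𝒢(A | · ) is 𝒢-measurable just in case it is constant on each cell. The following Python sketch applies that test to two hypothetical update policies. The partition and the numbers are illustrative assumptions.

```python
def is_measurable(f, partition):
    """Test G-measurability of f when G is generated by a finite partition.

    Members of G are the unions of cells, so every preimage f^{-1}(-inf, a]
    lies in G exactly when f is constant on each (nonempty) cell.
    """
    return all(len({f(w) for w in cell}) == 1 for cell in partition)

partition = [{0, 1}, {2, 3, 4}]     # hypothetical information filter
policy_ok = {0: 0.2, 1: 0.2, 2: 0.7, 3: 0.7, 4: 0.7}
policy_bad = {0: 0.2, 1: 0.9, 2: 0.7, 3: 0.7, 4: 0.7}
print(is_measurable(policy_ok.get, partition))    # True
print(is_measurable(policy_bad.get, partition))   # False
```

The second policy fails because the preimage of (−∞, 0.2] is {0}, which splits the cell {0, 1}. Learning which cell contains ω therefore does not settle whether the updated credence is ≤ 0.2.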

The integral formula: Conditional probabilities must be appropriately related to unconditional probabilities. When the conditioning event has non-zero probability, the ratio formula specifies this “appropriate relation.” To analyze probabilities conditional on events with probability 0, Kolmogorov offers a constraint called the integral formula:

$$P(A \cap G) = \int_G P_{\mathcal{G}}(A \mid \omega)\, dP(\omega), \quad \text{for any } G \in \mathcal{G}.$$

The integral formula generalizes the law of total probability from elementary probability theory. A 𝒢-measurable function P_𝒢(A | · ): Ω → ℝ that satisfies the integral formula is a conditional probability for A given 𝒢. A two-place function P_𝒢( · | · ) satisfying all three constraints is a regular conditional distribution (rcd) for P given 𝒢. One can prove that there always exists a conditional probability for A given 𝒢 (Billingsley, 1995, p. 430). One can also prove that there exists an rcd for P given 𝒢 in a wide variety of cases (Durrett, 1991, pp. 198-200), (Rao, 2005, pp. 125-182), including all or virtually all cases that arise in empirical applications.
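
A Monte Carlo sketch can illustrate the integral formula in a case where every conditioning event {X = x} has probability 0. The model is a hypothetical assumption: X is uniform on [0, 1], and given X a coin lands heads (event A) with probability X. The candidate conditional probability for A given σ(X) is X(ω) itself. For G = {X ≤ c}, both sides of the integral formula should then equal c²/2.

```python
import random

def check_integral_formula(c, n=200_000, seed=0):
    """Monte Carlo check of P(A ∩ G) = ∫_G P_G(A | ω) dP(ω) for G = {X <= c}.

    Hypothetical model: X ~ Uniform(0, 1); given X, event A (heads) occurs
    with probability X, so the candidate P_G(A | ω) given sigma(X) is X(ω).
    """
    rng = random.Random(seed)
    lhs_hits = 0      # number of sampled outcomes in A ∩ G
    rhs_sum = 0.0     # running sum of P_G(A | ω) over sampled outcomes in G
    for _ in range(n):
        x = rng.random()
        heads = rng.random() < x
        if x <= c:
            lhs_hits += heads
            rhs_sum += x
    return lhs_hits / n, rhs_sum / n, c * c / 2   # two estimates and the exact value

lhs, rhs, exact = check_integral_formula(0.6)
print(round(lhs, 3), round(rhs, 3), exact)   # both estimates should be close to 0.18
```

Only the events G = {X ≤ c} are checked here. These events generate σ(X), so the agreement is suggestive, but a simulation is not a proof.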

In certain pathological cases, there is no rcd for P given 𝒢 (Billingsley, 1995, p. 443). Some critics regard these cases as a serious problem for Kolmogorov’s theory (Seidenfeld, 2001). However, I think that they should not worry any theorist who already accepts countable additivity. Vitali proved that certain σ-fields, such as the power set of the unit interval, do not admit a countably additive, translation-invariant probability measure, i.e. a uniform probability distribution over all subsets of the interval. His proof uses the Axiom of Choice to define a nonmeasurable set. Standard examples where rcds do not exist likewise feature a nonmeasurable set (Seidenfeld, Schervish, and Kadane, 2001). Once we accept that countably additive unconditional probabilities do not always exist, we should not feel particularly disturbed that countably additive conditional probabilities do not always exist.

Notably, Kolmogorov relativizes conditional probabilities to conditioning sub-σ-fields. One never conditions upon an isolated probability zero event. Rather, one conditions upon an outcome as filtered through a sub-σ-field 𝒢. Suppose random variables X and Y are such that

$$X(\omega) = x \iff Y(\omega) = y \quad \text{for all } \omega \in \Omega,$$

where the event {X = x} (i.e. the event {Y = y}) has probability 0. In many such cases, there exists A ∈ ℱ such that

$$P_{\sigma(X)}(A \mid \omega) \neq P_{\sigma(Y)}(A \mid \omega)$$

for all ω ∈ {X = x}, where P_{σ(X)}(A | · ) is a conditional probability for A given σ(X) and P_{σ(Y)}(A | · ) is a conditional probability for A given σ(Y). According to Kolmogorov, there is no determinate answer as to the conditional probability of A given {X = x}. A determinate conditional probability arises only once one regards {X = x} as embedded in a surrounding sub-σ-field.³ Kolmogorov’s relativistic approach is controversial among philosophers. Many authors insist that we should instead strive to isolate unrelativized conditional probabilities (Hill, 1980), (Howson, 2014), (Kadane, Schervish, and Seidenfeld, 1986), (Myrvold, 2015). See (Gyenis, Hofer-Szabó, and Rédei, 2017) and (Rescorla, 2015) for defense of Kolmogorov’s relativistic approach.
³ A famous example, nowadays called the Borel-Kolmogorov paradox, arises when a point is picked randomly on the earth’s surface. How should we condition on news that the point falls on some great circle C? One can motivate different intuitively compelling answers, depending on whether one regards C as the equator or as two meridians fused together. According to Kolmogorov, there is no single determinate answer. We must first settle upon a conditioning sub-σ-field that contains C. We can pick a sub-σ-field that corresponds to regarding C as the equator, or we can pick a different sub-σ-field that corresponds to regarding C as two meridians fused together. The different conditioning sub-σ-fields engender different rcds. Gyenis, Hofer-Szabó, and Rédei (2017) and Rescorla (2015) discuss the Borel-Kolmogorov paradox at length.
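
The paradox can be seen numerically. The following Python sketch samples uniform points on a sphere and conditions on a thin band around a great circle. It does this once with the circle read as the equator (a band of latitudes) and once with the circle read as two fused meridians (a wedge of longitudes). By rotational symmetry, any great circle would do; what differs is the sub-σ-field. The sketch is heuristic: it assumes that thin bands defined by the relevant coordinate approximate the corresponding rcd, which is an assumption of the sketch rather than part of Kolmogorov’s definition. In each case it reports the conditional mass of arcs making up one third of the circle.

```python
import math
import random

def borel_kolmogorov(n=400_000, eps=0.01, seed=1):
    """Condition a uniform point on the sphere on thin bands around a great circle.

    Returns the conditional mass of arcs covering one third of the circle,
    (1) with the circle read as the equator (band of latitudes) and
    (2) with the circle read as two fused meridians (wedge of longitudes).
    """
    rng = random.Random(seed)
    eq_total = eq_hits = mer_total = mer_hits = 0
    for _ in range(n):
        lon = rng.uniform(-math.pi, math.pi)        # longitude
        lat = math.asin(rng.uniform(-1.0, 1.0))     # latitude of a uniform point on the sphere
        if abs(lat) < eps:                           # near the equator
            eq_total += 1
            eq_hits += abs(lon) < math.pi / 3        # one arc of length 2π/3
        if abs(lon) < eps or abs(lon) > math.pi - eps:   # near the meridians 0 and π
            mer_total += 1
            mer_hits += abs(lat) < math.pi / 6       # two arcs of length π/3 each
    return eq_hits / eq_total, mer_hits / mer_total

eq, mer = borel_kolmogorov()
print(round(eq, 2), round(mer, 2))   # expected to be near 1/3 and 1/2 respectively
```

Under the equator reading, the mass is about 1/3, matching arc length. Under the meridian reading, it is about 1/2, because latitude along a meridian then has density proportional to cos(latitude).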

Even when we hold the conditioning sub-σ-field fixed, Kolmogorov’s theory determines P_𝒢(A | · ): Ω → ℝ only up to measure 0. One can vary P_𝒢(A | ω) arbitrarily on outcomes ω comprising a probability zero event, provided one takes care that for each ω the results assemble into a probability measure P_𝒢( · | ω). This indeterminacy arises because the integral formula mentions P_𝒢(A | ω) only inside an integral sign. Integration ignores differences of measure 0, so the integral formula does not pin down unique conditional probabilities. In the mathematical literature, candidate conditional probabilities P_𝒢(A | · ) equal up to measure 0 are sometimes called versions of the conditional probability, which is then regarded as an equivalence class of its versions. Using this terminology, Kolmogorov’s theory does not privilege any specific version of the conditional probability. Many authors find the resulting indeterminacy worrisome. Pfanzagl (1979), Rao (2005), and Tjur (1980) try to mitigate the indeterminacy by supplementing Kolmogorov’s theory with additional constraints on conditional probabilities.⁴
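
The measure-zero indeterminacy can be seen in a finite model that contains a null event. In the following Python sketch (a hypothetical example), 𝒢 is generated by the partition {0} | {1} | {2, 3}, and the cell {2, 3} has probability 0. Two candidate versions of P_𝒢(A | · ) differ only on that cell, and both satisfy the integral formula for every G ∈ 𝒢. A candidate that differs on a cell of positive probability does not. The sketch checks only the integral formula for a single event A, not regularity.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical finite model in which the cell {2, 3} is a null event.
P = {0: Fraction(1, 2), 1: Fraction(1, 2), 2: Fraction(0), 3: Fraction(0)}
partition = [frozenset({0}), frozenset({1}), frozenset({2, 3})]
A = {0, 2}

def satisfies_integral_formula(version):
    """Check P(A ∩ G) = Σ_{w in G} version[w] · P({w}) for every G in the
    sigma-field generated by `partition`, i.e. every union of cells."""
    for k in range(len(partition) + 1):
        for chosen in combinations(partition, k):
            G = set().union(*chosen)
            lhs = sum(P[w] for w in A & G)
            rhs = sum(version[w] * P[w] for w in G)
            if lhs != rhs:
                return False
    return True

# Two candidate versions of P_G(A | .) that disagree only on the null cell.
v1 = {0: Fraction(1), 1: Fraction(0), 2: Fraction(1, 2), 3: Fraction(1, 2)}
v2 = {0: Fraction(1), 1: Fraction(0), 2: Fraction(9, 10), 3: Fraction(9, 10)}
print(satisfies_integral_formula(v1), satisfies_integral_formula(v2))  # True True
```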

Over the past century, researchers such as Popper (1959) and Rényi (1955) have proposed various alternatives to Kolmogorov’s theory of conditional probability. These alternative theories tend to receive better philosophical press than Kolmogorov’s theory. However, Kolmogorov’s approach is orthodox within contemporary mathematical practice. One of its main advantages is that the integral formula tightly constrains the relation between conditional and unconditional probabilities. Popper and Rényi supply no comparably substantive constraints. In practice, Kolmogorov offers far more guidance for computing conditional probabilities than either Popper or Rényi. Of course, one must assess whether Kolmogorov offers good guidance. The theorems proved below provide insight into that question.

⁴ Another worry about Kolmogorov’s approach is that, when the conditioning sub-σ-field is sufficiently odd, the resulting rcds have properties that conditional probabilities seemingly should not have (Hájek, 2009), (Seidenfeld, Schervish, and Kadane, 2001). See Easwaran (2011) for one response to this worry.

§5. A Dutch book theorem for Kolmogorov Conditionalization

Kolmogorov’s explicit concern is conditional probability rather than credal reallocation. Still, Kolmogorov’s theory naturally suggests a norm that governs credal reallocation in Kolmogorov learning scenarios. Suppose that (Ω, ℱ, P) is a probability space and 𝒢 ⊆ ℱ is a sub-σ-field. Kolmogorov Conditionalization requires that:

If you begin with credences P over ℱ, and there exists an rcd for P given 𝒢, then, for one such rcd P_𝒢, whenever you gain new evidence that is exhausted by full membership knowledge for 𝒢 regarding the true outcome ω, you respond by adopting new credences P_𝒢( · | ω) over ℱ.

Kolmogorov Conditionalization requires you to update credences whenever possible using an rcd. If 𝒢 is generated by a partition E₁, …, Eₙ of events such that P(Eᵢ) > 0, one can show that the ratio formula yields a unique rcd for P given 𝒢. In this special case, Kolmogorov Conditionalization reduces to Ratio Conditionalization. If there are infinitely many rcds for P given 𝒢, you can satisfy Kolmogorov Conditionalization by using any one of them as your credal reallocation policy. If there exists no rcd for P given 𝒢, Kolmogorov Conditionalization does not say how to proceed.⁵
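
The reduction to Ratio Conditionalization can be sketched as follows. This is a reconstruction of the standard argument, not a quotation of the author’s own proof. Fix A ∈ ℱ. The members of 𝒢 are exactly the unions of the cells Eᵢ, so a 𝒢-measurable function must be constant on each cell; call its value on Eᵢ cᵢ. The integral formula with G = Eᵢ then forces cᵢ to be the ratio:

$$
\begin{aligned}
P(A \cap E_i) &= \int_{E_i} P_{\mathcal{G}}(A \mid \omega)\, dP(\omega) = c_i \, P(E_i),\\
c_i &= \frac{P(A \cap E_i)}{P(E_i)} \qquad (\text{since } P(E_i) > 0).
\end{aligned}
$$

Conversely, these values satisfy the integral formula for every G ∈ 𝒢 (sum over the cells that make up G), and each P_𝒢( · | ω) = P( · ∩ Eᵢ)/P(Eᵢ) for ω ∈ Eᵢ is a probability measure. Because every cell has positive probability, no null set leaves room for alternative versions.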

I will now show that anyone who violates Kolmogorov Conditionalization is Dutch bookable. I assume the following setup. At time t₁, (Ω, ℱ, P) models your credal allocation. At time t₂, you and I will both gain full membership knowledge for 𝒢 ⊆ ℱ. Let C(A | ω) be the credence you would assign to A ∈ ℱ upon gaining this knowledge for outcome ω. Given the argumentation of §4.2, we may assume that

C( · | ω) is a probability measure for each ω ∈ Ω.

C(A | · ) is 𝒢-measurable for all A ∈ ℱ.

These two assumptions echo the first two clauses in the definition of rcd.

⁵ I do not assume that every probability space (Ω, ℱ, P) models a possible credal allocation or that every sub-σ-field 𝒢 models a possible learning scenario. In certain cases, (Ω, ℱ, P) or 𝒢 may be so bizarre that no possible agent satisfies the antecedent of Kolmogorov Conditionalization (even though there exists an rcd for P given 𝒢).

If C(A | · ) satisfies the integral formula for each A ∈ ℱ, then C is an rcd and your credal reallocation policy conforms to Kolmogorov Conditionalization. Let us therefore assume that C(A | · ) violates the integral formula for some A ∈ ℱ. Since functions that agree almost everywhere have the same integrals, it follows that

$$P\{\omega : C(A \mid \omega) \neq P_{\mathcal{G}}(A \mid \omega)\} > 0,$$

where P_𝒢(A | · ) is some conditional probability for A given 𝒢. Note that

$$\{\omega : C(A \mid \omega) \neq P_{\mathcal{G}}(A \mid \omega)\} = \{\omega : C(A \mid \omega) < P_{\mathcal{G}}(A \mid \omega)\} \cup \{\omega : C(A \mid \omega) > P_{\mathcal{G}}(A \mid \omega)\}.$$

Both sets on the right-hand side belong to 𝒢, since C(A | · ) and P_𝒢(A | · ) are both 𝒢-measurable. At least one of these two sets must have non-zero P-measure. Without loss of generality, suppose the first does. Call this set G:

$$G =_{\text{df}} \{\omega : C(A \mid \omega) < P_{\mathcal{G}}(A \mid \omega)\}.$$

I now use G to rig a Dutch book containing sequential bets offered at times t₁ and t₂.

At time t₁, I offer you a conditional bet: if we learn at time t₂ that ω ∈ G, then I will sell you for price P_𝒢(A | ω) a wager that pays off as follows:

ω ∈ A → payoff = 1

ω ∉ A → payoff = 0.⁶

We will be able in principle to determine P_𝒢(A | ω) at time t₂, because membership knowledge for 𝒢 fixes P_𝒢(A | ω). Call this bet 1. Table 1 summarizes net gain for bet 1 given any outcome ω. Bet 1 has payoff 1 when ω ∈ A ∩ G and payoff 0 otherwise, so its expected payoff is P(A ∩ G). You pay price P_𝒢(A | ω) when ω ∈ G and price 0 otherwise, so the expected price is

$$\int_G P_{\mathcal{G}}(A \mid \omega)\, dP(\omega).$$

By the integral formula, the expected price is P(A ∩ G). Since the expected payoff and the expected price are equal, the expected net gain is 0. You therefore accept bet 1 as fair. You commit to buying the specified wager for the specified price if it turns out that ω ∈ G. (Cf. Billingsley, 1995, p. 431.)

⁶ Here and elsewhere “→” is the material conditional.


At time t2, we both learn whether ω ∈ G. If ω ∉ G, then no money changes hands. If ω ∈ G, then we enact the gambling transaction agreed upon at time t1. Furthermore, I now ask you to sell me for price C(A | ω) a wager that pays off as follows:

ω ∈ A → payoff = 1

ω ∉ A → payoff = 0.

Call this bet 2. Table 2 summarizes net gain for bet 2. Bet 2 has payoff 1 when ω ∈ A and payoff 0 otherwise, so its expected payoff is your credence in A at time t2: namely, C(A | ω). This is also bet 2’s price, so expected net gain from bet 2 is 0. You therefore accept bet 2 as fair.


Table 3 summarizes your net gain for the overall gambling scenario, given any outcome ω. Net gain is negative if ω ∈ G and 0 if ω ∉ G. (For ω ∈ G, bet 1 yields 1 − P_𝒢(A | ω) or −P_𝒢(A | ω), and bet 2 yields C(A | ω) − 1 or C(A | ω), according as ω ∈ A or ω ∉ A; either way, the two bets together yield C(A | ω) − P_𝒢(A | ω) < 0.) Since P(G) > 0, the overall gambling scenario offers a positive probability of net loss and no compensating prospect of net gain. To quote Lewis: “I can inflict on you a risk of loss uncompensated by any chance of gain” (1999, p. 406). When C(A | ω) > P_𝒢(A | ω) with positive probability, I can simply reverse the two bets: I buy bet 1 and sell bet 2. Either way, you are vulnerable to fair bets that offer a positive probability of net loss and no compensating positive probability of net profit.
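The two bets can be replayed on a small finite model. The sketch below is a hypothetical illustration rather than part of the argument: the four outcomes, the uniform prior, the two-cell partition generating 𝒢, the event A, and the update rule stored in `RULE` are all assumptions, and `p_cond`, `c_cond`, `bet1`, and `bet2` are illustrative names. On a finite partition whose cells all have positive probability, the ratio measure on each cell serves as an rcd, so a rule that deviates on a cell yields a set G of positive probability.

```python
from fractions import Fraction as Fr

# Hypothetical finite model (an assumption for illustration): four outcomes,
# a uniform prior P, and the sub-sigma-field generated by two cells.
OMEGA = [0, 1, 2, 3]
P = {w: Fr(1, 4) for w in OMEGA}
CELLS = [frozenset({0, 1}), frozenset({2, 3})]
A = frozenset({0, 2})

def cell_of(w):
    """The cell you learn you are in at t2 (full membership knowledge)."""
    return next(c for c in CELLS if w in c)

def prob(mu, event):
    """Probability of an event under a finite measure given as a dict."""
    return sum(mu[v] for v in event)

def p_cond(w):
    """P( . | cell of w); with positive-probability cells this is an rcd for P."""
    c = cell_of(w)
    return {v: (P[v] / prob(P, c) if v in c else Fr(0)) for v in OMEGA}

# Hypothetical update rule C( . | w): constant on cells (so measurable with
# respect to the sub-sigma-field) but deviating from p_cond on {0, 1}.
RULE = {
    frozenset({0, 1}): {0: Fr(3, 10), 1: Fr(7, 10), 2: Fr(0), 3: Fr(0)},
    frozenset({2, 3}): {0: Fr(0), 1: Fr(0), 2: Fr(1, 2), 3: Fr(1, 2)},
}

def c_cond(w):
    return RULE[cell_of(w)]

# G = {w : C(A | w) < P_G(A | w)}
G = frozenset(w for w in OMEGA if prob(c_cond(w), A) < prob(p_cond(w), A))

def bet1(w):
    """Agreed at t1: if w is in G, you pay P_G(A | w) for a wager paying 1 on A."""
    if w not in G:
        return Fr(0)
    return (1 if w in A else 0) - prob(p_cond(w), A)

def bet2(w_learned, v):
    """Offered at t2 if w_learned is in G: you sell for C(A | w_learned) a
    wager paying 1 on A; v is the outcome that settles the wager."""
    if w_learned not in G:
        return Fr(0)
    return prob(c_cond(w_learned), A) - (1 if v in A else 0)

expected_bet1_at_t1 = sum(P[w] * bet1(w) for w in OMEGA)
expected_bet2_at_t2 = {w: sum(c_cond(w)[v] * bet2(w, v) for v in OMEGA) for w in OMEGA}
net_gain = {w: bet1(w) + bet2(w, w) for w in OMEGA}
print(sorted(G), expected_bet1_at_t1, net_gain)
```

Under these assumptions bet 1 has expectation 0 under the prior, bet 2 has expectation 0 under the updated credences, and the combined net gain is −1/5 on G = {0, 1} and 0 elsewhere.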

Here we may distinguish a weaker and a stronger notion of Dutch bookability. A strong Dutch book is a set of acceptable bets with guaranteed net loss. A weak Dutch book is a set of acceptable bets with a positive probability of net loss and no compensating positive probability of net profit. In §1, I used the phrase “Dutch book” to mean strong Dutch book, as is typical in the literature. I have just shown that anyone who violates Kolmogorov Conditionalization is weakly Dutch bookable. A minor emendation in §6 will show that any such agent is strongly Dutch bookable. Clearly, weak Dutch bookability is already very undesirable.

To rig a weak diachronic Dutch book against you, I must know your plan C(A | ω) for updating credences. Only then can I decide whether to buy or sell the relevant bets. A similar situation prevails in Lewis’s Dutch book theorem for Ratio Conditionalization. As Lewis puts it, “I still have no safe strategy for exploiting you unless I know in advance what you will do instead of conditionalizing” (1999, p. 407). However, I think that we should not overemphasize “strategies” for exploiting non-conditionalizers. The core issue here does not concern competition between agents. The core issue concerns internal defects in a single agent’s credal allocation over time. The main worry here is not that you are vulnerable to exploitation but rather that your own credal allocation depicts your credal reallocation policy as promoting pointlessly risky behavior. By your own lights, your credal reallocation policy can lead you to incur a positive probability of net loss with no compensating positive probability of net gain. Thus, your credal reallocation policy has highly undesirable pragmatic properties by your own lights.⁷

§6. Dutch books formalized

I have shown that agents who violate Kolmogorov Conditionalization are vulnerable to a weak Dutch book. Might agents who obey Kolmogorov Conditionalization be just as badly off? Say that a Kolmogorov conditionalizer is someone who satisfies the antecedent of Kolmogorov Conditionalization and who conforms to Kolmogorov Conditionalization. A Kolmogorov conditionalizer has initial credences that admit a suitable rcd, and she reallocates credences using such an rcd. I will prove that one cannot rig a weak Dutch book against a Kolmogorov conditionalizer. To prove a negative result of this kind, I must first formalize the notions of update rule, bet, bookie strategy, and weak Dutch book.

I assume an agent who at time t1 has initial credences P over events from (Ω, ℱ). At time t2, she gains membership knowledge for sub-σ-field 𝒢 ⊆ ℱ. I assume that she updates her credences in conformity with some update rule. Intuitively, an update rule is a policy for reallocating credence given membership information for 𝒢. One might therefore consider functions U: Ω → M, where M is the space of probability measures over ℱ. However, it will prove more convenient to focus instead on functions C: ℱ × Ω → [0, 1] satisfying two constraints:

C( . , ω) is a probability measure for all ω ∈ Ω.

C(A, . ) is 𝒢-measurable for all A ∈ ℱ.

I say that any such C is an update rule for (Ω, ℱ, 𝒢). I will notate C(A, ω) as C(A | ω). If C(A | . ) satisfies the integral formula for each A ∈ ℱ, then C is an rcd for P given 𝒢. Obviously, many possible update rules violate the integral formula.

⁷ A common objection to Dutch book arguments is that you should see the losses coming and therefore opt out. Levi (1987) and Maher (1992) develop this objection with respect to diachronic Dutch books. See Skyrms (1993) for a response to Levi and Maher.

The agent faces a bookie who can offer bets over ℱ at both t1 and t2. Following standard practice in probability theory, I formalize a bet as a random variable. It is convenient to allow random variables that take values in the extended real line ℝ̄ = [−∞, ∞]. Thus, a bet is a random variable X: Ω → ℝ̄. Here X(ω) is the net gain for outcome ω. Infinite gains −∞ and ∞ arise in scenarios such as Pascal’s Wager, but they have doubtful relevance to any realistic gambling scenario. I allow them anyway. A random variable X is required to be ℱ-measurable:

$$X^{-1}(-\infty, a] \in \mathcal{F} \text{ for each } a \in \mathbb{R}, \qquad X^{-1}\{-\infty\} \in \mathcal{F}, \qquad X^{-1}\{\infty\} \in \mathcal{F}.$$

ℱ-measurability reflects the idea that bets concern events in ℱ. Learning which events in ℱ occurred must suffice at least in principle to determine the bet’s net gain. This constraint leads naturally to ℱ-measurability, as in §4.2’s discussion of 𝒢-measurability.

If X is a random variable and μ is a probability measure, then the expectation of X with respect to μ is written as E_μ[X] and defined in the usual way:

$$E_\mu[X] =_{\mathrm{df}} \int X \, d\mu.$$

Depending on our choice of X, E_μ[X] may or may not be well-defined: it is undefined when the integrals of the positive and negative parts of X with respect to μ are both infinite. Standard decision theory only applies to scenarios where E_μ[X] is well-defined, since only then does expected utility maximization offer any guidance. It is common in probability theory and Bayesian decision theory to restrict attention to such scenarios by demanding that bets have well-defined expectations. I impose no such restriction here. If the bookie wishes to offer bets without well-defined expected values, then I raise no objection.

Potential bets are evaluated for acceptability with respect to the agent’s current credences. A bet X is fair relative to μ iff its expected value with respect to μ is 0:

$$E_\mu[X] = 0,$$

favorable relative to μ iff its expected value with respect to μ is positive:

$$E_\mu[X] > 0,$$

and acceptable relative to μ iff it is fair or favorable relative to μ. Thus, a bet is unacceptable if it has a negative expectation or it does not have a well-defined expectation. I assume that our agent adopts a policy of accepting all acceptable bets that are offered and rejecting all unacceptable bets. The assumed policy may not be rationally obligatory, but it is rationally permissible. We may legitimately consider an agent who adopts this policy.
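A minimal sketch of these definitions for a finite outcome space, assuming that measures and bets are given as Python dicts keyed by outcome (`expectation` and `classify` are hypothetical helper names). It follows the measure-theoretic convention 0 · (±∞) = 0 and reports an undefined expectation (∞ − ∞) as `None`, which `classify` treats as unacceptable, as the text stipulates.

```python
import math
from fractions import Fraction

def expectation(mu, X):
    """E_mu[X] for a finite measure mu (outcome -> mass) and an extended-real
    bet X (outcome -> number, possibly +/-math.inf). Returns None when the
    expectation is undefined because both parts are infinite."""
    pos, neg = Fraction(0), Fraction(0)
    pos_inf = neg_inf = False
    for w, mass in mu.items():
        if mass == 0:
            continue  # convention 0 * (+/-inf) = 0: null outcomes contribute nothing
        x = X[w]
        if x == math.inf:
            pos_inf = True
        elif x == -math.inf:
            neg_inf = True
        elif x > 0:
            pos += mass * Fraction(x)
        else:
            neg += mass * Fraction(-x)
    if pos_inf and neg_inf:
        return None  # inf - inf: no well-defined expectation
    if pos_inf:
        return math.inf
    if neg_inf:
        return -math.inf
    return pos - neg

def classify(mu, X):
    """'fair', 'favorable', or 'unacceptable' relative to mu."""
    e = expectation(mu, X)
    if e is None or e < 0:
        return "unacceptable"
    return "favorable" if e > 0 else "fair"

coin = {"H": Fraction(1, 2), "T": Fraction(1, 2)}
print(classify(coin, {"H": 1, "T": -1}), classify(coin, {"H": math.inf, "T": -math.inf}))
```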

I assume that the bookie can offer only finitely many bets at a given time.⁸ All such bets are summable into a single random variable, so we may assume that the bookie offers at most a single bet at t1 and a single bet at t2.⁹ At t1, the agent evaluates whether the proposed bet is acceptable relative to her initial credences P. At t2, she evaluates acceptability using updated credences that incorporate membership knowledge for 𝒢. More precisely, she computes expectations relative to her new credences C( . | ω), where ω is the true outcome. In this spirit, let

$$\mu_\omega(\,\cdot\,) = C(\,\cdot \mid \omega),$$

and say that bet X is acceptable given ω iff

$$E_{\mu_\omega}[X] = \int X \, d\mu_\omega \geq 0.$$

The agent accepts bet X at t2 just in case X is acceptable given the true outcome.

⁸ McGee (1999) shows that, if payoffs are unbounded and infinitely many bets are allowed, then an agent satisfying very weak assumptions faces an infinite Dutch book: an infinite collection of acceptable (indeed, favorable) bets that inflict a sure loss. McGee concludes that any rational agent must have bounded utilities. Following Arntzenius, Elga, and Hawthorne (2003), I draw a different conclusion: infinite Dutch books do not reveal any pragmatic or epistemic defect. A basic mathematical fact is that the expected value of the sum of infinitely many random variables need not equal the sum of their expected values. In betting terms: a book containing infinitely many bets need not be acceptable even though each individual bet is acceptable (indeed, favorable). That an agent would regard each bet as acceptable when presented individually does not entail that the agent should accept the overall package of infinitely many bets. I therefore ignore books that contain infinitely many bets.

⁹ The sum of two well-defined bets may not itself be well-defined for certain ω. For example, we may have X1(ω) = ∞ and X2(ω) = −∞. However, we may assume that all bets offered at a given time have a well-defined sum for each ω. If a book violates this assumption, then it is indeterminate what net gain an agent who accepts the book receives in certain outcomes. This book does not seem to me to constitute a well-defined gambling scenario. In any event, such books raise many problems that are orthogonal to our main concerns. We may legitimately ignore these books. (Alternatively, we could allow books where all bets offered at a given time have a well-defined sum except for outcomes ω contained in a probability zero event. This generalization would not appreciably impact the definitions or proofs offered below.)

A diachronic Dutch book has two elements: a bet X offered at t1; and a strategy used by the bookie when deciding which bet to offer at t2. The Dutch book from §5 features the strategy:

Offer bet 2 if ω ∈ G; offer no bet if ω ∉ G.

Of course, we can imagine more complex bookie strategies. We would like a framework for modeling all such strategies.

A bookie strategy maps information received at t2 into a bet offered at t2. In the general case, a bookie might receive information about events drawn from space (Ω1, ℱ1) and offer a bet concerning events in a different space (Ω2, ℱ2). For present purposes, we need not proceed so generally. We are focused exclusively on Kolmogorov learning scenarios, where the agent updates credences for (Ω, ℱ) based on membership knowledge for 𝒢 ⊆ ℱ. Moreover, we only consider scenarios where the bookie and the agent receive the same information at t2. So the only relevant bookie strategies are those that map membership information about 𝒢 into bets over ℱ. We use (Ω, 𝒢) to model the information that the bookie consults when selecting which bet to offer, and we use (Ω, ℱ) to model the information that an observer consults when evaluating the net gain from whatever bet the bookie selects.

In this setting, it is natural to define a bookie strategy as a mapping from Ω to random variables over (Ω, ℱ). However, it will prove convenient to proceed more circuitously. Let

$$\mathcal{G} \otimes \mathcal{F} =_{\mathrm{df}} \sigma(\mathcal{G} \times \mathcal{F})$$

be the σ-field generated by the rectangles

$$G \times F, \qquad G \in \mathcal{G},\; F \in \mathcal{F}.$$

Consider the measurable space (Ω × Ω, 𝒢 ⊗ ℱ). A bookie strategy is a 𝒢 ⊗ ℱ-measurable function Y: Ω × Ω → ℝ̄. For fixed ω, Y(ω, . ): Ω → ℝ̄ is the bet that the bookie offers upon learning whether ω ∈ G for each G ∈ 𝒢. Since Y is 𝒢 ⊗ ℱ-measurable, one can easily show that

Y(ω, . ): Ω → ℝ̄ is ℱ-measurable for every ω ∈ Ω,

so that Y(ω, . ) is indeed a bet according to our official definition. I will commonly abbreviate Y(ω, . ) as Y^ω. To model scenarios where the bookie offers no bet upon learning full membership information for 𝒢 regarding ω, I set Y^ω(v) = 0 for all inputs v.

The 𝒢 ⊗ ℱ-measurability requirement may look a bit mysterious, so let me elucidate it. Suppose for purposes of this paragraph that “information” received by the bookie may be non-veridical. We use 𝒢 ⊗ ℱ to model the implicit knowledge of an observer who learns what information was transmitted to the bookie and learns which events in ℱ occurred. Any such observer should be able in principle to determine whether the bookie’s selected bet has net gain ≤ a. She should acquire implicit knowledge whether the proposition ‘The bet selected by the bookie has net gain ≤ a’ is true. This proposition corresponds to the event

$$\{(\omega, v) : Y^{\omega}(v) \leq a\} = Y^{-1}[-\infty, a].$$

𝒢 ⊗ ℱ-measurability requires that each such event belong to 𝒢 ⊗ ℱ. Thus, 𝒢 ⊗ ℱ-measurability requires that an observer who knows what information the bookie received and which events in ℱ occurred is able to decide whether the bet selected by the bookie has net gain ≤ a or > a.

For any bookie strategy Y, there is a very important “diagonal function” Y*: Ω → ℝ̄ defined by

$$Y^*(\omega) =_{\mathrm{df}} Y^{\omega}(\omega).$$

See Figure 1. Y*(ω) is net gain in outcome ω for the bet dictated by bookie strategy Y in outcome ω. So Y* specifies the agent’s net gain if she accepts whatever bet the bookie offers at t2.
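On a finite model, a bookie strategy and its diagonal can be tabulated directly. The sketch assumes a four-outcome space with ℱ the power set and 𝒢 generated by two cells; `strategy`, `is_product_measurable`, and `diagonal` are hypothetical names. In this finite setting, 𝒢 ⊗ ℱ-measurability reduces to the condition that Y(ω, v) depend on ω only through the cell containing ω, which is what the check tests.

```python
from fractions import Fraction as Fr

# Hypothetical finite model (assumption): outcomes 0..3, F is the power set,
# and the bookie's sub-sigma-field is generated by the cells {0, 1} and {2, 3}.
OMEGA = [0, 1, 2, 3]
CELLS = [frozenset({0, 1}), frozenset({2, 3})]
A = frozenset({0, 2})

def cell_of(w):
    return next(c for c in CELLS if w in c)

def strategy(w, v):
    """Hypothetical bookie strategy Y(w, v): on learning cell {0, 1} the bookie
    offers the bet with net gain 3/10 - 1_A(v); on cell {2, 3} it offers nothing."""
    if cell_of(w) == frozenset({0, 1}):
        return Fr(3, 10) - (1 if v in A else 0)
    return Fr(0)

def is_product_measurable(Y):
    """On this finite model, measurability for the product sigma-field amounts
    to Y(w, v) depending on w only through the cell that contains w."""
    return all(Y(w1, v) == Y(w2, v)
               for c in CELLS for w1 in c for w2 in c for v in OMEGA)

def diagonal(Y):
    """Y*(w) = Y^w(w): net gain in w if the agent accepts whatever is offered."""
    return {w: Y(w, w) for w in OMEGA}

print(is_product_measurable(strategy), diagonal(strategy))
```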


We can now formalize Dutch bookability. A strong Dutch book for probability space (Ω, ℱ, P), sub-σ-field 𝒢 ⊆ ℱ, and update rule C is a pair (X, Y) such that

(a) X is a bet that is acceptable relative to P.

(b) Y is a bookie strategy.

(c) For every ω ∈ Ω: Y^ω is acceptable given ω → X(ω) + Y^ω(ω) < 0.

(d) For every ω ∈ Ω: Y^ω is not acceptable given ω → X(ω) < 0.

(a) requires that X have a nonnegative expectation at t1. This condition ensures that the agent will accept bet X at t1. Collectively, (a)-(d) ensure that a bookie who offers bet X and pursues bookie strategy Y will inflict a net loss in all outcomes. Note that (d) constrains net gain from X rather than net gain from X + Y^ω. Our betting agent will reject bet Y^ω in outcomes ω where Y^ω is unacceptable, so only X matters for computing net gain in such outcomes. See Figure 2.

A weak Dutch book for probability space (Ω, ℱ, P), sub-σ-field 𝒢 ⊆ ℱ, and update rule C is a pair (X, Y) such that

(a) X is a bet that is acceptable relative to P.

(b) Y is a bookie strategy.

(c) P{ω: Y^ω is acceptable given ω & X(ω) + Y^ω(ω) < 0} > 0.

(d) P{ω: Y^ω is acceptable given ω & X(ω) + Y^ω(ω) > 0} = 0.

(e) P{ω: Y^ω is not acceptable given ω & X(ω) > 0} = 0.

(a) ensures that the agent will happily accept bet X at t1. (c) ensures a positive probability that the agent accepts bet Y^ω at t2 and thereby incurs a net loss. (d) ensures that there is no compensating positive probability of net profit in outcomes ω where the agent accepts bet Y^ω. (e) ensures that there is no compensating positive probability of net profit in outcomes ω where the agent rejects bet Y^ω as unacceptable. (e) is very important, because without it (X, Y) need not comprise anything like a strategy for exploiting the agent. A strategy for exploiting an agent must have zero probability of rewarding the agent with net profit.
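The clauses of both definitions can be checked mechanically when Ω is finite. The following sketch is illustrative only: the model, the update rule, and the helper names (`is_weak_dutch_book`, `is_strong_dutch_book`) are assumptions, and clause (b) is taken for granted because the strategy is built to depend on ω only through its cell. Applied to the book from §5 on this model, it should report a weak but not a strong Dutch book, since net gain outside G is 0 rather than negative.

```python
from fractions import Fraction as Fr

# Hypothetical finite model (assumption): uniform prior on four outcomes and a
# sub-sigma-field generated by the cells {0, 1} and {2, 3}; F is the power set.
OMEGA = [0, 1, 2, 3]
P = {w: Fr(1, 4) for w in OMEGA}
CELLS = [frozenset({0, 1}), frozenset({2, 3})]
A = frozenset({0, 2})
# Hypothetical update rule mu_w = C( . | w), constant on cells.
RULE = {
    frozenset({0, 1}): {0: Fr(3, 10), 1: Fr(7, 10), 2: Fr(0), 3: Fr(0)},
    frozenset({2, 3}): {0: Fr(0), 1: Fr(0), 2: Fr(1, 2), 3: Fr(1, 2)},
}

def cell_of(w):
    return next(c for c in CELLS if w in c)

def expect(mu, f):
    """Expectation of a finite-valued bet f under a finite measure mu."""
    return sum(mu[v] * f(v) for v in OMEGA)

def acceptable_given(Y, w):
    return expect(RULE[cell_of(w)], lambda v: Y(w, v)) >= 0

def prob(pred):
    return sum(P[w] for w in OMEGA if pred(w))

def is_weak_dutch_book(X, Y):
    """Clauses (a), (c), (d), (e); clause (b) is assumed."""
    acc = {w: acceptable_given(Y, w) for w in OMEGA}
    total = {w: X(w) + Y(w, w) for w in OMEGA}
    return (expect(P, X) >= 0                                   # (a)
            and prob(lambda w: acc[w] and total[w] < 0) > 0     # (c)
            and prob(lambda w: acc[w] and total[w] > 0) == 0    # (d)
            and prob(lambda w: not acc[w] and X(w) > 0) == 0)   # (e)

def is_strong_dutch_book(X, Y):
    """Clauses (a), (c), (d) of the strong definition, checked at every outcome."""
    return expect(P, X) >= 0 and all(
        (X(w) + Y(w, w) < 0) if acceptable_given(Y, w) else (X(w) < 0)
        for w in OMEGA)

# The book from section 5 on this model: G = {0, 1}, and P_G(A | w) = 1/2 on G.
G = frozenset({0, 1})

def bet1(w):
    return ((1 if w in A else 0) - Fr(1, 2)) if w in G else Fr(0)

def strategy(w, v):
    return (Fr(3, 10) - (1 if v in A else 0)) if w in G else Fr(0)

print(is_weak_dutch_book(bet1, strategy), is_strong_dutch_book(bet1, strategy))
```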

§10 will show that a strong Dutch book is a weak Dutch book. When there exists a strong (weak) Dutch book for (Ω, ℱ, P), 𝒢, and C, say that C is strongly (weakly) Dutch bookable.

A weaker notion that sometimes figures in the literature is that of a semi-Dutch book. A semi-Dutch book is a set of acceptable bets with a possibility of net loss and no possibility of net gain. This is less demanding than the notion of a weak Dutch book, because the probability of net loss from a semi-Dutch book may be 0. A weak Dutch book offers a positive probability of net loss, not just a possibility of net loss. To illustrate, suppose that your credences violate regularity in the sense of §2, i.e. you assign credence 0 to a metaphysically possible proposition. Then you should happily pay price 1 for a wager that returns payoff 1 just in case the proposition is false: you lose 1 if the proposition is true, which by your lights has probability 0, and you break even otherwise. The resulting bet is a semi-Dutch book but not a weak Dutch book.
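The regularity example can be spelled out in a few lines. This is a hypothetical two-outcome sketch (the outcome labels and the credence assignment are assumptions) that computes the bet’s expectation together with the possibility and the probability of net loss.

```python
from fractions import Fraction as Fr

# Hypothetical two-outcome illustration (assumption): H is metaphysically
# possible but receives credence 0, so the credences violate regularity.
P = {"H": Fr(0), "not-H": Fr(1)}

def net_gain(w):
    """Pay price 1 for a wager that returns 1 just in case H is false."""
    return (1 if w == "not-H" else 0) - 1

expected = sum(P[w] * net_gain(w) for w in P)        # expectation of the bet
possible_loss = any(net_gain(w) < 0 for w in P)      # a loss occurs if H
possible_gain = any(net_gain(w) > 0 for w in P)
prob_loss = sum(P[w] for w in P if net_gain(w) < 0)
prob_gain = sum(P[w] for w in P if net_gain(w) > 0)

is_semi_dutch = expected >= 0 and possible_loss and not possible_gain
is_weak_dutch = expected >= 0 and prob_loss > 0 and prob_gain == 0
print(expected, is_semi_dutch, is_weak_dutch)
```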

Shimony deploys semi-Dutch bookability to defend regularity. I agree with Hájek (2009, 2012) that Shimony’s argument is not compelling. As Hájek (2009, pp. 188-189) urges, the mere possibility of net loss need not be worrisome if you are 100% confident that the possibility will not materialize. From your viewpoint, a semi-Dutch book that is not a weak Dutch book carries no risk of net loss. It seems rationally permissible for you to regard such a book as perfectly agreeable. You may be worried by a probability 0 possibility of net loss, but rationality does not require you to be worried. Thus, semi-Dutch bookability does not in itself suggest that any serious pragmatic or epistemic defect afflicts your credal reallocations.

Kolmogorov conditionalizers are often semi-Dutch bookable. Suppose you plan to update credal assignment P(A) using C(A | . ), a conditional probability for A given 𝒢. Suppose that there exists P_𝒢(A | . ), a conditional probability for A given 𝒢, such that C(A | . ) and P_𝒢(A | . ) disagree on outcomes belonging to some set G ∈ 𝒢. Then I can employ the strategy from §5. I can construct sequential fair bets that inflict upon you a net loss for outcomes in G and net gain 0 on outcomes outside G. The catch is that G itself must have probability 0, because any two conditional probabilities for A given 𝒢 agree P-almost everywhere. Your net loss therefore occurs only with probability 0. In that case, my strategy is a semi-Dutch book but not a weak Dutch book. As argued in the previous paragraph, vulnerability to semi-Dutch books does not suggest that Kolmogorov Conditionalization is pragmatically or epistemically problematic.

Dutch Book Theorem for Kolmogorov Conditionalization: Let (Ω, ℱ, P) be a probability space, let 𝒢 ⊆ ℱ be a sub-σ-field, and let C be an update rule for (Ω, ℱ, 𝒢). If C is not an rcd for P given 𝒢, then there exists a strong Dutch book for (Ω, ℱ, P), 𝒢, and C.

Proof: C(A | . ) must violate the integral formula for some A ∈ ℱ. Let P_𝒢(A | . ) be a conditional probability for A given 𝒢. As in §5, we may assume without loss of generality that G =df {ω: C(A | ω) < P_𝒢(A | ω)} has non-zero P-measure. We formalize the procedure from §5, supplemented with a sidebet on G at t1 to ensure a net loss when ω ∉ G. Define random variable X by

$$X(\omega) = \begin{cases} 1 - P_{\mathcal{G}}(A \mid \omega) & \text{if } \omega \in A \cap G \\ -P_{\mathcal{G}}(A \mid \omega) & \text{if } \omega \in A^c \cap G \\ 0 & \text{if } \omega \notin G \end{cases}$$

For ω ∈ G, define random variable Y^ω by

$$Y^{\omega}(v) = \begin{cases} C(A \mid \omega) - 1 & \text{if } v \in A \\ C(A \mid \omega) & \text{if } v \notin A \end{cases}$$

and for ω ∉ G define random variable Y^ω by

$$Y^{\omega}(v) = 0.$$

These definitions determine a bookie strategy Y(ω, v) =df Y^ω(v). X formalizes bet 1 from §5. Y formalizes the strategy: offer bet 2 if ω ∈ G; offer no bet if ω ∉ G. Let

$$L =_{\mathrm{df}} \int_G \big[C(A \mid \omega) - P_{\mathcal{G}}(A \mid \omega)\big]\, dP(\omega)$$

if this integral is finite. If the integral is infinite, then let L be any finite negative number. Either way, we have

$$\int_G \big[C(A \mid \omega) - P_{\mathcal{G}}(A \mid \omega)\big]\, dP(\omega) \leq L,$$

and L < 0, since the integrand is negative on G and P(G) > 0. Define random variable Z:

$$Z(\omega) = \begin{cases} (P(G) - 1)\big[C(A \mid \omega) - P_{\mathcal{G}}(A \mid \omega)\big] & \text{if } \omega \in G \\ L & \text{if } \omega \notin G \end{cases}$$

We show that (X + Z, Y) is a strong Dutch book for (Ω, ℱ, P), 𝒢, and C.


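§5 already showed that X is fair relative to P, but we now offer a somewhat more formal proof. For any set S, let I_S be the indicator function for S:

$$I_S(\omega) = \begin{cases} 1 & \text{if } \omega \in S \\ 0 & \text{if } \omega \notin S \end{cases}$$

For ω ∈ G, we have

$$X(\omega) = \big[1 - P_{\mathcal{G}}(A \mid \omega)\big] I_A(\omega) - P_{\mathcal{G}}(A \mid \omega)\, I_{A^c}(\omega) = I_A(\omega) - P_{\mathcal{G}}(A \mid \omega).$$

For any ω ∈ Ω, we have

$$X(\omega) = I_G(\omega)\big[I_A(\omega) - P_{\mathcal{G}}(A \mid \omega)\big],$$

so that

$$\begin{aligned}
E_P[X] &= \int I_G(\omega)\big[I_A(\omega) - P_{\mathcal{G}}(A \mid \omega)\big]\, dP(\omega) = \int I_G I_A \, dP - \int I_G(\omega)\, P_{\mathcal{G}}(A \mid \omega)\, dP(\omega) \\
&= \int I_{A \cap G}\, dP - \int_G P_{\mathcal{G}}(A \mid \omega)\, dP(\omega) = P(A \cap G) - \int_G P_{\mathcal{G}}(A \mid \omega)\, dP(\omega) = P(A \cap G) - P(A \cap G) = 0,
\end{aligned}$$

where the penultimate identity follows by the integral formula. To confirm that Z is acceptable relative to P, note that

$$Z(\omega) = I_G(\omega)\,(P(G) - 1)\big[C(A \mid \omega) - P_{\mathcal{G}}(A \mid \omega)\big] + I_{G^c}(\omega)\, L,$$

so that

$$\begin{aligned}
E_P[Z] &= \int \Big( I_G(\omega)(P(G) - 1)\big[C(A \mid \omega) - P_{\mathcal{G}}(A \mid \omega)\big] + I_{G^c}(\omega)\, L \Big)\, dP(\omega) \\
&= \int I_G(\omega)(P(G) - 1)\big[C(A \mid \omega) - P_{\mathcal{G}}(A \mid \omega)\big]\, dP(\omega) + \int I_{G^c}(\omega)\, L \, dP(\omega) \\
&= (P(G) - 1)\int_G \big[C(A \mid \omega) - P_{\mathcal{G}}(A \mid \omega)\big]\, dP(\omega) + L\, P(G^c) \\
&\geq (P(G) - 1)\, L + L\, P(G^c) = L\big(P(G) + P(G^c) - 1\big) = L(1 - 1) = 0.
\end{aligned}$$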

As for Y, we may write

$$Y^{\omega} = C(A \mid \omega) - I_A$$

for any ω ∈ G. Formalizing the reasoning from §5, we check that Y^ω is fair given ω for any ω ∈ G:

$$E_{\mu_\omega}[Y^{\omega}] = \int \big[C(A \mid \omega) - I_A\big]\, d\mu_\omega = \int C(A \mid \omega)\, d\mu_\omega - \int I_A \, d\mu_\omega = C(A \mid \omega)\int d\mu_\omega - \mu_\omega(A) = C(A \mid \omega) - C(A \mid \omega) = 0,$$

where ω in these equations is held fixed and is not an integration variable. Clearly, Y^ω is also fair given ω for any ω ∉ G.

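To complete the proof, we must show that X(ω) + Z(ω) + Y^ω(ω) < 0 for all ω ∈ Ω. If ω ∈ G, then X(ω) + Y^ω(ω) = [I_A(ω) − P_𝒢(A | ω)] + [C(A | ω) − I_A(ω)] = C(A | ω) − P_𝒢(A | ω), so that

$$X(\omega) + Z(\omega) + Y^{\omega}(\omega) = P(G)\big[C(A \mid \omega) - P_{\mathcal{G}}(A \mid \omega)\big] < 0.$$

If ω ∉ G, then

$$X(\omega) + Z(\omega) + Y^{\omega}(\omega) = L < 0.$$

Hence, (X + Z, Y) is a strong Dutch book for (Ω, ℱ, P), 𝒢, and C. ∎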
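The construction in the proof can be checked numerically on a hypothetical finite model (an assumption, not part of the proof). On this model the ratio formula gives P_𝒢(A | ω), so the integral defining L is finite and L is simply its value; the names `cond_A`, `c_A`, `X`, `Z`, and `Y` are illustrative.

```python
from fractions import Fraction as Fr

# Hypothetical finite model (assumption): uniform prior, cells {0,1} and {2,3},
# A = {0, 2}, and an update rule C( . | w) that is constant on cells.
OMEGA = [0, 1, 2, 3]
P = {w: Fr(1, 4) for w in OMEGA}
CELLS = [frozenset({0, 1}), frozenset({2, 3})]
A = frozenset({0, 2})
RULE = {
    frozenset({0, 1}): {0: Fr(3, 10), 1: Fr(7, 10), 2: Fr(0), 3: Fr(0)},
    frozenset({2, 3}): {0: Fr(0), 1: Fr(0), 2: Fr(1, 2), 3: Fr(1, 2)},
}

def cell_of(w):
    return next(c for c in CELLS if w in c)

def ind(S, w):
    return 1 if w in S else 0

def cond_A(w):
    """P_G(A | w) from the ratio formula on w's cell (an rcd on this model)."""
    c = cell_of(w)
    return sum(P[v] for v in c & A) / sum(P[v] for v in c)

def c_A(w):
    """C(A | w) under the hypothetical update rule."""
    return sum(RULE[cell_of(w)][v] for v in A)

G = frozenset(w for w in OMEGA if c_A(w) < cond_A(w))
prob_G = sum(P[w] for w in G)
L = sum(P[w] * (c_A(w) - cond_A(w)) for w in G)  # the integral over G; finite here

def X(w):
    return ind(G, w) * (ind(A, w) - cond_A(w))

def Z(w):
    return (prob_G - 1) * (c_A(w) - cond_A(w)) if w in G else L

def Y(w, v):
    return (c_A(w) - ind(A, v)) if w in G else Fr(0)

E_X = sum(P[w] * X(w) for w in OMEGA)
E_Z = sum(P[w] * Z(w) for w in OMEGA)
E_Y_given = {w: sum(RULE[cell_of(w)][v] * Y(w, v) for v in OMEGA) for w in OMEGA}
totals = {w: X(w) + Z(w) + Y(w, w) for w in OMEGA}
print(E_X, E_Z, totals)
```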

Skyrms (1992) suggests that a genuine Dutch book should contain bets that are favorable, not just acceptable. We can strengthen the foregoing theorem to accommodate Skyrms’s viewpoint. In particular, we can supplement all bets with a “sweetener” as follows. Define

$$S(\omega) =_{\mathrm{df}} \frac{P(G)\big[P_{\mathcal{G}}(A \mid \omega) - C(A \mid \omega)\big]}{4}.$$

Define a bookie strategy W(ω, v):

$$W(\omega, v) =_{\mathrm{df}} Y(\omega, v) + I_G(\omega)\big[I_G(v)\, S(\omega) + I_{G^c}(v)\big].$$

Consider the sweetened pair ((X + I_G S) + (Z + I_G S), W). One can show that X + I_G S and Z + I_G S are each favorable at t1 and that W^ω is favorable given ω for every ω ∈ G. One can also check that W^ω = 0 for any ω ∉ G, corresponding to a situation where no bet is offered at t2. So ((X + I_G S) + (Z + I_G S), W) models a gambling scenario where all proffered bets are favorable. Nevertheless, net payoff from the overall scenario is always < 0. Hence, agents who violate Kolmogorov Conditionalization are vulnerable to a set of favorable bets that inflict a sure loss.
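A numerical check of the sweetened book on the same kind of hypothetical model. The sketch assumes S(ω) = P(G)[P_𝒢(A | ω) − C(A | ω)]/4 and W(ω, v) = Y(ω, v) + I_G(ω)[I_G(v)S(ω) + I_{G^c}(v)]. Since the three sweeteners together add 3S(ω) on G, any factor below 1/3 in place of 1/4 would keep the total negative. The model and names are illustrative assumptions.

```python
from fractions import Fraction as Fr

# Hypothetical finite model (assumption): uniform prior, cells {0,1} and {2,3},
# A = {0, 2}, and an update rule that deviates from the rcd on {0, 1}.
OMEGA = [0, 1, 2, 3]
P = {w: Fr(1, 4) for w in OMEGA}
CELLS = [frozenset({0, 1}), frozenset({2, 3})]
A = frozenset({0, 2})
RULE = {
    frozenset({0, 1}): {0: Fr(3, 10), 1: Fr(7, 10), 2: Fr(0), 3: Fr(0)},
    frozenset({2, 3}): {0: Fr(0), 1: Fr(0), 2: Fr(1, 2), 3: Fr(1, 2)},
}

def cell_of(w):
    return next(c for c in CELLS if w in c)

def ind(S, w):
    return 1 if w in S else 0

def cond_A(w):  # P_G(A | w) via the ratio formula on w's cell
    c = cell_of(w)
    return sum(P[v] for v in c & A) / sum(P[v] for v in c)

def c_A(w):  # C(A | w) under the hypothetical rule
    return sum(RULE[cell_of(w)][v] for v in A)

G = frozenset(w for w in OMEGA if c_A(w) < cond_A(w))
prob_G = sum(P[w] for w in G)
L = sum(P[w] * (c_A(w) - cond_A(w)) for w in G)

def S(w):
    """Sweetener; the factor 1/4 is an assumed choice below 1/3."""
    return prob_G * (cond_A(w) - c_A(w)) / 4

def X_sweet(w):  # X + 1_G S
    return ind(G, w) * (ind(A, w) - cond_A(w) + S(w))

def Z_sweet(w):  # Z + 1_G S
    base = (prob_G - 1) * (c_A(w) - cond_A(w)) if w in G else L
    return base + ind(G, w) * S(w)

def W(w, v):
    """W(w, v) = Y(w, v) + 1_G(w) [1_G(v) S(w) + 1_{G^c}(v)]."""
    y = (c_A(w) - ind(A, v)) if w in G else Fr(0)
    return y + ind(G, w) * (ind(G, v) * S(w) + (1 - ind(G, v)))

E_X = sum(P[w] * X_sweet(w) for w in OMEGA)
E_Z = sum(P[w] * Z_sweet(w) for w in OMEGA)
E_W_given = {w: sum(RULE[cell_of(w)][v] * W(w, v) for v in OMEGA) for w in OMEGA}
totals = {w: X_sweet(w) + Z_sweet(w) + W(w, w) for w in OMEGA}
print(E_X, E_Z, totals)
```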

§7. A converse Dutch book theorem for Kolmogorov Conditionalization

This section proves that Kolmogorov conditionalizers are not Dutch bookable. The proof resembles Skyrms’s proof of the converse Dutch book theorem for Ratio Conditionalization. In both proofs, the basic idea is that a diachronic Dutch book for a conditionalizer could be converted into a synchronic book with impossible properties. Developing this idea requires much more mathematical machinery for the general case of Kolmogorov Conditionalization than for the special case of Ratio Conditionalization. I will first offer some heuristic remarks and then present a rigorous proof.

Suppose for reductio that you are a Kolmogorov conditionalizer and that there exists a weak diachronic Dutch book (X, Y) for your update rule. Let

Δ =df {ω: Y^ω is acceptable given ω}.

Now consider a bet Z* defined as follows:

ω ∈ Δ → Z*(ω) = Y^ω(ω)

ω ∉ Δ → Z*(ω) = 0.

Think of Z* as a conditional bet offered at t1:

You accept the bet Y^ω offered at t2 if that bet is acceptable; otherwise you decline.

See Figure 3. This conditional bet should be acceptable at t1, since it only commits you to betting in situations where you find the bet offered at t2 acceptable. Given that X and Z* are individually acceptable at t1, the combined bet X + Z* is also acceptable at t1. In other words:

(1) E_P[X + Z*] ≥ 0.

Since (X, Y) is a weak Dutch book, X(ω) + Y^ω(ω) < 0 with positive probability inside Δ. Thus:

(2) X + Z* < 0 with positive probability inside Δ.

Since (X, Y) is a weak Dutch book, X(ω) + Y^ω(ω) > 0 with probability 0 inside Δ. Thus:

(3) X + Z* > 0 with probability 0 inside Δ.

Since (X, Y) is a weak Dutch book, X > 0 with probability 0 outside Δ. Thus:

(4) X + Z* > 0 with probability 0 outside Δ.

(1)-(4) are mutually inconsistent: the negative values ensured by (2) find no counterbalancing positive values to generate the nonnegative expected value promised by (1). By contradiction, Kolmogorov conditionalizers are not weakly Dutch bookable.

This reasoning hinges upon the presupposition that Z* is acceptable at t1. The presupposition is plausible, but why should we believe it? In fact, the presupposition is not true for agents who violate Kolmogorov Conditionalization. Take the bookie strategy Y from the Dutch book theorem:

Offer bet 2 if ω ∈ G; offer no bet if ω ∉ G.

Y^ω is acceptable given ω for every ω, yet one can easily check that the corresponding bet Z*:

You accept bet 2 if ω ∈ G; you do not bet if ω ∉ G.

is unacceptable at t1: by the integral formula, its expectation at t1 is ∫_G [C(A | ω) − P_𝒢(A | ω)] dP(ω) < 0. However, I prove below that the presupposition is true for Kolmogorov conditionalizers.


The proof requires a crucial lemma. Suppose that P_𝒢 is an rcd for P given 𝒢. Let μ_ω( . ) = P_𝒢( . , ω). For any bookie strategy Z, consider the function E_Z: Ω → ℝ̄ defined by

$$E_Z(\omega) =_{\mathrm{df}} E_{\mu_\omega}[Z^{\omega}] = \int Z^{\omega} \, d\mu_\omega = \int Z(\omega, v)\, \mu_\omega(dv).$$

E_Z(ω) is the expected value a Kolmogorov conditionalizer computes at t2 for the bet dictated by bookie strategy Z. Thus, E_Z maps each outcome to the net gain that a Kolmogorov conditionalizer would expect at t2 if she accepted the bet offered at t2. The lemma basically says that averaging together these expected net gains yields the same result as computing the net gain our agent would expect at t1 if she resolved to accept the bet offered at t2. More carefully: if the diagonal function Z* has an expectation with respect to initial credences P, then that expectation equals the expectation of E_Z with respect to P.

Lemma: Let (Ω, ℱ, P) be a probability space and 𝒢 ⊆ ℱ a sub-σ-field. Suppose there exists P_𝒢, an rcd for P given 𝒢, and let μ_ω( . ) = P_𝒢( . , ω). Let Z: Ω × Ω → ℝ̄ be 𝒢 ⊗ ℱ-measurable. Define E_Z(ω) =df E_{μ_ω}[Z^ω], which may be infinite or undefined for certain ω ∈ Ω. Then E_Z is 𝒢-measurable. Let Z* be the diagonal function defined by Z*(ω) = Z^ω(ω). If E_P[Z*] exists, then E_Z is defined for P-almost all values and E_P[Z*] = E_P[E_Z].

I prove the lemma in a mathematical appendix (§10).
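In the finite case the lemma reduces to a rearrangement of a double sum, which the sketch below checks with exact arithmetic. It is a sanity check under stated assumptions, not a proof: the six-outcome model, the cell partition, and the helper names are hypothetical. It also shows that the identity can fail when credences are updated by a rule that is not an rcd (here, `uniform_in_cell`).

```python
import random
from fractions import Fraction as Fr

# Hypothetical finite model (assumption): a non-uniform prior on six outcomes
# and a sub-sigma-field generated by three cells of positive probability.
OMEGA = list(range(6))
P = {0: Fr(1, 12), 1: Fr(3, 12), 2: Fr(2, 12), 3: Fr(2, 12), 4: Fr(1, 12), 5: Fr(3, 12)}
CELLS = [frozenset({0, 1}), frozenset({2, 3, 4}), frozenset({5})]

def cell_of(w):
    return next(c for c in CELLS if w in c)

def rcd(w):
    """mu_w = P( . | cell of w), an rcd for P given the cell sigma-field."""
    c = cell_of(w)
    total = sum(P[v] for v in c)
    return {v: (P[v] / total if v in c else Fr(0)) for v in OMEGA}

def uniform_in_cell(w):
    """A hypothetical update rule that is not an rcd here (the prior is not
    uniform within cells)."""
    c = cell_of(w)
    return {v: (Fr(1, len(c)) if v in c else Fr(0)) for v in OMEGA}

def random_strategy(seed):
    """A random bookie strategy Z(w, v) that depends on w only through its
    cell, hence measurable for the product sigma-field on this model."""
    rng = random.Random(seed)
    table = {(c, v): Fr(rng.randint(-5, 5)) for c in CELLS for v in OMEGA}
    return lambda w, v: table[(cell_of(w), v)]

def lemma_sides(Z, update):
    """Return (E_P[Z*], E_P[E_Z]) with E_Z(w) = sum over v of update(w)[v] Z(w, v)."""
    diag = sum(P[w] * Z(w, w) for w in OMEGA)
    averaged = sum(P[w] * sum(update(w)[v] * Z(w, v) for v in OMEGA) for w in OMEGA)
    return diag, averaged

for seed in range(5):
    lhs, rhs = lemma_sides(random_strategy(seed), rcd)
    assert lhs == rhs  # the identity holds when updating by the rcd

def indicator_of_0(w, v):
    return Fr(1 if v == 0 else 0)

print(lemma_sides(indicator_of_0, rcd), lemma_sides(indicator_of_0, uniform_in_cell))
```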

The Dutch book theorem generates a conflict between present and future computations of expected value. The bookie strategy Y:

Offer bet 2 if ω ∈ G; offer no bet if ω ∉ G.

yields a bet with nonnegative expectation as computed at t2 or else yields a non-bet with net gain 0 for all outcomes. So E_{μ_ω}[Y^ω] ≥ 0 for every ω, where we have set μ_ω( . ) = C( . | ω). The expected value at t1 of those nonnegative expectations, E_P[E_Y], is nonnegative as well. Nevertheless, the conditional bet Z*:

You accept bet 2 if ω ∈ G; you do not bet if ω ∉ G.

has negative expectation when computed at t1. Our lemma shows that no such conflict can arise for Kolmogorov conditionalizers. If E_P[Z*] exists, then it must equal E_P[E_Z]. If Z is a bookie strategy that always yields acceptable bets at t2, and if E_P[Z*] exists, then Z* must already be acceptable at t1. As I now show, this harmony between present and future expectations immunizes Kolmogorov conditionalizers against diachronic Dutch books.

Converse Dutch Book Theorem for Kolmogorov Conditionalization: Let (Ω, ℱ, P) be a probability space, let 𝒢 ⊆ ℱ be a sub-σ-field, and let C be an update rule for (Ω, ℱ, 𝒢). If C is an rcd for P given 𝒢, then there does not exist a weak Dutch book for (Ω, ℱ, P), 𝒢, and C.

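Proof: Suppose for reductio that there is a weak Dutch book (X, Y) for (Ω, ℱ, P), 𝒢, and C. Let μ_ω( . ) =df C( . | ω), and let

$$\Delta =_{\mathrm{df}} \{\omega : Y^{\omega} \text{ is acceptable given } \omega\} = \{\omega : E_Y(\omega) \geq 0\}.$$

The lemma shows that E_Y is 𝒢-measurable, from which it follows that Δ ∈ 𝒢. Define bookie strategy Z by

$$Z(\omega, \cdot\,) =_{\mathrm{df}} \begin{cases} Y(\omega, \cdot\,) & \text{if } \omega \in \Delta \\ 0 & \text{if } \omega \notin \Delta \end{cases}$$

Since Δ ∈ 𝒢, Z is 𝒢 ⊗ ℱ-measurable. Partition Δ as follows:

$$\begin{aligned}
\Delta^- &=_{\mathrm{df}} \Delta \cap \{\omega : X(\omega) + Z^*(\omega) < 0\} = \Delta \cap \{\omega : X(\omega) + Y^{\omega}(\omega) < 0\} \\
\Delta^0 &=_{\mathrm{df}} \Delta \cap \{\omega : X(\omega) + Z^*(\omega) = 0\} = \Delta \cap \{\omega : X(\omega) + Y^{\omega}(\omega) = 0\} \\
\Delta^+ &=_{\mathrm{df}} \Delta \cap \{\omega : X(\omega) + Z^*(\omega) > 0\} = \Delta \cap \{\omega : X(\omega) + Y^{\omega}(\omega) > 0\}.
\end{aligned}$$

Since (X, Y) is a weak Dutch book, P(Δ^−) > 0 and P(Δ^+) = 0. We therefore have

$$\int_{\Delta} (X + Z^*)\, dP = \int_{\Delta^-} (X + Z^*)\, dP + \int_{\Delta^0} (X + Z^*)\, dP + \int_{\Delta^+} (X + Z^*)\, dP < 0 + 0 + 0 = 0.$$

By similar reasoning,

$$\int_{\Delta^c} (X + Z^*)\, dP \leq 0.$$

Therefore,

$$\int (X + Z^*)\, dP = \int_{\Delta} (X + Z^*)\, dP + \int_{\Delta^c} (X + Z^*)\, dP < 0,$$

so that E_P[X + Z*] < 0.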

We now apply the lemma to Z* so as to derive a conflicting value for E_P[X + Z*]. We must first check that E_P[Z*] exists. For any random variable V, define its positive and negative parts

$$V^+(\omega) =_{\mathrm{df}} \begin{cases} V(\omega) & \text{if } V(\omega) \geq 0 \\ 0 & \text{if } V(\omega) < 0 \end{cases} \qquad V^-(\omega) =_{\mathrm{df}} \begin{cases} -V(\omega) & \text{if } V(\omega) < 0 \\ 0 & \text{if } V(\omega) \geq 0 \end{cases}$$

and note that, for any random variables V and W, since W = (V + W) − V ≤ (V + W)^+ + V^−,

$$W^+ \leq (V + W)^+ + V^-.$$

All these functions are nonnegative, so

$$0 \leq E_P[W^+] \leq E_P[(V + W)^+] + E_P[V^-].$$

In particular,

$$0 \leq E_P[(Z^*)^+] \leq E_P[(X + Z^*)^+] + E_P[X^-].$$

We have assumed that X is acceptable, so that E_P[X] ≥ 0. Thus, E_P[X^−] must certainly be finite. We have also just shown that E_P[X + Z*] < 0, so that E_P[(X + Z*)^+] must likewise be finite. It follows that E_P[(Z*)^+] is finite. Hence, E_P[Z*] exists. Applying the lemma,

$$E_P[Z^*] = E_P[E_Z].$$

We have chosen Z so that

$$E_Z(\omega) = \int Z^{\omega} \, d\mu_\omega \geq 0 \quad \text{for all } \omega \in \Omega.$$

The integral of a nonnegative function is nonnegative, so

$$E_P[Z^*] = E_P[E_Z] \geq 0.$$

Since E_P[X] ≥ 0, we conclude that

$$E_P[X + Z^*] = E_P[X] + E_P[Z^*] \geq 0,$$

which contradicts our earlier finding that E_P[X + Z*] < 0. By reductio, there is no weak Dutch book for (Ω, ℱ, P), 𝒢, and C. ∎
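A brute-force search on a tiny hypothetical model illustrates both theorems at once. The sketch (model, grid of bet values, and names are all assumptions) enumerates every bet X and every cell-measurable strategy Y with entries in {−1, 0, 1} and tests clauses (a), (c), (d), and (e). Expectations are computed with unnormalized weights, which preserves their signs. It should find no weak Dutch book against the rcd, as the converse theorem predicts, but should find one against an update rule that is uniform within cells. Being a search over a finite grid, it is an illustration rather than a proof.

```python
from itertools import product

# Hypothetical three-outcome model (assumption): prior weights proportional
# to (1, 2, 1); the sub-sigma-field is generated by the cells {0, 1} and {2}.
OMEGA = (0, 1, 2)
PRIOR = (1, 2, 1)            # P up to the positive factor 1/4
CELLS = ((0, 1), (2,))
CELL_OF = (0, 0, 1)          # index of the cell containing each outcome

def rcd_weights(w):
    """Unnormalized P( . | cell of w); positive rescaling keeps expectation signs."""
    cell = CELLS[CELL_OF[w]]
    return tuple(PRIOR[v] if v in cell else 0 for v in OMEGA)

def uniform_weights(w):
    """A hypothetical update rule that is not an rcd: uniform within the cell."""
    cell = CELLS[CELL_OF[w]]
    return tuple(1 if v in cell else 0 for v in OMEGA)

def dot(weights, values):
    return sum(a * b for a, b in zip(weights, values))

def weak_dutch_books(update, grid=(-1, 0, 1)):
    """Yield every (X, Y) on the grid meeting clauses (a), (c), (d), (e)."""
    n = len(OMEGA)
    bets = [X for X in product(grid, repeat=n) if dot(PRIOR, X) >= 0]  # (a)
    for flat in product(grid, repeat=len(CELLS) * n):
        # Row k of the table is the bet offered on learning cell k.
        Y = [flat[k * n:(k + 1) * n] for k in range(len(CELLS))]
        offered = [Y[CELL_OF[w]] for w in OMEGA]
        acc = [dot(update(w), offered[w]) >= 0 for w in OMEGA]
        for X in bets:
            total = [X[w] + offered[w][w] for w in OMEGA]
            c = any(acc[w] and total[w] < 0 and PRIOR[w] > 0 for w in OMEGA)
            d = not any(acc[w] and total[w] > 0 and PRIOR[w] > 0 for w in OMEGA)
            e = not any(not acc[w] and X[w] > 0 and PRIOR[w] > 0 for w in OMEGA)
            if c and d and e:
                yield X, Y

print(next(weak_dutch_books(rcd_weights), None))
print(next(weak_dutch_books(uniform_weights), None))
```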

§8. Significance of the two theorems

The Dutch book theorem and converse Dutch book theorem show that Kolmogorov’s theory delineates conditional probabilities with uniquely desirable pragmatic properties. It is good to avoid Dutch books. Thus, it is good when credal reallocation is invulnerable to Dutch books. The theorems establish Kolmogorov Conditionalization as the sole credal reallocation policy that achieves the desired invulnerability in Kolmogorov learning scenarios where rcds exist. The forbidding mathematics of Kolmogorov’s theory should not distract us from the fact that it codifies fundamental ties between conditional probability, credal reallocation, and decision-making under uncertainty.

To explore the significance of the theorems, let us revisit two worries about Kolmogorov’s approach raised in §4: non-existence of rcds and residual indeterminacy after specifying a conditioning sub-σ-field.

§8.1 Non-existence of rcds

Let (Ω, ℱ, P) be a probability space and 𝒢 ⊆ ℱ a sub-σ-field such that there exists no rcd for P given 𝒢. Suppose that you have initial credences P over ℱ and then gain new evidence exhausted by full membership knowledge for 𝒢.¹⁰ How should you update your credences over events in ℱ? Kolmogorov Conditionalization does not say. It remains silent about learning scenarios where rcds do not exist. We can now identify a good reason for this silence: the Dutch book theorem shows that all options are problematic.

Suppose that C(A | ω) is the credence you would assign to A ∈ ℱ upon gaining full membership knowledge for 𝒢 regarding outcome ω. As in §5, we may assume that

C(A | . ) is 𝒢-measurable for all A ∈ ℱ.

Suppose that C(A | . ) violates the integral formula for some A ∈ ℱ. Then the Dutch book theorem shows that I can rig a strong Dutch book against you. Thus, you are strongly Dutch bookable if you do not employ some conditional probability P_𝒢(A | . ) as your policy for reassigning credence to A. The problem is that, if you do adopt such a policy for each A ∈ ℱ, then your credences at t2 will not always constitute a probability measure. You will violate countable additivity for certain outcomes ω.

¹⁰ I assume for the sake of argument that (Ω, ℱ, P) models a possible credal allocation. It is not obvious that this assumption is correct, because the usual examples where rcds do not exist involve a σ-field ℱ defined non-constructively through the Axiom of Choice.

It is controversial whether credences should be countably additive. As mentioned in §1, I

wish to side-step these controversies. I assume that credences should be countably additive, at

least for idealized agents who figure in Kolmogorov learning scenarios. Under that assumption,

the Dutch book theorem shows that a learning scenario modeled by (Ω, ℱ, P) and 𝒢 is

problematic. Anyone in such a learning scenario must either succumb to a strong Dutch book or

else violate countable additivity.

In my opinion, the most important moral here is that you should avoid these problematic

learning scenarios. By choosing a sufficiently inauspicious probability space as your starting

point, you set yourself a rational dilemma between strong Dutch bookability and countable

additivity violations. To avoid the dilemma, you should refrain from having credences modeled

by a pathological probability space that does not admit rcds. You should adopt a policy of

maintaining a well-behaved credal allocation that admits rcds. This policy implicitly guides all

serious empirical applications of Bayesian modeling within statistics, economics, robotics,

cognitive science, and so on. The policy is easy to implement, because naturally arising

probability spaces always admit rcds.

§8.2 Residual indeterminacy

Kolmogorov does not delineate unique conditional probabilities even after one fixes a

conditioning sub-σ-field. When 𝒢 contains a non-empty event of probability zero, P_𝒢(A | ·) is

uniquely determined only up to sets of measure 0. Nor can Dutch books help pin down conditional

probabilities more determinately: the converse Dutch book theorem shows that updating in

accord with any rcd P_𝒢 avoids a diachronic Dutch book. That is why I formulated Kolmogorov

Conditionalization as an indeterministic constraint on credal reallocation rather than a

deterministic instruction that yields unique reallocated credences.
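A finite toy model makes this measure-zero indeterminacy concrete. The sketch below is a hypothetical illustration, not a construction from the paper: Ω has five points, outcome 4 has probability 0, ℱ is the power set, and 𝒢 is generated by the partition {{0, 1}, {2, 3}, {4}}, so {4} is a non-empty null event in 𝒢. The helper names (make_version, is_rcd, and so on) are invented for the example. In a finite space, 𝒢-measurability just means constancy on cells, and on positive-probability cells the integral formula forces elementary conditioning; on the null cell any probability measure passes. The two versions below disagree about A = {0, 2} at ω = 4, yet both pass the same checks.

```python
from fractions import Fraction as Fr
from itertools import chain, combinations

# Hypothetical finite model: outcome 4 has probability 0.
OMEGA = [0, 1, 2, 3, 4]
P = {0: Fr(1, 4), 1: Fr(1, 4), 2: Fr(1, 4), 3: Fr(1, 4), 4: Fr(0)}
# Partition generating the sub-sigma-field G; {4} is a non-empty null event in G.
CELLS = [frozenset({0, 1}), frozenset({2, 3}), frozenset({4})]


def subsets(xs):
    """All subsets of xs, as frozensets."""
    xs = list(xs)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))]


def prob(event):
    return sum(P[w] for w in event)


def cell_of(w):
    return next(c for c in CELLS if w in c)


# Every member of G is a union of cells.
G_EVENTS = [frozenset().union(*(CELLS[i] for i in idx))
            for idx in subsets(range(len(CELLS)))]


def make_version(null_measure):
    """C(. | w): elementary conditioning on positive cells, null_measure on null cells."""
    def C(w):
        cell = cell_of(w)
        pc = prob(cell)
        if pc == 0:
            return null_measure
        return {v: (P[v] / pc if v in cell else Fr(0)) for v in OMEGA}
    return C


def cond_prob(C, event, w):
    return sum(C(w)[v] for v in event)


def is_rcd(C):
    """Check: each C(. | w) is a probability measure, each C(event | .) is constant
    on cells (G-measurable), and the integral formula holds for all events and all G."""
    for w in OMEGA:
        m = C(w)
        if sum(m.values()) != 1 or min(m.values()) < 0:
            return False
    for event in subsets(OMEGA):
        if any(len({cond_prob(C, event, w) for w in cell}) != 1 for cell in CELLS):
            return False
        for G in G_EVENTS:
            if sum(cond_prob(C, event, w) * P[w] for w in G) != prob(event & G):
                return False
    return True


def point_mass(x):
    return {v: Fr(1 if v == x else 0) for v in OMEGA}


C1 = make_version(point_mass(4))
C2 = make_version(point_mass(0))
A = frozenset({0, 2})
print(is_rcd(C1), is_rcd(C2))                    # True True
print(cond_prob(C1, A, 4), cond_prob(C2, A, 4))  # 0 1
```

Exact rational arithmetic (fractions.Fraction) keeps the integral-formula checks free of rounding error, so the equality tests are genuine.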

We see here a foundational basis for the indeterminacy in Kolmogorov’s theory.

Kolmogorov specifies conditional probabilities as uniquely as Dutch book considerations allow.

If the only constraint on rational credal reallocation is that one avoid Dutch books, then

Kolmogorov Conditionalization is the most determinate norm we can expect. If the only

constraint on conditional probability is that it subserve credal reallocations that avoid Dutch

books, then Kolmogorov pins down conditional probabilities as determinately as possible.

Would you like more determinate conditional probabilities than Kolmogorov provides?

Then you must look beyond Dutch books. You must examine the broader role that conditional

probability plays within our cognitive lives. In principle, one might try to motivate more

determinate conditional probabilities through either pragmatic or epistemic considerations.

However, I doubt that pragmatic factors can pin down conditional probabilities more

determinately than Kolmogorov’s theory. Rational decision-making compares expected values,

and expected values obliterate differences among alternative rcds for P given 𝒢. Thus, I doubt

that we can render Kolmogorov’s theory any more determinate by examining how conditional

probability figures in rational decision-making. Whether epistemic factors can generate more

determinacy is a question worth further exploration.
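The claim that expected values obliterate differences among rcds can be checked in a small hypothetical model of the same kind. Here maximizing expected payoff stands in for rational decision-making, and the act names and payoffs in PAYOFF are made up for illustration. Two versions of P_𝒢(A | ·) that differ only at the null outcome ω = 4 recommend different acts there, but the prior expected payoff of the two resulting policies is identical, because the disagreement is weighted by probability 0.

```python
from fractions import Fraction as Fr

# Hypothetical five-point model; outcome 4 has probability 0.
OMEGA = range(5)
P = [Fr(1, 4)] * 4 + [Fr(0)]
CELLS = [{0, 1}, {2, 3}, {4}]          # partition generating G
A = {0, 2}
# Hypothetical payoffs: (payoff if A occurs, payoff if A fails).
PAYOFF = {"bet_on_A": (2, -1), "abstain": (0, 0)}


def cell_of(w):
    return next(c for c in CELLS if w in c)


def version(null_value):
    """A version of P_G(A | .): elementary conditioning on positive cells,
    the arbitrary value null_value on the null cell {4}."""
    def C_A(w):
        cell = cell_of(w)
        pc = sum(P[v] for v in cell)
        return null_value if pc == 0 else sum(P[v] for v in cell & A) / pc
    return C_A


def policy(C_A):
    """At each w, choose the act with the highest expected payoff under credence C_A(w)."""
    def choose(w):
        q = C_A(w)
        eu = {act: q * win + (1 - q) * lose for act, (win, lose) in PAYOFF.items()}
        return max(eu, key=eu.get)
    return choose


def ex_ante_value(choose):
    """Prior expected payoff of following the policy."""
    return sum(P[w] * PAYOFF[choose(w)][0 if w in A else 1] for w in OMEGA)


choose1, choose2 = policy(version(Fr(0))), policy(version(Fr(1)))
print(choose1(4), choose2(4))                          # abstain bet_on_A
print(ex_ante_value(choose1), ex_ante_value(choose2))  # 1/2 1/2
```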

§9. Conclusion

Kolmogorov’s theory of conditional probability is acclaimed by mathematicians and

neglected by philosophers. The mathematicians are right. Kolmogorov’s theory brilliantly

exemplifies how formal mathematics can elucidate core philosophical concepts. It deserves

mention in the same breath with Turing’s analysis of computability and Tarski’s analyses of truth

and logical consequence. At the very least, Kolmogorov articulates a systematic framework that

codifies how conditional probability, credal allocation, and decision-making interact in numerous

important learning scenarios. I hope that the theorems proved above will promote wider

appreciation of rcds as invaluable analytical tools.

§10. Mathematical appendix

This appendix proves the lemma from §7. The lemma follows from the conditional

Fubini theorem (Fristedt and Gray, 1997, p. 431). I have thought it best to prove the lemma

directly, partly because doing so takes only a little more space than proving that the conditional

Fubini theorem entails the lemma, partly because a self-contained proof of the conditional Fubini

theorem does not seem to be readily accessible anywhere in the literature. Throughout my

discussion, I employ the conventions of (Fristedt and Gray, 1997, p. 48, p. 445) regarding

partially defined functions and almost surely defined random variables.

Proof of the lemma: We will first prove the lemma for the special case where Z is everywhere

nonnegative, then prove it for general Z. Let us begin by transforming ∫ Z* dP into a more useful

form. Let T: Ω → Ω × Ω be the “diagonal embedding”

$$T(\omega) = (\omega, \omega),$$

which is a measurable function from (Ω, ℱ) to (Ω × Ω, 𝒢⊗ℱ). T induces a measure P* on 𝒢⊗ℱ:

$$P^* =_{df} PT^{-1}.$$

For any E ∈ 𝒢⊗ℱ,

$$P^*(E) = PT^{-1}(E) = P\{\omega : (\omega, \omega) \in E\}.$$

For any nonnegative Z,

$$\int_\Omega Z^*\,dP = \int_\Omega Z\circ T\,dP = \int_{\Omega\times\Omega} Z\,d(PT^{-1}) = \int_{\Omega\times\Omega} Z\,dP^*,$$

where the second identity follows by change of variable (Billingsley, 1995, p. 216). We will

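prove that

$$\int Z\,dP^* = \int E_\mu Z\,dP = \int_\Omega\left(\int_\Omega Z_\omega\,d\mu_\omega\right)dP(\omega) \tag{5}$$

for nonnegative Z, which entails that for all such Z

$$\int Z^*\,dP = \int Z\,dP^* = \int E_\mu Z\,dP = \int_\Omega\left(\int_\Omega Z_\omega\,d\mu_\omega\right)dP(\omega).$$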

Following a common strategy from probability theory, we first prove (5) for indicator functions

and then build our way up to arbitrary nonnegative Z.
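Before the general argument, it may help to see the identities in (5) hold in a finite case, where every integral is a finite sum. In the hypothetical model below, ℱ is the power set of Ω = {0, 1, 2, 3}, 𝒢 is generated by {{0, 1}, {2, 3}}, and Z depends on its first argument only through the cell containing it, which is what 𝒢⊗ℱ-measurability amounts to in this setting. The sketch computes ∫ Z* dP directly, ∫ Z dP* via the diagonal pushforward P* = PT⁻¹, and ∫ E_μ Z dP for two update rules. All function names are invented for the example. Conditioning on cells satisfies the integral formula and all three quantities agree; the uniform-on-cell rule is still 𝒢-measurable but violates the integral formula for this P, and the last identity fails, which shows that the integral formula is doing real work in the proof.

```python
from fractions import Fraction as Fr

# Hypothetical finite model; F is taken to be all subsets of OMEGA.
OMEGA = range(4)
P = [Fr(1, 8), Fr(3, 8), Fr(1, 4), Fr(1, 4)]
CELLS = [{0, 1}, {2, 3}]                 # partition generating G


def cell_index(w):
    return next(i for i, c in enumerate(CELLS) if w in c)


def mu_conditioning(w):
    """mu_w = P(. | cell of w); this update rule satisfies the integral formula."""
    cell = CELLS[cell_index(w)]
    pc = sum(P[v] for v in cell)
    return {e: (P[e] / pc if e in cell else Fr(0)) for e in OMEGA}


def mu_uniform(w):
    """Uniform on the cell of w: G-measurable, but violates the integral formula for this P."""
    cell = CELLS[cell_index(w)]
    return {e: (Fr(1, len(cell)) if e in cell else Fr(0)) for e in OMEGA}


def Z(w, e):
    """Nonnegative; depends on w only through its cell, so it is G(x)F-measurable here."""
    return (cell_index(w) + 1) * (e + 1)


def integral_Z_star():
    """Integral of Z*(w) = Z(w, w) = (Z o T)(w) with respect to P."""
    return sum(P[w] * Z(w, w) for w in OMEGA)


def integral_wrt_pushforward():
    """Integral of Z with respect to P* = P T^{-1}, which lives on the diagonal."""
    p_star = {(w, e): (P[w] if w == e else Fr(0)) for w in OMEGA for e in OMEGA}
    return sum(Z(w, e) * p for (w, e), p in p_star.items())


def integral_E_mu_Z(mu):
    """Integral over w of E_mu Z(w) = sum over e of Z(w, e) * mu_w({e})."""
    return sum(P[w] * sum(Z(w, e) * m for e, m in mu(w).items()) for w in OMEGA)


print(integral_Z_star(), integral_wrt_pushforward(), integral_E_mu_Z(mu_conditioning))
# 35/8 35/8 35/8
print(integral_E_mu_Z(mu_uniform))  # 17/4
```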

Take any measurable rectangle G × F with G ∈ 𝒢 and F ∈ ℱ, and let I_{G×F} be the

corresponding indicator function. For this special case, (5) becomes

$$\int I_{G\times F}\,dP^* = \int_\Omega\left(\int_\Omega (I_{G\times F})_\omega\,d\mu_\omega\right)dP(\omega).$$

For the left-hand side, note that by change of variable

$$\int I_{G\times F}\,dP^* = \int I_{G\times F}\,d(PT^{-1}) = \int I_{G\times F}\circ T\,dP = \int I_G I_F\,dP = P(G\cap F).$$

For the right-hand side, note that (I_{G×F})_ω is I_F for all ω ∈ G and is 0 otherwise. Thus, the

right-hand side reduces to

$$\int_\Omega\left(\int_\Omega (I_{G\times F})_\omega\,d\mu_\omega\right)dP(\omega) = \int_G\left(\int_\Omega I_F\,d\mu_\omega\right)dP(\omega) = \int_G \mu_\omega(F)\,dP(\omega) = P(G\cap F),$$

where the last identity follows by the integral formula. Our analysis also shows that

$$E_\mu I_{G\times F}(\omega) = \begin{cases} \mu_\omega(F) & \text{if } \omega \in G \\ 0 & \text{if } \omega \notin G \end{cases}$$

which is a well-defined 𝒢-measurable function of ω. Thus, the lemma is established for the

special case of I_{G×F}.

Now consider an arbitrary indicator variable I_E, with E ∈ 𝒢⊗ℱ. Take the class M

containing all E ∈ 𝒢⊗ℱ such that ∫ (I_E)_ω dμ_ω is a 𝒢-measurable function of ω and

$$\int I_E\,dP^* = \int_\Omega\left(\int_\Omega (I_E)_\omega\,d\mu_\omega\right)dP(\omega).$$

One can show that M is closed under complementation and countable disjoint union. I address

countable disjoint union, leaving complementation to the reader. Suppose that E is the union of

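countably many sets E_n, where these sets are pairwise disjoint and where (5) holds of each

indicator function I_{E_n}. Then

$$\begin{aligned} \int I_E\,dP^* &= \int \sum_{n=1}^{\infty} I_{E_n}\,dP^* = \sum_{n=1}^{\infty} \int I_{E_n}\,dP^* = \sum_{n=1}^{\infty} \int_\Omega\left(\int_\Omega (I_{E_n})_\omega\,d\mu_\omega\right)dP(\omega) \\ &= \int_\Omega\left(\sum_{n=1}^{\infty} \int_\Omega (I_{E_n})_\omega\,d\mu_\omega\right)dP(\omega) = \int_\Omega\left(\int_\Omega (I_E)_\omega\,d\mu_\omega\right)dP(\omega), \end{aligned}$$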

where we have repeatedly used an infinite series version of the monotone convergence theorem

(Billingsley, 1995, p. 211). Thus, (5) holds of I_E. To establish 𝒢-measurability, one may write

$$\int_\Omega (I_E)_\omega\,d\mu_\omega = \int_\Omega \sum_{n=1}^{\infty} (I_{E_n})_\omega\,d\mu_\omega = \sum_{n=1}^{\infty} \int_\Omega (I_{E_n})_\omega\,d\mu_\omega$$

and use that the limit of 𝒢-measurable functions is itself 𝒢-measurable if that limit exists

everywhere (Billingsley, 1995, p. 184). Since M contains all measurable rectangles G × F with

G ∈ 𝒢 and F ∈ ℱ and is closed under complementation and countable disjoint union, it follows that

M contains all members of 𝒢⊗ℱ (Billingsley, 1995, pp. 41-42). This proves the lemma for

arbitrary indicator variables I_E.

Using the linearity of integration, one can extend the lemma to any nonnegative

measurable simple function, i.e. any finite linear combination

$$\sum_{i=1}^{k} c_i I_{E_i}$$

such that E_i ∈ 𝒢⊗ℱ, c_i ≥ 0, and the sets E_i form a partition of Ω × Ω. Given a nonnegative

𝒢⊗ℱ-measurable Z, there is a sequence {Z_n} of nonnegative measurable simple functions such that

$$Z(\omega, \eta) = \lim_{n\to\infty} Z_n(\omega, \eta),$$

and such that the sequence {Z_n(ω, η)} is non-decreasing, for each ω, η (Billingsley, 1995, p. 185).

By the monotone convergence theorem (Billingsley, 1995, p. 208),

$$\begin{aligned} \int Z\,dP^* &= \int \lim_{n\to\infty} Z_n\,dP^* = \lim_{n\to\infty} \int Z_n\,dP^* = \lim_{n\to\infty} \int_\Omega\left(\int_\Omega Z_n(\omega, \eta)\,\mu_\omega(d\eta)\right)dP(\omega) \\ &= \int_\Omega\left(\int_\Omega \lim_{n\to\infty} Z_n(\omega, \eta)\,\mu_\omega(d\eta)\right)dP(\omega) = \int_\Omega\left(\int_\Omega Z(\omega, \eta)\,\mu_\omega(d\eta)\right)dP(\omega), \end{aligned}$$

which proves the lemma for arbitrary nonnegative 𝒢⊗ℱ-measurable Z.
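The approximating sequence {Z_n} can be made concrete. The sketch assumes a standard dyadic construction (round down to a multiple of 2⁻ⁿ, then cap at n); the text only cites Billingsley for the existence of such a sequence, so treat this as one admissible choice rather than the one the proof relies on. Each Z_n takes finitely many values, Z_n is non-decreasing in n at every point, and Z_n converges to Z pointwise (with Z_n = n where Z = ∞). The second half applies it to a made-up four-point distribution and shows the integrals of the Z_n increasing toward the integral of Z, which is the behaviour the monotone convergence theorem guarantees in general.

```python
import math
from fractions import Fraction as Fr


def dyadic_approx(z, n):
    """n-th approximation of a value z >= 0 (possibly math.inf): round z down
    to a multiple of 2**-n, then cap the result at n. Takes finitely many values."""
    if z < 0:
        raise ValueError("z must be nonnegative")
    if z == math.inf:
        return Fr(n)
    return min(Fr(n), Fr(math.floor(z * 2**n), 2**n))


# Hypothetical values of a nonnegative Z at four points, each with probability 1/4.
values = [Fr(1, 3), Fr(5, 2), Fr(7), Fr(22, 7)]
weights = [Fr(1, 4)] * 4
exact = sum(w * z for w, z in zip(weights, values))

# Integrals of the simple functions Z_n, for n = 1, ..., 15.
integrals = [sum(w * dyadic_approx(z, n) for w, z in zip(weights, values))
             for n in range(1, 16)]
print([float(x) for x in integrals[:5]])
print(float(integrals[-1]), float(exact))
```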

Now fix an arbitrary 𝒢⊗ℱ-measurable function Z such that E_P Z* exists. We have

already established that E_μ Z⁺ and E_μ Z⁻ are 𝒢-measurable functions. We may write

$$E_\mu Z(\omega) = \int Z(\omega, \eta)\,\mu_\omega(d\eta) = \int Z^+(\omega, \eta)\,\mu_\omega(d\eta) - \int Z^-(\omega, \eta)\,\mu_\omega(d\eta) = E_\mu Z^+(\omega) - E_\mu Z^-(\omega).$$

As the difference of 𝒢-measurable functions, E_μ Z = E_μ Z⁺ − E_μ Z⁻ is 𝒢-measurable. We compute

$$\begin{aligned} E_P Z^* &= \int Z^*\,dP = \int (Z^*)^+\,dP - \int (Z^*)^-\,dP = \int (Z^+)^*\,dP - \int (Z^-)^*\,dP \\ &= \int Z^+\,dP^* - \int Z^-\,dP^* = E_P E_\mu Z^+ - E_P E_\mu Z^-. \end{aligned}$$

The final identity uses (5), which is legitimate because Z⁺ and Z⁻ are both nonnegative. The

expectation on the left-hand side exists, so at least one of the expectations on the right-hand side

must be finite. Suppose without loss of generality that E_P E_μ Z⁺ is finite.

Choose any random variables V and W such that E_P V is finite and E_P W exists

(possibly with infinite value). Since E_P V is finite, V must have finite value except possibly

inside a set A of P-measure 0. Define random variable V′ by

$$V'(\omega) =_{df} \begin{cases} V(\omega) & \text{if } \omega \notin A \\ 0 & \text{if } \omega \in A \end{cases}$$

Random variables that agree almost everywhere have the same expectation, so

$$E_P V' = E_P V.$$

V′ − W is well-defined everywhere and agrees with V − W except possibly inside A. Since

E_P V′ − E_P W exists, the linearity of expectations entails that

$$E_P(V' - W) = E_P V' - E_P W.$$

V − W is well-defined everywhere except possibly inside A, so it is an almost surely defined

random variable. Its expectation is

$$E_P(V - W) =_{df} E_P(V' - W) = E_P V' - E_P W = E_P V - E_P W.$$

Taking E_μ Z⁺ for V and E_μ Z⁻ for W, we conclude that E_μ Z = E_μ Z⁺ − E_μ Z⁻ is an almost surely defined

random variable and that

$$E_P\left(E_\mu Z^+ - E_\mu Z^-\right) = E_P E_\mu Z^+ - E_P E_\mu Z^-.$$

Therefore

$$E_P Z^* = E_P E_\mu Z^+ - E_P E_\mu Z^- = E_P\left(E_\mu Z^+ - E_\mu Z^-\right) = E_P E_\mu Z,$$

which completes the proof. ∎

Inspecting the proof, we see that the integral formula is used only once and is not used in

showing 𝒢-measurability. Let C be an update rule and Y a bookie strategy. Define

$$\mu_\omega(\cdot) =_{df} C(\cdot \mid \omega), \qquad E_\mu Y(\omega) =_{df} E_{\mu_\omega} Y_\omega.$$

Our proof shows that E_μ Y is 𝒢-measurable, whether or not C satisfies the integral formula. If we define

$$\Gamma =_{df} \{\omega : Y_\omega \text{ is acceptable given } \omega\} = \{\omega : E_\mu Y(\omega) \geq 0\},$$

then the 𝒢-measurability of E_μ Y entails that Γ ∈ 𝒢.
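The measurability point is easy to see in a finite case. In the hypothetical example below, 𝒢 is generated by a two-cell partition, the update rule is uniform on each cell (𝒢-measurable, but chosen without reference to any prior, so nothing guarantees the integral formula), and the bookie strategy Y offers at each ω to buy a unit bet on A from the agent at a price that depends only on the cell of ω. The cell prices and all names are made up for illustration. Taking acceptability to mean nonnegative expected net gain under μ_ω, the set Γ of outcomes at which Y_ω is acceptable comes out as a union of cells, that is, a member of 𝒢.

```python
from fractions import Fraction as Fr

# Hypothetical finite model; G is generated by two cells.
OMEGA = range(4)
CELLS = [frozenset({0, 1}), frozenset({2, 3})]
A = {0, 2}
PRICE = [Fr(1, 4), Fr(3, 4)]   # hypothetical price the bookie pays, per cell


def cell_index(w):
    return next(i for i, c in enumerate(CELLS) if w in c)


def mu(w):
    """Hypothetical update rule mu_w = C(. | w): uniform on the cell of w.
    It is G-measurable; the integral formula plays no role here."""
    cell = CELLS[cell_index(w)]
    return {e: (Fr(1, len(cell)) if e in cell else Fr(0)) for e in OMEGA}


def Y(w, e):
    """Bookie strategy: at w, buy from the agent a unit bet on A at PRICE[cell of w].
    Value = the agent's net gain if the true outcome is e."""
    return PRICE[cell_index(w)] - (1 if e in A else 0)


def expected_Y(w):
    """E_mu Y(w): expected net gain of Y_w under mu_w."""
    return sum(Y(w, e) * m for e, m in mu(w).items())


Gamma = frozenset(w for w in OMEGA if expected_Y(w) >= 0)
print(sorted(Gamma))   # [2, 3]
# Gamma is a union of cells, i.e. a member of G:
print(all(c <= Gamma or not (c & Gamma) for c in CELLS))   # True
```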

We can now show that strong Dutch books are weak Dutch books. Let (X, Y) be a strong

Dutch book for (Ω, ℱ, P), 𝒢, and C. Conditions (a) and (b) in the definition of weak Dutch book

are immediate. Defining Γ as in the previous paragraph, note that

$$\omega \in \Gamma \;\Rightarrow\; X(\omega) + Y_\omega(\omega) < 0 \tag{6}$$

$$\omega \notin \Gamma \;\Rightarrow\; X(\omega) < 0 \tag{7}$$

Condition (d) in the definition of weak Dutch book follows from (6), while condition (e) follows

from (7). For condition (c), note that P(Γ) is well-defined since Γ ∈ 𝒢. If P(Γ) = 0, then

$$\int_\Omega X\,dP = \int_{\Gamma^c} X\,dP < 0,$$

contradicting our assumption that X is acceptable. We must therefore have P(Γ) > 0, which

together with (6) entails condition (c).


I am grateful to Kenny Easwaran, Greg Gandenberger, Stephen Ge, Teddy Seidenfeld,

and two anonymous referees for this journal for comments that improved the paper. Thanks to

Bill Kowalsky for assistance in preparing the final version of the manuscript.

Outcome     Price          Payoff    Net gain

A ∩ G       P_𝒢(A | ω)     1         1 − P_𝒢(A | ω)
Aᶜ ∩ G      P_𝒢(A | ω)     0         −P_𝒢(A | ω)
Gᶜ          0              0         0

Table 1. Net gain for bet 1.

Outcome     Price        Payoff    Net gain

A           C(A | ω)     1         C(A | ω) − 1
Aᶜ          C(A | ω)     0         C(A | ω)

Table 2. Net gain for bet 2. Note that we subtract the payoff from the price, since you sell rather
than buy the bet.

Outcome     Net gain

A ∩ G       C(A | ω) − P_𝒢(A | ω)
Aᶜ ∩ G      C(A | ω) − P_𝒢(A | ω)
Gᶜ          0

Table 3. Net gain for the entire gambling scenario, found by adding net gains for bet 1 and bet 2.
Bear in mind that bet 2 is only offered when ω ∈ G.
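The arithmetic in Tables 1 to 3 can be reproduced mechanically. In this sketch the values p = P_𝒢(A | ω) = 1/2 and c = C(A | ω) = 3/10 are hypothetical, and the three functions (names invented here) simply encode the table rows. On either outcome inside G the combined net gain equals C(A | ω) − P_𝒢(A | ω), so it is negative throughout G whenever C(A | ω) < P_𝒢(A | ω), and it is 0 outside G.

```python
from fractions import Fraction as Fr


def bet1_net(in_A, in_G, p):
    """Table 1: you buy, at price p = P_G(A | w), a unit bet on A that is called off unless G."""
    if not in_G:
        return Fr(0)
    return (1 if in_A else 0) - p


def bet2_net(in_A, c):
    """Table 2: you sell, at price c = C(A | w), a unit bet on A (price minus payoff)."""
    return c - (1 if in_A else 0)


def total_net(in_A, in_G, p, c):
    """Table 3: bet 2 is offered only when w is in G."""
    return bet1_net(in_A, in_G, p) + (bet2_net(in_A, c) if in_G else 0)


p, c = Fr(1, 2), Fr(3, 10)   # hypothetical values with C(A | w) < P_G(A | w)
for in_A, in_G in [(True, True), (False, True), (False, False), (True, False)]:
    print(in_A, in_G, total_net(in_A, in_G, p, c))
# True True -1/5
# False True -1/5
# False False 0
# True False 0
```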

Figure 1. A visualization of Ω × Ω. There may not be a natural linear ordering of Ω, but the
visualization is still a useful heuristic. The horizontal axis corresponds to (Ω, 𝒢). Points on this
axis determine which bet the bookie selects. The vertical axis corresponds to (Ω, ℱ). Points on
this axis determine net gain from whatever bet the bookie selects. When the bookie acquires
information about outcome ω, he offers bet Y_ω. For any outcome η, this bet has a well-defined
net gain Y_ω(η), i.e. Y’s value on the point where the vertical line intersects the lower horizontal
line. In actuality, the outcome ω that determines the bookie’s bet is the same outcome ω that
determines net gain for that bet. Someone who accepts the bet receives net gain Y*(ω) =df Y_ω(ω),
which is Y’s value at the point where the vertical line intersects the diagonal line.

Figure 2. Think of X and Y* as bets on events over the vertical axis. Let Γ =df {ω: Y_ω is
acceptable given ω}. The grey column contains points (ω, η) with ω ∈ Γ. If (X, Y) is a strong
Dutch book, then X + Y* < 0 inside Γ and X < 0 outside Γ. If (X, Y) is a weak Dutch book, then
X + Y* < 0 with positive probability inside Γ, X + Y* > 0 with probability 0 inside Γ, and X > 0
with probability 0 outside Γ. Thus, the only values of Y that affect whether (X, Y) is a strong (or
weak) Dutch book are its values on points where the diagonal line intersects the grey column.
This reflects the fact that our betting agent rejects Y_ω if ω ∉ Γ.

Figure 3. Think of Z* as a bet on events over the vertical axis. If ω ∈ Γ, then Z*(ω) is Y’s value
on the corresponding point of the diagonal line. If ω ∉ Γ, then Z*(ω) = 0. Informally, Z* converts
bookie strategy Y into a single bet that prunes away all possibilities for expected net loss.

