Wednesday, 28 March 2018

Aggregating attitudes to risk

I've written quite a lot on this blog recently about how we should aggregate the credences or subjective probabilities of a group of individuals to give their collective credences (here, here, here). In some of those posts, this one in particular, I asked how we should combine credences if we wish to use them to make a group decision. And I gave an argument for aggregating by linear pooling that is based on a mathematical theorem.

So what's linear pooling? Suppose there are $n$ individuals with probabilistic credence functions $P_1, \ldots, P_n$; and suppose $P_G$ is their aggregate. Then $P_G$ is a result of linear pooling if there are weights $0 \leq \alpha_1, \ldots, \alpha_n \leq 1$ with $\sum^n_{i=1} \alpha_i =1$ such that $$P_G(-) = \sum^n_{i=1} \alpha_iP_i(-)$$
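To make that concrete, here is a minimal numerical sketch of linear pooling; the credences and weights below are made up purely for illustration:

```python
import numpy as np

# Made-up credences of three individuals in a single proposition X, with
# illustrative weights that are non-negative and sum to 1.
credences = np.array([0.2, 0.5, 0.9])  # P_1(X), P_2(X), P_3(X)
weights   = np.array([0.5, 0.3, 0.2])  # alpha_1, alpha_2, alpha_3

# The group's credence in X is the weighted average of the individuals'.
P_G = np.dot(weights, credences)
print(P_G)  # 0.43
```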
Next, a little notation from decision theory. Suppose $a$ is an act, $P$ is a credence function over a set of possible states of the world $S$, and $U$ is a utility function. Then, according to expected utility theory (EU theory), an agent with $P$ and $U$ evaluates $a$ as its expected utility relative to those functions. The expected utility of $a$ relative to $P$ and $U$ is $$\mathrm{EU}_{P, U}(a) = \sum_{s \in S} P(s||a)U(a\ \&\ s)$$where: (i) $P(s||a)$ is the agent's credence in state $s$ given that act $a$ is performed, and (ii) $U(a\ \&\ s)$ is the agent's utility for the outcome of act $a$ at state $s$. We will write $P^a(s)$ for $P(s||a)$ and $U^a(s)$ for $U(a\ \&\ s)$, so that $$\mathrm{EU}_{P, U}(a) = \sum_{s \in S} P^a(s)U^a(s)$$
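Again, a small sketch may help; the states, credences, and utilities below are invented for illustration:

```python
import numpy as np

# Made-up numbers for a three-state decision problem.
P_a = np.array([0.3, 0.5, 0.2])   # P^a(s): credence in each state, given act a
U_a = np.array([10.0, 0.0, 5.0])  # U^a(s): utility of a's outcome in each state

# EU_{P,U}(a) = sum over states of P^a(s) * U^a(s)
EU_a = np.dot(P_a, U_a)
print(EU_a)  # 4.0
```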
And now the mathematical result:

Theorem 1 Suppose $P_1, \ldots, P_n$ and $P_G$ are probabilistic credence functions over the same set of possibilities.

(I) Suppose $P_G$ is not a result of linear pooling -- that is, $P_G$ is not a weighted average of $P_1, \ldots, P_n$. Then there is a utility function $U$ and a pair of acts $a$ and $b$ such that:
  • $\mathrm{EU}_{P_i, U}(a) < \mathrm{EU}_{P_i, U}(b)$, for $1 \leq i \leq n$.
  • $\mathrm{EU}_{P_G, U}(b) < \mathrm{EU}_{P_G, U}(a)$
That is, if every individual has the same utility function $U$, and the collective utility function is also $U$, then each individual evaluates $b$ as better than $a$, while the aggregate evaluates $a$ as better than $b$.

(II) Suppose $P_G$ is a result of linear pooling -- that is, $P_G$ is a weighted average of $P_1, \ldots, P_n$. Then for any utility function $U$ and pair of acts $a$ and $b$: if
  • $\mathrm{EU}_{P_i, U}(a) < \mathrm{EU}_{P_i, U}(b)$, for $1 \leq i \leq n$.
then
  • $\mathrm{EU}_{P_G, U}(a) < \mathrm{EU}_{P_G, U}(b)$
That is, if every individual has the same utility function $U$, and the collective utility function is also $U$, then if each individual evaluates $b$ as better than $a$, then the aggregate evaluates $b$ as better than $a$.

Now, notice: everything I've just described takes place in the context of expected utility theory. But many decision theorists believe that expected utility theory is too demanding as a set of norms of rational choice. Most often, they cite the so-called Allais paradox to support their claims. In it, the French economist Maurice Allais describes a pair of decision problems, $A$ vs $B$ and $C$ vs $D$. When presented with these decisions, people often prefer $A$ to $B$, and $D$ to $C$. However, there is no way to assign utilities to the outcomes in these cases on which expected utility theory recommends those preferences. There are a number of responses to this, and the literature is now somewhat crowded with non-expected utility theories. In this note, I'll be interested only in Lara Buchak's risk-weighted expected utility theory (REU theory), which she lays out in her 2013 book, Risk and Rationality.

I have described Buchak's REU theory in a lot of detail here, so I won't try to motivate it in this post. In short, while EU theory demands that an agent evaluate an act as its expected utility, REU theory demands that they evaluate it as its risk-weighted expected utility. While an agent's expected utility for an act is determined only by her credence and utility functions, her risk-weighted expected utility is determined by her credence, utility, and risk functions. For Buchak, an agent's risk function is a strictly increasing, continuous function $r : [0, 1] \rightarrow [0, 1]$ with $r(0) = 0$ and $r(1) = 1$. It represents that agent's attitudes to risk.

Suppose $a$ is an act, $U$ is a utility function, and $S = \{s_1, \ldots, s_m\}$ is the set of states of the world. And now let $S^* = \{S_1, \ldots, S_k\}$ be the coarse-graining of the states of the world that (i) collects together states on which the outcome of $a$ has the same utility and (ii) orders those states from worst to best relative to $U$. Thus: (i) $s_i, s_j$ are in the same state in $S^*$ iff $U^a(s_i) = U^a(s_j)$; (ii) $U^a(S_1) < U^a(S_2) < \ldots < U^a(S_k)$. Then the risk-weighted expected utility of $a$ relative to $P$, $U$, and $r$ is:
$$\mathrm{REU}_{P, r, U}(a)  = U^a(S_1) + \sum^k_{i=2} r(P^a(S_i \vee \ldots \vee S_k))[U^a(S_i) - U^a(S_{i-1})]$$
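Here is a small computational sketch of this formula; the risk functions, credences, and utilities are invented for illustration:

```python
import numpy as np

def reu(probs, utils, r):
    """Risk-weighted expected utility of an act.

    probs[i] is the credence in coarse-grained state S_i given the act,
    utils[i] the utility of the act's outcome there (assumed distinct).
    """
    order = np.argsort(utils)           # order the states from worst to best
    u = np.asarray(utils, float)[order]
    p = np.asarray(probs, float)[order]
    total = u[0]
    for i in range(1, len(u)):
        tail = p[i:].sum()              # P^a(S_i or ... or S_k)
        total += r(tail) * (u[i] - u[i - 1])
    return total

# With a risk-averse risk function r(x) = x^2, risky acts are discounted:
print(reu([0.3, 0.5, 0.2], [10.0, 0.0, 5.0], lambda x: x ** 2))  # 1.7
# With the identity risk function, REU reduces to expected utility:
print(reu([0.3, 0.5, 0.2], [10.0, 0.0, 5.0], lambda x: x))       # 4.0
```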
With that in hand, we can now state the result:

Theorem 2 Suppose $r_1, \ldots, r_n$ and $r_G$ are risk functions.

(I) Suppose $r_G$ is not a result of linear pooling -- that is, $r_G$ is not a weighted average of $r_1, \ldots, r_n$. Then there is a set of states $S$, a credence function $P$ over $S$, a utility function $U$, and a pair of acts $a$ and $b$ such that
  • $\mathrm{REU}_{P, r_i, U}(a) < \mathrm{REU}_{P, r_i, U}(b)$, for $1 \leq i \leq n$
  • $\mathrm{REU}_{P, r_G, U}(b) < \mathrm{REU}_{P, r_G, U}(a)$
That is, if every individual has the same credence function $P$ and utility function $U$, and the group also has $P$ and $U$, then each individual evaluates $b$ as better than $a$, while the aggregate evaluates $a$ as better than $b$.

(II) Suppose $r_G$ is a result of linear pooling -- that is, $r_G$ is a weighted average of $r_1, \ldots, r_n$. Then for any set of states $S$, any credence function $P$, any utility function $U$, and pair of acts $a$ and $b$: if
  • $\mathrm{REU}_{P, r_i, U}(a) < \mathrm{REU}_{P, r_i, U}(b)$, for $1 \leq i \leq n$.
then
  • $\mathrm{REU}_{P, r_G, U}(a) < \mathrm{REU}_{P, r_G, U}(b)$
That is, if every individual has the same credence function $P$ and the same utility function $U$, and the group also has $P$ and $U$, then if each individual evaluates $b$ as better than $a$, then the aggregate evaluates $b$ as better than $a$.

Proof of Theorem 2. We present a series of lemmas, which we tie together at the end.

First, we present a lemma that allows us to redescribe risk-weighted expected utilities as standard expected utilities calculated with respect to different credence functions.

Lemma 1  Suppose $r$ is a risk function, $P$ is a credence function over $S_m = \{s_1, \ldots, s_m\}$, and $U$ is a utility function. Now suppose $a$ is an act and $U^a(s_1) < \ldots < U^a(s_m)$. Then define the credence function $P_{r, U, a}$ as follows:
\begin{eqnarray*}
P_{r, U, a}(s_1) & = & r(P(s_1 \vee \ldots \vee s_m)) - r(P(s_2 \vee \ldots \vee s_m)) \\
P_{r, U, a}(s_2) & = & r(P(s_2 \vee \ldots \vee s_m)) - r(P(s_3 \vee \ldots \vee s_m)) \\
\vdots & \vdots & \vdots \\
P_{r, U, a}(s_{m-1}) & = & r(P(s_{m-1} \vee s_m)) - r(P(s_m)) \\
P_{r, U, a}(s_m) & = & r(P(s_m))
\end{eqnarray*}
Then $$\mathrm{REU}_{P, r, U}(a) = \mathrm{EU}_{P_{r, U, a}, U}(a)$$

Proof of Lemma 1.  Suppose $U^a(s_1) < \ldots < U^a(s_m)$. Then
\begin{eqnarray*}
& & \mathrm{REU}_{P, r, U}(a) \\
& = & U^a(s_1) + \sum^m_{k=2} r(P(s_k \vee \ldots \vee s_m))[U^a(s_k) - U^a(s_{k-1})] \\
& = & r(P(s_m))U^a(s_m) + \sum^{m-1}_{k=1} \left [ r(P(s_k \vee \ldots \vee s_m)) - r(P(s_{k+1} \vee \ldots \vee s_m)) \right ]U^a(s_k) \\
& = & P_{r, U, a}(s_m)U^a(s_m) + \sum^{m-1}_{k=1} P_{r, U, a}(s_k)U^a(s_k) \\
& = & \mathrm{EU}_{P_{r, U, a}, U}(a)
\end{eqnarray*}
as required.
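As a quick sanity check on Lemma 1, here is a numerical sketch, with an invented risk function $r(x) = \sqrt{x}$ and made-up credences and utilities, confirming that the REU of an act equals the EU calculated with the transformed credence function:

```python
import numpy as np

r = lambda x: np.sqrt(x)       # an illustrative risk-inclined risk function
P = np.array([0.2, 0.3, 0.5])  # made-up credences in s_1, s_2, s_3
U = np.array([1.0, 4.0, 9.0])  # made-up utilities, increasing in the state index

# REU computed directly from the definition:
reu = U[0] + sum(r(P[k:].sum()) * (U[k] - U[k - 1]) for k in range(1, len(U)))

# The transformed credence function P_{r,U,a} from Lemma 1:
tails = [P[k:].sum() for k in range(len(P))] + [0.0]
P_rUa = np.array([r(tails[k]) - r(tails[k + 1]) for k in range(len(P))])

# Its expected utility agrees with the risk-weighted expected utility:
print(reu, np.dot(P_rUa, U))  # the two values coincide
```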

Definition 1  Let $P^m$ be the uniform distribution over $S_m = \{s_1, \ldots, s_m\}$. That is, for all $1 \leq k \leq m$,$$P^m(s_k) = \frac{1}{m}$$

We now state two lemmas about $P^m$:

Lemma 2  Suppose $r$ is a risk function, $S_m = \{s_1, \ldots, s_m\}$ is a set of states,  $U$ is a utility function, and $a$, $b$ are acts (which possibly order the states differently, so that $U^a(s_i) < U^a(s_j)$ but $U^b(s_i) > U^b(s_j)$ for some $i, j$). Then:$$P^m_{r, U, a}(-) = P^m_{r, U, b}(-)$$
Lemma 3  If $r_G$ is not a linear pool of $r_1, \ldots, r_n$, then there is $m$ such that $P^m_{r_G, U, a}$ is not a linear pool of $P^m_{r_1, U, a}, \ldots, P^m_{r_n, U, a}$ for any utility function $U$ and act $a$.

Proof of Lemma 3.  First, we show that, if $U$ is a utility function and $a$ is an act and $P^m_{r_G, U, a}$ is a linear pool of $P^m_{r_1, U, a}, \ldots, P^m_{r_n, U, a}$, then, for all $1 \leq k \leq m$, $r_G(\frac{k}{m}) = \sum_i \alpha_i r_i(\frac{k}{m})$.

To that end, suppose $P^m_{r_G, U, a}$ is a linear pool of $P^m_{r_1, U, a}, \ldots, P^m_{r_n, U, a}$. That is, there are $\alpha_1, \ldots, \alpha_n$ such that
$$P^m_{r_G, U, a}(-) = \sum_i \alpha_i P^m_{r_i, U, a}(-)$$
Then
\begin{eqnarray*}
r_G(\frac{1}{m}) & = & r_G(P^m(s_m)) = P^m_{r_G, U, a}(s_m) = \\
& & \sum_i \alpha_i P^m_{r_i, U, a}(s_m) = \sum_i \alpha_i r_i(P^m(s_m)) = \sum_i \alpha_i r_i(\frac{1}{m})
\end{eqnarray*}
So $r_G(\frac{1}{m}) = \sum_i \alpha_i r_i(\frac{1}{m})$. And
\begin{eqnarray*}
r_G(\frac{2}{m}) - r_G(\frac{1}{m}) & = & r_G(P^m(s_{m-1} \vee s_m)) - r_G(P^m(s_m)) \\
& = & P^m_{r_G, U, a}(s_{m-1}) \\
& = & \sum_i \alpha_i P^m_{r_i, U, a}(s_{m-1}) \\
& = &  \sum_i \alpha_i \left [  r_i(P^m(s_{m-1} \vee s_m)) - r_i(P^m(s_m)) \right ] \\
& = &  \sum_i \alpha_i  r_i(\frac{2}{m})-  \sum_i \alpha_i r_i(\frac{1}{m}) \\
& = &  \sum_i \alpha_i  r_i(\frac{2}{m}) - r_G(\frac{1}{m})
\end{eqnarray*}
Thus, $r_G(\frac{2}{m}) = \sum_i \alpha_i  r_i(\frac{2}{m})$. And similarly for $r_G(\frac{3}{m}), \ldots, r_G(\frac{m-1}{m}), r_G(\frac{m}{m})$. So, for all $1 \leq k \leq m$,
$$r_G(\frac{k}{m}) = \sum_i \alpha_i r_i(\frac{k}{m})$$
Next, we note that, since each of the risk functions $r_1, \ldots, r_n, r_G$ is continuous, if $r_G$ is not a linear pool of $r_1, \ldots, r_n$, there must be $m$ such that there are no $\alpha_1, \ldots, \alpha_n$ such that, for $1 \leq k \leq m$, $$r_G(\frac{k}{m}) = \sum_i \alpha_i r_i(\frac{k}{m})
$$After all, if for every $m$ there were such weights, we could take $m$ along the sequence $m_j = j!$, so that the grids $\{\frac{k}{m_j} : 1 \leq k \leq m_j\}$ are nested and their union is dense in $[0,1]$; the corresponding weight vectors live in the compact set $[0,1]^n$, so some subsequence of them converges, and the limiting weights would witness that $r_G(x) = \sum_i \alpha_i r_i(x)$ on a dense subset of $[0,1]$ and hence, by continuity, on all of $[0,1]$ -- contradicting the assumption that $r_G$ is not a linear pool of $r_1, \ldots, r_n$. And that completes our proof of Lemma 3.

We are now ready to piece our proof together. We'll start by proving Theorem 2(I). So suppose $r_G$ is not a linear pool of $r_1, \ldots, r_n$. Then, by Lemma 3, there is $m$ such that, for any utility function $U$ and act $d$, $P^m_{r_G, U, d}$ is not a linear pool of $P^m_{r_1, U, d}, \ldots, P^m_{r_n, U, d}$. Then, by Theorem 1, there is a utility function $U$ and a pair of acts $a$ and $b$ such that
  • $\mathrm{EU}_{P^m_{r_i, U, d}, U}(a) < \mathrm{EU}_{P^m_{r_i, U, d}, U}(b)$, for $1 \leq i \leq n$; but
  • $\mathrm{EU}_{P^m_{r_G, U, d}, U}(b) < \mathrm{EU}_{P^m_{r_G, U, d}, U}(a)$
Thus, by Lemma 2,
  • $\mathrm{EU}_{P^m_{r_i, U, a}, U}(a) < \mathrm{EU}_{P^m_{r_i, U, b}, U}(b)$, for $1 \leq i \leq n$; but 
  • $\mathrm{EU}_{P^m_{r_G, U, b}, U}(b) < \mathrm{EU}_{P^m_{r_G, U, a}, U}(a)$
And thus, by Lemma 1,
  • $\mathrm{REU}_{P^m, r_i, U}(a) < \mathrm{REU}_{P^m, r_i, U}(b)$, for $1 \leq i \leq n$; but
  • $\mathrm{REU}_{P^m, r_G, U}(b) < \mathrm{REU}_{P^m, r_G, U}(a)$
This completes our proof of Theorem 2(I).

Next, we prove Theorem 2(II). So suppose $r_G$ is a linear pool of $r_1, \ldots, r_n$. That is, there are $\alpha_1, \ldots, \alpha_n$ such that $r_G(-) = \sum_i \alpha_i r_i(-)$. Then, if $U^a(s_1) < \ldots < U^a(s_m)$, then
\begin{eqnarray*}
& & \mathrm{REU}_{P, r_G, U}(a) \\
& = & U^a(s_1) + \sum^m_{k=2} r_G(P(s_k \vee \ldots \vee s_m))[U^a(s_k) - U^a(s_{k-1})] \\
& = & U^a(s_1) + \sum^m_{k=2} \left (\sum_i \alpha_i r_i(P(s_k \vee \ldots \vee s_m)) \right ) [U^a(s_k) - U^a(s_{k-1})] \\
& = & \sum_i \alpha_i \left ( U^a(s_1) + \sum^m_{k=2} r_i(P(s_k \vee \ldots \vee s_m)) [U^a(s_k) - U^a(s_{k-1})] \right ) \\
& = & \sum_i \alpha_i \mathrm{REU}_{P, r_i, U}(a)
\end{eqnarray*}
And similarly for $b$. So, if
  • $\mathrm{REU}_{P, r_i, U}(a) < \mathrm{REU}_{P, r_i, U}(b)$, for $1 \leq i \leq n$
then
  • $\mathrm{REU}_{P, r_G, U}(a) < \mathrm{REU}_{P, r_G, U}(b)$
as required.
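The identity just derived -- that the REU generated by a linearly pooled risk function is the corresponding weighted average of the individual REUs -- is easy to check numerically; here is a sketch with invented risk functions, weights, credences, and utilities:

```python
import numpy as np

def reu(P, U, r):
    # Risk-weighted expected utility, assuming U is strictly increasing
    # in the state index, as in the proof above.
    return U[0] + sum(r(P[k:].sum()) * (U[k] - U[k - 1])
                      for k in range(1, len(U)))

P = np.array([0.2, 0.3, 0.5])                        # made-up credences
U = np.array([0.0, 2.0, 10.0])                       # made-up utilities
r1, r2 = (lambda x: x ** 2), (lambda x: np.sqrt(x))  # illustrative risk functions
alpha = (0.7, 0.3)                                   # illustrative weights
r_G = lambda x: alpha[0] * r1(x) + alpha[1] * r2(x)  # their linear pool

lhs = reu(P, U, r_G)
rhs = alpha[0] * reu(P, U, r1) + alpha[1] * reu(P, U, r2)
print(lhs, rhs)  # the two values coincide
```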


Monday, 12 February 2018

An almost-Dutch Book argument for the Principal Principle

People often talk about the synchronic Dutch Book argument for Probabilism and the diachronic Dutch Strategy argument for Conditionalization. But the synchronic Dutch Book argument for the Principal Principle is mentioned less. That's perhaps because, in one sense, there couldn't possibly be such an argument. As the Converse Dutch Book Theorem shows, providing you satisfy Probabilism, there can be no Dutch Book made against you -- that is, there is no set of bets, each of which you will consider fair or favourable on its own, but which, when taken together, lead to a sure loss for you. So you can violate the Principal Principle without being vulnerable to a sure loss, providing you satisfy Probabilism. However, there is a related argument for the Principal Principle. And conversations with a couple of philosophers recently made me think it might be worth laying it out.

Here is the result on which the argument is based:

(I) Suppose your credences violate the Principal Principle but satisfy Probabilism. Then there is a book of bets and a price such that: (i) you consider that price favourable for that book -- that is, your subjective expectation of the total net gain is positive; (ii) every possible objective chance function considers that price unfavourable -- that is, the objective expectation of the total net gain is guaranteed to be negative.

(II) Suppose your credences satisfy both the Principal Principle and Probabilism. Then there is no book of bets and a price such that: (i) you consider that price favourable for that book; (ii) every possible objective chance function considers that price unfavourable.

Put another way:

(I') Suppose your credences violate the Principal Principle. There are two actions $a$ and $b$ such that: you prefer $b$ to $a$, but every possible objective chance function prefers $a$ to $b$.

(II') Suppose your credences satisfy the Principal Principle. For any two actions $a$ and $b$: if every possible objective chance function prefers $a$ to $b$, then you prefer $a$ to $b$.

To move from (I) and (II) to (I') and (II'), let $a$ be the action of accepting the book of bets at the price in question and let $b$ be the action of rejecting it.

The proof splits into two parts:

(1) First, we note that a credence function $c$ satisfies the Principal Principle iff $c$ is in the closed convex hull of the set of possible chance functions.

(2) Second, we prove that:

(2I) If a probability function $c$ lies outside the closed convex hull of a set of probability functions $\mathcal{X}$, then there is a book of bets and a price such that the expected total net gain from that book at that price by the lights of $c$ is positive, while the expected total net gain from that book at that price by the lights of each $p$ in $\mathcal{X}$ is negative.

(2II) If a probability function $c$ lies inside the closed convex hull of a set of probability functions $\mathcal{X}$, then there is no book of bets and a price such that the expected total net gain from that book at that price by the lights of $c$ is positive, while the expected total net gain from that book at that price by the lights of each $p$ in $\mathcal{X}$ is negative.

Here's the proof of (2), which I lift from my recent justification of linear pooling -- the same technique is applicable since the Principal Principle essentially says that you should set your credences by applying linear pooling to the possible objective chances.

First:
  • Let $\Omega$ be the set of possible worlds
  • Let $\mathcal{F} = \{X_1, \ldots, X_n\}$ be the set of propositions over which our probability functions are defined. So each $X_i$ is a subset of $\Omega$.
Now:
  • We represent a probability function $p$ defined on $\mathcal{F}$ as a vector in $\mathbb{R}^n$, namely, $p = \langle p(X_1), \ldots, p(X_n)\rangle$.
  • Given a proposition $X$ in $\mathcal{F}$ and a stake $S$ in $\mathbb{R}$, we define the bet $B_{X, S}$ as follows: $$B_{X, S}(\omega) =  \left \{ \begin{array}{ll}
    S & \mbox{if } \omega \in X \\
    0 & \mbox{if } \omega \not \in X
    \end{array}
    \right.$$ So $B_{X, S}$ pays out $S$ if $X$ is true and $0$ if $X$ is false.
  • We represent the book of bets $\sum^n_{i=1} B_{X_i, S_i}$ as a vector in $\mathbb{R}^n$, namely, $S = \langle S_1, \ldots, S_n\rangle$. 

Lemma 1
If $p$ is a probability function on $\mathcal{F}$, the expected payoff of the book of bets $\sum^n_{i=1} B_{X_i, S_i}$ by the lights of $p$ is $$S \cdot p = \sum^n_{i=1} p(X_i)S_i$$
Lemma 2
Suppose $c$ is a probability function on $\mathcal{F}$, $\mathcal{X}$ is a set of probability functions on $\mathcal{F}$, and $\mathcal{X}^+$ is the closed convex hull of $\mathcal{X}$. Then, if $c \not \in \mathcal{X}^+$, then there is a vector $S$ and $\varepsilon > 0$ such that, for all $p$ in $\mathcal{X}$, $$S \cdot p < S \cdot c - \varepsilon$$
Proof of Lemma 2.  Suppose $c \not \in \mathcal{X}^+$. Then let $c^*$ be the closest point in $\mathcal{X}^+$ to $c$ (which exists, since $\mathcal{X}^+$ is closed and convex). Then let $S = c - c^*$. Then, for any $p$ in $\mathcal{X}$, the angle $\theta$ between $S$ and $p - c$ is obtuse and thus $\mathrm{cos}\, \theta < 0$. So, since $S \cdot (p - c) = ||S||\, ||p - c||\, \mathrm{cos}\, \theta$ and $||S||, ||p - c|| > 0$, we have $S \cdot (p - c) < 0$. And hence $S \cdot p < S \cdot c$. What's more, since $c^*$ is the closest point in the convex set $\mathcal{X}^+$ to $c$, the angle between $S$ and $p - c^*$ is at least a right angle, so $S \cdot (p - c^*) \leq 0$, and thus $S \cdot p \leq S \cdot c^* = S \cdot c - ||S||^2$. And since $\mathcal{X}^+$ is closed and $c \not \in \mathcal{X}^+$, we have $||S|| > 0$. Thus, setting $\varepsilon = \frac{1}{2}||S||^2 > 0$, we have $S \cdot p < S \cdot c - \varepsilon$, for all $p$ in $\mathcal{X}$.

We now derive (2I) and (2II) from Lemmas 1 and 2:

Let $\mathcal{X}$ be the set of possible objective chance functions. If $c$ violates the Principal Principle, then $c$ is not in $\mathcal{X}^+$. Thus, by Lemma 2, there is a book of bets $\sum^n_{i=1} B_{X_i, S_i}$ and $\varepsilon > 0$ such that, for any objective chance function $p$ in $\mathcal{X}$, $S \cdot p < S \cdot c - \varepsilon$. By Lemma 1, $S \cdot p$ is the expected payout of the book of bets by the lights of $p$, while $S \cdot c$ is the expected payout of the book of bets by the lights of $c$. Now, suppose we were to offer an agent with credence function $c$ the book of bets $\sum^n_{i=1} B_{X_i, S_i}$ for the price of $S \cdot c - \frac{\varepsilon}{2}$. Then this would have positive expected payoff by the lights of $c$, but negative expected payoff by the lights of each $p$ in $\mathcal{X}$. This gives (2I).

(2II) then holds because, when $c$ is in the closed convex hull of $\mathcal{X}$, its expectation of a random variable is in the closed convex hull of the expectations of that random variable by the lights of the probability functions in $\mathcal{X}$. Thus, if the expectation of a random variable is negative by the lights of all the probability functions in $\mathcal{X}$, then its expectation by the lights of $c$ is not positive.
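Here is a toy numerical illustration of (2I), assuming a single proposition $X$ whose possible chances are $0.2$ and $0.8$, and an agent whose credence $0.9$ lies outside their convex hull; all the numbers are made up:

```python
# One proposition X; suppose the possible chances of X are 0.2 and 0.8, so
# the closed convex hull of the chance functions is the interval [0.2, 0.8].
chances = [0.2, 0.8]
c = 0.9         # a probabilistic credence violating the Principal Principle
c_star = 0.8    # the closest point in the hull to c
S = c - c_star  # the stake: the bet pays S if X is true, 0 otherwise

# By the lights of a probability p, the expected payout of the bet is S*p.
# Offer the bet at a price strictly between every chance expectation and
# the agent's expectation S*c.
epsilon = min(S * c - S * p for p in chances)  # here 0.01
price = S * c - epsilon / 2                    # here 0.085

print("agent's expected net gain:", S * c - price)            # positive
for p in chances:
    print("chance", p, "expected net gain:", S * p - price)   # negative
```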


Monday, 1 January 2018

A Dutch Book argument for linear pooling

Often, we wish to aggregate the probabilistic opinions of different agents. They might be experts on the effects of housing policy on people sleeping rough, for instance, and we might wish to produce from their different probabilistic opinions an aggregate opinion that we can use to guide policymaking. Methods for undertaking such aggregation are called pooling operators. They take as their input a sequence of probability functions $c_1, \ldots, c_n$, all defined on the same set of propositions, $\mathcal{F}$. And they give as their output a single probability function $c$, also defined on $\mathcal{F}$, which is the aggregate of $c_1, \ldots, c_n$. (If the experts have non-probabilistic credences or if they have credences defined on different sets of propositions or events, problems arise -- I've written about these here and here.) Perhaps the simplest are the linear pooling operators. Given a set of non-negative weights $\alpha_1, \ldots, \alpha_n$ that sum to 1, one for each probability function to be aggregated, the linear pool of $c_1, \ldots, c_n$ with these weights is: $c = \alpha_1 c_1 + \ldots + \alpha_n c_n$. So the probability that the aggregate assigns to a proposition (or event) is the weighted average of the probabilities that the individuals assign to that proposition (event) with the weights $\alpha_1, \ldots, \alpha_n$.

Linear pooling has had a hard time recently. Elkin and Wheeler reminded us that linear pooling almost never preserves unanimous judgments of independence; Russell, et al. reminded us that it almost never commutes with Bayesian conditionalization; and Bradley showed that aggregating a group of experts using linear pooling almost never gives the same result as you would obtain from updating your own probabilities in the usual Bayesian way when you learn the probabilities of those experts. I've tried to defend linear pooling against the first two attacks here. In that paper, I also offer a positive argument in favour of that aggregation method: I argue that, if your aggregate is not a result of linear pooling, there will be an alternative aggregate that each expert expects to be more accurate than yours; if your aggregate is a result of linear pooling, this can't happen. Thus, my argument is a non-pragmatic, accuracy-based argument, in the same vein as Jim Joyce's non-pragmatic vindication of probabilism. In this post, I offer an alternative, pragmatic, Dutch book-style defence, in the same vein as the standard Ramsey-de Finetti argument for probabilism.

My argument is based on the following fact: if your aggregate probability function is not a result of linear pooling, there will be a series of bets that the aggregate will consider fair but which each expert will expect to lose money (or utility); if your aggregate is a result of linear pooling, this can't happen. Since one of the things we might wish to use an aggregate to do is to help us make communal decisions, a putative aggregate cannot be considered acceptable if it will lead us to make a binary choice one way when every expert agrees that it should be made the other way. Thus, we should aggregate credences using a linear pooling operator.

We now prove the mathematical fact behind the argument, namely, that if $c$ is not a linear pool of $c_1, \ldots, c_n$, then there is a bet that $c$ will consider fair, and yet each $c_i$ will expect it to lose money; the converse is straightforward.

Suppose $\mathcal{F} = \{X_1, \ldots, X_m\}$. Then:
  • We can represent a probability function $c$ on $\mathcal{F}$ as a vector in $\mathbb{R}^m$, namely, $c = \langle c(X_1), \ldots, c(X_m)\rangle$.
  • We can also represent a book of bets on the propositions in $\mathcal{F}$ by a vector in $\mathbb{R}^m$, namely, $S = \langle S_1, \ldots, S_m\rangle$, where $S_i$ is the stake of the bet on $X_i$, so that the bet on $X_i$ pays out $S_i$ dollars (or utiles) if $X_i$ is true and $0$ dollars (or utiles) if $X_i$ is false.
  • An agent with probability function $c$ will be prepared to pay $c(X_i)S_i$ for a bet on $X_i$ with stake $S_i$, and thus will be prepared to pay $S \cdot c = c(X_1)S_1 + \ldots + c(X_m)S_m$ dollars (or utiles) for the book of bets with stakes $S = \langle S_1, \ldots, S_m\rangle$. (As is usual in Dutch book-style arguments, we assume that the agent is risk neutral.)
  • This is because $S \cdot c$ is the expected pay out of the book of bets with stakes $S$ by the lights of probability function $c$.
Now, suppose $c$ is not a linear pool of $c_1, \ldots, c_n$. So $c$ lies outside the convex hull of $\{c_1, \ldots, c_n\}$. Let $c^*$ be the closest point to $c$ inside that convex hull. And let $S = c - c^*$. Then the angle $\theta$ between $S$ and $c_i - c$ is obtuse and thus $\mathrm{cos}\, \theta < 0$ (see diagram below). So, since $S \cdot (c_i - c) = ||S||\, ||c_i - c|| \mathrm{cos}\, \theta$ and $||S||, ||c_i - c|| > 0$ (since $c \neq c^*$ and $c_i \neq c$), we have $S \cdot (c_i - c) < 0$. And hence $S \cdot c_i < S \cdot c$. But recall:
  • $S \cdot c$ is the amount that the aggregate $c$ is prepared to pay for the book of bets with stakes $S$; and 
  • $S \cdot c_i$ is the expert $i$'s expected pay out of the book of bets with stakes $S$.
Thus, each expert will expect that book of bets to pay out less than $c$ will be willing to pay for it.
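Here is a small computational sketch of this construction for two experts over a three-cell partition; the credence functions are made up, and the closest point $c^*$ is found by projecting $c$ onto the segment between $c_1$ and $c_2$:

```python
import numpy as np

# Two expert credence functions over a three-cell partition, and a putative
# aggregate c that is not a weighted average of them (all numbers made up).
c1 = np.array([0.1, 0.3, 0.6])
c2 = np.array([0.5, 0.3, 0.2])
c  = np.array([0.2, 0.6, 0.2])

# Closest point c* to c in the convex hull of {c1, c2}: project c onto the
# segment from c2 to c1, clipping the coefficient to [0, 1].
d = c1 - c2
lam = np.clip(np.dot(c - c2, d) / np.dot(d, d), 0.0, 1.0)
c_star = c2 + lam * d

S = c - c_star  # the stakes of the book of bets

# c is prepared to pay S.c for the book; expert i expects it to pay S.ci.
print("price the aggregate will pay:", np.dot(S, c))
print("experts' expected payouts:  ", np.dot(S, c1), np.dot(S, c2))
# Both expert expectations fall below the price the aggregate will pay.
```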



Tuesday, 10 October 2017

Two Paradoxes of Belief (by Roy T Cook)

This was posted originally at the OUPBlog. This is the first in a series of cross-posted blogs by Roy T Cook (Minnesota) from the OUPBlog series on Paradox and Puzzles.

The Liar paradox arises via considering the Liar sentence:

L: L is not true.

and then reasoning in accordance with the:

T-schema:

“Φ is true if and only if what Φ says is the case.”

Along similar lines, we obtain the Montague paradox (or the “paradox of the knower”) by considering the following sentence:

M: M is not knowable.

and then reasoning in accordance with the following two claims:

Factivity:

“If Φ is knowable then what Φ says is the case.”

Necessitation:

“If Φ is a theorem (i.e. is provable), then Φ is knowable.”

Put in very informal terms, these results show that our intuitive accounts of truth and of knowledge are inconsistent. Much work in logic has been carried out in attempting to formulate weaker accounts of truth and of knowledge that (i) are strong enough to allow these notions to do substantial work, and (ii) are not susceptible to these paradoxes (and related paradoxes, such as Curry and Yablo versions of both of the above). A bit less well known is that certain strong but not altogether implausible accounts of idealized belief also lead to paradox.

The puzzles involve an idealized notion of belief (perhaps better paraphrased as “rational commitment” or “justifiable belief”), where one believes something in this sense if and only if (i) one explicitly believes it, or (ii) one is somehow committed to the claim even if one doesn’t actively believe it. Hence, on this understanding belief is closed under logical consequence – one believes all of the logical consequences of one’s beliefs. In particular, the following holds:

B-Closure:

“If you believe that, if Φ then Ψ, and you believe Φ, then you believe Ψ.”

Now, for such an idealized account of belief, the rule of B-Necessitation:

B-Necessitation:

“If Φ is a theorem (i.e. is provable), then Φ is believed.”

is extremely plausible – after all, presumably anything that can be proved is something that follows from things we believe (since it follows from nothing more than our axioms for belief). In addition, we will assume that our beliefs are consistent:

B-Consistency:

“If I believe Φ, then I do not believe that Φ is not the case.”

So far, so good. But neither the belief analogue of the T-schema:

B-schema:

“Φ is believed if and only if what Φ says is the case.”

nor the belief analogue of Factivity:

B-Factivity:

“If you believe Φ then what Φ says is the case.”

is at all plausible. After all, just because we believe something (or even that the claim in question follows from what we believe, in some sense) doesn’t mean the belief has to be true!

There are other, weaker principles about belief, however, that are not intuitively implausible, but which, when combined with B-Closure, B-Necessitation, and B-Consistency, lead to paradox. We will look at two principles – each of which captures a sense in which we cannot be wrong about what we think we don’t believe.

The first such principle we will call the First Transparency Principle for Disbelief:

TPDB1:

“If you believe that you don’t believe Φ then you don’t believe Φ.”

In other words, although many of our beliefs can be wrong, according to TPDB1 our beliefs about what we do not believe cannot be wrong. The second principle, which is a mirror image of the first, we will call the Second Transparency Principle for Disbelief:

TPDB2:

“If you don’t believe Φ then you believe that you don’t believe Φ.”

In other words, according to TPDB2 we are aware of (i.e. have true beliefs about) all of the facts regarding what we don’t believe.

Either of these principles, combined with B-Closure, B-Necessitation, and B-Consistency, leads to paradox. I will present the argument for TPDB1. The argument for TPDB2 is similar, and left to the reader (although I will give an important hint below).

Consider the sentence:

S: It is not the case that I believe S.

Now, by inspection we can understand this sentence, and thus conclude that:

(1) What S says is the case if and only if I do not believe S.

Further, (1) is something we can, via inspecting the original sentence, informally prove. (Or, if we were being more formal, and doing all of this in arithmetic enriched with a predicate “B(x)” for idealized belief, a formal version of the above would be a theorem due to Gödel’s diagonalization lemma.) So we can apply B-Necessitation to (1), obtaining:

(2) I believe that: what S says is the case if and only if I do not believe S.

Applying a version of B-Closure, this entails:

(3) I believe S if and only if I believe that I do not believe S.

Now, assume (for reductio ad absurdum) that:

(4) I believe S.

Then combining (3) and (4) and some basic logic, we obtain:

(5) I believe that I do not believe S.

Applying TPDB1 to (5), we get:

(6) I do not believe S.

But this contradicts (4). So lines (4) through (6) amount to a refutation of line (4), and hence a proof that:

(7) I do not believe S.

Now, (7) is clearly a theorem (we just proved it), so we can apply B-Necessitation, arriving at:

(8) I believe that I do not believe S.

Combining (8) and (3) leads us to:

(9) I believe S.

But this obviously contradicts (7), and we have our final contradiction.

Note that this argument does not actually use B-Consistency (hint for the second argument involving TPDB2: you will need B-Consistency!).

These paradoxes seem to show that, as a matter of logic, we cannot have perfectly reliable beliefs about what we don’t believe – in other words, in this idealized sense of belief, there are always things that we believe that we don’t believe, but in actuality we do believe (the failure of TPDB1), and things that we don’t believe, but don’t believe that we don’t believe (the failure of TPDB2). At least, the puzzles show this if we take them to force us to reject both TPDB1 and TPDB2 in the same way that many feel that the Liar paradox forces us to abandon the full T-Schema.

Once we’ve considered transparency principles for disbelief, it’s natural to consider corresponding principles for belief. There are two. The first is the First Transparency Principle for Belief:

TPB1:

“If you believe that you believe Φ then you believe Φ.”

In other words, according to TPB1 our beliefs about what we believe cannot be wrong. The second principle, which is again a mirror image of the first, is the Second Transparency Principle for Belief:

TPB2:

“If you believe Φ then you believe that you believe Φ.”

In other words, according to TPB2 we are aware of all of the facts regarding what we believe.

Are either of these two principles, combined with B-Closure, B-Necessitation, and B-Consistency, paradoxical? If not, are there additional, plausible principles that would lead to paradoxes if added to these claims? I’ll leave it to the reader to explore these questions further.

A historical note: Like so many other cool puzzles and paradoxes, versions of some of these puzzles first appeared in the work of medieval logician Jean Buridan.

Sunday, 10 September 2017

Aggregating abstaining experts

In a series of posts a few months ago (here, here, and here), I explored a particular method by which we might aggregate expert credences when those credences are incoherent. The result was this paper, which is now forthcoming in Synthese. The method in question was called the coherent approximation principle (CAP), and it was introduced by Daniel Osherson and Moshe Vardi in this 2006 paper. CAP is based on what we might call the principle of minimal mutilation. We begin with a collection of credence functions, $c_1$, ..., $c_n$, one for each expert, some of which might be incoherent. What we want at the end is a single coherent credence function $c$ that is the aggregate of $c_1$, ..., $c_n$. The principle of minimal mutilation says that $c$ should be as close as possible to the $c_i$s -- when aggregating a collection of credence functions, you should change them as little as possible to obtain your aggregate.

We can spell this out more precisely by introducing a divergence $\mathfrak{D}$. We might think of this as a measure of how far one credence function lies from another. Thus, $\mathfrak{D}(c, c')$ measures the distance from $c$ to $c'$. We call these measures divergences rather than distances or metrics, since they do not have the usual features that mathematicians assume of a metric: we assume $\mathfrak{D}(c, c') \geq 0$, for any $c, c'$, and $\mathfrak{D}(c, c') = 0$ iff $c = c'$, but we do not assume that $\mathfrak{D}$ is symmetric nor that it satisfies the triangle inequality. In particular, we assume that $\mathfrak{D}$ is an additive Bregman divergence. The standard example of an additive Bregman divergence is squared Euclidean distance: if $c$, $c'$ are both defined on the set of propositions $F$, then
$$
\mathrm{SED}(c, c') = \sum_{X \in F} |c(X) - c'(X)|^2
$$In fact, $\mathrm{SED}$ is symmetric, but it does not satisfy the triangle inequality. The details of this family of divergences needn't detain us here (but see here and here for more). Indeed, we will simply use $\mathrm{SED}$ throughout. But a more general treatment would look at other additive Bregman divergences, and I hope to do this soon.

Now, suppose $c_1$, ..., $c_n$ is a set of expert credence functions. And suppose $c_i$ is defined on the set of propositions $F_i$. And suppose that $\mathfrak{D}$ is an additive Bregman divergence -- you might take it to be $\mathrm{SED}$. Then how do we define the aggregate $c$ that is obtained from $c_1$, ..., $c_n$ by a minimal mutilation? We let $c$ be the coherent credence function such that the sum of the distances from $c$ to the $c_i$s is minimal. That is,
$$
\mathrm{CAP}_{\mathfrak{D}}(c_1, \ldots, c_n) = \mathrm{arg\ min}_{c \in P_F} \sum^n_{i=1} \mathfrak{D}(c, c_i)
$$
where $P_F$ is the set of coherent credence functions over $F = \bigcup^n_{i=1} F_i$, and the divergence from $c$ to $c_i$ is calculated over the propositions in $F_i$ on which $c_i$ is defined.

As we see in my paper linked above, if each of the credence functions are defined over the same set of propositions -- that is, if $F_i = F_j$, for all $1 \leq i, j, \leq n$ -- then:
  • if $\mathfrak{D}$ is squared Euclidean distance, then this aggregate is the straight linear pool of the original credences; if $c$ is defined on the partition $X_1$, ..., $X_m$, then the straight linear pool of $c_1$, ..., $c_n$ is this:$$c(X_j) = \frac{1}{n}c_1(X_j) + ... + \frac{1}{n}c_n(X_j)$$
  • if $\mathfrak{D}$ is the generalized Kullback-Leibler divergence, then the aggregate is the straight geometric pool of the originals; if $c$ is defined on the partition $X_1$, ..., $X_m$, then the straight geometric pool of $c_1$, ..., $c_n$ is this: $$c(X_j) = \frac{1}{K}(c_1(X_j)^{\frac{1}{n}} \times ... \times c_n(X_j)^{\frac{1}{n}})$$where $K$ is a normalizing factor.
(For more on these types of aggregation, see here and here).
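The first of these facts is easy to check numerically; here is a sketch (using made-up credence functions and scipy's constrained optimiser) confirming that minimising total SED over the coherent credence functions recovers the straight linear pool:

```python
import numpy as np
from scipy.optimize import minimize

# Three made-up coherent credence functions over a three-cell partition.
experts = np.array([[0.2, 0.5, 0.3],
                    [0.1, 0.7, 0.2],
                    [0.6, 0.1, 0.3]])

def total_sed(c):
    # Sum of squared Euclidean distances from the candidate aggregate c
    # to each expert's credence function.
    return sum(np.sum((c - ci) ** 2) for ci in experts)

# Minimise over the probability simplex (non-negative, summing to 1).
res = minimize(total_sed, x0=np.full(3, 1 / 3), method="SLSQP",
               bounds=[(0, 1)] * 3,
               constraints=[{"type": "eq", "fun": lambda c: c.sum() - 1}])

print(res.x)            # the CAP_SED aggregate ...
print(experts.mean(0))  # ... matches the straight linear pool
```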

In this post, I'm interested in cases where our agents have credences in different sets of propositions. For instance, the first agent has credences concerning the rainfall in Bristol tomorrow and the rainfall in Bath, but the second has credences concerning the rainfall in Bristol and the rainfall in Birmingham.

I want to begin by pointing to a shortcoming of CAP when it is applied to such cases. It fails to satisfy what we might think of as a basic desideratum of such procedures. To illustrate this desideratum, let's suppose that the three propositions $X_1$, $X_2$, and $X_3$ form a partition. And suppose that Amira has credences in $X_1$, $X_2$, and $X_3$, while Benito has credences only in $X_1$ and $X_2$. In particular:
  • Amira's credence function is: $c_A(X_1) = 0.3$, $c_A(X_2) = 0.6$, $c_A(X_3) = 0.1$.
  • Benito's credence function is: $c_B(X_1) = 0.2$, $c_B(X_2) = 0.6$.
Now, notice that, while Amira's credence function is defined on the whole partition, Benito's is not. But, nonetheless, Benito's credences uniquely determine a coherent credence function on the whole partition:
  • Benito's extended credence function is: $c^*_B(X_1) = 0.2$, $c^*_B(X_2) = 0.6$, $c^*_B(X_3) = 0.2$.
Thus, we might expect our aggregation procedure to give the same result whether we aggregate Amira's credence function with Benito's or with Benito's extended credence function. That is, we might expect the same result whether we aggregate $c_A$ with $c_B$ or with $c^*_B$. After all, $c^*_B$ is in some sense implicit in $c_B$. An agent with credence function $c_B$ is committed to the credences assigned by credence function $c^*_B$.

However, CAP does not do this. As mentioned above, if you aggregate $c_A$ and $c^*_B$ using $\mathrm{SED}$, then the result is their linear pool: $\frac{1}{2}c_A + \frac{1}{2}c^*_B$. Thus, the aggregate credence in $X_1$ is $0.25$; in $X_2$ it is $0.6$; and in $X_3$ it is $0.15$. The result is different if you aggregate $c_A$ and $c_B$ using $SED$: the aggregate credence in $X_1$ is $0.2625$; in $X_2$ it is $0.6125$; in $X_3$ it is $0.125$.

Now, it is natural to think that the problem arises here because Amira's credences are getting too much say in how far a potential aggregate lies from the agents, since she has credences in three propositions, while Benito only has credences in two. And, sure enough, $\mathrm{CAP}_{\mathrm{SED}}(c_A, c_B)$ lies closer to $c_A$ than to $c_B$ and closer to $c_A$ than the aggregate of $c_A$ and $c^*_B$ lies. And it is equally natural to try to solve this potential bias in favour of the agent with more credences by normalising. That is, we might define a new version of CAP:
$$
\mathrm{CAP}^+_{\mathfrak{D}}(c_1, \ldots, c_n) = \mathrm{arg\ min}_{c \in P_F} \sum^n_{i=1} \frac{1}{|F_i|}\mathfrak{D}(c, c_i)
$$
However, this doesn't help. Using this definition, the aggregate of Amira's credence function $c_A$ and Benito's extended credence function $c^*_B$ remains the same; but the aggregate of Amira's credence function and Benito's original credence function changes -- the aggregate credence in $X_1$ is $0.25333$; in $X_2$, it is $0.61333$; in $X_3$, it is $0.1333$. Again, the two ways of aggregating disagree.

So here is our desideratum in general:

Agreement with Coherent Commitments (ACC) Suppose $c_1$, ..., $c_n$ are coherent credence functions, with $c_i$ defined on $F_i$, for each $1 \leq i \leq n$. And let $F = \bigcup^n_{i=1} F_i$. Now suppose that, for each $c_i$ defined on $F_i$, there is a unique coherent credence function $c^*_i$ defined on $F$ that extends $c_i$ -- that is, $c_i(X) = c^*_i(X)$ for all $X$ in $F_i$. Then the aggregate of $c_1$, ..., $c_n$ should be the same as the aggregate of $c^*_1$, ..., $c^*_n$.

CAP does not satisfy ACC. Is there a natural aggregation rule that does? Here's a suggestion. Suppose you wish to aggregate a set of credence functions $c_1$, ..., $c_n$, where $c_i$ is defined on $F_i$, as above. Then we proceed as follows.
  1. First, let $F = \bigcup^n_{i=1} F_i$.
  2. Second, for each $1 \leq i \leq n$, let $$c^*_i = \{c : \mbox{$c$ is coherent & $c$ is defined on $F$ & $c(X) = c_i(X)$ for all $X$ in $F_i$}\}$$ That is, while $c_i$ represents a precise credal state defined on $F_i$, $c^*_i$ represents an imprecise credal state defined on $F$. It is the set of coherent credence functions on $F$ that extend $c_i$. That is, it is the set of coherent credence functions on $F$ that agree with $c_i$ on propositions in $F_i$. Thus, if, like Benito, your coherent credences on $F_i$ uniquely determine your coherent credences on $F$, then $c^*_i$ is just the singleton that contains that unique extension. But if your credences over $F_i$ do not uniquely determine your coherent credences over $F$, then $c^*_i$ will contain more coherent credence functions.
  3. Finally, we take the aggregate of $c_1$, ..., $c_n$ to be the credence function $c$ that minimizes the total distance from $c$ to the $c^*_i$s. The problem is that there isn't a single natural definition of the distance from a point to a set of points, even when you have a definition of the distance between individual points. I adopt a very particular measure of such distances here; but it would be interesting to explore the alternative options in greater detail elsewhere. Suppose $c$ is a credence function and $C$ is a set of credence functions. Then $$\mathfrak{D}(c, C) = \frac{\mathrm{min}_{c' \in C}\mathfrak{D}(c, c') + \mathrm{max}_{c' \in C}\mathfrak{D}(c, c')}{2}$$ With this in hand, we can finally give our aggregation procedure:$$\mathrm{CAP}^*_{\mathfrak{D}}(c_1, \ldots, c_n) = \mathrm{arg\ min}_{c \in P_F} \sum^n_{i=1} \mathfrak{D}(c, c^*_i)$$ 
The first thing to note about CAP$^*$ is that, unlike the original CAP, or CAP$^+$, it automatically satisfies ACC.

Let's now see CAP$^*$ in action.
  • Since CAP$^*$ satisfies ACC, the aggregate for $c_A$ and $c_B$ is the same as the aggregate for $c_A$ and $c^*_B$, which is just their straight linear pool.
  • Next, suppose we wish to aggregate Amira with a third agent, Cleo, who has a credence only in $X_1$, which she assigns $0.5$ -- that is, $c_C(X_1) = 0.5$. Then $F = \{X_1, X_2, X_3\}$, and  $$c^*_C = \{c : c(X_1) = 0.5, 0 \leq c(X_2) \leq 0.5, c(X_3) = 1 - c(X_1) - c(X_2)\}$$ So, $$\mathrm{CAP}^*_{\mathfrak{D}}(c_A, c_C) = \mathrm{arg\ min}_{c \in P_F} \mathfrak{D}(c, c_A) + \mathfrak{D}(c, c^*_C)$$Working through the calculation for $\mathfrak{D} = \mathrm{SED}$, we obtain the following aggregate: $c(X_1) = 0.4$, $c(X_2) = 0.425$, $c(X_3) = 0.175$.
  • One interesting feature of CAP$^*$ is that, unlike CAP, we can apply it to individual agents. Thus, for instance, suppose we wish to take Cleo's single credence in $X_1$ and 'fill in' her credences in $X_2$ and $X_3$. Then we can use CAP$^*$ to do this. Her new credence function will be $$c'_C = \mathrm{CAP}^*_{\mathrm{SED}}(c_C) = \mathrm{arg\ min}_{c \in P_F} \mathfrak{D}(c, c^*_C)$$ That is, $c'_C(X_1) = 0.5$, $c'_C(X_2) = 0.25$, $c'_C(X_3) = 0.25$. Rather unsurprisingly, $c'_C$ is the midpoint of the line segment formed by the imprecise credal state $c^*_C$ (see the numerical sketch below). Now, notice: the aggregate of Amira and Cleo given above is just the straight linear pool of Amira's credence function $c_A$ and Cleo's 'filled in' credence function $c'_C$. I would conjecture that this is generally true: filling in credences using CAP$^*_{\mathrm{SED}}$ and then aggregating using straight linear pooling always agrees with aggregating using CAP$^*_{\mathrm{SED}}$. And perhaps this generalises beyond SED.
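As a sanity check on the 'filling in' step, here is a brute-force sketch (a grid search with SED; the grid resolutions are chosen arbitrarily) that recovers Cleo's filled-in credence function from her imprecise credal state:

```python
import numpy as np

# Cleo's imprecise credal state c*_C: the coherent extensions of c_C(X1) = 0.5
# to the full partition, i.e. (0.5, t, 0.5 - t) for t in [0, 0.5].
ts = np.linspace(0.0, 0.5, 501)
cleo_set = np.array([[0.5, t, 0.5 - t] for t in ts])

def dist_to_set(c, C):
    ds = ((C - c) ** 2).sum(axis=1)    # SED from c to each point in the set
    return (ds.min() + ds.max()) / 2   # the point-to-set measure in the text

# Brute-force search over a grid on the probability simplex for the
# credence function closest to c*_C.
best, best_val = None, float("inf")
for x in np.linspace(0, 1, 201):
    for y in np.linspace(0, 1 - x, 201):
        c = np.array([x, y, 1 - x - y])
        val = dist_to_set(c, cleo_set)
        if val < best_val:
            best, best_val = c, val

print(best)  # approximately (0.5, 0.25, 0.25): Cleo's filled-in credences
```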

Monday, 31 July 2017

Logic in the wild CFP (Ghent, 9-10 Nov 2017)

CALL FOR PAPERS
workshop on 
LOGIC IN THE WILD 
Ghent University, 9 & 10 November 2017. 


The scope of this workshop
Nowadays we are witnessing a ‘practical’, or cognitive, turn in logic. The approach draws on the enormous achievements of a legion of formal and mathematical logicians, but focuses on ‘the Wild’: actual human processes of reasoning and argumentation. Moreover, the high standards of inquiry that we owe to formal logicians offer a new quality in research on reasoning and argumentation. In terms of John Corcoran’s distinction between logic as formal ontology and logic as formal epistemology, the aim of the practical turn is to make formal epistemology even more epistemically oriented. This is not to say that this ‘practically turned’ (or cognitively oriented) logic becomes just a part of psychology. This is to say that this logic acquires a new task of “systematically keeping track of changing representations of information”, as Johan van Benthem puts it, and that it contests the claim that the distinction between descriptive and normative accounts of reasoning is disjoint and exhaustive. From a perspective different from the purely psychological one, logic becomes -- again -- interested in answering Dewey’s question about the Wild: how do we think? This is the new alluring face of psychologism, or cognitivism, in logic, as opposed to the old one, which Frege and Husserl fought against. This is the area of research to which our workshop is devoted.
For this workshop we invite submissions on:
- applications of logic to the analysis of actual human reasoning and argumentation processes.
- tools and methods suited for such applications.
- neural basis of logical reasoning.
- educational issues of cognitively-oriented logic.

Keynote speakers
Keith Stenning (University of Edinburgh)
Iris van Rooij (Radboud University Nijmegen)
Christian Strasser (Ruhr University Bochum)

How to submit an abstract
We welcome submissions on any topic that fits into the scope as described above. Send your abstract of 300 to 500 words to: lrr@ugent.be  before 10 September 2017.
Notification of acceptance: 22 September 2017.

Website
More information about the workshop (venue, registration, …) is available at
http://www.lrr.ugent.be/logic-in-the-wild/. The programme will be available there in October.

Background
This workshop is organized by the scientific research network Logical and Methodological Analysis of Scientific Reasoning Processes (LMASRP) which is sponsored by the Research Foundation Flanders (FWO).
All information about the network can be found at http://www.lmasrp.ugent.be/

An overview of the previous workshops of the network can be found at http://www.lrr.ugent.be/.

Sunday, 2 July 2017

Three Postdoctoral Fellowships at the MCMP (LMU Munich)

The Munich Center for Mathematical Philosophy (MCMP) seeks applications for three 3-year postdoctoral fellowships starting on October 1, 2017. (A later starting date is possible.) We are especially interested in candidates who work in the field of mathematical philosophy with a focus on philosophical logic (broadly construed, including philosophy and foundations of mathematics, semantics, formal philosophy of language, inductive logic and foundations of probability, and more).

Candidates who have not finished their PhD at the time of the application deadline have to provide evidence that they will have their PhD in hand at the time the fellowship starts. Applications (including a cover letter that addresses, amongst other things, one's academic background, research interests and the proposed starting date, a CV, a list of publications, a sample of written work of no more than 5000 words, and a description of a planned research project of about 1000 words) should be sent by email (in one PDF document) to office.leitgeb@lrz.uni-muenchen.de by August 15, 2017. Hard copy applications are not accepted. Additionally, two confidential letters of reference addressing the applicant's qualifications for academic research should be sent to the same email address from the referees directly.

The MCMP hosts a vibrant research community of faculty, postdoctoral fellows, doctoral fellows, master students, and visiting fellows. It organizes at least two weekly colloquia and a weekly internal work-in-progress seminar, as well as various other activities such as workshops, conferences, summer schools, and reading groups. The successful candidates will partake in the MCMP's academic activities and enjoy its administrative facilities and support. The official language at the MCMP is English and fluency in German is not mandatory.

We especially encourage female scholars to apply. The LMU in general, and the MCMP in particular, endeavor to raise the percentage of women among its academic personnel. Furthermore, given equal qualification, preference will be given to candidates with disabilities.

The fellowships are remunerated with 1.853 €/month (paid out without deductions for tax and social security). The MCMP is able to support fellows concerning expenses for professional traveling.

For further information, please contact Prof. Hannes Leitgeb (H.Leitgeb@lmu.de).


 

Three Doctoral Fellowships at the MCMP (LMU Munich)

The Munich Center for Mathematical Philosophy (MCMP) seeks applications for three 3-year doctoral fellowships starting on October 1, 2017. (A later starting date is possible.) We are especially interested in candidates who work in the field of mathematical philosophy with a focus on philosophical logic (broadly construed, including philosophy and foundations of mathematics, semantics, formal philosophy of language, inductive logic and foundations of probability, and more).

Candidates who have not finished their MA at the time of the application deadline have to provide evidence that they will have their MA in hand at the time the fellowship starts. Applications (including a cover letter that addresses, amongst other things, one's academic background, research interests and the proposed starting date, a CV, a list of publications (if applicable), a sample of written work of no more than 3000 words, and a description of the planned PhD-project of about 2000 words) should be sent by email (in one PDF document) to office.leitgeb@lrz.uni-muenchen.de by August 15, 2017. Hard copy applications are not accepted. Additionally, one confidential letter of reference addressing the applicant's qualifications for academic research should be sent to the same email address from the referee directly.

The MCMP hosts a vibrant research community of faculty, postdoctoral fellows, doctoral fellows, master students, and visiting fellows. It organizes at least two weekly colloquia and a weekly internal work-in-progress seminar, as well as various other activities such as workshops, conferences, summer schools, and reading groups. The successful candidates will partake in the MCMP's academic activities and enjoy its administrative facilities and support. The official language at the MCMP is English and fluency in German is not mandatory.

We especially encourage female scholars to apply. The LMU in general, and the MCMP in particular, endeavor to raise the percentage of women among its academic personnel. Furthermore, given equal qualification, preference will be given to candidates with disabilities.

The fellowships are remunerated with 1.468 €/month (paid out without deductions for tax and social security). The MCMP is able to support fellows concerning expenses for professional traveling.

For further information, please contact Prof. Hannes Leitgeb (H.Leitgeb@lmu.de).


Tuesday, 16 May 2017

The Wisdom of the Crowds: generalizing the Diversity Prediction Theorem

I've just been reading Aidan Lyon's fascinating paper, Collective Wisdom. In it, he mentions a result known as the Diversity Prediction Theorem, which is sometimes taken to explain why crowds are wiser, on average, than the individuals who compose them. The theorem was originally proved by Anders Krogh and Jesper Vedelsby, but it has entered the literature on social epistemology through the work of Scott E. Page. In this post, I'll generalize this result.

The Diversity Prediction Theorem concerns a situation in which a number of different individuals estimate a particular quantity -- in the original example, it is the weight of an ox at a local fair. Take the crowd's estimate of the quantity to be the average of the individual estimates. Then the theorem shows that the distance from the crowd's estimate to the true value is at most the average distance from the individual estimates to the true value; and, moreover, the difference between the two is always given by the average distance from the individual estimates to the crowd's estimate (which you might think of as the variance of the individual estimates).

Let's make this precise. Suppose you have a group of $n$ individuals. They each provide an estimate for a real-valued quantity. The $i^\mathrm{th}$ individual gives the prediction $q_i$. The true value of this quantity is $\tau$. And we measure the distance from one estimate of a quantity to another, or to the true value of that quantity, using squared error. Then:
  • The crowd's prediction of the quantity is $c = \frac{1}{n}\sum^n_{i=1} q_i$.
  • The crowd's distance from the true quantity is $\mathrm{SqE}(c) = (c-\tau)^2$.
  • The $i$th individual's distance from the true quantity is $\mathrm{SqE}(q_i) = (q_i-\tau)^2$
  • The average individual distance from the true quantity is $\frac{1}{n} \sum^n_{i=1} \mathrm{SqE}(q_i) = \frac{1}{n} \sum^n_{i=1} (q_i - \tau)^2$.
  • The average individual distance from the crowd's estimate is $v = \frac{1}{n}\sum^n_{i=1} (q_i - c)^2$.
Given this, we have:

Diversity Prediction Theorem $$\mathrm{SqE}(c) = \frac{1}{n} \sum^n_{i=1} \mathrm{SqE}(q_i) - v$$
The theorem is easy enough to prove. You essentially just follow the algebra. However, following through the proof, you might be forgiven for thinking that the result says more about some quirk of squared error as a measure of distance than about the wisdom of crowds. And of course squared error is just one way of measuring the distance from an estimate of a quantity to the true value of that quantity, or from one estimate of a quantity to another. There are other such distance measures. So the question arises: Does the Diversity Prediction Theorem hold if we replace squared error with one of these alternative measures of distance? In particular, it is natural to take any of the so-called Bregman divergences $\mathfrak{d}$ to be a legitimate measure of distance from one estimate to another. I won't say much about Bregman divergences here, except to give their formal definition. To learn about their properties, have a look here and here. They were introduced by Bregman as a natural generalization of squared error.

Definition (Bregman divergence) A function $\mathfrak{d} : [0, \infty) \times [0, \infty) \rightarrow [0, \infty]$ is a Bregman divergence if there is a continuously differentiable, strictly convex function $\varphi : [0, \infty) \rightarrow [0, \infty)$ such that $$\mathfrak{d}(x, y) = \varphi(x) - \varphi(y) - \varphi'(y)(x-y)$$
Squared error is itself one of the Bregman divergences. It is the one generated by $\varphi(x) = x^2$. But there are many others, each generated by a different function $\varphi$.

Now, suppose we measure distance between estimates using a Bregman divergence $\mathfrak{d}$. Then:
  • The crowd's prediction of the quantity is $c = \frac{1}{n}\sum^n_{i=1} q_i$.
  • The crowd's distance from the true quantity is $\mathrm{E}(c) = \mathfrak{d}(c, \tau)$.
  • The $i$th individual's distance from the true quantity is $\mathrm{E}(q_i) = \mathfrak{d}(q_i, \tau)$
  • The average individual distance from the true quantity is $\frac{1}{n} \sum^n_{i=1} \mathrm{E}(q_i) = \frac{1}{n} \sum^n_{i=1} \mathfrak{d}(q_i, \tau)$.
  • The average individual distance from the crowd's estimate is $v = \frac{1}{n}\sum^n_{i=1} \mathfrak{d}(q_i, c)$.
 Given this, we have:

Generalized Diversity Prediction Theorem $$\mathrm{E}(c) = \frac{1}{n} \sum^n_{i=1} \mathrm{E}(q_i) - v$$
Proof.
\begin{eqnarray*}
& & \frac{1}{n} \sum^n_{i=1} \mathrm{E}(q_i) - v \\
& = & \frac{1}{n} \sum^n_{i=1} [ \mathfrak{d}(q_i, \tau) - \mathfrak{d}(q_i, c)] \\
& = & \frac{1}{n} \sum^n_{i=1} [\varphi(q_i) - \varphi(\tau) - \varphi'(\tau)(q_i - \tau)] - [\varphi(q_i) - \varphi(c) - \varphi'(c)(q_i - c)] \\
& = & \frac{1}{n} \sum^n_{i=1} [\varphi(q_i)- \varphi(\tau) - \varphi'(\tau)(q_i - \tau) - \varphi(q_i)+ \varphi(c) + \varphi'(c)(q_i - c)] \\
& = & - \varphi(\tau) - \varphi'(\tau)((\frac{1}{n} \sum^n_{i=1} q_i) - \tau) + \varphi(c) + \varphi'(c)((\frac{1}{n} \sum^n_{i=1} q_i) - c) \\
& = & - \varphi(\tau) - \varphi'(\tau)(c - \tau) + \varphi(c) + \varphi'(c)(c - c) \\
& = & \varphi(c) - \varphi(\tau) - \varphi'(\tau)(c - \tau) \\
& = &   \mathfrak{d}(c, \tau) \\
& = & \mathrm{E}(c)
\end{eqnarray*}
as required.
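Here is a quick numerical check of the Generalized Diversity Prediction Theorem, using the Bregman divergence generated by $\varphi(x) = x \log x$ (the generalized Kullback-Leibler divergence); the individual estimates and the true value are made up:

```python
import numpy as np

# Bregman divergence generated by phi(x) = x*log(x), which is strictly
# convex on (0, infinity): the generalized Kullback-Leibler divergence.
phi  = lambda x: x * np.log(x)
dphi = lambda x: np.log(x) + 1  # the derivative of phi

def breg(x, y):
    return phi(x) - phi(y) - dphi(y) * (x - y)

q = np.array([3.2, 4.1, 2.7, 5.0])  # made-up individual estimates
tau = 3.5                           # a made-up true value

c = q.mean()                                     # the crowd's estimate
avg_error = np.mean([breg(qi, tau) for qi in q])
diversity = np.mean([breg(qi, c) for qi in q])

print(breg(c, tau), avg_error - diversity)  # the two values coincide
```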

Thursday, 11 May 2017

Reasoning Club Conference 2017


The Fifth Reasoning Club Conference will take place at the Center for Logic, Language, and Cognition in Turin on May 18-19, 2017.

The Reasoning Club is a network of institutes, centres, departments, and groups addressing research topics connected to reasoning, inference, and methodology broadly construed. It issues the monthly gazette The Reasoner. (Earlier editions of the meeting were held in Brussels, Pisa, Kent, and Manchester.)



PROGRAM


THURSDAY, MAY 18

Palazzo Badini
via Verdi 10, Torino
Sala Lauree di Psicologia (ground floor)


9:00 | welcome and coffee

9:30 | greetings
           presentation of the new editorship of The Reasoner
           (Hykel HOSNI, Milan)


Morning session – chair: Gustavo CEVOLANI (IMT Lucca)


10:00 | invited talk

Branden FITELSON (Northeastern University, Boston)

Two approaches to belief revision

In this paper, we compare and contrast two methods for the qualitative revision of (viz., full) beliefs. The first (Bayesian) method is generated by a simplistic diachronic Lockean thesis requiring coherence with the agent's posterior credences after conditionalization. The second (Logical) method is the orthodox AGM approach to belief revision. Our primary aim will be to characterize the ways in which these two approaches can disagree with each other — especially in the special case where the agent's belief set is deductively cogent.

(joint work with Ted Shear and Jonathan Weisberg)


11:00 | Ted SHEAR (Queensland) and John QUIGGIN (Queensland)
 
A modal logic for reasonable belief


11:45 | Nina POTH (Edinburgh) and Peter BRÖSSEL (Bochum)

Bayesian inferences and conceptual spaces: Solving the complex-first paradox


12:30 | lunch break


Afternoon session I – chair: Peter BRÖSSEL (Bochum)


13:30 | invited talk

Katya TENTORI (University of Trento)

Judging forecasting accuracy 
How human intuitions can help improving formal models

Most of the scoring rules that have been discussed and defended in the literature are not ordinally equivalent, with the consequence that, after the very same outcome has materialized, a forecast X can be evaluated as more accurate than Y according to one model but less accurate according to another. A question that naturally arises is therefore which of these models better captures people’s intuitive assessment of forecasting accuracy. To answer this question, we developed a new experimental paradigm for eliciting ordinal judgments of accuracy concerning pairs of forecasts for which various combinations of associations/dissociations between the Quadratic, Logarithmic, and Spherical scoring rules are obtained. We found that, overall, the Logarithmic model is the best predictor of people’s accuracy judgments, but also that there are cases in which these judgments — although they are normatively sound — systematically depart from what is expected by all the models. These results represent an empirical evaluation of the descriptive adequacy of the three most popular scoring rules and offer insights for the development of new formal models that might favour a more natural elicitation of truthful and informative beliefs from human forecasters.

(joint work with Vincenzo Crupi and Andrea Passerini)


14:15 | Catharine SAINT-CROIX (Michigan)

Immodesty and evaluative uncertainty


15:15 | Michael SCHIPPERS (Oldenburg), Jakob KOSCHOLKE (Hamburg)

Against relative overlap measures of coherence


16:00 | coffee break


Afternoon session II – chair: Paolo MAFFEZIOLI (Torino)


16:30 | Simon HEWITT (Leeds)

Frege's theorem in plural logic


17:15 | Lorenzo ROSSI (Salzburg) and Julien MURZI (Salzburg)

Generalized Revenge


 
FRIDAY, MAY 19

Campus Luigi Einaudi
Lungo Dora Siena 100/A
Sala Lauree Rossa
building D1 (ground floor)


9:00 | welcome and coffee


Morning session – chair: Jan SPRENGER (Tilburg)


9:30 | invited talk

Paul EGRÉ (Institut Jean Nicod, Paris)

Logical consequence and ordinary reasoning

The notion of logical consequence has been approached from a variety of angles. Tarski famously proposed a semantic characterization (in terms of truth-preservation), but also a structural characterization (in terms of axiomatic properties including reflexivity, transitivity, monotonicity, and other features). In recent work, E. Chemla, B. Spector and I have proposed a characterization of a wider class of consequence relations than Tarskian relations, which we call "respectable" (Journal of Logic and Computation, forthcoming). The class also includes non-reflexive and nontransitive relations, which can be motivated in relation to ordinary reasoning (such as reasoning with vague predicates, see Zardini 2008, Cobreros et al. 2012, or reasoning with presuppositions, see Strawson 1952, von Fintel 1998, Sharvit 2016). Chemla et al.'s characterization is partly structural, and partly semantic, however. In this talk I will present further advances toward a purely structural characterization of such respectable consequence relations. I will discuss the significance of this research program toward bringing logic closer to ordinary reasoning.

(joint work with Emmanuel Chemla and Benjamin Spector)


10:30 | Niels SKOVGAARD-OLSEN (Freiburg)

Conditionals and multiple norm conflicts


11:15 | Luis ROSA (Munich)

Knowledge grounded on pure reasoning


12:00 | lunch break


Afternoon session I – chair: Steven HALES (Bloomsburg)


13:30 | invited talk

Leah HENDERSON (University of Groningen)

The unity of explanatory virtues

Scientific theory choice is often characterised as an Inference to the Best Explanation (IBE) in which a number of distinct explanatory virtues are combined and traded off against one another. Furthermore, the epistemic significance of each explanatory virtue is often seen as highly case-specific. But are there really so many dimensions to theory choice? By considering how IBE may be situated in a Bayesian framework, I propose a more unified picture of the virtues in scientific theory choice.


14:30 | Benjamin EVA (Munich) and Reuben STERN (Munich)

Causal explanatory power


15:15 | coffee break


Afternoon session II – chair: Jakob KOSCHOLKE (Hamburg)


16:00 | Barbara OSIMANI (Munich)

Bias, random error, and the variety of evidence thesis


16:45 | Felipe ROMERO (Tilburg) and Jan SPRENGER (Tilburg)

Scientific self-correction: The Bayesian way



ORGANIZING COMMITTEE

Gustavo Cevolani (Torino)
Vincenzo Crupi (Torino)
Jason Konek (Kent)
Paolo Maffezioli (Torino)



For any queries please contact Vincenzo Crupi (vincenzo.crupi@unito.it) or Jason Konek (jpkonek@ksu.edu).


Saturday, 8 April 2017

Formal Truth Theories workshop, Warsaw (Sep. 28-30)

Cezary Cieslinski and his team are organizing a workshop on formal theories of truth in Warsaw, to take place 28-30 September 2017. Invited speakers include Dora Achourioti, Ali Enayat, Kentaro Fujimoto, Volker Halbach, Graham Leigh, and Albert Visser. The submission deadline is May 15. More details here.

Sunday, 19 March 2017

Aggregating incoherent credences: the case of geometric pooling

In the last few posts (here and here), I've been exploring how we should extend the probabilistic aggregation method of linear pooling so that it applies to groups that contain incoherent individuals (which is, let's be honest, just about all groups). And our answer has been this: there are three methods -- linear-pool-then-fix, fix-then-linear-pool, and fix-and-linear-pool-together -- and they agree with one another just in case you fix incoherent credences by taking the nearest coherent credences as measured by squared Euclidean distance. In this post, I ask how we should extend the probabilistic aggregation method of geometric pooling.

As before, I'll just consider the simplest case, where we have two individuals, Adila and Benoit, and they have credence functions -- $c_A$ and $c_B$, respectively -- that are defined for a proposition $X$ and its negation $\overline{X}$. Suppose $c_A$ and $c_B$ are coherent. Then geometric pooling says:

Geometric pooling The aggregation of $c_A$ and $c_B$ is $c$, where
  • $c(X) = \frac{c_A(X)^\alpha c_B(X)^{1-\alpha}}{c_A(X)^\alpha c_B(X)^{1-\alpha} + c_A(\overline{X})^\alpha c_B(\overline{X})^{1-\alpha}}$
  • $c(\overline{X}) = \frac{c_A(\overline{X})^\alpha c_B(\overline{X})^{1-\alpha}}{c_A(X)^\alpha c_B(X)^{1-\alpha} + c_A(\overline{X})^\alpha c_B(\overline{X})^{1-\alpha}}$
for some $0 \leq \alpha \leq 1$.

Now, in the case of linear pooling, if $c_A$ or $c_B$ is incoherent, then typically any linear pool of them is incoherent too. In the case of geometric pooling, things are different. Linear pooling takes a weighted arithmetic average of the credences being aggregated; if those credences are coherent, so is their weighted arithmetic average, and so there is no need to normalize after averaging to ensure coherence. By contrast, even when the credences being aggregated are coherent, their weighted geometric average typically is not: by the arithmetic-geometric mean inequality, the weighted geometric averages of two distinct coherent credence functions sum to less than 1. Thus, geometric pooling requires that we first take the weighted geometric average of the credences we are pooling and then normalize the result, to ensure that the result is coherent. But this normalization trick works whether or not the original credences are coherent. So we need do nothing more to geometric pooling in order to apply it to incoherent agents.
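Here is a small Python sketch of that point, using deliberately incoherent (and entirely made-up) credences; whatever the inputs, the normalization step guarantees that the pooled credences sum to 1:

```python
# Geometric pooling over {X, not-X}: weighted geometric average, then
# normalize. The inputs below are deliberately incoherent.
alpha = 0.3
cA = {'X': 0.8, 'notX': 0.5}   # sums to 1.3: incoherent
cB = {'X': 0.2, 'notX': 0.4}   # sums to 0.6: incoherent

raw = {p: cA[p]**alpha * cB[p]**(1 - alpha) for p in cA}
total = sum(raw.values())
pooled = {p: raw[p] / total for p in raw}

print(pooled, sum(pooled.values()))   # the pooled credences sum to 1
```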

Nonetheless, questions still arise. What we have shown is that, if we first geometrically pool our two incoherent agents, then the result is in fact coherent and so we don't need to undertake the further step of fixing up the credences to make them coherent. But what if we first choose to fix up our two incoherent agents so that they are coherent, and then geometrically pool them? Does this give the same answer as if we just pooled the incoherent agents? And, similarly, what if we decide to fix and pool together?

Interestingly, the results are exactly the reverse of the results in the case of linear pooling. In that case, if we fix up incoherent credences by taking the coherent credences that minimize squared Euclidean distance, then all three methods agree, whereas if we fix them up by taking the coherent credences that minimize generalized Kullback-Leibler divergence, then sometimes all three methods disagree. In the case of geometric pooling, it is the opposite. Fixing up using generalized KL divergence makes all three methods agree -- that is, pool, fix-then-pool, and fix-and-pool-together all give the same result when we use GKL to measure distance. But fixing up using squared Euclidean distance leads to three separate methods that sometimes all disagree. That is, GKL is the natural distance measure to accompany geometric pooling, while SED is the natural measure to accompany linear pooling.
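To illustrate both halves of this claim, here is a sketch that compares straight geometric pooling with fix-then-pool under each of the two fixing rules. The closed-form fixes it uses -- normalization for GKL, an equal shift for SED -- are the ones derived in the post of 10 March below; the credences are made up:

```python
# Pool vs fix-then-pool for geometric pooling, with the fix done
# either by GKL (normalize) or by SED (shift both credences equally).
alpha = 0.3
cA = {'X': 0.8, 'notX': 0.5}   # made-up incoherent credences
cB = {'X': 0.2, 'notX': 0.4}

def geo_pool(c1, c2, a):
    raw = {p: c1[p]**a * c2[p]**(1 - a) for p in c1}
    t = sum(raw.values())
    return {p: raw[p] / t for p in raw}

def fix_gkl(c):   # nearest coherent credences by GKL: normalize
    t = c['X'] + c['notX']
    return {p: c[p] / t for p in c}

def fix_sed(c):   # nearest coherent credences by SED: equal shift
    shift = (1 - c['X'] - c['notX']) / 2
    return {p: c[p] + shift for p in c}

print(geo_pool(cA, cB, alpha)['X'])                    # approx 0.4148
print(geo_pool(fix_gkl(cA), fix_gkl(cB), alpha)['X'])  # approx 0.4148: agrees
print(geo_pool(fix_sed(cA), fix_sed(cB), alpha)['X'])  # approx 0.4755: differs
```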

Friday, 17 March 2017

A little more on aggregating incoherent credences

Last week, I wrote about a problem that arises if you wish to aggregate the credal judgments of a group of agents when one or more of those agents has incoherent credences. I focussed on the case of two agents, Adila and Benoit, who have credence functions $c_A$ and $c_B$, respectively. $c_A$ and $c_B$ are defined over just two propositions, $X$ and its negation $\overline{X}$.

I noted that there are two natural ways to aggregate $c_A$ and $c_B$ for someone who adheres to Probabilism, the principle that says that credences should be coherent. You might first fix up Adila's and Benoit's credences so that they are coherent, and then aggregate them using linear pooling -- let's call that fix-then-pool. Or you might aggregate Adila's and Benoit's credences using linear pooling, and then fix up the pooled credences so that they are coherent -- let's call that pool-then-fix. And I noted that, for some natural ways of fixing up incoherent credences, fix-then-pool gives a different result from pool-then-fix. This, I claimed, creates a dilemma for the person doing the aggregating, since there seems to be no principled reason to favour either method.

How do we fix up incoherent credences? Well, a natural idea is to find the coherent credences that are closest to them and adopt those in their place. This obviously requires a measure of distance between two credence functions. In last week's post, I considered two:

Squared Euclidean Distance (SED) For two credence functions $c$, $c'$ defined on a set of propositions $X_1$, $\ldots$, $X_n$,$$SED(c, c') = \sum^n_{i=1} (c(X_i) - c'(X_i))^2$$

Generalized Kullback-Leibler Divergence (GKL) For two credence functions $c$, $c'$ defined on a set of propositions $X_1$, $\ldots$, $X_n$,$$GKL(c, c') = \sum^n_{i=1} c(X_i) \mathrm{log}\frac{c(X_i)}{c'(X_i)} - \sum^n_{i=1} c(X_i) + \sum^n_{i=1} c'(X_i)$$

If we use $SED$ when we are fixing incoherent credences -- that is, if we fix an incoherent credence function $c$ by adopting the coherent credence function $c^*$ for which $SED(c^*, c)$ is minimal -- then fix-then-pool gives the same results as pool-then-fix.

If we use GKL when we are fixing incoherent credences -- that is, if we fix an incoherent credence function $c$ by adopting the coherent credence function $c^*$ for which $GKL(c^*, c)$ is minimal -- then fix-then-pool gives different results from pool-then-fix.
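Here is a quick numerical sketch of both claims, again with made-up credences; the closed-form fixes (an equal shift for $SED$, normalization for $GKL$) are taken from the post of 10 March below:

```python
# Fix-then-pool vs pool-then-fix for linear pooling, with the fix done
# by SED (equal shift) or by GKL (normalize). Credences are made up.
alpha = 0.3
cA = {'X': 0.8, 'notX': 0.5}
cB = {'X': 0.2, 'notX': 0.4}

def lin_pool(c1, c2, a):
    return {p: a * c1[p] + (1 - a) * c2[p] for p in c1}

def fix_sed(c):
    shift = (1 - c['X'] - c['notX']) / 2
    return {p: c[p] + shift for p in c}

def fix_gkl(c):
    t = c['X'] + c['notX']
    return {p: c[p] / t for p in c}

for fix in (fix_sed, fix_gkl):
    fix_then_pool = lin_pool(fix(cA), fix(cB), alpha)
    pool_then_fix = fix(lin_pool(cA, cB, alpha))
    print(fix.__name__, fix_then_pool['X'], pool_then_fix['X'])
# fix_sed: the two agree (0.475 here); fix_gkl: they differ
# (approx 0.4179 vs 0.4691 here).
```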

Since last week's post, I've been reading this paper by Joel Predd, Daniel Osherson, Sanjeev Kulkarni, and Vincent Poor. They suggest that we pool and fix incoherent credences in one go using a method called the Coherent Aggregation Principle (CAP), formulated in this paper by Daniel Osherson and Moshe Vardi. In its original version, CAP says that we should aggregate Adila's and Benoit's credences by taking the coherent credence function $c$ such that the sum of the distance of $c$ from $c_A$ and the distance of $c$ from $c_B$ is minimized. That is,

CAP Given a measure of distance $D$ between credence functions, we should pick the coherent credence function $c$ that minimizes $D(c, c_A) + D(c, c_B)$.

As they note, if we take $SED$ to be our measure of distance, then this method generalizes the aggregation procedure on coherent credences that just takes straight averages of credences. That is, CAP entails unweighted linear pooling:

Unweighted Linear Pooling If $c_A$ and $c_B$ are coherent, then the aggregation of $c_A$ and $c_B$ is $$\frac{1}{2} c_A + \frac{1}{2}c_B$$

We can generalize this result a little by taking a weighted sum of the distances, rather than the straight sum.

Weighted CAP Given a measure of distance $D$ between credence functions, and given $0 \leq \alpha \leq 1$, we should pick the coherent credence function $c$ that minimizes $\alpha D(c, c_A) + (1-\alpha)D(c, c_B)$.

If we take $SED$ to measure the distance between credence functions, then this method generalizes linear pooling. That is, Weighted CAP entails linear pooling:

Linear Pooling If $c_A$ and $c_B$ are coherent, then the aggregation of $c_A$ and $c_B$ is $$\alpha c_A + (1-\alpha)c_B$$ for some $0 \leq \alpha \leq 1$.

What's more, when distance is measured by $SED$, Weighted CAP agrees with fix-then-pool and with pool-then-fix (provided the fixing is also done using $SED$). Thus, when we use $SED$, all of the methods for aggregating incoherent credences that we've considered agree. In particular, they all recommend the following credence in $X$: $$\frac{1}{2} + \frac{\alpha(c_A(X)-c_A(\overline{X})) + (1-\alpha)(c_B(X)  - c_B(\overline{X}))}{2}$$

However, the story is not nearly so neat and tidy if we measure the distance between two credence functions using $GKL$. Here's the credence in $X$ recommended by fix-then-pool:$$\alpha \frac{c_A(X)}{c_A(X) + c_A(\overline{X})} + (1-\alpha)\frac{c_B(X)}{c_B(X) + c_B(\overline{X})}$$ Here's the credence in $X$ recommended by pool-then-fix: $$\frac{\alpha c_A(X) + (1-\alpha)c_B(X)}{\alpha (c_A(X) + c_A(\overline{X})) + (1-\alpha)(c_B(X) + c_B(\overline{X}))}$$ And here's the credence in $X$ recommended by Weighted CAP: $$\frac{c_A(X)^\alpha c_B(X)^{1-\alpha}}{c_A(X)^\alpha c_B(X)^{1-\alpha} + c_A(\overline{X})^\alpha c_B(\overline{X})^{1-\alpha}}$$ For many values of $\alpha$, $c_A(X)$, $c_A(\overline{X})$, $c_B(X)$, and $c_B(\overline{X})$, these give three distinct results.
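For a concrete instance, take $\alpha = 0.3$, $c_A(X) = 0.8$, $c_A(\overline{X}) = 0.5$, $c_B(X) = 0.2$, $c_B(\overline{X}) = 0.4$ (numbers picked out of the air). Evaluating the three formulas at this profile gives three distinct recommendations:

```python
# The three GKL-based recommendations for credence in X, evaluated at
# one made-up profile of weights and credences.
a = 0.3
cAX, cAnX = 0.8, 0.5
cBX, cBnX = 0.2, 0.4

fix_then_pool = a * cAX / (cAX + cAnX) + (1 - a) * cBX / (cBX + cBnX)
pool_then_fix = ((a * cAX + (1 - a) * cBX)
                 / (a * (cAX + cAnX) + (1 - a) * (cBX + cBnX)))
weighted_cap = (cAX**a * cBX**(1 - a)
                / (cAX**a * cBX**(1 - a) + cAnX**a * cBnX**(1 - a)))

print(fix_then_pool, pool_then_fix, weighted_cap)
# approx 0.4179, 0.4691, and 0.4148: three distinct values
```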


Friday, 10 March 2017

A dilemma for judgment aggregation

Let's suppose that Adila and Benoit are both experts, and suppose that we are interested in gleaning from their opinions about a certain proposition $X$ and its negation $\overline{X}$ a judgment of our own about those propositions. Adila has credence function $c_A$, while Benoit has credence function $c_B$. One standard way to derive our own credence function from this information is to take a linear pool or weighted average of Adila's and Benoit's credence functions. That is, we assign a weight to Adila ($\alpha$) and a weight to Benoit ($1-\alpha$) and we take the linear combination of their credence functions with these weights to be our credence function. So our credence in $X$ will be $\alpha c_A(X) + (1-\alpha) c_B(X)$, while our credence in $\overline{X}$ will be $\alpha c_A(\overline{X}) + (1-\alpha)c_B(\overline{X})$.

But now suppose that either Adila or Benoit or both are probabilistically incoherent -- that is, either $c_A(X) + c_A(\overline{X}) \neq 1$ or $c_B(X) + c_B(\overline{X}) \neq 1$ or both. Then, it may well be that the linear pool of their credence functions is also probabilistically incoherent. That is,

$$(\alpha c_A(X) + (1-\alpha) c_B(X)) + (\alpha c_A(\overline{X}) + (1-\alpha)c_B(\overline{X})) = \alpha (c_A(X)  + c_A(\overline{X})) + (1-\alpha)(c_B(X) + c_B(\overline{X})) \neq 1$$

But, as an adherent of Probabilism, I want my credences to be probabilistically coherent. So, what should I do?

A natural suggestion is this: take the aggregated credences in $X$ and $\overline{X}$, and then take the closest pair of credences that are probabilistically coherent. Let's call that process the coherentization of the incoherent credences. Of course, to carry out this process, we need a measure of distance between any two credence functions. Luckily, that's easy to come by. Suppose you are an adherent of Probabilism because you are persuaded by the so-called accuracy dominance arguments for that norm. According to these arguments, we measure the accuracy of a credence function by measuring its proximity to the ideal credence function, which we take to be the credence function that assigns credence 1 to all truths and credence 0 to all falsehoods. That is, we generate a measure of the accuracy of a credence function from a measure of the distance between two credence functions. Let's call that distance measure $D$. In the accuracy-first literature, there are reasons for taking $D$ to be a so-called Bregman divergence. Given such a measure $D$, we might be tempted to say that, if Adila and/or Benoit are incoherent and our linear pool of their credences is incoherent, we should not adopt that linear pool as our credence function, since it violates Probabilism, but rather we should find the nearest coherent credence function to the incoherent linear pool, relative to $D$, and adopt that. That is, we should adopt credence function $c$ such that $D(c, \alpha c_A + (1-\alpha)c_B)$ is minimal. So, we should first take the linear pool of Adila's and Benoit's credences; and then we should make them coherent.

But this raises the question: why not first make Adila's and Benoit's credences coherent, and then take the linear pool of the resulting credence functions? Do these two procedures give the same result? That is, in the jargon of algebra, does linear pooling commute with coherentization? If so, there is no problem. But if not, our judgment aggregation method faces a dilemma: in which order should the procedures be performed: aggregate, then make coherent; or make coherent, then aggregate?

It turns out that whether or not the two commute depends on the distance measure in question. First, suppose we use the so-called squared Euclidean distance measure. That is, for two credence functions $c$, $c'$ defined on a set of propositions $X_1$, $\ldots$, $X_n$,$$SED(c, c') = \sum^n_{i=1} (c(X_i) - c'(X_i))^2$$ In particular, if $c$, $c'$ are defined on $X$, $\overline{X}$, then the distance from $c$ to $c'$ is $$(c(X) -c'(X))^2 + (c(\overline{X})-c'(\overline{X}))^2$$ And note that this generates the quadratic scoring rule, which is strictly proper:
  • $\mathfrak{q}(1, x) = (1-x)^2$
  • $\mathfrak{q}(0, x) = x^2$
Then, in this case, linear pooling commutes with our procedure for making incoherent credences coherent. Given a credence function $c$, let $c^*$ be the closest coherent credence function to $c$ relative to $SED$. Then:

Theorem 1 For all $\alpha$, $c_A$, $c_B$, $$\alpha c^*_A + (1-\alpha)c^*_B = (\alpha c_A + (1-\alpha)c_B)^*$$

Second, suppose we use the generalized Kullback-Leibler divergence to measure the distance between credence functions. That is, for two credence functions $c$, $c'$ defined on a set of propositions $X_1$, $\ldots$, $X_n$,$$GKL(c, c') = \sum^n_{i=1} c(X_i) \mathrm{log}\frac{c(X_i)}{c'(X_i)} - \sum^n_{i=1} c(X_i) + \sum^n_{i=1} c'(X_i)$$ Thus, for $c$, $c'$ defined on $X$, $\overline{X}$, the distance from $c$ to $c'$ is $$c(X)\mathrm{log}\frac{c(X)}{c'(X)} + c(\overline{X})\mathrm{log}\frac{c(\overline{X})}{c'(\overline{X})} - c(X) - c(\overline{X}) + c'(X) + c'(\overline{X})$$ And note that this generates the following scoring rule, which is strictly proper:
  • $\mathfrak{b}(1, x) = \mathrm{log}(\frac{1}{x}) - 1 + x$
  • $\mathfrak{b}(0, x) = x$
Then, in this case, linear pooling does not commute with our procedure for making incoherent credences coherent. Given a credence function $c$, let $c^+$ be the closest coherent credence function to $c$ relative to $GKL$. Then:

Theorem 2 For many $\alpha$, $c_A$, $c_B$, $$\alpha c^+_A + (1-\alpha)c^+_B \neq (\alpha c_A + (1-\alpha)c_B)^+$$

Proofs of Theorems 1 and 2. With the following two key facts in hand, the results are straightforward. If $c$ is defined on $X$, $\overline{X}$:
  • $c^*(X) = \frac{1}{2} + \frac{c(X)-c(\overline{X})}{2}$, $c^*(\overline{X}) = \frac{1}{2} - \frac{c(X) - c(\overline{X})}{2}$.
  • $c^+(X) = \frac{c(X)}{c(X) + c(\overline{X})}$, $c^+(\overline{X}) = \frac{c(\overline{X})}{c(X) + c(\overline{X})}$.
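Both facts are easy to check numerically: a brute-force search over the coherent pairs $(x, 1-x)$ lands (to grid precision) on the stated closed forms. Here is a sketch, with a made-up incoherent credence function:

```python
# Brute-force check of the closed-form SED and GKL projections onto
# the coherent pairs (x, 1 - x), for a made-up incoherent c.
import math

cX, cnX = 0.8, 0.5   # incoherent: the credences sum to 1.3

def sed(x):
    return (cX - x)**2 + (cnX - (1 - x))**2

def gkl(x):   # GKL(c*, c) for c* = (x, 1 - x)
    return (x * math.log(x / cX) + (1 - x) * math.log((1 - x) / cnX)
            - 1 + cX + cnX)

grid = [i / 10000 for i in range(1, 10000)]
print(min(grid, key=sed), 0.5 + (cX - cnX) / 2)   # both approx 0.65
print(min(grid, key=gkl), cX / (cX + cnX))        # both approx 0.6154
```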

Thus, Theorem 1 tells us that, if you measure distance using SED, then no dilemma arises: you can aggregate and then make coherent, or you can make coherent and then aggregate -- they will have the same outcome. However, Theorem 2 tells us that, if you measure distance using GKL, then a dilemma does arise: aggregating and then making coherent gives a different outcome from making coherent and then aggregating.

Perhaps this is an argument against GKL and in favour of SED? You might think, of course, that the problem arises here only because SED is somehow naturally paired with linear pooling, while GKL might be naturally paired with some other method of aggregation that commutes with coherentization relative to GKL. That may be so. But bear in mind that there is a very general argument in favour of linear pooling that applies whichever distance measure you use: if you do not aggregate a set of probabilistic credence functions using linear pooling, then there is some linear pool that each of those credence functions expects to be more accurate than your aggregation. So I think this response won't work.