Meditations on Löb’s theorem and probabilistic logic

This post is a second look at The Definability of Truth in Probabilistic Logic, a preprint by Paul Christiano and other Machine Intelligence Research Institute associates, which I first read and took notes on a little over one year ago.

In particular, I explore relationships between Christiano et al’s probabilistic logic and stumbling blocks for self-reference in classical logic, like the liar’s paradox (“This sentence is false”) and in particular Löb’s theorem.

The original motivation for the ideas in this post was an attempt to prove a probabilistic version of Löb’s theorem to analyze the truth-teller sentences (“This sentence is [probably] true”) of probabilistic logic, an idea that came out of some discussions at a MIRIx workshop that I hosted in Seattle.

The Reflection Principle

The reflection principle in Christiano’s paper states that one can add a “probability” symbol {P} to a first-order theory sufficiently sophisticated to admit Gödel numbering of its sententes in a way that mostly agrees with some actual probability measure {\mathbb{P}} on sentences. That is, for every rational {a\in[0,1]} and every sentence {\varphi} (which may include {P}), the Reflection principle asserts

  1. If {\mathbb{P}(\varphi)>a}, then {\mathbb{P}(P(\ulcorner\varphi\urcorner)>a)=1}.
  2. If {\mathbb{P}(\varphi)<a}, then {\mathbb{P}(P(\ulcorner\varphi\urcorner)<a)=1}.

The motivating idea is that {\mathbb{P}} can “see” its own values up to arbitrary (but finite) precision, and in particular can see when the probability of some sentence {\varphi} is in a given open set. Christiano et al show that such a {\mathbb{P}} can be constructed (using the high-powered Kakutani Fixed Point Theorem), additionally satisfying the following coherence conditions which guarantee that {\mathbb{P}} agrees with logical deduction the way we should expect:

  1. If the background theory {T} proves {\varphi}, then {\mathbb{P}(\varphi)=1}.
  2. If {T} proves {\neg\varphi}, then {\mathbb{P}(\varphi)=0}.
  3. {\mathbb{P}(\varphi)=\mathbb{P}(\varphi\wedge\psi)+\mathbb{P}(\varphi\wedge\neg\psi)}.

The value of the probabilistic reflection principle is that it establishes a meaningful way for a logical system to reason about truth without choking on the liar’s paradox.

The reflection is meaningful, because contraposition gives the corresponding schema for closed intervals guaranteeing that the things which the system believes about itself are actually true:

  1. If {\mathbb{P}(P(\ulcorner\varphi\urcorner)\geq a)>0}, then {\mathbb{P}(\varphi)\geq a}.
  2. If {\mathbb{P}(P(\ulcorner\varphi\urcorner)\leq a)>0}, then {\mathbb{P}(\varphi)\leq a}.

Regarding the liar’s paradox, the main example in Christiano et al’s paper is the probabilistic liar sentence {G} which says, informally, that “{G} is probably false”. The reflection principle then goes right ahead and agrees that yes, {G} is probably false, but then stops short of concluding that this actually makes {G} true.

More explicitly, for some fixed rational {0<a<1}, {G} should say that {P(\ulcorner G\urcorner)<a}. The reflection principle knows enough to decide that {\mathbb{P}(G)} must be equal {a}, and this does contradict {G}, but only slightly; {\mathbb{P}} simply can’t see its own values with infinite precision, and so does not update further. By contrast, if {\mathbb{P}} tried to assign lower probability than {a} to {G}, then {\mathbb{P}} would see that this actually makes {G} true, and would have to update its estimate for {G} upwards (actually all the way to {1}). The argument that {\mathbb{P}(G)} can’t exceed {a} is symmetric, and {\mathbb{P}(G)=a} is the only “stable” possibility.

Löb’s theorem and the classical limits of self-trust

One example not mentioned in the Christiano paper are truth-teller sentences, i.e. {G} which say {P(\ulcorner G\urcorner)>a}. Application of the reflection principle can quickly rule out probabilities for such {G} other than {a}, {0}, and {1}, but can’t obviously narrow things down any more. Unlike the liar sentences, both extreme values {0} and {1} seem to attract rather than repel the value of {\mathbb{P}(G)}, and so it seems possible that some truth-teller sentences could be “true”, some could be “false”, and some could be “in the middle”.

This is unsatisfying, because in classical logic, Löb’s theorem implies that a certain type of truth-teller sentence must actually be true (the Good News). On the other hand, Löb’s theorem also implies that trusting yourself is provably impossible (the Very Bad News).

In this section, I want to talk about Löb’s theorem, so that in the next, I can talk about an attempt to prove an analogue of it for probabilistic logic to resolve the issue of truth-teller sentences. The attempt failed for obvious-in-retrospect reasons, but there are striking parallels that helped me understand both Löb’s theorem and the reflection principle better.

I like to think of Löb’s theorem as a generalization of the liar paradox sometimes called Löb’s paradox, or Curry’s paradox, or the Santa Claus paradox. The informal English version is a sentence {L} which says “if {L} is true, then Santa Claus exists”. One then tries to prove {L} by assuming the hypothesis (that {L} is true) and working toward the consequent (that Santa Claus exists). Well… if {L} were true, then {L}‘s hypothesis would be satisfied, and it would follow that Santa Claus exists. So now we’ve established that the consequent follows from the hypothesis, which means we’ve proved {L}. Thus {L} is true, and Santa Claus exists.

Of course the argument can be replicated for any proposition you’d like to derive.

Another (possibly clearer) way to phrase the informal sentence {L} is “{L} is false or Santa Claus exists”. Then {L} cannot be false for standard liar paradox reasons, so {L} must be true, and hence Santa Claus must exist.

To set the stage for the actual Löb’s theorem, let me say that the resolution of the liar’s paradox in classical (non-probabilistic) logic is

  1. The “truth predicate” for the standard model is simply not definable (Tarski).
  2. By replacing truth with provability, the modified liar sentence asserting its own unprovability is true in the standard model, but unprovable (Gödel, first incompleteness theorem).

However, self-trust is impossible. This can be seen in Gödel’s second incompleteness theorem, which says that a “reasonable” theory of arithmetic can only prove itself consistent… in the trivial case that it is inconsistent and just proves everything. (Here “reasonable” means that its set of axioms can be enumerated by a computer program.)

The second incompleteness theorem is a special the special case of Löb’s theorem with {\theta} taken to be {\bot} (standing for an obvious contraction, like {0\neq0}). Using the standard abbreviation {\square} for Gödel’s provability predicate:

Theorem 1 (Löb) For a “reasonable” theory {T} of arithmetic, if {T\vdash\square\theta\rightarrow\theta}, then {T\vdash\theta}.

This means that the theory {T} can’t trust its own proofs! It can only know that a proof of {\theta} would actually imply {\theta}… in the trivial case that {T} just proves {\theta}.

This seems really bad, with the small silver lining that truth-teller sentences asserting their own provability must be provable. This fact was is actually used in an interesting way in Robust Cooperation in the Prisoner’s Dilemma, my notes here.

One way to prove Löb’s theorem is to formalize the line of reasoning sketched in the Santa Claus paradox (using {\square} instead of “is true”), which ends up needing to switch out one instance of {\square\theta} for {\theta}. Or in other words, the only thing that stops the Santa Claus argument from proving any {\theta} you choose is the inability of {T} to trust that a proof of {\theta} would actually imply {\theta}.

Löb’s proof relies on three properties of {\square} known as the Hilbert-Bernays provability conditions:

  1. If {T\vdash\theta}, then {T\vdash\square\theta} (If a proof actually exists in the real world, then it’s a finite object that {T} can see and check).
  2. {T\vdash\square\theta\rightarrow\square\square\theta} (There is an easy recipe that {T} knows for turning a proof of {\theta} into a proof of {\square\theta}. Namely, feed any proof of {\theta} into the proof-checker and watch).
  3. {T\vdash\square(\theta\rightarrow\varphi)\rightarrow\square\theta\rightarrow\square\varphi} (So {T} knows how to take a proof of {\theta\rightarrow\varphi} and a proof of {\theta}, concatenate them, and apply Modus Ponens to get a proof of {\varphi}).

The Hilbert-Bernays conditions provide a nice high-level “interface” for reasoning about the {\square} operator without mucking around too much in the low-level details of its implementation. In the next section, I’ll argue that analogues of Hilbert-Bernays hold in the probabilistic setting for the “has high probability” predicate, yet Löb’s argument fails to go through.

Finally, one point of notation that I’ll insist on is writing implications of the form {(A\wedge B)\rightarrow C} in “curried” style as {A\rightarrow B\rightarrow C}. Here the arrow associates right, i.e. this is an abbreviation for {A\rightarrow(B\rightarrow C)}.

Proof of Löb’s Theorem. Suppose that {T\vdash\square\theta\rightarrow\theta}, and consider a sentence {L} for which {T} proves {L\leftrightarrow(\square L\rightarrow\theta)}. Then {T} can prove {\theta} as follows.

  1. {T\vdash\square(L\rightarrow\square L\rightarrow\theta)} (by definition of {L} and HB1)
  2. {T\vdash\square L\rightarrow\square\square L\rightarrow\square\theta} (by 1 and HB3)
  3. {T\vdash\square L\rightarrow\square\square L} (by HB2)
  4. {T\vdash\square L \rightarrow\square\theta} (by 2 and 3)
  5. {T\vdash\square L\rightarrow\theta} (by hypothesis {T\vdash\square\theta\rightarrow\theta} and 4)
  6. {T\vdash L} (by definition of {L} and 5)
  7. {T\vdash\square L} (by HB1 and 6)
  8. {T\vdash\theta} (by definition of {L}, 6, and 7)

Note the similarity in lines 5 and 6 to the informal argument that “if this sentence is true, then Santa Claus exists” is true.

Finally, because it can be a bit confusing to keep track of where everything is happening (there are levels and meta-levels) and what is allowed, I’ll include one of my favorite exercises in provability logic which stresses these distinctions. Determine which of the following sentences are true (in the standard model of arithmetic) for all sentences {\theta}, and which are provable (in Peano Arithmetic) for all sentences {\theta}. (The {\square} operator should also be taken relative to PA.)

  1. {\theta\rightarrow\square\theta}
  2. {\square\theta\rightarrow\theta}
  3. {\square\theta\rightarrow\square\square\theta}
  4. {\square\square\theta\rightarrow\square\theta}


  1. is clearly false and hence unprovable, as the consistency of PA (namely {\neg\square\bot}) is true but not provable by the incompleteness theorem.
  2. is is true (PA only proves true theorems), but not generally provable thanks to Löb’s theorem.
  3. is provable (and true). It’s the 2nd Hilbert-Bernays condition, which is admittedly the trickiest one to justify without making an appeal to authority.
  4. is a special case of 2, so is true, but still not provable (take {\theta} to be {\bot} and apply Löb’s theorem).

Probabilistic analogues

If the truth-teller sentences of classical logic are those {\theta} with {T\vdash\theta\leftrightarrow\square\theta}, and the truth-teller sentences of probabilistic logic (that we’re interested in anyway) are those {G} with {T\vdash G\leftrightarrow P(\ulcorner G\urcorner)>a}, then we might want to adapt the {\square} notation to the new context to talk about “has high probability”.

I started using the notation

\displaystyle \mathbb{P}\models\square_{>a}\varphi

to mean that {\mathbb{P}(\varphi)>a}. I’m using the semantic “models” symbol {\models} because the right hand side is something that’s actually true about {\mathbb{P}} in the real world. Similar notation can be established for other comparators {=}, {\geq}, and so forth. Boolean combinations of such expressions make sense as well, e.g.

\displaystyle \mathbb{P}\models\square_{\geq a}\varphi\rightarrow\square_{=1}\psi

means that either {\mathbb{P}(\varphi)<a} or {\mathbb{P}(\psi)=1}. Notation that does not make sense is {\mathbb{P}\models\varphi} with no {\square} in front.

Finally, it’s also worth overloading this {\square} notation for the formal logical symbol {P}. Thus the truth-teller sentences {G} we are most interested in are those with {T\vdash G\leftrightarrow\square_{>a}G}.

Then boxes can be nested and the resulting sentences can be satisfied by {\mathbb{P}}, e.g.

\displaystyle \mathbb{P}\models\square_{>a}\square_{<b}\varphi

would mean {\mathbb{P}(P(\ulcorner\varphi\urcorner)<b)>a}. So outermost boxes here stand for {\mathbb{P}} in the real world, and all inner boxes (which may be nested arbitrarily deeply) stand for the logical symbol {P}.

We will be careful to always prefix judgments including this new {\square} notation with {T\vdash} or {\mathbb{P}\models} to disambiguate the overloading.

With this notation established, the “high probability” versions of the reflection principle (and contrapositive) can be compactly expressed as

  1. {\mathbb{P}\models\square_{>a}\varphi\rightarrow\square_{=1}\square_{>a}\varphi}
  2. {\mathbb{P}\models\square_{>0}\square_{\geq a}\varphi\rightarrow\square_{\geq a}\varphi}

And now that we’ve gotten ourselves organized, analogues of the three Hilbert-Bernays conditions for “has high probability” emerge:

  1. If {T\vdash\varphi}, then {\mathbb{P}\models\square_{=1}\varphi} (this is one of the coherence conditions)
  2. {\mathbb{P}\models\square_{>a}\varphi\rightarrow\square_{=1}\square_{>a}\varphi} (this is reflection)
  3. {\mathbb{P}\models\square_{>1-\varepsilon}(\varphi\rightarrow\psi)\rightarrow\square_{>1-\delta}\varphi\rightarrow\square_{1-\varepsilon-\delta}\psi} (union bound)

Actually the version of (HB3) that will come up in a Löbian argument is the slightly more complicated

\displaystyle \mathbb{P}\models\square_{=1}(A\rightarrow B\rightarrow C)\rightarrow(\square_{>0}A\rightarrow\square_{=1}B)\rightarrow\square_{>0}A\rightarrow\square_{>0}C

which is also true by a “union bound” type of argument.

While the Hilbert-Bernays conditions were enough to prove Löb’s theorem, Christiano’s probabilistic logic also has a strong anti-Löb principle in the contrapositive reflection schema

\displaystyle \mathbb{P}\models\square_{>0}\square_{\geq a}\varphi\rightarrow\square_{\geq a}\varphi,

i.e. {\mathbb{P}} can always trust its own judgments, at least about closed conditions like {\geq a}.

This means that either there’s a mistake in the proof of the consistency of the reflection principle, or that a generalization of Löb’s theorem will fail for subtle reasons (else the above {\square\square\varphi\rightarrow\square\varphi} type of schema should always let us prove {\square\varphi}). Since I’ve checked the proof of consistency, it must be the latter.

So what goes wrong in trying to prove that Santa Claus exists (at least with high probability)? Fix any sentence {\theta}, and let {L} be a sentence with {T\vdash L\leftrightarrow(\square_{>0}L\rightarrow\square_{=1}\theta)}.

  1. By definition of {L} and (HB1), we have

    \displaystyle \mathbb{P}\models\square_{=1}(L\rightarrow\square_{>0}L\rightarrow\square_{=1}\theta).

  2. Before trying to distribute the outermost {\square}, note by (HB2), we have

    \displaystyle \mathbb{P}\models\square_{>0}L\rightarrow\square_{=1}\square_{>0}L.

  3. Now by the lines above and the specialized (HB3) with {A\equiv L}, {B\equiv\square_{>0}L}, and {C\equiv\square_{=1}\theta}, we have

    \displaystyle \mathbb{P}\models\square_{>0}L\rightarrow\square_{>0}\square_{=1}\theta.

  4. By the self-trust principle for closed conditions,

    \displaystyle \mathbb{P}\models\square_{>0}L\rightarrow\square_{=1}\theta.

At this point, it looks quite a bit like we’ve derived {L}. We’ve shown that its contents hold of {\mathbb{P}}, so we might even write {\mathbb{P}\models L}, since {L} is equivalent to a boolean combination of sentences that start with {\square}. But this does’t mean we can conclude {T\vdash L}, or even the weaker condition {\mathbb{P}\models\square_{=1}L} (since {\square_{>0}L\rightarrow\square_{=1}\theta} is a closed, rather than open, condition on probabilities, so (HB2) does not apply), or even the much weaker condition that {\mathbb{P}\models\square_{>0}L}.

In fact, if {\mathbb{P}(\theta)\neq1}, we must have {\mathbb{P}(L)=0}, even though it looks like we derived {L}.

This is a bullet that the reflection principle has to bite, but it’s not as bad as it looks. Keep in mind that {L} basically just says “either {L} is false or Santa Claus exists”. Just as the reflection principle is okay with saying that “this sentence is [probably] false” is [probably] false, and then not updating further, it has to be okay with proving the contents of this Löb sentence, but not being able to recognize this because doing so requires infinite precision.

Now one might try to massage the argument a little bit, say with {L} such that {T\vdash L\leftrightarrow(\square_{>\varepsilon}L\rightarrow\square_{\geq1-\delta}\theta)} or different combinations of {>} and {\geq}. Such attempts will fail, because the use of (HB2) in step 2 requires an open condition on the {\square L} part, but then continuing after step 4 requires all of {\square L\rightarrow\square\theta} to be an open condition, which requires a closed condition on {\square L}! (Contrast this with the classical case where the provability {\square} has no openness or closedness, and (HB2) always applies.)

One can change between closed and open conditions with minimal loss of precision, e.g. {\geq a} imples {>a-\varepsilon}, which in turn implies {\geq a-\varepsilon}, but Löb’s trick seems extremely sensitive to such to such losses, to the point that it failed to go through above because {\mathbb{P}} didn’t have the infinite precision to see that it had derived exactly what the Löb sentence said.

The failure of the Löb argument to get farther than step 4 above is subtle and confused me for longer that I care to admit, but in the presence of the “anti-Löb” principle of the form {\square\square\theta\rightarrow\square\theta} it makes quite a lot of sence. There may yet be other ways to narrow down the possible probabilities of truth-teller sentences in Christiano’s logic, but a fully general Löb’s theorem seems not to be such a way.


Details in provability logic

This post works through some technical details in the Gödel-Löb provability logic {\mathrm{GL}}, especially as they are relevant to a paper by LaVictoire et al discussed in the previous post. Subsection 4.2 of that post explores {\mathrm{GL}} as a useful tool for the task at hand, deferring some standard technical details until this post.

After reviewing some definitions in Section 1, Section 2 covers the normal form theorem, from which it follows that sentences of {\mathrm{GL}} without propositional variables can be algorithmically classified as true or false in the standard model of arithmetic. By contrast, the Kripke semantics covered in the final Section 4 allows any formula (with or without variables) to be algorithmically classified as provable or unprovable in {\mathrm{GL}}. My main source for these sections is George Boolos’ The Logic of Provability.

(Another point of contrast: Solovay’s completeness theorem, which we do not cover here, establishes that the formulas proved in {\mathrm{GL}} are exactly those which whose every instance is provable in Peano Arithmetic, with propositional variables replaced by Gödel numbers of arithmetical sentences, and the modality {\square} replaced by {\mathrm{PA}}‘s provability predicate. It is the right-to-left implication that is non-trivial.)

Section 3 gives Per Lindström’s simple proof of the fixed point theorem for {\mathrm{GL}} as in his article Provability Logic: A Short Introduction. The fixed point theorem is analogous to Gödel’s diagonal lemma for constructing self-referential sentences, and is actually the main result needed by LaVictoire et al.

Continue reading

Löb’s Theorem and the Prisoner’s Dilemma

This post concerns another research paper from the Machine Intelligence Research Institute, Robust Cooperation in the Prisoner’s Dilemma: Program Equilibrium via Provability Logic by Patrick LaVictoire et al (5/31/2013 preprint).

Mutual defection is the unique Nash equilibrium in the classical prisoner’s dilemma. It is troubling whenever rational agents pursuing their own self-interest predictably find themselves unhappy with the result, and so variants of the prisoner’s dilemma in which mutual cooperation can be rationally achieved are interesting, and perhaps reassuring. The best known variant is the iterated prisoner’s dilemma, in which the potential for long-term mutual cooperation can outweigh the short-term gains of immediate defection.

As for one-shot prisoner’s dilemmas, Douglas Hofstadter may have been the first to seriously suggest that the standard game-theoretic analysis (in which players’ actions are assumed to be independent) is insufficient. His idea of “superrationality” holds that two players who know they are similar to one other will reason that their decision will be the same, so they ought to cooperate. In this setting, any reasoning favoring cooperation is self-reinforcing, and any reasoning favoring defection is self-defeating. His 1983 account of a casual experiment he ran (fascinating though it may be) failed to exhibit superrationality among humans.

LaVictoire et al investigate agents more reliable and more transparent than humans: computer programs. In their setup, each program has access to the other’s source code while deciding whether to cooperate or defect. They identify several agents which will cooperate with agents similar to themselves but never with a defector. Here, Löb’s theorem is the key ingredient to formalizing the notion that a decision to cooperate with similar agents is self-reinforcing.

Continue reading