This post works through some technical details in the Gödel-Löb provability logic , especially as they are relevant to a paper by LaVictoire et al discussed in the previous post. Subsection 4.2 of that post explores as a useful tool for the task at hand, deferring some standard technical details until this post.
After reviewing some definitions in Section 1, Section 2 covers the normal form theorem, from which it follows that sentences of without propositional variables can be algorithmically classified as true or false in the standard model of arithmetic. By contrast, the Kripke semantics covered in the final Section 4 allows any formula (with or without variables) to be algorithmically classified as provable or unprovable in . My main source for these sections is George Boolos’ The Logic of Provability.
(Another point of contrast: Solovay’s completeness theorem, which we do not cover here, establishes that the formulas proved in are exactly those which whose every instance is provable in Peano Arithmetic, with propositional variables replaced by Gödel numbers of arithmetical sentences, and the modality replaced by ‘s provability predicate. It is the right-to-left implication that is non-trivial.)
Section 3 gives Per Lindström’s simple proof of the fixed point theorem for as in his article Provability Logic: A Short Introduction. The fixed point theorem is analogous to Gödel’s diagonal lemma for constructing self-referential sentences, and is actually the main result needed by LaVictoire et al.
This post concerns another research paper from the Machine Intelligence Research Institute, Robust Cooperation in the Prisoner’s Dilemma: Program Equilibrium via Provability Logic by Patrick LaVictoire et al (5/31/2013 preprint).
Mutual defection is the unique Nash equilibrium in the classical prisoner’s dilemma. It is troubling whenever rational agents pursuing their own self-interest predictably find themselves unhappy with the result, and so variants of the prisoner’s dilemma in which mutual cooperation can be rationally achieved are interesting, and perhaps reassuring. The best known variant is the iterated prisoner’s dilemma, in which the potential for long-term mutual cooperation can outweigh the short-term gains of immediate defection.
As for one-shot prisoner’s dilemmas, Douglas Hofstadter may have been the first to seriously suggest that the standard game-theoretic analysis (in which players’ actions are assumed to be independent) is insufficient. His idea of “superrationality” holds that two players who know they are similar to one other will reason that their decision will be the same, so they ought to cooperate. In this setting, any reasoning favoring cooperation is self-reinforcing, and any reasoning favoring defection is self-defeating. His 1983 account of a casual experiment he ran (fascinating though it may be) failed to exhibit superrationality among humans.
LaVictoire et al investigate agents more reliable and more transparent than humans: computer programs. In their setup, each program has access to the other’s source code while deciding whether to cooperate or defect. They identify several agents which will cooperate with agents similar to themselves but never with a defector. Here, Löb’s theorem is the key ingredient to formalizing the notion that a decision to cooperate with similar agents is self-reinforcing.
This post explores a draft paper by Paul Christiano et al (06/10/2013 draft), produced at a workshop hosted by the Machine Intelligence Research Institute. My intent is to go thoroughly through the main argument, to ensure that I fully understand it. Also, on my first or second reading, I had some worries that the argument seemed to produce something for nothing, possibly violating a “conservation of depth” in mathematics. I am no longer worried about this.
Towards the end of this post, I discuss the non-constructive nature of the proof given, but show that it can at least be modified so as not to rely on the Axiom of Choice.
A one-sentence summary of the paper is: Although a logical theory can’t contain its own Truth predicate, it can contain its own “subjective probability function” which assigns reasonable probabilities to sentences of the theory, including of course statements about the probability function itself.
Or to quote the conclusion of the paper:
Tarski’s result on the undefinability of truth is in some sense an artifact of the infinite precision demanded by reasoning about complete certainty.