Details in provability logic

This post works through some technical details in the Gödel-Löb provability logic ${\mathrm{GL}}$, especially as they are relevant to a paper by LaVictoire et al discussed in the previous post. Subsection 4.2 of that post explores ${\mathrm{GL}}$ as a useful tool for the task at hand, deferring some standard technical details until this post.

After reviewing some definitions in Section 1, Section 2 covers the normal form theorem, from which it follows that sentences of ${\mathrm{GL}}$ without propositional variables can be algorithmically classified as true or false in the standard model of arithmetic. By contrast, the Kripke semantics covered in the final Section 4 allows any formula (with or without variables) to be algorithmically classified as provable or unprovable in ${\mathrm{GL}}$. My main source for these sections is George Boolos’ The Logic of Provability.

(Another point of contrast: Solovay’s completeness theorem, which we do not cover here, establishes that the formulas proved in ${\mathrm{GL}}$ are exactly those which whose every instance is provable in Peano Arithmetic, with propositional variables replaced by Gödel numbers of arithmetical sentences, and the modality ${\square}$ replaced by ${\mathrm{PA}}$‘s provability predicate. It is the right-to-left implication that is non-trivial.)

Section 3 gives Per Lindström’s simple proof of the fixed point theorem for ${\mathrm{GL}}$ as in his article Provability Logic: A Short Introduction. The fixed point theorem is analogous to Gödel’s diagonal lemma for constructing self-referential sentences, and is actually the main result needed by LaVictoire et al.

1. Background

Recall that formulas in ${\mathrm{GL}}$ are built up from ${\bot}$, the propositional variables ${p}$, ${q}$, etc., the connective ${\rightarrow}$ (from which the other standard connectives can be derived), and the modal symbol ${\square}$ intended to capture the behavior of the provability predicate in Peano Arithmetic. The axioms for ${\mathrm{GL}}$ are:

• All propositional tautologies
• Distribution: ${\square(A\rightarrow B)\rightarrow(\square A\rightarrow\square B)}$
• Formal Löb’s theorem: ${\square(\square A\rightarrow A)\rightarrow\square A}$.

The rules of inference are:

• Modus Ponens: From ${A\rightarrow B}$ and ${A}$, conclude ${B}$
• Necessitation: From ${A}$, conclude ${\square A}$.

Before moving on to the normal form theorem, we list two useful redundant axioms (in that they are theorems), and one useful redundant rule of inference:

• Alternate distribution: ${\square(A\wedge B)\leftrightarrow(\square A\wedge\square B)}$
• Formal necessitation: ${\square A\rightarrow\square\square A}$
• Löb’s rule: From ${\square A\rightarrow A}$, conclude ${A}$.

2. Normal Forms

There’s an effective procedure for determining the truth or falsity of sentences without propositional variables. For example, the sentence ${\neg\square\bot}$ is not provable in ${\mathrm{GL}}$, but is true when interpreted in the standard model as the statement that Peano Arithmetic is consistent.

This sentence is one of a whole family of true sentences ${\neg\square^N\bot}$, for ${N\geq0}$ expressing that there’s no proof that there’s a proof (and so forth…) of ${\bot}$. Here ${\square^0A}$ means ${A}$, and ${\square^{N+1}A}$ means ${\square\square^NA}$.

The following normal form theorem says that any sentence without propositional variables is provably equivalent to a propositional combination of sentences of the form ${\square^n\bot}$; such a sentence is said to be in normal form. Since ${\square^n\bot}$ is false for all ${n}$, one can compute the truth value of a normal form sentence by simply deleting every ${\square}$ symbol in it, and computing the value of the propositional combination of ${\bot}$‘s that remains!

Or viewed another way, since ${\vdash_\mathrm{GL}\square^n\bot\rightarrow\square^N\bot}$ for ${N\geq n}$ (by the schema ${\square A\rightarrow\square\square A}$, or trivially if ${n=0}$), any normal form sentence ${B}$ is decidable from a sufficiently strong (true) hypothesis ${\neg\square^N\bot}$. That is, when ${N}$ is at least as large as the highest “power” on a ${\square}$ in ${B}$, then either

$\displaystyle \vdash_{\mathrm{GL}}\neg\square^N\bot\rightarrow B\qquad\mathrm{or}\qquad\vdash_\mathrm{GL}\neg\square^N\bot\rightarrow\neg B.$

This is a kind of completeness theorem for ${\mathrm{GL}}$.

Theorem 1 (Normal Form Theorem) For any sentence ${A}$ with no propositional variables, there is [one can construct] a normal form sentence ${A^*}$ with ${\vdash_\mathrm{GL}A\leftrightarrow A^*}$.

Proof: Obviously ${\bot}$ is in normal form, and if ${A}$ and ${B}$ are provably equivalent to normal form sentences ${A^*}$ and ${B^*}$, then

$\displaystyle \vdash_\mathrm{GL}(A\rightarrow B)\leftrightarrow(A^*\rightarrow B^*),$

with ${A^*\rightarrow B^*}$ in normal form.

If ${\vdash_\mathrm{GL}A\leftrightarrow A^*}$, then ${\vdash\square (A\leftrightarrow A^*)}$ by Necessitation. Then ${\square}$ distributes correctly over ${\rightarrow}$ and ${\wedge}$ such that ${\vdash_\mathrm{GL}\square A\leftrightarrow\square A^*}$. Thus to complete the proof, it suffices to show that ${\square C}$ can be put in normal form whenever ${C}$ is in normal form.

Fix such ${C}$, and put it in conjunctive normal form; that is, with some manipulation via propositional tautologies, we can write an equivalent sentence which is a conjunction of disjunctions of ${\square^i\bot}$‘s and ${\neg\square^i\bot}$‘s. Since the ${\square}$ provably distributes over conjunctions, it suffices to show for such a disjunction ${D}$ that ${\square D}$ is equivalent to a normal form sentence.

So let ${D}$ be just such a disjunction,

$\displaystyle D\equiv \left(\bigvee_{i\in M}\square^i\bot\right)\vee\left(\bigvee_{j\in N}\neg\square^j\bot\right)$

for some finite index sets ${M}$ and ${N}$. We can assume without loss of generality that ${M\neq\emptyset}$, by adding ${0}$ to ${M}$, which is akin to replacing ${D}$ with ${\bot\vee D}$. Using the schema ${\square A\rightarrow\square\square A}$ and its contrapositive, we can rewrite ${D}$ by replacing each of the “big” disjunctions with its weakest disjunct. If ${N\neq\emptyset}$, then with ${m=\max(M)}$, ${n=\min(N)}$,

$\displaystyle \vdash_\mathrm{GL} D\leftrightarrow \square^m\bot\vee\neg\square^n\bot.$

Of course, if ${N=\emptyset}$, then ${D}$ is equivalent to ${\square^m\bot}$, and ${\square D}$ is equivalent to ${\square^{m+1}\bot}$, and we are done, so we may suppose that ${N}$ is not empty, ${n}$ is well-defined, and

$\displaystyle \vdash_\mathrm{GL}D\leftrightarrow(\square^n\bot\rightarrow\square^m\bot). \ \ \ \ \ (1)$

If ${n\leq m}$, then ${D}$ is equivalent to ${\top}$ (i.e. to ${\neg\bot}$), and hence so is ${\square D}$, as ${\vdash_\mathrm{GL}\square\top}$ by Necessitation.

In the final case ${n>m}$, we’ll use (1) to show that ${\square D}$ is equivalent to ${\square^{m+1}\bot}$. For the right-to-left implication, start with the tautology

$\displaystyle \vdash_\mathrm{GL}\square^m\bot\rightarrow(\square^n\bot\rightarrow\square^m\bot).$

By Necessitation and distribution,

$\displaystyle \vdash_\mathrm{GL}\square^{m+1}\bot\rightarrow\square(\square^n\bot\rightarrow\square^m\bot).$

Also using Necessitation and distribution in (1), we can see that the consequent implies (actually is equivalent to) ${\square D}$.

For the left-to-right implication, since ${m+1\leq n}$ (and so ${\vdash_\mathrm{GL}\square^{m+1}\bot\rightarrow\square^n\bot}$),

$\displaystyle \vdash_\mathrm{GL}(\square^n\bot\rightarrow\square^m\bot)\rightarrow(\square^{m+1}\bot\rightarrow\square^m\bot).$

The hypothesis is equivalent to ${D}$, and the consequent looks amenable to Löb’s theorem. That is, introduce another ${\square}$ with Necessitation and distribution

$\displaystyle \vdash_\mathrm{GL}\square D\rightarrow\square(\square^{m+1}\bot\rightarrow\square^m\bot).$

Then by the Löb axiom for ${\square^m\bot}$, we have

$\displaystyle \vdash_\mathrm{GL}\square D\rightarrow\square^{m+1}\bot,$

completing the proof. $\Box$

3. Fixed Point Theorem

The fixed point theorem establishes that ${\mathrm{GL}}$ supports self-referential sentences, provided all self-reference occurs within the scope of a ${\square}$.

Specifically, we say that a formula ${A}$ is modalized in ${p}$ if every occurrence of ${p}$ is within the scope of a ${\square}$. For instance ${\square p\rightarrow p}$ is not modalized in ${p}$, but ${\square(\square p\rightarrow p)}$ is. To provide a suitable notation for substitution, we often write a formula ${A}$ as ${A(p)}$, understanding that ${p}$ needn’t actually occur in ${A}$ and that there may be variables distinct from ${p}$ occurring in ${A}$.

Given another sentence ${D}$, write ${A(D)}$ for the result of substituting ${D}$ for all occurrences of ${p}$ in ${A}$, relying on context to make clear which variable is to be replaced with ${D}$. Note that unlike systems such as first-order logic (or the lambda calculus or programming languages), there’s no such thing as a “bound variable” in ${\mathrm{GL}}$, so substitution can be performed naïvely.

Theorem 2 (Fixed point theorem) Any formula ${A}$ modalized in ${p}$ has a fixed point. Specifically, there is (one can construct) a formula ${D}$ neither containing ${p}$ nor any variable not in ${A}$ with

$\displaystyle \vdash_\mathrm{GL} D\leftrightarrow A(D).$

Moreover, ${D}$ is unique up to provable equivalence: any other fixed point ${E}$ of ${A}$ has ${\vdash_\mathrm{GL} D\leftrightarrow E}$.

Two familiar examples of (the uniqueness clause of) the fixed point theorem have ${A(p)\equiv \neg\square p}$ or ${A(p)\equiv\square p}$. Interpreted as statements of Peano Arithmetic, fixed points of the first are the Gödel sentences asserting their own unprovability. It’s fairly straightforward to show that such sentences are equivalent in ${\mathrm{PA}}$ to ${\mathrm{Con}_{\mathrm{PA}}}$, hence Gödel’s second incompleteness theorem falls out of his proof of the first. As expected then, ${\neg\square\bot}$ is the unique fixed point of ${\neg\square p}$ in ${\mathrm{GL}}$.

Fixed points of the second are the “truth-teller” sentences asserting their own provability. In ${\mathrm{PA}}$, they are proved by Löb’s theorem; in ${\mathrm{GL}}$, ${\top}$ is the unique fixed point of ${\square p}$.

The proof of the fixed point theorem in this section is taken from Lindström’s Provability Logic: A Short Introduction. It seems to be the simplest proof around. Boolos’ The Logic of Provability provides three alternate proofs, the first of which is not much longer than this, provided one has already developed the Kripke semantics (as in Section 4); Lindström’s proof is purely syntactic.

To prove the fixed point theorem, we’ll need some lemmas to the effect that substitution preserves logical equivalence as you would expect. To get one of the inductions to go through, we have to introduce the slightly awkward notation ${\square^+A}$ to stand for ${A\wedge\square A}$. Since ${\square A\rightarrow\square\square A}$ and ${\square\square A\rightarrow\square\square\square A}$ and so forth are all theorems of ${\mathrm{GL}}$, ${\square^+A}$ implies every member of the family ${\square^n A}$ for ${n\geq0}$. But the ${\square^+}$ is necessary, since ${A\rightarrow\square A}$ is not a theorem of ${\mathrm{GL}}$, despite the fact that if ${\vdash_\mathrm{GL}A}$ then ${\vdash_\mathrm{GL}\square A}$ (and hence ${\vdash_\mathrm{GL}\square^+A)}$.

Two useful, easily proved facts about ${\square^+}$ are:

• ${\square A}$, ${\square\square^+A}$, and ${\square^+\square A}$ are all provably equivalent.
• ${\square^+A}$ and ${\square^+\square^+A}$ are provably equivalent.

Lemma 3 For any formulas ${A(p)}$, ${B}$, and ${C}$:

• (a) ${\vdash_\mathrm{GL}\square^+(B\leftrightarrow C)\rightarrow(A(B)\leftrightarrow A(C))}$
• (b) If ${\vdash_\mathrm{GL} B\leftrightarrow C}$, then ${\vdash_\mathrm{GL} A(B)\leftrightarrow A(C)}$.

Proof: We’ll prove (a) by induction on ${A}$. Note that (b) then follows from (a), because if ${\vdash_\mathrm{GL}B\leftrightarrow C}$, then ${\vdash_\mathrm{GL}\square^+(B\leftrightarrow C)}$ by Necessitation, hence ${\vdash_\mathrm{GL}A(B)\leftrightarrow A(C)}$ by (a) and Modus Ponens. (Although (b) could be proved directly by a similar induction, which we would do if we didn’t need the stronger (a) for Lemmas 4 and 5.)

In the base case for (a) where ${A(p)}$ is ${\bot}$, ${p}$, or ${q}$ for ${q\neq p}$,

$\displaystyle \vdash_\mathrm{GL} (B\leftrightarrow C)\rightarrow (A(B)\leftrightarrow A(C))$

trivially, and actually without requiring the ${\square}$ part of ${\square^+}$ (here the consequent is ${\bot\leftrightarrow\bot}$, ${B\leftrightarrow C}$, or ${q\leftrightarrow q}$ respectively). The first inductive case, when ${A(p)}$ is ${D(p)\rightarrow E(p)}$ for some formulas ${D}$, ${E}$ is also easy.

In the case where ${A}$ is ${\square D(p)}$, we assume inductively that

$\displaystyle \vdash_\mathrm{GL}\square^+(B\leftrightarrow C)\rightarrow (D(B)\leftrightarrow D(C)),$

so by Necessitation and distribution

$\displaystyle \vdash_\mathrm{GL}\square\square^+(B\leftrightarrow C)\rightarrow \square(D(B)\leftrightarrow D(C)).$

The ${\square}$ distributes correctly over both ${\rightarrow}$ and ${\wedge}$ such that the consequent implies ${\square D(B)\leftrightarrow\square D(C)}$, i.e. implies ${A(B)\leftrightarrow A(C)}$.

By one of our useful facts about ${\square^+}$ above, the ${\square\square^+}$ in the hypothesis can be replaced with ${\square}$ without changing its meaning; then we can strengthen the hypothesis by changing ${\square}$ to ${\square^+}$. Putting these two observations together,

$\displaystyle \vdash_\mathrm{GL}\square^+(B\leftrightarrow C)\rightarrow (A(B)\leftrightarrow A(C)).$

$\Box$

The fixed point theorem is also proved by an induction on complexity, though it takes a different form. As one should expect, most of the difficulty is in handling the ${\square}$‘s.

Lemma 4 below treats this case, constructing fixed points for formulas of the form ${\square C(p)}$. Before giving its proof, we show that it’s a relatively straight path from there to the existence clause of the full fixed point theorem. Uniqueness is proved in Lemma 5.

• Note that if ${A(p)}$ is modalized in ${p}$, then ${A(p)}$ can be decomposed as ${B(\square C_1(p),\ldots,\square C_m(p))}$ for some formulas ${B}$, ${C_1,\ldots,C_m}$ only containing variables from ${A}$, and with ${B}$ not containing ${p}$. The decomposition is not unique: if you construct it by induction on ${A}$, you will have opportunities (namely at the ${\rightarrow}$ step) to simplify the decomposition (namely by combining like terms; more complex simplification may be possible as well). As a trivial example, ${\square p\rightarrow\square p}$ can be decomposed as either:
• ${B(q_1,q_2)\equiv q_1\rightarrow q_2}$, and ${C_1(p)\equiv p}$, and ${C_2(p)\equiv p}$ (demonstrating no simplification), or
• ${B(q)\equiv q\rightarrow q}$ and ${C(p)\equiv p}$ (demonstrating simplification).
• In the case when ${m=1}$, i.e. when ${A(p)}$ can be decomposed as ${B(\square C(p))}$, simply use the assumed lemma to produce a fixed point ${D}$ of ${\square C(B(p))}$,

$\displaystyle \vdash_\mathrm{GL} D\leftrightarrow\square C(B(D)).$

By Lemma 3 (b), it follows that ${B(D)}$ is a fixed point of ${A}$.

• For higher ${m}$, it helps to consider systems ${p_i\leftrightarrow\square C_i(p_1,\ldots,p_{m})}$ for ${i=1,\ldots,m}$ of formulas to be solved. That is, we wish to produce formulas ${D_i}$ with

$\displaystyle \vdash_\mathrm{GL} D_i\leftrightarrow\square C_i(D_1,\ldots,D_m)$

for ${i=1,\ldots,m}$, under the constraint that none of ${p_1,\ldots,p_m}$ appear in ${D_i}$ and any other variable in ${D_i}$ is also present in ${C_i}$.

• It is proved by induction on ${m}$ that such systems can be solved; the base case ${m=1}$ is Lemma 4. Suppose size ${m}$ systems can be solved, and fix a size ${m+1}$ system of ${C_i}$ and ${p_i}$. Solve the smaller system for just ${i=1,\ldots,m}$, producing formulas ${D_i(p_{m+1})}$ for ${i=1,\ldots,m}$ with

$\displaystyle \vdash_\mathrm{GL} D_i(p_{m+1})\leftrightarrow\square C_i(D_1(p_{m+1}),\ldots, D_m(p_{m+1}),p_{m+1}).$

Then find a fixed point ${D_{m+1}}$ for ${p_{m+1}\leftrightarrow\square C_{m+1}(D_1(p_{m+1}),\ldots, D_m(p_{m+1}),p_{m+1})}$ by Lemma 4, and “backsolve”: ${D_1(D_{m+1}),\ldots,D_m(D_{m+1}),D_{m+1}}$ solves the original size ${m+1}$ system.

• The fixed point theorem falls out: given ${A(p)\equiv B(\square C_1(p),\ldots,\square C_m(p))}$, let ${D_1,\ldots,D_m}$ be a solution to the system ${p_i\leftrightarrow\square C_i(p_i)}$, and let ${D\equiv B(D_1,\ldots,D_m)}$. Then ${m}$ applications of Lemma 3 (b) show that ${D}$ is a fixed point of ${A}$.

• It may be of interest to solve systems of equivalences ${p_i\leftrightarrow A_i(p_1,\ldots,p_m)}$ for general formulas ${A_i}$ modalized in each ${p_1,\ldots,p_m}$. This is in particular the application needed in the LaVictoire et al paper discussed in the previous post. The argument two bullet points above uses the special case ${A(p)\equiv\square C(p)}$ of the fixed point theorem (aka Lemma 4) to solve such systems when each ${A_i}$ also begins with a ${\square}$; with the general fixed point theorem established, one solves arbitrary modalized systems by a completely analogous procedure.

So to construct fixed points, we just need to prove Lemma 4. It’s actually not that hard.

Lemma 4 For any formula ${C(p)}$, the formula ${\square C(p)}$ has a fixed point. Specifically, the fixed point can be taken to be ${\square C(\top)}$:

$\displaystyle \vdash_\mathrm{GL}\square C(\top)\leftrightarrow\square C(\square C(\top)).$

Proof: Observe for any formulas ${A}$ and ${B}$ that if ${\vdash_\mathrm{GL}\square A\rightarrow B}$, then by introduction of another ${\square}$ (by Necessitation and distribution) and the fact that ${\square A\rightarrow\square\square A}$, we also have

$\displaystyle \vdash_\mathrm{GL}\square A\rightarrow\square^+B.$

Apply this observation to the tautology

$\displaystyle \vdash_\mathrm{GL}\square C(\top)\rightarrow(\top\leftrightarrow\square C(\top))$

to get

$\displaystyle \vdash_\mathrm{GL}\square C(\top)\rightarrow\square^+(\top\leftrightarrow\square C(\top)). \ \ \ \ \ (2)$

Apply Lemma 3 (a) with formula ${\square C(p)}$ to (the consequent of) (2) to get the desired left-to-right implication

$\displaystyle \begin{array}{l} \vdash_\mathrm{GL}\square C(\top)\rightarrow(\square C(\top)\leftrightarrow\square C(\square C(\top)))\\ \vdash_\mathrm{GL}\square C(\top)\rightarrow\square C(\square C(\top)). \end{array}$

For the left-to-right implication, first apply Lemma 3 (a) with formula ${C(p)}$ to (2):

$\displaystyle \vdash_\mathrm{GL}\square C(\top)\rightarrow(C(\top)\leftrightarrow C(\square C(\top))).$

Rearrange the above:

$\displaystyle \vdash_\mathrm{GL} C(\square C(\top))\rightarrow(\square C(\top)\rightarrow C(\top)),$

then introduce another ${\square}$, and use formal Löb’s theorem for ${C(\top)}$:

$\displaystyle \begin{array}{l} \vdash_\mathrm{GL}\square C(\square C(\top))\rightarrow\square(\square C(\top)\rightarrow C(\top))\\ \vdash_\mathrm{GL}\square C(\square C(\top))\rightarrow\square C(\top). \end{array}$

$\Box$

Lemma 5 (Uniqueness of fixed points) For modalized ${A(p)}$,

$\displaystyle \vdash_\mathrm{GL}\big(\square^+(p\leftrightarrow A(p))\wedge\square^+(q\leftrightarrow A(q))\big)\rightarrow (p\leftrightarrow q).$

In particular, if ${\vdash_\mathrm{GL}D\leftrightarrow A(D)}$ and ${\vdash_\mathrm{GL}E\leftrightarrow A(E)}$, then substituting ${D,E}$ for ${p,q}$ above will give ${\vdash_\mathrm{GL}D\leftrightarrow E}$.

Proof: Decompose ${A(p)}$ as ${B(\square C_1(p),\ldots,\square C_m(p))}$. By Lemma 3 (a),

$\displaystyle \vdash_\mathrm{GL}\square^+(p\leftrightarrow q)\rightarrow (C_i(p)\leftrightarrow C_i(q))$

for ${i=1,\ldots,m}$. Use the same trick as at the end of the proof of Lemma 3: introduce a ${\square}$, distribute, and note that ${\square\square^+}$ can be replaced with just ${\square}$ to get

$\displaystyle \vdash_\mathrm{GL}\square(p\leftrightarrow q)\rightarrow(\square C_i(p)\leftrightarrow\square C_i(q))$

for ${i=1,\ldots, m}$. Introducing another ${\square}$ gives

$\displaystyle \vdash_\mathrm{GL}\square\square(p\leftrightarrow q)\rightarrow\square(\square C_i(p)\leftrightarrow\square C_i(q)).$

By the above two lines and ${\square(p\leftrightarrow q)\rightarrow\square\square(p\leftrightarrow q)}$, we have

$\displaystyle \vdash_\mathrm{GL}\square(p\leftrightarrow q)\rightarrow\square^+(\square C_i(p)\leftrightarrow\square C_i(q)).$

Then applying Lemma 3 (a) ${m}$ times,

$\displaystyle \vdash_\mathrm{GL}\square(p\leftrightarrow q)\rightarrow(A(p)\leftrightarrow A(q)). \ \ \ \ \ (3)$

Note: this is where it is critical that ${A}$ is modalized in ${p}$. If we hadn’t gone through the above, we’d only have ${\vdash_\mathrm{GL}\square^+(p\leftrightarrow q)\rightarrow(A(p)\leftrightarrow A(q))}$, with the stronger hypothesis. In fact, for the non-modalized formula ${A(p)\equiv p}$, (3) is clearly false, or else we’d have ${\vdash_\mathrm{GL}p\leftrightarrow q}$ by Löb’s rule.

Now by some propositional rearrangements of (3) (which wouldn’t be possible with ${\square^+}$ in the hypothesis), we have

$\displaystyle \vdash_\mathrm{GL}\big((p\leftrightarrow A(p))\wedge(q\leftrightarrow A(q))\big)\rightarrow\big(\square(p\leftrightarrow q)\rightarrow(p\leftrightarrow q)\big).$

Introduce and distribute another ${\square}$ to apply the Löb axiom to the consequent, getting

$\displaystyle \vdash_\mathrm{GL}\square\big((p\leftrightarrow A(p))\wedge (q\leftrightarrow A(q))\big)\rightarrow\square(p\leftrightarrow q).$

Combining the last two lines,

$\displaystyle \vdash_\mathrm{GL}\square^+\big((p\leftrightarrow A(p))\wedge (q\leftrightarrow A(q))\big)\rightarrow(p\leftrightarrow q);$

the proof is completed by noting that ${\square^+}$ distributes over ${\wedge}$ just as ${\square}$ does. $\Box$

4. Kripke Semantics

Models for Gödel-Löb are described by the Kripke semantics, a formalization of the old (probably confused) notion of “possible worlds”. The benefit of studying the Kripke semantics is that there is a class of finite models such that a formula ${A}$ is proved in ${\mathrm{GL}}$ if and only if every model in the class satisfies ${A}$. This shows that the theorems of ${\mathrm{GL}}$ are a computable set: a two-threaded program enumerating both proofs and finite models can decide in finite time whether or not ${A}$ is a theorem.

Actually, the proof gives a bit more: for a given formula ${A}$, one can derive a primitive recursive upper bound (in the length of ${A}$) on the size of the smallest model not satisfying ${A}$. Thus the theorems of ${\mathrm{GL}}$ are actually primitive recursive.

In this section we present the Kripke semantics only as they are relevant to ${\mathrm{GL}}$; in other modal logics, other types of relations will be more appropriate than partial orders.

• A finite Kripke model for ${\mathrm{GL}}$ is a tuple ${\mathbf W=(W,<,\models)}$. Here ${W}$ is a non-empty finite set “of worlds”, and ${<}$ is a (strict) partial ordering (it is irreflexive and transitive). Also, ${\models}$ is an arbitrary “satisfaction” relation from ${K}$ to propositional variables.
• Then ${\models}$ is extended to all formulas with
• ${w\not\models\bot}$
• ${w\models A\rightarrow B}$ iff ${w\not\models A}$ or ${w\models B}$
• ${w\models\square A}$ iff ${(\forall x (the only interesting one).

All the derived connectives then behave as expected, e.g. ${w\models\neg A}$ iff ${w\not\models A}$.

• Because of the superficial similarity with the forcing relation from set theory, some texts use the forcing symbol ${\Vdash}$ instead of ${\models}$. But unlike set theoretic forcing, it is not generally the case if ${w\models A}$ and ${x, then ${x\models A}$! Note, though, by transitivity of ${<}$, that this does hold if ${A}$ is of the form ${\square B}$.
• Further note that minimal worlds (with nothing ${<}$ than them) satisfy ${\square A}$ for every ${A}$, and in particular satisfy ${\square\bot}$. This is partially “corrected” as you move up the ordering: non-minimal worlds do not satisfy ${\square\bot}$, and so satisfy ${\neg\square\bot}$ instead. More generally, a world at height ${n}$ (where we say minimal worlds have height ${0}$, and non-minimal worlds have height ${1}$ plus the maximum height of all lesser worlds) satisfies ${\neg\square^n\bot}$ (and lower powers) and ${\square^{n+1}\bot}$ (and higher powers). Thus ancestors satisfy strictly fewer statements starting with ${\square}$ than do their descendants.

• We also say a model ${\mathbf W}$ satisfies ${A}$, also written ${\mathbf W\models A}$, to mean that every ${w\in W}$ satisfies ${A}$. Note that ${\mathbf W\not\models A}$ does not imply ${\mathbf W\models\neg A}$, but only that there is some ${w\in W}$ with ${w\models\neg A}$.1

That the family of finite Kripke models is sufficient to capture all syntactic behavior of ${\mathrm{GL}}$ is the content of the modal soundness and completeness theorems.

Theorem 6 (Modal soundness) ${\mathrm{GL}}$ is sound for finite partial orders, in that any theorem of ${\mathrm{GL}}$ is satisfied in every such model.

Theorem 7 (Modal completeness) The finite partial orders are complete for ${\mathrm{GL}}$, in that any formula satisfied in every such model is a theorem of ${\mathrm{GL}}$.

Modal completeness is proved in the contrapositive by constructing models for arbitrary non-theorems.

It’s worth pointing out that finite trees are still complete for ${\mathrm{GL}}$. By a tree, we mean a (strict) partial order ${<}$ such that for every node ${x}$, the set of ${w>x}$ is linearly ordered by ${<}$. This strengthening is established by outlining a simple procedure for “straightening out” an arbitrary finite partial order, so that the modal completeness theorem for trees will follow easily from the modal completeness theorem as it is stated above (modal soundness is unaffected).

Boolos does this as follows:

• Given a finite Kripke model ${\mathbf W}$, let ${X}$ be the set of all non-empty decreasing sequences in ${W}$, so ${X}$ is still finite.
• Order ${X}$ by reverse end-extension, e.g. ${\langle w_1,\ldots,w_m,w_{m+1}\rangle<\langle w_1,\ldots,w_m\rangle}$ and so forth. This order clearly satisfies the tree property.
• Define a satisfaction relation on ${X}$ by ${\langle w_1,\ldots,w_m\rangle\models p}$ if and only if the smallest world in the sequence ${w_m\models p}$ in the original model ${\mathbf W}$.
• This makes ${X}$ into a model ${\mathbf X}$ which satisfies exactly the same formulas as ${\mathbf W}$.

Note that we don’t require a tree to have a maximum “root” node, so it might be more accurate to call our trees forests, consisting of several independent trees. Of course, if the modal completeness theorem can construct a forest-shaped model not satisfying a certain formula ${D}$, then there’s a single-rooted tree within that forest which (is itself a Kripke model and) also doesn’t satisfy ${D}$, so modal completeness still holds for the family of trees with a maximum node.

We conclude the section with proofs of modal soundness and completeness.

Proof of modal soundness: Propositional tautologies and the distribution schema are all “obviously” satisfied in all models, and Modus Ponens preserves satisfaction. Necessitation also preserves satisfaction in a model: if every ${w\in W}$ satisfies ${A}$, then in particular every ${x does, so every ${w}$ satisfies ${\square A}$.

Then we just need to show that the Löb axiom is satisfied in every model, which is a bit more interesting. Suppose ${w\models\square(\square A\rightarrow A)}$. That is, suppose

$\displaystyle (\forall x

which can be expanded again to

$\displaystyle (\forall x

The conclusion ${w\models\square A}$ we wish to derive is expanded as

$\displaystyle (\forall z

so the Löb axiom can be read as an induction principle for the satisfaction relation. There are no infinite descending chains in a finite model (transitivity plus irreflexivity forbids cycles), so the induction principle holds, and the Löb axiom is satisfied. $\Box$

Before moving on to modal completeness, we note that there’s one other kind of converse to the modal soundness theorem which may be worth pointing out. The definition of Kripke models can be stated perfectly well if ${R}$ is an arbitrary relation on an arbitrary (not necessarily finite) set ${W}$. As mentioned, other types of ${R}$ are more appropriate to other modal logics. But it turns out that the only kind of ${R}$‘s which will satisfy a modal soundness theorem for ${\mathrm{GL}}$ are the well-founded strict partial orders. (And it should be clear that the proof of the modal soundness theorem goes through just fine when ${(W,<)}$ is merely well-founded, instead of finite.)

Proof of modal completeness: If a formula ${D}$ is not proved in ${\mathrm{GL}}$, we need to construct a Kripke model ${\mathbf{W}}$ with ${\mathbf W\not\models D}$.

Worlds in the model we construct will be sets of modal formulas. The idea is that for each ${w}$ in a finite Kripke model, the relation ${\models}$ picks out the set ${S_w}$ of formulas that hold at ${w}$. When constructing our model, we’ll want fine control over which formulas are satisfied where, so we’ll construct ${w}$ in terms of ${S_w}$. Actually, what we’ll have is (almost) ${w=S_w}$, and we’ll do this in a way such that set membership (as in, ${A\in x}$) will define a valid satisfaction relation (${x\models A}$). To guide our construction, here are some properties that ${S_w}$‘s have in finite models:

• ${S_w}$ is consistent in that ${\mathrm{GL}}$ doesn’t refute any finite conjunction of formulas in ${S_w}$.
• ${S_w}$ is complete, in that for every ${A}$, either ${A}$ or ${\neg A}$ is in ${S_w}$.
• If ${x and ${\square A\in S_w}$, then both ${A}$ and ${\square A}$ belong to ${S_x}$. Actually ${S_x}$ contains strictly more formulas of the form ${\square A}$ than does ${S_x}$, by the earlier observation about heights.

Then the basic idea is that for any formula ${D}$ not proved by ${\mathrm{GL}}$, the singleton ${\{\neg D\}}$ is consistent, and can be extended to a complete consistent set ${y}$. If we do everything correctly so that ${\in}$ defines a valid satisfaction relation, then ${y\not\models D}$.

But since we want to construct a finite model, we can’t just take ${W}$ to be the set of consistent, complete sets of formulas. Rather, we want ${W}$ to be the set of “consistent, complete sets of formulas relative to the subformulas of ${D}$“. (Subformulas are defined in the obvious way: ${A}$ is a subformula of ${A}$, of ${\square A}$, of ${A\rightarrow B}$, and of ${B\rightarrow A}$, and “is a subformula of” is a transitive relation.)

So with reference to the list of properties above, let ${W}$ be the set of all sets ${w}$ of subformulas of ${D}$ and negations of subformulas of ${D}$ such that ${w}$ is consistent (the conjunction ${\bigwedge w}$ is not disproved by ${\mathrm{GL}}$) and ${D}$-complete (for any subformula ${A}$ of ${D}$, either ${A}$ or ${\neg A}$ lies in ${w}$). Then say ${x iff both

1. For every ${\square A}$ in ${w}$, both ${A}$ and ${\square A}$ belong to ${x}$;
2. There is a subformula ${\square B}$ of ${D}$ which belongs to ${x}$ but not ${w}$.

The relation ${<}$ is transitive by (1) and irreflexive by (2), and ${W}$ still contains an extension of ${\{\neg D\}}$.

Finally, we want to use set membership to define a satisfaction relation. Namely for ${w\in W}$, and each propositional variable ${p}$, put ${w\models p}$ iff ${p\in w}$ (in particular, ${w\not\models q}$ if ${q}$ does not appear in ${D}$, though this choice is not essential). Extend ${\models}$ to more complex formulas as required in the definition of the Kripke models.

We need ${\models}$ to be compatible with set membership (for subformulas of ${D}$ and their negations). This is proved by induction on the complexity of formulas, and the only non-trivial step is dealing with ${\square}$, so it’s the only one we’ll treat. Specifically, we need to show that a subformula ${\square B}$ of ${D}$ belongs to ${w}$ if and only if for all ${x, ${B}$ belongs to ${x}$. The relation ${<}$ defined above clearly satisfies the left-to-right implication.

Conversely, suppose ${\square B}$ does not belong to ${w}$; we need to construct ${x with ${\neg B}$ in ${x}$.

We construct such ${x}$ by extending the following set ${X}$ to an element of ${W}$. This will be possible assuming ${X}$ is consistent, and the definition of ${X}$ will guarantee ${x. Here ${X}$ consists of:

• ${\neg B}$
• All ${A}$ and all ${\square A}$, for ${\square A}$ in ${w}$
• ${\square B}$ (to get ${x, we just need some ${\square C}$ not in ${w}$, but ${\square B}$ is the obvious choice here).

Thus the last thing we need to prove is that ${X}$ is consistent, so suppose towards contradiction that it isn’t. Let ${A_0,\ldots,A_{n-1}}$ enumerate those sentences ${A}$ with ${\square A\in w}$, so that the inconsistency of ${X}$ implies

$\displaystyle \vdash_\mathrm{GL}\left(\bigwedge_{i

where ${\square^+A}$ is the abbreviation ${A\wedge\square A}$ from Section 3.

The consequent above is almost ready for Löb’s theorem; use the familiar trick of introducing a ${\square}$ and distributing it over the ${\rightarrow}$. This ${\square}$ also distributes over the conjunction, and recall that ${\square\square^+}$ reduces to ${\square}$. Then using Löb’s theorem on the consequent,

$\displaystyle \vdash_\mathrm{GL}\left(\bigwedge_{i

Since every ${\square A_i}$ is in ${w}$, this shows that ${\neg\square B}$ cannot belong to consistent ${w}$. Since ${\square B}$ is actually a subformula of ${D}$, it follows that ${\square B}$ belongs to ${w}$ by ${D}$-completeness. But ${\square B}$ does not belong to ${w}$, so ${X}$ must actually be consistent, and the construction goes through. $\Box$

Notes

1. This definition follows Boolos’ text. Lindström’s article requires models to have a maximum “root” world, and says a model satisfies ${A}$ iff the root does. Then in Lindström’s version, ${\mathbf{W}\not\models A}$ and ${\mathbf{W}\models\neg A}$ are equivalent. However, it seems to me that Lindström’s models fail the Necessitation axiom, as satisfaction at the root does not imply satisfaction at lower worlds, except for formulas starting with ${\square}$.