Almost exactly 12 years ago, I wrote an article1 describing an idea on how to hand-build a Turing Machine which runs for more steps than Graham’s number \(g_{64}\) (a famously huge number). Doing so would give a constructive upper bound on the smallest \(N\) such that \(BB(N) > g_{64}\) and perhaps provide a little more intuition on how the Busy Beaver function grows.

I worked on this problem off-and-on for about a month in 2010, but never ended up beating Graham and eventually this goal faded into the back of my mind. Then, a couple months later, a fellow Wikipedian, Robert Sawyer (“r.e.s.”) found a 91-state machine that beat Graham’s number! Over the years, he and others have continued to make improvements.2 In 2021, Daniel Nagaj claimed \(BB(16) > g_{64}\) in a Comment on the “Googology” fandom website.

After seeing BB(6) enter tetration territory, I was inspired to go back and try to investigate some of these TMs claiming to beat Graham. Are they legit? How do they work? In this article I will analyze Nagaj’s 16-state machine and prove that it does (indeed) surpass Graham’s number.

Graham’s Number

Graham’s number is a famously huge number. It was popularized by Martin Gardner in his “Mathematical Games” column and was awarded the “largest number ever used in a serious mathematical proof” by the Guinness Book of World Records in 1980. Using Knuth up-arrow notation, it is recursively defined by:

\[g_0 = 4\] \[g_{n+1} = 3 ↑^{g_n} 3\]

Even \(g_1 = 3↑↑↑↑3\) is already spectacularly huge, and the \(g_n\) only grow more and more gigantic. Graham’s number is the 64th iteration: \(g_{64}\).
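
To make the recursion concrete, here is a tiny Python sketch of Knuth up-arrows and the \(g_n\) recursion (the function names are mine). Of course, nothing beyond the very smallest inputs can actually be evaluated; even \(g_1\) is hopelessly out of reach:

```python
def arrow(a, k, b):
    """Knuth's up-arrow a ↑^k b.  Only tiny inputs are feasible."""
    if k == 1:
        return a ** b
    if b == 0:
        return 1                                  # a ↑^k 0 = 1 for k >= 2
    return arrow(a, k - 1, arrow(a, k, b - 1))

def g(n):
    """Graham's sequence: g(0) = 4, g(n+1) = 3 ↑^{g(n)} 3.
    Do not actually call this for n >= 1."""
    return 4 if n == 0 else arrow(3, g(n - 1), 3)

assert arrow(3, 2, 3) == 3**27 == 7_625_597_484_987   # 3↑↑3 is still computable
```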

Designing a TM

In almost all of my previous TM analyses, I was analyzing a TM that was discovered in the wild. Generally one found by a computer exhaustively simulating millions of machines. And my goal was to bring order to the chaos, to see if it was possible to model this machine in a structured (human) way. In contrast, here I’m analyzing a machine that was designed intentionally by a human. The creators of these TM competitors had a plan for how they intended to beat Graham’s number and they built TMs to serve that purpose. In this case, the specific goal was to break the problem into two pieces: (1) Create a TM which computes a sufficiently fast-growing function on its input and (2) produce an initial input large enough that that function overtakes Graham’s number.

The Machine

I will first present the entire TM in all of its glory so that we are talking about a concrete machine. Don’t expect to make heads or tails of it yet; I will walk through it step-by-step below.

  0 1
D2 0 R D4 1 R D2
D4 1 R A7 0 L D10
D10 1 L D0 0 L A9
D0 1 R D2 1 L D0
A7 1 R E1 0 R A8
A8 1 R A7 1 L A12
A12 0 L A5 1 L E13
A5 1 R A9 1 L A9
A9 1 L D10 1 R A11
A11 0 R A9 1 L A5
E1 1 R M14 0 L E13
E13 1 L A12 0 L A12
M14 1 L S3 0 L M15
M15 0 L D4 1 L M16
M16 1 L M15 1 L M15
S3 1 L E1 1 R Halt

Note: The starting state is M14 (bolded).

This is exactly the same machine as Daniel Nagaj posted on the Googology forum (See Googology Comment and TM Definition for the originals) except that I have renamed and reordered the states in order to improve interpretability (I hope!). A few notes:

  • The numerical subscripts of each state are the original state numbers used by Nagaj. I have preserved them so that it is easier to compare between Nagaj’s definition and my own (and to expedite detection of any possible typos!). Note that they each use a unique number 0-16 (skipping 6).3
  • The uppercase letters used for states represent my interpretation of the “components” that make up the TM. I will describe each component in detail below.
  • Finally, the states have been re-ordered so that all the states in each component are together and the components are ordered. Note specifically that I have not followed the TNF convention of listing the start state first and all remaining states in order of first use.
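
If you want to poke at the machine yourself, the table above is straightforward to transcribe into a little simulator. Here is a minimal Python sketch (the dictionary below is just the table; the run() helper and its conventions are my own). It obviously cannot run the machine to completion, but it is handy for spot-checking the component rules described below:

```python
# Transcription of the 16-state table above: state -> (action on 0, action on 1),
# where each action is (symbol to write, direction to move, next state).
TM = {
    "D2":  (("0", "R", "D4"),  ("1", "R", "D2")),
    "D4":  (("1", "R", "A7"),  ("0", "L", "D10")),
    "D10": (("1", "L", "D0"),  ("0", "L", "A9")),
    "D0":  (("1", "R", "D2"),  ("1", "L", "D0")),
    "A7":  (("1", "R", "E1"),  ("0", "R", "A8")),
    "A8":  (("1", "R", "A7"),  ("1", "L", "A12")),
    "A12": (("0", "L", "A5"),  ("1", "L", "E13")),
    "A5":  (("1", "R", "A9"),  ("1", "L", "A9")),
    "A9":  (("1", "L", "D10"), ("1", "R", "A11")),
    "A11": (("0", "R", "A9"),  ("1", "L", "A5")),
    "E1":  (("1", "R", "M14"), ("0", "L", "E13")),
    "E13": (("1", "L", "A12"), ("0", "L", "A12")),
    "M14": (("1", "L", "S3"),  ("0", "L", "M15")),
    "M15": (("0", "L", "D4"),  ("1", "L", "M16")),
    "M16": (("1", "L", "M15"), ("1", "L", "M15")),
    "S3":  (("1", "L", "E1"),  ("1", "R", "Halt")),
}

def run(tape, pos, state="M14", max_steps=10**6):
    """Simulate on a dict-based tape (missing cells read as "0").

    Returns (tape, pos, state, steps) once the machine halts or max_steps is hit.
    """
    steps = 0
    while state != "Halt" and steps < max_steps:
        write, move, state = TM[state][int(tape.get(pos, "0"))]
        tape[pos] = write
        pos += 1 if move == "R" else -1
        steps += 1
    return tape, pos, state, steps
```

Running run({}, 0) from a blank tape will, of course, simply hit max_steps; the point of this sketch is to check small concrete configurations.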

Configuration Notation

In order to analyze this machine, let me first define some notation:

  • \(S(a, b, c, ..., z)\) = 0 1^a D2> 0 1^b 0 1^c 0 ... 0 1^z 0

for a parameterized TM configuration that will be the basis for this analysis. Note that in this notation:

  • \(S\) can have any (finite) number of parameters (Let’s say minimum 1 param).
  • Parameters may be 0. Ex: \(S(13, 0, 8)\) = 0 1^13 D2> 00 1^8 0. Parameters being 0 corresponds to 2 (or more) 0 symbols adjacent on the tape (without any 1s in between).
  • Any trailing 0s may be removed or added: \(S(5, 26, 0, 0) = S(5, 26)\).

Let me further define the notation:

\[S(a, 1 * n, b, ...) = S(a, \underbrace{1, 1, ..., 1}_{n \mbox{ copies of } 1}, b, ...)\]

so for example:

  • \(S(a, 1 * n, b)\) = 0 1^a D2> 0 (10)^n 1^b 0
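
For experiments it also helps to be able to turn the parameters of \(S(a, b, c, ..., z)\) into an explicit tape for the simulator above. A small sketch (the make_S name is mine); it returns the tape and the head position, with the machine meant to be started in state D2 on the 0 just right of the first block of 1s, exactly as in the notation above:

```python
def make_S(*params):
    """Build the tape for S(p1, p2, ..., pk):

        0  1^p1  D2>  0  1^p2  0  1^p3  0 ... 0  1^pk  0

    Returns (tape, head_position): the head sits on the 0 just right of the
    first block of 1s, and the machine should be started in state D2.
    """
    tape = {}
    first, rest = params[0], params[1:]
    for i in range(1, first + 1):      # 1^p1 immediately to the left of the head
        tape[-i] = "1"
    pos = 1                            # head is at cell 0 (a blank 0)
    for p in rest:                     # 0-separated blocks of 1s to the right
        for _ in range(p):
            tape[pos] = "1"
            pos += 1
        pos += 1                       # the separating 0
    return tape, 0
```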

Components

As I discussed above, I have broken this TM’s states up into a sequence of components. My guess is that this roughly corresponds to how Nagaj designed this TM, by incrementally building bigger and bigger TMs that compute more and more powerful functions. The components are:

  1. Doubler (D) which roughly computes \(f_1(n) = 2n\).
  2. Ackermann (A) which roughly computes an Ackermann function \(A(n)\).
  3. Expandal (E) which roughly computes a repeated Ackermann function (like: \(E(n) = A^n(n)\)).
  4. Multiexpandal (M) which roughly computes a repeated “Expandal” function \(E^k(n)\).
  5. Setup (S) which is necessary only to set up the input tape configuration on which the previous components will compute.

Note: The names “expandal” and “multiexpandal” are taken from Nagaj’s description. I believe that they are Googology jargon for the respective functions I have listed above (repeated Ackermann and repeated expandal, respectively). They are not my favorite terms, but I use them here in order to have a distinct name for these components.

Doubler Component

The first component of this TM that I will describe is the Doubler:

  0 1
D2 0 R D4 1 R D2
D4 1 R A7 0 L D10
D10 1 L D0  
D0 1 R D2 1 L D0

This is a stand-alone 4-state machine that will evaluate a “doubling” function. Note:

  • I have removed the (D10, 1) transition since it is not necessary to this component.
  • We can think of the (D4 0) -> 1 R A7 as a halting transition for now. Reaching it indicates that this component has completed running.

Doubler Rules

The Doubler’s behavior is described by the rule:

  • Rule D: \(S(a, b+1, ...) \to S(a+2, b, ...)\) in \(2a + 5\) steps.

and applying that rule repeatedly we get:

  • Rule Dx: \(S(a, b, ...) \to S(a+2b, 0, ...)\)

And thus you can see the way in which this TM computes a doubling (\(f_1(n) = 2n\) like) function.

I leave the proof of Rule D (and all subsequent rules) as an exercise to the reader. The process of proving these is similar to that in my previous analyses. (See for example: Collatz-like behavior of Busy Beavers and Complex Collatz-like Behavior).
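
That said, rules like Rule D are easy to spot-check by brute force with the sketches above. Here is one way: a reaches() helper (my own scaffolding) that simulates from one \(S\)-configuration until the tape first matches another one, followed by a quick check of Rule D over small parameters:

```python
def config(params):
    """The canonical S(params) configuration: the set of 1-cells, as offsets
    relative to the head (which scans a 0 in state D2)."""
    tape, pos = make_S(*params)
    return frozenset(k - pos for k, v in tape.items() if v == "1")

def reaches(start, target, max_steps=100_000):
    """Simulate from S(*start) in state D2; return the step count at which the
    machine first sits in the canonical S(*target) configuration, else None."""
    (tape, pos), state = make_S(*start), "D2"
    want = config(target)
    for step in range(1, max_steps + 1):
        write, move, state = TM[state][int(tape.get(pos, "0"))]
        tape[pos] = write
        pos += 1 if move == "R" else -1
        ones = frozenset(k - pos for k, v in tape.items() if v == "1")
        if state == "D2" and ones == want:
            return step
    return None

# Spot-check Rule D over a range of small parameters:
assert all(reaches((a, b + 1), (a + 2, b)) for a in range(15) for b in range(10))
```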

Ackermann Component

Next is the Ackermann component:

  0 1
D2 0 R D4 1 R D2
D4 1 R A7 0 L D10
D10 1 L D0 0 L A9
D0 1 R D2 1 L D0
A7 1 R E1 0 R A8
A8 1 R A7 1 L A12
A12 0 L A5  
A5   1 L A9
A9 1 L D10 1 R A11
A11 0 R A9 1 L A5

This is a stand-alone 10-state TM built on top of the Doubler. As with the Doubler:

  • I have left a couple of transitions undefined since they are not needed for this component. (Note: I’ve also filled in the (D10, 1) -> 0 L A9 transition which will be needed at this point).
  • Reaching the (A7 0) -> 1 R E1 transition indicates that this component has completed running.

Ackermann Rules

The Ackermann Component follows the Doubler rules and additionally:

  • Rule A: \(S(a+1, 0, 1 * n, c+2, ...) \to S(3, 1 * n, a+2, c+1, ...)\)4

Note: Unless otherwise stated, this (and all other rules I define) apply for any integer values of all variables (\(\ge 0\)). So, the \(c+2\) in the input means that this value must be \(\ge 2\). Likewise, \(n\) may be 0 in which case we are in config \(S(a+1, 0, c+2)\).

This rule (and the Ackermann behavior in general) is quite a bit more complicated than the Doubler, so let’s explore the implications of this rule a bit. Starting from the configuration \(S(a, b, c, d, ...)\) what happens? (Assume \(a \ge 1\) for all)

S(a, 0, n, …)

\[S(a, 0, n+2, ...) \to S(3, a+1, n+1, ...) \to S(2a+5, 0, n+1, ...)\]

Let’s define \(s_0(n) = 2n + 5\), then we can see that:

  • Rule A0: \(S(a, 0, n+2, ...) \to S(s_0(a), 0, n+1, ...)\)
  • Rule A0x: \(S(a, 0, n+1, ...) \to S(s_0^n(a), 0, 1, ...)\)
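
Rule A0 can be spot-checked the same way, reusing the reaches() helper from the Doubler section:

```python
# Spot-check Rule A0 for small parameters (a >= 1), with s0(a) = 2a + 5:
assert all(reaches((a, 0, n + 2), (2 * a + 5, 0, n + 1))
           for a in range(1, 9) for n in range(6))
```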

S(a, 0, 1, n, …)

\[\begin{array}{rcl} S(a, 0, 1, n+2, ...) &\to& S(3, 1, a+1, n+1, ...) \to S(5, 0, a+1, n+1, ...) \\ &\to& S(s_0^a(5), 0, 1, n+1, ...) \\ \end{array}\]

Let \(s_{k+1}(n) = s_k^n(5)\), then:

  • Rule A1: \(S(a, 0, 1, n+2, ...) \to S(s_1(a), 0, 1, n+1, ...)\)
  • Rule A1x: \(S(a, 0, 1, n+1, ...) \to S(s_1^n(a), 0, 1, 1, ...)\)

S(a, 0, 1 * k, n, …)

And by induction, we can generalize this to:

  • Rule Ak: \(S(a, 0, 1 * k, n+2, ...) \to S(s_k(a), 0, 1 * k, n+1, ...)\)
  • Rule Akx: \(S(a, 0, 1 * k, n+1, ...) \to S(s_k^n(a), 0, 1 * k, 1, ...)\)

where:

  • \[s_0(n) = 2n + 5\]
  • \[s_{k+1}(n) = s_k^n(5)\]

Based on this definition \(s_k(n)\) grows like the (notoriously fast-growing) Ackermann function \(A(k, n)\) (which is why we’ve named it this way).
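
For concreteness, here is the \(s_k\) hierarchy transcribed into Python. Only the very smallest values can actually be computed before the numbers become astronomically large:

```python
def s(k, n):
    """The s_k hierarchy: s_0(n) = 2n + 5, s_{k+1}(n) = s_k applied n times to 5."""
    if k == 0:
        return 2 * n + 5
    x = 5
    for _ in range(n):
        x = s(k - 1, x)
    return x

assert s(0, 3) == 11 and s(1, 3) == 75    # s_1(n) = 10 * 2^n - 5
assert s(2, 2) == 10 * 2**315 - 5         # already a 96-digit number
```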

Expandal Component

Continuing to the Expandal component:

  0 1
D2 0 R D4 1 R D2
D4 1 R A7 0 L D10
D10 1 L D0 0 L A9
D0 1 R D2 1 L D0
A7 1 R E1 0 R A8
A8 1 R A7 1 L A12
A12 0 L A5 1 L E13
A5 1 R A9 1 L A9
A9 1 L D10 1 R A11
A11 0 R A9 1 L A5
E1 1 R M14 0 L E13
E13 1 L A12 0 L A12

This is a stand-alone 12-state TM built on top of the Ackermann.

Expandal Rules

The Expandal Component follows the Ackermann rules and additionally:

  • Rule E0: \(S(2k+2, 0, 1 * n, 0, d+1, ...) \to S(3, 1, 1 * k, 2n+3, 0, d, ...)\) in \(4n + 2k + 20\) steps.
  • Rule E1: \(S(2k+3, 0, 1 * n, 0, d+1, ...) \to S(3, 3, 1 * k, 2n+3, 0, d, ...)\) in \(4n + 2k + 22\) steps.

Combining these rules with Rules Akx and Dx we see:

\[\begin{array}{rcl} S(2k+2, 0, 1 * n, 0, d+1, ...) &\to& S(3, 1, 1 * k, 2n+3, 0, d, ...) \\ &\to& S(5, 0, 1 * k, 2n+3, 0, d, ...) \\ &\to& S(s_k^{2n+2}(5), 0, 1 * (k + 1), 0, d, ...) \\ \end{array}\] \[\begin{array}{rcl} S(2k+3, 0, 1 * n, 0, d+1, ...) &\to& S(3, 3, 1 * k, 2n+3, 0, d, ...) \\ &\to& S(9, 0, 1 * k, 2n+3, 0, d, ...) \\ &\to& S(s_k^{2n+2}(9), 0, 1 * (k + 1), 0, d, ...) \\ \end{array}\]

It’s a bit more complicated than before (because of the parity differences, and multiple inputs/outputs (\(k, n\))) but let me define \(s_{\omega} : \mathbb{N} \times \mathbb{N} \to \mathbb{N} \times \mathbb{N}\) by (assuming \(k \ge 1\)):

  • \[s_{\omega}(2k , n) = (s_k^{2n+2}(5), k)\]
  • \[s_{\omega}(2k+1, n) = (s_k^{2n+2}(9), k)\]

So, I’m defining \(s_{\omega}\) as a function that takes a pair of arguments and returns a pair. In order to access each member of that pair, let \(\pi_1(m, n) = m\) and \(\pi_2(m, n) = n\) be the “projection functions”.

Finally we can say:

  • Rule E: \(S(m, 0, 1 * n, 0, k+1, ...) \to S(\pi_1(s_\omega(m, n)), 0, 1 * \pi_2(s_\omega(m, n)), 0, k, ...)\)
  • Rule Ex: \(S(m, 0, 1 * n, 0, k, ...) \to S(\pi_1(s_\omega^k(m, n)), 0, 1 * \pi_2(s_\omega^k(m, n)), 0, 0, ...)\)

As we’ll see, \(s_{\omega}\) grows like the Ackermann function, so Rule Ex shows how the “Expandal” component grows like the iterated Ackermann function.
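
Transcribing the definition of \(s_\omega\) into Python makes the pairing explicit (this reuses the s(k, n) sketch from the Ackermann section). In practice only something as small as \(s_\omega(2, 0)\) or \(s_\omega(3, 0)\) can actually be evaluated:

```python
def s_omega(m, n):
    """s_omega from the definition above (m = 2k or 2k+1, with k >= 1).
    Returns the pair; only the very smallest inputs are feasible to evaluate."""
    k, odd = divmod(m, 2)
    x = 9 if odd else 5
    for _ in range(2 * n + 2):        # apply s_k a total of (2n + 2) times
        x = s(k, x)
    return (x, k)

assert s_omega(2, 0) == (s(1, s(1, 5)), 1)    # = (10 * 2^315 - 5, 1)
```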

Multiexpandal Component

Alright, hold on tight everyone. We’re almost there!! The last (non-setup) component is the Multiexpandal:

  0 1
D2 0 R D4 1 R D2
D4 1 R A7 0 L D10
D10 1 L D0 0 L A9
D0 1 R D2 1 L D0
A7 1 R E1 0 R A8
A8 1 R A7 1 L A12
A12 0 L A5 1 L E13
A5 1 R A9 1 L A9
A9 1 L D10 1 R A11
A11 0 R A9 1 L A5
E1 1 R M14 0 L E13
E13 1 L A12 0 L A12
M14 1 L S3 0 L M15
M15 0 L D4 1 L M16
M16 1 L M15 1 L M15

This is a stand-alone 15-state TM built on top of the Expandal.

Multiexpandal Rules

The new rules are:

  • Rule M0: \(S(2k+2, 0, 1 * n, 0, 0, d+1, ...) \to S(3, 3, 0, 2k+2n+5, d, ...)\) in \(4n + 2k + 26\) steps.
  • Rule M1: \(S(2k+1, 0, 1 * n, 0, 0, d+1, ...) \to S(3, 3, 0, 2k+2n+5, d, ...)\) in \(4n + 2k + 26\) steps.

Combining these with previous rules we get (consider \(r \in \{1, 2\}\) in all equations below):

\[\begin{array}{rcl} S(2k+r, 0, 1 * n, 0, 0, d+1, ...) &\to& S(3, 3, 0, 2k+2n+5, d, ...) \\ &\to& S(9, 0, 0, 2k+2n+5, d, ...) \\ &\to& S(\pi_1(s_\omega^{2k+2n+5}(9, 0)), 0, 1 * \pi_2(s_\omega^{2k+2n+5}(9, 0)), 0, 0, d, ...) \\ \end{array}\]

So, let’s define \(s_{\omega + 1}(2k+r, n) = s_\omega^{2k+2n+5}(9, 0)\) so that we can write:

  • Rule M: \(S(m, 0, 1 * n, 0, 0, k+1, ...) \to S(\pi_1(s_{\omega + 1}(m, n)), 0, 1 * \pi_2(s_{\omega + 1}(m, n)), 0, 0, k, ...)\)
  • Rule Mx: \(S(m, 0, 1 * n, 0, 0, k, ...) \to S(\pi_1(s_{\omega + 1}^k(m, n)), 0, 1 * \pi_2(s_{\omega + 1}^k(m, n)), 0, 0, 0, ...)\)

As we’ll see, \(s_{\omega + 1}\) grows like the “Expandal” (iterated Ackermann) function, so Rule Mx demonstrates how the “Multiexpandal” component grows like the iterated Expandal (“Multiexpandal”) function.
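
And \(s_{\omega + 1}\) is just an iteration of \(s_\omega\), which we can transcribe the same way (reusing s_omega from the previous section). It cannot be evaluated for any input in practice; the sketch only pins down the definition:

```python
def s_omega_plus_1(m, n):
    """s_{omega+1} from above: iterate s_omega (2k + 2n + 5) times starting from
    (9, 0), where m = 2k + r with r in {1, 2}.  Not evaluable in practice."""
    r = 1 if m % 2 == 1 else 2
    k = (m - r) // 2
    pair = (9, 0)
    for _ in range(2 * k + 2 * n + 5):
        pair = s_omega(*pair)
    return pair
```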

Setup Component

The final “component” is the Setup component. I list only the single added state here:

  0 1
S3 1 L E1 1 R Halt

Through some sort of deep and powerful black magic of optimization it was somehow possible to add only a single state in order to augment this TM so that it could set up an input to our Multiexpandal that would not only be a valid input, but also “big” enough to beat Graham’s number!

Setup Rules

With the TM completely defined, we can finally add the last couple of rules needed to evaluate this TM:

  • Rule S: 0 M14> 0 \(\to S(3, 3, 0, 1, 1)\) in 21 steps.
  • Rule H: \(S(m, 0, 1*n, 0, 0, 0, ...) \to \mbox{Halt}(m + n + 4)\) in \(2n + 6\) steps. (Where, by notation \(\mbox{Halt}(m + n + 4)\), I mean that it halts with \(m + n + 4\) symbols on the tape.)

Rule S is the somewhat miraculous one (IMHO). It takes the Multiexpandal machine (which was not designed to work on a blank tape!) and, with the addition of only a single state, somehow produces an input sufficiently large to beat Graham (and it is only barely big enough: if it were missing the rightmost 1, it would fail!).
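
Rule H is small enough to spot-check directly with the simulator sketch from earlier (the check_rule_H helper is my own scaffolding):

```python
def check_rule_H(m, n):
    """Run from S(m, 0, 1*n, 0, 0, 0) until Halt and count the 1s left behind."""
    tape, pos = make_S(*((m, 0) + (1,) * n + (0, 0, 0)))
    tape, pos, state, steps = run(tape, pos, state="D2")
    return state == "Halt" and sum(v == "1" for v in tape.values()) == m + n + 4

assert all(check_rule_H(m, n) for m in range(1, 8) for n in range(8))
```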

Bringing it all Together

Alright, let’s bring everything we need together here and calculate the number of symbols that this TM leaves on the tape.

Rules

Summarizing the rules:

\[\begin{array}{llcl} \mbox{Rule S: } & \mbox{Blank Tape} &\to& S(3, 3, 0, 1, 1) \\ \mbox{Rule Dx: } & S(m, k, ...) &\to& S(m+2k, 0, ...) \\ \mbox{Rule Akx: } & S(m, 0, 1 * n, k+1, ...) &\to& S(s_n^k(m), 0, 1 * (n+1), ...) \\ \mbox{Rule Ex: } & S(m, 0, 1 * n, 0, k, ...) &\to& S(\pi_1(s_\omega^k(m, n)), 0, 1 * \pi_2(s_\omega^k(m, n)), 0, 0, ...) \\ \mbox{Rule Mx: } & S(m, 0, 1 * n, 0, 0, k, ...) &\to& S(\pi_1(s_{\omega + 1}^k(m, n)), 0, 1 * \pi_2(s_{\omega + 1}^k(m, n)), 0, 0, 0, ...) \\ \mbox{Rule H: } & S(m, 0, 1*n, 0, 0, 0, ...) &\to& \mbox{Halt}(m + n + 4) \\ \end{array}\]

Summarizing the \(s_*\) hierarchy of recursively defined functions:

\[\begin{array}{lcl} s_0(n) &=& 2n + 5 \\ s_{k+1}(n) &=& s_k^n(5) \\ s_{\omega}(2k , n) &=& (s_k^{2n+2}(5), k) \\ s_{\omega}(2k+1, n) &=& (s_k^{2n+2}(9), k) \\ s_{\omega + 1}(2k+r, n) &=& s_\omega^{2k+2n+5}(9, 0) \mbox{ (for } r \in \{1, 2\} \mbox{)} \\ \end{array}\]

Simulating from Blank Tape

\[\begin{array}{rcll} \mbox{Blank Tape} &\to& S(3, 3, 0, 1, 1) & \mbox{ by Rule S} \\ &\to& S(9, 0, 0, 1, 1) & \mbox{ by Rule Dx} \\ &\to& S(A, 0, 1 * 4, 0, 0, 1) & \mbox{ by Rule Ex} \\ &\to& S(B, 0, 1 * C, 0, 0, 0) & \mbox{ by Rule Mx} \\ &\to& \mbox{Halt}(B + C + 4) & \mbox{ by Rule H} \\ \end{array}\]

Where:

\[\begin{array}{ccl} (A, 4) &=& s_{\omega}(9, 0) = (s_4^2(9), 4) \\ (B, C) &=& s_{\omega+1}(A, 4) = s_\omega^{A-r+13}(9, 0) \\ \end{array}\]

where \(r\) is the unique value such that: \(r \in \{1, 2\}\) and \(r \equiv A \pmod{2}\).
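
The first arrow in this chain (Rule S) is also small enough to watch directly with the simulator sketched earlier; starting in state M14 on a blank tape, the machine does indeed reach the configuration \(S(3, 3, 0, 1, 1)\):

```python
# From a blank tape in state M14, confirm that we reach S(3, 3, 0, 1, 1).
tape, pos, state = {}, 0, "M14"
want, reached = config((3, 3, 0, 1, 1)), False
for _ in range(100):                     # Rule S only needs a couple dozen steps
    write, move, state = TM[state][int(tape.get(pos, "0"))]
    tape[pos] = write
    pos += 1 if move == "R" else -1
    ones = frozenset(k - pos for k, v in tape.items() if v == "1")
    if state == "D2" and ones == want:
        reached = True
        break
assert reached
```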

Evaluating the \(s_*\) Hierarchy

OK, but what the heck does \(s_\omega^{s_4^2(9)-r+13}(9, 0)\) evaluate to? Does it actually beat Graham’s number? Let me start by computing some values of the \(s_*\) hierarchy functions:

\[\begin{array}{rclcl} s_1(n) &=& s_0^n(5) = 10 \cdot 2^n - 5 &>& 2^n \\ s_2(n) &=& s_1^n(5) > (2↑)^n(5) = 2↑↑n[↑5] &>& 2↑↑n \\ & & \vdots & & \\ s_k(n) &=& s_{k-1}^n(5) > 2 ↑^k n [↑^{k-1} 5] &>& 2 ↑^k n \\ \pi_1(s_\omega(m, n)) &\ge& s_{\lfloor m/2 \rfloor}^{2n+2}(5) > 2 ↑^{\lfloor m/2 \rfloor + 1} (2n+2) [↑^{\lfloor m/2 \rfloor} 5] &>& 2 ↑^{\lfloor m/2 \rfloor + 1} (2n+4) \\ \end{array}\]

Note: \(2↑↑n[↑5]\) and \(2 ↑^k n [↑^{k-1} 5]\) are my “extended up-arrow notation” (the first one is a power tower of \(n\) 2s with a \(5\) on top).

So \(A = s_4^2(9) > 2↑↑↑↑2↑↑↑↑9\).
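
The first couple of rows of this table are small enough to sanity-check numerically with the arrow() and s() sketches from earlier (anything beyond these tiny cases is far out of reach):

```python
assert all(s(1, n) == 10 * 2**n - 5 for n in range(20))         # s_1 exactly
assert s(2, 1) > arrow(2, 2, 1) and s(2, 2) > arrow(2, 2, 2)    # s_2(n) > 2↑↑n
```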

Comparing to Graham’s Number

Remember the definition of Graham’s number:

\[g_0 = 4\] \[g_{n+1} = 3 ↑^{g_n} 3\]

Here’s a rough outline of how I claim that this TM beats it:

  • Lemma 1: \(4 \cdot 3 ↑^n 3 < 2 ↑^{2n} 5\)
  • Lemma 2: \(4 g_{n+1} < \pi_1(s_{\omega}(4 g_n, 0))\)
  • Lemma 3: \(4 g_n \le \pi_1(s_{\omega}^n(16, 0))\)
  • Corollary 4: Graham’s number \(g_{64} < \pi_1(s_{\omega}^{64}(16, 0)) < \pi_1(s_{\omega + 1}(65, 4)) < \pi_1(s_{\omega + 1}(A, 4)) = B\)

In other words, the intention is to prove that \(B > g_{A-1} > g_{2↑↑↑↑2↑↑↑↑9}\), which is obviously way, way bigger than \(g_{64}\)! And remember, this TM halts with \(B + C + 4\) symbols on the tape, so this is enough to prove that this TM beats Graham’s number and \(BB(16) > g_{64}\).

I leave the proof of these inequalities as an exercise for the reader :) They are extremely loose inequalities. However, comparing Knuth up-arrow quantities is a bit grungy and tedious to write out. It is beyond the scope of this article to dig into the details of that here.

If you want a hint at how to prove them: Robert Sawyer (“r.e.s.”) suggested to me that “Lemma 1” may be proven using either:

  1. Saibian’s Knuth-arrow inequality5: \((a ↑^k b) ↑^k c < a ↑^k (b+c)\) or
  2. For \(n \ge 2\), the result follows from an alternative inequality that’s easier to show by induction: \(a ↑^k b \le 2 ↑^k ((a-1) b)\). And the \(n = 1\) case can be confirmed directly to finish the proof.
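
These are not proofs, but the smallest cases of Lemma 1 can at least be confirmed numerically with the arrow() sketch from earlier, using the fact that adding more arrows (with base 2 and argument 5) only increases the value, so \(2↑↑5\) is a lower bound for \(2 ↑^{2n} 5\) whenever \(n \ge 1\):

```python
lower = arrow(2, 2, 5)                    # 2↑↑5 = 2**65536
assert 4 * arrow(3, 1, 3) < lower         # Lemma 1, n = 1: 4 * 27
assert 4 * arrow(3, 2, 3) < lower         # Lemma 1, n = 2: 4 * 3↑↑3
```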

Acknowledgements

  1. I would like to thank Daniel Nagaj for finding and sharing this TM discovery. While he did not leave much written about this machine, his TM file is heavily annotated and those notes were useful in making sure I was on the correct track.
  2. It is also worth noting that this machine is not only the work of Nagaj, but is built from the ashes of an 18-state machine created by “Wythagoras”. In fact, many of the state numbers, complete transitions and the addition of only a single state for the Setup component are all evident in the “Wythagoras” machine.
  3. Likewise, “Wythagoras” built his machine based directly upon one by “Deedlit11” who designed the basic components still used in this current machine (Doubler [although his was a tripler], Ackermann, Expandal, Multiexpandal, Setup).
  4. And finally, a thanks to Wikipedian “r.e.s.” who actually wrote the original Fast-growing hierarchy Wikipedia article based upon my request in 2009! While I ended up cutting my reference to the fast-growing hierarchy from this article explicitly, the \(s_*\) hierarchy is a type of fast-growing hierarchy and its properties are what allow us to beat Graham’s number!
  5. Thanks also to “r.e.s.” for emailing me in Feb 2023 to suggest the two approaches to proving “Lemma 1” which I listed above.

Footnotes

  1. The original article was actually posted on my Wikipedia User page of all places! And, honestly, that location has served me well. That article has been preserved and is still available 12 years later, at no cost, ad free and even with a detailed change history! That said, I did recently copy it to this blog. 

  2. I have reconstructed a history of claimed record-breakers in this section; if any are missing, let me know and I will be happy to add them.

  3. I assume that Nagaj originally had a state 6, but found some way to eliminate it and decided not to renumber everything. 

  4. As a hint to proving Rule A: You will first want to prove 1 <A9 (01)^n 1 -> <A9 (01)^n 11 in \(4n + 2\) steps.

  5. Saibian’s proof of this inequality is 32 pages long!