What does it mean to say, “My computer can do what your computer can do”? My computer can play Crysis, and yours probably can’t, but is that really a meaningful distinction? My computer has a bunch of ports and things on it—USB, HDMI, etc.—and there’s probably a bit of overlap between my computer and yours there. But my computer could be missing most, if not all, of its ports, and it could definitely still be said to compute.

You might have a Mac, and I have a PC, but again, that seems like a superficial difference—while there may be certain programs that only run on PCs and not Macs, it’s not like a similar sort of program couldn’t be written for a Mac that did the same thing.

We could play this little reductionist game all day—winding down past the processor of the computer, comparing flip-flops and NAND gates, past transistors, down to the electrons themselves—those tireless carriers of the binary alphabet of our information age.

But what’s the difference between electrons carrying out computations versus *me* carrying out a computation? After all, given enough time, I (or perhaps, Suzie) could manually reproduce anything either my or your computer is capable of computing. We’d just have to follow the correct rules (and possess a sufficient amount of paper…and pencil lead).

Unfortunately, this has brought us no closer to defining this idea of equivalent computational power—all we’ve decided is that we are at least as computationally able as our own computers! But this is a subtle sort of computational equivalence—it has nothing to do with speed, as a modern CPU carries out somewhere in the ballpark of a few billion operations per second, while you or I can probably manage somewhere in the ballpark of one per second. Further, it has nothing to do with representation—be the computational substrate electrons, lead, rocks, or water, we don’t care, so long as the procedure can be replicated and carried out to yield an equivalent result.

It turns out that there is a wonderful (and precise) formalism for this idea of equivalence, developed long before there was anything like the modern idea of a computer, and it came from the grandfather of computing himself, Alan Turing (for a glimpse at Alan Turing’s tragic life, see here; for a less tragic, more entertaining fictional depiction, see here). Turing called them a-machines, but they are now known as Turing Machines.

The formalism, which I’ll sketch shortly, seems a bit strange at first glance, but the real power of it is that *any computing device can be reduced to a certain type of Turing Machine*.

So: Imagine you are seated at a table, and in front of you is a long, segmented strip of paper and a pencil. One end of the paper extends (countably) infinitely far to your left, and the other, infinitely far to your right. You might imagine the segments as being lines, as if on lined paper, except the lines are going top to bottom instead of left to right. One particular line of this infinitely long strip of paper is the focus of a magnifying glass in front of you.

On your lap is a leatherbound book suggestively entitled “PROGRAM”. Upon opening it, you see there’s a long table, and each line looks something like this:

- IF SEGMENT READS ‘**B**’ AND YOU ARE **HAPPY**, REPLACE WITH CHARACTER ‘**A**’, SHIFT TAPE TO THE **LEFT**, AND BECOME **DESPONDENT**.

With some limited variation: sometimes the ‘B’ in the first clause and the ‘A’ in the second clause are other characters from some finite collection (say, the alphabet), including ‘blank’ and a special character called HALT. The third clause sometimes says to shift right instead of left, or perhaps not to shift at all, and the emotions of the first and last clauses always come from a finite list (despondency, happiness, anger, etc.).

After some rumination, you notice that the table is also self-consistent—there are no two entries on the table that could ever both apply to the same combination of your emotional state and the reading on the strip of paper. Further, there’s always one, and only one, thing for you to do to the strip for any given combination of character-under-magnifying-glass and emotional state. Lastly, you surmise that the character to be replaced is the character upon which the magnifying glass currently focuses.
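The whole table is just a lookup from (emotional state, character under the glass) to (character to write, which way to shift, new emotional state), and the reading loop practically writes itself. Here’s a minimal sketch in Python—the dictionary encoding, the starting HAPPY state, and the second rule (added only so the machine halts) are my own inventions, not part of any standard formalism:

```python
# One rule per (emotional state, symbol-under-the-glass) pair, mapping to
# (symbol to write, which way to shift, new emotional state).
# Note: "LEFT" here moves the head one segment left; in the text's picture
# it is the *tape* that shifts, which is the same thing up to a sign.
def run(program, tape=None, max_steps=10_000):
    tape = dict(tape or {})  # sparse tape: position -> symbol; missing = 'BLANK'
    head, state, steps = 0, 'HAPPY', 0
    while state != 'HALT' and steps < max_steps:
        symbol = tape.get(head, 'BLANK')
        write, shift, state = program[(state, symbol)]
        tape[head] = write
        head += {'LEFT': -1, 'RIGHT': +1, 'STAY': 0}[shift]
        steps += 1
    return tape, steps

# The single rule quoted above, plus a made-up second rule so the machine halts:
program = {
    ('HAPPY', 'B'):          ('A', 'LEFT', 'DESPONDENT'),
    ('DESPONDENT', 'BLANK'): ('BLANK', 'STAY', 'HALT'),
}
tape, steps = run(program, tape={0: 'B'})
print(tape[0], steps)  # A 2 -- the 'B' under the glass became an 'A'
```

The sparse dictionary stands in for the infinite strip of paper: any segment we’ve never visited simply reads ‘BLANK’.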

So, you set out following the program—writing characters, shifting the strip of paper back and forth, erasing characters, and schizophrenically altering your emotional disposition upon command of your leatherbound overlord.

Eventually, you come across that peculiar HALT character, and you stand up to inspect the strip of paper.

And it occurs to you that the paper now contains the first million digits of Pi.

Up to some technical considerations, you have just implemented a Turing Machine! But not all Turing Machines are the same—as I said, they are typified by some specific finite alphabet, and some specific finite set of mental states. It turns out that what any specific Turing Machine is capable of computing is strongly dependent on the number of characters in its alphabet and the number of emotional states that it can cruelly instruct you to emote.

A two-state, two-character Turing Machine is not terribly interesting. With patience, you could probably figure out every “computation” such a device is capable of performing. One heuristic way of quantifying this is to find the maximum number of steps any program for that class of Turing Machine will run before a HALT, given a completely blank starting strip of paper (that it be initially blank is important). Notice that it’s straightforward to create a Turing Machine that never reaches the HALT command. You just feed it a blank strip of paper and the command:

- IF SEGMENT READS **‘BLANK’** AND YOU ARE **ANGRY**, REPLACE WITH CHARACTER **‘BLANK’**, SHIFT TAPE TO THE **RIGHT**, AND BECOME **ANGRIER**.

Thus you will become angrier and angrier, and the Turing Machine will never HALT. But it must be true that, if a Turing Machine of a given class *is* going to HALT, it will do so within some maximum number of steps. And it just so happens that if a two-state, two-character Turing Machine is going to HALT given some program and a blank input, it will do so within 6 steps (proving this is actually a little nontrivial). Such a number is innocently dubbed a Busy Beaver number; the idea was introduced by Tibor Radó in 1962.
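Both of these machines are small enough to run directly. A self-contained sketch in Python—the rule encoding and the emotional state names are mine, but the halting machine below is the known two-state, two-symbol busy beaver champion (where, formally, “ANGRIER” in the never-halting machine is just the same single angry state):

```python
def run(program, start_state, max_steps):
    """Run a Turing Machine program on a blank tape.
    Returns (halted, step_count, tape)."""
    tape, head, state, steps = {}, 0, start_state, 0
    while state != 'HALT' and steps < max_steps:
        symbol = tape.get(head, 'BLANK')
        write, shift, state = program[(state, symbol)]
        tape[head] = write
        head += -1 if shift == 'LEFT' else +1
        steps += 1
    return state == 'HALT', steps, tape

# The angry machine from the text: shuffle right over blanks, forever.
forever = {('ANGRY', 'BLANK'): ('BLANK', 'RIGHT', 'ANGRY')}
halted, steps, _ = run(forever, 'ANGRY', max_steps=1000)
print(halted, steps)  # False 1000 -- still marching right, still angry

# The two-state, two-symbol busy beaver champion, dressed in emotions:
champion = {
    ('ANGRY',   'BLANK'): ('X', 'RIGHT', 'ANGRIER'),
    ('ANGRY',   'X'):     ('X', 'LEFT',  'ANGRIER'),
    ('ANGRIER', 'BLANK'): ('X', 'LEFT',  'ANGRY'),
    ('ANGRIER', 'X'):     ('X', 'RIGHT', 'HALT'),
}
halted, steps, tape = run(champion, 'ANGRY', max_steps=1000)
print(halted, steps, sum(v == 'X' for v in tape.values()))  # True 6 4
```

The champion halts after exactly 6 steps with 4 non-BLANK characters on the tape—the two busy beaver figures quoted in this section.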

Thus, a two-state, two-character Turing Machine fed a blank input won’t ever tell us anything more interesting than can be specified in about 6 binary characters (it’s actually a bit worse than this, as a two-state, two-symbol Turing Machine will only ever write at most 4 non-BLANK characters onto the strip of paper before it HALTs).

So what makes for an “interesting” Turing Machine? Well, if you modestly increase the number of characters to 6, and the number of states to 4, two very important things happen.

First, we have no idea what the Busy Beaver number is for such a machine. It’s at least 1.3 × 10^{7036}, and in reality it is significantly larger than that (for reference, this lower bound is already about 10^{6960} times the number of atoms in the observable universe—it’s one of those numbers so large that it’s hard to articulate just how large it is). In fact, even if you somehow got your hands on a candidate value for this number, the proof of its correctness might be independent of Zermelo–Fraenkel set theory.

Secondly, such a Turing Machine can simulate *any other* Turing Machine, including Turing Machines with more characters and more states. Such Turing Machines are called Universal.

The specific schemes demonstrating exactly how such a thing is possible are, in general, rather complicated. But the result is dramatic: any program that you or I run on one of our computers that takes an input and registers an output can be cast onto, and evaluated by, a 6-character, 4-state Turing Machine. Remember that we required the starting input to be blank. If we relax this constraint, there’s a somewhat controversial claim that even two-state, three-symbol Turing Machines can be made universal. The controversy arises because it becomes difficult to disambiguate the complexity of the Turing Machine from the complexity of the specific encoding scheme used to initially prime the strip of paper with symbols.

The interesting results only start here: in part 2b, with this idea of computational equivalence in hand, we’ll flesh out this idea of computability a bit further. And finally, we will encounter the Russian.