Maths for Hackers

Computer science owes a great deal to mathematicians such as Ada Lovelace, Charles Babbage, Emil Post, Alan Turing, Stephen Cole Kleene, et al. But much of the beautiful mathematics of Computability Theory and other areas of Mathematical Logic is unknown to many in the area of computer science - and here I include hackers, makers, programmers, systems administrators, etc.

Also, much of the language of cryptography is taken from mathematics, and used to bludgeon those not ‘in the know’ about such things. Whilst these language games are being played, cryptography is having a real and tangible effect on people’s lives. It is time to open the subject up and create clear, concise descriptions of what is going on - accessible to as many people as possible, with only a minimal amount of effort.

Cryptography needs demystifying, and computability theory needs to be better known.

This is the first in what I hope will be a series of posts on mathematics, specifically the bits that get left out or forgotten by comp. sci. lecturers and youtube celebs.

Computability Theory - The Hacker Theorem

TL;DR - there’s a wonderful little theorem that permits input data to be computer code, and that computer code is essentially a string. As such, all those lovely injection attacks hackers create, and even virtual machines themselves, aren’t clever overlays or augmentations of computer functionality, they are intrinsic features of computation from the very start!

More properly called the $s$-$m$-$n$ theorem, after the symbols in a proof due to Kleene, who first published it. Its importance - in terms of the implications you can derive from it - is sadly often undersold, and it is oft overshadowed by the Recursion Theorem (and, you could argue, rightly so).

A Note On Style

In mathematics, we usually keep to the following running order:

  • Introduction
    • State the overview of the research
    • State the main result(s)
    • Occasionally include jabs at colleagues
  • Definitions (the most re-read section)
  • Elementary proofs (to induce a false sense of security)
  • Main Result(s) restated and proved in full
  • Closing remarks

…and so the maths research paper goes. However, there is a problem with the way that scientific knowledge is amalgamated for study, vis-à-vis textbooks. The original papers generally cover the intentions and motivations for some particular work. However, owing to space and publishing costs, these are often lost. In mathematics, this can actually worsen a student’s chances of engaging with a proof, as the physically shortest proof is often chosen for publication. Whilst many don’t see a problem with this - we always want the most succinct way of obtaining knowledge - sometimes it is a detriment; the longer proof can convey a sense of the meaning of a particular result, and may be a better educational tool than a terse, lead-nugget kind of proof. That said, there are some wonderful textbooks by mathematicians that avoid this pitfall - my own former supervisor’s textbook ‘Computability Theory’ is, I think, a decent example of how it should be done. But I digress…

My intention here is to deviate from the ‘standard form’ of a mathematics textbook, and introduce the ideas in a verbose, possibly prosaic fashion, and offer up commentary on not just what is happening, but why and how and with what motivation.

Building Blocks

Instead of Definition 1, Defn. 2, Def. 3, etc. let’s try this. (As always, comments and feedback are gratefully received).

So, being in the world of mathematics we want to talk about computation as the process by which we take some input $x$, do something to it, and then give some output, call it $y$ (for ‘$y$-not’… geddit? It’s a joke. Tell your face.)

Normally we’d describe this as $f(x) = y$, which may have triggered some PTSD from high school maths lessons - for which I’m truly sorry. But we want to formalise what we mean by ‘computable’, and find a way to apply it to this context. To do this we take a leaf out of a monstrously persecuted, quiet and lovely gay guy’s book, and describe Alan’s…

Turing Machines

We’re probably familiar with these - they’re somewhat of a cliche these days. Let’s run through the main pieces that make up a Turing Machine (TM):

  • A Program $P$
  • An infinite tape on which we do our work, with marked sections
  • A read/write head - this performs the actions of the program $P$ and moves along the tape left and right

There you go. That’s it. Well, no, there’s a lot more…

First question is how do we describe a program $P$? Well, it’s really simple, actually, we describe it as a series of 5-tuples. 5-wha-cha-bles? We just mean brackets that have uniformly 5 things between them. Each thing has a particular meaning depending on where it is. Here’s what we will use:

\[< \text{current symbol}, \text{current state}, \text{new symbol}, \text{new state}, \text{direction to move} >\]

…or in a more mathematical way:

\[< s, q, s', q', \{L,0,R\} >\]

The last bit, the $\{L,0,R\}$ bit, just says that we want our direction to be one of left/right/don’t move. But what is a symbol? Well, it’s a thing written to the tape, one per demarcated section. What’s a state? This is a bit trickier, so we need to describe how this thing will work.

A program $P$ will be composed of a sequence of these 5-tuples, each one stipulating what to do when presented with some symbol $s$ and state $q$. We define what to do by stating what symbol to write, and what internal state to set, and which direction to move in.

When we have an input, this gets written to the tape, and the head is placed at the left-most cell of the input and set to state $q_1$. We will express the input as a series of ‘1’s, and to make sure we can fit zero in, we’ll use $x+1$-many ‘1’s on the tape. So, we now have written our input, what now? We want to do something to this - but what? Let me show you an example program:

\[<1, q_1, 1, q_1, R > \\\\ < 0, q_1, 0, q_2, L > \\\\ < 1, q_2, 0, q_3, R > \\\\ < 0, q_3, 0, \text{HALT}, 0 >\]

See what it does? If not, then think of it this way. First, we seek right, moving right without changing any symbols on our input. When it gets to a ‘0’, it goes left one, writes a 0 over the 1 it finds there, and then exits. Exits? Well, yes. That’s what that $\text{HALT}$ means. It’s the fourth thing I didn’t put on our original list - a way to kill the computation at any point by declaring it in our program. Whatever is left on the tape after we declare $\text{HALT}$ is our output from our computation.

So we now have a program $P$ that does $f(x) = x-1$. Easy.
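To make this concrete, here is a minimal sketch of a Turing Machine interpreter in Python, with the predecessor program transcribed as a dictionary of 5-tuples. The representation and names are my own choices, not standard notation.

```python
# A minimal Turing Machine interpreter - just enough to run the
# predecessor program. Each 5-tuple becomes a dict entry keyed on
# (current symbol, current state).

def run_tm(program, x, max_steps=10_000):
    """Run `program` on input x, encoded as x+1 ones on the tape."""
    tape = {i: 1 for i in range(x + 1)}  # unary input
    head, state, steps = 0, 'q1', 0
    while state != 'HALT' and steps < max_steps:
        symbol = tape.get(head, 0)       # blank cells read as 0
        new_symbol, state, move = program[(symbol, state)]
        tape[head] = new_symbol
        head += {'L': -1, 'R': 1, '0': 0}[move]
        steps += 1
    # Decode the output under the same coding: (number of 1s) - 1
    return sum(tape.values()) - 1

# The predecessor program, one dict entry per 5-tuple:
pred = {
    (1, 'q1'): (1, 'q1', 'R'),    # seek right over the input
    (0, 'q1'): (0, 'q2', 'L'),    # found the end; step back one
    (1, 'q2'): (0, 'q3', 'R'),    # erase the right-most 1
    (0, 'q3'): (0, 'HALT', '0'),  # halt, leaving x ones: output x-1
}
```

Running `run_tm(pred, 4)` erases one of the five ‘1’s on the tape, leaving four - which encodes the output 3 under the same $x+1$ coding.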

Turing’s Genius

Turing didn’t invent computers to save the world. Contrary to what you’ll believe from those who believe everything started in WWII, he actually derived the model of a Turing Machine in 1936 to solve a problem posed 8 years earlier, Hilbert’s Entscheidungsproblem.

The real genius at play is that he married together the main models of computation of the day - the lambda calculus and the general recursive functions. He showed that they were equivalent by showing all their relevant actions could be performed with Turing Machines. Quite simply a wonderful achievement, and one that paved the way for his future work on computability theory proper (something I hope to elaborate on).

The upshot of all this thinking was the following item:

The Church-Turing Thesis

Every function for which there is an intuitively effective process for computing its values can be shown to be Turing Computable.

Turing’s words here are taken from Barry Cooper’s book ‘Computability Theory’. But this is not the sense in which we will use the phrase ‘Turing computable’ or just ‘computable’ to refer to the members of a set. Members of a set? Yes. We will say a set, call it $A$, is compuable if there exists a Turing Machine that accurately describes, for all $a$, a potentially infinitely list of potential members, whether $a$ is in $A$ or not. Formally, we say that such a turing machine decides whether $a \in A$, and we can formalize this as follows; there is some computable $f$ such that for any $a$:

\[f(a) = 1 \text{ if }a \in A \\\\ f(a) = 0 \text{ if }a \notin A\]
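As a tiny illustration of such an $f$, take $A$ to be the even numbers (my choice of example, not one from the theory); its characteristic function is clearly computable:

```python
def chi_even(a):
    """Characteristic function of A = {even numbers}: returns 1 if
    a is in A, 0 if not - and, crucially, it always halts."""
    return 1 if a % 2 == 0 else 0
```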

This is different from usual introductions to computable functions - they take the route of ‘recursive functions’, which is a beautiful subject, but not very accessible. The next piece we need to take care of is what happens when we have a set where there are three possible answers: in, out, and ¯\_(ツ)_/¯

Partial Computable Functions

Take a program $P$ that adds 1 until it gets to 50, then HALTs. Input 47, and you get 48, 49, 50, and we’re done. Input 51 and you get 52, 53, 54, 55, 56, 57, 58, … the TM will never stop, as it will never reach the halt condition.

There are lots of functions and programs and algorithms like this - infinitely many! These are clearly ‘partial’ functions - the set of values for which we get an answer is not the full set of possible values.
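The count-to-50 program looks something like this in Python; the step budget is my own addition, so the divergent case can be demonstrated without actually looping forever.

```python
def count_to_fifty(x, max_steps=1000):
    """Add 1 until we reach exactly 50, then halt. For x > 50 the
    halt condition is never met; the step budget stands in for
    'runs forever' - a real TM would simply never stop."""
    steps = 0
    while x != 50:
        x += 1
        steps += 1
        if steps >= max_steps:
            return None  # stand-in for divergence
    return x
```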

Gödel Numbering

Sounds fancy, but really it’s just a way of assigning a unique value to a string. What strings? Well, any. Including our programs - they are essentially just strings of characters interpreted by the machinery of a Turing Machine.

Kurt Gödel came up with the idea for his own Incompleteness Theorems (monuments of modern logic, and perhaps the most significant shift in human thought in recent times). The idea is quite simple - for each symbol, assign an integer. For each position assign a unique prime. Take each prime and put it to the power of the integer representing the symbol in the position the prime represents. For example ‘LOL’ might be $2^{12} \cdot 3^{15} \cdot 5^{12}$ which equals 14,348,907,000,000,000,000. Simple.

The numbers get very large very quickly for even small sets of symbols and short strings, but the idea is this - we know from the fundamental theorem of arithmetic that each number has a unique prime factorization. As such, each string under some coding where we avoid overlaps has a unique number that represents it perfectly. Perfectly? Yes, we can always reconstruct the string from the number.

What’s really nice about using this is that the coding and decoding of each number is perfectly computable. We can always generate a number, and always recover a string. The only problem is that the strings we recover by and large won’t be very meaningful as a program, but we don’t really care about that - it’s a side effect. So long as we’re always careful about choosing which numbers we decode, we’ll be fine.
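Here is a sketch of the encode/decode pair in Python, using the A=1, …, Z=26 assignment from the ‘LOL’ example; the helper names are mine.

```python
def primes():
    """Naive prime generator - fine for short strings."""
    n = 2
    while True:
        if all(n % p for p in range(2, int(n ** 0.5) + 1)):
            yield n
        n += 1

def godel_encode(s):
    """Raise the i-th prime to the power of the i-th letter's
    position in the alphabet (A=1, ..., Z=26), and multiply."""
    g, ps = 1, primes()
    for ch in s:
        g *= next(ps) ** (ord(ch) - ord('A') + 1)
    return g

def godel_decode(n):
    """Recover the string by reading off each prime's exponent in
    the unique prime factorisation."""
    out, ps = [], primes()
    while n > 1:
        p, e = next(ps), 0
        while n % p == 0:
            n //= p
            e += 1
        out.append(chr(e + ord('A') - 1))
    return ''.join(out)
```

Because of the fundamental theorem of arithmetic, decoding simply reads off the exponent of each prime in turn - no information is ever lost.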

We will introduce this by adding a subscript $e$ to Greek letters representing functions. Why? Well, it looks cool, doesn’t it. So, for the program represented by the number $e$ acting on input $x$ we will write $\phi_e (x)$. If the program halts, we will write $\phi_e (x) \downarrow$.

For those more comfortable with mathematical notation such as $\exists s$, read ‘there exists $s$ such that…’, this is how we can express halting in more mathematical terms. We know that each computation has a potentially infinite, but still separate (in maths terms ‘discrete’), sequence of ‘stages’. These are represented as an additional subscript to our above notation. So, for a program $e$ at stage 29, we will write $\phi_{e,29}(x)$.

Halting can now be written as follows; for program $e$ on input $x$, there exists some stage $s$ such that the program halts, written

\[\exists s \phi_{e,s} (x) \downarrow\]
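In code, $\phi_{e,s}(x) \downarrow$ is just ‘the program halts within $s$ steps’, which we can always check. A sketch, where I represent a program by a single-step transition function - my own simplification of a full TM:

```python
def phi_stage(step, x, s):
    """Check whether 'phi_{e,s}(x) halts': run the one-step
    transition function `step` for at most s stages, starting from
    input x. Returns (True, output) if it halts within s stages,
    and (False, None) otherwise."""
    cfg = x
    for _ in range(s):
        halted, cfg = step(cfg)
        if halted:
            return True, cfg
    return False, None

# Toy program: add 1 each stage, halting on reaching exactly 50.
step = lambda v: (v + 1 == 50, v + 1)
```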

Neat. Now we have everything we need to talk about the…

s-m-n Theorem

But you said we’d get a ‘hacker theorem’!! Yes, and I didn’t lie! This is what I called it in my MSc thesis on this subject. OK, let’s try again.

The Hacker Theorem

Here’s the statement of the theorem:

Theorem: If $\psi_e(x,y)$ is partial computable, there is a computable $g$ for which $\psi_e(x,y) = \phi_{g(x)}(y)$

Proof: I’m not going to bore you with a formal proof. It’s much more elucidating to give a descriptive proof.

First, let’s look at the statement precisely - it says that you can take data that is an input, and there is a computable $g$ that takes this input and generates a program that can be run on a TM. We’ve already shown that any number can be decoded through the method above, so $g$ is a total (meaning ‘always gives an answer’) computable function. But what if we decode a nonsense program? That’s fine - remember, $\psi$ is partial computable, so there’s no requirement for it to halt :P

If you think about it, we can always construct some $g$ that makes this true, for any 2-input $\psi$. This means that we can move the computation around $x$ in $\psi$ into $g$ as required. So, we can move information into and out of the subscript index and input parameters. $\square$

(That little box means ‘QED’ or in other words ‘drop mic’ or ‘I’m done here’.)

So there we are - it’s not a very daunting theorem, and has a nice, succinct presentation.
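For programmers, the theorem is essentially partial application (currying): given a two-argument program, we can always computably manufacture a one-argument program with the first argument baked in. A sketch, with a made-up $\psi$:

```python
def psi(x, y):
    """Some two-argument (partial) computable function - a made-up
    example, not anything dictated by the theorem."""
    return x * 10 + y

def g(x):
    """The s-m-n construction in miniature: bake the first argument
    into the program, yielding a new one-argument program."""
    def program(y):
        return psi(x, y)
    return program
```

In the theorem proper, $g(x)$ returns an *index* (a Gödel number) rather than the program itself - but as we’ve seen, programs and numbers are interchangeable.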

A Little Extra

This theorem directly influenced the Recursion Theorem and the Universal Turing Machine theorem. Universal Turing Machines are just virtual machines - the theorem states that a single Turing Machine can simulate any other Turing Machine, which is true!

This, however, is a little beyond the scope of this article, and in many ways is best kept by itself for now.

Implications

The theorem shows that programs are streams of bits as much as their inputs are. Look at common attacks:

  • SQL Injection shows that website input can become injected SQL code
  • Cross Site Scripting shows that user web input, when returned improperly escaped, can become HTML/Javascript code that runs in a browser
  • Buffer Overflow attacks are the best example - the input actually rewrites part of the stack, and the input then becomes running code.
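The first bullet can be seen in a couple of lines: data spliced into a query string becomes query code. The table and column names here are made up purely for illustration.

```python
def naive_query(user_input):
    """Build SQL by string interpolation - the input can smuggle in
    code, because nothing keeps data and query text apart."""
    return f"SELECT * FROM users WHERE name = '{user_input}'"

# Benign data stays data; crafted data becomes query code:
benign = naive_query("alice")
attack = naive_query("x' OR '1'='1")
```

Parameterised queries exist precisely to preserve the data/code boundary that naive interpolation destroys.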

Hence, many of the injection attacks and activities are validated by this theorem at an abstract level. It’s kind of reassuring to me that such a simple theorem validates so much work that is done in this manner in computer science.

Although this theorem seems a little counter intuitive in a sense - can we really always find $g$?? - it isn’t much work (and I encourage people to try it) to formalise this into a neat proof.

But what have we done here? Well, in a sense, we’ve shown in an abstract way that we can move ‘information’ between storage in the inputs of a function, into the computer itself, and vice versa.

The amazing thing, for me, is that it doesn’t matter how we formulate computation for this to still work. It also amazes me that a deeper ramification is seldom spoken of - the fact that we must be sensitive to where our ‘information’ is in computation. By coding an algorithm, you are storing some information. By writing down an algorithm into a text file as, say, C code for storage on the same computer, you are storing the same information. The act of compiling this code is nothing ‘magical’ - it is a direct and computable translation process of an ‘effective procedure’/algorithm into another form. Likewise, there is nothing to stop you passing a program as input to another program. (This observation is key to proving the unsolvability of the Halting Problem.)

Future Work

I’m hoping to write some overviews of the following:

  • The Halting Problem Without Tears (Unless you work in AI)
  • Recursion theorem
  • Universal Turing Machine theorem
  • Post’s Problem (maybe in a few parts)
  • Friedberg-Muchnik theorem and Priority Arguments

Maybe it’s of interest? If not, it’s a fun exercise for me.