# The Shortest Implementations of π

How small of a function can you write that takes as input an integer x in $$\{ 0,1,...,255 \}$$ and outputs pi[x] where pi is the table given below?

Here, "small" means that the corresponding source code and/or machine code length measured in bits is short. Why do we care? It is explained here and in [2].

unsigned char pi[] = {
0xFC,0xEE,0xDD,0x11,0xCF,0x6E,0x31,0x16,0xFB,0xC4,0xFA,0xDA,0x23,0xC5,0x04,0x4D,
0xE9,0x77,0xF0,0xDB,0x93,0x2E,0x99,0xBA,0x17,0x36,0xF1,0xBB,0x14,0xCD,0x5F,0xC1,
0xF9,0x18,0x65,0x5A,0xE2,0x5C,0xEF,0x21,0x81,0x1C,0x3C,0x42,0x8B,0x01,0x8E,0x4F,
0x05,0x84,0x02,0xAE,0xE3,0x6A,0x8F,0xA0,0x06,0x0B,0xED,0x98,0x7F,0xD4,0xD3,0x1F,
0xEB,0x34,0x2C,0x51,0xEA,0xC8,0x48,0xAB,0xF2,0x2A,0x68,0xA2,0xFD,0x3A,0xCE,0xCC,
0xB5,0x70,0x0E,0x56,0x08,0x0C,0x76,0x12,0xBF,0x72,0x13,0x47,0x9C,0xB7,0x5D,0x87,
0x15,0xA1,0x96,0x29,0x10,0x7B,0x9A,0xC7,0xF3,0x91,0x78,0x6F,0x9D,0x9E,0xB2,0xB1,
0x32,0x75,0x19,0x3D,0xFF,0x35,0x8A,0x7E,0x6D,0x54,0xC6,0x80,0xC3,0xBD,0x0D,0x57,
0xDF,0xF5,0x24,0xA9,0x3E,0xA8,0x43,0xC9,0xD7,0x79,0xD6,0xF6,0x7C,0x22,0xB9,0x03,
0xE0,0x0F,0xEC,0xDE,0x7A,0x94,0xB0,0xBC,0xDC,0xE8,0x28,0x50,0x4E,0x33,0x0A,0x4A,
0xA7,0x97,0x60,0x73,0x1E,0x00,0x62,0x44,0x1A,0xB8,0x38,0x82,0x64,0x9F,0x26,0x41,
0x07,0x58,0xB3,0x40,0x86,0xAC,0x1D,0xF7,0x30,0x37,0x6B,0xE4,0x88,0xD9,0xE7,0x89,
0xE1,0x1B,0x83,0x49,0x4C,0x3F,0xF8,0xFE,0x8D,0x53,0xAA,0x90,0xCA,0xD8,0x85,0x61,
0x20,0x71,0x67,0xA4,0x2D,0x2B,0x09,0x5B,0xCB,0x9B,0x25,0xD0,0xBE,0xE5,0x6C,0x52,
0x59,0xA6,0x74,0xD2,0xE6,0xF4,0xB4,0xC0,0xD1,0x66,0xAF,0xC2,0x39,0x4B,0x63,0xB6
};


Send me your best implementations in the languages of your choosing and I will publish them below! The source code doesn't have to be shorter than our C implementation; the machine code doesn't have to be shorter than the one for our ARM assembly implementation.

Alternatively, you can check out the challenge we put on the codegolf stackexchange here.

The best we could find1 in C is:

p(x){unsigned char*t="@rFTDVbpPBvdtfR@\xacp?\xe2>4\xa6\xe9{z\xe3q5\xa7\xe8",a=2,l=0,b=17;while(++l&&a^x)a=2*a^a/128*29;return l%b?t[l%b]^t[b+l/b]^b:t[l/b]^188;}


## 1 Some Explanations about our Implementation

### 1.1 The TKlog Structure

As I have established (see here and in [1]), this permutation actually has a very strong and very simple structure that is completely captured by the following system of equations:

$$\begin{cases} \pi(0) &= \kappa(0), \\ \pi\left( \alpha^{17j} \right) &= \kappa(16-j), ~\mathrm{for}~ 1 \leq j < 15, \\ \pi\left( \alpha^{i + 17j} \right) &= \kappa(16-i) \oplus \alpha^{17 s(j)}, ~\mathrm{for}~ 0 \leq i < 17 ~\mathrm{and}~ 0 \leq j < 15 ~, \end{cases}$$

where $$\alpha$$ is a root of $$X^8+X^4+X^3+X^2+1$$ so that it generates the multiplicative subgroup of the finite field with 256 elements. The functions $$s$$ is a permutation of $$\{ 0, 1,...,14 \}$$ defined in Table 1.

Table 1: The lookup table of $$s$$.
x 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
s(x) 0 12 9 8 7 4 14 6 5 10 2 11 1 3 13

The function $$\kappa$$ is maps $$\mathbb{F}_2^4$$ to $$\mathbb{F}_2^8$$ and is defined as $$\kappa(x) = \Lambda(x) \oplus 252$$, where $$\Lambda$$ is a linear function defined as follows:

\begin{equation*} \Lambda(\mathtt{1}) = \mathtt{12}~, \Lambda(\mathtt{2}) = \mathtt{26}~, \Lambda(\mathtt{4}) = \mathtt{24}~, \Lambda(\mathtt{8}) = \mathtt{30}~. \end{equation*}

If you are not too familiar with finite field arithmetic, a simple and readable C implementation is provided below.

### 1.2 How to Implement it Using Little Space

Our best C implementation (see below) can be rearranged as follows using only newlines and spaces.

p(x){
unsigned char
*t="@rFTDVbpPBvdtfR@\xacp?\xe2>4\xa6\xe9{z\xe3q5\xa7\xe8",
a=2,l=0,b=17;
while(x && (l++,a^x)) a=2*a^a/128*29;
return l%b ? t[l%b]^t[b+l/b]^b : t[l/b]^188;
}


The newlines and spaces allow us to better see the logic.

• t stores the precomputation of both $$i \mapsto \kappa(16-i)$$ and $$j \mapsto (\alpha^{17})^{s(j)}$$ up to some constants.
• The variable $$b$$ merely serves as a shorthand for the constant $$17$$.
• The while loop computes the discrete log of x using a counter (l) that is incremented each time the variable a, which is initialized to $$\alpha$$, is multiplied by $$\alpha$$. Said multiplication is implemented as a Galois LFSR.
• The final return value uses a ternary operator to implement the logic of the if statement that is used to define the TKlog.

An even more readable implementation works as follows.

unsigned char p(unsigned char x){
unsigned char
s[]={1,221,146,79,147,153,11,68,214,215,78,220,152,10,69},
k[]={0,32,50,6,20,4,22,34,48,16,2,54,36,52,38,18,0};
if(x) {
unsigned char l=1, a=2;
while(a!=x) {
a=(a<<1)^(a>>7)*29;
l++;
}
if (l%17) return 252^k[l%17]^s[l/17];
else return 252^k[l/17];
}
else return 252;
}


In this implementation:

• s stores $$(\alpha^{17})^{s(j)}$$ for all $$0 \leq j < 15$$,
• k stores the value of $$\kappa(16-i) \oplus \kappa(0)$$ for all $$0 \leq i < 17$$,
• (a<<1)^(a>>7)*29 is the multiplication by $$\alpha$$ in the finite field, and
• 252 is $$\kappa(0)$$.

An implementation can be very short or it can yield a binary file (in machine code) that is very short on some platform. Both are interesting as neither should exist for a random permutation. They are summarized in the following table.

Author(s) Language Source code (bits) Machine code (bits) $$\log_{10}(\textrm{Pr})$$ Link
Brian Shaft Algol   832 -256 link
Xavier Bonnetain, Gaëtan Leurent ARM bytecode   640 -314 link
Xavier Bonnetain C 1127   -167 link
Xavier Bonnetain C 1106   -173 link
Xavier Bonnetain C 1071   -184 link
Léo Perrin Python3 1351   -100 link
Xavier Bonnetain Rust 1141   -163 SE link
embodiment-of-ignorance C# 1128   -167 SE link
arnauld JS (ES6) 1112   -171 SE link
xnor Python 3 1057   -188 SE link
ceilingcat C (ASCII) 973   -213 SE link
odzhan ARM64 bytecode   624 -318 SE link
Kevin Cruijssen 05AB1E 592   -328 SE link
jimmy23013 CJam 504   -354 SE link
Nick Kennedy Jelly 472   -364 SE link
recursive Stax 464   -367 SE link

For more explanations about the implementations in the second half of the table, you may have a look at the challenge we posted on the codegolf stack exchange as that is where they come from. If the characters used are limited to ASCII ones, we use 7 bits per byte. Otherwise, we use 8. Note that the implementations on stackexchange may seem to use UTF-8 characters that would need more than 8 bits to be stored. However, this is simply a convenient encoding as explained here.

The quantity $$\log_{10}(\textrm{Pr})$$ is an upper-bound on the base-10 logarithm probability that a random permutation can have an implementation this short. This bound is obtained by counting the number $$n$$ of strings of the given length (or smaller) and then computing $$2^n / (256!)$$. For instance, the probability that a random permutation has an implementation of length at most 464 in Stax is at most $$10^{-367}$$, which is a very, very, very small number.

#### 2.0.1 Algol

Brian Shaft (832 bits)
The following program can be compiled into a 832 bits binary file using the Algol compiler for the Unisys MCP operating system.
Real    Procedure P(X);
Value X;
Integer X;
Begin
Value Array S(1,221,146,79,147,153,11,68,214,215,78,220,152,10,69),
K(0,32,50,6,20,4,22,34,48,16,2,54,36,52,38,18,0);
Integer L,A;
A:= 2;
If X ^= 0 Then
Begin
For L:= 1 Step 1 While A ^= X Do
A:=(Real(Boolean(A+A).[7:8] eqv ^Boolean(A.[7:1]*29))).[7:8];
P:= Real((If L mod 17 = 0 Then
Boolean(K[L div 17])
Else
Boolean(K[L mod 17]) eqv ^Boolean(S[L div 17]) eqv ^Boolean(252));
End
Else
P:= 252;
End;


#### 2.0.2 ARM Assembly

Xavier Bonnetain & Gaëtan Leurent (640 bits)
The machine code corresponding to this implementation is 80 bytes long and thus fits on 640 bits. It corresponds to a complete 80 bytes long program, with 32 bytes of data and 20 instructions (16 16-bit instructions and 4 32-bit instructions). More details are given in [2]. Xavier's webpage; Gaëtan's webpage.
        .syntax unified
.text
.align  4

.global p
p:
movs    r3, #0
cbz     r0, .L2
mov     r2, #0x1000000
lsls    r0, #24
.L3:
lsls    r2, r2, #1
it      cs
eorcs   r2, r2, #0x1d000000
cmp     r2, r0
bne     .L3
.L2:
movs    r0, #17
udiv    r2, r3, r0
mls     r3, r0, r2, r3
cbz     r3, .LL
ldrb    r3, [r1, r3]
.LL:
ldrb    r0, [r1, r2]
eors    r0, r0, r3
bx      lr
.align  2

.table_k:
.byte   -4
.byte   -36
.byte   -50
.byte   -6
.byte   -24
.byte   -8
.byte   -22
.byte   -34
.byte   -52
.byte   -20
.byte   -2
.byte   -54
.byte   -40
.byte   -56
.byte   -38
.byte   -18
.byte   -4
.table_s:
.byte   1
.byte   -35
.byte   -110
.byte   79
.byte   -109
.byte   -103
.byte   11
.byte   68
.byte   -42
.byte   -41
.byte   78
.byte   -36
.byte   -104
.byte   10
.byte   69


#### 2.0.3 C

The programs here may not be perfectly standard (see for instance the omission of int in function prototypes) but they should be successfully processed by your compiler.

Xavier Bonnetain (1106 bits)
This implementation is 158 ASCII characters long and thus fits on 1106 bits. It simplifies the previous implementation by removing the variable a containing the generator raised to power l. Instead, this programs looks for l such that $$x^{255-\ell} = 1$$. Xavier's webpage.
p(x){unsigned char*k="@rFTDVbpPBvdtfR@\xacp?\xe2>4\xa6\xe9{z\xe3q5\xa7\xe8",l=0,b=17;while(--l&&x^1)x=2*x^x/128*285;return l%b?k[l%b]^k[b+l/b]^b:k[l/b]^188;}

Xavier Bonnetain (1127 bits)
This implementation is 161 ASCII characters long and thus fits on 1127 bits. Xavier's webpage.
p(x){unsigned char*t="@rFTDVbpPBvdtfR@\xacp?\xe2>4\xa6\xe9{z\xe3q5\xa7\xe8",a=2,l=0,b=17;while(x&&(l++,a^x))a=2*a^a/128*29;return l%b?t[l%b]^t[b+l/b]^b:t[l/b]^188;}


#### 2.0.4 C (Lovecraftian Horror)

In this category, I put C programs that can be compiled by some compilers and successfully evaluate pi. The functions in this part are not portable or even standard C—Lovecraftian horrors really.

Xavier Bonnetain (1071 bits)
This implementation is 153 ASCII characters long and thus fits on 1071 bits. Xavier's webpage.
p(x){char*k="@rFTDVbpPBvdtfR@\xacp?\xe2>4\xa6\xe9{z\xe3q5\xa7\xe8";int l=256,b=17;while(--l*x^l)x=2*x^x/128*285;return l%b?k[l%b]^k[b+l/b]^b:k[l/b]^188;}


#### 2.0.5 Python3

Léo Perrin (1351 bits)
This implementation is 193 ASCII characters long, meaning that it fits on 1351 bits. It is based on Xavier's C implementation. Note that it wouldn't work with Python2.
def p(x):
t=b'@rFTDVbpPBvdtfR@\xacp?\xe2>4\xa6\xe9{z\xe3q5\xa7\xe8'
l,d=255,17
if x<2:return(252,238)[x]
while 1^x:x,l=(x<<1)^(x>>7)*285,l-1
return(188^t[l//d],d^t[l%d]^t[d+l//d])[l%d>0]


## 3 Acknowledgement

I thank Brian Shaft for pointing out some typos that have now been fixed. I also thank xnor from the codegolf stackexchange who greatly helped me when posting the challenge on said website.

## 4 Bibliography

1. Léo Perrin. Partitions in the S-Box of Streebog and Kuznyechik. IACR Transactions on Symmetric Cryptology, 2019(1), 302–329, 2019. link to tosc.iacr.org.
2. Xavier Bonnetain, Léo Perrin and Shizhu Tian. Anomalies and Vector Space Search: Tools for S-Box Reverse-Engineering. link to eprint.iacr.org.

## Footnotes:

1

The best we could find when I posted this note; Xavier actually improved it shortly after!

Last Update (by me): 19/02/2020