The Shortest Implementations of π

1. Some Explanations about our Implementation
- 1.1. The TKlog Structure
- 1.2. How to Implement it Using Little Space
2. Your Best Implementations
3. Acknowledgement
4. Bibliography

How small of a function can you write that takes as input an integer x in \(\{ 0,1,...,255 \}\) and outputs pi[x] where pi is the table given below?

Here, "small" means that the corresponding source code and/or machine code length measured in bits is short. Why do we care? It is explained here and in [2].

unsigned char pi[] = {
    0xFC,0xEE,0xDD,0x11,0xCF,0x6E,0x31,0x16,0xFB,0xC4,0xFA,0xDA,0x23,0xC5,0x04,0x4D,
    0xE9,0x77,0xF0,0xDB,0x93,0x2E,0x99,0xBA,0x17,0x36,0xF1,0xBB,0x14,0xCD,0x5F,0xC1,
    0xF9,0x18,0x65,0x5A,0xE2,0x5C,0xEF,0x21,0x81,0x1C,0x3C,0x42,0x8B,0x01,0x8E,0x4F,
    0x05,0x84,0x02,0xAE,0xE3,0x6A,0x8F,0xA0,0x06,0x0B,0xED,0x98,0x7F,0xD4,0xD3,0x1F,
    0xEB,0x34,0x2C,0x51,0xEA,0xC8,0x48,0xAB,0xF2,0x2A,0x68,0xA2,0xFD,0x3A,0xCE,0xCC,
    0xB5,0x70,0x0E,0x56,0x08,0x0C,0x76,0x12,0xBF,0x72,0x13,0x47,0x9C,0xB7,0x5D,0x87,
    0x15,0xA1,0x96,0x29,0x10,0x7B,0x9A,0xC7,0xF3,0x91,0x78,0x6F,0x9D,0x9E,0xB2,0xB1,
    0x32,0x75,0x19,0x3D,0xFF,0x35,0x8A,0x7E,0x6D,0x54,0xC6,0x80,0xC3,0xBD,0x0D,0x57,
    0xDF,0xF5,0x24,0xA9,0x3E,0xA8,0x43,0xC9,0xD7,0x79,0xD6,0xF6,0x7C,0x22,0xB9,0x03,
    0xE0,0x0F,0xEC,0xDE,0x7A,0x94,0xB0,0xBC,0xDC,0xE8,0x28,0x50,0x4E,0x33,0x0A,0x4A,
    0xA7,0x97,0x60,0x73,0x1E,0x00,0x62,0x44,0x1A,0xB8,0x38,0x82,0x64,0x9F,0x26,0x41,
    0xAD,0x45,0x46,0x92,0x27,0x5E,0x55,0x2F,0x8C,0xA3,0xA5,0x7D,0x69,0xD5,0x95,0x3B,
    0x07,0x58,0xB3,0x40,0x86,0xAC,0x1D,0xF7,0x30,0x37,0x6B,0xE4,0x88,0xD9,0xE7,0x89,
    0xE1,0x1B,0x83,0x49,0x4C,0x3F,0xF8,0xFE,0x8D,0x53,0xAA,0x90,0xCA,0xD8,0x85,0x61,
    0x20,0x71,0x67,0xA4,0x2D,0x2B,0x09,0x5B,0xCB,0x9B,0x25,0xD0,0xBE,0xE5,0x6C,0x52,
    0x59,0xA6,0x74,0xD2,0xE6,0xF4,0xB4,0xC0,0xD1,0x66,0xAF,0xC2,0x39,0x4B,0x63,0xB6
};

Send me your best implementations in the languages of your choosing and I will publish them below! The source code doesn't have to be shorter than our C implementation; the machine code doesn't have to be shorter than the one for our ARM assembly implementation.

Alternatively, you can check out the challenge we put on the codegolf stackexchange here.

The best we could find¹ in C is:

p(x){unsigned char*t="@`rFTDVbpPBvdtfR@\xacp?\xe2>4\xa6\xe9{z\xe3q5\xa7\xe8",a=2,l=0,b=17;while(++l&&a^x)a=2*a^a/128*29;return l%b?t[l%b]^t[b+l/b]^b:t[l/b]^188;}

To help you get the best implementations, here are some explanations about ours. The best implementations follow.

1 Some Explanations about our Implementation

1.1 The TKlog Structure

As I have established (see here and in [1]), this permutation actually has a very strong and very simple structure that is completely captured by the following system of equations:

\begin{equation} \begin{cases} \pi(0) &= \kappa(0), \\ \pi\left( \alpha^{17j} \right) &= \kappa(16-j), ~\mathrm{for}~ 1 \leq j < 15, \\ \pi\left( \alpha^{i + 17j} \right) &= \kappa(16-i) \oplus \alpha^{17 s(j)}, ~\mathrm{for}~ 0 \leq i < 17 ~\mathrm{and}~ 0 \leq j < 15 ~, \end{cases} \end{equation}

where \(\alpha\) is a root of \(X^8+X^4+X^3+X^2+1\) so that it generates the multiplicative subgroup of the finite field with 256 elements. The functions \(s\) is a permutation of \(\{ 0, 1,...,14 \}\) defined in Table 1.

Table 1: The lookup table of \(s\).
x	0	1	2	3	4	5	6	7	8	9	10	11	12	13	14
s(x)	0	12	9	8	7	4	14	6	5	10	2	11	1	3	13

The function \(\kappa\) is maps \(\mathbb{F}_2^4\) to \(\mathbb{F}_2^8\) and is defined as \(\kappa(x) = \Lambda(x) \oplus 252\), where \(\Lambda\) is a linear function defined as follows:

\begin{equation*} \Lambda(\mathtt{1}) = \mathtt{12}~, \Lambda(\mathtt{2}) = \mathtt{26}~, \Lambda(\mathtt{4}) = \mathtt{24}~, \Lambda(\mathtt{8}) = \mathtt{30}~. \end{equation*}

If you are not too familiar with finite field arithmetic, a simple and readable C implementation is provided below.

1.2 How to Implement it Using Little Space

Our best C implementation (see below) can be rearranged as follows using only newlines and spaces.

p(x){
  unsigned char
      *t="@`rFTDVbpPBvdtfR@\xacp?\xe2>4\xa6\xe9{z\xe3q5\xa7\xe8",
      a=2,l=0,b=17;
  while(x && (l++,a^x)) a=2*a^a/128*29;
  return l%b ? t[l%b]^t[b+l/b]^b : t[l/b]^188;
}

The newlines and spaces allow us to better see the logic.

t stores the precomputation of both \(i \mapsto \kappa(16-i)\) and \(j \mapsto (\alpha^{17})^{s(j)}\) up to some constants.
The variable \(b\) merely serves as a shorthand for the constant \(17\).
The while loop computes the discrete log of x using a counter (l) that is incremented each time the variable a, which is initialized to \(\alpha\), is multiplied by \(\alpha\). Said multiplication is implemented as a Galois LFSR.
The final return value uses a ternary operator to implement the logic of the if statement that is used to define the TKlog.

An even more readable implementation works as follows.

unsigned char p(unsigned char x){
    unsigned char
        s[]={1,221,146,79,147,153,11,68,214,215,78,220,152,10,69},
        k[]={0,32,50,6,20,4,22,34,48,16,2,54,36,52,38,18,0};    
    if(x) {
        unsigned char l=1, a=2;
        while(a!=x) {
            a=(a<<1)^(a>>7)*29;
            l++;
        }
        if (l%17) return 252^k[l%17]^s[l/17];
        else return 252^k[l/17];
    }
    else return 252;
}

In this implementation:

s stores \((\alpha^{17})^{s(j)}\) for all \(0 \leq j < 15\),
k stores the value of \(\kappa(16-i) \oplus \kappa(0)\) for all \(0 \leq i < 17\),
(a<<1)^(a>>7)*29 is the multiplication by \(\alpha\) in the finite field, and
252 is \(\kappa(0)\).

2 Your Best Implementations

An implementation can be very short or it can yield a binary file (in machine code) that is very short on some platform. Both are interesting as neither should exist for a random permutation. They are summarized in the following table.

Author(s)	Language	Source code (bits)	Machine code (bits)	\(\log_{10}(\textrm{Pr})\)	Link
Brian Shaft	Algol		832	-256	link
Xavier Bonnetain, Gaëtan Leurent	ARM bytecode		640	-314	link
Xavier Bonnetain	C	1127		-167	link
Xavier Bonnetain	C	1106		-173	link
Xavier Bonnetain	C	1071		-184	link
Léo Perrin	Python3	1351		-100	link
Xavier Bonnetain	Rust	1141		-163	SE link
embodiment-of-ignorance	C#	1128		-167	SE link
arnauld	JS (ES6)	1112		-171	SE link
xnor	Python 3	1057		-188	SE link
ceilingcat	C (ASCII)	973		-213	SE link
odzhan	ARM64 bytecode		624	-318	SE link
Kevin Cruijssen	05AB1E	592		-328	SE link
jimmy23013	CJam	504		-354	SE link
Nick Kennedy	Jelly	472		-364	SE link
recursive	Stax	464		-367	SE link

For more explanations about the implementations in the second half of the table, you may have a look at the challenge we posted on the codegolf stack exchange as that is where they come from. If the characters used are limited to ASCII ones, we use 7 bits per byte. Otherwise, we use 8. Note that the implementations on stackexchange may seem to use UTF-8 characters that would need more than 8 bits to be stored. However, this is simply a convenient encoding as explained here.

The quantity \(\log_{10}(\textrm{Pr})\) is an upper-bound on the base-10 logarithm probability that a random permutation can have an implementation this short. This bound is obtained by counting the number \(n\) of strings of the given length (or smaller) and then computing \(2^n / (256!)\). For instance, the probability that a random permutation has an implementation of length at most 464 in Stax is at most \(10^{-367}\), which is a very, very, very small number.

2.0.1 Algol

Brian Shaft (832 bits): The following program can be compiled into a 832 bits binary file using the Algol compiler for the Unisys MCP operating system.

Real    Procedure P(X);
   Value X;
   Integer X;
Begin
   Value Array S(1,221,146,79,147,153,11,68,214,215,78,220,152,10,69),
               K(0,32,50,6,20,4,22,34,48,16,2,54,36,52,38,18,0);
   Integer L,A;
   A:= 2;
   If X ^= 0 Then
   Begin
      For L:= 1 Step 1 While A ^= X Do
         A:=(Real(Boolean(A+A).[7:8] eqv ^Boolean(A.[7:1]*29))).[7:8];
      P:= Real((If L mod 17 = 0 Then
                   Boolean(K[L div 17])
                Else
                   Boolean(K[L mod 17]) eqv ^Boolean(S[L div 17]) eqv ^Boolean(252));
   End
   Else
      P:= 252;
End;

2.0.2 ARM Assembly

Xavier Bonnetain & Gaëtan Leurent (640 bits): The machine code corresponding to this implementation is 80 bytes long and thus fits on 640 bits. It corresponds to a complete 80 bytes long program, with 32 bytes of data and 20 instructions (16 16-bit instructions and 4 32-bit instructions). More details are given in [2]. Xavier's webpage; Gaëtan's webpage.

        .syntax unified
        .text
        .align  4

        .global p
p:
        movs    r3, #0
        cbz     r0, .L2
        mov     r2, #0x1000000
        lsls    r0, #24
.L3:
        lsls    r2, r2, #1
        it      cs
        eorcs   r2, r2, #0x1d000000
        adds    r3, r3, #1
        cmp     r2, r0
        bne     .L3
.L2:
        movs    r0, #17
        udiv    r2, r3, r0
        mls     r3, r0, r2, r3
        adr     r1, .table_k
        cbz     r3, .LL
        ldrb    r3, [r1, r3]
        add     r1, r0
.LL:
        ldrb    r0, [r1, r2]
        eors    r0, r0, r3
        bx      lr
        .align  2

.table_k:
        .byte   -4
        .byte   -36
        .byte   -50
        .byte   -6
        .byte   -24
        .byte   -8
        .byte   -22
        .byte   -34
        .byte   -52
        .byte   -20
        .byte   -2
        .byte   -54
        .byte   -40
        .byte   -56
        .byte   -38
        .byte   -18
        .byte   -4
.table_s:
        .byte   1
        .byte   -35
        .byte   -110
        .byte   79
        .byte   -109
        .byte   -103
        .byte   11
        .byte   68
        .byte   -42
        .byte   -41
        .byte   78
        .byte   -36
        .byte   -104
        .byte   10
        .byte   69

2.0.3 C

The programs here may not be perfectly standard (see for instance the omission of int in function prototypes) but they should be successfully processed by your compiler.

Xavier Bonnetain (1106 bits)

This implementation is 158 ASCII characters long and thus fits on 1106 bits. It simplifies the previous implementation by removing the variable a containing the generator raised to power l. Instead, this programs looks for l such that \(x^{255-\ell} = 1\). Xavier's webpage.

p(x){unsigned char*k="@`rFTDVbpPBvdtfR@\xacp?\xe2>4\xa6\xe9{z\xe3q5\xa7\xe8",l=0,b=17;while(--l&&x^1)x=2*x^x/128*285;return l%b?k[l%b]^k[b+l/b]^b:k[l/b]^188;}

Xavier Bonnetain (1127 bits)

This implementation is 161 ASCII characters long and thus fits on 1127 bits. Xavier's webpage.

p(x){unsigned char*t="@`rFTDVbpPBvdtfR@\xacp?\xe2>4\xa6\xe9{z\xe3q5\xa7\xe8",a=2,l=0,b=17;while(x&&(l++,a^x))a=2*a^a/128*29;return l%b?t[l%b]^t[b+l/b]^b:t[l/b]^188;}

2.0.4 C (Lovecraftian Horror)

In this category, I put C programs that can be compiled by some compilers and successfully evaluate pi. The functions in this part are not portable or even standard C—Lovecraftian horrors really.

Xavier Bonnetain (1071 bits)

This implementation is 153 ASCII characters long and thus fits on 1071 bits. Xavier's webpage.

p(x){char*k="@`rFTDVbpPBvdtfR@\xacp?\xe2>4\xa6\xe9{z\xe3q5\xa7\xe8";int l=256,b=17;while(--l*x^l)x=2*x^x/128*285;return l%b?k[l%b]^k[b+l/b]^b:k[l/b]^188;}

2.0.5 Python3

Léo Perrin (1351 bits): This implementation is 193 ASCII characters long, meaning that it fits on 1351 bits. It is based on Xavier's C implementation. Note that it wouldn't work with Python2.

def p(x):
 t=b'@`rFTDVbpPBvdtfR@\xacp?\xe2>4\xa6\xe9{z\xe3q5\xa7\xe8'
 l,d=255,17
 if x<2:return(252,238)[x]
 while 1^x:x,l=(x<<1)^(x>>7)*285,l-1
 return(188^t[l//d],d^t[l%d]^t[d+l//d])[l%d>0]

3 Acknowledgement

I thank Brian Shaft for pointing out some typos that have now been fixed. I also thank xnor from the codegolf stackexchange who greatly helped me when posting the challenge on said website.

4 Bibliography

Léo Perrin. Partitions in the S-Box of Streebog and Kuznyechik. IACR Transactions on Symmetric Cryptology, 2019(1), 302–329, 2019. link to tosc.iacr.org.
Xavier Bonnetain, Léo Perrin and Shizhu Tian. Anomalies and Vector Space Search: Tools for S-Box Reverse-Engineering. link to eprint.iacr.org.

Footnotes:

The best we could find when I posted this note; Xavier actually improved it shortly after!