The Shortest Implementations of π
Table of Contents
How small of a function can you write that takes as input an integer
x
in \(\{ 0,1,...,255 \}\) and outputs pi[x]
where pi
is the table
given below?
Here, "small" means that the corresponding source code and/or machine code length measured in bits is short. Why do we care? It is explained here and in [2].
unsigned char pi[] = { 0xFC,0xEE,0xDD,0x11,0xCF,0x6E,0x31,0x16,0xFB,0xC4,0xFA,0xDA,0x23,0xC5,0x04,0x4D, 0xE9,0x77,0xF0,0xDB,0x93,0x2E,0x99,0xBA,0x17,0x36,0xF1,0xBB,0x14,0xCD,0x5F,0xC1, 0xF9,0x18,0x65,0x5A,0xE2,0x5C,0xEF,0x21,0x81,0x1C,0x3C,0x42,0x8B,0x01,0x8E,0x4F, 0x05,0x84,0x02,0xAE,0xE3,0x6A,0x8F,0xA0,0x06,0x0B,0xED,0x98,0x7F,0xD4,0xD3,0x1F, 0xEB,0x34,0x2C,0x51,0xEA,0xC8,0x48,0xAB,0xF2,0x2A,0x68,0xA2,0xFD,0x3A,0xCE,0xCC, 0xB5,0x70,0x0E,0x56,0x08,0x0C,0x76,0x12,0xBF,0x72,0x13,0x47,0x9C,0xB7,0x5D,0x87, 0x15,0xA1,0x96,0x29,0x10,0x7B,0x9A,0xC7,0xF3,0x91,0x78,0x6F,0x9D,0x9E,0xB2,0xB1, 0x32,0x75,0x19,0x3D,0xFF,0x35,0x8A,0x7E,0x6D,0x54,0xC6,0x80,0xC3,0xBD,0x0D,0x57, 0xDF,0xF5,0x24,0xA9,0x3E,0xA8,0x43,0xC9,0xD7,0x79,0xD6,0xF6,0x7C,0x22,0xB9,0x03, 0xE0,0x0F,0xEC,0xDE,0x7A,0x94,0xB0,0xBC,0xDC,0xE8,0x28,0x50,0x4E,0x33,0x0A,0x4A, 0xA7,0x97,0x60,0x73,0x1E,0x00,0x62,0x44,0x1A,0xB8,0x38,0x82,0x64,0x9F,0x26,0x41, 0xAD,0x45,0x46,0x92,0x27,0x5E,0x55,0x2F,0x8C,0xA3,0xA5,0x7D,0x69,0xD5,0x95,0x3B, 0x07,0x58,0xB3,0x40,0x86,0xAC,0x1D,0xF7,0x30,0x37,0x6B,0xE4,0x88,0xD9,0xE7,0x89, 0xE1,0x1B,0x83,0x49,0x4C,0x3F,0xF8,0xFE,0x8D,0x53,0xAA,0x90,0xCA,0xD8,0x85,0x61, 0x20,0x71,0x67,0xA4,0x2D,0x2B,0x09,0x5B,0xCB,0x9B,0x25,0xD0,0xBE,0xE5,0x6C,0x52, 0x59,0xA6,0x74,0xD2,0xE6,0xF4,0xB4,0xC0,0xD1,0x66,0xAF,0xC2,0x39,0x4B,0x63,0xB6 };
Send me your best implementations in the languages of your choosing and I will publish them below! The source code doesn't have to be shorter than our C implementation; the machine code doesn't have to be shorter than the one for our ARM assembly implementation.
Alternatively, you can check out the challenge we put on the codegolf stackexchange here.
The best we could find1 in C is:
p(x){unsigned char*t="@`rFTDVbpPBvdtfR@\xacp?\xe2>4\xa6\xe9{z\xe3q5\xa7\xe8",a=2,l=0,b=17;while(++l&&a^x)a=2*a^a/128*29;return l%b?t[l%b]^t[b+l/b]^b:t[l/b]^188;}
To help you get the best implementations, here are some explanations about ours. The best implementations follow.
1 Some Explanations about our Implementation
1.1 The TKlog Structure
As I have established (see here and in [1]), this permutation actually has a very strong and very simple structure that is completely captured by the following system of equations:
\begin{equation} \begin{cases} \pi(0) &= \kappa(0), \\ \pi\left( \alpha^{17j} \right) &= \kappa(16-j), ~\mathrm{for}~ 1 \leq j < 15, \\ \pi\left( \alpha^{i + 17j} \right) &= \kappa(16-i) \oplus \alpha^{17 s(j)}, ~\mathrm{for}~ 0 \leq i < 17 ~\mathrm{and}~ 0 \leq j < 15 ~, \end{cases} \end{equation}where \(\alpha\) is a root of \(X^8+X^4+X^3+X^2+1\) so that it generates the multiplicative subgroup of the finite field with 256 elements. The functions \(s\) is a permutation of \(\{ 0, 1,...,14 \}\) defined in Table 1.
x | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
s(x) | 0 | 12 | 9 | 8 | 7 | 4 | 14 | 6 | 5 | 10 | 2 | 11 | 1 | 3 | 13 |
The function \(\kappa\) is maps \(\mathbb{F}_2^4\) to \(\mathbb{F}_2^8\) and is defined as \(\kappa(x) = \Lambda(x) \oplus 252\), where \(\Lambda\) is a linear function defined as follows:
\begin{equation*} \Lambda(\mathtt{1}) = \mathtt{12}~, \Lambda(\mathtt{2}) = \mathtt{26}~, \Lambda(\mathtt{4}) = \mathtt{24}~, \Lambda(\mathtt{8}) = \mathtt{30}~. \end{equation*}If you are not too familiar with finite field arithmetic, a simple and readable C implementation is provided below.
1.2 How to Implement it Using Little Space
Our best C implementation (see below) can be rearranged as follows using only newlines and spaces.
p(x){ unsigned char *t="@`rFTDVbpPBvdtfR@\xacp?\xe2>4\xa6\xe9{z\xe3q5\xa7\xe8", a=2,l=0,b=17; while(x && (l++,a^x)) a=2*a^a/128*29; return l%b ? t[l%b]^t[b+l/b]^b : t[l/b]^188; }
The newlines and spaces allow us to better see the logic.
t
stores the precomputation of both \(i \mapsto \kappa(16-i)\) and \(j \mapsto (\alpha^{17})^{s(j)}\) up to some constants.- The variable \(b\) merely serves as a shorthand for the constant \(17\).
- The
while
loop computes the discrete log ofx
using a counter (l
) that is incremented each time the variablea
, which is initialized to \(\alpha\), is multiplied by \(\alpha\). Said multiplication is implemented as a Galois LFSR. - The final return value uses a ternary operator to implement the
logic of the
if
statement that is used to define the TKlog.
An even more readable implementation works as follows.
unsigned char p(unsigned char x){ unsigned char s[]={1,221,146,79,147,153,11,68,214,215,78,220,152,10,69}, k[]={0,32,50,6,20,4,22,34,48,16,2,54,36,52,38,18,0}; if(x) { unsigned char l=1, a=2; while(a!=x) { a=(a<<1)^(a>>7)*29; l++; } if (l%17) return 252^k[l%17]^s[l/17]; else return 252^k[l/17]; } else return 252; }
In this implementation:
s
stores \((\alpha^{17})^{s(j)}\) for all \(0 \leq j < 15\),k
stores the value of \(\kappa(16-i) \oplus \kappa(0)\) for all \(0 \leq i < 17\),(a<<1)^(a>>7)*29
is the multiplication by \(\alpha\) in the finite field, and252
is \(\kappa(0)\).
2 Your Best Implementations
An implementation can be very short or it can yield a binary file (in machine code) that is very short on some platform. Both are interesting as neither should exist for a random permutation. They are summarized in the following table.
Author(s) | Language | Source code (bits) | Machine code (bits) | \(\log_{10}(\textrm{Pr})\) | Link |
---|---|---|---|---|---|
Brian Shaft | Algol | 832 | -256 | link | |
Xavier Bonnetain, Gaëtan Leurent | ARM bytecode | 640 | -314 | link | |
Xavier Bonnetain | C | 1127 | -167 | link | |
Xavier Bonnetain | C | 1106 | -173 | link | |
Xavier Bonnetain | C | 1071 | -184 | link | |
Léo Perrin | Python3 | 1351 | -100 | link | |
Xavier Bonnetain | Rust | 1141 | -163 | SE link | |
embodiment-of-ignorance | C# | 1128 | -167 | SE link | |
arnauld | JS (ES6) | 1112 | -171 | SE link | |
xnor | Python 3 | 1057 | -188 | SE link | |
ceilingcat | C (ASCII) | 973 | -213 | SE link | |
odzhan | ARM64 bytecode | 624 | -318 | SE link | |
Kevin Cruijssen | 05AB1E | 592 | -328 | SE link | |
jimmy23013 | CJam | 504 | -354 | SE link | |
Nick Kennedy | Jelly | 472 | -364 | SE link | |
recursive | Stax | 464 | -367 | SE link |
For more explanations about the implementations in the second half of the table, you may have a look at the challenge we posted on the codegolf stack exchange as that is where they come from. If the characters used are limited to ASCII ones, we use 7 bits per byte. Otherwise, we use 8. Note that the implementations on stackexchange may seem to use UTF-8 characters that would need more than 8 bits to be stored. However, this is simply a convenient encoding as explained here.
The quantity \(\log_{10}(\textrm{Pr})\) is an upper-bound on the base-10 logarithm probability that a random permutation can have an implementation this short. This bound is obtained by counting the number \(n\) of strings of the given length (or smaller) and then computing \(2^n / (256!)\). For instance, the probability that a random permutation has an implementation of length at most 464 in Stax is at most \(10^{-367}\), which is a very, very, very small number.
2.0.1 Algol
- Brian Shaft (832 bits)
- The following program can be compiled into a 832 bits binary file using the Algol compiler for the Unisys MCP operating system.
Real Procedure P(X); Value X; Integer X; Begin Value Array S(1,221,146,79,147,153,11,68,214,215,78,220,152,10,69), K(0,32,50,6,20,4,22,34,48,16,2,54,36,52,38,18,0); Integer L,A; A:= 2; If X ^= 0 Then Begin For L:= 1 Step 1 While A ^= X Do A:=(Real(Boolean(A+A).[7:8] eqv ^Boolean(A.[7:1]*29))).[7:8]; P:= Real((If L mod 17 = 0 Then Boolean(K[L div 17]) Else Boolean(K[L mod 17]) eqv ^Boolean(S[L div 17]) eqv ^Boolean(252)); End Else P:= 252; End;
2.0.2 ARM Assembly
- Xavier Bonnetain & Gaëtan Leurent (640 bits)
- The machine code corresponding to this implementation is 80 bytes long and thus fits on 640 bits. It corresponds to a complete 80 bytes long program, with 32 bytes of data and 20 instructions (16 16-bit instructions and 4 32-bit instructions). More details are given in [2]. Xavier's webpage; Gaëtan's webpage.
.syntax unified .text .align 4 .global p p: movs r3, #0 cbz r0, .L2 mov r2, #0x1000000 lsls r0, #24 .L3: lsls r2, r2, #1 it cs eorcs r2, r2, #0x1d000000 adds r3, r3, #1 cmp r2, r0 bne .L3 .L2: movs r0, #17 udiv r2, r3, r0 mls r3, r0, r2, r3 adr r1, .table_k cbz r3, .LL ldrb r3, [r1, r3] add r1, r0 .LL: ldrb r0, [r1, r2] eors r0, r0, r3 bx lr .align 2 .table_k: .byte -4 .byte -36 .byte -50 .byte -6 .byte -24 .byte -8 .byte -22 .byte -34 .byte -52 .byte -20 .byte -2 .byte -54 .byte -40 .byte -56 .byte -38 .byte -18 .byte -4 .table_s: .byte 1 .byte -35 .byte -110 .byte 79 .byte -109 .byte -103 .byte 11 .byte 68 .byte -42 .byte -41 .byte 78 .byte -36 .byte -104 .byte 10 .byte 69
2.0.3 C
The programs here may not be perfectly standard (see for instance the
omission of int
in function prototypes) but they should be
successfully processed by your compiler.
- Xavier Bonnetain (1106 bits)
- This implementation is
158 ASCII characters long and thus fits on 1106 bits. It
simplifies the previous implementation by removing the variable
a
containing the generator raised to powerl
. Instead, this programs looks forl
such that \(x^{255-\ell} = 1\). Xavier's webpage.p(x){unsigned char*k="@`rFTDVbpPBvdtfR@\xacp?\xe2>4\xa6\xe9{z\xe3q5\xa7\xe8",l=0,b=17;while(--l&&x^1)x=2*x^x/128*285;return l%b?k[l%b]^k[b+l/b]^b:k[l/b]^188;}
- Xavier Bonnetain (1127 bits)
- This implementation is 161 ASCII characters long and thus fits on 1127 bits. Xavier's webpage.
p(x){unsigned char*t="@`rFTDVbpPBvdtfR@\xacp?\xe2>4\xa6\xe9{z\xe3q5\xa7\xe8",a=2,l=0,b=17;while(x&&(l++,a^x))a=2*a^a/128*29;return l%b?t[l%b]^t[b+l/b]^b:t[l/b]^188;}
2.0.4 C (Lovecraftian Horror)
In this category, I put C programs that can be compiled by
some compilers and successfully evaluate pi
. The functions in
this part are not portable or even standard C—Lovecraftian
horrors really.
- Xavier Bonnetain (1071 bits)
- This implementation is 153 ASCII characters long and thus fits on 1071 bits. Xavier's webpage.
p(x){char*k="@`rFTDVbpPBvdtfR@\xacp?\xe2>4\xa6\xe9{z\xe3q5\xa7\xe8";int l=256,b=17;while(--l*x^l)x=2*x^x/128*285;return l%b?k[l%b]^k[b+l/b]^b:k[l/b]^188;}
2.0.5 Python3
- Léo Perrin (1351 bits)
- This implementation is 193 ASCII characters long, meaning that it fits on 1351 bits. It is based on Xavier's C implementation. Note that it wouldn't work with Python2.
def p(x): t=b'@`rFTDVbpPBvdtfR@\xacp?\xe2>4\xa6\xe9{z\xe3q5\xa7\xe8' l,d=255,17 if x<2:return(252,238)[x] while 1^x:x,l=(x<<1)^(x>>7)*285,l-1 return(188^t[l//d],d^t[l%d]^t[d+l//d])[l%d>0]
3 Acknowledgement
I thank Brian Shaft for pointing out some typos that have now been fixed. I also thank xnor from the codegolf stackexchange who greatly helped me when posting the challenge on said website.
4 Bibliography
- Léo Perrin. Partitions in the S-Box of Streebog and Kuznyechik. IACR Transactions on Symmetric Cryptology, 2019(1), 302–329, 2019. link to tosc.iacr.org.
- Xavier Bonnetain, Léo Perrin and Shizhu Tian. Anomalies and Vector Space Search: Tools for S-Box Reverse-Engineering. link to eprint.iacr.org.
Footnotes:
The best we could find when I posted this note; Xavier actually improved it shortly after!