# On S-Box Reverse-Engineering

A S(ubstitution)-box is a non-linear function $$S$$ defined as follows for some $$n$$ and $$m$$: $$S : \begin{cases} \{ 0,1 \}^n &\to \{ 0,1 \}^m \\ x &\mapsto S(x) ~,\end{cases}$$ or, in other words, that map the set of $$n$$-bit strings to the set of $$m$$-bit string. In practice, $$n$$ is small enough that $$S$$ can be specified via its Look-up table, i.e. using the list $$\{S(0), S(1), ... \}$$. Such components are very common in symmetric cryptography.

S-box reverse-engineering is the term corresponding to the techniques and algorithms that can be used to recover the method used to generate an S-box. Indeed, due to the importance they have in the security of the ciphers using them, S-boxes are carefully engineered. The aim of reverse-engineer is to invert this process. In particular, we look for decomposition, that is, an algorithm that can evaluate the function without using its look-up table. It can be a C program, a math formula…

I have worked a lot on the topic of S-box reverse-engineering—along with many co-authors! A summary of our results on this topic is given below.

## 1 On the Russian S-box

### 1.1 Index

This is likely why you ended on this page. The results on this topic are summarized in the following dedicated pages.

First Results on Reverse-Engineering
An overview of the first results on this topic is ; I obtained them with Alex Biryukov. It shows the basic approach to follow if you need to reverse-engineer an S-box you found in the wild. Work in progress
On the S-box of Streebog and Kuznyechik
The details of the structure I identified in the Russian S-box. The context and the details of said structure (the TKlog) are described.
Disproving a Coincidence with Code-Golf
The main argument proving that a structure at least as simple as a TKlog cannot be obtained by chance.
The Shortest Implementations of π
The shortest implementations of $$\pi$$ (which play a crucial role in the argument outlined in the previous point) are listed on this page.

As of February 2020, the (alleged) designers of $$\pi$$ still claim that they have generated their S-box by first creating a bunch using a Fisher-Yates shuffle and then picking the best among them using some usual criteria.

The likelihood of this being correct is lower than the probability of getting 1000 heads when tossing an unbiased coin 1000 times (literally: it is possible to compute this probability and we did the math).

### 1.2 Timeline

The sequence of papers and events related to $$\pi$$ can divided into three phases.

#### 1.2.1 Standardization

At first, nobody looked into $$\pi$$ and the standardization of Kuznyechik and Streebog went smoothly.

• 2012 July : Standardization of Streebog in Russia (GOST R34.12-2015)
• 2013 August : Standardization of Streebog by the IETF (RFC RFC 6986)
• 2015 June : Standardization of Kuznyechik in Russia (GOST R34.12-2015)
• 2016 March : Standardization of Kuznyechik by the IETF (RFC 7801)

#### 1.2.2 Increasingly Structured Decompositions

With my co-authors, we started looking at the S-box and found several distinct decompositions of $$\pi$$. In parallel, the Russian algorithms were being pushed for standardization by ISO.

• 2016 May : Publication of the first decomposition [1]. It merely showed the presence of non-random behaviour in $$\pi$$ and that a hardware implementation could be optimized with this knowledge.
• 2017 Mar : Publication of the second decomposition [7]. It showed that $$\pi$$ is also related to a discrete logarithm in a non-intuitive way. To find this result, we identified similarities between $$\pi$$ and the S-box of BelT (the Belarussian standard block cipher). As the latter is known to be based on a finite-field exponential and as the inverse of $$\pi$$ has somewhat similar properties, it was natural to conclude that there is a relation between them.
• 2018 Oct : Standardization of Streebog by ISO (it was added to standard 10118-3).
• 2019 Jan : Publication of the third (and last) decomposition [5]: $$\pi$$ has a TKlog structure. I think the TKlog is the one intended by its designers:
• it is much simpler than the previous two decompositions (see a description of its "simplicity" here),
• the previous two decompositions are easily explained by this third one, and
• the TKlog structure highlights some very strong mathematical properties of $$\pi$$ that were not known before (see here).

These results lead to me advise against using the algorithms until the authors of these algorithms explain their design process. At the time, I didn't know that they were actively denying using a structure.

#### 1.2.3 Time for Action

In March 2019, I was told that the designers of $$\pi$$ were still claiming a random process. I knew that was their initial claim but I did not know that they were insisting.1 Upon seeing the document they provided describing their alleged design process (which have since been made available online, see the design description and the discussions), I decided to actively try and stop the standardization/deprecate these algorithms.

• 2019 February : ISO was supposed to make a decision regarding the standardization of Kuznyechik but, as a consequence of [5], it was agreed to postpone this decision to October.
• 2019 June : Xavier Bonnetain presented our work at a French conference (the SSTIC) showing that "standardized" does not equal "good".
• 2019 June : With Xavier Bonnetain, we posted our code golf challenge. See here for why it matters.
• 2019 July : I gave a (remote) talk at IETF 105 calling for the deprecation of the RFCs of Kuznyechik and Streebog.
• 2019 October : The ISO meeting where a decision had to be taken took place in Paris. I attended it, along with the alleged designer of $$\pi$$. Let us say that this did not change my position: $$\pi$$ was not generated in the way that its designers claim, meaning that they are lying and thus that their algorithms should not be used.
• 2019 December : Our paper presenting the methods that can be used to reverse-engineer S-boxes [2] was published in the proceedings of Asiacrypt 2019. Its main conclusion is that, as you would expect, obtaining anything as structured as a TKlog by chance is pretty much impossible.

## 2 Do It Yourself

### 2.1 Tutorial

Work in progress

I am writing a SAGE tutorial on S-box reverse-engineering, it is available here. Its first version was written for an invited lecture on S-box reverse-engineering I gave in spring 2019 at Rostock university.

### 2.2SboxU

Overtime, I have written my own SAGE toolbox for studying Boolean functions and S-boxes. It contains optimized implementations of functions that are provided by SAGE (i.e. the one returning the differential spectrum in SboxU does not store the full DDT in memory, is written in C++, and is multi-threaded), as well as functions that are unique to it, in particular the ones returning all the vector spaces of a fixed dimension that are contained in a set of vectors (the corresponding algorithm is the one specified in [2]).

To learn how to install and use sboxU, see the Documentation of sboxU.

## 3 On APN Functions

Work in progress

The techniques we developed to decompose the Russian S-box have broader applications. In particular, Aleksei Udovenko had the (brillant) idea to apply them to the study of the only known APN permutations on 6 bits. This allowed us to obtain the butterfly structure, of which this permutation is a particular case (see [8]).

With Anne Canteaut and Sébastien Duval, we unfortunately showed that even a first generalization of this structure could not produce APN permutations for $$n > 6$$ (see [3]). Worst, with Anne Canteaut and Shizhu Tian, we showed that the same held for an even broader generalization of the initial butterfly structure [4].

## 4 References

1. Alex Biryukov, Léo Perrin, and Aleksei Udovenko. Reverse-engineering the S-box of streebog, kuznyechik and STRIBOBr1. In Marc Fischlin and Jean-Sébastien Coron, editors, Advances in Cryptology – EUROCRYPT 2016, Part I, volume 9665 of Lecture Notes in Computer Science, pages 372–402. Springer, Heidelberg, May 2016. link to eprint.iacr.org.
2. Xavier Bonnetain, Léo Perrin, and Shizhu Tian. Anomalies and Vector Space Search: Tools for S-Box Analysis. In Steven Galbraith and Shiho Moriai, editors, Advances in Cryptology – ASIACRYPT 2019, Part I, volume 11921 of Lecture Notes in Computer Science, pages 196–223. Springer, Heidelberg, December 2019. link to eprint.iacr.org.
3. Anne Canteaut, Sébastien Duval, and Léo Perrin. A generalisation of Dillon's APN permutation with the best known differential and nonlinear properties for all fields of size $$2^{4k+2}$$. IEEE Transactions on Information Theory, 63(11):7575–7591, Nov 2017. link to eprint.iacr.org.
4. Anne Canteaut, Léo Perrin and Shizhu Tian. If a Generalised Butterfly is APN then it Operates on 6 Bits. Cryptography and Communications, pp. 1–18, April 2019. link to eprint.iacr.org.
5. Léo Perrin. Partitions in the S-Box of Streebog and Kuznyechik IACR Transactions on Symmetric Cryptology, 2019(1), 302–329, 2019. link to tosc.iacr.org. Presentation (by myself): link to youtube.
6. Léo Perrin and Xavier Bonnetain. Russian Style (Lack of) Randomness. Symposium sur la sécurité des technologies de l'information et des communications, 2019. The article webpage contains links to the article itself (in English), the slides (in English), and the talk itself given by Xavier Bonnetain (in French).
7. Léo Perrin and Aleksei Udovenko. Exponential S-boxes: a link between the S-boxes of BelT and Kuznyechik/Streebog. IACR Transactions on Symmetric Cryptology, 2016(2):99–124, 2017. link to tosc.iacr.org.
8. Léo Perrin, Aleksei Udovenko, and Alex Biryukov. Cryptanalysis of a theorem: Decomposing the only known solution to the big APN problem. In Matthew Robshaw and Jonathan Katz, editors, Advances in Cryptology – CRYPTO 2016, Part II, volume 9815 of Lecture Notes in Computer Science, pages 93–122. Springer, Heidelberg, August 2016. link to eprint.iacr.org; Presentation (by Aleskei Udovenko): link to youtube.

## Footnotes:

1

The discussion at ISO are held behind closed doors with little to no input from the outside. Such processes are not conducive to the best discussions…

I had been contacted by a country's representative before, when the standardization of Streebog was being considered. My results at the time did not allow me to provide irrefutable proofs that the designers were lying, so I did not insist on stopping its standardization.

Last Update (by me): 10/03/2020