# On S-Box Reverse-Engineering

## Table of Contents

A **S(ubstitution)-box** is a non-linear function \(S\) defined as follows
for some \(n\) and \(m\):
$$
S : \begin{cases} \{ 0,1 \}^n &\to \{ 0,1 \}^m \\ x &\mapsto S(x) ~,\end{cases}
$$
or, in other words, that map the set of \(n\)-bit strings to the set of
\(m\)-bit string. In practice, \(n\) is small enough that \(S\) can be
specified via its *Look-up table*, i.e. using the list \(\{S(0), S(1),
... \}\). Such components are very common in symmetric cryptography.

**S-box reverse-engineering** is the term corresponding to the
techniques and algorithms that can be used to recover the method used
to generate an S-box. Indeed, due to the importance they have in the
security of the ciphers using them, S-boxes are carefully
*engineered*. The aim of *reverse-engineer* is to invert this
process. In particular, we look for **decomposition**, that is, an
algorithm that can evaluate the function *without* using its look-up
table. It can be a `C`

program, a math formula…

I have worked a lot on the topic of S-box reverse-engineering—along with many co-authors! A summary of our results on this topic is given below.

## 1 On the Russian S-box

### 1.1 Index

This is likely why you ended on this page. The results on this topic are summarized in the following dedicated pages.

- First Results on Reverse-Engineering
- An overview of the first
results on this topic is ; I obtained them with Alex Biryukov. It
shows the basic approach to follow if you need to
reverse-engineer an S-box you found in the wild.
**Work in progress** - On the S-box of Streebog and Kuznyechik
- The details of the
structure I identified in the Russian S-box. The context and the
details of said structure (the
*TKlog*) are described. - Disproving a Coincidence with Code-Golf
- The main argument proving
that a structure
*at least as simple as a TKlog*cannot be obtained by chance. - The Shortest Implementations of π
- The shortest implementations of \(\pi\) (which play a crucial role in the argument outlined in the previous point) are listed on this page.

As of February 2020, the (alleged) designers of \(\pi\) still claim that they have generated their S-box by first creating a bunch using a Fisher-Yates shuffle and then picking the best among them using some usual criteria.

The likelihood of this being correct is lower than the probability of getting 1000 heads when tossing an unbiased coin 1000 times (literally: it is possible to compute this probability and we did the math).

### 1.2 Timeline

The sequence of papers and events related to \(\pi\) can divided into three phases.

#### 1.2.1 Standardization

At first, nobody looked into \(\pi\) and the standardization of Kuznyechik and Streebog went smoothly.

#### 1.2.2 Increasingly Structured Decompositions

With my co-authors, we started looking at the S-box and found several distinct decompositions of \(\pi\). In parallel, the Russian algorithms were being pushed for standardization by ISO.

**2016 May**: Publication of the first decomposition [1]. It merely showed the presence of non-random behaviour in \(\pi\) and that a hardware implementation could be optimized with this knowledge.**2017 Mar**: Publication of the second decomposition [7]. It showed that \(\pi\) is also related to a discrete logarithm in a non-intuitive way. To find this result, we identified similarities between \(\pi\) and the S-box of BelT (the Belarussian standard block cipher). As the latter is known to be based on a finite-field exponential and as the inverse of \(\pi\) has somewhat similar properties, it was natural to conclude that there is a relation between them.**2018 Oct**: Standardization of Streebog by ISO (it was added to standard 10118-3).**2019 Jan**: Publication of the third (and last) decomposition [5]: \(\pi\) has a**TKlog**structure. I think the TKlog is the one intended by its designers:- it is much simpler than the previous two decompositions (see a description of its "simplicity" here),
- the previous two decompositions are easily explained by this third one, and
- the TKlog structure highlights some very strong mathematical properties of \(\pi\) that were not known before (see here).

These results lead to me advise against using the algorithms until the authors of these algorithms explain their design process. At the time, I didn't know that they were actively

*denying*using a structure.

#### 1.2.3 Time for Action

In March 2019, I was told that the designers of \(\pi\) were still
claiming a random process. I knew that was their initial claim but I
did not know that they were insisting.^{1} Upon seeing the document
they provided describing their alleged design process (which have
since been made available online, see the design description and
the discussions), I decided to actively try and stop the
standardization/deprecate these algorithms.

**2019 February**: ISO was supposed to make a decision regarding the standardization of Kuznyechik but, as a consequence of [5], it was agreed to postpone this decision to October.**2019 June**: Xavier Bonnetain presented our work at a French conference (the SSTIC) showing that "standardized" does not equal "good".**2019 June**: With Xavier Bonnetain, we posted our code golf challenge. See here for why it matters.**2019 July**: I gave a (remote) talk at IETF 105 calling for the deprecation of the RFCs of Kuznyechik and Streebog.**2019 October**: The ISO meeting where a decision had to be taken took place in Paris. I attended it, along with the alleged designer of \(\pi\). Let us say that this did not change my position: \(\pi\) was not generated in the way that its designers claim, meaning that they are lying and thus that their algorithms should not be used.**2019 December**: Our paper presenting the methods that can be used to reverse-engineer S-boxes [2] was published in the proceedings of Asiacrypt 2019. Its main conclusion is that, as you would expect, obtaining anything as structured as a TKlog by chance is pretty much impossible.

## 2 Do It Yourself

### 2.1 Tutorial

### 2.2 `SboxU`

Overtime, I have written my own SAGE toolbox for studying Boolean
functions and S-boxes. It contains optimized implementations of
functions that are provided by SAGE (i.e. the one returning the
differential spectrum in `SboxU`

does not store the full DDT in
memory, is written in C++, and is multi-threaded), as well as
functions that are unique to it, in particular the ones returning all
the vector spaces of a fixed dimension that are contained in a set of
vectors (the corresponding algorithm is the one specified in
[2]).

To learn how to install and use `sboxU`

, see the Documentation of
`sboxU`

.

## 3 On APN Functions

**Work in progress**

The techniques we developed to decompose the Russian S-box have
broader applications. In particular, Aleksei Udovenko had the
(brillant) idea to apply them to the study of the only known *APN*
permutations on 6 bits. This allowed us to obtain the **butterfly
structure**, of which this permutation is a particular case (see
[8]).

With Anne Canteaut and Sébastien Duval, we unfortunately showed that even a first generalization of this structure could not produce APN permutations for \(n > 6\) (see [3]). Worst, with Anne Canteaut and Shizhu Tian, we showed that the same held for an even broader generalization of the initial butterfly structure [4].

## 4 References

- Alex Biryukov, Léo Perrin, and Aleksei Udovenko.
**Reverse-engineering the S-box of streebog, kuznyechik and STRIBOBr1**. In Marc Fischlin and Jean-Sébastien Coron, editors,*Advances in Cryptology – EUROCRYPT 2016, Part I*, volume 9665 of*Lecture Notes in Computer Science*, pages 372–402. Springer, Heidelberg, May 2016. link to eprint.iacr.org. - Xavier Bonnetain, Léo Perrin, and Shizhu Tian.
**Anomalies and Vector Space Search: Tools for S-Box Analysis**. In Steven Galbraith and Shiho Moriai, editors,*Advances in Cryptology – ASIACRYPT 2019, Part I*, volume 11921 of*Lecture Notes in Computer Science*, pages 196–223. Springer, Heidelberg, December 2019. link to eprint.iacr.org. - Anne Canteaut, Sébastien Duval, and Léo Perrin.
**A generalisation of Dillon's APN permutation with the best known differential and nonlinear properties for all fields of size \(2^{4k+2}\)**.*IEEE Transactions on Information Theory*, 63(11):7575–7591, Nov 2017. link to eprint.iacr.org. - Anne Canteaut, Léo Perrin and Shizhu Tian.
**If a Generalised Butterfly is APN then it Operates on 6 Bits**.*Cryptography and Communications*, pp. 1–18, April 2019. link to eprint.iacr.org. - Léo Perrin.
**Partitions in the S-Box of Streebog and Kuznyechik***IACR Transactions on Symmetric Cryptology*, 2019(1), 302–329, 2019. link to tosc.iacr.org. Presentation (by myself): link to youtube. - Léo Perrin and Xavier Bonnetain.
**Russian Style (Lack of) Randomness**.*Symposium sur la sécurité des technologies de l'information et des communications*, 2019. The article webpage contains links to the article itself (in English), the slides (in English), and the talk itself given by Xavier Bonnetain (in French). - Léo Perrin and Aleksei Udovenko.
**Exponential S-boxes: a link between the S-boxes of BelT and Kuznyechik/Streebog**.*IACR Transactions on Symmetric Cryptology*, 2016(2):99–124, 2017. link to tosc.iacr.org. - Léo Perrin, Aleksei Udovenko, and Alex Biryukov.
**Cryptanalysis of a theorem: Decomposing the only known solution to the big APN problem**. In Matthew Robshaw and Jonathan Katz, editors,*Advances in Cryptology – CRYPTO 2016, Part II*, volume 9815 of*Lecture Notes in Computer Science*, pages 93–122. Springer, Heidelberg, August 2016. link to eprint.iacr.org; Presentation (by Aleskei Udovenko): link to youtube.

## Footnotes:

^{1}

The discussion at ISO are held behind closed doors with little to no input from the outside. Such processes are not conducive to the best discussions…

I had been contacted by a country's representative before, when the standardization of Streebog was being considered. My results at the time did not allow me to provide irrefutable proofs that the designers were lying, so I did not insist on stopping its standardization.