# Tutorial on S-box Analysis

## Table of Contents

**Work in progress!**

In order to be able to follow this tutorial, you need to have a recent version of SAGE installed (at least version 8.1); instructions are available here.

In this page, I list several of the functions provided by in the
`crypto.sage`

module that are relevant for the study of S-boxes,
i.e. of functions mapping \(\{ 0,1 \}^m \to \{ 0,1 \}^n\) where \(m\) is a
"small" integer. Such functions can be specified by their lookup
tables (hence the requirement that \(m\) be small) and are very commonly
used in symmetric cryptography to provide non-linearity. In this
context, typical values of \(m\) and \(n\) are \(m=n=4\) or \(m=n=8\). It is
not always the case though: SHA-3 uses \(m=n=5\) and the much older DES
uses \(m=6, n=4\).

Of course, "S-boxes" do not have to be used in ciphers. In fact, the study of Boolean functions, and even more broadly of functions operating over finite fields, can all be interpreted as the study of S-boxes. The algorithmic tools presented below can thus be used in other contexts.

## 1 Studying S-boxes with SAGE

### 1.1 Use of SAGE

There are different ways of using SAGE. It can be used within a session, where you have a prompt and can run commands and small programs one after another. You can also simply use it like a programming language, i.e. write script and run them from the command line. The latter is the approach I use, so it is the one I will describe!

To create a SAGE script, simply create a file called `script.sage`

with the following content:

print "hello world!"

and then run the following from the command line:

sage script.sage

It will then simply print `hello world!`

, as you would expect.

SAGE is a superset of Python so you can use all the power of this programming language within SAGE. Two examples:

- Printing integers.
- To print all the integers between 1 and 10 in the same line, do
line = "" for i in xrange(1, 10+1): line += str(i) + ", " print line

- Constant XOR.
- To XOR the constant
`0x130`

into each integer in \(\{ 1,5,119,10 \}\) and print the results, dofor i in [1,5,119,10]: print i, "XOR 0x130 =", i ^^ 0x130

Note the use of

**two**"^" to denote the XOR unlike, in regular Python. This is because a single one corresponds to exponentiation in SAGE.

### 1.2 Creating S-boxes

#### 1.2.1 Basics

Sage provides a class dedicated to the analysis of S-boxes:
`crypto.sage.sbox.SBox`

. Its documentation is available here. To be
able to use it, you need to *import* it. To do so, add the following
line at the top of your script:

from crypto.sage.sbox import SBox

Objects of the class `SBox`

contain the lookup table of said S-box
along with many useful methods. Some of them are listed below. If you
want to print the lookup table of an `SBox`

instance, you can simply
print it:

# if s is an SBox instance print s

When interacting with S-boxes in practice, it is far more convenient
to intepret them as functions mapping \(\mathbb{Z}/2^m\mathbb{Z}\) to
\(\mathbb{Z}/2^n\mathbb{Z}\) as we can then use integers to represent
their inputs and outputs. That is what `SBox`

does by default.

If you want to access the content of an `SBox`

instance, you can use
brackets as if it where an array or parenthesis as if it where a
function; both work just the same:

# if s is an SBox instance print s[1] # prints the output of s on the input of \{0, ..., 0, 0, 1\} print s(2) # prints the output of s on the input of \{0, ..., 0, 1, 0\}

To see the expected lengths of the input and output, use respectively
`s.m`

and `s.n`

.

**WARNING!** Although an `SBox`

instance behaves pretty much like a
Python `list`

with (many!) additional features, `len(s)`

returns
`s.m`

, i.e. the **bit** length of the input, not the total length of the
underlying list containing the lookup table of the S-box!

#### 1.2.2 From their LUTs

I you know the lookup table of an S-box then it is trivial to initialize an instance. For example, to initialize an S-box containing the S-box of the PRESENT block cipher, simply use:

```
s = SBox([0xC,0x5,0x6,0xB,0x9,0x0,0xA,0xD,0x3,0xE,0xF,0x8,0x4,0x7,0x1,0x2])
```

#### 1.2.3 Getting Real World Ones

During my PhD, I made a list of all the S-boxes that I was aware of,
i.e. of all those that were used in ciphers and hash functions from
the literature. Friedrich Wiemer then picked it up, added many S-boxes
to it from the literature on Boolean functions, and got the result
added into SAGE! It corresponds to `crypto.sage.sboxes`

module. Thus,
to obtain the S-box of PRESENT, we can also use

from crypto.sage.sboxes import PRESENT as s

which will import the `SBox`

object called "PRESENT" and assign it to
the variable `s`

. There are **many** S-boxes in this list. To print it,
use:

print dir(crypto.sage.sboxes)

#### 1.2.4 Finite Field Arithmetic

When doing research in finite fields (of characteristic 2), or when studying S-boxes with a strong algebraic structure, it is convenient to use SAGE's support of finite field arithmetic. To interact with a finite field, we first need to declare a variable containing said finite field. If we want a degree of \(n\), we can use

```
F = GF(2**n)
```

SAGE will automatically choose a primitive polynomial to set it up. To
access it, use `F.modulus()`

. By default, for \(n=8\), the modulus is
\(X^8 + X^4 + X^3 + X^2 + 1\) (it is the smallest primitive polynomial
in lexicographic order with a weight of only 5, which is the minimum).

If you want to specify a particular modulus, adapt the following snippet:

X = GF(2).polynomial_ring().gen() F = GF(2**8, name="a", modulus=X^8 + X^6 + X^5 + X^4 + 1)

the `name`

will be used when printing field elements. It has to be
specified manually when the modulus is imposed.

In order to generate elements of the finite field, two methods are
possible. You can use powers of the primitive element `F.gen()`

or you can use the trivial mapping integers of
\(\mathbb{Z}/2^n\mathbb{Z}\) and \(\mathbb{F}_{2^n}\) defined as
follows. Let \(\alpha\) be a root of the primitive polynomial used to
define the finite field (it is the output of `F.gen()`

). Then there is
a bijection between the following integer, finite field element and
bitstring:
$$
\mathsf{bstr} = (x_0, ..., x_{n-1})~, ~
\mathsf{int} = \sum_{i=0}^{n-1}x_i \times 2^i~, ~
\mathsf{elm} = \sum_{i=0}^{n-1}x_i \times \alpha^i ~,
$$
where \(\mathsf{bstr} \in \{ 0,1 \}^n\), \(\mathsf{int} \in
\mathbb{Z}/2^n\mathbb{Z}\), and \(\mathsf{elm} \in \mathbb{F}_{2^n}\).

This mapping is already supported by SAGE, along with all the basic
arithmetic and Boolean operations. To get the element from the finite
field `F`

that corresponds to the integer `x`

, use
`F.fetch_int(x)`

. To get the integer corresponding to an element `y`

of `F`

, use `y.integer_representation()`

. We then for example have that
`F.gen()==F.fetch_int(2)`

is `True`

.

Let us now create a function that returns and S-box corresponding to the multiplicative inverse in a finite field of degree \(n\). Said function is a power function with exponent \(2^n-2\).

def get_inverse_sbox(n): F = GF(2**n) lut = [] for x in xrange(0, 2**n): x_ff = F.fetch_int(x) # x_ff is an element in the finite field y_ff = x_ff**(2^n-2) # y_ff is the inverse of x_ff lut.append(y_ff.integer_representation()) # we add the integer corresponding to y_ff return SBox(lut)

### 1.3 Table-Based Properties

Several S-box properties that are relevant to cryptographers can be summarized using tables of size \(2^m \times 2^n\). They can of course be generated using SAGE.

#### 1.3.1 Differential

The **Difference Distribution Table (DDT)** captures how good an S-box
\(S : \{ 0,1 \}^n \to \{ 0,1 \}^m\) is at preventing differential
cryptanalysis [1]. It is defined by
$$
\textrm{DDT}_{S}[a, b] ~=~ \#\big\{x \in \{ 0,1 \}^n ~|~ S(x \oplus a) \oplus S(x) = b \big\} ~,
$$
where \(\oplus\) denotes the exclusive OR (what is implemented by `^^`

in SAGE). In SAGE, it is implemented by the
`difference_distribution_table`

method which returns a matrix. For
example, running

from sage.crypto.sboxes import PRESENT as s print s.difference_distribution_table()

yields

[16 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0] [ 0 0 0 4 0 0 0 4 0 4 0 0 0 4 0 0] [ 0 0 0 2 0 4 2 0 0 0 2 0 2 2 2 0] [ 0 2 0 2 2 0 4 2 0 0 2 2 0 0 0 0] [ 0 0 0 0 0 4 2 2 0 2 2 0 2 0 2 0] [ 0 2 0 0 2 0 0 0 0 2 2 2 4 2 0 0] [ 0 0 2 0 0 0 2 0 2 0 0 4 2 0 0 4] [ 0 4 2 0 0 0 2 0 2 0 0 0 2 0 0 4] [ 0 0 0 2 0 0 0 2 0 2 0 4 0 2 0 4] [ 0 0 2 0 4 0 2 0 2 0 0 0 2 0 4 0] [ 0 0 2 2 0 4 0 0 2 0 2 0 0 2 2 0] [ 0 2 0 0 2 0 0 0 4 2 2 2 0 2 0 0] [ 0 0 2 0 0 4 0 2 2 2 2 0 0 0 2 0] [ 0 2 4 2 2 0 0 2 0 0 2 2 0 0 0 0] [ 0 0 2 2 0 0 2 2 2 2 0 0 2 2 0 0] [ 0 4 0 0 4 0 0 0 0 0 0 0 0 0 4 4]

The maximum coefficient for \(a \neq 0\) is the **differential
uniformity** of \(S\). The lower it is, the better. The lowest possible
is 2; a function reaching this bound is called **Almost Perfect
Non-linear (APN)**. The differential uniformity is returned using the
`differential_uniformity()`

method.

#### 1.3.2 Linear

For linear attacks [3], the relevant table is the **Linear
Approximation Table (LAT)**. It is defined by:
$$
\textrm{LAT}_{S}[a, b] ~=~ \sum_{x \in \{0,1\}^n} (-1)^{a \cdot x + b \cdot S(x)} ~,
$$
where "\(\cdot\)" denotes the scalar product: if \(a = (a_0,...,a_{n-1})\)
and \(b = (b_0,...,b_{n-1})\) then \(a \cdot b =
\bigoplus_{i=0}^{n-1}a_ib_i\), so that \(a \cdot b \in \{ 0,1 \}\).

It quantifies how close each **component** of \(S\), i.e. each function \(x
\mapsto c \cdot S(x)\), is to each Boolean linear function. Unlike the
DDT, its coefficients are signed. The maximum absolute value of the
LAT coefficients is called the **linearity**. Like the differential
uniformity, the lower it is the better.

The method returning this table is
`linear_approximation_matrix()`

. It has some optional parameters so
that, by default, it returns a table where the coefficients are half
of those specified above. Running

from sage.crypto.sboxes import PRESENT as s print s.linear_approximation_table()

yields

[ 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0] [ 0 0 0 0 0 -4 0 -4 0 0 0 0 0 -4 0 4] [ 0 0 2 2 -2 -2 0 0 2 -2 0 4 0 4 -2 2] [ 0 0 2 2 2 -2 -4 0 -2 2 -4 0 0 0 -2 -2] [ 0 0 -2 2 -2 -2 0 4 -2 -2 0 -4 0 0 -2 2] [ 0 0 -2 2 -2 2 0 0 2 2 -4 0 4 0 2 2] [ 0 0 0 -4 0 0 -4 0 0 -4 0 0 4 0 0 0] [ 0 0 0 4 4 0 0 0 0 -4 0 0 0 0 4 0] [ 0 0 2 -2 0 0 -2 2 -2 2 0 0 -2 2 4 4] [ 0 4 -2 -2 0 0 2 -2 -2 -2 -4 0 -2 2 0 0] [ 0 0 4 0 2 2 2 -2 0 0 0 -4 2 2 -2 2] [ 0 -4 0 0 -2 -2 2 -2 -4 0 0 0 2 2 2 -2] [ 0 0 0 0 -2 -2 -2 -2 4 0 0 -4 -2 2 2 -2] [ 0 4 4 0 -2 -2 2 2 0 0 0 0 2 -2 2 -2] [ 0 0 2 2 -4 4 -2 -2 -2 -2 0 0 -2 -2 0 0] [ 0 4 -2 2 0 0 -2 -2 -2 2 4 0 2 2 0 0]

#### 1.3.3 Jackson Pollock Representation

As the DDT and the LAT can be too big to be displayed conveniently, even for \(n=m=4\), I suggested the use of what I called the "Jackson Pollock representation" in [2]. It is simply a picture where each absolute value has a color associated to it, the table then consisting in \(2^n \times 2^m\) pixels. For example, a Pollock representation LAT of the Russian S-box π is given in Figure 1.

Figure 1: The "Jackson Pollock representation" [2] of the LAT of π, the S-Box of the latest Russian standards. It was obtained using the "coolwarm" color scheme. Can you spot the vertical lines?

The color scheme plays a crucial as some patterns may be visible with some color schemes but not with others! A list of all color schemes is available on this page. A handy SAGE function handling the generation of such a picture is given below.

### 1.4 Algebraic

#### 1.4.1 Component Boolean Function

The built-in method `component_function`

returns a component of the
corresponding S-box. More precisely, if `s`

is an S-box instance
with an \(m\)-bit input and if \(c < 2^m\) is an integer then
`s.component_function(c)`

is a `sage.crypto.boolean_function`

instance
corresponding to the function \(x \mapsto c \cdot S(x)\).

#### 1.4.2 ANF

In order to get the algebraic nomal form of a
`sage.crypto.boolean_function`

instance, we can simply use the
built-in `algebraic_normal_form`

function. Thus, to get the ANF of an
S-box, we can use the following function.

def anf(s): result = [] for i in xrange(0, s.m): result.append(s.component_function(1 << i).algebraic_normal_form()) return result

## 2 Useful Functions

The functions given in this section are in practice very useful when studying S-boxes. You can also use them as example of SAGE code.

### 2.1 Jackson Pollock Implementation

The following function can be used to create a file with name
`file_name`

in the current folder that will contain the Pollock
representation of the matrix `mat`

. Note that it is the absolute value
of the coefficients in the matrix that is used, information about the
sign will be lost.

def save_pollock(mat, color_scheme="CMRmap_r", file_name="pollock", vmin=0, vmax=20, folder=None, frame=True, visible_axes=True, colorbar=True, file_type="png"): import matplotlib.pyplot as plt fig, p = plt.subplots(figsize=(15,15)) if isinstance(mat, list): abs_mat = [[abs(mat[i][j]) for j in xrange(0, len(mat[0]))] for i in xrange(0, len(mat))] else: abs_mat = [[abs(mat[i][j]) for j in xrange(0, mat.ncols())] for i in xrange(0, mat.nrows())] axes = p.imshow( abs_mat, interpolation="none", cmap=plt.cm.get_cmap(color_scheme, 100), vmin=vmin, vmax=vmax, ) if colorbar: fig.colorbar(axes, orientation='vertical', fraction=0.046, pad=0.04) p.set_aspect('equal') p.get_xaxis().set_visible(visible_axes) p.get_yaxis().set_visible(visible_axes) p.patch.set_alpha(0) p.set_frame_on(frame) if folder == None: name_base = "{}."+file_type else: name_base = folder + "/{}." + file_type fig.savefig(name_base.format(file_name))

### 2.2 Multiplication by a Matrix

If `x`

is an integer (smaller than \(2^n\)) corresponding to a bitstring
\(x\), and if `M`

is a \(2^m \times 2^n\) binary `Matrix`

instance, then
you can use the following function `apply_bin_mat`

to obtain `y`

such
that `y`

is the integer representation of the bitstring \(y\) such that
\(y= M \times x\).

def tobin(x, n): return [(x >> i) & 1 for i in reversed(xrange(0, n))] def frombin(v): y = 0 for i in xrange(0, len(v)): y = (y << 1) | int(v[i]) return y def apply_bin_mat(x, mat): n = mat.ncols() x = Matrix(GF(2), n, 1, tobin(x, n)) y = mat * x return frombin(y.T[0][:mat.nrows()])

### 2.3 Obtaining a Full Rank Matrix

The following function `get_generating_matrix`

returns a full rank
binary matrix \(M\) mapping \(\{ 0,1 \}^n\) to itself such that \(M \times
[0,0,..,0,1,0,..,0]^T = e_i\), where the only 1 is at position \(i\) and
where the vectors \(e_i\) are linearly independent elements of \(\{ 0,1 \}^n\).

def complete_basis(basis, N): """Returns a list of length N spanning the space 0..2^N-1 which contains the list of integers `basis`. Assumes that the elements of `basis` are linearly independent. """ r = len(basis) for i in xrange(1, 2**N): new_basis = basis + [i] new_r = Matrix(GF(2), len(new_basis), N, [tobin(x, N) for x in new_basis]).rank() if new_r == N: return new_basis elif new_r > r: basis = new_basis r = new_r return [] def get_generating_matrix(basis, N): """Returns an NxN binary matrix M such that M*(1 << i) = basis[i] for all i < len(basis) and such that M has full rank. """ b = complete_basis(basis, N) print b return Matrix(GF(2), N, N, [ [(b[i] >> j) & 1 for j in reversed(xrange(0, N))] for i in reversed(xrange(0, N)) ]).transpose()

## 3 Useful Quantities

The indices of the low contrast columns in the LAT of π (and a basis for the corresponding vector space) are

indices = [0x00,0x1a,0x20,0x3a,0x44,0x5e,0x64,0x7e,0x8a,0x90,0xaa,0xb0,0xce,0xd4,0xee,0xf4] basis = [0x1a,0x20,0x44,0x8a]

## 4 References

- Eli Biham and Adi Shamir.
**Differential cryptanalysis of DES-like cryptosystems.***Journal of Cryptology, Volume 4, issue 1*, pp 3–72, 1991. link to springer.com - Alex Biryukov and Léo Perrin.
**On reverse-engineering S-boxes with hidden design criteria or structure**. In Rosario Gennaro and Matthew J. B. Robshaw, editors,*Advances in Cryptology – CRYPTO 2015, Part I*, volume 9215 of*Lecture Notes in Computer Science*, pages 116–140. Springer, Heidelberg, August 2015. link to eprint.iacr.org. - Mitsuru Matsui
**Linear Cryptanalysis Method for DES Cipher.***Advances in Cryptology — EUROCRYPT'93*, pp 386–397, 1993. link to springer.com