Cosmiq Homepage | Léo Perrin's Homepage | S-box Reverse-Engineering

Tutorial on S-box Analysis

Table of Contents

Work in progress!

In order to be able to follow this tutorial, you need to have a recent version of SAGE installed (at least version 8.1); instructions are available here.

In this page, I list several of the functions provided by in the crypto.sage module that are relevant for the study of S-boxes, i.e. of functions mapping \(\{ 0,1 \}^m \to \{ 0,1 \}^n\) where \(m\) is a "small" integer. Such functions can be specified by their lookup tables (hence the requirement that \(m\) be small) and are very commonly used in symmetric cryptography to provide non-linearity. In this context, typical values of \(m\) and \(n\) are \(m=n=4\) or \(m=n=8\). It is not always the case though: SHA-3 uses \(m=n=5\) and the much older DES uses \(m=6, n=4\).

Of course, "S-boxes" do not have to be used in ciphers. In fact, the study of Boolean functions, and even more broadly of functions operating over finite fields, can all be interpreted as the study of S-boxes. The algorithmic tools presented below can thus be used in other contexts.

1 Studying S-boxes with SAGE

1.1 Use of SAGE

There are different ways of using SAGE. It can be used within a session, where you have a prompt and can run commands and small programs one after another. You can also simply use it like a programming language, i.e. write script and run them from the command line. The latter is the approach I use, so it is the one I will describe!

To create a SAGE script, simply create a file called script.sage with the following content:

print "hello world!"

and then run the following from the command line:

sage script.sage

It will then simply print hello world!, as you would expect.

SAGE is a superset of Python so you can use all the power of this programming language within SAGE. Two examples:

Printing integers.
To print all the integers between 1 and 10 in the same line, do
line = ""
for i in xrange(1, 10+1):
    line += str(i) + ", "
print line
Constant XOR.
To XOR the constant 0x130 into each integer in \(\{ 1,5,119,10 \}\) and print the results, do
for i in [1,5,119,10]:
    print i, "XOR 0x130 =", i ^^ 0x130

Note the use of two "^" to denote the XOR unlike, in regular Python. This is because a single one corresponds to exponentiation in SAGE.

1.2 Creating S-boxes

1.2.1 Basics

Sage provides a class dedicated to the analysis of S-boxes: crypto.sage.sbox.SBox. Its documentation is available here. To be able to use it, you need to import it. To do so, add the following line at the top of your script:

from crypto.sage.sbox import SBox

Objects of the class SBox contain the lookup table of said S-box along with many useful methods. Some of them are listed below. If you want to print the lookup table of an SBox instance, you can simply print it:

# if s is an SBox instance
print s

When interacting with S-boxes in practice, it is far more convenient to intepret them as functions mapping \(\mathbb{Z}/2^m\mathbb{Z}\) to \(\mathbb{Z}/2^n\mathbb{Z}\) as we can then use integers to represent their inputs and outputs. That is what SBox does by default.

If you want to access the content of an SBox instance, you can use brackets as if it where an array or parenthesis as if it where a function; both work just the same:

# if s is an SBox instance
print s[1] # prints the output of s on the input of \{0, ..., 0, 0, 1\}
print s(2) # prints the output of s on the input of \{0, ..., 0, 1, 0\}

To see the expected lengths of the input and output, use respectively s.m and s.n.

WARNING! Although an SBox instance behaves pretty much like a Python list with (many!) additional features, len(s) returns s.m, i.e. the bit length of the input, not the total length of the underlying list containing the lookup table of the S-box!

1.2.2 From their LUTs

I you know the lookup table of an S-box then it is trivial to initialize an instance. For example, to initialize an S-box containing the S-box of the PRESENT block cipher, simply use:

s = SBox([0xC,0x5,0x6,0xB,0x9,0x0,0xA,0xD,0x3,0xE,0xF,0x8,0x4,0x7,0x1,0x2])

1.2.3 Getting Real World Ones

During my PhD, I made a list of all the S-boxes that I was aware of, i.e. of all those that were used in ciphers and hash functions from the literature. Friedrich Wiemer then picked it up, added many S-boxes to it from the literature on Boolean functions, and got the result added into SAGE! It corresponds to crypto.sage.sboxes module. Thus, to obtain the S-box of PRESENT, we can also use

from crypto.sage.sboxes import PRESENT as s

which will import the SBox object called "PRESENT" and assign it to the variable s. There are many S-boxes in this list. To print it, use:

print dir(crypto.sage.sboxes)

1.2.4 Finite Field Arithmetic

When doing research in finite fields (of characteristic 2), or when studying S-boxes with a strong algebraic structure, it is convenient to use SAGE's support of finite field arithmetic. To interact with a finite field, we first need to declare a variable containing said finite field. If we want a degree of \(n\), we can use

F = GF(2**n)

SAGE will automatically choose a primitive polynomial to set it up. To access it, use F.modulus(). By default, for \(n=8\), the modulus is \(X^8 + X^4 + X^3 + X^2 + 1\) (it is the smallest primitive polynomial in lexicographic order with a weight of only 5, which is the minimum).

If you want to specify a particular modulus, adapt the following snippet:

X = GF(2).polynomial_ring().gen()
F = GF(2**8, name="a", modulus=X^8 + X^6 + X^5 + X^4 + 1)

the name will be used when printing field elements. It has to be specified manually when the modulus is imposed.

In order to generate elements of the finite field, two methods are possible. You can use powers of the primitive element F.gen() or you can use the trivial mapping integers of \(\mathbb{Z}/2^n\mathbb{Z}\) and \(\mathbb{F}_{2^n}\) defined as follows. Let \(\alpha\) be a root of the primitive polynomial used to define the finite field (it is the output of F.gen()). Then there is a bijection between the following integer, finite field element and bitstring: $$ \mathsf{bstr} = (x_0, ..., x_{n-1})~, ~ \mathsf{int} = \sum_{i=0}^{n-1}x_i \times 2^i~, ~ \mathsf{elm} = \sum_{i=0}^{n-1}x_i \times \alpha^i ~, $$ where \(\mathsf{bstr} \in \{ 0,1 \}^n\), \(\mathsf{int} \in \mathbb{Z}/2^n\mathbb{Z}\), and \(\mathsf{elm} \in \mathbb{F}_{2^n}\).

This mapping is already supported by SAGE, along with all the basic arithmetic and Boolean operations. To get the element from the finite field F that corresponds to the integer x, use F.fetch_int(x). To get the integer corresponding to an element y of F, use y.integer_representation(). We then for example have that F.gen()==F.fetch_int(2) is True.

Let us now create a function that returns and S-box corresponding to the multiplicative inverse in a finite field of degree \(n\). Said function is a power function with exponent \(2^n-2\).

def get_inverse_sbox(n):
    F = GF(2**n)
    lut = []
    for x in xrange(0, 2**n):
        x_ff = F.fetch_int(x) # x_ff is an element in the finite field
        y_ff = x_ff**(2^n-2)  # y_ff is the inverse of x_ff
        lut.append(y_ff.integer_representation()) # we add the integer corresponding to y_ff
    return SBox(lut)

1.3 Table-Based Properties

Several S-box properties that are relevant to cryptographers can be summarized using tables of size \(2^m \times 2^n\). They can of course be generated using SAGE.

1.3.1 Differential

The Difference Distribution Table (DDT) captures how good an S-box \(S : \{ 0,1 \}^n \to \{ 0,1 \}^m\) is at preventing differential cryptanalysis [1]. It is defined by $$ \textrm{DDT}_{S}[a, b] ~=~ \#\big\{x \in \{ 0,1 \}^n ~|~ S(x \oplus a) \oplus S(x) = b \big\} ~, $$ where \(\oplus\) denotes the exclusive OR (what is implemented by ^^ in SAGE). In SAGE, it is implemented by the difference_distribution_table method which returns a matrix. For example, running

from sage.crypto.sboxes import PRESENT as s
print s.difference_distribution_table()

yields

[16  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
[ 0  0  0  4  0  0  0  4  0  4  0  0  0  4  0  0]
[ 0  0  0  2  0  4  2  0  0  0  2  0  2  2  2  0]
[ 0  2  0  2  2  0  4  2  0  0  2  2  0  0  0  0]
[ 0  0  0  0  0  4  2  2  0  2  2  0  2  0  2  0]
[ 0  2  0  0  2  0  0  0  0  2  2  2  4  2  0  0]
[ 0  0  2  0  0  0  2  0  2  0  0  4  2  0  0  4]
[ 0  4  2  0  0  0  2  0  2  0  0  0  2  0  0  4]
[ 0  0  0  2  0  0  0  2  0  2  0  4  0  2  0  4]
[ 0  0  2  0  4  0  2  0  2  0  0  0  2  0  4  0]
[ 0  0  2  2  0  4  0  0  2  0  2  0  0  2  2  0]
[ 0  2  0  0  2  0  0  0  4  2  2  2  0  2  0  0]
[ 0  0  2  0  0  4  0  2  2  2  2  0  0  0  2  0]
[ 0  2  4  2  2  0  0  2  0  0  2  2  0  0  0  0]
[ 0  0  2  2  0  0  2  2  2  2  0  0  2  2  0  0]
[ 0  4  0  0  4  0  0  0  0  0  0  0  0  0  4  4]

The maximum coefficient for \(a \neq 0\) is the differential uniformity of \(S\). The lower it is, the better. The lowest possible is 2; a function reaching this bound is called Almost Perfect Non-linear (APN). The differential uniformity is returned using the differential_uniformity() method.

1.3.2 Linear

For linear attacks [3], the relevant table is the Linear Approximation Table (LAT). It is defined by: $$ \textrm{LAT}_{S}[a, b] ~=~ \sum_{x \in \{0,1\}^n} (-1)^{a \cdot x + b \cdot S(x)} ~, $$ where "\(\cdot\)" denotes the scalar product: if \(a = (a_0,...,a_{n-1})\) and \(b = (b_0,...,b_{n-1})\) then \(a \cdot b = \bigoplus_{i=0}^{n-1}a_ib_i\), so that \(a \cdot b \in \{ 0,1 \}\).

It quantifies how close each component of \(S\), i.e. each function \(x \mapsto c \cdot S(x)\), is to each Boolean linear function. Unlike the DDT, its coefficients are signed. The maximum absolute value of the LAT coefficients is called the linearity. Like the differential uniformity, the lower it is the better.

The method returning this table is linear_approximation_matrix(). It has some optional parameters so that, by default, it returns a table where the coefficients are half of those specified above. Running

from sage.crypto.sboxes import PRESENT as s
print s.linear_approximation_table()

yields

[ 8  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
[ 0  0  0  0  0 -4  0 -4  0  0  0  0  0 -4  0  4]
[ 0  0  2  2 -2 -2  0  0  2 -2  0  4  0  4 -2  2]
[ 0  0  2  2  2 -2 -4  0 -2  2 -4  0  0  0 -2 -2]
[ 0  0 -2  2 -2 -2  0  4 -2 -2  0 -4  0  0 -2  2]
[ 0  0 -2  2 -2  2  0  0  2  2 -4  0  4  0  2  2]
[ 0  0  0 -4  0  0 -4  0  0 -4  0  0  4  0  0  0]
[ 0  0  0  4  4  0  0  0  0 -4  0  0  0  0  4  0]
[ 0  0  2 -2  0  0 -2  2 -2  2  0  0 -2  2  4  4]
[ 0  4 -2 -2  0  0  2 -2 -2 -2 -4  0 -2  2  0  0]
[ 0  0  4  0  2  2  2 -2  0  0  0 -4  2  2 -2  2]
[ 0 -4  0  0 -2 -2  2 -2 -4  0  0  0  2  2  2 -2]
[ 0  0  0  0 -2 -2 -2 -2  4  0  0 -4 -2  2  2 -2]
[ 0  4  4  0 -2 -2  2  2  0  0  0  0  2 -2  2 -2]
[ 0  0  2  2 -4  4 -2 -2 -2 -2  0  0 -2 -2  0  0]
[ 0  4 -2  2  0  0 -2 -2 -2  2  4  0  2  2  0  0]

1.3.3 Jackson Pollock Representation

As the DDT and the LAT can be too big to be displayed conveniently, even for \(n=m=4\), I suggested the use of what I called the "Jackson Pollock representation" in [2]. It is simply a picture where each absolute value has a color associated to it, the table then consisting in \(2^n \times 2^m\) pixels. For example, a Pollock representation LAT of the Russian S-box π is given in Figure 1.

Stribog.png

Figure 1: The "Jackson Pollock representation" [2] of the LAT of π, the S-Box of the latest Russian standards. It was obtained using the "coolwarm" color scheme. Can you spot the vertical lines?

The color scheme plays a crucial as some patterns may be visible with some color schemes but not with others! A list of all color schemes is available on this page. A handy SAGE function handling the generation of such a picture is given below.

1.4 Algebraic

1.4.1 Component Boolean Function

The built-in method component_function returns a component of the corresponding S-box. More precisely, if s is an S-box instance with an \(m\)-bit input and if \(c < 2^m\) is an integer then s.component_function(c) is a sage.crypto.boolean_function instance corresponding to the function \(x \mapsto c \cdot S(x)\).

1.4.2 ANF

In order to get the algebraic nomal form of a sage.crypto.boolean_function instance, we can simply use the built-in algebraic_normal_form function. Thus, to get the ANF of an S-box, we can use the following function.

def anf(s):
    result = []
    for i in xrange(0, s.m):
        result.append(s.component_function(1 << i).algebraic_normal_form())
    return result

2 Useful Functions

The functions given in this section are in practice very useful when studying S-boxes. You can also use them as example of SAGE code.

2.1 Jackson Pollock Implementation

The following function can be used to create a file with name file_name in the current folder that will contain the Pollock representation of the matrix mat. Note that it is the absolute value of the coefficients in the matrix that is used, information about the sign will be lost.

def save_pollock(mat,
                 color_scheme="CMRmap_r",
                 file_name="pollock",
                 vmin=0,
                 vmax=20,
                 folder=None,
                 frame=True,
                 visible_axes=True,
                 colorbar=True,
                 file_type="png"):
    import matplotlib.pyplot as plt
    fig, p = plt.subplots(figsize=(15,15))
    if isinstance(mat, list):
        abs_mat = [[abs(mat[i][j]) for j in xrange(0, len(mat[0]))]
                   for i in xrange(0, len(mat))]
    else:
        abs_mat = [[abs(mat[i][j]) for j in xrange(0, mat.ncols())]
                   for i in xrange(0, mat.nrows())]
    axes = p.imshow(
        abs_mat,
        interpolation="none",
        cmap=plt.cm.get_cmap(color_scheme, 100),
        vmin=vmin,
        vmax=vmax,
    )
    if colorbar:
        fig.colorbar(axes, orientation='vertical', fraction=0.046, pad=0.04)
    p.set_aspect('equal')
    p.get_xaxis().set_visible(visible_axes)
    p.get_yaxis().set_visible(visible_axes)
    p.patch.set_alpha(0)
    p.set_frame_on(frame)
    if folder == None:
        name_base = "{}."+file_type
    else:
        name_base = folder + "/{}." + file_type
    fig.savefig(name_base.format(file_name))

2.2 Multiplication by a Matrix

If x is an integer (smaller than \(2^n\)) corresponding to a bitstring \(x\), and if M is a \(2^m \times 2^n\) binary Matrix instance, then you can use the following function apply_bin_mat to obtain y such that y is the integer representation of the bitstring \(y\) such that \(y= M \times x\).

def tobin(x, n):
    return [(x >> i) & 1 for i in reversed(xrange(0, n))]

def frombin(v):
    y = 0
    for i in xrange(0, len(v)):
        y = (y << 1) | int(v[i])
    return y

def apply_bin_mat(x, mat):
    n = mat.ncols()
    x = Matrix(GF(2), n, 1, tobin(x, n))
    y = mat * x
    return frombin(y.T[0][:mat.nrows()])

2.3 Obtaining a Full Rank Matrix

The following function get_generating_matrix returns a full rank binary matrix \(M\) mapping \(\{ 0,1 \}^n\) to itself such that \(M \times [0,0,..,0,1,0,..,0]^T = e_i\), where the only 1 is at position \(i\) and where the vectors \(e_i\) are linearly independent elements of \(\{ 0,1 \}^n\).

def complete_basis(basis, N):
    """Returns a list of length N spanning the space 0..2^N-1 which
    contains the list of integers `basis`. Assumes that the elements
    of `basis` are linearly independent.

    """
    r = len(basis)
    for i in xrange(1, 2**N):
        new_basis = basis + [i]
        new_r = Matrix(GF(2), len(new_basis), N, [tobin(x, N) for x in new_basis]).rank()
        if new_r == N:
            return new_basis
        elif new_r > r:
            basis = new_basis
            r = new_r
    return []


def get_generating_matrix(basis, N):
    """Returns an NxN binary matrix M such that M*(1 << i) = basis[i] for
    all i < len(basis) and such that M has full rank.

    """
    b = complete_basis(basis, N)
    print b
    return Matrix(GF(2), N, N, [
        [(b[i] >> j) & 1 for j in reversed(xrange(0, N))]
        for i in reversed(xrange(0, N))
    ]).transpose()

3 Useful Quantities

The indices of the low contrast columns in the LAT of π (and a basis for the corresponding vector space) are

indices = [0x00,0x1a,0x20,0x3a,0x44,0x5e,0x64,0x7e,0x8a,0x90,0xaa,0xb0,0xce,0xd4,0xee,0xf4]
basis   = [0x1a,0x20,0x44,0x8a]

4 References

  1. Eli Biham and Adi Shamir. Differential cryptanalysis of DES-like cryptosystems. Journal of Cryptology, Volume 4, issue 1, pp 3–72, 1991. link to springer.com
  2. Alex Biryukov and Léo Perrin. On reverse-engineering S-boxes with hidden design criteria or structure. In Rosario Gennaro and Matthew J. B. Robshaw, editors, Advances in Cryptology – CRYPTO 2015, Part I, volume 9215 of Lecture Notes in Computer Science, pages 116–140. Springer, Heidelberg, August 2015. link to eprint.iacr.org.
  3. Mitsuru Matsui Linear Cryptanalysis Method for DES Cipher. Advances in Cryptology — EUROCRYPT'93, pp 386–397, 1993. link to springer.com

Last Update (by me): 26/02/2020