Tutorial on S-box Analysis
Table of Contents
Work in progress!
In order to be able to follow this tutorial, you need to have a recent version of SAGE installed (at least version 8.1); instructions are available here.
In this page, I list several of the functions provided by in the
crypto.sage
module that are relevant for the study of S-boxes,
i.e. of functions mapping \(\{ 0,1 \}^m \to \{ 0,1 \}^n\) where \(m\) is a
"small" integer. Such functions can be specified by their lookup
tables (hence the requirement that \(m\) be small) and are very commonly
used in symmetric cryptography to provide non-linearity. In this
context, typical values of \(m\) and \(n\) are \(m=n=4\) or \(m=n=8\). It is
not always the case though: SHA-3 uses \(m=n=5\) and the much older DES
uses \(m=6, n=4\).
Of course, "S-boxes" do not have to be used in ciphers. In fact, the study of Boolean functions, and even more broadly of functions operating over finite fields, can all be interpreted as the study of S-boxes. The algorithmic tools presented below can thus be used in other contexts.
1 Studying S-boxes with SAGE
1.1 Use of SAGE
There are different ways of using SAGE. It can be used within a session, where you have a prompt and can run commands and small programs one after another. You can also simply use it like a programming language, i.e. write script and run them from the command line. The latter is the approach I use, so it is the one I will describe!
To create a SAGE script, simply create a file called script.sage
with the following content:
print "hello world!"
and then run the following from the command line:
sage script.sage
It will then simply print hello world!
, as you would expect.
SAGE is a superset of Python so you can use all the power of this programming language within SAGE. Two examples:
- Printing integers.
- To print all the integers between 1 and 10 in the same line, do
line = "" for i in xrange(1, 10+1): line += str(i) + ", " print line
- Constant XOR.
- To XOR the constant
0x130
into each integer in \(\{ 1,5,119,10 \}\) and print the results, dofor i in [1,5,119,10]: print i, "XOR 0x130 =", i ^^ 0x130
Note the use of two "^" to denote the XOR unlike, in regular Python. This is because a single one corresponds to exponentiation in SAGE.
1.2 Creating S-boxes
1.2.1 Basics
Sage provides a class dedicated to the analysis of S-boxes:
crypto.sage.sbox.SBox
. Its documentation is available here. To be
able to use it, you need to import it. To do so, add the following
line at the top of your script:
from crypto.sage.sbox import SBox
Objects of the class SBox
contain the lookup table of said S-box
along with many useful methods. Some of them are listed below. If you
want to print the lookup table of an SBox
instance, you can simply
print it:
# if s is an SBox instance print s
When interacting with S-boxes in practice, it is far more convenient
to intepret them as functions mapping \(\mathbb{Z}/2^m\mathbb{Z}\) to
\(\mathbb{Z}/2^n\mathbb{Z}\) as we can then use integers to represent
their inputs and outputs. That is what SBox
does by default.
If you want to access the content of an SBox
instance, you can use
brackets as if it where an array or parenthesis as if it where a
function; both work just the same:
# if s is an SBox instance print s[1] # prints the output of s on the input of \{0, ..., 0, 0, 1\} print s(2) # prints the output of s on the input of \{0, ..., 0, 1, 0\}
To see the expected lengths of the input and output, use respectively
s.m
and s.n
.
WARNING! Although an SBox
instance behaves pretty much like a
Python list
with (many!) additional features, len(s)
returns
s.m
, i.e. the bit length of the input, not the total length of the
underlying list containing the lookup table of the S-box!
1.2.2 From their LUTs
I you know the lookup table of an S-box then it is trivial to initialize an instance. For example, to initialize an S-box containing the S-box of the PRESENT block cipher, simply use:
s = SBox([0xC,0x5,0x6,0xB,0x9,0x0,0xA,0xD,0x3,0xE,0xF,0x8,0x4,0x7,0x1,0x2])
1.2.3 Getting Real World Ones
During my PhD, I made a list of all the S-boxes that I was aware of,
i.e. of all those that were used in ciphers and hash functions from
the literature. Friedrich Wiemer then picked it up, added many S-boxes
to it from the literature on Boolean functions, and got the result
added into SAGE! It corresponds to crypto.sage.sboxes
module. Thus,
to obtain the S-box of PRESENT, we can also use
from crypto.sage.sboxes import PRESENT as s
which will import the SBox
object called "PRESENT" and assign it to
the variable s
. There are many S-boxes in this list. To print it,
use:
print dir(crypto.sage.sboxes)
1.2.4 Finite Field Arithmetic
When doing research in finite fields (of characteristic 2), or when studying S-boxes with a strong algebraic structure, it is convenient to use SAGE's support of finite field arithmetic. To interact with a finite field, we first need to declare a variable containing said finite field. If we want a degree of \(n\), we can use
F = GF(2**n)
SAGE will automatically choose a primitive polynomial to set it up. To
access it, use F.modulus()
. By default, for \(n=8\), the modulus is
\(X^8 + X^4 + X^3 + X^2 + 1\) (it is the smallest primitive polynomial
in lexicographic order with a weight of only 5, which is the minimum).
If you want to specify a particular modulus, adapt the following snippet:
X = GF(2).polynomial_ring().gen() F = GF(2**8, name="a", modulus=X^8 + X^6 + X^5 + X^4 + 1)
the name
will be used when printing field elements. It has to be
specified manually when the modulus is imposed.
In order to generate elements of the finite field, two methods are
possible. You can use powers of the primitive element F.gen()
or you can use the trivial mapping integers of
\(\mathbb{Z}/2^n\mathbb{Z}\) and \(\mathbb{F}_{2^n}\) defined as
follows. Let \(\alpha\) be a root of the primitive polynomial used to
define the finite field (it is the output of F.gen()
). Then there is
a bijection between the following integer, finite field element and
bitstring:
$$
\mathsf{bstr} = (x_0, ..., x_{n-1})~, ~
\mathsf{int} = \sum_{i=0}^{n-1}x_i \times 2^i~, ~
\mathsf{elm} = \sum_{i=0}^{n-1}x_i \times \alpha^i ~,
$$
where \(\mathsf{bstr} \in \{ 0,1 \}^n\), \(\mathsf{int} \in
\mathbb{Z}/2^n\mathbb{Z}\), and \(\mathsf{elm} \in \mathbb{F}_{2^n}\).
This mapping is already supported by SAGE, along with all the basic
arithmetic and Boolean operations. To get the element from the finite
field F
that corresponds to the integer x
, use
F.fetch_int(x)
. To get the integer corresponding to an element y
of F
, use y.integer_representation()
. We then for example have that
F.gen()==F.fetch_int(2)
is True
.
Let us now create a function that returns and S-box corresponding to the multiplicative inverse in a finite field of degree \(n\). Said function is a power function with exponent \(2^n-2\).
def get_inverse_sbox(n): F = GF(2**n) lut = [] for x in xrange(0, 2**n): x_ff = F.fetch_int(x) # x_ff is an element in the finite field y_ff = x_ff**(2^n-2) # y_ff is the inverse of x_ff lut.append(y_ff.integer_representation()) # we add the integer corresponding to y_ff return SBox(lut)
1.3 Table-Based Properties
Several S-box properties that are relevant to cryptographers can be summarized using tables of size \(2^m \times 2^n\). They can of course be generated using SAGE.
1.3.1 Differential
The Difference Distribution Table (DDT) captures how good an S-box
\(S : \{ 0,1 \}^n \to \{ 0,1 \}^m\) is at preventing differential
cryptanalysis [1]. It is defined by
$$
\textrm{DDT}_{S}[a, b] ~=~ \#\big\{x \in \{ 0,1 \}^n ~|~ S(x \oplus a) \oplus S(x) = b \big\} ~,
$$
where \(\oplus\) denotes the exclusive OR (what is implemented by ^^
in SAGE). In SAGE, it is implemented by the
difference_distribution_table
method which returns a matrix. For
example, running
from sage.crypto.sboxes import PRESENT as s print s.difference_distribution_table()
yields
[16 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0] [ 0 0 0 4 0 0 0 4 0 4 0 0 0 4 0 0] [ 0 0 0 2 0 4 2 0 0 0 2 0 2 2 2 0] [ 0 2 0 2 2 0 4 2 0 0 2 2 0 0 0 0] [ 0 0 0 0 0 4 2 2 0 2 2 0 2 0 2 0] [ 0 2 0 0 2 0 0 0 0 2 2 2 4 2 0 0] [ 0 0 2 0 0 0 2 0 2 0 0 4 2 0 0 4] [ 0 4 2 0 0 0 2 0 2 0 0 0 2 0 0 4] [ 0 0 0 2 0 0 0 2 0 2 0 4 0 2 0 4] [ 0 0 2 0 4 0 2 0 2 0 0 0 2 0 4 0] [ 0 0 2 2 0 4 0 0 2 0 2 0 0 2 2 0] [ 0 2 0 0 2 0 0 0 4 2 2 2 0 2 0 0] [ 0 0 2 0 0 4 0 2 2 2 2 0 0 0 2 0] [ 0 2 4 2 2 0 0 2 0 0 2 2 0 0 0 0] [ 0 0 2 2 0 0 2 2 2 2 0 0 2 2 0 0] [ 0 4 0 0 4 0 0 0 0 0 0 0 0 0 4 4]
The maximum coefficient for \(a \neq 0\) is the differential
uniformity of \(S\). The lower it is, the better. The lowest possible
is 2; a function reaching this bound is called Almost Perfect
Non-linear (APN). The differential uniformity is returned using the
differential_uniformity()
method.
1.3.2 Linear
For linear attacks [3], the relevant table is the Linear Approximation Table (LAT). It is defined by: $$ \textrm{LAT}_{S}[a, b] ~=~ \sum_{x \in \{0,1\}^n} (-1)^{a \cdot x + b \cdot S(x)} ~, $$ where "\(\cdot\)" denotes the scalar product: if \(a = (a_0,...,a_{n-1})\) and \(b = (b_0,...,b_{n-1})\) then \(a \cdot b = \bigoplus_{i=0}^{n-1}a_ib_i\), so that \(a \cdot b \in \{ 0,1 \}\).
It quantifies how close each component of \(S\), i.e. each function \(x \mapsto c \cdot S(x)\), is to each Boolean linear function. Unlike the DDT, its coefficients are signed. The maximum absolute value of the LAT coefficients is called the linearity. Like the differential uniformity, the lower it is the better.
The method returning this table is
linear_approximation_matrix()
. It has some optional parameters so
that, by default, it returns a table where the coefficients are half
of those specified above. Running
from sage.crypto.sboxes import PRESENT as s print s.linear_approximation_table()
yields
[ 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0] [ 0 0 0 0 0 -4 0 -4 0 0 0 0 0 -4 0 4] [ 0 0 2 2 -2 -2 0 0 2 -2 0 4 0 4 -2 2] [ 0 0 2 2 2 -2 -4 0 -2 2 -4 0 0 0 -2 -2] [ 0 0 -2 2 -2 -2 0 4 -2 -2 0 -4 0 0 -2 2] [ 0 0 -2 2 -2 2 0 0 2 2 -4 0 4 0 2 2] [ 0 0 0 -4 0 0 -4 0 0 -4 0 0 4 0 0 0] [ 0 0 0 4 4 0 0 0 0 -4 0 0 0 0 4 0] [ 0 0 2 -2 0 0 -2 2 -2 2 0 0 -2 2 4 4] [ 0 4 -2 -2 0 0 2 -2 -2 -2 -4 0 -2 2 0 0] [ 0 0 4 0 2 2 2 -2 0 0 0 -4 2 2 -2 2] [ 0 -4 0 0 -2 -2 2 -2 -4 0 0 0 2 2 2 -2] [ 0 0 0 0 -2 -2 -2 -2 4 0 0 -4 -2 2 2 -2] [ 0 4 4 0 -2 -2 2 2 0 0 0 0 2 -2 2 -2] [ 0 0 2 2 -4 4 -2 -2 -2 -2 0 0 -2 -2 0 0] [ 0 4 -2 2 0 0 -2 -2 -2 2 4 0 2 2 0 0]
1.3.3 Jackson Pollock Representation
As the DDT and the LAT can be too big to be displayed conveniently, even for \(n=m=4\), I suggested the use of what I called the "Jackson Pollock representation" in [2]. It is simply a picture where each absolute value has a color associated to it, the table then consisting in \(2^n \times 2^m\) pixels. For example, a Pollock representation LAT of the Russian S-box π is given in Figure 1.
Figure 1: The "Jackson Pollock representation" [2] of the LAT of π, the S-Box of the latest Russian standards. It was obtained using the "coolwarm" color scheme. Can you spot the vertical lines?
The color scheme plays a crucial as some patterns may be visible with some color schemes but not with others! A list of all color schemes is available on this page. A handy SAGE function handling the generation of such a picture is given below.
1.4 Algebraic
1.4.1 Component Boolean Function
The built-in method component_function
returns a component of the
corresponding S-box. More precisely, if s
is an S-box instance
with an \(m\)-bit input and if \(c < 2^m\) is an integer then
s.component_function(c)
is a sage.crypto.boolean_function
instance
corresponding to the function \(x \mapsto c \cdot S(x)\).
1.4.2 ANF
In order to get the algebraic nomal form of a
sage.crypto.boolean_function
instance, we can simply use the
built-in algebraic_normal_form
function. Thus, to get the ANF of an
S-box, we can use the following function.
def anf(s): result = [] for i in xrange(0, s.m): result.append(s.component_function(1 << i).algebraic_normal_form()) return result
2 Useful Functions
The functions given in this section are in practice very useful when studying S-boxes. You can also use them as example of SAGE code.
2.1 Jackson Pollock Implementation
The following function can be used to create a file with name
file_name
in the current folder that will contain the Pollock
representation of the matrix mat
. Note that it is the absolute value
of the coefficients in the matrix that is used, information about the
sign will be lost.
def save_pollock(mat, color_scheme="CMRmap_r", file_name="pollock", vmin=0, vmax=20, folder=None, frame=True, visible_axes=True, colorbar=True, file_type="png"): import matplotlib.pyplot as plt fig, p = plt.subplots(figsize=(15,15)) if isinstance(mat, list): abs_mat = [[abs(mat[i][j]) for j in xrange(0, len(mat[0]))] for i in xrange(0, len(mat))] else: abs_mat = [[abs(mat[i][j]) for j in xrange(0, mat.ncols())] for i in xrange(0, mat.nrows())] axes = p.imshow( abs_mat, interpolation="none", cmap=plt.cm.get_cmap(color_scheme, 100), vmin=vmin, vmax=vmax, ) if colorbar: fig.colorbar(axes, orientation='vertical', fraction=0.046, pad=0.04) p.set_aspect('equal') p.get_xaxis().set_visible(visible_axes) p.get_yaxis().set_visible(visible_axes) p.patch.set_alpha(0) p.set_frame_on(frame) if folder == None: name_base = "{}."+file_type else: name_base = folder + "/{}." + file_type fig.savefig(name_base.format(file_name))
2.2 Multiplication by a Matrix
If x
is an integer (smaller than \(2^n\)) corresponding to a bitstring
\(x\), and if M
is a \(2^m \times 2^n\) binary Matrix
instance, then
you can use the following function apply_bin_mat
to obtain y
such
that y
is the integer representation of the bitstring \(y\) such that
\(y= M \times x\).
def tobin(x, n): return [(x >> i) & 1 for i in reversed(xrange(0, n))] def frombin(v): y = 0 for i in xrange(0, len(v)): y = (y << 1) | int(v[i]) return y def apply_bin_mat(x, mat): n = mat.ncols() x = Matrix(GF(2), n, 1, tobin(x, n)) y = mat * x return frombin(y.T[0][:mat.nrows()])
2.3 Obtaining a Full Rank Matrix
The following function get_generating_matrix
returns a full rank
binary matrix \(M\) mapping \(\{ 0,1 \}^n\) to itself such that \(M \times
[0,0,..,0,1,0,..,0]^T = e_i\), where the only 1 is at position \(i\) and
where the vectors \(e_i\) are linearly independent elements of \(\{ 0,1 \}^n\).
def complete_basis(basis, N): """Returns a list of length N spanning the space 0..2^N-1 which contains the list of integers `basis`. Assumes that the elements of `basis` are linearly independent. """ r = len(basis) for i in xrange(1, 2**N): new_basis = basis + [i] new_r = Matrix(GF(2), len(new_basis), N, [tobin(x, N) for x in new_basis]).rank() if new_r == N: return new_basis elif new_r > r: basis = new_basis r = new_r return [] def get_generating_matrix(basis, N): """Returns an NxN binary matrix M such that M*(1 << i) = basis[i] for all i < len(basis) and such that M has full rank. """ b = complete_basis(basis, N) print b return Matrix(GF(2), N, N, [ [(b[i] >> j) & 1 for j in reversed(xrange(0, N))] for i in reversed(xrange(0, N)) ]).transpose()
3 Useful Quantities
The indices of the low contrast columns in the LAT of π (and a basis for the corresponding vector space) are
indices = [0x00,0x1a,0x20,0x3a,0x44,0x5e,0x64,0x7e,0x8a,0x90,0xaa,0xb0,0xce,0xd4,0xee,0xf4] basis = [0x1a,0x20,0x44,0x8a]
4 References
- Eli Biham and Adi Shamir. Differential cryptanalysis of DES-like cryptosystems. Journal of Cryptology, Volume 4, issue 1, pp 3–72, 1991. link to springer.com
- Alex Biryukov and Léo Perrin. On reverse-engineering S-boxes with hidden design criteria or structure. In Rosario Gennaro and Matthew J. B. Robshaw, editors, Advances in Cryptology – CRYPTO 2015, Part I, volume 9215 of Lecture Notes in Computer Science, pages 116–140. Springer, Heidelberg, August 2015. link to eprint.iacr.org.
- Mitsuru Matsui Linear Cryptanalysis Method for DES Cipher. Advances in Cryptology — EUROCRYPT'93, pp 386–397, 1993. link to springer.com