Coccinelle examples for Outreachy 2025

Coccinelle is a program matching and transformation engine which provides the language SmPL (Semantic Patch Language) for specifying desired matches and transformations in C code. Coccinelle makes it possible to search and transform code at a large scale, based on patterns written in the patch-like SmPL language. You can find out more about Coccinelle on its web page. Coccinelle is available in various Linux distributions, on its web page, and in github. Please report and problems encountered in installing or using Coccinelle.

Coccinelle can be used via the following command line:

spatch foo.cocci ~/linux/drivers/staging --very-quiet

Omitting the --very-quiet argument gives a list of the files as they are treated. Multiple CPUs can be used with the argument -j N, where N is the number of cores to use. Many other options can be found with --help. The resulting changes comes out as a diff on standard output. All other informational messages are placed on standard error.

Here are three examples of semantic patches that find some things in Linux kernel drivers/staging code. Just as for checkpatch, it is the responsability of the user to check that each change actually represents an improvement to the code and that the generated code compiles and maintains correctness at runtime.

Using boolean values

The C language itself does not offer boolean values. Instead 0 is used for false and anything else is true. But the Linux kernel defines a bool type with the values true and false. This semantic patch finds cases where true and false are used, and then suggests that other uses of the same location that use 1 and 0 should use true and false instead. There is no check that the location actually has bool type. It may be desirable to update that as well if true and false are going to be used.

@initialize:ocaml@
@@

let same_function p q =
  (List.hd p).Coccilib.current_element = (List.hd q).Coccilib.current_element

@r@
expression e;
position p;
symbol true,false;
@@

e =@p \(true\|false\)

@@
expression r.e;
position q : script:ocaml(r.p) { same_function p q };
@@

e =@q
(
- 1
+ true
|
- 0
+ false
)

@s@
type T;
T e;
identifier fld;
symbol true,false;
@@

e.fld = \(true\|false\)

@@
type s.T;
T e;
identifier s.fld;
@@

e.fld =
(
- 1
+ true
|
- 0
+ false
)

Avoiding unnecessary cleanup calls on NULL values

The C language does not offer exceptions, so C code has to deal with exceptional situations in an ad hoc way. A typical strategy is to jump to a label at the end of the function that contains various cleanup code. As exceptional situations can occur at multiple places in a function, different sequences of cleanup operations may be needed in different circumstances. In the Linux kernel, it is discouraged to have a single "out", "exit", or "error" label, but rather there should be a different label for each possible sequence of cleanup code. Typically, the cleanup code needed increases as execution moves through the function, in which case the tails of the sequences of cleanup operations can be shared. This leads to a sequence of labels at the end of the function, according to which cleanup operations are needed.

The following semantic patch detects cases where there is a jump to some cleanup code where some value has been detected as being NULL, expected to represent some kind of failed allocation. Many Linux kernel freeing functions now tolerate NULL as an argument, so a free of NULL at the jump destination typically does not cause a crash, but relying on this property is sloppy, and the code may need to be backported older kernels that might not be so forgiving.

This semantic patch doesn't actually make a change, because the change is typically heavily dependent on the context. Instead, it uses *, which causes Coccinelle to highlight the matched line with a -. This should not be interpreted as a suggestion to remove the line, only to study it further.

A more modern way to handle the cleanup issue is often provided by devm functions and the facilities of scope-based cleanup. The internship project will focus in part on using the latter. But for now one can just reorganize the labels, to less drastically change the code and to become more familiar with the problem.

@@
expression x;
identifier f;
@@

* if (x == NULL)
{
   ...
   f(x);
   ...
   return ...;
}

Simplifying local-variable initializations

Often a local variable will be declared and then immediately initialized in some simple way. In that case, it can be more concise to integrate the initialization into the variable declaration. Of course, this requires that none of the code executed between the declaration site and the initialization site is necessary to do the initialization. This semantic patch checks for some undesirable cases: when the initial value depends on some variable declared along the way, when the initial value is immediately tested, and when the expression describing the initial value is very long. If none of these checks for undesirable behavior succeeds, this semantic patch pulls the initial value into the declaration. The semantic patch contains three copies of the same sequence of rules, to perform this operation for up to three variables.

Some judgement is needed about whether the proposed change is actually desirable. In some cases, it may be more readable to leave the code as is. In other cases, the basic idea of the transformation may be desirable, but the resulting code may needed some rewriting.

@initialize:ocaml@
@@

let bigcode p =
  let p = List.hd p in
  (p.Coccilib.line <> p.Coccilib.line_end) ||
  (p.Coccilib.col_end - p.Coccilib.col >= 60) 

@r1@
type T;
identifier x;
position p;
@@

T@p x;

@badr1@
type T,T2;
identifier r1.x,y;
statement S1,S2;
position r1.p;
@@

T@p x;
... when != S1
T2 y;
... when != S2
x = <+... y ...+>;

@badr1a@
type T;
expression e,f;
identifier r1.x;
statement S,S1,S2;
position r1.p;
binary operator op;
@@

T@p x;
... when != S
x = e;
if (<+...\(x == NULL\|x != NULL\|x op 0\|f(...,x,...)\)...+>) S1 else S2

@badr1b@
type T;
expression e;
identifier r1.x;
statement S;
position r1.p;
position q : script:ocaml() { bigcode q };
@@

T@p x;
... when != S
x = e@q;

@depends on !badr1 && !badr1a && !badr1b@
type T;
identifier r1.x;
expression e;
statement S;
position r1.p;
@@

T@p x
+ = e
  ;
... when != S
- x = e;

// ---------------------------

@r2@
type T;
identifier x;
position p;
@@

T@p x;

@badr2@
type T,T2;
identifier r2.x,y;
statement S1,S2;
position r2.p;
@@

T@p x;
... when != S1
T2 y;
... when != S2
x = <+... y ...+>;

@badr2a@
type T;
expression e,f;
identifier r2.x;
statement S,S1,S2;
position r2.p;
binary operator op;
@@

T@p x;
... when != S
x = e;
if (<+...\(x == NULL\|x != NULL\|x op 0\|f(...,x,...)\)...+>) S1 else S2

@badr2b@
type T;
expression e;
identifier r2.x;
statement S;
position r2.p;
position q : script:ocaml() { bigcode q };
@@

T@p x;
... when != S
x = e@q;

@depends on !badr2 && !badr2a && !badr2b@
type T;
identifier r2.x;
expression e;
statement S;
position r2.p;
@@

T@p x
+ = e
  ;
... when != S
- x = e;

// ---------------------------

@r3@
type T;
identifier x;
position p;
@@

T@p x;

@badr3@
type T,T2;
identifier r3.x,y;
statement S1,S2;
position r3.p;
@@

T@p x;
... when != S1
T2 y;
... when != S2
x = <+... y ...+>;

@badr3a@
type T;
expression e,f;
identifier r3.x;
statement S,S1,S2;
position r3.p;
binary operator op;
@@

T@p x;
... when != S
x = e;
if (<+...\(x == NULL\|x != NULL\|x op 0\|f(...,x,...)\)...+>) S1 else S2

@badr3b@
type T;
expression e;
identifier r3.x;
statement S;
position r3.p;
position q : script:ocaml() { bigcode q };
@@

T@p x;
... when != S
x = e@q;

@depends on !badr3 && !badr3a && !badr3b@
type T;
identifier r3.x;
expression e;
statement S;
position r3.p;
@@

T@p x
+ = e
  ;
... when != S
- x = e;