Hello, LLVM World
Besides working with source code written in C and C++, Joern supports CPGs generated from LLVM Bitcode.
This article shows a basic example of how to use llvm2cpg with Joern.
The basic workflow is the following:
- Convert a program into LLVM Bitcode
- Generate a CPG using llvm2cpg
- Import the CPG into Joern and start the analysis
# Emit LLVM Bitcode

Let's start with a simple C program:
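As a sketch, a program with the flow analyzed later in this article could look like the following. In that flow, the result of `source` is passed to `sink`.

Only the names `source` and `sink` come from the article. The function `foo`, the globals `source_input` and `last_sink_value`, and the function bodies are assumptions. The Clang/GCC extension `__attribute__((noinline))` is also an assumption, added so that `-O1` keeps both calls in the bitcode instead of inlining them. Because this is not the original listing, its line numbers do not match the ones Joern reports below.

```c
/* foo.c: a minimal program with one source-to-sink flow (hypothetical sketch).
 * Only the names source() and sink() come from the article; everything else
 * is a placeholder. */

/* A mutable global, so the optimizer cannot replace source()'s result
 * with a constant. */
int source_input = 42;

/* noinline keeps the call to source() visible in the -O1 bitcode. */
__attribute__((noinline)) int source(void) {
  return source_input;
}

/* Records the last value sink() received, so the flow has an observable effect. */
int last_sink_value = 0;

/* noinline keeps the call to sink() visible in the -O1 bitcode. */
__attribute__((noinline)) void sink(int value) {
  last_sink_value = value;
}

/* The flow Joern should report: the result of source() is passed to sink(). */
void foo(void) {
  int x = source();
  sink(x);
}
```

Calling `foo()` copies whatever `source()` returns into `last_sink_value`. This is the runtime counterpart of the static flow that Joern finds.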
You can use the following command to convert foo.c to the bitcode format: `clang -emit-llvm -S -g -O1 foo.c -o foo.ll`
Here is a brief explanation of what each flag does:
- -emit-llvm tells clang to emit LLVM Bitcode instead of an object file or an executable
- -S forces clang to emit the bitcode in a human-readable, textual format
- -g enables debug info. Strictly speaking, this flag is not needed, but it is essential if we want to map bitcode instructions back to the original source code
- -O1: by default, clang emits very inefficient bitcode with a lot of redundancy. This flag tells clang to apply some optimizations to make the bitcode a bit more concise
- -o foo.ll tells clang to store the result in the file foo.ll
Upon success, foo.ll should contain the following:
Note: you will very likely see a different target datalayout and target triple, depending on the machine and OS you are running on.
# Emit CPG

To convert LLVM Bitcode into a CPG, you need to get llvm2cpg and run the following command:
Once done, the CPG (/tmp/foo.cpg.bin.zip) can be fed to Joern.
# Analyze CPG with Joern

Let's find the simple flow in the program above:
Joern tells us that the result of the call to source (line 5) is passed as an argument to the function sink (line 6).
Looking at the original code, this seems legit:
# Slightly more complex analysis

The previous example may seem too boring, so let's look at something a bit more interesting. Consider the following program with a double-free bug:
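For illustration, a minimal sketch of this kind of bug might look like the following. It is not the article's original program.

The function `cleanup`, its `failed` flag, and the free-counting return value are hypothetical; the return value only makes the bug observable without triggering it. Joern may also report a different set of flows for this sketch than for the article's program.

```c
#include <stdlib.h>

/* Hypothetical cleanup routine with a double-free bug.
 * Returns the number of free() calls made on `buf`: any result above 1
 * means the same pointer was released twice (undefined behavior). */
int cleanup(char *buf, int failed) {
  int frees = 0;
  if (failed) {
    free(buf); /* first release, only on the failure path */
    frees++;
  }
  free(buf); /* second release: a double free whenever failed != 0 */
  frees++;
  return frees;
}
```

Calling `cleanup(buf, 0)` releases `buf` once and returns 1. Any call with a nonzero `failed` frees the same pointer twice. That is the path a free-to-free flow query is meant to flag.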
Following the same steps, we get a CPG:
Then we start the analysis. Here we want to see whether any value passed as an argument to the free function is passed as an argument to the function free again.
By default, we get three flows as follows:
The first two are 'loops': each is a flow from a call to free to itself.
We can filter these results out by asking only for flows with more than one element:
This yields the double-free bug in the program!