Joern is a powerful command-line tool designed for static code analysis. It can assist you in detecting and fixing security vulnerabilities in programs with large amounts of code, even those that are difficult to identify with fuzzing. Joern also comes with an interactive shell and automation features based on Code Property Graphs.
This article provides an introduction to the fundamentals of using Joern. You will learn how to create and modify Code Property Graphs, query them, and use organizational commands. In case you haven’t installed Joern yet, you can refer to these instructions.
Obtaining the Sample Program #
Before you begin analyzing with Joern, ensure you have a program ready for analysis. To do this, clone the following git repository which contains a simple program named X42
:
git clone https://github.com/ShiftLeftSecurity/x42.git
Let us start with a problem statement. Show - without running the
program - that an input exists for which X42
writes a string to
standard error (STDERR).
// X42.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char *argv[]) {
if (argc > 1 && strcmp(argv[1], "42") == 0) {
fprintf(stderr, "It depends!\n");
exit(42);
}
printf("What is the meaning of life?\n");
exit(0);
}
Starting Joern’s Interactive Shell #
Launch Joern in your shell:
$ joern
A console session will start and you will see a prompt:
joern>
The prompt you are looking at is the prompt of a Scala-based REPL. If you have no experience with Scala or read-eval-print-loops, don’t worry, you can accomplish a lot with Joern by focusing only on what its commands allow you to do. If you are familiar with Scala and REPLs, you may be pleasantly surprised at the flexibility it provides you with.
Importing the Code #
We create a Code Property Graph for the X42
program using the
command importCode
, which requires the path to the source code to be
passed as a first argument, and a project name as a second
argument. In particular, importCode
creates a new project directory
and stores a binary representation of the Code Property Graph in it.
joern> importCode(inputPath="./x42/c", projectName="x42-c")
Creating project `x42-c` for code at `x42/c`
... output omitted
res1: Option[Cpg] = Some(io.shiftleft.codepropertygraph.Cpg@31ed46c5)
If you see an error and a return value of None
, you have probably pointed Joern to the wrong input path for the directory containing the source code for the sample project.
Querying the Code Property Graph #
You are ready to analyze your first program using Joern and the Code
Property Graph. Code analysis in Joern is done using the CPG query
language, a domain-specific language designed specifically to work
with the Code Property Graph. It contains practical representations of
the various nodes found in the Code Property Graph, and useful
functions for querying their properties and relationships between each
other. The top-level entry point into a Code Property Graph loaded in
memory, and the root object of the query Language is cpg
. If you
evaluate cpg
at the prompt, the output is underwhelming:
joern> cpg
res2: Cpg = io.shiftleft.codepropertygraph.Cpg@cb0d5241
Rest assured, a lot is hidden behind that simple statement. You will
discover the full set of commands in time, but for now, you should
learn a helpful Joern trick: TAB
-completion. In the Joern prompt,
type cpg.
, do not press ENTER
, but instead press TAB
. You will
see a list of available functions cpg
supports:
joern> cpg.
all comment goto literal namespace tryBlock
argument continue graph local namespaceBlock typ
arithmetic controlStructure help member parameter typeDecl
assignment doBlock id metaData ret typeRef
break elseBlock identifier method runScript whileBlock
call file ifBlock methodRef switchBlock
close forBlock jumpTarget methodReturn tag
TAB-completion is available for all query language directives, and for
top-level commands. For more descriptive assistance, use the help
command, like so:
joern> help.cpg
res3: String = """
Upon importing code, a project is created that holds an intermediate
representation called `Code Property Graph`. This graph is a composition of
low-level program representations such as abstract syntax trees and control flow
graphs, but it can be arbitrarily extended to hold any information relevant in
your audit, information about HTTP entry points, IO routines, information flows,
or locations of vulnerable code. Think of Joern as a CPG editor.
In practice, `cpg` is the root object of the query language, that is, all query
language constructs can be invoked starting from `cpg`. For exanple,
`cpg.method.l` lists all methods, while `cpg.finding.l` lists all findings of
potentially vulnerable code."""
Solving the Challenge #
Now that we have a good set of basic commands, and a Code Property
Graph loaded in memory, let us return to our X42
program and the
problem we want to solve using Joern.
To reiterate, the problem statement is Show that an input exists for
which the X42 program always writes a string to STDERR. And this is
the X42
program:
// X42.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char *argv[]) {
if (argc > 1 && strcmp(argv[1], "42") == 0) {
fprintf(stderr, "It depends!\n");
exit(42);
}
printf("What is the meaning of life?\n");
exit(0);
}
There are two parts in the problem statement: 1. does the program
write anything to STDERR?, and 2. _if there is a call writing to
STDERR, is it conditional on a value passed in as argument to the
X42
program?
Joern makes answering both questions easy. To answer the first one,
whether the program writes anything to STDERR, we can search for nodes
of type CALL
in the graph, then use the argument
step to only
select those calls which have connections to nodes of type ARGUMENT
,
followed by the code("stderr")
property filter step which selects only
those nodes that have the string stderr
as the value of their CODE
property. We find exactly one:
joern> cpg.call.argument.code("stderr").toList
res3: List[Expression] = List(
Identifier(
id -> 1000118L,
argumentIndex -> 1,
argumentName -> None,
code -> "stderr",
columnNumber -> Some(value = 12),
lineNumber -> Some(value = 7),
name -> "stderr",
order -> 1,
typeFullName -> "ANY"
)
)
This query shows that stderr
is used somewhere in the program, but doesn’t give us any more information. Using the query from the previous step, we can use the astParent
construct to find out more about the surroundings of the stderr
usage by moving up the hierarchy of the abstract syntax tree that is part of the Code Property Graph. Moving up one level in the AST hierarchy gives us an fprintf
call:
joern> cpg.call.argument.code("stderr").astParent.toList
res4: List[AstNode] = List(
Call(
id -> 1000117L,
argumentIndex -> 1,
argumentName -> None,
code -> "fprintf(stderr, \"It depends!\\n\")",
columnNumber -> Some(value = 4),
dispatchType -> "STATIC_DISPATCH",
lineNumber -> Some(value = 7),
methodFullName -> "fprintf",
name -> "fprintf",
order -> 1,
signature -> "TODO",
typeFullName -> "<empty>"
)
)
With this query we have proven the first part of our problem statement correct, there is a place in the X42
program that writes to STDERR. Let us move to the second part, the check whether the call that writes something to STDERR is conditional on a value passed as input to the X42
program. Since we are analyzing a program written in C
, we will search the Code Property Graph for the conventional argc
or argv
parameters of the main
function as the input that potentially triggers the call which writes to STDERR.
As before, we can use the astParent
to move up the AST. Moving up another level in the AST hierarchy gives us a block; not very helpful:
joern> cpg.call.argument.code("stderr").astParent.astParent.toList
res5: List[AstNode] = List(
Block(
id -> 1000116L,
argumentIndex -> 2,
argumentName -> None,
code -> "",
columnNumber -> Some(value = 46),
lineNumber -> Some(value = 6),
order -> 2,
typeFullName -> "void"
)
)
Another layer up gives us an if statement, much better:
joern> cpg.call.argument.code("stderr").astParent.astParent.astParent.toList
res6: List[AstNode] = List(
ControlStructure(
id -> 1000104L,
argumentIndex -> 1,
argumentName -> None,
code -> "if (argc > 1 && strcmp(argv[1], \"42\") == 0)",
columnNumber -> Some(value = 2),
controlStructureType -> "IF",
lineNumber -> Some(value = 6),
order -> 1,
parserTypeName -> "IfStatement"
)
)
The CODE
property of the CONTROL_STRUCTURE
node you just found proves the second part of our problem statement correct, the call that writes to STDERR is conditional on argc
and argv
. Hence, the whole problem statement is correct.
Closing the Project #
Now that we’ve finished the analysis, let us close the project, which also unloads the Code Property Graph from memory. You do not have to worry about losing any data, because it will remain on disk in the x42-c
project you created earlier with importCode
. Close the project using the aptly-named close
:
joern> close
2020-05-08 01:13:01.752 WARN clearing 105 references - this may take some time
2020-05-08 01:13:01.756 WARN cleared all clearable references
res7: Option[io.shiftleft.console.workspacehandling.Project] = Some(
Project(
ProjectFile("/home/user/x42/c", "x42-c"),
/home/user/.shiftleft/joern/workspace/x42-c,
None
)
)
As a final step, exit Joern:
joern> :exit
Congratulations, you have successfully queried your first Code Property Graph using Joern and its query language. More examples can be found on the query-database website (also see Joern Scan).
In subsequent articles, you will learn the more advanced features of Joern and also how to use it to find your first real-world vulnerability.