Joern is used in academic research as a source of intermediate graph
representations of code, particularly in machine learning and
vulnerability discovery applications [e.g., 1,2,3,4,5]. To support this
use case, Joern provides both plotting capabilities in the interactive
console and the joern-export command line utility.
In summary, Joern can create the following graph representations for C/C++ code:
- Abstract Syntax Trees (AST)
- Control Flow Graphs (CFG)
- Control Dependence Graphs (CDG)
- Data Dependence Graphs (DDG)
- Program Dependence Graphs (PDG)
- Code Property Graphs (CPG14)
- Entire graph, i.e. convert to a different graph format (ALL)
The command line tool joern-export #
All of these representations can be plotted and exported into the graphviz dot format to enable processing with third party tools or via external scripts.
To parse the code in /src/directory and dump Program Dependence Graphs
for all methods into the directory outdir, you can run the following
commands on the system shell:
joern-parse /src/directory
joern-export --repr pdg --out outdir
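If you need more than one representation, the export step can be repeated in a loop. The following is a minimal sketch, assuming joern-parse has already been run in the current directory. The function name export_reprs, the `<prefix>-<repr>` directory naming, and the JOERN_EXPORT override are hypothetical and exist only for illustration; the only joern-export options used are --repr and --out from the commands above.

```shell
#!/bin/sh
# Sketch: export several representations from an already generated CPG.
# JOERN_EXPORT is a hypothetical override, e.g. "echo joern-export" for a dry run.

export_reprs() {
  prefix="$1"   # each representation goes to the directory <prefix>-<repr>
  shift
  for repr in "$@"; do
    # Left unquoted on purpose so an override such as "echo joern-export" is split into words.
    ${JOERN_EXPORT:-joern-export} --repr "$repr" --out "${prefix}-${repr}" || return 1
  done
}
```

For example, `export_reprs outdir ast cfg pdg` would run joern-export three times and write to outdir-ast, outdir-cfg, and outdir-pdg. Separate directories are used so that one export never writes into the output directory of another.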
You can also export the entire graph into a neo4j csv format (along with instructions on how to import it into a running neo4j instance), graphml, graphson or graphviz dot:
./joern-export --repr=all --format=neo4jcsv
./joern-export --repr=all --format=graphml
./joern-export --repr=all --format=graphson
./joern-export --repr=all --format=dot
For a complete overview of options, run joern-export --help.
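Exported dot files can be handed to third-party tools such as Graphviz. The sketch below renders every .dot file in an export directory to SVG with the Graphviz dot command. It assumes that Graphviz is installed and that the exported files end in .dot and sit directly in the given directory; the function name render_dots and the DOT_BIN override are hypothetical.

```shell
#!/bin/sh
# Sketch: render each exported .dot file to an SVG next to it.
# DOT_BIN is a hypothetical override for the Graphviz "dot" executable.

render_dots() {
  dir="$1"
  for f in "$dir"/*.dot; do
    [ -e "$f" ] || continue   # skip the unexpanded glob when no files match
    ${DOT_BIN:-dot} -Tsvg "$f" -o "${f%.dot}.svg" || return 1
  done
}
```

Calling `render_dots outdir` after the export above would leave one .svg file beside each .dot file.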
Plotting and Exporting on the Joern Console #
If you would like to explore graph representations interactively, you
can do so on the Joern shell. To this end, we define the following steps
on method nodes to dump representations in dot format.
cpg.method($name).dotAst.l // output AST in dot format
cpg.method($name).dotCfg.l // output CFG in dot format
...
cpg.method($name).dotCpg14.l // output CPG'14 in dot format
You can also plot and view representations using the following queries:
cpg.method($name).plotDotAst // plot AST
cpg.method($name).plotDotCfg // plot CFG
...
cpg.method($name).plotDotCpg14 // plot CPG'14
Note that the ossdataflow layer must first be calculated for the source
CPG via run.ossdataflow.
Example #
Generate the CPG along with the data flow layer for a sample function
named myfunc.
joern> importCode.c.fromString( """
int myfunc(int b) {
  int a = 42;
  if (b > 10) {
    foo(a)
  }
  bar(a);
}
"""
)
joern> run.ossdataflow
You can now plot the AST as follows:
joern> cpg.method("myfunc").plotDotAst
You can obtain the dot representation of the AST as well:
joern> cpg.method("myfunc").dotAst.l
res4: List[String] = List(
"""digraph myfunc {
"1000102" [label = "(METHOD,myfunc)" ]
"1000103" [label = "(PARAM,int b)" ]
"1000104" [label = "(BLOCK,,)" ]
"1000105" [label = "(LOCAL,a: int)" ]
"1000106" [label = "(<operator>.assignment,a = 42)" ]
"1000107" [label = "(IDENTIFIER,a,a = 42)" ]
"1000108" [label = "(LITERAL,42,a = 42)" ]
"1000109" [label = "(CONTROL_STRUCTURE,if (b > 10),if (b > 10))" ]
"1000110" [label = "(<operator>.greaterThan,b > 10)" ]
"1000111" [label = "(IDENTIFIER,b,b > 10)" ]
"1000112" [label = "(LITERAL,10,b > 10)" ]
"1000113" [label = "(BLOCK,,)" ]
"1000114" [label = "(bar,bar(a))" ]
"1000115" [label = "(IDENTIFIER,a,bar(a))" ]
"1000116" [label = "(METHOD_RETURN,int)" ]
"1000102" -> "1000103"
"1000102" -> "1000104"
"1000102" -> "1000116"
"1000104" -> "1000105"
"1000104" -> "1000106"
"1000104" -> "1000109"
"1000104" -> "1000114"
"1000106" -> "1000107"
"1000106" -> "1000108"
"1000109" -> "1000110"
"1000109" -> "1000113"
"1000110" -> "1000111"
"1000110" -> "1000112"
"1000114" -> "1000115"
}
"""
)
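A dot dump like the one above can also be inspected with standard text tools. The following sketch counts node and edge lines in a dot file, assuming the layout shown in this example: one quoted numeric id per node line followed by a label, and one `"id" -> "id"` pair per edge line. The function name dot_stats is hypothetical, and the patterns may need adjusting if the output format differs between Joern versions.

```shell
#!/bin/sh
# Sketch: count nodes and edges in a dot file laid out like the example output.

dot_stats() {
  file="$1"
  # Node lines look like: "1000102" [label = "(METHOD,myfunc)" ]
  nodes=$(grep -c '^[[:space:]]*"[0-9][0-9]*" \[label' "$file" || true)
  # Edge lines look like: "1000102" -> "1000103"
  edges=$(grep -c '^[[:space:]]*"[0-9][0-9]*" -> "[0-9][0-9]*"' "$file" || true)
  echo "nodes=$nodes edges=$edges"
}
```

If the digraph text for myfunc is saved to a file, `dot_stats` on that file should report 15 nodes and 14 edges. Anchoring both patterns at the start of the line keeps a `->` that appears inside a node label from being counted as an edge.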
Dumping representations for all functions from the shell #
You can also dump all representations into the directory out using:
run.dumpast
run.dumpcfg
...
run.dumpcpg14
References #
Research that employs Joern as an extraction tool for intermediate representations of code:
(5) SPIDER: Enabling Fast Patch Propagation In Related Software Repositories, Machiry et al., S&P'20