Getting LLVM Bitcode
LLVM Bitcode may take one of the following forms:
- LLVM IR (human-readable representation)
- LLVM Bitcode (bitstream representation)
- Embedded Bitcode (bitstream representation embedded into a binary)
There are several ways to get LLVM Bitcode out of high-level source code. This section describes these approaches, covering both the basic mechanics and real-world use cases. It concludes with a list of known issues.
'Hello-world' version
Let's use the following program as an example:
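A minimal stand-in for the example program, assuming it is a plain C "hello world" saved as main.c. Both the program text and the file name are assumptions; the name is chosen to match the main.ll, main.bc, main.o, and main artifacts that appear later in this section.

```shell
# Write a stand-in "hello world" program to main.c (file name is an assumption).
cat > main.c <<'EOF'
#include <stdio.h>

int main(void) {
  printf("Hello, world!\n");
  return 0;
}
EOF
```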
LLVM IR
To emit LLVM IR for a single file, pass -S -emit-llvm to clang. The emitted main.ll can then be passed to llvm2cpg.
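The two steps above can be sketched as follows, assuming the example program is saved as main.c and that clang and llvm2cpg are both on the PATH. Passing the input file to llvm2cpg as a positional argument is an assumption about its command line. The guards only turn the snippet into a no-op on machines where a tool is missing.

```shell
# Stand-in source file, created only if main.c is not already present.
[ -f main.c ] || printf 'int main(void) { return 0; }\n' > main.c

if command -v clang >/dev/null 2>&1; then
  # -S asks for textual output; -emit-llvm makes that output LLVM IR instead of assembly.
  clang -S -emit-llvm main.c -o main.ll
fi

if [ -f main.ll ] && command -v llvm2cpg >/dev/null 2>&1; then
  # Assumption: llvm2cpg takes the input file as a positional argument.
  llvm2cpg main.ll
fi
```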
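LLVM Bitcode
There are two ways to get the bitstream representation: emit it directly with -emit-llvm (producing main.bc), or use the LTO trick and compile with -flto (producing main.o).
In these cases, both main.bc and main.o contain LLVM Bitcode, and either of them can be passed to llvm2cpg.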
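A sketch of both routes, again assuming main.c as the input and clang on the PATH. The file(1) check is optional, and its exact wording differs between platforms, so no output is shown here.

```shell
# Stand-in source file, created only if main.c is not already present.
[ -f main.c ] || printf 'int main(void) { return 0; }\n' > main.c

if command -v clang >/dev/null 2>&1; then
  # Way 1: -c stops before linking; -emit-llvm makes the output an LLVM Bitcode file.
  clang -c -emit-llvm main.c -o main.bc

  # Way 2 (the LTO trick): with -flto the "object" file carries bitcode instead of machine code.
  clang -c -flto main.c -o main.o

  # Optional check: file(1) should identify both files as LLVM bitcode.
  if command -v file >/dev/null 2>&1; then
    file main.bc main.o
  fi
fi

if [ -f main.bc ] && command -v llvm2cpg >/dev/null 2>&1; then
  # Assumption: positional input file; main.o can be passed the same way.
  llvm2cpg main.bc
fi
```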
Embedded Bitcode
This is the ideal case: it gives the most straightforward integration and can easily be added to an existing build system without affecting the resulting software.
Compiling with -fembed-bitcode produces a regular binary, main, which can be passed to llvm2cpg as is.
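A sketch of the embedded-bitcode route, assuming main.c as the input, a clang that supports -fembed-bitcode on the target platform, and llvm2cpg on the PATH (positional input is an assumption):

```shell
# Stand-in source file, created only if main.c is not already present.
[ -f main.c ] || printf 'int main(void) { return 0; }\n' > main.c

# Build an ordinary executable that also carries its bitcode.
# The compile is part of the condition so a missing or failing toolchain skips the step.
if command -v clang >/dev/null 2>&1 && clang -fembed-bitcode main.c -o main; then
  echo "built ./main with embedded bitcode"
fi

if [ -f main ] && command -v llvm2cpg >/dev/null 2>&1; then
  # The binary itself is the input; no separate .bc or .ll file is needed.
  llvm2cpg main
fi
```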
Real-world version
Getting Bitcode for real-world projects, with all their different build systems, is less straightforward but still doable. One needs to inject one of the following flags into the build system:
- -emit-llvm
- -flto
- -fembed-bitcode
Note: Alternatively, one can use whole-program-llvm.
In the case of -emit-llvm, the build doesn't finish properly (linking fails since no object files are produced), but all the bitcode files will be available.
In the case of -flto, the build succeeds, and all the intermediate object files will in fact contain bitcode.
In the case of -fembed-bitcode, the build succeeds, and the resulting binary contains the required bitcode.
CMake
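A sketch of injecting the flag at configure time, assuming a project whose CMakeLists.txt sits in the current directory, a build directory named build (hypothetical), and CMake 3.13 or newer for the -S/-B options. CMAKE_C_FLAGS and CMAKE_CXX_FLAGS are standard CMake variables; -fembed-bitcode is used here only as the example flag.

```shell
# Flag to inject into every compile command.
BITCODE_FLAG=-fembed-bitcode

# Guarded so that nothing runs outside a CMake project or without cmake/clang installed.
if [ -f CMakeLists.txt ] && command -v cmake >/dev/null 2>&1 && command -v clang >/dev/null 2>&1; then
  # Configure with clang and add the flag for both C and C++ sources.
  cmake -S . -B build \
    -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ \
    -DCMAKE_C_FLAGS="$BITCODE_FLAG" -DCMAKE_CXX_FLAGS="$BITCODE_FLAG"
  cmake --build build
fi
```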
Xcode
Add the flag to both Other C Flags and Other Linker Flags.
xcodebuild
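From the command line, build settings can be overridden per invocation. A sketch, assuming a hypothetical project MyApp.xcodeproj with a scheme named MyApp; OTHER_CFLAGS and OTHER_LDFLAGS are the build-setting names behind Other C Flags and Other Linker Flags.

```shell
BITCODE_FLAG=-fembed-bitcode
SCHEME=MyApp   # hypothetical project and scheme name; replace with your own

# Guarded: xcodebuild exists only on macOS with Xcode installed.
if command -v xcodebuild >/dev/null 2>&1 && [ -d "$SCHEME.xcodeproj" ]; then
  # SETTING=value arguments override the project's build settings for this run.
  xcodebuild -project "$SCHEME.xcodeproj" -scheme "$SCHEME" \
    OTHER_CFLAGS="$BITCODE_FLAG" OTHER_LDFLAGS="$BITCODE_FLAG" \
    build
fi
```

Note that a command-line override replaces any value the project already sets for these two settings, so flags the project relies on may need to be repeated.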
Other build systems
Consider looking into whole-program-llvm.
Known issues
- -fembed-bitcode may not work on macOS if a project links a static library that was not compiled with embedded bitcode support
- if -fembed-bitcode is combined with -flto, then bitcode won't be embedded into a binary
- in some cases, llvm2cpg cannot read debug information emitted by Xcode's clang. In this case, everything still works, but the debug info is not taken into account.
Getting CPG out of a project
Once you have the bitcode, emitting the CPG is trivial. Here are the typical commands you may want to run, depending on how the bitcode was obtained.
-emit-llvm:
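A sketch, assuming the build ran under the current directory and that llvm2cpg accepts several input files in one invocation (an assumption about its command line):

```shell
# Guarded so the snippet is a no-op where llvm2cpg is not installed.
if command -v llvm2cpg >/dev/null 2>&1; then
  # Hand every .bc file under the current directory to llvm2cpg;
  # find batches the paths into as few invocations as possible.
  find . -type f -name '*.bc' -exec llvm2cpg {} +
fi
```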
-flto:
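Here the bitcode lives in the intermediate object files, so the sketch collects .o files instead. It makes the same assumptions: the build tree is under the current directory, and llvm2cpg accepts several inputs at once.

```shell
# Guarded so the snippet is a no-op where llvm2cpg is not installed.
if command -v llvm2cpg >/dev/null 2>&1; then
  # With -flto every intermediate object file holds bitcode, so all of them are valid inputs.
  find . -type f -name '*.o' -exec llvm2cpg {} +
fi
```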
-fembed-bitcode:
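A sketch in which BINARY is a hypothetical variable pointing at the executable the build produced (./main is only a placeholder default):

```shell
# Hypothetical path to the built executable; override by setting BINARY beforehand.
BINARY=${BINARY:-./main}

if [ -f "$BINARY" ] && command -v llvm2cpg >/dev/null 2>&1; then
  # The binary carries its own bitcode, so it is passed to llvm2cpg directly.
  llvm2cpg "$BINARY"
fi
```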
whole-program-llvm:
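A sketch, assuming the project was already built through whole-program-llvm's compiler wrappers (for example with LLVM_COMPILER=clang, CC=wllvm, and CXX=wllvm++ in the environment) and that BINARY, a hypothetical variable, points at the resulting executable:

```shell
# Hypothetical path to an executable built with the wllvm wrappers.
BINARY=${BINARY:-./main}

if [ -f "$BINARY" ] && command -v extract-bc >/dev/null 2>&1; then
  # extract-bc reassembles the whole-program bitcode and writes it next to the binary as <binary>.bc.
  extract-bc "$BINARY"

  if command -v llvm2cpg >/dev/null 2>&1; then
    llvm2cpg "$BINARY.bc"
  fi
fi
```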