This RFC adopts the high level design proposed and discussed in an earlier attempt to support call graph information from the 2021 RFC. However, as we started reviving that earlier effort we’ve had to break away from the old design and implementation plan in significant ways based on feedback from code owners. We are presenting a brief but complete picture in this RFC the design for supporting call graph information generation. At the end of the write up we have a section that lists the questions for which we would like feedback on the items that were not covered by the old RFC.
Introduction
In program analysis, call graph refers to a directed graph that represents calling relationships between functions in a program. The objective of this effort is to generate precise (but still conservatively correct) call graph information of C/C++ programs at build time. Tools that directly operate on disassembly to generate call graphs are less precise in presence of indirect calls, due to the loss of semantic information during compilation. We propose to modify the compiler in a way that the rich information available in the compiler front-end regarding call relationships is preserved and stored in a non-allocated ELF section named .callgraph
.
Background
Function type identifier
Type identifiers are computed by generalizing the function type, taking the MD5 of the generalized name, and reinterpreting the first 8 bytes as a little-endian u64. Pointer generalization collapses pointer chains but honors qualifiers (e.g., char* vs const char* are distinct). This identifier is calculated in Clang where rich source-level information about each function is available. This implementation is already in use for Control Flow Integrity and is leveraged for generating call graph info. This key component offers the missing semantic information to the rest of the compilation pipeline beyond the compiler front-end.
Handling indirect calls
For every indirect call, Clang front-end will generate a named callee_type
metadata encoding the type information of the intended callee and will attach this metadata to indirect call instructions. This metadata will be preserved in LLVM across code transformations and the codegen AsmPrinter
will emit a .callgraph section with information about each indirect call and the type of their intended callees.
!callee_type metadata
We introduce !callee_type
metadata as a mechanism by which we can annotate the type of the intended callees of each indirect call site in LLVM IR. !callee_type is a list of one or more !type
metadata nodes, each pointing to a type id for an intended callee signature.
define i32 @call_foo() { entry: %fptr = load ptr, ptr null, align 8 %call = call i32 %fptr(i8 0), !callee_type !0 ret i32 0 } !0 = !{!1} !1 = !{i64 0, !"_ZTSFicE.generalized"}
In the simple example above, call to %fptr
is attached a !callee_type !0
metadata indicating that the intended types of potential callees are !0 = !{!1}
which is !{i64 0, !"_ZTSFvcE.generalized"}
.
The implementation prototype that was created as a follow up to the old RFC used operand bundles instead of metadata to transport callee type. @nikic pointed out that this is not an appropriate choice as the other operand bundles consistently impact the semantics of the call in some way which is not the case here. Please see https://github.com/llvm/llvm-project/pull/87573#issuecomment-2736295811 for the detailed discussion on this topic.
Middle-end → Backend Plumbing
calleetypeIds
field is added to the existing CallSiteInfo
, and MIR can round-trip them. LLVM’s lowering/selection adds extraction logic: numeric type IDs are pulled from the call site metadata and carried in MachineFunction::CallSiteInfo
so the backends (AArch64, ARM, Mips, RISC-V and X86) can use them.
name: call_foo callSites: - { bb: 0, offset: 0, fwdArgRegs: [], calleeTypeIds: [ 3498816979441845844 ] } body: bb.0.entry: CALL64m ...// Machine call instruction params hidden for brevity.
Emitting call graph information section
AsmPrinter is responsible for emitting a new ELF section .callgraph
.
Call graph section layout
For every machine function, AsmPrinter writes one record into a .callgraph output section with this exact field order:
- u64 FormatVersion
- FunctionEntryPC — pointer-sized relocation to the function’s begin symbol
- u64 FunctionKind — one of:
- 0 = not an indirect-call target
- 1 = possible indirect target, type id unknown
- 2 = possible indirect target, type id known
- u64 FunctionTypeId — a numeric type id (only if FunctionKind = 2)
- u64 IndirectCallSiteCount — count of the following pairs.
- IndirectCallSiteCount repetitions of:
4. u64 CallSiteTypeId
5. CallSitePC — pointer-sized relocation to the label emitted right before the indirect call in .text - u64 DirectCallSiteCount — count of the following pairs.
- DirectCallSiteCount repetitions of:
6. CallSitePC — pointer-sized relocation to the label emitted right before the indirect call in .text
7. CalleePC — pointer-sized relocation to the callee function’s begin symbol
The old RFC did not contain information pertaining to direct call sites in the layout. We are expanding the design to include direct call information and discuss the reasons in sections that follow.
Handling direct calls
In AsmPrinter, for every machine function, we identify direct call sites and add this information into the call graph section described above. The old RFC proposed using the MC layer in llvm-objdump
to handle direct calls instead of embedding this information in the .callgraph section at build time.
The new approach significantly simplifies the design of tooling for parsing the .callgraph section. By eliminating the dependency on the MC layer and the need for disassembly, we can achieve a substantial reduction in code complexity, making the tooling easier to maintain and faster to execute.
Tooling to generate machine readable call graph information output
We would like to add a new flag –call-graph-info
to llvm-readelf
to read the .callgraph section. llvm-readelf supports JSON output format which makes it easier for generating machine readable output that can be easily handled by call graph reconstruction tools. We intend to contribute python scripts to reconstruct the program call graph from the compiler generated .callgraph section.
Implementation status
Introduction of callee_type metadata in llvm, its propagation and emitting call graph section have all landed in the following PRs.
Landed PRs
- [llvm] Introduce callee_type metadata by Prabhuk · Pull Request #87573 · llvm/llvm-project · GitHub
- [llvm] Add CalleeTypeIds field to CallSiteInfo by Prabhuk · Pull Request #87574 · llvm/llvm-project · GitHub
- [llvm] Extract and propagate callee_type metadata by Prabhuk · Pull Request #87575 · llvm/llvm-project · GitHub
- [llvm][AsmPrinter] Emit call graph section by Prabhuk · Pull Request #87576 · llvm/llvm-project · GitHub
PRs under review
- Handling direct callsites in AsmPrinter: [llvm][AsmPrinter] Add direct calls to callgraph section by Prabhuk · Pull Request #155706 · llvm/llvm-project · GitHub
- Implement
--call-graph-info
flag to dump.callgraph
ELF section as text (GNU/LLVM) or JSON output. - Clang codegen and driver flag changes: These will be the last things we would like to land when we are satisfied with the implementation and call graph section’s layout properties. The flag will be experimental (
-fexperimental-call-graph-section
) first.
2. [clang] Introduce CallGraphSection codegen option by Prabhuk · Pull Request #117037 · llvm/llvm-project · GitHub
3. [clang] callee_type metadata for indirect calls by Prabhuk · Pull Request #117036 · llvm/llvm-project · GitHub
WIP
- Documentation for call graph section info, its format, how to use related flags and tooling.
- Python script to reconstruct call graph from llvm-readelf output.
Future directions
Rust support
We work with codebases which have Rust & C++ interop. We would like to instrument Rust front-end to construct a complete call graph for these contexts. We are working on a proposal for upstream Rust towards this and will take this up as soon as Clang bits land.
Applications
The old RFC discussed how call graph information can be used in the context of program analysis and memory safety tools which collect stack traces. These tools can collect efficient, compact stack trace information and can reconstruct the stack traces in human readable format using the call graph information.
Stack size analysis is another use case which is of immediate concern to our customers. Call graph data, when combined with stack size information from the -fstack-size-section
compiler flag, can help developers to calculate a safe, conservative upper limit for the stack requirements of specific functions or entry points in an application. This is especially useful in environments like embedded systems or operating system kernels where manual stack allocation is required.
Questions for the community
The write up above presents a complete picture of this effort. In this final section we are listing the work that is still in progress and would like the community’s feedback on.
- Embed direct call site information into the
.callgraph
section. - Introduce
–call-graph-info
flag in llvm-readelf. - Land
-fexperimental-call-graph-section
. - Add new scripts to reconstruct call graph (and potentially for stack size estimation) using
.callgraph
section.