[RFC] Call Graph Information from Clang/LLVM for C/C++

This RFC adopts the high level design proposed and discussed in an earlier attempt to support call graph information from the 2021 RFC. However, as we started reviving that earlier effort we’ve had to break away from the old design and implementation plan in significant ways based on feedback from code owners. We are presenting a brief but complete picture in this RFC the design for supporting call graph information generation. At the end of the write up we have a section that lists the questions for which we would like feedback on the items that were not covered by the old RFC.

Introduction

In program analysis, call graph refers to a directed graph that represents calling relationships between functions in a program. The objective of this effort is to generate precise (but still conservatively correct) call graph information of C/C++ programs at build time. Tools that directly operate on disassembly to generate call graphs are less precise in presence of indirect calls, due to the loss of semantic information during compilation. We propose to modify the compiler in a way that the rich information available in the compiler front-end regarding call relationships is preserved and stored in a non-allocated ELF section named .callgraph.

Background

Function type identifier

Type identifiers are computed by generalizing the function type, taking the MD5 of the generalized name, and reinterpreting the first 8 bytes as a little-endian u64. Pointer generalization collapses pointer chains but honors qualifiers (e.g., char* vs const char* are distinct). This identifier is calculated in Clang where rich source-level information about each function is available. This implementation is already in use for Control Flow Integrity and is leveraged for generating call graph info. This key component offers the missing semantic information to the rest of the compilation pipeline beyond the compiler front-end.

Handling indirect calls

For every indirect call, Clang front-end will generate a named callee_type metadata encoding the type information of the intended callee and will attach this metadata to indirect call instructions. This metadata will be preserved in LLVM across code transformations and the codegen AsmPrinter will emit a .callgraph section with information about each indirect call and the type of their intended callees.

!callee_type metadata

We introduce !callee_type metadata as a mechanism by which we can annotate the type of the intended callees of each indirect call site in LLVM IR. !callee_type is a list of one or more !type metadata nodes, each pointing to a type id for an intended callee signature.

define i32 @call_foo() { entry: %fptr = load ptr, ptr null, align 8 %call = call i32 %fptr(i8 0), !callee_type !0 ret i32 0 } !0 = !{!1} !1 = !{i64 0, !"_ZTSFicE.generalized"} 

In the simple example above, call to %fptr is attached a !callee_type !0 metadata indicating that the intended types of potential callees are !0 = !{!1} which is !{i64 0, !"_ZTSFvcE.generalized"}.

The implementation prototype that was created as a follow up to the old RFC used operand bundles instead of metadata to transport callee type. @nikic pointed out that this is not an appropriate choice as the other operand bundles consistently impact the semantics of the call in some way which is not the case here. Please see https://github.com/llvm/llvm-project/pull/87573#issuecomment-2736295811 for the detailed discussion on this topic.

Middle-end → Backend Plumbing

calleetypeIds field is added to the existing CallSiteInfo, and MIR can round-trip them. LLVM’s lowering/selection adds extraction logic: numeric type IDs are pulled from the call site metadata and carried in MachineFunction::CallSiteInfo so the backends (AArch64, ARM, Mips, RISC-V and X86) can use them.

name: call_foo callSites: - { bb: 0, offset: 0, fwdArgRegs: [], calleeTypeIds: [ 3498816979441845844 ] } body: bb.0.entry: CALL64m ...// Machine call instruction params hidden for brevity. 

Emitting call graph information section

AsmPrinter is responsible for emitting a new ELF section .callgraph.

Call graph section layout

For every machine function, AsmPrinter writes one record into a .callgraph output section with this exact field order:

  1. u64 FormatVersion
  2. FunctionEntryPC — pointer-sized relocation to the function’s begin symbol
  3. u64 FunctionKind — one of:
    1. 0 = not an indirect-call target
    2. 1 = possible indirect target, type id unknown
    3. 2 = possible indirect target, type id known
  4. u64 FunctionTypeId — a numeric type id (only if FunctionKind = 2)
  5. u64 IndirectCallSiteCount — count of the following pairs.
  6. IndirectCallSiteCount repetitions of:
    4. u64 CallSiteTypeId
    5. CallSitePC — pointer-sized relocation to the label emitted right before the indirect call in .text
  7. u64 DirectCallSiteCount — count of the following pairs.
  8. DirectCallSiteCount repetitions of:
    6. CallSitePC — pointer-sized relocation to the label emitted right before the indirect call in .text
    7. CalleePC — pointer-sized relocation to the callee function’s begin symbol

The old RFC did not contain information pertaining to direct call sites in the layout. We are expanding the design to include direct call information and discuss the reasons in sections that follow.

Handling direct calls

In AsmPrinter, for every machine function, we identify direct call sites and add this information into the call graph section described above. The old RFC proposed using the MC layer in llvm-objdump to handle direct calls instead of embedding this information in the .callgraph section at build time.

The new approach significantly simplifies the design of tooling for parsing the .callgraph section. By eliminating the dependency on the MC layer and the need for disassembly, we can achieve a substantial reduction in code complexity, making the tooling easier to maintain and faster to execute.

Tooling to generate machine readable call graph information output

We would like to add a new flag –call-graph-info to llvm-readelf to read the .callgraph section. llvm-readelf supports JSON output format which makes it easier for generating machine readable output that can be easily handled by call graph reconstruction tools. We intend to contribute python scripts to reconstruct the program call graph from the compiler generated .callgraph section.

Implementation status

Introduction of callee_type metadata in llvm, its propagation and emitting call graph section have all landed in the following PRs.

Landed PRs

  1. [llvm] Introduce callee_type metadata by Prabhuk · Pull Request #87573 · llvm/llvm-project · GitHub
  2. [llvm] Add CalleeTypeIds field to CallSiteInfo by Prabhuk · Pull Request #87574 · llvm/llvm-project · GitHub
  3. [llvm] Extract and propagate callee_type metadata by Prabhuk · Pull Request #87575 · llvm/llvm-project · GitHub
  4. [llvm][AsmPrinter] Emit call graph section by Prabhuk · Pull Request #87576 · llvm/llvm-project · GitHub

PRs under review

  1. Handling direct callsites in AsmPrinter: [llvm][AsmPrinter] Add direct calls to callgraph section by Prabhuk · Pull Request #155706 · llvm/llvm-project · GitHub
  2. Implement --call-graph-info flag to dump .callgraph ELF section as text (GNU/LLVM) or JSON output.
    1. [llvm-readobj] Dump callgraph section info for ELF by Prabhuk · Pull Request #157499 · llvm/llvm-project · GitHub
  3. Clang codegen and driver flag changes: These will be the last things we would like to land when we are satisfied with the implementation and call graph section’s layout properties. The flag will be experimental (-fexperimental-call-graph-section) first.
    2. [clang] Introduce CallGraphSection codegen option by Prabhuk · Pull Request #117037 · llvm/llvm-project · GitHub
    3. [clang] callee_type metadata for indirect calls by Prabhuk · Pull Request #117036 · llvm/llvm-project · GitHub

WIP

  1. Documentation for call graph section info, its format, how to use related flags and tooling.
  2. Python script to reconstruct call graph from llvm-readelf output.

Future directions

Rust support

We work with codebases which have Rust & C++ interop. We would like to instrument Rust front-end to construct a complete call graph for these contexts. We are working on a proposal for upstream Rust towards this and will take this up as soon as Clang bits land.

Applications

The old RFC discussed how call graph information can be used in the context of program analysis and memory safety tools which collect stack traces. These tools can collect efficient, compact stack trace information and can reconstruct the stack traces in human readable format using the call graph information.

Stack size analysis is another use case which is of immediate concern to our customers. Call graph data, when combined with stack size information from the -fstack-size-section compiler flag, can help developers to calculate a safe, conservative upper limit for the stack requirements of specific functions or entry points in an application. This is especially useful in environments like embedded systems or operating system kernels where manual stack allocation is required.

Questions for the community

The write up above presents a complete picture of this effort. In this final section we are listing the work that is still in progress and would like the community’s feedback on.

  1. Embed direct call site information into the .callgraph section.
  2. Introduce –call-graph-info flag in llvm-readelf.
  3. Land -fexperimental-call-graph-section.
  4. Add new scripts to reconstruct call graph (and potentially for stack size estimation) using .callgraph section.

Thank you for posting.

Out of interest does the callgraph info get one output .callgraph section per module or one per executable section .callgraph.text.foo ? With the latter then it should be possible for COMDAT group selection and with some care possibly garbage collection to drop out of linking. With the former we may have lots of information for discarded functions to sift through. Presumably the linker could use a “tombstone” address, or if that’s not possible 0x0, for the FunctionEntryPC relocation resolution.

On item 4. Mainly coming from the use case of stack-size estimation.

One of the potential problems I can see is that not all objects will be compiled with a .callgraph section. For example assembly code and third-party binary libraries. Ideally it would be possible to supplement the .callgraph section information from another source, such as disassembly. There’s many possible ways that could be done. For example:

  • Instead of llvm-readelf, use llvm-objdump and use the disassembly to reconstruct at least missing direct-calls.
  • Allow a user to supplement the .callgraph information provided to your script in WIP (2) via an input file. That input file could be generated by some third-party tool or script, potentially llvm-objdump.

Some thoughts on stack size checking:

There are some scripts that are based on disassembly and combine the output of gcc/clang -fstack-usage [1].

One aspect our Cortex-M RTOS team have asked us to look at is which functions use floating point. Cortex-M has got a “lazy save” of floating point registers based on whether they’ve been used in the current task. The stack usage of the kernel and some tasks could be reduced if they could know that a particular task and interrupt service routine (callgraph starting at a function) uses floating point. At the moment the advice [2] is that this is too complicated to work out, so just allocate stack as if it uses floating point registers.

I had an idea that llvm-objdump could be used to disassemble each instruction and output a summary of hardware features needed by instructions in the function. That could be combined with the callgraph information.

[1] Stack-checking a program that will execute in orbit
[2] Documentation – Arm Developer Determining the stack usage of applications

Thank you @smithp35 for the response. The points raised here were discussed in the embedded WG meeting today. I’ll respond briefly with what’s been discussed and will share the minutes from the WG meeting when it becomes available.

We currently use SHF_LINK_ORDER (and SHF_GROUP) flags to link the .callgraph section to function symbols similar to how -fstack-size-section was implemented. Please see: llvm-project/llvm/lib/MC/MCObjectFileInfo.cpp at main · llvm/llvm-project · GitHub
In the WG meeting today, it was pointed out that this may have problems with template instantiation cases. @petrhosek suggested that section groups (zero copy sections) will be a better a choice for this. I will work with Petr to evaluate it and transition the implementation.

I agree that disassembling the binary will be required to recover direct calls which are not instrumented by the compiler and therefore will be missing from the .callgraph section. However, I strongly prefer that the disassembling of the binary and the corresponding complexity should live outside the main tooling to dump the contents of the .callgraph section. The script to reconstruct the call graph can employ llvm-objdump or bolt to generate information regarding missing direct calls.

Yes. This is our current idea as well to augment the compiler generated call graph information with user inputs for the script to reconstruct the call graph.

Thank you again.

I’m in favor overall, I think it’s reasonable and as far as I’m aware we already provide some similar ELF-level call graph information for PGO in the linker. For GPU kernels, we need to calculate the necessary resources to execute that kernel so that the driver knows how many registers to allocate in the launch packet. We currently don’t fully support ELF linking of AMDGPU binaries because we don’t have the necessary information to derive this in the linker without access to the kernel’s full call graph in LLVM-IR. Do you think this would be usable for that?

I think that should be doable. The CG info should be as precise as anything you can get from IR, since the callee types are generated by the frontend and then the metadata is serialized late in the backend (just before AsmPrinter IIRC). You may need some additional step in the linker or a post link step to make use of the information, but I could imagine it would give you what you need.

I think if we emit a metadata entry for each function, we should be able to find the diameter of the call graph by cross-checking the symbol names in the section. Then we’d probably just need to move the kernel lowering into ld.lld. I’m assuming there would be helpers for traversing this call graph format somewhere. So I’m heavily in favor if it makes this straightforward.

1 Like

Based on review comments and feedback from other forums, I am making changes to the call graph section layout to be more space efficient. One of the main differences from my proposal is that instead of maintaining per callsite information, I am retaining only the unique callee information of a function. For direct calls this is unique set of direct callee addresses and for indirect calls this is a set of unique target type IDs. While this greatly improves space efficiency, this call graph information cannot be used for callsite specific information reconstructions such as enriching a raw stack trace data. It has been pointed out that is the purpose of debug information such as DWARF and I see that as a valid argument. My changes to the layout are under review here: [llvm][AsmPrinter] Call graph section format. by Prabhuk · Pull Request #159866 · llvm/llvm-project · GitHub

Please feel free to leave a review if you have objections or other suggestions. Thank you.