perf-cpp: Effortless Hardware Performance Monitoring for C++ Applications

Quick Start | How to Build | Documentation | System Requirements

perf-cpp embeds Linux's hardware performance monitoring directly into your code, letting you profile exactly what matters and process the results in your application. Tools like Linux Perf, Intel® VTune™, and AMD uProf are powerful but monitor entire programs – and high-performance applications need surgical precision.

What can perf-cpp do?

Built around Linux's powerful perf subsystem, perf-cpp provides a clean interface for counting and sampling hardware events – without the complexity of low-level APIs.

Measure exactly what you want – utilize performance counters to count hardware events, similar to perf stat, but around specific code paths, not an entire binary (documentation).
Calculate metrics such as cycles per instruction and cache miss to access ratio based on hardware events and timing (documentation).
Low-latency access to performance counters without starting or stopping them, for micro-benchmarks or adaptive tuning (documentation).
Record instruction and memory samples, just like perf [mem] record – but from inside your application (documentation).
Correlate samples with data structures and symbols to generate per-class access statistics and flame graphs.
Mix built-in events (e.g., cycles, instructions, cache misses, ...) with processor-specific counters (documentation).

See various practical examples and the documentation for more details.

Quick Start

Record Hardware Event Statistics

Recording hardware event statistics works much like perf stat: it counts events such as executed instructions, CPU cycles, and cache misses while a code segment executes.

    #include <perfcpp/event_counter.h>
    #include <iostream>

    /// Initialize the counter
    auto event_counter = perf::EventCounter{};

    /// Specify hardware events to count
    event_counter.add({"seconds", "instructions", "cycles", "cache-misses"});

    /// Run the workload
    event_counter.start();
    code_to_profile(); /// <-- Statistics are recorded during execution
    event_counter.stop();

    /// Print the result to the console
    const auto result = event_counter.result();
    for (const auto& [event_name, value] : result) {
      std::cout << event_name << ": " << value << std::endl;
    }

Possible output:

    seconds: 0.0955897
    instructions: 5.92087e+07
    cycles: 4.70254e+08
    cache-misses: 1.35633e+07
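Counts like these are often easier to interpret as ratios. The following is a minimal sketch in plain C++ that derives cycles per instruction from two raw counter values. The helpers `ratio` and `cycles_per_instruction` are hypothetical names chosen for illustration; they are not part of the perf-cpp interface, which provides its own metrics support (see the metrics documentation).

```cpp
#include <optional>

/// Hypothetical helper (not part of perf-cpp): divides one counter value by another.
/// Returns std::nullopt if the denominator is zero, e.g., when an event was not counted.
std::optional<double> ratio(const double numerator, const double denominator)
{
  if (denominator == 0.0) {
    return std::nullopt;
  }
  return numerator / denominator;
}

/// Hypothetical helper: cycles per instruction (CPI) from raw event counts.
std::optional<double> cycles_per_instruction(const double cycles, const double instructions)
{
  return ratio(cycles, instructions);
}
```

With the sample output above, `cycles_per_instruction(4.70254e+08, 5.92087e+07)` yields roughly 7.94 cycles per instruction.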

Note

For additional insights please refer to the guides on recording event statistics and event statistics on multiple CPUs/threads. Also, check out the hardware events documentation for details on both built-in and processor-specific events.

Record Samples

Recording samples works much like perf [mem] record: it captures execution snapshots, e.g., the instruction pointer, executing CPU, and timestamp, at regular intervals (here every 50,000th CPU cycle).

    #include <perfcpp/sampler.h>
    #include <iostream>

    /// Create the sampler
    auto sampler = perf::Sampler{};

    /// Specify when a sample is recorded: every 50,000th cycle
    sampler.trigger("cycles", perf::Period{50000U});

    /// Specify what data is included in a sample: time, CPU ID, instruction pointer
    sampler.values()
      .timestamp(true)
      .cpu_id(true)
      .instruction_pointer(true);

    /// Run the workload
    sampler.start();
    code_to_profile(); /// <-- Samples are recorded during execution
    sampler.stop();

    /// Print the samples to the console
    const auto samples = sampler.result();
    for (const auto& record : samples) {
      const auto timestamp = record.metadata().timestamp().value();
      const auto cpu_id = record.metadata().cpu_id().value();
      const auto instruction = record.instruction_execution().logical_instruction_pointer().value();

      std::cout
        << "Time = " << timestamp << " | CPU = " << cpu_id
        << " | Instruction = 0x" << std::hex << instruction << std::dec
        << std::endl;
    }

Possible output:

    Time = 365449130714033 | CPU = 8 | Instruction = 0x5a6e84b2075c
    Time = 365449130913157 | CPU = 8 | Instruction = 0x64af7417c75c
    Time = 365449131112591 | CPU = 8 | Instruction = 0x5a6e84b2075c
    Time = 365449131312005 | CPU = 8 | Instruction = 0x64af7417c75c
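A common first step after sampling is to count how often each instruction pointer appears, since frequently sampled addresses point to hot code. The sketch below shows this aggregation in plain C++. The `SampleRecord` struct and the `count_by_instruction` function are hypothetical stand-ins for illustration; in real code the values would be copied out of the records returned by the sampler.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

/// Hypothetical plain record holding the three values printed above.
struct SampleRecord
{
  std::uint64_t timestamp;
  std::uint32_t cpu_id;
  std::uint64_t instruction_pointer;
};

/// Counts how many samples were taken at each instruction pointer.
std::map<std::uint64_t, std::size_t> count_by_instruction(const std::vector<SampleRecord>& samples)
{
  auto histogram = std::map<std::uint64_t, std::size_t>{};
  for (const auto& sample : samples) {
    ++histogram[sample.instruction_pointer];
  }
  return histogram;
}
```

For the four samples shown above, the histogram contains two entries with two samples each.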

Note

For additional details, such as the types of data that can be included in samples, please consult the sampling guide. For parallel sampling, see the guide on sampling on multiple CPUs/threads.

More Examples

We include a collection of examples demonstrating the functionality and interface of perf-cpp in the examples/ directory, including

examples for counting hardware events (examples/statistics)
and for sampling (examples/sampling).

Building

perf-cpp is designed as a library (static or shared) that can be linked to your application.

    # Clone the repository
    git clone https://github.com/jmuehlig/perf-cpp.git

    # Switch to the repository folder
    cd perf-cpp

    # Optional: Switch to this development version
    git checkout v0.12.4

    # Build the library (in build/)
    # -DBUILD_EXAMPLES=1 compiles all examples (optional)
    # -DBUILD_LIB_SHARED=1 creates the library as a shared one (optional)
    # -DGEN_PROCESSOR_EVENTS=1 generates and compiles a .cpp file that adds events specific to the underlying CPU (optional)
    cmake . -B build -DBUILD_EXAMPLES=1
    cmake --build build

    # Optional: Build examples (in build/examples/bin) if -DBUILD_EXAMPLES=1
    cmake --build build --target examples

Note

Further information and detailed building instructions (e.g., how to integrate into CMake projects) are available in the building guide.

Full Documentation

Building: Integrate perf-cpp seamlessly into your C++ projects.
Counting Performance Events
- Basics: Master recording hardware event statistics directly within your application.
- Parallel and Multithreaded: Learn how to monitor events across threads and CPU cores.
- Metrics: Learn how to combine hardware events into meaningful metrics for clearer performance insights.
- Live Access: See how events can be accessed without stopping the recording, ideal for profiling tight loops and small functions.
Recording Samples
- Basics: Understand sampling mechanisms, which data to record, and how to access the results.
- Parallel and Multithreaded: Learn how to record samples in multithreaded workloads.
- Use the Linux Perf Tool to Analyze Recorded Samples: See how samples recorded via perf-cpp can be analyzed with perf [mem] report.
- Translating Instruction Pointers into Symbols and Samples into flame graphs: See how to translate instruction pointers into function names and prepare sampling results to transform them into flame graphs (e.g., using FlameGraph).
- Analyzing Memory Access Patterns: See how to link memory sampling data to specific data objects to profile detailed memory access characteristics.
Built-in and Hardware-specific Events: Discover built-in events and learn how to define new ones tailored to your hardware.
Perf Paranoid: Learn how to configure perf permissions.

System Requirements

Clang / GCC with support for C++17 features.
CMake version 3.10 or higher.
Linux kernel 4.0 or newer (note that some features need a newer kernel).
perf_event_paranoid setting: Adjust as needed to allow access to performance counters (see the Paranoid Value documentation).
Python 3, if you make use of processor-specific hardware event generation.
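To check the current perf_event_paranoid setting from within a program, the value can be read from procfs. The following is a minimal sketch, assuming a Linux system that exposes `/proc/sys/kernel/perf_event_paranoid`. The function name `read_perf_event_paranoid` is hypothetical and not part of perf-cpp.

```cpp
#include <fstream>
#include <optional>

/// Hypothetical helper (not part of perf-cpp): reads the kernel's perf_event_paranoid level.
/// Lower values are more permissive. Returns std::nullopt if the file is missing or unreadable.
std::optional<int> read_perf_event_paranoid()
{
  auto file = std::ifstream{"/proc/sys/kernel/perf_event_paranoid"};
  auto level = 0;
  if (file >> level) {
    return level;
  }
  return std::nullopt;
}
```

An application could call this at startup and print a hint that points to the Paranoid Value documentation if the level looks too restrictive for the events it wants to record.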

Contribute and Contact

We welcome contributions and feedback. For feature requests, feedback, or bug reports, please reach out via our issue tracker or submit a pull request.

Alternatively, you can email me: jan.muehlig@tu-dortmund.de.

Further PMU-related Projects

Below is a non-exhaustive list of some other valuable profiling projects:

PAPI offers access not only to CPU performance counters but also to a variety of other hardware components including GPUs, I/O systems, and more.
Likwid is a collection of several command line tools for benchmarking, including an extensive wiki.
PerfEvent provides lightweight access to performance counters, facilitating streamlined performance monitoring.
Intel's Instrumentation and Tracing Technology allows applications to manage the collection of trace data effectively when used in conjunction with Intel VTune Profiler.
For those who prefer a more hands-on approach, the perf_event_open system call can be utilized directly without any wrappers.
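To give an impression of the hands-on approach, here is a minimal sketch that counts user-space instructions for a piece of code through the raw perf_event_open system call. It assumes a Linux system with the kernel headers installed. The function name `count_instructions` is hypothetical, and error handling is reduced to returning an empty optional.

```cpp
#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

#include <cstdint>
#include <cstring>
#include <functional>
#include <optional>

/// Hypothetical helper: counts user-space instructions executed by the calling thread
/// while `workload` runs. Returns std::nullopt if the counter cannot be opened or read,
/// e.g., due to a restrictive perf_event_paranoid setting or missing PMU access.
std::optional<std::uint64_t> count_instructions(const std::function<void()>& workload)
{
  perf_event_attr attr;
  std::memset(&attr, 0, sizeof(attr));
  attr.type = PERF_TYPE_HARDWARE;
  attr.size = sizeof(attr);
  attr.config = PERF_COUNT_HW_INSTRUCTIONS;
  attr.disabled = 1;       /// Start disabled; enabled explicitly below.
  attr.exclude_kernel = 1; /// Count user space only.
  attr.exclude_hv = 1;

  /// pid = 0 and cpu = -1: measure the calling thread on any CPU; no group, no flags.
  const auto fd = static_cast<int>(syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0UL));
  if (fd < 0) {
    return std::nullopt;
  }

  ioctl(fd, PERF_EVENT_IOC_RESET, 0);
  ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
  workload();
  ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

  /// Without a read_format, the kernel returns a single 64-bit counter value.
  std::uint64_t count = 0;
  const auto bytes_read = read(fd, &count, sizeof(count));
  close(fd);

  if (bytes_read != static_cast<ssize_t>(sizeof(count))) {
    return std::nullopt;
  }
  return count;
}
```

Even this single-counter case needs a fair amount of setup, which is the kind of boilerplate wrapper libraries aim to hide.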

Resources about (Perf-) Profiling

This is a non-exhaustive list of academic research papers and blog articles. Additions, including your own work, are welcome via pull request.

Academic Papers

Blog Posts

C2C - False Sharing Detection in Linux Perf (2016)
PMU counters and profiling basics. (2018)
Detect false sharing with Data Address Profiling. (2019)
Advanced profiling topics. PEBS and LBR. (2018)
