perf-cpp enables access to Performance Monitoring Units and Performance Counters directly from C++ applications.
Built around Linux's powerful perf subsystem, perf-cpp provides a uniform interface that streamlines counting and sampling hardware events, without the complexity of low-level APIs. It can be integrated into the application to measure only the desired execution paths and exclude parts that are irrelevant to profiling. The key features include:
- Count Hardware Events: Record performance statistics (comparable to perf stat) directly in your application and control what is measured and when. Additionally, measure metrics like cycles per instruction and read event counters in real time.
- Record Samples: Leverage the sampling mechanism to capture critical profiling data such as instruction pointers, memory addresses, latency, and cache misses (similar to perf [mem] record).
- Specify Hardware Events: Mix built-in events (e.g., cycles, instructions, cache-misses) with events specific to the underlying hardware.
- Practical Examples and Detailed Documentation: Quickly get started with ready-to-use examples demonstrating diverse, real-world applications.
perf-cpp also extends the capabilities of the standard perf subsystem, for example by leveraging AMD IBS features to expose rich, CPU-specific data that is unavailable through the perf_event_open interface.
Recording hardware event statistics operates much like perf stat: it quantifies critical events (such as executed instructions, CPU cycles, and cache misses) throughout a code segment's execution.
```cpp
#include <iostream>
#include <perfcpp/event_counter.h>

/// Initialize the counter
auto counters = perf::CounterDefinition{};
auto event_counter = perf::EventCounter{ counters };

/// Specify hardware events to count
event_counter.add({"seconds", "instructions", "cycles", "cache-misses"});

/// Run the workload
event_counter.start();
code_to_profile(); /// <-- Statistics recorded during execution
event_counter.stop();

/// Print the result to the console
const auto result = event_counter.result();
for (const auto [event_name, value] : result) {
  std::cout << event_name << ": " << value << std::endl;
}
```

Possible output:

```
seconds: 0.0955897
instructions: 5.92087e+07
cycles: 4.70254e+08
cache-misses: 1.35633e+07
```

Note: For additional insights, please refer to the guides on Recording Events and Recording Events on Multiple CPUs/Threads. Also, check out the Hardware Events documentation for comprehensive details on both built-in and hardware-specific events.
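As a rough illustration of how such counter values combine into metrics like cycles per instruction, the following sketch derives two ratios from plain numbers. It uses only the standard library; `DerivedMetrics` and `derive_metrics` are hypothetical names for this illustration and are not part of perf-cpp (the library has its own metrics support, described in the Metrics guide).

```cpp
#include <optional>

/// Hypothetical helper type for this sketch (not a perf-cpp type).
struct DerivedMetrics {
  double cycles_per_instruction;
  double cache_misses_per_kilo_instruction;
};

/// Derive ratios from raw counter values.
/// Returns an empty optional if no instructions were counted,
/// because both ratios would divide by zero.
std::optional<DerivedMetrics> derive_metrics(const double instructions,
                                             const double cycles,
                                             const double cache_misses) {
  if (instructions <= 0.0) {
    return std::nullopt;
  }

  return DerivedMetrics{
    cycles / instructions,                /// cycles per instruction (CPI)
    cache_misses / instructions * 1000.0  /// misses per 1,000 instructions
  };
}
```

Fed with the values from the possible output above, `derive_metrics(5.92087e+07, 4.70254e+08, 1.35633e+07)` yields roughly 7.9 cycles per instruction, which suggests that the profiled code spends much of its time stalled, consistent with the large share of cache misses.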
Recording samples functions much like perf [mem] record: it captures execution snapshots, e.g., the instruction pointer, executing CPU, and timestamp, at regular intervals (here every 4,000th CPU cycle).
```cpp
#include <iostream>
#include <perfcpp/sampler.h>

/// Create the sampler
auto counters = perf::CounterDefinition{};
auto sampler = perf::Sampler{ counters };

/// Specify when a sample is recorded: every 4000th cycle
sampler.trigger("cycles", perf::Period{4000U});

/// Specify what data is included into a sample: time, CPU ID, instruction
sampler.values()
  .timestamp(true)
  .cpu_id(true)
  .instruction_pointer(true);

/// Run the workload
sampler.start();
code_to_profile(); /// <-- Samples recorded during execution
sampler.stop();

/// Print the samples to the console
const auto samples = sampler.result();
for (const auto& record : samples) {
  const auto timestamp = record.metadata().timestamp().value();
  const auto cpu_id = record.metadata().cpu_id().value();
  const auto instruction = record.instruction_execution().logical_instruction_pointer().value();

  std::cout
    << "Time = " << timestamp << " | CPU = " << cpu_id
    << " | Instruction = 0x" << std::hex << instruction << std::dec
    << std::endl;
}
```

Possible output:

```
Time = 365449130714033 | CPU = 8 | Instruction = 0x5a6e84b2075c
Time = 365449130913157 | CPU = 8 | Instruction = 0x64af7417c75c
Time = 365449131112591 | CPU = 8 | Instruction = 0x5a6e84b2075c
Time = 365449131312005 | CPU = 8 | Instruction = 0x64af7417c75c
```

Note: For additional details, such as the types of data that can be included in samples, please consult the Sampling Guide. The Sampling on Multiple CPUs/Threads Guide covers parallel sampling.
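Samples become useful once they are aggregated. One common step is to count how often each instruction pointer was sampled, which points to the hottest code locations. The sketch below does this with the standard library only; `SampleView` and `rank_instruction_pointers` are hypothetical names introduced for this illustration, and the fields are assumed to have been copied out of the sample records beforehand.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

/// Hypothetical plain value type holding the fields printed above
/// (not a perf-cpp type).
struct SampleView {
  std::uint64_t timestamp;
  std::uint32_t cpu_id;
  std::uint64_t instruction_pointer;
};

/// Count samples per instruction pointer and return (address, hits) pairs,
/// most frequently sampled address first; ties are ordered by address.
std::vector<std::pair<std::uint64_t, std::size_t>>
rank_instruction_pointers(const std::vector<SampleView>& samples) {
  auto hits = std::unordered_map<std::uint64_t, std::size_t>{};
  for (const auto& sample : samples) {
    ++hits[sample.instruction_pointer];
  }

  auto ranked = std::vector<std::pair<std::uint64_t, std::size_t>>(hits.begin(), hits.end());
  std::sort(ranked.begin(), ranked.end(), [](const auto& left, const auto& right) {
    if (left.second != right.second) {
      return left.second > right.second;
    }
    return left.first < right.first;
  });

  return ranked;
}
```

For the four samples in the possible output above, this yields two entries with two hits each. Mapping the addresses back to functions or source lines requires symbol information and is outside the scope of this sketch.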
We include a comprehensive collection of examples demonstrating the advanced capabilities of perf-cpp, including, for example, counting events in parallel settings and sampling memory accesses.
Tip: All code examples are available in the examples/ folder.
perf-cpp is designed as a library (static or shared) that can be linked to your application.
```bash
# Clone the repository
git clone https://github.com/jmuehlig/perf-cpp.git

# Switch to the repository folder
cd perf-cpp

# Optional: Switch to the latest stable version
git checkout v0.11.1

# Build the library (in build/)
# Note: -DBUILD_EXAMPLES=1 can be used to compile examples
# Note: -DBUILD_LIB_SHARED=1 can be used to build the library as a shared one
cmake . -B build -DBUILD_EXAMPLES=1
cmake --build build

# Optional: Build examples (in build/examples/bin) if -DBUILD_EXAMPLES=1
cmake --build build --target examples
```

Note: Further information and detailed building instructions (e.g., how to integrate into CMake projects) are available in the Building Guide.
- Building: Integrate perf-cpp seamlessly into your C++ projects.
- Counting Performance Events:
  - Basics: Master recording hardware event statistics directly within your application.
  - Parallel and Multithreaded: Learn how to monitor events across threads and CPU cores.
  - Metrics: Learn how to combine hardware events into meaningful metrics for clearer performance insights.
  - Live Access: See how events can be accessed without stopping the recording, ideal for profiling tight loops and small functions.
- Recording Samples:
  - Basics: Understand sampling mechanisms, which data to record, and how to access the results.
  - Parallel and Multithreaded: Learn how to record samples in multithreaded workloads.
  - Analyzing Memory Access Patterns: See how to link memory sampling data to specific data objects to profile detailed memory access characteristics.
- Built-in and Hardware-specific Events: Discover built-in events and learn how to define new ones tailored to your hardware.
- Perf Paranoid: Learn how to configure perf permissions.
- Examples: Learn how to set up different features from code examples.
- Changelog: Stay updated with the latest changes and improvements.
- Requires support for C++17 features.
- CMake Version 3.10 or higher.
- Linux kernel 4.0 or newer (note that some features need a newer kernel).
- perf_event_paranoid setting: Adjust as needed to allow access to performance counters (see the Paranoid Value documentation).
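To check the current setting from within an application, the value can be read from procfs. The following is a minimal sketch using only the standard library; the function names are hypothetical and not part of perf-cpp. It relies on the kernel exposing the setting at /proc/sys/kernel/perf_event_paranoid and on the documented rule that values of 2 or higher disallow kernel profiling for unprivileged users.

```cpp
#include <fstream>
#include <optional>
#include <string>

/// Read the paranoid level from procfs (hypothetical helper for this sketch).
/// Returns an empty optional if the file is missing or not a number.
std::optional<int> read_perf_event_paranoid(
  const std::string& path = "/proc/sys/kernel/perf_event_paranoid") {
  auto file = std::ifstream{path};
  auto level = 0;
  if (file >> level) {
    return level;
  }
  return std::nullopt;
}

/// Levels of 2 or higher disallow kernel profiling for unprivileged users;
/// user-space measurements of the own process remain possible at level 2.
/// Some distributions define additional, stricter levels.
bool allows_unprivileged_kernel_profiling(const int level) {
  return level < 2;
}
```

If `read_perf_event_paranoid()` returns a value that is too restrictive for the events you want to record, the level can be lowered as described in the Paranoid Value documentation.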
We welcome contributions and feedback. For feature requests, feedback, or bug reports, please reach out via our issue tracker or submit a pull request.
Alternatively, you can email me: jan.muehlig@tu-dortmund.de.
Below is a non-exhaustive list of some other valuable profiling projects:
- PAPI offers access not only to CPU performance counters but also to a variety of other hardware components including GPUs, I/O systems, and more.
- Likwid is a collection of several command line tools for benchmarking, including an extensive wiki.
- PerfEvent provides lightweight access to performance counters, facilitating streamlined performance monitoring.
- Intel's Instrumentation and Tracing Technology allows applications to manage the collection of trace data effectively when used in conjunction with Intel VTune Profiler.
- For those who prefer a more hands-on approach, the perf_event_open system call can be utilized directly without any wrappers.
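For comparison, the sketch below shows what counting a single event looks like when the perf_event_open system call is used directly, following the pattern from the Linux manual page. The function name `count_user_instructions` is hypothetical. The call may fail (for example, due to a restrictive perf_event_paranoid setting or a missing PMU in virtual machines), in which case the sketch returns an empty optional without running the workload.

```cpp
#include <cstdint>
#include <optional>

#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/// Count user-space instructions retired by the calling thread
/// while `workload` runs (hypothetical helper for this sketch).
std::optional<std::uint64_t> count_user_instructions(void (*workload)()) {
  perf_event_attr attribute{};  /// zero-initialized, as the kernel expects
  attribute.type = PERF_TYPE_HARDWARE;
  attribute.size = sizeof(attribute);
  attribute.config = PERF_COUNT_HW_INSTRUCTIONS;
  attribute.disabled = 1;        /// start disabled, enable explicitly below
  attribute.exclude_kernel = 1;  /// count user space only
  attribute.exclude_hv = 1;

  /// glibc provides no wrapper, so the raw system call is used:
  /// pid = 0 (calling thread), cpu = -1 (any CPU), no group, no flags.
  const auto file_descriptor =
    static_cast<int>(syscall(SYS_perf_event_open, &attribute, 0, -1, -1, 0UL));
  if (file_descriptor < 0) {
    return std::nullopt;
  }

  ioctl(file_descriptor, PERF_EVENT_IOC_RESET, 0);
  ioctl(file_descriptor, PERF_EVENT_IOC_ENABLE, 0);
  workload();
  ioctl(file_descriptor, PERF_EVENT_IOC_DISABLE, 0);

  /// With the default read format, the counter is a single 64-bit value.
  std::uint64_t count = 0U;
  const auto bytes_read = read(file_descriptor, &count, sizeof(count));
  close(file_descriptor);

  if (bytes_read != static_cast<ssize_t>(sizeof(count))) {
    return std::nullopt;
  }
  return count;
}
```

Even this minimal case needs manual attribute setup, file descriptor handling, and error checks for one event on one thread; grouping events, sampling, and multi-threaded recording add considerably more code.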
This is a non-exhaustive list of academic research papers and blog articles (feel free to add to it, including your own work, e.g., via pull request).
- Quantitative Evaluation of Intel PEBS Overhead for Online System-Noise Analysis (2017)
- Analyzing memory accesses with modern processors (2020)
- Precise Event Sampling on AMD Versus Intel: Quantitative and Qualitative Comparison (2023)
- Multi-level Memory-Centric Profiling on ARM Processors with ARM SPE (2024)