
perf-cpp: Effortless Hardware Performance Monitoring for C++ Applications

Examples | How to Build | Documentation | System Requirements

perf-cpp enables access to Performance Monitoring Units and Performance Counters directly from C++ applications.

About

Built around Linux's powerful perf subsystem, perf-cpp provides a uniform interface that streamlines counting and sampling hardware events, without the complexity of low-level APIs. It can be integrated into the application to measure only the desired execution paths and to exclude parts of the application that are irrelevant to profiling. The key features include:

Count Hardware Events: Record performance statistics (comparable to perf stat) directly in your application and control what is measured and when. Additionally, measure metrics such as cycles per instruction and read event counters in real time.
Record Samples: Leverage the sampling mechanism to capture critical profiling data such as instruction pointers, memory addresses, latency, and cache misses (similar to perf [mem] record).
Specify Hardware Events: Mix built-in events (e.g., cycles, instructions, cache-misses, ...) with events specific to the underlying hardware.
Practical Examples and Detailed Documentation: Quickly get started with ready-to-use examples demonstrating diverse, real-world applications.

perf-cpp also extends the capabilities of the standard perf subsystem; for example, it leverages AMD IBS features to expose rich, CPU-specific data that is unavailable through the perf_event_open interface.

Examples

Record Hardware Event Statistics

Recording hardware event statistics operates much like perf stat: it quantifies critical events, such as executed instructions, CPU cycles, and cache misses, throughout a code segment's execution.

    #include <iostream>
    #include <perfcpp/event_counter.h>

    /// Initialize the counter
    auto counters = perf::CounterDefinition{};
    auto event_counter = perf::EventCounter{ counters };

    /// Specify hardware events to count
    event_counter.add({"seconds", "instructions", "cycles", "cache-misses"});

    /// Run the workload
    event_counter.start();
    code_to_profile(); /// <-- Statistics are recorded during execution
    event_counter.stop();

    /// Print the result to the console
    const auto result = event_counter.result();
    for (const auto [event_name, value] : result) {
        std::cout << event_name << ": " << value << std::endl;
    }

Possible output:

    seconds: 0.0955897
    instructions: 5.92087e+07
    cycles: 4.70254e+08
    cache-misses: 1.35633e+07

Note: For additional insights, please refer to the guides on Recording Events and Recording Events on Multiple CPUs/Threads. Also, check out the Hardware Events documentation for comprehensive details on both built-in and hardware-specific events.
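
The counter values from the possible output above can be combined into derived metrics such as cycles per instruction. perf-cpp has its own metric support (see the Metrics guide in the documentation); the following is only a minimal standalone sketch that does not use that API. The `CounterValues` alias and the `ratio` and `example_values` helpers are hypothetical names introduced for illustration.

```cpp
#include <map>
#include <optional>
#include <string>

/// Hypothetical container for illustration (not a perf-cpp type):
/// maps an event name to its counted value.
using CounterValues = std::map<std::string, double>;

/// Returns numerator / denominator for two named events, or std::nullopt
/// if either event is missing or the denominator is zero.
std::optional<double> ratio(const CounterValues& values,
                            const std::string& numerator,
                            const std::string& denominator)
{
  const auto num = values.find(numerator);
  const auto den = values.find(denominator);
  if (num == values.end() || den == values.end() || den->second == 0.0) {
    return std::nullopt;
  }
  return num->second / den->second;
}

/// Values copied from the "Possible output" of the counting example.
CounterValues example_values()
{
  return { { "seconds", 0.0955897 },
           { "instructions", 5.92087e+07 },
           { "cycles", 4.70254e+08 },
           { "cache-misses", 1.35633e+07 } };
}
```

Calling `ratio(example_values(), "cycles", "instructions")` yields the cycles per instruction for the sample run. Returning `std::optional` avoids a division by zero when an event was not recorded.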

Record Samples

Recording samples functions much like perf [mem] record: it captures execution snapshots, e.g., the instruction pointer, executing CPU, and timestamp, at regular intervals (here every 4,000th CPU cycle).

    #include <iostream>
    #include <perfcpp/sampler.h>

    /// Create the sampler
    auto counters = perf::CounterDefinition{};
    auto sampler = perf::Sampler{ counters };

    /// Specify when a sample is recorded: every 4000th cycle
    sampler.trigger("cycles", perf::Period{4000U});

    /// Specify what data is included in a sample: time, CPU ID, instruction pointer
    sampler.values()
        .timestamp(true)
        .cpu_id(true)
        .instruction_pointer(true);

    /// Run the workload
    sampler.start();
    code_to_profile(); /// <-- Samples are recorded during execution
    sampler.stop();

    /// Print the samples to the console
    const auto samples = sampler.result();
    for (const auto& record : samples) {
        const auto timestamp = record.metadata().timestamp().value();
        const auto cpu_id = record.metadata().cpu_id().value();
        const auto instruction = record.instruction_execution().logical_instruction_pointer().value();

        std::cout
            << "Time = " << timestamp << " | CPU = " << cpu_id
            << " | Instruction = 0x" << std::hex << instruction << std::dec
            << std::endl;
    }

Possible output:

    Time = 365449130714033 | CPU = 8 | Instruction = 0x5a6e84b2075c
    Time = 365449130913157 | CPU = 8 | Instruction = 0x64af7417c75c
    Time = 365449131112591 | CPU = 8 | Instruction = 0x5a6e84b2075c
    Time = 365449131312005 | CPU = 8 | Instruction = 0x64af7417c75c

Note: For additional details, such as the types of data that can be included in samples, please consult the Sampling Guide. Additionally, consult the Sampling on Multiple CPUs/Threads Guide for instructions on parallel sampling.
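
A common next step after collecting samples is to aggregate them, for instance by counting how often each instruction pointer occurs in order to locate hot spots. The sketch below shows one possible way to do this with the standard library only. `SimpleSample` and `hottest_instructions` are hypothetical names for illustration and are not part of perf-cpp; in practice the fields would be copied out of the sample records as in the loop above.

```cpp
#include <algorithm>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

/// Hypothetical plain sample type for illustration; not a perf-cpp type.
struct SimpleSample
{
  std::uint64_t timestamp;
  std::uint32_t cpu_id;
  std::uint64_t instruction_pointer;
};

/// Counts samples per instruction pointer and returns (address, count)
/// pairs sorted by descending count; ties are ordered by ascending address.
std::vector<std::pair<std::uint64_t, std::uint64_t>>
hottest_instructions(const std::vector<SimpleSample>& samples)
{
  std::unordered_map<std::uint64_t, std::uint64_t> histogram;
  for (const auto& sample : samples) {
    ++histogram[sample.instruction_pointer];
  }

  std::vector<std::pair<std::uint64_t, std::uint64_t>> sorted(histogram.begin(), histogram.end());
  std::sort(sorted.begin(), sorted.end(), [](const auto& left, const auto& right) {
    return left.second != right.second ? left.second > right.second
                                       : left.first < right.first;
  });
  return sorted;
}
```

With the four samples from the possible output above, both addresses would appear twice, so the result would contain two entries with a count of two each.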

Advanced Examples

We include a comprehensive collection of examples demonstrating the advanced capabilities of perf-cpp, such as counting events in parallel settings and sampling memory accesses.

Tip: All code examples are available in the examples/ folder.

Building

perf-cpp is designed as a library (static or shared) that can be linked to your application.

    # Clone the repository
    git clone https://github.com/jmuehlig/perf-cpp.git

    # Switch to the repository folder
    cd perf-cpp

    # Optional: Switch to the latest stable version
    git checkout v0.11.1

    # Build the library (in build/)
    # Note: -DBUILD_EXAMPLES=1 can be used to compile examples
    # Note: -DBUILD_LIB_SHARED=1 can be used to build the library as a shared one
    cmake . -B build -DBUILD_EXAMPLES=1
    cmake --build build

    # Optional: Build examples (in build/examples/bin) if -DBUILD_EXAMPLES=1
    cmake --build build --target examples

Note: Further information and detailed building instructions (e.g., how to integrate into CMake projects) are available in the Building Guide.

Documentation

Building: Integrate perf-cpp seamlessly into your C++ projects.
Counting Performance Events
- Basics: Master recording hardware event statistics directly within your application.
- Parallel and Multithreaded: Learn how to monitor events across threads and CPU cores.
- Metrics: Learn how to combine hardware events into meaningful metrics for clearer performance insights.
- Live Access: See how events can be accessed without stopping the recording, ideal for profiling tight loops and small functions.
Recording Samples
- Basics: Understand sampling mechanisms, which data to record, and how to access the results.
- Parallel and Multithreaded: Learn how to record samples in multithreaded workloads.
- Analyzing Memory Access Patterns: See how to link memory sampling data to specific data objects to profile detailed memory access characteristics.
Built-in and Hardware-specific Events: Discover built-in events and learn how to define new ones tailored to your hardware.
Perf Paranoid: Learn how to configure perf permissions.

System Requirements

Requires support for C++17 features.
CMake Version 3.10 or higher.
Linux kernel 4.0 or newer (some features require a more recent kernel).
perf_event_paranoid setting: Adjust as needed to allow access to performance counters (see the Paranoid Value documentation).
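
Before measuring, it can be useful to check the current perf_event_paranoid value from within the application. On Linux the setting is exposed as a single integer in `/proc/sys/kernel/perf_event_paranoid`. The following is a minimal sketch using only the standard library; `parse_paranoid_level` and `read_paranoid_level` are hypothetical helper names and not part of perf-cpp.

```cpp
#include <fstream>
#include <optional>
#include <sstream>
#include <string>

/// Parses the integer contained in the given text (e.g., "2\n").
/// Returns std::nullopt if the text does not start with an integer.
std::optional<int> parse_paranoid_level(const std::string& text)
{
  std::istringstream stream{ text };
  int level = 0;
  if (stream >> level) {
    return level;
  }
  return std::nullopt;
}

/// Reads the current setting from procfs.
/// Returns std::nullopt if the file cannot be opened or parsed.
std::optional<int> read_paranoid_level(const std::string& path = "/proc/sys/kernel/perf_event_paranoid")
{
  std::ifstream file{ path };
  if (!file.is_open()) {
    return std::nullopt;
  }
  std::string line;
  std::getline(file, line);
  return parse_paranoid_level(line);
}
```

Lower values are less restrictive, with -1 being the most permissive. Which value a given measurement needs depends on the events and privileges involved, so the Paranoid Value documentation remains the reference for choosing one.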

Contribute and Contact

We welcome contributions and feedback. For feature requests, feedback, or bug reports, please reach out via our issue tracker or submit a pull request.

Alternatively, you can email me: jan.muehlig@tu-dortmund.de.

Further PMU-related Projects

Below is a non-exhaustive list of some other valuable profiling projects:

PAPI offers access not only to CPU performance counters but also to a variety of other hardware components including GPUs, I/O systems, and more.
Likwid is a collection of several command line tools for benchmarking, including an extensive wiki.
PerfEvent provides lightweight access to performance counters, facilitating streamlined performance monitoring.
Intel's Instrumentation and Tracing Technology allows applications to manage the collection of trace data effectively when used in conjunction with Intel VTune Profiler.
For those who prefer a more hands-on approach, the perf_event_open system call can be utilized directly without any wrappers.
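
To illustrate what using perf_event_open without a wrapper involves, here is a minimal Linux-only sketch that counts user-space instructions of the calling thread. It follows the usual pattern of filling a `perf_event_attr`, invoking the raw system call (glibc provides no wrapper function), and controlling the counter via `ioctl`. The helper names `make_instruction_counter_attr`, `open_counter_for_calling_thread`, and `count_instructions` are hypothetical; error handling is reduced to returning `std::nullopt`.

```cpp
#include <cstdint>
#include <cstring>
#include <functional>
#include <optional>

#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/// Builds an attribute structure that counts instructions in user space only.
perf_event_attr make_instruction_counter_attr()
{
  perf_event_attr attr;
  std::memset(&attr, 0, sizeof(attr));
  attr.type = PERF_TYPE_HARDWARE;
  attr.size = sizeof(attr);
  attr.config = PERF_COUNT_HW_INSTRUCTIONS;
  attr.disabled = 1;       // start disabled; enabled explicitly via ioctl
  attr.exclude_kernel = 1; // do not count kernel-space instructions
  attr.exclude_hv = 1;     // do not count hypervisor instructions
  return attr;
}

/// Opens the counter for the calling thread on any CPU.
/// Returns a file descriptor, or -1 on failure (errno is set).
int open_counter_for_calling_thread(perf_event_attr& attr)
{
  return static_cast<int>(
    syscall(SYS_perf_event_open, &attr, /*pid=*/0, /*cpu=*/-1, /*group_fd=*/-1, /*flags=*/0UL));
}

/// Runs the workload and returns the number of counted instructions,
/// or std::nullopt if the counter could not be opened or read.
std::optional<std::uint64_t> count_instructions(const std::function<void()>& workload)
{
  auto attr = make_instruction_counter_attr();
  const int fd = open_counter_for_calling_thread(attr);
  if (fd < 0) {
    return std::nullopt;
  }

  ioctl(fd, PERF_EVENT_IOC_RESET, 0);
  ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
  workload();
  ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

  std::uint64_t count = 0;
  const bool ok = read(fd, &count, sizeof(count)) == static_cast<ssize_t>(sizeof(count));
  close(fd);
  return ok ? std::optional<std::uint64_t>{ count } : std::nullopt;
}
```

Opening the counter may fail, for example because of the perf_event_paranoid setting or missing PMU support in virtualized environments, which is why the result is optional.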

Resources about (Perf-) Profiling

This is a non-exhaustive list of academic research papers and blog articles. Feel free to add to it, including your own work, e.g., via a pull request.

Academic Papers

Blog Posts

C2C - False Sharing Detection in Linux Perf (2016)
PMU counters and profiling basics. (2018)
Detect false sharing with Data Address Profiling. (2019)
Advanced profiling topics. PEBS and LBR. (2018)
