
Lightweight recording and sampling of performance counters for specific code segments directly from your C++ application.


jmuehlig/perf-cpp

perf-cpp: Effortless Hardware Performance Monitoring for C++ Applications

Examples | How to Build | Documentation | System Requirements

perf-cpp enables access to Performance Monitoring Units and Performance Counters directly from C++ applications.

About

Built around Linux's perf subsystem, perf-cpp provides a uniform interface that streamlines counting and sampling hardware events without the complexity of low-level APIs. It can be integrated into the application to measure only the desired execution paths and to exclude parts of the application that are irrelevant to profiling. The key features include:

perf-cpp also extends the capabilities of the standard perf subsystem, for example by leveraging AMD IBS features to expose rich, CPU-specific data that is unavailable through the perf_event_open interface.

Examples

Record Hardware Event Statistics

Recording hardware event statistics works much like perf stat: it counts selected events (such as executed instructions, CPU cycles, and cache misses) over the execution of a code segment.

    #include <perfcpp/event_counter.h>
    #include <iostream>

    /// Initialize the counter
    auto counters = perf::CounterDefinition{};
    auto event_counter = perf::EventCounter{ counters };

    /// Specify hardware events to count
    event_counter.add({"seconds", "instructions", "cycles", "cache-misses"});

    /// Run the workload (code_to_profile() stands for the code under test)
    event_counter.start();
    code_to_profile(); /// <-- Statistics are recorded during execution
    event_counter.stop();

    /// Print the result to the console
    const auto result = event_counter.result();
    for (const auto [event_name, value] : result) {
        std::cout << event_name << ": " << value << std::endl;
    }

Possible output:

    seconds: 0.0955897
    instructions: 5.92087e+07
    cycles: 4.70254e+08
    cache-misses: 1.35633e+07
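Raw counts like these are often easier to interpret as ratios. The following is a minimal sketch that derives two common metrics from the values printed above. The helper names (`ratio`, `cycles_per_instruction`, `cache_misses_per_kilo_instruction`) are hypothetical and not part of perf-cpp; the library's own metric support is described in the Metrics guide.

```cpp
#include <optional>

/// Hypothetical helper (not part of perf-cpp): divides one counter value by
/// another and returns std::nullopt when the denominator is zero.
std::optional<double> ratio(const double numerator, const double denominator)
{
  if (denominator == 0.0) {
    return std::nullopt;
  }
  return numerator / denominator;
}

/// Cycles per instruction (CPI): lower values mean more work per cycle.
std::optional<double> cycles_per_instruction(const double cycles, const double instructions)
{
  return ratio(cycles, instructions);
}

/// Cache misses per 1,000 executed instructions.
std::optional<double> cache_misses_per_kilo_instruction(const double cache_misses, const double instructions)
{
  const auto misses_per_instruction = ratio(cache_misses, instructions);
  if (!misses_per_instruction.has_value()) {
    return std::nullopt;
  }
  return misses_per_instruction.value() * 1000.0;
}
```

For the sample output above, `cycles_per_instruction(4.70254e+08, 5.92087e+07)` yields roughly 7.9 cycles per instruction.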

Note

For additional insights please refer to the guides on Recording Events and Recording Events on Multiple CPUs/Threads. Also, check out the Hardware Events documentation for comprehensive details on both built-in and hardware-specific events.
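Because start() and stop() bracket the measured region, a small RAII guard can keep the two calls paired even when the region has early returns. The sketch below is an assumption-laden illustration: `ScopedRecording` is a hypothetical helper, not part of perf-cpp, and it works with any type that offers `start()` and `stop()` member functions. `FakeRecorder` is a stand-in used only to show the behavior; the sketch also assumes that `stop()` does not throw, since it is called from a destructor.

```cpp
/// Hypothetical RAII helper (not part of perf-cpp): calls start() on
/// construction and stop() on destruction of the guard.
template <typename Recorder>
class ScopedRecording
{
public:
  explicit ScopedRecording(Recorder& recorder) : _recorder(recorder) { _recorder.start(); }
  ~ScopedRecording() { _recorder.stop(); }

  /// The guard owns exactly one start/stop pair, so copying is disabled.
  ScopedRecording(const ScopedRecording&) = delete;
  ScopedRecording& operator=(const ScopedRecording&) = delete;

private:
  Recorder& _recorder;
};

/// Stand-in recorder for illustration only; it merely counts the calls.
struct FakeRecorder
{
  int starts = 0;
  int stops = 0;

  void start() { ++starts; }
  void stop() { ++stops; }
};
```

Placing `ScopedRecording<FakeRecorder> guard{recorder};` at the top of a block starts the recording there and stops it when the block is left.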

Record Samples

Recording samples functions much like perf [mem] record: it captures execution snapshots, e.g., the instruction pointer, executing CPU, and timestamp, at regular intervals (here every 4,000th CPU cycle).

    #include <perfcpp/sampler.h>
    #include <iostream>

    /// Create the sampler
    auto counters = perf::CounterDefinition{};
    auto sampler = perf::Sampler{ counters };

    /// Specify when a sample is recorded: every 4000th cycle
    sampler.trigger("cycles", perf::Period{4000U});

    /// Specify what data is included into a sample: time, CPU ID, instruction pointer
    sampler.values()
        .timestamp(true)
        .cpu_id(true)
        .instruction_pointer(true);

    /// Run the workload (code_to_profile() stands for the code under test)
    sampler.start();
    code_to_profile(); /// <-- Samples are recorded during execution
    sampler.stop();

    /// Print the samples to the console
    const auto samples = sampler.result();
    for (const auto& record : samples) {
        const auto timestamp = record.metadata().timestamp().value();
        const auto cpu_id = record.metadata().cpu_id().value();
        const auto instruction = record.instruction_execution().logical_instruction_pointer().value();

        std::cout
            << "Time = " << timestamp << " | CPU = " << cpu_id
            << " | Instruction = 0x" << std::hex << instruction << std::dec
            << std::endl;
    }

Possible output:

    Time = 365449130714033 | CPU = 8 | Instruction = 0x5a6e84b2075c
    Time = 365449130913157 | CPU = 8 | Instruction = 0x64af7417c75c
    Time = 365449131112591 | CPU = 8 | Instruction = 0x5a6e84b2075c
    Time = 365449131312005 | CPU = 8 | Instruction = 0x64af7417c75c
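A typical next step is to aggregate the samples, for example by counting how often each instruction pointer was hit in order to spot hot code locations. The following is a minimal sketch that works on plain addresses already extracted from the records; `count_by_instruction_pointer` and `AddressCount` are hypothetical names and not part of perf-cpp.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

/// An instruction pointer together with the number of samples that hit it.
using AddressCount = std::pair<std::uint64_t, std::size_t>;

/// Hypothetical helper (not part of perf-cpp): counts the samples per
/// instruction pointer and returns the addresses ordered by descending count.
/// Ties are ordered by ascending address to keep the result deterministic.
std::vector<AddressCount> count_by_instruction_pointer(const std::vector<std::uint64_t>& instruction_pointers)
{
  auto counts = std::unordered_map<std::uint64_t, std::size_t>{};
  for (const auto address : instruction_pointers) {
    ++counts[address];
  }

  std::vector<AddressCount> sorted(counts.begin(), counts.end());
  std::sort(sorted.begin(), sorted.end(), [](const AddressCount& left, const AddressCount& right) {
    if (left.second != right.second) {
      return left.second > right.second;
    }
    return left.first < right.first;
  });

  return sorted;
}
```

Applied to the four samples shown above, the result holds two entries, one per distinct address, each with a count of two.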

Note

For additional details, such as the types of data that can be included in samples, please consult the Sampling Guide. For instructions on parallel sampling, see the Sampling on Multiple CPUs/Threads Guide.

Advanced Examples

We include a collection of examples that demonstrate the advanced capabilities of perf-cpp, such as counting events in parallel settings and sampling memory accesses.

Tip

All code examples are available in the examples/ folder.

Building

perf-cpp is designed as a library (static or shared) that can be linked to your application.

    # Clone the repository
    git clone https://github.com/jmuehlig/perf-cpp.git

    # Switch to the repository folder
    cd perf-cpp

    # Optional: Switch to the latest stable version
    git checkout v0.11.1

    # Build the library (in build/)
    # Note: -DBUILD_EXAMPLES=1 can be used to compile examples
    # Note: -DBUILD_LIB_SHARED=1 can be used to build the library as a shared one
    cmake . -B build -DBUILD_EXAMPLES=1
    cmake --build build

    # Optional: Build examples (in build/examples/bin) if -DBUILD_EXAMPLES=1
    cmake --build build --target examples

Note

Further information and detailed building instructions (e.g., how to integrate into CMake projects) are available in the Building Guide.

Documentation

  • Building: Integrate perf-cpp seamlessly into your C++ projects.
  • Counting Performance Events
    • Basics: Master recording hardware event statistics directly within your application.
    • Parallel and Multithreaded: Learn how to monitor events across threads and CPU cores.
    • Metrics: Learn how to combine hardware events into meaningful metrics for clearer performance insights.
    • Live Access: See how events can be accessed without stopping the recording, ideal for profiling tight loops and small functions.
  • Recording Samples
    • Basics: Understand sampling mechanisms, which data to record, and how to access the results.
    • Parallel and Multithreaded: Learn how to record samples in multithreaded workloads.
    • Analyzing Memory Access Patterns: See how to link memory sampling data to specific data objects to profile detailed memory access characteristics.
  • Built-in and Hardware-specific Events: Discover built-in events and learn how to define new ones tailored to your hardware.
  • Perf Paranoid: Learn how to configure perf permissions.

Further Reading

  • Examples: Learn how to set up different features from code examples.
  • Changelog: Stay updated with the latest changes and improvements.

System Requirements

  • Requires support for C++17 features.
  • CMake Version 3.10 or higher.
  • Linux kernel 4.0 or newer (some features require a more recent kernel).
  • perf_event_paranoid setting: Adjust as needed to allow access to performance counters (see the Paranoid Value documentation).
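To check the current perf_event_paranoid level from within an application, the value can be read from procfs. The following is a minimal sketch; `read_perf_event_paranoid` is a hypothetical helper and not part of perf-cpp, and the path is the usual Linux location of the setting. Lower values are more permissive; which level a given feature needs is covered by the Paranoid Value documentation.

```cpp
#include <fstream>
#include <optional>
#include <string>

/// Hypothetical helper (not part of perf-cpp): reads the perf_event_paranoid
/// level from the given file. Returns std::nullopt if the file cannot be
/// opened or does not start with an integer.
std::optional<int> read_perf_event_paranoid(const std::string& path = "/proc/sys/kernel/perf_event_paranoid")
{
  auto file = std::ifstream{path};
  int level = 0;
  if (file >> level) {
    return level;
  }
  return std::nullopt;
}
```

An application could call this before starting a measurement and print a hint when the returned level is higher than the one it needs.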

Contribute and Contact

We welcome contributions and feedback. For feature requests, feedback, or bug reports, please reach out via our issue tracker or submit a pull request.

Alternatively, you can email me: jan.muehlig@tu-dortmund.de.


Further PMU-related Projects

Below is a non-exhaustive list of some other valuable profiling projects:

  • PAPI offers access not only to CPU performance counters but also to a variety of other hardware components including GPUs, I/O systems, and more.
  • Likwid is a collection of command-line tools for benchmarking and comes with an extensive wiki.
  • PerfEvent provides lightweight access to performance counters, facilitating streamlined performance monitoring.
  • Intel's Instrumentation and Tracing Technology allows applications to manage the collection of trace data effectively when used in conjunction with Intel VTune Profiler.
  • For those who prefer a more hands-on approach, the perf_event_open system call can be utilized directly without any wrappers.

Resources about (Perf-) Profiling

This is a non-exhaustive list of academic research papers and blog articles. Feel free to add to it, including your own work, e.g., via pull request.

Academic Papers

Blog Posts