code-analysis-tools

Guidelines and notes about useful tools to analyze and optimize code.

The idea is to gather the tools we use to develop high-performance code, with information about how to install and use each of them. Let's begin by collecting all instructions here and later move them to subfolders for each tool if the README gets confusing.

Compiling

Use ccache to speed up the compilation. If you compile the same project again and again with small changes, this can save you a lot of time. It is easy to set up: install via conda or apt-get, and see the manual for the different run modes. For CMake, you can use a flag:

cmake .. -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX -DCMAKE_CXX_COMPILER_LAUNCHER=ccache 

Use Clang & Ninja to speed up the compilation.

Use mold to speed up the linking. For CMake, you can add a link option in CMakeLists.txt:

add_link_options("-fuse-ld=mold") 

Tip: To compile a current dev branch like coal in a clean conda environment containing just the necessary dependencies, install the compilers and use conda's cmake to ensure compatibility and avoid local dependency conflicts:

conda create --name ENV_NAME python=3.12
conda activate ENV_NAME
conda install -c conda-forge cmake compilers
conda install coal --only-deps
mkdir build && cd build
cmake .. -DALL_YOUR_FLAGS
make -j8

Debugging C++ code

If you execute compiled experimental code, or run a Python script backed by the corresponding C++ bindings, you may run into segmentation faults. Specific tools can help you fix them.

FIRST: compile in Debug mode

Use either gdb (Linux) or lldb (macOS). The commands specified below work for both.

Usage C++

# Start a debugging session
gdb <your-executable>
# Set breakpoints in two different ways: either on a specific line, or on a function
b file.cpp:40
breakpoint set -n functionName
# Start the program
run

See here for further details.
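
If you want a minimal crash to practice this workflow on, the following hypothetical program (not part of this repo) segfaults on a null-pointer dereference; build it with debug symbols and run it under the debugger:

// segv_demo.cpp (hypothetical example)
// Compile with: g++ -g -O0 segv_demo.cpp -o segv_demo
#include <iostream>

int main() {
  int* p = nullptr;
  std::cout << *p << '\n';  // segmentation fault: dereferencing a null pointer
  return 0;
}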

Core dump file analysis

Core dumps are another way to debug. First, enable core files:

ulimit -c unlimited

Then, view the backtrace in gdb

gdb path/to/my/executable path/to/coredumpfile

See here for further details.

Usage Backward

Backward is a beautiful stack-trace pretty printer for C++. You need to compile your project with generation of debug symbols enabled, usually -g with clang++ and g++. Add the following code to the source file:

#include <backward.hpp>

namespace backward {
backward::SignalHandling sh;
}
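
Putting it together, a minimal self-contained sketch could look as follows (a hypothetical file; the exact compile and link flags depend on how you installed Backward and which unwinding library it uses):

// backward_demo.cpp (hypothetical example) -- compile with debug symbols, e.g. g++ -g backward_demo.cpp
#include <backward.hpp>

namespace backward {
backward::SignalHandling sh;  // installs signal handlers that print a pretty stack trace on crash
}

int main() {
  int* p = nullptr;
  return *p;  // crashes here; Backward prints the annotated backtrace
}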

Usage Python bindings

gdb python
# set breakpoints etc.
run your_script.py

Once a program crashes, use bt to show the full backtrace.

Debugging Python code

Quick-and-dirty one-liner

You can spawn a Python interpreter in-context anywhere in your code:

__import__("IPython").embed()

You may add breakpoints in your Python code using:

breakpoint()

For a better debugger than the basic pdb one, you may install pdb++ using one of the following commands:

pip install pdbpp                   # In a pip environment
conda install -c conda-forge pdbpp  # In a Conda environment

With pdb++, add breakpoints again with breakpoint(). You may run sticky in the pdb++ environment to toggle a sticky mode (with colored code of the whole function) and start a Python interpreter with the interact command.

For debugging code with a graphical interface, check pudb (similar usage).

Performance analysis with FlameGraph

Check how much time is spent in every function; this can help you find the bottleneck in your code. FlameGraph is a nice visual tool to display your stack trace.

Install 1

Use the Rust-powered flamegraph (fast):

# Install tools needed for analysis, like perf
sudo apt-get install linux-tools-common linux-tools-generic linux-tools-`uname -r`
# Install Rust on Linux or macOS
curl --proto '=https' --tlsv1.3 https://sh.rustup.rs -sSf | sh
# Now you have the Rust package manager cargo and you can do
cargo install flamegraph
# Necessary to allow access to the CPU (only once)
echo -1 | sudo tee /proc/sys/kernel/perf_event_paranoid
echo 0 | sudo tee /proc/sys/kernel/kptr_restrict
# Command to produce your flamegraph (use -c for custom options for perf)
flamegraph -o sparse_wrapper.svg -v -- example-cpp

Install 2

Use the classic Perl FlameGraph:

# Install tools needed for analysis, like perf
sudo apt-get install linux-tools-common linux-tools-generic linux-tools-`uname -r`
# Clone the repo
git clone https://github.com/brendangregg/FlameGraph.git
# Necessary to allow access to the CPU (only once)
echo -1 | sudo tee /proc/sys/kernel/perf_event_paranoid
echo 0 | sudo tee /proc/sys/kernel/kptr_restrict
# Run perf on your executable and create the output report in the current dir
perf record --call-graph dwarf example-cpp
perf script > perf.out
# You can read the report with: cat perf.out
cd <cloned-flamegraph-repo>
./stackcollapse-perf.pl <location-of-perf.out> > out.folded
./flamegraph.pl out.folded > file.svg
# Now open file.svg in your favorite browser and enjoy the interactive mode

As the process with the default FlameGraph repo is quite a pain, you can write your own script, like @ManifoldFR did in proxDDP.

Performance analysis with Tracy Profiler

Tracy is a real-time, nanosecond-resolution, remote-telemetry, hybrid frame and sampling profiler for games and other applications that comes with nice documentation. You can check out the interactive demo.

To use it in your project, you can follow the same steps as in pinocchio or simple:

  1. Install it via: conda install tracy-profiler tracy-profiler-gui -c conda-forge
  2. Include tracy.cmake from jrl-cmakemodules in your main CMakeLists file
  3. Add tracy as a project dependency, e.g.
if(PROJECT_NAME_BUILD_WITH_TRACY) # assume it is installed somewhere
  add_project_dependency(Tracy REQUIRED)
endif(PROJECT_NAME_BUILD_WITH_TRACY)
  4. In your code: #include "project_name/tracy.hpp" and use PROJECT_NAME_TRACY_ZONE_SCOPED_N("NAME_OF_ZONE"), where project_name should be replaced by your actual project name. See here for all macros, and the sketch just below.
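
A minimal sketch of what step 4 looks like in a source file, assuming a hypothetical project named myproj (so the macro prefix becomes MYPROJ_; adapt to your project's actual macros):

// some_file.cpp in the hypothetical project "myproj"
#include "myproj/tracy.hpp"

void expensive_step() {
  MYPROJ_TRACY_ZONE_SCOPED_N("expensive_step");  // this zone will show up in the Tracy GUI
  // ... the work you want to profile ...
}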

After specifying the code segments you wish to monitor with Tracy, execute tracy-profiler from the command line (ensure your conda environment is active). Run your benchmark files and review the various statistics in the GUI. Note that if you are benchmarking on a remote server, you can connect to it using ssh user@remote -X to display the Tracy GUI on your screen.

Finding memory leaks

Valgrind can automatically detect many memory management and threading bugs.

sudo apt install valgrind
# Use valgrind with input to check for memory leaks
valgrind --leak-check=yes myprog arg1 arg2

Check here and doc for further explanation.

leaks is an alternative tool available on macOS to detect memory leaks:

$ leaks -atExit -- myprog
Date/Time:             2024-07-17 17:46:43.948 +0200
Launch Time:           2024-07-17 17:46:42.781 +0200
OS Version:            macOS 14.5 (23F79)
Report Version:        7
Analysis Tool:         Xcode.app/Contents/Developer/usr/bin/leaks
Analysis Tool Version: Xcode 15.4 (15F31d)

Physical footprint:         4646K
Physical footprint (peak):  4646K
Idle exit:                  untracked
----

leaks Report Version: 4.0, multi-line stacks
Process 56686: 507 nodes malloced for 47 KB
Process 56686: 0 leaks for 0 total leaked bytes.

Check out here for more information.

AddressSanitizer

AddressSanitizer (aka ASan) is a memory error detector for C/C++. For CMake, you can add the flags in CMakeLists.txt:

set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -fsanitize=address")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fsanitize=address")
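
As an illustration, here is a tiny hypothetical program with a heap buffer overflow that ASan reports at runtime (built e.g. with clang++ -g -fsanitize=address asan_demo.cpp):

// asan_demo.cpp (hypothetical example)
#include <cstdlib>

int main() {
  int* data = static_cast<int*>(std::malloc(4 * sizeof(int)));
  int v = data[4];  // heap-buffer-overflow: reads one element past the end
  std::free(data);
  return v;
}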

Check Eigen malloc

Use Eigen's tools to make sure you are not allocating memory where you do not want to; such allocations slow down your program. Check proxqp or here.

The macros defined in ProxQP allow us to do

PROXSUITE_EIGEN_MALLOC_NOT_ALLOWED();
output = superfast_function_without_allocations();
PROXSUITE_EIGEN_MALLOC_ALLOWED();

and if this code is compiled in Debug mode, we will get assertion errors if Eigen allocates memory inside the function.
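
These macros build on Eigen's runtime malloc check. A minimal self-contained sketch of the same idea, assuming EIGEN_RUNTIME_NO_MALLOC is defined before including any Eigen header (the MY_* macro names here are illustrative, not ProxQP's exact definitions):

// Must be defined before including any Eigen header
#define EIGEN_RUNTIME_NO_MALLOC
#include <Eigen/Core>

// Illustrative macros: in Debug mode, Eigen asserts if it allocates while forbidden
#ifndef NDEBUG
  #define MY_EIGEN_MALLOC_NOT_ALLOWED() Eigen::internal::set_is_malloc_allowed(false)
  #define MY_EIGEN_MALLOC_ALLOWED() Eigen::internal::set_is_malloc_allowed(true)
#else
  #define MY_EIGEN_MALLOC_NOT_ALLOWED()
  #define MY_EIGEN_MALLOC_ALLOWED()
#endif

int main() {
  Eigen::VectorXd a = Eigen::VectorXd::Ones(10);
  Eigen::VectorXd b(10);  // allocated up front
  MY_EIGEN_MALLOC_NOT_ALLOWED();
  b.noalias() = 2.0 * a;  // fine: no temporary, no allocation
  MY_EIGEN_MALLOC_ALLOWED();
}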

Checking for memory allocations

A GUI to check how much memory is allocated in every function when executing a program: Valgrind + KCachegrind.

Install

sudo apt-get install valgrind kcachegrind graphviz

Usage

valgrind --tool=massif --xtree-memory=full <your-executable>
kcachegrind <output-file-of-previous-cmd>

Performance analysis in Python

Python provides a profiler named cProfile. To profile a script, simply add -m cProfile -o profile.prof when running the script, i.e.:

python -m cProfile -o profile.prof my_script.py --my_args

This saves the result in the specified output path (here, profile.prof), which you can then visualize with snakeviz: pip install snakeviz and then:

snakeviz profile.prof

This opens a browser tab with an interactive visualization of execution times.

External resources

Some very useful advice on optimizing your C++ code that you should keep in mind.

Narrator: "also quite amusing to read..."

A nice online course to get started with C++, explaining most of the basic concepts of C++14/17.

Debug GitHub CI

If you have a pipeline that is failing and you would like to check some quick fixes directly on the CI machine, debug-via-ssh is precious: sign into your account on ngrok (you can use GitHub) and follow the README to set it up locally (2 mins). Copy the token you obtained from ngrok into the secrets section of your repo, and also specify a password for the SSH connection as a secret of the repo. Copy this to your workflow at the position where you would like to stop:

- name: Start SSH session
  uses: luchihoratiu/debug-via-ssh@main
  with:
    NGROK_AUTH_TOKEN: ${{ secrets.NGROK_AUTH_TOKEN }}
    SSH_PASS: ${{ secrets.SSH_PASS }}

Run the CI and follow the output. Note: the option continue-on-error: true can be very useful to continue a failing workflow until the point where you ssh into it.

lhotari/action-upterm and action-tmate are two alternative GitHub actions with similar functionality that can be run directly, without any setup, in your CI jobs:

- name: Start SSH session
  uses: lhotari/action-upterm@v1

Consider using the action with limit-access-to-actor: true to limit access to your account.

Profile C++ compile time of a translation unit

When doing heavy template meta-programming in C++ it can be useful to analyze what part of the code is taking a long time to compile.

clang allows you to profile the compile time of a translation unit. To activate this feature, add the -ftime-trace option while building.

In a CMake project, you can do this with the following command line:

cmake .. -DCMAKE_CXX_FLAGS="-ftime-trace"

Each .cpp will then produce a .json file. To find them you can use the following command line:

find . -iname "*.json"

Then, you can open the file with the Chromium tracing tool. Open the about:tracing URL in Chromium and load the .json. You will have the following display:

Compile time translation unit profile displayed with Chromium

Note for GNU/Linux users

-ftime-trace is only available with clang. With conda, you can install it with the following command line: conda install clangxx.

Then, when running CMake for the first time use the following command:

CC=clang CXX=clang++ cmake ..

Profile C++ compile time of a CMake target

When building a CMake target, it is interesting to profile which translation units take most of the compile time.

Ninja and ninjatracing let you visualize the build time of the whole target.

First clone ninjatracing somewhere.

Then configure your CMake build to use Ninja. Run the following command in a NEW build directory:

cmake .. -GNinja

Now, build your project with Ninja:

ninja

The .ninja_log file should have been created in the build directory. You can convert it to a file compatible with the Chromium tracing tool.

Run the following command:

path/to/ninjatracing .ninja_log > trace.json

Now, you can visualize it with the Chromium tracing tool. Open the about:tracing URL in Chromium and load the .json. You will have the following display:

Compile time CMake target profile displayed with Chromium

Launch a process on a particular CPU/CORE/PU

The hwloc tools allow you to study the topology of a computer or a computer cluster.

To install hwloc on Ubuntu, run the following apt command:

sudo apt install hwloc

You can show your computer architecture topology with:

lstopo

Also, you can run a process on a particular CPU/CORE/PU with the following command:

# Run my_exe on processing unit 0
hwloc-bind pu:0 -- my_exe
# Run my_exe on core 2
hwloc-bind core:2 -- my_exe
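
You can also bind from inside the program rather than with hwloc-bind. A minimal sketch using hwloc's C API (link with -lhwloc):

// bind_demo.cpp (hypothetical example) -- compile with: g++ bind_demo.cpp -lhwloc
#include <hwloc.h>

int main() {
  hwloc_topology_t topo;
  hwloc_topology_init(&topo);
  hwloc_topology_load(topo);
  // Bind the current process to the first core reported by the topology
  hwloc_obj_t core = hwloc_get_obj_by_type(topo, HWLOC_OBJ_CORE, 0);
  if (core)
    hwloc_set_cpubind(topo, core->cpuset, HWLOC_CPUBIND_PROCESS);
  // ... run the work you want pinned ...
  hwloc_topology_destroy(topo);
  return 0;
}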

You can then watch where processes are running with:

lstopo --ps

To go deeper, a tutorial is available.

Get stack size

The stack of a program is limited in the memory space available to it. One way to get this limit is to run, in your favorite terminal:

ulimit -s

More information on the current configuration can be obtained via:

ulimit -a
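
To query the same limit from inside a C++ program, here is a minimal sketch using the POSIX getrlimit call:

// stack_limit.cpp (hypothetical example)
#include <sys/resource.h>
#include <cstdio>

int main() {
  struct rlimit rl;
  // RLIMIT_STACK: maximum size of the process stack, in bytes
  if (getrlimit(RLIMIT_STACK, &rl) == 0)
    std::printf("stack soft limit: %llu bytes\n",
                static_cast<unsigned long long>(rl.rlim_cur));
  return 0;
}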

Counting the number of lines of code

It might be useful at some point to count the number of lines of code in a given project. Cloc is an open-source tool that counts blank lines, comment lines, and physical lines of source code in many programming languages.

In Pinocchio, counting the important lines of code can be done using:

cloc unittest src include examples bindings 

which gives:

3149 text files.
2299 unique files.
 970 files ignored.

github.com/AlDanial/cloc v 2.04  T=0.90 s (2543.7 files/s, 395546.2 lines/s)
--------------------------------------------------------------------------------
Language                      files          blank        comment           code
--------------------------------------------------------------------------------
C/C++ Header                    672          20927          20408         107549
C++                             423          15686           4418          61598
XML                             154            101            120          53449
CMake                           325           2595           4955          18197
Python                          214           4459           2277          15361
make                             13           2073           1463           4100
INI                             100            657              0           4047
Markdown                         71            976             40           2509
Text                             89            231              0           1444
CSS                               4            138             57            549
YAML                             10             48             10            519
Bourne Shell                      4             83            167            444
SVG                               1              1              1            382
Jupyter Notebook                  1              0            655            214
JavaScript                      196            196           3332            196
TeX                               5              0              0            167
reStructuredText                  8            102             83            147
HTML                              4              1             25             92
Objective-C                       1             11             16             92
JSON                              1              0              0             46
Bourne Again Shell                2             14             12             45
awk                               1              0              0             10
--------------------------------------------------------------------------------
SUM:                           2299          48299          38039         271157
--------------------------------------------------------------------------------
