This is a sort of intermission.
Getting perf to work up to a point
Apparently the OpenSBI-mediated access to the performance counters is not mapped in a way that lets the usual cycles and instructions events work in perf record. I got this board mainly to help with dav1d development efforts, so not having perf support would make it harder to reason about performance.
The best workaround, after a discussion in the forums, is to build perf with pmu-events enabled so that it includes the custom events, and then rely on the overly precise CPU-specific events instead:
$ perf list | grep cycle
bus-cycles [Hardware event]
cpu-cycles OR cycles [Hardware event]
ref-cycles [Hardware event]
stalled-cycles-backend OR idle-cycles-backend [Hardware event]
stalled-cycles-frontend OR idle-cycles-frontend [Hardware event]
m_mode_cycle [M-mode cycles]
rtu_flush_cycle
s_mode_cycle [S-mode cycles]
stalled_cycle_backend [Stalled cycles backend]
stalled_cycle_frontend [Stalled cycles frontend]
u_mode_cycle [U-mode cycles]
vidu_total_cycle
vidu_vec0_cycle
vidu_vec1_cycle
...
$ perf list | grep inst
branch-instructions OR branches [Hardware event]
instructions [Hardware event]
br_inst [Branch instructions]
cond_br_inst [Conditional branch instructions]
indirect_br_inst [Indirect branch instructions]
taken_cond_br_inst [Taken conditional branch instructions]
uncond_br_inst [Unconditional branch instructions]
instruction:
alu_inst [ALU (integer) instructions]
amo_inst [AMO instructions]
atomic_inst [Atomic instructions]
bus_fence_inst [Bus FENCE instructions]
csr_inst [CSR instructions]
div_inst [Division instructions]
ecall_inst [ECALL instructions]
failed_sc_inst [Failed SC instructions]
fence_inst [FENCE instructions]
fp_div_inst [Floating-point division instructions]
fp_inst [Floating-point instructions]
fp_load_inst [Floating-point load instructions]
fp_store_inst [Floating-point store instructions]
load_inst [Load instructions]
lr_inst [LR instructions]
mult_inst [Multiplication instructions]
sc_inst [SC instructions]
store_inst [Store instructions]
unaligned_load_inst [Unaligned load instructions]
unaligned_store_inst [Unaligned store instructions]
vector_div_inst [Vector division instructions]
vector_inst [Vector instructions]
vector_load_inst [Vector load instructions]
vector_store_inst [Vector store instructions]
id_inst_pipedown [ID instruction pipedowns]
id_one_inst_pipedown [ID one instruction pipedowns]
issued_inst [Issued instructions]
rf_inst_pipedown [RF instruction pipedowns]
rf_one_inst_pipedown [RF one instruction pipedowns]
Building perf
Perf's way of dealing with CPU-specific events is through some machinery called jevents. It lives in tools/perf/pmu-events and you can trigger it manually with:
./jevents.py riscv arch pmu-events.c
It produces C code from a bunch of JSON files and a CSV map file.
The first time I built the sources I tried to cut corners by setting most of the NO_{} make variables, and left NO_JEVENTS=1 among them. Luckily I fixed it after noticing that my output differed from what was shown in the forum.
## I assume you have here the custom linux sources
cd /usr/src/pi-linux/tools/perf
## being lazy I disabled about everything instead of installing dependencies, one time I disabled too much.
make -j 8 V=1 VF=1 \
  HOSTCC=riscv64-unknown-linux-gnu-gcc HOSTLD=riscv64-unknown-linux-gnu-ld \
  CC=riscv64-unknown-linux-gnu-gcc CXX=riscv64-unknown-linux-gnu-g++ \
  AR=riscv64-unknown-linux-gnu-ar LD=riscv64-unknown-linux-gnu-ld \
  NM=riscv64-unknown-linux-gnu-nm PKG_CONFIG=riscv64-unknown-linux-gnu-pkg-config \
  prefix=/usr bindir_relative=bin tipdir=share/doc/perf-6.8 \
  'EXTRA_CFLAGS=-O2 -pipe' 'EXTRA_LDFLAGS=-Wl,-O1 -Wl,--as-needed' \
  ARCH=riscv BUILD_BPF_SKEL= BUILD_NONDISTRO=1 JDIR= CORESIGHT= \
  GTK2= feature-gtk2-infobar= \
  NO_AUXTRACE= NO_BACKTRACE= NO_DEMANGLE= NO_JEVENTS=0 NO_JVMTI=1 \
  NO_LIBAUDIT=1 NO_LIBBABELTRACE=1 NO_LIBBIONIC=1 NO_LIBBPF=1 NO_LIBCAP=1 \
  NO_LIBCRYPTO= NO_LIBDW_DWARF_UNWIND= NO_LIBELF= NO_LIBNUMA=1 NO_LIBPERL=1 \
  NO_LIBPFM4=1 NO_LIBPYTHON=1 NO_LIBTRACEEVENT= NO_LIBUNWIND=1 NO_LIBZSTD=1 \
  NO_SDT=1 NO_SLANG=1 NO_LZMA=1 NO_ZLIB= TCMALLOC= WERROR=0 \
  LIBDIR=/usr/libexec/perf-core libdir=/usr/lib64 plugindir=/usr/lib64/perf/plugins \
  -f Makefile.perf install
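A quick sanity check after installing can catch the NO_JEVENTS=1 mistake early. This is only a sketch, assuming that a perf built without jevents simply does not list the vendor events; has_event is a hypothetical helper name, not part of perf, and it only greps whatever is piped into it.

```shell
#!/bin/sh
# has_event NAME: read `perf list` output on stdin and succeed if NAME
# appears as a whole word. Hypothetical helper, not part of perf.
has_event() {
    grep -qw -- "$1"
}

# Usage on the board (commented out so the snippet can be sourced anywhere):
# perf list | has_event u_mode_cycle || echo "vendor events missing, check NO_JEVENTS"
```

The whole-word match keeps a lookup for u_mode_cycle from being satisfied by some longer event name that merely contains it.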
Now I have a perf where cycles and instructions still do not work with perf record. I wonder if there is a way at the OpenSBI or kernel level to aggregate events to make it work properly, but I have never had to look into perf internals, so I will probably poke at it much later if nobody addresses it first. Anyway,
perf record --group -e u_mode_cycle,m_mode_cycle,s_mode_cycle
produces something close enough for cycles; in practice u_mode_cycle alone is enough.
For instructions the situation is a bit more annoying:
perf record --group -e alu_inst,amo_inst,atomic_inst,fp_div_inst,fp_inst,fp_load_inst,fp_store_inst,load_inst,lr_inst,mult_inst,sc_inst,store_inst,unaligned_load_inst,unaligned_store_inst
comes close to counting all the scalar instructions, but trying to add vector_div_inst,vector_inst,vector_load_inst,vector_store_inst somehow makes perf record silently stop collecting samples. Adding just 3 more events works though, so I guess I can be happy with u_mode_cycle,alu_inst,atomic_inst,fp_inst,vector_inst at least.
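Since overly large groups seem to stop producing samples, one possible way around it is to split a long event list into smaller groups and run perf record once per group. What follows is a minimal sketch, assuming bash. chunk_events, the group size of 5, and ./workload are made up for illustration, and the size the hardware actually tolerates is not something this post establishes. Samples from separate runs are not collected together, so they are only roughly comparable.

```shell
#!/usr/bin/env bash
# chunk_events MAX EVENT...: print the events as comma-separated groups,
# one group per line, with at most MAX events in each group.
# Hypothetical helper, not part of perf.
chunk_events() {
    local max=$1 line="" count=0 ev
    shift
    for ev in "$@"; do
        # Current group is full: emit it and start a new one.
        if [ "$count" -eq "$max" ]; then
            printf '%s\n' "$line"
            line=""
            count=0
        fi
        line="${line:+$line,}$ev"
        count=$((count + 1))
    done
    # Print the last, possibly shorter, group.
    if [ -n "$line" ]; then
        printf '%s\n' "$line"
    fi
}

# Example: one perf record run per group (commented out, needs the board;
# ./workload is a placeholder for the program being profiled).
# chunk_events 5 u_mode_cycle alu_inst atomic_inst fp_inst vector_inst load_inst store_inst |
# while read -r group; do
#     perf record --group -e "$group" -o "perf.${group%%,*}.data" -- ./workload
# done
```

Each run writes its own data file, named after the first event of the group, so the reports can be inspected one at a time.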