Discussed in #1433
Originally posted by zamazan4ik November 21, 2023
Hi!
Recently I evaluated Profile-Guided Optimization (PGO) improvements on multiple projects. The results are available here. According to the tests, PGO can help achieve better performance. I also found interesting results about PGO's effects on tsv-utils, a project in a similar domain to qsv. Given all this, I think trying to optimize qsv with PGO is a good idea.
I already did some benchmarks and want to share my results.
Test environment
- MacBook Pro 14 M1 (6 + 2 CPU cores, 16 GiB RAM)
- Compiler - Rustc 1.74
- qsv version: master branch on commit 531acbb072c48cbaca5d58b593243e0f5f0ec8d3
Right now I cannot run the tests on my Linux machine (Fedora-based) due to build errors (#1431), but I expect the results to hold on Linux as well.
Benchmark
For benchmarking, I use this QSV benchmark. For PGO, I use the cargo-pgo tool. The same benchmark suite was used for the PGO training phase, with the instrumented binary built with cargo pgo build -- --release --locked -F feature_capable,apply,geocode,luau,to,polars --bin qsv but with LTO disabled. The only change to the benchmark suite was reducing the number of runs, since for the training phase it is enough to run every test case once.
I got the PGO-optimized results with QSV built with cargo pgo optimize build -- --release --locked -F feature_capable,apply,geocode,luau,to,polars --bin qsv but with LTO disabled. The Release version is built with cargo build --release --locked -F feature_capable,apply,geocode,luau,to,polars --bin qsv.
Unfortunately, due to a bug in the Rustc compiler, PGO currently cannot be enabled together with LTO for QSV. So I compare "QSV with LTO" vs "QSV with PGO". Later, once the bug is fixed, we can apply LTO + PGO to QSV at the same time.
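For anyone who wants to reproduce this, the three steps above (instrumented build, training run, optimized build) can be sketched as a small shell script. This is only a sketch under some assumptions that are not part of my original setup description: the DRY_RUN switch and the TRAIN_CMD placeholder are hypothetical helpers, and LTO is turned off here through Cargo's CARGO_PROFILE_RELEASE_LTO environment override, which is one possible way to do it. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch of the cargo-pgo workflow: instrument, train, optimize.
# Assumptions (not from the original post): DRY_RUN, TRAIN_CMD, and disabling
# LTO via the CARGO_PROFILE_RELEASE_LTO environment override.
# Note: cargo-pgo needs llvm-profdata, e.g. from
# `rustup component add llvm-tools-preview`.
set -eu

FEATURES="feature_capable,apply,geocode,luau,to,polars"
DRY_RUN="${DRY_RUN:-1}"                             # 1 = print commands only
TRAIN_CMD="${TRAIN_CMD:-./run-benchmarks-once.sh}"  # hypothetical training workload

# Print a command and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

pgo_workflow() {
    # 1. Instrumented build: the resulting binary records profiles when run.
    run env CARGO_PROFILE_RELEASE_LTO=off \
        cargo pgo build -- --release --locked -F "$FEATURES" --bin qsv

    # 2. Training phase: run the benchmark suite once with the instrumented binary.
    #    TRAIN_CMD is intentionally unquoted so it can carry arguments.
    run $TRAIN_CMD

    # 3. Optimized build: recompile using the collected profiles.
    run env CARGO_PROFILE_RELEASE_LTO=off \
        cargo pgo optimize build -- --release --locked -F "$FEATURES" --bin qsv
}

pgo_workflow
```

To actually run the builds, set DRY_RUN=0 and point TRAIN_CMD at a script that runs each benchmark case once against the instrumented binary.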
Results
I got the following results:
- QSV Release + LTO results: https://gist.github.com/zamazan4ik/f166967f12ad27a2bf4253975c2e1907
- QSV Release + PGO optimized results: https://gist.github.com/zamazan4ik/25611016ee295e13b29da4c07adb681b
- (just for reference) QSV Release + PGO instrumentation: https://gist.github.com/zamazan4ik/a0d217d3e25ff23670eae20003ddd40c
As I interpret the results, PGO measurably improves QSV performance in many cases.
Further steps
I can suggest the following action points:
- Perform more PGO benchmarks on QSV. If they show improvements, add a note to the documentation about possible QSV performance gains from PGO.
- Provide an easier way (e.g. a build option) to build QSV with PGO. This can be helpful for end users and maintainers, since they will be able to optimize QSV for their own workloads.
- Optimize pre-built QSV binaries
Testing Post-Link Optimization techniques (like LLVM BOLT) would be interesting too (Clang and Rustc already use BOLT in addition to PGO), but I recommend starting with regular PGO.
Here are some examples of how PGO optimization is integrated in other projects:
- Rustc: a CI script for the multi-stage build
- GCC:
- Clang: Docs
- Python:
- Go: Bash script
- V8: Bazel flag
- ChakraCore: Scripts
- Chromium: Script
- Firefox: Docs
- Thunderbird has PGO support too
- PHP: Makefile command and old Centminmod scripts
- MySQL: CMake script
- YugabyteDB: GitHub commit
- FoundationDB: Script
- Zstd: Makefile
- Foot: Scripts
- Windows Terminal: GitHub PR
- Pydantic-core: GitHub PR
- file.d: GitHub PR
- OceanBase: CMake flag