Skip to content

Conversation

@mhucka
Copy link
Collaborator

@mhucka mhucka commented Jun 29, 2025

Changes:

  • Previously, the CMake configuration was such that it would always build the AVX2, AVX512 & SSE2 Pybind11 modules without testing whether the current system supported those options. The changes here use CMake features to detect whether the architectural features are in fact available on the host computer, and only attempt to build the appropriate modules. The code for testing this is in a new CheckCPU.cmake file.

  • Previously, each of 8 or so pybind_interface/ subdirectories had CMakeLists.txt files that contained the same text for setting certain compilation flags. This PR removes the duplication in favor of putting the settings into the top-level CMakeLists.txt file.

  • On MacOS, this adds more include and link directories (specifically those commonly used with Homebrew) for finding the LLVM and the OMP include files and libraries.

  • Finally, this PR includes a bit of refactoring:

    • The introduction of a new directory, dev_tools/cmake/, for placing .cmake files included by other files. It moves pybind_interface/GetPybind11.cmake into this new location alongside the new CheckCPU.cmake file.
    • The reorganization of much of the top-level CMakeLists.txt file, to (hopefully) organize some of the code more logically.
    • The addition of more message() statements to help understand what's happening.
mhucka added 7 commits June 29, 2025 22:52
Previously, the CMake configuration was such that it would always build the AVX2, AVX512 & SSE2 Pybind11 modules without testing whether the current system supported those options. The changes here use CMake features to detect whether the architectural features are in fact available, and only attempt to build the appropriate modules. In addition, there is a minor bit of refactoring to group some of the code more logically, and to add some more informational message printouts.
Each of 8 or so `pybind_interface/` subdirectories had `CMakeLists.txt` files that contained the same text for setting certain compilation flags. This commit removes the duplication in favor of putting the settings into the top-level CMake file.
On Ubuntu, one sees warnings like this: ``` lto-wrapper: warning: using serial compilation of 13 LTRANS jobs lto-wrapper: note: see the ‘-flto’ option documentation for more information lto-wrapper: warning: using serial compilation of 15 LTRANS jobs lto-wrapper: note: see the ‘-flto’ option documentation for more information lto-wrapper: warning: using serial compilation of 16 LTRANS jobs lto-wrapper: note: see the ‘-flto’ option documentation for more information lto-wrapper: warning: using serial compilation of 16 LTRANS jobs lto-wrapper: note: see the ‘-flto’ option documentation for more information ``` This seems to be the default behavior if the option `-flto` is not given a value (c.f. https://stackoverflow.com/a/72222512/28972686). Giving it a value of "auto" makes the warning go away and lets the compilation toolchain decide how much parallism it can use.
@mhucka mhucka changed the title Improve hardware feature detection & conslidate duplicated settings in CMake files Improve hardware feature detection & consolidate duplicated settings in CMake files Jun 30, 2025
mhucka added 2 commits June 30, 2025 03:47
This can enable additional optimizations without requiring specific avx/sse/etc. instructions.
@mhucka mhucka force-pushed the mh-consolidate-cmake-configs branch from 1b753ba to 3941a3f Compare June 30, 2025 03:51
@github-actions github-actions bot added the size: L 250< lines changed <1000 label Jul 1, 2025
The common wisdom is wrong, apparently: `-march=native` does not work for AVX on MacOS.
@mhucka mhucka force-pushed the mh-consolidate-cmake-configs branch from 1756d01 to 20c9dd8 Compare July 1, 2025 03:37
@mhucka mhucka self-assigned this Jul 1, 2025
CMakeLists.txt Outdated
check_cxx_compiler_flag("/arch:AVX512" HAVE_AVX512)
else()
check_cxx_compiler_flag("-mavx2" HAVE_AVX2)
check_cxx_compiler_flag("-mavx512f" HAVE_AVX512)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really a good way to detect whether the architectural features are in fact available on the host computer? check_cxx_compiler_flag checks whether the C++ compiler supports a given flag. It seems the g++ compiler supports the -mavx512f flag even if the host CPU does not support AVX512 instructions.

mhucka added 5 commits July 1, 2025 21:24
Getting correct info at run-time about SSE support on Windows has been a bit difficult. To help get the best performance, if we know SSE is supposed to be supported, we can try to use multiple known architecture flags starting with those introduced most recently.
The previous code wrongly assumed that testing for compiler support would imply the underlying hardware supported the features. This is wrong (and rather obviously wrong -- I don't know what I was thinking before). The new code tries in various ways to get info about the architectural features of the host's CPU.
@mhucka mhucka changed the title Improve hardware feature detection & consolidate duplicated settings in CMake files Improve hardware feature detection in CMake files & do refactoring Jul 6, 2025
@mhucka mhucka changed the title Improve hardware feature detection in CMake files & do refactoring Improve hardware feature detection in CMake files and do further refactoring & enhancement Jul 6, 2025
We have at least two custom CMake files now, and may have more in the future. The files only need to be included from the top-level `CMakeLists.txt` file; they don't actually need to be included from the `CMakeLists.txt` files in the `pybind_interface/*` subdirectories because those subdirectories are added to the top-level `CMakeLists.txt` file. For better organization and maintaince, I think it makes sense to move `GetPybind11.cmake` and other *.cmake files into a dedicated subdirectory, and a location inside `dev_tools/` seems to make the most sense.
mhucka added 12 commits July 7, 2025 03:46
This updates the CMake code and rewrites the logic in several ways: * Test if pybind11 has already been found, and skip the work if so. * Use `find_package(pybind11)` to look for the Pybind11 package first, with a hint about where it might be located in the Python installation directories. * If the previous approach didn't find pybind11, only then try to get it from the GitHub repository. * Add the Pybind11 header files directory to the Python include files directories list, so that the CMake configuration files for the qsim modules can find them without having to add them explicitly in the module's CMakeLists.txt files.
Depending on the host platform, this uses different methods to try to determine whether the CPU supports the SIMD and vector instruction sets used by qsim (different flavors of AVX and SSE).
This file turned out to do so little that it's not worth maintaining a separate file for it.
Due to the fact that the top-level `CMakeLists.txt` file includes these files using `add_directory`, it's not necessary for them to do things like include `GetPybind.cmake` or call `project()`. Those directives are already done by the top-level file. We can simplify these sub-`CMakeLists.txt` files.
Due to the fact that the top-level `CMakeLists.txt` file includes these files using `add_directory`, it's not necessary for them to do things like include `GetPybind.cmake` or call `project()`. Those directives are already done by the top-level file. We can simplify these sub-`CMakeLists.txt` files. In addition, in this file in particular, it's possible to pull out `target_link_libraries()` as a common element because it's done for all the cases tested. Finally, instead of including the `GetCUDAARCHS.cmake` I had created before, let's just include the 3 lines directly in here. It makes it more obvious what's happening and reduces maintenance burden.
Due to the fact that the top-level `CMakeLists.txt` file includes these files using `add_directory`, it's not necessary for them to do things like include `GetPybind.cmake` or call `project()`. Those directives are already done by the top-level file. We can simplify these sub-`CMakeLists.txt` files. In addition, this commit rewrites the logic for determining which flag to pass to the compiler on Windows. The Windows case is complicated due to changes in the flags in MSVC over the years. The new logic tries to be robust in the face of different possibilities.
Due to the fact that the top-level `CMakeLists.txt` file includes these files using `add_directory`, it's not necessary for them to do things like include `GetPybind.cmake` or call `project()`. Those directives are already done by the top-level file. We can simplify these sub-`CMakeLists.txt` files. In addition, the logic for setting `CMAKE_CUDA_ARCHITECTURES` can be made very simple since we don't do complicated things to try to figure out which CUDA/GPU variants the host has. If the `CUDAARCHS` environment variable is not set and `CMAKE_CUDA_ARCHITECTURES` is not set either, we just default to using `native` and hope for the best.
This is another round of rewriting the top-level file. * Moves the `check_cpu_supports()` macro to a separate file. * Improves testing for CPU features for vectorizing. * Uses more modern ways of doing things like looking for `hipcc`. * Renames variables to make them a little bit more obvious.
It seems to figure out automatically how many threads to use for the builds.
@mhucka mhucka marked this pull request as draft July 7, 2025 17:37
mhucka added 4 commits July 7, 2025 22:44
This makes the CMake build files test for OpenMP and use it if it's available, but not fail if it's not.
CMake has features to handle LTO for different compilers and platforms. Better to use this than to add our own lto options.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size: L 250< lines changed <1000

3 participants