
Conversation

@boomanaiden154 (Contributor) commented Oct 25, 2025

This patch adds a job to the bazel checks workflow to run the bazel build/test. This patch only tests a couple of projects just to get things going. The plan is to expand to more projects eventually and set up a GCS bucket for caching so jobs complete quickly by using cached artifacts.

This should add minimal load to the CI given the low frequency of bazel PRs, especially once we enable GCS-based caching, which bazel makes effective use of. Google is also sponsoring the Linux Premerge CI and is interested in having premerge bazel builds, which is why it makes sense to do premerge testing of this alternative build system using those resources.

@boomanaiden154 boomanaiden154 marked this pull request as ready for review October 25, 2025 04:44
@llvmbot (Member) commented Oct 25, 2025

@llvm/pr-subscribers-github-workflow

Author: Aiden Grossman (boomanaiden154)

Changes

This patch adds a job to the bazel checks workflow to run the bazel build/test. This patch only tests a couple of projects just to get things going. The plan is to expand to more projects eventually and set up a GCS bucket for caching so jobs complete quickly by using cached artifacts.

This should add minimal load to the CI given the low frequency of bazel PRs, especially once we enable GCS-based caching, which bazel makes effective use of. Google is also sponsoring the Linux Premerge CI and is interested in having premerge bazel builds, which is why it makes sense to do premerge testing of this alternative build system using those resources.


Full diff: https://github.com/llvm/llvm-project/pull/165071.diff

1 file affected:

  • (modified) .github/workflows/bazel-checks.yml (+25)
```diff
diff --git a/.github/workflows/bazel-checks.yml b/.github/workflows/bazel-checks.yml
index 65d51649dd9e7..5cc0342015cf6 100644
--- a/.github/workflows/bazel-checks.yml
+++ b/.github/workflows/bazel-checks.yml
@@ -30,3 +30,28 @@ jobs:
       - name: Run Buildifier
         run: |
           buildifier --mode=check $(find ./utils/bazel -name *BUILD*)
+
+  bazel-build:
+    name: "Bazel Build/Test"
+    runs-on: llvm-premerge-linux-runners
+    if: github.repository == 'llvm/llvm-project'
+    steps:
+      - name: Fetch LLVM sources
+        uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
+      # TODO(boomanaiden154): We should use a purpose built container for this. Move
+      # over when we have fixed the issues with using custom containers with Github
+      # ARC in GKE.
+      - name: Setup System Dependencies
+        run: |
+          sudo apt-get update
+          sudo apt-get install -y libmpfr-dev libpfm4-dev
+          sudo curl -L https://github.com/bazelbuild/bazelisk/releases/download/v1.27.0/bazelisk-amd64.deb > /tmp/bazelisk.deb
+          sudo apt-get install -y /tmp/bazelisk.deb
+          rm /tmp/bazelisk.deb
+      - name: Build/Test
+        working-directory: utils/bazel
+        run: |
+          bazel test --config=ci --sandbox_base="" \
+            @llvm-project//llvm/... \
+            @llvm-project//clang/... \
+            @llvm-project//mlir/...
```
@boomanaiden154 (Contributor, Author) commented:
Only doing a couple of the core projects for now to limit what we build before we set up caching. Building lldb is also failing on a missing editline.h header; the error doesn't resolve even after installing a package that populates /usr/include/editline.h, likely because of the bazel sandbox.

@rupprecht rupprecht added the bazel "Peripheral" support tier build system: utils/bazel label Oct 27, 2025
@rupprecht (Collaborator) left a comment:

Thanks! FYI @keith @aaronmondal as the other bazel owners.

One notable difference vs the current Buildkite setup: this runs `bazel test ...` instead of `bazel query ... | xargs bazel test`. The query/xargs step is a hack to make sure that targets tagged "manual" will still be built. It comes from the IREE build setup, but AFAIK we don't need it for LLVM -- skipping "manual" targets should be fine (and expected).

https://github.com/iree-org/iree/blob/330c4401b737756c422353142949fcc610ec102a/build_tools/bazel/build_test_all.sh#L129
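For reference, the query/xargs variant from that setup would look roughly like this as a workflow step (a sketch only; the step name and target pattern are illustrative):

```yaml
      # Sketch of the query/xargs hack: querying first makes Bazel list every
      # target, including ones tagged "manual", before passing them to test.
      - name: Build/Test (query/xargs variant)
        working-directory: utils/bazel
        run: |
          bazel query '@llvm-project//llvm/...' | \
            xargs bazel test --config=ci --sandbox_base=""
```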

```yaml
      - name: Build/Test
        working-directory: utils/bazel
        run: |
          bazel test --config=ci --sandbox_base="" \
```
Collaborator:

Should this be bazelisk instead of bazel?

Member:

Yes, bazelisk would be better. There is actually an action that might be worth looking into; it's fairly standardized, and it seems to me like it could work more or less out of the box:

https://github.com/bazel-contrib/setup-bazel
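A sketch of what using that action could look like (the version pin and cache inputs here are illustrative; check the setup-bazel README for the current release and supported options):

```yaml
      - name: Setup Bazel
        uses: bazel-contrib/setup-bazel@0.8.1  # illustrative version pin
        with:
          # Cache bazelisk downloads and Bazel's repository cache between runs.
          bazelisk-cache: true
          repository-cache: true
```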

Comment on lines +55 to +57:

```yaml
            @llvm-project//llvm/... \
            @llvm-project//clang/... \
            @llvm-project//mlir/...
```
Collaborator:

> This patch only tests a couple projects just to get things going. The plan is to expand to more projects eventually and setup a GCS bucket for caching so jobs complete quickly by using cached artifacts.

llvm+clang+mlir is most of the targets already. If you want to get this running with a minimal config and only add other projects once we have caching, should we start smaller? e.g. just `@llvm-project//llvm/unittests:adt_tests`
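A minimal starting configuration along those lines might be (a sketch assuming the rest of the job stays as in the patch):

```yaml
      - name: Build/Test
        working-directory: utils/bazel
        run: |
          bazel test --config=ci --sandbox_base="" \
            @llvm-project//llvm/unittests:adt_tests
```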

@aaronmondal (Member) left a comment:

I like that this reduces reliance on buildkite, though I'm fairly biased as I have strong interest in this as well.

@jaroeichler and I are generally looking to sponsor infra for bazel workflows in LLVM as well. We've only recently started to build that out, so I can't give a very exact timeframe for that (maybe around 1-3 months?). But regarding runner images, we might be able to contribute to this effort sooner. (Jaro, WDYT?)

```yaml
      # ARC in GKE.
      - name: Setup System Dependencies
        run: |
          sudo apt-get update
```
Member:

This seems risky. Unless this is pinned, it'll be very cache-inefficient, since apt installs are not reproducible by default. Even with pinning, I believe the cache still couldn't be reused like this, since the installs contain timestamps. IIRC tools like the OSSF Scorecard would flag this as a "critical level" risk, as it essentially allows arbitrary tooling to end up on this runner, and AFAICS the llvm-premerge-linux-runners aren't pinned either.

Another thing is that this creates a fairly heavy dependence on Ubuntu and on the particular configuration of the runner.

I guess if it's only temporary I don't have too much of an opinion against this, but I think it's worth mentioning that this really shouldn't stay like this forever.

Regarding a more long-term solution, my vote would be for a NixOS image, as that not only freezes deps properly but is also verifiable for external users, i.e. from a configuration file it'll be possible for third parties to reproduce the runner image bit-by-bit. This gives essentially perfect cache reuse and makes things comparatively easy to verify.

I believe I have such an image lying around somewhere. If there is interest, I can take another look or set up a new one for this use case.

For just the initial implementation, though it seems unintuitive, it might be an option to simply remove the apt update call. I'm not sure whether this works with the default llvm-premerge runners, but if it does, it would at least pseudo-pin the apt repo to whatever version is on those runners, which might be a bit more stable than updating every time this workflow runs.
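Concretely, the pseudo-pinned variant would just drop the update call (a sketch of the suggestion above; the package list is taken from the patch):

```yaml
      - name: Setup System Dependencies
        run: |
          # No `apt-get update`: install whatever package versions the runner
          # image already has indexed, pseudo-pinning deps to the image state.
          sudo apt-get install -y libmpfr-dev libpfm4-dev
```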


@jaroeichler commented:

> But regarding runner images we might be able to contribute to this effort sooner. (Jaro, WDYT?)

Yes, will spend some time in the next few days to work on this.

@boomanaiden154 (Contributor, Author) commented:

We can't use custom runner images on the existing premerge infrastructure because we do not have a workaround for kubernetes-sigs/apiserver-network-proxy#748 yet. That shouldn't be a permanent state, but is where we're at currently.

I don't think this is feasible to do on the free GitHub Actions runners.

@aaronmondal (Member) commented:

> We can't use custom runner images on the existing premerge infrastructure because we do not have a workaround for kubernetes-sigs/apiserver-network-proxy#748 yet. That shouldn't be a permanent state, but is where we're at currently.
>
> I don't think this is feasible to do on the free GitHub Actions runners.

Ah, sorry for the confusion! For Bazel specifically, it's possible to use remote execution rather than running locally. I.e. the runner in this workflow just needs to construct the build graph, and can then send that off to one or multiple separate builder machines. This way it doesn't matter too much whether it's a free or a self-hosted runner, because the actual compilation happens elsewhere (e.g. in the same K8s cluster in another container, or somewhere else entirely).

The way this works is roughly that the llvm-premerge runner (or a free runner, it doesn't matter too much) invokes the bazel build and sets `--remote_executor=xxx --remote_cache=yyy`. It then only builds up the compile commands and sends them off to the remote builder(s) to actually run.

The tricky part here is that the entire build needs to be fully hermetic, so that the machine that constructs the compile commands (i.e. the free or llvm-premerge runner) does so with binary-identical tools matching what the remote builders expect. Otherwise, even with "the same paths", it would lead to cache misses.

Fortunately that's a fixable toolchain issue and mostly bazel-side configuration that can be layered on top without affecting other workflows/development flows. The "image" i was talking about was the image for such remote builders, not for the runner that invokes this workflow.
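As a sketch, the remote-execution setup described above would look something like this (the endpoint URL is a placeholder for whatever RBE service gets provisioned; `--remote_executor` and `--remote_cache` are standard Bazel flags):

```yaml
      - name: Build/Test (remote execution)
        working-directory: utils/bazel
        run: |
          # The runner only constructs the action graph; compilation runs on
          # the remote builders behind the placeholder endpoint below.
          bazel test --config=ci \
            --remote_executor=grpcs://rbe.example.com \
            --remote_cache=grpcs://rbe.example.com \
            @llvm-project//llvm/...
```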


Labels

bazel "Peripheral" support tier build system: utils/bazel github:workflow

5 participants