Skip to content

Conversation

@abhinavgaba
Copy link
Contributor

@abhinavgaba abhinavgaba commented Aug 14, 2025

This adds support for using ATTACH map-type for proper pointer-attachment when mapping list-items that have base-pointers.

For example, for the following:

 int *p; #pragma omp target enter data map(p[1:10])

The following maps are now emitted by clang:

 (A) &p[0], &p[1], 10 * sizeof(p[1]), TO | FROM &p, &p[1], sizeof(p), ATTACH 

Previously, the two possible maps emitted by clang were:

 (B) &p[0], &p[1], 10 * sizeof(p[1]), TO | FROM (C) &p, &p[1], 10 * sizeof(p[1]), TO | FROM | PTR_AND_OBJ 

(B) does not perform any pointer attachment, while (C) also maps the
pointer p, both of which are incorrect.


With this change, we are using ATTACH-style maps, like (A), for cases where the expression has a base-pointer. For example:

 int *p, **pp; S *ps, **pps; ... map(p[0]) ... map(p[10:20]) ... map(*p) ... map(([20])p) ... map(ps->a) ... map(pps->p->a) ... map(pp[0][0]) ... map(*(pp + 10)[0]) 

Why a large PR

While it's unfortunate that this PR has gotten large and difficult to review, the issue is that all the functional changes have to be made together, to prevent regressions from partially implemented changes.

For example, the changes to capturing were previously done separately (#145454), but they would still cause stability issues in absence of full attach-mapping. And attach-mapping needs those changes to be able to launch kernels.

We extracted the utilities and functions, like those for finding attach-ptrs, or comparing exprs, out as a separate NFC PR that doesn't call those functions, just adds them (#155625), and another NFC PR for adding an implicit clause on attach-ptrs, when applicable, is #161294. Maybe the change that adds a new error message for use_device_addr on array-sections with non-var base-pointers can be extracted out too (but that will have to be a follow-up change in that case, and we may get comp-fails with this PR when the erroneous case is not caught/diagnosed).

Grouping of maps based on attach base-pointers

We also group mapping of clauses with the same base decl in the order of the increasing complexity of their base-pointers, e.g. for something like:

 S **spp; map(spp[0][0], spp[0][0].a), // attach-ptr: spp[0] map(spp[0]), // attach-ptr: spp map(spp), // attach-ptr: N/A 

We first map spp, then spp[0] then spp[0][0] and spp[0][0].a.

This allows us to also group "struct" allocation based on their attach pointers. This resolves the issues of us always mapping everything from the beginning of the symbol spp. Each group is mapped independently, and at the same level, like spp[0][0] and its member spp[0][0].a, we still get map them together as part of the same contiguous struct spp[0][0]. This resolves issue #141042.

use_device_ptr/addr fixes

The handling of use_device_ptr/addr was updated to use the attach-ptr information, and works for many cases that were failing before. It has to be done as part of this series because otherwise, the switch from ptr_to_obj to attach-style mapping would have caused regressions in existing use_device_ptr/addr tests.

Handling of attach-pointers that are members of implicitly mapped structs:

  • When a struct member-pointer, like p below, is a base-pointer in a map clause on a target construct (like map(p[0:1]), and the base of that struct is either the this pointer (implicitly or explicitly), or a struct that is implicitly mapped on that construct, we add an implicit map(p) so that we don't implicitly map the full struct.
 struct S { int *p; void f1() { #pragma omp target map(p[0:1]) // Implicitly map this->p, to ensure // that the implicit map of `this[:]` does // not map the full struct printf("%p %p\n", &p, p); }

Scope for improvement:

  • We may be able to compute attach-ptr expr while collecting component-lists in Sema.
    • But we cache the computation results already, and findAttachPtrExpr is fairly simple, and fast.
  • There may be a better way to implement semantic expr comparison.

Needs future work:

  • Attach-style maps not yet emitted for declare mappers.
  • Mapping of class member references: We are still using PTR_AND_OBJ maps for them. We will likely need to change that to handle ref_ptr/ref_ptee, and attach map-type-modifier on them.
For the following: ```c int *p; \#pragma omp target map(p[0]) // (A) (void)p; \#pragma omp target map(p) // (B) (void)p; \#pragma omp target map(p, p[0]) // (C) (void)p; \#pragma omp target map(p[0], p) // (D) (void)p; ``` For (A), the pointer `p` is predetermined `firstprivate`, so it should be (and is) captured by-copy. However, for (B), (C), and (D), since `p` is already listed in a `map` clause, it's not predetermined `firstprivate`, and hence, should be captured by-reference, like any other mapped variable. To ensure the correct handling of (C) and (D), the following changes were made: 1. In SemaOpenMP, we now ensure that `p` is marked to be captured by-reference in these cases. 2. We no longer ignore `map(p)` during codegen of `target` constructs, even if there's another map like `map(p[0])` that would have been mapped using a PTR_AND_OBJ map. 3. For cases like (D), we now handle `map(p)` before `map(p[0])`, so the former gets the TARGET_PARAM flag and sets the kernel argument.
The output of the compile-and-run tests is incorrect. These will be used for reference in future commits that resolve the issues. Also updated the existing clang LIT test, target_map_both_pointer_pointee_codegen.cpp, with more regions and more narrowed-down update_cc_test_checks filters.
This patch introduces libomptarget support for the ATTACH map-type, which can be used to implement OpenMP conditional compliant pointer attachment, based on whether the pointer/pointee is newly mapped on a given construct. For example, for the following: ```c int *p; #pragma omp target enter data map(p[1:10]) ``` The following maps can be emitted by clang: ``` (A) &p[0], &p[1], 10 * sizeof(p[1]), TO | FROM &p, &p[1], sizeof(p), ATTACH ``` Without this map-type, the two possible maps emitted by clang: ``` (B) &p[0], &p[1], 10 * sizeof(p[1]), TO | FROM (C) &p, &p[1], 10 * sizeof(p[1]), TO | FROM | PTR_AND_OBJ ```` (B) does not perform any pointer attachment, while (C) also maps the pointer p, which are both incorrect. In terms of implementation, maps with the ATTACH map-type are handled after all other maps have been processed, as it requires knowledge of which new allocations happened as part of the construct. As per OpenMP 5.0, an attachment should happen only when either the pointer or the pointee was newly mapped while handling the construct. Maps with ATTACH map-type-bit do not increase/decrease the ref-count. With OpenMP 6.1, `attach(always/never)` can be used to force/prevent attachment. For `attach(always)`, the compiler will insert the ALWAYS map-type, which would let libomptarget bypass the check about one of the pointer/pointee being new. With `attach(never)`, the ATTACH map will not be emitted at all. The size argument of the ATTACH map-type can specify values greater than `sizeof(void*)` which can be used to support pointer attachment on Fortran descriptors. Note that this also requires shadow-pointer tracking to also support them. That has not been implemented in this patch. This was worked upon in coordination with Ravi Narayanaswamy, who has since retired. Happy retirement, Ravi!
This patch introduces libomptarget support for the ATTACH map-type, which can be used to implement OpenMP conditional compliant pointer attachment, based on whether the pointer/pointee is newly mapped on a given construct. For example, for the following: ```c int *p; #pragma omp target enter data map(p[1:10]) ``` The following maps can be emitted by clang: ``` (A) &p[0], &p[1], 10 * sizeof(p[1]), TO | FROM &p, &p[1], sizeof(p), ATTACH ``` Without this map-type, the two possible maps emitted by clang: ``` (B) &p[0], &p[1], 10 * sizeof(p[1]), TO | FROM (C) &p, &p[1], 10 * sizeof(p[1]), TO | FROM | PTR_AND_OBJ ```` (B) does not perform any pointer attachment, while (C) also maps the pointer p, which are both incorrect. In terms of implementation, maps with the ATTACH map-type are handled after all other maps have been processed, as it requires knowledge of which new allocations happened as part of the construct. As per OpenMP 5.0, an attachment should happen only when either the pointer or the pointee was newly mapped while handling the construct. Maps with ATTACH map-type-bit do not increase/decrease the ref-count. With OpenMP 6.1, `attach(always/never)` can be used to force/prevent attachment. For `attach(always)`, the compiler will insert the ALWAYS map-type, which would let libomptarget bypass the check about one of the pointer/pointee being new. With `attach(never)`, the ATTACH map will not be emitted at all. The size argument of the ATTACH map-type can specify values greater than `sizeof(void*)` which can be used to support pointer attachment on Fortran descriptors. Note that this also requires shadow-pointer tracking to also support them. That has not been implemented in this patch. This was worked upon in coordination with Ravi Narayanaswamy, who has since retired. Happy retirement, Ravi!
abhinavgaba added a commit to abhinavgaba/llvm-project that referenced this pull request Sep 29, 2025
…on `target, when applicable. On a target construct, if there's an implicit map on a struct, or that of this[:], and an explicit map with a member of that struct/class as the base-pointer, we need to make sure that base-pointer is implicitly mapped, to make sure we don't map the full struct/class. For example: ```cpp struct S { int dummy[10000]; int *p; void f1() { #pragma omp target map(p[0:1]) (void)this; } }; S s; void f2() { #pragma omp target map(s.p[0:10]) (void)s; } ``` Only `this-p` and `s.p` should be mapped in the two cases above. If we were to implicitly map the full struct `s`, or `this[0:1]`, it would map the `dummy` field as well. This was pulled out of llvm#153683 to make that PR smaller. it also pulls out one other utility, and an NFC changes to the AttachPtrExpr comparator from that PR.
Copy link
Contributor Author

@abhinavgaba abhinavgaba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pulled out another NFC subset out of this into #161294. Breaking this down further is going to be difficult practically, except maybe pulling out the error-checking for use_device_addr operands.

I had originally thought about breaking this up based on whether the construct is target/non-target, but that doesn't work because they share the calls to the underlying generateAllInfo and generateInfoForComponentList.

Aside from that, leaving out the use_device_ptr/addr changes would cause stability regressions, and the same is true for leaving out the capturing code (however, not as widespread, so if preferable, #145454 can be merged first).

mahesh-attarde pushed a commit to mahesh-attarde/llvm-project that referenced this pull request Oct 3, 2025
… attach base-ptrs. (llvm#155625) These have been pulled out of the codegen PR llvm#153683, to reduce the size of that PR.
abhinavgaba added a commit to abhinavgaba/llvm-project that referenced this pull request Oct 3, 2025
…rom pointer to reference. Also adds an instance of `AttachPtrExprComparator` to the `MappableExprHandler` class, so that it can be reused for multiple comparisons. This was extracted out of llvm#153683 to make that PR more focused on the functional changes.
abhinavgaba added a commit that referenced this pull request Oct 6, 2025
…rom pointer to reference. (#161785) Also adds an instance of `AttachPtrExprComparator` to the `MappableExprHandler` class, so that it can be reused for multiple comparisons. This was extracted out of #153683 to make that PR more focused on the functional changes.
abhinavgaba added a commit that referenced this pull request Oct 6, 2025
…et.h. (#161791) The clang changes that use this map-type are in #153683.
abhinavgaba added a commit to abhinavgaba/llvm-project that referenced this pull request Oct 18, 2025
Two of the tests are currently asserting, and two are emitting unexpected results. The asserting tests will be fixed using the ATTACH-style codegen from llvm#153683. The other two involve use_device_addr on byrefs, and need more follow-up codegen changes, that have been noted in a FIXME comment.
abhinavgaba added a commit that referenced this pull request Oct 20, 2025
…164039) Two of the tests are currently asserting, and two are emitting unexpected results. The asserting tests will be fixed using the ATTACH-style codegen from #153683. The other two involve `use_device_addr` on byrefs, and need more follow-up codegen changes, that have been noted in a FIXME comment.
@abhinavgaba
Copy link
Contributor Author

@alexey-bataev, were you able to give this (or #161294) more thought?

void f2() {
int *ptr;
// &ptr[0], &ptr[2], sizeof(ptr[2]), TO | FROM | PARAM
// &ptr, &ptr[2], sizeof(ptr), ATTACH
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need this here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the pointer ptr may already be present on the device, in which case, an attachment to that existing pointer needs to happen. But if the pointer does not exist, no attachment should happen.

Comment on lines +27 to +29
// &ptr, &ptr, sizeof(ptr), TO | FROM | PARAM
// &ptr[0], &ptr[0], 2 * sizeof(ptr[0]), TO | FROM
// &ptr, &ptr[0], sizeof(ptr), ATTACH
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same, what's new here, why do we need this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The attachment between ptr and its pointee should be conditional, and only happen when either the pointer or the pointee is newly mapped. If both of them were already present on the device, no attachment should happen. e.g.

int *ptr2 = ptr; // No attachment between ptr2 and ptr2[0], // because ptr2 does not exist on device. #pragma omp target data map(ptr, ptr2[0:10]) { // No attachment b/w ptr and ptr[0] because neither is // newly mapped. Both were already present on device. #pragma omp target data map(ptr, ptr[0:10]) { } }

The ATTACH map-type does that.

/// If use_device_ptr or use_device_addr is used on a decl which is a struct
/// member and there is no map information about it, then emission of that
/// entry is deferred until the whole struct has been processed.
struct DeferredDevicePtrEntryTy {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it necessary to remove it and related functionality in this patch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was trying to create maps like &this->p, &this->p[0], 0, PTR_AND_OBJ | RETURN_PARAM for cases like use_device_ptr(p) (where p is a class-member pointer), which interferes with the other attach-ptr based maps like map(p) or map(p[0:1]).

There should not be any functionality lost from removing this code. In fact, with the current patch, we get many more passes in use_device_ptr/addr tests than before.

We still handle use_device_ptr(p) on class members, just that instead of the PRT_AND_OBJ maps, we get &this->p[0], &this->p[0], 0, RETURN_PARAM.

abhinavgaba added a commit to abhinavgaba/llvm-project that referenced this pull request Oct 28, 2025
…espective of order. For cases like: ```c map(alloc: x) map(to: x) ``` If the entry of `map(to: x)` is encountered after the entry for `map(alloc:x)`, we still want to do a data-transfer even though the ref-count of `x` was already 0, because the new allocation for `x` happened as part of the current directive. Similarly, for: ```c ... map(alloc: x) map(from: x) ``` If the entry for `map(from:x)` is encountered before the entry for `map(alloc:x)`, we want to do a data-transfer even though the ref-count was not 0 when looking at the `from` entry, because by the end of the directive, the ref-count of `x` will go down to zero. And for: ```c ... map(from : x) map(alloc, present: x) ``` If the "present" entry is encountered after the "from" entry, then it becomes a no-op, as the "from" entry will do an allocation if no match was found. In this PR, these are handled by the runtime via the following: * For `to` and `present`, we also look-up in the existing table where we tracked new allocations when making the decision for the entry. * For `from`, we keep track of any deferred data transfers and when the ref-count of a pointer goes to zero, see if there were any previously deferred `from` transfers for that pointer. This can be done in the compiler, and that would avoid any runtime overhead, but it would require creating two separate offload struct entries for the entry and exit mappings (even for the `target` construct), with properly decayed maps, and either: (1) sorted in order of: * `present > to > ...` for the implied `target enter data`; and * `from > ...` for the `target exit data` e.g. ```c #pragma omp target map(to: x) map(present, alloc: x) map(always, from: x) // has to be broken into: // from becomes alloc on entry: // #pragma omp target enter data map(present, alloc: x) // map(to: x) // map(alloc: x) // // "present" and "to" just "decay" into "alloc" // #pragma omp target exit data map(always, from: x) // map(alloc: x) // map(alloc: x) ``` Or, (2) Merged into one entry each on the `target enter/exit data` directives. ```c #pragma omp target map(to: x) map(present, alloc: x) map(always, from: x) // has to be broken into: // from becomes alloc on entry: // #pragma omp target enter data map(present, to: x) // // "present" and "to" just "decay" into "alloc" // #pragma omp target exit data map(always, from: x) ``` The number of entries on the two would need to stay the same on the two to avoid ref-count mismatch. (1) would be simpler, but won't likely work for cases like: ```c ... map(delete: x) map(from:x) ``` as there is no clear "winner" between the two. So, for such cases, the compiler would likely have to do (2), which is the cleanest solution, but will take longer to implement. For EXPR comparisons, it can build-upon the `AttachPtrExprComparator` that was implemented as part of llvm#153683, but that should probably wait for the PR to be merged to avoid conflicts. Another alternative is to sort the entries in the runtime, which may be slower than on-demand lookups/updates that this PR does, because we always would be doing this sorting even when not needed, but may be faster in others where the constant-time overhead of map/set insertions/lookups becomes too large because of the number of maps. But that will still have to worry about the `from` + `delete` case.
Lukacma pushed a commit to Lukacma/llvm-project that referenced this pull request Oct 29, 2025
…lvm#164039) Two of the tests are currently asserting, and two are emitting unexpected results. The asserting tests will be fixed using the ATTACH-style codegen from llvm#153683. The other two involve `use_device_addr` on byrefs, and need more follow-up codegen changes, that have been noted in a FIXME comment.
aokblast pushed a commit to aokblast/llvm-project that referenced this pull request Oct 30, 2025
…lvm#164039) Two of the tests are currently asserting, and two are emitting unexpected results. The asserting tests will be fixed using the ATTACH-style codegen from llvm#153683. The other two involve `use_device_addr` on byrefs, and need more follow-up codegen changes, that have been noted in a FIXME comment.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

4 participants