DW_OP_implicit_pointer design/implementation in general

Alok_Sharma · December 11, 2019, 5:06am

Hi David,

This is regarding the missing multilevel handling in the branch for explicit pointers.

does the proposed IR format support multiple layers of dereference (eg: int ** where we know it ultimately points to the value 3 but can’t describe either the first or second level pointers that get to that value) - it sounds like any intrinsic that’s special cased to deref (like llvm.dbg.derefval) wouldn’t be able to capture that, which seems like it’s overly narrow/special case, then?

The PoC of DW_OP_LLVM_explicit_pointer does not handle multilevel indirection. For now, that is for the following reason.

The explicit pointer handles cases where a variable points to a temporary that contains a constant. Due to language standard constraints, we don’t find pointers in such cases; what we get is references. Unlike pointers, references have a single level (a reference to a reference is just a reference, while a pointer to a pointer is a double pointer).
In the case of a reference to a reference, the second level can be handled using DW_OP_LLVM_explicit_pointer itself.
In the case of a pointer to a reference, the second level can be handled using DW_OP_implicit_pointer.

Though it would not be complex to make the explicit pointer multilevel, I avoided doing so for lack of a use case. Please let me know if I am missing something.

Regards,
Alok

dblaikie · December 18, 2019, 11:24pm

(I’m still pretty concerned that there are IR changes going in for a feature that seems incomplete and more invasive than really seems justified to me - though I admit I’m clearly not paying enough attention to this feature to have a nuanced/fully informed opinion & so maybe I just need to step back from all of this - but given the addition of new intrinsics, it seems like there should be more clear design discussion)

Sorry, I couldn’t understand your language related to references and pointers - I don’t understand why they would be handled differently or represent challenges/tradeoffs for features related to collapsed indirection like this.

Multi-level indirection seems to have as much use as single level indirection. (if a DWARF user may want to know what a pointer points to even when what it points to isn’t in memory, the same would hold true for pointers to pointers, etc)

I would expect this to be handled with a general OP saying “hey, I’m skipping one level of indirection in the resulting value, because that indirection is missing/not in the final program” and that this would be encoded in a llvm.dbg.value/DIExpression as usual, without the need for new IR intrinsics, though possibly with the need for an LLVM extension DWARF OP (DW_OP_LLVM_explicit_pointer?)

To reconstitute that general form into the current DWARF limited “indirection needs to refer to another variable DIE” issue - as I think Paul speculated previously, we could always reconstitute a synthetic variable DIE & not try to reflect the case where the indirection lands at another named/known variable - as I expect that’s the minority case. In most cases in C++ I expect pointers and references do not refer to named variables in the same function. They refer to return values from functions, they refer to array elements in dynamically allocated arrays, etc, etc.

pogo59 · December 19, 2019, 4:27pm

I regret to say I also have not been following this with the attention it deserves, and I am pretty much on holiday until 14 January.

I am particularly surprised by the appearance of something called DW_OP_LLVM_explicit_pointer, which I wouldn’t have thought necessary and don’t remember from the discussions that I did read.

I will try to mend my ways and pay more attention when I return.

–paulr

dblaikie · December 20, 2019, 12:48am

I think the new OP_LLVM extension might’ve been in response to my suggestion for something more general, that could handle multiple indirections to things that weren’t existing variables, etc. But I might be wrong on that.

Alok_Sharma · December 23, 2019, 6:26pm

Hi David,

Sorry, I couldn’t understand your language related to references and pointers - I don’t understand why they would be handled differently or represent challenges/tradeoffs for features related to collapsed indirection like this.

Let me try to explain what I wanted to convey with an example.

Example of multilevel pointer:

int var;
int *ptr = &var; // first level of indirection
int **ptrptr = &ptr; // second level of indirection

Example of multilevel references:

int var;
int &ref = var; // first level of reference

int &refref = ref; // second level of reference

Though the variable refref is a reference to another reference, it is still of plain reference type.
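
To make the distinction concrete, here is a small C++ sketch (not taken from any of the patches; the function names are made up for illustration). It only checks the language-level point being made: each pointer level is a distinct type, while a reference bound through another reference is still an `int &` that aliases the original object.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical helper names, for illustration only.

// Pointers: each level of indirection adds a level to the type.
inline int readThroughTwoPointers() {
    int var = 3;
    int *ptr = &var;       // first level of indirection
    int **ptrptr = &ptr;   // second level of indirection
    static_assert(std::is_same<decltype(ptrptr), int **>::value,
                  "pointer to pointer is a distinct two-level type");
    return **ptrptr;
}

// References: binding a reference through another reference adds no level.
inline bool refOfRefAliasesVar() {
    int var = 3;
    int &ref = var;        // first level of reference
    int &refref = ref;     // binds directly to var
    static_assert(std::is_same<decltype(refref), int &>::value,
                  "reference to reference is still int &");
    refref = 7;            // writes var itself
    assert(var == 7);
    return &refref == &var;
}
```

Calling `readThroughTwoPointers()` dereferences both levels to reach the stored value, and `refOfRefAliasesVar()` reports whether `refref` and `var` share an address.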

As I said earlier, I am struggling to find a case where multiple levels of indirection are needed with DW_OP_LLVM_explicit_pointer in the case of references (DW_OP_LLVM_explicit_pointer is used only in the case of references). Please let me know if you have an example in mind; I shall then modify the patch for multilevel indirection.

Multi-level indirection seems to have as much use as single level indirection. (if a DWARF user may want to know what a pointer points to even when what it points to isn’t in memory, the same would hold true for pointers to pointers, etc)

For pointer to pointer, multilevel indirection is already handled, as all those cases use DW_OP_implicit_pointer.

Regards,
Alok

Alok_Sharma · December 23, 2019, 7:03pm

Hi Paul,

David has already replied about how DW_OP_LLVM_explicit_pointer emerged. Let me explain a bit more about it.

It is meant to address a case David raised regarding a variable pointing to a temporary (which happens with references). For the same case, you have already suggested a solution (using an artificial variable for the temporary).

To make maximum use of the discussion, I tried to provide an additional option to choose from.

Note that this case is not handled even by GCC. So whatever GCC does should be a must for us, and anything beyond that should be aspirational.

Now, to include that aspirational case, we have two options:

Create an artificial variable (flip side: we need to carry an extra artificial DIE)
Define the value inline using DW_OP_LLVM_explicit_pointer (flip side: a new operator needs to be introduced)

I think we should go ahead with the must-have functionality anyway and choose one of the options for the aspirational functionality.

Regards,
Alok


dblaikie · December 24, 2019, 8:59pm

(sorry, I accidentally dropped everyone/and the list from the thread, adding them back in here)

OK, so we’re on the same page, none of this has anything to do with references specifically, but about whether the pointer or reference points/refers to a named variable or not.

And this was the point I was trying to make (perhaps poorly) at the start: The DWARF feature (OP_implicit_pointer) is both awkward to implement (seen by the fact that there’s discussion to add new intrinsics to support it) and only supports a small subset of cases.

My argument is that we should not implement this DWARF feature - or, at least, not the way we’re doing it.

We should implement a more general feature (currently titled OP_LLVM_explicit_pointer - though I’m not sure “explicit” v “implicit” is expressing, at least to me, the distinction between these two things) & only that, at least only that at the LLVM IR level.

Possibly we can implement DWARFv5 standard support (perhaps this is the only DWARF emission mode we support) where the OP_LLVM_explicit_pointer is lowered to an artificial variable (doesn’t need a name, file/line number, etc - it’s just a big indirection due to the way DWARFv5 implicit_pointer is specified) with the location attached to it.
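
As a rough illustration of that lowering idea, here is a toy C++ model. Everything in it is an assumption made for the sketch: `ToyDie`, `ToyLowering` and `lowerExplicitPointer` are invented names, not LLVM or DWARF library APIs, and locations are plain strings. It only shows the shape of the output: one unnamed artificial variable that carries the pointee's location, and a DW_OP_implicit_pointer in the pointer's expression that refers to it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model only: these types are NOT LLVM or DWARF library APIs.
struct ToyDie {
    std::string tag;       // e.g. "DW_TAG_variable"
    bool artificial;       // stands in for DW_AT_artificial
    std::string location;  // textual stand-in for DW_AT_location
};

struct ToyLowering {
    std::vector<ToyDie> dies;     // DIEs synthesized by the backend
    std::string pointerLocation;  // expression attached to the pointer variable
};

// Lower "explicit pointer to <pointeeLocation>" into the DWARF 5 shape:
// an unnamed artificial variable holding the pointee's location, plus a
// DW_OP_implicit_pointer (DIE reference, byte offset) that refers to it.
inline ToyLowering lowerExplicitPointer(const std::string &pointeeLocation) {
    ToyLowering out;
    out.dies.push_back({"DW_TAG_variable", true, pointeeLocation});
    const std::size_t dieIndex = out.dies.size() - 1;
    assert(out.dies[dieIndex].artificial);
    out.pointerLocation =
        "DW_OP_implicit_pointer(die#" + std::to_string(dieIndex) + ", 0)";
    return out;
}
```

In this model the IR side never sees the artificial variable; it exists only in the lowered result, which is the property being argued for here.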

We lose some functionality by doing this, for sure - the consumer won’t know that the pointer aliases a variable, though GDB at least doesn’t visualize the name of the variable a pointer points to, so far as I can tell. (Sony/Apple - do you have debuggers whose UI would be particularly improved were you to know that a pointer points to a particular variable, rather than to the pieces of the variable’s value (assuming the variable isn’t completely in memory at all - so it’s up in registers, incomplete/fragmented hunks of memory, etc)?)

I think implementing /only/ this more general solution is tidier (doesn’t introduce new LLVM IR intrinsics & the complexities of tracking which variable another variable points to) and covers more cases. If at some point in the future someone finds that the value on top of this, of knowing the name of the variable being pointed to, is worth the added complexity - then we could do that. But my gut feel is that it is not worth the added complexity.

Alok_Sharma · January 1, 2020, 7:04pm

Hi David,

Happy new year !

I just uploaded a POC patch that covers the cases when pointer points to un-named variables using DW_OP_implicit_pointer (references and dynamic allocation). This is using artificial variable as suggested by Paul.

https://reviews.llvm.org/D72055

I hope that now it should address your concerns.

Scope of DW_OP_implicit_pointer: as we initially decided, the original patch should be split into back-end changes plus separate front-end changes that each address a different scope.

The current patch fits that decision. We now cover most of the cases, if not all. The good thing is that we can add more scope whenever needed; we don’t need to stall the current set of patches.

Regarding the addition of a new intrinsic: it came out of the current discussion and would help us distinguish a record that denotes the value of a variable from one that denotes the dereferenced value of a (pointer/reference) variable. It has gone through a first round of review.

Please let me know your thoughts.

Regards,
Alok

Jeremy_Morse · January 2, 2020, 2:37pm

Hi all,

On the topic of intrinsics, right now we have two (dbg.value /
dbg.addr) that respectively describe:
* The "direct value" (quoting langref) of a variable, and
* The address of where the current variable value is stored.
Both of which map onto dwarf locations later on. My reasoning for
wanting a new intrinsic is that implicit pointers are neither of these
things: what is being described is an entirely new domain of
information about the variable, i.e. what it points at. To me that's a
major difference from a "direct value", and something to signal at an
early stage to any consumer wishing to interpret debug intrinsics,
rather than having consumers interpret the DIExpression to discover
whether this is actually the variable value or not.

The counter-argument would be that, in reality, dbg.value is used to
represent everything to do with variables and their values, any other
debug intrinsic is likely to be dropped by optimisations, and you
usually have to interpret the DIExpression anyway.

IMO, having a new intrinsic would be conceptually neater; that
neatness might not have a lot of practical value though. I'm not
familiar with LLVM's long-term compatibility guarantees when it comes
to intrinsics, so maybe it's more trouble than it's worth.

adrian.prantl · January 8, 2020, 8:38pm

Hi all,

On the topic of intrinsics, right now we have two (dbg.value /
dbg.addr) that respectively describe:
* The "direct value" (quoting langref) of a variable, and
* The address of where the current variable value is stored.
Both of which map onto dwarf locations later on. My reasoning for
wanting a new intrinsic is that implicit pointers are neither of these
things: what is being described is an entirely new domain of
information about the variable, i.e. what it points at. To me that's a
major difference from a "direct value", and something to signal at an
early stage to any consumer wishing to interpret debug intrinsics,
rather than having consumers interpret the DIExpression to discover
whether this is actually the variable value or not.

As far as LLVM semantics are concerned, the implicit pointer doesn't seem to be that much different from any other implicit values (such as constants) to me. Why do you think that it needs to be represented differently inside of LLVM IR?

-- adrian

Jeremy_Morse · January 10, 2020, 3:01pm

Hi,

As far as LLVM semantics are concerned, the implicit pointer doesn't seem to be that much different from any other implicit values (such as constants) to me. Why do you think that it needs to be represented differently inside of LLVM IR?

I think it's almost entirely that the first argument to dbg.value will
change from "Always ValueAsMetadata" to "Maybe metadata, maybe
Value". I get the feeling that allowing more options here will come
out as more conditions / branching elsewhere, in a way we could try to
avoid.

However it's a mild opinion with a certain amount of hand waving; and
not one that anyone else seems to share, so I'm happy to drop that
part of the discussion.

dblaikie · January 10, 2020, 7:36pm

Hi,

As far as LLVM semantics are concerned, the implicit pointer doesn’t seem to be that much different from any other implicit values (such as constants) to me. Why do you think that it needs to be represented differently inside of LLVM IR?

I think it’s almost entirely that the first argument to dbg.value will
change from “Always ValueAsMetadata” to “Maybe metadata, maybe
Value”.

What changes do you have in mind there? Are you referring to the possibility of implicit values to refer to other variables?

I’m sort of interested in maybe not doing that - and only implementing a more general form (what’s been talked about with the LLVM_implicit_value (or was it LLVM_explicit_value? I forget)) - and synthesizing artificial variables in the backend rather than trying to track which variable a pointer points to. I think this would keep the impact on optimizations smaller & would be more general. My wager/belief/instinct is that most cases won’t be pointing to a named variable with a single level of indirection, but to unnamed variables, multiple levels of indirection, etc.

adrian.prantl · January 13, 2020, 5:20pm

Hi,

> As far as LLVM semantics are concerned, the implicit pointer doesn't seem to be that much different from any other implicit values (such as constants) to me. Why do you think that it needs to be represented differently inside of LLVM IR?

I think it's almost entirely that the first argument to dbg.value will
change from "Always ValueAsMetadata" to "Maybe metadata, maybe
Value".

What changes do you have in mind there? Are you referring to the possibility of implicit values to refer to other variables?

I'm sort of interested in maybe not doing that - and only implementing a more general form (what's been talked about with the LLVM_implicit_value (or was it LLVM_explicit_value? I forget)) - and synthesizing artificial variables in the backend rather than trying to track which variable a pointer points to. I think this would keep the impact on optimizations smaller & would be more general. My wager/belief/instinct is that most cases won't be pointing to a named variable with a single level of indirection, but to unnamed variables, multiple levels of indirection, etc.

The extra artificial variable also strikes me as a DWARF-ism that we shouldn't necessarily model in LLVM IR.

I get the feeling that allowing more options here will come
out as more conditions / branching elsewhere, in a way we could try to
avoid.

If it were possible to synthesize it in AsmPrinter, would that remove the motivation for the new intrinsic for you?

-- adrian

dblaikie · January 13, 2020, 5:23pm

Hi,

As far as LLVM semantics are concerned, the implicit pointer doesn’t seem to be that much different from any other implicit values (such as constants) to me. Why do you think that it needs to be represented differently inside of LLVM IR?

I think it’s almost entirely that the first argument to dbg.value will
change from “Always ValueAsMetadata” to “Maybe metadata, maybe
Value”.

What changes do you have in mind there? Are you referring to the possibility of implicit values to refer to other variables?

I’m sort of interested in maybe not doing that - and only implementing a more general form (what’s been talked about with the LLVM_implicit_value (or was it LLVM_explicit_value? I forget)) - and synthesizing artificial variables in the backend rather than trying to track which variable a pointer points to. I think this would keep the impact on optimizations smaller & would be more general. My wager/belief/instinct is that most cases won’t be pointing to a named variable with a single level of indirection, but to unnamed variables, multiple levels of indirection, etc.

The extra artificial variable also strikes me as a DWARF-ism that we shouldn’t necessarily model in LLVM IR.

Oh, yeah, that’s certainly my intent in making this suggestion, that these artificial variables would not be part of the LLVM IR and only generated in the backend (& perhaps not generated at all in some conditions if/where we decide to support an extension DW_OP that’s more similar to what I’m suggesting to use at the IR level). Thanks for describing/clarifying that explicitly.

Jeremy_Morse · January 13, 2020, 7:58pm

Hi,

David wrote:

Are you referring to the possibility of implicit values to refer to other variables? I'm sort of interested in maybe not doing that - and only implementing a more general form (what's been talked about with the LLVM_implicit_value (or was it LLVM_explicit_value? I forget)) - and synthesizing artificial variables in the backend rather than trying to track which variable a pointer points to. I think this would keep the impact on optimizations smaller & would be more general.

Adrian wrote:

If it were possible to synthesize it in AsmPrinter, would that remove the motivation for the new intrinsic for you?

Ah, yeah, those changes would avoid any need for a new intrinsic to my
mind, and sounds much more palatable. Thanks for explaining.

Alok_Sharma · January 14, 2020, 4:49am

Hi,

Let me consolidate what we discussed with my opinion.

On the point of new intrinsic llvm.dbg.derefval:

The new intrinsic was more a neater way than a necessary one. The whole functionality can go ahead without it, using llvm.dbg.value instead. Though I liked it, since most of us are against it, I am fine with dropping it.
This is because the transformation was:

llvm.dbg.value → DBG_VALUE → DWARF location-list
llvm.dbg.derefval → DBG_VALUE → DWARF location-list

Since the new intrinsic existed only for better readability in LLVM IR, and later shared the same path as llvm.dbg.value, it should be fine to drop it without any impact on later functionality.

On the question of identifying such cases: we can identify them anyway from the operation used in the expression (DW_OP_implicit_pointer).
On the question of DW_OP_implicit_pointer fitting the dbg.value intrinsic: it fits perfectly, as the value in such a case is metadata and the prototype of dbg.value is dbg.value(metadata, metadata, metadata).
So it should be fine to drop it and go back to where things stood before the new intrinsic was introduced.

On the point of handling pointers that point to temporary / unnamed variables (let’s call it Scope S1):

Two patches have been proposed for pointers referring to temporary / unnamed variables:
A) The first patch uses the newly proposed operator DW_OP_LLVM_explicit_pointer (both in LLVM-IR and DWARF)
B) The second patch uses an artificial variable representing the temporary (both in LLVM-IR and DWARF)

https://reviews.llvm.org/D72055 [DebugInfo] Support for DW_OP_implicit_pointer (for temp references & dynamic allocations)
If I understood David correctly, he wants the LLVM-IR to look like patch-A and the DWARF to look like patch-B (let’s call it way C)

Since patch-A is not desired because we don’t support anything beyond DWARF-5 and patch proposes new DWARF operator. I want to clarify that patch-B can exist even without new intrinsic and can use dbg.value and fits perfectly in existing LLVM-IR template. if only reason to go way-C is to

I would like to go way-B or way-C for the scope of unnamed variables.

For the cases where a pointer points to a named variable (let’s call it Scope S2):
I would update the patches, replacing dbg.derefval with dbg.value and using DW_OP_implicit_pointer (to the named variable) in both LLVM-IR and DWARF.

In summary,

Scope S1 can be solved with

way-B) DW_OP_implicit_pointer with artificial variable and with intrinsic dbg.value in LLVM-IR and DWARF
or

way-C) DW_OP_LLVM_explicit_pointer with intrinsic dbg.value in LLVM-IR + DW_OP_implicit_pointer with artificial variable in DWARF

Scope S2 can be solved with
DW_OP_implicit_pointer with actual named variable with dbg.value in LLVM-IR and DWARF

Regards,
Alok

dblaikie · January 14, 2020, 4:47pm

Based on my current understanding, I’d rather not do this ^ it seems like added complexity to LLVM optimizations/middle end/IR representation with limited value. I think doing (C) above but using it to cover both S1 and S2 (ie: not special casing “pointing to an existing variable” - instead treating that the same as “pointing to an unnamed entity”) initially, and then, maybe later, evaluating whether referencing named variables from dbg.value is worth the added benefit, would be the right way to go.

dblaikie · July 21, 2020, 11:25pm

Realized I didn't document the original reviews that motivated this thread:

A stack of reviews, split off from here: https://reviews.llvm.org/D69787

Alok's posted a new patch (with smaller patches split off from the
monolithic one) here: https://reviews.llvm.org/D84112

I haven't had a chance to page in all the old context, nor look at the
new ones in detail yet. But probably worth keeping high level design
review here, I think? Once the general direction seems good, we can go
into the separate review threads for the implementation/mechanical
details.

Jeremy_Morse · July 30, 2020, 11:59am

Hi,

I've taken a look at the patches (thanks Alok) and will submit
comments in a bit,

David wrote:

I haven't had a chance to page in all the old context, nor look at the
new ones in detail yet. But probably worth keeping high level design
review here, I think? Once the general direction seems good, we can go
into the separate review threads for the implementation/mechanical
details.

I think the latest patch series matches what came out of the
discussion above, as you described it:

I would expect this to be handled with a general OP saying "hey, I'm
skipping one level of indirection in the resulting value,
because that indirection is missing/not in the final program" and that this
would be encoded in a llvm.dbg.value/DIExpression as usual, without the
need for new IR intrinsics, though possibly with the need for an LLVM
extension DWARF OP (DW_OP_LLVM_explicit_pointer?)

That's what's been implemented: whenever an alloca is promoted,
variable locations that used the alloca's address are transformed into
promoted-value variable locations in the usual way, but with a
DW_OP_LLVM_explicit_pointer at the front of the expression to indicate
"the pointer is absent, but this is what it would have pointed at".
Simple case:

  %foo = alloca i32
  dbg.declare(%foo, !123, !DIExpression())
  dbg.value(%foo, !456, !DIExpression())
  store i32 0, i32* %foo

Where !123 is a plain i32 source variable, and !456 a pointer-to-i32
source variable. When %foo is promoted, these would become:

  dbg.value(i32 0, !123, !DIExpression())
  dbg.value(i32 0, !456, !DIExpression(DW_OP_LLVM_explicit_pointer))

When it comes to the IR way of modelling these things, I think that
this matches the discussion, and is a lightweight way of representing
what's going on.
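
The rewrite described above can be sketched with a toy C++ model. The names here (`ToyDbgValue`, `rewriteOnPromotion`) are hypothetical and this is not the LLVM API or the code from the patches; it only mirrors the example: swap the alloca's address for the promoted value and put DW_OP_LLVM_explicit_pointer at the front of the expression.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a dbg.value record; this is NOT the LLVM API.
struct ToyDbgValue {
    std::string value;             // e.g. "%foo" or "i32 0"
    std::string variable;          // e.g. "!456"
    std::vector<std::string> ops;  // DIExpression elements, in order
};

// Rewrite a location that used the alloca's address so that it refers to
// the promoted value instead, marking the missing pointer by placing
// DW_OP_LLVM_explicit_pointer at the front of the expression.
inline ToyDbgValue rewriteOnPromotion(const ToyDbgValue &old,
                                      const std::string &promotedValue) {
    ToyDbgValue result = old;
    result.value = promotedValue;
    result.ops.insert(result.ops.begin(), "DW_OP_LLVM_explicit_pointer");
    assert(!result.ops.empty());
    return result;
}
```

Applied to the pointer variable !456 from the example, `rewriteOnPromotion({"%foo", "!456", {}}, "i32 0")` yields a record whose value is `i32 0` and whose expression starts with the explicit-pointer op.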

I have some reservations about further down the compiler though:
artificial variables get created at isel time, which seems early to
me, and duplicates the work for each instruction selector. Is there a
reason why it can't be done in the DWARF emitter? The artificial
variables are also tracked with additional DBG_VALUE instructions, if
we could push artificial variable creation back to emission time then
we wouldn't have to answer questions such as "what is the lifetime of
a DBG_VALUE of an artificial variable?"

At promotion time: some of the handling of variable promotion appears
to happen within Instruction::eraseFromParent, which seems out of
place. I reckon you've missed the calls in PromoteMemoryToRegister.cpp
to the ConvertDebugDeclareToDebugValue helpers -- shifting the
promotion handling there would be better, and not dependent on the
order that things are erased in. I think those ConvertDebug... helper
functions and the two other functions you've instrumented in the same
file should be sufficient to catch all promotions.

Additionally, I believe that promoted allocas are getting
DW_OP_LLVM_explicit_pointer dbg.values generated for any pointer that
_ever_ points at it. You'll need to consider circumstances where
pointer variables have multiple values, i.e.:

  int foo, bar, baz;
  int *qux = &foo;
  qux = &bar;
  qux = &baz;
  foo = 1;
  bar = 2;
  baz = 3;

If I understood the code correctly, 'qux' will have implicit-pointer
values for each of the assignments to foo / bar / baz, where it should
only have a dbg.value for the assignment to 'baz'. (It might be
alright to limit handling to scenarios where a pointer variable only
ever has one value, and then expand what can be handled later).
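
To illustrate the concern, here is a toy C++ model of the qux example. It is a sketch under loud assumptions: `ToyEvent` and both functions are invented for this illustration and do not reflect how the patches or LLVM track pointers. It contrasts an "ever pointed at" rule with one that follows only the pointer's current target.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy event stream; NOT LLVM's representation.
struct ToyEvent {
    enum Kind { PointAt, Store } kind;
    std::string var;  // variable pointed at, or variable stored to
    int value;        // only meaningful for Store
};

// Naive rule: emit a record for every store to any variable the pointer
// EVER pointed at, regardless of ordering.
inline std::vector<int> naiveRecords(const std::vector<ToyEvent> &events) {
    std::set<std::string> everPointedAt;
    for (const ToyEvent &e : events)
        if (e.kind == ToyEvent::PointAt) everPointedAt.insert(e.var);
    std::vector<int> records;
    for (const ToyEvent &e : events)
        if (e.kind == ToyEvent::Store && everPointedAt.count(e.var))
            records.push_back(e.value);
    return records;
}

// Flow-sensitive rule: only a store to the CURRENT pointee emits a record.
inline std::vector<int> currentPointeeRecords(const std::vector<ToyEvent> &events) {
    std::string current;
    std::vector<int> records;
    for (const ToyEvent &e : events) {
        if (e.kind == ToyEvent::PointAt) current = e.var;
        else if (e.var == current) records.push_back(e.value);
    }
    return records;
}

// The sequence from the example: qux = &foo; qux = &bar; qux = &baz;
// then foo = 1; bar = 2; baz = 3.
inline std::vector<ToyEvent> quxExample() {
    return {
        {ToyEvent::PointAt, "foo", 0}, {ToyEvent::PointAt, "bar", 0},
        {ToyEvent::PointAt, "baz", 0}, {ToyEvent::Store, "foo", 1},
        {ToyEvent::Store, "bar", 2},   {ToyEvent::Store, "baz", 3},
    };
}
```

On `quxExample()`, the naive rule produces a record for each of the three stores, while the flow-sensitive rule produces only the one for the store to baz.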

dblaikie · September 24, 2020, 2:56am

[+CC some folks interested in optimized debug info variable locations]

Hi,

I've taken a look at the patches (thanks Alok) and will submit
comments in a bit,

Thanks for that! Looks like you've hit a lot of the usual/important
bits, for sure.

David wrote:
> I haven't had a chance to page in all the old context, nor look at the
> new ones in detail yet. But probably worth keeping high level design
> review here, I think? Once the general direction seems good, we can go
> into the separate review threads for the implementation/mechanical
> details.

I think the latest patch series matches what came out of the
discussion above, as you described it:

> I would expect this to be handled with a general OP saying "hey, I'm
> skipping one level of indirection in the resulting value,
> because that indirection is missing/not in the final program" and that this
> would be encoded in a llvm.dbg.value/DIExpression as usual, without the
> need for new IR intrinsics, though possibly with the need for an LLVM
> extension DWARF OP (DW_OP_LLVM_explicit_pointer?)

That's what's been implemented: whenever an alloca is promoted,
variable locations that used the alloca's address are transformed into
promoted-value variable locations in the usual way, but with a
DW_OP_LLVM_explicit_pointer at the front of the expression to indicate
"the pointer is absent, but this is what it would have pointed at".
Simple case:

  %foo = alloca i32
  dbg.declare(%foo, !123, !DIExpression())
  dbg.value(%foo, !456, !DIExpression())
  store i32 0, i32* %foo

Where !123 is a plain i32 source variable, and !456 a pointer-to-i32
source variable. When %foo is promoted, these would become:

  dbg.value(i32 0, !123, !DIExpression())
  dbg.value(i32 0, !456, !DIExpression(DW_OP_LLVM_explicit_pointer))

When it comes to the IR way of modelling these things, I think that
this matches the discussion, and is a lightweight way of representing
what's going on.

Awesome. I noticed there's also implicit_pointer as well as
explicit_pointer - what's the distinction there and are the two
concepts related enough to have some shared implementation
concerns/merit shared review?

Also, it looked like (at a glance) the LLVM_explicit (or implicit?)
pointer thing took an integer parameter, which I think is the amount
of indirection (so an int** collapsed down to an int would have a
value of '2' for this int parameter?) - minor discussion about whether
that's worthwhile, or whether we should stack LLVM_explicit_pointers
on top of each other to represent this case?
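
For what it's worth, the two encodings being weighed here carry the same information. The following C++ sketch models expressions as lists of strings; the function names are hypothetical, and the operand layout of the counted form is an assumption for illustration, not something confirmed from the patches.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Encoding 1 (assumed): one DW_OP_LLVM_explicit_pointer per collapsed level.
inline std::vector<std::string> stackedForm(unsigned depth) {
    return std::vector<std::string>(depth, "DW_OP_LLVM_explicit_pointer");
}

// Encoding 2 (assumed): a single op followed by an integer operand.
inline std::vector<std::string> countedForm(unsigned depth) {
    return {"DW_OP_LLVM_explicit_pointer", std::to_string(depth)};
}

// Recover the depth from the stacked form by counting leading ops.
inline unsigned depthOfStacked(const std::vector<std::string> &ops) {
    unsigned depth = 0;
    for (const std::string &op : ops) {
        if (op != "DW_OP_LLVM_explicit_pointer") break;
        ++depth;
    }
    return depth;
}

// Recover the depth from the counted form by reading the operand.
inline unsigned depthOfCounted(const std::vector<std::string> &ops) {
    if (ops.size() < 2 || ops[0] != "DW_OP_LLVM_explicit_pointer") return 0;
    const unsigned long depth = std::stoul(ops[1]);
    assert(depth > 0);
    return static_cast<unsigned>(depth);
}
```

For an `int **` collapsed down to an `int`, both `depthOfStacked(stackedForm(2))` and `depthOfCounted(countedForm(2))` recover a depth of 2, so the choice is mostly about which form is easier to build and update in the passes that touch these expressions.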

I have some reservations about further down the compiler though:
artificial variables get created at isel time, which seems early to
me, and duplicates the work for each instruction selector.

Is this consistent with how we do dynamic array bounds (void f1(int i)
{ int x[i]; }), which I think is one of the things that we modeled
this idea from? Or perhaps it is done for array dimensions but isn't
suitable to this new use case as much?

Is there a
reason why it can't be done in the DWARF emitter? The artificial
variables are also tracked with additional DBG_VALUE instructions, if
we could push artificial variable creation back to emission time then
we wouldn't have to answer questions such as "what is the lifetime of
a DBG_VALUE of an artificial variable?"

This/these sort of questions I'd like to punt to Adrian and other
folks who have had more investment in optimized debug info locations
in the last couple of years... I hope that isn't a cop-out, it's just
these particular aspects are probably not ones I have the most context
on.

At promotion time: some of the handling of variable promotion appears
to happen within Instruction::eraseFromParent, which seems out of
place. I reckon you've missed the calls in PromoteMemoryToRegister.cpp
to the ConvertDebugDeclareToDebugValue helpers -- shifting the
promotion handling there would be better, and not dependent on the
order that things are erased in. I think those ConvertDebug... helper
functions and the two other functions you've instrumented in the same
file should be sufficient to catch all promotions.

Additionally, I believe that promoted allocas are getting
DW_OP_LLVM_explicit_pointer dbg.values generated for any pointer that
_ever_ points at it. You'll need to consider circumstances where
pointer variables have multiple values, i.e.:

  int foo, bar, baz;
  int *qux = &foo;
  qux = &bar;
  qux = &baz;
  foo = 1;
  bar = 2;
  baz = 3;

If I understood the code correctly, 'qux' will have implicit-pointer
values for each of the assignments to foo / bar / baz, where it should
only have a dbg.value for the assignment to 'baz'. (It might be
alright to limit handling to scenarios where a pointer variable only
ever has one value, and then expand what can be handled later).

Sounds like something to be aware of/see some test cases for, for sure.

Much thanks!
- Dave
