RFC for DIExpression editing

Problem

The current interfaces for building/editing/querying DIExpressions are fragmented, low-level, and require throwaway work which leaves “orphaned” uniqued nodes in LLVMContext.

Survey of Status Quo

Operations over DIExpressions are generally implemented in a functional style, because a DIExpression is always uniqued and so immutable. However, this has a few downsides:

  • Operations which would be more efficiently implemented as mutation cannot be.
  • The input to every operation has to be uniqued, as that’s the only way to get a DIExpression *. This costs both time and space (which is not reclaimed until the LLVMContext is destroyed).
    • As a corollary, composing operations back-to-back means creating intermediate DIExpression *s that are never actually used.
    • In my testing this isn’t a huge deal in most cases, and even 40-50% “wasted” expressions can represent almost nothing in terms of e.g. RSS, but if we are improving the interface it should address this inefficiency.
  • There is no clear “home” for operations, with some being free functions, some members, some static members.

When these costs are high enough (or the author won’t tolerate them) we get functions which work on a mutable proxy for a DIExpression, e.g. SmallVectorImpl<uint64_t> &.

As an example of the issue, Introduce DIExpression::foldConstantMath() by rastogishubham · Pull Request #71718 · llvm/llvm-project optimizes large expressions, but still has to unique the unoptimized expressions first, only to orphan the intermediate DIExpression (which then lives in the LLVMContext forever).

There is also no abstraction over the vector-of-uint64_t representation, even though all of the operations actually in use in LLVM could readily be statically typed. This may be more or less critical to others, but if we’re fixing the other issues it seems like something to address. Hiding as much as reasonable also means further optimizations can be done without touching the whole codebase.

Prior Art

There is already DIExpression::ExprOperand which improves the vector-of-uint64_t situation with about as little overhead as possible.

There is even a TODO on that type:

TODO: Store arguments directly and change \a DIExpression to store a range of these. 

This RFC is effectively trying to set us on a path where we can make good on that TODO.

Proposal

I tried to look for patterns in the current code and extract types whose interfaces allow the current implementations to drop in largely unchanged wherever possible.

At a high level there are two kinds of operations related to DIExpression in the codebase today:

  • “Query” operations, which don’t yield a new expression. These include some which yield subexpressions like DIExpression::getExtOps.
  • “Edit” operations which yield a new expression.

These are mapped to immutable+non-owning (DIExprRef) and mutable+owning (DIExprBuf) types, respectively. I see these as analogous to the LLVM SmallVector/ArrayRef, the Java String/StringBuilder, the Rust &str/String, etc.

To represent a single operation, there is a variant-like type DIOp::Op. Its alternative types are all in the DIOp namespace, and have PascalCase-ified names corresponding to DW_OP_* ops already in use e.g. DIOp::PlusUConst(uint64_t). This is not a std::variant because it is not possible to pack the “tag” with the data members of the alternative types, and because it isn’t possible to meaningfully derive from std::variant in C++17.
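
To make that shape concrete, here is a minimal sketch of what the DIOp namespace could look like; the member names and the holds/get accessors are placeholders of my own, not the patch's exact API:

// Hypothetical sketch only; names, members, and accessors are placeholders.
#include <cstdint>

namespace DIOp {

// One class per DW_OP_* operation in use, carrying its operands by value.
class PlusUConst {
  uint64_t Offset;
public:
  explicit PlusUConst(uint64_t Offset) : Offset(Offset) {}
  uint64_t getOffset() const { return Offset; }
};

class Deref {};

// Variant-like wrapper: the discriminant is packed next to the payload so the
// whole object stays within two quadwords (unlike std::variant's layout).
class Op {
  uint64_t Tag;
  uint64_t Storage;
public:
  template <typename AltT> bool holds() const; // is AltT the active alternative?
  template <typename AltT> AltT get() const;   // copy out the active alternative
};

} // namespace DIOp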

Currently the DIExprRef is 4-quadwords, but could be optimized to 2-quadwords. It works in terms of an iterator over a range of uint64_ts, where the iterator lazily constructs DIOp::Ops.

While iterating, if an unknown operation is encountered, the iterator yields a range-like DIOp::Escape over the remainder of the underlying vector. This seemed preferable to the Error-based fallible-iterator pattern, which is surprisingly difficult to get right, and which asks more of each call-site. This does leak the underlying representation, and I’m not actually sure what use the range would be, so maybe just a unit-type DIOp::Error could be used here instead.
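
As a usage sketch (building on the placeholder DIOp types above), a query over a DIExprRef might look like the following, with the Escape handling mirroring the behaviour described in this paragraph:

// Hypothetical usage sketch; holds/get are placeholder accessors.
uint64_t sumConstantOffsets(DIExprRef Ref) {
  uint64_t Total = 0;
  for (DIOp::Op Op : Ref) {                        // iterator decodes uint64_ts lazily
    if (Op.holds<DIOp::PlusUConst>())
      Total += Op.get<DIOp::PlusUConst>().getOffset();
    else if (Op.holds<DIOp::Escape>())
      break;                                       // unknown op: the remainder is opaque
  }
  return Total;
}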

The DIExprBuf is as large as an LLVMContext * and two SmallVector<uint64_t, 0>, as it allows for remembering the context of the input expression, and it bakes in double-buffering which is common to many edit operations. It currently does not handle the single-operation case optimally, as it eagerly copies the source expression into a buffer at construction. I believe this can be optimized behind the current interface so I didn’t worry about it for the initial version.
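
For the edit side, the intended flow is roughly as follows; append and intoExpression are placeholder names for whatever the patch ends up calling these operations:

// Hypothetical edit sketch; edits stay in the buffer and nothing is uniqued
// until the final conversion back to a DIExpression.
DIExpression *appendConstantOffset(const DIExpression *Expr, uint64_t Offset) {
  DIExprBuf Buf(Expr);                   // remembers Expr's LLVMContext, copies its ops
  Buf.append(DIOp::PlusUConst(Offset));  // mutate the buffer, no intermediate node
  return Buf.intoExpression();           // unique the result exactly once
}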

A couple of additional constraints I’ve placed on this version are:

  • Leave the at-rest DIExpression representation alone. Always get back to vector<uint64_t>. This decision was made to make an incremental transition easier. Downstream users can continue to use the old interfaces initially, tests in terms of DIExpressions can remain as-is and validate the correctness of the new APIs, and the performance characteristics of the new approach will be easier to reason about relative to the current implementation.
  • Keep DIOp::Op no larger than 2-quadwords. Working with value types is convenient, and along with some other restrictions this should allow the type to be passed around in register classes in SysV/Itanium. This also opens up the possibility of switching the underlying encoding over to just SmallVector<DIOp::Op> without a significant increase in memory usage.
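
The second constraint could be pinned down directly in code, assuming the two-quadword Op layout sketched above:

// Hypothetical compile-time guard for the size constraint above.
static_assert(sizeof(DIOp::Op) <= 2 * sizeof(uint64_t),
              "DIOp::Op should stay within two quadwords");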

Request

I have been spending a long time iterating on this without any feedback, so I wanted to share it in its current state and request high-level thoughts/feedback. The code is currently at [WIP][NFC] Start to hide details of DIExpression by slinder1 · Pull Request #161716 · llvm/llvm-project · GitHub and the compile-time-tracker results are under slinder1/perf/diexpr-ref-buf. The current numbers aren’t ideal, but I don’t think the performance issues are intrinsic to the approach.

One final note: there are a few different approaches taken in various places, with some implementations in DIExprRef/DIExprBuf essentially lifted from the current code and some written in terms of the FromUIntIterator. I left the different approaches side-by-side so those reviewing can compare; I’m not sure which is best, and I think it depends on whether we intend to end up with a SmallVector<DIOp::Op>-like representation for DIExpression or not.


Do you have a link with just your diff so it’s easy to see the changes in the interface and implementation?

That link should be to just my diff, although it is based on a now slightly-old main.

That branch only has a single commit, [WIP][NFC] Start to hide details of DIExpression · slinder1/llvm-project@27fe700 · GitHub although I can split things up somewhat if the diff is too impenetrable. Maybe separating the actual interfaces being introduced from the many changes in passes and things like salvageDebugInfo?

The core of the proposal is implemented in llvm/include/llvm/IR/DebugInfoExprs.h and llvm/lib/IR/DebugInfoExprs.cpp, and that’s really what I’m hoping to get feedback on.

Edit: I also just realized there isn’t a direct way for anyone to give feedback on it when it is just a patch, so I’ll create a WIP PR. I should have it up tomorrow.

Thanks for putting up this proposal! I think it is a good idea to reduce the intermediate DIExpressions that are created and improve the current interface.

I remember when I wrote DIExpression::foldConstantMath, one of the big issues I faced was that, since DIExpressions are just a vector of uint64_ts, there is no way to know whether a specific uint64_t is an operation or an operand, and the only way to find out was to start at the beginning of the DIExpression and use the DIExpressionCursor to move forward.

Since we want to create a new API for mutations, do you think it would be feasible to design something where we would be able to tell the difference between an operand and operation in the DIExpression instead of having to start at the beginning every time?

Thanks for putting up this proposal! I think it is a good idea to reduce the intermediate DIExpressions that are created and improve the current interface.

Of course, I’m glad there is some appetite for it!

Since we want to create a new API for mutations, do you think it would be feasible to design something where we would be able to tell the difference between an operand and operation in the DIExpression instead of having to start at the beginning every time?

I didn’t highlight this as one of the defining motivations, because for the smaller expressions that tend to show up when e.g. just dogfooding LLVM, the cost of re-walking the vectors didn’t stand out when profiling, but it was something I thought about.

If we move to a representation based on something like DIOp::Op then this will just fall out. Each DIOp::Op is a single operation which encapsulates its operands.
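
For example, with a SmallVector<DIOp::Op> backing a DIExprBuf, a fold could inspect the last appended operation directly rather than re-walking from the front. This is a hypothetical sketch, assuming Ops is that vector and reusing the placeholder holds/get accessors and DIOp::Op construction from earlier:

// Hypothetical sketch: fold a new constant offset into the trailing op.
if (!Ops.empty() && Ops.back().holds<DIOp::PlusUConst>()) {
  // The operand travels with the op, so no cursor walk from the front is needed.
  uint64_t Prev = Ops.back().get<DIOp::PlusUConst>().getOffset();
  Ops.back() = DIOp::Op(DIOp::PlusUConst(Prev + NewOffset));
}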

An earlier version of the patch actually used this representation in DIExprBuf, and converted it when making a DIExpression. In effect, rather than the SmallVector<uint64_t, 0> currently at [WIP][NFC] Start to hide details of DIExpression · slinder1/llvm-project@27fe700 · GitHub it was SmallVector<DIOp::Op, 0>.

The reason I dropped this was for simplicity and code-reuse, and because there is a cost to the added translations which is often not offset by the benefit of gaining random access when mutating. We can revisit it as an alternative to changing the underlying representation of DIExpression *, though.


I remember when writing DIExpression::foldConstantMath, one of the most annoying things was that we had to start from the beginning of the DIExpression, because there was no way to tell, working backward from the end, where the last operation's operands began.

The original idea was to only perform fold operations on new appends by looking backward. So a change that removes the ambiguity between what is an operation and what is an operand would be really great!

I posted a WIP PR at [WIP][NFC] Start to hide details of DIExpression by slinder1 · Pull Request #161716 · llvm/llvm-project · GitHub and updated the original post. I tried to split things up to make it more approachable.

Thanks for working on this, an update for DIExpressions has been a long time coming.

I think broadly speaking, I’m happy with the direction of normalizing all DIExpression operations, and as part of that process separating out read and write operations into different classes. One minor ergonomic suggestion I’d make is to change the API so that instead of explicitly constructing a DIExprRef from a DIExpression, we simply return a DIExprRef from a DIExpression::getExpression function or something similar. It could work as a canonical view of the DIExpression contents, rather than a separate class that has to be explicitly constructed.

Using static classes for the DIOps has benefits for code that actually operates on DIExpressions; I’m not sure that we should look to replace the DIExpression “at rest” representation at any point though - the current representation is simple and compact (it seems very unlikely that we could ever make the DIOp[] representation smaller than the uint64_t[] representation for most expressions) and since it’s trivial to construct a view of DIOps from the DIExpression, it’s not necessary for usability. Also, I’m not certain if we ought to guard against invalid expressions in the general case with LLVMEscape - we use DIExpression::isValid in a bunch of places so checking validity once upfront shouldn’t be much of an extra cost; a comparable solution here may be to just make acquiring a DIExprRef fallible, e.g. std::optional<DIExprRef> DIExpression::getExpression().
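
Concretely, something along these lines; this is just a sketch of the shape being suggested, not the patch's current API:

// Hypothetical shape of the suggestion: one validity check, cached in the view.
if (std::optional<DIExprRef> View = Expr->getExpression()) {
  for (DIOp::Op Op : *View)
    visit(Op);                 // validity was already checked once, above
} // else: invalid expression, handled up front by the caller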

I also wonder whether it might be better to rework DIExprBuf as a DIExprBuilder class - basically keeping it the same, but allowing it to be constructed with just an LLVMContext* and optionally a list of DIOp so that it can be used for constructing expressions from scratch as well as modifying existing expressions. This might not have any useful applications in LLVM, and can definitely be skipped if that’s the case, but it seems like modifying and constructing expressions would have a common base.

As a side note not directly related to this change, I believe DIExpressions could be even more compact - if we modified the DIExpression class to no longer be an MDNode (as discussed elsewhere it doesn’t really make use of MDNode features), and allocated the uint64_t elements adjacent to the DIExpression rather than as a separate pointer (similar to the MDNode override of operator new), we could save some space and reduce indirection. I don’t think the size of DIExpressions is a major performance issue right now though, so it probably wouldn’t be a priority.

One minor ergonomic suggestion I’d make is to change the API so that instead of explicitly constructing a DIExprRef from a DIExpression, we simply return a DIExprRef from a DIExpression::getExpression function or something similar.

a comparable solution here may be to just make acquiring a DIExprRef fallible, e.g. std::optional<DIExprRef> DIExpression::getExpression().

I did have a DIExpression::asRef method along these lines, but removed it during some refactoring to get better decoupling so that e.g. translation units which #include "DebugInfoMetadata.h" but do not deal with expressions specifically don’t pay the cost of including the new definitions. That’s the reason for the DebugInfoCommon and DebugInfoExprs split (although I think I can merge DebugInfoCommon back into DebugInfoMetadata to simplify this, will report back).

However, this method was not fallible, because that requires eager iteration of the whole expression to confirm a DIExprRef can represent it. In general, repeatedly iterating over these expression vectors is cheap, so maybe the ergonomic benefits outweigh the small cost? I can make the change and benchmark it, since I agree it would be simpler.

the current representation is simple and compact (it seems very unlikely that we could ever make the DIOp[] representation smaller than the uint64_t[] representation for most expressions) and since it’s trivial to construct a view of DIOps from the DIExpression, it’s not necessary for usability.

I agree uint64_t[] is a pretty compact and elegant representation, but the DIOp[] representation seems to only consume marginally more memory on average. It can also be offset by a reduction in redundant expressions being uniqued.

I will bundle up the patches and data I used to get these numbers, but for example when dogfooding LLVM in RelWithDebInfo using a compiler built to spit out stats in LLVMContext::~LLVMContext I see:

# DIOp[] size / uint64_t[] size
df['diop_cost_ratio'] = ((df.num_ops * 16) / (df.num_bytes))
df['diop_cost_ratio'].describe()
count    1.531823e+06
mean     1.112958e+00
std      2.375864e-01
min      6.666667e-01
25%      1.000000e+00
50%      1.200000e+00
75%      1.333333e+00
max      2.000000e+00
Name: diop_cost_ratio, dtype: float64

In other words, while the worst case is 2x, the average is only ~11% higher memory usage.
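
To make the ratio concrete: each DIOp::Op occupies 16 bytes, so an operation with no operands (e.g. DW_OP_deref, one uint64_t today) comes out at 2.0, an operation with one operand (e.g. DW_OP_plus_uconst, two uint64_ts) at 1.0, and DW_OP_LLVM_fragment with its two operands (three uint64_ts) at ~0.67.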

Admittedly the best-case (6.666667e-01) of this is driven by DW_OP_LLVM_fragment having two operands and often appearing alone:

df.sort_values(by=['diop_cost_ratio']).head()
         actually_used  num_ops  num_bytes  max_rss    expr                                          compilation_id  diop_cost_ratio
846379   True           1        24         624836608  !DIExpression(DW_OP_LLVM_fragment, 512, 64)   8d0e77cb8b      0.666667
915141   True           1        24         525606912  !DIExpression(DW_OP_LLVM_fragment, 576, 160)  06f9620534      0.666667
915143   False          1        24         525606912  !DIExpression(DW_OP_LLVM_fragment, 192, 192)  06f9620534      0.666667
915144   True           1        24         525606912  !DIExpression(DW_OP_LLVM_fragment, 0, 224)    06f9620534      0.666667
1284721  True           1        24         299380736  !DIExpression(DW_OP_LLVM_fragment, 64, 32)    a211ebfce7      0.666667

However, even with all these cases filtered out (as might be the case when we move to encoding fragments elsewhere in the metadata) the mean cost only rises to ~25%:

df[~df['expr'].str.contains('DW_OP_LLVM_fragment')]['diop_cost_ratio'].describe()
count    1.011389e+06
mean     1.241406e+00
std      1.570306e-01
min      6.666667e-01
25%      1.200000e+00
50%      1.285714e+00
75%      1.333333e+00
max      2.000000e+00
Name: diop_cost_ratio, dtype: float64

I’m not saying we definitely should switch, but the cognitive load would be reduced, and the costs of translating between representations would disappear.

I also wonder whether it might be better to rework DIExprBuf as a DIExprBuilder class - basically keeping it the same, but allowing it to be constructed with just an LLVMContext* and optionally a list of DIOp so that it can be used for constructing expressions from scratch as well as modifying existing expressions. This might not have any useful applications in LLVM, and can definitely be skipped if that’s the case, but it seems like modifying and constructing expressions would have a common base.

If I’m understanding correctly, this is the current intent! These are the public constructors for DIExprBuf:

DIExprBuf() = default;
explicit DIExprBuf(LLVMContext *Ctx);
explicit DIExprBuf(const DIExpression *From);

The hope was a single DIExprBuf could be reused in many cases (e.g. when a pass is transforming many expressions), so for ergonomic reasons there are also methods to “assign” into one:

/// Clear the buffer and assign From into it as-if it were being newly
/// constructed. Useful where many expressions are manipulated to amortize
/// allocation costs.
DIExprBuf &assign(const DIExpression *From);
DIExprBuf &assign(LLVMContext *Ctx, DIExprRef From);
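
The reuse pattern in a pass would then look roughly like the following sketch; assign is quoted above, while the append/intoExpression names and the DbgUsers range are placeholders of mine:

// Hypothetical reuse sketch: one buffer amortized over many expressions.
DIExprBuf Buf;
for (DbgVariableIntrinsic *DVI : DbgUsers) {
  Buf.assign(DVI->getExpression());   // re-fill the existing allocation
  Buf.append(DIOp::Deref());          // some placeholder edit
  DVI->setExpression(Buf.intoExpression());
}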

I did just notice that I didn’t update the actual constructor to accept a DIExprRef, and I agree it should be optional. I’ll clean that up.


Here’s the hack I based the numbers in the last post on: Comparing llvm:main…slinder1:upstreaming-debug-info_benchmarking · llvm/llvm-project

I also made a few edits with an eye toward closing the performance gap in ReleaseLTO-g; I overwrote slinder1/perf/diexpr-ref-buf_squashed:

LLVM Compile-Time Tracker

instructions:u
Commit                   67d25eee98          c4e7da3da5
stage1-O3                60554M (-0.02%)     60563M
stage1-ReleaseThinLTO    76249M (+0.00%)     76246M
stage1-ReleaseLTO-g      89006M (-0.04%)     89038M
stage1-O0-g              18520M (+0.01%)     18519M
stage1-aarch64-O3        67536M (+0.00%)     67536M
stage1-aarch64-O0-g      22600M (-0.01%)     22603M
stage2-O3                52437M (-0.04%)     52460M
stage2-O0-g              16219M (-0.01%)     16221M
stage2-clang             34618385M (+0.07%)  34592728M

There are also some meaningful RSS improvements in some cases:

max-rss
Commit                   67d25eee98         c4e7da3da5
stage1-O3                2780MiB (-0.82%)   2803MiB
stage1-ReleaseThinLTO    2712MiB (-0.57%)   2727MiB
stage1-ReleaseLTO-g      3280MiB (-0.77%)   3305MiB
stage1-O0-g              2605MiB (-0.90%)   2629MiB
stage1-aarch64-O3        2974MiB (-0.68%)   2995MiB
stage1-aarch64-O0-g      2596MiB (-0.78%)   2617MiB
stage2-O3                2590MiB (+0.17%)   2586MiB
stage2-O0-g              2493MiB (+0.14%)   2490MiB
stage2-clang             2321MiB (+0.91%)   2300MiB

I will update and clean up the patch series.


Yes, there’s a tradeoff involved in storing uint64_t and iterating through the expression every time we fetch a view. My thinking on the usage characteristics of DIExpressions, however, is that they spend most of their time “at rest”, are occasionally read to check some property, and are infrequently modified/created. Many of the methods that read properties of DIExpressions already make a validity check every time they’re called, so doing so once whenever we acquire a reference for a particular DIExpression might be more efficient than the current model, since we effectively cache the validity check.

I’m somewhat surprised that the cost of including the new code is so significant - DebugInfoMetadata.h in itself is a very expensive include that we usually go out of our way to avoid including (e.g. with DebugLoc), so it’s surprising that adding an extra include on top of that would be noticeable. I suppose there is a new .def file though, which tends to rack up costs a bit; maybe it’d be reasonable to just move DIExpression to a new file altogether?

It certainly could be worth it; my preference though (which need not be followed) would be to first implement this with a conversion every time the view is constructed, and then look at moving DIOp[] into DIExpressions as a followup change, so we can get more concrete comparisons. My current suspicion is that the runtime cost will be insignificant, while the RSS improvement will be small but noticeable, but it’ll be hard to know for sure without doing it!

Yes, there’s a tradeoff involved in storing uint64_t and iterating through the expression every time we fetch a view. My thinking on the usage characteristics of DIExpressions, however, is that they spend most of their time “at rest”, are occasionally read to check some property, and are infrequently modified/created. Many of the methods that read properties of DIExpressions already make a validity check every time they’re called, so doing so once whenever we acquire a reference for a particular DIExpression might be more efficient than the current model, since we effectively cache the validity check.

Makes sense, and going one step further: if we can require validity for all DIExpressions, we could do the check once while uniquing, and constructing the view from a DIExpression could just assert validity.

The check in DIExpression::isValid is stronger than what the view requires, so we could allow e.g. syntactically-valid but semantically-invalid DIExpressions here if we needed to.

I’m somewhat surprised that the cost of including the new code is so significant - DebugInfoMetadata.h in itself is a very expensive include that we usually go out of our way to avoid including (e.g. with DebugLoc), so it’s surprising that adding an extra include on top of that would be noticeable. I suppose there is a new .def file though, which tends to rack up costs a bit; maybe it’d be reasonable to just move DIExpression to a new file altogether?

Yes, and the strongly-typed operations mean a lot of small method bodies which appear to be a significant bottleneck when profiling with callgrind. I haven’t thought of a real way to mitigate this, and it is a significant drawback of the approach.

maybe it’d be reasonable to just move DIExpression to a new file altogether?

This sounds like the best option to me.

It certainly could be worth it; my preference though (which need not be followed) would be to first implement this with a conversion every time the view is constructed, and then look at moving DIOp[] into DIExpressions as a followup change, so we can get more concrete comparisons. My current suspicion is that the runtime cost will be insignificant, while the RSS improvement will be small but noticeable, but it’ll be hard to know for sure without doing it!

I completely agree that the first step should not touch the DIExpression representation!

If we then decide never to touch it then at least most code doesn’t need to worry about it anymore. If we do decide to make the change it should be significantly easier once most code no longer depends on the internals.
