- Notifications
You must be signed in to change notification settings - Fork 5.2k
Generalize loop inversion #50982
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Generalize loop inversion #50982
Conversation
| @AndyAyersMS @dotnet/jit-contrib PTAL |
Curious about regex redux... how much faster? |
My local run says 12%. I don't have too much experience running these locally, so I'm not confident about what the noise level is. |
| btw, all my perf runs and diffs were Windows x64. |
AndyAyersMS left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this looks fine, just some small things for follow-up.
src/coreclr/jit/optimizer.cpp Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe leave a breadcrumb here describing why 34 is a plausible choice?
Have you looked at code size growth as a function of this number? Would be curious to see how sensitive we are to changes here. (may be hard to assess with SPMI, given it will only show deltas for diffs).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you looked at code size growth as a function of this number?
I haven't done a study, but it shouldn't be too hard to do. Although graphing against code size doesn't tell much about generated code performance, possibly.
may be hard to assess with SPMI, given it will only show deltas for diffs
Wouldn't that be sufficient? If we want to know overall code size growth percentage, it wouldn't be hard to compute the total baseline code size once and apply the diff on top of that.
Might also be interesting to graph the number against things like CSEs, hoisted code, cloned loops.
Not sure how much time it's worth spending on something like this, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
graphing against code size doesn't tell much about generated code performance...
This is one of those "optimizations" that is really an enabling transformation -- not particularly beneficial on its own, possibly even a pessimization (at least in size, maybe not in cycles), but one that can lead to very profitable optimizations later.
So any sort of "growth vs threshold" chart would be interesting to see. If it's just the absolute growth and not the relative growth, fine... as you say we can figure out a rough baseline for the unchanged methods.
So my thought is that we should do this transformation as aggressively as we can without causing undue code size increases. I am curious where we sit currently on the spectrum (I don't really know) and whether there is some sort of inflection point or points out there possibly reflecting common language patters (say a dramatic jump in the curve at some threshold value)...
src/coreclr/jit/optimizer.cpp Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any idea what this costs in terms of throughput? Loops are rare, and loops with large complex exit conditions are rare, so it seems unlikely to matter, but still....
I wonder if we could have some sort of early out here if the cost gets so large we can't imagine being able to shrink it back down with the discounts we try to apply later, just to avoid wasting time analyzing things we won't ever optimize.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any idea what this costs in terms of throughput?
I can do some pin measurements.
I wonder if we could have some sort of early out here if the cost gets so large we can't imagine being able to shrink it back down with the discounts we try to apply later, just to avoid wasting time analyzing things we won't ever optimize.
Yeah, I was wondering about something like that. Might be easy to spit out stats for estimated cost versus max cost across some spmi collection.
src/coreclr/jit/optimizer.cpp Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This formula seems a bit odd to me...
- not clear why
loopIterationsfactors in like it does, and adding1.5seems awfully arbitrary - I would expect this loop to reduce the cost estimate rather than increase the maximum we're willing to pay
Not suggesting you change it yet, mind you, but here's a case where it would be nice to keep track of cost and benefit; having an expensive CSE should both reduce cost (less code ends up getting duplicated) and increase benefit (an expensive computation happens just once, not repeatedly).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't follow, nor I see any documentation around the numbers 24, 8 and 1.5. Probably worth mentioning it somewhere and use const variable with appropriate name to describe it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The calculations here, and numbers chosen, are seem somewhat arbitrary. I don't have a good sense as to why they are good choices. The only changes I made to this set of numbers and calculations is:
- Bumped initial
maxDupCostSzfrom 32 to 34. This was specifically to avoid regressions where before we measured the cost of the JTRUE op1, and now we measure the cost of the JTRUE tree itself, which adds a cost of 2. - I added
8 * arrayLengthCountto the max cost, based on appearances of array.Length expressions. This itself was chosen arbitrarily.
So, I don't have a good sense as to why the original numbers of 32, *4, *2, 12.0, 96.0, 24, or 1.5 were chosen, or why loopIterations is used as it is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would expect this loop to reduce the cost estimate rather than increase the maximum we're willing to pay
It does seem like that would make sense, and make the code easier to reason about. Since I was sure how to think about the existing sharedStaticHelperCount calculation, I wasn't sure how to translate to that form.
src/coreclr/jit/optimizer.cpp Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't this be (optInvertTotalInfo.sharedStaticHelperCount > 0)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, the problem is it should be optInvertInfo.arrayLengthCount > 0 -- not a bug, but an inefficiency. The idea is to avoid the recalculation of max cost unless there would be a change.
src/coreclr/jit/optimizer.cpp Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't understand the significance of the comment // for the JitDump output below. We have to store maxDupCostSz regardless, unless I am missing something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, we don't use the stored maxDupCostSz except in the JITDUMP output. costIsTooHigh has already been calculated.
src/coreclr/jit/optimizer.cpp Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't follow, nor I see any documentation around the numbers 24, 8 and 1.5. Probably worth mentioning it somewhere and use const variable with appropriate name to describe it?
| I collected some stats about this change versus a baseline using SPMI:
The loops recognized, loops cloned, and CSEs come from JitTimeLogCsv (I added CSEs), so are per-function. The other numbers are from custom output every time we get past the legality screening of loop inversion and decide whether or not to actually do the inversion based on cost. So it can be seen that quite a few more loops are inverted compared to the baseline. What's odd, and I don't have a good explanation, is the number of recognized loops and CSEs goes down, and the number of cloned loops stays the same. Running "pin" on a SPMI replay of the benchmarks shows a possible 0.1% throughput regression. It's close to the noise range. |
| Interesting. I wonder if inversion + subsequent flow opts breaks recognition in some cases. |
| There are 18 functions in benchmarks (out of 3688 in the "diff" case with any loops) where the number of recognized natural loops changes, 7 more, 11 less:
TBD is why. |
| One case we lose loops, for |
| Here's another experiment. I removed all the "max allowed cost" boosting, and ran SPMI replay over a combination of aspnet + benchmarks + libraries.crossgen2 collections, increasing the allowed max cost loop inversion from 0 to 159. This was with the generalized inversion code of this PR (allowing multiple condition statements to be cloned). The code size went from 52649983 to 53073432 (a 0.8% increase, maximum). The graph: The current max cost of 34 allows about 95% of all code size increases (not counting the boosts we allow for static helpers and array.Length just added). There's a big jump from 21 to 23. After 26, it's very gradual. |
| Another case where we lose loops is |
For cases like these with OR exit conditions, we'd need multiple blocks to be handled during inversions. Basically look at all the backedges into the loop top block, and if any of those come from blocks "below" that first exit... they also need to be duplicated. You can't handle the subsequent blocks quite the same way as the first one. That is, copying the block code into the block before the loop is wrong, you now need a new block.
I wonder if there's some simple way to forestall this, eg settting a bb flag on key blocks during inversion? |
It still feels to me like loop inversion should be done after loop recognition, and done on the loop graph, and maybe this is another reason why.
Possibly we could mark that all "loop blocks" found by inversion, and do some kind of avoidance of block movement of these blocks, at least before loop recognition. Overall, are there additional changes you think need to be made, or things investigated, for this particular change? You can also contact me offline to discuss if preferred. |
Agree.
No, I think this is a good improvement. |
AndyAyersMS left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Starting with an experimental branch from @AndyAyersMS, generalize the loop inversion `optInvertWhileLoop` code to consider duplicating an entire conditional block, not just a conditional block that contains exactly one JTRUE tree. Since the JTRUE itself wasn't considered before in the costing, bump up the maximum cost allowed by 2 to account for that. (Note that there is a lot of room here for tuning the cost/benefit analysis.) Additionally, the code already bumped the allowed cost if the condition tree contained calls to a shared static helper. Add another boost if the tree contains array.Length expressions, which are likely to be CSE'ed. Loop inversion by itself doesn't buy much, but our downstream phases, like natural loop recognition, loop hoisting, and loop cloning, depend on an inverted loop structure with a zero trip test. There are many diffs, typically size regressions because we are duplicating code. Some notable cases: 1. BenchI Puzzle. This was a motivating example, from dotnet#6569. It ends up with multiple new CSEs and BDN shows a 24% run-time improvement. 2. RegexRedux_5 and ReverseComplement_1 also report faster 3. On the contrary, FannkuchRedux_5 reports 23% slower. It appears we create fewer CSEs and end up with a couple more instructions inside some tight loops. 4. Other things typically seen in diffs: (a) removed null and bounds checks, (b) additional CSEs, (c) loop alignment kicks in. The loop matching at the stage inversion runs is lexical and very simplistic, as it is before natural loop recognition. As a result, for some loops, like multiple-condition `while` loops (e.g., `while (--digits >= 0 || value != 0)` in `System.Number:UInt32ToDecChars`), the inversion ends up creating some pretty weird flow. (That's actually not new to this change.) I added a `COMPlus_JitDoLoopInversion` config to allow turning off the phase to do experiments. Here are the diffs for SPMI on the benchmarks: ``` Summary of Code Size diffs: (Lower is better) Total bytes of base: 306240 Total bytes of diff: 306022 Total bytes of delta: -218 (-0.07% of base) diff is an improvement. ``` <details> <summary>Detail diffs</summary> ``` Top file regressions (bytes): 204 : 18210.dasm (13.98% of base) 123 : 17149.dasm (3.84% of base) 115 : 12617.dasm (1.95% of base) 106 : 19256.dasm (5.92% of base) 105 : 8717.dasm (8.19% of base) 94 : 5445.dasm (9.49% of base) 85 : 16452.dasm (1.48% of base) 82 : 19234.dasm (12.01% of base) 76 : 4626.dasm (20.05% of base) 64 : 18620.dasm (6.97% of base) 52 : 19202.dasm (16.20% of base) 50 : 19210.dasm (3.15% of base) 48 : 25971.dasm (0.66% of base) 46 : 18490.dasm (6.05% of base) 46 : 19131.dasm (8.44% of base) 36 : 17812.dasm (1.84% of base) 36 : 10409.dasm (4.20% of base) 36 : 15922.dasm (17.48% of base) 35 : 17997.dasm (11.18% of base) 34 : 14621.dasm (4.82% of base) Top file improvements (bytes): -557 : 18190.dasm (-19.94% of base) -525 : 18412.dasm (-12.80% of base) -260 : 18400.dasm (-19.89% of base) -202 : 17991.dasm (-16.67% of base) -201 : 20540.dasm (-18.91% of base) -200 : 18417.dasm (-9.96% of base) -159 : 17736.dasm (-7.87% of base) -128 : 18326.dasm (-7.15% of base) -90 : 18182.dasm (-4.71% of base) -67 : 18177.dasm (-3.94% of base) -59 : 18359.dasm (-5.13% of base) -53 : 20672.dasm (-8.02% of base) -53 : 18280.dasm (-10.77% of base) -45 : 18575.dasm (-8.08% of base) -42 : 16930.dasm (-7.89% of base) -40 : 17935.dasm (-4.77% of base) -38 : 17814.dasm (-2.11% of base) -38 : 18458.dasm (-4.87% of base) -36 : 18222.dasm (-3.07% of base) -36 : 18440.dasm (-6.75% of base) 376 total files with Code Size differences (153 improved, 223 regressed), 5 unchanged. Top method regressions (bytes): 204 (13.98% of base) : 18210.dasm - Microsoft.CodeAnalysis.CSharp.Symbols.SourceNamedTypeSymbol:MakeAcyclicInterfaces(Roslyn.Utilities.ConsList`1[[Microsoft.CodeAnalysis.CSharp.Symbol, Microsoft.CodeAnalysis.CSharp, Version=2.10.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35]],Microsoft.CodeAnalysis.DiagnosticBag):System.Collections.Immutable.ImmutableArray`1[[Microsoft.CodeAnalysis.CSharp.Symbols.NamedTypeSymbol, Microsoft.CodeAnalysis.CSharp, Version=2.10.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35]]:this 123 ( 3.84% of base) : 17149.dasm - Microsoft.CodeAnalysis.CSharp.Syntax.InternalSyntax.LanguageParser:ParseNamespaceBody(byref,byref,byref,ushort):this 115 ( 1.95% of base) : 12617.dasm - Utf8Json.Formatters.DictionaryFormatterBase`5[Int32,__Canon,__Canon,Enumerator,__Canon][System.Int32,System.__Canon,System.__Canon,System.Collections.Generic.Dictionary`2+Enumerator[System.Int32,System.__Canon],System.__Canon]:Serialize(byref,System.__Canon,Utf8Json.IJsonFormatterResolver):this 106 ( 5.92% of base) : 19256.dasm - <Microsoft-Cci-ITypeDefinition-GetExplicitImplementationOverrides>d__31:MoveNext():bool:this 105 ( 8.19% of base) : 8717.dasm - System.String:SplitWithPostProcessing(System.ReadOnlySpan`1[Int32],System.ReadOnlySpan`1[Int32],int,int,int):System.String[]:this 94 ( 9.49% of base) : 5445.dasm - System.Uri:CreateUriInfo(long):this 85 ( 1.48% of base) : 16452.dasm - DynamicClass:_DynamicMethod9(System.IO.TextReader,int):MicroBenchmarks.Serializers.MyEventsListerViewModel 82 (12.01% of base) : 19234.dasm - <GetTopLevelTypesCore>d__53:MoveNext():bool:this 76 (20.05% of base) : 4626.dasm - System.Collections.Generic.PriorityQueue`2[__Canon,__Canon][System.__Canon,System.__Canon]:MoveDownCustomComparer(System.ValueTuple`2[__Canon,__Canon],int):this 64 ( 6.97% of base) : 18620.dasm - Microsoft.CodeAnalysis.CSharp.Binder:BindSimpleBinaryOperator(Microsoft.CodeAnalysis.CSharp.Syntax.BinaryExpressionSyntax,Microsoft.CodeAnalysis.DiagnosticBag):Microsoft.CodeAnalysis.CSharp.BoundExpression:this 52 (16.20% of base) : 19202.dasm - <GetAssemblyReferencesFromAddedModules>d__36:MoveNext():bool:this 50 ( 3.15% of base) : 19210.dasm - <GetTopLevelTypes>d__23[__Canon,__Canon,__Canon,__Canon,__Canon,__Canon,__Canon,__Canon,__Canon][System.__Canon,System.__Canon,System.__Canon,System.__Canon,System.__Canon,System.__Canon,System.__Canon,System.__Canon,System.__Canon]:MoveNext():bool:this 48 ( 0.66% of base) : 25971.dasm - DynamicClass:_DynamicMethod1(System.IO.TextReader,int):MicroBenchmarks.Serializers.CollectionsOfPrimitives 46 ( 6.05% of base) : 18490.dasm - Microsoft.CodeAnalysis.CSharp.ClsComplianceChecker:VisitAssembly(Microsoft.CodeAnalysis.CSharp.Symbols.AssemblySymbol):this 46 ( 8.44% of base) : 19131.dasm - <AddedModulesResourceNames>d__200:MoveNext():bool:this 36 ( 1.84% of base) : 17812.dasm - Microsoft.CodeAnalysis.CSharp.Symbols.SourceNamespaceSymbol:MakeNameToMembersMap(Microsoft.CodeAnalysis.DiagnosticBag):System.Collections.Generic.Dictionary`2[[System.String, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.Collections.Immutable.ImmutableArray`1[[Microsoft.CodeAnalysis.CSharp.Symbols.NamespaceOrTypeSymbol, Microsoft.CodeAnalysis.CSharp, Version=2.10.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35]], System.Collections.Immutable, Version=6.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a]]:this 36 ( 4.20% of base) : 10409.dasm - System.Net.Security.SslStream:FillHandshakeBufferAsync(System.Net.Security.AsyncReadWriteAdapter,int):System.Threading.Tasks.ValueTask`1[Int32]:this 36 (17.48% of base) : 15922.dasm - System.Number:DecimalToNumber(byref,byref) 35 (11.18% of base) : 17997.dasm - <CreateNestedTypes>d__126:MoveNext():bool:this 34 ( 4.82% of base) : 14621.dasm - BenchmarksGame.KNucleotide_1:Bench(System.IO.Stream,BenchmarksGame.NucleotideHelpers,bool):bool Top method improvements (bytes): -557 (-19.94% of base) : 18190.dasm - Microsoft.CodeAnalysis.CSharp.Symbols.SourceNamedTypeSymbol:MakeDeclaredBases(Roslyn.Utilities.ConsList`1[[Microsoft.CodeAnalysis.CSharp.Symbol, Microsoft.CodeAnalysis.CSharp, Version=2.10.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35]],Microsoft.CodeAnalysis.DiagnosticBag):System.Tuple`2[[Microsoft.CodeAnalysis.CSharp.Symbols.NamedTypeSymbol, Microsoft.CodeAnalysis.CSharp, Version=2.10.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35],[System.Collections.Immutable.ImmutableArray`1[[Microsoft.CodeAnalysis.CSharp.Symbols.NamedTypeSymbol, Microsoft.CodeAnalysis.CSharp, Version=2.10.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35]], System.Collections.Immutable, Version=6.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a]]:this -525 (-12.80% of base) : 18412.dasm - Microsoft.CodeAnalysis.CSharp.Symbols.SourceMemberContainerTypeSymbol:ComputeInterfaceImplementations(Microsoft.CodeAnalysis.DiagnosticBag,System.Threading.CancellationToken):System.Collections.Immutable.ImmutableArray`1[[Microsoft.CodeAnalysis.CSharp.Symbols.SynthesizedExplicitImplementationForwardingMethod, Microsoft.CodeAnalysis.CSharp, Version=2.10.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35]]:this -260 (-19.89% of base) : 18400.dasm - Microsoft.CodeAnalysis.CSharp.Symbols.OverriddenOrHiddenMembersHelpers:FindOverriddenOrHiddenMembersInType(Microsoft.CodeAnalysis.CSharp.Symbol,bool,Microsoft.CodeAnalysis.CSharp.Symbols.NamedTypeSymbol,Microsoft.CodeAnalysis.CSharp.Symbols.NamedTypeSymbol,byref,byref,byref) -202 (-16.67% of base) : 17991.dasm - Microsoft.CodeAnalysis.CSharp.CSharpCompilation:GetRuntimeMember(Microsoft.CodeAnalysis.CSharp.Symbols.NamedTypeSymbol,byref,Microsoft.CodeAnalysis.RuntimeMembers.SignatureComparer`5[[Microsoft.CodeAnalysis.CSharp.Symbols.MethodSymbol, Microsoft.CodeAnalysis.CSharp, Version=2.10.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35],[Microsoft.CodeAnalysis.CSharp.Symbols.FieldSymbol, Microsoft.CodeAnalysis.CSharp, Version=2.10.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35],[Microsoft.CodeAnalysis.CSharp.Symbols.PropertySymbol, Microsoft.CodeAnalysis.CSharp, Version=2.10.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35],[Microsoft.CodeAnalysis.CSharp.Symbols.TypeSymbol, Microsoft.CodeAnalysis.CSharp, Version=2.10.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35],[Microsoft.CodeAnalysis.CSharp.Symbols.ParameterSymbol, Microsoft.CodeAnalysis.CSharp, Version=2.10.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35]],Microsoft.CodeAnalysis.CSharp.Symbols.AssemblySymbol):Microsoft.CodeAnalysis.CSharp.Symbol -201 (-18.91% of base) : 20540.dasm - Microsoft.CodeAnalysis.CSharp.SyntaxTreeSemanticModel:GetDeclaredMember(Microsoft.CodeAnalysis.CSharp.Symbols.NamespaceOrTypeSymbol,Microsoft.CodeAnalysis.Text.TextSpan,System.String):Microsoft.CodeAnalysis.CSharp.Symbol:this -200 (-9.96% of base) : 18417.dasm - Microsoft.CodeAnalysis.CSharp.Symbols.SourceMemberContainerTypeSymbol:CheckMemberNameConflicts(Microsoft.CodeAnalysis.DiagnosticBag):this -159 (-7.87% of base) : 17736.dasm - Microsoft.CodeAnalysis.CSharp.CSharpCompilation:GetDiagnostics(int,bool,Microsoft.CodeAnalysis.DiagnosticBag,System.Threading.CancellationToken):this -128 (-7.15% of base) : 18326.dasm - Microsoft.CodeAnalysis.CSharp.Symbols.SourceMemberContainerTypeSymbol:MakeTypeMembers(Microsoft.CodeAnalysis.DiagnosticBag):System.Collections.Generic.Dictionary`2[[System.String, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.Collections.Immutable.ImmutableArray`1[[Microsoft.CodeAnalysis.CSharp.Symbols.NamedTypeSymbol, Microsoft.CodeAnalysis.CSharp, Version=2.10.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35]], System.Collections.Immutable, Version=6.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a]]:this -90 (-4.71% of base) : 18182.dasm - Microsoft.CodeAnalysis.CSharp.Symbols.SourceNamedTypeSymbol:PostDecodeWellKnownAttributes(System.Collections.Immutable.ImmutableArray`1[[Microsoft.CodeAnalysis.CSharp.Symbols.CSharpAttributeData, Microsoft.CodeAnalysis.CSharp, Version=2.10.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35]],System.Collections.Immutable.ImmutableArray`1[[Microsoft.CodeAnalysis.CSharp.Syntax.AttributeSyntax, Microsoft.CodeAnalysis.CSharp, Version=2.10.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35]],Microsoft.CodeAnalysis.DiagnosticBag,short,Microsoft.CodeAnalysis.WellKnownAttributeData):this -67 (-3.94% of base) : 18177.dasm - Microsoft.CodeAnalysis.CSharp.Symbols.SourceMemberContainerTypeSymbol:ForceComplete(Microsoft.CodeAnalysis.SourceLocation,System.Threading.CancellationToken):this -59 (-5.13% of base) : 18359.dasm - Microsoft.CodeAnalysis.CSharp.Binder:ValidateParameterNameConflicts(System.Collections.Immutable.ImmutableArray`1[[Microsoft.CodeAnalysis.CSharp.Symbols.TypeParameterSymbol, Microsoft.CodeAnalysis.CSharp, Version=2.10.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35]],System.Collections.Immutable.ImmutableArray`1[[Microsoft.CodeAnalysis.CSharp.Symbols.ParameterSymbol, Microsoft.CodeAnalysis.CSharp, Version=2.10.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35]],Microsoft.CodeAnalysis.DiagnosticBag):this -53 (-8.02% of base) : 20672.dasm - Microsoft.CodeAnalysis.CSharp.Imports:LookupSymbolInUsings(System.Collections.Immutable.ImmutableArray`1[NamespaceOrTypeAndUsingDirective],Microsoft.CodeAnalysis.CSharp.Binder,Microsoft.CodeAnalysis.CSharp.LookupResult,System.String,int,Roslyn.Utilities.ConsList`1[[Microsoft.CodeAnalysis.CSharp.Symbol, Microsoft.CodeAnalysis.CSharp, Version=2.10.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35]],int,bool,byref) -53 (-10.77% of base) : 18280.dasm - Microsoft.CodeAnalysis.CSharp.Symbols.NamespaceOrTypeSymbol:GetSourceTypeMember(System.String,int,ushort,Microsoft.CodeAnalysis.CSharp.CSharpSyntaxNode):Microsoft.CodeAnalysis.CSharp.Symbols.SourceNamedTypeSymbol:this -45 (-8.08% of base) : 18575.dasm - Microsoft.CodeAnalysis.CSharp.ExecutableCodeBinder:ValidateIteratorMethods(Microsoft.CodeAnalysis.DiagnosticBag):this -42 (-7.89% of base) : 16930.dasm - V8.Crypto.BigInteger:fromByteArray(System.Byte[]):this -40 (-4.77% of base) : 17935.dasm - Microsoft.CodeAnalysis.CSharp.Symbols.NamespaceOrTypeSymbol:LookupMetadataType(byref):Microsoft.CodeAnalysis.CSharp.Symbols.NamedTypeSymbol:this -38 (-2.11% of base) : 17814.dasm - Microsoft.CodeAnalysis.CSharp.MergedNamespaceDeclaration:MakeChildren():System.Collections.Immutable.ImmutableArray`1[[Microsoft.CodeAnalysis.CSharp.MergedNamespaceOrTypeDeclaration, Microsoft.CodeAnalysis.CSharp, Version=2.10.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35]]:this -38 (-4.87% of base) : 18458.dasm - Microsoft.CodeAnalysis.CSharp.Symbols.SourceMemberMethodSymbol:ForceComplete(Microsoft.CodeAnalysis.SourceLocation,System.Threading.CancellationToken):this -36 (-3.07% of base) : 18222.dasm - Microsoft.CodeAnalysis.CSharp.Symbols.SourceMemberContainerTypeSymbol:MakeAllMembers(Microsoft.CodeAnalysis.DiagnosticBag):System.Collections.Generic.Dictionary`2[[System.String, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.Collections.Immutable.ImmutableArray`1[[Microsoft.CodeAnalysis.CSharp.Symbol, Microsoft.CodeAnalysis.CSharp, Version=2.10.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35]], System.Collections.Immutable, Version=6.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a]]:this -36 (-6.75% of base) : 18440.dasm - Microsoft.CodeAnalysis.CSharp.Symbols.SourceMemberContainerTypeSymbol:CheckTypeParameterNameConflicts(Microsoft.CodeAnalysis.DiagnosticBag):this Top method regressions (percentages): 17 (36.17% of base) : 22943.dasm - System.Collections.IndexerSetReverse`1[Int32][System.Int32]:Array():System.Int32[]:this 17 (36.17% of base) : 24496.dasm - System.Collections.IndexerSetReverse`1[__Canon][System.__Canon]:Array():System.__Canon[]:this 18 (27.27% of base) : 13842.dasm - System.Collections.IterateFor`1[__Canon][System.__Canon]:Get(System.Collections.Generic.IList`1[__Canon]):System.__Canon:this 15 (26.79% of base) : 13055.dasm - System.Number:UInt32ToDecChars(long,int,int):long 15 (25.86% of base) : 975.dasm - System.Number:UInt32ToDecChars(long,int,int):long 16 (25.81% of base) : 20092.dasm - System.Collections.IndexerSet`1[__Canon][System.__Canon]:Set(System.Collections.Generic.IList`1[__Canon]):System.Collections.Generic.IList`1[__Canon]:this 22 (25.58% of base) : 9613.dasm - System.Xml.XmlLoader:LoadDocSequence(System.Xml.XmlDocument):this 18 (25.35% of base) : 11926.dasm - EMFloatClass:normalize(InternalFPF) 14 (22.95% of base) : 25097.dasm - V8.Richards.Packet:addTo(V8.Richards.Packet):V8.Richards.Packet:this 16 (21.92% of base) : 4954.dasm - System.IO.Compression.DeflateStream:WriteDeflaterOutput():this 10 (20.41% of base) : 27331.dasm - System.Collections.IterateFor`1[Int32][System.Int32]:ImmutableList():int:this 10 (20.41% of base) : 9944.dasm - System.Collections.IterateFor`1[Int32][System.Int32]:ImmutableSortedSet():int:this 76 (20.05% of base) : 4626.dasm - System.Collections.Generic.PriorityQueue`2[__Canon,__Canon][System.__Canon,System.__Canon]:MoveDownCustomComparer(System.ValueTuple`2[__Canon,__Canon],int):this 10 (20.00% of base) : 14834.dasm - System.Collections.IterateFor`1[__Canon][System.__Canon]:ImmutableList():System.__Canon:this 10 (20.00% of base) : 15485.dasm - System.Collections.IterateFor`1[__Canon][System.__Canon]:ImmutableSortedSet():System.__Canon:this 24 (19.05% of base) : 8484.dasm - System.Collections.IterateForEach`1[__Canon][System.__Canon]:ImmutableArray():System.__Canon:this 17 (18.09% of base) : 11992.dasm - Newtonsoft.Json.Utilities.DateTimeUtils:CopyIntToCharArray(System.Char[],int,int,int) 9 (18.00% of base) : 19831.dasm - System.Reflection.Metadata.Ecma335.MetadataSizes:CalculateTableStreamHeaderSize():int:this 16 (17.98% of base) : 4541.dasm - System.Collections.IndexerSetReverse`1[__Canon][System.__Canon]:Span():System.__Canon:this 36 (17.48% of base) : 15922.dasm - System.Number:DecimalToNumber(byref,byref) Top method improvements (percentages): -19 (-28.79% of base) : 18465.dasm - Microsoft.CodeAnalysis.CSharp.MergedTypeDeclaration:get_AnyMemberHasAttributes():bool:this -557 (-19.94% of base) : 18190.dasm - Microsoft.CodeAnalysis.CSharp.Symbols.SourceNamedTypeSymbol:MakeDeclaredBases(Roslyn.Utilities.ConsList`1[[Microsoft.CodeAnalysis.CSharp.Symbol, Microsoft.CodeAnalysis.CSharp, Version=2.10.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35]],Microsoft.CodeAnalysis.DiagnosticBag):System.Tuple`2[[Microsoft.CodeAnalysis.CSharp.Symbols.NamedTypeSymbol, Microsoft.CodeAnalysis.CSharp, Version=2.10.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35],[System.Collections.Immutable.ImmutableArray`1[[Microsoft.CodeAnalysis.CSharp.Symbols.NamedTypeSymbol, Microsoft.CodeAnalysis.CSharp, Version=2.10.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35]], System.Collections.Immutable, Version=6.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a]]:this -260 (-19.89% of base) : 18400.dasm - Microsoft.CodeAnalysis.CSharp.Symbols.OverriddenOrHiddenMembersHelpers:FindOverriddenOrHiddenMembersInType(Microsoft.CodeAnalysis.CSharp.Symbol,bool,Microsoft.CodeAnalysis.CSharp.Symbols.NamedTypeSymbol,Microsoft.CodeAnalysis.CSharp.Symbols.NamedTypeSymbol,byref,byref,byref) -30 (-19.35% of base) : 19588.dasm - Microsoft.Cci.MetadataWriter:MayUseSmallExceptionHeaders(System.Collections.Immutable.ImmutableArray`1[[Microsoft.Cci.ExceptionHandlerRegion, Microsoft.CodeAnalysis, Version=2.10.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35]]):bool -201 (-18.91% of base) : 20540.dasm - Microsoft.CodeAnalysis.CSharp.SyntaxTreeSemanticModel:GetDeclaredMember(Microsoft.CodeAnalysis.CSharp.Symbols.NamespaceOrTypeSymbol,Microsoft.CodeAnalysis.Text.TextSpan,System.String):Microsoft.CodeAnalysis.CSharp.Symbol:this -11 (-16.92% of base) : 18912.dasm - Microsoft.CodeAnalysis.CSharp.CodeGen.CodeGenerator:EmitStatements(System.Collections.Immutable.ImmutableArray`1[[Microsoft.CodeAnalysis.CSharp.BoundStatement, Microsoft.CodeAnalysis.CSharp, Version=2.10.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35]]):this -11 (-16.92% of base) : 18800.dasm - Microsoft.CodeAnalysis.CSharp.DataFlowPass:DeclareVariables(System.Collections.Immutable.ImmutableArray`1[[Microsoft.CodeAnalysis.CSharp.Symbols.LocalSymbol, Microsoft.CodeAnalysis.CSharp, Version=2.10.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35]]):this -11 (-16.92% of base) : 18812.dasm - Microsoft.CodeAnalysis.CSharp.DataFlowPass:ReportUnusedVariables(System.Collections.Immutable.ImmutableArray`1[[Microsoft.CodeAnalysis.CSharp.Symbols.LocalFunctionSymbol, Microsoft.CodeAnalysis.CSharp, Version=2.10.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35]]):this -202 (-16.67% of base) : 17991.dasm - Microsoft.CodeAnalysis.CSharp.CSharpCompilation:GetRuntimeMember(Microsoft.CodeAnalysis.CSharp.Symbols.NamedTypeSymbol,byref,Microsoft.CodeAnalysis.RuntimeMembers.SignatureComparer`5[[Microsoft.CodeAnalysis.CSharp.Symbols.MethodSymbol, Microsoft.CodeAnalysis.CSharp, Version=2.10.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35],[Microsoft.CodeAnalysis.CSharp.Symbols.FieldSymbol, Microsoft.CodeAnalysis.CSharp, Version=2.10.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35],[Microsoft.CodeAnalysis.CSharp.Symbols.PropertySymbol, Microsoft.CodeAnalysis.CSharp, Version=2.10.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35],[Microsoft.CodeAnalysis.CSharp.Symbols.TypeSymbol, Microsoft.CodeAnalysis.CSharp, Version=2.10.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35],[Microsoft.CodeAnalysis.CSharp.Symbols.ParameterSymbol, Microsoft.CodeAnalysis.CSharp, Version=2.10.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35]],Microsoft.CodeAnalysis.CSharp.Symbols.AssemblySymbol):Microsoft.CodeAnalysis.CSharp.Symbol -11 (-16.18% of base) : 20764.dasm - Microsoft.CodeAnalysis.CSharp.AbstractRegionDataFlowPass:MakeSlots(System.Collections.Immutable.ImmutableArray`1[[Microsoft.CodeAnalysis.CSharp.Symbols.ParameterSymbol, Microsoft.CodeAnalysis.CSharp, Version=2.10.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35]]):this -11 (-15.94% of base) : 19532.dasm - Microsoft.Cci.MetadataVisitor:Visit(System.Collections.Immutable.ImmutableArray`1[[Microsoft.Cci.IParameterDefinition, Microsoft.CodeAnalysis, Version=2.10.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35]]):this -11 (-15.71% of base) : 19548.dasm - Microsoft.Cci.MetadataVisitor:Visit(System.Collections.Immutable.ImmutableArray`1[[Microsoft.Cci.ILocalDefinition, Microsoft.CodeAnalysis, Version=2.10.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35]]):this -11 (-15.71% of base) : 19553.dasm - Microsoft.Cci.MetadataVisitor:Visit(System.Collections.Immutable.ImmutableArray`1[[Microsoft.Cci.ExceptionHandlerRegion, Microsoft.CodeAnalysis, Version=2.10.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35]]):this -11 (-15.71% of base) : 19457.dasm - Microsoft.Cci.MetadataVisitor:Visit(System.Collections.Immutable.ImmutableArray`1[[Microsoft.Cci.IParameterTypeInformation, Microsoft.CodeAnalysis, Version=2.10.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35]]):this -11 (-15.71% of base) : 19459.dasm - Microsoft.Cci.MetadataVisitor:Visit(System.Collections.Immutable.ImmutableArray`1[[Microsoft.Cci.ICustomModifier, Microsoft.CodeAnalysis, Version=2.10.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35]]):this -11 (-15.49% of base) : 18811.dasm - Microsoft.CodeAnalysis.CSharp.DataFlowPass:ReportUnusedVariables(System.Collections.Immutable.ImmutableArray`1[[Microsoft.CodeAnalysis.CSharp.Symbols.LocalSymbol, Microsoft.CodeAnalysis.CSharp, Version=2.10.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35]]):this -11 (-15.07% of base) : 18797.dasm - Microsoft.CodeAnalysis.CSharp.DataFlowPass:EnterParameters(System.Collections.Immutable.ImmutableArray`1[[Microsoft.CodeAnalysis.CSharp.Symbols.ParameterSymbol, Microsoft.CodeAnalysis.CSharp, Version=2.10.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35]]):this -11 (-15.07% of base) : 20750.dasm - Microsoft.CodeAnalysis.CSharp.PreciseAbstractFlowPass`1[LocalState][Microsoft.CodeAnalysis.CSharp.DataFlowPass+LocalState]:VisitMultipleLocalDeclarations(Microsoft.CodeAnalysis.CSharp.BoundMultipleLocalDeclarations):Microsoft.CodeAnalysis.CSharp.BoundNode:this -11 (-15.07% of base) : 18768.dasm - Microsoft.CodeAnalysis.CSharp.PreciseAbstractFlowPass`1[LocalState][Microsoft.CodeAnalysis.CSharp.ControlFlowPass+LocalState]:VisitStatements(System.Collections.Immutable.ImmutableArray`1[[Microsoft.CodeAnalysis.CSharp.BoundStatement, Microsoft.CodeAnalysis.CSharp, Version=2.10.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35]]):this -12 (-13.64% of base) : 18415.dasm - Microsoft.CodeAnalysis.CSharp.Symbols.SourceMemberContainerTypeSymbol:CheckMemberNamesDistinctFromType(Microsoft.CodeAnalysis.DiagnosticBag):this 376 total methods with Code Size differences (153 improved, 223 regressed), 5 unchanged. ``` </details> -------------------------------------------------------------------------------- Additionally, I removed some dead "optimization enabling" code.
ee47af2 to 8dc789d Compare 


Starting with an experimental branch from @AndyAyersMS,
generalize the loop inversion
optInvertWhileLoopcodeto consider duplicating an entire conditional block, not
just a conditional block that contains exactly one JTRUE
tree. Since the JTRUE itself wasn't considered before in
the costing, bump up the maximum cost allowed by 2 to account
for that. (Note that there is a lot of room here for tuning
the cost/benefit analysis.)
Additionally, the code already bumped the allowed cost if the
condition tree contained calls to a shared static helper. Add
another boost if the tree contains array.Length expressions,
which are likely to be CSE'ed.
Loop inversion by itself doesn't buy much, but our downstream
phases, like natural loop recognition, loop hoisting, and loop
cloning, depend on an inverted loop structure with a zero trip
test.
There are many diffs, typically size regressions because we
are duplicating code. Some notable cases:
It ends up with multiple new CSEs and BDN shows a 24% run-time
improvement.
we create fewer CSEs and end up with a couple more instructions
inside some tight loops.
bounds checks, (b) additional CSEs, (c) loop alignment kicks in.
The loop matching at the stage inversion runs is lexical and very
simplistic, as it is before natural loop recognition. As a result,
for some loops, like multiple-condition
whileloops (e.g.,while (--digits >= 0 || value != 0)inSystem.Number:UInt32ToDecChars), the inversion ends up creatingsome pretty weird flow. (That's actually not new to this change.)
I added a
COMPlus_JitDoLoopInversionconfig to allow turningoff the phase to do experiments.
Here are the diffs for SPMI on the benchmarks:
Detail diffs
Additionally, I removed some dead "optimization enabling" code.