Skip to content

Conversation

@EgorBo
Copy link
Member

@EgorBo EgorBo commented Jan 8, 2024

This PR enables PGO for castclass/isinst by default. Since we now have a dedicated tier for instrumentation it should not be a big deal to enable more probes IMO. Profiled casts seem to help LINQ for small inputs since LINQ is typically full of type casts for fast paths (e.g. this).

Benchmark:

[HideColumns("Error", "Job", "StdDev", "Median", "RatioSD")] public class LinqBenchmarks { IEnumerable<int> _enum1 = new int[10]; IEnumerable<int> _enum2 = new int[10]; [Benchmark] public bool SequenceEqual() => _enum1.SequenceEqual(_enum2); [Benchmark] public int ElementAt() => _enum1.ElementAt(5); [Benchmark] public int Count() => _enum1.Count(); [Benchmark] public List<int> ToList() => _enum1.ToList(); }
Method Toolchain Mean Ratio
SequenceEqual \Core_Root\corerun.exe 9.372 ns 2.96
SequenceEqual \Core_Root_PR\corerun.exe 3.171 ns 1.00
ElementAt \Core_Root\corerun.exe 7.163 ns 4.69
ElementAt \Core_Root_PR\corerun.exe 1.529 ns 1.00
Count \Core_Root\corerun.exe 3.482 ns 3.14
Count \Core_Root_PR\corerun.exe 1.127 ns 1.00
ToList \Core_Root\corerun.exe 16.334 ns 1.30
ToList \Core_Root_PR\corerun.exe 12.332 ns 1.00

PS: SequenceEqual should benefit from #96571 too

Codegen example

// Promote IsList to Tier1 with PGO for (int i = 0; i < 200; i++) { IsList(new int[10]); // source is always int[] Thread.Sleep(10); } [MethodImpl(MethodImplOptions.NoInlining)] bool IsList<T>(IEnumerable<T> source) => source is IList<T>;

Codegen diff (Main vs PR) for IsList:

; Assembly listing for method IsList G_M12490_IG01: sub rsp, 40 G_M12490_IG02: + mov rax, rcx + test rax, rax ; if (source == null) + je SHORT G_M12490_IG05 +G_M12490_IG03: + mov rdx, 0x7FF941DE9750 + cmp qword ptr [rax], rdx + je SHORT G_M12490_IG05 ; if (source is int[]) +G_M12490_IG04: mov rdx, rcx mov rcx, 0x7FF942439A60 call CORINFO_HELP_ISINSTANCEOFINTERFACE ; slow fallback +G_M12490_IG05: test rax, rax setne al movzx rax, al -G_M12490_IG03: +G_M12490_IG06: add rsp, 40 ret ; Total bytes of code 59

@AndyAyersMS @dotnet/jit-contrib do you think this should go through an internal run of dotnet/performance first?
I tested impact on TE (no impact on startup and rps) and a few desktop app. Also, I did a run of 250 LINQ benchmarks in dotnet/performance (10% of them improved, none regressed).

@ghost ghost assigned EgorBo Jan 8, 2024
@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jan 8, 2024
@ghost
Copy link

ghost commented Jan 8, 2024

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

This PR enables PGO for castclass/isinst by default. Since we now have a dedicated tier for instrumentation it's no longer a big deal to enable more probes IMO. Profiled casts seem to help a lot to LINQ for small inputs since LINQ is typically full of type casts for fast paths (e.g. this).

Benchmark:

IEnumerable<int> _enum1 = new int[10]; IEnumerable<int> _enum2 = new int[10]; [Benchmark] public bool SequenceEqual() => _enum1.SequenceEqual(_enum2); [Benchmark] public int ElementAt() => _enum1.ElementAt(5); [Benchmark] public int Count() => _enum1.Count();
Method Job Toolchain Mean Ratio
SequenceEqual Job-YKPWSW \Core_Root\corerun.exe 9.372 ns 2.96
SequenceEqual Job-AYWKCJ \Core_Root_PR\corerun.exe 3.171 ns 1.00
ElementAt Job-YKPWSW \Core_Root\corerun.exe 7.163 ns 4.69
ElementAt Job-AYWKCJ \Core_Root_PR\corerun.exe 1.529 ns 1.00
Count Job-YKPWSW \Core_Root\corerun.exe 3.482 ns 3.14
Count Job-AYWKCJ \Core_Root_PR\corerun.exe 1.127 ns 1.00

Codegen example

// Promote IsList to Tier1 with PGO for (int i = 0; i < 200; i++) { IsList(new int[10]); // source is always int[] Thread.Sleep(10); } [MethodImpl(MethodImplOptions.NoInlining)] bool IsList<T>(IEnumerable<T> source) => source is IList<T>;
; Assembly listing for method IsList G_M12490_IG01: sub rsp, 40 G_M12490_IG02: + mov rax, rcx + test rax, rax ; if (source == null) + je SHORT G_M12490_IG05 +G_M12490_IG03: + mov rdx, 0x7FF941DE9750 + cmp qword ptr [rax], rdx + je SHORT G_M12490_IG05 ; if (source is int[]) +G_M12490_IG04: mov rdx, rcx mov rcx, 0x7FF942439A60 call CORINFO_HELP_ISINSTANCEOFINTERFACE ; slow fallback +G_M12490_IG05: test rax, rax setne al movzx rax, al -G_M12490_IG03: +G_M12490_IG06: add rsp, 40 ret ; Total bytes of code 59

@AndyAyersMS do you think this should go through an internal run of dotnet/performance first?
I tested impact on TE (no impact on startup and rps) and a few desktop app. Also, I did a run of 250 LINQ benchmarks in dotnet/performance (10% of them improved, none regressed).

Author: EgorBo
Assignees: EgorBo
Labels:

area-CodeGen-coreclr

Milestone: -
@EgorBo
Copy link
Member Author

EgorBo commented Jan 8, 2024

/azp run runtime-coreclr pgo, runtime-coreclr libraries-pgo, runtime-coreclr pgostress

@azure-pipelines
Copy link

Azure Pipelines successfully started running 3 pipeline(s).
@stephentoub
Copy link
Member

Thanks. If this is isinst/castclass only, I assume it doesn't cover a pattern like this?

// Use `GetType() == typeof(...)` rather than `is` to avoid cast helpers. This is measurably cheaper
// but does mean we could end up missing some rare cases where we could get a span but don't (e.g. a uint[]
// masquerading as an int[]). That's an acceptable tradeoff. The Unsafe usage is only after we've
// validated the exact type; this could be changed to a cast in the future if the JIT starts to recognize it.
// We only pay the comparison/branching costs here for super common types we expect to be used frequently
// with LINQ methods.
bool result = true;
if (source.GetType() == typeof(TSource[]))
{
span = Unsafe.As<TSource[]>(source);
}
else if (source.GetType() == typeof(List<TSource>))
{
span = CollectionsMarshal.AsSpan(Unsafe.As<List<TSource>>(source));
}
else
{
span = default;
result = false;
}

(or is that covered separately?)

@EgorBo
Copy link
Member Author

EgorBo commented Jan 8, 2024

Thanks. If this is isinst/castclass only, I assume it doesn't cover a pattern like this?

// Use `GetType() == typeof(...)` rather than `is` to avoid cast helpers. This is measurably cheaper
// but does mean we could end up missing some rare cases where we could get a span but don't (e.g. a uint[]
// masquerading as an int[]). That's an acceptable tradeoff. The Unsafe usage is only after we've
// validated the exact type; this could be changed to a cast in the future if the JIT starts to recognize it.
// We only pay the comparison/branching costs here for super common types we expect to be used frequently
// with LINQ methods.
bool result = true;
if (source.GetType() == typeof(TSource[]))
{
span = Unsafe.As<TSource[]>(source);
}
else if (source.GetType() == typeof(List<TSource>))
{
span = CollectionsMarshal.AsSpan(Unsafe.As<List<TSource>>(source));
}
else
{
span = default;
result = false;
}

(or is that covered separately?)

This code looks like it's already devirtualized by hands so there is nothing JIT can do additionally here. Although, you can convert it back to is TSource[] arr if you want (but it might regress NAOT that typically doesn't have any PGO data and Mono).

One benefit of is TSource[] arr would be that uint[]/int[] case 🙂

@stephentoub
Copy link
Member

you can convert it back to is TSource[] arr if you want
One benefit of is TSource[] arr would be that uint[]/int[] case 🙂

It was written this way to explicitly avoid the variance costs

@EgorBo EgorBo marked this pull request as ready for review January 8, 2024 09:54
@EgorBo EgorBo requested a review from AndyAyersMS January 8, 2024 09:55
@EgorBo
Copy link
Member Author

EgorBo commented Jan 8, 2024

PGO pipelines fail with #96612 (unrelated issue)

@AndyAyersMS
Copy link
Member

As a follow up you might look at whether cloning can profitably key off of one of these casts. Right now type test cloning requires OMF_HAS_GUARDEDDEVIRT.

@EgorBo
Copy link
Member Author

EgorBo commented Jan 8, 2024

As a follow up you might look at whether cloning can profitably key off of one of these casts. Right now type test cloning requires OMF_HAS_GUARDEDDEVIRT.

Ah good point, will file an issue for it!

@github-actions github-actions bot locked and limited conversation to collaborators Mar 10, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

3 participants