Skip to content

Conversation

@tannergooding
Copy link
Member

This cleans up the NI_System_Math_* related logic to make it easier to determine what entries were being missed by alphabetizing the IDs and adding an assert for the range.

As part of that, this ensures all the existing public Math "intrinsic" APIs (minus FusedMultiplyAdd and SinCos, which have special requirements) are going through CSE and enables constant folding for them when IsReadyToRun is false.
This was done under the premise that, for a given run these APIs are deterministic and it is only harmful to constant fold in AOT like scenarios where the CRT used to compile and that which is being run against might differ.

The PMI diffs for --frameworks shows some minimal gains:

Summary of Code Size diffs: (Lower is better) Total bytes of base: 51165736 Total bytes of diff: 51165724 Total bytes of delta: -12 (-0.00% of base) diff is an improvement. Top file regressions (bytes): 12 : System.Runtime.Numerics.dasm (0.02% of base) Top file improvements (bytes): -21 : System.Speech.dasm (-0.00% of base) -3 : Microsoft.VisualBasic.Core.dasm (-0.00% of base) 3 total files with Code Size differences (2 improved, 1 regressed), 267 unchanged. Top method regressions (bytes): 12 ( 1.39% of base) : System.Runtime.Numerics.dasm - Complex:Asin_Internal(double,double,byref,byref,byref) Top method improvements (bytes): -21 (-17.36% of base) : System.Speech.dasm - ConvertTextFrag:ScaleNumber(float,int,int):int -3 (-0.55% of base) : Microsoft.VisualBasic.Core.dasm - Financial:NPer(double,double,double,double,int):double Top method regressions (percentages): 12 ( 1.39% of base) : System.Runtime.Numerics.dasm - Complex:Asin_Internal(double,double,byref,byref,byref) Top method improvements (percentages): -21 (-17.36% of base) : System.Speech.dasm - ConvertTextFrag:ScaleNumber(float,int,int):int -3 (-0.55% of base) : Microsoft.VisualBasic.Core.dasm - Financial:NPer(double,double,double,double,int):double 3 total methods with Code Size differences (2 improved, 1 regressed), 263270 unchanged. 

The win in System.Speech is due to a call being inlined:

- vmovsd qword ptr [rsp+20H], xmm0 - vmovsd xmm0, qword ptr [reloc @RWD08] - call Math:Log(double):double - vmovsd xmm1, qword ptr [rsp+20H] - vdivsd xmm0, xmm1, xmm0 + vdivsd xmm0, xmm0, qword ptr [reloc @RWD08]

The regression in System.Runtime.Numerics looks to be because Log was not previously CSE'd and so now there is a new temp introduced:

- vmovsd xmm6, qword ptr [rax] + vmovsd xmm1, qword ptr [rax] + vmovsd qword ptr [rsp+38H], xmm1 call Math:Log(double):double - vaddsd xmm6, xmm0, xmm6 + vmovsd xmm1, qword ptr [rsp+38H] + vaddsd xmm6, xmm1, xmm0

It isn't exactly clear why this is needed for this scenario, particularly when the value isn't reused again elsewhere (the managed code is: https://source.dot.net/#System.Runtime.Numerics/System/Numerics/Complex.cs,594)

We see larger gains in --benchmarks, this time constant folding methods like Exp due to values that are constant post inlining:

Total bytes of base: 505742 Total bytes of diff: 505475 Total bytes of delta: -267 (-0.05% of base) diff is an improvement. Top file improvements (bytes): -88 : Benchstones\BenchF\Trap\Trap\Trap.dasm (-10.73% of base) -59 : Benchstones\BenchF\Simpsn\Simpsn\Simpsn.dasm (-7.49% of base) -46 : Benchstones\BenchF\NewtE\NewtE\NewtE.dasm (-3.61% of base) -40 : Benchstones\BenchF\Romber\Romber\Romber.dasm (-1.83% of base) -19 : Benchstones\BenchF\BenchMrk\BenchMrk\BenchMrk.dasm (-2.01% of base) -15 : Benchstones\BenchF\BenchMk2\BenchMk2\BenchMk2.dasm (-1.60% of base) 6 total files with Code Size differences (6 improved, 0 regressed), 76 unchanged. Top method improvements (bytes): -88 (-34.92% of base) : Benchstones\BenchF\Trap\Trap\Trap.dasm - Trap:Bench():bool -59 (-21.61% of base) : Benchstones\BenchF\Simpsn\Simpsn\Simpsn.dasm - Simpsn:Bench():bool -40 (-4.44% of base) : Benchstones\BenchF\Romber\Romber\Romber.dasm - Romber:Bench():bool -25 (-43.86% of base) : Benchstones\BenchF\NewtE\NewtE\NewtE.dasm - NewtE:FX(double,double):double -21 (-24.42% of base) : Benchstones\BenchF\NewtE\NewtE\NewtE.dasm - NewtE:F(double,double):double -19 (-4.19% of base) : Benchstones\BenchF\BenchMrk\BenchMrk\BenchMrk.dasm - BenchMrk:Bench():bool -15 (-3.36% of base) : Benchstones\BenchF\BenchMk2\BenchMk2\BenchMk2.dasm - BenchMk2:Bench():bool Top method improvements (percentages): -25 (-43.86% of base) : Benchstones\BenchF\NewtE\NewtE\NewtE.dasm - NewtE:FX(double,double):double -88 (-34.92% of base) : Benchstones\BenchF\Trap\Trap\Trap.dasm - Trap:Bench():bool -21 (-24.42% of base) : Benchstones\BenchF\NewtE\NewtE\NewtE.dasm - NewtE:F(double,double):double -59 (-21.61% of base) : Benchstones\BenchF\Simpsn\Simpsn\Simpsn.dasm - Simpsn:Bench():bool -40 (-4.44% of base) : Benchstones\BenchF\Romber\Romber\Romber.dasm - Romber:Bench():bool -19 (-4.19% of base) : Benchstones\BenchF\BenchMrk\BenchMrk\BenchMrk.dasm - BenchMrk:Bench():bool -15 (-3.36% of base) : Benchstones\BenchF\BenchMk2\BenchMk2\BenchMk2.dasm - BenchMk2:Bench():bool 
@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jan 22, 2021
@tannergooding
Copy link
Member Author

tannergooding commented Jan 22, 2021

I broke this up by commit to make it easier to review.

No diffs for --tests

Total bytes of base: 139700628 Total bytes of diff: 139700628 Total bytes of delta: 0 (0.00% of base) 0 total files with Code Size differences (0 improved, 0 regressed), 3512 unchanged. 0 total methods with Code Size differences (0 improved, 0 regressed), 478331 unchanged. 

Local tests for scenarios like Matrix4x4.CreateRotationX(MathF.PI / 2); // 90 degrees shows that it can now be fully optimized when inlined.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be "nice" if these could call COMDouble::Acos, etc. However I think that's difficult due to layering and it wasn't obvious what would be needed to make that happen.

@tannergooding
Copy link
Member Author

CC. @jkotas, @briansull, @dotnet/jit-contrib

@tannergooding
Copy link
Member Author

Rebased to pick up unrelated test fixes so CI will pass again

@jkotas
Copy link
Member

jkotas commented Jan 28, 2021

The non-JIT part of change looks good to me.

@AndyAyersMS
Copy link
Member

@briansull can you take a look at the jit changes?

Copy link
Contributor

@briansull briansull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed the JIT changes, Looks good

@ghost
Copy link

ghost commented Jan 29, 2021

Hello @tannergooding!

Because this pull request has the auto-merge label, I will be glad to assist with helping to merge this pull request once all check-in policies pass.

p.s. you can customize the way I help with merging this pull request, such as holding this pull request until a specific person approves. Simply @mention me (@msftbot) and give me an instruction to get started! Learn more here.
@tannergooding
Copy link
Member Author

Thanks for the reviews!

@ghost ghost merged commit e8f09ed into dotnet:master Jan 29, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Feb 28, 2021
@tannergooding tannergooding deleted the math branch July 1, 2025 14:39
This pull request was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

5 participants