Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Conversation

@tannergooding
Copy link
Member

This ports the rest of the formatting code to be written in managed code. It is, currently, mostly a naive port of the native code and only has a few minor fix ups to the code.

I locally ran the Roslyn RealParser suite, as well as did a basic benchmark on both float and double, covering 267,386,880 values in the input range (including both denormal and normal inputs).

Benchmarking was done with Tiered Jitting disabled.

For the Roslyn Real Parser Tests

This was mostly to validate that I didn't mess up any code during the port. Their own parsing code takes up the majority in each, but overall has a 1.52% regression in elapsed time.

Native
image

Managed
image

For float.ToString() on 267,386,880 values between float.MinValue and float.MaxValue

Grisu3.DigitGen is the major player in both here. This showed a 9.89% regression in elapsed time. float.ToString() will default to 7 digits.

Native
image

Managed
image

For double.ToString() on 267,386,880 values between double.MinValue and double.MaxValue

Grisu3.DigitGen is again the major time taker here. This showed a 16.43% regression in elapsed time. double.ToString() will default to 15 digits.

Native
image

Managed
image

@tannergooding
Copy link
Member Author

Looking at the perf gaps, some of it is just things that the C++ Compiler gives you for free, that we don't have enabled quite yet (like combined DivMod) or where the native code was using a more optimized algorithm (like memcpy and memset), or where they have intrinsics which we don't have yet (like BitScanReverse).

These should, however, be easy enough to fix up.

For example:

Native
image

Managed
image

_length = (upper == 0) ? 1 : 2;
}

public static uint BitScanReverse(uint mask)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Native code just calls the intrinsic instead.

private int _length;
private BlocksBuffer _blocks;

public BigInteger(uint value)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Native code can configure the default constructor to efficiently initialize this, rather than requiring us to specialize case value == 0

_length += (int)(blocksToShift);

// Zero the remaining low blocks
Buffer.ZeroMemory((byte*)(_blocks.GetPointer()), (blocksToShift * sizeof(uint)));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was not being done properly in native code (size was blocksToShift, rather than blocksToShift * sizeof(uint)).

Without Grisu3 enabled as the first code-path, the Roslyn RealParser tests exposed this issue.


public void Multiply(ref DiyFp rhs)
{
ulong lf = _f;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just a 64x64=128 multiply, so using an intrinsic is possible.

decimalExponent = CachedPowerDecimalExponents[index];
}

private static bool DigitGen(ref DiyFp mp, int precision, char* digits, out int length, out int k)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the function that is causing the most perf diff between the managed and native implementations.

We could likely close the gap by caching some field/properties we lookup multiple times (into locals), and by using Math.DivRem (or better, a proper intrinsic), rather than independent divide and remainder operations.

@tannergooding
Copy link
Member Author

bcltype/bignum.cpp was ported as part of this, which will make it easier to test porting the roslyn RealParser.

@tannergooding
Copy link
Member Author

After a small cleanup to DigitGen, the perf regression (for doubles) is down to 11.18% (from 16.43%)
image

@tannergooding
Copy link
Member Author

CC. @danmosemsft, @jkotas

@danmoseley
Copy link
Member

Do you propose we commit this as is or work to close the gap further? I suggest the latter?

}

[StructLayout(LayoutKind.Sequential, Size = (MaxBlockCount * sizeof(uint)))]
private struct BlocksBuffer
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed uint _buffer[MaxBlockCount] ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't that cause a perf hit due to UnsafeValueTypeAttribute?

Copy link
Member

@jkotas jkotas Sep 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can measure it.

You are using unsafe stackallocated buffers, so you do want to pay for the GS cookie checks. I assume that the C/C++ code paid for them too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should move to a C# 7.3 compiler version (we are still on 2.8.0-beta4 right now) so that we can index a fixed-buffer without pinning: https://docs.microsoft.com/en-us/dotnet/csharp/whats-new/csharp-7-3#indexing-fixed-fields-does-not-require-pinning

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is blocked more generally on: #19878 (comment).

@jkotas
Copy link
Member

jkotas commented Sep 17, 2018

Do you propose we commit this as is or work to close the gap further

The regression is relatively small (~10%, depends on how you measure it). I think it is fine to take this into master since this moves us in the right direction, and unblocks other work. (And it is paying back the shortcut that was made during implementation of the faster number float formatting algorithms earlier.)

@tannergooding
Copy link
Member Author

Fixing up Buffer::ZeroMemory (which is currently a naive while loop) brings it down to 64.579s (or a 5.84% regression).
-- This can be a separate PR.

@danmoseley
Copy link
Member

I think it is fine to take this into master since this moves us in the right direction

sounds good to me

@danmoseley
Copy link
Member

BitScanReverse and BitScanReverse64 are now dead in pal.h

@tannergooding
Copy link
Member Author

BitScanReverse and BitScanReverse64 are now dead in pal.h

Removed.

@tannergooding
Copy link
Member Author

The current numbers, with everything currently up in this PR (5.68% regression):
image

@tannergooding
Copy link
Member Author

Everything except for moving BigInteger to use fixed-sized buffers has been addressed. Moving to fixed-sized buffers is just pending a BuildTools update to move us to C# 7.3 (so we don't have to scatter fixed statements around the code, or add "hacks" to work around it).

@danmoseley
Copy link
Member

@tannergooding what do you plan to do about tests? You found at least one bug we weren't catching. Does it make sense to port the Roslyn ones into CoreFX?

@tannergooding
Copy link
Member Author

what do you plan to do about tests? You found at least one bug we weren't catching. Does it make sense to port the Roslyn ones into CoreFX?

Until parsing is also fixed, we can't port the Roslyn test-bed directly. I'm working on pulling out some of the tests, however, so that we do have more coverage on important areas.

@tannergooding
Copy link
Member Author

@jkotas, are you fine with this being merged and with me logging a bug/following up with a second PR to move to fixed-sized buffers after the build-tools change (bringing in C# 7.3 support) goes in, or would you rather this just wait and be done all at once?

@jkotas
Copy link
Member

jkotas commented Sep 19, 2018

I would add a unused fixed field to struct BlocksBuffer so that it gets marked properly as structure containing unchecked buffers.

Otherwise, I am fine with this being merged and logging bug/following up with second PR later.

@tannergooding
Copy link
Member Author

I would add a unused fixed field to struct BlocksBuffer so that it gets marked properly as structure containing unchecked buffers.

Fixed.


private static long ExtractFractionAndBiasedExponent(double value, out int exponent)
{
long fraction = GetMantissa(value);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are likely losing some cycles by converting double to long twice in a row.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.
-- I wish we had the ability to specify a function was const so that the JIT could just do the right/expected thing here.

@tannergooding
Copy link
Member Author

@jkotas, any other feedback you would like me to address? I think I've gotten it all at this point.

_e = e;
}

public ulong f
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: There is no value in these being properties. Just makes the JIT to work harder.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll track this is part of the fixed buffer cleanup after the build-tools update.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jkotas
Copy link
Member

jkotas commented Sep 20, 2018

Do the perf numbers still look good with all the feedback incorporated?

Copy link
Member

@jkotas jkotas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tannergooding
Copy link
Member Author

Do the perf numbers still look good with all the feedback incorporated?

Yes. The current run shows the best numbers so far (only a 3.93% regression from the original 61.02s):

  • Noting that there is some noise here. I have also seen it take up to 65.83s, which is a 7.89% regression
  • I plan on looking at benchview after this pumps back to CoreFX as well, for another comparison point

image

@jkotas
Copy link
Member

jkotas commented Sep 20, 2018

@tannergooding
Copy link
Member Author

CoreCLR/CoreFX convention is to merge using "Squash and Merge". https://github.com/dotnet/corefx/blob/master/Documentation/project-docs/contributing.md#merging-pull-requests-for-contributors-with-write-access (next time...)

Can we set that to be the default then? We have multiple repos across dotnet and many of them do not agree on this behavior. As such, I normally just use whatever the default merge option is configured for (which is currently Rebase and Merge)

@jkotas
Copy link
Member

jkotas commented Sep 20, 2018

Where do you set it as default?

@tannergooding
Copy link
Member Author

It should be under Settings, it looks like they've changed it around since the last time I looked:
I believe https://help.github.com/articles/configuring-commit-squashing-for-pull-requests/ has more details

@jkotas
Copy link
Member

jkotas commented Sep 20, 2018

Right, that let's you choose what is allowed, not what should be the default.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

3 participants