Skip to content

Conversation

@EgorBo
Copy link
Member

@EgorBo EgorBo commented Dec 21, 2023

Closes #32239

These were only implemented on ARM64 (and RISC-V). Unlike other Interlocked APIs, these return original value so it makes them impossible to optimize on XARCH when that original value is used. Looks like all usages of it in BCL don't need return value so we can optimize this case. Example:

Interlocked.And(ref m_stateFlags, ~(int)TaskStateFlags.WaitCompletionNotification);

void Test1(ref int x, int y) => Interlocked.Or(ref x, y); // return value is not used int Test2(ref int x, int y) => Interlocked.Or(ref x, y); // return value is used

Was:

; Method Tests:Test1(byref,int):this (FullOpts) G_M7530_IG01:  push rax G_M7530_IG02:  mov eax, dword ptr [rdx]  jmp SHORT G_M7530_IG03  align [0 bytes for IG03] G_M7530_IG03:  mov ecx, eax  or ecx, r8d  mov dword ptr [rsp+0x04], eax  lock   cmpxchg dword ptr [rdx], ecx  mov ecx, dword ptr [rsp+0x04]  cmp eax, ecx  je SHORT G_M7530_IG05 G_M7530_IG04:  mov ecx, eax  mov eax, ecx  jmp SHORT G_M7530_IG03 G_M7530_IG05:  add rsp, 8  ret  ; Total bytes of code: 37 ; Method Tests:Test2(byref,int):int:this (FullOpts) G_M30720_IG01:  push rax G_M30720_IG02:  mov eax, dword ptr [rdx]  jmp SHORT G_M30720_IG03  align [0 bytes for IG03] G_M30720_IG03:  mov ecx, eax  or ecx, r8d  mov dword ptr [rsp+0x04], eax  lock   cmpxchg dword ptr [rdx], ecx  mov ecx, dword ptr [rsp+0x04]  cmp eax, ecx  je SHORT G_M30720_IG05 G_M30720_IG04:  mov ecx, eax  mov eax, ecx  jmp SHORT G_M30720_IG03 G_M30720_IG05:  mov eax, ecx G_M30720_IG06:  add rsp, 8  ret  ; Total bytes of code: 39 

Now:

; Method Tests:Test1(byref,int):this (FullOpts) G_M7530_IG01: G_M7530_IG02:  lock   or dword ptr [rdx], r8d G_M7530_IG03:  ret  ; Total bytes of code: 5 ; Method Tests:Test2(byref,int):int:this (FullOpts) G_M30720_IG01: G_M30720_IG02:  mov eax, dword ptr [rdx] G_M30720_IG03:  mov ecx, eax  or ecx, r8d  lock   cmpxchg dword ptr [rdx], ecx  jne SHORT G_M30720_IG03 G_M30720_IG04:  ret  ; Total bytes of code: 14
@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Dec 21, 2023
@ghost ghost assigned EgorBo Dec 21, 2023
@ghost
Copy link

ghost commented Dec 21, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Closes #32239

These were only implemented on ARM64 (and RISC-V). Unlike other Interlocked APIs, these return original value so it makes them impossible to optimize on XARCH when that original value is used. Looks like all usages of it in BCL don't need return value so we can optimize this case.

void Test(ref int x, int y) => Interlocked.Or(ref x, y);

Was:

; Method Program:Test(byref,int):this (FullOpts) G_M55114_IG01: ;; offset=0x0000  push rax ;; size=1 bbWeight=1 PerfScore 1.00 G_M55114_IG02: ;; offset=0x0001  mov eax, dword ptr [rdx]  jmp SHORT G_M55114_IG03  align [0 bytes for IG03] ;; size=4 bbWeight=1 PerfScore 4.00 G_M55114_IG03: ;; offset=0x0005  mov ecx, eax  or ecx, r8d  mov dword ptr [rsp+0x04], eax  lock   cmpxchg dword ptr [rdx], ecx  mov ecx, dword ptr [rsp+0x04]  cmp eax, ecx  je SHORT G_M55114_IG05 ;; size=21 bbWeight=8 PerfScore 174.00 G_M55114_IG04: ;; offset=0x001A  mov ecx, eax  mov eax, ecx  jmp SHORT G_M55114_IG03 ;; size=6 bbWeight=4 PerfScore 10.00 G_M55114_IG05: ;; offset=0x0020  add rsp, 8  ret  ;; size=5 bbWeight=1 PerfScore 1.25 ; Total bytes of code: 37

Now:

; Method Program:Test(byref,int):this (FullOpts) G_M55114_IG01: ;; offset=0x0000 ;; size=0 bbWeight=1 PerfScore 0.00 G_M55114_IG02: ;; offset=0x0000  lock   or dword ptr [rdx], r8d ;; size=4 bbWeight=1 PerfScore 16.00 G_M55114_IG03: ;; offset=0x0004  ret  ;; size=1 bbWeight=1 PerfScore 1.00 ; Total bytes of code: 5
Author: EgorBo
Assignees: EgorBo
Labels:

area-CodeGen-coreclr

Milestone: -
@EgorBo EgorBo force-pushed the intrinsify-interlocked-and-or branch from 62cd091 to 4759e86 Compare January 4, 2024 21:11
@EgorBo EgorBo force-pushed the intrinsify-interlocked-and-or branch from bebdc5c to 988c8b7 Compare January 4, 2024 22:04
@EgorBo
Copy link
Member Author

EgorBo commented Jan 5, 2024

@dotnet/jit-contrib @BruceForstall PTAL, simple PR, optimizes Interlocked.And/Add the way native compilers do: https://godbolt.org/z/csfd3nG89

Diffs

A few size and TP regressions for the cases when we previously didn't inline Interlocked.And/Or and now it's inlined as intrinsic in both Tier0 (non-debug and non-minopts) and Tier1

Copy link
Contributor

@kunalspathak kunalspathak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just a comment in lsra.

@EgorBo EgorBo merged commit a5ff7e6 into dotnet:main Jan 5, 2024
@EgorBo EgorBo deleted the intrinsify-interlocked-and-or branch January 5, 2024 21:33
@JulieLeeMSFT JulieLeeMSFT added this to the 9.0.0 milestone Jan 5, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Feb 5, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

4 participants