Skip to content

Conversation

@erozenfeld
Copy link
Member

  1. Bug fix: when processing a zero assignment to a field of a promoted struct,
    check the reference count of the parent struct.

  2. Bug fix: when processing a zero assignment to a promoted struct, check the
    reference counts of all fields.

  3. Bug fix and improvement: a dependently promoted field is initialized in the prolog
    iff its parent struct is initialized in the prolog so we can remove a field zero initialization
    if the parent struct is already zero initialized.

  4. Improvement: keep track of explicitly zero-initialized locals so that duplicate
    explicit zero initializations can be eliminated.

Fixes #37666.

1. Bug fix: when processing a zero assignment to a field of a promoted struct, check the reference count of the parent struct. 2. Bug fix: when processing a zero assignment to a promoted struct, check the reference counts of all fields. 3. Bug fix and improvement: a dependently promoted field is initialized in the prolog iff its parent struct is initialized in the prolog so we can remove a field zero initialization if the parent struct is already zero initialized. 4. Improvement: keep track of explicitly zero-initialized locals so that duplicate explicit zero initializations can be eliminated. Fixes dotnet#37666.
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jun 12, 2020
@erozenfeld
Copy link
Member Author

Framework x64 pmi diffs:

PMI CodeSize Diffs for System.Private.CoreLib.dll, framework assemblies for default jit Summary of Code Size diffs: (Lower is better) Total bytes of diff: -575 (-0.00% of base) diff is an improvement. Top file improvements (bytes): -325 : System.Private.Xml.dasm (-0.01% of base) -97 : System.Reflection.Metadata.dasm (-0.02% of base) -76 : Microsoft.CodeAnalysis.VisualBasic.dasm (-0.00% of base) -16 : Microsoft.Extensions.DependencyInjection.dasm (-0.03% of base) -14 : Microsoft.CodeAnalysis.dasm (-0.00% of base) -10 : Microsoft.CodeAnalysis.CSharp.dasm (-0.00% of base) -10 : System.Net.Http.dasm (-0.00% of base) -10 : System.Private.CoreLib.dasm (-0.00% of base) -9 : System.Net.HttpListener.dasm (-0.00% of base) -6 : Microsoft.Extensions.Primitives.dasm (-0.02% of base) -1 : System.Security.Cryptography.Algorithms.dasm (-0.00% of base) -1 : System.Security.Cryptography.Cng.dasm (-0.00% of base) 12 total files with Code Size differences (12 improved, 0 regressed), 252 unchanged. Top method regressions (bytes): 5 (12.82% of base) : Microsoft.CodeAnalysis.CSharp.dasm - ControlFlowPass:LabelState(LabelSymbol):LocalState:this 5 ( 0.32% of base) : System.Private.Xml.dasm - QilGenerator:.ctor(bool):this Top method improvements (bytes): -19 (-3.26% of base) : System.Private.Xml.dasm - XmlTextReaderImpl:.ctor():this -19 (-2.78% of base) : System.Private.Xml.dasm - XmlTextReaderImpl:.ctor(XmlNameTable):this -19 (-1.77% of base) : System.Private.Xml.dasm - XmlTextReaderImpl:.ctor(XmlResolver,XmlReaderSettings,XmlParserContext):this -13 (-3.72% of base) : System.Private.Xml.dasm - XmlBaseConverter:UntypedAtomicToDateTime(String):DateTime -13 (-3.60% of base) : System.Private.Xml.dasm - XmlBaseConverter:UntypedAtomicToDateTimeOffset(String):DateTimeOffset -13 (-3.72% of base) : System.Private.Xml.dasm - XmlBaseConverter:StringToDate(String):DateTime -13 (-3.72% of base) : System.Private.Xml.dasm - XmlBaseConverter:StringToDateTime(String):DateTime -13 (-3.72% of base) : System.Private.Xml.dasm - XmlBaseConverter:StringToGDay(String):DateTime -13 (-3.72% of base) : System.Private.Xml.dasm - XmlBaseConverter:StringToGMonth(String):DateTime -13 (-3.72% of base) : System.Private.Xml.dasm - XmlBaseConverter:StringToGMonthDay(String):DateTime -13 (-3.72% of base) : System.Private.Xml.dasm - XmlBaseConverter:StringToGYear(String):DateTime -13 (-3.72% of base) : System.Private.Xml.dasm - XmlBaseConverter:StringToGYearMonth(String):DateTime -13 (-3.60% of base) : System.Private.Xml.dasm - XmlBaseConverter:StringToDateOffset(String):DateTimeOffset -13 (-3.60% of base) : System.Private.Xml.dasm - XmlBaseConverter:StringToDateTimeOffset(String):DateTimeOffset -13 (-3.60% of base) : System.Private.Xml.dasm - XmlBaseConverter:StringToGDayOffset(String):DateTimeOffset -13 (-3.60% of base) : System.Private.Xml.dasm - XmlBaseConverter:StringToGMonthOffset(String):DateTimeOffset -13 (-3.60% of base) : System.Private.Xml.dasm - XmlBaseConverter:StringToGMonthDayOffset(String):DateTimeOffset -13 (-3.60% of base) : System.Private.Xml.dasm - XmlBaseConverter:StringToGYearOffset(String):DateTimeOffset -13 (-3.60% of base) : System.Private.Xml.dasm - XmlBaseConverter:StringToGYearMonthOffset(String):DateTimeOffset -13 (-3.72% of base) : System.Private.Xml.dasm - XmlBaseConverter:StringToTime(String):DateTime Top method regressions (percentages): 5 (12.82% of base) : Microsoft.CodeAnalysis.CSharp.dasm - ControlFlowPass:LabelState(LabelSymbol):LocalState:this 5 ( 0.32% of base) : System.Private.Xml.dasm - QilGenerator:.ctor(bool):this Top method improvements (percentages): -9 (-30.00% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - ControlFlowPass:ReachableState():LocalState:this -8 (-28.57% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - DimensionSize:VariableSize():DimensionSize -10 (-26.32% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - ControlFlowPass:UnreachableState():LocalState:this -6 (-22.22% of base) : System.Reflection.Metadata.dasm - TypeDefinitionHandleCollection:GetEnumerator():Enumerator:this -6 (-22.22% of base) : System.Reflection.Metadata.dasm - TypeReferenceHandleCollection:GetEnumerator():Enumerator:this -6 (-22.22% of base) : System.Reflection.Metadata.dasm - ExportedTypeHandleCollection:GetEnumerator():Enumerator:this -6 (-22.22% of base) : System.Reflection.Metadata.dasm - MemberReferenceHandleCollection:GetEnumerator():Enumerator:this -6 (-22.22% of base) : System.Reflection.Metadata.dasm - ManifestResourceHandleCollection:GetEnumerator():Enumerator:this -6 (-22.22% of base) : System.Reflection.Metadata.dasm - AssemblyFileHandleCollection:GetEnumerator():Enumerator:this -5 (-20.83% of base) : System.Reflection.Metadata.dasm - ModuleDefinitionHandle:op_Implicit(ModuleDefinitionHandle):Handle -5 (-19.23% of base) : Microsoft.CodeAnalysis.CSharp.dasm - ControlFlowPass:ReachableState():LocalState:this -4 (-14.81% of base) : Microsoft.CodeAnalysis.dasm - SubsystemVersion:get_Windows2000():SubsystemVersion -4 (-14.81% of base) : Microsoft.CodeAnalysis.dasm - SubsystemVersion:get_WindowsVista():SubsystemVersion -4 (-14.81% of base) : Microsoft.Extensions.DependencyInjection.dasm - ILEmitCallSiteAnalyzer:VisitConstant(ConstantCallSite,Object):ILEmitCallSiteAnalysisResult:this -4 (-14.81% of base) : Microsoft.Extensions.DependencyInjection.dasm - ILEmitCallSiteAnalyzer:VisitServiceProvider(ServiceProviderCallSite,Object):ILEmitCallSiteAnalysisResult:this -4 (-14.81% of base) : Microsoft.Extensions.DependencyInjection.dasm - ILEmitCallSiteAnalyzer:VisitServiceScopeFactory(ServiceScopeFactoryCallSite,Object):ILEmitCallSiteAnalysisResult:this -4 (-14.81% of base) : Microsoft.Extensions.DependencyInjection.dasm - ILEmitCallSiteAnalyzer:VisitFactory(FactoryCallSite,Object):ILEmitCallSiteAnalysisResult:this -4 (-14.29% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - DimensionSize:ConstantSize(int):DimensionSize -4 (-13.79% of base) : Microsoft.CodeAnalysis.CSharp.dasm - ControlFlowPass:UnreachableState():LocalState:this -10 (-9.52% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - DataFlowPass:ReachableState():LocalState:this 74 total methods with Code Size differences (72 improved, 2 regressed), 242656 unchanged. 
@erozenfeld
Copy link
Member Author

No diffs in benchmarks.

@erozenfeld
Copy link
Member Author

Only instance of a bug fix in framework:

Microsoft.CodeAnalysis.CSharp.dll ; Assembly listing for method ControlFlowPass:LabelState(LabelSymbol):LocalState:this G_M46536_IG01: sub rsp, 40 xor rax, rax mov qword ptr [rsp+20H], rax	;; bbWeight=1 PerfScore 1.50 G_M46536_IG02: mov rax, 0xD1FFAB1E call qword ptr [rax]PreciseAbstractFlowPass`1:LabelState(LabelSymbol):LocalState:this mov word ptr [rsp+20H], ax + mov byte ptr [rsp+21H], 0 movsx rax, word ptr [rsp+20H]	;; bbWeight=1 PerfScore 5.25	;; bbWeight=1 PerfScore 6.25 G_M46536_IG03: add rsp, 40 ret	;; bbWeight=1 PerfScore 1.25 -; Total bytes of code 39, prolog size 11, PerfScore 11.90, (MethodHash=997c4a37) for method ControlFlowPass:LabelState(LabelSymbol):LocalState:this +; Total bytes of code 44, prolog size 11, PerfScore 13.40, (MethodHash=997c4a37) for method ControlFlowPass:LabelState(LabelSymbol):LocalState:this 
@erozenfeld
Copy link
Member Author

Examples of improvement diffs:

Struct is explicitly zero initialized and we remove field zero inits

Microsoft.CodeAnalysis.VisualBasic.dll ; Assembly listing for method ControlFlowPass:UnreachableState():LocalState:this G_M2978_IG01: push rax	;; bbWeight=1 PerfScore 1.00 G_M2978_IG02: xor eax, eax mov word ptr [rsp], ax movzx rax, byte ptr [rcx+137] - xor edx, edx - mov word ptr [rsp], dx - mov byte ptr [rsp], 0 mov byte ptr [rsp+01H], al movsx rax, word ptr [rsp]	;; bbWeight=1 PerfScore 7.50	;; bbWeight=1 PerfScore 5.25 G_M2978_IG03: add rsp, 8 ret	;; bbWeight=1 PerfScore 1.25 -; Total bytes of code 38, prolog size 1, PerfScore 13.55, (MethodHash=1828f45d) for method ControlFlowPass:UnreachableState():LocalState:this +; Total bytes of code 28, prolog size 1, PerfScore 10.30, (MethodHash=1828f45d) for method ControlFlowPass:UnreachableState():LocalState:this

Struct is zero initialized in the prolog and we remove field zero init

Microsoft.CodeAnalysis.CSharp.dll ; Assembly listing for method SyntaxDiagnosticInfoList:GetEnumerator():Enumerator:this G_M39539_IG01: push rdi push rsi push rbx sub rsp, 64 vxorps xmm4, xmm4 vmovdqu xmmword ptr [rsp+28H], xmm4 xor rax, rax mov qword ptr [rsp+38H], rax mov rsi, rdx	;; bbWeight=1 PerfScore 6.08 G_M39539_IG02: mov rdi, gword ptr [rcx] - xor ecx, ecx - mov dword ptr [rsp+38H], ecx test rdi, rdi je G_M39539_IG08 ... -; Total bytes of code 240, prolog size 24, PerfScore 51.60, (MethodHash=0691658c) for method SyntaxDiagnosticInfoList:GetEnumerator():Enumerator:this +; Total bytes of code 234, prolog size 24, PerfScore 49.75, (MethodHash=0691658c) for method SyntaxDiagnosticInfoList:GetEnumerator():Enumerator:this

We remove duplicate explicit struct zero init

Microsoft.CodeAnalysis.VisualBasic.dll ; Assembly listing for method DataFlowPass:ReachableState():LocalState:this ; Emitting BLENDED_CODE for X64 CPU with AVX - Windows ; optimized code ; rsp based frame ; partially interruptible ; Final local variable assignments ; ;* V00 this [V00 ] ( 0, 0 ) ref -> zero-ref this class-hnd ; V01 RetBuf [V01,T01] ( 4, 4 ) byref -> rbx ; V02 OutArgs [V02 ] ( 1, 1 ) lclBlk (32) [rsp+0x00] "OutgoingArgSpace" ; V03 tmp1 [V03 ] ( 4, 8 ) struct (16) [rsp+0x30] do-not-enreg[XSFB] addr-exposed "NewObj constructor temp" ; V03 tmp1 [V03 ] ( 3, 6 ) struct (16) [rsp+0x30] do-not-enreg[XSFB] addr-exposed "NewObj constructor temp" ; V04 tmp2 [V04 ] ( 4, 8 ) struct (16) [rsp+0x20] do-not-enreg[XS] must-init addr-exposed "struct address for call/obj" ;* V05 tmp3 [V05 ] ( 0, 0 ) struct (16) zero-ref "Inlining Arg" ; V06 tmp4 [V06 ] ( 2, 3 ) ref -> [rsp+0x20] do-not-enreg[X] addr-exposed V04._bits(offs=0x00) P-DEP "field V04._bits (fldOffset=0x0)" ; V07 tmp5 [V07 ] ( 2, 3 ) int -> [rsp+0x28] do-not-enreg[X] addr-exposed V04._bits0(offs=0x08) P-DEP "field V04._bits0 (fldOffset=0x8)" ; V08 tmp6 [V08 ] ( 2, 3 ) int -> [rsp+0x2C] do-not-enreg[X] addr-exposed V04._capacity(offs=0x0c) P-DEP "field V04._capacity (fldOffset=0xc)" ; V09 tmp7 [V09,T02] ( 2, 2 ) ref -> rax V05._bits(offs=0x00) P-INDEP "field V05._bits (fldOffset=0x0)" ; V10 tmp8 [V10,T03] ( 2, 2 ) int -> rdx V05._bits0(offs=0x08) P-INDEP "field V05._bits0 (fldOffset=0x8)" ; V11 tmp9 [V11,T04] ( 2, 2 ) int -> rcx V05._capacity(offs=0x0c) P-INDEP "field V05._capacity (fldOffset=0xc)" ; V12 tmp10 [V12,T00] ( 4, 8 ) byref -> r8 stack-byref "BlockOp address local" ; ; Lcl frame size = 64 G_M31598_IG01: push rdi push rsi push rbx sub rsp, 64 vzeroupper xor rax, rax mov qword ptr [rsp+20H], rax mov rbx, rdx	;; bbWeight=1 PerfScore 5.75 G_M31598_IG02: vxorps xmm0, xmm0 vmovdqu xmmword ptr [rsp+30H], xmm0 lea rcx, bword ptr [rsp+20H] call BitVector:get_Empty():BitVector mov rax, gword ptr [rsp+20H] mov edx, dword ptr [rsp+28H] mov ecx, dword ptr [rsp+2CH] - vxorps xmm0, xmm0 - vmovdqu xmmword ptr [rsp+30H], xmm0 lea r8, bword ptr [rsp+30H] mov gword ptr [r8], rax mov dword ptr [r8+8], edx mov dword ptr [r8+12], ecx mov rdi, rbx lea rsi, bword ptr [rsp+30H] call CORINFO_HELP_ASSIGN_BYREF movsq mov rax, rbx	;; bbWeight=1 PerfScore 13.67	;; bbWeight=1 PerfScore 12.33 G_M31598_IG03: add rsp, 64 pop rbx pop rsi pop rdi ret	;; bbWeight=1 PerfScore 2.75 -; Total bytes of code 105, prolog size 17, PerfScore 33.07, (MethodHash=272a8491) for method DataFlowPass:ReachableState():LocalState:this +; Total bytes of code 95, prolog size 17, PerfScore 30.53, (MethodHash=272a8491) for method DataFlowPass:ReachableState():LocalState:this
@erozenfeld
Copy link
Member Author

@dotnet/jit-contrib @AndyAyersMS PTAL

@erozenfeld erozenfeld requested a review from AndyAyersMS June 12, 2020 07:08
@erozenfeld
Copy link
Member Author

I recommend reviewing with "Hide whitespace changes" since I added a couple of early-outs from the loop to reduce indentation.

@erozenfeld
Copy link
Member Author

Outerloop failures are pre-existing: #37759, #37470.

Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Just one question about some code you didn't change.

fgRemoveStmt(block, stmt);
removedExplicitZeroInit = true;
lclDsc->lvSuppressedZeroInit = 1;
lclDsc->setLvRefCnt(lclDsc->lvRefCnt() - 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We generally don't incrementally update ref counts anymore. Is this important?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added that in #37280. The idea is to detect cases where we make the struct dependently promoted because of a block assignment and then delete that block assignment. In that case we can make the field tracked and eliminate dead field assignments.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if there is any guarantee that the count is > 0 at this point? I suppose this is running not long after the first call to lvaMarkLocalVars so the ref counts are probably still fairly accurate.

And the code to determine which locals to track is just a heuristic, so it's ok if it is based on incorrect ref count info.

@erozenfeld erozenfeld merged commit d82b7be into dotnet:master Jun 12, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 8, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

3 participants