JIT: Support retbuf optimization for non 'lvIsTemp' locals #104467

Merged
merged 9 commits into from
Jul 9, 2024

@jakobbotsch jakobbotsch commented Jul 5, 2024

The retbuf optimization allows us to avoid address exposure for retbuf definitions; instead we consider them to be just defined (and not exposed) by the calls.

This optimization was previously only enabled for lvIsTemp locals, i.e. the locals created by lvaGrabTemp(true). The reason was that the definition of the actual retbuf happened when we saw the LCL_ADDR node; if there were additional uses after the LCL_ADDR node, then they would think they were referring to the LCL_ADDR definition. The lvIsTemp gave us reasonable confidence that there were no such additional uses.

This PR fixes the root cause of the problem and enables the optimization for non-lvIsTemp locals. To do that, it teaches the various liveness phases to ignore LCL_ADDR nodes when they reach them and to instead handle the definition at the parent CALL node.
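
To illustrate why the point of definition matters, here is a minimal toy sketch of backward liveness over a linear node list. It is not the JIT's actual implementation: `Oper`, `Node`, and `computeLiveIn` are hypothetical names invented for this example, and the real liveness phases operate on the JIT's own IR and tracked-variable bit sets.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Toy model only: these types are hypothetical and are not the JIT's real IR.
enum class Oper { LclLoad, LclStore, LclAddr, Call };

struct Node
{
    Oper oper;
    int  lclNum; // Local referenced; for Call, the local defined via retbuf, or -1 if none.
};

// Walks the nodes backwards (last executed first) and returns the set of
// locals that are live on entry, given the set that is live on exit.
std::set<int> computeLiveIn(const std::vector<Node>& nodes, std::set<int> live)
{
    for (auto it = nodes.rbegin(); it != nodes.rend(); ++it)
    {
        switch (it->oper)
        {
            case Oper::LclLoad:
                assert(it->lclNum >= 0);
                live.insert(it->lclNum); // A use makes the local live above this point.
                break;
            case Oper::LclStore:
                assert(it->lclNum >= 0);
                live.erase(it->lclNum); // A full definition kills the local.
                break;
            case Oper::LclAddr:
                // Ignored here: the retbuf definition is attributed to the call.
                break;
            case Oper::Call:
                if (it->lclNum >= 0)
                {
                    live.erase(it->lclNum); // The call defines its retbuf local.
                }
                break;
        }
    }
    return live;
}
```

In this model, the sequence `LclAddr V0; LclLoad V0; Call (retbuf V0); LclLoad V0` leaves V0 live on entry, because the load between the address node and the call reads the value from before the call. Treating the `LclAddr` node as the definition would instead kill V0 at that node and lose track of that earlier use.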

These locals end up being dependently promoted. Skip them and allow
physical promotion to handle them instead.
- Handle liveness for the LCL_ADDR definitions when we get to the call
- Remove lvIsTemp check from retbuf optimization
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jul 5, 2024

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.


jakobbotsch commented Jul 5, 2024

Some of the big size regressions here seem to be coming from more loop cloning. Looks to be the case for both

****** START compiling System.DefaultBinder:BindToMethod(int,System.Reflection.MethodBase[],byref,System.Reflection.ParameterModifier[],System.Globalization.CultureInfo,System.String[],byref):System.Reflection.MethodBase:this (MethodHash=29b54a3e)

(biggest size regression in libraries_tests.run, accounts for > +50 KB)
and

****** START compiling System.Reflection.DynamicInvokeInfo:.ctor(System.Reflection.MethodBase,long):this (MethodHash=c15b0701)

(biggest size regression in smoke_tests, +1.6 KB)

Diffs with DOTNET_JitCloneLoops=1 (-413K overall). Diffs with DOTNET_JitCloneLoops=0 (-470K overall).

@jakobbotsch

/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress, runtime-jit-experimental


Azure Pipelines successfully started running 3 pipeline(s).


jakobbotsch commented Jul 5, 2024

Detailed tp diff for realworld win-x64:

Base: 52411141834, Diff: 52550603925, +0.2661%

15664257  : +1.51%  : 5.92%  : +0.0299% : public: bool __cdecl Compiler::optCopyProp(struct BasicBlock *, struct Statement *, struct GenTreeLclVarCommon *, unsigned int, class JitHashTable<unsigned int, struct JitSmallPrimitiveKeyFuncs<unsigned int>, class ArrayStack<class Compiler::CopyPropSsaDef> *, class CompAllocator, class JitHashTableBehavior> *)                                                                                                                                                                                                         
15251899  : +26.41% : 5.76%  : +0.0291% : public: enum Compiler::fgWalkResult __cdecl GenTreeVisitor<class ReplaceVisitor>::WalkTree(struct GenTree **, struct GenTree *)                                                                                                                                                                                                                                                                                                                                                                                                  
10308582  : +90.47% : 3.89%  : +0.0197% : public: struct GenTreeLclVarCommon * __cdecl Compiler::gtCallGetDefinedRetBufLclAddr(struct GenTreeCall *)                                                                                                                                                                                                                                                                                                                                                                                                                       
7340088   : +86.46% : 2.77%  : +0.0140% : public: void __cdecl Compiler::fgComputeLifeCall                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 
7006550   : +3.17%  : 2.65%  : +0.0134% : public: void __cdecl ArrayStack<struct Scev *>::Push(struct Scev *)                                                                                                                                                                                                                                                                                                                                                                                                                                                              
4968018   : +20.77% : 1.88%  : +0.0095% : public: bool __cdecl LIR::Range::TryGetUse(struct GenTree *, class LIR::Use *)                                                                                                                                                                                                                                                                                                                                                                                                                                                   
4789308   : +0.88%  : 1.81%  : +0.0091% : public: struct GenTree * __cdecl Compiler::optAssertionProp_LclVar(unsigned __int64 *const &, struct GenTreeLclVarCommon *, struct Statement *)                                                                                                                                                                                                                                                                                                                                                                                  
4034761   : +45.36% : 1.52%  : +0.0077% : private: void __cdecl PromotionLiveness::InterBlockLiveness(void)                                                                                                                                                                                                                                                                                                                                                                                                                                                                
3719667   : +15.68% : 1.41%  : +0.0071% : public: enum PhaseStatus __cdecl Promotion::Run(void)                                                                                                                                                                                                                                                                                                                                                                                                                                                                            
3687313   : +1.24%  : 1.39%  : +0.0070% : jitstd::`anonymous namespace'::quick_sort<unsigned int *,LclVarDsc_BlendedCode_Less>                                                                                                                                                                                                                                                                                                                                                                                                                                             
3496041   : +1.85%  : 1.32%  : +0.0067% : public: struct GenTree * __cdecl Compiler::optCopyAssertionProp(struct Compiler::AssertionDsc *, struct GenTreeLclVarCommon *, struct Statement *)                                                                                                                                                                                                                                                                                                                                                                               
3123148   : +16.47% : 1.18%  : +0.0060% : public: void __cdecl ReplaceVisitor::EndBlock(void)                                                                                                                                                                                                                                                                                                                                                                                                                                                                              
2992952   : +26.59% : 1.13%  : +0.0057% : private: struct GenTree ** __cdecl ReplaceVisitor::InsertMidTreeReadBacks(struct GenTree **)                                                                                                                                                                                                                                                                                                                                                                                                                                     
2949855   : +29.67% : 1.11%  : +0.0056% : GenTreeVisitor<`ReplaceVisitor::InsertPreStatementWriteBacks'::`2'::Visitor>::WalkTree                                                                                                                                                                                                                                                                                                                                                                                                                                           
2819893   : +7.86%  : 1.07%  : +0.0054% : private: void __cdecl SsaBuilder::AddPhiArg(struct BasicBlock *, struct Statement *, struct GenTreePhi *, unsigned int, unsigned int, struct BasicBlock *)                                                                                                                                                                                                                                                                                                                                                                       
2746692   : +21.99% : 1.04%  : +0.0052% : private: void __cdecl PromotionLiveness::FillInLiveness(unsigned __int64 *&, unsigned __int64 *, struct GenTreeLclVarCommon *)                                                                                                                                                                                                                                                                                                                                                                                                   
2548813   : +27.07% : 0.96%  : +0.0049% : private: void __cdecl PromotionLiveness::ComputeUseDefSets(void)                                                                                                                                                                                                                                                                                                                                                                                                                                                                 
2510275   : +45.12% : 0.95%  : +0.0048% : `PromotionLiveness::PerBlockLiveness'::`2'::<lambda_1>::operator()                                                                                                                                                                                                                                                                                                                                                                                                                                                               
2468615   : +45.05% : 0.93%  : +0.0047% : BasicBlock::VisitRegularSuccs<`PromotionLiveness::PerBlockLiveness'::`2'::<lambda_1> >                                                                                                                                                                                                                                                                                                                                                                                                                                           
2459623   : +27.01% : 0.93%  : +0.0047% : private: void __cdecl PromotionLiveness::FillInLiveness(void)                                                                                                                                                                                                                                                                                                                                                                                                                                                                    
2442181   : +43.34% : 0.92%  : +0.0047% : public: enum Compiler::fgWalkResult __cdecl GenTreeVisitor<class LocalsUseVisitor>::WalkTree(struct GenTree **, struct GenTree *)                                                                                                                                                                                                                                                                                                                                                                                                
2344686   : +18.48% : 0.89%  : +0.0045% : public: bool __cdecl LocalsUseVisitor::PickPromotions(class AggregateInfoMap &)                                                                                                                                                                                                                                                                                                                                                                                                                                                  
2310371   : +25.14% : 0.87%  : +0.0044% : private: void __cdecl ReplaceVisitor::ReplaceLocal(struct GenTree **, struct GenTree *)                                                                                                                                                                                                                                                                                                                                                                                                                                          
2259116   : +23.81% : 0.85%  : +0.0043% : private: void __cdecl PromotionLiveness::MarkUseDef(struct GenTreeLclVarCommon *, unsigned __int64 *&, unsigned __int64 *&)                                                                                                                                                                                                                                                                                                                                                                                                      
2057443   : NA      : 0.78%  : +0.0039% : public: void __cdecl Compiler::fgPerNodeLocalVarLiveness(struct GenTreeHWIntrinsic *)                                                                                                                                                                                                                                                                                                                                                                                                                                            
2033093   : +36.82% : 0.77%  : +0.0039% : public: enum Compiler::fgWalkResult __cdecl LocalsUseVisitor::PreOrderVisit(struct GenTree **, struct GenTree *)                                                                                                                                                                                                                                                                                                                                                                                                                 
2004242   : +1.35%  : 0.76%  : +0.0038% : GenTreeVisitor<`Compiler::gtUpdateStmtSideEffects'::`2'::UpdateSideEffectsWalker>::WalkTree                                                                                                                                                                                                                                                                                                                                                                                                                                      
1991995   : +17.50% : 0.75%  : +0.0038% : public: bool __cdecl GenTree::TryGetUse(struct GenTree *, struct GenTree ***)                                                                                                                                                                                                                                                                                                                                                                                                                                                    
1836022   : +0.65%  : 0.69%  : +0.0035% : public: void __cdecl Compiler::fgInterBlockLocalVarLiveness(void)                                                                                                                                                                                                                                                                                                                                                                                                                                                                
1772285   : +0.98%  : 0.67%  : +0.0034% : public: unsigned int __cdecl ValueNumStore::VNForMapSelectWork(enum ValueNumKind, enum var_types, unsigned int, unsigned int, int *, bool *, class SmallValueNumSet &)                                                                                                                                                                                                                                                                                                                                                           
1729659   : +0.80%  : 0.65%  : +0.0033% : public: void __cdecl Compiler::fgComputeLifeLIR(unsigned __int64 *&, struct BasicBlock *, unsigned __int64 *const &)                                                                                                                                                                                                                                                                                                                                                                                                             
1702470   : +0.56%  : 0.64%  : +0.0032% : public: void __cdecl Compiler::fgPerBlockLocalVarLiveness(void)                                                                                                                                                                                                                                                                                                                                                                                                                                                                  
1670317   : +1.65%  : 0.63%  : +0.0032% : public: bool __cdecl Compiler::fgComputeLifeTrackedLocalDef(unsigned __int64 *&, unsigned __int64 *const &, class LclVarDsc &, struct GenTreeLclVarCommon *)                                                                                                                                                                                                                                                                                                                                                                     
1654709   : +29.31% : 0.63%  : +0.0032% : public: bool __cdecl LocalUses::EvaluateReplacement(class Compiler *, unsigned int, struct Access const &, unsigned int, double)                                                                                                                                                                                                                                                                                                                                                                                                 
1459311   : +0.89%  : 0.55%  : +0.0028% : public: void __cdecl emitter::emitUpdateLiveGCvars(unsigned __int64 *const &, unsigned char *)                                                                                                                                                                                                                                                                                                                                                                                                                                   
1361220   : +29.45% : 0.51%  : +0.0026% : `ReplaceVisitor::InsertPreStatementWriteBacks'::`2'::Visitor::PreOrderVisit                                                                                                                                                                                                                                                                                                                                                                                                                                                      
1307016   : +10.34% : 0.49%  : +0.0025% : public: static void __cdecl BitSetOps<unsigned __int64 *, 1, struct BitVecTraits *, struct BitVecTraits>::ClearD(struct BitVecTraits *, unsigned __int64 *&)                                                                                                                                                                                                                                                                                                                                                                     
1290619   : +1.15%  : 0.49%  : +0.0025% : public: void __cdecl Compiler::fgComputeLifeTrackedLocalUse(unsigned __int64 *&, class LclVarDsc &, struct GenTreeLclVarCommon *)                                                                                                                                                                                                                                                                                                                                                                                                
1284572   : +0.97%  : 0.49%  : +0.0025% : public: static void __cdecl BitSetOps<unsigned __int64 *, 1, class Compiler *, class TrackedVarBitSetTraits>::UnionD(class Compiler *, unsigned __int64 *&, unsigned __int64 *)                                                                                                                                                                                                                                                                                                                                                  
1254762   : +44.04% : 0.47%  : +0.0024% : public: static void __cdecl BitSetOps<unsigned __int64 *, 1, struct BitVecTraits *, struct BitVecTraits>::UnionD(struct BitVecTraits *, unsigned __int64 *&, unsigned __int64 *)                                                                                                                                                                                                                                                                                                                                                 
1249642   : +0.86%  : 0.47%  : +0.0024% : jitstd::`anonymous namespace'::insertion_sort<unsigned int *,LclVarDsc_BlendedCode_Less>                                                                                                                                                                                                                                                                                                                                                                                                                                         
1231299   : +0.34%  : 0.47%  : +0.0023% : private: bool __cdecl LiveVarAnalysis::PerBlockAnalysis(struct BasicBlock *, bool)                                                                                                                                                                                                                                                                                                                                                                                                                                               
1206930   : +34.25% : 0.46%  : +0.0023% : public: void __cdecl LocalUses::RecordAccess(unsigned int, enum var_types, class ClassLayout *, enum AccessKindFlags, double)                                                                                                                                                                                                                                                                                                                                                                                                    
1140071   : +0.33%  : 0.43%  : +0.0022% : public: unsigned short __cdecl Compiler::optAddAssertion(struct Compiler::AssertionDsc *)                                                                                                                                                                                                                                                                                                                                                                                                                                        
1046252   : +0.68%  : 0.40%  : +0.0020% : public: static void __cdecl BitSetOps<unsigned __int64 *, 1, class Compiler *, class TrackedVarBitSetTraits>::Assign(class Compiler *, unsigned __int64 *&, unsigned __int64 *)                                                                                                                                                                                                                                                                                                                                                  
1044451   : +2.91%  : 0.39%  : +0.0020% : public: enum Compiler::fgWalkResult __cdecl GenTreeVisitor<class LocalSequencer>::WalkTree(struct GenTree **, struct GenTree *)                                                                                                                                                                                                                                                                                                                                                                                                  
1041712   : +0.10%  : 0.39%  : +0.0020% : public: void * __cdecl ArenaAllocator::allocateMemory(unsigned __int64)                                                                                                                                                                                                                                                                                                                                                                                                                                                          
1038673   : +0.66%  : 0.39%  : +0.0020% : public: virtual enum PhaseStatus __cdecl LinearScan::doLinearScan(void)                                                                                                                                                                                                                                                                                                                                                                                                                                                          
1025322   : +55.39% : 0.39%  : +0.0020% : VisitEHSuccs<0,`PromotionLiveness::AddHandlerLiveVars'::`2'::<lambda_1> >                                                                                                                                                                                                                                                                                                                                                                                                                                                        
1006709   : +1.29%  : 0.38%  : +0.0019% : private: static void __cdecl BitSetOps<unsigned __int64 *, 1, class Compiler *, class TrackedVarBitSetTraits>::LivenessDLong(class Compiler *, unsigned __int64 *&, unsigned __int64 *const, unsigned __int64 *const, unsigned __int64 *const)                                                                                                                                                                                                                                                                                   
835278    : +0.09%  : 0.32%  : +0.0016% : public: unsigned int __cdecl Compiler::gtSetEvalOrder(struct GenTree *)                                                                                                                                                                                                                                                                                                                                                                                                                                                          
822858    : +7.54%  : 0.31%  : +0.0016% : private: unsigned int __cdecl ClassLayoutTable::GetObjLayoutIndex(class Compiler *, struct CORINFO_CLASS_STRUCT_*)                                                                                                                                                                                                                                                                                                                                                                                                               
709225    : +0.09%  : 0.27%  : +0.0014% : private: struct GenTree * __cdecl Compiler::fgMorphSmpOp(struct GenTree *, struct Compiler::MorphAddrContext *, bool *)                                                                                                                                                                                                                                                                                                                                                                                                          
707092    : +0.16%  : 0.27%  : +0.0013% : GenTreeVisitor<`Compiler::fgSetTreeSeq'::`2'::SetTreeSeqVisitor>::WalkTree                                                                                                                                                                                                                                                                                                                                                                                                                                                       
678906    : +31.54% : 0.26%  : +0.0013% : private: void __cdecl ReplaceVisitor::InsertPreStatementReadBacks(void)                                                                                                                                                                                                                                                                                                                                                                                                                                                          
671398    : +15.43% : 0.25%  : +0.0013% : private: void __cdecl ReplaceVisitor::HandleStructStore(struct GenTree **, struct GenTree *)                                                                                                                                                                                                                                                                                                                                                                                                                                     
643742    : +0.39%  : 0.24%  : +0.0012% : jitstd::`anonymous namespace'::quick_sort<GcInfoEncoder::LifetimeTransition *,CompareLifetimeTransitionsByOffsetThenSlot>                                                                                                                                                                                                                                                                                                                                                                                                        
619633    : +0.19%  : 0.23%  : +0.0012% : private: void __cdecl Compiler::fgMarkUseDef(struct GenTreeLclVarCommon *)                                                                                                                                                                                                                                                                                                                                                                                                                                                       
613563    : +2.46%  : 0.23%  : +0.0012% : public: unsigned int __cdecl ValueNumStore::VNForMapSelectInner(enum ValueNumKind, enum var_types, unsigned int, unsigned int)                                                                                                                                                                                                                                                                                                                                                                                                   
611788    : +0.72%  : 0.23%  : +0.0012% : public: unsigned __int64 *& __cdecl Compiler::GetAssertionDep(unsigned int)                                                                                                                                                                                                                                                                                                                                                                                                                                                      
608985    : +7.92%  : 0.23%  : +0.0012% : public: unsigned int __cdecl ValueNumStore::VNForLoad(enum ValueNumKind, unsigned int, unsigned int, enum var_types, __int64, unsigned int)                                                                                                                                                                                                                                                                                                                                                                                      
579249    : +0.95%  : 0.22%  : +0.0011% : public: void __cdecl LinearScan::resolveEdge(struct BasicBlock *, struct BasicBlock *, enum LinearScan::ResolveType, unsigned __int64 *const &, unsigned __int64)                                                                                                                                                                                                                                                                                                                                                                
578651    : +0.11%  : 0.22%  : +0.0011% : public: struct GenTree * __cdecl Compiler::fgMorphTree(struct GenTree *, struct Compiler::MorphAddrContext *)                                                                                                                                                                                                                                                                                                                                                                                                                    
572476    : +17.95% : 0.22%  : +0.0011% : public: int __cdecl LocalUses::PickInducedPromotions(class Compiler *, unsigned int, class AggregateInfoMap &)                                                                                                                                                                                                                                                                                                                                                                                                                   
570240    : +0.58%  : 0.22%  : +0.0011% : public: static void __cdecl BitSetOps<unsigned __int64 *, 1, class Compiler *, class TrackedVarBitSetTraits>::ClearD(class Compiler *, unsigned __int64 *&)                                                                                                                                                                                                                                                                                                                                                                      
556257    : +14.37% : 0.21%  : +0.0011% : public: class ClassLayout * __cdecl Compiler::typGetObjLayout(struct CORINFO_CLASS_STRUCT_*)                                                                                                                                                                                                                                                                                                                                                                                                                                     
544615    : +1.43%  : 0.21%  : +0.0010% : `Compiler::optReachable'::`12'::<lambda_1>::operator()                                                                                                                                                                                                                                                                                                                                                                                                                                                                           
539663    : +0.14%  : 0.20%  : +0.0010% : public: struct GenTree * __cdecl Compiler::optAssertionProp(unsigned __int64 *const &, struct GenTree *, struct Statement *, struct BasicBlock *)                                                                                                                                                                                                                                                                                                                                                                                
531527    : +0.28%  : 0.20%  : +0.0010% : public: enum ExceptionSetFlags __cdecl GenTree::OperExceptions(class Compiler *)                                                                                                                                                                                                                                                                                                                                                                                                                                                 
529705    : +1.56%  : 0.20%  : +0.0010% : BasicBlock::VisitAllSuccs<`Compiler::optReachable'::`12'::<lambda_1> >                                                                                                                                                                                                                                                                                                                                                                                                                                                           
527486    : +0.14%  : 0.20%  : +0.0010% : public: void __cdecl Compiler::optAssertionGen(struct GenTree *)                                                                                                                                                                                                                                                                                                                                                                                                                                                                 
527482    : +0.34%  : 0.20%  : +0.0010% : public: void __cdecl BitStreamWriter::Write(unsigned __int64, unsigned int)                                                                                                                                                                                                                                                                                                                                                                                                                                                      
521687    : +0.19%  : 0.20%  : +0.0010% : public: void __cdecl GcInfoEncoder::Build(void)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  
503326    : +0.98%  : 0.19%  : +0.0010% : `Compiler::optCopyPropPushDef'::`2'::<lambda_1>::operator()                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
480354    : +1.19%  : 0.18%  : +0.0009% : public: static void __cdecl BitSetOps<unsigned __int64 *, 1, struct BitVecTraits *, struct BitVecTraits>::Assign(struct BitVecTraits *, unsigned __int64 *&, unsigned __int64 *)                                                                                                                                                                                                                                                                                                                                                 
476816    : +0.59%  : 0.18%  : +0.0009% : public: static unsigned __int64 * __cdecl BitSetOps<unsigned __int64 *, 1, struct BitVecTraits *, struct BitVecTraits>::MakeCopy(struct BitVecTraits *, unsigned __int64 *)                                                                                                                                                                                                                                                                                                                                                      
473912    : +0.27%  : 0.18%  : +0.0009% : public: bool __cdecl Compiler::optBlockCopyProp(struct BasicBlock *, class JitHashTable<unsigned int, struct JitSmallPrimitiveKeyFuncs<unsigned int>, class ArrayStack<class Compiler::CopyPropSsaDef> *, class CompAllocator, class JitHashTableBehavior> *)                                                                                                                                                                                                                                                                    
454801    : +0.38%  : 0.17%  : +0.0009% : public: void __cdecl Compiler::fgValueNumberBlock(struct BasicBlock *)                                                                                                                                                                                                                                                                                                                                                                                                                                                           
451569    : +0.23%  : 0.17%  : +0.0009% : public: unsigned short __cdecl Compiler::optCreateAssertion(struct GenTree *, struct GenTree *, enum Compiler::optAssertionKind, bool)                                                                                                                                                                                                                                                                                                                                                                                           
439431    : +9.25%  : 0.17%  : +0.0008% : private: void __cdecl DecompositionPlan::FinalizeCopy(class DecompositionStatementList *)                                                                                                                                                                                                                                                                                                                                                                                                                                        
423262    : +0.41%  : 0.16%  : +0.0008% : public: double __cdecl BasicBlock::getBBWeight(class Compiler *) const                                                                                                                                                                                                                                                                                                                                                                                                                                                           
423140    : +1.19%  : 0.16%  : +0.0008% : public: bool __cdecl JitHashTable<struct ValueNumStore::VNDefFuncApp<2>, struct ValueNumStore::VNDefFuncAppKeyFuncs<2>, class ValueNumStore::MapSelectWorkCacheEntry, class CompAllocator, class JitHashTableBehavior>::Set(struct ValueNumStore::VNDefFuncApp<2>, class ValueNumStore::MapSelectWorkCacheEntry, enum JitHashTable<struct ValueNumStore::VNDefFuncApp<2>, struct ValueNumStore::VNDefFuncAppKeyFuncs<2>, class ValueNumStore::MapSelectWorkCacheEntry, class CompAllocator, class JitHashTableBehavior>::SetKind)
405758    : +0.40%  : 0.15%  : +0.0008% : public: static void __cdecl BitSetOps<unsigned __int64 *, 1, struct BitVecTraits *, struct BitVecTraits>::IntersectionD(struct BitVecTraits *, unsigned __int64 *&, unsigned __int64 *)                                                                                                                                                                                                                                                                                                                                          
396193    : +1.05%  : 0.15%  : +0.0008% : public: void __cdecl LinearScan::handleOutgoingCriticalEdges(struct BasicBlock *)                                                                                                                                                                                                                                                                                                                                                                                                                                                
389092    : +0.13%  : 0.15%  : +0.0007% : protected: static enum Compiler::fgWalkResult __cdecl Compiler::optVNAssertionPropCurStmtVisitor(struct GenTree **, struct Compiler::fgWalkData *)                                                                                                                                                                                                                                                                                                                                                                               
382748    : +13.71% : 0.14%  : +0.0007% : private: static unsigned __int64 __cdecl Promotion::BinarySearch<struct Replacement, 0>(class jitstd::vector<struct Replacement, class jitstd::allocator<struct Replacement>> const &, unsigned int)                                                                                                                                                                                                                                                                                                                             
367883    : +8.73%  : 0.14%  : +0.0007% : public: bool __cdecl AggregateInfo::OverlappingReplacements(unsigned int, unsigned int, struct Replacement **, struct Replacement **)                                                                                                                                                                                                                                                                                                                                                                                            
367679    : +0.51%  : 0.14%  : +0.0007% : private: void __cdecl SsaBuilder::InsertPhiFunctions(void)                                                                                                                                                                                                                                                                                                                                                                                                                                                                       
366248    : +1.06%  : 0.14%  : +0.0007% : private: unsigned int __cdecl SsaBuilder::RenamePushDef(struct GenTree *, struct BasicBlock *, unsigned int, bool)                                                                                                                                                                                                                                                                                                                                                                                                               
364218    : +0.16%  : 0.14%  : +0.0007% : public: enum Compiler::fgWalkResult __cdecl GenTreeVisitor<class GenericTreeWalker<0, 1, 0, 1>>::WalkTree(struct GenTree **, struct GenTree *)                                                                                                                                                                                                                                                                                                                                                                                   
362871    : +0.30%  : 0.14%  : +0.0007% : public: enum BasicBlockVisit __cdecl BasicBlock::VisitRegularSuccs<class `private: bool __cdecl LiveVarAnalysis::PerBlockAnalysis(struct BasicBlock *, bool)'::`2'::<lambda_1>>(class Compiler *, class `private: bool __cdecl LiveVarAnalysis::PerBlockAnalysis(struct BasicBlock *, bool)'::`2'::<lambda_1>)                                                                                                                                                                                                                   
362464    : +0.44%  : 0.14%  : +0.0007% : public: bool __cdecl GenTree::OperRequiresAsgFlag(void) const                                                                                                                                                                                                                                                                                                                                                                                                                                                                    
354853    : +0.37%  : 0.13%  : +0.0007% : public: enum PhaseStatus __cdecl Compiler::optVnCopyProp(void)                                                                                                                                                                                                                                                                                                                                                                                                                                                                   
345892    : +0.26%  : 0.13%  : +0.0007% : private: void __cdecl TreeLifeUpdater<1>::UpdateLifeVar(struct GenTree *, struct GenTreeLclVarCommon *)                                                                                                                                                                                                                                                                                                                                                                                                                          
339919    : +0.20%  : 0.13%  : +0.0006% : private: void __cdecl GcInfoEncoder::SizeofSlotStateVarLengthVector(class BitArray const &, unsigned int, unsigned int, unsigned int *, unsigned int *, unsigned int *)                                                                                                                                                                                                                                                                                                                                                          
338111    : +1.21%  : 0.13%  : +0.0006% : `Compiler::fgValueNumberLocalStore'::`2'::<lambda_1>::operator()                                                                                                                                                                                                                                                                                                                                                                                                                                                                 
328790    : +1.05%  : 0.12%  : +0.0006% : public: bool __cdecl LclVarDsc_BlendedCode_Less::operator()(unsigned int, unsigned int)                                                                                                                                                                                                                                                                                                                                                                                                                                          
323762    : +0.70%  : 0.12%  : +0.0006% : public: unsigned short __cdecl Compiler::optLocalAssertionIsEqualOrNotEqual(enum Compiler::optOp1Kind, unsigned int, enum Compiler::optOp2Kind, __int64, unsigned __int64 *const &)                                                                                                                                                                                                                                                                                                                                              
313380    : +0.09%  : 0.12%  : +0.0006% : protected: void __cdecl CodeGen::genCodeForBBlist(void)                                                                                                                                                                                                                                                                                                                                                                                                                                                                          
312469    : +15.17% : 0.12%  : +0.0006% : public: void __cdecl LocalSequencer::SequenceCall(struct GenTreeCall *)                                                                                                                                                                                                                                                                                                                                                                                                                                                          
310072    : +30.46% : 0.12%  : +0.0006% : private: class LocalUses * __cdecl LocalsUseVisitor::GetOrCreateUses(unsigned int)                                                                                                                                                                                                                                                                                                                                                                                                                                               
303470    : +0.15%  : 0.11%  : +0.0006% : protected: void __cdecl Compiler::lvaMarkLclRefs(struct GenTree *, struct BasicBlock *, struct Statement *, bool)                                                                                                                                                                                                                                                                                                                                                                                                                
301090    : +0.15%  : 0.11%  : +0.0006% : public: void __cdecl Compiler::gtUpdateNodeOperSideEffects(struct GenTree *)                                                                                                                                                                                                                                                                                                                                                                                                                                                     
299152    : +14.19% : 0.11%  : +0.0006% : private: void __cdecl PromotionLiveness::MarkIndex(unsigned int, bool, bool, unsigned __int64 *&, unsigned __int64 *&)                                                                                                                                                                                                                                                                                                                                                                                                           
276899    : +1.58%  : 0.10%  : +0.0005% : public: bool __cdecl Compiler::optReachable(struct BasicBlock *const, struct BasicBlock *const, struct BasicBlock *const)                                                                                                                                                                                                                                                                                                                                                                                                        
271706    : +0.17%  : 0.10%  : +0.0005% : public: void __cdecl Compiler::lvaSortByRefCount(void)                                                                                                                                                                                                                                                                                                                                                                                                                                                                           
270968    : +1.25%  : 0.10%  : +0.0005% : VisitEHSuccs<0,`Compiler::optReachable'::`12'::<lambda_1> >                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
264842    : +0.59%  : 0.10%  : +0.0005% : private: unsigned int __cdecl ValueNumStore::VnForConst<__int64, class ValueNumStore::VNMap<__int64, struct JitLargePrimitiveKeyFuncs<__int64>>>(__int64, class ValueNumStore::VNMap<__int64, struct JitLargePrimitiveKeyFuncs<__int64>> *, enum var_types)                                                                                                                                                                                                                                                                      
-302523   : -0.11%  : 0.11%  : -0.0006% : protected: unsigned __int64 __cdecl emitter::emitOutputInstr(struct insGroup *, struct emitter::instrDesc *, unsigned char **)                                                                                                                                                                                                                                                                                                                                                                                                   
-443046   : -0.70%  : 0.17%  : -0.0008% : public: unsigned int __cdecl ValueNumStore::VNForIntCon(int)                                                                                                                                                                                                                                                                                                                                                                                                                                                                     
-444349   : -7.23%  : 0.17%  : -0.0008% : public: unsigned int __cdecl Compiler::fgValueNumberByrefExposedLoad(enum var_types, unsigned int)                                                                                                                                                                                                                                                                                                                                                                                                                               
-473405   : -0.73%  : 0.18%  : -0.0009% : public: unsigned char * __cdecl emitter::emitOutputSV(unsigned char *, struct emitter::instrDesc *, unsigned __int64, struct emitter::CnsVal *)                                                                                                                                                                                                                                                                                                                                                                                  
-487953   : -1.10%  : 0.18%  : -0.0009% : private: bool __cdecl ValueNumStore::VNEvalCanFoldBinaryFunc(enum var_types, enum VNFunc, unsigned int, unsigned int)                                                                                                                                                                                                                                                                                                                                                                                                            
-502836   : -0.50%  : 0.19%  : -0.0010% : public: unsigned int __cdecl ValueNumStore::VNForFunc(enum var_types, enum VNFunc, unsigned int)                                                                                                                                                                                                                                                                                                                                                                                                                                 
-630341   : -1.31%  : 0.24%  : -0.0012% : private: unsigned int __cdecl ValueNumStore::VnForConst<int, class ValueNumStore::VNMap<int, struct JitLargePrimitiveKeyFuncs<int>>>(int, class ValueNumStore::VNMap<int, struct JitLargePrimitiveKeyFuncs<int>> *, enum var_types)                                                                                                                                                                                                                                                                                              
-830585   : -0.87%  : 0.31%  : -0.0016% : public: bool __cdecl emitter::TakesEvexPrefix(struct emitter::instrDesc const *) const                                                                                                                                                                                                                                                                                                                                                                                                                                           
-914835   : -2.89%  : 0.35%  : -0.0017% : public: unsigned int __cdecl ValueNumStore::VNForFunc(enum var_types, enum VNFunc, unsigned int, unsigned int, unsigned int)                                                                                                                                                                                                                                                                                                                                                                                                     
-1143748  : -0.67%  : 0.43%  : -0.0022% : private: void __cdecl JitHashTable<struct ValueNumStore::VNDefFuncApp<2>, struct ValueNumStore::VNDefFuncAppKeyFuncs<2>, class ValueNumStore::MapSelectWorkCacheEntry, class CompAllocator, class JitHashTableBehavior>::Grow(void)                                                                                                                                                                                                                                                                                              
-1221875  : -0.54%  : 0.46%  : -0.0023% : private: struct ValueNumStore::Chunk * __cdecl ValueNumStore::GetAllocChunk(enum var_types, enum ValueNumStore::ChunkExtraAttribs)                                                                                                                                                                                                                                                                                                                                                                                               
-1251505  : -1.08%  : 0.47%  : -0.0024% : private: unsigned int __cdecl ValueNumStore::EvalUsingMathIdentity(enum var_types, enum VNFunc, unsigned int, unsigned int)                                                                                                                                                                                                                                                                                                                                                                                                      
-1335169  : -0.30%  : 0.50%  : -0.0025% : memset                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           
-1865260  : -0.84%  : 0.70%  : -0.0036% : public: bool __cdecl ValueNumStore::VNMap<struct ValueNumStore::VNDefFuncApp<2>, struct ValueNumStore::VNDefFuncAppKeyFuncs<2>>::Set(struct ValueNumStore::VNDefFuncApp<2>, unsigned int)                                                                                                                                                                                                                                                                                                                                        
-2233076  : -0.58%  : 0.84%  : -0.0043% : public: unsigned int __cdecl ValueNumStore::VNForFunc(enum var_types, enum VNFunc, unsigned int, unsigned int)                                                                                                                                                                                                                                                                                                                                                                                                                   
-2265191  : -0.22%  : 0.86%  : -0.0043% : public: unsigned __int64 __cdecl LinearScan::RegisterSelection::select<0>(class Interval *, class RefPosition *)                                                                                                                                                                                                                                                                                                                                                                                                                 
-2266320  : -0.23%  : 0.86%  : -0.0043% : public: void __cdecl LinearScan::allocateRegisters(void)                                                                                                                                                                                                                                                                                                                                                                                                                                                                         
-3154163  : -8.91%  : 1.19%  : -0.0060% : public: bool __cdecl Compiler::fgComputeLifeUntrackedLocal(unsigned __int64 *&, unsigned __int64 *const &, class LclVarDsc &, struct GenTreeLclVarCommon *)                                                                                                                                                                                                                                                                                                                                                                      
-5532824  : -3.21%  : 2.09%  : -0.0106% : public: void __cdecl Compiler::fgComputeLife(unsigned __int64 *&, struct GenTree *, struct GenTree *, unsigned __int64 *const &, bool *)                                                                                                                                                                                                                                                                                                                                                                                         
-7471934  : -1.48%  : 2.82%  : -0.0143% : private: void __cdecl LinearScan::processBlockStartLocations(struct BasicBlock *)                                                                                                                                                                                                                                                                                                                                                                                                                                                
-16091413 : -5.82%  : 6.08%  : -0.0307% : public: void __cdecl Compiler::fgPerNodeLocalVarLiveness(struct GenTree *)                                                                                                                                                                                                                                                                                                                                                                                                                                                       

The cost seems to come mainly from having more tracked locals participate in various optimizations.

@jakobbotsch
Member Author

/azp run Fuzzlyn

@jakobbotsch
Member Author

/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress, runtime-jit-experimental, Fuzzlyn

Azure Pipelines successfully started running 4 pipeline(s).

@jakobbotsch
Member Author

Looking at some regressions...

benchmarks.run_pgo Microsoft.CodeAnalysis.CSharp.Symbols.ParameterSymbol:Microsoft.Cci.IParameterTypeInformation.get_CustomModifiers()
@@ -8,20 +8,20 @@
 ; Final local variable assignments
 ;
 ;  V00 this         [V00,T00] (  6,  3.50)     ref  ->  rcx         this class-hnd single-def <Microsoft.CodeAnalysis.CSharp.Symbols.ParameterSymbol>
-;  V01 loc0         [V01    ] (  3,  1.50)  struct (24) [rsp+0x20]  do-not-enreg[XS] must-init addr-exposed ld-addr-op <Microsoft.CodeAnalysis.CSharp.Symbols.TypeWithAnnotations>
+;  V01 loc0         [V01    ] (  6,  2   )  struct (24) [rsp+0x20]  do-not-enreg[HS] must-init hidden-struct-arg ld-addr-op <Microsoft.CodeAnalysis.CSharp.Symbols.TypeWithAnnotations>
 ;  V02 OutArgs      [V02    ] (  1,  1   )  struct (32) [rsp+0x00]  do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
 ;* V03 tmp1         [V03    ] (  0,  0   )     ref  ->  zero-ref    class-hnd single-def "spilling ret_expr" <Microsoft.CodeAnalysis.CSharp.Symbols.ParameterSymbol>
 ;* V04 tmp2         [V04    ] (  0,  0   )  struct ( 8) zero-ref    "spilled call-like call argument" <System.Collections.Immutable.ImmutableArray`1[Microsoft.CodeAnalysis.CustomModifier]>
 ;* V05 tmp3         [V05    ] (  0,  0   )     ref  ->  zero-ref    class-hnd exact "guarded devirt this exact temp" <Microsoft.CodeAnalysis.CSharp.Symbols.Metadata.PE.PEParameterSymbol>
 ;* V06 tmp4         [V06    ] (  0,  0   )  struct ( 8) zero-ref    single-def "guarded devirt return temp" <System.Collections.Immutable.ImmutableArray`1[Microsoft.CodeAnalysis.CustomModifier]>
-;  V07 tmp5         [V07,T01] (  5,  5   )     ref  ->  rcx         single-def "guarded devirt arg temp"
+;* V07 tmp5         [V07    ] (  0,  0   )     ref  ->  zero-ref    single-def "guarded devirt arg temp"
 ;* V08 tmp6         [V08    ] (  0,  0   )     ref  ->  zero-ref    class-hnd exact "guarded devirt this exact temp" <Microsoft.CodeAnalysis.CSharp.Symbols.TypeWithAnnotations+NonLazyType>
 ;* V09 tmp7         [V09    ] (  0,  0   )  struct ( 8) zero-ref    "Inline return value spill temp" <System.Collections.Immutable.ImmutableArray`1[Microsoft.Cci.ICustomModifier]>
 ;* V10 tmp8         [V10    ] (  0,  0   )     ref  ->  zero-ref    class-hnd "Inline stloc first use temp" <<unknown class>>
 ;* V11 tmp9         [V11    ] (  0,  0   )  struct ( 8) zero-ref    ld-addr-op "NewObj constructor temp" <System.Collections.Immutable.ImmutableArray`1[Microsoft.Cci.ICustomModifier]>
-;  V12 tmp10        [V12    ] (  2,  0.50)     ref  ->  [rsp+0x20]  do-not-enreg[X] addr-exposed "field V01.DefaultType (fldOffset=0x0)" P-DEP
-;  V13 tmp11        [V13    ] (  3,  1.50)     ref  ->  [rsp+0x28]  do-not-enreg[X] addr-exposed "field V01._extensions (fldOffset=0x8)" P-DEP
-;  V14 tmp12        [V14    ] (  2,  0.50)   ubyte  ->  [rsp+0x30]  do-not-enreg[X] addr-exposed "field V01.DefaultNullableAnnotation (fldOffset=0x10)" P-DEP
+;  V12 tmp10        [V12,T03] (  2,  0.50)     ref  ->  [rsp+0x20]  do-not-enreg[H] hidden-struct-arg "field V01.DefaultType (fldOffset=0x0)" P-DEP
+;  V13 tmp11        [V13,T01] (  6,  2   )     ref  ->  [rsp+0x28]  do-not-enreg[H] hidden-struct-arg "field V01._extensions (fldOffset=0x8)" P-DEP
+;  V14 tmp12        [V14,T04] (  2,  0.50)   ubyte  ->  [rsp+0x30]  do-not-enreg[H] hidden-struct-arg "field V01.DefaultNullableAnnotation (fldOffset=0x10)" P-DEP
 ;* V15 tmp13        [V15    ] (  0,  0   )     ref  ->  zero-ref    single-def "field V04.array (fldOffset=0x0)" P-INDEP
 ;  V16 tmp14        [V16,T02] (  3,  1.50)     ref  ->  rax         single-def "field V06.array (fldOffset=0x0)" P-INDEP
 ;* V17 tmp15        [V17    ] (  0,  0   )     ref  ->  zero-ref    "field V09.array (fldOffset=0x0)" P-INDEP
@@ -50,18 +50,19 @@ G_M54628_IG03:        ; bbWeight=0.50, gcrefRegs=0002 {rcx}, byrefRegs=0000 {},
 						;; size=20 bbWeight=0.50 PerfScore 4.00
 G_M54628_IG04:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
        ; gcrRegs -[rcx]
-       mov      rcx, gword ptr [rsp+0x28]
-       ; gcrRegs +[rcx]
-       mov      rax, 0xD1FFAB1E      ; Microsoft.CodeAnalysis.CSharp.Symbols.TypeWithAnnotations+NonLazyType
-       cmp      qword ptr [rcx], rax
+       mov      rax, gword ptr [rsp+0x28]
+       ; gcrRegs +[rax]
+       mov      rdx, 0xD1FFAB1E      ; Microsoft.CodeAnalysis.CSharp.Symbols.TypeWithAnnotations+NonLazyType
+       cmp      qword ptr [rax], rdx
        jne      SHORT G_M54628_IG08
 						;; size=20 bbWeight=1 PerfScore 5.25
-G_M54628_IG05:        ; bbWeight=0.50, gcrefRegs=0002 {rcx}, byrefRegs=0000 {}, byref
-       mov      rax, gword ptr [rcx+0x08]
+G_M54628_IG05:        ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
+       ; gcrRegs -[rax]
+       mov      rax, gword ptr [rsp+0x28]
        ; gcrRegs +[rax]
-						;; size=4 bbWeight=0.50 PerfScore 1.00
+       mov      rax, gword ptr [rax+0x08]
+						;; size=9 bbWeight=0.50 PerfScore 1.50
 G_M54628_IG06:        ; bbWeight=1, gcrefRegs=0001 {rax}, byrefRegs=0000 {}, byref, epilog, nogc
-       ; gcrRegs -[rcx]
        add      rsp, 56
        ret      
 						;; size=5 bbWeight=1 PerfScore 1.25
@@ -75,17 +76,21 @@ G_M54628_IG07:        ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0002 {
        ; gcr arg pop 0
        jmp      SHORT G_M54628_IG04
 						;; size=20 bbWeight=0 PerfScore 0.00
-G_M54628_IG08:        ; bbWeight=0, gcrefRegs=0002 {rcx}, byrefRegs=0000 {}, byref, isz
+G_M54628_IG08:        ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
+       mov      rcx, gword ptr [rsp+0x28]
        ; gcrRegs +[rcx]
-       mov      rax, qword ptr [rcx]
+       mov      rax, gword ptr [rsp+0x28]
+       ; gcrRegs +[rax]
+       mov      rax, qword ptr [rax]
+       ; gcrRegs -[rax]
        mov      rax, qword ptr [rax+0x48]
        call     [rax+0x38]<unknown method>
        ; gcrRegs -[rcx] +[rax]
        ; gcr arg pop 0
        jmp      SHORT G_M54628_IG06
-						;; size=12 bbWeight=0 PerfScore 0.00
+						;; size=22 bbWeight=0 PerfScore 0.00
 
-; Total bytes of code 117, prolog size 21, PerfScore 19.58, instruction count 28, allocated bytes for code 117 (MethodHash=6a052a9b) for method Microsoft.CodeAnalysis.CSharp.Symbols.ParameterSymbol:Microsoft.Cci.IParameterTypeInformation.get_CustomModifiers():System.Collections.Immutable.ImmutableArray`1[Microsoft.Cci.ICustomModifier]:this (Tier1)
+; Total bytes of code 132, prolog size 21, PerfScore 20.08, instruction count 31, allocated bytes for code 132 (MethodHash=6a052a9b) for method Microsoft.CodeAnalysis.CSharp.Symbols.ParameterSymbol:Microsoft.Cci.IParameterTypeInformation.get_CustomModifiers():System.Collections.Immutable.ImmutableArray`1[Microsoft.Cci.ICustomModifier]:this (Tier1)

This looks like a case where we don't realize that the local (V01) will be a retbuf, so it goes through old promotion and ends up dependently promoted. If I hack in the DNER marking, we get the following diff:

@@ -8,24 +8,22 @@
 ; Final local variable assignments
 ;
 ;  V00 this         [V00,T00] (  6,  3.50)     ref  ->  rcx         this class-hnd single-def <Microsoft.CodeAnalysis.CSharp.Symbols.ParameterSymbol>
-;  V01 loc0         [V01    ] (  3,  1.50)  struct (24) [rsp+0x20]  do-not-enreg[XS] must-init addr-exposed ld-addr-op <Microsoft.CodeAnalysis.CSharp.Symbols.TypeWithAnnotations>
+;  V01 loc0         [V01,T03] (  2,  0   )  struct (24) [rsp+0x20]  do-not-enreg[HS] must-init hidden-struct-arg ld-addr-op <Microsoft.CodeAnalysis.CSharp.Symbols.TypeWithAnnotations>
 ;  V02 OutArgs      [V02    ] (  1,  1   )  struct (32) [rsp+0x00]  do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
 ;* V03 tmp1         [V03    ] (  0,  0   )     ref  ->  zero-ref    class-hnd single-def "spilling ret_expr" <Microsoft.CodeAnalysis.CSharp.Symbols.ParameterSymbol>
 ;* V04 tmp2         [V04    ] (  0,  0   )  struct ( 8) zero-ref    "spilled call-like call argument" <System.Collections.Immutable.ImmutableArray`1[Microsoft.CodeAnalysis.CustomModifier]>
 ;* V05 tmp3         [V05    ] (  0,  0   )     ref  ->  zero-ref    class-hnd exact "guarded devirt this exact temp" <Microsoft.CodeAnalysis.CSharp.Symbols.Metadata.PE.PEParameterSymbol>
 ;* V06 tmp4         [V06    ] (  0,  0   )  struct ( 8) zero-ref    single-def "guarded devirt return temp" <System.Collections.Immutable.ImmutableArray`1[Microsoft.CodeAnalysis.CustomModifier]>
-;  V07 tmp5         [V07,T01] (  5,  5   )     ref  ->  rcx         single-def "guarded devirt arg temp"
+;* V07 tmp5         [V07    ] (  0,  0   )     ref  ->  zero-ref    single-def "guarded devirt arg temp"
 ;* V08 tmp6         [V08    ] (  0,  0   )     ref  ->  zero-ref    class-hnd exact "guarded devirt this exact temp" <Microsoft.CodeAnalysis.CSharp.Symbols.TypeWithAnnotations+NonLazyType>
 ;* V09 tmp7         [V09    ] (  0,  0   )  struct ( 8) zero-ref    "Inline return value spill temp" <System.Collections.Immutable.ImmutableArray`1[Microsoft.Cci.ICustomModifier]>
 ;* V10 tmp8         [V10    ] (  0,  0   )     ref  ->  zero-ref    class-hnd "Inline stloc first use temp" <<unknown class>>
 ;* V11 tmp9         [V11    ] (  0,  0   )  struct ( 8) zero-ref    ld-addr-op "NewObj constructor temp" <System.Collections.Immutable.ImmutableArray`1[Microsoft.Cci.ICustomModifier]>
-;  V12 tmp10        [V12    ] (  2,  0.50)     ref  ->  [rsp+0x20]  do-not-enreg[X] addr-exposed "field V01.DefaultType (fldOffset=0x0)" P-DEP
-;  V13 tmp11        [V13    ] (  3,  1.50)     ref  ->  [rsp+0x28]  do-not-enreg[X] addr-exposed "field V01._extensions (fldOffset=0x8)" P-DEP
-;  V14 tmp12        [V14    ] (  2,  0.50)   ubyte  ->  [rsp+0x30]  do-not-enreg[X] addr-exposed "field V01.DefaultNullableAnnotation (fldOffset=0x10)" P-DEP
-;* V15 tmp13        [V15    ] (  0,  0   )     ref  ->  zero-ref    single-def "field V04.array (fldOffset=0x0)" P-INDEP
-;  V16 tmp14        [V16,T02] (  3,  1.50)     ref  ->  rax         single-def "field V06.array (fldOffset=0x0)" P-INDEP
-;* V17 tmp15        [V17    ] (  0,  0   )     ref  ->  zero-ref    "field V09.array (fldOffset=0x0)" P-INDEP
-;* V18 tmp16        [V18    ] (  0,  0   )     ref  ->  zero-ref    single-def "field V11.array (fldOffset=0x0)" P-INDEP
+;* V12 tmp10        [V12    ] (  0,  0   )     ref  ->  zero-ref    single-def "field V04.array (fldOffset=0x0)" P-INDEP
+;  V13 tmp11        [V13,T02] (  3,  1.50)     ref  ->  rax         single-def "field V06.array (fldOffset=0x0)" P-INDEP
+;* V14 tmp12        [V14    ] (  0,  0   )     ref  ->  zero-ref    "field V09.array (fldOffset=0x0)" P-INDEP
+;* V15 tmp13        [V15    ] (  0,  0   )     ref  ->  zero-ref    single-def "field V11.array (fldOffset=0x0)" P-INDEP
+;  V16 tmp14        [V16,T01] (  6,  2   )     ref  ->  rcx         "V01.[008..016)"
 ;
 ; Lcl frame size = 56
 
@@ -42,20 +40,14 @@ G_M54628_IG02:        ; bbWeight=1, gcrefRegs=0002 {rcx}, byrefRegs=0000 {}, byr
        cmp      qword ptr [rcx], rax
        jne      SHORT G_M54628_IG07
 						;; size=15 bbWeight=1 PerfScore 4.25
-G_M54628_IG03:        ; bbWeight=0.50, gcrefRegs=0002 {rcx}, byrefRegs=0000 {}, byref, nogc
-       vmovdqu  xmm0, xmmword ptr [rcx+0x40]
-       vmovdqu  xmmword ptr [rsp+0x20], xmm0
-       mov      rax, qword ptr [rcx+0x50]
-       mov      qword ptr [rsp+0x30], rax
-						;; size=20 bbWeight=0.50 PerfScore 4.00
-G_M54628_IG04:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
-       ; gcrRegs -[rcx]
-       mov      rcx, gword ptr [rsp+0x28]
-       ; gcrRegs +[rcx]
+G_M54628_IG03:        ; bbWeight=0.50, gcrefRegs=0002 {rcx}, byrefRegs=0000 {}, byref
+       mov      rcx, gword ptr [rcx+0x48]
+						;; size=4 bbWeight=0.50 PerfScore 1.00
+G_M54628_IG04:        ; bbWeight=1, gcrefRegs=0002 {rcx}, byrefRegs=0000 {}, byref, isz
        mov      rax, 0xD1FFAB1E      ; Microsoft.CodeAnalysis.CSharp.Symbols.TypeWithAnnotations+NonLazyType
        cmp      qword ptr [rcx], rax
        jne      SHORT G_M54628_IG08
-						;; size=20 bbWeight=1 PerfScore 5.25
+						;; size=15 bbWeight=1 PerfScore 4.25
 G_M54628_IG05:        ; bbWeight=0.50, gcrefRegs=0002 {rcx}, byrefRegs=0000 {}, byref
        mov      rax, gword ptr [rcx+0x08]
        ; gcrRegs +[rax]
@@ -73,10 +65,11 @@ G_M54628_IG07:        ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0002 {
        call     [rax+0x38]<unknown method>
        ; gcrRegs -[rcx]
        ; gcr arg pop 0
-       jmp      SHORT G_M54628_IG04
-						;; size=20 bbWeight=0 PerfScore 0.00
-G_M54628_IG08:        ; bbWeight=0, gcrefRegs=0002 {rcx}, byrefRegs=0000 {}, byref, isz
+       mov      rcx, gword ptr [rsp+0x28]
        ; gcrRegs +[rcx]
+       jmp      SHORT G_M54628_IG04
+						;; size=25 bbWeight=0 PerfScore 0.00
+G_M54628_IG08:        ; bbWeight=0, gcrefRegs=0002 {rcx}, byrefRegs=0000 {}, byref, isz
        mov      rax, qword ptr [rcx]
        mov      rax, qword ptr [rax+0x48]
        call     [rax+0x38]<unknown method>
@@ -85,17 +78,11 @@ G_M54628_IG08:        ; bbWeight=0, gcrefRegs=0002 {rcx}, byrefRegs=0000 {}, byr
        jmp      SHORT G_M54628_IG06
 						;; size=12 bbWeight=0 PerfScore 0.00
 
-; Total bytes of code 117, prolog size 21, PerfScore 19.58, instruction count 28, allocated bytes for code 117 (MethodHash=6a052a9b) for method Microsoft.CodeAnalysis.CSharp.Symbols.ParameterSymbol:Microsoft.Cci.IParameterTypeInformation.get_CustomModifiers():System.Collections.Immutable.ImmutableArray`1[Microsoft.Cci.ICustomModifier]:this (Tier1)
+; Total bytes of code 101, prolog size 21, PerfScore 15.58, instruction count 25, allocated bytes for code 101 (MethodHash=6a052a9b) for method Microsoft.CodeAnalysis.CSharp.Symbols.ParameterSymbol:Microsoft.Cci.IParameterTypeInformation.get_CustomModifiers():System.Collections.Immutable.ImmutableArray`1[Microsoft.Cci.ICustomModifier]:this (Tier1)

Opened #104571 about this.
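
For context on why the definition point of a retbuf local matters for liveness, here is a toy sketch. It is not the JIT's code: `Kind`, `Node`, `liveIn` and `checkToyExample` are made-up names, the model is a single straight-line block, and it assumes the call fully defines the retbuf local.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

// Toy node kinds; the names only loosely echo the JIT's and are hypothetical.
enum class Kind { LclAddr, Use, CallWithRetbuf };

struct Node
{
    Kind kind;
    int  lcl; // local number the node refers to
};

// Backward liveness over one straight-line block (nodes are in execution order).
// Returns the set of locals live on entry to the block.
//   defAtCall == true : the retbuf local is defined at the call; the LclAddr node is ignored.
//   defAtCall == false: the definition is attributed to the LclAddr node instead.
std::set<int> liveIn(const std::vector<Node>& block, std::set<int> liveOut, bool defAtCall)
{
    std::set<int> live = std::move(liveOut);
    for (auto it = block.rbegin(); it != block.rend(); ++it)
    {
        switch (it->kind)
        {
            case Kind::Use:
                live.insert(it->lcl); // a use makes the local live before this node
                break;
            case Kind::CallWithRetbuf:
                if (defAtCall)
                    live.erase(it->lcl); // assumed full definition kills liveness
                break;
            case Kind::LclAddr:
                if (!defAtCall)
                    live.erase(it->lcl); // definition modeled at the address node
                break;
        }
    }
    return live;
}

// LCL_ADDR V01; use of V01; call writing V01 through the retbuf; use of V01.
void checkToyExample()
{
    std::vector<Node> block = {
        {Kind::LclAddr, 1}, {Kind::Use, 1}, {Kind::CallWithRetbuf, 1}, {Kind::Use, 1}};

    // Definition at the call: the first use reads the incoming value, so V01 is live-in.
    assert(liveIn(block, std::set<int>{}, true).count(1) == 1);

    // Definition at LCL_ADDR: V01 looks dead on entry, although the first use
    // still reads the value from before the call.
    assert(liveIn(block, std::set<int>{}, false).count(1) == 0);
}
```

In this model, a use sitting between the address node and the call is only treated correctly when the definition is handled at the call, which is the situation the `lvIsTemp` restriction was guarding against.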

@jakobbotsch
Member Author

benchmarks.run_pgo System.Buffers.Text.FormattingHelpers:TryFormat[double]
@@ -9,11 +9,11 @@
 ; 4 inlinees with PGO data; 5 single block inlinees; 0 inlinees without PGO data
 ; Final local variable assignments
 ;
-;  V00 arg0         [V00,T18] (  3,  3   )  double  ->  [rsp+0x90]  ld-addr-op single-def
+;  V00 arg0         [V00,T16] (  3,  3   )  double  ->  [rsp+0x90]  ld-addr-op single-def
 ;  V01 arg1         [V01,T01] (  3,  6   )   byref  ->  rbx         single-def
 ;  V02 arg2         [V02,T02] (  3,  3   )   byref  ->   r8         single-def
 ;  V03 arg3         [V03    ] (  5,  4   )  struct ( 8) [rsp+0xA8]  do-not-enreg[XSF] addr-exposed ld-addr-op single-def <System.Buffers.StandardFormat>
-;  V04 loc0         [V04    ] (  4,  3   )  struct (16) [rsp+0x50]  do-not-enreg[XS] must-init addr-exposed ld-addr-op <System.Span`1[ushort]>
+;  V04 loc0         [V04,T17] (  3,  0   )  struct (16) [rsp+0x50]  do-not-enreg[HS] must-init hidden-struct-arg ld-addr-op <System.Span`1[ushort]>
 ;* V05 loc1         [V05    ] (  0,  0   )  struct (16) zero-ref    <System.Span`1[ushort]>
 ;* V06 loc2         [V06    ] (  0,  0   )  double  ->  zero-ref    ld-addr-op
 ;  V07 OutArgs      [V07    ] (  1,  1   )  struct (40) [rsp+0x00]  do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
@@ -23,35 +23,37 @@
 ;* V11 tmp4         [V11    ] (  0,  0   )  struct (16) zero-ref    ld-addr-op "NewObj constructor temp" <System.Span`1[ushort]>
 ;* V12 tmp5         [V12    ] (  0,  0   )  struct (16) zero-ref    ld-addr-op "Inlining Arg" <System.Span`1[ushort]>
 ;* V13 tmp6         [V13    ] (  0,  0   )  struct (16) zero-ref    ld-addr-op "NewObj constructor temp" <System.ReadOnlySpan`1[ushort]>
-;  V14 tmp7         [V14,T04] (  2,  4   )     int  ->  rbp         "Inlining Arg"
-;  V15 tmp8         [V15,T03] (  2,  4   )   byref  ->  rdi         single-def "Inlining Arg"
+;* V14 tmp7         [V14    ] (  0,  0   )     int  ->  zero-ref    "Inlining Arg"
+;* V15 tmp8         [V15    ] (  0,  0   )   byref  ->  zero-ref    single-def "Inlining Arg"
 ;* V16 tmp9         [V16    ] (  0,  0   )   ubyte  ->  zero-ref    "Inlining Arg"
 ;* V17 tmp10        [V17    ] (  0,  0   )   ubyte  ->  zero-ref    "Inlining Arg"
 ;  V18 tmp11        [V18,T00] (  9,  9.75)     ref  ->  rax         class-hnd single-def "Inlining Arg" <System.Globalization.CultureInfo>
 ;* V19 tmp12        [V19    ] (  0,  0   )  double  ->  zero-ref    "impAppendStmt"
-;  V20 tmp13        [V20,T08] (  3,  2   )     ref  ->   r8         class-hnd "Inline return value spill temp" <System.Globalization.NumberFormatInfo>
-;  V21 tmp14        [V21,T09] (  3,  2   )     ref  ->   r8         class-hnd "Inline return value spill temp" <System.Globalization.NumberFormatInfo>
+;  V20 tmp13        [V20,T07] (  3,  2   )     ref  ->   r8         class-hnd "Inline return value spill temp" <System.Globalization.NumberFormatInfo>
+;  V21 tmp14        [V21,T08] (  3,  2   )     ref  ->   r8         class-hnd "Inline return value spill temp" <System.Globalization.NumberFormatInfo>
 ;* V22 tmp15        [V22    ] (  0,  0   )     ref  ->  zero-ref    class-hnd single-def ptr "Inline stloc first use temp" <System.Globalization.CultureInfo>
-;  V23 tmp16        [V23,T06] (  2,  3   )     ref  ->   r8         class-hnd "spilling qmarkNull" <System.Globalization.NumberFormatInfo>
-;  V24 tmp17        [V24,T05] (  6,  3.19)     ref  ->   r8        
-;  V25 tmp18        [V25,T07] (  4,  2.06)     ref  ->  rax         single-def "ISINST eval op1"
-;  V26 tmp19        [V26,T15] (  3,  1.31)     ref  ->   r8         class-hnd "spilling qmarkNull" <System.Globalization.NumberFormatInfo>
-;  V27 tmp20        [V27,T16] (  2,  1   )     ref  ->   r8         class-hnd exact single-def "dup spill" <System.Globalization.NumberFormatInfo>
-;  V28 tmp21        [V28,T17] (  4,  0.88)     ref  ->   r8        
+;  V23 tmp16        [V23,T05] (  2,  3   )     ref  ->   r8         class-hnd "spilling qmarkNull" <System.Globalization.NumberFormatInfo>
+;  V24 tmp17        [V24,T03] (  6,  3.19)     ref  ->   r8        
+;  V25 tmp18        [V25,T06] (  4,  2.06)     ref  ->  rax         single-def "ISINST eval op1"
+;  V26 tmp19        [V26,T13] (  3,  1.31)     ref  ->   r8         class-hnd "spilling qmarkNull" <System.Globalization.NumberFormatInfo>
+;  V27 tmp20        [V27,T14] (  2,  1   )     ref  ->   r8         class-hnd exact single-def "dup spill" <System.Globalization.NumberFormatInfo>
+;  V28 tmp21        [V28,T15] (  4,  0.88)     ref  ->   r8        
 ;* V29 tmp22        [V29    ] (  0,  0   )   byref  ->  zero-ref    "field V05._reference (fldOffset=0x0)" P-INDEP
 ;* V30 tmp23        [V30    ] (  0,  0   )     int  ->  zero-ref    "field V05._length (fldOffset=0x8)" P-INDEP
 ;* V31 tmp24        [V31    ] (  0,  0   )   byref  ->  zero-ref    single-def "field V09._reference (fldOffset=0x0)" P-INDEP
 ;* V32 tmp25        [V32    ] (  0,  0   )     int  ->  zero-ref    "field V09._length (fldOffset=0x8)" P-INDEP
-;  V33 tmp26        [V33,T19] (  2,  0   )   byref  ->   r8         single-def "field V11._reference (fldOffset=0x0)" P-INDEP
-;* V34 tmp27        [V34,T20] (  0,  0   )     int  ->  zero-ref    ptr "field V11._length (fldOffset=0x8)" P-INDEP
-;  V35 tmp28        [V35,T10] (  2,  2   )   byref  ->  rdi         single-def "field V12._reference (fldOffset=0x0)" P-INDEP
-;  V36 tmp29        [V36,T13] (  2,  2   )     int  ->  rbp         "field V12._length (fldOffset=0x8)" P-INDEP
-;  V37 tmp30        [V37,T11] (  2,  2   )   byref  ->  rdi         single-def "field V13._reference (fldOffset=0x0)" P-INDEP
-;  V38 tmp31        [V38,T14] (  2,  2   )     int  ->  rbp         "field V13._length (fldOffset=0x8)" P-INDEP
-;  V39 tmp32        [V39    ] (  3,  0   )  struct (16) [rsp+0x40]  do-not-enreg[XSF] must-init addr-exposed ptr "by-value struct argument" <System.Span`1[ushort]>
-;  V40 tmp33        [V40    ] (  3,  6   )  struct (16) [rsp+0x30]  do-not-enreg[XSF] must-init addr-exposed "by-value struct argument" <System.ReadOnlySpan`1[ushort]>
-;  V41 GsCookie     [V41    ] (  1,  1   )    long  ->  [rsp+0x60]  do-not-enreg[X] addr-exposed "GSSecurityCookie"
-;  V42 tmp35        [V42,T12] (  2,  2   )   byref  ->  rsi         single-def "shadowVar"
+;  V33 tmp26        [V33,T18] (  2,  0   )   byref  ->   r8         single-def "field V11._reference (fldOffset=0x0)" P-INDEP
+;* V34 tmp27        [V34,T19] (  0,  0   )     int  ->  zero-ref    ptr "field V11._length (fldOffset=0x8)" P-INDEP
+;* V35 tmp28        [V35    ] (  0,  0   )   byref  ->  zero-ref    single-def "field V12._reference (fldOffset=0x0)" P-INDEP
+;* V36 tmp29        [V36    ] (  0,  0   )     int  ->  zero-ref    "field V12._length (fldOffset=0x8)" P-INDEP
+;  V37 tmp30        [V37,T10] (  2,  2   )   byref  ->  rdi         single-def "field V13._reference (fldOffset=0x0)" P-INDEP
+;  V38 tmp31        [V38,T12] (  2,  2   )     int  ->  rbp         "field V13._length (fldOffset=0x8)" P-INDEP
+;  V39 tmp32        [V39,T09] (  3,  2   )   byref  ->  rdi         "V04.[000..008)"
+;  V40 tmp33        [V40,T04] (  4,  3   )     int  ->  rbp         "V04.[008..012)"
+;  V41 tmp34        [V41    ] (  3,  0   )  struct (16) [rsp+0x40]  do-not-enreg[XSF] must-init addr-exposed ptr "by-value struct argument" <System.Span`1[ushort]>
+;  V42 tmp35        [V42    ] (  3,  6   )  struct (16) [rsp+0x30]  do-not-enreg[XSF] must-init addr-exposed "by-value struct argument" <System.ReadOnlySpan`1[ushort]>
+;  V43 GsCookie     [V43    ] (  1,  1   )    long  ->  [rsp+0x60]  do-not-enreg[X] addr-exposed "GSSecurityCookie"
+;  V44 tmp37        [V44,T11] (  2,  2   )   byref  ->  rsi         single-def "shadowVar"
 ;
 ; Lcl frame size = 104
 
@@ -75,35 +77,37 @@ G_M7166_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0108 {rbx r8}, b
        ; byrRegs +[r8]
        mov      rsi, r8
        ; byrRegs +[rsi]
-       vxorps   xmm1, xmm1, xmm1
-       vmovdqu  xmmword ptr [rsp+0x50], xmm1
+       xor      rdi, rdi
+       ; byrRegs +[rdi]
+       xor      ebp, ebp
        movzx    rax, byte  ptr [rsp+0xA8]
        movzx    rcx, byte  ptr [rsp+0xA9]
        or       eax, ecx
-       jne      G_M7166_IG18
-						;; size=37 bbWeight=1 PerfScore 4.83
-G_M7166_IG03:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0048 {rbx rsi}, byref
+       jne      G_M7166_IG19
+						;; size=31 bbWeight=1 PerfScore 4.00
+G_M7166_IG03:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=00C8 {rbx rsi rdi}, byref
        ; byrRegs -[r8]
-       mov      rdi, bword ptr [rsp+0x50]
-       ; byrRegs +[rdi]
-       mov      ebp, dword ptr [rsp+0x58]
+       test     ebp, ebp
+       jl       G_M7166_IG20
+						;; size=8 bbWeight=1 PerfScore 1.25
+G_M7166_IG04:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=00C8 {rbx rsi rdi}, byref
        call     [System.Globalization.CultureInfo:get_InvariantCulture():System.Globalization.CultureInfo]
        ; gcrRegs +[rax]
        ; gcr arg pop 0
        test     rax, rax
-       je       G_M7166_IG19
-						;; size=24 bbWeight=1 PerfScore 6.25
-G_M7166_IG04:        ; bbWeight=0.50, gcrefRegs=0001 {rax}, byrefRegs=00C8 {rbx rsi rdi}, byref, isz
+       je       G_M7166_IG21
+						;; size=15 bbWeight=1 PerfScore 4.25
+G_M7166_IG05:        ; bbWeight=0.50, gcrefRegs=0001 {rax}, byrefRegs=00C8 {rbx rsi rdi}, byref, isz
        cmp      byte  ptr [rax+0x61], 0
-       jne      SHORT G_M7166_IG08
+       jne      SHORT G_M7166_IG09
 						;; size=6 bbWeight=0.50 PerfScore 2.00
-G_M7166_IG05:        ; bbWeight=0.25, gcrefRegs=0001 {rax}, byrefRegs=00C8 {rbx rsi rdi}, byref, isz
+G_M7166_IG06:        ; bbWeight=0.25, gcrefRegs=0001 {rax}, byrefRegs=00C8 {rbx rsi rdi}, byref, isz
        mov      r8, gword ptr [rax+0x18]
        ; gcrRegs +[r8]
        test     r8, r8
-       jne      SHORT G_M7166_IG07
+       jne      SHORT G_M7166_IG08
 						;; size=9 bbWeight=0.25 PerfScore 0.81
-G_M7166_IG06:        ; bbWeight=0.12, gcrefRegs=0001 {rax}, byrefRegs=00C8 {rbx rsi rdi}, byref
+G_M7166_IG07:        ; bbWeight=0.12, gcrefRegs=0001 {rax}, byrefRegs=00C8 {rbx rsi rdi}, byref
        ; gcrRegs -[r8]
        mov      rcx, rax
        ; gcrRegs +[rcx]
@@ -116,11 +120,11 @@ G_M7166_IG06:        ; bbWeight=0.12, gcrefRegs=0001 {rax}, byrefRegs=00C8 {rbx
        mov      r8, rax
        ; gcrRegs +[r8]
 						;; size=16 bbWeight=0.12 PerfScore 0.94
-G_M7166_IG07:        ; bbWeight=0.25, gcrefRegs=0100 {r8}, byrefRegs=00C8 {rbx rsi rdi}, byref, isz
+G_M7166_IG08:        ; bbWeight=0.25, gcrefRegs=0100 {r8}, byrefRegs=00C8 {rbx rsi rdi}, byref, isz
        ; gcrRegs -[rax]
-       jmp      SHORT G_M7166_IG13
+       jmp      SHORT G_M7166_IG14
 						;; size=2 bbWeight=0.25 PerfScore 0.50
-G_M7166_IG08:        ; bbWeight=0.75, gcrefRegs=0001 {rax}, byrefRegs=00C8 {rbx rsi rdi}, byref, isz
+G_M7166_IG09:        ; bbWeight=0.75, gcrefRegs=0001 {rax}, byrefRegs=00C8 {rbx rsi rdi}, byref, isz
        ; gcrRegs -[r8] +[rax]
        mov      rcx, 0xD1FFAB1E      ; System.Globalization.NumberFormatInfo
        xor      r11, r11
@@ -130,9 +134,9 @@ G_M7166_IG08:        ; bbWeight=0.75, gcrefRegs=0001 {rax}, byrefRegs=00C8 {rbx
        ; gcrRegs +[r8]
        cmove    r8, rax
        test     r8, r8
-       jne      SHORT G_M7166_IG13
+       jne      SHORT G_M7166_IG14
 						;; size=28 bbWeight=0.75 PerfScore 3.94
-G_M7166_IG09:        ; bbWeight=0.38, gcrefRegs=0001 {rax}, byrefRegs=00C8 {rbx rsi rdi}, byref, isz
+G_M7166_IG10:        ; bbWeight=0.38, gcrefRegs=0001 {rax}, byrefRegs=00C8 {rbx rsi rdi}, byref, isz
        ; gcrRegs -[r8 r11]
        mov      rcx, rax
        ; gcrRegs +[rcx]
@@ -143,18 +147,18 @@ G_M7166_IG09:        ; bbWeight=0.38, gcrefRegs=0001 {rax}, byrefRegs=00C8 {rbx
        ; gcrRegs -[rcx rdx]
        ; gcr arg pop 0
        test     rax, rax
-       jne      SHORT G_M7166_IG16
+       jne      SHORT G_M7166_IG17
 						;; size=31 bbWeight=0.38 PerfScore 1.88
-G_M7166_IG10:        ; bbWeight=0.19, gcrefRegs=0000 {}, byrefRegs=00C8 {rbx rsi rdi}, byref
+G_M7166_IG11:        ; bbWeight=0.19, gcrefRegs=0000 {}, byrefRegs=00C8 {rbx rsi rdi}, byref
        ; gcrRegs -[rax]
        xor      r8, r8
        ; gcrRegs +[r8]
 						;; size=3 bbWeight=0.19 PerfScore 0.05
-G_M7166_IG11:        ; bbWeight=0.38, gcrefRegs=0100 {r8}, byrefRegs=00C8 {rbx rsi rdi}, byref, isz
+G_M7166_IG12:        ; bbWeight=0.38, gcrefRegs=0100 {r8}, byrefRegs=00C8 {rbx rsi rdi}, byref, isz
        test     r8, r8
-       jne      SHORT G_M7166_IG13
+       jne      SHORT G_M7166_IG14
 						;; size=5 bbWeight=0.38 PerfScore 0.47
-G_M7166_IG12:        ; bbWeight=0.19, gcrefRegs=0000 {}, byrefRegs=00C8 {rbx rsi rdi}, byref
+G_M7166_IG13:        ; bbWeight=0.19, gcrefRegs=0000 {}, byrefRegs=00C8 {rbx rsi rdi}, byref
        ; gcrRegs -[r8]
        call     [System.Globalization.NumberFormatInfo:get_CurrentInfo():System.Globalization.NumberFormatInfo]
        ; gcrRegs +[rax]
@@ -162,7 +166,7 @@ G_M7166_IG12:        ; bbWeight=0.19, gcrefRegs=0000 {}, byrefRegs=00C8 {rbx rsi
        mov      r8, rax
        ; gcrRegs +[r8]
 						;; size=9 bbWeight=0.19 PerfScore 0.61
-G_M7166_IG13:        ; bbWeight=1, gcrefRegs=0100 {r8}, byrefRegs=00C8 {rbx rsi rdi}, byref, isz
+G_M7166_IG14:        ; bbWeight=1, gcrefRegs=0100 {r8}, byrefRegs=00C8 {rbx rsi rdi}, byref, isz
        ; gcrRegs -[rax]
        mov      bword ptr [rsp+0x30], rdi
        mov      dword ptr [rsp+0x38], ebp
@@ -178,13 +182,13 @@ G_M7166_IG13:        ; bbWeight=1, gcrefRegs=0100 {r8}, byrefRegs=00C8 {rbx rsi
        ; gcr arg pop 0
        mov      rcx, 0xD1FFAB1E
        cmp      qword ptr [rsp+0x60], rcx
-       je       SHORT G_M7166_IG14
+       je       SHORT G_M7166_IG15
        call     CORINFO_HELP_FAIL_FAST
 						;; size=59 bbWeight=1 PerfScore 14.00
-G_M7166_IG14:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
+G_M7166_IG15:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        nop      
 						;; size=1 bbWeight=1 PerfScore 0.25
-G_M7166_IG15:        ; bbWeight=1, epilog, nogc, extend
+G_M7166_IG16:        ; bbWeight=1, epilog, nogc, extend
        add      rsp, 104
        pop      rbx
        pop      rbp
@@ -192,19 +196,19 @@ G_M7166_IG15:        ; bbWeight=1, epilog, nogc, extend
        pop      rdi
        ret      
 						;; size=9 bbWeight=1 PerfScore 3.25
-G_M7166_IG16:        ; bbWeight=0.19, gcVars=0000000000000000 {}, gcrefRegs=0001 {rax}, byrefRegs=00C8 {rbx rsi rdi}, gcvars, byref, isz
+G_M7166_IG17:        ; bbWeight=0.19, gcVars=0000000000000000 {}, gcrefRegs=0001 {rax}, byrefRegs=00C8 {rbx rsi rdi}, gcvars, byref, isz
        ; gcrRegs +[rax]
        ; byrRegs +[rbx rsi rdi]
        mov      r8, 0xD1FFAB1E      ; System.Globalization.NumberFormatInfo
        cmp      qword ptr [rax], r8
-       jne      SHORT G_M7166_IG10
+       jne      SHORT G_M7166_IG11
 						;; size=15 bbWeight=0.19 PerfScore 0.80
-G_M7166_IG17:        ; bbWeight=0.09, gcrefRegs=0001 {rax}, byrefRegs=00C8 {rbx rsi rdi}, byref, isz
+G_M7166_IG18:        ; bbWeight=0.09, gcrefRegs=0001 {rax}, byrefRegs=00C8 {rbx rsi rdi}, byref, isz
        mov      r8, rax
        ; gcrRegs +[r8]
-       jmp      SHORT G_M7166_IG11
+       jmp      SHORT G_M7166_IG12
 						;; size=5 bbWeight=0.09 PerfScore 0.21
-G_M7166_IG18:        ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0048 {rbx rsi}, byref
+G_M7166_IG19:        ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0048 {rbx rsi}, byref
        ; gcrRegs -[rax r8]
        ; byrRegs -[rdi]
        lea      r8, [rsp+0x28]
@@ -215,19 +219,32 @@ G_M7166_IG18:        ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0048 {rbx rsi},
        lea      rcx, [rsp+0xA8]
        call     [System.Buffers.StandardFormat:Format(System.Span`1[ushort]):System.Span`1[ushort]:this]
        ; gcr arg pop 0
-       jmp      G_M7166_IG03
-						;; size=47 bbWeight=0 PerfScore 0.00
-G_M7166_IG19:        ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=00C8 {rbx rsi rdi}, byref
+       mov      rdi, bword ptr [rsp+0x50]
        ; byrRegs +[rdi]
+       mov      ebp, dword ptr [rsp+0x58]
+       jmp      G_M7166_IG03
+						;; size=56 bbWeight=0 PerfScore 0.00
+G_M7166_IG20:        ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=00C8 {rbx rsi rdi}, byref
+       mov      rcx, 0xD1FFAB1E
+       ; gcrRegs +[rcx]
+       mov      rdx, 0xD1FFAB1E
+       ; gcrRegs +[rdx]
+       mov      rax, 0xD1FFAB1E      ; code for <unknown method>
+       call     [rax]<unknown method>
+       ; gcrRegs -[rcx rdx]
+       ; gcr arg pop 0
+       jmp      G_M7166_IG04
+						;; size=37 bbWeight=0 PerfScore 0.00
+G_M7166_IG21:        ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=00C8 {rbx rsi rdi}, byref
        call     [System.Globalization.NumberFormatInfo:get_CurrentInfo():System.Globalization.NumberFormatInfo]
        ; gcrRegs +[rax]
        ; gcr arg pop 0
        mov      r8, rax
        ; gcrRegs +[r8]
-       jmp      G_M7166_IG13
+       jmp      G_M7166_IG14
 						;; size=14 bbWeight=0 PerfScore 0.00
 
-; Total bytes of code 379, prolog size 39, PerfScore 52.86, instruction count 88, allocated bytes for code 379 (MethodHash=f77ce401) for method System.Buffers.Text.FormattingHelpers:TryFormat[double](double,System.Span`1[ubyte],byref,System.Buffers.StandardFormat):ubyte (Tier1)
+; Total bytes of code 418, prolog size 39, PerfScore 51.28, instruction count 95, allocated bytes for code 418 (MethodHash=f77ce401) for method System.Buffers.Text.FormattingHelpers:TryFormat[double](double,System.Span`1[ubyte],byref,System.Buffers.StandardFormat):ubyte (Tier1)

Looks like a case where regular promotion gives us a Span<T>._length field that we consider never-negative, while physical promotion doesn't apply that refinement to the fields it creates. Opened #104573.
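To illustrate the kind of refinement being described, here is a toy model, not the JIT's actual code. `PromotedField`, `SignCheck` and `lowerLessThanZero` are hypothetical names invented for this sketch. It shows how a "never negative" fact attached to a field local lets a `< 0` comparison be folded away, and how the same comparison has to be emitted when the fact is missing, which may be what the added `test ebp, ebp` / `jl` pair above reflects.

```cpp
// Toy model only: these types and names are hypothetical and are not
// the JIT's real data structures.
struct PromotedField
{
    // Fact attached when the field local is created, e.g. for a span length.
    bool neverNegative;
};

enum class SignCheck
{
    FoldedAway,       // `field < 0` is statically false, so no code is emitted
    EmittedAtRuntime  // nothing is known, so a compare-and-branch must be emitted
};

// Decides what happens to a `field < 0` comparison for the given field local.
SignCheck lowerLessThanZero(const PromotedField& field)
{
    return field.neverNegative ? SignCheck::FoldedAway : SignCheck::EmittedAtRuntime;
}
```

In this model, a field created with `neverNegative = true` yields `SignCheck::FoldedAway`, while the same field created without the fact yields `SignCheck::EmittedAtRuntime`.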

@jakobbotsch
Member Author

benchmarks.run_pgo System.Threading.Channels.BoundedChannel`1+BoundedChannelWriter[System.__Canon]:WriteAsync
@@ -10,50 +10,50 @@
 ; Final local variable assignments
 ;
 ;  V00 this         [V00,T04] (  5,  3.50)     ref  ->  [rbp+0x10]  this class-hnd EH-live single-def <System.Threading.Channels.BoundedChannel`1+BoundedChannelWriter[System.__Canon]>
-;  V01 RetBuf       [V01,T03] ( 10,  4   )   byref  ->  [rbp+0x18]  EH-live single-def
+;  V01 RetBuf       [V01,T03] ( 12,  5.00)   byref  ->  [rbp+0x18]  EH-live single-def
 ;  V02 arg1         [V02,T06] (  9,  3.01)     ref  ->  rbx         class-hnd single-def <System.__Canon>
 ;* V03 arg2         [V03    ] (  0,  0   )  struct ( 8) zero-ref    ld-addr-op single-def <System.Threading.CancellationToken>
-;  V04 loc0         [V04,T13] (  5,  2.01)     ref  ->  r15         class-hnd <System.Threading.Channels.AsyncOperation`1[System.__Canon]>
-;  V05 loc1         [V05,T10] (  5,  2.73)     ref  ->  r13         ld-addr-op class-hnd <System.Threading.Channels.AsyncOperation`1[ubyte]>
-;  V06 loc2         [V06,T02] ( 26,  7.49)     ref  ->  [rbp-0x60]  class-hnd exact EH-live spill-single-def <System.Threading.Channels.BoundedChannel`1[System.__Canon]>
+;  V04 loc0         [V04,T13] (  5,  2.01)     ref  ->  rdi         class-hnd <System.Threading.Channels.AsyncOperation`1[System.__Canon]>
+;  V05 loc1         [V05,T10] (  5,  2.73)     ref  ->  r14         ld-addr-op class-hnd <System.Threading.Channels.AsyncOperation`1[ubyte]>
+;  V06 loc2         [V06,T02] ( 26,  7.49)     ref  ->  [rbp-0x68]  class-hnd exact EH-live spill-single-def <System.Threading.Channels.BoundedChannel`1[System.__Canon]>
 ;  V07 loc3         [V07    ] (  8,  5.00)   ubyte  ->  [rbp-0x48]  do-not-enreg[X] addr-exposed ld-addr-op
-;  V08 loc4         [V08,T11] (  3,  2.49)     int  ->  rsi        
-;  V09 loc5         [V09    ] ( 12,  2.00)  struct (16) [rbp-0x58]  do-not-enreg[XS] must-init addr-exposed <System.Threading.Tasks.ValueTask>
-;  V10 loc6         [V10,T36] (  3,  0   )     ref  ->  r14         class-hnd <System.Threading.Channels.AsyncOperation`1[System.__Canon]>
+;  V08 loc4         [V08,T11] (  3,  2.49)     int  ->  r15        
+;  V09 loc5         [V09,T33] (  4,  0   )  struct (16) [rbp-0x58]  do-not-enreg[HS] must-init hidden-struct-arg <System.Threading.Tasks.ValueTask>
+;  V10 loc6         [V10,T34] (  3,  0   )     ref  ->  rsi         class-hnd <System.Threading.Channels.AsyncOperation`1[System.__Canon]>
 ;* V11 loc7         [V11    ] (  0,  0   )  struct (16) zero-ref    ld-addr-op <System.Threading.Tasks.ValueTask>
 ;* V12 loc8         [V12    ] (  0,  0   )     ref  ->  zero-ref    class-hnd exact single-def <System.Threading.Channels.VoidAsyncOperationWithData`1[System.__Canon]>
 ;  V13 loc9         [V13,T05] ( 11,  5.46)     ref  ->  r15         class-hnd exact single-def <System.Threading.Channels.VoidAsyncOperationWithData`1[System.__Canon]>
 ;* V14 loc10        [V14    ] (  0,  0   )     ref  ->  zero-ref    class-hnd single-def <System.__Canon>
 ;  V15 OutArgs      [V15    ] (  1,  1   )  struct (32) [rsp+0x00]  do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
-;  V16 tmp1         [V16,T37] (  3,  0   )     ref  ->  r14        
-;  V17 tmp2         [V17,T34] (  4,  0   )     ref  ->  rax         class-hnd single-def "dup spill" <System.Action`1[System.__Canon]>
+;  V16 tmp1         [V16,T35] (  3,  0   )     ref  ->  rsi        
+;  V17 tmp2         [V17,T31] (  4,  0   )     ref  ->  rax         class-hnd single-def "dup spill" <System.Action`1[System.__Canon]>
 ;* V18 tmp3         [V18    ] (  0,  0   )     ref  ->  zero-ref    single-def
-;  V19 tmp4         [V19,T35] (  4,  0   )     ref  ->  rax         class-hnd single-def "dup spill" <System.Action`1[System.__Canon]>
+;  V19 tmp4         [V19,T32] (  4,  0   )     ref  ->  rax         class-hnd single-def "dup spill" <System.Action`1[System.__Canon]>
 ;* V20 tmp5         [V20    ] (  0,  0   )     ref  ->  zero-ref    single-def
-;  V21 tmp6         [V21,T33] (  5,  0   )     ref  ->  rsi         class-hnd exact single-def "NewObj constructor temp" <System.Threading.Channels.VoidAsyncOperationWithData`1[System.__Canon]>
+;  V21 tmp6         [V21,T30] (  5,  0   )     ref  ->  rdi         class-hnd exact single-def "NewObj constructor temp" <System.Threading.Channels.VoidAsyncOperationWithData`1[System.__Canon]>
 ;* V22 tmp7         [V22    ] (  0,  0   )    long  ->  zero-ref    "spilling helperCall"
 ;* V23 tmp8         [V23    ] (  0,  0   )  struct (16) zero-ref    ld-addr-op "NewObj constructor temp" <System.Threading.Tasks.ValueTask>
 ;* V24 tmp9         [V24    ] (  0,  0   )  struct (16) zero-ref    ld-addr-op "NewObj constructor temp" <System.Threading.Tasks.ValueTask>
-;* V25 tmp10        [V25,T18] (  0,  0   )   ubyte  ->  zero-ref    "Inline return value spill temp"
+;* V25 tmp10        [V25,T21] (  0,  0   )   ubyte  ->  zero-ref    "Inline return value spill temp"
 ;* V26 tmp11        [V26    ] (  0,  0   )     ref  ->  zero-ref    class-hnd "Inlining Arg" <System.Threading.CancellationTokenSource>
-;  V27 tmp12        [V27,T38] (  3,  0   )     ref  ->  rdx         class-hnd single-def "Inlining Arg" <System.Threading.Tasks.Task>
-;  V28 tmp13        [V28,T08] (  2,  4   )     ref  ->  rcx         class-hnd exact single-def "Inlining Arg" <System.Collections.Generic.Deque`1[System.__Canon]>
+;  V27 tmp12        [V27,T36] (  3,  0   )     ref  ->  rdx         class-hnd single-def "Inlining Arg" <System.Threading.Tasks.Task>
+;  V28 tmp13        [V28,T08] (  2,  4   )     ref  ->   r8         class-hnd exact single-def "Inlining Arg" <System.Collections.Generic.Deque`1[System.__Canon]>
 ;  V29 tmp14        [V29,T09] (  2,  4   )   ubyte  ->  rdx         "Inlining Arg"
 ;  V30 tmp15        [V30,T29] (  3,  0.01)     ref  ->  rax         class-hnd single-def "Inlining Arg" <System.Threading.Tasks.Task>
 ;* V31 tmp16        [V31    ] (  0,  0   )     ref  ->  zero-ref    class-hnd exact "Inlining Arg" <System.Collections.Generic.Deque`1[System.__Canon]>
 ;* V32 tmp17        [V32    ] (  0,  0   )     ref  ->  zero-ref    class-hnd exact "Inlining Arg" <System.Collections.Generic.Deque`1[System.__Canon]>
-;  V33 tmp18        [V33,T00] ( 12, 10.29)     ref  ->  r12         class-hnd exact single-def "Inlining Arg" <System.Collections.Generic.Deque`1[System.__Canon]>
+;  V33 tmp18        [V33,T00] ( 12, 10.29)     ref  ->  r13         class-hnd exact single-def "Inlining Arg" <System.Collections.Generic.Deque`1[System.__Canon]>
 ;  V34 tmp19        [V34,T15] (  3,  1.51)     int  ->  rax         "Inline stloc first use temp"
-;* V35 tmp20        [V35,T24] (  0,  0   )   ubyte  ->  zero-ref    "Inline return value spill temp"
-;  V36 tmp21        [V36,T01] ( 12, 10.17)     ref  ->  rsi         class-hnd exact single-def "Inlining Arg" <System.Collections.Generic.Deque`1[System.__Canon]>
+;* V35 tmp20        [V35,T25] (  0,  0   )   ubyte  ->  zero-ref    "Inline return value spill temp"
+;  V36 tmp21        [V36,T01] ( 12, 10.17)     ref  ->  r12         class-hnd exact single-def "Inlining Arg" <System.Collections.Generic.Deque`1[System.__Canon]>
 ;  V37 tmp22        [V37,T17] (  3,  1.49)     int  ->  rcx         "Inline stloc first use temp"
 ;* V38 tmp23        [V38    ] (  0,  0   )  struct (16) zero-ref    ld-addr-op "NewObj constructor temp" <System.Threading.Tasks.ValueTask>
 ;  V39 tmp24        [V39,T14] (  2,  1.99)   short  ->  rcx         "Inlining Arg"
 ;* V40 tmp25        [V40    ] (  0,  0   )  struct ( 8) zero-ref    "Inlining Arg" <System.Threading.CancellationToken>
 ;* V41 tmp26        [V41    ] (  0,  0   )     ref  ->  zero-ref    class-hnd single-def "Inline stloc first use temp" <System.Threading.Channels.AsyncOperation`1[ubyte]>
-;  V42 tmp27        [V42,T23] (  3,  0.72)     ref  ->  rdi         class-hnd single-def "Inline stloc first use temp" <System.Threading.Channels.AsyncOperation`1[ubyte]>
+;  V42 tmp27        [V42,T24] (  3,  0.72)     ref  ->  r15         class-hnd single-def "Inline stloc first use temp" <System.Threading.Channels.AsyncOperation`1[ubyte]>
 ;  V43 tmp28        [V43,T12] ( 11,  2.16)     ref  ->  r12         class-hnd "Inline stloc first use temp" <System.Threading.Channels.AsyncOperation`1[ubyte]>
-;  V44 tmp29        [V44,T22] (  2,  0.96)     ref  ->  r13         class-hnd "impAppendStmt" <System.Threading.Channels.AsyncOperation`1[ubyte]>
+;  V44 tmp29        [V44,T23] (  2,  0.96)     ref  ->  r13         class-hnd "impAppendStmt" <System.Threading.Channels.AsyncOperation`1[ubyte]>
 ;* V45 tmp30        [V45    ] (  0,  0   )     ref  ->  zero-ref   
 ;* V46 tmp31        [V46    ] (  0,  0   )     ref  ->  zero-ref   
 ;* V47 tmp32        [V47    ] (  0,  0   )   ubyte  ->  zero-ref    "Inline return value spill temp"
@@ -61,29 +61,32 @@
 ;* V49 tmp34        [V49    ] (  0,  0   )  struct ( 8) zero-ref    ld-addr-op "Inline stloc first use temp" <System.Threading.CancellationToken>
 ;* V50 tmp35        [V50,T28] (  0,  0   )   ubyte  ->  zero-ref    "Inline return value spill temp"
 ;* V51 tmp36        [V51    ] (  0,  0   )  struct ( 8) zero-ref    ld-addr-op "Inline stloc first use temp" <System.Threading.CancellationToken>
-;  V52 tmp37        [V52,T07] (  6,  2.50)     ref  ->  r14         single-def "field V03._source (fldOffset=0x0)" P-INDEP
+;  V52 tmp37        [V52,T07] (  6,  2.50)     ref  ->  rsi         single-def "field V03._source (fldOffset=0x0)" P-INDEP
 ;* V53 tmp38        [V53    ] (  0,  0   )     ref  ->  zero-ref    "field V11._obj (fldOffset=0x0)" P-INDEP
 ;* V54 tmp39        [V54    ] (  0,  0   )   short  ->  zero-ref    "field V11._token (fldOffset=0x8)" P-INDEP
 ;* V55 tmp40        [V55    ] (  0,  0   )   ubyte  ->  zero-ref    "field V11._continueOnCapturedContext (fldOffset=0xa)" P-INDEP
-;  V56 tmp41        [V56,T30] (  2,  0.00)     ref  ->  rax         single-def "field V23._obj (fldOffset=0x0)" P-INDEP
-;* V57 tmp42        [V57,T31] (  0,  0   )   short  ->  zero-ref    "field V23._token (fldOffset=0x8)" P-INDEP
-;* V58 tmp43        [V58,T32] (  0,  0   )   ubyte  ->  zero-ref    "field V23._continueOnCapturedContext (fldOffset=0xa)" P-INDEP
-;  V59 tmp44        [V59,T42] (  2,  0   )     ref  ->  rdx         single-def "field V24._obj (fldOffset=0x0)" P-INDEP
-;* V60 tmp45        [V60,T43] (  0,  0   )   short  ->  zero-ref    "field V24._token (fldOffset=0x8)" P-INDEP
-;* V61 tmp46        [V61,T44] (  0,  0   )   ubyte  ->  zero-ref    "field V24._continueOnCapturedContext (fldOffset=0xa)" P-INDEP
-;  V62 tmp47        [V62,T20] (  2,  0.99)     ref  ->  r15         single-def "field V38._obj (fldOffset=0x0)" P-INDEP
-;  V63 tmp48        [V63,T21] (  2,  0.99)   short  ->  rcx         "field V38._token (fldOffset=0x8)" P-INDEP
-;* V64 tmp49        [V64,T25] (  0,  0   )   ubyte  ->  zero-ref    "field V38._continueOnCapturedContext (fldOffset=0xa)" P-INDEP
+;* V56 tmp41        [V56    ] (  0,  0   )     ref  ->  zero-ref    single-def "field V23._obj (fldOffset=0x0)" P-INDEP
+;* V57 tmp42        [V57    ] (  0,  0   )   short  ->  zero-ref    "field V23._token (fldOffset=0x8)" P-INDEP
+;* V58 tmp43        [V58    ] (  0,  0   )   ubyte  ->  zero-ref    "field V23._continueOnCapturedContext (fldOffset=0xa)" P-INDEP
+;  V59 tmp44        [V59,T40] (  2,  0   )     ref  ->  rdx         single-def "field V24._obj (fldOffset=0x0)" P-INDEP
+;* V60 tmp45        [V60,T41] (  0,  0   )   short  ->  zero-ref    "field V24._token (fldOffset=0x8)" P-INDEP
+;* V61 tmp46        [V61,T42] (  0,  0   )   ubyte  ->  zero-ref    "field V24._continueOnCapturedContext (fldOffset=0xa)" P-INDEP
+;* V62 tmp47        [V62    ] (  0,  0   )     ref  ->  zero-ref    single-def "field V38._obj (fldOffset=0x0)" P-INDEP
+;* V63 tmp48        [V63    ] (  0,  0   )   short  ->  zero-ref    "field V38._token (fldOffset=0x8)" P-INDEP
+;* V64 tmp49        [V64    ] (  0,  0   )   ubyte  ->  zero-ref    "field V38._continueOnCapturedContext (fldOffset=0xa)" P-INDEP
 ;* V65 tmp50        [V65    ] (  0,  0   )     ref  ->  zero-ref    "field V40._source (fldOffset=0x0)" P-INDEP
 ;  V66 tmp51        [V66,T26] (  2,  0.48)     ref  ->  rcx         "field V49._source (fldOffset=0x0)" P-INDEP
 ;  V67 tmp52        [V67,T27] (  2,  0.48)     ref  ->  rcx         "field V51._source (fldOffset=0x0)" P-INDEP
-;  V68 PSPSym       [V68,T19] (  1,  1   )    long  ->  [rbp-0x70]  do-not-enreg[V] "PSPSym"
-;  V69 cse0         [V69,T16] (  3,  1.49)     ref  ->  [rbp-0x68]  spill-single-def "CSE #05: moderate"
-;  V70 rat0         [V70,T39] (  3,  0   )    long  ->  rcx         "Spilling to split statement for tree"
-;  V71 rat1         [V71,T40] (  3,  0   )    long  ->  rax         "runtime lookup"
-;  V72 rat2         [V72,T41] (  3,  0   )    long  ->  rax         "fgMakeTemp is creating a new local variable"
+;  V68 tmp53        [V68,T18] (  8,  1.00)     ref  ->  [rbp-0x70]  do-not-enreg[M] must-init EH-live "V09.[000..008)"
+;  V69 tmp54        [V69,T19] (  8,  1.00)   short  ->  [rbp-0x5C]  do-not-enreg[Z] EH-live "V09.[008..010)"
+;  V70 tmp55        [V70,T20] (  8,  1.00)   ubyte  ->  [rbp-0x60]  do-not-enreg[Z] EH-live "V09.[010..011)"
+;  V71 PSPSym       [V71,T22] (  1,  1   )    long  ->  [rbp-0x80]  do-not-enreg[V] "PSPSym"
+;  V72 cse0         [V72,T16] (  3,  1.49)     ref  ->  [rbp-0x78]  spill-single-def "CSE #05: moderate"
+;  V73 rat0         [V73,T37] (  3,  0   )    long  ->  rax         "Spilling to split statement for tree"
+;  V74 rat1         [V74,T38] (  3,  0   )    long  ->   r8         "runtime lookup"
+;  V75 rat2         [V75,T39] (  3,  0   )    long  ->   r8         "fgMakeTemp is creating a new local variable"
 ;
-; Lcl frame size = 88
+; Lcl frame size = 104
 
 G_M20076_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
        push     rbp
@@ -94,174 +97,185 @@ G_M20076_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
        push     rdi
        push     rsi
        push     rbx
-       sub      rsp, 88
-       lea      rbp, [rsp+0x90]
+       sub      rsp, 104
+       lea      rbp, [rsp+0xA0]
+       vxorps   xmm4, xmm4, xmm4
+       vmovdqu  ymmword ptr [rbp-0x70], ymm4
        xor      eax, eax
-       mov      qword ptr [rbp-0x58], rax
-       mov      qword ptr [rbp-0x70], rsp
+       mov      qword ptr [rbp-0x50], rax
+       mov      qword ptr [rbp-0x80], rsp
        mov      qword ptr [rbp-0x40], rcx
        mov      gword ptr [rbp+0x10], rcx
        ; GC ptr vars +{V00}
        mov      bword ptr [rbp+0x18], rdx
        ; GC ptr vars +{V01}
-       mov      r10, rcx
-       ; gcrRegs +[r10]
        mov      rbx, r8
        ; gcrRegs +[rbx]
-       mov      r14, r9
-       ; gcrRegs +[r14]
-						;; size=55 bbWeight=1 PerfScore 14.75
-G_M20076_IG02:        ; bbWeight=1, gcVars=0000000000000018 {V00 V01}, gcrefRegs=4408 {rbx r10 r14}, byrefRegs=0000 {}, gcvars, byref
-       test     r14, r14
-       jne      G_M20076_IG54
-						;; size=9 bbWeight=1 PerfScore 1.25
-G_M20076_IG03:        ; bbWeight=1, gcrefRegs=4408 {rbx r10 r14}, byrefRegs=0000 {}, byref
-       xor      r15, r15
-       ; gcrRegs +[r15]
-       xor      r13, r13
-       ; gcrRegs +[r13]
-       mov      rdx, gword ptr [r10+0x08]
-       ; gcrRegs +[rdx]
-       mov      gword ptr [rbp-0x60], rdx
-       ; GC ptr vars +{V06}
-       xor      ecx, ecx
-       mov      dword ptr [rbp-0x48], ecx
-						;; size=19 bbWeight=1 PerfScore 4.75
-G_M20076_IG04:        ; bbWeight=1, gcVars=000000000000001C {V00 V01 V06}, gcrefRegs=E00C {rdx rbx r13 r14 r15}, byrefRegs=0000 {}, gcvars, byref
-       ; gcrRegs -[r10]
-       mov      rcx, gword ptr [rdx+0x28]
+       mov      rsi, r9
+       ; gcrRegs +[rsi]
+						;; size=61 bbWeight=1 PerfScore 16.83
+G_M20076_IG02:        ; bbWeight=1, gcVars=0000000000040018 {V00 V01 V68}, gcrefRegs=004A {rcx rbx rsi}, byrefRegs=0000 {}, gcvars, byref
        ; gcrRegs +[rcx]
+       ; GC ptr vars +{V68}
+       test     rsi, rsi
+       jne      G_M20076_IG62
+						;; size=9 bbWeight=1 PerfScore 1.25
+G_M20076_IG03:        ; bbWeight=1, gcrefRegs=004A {rcx rbx rsi}, byrefRegs=0000 {}, byref
+       xor      rdi, rdi
+       ; gcrRegs +[rdi]
+       xor      r14, r14
+       ; gcrRegs +[r14]
+       mov      rdx, gword ptr [rcx+0x08]
+       ; gcrRegs +[rdx]
+       mov      gword ptr [rbp-0x68], rdx
+       ; GC ptr vars +{V06}
+       xor      r8d, r8d
+       mov      dword ptr [rbp-0x48], r8d
+						;; size=20 bbWeight=1 PerfScore 4.75
+G_M20076_IG04:        ; bbWeight=1, gcVars=000000000004001C {V00 V01 V06 V68}, gcrefRegs=40CC {rdx rbx rsi rdi r14}, byrefRegs=0000 {}, gcvars, byref
+       ; gcrRegs -[rcx]
+       mov      r8, gword ptr [rdx+0x28]
+       ; gcrRegs +[r8]
        cmp      byte  ptr [rbp-0x48], 0
-       jne      G_M20076_IG25
+       jne      G_M20076_IG27
        lea      rdx, [rbp-0x48]
        ; gcrRegs -[rdx]
+       mov      rcx, r8
+       ; gcrRegs +[rcx]
        call     <unknown method>
-       ; gcrRegs -[rcx]
+       ; gcrRegs -[rcx r8]
        ; gcr arg pop 0
        movzx    rdx, byte  ptr [rbp-0x48]
        test     edx, edx
-       je       G_M20076_IG22
-						;; size=35 bbWeight=1 PerfScore 8.75
-G_M20076_IG05:        ; bbWeight=1, gcrefRegs=E008 {rbx r13 r14 r15}, byrefRegs=0000 {}, byref
-       mov      rdx, gword ptr [rbp-0x60]
+       je       G_M20076_IG24
+						;; size=38 bbWeight=1 PerfScore 9.00
+G_M20076_IG05:        ; bbWeight=1, gcrefRegs=40C8 {rbx rsi rdi r14}, byrefRegs=0000 {}, byref
+       mov      rdx, gword ptr [rbp-0x68]
        ; gcrRegs +[rdx]
        cmp      gword ptr [rdx+0x50], 0
-       jne      G_M20076_IG21
+       jne      G_M20076_IG23
 						;; size=15 bbWeight=1 PerfScore 5.00
-G_M20076_IG06:        ; bbWeight=1.00, gcrefRegs=E00C {rdx rbx r13 r14 r15}, byrefRegs=0000 {}, byref
+G_M20076_IG06:        ; bbWeight=1.00, gcrefRegs=40CC {rdx rbx rsi rdi r14}, byrefRegs=0000 {}, byref
        mov      rcx, gword ptr [rdx+0x28]
        ; gcrRegs +[rcx]
-       mov      esi, dword ptr [rcx+0x18]
-       test     esi, esi
-       jne      G_M20076_IG14
-						;; size=15 bbWeight=1.00 PerfScore 5.24
-G_M20076_IG07:        ; bbWeight=0.50, gcrefRegs=A00C {rdx rbx r13 r15}, byrefRegs=0000 {}, byref
-       ; gcrRegs -[rcx r14]
+       mov      r15d, dword ptr [rcx+0x18]
+       test     r15d, r15d
+       jne      G_M20076_IG16
+						;; size=17 bbWeight=1.00 PerfScore 5.24
+G_M20076_IG07:        ; bbWeight=0.50, gcrefRegs=408C {rdx rbx rdi r14}, byrefRegs=0000 {}, byref
+       ; gcrRegs -[rcx rsi]
        mov      rcx, gword ptr [rdx+0x30]
        ; gcrRegs +[rcx]
        cmp      dword ptr [rcx+0x18], 0
-       jne      G_M20076_IG23
+       jne      G_M20076_IG25
 						;; size=14 bbWeight=0.50 PerfScore 3.01
-G_M20076_IG08:        ; bbWeight=0.50, gcrefRegs=A00C {rdx rbx r13 r15}, byrefRegs=0000 {}, byref
+G_M20076_IG08:        ; bbWeight=0.50, gcrefRegs=408C {rdx rbx rdi r14}, byrefRegs=0000 {}, byref
        ; gcrRegs -[rcx]
-       test     r15, r15
-       jne      G_M20076_IG38
-       mov      r12, gword ptr [rdx+0x28]
-       ; gcrRegs +[r12]
-       mov      ecx, dword ptr [r12+0x18]
-       mov      r8, gword ptr [r12+0x08]
+       test     rdi, rdi
+       jne      G_M20076_IG46
+       mov      r13, gword ptr [rdx+0x28]
+       ; gcrRegs +[r13]
+       mov      ecx, dword ptr [r13+0x18]
+       mov      r8, gword ptr [r13+0x08]
        ; gcrRegs +[r8]
        cmp      ecx, dword ptr [r8+0x08]
-       je       G_M20076_IG24
-						;; size=33 bbWeight=0.50 PerfScore 5.65
-G_M20076_IG09:        ; bbWeight=0.50, gcrefRegs=9008 {rbx r12 r15}, byrefRegs=0000 {}, byref, isz
-       ; gcrRegs -[rdx r8 r13]
-       movsxd   rdx, dword ptr [r12+0x14]
-       mov      rcx, gword ptr [r12+0x08]
+       je       G_M20076_IG26
+						;; size=31 bbWeight=0.50 PerfScore 5.65
+G_M20076_IG09:        ; bbWeight=0.50, gcrefRegs=2088 {rbx rdi r13}, byrefRegs=0000 {}, byref, isz
+       ; gcrRegs -[rdx r8 r14]
+       movsxd   rdx, dword ptr [r13+0x14]
+       mov      rcx, gword ptr [r13+0x08]
        ; gcrRegs +[rcx]
        mov      r8, rbx
        ; gcrRegs +[r8]
        call     CORINFO_HELP_ARRADDR_ST
        ; gcrRegs -[rcx r8]
        ; gcr arg pop 0
-       mov      eax, dword ptr [r12+0x14]
+       mov      eax, dword ptr [r13+0x14]
        inc      eax
-       mov      dword ptr [r12+0x14], eax
-       mov      rcx, gword ptr [r12+0x08]
+       mov      dword ptr [r13+0x14], eax
+       mov      rcx, gword ptr [r13+0x08]
        ; gcrRegs +[rcx]
        cmp      dword ptr [rcx+0x08], eax
        je       SHORT G_M20076_IG12
-						;; size=40 bbWeight=0.50 PerfScore 8.28
-G_M20076_IG10:        ; bbWeight=0.50, gcrefRegs=9008 {rbx r12 r15}, byrefRegs=0000 {}, byref, isz
+						;; size=35 bbWeight=0.50 PerfScore 8.28
+G_M20076_IG10:        ; bbWeight=0.50, gcrefRegs=2088 {rbx rdi r13}, byrefRegs=0000 {}, byref, isz
        ; gcrRegs -[rcx]
-       inc      dword ptr [r12+0x18]
-       mov      rdx, gword ptr [rbp-0x60]
+       inc      dword ptr [r13+0x18]
+       mov      rdx, gword ptr [rbp-0x68]
        ; gcrRegs +[rdx]
-       mov      r13, gword ptr [rdx+0x40]
-       ; gcrRegs +[r13]
-       test     r13, r13
+       mov      r14, gword ptr [rdx+0x40]
+       ; gcrRegs +[r14]
+       test     r14, r14
        je       SHORT G_M20076_IG13
-						;; size=18 bbWeight=0.50 PerfScore 3.64
-G_M20076_IG11:        ; bbWeight=0.50, gcrefRegs=A00C {rdx rbx r13 r15}, byrefRegs=0000 {}, byref
-       ; gcrRegs -[r12]
+						;; size=17 bbWeight=0.50 PerfScore 3.64
+G_M20076_IG11:        ; bbWeight=0.50, gcrefRegs=408C {rdx rbx rdi r14}, byrefRegs=0000 {}, byref
+       ; gcrRegs -[r13]
        xor      rax, rax
        ; gcrRegs +[rax]
        mov      gword ptr [rdx+0x40], rax
-       jmp      G_M20076_IG38
+       jmp      G_M20076_IG46
 						;; size=11 bbWeight=0.50 PerfScore 1.62
-G_M20076_IG12:        ; bbWeight=0.13, gcrefRegs=9008 {rbx r12 r15}, byrefRegs=0000 {}, byref, isz
-       ; gcrRegs -[rax rdx r13] +[r12]
+G_M20076_IG12:        ; bbWeight=0.13, gcrefRegs=2088 {rbx rdi r13}, byrefRegs=0000 {}, byref, isz
+       ; gcrRegs -[rax rdx r14] +[r13]
        xor      eax, eax
-       mov      dword ptr [r12+0x14], eax
+       mov      dword ptr [r13+0x14], eax
        jmp      SHORT G_M20076_IG10
-						;; size=9 bbWeight=0.13 PerfScore 0.41
+						;; size=8 bbWeight=0.13 PerfScore 0.41
 G_M20076_IG13:        ; bbWeight=0.00, gcrefRegs=0004 {rdx}, byrefRegs=0000 {}, byref
-       ; gcrRegs -[rbx r12 r15] +[rdx]
-       vxorps   xmm0, xmm0, xmm0
-       vmovdqu  xmmword ptr [rbp-0x58], xmm0
-       jmp      G_M20076_IG50
-						;; size=14 bbWeight=0.00 PerfScore 0.01
-G_M20076_IG14:        ; bbWeight=0.50, gcrefRegs=400C {rdx rbx r14}, byrefRegs=0000 {}, byref
-       ; gcrRegs +[rbx r14]
-       cmp      esi, dword ptr [rdx+0x5C]
-       jl       G_M20076_IG36
+       ; gcrRegs -[rbx rdi r13] +[rdx]
+       xor      rax, rax
+       ; gcrRegs +[rax]
+       mov      gword ptr [rbp-0x70], rax
+						;; size=6 bbWeight=0.00 PerfScore 0.00
+G_M20076_IG14:        ; bbWeight=0.00, gcrefRegs=0004 {rdx}, byrefRegs=0000 {}, byref
+       ; gcrRegs -[rax]
+       mov      dword ptr [rbp-0x5C], eax
+						;; size=3 bbWeight=0.00 PerfScore 0.00
+G_M20076_IG15:        ; bbWeight=0.00, gcrefRegs=0004 {rdx}, byrefRegs=0000 {}, byref
+       mov      dword ptr [rbp-0x60], eax
+       jmp      G_M20076_IG58
+						;; size=8 bbWeight=0.00 PerfScore 0.01
+G_M20076_IG16:        ; bbWeight=0.50, gcrefRegs=004C {rdx rbx rsi}, byrefRegs=0000 {}, byref
+       ; gcrRegs +[rbx rsi]
+       cmp      r15d, dword ptr [rdx+0x5C]
+       jl       G_M20076_IG42
        cmp      dword ptr [rdx+0x58], 0
-       jne      G_M20076_IG30
-       test     r14, r14
-       jne      G_M20076_IG27
-       mov      r10, gword ptr [rbp+0x10]
-       ; gcrRegs +[r10]
-       mov      r15, gword ptr [r10+0x10]
+       jne      G_M20076_IG32
+       test     rsi, rsi
+       jne      G_M20076_IG29
+       mov      rcx, gword ptr [rbp+0x10]
+       ; gcrRegs +[rcx]
+       mov      r15, gword ptr [rcx+0x10]
        ; gcrRegs +[r15]
-       lea      rcx, bword ptr [r15+0x10]
-       ; byrRegs +[rcx]
-       xor      r8, r8
-       ; gcrRegs +[r8]
+       lea      r8, bword ptr [r15+0x10]
+       ; byrRegs +[r8]
+       xor      r10, r10
+       ; gcrRegs +[r10]
        mov      rax, 0xD1FFAB1E      ; const ptr
        mov      rax, gword ptr [rax]
        ; gcrRegs +[rax]
-       mov      gword ptr [rbp-0x68], rax
-       ; GC ptr vars +{V69}
+       mov      gword ptr [rbp-0x78], rax
+       ; GC ptr vars +{V72}
        lock     
-       cmpxchg  gword ptr [rcx], r8
-       cmp      rax, gword ptr [rbp-0x68]
-       jne      G_M20076_IG27
-       xor      rcx, rcx
-       ; gcrRegs +[rcx]
-       ; byrRegs -[rcx]
-       mov      gword ptr [r15+0x18], rcx
+       cmpxchg  gword ptr [r8], r10
+       cmp      rax, gword ptr [rbp-0x78]
+       jne      G_M20076_IG29
+       xor      rax, rax
+       mov      gword ptr [r15+0x18], rax
        mov      byte  ptr [r15+0x50], 0
-						;; size=86 bbWeight=0.50 PerfScore 19.61
-G_M20076_IG15:        ; bbWeight=0.50, gcrefRegs=8008 {rbx r15}, byrefRegs=0000 {}, byref
-       ; gcrRegs -[rax rcx rdx r8 r10 r14]
-       ; GC ptr vars -{V69}
-       mov      gword ptr [r15+0x08], rcx
-						;; size=4 bbWeight=0.50 PerfScore 0.50
-G_M20076_IG16:        ; bbWeight=0.50, gcrefRegs=8008 {rbx r15}, byrefRegs=0000 {}, byref
-       mov      gword ptr [r15+0x20], rcx
-						;; size=4 bbWeight=0.50 PerfScore 0.50
+						;; size=87 bbWeight=0.50 PerfScore 19.61
 G_M20076_IG17:        ; bbWeight=0.50, gcrefRegs=8008 {rbx r15}, byrefRegs=0000 {}, byref
-       mov      gword ptr [r15+0x28], rcx
+       ; gcrRegs -[rax rcx rdx rsi r10]
+       ; byrRegs -[r8]
+       ; GC ptr vars -{V72}
+       mov      gword ptr [r15+0x08], rax
+						;; size=4 bbWeight=0.50 PerfScore 0.50
+G_M20076_IG18:        ; bbWeight=0.50, gcrefRegs=8008 {rbx r15}, byrefRegs=0000 {}, byref
+       mov      gword ptr [r15+0x20], rax
+						;; size=4 bbWeight=0.50 PerfScore 0.50
+G_M20076_IG19:        ; bbWeight=0.50, gcrefRegs=8008 {rbx r15}, byrefRegs=0000 {}, byref
+       mov      gword ptr [r15+0x28], rax
        lea      rcx, bword ptr [r15+0x60]
        ; byrRegs +[rcx]
        mov      rdx, rbx
@@ -269,50 +283,50 @@ G_M20076_IG17:        ; bbWeight=0.50, gcrefRegs=8008 {rbx r15}, byrefRegs=0000
        call     CORINFO_HELP_ASSIGN_REF
        ; gcrRegs -[rdx rbx]
        ; byrRegs -[rcx]
-       mov      rdx, gword ptr [rbp-0x60]
+       mov      rdx, gword ptr [rbp-0x68]
        ; gcrRegs +[rdx]
-       mov      rsi, gword ptr [rdx+0x38]
-       ; gcrRegs +[rsi]
-       mov      ecx, dword ptr [rsi+0x18]
-       mov      r8, gword ptr [rsi+0x08]
+       mov      r12, gword ptr [rdx+0x38]
+       ; gcrRegs +[r12]
+       mov      ecx, dword ptr [r12+0x18]
+       mov      r8, gword ptr [r12+0x08]
        ; gcrRegs +[r8]
        cmp      ecx, dword ptr [r8+0x08]
-       je       G_M20076_IG26
-						;; size=41 bbWeight=0.50 PerfScore 6.82
-G_M20076_IG18:        ; bbWeight=0.50, gcrefRegs=8040 {rsi r15}, byrefRegs=0000 {}, byref, isz
+       je       G_M20076_IG28
+						;; size=44 bbWeight=0.50 PerfScore 6.82
+G_M20076_IG20:        ; bbWeight=0.50, gcrefRegs=9000 {r12 r15}, byrefRegs=0000 {}, byref, isz
        ; gcrRegs -[rdx r8]
-       movsxd   rdx, dword ptr [rsi+0x14]
-       mov      rcx, gword ptr [rsi+0x08]
+       movsxd   rdx, dword ptr [r12+0x14]
+       mov      rcx, gword ptr [r12+0x08]
        ; gcrRegs +[rcx]
        mov      r8, r15
        ; gcrRegs +[r8]
        call     CORINFO_HELP_ARRADDR_ST
        ; gcrRegs -[rcx r8]
        ; gcr arg pop 0
-       mov      ecx, dword ptr [rsi+0x14]
+       mov      ecx, dword ptr [r12+0x14]
        inc      ecx
-       mov      dword ptr [rsi+0x14], ecx
-       mov      rax, gword ptr [rsi+0x08]
+       mov      dword ptr [r12+0x14], ecx
+       mov      rax, gword ptr [r12+0x08]
        ; gcrRegs +[rax]
        cmp      dword ptr [rax+0x08], ecx
-       je       SHORT G_M20076_IG20
-						;; size=33 bbWeight=0.50 PerfScore 8.19
-G_M20076_IG19:        ; bbWeight=0.50, gcrefRegs=8040 {rsi r15}, byrefRegs=0000 {}, byref
+       je       SHORT G_M20076_IG22
+						;; size=40 bbWeight=0.50 PerfScore 8.19
+G_M20076_IG21:        ; bbWeight=0.50, gcrefRegs=9000 {r12 r15}, byrefRegs=0000 {}, byref
        ; gcrRegs -[rax]
-       inc      dword ptr [rsi+0x18]
+       inc      dword ptr [r12+0x18]
        movsx    rcx, word  ptr [r15+0x3C]
-       mov      gword ptr [rbp-0x58], r15
-       mov      word  ptr [rbp-0x50], cx
-       mov      byte  ptr [rbp-0x4E], 1
-       jmp      G_M20076_IG50
-						;; size=25 bbWeight=0.50 PerfScore 5.96
-G_M20076_IG20:        ; bbWeight=0.12, gcrefRegs=8040 {rsi r15}, byrefRegs=0000 {}, byref, isz
+       mov      gword ptr [rbp-0x70], r15
+       mov      dword ptr [rbp-0x5C], ecx
+       mov      dword ptr [rbp-0x60], 1
+       jmp      G_M20076_IG58
+						;; size=29 bbWeight=0.50 PerfScore 5.96
+G_M20076_IG22:        ; bbWeight=0.12, gcrefRegs=9000 {r12 r15}, byrefRegs=0000 {}, byref, isz
        xor      ecx, ecx
-       mov      dword ptr [rsi+0x14], ecx
-       jmp      SHORT G_M20076_IG19
-						;; size=7 bbWeight=0.12 PerfScore 0.40
-G_M20076_IG21:        ; bbWeight=0.00, gcrefRegs=0004 {rdx}, byrefRegs=0000 {}, byref
-       ; gcrRegs -[rsi r15] +[rdx]
+       mov      dword ptr [r12+0x14], ecx
+       jmp      SHORT G_M20076_IG21
+						;; size=9 bbWeight=0.12 PerfScore 0.40
+G_M20076_IG23:        ; bbWeight=0.00, gcrefRegs=0004 {rdx}, byrefRegs=0000 {}, byref
+       ; gcrRegs -[r12 r15] +[rdx]
        mov      rcx, gword ptr [rdx+0x50]
        ; gcrRegs +[rcx]
        call     [System.Threading.Channels.ChannelUtilities:CreateInvalidCompletionException(System.Exception):System.Exception]
@@ -324,14 +338,15 @@ G_M20076_IG21:        ; bbWeight=0.00, gcrefRegs=0004 {rdx}, byrefRegs=0000 {},
        ; gcrRegs -[rcx]
        ; gcr arg pop 0
        test     rax, rax
-       je       G_M20076_IG37
-       mov      gword ptr [rbp-0x58], rax
-       mov      word  ptr [rbp-0x50], 0
-       mov      byte  ptr [rbp-0x4E], 1
-       jmp      G_M20076_IG50
-						;; size=47 bbWeight=0.00 PerfScore 0.03
-G_M20076_IG22:        ; bbWeight=0, gcrefRegs=E008 {rbx r13 r14 r15}, byrefRegs=0000 {}, byref
-       ; gcrRegs -[rax] +[rbx r13-r15]
+       je       G_M20076_IG45
+       mov      gword ptr [rbp-0x70], rax
+       xor      ecx, ecx
+       mov      dword ptr [rbp-0x5C], ecx
+       mov      dword ptr [rbp-0x60], 1
+       jmp      G_M20076_IG58
+						;; size=49 bbWeight=0.00 PerfScore 0.03
+G_M20076_IG24:        ; bbWeight=0, gcrefRegs=40C8 {rbx rsi rdi r14}, byrefRegs=0000 {}, byref
+       ; gcrRegs -[rax] +[rbx rsi rdi r14]
        mov      rcx, 0xD1FFAB1E
        ; gcrRegs +[rcx]
        mov      rdx, 0xD1FFAB1E
@@ -341,117 +356,127 @@ G_M20076_IG22:        ; bbWeight=0, gcrefRegs=E008 {rbx r13 r14 r15}, byrefRegs=
        ; gcr arg pop 0
        jmp      G_M20076_IG05
 						;; size=31 bbWeight=0 PerfScore 0.00
-G_M20076_IG23:        ; bbWeight=0, gcrefRegs=A00C {rdx rbx r13 r15}, byrefRegs=0000 {}, byref
-       ; gcrRegs -[r14] +[rdx]
+G_M20076_IG25:        ; bbWeight=0, gcrefRegs=408C {rdx rbx rdi r14}, byrefRegs=0000 {}, byref
+       ; gcrRegs -[rsi] +[rdx]
        mov      rcx, gword ptr [rdx+0x30]
        ; gcrRegs +[rcx]
        cmp      dword ptr [rcx], ecx
        call     [System.Collections.Generic.Deque`1[System.__Canon]:DequeueHead():System.__Canon:this]
        ; gcrRegs -[rcx rdx] +[rax]
        ; gcr arg pop 0
-       mov      r14, rax
-       ; gcrRegs +[r14]
-       mov      rcx, r14
+       mov      rsi, rax
+       ; gcrRegs +[rsi]
+       mov      rcx, rsi
        ; gcrRegs +[rcx]
        cmp      dword ptr [rcx], ecx
        call     [System.Threading.Channels.AsyncOperation`1[System.__Canon]:UnregisterCancellation():ubyte:this]
        ; gcrRegs -[rax rcx]
        ; gcr arg pop 0
        test     eax, eax
-       mov      rdx, gword ptr [rbp-0x60]
+       mov      rdx, gword ptr [rbp-0x68]
        ; gcrRegs +[rdx]
        je       G_M20076_IG07
-       mov      r15, r14
+       mov      rdi, rsi
        jmp      G_M20076_IG08
 						;; size=46 bbWeight=0 PerfScore 0.00
-G_M20076_IG24:        ; bbWeight=0, gcrefRegs=9008 {rbx r12 r15}, byrefRegs=0000 {}, byref
-       ; gcrRegs -[rdx r13-r14] +[r12]
-       mov      rcx, r12
+G_M20076_IG26:        ; bbWeight=0, gcrefRegs=2088 {rbx rdi r13}, byrefRegs=0000 {}, byref
+       ; gcrRegs -[rdx rsi r14] +[r13]
+       mov      rcx, r13
        ; gcrRegs +[rcx]
        call     [System.Collections.Generic.Deque`1[System.__Canon]:Grow():this]
        ; gcrRegs -[rcx]
        ; gcr arg pop 0
        jmp      G_M20076_IG09
 						;; size=14 bbWeight=0 PerfScore 0.00
-G_M20076_IG25:        ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-       ; gcrRegs -[rbx r12 r15]
+G_M20076_IG27:        ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
+       ; gcrRegs -[rbx rdi r13]
        call     [System.Threading.Monitor:ThrowLockTakenException()]
        ; gcr arg pop 0
        int3     
 						;; size=7 bbWeight=0 PerfScore 0.00
-G_M20076_IG26:        ; bbWeight=0, gcrefRegs=8040 {rsi r15}, byrefRegs=0000 {}, byref
-       ; gcrRegs +[rsi r15]
-       mov      rcx, rsi
+G_M20076_IG28:        ; bbWeight=0, gcrefRegs=9000 {r12 r15}, byrefRegs=0000 {}, byref
+       ; gcrRegs +[r12 r15]
+       mov      rcx, r12
        ; gcrRegs +[rcx]
        call     [System.Collections.Generic.Deque`1[System.__Canon]:Grow():this]
        ; gcrRegs -[rcx]
        ; gcr arg pop 0
-       jmp      G_M20076_IG18
+       jmp      G_M20076_IG20
 						;; size=14 bbWeight=0 PerfScore 0.00
-G_M20076_IG27:        ; bbWeight=0, gcrefRegs=4008 {rbx r14}, byrefRegs=0000 {}, byref, isz
-       ; gcrRegs -[rsi r15] +[rbx r14]
-       mov      r10, gword ptr [rbp+0x10]
-       ; gcrRegs +[r10]
-       mov      rcx, qword ptr [r10]
-       mov      rax, qword ptr [rcx+0x38]
-       mov      rax, qword ptr [rax+0x08]
-       mov      rax, qword ptr [rax+0x10]
-       test     rax, rax
-       je       SHORT G_M20076_IG28
-       jmp      SHORT G_M20076_IG29
+G_M20076_IG29:        ; bbWeight=0, gcrefRegs=0048 {rbx rsi}, byrefRegs=0000 {}, byref, isz
+       ; gcrRegs -[r12 r15] +[rbx rsi]
+       mov      rcx, gword ptr [rbp+0x10]
+       ; gcrRegs +[rcx]
+       mov      rax, qword ptr [rcx]
+       mov      r8, qword ptr [rax+0x38]
+       mov      r8, qword ptr [r8+0x08]
+       mov      r8, qword ptr [r8+0x10]
+       test     r8, r8
+       je       SHORT G_M20076_IG30
+       jmp      SHORT G_M20076_IG31
 						;; size=26 bbWeight=0 PerfScore 0.00
-G_M20076_IG28:        ; bbWeight=0, gcrefRegs=4008 {rbx r14}, byrefRegs=0000 {}, byref
-       ; gcrRegs -[r10]
+G_M20076_IG30:        ; bbWeight=0, gcrefRegs=0048 {rbx rsi}, byrefRegs=0000 {}, byref
+       ; gcrRegs -[rcx]
+       mov      rcx, rax
        mov      rdx, 0xD1FFAB1E      ; global ptr
        call     CORINFO_HELP_RUNTIMEHANDLE_CLASS
        ; gcr arg pop 0
-						;; size=15 bbWeight=0 PerfScore 0.00
-G_M20076_IG29:        ; bbWeight=0, gcrefRegs=4008 {rbx r14}, byrefRegs=0000 {}, byref
-       mov      rcx, rax
+       mov      r8, rax
+						;; size=21 bbWeight=0 PerfScore 0.00
+G_M20076_IG31:        ; bbWeight=0, gcrefRegs=0048 {rbx rsi}, byrefRegs=0000 {}, byref
+       mov      rcx, r8
        call     CORINFO_HELP_NEWSFAST
        ; gcrRegs +[rax]
        ; gcr arg pop 0
-       mov      rsi, rax
-       ; gcrRegs +[rsi]
-       mov      rcx, rsi
+       mov      rdi, rax
+       ; gcrRegs +[rdi]
+       mov      rcx, rdi
        ; gcrRegs +[rcx]
-       mov      r8, r14
+       mov      r8, rsi
        ; gcrRegs +[r8]
        mov      edx, 1
        xor      r9d, r9d
        call     [System.Threading.Channels.AsyncOperation`1[System.VoidResult]:.ctor(ubyte,System.Threading.CancellationToken,ubyte):this]
-       ; gcrRegs -[rax rcx r8 r14]
+       ; gcrRegs -[rax rcx rsi r8]
        ; gcr arg pop 0
-       lea      rcx, bword ptr [rsi+0x60]
+       lea      rcx, bword ptr [rdi+0x60]
        ; byrRegs +[rcx]
        mov      rdx, rbx
        ; gcrRegs +[rdx]
        call     CORINFO_HELP_ASSIGN_REF
        ; gcrRegs -[rdx rbx]
        ; byrRegs -[rcx]
-       mov      rdx, gword ptr [rbp-0x60]
+       mov      rdx, gword ptr [rbp-0x68]
        ; gcrRegs +[rdx]
        mov      rcx, gword ptr [rdx+0x38]
        ; gcrRegs +[rcx]
-       mov      rdx, rsi
+       mov      rdx, rdi
        cmp      dword ptr [rcx], ecx
        call     [System.Collections.Generic.Deque`1[System.__Canon]:EnqueueTail(System.__Canon):this]
        ; gcrRegs -[rcx rdx]
        ; gcr arg pop 0
        lea      rdx, [rbp-0x58]
-       mov      rcx, rsi
+       mov      rcx, rdi
        ; gcrRegs +[rcx]
        call     [System.Threading.Channels.AsyncOperation`1[System.VoidResult]:get_ValueTask():System.Threading.Tasks.ValueTask:this]
-       ; gcrRegs -[rcx rsi]
+       ; gcrRegs -[rcx rdi]
        ; gcr arg pop 0
-       jmp      G_M20076_IG50
-						;; size=80 bbWeight=0 PerfScore 0.00
-G_M20076_IG30:        ; bbWeight=0, gcrefRegs=0008 {rbx}, byrefRegs=0000 {}, byref, isz
+       mov      rcx, gword ptr [rbp-0x58]
+       ; gcrRegs +[rcx]
+       mov      gword ptr [rbp-0x70], rcx
+       movsx    rcx, word  ptr [rbp-0x50]
+       ; gcrRegs -[rcx]
+       mov      dword ptr [rbp-0x5C], ecx
+       movzx    rcx, byte  ptr [rbp-0x4E]
+       mov      dword ptr [rbp-0x60], ecx
+       jmp      G_M20076_IG58
+						;; size=103 bbWeight=0 PerfScore 0.00
+G_M20076_IG32:        ; bbWeight=0, gcrefRegs=0008 {rbx}, byrefRegs=0000 {}, byref, isz
        ; gcrRegs +[rbx]
-       mov      rdx, gword ptr [rbp-0x60]
+       mov      rdx, gword ptr [rbp-0x68]
        ; gcrRegs +[rdx]
        cmp      dword ptr [rdx+0x58], 3
-       jne      SHORT G_M20076_IG32
+       jne      SHORT G_M20076_IG36
        mov      rcx, gword ptr [rdx+0x28]
        ; gcrRegs +[rcx]
        call     <unknown method>
@@ -459,12 +484,12 @@ G_M20076_IG30:        ; bbWeight=0, gcrefRegs=0008 {rbx}, byrefRegs=0000 {}, byr
        ; gcr arg pop 0
        xor      edx, edx
        mov      dword ptr [rbp-0x48], edx
-       mov      rdx, gword ptr [rbp-0x60]
+       mov      rdx, gword ptr [rbp-0x68]
        ; gcrRegs +[rdx]
        mov      rax, gword ptr [rdx+0x18]
        ; gcrRegs +[rax]
        test     rax, rax
-       je       SHORT G_M20076_IG31
+       je       SHORT G_M20076_IG33
        mov      rdx, rbx
        mov      rcx, gword ptr [rax+0x08]
        ; gcrRegs +[rcx]
@@ -472,39 +497,47 @@ G_M20076_IG30:        ; bbWeight=0, gcrefRegs=0008 {rbx}, byrefRegs=0000 {}, byr
        ; gcrRegs -[rax rcx rdx rbx]
        ; gcr arg pop 0
 						;; size=47 bbWeight=0 PerfScore 0.00
-G_M20076_IG31:        ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-       vxorps   xmm0, xmm0, xmm0
-       vmovdqu  xmmword ptr [rbp-0x58], xmm0
-       jmp      G_M20076_IG50
-						;; size=14 bbWeight=0 PerfScore 0.00
-G_M20076_IG32:        ; bbWeight=0, gcrefRegs=000C {rdx rbx}, byrefRegs=0000 {}, byref, isz
+G_M20076_IG33:        ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
+       xor      rcx, rcx
+       ; gcrRegs +[rcx]
+       mov      gword ptr [rbp-0x70], rcx
+						;; size=6 bbWeight=0 PerfScore 0.00
+G_M20076_IG34:        ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
+       ; gcrRegs -[rcx]
+       mov      dword ptr [rbp-0x5C], ecx
+						;; size=3 bbWeight=0 PerfScore 0.00
+G_M20076_IG35:        ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
+       mov      dword ptr [rbp-0x60], ecx
+       jmp      G_M20076_IG58
+						;; size=8 bbWeight=0 PerfScore 0.00
+G_M20076_IG36:        ; bbWeight=0, gcrefRegs=000C {rdx rbx}, byrefRegs=0000 {}, byref, isz
        ; gcrRegs +[rdx rbx]
        cmp      dword ptr [rdx+0x58], 1
-       je       SHORT G_M20076_IG33
+       je       SHORT G_M20076_IG37
        mov      rcx, gword ptr [rdx+0x28]
        ; gcrRegs +[rcx]
        cmp      dword ptr [rcx], ecx
        call     [System.Collections.Generic.Deque`1[System.__Canon]:DequeueHead():System.__Canon:this]
        ; gcrRegs -[rcx rdx] +[rax]
        ; gcr arg pop 0
-       mov      r14, rax
-       ; gcrRegs +[r14]
-       jmp      SHORT G_M20076_IG34
+       mov      rsi, rax
+       ; gcrRegs +[rsi]
+       jmp      SHORT G_M20076_IG38
 						;; size=23 bbWeight=0 PerfScore 0.00
-G_M20076_IG33:        ; bbWeight=0, gcrefRegs=000C {rdx rbx}, byrefRegs=0000 {}, byref
-       ; gcrRegs -[rax r14] +[rdx]
+G_M20076_IG37:        ; bbWeight=0, gcrefRegs=000C {rdx rbx}, byrefRegs=0000 {}, byref
+       ; gcrRegs -[rax rsi] +[rdx]
        mov      rcx, gword ptr [rdx+0x28]
        ; gcrRegs +[rcx]
        cmp      dword ptr [rcx], ecx
        call     [System.Collections.Generic.Deque`1[System.__Canon]:DequeueTail():System.__Canon:this]
        ; gcrRegs -[rcx rdx] +[rax]
        ; gcr arg pop 0
-       mov      r14, rax
-       ; gcrRegs +[r14]
+       mov      rsi, rax
+       ; gcrRegs +[rsi]
 						;; size=15 bbWeight=0 PerfScore 0.00
-G_M20076_IG34:        ; bbWeight=0, gcrefRegs=4008 {rbx r14}, byrefRegs=0000 {}, byref, isz
+G_M20076_IG38:        ; bbWeight=0, gcrefRegs=0048 {rbx rsi}, byrefRegs=0000 {}, byref, isz
        ; gcrRegs -[rax]
-       mov      rdx, gword ptr [rbp-0x60]
+       mov      rdx, gword ptr [rbp-0x68]
        ; gcrRegs +[rdx]
        mov      rcx, gword ptr [rdx+0x28]
        ; gcrRegs +[rcx]
@@ -513,7 +546,7 @@ G_M20076_IG34:        ; bbWeight=0, gcrefRegs=4008 {rbx r14}, byrefRegs=0000 {},
        call     [System.Collections.Generic.Deque`1[System.__Canon]:EnqueueTail(System.__Canon):this]
        ; gcrRegs -[rcx rdx rbx]
        ; gcr arg pop 0
-       mov      rdx, gword ptr [rbp-0x60]
+       mov      rdx, gword ptr [rbp-0x68]
        ; gcrRegs +[rdx]
        mov      rcx, gword ptr [rdx+0x28]
        ; gcrRegs +[rcx]
@@ -522,25 +555,33 @@ G_M20076_IG34:        ; bbWeight=0, gcrefRegs=4008 {rbx r14}, byrefRegs=0000 {},
        ; gcr arg pop 0
        xor      edx, edx
        mov      dword ptr [rbp-0x48], edx
-       mov      rdx, gword ptr [rbp-0x60]
+       mov      rdx, gword ptr [rbp-0x68]
        ; gcrRegs +[rdx]
        mov      rax, gword ptr [rdx+0x18]
        ; gcrRegs +[rax]
        test     rax, rax
-       je       SHORT G_M20076_IG35
-       mov      rdx, r14
+       je       SHORT G_M20076_IG39
+       mov      rdx, rsi
        mov      rcx, gword ptr [rax+0x08]
        ; gcrRegs +[rcx]
        call     [rax+0x18]<unknown method>
-       ; gcrRegs -[rax rcx rdx r14]
+       ; gcrRegs -[rax rcx rdx rsi]
        ; gcr arg pop 0
 						;; size=60 bbWeight=0 PerfScore 0.00
-G_M20076_IG35:        ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-       vxorps   xmm0, xmm0, xmm0
-       vmovdqu  xmmword ptr [rbp-0x58], xmm0
-       jmp      G_M20076_IG50
-						;; size=14 bbWeight=0 PerfScore 0.00
-G_M20076_IG36:        ; bbWeight=0, gcrefRegs=000C {rdx rbx}, byrefRegs=0000 {}, byref
+G_M20076_IG39:        ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
+       xor      rcx, rcx
+       ; gcrRegs +[rcx]
+       mov      gword ptr [rbp-0x70], rcx
+						;; size=6 bbWeight=0 PerfScore 0.00
+G_M20076_IG40:        ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
+       ; gcrRegs -[rcx]
+       mov      dword ptr [rbp-0x5C], ecx
+						;; size=3 bbWeight=0 PerfScore 0.00
+G_M20076_IG41:        ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
+       mov      dword ptr [rbp-0x60], ecx
+       jmp      G_M20076_IG58
+						;; size=8 bbWeight=0 PerfScore 0.00
+G_M20076_IG42:        ; bbWeight=0, gcrefRegs=000C {rdx rbx}, byrefRegs=0000 {}, byref
        ; gcrRegs +[rdx rbx]
        mov      rcx, gword ptr [rdx+0x28]
        ; gcrRegs +[rcx]
@@ -549,22 +590,31 @@ G_M20076_IG36:        ; bbWeight=0, gcrefRegs=000C {rdx rbx}, byrefRegs=0000 {},
        call     [System.Collections.Generic.Deque`1[System.__Canon]:EnqueueTail(System.__Canon):this]
        ; gcrRegs -[rcx rdx rbx]
        ; gcr arg pop 0
-       vxorps   xmm0, xmm0, xmm0
-       vmovdqu  xmmword ptr [rbp-0x58], xmm0
-       jmp      G_M20076_IG50
-						;; size=29 bbWeight=0 PerfScore 0.00
-G_M20076_IG37:        ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
+       xor      rcx, rcx
+       ; gcrRegs +[rcx]
+       mov      gword ptr [rbp-0x70], rcx
+						;; size=21 bbWeight=0 PerfScore 0.00
+G_M20076_IG43:        ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
+       ; gcrRegs -[rcx]
+       mov      dword ptr [rbp-0x5C], ecx
+						;; size=3 bbWeight=0 PerfScore 0.00
+G_M20076_IG44:        ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
+       mov      dword ptr [rbp-0x60], ecx
+       jmp      G_M20076_IG58
+						;; size=8 bbWeight=0 PerfScore 0.00
+G_M20076_IG45:        ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        mov      ecx, 9
        call     [System.ThrowHelper:ThrowArgumentNullException(int)]
        ; gcr arg pop 0
        int3     
 						;; size=12 bbWeight=0 PerfScore 0.00
-G_M20076_IG38:        ; bbWeight=0.50, gcrefRegs=A00C {rdx rbx r13 r15}, byrefRegs=0000 {}, byref, isz
-       ; gcrRegs +[rdx rbx r13 r15]
+G_M20076_IG46:        ; bbWeight=0.50, gcVars=000000000000001C {V00 V01 V06}, gcrefRegs=408C {rdx rbx rdi r14}, byrefRegs=0000 {}, gcvars, byref, isz
+       ; gcrRegs +[rdx rbx rdi r14]
+       ; GC ptr vars -{V68}
        cmp      byte  ptr [rbp-0x48], 0
-       je       SHORT G_M20076_IG40
+       je       SHORT G_M20076_IG48
 						;; size=6 bbWeight=0.50 PerfScore 1.50
-G_M20076_IG39:        ; bbWeight=0.50, gcrefRegs=A00C {rdx rbx r13 r15}, byrefRegs=0000 {}, byref
+G_M20076_IG47:        ; bbWeight=0.50, gcrefRegs=408C {rdx rbx rdi r14}, byrefRegs=0000 {}, byref
        mov      rcx, gword ptr [rdx+0x28]
        ; gcrRegs +[rcx]
        ; GC ptr vars -{V06}
@@ -572,17 +622,17 @@ G_M20076_IG39:        ; bbWeight=0.50, gcrefRegs=A00C {rdx rbx r13 r15}, byrefRe
        ; gcrRegs -[rcx rdx]
        ; gcr arg pop 0
 						;; size=9 bbWeight=0.50 PerfScore 1.49
-G_M20076_IG40:        ; bbWeight=0.50, gcrefRegs=A008 {rbx r13 r15}, byrefRegs=0000 {}, byref, isz
-       test     r15, r15
-       jne      SHORT G_M20076_IG49
+G_M20076_IG48:        ; bbWeight=0.50, gcrefRegs=4088 {rbx rdi r14}, byrefRegs=0000 {}, byref, isz
+       test     rdi, rdi
+       jne      SHORT G_M20076_IG57
 						;; size=5 bbWeight=0.50 PerfScore 0.62
-G_M20076_IG41:        ; bbWeight=0.48, gcrefRegs=2000 {r13}, byrefRegs=0000 {}, byref, isz
-       ; gcrRegs -[rbx r15]
-       test     r13, r13
-       jne      SHORT G_M20076_IG44
+G_M20076_IG49:        ; bbWeight=0.48, gcrefRegs=4000 {r14}, byrefRegs=0000 {}, byref, isz
+       ; gcrRegs -[rbx rdi]
+       test     r14, r14
+       jne      SHORT G_M20076_IG52
 						;; size=5 bbWeight=0.48 PerfScore 0.61
-G_M20076_IG42:        ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-       ; gcrRegs -[r13]
+G_M20076_IG50:        ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
+       ; gcrRegs -[r14]
        xor      eax, eax
        mov      rdx, bword ptr [rbp+0x18]
        ; byrRegs +[rdx]
@@ -591,8 +641,8 @@ G_M20076_IG42:        ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byr
        mov      rax, rdx
        ; byrRegs +[rax]
 						;; size=16 bbWeight=0.50 PerfScore 1.74
-G_M20076_IG43:        ; bbWeight=0.50, epilog, nogc, extend
-       add      rsp, 88
+G_M20076_IG51:        ; bbWeight=0.50, epilog, nogc, extend
+       add      rsp, 104
        pop      rbx
        pop      rsi
        pop      rdi
@@ -603,16 +653,16 @@ G_M20076_IG43:        ; bbWeight=0.50, epilog, nogc, extend
        pop      rbp
        ret      
 						;; size=17 bbWeight=0.50 PerfScore 2.62
-G_M20076_IG44:        ; bbWeight=0.24, gcVars=0000000000000018 {V00 V01}, gcrefRegs=2000 {r13}, byrefRegs=0000 {}, gcvars, byref
-       ; gcrRegs +[r13]
+G_M20076_IG52:        ; bbWeight=0.24, gcVars=0000000000000018 {V00 V01}, gcrefRegs=4000 {r14}, byrefRegs=0000 {}, gcvars, byref
+       ; gcrRegs +[r14]
        ; byrRegs -[rax rdx]
-       mov      rdi, gword ptr [r13+0x30]
-       ; gcrRegs +[rdi]
-       mov      r12, rdi
+       mov      r15, gword ptr [r14+0x30]
+       ; gcrRegs +[r15]
+       mov      r12, r15
        ; gcrRegs +[r12]
 						;; size=7 bbWeight=0.24 PerfScore 0.54
-G_M20076_IG45:        ; bbWeight=0.24, gcrefRegs=1080 {rdi r12}, byrefRegs=0000 {}, byref, isz
-       ; gcrRegs -[r13]
+G_M20076_IG53:        ; bbWeight=0.24, gcrefRegs=9000 {r12 r15}, byrefRegs=0000 {}, byref
+       ; gcrRegs -[r14]
        mov      r13, gword ptr [r12+0x30]
        ; gcrRegs +[r13]
        xor      rcx, rcx
@@ -620,16 +670,16 @@ G_M20076_IG45:        ; bbWeight=0.24, gcrefRegs=1080 {rdi r12}, byrefRegs=0000
        mov      gword ptr [r12+0x30], rcx
        mov      rcx, gword ptr [r12+0x58]
        test     rcx, rcx
-       jne      SHORT G_M20076_IG55
-						;; size=22 bbWeight=0.24 PerfScore 1.56
-G_M20076_IG46:        ; bbWeight=0.24, gcrefRegs=3080 {rdi r12 r13}, byrefRegs=0000 {}, byref, isz
+       jne      G_M20076_IG63
+						;; size=26 bbWeight=0.24 PerfScore 1.56
+G_M20076_IG54:        ; bbWeight=0.24, gcrefRegs=B000 {r12 r13 r15}, byrefRegs=0000 {}, byref
        ; gcrRegs -[rcx]
        mov      rcx, gword ptr [r12+0x58]
        ; gcrRegs +[rcx]
        test     rcx, rcx
-       jne      SHORT G_M20076_IG56
-						;; size=10 bbWeight=0.24 PerfScore 0.78
-G_M20076_IG47:        ; bbWeight=0.24, gcrefRegs=3080 {rdi r12 r13}, byrefRegs=0000 {}, byref
+       jne      G_M20076_IG64
+						;; size=14 bbWeight=0.24 PerfScore 0.78
+G_M20076_IG55:        ; bbWeight=0.24, gcrefRegs=B000 {r12 r13 r15}, byrefRegs=0000 {}, byref
        ; gcrRegs -[rcx]
        mov      byte  ptr [r12+0x40], 1
        mov      rcx, r12
@@ -638,45 +688,53 @@ G_M20076_IG47:        ; bbWeight=0.24, gcrefRegs=3080 {rdi r12 r13}, byrefRegs=0
        ; gcrRegs -[rcx r12]
        ; gcr arg pop 0
 						;; size=15 bbWeight=0.24 PerfScore 1.02
-G_M20076_IG48:        ; bbWeight=0.24, gcrefRegs=2080 {rdi r13}, byrefRegs=0000 {}, byref, isz
+G_M20076_IG56:        ; bbWeight=0.24, gcrefRegs=A000 {r13 r15}, byrefRegs=0000 {}, byref, isz
        mov      r12, r13
        ; gcrRegs +[r12]
-       cmp      r12, rdi
-       jne      SHORT G_M20076_IG45
-       jmp      SHORT G_M20076_IG42
+       cmp      r12, r15
+       jne      SHORT G_M20076_IG53
+       jmp      SHORT G_M20076_IG50
 						;; size=10 bbWeight=0.24 PerfScore 0.84
-G_M20076_IG49:        ; bbWeight=0.01, gcrefRegs=8008 {rbx r15}, byrefRegs=0000 {}, byref, isz
-       ; gcrRegs -[rdi r12-r13] +[rbx r15]
-       mov      rcx, r15
+G_M20076_IG57:        ; bbWeight=0.01, gcrefRegs=0088 {rbx rdi}, byrefRegs=0000 {}, byref, isz
+       ; gcrRegs -[r12-r13 r15] +[rbx rdi]
+       mov      rcx, rdi
        ; gcrRegs +[rcx]
        mov      rdx, rbx
        ; gcrRegs +[rdx]
        call     [System.Threading.Channels.AsyncOperation`1[System.__Canon]:TrySetResult(System.__Canon):ubyte:this]
-       ; gcrRegs -[rcx rdx rbx r15]
+       ; gcrRegs -[rcx rdx rbx rdi]
        ; gcr arg pop 0
-       jmp      SHORT G_M20076_IG42
+       jmp      SHORT G_M20076_IG50
 						;; size=14 bbWeight=0.01 PerfScore 0.08
-G_M20076_IG50:        ; bbWeight=0.50, gcVars=000000000000001C {V00 V01 V06}, gcrefRegs=0000 {}, byrefRegs=0000 {}, gcvars, byref
-       ; GC ptr vars +{V02 V06}
+G_M20076_IG58:        ; bbWeight=0.50, gcVars=000000000004001C {V00 V01 V06 V68}, gcrefRegs=0000 {}, byrefRegs=0000 {}, gcvars, byref
+       ; GC ptr vars +{V02 V06 V68}
        mov      rcx, rsp
 						;; size=3 bbWeight=0.50 PerfScore 0.13
-G_M20076_IG51:        ; bbWeight=0.50, nogc, extend
-       call     G_M20076_IG60
+G_M20076_IG59:        ; bbWeight=0.50, nogc, extend
+       call     G_M20076_IG68
        nop      
 						;; size=6 bbWeight=0.50 PerfScore 0.63
-G_M20076_IG52:        ; bbWeight=0.50, gcVars=0000000000000018 {V00 V01}, gcrefRegs=0000 {}, byrefRegs=0000 {}, gcvars, byref
+G_M20076_IG60:        ; bbWeight=0.50, gcVars=0000000000040018 {V00 V01 V68}, gcrefRegs=0000 {}, byrefRegs=0000 {}, gcvars, byref
        ; GC ptr vars -{V02 V06}
-       mov      rdi, bword ptr [rbp+0x18]
-       ; byrRegs +[rdi]
-       lea      rsi, bword ptr [rbp-0x58]
-       ; byrRegs +[rsi]
-       call     CORINFO_HELP_ASSIGN_BYREF
-       movsq    
-       mov      rax, bword ptr [rbp+0x18]
+       mov      rdx, gword ptr [rbp-0x70]
+       ; gcrRegs +[rdx]
+       mov      rcx, bword ptr [rbp+0x18]
+       ; byrRegs +[rcx]
+       ; GC ptr vars -{V68}
+       call     CORINFO_HELP_CHECKED_ASSIGN_REF
+       ; gcrRegs -[rdx]
+       ; byrRegs -[rcx]
+       movsx    rax, word  ptr [rbp-0x5C]
+       mov      rcx, bword ptr [rbp+0x18]
+       ; byrRegs +[rcx]
+       mov      word  ptr [rcx+0x08], ax
+       movzx    rax, byte  ptr [rbp-0x60]
+       mov      byte  ptr [rcx+0x0A], al
+       mov      rax, rcx
        ; byrRegs +[rax]
-						;; size=19 bbWeight=0.50 PerfScore 2.26
-G_M20076_IG53:        ; bbWeight=0.50, epilog, nogc, extend
-       add      rsp, 88
+						;; size=36 bbWeight=0.50 PerfScore 5.14
+G_M20076_IG61:        ; bbWeight=0.50, epilog, nogc, extend
+       add      rsp, 104
        pop      rbx
        pop      rsi
        pop      rdi
@@ -687,23 +745,25 @@ G_M20076_IG53:        ; bbWeight=0.50, epilog, nogc, extend
        pop      rbp
        ret      
 						;; size=17 bbWeight=0.50 PerfScore 2.63
-G_M20076_IG54:        ; bbWeight=0, gcVars=0000000000000018 {V00 V01}, gcrefRegs=4408 {rbx r10 r14}, byrefRegs=0000 {}, gcvars, byref, isz
-       ; gcrRegs +[rbx r10 r14]
-       ; byrRegs -[rax rsi rdi]
-       cmp      dword ptr [r14+0x20], 0
+G_M20076_IG62:        ; bbWeight=0, gcVars=0000000000040018 {V00 V01 V68}, gcrefRegs=004A {rcx rbx rsi}, byrefRegs=0000 {}, gcvars, byref, isz
+       ; gcrRegs +[rcx rbx rsi]
+       ; byrRegs -[rax rcx]
+       ; GC ptr vars +{V68}
+       cmp      dword ptr [rsi+0x20], 0
        je       G_M20076_IG03
-       jmp      SHORT G_M20076_IG57
-						;; size=13 bbWeight=0 PerfScore 0.00
-G_M20076_IG55:        ; bbWeight=0, gcrefRegs=3080 {rdi r12 r13}, byrefRegs=0000 {}, byref, isz
-       ; gcrRegs -[rbx r10 r14] +[rdi r12-r13]
+       jmp      SHORT G_M20076_IG65
+						;; size=12 bbWeight=0 PerfScore 0.00
+G_M20076_IG63:        ; bbWeight=0, gcVars=0000000000000018 {V00 V01}, gcrefRegs=B000 {r12 r13 r15}, byrefRegs=0000 {}, gcvars, byref
+       ; gcrRegs -[rcx rbx rsi] +[r12-r13 r15]
+       ; GC ptr vars -{V68}
        lea      rcx, bword ptr [r12+0x48]
        ; byrRegs +[rcx]
        call     [System.Threading.CancellationTokenRegistration:Dispose():this]
        ; byrRegs -[rcx]
        ; gcr arg pop 0
-       jmp      SHORT G_M20076_IG46
-						;; size=13 bbWeight=0 PerfScore 0.00
-G_M20076_IG56:        ; bbWeight=0, gcrefRegs=3080 {rdi r12 r13}, byrefRegs=0000 {}, byref, isz
+       jmp      G_M20076_IG54
+						;; size=16 bbWeight=0 PerfScore 0.00
+G_M20076_IG64:        ; bbWeight=0, gcrefRegs=B000 {r12 r13 r15}, byrefRegs=0000 {}, byref
        lea      rcx, bword ptr [r12+0x38]
        ; byrRegs +[rcx]
        mov      r8d, 1
@@ -711,21 +771,21 @@ G_M20076_IG56:        ; bbWeight=0, gcrefRegs=3080 {rdi r12 r13}, byrefRegs=0000
        lock     
        cmpxchg  dword ptr [rcx], r8d
        test     eax, eax
-       jne      SHORT G_M20076_IG48
-       jmp      G_M20076_IG47
-						;; size=27 bbWeight=0 PerfScore 0.00
-G_M20076_IG57:        ; bbWeight=0, gcrefRegs=4000 {r14}, byrefRegs=0000 {}, byref, isz
-       ; gcrRegs -[rdi r12-r13] +[r14]
+       jne      G_M20076_IG56
+       jmp      G_M20076_IG55
+						;; size=31 bbWeight=0 PerfScore 0.00
+G_M20076_IG65:        ; bbWeight=0, gcrefRegs=0040 {rsi}, byrefRegs=0000 {}, byref, isz
+       ; gcrRegs -[r12-r13 r15] +[rsi]
        ; byrRegs -[rcx]
-       mov      rcx, r14
+       mov      rcx, rsi
        ; gcrRegs +[rcx]
        call     [System.Threading.Tasks.Task:FromCanceled(System.Threading.CancellationToken):System.Threading.Tasks.Task]
-       ; gcrRegs -[rcx r14] +[rax]
+       ; gcrRegs -[rcx rsi] +[rax]
        ; gcr arg pop 0
        mov      rdx, rax
        ; gcrRegs +[rdx]
        test     rdx, rdx
-       jne      SHORT G_M20076_IG58
+       jne      SHORT G_M20076_IG66
        mov      ecx, 9
        ; GC ptr vars -{V01}
        call     [System.ThrowHelper:ThrowArgumentNullException(int)]
@@ -733,7 +793,7 @@ G_M20076_IG57:        ; bbWeight=0, gcrefRegs=4000 {r14}, byrefRegs=0000 {}, byr
        ; gcr arg pop 0
        int3     
 						;; size=29 bbWeight=0 PerfScore 0.00
-G_M20076_IG58:        ; bbWeight=0, gcVars=0000000000000018 {V00 V01}, gcrefRegs=0004 {rdx}, byrefRegs=0000 {}, gcvars, byref
+G_M20076_IG66:        ; bbWeight=0, gcVars=0000000000000018 {V00 V01}, gcrefRegs=0004 {rdx}, byrefRegs=0000 {}, gcvars, byref
        ; gcrRegs +[rdx]
        ; GC ptr vars +{V01}
        mov      rcx, bword ptr [rbp+0x18]
@@ -748,8 +808,8 @@ G_M20076_IG58:        ; bbWeight=0, gcVars=0000000000000018 {V00 V01}, gcrefRegs
        mov      rax, rdx
        ; byrRegs +[rax]
 						;; size=26 bbWeight=0 PerfScore 0.00
-G_M20076_IG59:        ; bbWeight=0, epilog, nogc, extend
-       add      rsp, 88
+G_M20076_IG67:        ; bbWeight=0, epilog, nogc, extend
+       add      rsp, 104
        pop      rbx
        pop      rsi
        pop      rdi
@@ -760,9 +820,9 @@ G_M20076_IG59:        ; bbWeight=0, epilog, nogc, extend
        pop      rbp
        ret      
 						;; size=17 bbWeight=0 PerfScore 0.00
-G_M20076_IG60:        ; bbWeight=0.50, gcVars=000000000000001C {V00 V01 V06}, gcrefRegs=0000 {}, byrefRegs=0000 {}, gcvars, byref, funclet prolog, nogc
+G_M20076_IG68:        ; bbWeight=0.50, gcVars=000000000004001C {V00 V01 V06 V68}, gcrefRegs=0000 {}, byrefRegs=0000 {}, gcvars, byref, funclet prolog, nogc
        ; byrRegs -[rax rdx]
-       ; GC ptr vars +{V02 V06}
+       ; GC ptr vars +{V02 V06 V68}
        push     rbp
        push     r15
        push     r14
@@ -774,14 +834,14 @@ G_M20076_IG60:        ; bbWeight=0.50, gcVars=000000000000001C {V00 V01 V06}, gc
        sub      rsp, 40
        mov      rbp, qword ptr [rcx+0x20]
        mov      qword ptr [rsp+0x20], rbp
-       lea      rbp, [rbp+0x90]
+       lea      rbp, [rbp+0xA0]
 						;; size=32 bbWeight=0.50 PerfScore 5.89
-G_M20076_IG61:        ; bbWeight=0.50, gcVars=000000000000001C {V00 V01 V06}, gcrefRegs=0000 {}, byrefRegs=0000 {}, gcvars, byref, isz
+G_M20076_IG69:        ; bbWeight=0.50, gcVars=000000000004001C {V00 V01 V06 V68}, gcrefRegs=0000 {}, byrefRegs=0000 {}, gcvars, byref, isz
        cmp      byte  ptr [rbp-0x48], 0
-       je       SHORT G_M20076_IG63
+       je       SHORT G_M20076_IG71
 						;; size=6 bbWeight=0.50 PerfScore 1.50
-G_M20076_IG62:        ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-       mov      rdx, gword ptr [rbp-0x60]
+G_M20076_IG70:        ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
+       mov      rdx, gword ptr [rbp-0x68]
        ; gcrRegs +[rdx]
        mov      rcx, gword ptr [rdx+0x28]
        ; gcrRegs +[rcx]
@@ -790,10 +850,10 @@ G_M20076_IG62:        ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byr
        ; gcrRegs -[rcx rdx]
        ; gcr arg pop 0
 						;; size=13 bbWeight=0.50 PerfScore 2.00
-G_M20076_IG63:        ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
+G_M20076_IG71:        ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        nop      
 						;; size=1 bbWeight=0.50 PerfScore 0.13
-G_M20076_IG64:        ; bbWeight=0.50, funclet epilog, nogc, extend
+G_M20076_IG72:        ; bbWeight=0.50, funclet epilog, nogc, extend
        add      rsp, 40
        pop      rbx
        pop      rsi
@@ -806,7 +866,7 @@ G_M20076_IG64:        ; bbWeight=0.50, funclet epilog, nogc, extend
        ret      
 						;; size=17 bbWeight=0.50 PerfScore 2.63
 
-; Total bytes of code 1356, prolog size 55, PerfScore 135.55, instruction count 378, allocated bytes for code 1356 (MethodHash=34aab193) for method System.Threading.Channels.BoundedChannel`1+BoundedChannelWriter[System.__Canon]:WriteAsync(System.__Canon,System.Threading.CancellationToken):System.Threading.Tasks.ValueTask:this (Tier1)
+; Total bytes of code 1450, prolog size 61, PerfScore 140.77, instruction count 401, allocated bytes for code 1450 (MethodHash=34aab193) for method System.Threading.Channels.BoundedChannel`1+BoundedChannelWriter[System.__Canon]:WriteAsync(System.__Canon,System.Threading.CancellationToken):System.Threading.Tasks.ValueTask:this (Tier1)

The physically promoted fields end up EH-live and thus cannot be enregistered.
Some form of stack packing that lets the fields share a stack location with the base struct local would have helped here.
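
To make the stack packing idea concrete, here is a toy C++ sketch, not JIT code. `ToyLocal`, `frameSizeSeparate` and `frameSizePacked` are hypothetical names, and the sizes in the usage note are illustrative. It ignores alignment and assumes the base struct and its promoted fields never need distinct stack homes at the same time.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of a stack-homed local in this toy model.
struct ToyLocal
{
    std::size_t size;   // size in bytes
    int         parent; // index of the base struct local, or -1 if none
};

// Frame bytes needed when every local gets its own stack slot.
std::size_t frameSizeSeparate(const std::vector<ToyLocal>& locals)
{
    std::size_t total = 0;
    for (const ToyLocal& lcl : locals)
    {
        total += lcl.size;
    }
    return total;
}

// Frame bytes needed when promoted fields reuse the slot of their
// base struct local (the "stack packing" assumed in this sketch).
std::size_t frameSizePacked(const std::vector<ToyLocal>& locals)
{
    std::size_t total = 0;
    for (const ToyLocal& lcl : locals)
    {
        if (lcl.parent < 0)
        {
            total += lcl.size; // only base locals consume frame space
        }
    }
    return total;
}
```

For a 16-byte struct with promoted fields of 8, 2 and 1 bytes, the separate layout needs 27 bytes in this model while the packed layout needs 16.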

@jakobbotsch
Member Author

cc @dotnet/jit-contrib PTAL @AndyAyersMS

Diffs. Quite substantial, more so on platforms with more retbuffers. Somewhat high TP impact, but cost seems to be mostly just from having more locals participating in opts (see detailed TP impact above).

See above for analysis of some regressions. I'll save some of that work for follow-ups -- I opened issues for them.

Failures look like #104570, #102706, #104316, #104269/#103940/#103630/#103549

}

LclVarDsc* dsc = lvaGetDesc(node->AsLclVarCommon());
if (!dsc->lvTracked)
Member

This early out seems to make the method a bit more specific than the name implies... maybe add a Note: or something?

Member Author

I renamed the function to fgIsTrackedRetBufferAddress
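
For readers outside the JIT, a self-contained toy model of what a predicate with that name could check is sketched below. It is not the actual implementation: `ToyOper`, `ToyLocalDesc`, `ToyNode` and the free function `isTrackedRetBufferAddress` are hypothetical stand-ins, and only the early out for untracked locals mirrors the snippet under review.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-ins for JIT IR concepts.
enum class ToyOper
{
    LclAddr, // address of a local
    Call,    // call node
    Other
};

struct ToyLocalDesc
{
    bool tracked; // plays the role of lvTracked in this toy model
};

struct ToyNode
{
    ToyOper        oper;
    int            lclNum;    // meaningful for LclAddr nodes only
    const ToyNode* retBufArg; // meaningful for Call nodes only
};

// Returns true only if 'node' is the address of a tracked local that
// is passed as the return buffer of 'call'.
bool isTrackedRetBufferAddress(const std::vector<ToyLocalDesc>& locals,
                               const ToyNode&                   call,
                               const ToyNode&                   node)
{
    if (node.oper != ToyOper::LclAddr)
    {
        return false;
    }
    if ((call.oper != ToyOper::Call) || (call.retBufArg != &node))
    {
        return false;
    }
    if ((node.lclNum < 0) || (static_cast<std::size_t>(node.lclNum) >= locals.size()))
    {
        return false;
    }
    // Early out for untracked locals, as in the reviewed snippet.
    return locals[static_cast<std::size_t>(node.lclNum)].tracked;
}
```

Putting "Tracked" in the name makes that last check visible at call sites, which is the concern raised in the review comment.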

@jakobbotsch jakobbotsch enabled auto-merge (squash) July 9, 2024 08:21
@jakobbotsch jakobbotsch merged commit dd4b757 into dotnet:main Jul 9, 2024
95 of 107 checks passed
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI