NNC walkthrough: how PyTorch ops get fused

FWIW, this is the test driver I’m using to test the different fusers:

    elif arg == '--fuser-nnc':
        torch._C._jit_override_can_fuse_on_cpu(True)
        torch._C._jit_override_can_fuse_on_gpu(True)
        torch._C._jit_set_texpr_parallel_cpu_enabled(True)
        torch._C._jit_set_te_must_use_llvm_cpu(False)
        os.environ['PYTORCH_TENSOREXPR_DONT_USE_LLVM'] = '1'
    elif arg == '--fuser-nnc-llvm':
        torch._C._jit_override_can_fuse_on_cpu(True)
        torch._C._jit_override_can_fuse_on_gpu(True)
        torch._C._jit_set_texpr_parallel_cpu_enabled(True)
    elif arg == '--nvfuser':
        #os.environ['PYTORCH_CUDA_FUSER_DISABLE_FMA'] = '1'
        torch._C._jit_override_can_fuse_on_cpu(False)
        torch._C._jit_override_can_fuse_on_gpu(False)
        torch._C._jit_set_texpr_fuser_enabled(False)
        torch._C._jit_set_nvfuser_enabled(True)
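For anyone who wants to reuse this, here is a rough sketch of the same flag handling written as a lookup table instead of an elif chain. The names `FUSER_CONFIGS` and `apply_fuser_config` are made up for illustration; the setter names and values are copied from the fragment above. The backend object is passed in (it would be `torch._C` in a real run), and since these are private hooks they may not exist in every PyTorch build.

```python
import os

# Hypothetical table: flag -> (setter calls on torch._C, environment variables).
# Setter names and values mirror the driver fragment; nothing else is assumed.
FUSER_CONFIGS = {
    "--fuser-nnc": (
        [
            ("_jit_override_can_fuse_on_cpu", True),
            ("_jit_override_can_fuse_on_gpu", True),
            ("_jit_set_texpr_parallel_cpu_enabled", True),
            ("_jit_set_te_must_use_llvm_cpu", False),
        ],
        {"PYTORCH_TENSOREXPR_DONT_USE_LLVM": "1"},
    ),
    "--fuser-nnc-llvm": (
        [
            ("_jit_override_can_fuse_on_cpu", True),
            ("_jit_override_can_fuse_on_gpu", True),
            ("_jit_set_texpr_parallel_cpu_enabled", True),
        ],
        {},
    ),
    "--nvfuser": (
        [
            ("_jit_override_can_fuse_on_cpu", False),
            ("_jit_override_can_fuse_on_gpu", False),
            ("_jit_set_texpr_fuser_enabled", False),
            ("_jit_set_nvfuser_enabled", True),
        ],
        {},
    ),
}


def apply_fuser_config(flag, backend, environ=os.environ):
    """Apply the settings for `flag`; `backend` is expected to be torch._C."""
    if flag not in FUSER_CONFIGS:
        raise ValueError(f"unknown fuser flag: {flag}")
    setters, env = FUSER_CONFIGS[flag]
    # Call each setter by name; a missing hook raises AttributeError.
    for name, value in setters:
        getattr(backend, name)(value)
    # Set environment variables after the setters, as in the fragment above.
    environ.update(env)
    return [name for name, _ in setters]
```

In the driver this would be called as `apply_fuser_config(arg, torch._C)` after `import torch`. Keeping the settings in one table makes it easier to see at a glance how the three configurations differ.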

I'm not seeing great results so far, to be honest.