Introduce XLAGraphExecutor #4270

alanwaketan · 2022-12-02T20:41:48Z

Summary:
This pull request moves all graph-executor-related parts out of XLATensor into XLAGraphExecutor so that the layout matches upstream, which makes it easier to inherit from the upstream LazyTensor and LazyGraphExecutor later.

A few changes to note:

1. DeviceContextArena::RegisterTensor/UnregisterTensor are now proxied via XLAGraphExecutor.
2. torch::lazy::IsSpecialScalar replaces our own helper.
3. GetDeviceData is moved into XlaDataCacheArena and proxied via XLAGraphExecutor.
4. DeviceBarrier is moved into DeviceLockerArena and proxied via XLAGraphExecutor.
5. A few quirks are added to ease this patch but will be removed later:
   5.1. XLATensor(std::shared_ptr data) is made public so that DeviceContextArena::GetLiveTensors can access it.
   5.2. XLAGraphExecutor is made a friend class of XLATensor in order to access some of the latter's private methods/members.
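For readers unfamiliar with the pattern, here is a minimal sketch of what "proxied via XLAGraphExecutor" and the friend-class quirk could look like. It is not the code from this PR: the class names are borrowed from the description above, but the members (`unique_id_`, `arena_`), the signatures, and the simplified arena are assumptions made purely for illustration.

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <utility>
#include <vector>

class XLAGraphExecutor;

// Hypothetical stand-in for the tensor class; only what the sketch needs.
class XLATensor {
 public:
  explicit XLATensor(std::int64_t unique_id) : unique_id_(unique_id) {}

 private:
  // Mirrors quirk 5.2: the executor is a friend so it can read private state.
  friend class XLAGraphExecutor;
  std::int64_t unique_id_;
};

// Simplified arena that tracks live tensors by id (assumed layout).
class DeviceContextArena {
 public:
  void RegisterTensor(std::int64_t id, std::weak_ptr<XLATensor> tensor) {
    live_tensors_[id] = std::move(tensor);
  }

  void UnregisterTensor(std::int64_t id) { live_tensors_.erase(id); }

  // Returns only the tensors that are still alive.
  std::vector<std::shared_ptr<XLATensor>> GetLiveTensors() const {
    std::vector<std::shared_ptr<XLATensor>> result;
    for (const auto& entry : live_tensors_) {
      if (auto tensor = entry.second.lock()) {
        result.push_back(std::move(tensor));
      }
    }
    return result;
  }

 private:
  std::map<std::int64_t, std::weak_ptr<XLATensor>> live_tensors_;
};

// Singleton executor; callers go through it instead of touching the arena.
class XLAGraphExecutor {
 public:
  static XLAGraphExecutor* Get() {
    static XLAGraphExecutor executor;
    return &executor;
  }

  // Proxy methods: forward to the arena, using friend access for the id.
  void RegisterTensor(const std::shared_ptr<XLATensor>& tensor) {
    arena_.RegisterTensor(tensor->unique_id_, tensor);
  }

  void UnregisterTensor(const XLATensor& tensor) {
    arena_.UnregisterTensor(tensor.unique_id_);
  }

  std::vector<std::shared_ptr<XLATensor>> GetLiveTensors() const {
    return arena_.GetLiveTensors();
  }

 private:
  XLAGraphExecutor() = default;
  DeviceContextArena arena_;
};
```

In this sketch a caller would write `XLAGraphExecutor::Get()->RegisterTensor(tensor)` rather than reaching into the arena directly, which is the shape that lets the arena move behind the executor without touching call sites again.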

Test Plan:
CI.

alanwaketan · 2022-12-05T22:16:34Z

It looks like upstream has broken our PJRT_DEVICE=CPU python test/dynamo/test_dynamo.py test. This PR passed all the tests last Friday but started failing after a rebase. I tested it locally with the same XLA commit against (a) last Friday's upstream ToT and (b) today's upstream ToT: (a) passes the test while (b) fails.

JackCaoG

Did you have a chance to run resnet50 on TPU with this change? I think this PR does not introduce any functional change, but it would be good to check that no speed regression is introduced.

alanwaketan · 2022-12-06T05:13:18Z

> Did you have a chance to run resnet50 on TPU with this change? I think this PR does not introduce any functional change, but it would be good to check that no speed regression is introduced.

I would be surprised if it did. Let me do a quick double check.

alanwaketan · 2022-12-06T08:04:17Z

Here are the results I just got on TPU v3-8 with PJRT:

Type    Mean    Median  90th %  Std Dev  CV
------  ------  ------  ------  -------  ----
Rate    646.81  648.06  648.79  5.55     0.01
Rate    646.23  647.30  648.43  5.43     0.01

The first row is without the patch and the second row is with the patch, so there is no meaningful performance difference.
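As a rough sanity check on that conclusion, the gap between the two mean rates can be compared against the reported standard deviation. The helper names below are hypothetical and this is only a back-of-the-envelope sketch, not a proper significance test:

```cpp
#include <cmath>

// Signed relative change of `candidate` with respect to `baseline`.
double RelativeDifference(double baseline, double candidate) {
  return (candidate - baseline) / baseline;
}

// Absolute gap between the two means, expressed in standard deviations.
double DifferenceInStdDevs(double baseline, double candidate, double std_dev) {
  return std::fabs(candidate - baseline) / std_dev;
}
```

With the numbers above (646.81 without the patch, 646.23 with it, standard deviation 5.55), the relative change is under 0.1% and the gap is roughly a tenth of one standard deviation, which is well inside run-to-run noise.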

alanwaketan added the tracing Lazy Tensor tracing label Dec 2, 2022

alanwaketan requested review from JackCaoG and wonjoo-wj December 2, 2022 20:41

alanwaketan self-assigned this Dec 2, 2022

alanwaketan added 13 commits December 5, 2022 18:29

Copy XLAGraphExecutor out

e81d384

Make it compiled

90f523d

Remove wrongly moved methods

b412540

Remove static

89474ee

Introduce XLAGraphExecutor singleton

67fa885

Switch to use XLAGraphExecutor

37195ec

Switch to use XLAGraphExecutor

dd9eeec

Switch to use XLAGraphExecutor

8cc4ee6

Add some missing interfaces from LazyGraphExecutor

fd93f4a

Switch to XLAGraphExecutor and make it work

b4cf3f7

Remove graph executor parts from XLATensor

f500d1f

Fix linters

56d62ee

Fix cpp tests

59cf84d

alanwaketan force-pushed the alanwaketan/graph_exe branch from 80d0cbd to 59cf84d Compare December 5, 2022 18:29

JackCaoG approved these changes Dec 6, 2022

alanwaketan merged commit c20bcdd into master Dec 6, 2022
