Skip to content

Conversation

@alanwaketan
Copy link
Collaborator

Summary:
This pull request moves all graph executor related parts out from XLATensor into XLAGraphExecutor such that it matches the format in upstream and therefore makes it easier to inherit the upstream LazyTensor and LazyGraphExecutor later.

A few changes to notice:

  1. DeviceContextArena::RegisterTensor/UnregisterTensor are now proxied via XLAGraphExecutor.
  2. torch::lazy::IsSpecialScalar is used to replace our own helper.
  3. GetDeviceData are moved into XlaDataCacheArena and proxied via XLAGraphExecutor.
  4. DeviceBarrier are moved into DeviceLockerArena and proxied via XLAGraphExecutor.
  5. Few quirks are added to ease this patch but will be removed later:
    5.1. XLATensor(std::shared_ptr data) are made public such that DeviceContextArena::GetLiveTensors can access it.
    5.2. XLAGraphExecutor is made to be a friend class of XLATensor in order to access some of the later's private methods/members.

Test Plan:
CI.

@alanwaketan alanwaketan added the tracing Lazy Tensor tracing label Dec 2, 2022
@alanwaketan alanwaketan self-assigned this Dec 2, 2022
@alanwaketan alanwaketan force-pushed the alanwaketan/graph_exe branch from 80d0cbd to 59cf84d Compare December 5, 2022 18:29
@alanwaketan
Copy link
Collaborator Author

It looks like upstream has broken our PJRT_DEVICE=CPU python test/dynamo/test_dynamo.py test. This PR passes all the tests last Friday but then starts failing after a rebase. I tested it locally with the same XLA commit but a) last Friday's upstream ToT, and b) today's upstream ToT, and a passes the test while b fails the test.

Copy link
Collaborator

@JackCaoG JackCaoG left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you have a chance to run resnet50 on TPU with this change? I think this pr does not introduce any functional change but would be good to check no speed regression is introduced.

@alanwaketan
Copy link
Collaborator Author

Did you have a chance to run resnet50 on TPU with this change? I think this pr does not introduce any functional change but would be good to check no speed regression is introduced.

I will be surprised if it does. Let me do a quick double check.

@alanwaketan
Copy link
Collaborator Author

Here are the results I just ran on tpu v3-8 with PJRT:

Type Mean Median 90th % Std Dev CV ------ ------ -------- -------- --------- ---- Rate 646.81 648.06 648.79 5.55 0.01 Rate 646.23 647.30 648.43 5.43 0.01 

First row is without the patch, and the second row is with the patch. So no performance difference.

@alanwaketan alanwaketan merged commit c20bcdd into master Dec 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

tracing Lazy Tensor tracing

2 participants