
Conversation

@JackCaoG (Collaborator) commented Jul 10, 2023

The commits I skipped are PJRT/OpenXLA related:

  1. Delete unused .so file and .lds file (#5275)
  2. Enable PjRt Client Compilation with StableHLO (#5233)
  3. Suppress debug symbols in OpenXLA code (#5269)

The last PR I cherry-picked is "avoid copy proto in PrepareOutputShardingPropagation" (at 07/07).

vanbasten23 and others added 22 commits July 10, 2023 22:34
…mic dimensions. (#5239) * Skip calling as_strided in empty_strided_symint. * only return empty_symint conditionally. * add a comment
* Add XRT nightly builds * remove space
* Add ToString method for both PjrtData and PjrtShardedData * on cpu the same config will become replicated, don't check actual op sharding type
* disable bazel remote cache if gcloud key is empty * remove remote cache from setup.py * experiment with debug msg * fix flag * add more logs * skip remote cache if credential file is empty * add comment * add logs * add check in test and coverage script * fix condition in coverage test * advance branch pr * allow remote cache if gcloud file isn't specified explicitly * remove dummy comment
* Clean bazel stuff on distutils clean * Fix python formatting
However, the generated StableHLO graph still hardcodes the non-tensor value. This is not correct; will fix later.
Bazel should figure out whether _XLAC.so is current or not, and trigger a rebuild if any cpp files changed.
* Remove or improve several hardcoded TPU test conditions * Fix test condition
* Err if calling sizes() on dynamic tensor * try to set has_symbolic_sizes_strides_ * resolve merge conflict * enable CONTINUE_ON_ERROR * fixed the python test test_SizeEq_should_not_compile_for_identical_symints * fix test_index_types * set CONTINUE_ON_ERROR to true * remove some unwanted code. * add a print * directly set has_symbolic_sizes_strides_ = true * make some fixes. * fix empty_strided_symint * ran linter * change error type in the test. * fix comments * ran linter
…5281) * Fix the error where mark_step does not materialize tensors on SPMD:0 * typo * fix test_non_tensor_scalar
* Set torch._dynamo.config.automatic_dynamic_shapes to False * Enable DynamoInferenceBasicTest.test_simple_model_with_different_input_shape
Summary: This pull request does the following: 1. It hides the token for all_gather. 2. It folds the out-of-place all_gather into the regular all_gather. 3. It fixes an issue with the last all_reduce_in_place PR, where it forgot to set the token. Test Plan: PJRT_DEVICE=TPU python test/test_mp_all_gather.py
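
For context on the all_gather commit above, here is a rough usage sketch under PJRT; the script name, tensor values, and print are illustrative assumptions, not code from this PR.

```python
# Illustrative sketch only (not from this PR): exercising xm.all_gather under PJRT.
# Run with: PJRT_DEVICE=TPU python all_gather_sketch.py
import torch
import torch_xla.core.xla_model as xm
import torch_xla.distributed.xla_multiprocessing as xmp


def _mp_fn(index):
  device = xm.xla_device()
  # Each process contributes its ordinal; all_gather concatenates along dim 0.
  value = torch.tensor([float(index)], device=device)
  gathered = xm.all_gather(value, dim=0)  # token handling happens internally
  xm.mark_step()
  print(gathered.cpu())


if __name__ == '__main__':
  xmp.spawn(_mp_fn, args=())
```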
@will-cromar (Collaborator)

Are you sure we should cherry-pick "Suppress debug symbols in OpenXLA code" (#5269)?

@JackCaoG (Collaborator, Author)

lol good catch.. let me revert that

@mateuszlewko (Collaborator)

LGTM for the changes in /infra/...

@JackCaoG (Collaborator, Author)

The error is:

ERROR: /tmp/pytorch/xla/torch_xla/csrc/runtime/BUILD:534:10: Linking torch_xla/csrc/runtime/libxla_computation_client.so failed: missing input file '//torch_xla/csrc/runtime:tf_exported_symbols.lds'
ERROR: /tmp/pytorch/xla/torch_xla/csrc/runtime/BUILD:534:10: Linking torch_xla/csrc/runtime/libxla_computation_client.so failed: missing input file '//torch_xla/csrc/runtime:tf_version_script.lds'

I think they were deleted by one of the commits; I can add them back.
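
As a minimal sketch of how those .lds files are typically wired back into the shared-library rule in torch_xla/csrc/runtime/BUILD: the rule shape, deps, and linker flag below are assumptions inferred from the error message, not the repo's actual BUILD contents.

```starlark
# Hypothetical sketch only; the real torch_xla/csrc/runtime/BUILD may differ.
cc_binary(
    name = "libxla_computation_client.so",
    linkshared = True,
    linkopts = [
        # Limit which symbols the .so exports via a linker version script.
        "-Wl,--version-script=$(location :tf_version_script.lds)",
    ],
    deps = [
        # Bazel allows .lds linker scripts in cc_binary deps; listing them here
        # makes them inputs to the link action, which is what the
        # "missing input file" error is complaining about.
        ":tf_exported_symbols.lds",
        ":tf_version_script.lds",
        # ... the actual computation client deps are omitted here ...
    ],
)
```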

This reverts commit e91ad3a.
@JackCaoG (Collaborator, Author)

@will-cromar the test is green, can you take another look at this PR? After it is approved, I think we should turn on "rebase and merge", merge this PR, then maybe turn it off again.

@will-cromar (Collaborator) left a comment

LGTM. It's up to you how you want to merge it. I'm fine with squashing.

@JackCaoG (Collaborator, Author)

I enabled "allow rebase and merge" in the settings but it is still grey here. I will give it a day to see if the config just takes time to propagate.

@JackCaoG (Collaborator, Author)

Rebase and merge is still grey; I am just gonna squash.

@JackCaoG merged commit 4b1742e into xrt on Jul 12, 2023
