
Conversation

@JackCaoG (Collaborator) commented Jul 10, 2023

The commits I skipped are PJRT/OpenXLA related:

  1. Delete unused .so file and .lds file (#5275)
  2. Enable PjRt Client Compilation with StableHLO (#5233)
  3. Suppress debug symbols in OpenXLA code (#5269)

The last PR I cherry-picked is "avoid copy proto in PrepareOutputShardingPropagation" (at 07/07).

vanbasten23 and others added 22 commits July 10, 2023 22:34
…mic dimensions. (#5239) * Skip calling as_strided in empty_strided_symint. * only return empty_symint conditionally. * add a comment
* Add XRT nightly builds * remove space
* Add ToString method for both PjrtData and PjrtShardedData * on cpu the same config will become replicated, don't check actual op sharding type
* disable bazel remote cache if gcloud key is empty * remove remote cache from setup.py * experiment with debug msg * fix flag * add more logs * skip remote cache if credential file is empty * add comment * add logs * add check in test and coverage script * fix condition in coverage test * advance branch pr * allow remote cache if gcloud file isn't specified explicitly * remove dummy comment
* Clean bazel stuff on distutils clean * Fix python formatting
However, the generated StableHLO graph still hardcodes the non-tensor value. This is not correct; will fix later.
Bazel should figure out whether _XLAC.so is current or not, and trigger a rebuild if any cpp files changed.
* Remove or improve several hardcoded TPU test conditions * Fix test condition
* Err if calling sizes() on dynamic tensor * try to set has_symbolic_sizes_strides_ * resolve merge conflict * enable CONTINUE_ON_ERROR * fixed the python test test_SizeEq_should_not_compile_for_identical_symints * fix test_index_types * set CONTINUE_ON_ERROR to true * remove some unwanted code. * add a print * directly set has_symbolic_sizes_strides_ = true * make some fixes. * fix empty_strided_symint * ran linter * change error type in the test. * fix comments * ran linter
…5281) * Fix the error where mark_step does not materialize tensors on SPMD:0 * typo * fix test_non_tensor_scalar
* Set torch._dynamo.config.automatic_dynamic_shapes to False * Enable DynamoInferenceBasicTest.test_simple_model_with_different_input_shape
Summary: This pull request does the following: 1. It hides the token for all_gather. 2. It folds the out-of-place all_gather into the regular all_gather. 3. It fixes an issue with the last all_reduce_in_place PR, where it forgot to set the token. Test Plan: PJRT_DEVICE=TPU python test/test_mp_all_gather.py
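
For context on the all_gather commit above, here is a rough usage sketch under PJRT; the script name, tensor values, and print are illustrative assumptions, not code from this PR.

```python
# Illustrative sketch only (not from this PR): exercising xm.all_gather under PJRT.
# Run with: PJRT_DEVICE=TPU python all_gather_sketch.py
import torch
import torch_xla.core.xla_model as xm
import torch_xla.distributed.xla_multiprocessing as xmp


def _mp_fn(index):
  device = xm.xla_device()
  # Each process contributes its ordinal; all_gather concatenates along dim 0.
  value = torch.tensor([float(index)], device=device)
  gathered = xm.all_gather(value, dim=0)  # token handling happens internally
  xm.mark_step()
  print(gathered.cpu())


if __name__ == '__main__':
  xmp.spawn(_mp_fn, args=())
```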
@will-cromar (Collaborator)

Are you sure we should cherry-pick "Suppress debug symbols in OpenXLA code" (#5269)?

@JackCaoG (Collaborator, Author)

lol good catch.. let me revert that

@mateuszlewko (Collaborator)

LGTM for the changes in /infra/...

@JackCaoG (Collaborator, Author)

The error is:

ERROR: /tmp/pytorch/xla/torch_xla/csrc/runtime/BUILD:534:10: Linking torch_xla/csrc/runtime/libxla_computation_client.so failed: missing input file '//torch_xla/csrc/runtime:tf_exported_symbols.lds'
ERROR: /tmp/pytorch/xla/torch_xla/csrc/runtime/BUILD:534:10: Linking torch_xla/csrc/runtime/libxla_computation_client.so failed: missing input file '//torch_xla/csrc/runtime:tf_version_script.lds'

I think they were deleted by one of the commits; I can add them back.
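
As a minimal sketch of how those .lds files are typically wired back into the shared-library rule in torch_xla/csrc/runtime/BUILD: the rule shape, deps, and linker flag below are assumptions inferred from the error message, not the repo's actual BUILD contents.

```starlark
# Hypothetical sketch only; the real torch_xla/csrc/runtime/BUILD may differ.
cc_binary(
    name = "libxla_computation_client.so",
    linkshared = True,
    linkopts = [
        # Limit which symbols the .so exports via a linker version script.
        "-Wl,--version-script=$(location :tf_version_script.lds)",
    ],
    deps = [
        # Bazel allows .lds linker scripts in cc_binary deps; listing them here
        # makes them inputs to the link action, which is what the
        # "missing input file" error is complaining about.
        ":tf_exported_symbols.lds",
        ":tf_version_script.lds",
        # ... the actual computation client deps are omitted here ...
    ],
)
```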

This reverts commit e91ad3a.
@JackCaoG (Collaborator, Author)

@will-cromar the test is green, can you take another look at this PR? After it is approved, I think we should turn on "rebase and merge", merge this PR, then maybe turn it off again.

@will-cromar (Collaborator) left a comment

LGTM. It's up to you how you want to merge it. I'm fine with squashing.

@JackCaoG (Collaborator, Author)

I enabled "allow rebase and merge" in the settings but it is still grey here. I will give it a day to see if the config just takes time to propagate.

@JackCaoG (Collaborator, Author)

Rebase and merge is still grey; I am just gonna squash.

@JackCaoG merged commit 4b1742e into xrt on Jul 12, 2023
