
Conversation


@alanwaketan commented on Jun 22, 2023

Summary:
This pull request does the following:

  1. It hides the token for all_gather.
  2. It fixes an issue with the previous all_reduce_in_place PR, which forgot to set the token.

Test Plan:
PJRT_DEVICE=TPU python test/test_mp_all_gather.py
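
For context, here is a minimal usage sketch (not taken from this PR; the spawn setup and tensor contents are illustrative) showing that the user-facing all_gather API takes no token argument; the token this change hides is threaded through the lowering internally:

```python
# Hedged sketch: user-level all_gather under PJRT; no token appears anywhere.
import torch
import torch_xla.core.xla_model as xm
import torch_xla.distributed.xla_multiprocessing as xmp


def _mp_fn(index):
    device = xm.xla_device()
    # Each replica contributes a one-element tensor holding its ordinal.
    value = torch.tensor([xm.get_ordinal()], dtype=torch.float32, device=device)
    # Gather along dim 0 across all replicas; token plumbing stays internal.
    gathered = xm.all_gather(value, dim=0)
    xm.mark_step()
    print(index, gathered.cpu())


if __name__ == '__main__':
    xmp.spawn(_mp_fn, args=())
```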

@alanwaketan requested a review from JackCaoG on June 22, 2023 02:29
@JackCaoG

Seems like you need to fix test_zero1. I guess users never explicitly pass a token to all_gather, so there shouldn't be a user-level API change?

@alanwaketan

> Seems like you need to fix test_zero1. I guess users never explicitly pass a token to all_gather, so there shouldn't be a user-level API change?

Yeah, that was a miss. Glad it was caught by the CI.

@alanwaketan

@JackCaoG The PR is ready for review.

Summary: This pull request does the following:

  1. It hides the token for all_gather.
  2. It folds the out-of-place all_gather into the regular all_gather.
  3. It fixes an issue with the previous all_reduce_in_place PR, which forgot to set the token.

Test Plan: PJRT_DEVICE=TPU python test/test_mp_all_gather.py
@alanwaketan force-pushed the alanwaketan/all_gather_token branch from 4b7bf6d to 4f1ec78 on June 29, 2023 00:45
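
One reading of item 2 above (an illustration, not code from this PR): both public call forms of all_gather route through the single xm.all_gather entry point, so one lowering path can serve both. The `output=` keyword below is assumed from the public torch_xla API:

```python
# Hedged sketch of the two call forms the commit message refers to.
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()
value = torch.tensor([xm.get_ordinal()], dtype=torch.float32, device=device)

# Value-returning form: a freshly allocated gathered tensor comes back.
gathered = xm.all_gather(value, dim=0)

# Caller-provided-output form: results land in a preallocated buffer.
out = torch.empty(xm.xrt_world_size(), dtype=torch.float32, device=device)
xm.all_gather(value, dim=0, output=out)
```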
@alanwaketan

@JackCaoG Can I get a review?

@alanwaketan

Thanks, Jack.

@alanwaketan merged commit 42d9270 into master on Jul 7, 2023
JackCaoG pushed a commit that referenced this pull request Jul 10, 2023
Summary: This pull request does the following:

  1. It hides the token for all_gather.
  2. It folds the out-of-place all_gather into the regular all_gather.
  3. It fixes an issue with the previous all_reduce_in_place PR, which forgot to set the token.

Test Plan: PJRT_DEVICE=TPU python test/test_mp_all_gather.py
JackCaoG added a commit that referenced this pull request Jul 12, 2023
* Skip calling as_strided in empty_strided_symint if the input has dynamic dimensions. (#5239)
* Skip calling as_strided in empty_strided_symint.
* only return empty_symint conditionally
* add a comment
* Add XRT nightly builds (#5261)
* Add XRT nightly builds
* remove space
* Add ToString method for both PjrtData and PjrtShardedData (#5265)
* Add ToString method for both PjrtData and PjrtShardedData
* on CPU the same config will become replicated, don't check actual op sharding type
* fix XRT ToString
* Update Sharded graph HLO dumping (#5266)
* Disable Bazel remote cache for forked PR (#5259)
* disable bazel remote cache if gcloud key is empty
* remove remote cache from setup.py
* experiment with debug msg
* fix flag
* add more logs
* skip remote cache if credential file is empty
* add comment
* add logs
* add check in test and coverage script
* fix condition in coverage test
* advance branch pr
* allow remote cache if gcloud file isn't specified explicitly
* remove dummy comment
* Suppress debug symbols in OpenXLA code (#5269)
* [SPMD] Sharding n-d tensor on (n+1)-d Mesh (#5268)
* Make TPU detection more robust (#5271)
* Clean bazel stuff on distutils clean. (#5274)
* Clean bazel stuff on distutils clean
* Fix python formatting
* fix conflict
* Fix the error when export_torch_model is given a non-tensor (#5277). However, the generated StableHLO graph still hardcodes the non-tensor value; this is not correct and will be fixed later.
* Disable test_simple_model_with_different_input_shape since it is currently broken by pytorch (#5282)
* Always do build_ext in python setup.py develop (#5273). Bazel should figure out whether _XLAC.so is current, and trigger a rebuild if any cpp files changed.
* Remove or improve several hardcoded TPU test conditions (#5272)
* Remove or improve several hardcoded TPU test conditions
* Fix test condition
* Add `runtime.host_index` (#5283)
* Make it an error if calling sizes() on a dynamic tensor. (#4998)
* Err if calling sizes() on dynamic tensor
* try to set has_symbolic_sizes_strides_
* resolve merge conflict
* enable CONTINUE_ON_ERROR
* fixed the python test test_SizeEq_should_not_compile_for_identical_symints
* fix test_index_types
* set CONTINUE_ON_ERROR to true
* remove some unwanted code
* add a print
* directly set has_symbolic_sizes_strides_ = true
* make some fixes
* fix empty_strided_symint
* ran linter
* change error type in the test
* fix comments
* ran linter
* Fix the error where mark_step does not materialize tensors on SPMD:0 (#5281)
* Fix the error where mark_step does not materialize tensors on SPMD:0
* typo
* fix test_non_tensor_scalar
* Disable torch._dynamo.config.automatic_dynamic_shapes (#5285)
* Set torch._dynamo.config.automatic_dynamic_shapes to False
* Enable DynamoInferenceBasicTest.test_simple_model_with_different_input_shape
* [Traceable Collective] Hide token for all_gather (#5232). Summary: This pull request does the following: 1. It hides the token for all_gather. 2. It folds the out-of-place all_gather into the regular all_gather. 3. It fixes an issue with the previous all_reduce_in_place PR, which forgot to set the token. Test Plan: PJRT_DEVICE=TPU python test/test_mp_all_gather.py
* Lower squeeze.dims (#5286)
* avoid copy proto in PrepareOutputShardingPropagation (#5287)
* Revert "Suppress debug symbols in OpenXLA code (#5269)". This reverts commit 3967d7b.
* Revert "fix conflict". This reverts commit e91ad3a.
---------

Co-authored-by: iefgnoix <isaacwxf23@gmail.com>
Co-authored-by: Will Cromar <wcromar@google.com>
Co-authored-by: Siyuan Liu <lsiyuan@google.com>
Co-authored-by: stgpetrovic <stgpetrovic@gmail.com>
Co-authored-by: Mohit Khatwani <118776932+khatwanimohit@users.noreply.github.com>
Co-authored-by: qihqi <hanq@google.com>
Co-authored-by: Wonjoo Lee <wonjoo@google.com>
Co-authored-by: Jiewen Tan <jwtan@google.com>
Co-authored-by: Baole Ai <baoleai01@gmail.com>