
Conversation

@JackCaoG
Collaborator

After an LTC change, we delay the lock-grabbing logic, which improved performance, but it also makes the HLO dumping triggered via XLA_SAVE_TENSORS_FMT='hlo' and XLA_SAVE_TENSORS_FILE happen before the last execution has finished. That is fine for IR, but for HLO it might trigger an access to placeholder IR.

Verified that with this change

XLA_SAVE_TENSORS_FMT='hlo' XLA_SAVE_TENSORS_FILE='tmp/save.hlo' python test/dynamo/test_dynamo.py DynamoTrainingBasicTest.test_resnet18 

passed

@JackCaoG
Collaborator Author

It is a regression so we should include this in the 2.0 release.

if (format == DebugUtil::GraphFormat::kHlo && indices->size() > 0) {
  // Dumping the HLO might access the placeholder data created during the
  // previous execution. We need to wait for the last execution to finish
  // before proceeding.
Collaborator

I wonder what consequence we get if we don't wait; for example, what error would you get when you run the Python script?

Collaborator Author

PJRT will throw a hasValue error when it accesses the placeholder.
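
For reference, a minimal sketch of what the guarded block could look like with the barrier in place. This is an illustration rather than the exact diff, and it assumes the graph executor exposes a WaitDeviceOps-style call where an empty device list means "wait on all devices with pending work":

if (format == DebugUtil::GraphFormat::kHlo && indices->size() > 0) {
  // Dumping the HLO might access placeholder data created during the
  // previous execution, so block here until that execution has finished.
  // Assumption: an empty device list waits on every device that still has
  // pending executions.
  XLAGraphExecutor::Get()->WaitDeviceOps({});
}

The wait is limited to the kHlo path, so plain IR dumps keep their current, non-blocking behavior.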

@JackCaoG JackCaoG merged commit f70afa4 into master Feb 14, 2023
JackCaoG added a commit that referenced this pull request Feb 16, 2023
* Fix HLO dumping (#4619)
* Update TF pin to 2/13 (#4615)
* Update TF pin to 2/13
* Fix pinned commit
* Add patch to revert TF 3e24055
* Add comment to new patch
* Fix patch command in TPU CI (#4623)
* Skip execution for extract_compiled_graph (#4612)
* Only warm up cache for dynamo extract_graph step
* Add missing config
* Make sure warm up run does not cause place holder to be created
* Fix tests
* Disable failing `test_operations.py` tests on TPU (#4622)
* Disable `test_operations.py` tests failing on TPU
* Add to TPU CI
* Bazel (#4528)
* Replace tensorflow with a bazel external repository
* Basic migration to bazel for xla_client.
* Revert to blob
* Add vscode config.
* Update newlines
* Merge with pjrt client test build changes.
* Migrate tests to new build
* Format test and plugin
* Order imports
* Conditionally apply tf patches; apply pt patches always.
* Format python
* configure formatters
* Mirror TF pin update an fixes in bazel.
* Support local and sandboxed build based on flags
* Add cloud cache URLs for llvm.
* Merge with upstream
* Update TF pin
* Fix patching regression
* Revert "Bazel (#4528)" (#4631). This reverts commit 3a90f5a.

---------

Co-authored-by: JackCaoG <59073027+JackCaoG@users.noreply.github.com>
Co-authored-by: Will Cromar <wcromar@google.com>
Co-authored-by: stgpetrovic <stgpetrovic@gmail.com>
JackCaoG added a commit that referenced this pull request Feb 16, 2023
chandrasekhard2 pushed a commit that referenced this pull request Feb 22, 2023
chandrasekhard2 pushed a commit that referenced this pull request Feb 22, 2023
mateuszlewko pushed a commit that referenced this pull request Mar 15, 2023