
Conversation

@JackCaoG
Collaborator

After an LTC change, we delay the lock-grabbing logic, which improved performance, but it also makes the HLO dumping triggered via XLA_SAVE_TENSORS_FMT='hlo' and XLA_SAVE_TENSORS_FILE happen before the last execution has finished. That is fine for IR, but for HLO it might trigger an access to placeholder IR.

Verified that with this change

XLA_SAVE_TENSORS_FMT='hlo' XLA_SAVE_TENSORS_FILE='tmp/save.hlo' python test/dynamo/test_dynamo.py DynamoTrainingBasicTest.test_resnet18 

passed

@JackCaoG
Collaborator Author

It is a regression so we should include this in the 2.0 release.

if (format == DebugUtil::GraphFormat::kHlo && indices->size() > 0) {
  // Dumping the HLO might access the placeholder data created during the
  // previous execution. We need to wait for the last execution to finish
  // before proceeding.
Collaborator

I wonder what consequence we get if we don't wait; for example, what error would you get when you run the Python script?

Collaborator Author

PJRT will throw a hasValue error when it accesses the placeholder.
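
For reference, a minimal sketch of what the guarded block could look like with the barrier in place. This is an illustration rather than the exact diff, and it assumes the graph executor exposes a WaitDeviceOps-style call where an empty device list means "wait on all devices with pending work":

if (format == DebugUtil::GraphFormat::kHlo && indices->size() > 0) {
  // Dumping the HLO might access placeholder data created during the
  // previous execution, so block here until that execution has finished.
  // Assumption: an empty device list waits on every device that still has
  // pending executions.
  XLAGraphExecutor::Get()->WaitDeviceOps({});
}

The wait is limited to the kHlo path, so plain IR dumps keep their current, non-blocking behavior.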

@JackCaoG JackCaoG merged commit f70afa4 into master Feb 14, 2023
JackCaoG added a commit that referenced this pull request Feb 16, 2023
* Fix HLO dumping (#4619)
* Update TF pin to 2/13 (#4615)
* Update TF pin to 2/13
* Fix pinned commit
* Add patch to revert TF 3e24055
* Add comment to new patch
* Fix patch command in TPU CI (#4623)
* Skip execution for extract_compiled_graph (#4612)
* Only warm up cache for dynamo extract_graph step
* Add missing config
* Make sure warm up run does not cause place holder to be created
* Fix tests
* Disable failing `test_operations.py` tests on TPU (#4622)
* Disable `test_operations.py` tests failing on TPU
* Add to TPU CI
* Bazel (#4528)
* Replace tensorflow with a bazel external repository
* Basic migration to bazel for xla_client.
* Revert to blob
* Add vscode config.
* Update newlines
* Merge with pjrt client test build changes.
* Migrate tests to new build
* Format test and plugin
* Order imports
* Conditionally apply tf patches; apply pt patches always.
* Format python
* configure formatters
* Mirror TF pin update an fixes in bazel.
* Support local and sandboxed build based on flags
* Add cloud cache URLs for llvm.
* Merge with upstream
* Update TF pin
* Fix patching regression
* Revert "Bazel (#4528)" (#4631). This reverts commit 3a90f5a.

---------

Co-authored-by: JackCaoG <59073027+JackCaoG@users.noreply.github.com>
Co-authored-by: Will Cromar <wcromar@google.com>
Co-authored-by: stgpetrovic <stgpetrovic@gmail.com>
JackCaoG added a commit that referenced this pull request Feb 16, 2023
chandrasekhard2 pushed a commit that referenced this pull request Feb 22, 2023
chandrasekhard2 pushed a commit that referenced this pull request Feb 22, 2023
mateuszlewko pushed a commit that referenced this pull request Mar 15, 2023