
[Computation Hash] User Computation hash is agnostic to debug metadata #8538

@rpsilva-aws

Description


🚀 Feature

Similar to #8537:

Currently, the user computation hashes its computation operand from the protobuf bytes [1]:

hash_ = torch::lazy::MHash(name, computation_.proto().SerializeAsString());

[1] xla/torch_xla/csrc/runtime/computation_client.h

and in this case, entries that differ only in IR debug metadata will end up generating different hashes. The changed hash then forces a recompilation due to a Torch JIT cache miss.

In this feature request, we discuss and consider a design decision to make the cache agnostic to metadata that does not influence the execution of the computations.

Motivation

The motivation is to avoid recompiling when only debug metadata in the proto changes; recompilation hurts performance and can cause OOMs (some backend engines treat the resulting HLO from TorchXLA as a unique executable binary).

Pitch

We could consider introducing a new flag that changes this behavior, making it opt-in (or opt-out):
say, XLA_CACHE_IGNORE_METADATA, which removes some entries in the metadata field of the HLO proto module, e.g.:

  • op_name
  • source_line
  • source_file

before generating the hash key for caching, while retaining them otherwise. See all entries in https://github.com/openxla/xla/blob/a49d854a355374453af37a266823450db6714d9a/xla/xla_data.proto#L385. This would apply anywhere the protobuf used for caching (including the user computation) is generated, and should be extensible in case other entries fit the profile as well. At a minimum, TorchXLA's "debug" metadata entries should, by design, not interfere with the resulting computation - unlike explicit metadata such as frontend_attributes, rhs_contracting_dims, etc., e.g.:
computations {
  name: "SiluForwardImpl.290"
  instructions {
    name: "p0.291"
    opcode: "parameter"
    shape {
      element_type: BF16
      dimensions: 4
      dimensions: 4096
      dimensions: 11008
      layout {
        minor_to_major: 2
        minor_to_major: 1
        minor_to_major: 0
        tail_padding_alignment_in_elements: 1
      }
      is_dynamic_dimension: false
      is_dynamic_dimension: false
      is_dynamic_dimension: false
    }
    metadata {  # <<<<<<<<<<<<<<<<<<<<<<<<<<<<< here
      op_type: "some_op"
      op_name: "xla__some_op"
      source_file: ".../lib/python3.10/site-packages/torch_xla/some_file.py"
      source_line: 1559
    }
    id: 291
    frontend_attributes { }
  }
}
