
Conversation

@yitongh (Contributor) commented Aug 9, 2024

Fix the issue where the input/output alias does not work when using custom in-place operations such as `xm.optimization_barrier_`. The issue arises because custom in-place operations do not update the alias ID via `_propagate_xla_data`, so the outdated alias ID is still used. This situation is very common in FSDP and can result in parameters not being aliased.

Consider the case without this PR:

```python
t1 = torch.randn(4, 5).to(xla_device)  # Tensor ID 2, alias ID 2
t1 *= t1                               # Tensor ID 3, alias ID 2
xm.mark_step()
xm.optimization_barrier_([t1])         # Tensor ID 3, alias ID 2
t1 *= 100                              # Tensor ID 4, alias ID 2
```

In the second graph, `t1` cannot be aliased because the tensor ID of `t1` is 3, not 2.

With this PR:

```python
t1 = torch.randn(4, 5).to(xla_device)  # Tensor ID 2, alias ID 2
t1 *= t1                               # Tensor ID 3, alias ID 2
xm.mark_step()
xm.optimization_barrier_([t1])         # Tensor ID 3, alias ID 3
t1 *= 100                              # Tensor ID 4, alias ID 3
```

In the second graph, `t1` can be aliased because its output alias ID and input tensor ID are both 3.
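The ID bookkeeping above can be sketched in plain Python. This is a minimal model with hypothetical names (`FakeXLATensor`, `fresh_id`, `can_alias`), not the actual torch_xla implementation; it only illustrates why a custom in-place op must refresh the alias ID:

```python
# Minimal model: every new IR value gets a fresh tensor_id, while alias_id
# records which input buffer the value is allowed to reuse.
_next_id = 0

def fresh_id():
    global _next_id
    _next_id += 1
    return _next_id

class FakeXLATensor:
    def __init__(self):
        self.tensor_id = fresh_id()
        self.alias_id = self.tensor_id  # a fresh tensor aliases itself

    def inplace_mul(self, _other):
        # A built-in in-place op creates a new IR node (new tensor_id)
        # but keeps pointing at the original input buffer (alias_id).
        self.tensor_id = fresh_id()

def optimization_barrier_(tensors, propagate_alias):
    # After mark_step(), the tensor's current buffer holds the value named
    # by tensor_id. The fix: the custom in-place op updates alias_id to the
    # current tensor_id, which is what propagating the XLA data achieves.
    for t in tensors:
        if propagate_alias:
            t.alias_id = t.tensor_id

def can_alias(input_tensor_id, output_alias_id):
    # The executor donates the input buffer only when the IDs match.
    return input_tensor_id == output_alias_id

# Without the fix: alias_id stays stale and aliasing fails.
t1 = FakeXLATensor()                                # tensor_id 1, alias_id 1
t1.inplace_mul(t1)                                  # tensor_id 2, alias_id 1
input_id = t1.tensor_id                             # buffer materialized here
optimization_barrier_([t1], propagate_alias=False)  # alias_id still 1
t1.inplace_mul(100)
print(can_alias(input_id, t1.alias_id))             # False

# With the fix: alias_id is refreshed, so the second graph can alias.
t2 = FakeXLATensor()
t2.inplace_mul(t2)
input_id = t2.tensor_id
optimization_barrier_([t2], propagate_alias=True)   # alias_id == tensor_id
t2.inplace_mul(100)
print(can_alias(input_id, t2.alias_id))             # True
```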

```cpp
std::vector<size_t> XLAGraphExecutor::SetBufferDonors(
    const std::vector<XLATensorPtr>& tensors, absl::Span<const size_t> indices,
    LoweringContext* lowering_ctx) {
  std::unordered_map<int64_t, size_t> output_tensor_id_map;
  std::vector<size_t> buffer_donor_indexs;
  // tensors[indices] represent all tensors that need to be updated after
  // the execution. We can only alias the current buffer of these tensors
  // since those buffers are no longer needed after execution.
  for (size_t i = 0; i < indices.size(); ++i) {
    size_t tensor_index = indices[i];
    int64_t tensor_id = tensors[tensor_index]->data()->alias_id;
    output_tensor_id_map[tensor_id] = i;
  }
  const auto& parameters_data = lowering_ctx->GetParametersData();
  std::vector<ssize_t> alias_map(indices.size(), -1);
  for (size_t i = 0; i < parameters_data.size(); ++i) {
    auto* data_info =
        static_cast<torch::lazy::LazyGraphExecutor::DeviceDataInfo*>(
            parameters_data[i]->info());
    if (data_info != nullptr && !data_info->read_only) {
      auto it = output_tensor_id_map.find(data_info->tensor_id);
      // Parameter buffer's TensorId in output_tensor_id_map means
      // this buffer is not needed after execution since XLATensor will get a
      // new buffer.
      if (it != output_tensor_id_map.end()) {
        // ...
```
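The donor-selection loop can be modeled in a few lines of Python. The helper name `set_buffer_donors` and the list-based inputs are hypothetical, mirroring the C++ shown above rather than any real API:

```python
def set_buffer_donors(output_alias_ids, parameter_tensor_ids):
    # Map each output tensor's alias_id to its position in the output list,
    # mirroring output_tensor_id_map in the C++ snippet.
    output_tensor_id_map = {aid: i for i, aid in enumerate(output_alias_ids)}
    # A parameter whose tensor_id appears in the map will receive a new
    # buffer after execution, so its current buffer can be donated.
    return [i for i, tid in enumerate(parameter_tensor_ids)
            if tid in output_tensor_id_map]

# With the fix, the output t1 carries alias ID 3; a parameter with
# tensor ID 3 is matched and its buffer can be donated.
print(set_buffer_donors([3], [3, 7]))  # [0]

# Before the fix the output still carried the stale alias ID 2, so the
# parameter with tensor ID 3 was never matched and nothing was donated.
print(set_buffer_donors([2], [3, 7]))  # []
```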

@JackCaoG JackCaoG self-requested a review August 12, 2024 06:02
@JackCaoG (Collaborator) left a comment:

Thanks!

@JackCaoG JackCaoG merged commit 8f79488 into pytorch:master Aug 12, 2024
@yitongh yitongh deleted the fix_custom_alias branch August 13, 2024 01:55
yitongh added a commit to AlibabaPAI/xla that referenced this pull request Aug 13, 2024