
Conversation

@iwknow
Collaborator

@iwknow iwknow commented Mar 12, 2025

This change concludes #8678

@iwknow
Collaborator Author

iwknow commented Mar 13, 2025

It seems that I cannot trigger a re-run of the testing workflow. Is it a permission issue, or am I missing something? @tengyifei

@tengyifei
Collaborator

tengyifei commented Mar 13, 2025

@iwknow For security reasons, only repo writers can run workflows. I just ran it for you.

@iwknow
Collaborator Author

iwknow commented Mar 13, 2025

I am a little bit confused about what to expect here. In my test, I have:

xt1 = torch.tensor([[1, 2, 3, 4, 5, 6, 7, 8]],
                   dtype=torch.float,
                   device=xm.xla_device(),
                   requires_grad=True)
xst1 = xs.mark_sharding_with_gradients(xt1, mesh, partition_spec)
xst1.retain_grad()
output = xst1.sum()
output.retain_grad()
output.backward()

There are three tensors (xt1, xst1, output) and their corresponding gradients. I expect xst1, xst1.grad, output, output.grad, and xt1.grad to have a "sharding" section in their HLO. However, my experiment shows that xt1, xt1.grad, xst1, and output have the "sharding" section in their HLO, but xst1.grad and output.grad do not. Is this expected? Is it related to retain_grad (by default, these grads are not retained)? Or am I missing something?

@tengyifei
Collaborator

tengyifei commented Mar 14, 2025

> I am a little bit confused about what to expect here. In my test, I have:
>
> xt1 = torch.tensor([[1, 2, 3, 4, 5, 6, 7, 8]], dtype=torch.float, device=xm.xla_device(), requires_grad=True)
> xst1 = xs.mark_sharding_with_gradients(xt1, mesh, partition_spec)
> xst1.retain_grad()
> output = xst1.sum()
> output.retain_grad()
> output.backward()
>
> There are three tensors (xt1, xst1, output) and their corresponding gradients. I expect xst1, xst1.grad, output, output.grad, and xt1.grad to have a "sharding" section in their HLO. However, my experiment shows that xt1, xt1.grad, xst1, and output have the "sharding" section in their HLO, but xst1.grad and output.grad do not. Is this expected? Is it related to retain_grad (by default, these grads are not retained)? Or am I missing something?

There's a difference between "having sharding in their HLO" and "having sharding in torch_xla._XLAC._get_xla_sharding_spec(my_tensor)".

If you check the HLO of a tensor, it will contain not just the HLO corresponding to the tensor itself, but also the HLO of any input tensors and their inputs, recursively, until you hit device data tensors. It's better to check torch_xla._XLAC._get_xla_sharding_spec, which returns the sharding spec of the tensor and nothing else. I think if you search for this function's usage in the codebase, you'll find how to write tests with it.
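
A minimal sketch of that difference (illustrative only, not code from this PR; it assumes an SPMD-enabled multi-device run and a hypothetical 1 x N mesh with axis names ('x', 'y'), since the thread doesn't show the mesh or partition_spec definitions):

import numpy as np
import torch
import torch_xla
import torch_xla.core.xla_model as xm
import torch_xla.distributed.spmd as xs
import torch_xla.runtime as xr

xr.use_spmd()  # assumed: SPMD mode enabled for this process
n = xr.global_runtime_device_count()
mesh = xs.Mesh(np.arange(n), (1, n), ('x', 'y'))

xt1 = torch.tensor([[1, 2, 3, 4, 5, 6, 7, 8]],
                   dtype=torch.float,
                   device=xm.xla_device(),
                   requires_grad=True)
xst1 = xs.mark_sharding_with_gradients(xt1, mesh, ('x', 'y'))
output = xst1.sum()

# The HLO dump of `output` contains the whole upstream graph, so the custom
# sharding placed on xst1 shows up in it even though `output` itself was
# never marked.
print('sharding' in torch_xla._XLAC._get_xla_tensors_hlo([output]))

# The per-tensor spec reports only what was explicitly annotated:
print(torch_xla._XLAC._get_xla_sharding_spec(xst1))    # non-empty spec string
print(torch_xla._XLAC._get_xla_sharding_spec(output))  # expected: '' (empty)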

In our snippet above, I'd expect the following:

  • xst1 has sharding annotations if you look in _get_xla_sharding_spec
  • xst1.grad has sharding annotations if you look in _get_xla_sharding_spec
  • everything else won't have sharding annotations

That's because if we don't call torch_xla.sync(), the GSPMD sharding propagation is not run. Then only the tensors on which we explicitly called mark_sharding will have a sharding spec.

I do think xst1.retain_grad() is required to keep the xst1.grad node around, though; otherwise it would be cleared by PyTorch after output.backward().

@iwknow
Collaborator Author

iwknow commented Mar 15, 2025

Thanks for the detailed explanation! One more thing:

> In our snippet above, I'd expect the following:
>
>   • xst1 has sharding annotations if you look in _get_xla_sharding_spec
>   • xst1.grad has sharding annotations if you look in _get_xla_sharding_spec

Do you mean xt1 instead of xst1 in your second bullet point? My experiment shows that only xst1 and xt1.grad have the sharding annotation. This also aligns with the assertions in other tests (e.g. test_mark_sharding_autograd, test_mark_sharding_aot_compile). My understanding is that MarkShardingFunction is a custom operation in the computation graph. Then we have:

FORWARD:  xt1(sharding=None) ---- MarkShardingFunction.forward ----> xst1(sharding={user_defined_sharding})
BACKWARD: xt1.grad(sharding={user_defined_sharding}) <---- MarkShardingFunction.backward ---- xst1.grad(sharding=None)
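
A small sketch that checks this behavior with _get_xla_sharding_spec, mirroring the snippet above (illustrative only; it assumes the same SPMD setup and 1 x N mesh as in the earlier sketch, and the expected outputs follow the conclusions of this thread):

import numpy as np
import torch
import torch_xla
import torch_xla.core.xla_model as xm
import torch_xla.distributed.spmd as xs
import torch_xla.runtime as xr

xr.use_spmd()
n = xr.global_runtime_device_count()
mesh = xs.Mesh(np.arange(n), (1, n), ('x', 'y'))
partition_spec = ('x', 'y')

xt1 = torch.tensor([[1, 2, 3, 4, 5, 6, 7, 8]],
                   dtype=torch.float,
                   device=xm.xla_device(),
                   requires_grad=True)
xst1 = xs.mark_sharding_with_gradients(xt1, mesh, partition_spec)
xst1.retain_grad()        # keep xst1.grad around so it can be inspected
output = xst1.sum()
output.backward()

spec = torch_xla._XLAC._get_xla_sharding_spec
# Forward: the op annotates its output, not its input.
print(spec(xst1))       # expected: the user-defined sharding
print(spec(xt1))        # expected: '' (no explicit annotation)
# Backward: the op annotates the gradient it returns (xt1.grad), while the
# incoming xst1.grad is left un-annotated.
print(spec(xt1.grad))   # expected: the user-defined sharding
print(spec(xst1.grad))  # expected: ''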
@tengyifei
Collaborator

Ah, you're right.

@iwknow iwknow requested review from bhavya01 and tengyifei March 15, 2025 23:58
@tengyifei tengyifei merged commit 5caaaee into pytorch:master Mar 17, 2025
23 checks passed
