Expose mark_sharding_with_gradients as a public API. #8826
Conversation
| It seems that I cannot trigger a re-run of the testing workflow. Is it a permission issue, or am I missing something? @tengyifei |
| @iwknow For security, only repo writers can run workflows. I just ran it for you. |
| I am a little confused about what to expect here. In my test, there are three tensors: xt1, xst1, and output, plus their corresponding gradients. I expect […] |
| There's a difference between "having sharding in their HLO" vs. "having sharding in their sharding spec". If you check the HLO of a tensor, it will contain not just the HLO corresponding to the tensor itself, but also the HLO of any input tensor and their inputs, and so on recursively until you hit device data tensors. It's better to check the sharding spec. In our snippet above, I'd expect the following: […] That's because if we don't call […]. I do think […] |
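To make the distinction concrete, here is a minimal sketch of the scenario from the comments above. It builds xt1, shards it into xst1 with the API this PR exposes, computes output, and then inspects each tensor's sharding spec via the internal helper `torch_xla._XLAC._get_xla_sharding_spec`; the mesh shape, tensor shapes, and partition spec are illustrative assumptions, not taken from the PR's test.

```python
# Hedged sketch: checking sharding via the sharding spec rather than the HLO.
# Mesh shape, tensor shapes, and partition specs here are assumptions.
import torch
import torch_xla
import torch_xla.core.xla_model as xm
import torch_xla.runtime as xr
import torch_xla.distributed.spmd as xs

xr.use_spmd()
n = xr.global_runtime_device_count()
mesh = xs.Mesh(list(range(n)), (n, 1), ('x', 'y'))

xt1 = torch.randn(8, 8, device=xm.xla_device(), requires_grad=True)
xst1 = xs.mark_sharding_with_gradients(xt1, mesh, ('x', 'y'))
output = (xst1 * xst1).sum()
output.backward()

# The HLO of `output` also contains xst1's annotation, because a tensor's
# HLO includes the HLO of every input, recursively. The sharding spec is
# local to the tensor itself, so it is the unambiguous thing to inspect:
for name, t in (('xt1', xt1), ('xst1', xst1), ('output', output),
                ('xt1.grad', xt1.grad)):
    spec = torch_xla._XLAC._get_xla_sharding_spec(t)
    print(name, spec if spec else '<no sharding spec>')
```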
| Thanks for the detailed explanation! One more thing: do you mean […]? |
| Ah, you're right. |
…sharding annotation. Also update the documentation of mark_sharding_with_gradients.
This change concludes #8678.
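For reference, a hedged sketch of how the newly public API contrasts with plain `mark_sharding`, which is the distinction the documentation update is about; the mesh and shapes below are assumptions for illustration.

```python
# Hedged sketch contrasting in-place mark_sharding with the autograd-aware
# mark_sharding_with_gradients made public here; shapes/mesh are assumptions.
import torch
import torch_xla.core.xla_model as xm
import torch_xla.runtime as xr
import torch_xla.distributed.spmd as xs

xr.use_spmd()
n = xr.global_runtime_device_count()
mesh = xs.Mesh(list(range(n)), (n,), ('data',))

a = torch.randn(16, 4, device=xm.xla_device(), requires_grad=True)
b = torch.randn(16, 4, device=xm.xla_device(), requires_grad=True)

# mark_sharding annotates the tensor in place; the annotation does not
# propagate to its gradient in the backward pass.
xs.mark_sharding(a, mesh, ('data', None))

# mark_sharding_with_gradients returns a new tensor and also annotates the
# gradient during backward, so it composes with autograd.
b_sharded = xs.mark_sharding_with_gradients(b, mesh, ('data', None))
(b_sharded.sum() + a.sum()).backward()
```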