[Distributed] Switch all_reduce to use the new functional collective op #6887
Conversation
Let me know if you need any help setting up the XLA environment.
Hey @alanwaketan, I'd appreciate your help on this. Is it possible to set up the dev env without having access to docker?
Right...
Have you followed https://github.com/pytorch/xla/blob/master/CONTRIBUTING.md? What error do you get? I wonder if maybe you can just develop and test this on the CPU machine, with no need to access a TPU.
Yeah, I tried to follow it, but building from source requires docker, which is not available in my dev env :(
Oh, so you cannot use docker or you cannot access our docker? |
Cannot use docker. It's a restriction of my corporate dev env :( |
Maybe you can leave the development to our team then. I don't know how difficult it is to use a non-docker env... @JackCaoG Do you know?
PyTorch has implemented a new set of functional collective ops and is planning to remove the old ops. Migrating all_reduce to use the new op. See context in pytorch/pytorch#93173 (comment)
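For readers unfamiliar with the two op sets, here is a minimal, hedged sketch of the Python wrapper for the new functional collectives. This is not code from this PR; the process-group setup and names are illustrative only.

```python
# Minimal sketch of the new functional collective wrapper (illustrative, not
# code from this PR). Assumes torch.distributed has already been initialized
# via init_process_group.
import torch
import torch.distributed as dist
import torch.distributed._functional_collectives as funcol

def all_reduce_sum(t: torch.Tensor) -> torch.Tensor:
    # The wrapper routes to the new native op namespace
    # (torch.ops._c10d_functional.all_reduce) instead of the legacy
    # torch.ops.c10d_functional.all_reduce that PyTorch plans to remove.
    reduced = funcol.all_reduce(t, reduceOp="sum", group=dist.group.WORLD)
    # Functional collectives do not mutate `t`; the result comes back as a new
    # tensor (an AsyncCollectiveTensor that syncs lazily on first use).
    return reduced
```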
Sounds good. Thanks for the help @alanwaketan! I just wanted to make sure there wasn't a gap for torch-xla to adopt the new API.
@alanwaketan - following your advice, the CI is now green. Note this PR doesn't address the TODO re. generating groups. It merely switches to the new API while staying consistent with the old behavior. Happy to continue discussing how to plumb through group information. Let me know what you think.
Thanks, Yifu. We don't need group information in XLA in general, so we can follow up on that later. Let me trigger the TPU CI. Once everything is green, I will just merge the PR.
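As a side note on the group discussion above, here is a small illustrative sketch (not from this PR) of what passing group information through the new API could look like. The sub-group construction is hypothetical; the point is simply that the `group` argument of the new wrapper accepts a ProcessGroup directly.

```python
# Illustrative only: plumbing group information through the new functional
# collective mostly means forwarding a ProcessGroup (or similar) as `group`.
# The sub-group built below is made up for the example.
import torch
import torch.distributed as dist
import torch.distributed._functional_collectives as funcol

def all_reduce_in_subgroup(t: torch.Tensor, ranks: list[int]) -> torch.Tensor:
    subgroup = dist.new_group(ranks=ranks)  # hypothetical sub-group of the world
    return funcol.all_reduce(t, reduceOp="sum", group=subgroup)
```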
LGTM.
Oh, we cannot run TPU CI on a fork, I guess... Never mind. I will just merge it. The head CI will take care of the rest.
… legacy funcol After pytorch/xla#6887, torch-xla now also uses the all_reduce from native funcol. So we can remove this logic. [ghstack-poisoned]
…-xla to use legacy funcol" After pytorch/xla#6887, torch-xla now also uses the all_reduce from native funcol. So we can remove this logic. [ghstack-poisoned]
…hat forces torch-xla to use legacy funcol" After pytorch/xla#6887, torch-xla now also uses the all_reduce from native funcol. So we can remove this logic. [ghstack-poisoned]
… legacy funcol (#123776) After pytorch/xla#6887, torch-xla now also uses the all_reduce from native funcol. So we can remove this logic. Pull Request resolved: #123776 Approved by: https://github.com/wanchaol
… legacy funcol (pytorch#123776) After pytorch/xla#6887, torch-xla now also uses the all_reduce from native funcol. So we can remove this logic. Pull Request resolved: pytorch#123776 Approved by: https://github.com/wanchaol
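For context on the follow-up commits above: pytorch#123776 removes a special case in PyTorch that kept torch-xla on the legacy funcol ops once this PR landed. The sketch below is a hypothetical illustration of that kind of guard, not the actual code that was removed.

```python
# Hypothetical illustration only -- not the code removed in pytorch#123776.
# It shows the kind of guard that kept torch-xla on the legacy c10d_functional
# ops until pytorch/xla#6887 made the native ops usable.
import sys

def _use_native_funcol() -> bool:  # hypothetical helper name
    # Detecting torch_xla previously forced the legacy funcol path.
    if "torch_xla" in sys.modules:
        return False
    return True
```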