
Conversation

@yifuwang (Contributor) commented Apr 4, 2024

PyTorch has implemented a new set of functional collective ops and is planning to remove the old ones. This PR migrates all_reduce to use the new op.

See context in pytorch/pytorch#93173 (comment)
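
For reference, the switch is roughly the following change in how the functional all_reduce op is invoked. This is a minimal sketch based on the op schemas discussed in pytorch/pytorch#93173; the wrapper function names are illustrative, not the actual torch-xla call sites:

```python
import torch

def all_reduce_legacy(t: torch.Tensor, ranks: list[int], group_size: int) -> torch.Tensor:
    # Legacy functional collective: the group is described by a tag string
    # plus an explicit rank list and group size.
    return torch.ops.c10d_functional.all_reduce(t, "sum", "", ranks, group_size)

def all_reduce_native(t: torch.Tensor, group_name: str) -> torch.Tensor:
    # Native functional collective: the group is identified by name, and the
    # result must be explicitly waited on before the value is used.
    out = torch.ops._c10d_functional.all_reduce(t, "sum", group_name)
    return torch.ops._c10d_functional.wait_tensor(out)
```

Both ops are functional in the sense that they return a new tensor rather than mutating the input.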

@JackCaoG requested a review from @alanwaketan on April 4, 2024, 17:51
@yifuwang marked this pull request as draft on April 5, 2024, 20:09
@alanwaketan (Collaborator) left a comment

Let me know if you need any help setting up the XLA environment.

@yifuwang (Contributor, Author) commented Apr 8, 2024

> Let me know if you need any help setting up the XLA environment.

Hey @alanwaketan, I'd appreciate your help on this. Is it possible to set up the dev env without having access to Docker?

@alanwaketan (Collaborator) commented

> > Let me know if you need any help setting up the XLA environment.
>
> Hey @alanwaketan, I'd appreciate your help on this. Is it possible to set up the dev env without having access to Docker?

Right...

@alanwaketan (Collaborator) commented

> > Let me know if you need any help setting up the XLA environment.
>
> Hey @alanwaketan, I'd appreciate your help on this. Is it possible to set up the dev env without having access to Docker?

Have you followed https://github.com/pytorch/xla/blob/master/CONTRIBUTING.md? What error do you get? I wonder if maybe you can just develop and test this on a CPU machine, with no need to access a TPU.

@yifuwang (Contributor, Author) commented Apr 9, 2024

> Have you followed https://github.com/pytorch/xla/blob/master/CONTRIBUTING.md? What error do you get? I wonder if maybe you can just develop and test this on a CPU machine, with no need to access a TPU.

Yeah, I tried to follow it, but building from source requires Docker, which is not available in my dev env :(

@alanwaketan (Collaborator) commented

> > Have you followed https://github.com/pytorch/xla/blob/master/CONTRIBUTING.md? What error do you get? I wonder if maybe you can just develop and test this on a CPU machine, with no need to access a TPU.
>
> Yeah, I tried to follow it, but building from source requires Docker, which is not available in my dev env :(

Oh, so is it that you cannot use Docker at all, or that you cannot access our Docker?

@yifuwang (Contributor, Author) commented Apr 9, 2024

> Oh, so is it that you cannot use Docker at all, or that you cannot access our Docker?

Cannot use Docker. It's a restriction of my corporate dev env :(

@alanwaketan (Collaborator) commented

Maybe you can leave the development to our team then. I don't know how difficult it is to use a non-Docker env... @JackCaoG, do you know?

@yifuwang (Contributor, Author) commented Apr 9, 2024

> Maybe you can leave the development to our team then.

Sounds good. Thanks for the help, @alanwaketan! I just wanted to make sure there wasn't a gap blocking torch-xla from adopting the new API.

@yifuwang marked this pull request as ready for review on April 10, 2024, 18:48
@yifuwang (Contributor, Author) commented

@alanwaketan - following your advice, the CI is now green.

Note this PR doesn't address the TODO regarding generating groups. It merely switches to the new API while remaining consistent with the old behavior. Happy to continue discussing how to plumb through group information.
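
For illustration, here is roughly what "consistent with the old behavior" means at the Python level: without plumbed-through group information, the call can target the default world process group. A hedged sketch; the wrapper name is hypothetical, and the actual torch-xla call site may invoke the op differently:

```python
import torch
import torch.distributed as dist
import torch.distributed._functional_collectives as funcol

def all_reduce_world(t: torch.Tensor) -> torch.Tensor:
    # Assumes the default process group has been initialized via
    # dist.init_process_group(). Without group plumbing, reduce over the
    # whole world; funcol.all_reduce returns a new tensor (no in-place
    # mutation), matching functional-collective semantics.
    return funcol.all_reduce(t, reduceOp="sum", group=dist.group.WORLD)
```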

Let me know what you think.

@alanwaketan (Collaborator) commented

Thanks, Yifu. We don't need group information in XLA in general, so we can follow up on that later. Let me trigger the TPU CI; once everything is green, I will just merge the PR.

@alanwaketan (Collaborator) left a comment

LGTM.

@alanwaketan (Collaborator) commented

Oh, we cannot run the TPU CI on a fork, I guess... Never mind, I will just merge it. The head CI will take care of the rest.

@alanwaketan merged commit a816c42 into pytorch:master on Apr 10, 2024
yifuwang pushed a commit to pytorch/pytorch that referenced this pull request on Apr 10, 2024:
  … legacy funcol
  After pytorch/xla#6887, torch-xla now also uses the all_reduce from native funcol. So we can remove this logic. [ghstack-poisoned]

yifuwang pushed a commit to pytorch/pytorch that referenced this pull request on Apr 10, 2024:
  …-xla to use legacy funcol"
  After pytorch/xla#6887, torch-xla now also uses the all_reduce from native funcol. So we can remove this logic. [ghstack-poisoned]

yifuwang pushed a commit to pytorch/pytorch that referenced this pull request on Apr 13, 2024:
  …hat forces torch-xla to use legacy funcol"
  After pytorch/xla#6887, torch-xla now also uses the all_reduce from native funcol. So we can remove this logic. [ghstack-poisoned]

yifuwang pushed a commit to pytorch/pytorch that referenced this pull request on Apr 13, 2024:
  …-xla to use legacy funcol"
  After pytorch/xla#6887, torch-xla now also uses the all_reduce from native funcol. So we can remove this logic. [ghstack-poisoned]

pytorchmergebot pushed a commit to pytorch/pytorch that referenced this pull request on Apr 13, 2024:
  … legacy funcol (#123776)
  After pytorch/xla#6887, torch-xla now also uses the all_reduce from native funcol. So we can remove this logic. Pull Request resolved: #123776 Approved by: https://github.com/wanchaol

sanketpurandare pushed a commit to sanketpurandare/pytorch that referenced this pull request on Apr 22, 2024:
  … legacy funcol (pytorch#123776)
  After pytorch/xla#6887, torch-xla now also uses the all_reduce from native funcol. So we can remove this logic. Pull Request resolved: pytorch#123776 Approved by: https://github.com/wanchaol

petrex pushed a commit to petrex/pytorch that referenced this pull request on May 3, 2024:
  … legacy funcol (pytorch#123776)
  After pytorch/xla#6887, torch-xla now also uses the all_reduce from native funcol. So we can remove this logic. Pull Request resolved: pytorch#123776 Approved by: https://github.com/wanchaol
