
Conversation

@zpcore (Member) commented on Jan 30, 2025

Resolves #8633
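
A minimal sketch of the pattern in the title: registering an opaque kernel's forward and backward as a PyTorch custom op (via `torch.library.custom_op`) so Dynamo/AOTAutograd can trace it as a single graph node instead of graph-breaking on its internals. Everything here is illustrative, not the PR's code: the op name, the plain softmax-attention bodies, and the hand-written gradients all stand in for the real Pallas flash-attention kernels.

```python
import torch

# Illustrative eager body; the real op would dispatch to the Pallas
# flash-attention kernel.
@torch.library.custom_op("sketch::flash_attention", mutates_args=())
def flash_attention(q: torch.Tensor, k: torch.Tensor,
                    v: torch.Tensor) -> torch.Tensor:
    scale = q.shape[-1] ** -0.5
    p = torch.softmax(q @ k.transpose(-2, -1) * scale, dim=-1)
    return p @ v

@flash_attention.register_fake
def _(q, k, v):
    # Shape/dtype-only implementation so tracing never runs the kernel.
    return torch.empty_like(q)

def _setup_context(ctx, inputs, output):
    q, k, v = inputs
    ctx.save_for_backward(q, k, v)

def _backward(ctx, grad_out):
    # Illustrative plain-attention gradients; the real op would call the
    # flash-attention backward kernel instead.
    q, k, v = ctx.saved_tensors
    scale = q.shape[-1] ** -0.5
    p = torch.softmax(q @ k.transpose(-2, -1) * scale, dim=-1)
    grad_v = p.transpose(-2, -1) @ grad_out
    grad_p = grad_out @ v.transpose(-2, -1)
    grad_s = (p * (grad_p - (grad_p * p).sum(-1, keepdim=True))) * scale
    return grad_s @ k, grad_s.transpose(-2, -1) @ q, grad_v

flash_attention.register_autograd(_backward, setup_context=_setup_context)
```

With the backward registered at the op level, AOTAutograd can partition the joint forward/backward graph without looking inside the kernel.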

@zpcore force-pushed the piz/autograde_trace branch from f82f373 to a8c8f47 on January 31, 2025 at 10:31
@zpcore changed the title from "backward with spmd issue" to "Dynamo/AOTAutograd traceable flash attention" on January 31, 2025
@zpcore requested a review from @tengyifei on January 31, 2025 at 10:38
@tengyifei (Collaborator) left a comment


This is really great

@zpcore merged commit 9ae017e into master on Feb 1, 2025
12 checks passed
@zpcore deleted the piz/autograde_trace branch on February 1, 2025 at 04:22
tengyifei added a commit to AI-Hypercomputer/torchprime that referenced this pull request Mar 17, 2025
We replace the `for` loop in both Llama and Mixtral with an equivalent `HomogenousSequential` layer, which can either run a for loop or use `torch_xla`'s scan operator (see the sketch below). This is a clean-ish way to turn scan on/off without cluttering the modeling code. I also adjusted Mixtral slightly so that we can run `scan` even in Mixtral with its static MoE implementation. Scanning over GMM, on the other hand, won't work until GMM forward/backward is wrapped in a custom op similar to pytorch/xla#8654. Test: added a unit test. The next PR will change the trainer to apply scan.
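
A hedged sketch of the container described above. This is not the torchprime source: the class body is illustrative, and the scan entry point is assumed to be `torch_xla.experimental.scan_layers.scan_layers`.

```python
import torch
import torch.nn as nn

class HomogenousSequential(nn.Module):
    """Stack of structurally identical layers, runnable as a loop or a scan."""

    def __init__(self, *layers: nn.Module):
        super().__init__()
        self.layers = nn.ModuleList(layers)

    def forward(self, x: torch.Tensor, use_scan: bool = False) -> torch.Tensor:
        if use_scan:
            # Trace one layer body and reuse it for every layer, instead of
            # tracing each layer separately in an unrolled loop.
            from torch_xla.experimental.scan_layers import scan_layers
            return scan_layers(self.layers, x)
        # Plain for loop: the behavior models used before this change.
        for layer in self.layers:
            x = layer(x)
        return x
```

Toggling `use_scan` turns scan on or off without touching the modeling code itself.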
tengyifei added a commit to AI-Hypercomputer/torchprime that referenced this pull request Mar 18, 2025
* Make models amenable to scan (same description as the commit above)
* Address comments
