Lower as_strided_copy use fast path with slice #8734
Conversation
83b4643 to 2c006e8

Review comment on torch_xla/csrc/aten_xla_type.cpp (outdated):
```cpp
if (stride_mul != stride[j]) {
  if (skip_dim == -1) {
    skip_dim = i;
    K = stride[j] / stride_mul;
```
Should we check that stride[j] can be evenly divided by stride_mul and exit if the remainder is not 0?
stride[j] doesn't need to be evenly divisible by stride_mul, as long as all entries of stride before j match the cumulative product of the tensor dims.
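A minimal PyTorch sketch of the stride invariant this reply relies on (an illustration with made-up shapes only, not the PR's actual C++ detection logic): for a contiguous base tensor, each stride equals the cumulative product of the sizes of the dims after it, and slicing a single dim with a step only scales that dim's stride while the others stay tied to the base tensor's sizes.

```python
import torch

base = torch.arange(4 * 6 * 8).reshape(4, 6, 8)  # contiguous base tensor
print(base.stride())      # (48, 8, 1): cumulative products (6*8, 8, 1) of the sizes

sliced = base[:, ::2, :]  # step-2 slice on dim 1
print(sliced.size())      # (4, 3, 8)
print(sliced.stride())    # (48, 16, 1): only dim 1's stride is scaled, by the step 2

# as_strided with the sliced sizes/strides reproduces the same view; recognizing
# this kind of pattern is what allows a slice-based fast path for as_strided_copy
# rather than a gather-style fallback.
view = base.as_strided(sliced.size(), sliced.stride())
assert torch.equal(view, sliced)
```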
33daabf to 2357253
When we execute the following two code snippets involving the flash attention kernel in custom_kernel.py, they are supposed to produce the same result.

a.

b.

Both will be lowered through

xla/torch_xla/csrc/aten_xla_type.cpp
Line 882 in 1acc987

stride and size will have one fewer element in code a than in code b (a hypothetical sketch of this kind of difference follows the notes below). With that argument difference, code a falls back to aten::take, which can trigger the following error when we call with SPMD:

I plan to check in test_as_stride_use_slice.py in this PR.

Note:
1. Failing test test_scan_layer_aot is not enabled until #8742 is resolved.
2. Failing test test_scan_weight_layer_aot is not enabled until #8753 is resolved.
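The snippets a and b above are not reproduced here; the sketch below is a hypothetical illustration, with made-up shapes and calls, of the kind of call-site difference described: both views read the same data, but one call passes one fewer size/stride element than the other.

```python
import torch

t = torch.arange(2 * 3 * 4).reshape(2, 3, 4)  # contiguous, strides (12, 4, 1)

# Variant (a)-style call: 2-D view, size/stride have one fewer element.
a = t.as_strided((3, 4), (4, 1))

# Variant (b)-style call: 3-D view with a leading singleton dim; same data,
# one extra size/stride pair.
b = t.as_strided((1, 3, 4), (12, 4, 1))

assert torch.equal(a, b.squeeze(0))  # identical values, differently ranked views
```

Whether a lower-rank call like variant (a) still hits the slice fast path, instead of falling back to aten::take, is presumably what test_as_stride_use_slice.py is meant to exercise.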