@danielvegamyhre danielvegamyhre commented Oct 7, 2025

Stacked PRs:


[mxfp8 moe training] integrate mxfp8 dim0 triton kernel

Test plan

  • pytest test/prototype/moe_training/test_scaled_grouped_mm.py -k dq -s

Benchmarks

M,N,K,G                  recipe                bf16_fwd_bwd_us  scaled_fwd_bwd_us  scaled_fwd_bwd_speedup  bf16_fwd_us  scaled_fwd_us  scaled_fwd_speedup
-----------------------  --------------------  ---------------  -----------------  ----------------------  -----------  -------------  ------------------
(16384, 8192, 5120, 1)   MoEScalingType.MXFP8  4239.74          2978.88            1.423x                  1229.87      758.8          1.621x
(16384, 8192, 5120, 2)   MoEScalingType.MXFP8  4192.19          3381.7             1.24x                   1229.5       1079.33        1.139x
(16384, 8192, 5120, 4)   MoEScalingType.MXFP8  3920.91          3419.1             1.147x                  1093.42      820.416        1.333x
(16384, 8192, 5120, 8)   MoEScalingType.MXFP8  4309.06          3633.54            1.186x                  1093.73      932.128        1.173x
(128000, 8192, 5120, 1)  MoEScalingType.MXFP8  50533.1          23270.9            2.172x                  12149.6      6208.59        1.957x
(128000, 8192, 5120, 2)  MoEScalingType.MXFP8  57250.8          23629.6            2.423x                  10176.2      6408.26        1.588x
(128000, 8192, 5120, 4)  MoEScalingType.MXFP8  35872.8          25179.2            1.425x                  10041.7      5813.94        1.727x
(128000, 8192, 5120, 8)  MoEScalingType.MXFP8  50138.8          23592              2.125x                  18598.9      6110.18        3.044x
(16384, 1536, 5120, 1)   MoEScalingType.MXFP8  808              987.136            0.819x                  246.816      261.12         0.945x
(16384, 1536, 5120, 2)   MoEScalingType.MXFP8  855.072          914.56             0.935x                  224.496      263.2          0.853x
(16384, 1536, 5120, 4)   MoEScalingType.MXFP8  824.4            1034.11            0.797x                  287.744      273.28         1.053x
(16384, 1536, 5120, 8)   MoEScalingType.MXFP8  847.968          1033.44            0.821x                  220.384      283.712        0.777x
(128000, 1536, 5120, 1)  MoEScalingType.MXFP8  6480.8           7623.65            0.85x                   2100.29      2025.28        1.037x
(128000, 1536, 5120, 2)  MoEScalingType.MXFP8  6530.54          7277.41            0.897x                  2112.32      1929.28        1.095x
(128000, 1536, 5120, 4)  MoEScalingType.MXFP8  7770.05          6168.45            1.26x                   2020.35      1638.43        1.233x
(128000, 1536, 5120, 8)  MoEScalingType.MXFP8  7438.1           6244.24            1.191x                  1847.2       1786.94        1.034x
(16384, 2048, 7168, 1)   MoEScalingType.MXFP8  1739.78          1519.78            1.145x                  452.512      392.224        1.154x
(16384, 2048, 7168, 2)   MoEScalingType.MXFP8  1628.64          1522.7             1.07x                   468          402.432        1.163x
(16384, 2048, 7168, 4)   MoEScalingType.MXFP8  1564.16          1437.3             1.088x                  398.272      392.448        1.015x
(16384, 2048, 7168, 8)   MoEScalingType.MXFP8  1478.3           1647.55            0.897x                  416.8        420.032        0.992x
(128000, 2048, 7168, 1)  MoEScalingType.MXFP8  13811.2          11483.7            1.203x                  3793.09      3032.96        1.251x
(128000, 2048, 7168, 2)  MoEScalingType.MXFP8  12086.2          11340.2            1.066x                  3795.1       3009.82        1.261x
(128000, 2048, 7168, 4)  MoEScalingType.MXFP8  12410.9          10389.6            1.195x                  3529.3       2807.25        1.257x
(128000, 2048, 7168, 8)  MoEScalingType.MXFP8  14126            9803.76            1.441x                  3377.52      2585.7         1.306x
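
For readers checking the numbers, here is a minimal sketch of how the speedup columns appear to be derived. It assumes each speedup is the bf16 time divided by the MXFP8 ("scaled") time, which matches the reported figures to three decimals. The `speedup` helper and the `rows` list are hypothetical names for illustration and are not taken from the benchmark script; the timings are copied from three rows of the benchmark output above.

```python
# Assumption: speedup = bf16 time / scaled (MXFP8) time, both in microseconds.
# `speedup` and `rows` are illustrative names, not part of the benchmark script.

def speedup(baseline_us: float, candidate_us: float) -> float:
    """Return how many times faster the candidate is than the baseline."""
    return baseline_us / candidate_us

# ((M, N, K, G), bf16_fwd_bwd_us, scaled_fwd_bwd_us, bf16_fwd_us, scaled_fwd_us)
rows = [
    ((16384, 8192, 5120, 1), 4239.74, 2978.88, 1229.87, 758.8),
    ((128000, 8192, 5120, 8), 50138.8, 23592.0, 18598.9, 6110.18),
    ((16384, 1536, 5120, 1), 808.0, 987.136, 246.816, 261.12),
]

for shape, bf16_fb, scaled_fb, bf16_f, scaled_f in rows:
    fb = speedup(bf16_fb, scaled_fb)
    f = speedup(bf16_f, scaled_f)
    # A value below 1.0 means the MXFP8 path was slower than bf16 for that shape.
    print(f"{shape}: fwd+bwd {fb:.3f}x, fwd {f:.3f}x")
```

Read this way, most of the N=1536 rows fall below 1.0x for forward plus backward, meaning the MXFP8 path was slower than bf16 for those shapes, while the N=8192 rows are all above 1.0x.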
stack-info: PR: #3129, branch: danielvegamyhre/stack/76
@danielvegamyhre danielvegamyhre force-pushed the danielvegamyhre/stack/76 branch from 51b9be2 to 168d4b7 Compare October 7, 2025 17:58
@meta-cla meta-cla bot added the CLA Signed label Oct 7, 2025

pytorch-bot bot commented Oct 7, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3129

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot bot commented Oct 7, 2025

❌ 1 New Failure

As of commit 168d4b7 with merge base cd21d0e:

NEW FAILURE - The following job has failed:

@danielvegamyhre danielvegamyhre changed the base branch from danielvegamyhre/stack/75 to main October 7, 2025 20:36
@danielvegamyhre danielvegamyhre changed the base branch from main to danielvegamyhre/stack/75 October 7, 2025 20:37