
Conversation

@MatthewBonanni MatthewBonanni commented Nov 13, 2025

Purpose

#24794 didn't mark FlashAttention as supporting sinks, so GPT-OSS ends up selecting TRITON_ATTN, which causes a failure. This PR fixes the issue.
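For context, here is a minimal sketch of the selection behavior in question. The `pick_backend` helper and its structure are illustrative assumptions, not vLLM's actual selector code; only the idea that the selector consults a `supports_sink()` check comes from this PR:

```python
# Illustrative sketch of sink-aware backend selection; pick_backend is
# hypothetical and stands in for vLLM's real selection logic.
def pick_backend(candidates, needs_sink: bool):
    for backend in candidates:
        # When the model (e.g. GPT-OSS) requires attention sinks, skip any
        # backend that is not marked as supporting them. Before this fix,
        # FlashAttention was wrongly skipped here, falling back to TRITON_ATTN.
        if needs_sink and not backend.supports_sink():
            continue
        return backend
    raise RuntimeError("No attention backend supports the requested features")
```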

Test Plan

python examples/offline_inference/spec_decode.py \
    --method "eagle3" \
    --tp 1 \
    --print-output \
    --model-dir "openai/gpt-oss-20b" \
    --eagle-dir "RedHatAI/gpt-oss-20b-speculator.eagle3" \
    --dataset_name "hf" \
    --dataset_path "philschmid/mt-bench" \
    --num-spec-tokens 3

Test Result

The command above previously failed; with this change, it runs successfully.


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing the test command.
  • The test results, such as a before/after comparison or end-to-end results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request correctly addresses the bug where FlashAttention was not marked as supporting sink tokens, which led to an incorrect backend selection for GPT-OSS models. The added supports_sink classmethod in FlashAttentionBackend properly checks both that FlashAttention's variable-length function is available and that it supports sinks. The implementation is straightforward, directly resolves the reported issue, and ensures the correct attention backend is selected.
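To make the reviewed change concrete, here is a minimal sketch of what a supports_sink classmethod along these lines could look like. Only FlashAttentionBackend and supports_sink come from the review above; the import path and the "s_aux" parameter name are assumptions for illustration, and the actual implementation in the PR may differ:

```python
import inspect


class FlashAttentionBackend:
    @classmethod
    def supports_sink(cls) -> bool:
        # Sink support requires the variable-length FlashAttention entry
        # point to be importable at all...
        try:
            from vllm.vllm_flash_attn import flash_attn_varlen_func
        except ImportError:
            return False
        # ...and to accept a sink-token argument ("s_aux" is assumed here
        # as the keyword; the real parameter name may differ).
        return "s_aux" in inspect.signature(flash_attn_varlen_func).parameters
```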

@mgoin mgoin added ready ONLY add when PR is ready to merge/full CI is needed bug Something isn't working gpt-oss Related to GPT-OSS models labels Nov 13, 2025
@mgoin mgoin merged commit f9f3b59 into vllm-project:main Nov 13, 2025
55 of 56 checks passed
@github-project-automation github-project-automation bot moved this from To Triage to Done in gpt-oss Issues & Enhancements Nov 13, 2025
@MatthewBonanni MatthewBonanni deleted the fa_sink branch November 13, 2025 18:23
geodavic pushed a commit to geodavic/vllm that referenced this pull request Nov 16, 2025
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: George D. Torres <gdavtor@gmail.com>
bwasti pushed a commit to bwasti/vllm that referenced this pull request Nov 17, 2025
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Bram Wasti <bwasti@meta.com>
devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
charlotte12l pushed a commit to charlotte12l/vllm that referenced this pull request Dec 5, 2025
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>
Zhathw pushed a commit to Zhathw/vllm that referenced this pull request Dec 6, 2025
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

Labels: bug (Something isn't working), gpt-oss (Related to GPT-OSS models), ready (ONLY add when PR is ready to merge/full CI is needed), v1