Description
System Info
TGI version: 3.3.4
Model: Gemma 3 27B
GPU: H100
Information
- Docker
- The CLI directly
Tasks
- An officially supported command
- My own modifications
Reproduction
I think this issue is related to flash-attention v2: Dao-AILab/flash-attention#1311. A newer flash-attention version 3 (beta) has been released for the H100 GPU. I don't think it is good practice for a big project like TGI to depend on one pinned version: 8 months ago the pinned flash-attention version was updated to 2.6.1, and 2.8.3 is now available. I checked your Dockerfile, and there is no easy way to update it in the project.
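As a quick way to confirm which flash-attention build a TGI container actually ships, here is a small sketch (the helper names are my own, and it assumes the package is installed under the distribution name `flash-attn`):

```python
from importlib.metadata import version, PackageNotFoundError

def flash_attn_version():
    """Return the installed flash-attn version string, or None if absent."""
    try:
        return version("flash-attn")
    except PackageNotFoundError:
        return None

def is_older_than(installed, target=(2, 8, 3)):
    """Compare an 'X.Y.Z' version string against a target version tuple."""
    parts = tuple(int(p) for p in installed.split(".")[:3])
    return parts < target

# The version pinned in the Dockerfile (2.6.1) predates the current 2.8.3:
print(is_older_than("2.6.1"))  # True
print(is_older_than("2.8.3"))  # False
```

Running this inside the container would make it easy to see at a glance whether the bundled flash-attention is behind the upstream release.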
Expected behavior
Update the pinned flash-attention dependency to a current release (2.8.x, or the v3 beta on H100), and make the version easier to bump in the Dockerfile.