Performance of Mixtral 8x7B matches the Huggingface fork #44

@tengyifei

Description

  • Implement dropping implementation
  • Tune flash attention block size
