Motivation.
vLLM provides an easy-to-use backend access mechanism, and many backends have already been integrated.
As shown in #6368, #6728, and #6066, many users want to run vLLM on Ascend NPU.
The main purpose of this RFC is to follow the existing backend access mechanism and make Ascend NPU available to vLLM.
Proposed Change.
We introduce an Ascend Executor and Ascend Worker(s), based on the existing GPU Executor/Worker(s), to handle runtime management and per-device execution on NPU. We also add an Ascend attention backend as a replacement for the attention layer; the PagedAttention/FlashAttention ops are implemented there. A minimal sketch of the intended class layout follows.
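A hedged sketch of the layout, assuming vLLM's existing `GPUExecutor` and `Worker` base classes; the Ascend class names and the override point shown are illustrative, not a final interface:

```python
# Illustrative sketch only: the base-class names follow upstream vLLM at the
# time of writing, but the Ascend classes and overrides are hypothetical.
import torch
import torch_npu  # noqa: F401  (side effect: registers the "npu" device)

from vllm.executor.gpu_executor import GPUExecutor
from vllm.worker.worker import Worker


class AscendWorker(Worker):
    """A worker pinned to one NPU device instead of a CUDA device."""

    def init_device(self) -> None:
        # Bind this worker rank to its NPU (mirrors the CUDA path).
        self.device = torch.device(f"npu:{self.local_rank}")
        torch.npu.set_device(self.device)


class AscendExecutor(GPUExecutor):
    """Runtime management on NPU: creates AscendWorkers instead of GPU workers,
    so the rest of the engine stays unchanged."""
    ...
```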
Because torch_npu has supported torch natively since version 2.1.0, we will keep the implementation consistent with the GPU code path and make as few code changes as possible, as the example below illustrates.
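As an illustration of why the diff against the GPU path stays small (this is not code from the RFC itself): importing torch_npu registers the "npu" device with torch, so ordinary tensor code carries over with little more than a device-string change.

```python
import torch
import torch_npu  # noqa: F401  (enables torch.npu and the "npu" device)

# Fall back to CPU so the snippet also runs on machines without an NPU.
device = torch.device("npu:0" if torch.npu.is_available() else "cpu")

q = torch.randn(8, 128, device=device)
k = torch.randn(8, 128, device=device)
# The same tensor ops used on the CUDA path run unchanged on NPU.
scores = (q @ k.transpose(0, 1)) / (128 ** 0.5)
print(scores.shape, scores.device)
```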
Feedback Period.
A month
CC List.
Any Other Things.
Background
Ascend NPU is a range of AI processors built around a Neural Processing Unit. It efficiently handles matrix-matrix multiplication, dot products, and scalar operations. Many projects already support Ascend NPU, such as onnxruntime, deepspeed, and llama.cpp.
MindIE is the Ascend inference engine, a high-performance deep learning inference framework designed for Ascend hardware.
Roadmap
The initial version will include the following components (a hedged selection sketch for the two backends follows the list):
- Ascend Executor
- Ascend Worker
- Ascend Model Runner
- Ascend MindIE Backend
- Ascend SingleOps Backend
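
As a sketch of how the two attention backends might be dispatched: the `VLLM_ASCEND_BACKEND` environment variable, the module paths, and the class names below are hypothetical, shown only to illustrate the split between the MindIE path and the SingleOps path.

```python
# Hypothetical dispatch between the two planned backends; nothing here is
# existing vLLM API.
import os


def get_ascend_attn_backend():
    choice = os.environ.get("VLLM_ASCEND_BACKEND", "singleops").lower()
    if choice == "mindie":
        # Fused, graph-level path backed by the MindIE inference engine.
        from vllm.attention.backends.ascend_mindie import AscendMindIEBackend
        return AscendMindIEBackend
    # Default: eager execution via individual torch_npu operators.
    from vllm.attention.backends.ascend import AscendSingleOpsBackend
    return AscendSingleOpsBackend
```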

