
Conversation

@CharlieFRuan CharlieFRuan commented Aug 23, 2024

This PR adds the newly released Phi-3.5-mini, adding the following `model_id`s to our prebuilt model list:

- `Phi-3.5-mini-instruct-q4f16_1-MLC` (4k KVCache)
- `Phi-3.5-mini-instruct-q4f32_1-MLC` (4k KVCache)
- `Phi-3.5-mini-instruct-q4f16_1-MLC-1k` (1k KVCache)
- `Phi-3.5-mini-instruct-q4f32_1-MLC-1k` (1k KVCache)

See mlc-ai/binary-mlc-llm-libs#136 for the commits of TVM and MLC-LLM this was compiled with.

Note that Phi-3.5-mini supports up to a 128K context (unlike Phi-3-mini, which only has 4k) thanks to RoPE scaling, which MLC-LLM supports. You can take advantage of this in WebLLM by increasing `ModelRecord.overrides.context_window_size` or specifying it in `ChatOptions` when loading a model, as long as there is enough VRAM; see the sketch below.
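For reference, a minimal sketch of the `ChatOptions` route, assuming the `CreateMLCEngine(modelId, engineConfig?, chatOpts?)` entry point and the OpenAI-style `chat.completions` API; the 8192-token window is an illustrative value, not a recommendation:

```typescript
import { CreateMLCEngine } from "@mlc-ai/web-llm";

async function main() {
  // Load Phi-3.5-mini with a context window larger than the prebuilt
  // default. Phi-3.5-mini supports up to 128K via RoPE scaling; 8192
  // here is an arbitrary example value, bounded by available VRAM.
  const engine = await CreateMLCEngine(
    "Phi-3.5-mini-instruct-q4f16_1-MLC",
    { initProgressCallback: (report) => console.log(report.text) },
    { context_window_size: 8192 }, // ChatOptions override
  );

  const reply = await engine.chat.completions.create({
    messages: [{ role: "user", content: "Summarize a long document..." }],
  });
  console.log(reply.choices[0].message.content);
}

main();
```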
@CharlieFRuan CharlieFRuan marked this pull request as ready for review August 23, 2024 15:55
@CharlieFRuan CharlieFRuan merged commit 055f568 into mlc-ai:main Aug 23, 2024
CharlieFRuan added a commit to mlc-ai/web-llm that referenced this pull request Aug 23, 2024
jingyi-zhao-01 pushed a commit to jingyi-zhao-01/web-llm that referenced this pull request Dec 8, 2024
atebites-hub pushed a commit to atebites-hub/web-llm that referenced this pull request Oct 4, 2025
