
Conversation

@bil-ash (Contributor) commented on Jun 26, 2024

@CharlieFRuan (Member) left a comment


Took a look; this model is compiled with a prefill chunk size of 2k. Could you change the file name to Qwen2-0.5B-Instruct-q4f16_1-ctx4k_cs2k-webgpu.wasm? Thanks!
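(The ctx4k_cs2k suffix encodes the 4k context window and the 2k prefill chunk size. Below is a rough sketch of how such a model lib might be registered in web-llm; the field names follow web-llm's AppConfig/ModelRecord shape and the URLs are illustrative, so both may differ across versions.)

```ts
import { CreateMLCEngine, AppConfig } from "@mlc-ai/web-llm";

// Sketch only, not from this PR; field names and URLs are assumptions.
const appConfig: AppConfig = {
  model_list: [
    {
      // Weights repo (illustrative URL):
      model: "https://huggingface.co/mlc-ai/Qwen2-0.5B-Instruct-q4f16_1-MLC",
      model_id: "Qwen2-0.5B-Instruct-q4f16_1-MLC",
      // The renamed wasm from this PR (illustrative URL):
      model_lib:
        "https://raw.githubusercontent.com/mlc-ai/binary-mlc-llm-libs/main/Qwen2-0.5B-Instruct-q4f16_1-ctx4k_cs2k-webgpu.wasm",
    },
  ],
};

const engine = await CreateMLCEngine("Qwen2-0.5B-Instruct-q4f16_1-MLC", {
  appConfig,
});
```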

@bil-ash (Contributor, Author) commented on Jun 27, 2024

I have renamed it as suggested. By the way, what is the prefill chunk size, and how does it relate to memory usage and performance?

@CharlieFRuan (Member)

Thanks! Say the prefill chunk size is 2k: if a prompt is 4k tokens, it will be prefilled in two chunks instead of all at once. This helps reduce the size of the intermediate buffers for the matrix multiplication.
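(In other words, chunked prefill trades one large pass for several smaller ones. A minimal sketch of the idea, assuming a hypothetical `runPrefillStep` that pushes one chunk of prompt tokens through the model into the KV cache; this is not web-llm's actual runtime code:)

```ts
// Hypothetical engine call: prefills one chunk of tokens into the KV cache.
declare function runPrefillStep(chunk: Int32Array): Promise<void>;

const PREFILL_CHUNK_SIZE = 2048; // the "cs2k" in the model lib name

async function prefill(promptTokens: Int32Array): Promise<void> {
  // A 4k-token prompt is processed as two 2k chunks rather than one 4k
  // pass, so the intermediate matmul buffers (whose size scales with the
  // number of tokens in flight) only need to cover 2k tokens at a time.
  for (let i = 0; i < promptTokens.length; i += PREFILL_CHUNK_SIZE) {
    await runPrefillStep(promptTokens.subarray(i, i + PREFILL_CHUNK_SIZE));
  }
}
```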

@CharlieFRuan merged commit 845359b into mlc-ai:main on Jun 27, 2024
CharlieFRuan pushed a commit to mlc-ai/web-llm that referenced this pull request on Jun 27, 2024:
Add quantized (q4f16) qwen2-0.5b to the list of supported models. [PR](mlc-ai/binary-mlc-llm-libs#128) must be merged before merging this.
