
Conversation

@bil-ash (Contributor) commented on Jun 26, 2024

@CharlieFRuan (Member) left a comment


Took a look; this model is compiled with a prefill chunk size of 2k. Could you change the file name to Qwen2-0.5B-Instruct-q4f16_1-ctx4k_cs2k-webgpu.wasm? Thanks!
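(The ctx4k_cs2k suffix encodes the 4k context window and the 2k prefill chunk size. Below is a rough sketch of how such a model lib might be registered in web-llm; the field names follow web-llm's AppConfig/ModelRecord shape and the URLs are illustrative, so both may differ across versions.)

```ts
import { CreateMLCEngine, AppConfig } from "@mlc-ai/web-llm";

// Sketch only, not from this PR; field names and URLs are assumptions.
const appConfig: AppConfig = {
  model_list: [
    {
      // Weights repo (illustrative URL):
      model: "https://huggingface.co/mlc-ai/Qwen2-0.5B-Instruct-q4f16_1-MLC",
      model_id: "Qwen2-0.5B-Instruct-q4f16_1-MLC",
      // The renamed wasm from this PR (illustrative URL):
      model_lib:
        "https://raw.githubusercontent.com/mlc-ai/binary-mlc-llm-libs/main/Qwen2-0.5B-Instruct-q4f16_1-ctx4k_cs2k-webgpu.wasm",
    },
  ],
};

const engine = await CreateMLCEngine("Qwen2-0.5B-Instruct-q4f16_1-MLC", {
  appConfig,
});
```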

@bil-ash (Contributor, Author) commented on Jun 27, 2024

I have renamed it as suggested. By the way, what is the prefill chunk size, and how does it relate to memory usage and performance?

@CharlieFRuan (Member)

Thanks! Say the prefill chunk size is 2k: if a prompt is 4k tokens, it will be prefilled in two chunks instead of all at once. This helps reduce the size of the intermediate buffers for the matrix multiplication.
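(In other words, chunked prefill trades one large pass for several smaller ones. A minimal sketch of the idea, assuming a hypothetical `runPrefillStep` that pushes one chunk of prompt tokens through the model into the KV cache; this is not web-llm's actual runtime code:)

```ts
// Hypothetical engine call: prefills one chunk of tokens into the KV cache.
declare function runPrefillStep(chunk: Int32Array): Promise<void>;

const PREFILL_CHUNK_SIZE = 2048; // the "cs2k" in the model lib name

async function prefill(promptTokens: Int32Array): Promise<void> {
  // A 4k-token prompt is processed as two 2k chunks rather than one 4k
  // pass, so the intermediate matmul buffers (whose size scales with the
  // number of tokens in flight) only need to cover 2k tokens at a time.
  for (let i = 0; i < promptTokens.length; i += PREFILL_CHUNK_SIZE) {
    await runPrefillStep(promptTokens.subarray(i, i + PREFILL_CHUNK_SIZE));
  }
}
```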

@CharlieFRuan merged commit 845359b into mlc-ai:main on Jun 27, 2024
CharlieFRuan pushed a commit to mlc-ai/web-llm that referenced this pull request on Jun 27, 2024:
Add quantized (q4f16) qwen2-0.5b to the list of supported models. [PR](mlc-ai/binary-mlc-llm-libs#128) must be merged before merging this.
