Description
The pre-compiled model WASMs in the binary-mlc-llm-libs repository are limited to 4K context windows, which significantly limits their usefulness for production applications. Please provide pre-compiled WASMs with larger context windows (8K, 16K, and 32K) for popular models.
Background
The current pre-compiled WASMs in binary-mlc-llm-libs are compiled with context_window_size=4096. For models like Qwen3-8B that natively support 32K+ context, this is a significant limitation.
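For concreteness, here is a minimal TypeScript sketch of how an application would consume a larger-context prebuilt library if one were published. The URLs, model IDs, and the 8K figure are placeholders (no such artifact exists today), and the `AppConfig`/model-record field names reflect my reading of the web-llm 0.2.80 API rather than a confirmed interface:

```ts
import { CreateMLCEngine, prebuiltAppConfig, AppConfig } from "@mlc-ai/web-llm";

// Placeholder entry: a prebuilt WASM compiled with context_window_size=8192.
// All URLs and IDs below are hypothetical.
const appConfig: AppConfig = {
  model_list: [
    ...prebuiltAppConfig.model_list,
    {
      model: "https://huggingface.co/mlc-ai/SOME-MODEL-q4f16_1-MLC",  // placeholder
      model_id: "SOME-MODEL-q4f16_1-MLC-ctx8k",                       // placeholder
      model_lib: "https://example.com/SOME-MODEL-ctx8k-webgpu.wasm",  // placeholder
    },
  ],
};

async function main() {
  // Runtime configuration cannot raise the context window beyond what the WASM
  // was compiled with, which is why larger prebuilt libs are needed.
  const engine = await CreateMLCEngine("SOME-MODEL-q4f16_1-MLC-ctx8k", { appConfig });
  const reply = await engine.chat.completions.create({
    messages: [{ role: "user", content: "Summarize a long document..." }],
  });
  console.log(reply);
}
main();
```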
Why This Matters
- Use Case Demand: Applications like chat interfaces, document Q&A, and code assistants benefit greatly from longer context
- Custom Compilation Barrier: Compiling custom WASMs requires:
  - Matching exact TVM/MLC-LLM versions (commits apache/tvm@c8515e1 and mlc-ai/mlc-llm@4084e7f for v0_2_80)
  - Complex build environment setup
  - WASM/runtime version compatibility expertise
- Version Mismatch Issues: Custom-compiled WASMs frequently fail with `LinkError` due to TVM FFI function mismatches (e.g., `TVMFFIEnvSetStream: function import requires a callable`)
Attempted Solutions
I attempted to compile custom WASMs with extended context following these approaches:
Approach 1: Using Latest MLC-LLM
```bash
pip install mlc-llm
python -m mlc_llm compile model --device webgpu --overrides "context_window_size=32768"
```

Result: `LinkError: TVMFFIEnvSetStream: function import requires a callable`
The web-llm 0.2.80 runtime expects different FFI functions than newer MLC-LLM versions export.
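For anyone diagnosing the same mismatch, a quick way to confirm it is to list the `env` imports the compiled library expects and compare them with what the runtime provides. This is a sketch that uses only the standard `WebAssembly` JS API; nothing web-llm-specific is assumed, and the file path is illustrative:

```ts
// List the "env" function imports a compiled model library expects, so they
// can be diffed against what the web-llm/tvmjs runtime actually supplies.
async function listEnvImports(wasmUrl: string): Promise<string[]> {
  const bytes = await (await fetch(wasmUrl)).arrayBuffer();
  const module = await WebAssembly.compile(bytes);
  return WebAssembly.Module.imports(module)
    .filter((imp) => imp.module === "env" && imp.kind === "function")
    .map((imp) => imp.name);
}

// Example: my custom-compiled WASM demanded TVMFFIEnvSetStream, which the
// 0.2.80 runtime does not export, hence the LinkError.
listEnvImports("./MODEL-ctx32k-webgpu.wasm").then((names) => console.log(names));
```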
Approach 2: Building from Exact Commits
Checked out the exact commits from PR #158 in binary-mlc-llm-libs:
- TVM: apache/tvm@c8515e1
- MLC-LLM: mlc-ai/mlc-llm@4084e7f
Blockers:
- CMake version compatibility issues (many files require patching from `VERSION 3.1` to `3.5`)
- TVM Python bindings don't match pip-installed native libs
- Complex dependency chain between mlc_llm → tvm Python module → native libraries
Approach 3: Patching web-llm Runtime
Added stub functions for missing FFI calls:
```js
function _TVMFFIEnvSetStream() { return 0; }
_TVMFFIEnvSetStream.stub = true;
function _TVMFFIEnvGetStream() { return 0; }
_TVMFFIEnvGetStream.stub = true;
```

Status: Partial success - addresses the immediate `LinkError` but may have other incompatibilities.
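For reference, a less invasive way to apply such stubs (instead of editing the bundled runtime) is to wrap the global `WebAssembly.instantiate` before the engine loads the model library. This is only a sketch: it assumes the runtime instantiates through the global API, and the stubbed names come from the error messages above, not from any official list:

```ts
// Sketch: supply no-op stubs for env imports the 0.2.80 runtime lacks, by
// wrapping the global WebAssembly.instantiate before engine creation.
const STUBBED_ENV_IMPORTS = ["TVMFFIEnvSetStream", "TVMFFIEnvGetStream"];
const originalInstantiate = WebAssembly.instantiate.bind(WebAssembly);

(WebAssembly as any).instantiate = (source: any, imports: any = {}) => {
  imports.env = imports.env ?? {};
  for (const name of STUBBED_ENV_IMPORTS) {
    if (typeof imports.env[name] !== "function") {
      imports.env[name] = () => 0; // no-op stub, same idea as the patch above
    }
  }
  return originalInstantiate(source, imports);
};
```

This keeps the published runtime bundle untouched, but as noted, stubbing the stream functions may only defer other incompatibilities.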
Compilation Command Reference
```bash
python -m mlc_llm compile MODEL_PATH \
  --device webgpu \
  --quantization q4f16_1 \
  --overrides "context_window_size=32768;prefill_chunk_size=4096;max_batch_size=1" \
  -o MODEL-ctx32k-webgpu.wasm
```
Environment
- web-llm version: 0.2.80
- @mlc-ai/web-runtime: 0.23.0-dev1
- Target: WebGPU (browsers, Electron/Obsidian)
- Host OS: macOS (for testing)
Additional Context
The WebLLM project is excellent and I want to use it for local LLM inference in Obsidian. The 4K context limitation is the primary blocker for production use. I'm happy to test pre-release builds if you can point me to a working compilation pipeline.
References
- Related issues: #373 ([MLC-LLM] Uncaught (in promise) LinkError: WebAssembly.instantiate(): Import #4 "env") and #633 (Compiling Custom Model Fails to Load Into Web-LLM), both about custom WASM LinkError issues
- PR #158 in binary-mlc-llm-libs (documents the exact commits used)
- MLC-LLM compilation docs: https://llm.mlc.ai/docs/compilation/compile_models.html
Thank you for considering this request!