
Feature Request: Pre-compiled WASMs with Larger Context Windows (8K-32K+) #752

@ProfSynapse

Description


The pre-compiled model WASMs in the binary-mlc-llm-libs repository are limited to 4K context windows, which significantly limits their usefulness for production applications. Please provide pre-compiled WASMs with larger context windows (8K, 16K, and 32K) for popular models.

Background

The current pre-compiled WASMs in binary-mlc-llm-libs are compiled with context_window_size=4096. For models like Qwen3-8B that natively support 32K+ context, this is a significant limitation.
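For context, the application-side wiring is already straightforward once a larger-context library exists. Below is a minimal sketch using the web-llm 0.2.x custom-model path as I understand it; the model_id and both URLs are placeholders, not real artifacts:

```ts
import * as webllm from "@mlc-ai/web-llm";

// Sketch only: register a hypothetical 32K-context model library via a custom
// ModelRecord. The weight repo URL, model_id, and model_lib URL are placeholders.
const appConfig: webllm.AppConfig = {
  model_list: [
    {
      model: "https://huggingface.co/mlc-ai/Qwen3-8B-q4f16_1-MLC", // weights (placeholder)
      model_id: "Qwen3-8B-q4f16_1-ctx32k",                         // any unique id
      model_lib: "https://example.com/Qwen3-8B-ctx32k-webgpu.wasm",// the larger-context WASM
      overrides: { context_window_size: 32768 },
    },
  ],
};

const engine = await webllm.CreateMLCEngine("Qwen3-8B-q4f16_1-ctx32k", { appConfig });
```

The blocker is not this wiring; as far as I can tell, the runtime override cannot go past the context_window_size the library was compiled with, which is why pre-compiled 8K/16K/32K variants matter.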

Why This Matters

  1. Use Case Demand: Applications like chat interfaces, document Q&A, and code assistants benefit greatly from longer context
  2. Custom Compilation Barrier: Compiling custom WASMs requires:
    • Matching exact TVM/MLC-LLM versions (commits apache/tvm@c8515e1 and mlc-ai/mlc-llm@4084e7f for v0_2_80)
    • Complex build environment setup
    • WASM/runtime version compatibility expertise
  3. Version Mismatch Issues: Custom-compiled WASMs frequently fail with LinkError due to TVM FFI function mismatches (e.g., TVMFFIEnvSetStream: function import requires a callable)

Attempted Solutions

I attempted to compile custom WASMs with extended context following these approaches:

Approach 1: Using Latest MLC-LLM

```bash
pip install mlc-llm
python -m mlc_llm compile model --device webgpu --overrides "context_window_size=32768"
```

Result: LinkError: TVMFFIEnvSetStream: function import requires a callable

WASMs compiled with newer MLC-LLM versions import FFI functions (such as TVMFFIEnvSetStream) that the web-llm 0.2.80 runtime does not provide.
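A quick way to see this mismatch without going through a full engine load is to inspect which function imports a compiled library declares; anything the bundled runtime does not provide will surface as exactly this kind of LinkError. A small diagnostic sketch using the standard WebAssembly API (the URL is a placeholder):

```ts
// Hypothetical diagnostic (not a web-llm API): list the function imports a
// compiled model library expects, to spot runtime/FFI mismatches like
// TVMFFIEnvSetStream before loading it through web-llm.
async function listWasmImports(url: string): Promise<void> {
  const module = await WebAssembly.compileStreaming(fetch(url));
  for (const imp of WebAssembly.Module.imports(module)) {
    if (imp.kind === "function") {
      console.log(`${imp.module}.${imp.name}`);
    }
  }
}

// Example (placeholder URL): anything printed here that the runtime
// does not supply will fail to instantiate with a LinkError.
await listWasmImports("https://example.com/MODEL-ctx32k-webgpu.wasm");
```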

Approach 2: Building from Exact Commits

Checked out the exact commits from PR #158 in binary-mlc-llm-libs:

  • TVM: apache/tvm@c8515e1
  • MLC-LLM: mlc-ai/mlc-llm@4084e7f

Blockers:

  1. CMake version compatibility issues: many CMakeLists.txt files still declare VERSION 3.1 and need to be patched to 3.5 for current CMake releases (see the sketch after this list)
  2. TVM Python bindings don't match pip-installed native libs
  3. Complex dependency chain between mlc_llm → tvm Python module → native libraries
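For blocker 1, the patching itself is mechanical; a rough Node/TypeScript helper along these lines bumps the old declarations in bulk. This is a hypothetical script, not part of either repository, and the directory exclusions are guesses:

```ts
// Hypothetical helper: raise cmake_minimum_required declarations below 3.5 to 3.5,
// since newer CMake releases reject "cmake_minimum_required(VERSION 3.1)".
import { readdirSync, readFileSync, writeFileSync, statSync } from "node:fs";
import { join } from "node:path";

function patchCMakeMinimum(root: string): void {
  for (const entry of readdirSync(root)) {
    const path = join(root, entry);
    if (statSync(path).isDirectory()) {
      if (entry === ".git" || entry === "build") continue; // skip VCS and build output
      patchCMakeMinimum(path);
    } else if (entry === "CMakeLists.txt" || entry.endsWith(".cmake")) {
      const text = readFileSync(path, "utf8");
      const patched = text.replace(
        /cmake_minimum_required\s*\(\s*VERSION\s+3\.[0-4](\.\d+)?\s*\)/gi,
        "cmake_minimum_required(VERSION 3.5)"
      );
      if (patched !== text) {
        writeFileSync(path, patched);
        console.log(`patched ${path}`);
      }
    }
  }
}

patchCMakeMinimum(process.argv[2] ?? ".");
```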

Approach 3: Patching web-llm Runtime

Added stub functions for missing FFI calls:

```js
function _TVMFFIEnvSetStream() { return 0; }
_TVMFFIEnvSetStream.stub = true;

function _TVMFFIEnvGetStream() { return 0; }
_TVMFFIEnvGetStream.stub = true;
```

Status: Partial success. The stubs clear the immediate LinkError, but other runtime incompatibilities may remain.
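If anyone else tries this route, a slightly noisier variant of the same stubs makes it obvious when the missing FFI functions are actually called at runtime rather than merely linked (sketch, same caveats as the stubs above):

```ts
// Noisier variant of the stubs: warn when the missing FFI function is actually
// invoked, so silent incompatibilities at least show up in the console.
function makeFFIStub(name: string): (...args: unknown[]) => number {
  return (...args: unknown[]) => {
    console.warn(`[web-llm patch] stubbed ${name} called`, args);
    return 0; // mirror the original "return 0" stubs
  };
}

const _TVMFFIEnvSetStream = makeFFIStub("TVMFFIEnvSetStream");
const _TVMFFIEnvGetStream = makeFFIStub("TVMFFIEnvGetStream");
```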

Compilation Command Reference

```bash
python -m mlc_llm compile MODEL_PATH \
  --device webgpu \
  --quantization q4f16_1 \
  --overrides "context_window_size=32768;prefill_chunk_size=4096;max_batch_size=1" \
  -o MODEL-ctx32k-webgpu.wasm
```
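As a side note on why per-context builds matter for sizing: the f16 KV cache grows linearly with the context window. A rough calculator, using illustrative GQA shapes that are assumptions rather than values read from any real mlc-chat-config.json:

```ts
// Back-of-the-envelope f16 KV-cache size for a given context window.
// The layer/head numbers used below are illustrative assumptions; plug in the
// real values for whichever model is being compiled.
function kvCacheBytes(
  contextWindow: number,
  numLayers: number,
  numKVHeads: number,
  headDim: number,
  bytesPerElement = 2 // f16
): number {
  return 2 /* K and V */ * numLayers * contextWindow * numKVHeads * headDim * bytesPerElement;
}

// Example: a GQA model with 36 layers, 8 KV heads, head_dim 128 needs roughly
// 4.5 GiB of KV cache at 32K context vs ~0.56 GiB at 4K, so larger-context
// libraries also imply substantially higher GPU memory requirements.
console.log((kvCacheBytes(32768, 36, 8, 128) / 2 ** 30).toFixed(2), "GiB");
console.log((kvCacheBytes(4096, 36, 8, 128) / 2 ** 30).toFixed(2), "GiB");
```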

Environment

  • web-llm version: 0.2.80
  • @mlc-ai/web-runtime: 0.23.0-dev1
  • Target: WebGPU (browsers, Electron/Obsidian)
  • Host OS: macOS (for testing)

Additional Context

The WebLLM project is excellent and I want to use it for local LLM inference in Obsidian. The 4K context limitation is the primary blocker for production use. I'm happy to test pre-release builds if you can point me to a working compilation pipeline.


Thank you for considering this request!
