I'm passing a moderately large context (~2k tokens) to WebLLM running in a service worker in Chrome (v142).
On both NVIDIA and MLX, generation can block the calling page's GPU rendering (e.g., a hardware-accelerated canvas) for several seconds, although 2D compositing such as scrolling keeps working. I presume this is some kind of underlying WebGPU bug (and one Chromium should ultimately fix), but I also wonder whether the WebLLM code could be structured in a way that avoids it?