Inference in service worker can block webpage drawing for seconds #750

@herf

I'm passing a moderately large context (2k tokens) to WebLLM running in a service worker on Chrome (v142).

On both NVIDIA and MLX, inference can stop the calling page from drawing with the GPU (e.g., a hardware-accelerated canvas) for multiple seconds, although 2-D operations like scrolling still work. I presume this is an underlying WebGPU bug that Chromium should ultimately fix, but could the WebLLM code be structured in a way that would prevent it?
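
A rough way to put numbers on the stall is to record frame timestamps on the calling page while inference runs and look for large gaps. The sketch below is only an illustration: `summarizeFrameGaps`, `recordFrames`, and the 100 ms threshold are hypothetical names and values, not part of WebLLM or Chrome. It also measures main-thread frame callbacks only, so it may under-report a stall that happens purely on the GPU side.

```typescript
// Hypothetical helper types and functions for illustration only.
interface StallReport {
  longestGapMs: number; // largest gap between consecutive frames
  stalls: number;       // number of gaps above the threshold
}

// Summarize gaps between consecutive frame timestamps (milliseconds).
// thresholdMs is an assumed cutoff for what counts as a visible stall.
function summarizeFrameGaps(timestamps: number[], thresholdMs = 100): StallReport {
  let longestGapMs = 0;
  let stalls = 0;
  for (let i = 1; i < timestamps.length; i++) {
    const gap = timestamps[i] - timestamps[i - 1];
    if (gap > longestGapMs) longestGapMs = gap;
    if (gap > thresholdMs) stalls++;
  }
  return { longestGapMs, stalls };
}

// Collect frame timestamps for roughly durationMs.
// The scheduler is passed in so the function does not depend on DOM globals;
// in a page it would be (cb) => requestAnimationFrame(cb).
function recordFrames(
  schedule: (cb: (timeMs: number) => void) => unknown,
  durationMs: number,
): Promise<number[]> {
  return new Promise((resolve) => {
    const stamps: number[] = [];
    const tick = (timeMs: number): void => {
      stamps.push(timeMs);
      if (timeMs - stamps[0] >= durationMs) {
        resolve(stamps);
      } else {
        schedule(tick);
      }
    };
    schedule(tick);
  });
}
```

On the page, this could be started just before sending the prompt, for example `recordFrames((cb) => requestAnimationFrame(cb), 5000).then((t) => console.log(summarizeFrameGaps(t)))`, and the result compared against a run with no inference in progress.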
