Is your feature request related to a problem? Please describe.
When reusing a prompt, text encoder embeds are recomputed; this can be time-consuming for something like T5-XXL with offloading or on CPU.
Text encoder embeds are relatively small, so keeping them in memory is feasible:
```python
>>> import torch
>>> clip_l = torch.randn([1, 77, 768])
>>> t5_xxl = torch.randn([1, 512, 4096])
>>> clip_l.numel() * clip_l.dtype.itemsize
236544
>>> t5_xxl.numel() * t5_xxl.dtype.itemsize
8388608
```

Describe the solution you'd like.
An MVP would reuse the last text encoder embeds when the prompt hasn't changed; this behaviour is already supported in community UIs. Ideally it would support multiple prompts and potentially be serializable.
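A minimal sketch of the multi-prompt variant: a small insertion-ordered cache keyed by prompt string, wrapping whatever callable produces the embeds. `encode_fn` here is a hypothetical stand-in for the real text encoder call, and the eviction policy is just an illustration, not a proposed API.

```python
class CachedPromptEncoder:
    """Cache text-encoder embeds by prompt so repeated prompts skip re-encoding.

    `encode_fn` is a stand-in for the actual text encoder call
    (e.g. something that returns CLIP-L / T5-XXL embed tensors).
    """

    def __init__(self, encode_fn, max_entries=16):
        self._encode_fn = encode_fn
        self._max_entries = max_entries
        self._cache = {}  # prompt -> embeds

    def __call__(self, prompt):
        if prompt not in self._cache:
            if len(self._cache) >= self._max_entries:
                # Evict the oldest entry; dicts preserve insertion order.
                self._cache.pop(next(iter(self._cache)))
            self._cache[prompt] = self._encode_fn(prompt)
        return self._cache[prompt]


# Usage sketch with a fake encoder that records how often it runs.
calls = []

def fake_encode(prompt):
    calls.append(prompt)
    return [float(len(prompt))]  # placeholder for real embed tensors

encoder = CachedPromptEncoder(fake_encode)
encoder("a photo of a cat")
encoder("a photo of a cat")  # cache hit: fake_encode not called again
```

Because the cache holds plain prompt-to-embeds mappings, serializing it (e.g. via `torch.save` on the tensors) would be straightforward if persistence across runs is desired.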