Is your feature request related to a problem? Please describe.
When reusing a prompt, text encoder embeds are recomputed; this can be time-consuming for something like T5-XXL with offloading or on CPU.
Text encoder embeds are relatively small, so keeping them in memory is feasible:
```python
>>> import torch
>>> clip_l = torch.randn([1, 77, 768])
>>> t5_xxl = torch.randn([1, 512, 4096])
>>> clip_l.numel() * clip_l.dtype.itemsize
236544
>>> t5_xxl.numel() * t5_xxl.dtype.itemsize
8388608
```

Describe the solution you'd like.
An MVP would reuse the last text encoder embeds when the prompt hasn't changed; this behaviour is already supported in community UIs. Ideally it would support multiple prompts and potentially be serializable.
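A minimal sketch of the multi-prompt variant: a small insertion-ordered cache keyed by prompt string, wrapping whatever callable produces the embeds. `encode_fn` here is a hypothetical stand-in for the real text encoder call, and the eviction policy is just an illustration, not a proposed API.

```python
class CachedPromptEncoder:
    """Cache text-encoder embeds by prompt so repeated prompts skip re-encoding.

    `encode_fn` is a stand-in for the actual text encoder call
    (e.g. something that returns CLIP-L / T5-XXL embed tensors).
    """

    def __init__(self, encode_fn, max_entries=16):
        self._encode_fn = encode_fn
        self._max_entries = max_entries
        self._cache = {}  # prompt -> embeds

    def __call__(self, prompt):
        if prompt not in self._cache:
            if len(self._cache) >= self._max_entries:
                # Evict the oldest entry; dicts preserve insertion order.
                self._cache.pop(next(iter(self._cache)))
            self._cache[prompt] = self._encode_fn(prompt)
        return self._cache[prompt]


# Usage sketch with a fake encoder that records how often it runs.
calls = []

def fake_encode(prompt):
    calls.append(prompt)
    return [float(len(prompt))]  # placeholder for real embed tensors

encoder = CachedPromptEncoder(fake_encode)
encoder("a photo of a cat")
encoder("a photo of a cat")  # cache hit: fake_encode not called again
```

Because the cache holds plain prompt-to-embeds mappings, serializing it (e.g. via `torch.save` on the tensors) would be straightforward if persistence across runs is desired.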