Request-based horizontal pod autoscaling #573

Description

Currently, the user must tune an API's CPU request for horizontal pod autoscaling to behave as expected. An approach based on concurrent requests per container may be better (similar to what Knative uses).
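A minimal sketch of what a concurrency-based target could look like, assuming the autoscaler can observe the total number of in-flight requests across all replicas of an API. The function name and the `target_per_replica` parameter are hypothetical and only illustrate the idea; this is not the project's API.

```python
import math


def replicas_for_concurrency(total_in_flight: int, target_per_replica: int) -> int:
    """Return the replica count needed to keep each replica at or below
    target_per_replica concurrent requests (hypothetical helper)."""
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    # Round up so no replica exceeds its target; never go below one replica.
    return max(1, math.ceil(total_in_flight / target_per_replica))
```

With a target like this, the user states how many concurrent requests one replica should handle instead of tuning a CPU request until the autoscaler happens to behave.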

This would also make autoscaling for GPU workloads behave more as expected.

It may make sense to have request-based and CPU/GPU-based autoscaling active at the same time: the API scales up when either threshold is met and does not scale back down until both metrics have backed off.
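One way to get this behavior is to compute a desired replica count per metric and take the largest, which is how the Kubernetes Horizontal Pod Autoscaler combines multiple metrics. The sketch below assumes each metric is reported as an average per replica; the function and metric names are hypothetical, and it omits the tolerance and stabilization windows a real autoscaler would apply.

```python
import math


def combined_replicas(current_replicas: int,
                      observed: dict[str, float],
                      targets: dict[str, float]) -> int:
    """Recommend a replica count from several per-replica metrics.

    Each metric proposes ceil(current * observed / target); the largest
    proposal wins, so one hot metric is enough to scale up, and the count
    only drops once every metric proposes fewer replicas.
    """
    proposals = [
        math.ceil(current_replicas * observed[name] / targets[name])
        for name in targets
    ]
    return max(1, max(proposals))


# Hypothetical targets: 10 concurrent requests and 80% CPU per replica.
targets = {"concurrency": 10, "cpu_percent": 80}
# Concurrency is over target while CPU is idle, so the count still goes up.
scaled_up = combined_replicas(4, {"concurrency": 15, "cpu_percent": 40}, targets)
# Both metrics have backed off, so the count is allowed to drop.
scaled_down = combined_replicas(4, {"concurrency": 5, "cpu_percent": 40}, targets)
```

Taking the maximum means neither metric needs special-casing: a GPU or CPU target can be added to `targets` alongside the concurrency target without changing the decision logic.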
