- Notifications
You must be signed in to change notification settings - Fork 1.6k
Documenting APIPriorityAndFairness beta criteria #1632
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| Beta: | ||
| - Supports concurrency limiting upon long-running requests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the goal, but I also see a lot of value in having this on by default even without long running request handling.
@lavalamp would you hold beta on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with David - it would be nice to have, but I don't think it should block beta. It would already be huge win without us.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I won't block beta on this but I am looking for someone to add this. If possible I'd like it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
watch requests (iff served by the cache) is relatively "cheap" in terms of apiserver's performance. but there're certain cases where we need to police the long-runnings in protection of apiserver. added this to the non-blocking list.
| - Adequate documentation for the changes | ||
| - Minimum viable test cases mentioned in Test Plan section | ||
| Beta: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to see a discussion of
- documenting metrics for inspection (on their way to a v1 metrics API, but they don't have to be there).
- recommended alerts for the prometheus, somewhere like here https://github.com/coreos/kube-prometheus/blob/master/manifests/prometheus-rules.yaml
- eliminating rate limiting by some means. Something like Standard API qps and burst in kubeconfig file. #1629 could do. (It's hard to prove otherwise).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
documenting metrics for inspection (on their way to a v1 metrics API, but they don't have to be there).
recommended alerts for the prometheus, somewhere like here https://github.com/coreos/kube-prometheus/blob/master/manifests/prometheus-rules.yaml
is there a common repo to place these prometheus integration and related docs? cluster admins will definitely find these docs helpful if they feel like understand the system better..
eliminating rate limiting by some means. Something like #1629 could do. (It's hard to prove otherwise).
added this to blocking list. i need an approach to opt-out client-side rate-limitting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding item 1, there is already some documentation at https://kubernetes.io/docs/concepts/cluster-administration/flow-control/#observability . Is something more being requested here?
| Beta: | ||
| - Supports concurrency limiting upon long-running requests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with David - it would be nice to have, but I don't think it should block beta. It would already be huge win without us.
| - Allow constant concurrency shares in the priority-level API model | ||
| - Automatically manages versions of mandatory/suggested configuration | ||
| - Necessary e2e test | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would really like to have the LIST calls addressed though. Currently we treat "list all pods" as the same cost as "get single pod". I think this is much more important than addressing long-running requests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, and I think that's relatively easy to do.
Treat a LIST as weight = # of records requested. (unpaginated list, treat as a request for 10k items or something sufficiently punitive.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure 10k is a good number, maybe we can introduce something a bit more dynamic here (we already expose #objects per type metric (it's exposed by periodically sending Count() request to etcd - IIRC once per minute). We can use those numbers for it and make it a bit more reflecting reality - the reason I don't fully like 10k is that it may be too small (by order of magnitude) in large clusters). But it's implementation details and we should discuss it outside of this doc :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the current plan is to focus on visibility & robustness for beta, and add features afterwards.
We can add features sooner--if we can find people to do the work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently we treat "list all pods" as the same cost as "get single pod".
we've been discussing on discriminating unpaginated LIST calls from the very start --- we agreed it will be valuable for managing large clusters but didn't make it in alpha for simplicity. given the fact that unpaginated requests will eat-up/occupy more service-time in the apiserver, the current/alpha version is already applying some sort of discrimination upon unpaginated LIST calls linearly.. i think what @wojtek-t proposing is a significant/non-linear approach to punish those unproper LISTs in a way, (we can also punish those un-cached LISTs similarly).
(i just added this to non-blocking list)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's two pieces of work here:
- The mechanism for acting on the estimate of an API call's cost
- The rules that produce the cost estimation
We should make the code separate enough that it's easy for different people to work on these different tasks. It's easy to have someone go off and spend a month optimizing 2 once 1 is in place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure - (1) is much more important - we can work on tweaking (2) further on.
| - Blocking Items: | ||
| - Improving observability and robustness: finer metrics and adding debug endpoint dumping fine-grained states of the queues for priority-levels | ||
| - Opt-out client-side rate-limitting to prove the feature |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you expand this line to match the verboseness of the previous line? People who didn't see the conversation here will likely not understand what this means.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
expanded the line, PTAL
| - Blocking Items: | ||
| - Improving observability and robustness: finer metrics and adding debug endpoint dumping fine-grained states of the queues for priority-levels | ||
| - Opt-out client-side rate-limitting to prove the feature | ||
| - Necessary e2e test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What specifically should the e2e test cover?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added two requirements for e2e tests
| /lgtm |
| [APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: lavalamp, yue9944882 The full list of commands accepted by this bot can be found here. The pull request process is described here Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
…ovider authentication: direct external OIDC provider
/sig api-machinery
/kind feature
this pull is for discussing necessary criterias of moving toward beta for APIPriorityAndFairness
/cc @lavalamp @deads2k @MikeSpreitzer