Skip to content

Conversation

@yue9944882
Copy link
Member

/sig api-machinery
/kind feature

this pull is for discussing necessary criterias of moving toward beta for APIPriorityAndFairness

/cc @lavalamp @deads2k @MikeSpreitzer

@k8s-ci-robot k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory labels Mar 25, 2020
@yue9944882 yue9944882 changed the title Documenting APIPriorityAndFairness beta critiria Documenting APIPriorityAndFairness beta criteria Mar 25, 2020
Beta:
- Supports concurrency limiting upon long-running requests
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the goal, but I also see a lot of value in having this on by default even without long running request handling.

@lavalamp would you hold beta on this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with David - it would be nice to have, but I don't think it should block beta. It would already be huge win without us.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I won't block beta on this but I am looking for someone to add this. If possible I'd like it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

watch requests (iff served by the cache) is relatively "cheap" in terms of apiserver's performance. but there're certain cases where we need to police the long-runnings in protection of apiserver. added this to the non-blocking list.

- Adequate documentation for the changes
- Minimum viable test cases mentioned in Test Plan section
Beta:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to see a discussion of

  1. documenting metrics for inspection (on their way to a v1 metrics API, but they don't have to be there).
  2. recommended alerts for the prometheus, somewhere like here https://github.com/coreos/kube-prometheus/blob/master/manifests/prometheus-rules.yaml
  3. eliminating rate limiting by some means. Something like Standard API qps and burst in kubeconfig file. #1629 could do. (It's hard to prove otherwise).
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

documenting metrics for inspection (on their way to a v1 metrics API, but they don't have to be there).
recommended alerts for the prometheus, somewhere like here https://github.com/coreos/kube-prometheus/blob/master/manifests/prometheus-rules.yaml

is there a common repo to place these prometheus integration and related docs? cluster admins will definitely find these docs helpful if they feel like understand the system better..

eliminating rate limiting by some means. Something like #1629 could do. (It's hard to prove otherwise).

added this to blocking list. i need an approach to opt-out client-side rate-limitting

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding item 1, there is already some documentation at https://kubernetes.io/docs/concepts/cluster-administration/flow-control/#observability . Is something more being requested here?

Beta:
- Supports concurrency limiting upon long-running requests
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with David - it would be nice to have, but I don't think it should block beta. It would already be huge win without us.

- Allow constant concurrency shares in the priority-level API model
- Automatically manages versions of mandatory/suggested configuration
- Necessary e2e test
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would really like to have the LIST calls addressed though. Currently we treat "list all pods" as the same cost as "get single pod". I think this is much more important than addressing long-running requests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, and I think that's relatively easy to do.

Treat a LIST as weight = # of records requested. (unpaginated list, treat as a request for 10k items or something sufficiently punitive.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure 10k is a good number, maybe we can introduce something a bit more dynamic here (we already expose #objects per type metric (it's exposed by periodically sending Count() request to etcd - IIRC once per minute). We can use those numbers for it and make it a bit more reflecting reality - the reason I don't fully like 10k is that it may be too small (by order of magnitude) in large clusters). But it's implementation details and we should discuss it outside of this doc :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the current plan is to focus on visibility & robustness for beta, and add features afterwards.

We can add features sooner--if we can find people to do the work.

Copy link
Member Author

@yue9944882 yue9944882 Apr 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently we treat "list all pods" as the same cost as "get single pod".

we've been discussing on discriminating unpaginated LIST calls from the very start --- we agreed it will be valuable for managing large clusters but didn't make it in alpha for simplicity. given the fact that unpaginated requests will eat-up/occupy more service-time in the apiserver, the current/alpha version is already applying some sort of discrimination upon unpaginated LIST calls linearly.. i think what @wojtek-t proposing is a significant/non-linear approach to punish those unproper LISTs in a way, (we can also punish those un-cached LISTs similarly).

(i just added this to non-blocking list)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's two pieces of work here:

  1. The mechanism for acting on the estimate of an API call's cost
  2. The rules that produce the cost estimation

We should make the code separate enough that it's easy for different people to work on these different tasks. It's easy to have someone go off and spend a month optimizing 2 once 1 is in place.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure - (1) is much more important - we can work on tweaking (2) further on.

@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Apr 7, 2020
- Blocking Items:
- Improving observability and robustness: finer metrics and adding debug endpoint dumping fine-grained states of the queues for priority-levels
- Opt-out client-side rate-limitting to prove the feature
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you expand this line to match the verboseness of the previous line? People who didn't see the conversation here will likely not understand what this means.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

expanded the line, PTAL

- Blocking Items:
- Improving observability and robustness: finer metrics and adding debug endpoint dumping fine-grained states of the queues for priority-levels
- Opt-out client-side rate-limitting to prove the feature
- Necessary e2e test
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What specifically should the e2e test cover?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added two requirements for e2e tests

@k8s-ci-robot k8s-ci-robot added the sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. label May 21, 2020
@lavalamp
Copy link
Contributor

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 21, 2020
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lavalamp, yue9944882

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 21, 2020
@k8s-ci-robot k8s-ci-robot merged commit c2aaf7a into kubernetes:master May 21, 2020
@k8s-ci-robot k8s-ci-robot added this to the v1.19 milestone May 21, 2020
RomanBednar pushed a commit to RomanBednar/enhancements that referenced this pull request Jan 31, 2025
…ovider authentication: direct external OIDC provider
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.

6 participants