Replies: 11 comments 80 replies
-
One thing to clarify here: This is just for new resources, not for new fields on graduated resources, right? |
-
@robscott would be good to also gather feedback from implementations on who will support this new |
-
Posted in slack but I'll surface this here Circling back to the discussion on which group |
-
The APIs represent a common way to solve a specific problem. If 2 implementations use the same API and 15 don't implement it, that is still better than the 2 implementations using different APIs. If it is a rarely used feature, it is also better than attempting to force the 15 others to implement it, and a good signal that the feature is not important enough to be part of the standard.
My main concern is with calling this 'experimental', like we called a lot of APIs 'alpha' before but still encouraged users to use them in production. These should be treated just like vendor-specific v1 APIs, with the main difference that more than 1 vendor is using the same API.
I used to be very enthusiastic about this proposal, but right now I think a better approach is to do nothing and let each implementation define its own v1 API, perhaps with some communication and discussion on a common behavior. Let them try different things, get feedback, wait to see adoption and convergence, and only then attempt to standardize. Defining any standard API (or protocol) by committee, before having real-world experience, never works well.
On Wed, Jan 29, 2025 at 5:55 PM Rob Scott wrote:
> Yeah this specific proposal doesn't really provide isolation for experimental *fields*. My previous proposal that did (#3106) was not particularly popular, so I'm trying to start smaller here.
-
Not sure how supporting 2 groups is 'orders of magnitude' more costly. I would expect at worst double, and no worse than what Istio does with HTTPRoute and VirtualService.
All implementations I know have vendor extensions, and I would expect they would remain supported, not thrown away, even if a subset or variant of the extension is adopted (and this is likely to be on a permanent basis for most implementations; expecting upstream to implement every single feature that any vendor needs is not viable).
Maybe the confusion is between 'experiments' (which should not be used by anyone in production) and features like FooRoute, which some vendors may adopt and maintain on their own for their customers. There is no expectation that a vendor FooRoute will go away and be replaced by an upstream FooRoute, just like there was no expectation that VirtualService would be deprecated when HTTPRoute became v1.
Since 'experiments' should never be in production, I am not at all concerned with their fate, as long as we avoid the Istio confusion between alpha and v1. The (single or multi-vendor) FooRoute can be v1 and can have more fields or behave differently, and may have its own vendor-defined support lifecycle. Since common APIs are expected to be implementable by everyone, we can't expect implementations that can support more features to drop that support.
On Fri, Jan 31, 2025 at 9:26 AM John Howard wrote:
> @mikemorris <https://github.com/mikemorris> the big concern with splitting groups (from a maintainer POV) is the cost to support 2 groups is orders of magnitude larger than to support 2 versions of the same group (since versions are identical).
> So I really don't want to get into a world where an implementation has to deal with that.
> The easy way out is... say GW v1.5 promotes FooRoute. So v1.4 had xFooRoute. v1.5 should remove xFooRoute and add FooRoute (stable). Implementations supporting v1.5 should *replace* xFooRoute support with FooRoute support.
> tl;dr "within a projects codebase"
-
In other words: I read x-k8s.io as 'eXtended', not 'eXperimental': a collection of Gateway API extensions that have a common definition across a few vendors and are as v1 as the core APIs, but can't be supported by the majority of vendors or cover too much of a corner case. If some of those extensions become common or popular, the same or a slightly different API can be added to core, of course, and the vendors will have a 2x effort to keep both.
-
On Fri, Jan 31, 2025 at 5:23 PM John Howard wrote:
> supporting 2 versions is literally 0 effort so orders of magnitude was incorrect - more like infinite 🙂 I agree vendor Foo and experimental Foo are different, and this proposal is for experimental. The difference being that as (for example) the Istio project there is a ~0% chance I am going to implement GKEFoo and a very high chance I will implement ExperimentalFoo
Right, most vendors will not implement APIs defined by another vendor, and that is the problem: each vendor defines an XXXFoo API that is slightly different from the others, all v1 and supported but subtly different. Having a common gateway-extensions.io/Foo is far better for users, if 2-3 vendors agree on a common Foo.
From the 'core' API perspective, each single-vendor extension and each multi-vendor extension is 'experimental', in the sense that there is no clear evidence it is a universal problem or consensus across all vendors. I think that is the main purpose (and benefit) of vendor extensions: they allow vendors to 'experiment' by releasing features and APIs.
If x-net.io/Foo (v1) were defined, maybe Istio, GKE, and others could implement it, users could adopt such a feature with confidence that it will be supported as a v1 API, and when a core Foo API is defined we can use the lessons learned from x-net.io/Foo and make changes as needed.
Istio implementing ExperimentalFoo, and dealing with migrations and users who (as usual) adopt ExperimentalFoo in production while ignoring the alpha and experimental labels, is indeed orders of magnitude harder than supporting an x-net.io/FooV1 or an istio.io/FooV1 API long term and not dealing with any migration.
-
Alternative Proposal
What?
I propose an alternative solution which I will call "Full Copy". In summary, this solution suggests that we move to having all APIs copied to a new experimental group, rather than just new ones.
Why?
In the past we've lived in a world where an implementation manages the CRDs. Increasingly since GA we live in a world where these APIs are deployed and managed by platforms, and more and more treated like core APIs as multiple implementations need to be using the same APIs. In this future where platforms need to serve Gateway API to multiple implementations, I do not see how we can effectively "overlay" experimental fields with the standard APIs. This means that fields may or may not be present on a "core-like" API and this is not a good situation to be in. Ultimately too, I think the complexity of a single API potentially being either the standard version or a very different experimental version on the same production cluster will drive platforms to block experimental to control API surface.
How?
High Level Steps
Other details, questions and considerations
To keep things cleanly separated we will use separate Go types. To avoid unintended drift between types and to ensure that experimental remains a true super set of standard, we will add CRD schema checks/comparisons to CI (crd-schema-checker supports this today. It can be used as a (Go) library and there's desire to move it into upstream).
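As a rough illustration of the superset check described above, here is a minimal Go sketch. It does not use crd-schema-checker or any real CRD tooling: the `Schema` type, the flat field-path representation, and the `spec.foo` field are all assumptions made for the example, and a real check would need to walk nested OpenAPI schemas.

```go
package main

import (
	"fmt"
	"sort"
)

// Schema is a deliberately simplified, hypothetical stand-in for a CRD schema:
// it maps a field path (e.g. "spec.listeners") to a type name.
type Schema map[string]string

// missingFromExperimental returns every field path that exists in standard
// but is absent from experimental, or present with a different type.
// An empty result means experimental is a superset of standard.
func missingFromExperimental(standard, experimental Schema) []string {
	var problems []string
	for path, typ := range standard {
		expTyp, ok := experimental[path]
		if !ok || expTyp != typ {
			problems = append(problems, path)
		}
	}
	sort.Strings(problems) // stable output for CI logs
	return problems
}

func main() {
	standard := Schema{
		"spec.gatewayClassName": "string",
		"spec.listeners":        "array",
	}
	experimental := Schema{
		"spec.gatewayClassName": "string",
		"spec.listeners":        "array",
		"spec.foo":              "string", // hypothetical experimental-only field
	}
	if problems := missingFromExperimental(standard, experimental); len(problems) > 0 {
		fmt.Println("experimental is not a superset of standard:", problems)
		return
	}
	fmt.Println("experimental is a superset of standard")
}
```

A CI job built on this idea would fail the build whenever the returned slice is non-empty, which is how unintended drift between the two sets of Go types could be caught early.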
We will make even stronger guarantees about this new group being "Experimental" in a more classical sense of the word, and less "Almost Standard". We will do this by updating our GEP process and "probationary period" to reflect a faster on-ramp and off-ramp. This means that we will optimize for moving things into experimental status quicker, but we also will be strict about when things need to move out (or change significantly). As such all APIs in this new group will be Platforms will be advised to avoid managing the experimental CRDs if feasible. Implementations will be advised to only enable experimental on clusters where they're certain to have clean control of the experimental CRDs.
If it meets our criteria for promotion, it gets manually copied over to the corresponding standard types in a new feature branch. To help reduce situations where "something that wasn't happening in experimental with a specific field is now happening on standard", conformance tests need to pivot over to the new standard implementation and submit reports prior to merging the feature branch into After that's complete the process continues as normal. Generators are run and the fields get published in an upcoming release.
We do not support references between standard and experimental API types. We advise implementations to keep those cleanly divided (e.g. don't support attaching an |
-
I think we are still using the word 'experiment' in a very confusing way. In your '3 kinds of experiments', for (1) you mention 'are used in production', so perhaps you mean the same thing as I do: v1 vendor extensions that are considered an 'experiment' for the core API, in the sense that other vendors and the WG wait to get feedback on how the v1 vendor feature behaves and what adoption it gets.
Assuming we mean the same thing (vendor v1 APIs that multiple vendors support, but in slightly different ways), having a common API, even if not part of this WG, is a good thing for users, and it doesn't really matter whether Gateway API is planning to adopt it in core. So (2) is the same as (1): stable v1 APIs that users can safely rely on with no risks or problems, outside of this WG but still cross-vendor.
(3) is the only tricky one. If Gateway API decides to adopt an API from category (1) or (2), there will be some pain for the user in using both the original (1/2) API and (3) at the same time, but it is reasonable to assume the vendor API will remain a superset of the Gateway API and remain supported long-term by the vendors. The only source of pain is if a user touches any 'alpha'/experimental CRD in prod, and that is something every vendor and the K8s community should do a better job of preventing and educating users about.
On Fri, Feb 7, 2025 at 11:20 AM Arko Dasgupta wrote:
> @kflynn <https://github.com/kflynn> the cost is high for both cases - full-copy and new only proposals
> Regarding Ana
> 1. if envoy gateway doesnt implement the full copy or new-only proposal, because it already supports the feature using 1. or 2. then Ana shouldnt mind, because her use case is solved
> 2. If many users voice the fact that they want to use the Gateway API version of the feature, that may be a consideration for Envoy Gateway but its a stronger signal for Gateway API to move the feature into standard
> im just trying to highlight how this situation we are in, is not a win-win one for the API group and implementations
-
I think the only real problem is legitimizing the use of unstable, experimental APIs in production environments. Any user who understands that production is not for experimentation and that no alpha API should be touched outside of test clusters will be safe. As long as implementations understand that once they declare an API or feature 'ready for production' or 'v1' they can't remove fields (but they can create new stable APIs or use v1 APIs from a 'standard' group while maintaining the API they used), users will be safe, and there is little pain for implementations.
On Mon, Feb 10, 2025 at 4:04 AM Joel Speed wrote:
> I was trying to think about what the copy all approach gains us in terms of our ability to make breaking changes, so ran through some thoughts on what it would look like with some changes on Gateway
>
> Full Copy XGateway
>
> Field is added and dropped
> Lets assume in this instance we add a field Foo, but later decide that we want to drop it in v1.N
> - On upgrade to v1.N, any cluster that has been leveraging Foo will have data populated in the field, but implementations will not read the field
> - Any write to Foo will wipe the persisted data in Foo
> - Dropping a feature *is* a breaking change, but we are in experimental, so who cares?
> - Gateway has not been affected as XGateway had the field first
>
> Field is added but we want to change the shape
> Lets assume in this instance we add a field Bar first as []string but now want []struct
> - This change *needs* a version bump, or a new name
> - So we have to move XGateway from v1alphaN to v1alphaN+1
> - If we do not bump the version, then serialization issues could ensue, which is no fun for anyone
> - We have to implement conversion between the two alpha versions and support that
> - Or because this is experimental, we bump the version but don't implement conversion
> - This leaves users to have to remove their gateway API install before they can upgrade
> - Gateway has not been affected, XGateway v1alphaN is now dead, impls need to now all move to v1alphaN+1
> - When we do eventually promote the feature, Gateway gets the correct shape first time
>
> Adding new fields to Gateway
>
> Field is added and dropped
> Lets assume in this instance we add a field Foo, but later decide that we want to drop it in v1.N
> - On upgrade to v1.N, any cluster that has been leveraging Foo will have data populated in the field, but implementations will not read the field
> - Any write to Foo will wipe the persisted data in Foo
> - Dropping the feature *is* a breaking change. This is a stable resource and while the feature is marked experimental, we don't know how all implementations are handling it(?)
> - The risk here seems way higher here than it does for the other approach, but again we would argue that the field was experimental and we reserve the right to drop it
> - Gateway can NEVER re-use this field name if we come up with some reason to later
> - How do we make sure that we never re-use the field name?
>
> Field is added but we want to change the shape
> Lets assume in this instance we add a field Bar first as []string but now want []struct
> - The field must be dropped (see above) and then introduced under a new name
> - We have no option to change the shape without breaking clients
> - Again we have a need to understand how to make sure we never use a name again
>
> So to me, there's a couple of things we probably want to work out where we are ok with this.
> For the copy all approach, when we decide we need to make a change of shape to an API, are we going to be ok bumping the API to a new version, and what's the impact of that? Would we rather just say, "name is spent" and move on in this case? And if so, how would we track spent names?
> If we go down the "name is spent" route, then the benefits of the copy all approach for existing APIs, in terms of our ability to make breaking changes are lessened. Though I do see other benefits to the copy all approach, from the end user and platform management perspective. A reduction in the dead fields problem or stale fields waiting to be pruned problem is certainly a nice benefit.
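
To make the "change the shape" case in the quoted message more concrete, here is a small Go sketch of what conversion between two alpha versions could look like when Bar moves from []string to []struct. All type names, the `Name` field, and both conversion functions are hypothetical; they are not Gateway API types, and real CRD conversion would go through a conversion webhook rather than plain functions.

```go
package main

import "fmt"

// XGatewaySpecAlphaN is a hypothetical older experimental shape,
// where Bar is a plain list of strings.
type XGatewaySpecAlphaN struct {
	Bar []string
}

// BarEntry is the hypothetical richer element type wanted in the next version.
type BarEntry struct {
	Name string
}

// XGatewaySpecAlphaNPlus1 is the hypothetical newer shape, where Bar is a list of structs.
type XGatewaySpecAlphaNPlus1 struct {
	Bar []BarEntry
}

// convertUp maps each old string onto the Name field of the new struct.
func convertUp(old XGatewaySpecAlphaN) XGatewaySpecAlphaNPlus1 {
	out := XGatewaySpecAlphaNPlus1{Bar: make([]BarEntry, 0, len(old.Bar))}
	for _, name := range old.Bar {
		out.Bar = append(out.Bar, BarEntry{Name: name})
	}
	return out
}

// convertDown keeps only Name. If BarEntry later gains more fields,
// this direction becomes lossy, which is part of the cost of supporting conversion.
func convertDown(cur XGatewaySpecAlphaNPlus1) XGatewaySpecAlphaN {
	out := XGatewaySpecAlphaN{Bar: make([]string, 0, len(cur.Bar))}
	for _, entry := range cur.Bar {
		out.Bar = append(out.Bar, entry.Name)
	}
	return out
}

func main() {
	old := XGatewaySpecAlphaN{Bar: []string{"a", "b"}}
	upgraded := convertUp(old)
	fmt.Printf("upgraded: %+v\n", upgraded)
	fmt.Printf("round trip: %+v\n", convertDown(upgraded))
}
```

The alternative raised in the quoted message, bumping the version without implementing conversion, avoids writing and maintaining both functions but leaves existing objects unreadable in the new version, which is why users would have to remove their install before upgrading.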
-
(I've been slow to write here, but I have been trying to keep up.)
We're at a point in this discussion where the various parties appear far enough apart that it's worth sitting back and asking where the disconnects are, and where we're making assumptions that aren't shared, so let me take a quick crack at getting that started -- and let me apologize in advance for any places where folks think I'm misunderstanding y'all.
We agree on a lot of things. We all seem to want to have a place for experimentation with the API to happen, we all seem to recognize the value of having real users actually use the experimental version, and I think we all agree that the standard version is the one that's most important for actual users. We also seem to agree that we're expecting managed Gateway API from the cluster providers soon, and that we don't expect the managed offerings to include the experimental channel. (Anyone who doesn't think these are points of agreement, please speak up!)
On the other hand, we seem to have a substantial disconnect around something I would've expected to be a point of agreement: whether we expect the experimental channel to be stable. That probably sounds silly on the face of it, given the many times that we've said that experimental can receive breaking changes -- but historically, that's happened only rarely, and in fact there was a point before Gateway API had a stable release where I remember many of us (including myself) downplaying the idea that experimental was unstable, because we wanted folks to use Gateway API at all. It's not hard to see this kind of history leading folks to see more stability in experimental than we've actually promised.
And now, this discussion includes folks who seem to be saying both that their users are fine using experimental, and also that they don't want to destabilize those users. To me, those two together imply that the users are fine with experimental because they believe that the experimental APIs are in fact stable APIs (apparently to the point of being in production with them).
(That may seem like a bit of a stretch, so for more context: if your users of experimental APIs are truly expecting breaking changes, then they won't consider breaking changes destabilizing. A user who finds breaking changes destabilizing is a user who didn't expect breaking changes... which means that they think they're using a stable API.)
I want to be clear here that I'm not trying to pass judgment on these users. All I'm saying here is that as long as we don't agree even on whether or not experimental is meant to be stable, we're going to talk past each other till the cows come home, and that won't really get us very far.
I think that there are other disconnects worth discussing, too, but this is the one that seems most fundamental to me. More on the others later.
-
As a follow up to #3106, I'd like to propose a smaller subset of that to address many of the same problems:
Importantly this means that when a new resource graduates to GA, users would need to manually copy their config over from x-k8s.io (alpha) to k8s.io (v1) CRs. Although this is rather onerous, I'd argue that it's preferable to the current experience which can result in confusing error messages when trying to install Gateway API standard channel. This new approach would ensure that it was always safe to install standard channel CRDs. For example, this could allow Kubernetes to include Gateway API standard channel by default (x-ref #3576).
I don't really like 3) here as it means we're not solving the full set of problems that fully separate API groups for experimental channel and standard channel would, but this more limited approach does come with some advantages. Notably, controllers would not need to support both experimental and standard channel API groups at the same time for the same resource. Instead, they could transition to the k8s.io API group at the same time the API does. If they want to support a broader set of versions, they can support both API groups for a limited period of time, but importantly any Gateway API release would only ever include a single API group (and version) per resource.
Note: All mentions of API groups here are just focused on the suffix. In all cases, they would also include the gateway.networking. prefix they already do.
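
As a rough sketch of the manual copy step mentioned above (moving a resource from the x-k8s.io group to the k8s.io group once it graduates), the following Go function rewrites an apiVersion string. The helper name `promoteAPIVersion` and the `v1alpha1` version used in the example are assumptions for illustration; the post only says "x-k8s.io (alpha)" and "k8s.io (v1)", and a real migration would also need to copy and re-apply the resource itself.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	groupPrefix        = "gateway.networking." // prefix shared by both groups, per the note above
	experimentalSuffix = "x-k8s.io"
	standardSuffix     = "k8s.io"
)

// promoteAPIVersion (hypothetical helper) rewrites an experimental apiVersion such as
// "gateway.networking.x-k8s.io/v1alpha1" to the standard group at stableVersion.
// It returns the input unchanged and false if the group is not the experimental one.
func promoteAPIVersion(apiVersion, stableVersion string) (string, bool) {
	group, _, found := strings.Cut(apiVersion, "/")
	if !found || group != groupPrefix+experimentalSuffix {
		return apiVersion, false
	}
	return groupPrefix + standardSuffix + "/" + stableVersion, true
}

func main() {
	// "v1alpha1" is an assumed alpha version name, used only for this example.
	promoted, changed := promoteAPIVersion("gateway.networking.x-k8s.io/v1alpha1", "v1")
	fmt.Println(promoted, changed)
}
```

Because each release would carry only one group per resource, a tool like this would only ever need to map in one direction, from experimental to standard.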