
Conversation

@jkatz (Contributor) commented Apr 5, 2020

Checklist:

  • Have you added an explanation of what your changes do and why you'd like them to be included?
  • Have you updated or added documentation for the change, as applicable?
  • Have you tested your changes on all related environments with successful results, as applicable?

Type of Changes:

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

What is the current behavior? (link to any open issues here)

Prior to this commit, one could specify the resources available to the
instances in a PostgreSQL cluster by placing a Kubernetes-format string
into the pgcluster CRD (as well as the pgreplica CRD) in a custom
format that requested both the Kubernetes Request and Limit for Memory
and CPU.

The original impetus for this change came from research showing that
setting the Limit for Memory increases the risk that a PostgreSQL
cluster enters a less-than-desirable situation where the OOM killer
takes out a PostgreSQL backend and causes every other connection
backend to fail as the PostgreSQL server enters crash recovery mode.
While the Guaranteed QoS class in Kubernetes, which requires both
Memory and CPU Limits and Requests to be set, sets the overall OOM
score adjustment very low [1] (albeit not the PostgreSQL-recommended
score of -1000 [2]), this does not affect the behavior of the Limit
itself [3] and still puts the PostgreSQL connection backends at risk of
the OOM killer and the aforementioned behavior.
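
To make the tradeoff concrete, here is a minimal Go sketch using the upstream k8s.io/api types (the values and variable names are illustrative, not from this PR): a requests-only definition versus the Limits-plus-Requests definition that Guaranteed QoS demands.

```go
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

func main() {
	// Requests only: the scheduler reserves capacity, but there is no
	// memory Limit for the kernel's OOM killer to enforce per container.
	requestsOnly := v1.ResourceRequirements{
		Requests: v1.ResourceList{
			v1.ResourceCPU:    resource.MustParse("500m"),
			v1.ResourceMemory: resource.MustParse("512Mi"),
		},
	}

	// Guaranteed QoS: Limits must equal Requests for both CPU and Memory.
	// The Pod gets a very low OOM score adjustment, but a backend that
	// exceeds the Memory Limit can still be OOM-killed.
	guaranteed := v1.ResourceRequirements{
		Requests: requestsOnly.Requests.DeepCopy(),
		Limits: v1.ResourceList{
			v1.ResourceCPU:    resource.MustParse("500m"),
			v1.ResourceMemory: resource.MustParse("512Mi"),
		},
	}

	fmt.Println(requestsOnly.Requests.Memory(), guaranteed.Limits.Memory())
}
```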

What is the new behavior (if this is a feature change)?

As such, this change began by removing the option to set the "Limit"
resources in the pgcluster CRD, which afforded the opportunity to
switch the "Resource" requests to match the container resource data
type for Kubernetes [4]; a short sketch of the resulting shape follows
the list below. However, this was extended to do a few more things to
bring the PostgreSQL Operator in line with best practices for running
PostgreSQL on Kubernetes, including:

  • Removing the "ContainerResources" attribute from the pgreplica CRD.
    This ensures all failoverable instances have the same resource
    definitions. Future work will allow for more customization around this
    (though with strong caveats on when to set a replica with mismatched
    resources from the primary). You can achieve similar behavior now with
    the clone and standby functionality

  • Removing the "Default...ContainerResources" attributes from the
    PostgreSQL Operator configuration. This provides more flexibility for
    how these Pods are deployed and managed by Kubernetes. These include:

    • "PostgreSQL containers"
    • Load Job
    • Backup Jobs
    • "rmdata" Jobs
    • pgBadger
    • pgBouncer

Future work will allow some of these Pods to have their resources set by
the PostgreSQL Operator.

  • A small modification to how CPU and Memory are displayed by
    pgo show cluster, though it will likely go unnoticed
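
As referenced above, here is a minimal sketch of the resulting requests-only shape, built on the upstream ResourceList type [4]; the struct and field names are illustrative, not the PR's exact definitions.

```go
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	"sigs.k8s.io/yaml"
)

// ClusterSpecExcerpt is a hypothetical slice of a pgcluster-style spec:
// a plain Kubernetes ResourceList that only ever populates Requests.
type ClusterSpecExcerpt struct {
	Resources v1.ResourceList `json:"resources"`
}

func main() {
	spec := ClusterSpecExcerpt{
		Resources: v1.ResourceList{
			v1.ResourceCPU:    resource.MustParse("1"),
			v1.ResourceMemory: resource.MustParse("1Gi"),
		},
	}

	// sigs.k8s.io/yaml round-trips this through the json tags above.
	out, _ := yaml.Marshal(spec)
	fmt.Print(string(out)) // prints the resources block as YAML
}
```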

Other information:

Issue: [ch7766]

[1] https://kubernetes.io/docs/tasks/administer-cluster/out-of-resource/#node-oom-behavior
[2] https://www.postgresql.org/docs/current/kernel-resources.html#LINUX-MEMORY-OVERCOMMIT
[3] https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#how-pods-with-resource-limits-are-run
[4] https://pkg.go.dev/k8s.io/api/core/v1?tab=doc#ResourceList

@jkatz jkatz force-pushed the no-limits branch 7 times, most recently from 9305f09 to 148c728 on April 6, 2020 23:36
@jkatz jkatz marked this pull request as ready for review April 7, 2020 17:47
Jonathan S. Katz added 2 commits April 7, 2020 16:05
(Commit message: identical to the pull request description above, including references [1]–[4].)
This was not being used anywhere.
@cbandy (Member) left a comment:
I like this very much. The stronger upstream types are a huge improvement and were more work than I expected. The upstream YAML parser is good, too, but it means our field tags need to change.

So much cleaned up! 👍
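
A hedged illustration of the tag change (type and field names invented here): sigs.k8s.io/yaml converts YAML to JSON and then uses encoding/json, so fields are matched via their json tags rather than yaml tags.

```go
package main

import (
	"fmt"

	"sigs.k8s.io/yaml"
)

// Hypothetical excerpt: with sigs.k8s.io/yaml, only the json tag matters.
type Storage struct {
	AccessMode string `json:"accessmode"`
	Size       string `json:"size"`
}

func main() {
	var s Storage
	// The YAML document is converted to JSON, then decoded via the json tags.
	_ = yaml.Unmarshal([]byte("accessmode: ReadWriteOnce\nsize: 1Gi\n"), &s)
	fmt.Printf("%+v\n", s) // {AccessMode:ReadWriteOnce Size:1Gi}
}
```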

@cbandy (Member):
I see Spec.ClusterName is still used quite a lot. Is this a cleanup that I should also do when I see it?

@jkatz (Contributor, Author):

@cbandy Yup...so cluster.Name invokes cluster.Spec.ClusterName which is the name of the Cluster (vs. the individual instances).

@andrewlecuyer knows the bits that set that, I'm having trouble finding it.

Collaborator:
ClusterName in the Spec has never really made sense to me since the ObjectMeta already has the name.

So from that perspective I'd def rather just start using cluster.Name (i.e. the ObjectMeta name) across the board, and hopefully eventually remove ClusterName from the spec. There are a lot of them to be cleaned up, but seems to make sense to start tackling them as we come across them.
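
A tiny sketch of the cleanup being discussed, assuming a typical CRD shape where the spec duplicates the object name (hypothetical excerpt, not the real type definitions):

```go
package main

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// PgclusterSpec duplicates the object name in a spec field.
type PgclusterSpec struct {
	ClusterName string // duplicates ObjectMeta.Name; candidate for removal
}

type Pgcluster struct {
	metav1.ObjectMeta
	Spec PgclusterSpec
}

func main() {
	c := Pgcluster{
		ObjectMeta: metav1.ObjectMeta{Name: "hippo"},
		Spec:       PgclusterSpec{ClusterName: "hippo"},
	}
	// Prefer the ObjectMeta name over the duplicated spec field.
	fmt.Println(c.Name) // instead of c.Spec.ClusterName
}
```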

Comment on lines +1025 to +1026
Member:

Is any resource.Quantity instance valid? If so, changing the type of request.BackrestCPURequest to *resource.Quantity could eliminate this call.

If it's difficult to change msgs.CreateClusterRequest, don't worry about it. 👍

@jkatz (Contributor, Author):

I could try it -- curious what happens if an invalid value is passed in when it's doing the Marshaling/Unmarshaling 🤔 Not sure I want to test it now
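
For the curious, a quick standalone sketch (not this PR's code) of what the marshaling path does with bad input: resource.ParseQuantity validates a string up front, and an invalid value surfaces as an error from UnmarshalJSON rather than a silently bad Quantity.

```go
package main

import (
	"encoding/json"
	"fmt"

	"k8s.io/apimachinery/pkg/api/resource"
)

func main() {
	// Explicit validation of a user-supplied string.
	q, err := resource.ParseQuantity("512Mi")
	fmt.Println(q.String(), err) // 512Mi <nil>

	// Unmarshaling an invalid value fails instead of producing garbage,
	// so a *resource.Quantity field would be rejected at decode time.
	var bad resource.Quantity
	err = json.Unmarshal([]byte(`"not-a-quantity"`), &bad)
	fmt.Println(err) // quantities must match the regular expression ...
}
```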

Jonathan S. Katz added 7 commits April 8, 2020 13:28
There are cases where it may be helpful to control the container resource utilization for the pgBouncer Deployments. As such, this modifies the pgcluster CRD to store the resource values pertaining to the pgBouncer Deployments. This also adds new flags to the PostgreSQL Operator client to manage the CPU / memory resources, including:

  • `pgo create cluster --pgbouncer-cpu`
  • `pgo create cluster --pgbouncer-memory`
  • `pgo create pgbouncer --cpu`
  • `pgo create pgbouncer --memory`
  • `pgo update pgbouncer --cpu`
  • `pgo update pgbouncer --memory`

This also fixes a bug with rotating the pgbouncer system account password, where all pgBouncer Pods would be checked for propagation vs. just the pgBouncer Pods for the specific cluster. Also there are some general code cleanups.
There are cases where it may be helpful to control the container resource utilization for the pgBackRest repository Deployment that comes with every PostgreSQL cluster that is managed by the PostgreSQL Operator. This modifies the pgcluster CRD to store the resource values pertaining to the pgBackRest repository. This also adds new flags to the PostgreSQL Operator client to manage the CPU / memory resources, including:

  • `pgo create cluster --pgbackrest-cpu`
  • `pgo create cluster --pgbackrest-memory`
  • `pgo update cluster --pgbackrest-cpu`
  • `pgo update cluster --pgbackrest-memory`

Also there are some general code cleanups.
This is now replaced with the one-off settings that are available in the PostgreSQL Operator. It is still possible to achieve similar behavior by modifying the various Job / Pod templates that are deployed in the "pgo-config" ConfigMap. However, this helps the Operator do things in a more Kubernetes-standard way.
@jkatz jkatz merged commit a169a29 into CrunchyData:master Apr 8, 2020
@jkatz jkatz deleted the no-limits branch April 22, 2020 16:26
