Mark Zlamal

CockroachDB on OpenShift: Separate your logs from data!

CockroachDB and persistent volumes

When deployed on Kubernetes or OpenShift, CockroachDB uses persistent volumes (PVs) to store DB data, metadata, state data, user data, log files, and configuration files. These volumes are typically file-system mounts mapped to disks/SSDs where the data is physically saved in a distributed fashion. When you operate CockroachDB and run queries, those operations translate into frequent or continuous disk reads and writes.
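If you want to see these volumes on a running cluster, the claims and their backing PVs are visible through kubectl (or oc on OpenShift). A quick sketch; the label selector assumes a Helm-style install and the namespace is the one from my cluster, so adjust both to your environment:

```bash
# List the PVCs created for the CockroachDB pods
kubectl get pvc -n mz-helm-v11 -l app.kubernetes.io/name=cockroachdb

# Show the backing PVs and their storage classes
kubectl get pv
```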

Managing the disk: IOPS & throughput

On cloud-managed orchestrators, reading or writing data on a PV consumes IOPS and uses some of the available IO throughput. These are limiting factors that can result in bandwidth saturation or, worse, throttling by the cloud provider under heavier workloads. This condition can be identified by the combination of low CPU usage and high disk latencies, visible in the hardware dashboard charts of the CockroachDB DB Console.
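If you want to look at the raw numbers behind those charts, each node also exports Prometheus metrics over its HTTP port. A hedged sketch: the port, pod, and namespace names match the StatefulSet shown later in this post, the exact metric names vary between CockroachDB versions, and depending on your version and settings the endpoint may require authentication:

```bash
# Forward the DB Console/HTTP port of one node to your workstation
kubectl port-forward -n mz-helm-v11 pod/zlamal-cockroachdb-0 8081:8081 &

# Scrape the Prometheus endpoint and pick out the host disk metrics
# that feed the hardware dashboard (read/write time, IOPS in progress, ...)
curl -sk https://localhost:8081/_status/vars | grep sys_host_disk
```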

Divide & conquer

To overcome these limitations, CockroachDB lets you take advantage of multiple, independent PVs to separate the destinations of the cockroach runtime data. Logging is a good candidate to move out of the critical path by giving it its own dedicated volume/storage. This also helps with performance tuning, since your SQL data and schemas live on their own dedicated volume. In fact, the production-readiness recommendation is to split the data and the logs into separate PVs.

Typical CockroachDB deployments

Most CockroachDB clusters implement a single PVC assigned to each node in a StatefulSet. Default configurations in both Helm- and Operator-managed environments create this 1:1 mapping as follows:

Default PV/PVC relationship between nodes and volumes

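For reference, the stock template carries a single claim along these lines (taken from the datadir claim in the full manifest later in this post; sizes and labels will differ per install):

```yaml
# Default: one "datadir" claim per node, holding data, logs, and everything else
volumeClaimTemplates:
  - kind: PersistentVolumeClaim
    apiVersion: v1
    metadata:
      name: datadir
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
      volumeMode: Filesystem
```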

Our planned deployment with multiple PVs

By introducing a second PV dedicated to logs, we split the workload, effectively doubling the IO channels and allowing each to be configured independently. Storage for logs can be significantly smaller than the cockroach-data PV, since logs are rotated/truncated while your business data grows over time. This illustration highlights the logical infrastructure layout between nodes and PVs:

Multiple PV/PVCs assigned to each node
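Because the logs claim is independent, it can request less storage than the data claim, and on most clouds a different storage class. A hedged sketch; the storage class name is purely illustrative:

```yaml
# Hypothetical: a smaller claim dedicated to logs
- kind: PersistentVolumeClaim
  apiVersion: v1
  metadata:
    name: logsdir
  spec:
    storageClassName: standard-csi   # illustrative; use a class available on your cluster
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 5Gi                 # logs rotate, so this can stay small
```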

…to the implementation

We need to make additions to the StatefulSet template, along with custom log-configuration settings, to direct CockroachDB logs into the new destination PV.

The logging “secret” configuration

This resource is the one-stop shop for all your customized logging properties: log sinks (which send logs to different destinations, including over the network), the logging channels mapped to each sink, the format of log messages, redaction flags, and message buffering and maximum sizes.
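For example, beyond plain file groups, the same document can declare network sinks and per-channel routing. A hedged illustration only; the sink name and address are placeholders, and the exact schema is covered in the log configuration docs referenced at the end of this post:

```yaml
# Illustrative only: route operational channels to a Fluentd-compatible collector
# while everything else keeps going to files
sinks:
  fluent-servers:
    ops-collector:                        # arbitrary sink name
      channels: [OPS, HEALTH]
      address: fluentd.logging.svc:5170   # placeholder address
      net: tcp
  file-groups:
    default:
      channels: all
```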

The following log configuration is the smallest/simplest configuration we will use as a starting point. We keep most defaults and only adjust the file-defaults destination path for the log files; this path will be mounted to a separate PV defined in the StatefulSet template.

```yaml
file-defaults:
  dir: /cockroach/cockroach-logs
sinks:
  file-groups:
    default:
      channels:
        - ALL
```

For a comprehensive explanation of this fragment, along with working examples and code fragments, please refer to the Cockroach log configuration documentation so you can tailor the logging to your needs.
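The StatefulSet below expects this configuration to live in a secret named zlamal-cockroachdb-log-config, with the content under a log-config.yaml key (matching the mounted path /cockroach/log-config/log-config.yaml). One way to create it, assuming the YAML above is saved locally as log-config.yaml and using the namespace from my cluster:

```bash
# Create the logging secret consumed by the StatefulSet's log-config volume
kubectl create secret generic zlamal-cockroachdb-log-config \
  --from-file=log-config.yaml=./log-config.yaml \
  -n mz-helm-v11
```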

The StatefulSet template configuration

This StatefulSet fragment highlights only the added template properties that define the PVC and the specific mount points for both the log-config secret and the new logs folder. A complete StatefulSet example follows this fragment to show the entirety of an actual solution I deployed.

```yaml
kind: StatefulSet
apiVersion: apps/v1
spec:
  volumeClaimTemplates:
    # ...
    # ...
    # Fragment 1
    # New volumeClaimTemplate to generate Log PVC & PV
    - kind: PersistentVolumeClaim
      apiVersion: v1
      metadata:
        name: logsdir
        labels:
          app.kubernetes.io/instance: zlamal
          app.kubernetes.io/name: cockroachdb
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
        volumeMode: Filesystem
  template:
    spec:
      containers:
        - # ...
          # ...
          volumeMounts:
            # ...
            # ...
            # Fragment 2
            # Additional mount-points for path to logs and log-config
            - name: logsdir
              mountPath: /cockroach/cockroach-logs/
            - name: log-config
              readOnly: true
              mountPath: /cockroach/log-config
          # Fragment 3
          # Addition of a new "cockroach start" parameter --log-config-file=...
          # This parameter points CRDB to the mounted log-config secret
          args:
            - shell
            - '-ecx'
            - |-
              exec /cockroach/cockroach start --log-config-file=/cockroach/log-config/log-config.yaml --join=... --advertise-host=... --certs-dir=/cockroach/cockroach-certs/ --http-port=8081 --port=26257 --cache=11% --max-sql-memory=10%
      volumes:
        - name: datadir
          persistentVolumeClaim:
            claimName: datadir
        # Fragment 4
        # Establish the logical YAML reference to the logging directory
        - name: logsdir
          persistentVolumeClaim:
            claimName: logsdir
        # Fragment 5
        # Establish logical YAML reference to the log-config secret resource
        - name: log-config
          secret:
            secretName: zlamal-cockroachdb-log-config
            defaultMode: 420
      # ...
      # ...
```
Note the “Fragment 1, 2, 3, 4, 5” additions to the StatefulSet

Here is the complete StatefulSet with these changes, including tags/labels specific to my cluster, as a reference example that you can copy and edit to make your own (e.g. sizes, storage classes, IOPS, tags/labels, etc.):

```yaml
kind: StatefulSet
apiVersion: apps/v1
metadata:
  name: zlamal-cockroachdb
  labels:
    app.kubernetes.io/component: cockroachdb
    app.kubernetes.io/instance: zlamal
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: cockroachdb
    helm.sh/chart: cockroachdb-14.0.4
spec:
  serviceName: zlamal-cockroachdb
  volumeClaimTemplates:
    - kind: PersistentVolumeClaim
      apiVersion: v1
      metadata:
        name: datadir
        labels:
          app.kubernetes.io/instance: zlamal
          app.kubernetes.io/name: cockroachdb
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
        volumeMode: Filesystem
    - kind: PersistentVolumeClaim
      apiVersion: v1
      metadata:
        name: logsdir
        labels:
          app.kubernetes.io/instance: zlamal
          app.kubernetes.io/name: cockroachdb
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
        volumeMode: Filesystem
  template:
    metadata:
      labels:
        app.kubernetes.io/component: cockroachdb
        app.kubernetes.io/instance: zlamal
        app.kubernetes.io/name: cockroachdb
    spec:
      restartPolicy: Always
      initContainers:
        - resources: {}
          terminationMessagePath: /dev/termination-log
          name: copy-certs
          command:
            - /bin/sh
            - '-c'
            - cp -f /certs/* /cockroach-certs/; chmod 0400 /cockroach-certs/*.key
          env:
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.namespace
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - name: certs
              mountPath: /cockroach-certs/
            - name: certs-secret
              mountPath: /certs/
          terminationMessagePolicy: File
          image: busybox
      serviceAccountName: zlamal-cockroachdb
      schedulerName: default-scheduler
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app.kubernetes.io/component: cockroachdb
                    app.kubernetes.io/instance: zlamal
                    app.kubernetes.io/name: cockroachdb
                topologyKey: kubernetes.io/hostname
      terminationGracePeriodSeconds: 300
      securityContext: {}
      containers:
        - resources: {}
          readinessProbe:
            httpGet:
              path: /health?ready=1
              port: http
              scheme: HTTPS
            initialDelaySeconds: 10
            timeoutSeconds: 1
            periodSeconds: 5
            successThreshold: 1
            failureThreshold: 2
          terminationMessagePath: /dev/termination-log
          name: db
          livenessProbe:
            httpGet:
              path: /health
              port: http
              scheme: HTTPS
            initialDelaySeconds: 30
            timeoutSeconds: 1
            periodSeconds: 5
            successThreshold: 1
            failureThreshold: 3
          env:
            - name: STATEFULSET_NAME
              value: zlamal-cockroachdb
            - name: STATEFULSET_FQDN
              value: zlamal-cockroachdb.mz-helm-v11.svc.cluster.local
            - name: COCKROACH_CHANNEL
              value: kubernetes-helm
          ports:
            - name: grpc
              containerPort: 26257
              protocol: TCP
            - name: http
              containerPort: 8081
              protocol: TCP
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - name: datadir
              mountPath: /cockroach/cockroach-data/
            - name: logsdir
              mountPath: /cockroach/cockroach-logs/
            - name: log-config
              readOnly: true
              mountPath: /cockroach/log-config
            - name: certs
              mountPath: /cockroach/cockroach-certs/
          terminationMessagePolicy: File
          image: 'cockroachdb/cockroach:v23.2.1'
          args:
            - shell
            - '-ecx'
            - |-
              exec /cockroach/cockroach start --log-config-file=/cockroach/log-config/log-config.yaml --join=${STATEFULSET_NAME}-0.${STATEFULSET_FQDN}:26257,${STATEFULSET_NAME}-1.${STATEFULSET_FQDN}:26257,${STATEFULSET_NAME}-2.${STATEFULSET_FQDN}:26257 --advertise-host=$(hostname).${STATEFULSET_FQDN} --certs-dir=/cockroach/cockroach-certs/ --http-port=8081 --port=26257 --cache=11% --max-sql-memory=10%
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app.kubernetes.io/component: cockroachdb
              app.kubernetes.io/instance: zlamal
              app.kubernetes.io/name: cockroachdb
      serviceAccount: zlamal-cockroachdb
      volumes:
        - name: datadir
          persistentVolumeClaim:
            claimName: datadir
        - name: logsdir
          persistentVolumeClaim:
            claimName: logsdir
        - name: log-config
          secret:
            secretName: zlamal-cockroachdb-log-config
            defaultMode: 420
        - name: certs
          emptyDir: {}
        - name: certs-secret
          projected:
            sources:
              - secret:
                  name: zlamal-cockroachdb-node-secret
                  items:
                    - key: ca.crt
                      path: ca.crt
                      mode: 256
                    - key: tls.crt
                      path: node.crt
                      mode: 256
                    - key: tls.key
                      path: node.key
                      mode: 256
            defaultMode: 420
      dnsPolicy: ClusterFirst
  podManagementPolicy: Parallel
  replicas: 3
  updateStrategy:
    type: RollingUpdate
  selector:
    matchLabels:
      app.kubernetes.io/component: cockroachdb
      app.kubernetes.io/instance: zlamal
      app.kubernetes.io/name: cockroachdb
```
Note how the logical volume names (datadir, logsdir, log-config) tie the claim templates, volume mounts, and volumes entries together.
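One way to roll this out and confirm it worked, as a hedged sketch: Kubernetes rejects in-place edits to volumeClaimTemplates, so an existing StatefulSet typically has to be deleted non-cascading and re-created (the manifest filename below is illustrative, and the pod/namespace names match my cluster):

```bash
# volumeClaimTemplates are immutable on a live StatefulSet, so remove the object
# while leaving the pods and PVCs in place, then re-create it with the new template
kubectl delete statefulset zlamal-cockroachdb -n mz-helm-v11 --cascade=orphan
kubectl apply -n mz-helm-v11 -f zlamal-cockroachdb-statefulset.yaml

# Each pod should now get a logsdir PVC next to its datadir PVC
kubectl get pvc -n mz-helm-v11 | grep -E 'datadir|logsdir'

# And CockroachDB log files should land on the dedicated volume
kubectl exec -n mz-helm-v11 zlamal-cockroachdb-0 -c db -- ls /cockroach/cockroach-logs
```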

Conclusion & References

This is a versatile addition to the standard StatefulSet: IOPS can be managed independently per PV, and the plumbing is in place for log customization. DB admins can easily change the logging channels in a running environment by editing a single log-config file saved as a Secret object.
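As a sketch of that workflow (the secret, StatefulSet, and namespace names match the manifest above; CockroachDB reads --log-config-file at startup, so a rolling restart is needed to pick up changes):

```bash
# Regenerate the logging secret from the edited log-config.yaml
kubectl create secret generic zlamal-cockroachdb-log-config \
  --from-file=log-config.yaml=./log-config.yaml \
  -n mz-helm-v11 --dry-run=client -o yaml | kubectl apply -f -

# Restart the CockroachDB pods one at a time so each node re-reads the config
kubectl rollout restart statefulset/zlamal-cockroachdb -n mz-helm-v11
```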

Cockroach Logging Overview
Cockroach log configuration
Cockroach start: logging
Production recommendations

Top comments (2)

Jim Hatcher

Mark, I'm guessing you could take a similar approach to having multiple data store devices?

Mark Zlamal

Yes indeed! Adding additional data-stores is an ideal solution to address several use-cases (a rough sketch of the wiring follows this list):

  1. CRDB on high-vCPU worker-nodes: Per our production readiness guidelines, we do not recommend workers with more than 32 vCPUs. If you're bound to servers of that size, an additional store lets CockroachDB put the extra compute to work, since per-store activities such as GC, compactions, replica management, WAL writes, and monitoring run in parallel across stores. In the end you will leverage the additional CPU and see shorter wait times on I/O operations.

  2. You can create custom stores dedicated to specialized activities, such as encryption at rest for a subset of your data. In most cases there is a performance cost to encrypting/decrypting data, and you may not want to pay it for the entirety of your data, maybe just a few tables holding PII. This is nicely written up with tangible examples in this blog: cockroachlabs.com/blog/selective-e...
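As a rough sketch of the wiring for an extra store, the pattern mirrors the logs volume in the post above: one more claim template, one more mount, and an additional --store flag on cockroach start. The names, paths, and sizes here are illustrative, not taken from my deployment:

```yaml
# Hypothetical second data store: one more claim template...
volumeClaimTemplates:
  - kind: PersistentVolumeClaim
    apiVersion: v1
    metadata:
      name: datadir2
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
      volumeMode: Filesystem
# ...one more mount in the db container (plus a matching volumes entry,
# mirroring how datadir/logsdir are wired in the StatefulSet above)...
volumeMounts:
  - name: datadir2
    mountPath: /cockroach/cockroach-data2/
# ...and a second --store flag in the start command:
# exec /cockroach/cockroach start --store=path=/cockroach/cockroach-data --store=path=/cockroach/cockroach-data2 ...
```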