API proxy deployments fail with apigee-serving-cert is not found or expired

Symptoms

API proxy deployments fail with the following error messages.

Error Messages

If the TLS certificate of the apigee-webhook-service.apigee-system.svc service has expired or is not yet valid, the following error message appears in the apigee-watcher logs:

{"level":"error","ts":1687991930.7745812,"caller":"watcher/watcher.go:60", "msg":"error during watch","name":"ingress","error":"INTERNAL: INTERNAL: failed to update ApigeeRoute [org-env]-group-84a6bb5, namespace apigee: Internal error occurred: failed calling webhook \"mapigeeroute.apigee.cloud.google.com\": Post \"https://apigee-webhook-service.apigee-system.svc:443/mutate-apigee-cloud-google-com-v1alpha1-apigeeroute?timeout=30s\": x509: certificate has expired or is not yet valid: current time 2023-06-28T22:38:50Z is after 2023-06-17T17:14:13Z, INTERNAL: failed to update ApigeeRoute [org-env]-group-e7b3ff6, namespace apigee
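When scanning long watcher logs, it can help to pull out only the two timestamps from the x509 error. The following is a minimal sketch, not part of the Apigee tooling: the function name `expired_webhook_errors` and the `WATCHER_POD` placeholder are assumptions, and it handles only the "is after" (expired) form of the message shown above.

```shell
# Hypothetical helper: reads apigee-watcher log lines on stdin and prints
# "now=<current time> expired=<certificate expiry>" for each x509 expiry error.
expired_webhook_errors() {
  sed -n 's/.*x509: certificate has expired or is not yet valid: current time \([^ ]*\) is after \([^,]*\),.*/now=\1 expired=\2/p'
}

# Example (WATCHER_POD is a placeholder for your apigee-watcher pod name):
#   kubectl -n APIGEE_NAMESPACE logs WATCHER_POD | expired_webhook_errors
```

Lines that do not contain the error are dropped, so empty output means no expiry error was logged in the input.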

Possible Causes

Cause	Description
The apigee-serving-cert is not found	If the `apigee-serving-cert` is not found in the `apigee-system` namespace, this issue could occur.
Duplicate certificate requests were created for renewing `apigee-serving-cert`	If there are duplicate certificate requests created for renewing the `apigee-serving-cert` certificate, the `apigee-serving-cert` certificate may not get renewed.
cert-manager is not healthy	If `cert-manager` is not healthy, the `apigee-serving-cert` certificate may not get renewed.

Cause: The apigee-serving-cert is not found

Diagnosis

Check the availability of the apigee-serving-cert certificate in the apigee-system namespace:
```
kubectl -n apigee-system get certificates apigee-serving-cert
```
If the certificate is available, the output is similar to the following:
```
NAME                  READY   SECRET                AGE
apigee-serving-cert   True    webhook-server-cert   2d10h
```
If the apigee-serving-cert certificate is not found in the apigee-system namespace, that could be the reason for this issue.
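For scripted checks, the table output above can be reduced to a single word. This is a sketch under stated assumptions: `certificate_state` is a hypothetical helper name, and it relies on the column layout shown above (name in the first column, READY in the second).

```shell
# Hypothetical helper: reads `kubectl get certificates` table output on stdin
# and prints one of: ready, not-ready, missing.
certificate_state() {
  awk -v name="$1" '
    NR > 1 && $1 == name { found = 1; print ($2 == "True" ? "ready" : "not-ready") }
    END { if (!found) print "missing" }
  '
}

# Example:
#   kubectl -n apigee-system get certificates 2>/dev/null | certificate_state apigee-serving-cert
```

A result of `missing` corresponds to the cause described in this section; `not-ready` suggests looking at the other causes on this page.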

Resolution

Update the apigee-serving-cert using Helm:
```
helm upgrade ENV_NAME apigee-env/ \
  --namespace APIGEE_NAMESPACE \
  --set env=ENV_NAME \
  --atomic \
  -f OVERRIDES_FILE
```
Make sure to include all of the settings shown, including --atomic so that the action rolls back on failure.
Verify that the apigee-serving-cert certificate has been created:
```
kubectl -n apigee-system get certificates apigee-serving-cert
```

Cause: Duplicate certificate requests were created for renewing apigee-serving-cert

Diagnosis

Check the cert-manager controller logs for an error message about duplicate certificate requests, as follows.

List all cert-manager pods:

kubectl -n cert-manager get pods

An example output:

NAME                                       READY   STATUS    RESTARTS   AGE
cert-manager-66d9545484-772cr              1/1     Running   0          6d19h
cert-manager-cainjector-7d8b6bd6fb-fpz6r   1/1     Running   0          6d19h
cert-manager-webhook-669b96dcfd-6mnm2      1/1     Running   0          6d19h

Check cert-manager controller logs:

kubectl -n cert-manager logs cert-manager-66d9545484-772cr | grep "issuance is skipped until there are no more duplicates"

Example outputs:

1 controller.go:163] cert-manager/certificates-readiness "msg"="re-queuing item due to error processing" "error"="multiple CertificateRequests were found for the 'next' revision 3, issuance is skipped until there are no more duplicates" "key"="apigee-system/apigee-serving-cert"

1 controller.go:167] cert-manager/certificates-readiness "msg"="re-queuing item due to error processing" "error"="multiple CertificateRequests were found for the 'next' revision 683, issuance is skipped until there are no more duplicates" "key"="apigee/apigee-istiod"

If you see either of the messages shown above, the affected certificate (apigee-serving-cert or apigee-istiod-cert) will not be renewed.
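To see at a glance which certificates are blocked, the log entries can be reduced to their key and revision. This is a minimal sketch that assumes the log format shown in the example outputs above; `duplicate_request_keys` and `CERT_MANAGER_POD` are hypothetical names.

```shell
# Hypothetical helper: reads cert-manager controller log lines on stdin and
# prints "<namespace>/<certificate> <revision>" once for each certificate
# whose issuance is skipped because of duplicate CertificateRequests.
duplicate_request_keys() {
  sed -n 's/.*revision \([0-9][0-9]*\), issuance is skipped until there are no more duplicates.*"key"="\([^"]*\)".*/\2 \1/p' | sort -u
}

# Example (CERT_MANAGER_POD is a placeholder for the controller pod name):
#   kubectl -n cert-manager logs CERT_MANAGER_POD | duplicate_request_keys
```

The namespace part of each printed key indicates which namespace to inspect for duplicate certificate requests.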

List all certificate requests in the namespace shown in the log entries above (apigee-system or apigee), and check whether multiple certificate requests exist for the same revision of the apigee-serving-cert or apigee-istiod-cert certificate:
```
kubectl -n apigee-system get certificaterequests
```
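Spotting duplicates by eye is error-prone when many requests are listed. The sketch below counts requests per certificate and revision. It assumes that cert-manager records the owning certificate and revision in the `cert-manager.io/certificate-name` and `cert-manager.io/certificate-revision` annotations, and that the `custom-columns` expression in the comment resolves them in your kubectl version; verify both before relying on it. `duplicate_certificate_requests` is a hypothetical helper name.

```shell
# Hypothetical helper: reads "NAME CERT REV" rows (with a header line) on stdin
# and reports every certificate revision that has more than one request.
duplicate_certificate_requests() {
  awk '
    NR > 1 { count[$2 " " $3]++ }
    END {
      for (k in count) if (count[k] > 1) {
        split(k, p, " ")
        printf "%s revision %s has %d requests\n", p[1], p[2], count[k]
      }
    }
  '
}

# Example (annotation paths are an assumption, see the note above):
#   kubectl -n apigee-system get certificaterequests \
#     -o custom-columns='NAME:.metadata.name,CERT:.metadata.annotations.cert-manager\.io/certificate-name,REV:.metadata.annotations.cert-manager\.io/certificate-revision' \
#     | duplicate_certificate_requests
```

Empty output means that no revision has more than one request in the listed namespace.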

For background, see the related cert-manager issue "cert-manager created multiple CertificateRequest objects with the same certificate-revision".

Resolution

Note: Execute the following commands for the apigee or apigee-system namespace depending on the namespace where the duplicate certificate requests were found.

Delete all certificate requests in apigee-system namespace:

kubectl -n apigee-system delete certificaterequests --all

Verify that duplicated certificate requests have been deleted and only one certificate request is available for the apigee-serving-cert certificate in apigee-system namespace:
```
kubectl -n apigee-system get certificaterequests
```

Verify that the apigee-serving-cert certificate has been renewed:

kubectl -n apigee-system get certificates apigee-serving-cert -o yaml

An example output:

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  creationTimestamp: "2023-06-26T13:25:10Z"
  generation: 1
  name: apigee-serving-cert
  namespace: apigee-system
  resourceVersion: "11053"
  uid: e7718341-b3ca-4c93-a6d4-30cf70a33e2b
spec:
  dnsNames:
  - apigee-webhook-service.apigee-system.svc
  - apigee-webhook-service.apigee-system.svc.cluster.local
  issuerRef:
    kind: Issuer
    name: apigee-selfsigned-issuer
  secretName: webhook-server-cert
status:
  conditions:
  - lastTransitionTime: "2023-06-26T13:25:11Z"
    message: Certificate is up to date and has not expired
    observedGeneration: 1
    reason: Ready
    status: "True"
    type: Ready
  notAfter: "2023-09-24T13:25:11Z"
  notBefore: "2023-06-26T13:25:11Z"
  renewalTime: "2023-08-25T13:25:11Z"
  revision: 1
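The `notBefore` and `notAfter` fields in the status above define the validity window, which can be checked without date arithmetic. This is a sketch, assuming all timestamps are UTC and in the same `YYYY-MM-DDTHH:MM:SSZ` form, so that plain string comparison orders them correctly; `cert_valid_at` is a hypothetical helper name.

```shell
# Hypothetical helper: succeeds (exit 0) if NOW falls within [NOT_BEFORE, NOT_AFTER).
# Usage: cert_valid_at NOT_BEFORE NOT_AFTER NOW
cert_valid_at() {
  # Appending "" forces string (not numeric) comparison in awk.
  awk -v nb="$1" -v na="$2" -v now="$3" \
    'BEGIN { exit !((nb "") <= (now "") && (now "") < (na "")) }'
}

# Example:
#   nb=$(kubectl -n apigee-system get certificates apigee-serving-cert -o jsonpath='{.status.notBefore}')
#   na=$(kubectl -n apigee-system get certificates apigee-serving-cert -o jsonpath='{.status.notAfter}')
#   cert_valid_at "$nb" "$na" "$(date -u +%Y-%m-%dT%H:%M:%SZ)" && echo valid || echo "expired or not yet valid"
```

With the example values above, any time between 2023-06-26T13:25:11Z and 2023-09-24T13:25:11Z is reported as valid.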

Cause: cert-manager is not healthy

Diagnosis

Check the health of the cert-manager pods in the cert-manager namespace:

kubectl -n cert-manager get pods

If cert-manager is healthy, all of its pods are ready (1/1) and in the Running state, as in the example below. If any pod is not, an unhealthy cert-manager could be the reason for this issue:

NAME                                       READY   STATUS    RESTARTS   AGE
cert-manager-59cf78f685-mlkvx              1/1     Running   0          15d
cert-manager-cainjector-78cc865768-krjcp   1/1     Running   0          15d
cert-manager-webhook-77c4fb46b6-7g9g6      1/1     Running   0          15d
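The readiness rule described above (all containers ready and status Running) can be applied mechanically to the pod listing. This is a minimal sketch with a hypothetical helper name, `unhealthy_pods`. It also flags pods in the Completed state, so finished Job pods, if your installation has any, appear in the output even though they are harmless.

```shell
# Hypothetical helper: reads `kubectl get pods` table output on stdin and
# prints the name of every pod that is not fully ready or not Running.
unhealthy_pods() {
  awk 'NR > 1 {
    split($2, r, "/")                       # READY column, for example "1/1"
    if (r[1] != r[2] || $3 != "Running") print $1
  }'
}

# Example:
#   kubectl -n cert-manager get pods | unhealthy_pods
```

Empty output means every listed pod passed both checks.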

cert-manager can fail for many reasons. Check the cert-manager logs to identify the reason for the failure, and resolve it accordingly.

One known reason is that cert-manager fails if it cannot communicate with the Kubernetes API. In this case, an error message similar to the following is displayed:
```
E0601 00:10:27.841516 1 leaderelection.go:330] error retrieving resource lock kube-system/cert-manager-controller: Get "https://192.168.0.1:443/api/v1/namespaces/kube-system/configmaps/cert-manager-controller": dial tcp 192.168.0.1:443: i/o timeout
```
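To confirm that the failure is an API connectivity problem, the unreachable endpoint can be extracted from the controller logs. This is a sketch that assumes the log format of the example above; `api_server_timeouts` and `CERT_MANAGER_POD` are hypothetical names.

```shell
# Hypothetical helper: reads cert-manager controller log lines on stdin and
# prints each distinct host:port that timed out during leader election.
api_server_timeouts() {
  sed -n 's/.*error retrieving resource lock.*dial tcp \([^ ]*\): i\/o timeout.*/\1/p' | sort -u
}

# Example (CERT_MANAGER_POD is a placeholder for the controller pod name):
#   kubectl -n cert-manager logs CERT_MANAGER_POD | api_server_timeouts
```

If an address is printed, check whether the Kubernetes API server is reachable at that address from inside the cluster.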

Resolution

Check the health of the Kubernetes cluster and fix any issues found. See Troubleshooting Clusters.
Refer to Troubleshooting for additional cert-manager troubleshooting information.

Must gather diagnostic information

If the problem persists even after following the above instructions, gather the following diagnostic information, and then contact Google Cloud Customer Care.

Google Cloud Project ID
Apigee hybrid organization
Apigee hybrid overrides.yaml file, masking any sensitive information.

Kubernetes pod status in all namespaces:

kubectl get pods -A > kubectl-pod-status`date +%Y.%m.%d_%H.%M.%S`.txt

Kubernetes cluster-info dump:

# generate kubernetes cluster-info dump
kubectl cluster-info dump -A --output-directory=/tmp/kubectl-cluster-info-dump
# zip kubernetes cluster-info dump
zip -r kubectl-cluster-info-dump`date +%Y.%m.%d_%H.%M.%S`.zip /tmp/kubectl-cluster-info-dump/*
