KubernetesClientException is swallowed in LeaderElector #4246

@wangyang0918

Describe the bug

[image: screenshot of the LeaderElector implementation]

The implementation above swallows the following KubernetesClientException, and as a result subsequent renew attempts do not work properly until the renew deadline is reached. This is a serious problem when the Kubernetes cluster has multiple API servers and the one handling the renew request crashes. It does not seem to be an issue on the master branch, because KubernetesClientException is also caught there: https://github.com/fabric8io/kubernetes-client/blob/master/kubernetes-client-api/src/main/java/io/fabric8/kubernetes/client/extended/leaderelection/LeaderElector.java#L146

io.fabric8.kubernetes.client.KubernetesClientException: Operation: [get] for kind: [ConfigMap] with name: [flink-example-statemachine-cluster-config-map] in namespace: [default] failed.
	at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:206) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:167) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:90) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at io.fabric8.kubernetes.client.extended.leaderelection.resourcelock.ConfigMapLock.get(ConfigMapLock.java:55) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector.tryAcquireOrRenew(LeaderElector.java:135) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector.renew(LeaderElector.java:120) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector.lambda$renewWithTimeout$1(LeaderElector.java:104) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) [?:?]
	at java.util.concurrent.FutureTask.run(Unknown Source) [?:?]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) [?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
	at java.lang.Thread.run(Unknown Source) [?:?]
Caused by: java.net.ConnectException: Failed to connect to /10.96.0.1:443
	at org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.java:265) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.RealConnection.connect(RealConnection.java:183) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.ExchangeFinder.findConnection(ExchangeFinder.java:224) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.ExchangeFinder.findHealthyConnection(ExchangeFinder.java:108) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.ExchangeFinder.find(ExchangeFinder.java:88) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.Transmitter.newExchange(Transmitter.java:169) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:41) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.kubernetes.shaded.okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:94) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:88) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at io.fabric8.kubernetes.client.utils.BackwardsCompatibilityInterceptor.intercept(BackwardsCompatibilityInterceptor.java:133) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at io.fabric8.kubernetes.client.utils.TokenRefreshInterceptor.intercept(TokenRefreshInterceptor.java:42) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at io.fabric8.kubernetes.client.utils.ImpersonatorInterceptor.intercept(ImpersonatorInterceptor.java:68) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at io.fabric8.kubernetes.client.utils.HttpClientUtils.lambda$createApplicableInterceptors$6(HttpClientUtils.java:290) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.kubernetes.shaded.okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:229) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.kubernetes.shaded.okhttp3.RealCall.execute(RealCall.java:81) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.retryWithExponentialBackoff(OperationSupport.java:589) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:558) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:521) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:488) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:470) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleGet(BaseOperation.java:831) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:201) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	... 12 more
Caused by: java.net.ConnectException: Connection refused (Connection refused)
	at java.net.PlainSocketImpl.socketConnect(Native Method) ~[?:?]
	at java.net.AbstractPlainSocketImpl.doConnect(Unknown Source) ~[?:?]
	at java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source) ~[?:?]
	at java.net.AbstractPlainSocketImpl.connect(Unknown Source) ~[?:?]
	at java.net.SocksSocketImpl.connect(Unknown Source) ~[?:?]
	at java.net.Socket.connect(Unknown Source) ~[?:?]
	at org.apache.flink.kubernetes.shaded.okhttp3.internal.platform.Platform.connectSocket(Platform.java:130) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.java:263) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.RealConnection.connect(RealConnection.java:183) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.ExchangeFinder.findConnection(ExchangeFinder.java:224) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.ExchangeFinder.findHealthyConnection(ExchangeFinder.java:108) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.ExchangeFinder.find(ExchangeFinder.java:88) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.Transmitter.newExchange(Transmitter.java:169) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:41) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.kubernetes.shaded.okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:94) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:88) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at io.fabric8.kubernetes.client.utils.BackwardsCompatibilityInterceptor.intercept(BackwardsCompatibilityInterceptor.java:133) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at io.fabric8.kubernetes.client.utils.TokenRefreshInterceptor.intercept(TokenRefreshInterceptor.java:42) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at io.fabric8.kubernetes.client.utils.ImpersonatorInterceptor.intercept(ImpersonatorInterceptor.java:68) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at io.fabric8.kubernetes.client.utils.HttpClientUtils.lambda$createApplicableInterceptors$6(HttpClientUtils.java:290) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.kubernetes.shaded.okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:229) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.kubernetes.shaded.okhttp3.RealCall.execute(RealCall.java:81) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.retryWithExponentialBackoff(OperationSupport.java:589) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:558) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:521) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:488) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:470) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleGet(BaseOperation.java:831) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:201) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	... 12 more
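
For readers unfamiliar with how an exception can get "swallowed" in scheduled code, here is a minimal, generic Java sketch. It is not the fabric8 LeaderElector code, and whether 5.5.0 fails through exactly this mechanism is an assumption; the class name SwallowedExceptionDemo and the simulated failure are hypothetical. It relies only on the documented behavior of ScheduledExecutorService.scheduleAtFixedRate: if one execution throws, subsequent executions are silently suppressed unless the task catches the exception itself.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SwallowedExceptionDemo {

    /**
     * Schedules a periodic task whose second run throws, waits a while,
     * and returns how many times the task body was entered.
     */
    static int countRuns(boolean catchInsideTask) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger runs = new AtomicInteger();

        // Stand-in for a renew attempt; the second attempt hits a transient failure.
        Runnable renew = () -> {
            if (runs.incrementAndGet() == 2) {
                throw new IllegalStateException("simulated transient API server failure");
            }
        };
        // Same task, but the failure is caught so the next period still runs.
        Runnable guarded = () -> {
            try {
                renew.run();
            } catch (RuntimeException e) {
                // A real implementation would log here and let the next period retry.
            }
        };

        executor.scheduleAtFixedRate(catchInsideTask ? guarded : renew, 0, 20, TimeUnit.MILLISECONDS);
        try {
            Thread.sleep(300);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        executor.shutdownNow();
        return runs.get();
    }

    public static void main(String[] args) {
        // Unguarded: the task stops for good after the run that threw, with nothing logged.
        System.out.println("unguarded runs: " + countRuns(false));
        // Guarded: the task keeps running after the simulated failure.
        System.out.println("guarded runs:   " + countRuns(true));
    }
}
```

In the unguarded case nothing is printed or rethrown; the failure is only visible to code that inspects the returned future.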

Fabric8 Kubernetes Client version

5.5.0

Steps to reproduce

  1. Configure lease-duration and renew-deadline to 60s.
  2. Restart the apiserver in minikube via docker restart {container-id}.
  3. The apiserver recovers in 10s.
  4. Check the logs: "Renew deadline reached after 60 seconds while renewing lock" appears and leadership is revoked.

Expected behavior

Leadership should not be revoked, because subsequent renew attempts will succeed once the apiserver has recovered, which happens well before the renew deadline.
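
The expected behavior can be sketched as a renew loop that treats an exception from a single attempt as a failed attempt rather than a fatal error, and keeps retrying until the deadline. This is a minimal illustration, not the fabric8 implementation: RenewUntilDeadline, renewWithinDeadline, and failingFirst are hypothetical names, and the renew call is simulated by a BooleanSupplier.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

public class RenewUntilDeadline {

    /**
     * Calls tryRenew every retryPeriod until it returns true or the deadline elapses.
     * Returns true if a renew succeeded in time, false otherwise.
     */
    static boolean renewWithinDeadline(BooleanSupplier tryRenew, Duration deadline, Duration retryPeriod) {
        long start = System.nanoTime();
        do {
            try {
                if (tryRenew.getAsBoolean()) {
                    return true;
                }
            } catch (RuntimeException e) {
                // Transient failure (for example, connection refused): report it and retry.
                System.err.println("renew attempt failed, will retry: " + e.getMessage());
            }
            try {
                Thread.sleep(retryPeriod.toMillis());
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                return false;
            }
        } while (System.nanoTime() - start < deadline.toNanos());
        return false;
    }

    /** Simulated renew call that throws on the first n attempts, then succeeds. */
    static BooleanSupplier failingFirst(int n) {
        AtomicInteger attempts = new AtomicInteger();
        return () -> {
            if (attempts.incrementAndGet() <= n) {
                throw new IllegalStateException("connection refused (simulated)");
            }
            return true;
        };
    }

    public static void main(String[] args) {
        // The simulated API server is unavailable for three attempts, then recovers.
        boolean kept = renewWithinDeadline(failingFirst(3), Duration.ofSeconds(2), Duration.ofMillis(50));
        System.out.println("leadership kept: " + kept);
    }
}
```

With a 60s deadline and an apiserver that recovers in 10s, a loop of this shape would get several successful attempts in before the deadline.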

Runtime

minikube

Kubernetes API Server version

1.22.3@latest

Environment

Linux

Fabric8 Kubernetes Client Logs

No response

Additional context

No response

Labels

Waiting on feedback (issues that require feedback from the user or other community members)
