Closed
Labels
Waiting on feedback: Issues that require feedback from User/Other community members
Milestone
Description
Describe the bug
The above implementation swallows the following KubernetesClientException, after which subsequent renew attempts do not work properly until the renew deadline is reached. This is a serious problem when the Kubernetes cluster has multiple API servers and the one serving the renew request crashes. It does not appear to be an issue on the master branch, because KubernetesClientException is also caught there: https://github.com/fabric8io/kubernetes-client/blob/master/kubernetes-client-api/src/main/java/io/fabric8/kubernetes/client/extended/leaderelection/LeaderElector.java#L146
io.fabric8.kubernetes.client.KubernetesClientException: Operation: [get] for kind: [ConfigMap] with name: [flink-example-statemachine-cluster-config-map] in namespace: [default] failed.
    at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:206) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:167) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:90) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at io.fabric8.kubernetes.client.extended.leaderelection.resourcelock.ConfigMapLock.get(ConfigMapLock.java:55) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector.tryAcquireOrRenew(LeaderElector.java:135) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector.renew(LeaderElector.java:120) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector.lambda$renewWithTimeout$1(LeaderElector.java:104) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) [?:?]
    at java.util.concurrent.FutureTask.run(Unknown Source) [?:?]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) [?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
    at java.lang.Thread.run(Unknown Source) [?:?]
Caused by: java.net.ConnectException: Failed to connect to /10.96.0.1:443
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.java:265) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.RealConnection.connect(RealConnection.java:183) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.ExchangeFinder.findConnection(ExchangeFinder.java:224) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.ExchangeFinder.findHealthyConnection(ExchangeFinder.java:108) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.ExchangeFinder.find(ExchangeFinder.java:88) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.Transmitter.newExchange(Transmitter.java:169) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:41) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:94) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:88) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at io.fabric8.kubernetes.client.utils.BackwardsCompatibilityInterceptor.intercept(BackwardsCompatibilityInterceptor.java:133) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at io.fabric8.kubernetes.client.utils.TokenRefreshInterceptor.intercept(TokenRefreshInterceptor.java:42) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at io.fabric8.kubernetes.client.utils.ImpersonatorInterceptor.intercept(ImpersonatorInterceptor.java:68) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at io.fabric8.kubernetes.client.utils.HttpClientUtils.lambda$createApplicableInterceptors$6(HttpClientUtils.java:290) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.kubernetes.shaded.okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:229) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.kubernetes.shaded.okhttp3.RealCall.execute(RealCall.java:81) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.retryWithExponentialBackoff(OperationSupport.java:589) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:558) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:521) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:488) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:470) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleGet(BaseOperation.java:831) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:201) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    ... 12 more
Caused by: java.net.ConnectException: Connection refused (Connection refused)
    at java.net.PlainSocketImpl.socketConnect(Native Method) ~[?:?]
    at java.net.AbstractPlainSocketImpl.doConnect(Unknown Source) ~[?:?]
    at java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source) ~[?:?]
    at java.net.AbstractPlainSocketImpl.connect(Unknown Source) ~[?:?]
    at java.net.SocksSocketImpl.connect(Unknown Source) ~[?:?]
    at java.net.Socket.connect(Unknown Source) ~[?:?]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.platform.Platform.connectSocket(Platform.java:130) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.java:263) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.RealConnection.connect(RealConnection.java:183) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.ExchangeFinder.findConnection(ExchangeFinder.java:224) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.ExchangeFinder.findHealthyConnection(ExchangeFinder.java:108) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.ExchangeFinder.find(ExchangeFinder.java:88) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.Transmitter.newExchange(Transmitter.java:169) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:41) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:94) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:88) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at io.fabric8.kubernetes.client.utils.BackwardsCompatibilityInterceptor.intercept(BackwardsCompatibilityInterceptor.java:133) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at io.fabric8.kubernetes.client.utils.TokenRefreshInterceptor.intercept(TokenRefreshInterceptor.java:42) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at io.fabric8.kubernetes.client.utils.ImpersonatorInterceptor.intercept(ImpersonatorInterceptor.java:68) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at io.fabric8.kubernetes.client.utils.HttpClientUtils.lambda$createApplicableInterceptors$6(HttpClientUtils.java:290) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.kubernetes.shaded.okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:229) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.kubernetes.shaded.okhttp3.RealCall.execute(RealCall.java:81) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.retryWithExponentialBackoff(OperationSupport.java:589) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:558) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:521) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:488) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:470) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleGet(BaseOperation.java:831) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:201) ~[flink-kubernetes-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    ... 12 more
Fabric8 Kubernetes Client version
5.5.0
Steps to reproduce
- Configure the `lease-duration` and `renew-deadline` to 60s
- Restart the apiserver in minikube via `docker restart {container-id}`
- The apiserver will recover in 10s
- Get the logs `Renew deadline reached after 60 seconds while renewing lock` and leadership is revoked
Expected behavior
Leadership should not be revoked, since a subsequent renew attempt will succeed if the apiserver recovers quickly.
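The expected behavior can be illustrated with a minimal, self-contained sketch. This is not the fabric8 `LeaderElector` code: `RenewLoopSketch`, `TransientClientException`, and `renewWithinDeadline` are hypothetical names introduced here, and the flaky lock is simulated. The only point is that a failed attempt (an exception standing in for a connection error) is treated as "not renewed yet" and retried until the deadline, rather than ending the renew loop.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical sketch; NOT the fabric8 LeaderElector implementation. */
final class RenewLoopSketch {

    /** Hypothetical stand-in for a transient client-side failure (e.g. connection refused). */
    static final class TransientClientException extends RuntimeException {
        TransientClientException(String message) {
            super(message);
        }
    }

    /**
     * Calls tryRenew every retryPeriodMillis until it returns true or the
     * deadline expires. Returns true if a renew attempt succeeded in time.
     */
    static boolean renewWithinDeadline(BooleanSupplier tryRenew,
                                       long deadlineMillis,
                                       long retryPeriodMillis) {
        long deadlineNanos = System.nanoTime() + deadlineMillis * 1_000_000L;
        while (System.nanoTime() - deadlineNanos < 0) {
            try {
                if (tryRenew.getAsBoolean()) {
                    return true;
                }
            } catch (TransientClientException e) {
                // Treat the failure as an unsuccessful attempt and keep retrying.
            }
            try {
                Thread.sleep(retryPeriodMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Simulated lock: the first three attempts fail as if the apiserver were restarting.
        int[] attempts = {0};
        BooleanSupplier flakyRenew = () -> {
            if (++attempts[0] <= 3) {
                throw new TransientClientException("Connection refused");
            }
            return true;
        };
        boolean renewed = renewWithinDeadline(flakyRenew, 1_000, 10);
        System.out.println("renewed=" + renewed + " after " + attempts[0] + " attempts");
    }
}
```

In this sketch the simulated outage (three failed attempts, roughly 30 ms) is much shorter than the 1 s deadline, mirroring the report's scenario of a 10s recovery against a 60s renew deadline, so the loop ends with a successful renew.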
Runtime
minikube
Kubernetes API Server version
1.22.3@latest
Environment
Linux
Fabric8 Kubernetes Client Logs
No response
Additional context
No response
