Receiving Error: MultiThreadedRendezvous of RPC that terminated with: status = StatusCode.UNAVAILABLE #1150

@MostafaOmar98

Description

Hello, so we have been seeing the following error:

```
<_MultiThreadedRendezvous of RPC that terminated with:
    status = StatusCode.UNAVAILABLE
    details = "Socket closed"
    debug_error_string = "UNKNOWN:Error received from peer {grpc_message:"Socket closed", grpc_status:14, created_time:"2024-06-11T07:08:48.917638822+00:00"}"
>
```

Facts we know so far:

  1. It appears to be a transient error.
  2. It is not caused by the Spanner instance itself, but by the connection between our application and Spanner.
  3. It is not tied to one specific query or application; it happens across different queries in different services.
  4. The error can be masked with retries: https://cloud.google.com/spanner/docs/custom-timeout-and-retry. However, we see the rate of this error rise and fall in a way that looks arbitrary to us.
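Since gRPC status 14 (UNAVAILABLE) is documented as safely retryable, the usual stopgap is a retry with jittered exponential backoff. Below is a minimal stdlib-only sketch of that pattern; the names `run_with_retries` and `TransientError` are illustrative, not part of the Spanner client (in production you would instead rely on the client library's built-in retry settings or `google.api_core.retry`):

```python
import random
import time


class TransientError(Exception):
    """Stand-in for google.api_core.exceptions.ServiceUnavailable."""


def run_with_retries(fn, attempts=5, base_delay=0.1, max_delay=2.0):
    """Call fn(), retrying on TransientError with jittered exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(delay * random.uniform(0.5, 1.0))  # jitter avoids thundering herds


# Simulate a query that fails twice with UNAVAILABLE, then succeeds.
calls = {"n": 0}

def flaky_query():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("Socket closed")
    return "rows"

result = run_with_retries(flaky_query)
print(result, calls["n"])
```

This masks the symptom only; it does not explain why the rate fluctuates, which is the question this issue is asking.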

We have contacted the Google support team, and they recommended raising the issue on the client library to get more insight. We acknowledge that we can mask this transient error by implementing a retry mechanism. However, we are very interested in knowing what causes it and which factors drive its rate up or down. A very performance-critical service of ours is affected by this error, so we would like to keep the error rate minimal and stable before layering retries on top.

Environment details

  • OS type and version: Debian 12.5
  • Python version: 3.10.14
  • pip version: 24.0
  • google-cloud-spanner version: "3.46.0"

Steps to reproduce

  1. Run a query enough times for this transient error to occur

Code example

```python
# init code
client = Client("project name")
instance = client.instance("instance name")
pool = PingingPool(size=20, default_timeout=10, ping_interval=300)
self.db = instance.database(db, pool=pool)
SpannerDB.background_pool_pinging(pool)

# query execution code
query = "SELECT <> FROM <table>"
with self.db.snapshot() as snapshot:
    res = snapshot.execute_sql(query)

# background pinging pool code
def background_pool_pinging(pool):
    import threading
    import time

    def target():
        while True:
            pool.ping()
            time.sleep(10)

    background = threading.Thread(target=target, name='spanner-ping-pool')
    background.daemon = True
    background.start()
```

Stack trace

(censored internal function name/files)

```
_MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
    status = StatusCode.UNAVAILABLE
    details = "Socket closed"
    debug_error_string = "UNKNOWN:Error received from peer {grpc_message:"Socket closed", grpc_status:14, created_time:"2024-06-11T07:34:07.272354902+00:00"}"
>
File "/opt/venv/lib/python3.10/site-packages/google/api_core/grpc_helpers.py", line 170, in error_remapped_callable
    return _StreamingResponseIterator(
File "/opt/venv/lib/python3.10/site-packages/google/api_core/grpc_helpers.py", line 92, in __init__
    self._stored_first_result = next(self._wrapped)
File "grpc/_channel.py", line 541, in __next__
    return self._next()
File "grpc/_channel.py", line 967, in _next
    raise self

ServiceUnavailable: Socket closed
File "starlette/applications.py", line 124, in __call__
    await self.middleware_stack(scope, receive, send)
File "starlette/middleware/errors.py", line 184, in __call__
    raise exc
File "starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
File "starlette/middleware/base.py", line 72, in __call__
    response = await self.dispatch_func(request, call_next)
File "starlette/middleware/base.py", line 46, in call_next
    raise app_exc
File "starlette/middleware/base.py", line 36, in coro
    await self.app(scope, request.receive, send_stream.send)
File "/opt/venv/lib/python3.10/site-packages/opentelemetry/instrumentation/asgi/__init__.py", line 581, in __call__
    await self.app(scope, otel_receive, otel_send)
File "starlette/middleware/base.py", line 72, in __call__
    response = await self.dispatch_func(request, call_next)
File "********", line 149, in dispatch
    response = await call_next(request)
File "starlette/middleware/base.py", line 46, in call_next
    raise app_exc
File "starlette/middleware/base.py", line 36, in coro
    await self.app(scope, request.receive, send_stream.send)
File "starlette/middleware/exceptions.py", line 75, in __call__
    raise exc
File "starlette/middleware/exceptions.py", line 64, in __call__
    await self.app(scope, receive, sender)
File "fastapi/middleware/asyncexitstack.py", line 21, in __call__
    raise e
File "fastapi/middleware/asyncexitstack.py", line 18, in __call__
    await self.app(scope, receive, send)
File "starlette/routing.py", line 680, in __call__
    await route.handle(scope, receive, send)
File "starlette/routing.py", line 275, in handle
    await self.app(scope, receive, send)
File "starlette/routing.py", line 65, in app
    response = await func(request)
File "********", line 35, in custom_route_handler
    response = await original_route_handler(request)
File "fastapi/routing.py", line 231, in app
    raw_response = await run_endpoint_function(
File "fastapi/routing.py", line 162, in run_endpoint_function
    return await run_in_threadpool(dependant.call, **values)
File "starlette/concurrency.py", line 41, in run_in_threadpool
    return await anyio.to_thread.run_sync(func, *args)
File "anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
File "anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
File "anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
File "********", line 70, in ********
    return ********(
File "********", line 125, in ********
    ******** = ********(
File "********", line 519, in get_rocket_warehouse_legs
    spanner_ctx.spanner_conn.execute_query(
File "********", line 122, in execute_query
    return self.execute_sql(query, max_staleness_seconds, **new_kwargs)
File "********", line 118, in execute_sql
    return SpannerProxy(res)
File "********", line 21, in __new__
    first = next(it)
File "/opt/venv/lib/python3.10/site-packages/google/cloud/spanner_v1/streamed.py", line 145, in __iter__
    self._consume_next()
File "/opt/venv/lib/python3.10/site-packages/google/cloud/spanner_v1/streamed.py", line 117, in _consume_next
    response = next(self._response_iterator)
File "/opt/venv/lib/python3.10/site-packages/google/cloud/spanner_v1/snapshot.py", line 88, in _restart_on_unavailable
    iterator = method(request=request)
File "/opt/venv/lib/python3.10/site-packages/google/cloud/spanner_v1/services/spanner/client.py", line 1444, in execute_streaming_sql
    response = rpc(
File "/opt/venv/lib/python3.10/site-packages/google/api_core/gapic_v1/method.py", line 131, in __call__
    return wrapped_func(*args, **kwargs)
File "/opt/venv/lib/python3.10/site-packages/google/api_core/timeout.py", line 120, in func_with_timeout
    return func(*args, **kwargs)
File "/opt/venv/lib/python3.10/site-packages/google/api_core/grpc_helpers.py", line 174, in error_remapped_callable
    raise exceptions.from_grpc_error(exc) from exc
```

Labels

  • api: spanner — Issues related to the googleapis/python-spanner API.
  • priority: p3 — Desirable enhancement or fix. May not be included in next release.
