[Bug]: VertexAI custom model does not pick up uploaded token #8597

@suresiva

Description

What happened?

Calls to the predict method of Vertex AI custom-deployed models use the credentials configured via the GOOGLE_APPLICATION_CREDENTIALS environment variable instead of the token file uploaded when the model was created.
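
For illustration, here is a minimal sketch (not taken from the issue) of the call pattern that hits this behaviour: a custom-deployed Vertex AI endpoint invoked through litellm with the service-account key passed explicitly via vertex_credentials. The endpoint ID, project name, and key-file path are hypothetical placeholders, and the exact model string for a custom-deployed endpoint may differ per litellm's Vertex AI docs.

```python
import litellm

# Hypothetical custom-deployed endpoint in a non-default project.
response = litellm.completion(
    model="vertex_ai/1234567890123456789",          # numeric endpoint ID (placeholder)
    messages=[{"role": "user", "content": "hello"}],
    vertex_project="secondary-vertex-project",       # hypothetical non-default project
    vertex_location="us-central1",
    vertex_credentials="/secrets/uploaded-sa-key.json",  # token file uploaded for this model
)
# Observed behaviour per this report: the underlying predict call authenticates
# with GOOGLE_APPLICATION_CREDENTIALS instead, which leads to the 403 in the
# log output below.
```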

By contrast, Vertex AI models configured to use OpenAI-like completion endpoints do use the token uploaded during model creation on the proxy UI and produce responses as expected.

We have a case where a specific model lives in a different Vertex AI project than the default one and has to be reached through the custom predict call. It would therefore be helpful if Vertex AI custom-deployed models used the token file uploaded at model creation when making predict calls (see the sketch below).
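
A hedged sketch of how the deployment could be registered so that the predict call uses the per-model token file rather than GOOGLE_APPLICATION_CREDENTIALS. The model group name comes from the log output below; the endpoint ID, project, and key-file path are hypothetical, and this Router-based config stands in for what the proxy UI stores when a token file is uploaded during model creation.

```python
from litellm import Router

router = Router(
    model_list=[
        {
            "model_name": "pco-llama3-1-8b-ft-icd-l4-predict",
            "litellm_params": {
                "model": "vertex_ai/1234567890123456789",   # hypothetical endpoint ID
                "vertex_project": "secondary-vertex-project",  # not the default project
                "vertex_location": "us-central1",
                # Expectation: this credential, not GOOGLE_APPLICATION_CREDENTIALS,
                # should be used for the aiplatform.endpoints.predict call.
                "vertex_credentials": "/secrets/secondary-project-sa.json",
            },
        }
    ]
)
```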

Relevant log output

{"message": "Trying to fallback b/w models", "level": "INFO", "timestamp": "2025-02-17T19:04:24.835020"} Traceback (most recent call last): File "/usr/lib/python3.13/site-packages/google/api_core/grpc_helpers_async.py", line 85, in __await__ response = yield from self._call.__await__() ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3.13/site-packages/grpc/aio/_call.py", line 327, in __await__ raise _create_rpc_error( ...<2 lines>... ) grpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:	status = StatusCode.PERMISSION_DENIED	details = "Permission 'aiplatform.endpoints.predict' denied on resource '//aiplatform.googleapis.com/projects/*******/locations/us-central1/endpoints/1984786713414729728' (or it may not exist)."	debug_error_string = "UNKNOWN:Error received from peer ipv4:142.250.191.138:443 {created_time:"2025-02-17T19:04:22.270203099+00:00", grpc_status:7, grpc_message:"Permission \'aiplatform.endpoints.predict\' denied on resource \'//aiplatform.googleapis.com/projects/******/locations/us-central1/endpoints/1984786713414729728\' (or it may not exist)."}" > The above exception was the direct cause of the following exception: Traceback (most recent call last): File "/usr/lib/python3.13/site-packages/litellm/main.py", line 466, in acompletion response = await init_response ^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3.13/site-packages/litellm/llms/vertex_ai/vertex_ai_non_gemini.py", line 738, in async_streaming response_obj = await llm_model.predict( ^^^^^^^^^^^^^^^^^^^^^^^^ ...<2 lines>... ) ^ File "/usr/lib/python3.13/site-packages/google/cloud/aiplatform_v1/services/prediction_service/async_client.py", line 404, in predict response = await rpc( ^^^^^^^^^^ ...<4 lines>... ) ^ File "/usr/lib/python3.13/site-packages/google/api_core/grpc_helpers_async.py", line 88, in __await__ raise exceptions.from_grpc_error(rpc_error) from rpc_error google.api_core.exceptions.PermissionDenied: 403 Permission 'aiplatform.endpoints.predict' denied on resource '//aiplatform.googleapis.com/projects/*********/locations/us-central1/endpoints/1984786713414729728' (or it may not exist). 
[reason: "IAM_PERMISSION_DENIED" domain: "aiplatform.googleapis.com" metadata { key: "resource" value: "projects/*********/locations/us-central1/endpoints/1984786713414729728" } metadata { key: "permission" value: "aiplatform.endpoints.predict" } ] During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/usr/lib/python3.13/site-packages/litellm/router.py", line 2889, in async_function_with_fallbacks response = await self.async_function_with_retries(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3.13/site-packages/litellm/router.py", line 3262, in async_function_with_retries raise original_exception File "/usr/lib/python3.13/site-packages/litellm/router.py", line 3155, in async_function_with_retries response = await self.make_call(original_function, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3.13/site-packages/litellm/router.py", line 3271, in make_call response = await response ^^^^^^^^^^^^^^ File "/usr/lib/python3.13/site-packages/litellm/router.py", line 1042, in _acompletion raise e File "/usr/lib/python3.13/site-packages/litellm/router.py", line 1001, in _acompletion response = await _response ^^^^^^^^^^^^^^^ File "/usr/lib/python3.13/site-packages/litellm/utils.py", line 1394, in wrapper_async raise e File "/usr/lib/python3.13/site-packages/litellm/utils.py", line 1253, in wrapper_async result = await original_function(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3.13/site-packages/litellm/main.py", line 485, in acompletion raise exception_type( ~~~~~~~~~~~~~~^ model=model, ^^^^^^^^^^^^ ...<3 lines>... extra_kwargs=kwargs, ^^^^^^^^^^^^^^^^^^^^ ) ^ File "/usr/lib/python3.13/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py", line 2202, in exception_type raise e File "/usr/lib/python3.13/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py", line 2178, in exception_type raise APIConnectionError( ...<8 lines>... ) litellm.exceptions.APIConnectionError: litellm.APIConnectionError: 403 Permission 'aiplatform.endpoints.predict' denied on resource '//aiplatform.googleapis.com/projects/*********/locations/us-central1/endpoints/1984786713414729728' (or it may not exist). [reason: "IAM_PERMISSION_DENIED" domain: "aiplatform.googleapis.com" metadata { key: "resource" value: "projects/*********/locations/us-central1/endpoints/1984786713414729728" } metadata { key: "permission" value: "aiplatform.endpoints.predict" } ] Traceback (most recent call last): File "/usr/lib/python3.13/site-packages/google/api_core/grpc_helpers_async.py", line 85, in __await__ response = yield from self._call.__await__() ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3.13/site-packages/grpc/aio/_call.py", line 327, in __await__ raise _create_rpc_error( ...<2 lines>... ) grpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:	status = StatusCode.PERMISSION_DENIED	details = "Permission 'aiplatform.endpoints.predict' denied on resource '//aiplatform.googleapis.com/projects/*********/locations/us-central1/endpoints/1984786713414729728' (or it may not exist)."	
debug_error_string = "UNKNOWN:Error received from peer ipv4:142.250.191.138:443 {created_time:"2025-02-17T19:04:22.270203099+00:00", grpc_status:7, grpc_message:"Permission \'aiplatform.endpoints.predict\' denied on resource \'//aiplatform.googleapis.com/projects/*********/locations/us-central1/endpoints/1984786713414729728\' (or it may not exist)."}" > The above exception was the direct cause of the following exception: Traceback (most recent call last): File "/usr/lib/python3.13/site-packages/litellm/main.py", line 466, in acompletion response = await init_response ^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3.13/site-packages/litellm/llms/vertex_ai/vertex_ai_non_gemini.py", line 738, in async_streaming response_obj = await llm_model.predict( ^^^^^^^^^^^^^^^^^^^^^^^^ ...<2 lines>... ) ^ File "/usr/lib/python3.13/site-packages/google/cloud/aiplatform_v1/services/prediction_service/async_client.py", line 404, in predict response = await rpc( ^^^^^^^^^^ ...<4 lines>... ) ^ File "/usr/lib/python3.13/site-packages/google/api_core/grpc_helpers_async.py", line 88, in __await__ raise exceptions.from_grpc_error(rpc_error) from rpc_error google.api_core.exceptions.PermissionDenied: 403 Permission 'aiplatform.endpoints.predict' denied on resource '//aiplatform.googleapis.com/projects/*********/locations/us-central1/endpoints/1984786713414729728' (or it may not exist). [reason: "IAM_PERMISSION_DENIED" domain: "aiplatform.googleapis.com" metadata { key: "resource" value: "projects/*********/locations/us-central1/endpoints/1984786713414729728" } metadata { key: "permission" value: "aiplatform.endpoints.predict" } ] LiteLLM Retried: 1 times, LiteLLM Max Retries: 2 During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/usr/lib/python3.13/site-packages/litellm/router.py", line 3010, in async_function_with_fallbacks get_fallback_model_group( ~~~~~~~~~~~~~~~~~~~~~~~~^ fallbacks=fallbacks, # if fallbacks = [{"gpt-3.5-turbo": ["claude-3-haiku"]}] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ model_group=cast(str, model_group), ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ) ^ File "/usr/lib/python3.13/site-packages/litellm/router_utils/fallback_event_handlers.py", line 61, in get_fallback_model_group if list(item.keys())[0] == model_group: # check exact match ~~~~~~~~~~~~~~~~~^^^ IndexError: list index out of range {"message": "litellm.router.py::async_function_with_fallbacks() - Error occurred while trying to do fallbacks - list index out of range\nTraceback (most recent call last):\n File \"/usr/lib/python3.13/site-packages/google/api_core/grpc_helpers_async.py\", line 85, in __await__\n response = yield from self._call.__await__()\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/usr/lib/python3.13/site-packages/grpc/aio/_call.py\", line 327, in __await__\n raise _create_rpc_error(\n ...<2 lines>...\n )\ngrpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:\n\tstatus = StatusCode.PERMISSION_DENIED\n\tdetails = \"Permission 'aiplatform.endpoints.predict' denied on resource '//aiplatform.googleapis.com/projects/*********/locations/us-central1/endpoints/1984786713414729728' (or it may not exist).\"\n\tdebug_error_string = \"UNKNOWN:Error received from peer ipv4:142.250.191.138:443 {created_time:\"2025-02-17T19:04:22.270203099+00:00\", grpc_status:7, grpc_message:\"Permission \\'aiplatform.endpoints.predict\\' denied on resource 
\\'//aiplatform.googleapis.com/projects/*********/locations/us-central1/endpoints/1984786713414729728\\' (or it may not exist).\"}\"\n>\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n File \"/usr/lib/python3.13/site-packages/litellm/main.py\", line 466, in acompletion\n response = await init_response\n ^^^^^^^^^^^^^^^^^^^\n File \"/usr/lib/python3.13/site-packages/litellm/llms/vertex_ai/vertex_ai_non_gemini.py\", line 738, in async_streaming\n response_obj = await llm_model.predict(\n ^^^^^^^^^^^^^^^^^^^^^^^^\n ...<2 lines>...\n )\n ^\n File \"/usr/lib/python3.13/site-packages/google/cloud/aiplatform_v1/services/prediction_service/async_client.py\", line 404, in predict\n response = await rpc(\n ^^^^^^^^^^\n ...<4 lines>...\n )\n ^\n File \"/usr/lib/python3.13/site-packages/google/api_core/grpc_helpers_async.py\", line 88, in __await__\n raise exceptions.from_grpc_error(rpc_error) from rpc_error\ngoogle.api_core.exceptions.PermissionDenied: 403 Permission 'aiplatform.endpoints.predict' denied on resource '//aiplatform.googleapis.com/projects/*********/locations/us-central1/endpoints/1984786713414729728' (or it may not exist). [reason: \"IAM_PERMISSION_DENIED\"\ndomain: \"aiplatform.googleapis.com\"\nmetadata {\n key: \"resource\"\n value: \"projects/*********/locations/us-central1/endpoints/1984786713414729728\"\n}\nmetadata {\n key: \"permission\"\n value: \"aiplatform.endpoints.predict\"\n}\n]\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n File \"/usr/lib/python3.13/site-packages/litellm/router.py\", line 2889, in async_function_with_fallbacks\n response = await self.async_function_with_retries(*args, **kwargs)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/usr/lib/python3.13/site-packages/litellm/router.py\", line 3262, in async_function_with_retries\n raise original_exception\n File \"/usr/lib/python3.13/site-packages/litellm/router.py\", line 3155, in async_function_with_retries\n response = await self.make_call(original_function, *args, **kwargs)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/usr/lib/python3.13/site-packages/litellm/router.py\", line 3271, in make_call\n response = await response\n ^^^^^^^^^^^^^^\n File \"/usr/lib/python3.13/site-packages/litellm/router.py\", line 1042, in _acompletion\n raise e\n File \"/usr/lib/python3.13/site-packages/litellm/router.py\", line 1001, in _acompletion\n response = await _response\n ^^^^^^^^^^^^^^^\n File \"/usr/lib/python3.13/site-packages/litellm/utils.py\", line 1394, in wrapper_async\n raise e\n File \"/usr/lib/python3.13/site-packages/litellm/utils.py\", line 1253, in wrapper_async\n result = await original_function(*args, **kwargs)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/usr/lib/python3.13/site-packages/litellm/main.py\", line 485, in acompletion\n raise exception_type(\n ~~~~~~~~~~~~~~^\n model=model,\n ^^^^^^^^^^^^\n ...<3 lines>...\n extra_kwargs=kwargs,\n ^^^^^^^^^^^^^^^^^^^^\n )\n ^\n File \"/usr/lib/python3.13/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py\", line 2202, in exception_type\n raise e\n File \"/usr/lib/python3.13/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py\", line 2178, in exception_type\n raise APIConnectionError(\n ...<8 lines>...\n )\nlitellm.exceptions.APIConnectionError: litellm.APIConnectionError: 403 Permission 'aiplatform.endpoints.predict' denied on resource 
'//aiplatform.googleapis.com/projects/*********/locations/us-central1/endpoints/1984786713414729728' (or it may not exist). [reason: \"IAM_PERMISSION_DENIED\"\ndomain: \"aiplatform.googleapis.com\"\nmetadata {\n key: \"resource\"\n value: \"projects/*********/locations/us-central1/endpoints/1984786713414729728\"\n}\nmetadata {\n key: \"permission\"\n value: \"aiplatform.endpoints.predict\"\n}\n]\nTraceback (most recent call last):\n File \"/usr/lib/python3.13/site-packages/google/api_core/grpc_helpers_async.py\", line 85, in __await__\n response = yield from self._call.__await__()\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/usr/lib/python3.13/site-packages/grpc/aio/_call.py\", line 327, in __await__\n raise _create_rpc_error(\n ...<2 lines>...\n )\ngrpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:\n\tstatus = StatusCode.PERMISSION_DENIED\n\tdetails = \"Permission 'aiplatform.endpoints.predict' denied on resource '//aiplatform.googleapis.com/projects/*********/locations/us-central1/endpoints/1984786713414729728' (or it may not exist).\"\n\tdebug_error_string = \"UNKNOWN:Error received from peer ipv4:142.250.191.138:443 {created_time:\"2025-02-17T19:04:22.270203099+00:00\", grpc_status:7, grpc_message:\"Permission \\'aiplatform.endpoints.predict\\' denied on resource \\'//aiplatform.googleapis.com/projects/*********/locations/us-central1/endpoints/1984786713414729728\\' (or it may not exist).\"}\"\n>\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n File \"/usr/lib/python3.13/site-packages/litellm/main.py\", line 466, in acompletion\n response = await init_response\n ^^^^^^^^^^^^^^^^^^^\n File \"/usr/lib/python3.13/site-packages/litellm/llms/vertex_ai/vertex_ai_non_gemini.py\", line 738, in async_streaming\n response_obj = await llm_model.predict(\n ^^^^^^^^^^^^^^^^^^^^^^^^\n ...<2 lines>...\n )\n ^\n File \"/usr/lib/python3.13/site-packages/google/cloud/aiplatform_v1/services/prediction_service/async_client.py\", line 404, in predict\n response = await rpc(\n ^^^^^^^^^^\n ...<4 lines>...\n )\n ^\n File \"/usr/lib/python3.13/site-packages/google/api_core/grpc_helpers_async.py\", line 88, in __await__\n raise exceptions.from_grpc_error(rpc_error) from rpc_error\ngoogle.api_core.exceptions.PermissionDenied: 403 Permission 'aiplatform.endpoints.predict' denied on resource '//aiplatform.googleapis.com/projects/*********/locations/us-central1/endpoints/1984786713414729728' (or it may not exist). 
[reason: \"IAM_PERMISSION_DENIED\"\ndomain: \"aiplatform.googleapis.com\"\nmetadata {\n key: \"resource\"\n value: \"projects/*********/locations/us-central1/endpoints/1984786713414729728\"\n}\nmetadata {\n key: \"permission\"\n value: \"aiplatform.endpoints.predict\"\n}\n]\n LiteLLM Retried: 1 times, LiteLLM Max Retries: 2\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n File \"/usr/lib/python3.13/site-packages/litellm/router.py\", line 3010, in async_function_with_fallbacks\n get_fallback_model_group(\n ~~~~~~~~~~~~~~~~~~~~~~~~^\n fallbacks=fallbacks, # if fallbacks = [{\"gpt-3.5-turbo\": [\"claude-3-haiku\"]}]\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n model_group=cast(str, model_group),\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n )\n ^\n File \"/usr/lib/python3.13/site-packages/litellm/router_utils/fallback_event_handlers.py\", line 61, in get_fallback_model_group\n if list(item.keys())[0] == model_group: # check exact match\n ~~~~~~~~~~~~~~~~~^^^\nIndexError: list index out of range\n\n\nDebug Information:\nCooldown Deployments=[]", "level": "ERROR", "timestamp": "2025-02-17T19:04:24.849290"} {"message": "litellm.proxy.proxy_server.chat_completion(): Exception occured - litellm.APIConnectionError: 403 Permission 'aiplatform.endpoints.predict' denied on resource '//aiplatform.googleapis.com/projects/*********/locations/us-central1/endpoints/1984786713414729728' (or it may not exist). [reason: \"IAM_PERMISSION_DENIED\"\ndomain: \"aiplatform.googleapis.com\"\nmetadata {\n key: \"resource\"\n value: \"projects/*********/locations/us-central1/endpoints/1984786713414729728\"\n}\nmetadata {\n key: \"permission\"\n value: \"aiplatform.endpoints.predict\"\n}\n]\nTraceback (most recent call last):\n File \"/usr/lib/python3.13/site-packages/google/api_core/grpc_helpers_async.py\", line 85, in __await__\n response = yield from self._call.__await__()\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/usr/lib/python3.13/site-packages/grpc/aio/_call.py\", line 327, in __await__\n raise _create_rpc_error(\n ...<2 lines>...\n )\ngrpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:\n\tstatus = StatusCode.PERMISSION_DENIED\n\tdetails = \"Permission 'aiplatform.endpoints.predict' denied on resource '//aiplatform.googleapis.com/projects/*********/locations/us-central1/endpoints/1984786713414729728' (or it may not exist).\"\n\tdebug_error_string = \"UNKNOWN:Error received from peer ipv4:142.250.191.138:443 {created_time:\"2025-02-17T19:04:22.270203099+00:00\", grpc_status:7, grpc_message:\"Permission \\'aiplatform.endpoints.predict\\' denied on resource \\'//aiplatform.googleapis.com/projects/*********/locations/us-central1/endpoints/1984786713414729728\\' (or it may not exist).\"}\"\n>\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n File \"/usr/lib/python3.13/site-packages/litellm/main.py\", line 466, in acompletion\n response = await init_response\n ^^^^^^^^^^^^^^^^^^^\n File \"/usr/lib/python3.13/site-packages/litellm/llms/vertex_ai/vertex_ai_non_gemini.py\", line 738, in async_streaming\n response_obj = await llm_model.predict(\n ^^^^^^^^^^^^^^^^^^^^^^^^\n ...<2 lines>...\n )\n ^\n File \"/usr/lib/python3.13/site-packages/google/cloud/aiplatform_v1/services/prediction_service/async_client.py\", line 404, in predict\n response = await rpc(\n ^^^^^^^^^^\n ...<4 lines>...\n )\n ^\n File 
\"/usr/lib/python3.13/site-packages/google/api_core/grpc_helpers_async.py\", line 88, in __await__\n raise exceptions.from_grpc_error(rpc_error) from rpc_error\ngoogle.api_core.exceptions.PermissionDenied: 403 Permission 'aiplatform.endpoints.predict' denied on resource '//aiplatform.googleapis.com/projects/*********/locations/us-central1/endpoints/1984786713414729728' (or it may not exist). [reason: \"IAM_PERMISSION_DENIED\"\ndomain: \"aiplatform.googleapis.com\"\nmetadata {\n key: \"resource\"\n value: \"projects/*********/locations/us-central1/endpoints/1984786713414729728\"\n}\nmetadata {\n key: \"permission\"\n value: \"aiplatform.endpoints.predict\"\n}\n]\n\nReceived Model Group=pco-llama3-1-8b-ft-icd-l4-predict\nAvailable Model Group Fallbacks=None\nError doing the fallback: list index out of range LiteLLM Retried: 1 times, LiteLLM Max Retries: 2", "level": "ERROR", "timestamp": "2025-02-17T19:04:24.855748", "stacktrace": "Traceback (most recent call last):\n File \"/usr/lib/python3.13/site-packages/google/api_core/grpc_helpers_async.py\", line 85, in __await__\n response = yield from self._call.__await__()\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/usr/lib/python3.13/site-packages/grpc/aio/_call.py\", line 327, in __await__\n raise _create_rpc_error(\n ...<2 lines>...\n )\ngrpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:\n\tstatus = StatusCode.PERMISSION_DENIED\n\tdetails = \"Permission 'aiplatform.endpoints.predict' denied on resource '//aiplatform.googleapis.com/projects/*********/locations/us-central1/endpoints/1984786713414729728' (or it may not exist).\"\n\tdebug_error_string = \"UNKNOWN:Error received from peer ipv4:142.250.191.138:443 {created_time:\"2025-02-17T19:04:22.270203099+00:00\", grpc_status:7, grpc_message:\"Permission \\'aiplatform.endpoints.predict\\' denied on resource \\'//aiplatform.googleapis.com/projects/*********/locations/us-central1/endpoints/1984786713414729728\\' (or it may not exist).\"}\"\n>\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n File \"/usr/lib/python3.13/site-packages/litellm/main.py\", line 466, in acompletion\n response = await init_response\n ^^^^^^^^^^^^^^^^^^^\n File \"/usr/lib/python3.13/site-packages/litellm/llms/vertex_ai/vertex_ai_non_gemini.py\", line 738, in async_streaming\n response_obj = await llm_model.predict(\n ^^^^^^^^^^^^^^^^^^^^^^^^\n ...<2 lines>...\n )\n ^\n File \"/usr/lib/python3.13/site-packages/google/cloud/aiplatform_v1/services/prediction_service/async_client.py\", line 404, in predict\n response = await rpc(\n ^^^^^^^^^^\n ...<4 lines>...\n )\n ^\n File \"/usr/lib/python3.13/site-packages/google/api_core/grpc_helpers_async.py\", line 88, in __await__\n raise exceptions.from_grpc_error(rpc_error) from rpc_error\ngoogle.api_core.exceptions.PermissionDenied: 403 Permission 'aiplatform.endpoints.predict' denied on resource '//aiplatform.googleapis.com/projects/*********/locations/us-central1/endpoints/1984786713414729728' (or it may not exist). 
[reason: \"IAM_PERMISSION_DENIED\"\ndomain: \"aiplatform.googleapis.com\"\nmetadata {\n key: \"resource\"\n value: \"projects/*********/locations/us-central1/endpoints/1984786713414729728\"\n}\nmetadata {\n key: \"permission\"\n value: \"aiplatform.endpoints.predict\"\n}\n]\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n File \"/usr/lib/python3.13/site-packages/litellm/proxy/proxy_server.py\", line 3587, in chat_completion\n responses = await llm_responses\n ^^^^^^^^^^^^^^^^^^^\n File \"/usr/lib/python3.13/site-packages/litellm/router.py\", line 904, in acompletion\n raise e\n File \"/usr/lib/python3.13/site-packages/litellm/router.py\", line 880, in acompletion\n response = await self.async_function_with_fallbacks(**kwargs)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/usr/lib/python3.13/site-packages/litellm/router.py\", line 3071, in async_function_with_fallbacks\n raise original_exception\n File \"/usr/lib/python3.13/site-packages/litellm/router.py\", line 2889, in async_function_with_fallbacks\n response = await self.async_function_with_retries(*args, **kwargs)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/usr/lib/python3.13/site-packages/litellm/router.py\", line 3262, in async_function_with_retries\n raise original_exception\n File \"/usr/lib/python3.13/site-packages/litellm/router.py\", line 3155, in async_function_with_retries\n response = await self.make_call(original_function, *args, **kwargs)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/usr/lib/python3.13/site-packages/litellm/router.py\", line 3271, in make_call\n response = await response\n ^^^^^^^^^^^^^^\n File \"/usr/lib/python3.13/site-packages/litellm/router.py\", line 1042, in _acompletion\n raise e\n File \"/usr/lib/python3.13/site-packages/litellm/router.py\", line 1001, in _acompletion\n response = await _response\n ^^^^^^^^^^^^^^^\n File \"/usr/lib/python3.13/site-packages/litellm/utils.py\", line 1394, in wrapper_async\n raise e\n File \"/usr/lib/python3.13/site-packages/litellm/utils.py\", line 1253, in wrapper_async\n result = await original_function(*args, **kwargs)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/usr/lib/python3.13/site-packages/litellm/main.py\", line 485, in acompletion\n raise exception_type(\n ~~~~~~~~~~~~~~^\n model=model,\n ^^^^^^^^^^^^\n ...<3 lines>...\n extra_kwargs=kwargs,\n ^^^^^^^^^^^^^^^^^^^^\n )\n ^\n File \"/usr/lib/python3.13/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py\", line 2202, in exception_type\n raise e\n File \"/usr/lib/python3.13/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py\", line 2178, in exception_type\n raise APIConnectionError(\n ...<8 lines>...\n )\nlitellm.exceptions.APIConnectionError: litellm.APIConnectionError: 403 Permission 'aiplatform.endpoints.predict' denied on resource '//aiplatform.googleapis.com/projects/*********/locations/us-central1/endpoints/1984786713414729728' (or it may not exist). 
[reason: \"IAM_PERMISSION_DENIED\"\ndomain: \"aiplatform.googleapis.com\"\nmetadata {\n key: \"resource\"\n value: \"projects/*********/locations/us-central1/endpoints/1984786713414729728\"\n}\nmetadata {\n key: \"permission\"\n value: \"aiplatform.endpoints.predict\"\n}\n]\nTraceback (most recent call last):\n File \"/usr/lib/python3.13/site-packages/google/api_core/grpc_helpers_async.py\", line 85, in __await__\n response = yield from self._call.__await__()\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/usr/lib/python3.13/site-packages/grpc/aio/_call.py\", line 327, in __await__\n raise _create_rpc_error(\n ...<2 lines>...\n )\ngrpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:\n\tstatus = StatusCode.PERMISSION_DENIED\n\tdetails = \"Permission 'aiplatform.endpoints.predict' denied on resource '//aiplatform.googleapis.com/projects/*********/locations/us-central1/endpoints/1984786713414729728' (or it may not exist).\"\n\tdebug_error_string = \"UNKNOWN:Error received from peer ipv4:142.250.191.138:443 {created_time:\"2025-02-17T19:04:22.270203099+00:00\", grpc_status:7, grpc_message:\"Permission \\'aiplatform.endpoints.predict\\' denied on resource \\'//aiplatform.googleapis.com/projects/*********/locations/us-central1/endpoints/1984786713414729728\\' (or it may not exist).\"}\"\n>\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n File \"/usr/lib/python3.13/site-packages/litellm/main.py\", line 466, in acompletion\n response = await init_response\n ^^^^^^^^^^^^^^^^^^^\n File \"/usr/lib/python3.13/site-packages/litellm/llms/vertex_ai/vertex_ai_non_gemini.py\", line 738, in async_streaming\n response_obj = await llm_model.predict(\n ^^^^^^^^^^^^^^^^^^^^^^^^\n ...<2 lines>...\n )\n ^\n File \"/usr/lib/python3.13/site-packages/google/cloud/aiplatform_v1/services/prediction_service/async_client.py\", line 404, in predict\n response = await rpc(\n ^^^^^^^^^^\n ...<4 lines>...\n )\n ^\n File \"/usr/lib/python3.13/site-packages/google/api_core/grpc_helpers_async.py\", line 88, in __await__\n raise exceptions.from_grpc_error(rpc_error) from rpc_error\ngoogle.api_core.exceptions.PermissionDenied: 403 Permission 'aiplatform.endpoints.predict' denied on resource '//aiplatform.googleapis.com/projects/*********/locations/us-central1/endpoints/1984786713414729728' (or it may not exist). [reason: \"IAM_PERMISSION_DENIED\"\ndomain: \"aiplatform.googleapis.com\"\nmetadata {\n key: \"resource\"\n value: \"projects/*********/locations/us-central1/endpoints/1984786713414729728\"\n}\nmetadata {\n key: \"permission\"\n value: \"aiplatform.endpoints.predict\"\n}\n]\n\nReceived Model Group=pco-llama3-1-8b-ft-icd-l4-predict\nAvailable Model Group Fallbacks=None\nError doing the fallback: list index out of range LiteLLM Retried: 1 times, LiteLLM Max Retries: 2"} {"message": "{\"event\": \"giveup\", \"exception\": \"\"}", "level": "INFO", "timestamp": "2025-02-17T19:04:24.862449"} {"message": "Giving up chat_completion(...) after 1 tries (litellm.proxy._types.ProxyException)", "level": "ERROR", "timestamp": "2025-02-17T19:04:24.867654"} {"message": "litellm.acompletion(model=azure/mlp-genai-npe-eastus2-gpt4o)\u001b[32m 200 OK\u001b[0m", "level": "INFO", "timestamp": "2025-02-17T19:04:27.345561"} {"message": "disable_spend_logs=True. Skipping writing spend logs to db. Other spend updates - Key/User/Team table will still occur.", "level": "INFO", "timestamp": "2025-02-17T19:04:27.346675"}

Are you a ML Ops Team?

Yes

What LiteLLM version are you on ?

v1.61.3

Twitter / LinkedIn details

No response
