Hello Developers,
I’m currently encountering 429 errors when using the Gemini-Embedding-001 model. The specific error message is as follows:
{'error': {'code': 429, 'message': "Quota exceeded for quota metric 'Read API requests' and limit 'Model operations request limit per minute for a region' of service 'generativelanguage.googleapis.com' for consumer 'project_number:327036061011'.", 'status': 'RESOURCE_EXHAUSTED', 'details': [{'@type': 'type.googleapis.com/google.rpc.ErrorInfo', 'reason': 'RATE_LIMIT_EXCEEDED', 'domain': 'googleapis.com', 'metadata': {'quota_location': 'us-south1', 'quota_limit_value': '200', 'quota_unit': '1/min/{project}/{region}', 'service': 'generativelanguage.googleapis.com', 'consumer': 'projects/327036061011', 'quota_metric': 'generativelanguage.googleapis.com/model_requests', 'quota_limit': 'ModelRequestsPerMinutePerProjectPerRegion'}}, {'@type': 'type.googleapis.com/google.rpc.Help', 'links': [{'description': 'Request a higher quota limit.', 'url': 'https://cloud.google.com/docs/quotas/help/request_increase'}]}]}}
My Gemini API account is a Tier 1 paid account, and the documentation lists an RPM (Requests Per Minute) of 3000 for Embedding models. In practice, though, I’m hitting the limit “Model operations request limit per minute for a region”, which the error reports as 200 requests per minute per project per region (us-south1).
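For anyone trying to reproduce this, here is a minimal Python sketch of how the quota fields can be read out of the error payload. The dict is a trimmed copy of the error above containing only the fields being read, and `extract_rate_limit` is a hypothetical helper name of my own, not part of any Google SDK.

```python
# Trimmed copy of the 429 payload (only the fields read below are kept).
error_payload = {
    "error": {
        "code": 429,
        "status": "RESOURCE_EXHAUSTED",
        "details": [
            {
                "@type": "type.googleapis.com/google.rpc.ErrorInfo",
                "reason": "RATE_LIMIT_EXCEEDED",
                "metadata": {
                    "quota_location": "us-south1",
                    "quota_limit_value": "200",
                    "quota_unit": "1/min/{project}/{region}",
                    "quota_metric": "generativelanguage.googleapis.com/model_requests",
                    "quota_limit": "ModelRequestsPerMinutePerProjectPerRegion",
                },
            }
        ],
    }
}

ERROR_INFO_TYPE = "type.googleapis.com/google.rpc.ErrorInfo"


def extract_rate_limit(payload):
    """Hypothetical helper: return quota info from the first ErrorInfo detail, or None."""
    for detail in payload.get("error", {}).get("details", []):
        if detail.get("@type") == ERROR_INFO_TYPE:
            meta = detail.get("metadata", {})
            raw_value = meta.get("quota_limit_value")
            return {
                "limit": meta.get("quota_limit"),
                # The API reports the limit as a string, so convert it when present.
                "value": int(raw_value) if raw_value is not None else None,
                "unit": meta.get("quota_unit"),
                "location": meta.get("quota_location"),
            }
    return None


info = extract_rate_limit(error_payload)
print(info)
```

Read this way, the limit that is firing is the per-project, per-region one (200 per minute in us-south1), not the 3000 RPM figure from the tier documentation.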
This is quite confusing. If this regional RPM limit of 200 is in place, how can we achieve the true 3000 RPM rate for Gemini Embedding models to support large-scale applications?
I’ve also tried upgrading to a Tier 2 account, but I’m still encountering the same 429 error.
Has anyone else experienced similar issues? Is there an official explanation of how to get past this regional RPM limit when using Embedding models at scale, so that the higher limits of paid accounts are actually usable?
Any advice or guidance would be greatly appreciated!

