I'm currently using pay-as-you-go on Azure OpenAI and I'm curious how much switching to provisioned throughput units (PTUs) would improve latency.

I read:

Predictable performance: stable max latency and throughput for uniform workloads.

What's the max latency for OpenAI models accessed via Azure with provisioned throughput units?

1 Answer

The Azure blog post announcing the provisioned offering (https://azure.microsoft.com/en-us/blog/accelerate-scale-with-azure-openai-service-provisioned-offering) states:

We’re introducing a 99% latency service level agreement for token generation. This latency SLA ensures that tokens are generated at faster and more consistent speeds, especially at high volumes.

The latency target depends on the chosen model and deployment region, as detailed at https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/provisioned-throughput-onboarding (see the last row of the table):

[Screenshot of the per-model latency target table from the provisioned throughput onboarding documentation]
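
If you want to verify the latency you actually get from your own deployment, a minimal measurement sketch like the one below (using the official openai Python SDK; the endpoint, API key, API version, and deployment name are placeholders to substitute) streams a completion and reports time-to-first-token plus the average gap between streamed chunks, a rough proxy for the per-token generation latency the SLA targets:

```python
import os
import time

from openai import AzureOpenAI  # pip install openai

# Endpoint, key, API version, and deployment name are placeholders.
client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-06-01",
)

stream = client.chat.completions.create(
    model="my-provisioned-deployment",  # name of your PTU deployment
    messages=[{"role": "user", "content": "Write a short paragraph about latency."}],
    stream=True,
)

start = time.perf_counter()
first_token_at = None
chunk_times = []

for chunk in stream:
    # On Azure, the first chunk may carry only content-filter results.
    if chunk.choices and chunk.choices[0].delta.content:
        now = time.perf_counter()
        if first_token_at is None:
            first_token_at = now
        chunk_times.append(now)

if chunk_times:
    ttft = first_token_at - start
    # Average gap between consecutive content chunks.
    gaps = [b - a for a, b in zip(chunk_times, chunk_times[1:])]
    avg_gap = sum(gaps) / len(gaps) if gaps else float("nan")
    print(f"Time to first token: {ttft:.3f}s")
    print(f"Avg time between chunks: {avg_gap * 1000:.1f} ms")
```

Note that a streamed chunk may carry more than one token, so treat the chunk gap as an approximation rather than the exact SLA metric; the authoritative per-model targets are the ones in the onboarding documentation linked above.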
