I'm currently using pay-as-you-go on Azure OpenAI and I'm curious how much switching to provisioned throughput units (PTUs) would improve latency.

I read:

Predictable performance: stable max latency and throughput for uniform workloads.

What's the max latency for OpenAI models accessed via Azure with provisioned throughput units?

1 Answer

The Azure blog post announcing the provisioned offering (https://azure.microsoft.com/en-us/blog/accelerate-scale-with-azure-openai-service-provisioned-offering) states:

We’re introducing a 99% latency service level agreement for token generation. This latency SLA ensures that tokens are generated at faster and more consistent speeds, especially at high volumes.

The latency target depends on the chosen model and deployment region, as detailed at https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/provisioned-throughput-onboarding (see the last row of the table):

[Screenshot of the per-model latency target table from the provisioned throughput onboarding documentation]
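
If you want to verify the latency you actually get from your own deployment, a minimal measurement sketch like the one below (using the official openai Python SDK; the endpoint, API key, API version, and deployment name are placeholders to substitute) streams a completion and reports time-to-first-token plus the average gap between streamed chunks, a rough proxy for the per-token generation latency the SLA targets:

```python
import os
import time

from openai import AzureOpenAI  # pip install openai

# Endpoint, key, API version, and deployment name are placeholders.
client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-06-01",
)

stream = client.chat.completions.create(
    model="my-provisioned-deployment",  # name of your PTU deployment
    messages=[{"role": "user", "content": "Write a short paragraph about latency."}],
    stream=True,
)

start = time.perf_counter()
first_token_at = None
chunk_times = []

for chunk in stream:
    # On Azure, the first chunk may carry only content-filter results.
    if chunk.choices and chunk.choices[0].delta.content:
        now = time.perf_counter()
        if first_token_at is None:
            first_token_at = now
        chunk_times.append(now)

if chunk_times:
    ttft = first_token_at - start
    # Average gap between consecutive content chunks.
    gaps = [b - a for a, b in zip(chunk_times, chunk_times[1:])]
    avg_gap = sum(gaps) / len(gaps) if gaps else float("nan")
    print(f"Time to first token: {ttft:.3f}s")
    print(f"Avg time between chunks: {avg_gap * 1000:.1f} ms")
```

Note that a streamed chunk may carry more than one token, so treat the chunk gap as an approximation rather than the exact SLA metric; the authoritative per-model targets are the ones in the onboarding documentation linked above.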
