I'm currently on pay-as-you-go pricing with Azure OpenAI, and I'm curious how much switching to provisioned throughput units (PTUs) would improve latency.
In the documentation I read: "Predictable performance: stable max latency and throughput for uniform workloads."
What is the max latency for OpenAI models accessed via Azure with provisioned throughput units?
