Elevating AI Workflows: Integrating Azure API Management and Azure Functions with Azure OpenAI Callon Campbell Microsoft MVP | Azure @flying_maverick
Sponsors Microsoft is a proud sponsor of Global Azure in Toronto on April 20th. We’re passionate about supporting the developer community and invite you to access valuable documentation and training resources by visiting docs.microsoft.com and the Microsoft Reactor. We are a team of senior infrastructure specialists, software developers and data engineers who are experts in the Microsoft Azure Cloud. We partner with you to deliver innovative business solutions using Agile, DevOps and advanced Software Automation. Twenty years in business. Headquartered in Toronto. Work for customers across Canada and the US. Work across multiple industries and sectors. www.objectsharp.com
About me  25 years enterprise development with Microsoft technologies – .NET (C#), Azure, ASP.NET, Desktop, SQL, and Mobile  Passionate about serverless and cloud-native application development, with a focus on app migration and modernization, app integration, and data analytics  Blog at https://TheFlyingMaverick.com, @flying_maverick  Speaker at community events and meetups  Organizer of “Canada’s Technology Triangle .NET User Group” in Kitchener, Ontario Callon Campbell Azure Architect | Developer Adastra Microsoft MVP | Azure (2018-2025)
Agenda  What is API Management and why it remains critical in the era of AI  How to govern AI APIs at runtime  Integrations with Azure OpenAI  Demos  Q&A
Why are we here • Generative AI has ignited a remarkable range of possibilities • All industry sectors are embracing AI advancements • Most AI services are utilized and accessed via APIs • It’s essential to have a well-planned API management strategy to ensure the effective use of AI services • Approaches driven by experimentation are the enablers that pave the road to success
APIs are the backbone of digital transformation, modern apps, and AI interfaces.
API Management
What is API Management?  Comprehensive platform for managing APIs across all environments.  Provides tools for:  Creating APIs  Publishing APIs  Securing APIs  Analyzing APIs  Helps organizations streamline their API strategies.
API Management Features  API Gateway: Acts as a front door for APIs, handling all incoming requests and routing them to the appropriate backend services.  Developer Portal: A customizable portal for API consumers to discover, learn about, and use APIs.  Management Plane: Tools for administrators to manage API lifecycle, policies, and analytics.  Security: Protects APIs with built-in security features like authentication, authorization, and rate limiting. (securely expose your Azure OpenAI endpoints)  Scalability: Supports scaling APIs to meet varying demand levels.  Monitoring & Analytics: Provides insights into API usage, performance, and health.
Runtime governance of AI APIs with API Management
GenAI development runs on APIs. Intelligent apps (conversational agents, personalized content, content generation, chat on your data, voice assistants, your own Copilot) consume AI services (Azure AI Services, OpenAI, Mistral, LLaMa, Azure AI Search, Hugging Face, Cohere, and more). But these APIs must be managed, secured, and governed.
Unmanaged AI APIs increase risk and hinder potential: unpredictable and unattributable costs, reliability concerns, security risks, developer friction, and governance challenges.
Azure API Management enables AI APIs. Sitting between your intelligent apps (conversational agents, personalized content, content generation, chat on your data, voice assistants, your own Copilot) and AI services (Azure AI Services, OpenAI, Mistral, LLaMa, Azure AI Search, Hugging Face, Cohere, and more), it delivers cost efficiency, high reliability, robust security, developer enablement, enhanced governance, and native Azure integration (Defender for APIs, Policy, Monitor, and more).
Maximize potential and take control of AI APIs with Azure API Management. Cost Efficiency: control and attribute costs with token monitoring, limits, and quotas; return cached responses for semantically similar prompts. High Reliability: enable geo-redundancy and automatic failovers with load balancing and circuit breakers. Robust Security: isolate and manage user credentials; secure APIs with built-in controls and Microsoft Defender for Cloud. Developer Enablement: replace custom backend code with built-in policies; publish AI APIs for consumption; gain insights with comprehensive logs. Enhanced Governance: enforce runtime policies; centralize monitoring and audit logs.
Scaling Up: Multiple Apps, Multiple OpenAI Endpoints. With several intelligent apps sharing several Azure OpenAI endpoints, the scaling challenges are: tracking token usage, managing multiple OpenAI endpoints, authentication and authorization, and assigning token-based limits.
GenAI gateway capabilities in API Management. Sitting between your intelligent apps and the Azure OpenAI endpoints, the GenAI gateway in Azure API Management provides token-based limiting, load balancing, semantic caching, observability, and managed identity.
Demo API Management | Azure OpenAI
Scenarios
Request forwarding  APIM uses a managed identity (user- or system-assigned).  APIM is authorized to consume the Azure OpenAI API through role-based access control (RBAC).  Zero impact on consumers using the API directly, with SDKs, or with orchestrators like LangChain: they just need to update the endpoint to use the APIM endpoint instead of the Azure OpenAI endpoint.  Keyless approach: API consumers use APIM subscription keys, and the Azure OpenAI keys are never used.
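A consumer-side sketch of that endpoint swap, assuming the Azure.AI.OpenAI .NET SDK and an APIM instance configured (as in the common GenAI gateway samples) to accept its subscription key in the api-key header; the gateway URL and deployment name below are illustrative:

using System.ClientModel;   // ApiKeyCredential
using Azure.AI.OpenAI;      // AzureOpenAIClient
using OpenAI.Chat;

// Point the SDK at the APIM gateway instead of Azure OpenAI directly.
// The APIM subscription key replaces the Azure OpenAI key, which never
// leaves the gateway (APIM authenticates with its managed identity).
var client = new AzureOpenAIClient(
    new Uri("https://my-apim.azure-api.net/openai"),      // hypothetical APIM endpoint
    new ApiKeyCredential("<apim-subscription-key>"));

ChatClient chat = client.GetChatClient("gpt-4o");          // your deployment name
ChatCompletion completion = chat.CompleteChat("Hello through the gateway!");
Console.WriteLine(completion.Content[0].Text);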
Token limit policy • Manage and enforce limits per API consumer based on the usage of Azure OpenAI Service tokens. • Set a rate limit, expressed in tokens-per-minute (TPM). • Set a token quota over a specified period, such as hourly, daily, weekly, monthly, or yearly.
Token limit policy
<azure-openai-token-limit
    counter-key="@(context.Subscription.Id)"
    tokens-per-minute="500"
    estimate-prompt-tokens="false"
    remaining-tokens-variable-name="remainingTokens">
</azure-openai-token-limit>
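When a caller exceeds the limit, the gateway rejects the request with 429 (Too Many Requests). A minimal consumer-side sketch of honoring that with plain HttpClient; the base address, route, and header name are assumptions that depend on how the API is published in APIM:

using System.Net;

var http = new HttpClient { BaseAddress = new Uri("https://my-apim.azure-api.net/") };
http.DefaultRequestHeaders.Add("api-key", "<apim-subscription-key>");   // APIM subscription key

// Retry on 429, backing off by the gateway's Retry-After hint when present.
// bodyFactory recreates the request content for each attempt.
async Task<HttpResponseMessage> PostWithRetryAsync(string route, Func<HttpContent> bodyFactory)
{
    for (int attempt = 0; ; attempt++)
    {
        HttpResponseMessage response = await http.PostAsync(route, bodyFactory());
        if (response.StatusCode != HttpStatusCode.TooManyRequests || attempt >= 3)
            return response;

        TimeSpan delay = response.Headers.RetryAfter?.Delta ?? TimeSpan.FromSeconds(5);
        await Task.Delay(delay);
    }
}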
Emit token metric policy • Sends metrics to Application Insights about consumption of LLM tokens through Azure OpenAI Service APIs. • Helps provide an overview of the utilization of Azure OpenAI Service models across multiple applications or API consumers. • Useful for chargeback scenarios, monitoring, and capacity planning.
Emit token metric policy
<azure-openai-emit-token-metric namespace="openai">
    <dimension name="Client IP" value="@(context.Request.IpAddress)" />
    <dimension name="API ID" value="@(context.Api.Id)" />
    <dimension name="User ID" value="@(context.Request.Headers.GetValueOrDefault("x-user-id", "N/A"))" />
</azure-openai-emit-token-metric>
Backend circuit breaking  The Azure OpenAI endpoint is configured as an APIM backend, promoting reusability across APIs and improved governance.  Circuit breaking rules define controlled availability for the OpenAI endpoint.  When the circuit breaks, APIM stops sending requests to OpenAI.  Handles status code 429 (Too Many Requests) and any other status code sent by the OpenAI service.  Doesn’t need any policy configuration; the rules are just properties of the backend. New product feature: built-in backend circuit breaker functionality.
Backend load balancing  Spread the load across multiple backends, which may have individual backend circuit breakers.  Shift the load from one set of backends to another for upgrades (blue-green deployment).  Currently, the backend pool supports round-robin, weighted, and priority-based load balancing.  Doesn’t need any policy configuration; the rules are just properties of the backend. New product feature: built-in load balancing backend pool functionality.
Semantic caching policy • Optimize token use by storing completions for similar prompts. • Helps reduce token consumption and improves response performance.
Well-Architected Framework principles • A zero-trust approach and keyless strategy for Azure OpenAI • Redundancy and the capacity to handle variable usage spikes • Elasticity and mechanisms to distribute load across multiple endpoints • Observability to continuously improve quality and user experience • Cost control mechanisms to track token usage and allocate costs https://learn.microsoft.com/en-us/azure/well-architected/
Demos API Management | Backends, Policies, Chat App with Azure OpenAI
Architecture aka.ms/apim/genai/sample-app
Azure Functions Extensions for OpenAI
Why this extension? Compared with a standard Azure OpenAI API call, the extension gives you: • The ability to work with the large variety of triggers and bindings offered by Azure Functions. • Pre-defined triggers that let developers drive event-driven or scheduled tasks; the extension works well with the currently offered function types. • Flexibility during development when multiple Azure products are involved: different bindings let function apps listen and respond when other Azure resources change, and settings in the function app’s host.json file are easy to adjust and test. • In short, the extension gives you a smoother experience for making API calls to the Azure OpenAI endpoint.
Integration With the integration between Azure OpenAI and Functions, you can build functions that retrieve text and chat completions, manage conversation state with assistants, generate embeddings, and perform semantic search over your own data.
How does this work with API Management?  Essentially replace the Azure OpenAI endpoints with the APIM endpoints.
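A minimal sketch of both ideas together, modeled on the extension’s C# isolated-worker samples (the extension is in preview, so attribute and property names may differ across versions; the route and setting names are illustrative). Setting the AZURE_OPENAI_ENDPOINT app setting to the APIM gateway endpoint instead of the Azure OpenAI endpoint is the only routing change:

using System.Net;
using Microsoft.Azure.Functions.Worker;
using Microsoft.Azure.Functions.Worker.Http;
using Microsoft.Azure.Functions.Worker.Extensions.OpenAI.TextCompletion;

public static class WhoIsFunction
{
    // The TextCompletionInput binding calls the completions API for you.
    // With AZURE_OPENAI_ENDPOINT pointed at the APIM endpoint, every call
    // flows through the gateway and inherits its policies.
    [Function(nameof(WhoIs))]
    public static HttpResponseData WhoIs(
        [HttpTrigger(AuthorizationLevel.Function, "get", Route = "whois/{name}")] HttpRequestData req,
        [TextCompletionInput("Who is {name}?", Model = "%CHAT_MODEL_DEPLOYMENT_NAME%")] TextCompletionResponse completion)
    {
        HttpResponseData response = req.CreateResponse(HttpStatusCode.OK);
        response.WriteString(completion.Content);
        return response;
    }
}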
AI Hub Gateway Landing Zone accelerator aka.ms/apim-genai-lza
GenAI gateway reference architecture • Use APIM to create a GenAI gateway. • Integrates with Azure OpenAI services in the cloud and with any on-premises custom LLMs deployed and available as REST endpoints. • The architecture incorporates elements engineered for batch use cases, with the aim of optimizing PTU (provisioned throughput unit) utilization. GenAI gateway reference architecture using APIM
Wrapping up Effective API governance  Ensure compliance, reliability, and security while accelerating innovation instead of creating roadblocks. AI runtime governance  Use API Management capabilities to maximize the potential of AI APIs, including increased cost efficiency, reliability, security, and governance.
Useful resources API Management  aka.ms/apim/openai-docs | Documentation  aka.ms/apim/genai/sample-app | GenAI gateway guide  aka.ms/apim/genai/labs | GenAI gateway labs  aka.ms/apim-genai-lza | GenAI gateway accelerator  Designing and implementing a GenAI gateway solution | Microsoft Learn  GenAI gateway capabilities in Azure API Management | Microsoft Learn Azure Functions  Azure OpenAI extension for Azure Functions | Microsoft Learn  azure-functions-openai-extension/samples
Let’s connect  callon@cloudmavericks.ca  @flying_maverick  https://linkedin.com/in/calloncampbell  https://github.com/calloncampbell
Thank you

Editor's Notes

  • #2 Hi everyone, a warm welcome to this session on elevating AI workflows: integrating Azure API Management and Azure Functions with Azure OpenAI
  • #5 Industries are embracing AI advancements. AI services are mostly accessed via APIs. A solid API management plan is crucial for effective AI use. Experimentation is key to success.
  • #6 APIs are the backbone of digital transformation, modern apps, and AI interfaces.
  • #7 Who here has used APIM?
  • #11 On the left we have your GenAI development/apps, which run on the AI services APIs on the right. So we need to consider how these AI APIs are going to be managed, secured, and governed.
  • #12 Unmanaged AI APIs can lead to unpredictable and difficult-to-trace expenses, impacting budget management. The performance and availability of services may be inconsistent without proper management. Unmanaged AI APIs may expose systems to vulnerabilities, leading to unauthorized access and data breaches. Without effective management, challenges can arise, hindering smooth collaboration and development processes. Ensuring compliance and oversight with unmanaged AI APIs can be complex and time-consuming.
  • #15 One of the main resources you have in generative AI services is tokens. Azure OpenAI Service assigns quota for your model deployments expressed in tokens-per-minute (TPM) which is then distributed across your model consumers - for example, different applications, developer teams, departments within the company, etc.
  • #16 A "GenAI gateway" is an intelligent middleware that dynamically balances incoming traffic across backend resources to optimize resource utilization. It can also address challenges related to billing and monitoring.
  • #20 This policy provides flexibility to assign token-based limits on any counter key, such as subscription key, originating IP address, or an arbitrary key defined through a policy expression. The policy also enables precalculation of prompt tokens on the Azure API Management side, minimizing unnecessary requests to the Azure OpenAI Service backend if the prompt already exceeds the limit.
  • #22 This policy captures prompt, completions, and total token usage metrics and sends them to an Application Insights namespace of your choice. Moreover, you can configure or select from predefined dimensions to split token usage metrics, so you can analyze metrics by subscription ID, IP address, or a custom dimension of your choice.
  • #24 One of the challenges when building intelligent applications is to ensure that they are resilient to backend failures and can handle high loads. The backend circuit breaker features dynamic trip duration, applying values from the Retry-After header provided by the backend. This ensures precise and timely recovery of the backends, maximizing the utilization of your priority backends. Circuit breaker pattern: stops operations likely to fail, returns a fallback response during failures, and resumes normal operations when stable. API Management: prevents backend overload and trips the circuit based on the Retry-After header. Azure OpenAI: enforces rate limiting (429 response code); circuit breaking is configurable in API Management, which marks the backend as unhealthy while the circuit is open.
  • #25 By configuring your Azure OpenAI Service endpoints using backends in Azure API Management, you can balance the load across them. You can also define circuit breaker rules to stop forwarding requests to the Azure OpenAI Service backends if they're not responsive. The backend load balancer supports round-robin, weighted, and priority-based load balancing, giving you flexibility to define a load distribution strategy that meets your specific requirements.
  • #26 Enable semantic caching by using Azure Redis Enterprise or another external cache compatible with RediSearch and onboarded to Azure API Management. By using the Azure OpenAI Service Embeddings API, the azure-openai-semantic-cache-store and azure-openai-semantic-cache-lookup policies store and retrieve semantically similar prompt completions from the cache. This approach ensures completions reuse, resulting in reduced token consumption and improved response performance.
  • #29 Sample app Frontend: Two files, index.html and app.js, that make requests to the backend. Backend: A Node.js Express app that serves the frontend and makes requests to the Azure OpenAI instance. Azure OpenAI Service: Two instances of Azure OpenAI models, one primary endpoint and one secondary/failover endpoint. Azure API Management: Manages the Azure OpenAI instances and exposes them to the frontend.
  • #30 The Azure OpenAI extension for Azure Functions is currently in preview.
  • #35 The AI Hub Gateway Landing Zone is a solution accelerator that provides a set of guidelines and best practices for implementing a central AI API gateway to empower various line-of-business units in an organization to leverage Azure AI services. The Azure API Management (APIM) Landing Zone accelerator provides a comprehensive solution to deploy a GenAI gateway using Azure API Management with best practices around security and operational excellence.
  • #36 Centralized AI API Gateway: A central hub that provides a single point of entry for AI services that can be shared among multiple use cases in a secure and governed approach. Seamless integration with Azure AI services: Just update endpoints and keys in existing apps to switch to the AI Hub Gateway. AI routing and orchestration: The AI Hub Gateway Landing Zone provides a mechanism to route and orchestrate AI services based on priority and target model, enabling the organization to manage and govern AI services in a consistent manner. Granular access control: The AI Hub Gateway Landing Zone does not use master keys to access AI services; instead, it uses managed identities, while consumers can use gateway keys. Private connectivity: The AI Hub Gateway Landing Zone is designed to be deployed in a private network, and it uses private endpoints to access AI services. Capacity management: The AI Hub Gateway Landing Zone provides a mechanism to manage capacity based on requests and tokens. Usage & charge-back: The AI Hub Gateway Landing Zone provides a mechanism to track usage and charge back to the respective business units, with flexible integration with existing charge-back and data platforms. Resilient and scalable: The AI Hub Gateway Landing Zone is designed to be resilient and scalable; it uses Azure API Management with zonal redundancy and regional gateways. Full observability: The AI Hub Gateway Landing Zone provides full observability with Azure Monitor, Application Insights, and Log Analytics, with detailed insights into performance, usage, and errors. Hybrid support: The AI Hub Gateway Landing Zone supports deploying the backends and gateway on Azure, on-premises, or in other clouds.
  • #37 Abbreviations: PTU = Provisioned Throughput Units; GenAI = Generative AI; LLM = Large Language Model
  • #41 All the demos are here!