This repository has been archived by the owner on May 16, 2024. It is now read-only.

CrewAakash/aoai-apim-policies


Caution

This repo is no longer maintained. For the latest policies and developments, switch to https://github.com/Azure-Samples/apim-genai-gateway-toolkit.

Azure OpenAI APIM Policies

This repository contains a collection of Azure API Management (APIM) policy samples aimed at improving interactions with Azure OpenAI Service (AOAI).

The samples are meant as a head start for implementing the following capabilities:

  1. Load Balancing: Round-robin distribution of traffic across multiple backend services.
  2. Keyless Authentication: Authentication mechanisms that don't rely on AOAI keys for access.
  3. Request Validation and API Versioning: Validating the model name and the API version.
  4. Retry with Exponential Backoff: Retry logic based on response status codes, with conditional routing of requests.
  5. Priority Management Based on Subscription Keys and Quota Allocation: Controlling access and resource allocation based on subscription priority.*
  6. Event Hub Logging: Logging events for monitoring and analytics.
  7. Circuit Breaker: Temporarily halting subsequent requests upon error.
  8. Adaptive Rate Limiting: Dynamically adjusting rate limits based on overall quota consumption. Services with lower usage leave their unused quota available to others, so capacity is reallocated as demand changes.
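
The adaptive rate limiting idea in item 8 can be modeled outside APIM. The following Python sketch is not the repository's policy; it only illustrates one way to redistribute unused quota (a max-min fair split). The function name `allocate_tokens` and the consumer names are hypothetical.

```python
def allocate_tokens(global_quota: int, demand: dict[str, int]) -> dict[str, int]:
    """Split a global token quota across consumers.

    Each round offers every still-unsatisfied consumer an equal share of
    what is left. Consumers that need less than their share take only what
    they need, and the leftover is offered again in the next round.
    """
    allocation = {name: 0 for name in demand}
    remaining = global_quota
    unsatisfied = {name for name, want in demand.items() if want > 0}

    while remaining > 0 and unsatisfied:
        # Equal share of the remaining quota, at least one token.
        share = max(remaining // len(unsatisfied), 1)
        for name in sorted(unsatisfied):
            if remaining == 0:
                break
            grant = min(share, demand[name] - allocation[name], remaining)
            allocation[name] += grant
            remaining -= grant
        # Keep only consumers whose demand is still not fully met.
        unsatisfied = {n for n in unsatisfied if allocation[n] < demand[n]}

    return allocation


if __name__ == "__main__":
    # Hypothetical demand: the low-usage consumer leaves quota for the other.
    print(allocate_tokens(1000, {"chatbot": 900, "batch": 200}))
```

In this model a consumer never receives more than it asks for, and the total granted never exceeds the global quota.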

Scope and Structure

Policy Scopes

The policies are applied at two scopes, product and API, each covering specific functionality:

  • API-Level Policies: Cover load balancing, the retry mechanism, model name validation, and keyless authentication for specific APIs.
  • Product-Level Policies: Cover priority management based on subscription keys and Event Hub logging, to manage product access and log essential events.
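
To make the API-level load balancing and retry behavior concrete, here is a minimal Python sketch of round-robin backend selection combined with retry and exponential backoff. It models the logic only and is not taken from the repository's policies. The class name `RoundRobinClient`, the `send` callback, the retryable status codes, and the default delays are all assumptions made for illustration.

```python
import itertools
import time
from typing import Callable, Sequence, Tuple

# Assumption: status codes treated as retryable in this sketch.
RETRYABLE_STATUS = {429, 500, 503}


class RoundRobinClient:
    """Rotates through backends and retries with exponential backoff."""

    def __init__(
        self,
        backends: Sequence[str],
        send: Callable[[str], int],
        max_attempts: int = 3,
        base_delay: float = 1.0,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if max_attempts < 1:
            raise ValueError("max_attempts must be at least 1")
        self._rotation = itertools.cycle(backends)
        self._send = send
        self._max_attempts = max_attempts
        self._base_delay = base_delay
        self._sleep = sleep

    def request(self) -> Tuple[str, int]:
        """Return the (backend, status) pair of the last attempt made."""
        for attempt in range(self._max_attempts):
            # Every attempt, including a retry, moves to the next backend.
            backend = next(self._rotation)
            status = self._send(backend)
            if status not in RETRYABLE_STATUS:
                break
            if attempt < self._max_attempts - 1:
                # Delay doubles on each retry: base, 2*base, 4*base, ...
                self._sleep(self._base_delay * 2 ** attempt)
        return backend, status
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; a caller can supply a function that records the delays instead.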

Products Defined

  • Chatbot: Assigned a high-priority status due to the expectation of a high request volume per minute.
  • BatchProcessor: Identified as a low-priority product with a lower requests-per-minute (RPM) count.
  • SimpleCircuitBreaker: Demonstrates a simple circuit breaker pattern.
  • AdaptiveRateLimiting: Demonstrates a dynamic rate limiting strategy that adjusts token distribution from a global token bucket based on the varying demand of the consumers.
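
The circuit breaker pattern behind the SimpleCircuitBreaker product can be sketched in a few lines of Python. This is an illustrative model, not the repository's policy. The class name `CircuitBreaker`, the failure threshold, the cooldown, and the choice of which status codes count as errors are assumptions.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Blocks requests for a cooldown period after repeated errors."""

    def __init__(
        self,
        failure_threshold: int = 1,
        cooldown_seconds: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def allow_request(self) -> bool:
        """Return True if a request may be forwarded to the backend."""
        if self._opened_at is None:
            return True
        if self._clock() - self._opened_at >= self.cooldown_seconds:
            # Cooldown elapsed: close the breaker and resume traffic.
            self._opened_at = None
            self._failures = 0
            return True
        return False

    def record(self, status: int) -> None:
        """Update breaker state from a backend response status."""
        # Assumption: 429 and 5xx responses count as errors.
        if status == 429 or status >= 500:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
        else:
            self._failures = 0
```

A caller would check `allow_request()` before forwarding a request and call `record(status)` with each backend response.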

Note*: The rate limiting policy is not applied when the response is consumed during inbound processing. For the rate limit policy to take effect, the response has to be consumed during outbound processing.

About

Contains sample APIM policies focused specifically on Generative AI resources as backends.
