Wiring up Azure OpenAI, end to end

A working setup with auth, rate limits, and the gotchas nobody tells you about.

The Author
Azure · AI · Infrastructure

Every Azure OpenAI tutorial I’ve found starts from the happy path. This one doesn’t. Here’s how to wire it up properly — with managed identity, rate limiting, and the bits that bite you when you skip ahead.

Prerequisites

You’ll need an Azure subscription with OpenAI access approved, and a resource group to work in. I’m assuming you’re comfortable with the Azure CLI.

# Check your subscription
az account show

# Create a resource group if you don't have one
az group create --name rg-openai-demo --location australiaeast
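One check worth doing up front (skip it if your subscription already has Cognitive Services resources): the resource provider has to be registered before any create call will succeed.

```shell
# Register the Cognitive Services resource provider (one-off per subscription)
az provider register --namespace Microsoft.CognitiveServices

# Confirm it shows "Registered" (can take a minute or two)
az provider show --namespace Microsoft.CognitiveServices \
  --query registrationState -o tsv
```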

Deploy the resource

The Azure OpenAI resource is still a bit tucked away in the portal. CLI is easier:

az cognitiveservices account create \
  --name oai-placeholder \
  --resource-group rg-openai-demo \
  --kind OpenAI \
  --sku S0 \
  --location australiaeast
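The resource alone doesn't give you a model to call — you also need a deployment (the code later assumes one named gpt-4o). A sketch along these lines should work; the model version and capacity are placeholders you'd adjust to what's available in your region:

```shell
# Deploy a model into the resource. The deployment name is what you'll
# pass as "model" in API calls later.
az cognitiveservices account deployment create \
  --name oai-placeholder \
  --resource-group rg-openai-demo \
  --deployment-name gpt-4o \
  --model-name gpt-4o \
  --model-version "2024-08-06" \
  --model-format OpenAI \
  --sku-name Standard \
  --sku-capacity 10
```

Note that --sku-capacity here is in units of 1,000 TPM, which ties directly into the rate-limit discussion below.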

Authentication

Skip API keys if you can. Managed identity is cleaner and doesn’t end up in .env files committed to repos. Note that the AzureOpenAI client now ships in the official openai package (the older @azure/openai library is legacy); pair it with a token provider from @azure/identity:

import { AzureOpenAI } from "openai";
import { DefaultAzureCredential, getBearerTokenProvider } from "@azure/identity";

const credential = new DefaultAzureCredential();
const azureADTokenProvider = getBearerTokenProvider(
  credential,
  "https://cognitiveservices.azure.com/.default"
);

const client = new AzureOpenAI({
  endpoint: process.env.AZURE_OPENAI_ENDPOINT!,
  azureADTokenProvider,
  apiVersion: "2024-10-21",
});

Gotcha #1: DefaultAzureCredential tries multiple auth methods in order. In local dev it falls through to Azure CLI credentials, so make sure you’re logged in with az login before testing locally.
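A valid token isn’t enough on its own: the identity also needs an RBAC role on the resource, or you’ll get 401s that look like auth failures. A sketch, reusing the resource names from earlier — the principal ID placeholder is whatever user or managed identity you’re running as:

```shell
# Grant the calling identity data-plane access to the OpenAI resource.
az role assignment create \
  --role "Cognitive Services OpenAI User" \
  --assignee "<principal-id>" \
  --scope "$(az cognitiveservices account show \
      --name oai-placeholder \
      --resource-group rg-openai-demo \
      --query id -o tsv)"
```

Role assignments can take a few minutes to propagate, so don’t panic if the first request after this still fails.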

Rate limits

The default TPM (tokens per minute) quotas are lower than you’d expect, and they’re set per model and region rather than per resource. For gpt-4o you’re starting at around 150k TPM. That sounds like a lot until you have three engineers all hitting the same endpoint simultaneously.

Set up retry logic from day one:

import type { ChatCompletionMessageParam } from "openai/resources/chat/completions";

async function chatWithRetry(
  messages: ChatCompletionMessageParam[],
  maxRetries = 3
) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await client.chat.completions.create({
        model: "gpt-4o", // on Azure, this is your *deployment* name
        messages,
      });
    } catch (err) {
      // 429 means we were throttled: back off exponentially and retry
      if ((err as { status?: number }).status === 429 && attempt < maxRetries - 1) {
        await new Promise((r) => setTimeout(r, 2 ** attempt * 1000));
        continue;
      }
      throw err;
    }
  }
}

Where to go from here

Once the basics are wired up, the next step is usually RAG — connecting the model to your own documents. That’s a separate post, but the short version: Azure AI Search + your OpenAI endpoint + a decent chunking strategy gets you most of the way there.
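On the chunking side, the simplest starting point is fixed-size chunks with overlap, so context isn’t lost at chunk boundaries. A minimal sketch — real pipelines usually split on sentence or heading boundaries instead, and the function name and defaults here are my own:

```typescript
// Naive fixed-size chunker with overlap. Each chunk repeats the last
// `overlap` characters of the previous one so boundary context survives.
export function chunkText(
  text: string,
  chunkSize = 800,
  overlap = 100
): string[] {
  if (chunkSize <= overlap) {
    throw new Error("chunkSize must be greater than overlap");
  }
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += chunkSize - overlap) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}
```

Each chunk then gets embedded and pushed into your Azure AI Search index; the overlap is what keeps a sentence that straddles a boundary retrievable from either side.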