Microsoft rebranded Azure AI Studio as Azure AI Foundry in late 2024, unifying the model catalogue, prompt flow, and Azure OpenAI Service under a single platform umbrella. The rebrand didn't change the pricing — it just made it harder to find, because now you're looking at Azure AI Foundry documentation when the actual charges still appear as "Azure OpenAI Service" and "Azure Machine Learning" on your Azure invoice.
For enterprises building AI applications — whether retrieval-augmented generation (RAG) systems, custom copilots, document intelligence pipelines, or fine-tuned domain models — the cost of Azure AI Foundry can grow faster than almost any other cloud expenditure category. Token pricing is inherently consumption-based, fine-tuning requires GPU compute at substantial cost, and provisioned throughput reservations carry fixed charges whether utilised or not.
This guide breaks down exactly what you're paying for, what's negotiable, and how to avoid the structural cost traps built into the Azure AI Foundry pricing model. We negotiate Microsoft contracts on a 25% gainshare basis — including Azure AI spend — so if there are savings available in your Azure OpenAI or AI Foundry contracts, you'll keep 75% of them.
Azure AI Foundry: Platform Overview
Overpaying for Microsoft? We handle Microsoft EA, NCE, and Azure negotiation on a 25% gainshare basis — you keep 75% of every dollar saved. No retainer. No risk.
Get a free Microsoft savings estimate →

Azure AI Foundry is Microsoft's unified platform for enterprise AI development. It encompasses: the Azure OpenAI Service (GPT-4o, GPT-4, o1, o3 models), the Model Catalogue (open-source and third-party models including Meta Llama, Mistral, Cohere), Azure AI Search (vector search and RAG infrastructure), Azure Machine Learning (training, fine-tuning, MLOps), and AI Foundry hub resources (project management, evaluation, safety filtering).
Each component has its own pricing model. The platform is designed to make it easy to start building — but the commercial terms are designed to maximise revenue as usage scales. The default pay-as-you-go pricing is the most expensive option at scale, and most enterprises are on it because negotiating custom terms requires knowing they exist.
The Five Cost Layers in Azure AI Foundry
A full enterprise AI Foundry deployment has five distinct cost layers that stack on top of each other:
- 1. Model inference (token consumption): The primary cost for any production AI application. Charged per 1,000 input tokens and per 1,000 output tokens, with rates varying significantly by model.
- 2. Provisioned throughput units (PTUs): Reserved GPU capacity purchased for consistent latency and throughput guarantees. Priced per PTU-hour or via monthly/annual commitments.
- 3. Fine-tuning compute: GPU-based training charges for customising base models on proprietary data. Charged per training hour, model size, and compute tier.
- 4. Azure AI Search: Vector search infrastructure for RAG applications. Priced per search unit (SU) per hour, plus document cracking and indexing charges.
- 5. Compute and storage (Azure ML): For custom training pipelines, evaluation workflows, and managed endpoints — standard Azure compute rates apply.
Azure OpenAI Token Pricing by Model
Token pricing is the largest cost component for most production AI applications. Microsoft charges separately for input (prompt) tokens and output (completion) tokens. Output tokens are consistently priced at 3–4x the input rate, reflecting the higher compute cost of generation versus processing.
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Context Window |
|---|---|---|---|
| GPT-4o | $5.00 | $15.00 | 128K |
| GPT-4o mini | $0.15 | $0.60 | 128K |
| o1 (reasoning) | $15.00 | $60.00 | 200K |
| o3-mini | $1.10 | $4.40 | 200K |
| GPT-4 Turbo | $10.00 | $30.00 | 128K |
| text-embedding-3-large | $0.13 | N/A | 8K |
| Enterprise negotiated (GPT-4o) | $3.00–$3.75 | $9.00–$11.25 | 128K |
The per-token rates above are pay-as-you-go list prices as of early 2026. They shift with model releases and competitive pressure. Enterprise accounts with significant Azure AI consumption — typically $500K+ per year — should be negotiating custom pricing rather than paying list.
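The table above translates directly into a simple cost estimator. A minimal sketch using the PAYG list rates quoted in the table (the rates themselves are assumptions that will drift as Microsoft updates its price sheet):

```python
# PAYG list rates from the table above (USD per 1M tokens; early-2026
# list prices -- verify against the current Azure price sheet before use).
RATES = {
    "gpt-4o":      {"input": 5.00,  "output": 15.00},
    "gpt-4o-mini": {"input": 0.15,  "output": 0.60},
    "o1":          {"input": 15.00, "output": 60.00},
    "o3-mini":     {"input": 1.10,  "output": 4.40},
}

def token_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost for a given token volume on PAYG list pricing."""
    r = RATES[model]
    return (input_tokens / 1_000_000) * r["input"] \
         + (output_tokens / 1_000_000) * r["output"]

# 450M input + 75M output on GPT-4o -> $2,250 + $1,125 = $3,375
print(token_cost("gpt-4o", 450_000_000, 75_000_000))
```

Note that output tokens dominate quickly on the reasoning models: the same volume on o1 costs 4x as much per output token as GPT-4o.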
Provisioned Throughput vs. Pay-As-You-Go
For high-volume production applications, Microsoft offers Provisioned Throughput Units (PTUs) — reserved model capacity that guarantees a specific number of tokens per minute (TPM) regardless of Azure platform load. This addresses the latency unpredictability of shared pay-as-you-go capacity.
PTU pricing works as follows for GPT-4o: $2 per PTU per hour on pay-as-you-go, or approximately $1,460/PTU/month on monthly reserved pricing. Each PTU provides roughly 6,000 tokens per minute of throughput. An application requiring 600,000 tokens per minute needs 100 PTUs — $146,000/month in provisioned capacity.
The critical question is breakeven: at what consumption level does PTU pricing become cheaper than pay-as-you-go? For GPT-4o, provisioned throughput becomes cost-effective once you're consuming more than 65–70% of the provisioned capacity consistently. Applications with unpredictable bursts of usage are better on pay-as-you-go; applications with steady, predictable high volume benefit from PTU commitments. Most enterprises mix both — base capacity on PTUs, burst capacity on pay-as-you-go.
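The breakeven point depends heavily on your input/output token mix, since PTU pricing is per unit of capacity while PAYG is per token. A sketch of the calculation, using the $1,460/PTU/month and 6,000 TPM figures from the text (the blended-rate example is an assumption for illustration):

```python
HOURS_PER_MONTH = 730
PTU_MONTHLY = 1460.0   # reserved $/PTU/month (from the text)
PTU_TPM = 6_000        # tokens/minute of throughput per PTU (from the text)

def ptu_breakeven_utilisation(blended_payg_per_1m: float) -> float:
    """Fraction of PTU capacity you must consistently consume before
    reserved capacity beats PAYG, for a given blended $/1M token rate."""
    tokens_per_ptu_month = PTU_TPM * 60 * HOURS_PER_MONTH   # ~262.8M tokens
    payg_at_full_util = tokens_per_ptu_month / 1_000_000 * blended_payg_per_1m
    return PTU_MONTHLY / payg_at_full_util

# Example mix: 70% input / 30% output on GPT-4o PAYG rates
blended = 0.70 * 5.00 + 0.30 * 15.00   # $8.00 per 1M blended
print(f"{ptu_breakeven_utilisation(blended):.0%}")   # ~69%
```

A more input-heavy mix pushes the breakeven utilisation higher, which is why the 65–70% figure is a rule of thumb rather than a constant.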
PTU trap: Microsoft's minimum PTU commitment is 50 PTUs per deployment. For organisations deploying multiple regional endpoints for redundancy, minimum spend commitments multiply quickly. A two-region active-active deployment with GPT-4o requires 100 PTUs minimum — $146,000/month before any actual token consumption. Right-size PTU deployments before signing a reservation.
Fine-Tuning and Custom Model Costs
Fine-tuning Azure OpenAI models on proprietary data adds a separate cost layer on top of inference. Fine-tuning pricing has three components:
- Training compute: Charged per 1,000 tokens processed during the training run. GPT-4o fine-tuning: $25 per 1M training tokens. GPT-4o mini fine-tuning: $3 per 1M training tokens.
- Hosted deployment: Fine-tuned models require a dedicated deployment endpoint. Charged per hour of deployment regardless of usage: approximately $1.70–$3.00/hour for fine-tuned GPT-4o, depending on region.
- Inference on fine-tuned model: Priced at a premium to base model rates — approximately 20–30% above standard inference pricing for the equivalent base model.
The hosted deployment charge is the most significant operational cost for fine-tuned models. A fine-tuned GPT-4o deployment running continuously: $1.70/hour × 8,760 hours = $14,892/year just for having the endpoint active, before any inference consumption. Enterprises that fine-tune models for infrequent batch workflows should evaluate whether on-demand deployment and shutdown between runs is operationally viable to avoid continuous deployment charges.
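The on-demand deployment question above is easy to quantify. A sketch comparing continuous hosting against a shorter daily deployment window, using the $1.70/hour low-end rate from the text (the 4-hour batch window is an illustrative assumption):

```python
HOSTING_PER_HOUR = 1.70   # fine-tuned GPT-4o hosting, low end (from the text)

def annual_hosting(hours_active_per_day: float) -> float:
    """Yearly endpoint charge if the deployment runs only
    `hours_active_per_day` hours (on-demand deploy/teardown)."""
    return HOSTING_PER_HOUR * hours_active_per_day * 365

continuous = annual_hosting(24)     # $14,892 -- matches the figure above
nightly_batch = annual_hosting(4)   # hypothetical 4h batch window -> $2,482
print(continuous, nightly_batch)
```

For infrequent batch workloads the difference is most of the hosting bill, offset only by the operational cost of automating deploy/teardown and tolerating cold-start delay.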
Real Enterprise Cost Scenarios
Scenario 1: Enterprise RAG Application (Customer Support)
500 concurrent users averaging 10 RAG queries/user/day; GPT-4o for generation, text-embedding-3-large for retrieval, Azure AI Search for vector indexing.
- Volume: 500 × 10 = 5,000 queries/day, with an average of 3,000 input tokens/query (prompt + retrieved documents) and 500 output tokens.
- Monthly tokens: 5,000 × 30 × 3,000 = 450M input ($2,250); 5,000 × 30 × 500 = 75M output ($1,125). Embedding costs at $0.13/1M tokens are negligible at this query volume and omitted.
- Azure AI Search (S3 tier, 2 SUs): $1,960/month.
- Total: ~$5,335/month ($64,020/year) on PAYG. With a negotiated 30% discount: ~$44,814/year.
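The scenario arithmetic is straightforward to reproduce and adapt to your own volumes. A minimal sketch using the scenario's assumptions and the GPT-4o PAYG rates:

```python
# Scenario 1 inputs (assumptions from the scenario, not measured data)
queries_per_day = 500 * 10                  # 5,000 RAG queries/day
input_tok  = queries_per_day * 30 * 3_000   # 450M input tokens/month
output_tok = queries_per_day * 30 * 500     #  75M output tokens/month

token_spend = input_tok / 1e6 * 5.00 + output_tok / 1e6 * 15.00  # GPT-4o PAYG
search      = 1_960.0                       # Azure AI Search S3 tier, 2 SUs
monthly     = token_spend + search

print(monthly, monthly * 12)                # 5335.0 / 64020.0 per the scenario
```

Swapping in negotiated token rates (or routing part of the traffic to GPT-4o mini) only requires changing the two per-1M rates.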
Scenario 2: High-Volume Document Intelligence Pipeline
Legal firm processing 10,000 contracts/month, using GPT-4o for extraction (avg 8,000 input tokens per contract), plus o1 for complex clause reasoning on 1,000 contracts. GPT-4o: 10,000 × 8,000 = 80M input tokens ($400) + output. o1: 1,000 × 8,000 = 8M input tokens ($120) + output. Significant at scale but manageable — until the o1 reasoning model generates extended chain-of-thought tokens at $60/1M output, which can be 5–10× the input token count for complex reasoning tasks.
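The sting in Scenario 2 is the o1 output side, because billed output includes chain-of-thought tokens. A sketch of how the cost moves as the output multiplier grows (the 5–10x multipliers are the assumption stated in the scenario):

```python
# o1 reasoning: chain-of-thought counts as billable output at $60/1M
contracts = 1_000
input_tok = contracts * 8_000        # 8M input tokens

costs = {}
for multiplier in (1, 5, 10):        # output tokens as a multiple of input
    out_tok = input_tok * multiplier
    costs[multiplier] = input_tok / 1e6 * 15.00 + out_tok / 1e6 * 60.00

print(costs)   # output dominates: the 10x case is ~8x the 1x case
```

At a 10x output multiplier the o1 slice of the pipeline costs $4,920/month, roughly ten times the GPT-4o extraction pass over ten times as many contracts.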
Further Reading
- Microsoft Volume Licensing Service Center ↗
Azure AI Costs Growing Faster Than Expected?
Token-based consumption pricing is designed to scale with your usage — but the rate you pay is negotiable. We benchmark your actual Azure AI consumption against enterprise discount thresholds and negotiate custom pricing through MACC commitments. Microsoft negotiation on 25% gainshare — no savings, no fee.
Get Your Free Azure AI Cost Assessment →

How to Negotiate Azure AI Foundry Costs
MACC Commitment Is the Primary Lever
Azure OpenAI Service and AI Foundry charges flow through your Azure billing and count toward your Microsoft Azure Consumption Commitment (MACC). Increasing your MACC level to cover projected AI spend — rather than paying overage rates — unlocks discount tiers. At $1M+ MACC: typically 5–10% discount on Azure OpenAI. At $5M+ MACC: 15–25% discount. At $10M+ MACC: 25–40% discount. These aren't published; they're negotiated.
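Those indicative bands can be expressed as a simple tier lookup for budgeting purposes. A sketch (the bands are the unpublished, negotiated ranges quoted above, not guaranteed rates):

```python
# Indicative negotiated discount bands from the text -- not published rates.
MACC_TIERS = [
    (10_000_000, (0.25, 0.40)),
    (5_000_000,  (0.15, 0.25)),
    (1_000_000,  (0.05, 0.10)),
]

def discount_band(annual_macc: float):
    """Return the (low, high) Azure OpenAI discount band for a MACC
    level, or None below the first negotiated tier."""
    for threshold, band in MACC_TIERS:
        if annual_macc >= threshold:
            return band
    return None

print(discount_band(6_000_000))   # (0.15, 0.25)
print(discount_band(500_000))     # None -- below negotiation thresholds
```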
Private Pricing Agreements for Token Volume
For enterprises consuming above $500K/year in Azure OpenAI tokens, Microsoft will negotiate private pricing agreements (PPAs) with custom per-token rates. The leverage is: (1) demonstrated volume, (2) multi-year commit, (3) competitive alternatives — particularly open-source models on Azure ML or competing provider pricing from Anthropic or Google via other cloud providers.
Model Selection Optimisation Before Committing to Premium Rates
Not all workloads need GPT-4o. GPT-4o mini at $0.15/1M input tokens vs. GPT-4o at $5.00/1M is a 33x cost difference. Before negotiating premium tier pricing, audit whether your use cases require premium model capability or whether a tiered model strategy — routing simple classification/extraction tasks to mini models, reserving premium models for complex reasoning — can reduce your effective average token cost by 50–70% without quality degradation.
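The effect of tiered routing on your average token cost is just a weighted blend of the two rates. A sketch, assuming (hypothetically) that only 25% of traffic genuinely needs the premium model:

```python
def blended_input_rate(premium_share: float,
                       premium_rate: float = 5.00,   # GPT-4o $/1M input
                       mini_rate: float = 0.15) -> float:  # GPT-4o mini
    """Effective average $/1M input tokens when only `premium_share`
    of traffic is routed to the premium model."""
    return premium_share * premium_rate + (1 - premium_share) * mini_rate

all_premium = blended_input_rate(1.0)    # 5.00
routed      = blended_input_rate(0.25)   # 25% premium -> 1.3625
print(f"{1 - routed / all_premium:.0%} reduction in average input cost")
```

At a 25% premium share the average input rate drops by roughly 73%, which is why routing is worth doing before negotiating rates: it shrinks the bill the discount applies to.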
Embed AI Foundry into the Broader EA Renewal
Microsoft's most aggressive AI pricing concessions come when Azure AI Foundry spend is bundled into the total Microsoft relationship conversation — alongside MACC commitments, M365 Copilot seat counts, and any Azure infrastructure spend. Negotiating Azure AI in isolation limits the leverage available. As part of an integrated Microsoft EA negotiation, Azure AI discounts compound with broader Azure and M365 commitments.
Bottom Line
Azure AI Foundry's pricing architecture is a compound system — model inference, PTU reservations, fine-tuning compute, AI Search, and Azure ML charges stack on top of each other in ways that are genuinely difficult to forecast and manage without deliberate governance. The default pay-as-you-go rates are not the rates that large enterprises should be paying.
Three things every enterprise should do before its AI spend reaches $500K/year: route workloads across model tiers to minimise average token cost, establish a MACC commitment that covers projected AI consumption at a negotiated discount, and evaluate the PTU vs. PAYG breakeven for your specific usage patterns. We do all three as part of integrated cloud cost negotiation engagements. The savings are real — and on a gainshare basis, you keep 75% of every dollar we find.