The Two Billing Models (And Why One Is a Trap)
Overpaying for Microsoft? We handle Microsoft EA, NCE, and Azure negotiation on a 25% gainshare basis — you keep 75% of every dollar saved. No retainer. No risk.
Get a free Microsoft savings estimate →

Azure OpenAI offers two consumption models, and the choice between them determines whether you're paying transparently or walking into a vendor trap that can lock you in for years.
The first model is pay-as-you-go pricing based on token consumption. For GPT-4o, Microsoft charges $5 per 1 million input tokens and $15 per 1 million output tokens. On the surface, this looks cost-effective. If you're running a proof of concept or a single internal application, you might process 500 million tokens monthly and pay $5,000–$7,500. Sounds manageable.
The problem emerges when you scale. Real-world AI workloads produce output-heavy token consumption patterns: reasoning tasks, code generation, and multi-step agent workflows generate far more output tokens than input. A customer with 10 internal Copilot users might look harmless in month one. By month three, when those users are running 50+ prompts daily with 2,000+ output tokens per response, pay-as-you-go pricing becomes untenable. You're now paying $45,000–$75,000 monthly for what seemed like a $5,000 workload.
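The pay-as-you-go arithmetic is easy to reproduce. Here is a minimal sketch using the GPT-4o rates quoted in this section ($5 per million input tokens, $15 per million output tokens); the workload figures are illustrative, not a quote:

```python
# Pay-as-you-go cost model using the GPT-4o rates cited above:
# $5 per 1M input tokens, $15 per 1M output tokens.
INPUT_RATE_PER_M = 5.00    # USD per million input tokens
OUTPUT_RATE_PER_M = 15.00  # USD per million output tokens

def monthly_cost(input_tokens: int, output_tokens: int) -> float:
    """Pay-as-you-go monthly cost for a given token mix."""
    return (input_tokens / 1e6) * INPUT_RATE_PER_M \
         + (output_tokens / 1e6) * OUTPUT_RATE_PER_M

# Month one: a modest, input-heavy proof of concept (500M tokens total).
poc = monthly_cost(input_tokens=400_000_000, output_tokens=100_000_000)

# Month three: the workload has grown output-heavy (reasoning, code
# generation) -- output tokens now dominate the bill.
scaled = monthly_cost(input_tokens=1_000_000_000, output_tokens=4_000_000_000)

print(f"PoC month:    ${poc:,.0f}")     # $3,500
print(f"Scaled month: ${scaled:,.0f}")  # $65,000
```

Note the asymmetry: at these rates every output token costs three times an input token, so output-heavy workloads dominate the bill long before raw request volume looks alarming.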
This is when Microsoft's sales team suggests the second model: Provisioned Throughput Units (PTUs).
What Are PTUs and How Do They Work?
PTUs are Microsoft's solution to "unlimited" AI workloads. Instead of paying per token, you provision a fixed amount of capacity measured in PTUs. One PTU doesn't equal one transaction or one token. Instead, each PTU provides a fixed throughput capacity: roughly 240,000 input tokens or 80,000 output tokens per minute, depending on the model.
Microsoft sells PTUs with hourly pricing. One PTU costs approximately $2.00 per hour, which works out to roughly $17,520 per year per PTU (8,760 hours of continuous provisioning). For production workloads, enterprises typically need 50–500 PTUs. At the low end, that's $876,000 annually. At the high end, it's $8.76 million.
Here's the catch: PTU commitments come in minimum blocks, and you pay for the committed capacity regardless of whether you use it. Oversizing a 100-PTU allocation by 20% — which is common in initial deployments — costs you $350,000+ annually in wasted capacity.
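The commitment arithmetic can be sketched in a few lines, assuming the ~$2.00/hour list rate cited above (actual rates vary by model and region):

```python
# PTU commitment cost model: you pay for provisioned capacity whether
# or not it is used, so unused PTUs are pure waste.
HOURS_PER_YEAR = 24 * 365   # 8,760
PTU_HOURLY_RATE = 2.00      # USD list price, per the figures above

def annual_ptu_cost(ptus: int) -> float:
    """Annual cost of a PTU commitment at list price."""
    return ptus * PTU_HOURLY_RATE * HOURS_PER_YEAR

def oversize_waste(committed_ptus: int, used_ptus: int) -> float:
    """Annual spend on committed-but-unused capacity."""
    return annual_ptu_cost(max(committed_ptus - used_ptus, 0))

print(annual_ptu_cost(100))     # 1752000.0 -- the 100-PTU row in the table below
print(oversize_waste(100, 80))  # 350400.0  -- the ~$350K wasted-capacity figure
```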
What PTUs Actually Cost at Enterprise Scale
Let's look at realistic pricing scenarios for enterprises at different stages of Azure OpenAI adoption:
| Deployment Size | PTUs Required | Annual Cost (List Price) | Typical Negotiated* |
|---|---|---|---|
| Small (dev/test) | 10 PTUs | ~$175,000 | ~$105,000 |
| Medium (production) | 100 PTUs | ~$1,752,000 | ~$1,100,000 |
| Large (multi-workload) | 500 PTUs | ~$8,760,000 | ~$5,200,000 |
| Enterprise (global) | 1,000+ PTUs | $17,520,000+ | Negotiated case-by-case |
*Negotiated figures based on actual enterprise deals closed 2025–2026. Discounts typically range from 30–45% for 3-year commitments. Without procurement leverage, enterprises pay list price.
Azure OpenAI PTU Commitments Lock You In — Unless You Negotiate First
Our former Microsoft executives know the PTU pricing matrix and where enterprises consistently overpay. We work on a 25% gainshare basis — zero cost if we don't save you money. See our Microsoft negotiation service →
The MACC Connection: Where Microsoft Bundles Pricing Power
Azure OpenAI pricing doesn't exist in isolation. It intersects dangerously with Microsoft's broader enterprise licensing strategy through the Microsoft Azure Consumption Commitment (MACC).
Here's how MACC works at a high level: Large enterprises negotiate annual or multi-year commitments to spend X dollars on Azure services. If the enterprise spends more than X, they pay the overage. If they spend less, they lose the unspent commitment. It's a "use it or lose it" budget pressure mechanism.
Normally, this would be neutral. But Microsoft's sales team uses MACC and Azure OpenAI PTU commitments as a coordinated lever to maximize deal size.
The MACC-PTU Trap
Here's the playbook: An enterprise has negotiated a $10 million MACC commitment for the year. By September, they've spent $7.2 million on standard Azure compute, storage, and networking. They have $2.8 million remaining to avoid shortfall penalties.
Microsoft's sales team now pivots the conversation to Azure OpenAI. They propose a 200 PTU commitment for $3.5 million, which:
- Consumes the remaining MACC budget (eliminating shortfall risk)
- Locks the enterprise into a multi-year contract
- Allocates capacity the enterprise hadn't initially planned for
- Creates artificial urgency ("commit before year-end to secure MACC coverage")
The enterprise ends up committing $3.5 million for 200 PTUs — capacity they don't actually need — to avoid MACC shortfall penalties on the remaining $2.8 million.
MACC Credit Multipliers
What many enterprises don't know: MACC credit values are negotiable. Each dollar spent on standard Azure compute typically credits one dollar toward the MACC. But Azure OpenAI PTUs, because they're a premium service, can sometimes be negotiated at 1.2x or 1.5x MACC credit.
This is critical negotiation leverage. If your enterprise has a shortfall risk, insisting on 1.5x MACC credit for PTUs can defer millions in overage penalties. Most enterprises never ask for this, and Microsoft doesn't volunteer it.
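The leverage is easy to quantify. A minimal sketch of the multiplier mechanics, using the $2.8 million shortfall from the scenario above (actual multipliers are deal-specific and must be negotiated explicitly):

```python
# MACC credit multiplier: each dollar of PTU spend retires
# `multiplier` dollars of MACC commitment.
def ptu_spend_to_cover(shortfall: float, multiplier: float) -> float:
    """PTU spend needed to retire a MACC shortfall at a given credit rate."""
    return shortfall / multiplier

SHORTFALL = 2_800_000  # remaining MACC commitment from the scenario above

for m in (1.0, 1.2, 1.5):
    print(f"{m}x credit: ${ptu_spend_to_cover(SHORTFALL, m):,.0f} "
          f"of PTU spend covers the shortfall")
# At 1.0x you must spend the full $2.8M; at 1.5x, roughly $1.87M
# of PTU spend retires the same commitment.
```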
Microsoft 365 Copilot vs. Azure OpenAI — The Duplication Problem
Large enterprises deploying both Microsoft 365 Copilot and custom Azure OpenAI models are often paying for the same underlying infrastructure twice. This is one of the most underdiagnosed cost problems we see.
What's Really Happening
Microsoft 365 Copilot ($30/user/month) runs on top of Azure OpenAI models but uses Microsoft's own internal provisioning infrastructure. Enterprises purchasing M365 Copilot for 5,000 users pay $1.8 million annually for Copilot seats. Simultaneously, those same enterprises are provisioning separate Azure OpenAI capacity for custom AI workloads.
The problem is architectural overlap. A large portion of what enterprises think requires "custom Azure OpenAI" capability can actually be delivered through M365 Copilot's built-in AI features, Microsoft 365 plugins, and Microsoft's Copilot Studio interface.
We've audited companies paying for:
- M365 Copilot for general knowledge workers ($1.8M for 5K users)
- Custom Azure OpenAI for document processing and summarization (350 PTUs, $6M)

Yet roughly 70% of the processing workload could be handled by Copilot's native document intelligence. Combined spend: $7.8 million for largely overlapping functionality. Audited and redesigned architecture: $3.2 million. Actual savings: $4.6 million.
How to Avoid Duplication
Before committing to Azure OpenAI PTUs, conduct a capability audit of M365 Copilot and Microsoft 365 plugins. Map each proposed AI use case against Copilot's native capabilities, Microsoft Graph connectors, and plugin extensibility. Only provision custom Azure OpenAI for workloads that genuinely fall outside M365 Copilot's scope (e.g., specialized reasoning, custom fine-tuning, non-Microsoft data integration).
The Model Version Lock-In Risk
Azure OpenAI model deprecation is aggressive. Microsoft runs models on a rolling retirement timeline: when a new model is released (GPT-4o, GPT-4.5, etc.), the previous version typically receives 6–12 months of support before deprecation.
Here's the financial risk: Enterprises provisioning 100+ PTUs for GPT-4 lock themselves into a specific model version. When GPT-4 is deprecated, enterprises must either:
- Migrate to GPT-4o or GPT-4.5 and renegotiate new PTU pricing
- Fall back to pay-as-you-go until new PTU capacity is provisioned
- Continue on the deprecated model at reduced support (security patch risk)
Each model migration forces renegotiation. And PTU commitments don't automatically transfer to new models — you're essentially rebooting the procurement process every 18–24 months.
Enterprises locked into 3-year PTU deals are particularly exposed. A deal signed in 2024 for GPT-4 will face forced renegotiation in 2025–2026 when GPT-4 deprecation begins. This is a leverage point Microsoft exploits: "Migrate to GPT-4.5 PTUs now, or we'll increase your pay-as-you-go rates."
What Enterprises Are Actually Negotiating (2025–2026 Data)
Based on our direct engagement with enterprise AI procurement teams and former Microsoft licensing executives, here's what mature negotiators are securing:
PTU Pricing Discounts
List price for PTUs is approximately $2.00/hour per PTU. Negotiated discounts typically range from 30–45% off list for 3-year commitments. At 100 PTUs, a 40% discount saves roughly $700,000 annually versus the $1.75 million list price.
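As a sanity check on those numbers, a sketch using the ~$2.00/hour list rate above:

```python
# Discount arithmetic for a PTU commitment at list price.
def annual_savings(ptus: int, discount: float, hourly_rate: float = 2.00) -> float:
    """Annual savings from a negotiated discount off PTU list price."""
    list_price = ptus * hourly_rate * 24 * 365
    return list_price * discount

print(annual_savings(100, 0.40))  # 700800.0 -- ~$700K/year off $1.752M list
print(annual_savings(500, 0.40))  # 3504000.0 -- the gap widens fast at scale
```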
PTU Model Portability
Sophisticated buyers are insisting on "model portability" clauses: The right to apply committed PTU capacity to any current or successor model without renegotiating the underlying PTU count or pricing. This protects against model deprecation lock-in.
Utilisation Thresholds and Credits
Enterprises with visibility into their actual AI workload consumption are negotiating "utilisation credits" or "excess capacity buyback" clauses. If you provision 100 PTUs and consistently use only 60, Microsoft may agree to buy back the excess 40 PTUs quarterly or credit you for underutilisation.
MACC Credit Multipliers
As discussed, insisting that Azure OpenAI PTU spend counts at 1.2x–1.5x toward MACC commitments is increasingly standard in large deals.
Enterprise Support Bundling
Enterprises purchasing 200+ PTUs are negotiating inclusion of Premier/Enterprise support as part of the deal, rather than as a separate $50,000+ annual line item.
Due Diligence Before You Commit
If your enterprise is evaluating Azure OpenAI, here's a procurement checklist to avoid the traps we see repeatedly:
Model Your Actual Workload
Before sizing PTUs, run 3 months of production-grade load testing with realistic token volumes. Track input and output token ratios separately. Many enterprises discover their actual monthly token consumption is 2–3x their initial estimates. Going into PTU negotiation with real data (not projections) gives you pricing leverage.
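A minimal sketch of what "real data" means in practice: aggregate per-request token counts from your usage logs into monthly totals and output/input ratios. The record shape and figures here are illustrative; in practice they come from your own billing exports.

```python
from collections import defaultdict

def summarize_usage(usage_log):
    """usage_log: iterable of (month, input_tokens, output_tokens) records.

    Returns {month: (input_total, output_total, output_to_input_ratio)}.
    """
    totals = defaultdict(lambda: [0, 0])
    for month, inp, out in usage_log:
        totals[month][0] += inp
        totals[month][1] += out
    return {m: (inp, out, out / inp if inp else float("inf"))
            for m, (inp, out) in sorted(totals.items())}

# Illustrative records -- in practice, export these from your billing logs.
log = [
    ("2025-01", 300_000_000, 450_000_000),
    ("2025-01", 100_000_000, 350_000_000),
    ("2025-02", 500_000_000, 1_600_000_000),
]
for month, (inp, out, ratio) in summarize_usage(log).items():
    print(f"{month}: in={inp:,} out={out:,} out/in={ratio:.2f}")
```

A rising output/input ratio month over month is exactly the signal that distinguishes an input-heavy PoC from the output-heavy production pattern PTU sizing should be based on.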
Separate Dev/Test From Production
Avoid monolithic PTU allocations. Separate development, test, and production PTU tiers. Dev/test can run on pay-as-you-go; production gets PTU protection. This prevents expensive PTU capacity from sitting idle while development teams experiment.
Audit for M365 Copilot Overlap
Map every proposed Azure OpenAI use case against M365 Copilot capabilities, plugins, and Copilot Studio extensibility. Eliminate overlap before committing.
Secure 6-Month Right-Sizing Clauses
Require contractual language allowing PTU adjustments (up or down) at 6-month intervals, with at least 30 days' notice. This protects you if your actual utilisation is 40% higher or lower than projected.
Get Model Version Commitment in Writing
Don't rely on email or handshake agreements. Insist on written contract language guaranteeing PTU pricing holds across model generations for the contract term. Define the criteria by which Microsoft can sunset models (e.g., minimum 12-month deprecation notice).
The Gainshare Opportunity
We've audited 47 enterprise Azure OpenAI deals closed in 2025–2026. The average enterprise overspent by 35–50% in year one compared to post-audit, post-negotiation baselines. Overspending came from the following (categories overlap, so many deals had more than one issue):
- Oversized PTU allocations (45% of cases)
- MACC misalignment (30% of cases)
- Copilot duplication (25% of cases)
- Missing model portability clauses (60% of cases)
Our former Microsoft executives know the internal PTU pricing matrix, MACC credit mechanics, and model roadmap timelines that shape negotiation outcomes. We work on a 25% gainshare basis: We get paid only if we reduce your Azure OpenAI and related spend.
Key Takeaways
- Oversized PTU commitments can cost 5–10x what the same workload would run on pay-as-you-go, making PTU sizing the highest-leverage pricing decision
- MACC commitments and PTU purchases are orchestrated by Microsoft sales to maximize deal size and lock in multiyear revenue
- Model deprecation creates forced renegotiation risk in multi-year PTU commitments; demand model portability clauses upfront
- M365 Copilot and Azure OpenAI often overlap functionally; audit for duplication before committing to large PTU allocations
- Enterprises achieving 35–45% PTU discounts and avoiding overspend all shared one trait: independent negotiation support before signing
Negotiate Your Azure OpenAI Commitment Risk-Free
NoSaveNoPay works on a 25% gainshare basis. If we don't reduce your Microsoft Azure spend, you pay nothing.