Generative AI infrastructure costs are the new cloud cost management challenge. Three years ago, enterprises were scrambling to understand Reserved Instance vs. Savings Plans optimisation. Today, the same organisations are watching AWS Bedrock charges appear on invoices with no clear owner, no established benchmarks, and no negotiation framework.
This analysis covers the full AWS Bedrock pricing architecture — on-demand token pricing, provisioned throughput models, and the costs of Knowledge Bases, Agents, and Guardrails — and explains how enterprises can structure their AI infrastructure contracts to avoid the most common cost traps.
How AWS Bedrock Pricing Is Structured
Overpaying for AWS? We handle AWS EDP, Reserved Instance and savings plan negotiation on a 25% gainshare basis — you keep 75% of every dollar saved. No retainer. No risk.
Get a free AWS savings estimate →

AWS Bedrock is a managed foundation model (FM) service that provides API access to models from Anthropic, Meta, Mistral, Cohere, Stability AI, Amazon Titan, and others. Pricing operates on three distinct models depending on how you access the service:
1. On-Demand Pricing (Pay Per Token)
On-demand pricing is the default mode. You pay per 1,000 input tokens and per 1,000 output tokens, with rates varying by model. There are no upfront commitments, no minimum usage requirements, and no reserved capacity.
On-demand is appropriate for development, low-volume production, and variable workloads. It is the most expensive per-token option at scale.
2. Provisioned Throughput
Provisioned Throughput (PT) is AWS Bedrock's commitment pricing model. You purchase model units (MUs) — essentially reserved capacity — for a defined term (one month, six months, or one year). This provides guaranteed throughput (tokens per minute) and lower effective per-token costs versus on-demand.
PT is appropriate for consistent, predictable high-volume workloads where throughput guarantees matter. The economics are compelling at scale: one-year PT commitments typically reduce effective per-token costs by 55–65% versus on-demand.
3. Batch Inference
Batch inference is AWS Bedrock's lowest-cost option for workloads that can tolerate latency. You submit jobs and receive results asynchronously, typically within 24 hours. Batch inference pricing is approximately 50% of on-demand rates for most models.
AWS Bedrock Pricing by Model: What Enterprises Are Actually Paying
AWS Bedrock's model-specific pricing changes frequently. The figures below reflect mid-2026 pricing; always verify against the AWS Bedrock pricing page before committing.
| Model | Input (per 1K tokens) | Output (per 1K tokens) | Notes |
|---|---|---|---|
| Amazon Nova Pro | $0.0008 | $0.0032 | AWS flagship; strong multimodal |
| Amazon Nova Lite | $0.00006 | $0.00024 | Low-cost, fast; limited reasoning |
| Anthropic Claude 3.5 Sonnet | $0.003 | $0.015 | High-quality general use |
| Anthropic Claude 3 Haiku | $0.00025 | $0.00125 | Fast; lowest Claude cost |
| Meta Llama 3.1 70B Instruct | $0.00099 | $0.00099 | Open model; flat in/out pricing |
| Mistral Large 2 | $0.002 | $0.006 | Strong multilingual |
| Cohere Command R+ | $0.003 | $0.015 | RAG-optimised |
| Amazon Titan Text Express | $0.0002 | $0.0006 | AWS proprietary; RAG use cases |
At enterprise scale, model selection alone can determine whether AI infrastructure costs are $500K or $5M annually for comparable output volumes. The most common enterprise mistake is defaulting to premium models (Claude 3.5 Sonnet, GPT-4-class equivalents) for workloads that would perform adequately on mid-tier or open models at 80–90% lower cost.
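The spread is easy to quantify. The sketch below compares a month's on-demand inference cost across models, using the per-1K-token rates from the table above (mid-2026 figures; the token volumes in the example are illustrative, not benchmarks):

```python
# Per-1K-token rates (input, output) from the pricing table above.
# Verify against the current AWS Bedrock pricing page before relying on them.
RATES = {
    "nova-pro":          (0.0008,  0.0032),
    "nova-lite":         (0.00006, 0.00024),
    "claude-3.5-sonnet": (0.003,   0.015),
    "claude-3-haiku":    (0.00025, 0.00125),
    "llama-3.1-70b":     (0.00099, 0.00099),
}

def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """On-demand cost for a month's volume (raw token counts, not thousands)."""
    in_rate, out_rate = RATES[model]
    return (input_tokens / 1000) * in_rate + (output_tokens / 1000) * out_rate

# Hypothetical workload: 2B input + 500M output tokens per month.
for model in RATES:
    print(f"{model:>18}: ${monthly_cost(model, 2_000_000_000, 500_000_000):,.0f}")
```

At that volume, Claude 3.5 Sonnet costs $13,500/month against $1,125/month for Claude 3 Haiku: the 10–50x tier gap the article describes, on identical traffic.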
Further Reading
- AWS Pricing Calculator ↗
- AWS Cost Optimization Hub ↗
- Gartner Magic Quadrant for Cloud Infrastructure ↗
AWS Bedrock Costs Growing Faster Than Expected?
Our team helps enterprises structure AWS Bedrock deployments to control token costs, negotiate Provisioned Throughput terms, and include AI services in EDP discount coverage. 25% gainshare — you only pay us if we save you money.
Explore AWS Negotiation Services →

Provisioned Throughput Economics: When It Makes Sense
Provisioned Throughput pricing is significantly more complex than on-demand, but the savings at scale are compelling. Understanding the economics requires modelling against your actual workload characteristics.
Model Units (MUs): AWS defines throughput in model units, where each MU provides a guaranteed number of tokens per minute (TPM) specific to the model. The number of MUs required depends on your peak and sustained throughput requirements.
Commitment terms and discount levels (approximate):
- No commitment (on-demand): Full list price per token
- 1-month Provisioned Throughput: ~30–40% discount vs on-demand
- 6-month Provisioned Throughput: ~45–55% discount vs on-demand
- 1-year Provisioned Throughput: ~55–65% discount vs on-demand
The crossover point where PT economics outperform on-demand typically occurs at approximately 60–70% utilisation of provisioned capacity. Below 60% utilisation, you are better off on-demand. Above 70%, PT delivers significant savings even after factoring in unused capacity costs.
The PT Utilisation Trap
A common enterprise mistake is over-provisioning PT capacity during initial deployment, then running it at 30–40% utilisation. At those levels, the fixed PT cost exceeds what on-demand would have cost. Always model PT against your 90th-percentile utilisation, not your peak capacity forecast.
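A minimal break-even check makes the trap concrete. Every dollar figure and capacity number below is invented for illustration, not an AWS quote; the structure of the comparison is what matters:

```python
# Sketch: is a fixed monthly PT commitment cheaper than paying on-demand
# for the tokens you actually consume? All inputs are illustrative.

def pt_is_cheaper(monthly_pt_cost: float, utilisation: float,
                  pt_capacity_tokens: float, on_demand_rate_per_1k: float) -> bool:
    """Compare the fixed PT cost against on-demand cost for the tokens
    actually used (capacity x utilisation)."""
    tokens_used = pt_capacity_tokens * utilisation
    on_demand_cost = tokens_used / 1000 * on_demand_rate_per_1k
    return monthly_pt_cost < on_demand_cost

# Hypothetical: $75K/month of PT buys 30B tokens of monthly capacity for a
# model with a blended on-demand rate of $0.004 per 1K tokens.
for u in (0.3, 0.5, 0.7, 0.9):
    print(f"{u:.0%} utilisation -> PT cheaper: "
          f"{pt_is_cheaper(75_000, u, 30_000_000_000, 0.004)}")
```

With these assumed numbers the crossover lands at roughly 62% utilisation, consistent with the 60–70% range above: at 30–40% utilisation the PT commitment loses decisively to on-demand.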
AWS Bedrock Hidden Costs: What the Pricing Page Doesn't Show You
AWS Bedrock's base token pricing is only part of the total cost picture. Several additional cost components significantly affect total cost of ownership for enterprise deployments:
Knowledge Bases (RAG Infrastructure)
AWS Bedrock Knowledge Bases provides managed Retrieval-Augmented Generation (RAG) infrastructure. Costs include vector database storage (OpenSearch Serverless), embedding model inference (charged per token), retrieval queries, and data ingestion. For large document corpora, Knowledge Bases costs can add 30–80% to base model inference costs.
Bedrock Agents
AWS Bedrock Agents charges for orchestration (on the order of $0.001 per orchestration trace step, varying by model), plus underlying model inference costs for each tool use and reasoning step. Complex multi-step agents can consume 3–5x the tokens of a simple inference call for the same user interaction.
Guardrails
AWS Bedrock Guardrails (content filtering, PII detection, grounding checks) adds per-API-call charges. For high-throughput production systems, Guardrails costs can represent 15–25% of total Bedrock spend. Many enterprises discover this only when Guardrails bills appear for the first time.
Data Transfer and VPC Costs
If you access Bedrock from within a VPC (recommended for production workloads), VPC endpoint charges and associated data transfer costs apply. At high volume, these can add material costs that are not visible in basic Bedrock pricing estimates.
Embedding Model Inference
Embedding model costs are charged separately from generation model inference. If you are running a semantic search or RAG system with high document update frequency, embedding costs can rival generation costs. Amazon Titan Embeddings is significantly cheaper than third-party embedding models via Bedrock.
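Layering these hidden components onto base inference spend gives a rough all-in estimate. The overhead fractions below are drawn from the ranges discussed in this section (Knowledge Bases 30–80%, Guardrails 15–25%); the VPC/transfer figure is a placeholder assumption you should replace with your own network estimates:

```python
# Sketch: rough Bedrock total-cost-of-ownership estimate. Overhead
# percentages are illustrative mid-points from this article, not AWS figures.

def bedrock_tco(base_inference: float, kb_overhead: float = 0.5,
                guardrails_overhead: float = 0.2,
                vpc_and_transfer: float = 0.05) -> float:
    """Base inference spend plus RAG, Guardrails, and networking overheads,
    each expressed as a fraction of base spend."""
    return base_inference * (1 + kb_overhead + guardrails_overhead
                             + vpc_and_transfer)

print(f"${bedrock_tco(100_000):,.0f}")  # $100K base -> $175K all-in
```

The point of the exercise: a budget built on the pricing-page token rates alone can understate real spend by 50–100% for a RAG-plus-Guardrails production deployment.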
How AWS Bedrock Fits Into Your EDP: The Negotiation Gap
This is the most important strategic issue for enterprises managing material AWS Bedrock spend: by default, AWS Bedrock charges do not count toward EDP commitment satisfaction and do not receive EDP discounts.
AWS's default EDP eligible service list was established before the generative AI cost wave. Most existing EDPs were negotiated when Bedrock spend was minimal. Today, organisations with $5M+ annual Bedrock spend are often not receiving any EDP discount on that spend — and it's not counting toward their commitment minimums.
This creates two problems simultaneously:
- You are paying full on-demand rates (or list PT rates) for Bedrock without EDP discount coverage.
- Your EDP commitment undercount may trigger ratchet penalties in your next renewal, because AWS counts only eligible service spend toward your commitment baseline.
The negotiation opportunity: expanding your EDP eligible services list to include AWS Bedrock, SageMaker, and related AI/ML services. This is obtainable in EDP amendments and renewals — AWS has granted it for organisations with demonstrated Bedrock spend — but it requires explicit negotiation. It will not happen automatically.
⚠ AWS Bedrock EDP Exclusion: Check Your Contract Now
If you have an active EDP and material AWS Bedrock spend, check your EDP's eligible services schedule immediately. There is a high probability that Bedrock is not included, which means you are leaving significant savings on the table every month. An EDP amendment to include Bedrock can be negotiated mid-term — AWS processes these for customers with sufficient Bedrock spend volume.
Controlling AWS Bedrock Costs: Practical Strategies
1. Establish a Model Tiering Policy
Not every workload requires premium models. Create an organisational policy that assigns workloads to model tiers based on quality requirements. Complex reasoning, synthesis, and customer-facing tasks may justify Claude 3.5 Sonnet or Nova Pro. Internal classification, summarisation, and routing tasks can use Claude Haiku, Nova Lite, or Titan at 10–50x lower cost per token.
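A tiering policy can be as simple as a routing table in the application layer. The task categories and model choices below are illustrative (and the model identifiers are abbreviated, not full Bedrock model IDs); map them to your own quality benchmarks:

```python
# Sketch: a minimal model-tiering policy. Categories and model choices are
# illustrative assumptions; model names are shortened for readability.

TIER_POLICY = {
    "customer_facing": "anthropic.claude-3-5-sonnet",   # premium tier
    "synthesis":       "amazon.nova-pro",
    "classification":  "anthropic.claude-3-haiku",      # 10-50x cheaper
    "routing":         "amazon.nova-lite",
    "summarisation":   "amazon.titan-text-express",
}

def select_model(task_category: str) -> str:
    """Route a workload to its assigned tier; unknown tasks default
    to the cheapest model rather than the most expensive one."""
    return TIER_POLICY.get(task_category, "amazon.nova-lite")

print(select_model("classification"))  # anthropic.claude-3-haiku
```

Defaulting unknown workloads to the cheapest tier inverts the usual failure mode, where everything silently lands on the premium model.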
2. Implement Token Budget Controls
AWS Bedrock does not provide native spend controls at the application level. Without explicit token budget controls in your application layer, a poorly configured prompt, an infinite-loop agent, or a Guardrails misconfiguration can generate unexpected token consumption. Implement cost anomaly detection (AWS Cost Anomaly Detection, with Bedrock-specific alerts) and application-level token counting.
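Because there is no native cap, the budget check has to live in your own code. One minimal pattern, with the dollar limit and rate as assumed inputs (persistence and thread-safety omitted for brevity):

```python
# Sketch: an application-level token budget guard. The monthly limit and
# per-1K-token rate are illustrative inputs, not AWS defaults.

class TokenBudget:
    def __init__(self, monthly_limit_usd: float, rate_per_1k: float):
        self.limit = monthly_limit_usd
        self.rate = rate_per_1k
        self.spent = 0.0

    def record(self, tokens: int) -> None:
        """Account for a completed call's actual token usage."""
        self.spent += tokens / 1000 * self.rate

    def allow(self, estimated_tokens: int) -> bool:
        """Refuse calls whose estimate would push spend over the cap."""
        projected = self.spent + estimated_tokens / 1000 * self.rate
        return projected <= self.limit

budget = TokenBudget(monthly_limit_usd=5_000, rate_per_1k=0.015)
print(budget.allow(200_000))  # True
```

In production, the `record`/`allow` calls wrap every Bedrock invocation, and the spend counter lives in shared storage so an infinite-loop agent gets refused rather than billed.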
3. Optimise Context Window Usage
Input token costs accumulate rapidly when large system prompts, conversation history, or retrieved context are sent with every API call. Audit your context window usage: implement semantic caching for repeated queries, truncate conversation history intelligently, and compress retrieved context before sending to the model.
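History truncation is the simplest of these to implement. The sketch below drops the oldest turns first to stay under a token budget; it uses a crude 4-characters-per-token heuristic, so swap in your model's real tokeniser before trusting the counts:

```python
# Sketch: keep conversation history within a token budget by dropping the
# oldest turns first. The 4-chars-per-token estimate is a rough heuristic.

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def truncate_history(turns: list[str], max_tokens: int) -> list[str]:
    """Retain the most recent turns that fit within max_tokens."""
    kept, total = [], 0
    for turn in reversed(turns):          # newest first
        cost = estimate_tokens(turn)
        if total + cost > max_tokens:
            break
        kept.append(turn)
        total += cost
    return list(reversed(kept))           # restore chronological order

history = ["old question " * 50, "older answer " * 50, "latest question?"]
print(truncate_history(history, max_tokens=50))  # ['latest question?']
```

The same shape works for retrieved context: rank chunks by relevance, then admit them newest-or-best-first until the budget is spent.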
4. Migrate Batch Workloads
Any Bedrock workload that can tolerate latency — document classification, content moderation backlogs, report generation — should use batch inference at approximately 50% of on-demand rates. Most enterprises run all Bedrock workloads on-demand by default, even those with no latency requirements.
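The savings from that migration are straightforward to estimate. The sketch below applies the article's approximate 50% batch rate to the latency-tolerant share of a bill (both the bill size and the batchable fraction are illustrative; verify the batch discount per model):

```python
# Sketch: savings from moving latency-tolerant volume to batch inference
# at ~50% of on-demand rates. Inputs are illustrative assumptions.

def batch_savings(on_demand_monthly: float, batchable_fraction: float,
                  batch_factor: float = 0.5) -> float:
    """Dollars saved by running the batchable share at the batch rate."""
    return on_demand_monthly * batchable_fraction * (1 - batch_factor)

print(f"${batch_savings(200_000, 0.4):,.0f}")  # $40,000/month on a $200K bill
```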
5. Negotiate Custom Pricing for Production Scale
Once AWS Bedrock becomes a top-10 spend item in your cloud bill, custom pricing negotiations become viable. AWS has provided private pricing for specific models to large Bedrock customers, outside the standard PT framework. This typically requires $2M+ annual Bedrock spend and a direct negotiation with AWS commercial leadership rather than your standard account team.
Gainshare Approach to AWS Bedrock Cost Optimisation
NoSaveNoPay works with enterprises to address AWS Bedrock costs across three tracks: (1) EDP amendment negotiations to include Bedrock in eligible services, (2) Provisioned Throughput optimisation modelling to right-size PT commitments, and (3) private pricing negotiations for high-volume Bedrock customers. We work on a 25% gainshare basis — if we don't reduce your costs, you owe nothing.
AWS Bedrock vs. Direct API Providers: Total Cost Comparison
For enterprises considering whether to access Anthropic's Claude, Meta's Llama, or other models directly versus via AWS Bedrock, the cost calculus is nuanced:
AWS Bedrock advantages: Integrated billing with existing AWS spend and EDP (if eligible services include Bedrock), AWS IAM-native access controls, VPC integration, no separate vendor contracts, unified compliance footprint, and AWS SLAs covering the managed inference layer.
Direct API advantages: For Anthropic Claude specifically, direct API pricing is often 15–30% lower than equivalent Bedrock on-demand pricing. Direct API providers may also offer volume commitment discounts and custom enterprise agreements that are not available via Bedrock.
The optimal architecture for large enterprises is often a hybrid: running production workloads via Bedrock for operational simplicity and integrated billing, while accessing the most cost-sensitive high-volume workloads via direct API or self-hosted open models on EC2.
Related Resources
- AWS Enterprise Discount Program: How to Negotiate Beyond Standard Pricing
- AWS EDP Negotiation: How to Cut Your Cloud Bill by 30%
- AWS Spot vs Reserved vs Savings Plans: Enterprise Cost Strategy
- Enterprise FinOps: When Cloud Cost Tools Aren't Enough
- Microsoft Azure OpenAI Pricing: What Enterprises Are Actually Paying
- Google Cloud Gemini Pricing: Enterprise AI Licensing Cost Analysis 2026
- AWS Contract Negotiation Services
- Cloud Cost Negotiation Services
- AWS EDP Negotiation Handbook (White Paper)
- Cloud Cost Optimisation Checker — Free Tool