AI Embedding Model Pricing Comparison
Compare pricing for embedding models across providers including AWS Bedrock, Together, DeepInfra, Fireworks, and more. All prices are per 1M input tokens.
Embedding API Pricing Overview
All Embedding Model Prices
| Author | Model | Dimensions | Max Input | Input / 1M |
|---|---|---|---|---|
| - | - | 1,536 | 8,191 | $0.020 |
| - | - | - | - | $0.020 |
| - | - | 1,536 | 8,191 | $0.050 |
| - | - | 1,024 | 512 | $0.100 |
| - | - | 1,024 | 512 | $0.100 |
| - | - | 1,536 | 8,191 | $0.100 |
| - | - | 1,024 | 128,000 | $0.120 |
| - | - | 3,072 | 8,191 | $0.130 |
| - | - | - | - | $0.130 |
About AI Embedding Model Pricing
Embedding models convert text into dense vector representations used for semantic search, retrieval-augmented generation (RAG), clustering, and classification. Unlike LLMs, embedding models only have input pricing — there are no output tokens.
- AWS Bedrock provides managed access to embedding models from Amazon, Cohere, and others
- Together, DeepInfra, Fireworks offer competitive pricing for open embedding models
- Azure hosts OpenAI embedding models with global deployment tiers
All prices shown are per 1 million input tokens. Key factors when choosing an embedding model are dimensions (vector size), max input tokens (context window), and price.
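Because pricing is linear in input tokens, the cost of a workload is easy to estimate. A minimal sketch, using a hypothetical 250,000-token corpus and the $0.020 / 1M price from the table above:

```python
def embedding_cost(input_tokens: int, price_per_million: float) -> float:
    """Return the dollar cost of embedding `input_tokens` tokens,
    given a price expressed per 1 million input tokens."""
    return input_tokens / 1_000_000 * price_per_million

# Hypothetical workload: a 250,000-token corpus at $0.020 / 1M tokens.
print(f"${embedding_cost(250_000, 0.020):.4f}")  # $0.0050
```

Since there are no output tokens for embedding models, this one multiplication is the entire cost model.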
Frequently Asked Questions
What are embedding models?
Embedding models convert text into numerical vectors that capture semantic meaning. These vectors enable similarity search, clustering, and retrieval-augmented generation (RAG). Unlike LLMs that generate text, embedding models produce fixed-size vectors as output.
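The similarity search described above is typically done with cosine similarity between vectors. A minimal sketch with toy, made-up 4-dimensional vectors (a real model would return, e.g., 1,024 dimensions from an API call):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: near 1.0 for same direction, 0.0 for orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" for illustration only; real vectors come from a model.
query = [0.1, 0.3, 0.5, 0.1]
doc_a = [0.1, 0.3, 0.5, 0.1]   # same direction as the query
doc_b = [0.9, 0.0, 0.0, 0.1]   # mostly different direction

print(cosine_similarity(query, doc_a))  # ~1.0, the closer match
print(cosine_similarity(query, doc_b))
```

Ranking documents by this score against a query vector is the core operation behind semantic search and RAG retrieval.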
What is the cheapest embedding API?
The cheapest embedding API starts at $0.0200 per 1M input tokens. Prices vary by model and provider. AWS Bedrock often offers competitive pricing for embedding models. Use our comparison table to find the best deal.
What do embedding dimensions mean?
Dimensions refer to the size of the output vector (e.g., 1024 means each text input is converted to a 1024-number vector). Higher dimensions can capture more nuance but require more storage and compute for similarity searches. Common sizes in the table above include 1,024, 1,536, and 3,072.
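The storage trade-off can be made concrete. A rough estimate, assuming vectors are stored as float32 (4 bytes per dimension, ignoring any index overhead or compression):

```python
def index_size_bytes(num_vectors: int, dimensions: int, bytes_per_dim: int = 4) -> int:
    """Raw storage for a vector index, assuming float32 (4 bytes/dim)."""
    return num_vectors * dimensions * bytes_per_dim

# 1 million vectors: 1,024 dims vs 3,072 dims.
print(index_size_bytes(1_000_000, 1024) / 1e9)  # 4.096 GB
print(index_size_bytes(1_000_000, 3072) / 1e9)  # 12.288 GB
```

Tripling the dimensions triples both the storage and the per-query similarity compute, which is why dimension count matters alongside price.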
What is max input tokens for embeddings?
Max input tokens is the maximum amount of text the model can process in a single embedding request. Models with larger context windows (like 128K tokens) can embed entire documents at once, while smaller windows (512 tokens) require chunking longer texts.
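The chunking step mentioned above can be sketched simply. This toy version approximates tokens by whitespace-split words, since real token counts depend on each model's tokenizer:

```python
def chunk_words(text: str, max_words: int) -> list[str]:
    """Split text into chunks of at most `max_words` words (a rough token proxy)."""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]

doc = "word " * 1200            # a hypothetical 1,200-word document
chunks = chunk_words(doc, 512)  # fit a 512-token-class context window
print(len(chunks))              # 3 chunks: 512 + 512 + 176 words
```

Models with 8K or 128K context windows avoid this step for most documents, which is a practical reason to pay attention to the Max Input column.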
© 2026 68 Ventures, LLC. All rights reserved.
