Price Per Token

AI Embedding Model Pricing Comparison

Compare pricing for embedding models across providers including AWS Bedrock, Together, DeepInfra, and Fireworks. All prices are per 1 million input tokens.

Embedding API Pricing Overview

  • 11 embedding models
  • 4 model authors
  • 3 providers
  • Lowest price: $0.0200 per 1M tokens

All Embedding Model Prices

About AI Embedding Model Pricing

Embedding models convert text into dense vector representations used for semantic search, retrieval-augmented generation (RAG), clustering, and classification. Unlike LLMs, embedding models only have input pricing — there are no output tokens.
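Because embedding APIs bill only input tokens, cost estimation reduces to a single multiplication. A minimal sketch (the function name and corpus size are illustrative; $0.02 is the lowest per-1M rate shown on this page):

```python
# Hypothetical cost estimate: embedding pricing has no output-token term.
def embedding_cost_usd(input_tokens: int, price_per_million: float) -> float:
    """Cost of an embedding request, billed on input tokens only."""
    return input_tokens / 1_000_000 * price_per_million

# Embedding a 50,000-token corpus at $0.02 per 1M tokens:
cost = embedding_cost_usd(50_000, 0.02)
print(f"${cost:.4f}")  # $0.0010
```

The same formula applies to any provider in the table; only `price_per_million` changes.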

  • AWS Bedrock provides managed access to embedding models from Amazon, Cohere, and others
  • Together, DeepInfra, Fireworks offer competitive pricing for open embedding models
  • Azure hosts OpenAI embedding models with global deployment tiers

All prices shown are per 1 million input tokens. Key factors when choosing an embedding model are dimensions (vector size), max input tokens (context window), and price. Use the provider tabs above the table to filter by a specific inference provider.

Frequently Asked Questions

What are embedding models?

Embedding models convert text into numerical vectors that capture semantic meaning. These vectors enable similarity search, clustering, and retrieval-augmented generation (RAG). Unlike LLMs that generate text, embedding models produce fixed-size vectors as output.
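Similarity search over those fixed-size vectors typically compares them by cosine similarity. A minimal illustration with toy 4-dimensional vectors (real models output hundreds or thousands of dimensions, and the document names are invented):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

query = [0.1, 0.3, 0.5, 0.1]
docs = {
    "doc_a": [0.1, 0.3, 0.5, 0.1],  # same direction as the query
    "doc_b": [0.9, 0.1, 0.0, 0.0],  # points elsewhere
}
best = max(docs, key=lambda name: cosine_similarity(query, docs[name]))
print(best)  # doc_a
```

RAG pipelines run this comparison (usually via a vector database) between a query embedding and every stored document embedding.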

What is the cheapest embedding API?

The cheapest embedding API starts at $0.0200 per 1M input tokens. Prices vary by model and provider. AWS Bedrock often offers competitive pricing for embedding models. Use our comparison table to find the best deal.

What do embedding dimensions mean?

Dimensions refer to the size of the output vector (e.g., 1024 means each text input is converted to a vector of 1024 numbers). Higher dimensions can capture more nuance but require more storage and compute for similarity search. Many modern models use 1024 dimensions.
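The storage side of that tradeoff is easy to estimate: a float32 value takes 4 bytes, so raw index size is vectors × dimensions × 4. A back-of-envelope sketch (function name and corpus size are illustrative, and real indexes add overhead):

```python
# Rough storage estimate for a vector index, ignoring index overhead.
def index_size_mb(num_vectors: int, dimensions: int, bytes_per_value: int = 4) -> float:
    """Raw size in MB of float32 embedding vectors."""
    return num_vectors * dimensions * bytes_per_value / 1_000_000

# One million 1024-dimensional float32 embeddings:
print(index_size_mb(1_000_000, 1024))  # 4096.0 MB, i.e. ~4 GB
```

Halving the dimensions halves both this storage and the per-comparison compute, which is why smaller vectors can be attractive at scale.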

What is max input tokens for embeddings?

Max input tokens is the maximum amount of text the model can process in a single embedding request. Models with larger context windows (like 128K tokens) can embed entire documents at once, while smaller windows (512 tokens) require chunking longer texts.
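Chunking for a small context window can be as simple as sliding a fixed-size window over the token sequence. A naive sketch using whitespace-split words as stand-in tokens (real tokenizers count differently, and the overlap value is an illustrative choice to preserve context across chunk boundaries):

```python
# Naive chunking: split a token sequence into windows that fit a model's
# max-input-token limit, with optional overlap between adjacent chunks.
def chunk_tokens(tokens: list[str], max_tokens: int, overlap: int = 0) -> list[list[str]]:
    step = max_tokens - overlap
    return [tokens[i:i + max_tokens] for i in range(0, len(tokens), step)]

words = ("lorem ipsum " * 600).split()          # 1,200 pseudo-tokens
chunks = chunk_tokens(words, max_tokens=512, overlap=64)
print(len(chunks), len(chunks[0]))  # 3 512
```

Each chunk is then embedded separately; a 128K-token window would have taken the whole document in one request.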