Embedding Models Guide
Compare 15 text embedding models by MTEB score, dimensions, price, and use case. Choose the right model for RAG, search, classification, or clustering.
What are text embeddings?
Embedding models convert text into dense numerical vectors (arrays of floats) that capture semantic meaning. Similar texts end up with similar vectors, enabling similarity search, clustering, and classification without exact keyword matching. They are the foundation of RAG (Retrieval-Augmented Generation) pipelines, semantic search engines, and recommendation systems.
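To make "similar texts end up with similar vectors" concrete, here is a minimal sketch of cosine similarity, a common way to compare two embeddings. The three-dimensional vectors are made-up toy values, not the output of any real model, and the function name is only illustrative.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy "embeddings" (assumed values for illustration only).
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

sim_close = cosine_similarity(cat, kitten)    # related texts: close to 1
sim_far = cosine_similarity(cat, invoice)     # unrelated texts: close to 0
```

Real embeddings have hundreds or thousands of dimensions, but the comparison works the same way.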
| Model | Provider | Dims | Max Tokens | Price | MTEB | License | Use Cases | Notes |
|---|---|---|---|---|---|---|---|---|
| voyage-large-2 | Voyage AI | 1,536 | 16,000 | $0.120/1M | 67.1 | Proprietary | RAG, semantic-search, classification | Top MTEB scores; used by Anthropic for Claude retrieval |
| text-embedding-004 | Google | 768 | 2,048 | Free | 66.3 | Proprietary | RAG, semantic-search, classification | Free tier available via Gemini API; strong MTEB performance |
| BGE-M3 (OSS) | BAAI | 1,024 | 8,192 | Free | 65.1 | Apache-2.0 | RAG, semantic-search, clustering | Multi-functionality: dense, sparse, and ColBERT retrieval. 100+ languages. |
| jina-embeddings-v3 (OSS) | Jina AI | 1,024 | 8,192 | $0.018/1M | 65.1 | Apache-2.0 | RAG, semantic-search, classification, clustering | Supports task-specific LoRA adapters; multilingual; matryoshka dimensions |
| mxbai-embed-large-v1 (OSS) | MixedBread | 1,024 | 512 | Free | 64.7 | Apache-2.0 | RAG, semantic-search, classification | Strong open-source baseline; competitive with text-embedding-3-small |
| text-embedding-3-large | OpenAI | 3,072 | 8,191 | $0.130/1M | 64.6 | Proprietary | RAG, semantic-search, classification, clustering | Supports dimension reduction via matryoshka training |
| embed-english-v3.0 | Cohere | 1,024 | 512 | $0.100/1M | 64.5 | Proprietary | RAG, semantic-search, classification, clustering | Requires input_type param: search_document / search_query |
| embed-multilingual-v3.0 | Cohere | 1,024 | 512 | $0.100/1M | 64.0 | Proprietary | RAG, semantic-search, clustering | Supports 100+ languages; strong multilingual retrieval |
| GTE-large (OSS) | Alibaba DAMO | 1,024 | 512 | Free | 63.1 | MIT | RAG, semantic-search, classification | Strong English retrieval; works well out of the box |
| nomic-embed-text-v1.5 (OSS) | Nomic | 768 | 8,192 | Free | 62.4 | Apache-2.0 | RAG, semantic-search, clustering | Supports matryoshka dimensions (64–768); fully open weights and training data |
| text-embedding-3-small | OpenAI | 1,536 | 8,191 | $0.020/1M | 62.3 | Proprietary | RAG, semantic-search, classification | Cost-efficient replacement for ada-002; supports dimension reduction |
| E5-large-v2 (OSS) | Microsoft | 1,024 | 512 | Free | 62.1 | MIT | RAG, semantic-search | Prefix text with "query: " or "passage: " for best results |
| text-embedding-ada-002 | OpenAI | 1,536 | 8,191 | $0.100/1M | 60.5 | Proprietary | RAG, semantic-search | Legacy model; prefer text-embedding-3-small for new projects |
| all-MiniLM-L6-v2 (OSS) | Sentence Transformers | 384 | 256 | Free | 56.3 | Apache-2.0 | semantic-search, clustering, classification | Tiny and very fast; great for real-time or device-edge use cases |
| embedding-001 | Google | 768 | 2,048 | Free | n/a | Proprietary | RAG, semantic-search | Older Google embedding model; prefer text-embedding-004 |
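Several notes in the table mention matryoshka dimensions. For models trained this way, a shorter embedding is typically obtained by keeping the leading components of the full vector and rescaling to unit length. The sketch below illustrates that idea with a toy vector; `truncate_embedding` is a hypothetical helper, and whether truncation is appropriate for a given model should be checked against that model's documentation.

```python
import math

def truncate_embedding(vector, dims):
    """Keep the first `dims` components, then rescale the result to unit length."""
    if not 0 < dims <= len(vector):
        raise ValueError("dims must be between 1 and len(vector)")
    head = vector[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in head]

# Toy 4-dimensional unit vector (assumed values, not real model output).
full = [0.5, 0.5, 0.5, 0.5]
short = truncate_embedding(full, 2)   # 2 dimensions, unit length again
```

Renormalizing matters because the truncated vector is no longer unit length, which would distort dot-product scores.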
Use Case Guide
Retrieval-Augmented Generation (RAG)
Embed documents into a vector store and retrieve relevant chunks at query time to ground LLM responses in factual context. Requires high retrieval accuracy.
Recommended models: text-embedding-3-large, bge-m3, voyage-large-2
- Use asymmetric models (separate query/document prefixes) for better accuracy
- Chunk documents to 256–512 tokens; experiment with overlap
- Rank candidate models by their MTEB retrieval score rather than the overall MTEB average
- Use cosine similarity or dot product; normalize vectors to unit length when using dot product so that both give the same ranking
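The chunking advice above can be sketched as a sliding window with overlap. This is a minimal illustration that treats any list items as tokens; in practice, token counts depend on the embedding model's own tokenizer, and the default `chunk_size` and `overlap` values here are assumptions to tune, not recommendations from a specific library.

```python
def chunk_tokens(tokens, chunk_size=256, overlap=32):
    """Split a token list into windows of `chunk_size` sharing `overlap` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window has reached the end of the document.
        if start + chunk_size >= len(tokens):
            break
    return chunks

# Tiny example: 10 "tokens", windows of 4 with 1 token shared between neighbours.
example_chunks = chunk_tokens(list(range(10)), chunk_size=4, overlap=1)
```

Overlap keeps a sentence that straddles a boundary retrievable from at least one chunk, at the cost of embedding some tokens twice.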
Semantic Search
Find semantically similar documents, FAQs, or products regardless of exact keyword overlap. Common in e-commerce, support, and knowledge bases.
Recommended models: text-embedding-3-small, mxbai-embed-large, bge-m3
- For user-facing search, balance cost and latency; smaller models often suffice
- Index with HNSW (an approximate nearest neighbor index) once you reach millions of vectors
- Re-rank top-k results with a cross-encoder for precision
- Re-embed the whole corpus whenever you switch or upgrade models; vectors produced by different models are not comparable
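The retrieve-then-re-rank pattern from the tips above can be sketched in two stages. Everything here is illustrative: the exhaustive scan stands in for an ANN index such as HNSW, and `word_overlap` is a placeholder scorer, not a cross-encoder. A real system would pass a cross-encoder's scoring function as `rerank_score`.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def retrieve_then_rerank(query_vec, query_text, docs, k, rerank_score):
    """docs is a list of (text, unit_vector) pairs. Returns texts, best first."""
    # Stage 1: cheap vector similarity over the whole corpus.
    candidates = sorted(docs, key=lambda d: dot(query_vec, d[1]), reverse=True)[:k]
    # Stage 2: the slower, more precise scorer only sees the k candidates.
    reranked = sorted(candidates,
                      key=lambda d: rerank_score(query_text, d[0]),
                      reverse=True)
    return [text for text, _ in reranked]

def word_overlap(query, text):
    """Placeholder scorer: number of shared lowercase words."""
    return len(set(query.lower().split()) & set(text.lower().split()))

# Toy corpus with made-up 2-dimensional unit vectors.
docs = [
    ("return policy for shoes", [1.0, 0.0]),
    ("how to return an item", [0.8, 0.6]),
    ("shipping times", [0.0, 1.0]),
]
results = retrieve_then_rerank([0.8, 0.6], "return policy shoes",
                               docs, k=2, rerank_score=word_overlap)
```

In this toy run the vector stage ranks "how to return an item" first, and the second stage promotes "return policy for shoes", which is the kind of correction re-ranking is meant to provide.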
Text Classification
Use embeddings as features for downstream classifiers (sentiment, topic, intent). Often more sample-efficient than fine-tuning the full model.
Recommended models: text-embedding-3-large, gte-large, embed-english-v3
- Train a lightweight classifier (logistic regression, MLP) on top of frozen embeddings
- Higher-dimensional embeddings (1536+) often improve classification accuracy, at the cost of storage and compute
- Few-shot: embed a handful of labeled examples and classify new text with k-NN
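The few-shot k-NN approach can be sketched as follows: embed a few labeled examples, then label a new text by majority vote among its nearest neighbours. The vectors and labels below are toy values assumed for illustration; in practice they would come from an embedding model.

```python
import math
from collections import Counter

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def knn_classify(query_vec, labeled, k=3):
    """labeled is a list of (vector, label). Returns the majority label of the k nearest."""
    ranked = sorted(labeled, key=lambda item: cosine(query_vec, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy labeled "embeddings" (assumed values, not real model output).
labeled = [
    ([0.9, 0.1], "positive"),
    ([0.8, 0.3], "positive"),
    ([0.1, 0.9], "negative"),
    ([0.2, 0.8], "negative"),
]
prediction = knn_classify([0.7, 0.2], labeled, k=3)
```

With more labeled data, a logistic regression trained on the same frozen embeddings is usually the next step.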
Clustering & Deduplication
Group similar documents, detect near-duplicates, or visualise topic structure. Use cosine similarity with k-means or HDBSCAN.
Recommended models: all-minilm-l6, nomic-embed-text, bge-m3
- Reduce dimensions with PCA or UMAP before clustering for speed
- all-MiniLM is small and fast enough to make clustering millions of short texts practical
- HDBSCAN handles variable cluster sizes better than k-means
- A cosine similarity threshold of 0.92 or higher usually catches near-duplicates; tune it on a labeled sample for your model and data
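Threshold-based near-duplicate detection can be sketched as a pairwise comparison. This is a minimal version with toy vectors; the all-pairs loop is quadratic in the number of vectors, so a large corpus would need an ANN index or blocking step instead. The 0.92 default mirrors the rule of thumb above and is a starting point, not a fixed constant.

```python
import math

def unit(v):
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]

def near_duplicate_pairs(vectors, threshold=0.92):
    """Return index pairs (i, j), i < j, whose cosine similarity >= threshold."""
    units = [unit(v) for v in vectors]
    pairs = []
    for i in range(len(units)):
        for j in range(i + 1, len(units)):
            similarity = sum(x * y for x, y in zip(units[i], units[j]))
            if similarity >= threshold:
                pairs.append((i, j))
    return pairs

# Toy vectors (assumed values): the first two point in almost the same direction.
vectors = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]]
duplicates = near_duplicate_pairs(vectors)
```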
Dimension Guide — Quality vs Storage Tradeoff
| Size | Quality | Storage / Vector | Best For | Examples |
|---|---|---|---|---|
| Tiny (384) | Good | 1.5 KB (float32) | Edge/device inference, high-volume real-time similarity, clustering millions of short texts | all-MiniLM-L6-v2 |
| Small (768) | Very Good | 3 KB (float32) | Balanced retrieval: good quality without excessive storage; recommended default for most projects | nomic-embed-text, text-embedding-004, embedding-001 |
| Medium (1024) | Excellent | 4 KB (float32) | Production RAG systems, classification with high accuracy requirements, semantic search at scale | BGE-M3, mxbai-embed-large, embed-english-v3.0, GTE-large |
| Large (1536) | Near-best | 6 KB (float32) | High-stakes retrieval, premium search applications, fine-grained classification | text-embedding-3-small, text-embedding-ada-002, voyage-large-2 |
| XL (3072) | Best available | 12 KB (float32) | Maximum retrieval quality where storage cost is not a constraint; research and benchmarking | text-embedding-3-large |
Storage shown as float32 (4 bytes per dimension). Use float16 to halve storage with minimal quality loss. int8 quantisation reduces to 25% of float32 size.
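The storage figures follow from simple arithmetic: vectors × dimensions × bytes per value. The sketch below reproduces it for raw vector data only; it ignores index overhead (for example, an HNSW graph) and metadata, so real deployments will use more. The function and constant names are illustrative.

```python
# Bytes used by one stored value for each numeric type mentioned above.
BYTES_PER_VALUE = {"float32": 4, "float16": 2, "int8": 1}

def raw_vector_bytes(num_vectors, dims, dtype="float32"):
    """Raw storage for the vectors alone, excluding index and metadata overhead."""
    return num_vectors * dims * BYTES_PER_VALUE[dtype]

one_tiny = raw_vector_bytes(1, 384)                         # 1,536 bytes = 1.5 KB
million_f32 = raw_vector_bytes(1_000_000, 1024, "float32")  # about 4.1 GB
million_int8 = raw_vector_bytes(1_000_000, 1024, "int8")    # a quarter of that
```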
About
This guide compares 15 text embedding models across the metrics that matter for production use: MTEB benchmark score (overall embedding quality across tasks), vector dimensions (storage and search speed impact), maximum input tokens, price per million tokens, and license. Use the filter chips to narrow by license type or use case. The Use Case Guide section recommends the best models for RAG, semantic search, classification, and clustering.
How to use
1. Use filter chips to show Proprietary or Open Source models, or filter by use case.
2. Click column headers to sort by any metric (MTEB score, dimensions, price).
3. Scroll to the Use Case Guide section for task-specific recommendations.
4. Check the Dimension Guide to understand the storage/quality tradeoffs.
- What is MTEB?
- MTEB (Massive Text Embedding Benchmark) evaluates embedding models across 56 datasets and 8 task types: classification, clustering, pair classification, reranking, retrieval, STS (semantic textual similarity), summarization, and bitext mining. Higher MTEB scores indicate better general-purpose embedding quality.
- What vector dimensions should I choose?
- 384 dims: fast, small storage, good for classification/clustering. 768 dims: balanced — the sweet spot for most RAG applications. 1024 dims: better retrieval quality, moderate storage. 3072 dims (text-embedding-3-large with matryoshka): highest quality, largest storage. For RAG on consumer hardware, 768–1024 dims is the best tradeoff.
- Can I use an open-source embedding model for free?
- Yes. BGE-M3, E5-large, all-MiniLM-L6, nomic-embed-text, and mxbai-embed-large are Apache-2.0 or MIT licensed and can run locally via sentence-transformers (Python) or transformers.js (browser/Node). Local inference has no per-token cost, but production throughput usually calls for a GPU, especially with the larger models.