MTEB (Massive Text Embedding Benchmark) evaluates embedding models across 56 datasets and 8 task types: classification, clustering, pair classification, reranking, retrieval, STS (semantic textual similarity), summarization, and bitext mining. Higher MTEB scores indicate better general-purpose embedding quality.

What vector dimensions should I choose?

384 dims: fast, small storage, good for classification and clustering. 768 dims: balanced, the sweet spot for most RAG applications. 1024 dims: better retrieval quality at moderate storage. 3072 dims (text-embedding-3-large at full size): top-tier quality, largest storage; because the model was trained with matryoshka representation learning, its vectors can also be shortened to fewer dimensions. For RAG on consumer hardware, 768–1024 dims is the best tradeoff.
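For matryoshka-trained models, a smaller vector is obtained by keeping the leading dimensions and rescaling to unit length. The helper below is a minimal sketch of that idea, assuming L2 renormalization after truncation; `truncate_and_normalize` is a hypothetical name and the input is a toy vector, not real model output. Hosted APIs may do this server-side (OpenAI's text-embedding-3 models accept a `dimensions` parameter).

```python
import math


def truncate_and_normalize(vec: list[float], k: int) -> list[float]:
    """Keep the first k dimensions of an embedding and rescale to unit length.

    Only sensible for models trained with matryoshka representation learning;
    truncating other models' vectors this way usually hurts quality badly.
    """
    if not 0 < k <= len(vec):
        raise ValueError("k must be between 1 and len(vec)")
    head = vec[:k]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize an all-zero prefix")
    return [x / norm for x in head]


# Toy 3-dim vector standing in for a real 3072-dim embedding (hypothetical values).
full = [3.0, 4.0, 12.0]
print(truncate_and_normalize(full, 2))  # [0.6, 0.8]
```

Renormalizing matters because the truncated prefix is no longer unit length, and dot-product search assumes it is.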

Can I use an open-source embedding model for free?

Yes. BGE-M3, E5-large, all-MiniLM-L6, nomic-embed-text, and mxbai-embed-large are Apache-2.0 or MIT licensed and can run locally via sentence-transformers (Python) or transformers.js (browser/Node). Local inference has no per-token fees, but production throughput usually calls for a GPU; small models such as all-MiniLM-L6 also run well on CPU.

Embedding Models Guide

Compare 15 text embedding models by MTEB score, dimensions, price, and use case. Choose the right model for RAG, search, classification, or clustering.

What are text embeddings?

Embedding models convert text into dense numerical vectors (arrays of floats) that capture semantic meaning. Similar texts end up with similar vectors, enabling similarity search, clustering, and classification without exact keyword matching. They are the foundation of RAG (Retrieval-Augmented Generation) pipelines, semantic search engines, and recommendation systems.
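Similarity between two embeddings is usually measured with cosine similarity: the dot product divided by the product of the vector lengths. A minimal sketch in plain Python, assuming tiny hand-written 3-dimensional vectors in place of real model output (real embeddings have hundreds or thousands of dimensions):

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings" for three texts.
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

print(round(cosine_similarity(cat, kitten), 3))   # 0.984: related meaning
print(round(cosine_similarity(cat, invoice), 3))  # 0.012: unrelated
```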

15 models

Model	Provider	Dims	Max Tokens	Price	MTEB	License	Use Cases	Notes
`voyage-large-2`	Voyage AI	1,536	16,000	$0.120/1M	67.1	Proprietary	RAG, semantic-search, classification	Top MTEB score in this list; Voyage AI is the embeddings provider Anthropic's documentation recommends
`text-embedding-004`	Google	768	2,048	Free	66.3	Proprietary	RAG, semantic-search, classification	Free tier available via Gemini API; strong MTEB performance
`BGE-M3` OSS	BAAI	1,024	8,192	Free	65.1	MIT	RAG, semantic-search, clustering	Multi-functionality: dense, sparse, and ColBERT retrieval. 100+ languages.
`jina-embeddings-v3` OSS	Jina AI	1,024	8,192	$0.018/1M	65.1	CC-BY-NC-4.0	RAG, semantic-search, classification, clustering	Supports task-specific LoRA adapters; multilingual; matryoshka dimensions; self-hosted weights are licensed for non-commercial use only
`mxbai-embed-large-v1` OSS	MixedBread	1,024	512	Free	64.7	Apache-2.0	RAG, semantic-search, classification	Strong open-source baseline; competitive with text-embedding-3-small
`text-embedding-3-large`	OpenAI	3,072	8,191	$0.130/1M	64.6	Proprietary	RAG, semantic-search, classification, clustering	Supports dimension reduction via matryoshka training
`embed-english-v3.0`	Cohere	1,024	512	$0.100/1M	64.5	Proprietary	RAG, semantic-search, classification, clustering	Requires input_type param: search_document / search_query
`embed-multilingual-v3.0`	Cohere	1,024	512	$0.100/1M	64.0	Proprietary	RAG, semantic-search, clustering	Supports 100+ languages; strong multilingual retrieval
`GTE-large` OSS	Alibaba DAMO	1,024	512	Free	63.1	MIT	RAG, semantic-search, classification	Strong English retrieval; works well out of the box
`nomic-embed-text-v1.5` OSS	Nomic	768	8,192	Free	62.4	Apache-2.0	RAG, semantic-search, clustering	Supports matryoshka dimensions (64–768); fully open weights and training data
`text-embedding-3-small`	OpenAI	1,536	8,191	$0.020/1M	62.3	Proprietary	RAG, semantic-search, classification	Cost-efficient replacement for ada-002; supports dimension reduction
`E5-large-v2` OSS	Microsoft	1,024	512	Free	62.1	MIT	RAG, semantic-search	Prefix text with "query: " or "passage: " for best results
`text-embedding-ada-002`	OpenAI	1,536	8,191	$0.100/1M	61.0	Proprietary	RAG, semantic-search	Legacy model; prefer text-embedding-3-small for new projects
`all-MiniLM-L6-v2` OSS	Sentence Transformers	384	256	Free	56.3	Apache-2.0	semantic-search, clustering, classification	Tiny and very fast; great for real-time or on-device/edge use cases
`embedding-001`	Google	768	2,048	Free	n/a	Proprietary	RAG, semantic-search	Older Google embedding model; prefer text-embedding-004

Use Case Guide

Retrieval-Augmented Generation (RAG)

Embed documents into a vector store and retrieve relevant chunks at query time to ground LLM responses in factual context. Requires high retrieval accuracy.
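At query time, retrieval boils down to scoring every stored chunk vector against the query vector and keeping the best k. A brute-force sketch, assuming the embeddings are already computed; the vectors, chunk labels, and the `top_k` helper are hypothetical:

```python
import heapq
import math


def l2_normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def top_k(query_vec: list[float], doc_vecs: list[list[float]], k: int = 3) -> list[tuple[float, int]]:
    """Brute-force retrieval: (score, doc_index) pairs, best first.

    Vectors are normalized first, so the dot product equals cosine similarity.
    """
    q = l2_normalize(query_vec)
    scored = []
    for i, d in enumerate(doc_vecs):
        unit_d = l2_normalize(d)
        scored.append((sum(a * b for a, b in zip(q, unit_d)), i))
    return heapq.nlargest(k, scored)


# Hypothetical chunk embeddings; in practice these come from your embedding model.
chunks = ["refund policy", "shipping times", "office address"]
chunk_vecs = [[0.9, 0.1, 0.0], [0.2, 0.9, 0.1], [0.0, 0.2, 0.9]]
query_vec = [0.8, 0.3, 0.0]  # stand-in for a query like "how do I get my money back?"

for score, idx in top_k(query_vec, chunk_vecs, k=2):
    print(f"{score:.3f}  {chunks[idx]}")
```

A real pipeline would normalize document vectors once at indexing time rather than on every query, and at larger scale would replace the linear scan with an approximate nearest neighbor index.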

Top models

text-embedding-3-large, bge-m3, voyage-large-2

Tips

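› Use asymmetric encoding where the model supports it (separate query and document inputs, such as E5's "query: " and "passage: " prefixes or Cohere's input_type parameter) for better accuracy
› Chunk documents to 256–512 tokens, staying within the model's max token limit; experiment with overlap
› Rank candidate models by their MTEB retrieval scores rather than the overall MTEB average
› Use cosine similarity, or dot product on L2-normalized vectors (the two are then equivalent)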
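The chunking tip can be sketched as a sliding window over tokens. This assumes whitespace-split words as a stand-in for tokens and picks `size=256`, `overlap=32` as illustrative defaults; neither value comes from a specific model, and real pipelines should count tokens with the embedding model's own tokenizer.

```python
def chunk_tokens(tokens: list[str], size: int = 256, overlap: int = 32) -> list[list[str]]:
    """Split tokens into windows of at most `size`, each window repeating
    the last `overlap` tokens of the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks: list[list[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks


# Assumption: whitespace-split words stand in for model tokens.
words = " ".join(f"w{i}" for i in range(10)).split()
for chunk in chunk_tokens(words, size=4, overlap=1):
    print(" ".join(chunk))
# w0 w1 w2 w3
# w3 w4 w5 w6
# w6 w7 w8 w9
```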

Semantic Search

Find semantically similar documents, FAQs, or products regardless of exact keyword overlap. Common in e-commerce, support, and knowledge bases.

Top models

text-embedding-3-small, mxbai-embed-large, bge-m3

Tips

› For user-facing search, balance cost and latency — smaller models often suffice
› Use an approximate nearest neighbor (ANN) index such as HNSW (Hierarchical Navigable Small World) once you reach millions of vectors
› Re-rank top-k results with a cross-encoder for precision
› Re-embed the whole corpus when you switch or upgrade models; vectors from different models are not comparable
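Vector search and cross-encoder re-ranking are usually combined in two stages: a cheap similarity search over everything, then a slower, more precise scorer over a short list. A sketch with hypothetical helpers; `toy_overlap_score` merely counts shared words and stands in for a real cross-encoder (for example, the `CrossEncoder` class in sentence-transformers), and the vectors are made up:

```python
from typing import Callable

Vector = list[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def retrieve_then_rerank(
    query: str,
    query_vec: Vector,
    docs: list[str],
    doc_vecs: list[Vector],
    rerank_score: Callable[[str, str], float],
    k: int = 10,
    final_n: int = 3,
) -> list[str]:
    # Stage 1: cheap vector similarity over the whole corpus (an ANN index at scale).
    candidates = sorted(range(len(docs)), key=lambda i: dot(query_vec, doc_vecs[i]), reverse=True)[:k]
    # Stage 2: slower pairwise scorer on the shortlist only.
    reranked = sorted(candidates, key=lambda i: rerank_score(query, docs[i]), reverse=True)
    return [docs[i] for i in reranked[:final_n]]


def toy_overlap_score(query: str, doc: str) -> float:
    """Hypothetical stand-in for a cross-encoder: number of shared words."""
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


docs = ["reset your password", "password manager review", "shipping rates"]
doc_vecs = [[0.9, 0.1], [0.95, 0.2], [0.1, 0.9]]  # made-up vectors
print(retrieve_then_rerank("how to reset password", [0.85, 0.2], docs, doc_vecs,
                           toy_overlap_score, k=2, final_n=2))
```

Here the vector stage ranks "password manager review" first, and the re-ranker promotes "reset your password", which is the kind of precision fix the tip describes.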

Text Classification

Use embeddings as features for downstream classifiers (sentiment, topic, intent). Often more sample-efficient than fine-tuning the full model.

Top models

text-embedding-3-large, gte-large, embed-english-v3

Tips

› Train a lightweight classifier (logistic regression, MLP) on top of frozen embeddings
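› Higher-dimensional embeddings (1536+) can improve classification accuracy, but compare models on the MTEB classification subset rather than assuming more dimensions is always better
› Few-shot: embed a handful of labeled examples per class and give each new text the majority label of its k nearest neighbors (k-NN); for zero-shot, embed a short description of each class instead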
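The k-NN approach from the last tip can be sketched in a few lines, assuming the labeled examples have already been embedded; the 2-dimensional vectors and labels below are made up for illustration:

```python
import math
from collections import Counter


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def knn_classify(query_vec: list[float], examples: list[tuple[list[float], str]], k: int = 3) -> str:
    """Label a query by majority vote among its k most similar labeled examples."""
    ranked = sorted(examples, key=lambda ex: cosine(query_vec, ex[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical few-shot set: (embedding, label) pairs with made-up vectors.
examples = [
    ([0.9, 0.1], "positive"),
    ([0.8, 0.3], "positive"),
    ([0.7, 0.2], "positive"),
    ([0.1, 0.9], "negative"),
    ([0.2, 0.8], "negative"),
]
print(knn_classify([0.85, 0.25], examples))  # positive
print(knn_classify([0.15, 0.9], examples))   # negative
```

An odd k avoids ties in the two-class case; with more classes, ties fall back to whichever label `Counter` saw first.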

Clustering & Deduplication

Group similar documents, detect near-duplicates, or visualise topic structure. L2-normalize the vectors first so that Euclidean distance (which k-means uses) tracks cosine similarity, then cluster with k-means or HDBSCAN.
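k-means groups points by Euclidean distance, which is why embeddings are typically L2-normalized before clustering. For unit-length vectors a and b (‖a‖ = ‖b‖ = 1), squared Euclidean distance is a decreasing function of cosine similarity:

```latex
\lVert \mathbf{a} - \mathbf{b} \rVert^2
  = \lVert \mathbf{a} \rVert^2 + \lVert \mathbf{b} \rVert^2 - 2\,\mathbf{a}\cdot\mathbf{b}
  = 2 - 2\cos(\mathbf{a}, \mathbf{b})
```

So nearest neighbors by Euclidean distance are the same as nearest neighbors by cosine. Standard k-means centroids are not unit length, which makes it only approximately cosine-based; spherical k-means renormalizes the centroids to close that gap.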

Top models

all-minilm-l6, nomic-embed-text, bge-m3

Tips

› Reduce dimensions with PCA or UMAP before clustering for speed
› all-MiniLM is small and fast enough to make embedding and clustering millions of short texts practical on modest hardware
› HDBSCAN handles variable cluster sizes better than k-means
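› A cosine similarity threshold of about 0.92 or higher is a common starting point for near-duplicates; tune it per model, since similarity score distributions differ between models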
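Near-duplicate detection with a cosine threshold can be sketched as a pairwise comparison. This brute-force version assumes a small collection and uses 0.92 from the tip as its default threshold; the vectors are made up:

```python
import math


def l2_normalize(v: list[float]) -> list[float]:
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def near_duplicate_pairs(vecs: list[list[float]], threshold: float = 0.92) -> list[tuple[int, int, float]]:
    """Return (i, j, cosine) for every pair at or above `threshold`.

    Brute force: O(n^2) comparisons, so only suitable for small collections.
    """
    unit = [l2_normalize(v) for v in vecs]
    pairs = []
    for i in range(len(unit)):
        for j in range(i + 1, len(unit)):
            sim = sum(a * b for a, b in zip(unit[i], unit[j]))
            if sim >= threshold:
                pairs.append((i, j, sim))
    return pairs


# Made-up vectors: items 0 and 1 are nearly identical, item 2 is unrelated.
vecs = [[1.0, 0.0, 0.1], [0.98, 0.05, 0.12], [0.0, 1.0, 0.0]]
for i, j, sim in near_duplicate_pairs(vecs):
    print(i, j, round(sim, 3))
```

Comparing every pair grows quadratically, so for large corpora the usual approach is to query an ANN index for each item's nearest neighbors and apply the threshold only to those.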

Dimension Guide — Quality vs Storage Tradeoff

Size	Quality	Storage / Vector	Best For	Examples
Tiny (384)	Good	1.5 KB (float32)	Edge/device inference, high-volume real-time similarity, clustering millions of short texts	`all-MiniLM-L6-v2`
Small (768)	Very Good	3 KB (float32)	Balanced retrieval: good quality without excessive storage; recommended default for most projects	`nomic-embed-text`, `text-embedding-004`, `embedding-001`
Medium (1024)	Excellent	4 KB (float32)	Production RAG systems, classification with high accuracy requirements, semantic search at scale	`BGE-M3`, `mxbai-embed-large`, `embed-english-v3.0`, `GTE-large`
Large (1536)	Near-best	6 KB (float32)	High-stakes retrieval, premium search applications, fine-grained classification	`text-embedding-3-small`, `text-embedding-ada-002`, `voyage-large-2`
XL (3072)	Best available	12 KB (float32)	Maximum retrieval quality where storage cost is not a constraint; research and benchmarking	`text-embedding-3-large`

Storage shown as float32 (4 bytes per dimension). Use float16 to halve storage with minimal quality loss. int8 quantisation reduces to 25% of float32 size.
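The storage column is plain arithmetic: number of vectors × dimensions × bytes per value. A sketch of that calculation, plus one simple int8 scheme (symmetric scalar quantization with a single scale per vector). The quantization scheme and helper names are assumptions chosen for illustration; vector databases implement their own variants. Figures count raw vector bytes only, not index overhead:

```python
BYTES_PER_VALUE = {"float32": 4, "float16": 2, "int8": 1}


def raw_vector_bytes(num_vectors: int, dims: int, dtype: str = "float32") -> int:
    """Raw embedding storage only; index structures and metadata add more."""
    return num_vectors * dims * BYTES_PER_VALUE[dtype]


def quantize_int8(vec: list[float]) -> tuple[list[int], float]:
    """Symmetric scalar quantization: map the largest |x| to 127."""
    scale = max(abs(x) for x in vec) / 127 or 1.0  # avoid scale 0 for all-zero vectors
    return [round(x / scale) for x in vec], scale


def dequantize_int8(codes: list[int], scale: float) -> list[float]:
    """Approximate reconstruction; error per value is at most scale / 2."""
    return [c * scale for c in codes]


for dtype in ("float32", "float16", "int8"):
    gb = raw_vector_bytes(1_000_000, 1024, dtype) / 1e9
    print(f"1M x 1024 dims, {dtype}: {gb:.3f} GB")
# 4.096 GB, 2.048 GB, 1.024 GB

codes, scale = quantize_int8([0.12, -0.5, 0.33])
print(codes, dequantize_int8(codes, scale))
```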
