CodeLint.Dev Dev Tools

Embedding Models Guide

Compare 15 text embedding models by MTEB score, dimensions, price, and use case. Choose the right model for RAG, search, classification, or clustering.

What are text embeddings?

Embedding models convert text into dense numerical vectors (arrays of floats) that capture semantic meaning. Similar texts end up with similar vectors, enabling similarity search, clustering, and classification without exact keyword matching. They are the foundation of RAG (Retrieval-Augmented Generation) pipelines, semantic search engines, and recommendation systems.
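To make "similar vectors" concrete, here is a minimal pure-Python sketch of cosine similarity, the comparison most embedding pipelines use. The three-dimensional vectors are invented toy values and are not output from any model in this guide. Real embeddings have hundreds to thousands of dimensions.

```python
import math

def dot(a: list[float], b: list[float]) -> float:
    """Sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))

def l2_normalize(v: list[float]) -> list[float]:
    """Scale a (non-zero) vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors, in [-1, 1]."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Hypothetical toy "embeddings" (not produced by any real model)
cat = [0.9, 0.1, 0.2]
kitten = [0.85, 0.15, 0.25]
invoice = [0.1, 0.9, 0.4]

print(round(cosine_similarity(cat, kitten), 3))   # semantically close: near 1
print(round(cosine_similarity(cat, invoice), 3))  # unrelated: much lower
```

After l2_normalize, the dot product of two vectors equals their cosine similarity. For this reason, vector stores often normalize once at index time and then use the cheaper dot product.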

Model comparison (15 models)

| Model | Provider | Open weights | Dims | Max tokens | Price (per 1M tokens) | MTEB avg | License | Use cases | Notes |
|---|---|---|---|---|---|---|---|---|---|
| voyage-large-2 | Voyage AI | no | 1,536 | 16,000 | $0.120 | 67.1 | Proprietary | RAG, semantic search, classification | Highest MTEB score in this list; Voyage AI is the embeddings provider suggested in Anthropic's documentation |
| text-embedding-004 | Google | no | 768 | 2,048 | Free tier | 66.3 | Proprietary | RAG, semantic search, classification | Free tier available via the Gemini API; strong MTEB performance |
| BGE-M3 | BAAI | yes | 1,024 | 8,192 | Free (self-hosted) | 65.1 | MIT | RAG, semantic search, clustering | Multi-functionality: dense, sparse, and ColBERT-style multi-vector retrieval; 100+ languages |
| jina-embeddings-v3 | Jina AI | yes | 1,024 | 8,192 | $0.018 | 65.1 | CC BY-NC 4.0 | RAG, semantic search, classification, clustering | Task-specific LoRA adapters; multilingual; matryoshka dimensions |
| mxbai-embed-large-v1 | MixedBread | yes | 1,024 | 512 | Free (self-hosted) | 64.7 | Apache-2.0 | RAG, semantic search, classification | Strong open-source baseline; competitive with text-embedding-3-small |
| text-embedding-3-large | OpenAI | no | 3,072 | 8,191 | $0.130 | 64.6 | Proprietary | RAG, semantic search, classification, clustering | Supports dimension reduction via matryoshka training (dimensions API parameter) |
| embed-english-v3.0 | Cohere | no | 1,024 | 512 | $0.100 | 64.5 | Proprietary | RAG, semantic search, classification, clustering | Requires the input_type parameter: search_document for corpus text, search_query for queries |
| embed-multilingual-v3.0 | Cohere | no | 1,024 | 512 | $0.100 | 64.0 | Proprietary | RAG, semantic search, clustering | Supports 100+ languages; strong multilingual retrieval |
| GTE-large | Alibaba DAMO | yes | 1,024 | 512 | Free (self-hosted) | 63.1 | MIT | RAG, semantic search, classification | Strong English retrieval; works well out of the box |
| nomic-embed-text-v1.5 | Nomic | yes | 768 | 8,192 | Free (self-hosted) | 62.4 | Apache-2.0 | RAG, semantic search, clustering | Matryoshka dimensions (64–768); fully open weights and training data |
| text-embedding-3-small | OpenAI | no | 1,536 | 8,191 | $0.020 | 62.3 | Proprietary | RAG, semantic search, classification | Cost-efficient replacement for ada-002; supports dimension reduction |
| E5-large-v2 | Microsoft | yes | 1,024 | 512 | Free (self-hosted) | 62.1 | MIT | RAG, semantic search | Prefix inputs with "query: " or "passage: " for best results |
| text-embedding-ada-002 | OpenAI | no | 1,536 | 8,191 | $0.100 | 61.0 | Proprietary | RAG, semantic search | Legacy model; prefer text-embedding-3-small for new projects |
| all-MiniLM-L6-v2 | Sentence Transformers | yes | 384 | 256 | Free (self-hosted) | 56.3 | Apache-2.0 | semantic search, clustering, classification | Tiny and very fast; well suited to real-time and on-device/edge use |
| embedding-001 | Google | no | 768 | 2,048 | Free tier | n/a | Proprietary | RAG, semantic search | Older Google embedding model; prefer text-embedding-004 |
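Several rows above mention matryoshka-style dimension reduction: text-embedding-3-large, text-embedding-3-small, jina-embeddings-v3, and nomic-embed-text-v1.5. For such models, you can usually get a shorter embedding by keeping the leading k dimensions and re-normalizing. OpenAI's API also does this server-side through its dimensions request parameter. The sketch below assumes the model was trained this way. The exact recipe varies by model (some model cards add a normalization step before truncating), so check the card. The 6-dimensional vector is hypothetical.

```python
import math

def truncate_embedding(v: list[float], k: int) -> list[float]:
    """Keep the first k dimensions and L2-normalize the result.

    Only meaningful for models trained with matryoshka representation
    learning, where the leading dimensions carry the most information.
    """
    if not 0 < k <= len(v):
        raise ValueError(f"k must be in 1..{len(v)}, got {k}")
    head = v[:k]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

full = [0.6, 0.0, 0.8, 0.0, 0.1, 0.1]  # hypothetical 6-dim embedding
print(truncate_embedding(full, 2))      # [1.0, 0.0]
```

Re-normalizing matters because truncation shortens the vector. Without it, dot-product scores from truncated vectors would no longer equal cosine similarity.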

Use Case Guide

Retrieval-Augmented Generation (RAG)

Embed documents into a vector store and retrieve relevant chunks at query time to ground LLM responses in factual context. Requires high retrieval accuracy.
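The retrieval step can be sketched as a brute-force top-k search over an in-memory list, standing in for a real vector store. The embed function here is a hypothetical toy: an L2-normalized bag-of-words vector. It is used only so the example runs without a model or API key, and it captures shared words, not meaning.

```python
import heapq
import math
import re

def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def embed(text: str, vocab: list[str]) -> list[float]:
    """HYPOTHETICAL stand-in for an embedding model: normalized word counts
    over a fixed vocabulary. A real model would also match paraphrases."""
    words = tokenize(text)
    v = [float(words.count(w)) for w in vocab]
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # all-zero vector stays zero
    return [x / norm for x in v]

def top_k(query_vec: list[float], doc_vecs: list[list[float]], k: int) -> list[tuple[int, float]]:
    """Score every document and keep the k best (index, score) pairs.
    Vectors are unit length, so the dot product equals cosine similarity."""
    scored = ((i, sum(q * d for q, d in zip(query_vec, dv))) for i, dv in enumerate(doc_vecs))
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])

docs = [
    "Reset your password from the account settings page.",
    "Invoices are emailed on the first day of each month.",
    "Two-factor authentication protects your account.",
]
vocab = sorted({w for d in docs for w in tokenize(d)})
doc_vecs = [embed(d, vocab) for d in docs]               # index time: embed the corpus once
query_vec = embed("How do I reset my password?", vocab)  # query time: embed the question
for i, score in top_k(query_vec, doc_vecs, k=2):
    print(f"{score:.3f}  {docs[i]}")
```

The retrieved chunks are then placed in the LLM prompt as context. At scale, the linear scan in top_k is what an ANN index replaces.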

Top models
text-embedding-3-large, BGE-M3, voyage-large-2
Tips
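  • Use asymmetric models, which encode queries and documents differently (for example E5's "query: " / "passage: " prefixes or Cohere's input_type parameter), for better accuracy
  • Chunk documents to 256–512 tokens; experiment with overlap
  • Prefer models with high MTEB retrieval-task scores over a high overall MTEB average
  • Use cosine similarity or dot product; if you use dot product, L2-normalize vectors first (on unit vectors the two give identical scores)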
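
The chunking tip can be sketched as a sliding window over tokens. As a simplifying assumption, the example splits on whitespace. A real pipeline should count tokens with the embedding model's own tokenizer, because the 256–512 budget is measured in model tokens, not words. The function and parameter names (chunk_tokens, chunk_size, overlap) are illustrative.

```python
def chunk_tokens(tokens: list[str], chunk_size: int = 256, overlap: int = 32) -> list[list[str]]:
    """Split tokens into windows of at most chunk_size tokens; each window
    repeats the last `overlap` tokens of the previous one."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks

words = " ".join(f"w{i}" for i in range(10)).split()
for chunk in chunk_tokens(words, chunk_size=4, overlap=1):
    print(chunk)
```

Overlap keeps a sentence that straddles a boundary retrievable from at least one chunk. The cost is that each overlapping token is embedded and stored twice.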

Semantic Search

Find semantically similar documents, FAQs, or products regardless of exact keyword overlap. Common in e-commerce, support, and knowledge bases.

Top models
text-embedding-3-small, mxbai-embed-large-v1, BGE-M3
Tips
  • For user-facing search, balance quality against cost and latency; smaller models often suffice
  • Use an approximate nearest neighbor (ANN) index such as HNSW once you reach millions of vectors
  • Re-rank top-k results with a cross-encoder for precision
  • Re-embed the entire corpus whenever you switch or upgrade models; vectors from different models are not comparable
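
The re-ranking tip describes a two-stage pattern: a cheap vector search for recall, then a more expensive scorer over the survivors for precision. The sketch below shows only that control flow. Both scorers are hypothetical placeholders: word_overlap stands in for vector similarity, and phrase_aware stands in for a cross-encoder that reads the query and document together.

```python
from typing import Callable

Scorer = Callable[[str, str], float]

def retrieve_then_rerank(query: str, docs: list[str], vector_score: Scorer,
                         rerank_score: Scorer, k_retrieve: int = 50, k_final: int = 5) -> list[str]:
    # Stage 1 (recall): cheap score for every document; keep k_retrieve candidates.
    candidates = sorted(docs, key=lambda d: vector_score(query, d), reverse=True)[:k_retrieve]
    # Stage 2 (precision): expensive pairwise score, run only on the candidates.
    reranked = sorted(candidates, key=lambda d: rerank_score(query, d), reverse=True)
    return reranked[:k_final]

# HYPOTHETICAL toy scorers so the sketch runs without any model.
def word_overlap(query: str, doc: str) -> float:
    """Stand-in for vector similarity: count of shared lowercase words."""
    return float(len(set(query.lower().split()) & set(doc.lower().split())))

def phrase_aware(query: str, doc: str) -> float:
    """Stand-in for a cross-encoder: overlap plus a bonus for a verbatim match."""
    return word_overlap(query, doc) + (1.0 if query.lower() in doc.lower() else 0.0)

docs = [
    "password reset link expired",
    "reset password",
    "how to reset password from settings",
    "billing and invoices",
]
print(retrieve_then_rerank("reset password", docs, word_overlap, phrase_aware, k_retrieve=3, k_final=2))
```

Keeping k_retrieve small bounds how many expensive cross-encoder calls each query makes.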

Text Classification

Use embeddings as features for downstream classifiers (sentiment, topic, intent). Often more sample-efficient than fine-tuning the full model.

Top models
text-embedding-3-large, GTE-large, embed-english-v3.0
Tips
  • Train a lightweight classifier (logistic regression, MLP) on top of frozen embeddings
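  • Higher-dimensional embeddings (1,536+) often improve classification accuracy, at the cost of more storage and compute
  • Few-shot: embed a handful of labeled examples per class and classify new texts by k-NN; zero-shot: embed the label names or descriptions and pick the nearest label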
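
The k-NN approach can be sketched as follows. Compare a new embedding against labeled example embeddings and take a majority vote among the k most similar. The 2-D vectors and the labels "billing" and "login" are hypothetical. In practice both the examples and the query must be embedded with the same model.

```python
import math
from collections import Counter

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def knn_classify(query: list[float], examples: list[tuple[list[float], str]], k: int = 3) -> str:
    """Majority label among the k examples most cosine-similar to query."""
    nearest = sorted(examples, key=lambda ex: cosine(query, ex[0]), reverse=True)[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical labeled example embeddings
examples = [
    ([0.9, 0.1], "billing"),
    ([0.8, 0.3], "billing"),
    ([0.2, 0.9], "login"),
    ([0.1, 0.8], "login"),
    ([0.3, 0.95], "login"),
]
print(knn_classify([0.85, 0.2], examples, k=3))  # billing
```

For the zero-shot variant, give each class a single example: the embedding of its label description. Then call the function with k=1.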

Clustering & Deduplication

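Group similar documents, detect near-duplicates, or visualise topic structure. Use cosine similarity with k-means or HDBSCAN. L2-normalize vectors first: between unit vectors, squared Euclidean distance equals 2 − 2 × cosine similarity, so distance-based algorithms rank neighbours the same way cosine similarity does.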
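One way to run k-means directly on cosine similarity is spherical k-means, a variant in which centroids are re-normalized to unit length after every update. The sketch below uses that variant. For reproducibility, it assumes seeding with the first k points; random or k-means++ seeding is more usual. The 2-D vectors are hypothetical.

```python
import math

def normalize(v: list[float]) -> list[float]:
    n = math.sqrt(sum(x * x for x in v)) or 1.0  # leave an all-zero vector as zeros
    return [x / n for x in v]

def spherical_kmeans(vectors: list[list[float]], k: int, iters: int = 20) -> list[int]:
    """Assign each unit vector to the centroid with the largest dot product
    (cosine similarity), then reset each centroid to its members' normalized mean."""
    points = [normalize(v) for v in vectors]
    centroids = [p[:] for p in points[:k]]  # deterministic seeding (assumption)
    labels: list[int] = []
    for _ in range(iters):
        new_labels = [
            max(range(k), key=lambda c: sum(a * b for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties out
                centroids[c] = normalize([sum(col) / len(members) for col in zip(*members)])
    return labels

vectors = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]  # hypothetical embeddings
print(spherical_kmeans(vectors, k=2))  # [0, 0, 1, 1]
```

Because every point is normalized first, only direction matters. Scaling an input vector does not change its cluster.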

Top models
all-MiniLM-L6-v2, nomic-embed-text-v1.5, BGE-M3
Tips
  • Reduce dimensions with PCA or UMAP before clustering for speed
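  • all-MiniLM-L6-v2 is small and fast, making it practical to embed and cluster millions of short texts
  • HDBSCAN handles variable cluster sizes better than k-means
  • A cosine similarity threshold of about 0.92 is a common starting point for near-duplicate detection; tune it per model on known duplicate pairs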
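
Threshold-based near-duplicate detection can be sketched as an all-pairs comparison. The 0.92 default comes from the tip above and is only a starting point. The vectors are hypothetical. This brute-force version is O(n²), which is fine for thousands of items; beyond that, candidate pairs would come from an ANN index instead.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def find_near_duplicates(vectors: list[list[float]], threshold: float = 0.92) -> list[tuple[int, int]]:
    """Return index pairs (i, j), i < j, whose cosine similarity >= threshold."""
    pairs = []
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                pairs.append((i, j))
    return pairs

vecs = [[1.0, 0.0, 0.1], [0.98, 0.05, 0.12], [0.0, 1.0, 0.0]]  # hypothetical embeddings
print(find_near_duplicates(vecs))  # [(0, 1)]
```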

Dimension Guide — Quality vs Storage Tradeoff

| Size (dimensions) | Quality | Storage per vector (float32) | Best for | Examples |
|---|---|---|---|---|
| Tiny (384) | Good | 1.5 KB | Edge/on-device inference, high-volume real-time similarity, clustering millions of short texts | all-MiniLM-L6-v2 |
| Small (768) | Very good | 3 KB | Balanced retrieval: good quality without excessive storage; recommended default for most projects | nomic-embed-text-v1.5, text-embedding-004, embedding-001 |
| Medium (1,024) | Excellent | 4 KB | Production RAG systems, classification with high accuracy requirements, semantic search at scale | BGE-M3, mxbai-embed-large-v1, embed-english-v3.0, GTE-large |
| Large (1,536) | Near-best | 6 KB | High-stakes retrieval, premium search applications, fine-grained classification | text-embedding-3-small, text-embedding-ada-002, voyage-large-2 |
| XL (3,072) | Best available | 12 KB | Maximum retrieval quality where storage cost is not a constraint; research and benchmarking | text-embedding-3-large |

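Storage shown as float32 (4 bytes per dimension). Use float16 to halve storage with minimal quality loss; int8 quantisation reduces it to 25% of the float32 size. Quality tiers are rough. In the model table above, text-embedding-004 (768 dims) scores higher on MTEB than text-embedding-ada-002 (1,536 dims), so compare benchmark scores rather than dimension counts alone.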
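The storage arithmetic and the int8 option can be sketched as below. The quantization scheme is an assumption: symmetric scalar quantization with one scale per vector, mapping values onto [-127, 127]. Vector databases may use other schemes, such as per-dimension calibration. The raw-size figure also ignores index overhead (for example, HNSW graph links) and metadata.

```python
BYTES_PER_DIM = {"float32": 4, "float16": 2, "int8": 1}

def index_size_bytes(n_vectors: int, dims: int, dtype: str = "float32") -> int:
    """Raw vector storage only (no index structure, metadata, or int8 scales)."""
    return n_vectors * dims * BYTES_PER_DIM[dtype]

def quantize_int8(v: list[float]) -> tuple[list[int], float]:
    """Symmetric scalar quantization: map [-max|x|, max|x|] onto [-127, 127].
    Returns the integer codes and the float scale needed to dequantize."""
    scale = max(abs(x) for x in v) / 127 or 1.0  # avoid a zero scale for all-zero input
    return [round(x / scale) for x in v], scale

def dequantize_int8(codes: list[int], scale: float) -> list[float]:
    """Approximate reconstruction; each value is off by at most scale / 2."""
    return [c * scale for c in codes]

# Example: 1 million vectors at 1,536 dims (e.g. text-embedding-3-small)
for dtype in ("float32", "float16", "int8"):
    gib = index_size_bytes(1_000_000, 1536, dtype) / 2**30
    print(f"{dtype:8s} {gib:.2f} GiB")
```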