Embedding
Embeddings can be used for:
Similarity search/retrieval: query for similar items, lines, or documents by vector distance, e.g. for RAG and recommender systems
Classification, clustering, and anomaly detection
Embeddings can be computed at the word, sentence, paragraph, or document level, and can extend across modalities to images and audio (e.g. CLIP for images and text). They can also be contextual (BERT, GPT), so "bank" gets a different embedding depending on its surrounding context.
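To make the similarity search use concrete, here is a minimal sketch of ranking documents against a query by cosine similarity. The helper names `cosine_similarity` and `top_k` and the toy 3-dimensional vectors are illustrative assumptions, not output from any real model; real embeddings have hundreds or thousands of dimensions, and a vector database would normally replace the linear scan.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=3):
    """Return (index, score) pairs for the k most similar documents."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy vectors standing in for real embedding output (assumption)
docs = [[1.0, 0.0, 0.0], [0.7, 0.7, 0.0], [0.0, 0.0, 1.0]]
query = [1.0, 0.1, 0.0]
results = top_k(query, docs, k=2)
```

Here `results` holds the indices of the two closest documents along with their scores, best match first.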
Options
Comparing:
MTEB (Massive Text Embedding Benchmark)
Evaluate embedding models on your own use case and language, since no single model is SOTA across all tasks.
Also see
Recommendation 9/21/24:
Other Options
https://replicate.com/collections/embedding-models
Voyage AI
js
import { VoyageAIClient } from "voyageai";

const client = new VoyageAIClient({ apiKey: "YOUR_API_KEY" });

// Embed a batch of strings in a single request
const response = await client.embed({
  input: ["input1", "input2", "input3", "input4"],
  model: "voyage-3-lite",
});
py
import voyageai

# Uses the VOYAGE_API_KEY environment variable automatically.
# Alternatively: vo = voyageai.Client(api_key="<your secret key>")
vo = voyageai.Client()

# Placeholder texts to embed
documents = ["first document", "second document"]

# Embed the documents
documents_embeddings = vo.embed(
    documents, model="voyage-3", input_type="document"
).embeddings
OpenAI
ts
import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

// Returns the embedding vector for a single input string
export async function generateEmbedding(text: string): Promise<number[]> {
  const response = await openai.embeddings.create({
    model: "text-embedding-3-large",
    input: text,
  });
  return response.data[0].embedding;
}
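When embedding many texts, it is usually cheaper to send them in batches than one request per string. The sketch below shows one way to do that; `batched`, `embed_all`, and the default batch size of 64 are hypothetical choices for illustration, and the provider call is passed in as a function so the helper is not tied to any one API.

```python
from typing import Callable, Iterator, List, Sequence

def batched(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]

def embed_all(
    texts: Sequence[str],
    embed_batch: Callable[[List[str]], List[List[float]]],
    batch_size: int = 64,  # arbitrary default (assumption), tune per provider limits
) -> List[List[float]]:
    """Embed every text, one provider call per batch, preserving input order."""
    vectors: List[List[float]] = []
    for chunk in batched(texts, batch_size):
        vectors.extend(embed_batch(list(chunk)))
    return vectors

# Possible wiring with the OpenAI Python client (needs OPENAI_API_KEY):
# from openai import OpenAI
# client = OpenAI()
# def openai_batch(chunk):
#     resp = client.embeddings.create(model="text-embedding-3-large", input=chunk)
#     return [item.embedding for item in resp.data]
# vectors = embed_all(["first text", "second text"], openai_batch)
```

Passing the provider call in as `embed_batch` also makes it easy to swap models when comparing options.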