Embedding

Embeddings Can Be Used For:

  • Similarity Search/Retrieval: find the items, lines, or docs nearest to a query by vector distance, e.g. for RAG and recommendation systems

  • Classification/Clustering/Anomaly Detection
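As a rough illustration of the similarity-search use case, here is a minimal sketch that ranks items by cosine similarity. The vectors, the document ids, and the helper names `cosine_similarity` and `top_k` are made up for this example; real embeddings would come from a model and a vector store would normally do the ranking.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, corpus, k=2):
    """Return the k (doc_id, score) pairs most similar to the query."""
    scored = [
        (doc_id, cosine_similarity(query, vec)) for doc_id, vec in corpus.items()
    ]
    # Highest similarity first
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy 3-dimensional vectors, invented for illustration only; real
# embeddings typically have hundreds or thousands of dimensions.
corpus = {
    "doc_cats": [0.9, 0.1, 0.0],
    "doc_dogs": [0.8, 0.3, 0.1],
    "doc_tax": [0.0, 0.2, 0.9],
}
query = [1.0, 0.0, 0.0]

results = top_k(query, corpus)
for doc_id, score in results:
    print(doc_id, round(score, 3))
```

The same distance scores can feed the other use cases: clustering groups vectors that are close together, and anomaly detection flags vectors that are far from everything else.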

Embeddings can be computed at the word, sentence, paragraph, or document level, and can span modalities such as images and audio (e.g. CLIP embeds images and text in one space). They can also be contextual (BERT, GPT), so a word like "bank" gets a different embedding depending on its context.

Options

Comparing:

Voyage AI

js

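import { VoyageAIClient } from "voyageai";

const client = new VoyageAIClient({ apiKey: "YOUR_API_KEY" });

// Embed a batch of inputs in one request
const response = await client.embed({
    input: ["input1", "input2", "input3", "input4"],
    model: "voyage-3-lite",
});

// One embedding vector per input, in the same order
const embeddings = response.data?.map((item) => item.embedding);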

py

import voyageai

vo = voyageai.Client()
# This will automatically use the environment variable VOYAGE_API_KEY.
# Alternatively, you can use vo = voyageai.Client(api_key="<your secret key>")

# Example inputs (placeholders; replace with your own texts)
documents = ["input1", "input2", "input3", "input4"]

# Embed the documents
documents_embeddings = vo.embed(
    documents, model="voyage-3", input_type="document"
).embeddings

OpenAI

ts

import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

export async function generateEmbedding(text: string): Promise<number[]> {
  const response = await openai.embeddings.create({
    model: "text-embedding-3-large",
    input: text,
  });

  return response.data[0].embedding;
}
