For a long time, embedding-based semantic search felt like Python territory. The tutorials all pointed to LangChain, FAISS, and numpy. But the actual operations — generate an embedding vector, store it in a database, query for nearest neighbors — map directly onto Go’s strengths: clean HTTP client code for the embedding API, pgx for PostgreSQL with pgvector, and fast concurrent query pipelines. I’ve built production semantic search systems entirely in Go and they’re fast, maintainable, and don’t require a Python sidecar.
The Problem
Keyword search fails on semantic similarity. A user searching for “I can’t log in” won’t match a support article titled “Authentication troubleshooting guide” because the words don’t overlap. Embedding-based search finds semantically similar content regardless of exact word match.
// WRONG — keyword search misses semantically related content
func searchDocs(db *sql.DB, query string) ([]Doc, error) {
	// This matches "login" and "log in" but not "authentication" or "sign in"
	rows, err := db.Query(`
		SELECT id, title, content
		FROM docs
		WHERE to_tsvector('english', content) @@ plainto_tsquery('english', $1)
		LIMIT 10
	`, query)
	if err != nil {
		return nil, err
	}
	defer rows.Close()
	var docs []Doc
	for rows.Next() {
		var d Doc
		if err := rows.Scan(&d.ID, &d.Title, &d.Content); err != nil {
			return nil, err
		}
		docs = append(docs, d)
	}
	return docs, rows.Err()
}
The query “password reset not working” won’t match an article about “credential recovery” even though they describe the same thing.
The Idiomatic Way
The pipeline has three phases: (1) embed documents and store vectors on ingest, (2) embed the query at search time, (3) find nearest neighbors by cosine similarity.
Generating embeddings via the OpenAI API:
// embed/openai.go
package embed
import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)
const embeddingModel = "text-embedding-3-small" // 1536 dimensions, fast and cheap
type Client struct {
apiKey string
http *http.Client
}
func NewClient(apiKey string) *Client {
return &Client{
apiKey: apiKey,
http: &http.Client{Timeout: 30 * time.Second},
}
}
type EmbedRequest struct {
Input []string `json:"input"`
Model string `json:"model"`
}
type EmbedResponse struct {
Data []struct {
Embedding []float32 `json:"embedding"`
Index int `json:"index"`
} `json:"data"`
Usage struct {
TotalTokens int `json:"total_tokens"`
} `json:"usage"`
}
func (c *Client) Embed(ctx context.Context, texts []string) ([][]float32, error) {
	body, err := json.Marshal(EmbedRequest{
		Input: texts,
		Model: embeddingModel,
	})
	if err != nil {
		return nil, fmt.Errorf("marshal embed request: %w", err)
	}
req, err := http.NewRequestWithContext(ctx, http.MethodPost,
"https://api.openai.com/v1/embeddings", bytes.NewReader(body))
if err != nil {
return nil, err
}
req.Header.Set("Content-Type", "application/json")
req.Header.Set("Authorization", "Bearer "+c.apiKey)
resp, err := c.http.Do(req)
if err != nil {
return nil, fmt.Errorf("embed request: %w", err)
}
defer resp.Body.Close()
if resp.StatusCode != http.StatusOK {
return nil, fmt.Errorf("embed API error: %d", resp.StatusCode)
}
var result EmbedResponse
if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
return nil, fmt.Errorf("decode embed response: %w", err)
}
// Return embeddings in input order
embeddings := make([][]float32, len(result.Data))
for _, d := range result.Data {
embeddings[d.Index] = d.Embedding
}
return embeddings, nil
}
Storing and searching with PostgreSQL + pgvector:
-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;
-- Documents table with embedding column
CREATE TABLE docs (
id BIGSERIAL PRIMARY KEY,
title TEXT NOT NULL,
content TEXT NOT NULL,
embedding vector(1536), -- matches text-embedding-3-small dimensions
created_at TIMESTAMPTZ DEFAULT NOW()
);
-- IVFFlat index for fast approximate nearest-neighbor search
-- pgvector's guidance: lists = rows/1000 up to ~1M rows, sqrt(rows) beyond that
CREATE INDEX docs_embedding_idx ON docs USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
// store/doc_store.go
package store
import (
	"context"
	"fmt"

	"github.com/jackc/pgx/v5/pgxpool"
	"github.com/pgvector/pgvector-go"
)
type DocStore struct {
pool *pgxpool.Pool
}
func (s *DocStore) Insert(ctx context.Context, title, content string, embedding []float32) (int64, error) {
var id int64
err := s.pool.QueryRow(ctx, `
INSERT INTO docs (title, content, embedding)
VALUES ($1, $2, $3)
RETURNING id
`, title, content, pgvector.NewVector(embedding)).Scan(&id)
return id, err
}
type SearchResult struct {
ID int64
Title string
Content string
Similarity float64
}
func (s *DocStore) Search(ctx context.Context, queryEmbedding []float32, limit int) ([]SearchResult, error) {
rows, err := s.pool.Query(ctx, `
SELECT id, title, content,
1 - (embedding <=> $1) AS similarity
FROM docs
ORDER BY embedding <=> $1 -- cosine distance operator
LIMIT $2
`, pgvector.NewVector(queryEmbedding), limit)
if err != nil {
return nil, fmt.Errorf("vector search: %w", err)
}
defer rows.Close()
var results []SearchResult
for rows.Next() {
var r SearchResult
if err := rows.Scan(&r.ID, &r.Title, &r.Content, &r.Similarity); err != nil {
return nil, err
}
results = append(results, r)
}
return results, rows.Err()
}
Putting it together in a search handler:
// Semantic search handler — embed the query, find nearest documents
func (h *Handler) Search(w http.ResponseWriter, r *http.Request) {
query := r.URL.Query().Get("q")
if query == "" {
http.Error(w, "q required", http.StatusBadRequest)
return
}
// Embed the query using the same model as the documents
embeddings, err := h.embedder.Embed(r.Context(), []string{query})
if err != nil {
http.Error(w, "embed query failed", http.StatusInternalServerError)
return
}
results, err := h.store.Search(r.Context(), embeddings[0], 10)
if err != nil {
http.Error(w, "search failed", http.StatusInternalServerError)
return
}
// Filter by similarity threshold — results below 0.7 are usually not relevant
	var relevant []store.SearchResult
	for _, r := range results {
		if r.Similarity >= 0.70 {
			relevant = append(relevant, r)
		}
	}
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(relevant)
}
In The Wild
I built a semantic search system for a 50,000-article technical knowledge base. The previous keyword search handled about 60% of user queries successfully (the user found what they were looking for). After adding embedding-based search, that rose to 82%.
The performance concern was real: embedding the query adds a round trip to the OpenAI API before every search. We addressed this with a two-level cache: Redis for exact query text matches (many users ask identical questions), and an in-memory LRU for the current process instance. P99 latency including the cache miss path was 180ms — acceptable for a search interface.
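The in-memory level of that cache can be sketched as a map from exact query text to its embedding. This is a simplified illustration, not the production code: the Redis level and LRU eviction are omitted, and the names (`embedCache`, `newEmbedCache`) are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// embedCache caches embeddings by exact query text, so repeated
// identical queries skip the API round trip.
type embedCache struct {
	mu      sync.RWMutex
	entries map[string][]float32
	miss    func(string) ([]float32, error) // cache-miss path, e.g. the OpenAI client
}

func newEmbedCache(miss func(string) ([]float32, error)) *embedCache {
	return &embedCache{entries: make(map[string][]float32), miss: miss}
}

func (c *embedCache) Get(query string) ([]float32, error) {
	c.mu.RLock()
	v, ok := c.entries[query]
	c.mu.RUnlock()
	if ok {
		return v, nil // hit: no API round trip
	}
	v, err := c.miss(query)
	if err != nil {
		return nil, err
	}
	c.mu.Lock()
	c.entries[query] = v
	c.mu.Unlock()
	return v, nil
}

func main() {
	calls := 0
	cache := newEmbedCache(func(q string) ([]float32, error) {
		calls++ // counts real embedding calls
		return []float32{0.5}, nil
	})
	cache.Get("password reset not working")
	cache.Get("password reset not working") // served from cache
	fmt.Println(calls) // 1
}
```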
The Gotchas
Embed with the same model you indexed with. Embeddings from different models live in different vector spaces. If you switch embedding models, you must re-embed all documents — they can’t be mixed. Store the model name alongside each embedding in your database.
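One way to record the model, sketched against the schema above (the column name and default are assumptions, not part of the lesson's schema):

```sql
-- Record which model produced each vector, so a future migration can
-- find rows that still need re-embedding after a model switch.
ALTER TABLE docs
	ADD COLUMN embedding_model TEXT NOT NULL DEFAULT 'text-embedding-3-small';
```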
Batch your embedding calls. Most embedding APIs support batching multiple texts in a single request. Embedding 100 documents one at a time costs 100 API calls and 100x the latency. Batch them — OpenAI allows up to 2048 inputs per request.
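Splitting a large corpus into request-sized batches is a few lines of Go. A minimal sketch, assuming a 2048-input cap per request (the `chunk` helper is hypothetical, not part of the client above):

```go
package main

import "fmt"

// maxBatch is the per-request input cap; OpenAI allows up to 2048 inputs.
const maxBatch = 2048

// chunk splits texts into batches of at most n items, preserving order.
func chunk(texts []string, n int) [][]string {
	var batches [][]string
	for len(texts) > n {
		batches = append(batches, texts[:n])
		texts = texts[n:]
	}
	if len(texts) > 0 {
		batches = append(batches, texts)
	}
	return batches
}

func main() {
	docs := make([]string, 5000)
	batches := chunk(docs, maxBatch)
	fmt.Println(len(batches)) // 3 batches: 2048 + 2048 + 904
}
```

Each batch then goes through a single `Embed` call instead of thousands of one-text requests.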
pgvector’s IVFFlat index needs SET ivfflat.probes for recall tuning. Higher probe count means better recall but slower search. Start with probes = 10 (the default is 1), measure recall against exact search, and tune from there.
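The setting is session-level, so it can be applied per connection or per query:

```sql
-- Check more inverted lists per query: better recall, slower search
SET ivfflat.probes = 10;
```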
Similarity scores are not probabilities. A cosine similarity of 0.72 doesn’t mean “72% relevant.” The right threshold varies by embedding model and domain. Measure it empirically using a labeled evaluation set.
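A small helper makes that measurement concrete: given human-labeled (similarity, relevant) pairs, compute precision at a candidate threshold and sweep from there. The `Judgment` type and `precisionAt` function are hypothetical illustrations, not code from the lesson:

```go
package main

import "fmt"

// Judgment is one labeled example: the similarity a query/doc pair scored,
// and whether a human judged the doc relevant.
type Judgment struct {
	Sim      float64
	Relevant bool
}

// precisionAt returns the fraction of results at or above the threshold
// that were actually relevant.
func precisionAt(threshold float64, labeled []Judgment) float64 {
	var kept, correct int
	for _, j := range labeled {
		if j.Sim >= threshold {
			kept++
			if j.Relevant {
				correct++
			}
		}
	}
	if kept == 0 {
		return 0
	}
	return float64(correct) / float64(kept)
}

func main() {
	labeled := []Judgment{
		{0.91, true}, {0.84, true}, {0.76, false}, {0.73, true}, {0.62, false},
	}
	fmt.Println(precisionAt(0.70, labeled)) // 3 of 4 kept are relevant: 0.75
}
```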
Key Takeaway
Semantic search in Go requires an embedding API client, a vector database (pgvector on PostgreSQL works beautifully), and a search handler that embeds the query and finds nearest neighbors. Use the same embedding model for indexing and querying. Batch your embedding calls. Add a similarity threshold to filter low-confidence results. Cache embeddings of common queries. The whole stack runs in Go without Python, without a separate vector database service, and performs well under production load.
← Lesson 4: Tool Calling Patterns | Course Index | Next → Lesson 6: Agent Architectures