CASE STUDY // 003pgvectorgcppythonembeddings

PGVector Platform: Scaling Semantic Search Indexes on GCP.

How we configured PostgreSQL pgvector indexes and embedding pipelines on Google Cloud Platform to support sub-50ms search times across 2 million items.

// THE ARCHITECTURE

Visual system wiring & data flows.

Cloud Run workers scale GPU embedding generation

pgvector HNSW index avoids full table scans

Cloud Run workers scale GPU embedding generation

pgvector HNSW index avoids full table scans

Cloud Run workers scale GPU embedding generation

// DEPLOYMENT LOG

Timeline logs & incidents.

09:30

Index compile

Postgres HNSW index compiled. Search scans optimized across 2.1M vectors.

15:00

Bulk updates

Re-indexing batch of 400k items. CPU consumption rose on database instance.

15:12

Index stabilized

Automatic replication caught up. Semantic search latency held under 42ms.

observer@edge

~ % gcloud run services list

› embedding-gen: active nodes=8 ok

› hnsw index scan duration=8ms

› p99 search=42ms

Never deploy straight to prod on a Friday without monitoring queue backups.

— postmortem checklist

// WHAT ALMOST BROKE

Honest failure points.

database

Slow index builds

IVFFlat indexes degraded search accuracy over time. Shifted to HNSW index builds for better recall.

infrastructure

CPU throttling

Generating embeddings locally on database instances melted CPUs. Delegated work to Cloud Run.

// LESSONS

The system got smarter because it broke in public.

SYSTEM ARCHITECTURE & POSTMORTEM

1. Situation Context Searching through millions of documents for semantic matches is computationally heavy. Standard SQL string searches fail to capture context, and raw embeddings indexing deadlocks server CPUs.

2. Why It Was Interesting GCP SQL databases are easy to run, but embedding updates scale poorly. We needed to design a pipeline that regenerated vectors dynamically on Cloud Run and updated pgvector indexes in Postgres without locking transactions.

3. What We Built We set up an HNSW indexing system on PostgreSQL. We isolated GPU embedding models on Cloud Run, while the database node focused solely on pgvector index lookups. Search latency dropped to 42ms, and index recall remained stable under bulk catalog updates.

HNSW is expensive in memory but critical for search recall rates.

— sam

keep model inference pipeline completely separated from the storage instance.

— lena

// READ NEXT — ANOTHER DEPLOYMENTPharmacyIQ: Building a Rust Edge Router to Sync Legacy Catalogs.

compiling_ecosystem…

The system got smarter because it broke in public.

SYSTEM ARCHITECTURE & POSTMORTEM

1. Situation Context Searching through millions of documents for semantic matches is computationally heavy. Standard SQL string searches fail to capture context, and raw embeddings indexing deadlocks server CPUs.

2. Why It Was Interesting GCP SQL databases are easy to run, but embedding updates scale poorly. We needed to design a pipeline that regenerated vectors dynamically on Cloud Run and updated pgvector indexes in Postgres without locking transactions.

3. What We Built We set up an HNSW indexing system on PostgreSQL. We isolated GPU embedding models on Cloud Run, while the database node focused solely on pgvector index lookups. Search latency dropped to 42ms, and index recall remained stable under bulk catalog updates.