SYSTEM ARCHITECTURE & POSTMORTEM
1. Situation Context
Searching through millions of documents for semantic matches is computationally heavy. Standard SQL string searches fail to capture context, and raw embeddings indexing deadlocks server CPUs.
2. Why It Was Interesting
GCP SQL databases are easy to run, but embedding updates scale poorly. We needed to design a pipeline that regenerated vectors dynamically on Cloud Run and updated pgvector indexes in Postgres without locking transactions.
3. What We Built
We set up an HNSW indexing system on PostgreSQL. We isolated GPU embedding models on Cloud Run, while the database node focused solely on pgvector index lookups. Search latency dropped to 42ms, and index recall remained stable under bulk catalog updates.