What We Set Out to Do

The Library of Babel — a personal semantic search engine over ~5,000 books and 2.4M text passages — was running on local Ollama embeddings (nomic-embed-text-v2-moe). The plan was straightforward: migrate to Gemini’s embedding API for better quality and speed.

What actually happened was three arcs that each built on the last.

Arc 1: The Gemini Nuclear Re-embed

Wrote a batch re-embed script targeting gemini-embedding-001 with 768-dimensional output vectors: 100 texts per batch call, rate-limit handling with exponential backoff, budget caps, and checkpoint resumption.
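Below is a minimal sketch of the batching, backoff, budget-cap, and checkpoint logic described above. It assumes a hypothetical `embed_batch(texts)` callable that wraps the real Gemini client and raises a hypothetical `RateLimitError` on HTTP 429. The checkpoint format (a JSON file holding `offset` and `spent`) and the per-chunk cost estimate are also assumptions, not the script's actual internals.

```python
import json
import os
import random
import time
from typing import Callable, Dict, List, Sequence

Vector = List[float]


class RateLimitError(Exception):
    """Hypothetical stand-in for the client's 'too many requests' (HTTP 429) error."""


def embed_with_backoff(
    embed_batch: Callable[[List[str]], List[Vector]],
    texts: List[str],
    max_retries: int = 6,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Vector]:
    """Call embed_batch, retrying rate-limit errors with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return embed_batch(texts)
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error to the caller
            # Wait base_delay * 1, 2, 4, ... seconds, plus up to 1s of random jitter
            sleep(base_delay * 2 ** attempt + random.random())
    raise ValueError("max_retries must be at least 1")


def reembed(
    chunks: Sequence[Dict],
    embed_batch: Callable[[List[str]], List[Vector]],
    store: Callable[[Sequence[Dict], List[Vector]], None],
    checkpoint_path: str,
    cost_per_chunk: float,  # hypothetical estimate, not a published price
    budget_usd: float,
    batch_size: int = 100,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Embed chunks in batches, resuming from and updating a JSON checkpoint."""
    state = {"offset": 0, "spent": 0.0}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)  # resume where the previous run stopped
    while state["offset"] < len(chunks):
        batch = chunks[state["offset"]: state["offset"] + batch_size]
        cost = len(batch) * cost_per_chunk
        if state["spent"] + cost > budget_usd:
            break  # budget cap: stop rather than overspend
        vectors = embed_with_backoff(embed_batch, [c["text"] for c in batch], sleep=sleep)
        store(batch, vectors)
        state = {"offset": state["offset"] + len(batch), "spent": state["spent"] + cost}
        # Write the checkpoint only after the batch is safely stored
        with open(checkpoint_path, "w") as f:
            json.dump(state, f)
    return state
```

Because the checkpoint is written only after a batch is stored, a killed process re-embeds at most one batch when it restarts.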

First attempt hit free-tier rate limits hard. Upgraded to the paid tier ($25 prepaid), tuned the throttle to 0.3s between calls, and throughput stabilized at ~200 requests per minute (RPM). 362K+ chunks embedded so far at ~$3.50 actual cost. The script runs in a screen session: learned the hard way that nohup processes get killed by Claude Code signal interference.
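A 0.3s minimum gap between calls puts a ceiling of 60 / 0.3 = 200 calls per minute on throughput, consistent with the ~200 RPM plateau. Here is a sketch of such a throttle with an injectable clock so it can be tested. The `Throttle` class is hypothetical, not the script's actual code.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Enforce a minimum interval between the starts of successive API calls."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> None:
        """Block until min_interval has elapsed since the previous wait() returned."""
        now = self.clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self._last = now

    @property
    def max_calls_per_minute(self) -> float:
        """Upper bound on call rate implied by the interval."""
        return 60.0 / self.min_interval
```

Calling `throttle.wait()` immediately before each batch request keeps the script at or under that ceiling; at 100 texts per call, that is at most 20,000 texts per minute.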

Updated 9+ files across the codebase for the migration. Built multi-model fallback: Gemini embeddings are queried first, with nomic-v2-moe backfilling sparse results. Each model’s vectors are queried separately (a query vector can only be compared against stored vectors from the same model; cosine similarity across embedding spaces is meaningless), then the results are merged and deduplicated.
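One way to implement the query-both-then-merge idea is sketched below. Each model embeds the query itself and is searched only against its own stored vectors. Because raw scores from different models are not comparable, this merge is priority-based (Gemini results first, nomic results only fill remaining slots) rather than a sort on raw score, and each model gets its own threshold. The in-memory dict indexes and the `embed_query` callables are hypothetical stand-ins for the real database queries.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
# (embed_query, index, min_score) for one model; index maps passage_id -> vector
ModelSpace = Tuple[Callable[[str], Vector], Dict[str, Vector], float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; only meaningful when a and b come from the same model."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def multi_model_search(query: str, models: List[ModelSpace], k: int) -> List[Tuple[str, float]]:
    """Search each model's space separately, primary first; later models only backfill."""
    results: List[Tuple[str, float]] = []
    seen = set()
    for embed_query, index, min_score in models:
        if len(results) >= k:
            break  # higher-priority models already filled every slot
        qvec = embed_query(query)  # embed the query in *this* model's space
        ranked = sorted(
            ((pid, cosine(qvec, vec)) for pid, vec in index.items()),
            key=lambda hit: hit[1],
            reverse=True,
        )
        for pid, score in ranked:
            if score < min_score:
                break  # sorted descending, so nothing later clears the threshold
            if pid in seen:
                continue  # already returned by a higher-priority model
            seen.add(pid)
            results.append((pid, score))
            if len(results) >= k:
                break
    return results
```

Deduplicating on passage ID means a passage that exists in both spaces mid-migration is returned only once, from the higher-priority model.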

Arc 2: Semantic Search Revolution

With Gemini embeddings in place, the search quality jump was immediately obvious. But the real breakthrough came from removing YAKE keyword extraction.

The old pipeline: passage -> YAKE keywords -> embed keywords -> store vector. The new pipeline: passage -> embed full text -> store vector.

Simpler, and dramatically better. A query about “religious prohibition against thinking machines” — with zero Dune-specific words — returns Dune books about the Butlerian Jihad. YAKE was stripping the contextual signal that made this possible.

Other search improvements shipped:

Fixed reading_link page estimation (was always page 1, now calculates from cumulative word counts)
Bumped query max_length from 500 to 4000 chars for full passage search
Added book deduplication and a minimum similarity threshold (0.45)
Integrated with life-dashboard: journal entries get “Find in Library” buttons, audiomarks cross-referenced
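A sketch of the page estimation and per-book deduplication from the list above, under stated assumptions: `words_per_page` is a hypothetical constant (the post doesn't say how words map to pages), and search hits arrive as hypothetical `(book_id, passage_id, score)` tuples.

```python
from typing import Dict, List, Sequence, Tuple

Hit = Tuple[str, str, float]  # (book_id, passage_id, score): hypothetical shape


def estimate_page(chunk_word_counts: Sequence[int], chunk_index: int, words_per_page: int) -> int:
    """1-based page on which chunk `chunk_index` starts, from the words in all earlier chunks."""
    words_before = sum(chunk_word_counts[:chunk_index])
    return words_before // words_per_page + 1


def dedupe_by_book(hits: Sequence[Hit], min_similarity: float = 0.45) -> List[Hit]:
    """Keep the best-scoring passage per book, drop hits below the threshold, best first."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        book_id, _, score = hit
        if score < min_similarity:
            continue
        if book_id not in best or score > best[book_id][2]:
            best[book_id] = hit
    return sorted(best.values(), key=lambda h: h[2], reverse=True)
```

For example, with 300-word chunks and an assumed 250 words per page, the chunk at index 2 starts after 600 words, so `estimate_page` returns page 3 rather than the old constant page 1.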

Arc 3: The Research Page

With semantic search working well, consolidated the scattered discovery tools into a single /research page with five tabs:

Search: Paste up to 600 words and run a semantic search across 4,945 books
Mind Map: Pulls recent journal/audiomark/highlight events, cross-references each against the library
Theme Explorer: Clusters highlights by semantic similarity using pgvector, finds books per cluster
Reading Radar: Recommends next books based on recent activity, cross-references Plex library
Highlights: Browse and bookmark discovered passages
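The post doesn't describe the algorithm behind Theme Explorer. As an illustration only, here is a simple greedy (leader) clustering in plain Python that swaps in for whatever the pgvector-backed version does: each highlight joins the most similar existing cluster if that similarity clears a threshold, otherwise it starts a new cluster. The function name, threshold, and in-memory input are hypothetical. In pgvector itself, the `<=>` operator returns cosine distance (1 minus cosine similarity), so the same test would be written as `1 - (a <=> b) >= threshold`.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def greedy_clusters(highlights: Dict[str, Sequence[float]], threshold: float) -> List[dict]:
    """Leader clustering: join the most similar cluster if similarity >= threshold, else start one."""
    clusters: List[dict] = []
    for hid, vec in highlights.items():
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append({"members": [hid], "centroid": list(vec)})
        else:
            # Update the centroid as the running mean of all member vectors
            n = len(best["members"])
            best["centroid"] = [(c * n + v) / (n + 1) for c, v in zip(best["centroid"], vec)]
            best["members"].append(hid)
    return clusters
```

Each cluster's centroid can then serve as a query vector against the library to find books per theme. In this sketch the threshold directly controls how many clusters come out, which is one of the knobs that needs tuning.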

The bookmark system saves passages as events in the DB. Navigation is now consolidated across 7 HTML pages, and old URLs redirect.
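A minimal sketch of storing bookmarks as rows in a generic events table, using Python's standard-library `sqlite3` module as a self-contained stand-in for the real database. The table layout, the `bookmark` event type, and the JSON payload fields are all assumptions.

```python
import json
import sqlite3
from datetime import datetime, timezone
from typing import List, Optional


def init_db(conn: sqlite3.Connection) -> None:
    # Hypothetical generic events table: one row per event, details in a JSON payload
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS events (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            event_type TEXT NOT NULL,
            created_at TEXT NOT NULL,
            payload TEXT NOT NULL
        )
        """
    )


def save_bookmark(conn: sqlite3.Connection, book_id: str, passage_id: str,
                  text: str, query: Optional[str] = None) -> int:
    """Insert a 'bookmark' event and return its row id."""
    payload = {"book_id": book_id, "passage_id": passage_id, "text": text, "query": query}
    cur = conn.execute(
        "INSERT INTO events (event_type, created_at, payload) VALUES (?, ?, ?)",
        ("bookmark", datetime.now(timezone.utc).isoformat(), json.dumps(payload)),
    )
    conn.commit()
    return cur.lastrowid


def list_bookmarks(conn: sqlite3.Connection) -> List[dict]:
    """Return bookmark events, newest first, with the payload fields flattened in."""
    rows = conn.execute(
        "SELECT id, created_at, payload FROM events "
        "WHERE event_type = 'bookmark' ORDER BY id DESC"
    ).fetchall()
    return [{"id": r[0], "created_at": r[1], **json.loads(r[2])} for r in rows]
```

Storing bookmarks as ordinary events means they can flow into the same timelines as journal entries and audiomarks without a separate table.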

Key Decisions

Gemini over local inference: $3.50 for 362K chunks vs. hours of M2 Pro compute. The cost argument for local embeddings is gone.
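The cost claim can be extrapolated with simple arithmetic, assuming cost scales linearly with chunk count (similar chunk lengths, unchanged pricing): $3.50 for 362K chunks is about $0.0000097 per chunk, which projects to roughly $23 for all 2.4M chunks, within the $25 prepaid. Since "362K+" is a lower bound on chunks done, the projection errs slightly high. The helper name is hypothetical.

```python
def project_embedding_cost(spent_usd: float, chunks_done: int, chunks_total: int) -> dict:
    """Linear extrapolation: assumes every remaining chunk costs the average so far."""
    per_chunk = spent_usd / chunks_done
    total = per_chunk * chunks_total
    return {"per_chunk": per_chunk, "total": total, "remaining": total - spent_usd}


est = project_embedding_cost(3.50, 362_000, 2_400_000)
print(f"per chunk ${est['per_chunk']:.7f}, total ${est['total']:.2f}, remaining ${est['remaining']:.2f}")
# per chunk $0.0000097, total $23.20, remaining $19.70
```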

Kill YAKE: The hardest decision because it meant admitting the preprocessing pipeline was making things worse. But the evidence was unambiguous.

Multi-model coexistence: Rather than blocking on a complete re-embed, query both model spaces and merge. Graceful degradation during migration.

Research page consolidation: Five separate pages with overlapping navigation were confusing. One page with tabs is easier to discover and navigate.

What Shipped

Gemini embedding migration (362K+ chunks, 9 files updated)
YAKE removal, raw prose embedding
Multi-model query fallback and merge
Page estimation fix for reading links
Query length bump (500 -> 4000 chars)
Book deduplication and similarity thresholds
Research page with 5 tabs (Search, Mind Map, Themes, Radar, Highlights)
Bookmark/save system for discovered passages
Life-dashboard integration (journal -> library cross-reference)
Nav consolidation across 7 pages

What’s Next

The re-embed job is still running — 362K of 2.4M chunks complete. Once it finishes, the nomic fallback path can be removed and queries simplified. The Research page’s Theme Explorer needs tuning on cluster count and similarity thresholds. Reading Radar recommendations are plausible but could weight recency more.