Threads Analysis: 50K Posts, 69 Endpoints, Zero Cloud
50,704 Posts. Two API Servers. One Mac Mini. Zero Cloud Costs.
It’s 2am and I’m staring at a Grafana dashboard with 14 panels all showing green. Six hours ago this repo was a GitHub URL with some WebXR galaxy code and no local clone. Now there are two API servers running, a Postgres database with vector embeddings for 41,943 posts, enrichment pipelines that have classified every single post by sentiment and energy and intent, a knowledge graph with 12,039 edges, and a haiku oracle that texts me poetry generated from my own Threads posts via iMessage.
I need to write this down before I forget how we got here.
The Starting Point
The threads-analysis repo existed on GitHub — private, holding the WebXR galaxy visualizations from the Fireflies session and the information-theory pipeline from the original platform build. But no local clone on this machine. No .env. No data files. Just code.
So step one was git clone. Step two was “let’s build the API layer I’ve been meaning to build.” Step nineteen was a haiku oracle sending poetry to my phone.
Scope creep is a feature, not a bug.
The Architecture
Two servers. One database. All local.
┌─────────────────────────────────────────────┐
│ Mac Mini (M2 Pro) │
│ │
│ ┌──────────────┐ ┌──────────────┐ │
│ │ Flask API │ │ Node API │ │
│ │ :4323 │ │ :4322 │ │
│ │ 69 endpoints │ │ 26 endpoints│ │
│ │ Swagger/ │ │ Gemma 4 RAG │ │
│ │ ReDoc/Scalar │ │ │ │
│ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │
│ └────────┬───────────┘ │
│ │ │
│ ┌────────▼────────┐ │
│ │ PostgreSQL │ │
│ │ pgvector:pg17 │ │
│ │ 50,706 posts │ │
│ │ 41,943 vectors │ │
│ └─────────────────┘ │
│ │
│ ┌──────────────┐ ┌──────────────┐ │
│ │ Ollama │ │ Grafana │ │
│ │ Gemma 4 │ │ 14 panels │ │
│ │ nomic-embed │ │ │ │
│ └──────────────┘ └──────────────┘ │
│ │
│ All accessible over Tailscale │
└─────────────────────────────────────────────┘
Flask API (:4323) — 69 Endpoints
Started at 58, ended at 69. The Flask server is the primary API — handles all the CRUD, the analysis endpoints, the enrichment pipeline triggers, the genealogy museums, the knowledge graph queries. Six different API doc UIs because I couldn’t pick one:
- Swagger UI — the classic
- ReDoc — clean, readable
- RapiDoc — interactive
- Scalar — modern, beautiful
- Elements — Stoplight’s thing
- RapiPDF — generates PDF docs from OpenAPI spec
Yeah, six doc renderers for one API is overkill. But now I can compare them side by side and I have opinions about all of them. Scalar wins.
Node API (:4322) — 26 Endpoints + Gemma 4 RAG
The Node server handles the AI-powered stuff. Gemma 4 running through Ollama for retrieval-augmented generation. You ask it a question about my Threads posts and it pulls relevant vectors from pgvector, feeds them as context, and generates an answer grounded in actual data.
26 endpoints covering search, RAG queries, the haiku oracle, and vector similarity lookups.
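The retrieval-then-generate step is easy to sketch. This is a hypothetical prompt-assembly helper, not the Node server's actual code — the real retrieval query and Ollama call are elided:

```python
def build_rag_prompt(question, posts):
    """Assemble a grounded prompt from retrieved posts (illustrative helper).

    `posts` is a list of (timestamp, text) tuples as a vector-similarity
    lookup might return them; the retrieved text becomes the only context
    the model is allowed to answer from.
    """
    context = "\n".join(f"- [{ts}] {text}" for ts, text in posts)
    return (
        "Answer the question using only the Threads posts below.\n\n"
        f"Posts:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

The point of the pattern: the model never answers from its weights alone, so every claim is traceable to a post that was actually in the context window.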
The Data Pipeline
Sync: 50,704 Posts
Pulled everything from the Threads API. 19,997 original posts and 30,707 replies. The sync handles pagination, rate limiting, incremental updates. Every post goes into Postgres with full metadata.
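The shape of the sync loop is the standard cursor-pagination drain. A minimal sketch, with the fetcher and writer injected so it says nothing about the actual Threads API field names:

```python
def sync_all(fetch_page, save_batch):
    """Drain a cursor-paginated feed into the database (sketch).

    `fetch_page(cursor)` returns (posts, next_cursor), with next_cursor
    None on the last page; `save_batch(posts)` upserts a batch into
    Postgres. Rate limiting would wrap fetch_page in the real pipeline.
    """
    cursor, total = None, 0
    while True:
        posts, cursor = fetch_page(cursor)
        save_batch(posts)
        total += len(posts)
        if cursor is None:
            return total
```

Incremental updates fall out of the same loop: start from the last-seen cursor instead of `None`.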
Vector Embeddings: 41,943 Text Posts
Not every post has text (some are pure images/video). For the 41,943 that do, I generated 768-dimensional embeddings using nomic-embed-text running locally through Ollama. No OpenAI API calls. No embedding costs. Just a local model grinding through 42K posts on the M2 Pro.
The embeddings live in pgvector — proper HNSW indexing for fast cosine similarity search. Ask “find posts similar to this one” and it returns results in milliseconds.
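Under the hood, "similar" means cosine similarity over those 768-dimensional vectors. pgvector's `<=>` operator returns cosine *distance* (1 − similarity); the metric itself is just this:

```python
import math

def cosine_similarity(a, b):
    """The metric behind pgvector's `<=>` operator, which returns
    1 - cosine_similarity as a distance."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

The HNSW index exists so Postgres can approximate the nearest neighbors under this metric without scanning all 41,943 vectors per query.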
Enrichment Pipeline
Every text post got classified across four dimensions:
| Dimension | What It Captures |
|---|---|
| Sentiment | Positive / negative / neutral + confidence score |
| Energy | High / low / calm / chaotic |
| Intent | Inform / persuade / vent / joke / question / reflect |
| Language | Primary language detection |
All running through local models. The whole enrichment pass across 41,943 posts took a while but cost exactly zero dollars.
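With a local model, classification is mostly prompt discipline: constrain the output to a fixed label set per dimension and parse the single word back. A sketch of that shape — the label lists mirror the table above, but the prompt wording is mine, not the pipeline's:

```python
# Label sets per enrichment dimension (from the table above).
DIMENSIONS = {
    "sentiment": ["positive", "negative", "neutral"],
    "energy": ["high", "low", "calm", "chaotic"],
    "intent": ["inform", "persuade", "vent", "joke", "question", "reflect"],
}

def classification_prompt(dimension, text):
    """Build a constrained-choice prompt for one dimension of one post."""
    labels = ", ".join(DIMENSIONS[dimension])
    return (
        f"Classify the {dimension} of this post. "
        f"Reply with exactly one of: {labels}.\n\nPost: {text}"
    )
```

Forcing a closed label set is what makes 41,943 local-model calls aggregatable: every response lands in a known enum, ready for a Grafana panel.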
The Genealogy Museums
This is where it got weird and I loved it.
Tech Genealogy Museum
49 topics extracted from the corpus. Not just tags — actual technology topics with post counts and temporal distribution. Claude shows up 452 times, ChatGPT 248. The museum tracks how tech discourse evolves over time across my posting history.
792 co-occurrence edges map which technologies get discussed together. Claude and API show up together a lot. Python and data. visionOS and spatial computing. It’s a map of tech adjacency in my own thinking.
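A co-occurrence edge is just "these two topics appeared in the same post," counted over the corpus. A minimal sketch of how the 792 edges could be derived from per-post topic sets:

```python
from collections import Counter
from itertools import combinations

def cooccurrence_edges(posts_topics):
    """Count undirected topic co-occurrence across posts.

    `posts_topics` is a list of topic sets, one per post; every unordered
    pair that shares a post adds 1 to that pair's edge weight.
    """
    edges = Counter()
    for topics in posts_topics:
        for a, b in combinations(sorted(topics), 2):
            edges[(a, b)] += 1
    return edges
```

Sorting inside each post keeps (Claude, API) and (API, Claude) as one edge, so the result is a weighted undirected adjacency map — the raw material for the museum's graph.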
Pedagogy Genealogy Museum
30 topics focused on learning, teaching, knowledge transfer. Then 150 additional posts discovered purely through vector similarity — posts that weren’t explicitly tagged as pedagogical but whose embeddings clustered near known pedagogy content. The vectors found teaching moments I didn’t know I was having.
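One way those 150 posts could have surfaced is centroid proximity: average the embeddings of the known pedagogy posts, then flag untagged posts whose vectors sit near that centroid. A sketch under that assumption — the method and the 0.6 threshold are illustrative, not the pipeline's actual parameters:

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def discover_similar(known, candidates, threshold=0.6):
    """Return candidate post ids whose embedding is cosine-close to the
    centroid of known-pedagogy embeddings.

    `candidates` is a list of (post_id, vector) pairs; threshold is a
    made-up illustrative cutoff.
    """
    c = centroid(known)

    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb)

    return [pid for pid, v in candidates if cos(v, c) >= threshold]
```

In practice this would run as a single pgvector query against the centroid rather than a Python scan, but the geometry is the same.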
The Haiku Oracle
This is the one that made me laugh out loud at 1am.
The haiku oracle picks random posts from across different time periods, feeds them to Gemma 4, and asks it to distill the essence into a haiku. Then it sends the haiku to me via iMessage using the macOS osascript bridge. Every haiku gets logged with a UUID and the source posts are recorded in a graph — so you can trace any haiku back to the original thoughts that inspired it.
I got a haiku about something I posted in 2023 and it was genuinely insightful about a pattern I didn’t notice at the time.
An AI reading my old posts and texting me poetry about them. On a Mac mini. Over Tailscale. At 1am. The future is so specific.
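The osascript bridge is small enough to sketch in full. This uses one common AppleScript pattern for Messages (only works on macOS with Messages signed in); the function names and escaping are mine, not the oracle's actual code:

```python
import subprocess

def imessage_script(recipient, body):
    """Build the AppleScript that sends `body` to `recipient` via iMessage.

    Quotes and backslashes are escaped so a haiku containing them can't
    break out of the AppleScript string literal.
    """
    safe = body.replace("\\", "\\\\").replace('"', '\\"')
    return (
        'tell application "Messages" to send '
        f'"{safe}" to buddy "{recipient}" of '
        '(service 1 whose service type is iMessage)'
    )

def send_haiku(recipient, haiku):
    """Fire the script through the osascript CLI (macOS only)."""
    subprocess.run(["osascript", "-e", imessage_script(recipient, haiku)],
                   check=True)
```

The UUID logging and source-post graph live on the database side; this is just the last hop from Postgres to my phone.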
Knowledge Graph & Interaction Network
The knowledge graph grew from the original session’s 1,638 nodes / 11,155 edges to 1,896 nodes / 12,039 edges. The new nodes come from the tech and pedagogy genealogy analysis — more entities extracted, more relationships mapped.
The interaction graph captures 1,746 interactions across 422 users. Who I reply to, who replies to me, the shape of my Threads social network as a directed graph. Up from the original 409 users / 1,668 interactions.
Grafana Dashboard
14 panels covering:
- Post volume over time (with reply breakdown)
- Sentiment distribution
- Energy classification
- Top topics by post count
- Embedding cluster visualization
- Enrichment pipeline progress
- API health checks
- Knowledge graph growth metrics
- Sync status and freshness
All pointing at the local Postgres instance. Accessible over Tailscale from my MacBook or phone.
The Extras
iOS Shortcuts guide — 7 recipes for querying the API from your phone. “What was I posting about this time last year?” as a home screen widget.
visionOS native app design doc — 7 visualization concepts for taking the Fireflies WebXR experience to native SwiftUI + RealityKit. The WebXR version is good. A native version with proper spatial audio and shared spaces would be something else entirely.
Auto-generating llms.txt — the API generates its own llms.txt file from live routes. If an LLM hits the server, it can read the machine-readable endpoint catalog and figure out how to use the API without human documentation.
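Generating that file reduces to walking the live route table and rendering one line per endpoint. A sketch of the rendering half, with the route catalog passed in as plain tuples (in Flask you'd pull these from `app.url_map`); the actual file's layout and descriptions are the server's own:

```python
def render_llms_txt(title, routes):
    """Render a minimal llms.txt from a route catalog.

    `routes` is a list of (method, path, summary) tuples; output is a
    markdown-ish catalog an LLM can read to discover the API, sorted by
    path so the file is stable across restarts.
    """
    lines = [f"# {title}", ""]
    for method, path, summary in sorted(routes, key=lambda r: r[1]):
        lines.append(f"- {method} {path}: {summary}")
    return "\n".join(lines) + "\n"
```

Because it renders from live routes rather than a hand-written doc, the catalog can't drift out of date when endpoint number 70 lands.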
Monitor script — checks every 15 minutes that both servers are up, Postgres is responding, Ollama is loaded. If anything’s down, sends an iMessage alert.
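The interesting part of a monitor like that is the decision step, which is a pure function and easy to show. The probing (HTTP health checks, a Postgres ping, an Ollama model check) is elided; the alert wording here is illustrative:

```python
def downed_services(status):
    """Given {service_name: is_up}, return an alert body, or None if all green.

    Sorting the names keeps repeated alerts byte-identical, which makes
    de-duplicating them trivial.
    """
    down = [name for name, up in status.items() if not up]
    if not down:
        return None
    return "DOWN: " + ", ".join(sorted(down))
```

Returning `None` on all-green means the iMessage send is simply skipped on a quiet cycle — no "everything is fine" spam every 15 minutes.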
The Code Review Gauntlet
Ran three parallel code reviews with different lenses:
- Reuse review — found duplicated logic across Flask and Node servers
- Quality review — error handling gaps, missing type hints, inconsistent response formats
- Efficiency review — N+1 queries, unnecessary round-trips, unindexed columns
11 fixes applied from the combined findings. Then a security audit scrubbed any leaked secrets before the push to GitHub.
The Numbers
| Metric | Count |
|---|---|
| Posts synced | 50,704 |
| Original posts | 19,997 |
| Replies | 30,707 |
| Posts in Postgres | 50,706 |
| Text posts with vectors | 41,943 |
| Vector dimensions | 768 |
| Flask endpoints | 69 |
| Node endpoints | 26 |
| Knowledge graph nodes | 1,896 |
| Knowledge graph edges | 12,039 |
| Interaction graph users | 422 |
| Interaction graph edges | 1,746 |
| Tech genealogy topics | 49 |
| Pedagogy topics | 30 |
| Co-occurrence edges | 792 |
| Grafana panels | 14 |
| iOS Shortcut recipes | 7 |
| visionOS concepts | 7 |
| Code review fixes | 11 |
| Cloud cost | $0 |
What I Learned This Session
pgvector on local Postgres is seriously underrated. Everyone reaches for Pinecone or Weaviate. But if your data fits on one machine — and 50K posts absolutely fits on one machine — pgvector with HNSW indexing is fast, free, and you already know SQL. No new query language. No managed service. No API rate limits.
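The entire "vector database" surface, in pgvector terms, is a couple of statements. Table and column names here are illustrative, but the `hnsw` index method and the `<=>` cosine-distance operator are pgvector's actual syntax:

```sql
-- Approximate nearest-neighbor index over the embedding column.
CREATE INDEX ON posts USING hnsw (embedding vector_cosine_ops);

-- Top-5 posts most similar to post 42; <=> is cosine distance.
SELECT id, text
FROM posts
ORDER BY embedding <=> (SELECT embedding FROM posts WHERE id = 42)
LIMIT 5;
```

That's the whole pitch: similarity search as an ORDER BY clause, joinable against every other column you already have.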
Two API servers is fine. Flask for the structured data layer, Node for the AI/RAG layer. They share a database. They have different strengths. Trying to force everything into one framework would have been slower to build and harder to maintain.
Vector similarity finds things taxonomy can’t. The pedagogy museum’s 150 vector-discovered posts are the proof. My tagging system would never have caught those posts as pedagogical. But the embeddings knew. The geometry of meaning is more flexible than any classification system I could design by hand.
Ollama + local models changes the economics completely. Embedding 42K posts through OpenAI’s API would have cost real money. Running nomic-embed-text locally cost electricity and time. For batch processing where latency doesn’t matter, local inference is the obvious choice.
Stack
| Layer | Tech |
|---|---|
| Primary API | Flask (Python), 69 endpoints |
| AI/RAG API | Node.js, 26 endpoints |
| Database | PostgreSQL 17 + pgvector |
| Embeddings | nomic-embed-text (768d) via Ollama |
| LLM | Gemma 4 via Ollama |
| Monitoring | Grafana (14 panels) |
| Alerts | iMessage via osascript |
| Hardware | Mac mini M2 Pro |
| Network | Tailscale |
| Docs | Swagger, ReDoc, RapiDoc, Scalar, Elements, RapiPDF |
All local. All private. All mine.