From Broken Selectors to ReAct Agent: The Wei Bot Architecture Session

Started with a bug report: Wei Bot could only see Joan Didion on the research page. Ended with a multi-step ReAct agent architecture, 39 tools, a new Social Intelligence page, and a reading companion that tracks your active audiobook paragraph. 18 commits.

The Bug

Wei Bot used regex-based DOM scraping to understand each page. On the research page, the selectors were wrong — it grabbed a single cached element and reported Joan Didion regardless of what was actually displayed. The fix would have been updating the selectors. Instead, this turned into a full architecture rethink.

Research Phase

Before writing code, researched how production agent systems handle page-aware context:

Vercel AI SDK — structured tool definitions with Zod schemas, streaming tool calls
Sprinklr — enterprise ReAct agents over customer data, same loop pattern
GitHub Copilot — workspace indexing, progressive context injection
Intercom Fin — knowledge base RAG + tool calling for customer support

The consistent pattern: ReAct loop (reason, act, observe, repeat) with structured tool definitions. Nobody does regex DOM scraping. The page context is injected into the system prompt as structured data, not extracted from the rendered DOM.

Building the Agent

The ReAct Loop (`app/routes/agent.py`)

The core loop is simple. LLM receives the question plus page context, decides whether to answer directly or call a tool, server executes the tool and feeds the result back, LLM decides again. Max 6 iterations.

The complexity is in the edges: parsing unreliable JSON from DeepSeek, normalizing parameter names, preventing hallucinated tool results, recovering from malformed responses.

39 Tools (`app/agent_tools.py`)

Organized by cognitive intent, not API surface:

Data tools (12): today summary, weekly stats, now playing, reading progress, journal entries, highlights, streaks
Social tools (8): posting stats, engagement metrics, top posts, audience growth, content themes
Reading tools (9): book search, highlights by book, passages, chapter list, epub status, calibration info
Action tools (5): create audiomark, write journal entry, search by author, temporal correlation
Knowledge tools (5): search Library of Babel, find related books, get book details, topic search

Each tool is a direct DB query. Early versions proxied through HTTP endpoints, but 5 of those were silently broken — returning empty results or 404s. Replaced all proxies with direct asyncpg queries during a tool audit.

Structured Page Context

Eight page definitions, each listing the sections visible and the data keys available:

Home: now_playing, recently_played, today_stats, map_panel
Reading: browse_grid, cached_books, passages, active_player
Book: metadata, transcript_waterfall, chapters, player_bar
Log: search_results, filters, entry_cards, topics_sidebar
Research: search_tab, mind_map, themes, radar, highlights
Map: markers, filters, density_circles, route_display
Journal: entries, mood_tracker, writing_prompts
Social: posting_stats, engagement, audience, content_themes

The page context gets injected as structured JSON in the system prompt. The LLM knows what the user can see without scraping the DOM.

The DeepSeek Problem

DeepSeek V4 Flash (the current LLM provider via Ollama Cloud) has three failure modes that required workarounds:

Format inconsistency. Tool calls come as {"function": ...}, {"name": ...}, or {"action": ...} randomly. Parameters appear under params, args, or parameters. Built _normalize_tool_call() to handle all variants.

Hallucinated results. Given a tool to call, DeepSeek often generates the call AND a fabricated response in the same output. The agent loop skips execution because it sees a “result.” Fixed with strict “output ONLY JSON” prompting and post-result reinforcement.

Parameter mismatches. title instead of book_title, search instead of query, n instead of limit. Built PARAM_ALIASES mapping to catch common substitutions.

The session expanded into a full Social Intelligence page at /social — an 11-section analytics dashboard:

Posting frequency, engagement trends, audience growth
Top performing posts, content theme breakdown
Optimal posting times, hashtag performance
Network analysis, conversation threads

This required 22 new proxy endpoints to threads-analysis (bringing the total to 64). All wired as agent tools so the bot can answer social analytics questions from any page.

Reading Companion Mode

The most novel feature. When on a book page with active audio:

A 5-paragraph window around the current audio position gets injected into every message
Passage snapshots get saved into conversation history — the bot remembers what you were hearing when you asked a question
The bot can search 8,600 books in the Library of Babel for thematic connections to the active passage

Ask “what does this remind you of?” while listening to God Emperor of Dune, and the bot sees the exact paragraph, searches across all books, and finds thematic parallels. The reading companion emerged as the strongest use case — a conversational interface to your own reading history.

Prefetch Engine

Prefetch activates on the first chat message, not on page load. Zero cost until the user engages:

First message triggers background data loading (today stats, current reading, recent highlights)
A paragraph observer on the reading page refreshes Library of Babel connections every 15 seconds
Subsequent messages hit warm data instead of cold queries

The paragraph observer uses an IntersectionObserver on the epub reader — as new paragraphs scroll into view, it fetches semantic matches from the knowledge base. Cleaned up the observer lifecycle to prevent leaks on page navigation.

Cross-Page Memory

Conversation history persists via sessionStorage and gets logged to PostgreSQL as agent_chat events. Navigate from the reading page to the log page, and the bot remembers what you were discussing. Pending requests survive page changes through a navigation recovery system.

The PostgreSQL logging creates an automatic training corpus: every question, every tool call, every answer, every reading passage snapshot. The fine-tuning dataset builds itself.

Code Review: 3-Agent Audit

Ran a three-agent code review at the end of the session:

Agent 1: Architecture review — extracted _safe_payload() helper, moved imports to top level
Agent 2: Performance review — fixed IntersectionObserver leak, sanitized prefetch params, capped context size
Agent 3: Security review — parameter validation, injection prevention, rate limit checks

Also added audiomarks column to the daily_summary materialized view, which had been missing since audiomarks were added to the system.

The Cognitive Axes Framework

The most durable insight from the session. User questions map to three axes:

Immediate — answerable from page context. Free. No tool calls.
Vertical — depth into one data source. One tool call. Fast.
Horizontal — synthesis across sources. Multiple tool calls. Expensive.

This maps perfectly to the tool cost model and to user cognitive priority. Users ask immediate questions first (what’s playing?), then go vertical (what are my highlights from this book?), then go horizontal (what themes connect my reading to my journal?). Design the tool set to serve this progression.

What Shipped

ReAct agent loop with 6-step cap
39 tools across data, social, reading, action, and knowledge categories
Structured page context for 8 pages
Cross-page conversation memory (sessionStorage + PostgreSQL)
Social Intelligence page with 11 sections and 22 new proxy endpoints
Prefetch engine with paragraph observer
Reading companion mode with 5-paragraph window
Tool call normalization and parameter aliasing
Navigation recovery for pending requests
Conversation logging as embedded events

What’s Next

Fine-tune an LLM on the full life-dashboard dataset (73K+ events as training corpus)
Explore enterprise agent product potential (the architecture transfers directly)
Social page panel refinement (some panels need data format adjustments)