Building a Blackboard: Research Pipeline From Scratch
The Spark
It started with a link to mvanhorn/last30days-skill — a Claude Code skill that researches topics across Reddit, X, YouTube, HN, and Polymarket. Cool concept. One problem: it depends on ScrapeCreators at $47/mo for Reddit + TikTok + Instagram scraping.
Looked at the pricing page. Looked at the free APIs. Looked at the pricing page again.
“We can build our own. With blackjack and hookers.”
So we did.
What We Built (in one session)
1. The Blackboard — A Fourth Visual Surface
bythewei.dev has three visual worlds: cork board (homepage), bookshelf (catalog), whiteboard (research docs). Today we added a fourth: the blackboard.
Dark green chalkboard surface. Chalk dust texture. Wooden frame. Chalk tray with colored chalk pieces and an eraser. Cards with chalk-drawn borders and pastel text. The whiteboard is “my notes” — the blackboard is “what’s happening out there.”
src/content/blackboard/*.md → inline section on homepage
→ /?modal=blackboard (fullscreen)
→ /blackboard (redirect)
→ /chalk/{slug} (individual pages)
Drop a markdown file in the folder, it appears on the board at build time. Same pattern as the whiteboard docs. Zero config.
Also added /whiteboard as a redirect route while we were at it.
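The "drop a file, get a card" step is simple enough to sketch in bare Node (the site presumably uses its framework's content collections; function names here are mine):

```javascript
// Hypothetical sketch of the build-time step: list the markdown files in the
// content folder and derive each card's slug + /chalk/{slug} route.
import { mkdtempSync, writeFileSync, readdirSync } from 'node:fs';
import { join } from 'node:path';
import { tmpdir } from 'node:os';

function collectCards(dir) {
  return readdirSync(dir)
    .filter((f) => f.endsWith('.md'))
    .map((f) => {
      const slug = f.replace(/\.md$/, '');
      return { slug, route: `/chalk/${slug}` };
    });
}

// Throwaway folder standing in for src/content/blackboard/
const dir = mkdtempSync(join(tmpdir(), 'blackboard-'));
writeFileSync(join(dir, 'mcp-weekly.md'), '# notes');
console.log(collectCards(dir)); // → [ { slug: 'mcp-weekly', route: '/chalk/mcp-weekly' } ]
```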
2. The Research Pipeline — 11 Sources, Zero Paid APIs
Instead of paying ScrapeCreators, we hit the APIs directly:
| Source | Auth | How |
|---|---|---|
| Hacker News | None | Algolia search API |
| DuckDuckGo | None | HTML + instant answer |
| Bluesky | None | AT Protocol public |
| Mastodon | None | Multi-instance (mastodon.social, hachyderm, fosstodon) |
| Stack Overflow | Optional free key | StackExchange v2.3 |
| Lobsters | None | JSON endpoints |
| Dev.to | Optional free key | Article + tag search |
| Lemmy | None | Federated search across 3 instances |
| Reddit | Free app key | OAuth2 client credentials |
| YouTube | Free API key | Data API v3 |
| Threads | Free token | Meta Graph API (own posts only) |
8 of 11 sources work with zero keys. The other 3 are free to set up.
All sources run in parallel via Promise.all. 8 agents searching simultaneously, results back in ~3 seconds.
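The fan-out is the whole trick, so here's a minimal sketch (stub sources stand in for the real modules; the `catch` is my addition so one flaky API can't sink the run):

```javascript
// Every source module exposes search(topic, opts); the orchestrator fires
// them all at once and flattens the results.
async function searchAll(sources, topic, opts) {
  const perSource = await Promise.all(
    sources.map((s) => s.search(topic, opts).catch(() => [])) // one bad source ≠ dead run
  );
  return perSource.flat();
}

// Stubs standing in for the real source modules.
const stub = (name) => ({
  name,
  search: async () => [{ source: name, title: `${name} result`, score: 0 }],
});

const results = await searchAll(
  [stub('Hacker News'), stub('Bluesky'), stub('Lobsters')],
  'mcp',
  { days: 7, limit: 20 }
);
console.log(results.length); // → 3
```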
3. The Scorer
Every result gets scored by engagement metrics specific to its platform:
- HN: points + comments
- Reddit: upvotes + comments + ratio
- YouTube: views + likes
- Mastodon: favourites + boosts
- Lobsters: score + comments
Then trigram Jaccard similarity deduplicates across sources. If two results from different platforms match at >0.45 similarity, the lower-scored one gets merged and the winner gets a +15 convergence bonus. Cross-platform consensus = higher confidence signal.
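A minimal sketch of that dedup pass (function names are mine; the real scorer surely normalizes more aggressively):

```javascript
// Character-trigram Jaccard similarity between titles; above the 0.45
// threshold, the lower-scored result is merged into the higher-scored one,
// which earns a +15 cross-platform convergence bonus.
function trigrams(text) {
  const t = text.toLowerCase().replace(/\s+/g, ' ');
  const grams = new Set();
  for (let i = 0; i <= t.length - 3; i++) grams.add(t.slice(i, i + 3));
  return grams;
}

function jaccard(a, b) {
  const A = trigrams(a), B = trigrams(b);
  let inter = 0;
  for (const g of A) if (B.has(g)) inter++;
  return inter / (A.size + B.size - inter);
}

function dedupe(results, threshold = 0.45, bonus = 15) {
  const kept = [];
  for (const r of [...results].sort((a, b) => b.score - a.score)) {
    const dup = kept.find((k) => jaccard(k.title, r.title) > threshold);
    if (dup) {
      dup.mergedFrom = [...(dup.mergedFrom ?? []), r.source];
      dup.score += bonus; // cross-platform convergence bonus
    } else {
      kept.push({ ...r });
    }
  }
  return kept;
}

const out = dedupe([
  { source: 'HN', title: 'MCP servers are going mainstream', score: 80 },
  { source: 'Lobsters', title: 'MCP servers going mainstream', score: 40 },
]);
console.log(out.length, out[0].score); // → 1 95
```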
4. The Link Validator
Dead links become research opportunities. The pipeline HEAD-checks every URL in the top 20 results. If something returns 4xx/5xx/timeout, it auto-searches DuckDuckGo for a replacement on the same topic and swaps the link. Zero dead links in the output.
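Sketched out, assuming the replacement search is injectable (the DuckDuckGo lookup is stubbed here so the example needs no network; names are mine):

```javascript
// HEAD-check a URL with a timeout; anything that isn't a 2xx after
// redirects counts as dead.
async function isAlive(url, timeoutMs = 5000) {
  try {
    const res = await fetch(url, {
      method: 'HEAD',
      redirect: 'follow',
      signal: AbortSignal.timeout(timeoutMs),
    });
    return res.ok; // 4xx/5xx fall through to false
  } catch {
    return false; // DNS failure, refused connection, or timeout
  }
}

async function validateLinks(results, { check = isAlive, findReplacement }) {
  return Promise.all(
    results.map(async (r) => {
      if (await check(r.url)) return r;
      const fresh = await findReplacement(r.title);
      return fresh ? { ...r, url: fresh, replaced: true } : r;
    })
  );
}

// Demo with stubs — no network needed.
const fixed = await validateLinks(
  [{ title: 'MCP spec update', url: 'https://dead.example/post' }],
  {
    check: async () => false,
    findReplacement: async () => 'https://live.example/post',
  }
);
console.log(fixed[0].url); // → https://live.example/post
```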
5. The Content Enricher
The pipeline doesn’t just list links — it visits the top 10 URLs, extracts readable text from the article body, cleans HTML entities and junk (share buttons, subscribe CTAs, cookie banners), and includes real excerpts in the output. You get actual content, not just titles.
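The extraction step is roughly this shape (a deliberately crude sketch — the real module surely handles far more entity and boilerplate cases):

```javascript
// Strip scripts/styles and tags, decode common HTML entities, then drop
// short lines and obvious boilerplate (share buttons, cookie banners).
const ENTITIES = { '&amp;': '&', '&lt;': '<', '&gt;': '>', '&quot;': '"', '&#39;': "'", '&nbsp;': ' ' };
const JUNK = /share this|subscribe|cookie|sign up|log in/i;

function extractText(html) {
  const text = html
    .replace(/<(script|style)[\s\S]*?<\/\1>/gi, ' ') // drop scripts and styles
    .replace(/<[^>]+>/g, ' ')                        // strip remaining tags
    .replace(/&[a-z#0-9]+;/gi, (e) => ENTITIES[e] ?? ' ');
  return text
    .split(/\n+/)
    .map((line) => line.replace(/\s+/g, ' ').trim())
    .filter((line) => line.length > 40 && !JUNK.test(line))
    .join('\n');
}

const html = `<p>Model Context Protocol adoption grew sharply this quarter across editors &amp; agents.</p>
<div class="share-bar">Share this article</div>
<div>Accept cookies to continue</div>`;
console.log(extractText(html));
// → Model Context Protocol adoption grew sharply this quarter across editors & agents.
```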
6. The Output Format
ADHD-friendly, not a link dump:
## TL;DR — instant hook, signal count
## What's Actually Happening — narrative from fetched articles
## Key Patterns — cross-source themes
## Deep Dives — per-source with engagement + excerpts
## What People Are Building — project table with links
## The Developer's Take — how to implement these signals
## Cross-Discipline Applications — finance, medicine, education, law, creative, engineering
## Signal Report — visual stats block
The cross-discipline section was the fun idea: every weekly digest ends with perspectives from 6 fields — how would a trader, a doctor, a teacher, a lawyer, a designer, and an engineer each use this week’s developments?
7. The MCP Server
The whole thing is also an MCP server with 3 tools:
| Tool | What |
|---|---|
| research_topic | Full research pipeline — search, score, dedupe, enrich, output |
| research_sentiment | Bullish/bearish scoring for stocks, brands, products |
| list_sources | Show available sources and key status |
Wire it up in .mcp.json and Claude Code can research any topic on demand.
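The wiring is a few lines of config — something like this, assuming the server name and a project-relative path (adjust to taste):

```json
{
  "mcpServers": {
    "bythewei-research": {
      "command": "node",
      "args": ["src/mcp-server.mjs"]
    }
  }
}
```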
8. Docker
FROM node:22-slim
# ...
CMD ["node", "src/mcp-server.mjs"]
One dependency: @modelcontextprotocol/sdk. Everything else is Node built-ins and fetch.
The Architecture
bythewei-research/
├── bin/cli.mjs ← CLI: node bin/cli.mjs "topic"
├── src/
│ ├── mcp-server.mjs ← MCP stdio server
│ ├── researcher.mjs ← Orchestrator (search → score → validate → enrich → output)
│ ├── sources/ ← 11 source modules, identical interface
│ │ ├── hackernews.mjs
│ │ ├── mastodon.mjs ← multi-instance
│ │ ├── reddit.mjs ← with smart subreddit discovery
│ │ └── ...
│ └── lib/
│ ├── scorer.mjs ← Per-source scoring + trigram dedup
│ ├── fetcher.mjs ← URL content extraction
│ ├── link-validator.mjs ← Dead link detection + replacement
│ ├── output.mjs ← Rich markdown generator
│ └── env.mjs ← Multi-path .env loader
├── Dockerfile
└── CLAUDE.md ← 150 lines, best practices
Every source module exports the same interface:
export const name = 'SourceName';
export const icon = '';
export const requiresKey = false;
export async function search(topic, { days, limit }) {
// → [{ source, title, url, author, date, snippet, engagement, score: 0 }]
}
Add a new source in 50 lines. Drop it in sources/, add the import to researcher.mjs, done.
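For instance, a Hacker News module fits the contract in about 30 lines against Algolia's public search API (the endpoint and response fields are real; the exact mapping is my sketch, not the repo's code):

```javascript
// Minimal source module in the shape described above.
export const name = 'HackerNews';
export const icon = '📰';
export const requiresKey = false;

// Build the Algolia query URL: full-text search over stories,
// restricted to the last `days` days via a numeric filter.
export function buildUrl(topic, days) {
  const since = Math.floor(Date.now() / 1000) - days * 86400;
  const params = new URLSearchParams({
    query: topic,
    tags: 'story',
    numericFilters: `created_at_i>${since}`,
  });
  return `https://hn.algolia.com/api/v1/search?${params}`;
}

export async function search(topic, { days = 30, limit = 20 } = {}) {
  const res = await fetch(buildUrl(topic, days));
  const { hits } = await res.json();
  return hits.slice(0, limit).map((h) => ({
    source: name,
    title: h.title,
    url: h.url ?? `https://news.ycombinator.com/item?id=${h.objectID}`,
    author: h.author,
    date: h.created_at,
    snippet: h.story_text ?? '',
    engagement: { points: h.points, comments: h.num_comments },
    score: 0,
  }));
}
```

Drop a file like this in sources/, add the import to researcher.mjs, and the orchestrator picks it up.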
The Weekly Digest
Instead of one blackboard post per research query (which fragmented fast), we consolidated into one weekly article that covers everything:
- All research from the week merged into narrative sections
- Every mention links to its original source (HN discussion, GitHub repo, arXiv paper, SO question)
- All 35 links validated — dead ones auto-replaced
- Developer implementation guide at the end
- Cross-discipline applications (6 fields) at the end
The first one — Week of Mar 24 — covers MCP going mainstream, agents going production, and open-source closing the gap. 120+ signals across 6 platforms.
What’s Next
- Wire up Reddit, YouTube, and Threads API keys for full coverage
- Cron job or Claude Code hook to auto-run weekly
- Sentiment analysis mode for the ai-hedge-fund project
- Maybe a Philosophy perspective in the cross-discipline section. Need to think about how to make that not cringe.
The Meta
The research pipeline that tracks MCP developments… is itself an MCP server… that posts to a blackboard… about MCP. We’ve gone full recursive.
Total build time: one Claude Code session. No paid APIs. Just fetch and Promise.all.
Related: How I Use Claude Code | Design Principles