Building a Blackboard: Research Pipeline From Scratch
The Spark
It started with a link to mvanhorn/last30days-skill — a Claude Code skill that researches topics across Reddit, X, YouTube, HN, and Polymarket. Cool concept. One problem: it depends on ScrapeCreators at $47/mo for Reddit + TikTok + Instagram scraping.
Looked at the pricing page. Looked at the free APIs. Looked at the pricing page again.
“We can build our own. With blackjack and hookers.”
So we did.
What We Built (in one session)
1. The Blackboard — A Fourth Visual Surface
bythewei.dev has three visual worlds: cork board (homepage), bookshelf (catalog), whiteboard (research docs). Today we added a fourth: the blackboard.
Dark green chalkboard surface. Chalk dust texture. Wooden frame. Chalk tray with colored chalk pieces and an eraser. Cards with chalk-drawn borders and pastel text. The whiteboard is “my notes” — the blackboard is “what’s happening out there.”
src/content/blackboard/*.md → inline section on homepage
→ /?modal=blackboard (fullscreen)
→ /blackboard (redirect)
→ /chalk/{slug} (individual pages)
Drop a markdown file in the folder, it appears on the board at build time. Same pattern as the whiteboard docs. Zero config.
Also added /whiteboard as a redirect route while we were at it.
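The "drop a file, get a card" step is simple enough to sketch in bare Node (the site presumably uses its framework's content collections; function names here are mine):

```javascript
// Hypothetical sketch of the build-time step: list the markdown files in the
// content folder and derive each card's slug + /chalk/{slug} route.
import { mkdtempSync, writeFileSync, readdirSync } from 'node:fs';
import { join } from 'node:path';
import { tmpdir } from 'node:os';

function collectCards(dir) {
  return readdirSync(dir)
    .filter((f) => f.endsWith('.md'))
    .map((f) => {
      const slug = f.replace(/\.md$/, '');
      return { slug, route: `/chalk/${slug}` };
    });
}

// Throwaway folder standing in for src/content/blackboard/
const dir = mkdtempSync(join(tmpdir(), 'blackboard-'));
writeFileSync(join(dir, 'mcp-weekly.md'), '# notes');
console.log(collectCards(dir)); // → [ { slug: 'mcp-weekly', route: '/chalk/mcp-weekly' } ]
```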
2. The Research Pipeline — 11 Sources, Zero Paid APIs
Instead of paying ScrapeCreators, we hit the APIs directly:
| Source | Auth | How |
|---|---|---|
| Hacker News | None | Algolia search API |
| DuckDuckGo | None | HTML + instant answer |
| Bluesky | None | AT Protocol public |
| Mastodon | None | Multi-instance (mastodon.social, hachyderm, fosstodon) |
| Stack Overflow | Optional free key | StackExchange v2.3 |
| Lobsters | None | JSON endpoints |
| Dev.to | Optional free key | Article + tag search |
| Lemmy | None | Federated search across 3 instances |
| Reddit | Free app key | OAuth2 client credentials |
| YouTube | Free API key | Data API v3 |
| Threads | Free token | Meta Graph API (own posts only) |
8 of 11 sources work with zero keys. The other 3 are free to set up.
All sources run in parallel via Promise.all. 8 agents searching simultaneously, results back in ~3 seconds.
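The fan-out is the whole trick, so here's a minimal sketch (stub sources stand in for the real modules; the `catch` is my addition so one flaky API can't sink the run):

```javascript
// Every source module exposes search(topic, opts); the orchestrator fires
// them all at once and flattens the results.
async function searchAll(sources, topic, opts) {
  const perSource = await Promise.all(
    sources.map((s) => s.search(topic, opts).catch(() => [])) // one bad source ≠ dead run
  );
  return perSource.flat();
}

// Stubs standing in for the real source modules.
const stub = (name) => ({
  name,
  search: async () => [{ source: name, title: `${name} result`, score: 0 }],
});

const results = await searchAll(
  [stub('Hacker News'), stub('Bluesky'), stub('Lobsters')],
  'mcp',
  { days: 7, limit: 20 }
);
console.log(results.length); // → 3
```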
3. The Scorer
Every result gets scored by engagement metrics specific to its platform:
- HN: points + comments
- Reddit: upvotes + comments + ratio
- YouTube: views + likes
- Mastodon: favourites + boosts
- Lobsters: score + comments
Then trigram Jaccard similarity deduplicates across sources. If two results from different platforms match at >0.45 similarity, the lower-scored one gets merged and the winner gets a +15 convergence bonus. Cross-platform consensus = higher confidence signal.
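A minimal sketch of that dedup pass (function names are mine; the real scorer surely normalizes more aggressively):

```javascript
// Character-trigram Jaccard similarity between titles; above the 0.45
// threshold, the lower-scored result is merged into the higher-scored one,
// which earns a +15 cross-platform convergence bonus.
function trigrams(text) {
  const t = text.toLowerCase().replace(/\s+/g, ' ');
  const grams = new Set();
  for (let i = 0; i <= t.length - 3; i++) grams.add(t.slice(i, i + 3));
  return grams;
}

function jaccard(a, b) {
  const A = trigrams(a), B = trigrams(b);
  let inter = 0;
  for (const g of A) if (B.has(g)) inter++;
  return inter / (A.size + B.size - inter);
}

function dedupe(results, threshold = 0.45, bonus = 15) {
  const kept = [];
  for (const r of [...results].sort((a, b) => b.score - a.score)) {
    const dup = kept.find((k) => jaccard(k.title, r.title) > threshold);
    if (dup) {
      dup.mergedFrom = [...(dup.mergedFrom ?? []), r.source];
      dup.score += bonus; // cross-platform convergence bonus
    } else {
      kept.push({ ...r });
    }
  }
  return kept;
}

const out = dedupe([
  { source: 'HN', title: 'MCP servers are going mainstream', score: 80 },
  { source: 'Lobsters', title: 'MCP servers going mainstream', score: 40 },
]);
console.log(out.length, out[0].score); // → 1 95
```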
4. The Link Validator
Dead links become research opportunities. The pipeline HEAD-checks every URL in the top 20 results. If something returns 4xx/5xx/timeout, it auto-searches DuckDuckGo for a replacement on the same topic and swaps the link. Zero dead links in the output.
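Sketched out, assuming the replacement search is injectable (the DuckDuckGo lookup is stubbed here so the example needs no network; names are mine):

```javascript
// HEAD-check a URL with a timeout; anything that isn't a 2xx after
// redirects counts as dead.
async function isAlive(url, timeoutMs = 5000) {
  try {
    const res = await fetch(url, {
      method: 'HEAD',
      redirect: 'follow',
      signal: AbortSignal.timeout(timeoutMs),
    });
    return res.ok; // 4xx/5xx fall through to false
  } catch {
    return false; // DNS failure, refused connection, or timeout
  }
}

async function validateLinks(results, { check = isAlive, findReplacement }) {
  return Promise.all(
    results.map(async (r) => {
      if (await check(r.url)) return r;
      const fresh = await findReplacement(r.title);
      return fresh ? { ...r, url: fresh, replaced: true } : r;
    })
  );
}

// Demo with stubs — no network needed.
const fixed = await validateLinks(
  [{ title: 'MCP spec update', url: 'https://dead.example/post' }],
  {
    check: async () => false,
    findReplacement: async () => 'https://live.example/post',
  }
);
console.log(fixed[0].url); // → https://live.example/post
```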
5. The Content Enricher
The pipeline doesn’t just list links — it visits the top 10 URLs, extracts readable text from the article body, cleans HTML entities and junk (share buttons, subscribe CTAs, cookie banners), and includes real excerpts in the output. You get actual content, not just titles.
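The extraction step is roughly this shape (a deliberately crude sketch — the real module surely handles far more entity and boilerplate cases):

```javascript
// Strip scripts/styles and tags, decode common HTML entities, then drop
// short lines and obvious boilerplate (share buttons, cookie banners).
const ENTITIES = { '&amp;': '&', '&lt;': '<', '&gt;': '>', '&quot;': '"', '&#39;': "'", '&nbsp;': ' ' };
const JUNK = /share this|subscribe|cookie|sign up|log in/i;

function extractText(html) {
  const text = html
    .replace(/<(script|style)[\s\S]*?<\/\1>/gi, ' ') // drop scripts and styles
    .replace(/<[^>]+>/g, ' ')                        // strip remaining tags
    .replace(/&[a-z#0-9]+;/gi, (e) => ENTITIES[e] ?? ' ');
  return text
    .split(/\n+/)
    .map((line) => line.replace(/\s+/g, ' ').trim())
    .filter((line) => line.length > 40 && !JUNK.test(line))
    .join('\n');
}

const html = `<p>Model Context Protocol adoption grew sharply this quarter across editors &amp; agents.</p>
<div class="share-bar">Share this article</div>
<div>Accept cookies to continue</div>`;
console.log(extractText(html));
// → Model Context Protocol adoption grew sharply this quarter across editors & agents.
```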
6. The Output Format
ADHD-friendly, not a link dump:
## TL;DR — instant hook, signal count
## What's Actually Happening — narrative from fetched articles
## Key Patterns — cross-source themes
## Deep Dives — per-source with engagement + excerpts
## What People Are Building — project table with links
## The Developer's Take — how to implement these signals
## Cross-Discipline Applications — finance, medicine, education, law, creative, engineering
## Signal Report — visual stats block
The cross-discipline section was the fun idea: every weekly digest ends with perspectives from 6 fields — how would a trader, a doctor, a teacher, a lawyer, a designer, and an engineer each use this week’s developments?
7. The MCP Server
The whole thing is also an MCP server with 3 tools:
| Tool | What |
|---|---|
| research_topic | Full research pipeline — search, score, dedupe, enrich, output |
| research_sentiment | Bullish/bearish scoring for stocks, brands, products |
| list_sources | Show available sources and key status |
Wire it up in .mcp.json and Claude Code can research any topic on demand.
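The wiring is a few lines of config — something like this, assuming the server name and a project-relative path (adjust to taste):

```json
{
  "mcpServers": {
    "bythewei-research": {
      "command": "node",
      "args": ["src/mcp-server.mjs"]
    }
  }
}
```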
8. Docker
FROM node:22-slim
# ...
CMD ["node", "src/mcp-server.mjs"]
One dependency: @modelcontextprotocol/sdk. Everything else is Node built-ins and fetch.
The Architecture
bythewei-research/
├── bin/cli.mjs ← CLI: node bin/cli.mjs "topic"
├── src/
│ ├── mcp-server.mjs ← MCP stdio server
│ ├── researcher.mjs ← Orchestrator (search → score → validate → enrich → output)
│ ├── sources/ ← 11 source modules, identical interface
│ │ ├── hackernews.mjs
│ │ ├── mastodon.mjs ← multi-instance
│ │ ├── reddit.mjs ← with smart subreddit discovery
│ │ └── ...
│ └── lib/
│ ├── scorer.mjs ← Per-source scoring + trigram dedup
│ ├── fetcher.mjs ← URL content extraction
│ ├── link-validator.mjs ← Dead link detection + replacement
│ ├── output.mjs ← Rich markdown generator
│ └── env.mjs ← Multi-path .env loader
├── Dockerfile
└── CLAUDE.md ← 150 lines, best practices
Every source module exports the same interface:
export const name = 'SourceName';
export const icon = '';
export const requiresKey = false;
export async function search(topic, { days, limit }) {
// → [{ source, title, url, author, date, snippet, engagement, score: 0 }]
}
Add a new source in 50 lines. Drop it in sources/, add the import to researcher.mjs, done.
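For instance, a Hacker News module fits the contract in about 30 lines against Algolia's public search API (the endpoint and response fields are real; the exact mapping is my sketch, not the repo's code):

```javascript
// Minimal source module in the shape described above.
export const name = 'HackerNews';
export const icon = '📰';
export const requiresKey = false;

// Build the Algolia query URL: full-text search over stories,
// restricted to the last `days` days via a numeric filter.
export function buildUrl(topic, days) {
  const since = Math.floor(Date.now() / 1000) - days * 86400;
  const params = new URLSearchParams({
    query: topic,
    tags: 'story',
    numericFilters: `created_at_i>${since}`,
  });
  return `https://hn.algolia.com/api/v1/search?${params}`;
}

export async function search(topic, { days = 30, limit = 20 } = {}) {
  const res = await fetch(buildUrl(topic, days));
  const { hits } = await res.json();
  return hits.slice(0, limit).map((h) => ({
    source: name,
    title: h.title,
    url: h.url ?? `https://news.ycombinator.com/item?id=${h.objectID}`,
    author: h.author,
    date: h.created_at,
    snippet: h.story_text ?? '',
    engagement: { points: h.points, comments: h.num_comments },
    score: 0,
  }));
}
```

Drop a file like this in sources/, add the import to researcher.mjs, and the orchestrator picks it up.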
The Weekly Digest
Instead of one blackboard post per research query (which fragmented fast), we consolidated into one weekly article that covers everything:
- All research from the week merged into narrative sections
- Every mention links to its original source (HN discussion, GitHub repo, arXiv paper, SO question)
- All 35 links validated — dead ones auto-replaced
- Developer implementation guide at the end
- Cross-discipline applications (6 fields) at the end
The first one — Week of Mar 24 — covers MCP going mainstream, agents going production, and open-source closing the gap. 120+ signals across 6 platforms.
What’s Next
- Wire up Reddit, YouTube, and Threads API keys for full coverage
- Cron job or Claude Code hook to auto-run weekly
- Sentiment analysis mode for the ai-hedge-fund project
- Maybe a Philosophy perspective in the cross-discipline section. Need to think about how to make that not cringe.
The Meta
The research pipeline that tracks MCP developments… is itself an MCP server… that posts to a blackboard… about MCP. We’ve gone full recursive.
Total build time: one Claude Code session. No paid APIs. Just fetch and Promise.all.
Related: How I Use Claude Code | Design Principles