AI Hedge Fund Skills: Mechanical Market Analysis on a Budget
Last updated: 2026-03-21
The Problem with LLM Hedge Funds
The original ai-hedge-fund repo is a clever idea: 18 LLM agents role-playing as different analysts (Buffett, Munger, Cathie Wood, etc.) that each analyze stocks and produce trading signals. A risk manager aggregates them. A portfolio manager makes the final call.
It sounds cool. In practice, it’s $0.82 per run and most of the “analysis” is just Claude wearing different hats. The persona agents (Buffett, Munger, Dalio) are literally the same model with different system prompts. They’re not accessing different data or running different math. They’re vibes.
So I gutted it.
The mechanical pipeline keeps the 6 agents that actually compute things (technical, fundamentals, growth, sentiment, valuation, news sentiment) and replaces even those with rule-based SQL queries. The LLM gets invoked exactly once at the end to write a human-readable allocation memo. Everything else is deterministic.
Architecture
┌─────────────────────────────────────────────────────────┐
│ Local Docker │
│ ┌───────────┐ ┌──────────────┐ ┌────────────────┐ │
│ │ Postgres │ │ Finviz Elite │ │ Yahoo Finance │ │
│ │ (market_ │ ← │ Snapshots │ │ OHLCV backfill │ │
│ │ data) │ │ (4hr pulls) │ │ (daily pulls) │ │
│ └─────┬─────┘ └──────────────┘ └────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────┐ │
│ │ 15 Signal Generators (SQL/Py) │ │
│ │ + Threads sentiment (optional) │ │
│ └─────────────┬───────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────┐ │
│ │ Composite scoring + regime │ │
│ │ detection + anomaly flags │ │
│ └─────────────┬───────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────┐ │
│ │ ONE Claude call → allocation │ │
│ │ memo with portfolio weights │ │
│ └─────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
Data layer
| Component | Detail |
|---|---|
| Database | Postgres in Docker, market_data schema |
| Finviz snapshots | 639 tickers across 22 portfolios, pulled every 4 hours via scheduler.py |
| Yahoo OHLCV | 5-year backfill on first run, daily appends after that |
| Pipeline log | data_pipeline_log table tracks freshness — the skill checks this before running |
| Threads sentiment | Optional threads_sentiment table from Threads API keyword search |
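The freshness gate the skill runs before anything else can be sketched in a few lines. The SQL and column names (source, pulled_at) are assumptions about the data_pipeline_log schema, not the actual DDL:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical freshness query — column names are assumed, not the real schema.
FRESHNESS_SQL = """
SELECT source, MAX(pulled_at) AS last_pull
FROM market_data.data_pipeline_log
GROUP BY source;
"""

def is_stale(last_pull: datetime, now: datetime, max_age_hours: int = 24) -> bool:
    """Return True when a source's latest pull is older than the threshold."""
    return (now - last_pull) > timedelta(hours=max_age_hours)
```

If any source comes back stale, the skill warns and offers a fresh pull rather than scoring on old data.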
The 22 portfolios
Not random ticker lists. These are thematic watchlists designed to cover the full market surface:
Momentum, Deep Value, Dividend Aristocrats, Growth at Reasonable Price, Small Cap Growth, Sector Rotation (11 ETFs), ARK Innovation holdings, Most Shorted, Buffett 13F, IPO Watch, Macro Indicators, Consumer Staples Defensive, REITs, Biotech Pipeline, China ADRs, Semiconductor Supply Chain, Energy Transition, Risk Regime (VIX/TLT/HYG/GLD proxies), Dogs of the Dow, Spin-offs & Special Situations, Insider Buying, Copper/Lithium/Rare Earth miners.
639 tickers total. Each portfolio has a Finviz Elite PID for direct watchlist export.
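For orientation, a hypothetical shape for the registry in portfolios.py — the portfolio names, PIDs, and tickers below are placeholders, not the real file contents:

```python
# Illustrative registry shape (placeholder names, PIDs, and tickers).
PORTFOLIOS = {
    "momentum": {
        "pid": "p_momentum",            # Finviz Elite watchlist PID (placeholder)
        "tickers": ["NVDA", "AVGO", "LLY"],
    },
    "dogs_of_the_dow": {
        "pid": "p_dogs",
        "tickers": ["VZ", "MMM", "DOW"],
    },
}

def all_tickers(registry: dict) -> set[str]:
    """Flatten the registry into a deduplicated ticker set."""
    return {t for p in registry.values() for t in p["tickers"]}
```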
Skill 1: /market-analysis
The main event. Runs the full mechanical pipeline and produces an allocation recommendation.
What it does
1. Checks data freshness — queries data_pipeline_log. If Finviz or Yahoo data is more than 24 hours stale, it warns you and offers to run a fresh pull.
2. Detects market regime — compares risk-on equity performance against safe-haven assets, checks the volatility trend (weekly vs monthly) and credit-spread proxies (HYG vs TLT), then classifies the environment as RISK_ON / RISK_OFF / NEUTRAL / TRANSITION with a confidence score.
3. Runs 15 signal generators against the latest snapshot data:
| # | Signal | Data Source | What It Catches |
|---|---|---|---|
| 1 | Trend Alignment | SMA 20/50/200 | Multi-timeframe momentum |
| 2 | Mean Reversion | RSI + 52-week range | Oversold/overbought setups |
| 3 | Value Composite | Sector-relative P/E, P/S, P/B, PEG | Cheap vs expensive relative to peers |
| 4 | Quality Screen | Margins + returns + balance sheet | Companies that won’t blow up |
| 5 | Growth Trajectory | EPS acceleration | Earnings momentum |
| 6 | Insider + Institutional Flow | Insider/institutional ownership changes | Smart money movement |
| 7 | Short Squeeze Setup | Short float + days to cover | Crowded shorts |
| 8 | Volatility Regime | Ticker-level vol vs historical | Calm vs chaotic |
| 9 | Analyst Consensus Divergence | Target price vs current | Wall Street disagreement |
| 10 | Relative Strength | Within-portfolio ranking | Best-in-class picks |
| 11 | Dividend Safety | Payout ratio + yield vs history | Income sustainability |
| 12 | Price-Volume Divergence | Yahoo OHLCV | Volume confirming/denying price moves |
| 13 | Beta-Adjusted Performance | Returns adjusted for market risk | Alpha extraction |
| 14 | Earnings Surprise Momentum | Recent EPS beats/misses | Post-earnings drift |
| 15 | Crowding/Concentration Risk | Institutional overlap + position sizing | Herding risk |
If Threads sentiment data exists in the database, it adds Signal 16: Social Sentiment using real keyword search data from the Threads API — not LLM-guessed vibes, actual post counts and engagement metrics.
4. Composite scoring — applies portfolio-type weight adjustments (momentum portfolios weight trend signals higher, deep value portfolios weight mean reversion higher) plus regime-adjusted weights. Each ticker gets a score from -100 to +100.
5. Portfolio rotation — ranks all 22 portfolios by composite momentum + breadth + quality, then determines allocation weights based on regime (more cash in RISK_OFF, more equity in RISK_ON).
6. Anomaly detection — flags statistical outliers (>3 sigma), signal conflicts (fundamentals say buy, technicals say sell), rapid changes from the previous snapshot, and portfolio dispersion breakdowns.
7. One LLM call — feeds all mechanical outputs to Claude for the final allocation memo: regime assessment, portfolio weights, top individual positions with rationale, risk warnings, anomaly notes, rebalance triggers.
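To make the "rule-based" claim concrete, here is a sketch of how Signal 1 (Trend Alignment) might score a ticker. The +1/-1 rubric is my assumption; the actual queries live in docs/signal-generators.md:

```python
def trend_alignment(price: float, sma20: float, sma50: float, sma200: float) -> int:
    """Score multi-timeframe trend: +1 per bullish SMA relationship, -1 per
    bearish one. Full alignment (price > SMA20 > SMA50 > SMA200) scores +3.
    Rubric is illustrative, not the repo's actual formula."""
    score = 0
    score += 1 if price > sma20 else -1
    score += 1 if sma20 > sma50 else -1
    score += 1 if sma50 > sma200 else -1
    return score
```

Every one of the 15 generators reduces to comparisons like this over snapshot data — no model call anywhere in the loop.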
Usage
/market-analysis # Full analysis, all 22 portfolios
/market-analysis AAPL,MSFT,NVDA # Focus on specific tickers
/market-analysis --portfolio momentum # Single portfolio deep-dive
Output
Structured allocation recommendation with:
- Composite scores for top 20 bullish + top 20 bearish tickers
- Portfolio rotation weights (which of the 22 portfolios to overweight/underweight)
- Position sizing (max 5% per ticker, max 25% per theme)
- Cash allocation based on regime
- Risk warnings and anomaly flags
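The position-sizing caps (max 5% per ticker, max 25% per theme) reduce to a clip-and-rescale pass. A minimal sketch, assuming a flat dict of raw weights and a ticker-to-theme map (both names hypothetical):

```python
def cap_weights(raw: dict[str, float], theme: dict[str, str],
                max_ticker: float = 0.05, max_theme: float = 0.25) -> dict[str, float]:
    """Clip each ticker at max_ticker, then scale down any theme whose
    combined weight exceeds max_theme."""
    w = {t: min(x, max_ticker) for t, x in raw.items()}
    theme_total: dict[str, float] = {}
    for t, x in w.items():
        theme_total[theme[t]] = theme_total.get(theme[t], 0.0) + x
    for th, total in theme_total.items():
        if total > max_theme:
            scale = max_theme / total
            for t in w:
                if theme[t] == th:
                    w[t] *= scale
    return w
```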
Cost
~$0.02–0.05 per run. One Claude call at the end. Compare that to $0.82 for the original 18-agent approach. You could run this 16 times for the cost of one run of the original.
Skill 2: /validate-portfolios
The hygiene skill. Catches bad data before it corrupts your analysis.
The problem it solves
The 22 portfolios contain 639 tickers. Some of those lists were drafted with LLM assistance, and a model's training data can be 6–18 months stale. Companies get acquired. Tickers get delisted. Symbols change. Crypto tokens sneak in where US stocks should be. If your signal generators are running math on a ticker that doesn't exist anymore, your composite scores are garbage.
What it does
1. Loads the portfolio registry from src/tools/finviz/portfolios.py — 22 portfolios, 639 tickers, grouped by theme.
2. Spawns 22 parallel Haiku agents — one per portfolio, all running simultaneously with zero shared context. Each agent independently validates its portfolio's tickers for:
- Ticker validity — Is this a real, currently trading US-listed stock/ETF? Flag delisted, bankrupt, acquired, taken-private, renamed symbols.
- Thematic accuracy — Does AAVE (a DeFi token) belong in a US equities portfolio? No.
- Completeness — For rules-based lists (Dogs of the Dow, Buffett 13F), are the actual current constituents correct?
- Date awareness — Training data defaults to 2025. We’re in 2026. Mergers, spinoffs, and ticker changes that happened in 2025–2026 need to be caught.
3. Compiles results into a unified report:
- CRITICAL FIXES — delisted/invalid tickers that must go
- SYMBOL FIXES — wrong ticker symbols (BRK.B vs BRK-B)
- DUPLICATES — same ticker in multiple groups within one portfolio
- MISCLASSIFICATIONS — tickers in the wrong thematic group
- SUGGESTED ADDITIONS — obvious gaps
4. Offers to auto-apply — if you approve, it updates portfolios.py directly.
5. Optional Finviz API validation — runs multi_quote() against the corrected portfolios to confirm every ticker returns data from Finviz Elite. Catches anything the agents missed.
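The Finviz cross-check boils down to a set difference: which requested tickers came back with no row. A sketch assuming multi_quote() returns a mapping keyed by ticker symbol (the return shape is my assumption):

```python
def missing_tickers(requested: list[str], returned: dict) -> set[str]:
    """Tickers the validation pass asked for but got no Finviz row for —
    assumes the quote call yields a mapping keyed by ticker symbol."""
    return set(requested) - set(returned)
```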
Usage
/validate-portfolios # Validate all 22 portfolios
/validate-portfolios most_shorted # Validate one specific portfolio
Cost
~$0.44 per full run (22 Haiku agents at ~$0.02 each). Takes 2–3 minutes since they all run in parallel. Cheap insurance against stale data corrupting 15 signal generators.
The Threads Sentiment Integration
This is the part I’m most pleased with.
The original repo has a sentiment_agent that asks Claude to guess what social media sentiment looks like for a given ticker. It’s literally just the LLM imagining what people might be saying. That’s not sentiment analysis — that’s creative writing.
The replacement uses the Threads API to run actual keyword searches for ticker symbols and company names. Real post counts. Real engagement metrics. Real text that real humans wrote on a real social network. The data goes into a threads_sentiment table in Postgres, and Signal 16 reads from it during the pipeline.
Is Threads the best source of financial sentiment? No, that would be Twitter/X or StockTwits. But Threads is what I have API access to, and real data from an imperfect source beats imagined data from a perfect one every time.
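A minimal sketch of how Signal 16 might turn raw Threads counts into a score. The log-odds rubric and the min_volume guard are my assumptions, not the repo's actual formula:

```python
import math

def sentiment_signal(pos_posts: int, neg_posts: int, min_volume: int = 10):
    """Hypothetical scoring: log-odds of positive vs negative post counts,
    with add-one smoothing; returns None when volume is too thin to trust."""
    total = pos_posts + neg_posts
    if total < min_volume:
        return None          # not enough real posts to score this ticker
    return math.log((pos_posts + 1) / (neg_posts + 1))
```

The thin-volume guard matters on Threads: many tickers get only a handful of mentions, and scoring those would reintroduce exactly the noise the mechanical pipeline is meant to remove.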
Why This Matters (The Philosophy Bit)
The original ai-hedge-fund is a showcase for LangGraph multi-agent orchestration. It’s architecturally interesting. But as an actual trading analysis tool, it has a fundamental problem: the LLM is doing work that math should do.
Computing whether RSI is below 30 doesn’t require intelligence. Comparing P/E ratios to sector averages doesn’t require intelligence. Checking if a stock is above its 200-day SMA doesn’t require intelligence. These are table lookups and arithmetic.
What does require intelligence is synthesizing 15 different signals into a coherent narrative with position sizing and risk management. That’s where the LLM earns its $0.03.
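The point is easy to make concrete: each of those checks is a single comparison.

```python
def is_oversold(rsi: float, threshold: float = 30.0) -> bool:
    """'RSI below 30' is one comparison, not a judgment call."""
    return rsi < threshold

def above_long_term_trend(price: float, sma200: float) -> bool:
    """'Above the 200-day SMA' is another."""
    return price > sma200
```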
The mechanical approach is:
- Deterministic — same data, same scores, every time
- Auditable — every signal has a SQL query you can inspect
- Cheap — $0.05 vs $0.82 per analysis
- Fast — SQL queries over pre-loaded Postgres, not 18 sequential API calls
- Transparent — you can see exactly why a ticker scored +73 or -41
The LLM-everything approach is:
- Non-deterministic — different analysis each run
- Opaque — “Buffett agent says buy” is not auditable
- Expensive — 18 API calls per run
- Slow — sequential agent chain with retry logic
Both are valid engineering. But if you’re actually going to look at the numbers and make decisions, you want the one where the numbers mean something.
Running It Yourself
Prerequisites
# Docker for Postgres
docker compose up -d
# Python deps
poetry install
# Seed the database
poetry run python src/data/scheduler.py --seed
# First data pull (Finviz + Yahoo backfill)
poetry run python src/data/scheduler.py --once finviz
poetry run python src/data/scheduler.py --once backfill
Environment variables
# .env in project root
FINVIZ_ELITE_AUTH=your_finviz_export_token # CSV export only, not sensitive
DATABASE_URL=postgresql://user:pass@localhost:5432/market_data
ANTHROPIC_API_KEY=sk-ant-... # Only needed for the final LLM call
Ongoing data pipeline
# Run the scheduler as a daemon (pulls Finviz every 4hrs, Yahoo daily)
poetry run python src/data/scheduler.py
# Or one-off pulls
poetry run python src/data/scheduler.py --once finviz
poetry run python src/data/scheduler.py --once yahoo
Then just /market-analysis in Claude Code whenever you want a fresh read.
File Map
| Path | Purpose |
|---|---|
| .claude/commands/market-analysis.md | The /market-analysis skill definition |
| .claude/commands/validate-portfolios.md | The /validate-portfolios skill definition |
| docs/signal-generators.md | Full documentation of all 15 signal generators with SQL sketches |
| src/data/scheduler.py | Data pipeline scheduler (Finviz + Yahoo pulls) |
| src/tools/finviz/client.py | Finviz Elite HTTP client with rate limiting |
| src/tools/finviz/portfolios.py | 22 portfolio definitions, 639 tickers |
| src/tools/finviz/filter_registry.json | 105 filter categories, 3390 values |
| src/agents/ | Original LLM agents (mechanical ones still useful as reference) |
| src/graph/state.py | LangGraph AgentState definition |
Cost Comparison
| Approach | Per Run | What You Get |
|---|---|---|
| Original 18-agent | ~$0.82 | 18 LLM opinions, non-deterministic |
| Mechanical + 1 LLM call | ~$0.02–0.05 | 15 deterministic signals + 1 synthesis |
| Portfolio validation | ~$0.44 | 22 parallel audits of 639 tickers |
| Full pipeline (validate + analyze) | ~$0.49 | Clean data + scored analysis |
You could run /market-analysis every day for a month and still spend less than two runs of the original 18-agent pipeline.