AI Hedge Fund Skills: Mechanical Market Analysis on a Budget
Last updated: 2026-03-21
The Problem with LLM Hedge Funds
The original ai-hedge-fund repo is a clever idea: 18 LLM agents role-playing as different analysts (Buffett, Munger, Cathie Wood, etc.) that each analyze stocks and produce trading signals. A risk manager aggregates them. A portfolio manager makes the final call.
It sounds cool. In practice, it’s $0.82 per run and most of the “analysis” is just Claude wearing different hats. The persona agents (Buffett, Munger, Dalio) are literally the same model with different system prompts. They’re not accessing different data or running different math. They’re vibes.
So I gutted it.
The mechanical pipeline keeps the 6 agents that actually compute things (technical, fundamentals, growth, sentiment, valuation, news sentiment) and replaces even those with rule-based SQL queries. The LLM gets invoked exactly once at the end to write a human-readable allocation memo. Everything else is deterministic.
Architecture
┌─────────────────────────────────────────────────────────┐
│ Local Docker │
│ ┌───────────┐ ┌──────────────┐ ┌────────────────┐ │
│ │ Postgres │ │ Finviz Elite │ │ Yahoo Finance │ │
│ │ (market_ │ ← │ Snapshots │ │ OHLCV backfill │ │
│ │ data) │ │ (4hr pulls) │ │ (daily pulls) │ │
│ └─────┬─────┘ └──────────────┘ └────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────┐ │
│ │ 15 Signal Generators (SQL/Py) │ │
│ │ + Threads sentiment (optional) │ │
│ └─────────────┬───────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────┐ │
│ │ Composite scoring + regime │ │
│ │ detection + anomaly flags │ │
│ └─────────────┬───────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────┐ │
│ │ ONE Claude call → allocation │ │
│ │ memo with portfolio weights │ │
│ └─────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
Data layer
| Component | Detail |
|---|---|
| Database | Postgres in Docker, market_data schema |
| Finviz snapshots | 639 tickers across 22 portfolios, pulled every 4 hours via scheduler.py |
| Yahoo OHLCV | 5-year backfill on first run, daily appends after that |
| Pipeline log | data_pipeline_log table tracks freshness — the skill checks this before running |
| Threads sentiment | Optional threads_sentiment table from Threads API keyword search |
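The freshness gate the skill runs before anything else can be sketched in a few lines. The SQL and column names (source, pulled_at) are assumptions about the data_pipeline_log schema, not the actual DDL:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical freshness query — column names are assumed, not the real schema.
FRESHNESS_SQL = """
SELECT source, MAX(pulled_at) AS last_pull
FROM market_data.data_pipeline_log
GROUP BY source;
"""

def is_stale(last_pull: datetime, now: datetime, max_age_hours: int = 24) -> bool:
    """Return True when a source's latest pull is older than the threshold."""
    return (now - last_pull) > timedelta(hours=max_age_hours)
```

If any source comes back stale, the skill warns and offers a fresh pull rather than scoring on old data.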
The 22 portfolios
Not random ticker lists. These are thematic watchlists designed to cover the full market surface:
Momentum, Deep Value, Dividend Aristocrats, Growth at Reasonable Price, Small Cap Growth, Sector Rotation (11 ETFs), ARK Innovation holdings, Most Shorted, Buffett 13F, IPO Watch, Macro Indicators, Consumer Staples Defensive, REITs, Biotech Pipeline, China ADRs, Semiconductor Supply Chain, Energy Transition, Risk Regime (VIX/TLT/HYG/GLD proxies), Dogs of the Dow, Spin-offs & Special Situations, Insider Buying, Copper/Lithium/Rare Earth miners.
639 tickers total. Each portfolio has a Finviz Elite PID for direct watchlist export.
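For orientation, a hypothetical shape for the registry in portfolios.py — the portfolio names, PIDs, and tickers below are placeholders, not the real file contents:

```python
# Illustrative registry shape (placeholder names, PIDs, and tickers).
PORTFOLIOS = {
    "momentum": {
        "pid": "p_momentum",            # Finviz Elite watchlist PID (placeholder)
        "tickers": ["NVDA", "AVGO", "LLY"],
    },
    "dogs_of_the_dow": {
        "pid": "p_dogs",
        "tickers": ["VZ", "MMM", "DOW"],
    },
}

def all_tickers(registry: dict) -> set[str]:
    """Flatten the registry into a deduplicated ticker set."""
    return {t for p in registry.values() for t in p["tickers"]}
```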
Skill 1: /market-analysis
The main event. Runs the full mechanical pipeline and produces an allocation recommendation.
What it does
1. Checks data freshness — queries data_pipeline_log. If Finviz or Yahoo data is more than 24 hours stale, it warns you and offers to run a fresh pull.
2. Detects market regime — compares risk-on equity performance against safe-haven assets, checks the volatility trend (weekly vs monthly) and credit-spread proxies (HYG vs TLT), then classifies the environment as RISK_ON / RISK_OFF / NEUTRAL / TRANSITION with a confidence score.
3. Runs 15 signal generators against the latest snapshot data:
| # | Signal | Data Source | What It Catches |
|---|---|---|---|
| 1 | Trend Alignment | SMA 20/50/200 | Multi-timeframe momentum |
| 2 | Mean Reversion | RSI + 52-week range | Oversold/overbought setups |
| 3 | Value Composite | Sector-relative P/E, P/S, P/B, PEG | Cheap vs expensive relative to peers |
| 4 | Quality Screen | Margins + returns + balance sheet | Companies that won’t blow up |
| 5 | Growth Trajectory | EPS acceleration | Earnings momentum |
| 6 | Insider + Institutional Flow | Insider/institutional ownership changes | Smart money movement |
| 7 | Short Squeeze Setup | Short float + days to cover | Crowded shorts |
| 8 | Volatility Regime | Ticker-level vol vs historical | Calm vs chaotic |
| 9 | Analyst Consensus Divergence | Target price vs current | Wall Street disagreement |
| 10 | Relative Strength | Within-portfolio ranking | Best-in-class picks |
| 11 | Dividend Safety | Payout ratio + yield vs history | Income sustainability |
| 12 | Price-Volume Divergence | Yahoo OHLCV | Volume confirming/denying price moves |
| 13 | Beta-Adjusted Performance | Returns adjusted for market risk | Alpha extraction |
| 14 | Earnings Surprise Momentum | Recent EPS beats/misses | Post-earnings drift |
| 15 | Crowding/Concentration Risk | Institutional overlap + position sizing | Herding risk |
If Threads sentiment data exists in the database, it adds Signal 16: Social Sentiment using real keyword search data from the Threads API — not LLM-guessed vibes, actual post counts and engagement metrics.
4. Composite scoring — applies portfolio-type weight adjustments (momentum portfolios weight trend signals higher, deep value portfolios weight mean reversion higher) plus regime-adjusted weights. Each ticker gets a score from -100 to +100.
5. Portfolio rotation — ranks all 22 portfolios by composite momentum + breadth + quality, then determines allocation weights based on regime (more cash in RISK_OFF, more equity in RISK_ON).
6. Anomaly detection — flags statistical outliers (>3 sigma), signal conflicts (fundamentals say buy, technicals say sell), rapid changes from the previous snapshot, and portfolio dispersion breakdowns.
7. One LLM call — feeds all mechanical outputs to Claude for the final allocation memo: regime assessment, portfolio weights, top individual positions with rationale, risk warnings, anomaly notes, rebalance triggers.
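To make the "rule-based" claim concrete, here is a sketch of how Signal 1 (Trend Alignment) might score a ticker. The +1/-1 rubric is my assumption; the actual queries live in docs/signal-generators.md:

```python
def trend_alignment(price: float, sma20: float, sma50: float, sma200: float) -> int:
    """Score multi-timeframe trend: +1 per bullish SMA relationship, -1 per
    bearish one. Full alignment (price > SMA20 > SMA50 > SMA200) scores +3.
    Rubric is illustrative, not the repo's actual formula."""
    score = 0
    score += 1 if price > sma20 else -1
    score += 1 if sma20 > sma50 else -1
    score += 1 if sma50 > sma200 else -1
    return score
```

Every one of the 15 generators reduces to comparisons like this over snapshot data — no model call anywhere in the loop.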
Usage
/market-analysis # Full analysis, all 22 portfolios
/market-analysis AAPL,MSFT,NVDA # Focus on specific tickers
/market-analysis --portfolio momentum # Single portfolio deep-dive
Output
Structured allocation recommendation with:
- Composite scores for top 20 bullish + top 20 bearish tickers
- Portfolio rotation weights (which of the 22 portfolios to overweight/underweight)
- Position sizing (max 5% per ticker, max 25% per theme)
- Cash allocation based on regime
- Risk warnings and anomaly flags
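The position-sizing caps (max 5% per ticker, max 25% per theme) reduce to a clip-and-rescale pass. A minimal sketch, assuming a flat dict of raw weights and a ticker-to-theme map (both names hypothetical):

```python
def cap_weights(raw: dict[str, float], theme: dict[str, str],
                max_ticker: float = 0.05, max_theme: float = 0.25) -> dict[str, float]:
    """Clip each ticker at max_ticker, then scale down any theme whose
    combined weight exceeds max_theme."""
    w = {t: min(x, max_ticker) for t, x in raw.items()}
    theme_total: dict[str, float] = {}
    for t, x in w.items():
        theme_total[theme[t]] = theme_total.get(theme[t], 0.0) + x
    for th, total in theme_total.items():
        if total > max_theme:
            scale = max_theme / total
            for t in w:
                if theme[t] == th:
                    w[t] *= scale
    return w
```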
Cost
~$0.02–0.05 per run. One Claude call at the end. Compare that to $0.82 for the original 18-agent approach. You could run this 16 times for the cost of one run of the original.
Skill 2: /validate-portfolios
The hygiene skill. Catches bad data before it corrupts your analysis.
The problem it solves
The 22 portfolios contain 639 tickers. Some of those lists were drafted with LLM assistance, and a model's training data can be 6–18 months stale. Companies get acquired. Tickers get delisted. Symbols change. Crypto tokens sneak in where US stocks should be. If your signal generators are running math on a ticker that doesn't exist anymore, your composite scores are garbage.
What it does
1. Loads the portfolio registry from src/tools/finviz/portfolios.py — 22 portfolios, 639 tickers, grouped by theme.
2. Spawns 22 parallel Haiku agents — one per portfolio, all running simultaneously with zero shared context. Each agent independently validates its portfolio's tickers for:
- Ticker validity — Is this a real, currently trading US-listed stock/ETF? Flag delisted, bankrupt, acquired, taken-private, renamed symbols.
- Thematic accuracy — Does AAVE (a DeFi token) belong in a US equities portfolio? No.
- Completeness — For rules-based lists (Dogs of the Dow, Buffett 13F), are the actual current constituents correct?
- Date awareness — Training data defaults to 2025. We’re in 2026. Mergers, spinoffs, and ticker changes that happened in 2025–2026 need to be caught.
3. Compiles results into a unified report:
- CRITICAL FIXES — delisted/invalid tickers that must go
- SYMBOL FIXES — wrong ticker symbols (BRK.B vs BRK-B)
- DUPLICATES — same ticker in multiple groups within one portfolio
- MISCLASSIFICATIONS — tickers in the wrong thematic group
- SUGGESTED ADDITIONS — obvious gaps
4. Offers to auto-apply — if you approve, it updates portfolios.py directly.
5. Optional Finviz API validation — runs multi_quote() against the corrected portfolios to confirm every ticker returns data from Finviz Elite. Catches anything the agents missed.
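The Finviz cross-check boils down to a set difference: which requested tickers came back with no row. A sketch assuming multi_quote() returns a mapping keyed by ticker symbol (the return shape is my assumption):

```python
def missing_tickers(requested: list[str], returned: dict) -> set[str]:
    """Tickers the validation pass asked for but got no Finviz row for —
    assumes the quote call yields a mapping keyed by ticker symbol."""
    return set(requested) - set(returned)
```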
Usage
/validate-portfolios # Validate all 22 portfolios
/validate-portfolios most_shorted # Validate one specific portfolio
Cost
~$0.44 per full run (22 Haiku agents at ~$0.02 each). Takes 2–3 minutes since they all run in parallel. Cheap insurance against stale data corrupting 15 signal generators.
The Threads Sentiment Integration
This is the part I’m most pleased with.
The original repo has a sentiment_agent that asks Claude to guess what social media sentiment looks like for a given ticker. It’s literally just the LLM imagining what people might be saying. That’s not sentiment analysis — that’s creative writing.
The replacement uses the Threads API to run actual keyword searches for ticker symbols and company names. Real post counts. Real engagement metrics. Real text that real humans wrote on a real social network. The data goes into a threads_sentiment table in Postgres, and Signal 16 reads from it during the pipeline.
Is Threads the best source of financial sentiment? No, that would be Twitter/X or StockTwits. But Threads is what I have API access to, and real data from an imperfect source beats imagined data from a perfect one every time.
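A minimal sketch of how Signal 16 might turn raw Threads counts into a score. The log-odds rubric and the min_volume guard are my assumptions, not the repo's actual formula:

```python
import math

def sentiment_signal(pos_posts: int, neg_posts: int, min_volume: int = 10):
    """Hypothetical scoring: log-odds of positive vs negative post counts,
    with add-one smoothing; returns None when volume is too thin to trust."""
    total = pos_posts + neg_posts
    if total < min_volume:
        return None          # not enough real posts to score this ticker
    return math.log((pos_posts + 1) / (neg_posts + 1))
```

The thin-volume guard matters on Threads: many tickers get only a handful of mentions, and scoring those would reintroduce exactly the noise the mechanical pipeline is meant to remove.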
Why This Matters (The Philosophy Bit)
The original ai-hedge-fund is a showcase for LangGraph multi-agent orchestration. It’s architecturally interesting. But as an actual trading analysis tool, it has a fundamental problem: the LLM is doing work that math should do.
Computing whether RSI is below 30 doesn’t require intelligence. Comparing P/E ratios to sector averages doesn’t require intelligence. Checking if a stock is above its 200-day SMA doesn’t require intelligence. These are table lookups and arithmetic.
What does require intelligence is synthesizing 15 different signals into a coherent narrative with position sizing and risk management. That’s where the LLM earns its $0.03.
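The point is easy to make concrete: each of those checks is a single comparison.

```python
def is_oversold(rsi: float, threshold: float = 30.0) -> bool:
    """'RSI below 30' is one comparison, not a judgment call."""
    return rsi < threshold

def above_long_term_trend(price: float, sma200: float) -> bool:
    """'Above the 200-day SMA' is another."""
    return price > sma200
```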
The mechanical approach is:
- Deterministic — same data, same scores, every time
- Auditable — every signal has a SQL query you can inspect
- Cheap — $0.05 vs $0.82 per analysis
- Fast — SQL queries over pre-loaded Postgres, not 18 sequential API calls
- Transparent — you can see exactly why a ticker scored +73 or -41
The LLM-everything approach is:
- Non-deterministic — different analysis each run
- Opaque — “Buffett agent says buy” is not auditable
- Expensive — 18 API calls per run
- Slow — sequential agent chain with retry logic
Both are valid engineering. But if you’re actually going to look at the numbers and make decisions, you want the one where the numbers mean something.
Running It Yourself
Prerequisites
# Docker for Postgres
docker compose up -d
# Python deps
poetry install
# Seed the database
poetry run python src/data/scheduler.py --seed
# First data pull (Finviz + Yahoo backfill)
poetry run python src/data/scheduler.py --once finviz
poetry run python src/data/scheduler.py --once backfill
Environment variables
# .env in project root
FINVIZ_ELITE_AUTH=your_finviz_export_token # CSV export only, not sensitive
DATABASE_URL=postgresql://user:pass@localhost:5432/market_data
ANTHROPIC_API_KEY=sk-ant-... # Only needed for the final LLM call
Ongoing data pipeline
# Run the scheduler as a daemon (pulls Finviz every 4hrs, Yahoo daily)
poetry run python src/data/scheduler.py
# Or one-off pulls
poetry run python src/data/scheduler.py --once finviz
poetry run python src/data/scheduler.py --once yahoo
Then just /market-analysis in Claude Code whenever you want a fresh read.
File Map
| Path | Purpose |
|---|---|
| .claude/commands/market-analysis.md | The /market-analysis skill definition |
| .claude/commands/validate-portfolios.md | The /validate-portfolios skill definition |
| docs/signal-generators.md | Full documentation of all 15 signal generators with SQL sketches |
| src/data/scheduler.py | Data pipeline scheduler (Finviz + Yahoo pulls) |
| src/tools/finviz/client.py | Finviz Elite HTTP client with rate limiting |
| src/tools/finviz/portfolios.py | 22 portfolio definitions, 639 tickers |
| src/tools/finviz/filter_registry.json | 105 filter categories, 3390 values |
| src/agents/ | Original LLM agents (mechanical ones still useful as reference) |
| src/graph/state.py | LangGraph AgentState definition |
Cost Comparison
| Approach | Per Run | What You Get |
|---|---|---|
| Original 18-agent | ~$0.82 | 18 LLM opinions, non-deterministic |
| Mechanical + 1 LLM call | ~$0.02–0.05 | 15 deterministic signals + 1 synthesis |
| Portfolio validation | ~$0.44 | 22 parallel audits of 639 tickers |
| Full pipeline (validate + analyze) | ~$0.49 | Clean data + scored analysis |
You could run /market-analysis every day for a month and still spend less than two runs of the original 18-agent pipeline.