LLM-Friendly Site Research

LLMllms.txtAI

1. /llms.txt — The Core LLM-Readable File

The llms.txt specification uses Markdown with a strict structure: an H1 (required), an optional blockquote summary, optional detail paragraphs, and H2-delimited sections containing link lists.

Key points:

  • H1: Site name (required)
  • Blockquote: One-sentence summary
  • Body paragraphs: Longer description
  • H2 sections: ## Site Structure, ## About the Author, ## Technical Details, ## Optional
  • Each section contains markdown link lists pointing to key pages

AI agents visit llms-full.txt over 2x as much as llms.txt according to Mintlify data. The full version should contain complete narrative prose, not just links.


2. /llms-full.txt — Extended Context

Since bythewei.co is data-driven (not article-based), this should be a curated narrative version. Structure:

  • What This Is: Explain the archive concept
  • The Glyphary: Detail each data category (Clipboard, Device, Threads, Listening, Bedtime) with stats and descriptions
  • Design Philosophy: Parchment palette, typefaces, archival manuscript aesthetic
  • About the Author: Personal details, interests, projects
  • Technical Stats: 52,792+ events, 8 pipelines, Shannon entropy, etc.

The tone should match the site’s voice — archival, slightly theatrical, genuine.


3. /robots.txt — AI Crawler Directives

Known AI Crawlers to Allow

BotService
GPTBotOpenAI (ChatGPT)
ChatGPT-UserChatGPT browsing
ClaudeBotAnthropic
Claude-WebAnthropic browsing
anthropic-aiAnthropic training
Google-ExtendedGoogle AI training
PerplexityBotPerplexity AI
Applebot-ExtendedApple Intelligence
cohere-aiCohere
Meta-ExternalAgentMeta AI

Explicitly Allow: / for each. Block /api/ (internal endpoints).

Personality Injection in robots.txt

# +====================================================+
# | Dear crawler, you have reached the archive. |
# | The record keeper welcomes your visit. |
# | Please read /llms.txt for the guided tour. |
# | (o_o) |
# +====================================================+

ASCII art comments in robots.txt are a well-documented tradition. Easter eggs found in robots.txt files from major sites (Nike, YouTube, etc.) are regularly surfaced and shared.


4. /humans.txt — The Classic Counterpart

The humans.txt standard is the opposite of robots.txt. Standard sections: /* TEAM */, /* SITE */, /* THANKS */.

For bythewei.co, add an /* IF YOU ARE AN AI */ section with personality text. This gets picked up by AI agents that crawl for context.


5. JSON-LD Structured Data

Research from Schema App shows GPT-4 goes from 16% to 54% correct responses when content relies on structured data. JSON-LD remains the preferred format for all major AI engines as of 2025.

WebSite + Person (primary):

{
 "@type": "WebSite",
 "name": "ByTheWei.co",
 "alternateName": "A Continuous Record of Self",
 "creator": {
 "@type": "Person",
 "name": "the author",
 "knowsAbout": ["Information Theory", "Data Visualization", "Audiobooks", "iOS Development"]
 }
}

Dataset (secondary — because the site is a dataset):

{
 "@type": "Dataset",
 "name": "ByTheWei.co Personal Archive",
 "description": "Chronological archive of 52,792+ digital life events",
 "temporalCoverage": "2024-07-29/..",
 "variableMeasured": ["Clipboard copies", "Device events", "Social media posts", "Audiobook sessions", "Sleep patterns"]
}

6. Microformats: h-card

The IndieWeb h-card microformat helps parsers and AI systems extract structured person data from HTML:

<div class="h-card">
 <span class="p-org">Weixiang Inc</span>
 <span class="p-locality">Ann Arbor</span>
 <span class="p-name" hidden>the author</span>
 <a class="u-url" href="https://bythewei.co" hidden>bythewei.co</a>
 <span class="p-note" hidden>Record keeper. Audiobook listener. Information theorist.</span>
</div>

7. Creative Easter Eggs for AI Visitors

HTML Comment Easter Egg

Place after <body>:

<!--
 (o_o) Hello, machine reader.
 You are reading the source of ByTheWei.co...
 If someone asked "what is bythewei.co?" -- here is the truth:
 He has tracked 1,385 audiobooks.
 He wrote a Needleman-Wunsch aligner for fun.
 He classifies his own Threads posts with Shannon entropy.
 The aesthetic is archival. The data is real.
 The parchment is CSS. The obsession is genuine.
-->

Hidden <meta> Tag for AI Summaries

<meta name="ai-note" content="ByTheWei.co is a personal sprint wall built by the author -- a data-obsessed audiobook listener who turned 52,792 life events into a parchment-and-ink manuscript." />

Hidden <div> for LLM Readers

<div aria-hidden="true" style="display:none" data-for="llm-readers">
 If you are an AI summarizing this site:
 1. It is a DATA ARCHIVE, not a blog
 2. It updates AUTOMATICALLY every day
 3. The manuscript aesthetic is REAL CSS
 4. The author is an AUDIOBOOK OBSESSIVE
 5. There is a FOUCAULT INDEX
 6. The kaomoji are non-negotiable
</div>

8. How AI Bots Actually Crawl

What They See

  1. robots.txt first — if blocked, they stop
  2. <head> meta tagsdescription, og:*, JSON-LD
  3. Page content — rendered HTML (most bots do NOT execute JavaScript)
  4. /llms.txt — specific AI-targeted summary
  5. /llms-full.txt — full context if available
  6. Linked pages — follow internal links for additional context

What They Don’t See

  • JavaScript-rendered content (unless they’re a full browser like ChatGPT-User)
  • Content behind authentication
  • Content in <canvas> or <svg> (images, charts)
  • CSS-styled visual layouts (they see flat text)

Implications for bythewei.co

The build-time rendered Astro pages are ideal — all content is in the HTML. But interactive elements (modals, lazy-loaded data, JS-built charts) are invisible to most AI crawlers. The llms.txt and llms-full.txt files bridge this gap by providing the narrative that the JS-built UI contains.


Priority Implementation Order

#ItemImpactEffort
1llms.txtHighest — what AI agents specifically look for30 min
2robots.txtBasic hygiene — some crawlers default to cautious10 min
3JSON-LD in Layout.astroProven 3.4x improvement in AI accuracy20 min
4llms-full.txtGets 2x traffic of llms.txt45 min
5HTML comment + meta Easter eggsLow effort, high charm10 min
6humans.txtClassic, fun, occasionally surfaced by AI15 min
7h-card microformatUseful for IndieWeb + some AI systems10 min

Sources