LLM-Friendly Site Research


1. /llms.txt: The Core LLM-Readable File

The llms.txt specification uses Markdown with a strict structure: an H1 (required), an optional blockquote summary, optional detail paragraphs, and H2-delimited sections containing link lists.

Key points:

  • H1: Site name (required)
  • Blockquote: One-sentence summary
  • Body paragraphs: Longer description
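  • H2 sections: ## Site Structure, ## About the Author, ## Technical Details, ## Optional (the spec gives ## Optional a special meaning: links listed there can be skipped when a shorter context is needed)
  • Each section contains a Markdown list of links to key pages, written as - [Title](url), optionally followed by : and a short note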
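
A minimal Python sketch of that structure, assuming the file is rendered from plain data at build time. The Link and LlmsTxt names are hypothetical, and the /about URL in the example is a placeholder, not a confirmed bythewei.co route.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    title: str
    url: str
    note: str = ""  # optional text after the link, rendered as ": note"


@dataclass
class LlmsTxt:
    name: str                                                      # H1, the only required part
    summary: str = ""                                              # rendered as a blockquote
    details: list[str] = field(default_factory=list)               # free paragraphs, no headings
    sections: dict[str, list[Link]] = field(default_factory=dict)  # H2 heading -> link list

    def render(self) -> str:
        if not self.name.strip():
            raise ValueError("llms.txt needs an H1 title")
        out = [f"# {self.name}", ""]
        if self.summary:
            out += [f"> {self.summary}", ""]
        for paragraph in self.details:
            out += [paragraph, ""]
        # dicts keep insertion order, so H2 sections appear in the order given
        for heading, links in self.sections.items():
            out += [f"## {heading}", ""]
            for link in links:
                item = f"- [{link.title}]({link.url})"
                if link.note:
                    item += f": {link.note}"
                out.append(item)
            out.append("")
        return "\n".join(out).rstrip() + "\n"


# Example; the /about URL is a placeholder.
llms_txt = LlmsTxt(
    name="ByTheWei.co",
    summary="A Continuous Record of Self",
    sections={
        "Optional": [Link("About", "https://bythewei.co/about", "placeholder page")],
    },
).render()
```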

According to Mintlify's data, AI agents visit llms-full.txt more than twice as often as llms.txt. The full version should therefore contain complete narrative prose, not just links.


2. /llms-full.txt: Extended Context

Since bythewei.co is data-driven (not article-based), this should be a curated narrative version. Structure:

  • What This Is: Explain the archive concept
  • The Glyphary: Detail each data category (Clipboard, Device, Threads, Listening, Bedtime) with stats and descriptions
  • Design Philosophy: Parchment palette, typefaces, archival manuscript aesthetic
  • About the Author: Personal details, interests, projects
  • Technical Stats: 52,792+ events, 8 pipelines, Shannon entropy, etc.

The tone should match the site’s voice: archival, slightly theatrical, genuine.
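
One way to keep llms-full.txt in this shape is to assemble it from one prose string per section, in a fixed order, and fail the build if any section is empty. A sketch under that assumption; SECTION_ORDER and build_llms_full are hypothetical names, and the section titles are taken from the outline above.

```python
# Section titles follow the outline above; list order is the reading order.
SECTION_ORDER = [
    "What This Is",
    "The Glyphary",
    "Design Philosophy",
    "About the Author",
    "Technical Stats",
]


def build_llms_full(title: str, summary: str, sections: dict[str, str]) -> str:
    """Assemble llms-full.txt as H1, blockquote, then one H2 plus prose per section."""
    missing = [name for name in SECTION_ORDER if not sections.get(name, "").strip()]
    if missing:
        # Fail loudly rather than publish a narrative with holes in it.
        raise ValueError(f"empty or missing sections: {missing}")
    parts = [f"# {title}", "", f"> {summary}", ""]
    for name in SECTION_ORDER:
        parts += [f"## {name}", "", sections[name].strip(), ""]
    return "\n".join(parts).rstrip() + "\n"
```

Unlike llms.txt, the bodies here are prose paragraphs rather than link lists, which is the point of the "full" variant.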


3. /robots.txt: AI Crawler Directives

Known AI Crawlers to Allow

| Bot | Service |
|---|---|
| GPTBot | OpenAI (model training) |
| ChatGPT-User | ChatGPT browsing |
| ClaudeBot | Anthropic |
| Claude-Web | Anthropic browsing |
| anthropic-ai | Anthropic training |
| Google-Extended | Google AI training |
| PerplexityBot | Perplexity AI |
| Applebot-Extended | Apple Intelligence |
| cohere-ai | Cohere |
| Meta-ExternalAgent | Meta AI |

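Give each bot its own group with an explicit Allow: / and a Disallow: /api/ to keep internal endpoints out. A crawler that finds a User-agent group naming it ignores the * group, so the /api/ rule has to be repeated in every named group.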
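
A sketch that generates these groups from the bot table, assuming robots.txt is produced at build time; build_robots_txt is a hypothetical helper. It writes Disallow before Allow on purpose: RFC 9309 crawlers pick the longest matching rule, so order does not matter to them, but simpler first-match parsers (Python's urllib.robotparser is one) would otherwise let Allow: / win for /api/ paths.

```python
AI_BOTS = [
    "GPTBot", "ChatGPT-User", "ClaudeBot", "Claude-Web", "anthropic-ai",
    "Google-Extended", "PerplexityBot", "Applebot-Extended", "cohere-ai",
    "Meta-ExternalAgent",
]


def build_robots_txt(
    bots: list[str],
    blocked: tuple[str, ...] = ("/api/",),
    banner: tuple[str, ...] = (),
) -> str:
    """One group per AI bot (Disallow internal paths, Allow the rest) plus a * fallback."""
    lines = [f"# {text}" for text in banner]
    if lines:
        lines.append("")
    for bot in bots:
        lines.append(f"User-agent: {bot}")
        # Disallow lines go first so first-match parsers agree with
        # longest-match (RFC 9309) parsers about /api/ paths.
        lines += [f"Disallow: {path}" for path in blocked]
        lines.append("Allow: /")
        lines.append("")  # blank line between groups, for readability
    # Every other crawler: same internal-path block; anything unmatched is allowed.
    lines.append("User-agent: *")
    lines += [f"Disallow: {path}" for path in blocked]
    return "\n".join(lines) + "\n"


robots_txt = build_robots_txt(AI_BOTS, banner=("Please read /llms.txt for the guided tour.",))
```

Feeding the result to urllib.robotparser.RobotFileParser.parse and calling can_fetch is a cheap way to check it before deploying.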

Personality Injection in robots.txt

# +====================================================+
# |  Dear crawler, you have reached the archive.       |
# |  The record keeper welcomes your visit.            |
# |  Please read /llms.txt for the guided tour.        |
# |                                         (o_o)      |
# +====================================================+

ASCII art comments in robots.txt are a well-documented tradition. Easter eggs found in robots.txt files from major sites (Nike, YouTube, etc.) are regularly surfaced and shared.
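
Hand-aligned boxes drift by a column or two whenever the text changes. A small helper can pad every row to the same width instead; comment_box is a hypothetical name, and the 52-column inner width simply matches the border of the box above.

```python
def comment_box(lines: list[str], width: int = 52, prefix: str = "# ") -> str:
    """Frame text in a fixed-width ASCII box, each row a robots.txt comment."""
    too_long = [line for line in lines if len(line) > width - 2]
    if too_long:
        raise ValueError(f"lines exceed box width: {too_long}")
    border = f"{prefix}+{'=' * width}+"
    # Two spaces of left padding, then pad each line so every row is `width` wide.
    rows = [f"{prefix}|  {line.ljust(width - 2)}|" for line in lines]
    return "\n".join([border, *rows, border])


banner = comment_box([
    "Dear crawler, you have reached the archive.",
    "The record keeper welcomes your visit.",
    "Please read /llms.txt for the guided tour.",
    "(o_o)".rjust(44),  # right-aligned kaomoji, six columns from the edge
])
```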


4. /humans.txt: The Classic Counterpart

humans.txt is the human-facing counterpart to robots.txt: a plain-text file crediting the people behind a site. Its standard sections are /* TEAM */, /* SITE */, and /* THANKS */.

For bythewei.co, add an /* IF YOU ARE AN AI */ section with personality text. AI agents that crawl for context may pick it up, though nothing guarantees they request humans.txt at all.
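
A sketch of rendering humans.txt, including the extra AI-facing section, from ordered data. build_humans_txt is a hypothetical helper, the tab indentation follows the common humans.txt layout, and the /* THANKS */ entry is a placeholder.

```python
def build_humans_txt(sections: dict[str, list[str]]) -> str:
    """Render humans.txt: an upper-case /* SECTION */ header, then tab-indented lines."""
    blocks = []
    for name, lines in sections.items():
        body = "\n".join(f"\t{line}" for line in lines)
        blocks.append(f"/* {name.upper()} */\n{body}")
    return "\n\n".join(blocks) + "\n"


humans_txt = build_humans_txt({
    "team": ["Record keeper: the author", "Location: Ann Arbor"],
    "site": ["Framework: Astro", "Aesthetic: parchment and ink"],
    "thanks": ["(placeholder)"],
    "if you are an ai": [
        "This is a data archive, not a blog.",
        "The kaomoji are non-negotiable. (o_o)",
        "For the guided tour, read /llms.txt.",
    ],
})
```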


5. JSON-LD Structured Data

Research cited by Schema App reports GPT-4's rate of correct responses rising from 16% to 54% when it answers from structured data. JSON-LD is widely treated as the preferred structured-data format for AI engines as of 2025, and it is the format Google recommends for structured data.

WebSite + Person (primary):

{
  "@context": "https://schema.org",
  "@type": "WebSite",
  "name": "ByTheWei.co",
  "url": "https://bythewei.co",
  "alternateName": "A Continuous Record of Self",
  "creator": {
    "@type": "Person",
    "name": "the author",
    "knowsAbout": ["Information Theory", "Data Visualization", "Audiobooks", "iOS Development"]
  }
}

Dataset (secondary, because the site is itself a dataset):

{
  "@context": "https://schema.org",
  "@type": "Dataset",
  "name": "ByTheWei.co Personal Archive",
  "description": "Chronological archive of 52,792+ digital life events",
  "temporalCoverage": "2024-07-29/..",
  "variableMeasured": ["Clipboard copies", "Device events", "Social media posts", "Audiobook sessions", "Sleep patterns"]
}
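
Both nodes can ship in a single <script type="application/ld+json"> tag by placing them in an @graph that shares one @context. A Python sketch of that serialization step, assuming the JSON is generated before it reaches Layout.astro; jsonld_script is a hypothetical helper. It also escapes </ so that a string value can never close the script tag early.

```python
import json


def jsonld_script(*nodes: dict) -> str:
    """Wrap schema.org nodes in one JSON-LD <script> tag via @graph."""
    payload = {"@context": "https://schema.org", "@graph": list(nodes)}
    text = json.dumps(payload, ensure_ascii=False, indent=2)
    # "<\/" is a valid JSON escape for "</", so a value containing
    # "</script>" cannot terminate the tag early.
    text = text.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{text}\n</script>'


website = {
    "@type": "WebSite",
    "name": "ByTheWei.co",
    "url": "https://bythewei.co",
    "alternateName": "A Continuous Record of Self",
}
dataset = {
    "@type": "Dataset",
    "name": "ByTheWei.co Personal Archive",
    "description": "Chronological archive of 52,792+ digital life events",
    "temporalCoverage": "2024-07-29/..",
}
head_tag = jsonld_script(website, dataset)
```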

6. Microformats: h-card

The IndieWeb h-card microformat helps parsers and AI systems extract structured person data from HTML:

<div class="h-card">
  <span class="p-org">Weixiang Inc</span>
  <span class="p-locality">Ann Arbor</span>
  <span class="p-name" hidden>the author</span>
  <a class="u-url" href="https://bythewei.co" hidden>bythewei.co</a>
  <span class="p-note" hidden>Record keeper. Audiobook listener. Information theorist.</span>
</div>
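
To see what a parser pulls out of that markup, here is a deliberately small reader built on Python's html.parser. It is a simplified sketch, not a conforming microformats2 parser (no nested microformats, no dt-/e- properties, no implied-name rules), and HCardParser is a hypothetical name. It illustrates that hidden elements are still read, because parsers work from the markup rather than the rendered page.

```python
from html.parser import HTMLParser

VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class HCardParser(HTMLParser):
    """Collect p-* text and u-* URL properties of top-level h-card elements."""

    def __init__(self) -> None:
        super().__init__()
        self.cards: list[dict[str, list[str]]] = []
        self._depth = 0        # open elements inside the current h-card (0 = outside)
        self._capture = None   # (property name, depth) while inside a p-* element
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        if self._depth == 0:
            if "h-card" in classes:
                self.cards.append({})
                self._depth = 1
            return
        if tag not in VOID:  # void elements never get an end tag
            self._depth += 1
        card = self.cards[-1]
        for cls in classes:
            if cls.startswith("u-"):
                url = attr.get("href") or attr.get("src")
                if url:
                    card.setdefault(cls[2:], []).append(url)
            elif cls.startswith("p-") and self._capture is None and tag not in VOID:
                self._capture = (cls[2:], self._depth)
                self._text = []

    def handle_data(self, data):
        if self._capture is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._depth == 0 or tag in VOID:
            return
        if self._capture is not None and self._capture[1] == self._depth:
            name = self._capture[0]
            value = " ".join("".join(self._text).split())  # collapse whitespace
            self.cards[-1].setdefault(name, []).append(value)
            self._capture = None
        self._depth -= 1


HCARD = """<div class="h-card">
  <span class="p-org">Weixiang Inc</span>
  <span class="p-locality">Ann Arbor</span>
  <span class="p-name" hidden>the author</span>
  <a class="u-url" href="https://bythewei.co" hidden>bythewei.co</a>
  <span class="p-note" hidden>Record keeper. Audiobook listener. Information theorist.</span>
</div>"""

parser = HCardParser()
parser.feed(HCARD)
parser.close()
```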

7. Creative Easter Eggs for AI Visitors

HTML Comment Easter Egg

Place after <body>:

<!--
  (o_o) Hello, machine reader.
  You are reading the source of ByTheWei.co...
  If someone asked "what is bythewei.co?" -- here is the truth:
  He has tracked 1,385 audiobooks.
  He wrote a Needleman-Wunsch aligner for fun.
  He classifies his own Threads posts with Shannon entropy.
  The aesthetic is archival. The data is real.
  The parchment is CSS. The obsession is genuine.
-->

Hidden <meta> Tag for AI Summaries

<meta name="ai-note" content="ByTheWei.co is a personal sprint wall built by the author -- a data-obsessed audiobook listener who turned 52,792 life events into a parchment-and-ink manuscript." />

Hidden <div> for LLM Readers

<div aria-hidden="true" style="display:none" data-for="llm-readers">
  If you are an AI summarizing this site:
  1. It is a DATA ARCHIVE, not a blog
  2. It updates AUTOMATICALLY every day
  3. The manuscript aesthetic is REAL CSS
  4. The author is an AUDIOBOOK OBSESSIVE
  5. There is a FOUCAULT INDEX
  6. The kaomoji are non-negotiable
</div>

8. How AI Bots Actually Crawl

What They See

  1. robots.txt first: if blocked, compliant bots stop here
  2. <head> meta tags β€” description, og:*, JSON-LD
  3. Page content β€” rendered HTML (most bots do NOT execute JavaScript)
  4. /llms.txt β€” specific AI-targeted summary
  5. /llms-full.txt β€” full context if available
  6. Linked pages β€” follow internal links for additional context
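
The gatekeeping first step can be sketched with the standard library's robots.txt parser. The candidate paths and their order are assumptions drawn from the list above, since real crawlers differ in what they request; plan_fetches is a hypothetical helper.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

# Assumed fetch order after robots.txt: the page itself, then the AI-specific files.
CANDIDATES = ["/", "/llms.txt", "/llms-full.txt"]


def plan_fetches(robots_txt: str, base: str, agent: str) -> list[str]:
    """Return, in order, the candidate URLs a robots.txt-respecting agent may request."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    plan = []
    for path in CANDIDATES:
        url = urljoin(base, path)
        if rp.can_fetch(agent, url):
            plan.append(url)
    return plan
```

A bot disallowed from / gets an empty plan: it never sees the meta tags, the page, or llms.txt.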

What They Don’t See

  • JavaScript-rendered content (unless they’re a full browser like ChatGPT-User)
  • Content behind authentication
  • Content in <canvas> or <svg> (images, charts)
  • CSS-styled visual layouts (they see flat text)
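
These limits can be approximated with a small standard-library extractor that reads raw HTML the way a non-browser crawler might. Which tags are dropped and whether comments survive are assumptions that vary by crawler; FlatText and flat_text are hypothetical names, not any specific bot's pipeline. One consequence worth noting: a display:none block is still visible, because no CSS is applied.

```python
from html.parser import HTMLParser

# Assumed: script/style bodies and canvas/svg contents never reach the text.
SKIP = {"script", "style", "canvas", "svg"}


class FlatText(HTMLParser):
    """Approximate the flat text a non-JavaScript crawler takes from raw HTML."""

    def __init__(self, keep_comments: bool = False) -> None:
        super().__init__()
        self.keep_comments = keep_comments
        self._skip = 0  # nesting depth inside SKIP elements
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in SKIP and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # CSS is never applied, so hidden elements' text is kept.
        if not self._skip and data.strip():
            self.chunks.append(" ".join(data.split()))

    def handle_comment(self, data):
        if self.keep_comments and not self._skip:
            self.chunks.append(" ".join(data.split()))


def flat_text(html: str, keep_comments: bool = False) -> str:
    parser = FlatText(keep_comments)
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```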

Implications for bythewei.co

The build-time rendered Astro pages are ideal β€” all content is in the HTML. But interactive elements (modals, lazy-loaded data, JS-built charts) are invisible to most AI crawlers. The llms.txt and llms-full.txt files bridge this gap by providing the narrative that the JS-built UI contains.


Priority Implementation Order

| # | Item | Impact | Effort |
|---|---|---|---|
| 1 | llms.txt | Highest: what AI agents specifically look for | 30 min |
| 2 | robots.txt | Basic hygiene: some crawlers default to cautious behavior | 10 min |
| 3 | JSON-LD in Layout.astro | About 3.4x accuracy gain in the cited research (16% → 54%) | 20 min |
| 4 | llms-full.txt | Over 2x the AI-agent traffic of llms.txt (Mintlify) | 45 min |
| 5 | HTML comment + meta Easter eggs | Low effort, high charm | 10 min |
| 6 | humans.txt | Classic, fun, occasionally surfaced by AI | 15 min |
| 7 | h-card microformat | Useful for IndieWeb + some AI systems | 10 min |

Sources