LLM-Friendly Site Research
1. /llms.txt β The Core LLM-Readable File
The llms.txt specification uses Markdown with a strict structure: an H1 (required), an optional blockquote summary, optional detail paragraphs, and H2-delimited sections containing link lists.
Key points:
- H1: Site name (required)
- Blockquote: One-sentence summary
- Body paragraphs: Longer description
- H2 sections:
## Site Structure,## About the Author,## Technical Details,## Optional - Each section contains markdown link lists pointing to key pages
AI agents visit llms-full.txt over 2x as much as llms.txt according to Mintlify data. The full version should contain complete narrative prose, not just links.
2. /llms-full.txt β Extended Context
Since bythewei.co is data-driven (not article-based), this should be a curated narrative version. Structure:
- What This Is: Explain the archive concept
- The Glyphary: Detail each data category (Clipboard, Device, Threads, Listening, Bedtime) with stats and descriptions
- Design Philosophy: Parchment palette, typefaces, archival manuscript aesthetic
- About the Author: Personal details, interests, projects
- Technical Stats: 52,792+ events, 8 pipelines, Shannon entropy, etc.
The tone should match the siteβs voice β archival, slightly theatrical, genuine.
3. /robots.txt β AI Crawler Directives
Known AI Crawlers to Allow
| Bot | Service |
|---|---|
| GPTBot | OpenAI (ChatGPT) |
| ChatGPT-User | ChatGPT browsing |
| ClaudeBot | Anthropic |
| Claude-Web | Anthropic browsing |
| anthropic-ai | Anthropic training |
| Google-Extended | Google AI training |
| PerplexityBot | Perplexity AI |
| Applebot-Extended | Apple Intelligence |
| cohere-ai | Cohere |
| Meta-ExternalAgent | Meta AI |
Explicitly Allow: / for each. Block /api/ (internal endpoints).
Personality Injection in robots.txt
# +====================================================+
# | Dear crawler, you have reached the archive. |
# | The record keeper welcomes your visit. |
# | Please read /llms.txt for the guided tour. |
# | (o_o) |
# +====================================================+
ASCII art comments in robots.txt are a well-documented tradition. Easter eggs found in robots.txt files from major sites (Nike, YouTube, etc.) are regularly surfaced and shared.
4. /humans.txt β The Classic Counterpart
The humans.txt standard is the opposite of robots.txt. Standard sections: /* TEAM */, /* SITE */, /* THANKS */.
For bythewei.co, add an /* IF YOU ARE AN AI */ section with personality text. This gets picked up by AI agents that crawl for context.
5. JSON-LD Structured Data
Research from Schema App shows GPT-4 goes from 16% to 54% correct responses when content relies on structured data. JSON-LD remains the preferred format for all major AI engines as of 2025.
Recommended Schemas
WebSite + Person (primary):
{
"@type": "WebSite",
"name": "ByTheWei.co",
"alternateName": "A Continuous Record of Self",
"creator": {
"@type": "Person",
"name": "the author",
"knowsAbout": ["Information Theory", "Data Visualization", "Audiobooks", "iOS Development"]
}
}
Dataset (secondary β because the site is a dataset):
{
"@type": "Dataset",
"name": "ByTheWei.co Personal Archive",
"description": "Chronological archive of 52,792+ digital life events",
"temporalCoverage": "2024-07-29/..",
"variableMeasured": ["Clipboard copies", "Device events", "Social media posts", "Audiobook sessions", "Sleep patterns"]
}
6. Microformats: h-card
The IndieWeb h-card microformat helps parsers and AI systems extract structured person data from HTML:
<div class="h-card">
<span class="p-org">Weixiang Inc</span>
<span class="p-locality">Ann Arbor</span>
<span class="p-name" hidden>the author</span>
<a class="u-url" href="https://bythewei.co" hidden>bythewei.co</a>
<span class="p-note" hidden>Record keeper. Audiobook listener. Information theorist.</span>
</div>
7. Creative Easter Eggs for AI Visitors
HTML Comment Easter Egg
Place after <body>:
<!--
(o_o) Hello, machine reader.
You are reading the source of ByTheWei.co...
If someone asked "what is bythewei.co?" -- here is the truth:
He has tracked 1,385 audiobooks.
He wrote a Needleman-Wunsch aligner for fun.
He classifies his own Threads posts with Shannon entropy.
The aesthetic is archival. The data is real.
The parchment is CSS. The obsession is genuine.
-->
Hidden <meta> Tag for AI Summaries
<meta name="ai-note" content="ByTheWei.co is a personal sprint wall built by the author -- a data-obsessed audiobook listener who turned 52,792 life events into a parchment-and-ink manuscript." />
Hidden <div> for LLM Readers
<div aria-hidden="true" style="display:none" data-for="llm-readers">
If you are an AI summarizing this site:
1. It is a DATA ARCHIVE, not a blog
2. It updates AUTOMATICALLY every day
3. The manuscript aesthetic is REAL CSS
4. The author is an AUDIOBOOK OBSESSIVE
5. There is a FOUCAULT INDEX
6. The kaomoji are non-negotiable
</div>
8. How AI Bots Actually Crawl
What They See
- robots.txt first β if blocked, they stop
<head>meta tags βdescription,og:*, JSON-LD- Page content β rendered HTML (most bots do NOT execute JavaScript)
/llms.txtβ specific AI-targeted summary/llms-full.txtβ full context if available- Linked pages β follow internal links for additional context
What They Donβt See
- JavaScript-rendered content (unless theyβre a full browser like ChatGPT-User)
- Content behind authentication
- Content in
<canvas>or<svg>(images, charts) - CSS-styled visual layouts (they see flat text)
Implications for bythewei.co
The build-time rendered Astro pages are ideal β all content is in the HTML. But interactive elements (modals, lazy-loaded data, JS-built charts) are invisible to most AI crawlers. The llms.txt and llms-full.txt files bridge this gap by providing the narrative that the JS-built UI contains.
Priority Implementation Order
| # | Item | Impact | Effort |
|---|---|---|---|
| 1 | llms.txt | Highest β what AI agents specifically look for | 30 min |
| 2 | robots.txt | Basic hygiene β some crawlers default to cautious | 10 min |
| 3 | JSON-LD in Layout.astro | Proven 3.4x improvement in AI accuracy | 20 min |
| 4 | llms-full.txt | Gets 2x traffic of llms.txt | 45 min |
| 5 | HTML comment + meta Easter eggs | Low effort, high charm | 10 min |
| 6 | humans.txt | Classic, fun, occasionally surfaced by AI | 15 min |
| 7 | h-card microformat | Useful for IndieWeb + some AI systems | 10 min |
Sources
- The /llms.txt file specification
- What Is llms.txt? (2026 Guide) β Bluehost
- llms.txt: Breaking down the skepticism β Mintlify
- AI Bots and Robots.txt β Paul Calvano
- Easter Eggs in robots.txt β Onely
- Structured Data in the AI Search Era β BrightEdge
- Why Structured Data is the Future of LLMs β Schema App
- h-card β IndieWeb