Data Structures

JSON Data Pipeline

Last updated: 2026-02-19


1. Data Sources

1.1 Apple Books Highlights (primary bookmark source)

  • Origin: Apple Books app on iOS/macOS
  • Export path: Highlights are exported via DataJar (iOS automation app) or directly as raw JSON
  • Raw format: JSON array with fields including author, book_title, date, time, highlights (or highlight), location, notes, tags, progress
  • Location data: The raw export includes a location field containing real-world location labels, including street addresses and place names (e.g., "Home"). This is stripped during processing for privacy.
  • Boilerplate: Apple Books appends "Excerpt From\n{title}\n{author}\nThis material may be protected by copyright." to every highlight. This is removed during processing.
  • Raw file: /Users/weixiangzhang/Local_Dev/projects/bythewei/src/data/bookmarks.json (unprocessed, contains location and boilerplate)

1.2 DataJar Exports (alternate bookmark source)

  • Origin: DataJar app on iOS — a structured key-value store used for Shortcuts automation
  • Export formats: Three supported container formats:
    • Bare JSON (store.json, "store 2.json")
    • ZIP archive ("Data Jar YYYY-MM-DD ....zip" containing store.json)
    • .datajar file ("YYYY-MM-DD HH.MM.datajar" — ZIP containing root.json)
  • Internal schema: Deeply nested typed-value tree:
    root.children.BookmarkRecords.value.value[]
      -> entry.value.value  (dictionary of field nodes)
    Each field is wrapped: { value: { type: "string", value: "actual content" } }
  • Note: Some older entries (~8 from Feb 2025 Dune reads) use highlight (singular) instead of highlights (plural). Similarly, note vs notes appears in some entries.
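The nested typed-value tree can be flattened with a small helper. This is a minimal sketch assuming exactly the wrapping shape described above; the actual extract-datajar.mjs may handle more node types and edge cases:

```javascript
// Unwrap one BookmarkRecords entry: entry.value.value is a dictionary
// whose fields are each wrapped as { value: { type, value } }.
function unwrapEntry(entry) {
  const fields = entry.value.value; // dictionary of field nodes
  const out = {};
  for (const [key, node] of Object.entries(fields)) {
    out[key] = node.value.value;    // unwrap { value: { type, value } }
  }
  return out;
}

// Walk the documented path: root.children.BookmarkRecords.value.value[]
function extractRecords(root) {
  const entries = root.children.BookmarkRecords.value.value;
  return entries.map(unwrapEntry);
}
```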

1.3 Google Sheets CSV (reading log source)

  • Origin: Manually maintained Google Sheet titled “Reading Data - Primary”
  • Export path: Downloaded as CSV to ~/Downloads/Reading Data - Primary.csv
  • Coverage: 150 books read from February 2019 through February 2020 (one full reading year)
  • Total pages tracked: 41,086 pages across 150 books
  • Manual entry: All fields are hand-entered by the reader, including emotional responses, difficulty ratings, and discovery source

1.4 Manual / Static Data

  • src/data/sprint.json: Hand-authored sprint board data for the homepage sticky-note wall. Contains metadata, stats, column definitions with sticky notes, and a bottom row. Updated manually per sprint.
  • src/data/kaomojis.ts: 200 hand-curated kaomojis organized by category (happy, excited, surprised, angry, sad, cool, fighting, love, shrug, animals, magic, tired, chaos). Used for deterministic daily rotation on the site via date-seeded pseudo-random selection.

2. Data Schemas

2.1 bookmarks.clean.json — Processed Highlights

Location: public/data/bookmarks.clean.json and src/data/bookmarks.clean.json (identical copies)
Size: 17,572 lines, 1,357 entries, 156 unique books
Date range: 2022-01-02 to 2026-02-03
Sort order: Newest first (descending by date_read)

Fields:

  • id (string, not null): SHA-1 hex digest of lowercase(author) + "||" + lowercase(book_title) + "||" + lowercase(highlight[0:100]). See "ID generation" below.
  • author (string, not null): Author name, normalised from "Last, First" to "First Last" for simple single-author cases. Empty string if unknown.
  • book_title (string, not null): Full book title. Empty string if unknown (1 orphan entry exists).
  • date_read (string | null): ISO 8601 date (YYYY-MM-DD). Parsed from Apple Books "MMM DD, YYYY" format. Null if the date could not be parsed.
  • date (string | null): Alias for date_read. Kept for backward compatibility — index.astro reads entry.date for the Quote of the Day feature. Always identical to date_read.
  • highlights (string, not null): The highlight/passage text. Cleaned of Apple Books boilerplate and leading/trailing curly quotes, with excessive newlines collapsed. Entries with empty highlights are dropped entirely.
  • notes (string | null): User-added notes on the highlight. Trimmed. Null if none.
  • tags (string[], not null): Array of tag strings. Parsed from a comma/semicolon-separated string, or passed through if already an array. Empty array if no tags.
  • source (string, not null): One of "apple_books", "kindle", "readwise", "manual". Auto-detected from entry properties. Current dataset: 1,347 apple_books + 10 manual.
  • word_count (number, not null): Word count of the cleaned highlight text. Computed via text.trim().split(/\s+/).filter(Boolean).length.

Source detection logic (in order):

  1. Has source_url or readwise_url field -> "readwise"
  2. Has location matching pattern ^\d+[-\u2013]\d+$ -> "kindle" (e.g., “123-456”)
  3. Highlight text matches Excerpt From ... copyright pattern -> "apple_books"
  4. Has any location field -> "apple_books"
  5. Otherwise -> "manual"
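The five rules above can be sketched as one function. The field names (source_url, readwise_url, location) follow the raw export schema documented in section 1; the real detection code in process-bookmarks.mjs may differ in detail:

```javascript
// Sketch of the source-detection order described above: readwise URL
// fields first, then Kindle-style location ranges, then Apple Books
// boilerplate, then any location at all, else manual.
function detectSource(entry, highlightText = '') {
  if (entry.source_url || entry.readwise_url) return 'readwise';
  if (entry.location && /^\d+[-\u2013]\d+$/.test(entry.location)) return 'kindle';
  if (/Excerpt From[\s\S]*protected by copyright/i.test(highlightText)) return 'apple_books';
  if (entry.location != null) return 'apple_books';
  return 'manual';
}
```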

ID generation:

SHA-1( lowercase(author) + "||" + lowercase(book_title) + "||" + lowercase(highlight[0:100]) )

Uses only the first 100 characters of the highlight so IDs remain stable even when boilerplate trimming changes the tail.
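The recipe translates directly to Node's crypto module. The function name makeId is illustrative (the real script may name it differently), but the hash input follows the formula above:

```javascript
import { createHash } from 'node:crypto';

// SHA-1 over lowercased author, title, and the first 100 characters of
// the highlight, joined with "||" — per the ID recipe above.
function makeId(author, bookTitle, highlight) {
  const key = [
    String(author).toLowerCase(),
    String(bookTitle).toLowerCase(),
    String(highlight).slice(0, 100).toLowerCase(),
  ].join('||');
  return createHash('sha1').update(key).digest('hex');
}
```

Because only the first 100 characters feed the hash, re-cleaning that trims trailing boilerplate leaves the ID unchanged.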

2.2 reading-log.json — Reading Year Log

Location: public/data/reading-log.json and src/data/reading-log.json (identical copies)
Size: 3,751 lines, 150 entries
Date range: 2019-02-19 to 2020-02-27 (one calendar reading year)
Sort order: Ascending by date_finished (nulls last)

Fields:

  • title (string | null): Book title. Note: the field is title here vs book_title in bookmarks.
  • author (string | null): Author name in "Last, First" format (NOT normalised to "First Last" — differs from the bookmarks schema).
  • date_started (string | null): ISO 8601 date (YYYY-MM-DD). Parsed from M/D/YYYY or M/D/YY CSV format.
  • date_finished (string | null): ISO 8601 date (YYYY-MM-DD). Same parsing as date_started.
  • days_to_read (number | null): Computed: Math.round((date_finished - date_started) / 86400000). Null if either date is missing.
  • rating (number | null): 1-5 integer rating. Null if not rated.
  • gender (string | null): Author gender: "F" (female), "M" (male), "N" (non-binary). Null if unknown.
  • poc (boolean | null): Whether the author is a person of color. true/false/null. Parsed from "Y"/"N" in the CSV.
  • emotions (string[], not null): Array of emotional responses to the book. Known values: "Happy", "Sad", "Angry", "Anger", "Bored", "Empowered", "Interesting", "Funny". See Known Issues for the Anger/Angry problem.
  • emotional_output (string | null): Aggregate sentiment: "Positive", "Negative", "Neutral". Null if not specified.
  • difficulty (number | null): 1-5 integer difficulty rating.
  • publisher (string | null): Publisher name.
  • year_published (number | null): Year of publication as an integer.
  • pages (number | null): Page count of the book.
  • running_pages (number | null): Cumulative page count across all books read (running total).
  • fiction (string | null): One of "Fiction", "Non-Fiction", "Graphic Novel". Normalised from CSV variants.
  • genre (string | null): Free-text genre label (e.g., "Race Studies", "Fantasy", "Productivity", "Autobiography").
  • country (string | null): Country of origin/setting (e.g., "USA", "Russia").
  • why (string | null): Why the book was chosen (e.g., "Word of Mouth", "Curious", "Utility").
  • why_source (string | null): Where the recommendation came from (e.g., "Online Forums", "Friend", "Colleague", "Publicity").
  • review (string | null): Text review. Filtered: entries containing "didn't have time", "did not have time", or "review in progress" are set to null.

Emotion value distribution (150 books):

  • Interesting: 70
  • Happy: 20
  • Sad: 16
  • Bored: 12
  • Funny: 12
  • Anger: 10
  • Empowered: 8
  • Angry: 2

2.3 sprint.json — Homepage Sprint Board

Location: src/data/sprint.json
Usage: Imported directly into index.astro at build time. Drives the sticky-note wall UI.

{
  meta: { date, title, subtitle }
  stats: [{ value, label }]
  columns: [{
    header: string,
    stickies: [{
      color: "g"|"y"|"o"|"r"|"b"|"p"|"w"|"teal",
      size: "big"|null,
      rotation: "r1"-"r7",
      tape: boolean,
      stamp: "DONE"|"IN PROGRESS"|null,
      title: string,
      body: string,
      tag: string|null,
      blocker: boolean?  // optional
    }]
  }]
  bottom_row: [{
    color, rotation, tape: "center"|"left"|"right"|null,
    title, body
  }]
  footer: string
}

2.4 kaomojis.ts — Kaomoji Collection

Location: src/data/kaomojis.ts
Type: export const kaomojis: string[]
Count: 200 kaomojis
Categories: Happy/Wholesome (20), Excited/Celebrating (20), Surprised/Shocked (20), Angry/Table Flip (20), Sad/Crying (20), Cool/Smug (20), Fighting/Determined (20), Love/Affectionate (20), Shrug/Whatever (10), Animals (15), Magic/Sparkle (10), Tired/Done (10), Running/Chaos (10)

Client-side rotation logic (in index.astro):

// Date-seeded: same day -> same kaomojis for every visitor worldwide
const seed = year * 10000 + (month + 1) * 100 + day;
// Mulberry32 PRNG seeded with date
// Each [data-kaomoji] element gets a deterministic pick
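A minimal sketch of the rotation, assuming the seed formula in the comment above and a standard Mulberry32 implementation (the element-binding details in index.astro are omitted, and the function name kaomojiForToday is illustrative):

```javascript
// Mulberry32: a common 32-bit PRNG; returns a function yielding
// deterministic floats in [0, 1) for a given seed.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Same day -> same seed -> same pick for every visitor.
function kaomojiForToday(kaomojis, date = new Date()) {
  const seed = date.getFullYear() * 10000 + (date.getMonth() + 1) * 100 + date.getDate();
  const rand = mulberry32(seed);
  return kaomojis[Math.floor(rand() * kaomojis.length)];
}
```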

3. Pipeline Scripts

All scripts live in /Users/weixiangzhang/Local_Dev/projects/bythewei/scripts/.

3.1 extract-datajar.mjs — DataJar Export Extractor

Purpose: Extracts highlight/bookmark data from DataJar JSON exports into a flat JSON array.

Input: DataJar export file in one of three formats:

  • Bare JSON (store.json)
  • ZIP archive (Data Jar YYYY-MM-DD ....zip)
  • .datajar file (YYYY-MM-DD HH.MM.datajar)

Output: JSON array of raw bookmark records with the schema:

{
  "author": "string|null",
  "book_title": "string|null",
  "date": "MMM DD, YYYY|null",
  "time": "string|null",
  "highlights": "string",
  "notes": "string|null",
  "tags": "string|null",
  "source": "datajar",
  "progress": "number|null",
  "location": "string|null"
}

Usage:

# JSON to stdout, summary to stderr
node scripts/extract-datajar.mjs store.json

# JSON to file, summary to stdout
node scripts/extract-datajar.mjs "Data Jar 2025-06-12 19.44.30.zip" datajar-2025.json

Key details:

  • Custom zero-dependency ZIP parser (supports DEFLATE method 8 and stored method 0)
  • Handles data-descriptor entries (bit 3 flag) by falling back to central directory metadata
  • .datajar files use root.json internally; .zip files use store.json
  • Entries without highlight text are skipped
  • Handles both highlights (plural) and highlight (singular) field names
  • Prints summary with date range, top books by highlight count, and sample previews

3.2 process-bookmarks.mjs — Full Bookmark Pipeline (primary)

Purpose: The main processing pipeline. Takes raw Apple Books JSON and produces the clean, deduplicated, privacy-safe bookmarks.clean.json.

Input: Raw bookmark JSON (default: src/data/bookmarks.json)

Output: Written to BOTH:

  • src/data/bookmarks.clean.json
  • public/data/bookmarks.clean.json

Processing steps (in order):

  1. Load raw JSON array
  2. Resolve highlight text (normalise highlight -> highlights key)
  3. Auto-detect source format (apple_books / kindle / readwise / manual)
  4. Strip location field (contains real addresses — privacy)
  5. Clean highlight text (remove Apple Books boilerplate, curly quotes, collapse whitespace)
  6. Drop entries with empty highlights after cleaning
  7. Normalise author names (“Last, First” -> “First Last”)
  8. Generate stable SHA-1 ID for deduplication
  9. Deduplicate by ID (first occurrence wins)
  10. Parse dates to ISO 8601
  11. Compute word count
  12. Sort newest-first
  13. Write output + print summary report
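Step 5 (highlight cleaning) can be sketched as below. This is a minimal version assuming the boilerplate format quoted in section 1.1; the real process-bookmarks.mjs may cover more variants:

```javascript
// Minimal sketch of highlight cleaning: strip the Apple Books
// "Excerpt From ... protected by copyright." block, trim stray curly
// quotes, and collapse runs of blank lines.
function cleanHighlight(text) {
  let t = String(text)
    .replace(/Excerpt From[\s\S]*?This material may be protected by copyright\.?/gi, '')
    .trim();
  t = t.replace(/^[\u201C\u2018]+/, '').replace(/[\u201D\u2019]+$/, '');
  t = t.replace(/\n{3,}/g, '\n\n');
  return t.trim();
}
```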

Usage:

# Default input (src/data/bookmarks.json)
node scripts/process-bookmarks.mjs

# Custom input
node scripts/process-bookmarks.mjs ~/Downloads/apple-books-export.json

3.3 merge-bookmarks.mjs — Incremental Merge

Purpose: Merges a NEW raw export into the existing bookmarks.clean.json without wiping or re-processing existing data. Designed for incremental updates.

Input: Path to a new raw export JSON file

Output: Updated bookmarks.clean.json in both src/data/ and public/data/

Merge strategy:

  1. Load existing bookmarks.clean.json (the live DB)
  2. Index existing entries by ID in a Map for O(1) lookup
  3. Process new entries through the same pipeline as process-bookmarks.mjs
  4. Match by stable SHA-1 ID
  5. New entries are appended; existing entries are preserved unchanged (no overwrites)
  6. Merged list is sorted newest-first
  7. Write back to both output locations
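The merge steps above reduce to a Map keyed by ID. A sketch, assuming entries have already been run through the shared pipeline (file I/O and reporting omitted):

```javascript
// Index existing entries by id, append only unseen incoming entries
// (first occurrence wins — existing entries are never overwritten),
// then sort newest-first by date_read.
function mergeBookmarks(existing, incoming) {
  const byId = new Map(existing.map(e => [e.id, e])); // O(1) lookup
  let added = 0, skipped = 0;
  for (const entry of incoming) {
    if (byId.has(entry.id)) { skipped++; continue; }  // preserve existing
    byId.set(entry.id, entry);
    added++;
  }
  const merged = [...byId.values()]
    .sort((a, b) => String(b.date_read ?? '').localeCompare(String(a.date_read ?? '')));
  return { merged, added, skipped };
}
```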

Usage:

node scripts/merge-bookmarks.mjs ~/Downloads/apple-books-export-march.json

Output report:

New entries added        : 42
Already existed (skipped): 315
Empty/dropped            : 3
Total in DB now          : 1399

3.4 strip-location.mjs — Legacy Processor (superseded)

Purpose: The original bookmark processor. Superseded by process-bookmarks.mjs but still functional.

Differences from process-bookmarks.mjs:

  • No SHA-1 ID generation (deduplicates by exact author||book_title||highlights string match)
  • No source detection
  • No author name normalisation
  • No word count computation
  • Preserves date, time, progress, notes, tags as-is (does not transform to ISO dates)
  • Simpler output schema (no id, date_read, source, word_count fields)

Usage:

node scripts/strip-location.mjs [input]
# Default input: src/data/bookmarks.json

3.5 convert-reading-log.mjs — CSV to JSON Converter

Purpose: Converts the Google Sheets CSV reading log into JSON.

Input: Hardcoded path: ~/Downloads/Reading Data - Primary.csv

Output: Written to BOTH:

  • src/data/reading-log.json
  • public/data/reading-log.json

Processing details:

  • Custom RFC-4180 CSV parser (handles quoted fields with embedded commas and newlines)
  • Expects exactly 20 columns per row
  • Date parsing: M/D/YYYY or M/D/YY (2-digit years treated as 20xx) -> YYYY-MM-DD
  • days_to_read computed from start/finish dates
  • Emotions parsed from comma-separated string to array
  • Reviews filtered: “didn’t have time” / “review in progress” -> null
  • Fiction normalised to exact enum values
  • Sorted ascending by date_finished (nulls last)
  • Prints stats: total count, date range, rating distribution, top 10 genres, review counts, parsing errors
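The date-parsing rule above can be sketched as a single function; the real converter may reject out-of-range months or days:

```javascript
// M/D/YYYY or M/D/YY (2-digit years treated as 20xx) -> YYYY-MM-DD.
// Returns null for anything unparseable.
function toIsoDate(raw) {
  const m = String(raw ?? '').trim().match(/^(\d{1,2})\/(\d{1,2})\/(\d{2,4})$/);
  if (!m) return null;
  let [, month, day, year] = m;
  if (year.length === 2) year = '20' + year; // 2-digit years -> 20xx
  return `${year}-${month.padStart(2, '0')}-${day.padStart(2, '0')}`;
}
```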

Usage:

node scripts/convert-reading-log.mjs

Column mapping (0-indexed from CSV headers):

0: Date Started    -> date_started
1: Date Finished   -> date_finished
2: Title           -> title
3: Author          -> author
4: Gender          -> gender
5: POC             -> poc
6: Rating          -> rating
7: Emotions        -> emotions
8: Emotional Output -> emotional_output
9: Difficulty      -> difficulty
10: Publisher      -> publisher
11: Year Published -> year_published
12: Pages          -> pages
13: Running Pages  -> running_pages
14: Fiction or Non -> fiction
15: Genre          -> genre
16: Country        -> country
17: Why            -> why
18: Why Source     -> why_source
19: Review         -> review

3.6 verify-clean.mjs — Quality Check

Purpose: Quick verification script to spot-check the cleaned bookmark data.

Input: Reads src/data/bookmarks.clean.json (hardcoded relative path)

Checks performed:

  1. Dumps first 5 entries showing book, author, highlight start/end (JSON-escaped for visibility)
  2. Counts entries still containing Apple Books boilerplate (“excerpt from”, “this material may”)
  3. Counts entries with leading/trailing whitespace in highlights
  4. Reports total count, shortest highlight, longest highlight, median highlight length
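Check 2 amounts to a case-insensitive substring scan. A sketch, using the cleaned-schema field name highlights; the real verify-clean.mjs may print per-entry detail:

```javascript
// Count entries whose cleaned highlight still contains Apple Books
// boilerplate fragments ("excerpt from", "this material may").
function countBoilerplateResidue(entries) {
  const needles = ['excerpt from', 'this material may'];
  return entries.filter(e => {
    const text = String(e.highlights ?? '').toLowerCase();
    return needles.some(n => text.includes(n));
  }).length;
}
```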

Usage:

node scripts/verify-clean.mjs

4. Data Flow

4.1 Bookmark Pipeline

                    DataJar app (iOS)
                         |
                         v
              extract-datajar.mjs          Apple Books (direct JSON export)
                    |                                |
                    v                                v
              raw JSON array                   raw JSON array
                    |                                |
                    +----------+---------------------+
                               |
                               v
                     src/data/bookmarks.json
                        (raw, with location + boilerplate)
                               |
               +---------------+---------------+
               |                               |
               v                               v
     process-bookmarks.mjs             merge-bookmarks.mjs
     (full rebuild)                    (incremental update)
               |                               |
               +---------------+---------------+
                               |
                               v
                   bookmarks.clean.json
                               |
               +---------------+---------------+
               |                               |
               v                               v
    src/data/bookmarks.clean.json    public/data/bookmarks.clean.json
               |                               |
               v                               v
       (available at build time)      (served at /data/bookmarks.clean.json)
                                               |
                                               v
                                    index.astro client-side fetch()
                                               |
                                    +----------+----------+
                                    |          |          |
                                    v          v          v
                                  QOTD     Catalog    Journal
                               (quote of  (book list  (timeline
                                the day)   modal)      modal)

Key points:

  • bookmarks.clean.json is written to TWO locations: src/data/ (for build-time import) and public/data/ (for runtime client-side fetch)
  • The client fetches from /data/bookmarks.clean.json at page load, not at build time
  • QOTD selects a highlight deterministically based on the current date
  • The same fetched data powers the catalog modal (group by book, filter) and journal modal (timeline view)

4.2 Reading Log Pipeline

    Google Sheets ("Reading Data - Primary")
                    |
                    v (manual CSV download)
          ~/Downloads/Reading Data - Primary.csv
                    |
                    v
          convert-reading-log.mjs
                    |
          +---------+---------+
          |                   |
          v                   v
  src/data/reading-log.json   public/data/reading-log.json
          |                                |
          v                                v
  (build-time reference)        (served at /data/reading-log.json)
                                           |
                                           v
                                index.astro client-side fetch()
                                           |
                                           v
                                   "THE READING YEAR" modal
                                   (hidden bookshelf UI)

Key points:

  • The reading log fetch is lazy — it only triggers when the user opens the hidden reading year modal (via the shelf trigger pin)
  • The modal shows stats (genre breakdown, emotion distribution, author demographics) computed client-side from the JSON
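The client-side stats reduce to simple tallies over the fetched JSON. A sketch of the emotion-distribution part (function name is illustrative; the actual index.astro code may differ):

```javascript
// Tally each emotion string across reading-log entries. Note that
// "Anger" and "Angry" count separately unless normalised upstream —
// see Known Issues 6.1.
function emotionCounts(entries) {
  const counts = {};
  for (const e of entries) {
    for (const emotion of e.emotions ?? []) {
      counts[emotion] = (counts[emotion] ?? 0) + 1;
    }
  }
  return counts;
}
```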

4.3 Static Data (no pipeline)

  src/data/sprint.json  -----> imported at build time by index.astro
                                -> renders sticky note wall

  src/data/kaomojis.ts  -----> imported at build time by index.astro
                                -> injected as define:vars for client-side rotation script

5. Cross-Referencing

5.1 Dataset Overlap

The two primary datasets — bookmarks and reading log — have zero book title overlap. They cover entirely different time periods and use different title/author schemas:

  Property          bookmarks.clean.json              reading-log.json
  Time period       Jan 2022 - Feb 2026               Feb 2019 - Feb 2020
  Entry count       1,357 highlights                  150 books
  Unique books      156                               150
  Title field       book_title                        title
  Author format     "First Last" (normalised)         "Last, First" (CSV original)
  Books in common   0                                 0
  Granularity       Per-highlight (many per book)     Per-book (one entry per book)

5.2 Schema Differences

The two datasets were designed independently and have several naming inconsistencies:

  Concept       Bookmarks                  Reading Log                          Notes
  Book title    book_title                 title                                Different field names
  Author        author ("First Last")      author ("Last, First")               Different name order
  Date          date_read / date (ISO)     date_started / date_finished (ISO)   Different semantics
  Rating        (none)                     rating (1-5)                         Bookmarks have no rating
  Emotions      (none)                     emotions (array)                     Bookmarks have no emotions
  Word count    word_count                 pages                                Different unit of measurement
  Genre         (none)                     genre                                Bookmarks have no genre
  Source        source (auto-detected)     (none)                               Reading log has no source

5.3 Potential Future Unification

If the datasets were to be merged or cross-referenced:

  • Author normalisation would need to be applied to reading-log data (“Last, First” -> “First Last”)
  • Title field would need aliasing (title <-> book_title)
  • The gap between Feb 2020 and Jan 2022 means there are ~2 years of untracked reading

6. Known Issues

6.1 “Anger” vs “Angry” Emotion Normalization

The reading log CSV source uses both "Anger" (10 occurrences) and "Angry" (2 occurrences) to represent the same emotion. The convert-reading-log.mjs script does NOT normalise these — it passes emotions through from the CSV as-is via simple comma-split:

function parseEmotions(str) {
  if (!str || str.trim() === '') return [];
  return str.split(',').map(e => e.trim()).filter(Boolean);
}

Impact: Any client-side code that groups or counts by emotion will treat “Anger” and “Angry” as separate categories. The current distribution is:

  • "Anger": 10 books
  • "Angry": 2 books

Fix: Add normalisation in parseEmotions() or in the CSV source itself. Recommended target: "Angry" (adjective form, consistent with "Happy", "Sad", "Funny").
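A hedged sketch of that fix, keeping the original comma-split and folding "Anger" into "Angry" via a small alias map (the alias table is an assumption; extend it if other variants turn up in the CSV):

```javascript
// Suggested fix: same parsing as the original parseEmotions, plus an
// alias map so "Anger" collapses into "Angry".
const EMOTION_ALIASES = { Anger: 'Angry' };

function parseEmotions(str) {
  if (!str || str.trim() === '') return [];
  return str
    .split(',')
    .map(e => e.trim())
    .filter(Boolean)
    .map(e => EMOTION_ALIASES[e] ?? e);
}
```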

6.2 Orphan Title Problem

There is 1 entry in bookmarks.clean.json with an empty book_title (empty string ""):

  • Author: empty string
  • Date: 2022-01-02
  • Highlight preview: "This is an" (truncated)
  • Root cause: The raw Apple Books export contained an entry with no book metadata. The processing pipeline preserves entries as long as they have non-empty highlight text, even without book/author data.

Impact: This entry will appear in the QOTD rotation without attribution. In the catalog modal, it would appear under an empty book title.

6.3 Author Name Format Inconsistency

  • Bookmarks: Authors are normalised to “First Last” format by normaliseAuthor() in process-bookmarks.mjs (e.g., “Huffer, Lynne” -> “Lynne Huffer”)
  • Reading log: Authors remain in “Last, First” format from the CSV (e.g., “Rios, Victor”)

This means the same author would appear differently in each dataset, preventing naive string matching for cross-referencing.
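A sketch of the normalisation that would need to be applied to reading-log authors before cross-referencing. It handles only the simple single-author, single-comma case, matching the "simple single-author cases" caveat in section 2.1; the real normaliseAuthor() in process-bookmarks.mjs may differ:

```javascript
// "Last, First" -> "First Last" for the simple two-part case;
// anything else is passed through unchanged.
function normaliseAuthor(name) {
  const parts = String(name ?? '').split(',').map(p => p.trim());
  if (parts.length !== 2 || !parts[0] || !parts[1]) return String(name ?? '').trim();
  return `${parts[1]} ${parts[0]}`;
}
```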

6.4 Duplicate date / date_read Fields

Every entry in bookmarks.clean.json carries both date_read and date with identical values. The date field exists solely for backward compatibility with index.astro’s QOTD code, which reads entry.date. This duplication adds ~20KB to the JSON file. The comment in process-bookmarks.mjs documents this:

// 'date' alias kept for QOTD backward-compatibility (index.astro reads entry.date)
date: dateIso,

6.5 Hardcoded CSV Path

The convert-reading-log.mjs script has a hardcoded absolute path for its CSV input:

const CSV_PATH = '/Users/weixiangzhang/Downloads/Reading Data - Primary.csv';

This is not configurable via command-line arguments (unlike the bookmark scripts). Running on a different machine or after moving the CSV will fail silently.

6.6 strip-location.mjs Superseded but Not Removed

The original strip-location.mjs is still in the scripts directory but has been functionally replaced by process-bookmarks.mjs, which does everything strip-location.mjs does plus adds SHA-1 IDs, source detection, author normalisation, and word counts. Running strip-location.mjs would produce output in a different schema than what the site expects.

6.7 DataJar source Field Mismatch

Entries extracted via extract-datajar.mjs are tagged with source: "datajar", but after processing through process-bookmarks.mjs, the source is re-detected based on entry properties and typically overwritten to "apple_books" or "manual". The "datajar" source value does not appear in the final bookmarks.clean.json.

6.8 No Validation of Emotion Values

The reading log pipeline does not validate emotion strings against a known set of values. Any string in the CSV emotions column is accepted. This is how "Anger" and "Angry" both ended up in the data — they were entered inconsistently in the spreadsheet and passed through without validation.

6.9 running_pages Inconsistency

The first entry in the reading log (Human Targets) has running_pages: 7700, which is far higher than the book’s 224 pages and inconsistent with the second entry (Circe) having running_pages: 393. This suggests the running pages counter was either reset partway through the reading year or was pre-seeded from prior reading. The field comes directly from the CSV without validation or recomputation.