From Plex Token Hunt to 'Hey Siri, Audiomark'

Tags: claude-code, life-dashboard, plex, audiobooks, whisper, apple-silicon

Started with “check synology-garden for the Plex API key.” Ended with a fully operational voice-triggered audiobook transcription pipeline that runs in about 5 seconds.


The Starting Point

Life-dashboard had reading/highlight support but nothing for audiobooks. Plex has been my audiobook player for years — 205 books, 3.5 years of listening history — but that data was trapped inside the Plex database. The goal was integration.

First task: find the Plex token. Dug through the synology-garden project (NAS management scripts) where it was stored in a config file.

Phase 1: Backfill History

Plex’s /status/sessions/history/all endpoint exposes complete listening history with pagination. Built an async backfill script:

  • Paginated through all history entries
  • Filtered to audiobook library (excluding music/video)
  • Extracted title, author, duration, timestamps
  • Fetched book descriptions from album metadata
  • Converted to UnifiedEvent format with deterministic IDs

Result: 1,061 events, 205 unique books, Sept 2022 through May 2026. Idempotent — safe to re-run.
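A minimal sketch of the pagination and library filter, not the actual backfill script. `fetch_page` is a hypothetical callable; in practice it would wrap a GET to /status/sessions/history/all using Plex's `X-Plex-Container-Start` and `X-Plex-Container-Size` paging parameters. The `librarySectionID` field used for filtering is an assumption about the shape of the history entries.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

Entry = Dict[str, object]


async def fetch_all_history(
    fetch_page: Callable[[int, int], Awaitable[List[Entry]]],
    page_size: int = 100,
) -> List[Entry]:
    """Request pages until one comes back shorter than page_size."""
    entries: List[Entry] = []
    start = 0
    while True:
        page = await fetch_page(start, page_size)
        entries.extend(page)
        if len(page) < page_size:
            # A short (or empty) page means there is nothing left to fetch.
            break
        start += page_size
    return entries


def only_section(entries: List[Entry], section_id: int) -> List[Entry]:
    """Keep entries from one library section (assumed field: librarySectionID)."""
    return [e for e in entries if str(e.get("librarySectionID")) == str(section_id)]
```

Passing the page fetcher in as a callable keeps the loop testable without a live Plex server.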

Phase 2: Periodic Sync

Added a 30-minute background loop (alongside the existing Threads proxy sync) that checks Plex for new listening sessions and ingests them. No more manual backfill needed — new listens appear automatically.
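The loop itself is simple. A rough sketch, assuming an asyncio-based app; `run_periodically` and its `max_runs` parameter are illustrative names (the latter exists only so the loop can be exercised in a test), not the real life-dashboard code:

```python
import asyncio
import logging
from typing import Awaitable, Callable, Optional


async def run_periodically(
    job: Callable[[], Awaitable[None]],
    interval_seconds: float = 30 * 60,
    max_runs: Optional[int] = None,
) -> None:
    """Run `job` every interval; one failed run must not kill the loop."""
    runs = 0
    while max_runs is None or runs < max_runs:
        try:
            await job()
        except Exception:
            # Log and carry on; the next tick will retry.
            logging.exception("sync failed; retrying on next tick")
        runs += 1
        if max_runs is not None and runs >= max_runs:
            break
        await asyncio.sleep(interval_seconds)
```

At startup this would be scheduled with something like `asyncio.create_task(run_periodically(sync_plex))`, where `sync_plex` stands in for the actual sync coroutine.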

Phase 3: Now-Playing in /api/reading/current

Updated the reading/current endpoint to combine:

  • Text highlights (existing)
  • Audiobook listening events (new)
  • Plex real-time now-playing status

If you’re currently listening to an audiobook, it shows up with live progress.

Phase 4: The Audiomark Endpoint

This is where it got interesting. The realization:

Plex exposes the exact file path on the NAS. The NAS is mounted locally. ffmpeg can seek to any position in a container file in 0.15 seconds. mlx_whisper transcribes 2 minutes of audio in 4 seconds on an M2 Pro.

That chain means: query what’s playing, extract the audio around the current position, transcribe it, save it. Total time: ~5 seconds.

Built POST /api/ingest/audiomark:

  1. Hit Plex /status/sessions for current playback
  2. Map NAS path to local mount (/volume1/ to /Volumes/)
  3. Calculate album-wide progress (sum all tracks, not just current file)
  4. ffmpeg extract +/- 1 minute around current position
  5. mlx_whisper transcribe the clip
  6. Store as searchable highlight event
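Steps 2 and 4 can be sketched as three small helpers. This is an illustration under assumptions, not the endpoint's real code: the function names are made up, and the 16 kHz mono WAV output is my choice because that is the sample rate Whisper models work with. Placing `-ss` before `-i` asks ffmpeg to seek on the input, which is the fast path.

```python
from typing import List, Tuple


def nas_to_local(nas_path: str, nas_root: str = "/volume1/",
                 local_root: str = "/Volumes/") -> str:
    """Rewrite the NAS-side path Plex reports into the local mount path."""
    if not nas_path.startswith(nas_root):
        raise ValueError(f"unexpected NAS path: {nas_path}")
    return local_root + nas_path[len(nas_root):]


def clip_window(position_s: float, track_duration_s: float,
                pad_s: float = 60.0) -> Tuple[float, float]:
    """Return (start, length) for +/- pad_s around position, clamped to the track."""
    start = max(0.0, position_s - pad_s)
    end = min(track_duration_s, position_s + pad_s)
    return start, end - start


def ffmpeg_clip_command(src: str, dst: str, start_s: float,
                        length_s: float) -> List[str]:
    """Build the ffmpeg argument list for extracting the clip."""
    return [
        "ffmpeg", "-y",
        "-ss", f"{start_s:.3f}",    # input seeking: placed before -i
        "-t", f"{length_s:.3f}",    # limit how much of the input is read
        "-i", src,
        "-vn", "-ac", "1", "-ar", "16000",  # audio only, mono, 16 kHz
        dst,
    ]
```

The resulting list can be handed to `subprocess.run(cmd, check=True)`. Clamping matters near the start or end of a track, where a full minute on either side does not exist.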

Phase 5: Docker to Host Migration

The pipeline requires:

  • NAS mount access (not available in Docker)
  • ffmpeg binary
  • mlx_whisper (MLX inference on the Apple Silicon GPU via Metal)

Moved life-dashboard from Docker to host uvicorn. Updated the launchd plist (the macOS equivalent of a systemd unit) accordingly.

The Debugging

mlx_whisper output path: It writes transcription files to --output-dir using the input file’s basename, not the full path. When the input was /tmp/audiomark_clip.wav, the output was audiomark_clip.txt in the output dir. Took a minute to catch.
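One way to avoid guessing is to compute the expected transcript path from the input's basename before reading it back. A small sketch; the helper name and the default `txt` extension are assumptions for illustration:

```python
from pathlib import Path


def expected_transcript_path(input_audio: str, output_dir: str,
                             ext: str = "txt") -> Path:
    """The transcript is named after the input's basename, inside output_dir."""
    return Path(output_dir) / f"{Path(input_audio).stem}.{ext}"
```

For the case above, `expected_transcript_path("/tmp/audiomark_clip.wav", "/tmp/out")` yields a path ending in `audiomark_clip.txt` inside `/tmp/out`.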

Album progress vs track progress: Plex reports the offset within the current track only. A book split across 47 files might show “95% complete” when you’re 95% through track 12 of 47, which is nowhere near the end of the book. Had to fetch the full track list and sum durations.
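The fix boils down to summing the tracks before the current one and adding the in-track offset. A minimal sketch, assuming durations and offset arrive in milliseconds (as Plex reports them) and that the tracks are already in playback order; `album_progress` is an illustrative name:

```python
from typing import Sequence


def album_progress(track_durations_ms: Sequence[int], current_index: int,
                   offset_ms: int) -> float:
    """Fraction of the whole book completed, in the range 0.0 to 1.0."""
    total = sum(track_durations_ms)
    if total <= 0:
        raise ValueError("album has no duration")
    # Everything before the current track counts as fully played.
    elapsed = sum(track_durations_ms[:current_index]) + offset_ms
    return min(1.0, elapsed / total)
```

With 47 equal-length tracks, being 95% through track 12 works out to roughly a quarter of the book. A single-file book falls out of the same formula, since the list of earlier tracks is empty.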

Proper noun mangling: whisper-base is fast but inaccurate on names. “Crysknife” -> “Christ’s knife”, “Leto” -> “Latos”. Acceptable for bookmarking (you know what book you’re in), but worth noting.

The Result

A Siri Shortcut that sends a single POST request. Five seconds later, the passage you’re listening to is transcribed and searchable in the life-dashboard event log. Title, author, progress, and the actual text of the passage — all indexed.

This is the audiobook bookmarking workflow I’ve been trying to build for years. The missing piece was realizing that Plex + local NAS mount + Apple Silicon inference eliminates every bottleneck that made previous attempts impractical.

Key Technical Decisions

  • Deterministic event IDs: SHA256(source + timestamp + content_prefix). Re-running backfill is safe.
  • Album-wide progress: Sum all tracks, not just current file. Consistent across single-file and multi-file books.
  • whisper-base over whisper-large: 4s vs 30s+ transcription time. Speed matters for a voice-triggered UX.
  • +/- 1 minute extraction: Captures context around the moment you triggered the bookmark.
  • Host uvicorn over Docker: Required for NAS mount + ffmpeg + MLX access. Trade-off accepted.
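For the deterministic IDs, something along these lines would do. It is a sketch, not the actual life-dashboard implementation: the 64-character prefix length and the unit-separator byte between fields are my assumptions (the separator keeps two different field combinations from concatenating to the same string).

```python
import hashlib


def event_id(source: str, timestamp: str, content: str,
             prefix_len: int = 64) -> str:
    """Stable ID derived from source, timestamp, and a prefix of the content."""
    parts = (source, timestamp, content[:prefix_len])
    payload = "\x1f".join(parts).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()
```

Because the same inputs always hash to the same ID, re-ingesting an event can be treated as an upsert or skipped outright, which is what makes the backfill safe to re-run.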

What Shipped

Feature                    | Details
Plex audiobook backfill    | 1,061 events, 205 books, Sept 2022 - May 2026
Periodic sync              | 30-min background loop, automatic new listen ingestion
/api/reading/current       | Combined highlights + listens + now-playing
POST /api/ingest/audiomark | Voice-triggered passage transcription, ~5s total
highlight quick-sync       | Highlight ingest auto-checks Plex for title/author context