From Plex Token Hunt to 'Hey Siri, Audiomark'
This started with “check synology-garden for the Plex API key.” It ended with a working voice-triggered audiobook transcription pipeline that runs in about 5 seconds.
The Starting Point
Life-dashboard had reading/highlight support but nothing for audiobooks. Plex has been my audiobook player for years — 205 books, 3.5 years of listening history — but that data was trapped inside the Plex database. The goal was integration.
First task: find the Plex token. Dug through the synology-garden project (NAS management scripts) where it was stored in a config file.
Phase 1: Backfill History
Plex’s /status/sessions/history/all endpoint exposes complete listening history with pagination. Built an async backfill script:
- Paginated through all history entries
- Filtered to audiobook library (excluding music/video)
- Extracted title, author, duration, timestamps
- Fetched book descriptions from album metadata
- Converted to UnifiedEvent format with deterministic IDs
Result: 1,061 events, 205 unique books, Sept 2022 through May 2026. Idempotent — safe to re-run.
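A rough sketch of the backfill loop, not the actual script. `fetch_page`, `to_event`, and the event field names are hypothetical. The ID recipe follows the SHA256(source + timestamp + content_prefix) scheme listed under Key Technical Decisions. Plex paginates with `X-Plex-Container-Start` and `X-Plex-Container-Size`. The Plex field names (`viewedAt`, `parentTitle`, `grandparentTitle`) are what I would expect for track history entries and should be checked against your server's JSON. The fetcher is injected so the loop runs without a server.

```python
import hashlib
from typing import Callable, Dict, Iterator, List

PAGE_SIZE = 100  # assumed page size, not taken from the original script


def event_id(source: str, timestamp: int, content: str, prefix_len: int = 64) -> str:
    """Deterministic ID: the same inputs always hash to the same ID."""
    raw = f"{source}|{timestamp}|{content[:prefix_len]}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def iter_history(fetch_page: Callable[[int, int], List[Dict]]) -> Iterator[Dict]:
    """Yield history entries page by page until a short page signals the end.

    fetch_page(start, size) is a hypothetical callable that would GET
    /status/sessions/history/all with X-Plex-Container-Start/Size set.
    """
    start = 0
    while True:
        page = fetch_page(start, PAGE_SIZE)
        yield from page
        if len(page) < PAGE_SIZE:
            break
        start += PAGE_SIZE


def to_event(entry: Dict) -> Dict:
    """Convert one Plex history entry into a hypothetical UnifiedEvent-like dict."""
    title = entry.get("parentTitle", "")         # album title, i.e. the book
    author = entry.get("grandparentTitle", "")   # artist, i.e. the author
    ts = int(entry["viewedAt"])
    return {
        "id": event_id("plex", ts, f"{author}:{title}"),
        "source": "plex",
        "timestamp": ts,
        "title": title,
        "author": author,
    }
```

Because the ID depends only on the entry's own data, ingesting the same history twice yields the same IDs, which is what makes a re-run safe as long as the store upserts by ID.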
Phase 2: Periodic Sync
Added a 30-minute background loop (alongside the existing Threads proxy sync) that checks Plex for new listening sessions and ingests them. No more manual backfill needed — new listens appear automatically.
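The loop itself can be as small as the sketch below. This is an illustration under assumptions rather than the code that shipped: `sync_once` is a hypothetical coroutine that polls Plex and returns how many new events it stored, and `max_runs` exists only so the loop can be exercised in a test.

```python
import asyncio
from typing import Awaitable, Callable, Optional

SYNC_INTERVAL_SECONDS = 30 * 60  # the 30-minute cadence described above


async def periodic_sync(
    sync_once: Callable[[], Awaitable[int]],
    interval: float = SYNC_INTERVAL_SECONDS,
    max_runs: Optional[int] = None,
) -> int:
    """Call sync_once() every `interval` seconds; return total events ingested."""
    total = 0
    runs = 0
    while max_runs is None or runs < max_runs:
        try:
            total += await sync_once()
        except Exception as exc:
            # One failed poll (NAS asleep, Plex restarting) should not kill the loop.
            print(f"plex sync failed: {exc!r}")
        runs += 1
        if max_runs is None or runs < max_runs:
            await asyncio.sleep(interval)
    return total
```

Since event IDs are deterministic, a sync that overlaps with the backfill or with a previous poll does not create duplicates.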
Phase 3: Now-Playing in /api/reading/current
Updated the reading/current endpoint to combine:
- Text highlights (existing)
- Audiobook listening events (new)
- Plex real-time now-playing status
If you’re currently listening to an audiobook, it shows up with live progress.
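One way to picture the combined response is the sketch below. The response shape, the `build_current` and `playback_progress` names, and the `timestamp` field are all assumptions for illustration. The real endpoint may be structured differently.

```python
from typing import Dict, List, Optional


def playback_progress(view_offset_ms: int, duration_ms: int) -> float:
    """Fraction played, clamped to [0, 1]; inputs are milliseconds."""
    if duration_ms <= 0:
        return 0.0
    return max(0.0, min(1.0, view_offset_ms / duration_ms))


def build_current(
    highlights: List[Dict],
    listens: List[Dict],
    now_playing: Optional[Dict],
    limit: int = 10,
) -> Dict:
    """Merge highlight and listening events, newest first, plus live status."""
    recent = sorted(highlights + listens, key=lambda e: e["timestamp"], reverse=True)
    return {"now_playing": now_playing, "recent": recent[:limit]}
```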
Phase 4: The Audiomark Endpoint
This is where it got interesting. The realization:
Plex exposes the exact file path on the NAS. The NAS is mounted locally. ffmpeg can seek to any position in a container file in about 0.15 seconds. mlx_whisper transcribes 2 minutes of audio in about 4 seconds on an M2 Pro.
That chain means: query what’s playing, extract the audio around the current position, transcribe it, save it. Total time: ~5 seconds.
Built POST /api/ingest/audiomark:
- Hit Plex /status/sessions for current playback
- Map NAS path to local mount (/volume1/ to /Volumes/)
- Calculate album-wide progress (sum all tracks, not just current file)
- ffmpeg extract +/- 1 minute around current position
- mlx_whisper transcribe the clip
- Store as searchable highlight event
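The path mapping and clip extraction steps can be sketched as below. This is a minimal sketch under assumptions, not the endpoint's actual code. The helper names are hypothetical, and the mono 16 kHz output is my choice to keep the clip small. `-ss` and `-t` are placed before `-i` so ffmpeg seeks on the input instead of decoding from the start of the file.

```python
from typing import List, Tuple

NAS_PREFIX = "/volume1/"    # path as Plex reports it on the NAS
LOCAL_PREFIX = "/Volumes/"  # where the share is mounted on the Mac


def map_nas_path(nas_path: str) -> str:
    """Rewrite a NAS-side path to the local mount."""
    if not nas_path.startswith(NAS_PREFIX):
        raise ValueError(f"unexpected NAS path: {nas_path}")
    return LOCAL_PREFIX + nas_path[len(NAS_PREFIX):]


def clip_window(offset_ms: int, track_ms: int, half_width_s: float = 60.0) -> Tuple[float, float]:
    """Return (start_seconds, duration_seconds), clamped to the track bounds."""
    pos = offset_ms / 1000
    start = max(0.0, pos - half_width_s)
    end = min(track_ms / 1000, pos + half_width_s)
    return start, end - start


def ffmpeg_extract_cmd(src: str, start_s: float, duration_s: float, out_wav: str) -> List[str]:
    """Build the ffmpeg argv for cutting the clip (run it with subprocess.run)."""
    return [
        "ffmpeg", "-y",
        "-ss", f"{start_s:.3f}",     # input option: fast seek
        "-t", f"{duration_s:.3f}",   # input option: read only this much
        "-i", src,
        "-ac", "1", "-ar", "16000",  # mono, 16 kHz
        out_wav,
    ]
```

Clamping matters near track boundaries: a bookmark 30 seconds into a file yields a 90-second clip rather than a negative seek.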
Phase 5: Docker to Host Migration
The pipeline requires:
- NAS mount access (not available in Docker)
- ffmpeg binary
- mlx_whisper (MLX inference on the Apple Silicon GPU, which a Linux container can't reach)
Moved life-dashboard from Docker to uvicorn running directly on the host. Updated the launchd plist (the macOS equivalent of a systemd unit) accordingly.
The Debugging
mlx_whisper output path: It writes the transcript into --output-dir, named after the input file’s basename with the extension swapped, not after the full input path. When the input was /tmp/audiomark_clip.wav, the output was audiomark_clip.txt in the output dir. Took a minute to catch.
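A small helper avoids guessing where the transcript landed. This is a sketch based only on the behaviour observed above. `expected_transcript_path` is a hypothetical name, and the `txt` extension assumes text output.

```python
from pathlib import Path


def expected_transcript_path(input_audio: str, output_dir: str, ext: str = "txt") -> Path:
    """Where the transcript should appear: <output_dir>/<input stem>.<ext>."""
    return Path(output_dir) / f"{Path(input_audio).stem}.{ext}"
```

For the case above, `expected_transcript_path("/tmp/audiomark_clip.wav", "/tmp/out")` points at `/tmp/out/audiomark_clip.txt`.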
Album progress vs track progress: Plex reports the offset within the current track. A book split across 47 files might show “95% complete” when you’re only 95% through track 12 of 47, which is about a quarter of the way through the book if the tracks are similar lengths. Had to fetch the full track list and sum durations.
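The arithmetic is simple once the track list is in hand. A minimal sketch, assuming durations and offsets in milliseconds and a zero-based track index (`album_progress` is a hypothetical name):

```python
from typing import List


def album_progress(track_durations_ms: List[int], current_index: int, offset_ms: int) -> float:
    """Fraction of the whole book played: finished tracks plus offset in the current one."""
    total = sum(track_durations_ms)
    if total <= 0:
        return 0.0
    elapsed = sum(track_durations_ms[:current_index]) + offset_ms
    return min(1.0, elapsed / total)
```

With 47 equal-length tracks, being 95% through track 12 (index 11) comes out at roughly 25% of the book. A single-file book reduces to the plain track offset, so the same function covers both cases.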
Proper noun mangling: whisper-base is fast but inaccurate on names. “Crysknife” -> “Christ’s knife”, “Leto” -> “Latos”. Acceptable for bookmarking (you know what book you’re in), but worth noting.
The Result
A Siri Shortcut that sends a single POST request. Five seconds later, the passage you’re listening to is transcribed and searchable in the life-dashboard event log. Title, author, progress, and the actual text of the passage — all indexed.
This is the audiobook bookmarking workflow I’ve been trying to build for years. The missing piece was realizing that Plex + local NAS mount + Apple Silicon inference eliminates every bottleneck that made previous attempts impractical.
Key Technical Decisions
- Deterministic event IDs: SHA256(source + timestamp + content_prefix). Re-running backfill is safe.
- Album-wide progress: Sum all tracks, not just current file. Consistent across single-file and multi-file books.
- whisper-base over whisper-large: 4s vs 30s+ transcription time. Speed matters for a voice-triggered UX.
- +/- 1 minute extraction: Captures context around the moment you triggered the bookmark.
- Host uvicorn over Docker: Required for NAS mount + ffmpeg + MLX access. Trade-off accepted.
What Shipped
| Feature | Details |
|---|---|
| Plex audiobook backfill | 1,061 events, 205 books, Sept 2022 - May 2026 |
| Periodic sync | 30-min background loop, automatic new listen ingestion |
| /api/reading/current | Combined highlights + listens + now-playing |
| POST /api/ingest/audiomark | Voice-triggered passage transcription, ~5s total |
| highlight quick-sync | Highlight ingest auto-checks Plex for title/author context |