Building the Reading Room
The Starting Point
The session started with a straightforward goal: clean up the book detail page (/book/{ratingKey}) and make it functional. What shipped was something fundamentally different — a reading room where live Whisper transcription scrolls in sync with audiobook playback.
The Accidental Product Insight
Midway through wiring up the transcription waterfall on the book page, something clicked. Scrolling text synchronized to audio narration acts as an attention anchor. For ADHD brains, the moving text provides just enough visual stimulation to keep focus locked on the audio content. It’s not note-taking, not retrieval — it’s reading. The core use case shifted from “audiobook player with metadata” to “reading room with live transcription.”
This reframed every design decision. The book page got an 8/4 grid layout: waterfall transcript on the left (the main event), metadata sidebar on the right (supporting context). Reading mode tabs: Listen (default, waterfall), Passages, Chapters. The floating player bar stays hidden until you actually hit play. The active line gets a subtle warm left border that slides down as narration flows.
The hierarchy says: you’re here to read. Everything else is secondary.
Shared Transcription Engine
The reading page (/static/reading.html) already had a working transcription engine. Rather than duplicating it, the engine got extracted into a shared transcribe.js module exposing a TranscriptionEngine class. Both the reading page and the book page import it. Same Whisper pipeline, same waterfall rendering, different layouts.
This was the right call architecturally but required careful state management — the engine needs to know about the audio element, the display container, and the chapter boundaries, all of which differ between pages.
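A minimal sketch of the shape such a shared engine might take. The constructor options and method names here are illustrative assumptions, not the actual transcribe.js API — the point is that page-specific state is injected rather than baked in:

```javascript
// Illustrative sketch only -- the real TranscriptionEngine in transcribe.js
// almost certainly differs. Page-specific state (audio element, display
// container, chapter boundaries) is injected, so the same pipeline can
// drive both the reading page and the book page.
class TranscriptionEngine {
  constructor({ audioEl, container, chapters }) {
    this.audioEl = audioEl;     // the page's <audio> element
    this.container = container; // waterfall display target
    this.chapters = chapters;   // chapter boundaries, for position math
    this.lines = [];            // [{ text, start }] in playback order
  }

  // Record a transcribed line with its start time (seconds into the chapter).
  addLine(text, start) {
    this.lines.push({ text, start });
  }

  // Index of the line the narrator is currently on, given playback time.
  // This is what drives the sliding "active line" highlight in the waterfall.
  activeIndex(currentTime) {
    let idx = -1;
    for (let i = 0; i < this.lines.length; i++) {
      if (this.lines[i].start <= currentTime) idx = i;
      else break;
    }
    return idx;
  }
}
```

Each page constructs the engine with its own audio element and container, which is exactly the state-management care the extraction required.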
Multi-Chapter Playback
Single-chapter playback was easy. Multi-chapter is where audiobooks get interesting. The audio.ended handler now loads the next chapter seamlessly. Chapter markers appear on the seek bar (clickable with hover tooltips, hidden for books with 30+ chapters to avoid visual noise). A resolveAbsolutePosition() function converts album-level timestamps to chapter + offset pairs.
The tricky part: Plex reports positions at the album level (total seconds from start), but audio files are per-chapter. Every seek, every audiomark, every progress calculation needs this translation layer.
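The translation reduces to a walk over per-chapter durations. The actual resolveAbsolutePosition() signature is an assumption here; this sketch shows the idea:

```javascript
// Convert an album-level position (total seconds from the start of the book)
// into a { chapter, offset } pair. Assumes chapterDurations is an array of
// per-file durations in seconds, in playback order -- the real function's
// inputs may be shaped differently.
function resolveAbsolutePosition(albumSeconds, chapterDurations) {
  let remaining = albumSeconds;
  for (let i = 0; i < chapterDurations.length; i++) {
    if (remaining < chapterDurations[i]) {
      return { chapter: i, offset: remaining };
    }
    remaining -= chapterDurations[i];
  }
  // Position at or past the end (e.g. rounding): clamp to the final chapter.
  const last = chapterDurations.length - 1;
  return { chapter: last, offset: chapterDurations[last] };
}
```

The inverse direction (chapter + offset back to album seconds) is a running sum over the same array, which is why every seek and progress calculation funnels through this one layer.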
Web Audiomark Button
The bookmark icon on the floating player saves the current position plus the waterfall transcript to the database. No Plex session required — it POSTs title, position, and transcript directly from browser state. This means you can audiomark from the web player without the iOS Shortcut, without Plex even knowing you’re listening.
Two bugs burned time here: the transcript selector was targeting #karaoke-box instead of #karaoke-inner, and playerState.bookTitle wasn’t being set before the stream loaded.
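In outline, the save path looks something like the sketch below. The endpoint path, payload field names, and state shape are assumptions for illustration; the selector detail reflects the #karaoke-inner lesson above:

```javascript
// Build the audiomark payload purely from browser state -- no Plex session
// involved. Field names and the playerState shape are assumed for this sketch.
function buildAudiomarkPayload(playerState, transcriptLines) {
  return {
    title: playerState.bookTitle,   // must be set before the stream loads
    position: playerState.position, // album-level seconds
    transcript: transcriptLines.join('\n'),
  };
}

// Scrape the waterfall from the inner container and POST it.
// '/api/audiomark' and the '.line' class are hypothetical names.
async function saveAudiomark(playerState) {
  const lines = [...document.querySelectorAll('#karaoke-inner .line')]
    .map(el => el.textContent);
  return fetch('/api/audiomark', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildAudiomarkPayload(playerState, lines)),
  });
}
```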
Format-Aware FFmpeg
Both the transcribe-segment and clip pipelines were hardcoded to output .m4a. MP3 audiobooks crashed with “codec not supported in container.” The fix: detect source format from the file extension and match the output. MP3 stays MP3, M4A stays M4A. Simple once you know the problem.
Race Conditions and Cleanup
Clicking between books rapidly caused stale fetch responses to overwrite the current page. A swapGeneration counter now discards responses from previous selections. Also killed a 30-second polling loop that was refreshing the hero section when nothing was playing — unnecessary network chatter.
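The generation-counter pattern is small enough to show whole. Function and endpoint names here are assumptions; the injectable fetchFn exists only to make the sketch testable:

```javascript
// Each new selection bumps the counter; any response that comes back
// carrying a stale generation is dropped instead of rendered.
// '/api/book/...' is a hypothetical endpoint path.
let swapGeneration = 0;

async function loadBook(ratingKey, fetchFn = fetch) {
  const gen = ++swapGeneration;
  const res = await fetchFn(`/api/book/${ratingKey}`);
  const data = await res.json();
  if (gen !== swapGeneration) return null; // a newer click won; discard
  return data;
}
```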
What Shipped
- Live transcription waterfall on book page (8/4 grid layout)
- Shared TranscriptionEngine module (transcribe.js)
- Multi-chapter playback with auto-advance
- Chapter markers on seek bar
- Web audiomark button (no Plex session needed)
- Format-aware FFmpeg pipelines
- Track-level rating key detection and parent climbing
- Clickable addresses linking to globe map with fly-to
- Race condition fix with generation counter
- Recently played bumped to 100 books, 2-row scroll
- Today stats timezone fix (UTC to EST)
The Takeaway
The best product decisions come from using your own tool. Nobody planned “ADHD focus mode via scrolling transcription.” It emerged from building the feature and accidentally discovering it works. The session started as UI polish and ended as a reading room. Scope crept, but it crept toward the actual use case.