Audiobook Highlights to Threads: Building the Pipeline Nobody Asked For

Tags: claude-code, threads-analysis, life-dashboard, threads-api, automation

Started with “I want to post to Threads programmatically.” Ended with an automated pipeline that transcribes audiobook passages and posts them as quotation-formatted highlights. Probably the only person on Earth running this particular stack.


The Goal

The threads-analysis project already pulls 50K+ posts from @maybe_foucault for analysis. That’s read-only. The new goal: write to Threads. Specifically, automatically share audiobook highlights captured by the audiomark system (Siri trigger, Plex extraction, mlx_whisper transcription) as formatted posts on a new account.

Simpthreads: A Separate Publishing App

Key decision: keep the publishing bot (dont_mind_me_bitte) on a completely separate Meta app from the read-only analysis app (ByTheWeiCo). Different OAuth tokens, different scopes, different rate limit pools. If the bot gets rate-limited or flagged, analysis keeps running.

Built two scripts in threads-analysis:

  • scripts/simpthreads-oauth.mjs — handles the OAuth dance for getting long-lived tokens
  • scripts/simpthreads-post.mjs — CLI posting tool for testing

The OAuth flow surfaced the first gotcha: Meta App ID and Threads App ID are different things. The parent app ID is prominently displayed in the developer console. The Threads product ID is buried in settings. OAuth needs the Threads one. This cost more time than it should have.
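
To make the App ID distinction concrete, here is a minimal sketch of building the authorization URL. The endpoint and scope names are what I understand Meta's Threads docs to specify, so treat them as assumptions and check the current docs. The function name, parameter names, and the example redirect URI are made up for illustration and are not taken from simpthreads-oauth.mjs.

```python
from urllib.parse import urlencode

# Assumption: endpoint and scope names as documented for the Threads API;
# verify against the current Meta developer docs before relying on them.
AUTHORIZE_URL = "https://threads.net/oauth/authorize"
DEFAULT_SCOPES = ("threads_basic", "threads_content_publish")


def build_authorize_url(threads_app_id: str, redirect_uri: str,
                        scopes: tuple = DEFAULT_SCOPES) -> str:
    """Return the URL the user visits to grant access (hypothetical helper)."""
    params = {
        # Must be the Threads app ID, not the parent Meta app ID
        "client_id": threads_app_id,
        "redirect_uri": redirect_uri,
        "scope": ",".join(scopes),
        "response_type": "code",
    }
    return f"{AUTHORIZE_URL}?{urlencode(params)}"
```

Passing the parent Meta App ID as client_id is exactly the mistake described above, which is why the parameter is named threads_app_id here.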

For tester accounts, the User Token Generator in the developer console sidesteps the entire OAuth flow. Just click, get token. Would have been nice to know that 45 minutes earlier.

The Threading Bug

Splitting long passages into threaded replies seemed straightforward. Post the root, then reply to each previous reply to build a chain. Wrong.

Threads threading did not work the way I expected. Every reply has to set reply_to_id to the root post’s ID. Replying to a reply creates a separate conversation. The API doesn’t error; it just silently posts into the void. I discovered this only after checking the actual Threads app and seeing scattered orphan posts instead of a clean thread.

The fix is simple once you know: every reply targets the root. But the debugging cost of “it posted successfully but looks wrong” is real.
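
A minimal sketch of that fix, based on the behavior observed above rather than on anything official: build the container parameters for each follow-up chunk so that all of them point at the root. The helper name is hypothetical. media_type, text, and reply_to_id are the Threads API parameter names as I understand them.

```python
def build_reply_payloads(root_id: str, chunks: list) -> list:
    """Build container parameters for every chunk after the root post.

    Every payload targets root_id. Chaining to the previous reply's ID
    is what produced the orphaned posts described above.
    """
    return [
        {"media_type": "TEXT", "text": chunk, "reply_to_id": root_id}
        for chunk in chunks
    ]
```

The replies still need to be published in order so they show up in reading order under the root.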

Wiring Into Life-Dashboard

The pipeline lives in app/threads_bot.py in life-dashboard. It hooks into the existing audiomark ingest endpoint with an opt-in share: true flag.

The flow:

  1. iOS audiomark button triggers Plex clip extraction + whisper transcription (existing pipeline)
  2. If share: true, hand the transcript to the Threads bot via a fire-and-forget asyncio.create_task
  3. Bot sanitizes the transcript (redacts terms that trigger Threads moderation)
  4. Wraps passage in quotation marks with title/author/progress footer
  5. If over 500 chars, splits at sentence boundaries into threaded replies
  6. Posts via two-step API: create container, then publish
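
Step 5 can be sketched roughly like this. It is an illustration, not the code in threads_bot.py: the regex is a naive sentence splitter (it will trip on abbreviations like "Dr."), the function name is hypothetical, and the hard split for a single over-long sentence is my own fallback.

```python
import re

THREADS_CHAR_LIMIT = 500


def split_at_sentences(text: str, limit: int = THREADS_CHAR_LIMIT) -> list:
    """Greedily pack whole sentences into chunks of at most `limit` chars."""
    # Naive boundary: whitespace that follows ., ! or ?
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # Fallback: a single sentence longer than the limit is cut hard
        while len(sentence) > limit:
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        current = sentence
    if current:
        chunks.append(current)
    return chunks
```

This sketch only counts the passage text. The quotation marks and the title/author footer eat into the 500 characters too, so a real version would pass a smaller limit for the chunk that carries them.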

Default is NOT sharing. The share: true flag is explicit opt-in so audiomarks don’t auto-spam Threads every time someone says “Hey Siri, audiomark.”

Sanitization: Output Only

Critical design choice: sanitization only happens for the Threads output. The database keeps the original verbatim transcript. Audiobook passages contain all kinds of language — racial slurs in historical fiction, explicit content, clinical terminology. The DB is a personal archive. Threads has content moderation. Different contexts, different rules.

The sanitizer maintains a redaction list that replaces flagged terms with bracketed placeholders. Clean enough for Threads, lossless in the DB.
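
A minimal sketch of an output-only sanitizer, assuming a simple term-to-placeholder mapping. The two entries below are harmless placeholders I made up. The real redaction list and placeholder wording are not part of this post.

```python
import re

# Hypothetical redaction list: term -> bracketed placeholder
REDACTIONS = {"darn": "[expletive]", "heck": "[expletive]"}

# Whole-word, case-insensitive match for any listed term
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(term) for term in REDACTIONS) + r")\b",
    re.IGNORECASE,
)


def sanitize_for_threads(transcript: str) -> str:
    """Return a redacted copy for posting; the input string is not modified."""
    return _PATTERN.sub(lambda m: REDACTIONS[m.group(1).lower()], transcript)
```

Because the function returns a new string, the caller can store the verbatim transcript in the database and pass only the sanitized copy to the posting code.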

Post Formatting

Every post follows a consistent format:

"Transcribed passage text here, wrapped in quotation marks
and split at sentence boundaries if needed."

-- Title by Author (42% complete)

#bytheweiaudiomark

The topic_tag parameter (no periods or ampersands allowed) categorizes all bot posts. The progress footer gives context for where in the book the passage appears.
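
Here is a small sketch of the formatter and the tag check. The function names are hypothetical, and I am assuming progress arrives as a fraction between 0 and 1 and that the tag is sent through topic_tag rather than written into the post text. The validator only covers the two characters mentioned above; the API may reject others.

```python
def validate_topic_tag(tag: str) -> str:
    """Reject empty tags and the characters noted above (periods, ampersands)."""
    if not tag or any(ch in tag for ch in ".&"):
        raise ValueError(f"invalid topic tag: {tag!r}")
    return tag


def format_post(passage: str, title: str, author: str, progress: float) -> str:
    """Wrap the passage in quotes and append the title/author/progress footer."""
    percent = round(progress * 100)
    return f'"{passage.strip()}"\n\n-- {title} by {author} ({percent}% complete)'
```

Validating the tag before building the container turns a rejected API call into a local error that is easier to debug.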

First Live Test

Posted highlights from “I Am Not a Robot” by Joanna Stern. A book about human-technology relationships, posted by an AI-powered pipeline. The irony was not lost.

The pipeline worked end-to-end: Siri trigger on iPhone, Plex clip extraction on Mac mini, whisper transcription, sanitization, Threads API publish. All automated. All under 10 seconds from voice command to live post.

What Shipped

  • Threads posting bot (dont_mind_me_bitte) via Simpthreads Meta app
  • CLI posting tool (scripts/simpthreads-post.mjs) for testing
  • OAuth scripts (scripts/simpthreads-oauth.mjs) for token management
  • Audiomark-to-Threads pipeline in life-dashboard (app/threads_bot.py)
  • Opt-in sharing via share: true flag on audiomark ingest
  • Content sanitization layer (Threads output only, DB keeps original)
  • Sentence-boundary splitting for long passages with threaded replies
  • Topic tagging via Threads API topic_tag parameter

Technical Learnings Worth Remembering

  • Threads API two-step publish: container creation is where validation happens
  • reply_to_id must ALWAYS be the root post ID, never the previous reply
  • Meta App ID != Threads App ID — OAuth uses the Threads one
  • Threads char limit is 500, not 450 (many libraries get this wrong)
  • User Token Generator bypasses full OAuth for tester accounts
  • topic_tag rejects periods and ampersands
  • asyncio.create_task for fire-and-forget is the right pattern here — don’t block the audiomark response waiting for Threads to publish
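
For the two-step publish in the first bullet, a rough sketch of the call sequence. The base URL and the threads / threads_publish paths are what I understand the Threads API to use, so treat them as assumptions. The HTTP call is injected as a `post` function (a hypothetical signature) to keep the sketch independent of any particular HTTP client.

```python
from typing import Callable, Optional

# Assumption: base URL and endpoint paths per Meta's Threads API docs
BASE_URL = "https://graph.threads.net/v1.0"

# (url, form params) -> parsed JSON response
PostFn = Callable[[str, dict], dict]


def publish_text(post: PostFn, user_id: str, token: str, text: str,
                 reply_to_id: Optional[str] = None) -> str:
    """Create a text container, publish it, and return the published post ID."""
    params = {"media_type": "TEXT", "text": text, "access_token": token}
    if reply_to_id is not None:
        # Per the threading lesson above, this should be the root post ID
        params["reply_to_id"] = reply_to_id
    # Step 1: create the container; validation errors surface here
    container = post(f"{BASE_URL}/{user_id}/threads", params)
    # Step 2: publish the container by its creation ID
    published = post(
        f"{BASE_URL}/{user_id}/threads_publish",
        {"creation_id": container["id"], "access_token": token},
    )
    return published["id"]
```

In real use, `post` would wrap an HTTP client, raise on non-2xx responses, and return the decoded JSON body.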