Digest Case Study
Building a production news synthesis pipeline that turns raw RSS feeds into personalised, LLM-written email briefings.
Problem
Keeping up with a broad set of news sources means either spending an hour skimming or missing things entirely. Existing aggregators surface links — they don’t synthesise. The goal was a system that reads across sources, groups related coverage, writes a coherent briefing, and delivers it on a schedule.
Approach
Built a multi-stage pipeline:
- Ingestion — RSS feeds polled on a schedule, articles deduplicated by URL, full text extracted via newspaper3k and BeautifulSoup.
- Clustering — entity-boosted TF-IDF clustering (spaCy for entity extraction, scikit-learn for vectorisation) groups related articles by default; optional semantic clustering with OpenAI embeddings for higher-quality grouping.
- Synthesis — an LLM writes a briefing for each story cluster. Multi-provider routing (Anthropic Claude, Google Gemini, OpenAI) with a circuit-breaker pattern fails over to the next provider on failure. Enrichments add social pulse (Hacker News reactions), expert voice (attributed quotes), and confidence indicators.
- Post-processing — story threading attaches new items to existing threads or suppresses near-duplicates; a corrections layer flags cross-story contradictions.
- Delivery — personalised digests assembled from topic preferences and delivery schedules, sent via Resend. A weekly Saturday edition (“The Arc”) covers longer-running threads for Plus-tier users.
- Operations — a dashboard tracks pipeline metrics, deliverability, per-feature LLM usage, and alerts. Magic-link auth, reading lists, and delivery webhooks round out the product surface.
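URL deduplication is only as reliable as the URL comparison behind it. A minimal sketch of one approach, assuming that links differing only in host case, a trailing slash, a fragment, or common tracking parameters (`utm_*`, `fbclid`, `gclid`) should count as the same article. `normalise_url`, `dedupe_by_url`, and the tracking-parameter list are hypothetical, not taken from the Digest codebase; the `seen` set stands in for whatever persistent store tracks previously ingested URLs.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical tracking parameters to ignore when comparing URLs.
TRACKING_PREFIXES = ("utm_",)
TRACKING_KEYS = {"fbclid", "gclid"}


def normalise_url(url: str) -> str:
    """Canonicalise a URL so trivially different links compare equal."""
    parts = urlsplit(url.strip())
    # Keep query parameters except tracking ones, sorted for a stable order.
    query = sorted(
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k not in TRACKING_KEYS and not k.startswith(TRACKING_PREFIXES)
    )
    path = parts.path.rstrip("/") or "/"
    # Lowercase scheme and host, and drop the fragment entirely.
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, urlencode(query), "")
    )


def dedupe_by_url(items, seen=None):
    """Yield feed items whose normalised URL has not been seen before."""
    seen = set() if seen is None else seen
    for item in items:
        key = normalise_url(item["link"])
        if key not in seen:
            seen.add(key)
            yield item
```

Sorting the remaining query parameters means `?a=1&b=2` and `?b=2&a=1` collapse to the same key; whether that is safe depends on the feeds being polled.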
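The entity-boosted clustering step can be illustrated without spaCy or scikit-learn. This sketch assumes entities have already been extracted (the role spaCy plays in the pipeline) and are passed in alongside each article's text; each entity becomes an extra `ent:` term whose TF-IDF weight is multiplied by `ENTITY_BOOST`. The IDF formula mirrors scikit-learn's `TfidfVectorizer` default (`smooth_idf=True`, L2 normalisation), while the boost value, the 0.3 similarity threshold, and the greedy single-link assignment are assumptions for illustration, not the Digest's actual parameters.

```python
import math
import re
from collections import Counter

# Assumed values for illustration; not the Digest's real settings.
ENTITY_BOOST = 2.0
SIM_THRESHOLD = 0.3


def tokenize(text):
    """Lowercase word tokens; a stand-in for a real tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vectors(docs):
    """docs: list of (text, entities). Returns L2-normalised sparse TF-IDF dicts."""
    term_counts = []
    for text, entities in docs:
        counts = Counter(tokenize(text))
        for ent in entities:
            # Each entity becomes one extra term, e.g. "ent:apple".
            counts["ent:" + ent.lower()] += 1
        term_counts.append(counts)

    n = len(term_counts)
    df = Counter()
    for counts in term_counts:
        df.update(counts.keys())  # document frequency: +1 per doc containing term

    vectors = []
    for counts in term_counts:
        vec = {}
        for term, tf in counts.items():
            # Smoothed IDF, matching TfidfVectorizer(smooth_idf=True).
            idf = math.log((1 + n) / (1 + df[term])) + 1
            weight = tf * idf
            if term.startswith("ent:"):
                weight *= ENTITY_BOOST
            vec[term] = weight
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(a, b):
    """Dot product of two L2-normalised sparse vectors, i.e. cosine similarity."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def cluster(docs, threshold=SIM_THRESHOLD):
    """Greedy single-link clustering: join the cluster holding the most similar article."""
    vectors = build_vectors(docs)
    clusters = []  # each cluster is a list of document indices
    for i, vec in enumerate(vectors):
        best, best_sim = None, threshold
        for members in clusters:
            sim = max(cosine(vec, vectors[j]) for j in members)
            if sim >= best_sim:
                best, best_sim = members, sim
        if best is None:
            clusters.append([i])
        else:
            best.append(i)
    return clusters
```

Boosting entity terms pulls together articles that mention the same company or place even when their wording differs, which is the reason to weight entities above ordinary vocabulary.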
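Story threading and near-duplicate suppression come down to two similarity thresholds. A minimal sketch, assuming a new item is compared with the latest item in each existing thread using word-set Jaccard similarity: at or above `DUP_THRESHOLD` it is suppressed, at or above `THREAD_THRESHOLD` it is attached to that thread, otherwise it starts a new one. The thresholds, the `Thread` class, and `place_item` are hypothetical; the real pipeline may use a different similarity measure entirely.

```python
import re
from dataclasses import dataclass, field

# Hypothetical thresholds for illustration.
THREAD_THRESHOLD = 0.25  # similar enough to continue an existing thread
DUP_THRESHOLD = 0.8      # so similar the item adds nothing new


def terms(text):
    """Set of lowercase word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


@dataclass
class Thread:
    thread_id: int
    items: list = field(default_factory=list)


def place_item(summary, threads):
    """Return ("suppress" | "attach" | "new", thread) for a new story summary."""
    new_terms = terms(summary)
    best, best_sim = None, 0.0
    for thread in threads:
        # Compare against the most recent item in each thread.
        sim = jaccard(new_terms, terms(thread.items[-1]))
        if sim > best_sim:
            best, best_sim = thread, sim
    if best is not None and best_sim >= DUP_THRESHOLD:
        return "suppress", best
    if best is not None and best_sim >= THREAD_THRESHOLD:
        best.items.append(summary)
        return "attach", best
    thread = Thread(thread_id=len(threads) + 1, items=[summary])
    threads.append(thread)
    return "new", thread
```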
Outcome
- Live at thedigest.co.
- 861 tests passing across the pipeline, delivery, and ops layers.
- Multi-provider fallback means synthesis continues when any single LLM provider has an outage.
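The failover behaviour behind that outcome can be sketched as a per-provider circuit breaker. This assumes each provider is wrapped as a plain callable that takes a prompt and returns text (no real SDK calls are shown), and that a breaker opens after a fixed number of consecutive failures and allows a trial call once a cooldown has passed. `CircuitBreaker`, `synthesise`, `AllProvidersFailed`, and the default thresholds are hypothetical names and values, not the Digest's implementation.

```python
import time


class CircuitBreaker:
    """Opens after `max_failures` consecutive failures; allows a trial call after `cooldown` seconds."""

    def __init__(self, max_failures=3, cooldown=60.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.cooldown = cooldown
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow(self):
        if self.opened_at is None:
            return True
        # Half-open: once the cooldown has elapsed, allow a trial call.
        # A failure re-opens the circuit; a success closes it.
        return self.clock() - self.opened_at >= self.cooldown

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = self.clock()


class AllProvidersFailed(Exception):
    """Raised with a {provider: reason} dict when no provider succeeds."""


def synthesise(prompt, providers, breakers):
    """Try (name, callable) providers in priority order, skipping open circuits."""
    errors = {}
    for name, call in providers:
        breaker = breakers[name]
        if not breaker.allow():
            errors[name] = "circuit open"
            continue
        try:
            result = call(prompt)
        except Exception as exc:
            breaker.record_failure()
            errors[name] = repr(exc)
            continue
        breaker.record_success()
        return name, result
    raise AllProvidersFailed(errors)
```

Skipping an open circuit, rather than retrying it on every request, keeps a provider that is mid-outage from adding timeout latency to each briefing.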
Links
- Live site: thedigest.co
- GitHub: github.com/deanem/digest
Stack
Python, Flask, PostgreSQL, spaCy, scikit-learn, Anthropic Claude / Google Gemini / OpenAI, Resend