
         _                          _
 _ __ (_)_ __   ___ _ __   ___  ___| |_
| '_ \| | '_ \ / _ \ '_ \ / _ \/ __| __|
| |_) | | |_) |  __/ |_) | (_) \__ \ |_
| .__/|_| .__/ \___| .__/ \___/|___/\__|
|_|     |_|        |_|

CI PyPI version Python 3.11+ Tests Coverage License: AGPL-3.0 Code style: ruff

PipePost

Open-source AI content curation pipeline -- scout, translate, and publish articles from any domain automatically.

                        P I P E L I N E

  SOURCES          SCOUT       TRANSLATE     PUBLISH         DESTINATIONS
 ----------       -------      ---------     -------        --------------
  HackerNews        |             |             |            Webhook / CMS
  Reddit      ----> | Score  ---> | Rewrite --> | Fanout --> Telegram
  RSS/Atom          | Rank        | Adapt       | to N       Markdown
  DuckDuckGo        |             | Style       |            OpenClaw (23+)
  Custom            |             |             |            Custom

PipePost discovers articles from sources like HackerNews, Reddit, RSS feeds, and search engines, translates them to your target language using AI, and publishes to your blog or CMS. Works for any niche -- tech, business, health, lifestyle, and more.

PipePost demo — batch pipeline run



Features

  • 📡 Multiple Sources — HackerNews, Reddit, RSS/Atom, DuckDuckGo search
  • 🌍 AI Translation — Full paragraph-by-paragraph translation via any LLM (DeepSeek, Claude, GPT, Qwen, etc.)
  • 🔄 Content Rewriting — Deep AI-powered rewrite to make content 100% unique and undetectable by plagiarism checkers
  • 📝 Multiple Destinations — Webhook, Markdown, Telegram, OpenClaw (23+ channels)
  • 🤖 Telegram Bot — Interactive curation: scout candidates, approve/reject via inline buttons
  • 🎯 Smart Scoring — LLM-based candidate ranking by relevance, originality, and engagement
  • ✍️ Style Adaptation — Adapt content for blog, Telegram, newsletter, or Twitter thread
  • 📢 Fanout Publish — Publish to multiple destinations simultaneously
  • 📦 Batch Mode — Process multiple articles in one run (--batch -n 5)
  • 🔄 Composable Flows — Chain steps: dedup → scout → filter → score → fetch → translate → adapt → publish
  • 🔍 Smart Filtering — Filter candidates by keywords, domain blacklist, and title length
  • 💾 Deduplication — Async SQLite persistence (aiosqlite) prevents re-publishing across runs
  • 📊 Prometheus Metrics — Pipeline runs, step durations, error counters (optional)
  • ⚙️ Config-Driven Flows — Define entire pipelines in YAML without writing Python
  • 🧩 Plugin Architecture — Add sources and destinations with a single file
  • 🔁 Resilient Retries — Exponential backoff with jitter for LLM calls and HTTP destinations (5xx/timeout)
  • 🚦 Rate Limiting — Built-in semaphore-based concurrency control for external APIs
  • Fetch Caching — In-memory TTL cache avoids re-downloading the same article
  • 🔐 Secret References — Use ${ENV_VAR} in YAML configs to keep secrets out of files
  • 🐳 Docker Ready — docker compose up and go
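
The retry behavior in the list above follows the classic exponential-backoff-with-jitter pattern. A minimal sketch of that pattern (illustrative, not PipePost's actual implementation):

```python
import random


def backoff_delays(attempts: int, base: float = 0.5, cap: float = 30.0) -> list[float]:
    """Compute one delay per retry attempt: exponential growth, full jitter, capped."""
    delays = []
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))  # 0.5, 1, 2, 4, ... up to cap
        delays.append(random.uniform(0, ceiling))  # full jitter avoids thundering herds
    return delays
```

Sleeping for each delay between failed LLM or HTTP calls spreads retries out instead of hammering a struggling endpoint.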

Quick Start

# Install from PyPI
pip install pipepost

# Or from source
git clone https://github.com/DenSul/pipepost && cd pipepost
pip install -e .

# Configure
export PIPEPOST_MODEL=deepseek/deepseek-chat
export DEEPSEEK_API_KEY=your-key  # or OPENAI_API_KEY, ANTHROPIC_API_KEY, etc.

# List available components
pipepost sources
pipepost destinations
pipepost styles
pipepost flows

# Run a pipeline flow
pipepost run default --source hackernews --dest webhook --lang ru

# Preview without publishing (dry run)
pipepost run default --source hackernews --dry-run

# Batch mode — process multiple articles
pipepost run default --source hackernews --batch -n 5

# Use a config file
pipepost run --config pipepost.yaml --source hackernews

# Run interactive Telegram bot
export TELEGRAM_BOT_TOKEN=your-bot-token
pipepost bot --source hackernews --lang ru

# Validate config without running
pipepost validate --config pipepost.yaml

# Check health
pipepost health

Example batch output:

$ pipepost run default --source hackernews --batch -n 3 --lang ru

Batch: processed 3 article(s)
  [1] Восемь лет желания, три месяца работы с ИИ | 2026-04-05-vosem-let-zhelaniya | ok
  [2] Финская сауна усиливает иммунный ответ    | 2026-04-05-finskaya-sauna       | ok
  [3] Утечка email-адресов в BrowserStack        | 2026-04-05-utechka-email        | ok

Architecture

graph LR
    subgraph Sources
        HN[HackerNews]
        RD[Reddit]
        RSS[RSS/Atom]
        DDG[DuckDuckGo]
    end

    subgraph Pipeline
        Dedup[Dedup<br><i>SQLite</i>]
        Scout[Scout<br><i>fetch candidates</i>]
        Filter[Filter<br><i>keyword/domain</i>]
        Score[Score<br><i>LLM ranking</i>]
        Fetch[Fetch<br><i>download article</i>]
        Translate[Translate<br><i>LLM translation</i>]
        Rewrite[Rewrite<br><i>make unique</i>]
        Adapt[Adapt<br><i>style: blog/tg/thread</i>]
        Images[Images<br><i>download & rewrite</i>]
        Validate[Validate<br><i>quality check</i>]
    end

    subgraph Destinations
        WH[Webhook / CMS]
        MD[Markdown]
        TG[Telegram]
        OC[OpenClaw<br><i>23+ channels</i>]
    end

    HN & RD & RSS & DDG --> Dedup --> Scout --> Filter --> Score --> Fetch --> Translate --> Rewrite --> Adapt --> Images --> Validate
    Validate --> WH & MD & TG & OC

    style Pipeline fill:#1a1a2e,stroke:#16213e,color:#e0e0e0
    style Sources fill:#0f3460,stroke:#16213e,color:#e0e0e0
    style Destinations fill:#533483,stroke:#16213e,color:#e0e0e0

Every step is independent and composable. Define your pipeline in YAML -- no Python needed:

# pipepost.yaml — full pipeline config
sources:
  - name: hackernews
    min_score: 100

translate:
  model: deepseek/deepseek-chat
  target_lang: ru

rewrite:
  model: deepseek/deepseek-chat  # optional: separate model for rewriting
  creativity: 0.7                 # temperature (0.3–1.0)

destination:
  type: markdown
  output_dir: ./output

flow:
  steps: [dedup, scout, score, fetch, translate, rewrite, validate, publish, post_publish]
  score:
    niche: tech
  storage:
    db_path: pipepost.db

pipepost run --config pipepost.yaml --source hackernews

Add or remove steps from the flow.steps list to customize your pipeline. Available steps: dedup, scout, filter, score, fetch, translate, rewrite, adapt, images, validate, publish, fanout_publish, post_publish.

Advanced: custom flows in Python
from pipepost.core import Flow
from pipepost.steps import (
    AdaptStep, DeduplicationStep, FanoutPublishStep, FetchStep,
    PostPublishStep, RewriteStep, ScoutStep, ScoringStep, TranslateStep, ValidateStep,
)
from pipepost.storage import SQLiteStorage

storage = SQLiteStorage(db_path="my_project.db")

my_flow = Flow(
    name="my-pipeline",
    steps=[
        DeduplicationStep(storage=storage),
        ScoutStep(max_candidates=20),
        ScoringStep(niche="tech", max_score_candidates=5),
        FetchStep(max_chars=15000),
        TranslateStep(model="deepseek/deepseek-chat", target_lang="ru"),
        RewriteStep(creativity=0.7),
        AdaptStep(style="telegram"),
        ValidateStep(min_content_len=500),
        FanoutPublishStep(destination_names=["webhook", "telegram", "markdown"]),
        PostPublishStep(storage=storage),
    ],
)

Use Cases

Cooking & Food

sources:
  - name: food-news
    type: reddit
    subreddits: [cooking, recipes, AskCulinary]
  - name: food-search
    type: search
    queries:
      - "new restaurant trends 2026"
      - "seasonal recipes spring"

Cooking with Filter

# Combine sources with filtering
flow:
  steps: [dedup, scout, filter, fetch, translate, publish, post_publish]
  filter:
    keywords_include: ["recipe", "cooking", "restaurant"]
    keywords_exclude: ["sponsored", "advertisement", "affiliate"]
    domain_blacklist: ["buzzfeed.com"]
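
The filter semantics above are simple to state precisely: a candidate survives only if its title is long enough, its domain is not blacklisted, at least one include keyword matches, and no exclude keyword matches. A rough sketch (field names mirror the YAML; the function itself is hypothetical, not PipePost's code):

```python
from urllib.parse import urlparse


def passes_filter(
    title: str,
    url: str,
    keywords_include: tuple[str, ...] = (),
    keywords_exclude: tuple[str, ...] = (),
    domain_blacklist: tuple[str, ...] = (),
    min_title_length: int = 0,
) -> bool:
    """Keep a candidate only if it satisfies every filter rule."""
    lowered = title.lower()
    if len(title) < min_title_length:
        return False
    domain = urlparse(url).netloc.removeprefix("www.")
    if domain in domain_blacklist:
        return False
    if keywords_include and not any(k.lower() in lowered for k in keywords_include):
        return False  # at least one include keyword must match
    if any(k.lower() in lowered for k in keywords_exclude):
        return False  # no exclude keyword may match
    return True
```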

Travel & Adventure

sources:
  - name: travel-news
    type: search
    queries:
      - "best travel destinations 2026"
      - "budget travel tips Europe"
      - "digital nomad guides"

Finance & Investing

sources:
  - name: finance-news
    type: reddit
    subreddits: [personalfinance, investing]
  - name: finance-search
    type: search
    queries:
      - "stock market analysis today"
      - "personal finance strategies"

Health & Science

sources:
  - name: health-news
    type: search
    queries:
      - "health research breakthroughs"
      - "nutrition science news"
      - "mental health studies"

Tech & Programming

sources:
  - name: tech-news
    type: search
    queries:
      - "latest AI research papers"
      - "open source projects trending"

Sports & Fitness

sources:
  - name: sports-news
    type: reddit
    subreddits: [sports, fitness, running]
  - name: sports-search
    type: search
    queries:
      - "sports highlights this week"
      - "fitness training programs"

Sources

Source       Type         Description
hackernews   API          Top stories from Hacker News (Firebase API)
reddit       API          Top posts from configurable subreddits
rss          RSS/Atom     Any RSS or Atom feed URL
search       DuckDuckGo   Keyword-based article search

Destinations

Destination   Description
webhook       POST to any URL (WordPress REST API, Ghost, custom)
markdown      Save as .md files with YAML frontmatter
telegram      Post to Telegram channels/chats via Bot API
openclaw      Route through OpenClaw to 23+ messaging platforms

Steps

Step             Description
dedup            Load published URLs from SQLite to prevent re-processing
scout            Fetch candidates from a source (HN, Reddit, RSS, search)
filter           Filter candidates by keywords, domain blacklist, title length
score            LLM-based candidate ranking by relevance, originality, engagement
fetch            Download article, extract content as markdown, get og:image
translate        Translate via LLM (LiteLLM — supports 100+ models)
rewrite          Deep AI rewrite — makes content unique and undetectable by plagiarism checkers
adapt            Adapt content style: blog, telegram, newsletter, or thread
validate         Check translation quality (length, ratio, required fields)
publish          Send to a single configured destination
fanout_publish   Publish to multiple destinations concurrently
images           Download images from article content and rewrite URLs to local paths
post_publish     Persist published URL to SQLite for future deduplication
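
Together, dedup and post_publish amount to a published-URL set persisted in SQLite. A self-contained sketch of that idea using the stdlib sqlite3 module (PipePost itself uses aiosqlite, and its actual schema may differ):

```python
import sqlite3


class UrlStore:
    """Minimal published-URL store mirroring the dedup/post_publish steps."""

    def __init__(self, db_path: str = ":memory:"):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute("CREATE TABLE IF NOT EXISTS published (url TEXT PRIMARY KEY)")

    def is_published(self, url: str) -> bool:
        # dedup: skip candidates already seen in previous runs
        row = self.conn.execute("SELECT 1 FROM published WHERE url = ?", (url,)).fetchone()
        return row is not None

    def mark_published(self, url: str) -> None:
        # post_publish: remember the URL for future deduplication
        self.conn.execute("INSERT OR IGNORE INTO published (url) VALUES (?)", (url,))
        self.conn.commit()
```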

Configuration

All configuration lives in pipepost.yaml. Priority: CLI flags > env vars > YAML > defaults.

# pipepost.yaml — complete example
sources:
  - name: hackernews
    min_score: 100
  - name: my-blog
    type: rss
    url: https://example.com/feed.xml
  - name: daily-search
    type: search
    queries:
      - "latest news in your niche"
      - "trending articles today"

destination:
  type: webhook
  url: https://myblog.com/api/posts/auto-publish
  headers:
    Authorization: "Bearer ${API_TOKEN}"

# Or use typed destination configs:
# destination:
#   type: telegram
#   bot_token: "${TELEGRAM_BOT_TOKEN}"
#   chat_id: "@my_channel"

translate:
  model: deepseek/deepseek-chat
  target_lang: ru

flow:
  steps: [dedup, scout, filter, score, fetch, translate, rewrite, validate, publish, post_publish]
  on_error: stop
  filter:
    keywords_include: ["AI", "open source", "startup"]  # at least one must match
    keywords_exclude: ["sponsored", "advertisement"]     # none must match
    domain_blacklist: ["medium.com", "substack.com"]     # blocked domains
    min_title_length: 10
  score:
    model: gpt-4o-mini  # optional: cheaper model for scoring
    niche: tech
  adapt:
    model: claude-sonnet-4-20250514  # optional: different model for style adaptation
    style: telegram
  publish:
    destination_name: webhook
  storage:
    db_path: pipepost.db

Env var overrides: PIPEPOST_MODEL, PIPEPOST_LANG, PIPEPOST_DEST_URL
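
The precedence order amounts to a layered lookup, first defined value wins. A sketch (the env var names are real; the resolve helper is illustrative, not PipePost's API):

```python
import os


def resolve(key: str, cli: dict, yaml_cfg: dict, env_name: str, default=None):
    """Return the first defined value: CLI flag, then env var, then YAML, then default."""
    if cli.get(key) is not None:
        return cli[key]
    if os.environ.get(env_name):
        return os.environ[env_name]
    if yaml_cfg.get(key) is not None:
        return yaml_cfg[key]
    return default


# e.g. model = resolve("model", cli_args, yaml_cfg, "PIPEPOST_MODEL", "deepseek/deepseek-chat")
```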

Secret references in YAML: Use ${ENV_VAR} syntax to reference environment variables directly in config values. This is useful for keeping secrets out of config files:

destination:
  type: telegram
  bot_token: "${TELEGRAM_BOT_TOKEN}"
  chat_id: "${TELEGRAM_CHAT_ID}"
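
References like these are typically resolved by a substitution pass over loaded config values. An illustrative sketch of such a resolver (not PipePost's exact implementation):

```python
import os
import re

_SECRET_REF = re.compile(r"\$\{([A-Z_][A-Z0-9_]*)\}")


def expand_secrets(value: str) -> str:
    """Replace ${VAR} with os.environ['VAR']; fail loudly if the var is unset."""
    def repl(match: re.Match) -> str:
        name = match.group(1)
        if name not in os.environ:
            raise KeyError(f"config references undefined env var: {name}")
        return os.environ[name]
    return _SECRET_REF.sub(repl, value)
```

Failing on an unset variable at load time is usually preferable to publishing with an empty token.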

Source auto-registration: Sources defined in YAML (rss with custom URL, search with custom queries, reddit with subreddits) are automatically registered in the pipeline registry when using --config. No manual registration needed.

Fetch caching: The fetch step includes an in-memory TTL cache (default: 1 hour). Set fetch.cache_ttl to 0 to disable:

fetch:
  max_chars: 20000
  cache_ttl: 3600  # seconds, 0 to disable
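
Conceptually, such a cache is a dict of (value, expiry) pairs keyed by URL. A minimal sketch (the cache_ttl semantics match the description above; the class itself is hypothetical):

```python
import time


class TTLCache:
    """In-memory cache whose entries expire after ttl seconds; ttl=0 disables caching."""

    def __init__(self, ttl: float = 3600.0):
        self.ttl = ttl
        self._entries: dict[str, tuple[object, float]] = {}

    def get(self, key: str):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:  # stale entry: drop it and report a miss
            del self._entries[key]
            return None
        return value

    def put(self, key: str, value) -> None:
        if self.ttl <= 0:  # ttl=0 means caching is disabled
            return
        self._entries[key] = (value, time.monotonic() + self.ttl)
```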

See examples/pipepost.yaml for more examples.

Adding a Custom Source

Create a single file — PipePost auto-discovers it:

# pipepost/sources/my_source.py
from pipepost.sources.base import Source
from pipepost.core.context import Candidate
from pipepost.core.registry import register_source


class MySource(Source):
    name = "my-source"
    source_type = "api"

    async def fetch_candidates(self, limit: int = 10) -> list[Candidate]:
        # Your logic here
        return [Candidate(url="https://...", title="...", source_name=self.name)]


register_source("my-source", MySource())

Adding a Custom Destination

# pipepost/destinations/my_cms.py
from pipepost.destinations.base import Destination
from pipepost.core.context import PublishResult, TranslatedArticle
from pipepost.core.registry import register_destination


class MyCMSDestination(Destination):
    name = "my-cms"

    async def publish(self, article: TranslatedArticle) -> PublishResult:
        # Your CMS API logic here
        return PublishResult(success=True, slug="article-slug")


register_destination("my-cms", MyCMSDestination())

Adding a Custom Style

Register new adapt styles without modifying existing code:

from pipepost.core.registry import register_style

register_style("twitter", """
Adapt the article into a Twitter/X thread format:
- First tweet: hook + key insight (max 280 chars)
- Follow-up tweets: supporting points
- Last tweet: source link + call to action
""")

Then use it: pipepost run default --source hackernews with flow.adapt.style: twitter in your config.

Telegram Bot

PipePost includes an interactive Telegram bot for human-in-the-loop content curation:

export TELEGRAM_BOT_TOKEN=your-bot-token
pipepost bot --source hackernews --lang ru

sequenceDiagram
    participant U as You (Telegram)
    participant B as PipePost Bot
    participant S as Source (HN/Reddit)
    participant L as LLM (DeepSeek/GPT)
    participant D as Destination

    U->>B: /scout
    B->>S: fetch_candidates(limit=5)
    S-->>B: 5 articles
    B->>U: Article 1: "..." [Publish] [Skip]
    B->>U: Article 2: "..." [Publish] [Skip]
    U->>B: tap [Publish] on Article 1
    B->>B: fetch full content
    B->>L: translate to Russian
    L-->>B: translated article
    B->>D: publish
    D-->>B: slug: my-article
    B->>U: Published: my-article

How it works:

  1. Send /scout to the bot
  2. Bot fetches candidates and shows them with inline buttons
  3. Tap Publish — bot runs the standard Flow pipeline (fetch → translate → validate → publish)
  4. Tap Skip — bot moves to the next candidate

The bot uses the same composable Flow engine as CLI pipelines, so any new steps you add are automatically available.

Telegram as a destination (automated, no approval needed):

destination:
  type: telegram
  bot_token: "your-bot-token"
  chat_id: "@your_channel"

OpenClaw Integration

PipePost integrates with OpenClaw -- a self-hosted AI assistant platform with 23+ messaging channels.

graph LR
    PP[PipePost] -->|publish| OC[OpenClaw Gateway]
    OC --> TG[Telegram]
    OC --> SL[Slack]
    OC --> DC[Discord]
    OC --> WA[WhatsApp]
    OC --> SG[Signal]
    OC --> MS[Teams]
    OC --> ETC[...20+ more]

    style PP fill:#1a1a2e,stroke:#16213e,color:#e0e0e0
    style OC fill:#533483,stroke:#16213e,color:#e0e0e0

As a destination -- publish through OpenClaw to all connected channels:

destination:
  type: openclaw
  gateway_url: "ws://127.0.0.1:18789"
  session_id: "my-session"
  channels: ["telegram", "slack", "discord"]

As a native plugin — install the openclaw-pipepost plugin for 5 native agent tools (pipepost_run, pipepost_status, pipepost_init, pipepost_validate, pipepost_sources):

pip install pipepost
openclaw plugins install clawhub:openclaw-pipepost

Available on ClawHub.

Supported LLM Models

PipePost uses LiteLLM for translation, supporting 100+ models:

  • DeepSeek — deepseek/deepseek-chat, deepseek/deepseek-reasoner
  • OpenAI — gpt-4o, gpt-4o-mini
  • Anthropic — claude-sonnet-4-20250514, claude-haiku-4-20250414
  • Google — gemini/gemini-2.0-flash
  • Local — ollama/llama3.1, any Ollama model

Set via PIPEPOST_MODEL env var or in YAML config. Each LLM step (translate, score, adapt) can use a different model — see Configuration.

CLI Reference

Command                                Description
pipepost run <flow>                    Run a pipeline flow (default: default)
pipepost run --batch -n 5              Process multiple articles in one run
pipepost run --dry-run                 Preview results without publishing
pipepost run --config file.yaml        Use a config file
pipepost validate --config file.yaml   Validate config without running
pipepost bot                           Start interactive Telegram curation bot
pipepost sources                       List registered content sources
pipepost destinations                  List registered publish destinations
pipepost styles                        List available adapt styles
pipepost flows                         List registered pipeline flows
pipepost health                        Check pipeline health status

Docker

# Build and run
docker compose up -d

# Or build manually
docker build -t pipepost .
docker run -v ./pipepost.yaml:/app/config/pipepost.yaml pipepost run default

Development

git clone https://github.com/DenSul/pipepost
cd pipepost
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev,metrics]"

# Lint
ruff check pipepost/

# Type check
mypy --strict pipepost/

# Test
pytest tests/

# Integration tests (hits real APIs)
pytest tests/test_integration.py -v

Contributing

Contributions are welcome! Please read CONTRIBUTING.md for guidelines on how to get started.

In short: fork, branch, make your changes, run ruff check, mypy --strict, and pytest, then open a PR.

License

AGPL-3.0 -- Free to use, modify, and self-host. If you offer PipePost as a hosted service, you must open-source your modifications.


Built by Denis Sultanov
