An automated AI-powered newsletter pipeline that scrapes the latest tech and AI trends, curates the content using LLM agents, and delivers a formatted HTML digest to your inbox on a schedule.
Scrapers → datasets/ → Editor Agent → Writer Agent → Reviewer Agent → Email
 (data)     (CSVs)      (selects)      (formats)      (validates)     (sent)
- Scrapers run in parallel and populate datasets/ with fresh CSVs.
- Editor Agent (Gemini) reads each CSV and selects the most relevant items.
- Writer Agent (Gemini) formats the selected items into styled HTML sections.
- Reviewer Agent (Gemini) validates the assembled email and fixes issues.
- The final HTML is saved to process/emails/ and sent via Gmail SMTP.
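
The flow above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: `run_scrapers`, `run_pipeline`, and the callables passed to them are hypothetical names, and each scraper is assumed to return the path of the CSV it wrote.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_scrapers(scrapers: list[Callable[[], str]]) -> list[str]:
    """Run every scraper concurrently and return the CSV paths in submission order."""
    with ThreadPoolExecutor(max_workers=len(scrapers) or 1) as pool:
        futures = [pool.submit(scraper) for scraper in scrapers]
        return [future.result() for future in futures]


def run_pipeline(scrapers, editor, writer, reviewer, send) -> str:
    """Chain the stages: scrape, select, format, validate, deliver."""
    csv_paths = run_scrapers(scrapers)
    # One HTML section per CSV: the editor selects items, the writer formats them.
    sections = [writer(editor(path)) for path in csv_paths]
    # The reviewer sees the assembled email and returns the corrected HTML.
    html = reviewer("\n".join(sections))
    send(html)
    return html
```

Passing the stages in as plain callables keeps the sketch independent of any particular LLM client; in the real project the three agents call Gemini.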
| Section | Source |
|---|---|
| Tech News | Perplexity API (sonar-pro) |
| GitHub Trending | GitHub API |
| HuggingFace Models / Datasets / Spaces / Papers | HuggingFace API |
| Research Papers | arXiv API + Semantic Scholar |
| Google Research Blog | google.github.io/googleai RSS |
| AWS Blog | AWS Blog API |
| OpenAI & Claude Cookbook | GitHub (openai/openai-cookbook, anthropics/anthropic-cookbook) |
| Startup Funding | Google News RSS |
| LinkedIn Updates | Apify (Y Combinator, a16z, Sequoia) |
| Private Equity | Apify (a16z, Sequoia LinkedIn posts) |
- Python 3.10+
- A Gmail account with an App Password enabled
- API keys (see Configuration below)
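
A small preflight check can report missing keys before any scraper runs. The sketch below is an assumption about how such a check might look, not part of the project; `missing_settings` is a hypothetical helper, and the variable names are the ones marked "required" in the Configuration section.

```python
import os

# Variables the Configuration section marks as required for every run.
REQUIRED = ("GEMINI_API_KEY", "PERPLEXITY_API_KEY", "GMAIL_USER", "GMAIL_PASSWORD")


def missing_settings(env=None) -> list[str]:
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED if not env.get(name)]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        raise SystemExit("Missing settings: " + ", ".join(missing))
```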
git clone https://github.com/your-username/techpulse.git
cd techpulse
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Copy the example below into a .env file in the project root:

# AI Agents (required)
GEMINI_API_KEY=your_gemini_api_key
# Tech News scraper (required)
PERPLEXITY_API_KEY=your_perplexity_api_key
# LinkedIn scraper via Apify (required for LinkedIn sections)
APIFY_API_KEY=your_apify_api_key
# Email delivery (required)
GMAIL_USER=your_email@gmail.com
GMAIL_PASSWORD=your_gmail_app_password
# GitHub API — increases rate limits (optional but recommended)
GITHUB_TOKEN=your_github_token

Edit config.json to set the email recipient list:

{
  "recipients": ["you@example.com", "colleague@example.com"]
}

Control how many items appear per section in config/editor.yaml:

default_limit: 3
categories:
  "Tech News": 10
  "GitHub Trending": 3
  "HuggingFace Models": 3

Override which Gemini model each agent uses in config/models.yaml:

default_model: "gemini-flash-latest"
agents:
  EditorAgent: "gemini-flash-latest"
  WriterAgent: "gemini-flash-latest"
  ReviewerAgent: "gemini-flash-latest"

Add or remove LinkedIn company slugs to scrape in config/influencers.yaml:

companies:
  - y-combinator
  - a16z
  - sequoia-capital

Run the full pipeline once:

python main.py

Or use the shell script (recommended for cron):

chmod +x run_pipeline.sh
./run_pipeline.sh

Schedule with cron (for example, every Monday at 08:00):

0 8 * * 1 /path/to/techpulse/run_pipeline.sh >> /path/to/techpulse/cron_log.log 2>&1

Output:

- Raw data → datasets/*.csv
- Generated email → process/emails/YYYY-MM-DD.html

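
The dated output file and the Gmail delivery step might look roughly like the sketch below. It is an illustration under stated assumptions rather than the contents of process/send_email.py: the function names and the subject line are hypothetical, and the connection uses Gmail's SSL SMTP endpoint (smtp.gmail.com, port 465) with the App Password from .env.

```python
import datetime as dt
import smtplib
from email.message import EmailMessage
from pathlib import Path


def output_path(day: dt.date, root: str = "process/emails") -> Path:
    """Build the process/emails/YYYY-MM-DD.html path for a given date."""
    return Path(root) / f"{day.isoformat()}.html"


def build_message(html: str, sender: str, recipients: list[str], day: dt.date) -> EmailMessage:
    """Wrap the digest in a multipart message with a plain-text fallback."""
    msg = EmailMessage()
    msg["Subject"] = f"Tech digest {day.isoformat()}"  # hypothetical subject line
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg.set_content("This digest is best viewed in an HTML-capable mail client.")
    msg.add_alternative(html, subtype="html")
    return msg


def send(msg: EmailMessage, user: str, app_password: str) -> None:
    """Deliver the message through Gmail SMTP over SSL."""
    with smtplib.SMTP_SSL("smtp.gmail.com", 465) as smtp:
        smtp.login(user, app_password)
        smtp.send_message(msg)
```

Keeping message construction separate from delivery means the HTML can be saved and inspected locally without touching the network.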
techpulse/
├── main.py                      # Entry point: orchestrates scrapers + pipeline
├── run_pipeline.sh              # Shell wrapper for cron scheduling
├── requirements.txt
├── config/
│   ├── editor.yaml              # Per-section item limits
│   ├── models.yaml              # Gemini model overrides per agent
│   └── influencers.yaml         # LinkedIn targets
├── scrapers/                    # One file per data source
│   ├── github_scraper.py
│   ├── huggingface_scraper.py
│   ├── news_scraper.py          # Perplexity API
│   ├── papers_scraper.py        # arXiv + Semantic Scholar
│   ├── startup_scraper.py
│   ├── linkedin_scraper.py      # Apify
│   ├── aws_blog_scraper.py
│   ├── openai_cookbook_scraper.py
│   └── config.py                # Scraper limits
├── process/
│   ├── flow_manager.py          # AI editorial pipeline
│   ├── email_template.html      # HTML email layout
│   ├── send_email.py            # Gmail SMTP sender
│   └── agents/
│       ├── editor.py            # Selects top items per section
│       ├── writer.py            # Writes HTML for each section
│       └── reviewer.py          # Validates & fixes the final email
└── datasets/                    # Auto-generated CSVs (git-ignored)

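
The two YAML files under config/ follow the same default-plus-override pattern. As a rough sketch of how those values could be resolved once the files are parsed into dictionaries (for example with PyYAML's `yaml.safe_load`), consider the helpers below. `limit_for` and `model_for` are hypothetical names, and the fallback of 3 when `default_limit` is absent is an assumption.

```python
def limit_for(section: str, editor_cfg: dict) -> int:
    """Return the item limit for a section, falling back to default_limit."""
    categories = editor_cfg.get("categories") or {}
    # Assumption: fall back to 3 if the file omits default_limit entirely.
    return categories.get(section, editor_cfg.get("default_limit", 3))


def model_for(agent: str, models_cfg: dict) -> str:
    """Return the model for an agent, falling back to default_model."""
    agents = models_cfg.get("agents") or {}
    return agents.get(agent, models_cfg["default_model"])


# Dictionaries mirroring the example editor.yaml and models.yaml contents.
editor_cfg = {
    "default_limit": 3,
    "categories": {"Tech News": 10, "GitHub Trending": 3, "HuggingFace Models": 3},
}
models_cfg = {
    "default_model": "gemini-flash-latest",
    "agents": {"EditorAgent": "gemini-flash-latest"},
}

print(limit_for("Tech News", editor_cfg))      # section with an explicit entry
print(limit_for("AWS Blog", editor_cfg))       # section that uses default_limit
print(model_for("WriterAgent", models_cfg))    # agent that uses default_model
```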
Contributions are welcome. To add a new data source:
- Create a scraper in scrapers/your_source_scraper.py that saves a CSV to datasets/.
- Register it in main.py under the scrapers list.
- Add a category entry in process/flow_manager.py (cats_config) with the filename and an HTML format hint.
- Add the corresponding {{ placeholder }} in process/email_template.html.
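
As a starting point for the first step, a new scraper could be as small as the sketch below. The column names, the `save_rows` helper, and the `scrape` entry point are assumptions made for illustration; match whatever interface the existing scrapers in scrapers/ expose.

```python
import csv
from pathlib import Path


def save_rows(rows: list[dict], filename: str, out_dir: str = "datasets") -> Path:
    """Write rows to out_dir/filename as CSV and return the file path."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    path = out / filename
    # Assumed default columns, used only when there are no rows to infer them from.
    fieldnames = list(rows[0]) if rows else ["title", "url", "summary"]
    with path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return path


def scrape() -> Path:
    """Hypothetical entry point: fetch items from your source, then save them."""
    rows = [
        {"title": "Example item", "url": "https://example.com", "summary": "Placeholder row"},
    ]
    return save_rows(rows, "your_source.csv")
```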
For bug reports and feature requests, please open a GitHub Issue.
This project is licensed under the MIT License. See LICENSE for details.