Merged
2 changes: 1 addition & 1 deletion .env.example
@@ -43,7 +43,7 @@ BLUESKY_CREDENTIALS_ENCRYPTION_KEY=
LINKEDIN_CREDENTIALS_ENCRYPTION_KEY=
LINKEDIN_CLIENT_ID=
LINKEDIN_CLIENT_SECRET=
LINKEDIN_OAUTH_SCOPES=openid profile email offline_access
LINKEDIN_OAUTH_SCOPES="openid profile email offline_access"

# Outbound mail provider. Use Resend or Amazon SES.
EMAIL_BACKEND=anymail.backends.resend.EmailBackend
2 changes: 1 addition & 1 deletion .gitignore
@@ -17,7 +17,7 @@ frontend/.next/
frontend/coverage/
frontend/node_modules/

docs/
docs/_internal_only/

*storybook.log
storybook-static
158 changes: 56 additions & 102 deletions README.md
@@ -6,6 +6,48 @@ An AI-powered content curation platform for technically-oriented newsletters. Ne

The system is organized into projects: each newsletter project has its own tracked entities, relevance model, and content pipeline. Projects are assigned to Django groups so editorial access can be shared cleanly. Designed for non-technical editors who don't know what a vector database is and don't need to.

## Local Development

Use the repo's `just` commands to get a local stack running:

Linux:

```bash
python3 -m venv .venv
source .venv/bin/activate
just install
just build
just dev
just seed
xdg-open http://localhost:8080/
```

macOS:

```bash
python3 -m venv .venv
source .venv/bin/activate
just install
just build
just dev
just seed
open http://localhost:8080/
```

Windows PowerShell:

```powershell
py -m venv .venv
.\.venv\Scripts\Activate.ps1
just install
just build
just dev
just seed
Start-Process http://localhost:8080/
```

`just build` prepares the backend image and frontend bundle, `just dev` starts the Docker Compose stack, and `just seed` loads demo data into the running app. For the full workflow and troubleshooting notes, see [docs/developer-guide/local-development.md](docs/developer-guide/local-development.md).

## What This Does That Existing Tools Don't

Tools like Feedly, UpContent, and ContentStudio handle parts of the content curation problem. Newsletter Maker combines several capabilities none of them offer:
@@ -58,14 +100,14 @@ The roadmap progresses from contextual actions (MVP) to multi-step skill chainin

Each data source implements a common interface (`fetch_new_content`, `get_entity_profile`, `health_check`) and handles its own auth and rate limiting. The core system just calls the interface. Planned integrations:

| Source | Purpose | Priority |
| ------ | ------- | -------- |
| RSS | Blog/site tracking for followed entities | Phase 1 |
| Reddit | Trend detection and community sentiment | Phase 1 |
| Resend Inbound | Newsletter email ingestion and authority signals | Phase 2 |
| Bluesky | Entity content tracking (open AT Protocol) | Phase 2 |
| Mastodon | Entity content tracking (ActivityPub) | Phase 3 |
| LinkedIn | Entity enrichment and article discovery | Phase 4 |
| Source | Purpose |
| ------ | ------- |
| RSS | Blog/site tracking for followed entities |
| Reddit | Trend detection and community sentiment |
| Resend Inbound | Newsletter email ingestion and authority signals |
| Bluesky | Entity content tracking (open AT Protocol) |
| Mastodon | Entity content tracking (ActivityPub) |
| LinkedIn | Entity enrichment and article discovery |

### Production-Grade Error Handling

@@ -83,102 +125,14 @@ The system is designed for graceful failure, not silent corruption. Unparseable

## Project Documentation

- [Developer Guide](docs/DEVELOPER_GUIDE.md) gives a fast "where to look first" map for new contributors.
- [Deployment Guide](docs/DEPLOYMENT.md) covers Docker Compose, Helm, Minikube, and deployment-aware CI.
- [Implementation Overview](docs/IMPLEMENTATION_OVERVIEW.md) summarizes the main features and current architecture.
- [Data Models](docs/MODELS.md) describes the purpose of each core model.
- [Relevance Scoring](docs/RELEVANCE_SCORING.md) explains how similarity scoring and review thresholds work.
- [Logging](docs/LOGGING.md) explains where application logs go in local and containerized environments.

## Local Development

```bash
python3 -m venv .venv
source .venv/bin/activate
just install
```

`just install` installs the backend and frontend dependencies and registers the repository's `pre-commit` hooks, so `git commit` runs the configured lint and test hooks locally.

There are two intentionally separate workflows:
Newsletter Maker documentation is organized by audience inside the `docs/` folder:

- `just lint` and `just test` run on the host without Docker. The backend half of those commands uses `.env.test`.
- Runtime, data, and Django management commands run against the Docker Compose stack.

1. Run `just dev` to start Django, Celery, Postgres, Redis, Qdrant, and Nginx. On the first run Docker builds the app image automatically. After that, `just dev` reuses the existing image so normal restarts are fast. If `.env` is missing, the `just` command copies `.env.example` automatically.
2. Run `just build` after changing `requirements.txt` or `docker/web/Dockerfile`. It does not copy or depend on local env files.
3. For a fully fresh local stack after schema changes, run `just reset-volumes` before starting the containers again. This drops the Docker-backed Postgres, Redis, and Qdrant state so regenerated migrations apply cleanly.
4. Run Django management commands against the running backend container. `just migrate`, `just shell`, `just embed-all`, `just embed-project <project_id>`, `just embed-smoke`, `just embed-smoke-content <content_id>`, and `just bootstrap-live-sources <project_id>` all use `docker compose exec django ...`.
5. `.env.example` is Compose-oriented and uses Docker service hostnames for the backend runtime. Update `.env` with non-default secrets before using the stack outside local development.
6. Open `http://localhost:8080/healthz/` for a liveness check and `http://localhost:8080/admin/` for Django admin. Use `just seed` after the stack is up if you want the demo project and sample content.

### Testing

Run the test suite with:

```bash
just test
```

Pytest auto-loads `.env.test` during test startup. That file is intentionally checked in and only contains non-sensitive placeholder values used by tests, such as fake API keys, fake Reddit credentials, and localhost service URLs.

`.env.test` also pins Django tests to an explicit SQLite configuration so backend tests stay independent from the Compose-backed Postgres development database.

`backend-lint` also runs Django-aware host-side checks (`mypy` with the Django plugin and `manage.py check`) under `.env.test`, so `just lint` stays independent from Docker.

Use `.env.test` for stable dummy values that make tests deterministic. Do not put real secrets in it. Real local or production secrets belong in `.env`, which remains ignored.

### Embedding Backends

The embedding layer is provider-based. Configure it with `EMBEDDING_PROVIDER` and `EMBEDDING_MODEL`:

- `sentence-transformers`: loads a Hugging Face / SentenceTransformers model inside the Django process
- `ollama`: calls a local Ollama server for embeddings
- `openrouter`: calls OpenRouter's embeddings API using the configured model id

Common examples:

```dotenv
EMBEDDING_PROVIDER=sentence-transformers
EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2
```

```dotenv
EMBEDDING_PROVIDER=ollama
EMBEDDING_MODEL=nomic-embed-text
OLLAMA_URL=http://localhost:11434
```

```dotenv
EMBEDDING_PROVIDER=openrouter
EMBEDDING_MODEL=openai/text-embedding-3-small
OPENROUTER_API_KEY=...
OPENROUTER_API_BASE=https://openrouter.ai/api/v1
```

For SentenceTransformers models that require custom remote code, set `EMBEDDING_TRUST_REMOTE_CODE=true`.

### Embedding Commands

Use these commands to backfill or refresh embeddings for existing content:

```bash
just embed-all
just embed-project 1
docker compose exec django python manage.py sync_embeddings --content-id 42
docker compose exec django python manage.py sync_embeddings --references-only
```

When `just dev` is running, Django admin and the developer-facing `just` wrappers all operate against the Compose-backed Postgres database.

Create or update an admin user for the running Docker stack with:

```bash
just createsuperuser
just changepassword your-username
```
- [User Guide](docs/user-guide/getting-started-saas.md) covers managing projects, intaking content, and curating drafts.
- [Admin Guide](docs/admin-guide/overview.md) covers installation, configuration, user management, and operational health.
- [Developer Guide](docs/developer-guide/overview.md) covers local workflows, backend/frontend conventions, and testing logic.
- [Reference](docs/reference/data-model.md) details the backend API, algorithms, pipeline definitions, and tunables.

For the default local bootstrap, `.env` also seeds an `admin` superuser in the container database using `DJANGO_SUPERUSER_USERNAME`, `DJANGO_SUPERUSER_EMAIL`, and `DJANGO_SUPERUSER_PASSWORD`.
Start at the [Documentation Root](docs/README.md) to navigate to the specific section you need.

## License

14 changes: 14 additions & 0 deletions docs/README.md
@@ -0,0 +1,14 @@
# Newsletter Maker Documentation

Newsletter Maker is an AI-powered platform for ingesting, scoring, and writing domain-specific newsletters. It uses LangGraph to orchestrate Claude Skills against incoming RSS, Reddit, and forwarded email content to synthesize high-quality reading lists.

These documents are organized by audience.

* **I am an Editor or Curator using the product day-to-day**: Head to the [User Guide](user-guide/getting-started-saas.md) to learn how to ingest content, manage authority, and synthesize drafts.
* **I am an Administrator installing or managing the platform**: Head to the [Admin Guide](admin-guide/overview.md) to understand Docker deployments, API keys, and queue troubleshooting.
* **I am a Developer contributing code to this repository**: Head to the [Developer Guide](developer-guide/overview.md) to understand local workflows, architecture, and coding conventions.
* **I need to understand the underlying math and logic**: Head to the [Reference Section](reference/data-model.md) to see how LangGraph, LangChain, Celery, Qdrant, and the cosine similarity scoring are wired together.

## Terminology Note
In this repository, a distinct newsletter workspace is called a **Project** (not a Tenant, not a Workspace). An article or extracted text is called **Content**.
See the full [Glossary](reference/glossary.md) for clarification on Entities, Skills, and Velocity.
21 changes: 21 additions & 0 deletions docs/admin-guide/backups-and-retention.md
@@ -0,0 +1,21 @@
# Backups and Retention

## Postgres Backup
Back up Postgres using standard `pg_dump`:
```bash
docker compose exec postgres pg_dump -U newsletter newsletter_maker > backup.sql
```

## Qdrant Snapshot
Qdrant manages its own snapshots. See the Qdrant Snapshot API documentation for exporting raw vector archives. Alternatively, vector data can be fully reconstructed from the text stored in Postgres, at the cost of recomputing every embedding (which consumes API tokens).

## Observability Retention Windows
To prevent unbounded database growth, old logs and task runs are deleted according to these settings:
- `OBSERVABILITY_SNAPSHOT_RETENTION_DAYS` (default 90)
- `OBSERVABILITY_TREND_TASK_RUN_RETENTION_DAYS` (default 30)
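
As a rough illustration of how a retention window turns into a deletion cutoff, here is a minimal sketch. The helper names `read_retention_days` and `retention_cutoff` are hypothetical and not taken from the codebase; only the variable names and defaults come from the list above.

```python
from datetime import datetime, timedelta, timezone


def read_retention_days(env, name, default):
    """Read a retention window (in days) from an environment-style mapping.

    Hypothetical helper: falls back to `default` when the variable is unset or blank.
    """
    raw = env.get(name, "").strip()
    return int(raw) if raw else default


def retention_cutoff(now, retention_days):
    """Return the timestamp before which rows would be eligible for deletion."""
    if retention_days < 0:
        raise ValueError("retention_days must be non-negative")
    return now - timedelta(days=retention_days)


if __name__ == "__main__":
    import os

    days = read_retention_days(os.environ, "OBSERVABILITY_SNAPSHOT_RETENTION_DAYS", 90)
    print(retention_cutoff(datetime.now(timezone.utc), days))
```

Rows older than the returned cutoff would be the ones a cleanup task removes; how the actual task selects and deletes them is not shown here.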

## Restore Drill
To restore the platform:
1. Stop the stack and remove its volumes: `docker compose down -v`.
2. Restore the Postgres database volume.
3. Bring the system up. If Qdrant is empty, trigger an embedding backfill from the Postgres text.
31 changes: 31 additions & 0 deletions docs/admin-guide/configuration.md
@@ -0,0 +1,31 @@
# Configuration

See the [Tunables Reference](../reference/tunables.md) for the exact list of algorithms and thresholds.

## Required vs Optional Variables
**Required**:
* `DATABASE_URL`, `REDIS_URL`, `QDRANT_URL`, `SECRET_KEY`, `NEWSLETTER_API_BASE_URL`.

**Optional but critical for AI**:
* `OPENROUTER_API_KEY` (required for relevance tie-breaking and categorization).
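
A preflight check along these lines can catch a missing required variable before the stack starts. This is a sketch only: `missing_required` is a hypothetical helper, and the application may validate its settings differently.

```python
# Required variable names, copied from the list above.
REQUIRED_VARS = (
    "DATABASE_URL",
    "REDIS_URL",
    "QDRANT_URL",
    "SECRET_KEY",
    "NEWSLETTER_API_BASE_URL",
)


def missing_required(env):
    """Return the required names that are unset or blank, in the order listed above."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    import os
    import sys

    missing = missing_required(os.environ)
    if missing:
        sys.exit("Missing required variables: " + ", ".join(missing))
    if not os.environ.get("OPENROUTER_API_KEY", "").strip():
        print("Warning: OPENROUTER_API_KEY is not set")
```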

## Secrets Handling
* In Docker Compose: Loaded from the `.env` file and passed into the containers.
* In Kubernetes: Expected to be mapped into the Pod `env` spec via Secrets.

## Internal vs Public URLs
Because of container networking, the two base URLs differ:
* `NEWSLETTER_API_BASE_URL` (internal) references container hostnames such as `http://nginx`.
* `NEWSLETTER_PUBLIC_URL` (public) should point to your real FQDN (e.g. `https://news.mydomain.com`); it is used for links in emails.
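
To make the split concrete, the sketch below joins a path onto each base. The `build_url` helper and both example paths are hypothetical, chosen only to show which base belongs to which kind of link.

```python
from urllib.parse import urljoin


def build_url(base_url, path):
    """Join a base URL and a path with exactly one slash between them."""
    return urljoin(base_url.rstrip("/") + "/", path.lstrip("/"))


# Service-to-service call inside the Compose network (hypothetical path).
internal = build_url("http://nginx", "/api/example/")

# Link embedded in an outgoing email, which must resolve from the public internet
# (hypothetical path).
public = build_url("https://news.mydomain.com", "/confirm/example-token/")

if __name__ == "__main__":
    print(internal)
    print(public)
```

A link built from the internal base would not resolve outside the container network, which is why emails need the public one.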

## Email Provider (Anymail)
Newsletter intake relies on Resend webhooks and Django Anymail forwarding. It is configured via:
* `RESEND_API_KEY`
* `RESEND_INBOUND_SECRET`
* `DEFAULT_FROM_EMAIL`

## LLM Provider Routing
Select among `local`, `ollama`, or remote providers using `EMBEDDING_PROVIDER`. Point the provider URL at either the internal container (`http://ollama:11434`) or an external API (`https://api.openai.com/v1`).

## OAuth Provider Toggles
If `LINKEDIN_CLIENT_ID` or `REDDIT_CLIENT_ID` is set, the corresponding capability is enabled automatically in the application.
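
The toggle amounts to a presence check on the client ID. A minimal sketch, assuming a hypothetical helper `enabled_oauth_providers` (the real application may gate these features elsewhere):

```python
# Provider name -> environment variable whose presence enables it.
OAUTH_TOGGLES = {
    "linkedin": "LINKEDIN_CLIENT_ID",
    "reddit": "REDDIT_CLIENT_ID",
}


def enabled_oauth_providers(env):
    """Return the providers whose client ID is set to a non-blank value, sorted by name."""
    return sorted(
        provider
        for provider, variable in OAUTH_TOGGLES.items()
        if env.get(variable, "").strip()
    )


if __name__ == "__main__":
    import os

    print(enabled_oauth_providers(os.environ))
```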
31 changes: 31 additions & 0 deletions docs/admin-guide/installation.md
@@ -0,0 +1,31 @@
# Installation

## Minimum Requirements
* 4 CPU Cores (8 recommended if running Ollama locally).
* 8 GB RAM (16 GB recommended if running Ollama locally).
* Postgres 14+, Redis 7+.

## Docker Compose Path
The easiest way to stand up the platform on a single VPS is Docker Compose:
1. Clone the repository.
2. Copy `.env.example` to `.env` and fill in secrets.
3. Run `docker compose build`.
4. Run `docker compose up -d`.
5. Run migrations: `docker compose exec django python manage.py migrate`.

## Helm + ArgoCD Path
For Kubernetes usage, an ArgoCD App configuration lives in `deploy/argocd` pointing to the Helm chart in `deploy/helm`. Configure your values file with the required secrets (or rely on ExternalSecrets).

## First-Run Checklist
1. Ensure containers are healthy (`docker compose ps`).
2. Run database migrations.
3. Create the superuser (see below).
4. Run `docker compose exec django python manage.py bootstrap_live_sources` to seed default RSS/Reddit connections.

## Creating the First Superuser
```bash
docker compose exec django python manage.py createsuperuser
```

## Smoke Test
Log in to the dashboard, go to settings, and add an RSS feed. If the `Ingestion Settings` page shows health check successes within 5 minutes, Celery, Postgres, and outbound networking are functional.
24 changes: 24 additions & 0 deletions docs/admin-guide/operations.md
@@ -0,0 +1,24 @@
# Operations

## Daily/Weekly Health Checks
* Check Celery Beat logs to see if scheduled ticks are executing.
* Look for `429 Too Many Requests` in your API provider logs (OpenRouter).

## Celery Beat & Worker Monitoring
Use Celery Flower (if enabled in Compose), or monitor queue depth by checking the length of the Celery queues in Redis.

## Qdrant Collection Health
Ensure Qdrant is snapshotting to disk and is not being repeatedly OOM-killed. If it runs out of memory, increase the VPS memory limits.

## Embeddings Backfill
If you switch embedding providers (e.g., moving from `local` to `openai`), previous cosine scores are invalidated. You must run a backfill management command to rewrite all `Content` vectors.
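
The reason is that vectors from different embedding models live in unrelated spaces, often with different dimensions, so a cosine score between an old vector and a new one carries no meaning. The sketch below is a plain-Python cosine similarity for illustration only; it is not the application's scoring code, which runs against Qdrant.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        # Vectors from different embedding models frequently differ in length.
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm


if __name__ == "__main__":
    print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))
    print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
```

Even when two models happen to share a dimension, the score is only meaningful if both vectors came from the same model, which is why every stored vector has to be rewritten.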

## Re-running Pipeline
If LLM calls failed because of an outage, go to Django Admin -> `PipelineRun`, select the failed items, and re-trigger them.

## Clearing Stuck Items
Use the `ReviewQueue` in the Next.js frontend to clear items the LLM had zero confidence about.

## Messaging/Channels Health
If real-time notifications fail, verify Daphne is alive and the `REDIS_URL` matches the Channels configuration.
26 changes: 26 additions & 0 deletions docs/admin-guide/overview.md
@@ -0,0 +1,26 @@
# Admin Overview

Welcome to the operator's manual for Newsletter Maker. This guide assumes you are running the system, not writing code for it.

## Component Map
* **Django (API & Workers)**: Python core running the REST API.
* **Celery Worker**: Asynchronous task runner for LangGraph skills and entity extraction.
* **Celery Beat**: Cron scheduler for trend gathering and for running the RSS/Reddit fetch plugins.
* **PostgreSQL**: Holds all standard application state and configuration.
* **Redis**: Acts as the message broker for Celery and the WebSockets channel layer.
* **Qdrant**: The vector database storing the high-dimensional embeddings used for cosine relevance calculations.
* **Ollama** (Optional): A containerized local LLM server for generating embeddings locally without paying OpenAI/OpenRouter.
* **Nginx**: Reverse proxy to route `/api/` traffic to Django and `/` traffic to Next.js.
* **Next.js**: The frontend App Router.

## Request Path
Browser -> Nginx -> Next.js (for HTML) -> Nginx -> Django Gunicorn -> Postgres.

## Ingestion Path
Beat triggers fetch -> Celery Worker -> Fetches RSS array -> Django DB -> Triggers Embedding -> Saves to Qdrant -> Enqueues LangGraph Pipeline -> Celery Worker Executes Skills.

## AI Pipeline Path
The pipeline is orchestrated by LangGraph inside a Celery task. It calls out to the configured `OPENROUTER_API_BASE` or a local Ollama instance. State transitions are saved continuously to Postgres, where they map to `SkillResult`s.

## Realtime Path
Browser -> Nginx (WebSocket Upgrade) -> Django Daphne ASGI -> Redis `CHANNEL_LAYER` -> Broadcast to users.
29 changes: 29 additions & 0 deletions docs/admin-guide/sources-and-allowlist.md
@@ -0,0 +1,29 @@
# Sources and Allowlist

Administrators must monitor incoming content to keep the system healthy.

## Per-Plugin Config
* **RSS**: Relies purely on outbound GET fetches.
* **Reddit / Bluesky / Mastodon**: Subject to each platform's API rate limits. If you hit `429 Too Many Requests`, lengthen the polling intervals.

## Health Check Semantics
Plugins record timestamped failures into `IngestionRun`. If an ingestion source fails 5 times consecutively, the frontend highlights it in red.
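
The "5 consecutive failures" rule can be sketched as a count of trailing failures in a source's run history. The function names and the boolean history format are hypothetical; the real check reads from `IngestionRun` and may differ in detail.

```python
FAILURE_THRESHOLD = 5  # consecutive failures before a source is flagged


def consecutive_failures(outcomes):
    """Count failures at the end of a run history.

    `outcomes` is ordered oldest first; True means the run succeeded.
    """
    count = 0
    for succeeded in reversed(list(outcomes)):
        if succeeded:
            break
        count += 1
    return count


def is_unhealthy(outcomes, threshold=FAILURE_THRESHOLD):
    """Return True when the most recent `threshold` runs all failed."""
    return consecutive_failures(outcomes) >= threshold


if __name__ == "__main__":
    history = [True, False, False, False, False, False]
    print(is_unhealthy(history))
```

Under this reading, a single successful run resets the count, so an intermittently failing source is not flagged.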

## Bootstrap Live Sources
You can instantly seed a fresh database with:
```bash
docker compose exec django python manage.py bootstrap_live_sources
```

## Intake Allowlist Lifecycle
When you forward a newsletter to your ingest address:
1. `Pending`: The system receives the email but quarantines it.
2. `Confirmation Sent`: The system emails the sender back with a one-time link.
3. `Confirmed`: The user clicks the link. Their address is now Trusted.
4. `Expired`: Stalled after 7 days.
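
The lifecycle above can be read as a small state machine. The sketch below is one interpretation under stated assumptions: the names `AllowlistStatus` and `next_status` are hypothetical, the 7-day window is assumed to start when the confirmation email is sent, and a click after that window is assumed not to confirm the sender.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum


class AllowlistStatus(Enum):
    PENDING = "Pending"
    CONFIRMATION_SENT = "Confirmation Sent"
    CONFIRMED = "Confirmed"
    EXPIRED = "Expired"


CONFIRMATION_TTL = timedelta(days=7)  # assumption: measured from the confirmation email


def next_status(status, *, clicked_link, sent_at, now):
    """Return the status that follows `status` under the assumptions above."""
    if status is AllowlistStatus.PENDING:
        return AllowlistStatus.CONFIRMATION_SENT
    if status is AllowlistStatus.CONFIRMATION_SENT:
        if now - sent_at > CONFIRMATION_TTL:
            return AllowlistStatus.EXPIRED
        if clicked_link:
            return AllowlistStatus.CONFIRMED
    # Confirmed and Expired are treated as terminal in this sketch.
    return status


if __name__ == "__main__":
    sent = datetime(2024, 1, 1, tzinfo=timezone.utc)
    later = sent + timedelta(days=1)
    print(next_status(AllowlistStatus.CONFIRMATION_SENT, clicked_link=True, sent_at=sent, now=later))
```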

## Revoking Senders
If a newsletter breaks or creates spam, remove it via the Django Admin panel under `newsletters.IntakeAllowlist`.

## Investigating Dropped Subscriptions
Check the `NewsletterIntake` model in Django admin. Emails whose HTML fails to parse record their stack traces there.