Commit b428156

Ang committed
feat: env-based credential injection, primaryEnv frontmatter, desensitize config, test coverage

- auth.py: add env_token (PAYWALLFETCHER_TOKEN) and env_cookie_header (PAYWALLFETCHER_COOKIE_*) as priority 1/2 auth sources
- auth.py: resolve() never prints; warnings stored in config['_warnings']
- cli.py: _emit_warnings() emits to stderr in human mode, silent in JSON
- cli.py: auth print-openclaw-snippet subcommand
- cli.py: auth check --json includes candidate_sources + missing_env
- SKILL.md: add primaryEnv + skillKey to openclaw metadata
- config.example.json: remove cookies section (secrets out of tracked files)
- config.debug-cookies.example.json: debug-only cookie fallback template
- TOOLS.md: Credentials section with priority order, remove cookie-in-config advice
- README.md: OpenClaw-native credential config section with skills.entries snippet
- AGENTS.md: canonical entrypoints updated to src/paywallfetcher/
- tests/test_auth_env.py: 15 tests for env priority, no-print contract, redaction
- tests/test_cli.py: 10 tests for --json position, output purity, domain allowlist

1 parent a88e7e1 commit b428156

10 files changed

Lines changed: 698 additions & 55 deletions

AGENTS.md

Lines changed: 7 additions & 4 deletions
@@ -18,10 +18,13 @@ Use it when an agent must:
 - **Repository landing page**: README.md
 - **Config template**: config.example.json
 - **Workspace-specific config**: TOOLS.md
-- **Article fetcher**: downloader.py
-- **Q&A fetcher**: qa/qa_downloader.py
-- **Q&A browser-assist**: qa/qa_unlock.py
-- **Auth layer**: auth_utils.py
+- **Core package**: src/paywallfetcher/
+- **CLI entrypoint**: `py -m paywallfetcher`
+- **Auth layer**: src/paywallfetcher/auth.py
+
+> Legacy wrapper scripts (`downloader.py`, `qa/qa_downloader.py`, `qa/qa_unlock.py`,
+> `auth_utils.py`) remain for backward compatibility but are not canonical.
+> Prefer `py -m paywallfetcher` for all new agent tasks.

 ## Agent operating assumptions

README.md

Lines changed: 61 additions & 0 deletions
@@ -67,6 +67,67 @@ Create a daily cron job at 09:00 that runs:
 Set the working directory to this workspace.
 ```

+## OpenClaw-native credential configuration
+
+This skill supports env-based credential injection. The preferred path for
+OpenClaw users is to configure credentials via `skills.entries` rather than
+editing `config.json` directly.
+
+Add this to your `openclaw.json`:
+
+```json
+{
+  "skills": {
+    "entries": {
+      "paywall_fetcher": {
+        "apiKey": {
+          "source": "env",
+          "provider": "default",
+          "id": "PAYWALLFETCHER_TOKEN"
+        },
+        "env": {
+          "PAYWALLFETCHER_BASE_URL": "https://target.example",
+          "PAYWALLFETCHER_TARGET_UID": "YOUR_TARGET_UID"
+        }
+      }
+    }
+  }
+}
+```
+
+Then set the token env var (cookie string format):
+
+```
+PAYWALLFETCHER_TOKEN="SESSION=<value>; XSRF-TOKEN=<value>"
+```
+
+Or inject individual cookies:
+
+```
+PAYWALLFETCHER_COOKIE_SESSION=<value>
+PAYWALLFETCHER_COOKIE_XSRF-TOKEN=<value>
+```
+
+> **Note**: The Python CLI reads `PAYWALLFETCHER_TOKEN` and
+> `PAYWALLFETCHER_COOKIE_*` env vars directly. It does **not** auto-read
+> `skills.entries.*.config` from `openclaw.json` — OpenClaw injects those
+> values into the process environment before the CLI runs.
+
+Generate a ready-to-paste snippet at any time:
+
+```powershell
+py -m paywallfetcher auth print-openclaw-snippet
+```
+
+### Credential priority order
+
+| Priority | Source | How to supply |
+|---|---|---|
+| 1 | `env_token` | `PAYWALLFETCHER_TOKEN` env var |
+| 2 | `env_cookie_header` | `PAYWALLFETCHER_COOKIE_<NAME>` env vars |
+| 3 | `browser_auto` | logged-in Chrome or Edge session |
+| 4 | `config_cookies` | `cookies` in `config.json` — debug-only, do not commit |
+
 ## Capability map

 | Capability | Preferred command | Transport |

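The credential priority table added to README.md reads as a first-match rule. Below is a minimal sketch of that rule for the two env-based sources only, assuming the variable names shown in the diff. `pick_env_source` is a hypothetical helper written for illustration, not a function from `auth.py`, and it takes the environment as a plain mapping so it can be tried without touching `os.environ`.

```python
from typing import Mapping, Optional

ENV_TOKEN_VAR = "PAYWALLFETCHER_TOKEN"        # name taken from the diff
ENV_COOKIE_PREFIX = "PAYWALLFETCHER_COOKIE_"  # name taken from the diff


def pick_env_source(env: Mapping[str, str]) -> Optional[str]:
    """Return the label of the first populated env-based source, or None.

    Hypothetical helper: covers priorities 1 and 2 only. Priorities 3 and 4
    (browser session, config cookies) are out of scope for this sketch.
    """
    # Priority 1: a non-empty token wins outright.
    if env.get(ENV_TOKEN_VAR, "").strip():
        return "env_token"
    # Priority 2: any prefixed variable with a non-empty suffix and value.
    for key, value in env.items():
        if not key.startswith(ENV_COOKIE_PREFIX):
            continue
        suffix = key[len(ENV_COOKIE_PREFIX):]
        if suffix and value.strip():
            return "env_cookie_header"
    return None
```

With both kinds of variable set, the token takes precedence and the individual cookie variables are ignored. One practical caveat when setting a name such as `PAYWALLFETCHER_COOKIE_XSRF-TOKEN`: the hyphen is not accepted by `export` in POSIX shells, and in PowerShell the braced form `${env:PAYWALLFETCHER_COOKIE_XSRF-TOKEN}` may be needed.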
TOOLS.md

Lines changed: 17 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -16,7 +16,7 @@ Edit this file when you set up PaywallFetcher for a specific target site.
1616
| Site kind | (set `site.kind` in config.json — e.g., `generic`) |
1717
| Base URL | (set `site.base_url` in config.json) |
1818
| Target UID | (set `site.target_uid` in config.json) |
19-
| Auth mode | `browser_auto` (reads Chrome or Edge session) |
19+
| Auth mode | `browser_auto` / `env_token` / `env_cookie_header` (see Credentials below) |
2020
| Output root | `./output` |
2121
| Q&A output | `./qa/output` |
2222

@@ -37,14 +37,25 @@ Override these under `site.api_paths` in `config.json` if the target site uses d
3737

3838
---
3939

40-
## Required cookies
40+
## Credentials
4141

42-
Populate these under `cookies` in `config.json` as a fallback if browser auth is unavailable:
42+
Preferred resolution order (highest priority first):
4343

44-
- `SESSION`
45-
- `XSRF-TOKEN`
44+
1. **`PAYWALLFETCHER_TOKEN` env var** — cookie string injected by OpenClaw `apiKey`:
45+
```
46+
PAYWALLFETCHER_TOKEN="SESSION=<value>; XSRF-TOKEN=<value>"
47+
```
48+
2. **`PAYWALLFETCHER_COOKIE_<NAME>` env vars** — individual cookie injection:
49+
```
50+
PAYWALLFETCHER_COOKIE_SESSION=<value>
51+
```
52+
3. **Browser session (`browser_auto`)** — reads logged-in Chrome or Edge automatically.
53+
4. **`config.json` cookies** — debug-only fallback. See `config.debug-cookies.example.json`.
4654

47-
Preferred: keep `auth.mode = browser_auto` and stay logged in through Chrome or Edge.
55+
> **Do not store production cookies in tracked files.**
56+
> Do not paste secrets into issues, artifacts, or screenshots.
57+
58+
Run `py -m paywallfetcher auth print-openclaw-snippet` to generate the OpenClaw config snippet.
4859

4960
---
5061

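The cookie-string format documented for `PAYWALLFETCHER_TOKEN` in TOOLS.md is a semicolon-separated list of `NAME=value` pairs. A rough sketch of splitting such a string follows. `parse_cookie_string` is a hypothetical name chosen for illustration; the helper added to `auth.py` in this commit does its own parsing and also attaches domain and path to each cookie.

```python
from typing import Dict


def parse_cookie_string(raw: str) -> Dict[str, str]:
    """Split 'A=1; B=2' into {'A': '1', 'B': '2'} (illustrative only)."""
    cookies: Dict[str, str] = {}
    for part in raw.split(";"):
        # partition() splits on the first '=' only, so values that
        # themselves contain '=' (common in base64 tokens) stay intact.
        name, sep, value = part.strip().partition("=")
        name = name.strip()
        if not sep or not name:
            continue  # skip fragments without '=' or without a name
        cookies[name] = value.strip()
    return cookies
```

Splitting on the first `=` only is the design point worth keeping in any variant of this: session values are often base64-like and may end in `=` padding.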
config.debug-cookies.example.json

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,8 @@
1+
{
2+
"_comment": "DEBUG-ONLY. Do not commit real cookies. Use env vars (PAYWALLFETCHER_TOKEN or PAYWALLFETCHER_COOKIE_*) for production.",
3+
4+
"cookies": {
5+
"SESSION": "PASTE_YOUR_SESSION_COOKIE_VALUE_HERE",
6+
"XSRF-TOKEN": "PASTE_YOUR_XSRF_TOKEN_VALUE_HERE"
7+
}
8+
}

config.example.json

Lines changed: 0 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -19,11 +19,6 @@
1919
"xsrf_cookie_names": ["XSRF-TOKEN", "XSRF_TOKEN"]
2020
},
2121

22-
"cookies": {
23-
"SESSION": "",
24-
"XSRF-TOKEN": ""
25-
},
26-
2722
"network": {
2823
"proxy": null,
2924
"request_timeout": 20,

skills/site-content-downloader/SKILL.md

Lines changed: 13 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -4,7 +4,7 @@ version: "1.1.0"
44
description: Fetch articles and Q&A from the configured target site using the logged-in browser session. Do not ask the user to copy cookies. Supports one-off download, incremental polling, and scheduled collection via OpenClaw cron or Windows Task Scheduler.
55
user-invocable: true
66
tags: ["content-fetching", "browser-auth", "windows", "polling", "scheduled"]
7-
metadata: {"openclaw":{"emoji":"📰","homepage":"https://github.com/HzaCode/PaywallFetcher","os":["win32"],"requires":{"anyBins":["py","python"]}}}
7+
metadata: {"openclaw":{"emoji":"📰","homepage":"https://github.com/HzaCode/PaywallFetcher","os":["win32"],"requires":{"anyBins":["py","python"]},"primaryEnv":"PAYWALLFETCHER_TOKEN","skillKey":"paywall_fetcher"}}
88
---
99

1010
# PaywallFetcher
@@ -82,10 +82,18 @@ Treat this repository as an agent-operated workspace for:
8282

8383
## Flag reference
8484

85-
> **Important**: `--json` and `--config FILE` are **global flags** that must appear **before** the subcommand.
86-
> Correct: `py -m paywallfetcher --json article fetch --new-only`
87-
> Correct: `py -m paywallfetcher --config path/config.json article fetch`
88-
> Wrong: `py -m paywallfetcher article fetch --new-only --json`
85+
### Global flags (must appear before the subcommand)
86+
87+
| Flag | Type | Description |
88+
|---|---|---|
89+
| `--json` | bool | Machine-readable JSON output — must precede the subcommand |
90+
| `--config FILE` | str | Path to config file — must precede the subcommand |
91+
92+
```
93+
Correct: py -m paywallfetcher --json article fetch --new-only
94+
Correct: py -m paywallfetcher --config path/config.json article fetch
95+
Wrong: py -m paywallfetcher article fetch --new-only --json
96+
```
8997

9098
### `py -m paywallfetcher article fetch`
9199

@@ -96,17 +104,13 @@ Treat this repository as an agent-operated workspace for:
96104
| `--no-images` | bool | Skip image downloads |
97105
| `--dry-run` | bool | List without saving |
98106
| `--fail-on-empty` | bool | Exit 10 if no new content in incremental mode |
99-
| `--json` | bool | Machine-readable JSON output |
100-
| `--config FILE` | str | Path to config file |
101107

102108
### `py -m paywallfetcher qa fetch`
103109

104110
| Flag | Type | Description |
105111
|---|---|---|
106112
| `--new-only` | bool | Incremental mode |
107113
| `--start N` | int | Start from Nth item |
108-
| `--json` | bool | Machine-readable JSON output |
109-
| `--config FILE` | str | Path to config file |
110114

111115
### `py -m paywallfetcher qa browser-fetch`
112116

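The SKILL.md rule that global flags must come before the subcommand is what one would expect from a parser that defines `--json` and `--config` on the top-level parser only. This diff does not show `cli.py`, so the following is a sketch under the assumption that the CLI is built on `argparse` with nested subparsers; `build_parser` and its layout are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser layout: global flags live on the root parser only."""
    parser = argparse.ArgumentParser(prog="paywallfetcher")
    parser.add_argument("--json", action="store_true")
    parser.add_argument("--config", metavar="FILE")

    groups = parser.add_subparsers(dest="group", required=True)
    article = groups.add_parser("article")
    actions = article.add_subparsers(dest="action", required=True)
    fetch = actions.add_parser("fetch")
    # Subcommand-specific flag; the fetch parser knows nothing about --json.
    fetch.add_argument("--new-only", action="store_true")
    return parser


if __name__ == "__main__":
    ok = build_parser().parse_args(["--json", "article", "fetch", "--new-only"])
    print(ok.json, ok.group, ok.action, ok.new_only)
```

In this layout, placing `--json` after `article fetch` hands the flag to the `fetch` subparser, which does not define it, so `argparse` reports it as unrecognized and exits with status 2.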
src/paywallfetcher/auth.py

Lines changed: 122 additions & 31 deletions
Original file line numberDiff line numberDiff line change
@@ -1,14 +1,19 @@
11
"""Authentication layer.
22
3-
Resolves browser session cookies from Chrome / Edge automatically.
4-
Falls back to manually configured cookies only after browser auth failure.
5-
Proxy credentials are redacted from all log output.
3+
Priority order for credential resolution:
4+
1. env_token — PAYWALLFETCHER_TOKEN env var (cookie string or bearer token)
5+
2. env_cookie_header — PAYWALLFETCHER_COOKIE_<NAME> env vars (individual cookies)
6+
3. browser_auto — local Chrome / Edge logged-in session
7+
4. config_cookies — cookies field in config.json (debug-only fallback)
8+
9+
resolve() never prints. All warnings are stored in config['_warnings'] and
10+
emitted by the caller (cli.py) according to output mode.
611
"""
712

813
from __future__ import annotations
914

15+
import os
1016
import re
11-
import sys
1217
from typing import Any, Dict, List, Optional, Tuple
1318
from urllib.parse import urlparse
1419

@@ -26,53 +31,80 @@
 _DEFAULT_BROWSER_ORDER = ("chrome", "edge")
 _DEFAULT_XSRF_NAMES = ("XSRF-TOKEN", "XSRF_TOKEN", "xsrf-token", "x-xsrf-token", "_xsrf")

+ENV_TOKEN_VAR = "PAYWALLFETCHER_TOKEN"
+ENV_COOKIE_PREFIX = "PAYWALLFETCHER_COOKIE_"
+

 def resolve(config: Dict[str, Any]) -> Dict[str, Any]:
-    """Resolve auth and inject _auth_source / _cookie_records / _cookies / _xsrf_token into config."""
+    """Resolve auth. Never prints. Warnings are stored in config['_warnings'].
+
+    Injects into config:
+      _auth_source, _cookie_records, _cookies, _xsrf_token, _warnings
+    """
+    warnings: List[str] = []
     auth = config.get("auth", {})
     mode = (auth.get("mode") or "browser_auto").lower()

     cookie_domains = auth.get("cookie_domains") or _derive_domains(config["base_url"])
     xsrf_names = auth.get("xsrf_cookie_names") or list(_DEFAULT_XSRF_NAMES)
     required = auth.get("required_cookies") or []

-    manual = _manual_records(config)
-    browser_records: List[Dict] = []
-    browser_name: Optional[str] = None
-    browser_errors: List[str] = []
-
-    if mode in {"browser", "browser_auto"}:
-        browser_records, browser_name, browser_errors = _load_browser_records(
-            auth.get("browser", "auto"), cookie_domains
-        )
-
-    if browser_records:
-        records = _merge(manual, browser_records)
-        config["_auth_source"] = f"browser:{browser_name}"
-    elif manual:
-        records = manual
-        config["_auth_source"] = "config"
-        if mode in {"browser", "browser_auto"} and browser_errors:
-            print(f"[Warn] Browser auth unavailable, using config cookies: {' | '.join(browser_errors)}", file=sys.stderr)
-    elif mode == "config":
-        raise AuthError("No cookies found in config.json under 'cookies'.")
+    # ── Priority 1: env_token ──────────────────────────────────────────────
+    records = _env_token_records(config)
+    if records:
+        config["_auth_source"] = "env_token"
     else:
-        detail = " | ".join(browser_errors) if browser_errors else "No matching cookies in local browser profiles."
-        raise AuthError(
-            f"Failed to load browser cookies automatically. {detail}\n"
-            " Ensure you are already logged into the target site in Chrome or Edge."
-        )
+        # ── Priority 2: env_cookie_header ──────────────────────────────────
+        records = _env_cookie_records(config)
+        if records:
+            config["_auth_source"] = "env_cookie_header"
+        else:
+            # ── Priority 3: browser_auto ───────────────────────────────────
+            manual = _manual_records(config)
+            browser_records: List[Dict] = []
+            browser_name: Optional[str] = None
+            browser_errors: List[str] = []
+
+            if mode in {"browser", "browser_auto"}:
+                browser_records, browser_name, browser_errors = _load_browser_records(
+                    auth.get("browser", "auto"), cookie_domains
+                )
+
+            if browser_records:
+                records = _merge(manual, browser_records)
+                config["_auth_source"] = f"browser:{browser_name}"
+            elif manual:
+                # ── Priority 4: config_cookies (debug-only) ────────────────
+                records = manual
+                config["_auth_source"] = "config_cookies"
+                if mode in {"browser", "browser_auto"} and browser_errors:
+                    warnings.append(
+                        f"Browser auth unavailable, falling back to config cookies "
+                        f"(debug-only): {' | '.join(browser_errors)}"
+                    )
+            elif mode == "config":
+                raise AuthError("No cookies found in config.json under 'cookies'.")
+            else:
+                detail = " | ".join(browser_errors) if browser_errors else "No matching cookies in local browser profiles."
+                raise AuthError(
+                    f"Failed to load browser cookies automatically. {detail}\n"
+                    " Ensure you are already logged into the target site in Chrome or Edge.\n"
+                    f" Alternatively, set {ENV_TOKEN_VAR} or {ENV_COOKIE_PREFIX}<NAME> env vars."
+                )

     cookies_dict = {r["name"]: r["value"] for r in records}
     xsrf = _find_xsrf(cookies_dict, xsrf_names)

     missing = [n for n in required if n not in cookies_dict]
     if missing and config.get("_auth_source", "").startswith("browser"):
-        print(f"[Warn] Browser auth loaded but missing required cookies: {', '.join(missing)}", file=sys.stderr)
+        warnings.append(
+            f"Browser auth loaded but missing required cookies: {', '.join(missing)}"
+        )

     config["_cookie_records"] = records
     config["_cookies"] = cookies_dict
     config["_xsrf_token"] = xsrf
+    config["_warnings"] = warnings
     return config


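The no-print contract in `resolve()` moves the decision about output to the caller. The commit message names `_emit_warnings()` in `cli.py`, but that file is not part of the diff shown here, so what follows is only a sketch of what such a caller might look like. The function name `emit_warnings`, its parameters, and the `[Warn]` prefix (borrowed from the removed `print` calls) are assumptions.

```python
import sys
from typing import Any, Dict, Optional, TextIO


def emit_warnings(config: Dict[str, Any], json_mode: bool,
                  stream: Optional[TextIO] = None) -> int:
    """Write stored warnings to `stream` unless JSON mode is active.

    Returns the number of warnings written. Hypothetical helper.
    """
    if json_mode:
        # JSON mode must keep output machine-readable, so stay silent.
        return 0
    out = sys.stderr if stream is None else stream
    warnings = config.get("_warnings") or []
    for message in warnings:
        print(f"[Warn] {message}", file=out)
    return len(warnings)
```

Keeping the resolver free of side effects this way means tests can assert on `config['_warnings']` directly, and JSON consumers never see stray text.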
@@ -164,6 +196,65 @@ def doctor_auth(config: Dict[str, Any]) -> Dict[str, Any]:
     return result


+# ── env credential helpers ─────────────────────────────────────────────────
+
+
+def _env_token_records(config: Dict[str, Any]) -> List[Dict]:
+    """Parse PAYWALLFETCHER_TOKEN env var into cookie records.
+
+    Accepts a semicolon-separated cookie string: ``SESSION=abc; XSRF-TOKEN=xyz``
+    Each ``NAME=value`` pair becomes one cookie record bound to the config host.
+    Returns an empty list if the env var is unset or empty.
+    """
+    token = os.environ.get(ENV_TOKEN_VAR, "").strip()
+    if not token:
+        return []
+
+    host = _normalize_domain(urlparse(config["base_url"]).netloc.split(":")[0])
+    domain = f".{host}" if host else None
+
+    records: List[Dict] = []
+    for part in token.split(";"):
+        part = part.strip()
+        if "=" not in part:
+            continue
+        name, _, value = part.partition("=")
+        name = name.strip()
+        value = value.strip()
+        if name:
+            records.append({
+                "name": name, "value": value,
+                "domain": domain, "path": "/",
+                "secure": True, "expires": None,
+            })
+    return records
+
+
+def _env_cookie_records(config: Dict[str, Any]) -> List[Dict]:
+    """Collect PAYWALLFETCHER_COOKIE_<NAME>=value env vars as cookie records.
+
+    Each env var whose name starts with ``PAYWALLFETCHER_COOKIE_`` contributes
+    one cookie; the cookie name is the suffix after the prefix.
+    Returns an empty list if no matching env vars are set.
+    """
+    host = _normalize_domain(urlparse(config["base_url"]).netloc.split(":")[0])
+    domain = f".{host}" if host else None
+
+    records: List[Dict] = []
+    for key, value in os.environ.items():
+        if not key.startswith(ENV_COOKIE_PREFIX):
+            continue
+        name = key[len(ENV_COOKIE_PREFIX):]
+        value = value.strip()
+        if name and value:
+            records.append({
+                "name": name, "value": value,
+                "domain": domain, "path": "/",
+                "secure": True, "expires": None,
+            })
+    return records
+
+
 # ── internals ──────────────────────────────────────────────────────────────


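Both env helpers in auth.py bind their cookies to a dot-prefixed host derived from `base_url`. The diff does this with `netloc.split(":")[0]` followed by `_normalize_domain`, whose body is not shown. As a point of comparison only, here is a sketch that gets a similar result from `urlparse(...).hostname`, which already drops the port and lowercases the host. `cookie_domain_for` is a hypothetical name, and this is not claimed to match `_normalize_domain` exactly.

```python
from typing import Optional
from urllib.parse import urlparse


def cookie_domain_for(base_url: str) -> Optional[str]:
    """Return '.host' for a URL, or None when no host can be parsed."""
    # .hostname is None for strings without a scheme://host part.
    host = urlparse(base_url).hostname or ""
    return f".{host}" if host else None
```

For example, `cookie_domain_for("https://target.example:8443/feed")` yields `".target.example"`, while a string with no scheme yields `None`.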