CreatmanCEO · May 5, 2026
diff --git a/.github/workflows/validate.yml b/.github/workflows/validate.yml
@@ -0,0 +1,82 @@
+name: Validate
+
+on:
+  push:
+    branches: [master, main]
+  pull_request:
+    branches: [master, main]
+
+jobs:
+  validate:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: LICENSE exists
+        run: test -s LICENSE || (echo "::error::LICENSE missing or empty" && exit 1)
+
+      - name: CHANGELOG.md exists
+        run: test -s CHANGELOG.md || (echo "::error::CHANGELOG.md missing or empty" && exit 1)
+
+      - uses: actions/setup-python@v5
+        with:
+          python-version: '3.12'
+
+      - name: All docs/* assets referenced from READMEs exist
+        run: |
+          set -e
+          fail=0
+          for ref in $(grep -hoE 'docs/[a-zA-Z0-9_./-]+' README.md README.ru.md | sort -u); do
+            ref="${ref%[)\"\\,.]}"
+            if [ ! -e "$ref" ]; then
+              echo "::error file=README.md::missing referenced asset $ref"
+              fail=1
+            fi
+          done
+          exit $fail
+
+      - name: HTML samples are well-formed
+        run: |
+          set -e
+          fail=0
+          for f in $(git ls-files 'docs/**/*.html' '*.html'); do
+            if python -c "import sys; from html.parser import HTMLParser; HTMLParser().feed(open(sys.argv[1], encoding='utf-8').read())" "$f"; then
+              :  # parser ran without raising
+            else
+              echo "::error file=$f::HTML parse error"
+              fail=1
+            fi
+          done
+          exit $fail
+
+      - name: Internal Markdown links resolve
+        run: |
+          set -e
+          fail=0
+          for src in README.md README.ru.md CHANGELOG.md CONTRIBUTING.md docs/architecture.md docs/case-studies.md docs/detection-rules.md docs/device-telemetry.md docs/research-mobile-malware-signatures.md docs/reports/sample-scan-report.md; do
+            [ -f "$src" ] || continue
+            base="$(dirname "$src")"
+            for tgt in $(grep -hoE '\]\([^)]+\)' "$src" | sed 's/](\(.*\))/\1/' | sed 's/#.*$//'); do
+              case "$tgt" in
+                http*|mailto:*|"") continue ;;
+              esac
+              [ "$base" = "." ] && resolved="$tgt" || resolved="$base/$tgt"
+              if [ ! -e "$resolved" ] && [ ! -e "$tgt" ]; then
+                echo "::error file=$src::broken internal link → $tgt"
+                fail=1
+              fi
+            done
+          done
+          exit $fail
+
+      - name: Image / WebP screenshot files are non-empty
+        run: |
+          set -e
+          fail=0
+          for f in $(git ls-files 'docs/screenshots/*'); do
+            if [ ! -s "$f" ]; then
+              echo "::error file=$f::screenshot file empty"
+              fail=1
+            fi
+          done
+          exit $fail
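A note on the "HTML samples are well-formed" step above: Python's `html.parser.HTMLParser` is a lenient parser and, as far as I know, does not raise on unbalanced or unclosed tags, so that step mostly confirms the file can be read as UTF-8. If a stricter check is wanted, one option is to track open tags explicitly. The following is a minimal sketch under that assumption; `TagBalanceChecker` and `check_html` are hypothetical names that do not appear in this PR, and the check is deliberately stricter than the HTML specification (it flags omitted optional closing tags such as `</p>` or `</li>`).

```python
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TagBalanceChecker(HTMLParser):
    """Hypothetical helper: records unbalanced tags instead of ignoring them."""

    def __init__(self):
        super().__init__()
        self.stack = []   # (tag, line) pairs for tags that are still open
        self.errors = []  # human-readable problem descriptions

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append((tag, self.getpos()[0]))

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if not self.stack or self.stack[-1][0] != tag:
            self.errors.append(f"line {self.getpos()[0]}: unexpected </{tag}>")
            return
        self.stack.pop()


def check_html(text):
    """Return a list of balance problems; an empty list means none were found."""
    checker = TagBalanceChecker()
    checker.feed(text)
    checker.close()
    for tag, line in checker.stack:
        checker.errors.append(f"line {line}: <{tag}> is never closed")
    return checker.errors
```

In the workflow, a script built on this could print each problem as an `::error file=...::` annotation and exit non-zero when the list is not empty.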
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -0,0 +1,68 @@
+# Changelog
+
+Public-facing milestones for the Security Scanner Bot project. Internal commit history is private — this file tracks user-visible behavioural / detection-engine changes.
+
+Format: [Keep a Changelog](https://keepachangelog.com/en/1.1.0/) · [SemVer](https://semver.org/spec/v2.0.0.html).
+
+## [Unreleased] — Phase 2: open-source / self-hosted (in progress)
+
+### Working on
+- Docker-compose self-hosted package (`docker compose up` deploys the entire stack on the user's own server)
+- Full backend code release under MIT (the public repo is currently documentation-only)
+- iOS-specific detection rules (Apple ecosystem telemetry classification, iCloud Private Relay detection)
+- WireGuard VPN option (alternative to VLESS for users who prefer WireGuard)
+- Scheduled recurring scans (daily / weekly automated)
+- PDF report export with charts and visualisations
+- Expanded stalkerware database beyond the 919-domain AssoEchap baseline
+
+## [Showcase 0.2.0] — 2026-05-05
+
+### Added
+- `docs/screenshots/01-onboarding-and-vpn.webp` — three-screen onboarding flow showing greeting / privacy disclaimer / VPN-client picker
+- `docs/screenshots/02-scan-and-report-delivery.webp` — three-screen active scan flow showing scan-started state / two VPN-key delivery modes / final report with download attachment
+- `docs/reports/sample-scan-report.md` — anonymised real-world report example (3 CRITICAL findings — SSH/Telnet/RTSP — plus 6 HIGH-severity threat-intel IPs, traffic statistics, plain-language recommendations)
+- `docs/reports/sample-scan-report.html` — same report rendered as standalone HTML (inline CSS, dark theme, mobile-friendly) — matches the file format the bot delivers to users
+- `Limitations & known failure modes` section — 8 honest constraints (encrypted-payload blindness, JA3 evasion, detection lag for slow beaconing, mobile-only scope, network-side only, VPN-trust requirement, false-positive rate, no on-device remediation)
+- `Contact` section with explicit channels for end users, security researchers / responsible disclosure, press, partnership / commercial discussions
+- `Related — Claude Code ecosystem` section with cross-links to all 7 sister repos by the same author (anti-regression-setup, ai-context-hierarchy, claude-statusline, notebooklm-claude-workflows, lingua-companion, diabot, ghost-showcase)
+- Author signature expanded — Nick Podolyak with GitHub / Habr / dev.to / Telegram links
+- `CHANGELOG.md` (this file)
+- `CONTRIBUTING.md` with Phase-2-readiness priorities (detection rules, manufacturer telemetry data, language locales, Docker-compose hardening, security-disclosure clause)
+- `.github/workflows/validate.yml` — CI checks: LICENSE and CHANGELOG are present and non-empty, every `docs/*` asset referenced from the READMEs exists, internal Markdown links resolve (including those in the sample MD report), the sample HTML report is readable by Python's `html.parser`, screenshot files are non-empty
+- New badges — Stars, Validate CI, "@secure_scanbot LIVE"
+
+### Changed
+- README restructured so that the new "What it looks like" section comes right after the badges (screenshots + sample-report link visible above the Table of Contents — readers see *what the bot actually does* before reading the architecture)
+- Author signature footer no longer just "Built by Creatman" — full attribution with all professional channels
+
+### Operational fix
+- **Bot uptime restored.** `security-scanner-bot.service` was crash-looping with `status=203/EXEC` since 2026-04-15 06:46 UTC because `/root/security-scanner/venv/` had been removed from disk (likely during cleanup). systemd attempted 170,678 restarts before this fix. The venv has been recreated, dependencies reinstalled (aiogram 3.4.1, aiohttp 3.9.3, aiosqlite 0.19.0, nest_asyncio, plus the analysis stack), and the service is back to active polling. Bot is once again live at @secure_scanbot.
+
+## [Showcase 0.1.0] — 2026-03-16
+
+### Added (initial showcase publication)
+- Bilingual `README.md` and `README.ru.md` (1,072 / 1,077 lines) — comprehensive documentation of detection layers, architecture, comparison with existing solutions, real-world case study
+- Five hero badges with concrete numbers — License, Telegram platform, 18,987 Suricata rules, 919 stalkerware domains, 97 JA3 fingerprints
+- `docs/architecture.md` — detailed component descriptions with data-flow diagrams
+- `docs/case-studies.md` — anonymised real-world case studies showing scanner findings
+- `docs/detection-rules.md` — complete reference of all detection rules (ports, behaviours, blacklists)
+- `docs/device-telemetry.md` — manufacturer telemetry domains database with privacy analysis
+- `docs/research-mobile-malware-signatures.md` — research on network signatures of mobile malware families
+- Real-world case study in README — 26 SSH connections discovered on a Xiaomi Redmi Note device (anonymised)
+- Comparison table vs Amnesty MVT, PiRogue Security Suite, commercial mobile antivirus
+- LICENSE — MIT
+
+## [Bot v2.3 production] — 2026-03 (private code, public behaviour)
+
+### Implemented behavioural changes (visible to users of @secure_scanbot)
+
+- **Layer 4: JA3 TLS fingerprinting** — 97 malware fingerprints from abuse.ch SSLBL. Suricata extracts JA3 hashes; `ja3_matcher.py` correlates against the database. Detects malware by TLS handshake even on port 443.
+- **Secure VPN key delivery** — subscription URL (recommended) and raw VLESS URI (fallback) so the user can choose their preferred client.
+- **Admin broadcast system** — FSM flow: compose text, preview with user count, confirm, send to all users. HTML support with fallback on parse errors. Per-user error logging.
+- **Tone-of-voice rewrite** — all user-facing messages simplified for non-technical users. Three report styles: plain language / technical / expert.
+- **App download links** — inline button in scan message; per-OS links: Android (GitHub APK direct) + iPhone (App Store) with Russia-aware warnings. Apps: Hiddify, v2rayNG, NekoBox, Streisand.
+- **Cancel / back flow** — Cancel deletes scan from DB, removes the VPN key, notifies the user. "Back to scan" from app links does *not* cancel an active scan.
+- **IP enrichment pipeline** — offline prefix matching + IP-API.com + SQLite cache (24-hour TTL).
+- **False-positive protection** — server IP filtering, `SAFE_PREFIXES`, AbuseIPDB confidence threshold, client-IP exclusion.
+- **Stale scan cleanup** — auto-cleanup of scans older than 45 minutes; periodic check every 30 minutes.
+- **Admin metrics** — scan statistics, AI cost tracking (model, tokens, cost per scan), active scan monitoring with username.
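To make the "IP enrichment pipeline" entry above more concrete, here is a minimal sketch of how offline prefix matching, a remote lookup, and a SQLite cache with a 24-hour TTL could fit together. Everything in it is an assumption for illustration: the bot's code is private, the changelog lists `aiosqlite` while this sketch uses the synchronous standard-library `sqlite3` for brevity, the lookup order is a guess, and `OFFLINE_PREFIXES`, `open_cache`, `cache_get`, `cache_put`, `enrich` and the `fetch` callable are hypothetical names.

```python
import ipaddress
import json
import sqlite3
import time

TTL_SECONDS = 24 * 60 * 60  # 24-hour TTL, as stated in the changelog

# Hypothetical offline data: prefixes that can be labelled without a lookup.
OFFLINE_PREFIXES = {ipaddress.ip_network("192.0.2.0/24"): "example-net"}


def open_cache(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ip_cache ("
        "ip TEXT PRIMARY KEY, payload TEXT NOT NULL, fetched_at REAL NOT NULL)"
    )
    return conn


def cache_get(conn, ip, now=None):
    """Return the cached record, or None if it is missing or older than the TTL."""
    now = time.time() if now is None else now
    row = conn.execute(
        "SELECT payload, fetched_at FROM ip_cache WHERE ip = ?", (ip,)
    ).fetchone()
    if row is None or now - row[1] > TTL_SECONDS:
        return None
    return json.loads(row[0])


def cache_put(conn, ip, info, now=None):
    now = time.time() if now is None else now
    conn.execute(
        "INSERT OR REPLACE INTO ip_cache (ip, payload, fetched_at) VALUES (?, ?, ?)",
        (ip, json.dumps(info), now),
    )
    conn.commit()


def enrich(conn, ip, fetch, now=None):
    """Assumed order: offline prefixes, then cache, then remote lookup via fetch."""
    addr = ipaddress.ip_address(ip)
    for net, label in OFFLINE_PREFIXES.items():
        if addr in net:
            return {"source": "offline", "label": label}
    cached = cache_get(conn, ip, now)
    if cached is not None:
        return cached
    info = fetch(ip)  # stand-in for the remote lookup (e.g. an HTTP API call)
    cache_put(conn, ip, info, now)
    return info
```

Passing `now` explicitly keeps the TTL logic testable without sleeping; in production the default `time.time()` would be used.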
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -0,0 +1,58 @@
+# Contributing
+
+This repository is currently a **public showcase** — the bot's source code is private. **Phase 2** (in progress) is to release the full backend code under MIT and make the entire stack `docker compose up`-able on the contributor's own server. This `CONTRIBUTING.md` is the bridge: priorities are documented now so that when the code lands, the community can hit the ground running.
+
+## Priorities (highest impact first)
+
+1. **Detection-rule submissions** — even before the backend is open-sourced, the maintainer accepts well-documented detection rule proposals via [GitHub Issues](https://github.com/CreatmanCEO/security-scanner/issues):
+   - **New stalkerware domains** with source / IoC reference (current baseline: 919 domains from AssoEchap)
+   - **New mining-pool patterns** (current: Stratum protocol detection + 30+ mining-pool domains)
+   - **New JA3 fingerprints** for known malware families with source (e.g. abuse.ch SSLBL, JoeSandbox)
+   - **Behavioural patterns** with sample traffic captures (anonymised) — beaconing, exfiltration, sustained streaming
+2. **Manufacturer telemetry mapping** — `docs/device-telemetry.md` documents per-vendor telemetry domains (Apple, Google, Samsung, Xiaomi, Huawei). Coverage of other regional manufacturers (Vivo, Oppo, OnePlus, Realme, Tecno, Infinix) is incomplete — PRs welcome.
+3. **Language locales** — the documentation is currently available in English (`README.md`) and Russian (`README.ru.md`). When the bot code is open-sourced, the bot itself will need locale files (currently EN + RU). Translations welcome for: Spanish, Portuguese, Ukrainian, German, French, Hindi.
+4. **Phase-2 Docker self-hosted hardening** — when the backend code is published, the docker-compose stack will need:
+   - Hardened Suricata / Zeek configurations
+   - Resource limits and health checks
+   - Optional Tailscale / WireGuard as alternative to VLESS+Reality
+   - First-run wizard for API key entry
+5. **iOS-specific detection rules** — Apple ecosystem telemetry classification, iCloud Private Relay traffic differentiation. Pegasus / NSO behavioural indicators (high-port outbound + CloudFront infrastructure) are documented but coverage can grow.
+6. **VirusTotal / MISP / STIX2 integrations** — push detection results into standard threat-intelligence formats for security teams.
+
+## Responsible disclosure
+
+If you have found a **security vulnerability** in the bot or the analysis pipeline (XSS in report rendering, SQL injection, VPN escape, etc.) — **do not** open a public GitHub issue. Instead:
+
+- Email **creatmanick@gmail.com** with subject prefix `[SECURITY] security-scanner — `
+- Provide reproduction steps, observed impact, and an anonymised reporter handle if you want public credit
+- Expect an acknowledgement within 5 business days
+
+We will coordinate disclosure timing with you. Public credit is given at the maintainer's discretion.
+
+## What we will not merge
+
+- Detection rules that target legitimate consumer apps (Telegram itself, WhatsApp, Signal, mainstream banking apps) — false-positive risk is too high
+- Anything that requires a paid third-party service to function (without an open-source / free-tier alternative)
+- Changes that bypass the two-step user-consent flow (consent on first scan; explicit start of every scan)
+- Off-topic features (browser extensions, on-device app scanning) — those belong in dedicated forks
+- Changes to the behavioural-detection thresholds without sample traffic and a confusion-matrix justification
+
+## Pull request checklist (when Phase 2 code is open)
+
+- [ ] If you added a detection rule: a test fixture with sample traffic capturing the rule firing, and a sample where it does *not* fire
+- [ ] If you touched a Suricata rule: `suricata-update` and `suricata -T -c suricata.yaml` clean
+- [ ] If you touched a Zeek script: `zeek -a script.zeek` clean
+- [ ] User-visible changes mirrored in **both** `README.md` and `README.ru.md`
+- [ ] `CHANGELOG.md` entry in Keep a Changelog format
+- [ ] No PII (IPs, phone identifiers, account names) in commits or test fixtures — anonymise everything
+
+## Style
+
+- Code: Python 3.11+, type hints, docstrings on public functions, `logging` (not `print`), HTML parse_mode for Telegram
+- Documentation: prefer plain-language explanation over jargon; show concrete examples
+- Issue / PR titles: imperative voice (*"Add detection for X"*, not *"Added detection for X"*)
+- One feature per PR
+
+## Author / maintainer
+
+[@CreatmanCEO](https://github.com/CreatmanCEO) — Nick Podolyak. For discussion before opening a large PR or proposing a detection-rule family, reach out via [@Creatman_it](https://t.me/Creatman_it) on Telegram.
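The pull request checklist above asks for no PII, including IP addresses, in commits or test fixtures. As one possible way to prepare a text fixture, the sketch below replaces each distinct IPv4 address with a stable stand-in from the RFC 5737 documentation range `192.0.2.0/24`. It is an illustration under stated assumptions, not a tool shipped with this repository: `anonymise_ips` is a hypothetical name, only IPv4 in plain text is covered (no IPv6, phone identifiers, account names or binary captures), and dotted strings that merely look like addresses, such as four-part version numbers, would also be rewritten.

```python
import ipaddress
import re

IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
DOC_NET = ipaddress.ip_network("192.0.2.0/24")  # RFC 5737 TEST-NET-1


def anonymise_ips(text):
    """Replace each distinct IPv4 address with a stable documentation address.

    Returns the rewritten text and the original-to-replacement mapping, so the
    same mapping can be reviewed before it is discarded.
    """
    mapping = {}

    def replace(match):
        raw = match.group(0)
        try:
            ipaddress.IPv4Address(raw)
        except ValueError:
            return raw  # not a valid address, e.g. "999.1.1.1"
        if raw not in mapping:
            if len(mapping) >= DOC_NET.num_addresses - 2:
                raise ValueError("more distinct IPs than TEST-NET-1 can hold")
            mapping[raw] = str(DOC_NET.network_address + len(mapping) + 1)
        return mapping[raw]

    return IPV4_RE.sub(replace, text), mapping
```

Because the same original address always maps to the same replacement, connection patterns in the fixture (repeated destinations, beaconing intervals) stay recognisable after anonymisation.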