82 changes: 82 additions & 0 deletions .github/workflows/validate.yml
@@ -0,0 +1,82 @@
name: Validate

on:
  push:
    branches: [master, main]
  pull_request:
    branches: [master, main]

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: LICENSE exists
        run: test -s LICENSE || (echo "::error::LICENSE missing or empty" && exit 1)

      - name: CHANGELOG.md exists
        run: test -s CHANGELOG.md || (echo "::error::CHANGELOG.md missing or empty" && exit 1)

      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'

      - name: All docs/* assets referenced from READMEs exist
        run: |
          set -e
          fail=0
          for ref in $(grep -hoE 'docs/[a-zA-Z0-9_./-]+' README.md README.ru.md | sort -u); do
            ref="${ref%[)\"\\,.]}"
            if [ ! -e "$ref" ]; then
              echo "::error file=README.md::missing referenced asset $ref"
              fail=1
            fi
          done
          exit $fail

      - name: HTML samples are well-formed
        run: |
          set -e
          fail=0
          for f in $(git ls-files 'docs/**/*.html' '*.html'); do
            if ! python -c "from html.parser import HTMLParser; HTMLParser().feed(open('$f', encoding='utf-8').read())" 2>&1 | grep -qE 'Error|error'; then
              : # parser ran without raising
            else
              echo "::error file=$f::HTML parse error"
              fail=1
            fi
          done
          exit $fail

      - name: Internal Markdown links resolve
        run: |
          set -e
          fail=0
          for src in README.md README.ru.md CHANGELOG.md CONTRIBUTING.md docs/architecture.md docs/case-studies.md docs/detection-rules.md docs/device-telemetry.md docs/research-mobile-malware-signatures.md docs/reports/sample-scan-report.md; do
            [ -f "$src" ] || continue
            base="$(dirname "$src")"
            for tgt in $(grep -hoE '\]\([^)]+\)' "$src" | sed 's/](\(.*\))/\1/' | sed 's/#.*$//'); do
              case "$tgt" in
                http*|mailto:*|"") continue ;;
              esac
              [ "$base" = "." ] && resolved="$tgt" || resolved="$base/$tgt"
              if [ ! -e "$resolved" ] && [ ! -e "$tgt" ]; then
                echo "::error file=$src::broken internal link → $tgt"
                fail=1
              fi
            done
          done
          exit $fail

      - name: Image / WebP screenshot files are non-empty
        run: |
          set -e
          fail=0
          for f in $(git ls-files 'docs/screenshots/*'); do
            if [ ! -s "$f" ]; then
              echo "::error file=$f::screenshot file empty"
              fail=1
            fi
          done
          exit $fail
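
Note on the "HTML samples are well-formed" step: Python's `html.parser.HTMLParser` is lenient and does not raise on unclosed or mismatched tags, so that step mainly catches files that cannot be read or decoded. A stricter check could track open tags on a stack. The following is a minimal sketch, not part of this PR; `BalanceChecker` and `check_html` are hypothetical names. It is also stricter than HTML itself, which lets some end tags (such as `</p>` or `</li>`) be omitted.

```python
from html.parser import HTMLParser

# Void elements never take a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class BalanceChecker(HTMLParser):
    """Hypothetical helper: records mismatched closing tags while parsing."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: list[str] = []
        self.errors: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.errors.append(f"unexpected </{tag}>")


def check_html(text: str) -> list[str]:
    """Return a list of problems; an empty list means every tag was balanced."""
    checker = BalanceChecker()
    checker.feed(text)
    checker.close()
    return checker.errors + [f"unclosed <{tag}>" for tag in checker.stack]
```

A workflow step could call `check_html` on each tracked `.html` file and fail when the returned list is non-empty.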
68 changes: 68 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,68 @@
# Changelog

Public-facing milestones for the Security Scanner Bot project. Internal commit history is private — this file tracks user-visible behavioural / detection-engine changes.

Format: [Keep a Changelog](https://keepachangelog.com/en/1.1.0/) · [SemVer](https://semver.org/spec/v2.0.0.html).

## [Unreleased] — Phase 2: open-source / self-hosted (in progress)

### Working on
- Docker-compose self-hosted package (`docker compose up` deploys the entire stack on the user's own server)
- Full backend code release under MIT (the public repo is currently documentation-only)
- iOS-specific detection rules (Apple ecosystem telemetry classification, iCloud Private Relay detection)
- WireGuard VPN option (alternative to VLESS for users who prefer WireGuard)
- Scheduled recurring scans (daily / weekly automated)
- PDF report export with charts and visualisations
- Expanded stalkerware database beyond the 919-domain AssoEchap baseline

## [Showcase 0.2.0] — 2026-05-05

### Added
- `docs/screenshots/01-onboarding-and-vpn.webp` — three-screen onboarding flow showing greeting / privacy disclaimer / VPN-client picker
- `docs/screenshots/02-scan-and-report-delivery.webp` — three-screen active scan flow showing scan-started state / two VPN-key delivery modes / final report with download attachment
- `docs/reports/sample-scan-report.md` — anonymised real-world report example (3 CRITICAL findings — SSH/Telnet/RTSP — plus 6 HIGH-severity threat-intel IPs, traffic statistics, plain-language recommendations)
- `docs/reports/sample-scan-report.html` — same report rendered as standalone HTML (inline CSS, dark theme, mobile-friendly) — matches the file format the bot delivers to users
- `Limitations & known failure modes` section — 8 honest constraints (encrypted-payload blindness, JA3 evasion, detection lag for slow beaconing, mobile-only scope, network-side only, VPN-trust requirement, false-positive rate, no on-device remediation)
- `Contact` section with explicit channels for end users, security researchers / responsible disclosure, press, partnership / commercial discussions
- `Related — Claude Code ecosystem` section with cross-links to all 7 sister repos by the same author (anti-regression-setup, ai-context-hierarchy, claude-statusline, notebooklm-claude-workflows, lingua-companion, diabot, ghost-showcase)
- Author signature expanded — Nick Podolyak with GitHub / Habr / dev.to / Telegram links
- `CHANGELOG.md` (this file)
- `CONTRIBUTING.md` with Phase-2-readiness priorities (detection rules, manufacturer telemetry data, language locales, Docker-compose hardening, security-disclosure clause)
- `.github/workflows/validate.yml` — LICENSE / CHANGELOG presence, every `docs/*` asset referenced from README exists, internal Markdown links resolve, sample HTML report parses as valid HTML, sample MD report has no broken cross-refs
- New badges — Stars, Validate CI, "@secure_scanbot LIVE"

### Changed
- README structured into a clear flow with the new "What it looks like" section right after badges (screenshots + sample-report link visible above the Table of Contents — readers see *what the bot actually does* before reading the architecture)
- Author signature footer now carries full attribution with all professional channels instead of just "Built by Creatman"

### Operational fix
- **Bot uptime restored.** `security-scanner-bot.service` was crash-looping with `status=203/EXEC` since 2026-04-15 06:46 UTC because `/root/security-scanner/venv/` had been removed from disk (likely during cleanup). systemd attempted 170,678 restarts before this fix. The venv has been recreated, dependencies reinstalled (aiogram 3.4.1, aiohttp 3.9.3, aiosqlite 0.19.0, nest_asyncio, plus the analysis stack), and the service is back to active polling. Bot is once again live at @secure_scanbot.

## [Showcase 0.1.0] — 2026-03-16

### Added (initial showcase publication)
- Bilingual `README.md` and `README.ru.md` (1,072 / 1,077 lines) — comprehensive documentation of detection layers, architecture, comparison with existing solutions, real-world case study
- Five hero badges with concrete numbers — License, Telegram platform, 18,987 Suricata rules, 919 stalkerware domains, 97 JA3 fingerprints
- `docs/architecture.md` — detailed component descriptions with data-flow diagrams
- `docs/case-studies.md` — anonymised real-world case studies showing scanner findings
- `docs/detection-rules.md` — complete reference of all detection rules (ports, behaviours, blacklists)
- `docs/device-telemetry.md` — manufacturer telemetry domains database with privacy analysis
- `docs/research-mobile-malware-signatures.md` — research on network signatures of mobile malware families
- Real-world case study in README — 26 SSH connections discovered on a Xiaomi Redmi Note device (anonymised)
- Comparison table vs Amnesty MVT, PiRogue Security Suite, commercial mobile antivirus
- LICENSE — MIT

## [Bot v2.3 production] — 2026-03 (private code, public behaviour)

### Implemented behavioural changes (visible to users of @secure_scanbot)

- **Layer 4: JA3 TLS fingerprinting** — 97 malware fingerprints from abuse.ch SSLBL. Suricata extracts JA3 hashes; `ja3_matcher.py` correlates against the database. Detects malware by TLS handshake even on port 443.
- **Secure VPN key delivery** — subscription URL (recommended) and raw VLESS URI (fallback) so the user can choose their preferred client.
- **Admin broadcast system** — FSM flow: compose text, preview with user count, confirm, send to all users. HTML support with fallback on parse errors. Per-user error logging.
- **Tone-of-voice rewrite** — all user-facing messages simplified for non-technical users. Three report styles: plain language / technical / expert.
- **App download links** — inline button in scan message; per-OS links: Android (GitHub APK direct) + iPhone (AppStore) with Russia-aware warnings. Apps: Hiddify, v2rayNG, NekoBox, Streisand.
- **Cancel / back flow** — Cancel deletes scan from DB, removes the VPN key, notifies the user. "Back to scan" from app links does *not* cancel an active scan.
- **IP enrichment pipeline** — offline prefix matching + IP-API.com + SQLite cache (24-hour TTL).
- **False-positive protection** — server IP filtering, `SAFE_PREFIXES`, AbuseIPDB confidence threshold, client-IP exclusion.
- **Stale scan cleanup** — auto-cleanup of scans older than 45 minutes; periodic check every 30 minutes.
- **Admin metrics** — scan statistics, AI cost tracking (model, tokens, cost per scan), active scan monitoring with username.
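
For readers unfamiliar with the JA3 layer listed above: a JA3 fingerprint is the MD5 hex digest of a comma-separated string built from five ClientHello fields (TLS version, cipher suites, extensions, elliptic curves, EC point formats), with each list joined by hyphens. The sketch below illustrates that construction and a blocklist lookup. It is an assumption-laden illustration only: `ja3_matcher.py` is private, so `ja3_hash`, `match_ja3` and the blocklist shape are hypothetical names, and in the pipeline described above Suricata already supplies the hash, so only the lookup step would apply.

```python
import hashlib
from typing import Optional


def ja3_hash(version: int, ciphers: list[int], extensions: list[int],
             curves: list[int], point_formats: list[int]) -> str:
    """Build the JA3 string from ClientHello fields and return its MD5 hex digest.

    Note: the JA3 method ignores GREASE values; this sketch assumes the caller
    has already removed them.
    """
    fields = [str(version)] + [
        "-".join(str(value) for value in values)
        for values in (ciphers, extensions, curves, point_formats)
    ]
    return hashlib.md5(",".join(fields).encode("ascii")).hexdigest()


def match_ja3(observed: str, blocklist: dict[str, str]) -> Optional[str]:
    """Return the label stored for an observed JA3 hash, or None if unknown.

    `blocklist` maps lowercase JA3 hashes to a label (hypothetical structure).
    """
    return blocklist.get(observed.lower())
```

A lookup is then a single dictionary access per TLS handshake, which is why this kind of matching stays cheap even on busy port-443 traffic.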
58 changes: 58 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,58 @@
# Contributing

This repository is currently a **public showcase** — the bot's source code is private. **Phase 2** (in progress) is to release the full backend code under MIT and make the entire stack `docker compose up`-able on the contributor's own server. This `CONTRIBUTING.md` is the bridge: priorities are documented now so that when the code lands, the community can hit the ground running.

## Priorities (highest impact first)

1. **Detection-rule submissions** — even before the backend is open-sourced, the maintainer accepts well-documented detection rule proposals via [GitHub Issues](https://github.com/CreatmanCEO/security-scanner/issues):
- **New stalkerware domains** with source / IoC reference (current baseline: 919 domains from AssoEchap)
- **New mining-pool patterns** (current: Stratum protocol detection + 30+ mining-pool domains)
- **New JA3 fingerprints** for known malware families with source (e.g. abuse.ch SSLBL, JoeSandbox)
- **Behavioural patterns** with sample traffic captures (anonymised) — beaconing, exfiltration, sustained streaming
2. **Manufacturer telemetry mapping** — `docs/device-telemetry.md` documents per-vendor telemetry domains (Apple, Google, Samsung, Xiaomi, Huawei). Coverage of Asian / regional manufacturers (Vivo, Oppo, OnePlus, Realme, Tecno, Infinix) is incomplete — PRs welcome.
3. **Language locales** — documentation is currently available in English (`README.md`) and Russian (`README.ru.md`). Once the bot code is open-sourced, the bot itself will need locale files (currently EN + RU). Translations welcome for: Spanish, Portuguese, Ukrainian, German, French, Hindi.
4. **Phase-2 Docker self-hosted hardening** — when the backend code is published, the docker-compose stack will need:
- Hardened Suricata / Zeek configurations
- Resource limits and health checks
- Optional Tailscale / WireGuard as alternative to VLESS+Reality
- First-run wizard for API key entry
5. **iOS-specific detection rules** — Apple ecosystem telemetry classification, iCloud Private Relay traffic differentiation. Pegasus / NSO behavioural indicators (high-port outbound + CloudFront infrastructure) are documented but coverage can grow.
6. **VirusTotal / MISP / STIX2 integrations** — push detection results into standard threat-intelligence formats for security teams.

## Responsible disclosure

If you have found a **security vulnerability** in the bot or the analysis pipeline (XSS in report rendering, SQL injection, VPN escape, etc.) — **do not** open a public GitHub issue. Instead:

- Email **creatmanick@gmail.com** with subject prefix `[SECURITY] security-scanner — `
- Provide reproduction steps, observed impact, and an anonymised reporter handle if you want public credit
- Expect an acknowledgement within 5 business days

We will coordinate disclosure timing with you. Public credit is at the maintainer's discretion.

## What we will not merge

- Detection rules that target legitimate consumer apps (Telegram itself, WhatsApp, Signal, mainstream banking apps) — false-positive risk is too high
- Anything that requires a paid third-party service to function (without an open-source / free-tier alternative)
- Changes that bypass the two-step user-consent flow (consent on first scan; explicit start of every scan)
- Off-topic features (browser extensions, on-device app scanning) — those belong in dedicated forks
- Changes to the behavioural-detection thresholds without sample traffic and a confusion-matrix justification
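
As an illustration of what a confusion-matrix justification for a threshold change might contain, the sketch below counts true/false positives and negatives for a candidate threshold over labelled samples and derives precision, recall and false-positive rate. The names `Confusion` and `evaluate` are hypothetical, and it assumes each sample reduces to a single score where higher means more suspicious.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int = 0  # malicious, flagged
    fp: int = 0  # benign, flagged
    tn: int = 0  # benign, not flagged
    fn: int = 0  # malicious, not flagged

    @property
    def precision(self) -> float:
        flagged = self.tp + self.fp
        return self.tp / flagged if flagged else 0.0

    @property
    def recall(self) -> float:
        malicious = self.tp + self.fn
        return self.tp / malicious if malicious else 0.0

    @property
    def false_positive_rate(self) -> float:
        benign = self.fp + self.tn
        return self.fp / benign if benign else 0.0


def evaluate(samples: list[tuple[float, bool]], threshold: float) -> Confusion:
    """Score labelled samples; a sample is flagged when score >= threshold.

    Each sample is (score, is_malicious).
    """
    result = Confusion()
    for score, is_malicious in samples:
        flagged = score >= threshold
        if flagged and is_malicious:
            result.tp += 1
        elif flagged:
            result.fp += 1
        elif is_malicious:
            result.fn += 1
        else:
            result.tn += 1
    return result
```

Running `evaluate` for both the old and the proposed threshold on the same anonymised sample set gives a side-by-side comparison that a reviewer can check.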

## Pull request checklist (when Phase 2 code is open)

- [ ] If you added a detection rule: a test fixture with sample traffic capturing the rule firing, and a sample where it does *not* fire
- [ ] If you touched a Suricata rule: `suricata-update` and `suricata -T -c suricata.yaml` clean
- [ ] If you touched a Zeek script: `zeek -a script.zeek` clean
- [ ] User-visible changes mirrored in **both** `README.md` and `README.ru.md`
- [ ] `CHANGELOG.md` entry in Keep a Changelog format
- [ ] No PII (IPs, phone identifiers, account names) in commits or test fixtures — anonymise everything
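
The first checklist item asks for one fixture where the rule fires and one where it does not. A minimal sketch of that pairing follows, using Stratum mining traffic as the example because Stratum is JSON-RPC with methods such as `mining.subscribe`. Everything here is an assumption for illustration: `looks_like_stratum` is a hypothetical rule, the payloads are made up, and a real submission would more likely ship anonymised packet captures than inline byte strings.

```python
import json

STRATUM_METHODS = {"mining.subscribe", "mining.authorize", "mining.submit"}


def looks_like_stratum(payload: bytes) -> bool:
    """Hypothetical rule: True if any line is a JSON-RPC call with a Stratum method."""
    for line in payload.splitlines():
        try:
            message = json.loads(line)
        except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
            continue
        if not isinstance(message, dict):
            continue
        method = message.get("method")
        if isinstance(method, str) and method in STRATUM_METHODS:
            return True
    return False


# Positive fixture: a Stratum subscribe request (made-up sample).
POSITIVE = b'{"id": 1, "method": "mining.subscribe", "params": []}\n'
# Negative fixture: ordinary JSON-RPC traffic that must not trigger the rule.
NEGATIVE = b'{"id": 1, "method": "eth_blockNumber", "params": []}\n'


def test_rule_fires_on_stratum_handshake() -> None:
    assert looks_like_stratum(POSITIVE)


def test_rule_stays_silent_on_other_json_rpc() -> None:
    assert not looks_like_stratum(NEGATIVE)
```

The negative fixture matters as much as the positive one: it documents the closest benign traffic the author considered and guards against later edits widening the rule.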

## Style

- Code: Python 3.11+, type hints, docstrings on public functions, `logging` (not `print`), HTML parse_mode for Telegram
- Documentation: prefer plain-language explanation over jargon; show concrete examples
- Issue / PR titles: imperative voice (*"Add detection for X"*, not *"Added detection for X"*)
- One feature per PR
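
A short sketch of what the code-style points could look like together (type hints, a docstring, `logging`, and output intended for Telegram's HTML parse mode). The function name `format_finding` and its fields are hypothetical and not taken from the private codebase.

```python
import html
import logging

logger = logging.getLogger(__name__)


def format_finding(severity: str, title: str, detail: str) -> str:
    """Render one finding as a message body for Telegram's HTML parse mode.

    All dynamic text is escaped so that a stray ``<``, ``>`` or ``&`` in scan
    data cannot break the markup.
    """
    logger.info("Formatting finding with severity=%s", severity)
    safe_severity = html.escape(severity, quote=False)
    safe_title = html.escape(title, quote=False)
    safe_detail = html.escape(detail, quote=False)
    return f"<b>{safe_severity}</b>: {safe_title}\n<code>{safe_detail}</code>"
```

Escaping at the formatting boundary keeps the rest of the code free to pass raw strings around, and it is also the first line of defence against the report-rendering injection issues mentioned under responsible disclosure.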

## Author / maintainer

[@CreatmanCEO](https://github.com/CreatmanCEO) — Nick Podolyak. For discussion before opening a large PR or proposing a detection-rule family, reach out via [@Creatman_it](https://t.me/Creatman_it) on Telegram.