From b939897047bfc84899fd2f4b9331b4b7d0c68dd4 Mon Sep 17 00:00:00 2001
From: AI Agent Bot
Date: Fri, 15 May 2026 12:05:36 -0500
Subject: [PATCH] Snowflake red-team iter-5 — healthcare overlay, detection honesty pass, Chains K/L/M, deeper I/C, hardened pipeline
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Major additions:

* Healthcare overlay (`docs/analysis/snowflake-healthcare-overlay-2026.md`) — per-chain PHI exposure map, HIPAA control mapping, BAA considerations, OCR audit-retention sufficiency analysis.
* Detection honesty pass — canonical `detection/snowflake/ENRICHMENT.md` inventorying every derived field (allowlists, baselines, IdP correlation, Cortex sidecar) with native source + computation + deployment checklist; `enrichment:` block added to all Sigma rules (~24 files, ACCOUNT_USAGE and Trail variants).
* `bulk_exfil_baseline.yml` rewritten — role-baseline / volume-baseline / off-hours signals replace the volume-only floor.
* `federated_login_anomaly.yml` made lag-tolerant — documents Snowflake / Okta / Entra latency profile; both-sources-caught-up gate prevents FP storms during ingestion lag.

New chains:

* Chain K — Polaris / Iceberg catalog abuse (`iceberg_catalog_pivot.py` + `iceberg_table_outside_catalog_base.yml`).
* Chain L — External OAuth scope drift (`oauth_scope_audit.py` + `oauth_integration_scope_drift.yml`).
* Chain M — UDF EAI breakout (`udf_eai_egress.py` + `udf_with_eai_invocation.yml`).
* SPCS base-image supply chain (Chain H extension) — `spcs_base_image_probe.py` + `spcs_image_unpinned_or_external.yml`.
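
The both-sources-caught-up gate for `federated_login_anomaly.yml` (under Major additions) can be sketched in Python. This is a hypothetical illustration, not the pipeline's code. `LoginEvent`, `both_sources_caught_up()`, and `gate()` are invented names. The rule itself follows ENRICHMENT.md: when `lag_tolerant` is true, firing waits until both per-source ingestion watermarks are newer than the event time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class LoginEvent:
    """Hypothetical shape of one enriched federated-login row."""
    user_name: str
    event_time: datetime
    has_corresponding_idp_event: bool


def both_sources_caught_up(event_time: datetime,
                           idp_watermark: datetime,
                           snowflake_watermark: datetime) -> bool:
    # Both sides must have ingested data strictly newer than the event;
    # until then a "missing" IdP event may simply not have arrived yet.
    return idp_watermark > event_time and snowflake_watermark > event_time


def gate(event: LoginEvent,
         idp_watermark: datetime,
         snowflake_watermark: datetime,
         lag_tolerant: bool = True) -> str:
    """Return 'suppress', 'fire', or 'defer' for a single login."""
    if event.has_corresponding_idp_event:
        return "suppress"   # the IdP saw the sign-in, so nothing is anomalous
    if not lag_tolerant:
        return "fire"       # without lag tolerance, fire immediately
    if both_sources_caught_up(event.event_time, idp_watermark, snowflake_watermark):
        return "fire"       # the absence of an IdP event is now trustworthy
    return "defer"          # re-evaluate on the next pipeline pass


if __name__ == "__main__":
    t0 = datetime(2026, 5, 15, 12, 0)
    ev = LoginEvent("SVC_ETL", t0, has_corresponding_idp_event=False)
    # The IdP watermark is still behind the event, so the decision is deferred.
    print(gate(ev, t0 - timedelta(minutes=5), t0 + timedelta(minutes=1)))
```

The three-way result is the point of the design. A plain boolean would force ingestion lag to be treated as "no IdP event", which is the false-positive storm the gate exists to prevent.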

Deepening on existing chains:

* Chain I — Cortex agent abuse: `mode_corpus.py` externalizes payloads; new modes `semantic_inject` / `authority_spoof` / `multi_turn_setup` / `multi_turn_payoff` / `search_rank_hijack`; new behavioural rule `cortex_agent_followup_without_user_intent.yml` (no keyword dependency); full `lab-validation/` directory with trace + search audit + MCP poisoning lab SQL.
* Chain C — Native App supply chain: `naaaps_bypass_probe.py` with a 10-payload corpus across the four documented NAAAPS threat categories; `v2-dep` + `v3-loader` manifests for the deferred-loader timeline; `--variant multi-stage` simulator mode; `native_app_dependency_drift.yml`.

Pipeline & infra hardening:

* Streaming-ingest: Function timeout extended to 4 min and aligned with the 60 s poll cadence; `host.json` singleton block prevents auto-recovery race; cursor write goes through fcntl.flock + atomic rename; README replaces the "~90 s end-to-end" claim with a per-stage measurement methodology.
* `tools/lib/snowflake_mock_client.py` — shared `login_with_pat` / `run_sql` / `read_query_history` / `get` / `post`; 5 pivot tools refactored to use it (boilerplate consolidation).

Report + indexes:

* `docs/analysis/snowflake-platform-attack-surface-2026.md` adds chains K/L/M + SPCS image; `reports/snowflake-platform-assessment/` attack-chains, detection, and index pages updated.
* `CLAUDE.md`, root `README.md`, `detection/snowflake/README.md`, and both tool READMEs refreshed.

CI status: `check_snowflake_report_integrity`, `check_snowflake_tools_syntax`, and `check_mock_services_loopback` all pass against the new content.
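
The cursor-write hardening listed under Pipeline & infra hardening (fcntl.flock plus atomic rename) follows a standard POSIX pattern. A minimal Python sketch is below. It is illustrative only, not the code in `poller.py`: the JSON cursor format, the sidecar `.lock` file, and the function names are assumptions. `fcntl` is Unix-only.

```python
import fcntl
import json
import os
import tempfile
from typing import Optional


def write_cursor(path: str, cursor: dict) -> None:
    """Persist the poll cursor so that a crash never leaves a torn file."""
    directory = os.path.dirname(os.path.abspath(path))
    # A separate lock file serialises writers. The lock must not live on the
    # cursor file itself, because os.replace swaps that inode out.
    with open(path + ".lock", "a") as lock_file:
        fcntl.flock(lock_file.fileno(), fcntl.LOCK_EX)
        try:
            # The temp file sits in the same directory so the rename stays on
            # one filesystem, where os.replace is atomic on POSIX.
            fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".cursor-")
            try:
                with os.fdopen(fd, "w") as tmp:
                    json.dump(cursor, tmp)
                    tmp.flush()
                    os.fsync(tmp.fileno())
                os.replace(tmp_path, path)
            except BaseException:
                if os.path.exists(tmp_path):
                    os.unlink(tmp_path)
                raise
        finally:
            fcntl.flock(lock_file.fileno(), fcntl.LOCK_UN)


def read_cursor(path: str) -> Optional[dict]:
    """Readers need no lock: they see either the old file or the new one."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return None
```

Suppose two poller instances race after an auto-recovery restart. They serialise on the lock, and a crash mid-write leaves the previous cursor intact. For stricter durability, you can also fsync the containing directory after `os.replace`.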

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 CLAUDE.md | 12 +-
 README.md | 4 +-
 detection/snowflake/ENRICHMENT.md | 484 ++++++++++++++++++
 detection/snowflake/README.md | 8 +-
 .../snowflake/sigma/bulk_exfil_baseline.yml | 76 ++-
 .../sigma/connector_secret_leak_in_logs.yml | 3 +
 .../sigma/cortex_code_pre_1_0_25.yml | 3 +
 ...cortex_code_session_to_unknown_session.yml | 6 +
 .../sigma/federated_login_anomaly.yml | 61 ++-
 .../native_app_unexpected_version_bump.yml | 6 +
 .../snowflake/streaming-ingest/README.md | 37 ++
 .../azure-function/function.json | 1 +
 .../streaming-ingest/azure-function/host.json | 10 +-
 .../streaming-ingest/poller/poller.py | 57 ++-
 .../snowflake-healthcare-overlay-2026.md | 415 +++++++++++++++
 .../snowflake-platform-attack-surface-2026.md | 155 ++++++
 .../attack-chains.html | 88 ++++
 .../detection.html | 41 +-
 .../snowflake-platform-assessment/index.html | 13 +
 tools/cloud-identity/snowflake/README.md | 20 +-
 .../sigma/oauth_integration_scope_drift.yml | 68 +++
 .../partner_integration_credential_replay.yml | 8 +
 ...er_integration_credential_replay_trail.yml | 8 +
 .../sigma/snowflake_keypair_auth_abuse.yml | 6 +
 .../snowflake_keypair_auth_abuse_trail.yml | 7 +
 .../detection/sigma/snowflake_pat_anomaly.yml | 3 +
 .../sigma/snowflake_scim_role_race.yml | 6 +
 .../snowflake/oauth_scope_audit.py | 198 +++++++
 .../snowflake-pivot/README.md | 31 +-
 .../snowflake-pivot/bind_param_evasion.py | 36 +-
 .../iceberg_table_outside_catalog_base.yml | 65 +++
 .../sigma/snowflake_bind_param_audit_gap.yml | 5 +
 ...flake_replication_group_unknown_target.yml | 3 +
 ...replication_group_unknown_target_trail.yml | 3 +
 ...wflake_share_creation_unknown_consumer.yml | 5 +
 ..._share_creation_unknown_consumer_trail.yml | 3 +
 .../sigma/snowflake_spcs_eai_overbroad.yml | 3 +
 .../snowflake_spcs_eai_overbroad_trail.yml | 4 +
 .../snowflake_storage_integration_misuse.yml | 6 +
 ...flake_storage_integration_misuse_trail.yml | 5 +
 .../sigma/spcs_image_unpinned_or_external.yml | 69 +++
 .../sigma/udf_with_eai_invocation.yml | 67 +++
 .../snowflake-pivot/iceberg_catalog_pivot.py | 190 +++++++
 .../replication_group_exfil.py | 42 +-
 .../snowflake-pivot/share_creation_exfil.py | 44 +-
 .../snowflake-pivot/spcs_base_image_probe.py | 167 ++++++
 .../snowflake-pivot/spcs_egress_probe.py | 39 +-
 .../storage_integration_enum.py | 31 +-
 .../snowflake-pivot/udf_eai_egress.py | 155 ++++++
 tools/lib/snowflake_mock_client.py | 121 +++++
 tools/llm-attacks/cortex/README.md | 37 +-
 .../cortex/cortex_agent_mcp_bench.py | 118 +++--
 .../cortex/detection/false-positive-notes.md | 23 +
 .../sigma/cortex_agent_directive_followup.yml | 6 +
 .../cortex_agent_directive_followup_trail.yml | 6 +
 ...tex_agent_followup_without_user_intent.yml | 96 ++++
 .../cortex_agent_sql_from_tool_output.yml | 6 +
 .../sigma/cortex_search_rank_anomaly.yml | 7 +
 .../cortex/lab-validation/README.md | 54 ++
 .../lab-validation/mcp_poisoning_setup.sql | 86 ++++
 .../observe_cortex_agent_trace.sql | 126 +++++
 .../lab-validation/observe_search_audit.sql | 121 +++++
 tools/llm-attacks/cortex/mode_corpus.py | 193 +++++++
 .../snowflake-native-app/README.md | 72 ++-
 .../sigma/native_app_dependency_drift.yml | 73 +++
 .../sigma/native_app_privilege_bump.yml | 6 +
 .../sigma/native_app_privilege_bump_trail.yml | 6 +
 .../snowflake-native-app/manifest_builder.py | 69 ++-
 .../naaaps_bypass_probe.py | 346 +++++++++++++
 .../snowflake-native-app/version_bump_sim.py | 94 ++--
 70 files changed, 4158 insertions(+), 285 deletions(-)
 create mode 100644 detection/snowflake/ENRICHMENT.md
 create mode 100644 docs/analysis/snowflake-healthcare-overlay-2026.md
 create mode 100644 tools/cloud-identity/snowflake/detection/sigma/oauth_integration_scope_drift.yml
 create mode 100644 tools/cloud-identity/snowflake/oauth_scope_audit.py
 create mode 100644 tools/lateral-movement/snowflake-pivot/detection/sigma/iceberg_table_outside_catalog_base.yml
 create mode 100644 tools/lateral-movement/snowflake-pivot/detection/sigma/spcs_image_unpinned_or_external.yml
 create mode 100644 tools/lateral-movement/snowflake-pivot/detection/sigma/udf_with_eai_invocation.yml
 create mode 100644 tools/lateral-movement/snowflake-pivot/iceberg_catalog_pivot.py
 create mode 100644 tools/lateral-movement/snowflake-pivot/spcs_base_image_probe.py
 create mode 100644 tools/lateral-movement/snowflake-pivot/udf_eai_egress.py
 create mode 100644 tools/lib/snowflake_mock_client.py
 create mode 100644 tools/llm-attacks/cortex/detection/sigma/cortex_agent_followup_without_user_intent.yml
 create mode 100644 tools/llm-attacks/cortex/lab-validation/README.md
 create mode 100644 tools/llm-attacks/cortex/lab-validation/mcp_poisoning_setup.sql
 create mode 100644 tools/llm-attacks/cortex/lab-validation/observe_cortex_agent_trace.sql
 create mode 100644 tools/llm-attacks/cortex/lab-validation/observe_search_audit.sql
 create mode 100644 tools/llm-attacks/cortex/mode_corpus.py
 create mode 100644 tools/supply-chain/snowflake-native-app/detection/sigma/native_app_dependency_drift.yml
 create mode 100644 tools/supply-chain/snowflake-native-app/naaaps_bypass_probe.py

diff --git a/CLAUDE.md b/CLAUDE.md
index 3d5e238..47d5b70 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -107,13 +107,13 @@ The report at `reports/snowflake-platform-assessment/` is a set of linked static
 → [tools/lateral-movement/sccm-abuse/README.md](tools/lateral-movement/sccm-abuse/README.md) — SCCM ELEVATE1/2
 → [tools/lateral-movement/azure-arc/README.md](tools/lateral-movement/azure-arc/README.md) — Azure Arc MSI pivot
 → [tools/lateral-movement/exchange-hybrid/README.md](tools/lateral-movement/exchange-hybrid/README.md) — evoSTS token forge
-→ [tools/lateral-movement/snowflake-pivot/README.md](tools/lateral-movement/snowflake-pivot/README.md) — Snowflake Chain E storage-integration enum, Chain G share / replication exfil, Chain H SPCS egress depth × EAI rule matrix probe, bind-param evasion
+→ [tools/lateral-movement/snowflake-pivot/README.md](tools/lateral-movement/snowflake-pivot/README.md) — Snowflake Chain E storage-integration enum, Chain G share / replication exfil, Chain H SPCS egress depth × EAI rule matrix probe, Chain K Polaris/Iceberg catalog pivot, Chain M UDF EAI breakout, SPCS base-image posture probe, bind-param evasion
 → [tools/kerberos/README.md](tools/kerberos/README.md) — S4U2self/proxy, RBCD, NTLM relay, EPA recon, NTLM reflection LPE, AES roasting
 
 ### AD CS & Identity
 → [tools/ad-cs/README.md](tools/ad-cs/README.md) — ESC1–ESC16, chain.py, Shadow Credentials 2026
 → [tools/cloud-identity/README.md](tools/cloud-identity/README.md) — WIF, OIDC, Golden SAML, Silver SAML, SyncJacking, EvilTokens, FOCI, PRT devtools, CloudTrail blinding
-→ [tools/cloud-identity/snowflake/README.md](tools/cloud-identity/snowflake/README.md) — Snowflake JWT key-pair (Chain F), PAT scope walk + PAT discovery, SCIM token harvester, partner-integration audit (Chain J)
+→ [tools/cloud-identity/snowflake/README.md](tools/cloud-identity/snowflake/README.md) — Snowflake JWT key-pair (Chain F), PAT scope walk + PAT discovery, SCIM token harvester, partner-integration audit (Chain J), OAuth scope-drift audit (Chain L)
 → [tools/entra-abuse/README.md](tools/entra-abuse/README.md) — device-code, PRT, token replay (historical)
 
 ### Lateral Movement
@@ -148,7 +148,7 @@ The report at `reports/snowflake-platform-assessment/` is a set of linked static
 → [tools/kernel-lpe/README.md](tools/kernel-lpe/README.md) — AFD.sys, CLFS, I/O Ring primitives (requires EXPLOIT_LAB_KERNEL=1)
 
 ### Supply Chain
-→ [tools/supply-chain/README.md](tools/supply-chain/README.md) — Shai-Hulud npm worm, LiteLLM PyPI .pth, GitHub Actions OIDC (UNC6426), tj-actions-class, Snowflake Native App version-bump (Chain C empirical)
+→ [tools/supply-chain/README.md](tools/supply-chain/README.md) — Shai-Hulud npm worm, LiteLLM PyPI .pth, GitHub Actions OIDC (UNC6426), tj-actions-class, Snowflake Native App version-bump + multi-stage deferred-loader timeline + NAAAPS bypass probe (Chain C empirical)
 
 ### Phishing & Initial Access
 → [tools/phishing/README.md](tools/phishing/README.md) — AiTM kits (Tycoon2FA/Sneaky2FA/Rockstar2FA), ClickFix/FileFix/ConsentFix, passkey bench, vishing tabletop
@@ -179,10 +179,12 @@ The report at `reports/snowflake-platform-assessment/` is a set of linked static
 → [docs/analysis/firmware-landscape-2026/README.md](docs/analysis/firmware-landscape-2026/README.md) — Hydroph0bia, LogoFAIL successors, UEFI cert expiry
 → [docs/analysis/apple-mie-impact.md](docs/analysis/apple-mie-impact.md) — Apple Memory Integrity Enforcement
 → [docs/analysis/vishing-2026-market.md](docs/analysis/vishing-2026-market.md) — deepfake vishing economics + healthcare targeting
-→ [docs/analysis/snowflake-platform-attack-surface-2026.md](docs/analysis/snowflake-platform-attack-surface-2026.md) — CVE inventory, UNC5537 analysis, Cortex AI/Native Apps/SPCS attack surface, chains A–I, Trail vs ACCOUNT_USAGE field mapping
+→ [docs/analysis/snowflake-platform-attack-surface-2026.md](docs/analysis/snowflake-platform-attack-surface-2026.md) — CVE inventory, UNC5537 analysis, Cortex AI/Native Apps/SPCS attack surface, chains A–M (incl. Polaris/Iceberg K, OAuth scope drift L, UDF EAI breakout M), Trail vs ACCOUNT_USAGE field mapping
+→ [docs/analysis/snowflake-healthcare-overlay-2026.md](docs/analysis/snowflake-healthcare-overlay-2026.md) — Per-chain PHI exposure map + HIPAA control mapping + BAA considerations + OCR retention sufficiency
 → [docs/analysis/databricks-vs-snowflake-platform-comparison.md](docs/analysis/databricks-vs-snowflake-platform-comparison.md) — Cross-platform primitive map + chain mapping; detection-reuse notes for defenders covering both platforms
 → [detection/snowflake/README.md](detection/snowflake/README.md) — Cross-chain Sigma/KQL/SPL index, streaming ingest pattern, connector-debug-log secret-leak detector
-→ [detection/snowflake/streaming-ingest/README.md](detection/snowflake/streaming-ingest/README.md) — Concrete config (Terraform + Function App + docker-compose lab) for the INFORMATION_SCHEMA polling pipeline
+→ [detection/snowflake/ENRICHMENT.md](detection/snowflake/ENRICHMENT.md) — Canonical inventory of every derived/enrichment field the Sigma rules require; deployment checklist for the detection pack
+→ [detection/snowflake/streaming-ingest/README.md](detection/snowflake/streaming-ingest/README.md) — Concrete config (Terraform + Function App + docker-compose lab) for the INFORMATION_SCHEMA polling pipeline; cursor-locking + latency-measurement methodology
 
 ### Research Docs — Methodology
 → [docs/methodology/callstack-spoofing.md](docs/methodology/callstack-spoofing.md)
diff --git a/README.md b/README.md
index 22debd7..3053905 100644
--- a/README.md
+++ b/README.md
@@ -78,12 +78,12 @@ Each tool below ships under [tools/](tools/) and has a sibling `detection/` dire
 ### Lateral Movement
 
 - **Lateral Movement** — [tools/lateral-movement/](tools/lateral-movement/). RPC-based DCOM / TSCH / SCMR / WMI execution; SCCM ELEVATE1/2 plus the TAKEOVER-5 Entra-integration chain (SpecterOps, November 2025); Azure Arc MSI pivot with CVE-2026-26117 (`himds` pipe DACL); Exchange hybrid evoSTS token forge.
-- **Snowflake Pivot** — [tools/lateral-movement/snowflake-pivot/](tools/lateral-movement/snowflake-pivot/). Snowflake-specific lateral primitives: Storage Integration enumeration (Chain E), Direct Share + replication-group exfil (Chain G, audit-bypass via server-side data motion), bind-parameter evasion against `QUERY_HISTORY`. Talks to `mock-snowflake` on 9600.
+- **Snowflake Pivot** — [tools/lateral-movement/snowflake-pivot/](tools/lateral-movement/snowflake-pivot/). Snowflake-specific lateral primitives: Storage Integration enumeration (Chain E), Direct Share + replication-group exfil (Chain G, audit-bypass via server-side data motion), SPCS egress matrix (Chain H), Polaris / Iceberg catalog pivot (Chain K), UDF EAI breakout (Chain M), SPCS base-image posture probe (Chain H supply-chain extension), bind-parameter evasion against `QUERY_HISTORY`. Talks to `mock-snowflake` on 9600.
 
 ### Cloud Identity
 
 - **Cloud Identity Attacks** — [tools/cloud-identity/](tools/cloud-identity/). The modern cloud-identity surface: Workload Identity Federation wildcard `sub` abuse, Golden SAML, Silver SAML (secondary cert), SyncJacking via `ImmutableId` takeover, EvilTokens-style device-code 2026 PhaaS (Broker client ID FOCI path), FOCI Conditional Access bypass, PRT extraction via dev tools, and a CloudTrail-blinding catalog. Talks to the lab mocks: `mock-oidc` (9300), `mock-saml` (9400), `mock-entra` (9100/9102).
-- **Snowflake Cloud Identity** — [tools/cloud-identity/snowflake/](tools/cloud-identity/snowflake/). Snowflake-specific identity abuse for the post-UNC5537 / post-MFA control surface: JWT key-pair signer (Chain F — service-user key theft from CI / orchestration hosts), PAT scope walk, SCIM token harvester with a role-race primitive. Talks to `mock-snowflake` on 9600.
+- **Snowflake Cloud Identity** — [tools/cloud-identity/snowflake/](tools/cloud-identity/snowflake/). Snowflake-specific identity abuse for the post-UNC5537 / post-MFA control surface: JWT key-pair signer (Chain F — service-user key theft from CI / orchestration hosts), PAT scope walk, SCIM token harvester with a role-race primitive, partner-integration audit (Chain J), external OAuth scope-drift audit (Chain L). Talks to `mock-snowflake` on 9600.
 - **Entra ID Abuse (legacy)** — [tools/entra-abuse/](tools/entra-abuse/). Earlier device-code phishing, PRT simulation, and token-replay work. Kept for historical reference; current Entra work lives under `cloud-identity/`.
 
 ### Kernel LPE (Windows)
diff --git a/detection/snowflake/ENRICHMENT.md b/detection/snowflake/ENRICHMENT.md
new file mode 100644
index 0000000..a931bde
--- /dev/null
+++ b/detection/snowflake/ENRICHMENT.md
@@ -0,0 +1,484 @@
+# Snowflake Detection — Enrichment Reference
+
+The Sigma rules in this repo target a hybrid schema: most fields come
+directly from
+[`SNOWFLAKE.ACCOUNT_USAGE`](https://docs.snowflake.com/en/sql-reference/account-usage)
+views or the Snowflake Trail event stream, but several rules also
+reference **derived** fields that the SIEM ingestion pipeline must
+compute before the rule will fire. This document is the canonical
+inventory of those derived fields.
+
+Each entry names:
+
+- The derived field as it appears in the rule.
+- The native source field(s) the enrichment pipeline reads.
+- The computation the pipeline performs.
+- Where the input data lives if it is *not* in Snowflake (watchlists,
+  IdP audit, baselines, partner registries).
+
+If a deployment doesn't implement an enrichment, the corresponding rule
+will silently not fire — it isn't a syntax error in the SIEM, but it
+is a detection gap. **Treat this document as the deployment checklist
+for the detection pack.**
+
+---
+
+## 1. Watchlist / Allowlist Membership
+
+Most entries here are set-membership lookups against a customer-maintained allowlist;
+a SIEM-side lookup table, a config-as-code file, or a tagged Snowflake table are all valid sources.
+The exceptions, `volume_above_role_baseline` and `outside_business_hours`, are per-role computations that `bulk_exfil_baseline.yml` also needs.
+
+### `external_stage_in_watchlist`
+
+- **Rule:** [`sigma/bulk_exfil_baseline.yml`](sigma/bulk_exfil_baseline.yml).
+- **Native source:** `QUERY_HISTORY.QUERY_TEXT` (parse external-stage URL from
+  the `COPY INTO @` or `COPY INTO 's3://...'` form).
+- **Computation:** Extract the stage URL prefix; check membership in the
+  customer's approved-exfil-stage allowlist.
+- **Input data location:** SIEM lookup table or a Snowflake table
+  `OPS.SECURITY.APPROVED_EXFIL_STAGES` keyed by stage URL prefix.
+
+### `role_in_approved_bulk_exporter_set`
+
+- **Rule:** [`sigma/bulk_exfil_baseline.yml`](sigma/bulk_exfil_baseline.yml).
+- **Native source:** `QUERY_HISTORY.ROLE_NAME`.
+- **Computation:** Set-membership against the customer-curated set of
+  roles that are *expected* to emit bulk exports (e.g.,
+  `EHR_EXPORT_PIPELINE_ROLE`, `RESEARCH_COHORT_PUBLISHER`,
+  `PAYOR_FEED_WRITER`).
+- **Input data location:** `OPS.SECURITY.BULK_EXPORTER_ROLES`.
+
+### `volume_above_role_baseline`
+
+- **Rule:** [`sigma/bulk_exfil_baseline.yml`](sigma/bulk_exfil_baseline.yml).
+- **Native source:** `QUERY_HISTORY.BYTES_WRITTEN_TO_RESULT`,
+  `QUERY_HISTORY.ROLE_NAME`.
+- **Computation:** Per role, maintain a 30-day rolling 90th-percentile
+  of `COPY INTO @` byte volumes. For each new event, set the
+  flag true if the volume exceeds that role's p90.
+- **Input data location:** SIEM-side aggregation; persist a per-role
+  table `OPS.SECURITY.COPY_BYTES_P90_BY_ROLE`. Rebuild nightly.
+
+### `outside_business_hours`
+
+- **Rule:** [`sigma/bulk_exfil_baseline.yml`](sigma/bulk_exfil_baseline.yml).
+- **Native source:** `QUERY_HISTORY.START_TIME`,
+  `QUERY_HISTORY.USER_NAME`.
+- **Computation:** Per user (or per role, if simpler), check whether
+  the event timestamp falls outside the documented business-hours
+  window in the user's time zone. Roles with documented overnight
+  windows (EHR refresh, payor batch) should have *their* hours, not
+  the tenant default.
+- **Input data location:** `OPS.SECURITY.ROLE_BUSINESS_HOURS`
+  (role → tz → (start_hour, end_hour) tuples).
+
+### `target_account_not_in_watchlist`
+
+- **Rules:**
+  [`snowflake_share_creation_unknown_consumer.yml`](../../tools/lateral-movement/snowflake-pivot/detection/sigma/snowflake_share_creation_unknown_consumer.yml),
+  [`snowflake_replication_group_unknown_target.yml`](../../tools/lateral-movement/snowflake-pivot/detection/sigma/snowflake_replication_group_unknown_target.yml).
+- **Native source:** `QUERY_HISTORY.QUERY_TEXT` (parse `ADD ACCOUNTS =
+  .`) or `REPLICATION_GROUP_USAGE_HISTORY.TARGET_ACCOUNT_NAME`.
+- **Computation:** Set-membership against the approved-consumer-account
+  watchlist.
+- **Input data location:** `OPS.SECURITY.APPROVED_SHARE_CONSUMERS` and
+  `OPS.SECURITY.APPROVED_REPLICATION_TARGETS`.
+
+### `indexed_by_role_in_pipeline_watchlist`
+
+- **Rule:** [`cortex_search_rank_anomaly.yml`](../../tools/llm-attacks/cortex/detection/sigma/cortex_search_rank_anomaly.yml).
+- **Native source:** `INDEX_AUDIT.ROLE_NAME` (where `INDEX_AUDIT` is the
+  trace produced by the Cortex Search index pipeline; see *Cortex
+  Telemetry Sidecar* below).
+- **Computation:** Set-membership against approved-indexing-pipeline roles.
+- **Input data location:** `OPS.SECURITY.APPROVED_INDEXING_ROLES`.
+
+---
+
+## 2. Per-User Baselines
+
+These are time-windowed baselines maintained by the SIEM (typically
+30-day rolling) against `LOGIN_HISTORY`. Rebuild nightly.
+
+### `is_outside_baseline_source`
+
+- **Rules:**
+  [`snowflake_keypair_auth_abuse.yml`](../../tools/cloud-identity/snowflake/detection/sigma/snowflake_keypair_auth_abuse.yml),
+  [`partner_integration_credential_replay.yml`](../../tools/cloud-identity/snowflake/detection/sigma/partner_integration_credential_replay.yml).
+- **Native source:** `LOGIN_HISTORY.CLIENT_IP`, `LOGIN_HISTORY.USER_NAME`.
+- **Computation:** Per user, maintain a rolling 30-day set of observed
+  `CLIENT_IP` values (or, more usefully, their `/24` CIDRs and the
+  GeoIP/ASN of each). For each new login, set the flag true if the
+  source IP is outside the set.
+- **Input data location:** Computed at ingest; persist a per-user
+  table `OPS.SECURITY.LOGIN_BASELINE_30D`.
+
+### `is_login_source_in_host_egress_range`
+
+- **Rule:** [`cortex_code_session_to_unknown_session.yml`](sigma/cortex_code_session_to_unknown_session.yml).
+- **Native source:** `LOGIN_HISTORY.CLIENT_IP`; *plus* an external
+  host-egress mapping derived from corporate VPN policy or device
+  posture (CrowdStrike, Intune, etc.).
+- **Computation:** For the user whose Cortex Code host issued the
+  recent session, check whether the new Snowflake login's `CLIENT_IP`
+  falls within the host's documented egress range.
+- **Input data location:** `OPS.SECURITY.HOST_EGRESS_RANGES` (host_id →
+  CIDR list). Tie to the corporate VPN policy as the source of truth,
+  not to the host's last-cached IP.
+
+### `is_outside_documented_partner_egress`
+
+- **Rule:** [`partner_integration_credential_replay.yml`](../../tools/cloud-identity/snowflake/detection/sigma/partner_integration_credential_replay.yml).
+- **Native source:** `LOGIN_HISTORY.CLIENT_IP`.
+- **Computation:** For users tagged as partner-integration identities
+  (`user_tag_partner_id` set), check the source IP against the
+  documented egress CIDRs of the named partner.
+- **Input data location:** Customer-maintained partner registry,
+  e.g. `OPS.SECURITY.PARTNER_REGISTRY` (partner_id → cidr_list,
+  contact, BAA status).
+
+---
+
+## 3. Cross-System Correlation
+
+These require correlation across the Snowflake audit and a *different*
+system's audit (IdP, endpoint EDR). The correlation is time-windowed
+and lag-sensitive; both window and lag tolerance must be configurable
+to avoid false positives during the slower system's ingestion delay.
+
+### `has_corresponding_idp_event`
+
+- **Rules:**
+  [`federated_login_anomaly.yml`](sigma/federated_login_anomaly.yml),
+  [`snowflake_scim_role_race.yml`](../../tools/cloud-identity/snowflake/detection/sigma/snowflake_scim_role_race.yml).
+- **Native source:** Snowflake `LOGIN_HISTORY` or SCIM audit row;
+  IdP-side sign-in / directory-change events.
+- **Computation:** For each Snowflake federated login or SCIM PATCH,
+  search the IdP audit within `idp_correlation_window_minutes` for a
+  matching sign-in or directory change. Set the flag to `false` only
+  when *no such event exists* AND the IdP audit is known-ingested for
+  the window (do not fire on ingestion lag; see `idp_correlation_window_minutes`
+  and `lag_tolerant` below).
+- **Input data location:** IdP audit stream (Okta System Log,
+  Entra Sign-In Logs); ingest both with a documented SLA.
+
+### `idp_correlation_window_minutes`
+
+- **Rules:** as above.
+- **Computation:** Configurable parameter, typically 5–10 minutes.
+  Tune *up* to absorb IdP audit ingestion lag (Okta median ~2–3
+  minutes; Entra median 5–15 minutes; both can spike higher during
+  vendor incidents).
+- **Note:** Snowflake `ACCOUNT_USAGE.LOGIN_HISTORY` itself has a
+  documented latency of up to 2 hours. If the SIEM is reading
+  ACCOUNT_USAGE rather than Trail, the *Snowflake* side is the
+  long pole. See `lag_tolerant` flag below.
+
+### `lag_tolerant`, `both_sources_caught_up`, `*_watermark_ingested_at`
+
+- **Rules:**
+  [`federated_login_anomaly.yml`](sigma/federated_login_anomaly.yml)
+  (and Trail-paired variants where present).
+- **Computation:** `lag_tolerant` is a boolean ingestion-pipeline
+  parameter (default `true`). When true, the rule defers final firing
+  until *both* sides of the correlation have known-ingested timestamps
+  newer than the event time, surfaced as `both_sources_caught_up = true`.
+  The per-source watermarks (`idp_audit_watermark_ingested_at`,
+  `snowflake_login_ingested_at`) come from the SIEM pipeline's
+  per-source ingestion tracker.
+- **Input data location:** Computed by the SIEM ingestion pipeline
+  from per-source ingestion timestamps. Most modern SIEMs (Sentinel,
+  Splunk SC4S, Elastic Fleet) expose ingestion watermarks per source;
+  the pipeline only needs to materialize them as enrichment columns.
+
+### `has_cortex_code_session_within_window`, `cortex_code_session_host_id`
+
+- **Rule:** [`cortex_code_session_to_unknown_session.yml`](sigma/cortex_code_session_to_unknown_session.yml).
+- **Native source:** Endpoint EDR process-creation events for the
+  Cortex Code CLI binary.
+- **Computation:** Maintain a windowed set of `(user, host_id, session_start_ts)`
+  triples from EDR events; for each Snowflake login by the same user,
+  set the flag true if any session exists in the trailing window
+  (default 30 minutes).
+- **Input data location:** EDR product (CrowdStrike Falcon, MS
+  Defender, SentinelOne) — process-creation table.
+
+---
+
+## 4. SQL Statement Parsing
+
+These extract structured fields from the unstructured
+`QUERY_HISTORY.QUERY_TEXT` column. A regex pass at ingest is enough for
+a first deployment; production deployments should use a tokenizer so
+that text inside string literals and comments is not matched as SQL.
+
+### `stage_url`, `is_stage_url_outside_integration_allowlist`
+
+- **Rule:** [`snowflake_storage_integration_misuse.yml`](../../tools/lateral-movement/snowflake-pivot/detection/sigma/snowflake_storage_integration_misuse.yml).
+- **Native source:** `QUERY_HISTORY.QUERY_TEXT` plus
+  `ACCOUNT_USAGE.INTEGRATIONS` (specifically the
+  `INTEGRATION_PARAMETERS.STORAGE_ALLOWED_LOCATIONS` column for the
+  `INTEGRATION_TYPE = 'EXTERNAL_STAGE'` entries).
+- **Computation:** For each `CREATE STAGE` statement that names an
+  `INTEGRATION = ` and a `URL = `, look up the
+  integration's allowed_locations and set the flag if the stage URL
+  prefix is not contained in the allowlist.
+- **Input data location:** Joined at ingest from `ACCOUNT_USAGE.INTEGRATIONS`.
+
+### `external_stage_url`, `external_stage` (boolean), `target_account`, `share_name`
+
+- **Rules:**
+  [`snowflake_bind_param_audit_gap.yml`](../../tools/lateral-movement/snowflake-pivot/detection/sigma/snowflake_bind_param_audit_gap.yml),
+  [`snowflake_share_creation_unknown_consumer.yml`](../../tools/lateral-movement/snowflake-pivot/detection/sigma/snowflake_share_creation_unknown_consumer.yml).
+- **Native source:** `QUERY_HISTORY.QUERY_TEXT`.
+- **Computation:** Regex extraction at ingest.
+
+### `cortex_code_version`
+
+- **Rule:** [`cortex_code_pre_1_0_25.yml`](sigma/cortex_code_pre_1_0_25.yml).
+- **Native source:** Endpoint EDR process-creation `command_line`.
+- **Computation:** Parse `--version 1.0.` or read the version
+  field from the binary signature.
+- **Input data location:** Endpoint EDR.
+
+---
+
+## 5. Native App Manifest Diff
+
+### `manifest_diff_added`, `manifest_hash_previous`, `manifest_hash_current`
+
+- **Rule:** [`native_app_unexpected_version_bump.yml`](sigma/native_app_unexpected_version_bump.yml).
+- **Native source:** `ACCOUNT_USAGE.APPLICATIONS`,
+  `ACCOUNT_USAGE.APPLICATION_VERSIONS` (rows that record
+  `EVENT_TYPE = 'APP_VERSION_INSTALLED'`).
+- **Computation:** On each `APP_VERSION_INSTALLED` event for an
+  application that previously had a version pinned, fetch the
+  prior version's manifest and the new version's manifest from the
+  application package and diff. Surface the added grant references,
+  added `EXTERNAL ACCESS INTEGRATION` declarations, added
+  `EXTERNAL FUNCTION` declarations, and any added container image
+  references.
+- **Input data location:** Manifests live in the application package
+  itself; the diff must be computed by a Snowpark stored procedure or
+  external service that has consumer-side read access to the
+  application package.
+- **Implementation note:** The Snowpark procedure form is more durable
+  because the consumer always has read access to the installed
+  manifest. An external service requires share-back of the manifest.
+
+### `user_type`, `network_policy`
+
+- **Rule:** [`snowflake_keypair_auth_abuse.yml`](../../tools/cloud-identity/snowflake/detection/sigma/snowflake_keypair_auth_abuse.yml).
+- **Native source:** `ACCOUNT_USAGE.USERS`, `ACCOUNT_USAGE.NETWORK_POLICY_REFERENCES`.
+- **Computation:** Join at ingest. `user_type` is the `USERS.TYPE` column
+  (`PERSON` / `SERVICE` / `LEGACY_SERVICE`). `network_policy` is the
+  joined policy name from `NETWORK_POLICY_REFERENCES` (NULL if none).
+
+### `user_tag_partner_id`, `bound_network_policy`, `documented_partner_egress_cidrs`
+
+- **Rule:** [`partner_integration_credential_replay.yml`](../../tools/cloud-identity/snowflake/detection/sigma/partner_integration_credential_replay.yml).
+- **Native source:** `ACCOUNT_USAGE.TAG_REFERENCES`,
+  `ACCOUNT_USAGE.NETWORK_POLICY_REFERENCES`; *plus* the
+  partner-registry table maintained outside Snowflake.
+- **Computation:** Joined at ingest. Tag `partner_id` is set on the
+  user via `ALTER USER … SET TAG partner_id = ''`; the
+  documented egress CIDRs come from the customer registry keyed by
+  `partner_id`.
+
+---
+
+## 6. Cortex Telemetry Sidecar
+
+This is the largest deployment dependency. Snowflake's per-step Cortex
+Agent and Cortex Search audit are not surfaced through standard
+`SNOWFLAKE.ACCOUNT_USAGE` views as of the assessment cut date. There
+are three viable sources, in order of preference:
+
+1. **Snowflake Trail** event stream where the customer has Trail
+   enabled and the `cortex_agent.*` / `cortex_search.*` event families
+   are configured to emit. This is the most accurate source. The
+   `_trail` variants of the Cortex rules target this stream.
+2. **Customer-side trace capture** via a Snowpark stored procedure
+   that wraps Cortex Agent invocations and logs each agent step
+   (planner output, tool call, tool result) to a customer-owned
+   `CORTEX_AGENT_TRACE` table. See
+   [`tools/llm-attacks/cortex/lab-validation/observe_cortex_agent_trace.sql`](../../tools/llm-attacks/cortex/lab-validation/observe_cortex_agent_trace.sql)
+   for the schema this repo's rules expect.
+3. **MCP-server-side logging** (where the customer operates the MCP
+   server). The MCP server sees `tool_output` content verbatim and
+   can flag injection tokens before the planner ever consumes them.
+
+The rules that depend on this sidecar:
+
+- [`cortex_agent_directive_followup.yml`](../../tools/llm-attacks/cortex/detection/sigma/cortex_agent_directive_followup.yml)
+  — needs per-step tool output and follow-up tool invocations within
+  one agent run.
+- [`cortex_agent_sql_from_tool_output.yml`](../../tools/llm-attacks/cortex/detection/sigma/cortex_agent_sql_from_tool_output.yml)
+  — needs `sql_origin` (tool_output vs. user_prompt) classification
+  per executed query.
+- [`cortex_search_rank_anomaly.yml`](../../tools/llm-attacks/cortex/detection/sigma/cortex_search_rank_anomaly.yml)
+  — needs `rank_at_search_time` per (document, query) pair and the
+  indexing role for the document.
+
+If neither Trail nor a sidecar is available, **mark these rules as
+DEPLOYMENT_BLOCKED in the rule registry** and gate Cortex's PHI-bearing
+access at the row-access-policy layer instead. The
+[healthcare overlay](../../docs/analysis/snowflake-healthcare-overlay-2026.md)
+section on audit-retention sufficiency calls this out as the relevant
+HIPAA §164.312(b) compensating control.
+
+---
+
+## 7. Cortex Agent Derived Fields
+
+Independent of the sidecar source, these fields are computed on the
+trace once captured:
+
+### `same_agent_run`, `tool_invocations_after_directive`
+
+- **Rule:** [`cortex_agent_directive_followup.yml`](../../tools/llm-attacks/cortex/detection/sigma/cortex_agent_directive_followup.yml).
+- **Computation:** Within a single `agent_run_id`, count the tool
+  invocations that follow a step matching the rule's
+  `tool_output|contains: 'CALL_TOOL:'` selector. That directive step
+  anchors `step_n`; the follow-up count covers the steps in
+  `step_{n+1..k}` that name the tool the directive references.
+
+### `sql_origin`
+
+- **Rule:** [`cortex_agent_sql_from_tool_output.yml`](../../tools/llm-attacks/cortex/detection/sigma/cortex_agent_sql_from_tool_output.yml).
+- **Computation:** For each query executed within an agent run, classify
+  whether the query text *first appears* in: (a) the user prompt, (b) the
+  output of a tool earlier in the same run, or (c) the planner's own
+  generation. The sidecar records this classification at execution time
+  (it's a tag emitted by the wrapping stored procedure, not a post-hoc
+  inference).
+
+### `tool_in_prompt_match`, `tool_function_in_prompt_match`, `prior_step_role`, `cross_turn_*`
+
+- **Rule:** [`cortex_agent_followup_without_user_intent.yml`](../../tools/llm-attacks/cortex/detection/sigma/cortex_agent_followup_without_user_intent.yml).
+- **Computation:**
+  - `tool_in_prompt_match` is `true` iff the tool name (or a documented
+    alias) appears literally in the run's user prompt.
+  - `tool_function_in_prompt_match` is `true` iff the tool's documented
+    intent appears in the user prompt as a free-text request (e.g.,
+    user said "search for X" and the called tool is `cortex_search`).
+ Compute via a small embedding-similarity check against the agent + registry's per-tool intent description. + - `prior_step_role` is the `step_role` of step `n-1` in the same + agent run; `'tool_result'` indicates the call followed a tool + output rather than a planner-internal step. + - `cross_turn_prior_run_id` and `cross_turn_same_user` join against + the prior 60 minutes of the same user's runs to catch multi-turn + poisoning. +- **Input data location:** `OPS.SECURITY.AGENT_TOOL_CHAIN_ALLOWLIST` + (the per-agent chained-tool allowlist) plus the agent registry's + tool-intent descriptions. + +### `rank_at_search_time`, `indexed_within_minutes` + +- **Rule:** [`cortex_search_rank_anomaly.yml`](../../tools/llm-attacks/cortex/detection/sigma/cortex_search_rank_anomaly.yml). +- **Computation:** Per-query top-N ranking captured at search time + (typically by wrapping Cortex Search calls in a stored procedure + that records the returned document_ids and their positions). + `indexed_within_minutes` is the difference between document + indexing timestamp and search-time timestamp. + +--- + +## 8. SCIM Audit Fields + +### `operation`, `target_attribute`, `op`, `target_user`, `previous_role`, `new_role`, `scim_bearer_id` + +- **Rule:** [`snowflake_scim_role_race.yml`](../../tools/cloud-identity/snowflake/detection/sigma/snowflake_scim_role_race.yml). +- **Native source:** SCIM provisioning logs. Snowflake's SCIM endpoint + audit is surfaced through Trail and through the customer's + reverse-proxy / IdP-side audit (Okta SCIM logs; Entra Provisioning + Logs). +- **Computation:** Parse the SCIM HTTP request body for PATCH operations + on the `snowflakeRole` attribute; correlate to IdP-side + user-attribute-change events on the same target. + +--- + +## 9. 
Chain-K (Iceberg) Derived Fields + +### `iceberg_metadata_location`, `iceberg_storage_location`, `is_location_outside_catalog_base`, `last_metadata_writer_role` + +- **Rule:** [`iceberg_table_outside_catalog_base.yml`](../../tools/lateral-movement/snowflake-pivot/detection/sigma/iceberg_table_outside_catalog_base.yml). +- **Native source:** `QUERY_HISTORY.QUERY_TEXT` plus + `INFORMATION_SCHEMA.ICEBERG_TABLES` (where exposed). Production + deployments may need to query the Iceberg catalog directly for the + metadata pointer. +- **Computation:** Parse the `CREATE ICEBERG TABLE` statement for + `METADATA_FILE_PATH` and `BASE_LOCATION`; check against + `OPS.SECURITY.APPROVED_ICEBERG_CATALOG_BASES`. +- **Input data location:** Customer-maintained allowlist of approved + catalog base prefixes; one entry per cross-region replica. + +## 10. Chain-L (OAuth) Derived Fields + +### `integration_default_role`, `idp_granted_scopes`, `idp_consent_diff_added`, `reaches_admin_class_role` + +- **Rule:** [`oauth_integration_scope_drift.yml`](../../tools/cloud-identity/snowflake/detection/sigma/oauth_integration_scope_drift.yml). +- **Native source:** `ACCOUNT_USAGE.INTEGRATIONS` (Snowflake side) plus + IdP-side consent audit (Okta `apps//grants`, Entra + `oauth2PermissionGrants`). +- **Computation:** Join the integration's `default_role` against an + admin-class set (`{ACCOUNTADMIN, SECURITYADMIN, USERADMIN}`). + Diff the IdP-side granted scopes against the prior snapshot to + produce `idp_consent_diff_added`. +- **Input data location:** IdP audit stream; snapshot stored + in `OPS.SECURITY.IDP_CONSENT_SNAPSHOT_DAILY`. + +## 11. Chain-M (UDF EAI) Derived Fields + +### `udf_owner`, `udf_eai_list`, `eai_network_rule_value_list`, `eai_rule_is_overbroad`, `invocation_role_eq_owner` + +- **Rule:** [`udf_with_eai_invocation.yml`](../../tools/lateral-movement/snowflake-pivot/detection/sigma/udf_with_eai_invocation.yml). 
+- **Native source:** `ACCOUNT_USAGE.FUNCTIONS` (function owner + EAI + list), `ACCOUNT_USAGE.INTEGRATIONS` and `NETWORK_RULES` (the EAI's + referenced rule's `value_list`), `QUERY_HISTORY` (invocation + user/role). +- **Computation:** Join at ingest. `eai_rule_is_overbroad` is `true` + when the referenced NETWORK RULE's `value_list` contains a wildcard + (`*`, `OPEN_ANY`). +- **Input data location:** All four sources are native Snowflake views; no external lookup is required. + +## 12. SPCS Image Posture Derived Fields + +### `service_spec_image`, `image_has_digest_pin`, `image_registry`, `image_registry_in_approved_set` + +- **Rule:** [`spcs_image_unpinned_or_external.yml`](../../tools/lateral-movement/snowflake-pivot/detection/sigma/spcs_image_unpinned_or_external.yml). +- **Native source:** `QUERY_HISTORY.QUERY_TEXT` plus + `INFORMATION_SCHEMA.SERVICES` for the parsed `spec.image`. +- **Computation:** Regex on the image URI for `@sha256:` (digest pin) + and for the registry prefix; set-membership against + `OPS.SECURITY.APPROVED_CONTAINER_REGISTRIES`. + +## Deployment Checklist + +Before deploying the Snowflake detection pack to a production SIEM, +confirm: + +- [ ] An allowlist table or lookup is populated for each + `*_in_watchlist` field above. +- [ ] A per-user 30-day login baseline is computed nightly and + available for join at detection time. +- [ ] IdP audit ingestion is configured with a documented latency SLA; + `idp_correlation_window_minutes` is tuned to cover the worst-case + combined ingestion lag of both sides, and `lag_tolerant` is enabled by + default. +- [ ] If Snowflake Trail is enabled for the account, the Cortex Trail + event families are emitting; if not, a Cortex telemetry sidecar + is deployed (Snowpark wrapper) and the rule registry reflects + which rules are DEPLOYMENT_BLOCKED. +- [ ] Partner registry table exists and is kept up to date by the + vendor-management process.
+- [ ] EDR process-creation events are ingested for Cortex Code CLI on + every developer endpoint subject to the rule (otherwise the + endpoint-side Chain B coverage is blind). +- [ ] Native App manifest-diff procedure is installed in the consumer + account and runs on each `APP_VERSION_INSTALLED` event. + +Any item left unchecked is a *named* detection gap, not a silent one. diff --git a/detection/snowflake/README.md b/detection/snowflake/README.md index f35b018..0b7520b 100644 --- a/detection/snowflake/README.md +++ b/detection/snowflake/README.md @@ -21,14 +21,18 @@ ingestion surface available on the customer's side. |-------|--------------|---------------------|-------------| | A — Credential theft to bulk exfil | UNC5537 replay; bulk `COPY INTO @stage` from a non-MFA / no-network-policy user. | [`bulk_exfil_baseline.yml`](sigma/bulk_exfil_baseline.yml) + bind-param coverage: [`snowflake_bind_param_audit_gap.yml`](../../tools/lateral-movement/snowflake-pivot/detection/sigma/snowflake_bind_param_audit_gap.yml) | — (folded into bulk_exfil_baseline via the streaming-ingest pipeline) | | B — Cortex Code indirect injection | Pre-1.0.25 Cortex Code CLI executes shell-pipe-sh under indirect prompt injection. | [`cortex_code_pre_1_0_25.yml`](sigma/cortex_code_pre_1_0_25.yml) (version-string, endpoint-side) + behavioral pair: [`cortex_code_session_to_unknown_session.yml`](sigma/cortex_code_session_to_unknown_session.yml) | covered by the behavioral pair (does not depend on Trail event names) | -| C — Native App Marketplace supply-chain | Installed Native App auto-updates to a manifest with new external integrations. | [`native_app_unexpected_version_bump.yml`](sigma/native_app_unexpected_version_bump.yml) | — (Native App lifecycle still surfaces through ACCOUNT_USAGE.APPLICATIONS) | +| C — Native App Marketplace supply-chain | Installed Native App auto-updates to a manifest with new external integrations, new privileges, or new/mutated dependencies (incl. 
deferred-loader shape). | [`native_app_unexpected_version_bump.yml`](sigma/native_app_unexpected_version_bump.yml) + [`native_app_privilege_bump.yml`](../../tools/supply-chain/snowflake-native-app/detection/sigma/native_app_privilege_bump.yml) + [`native_app_dependency_drift.yml`](../../tools/supply-chain/snowflake-native-app/detection/sigma/native_app_dependency_drift.yml) | [`native_app_privilege_bump_trail.yml`](../../tools/supply-chain/snowflake-native-app/detection/sigma/native_app_privilege_bump_trail.yml) | | D — Federated-IdP compromise | Forged SAML/OAuth assertion authenticates a high-privileged Snowflake user. | [`federated_login_anomaly.yml`](sigma/federated_login_anomaly.yml) | — (use the Chain F Trail variant; same login_history shape) | | E — Storage Integration cross-cloud pivot | New external stage on an integration outside the bucket allowlist. | [`snowflake_storage_integration_misuse.yml`](../../tools/lateral-movement/snowflake-pivot/detection/sigma/snowflake_storage_integration_misuse.yml) | [`snowflake_storage_integration_misuse_trail.yml`](../../tools/lateral-movement/snowflake-pivot/detection/sigma/snowflake_storage_integration_misuse_trail.yml) | | F — Key-pair JWT auth abuse | Stolen RSA private key signs JWT for a service user (post-MFA reality). | [`snowflake_keypair_auth_abuse.yml`](../../tools/cloud-identity/snowflake/detection/sigma/snowflake_keypair_auth_abuse.yml) | [`snowflake_keypair_auth_abuse_trail.yml`](../../tools/cloud-identity/snowflake/detection/sigma/snowflake_keypair_auth_abuse_trail.yml) | | G — Direct Share / Replication exfil | `ALTER SHARE ADD ACCOUNTS` or replication group with a non-allowlisted target. 
| [`snowflake_share_creation_unknown_consumer.yml`](../../tools/lateral-movement/snowflake-pivot/detection/sigma/snowflake_share_creation_unknown_consumer.yml) + [`snowflake_replication_group_unknown_target.yml`](../../tools/lateral-movement/snowflake-pivot/detection/sigma/snowflake_replication_group_unknown_target.yml) | [`snowflake_share_creation_unknown_consumer_trail.yml`](../../tools/lateral-movement/snowflake-pivot/detection/sigma/snowflake_share_creation_unknown_consumer_trail.yml) + [`snowflake_replication_group_unknown_target_trail.yml`](../../tools/lateral-movement/snowflake-pivot/detection/sigma/snowflake_replication_group_unknown_target_trail.yml) | | H — SPCS over-broad EAI egress | Wildcard / OPEN_ANY network rule referenced by an `EXTERNAL ACCESS INTEGRATION`. | [`snowflake_spcs_eai_overbroad.yml`](../../tools/lateral-movement/snowflake-pivot/detection/sigma/snowflake_spcs_eai_overbroad.yml) | [`snowflake_spcs_eai_overbroad_trail.yml`](../../tools/lateral-movement/snowflake-pivot/detection/sigma/snowflake_spcs_eai_overbroad_trail.yml) | -| I — Cortex Agent MCP poisoning | Tool output triggers planner-initiated follow-up tool calls or SQL execution. | [`cortex_agent_directive_followup.yml`](../../tools/llm-attacks/cortex/detection/sigma/cortex_agent_directive_followup.yml) + [`cortex_agent_sql_from_tool_output.yml`](../../tools/llm-attacks/cortex/detection/sigma/cortex_agent_sql_from_tool_output.yml) + [`cortex_search_rank_anomaly.yml`](../../tools/llm-attacks/cortex/detection/sigma/cortex_search_rank_anomaly.yml) | [`cortex_agent_directive_followup_trail.yml`](../../tools/llm-attacks/cortex/detection/sigma/cortex_agent_directive_followup_trail.yml) | +| I — Cortex Agent MCP poisoning | Tool output triggers planner-initiated follow-up tool calls or SQL execution. Behavioural variants (semantic injection, authority spoof, multi-turn) bypass the keyword form. 
| [`cortex_agent_directive_followup.yml`](../../tools/llm-attacks/cortex/detection/sigma/cortex_agent_directive_followup.yml) + [`cortex_agent_followup_without_user_intent.yml`](../../tools/llm-attacks/cortex/detection/sigma/cortex_agent_followup_without_user_intent.yml) (no-keyword behavioural pair) + [`cortex_agent_sql_from_tool_output.yml`](../../tools/llm-attacks/cortex/detection/sigma/cortex_agent_sql_from_tool_output.yml) + [`cortex_search_rank_anomaly.yml`](../../tools/llm-attacks/cortex/detection/sigma/cortex_search_rank_anomaly.yml) | [`cortex_agent_directive_followup_trail.yml`](../../tools/llm-attacks/cortex/detection/sigma/cortex_agent_directive_followup_trail.yml) | | J — Partner-integration credential replay | Third-party SaaS holding Snowflake credentials is compromised; credential replayed from attacker infrastructure. | [`partner_integration_credential_replay.yml`](../../tools/cloud-identity/snowflake/detection/sigma/partner_integration_credential_replay.yml) | [`partner_integration_credential_replay_trail.yml`](../../tools/cloud-identity/snowflake/detection/sigma/partner_integration_credential_replay_trail.yml) | +| K — Polaris / Iceberg catalog abuse | Iceberg metadata-pointer poisoning or off-catalog-base storage. External Iceberg table reads sidestep the customer's STORAGE INTEGRATION allowlist. | [`iceberg_table_outside_catalog_base.yml`](../../tools/lateral-movement/snowflake-pivot/detection/sigma/iceberg_table_outside_catalog_base.yml) | — (extend the same rule against Trail's `iceberg.snowflake.table_changed` event family where enabled) | +| L — External OAuth scope drift | Snowflake OAuth integration's IdP-side consent widens to admin-class scopes; mapping silently reaches a high-privilege role. 
| [`oauth_integration_scope_drift.yml`](../../tools/cloud-identity/snowflake/detection/sigma/oauth_integration_scope_drift.yml) | — | +| M — UDF EAI breakout | Python/Scala UDF with `EXTERNAL_ACCESS_INTEGRATIONS` callable by non-owner roles becomes a sanctioned exfil channel during normal query execution. | [`udf_with_eai_invocation.yml`](../../tools/lateral-movement/snowflake-pivot/detection/sigma/udf_with_eai_invocation.yml) | — | +| H+ — SPCS base-image supply chain | SPCS service references an image tag (not a digest) from a registry the customer does not control. The image can be substituted between scan and deploy. | [`spcs_image_unpinned_or_external.yml`](../../tools/lateral-movement/snowflake-pivot/detection/sigma/spcs_image_unpinned_or_external.yml) | — | ## PAT, SCIM, and Connector secret-leak detections diff --git a/detection/snowflake/sigma/bulk_exfil_baseline.yml b/detection/snowflake/sigma/bulk_exfil_baseline.yml index f8d5198..8f60c81 100644 --- a/detection/snowflake/sigma/bulk_exfil_baseline.yml +++ b/detection/snowflake/sigma/bulk_exfil_baseline.yml @@ -1,14 +1,35 @@ -title: Snowflake — Bulk COPY INTO External Stage (Chain A baseline) +title: Snowflake — Bulk COPY INTO External Stage (Chain A baseline, role-aware) id: 8e7d2c1f-3b4a-4e5c-8f0a-1b2c3d4e5f6a status: experimental description: | - Baseline detection for Chain A (UNC5537 replay). Fires when a session - emits a high-volume `COPY INTO @` whose external-stage - URL is not on the approved-exfil-stage watchlist. + Multi-signal detection for Chain A (UNC5537 replay). Fires on a + `COPY INTO @` whose **combination** of signals + separates an attacker's first-and-only bulk exfil from a legitimate + role's recurring data motion. - Designed as a coarse alarm that surfaces *any* unusual bulk exfil; pair - with `snowflake_bind_param_audit_gap.yml` (in the snowflake-pivot tool - directory) for sessions where bind parameters degrade the audit signal. 
+ The volume-only form of this rule (pre-2026-05) was too noisy in + practice: legitimate roles routinely emit large `COPY INTO` jobs during + quarter close, EHR-export windows, payor reconciliation runs, and + research-cohort handoffs. Those events look identical to an attacker + at the bytes-out-the-door layer. + + The rule fires when **all four** of the following hold: + + 1. The query is a `COPY INTO @` (external-stage form, + not internal stage). + 2. The external stage is **not** on the customer-maintained + approved-exfil-stage watchlist. + 3. **At least one** of: + - the user's role is not in the approved bulk-exporter set, + - the volume exceeds the role's 90th-percentile baseline, + - the event falls outside business hours for the user's + time zone / tenant policy. + 4. Volume of at least 10 MB. This replaces the former static 100 MB + floor: combined with the role / hours signals, a 10 MB lower bound is + enough to exclude accidental small unloads. + + Pair with `snowflake_bind_param_audit_gap.yml` for sessions where bind + parameters degrade the audit signal.
references: - https://cloud.google.com/blog/topics/threat-intelligence/unc5537-snowflake-data-theft-extortion - https://docs.snowflake.com/en/sql-reference/account-usage/query_history @@ -17,6 +38,13 @@ date: 2026-05-15 tags: - attack.exfiltration - attack.t1567.002 +enrichment: + required: + - external_stage_in_watchlist + - role_in_approved_bulk_exporter_set + - volume_above_role_baseline + - outside_business_hours + doc: ../ENRICHMENT.md logsource: product: snowflake service: query_history @@ -26,9 +54,19 @@ detection: query_text|contains: '@' external_stage_not_in_watchlist: external_stage_in_watchlist: false - large_result: - bytes_written_to_result|gte: 104857600 # 100 MB - condition: copy_to_external and external_stage_not_in_watchlist and large_result + role_off_baseline: + role_in_approved_bulk_exporter_set: false + volume_above_baseline: + volume_above_role_baseline: true + off_hours: + outside_business_hours: true + size_floor: + bytes_written_to_result|gte: 10485760 # 10 MB lower floor + condition: > + copy_to_external + and external_stage_not_in_watchlist + and size_floor + and (role_off_baseline or volume_above_baseline or off_hours) fields: - event_timestamp - user_name @@ -37,10 +75,20 @@ fields: - query_text - bytes_written_to_result - rows_produced + - external_stage_url + - role_in_approved_bulk_exporter_set + - volume_above_role_baseline + - outside_business_hours falsepositives: - Legitimate first-run of a new pipeline that loads from / unloads to - a freshly-created external stage. Maintain a 24h grace + on-call - notification. - - Bulk export jobs run during quarter close that are not normally on - the watchlist; tag the approved stages instead of suppressing. + a freshly-created external stage. Maintain a 24h grace + + on-call notification — the rule should warn (not page) until the + new stage is added to the watchlist. 
+ - Genuinely novel ad-hoc exports from approved bulk-exporter roles + during declared incidents (DR, data-migration, etc.). Tag the + incident window so the rule's `off_hours` signal does not stack + with operational urgency. + - Roles that have legitimate after-hours export windows (overnight + EHR refreshes). Define the `outside_business_hours` calculation + per role, not per tenant — the enrichment doc names this pattern. level: high diff --git a/detection/snowflake/sigma/connector_secret_leak_in_logs.yml b/detection/snowflake/sigma/connector_secret_leak_in_logs.yml index 652e842..0eba59c 100644 --- a/detection/snowflake/sigma/connector_secret_leak_in_logs.yml +++ b/detection/snowflake/sigma/connector_secret_leak_in_logs.yml @@ -19,6 +19,9 @@ date: 2026-05-15 tags: - attack.credential_access - attack.t1552.001 # Credentials in Files +enrichment: + required: [] # works on raw connector debug logs — no derived fields + doc: ../ENRICHMENT.md logsource: product: snowflake category: connector_logs diff --git a/detection/snowflake/sigma/cortex_code_pre_1_0_25.yml b/detection/snowflake/sigma/cortex_code_pre_1_0_25.yml index 67313f6..ea9c15f 100644 --- a/detection/snowflake/sigma/cortex_code_pre_1_0_25.yml +++ b/detection/snowflake/sigma/cortex_code_pre_1_0_25.yml @@ -18,6 +18,9 @@ tags: - attack.execution - attack.t1059 - attack.initial_access +enrichment: + required: [cortex_code_version] + doc: ../ENRICHMENT.md logsource: product: endpoint category: process_creation diff --git a/detection/snowflake/sigma/cortex_code_session_to_unknown_session.yml b/detection/snowflake/sigma/cortex_code_session_to_unknown_session.yml index 2947432..6f62089 100644 --- a/detection/snowflake/sigma/cortex_code_session_to_unknown_session.yml +++ b/detection/snowflake/sigma/cortex_code_session_to_unknown_session.yml @@ -27,6 +27,12 @@ tags: - attack.t1528 - attack.lateral_movement - attack.t1550 +enrichment: + required: + - has_cortex_code_session_within_window + - 
cortex_code_session_host_id + - is_login_source_in_host_egress_range + doc: ../ENRICHMENT.md logsource: product: snowflake service: login_history diff --git a/detection/snowflake/sigma/federated_login_anomaly.yml b/detection/snowflake/sigma/federated_login_anomaly.yml index 0359ea7..c76cbd1 100644 --- a/detection/snowflake/sigma/federated_login_anomaly.yml +++ b/detection/snowflake/sigma/federated_login_anomaly.yml @@ -1,14 +1,39 @@ -title: Snowflake — Federated Login Without Corresponding IdP Sign-In Event +title: Snowflake — Federated Login Without Corresponding IdP Sign-In Event (lag-tolerant) id: 3b4c5d6e-7f80-9192-a3b4-c5d6e7f80293 status: experimental description: | Detects a Snowflake SAML or OAuth login whose corresponding sign-in - event is missing from the IdP audit within a correlation window. + event is missing from the IdP audit within a configurable correlation + window — accounting for the latency profile of both sides of the + correlation. Models Chain D (federated-IdP compromise). A forged SAML assertion or a stolen OAuth refresh token authenticates a Snowflake user without any sign-in event on the IdP side; the Snowflake LOGIN_HISTORY entry is therefore the only signal. + + **Lag-tolerance is critical.** The latency profile this rule must + survive: + + | Source | Ingestion lag (typical / worst case) | + |--------|--------------------------------------| + | Snowflake `ACCOUNT_USAGE.LOGIN_HISTORY` | up to 2 hours (documented view latency) | + | Snowflake Trail `auth.snowflake.login` | seconds (where Trail enabled) | + | Okta System Log | 2–10 minutes typical, can spike during vendor incidents | + | Entra Sign-In Logs | 5–15 minutes typical, can spike further | + + The rule's `idp_correlation_window_minutes` parameter must be at least + the worst-case combined lag of both sides. The `lag_tolerant` flag, + when set, defers final firing until both sources have a known- + ingested watermark newer than the event time.
Without it, the rule + produces a stream of false positives during *healthy* IdP ingestion + delay. + + **Prefer the Trail-paired variant where Trail is enabled.** This + ACCOUNT_USAGE-shaped rule inherits the LOGIN_HISTORY view's source + latency of up to 2 hours. For real-time correlation against a fast IdP audit, the + Trail variant should be the primary; this rule is the fallback for + accounts without Trail. references: - https://docs.snowflake.com/en/user-guide/admin-security-fed-auth-overview - https://docs.snowflake.com/en/sql-reference/account-usage/login_history @@ -18,6 +43,14 @@ tags: - attack.credential_access - attack.t1606.002 # Forge Web Credentials: SAML Tokens - attack.lateral_movement +enrichment: + required: + - has_corresponding_idp_event + - idp_correlation_window_minutes + - lag_tolerant + - idp_audit_watermark_ingested_at + - snowflake_login_ingested_at + doc: ../ENRICHMENT.md logsource: product: snowflake service: login_history @@ -30,7 +63,14 @@ detection: is_success: true no_idp_correlate: has_corresponding_idp_event: false - condition: federated_login and no_idp_correlate + both_sides_known_ingested: + # With lag_tolerant true, the SIEM pipeline sets both_sources_caught_up + # only once both ingestion watermarks are past the event time, so the + # rule waits instead of firing during lag. With lag_tolerant false the + # gate is bypassed (Sigma ORs the items of a list of maps). + - lag_tolerant: false + - both_sources_caught_up: true + condition: federated_login and no_idp_correlate and both_sides_known_ingested fields: - event_timestamp - user_name @@ -38,8 +78,17 @@ fields: - client_ip - client_app_id - idp_correlation_window_minutes + - idp_audit_watermark_ingested_at + - snowflake_login_ingested_at falsepositives: - - IdP audit ingestion lag or outage. Treat the alert as a *suspect* - signal and require IdP audit availability before high-confidence - triage. + - IdP audit ingestion lag exceeding `idp_correlation_window_minutes`.
+ Tune the window up rather than suppress; the rule's + `both_sources_caught_up` gate prevents premature firing once the + SIEM pipeline correctly tracks ingestion watermarks. + - IdP outage that prevented log capture during the correlation + window. Treat the alert as a *suspect* signal and confirm IdP audit + availability during triage. + - Token-replay scenarios where the IdP audit *does* show a sign-in + but the session was hijacked downstream. Pair with session-IP + anomaly detection on the Snowflake side. level: high diff --git a/detection/snowflake/sigma/native_app_unexpected_version_bump.yml b/detection/snowflake/sigma/native_app_unexpected_version_bump.yml index 1a66d6d..7aa8600 100644 --- a/detection/snowflake/sigma/native_app_unexpected_version_bump.yml +++ b/detection/snowflake/sigma/native_app_unexpected_version_bump.yml @@ -19,6 +19,12 @@ tags: - attack.initial_access - attack.supply_chain - attack.t1195.002 +enrichment: + required: + - manifest_diff_added + - manifest_hash_previous + - manifest_hash_current + doc: ../ENRICHMENT.md logsource: product: snowflake service: application_history diff --git a/detection/snowflake/streaming-ingest/README.md b/detection/snowflake/streaming-ingest/README.md index 4aaaf2a..d708c06 100644 --- a/detection/snowflake/streaming-ingest/README.md +++ b/detection/snowflake/streaming-ingest/README.md @@ -57,6 +57,43 @@ does not double-publish events. In the Azure Function shape this lives in the Function App's storage account as a single blob; in the docker-compose shape it lives in a named volume. +Two guards prevent cursor corruption from concurrent writes: + +- **Function App layer.** The timer trigger runs on one instance at a time, + arbitrated by the runtime's blob lease, and does not fire while a prior + invocation is still running, so auto-recovery cannot stack a second poller. + The `host.json` `singleton` block tunes that lease; `useMonitor: true` + persists schedule occurrences across host restarts.
+- **Process layer.** The poller wraps cursor reads and writes in an + `fcntl.flock`-style advisory lock and writes via temp-file + atomic + rename. A SIGKILL between truncate and write would otherwise leave + the cursor empty; the rename pattern guarantees the file is either + the old timestamp or the new one, never partial. + +## Latency profile — how to measure rather than estimate + +The latency the pipeline must beat is the +`SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY` worst case, documented as up +to ~45 minutes. The pipeline's own latency has four components: + +| Stage | Typical | How to measure | +|-------|---------|----------------| +| Snowflake side (`INFORMATION_SCHEMA.QUERY_HISTORY_BY_USER`) | seconds | Run `SELECT MAX(END_TIME)` against the table function repeatedly; the result is the freshest visible `END_TIME` | +| Poll cadence | 0–60s | Configured at the timer schedule (`function.json`). Reduce if needed, though intervals shorter than 30s typically don't pay back the warehouse overhead. | +| Function execution + Event Hub publish | 0.5–3s | Application Insights `Requests` table; `duration` is the per-invocation wall clock | +| Log Analytics ingestion | 30s–5m | Sentinel custom-log ingestion latency is the long pole on the SIEM side; verify against `TimeGenerated` vs. `ingestion_time()` deltas on the destination table | + +The "end-to-end latency" claim must be validated per-deployment because +the Log Analytics ingestion stage is the dominant variable. The pipeline +is **not** a real-time stream in the millisecond sense — its design +target is "minutes, not 45 minutes," which is the regime relevant to +session containment on bulk-exfil chains. + +A simple test: emit a marker query against Snowflake from the lab +(`SELECT '__cursor_probe__'`) and measure the time until the matching +row appears in the destination Log Analytics table. Repeat at peak and +off-peak. Capture the 90th percentile; that is the number the detection +SLAs depend on.
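The probe procedure above ends in a percentile. A minimal Python sketch of that last step, assuming each `__cursor_probe__` marker has already been paired with two timestamps: when the marker query finished in Snowflake (for example its `END_TIME`) and when the matching row first became queryable in the destination table. `probe_latencies` and `nearest_rank_percentile` are hypothetical helpers, not part of `poller.py`, and the sample latencies are made up for illustration. Nearest-rank is used so the reported p90 is always a latency that was actually observed.

```python
import math
from datetime import datetime, timedelta, timezone


def probe_latencies(samples):
    """Turn (emitted_at, observed_at) pairs into end-to-end latencies in seconds.

    emitted_at:  when the marker query finished in Snowflake (e.g. its END_TIME).
    observed_at: when the matching row became queryable in the destination table.
    Both must be timezone-aware (or both naive) so the subtraction is valid.
    """
    latencies = []
    for emitted_at, observed_at in samples:
        delta = (observed_at - emitted_at).total_seconds()
        if delta < 0:
            # A negative latency means the two clocks disagree; fix the
            # timestamp sources before trusting any percentile.
            raise ValueError("observed_at precedes emitted_at")
        latencies.append(delta)
    return latencies


def nearest_rank_percentile(values, pct):
    """Smallest observed value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based nearest rank
    return ordered[rank - 1]


if __name__ == "__main__":
    base = datetime(2026, 5, 15, 12, 0, tzinfo=timezone.utc)
    # Hypothetical probe results: seconds from emit to arrival.
    observed = [45, 30, 120, 60, 90, 75, 50, 40, 300, 80]
    samples = [
        (base + timedelta(minutes=i), base + timedelta(minutes=i, seconds=s))
        for i, s in enumerate(observed)
    ]
    lat = probe_latencies(samples)
    print(f"p90 = {nearest_rank_percentile(lat, 90):.0f}s")  # prints: p90 = 120s
```

Run it once on the peak-hour probes and once on the off-peak probes and report both numbers. With fewer than ten probes in a window, the nearest-rank p90 is simply the maximum observation, so collect enough samples for the percentile to mean something.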
+ ## Containment note This is a defender-side tool. It does not require the `EXPLOIT_LAB_ACTIVE` diff --git a/detection/snowflake/streaming-ingest/azure-function/function.json b/detection/snowflake/streaming-ingest/azure-function/function.json index 71fdc34..999247c 100644 --- a/detection/snowflake/streaming-ingest/azure-function/function.json +++ b/detection/snowflake/streaming-ingest/azure-function/function.json @@ -1,6 +1,7 @@ { "scriptFile": "../poller/poller.py", "entryPoint": "main", + "_singleton_note": "The timer trigger runs on one instance at a time (arbitrated by the runtime's blob lease) and does not fire while an invocation is outstanding; the host.json singleton block tunes that lease and useMonitor persists schedule occurrences across restarts. Auto-recovery cannot stack a second instance that would race the cursor blob.", "bindings": [ { "name": "timer", diff --git a/detection/snowflake/streaming-ingest/azure-function/host.json b/detection/snowflake/streaming-ingest/azure-function/host.json index 1281c60..730d7cd 100644 --- a/detection/snowflake/streaming-ingest/azure-function/host.json +++ b/detection/snowflake/streaming-ingest/azure-function/host.json @@ -4,7 +4,8 @@ "id": "Microsoft.Azure.Functions.ExtensionBundle", "version": "[4.*, 5.0.0)" }, - "functionTimeout": "00:00:55", + "_functionTimeout_note": "Set to 4 minutes (Y1 max is 10) so a slow Snowflake response or transient Event Hub backpressure does not kill the invocation mid-publish.
The poll cadence is 60s (function.json schedule) and the function should normally complete in 1–3s; if a run overruns, the timer trigger skips ticks until it finishes instead of starting an overlapping invocation.", + "functionTimeout": "00:04:00", "logging": { "applicationInsights": { "samplingSettings": { @@ -12,5 +13,12 @@ "excludedTypes": "Request" } } + }, + "singleton": { + "lockPeriod": "00:00:55", + "listenerLockPeriod": "00:01:00", + "listenerLockRecoveryPollingInterval": "00:01:00", + "lockAcquisitionTimeout": "00:01:00", + "lockAcquisitionPollingInterval": "00:00:03" } } diff --git a/detection/snowflake/streaming-ingest/poller/poller.py b/detection/snowflake/streaming-ingest/poller/poller.py index acb9287..4ea99a3 100644 --- a/detection/snowflake/streaming-ingest/poller/poller.py +++ b/detection/snowflake/streaming-ingest/poller/poller.py @@ -70,18 +70,61 @@ def _project_row(row: dict[str, Any]) -> dict[str, Any]: # ── Cursor persistence ─────────────────────────────────────────────── +# +# The cursor is a single ISO-8601 timestamp. Two writers racing would +# corrupt it, so reads and writes go through an OS-level advisory lock. +# On Linux/macOS this is fcntl flock. On Windows, where fcntl is absent, +# _cursor_lock degrades to a no-op (no exclusive open is attempted). In the +# Function-App deployment the runtime's blob lease (see host.json) is the +# first line of defence and this in-process lock is the second. +# +# The blob-lease pattern for the Azure Function shape: enable +# `singleton` in host.json AND store the cursor in a singleton-locked +# blob; the runtime's lease ensures one instance at a time. The +# fcntl-based lock here covers local / docker-compose / non-Azure runs.
+ +import contextlib + +try: + import fcntl # type: ignore + _HAS_FCNTL = True +except ImportError: + _HAS_FCNTL = False + + +@contextlib.contextmanager +def _cursor_lock(lock_path: Path): + """Best-effort cross-platform advisory lock around the cursor file.""" + lock_path.parent.mkdir(parents=True, exist_ok=True) + fh = open(lock_path, "a+") + try: + if _HAS_FCNTL: + fcntl.flock(fh, fcntl.LOCK_EX) + yield + finally: + if _HAS_FCNTL: + fcntl.flock(fh, fcntl.LOCK_UN) + fh.close() + def _load_cursor(path: Path, lookback_minutes: int) -> datetime: - if path.exists(): - try: - return datetime.fromisoformat(path.read_text().strip()) - except ValueError: - pass - return datetime.now(tz=timezone.utc) - timedelta(minutes=lookback_minutes) + with _cursor_lock(path.with_suffix(path.suffix + ".lock")): + if path.exists(): + try: + return datetime.fromisoformat(path.read_text().strip()) + except ValueError: + pass + return datetime.now(tz=timezone.utc) - timedelta(minutes=lookback_minutes) def _save_cursor(path: Path, when: datetime) -> None: - path.write_text(when.isoformat()) + with _cursor_lock(path.with_suffix(path.suffix + ".lock")): + # Write to a temp file in the same dir, then atomic rename. + # Prevents partial-write corruption if the process is killed + # between truncate and write. + tmp = path.with_suffix(path.suffix + ".tmp") + tmp.write_text(when.isoformat()) + tmp.replace(path) # ── Snowflake source — production path ─────────────────────────────── diff --git a/docs/analysis/snowflake-healthcare-overlay-2026.md b/docs/analysis/snowflake-healthcare-overlay-2026.md new file mode 100644 index 0000000..3a1b080 --- /dev/null +++ b/docs/analysis/snowflake-healthcare-overlay-2026.md @@ -0,0 +1,415 @@ +# Snowflake Healthcare Overlay — 2026 + +Companion to +[`snowflake-platform-attack-surface-2026.md`](snowflake-platform-attack-surface-2026.md). 
+That document is platform-generic; this one re-frames the same attack +surface for organizations that hold protected health information (PHI), +payor data, or research datasets on Snowflake, and that are subject to +HIPAA, HITECH, the HHS-OCR breach-reporting regime, and (where +applicable) 42 CFR Part 2 and state-level health-privacy laws. + +This is not legal advice and not a compliance assessment. It is a +red-team companion: where do the platform chains documented next door +intersect with the controls and reporting obligations a healthcare +security program is already responsible for? + +--- + +## Why Snowflake is a Healthcare Crown Jewel + +Snowflake sits at the intersection of three healthcare data flows that +were historically siloed: + +1. **Clinical data** — EHR exports (Epic Clarity / Caboodle, Cerner + HealtheIntent, athenahealth), HL7 v2 / FHIR feeds, lab and imaging + metadata, clinical-trial CRF data. The PHI elements here cover the + full 18 HIPAA identifiers and are the most regulated. +2. **Claims and financial data** — payor claims (X12 837 / 835), + eligibility (270/271), formulary, pricing, member rosters, and + denial workflow data. PHI-bearing where it ties to individuals; + commercially sensitive even where de-identified. +3. **Operational / research data** — denormalized analytics marts, + research cohorts (sometimes with limited-data-set or + de-identified status), value-based-care performance, social + determinants of health (SDOH) layers, prior-authorization logs. + +A typical 2026 healthcare data platform on Snowflake has all three of +these flowing into a small number of curated databases, with Cortex +Analyst / Cortex Search sitting on top for analyst self-service and +agent workflows. The blast radius of an account compromise is +*every patient the organization has ever treated*, not just a single +table or system. 
+ +### Three implications for the threat model + +- **Re-identification of "de-identified" marts.** Many healthcare orgs + carry "de-identified" or "limited" datasets in Snowflake under the + assumption they are out of scope for HIPAA. Cortex Search and Cortex + Analyst, fed with auxiliary tables (zip → census, lab values → + cohort tags), make re-identification materially easier. A red-team + assessment must treat any dataset that includes residual quasi- + identifiers (DOB, 5-digit ZIP, race, rare diagnosis codes) as + PHI-equivalent for chain-impact scoring. +- **Multi-tenant payor / provider sharing.** Payor-provider data sharing + is increasingly implemented through Snowflake Data Sharing / + Replication (Chain G), not nightly SFTP. This makes Chain G's + source-side audit gap a primary HIPAA-§164.312(b) audit-control issue. +- **AI agent action on patient data.** Cortex Agents that wrap Cortex + Analyst and stored procedures can read, summarize, and (where wrapped + with a DML procedure) modify patient records. The "minimum necessary" + requirement of HIPAA §164.502(b) is not naturally enforced by Cortex + unless row-access policies and masking policies are correctly applied + at the table layer. + +--- + +## Chain-by-Chain PHI Impact Map + +For each chain documented in the platform companion, this section names +the PHI-bearing surface the chain reaches, the HIPAA control the chain +challenges, and the BAA / Business Associate considerations that change +how the chain plays out in a healthcare environment. + +The "Default residual" column is the residual risk *assuming Snowflake's +post-UNC5537 defaults are turned on at the customer side* (mandatory +MFA on humans, network policies on service users, default Trust Center +scanners enabled). It is *not* a measure of platform security with all +hardening turned on — it is a measure of "what does the average 2026 +healthcare Snowflake account actually look like." 
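The re-identification implication listed under "Three implications for the threat model" can be checked mechanically before a "de-identified" mart is scored. A common measure is k-anonymity: group the rows by their residual quasi-identifiers and take the size of the smallest group. The sketch below is a minimal illustration, assuming rows have already been pulled into Python dicts; the column names (`dob`, `zip5`, `race`, `dx`) and the sample cohort are hypothetical, not taken from any repo fixture.

```python
from collections import Counter
from typing import Iterable, Mapping, Sequence


def min_equivalence_class(rows: Iterable[Mapping[str, object]],
                          quasi_identifiers: Sequence[str]) -> int:
    """Return k, the size of the smallest group of rows that share the same
    quasi-identifier values. k == 1 means at least one row is unique."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    if not groups:
        raise ValueError("no rows to evaluate")
    return min(groups.values())


# Hypothetical "de-identified" cohort extract.
cohort = [
    {"dob": "1961-03-02", "zip5": "60614", "race": "A", "dx": "E11.9"},
    {"dob": "1961-03-02", "zip5": "60614", "race": "A", "dx": "I10"},
    {"dob": "1975-11-19", "zip5": "60657", "race": "B", "dx": "C91.00"},
]

k = min_equivalence_class(cohort, ["dob", "zip5", "race"])
print(k)  # 1: the third row is unique on DOB + ZIP + race
```

A small k on a handful of quasi-identifiers is the signal to treat the dataset as PHI-equivalent for chain-impact scoring; the threshold an organization uses is a policy decision, not something this check decides.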
+ +### Chain A — Credential theft to bulk exfil + +- **PHI surface:** Whatever the compromised user can `SELECT` — + typically the analyst's curated patient mart, claims fact tables, or + the EHR-clarity export. In healthcare a single user role often grants + read on millions of patient records. +- **HIPAA control challenged:** §164.312(a)(1) Access Control, §164.308(a)(5)(ii)(D) + Password Management, §164.312(b) Audit Controls. +- **BAA consideration:** Snowflake is the BA. A breach here is the + customer's reporting obligation (Snowflake notifies the customer; the + customer notifies HHS-OCR within 60 days of discovery for a breach + affecting 500 or more individuals). Speed of customer-side detection + drives whether the notification is within window. +- **Default residual:** **High.** Service users (dbt, Airflow, BI + connectors) running on key-pair auth without network policies remain + the most common gap. Human users are largely covered by the April + 2025 enforcement. +- **Healthcare-specific tuning for the + [bulk_exfil_baseline](../../detection/snowflake/sigma/bulk_exfil_baseline.yml) + rule:** *per-role* baselining matters more than *per-user* — a + research role legitimately exports cohort data weekly; an + EHR-integration service account never should. The watchlist of + approved external stages must include the cloud-storage targets the + org uses for downstream consumers (research collaborators, value- + based-care partners) and *exclude* any stage created within the + monitoring window. + +### Chain B — Cortex Code indirect prompt injection (CVE-2026-6442) + +- **PHI surface:** The cached Snowflake token in the developer's + `~/.snowsql/` or `~/.snowflake/` cache plus whatever that user can + `SELECT`. For a healthcare data engineer this is typically the + full warehouse.
+- **HIPAA control challenged:** §164.308(a)(5)(ii)(B) Protection from + Malicious Software, §164.312(a)(2)(i) Unique User Identification + (the attack runs *as* the user — the audit trail blames the user, not + the agent). +- **BAA consideration:** The Cortex Code CLI runs on the *customer's* + developer endpoint, outside the Snowflake service. Snowflake's BAA + covers the service side, not the local agent's command-execution + surface. CVE-2026-6442 was fixed on 2026-02-28; how long exposure + persists after that date depends on the customer's endpoint patching + SLA, not on Snowflake. +- **Default residual:** **High until the version pin is enforced + across all developer endpoints.** Detection via the + [cortex_code_pre_1_0_25](../../detection/snowflake/sigma/cortex_code_pre_1_0_25.yml) + rule is endpoint-side, not Snowflake-side. +- **Healthcare-specific note:** Many healthcare data teams keep + research-only credentials cached on developer laptops. Cortex Code's + ability to read those caches means a "review my repo" prompt with an + embedded injection can lift research-data tokens that the security + program may not have inventoried. + +### Chain C — Native App Marketplace supply-chain + +- **PHI surface:** Any tables the consumer-side grant ACL exposed to + the installed application. In healthcare, this often includes + curated patient marts because the apps in question (population- + health analytics, payor-provider quality reporting, AI/ML inference + apps) require it. +- **HIPAA control challenged:** §164.314(a) Business Associate Contract + — a Native App provider whose app receives PHI is a Business + Associate of the customer in its own right. The customer's BAA + with Snowflake does not transitively cover the provider; a separate + BAA with the provider is required where the app receives PHI. +- **BAA consideration:** Auto-update of a Native App is the hardest + case. The customer signed a BAA with the provider; the provider + ships a new version that materially changes data access; the + customer's consent was given to the prior manifest. The Snowflake
The Snowflake + NAAAPS pipeline catches malware and CVE-bearing dependencies but + does not arbitrate BAA-scope changes. +- **Default residual:** **Medium-high.** Many Native App listings in + the Healthcare-and-Life-Sciences vertical request broad grants + (read on EHR-mart, write on `ml_predictions`), and consumers + accept the auto-update default. +- **Healthcare-specific tuning:** the + [native_app_unexpected_version_bump](../../detection/snowflake/sigma/native_app_unexpected_version_bump.yml) + rule should be paired with a *manifest-delta* introspection step + (see iter-5 work) that flags any new EXTERNAL ACCESS INTEGRATION, + new container image, or grant-scope change between consumer- + approved version and current version. + +### Chain D — Federated-IdP compromise + +- **PHI surface:** Whatever role(s) the targeted user holds in the + IdP-to-Snowflake mapping. Frequently includes + `ACCOUNTADMIN`-class roles for the data platform team. +- **HIPAA control challenged:** §164.312(d) Person or Entity + Authentication — when the IdP is forged, the Snowflake-side audit + trail still records a successful authentication. +- **BAA consideration:** The IdP (Okta, Entra) is a separate BA where + it handles PHI as part of the workflow (most IdPs argue they do not, + but the *credentials* flowing through them reach PHI). +- **Default residual:** **High** where Golden-SAML-class attacks + succeed against the IdP. The Snowflake side has no visibility into + the IdP-side compromise except via the cross-system correlation that + the [federated_login_anomaly](../../detection/snowflake/sigma/federated_login_anomaly.yml) + rule attempts (and which requires both surfaces ingested). +- **Healthcare-specific tuning:** The rule's IdP-correlate field must + be populated from the same source-of-truth the org uses for OCR + audit production. 
If the IdP audit retention is shorter than the + Snowflake LOGIN_HISTORY retention (common: Okta's System Log keeps + 90 days and Entra ID sign-in logs keep 30 days on P1/P2 licensing, while + ACCOUNT_USAGE.LOGIN_HISTORY keeps 365 days), correlation gaps appear at + the boundary, and a post-incident reconstruction may be incomplete. + +### Chain E — Storage Integration cross-cloud pivot + +- **PHI surface:** Any cloud-storage location the integration's + `storage_allowed_locations` reaches. In healthcare, this commonly + spans EHR archive buckets, claims-data lakes, and imaging + repositories. +- **HIPAA control challenged:** §164.312(e)(1) Transmission Security + — the integration's network boundary is the customer's; egress to + an attacker-controlled bucket is in scope. +- **BAA consideration:** The storage account is the customer's + responsibility (CSP-side BAA — AWS, Azure, GCP). Snowflake does not + arbitrate the destination. +- **Default residual:** **Medium-high.** Wildcard + `storage_allowed_locations` (`s3://*/`) is a documented anti-pattern; + legacy integrations still exhibit it. +- **Healthcare-specific note:** Multi-region replication of imaging + data (DICOM PACS-to-cloud) often creates broad allowlists by + necessity; pair with bucket-policy-side controls. + +### Chain F — Key-pair JWT auth abuse + +- **PHI surface:** Identical to Chain A, but MFA offers no + defense: the JWT is signed offline with the private key, so + possession of the key alone is enough to authenticate. +- **HIPAA control challenged:** §164.308(a)(5)(ii)(D) Password + Management (key-pair is the credential), §164.312(c)(1) Integrity. +- **BAA consideration:** None novel. +- **Default residual:** **High** where the key-pair user has no + network policy. This is Snowflake's own top callout. +- **Healthcare-specific note:** dbt Cloud, Fivetran, Matillion, and + similar integrations that pull HL7/FHIR feeds into Snowflake almost + always run as key-pair service users. Inventory those first.
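The Chain F advice to inventory key-pair service users first can be scripted against an exported user list. The sketch below is a minimal illustration, assuming user metadata has already been collected into dicts; the field names (`name`, `type`, `has_rsa_public_key`, `network_policy`) are assumptions loosely modelled on `SHOW USERS` and user-parameter output, not a documented Snowflake schema. It only checks for a *user-level* network policy; an account-level policy also applies to every user, so treat the output as a review queue rather than a verdict.

```python
from dataclasses import dataclass
from typing import Iterable, Mapping


@dataclass(frozen=True)
class KeyPairGap:
    name: str
    user_type: str


def keypair_users_without_policy(users: Iterable[Mapping[str, object]]) -> list[KeyPairGap]:
    """Flag users that have an RSA public key but no user-level network policy.

    Field names are assumptions about an exported inventory, not a Snowflake
    schema. SERVICE users sort first because they are the Chain F priority.
    """
    gaps = [
        KeyPairGap(str(u["name"]), str(u.get("type") or "UNKNOWN"))
        for u in users
        if u.get("has_rsa_public_key") and not u.get("network_policy")
    ]
    # (False, ...) sorts before (True, ...), so SERVICE users come first.
    return sorted(gaps, key=lambda g: (g.user_type != "SERVICE", g.name))


# Hypothetical inventory rows.
inventory = [
    {"name": "FIVETRAN_SVC", "type": "SERVICE", "has_rsa_public_key": True, "network_policy": None},
    {"name": "DBT_SVC", "type": "SERVICE", "has_rsa_public_key": True, "network_policy": "DBT_CLOUD_EGRESS"},
    {"name": "ANALYST_1", "type": "PERSON", "has_rsa_public_key": True, "network_policy": ""},
    {"name": "ANALYST_2", "type": "PERSON", "has_rsa_public_key": False, "network_policy": None},
]

for gap in keypair_users_without_policy(inventory):
    print(gap.name, gap.user_type)
```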
+ +### Chain G — Direct Share / Replication exfil + +- **PHI surface:** The full content of a database designated as a + share's *secure object*. Healthcare orgs frequently share patient + cohorts with research collaborators or downstream payors using + this feature. +- **HIPAA control challenged:** §164.312(b) Audit Controls — the + source-side `QUERY_HISTORY` does not log a `SELECT`/`COPY` for the + replicated data, so a routine audit of "who read this table when" + misses the share-mediated reads. This is the most consequential + audit-trail gap on the platform for healthcare reporting. +- **BAA consideration:** Each consumer of a share is, by virtue of + receiving PHI, a Business Associate. If a share is configured to a + consumer the org does not have a BAA with, the gap is *legal*, not + just technical. +- **Default residual:** **Medium-high.** The chain depends on an + attacker reaching `ACCOUNTADMIN` or a role with `OWNERSHIP` on the + share — but once they do, the data motion is silent on the source + audit log. +- **Healthcare-specific note:** OCR breach reconstruction depends on + showing who accessed what when. Replication-mediated exfil that + doesn't appear in source-side audit forces the org to either + reconstruct from the *consumer* side (which they may not own) or + treat the entire shared dataset as breached. Build the + consumer-side audit-acquisition step into the incident-response + runbook *before* it's needed. + +### Chain H — SPCS over-broad EAI egress + +- **PHI surface:** Any data the SPCS service handles. Healthcare + Cortex / ML workloads in SPCS often handle PHI directly (model + inference on patient records, NLP on clinical notes). +- **HIPAA control challenged:** §164.312(e)(1) Transmission Security, + §164.308(a)(1)(ii)(A) Risk Analysis (the EAI is the documented + egress; if the documented network rule is permissive, the risk + analysis is wrong). 
+- **BAA consideration:** None novel (SPCS is inside the Snowflake + service boundary; the BAA covers it). +- **Default residual:** **Medium.** New SPCS deployments are + increasingly using narrower EAI scopes; legacy ones often have + wildcard rules. +- **Healthcare-specific note:** Any SPCS service that invokes external + LLM endpoints (OpenAI, Anthropic, Bedrock) is sending PHI out of + the Snowflake boundary — see Cortex section below. + +### Chain I — Cortex Agent MCP poisoning + +- **PHI surface:** Whatever the agent is allowed to query. In a + healthcare population-health flow, this is the full curated patient + mart. +- **HIPAA control challenged:** §164.502(b) Minimum Necessary — an + agent that pulls more rows than its question required (because an + injection instructed it to) violates minimum-necessary by design, + and the audit trail attributes the read to the prompting user. +- **BAA consideration:** Cortex's third-party model providers + (Anthropic, Azure OpenAI). Snowflake's documentation on what + payload reaches those providers and their retention is sparse + enough that healthcare counsel will want to ask. +- **Default residual:** **High** because Cortex Guardrails (the + vendor first-party defense) was GA only in early 2026 and customer + adoption is uneven; the chain assumes a *correct* RBAC model + underneath the agent, which is the harder half of the problem in + any real healthcare deployment. +- **Healthcare-specific note:** Any Cortex Search index built over + clinical notes, prior-auth letters, or appeal documents *includes + the documents themselves in retrieval context*. An injection + embedded in a single patient's denial letter can steer agent + behavior for any analyst who later queries near it. + +### Chain J — Partner-integration credential replay + +- **PHI surface:** Whatever the partner-held credential can read. 
+ Common healthcare partners (Fivetran, Matillion, dbt Cloud, + Snowflake-aware BI vendors) often hold ACCOUNTADMIN-adjacent + service users. +- **HIPAA control challenged:** §164.308(b) Business Associate + Contracts (the partner is a BA of the customer), §164.312(a)(1) Access Control. +- **BAA consideration:** A compromise at the partner triggers the + partner's duty under its BAA to notify the customer without + unreasonable delay and within 60 days of discovery (§164.410); + the customer is the OCR-side reporting party. If the partner acts as + the customer's agent, the partner's discovery is imputed to the + customer, so the customer's own 60-day clock also starts at the + partner's discovery. +- **Default residual:** **Medium-high.** The partner-side compromise + surface is outside the customer's network policy. +- **Healthcare-specific tuning:** the + [partner_integration_credential_replay](../../tools/cloud-identity/snowflake/detection/sigma/partner_integration_credential_replay.yml) + rule's allowlist of partner egress CIDRs is more brittle in + healthcare than in tech — many healthcare-vertical SaaS providers + do not publish stable CIDRs. + +--- + +## Cortex Over Patient Data — Specific Questions + +These are questions the assessment should put on the table for any +healthcare org running Cortex over patient data, regardless of which +chains are observed exercising them. + +1. **Boundary leakage.** Cortex final-response generation reaches out + of the Snowflake boundary to a third-party model provider. What + payload is sent? What is the provider's retention? Is the BAA with + the third-party model provider in place, and does it cover the + *type* of PHI that flows (clinical notes vs. claims codes vs. + structured demographics)? +2. **Cortex Search over clinical free text.** Any indexed corpus of + clinical notes, denial letters, appeal documents, or patient-portal + messages is both a data-leak surface *and* an injection-payload + delivery surface. Treat the index itself as PHI-bearing. +3. **Cortex Analyst's semantic model as policy.** The semantic model + defines what an analyst-facing agent can query.
In healthcare it is + effectively a minimum-necessary policy expressed as + YAML/JSON. Review the semantic model as a security artifact, not + just an analytics one. +4. **Cortex Agents with DML tools.** An agent wrapping Cortex Analyst + (read) with a stored procedure that does DML (write) breaks the + "Analyst is SELECT-only" guarantee. Inventory every agent's tool + set; flag any that combine a read tool over PHI with a write tool + anywhere. +5. **Guardrails policy applicability.** Snowflake Cortex Guardrails + policies are configurable. The default policy is not tuned for + healthcare-specific abuse (PHI extraction prompts, cohort fishing, + re-identification attempts). The + [Guardrails harness](../../tools/llm-attacks/cortex/guardrails-harness/) + in this repo includes a corpus tier that can be extended with + healthcare-specific payloads. + +--- + +## Audit-Retention Sufficiency for OCR Reconstruction + +HIPAA requires documentation to be retained for six years +(§164.530(j); §164.316(b)(2) for Security Rule documentation), and +many programs apply the same six-year window to the audit logs needed +for OCR breach reconstruction. The Snowflake-side +audit retentions that matter: + +| Surface | Retention | Notes | +|---------|-----------|-------| +| `SNOWFLAKE.ACCOUNT_USAGE.LOGIN_HISTORY` | 365 days | Insufficient for the 6-year OCR window. | +| `SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY` | 365 days | Insufficient for the 6-year OCR window. | +| Snowflake Trail (event stream) | Customer-controlled (sink) | Becomes whatever retention the SIEM / data lake gives it. | +| Streaming-ingest `INFORMATION_SCHEMA.QUERY_HISTORY()` polling | Customer-controlled (sink) | Same — retention is the downstream's, not Snowflake's; the table function itself only reaches back 7 days. | + +The practical implication: **a healthcare Snowflake deployment cannot +rely on Snowflake's first-party retention for OCR-grade reconstruction +of a breach older than a year.** The org's SIEM, data lake, or +dedicated audit warehouse must hold a copy.
The +[streaming-ingest pipeline](../../detection/snowflake/streaming-ingest/) +in this repo is the producer side of that pipeline; the org owns the +sink-side retention configuration. + +Two specific gaps to plan around: + +- **Chain G's source-side blind spot.** Replication / Direct Share + data motion that does not appear in source-side `QUERY_HISTORY` + also does not appear in the streamed projection of it. The + consumer-side audit (which the consumer owns) is the only place + the read shows up. +- **Cortex Agent step traces.** As of mid-2026, Cortex Agent + step-level traces are exposed via newer `CORTEX_AGENT_HISTORY`- + style views (where available) and the Trail event stream where + Trail is enabled. Where neither is available the org should + treat Cortex Agent activity as audit-thin and gate the agent's + PHI access at the row-access-policy layer instead. + +--- + +## What to Add to the Engagement Runbook + +These are the items a healthcare-specific Snowflake engagement should +add over a generic platform assessment, regardless of which chains are +in scope: + +- **Inventory of PHI-bearing surfaces.** Per-database, per-schema, a + classification (PHI / LDS / De-id / Non-PHI) signed off by the + privacy office. Without this, chain impact scoring is guesswork. +- **Per-role minimum-necessary review.** For every role that has + `SELECT` on a PHI-bearing schema, confirm that the role's user + population, IdP-group mapping, and use case are aligned with + minimum-necessary. +- **BAA inventory cross-referenced against installed Native Apps and + partner integrations.** Every consumer of a share, every Native App + receiving grants on PHI-bearing schemas, every partner SaaS holding + a Snowflake credential, should have a corresponding BAA. +- **Cortex agent semantic-model and tool-set review.** As above — the + semantic model is policy, the tool set is the action surface. 
+- **OCR reconstruction tabletop.** Pick a date 18 months back; can + the org produce a full audit trail of who accessed PHI table X + between dates Y and Z? If the answer requires data the org does + not have, that gap is a §164.312(b) finding regardless of whether + any chain has been exercised. +- **Incident-response runbook addition: cross-account share + acquisition.** For Chain G, the consumer-side audit is the only + source. Pre-build the legal and technical path to acquire it. + +--- + +## Cross-References + +- Platform companion: [snowflake-platform-attack-surface-2026.md](snowflake-platform-attack-surface-2026.md) +- Cross-platform comparison: [databricks-vs-snowflake-platform-comparison.md](databricks-vs-snowflake-platform-comparison.md) +- Detection index: [`detection/snowflake/README.md`](../../detection/snowflake/README.md) +- Cortex Guardrails harness: [`tools/llm-attacks/cortex/guardrails-harness/`](../../tools/llm-attacks/cortex/guardrails-harness/) +- Web report: [`reports/snowflake-platform-assessment/`](../../reports/snowflake-platform-assessment/) diff --git a/docs/analysis/snowflake-platform-attack-surface-2026.md b/docs/analysis/snowflake-platform-attack-surface-2026.md index 4b37037..7a08d36 100644 --- a/docs/analysis/snowflake-platform-attack-surface-2026.md +++ b/docs/analysis/snowflake-platform-attack-surface-2026.md @@ -813,6 +813,161 @@ emits a remediation-prioritized report. Lab validation in [`tools/cloud-identity/snowflake/lab-validation/partner_integration_baseline.sql`](../../tools/cloud-identity/snowflake/lab-validation/partner_integration_baseline.sql) captures the baseline source-IP profile per partner user. +### Chain K — Polaris / Iceberg Catalog Abuse + +Snowflake's Open Catalog (Polaris) and the broader Iceberg REST +catalog ecosystem expand the platform's attack surface in directions +the original chain inventory did not cover. 
+ +The Iceberg specification layers a table's identity through a chain of +indirections: the catalog holds a pointer to the current `metadata.json`; that +file points at the current snapshot's manifest list; the manifest list points at +manifest files, which in turn point at the data files. A defender enumerating "tables in the warehouse" sees the +top-level entry; the *content* lives several layers down. Three +exploitable conditions follow: + +1. **Catalog-credential leakage.** The catalog's REST endpoint is + reached via OAuth client credentials or PAT-style tokens. A token + stolen from a host that runs Spark, Trino, or Snowflake-side + Iceberg integrations grants read on every table the credential can + list — and unlike Snowflake-native auth, the catalog credential + does not inherit the network policy on the Snowflake user. +2. **Metadata-pointer poisoning.** A role with `WRITE_METADATA` on a + table can update the pointer to a metadata file the attacker + controls. Subsequent reads of "the same table" return attacker- + staged data. The table name does not change; no + `CREATE`/`DROP`/`RENAME` SQL appears in `QUERY_HISTORY`. +3. **External-table pivot.** Snowflake Iceberg tables created from + an external catalog inherit the catalog's storage scope. An + attacker holding the catalog credential can register a Snowflake + external Iceberg table that points at any storage location the + catalog reaches — sidestepping the customer's STORAGE INTEGRATION + allowlists. + +**Detection counterpart**: alert on any new or altered external +Iceberg table whose metadata or storage location is outside the +documented catalog base prefix. The +[`iceberg_table_outside_catalog_base.yml`](../../tools/lateral-movement/snowflake-pivot/detection/sigma/iceberg_table_outside_catalog_base.yml) +rule encodes this. Pair with catalog-side audit if available — the +metadata-pointer write is the only signal of condition 2, and it lives in +the catalog's audit, not Snowflake's.
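The "outside the documented catalog base prefix" test is easy to get wrong with a plain string `startswith`: `s3://lake/base` is a string prefix of `s3://lake/base-staging/`. The sketch below shows a segment-aware comparison, assuming locations are URIs of the form `scheme://bucket/key`. The function names and example URIs are hypothetical; the repo's rule and `iceberg_catalog_pivot.py` may implement the check differently. It resolves `..` conservatively: object stores such as S3 store keys literally, but a location whose normalized path escapes the base is flagged rather than trusted.

```python
from posixpath import normpath
from urllib.parse import urlsplit


def _split(uri: str) -> tuple[str, str, tuple[str, ...]]:
    """Split 'scheme://bucket/a/b' into ('scheme', 'bucket', ('a', 'b'))."""
    parts = urlsplit(uri)
    # normpath collapses '//' and '.', and resolves '..', so a traversal
    # such as base/../../other cannot masquerade as a child of base.
    path = normpath("/" + parts.path.lstrip("/"))
    segments = tuple(s for s in path.split("/") if s)
    return parts.scheme.lower(), parts.netloc.lower(), segments


def is_under_base(location: str, base: str) -> bool:
    """True if `location` equals `base` or sits below it, comparing whole
    path segments instead of raw string prefixes."""
    l_scheme, l_bucket, l_segs = _split(location)
    b_scheme, b_bucket, b_segs = _split(base)
    same_root = (l_scheme, l_bucket) == (b_scheme, b_bucket)
    return same_root and l_segs[: len(b_segs)] == b_segs


BASE = "s3://hc-lake/iceberg/warehouse"  # hypothetical catalog base

for loc in [
    "s3://hc-lake/iceberg/warehouse/claims/metadata/00003.metadata.json",
    "s3://hc-lake/iceberg/warehouse-staging/claims/metadata.json",
    "s3://hc-lake/iceberg/warehouse/../../attacker/m.json",
]:
    # Naive startswith() accepts all three; the segment-aware check
    # accepts only the first.
    print(loc.startswith(BASE), is_under_base(loc, BASE), loc)
```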
+ +**Tooling**: +[`tools/lateral-movement/snowflake-pivot/iceberg_catalog_pivot.py`](../../tools/lateral-movement/snowflake-pivot/iceberg_catalog_pivot.py) +enumerates the lab catalog, joins against Snowflake's external table +registrations, and reports tables whose metadata pointer was written +outside the approved writer set or whose storage URI falls outside +the catalog base. + +### Chain L — External OAuth Scope Drift + +Snowflake's external OAuth integration with Entra ID, Okta, +PingFederate, or Auth0 maps IdP-issued tokens to Snowflake roles. The +mapping is multi-component: the integration's +`external_oauth_token_user_mapping_claim` names which IdP claim +identifies the Snowflake user; the integration's role-claim mapping +determines which Snowflake role is granted; the IdP-side consent +determines which scopes a client app may request and which users have +granted them. + +Three drift conditions move an integration's effective authority +silently: + +1. **Scope creep at the IdP.** A consent attack against a tenant admin + — or an over-broad client-app registration — grants Snowflake- + facing scopes the customer did not intend. The Snowflake-side + audit shows no change; the IdP-side audit may or may not. +2. **Audience reuse.** A token issued for one Snowflake integration + is replayed against another integration that shares the same + audience. The two integrations' role mappings differ; the attacker + collects the union. +3. **Stale admin mapping.** The integration's `default_role` was set + when the IdP had narrow scopes. The IdP later added broader ones. + The Snowflake side did not change; the effective authority did. + +**Detection counterpart**: +[`oauth_integration_scope_drift.yml`](../../tools/cloud-identity/snowflake/detection/sigma/oauth_integration_scope_drift.yml) +fires on `ALTER INTEGRATION … SET EXTERNAL_OAUTH_…` events that +reach an admin-class role, and (with the IdP-consent enrichment) on +silent IdP-only widening. 
The structural control is at the IdP: +keep client-app scope grants minimal and audit consent expansions. + +**Tooling**: +[`tools/cloud-identity/snowflake/oauth_scope_audit.py`](../../tools/cloud-identity/snowflake/oauth_scope_audit.py) +joins the Snowflake integration inventory against an IdP-consent +fixture and reports the three drift classes. + +### Chain M — UDF EXTERNAL ACCESS INTEGRATION Breakout + +Snowflake's Python and Scala UDFs run sandboxed with no network +access by default. The `EXTERNAL_ACCESS_INTEGRATIONS = ()` +clause is the documented exception: a UDF declared with an EAI can +call out to whatever the EAI's referenced NETWORK RULE permits. + +Chain H covers the SPCS-service variant of this primitive. Chain M is +the UDF variant — different threat geometry: + +- A UDF is invoked as part of normal query execution. Every analyst + who runs `SELECT my_udf(col) FROM patient_table` triggers the + UDF's network egress. The attack surface is the entire user + population of the UDF, not just a specific service account. +- The UDF's egress identity is the function *owner*, not the + invoking session. A misconfigured EAI on a UDF callable by `PUBLIC` + is a sanctioned exfil channel that fires under any analyst's + session and is attributed in audit to the owner. +- `QUERY_HISTORY` shows the UDF invocation but not the destination + of the network call. The audit gap mirrors Chain G's source-side + blindness on data motion. + +**Detection counterpart**: +[`udf_with_eai_invocation.yml`](../../tools/lateral-movement/snowflake-pivot/detection/sigma/udf_with_eai_invocation.yml) +joins `QUERY_HISTORY` to `ACCOUNT_USAGE.FUNCTIONS` and +`ACCOUNT_USAGE.INTEGRATIONS` and fires when a UDF with an over-broad +EAI is invoked by a role that is not the function owner. The +compensating control is the compute-pool-side network egress log — +QUERY_HISTORY alone cannot resolve the destination. 
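The three rule shapes the Chain M tooling exercises (`deny-by-default` / `scoped` / `wildcard`) can be approximated with a small classifier over a network rule's host list, combined with the rule's "invoked by a role that is not the function owner" condition. The sketch below is a rough illustration only, assuming host entries are strings such as `api.example.com` or `api.example.com:443`. Treating `0.0.0.0` or any `*` in a host as a wildcard is an assumption about what counts as over-broad, and the function names are hypothetical rather than the tool's actual API.

```python
from typing import Sequence


def _host(entry: str) -> str:
    """'api.example.com:443' -> 'api.example.com'; a bare host is returned as-is."""
    entry = entry.strip()
    host, sep, port = entry.rpartition(":")
    return host if sep and port.isdigit() else entry


def classify_rule(value_list: Sequence[str]) -> str:
    """Map an egress rule's host list onto the three lab shapes."""
    hosts = [_host(v).lower() for v in value_list if v.strip()]
    if not hosts:
        return "deny-by-default"
    # Assumption: '0.0.0.0' or any '*' in a host means "reach anywhere".
    if any(h == "0.0.0.0" or "*" in h for h in hosts):
        return "wildcard"
    return "scoped"


def should_alert(rule_shape: str, invoking_role: str, owner_role: str) -> bool:
    """Over-broad EAI invoked by a role other than the function owner."""
    return rule_shape == "wildcard" and invoking_role.upper() != owner_role.upper()


print(classify_rule([]))
print(classify_rule(["api.partner.example:443"]))
print(classify_rule(["0.0.0.0:443", "0.0.0.0:80"]))
print(should_alert("wildcard", "ANALYST", "ML_ENG"))
```

The role comparison is case-insensitive because unquoted Snowflake identifiers are stored upper-case; quoted, mixed-case role names would need an exact comparison instead.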
+ +**Tooling**: +[`tools/lateral-movement/snowflake-pivot/udf_eai_egress.py`](../../tools/lateral-movement/snowflake-pivot/udf_eai_egress.py) +sets up an EAI + UDF with one of three rule shapes +(`deny-by-default` / `scoped` / `wildcard`), invokes it against the +lab fixture, and reads back QUERY_HISTORY plus the modeled egress +log to show the visibility-vs-impact matrix. + +### SPCS Base-Image Supply Chain (Chain H extension) + +The Chain H tooling covers SPCS network egress; this section covers +the orthogonal supply-chain surface — the container images SPCS +services run. + +Snowflake's documented Native App + SPCS review covers provider-side +listing posture. Consumer-side image-source posture is the consumer's +responsibility. Three failure modes are common: + +1. **Tag pinning instead of digest pinning.** Spec references + `python:3.11-slim`, not `python@sha256:…`. The tag is mutable; + the base image content can change between deployments. +2. **Image source outside the customer's approved registry list.** + Public registries (`docker.io`, `ghcr.io`) are reachable by + default; nothing on the Snowflake side enforces a private- + registry-only policy. +3. **Stale base.** A base image not refreshed within an SLA window + widens the exposure window to any post-build CVE that lands on + the base. Not a direct attack signal on its own, but combined with + #1 it makes exploiting a known base-image CVE more reliable. + +**Detection counterpart**: +[`spcs_image_unpinned_or_external.yml`](../../tools/lateral-movement/snowflake-pivot/detection/sigma/spcs_image_unpinned_or_external.yml) +fires on service create/alter where `image_has_digest_pin = false` +or `image_registry_in_approved_set = false`. The structural control +is an admission policy on the SPCS deployment pipeline; the rule is +the detection-engineering equivalent.
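The two boolean fields the rule keys on (`image_has_digest_pin`, `image_registry_in_approved_set`) can be derived from the image reference string alone. The sketch below is a minimal illustration, assuming references follow the usual `[registry/]repository[:tag][@sha256:digest]` form and that leading-slash paths (`/db/schema/repo/image:tag`) point into the account's own image repository. The `LOCAL_REPO` sentinel and the approved registry host are hypothetical; unqualified names are treated as Docker Hub (`docker.io`), following Docker's naming convention.

```python
import re
from typing import Collection

# A digest pin: '@sha256:' followed by 64 lowercase hex characters.
_DIGEST_PIN = re.compile(r"@sha256:[0-9a-f]{64}$")
# Hypothetical sentinel for paths into the account's own image repository.
LOCAL_REPO = "snowflake-image-repository"


def registry_of(image: str) -> str:
    """Best-guess registry host for an image reference."""
    if image.startswith("/"):
        return LOCAL_REPO  # e.g. /db/schema/repo/image:tag
    first, sep, _ = image.partition("/")
    # Docker convention: the first component is a registry host only if it
    # looks like one ('.' or ':' in it, or exactly 'localhost').
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first.lower()
    return "docker.io"  # unqualified names resolve to Docker Hub


def classify_image(image: str, approved: Collection[str]) -> dict[str, bool]:
    return {
        "image_has_digest_pin": bool(_DIGEST_PIN.search(image)),
        "image_registry_in_approved_set": registry_of(image) in approved,
    }


APPROVED = {LOCAL_REPO, "registry.hc-internal.example"}  # hypothetical
print(classify_image("python:3.11-slim", APPROVED))
print(classify_image("ghcr.io/acme/etl:2.4", APPROVED))
```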
+
+**Tooling**:
+[`tools/lateral-movement/snowflake-pivot/spcs_base_image_probe.py`](../../tools/lateral-movement/snowflake-pivot/spcs_base_image_probe.py)
+walks `SHOW SERVICES` for the lab account and classifies each
+service's image reference across the three modes.
+
 ---
 
 ## Reuse from Existing Repo Tooling
diff --git a/reports/snowflake-platform-assessment/attack-chains.html b/reports/snowflake-platform-assessment/attack-chains.html
index 4f6b55a..0857e44 100644
--- a/reports/snowflake-platform-assessment/attack-chains.html
+++ b/reports/snowflake-platform-assessment/attack-chains.html
@@ -271,6 +271,94 @@

Attack chains

+ +
+
Chain K — Polaris / Iceberg catalog abuse
+
+

+ Snowflake's Open Catalog (Polaris) and the broader Iceberg REST catalog ecosystem layer table identity + through metadata pointers. The catalog holds a pointer to a metadata.json; that file points + at each snapshot's manifest list; the manifest list points at manifest files, which in turn point at the data files. A defender enumerating tables sees the + top-level entry; the content lives several layers down. Three exploitable conditions follow.

+
    +
  1. Catalog-credential leakage. The catalog's REST endpoint is reached via OAuth client credentials or PAT-style tokens; the credential does not inherit the network policy on the Snowflake user.
  2. Metadata-pointer poisoning. A role with WRITE_METADATA on a table updates the pointer to attacker-controlled metadata. The table name is unchanged; no CREATE / RENAME appears in QUERY_HISTORY.
  3. External-table pivot. An attacker holding the catalog credential registers a Snowflake external Iceberg table pointing at storage the customer's STORAGE INTEGRATION allowlist does not cover.
+
+ Detection: alert on external Iceberg table creation or refresh whose metadata or storage + location falls outside the catalog's documented base prefix. Pair with catalog-side audit if available — + the metadata-pointer write is only visible there. Tooling at + tools/lateral-movement/snowflake-pivot/iceberg_catalog_pivot.py. +
+
+
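The prefix comparison at the heart of the Chain K detection is easy to get subtly wrong: a plain string `startswith` lets `s3://lake-prod/warehouse-evil/` pass for a base of `s3://lake-prod/warehouse`. A minimal sketch that compares scheme and bucket, then checks the path on a segment boundary; the event shape (`table`, `metadata_location`, `storage_location`) and the bucket names are hypothetical.

```python
from typing import Iterable, Iterator
from urllib.parse import urlsplit


def within_base(location: str, base: str) -> bool:
    """True if `location` is under `base`: same scheme and bucket, prefix on a segment boundary."""
    loc, root = urlsplit(location), urlsplit(base)
    if (loc.scheme.lower(), loc.netloc.lower()) != (root.scheme.lower(), root.netloc.lower()):
        return False
    if ".." in loc.path.split("/"):
        return False  # refuse traversal-looking paths rather than try to resolve them
    root_path = root.path.rstrip("/") + "/"
    # The trailing "/" stops ".../warehouse-evil" from matching a base of ".../warehouse".
    return (loc.path.rstrip("/") + "/").startswith(root_path)


def iceberg_location_alerts(events: Iterable[dict], base: str) -> Iterator[tuple[str, list[str]]]:
    """Yield (table, offending_fields) for create/refresh events that leave the base prefix."""
    for ev in events:
        outside = [
            field
            for field in ("metadata_location", "storage_location")
            if ev.get(field) and not within_base(ev[field], base)
        ]
        if outside:
            yield ev["table"], outside
```

This only covers the Snowflake-visible half; per the text, a metadata-pointer rewrite that keeps the new metadata inside the base prefix needs catalog-side audit to catch.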
+ +
+
Chain L — External OAuth scope drift
+
+

+ External OAuth integrations (Entra, Okta, Ping, Auth0) map IdP-issued tokens to Snowflake roles. Three + drift conditions silently widen an integration's effective authority without a Snowflake-side change. +

+
    +
  1. Scope creep at the IdP. Consent attacks against tenant admins, or over-broad client-app registrations, grant Snowflake-facing scopes the customer did not intend.
  2. +
  3. Audience reuse. A token issued for one integration is replayed against another sharing the same audience claim; the attacker collects the union of role mappings.
  4. +
  5. Stale admin mapping. The integration's default_role was set when the IdP had narrow scopes; the IdP later added broader ones, and the mapping silently widened.
  6. +
+
+ Detection: alert on ALTER INTEGRATION … EXTERNAL_OAUTH_… events that reach + an admin-class role, and (with IdP-consent enrichment) on silent IdP-only widening. The structural + control is at the IdP — minimal client-app scope grants, audited consent expansions. Tooling at + tools/cloud-identity/snowflake/oauth_scope_audit.py. +
+
+
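The three drift conditions can be expressed as set differences plus a group-by on audience. A sketch, assuming the integration inventory and the IdP-consent fixture have already been merged into one record per integration; the record fields, the admin-class role set, and the sample scope strings are assumptions, not `oauth_scope_audit.py`'s actual schema.

```python
from collections import defaultdict
from dataclasses import dataclass

# Assumed admin-class set for this sketch; extend with any custom admin roles.
ADMIN_ROLES = frozenset({"ACCOUNTADMIN", "SECURITYADMIN", "SYSADMIN"})


@dataclass(frozen=True)
class OAuthIntegration:
    name: str
    audiences: frozenset[str]
    default_role: str
    approved_scopes: frozenset[str]       # what the customer intended (hypothetical baseline)
    idp_granted_scopes: frozenset[str]    # from the IdP-consent fixture (hypothetical)
    scopes_when_role_set: frozenset[str]  # IdP scopes when default_role was chosen (hypothetical)


def drift_report(integrations: list[OAuthIntegration]) -> list[tuple[str, str, list[str]]]:
    findings = []
    for i in integrations:
        # 1. Scope creep: the IdP grants more than the customer approved.
        extra = i.idp_granted_scopes - i.approved_scopes
        if extra:
            findings.append(("scope_creep", i.name, sorted(extra)))
        # 3. Stale admin mapping: admin-class default role, IdP scopes widened since.
        widened = i.idp_granted_scopes - i.scopes_when_role_set
        if i.default_role in ADMIN_ROLES and widened:
            findings.append(("stale_admin_mapping", i.name, sorted(widened)))
    # 2. Audience reuse: one audience accepted by several integrations, so a
    #    replayed token can reach the union of their role mappings.
    by_audience = defaultdict(list)
    for i in integrations:
        for aud in i.audiences:
            by_audience[aud].append(i)
    for aud in sorted(by_audience):
        group = by_audience[aud]
        if len(group) > 1:
            findings.append(("audience_reuse", aud, sorted({g.default_role for g in group})))
    return findings
```

Condition 3 is the one Snowflake-side audit cannot see on its own: nothing in the integration changes, so the comparison needs an IdP-side snapshot of scopes at mapping time.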
+ +
+
Chain M — UDF EXTERNAL ACCESS INTEGRATION breakout
+
+

+ Python and Scala UDFs run sandboxed with no network egress by default. The + EXTERNAL_ACCESS_INTEGRATIONS clause is the documented exception. The Chain H tooling covers + the SPCS-service variant of this primitive; Chain M is the UDF variant with different threat geometry. +

+
    +
  1. A UDF is invoked during normal query execution. Every analyst who runs SELECT my_udf(col) FROM patient_table triggers the UDF's network egress.
  2. +
  3. The UDF's egress identity is the function owner, not the invoking session. A misconfigured EAI on a UDF callable by PUBLIC is a sanctioned exfil channel that fires under any analyst's session.
  4. +
  5. QUERY_HISTORY shows the UDF invocation but not the destination of the network call. The audit gap mirrors Chain G's source-side blindness on data motion.
  6. +
+
+ Detection: join QUERY_HISTORY to ACCOUNT_USAGE.FUNCTIONS and + ACCOUNT_USAGE.INTEGRATIONS; alert when a UDF with an over-broad EAI is invoked by a role + that is not the function owner. Compute-pool-side network egress logs are the compensating control for + the destination. Tooling at tools/lateral-movement/snowflake-pivot/udf_eai_egress.py. +
+
+
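`udf_eai_egress.py` names three rule shapes (`deny-by-default` / `scoped` / `wildcard`). One plausible way to map an egress network rule's host list onto those shapes, and to flag the PUBLIC-callable case from item 2, is sketched here. The mapping (an empty list means deny-by-default, any `*` entry means wildcard) is an assumed heuristic, not the tool's definition, and `usage_grantees` is a hypothetical input listing who can call the UDF.

```python
from enum import Enum
from typing import Iterable


class RuleShape(str, Enum):
    DENY_BY_DEFAULT = "deny-by-default"
    SCOPED = "scoped"
    WILDCARD = "wildcard"


def rule_shape(value_list: Iterable[str]) -> RuleShape:
    """Classify an egress rule's host[:port] entries (assumed mapping)."""
    hosts = [v.strip() for v in value_list if v.strip()]
    if not hosts:
        return RuleShape.DENY_BY_DEFAULT  # nothing reachable
    if any("*" in h for h in hosts):
        return RuleShape.WILDCARD  # heuristic: any wildcard entry widens the rule
    return RuleShape.SCOPED


def is_sanctioned_exfil_channel(shape: RuleShape, usage_grantees: set[str]) -> bool:
    """The case the text calls out: an over-broad EAI on a UDF callable by PUBLIC."""
    return shape is RuleShape.WILDCARD and "PUBLIC" in usage_grantees
```

A `scoped` rule callable by `PUBLIC` is not flagged here; it still egresses under any analyst's session, but only to hosts someone deliberately listed.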
+ +
+
SPCS base-image supply chain (Chain H extension)
+
+

+ The Chain H tooling covers SPCS network egress. The orthogonal supply-chain surface is the container + images SPCS services run. Snowflake's documented Native App + SPCS review covers provider-side listing + posture; consumer-side image-source posture is the consumer's responsibility. +

+
    +
  1. Tag pinning instead of digest pinning. Specs reference python:3.11-slim, not python@sha256:…. The tag is mutable.
  2. +
  3. Off-registry source. Public registries are reachable; nothing on the Snowflake side enforces a private-registry-only policy.
  4. +
  5. Stale base. A base image not refreshed within an SLA window widens the exposure window to any post-build CVE.
  6. +
+
+ Detection: alert on service create/alter where the image is not digest-pinned or where the + registry is not on the customer's approved list. The structural control is an admission policy on the + SPCS deployment pipeline. Tooling at + tools/lateral-movement/snowflake-pivot/spcs_base_image_probe.py. +
+
+