AndrewAltimit · May 16, 2026
diff --git a/detection/snowflake/ENRICHMENT.md b/detection/snowflake/ENRICHMENT.md
@@ -180,17 +180,88 @@ to avoid false positives during the slower system's ingestion delay.
 - **Rules:**
   [`federated_login_anomaly.yml`](sigma/federated_login_anomaly.yml)
   (and Trail-paired variants where present).
-- **Computation:** `lag_tolerant` is a boolean ingestion-pipeline
-  parameter (default `true`). When true, the rule defers final firing
-  until *both* sides of the correlation have known-ingested timestamps
-  newer than the event time, surfaced as `both_sources_caught_up = true`.
-  The per-source watermarks (`idp_audit_watermark_ingested_at`,
-  `snowflake_login_ingested_at`) come from the SIEM pipeline's
-  per-source ingestion tracker.
-- **Input data location:** Computed by the SIEM ingestion pipeline
-  from per-source ingestion timestamps. Most modern SIEMs (Sentinel,
-  Splunk SC4S, Elastic Fleet) expose ingestion watermarks per source;
-  the pipeline only needs to materialize them as enrichment columns.
+- **Why these fields exist:** the federated-login rule fires on the
+  *absence* of a corresponding IdP sign-in event. Without a known
+  watermark per source, "absent" cannot be distinguished from "delayed."
+  A rule that fires on "absent so far" produces a false-positive storm
+  during routine IdP audit ingestion lag and is the first detection a
+  SOC will disable. The watermark machinery is what makes the rule
+  deployable.
+
+**Definition.** A *per-source watermark* is the latest event-time of
+any record successfully ingested from that source — measured in the
+event's own timestamp domain, not the SIEM's wall clock. The rule
+treats an event as "comparable against the other source" only when
+both sources' watermarks are at or after the event's timestamp plus the
+configured `idp_correlation_window_minutes`. Until then, the rule
+suppresses firing and re-evaluates on the next ingestion tick.
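+
+A minimal Python sketch of this definition, assuming each source's
+ingested records are available as `datetime` event times; the function
+names are hypothetical and are not part of any SIEM's API:
+
+```python
+from datetime import datetime, timedelta
+from typing import Iterable, Optional
+
+
+def watermark(event_times: Iterable[datetime]) -> Optional[datetime]:
+    """Latest event time ingested from one source (None if nothing arrived)."""
+    return max(event_times, default=None)
+
+
+def both_sources_caught_up(
+    event_time: datetime,
+    idp_watermark: Optional[datetime],
+    snowflake_watermark: Optional[datetime],
+    window_minutes: int,
+) -> bool:
+    """True only when both watermarks are at or after event_time + window."""
+    if idp_watermark is None or snowflake_watermark is None:
+        return False  # a silent source is never "caught up"
+    deadline = event_time + timedelta(minutes=window_minutes)
+    return deadline <= min(idp_watermark, snowflake_watermark)
+```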
+
+**Computation by SIEM.** The watermark is a max-aggregation per source
+over a recent window. Concrete forms:
+
+  - **Microsoft Sentinel (KQL):**
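+    ```kusto
+    // Per-source watermark: latest event time ingested in the last 2h.
+    let idp_audit_watermark = toscalar(
+      OktaSystemLog
+      | where TimeGenerated > ago(2h)
+      | summarize max(EventTime)
+    );
+    let snowflake_login_watermark = toscalar(
+      SnowflakeLoginHistory
+      | where TimeGenerated > ago(2h)
+      | summarize max(EventTimestamp)
+    );
+    let correlation_window = 5m;  // idp_correlation_window_minutes
+    SnowflakeLoginHistory
+    | where TimeGenerated > ago(2h)
+    | extend idp_audit_watermark_ingested_at = idp_audit_watermark,
+             snowflake_login_ingested_at = snowflake_login_watermark
+    | extend deadline = EventTimestamp + correlation_window
+    | extend both_sources_caught_up = coalesce(
+        deadline <= idp_audit_watermark_ingested_at
+          and deadline <= snowflake_login_ingested_at,
+        false)
+    ```
+    Both comparisons must hold, so the slower source gates firing. A
+    source with no rows in the lookback yields a null watermark, and
+    `coalesce` keeps the event not-caught-up.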
+  - **Splunk (SPL):** Schedule a search every 60s that runs
+    `| tstats latest(_time) AS watermark BY sourcetype` and writes the
+    result to a KV store lookup with `outputlookup`; join the lookup
+    into each event at search time with `lookup`. `_time` is the
+    event-time field; fall back to `_indextime` (ingest time) only if
+    event-time is unreliable, and then compare ingest times on both
+    sides of the inequality.
+  - **Elastic / Logstash:** Use a transform that computes
+    `max(@timestamp)` (the event-time field) per `data_stream.dataset`;
+    the `event.ingested` field set by Fleet's final ingest pipeline is
+    the ingest-time fallback. Surface the result as a `runtime_field`
+    on the detection index.
+
+**Where input data lives.** The watermarks are computed from records
+the ingestion pipeline has already landed, not from external
+watchlists. Concretely, the inputs are: (a) for the IdP side, the Okta
+System Log connector's `EventTime` column or the `CreatedDateTime`
+column of the Entra ID `SigninLogs` table; (b) for the Snowflake side,
+`LOGIN_HISTORY.EVENT_TIMESTAMP` on the ACCOUNT_USAGE path, or the
+Trail `auth.snowflake.login` event time on the Trail path. The
+materialized watermarks should be reachable as fields on each event
+the SIEM evaluates — either via a per-event lookup join or via a
+runtime field that re-computes on each search.
+
+**Fallback when watermarks are not yet wired up.** A SIEM without the
+watermark machinery has two safe operating modes:
+
+  - **Conservative (recommended).** Set `lag_tolerant: false` on the
+    rule body and widen `idp_correlation_window_minutes` to absorb the
+    worst-case combined ingestion lag of both sides plus the base
+    correlation tolerance (e.g., 10m Okta ingestion SLA + 45m
+    ACCOUNT_USAGE latency + 5m IdP-to-Snowflake tolerance = 60
+    minutes). The rule degrades to a fixed-window correlation that
+    fires when no IdP event has arrived once the window has elapsed.
+    False positives then occur only when a source breaches its
+    ingestion SLA.
+  - **Permissive (not recommended without compensating controls).**
+    Pin `both_sources_caught_up: true` unconditionally. The rule fires
+    on apparent absence regardless of ingestion state and will produce
+    spurious alerts during routine ingestion lag. Use this only as a
+    short-term posture while the watermark pipeline is being built,
+    and only with explicit SOC sign-off; the noise will train operators
+    to ignore the rule.
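+
+The two fallback postures reduce to simple firing predicates. A minimal
+sketch with hypothetical function names, where `now` stands for the
+rule evaluator's wall clock and the 10/45/5-minute figures are the
+example values above:
+
+```python
+from datetime import datetime, timedelta
+
+
+def conservative_window_minutes(idp_sla_min: int, snowflake_latency_min: int,
+                                base_window_min: int) -> int:
+    """Fixed window absorbing both sides' worst-case lag plus the base tolerance."""
+    return idp_sla_min + snowflake_latency_min + base_window_min
+
+
+def conservative_should_fire(event_time: datetime, now: datetime,
+                             has_corresponding_idp_event: bool,
+                             window_minutes: int) -> bool:
+    """With lag_tolerant false, fire once the fixed window elapses with no IdP match."""
+    window_elapsed = now - event_time >= timedelta(minutes=window_minutes)
+    return window_elapsed and not has_corresponding_idp_event
+
+
+def permissive_should_fire(has_corresponding_idp_event: bool) -> bool:
+    """both_sources_caught_up pinned true: fires on any apparent absence."""
+    return not has_corresponding_idp_event
+
+
+window = conservative_window_minutes(10, 45, 5)  # 60 minutes
+```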
+
+**Validation before promoting the rule to alert.** Submit a synthetic
+test event with `has_corresponding_idp_event: false` and an
+event-timestamp 24 hours in the future. The rule MUST NOT fire — if it
+does, the watermark logic is not wired up (the rule is evaluating the
+future event as if both sources had ingested up to it, which they have
+not). A separate synthetic event with a timestamp two minutes in the
+past, no matching IdP event, and `lag_tolerant: true` should fire only
+after both real ingestion watermarks have advanced past the event time
+plus `idp_correlation_window_minutes`. Confirm that the firing delay
+tracks the slower source's lag plus the correlation window, not the
+rule evaluator's cadence.
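+
+The second check can be rehearsed offline before touching the SIEM. A
+minimal sketch, assuming hypothetical one-per-minute evaluator ticks
+that each carry both sources' watermarks; with a 10-minute IdP lag, a
+45-minute Snowflake lag, and a 5-minute window, an event two minutes in
+the past first fires 50 minutes after its own timestamp:
+
+```python
+from datetime import datetime, timedelta
+from typing import List, Optional, Tuple
+
+# (tick_time, idp_watermark, snowflake_watermark) at each evaluator tick
+Tick = Tuple[datetime, datetime, datetime]
+
+
+def first_fire_tick(event_time: datetime, window_minutes: int,
+                    ticks: List[Tick]) -> Optional[datetime]:
+    """Return the first tick at which a no-IdP-match event may fire, else None."""
+    deadline = event_time + timedelta(minutes=window_minutes)
+    for tick_time, idp_wm, sf_wm in ticks:
+        if deadline <= min(idp_wm, sf_wm):
+            return tick_time
+    return None
+
+
+def simulate_ticks(start: datetime, minutes: int, idp_lag_min: int,
+                   sf_lag_min: int) -> List[Tick]:
+    """One tick per minute; each watermark trails the wall clock by its lag."""
+    ticks = []
+    for m in range(minutes + 1):
+        now = start + timedelta(minutes=m)
+        ticks.append((now,
+                      now - timedelta(minutes=idp_lag_min),
+                      now - timedelta(minutes=sf_lag_min)))
+    return ticks
+
+
+start = datetime(2026, 5, 16, 12, 0)
+ticks = simulate_ticks(start, 60, idp_lag_min=10, sf_lag_min=45)
+
+# Future-dated event: MUST NOT fire.
+assert first_fire_tick(start + timedelta(hours=24), 5, ticks) is None
+
+# Event two minutes in the past: fires once the slower (45m) source catches up.
+event = start - timedelta(minutes=2)
+fired = first_fire_tick(event, 5, ticks)
+assert fired is not None and fired - event == timedelta(minutes=50)
+```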
 
 ### `has_cortex_code_session_within_window`, `cortex_code_session_host_id`
 
@@ -433,7 +504,7 @@ trace once captured:
 
 ## 11. Chain-M (UDF EAI) Derived Fields
 
-### `udf_owner`, `udf_eai_list`, `eai_network_rule_value_list`, `eai_rule_is_overbroad`, `invocation_role_eq_owner`
+### `udf_owner`, `udf_has_eai`, `udf_eai_list`, `eai_network_rule_value_list`, `eai_rule_is_overbroad`, `invocation_role_eq_owner`
 
 - **Rule:** [`udf_with_eai_invocation.yml`](../../tools/lateral-movement/snowflake-pivot/detection/sigma/udf_with_eai_invocation.yml).
 - **Native source:** `ACCOUNT_USAGE.FUNCTIONS` (function owner + EAI
@@ -442,8 +513,14 @@ trace once captured:
   user/role).
 - **Computation:** Join at ingest. `eai_rule_is_overbroad` is `true`
   when the referenced NETWORK RULE's `value_list` contains a wildcard
-  (`*`, `OPEN_ANY`).
-- **Input data location:** All four joins are pure Snowflake views.
+  (`*`, `OPEN_ANY`). `udf_has_eai` is an explicit non-empty-list flag:
+  `true` iff `FUNCTIONS.EXTERNAL_ACCESS_INTEGRATIONS` is non-null AND
+  parses to a non-empty list. The rule keys on `udf_has_eai: true`
+  rather than `udf_eai_list|exists: true` because Sigma's `|exists`
+  modifier checks field presence, not list cardinality; an empty list
+  serialized as `[]` would pass `|exists` and fire the rule on UDFs
+  that declare *no* EAIs.
+- **Input data location:** All inputs are pure Snowflake views;
+  `udf_has_eai` reads the same `FUNCTIONS` row as `udf_eai_list`, so it
+  adds no new join.
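+
+A minimal sketch of the `udf_has_eai` computation, assuming the
+`EXTERNAL_ACCESS_INTEGRATIONS` column arrives as a JSON-array string,
+a bare comma-separated string, or NULL (the exact serialization is an
+assumption to verify against your account). `field_exists` is a
+hypothetical stand-in for Sigma's `|exists` semantics:
+
+```python
+import json
+from typing import List, Optional
+
+
+def parse_eai_list(raw: Optional[str]) -> List[str]:
+    """Normalize EXTERNAL_ACCESS_INTEGRATIONS into a list of EAI names."""
+    if raw is None or not raw.strip():
+        return []
+    try:
+        value = json.loads(raw)
+    except json.JSONDecodeError:
+        # Not JSON: treat as a comma-separated list of names.
+        return [part.strip() for part in raw.split(",") if part.strip()]
+    if value is None:
+        return []  # JSON null
+    if isinstance(value, list):
+        return [str(v) for v in value]
+    return [str(value)]
+
+
+def udf_has_eai(raw: Optional[str]) -> bool:
+    """True iff the column is non-null AND parses to a non-empty list."""
+    return len(parse_eai_list(raw)) > 0
+
+
+def field_exists(record: dict, field: str) -> bool:
+    """Presence-only check, mirroring Sigma's `|exists`."""
+    return field in record
+
+
+# An empty list is present but declares no EAIs.
+record = {"udf_eai_list": parse_eai_list("[]")}
+assert field_exists(record, "udf_eai_list") and not udf_has_eai("[]")
+```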
 
 ## 12. SPCS Image Posture Derived Fields
 
@@ -469,6 +546,12 @@ confirm:
   `idp_correlation_window_minutes` is tuned to cover the worst-case
   ingestion lag of either side, and `lag_tolerant` is enabled by
   default.
+- [ ] Per-source ingestion watermarks (`idp_audit_watermark_ingested_at`,
+  `snowflake_login_ingested_at`) are materialized as enrichment columns
+  using the SIEM-specific recipe in &sect;3 above; the synthetic-event
+  validation (future-dated event MUST NOT fire) has been run and passed.
+  Without this, `federated_login_anomaly.yml` silently never fires and
+  the customer has a Chain D detection gap they will not notice.
 - [ ] If Snowflake Trail is enabled for the account, the Cortex Trail
   event families are emitting; if not, a Cortex telemetry sidecar
   is deployed (Snowpark wrapper) and the rule registry reflects

diff --git a/detection/snowflake/README.md b/detection/snowflake/README.md
@@ -16,13 +16,13 @@ customer needs in place before the rule will fire. Honest accounting:
 | Tag | Count | What it means for deployment |
 |-----|------:|------------------------------|
 | `production_ready` | 4 | Fires on raw audit / log surfaces a customer already ingests. No enrichment, correlation, or sidecar required. Drop in. |
-| `requires_enrichment` | 19 | Fires only when a SIEM-side enrichment pipeline computes the derived fields listed under each rule's `enrichment.required`. See [`ENRICHMENT.md`](ENRICHMENT.md) for the full field contract; templates under [`enrichment-templates/`](enrichment-templates/). |
+| `requires_enrichment` | 20 | Fires only when a SIEM-side enrichment pipeline computes the derived fields listed under each rule's `enrichment.required`. See [`ENRICHMENT.md`](ENRICHMENT.md) for the full field contract; templates under [`enrichment-templates/`](enrichment-templates/). |
 | `requires_correlation` | 4 | Fires only when an external audit stream — IdP sign-in events for `federated_login_anomaly` / `oauth_integration_scope_drift`, Cortex Code CLI session logs for `cortex_code_session_to_unknown_session` — is correlated with the Snowflake-side event. |
 | `requires_cortex_sidecar` | 5 | Fires only when a Cortex Agents per-step trace is surfaced by a sidecar. Snowflake's first-party `ACCOUNT_USAGE` views do not surface the depth these rules require. |
 | `requires_endpoint_telemetry` | 1 | Fires on host-side process / file telemetry, not Snowflake audit (Cortex Code CLI version-string detection). |
 
-**Rule of thumb**: of the 33 Sigma rules in this pack, 4 work out of the
-box. The remaining 29 land an alert only after the relevant enrichment,
+**Rule of thumb**: of the 34 Sigma rules in this pack, 4 work out of the
+box. The remaining 30 land an alert only after the relevant enrichment,
 correlation, or sidecar is operational. The `requires_enrichment` tier
 is the biggest deployment lift; the [`enrichment-templates/`](enrichment-templates/)
 directory has the SQL and SIEM lookup definitions to compute the derived
@@ -66,7 +66,7 @@ ingestion surface available on the customer's side.
 
 | Chain | What it does | ACCOUNT_USAGE Sigma | Trail Sigma |
 |-------|--------------|---------------------|-------------|
-| A — Credential theft to bulk exfil | UNC5537 replay; bulk `COPY INTO @stage` from a non-MFA / no-network-policy user. | [`bulk_exfil_baseline.yml`](sigma/bulk_exfil_baseline.yml) + bind-param coverage: [`snowflake_bind_param_audit_gap.yml`](../../tools/lateral-movement/snowflake-pivot/detection/sigma/snowflake_bind_param_audit_gap.yml) | — (folded into bulk_exfil_baseline via the streaming-ingest pipeline) |
+| A — Credential theft to bulk exfil | UNC5537 replay; bulk `COPY INTO @stage` from a non-MFA / no-network-policy user. | [`bulk_exfil_baseline.yml`](sigma/bulk_exfil_baseline.yml) + bind-param coverage: [`snowflake_bind_param_audit_gap.yml`](../../tools/lateral-movement/snowflake-pivot/detection/sigma/snowflake_bind_param_audit_gap.yml) | [`bulk_exfil_baseline_trail.yml`](sigma/bulk_exfil_baseline_trail.yml) — mirrors the ACCOUNT_USAGE rule's four-signal contract on `query.snowflake.completed`; where Trail is not yet wired up, the streaming-ingest sidecar under [`streaming-ingest/`](streaming-ingest/) is the interim minute-scale coverage |
 | B — Cortex Code indirect injection | Pre-1.0.25 Cortex Code CLI executes shell-pipe-sh under indirect prompt injection. | [`cortex_code_pre_1_0_25.yml`](sigma/cortex_code_pre_1_0_25.yml) (version-string, endpoint-side) + behavioral pair: [`cortex_code_session_to_unknown_session.yml`](sigma/cortex_code_session_to_unknown_session.yml) | covered by the behavioral pair (does not depend on Trail event names) |
 | C — Native App Marketplace supply-chain | Installed Native App auto-updates to a manifest with new external integrations, new privileges, or new/mutated dependencies (incl. deferred-loader shape). | [`native_app_unexpected_version_bump.yml`](sigma/native_app_unexpected_version_bump.yml) + [`native_app_privilege_bump.yml`](../../tools/supply-chain/snowflake-native-app/detection/sigma/native_app_privilege_bump.yml) + [`native_app_dependency_drift.yml`](../../tools/supply-chain/snowflake-native-app/detection/sigma/native_app_dependency_drift.yml) | [`native_app_privilege_bump_trail.yml`](../../tools/supply-chain/snowflake-native-app/detection/sigma/native_app_privilege_bump_trail.yml) |
 | D — Federated-IdP compromise | Forged SAML/OAuth assertion authenticates a high-privileged Snowflake user. | [`federated_login_anomaly.yml`](sigma/federated_login_anomaly.yml) | — (use the Chain F Trail variant; same login_history shape) |

diff --git a/detection/snowflake/sigma/bulk_exfil_baseline_trail.yml b/detection/snowflake/sigma/bulk_exfil_baseline_trail.yml
@@ -0,0 +1,118 @@
+title: Snowflake Trail — Bulk COPY INTO External Stage (Chain A, role-aware)
+id: 9a1c3e5f-7b2d-4e6a-9c0e-1f2a3b4c5d6e
+maturity: requires_enrichment    # fires only when a SIEM-side enrichment pipeline computes the derived fields listed under enrichment.required
+status: experimental
+description: |
+  Trail-event-shaped pair to `bulk_exfil_baseline.yml`. Consumes the
+  `query.snowflake.completed` Trail event for any `COPY INTO @<external_stage>`
+  whose combination of signals separates an attacker's first-and-only
+  bulk exfil from a legitimate role's recurring data motion.
+
+  The detection contract mirrors the ACCOUNT_USAGE-shaped rule exactly —
+  same four-signal gating, same false-positive guidance — but consumes
+  the real-time Trail event stream instead of polling
+  `SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY`. Latency advantage: seconds
+  rather than the up-to-45-minute ACCOUNT_USAGE projection lag. This is
+  the difference between catching an UNC5537-class exfil during the
+  attacker's session vs. after it has already completed.
+
+  The rule fires when **all four** of the following hold:
+
+  1. The query is a `COPY INTO @<external_stage>` (external-stage form,
+     not internal stage).
+  2. The external stage is **not** on the customer-maintained
+     approved-exfil-stage watchlist.
+  3. **At least one** of:
+       - the user's role is not in the approved bulk-exporter set,
+       - the volume exceeds the role's 90th-percentile baseline,
+       - the event falls outside business hours for the user's
+         time zone / tenant policy.
+  4. At least 10 MiB (10,485,760 bytes) written.
+
+  Pair with `snowflake_bind_param_audit_gap.yml` for sessions where bind
+  parameters degrade the audit signal. The Trail event's `QUERY_TEXT`
+  has the same bind-parameter blind spot as ACCOUNT_USAGE, so the
+  paired coverage is needed regardless of the ingestion surface.
+
+  **Known sensitivity gap** (identical to the ACCOUNT_USAGE rule): an
+  attacker who has stolen credentials for an approved bulk-exporter role
+  and exfils inside that role's documented business-hours window at a
+  volume below the role's p90 baseline is invisible to this rule. A
+  non-watchlisted destination stage only gates the rule; it cannot fire
+  it unless one of the three OR'd signals in item 3 also holds. The
+  `fp_fn_harness/bulk_exfil_baseline.py` harness measures this gap; the
+  recommended remediation is documented there (promote
+  `external_stage_in_watchlist` to a fire signal, or add a
+  `stage_outside_corp_namespace` enrichment field).
+
+  **Deployment posture.** This rule activates only where Snowflake Trail
+  is enabled and the `query_events` event family is subscribed. Where
+  Trail is not yet wired up, the customer's interim coverage is the
+  streaming-ingest sidecar documented under
+  `detection/snowflake/streaming-ingest/`, which polls
+  `INFORMATION_SCHEMA.QUERY_HISTORY()` on a 60-second cadence and
+  produces equivalent signal at ~minute-scale latency (rather than the
+  Trail rule's seconds-scale).
+references:
+  - https://cloud.google.com/blog/topics/threat-intelligence/unc5537-snowflake-data-theft-extortion
+  - https://docs.snowflake.com/en/user-guide/snowflake-trail
+  - https://docs.snowflake.com/en/sql-reference/sql/copy-into-location
+author: security-research
+date: 2026-05-15
+tags:
+  - attack.exfiltration
+  - attack.t1567.002
+enrichment:
+  required:
+    - external_stage_in_watchlist
+    - role_in_approved_bulk_exporter_set
+    - volume_above_role_baseline
+    - outside_business_hours
+  doc: ../ENRICHMENT.md
+logsource:
+  product: snowflake_trail
+  service: query_events
+detection:
+  copy_to_external:
+    event_type: 'query.snowflake.completed'
+    query_type|startswith: 'COPY'
+    query_text|contains: '@'
+  external_stage_not_in_watchlist:
+    external_stage_in_watchlist: false
+  role_off_baseline:
+    role_in_approved_bulk_exporter_set: false
+  volume_above_baseline:
+    volume_above_role_baseline: true
+  off_hours:
+    outside_business_hours: true
+  size_floor:
+    bytes_written_to_result|gte: 10485760   # 10 MB lower floor
+  condition: >
+    copy_to_external
+    and external_stage_not_in_watchlist
+    and size_floor
+    and (role_off_baseline or volume_above_baseline or off_hours)
+fields:
+  - event_timestamp
+  - user_name
+  - role_name
+  - session_id
+  - query_text
+  - bytes_written_to_result
+  - rows_produced
+  - external_stage_url
+  - role_in_approved_bulk_exporter_set
+  - volume_above_role_baseline
+  - outside_business_hours
+falsepositives:
+  - Legitimate first-run of a new pipeline that loads from / unloads to
+    a freshly-created external stage. Maintain a 24h grace +
+    on-call notification — the rule should warn (not page) until the
+    new stage is added to the watchlist.
+  - Genuinely novel ad-hoc exports from approved bulk-exporter roles
+    during declared incidents (DR, data migration, etc.). Tag the
+    incident window and suppress the `off_hours` signal inside it, so
+    urgent after-hours incident work does not fire the rule on its own.
+  - Roles that have legitimate after-hours export windows (overnight
+    EHR refreshes). Define the `outside_business_hours` calculation
+    per role, not per tenant — the enrichment doc names this pattern.
+level: high
diff --git a/detection/snowflake/sigma/native_app_unexpected_version_bump.yml b/detection/snowflake/sigma/native_app_unexpected_version_bump.yml
@@ -11,6 +11,14 @@ description: |
   provider account pushes a new manifest; consumers with auto-update
   enabled receive it without re-consent. The NAAAPS scan is the upstream
   control; the consumer-side detection is the version-bump diff.
+
+  Structural contract: `manifest_diff_added` is a list of prefixed
+  tokens emitted by the application_history projection
+  (`PRIVILEGE:<name>`, `EXTERNAL ACCESS INTEGRATION:<name>`,
+  `EXTERNAL FUNCTION:<name>`, `CONTAINER:<image>`). The rule uses
+  `|startswith` against the prefix list so the detection binds to the
+  structural contract; a token that merely contains one of these
+  strings after a different prefix (e.g., inside a `PRIVILEGE:` or
+  `CONTAINER:` name) will not fire the rule.
 references:
   - https://docs.snowflake.com/en/developer-guide/native-apps/security-overview
   - https://docs.snowflake.com/en/developer-guide/native-apps/security-cve
@@ -34,9 +42,9 @@ detection:
     event_type: APP_VERSION_INSTALLED
     auto_upgrade: true
   new_eai_or_extfn:
-    manifest_diff_added|contains:
-      - 'EXTERNAL ACCESS INTEGRATION'
-      - 'EXTERNAL FUNCTION'
+    manifest_diff_added|startswith:
+      - 'EXTERNAL ACCESS INTEGRATION:'
+      - 'EXTERNAL FUNCTION:'
   condition: app_upgraded and new_eai_or_extfn
 fields:
   - event_timestamp