
Add tag probabilities to AI Guard evaluation result#5580

Merged
y9v merged 1 commit into master from ai-guard-tag-probabilities
Apr 13, 2026

Conversation

@y9v y9v (Member) commented Apr 10, 2026

What does this PR do?
This PR adds the AIGuard::Evaluation::Result#tag_probabilities attribute, exposing the tag_probs field from the AI Guard API response.

Motivation:
We want to expose evaluation result tag probabilities in our SDK and attach them to the metastruct.
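A minimal sketch of how the attribute described above could surface the API field. Only the attribute name #tag_probabilities and the tag_probs response key come from this PR; the surrounding class shape and the example probability values are assumptions for illustration.

```ruby
# Illustrative sketch, not the actual implementation in the dd-trace-rb gem.
module AIGuard
  module Evaluation
    class Result
      # Hash of tag name => probability, taken from the API response's
      # "tag_probs" field (assumed shape, e.g. {"prompt_injection" => 0.92}).
      attr_reader :tag_probabilities

      def initialize(raw_response_body)
        @tag_probabilities = raw_response_body["tag_probs"] || {}
      end
    end
  end
end
```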

Change log entry
Yes. AI Guard evaluation results now include tag probabilities from the API response.

Additional Notes:
APPSEC-61897

How to test the change?
CI and manual testing

@y9v y9v self-assigned this Apr 10, 2026
@y9v y9v requested review from a team as code owners April 10, 2026 16:05

github-actions bot commented Apr 10, 2026

Typing analysis

Note: Ignored files are excluded from the next sections.

steep:ignore comments

This PR introduces 1 steep:ignore comment, and clears 1 steep:ignore comment.

steep:ignore comments (+1, -1)
Introduced:
lib/datadog/ai_guard/evaluation.rb:66
Cleared:
lib/datadog/ai_guard/evaluation.rb:65

Untyped methods

This PR introduces 1 partially typed method, and clears 1 partially typed method.

Partially typed methods (+1, -1)
Introduced:
sig/datadog/ai_guard/evaluation/result.rbs:29
└── def initialize: (::Hash[::String, untyped] raw_response_body) -> void
Cleared:
sig/datadog/ai_guard/evaluation/result.rbs:28
└── def initialize: (::Hash[::String, untyped] raw_response_body) -> void

If you believe a method or an attribute is rightfully untyped or partially typed, you can add # untyped:accept on the line before the definition to remove it from the stats.
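The opt-out the bot describes is a plain comment on the line before the definition. A minimal sketch, assuming a hypothetical class (the Example name and method body are illustrative; only the # untyped:accept directive comes from the note above):

```ruby
# Hypothetical class illustrating where the directive goes; the comment on
# the line before the definition excludes the method from the typing stats.
class Example
  # untyped:accept
  def initialize(raw_response_body)
    @raw_response_body = raw_response_body
  end
end
```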

datadog-datadog-prod-us1 bot (Contributor) commented Apr 10, 2026

✅ Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

🎯 Code Coverage (details)
Patch Coverage: 94.12%
Overall Coverage: 95.35% (+0.05%)

🔗 Commit SHA: 672a7c7

@y9v y9v force-pushed the ai-guard-tag-probabilities branch from 0c20b88 to 672a7c7 Compare April 13, 2026 11:30

pr-commenter bot commented Apr 13, 2026

Benchmarks

Benchmark execution time: 2026-04-13 11:56:19

Comparing candidate commit 672a7c7 in PR branch ai-guard-tag-probabilities with baseline commit b9193de in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 45 metrics; 1 metric was unstable.

Explanation

This is an A/B test comparing a candidate commit's performance against that of a baseline commit. Performance changes are noted in the tables below as:

  • 🟩 = significantly better candidate vs. baseline
  • 🟥 = significantly worse candidate vs. baseline

We compute a confidence interval (CI) over the relative difference of means between metrics from the candidate and baseline commits, considering the baseline as the reference.

If the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD), the change is considered significant.
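The significance rule above can be sketched as a single check: the change is significant only when the whole confidence interval lies outside the band [-threshold, +threshold]. This is an illustrative sketch, not the benchmarking platform's actual code; the method name and argument conventions are assumptions.

```ruby
# ci_low/ci_high: bounds of the CI on the relative difference of means
# (e.g. 0.013 for +1.3%); threshold: the SIGNIFICANT_IMPACT_THRESHOLD.
# Significant only if the entire interval is above +threshold (worse for a
# time metric) or entirely below -threshold (better).
def significant?(ci_low, ci_high, threshold)
  ci_low > threshold || ci_high < -threshold
end
```

For the worse-performance diagram below (CI of +1.3% to +3.1% against a 1% threshold), this check returns true; for an interval straddling the threshold it returns false.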

Feel free to reach out to #apm-benchmarking-platform on Slack if you have any questions.

More details about the CI and significant changes

You can imagine this CI as a range of values that is likely to contain the true difference of means between the candidate and baseline commits.

CIs of the difference of means are often centered around 0%, because often changes are not that big:

---------------------------------(------|---^--------)-------------------------------->
                              -0.6%    0%  0.3%     +1.2%
                                 |          |        |
         lower bound of the CI --'          |        |
sample mean (center of the CI) -------------'        |
         upper bound of the CI ----------------------'

As described above, a change is considered significant if the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD).

For instance, for an execution time metric, this confidence interval indicates a significantly worse performance:

----------------------------------------|---------|---(---------^---------)---------->
                                       0%        1%  1.3%      2.2%      3.1%
                                                  |   |         |         |
       significant impact threshold --------------'   |         |         |
                      lower bound of CI --------------'         |         |
       sample mean (center of the CI) --------------------------'         |
                      upper bound of CI ----------------------------------'

@y9v y9v merged commit 103d9c5 into master Apr 13, 2026
629 checks passed
@y9v y9v deleted the ai-guard-tag-probabilities branch April 13, 2026 12:26
@github-actions github-actions bot added this to the 2.31.0 milestone Apr 13, 2026
@y9v y9v mentioned this pull request Apr 16, 2026

2 participants