Conversation
scripts/ci_analysis.py fetches GitHub Actions workflow run and job data for the CI workflows and produces:

- Per-job queue time (created_at -> started_at) and duration stats
- Per-run total queue time (sum across all PR-gating jobs)
- Weekly trend plots: queue time and duration per job (PNG)
- Failure classification: infrastructure vs. genuine

The default time range is from Jan 1 of the current year to today, so running the script regularly accumulates a growing year-to-date view.

Usage:

    python3 scripts/ci_analysis.py --plot                    # year-to-date
    python3 scripts/ci_analysis.py --since 2026-03-01 --plot

Requires: gh CLI authenticated or GITHUB_TOKEN env var. Generated PNGs and JSON are added to .gitignore.

Signed-off-by: Clemens Volk <cvolk@nvidia.com>
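The per-job queue-time computation described above (created_at -> started_at) can be sketched as follows. This is a minimal illustration, not the script's actual code; the timestamp field names match the GitHub Actions jobs API.

```python
from datetime import datetime, timezone

def parse_ts(ts: str) -> datetime:
    # GitHub Actions API timestamps look like "2026-03-01T12:34:56Z"
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)

def queue_minutes(job: dict) -> float:
    # Queue time: how long the job waited between creation and start
    created = parse_ts(job["created_at"])
    started = parse_ts(job["started_at"])
    return (started - created).total_seconds() / 60.0

job = {"created_at": "2026-03-01T12:00:00Z", "started_at": "2026-03-01T12:07:30Z"}
print(queue_minutes(job))  # 7.5
```

Summing `queue_minutes` over all PR-gating jobs of a run gives the per-run total queue time.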
ci_new.yml (GitHub workflow id 238099976) was a temporary rename of ci.yml in Feb 2026 and no longer exists in the repo. GitHub's API still reports the workflow as active for as long as its run history exists, which caused the analysis script to count every run twice.

Signed-off-by: Clemens Volk <cvolk@nvidia.com>
Mirrors the existing total queue time line on the queue plot: the dashed black 'Total' line shows the weekly median of the per-run sum of PR-gating job durations.

Signed-off-by: Clemens Volk <cvolk@nvidia.com>
- Duration plot: only count jobs with conclusion==success to avoid cancelled/failed jobs biasing the median downward
- Total wall-clock: only record for runs where ALL PR-gating jobs succeeded individually, not just run.conclusion==success
- Fix variable shadowing bug: job conclusion was overwriting run conclusion in the outer loop (renamed to job_conclusion)
- Remove failure rate bar chart (kept in text report instead)
- Simplify plots to median-only lines; move med/p90/σ breakdown into a weekly stats table printed to stdout
- Remove ci_new.yml from WORKFLOW_IDS (short-lived rename, no longer in repo, was double-counting all runs)

Signed-off-by: Clemens Volk <cvolk@nvidia.com>
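The first two bullets can be sketched like this. The snippet is illustrative only: `duration_min` is a hypothetical pre-computed field, and `job_conclusion` mirrors the renamed variable from the shadowing fix.

```python
def successful_durations(jobs):
    # Only jobs that succeeded contribute duration samples: cancelled or
    # failed jobs finish early and would bias the weekly median downward.
    durations = []
    for job in jobs:
        # Distinct name so it cannot shadow the enclosing run's conclusion
        job_conclusion = job.get("conclusion")
        if job_conclusion == "success":
            durations.append(job["duration_min"])
    return durations

def run_fully_successful(jobs):
    # Record total wall-clock only when EVERY PR-gating job succeeded
    # individually; run.conclusion == "success" alone is not sufficient.
    return bool(jobs) and all(j.get("conclusion") == "success" for j in jobs)

jobs = [
    {"conclusion": "success", "duration_min": 30.0},
    {"conclusion": "cancelled", "duration_min": 2.0},
    {"conclusion": "success", "duration_min": 34.0},
]
print(successful_durations(jobs))  # [30.0, 34.0]
print(run_fully_successful(jobs))  # False
```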
- Exclude the current (partial) ISO week from both plots and the weekly stats table, so incomplete data does not distort trends
- Remove the "Total queue" line from the queue time plot; median-of-sums is misleading when per-run queues are correlated/bursty; per-job medians and the weekly table (med/p90/σ) give a clearer picture

Signed-off-by: Clemens Volk <cvolk@nvidia.com>
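Excluding the current partial ISO week comes down to comparing `(ISO year, ISO week)` tuples. A minimal sketch under an assumed record shape (the real script's data structures may differ):

```python
from datetime import date

def drop_current_week(records, today=None):
    # Exclude any record from the current (partial) ISO week so that an
    # incomplete week never distorts the weekly trend.
    today = today or date.today()
    current = today.isocalendar()[:2]  # (ISO year, ISO week number)
    return [r for r in records if r["date"].isocalendar()[:2] != current]

records = [
    {"date": date(2026, 3, 2), "queue_min": 5.0},   # same ISO week as "today"
    {"date": date(2026, 2, 23), "queue_min": 4.0},  # previous, completed week
]
kept = drop_current_week(records, today=date(2026, 3, 6))
print([r["queue_min"] for r in kept])  # [4.0]
```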
Correctness:
- Fix p90 formula in fmt_stats to match pct() — was sv[int(n*0.9)] (the 100th percentile for n=10); now sv[max(0, int(n*p)-1)] via a shared _pct() helper used in both places
- week_failure_counts["genuine"] was never incremented; now uses week_failure_counts[run_week][cat] += 1 to track all categories
- Clarify run_total_times comment: it is unfiltered (all runs with valid timestamps); only run_duration_records is restricted to fully-successful runs

Robustness:
- gh_get: distinguish permanent 4xx errors (401/403/404/422) from transient ones — fail fast with a clear hint instead of retrying; also change bare `raise exc` to `raise` to preserve the traceback
- fetch_runs_since: add a _MAX_PAGES=200 safety cap to prevent infinite pagination; check for GitHub error objects in the response body
- analyze: count per-run job-fetch failures and print a summary warning; sys.exit if every run failed (signals an auth/network issue)
- get_token: wrap subprocess.run with timeout=10 and handle FileNotFoundError/TimeoutExpired; replace the bare except on the YAML fallback with specific exceptions (ImportError, FileNotFoundError, KeyError) so each failure prints a diagnostic message
- main: guard --since date parsing with try/except ValueError and route through parser.error(); guard json.dump with try/except OSError

Comment accuracy:
- INFRA_PATTERNS block: "step names only" → "job names and step names"
- classify_failure docstring: same correction
- Module docstring output section: show <prefix> and <output> as configurable rather than fixed filenames
- to_minutes sanity cap: mention negative values (clock skew) too
- WORKFLOW_IDS comment: explain why omission == exclusion

Simplification:
- Extract _pct(), _current_iso_week(), _group_records_by_week() helpers; print_report() and plot_weekly_trends() now share the same week-grouping and current-week-exclusion logic (~40 lines removed)
- print_report(): use the module-level defaultdict (drop the redundant `from collections import defaultdict as _dd`); use PLOT_JOBS.items() instead of a separate KEY_JOBS list

Signed-off-by: Clemens Volk <cvolk@nvidia.com>
- week_failure_counts initializer was missing the "unknown" key, causing a KeyError whenever classify_failure() returned "unknown"
- Per-job fetch: add a GitHub error-envelope check ({"message": ...}) so API errors are counted as fetch failures rather than silently becoming zero-job runs
- Per-job fetch: print a note when exactly 100 jobs are returned, signalling possible truncation (pagination not yet implemented)
- fetch_runs_since: add an explicit isinstance check so a non-dict response raises RuntimeError with context rather than a bare AttributeError
- analyze meta: include job_fetch_failures and job_fetch_completeness_pct so downstream consumers can detect partial data
- main: write JSON before generating plots so data is never lost if plot generation fails
- get_token: always warn when gh auth token exits non-zero, even when stderr is empty

Signed-off-by: Clemens Volk <cvolk@nvidia.com>
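The error-envelope, isinstance, and truncation checks could look roughly like this; a sketch, with a hypothetical function name (the real script's structure may differ):

```python
def extract_jobs(payload):
    # Non-dict responses: fail with context instead of a bare AttributeError.
    if not isinstance(payload, dict):
        raise RuntimeError(f"unexpected jobs response: {type(payload).__name__}")
    # GitHub error envelopes ({"message": ...}) must be counted as fetch
    # failures, not silently treated as zero-job runs.
    if "jobs" not in payload:
        raise RuntimeError(f"GitHub API error: {payload.get('message', payload)}")
    jobs = payload["jobs"]
    # The jobs endpoint pages at 100 items; flag possible truncation.
    if len(jobs) == 100:
        print("note: exactly 100 jobs returned; list may be truncated")
    return jobs

print(len(extract_jobs({"jobs": [{"name": "build"}, {"name": "test"}]})))  # 2
```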
Replace the packed "med/p90/σ n=X" single-row format with separate sub-tables: Median, p90, and n (samples), each with one value per cell. Week labels are shortened to "W07" style. Trend arrows (↑↓→) compare the last two completed weeks per metric. A Unicode horizontal rule separates sections for visual clarity.

Signed-off-by: Clemens Volk <cvolk@nvidia.com>
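The trend arrows could be computed like this. Illustrative only: the relative dead-band tolerance is an assumption, as the commit does not specify how ties are decided.

```python
def trend_arrow(prev, last, tol=0.05):
    # Compare the last two completed weeks for one metric; a small relative
    # dead-band keeps tiny fluctuations from flapping between up and down.
    if prev == 0:
        return "→"
    change = (last - prev) / prev
    if change > tol:
        return "↑"
    if change < -tol:
        return "↓"
    return "→"

print(trend_arrow(10.0, 12.0))  # ↑
print(trend_arrow(10.0, 10.2))  # →
print(trend_arrow(10.0, 8.0))   # ↓
```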
Summary

Adds scripts/ci_analysis.py, a standalone tool for analysing GitHub Actions CI health for this repo. It fetches run and job data from the GitHub API and produces per-job queue time and duration stats, failure classification, and weekly trend plots (--plot) saved as PNGs.

Usage:

    python3 scripts/ci_analysis.py --plot                    # year-to-date
    python3 scripts/ci_analysis.py --since 2026-03-01 --plot
Current findings (Jan 23 – Mar 6):