
Add marker-sanity validator and conditional PR labels #14

Draft
plebioda wants to merge 33 commits into master from handling-workflows

Conversation

@plebioda
Collaborator

Prep work for PSMDB-1972 (automated handling of CI workflow failures on mergai PRs). Moves PR-label behaviour toward "only skip CI when the solution actually has unresolved conflicts".

  • Add AgentExecutor.validate_resolved_files_have_no_markers: fails when the agent lists a file under response.resolved while it still contains conflict markers. Wired in via a new combined validate_solution on the resolve path; the retry loop now self-heals "claimed resolved but markers remain" cases.
  • Extract a shared content_has_conflict_markers helper in git_utils; the existing file_has_conflict_markers variants and the (removed) rebase._content_has_conflict_markers delegate to it.
  • Add PRTypeConfig.labels_on_unresolved and MergaiNote.has_unresolved_conflicts. pr.py now composes the effective config-label set as labels + labels_on_unresolved only when a solution has unresolved files; existing --labels/--no-labels flags operate on that effective set.

@plebioda plebioda force-pushed the handling-workflows branch from db5333c to 5a33082 Compare April 23, 2026 12:38
@plebioda plebioda marked this pull request as draft May 21, 2026 11:42
plebioda added 21 commits May 21, 2026 13:42
Prep work for PSMDB-1972 (automated handling of CI workflow failures on
mergai PRs). Moves PR-label behaviour toward "only skip CI when the
solution actually has unresolved conflicts".

- Add `AgentExecutor.validate_resolved_files_have_no_markers`: fails
  when the agent lists a file under `response.resolved` while it still
  contains conflict markers. Wired in via a new combined
  `validate_solution` on the resolve path; the retry loop now self-heals
  "claimed resolved but markers remain" cases.
- Extract a shared `content_has_conflict_markers` helper in `git_utils`;
  the existing `file_has_conflict_markers` variants and the (removed)
  `rebase._content_has_conflict_markers` delegate to it.
- Add `PRTypeConfig.labels_on_unresolved` and
  `MergaiNote.has_unresolved_conflicts`. `pr.py` now composes the
  effective config-label set as `labels + labels_on_unresolved` only
  when a solution has unresolved files; existing `--labels`/`--no-labels`
  flags operate on that effective set.
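
The label composition can be sketched as a small pure function. This is an illustration under assumptions: the function name and the de-duplication step are not taken from `pr.py`.

```python
def effective_config_labels(labels, labels_on_unresolved, has_unresolved_conflicts):
    """Compose the config-derived label set for a PR.

    `labels_on_unresolved` is appended only when the solution still has
    unresolved files; duplicates are dropped (an assumption of this sketch).
    """
    effective = list(labels)
    if has_unresolved_conflicts:
        effective += [name for name in labels_on_unresolved if name not in effective]
    return effective
```

With this shape, `--labels` and `--no-labels` style flags would be applied to the returned list rather than to the raw config values.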
Implements the core of PSMDB-1972: when a CI workflow (format,
clang-tidy, …) fails on a mergai PR, GitHub fires a `workflow_run`
event that invokes `mergai ci handle`, which builds a structured
context from the failure artifact, runs a configured handler, commits
the fix, and lets the workflow push so CI reruns. Per-workflow
`max_attempts` caps the loop and posts a PR comment when reached.

This commit is the mergai (Python) half. The matching psmdb-mergai
workflow + config changes are landed separately, and pin a mergai
version that contains this commit.

- `config.py`: new `WorkflowsConfig` / `WorkflowConfig` /
  `WorkflowContextConfig` dataclasses parsed under a top-level
  `workflows:` section. `action_type` is validated against
  `command` / `resolve`; the `command` action requires a non-empty
  `command` field.
- `models.py`: `MergaiNote.ci_fix_history` (`list[dict]`),
  `has_ci_fix_history`, `get_ci_attempts(workflow)`,
  `add_ci_attempt(...)`. Round-trips through `from_dict` / `to_dict`
  and concatenates in `combine_from_dicts`.
- `src/mergai/ci/`: new package.
  - `context_builders/`: abstract `WorkflowContextBuilder` +
    `WorkflowContext` dataclass; `DiffContextBuilder` reads
    `diff.patch` + `files.txt` from a workflow artifact (format);
    `SARIFContextBuilder` parses a SARIF JSON artifact (clang-tidy);
    `get_context_builder(type)` factory.
  - `handlers/`: abstract `WorkflowHandler`; `CommandHandler` runs a
    shell command with `${TARGET_BRANCH}` / `${PR_NUMBER}` /
    `${WORKFLOW_NAME}` and reports success based on a dirty tree;
    `ResolveHandler` builds a prompt from the context and drives the
    AI agent through `AgentExecutor.run_with_retry` with a
    files-modified validator; `get_handler(app, config)` factory.
- `commands/ci.py`: `mergai ci handle --workflow … --run-id …
  --pr … --artifacts-dir …`. Looks up the workflow, enforces
  `max_attempts` (posts a PR comment on the give-up turn), builds
  context, dispatches to the handler, commits as
  `fix(<workflow>): automated fix attempt N`, and attaches a git
  note containing only the latest attempt entry.
- `app.py`: extends `add_selective_note` to honour a
  `ci_fix_history` field that emits only the most recent entry.
- `main.py`: registers the `ci` group.

End-to-end verified in a temp git repo against five paths: missing
config, disabled workflow, command-handler success on attempts 1
and 2, max-attempts give-up. Lint + black clean.
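
For illustration, the `action_type` and `command` validation could look like the sketch below. The field set, the default of 3 for `max_attempts`, and the error wording are assumptions; only the two rules (allowed action types, non-empty `command` for the `command` action) come from the description above.

```python
from dataclasses import dataclass

VALID_ACTION_TYPES = ("command", "resolve")


@dataclass
class WorkflowConfig:
    name: str
    action_type: str
    command: str = ""
    max_attempts: int = 3  # assumed default, not taken from the source

    @classmethod
    def from_dict(cls, name: str, data: dict) -> "WorkflowConfig":
        action_type = data.get("action_type")
        if action_type not in VALID_ACTION_TYPES:
            raise ValueError(
                f"workflows.{name}.action_type must be one of "
                f"{VALID_ACTION_TYPES}, got {action_type!r}"
            )
        command = str(data.get("command") or "").strip()
        # The `command` action is meaningless without something to run.
        if action_type == "command" and not command:
            raise ValueError(f"workflows.{name}.command must be non-empty")
        return cls(name, action_type, command, int(data.get("max_attempts", 3)))
```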
`get_handler(app, config)` in `mergai/ci/handlers/__init__.py` calls
`handler_cls(app, config)`, but the abstract `WorkflowHandler` had no
`__init__`, so mypy's `--ignore-missing-imports` Type Checking job
flagged "Too many arguments for WorkflowHandler". Add the `(app,
config)` signature on the base class so concrete handlers inherit it
and the factory call type-checks.
When a merge brings in hundreds of upstream commits, the rendered
per-commit table can push the PR body past GitHub's 65,536-char limit
and `pr create` fails with a generic "Validation Failed". This adds a
`--skip-commit-list` flag that omits the table from the body, and
formats GithubException with `errors[*].message` so callers see the
actual reason (e.g. "Body is too long...") instead of just the
top-level message.

The merge_context Jinja template wraps the table in
`{% if include_commit_list %}`; `merge_context_to_markdown` and the
two `_build_*_pr_body` helpers thread the flag through.
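
A sketch of the error formatting, assuming the exception payload has the usual GitHub REST shape: a top-level `message` plus an `errors` list whose items may carry their own `message`. The function works on the plain status and payload so it stays independent of any client library; its name is hypothetical.

```python
def format_github_error(status: int, data: dict) -> str:
    """Join the top-level message with any per-error messages."""
    top = data.get("message") or "Unknown error"
    details = [
        item["message"]
        for item in data.get("errors") or []
        if isinstance(item, dict) and item.get("message")
    ]
    if details:
        return f"{status} {top}: " + "; ".join(details)
    return f"{status} {top}"
```

For a 422 response this turns a bare "Validation Failed" into a message that also names the field-level reason.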
GitPython's ref resolver only treats HEAD/ORIG_HEAD/FETCH_HEAD/index/
logs as per-worktree. `repo.commit("MERGE_HEAD")` looks in common_dir
and fails inside a linked worktree (e.g. ~/psmdb-mergai), even though
git itself reads MERGE_HEAD from the worktree's local gitdir. Read
the file directly via `repo.git_dir`, then resolve the SHA the
normal way.
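
The direct read can be sketched as follows. The function takes the per-worktree git directory as a path (what `repo.git_dir` provides in GitPython) and returns the raw SHA; resolving it to a commit object is left to the caller. The function name is hypothetical.

```python
from pathlib import Path


def read_merge_head_sha(git_dir):
    """Return the first SHA recorded in <git_dir>/MERGE_HEAD, or None.

    Reading the file directly avoids a ref lookup in the common dir,
    which is where the lookup goes wrong inside a linked worktree.
    """
    path = Path(git_dir) / "MERGE_HEAD"
    if not path.is_file():
        return None
    tokens = path.read_text(encoding="utf-8").split()
    return tokens[0] if tokens else None
```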
When a workflow that uses `context.type: sarif` (currently
clang-tidy) fails before producing its SARIF report — e.g. a Bazel
loading-phase error during compile_commands.json generation — the
artifact upload step writes nothing and `mergai ci handle` was
crashing with FileNotFoundError, leaving the PR with no fix
attempt and no PR comment.

`SARIFContextBuilder` now falls back to fetching the failing job's
log via the GitHub API. The excerpt anchors on the first
`##[error]Process completed with exit code` marker, walks back to
the preceding `##[group]Run` line to delimit the failing step, and
returns the section. If it exceeds 64 KiB the excerpt keeps both
head and tail joined by an omission marker — root-cause errors
from build tools usually appear at the start of the output while
the failure summary appears at the end, so a plain tail loses the
original error.

To support this, context builders now receive the active
`AppContext` via constructor injection, mirroring the handler
factory. `DiffContextBuilder` is unaffected (it inherits the new
`__init__`).
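
The excerpt logic can be approximated by the sketch below. It is an illustration, not the builder's code: the omission text is made up, and the limit is counted in characters rather than bytes for simplicity.

```python
ERROR_MARKER = "##[error]Process completed with exit code"
GROUP_MARKER = "##[group]Run"
OMISSION = "\n... [omitted] ...\n"  # assumed wording


def excerpt_failing_step(log: str, limit: int = 64 * 1024) -> str:
    """Return the failing step's section, keeping head and tail if too long."""
    lines = log.splitlines()
    # Anchor on the first "process completed" error line.
    end = next((i for i, line in enumerate(lines) if ERROR_MARKER in line), None)
    if end is None:
        section = lines
    else:
        # Walk back to the group line that opened the failing step.
        start = next(
            (i for i in range(end, -1, -1) if GROUP_MARKER in lines[i]), 0
        )
        section = lines[start:end + 1]
    text = "\n".join(section)
    if len(text) <= limit:
        return text
    # Keep both ends: root cause tends to be early, summary late.
    half = max((limit - len(OMISSION)) // 2, 1)
    return text[:half] + OMISSION + text[-half:]
```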
The CLI used to require --workflow / --run-id / --pr / --artifacts-dir
as positionally-equivalent flags, with the workflow YAML extracting
each from the workflow_run event payload and downloading artifacts
itself via github-script. That bloated the workflow file and made
manual invocation cumbersome.

`mergai ci handle --run-id <id>` now resolves the run via PyGithub
(workflow name, head SHA, PR number, conclusion), validates that the
head branch is mergai/*, downloads artifacts to a temp dir in-process
via the new utils/artifact_downloader, and dispatches by run
conclusion + per-workflow config:

* failure: artifact context (with log fallback for runs that died
  before producing one — preserved from the prior commit)
* success + context.code_scanning_check: query Code Scanning's
  analyses API for findings on the run's commit; if results_count > 0,
  fetch the SARIF and reuse the existing _flatten_findings parser via
  a new code-scanning source path on SARIFContextBuilder
* otherwise: no-op

The new `code_scanning_check: bool` flag on WorkflowContextConfig opts
clang-tidy in (clang-tidy returns 0 even for findings, so without
this flag the workflow_run would always be conclusion=success and we
would never act).

--workflow / --pr / --artifacts-dir survive as overrides for manual
or offline runs. The orchestration is wrapped in
`build_workflow_context_for_run` (a contextmanager that owns the
tempdir lifecycle) so future callers can reuse the dispatch decision.
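
The dispatch decision reduces to a small function. The return tokens here are invented for the sketch; the three branches follow the list above.

```python
def decide_context_source(conclusion, code_scanning_check, results_count=0):
    """Pick where the workflow context should come from for one run."""
    if conclusion == "failure":
        # Artifact context, with the log fallback handled by the builder.
        return "artifact"
    if conclusion == "success" and code_scanning_check and results_count > 0:
        # A green run that still has Code Scanning findings.
        return "code_scanning"
    return "noop"
```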
Three prompt commands previously lived in different shapes — top-level
`mergai prompt` (resolve), top-level `mergai merge-prompt` (describe),
and no manual entry point at all for the CI fix prompt that
`ci handle` feeds to the agent. The latter could only be inspected by
running `ci handle` and letting it fail.

Replaced with a single `mergai prompt` group:

* `prompt resolve`  — was the top-level `mergai prompt`
* `prompt describe` — was the top-level `mergai merge-prompt`
* `prompt ci --run-id <id>` — new; runs the same orchestration as
  `ci handle` (resolve run → pick artifact / Code Scanning / log
  fallback → build WorkflowContext) but stops before the agent and
  prints the rendered prompt to stdout. Same override flags as
  `ci handle` (--workflow / --pr / --artifacts-dir).

To enable `prompt ci`, factored out:
* `build_ci_fix_prompt(context)` — the prompt template rendering, was
  `ResolveHandler._build_prompt`. The handler now calls the free
  function; `prompt ci` calls it directly with the WorkflowContext.

The old `mergai prompt` (no arg) and `mergai merge-prompt` are removed
without aliases — breaking change.
Before this change CI fixes lived in their own `ci_fix_history` field
on the note: a free-form attempt log distinct from the agent solutions
that the resolve flow already handles. The CI prompt was a
single-line `PROMPT_TEMPLATE.format(...)`, the validator was a
hand-rolled "is the tree dirty?" check, and the commit message was a
one-liner with no record of what the agent claimed to fix.

Fold CI fixes into `solutions[]` with `type: "ci_fix"`. Each entry
carries a `request` dict (workflow / run_id / pr_number /
attempt_number / context_summary / timestamp) and the same
`response` shape as the resolve flow (summary / resolved /
unresolved / modified / review_notes). The post-processing pipeline
is now shared:

* Prompt: a new `build_ci_fix_prompt(WorkflowContext)` free function
  in `prompt_builder.py` mirrors `build_resolve_prompt` — system
  prompt (`prompts/system_prompt_ci_fix.md`), project invariants,
  per-context section (`prompts/ci_fix_context.md`), and the
  `WorkflowContext` serialized as JSON.
* Validator: `executor.validate_solution`, the same one resolve uses
  (resolved/modified files actually have unstaged changes, no leftover
  conflict markers in resolved files).
* Commit: a new `app.commit_ci_fix_solution(idx)` that mirrors
  `commit_solution()` — `fix(<workflow>): automated fix attempt N`
  header, the agent's summary, Resolved/Unresolved/Modified file
  sections, footer, then `add_selective_note(sha, ["solutions[N]"])`.

Behavior changes:

* Attempt counting now caps *applied* fixes only. Failed agent runs
  leave no solution behind, so they don't count toward `max_attempts`.
  This matches resolve's behavior (no per-attempt log there either).
* Re-handling the same `run_id` is now a no-op — the orchestrator
  looks up `note.get_ci_solution_for_run(run_id)` and bails early.
* The new resolve and human-synced solutions also get
  `type: "conflict_resolution"` for consistency.

The `ResolveHandler.execute` return type changes from `bool` to
`dict | None` so the orchestrator can wrap the agent's response with
CI-fix metadata before storing. `CommandHandler.execute` synthesizes a
solution dict from the post-command working-tree state so the recorded
shape stays uniform.

No back-compat for old `ci_fix_history` notes — feature branch only,
not yet shipped.
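
A sketch of how attempt counting and the per-run no-op could work on `solutions[]`, assuming each entry is a plain dict with the `type` and `request` keys described above. The function names and return tokens are illustrative.

```python
def ci_attempts(solutions, workflow):
    """Count applied CI fixes recorded for one workflow."""
    return sum(
        1 for s in solutions
        if s.get("type") == "ci_fix"
        and s.get("request", {}).get("workflow") == workflow
    )


def ci_solution_for_run(solutions, run_id):
    """Return the index of the ci_fix solution for this run, or None."""
    for index, s in enumerate(solutions):
        if s.get("type") == "ci_fix" and s.get("request", {}).get("run_id") == run_id:
            return index
    return None


def next_step(solutions, workflow, run_id, max_attempts):
    """Decide what the orchestrator should do for a run."""
    if ci_solution_for_run(solutions, run_id) is not None:
        return "already-handled"
    if ci_attempts(solutions, workflow) >= max_attempts:
        return "give-up"
    return "attempt"
```

Because only stored solutions are counted, a failed agent run leaves the count unchanged, which matches the "applied fixes only" rule.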
Lists recent workflow_runs on a branch (defaulting to the current
branch) and tells you, for each configured workflow, whether mergai
has already applied a fix or — if not — what action mergai would take
if the run was handled now.

Status column:
  applied   — note has a `type: ci_fix` solution with matching run_id
              (notes column shows the solutions[N] index and attempt #)
  pending   — would act on this run (failure: artifact + log fallback;
              success + code_scanning_check + findings present)
  skip      — would not act (passing run with no opt-in, conclusion
              not actionable, workflow not enabled, head_branch
              outside mergai/*)

For passing runs whose workflow has `code_scanning_check`, the command
queries Code Scanning to determine the *actual* finding count rather
than just saying "would check"; this is one extra API call per such
run. Skip the lookup with `--no-check-findings`.

Useful for diagnosing why mergai isn't reacting to a run, and for
spot-checking the bare `mergai ci handle` (step C) before letting it
loose on a backlog.
Replaces `--run-id <id>` with a positional `TARGET` arg that
auto-disambiguates:

  * a numeric run ID — process that specific run (the GitHub Actions
    invocation: `mergai ci handle ${{ github.event.workflow_run.id }}`)
  * "all"            — process every unprocessed actionable run on the
                       current branch, newest-first
  * a workflow name  — like "all" but filtered to one workflow

Manual catch-up was the missing case. After landing a series of fixes
or pausing the bot, you'd want to walk the backlog without writing a
shell loop around the run IDs. `mergai ci list` shows what's pending;
`mergai ci handle all` (or `mergai ci handle clang-tidy`) processes
them.
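
The `TARGET` disambiguation can be sketched as a classifier. The tuple return shape is an assumption made for the example.

```python
def classify_target(target: str):
    """Map the positional TARGET to (kind, value)."""
    # ASCII digits only, so int() is guaranteed to succeed.
    if target.isascii() and target.isdigit():
        return ("run", int(target))
    if target == "all":
        return ("all", None)
    return ("workflow", target)
```

One consequence of this ordering is that a workflow literally named `all`, or named with digits only, could not be selected by name.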

The body of the per-run flow is hoisted out of `handle` into
`_handle_one_run` so the iteration is a thin loop. Resolution lives in
`_resolve_target_runs` which mirrors the dispatch decision from
`build_workflow_context_for_run` to filter the listing — same
"actionable" rules, just applied up front so we don't even build a
context for runs we'd skip.

Per-workflow `max_attempts` is enforced inside `_handle_one_run`, so
processing N pending runs naturally stops at the cap and posts the
"giving up" PR comment.
The rich.Table output used Unicode box-drawing characters which look
fine in interactive terminals but garble in log-aggregators, CI
output, and any TTY without a font that ships them. Replace with a
small `_format_ascii_table` helper: header row, dashed rule, padded
columns separated by two spaces. No box. No Unicode.

Also drop the `→` arrow in the failure-path notes string.
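
A plain-ASCII table formatter along these lines is short. This sketch assumes every row has as many cells as there are headers; it is not the project's `_format_ascii_table`.

```python
def format_ascii_table(headers, rows):
    """Render a header row, a dashed rule, and padded columns."""
    table = [[str(cell) for cell in headers]]
    table += [[str(cell) for cell in row] for row in rows]
    widths = [max(len(row[i]) for row in table) for i in range(len(headers))]

    def fmt(row):
        # Two spaces between columns; trailing padding is stripped.
        return "  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()

    rule = "  ".join("-" * w for w in widths)
    return "\n".join([fmt(table[0]), rule] + [fmt(row) for row in table[1:]])
```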
Don't process runs whose head_sha isn't the current branch HEAD.
Findings on a superseded or force-pushed commit can reference code
that no longer exists, and committing a fix on the wrong base is
worse than no fix.

  * `_run_head_status(app, run)` classifies a run's head commit as
    `current` (== HEAD), `superseded` (ancestor of HEAD but not
    equal — newer commits exist), or `obsolete` (not reachable from
    HEAD; typically force-pushed away).
  * `_run_is_actionable` requires `current`, so
    `mergai ci handle all` and `mergai ci handle <workflow>` only
    pick up runs that match the current state.
  * `_handle_one_run` rejects stale runs even when an explicit
    run-id is passed, with a one-line message explaining why.
  * `mergai prompt ci` is unaffected — inspecting the prompt for an
    older run is still useful for debugging and never mutates state.
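
The classification can be sketched with the ancestry test passed in as a callable, so the logic is testable without a repository. `git_is_ancestor` shows one way to back it with `git merge-base --is-ancestor`; both function names are hypothetical.

```python
import subprocess


def git_is_ancestor(ancestor: str, descendant: str, cwd: str = ".") -> bool:
    """True if `ancestor` is reachable from `descendant` (exit code 0)."""
    result = subprocess.run(
        ["git", "merge-base", "--is-ancestor", ancestor, descendant],
        cwd=cwd,
        capture_output=True,
    )
    return result.returncode == 0


def run_head_status(run_sha: str, head_sha: str, is_ancestor=git_is_ancestor) -> str:
    """Classify a run's head commit relative to the branch HEAD."""
    if run_sha == head_sha:
        return "current"
    if is_ancestor(run_sha, head_sha):
        return "superseded"  # newer commits exist on top of it
    return "obsolete"  # not reachable, typically force-pushed away
```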

`mergai ci list` improvements:

  * Filter `obsolete` runs out of the listing entirely. Force-pushed
    branches accumulate runs in GitHub's history under the branch
    name; showing them as "head_sha not reachable" is just noise.
    `superseded` runs stay (still relevant history of the branch).
  * Add a `wait` status for runs that haven't completed yet, with a
    note like `still in_progress` instead of the cryptic
    `conclusion 'None'`. Distinguishes "ignore forever" (skip) from
    "will be checked once it finishes" (wait).
The note column hardcoded "failure -> artifact + log fallback" for
every failed run, but only the SARIF builder actually has a log
fallback when the artifact is missing — the diff builder (used by
the format workflow) does not. Reading "log fallback" on a format
row was misleading.

Now branches on `config.context.type`:

  * sarif -> "failure -> SARIF artifact (log fallback if missing)"
  * diff  -> "failure -> diff artifact"
  * else  -> "failure -> <type> artifact"
Two related changes to `mergai prompt ci`:

1. Replace the required `--run-id <id>` option with a positional
   `TARGET` arg that mirrors `mergai ci handle`:
     * a numeric run ID — print that run's prompt
     * "all"            — print prompts for every unprocessed
                          actionable run on the current branch
     * a workflow name  — like "all" but filtered to that workflow
   Reuses `ci.py`'s `_resolve_target_runs` so the actionable filter
   matches what `ci handle` would actually process. Specific run-ids
   stay exempt from the staleness gate.

2. Split `build_ci_fix_prompt` into `build_ci_fix_preamble` (system
   prompt + project invariants + ci_fix_context.md description) and
   `build_ci_fix_run_section` (the per-run JSON block). Single-run
   output is unchanged — `build_ci_fix_prompt` still returns
   preamble + section. Multi-run output emits the preamble once and
   appends one section per run with a `## Run <id> — <workflow>`
   heading, so `mergai prompt ci all` doesn't repeat the system
   prompt for every run.
`fix` reads as a more concrete action than the generic `handle` and
fits both action types — `command` workflows (e.g. format running
`bazel run format`) and `resolve` workflows (e.g. clang-tidy invoking
the AI agent) both produce a fix and a `type: ci_fix` solution. No
behavior change; the function `_handle_one_run` is renamed to
`_fix_one_run` to match.

No deprecated alias — same policy as the earlier `prompt` rename.

The psmdb-mergai workflow YAML and `.mergai/config.yml` need a
matching one-line update each (out of tree).
When `mergai ci fix all` (or `fix <workflow>`) processes multiple
runs, applying a fix for run #1 commits something on top of HEAD —
which then disqualifies run #2 from the per-iteration head_sha check
even though both runs were originally valid against the same commit
at the start of the loop. The user sees:

    Found 2 unprocessed actionable run(s) for target 'all'.
    Handling 'format' run 25095788502 (attempt 1 of 3). ...
    Run 25095788526 (edfd91c) is superseded; skipping.

That second skip is mergai self-disqualifying.

The fix: gate the staleness check at the *entry*, not the per-run
step. `_resolve_target_runs` already filters by `_run_is_actionable`
(which includes the staleness check), so for `all` /
workflow-name targets the runs are vetted up front. The loop trusts
the resolved list and processes each run regardless of whether our
own commits have moved HEAD in the meantime.

Explicit run-id targets still get the staleness gate — the user may
have pasted an old run-id and we shouldn't act on it. Threaded through
`_fix_one_run` as a `check_staleness` kwarg, defaulting to True so
behavior at the API level is unchanged.
Two new selection modes for `mergai context drop solution`, on top of
the existing default (uncommitted) and `--all`:

  * `--index N` (repeatable) — drop the solution at the given index.
    Useful when you've inspected the solutions in note.json and want
    to remove a specific one rather than all-or-nothing. Remaining
    solutions are compacted, so a solution that was at index 5 may
    now be at index 3 — the help text says so explicitly.

  * `--orphaned` — drop solutions whose commit SHA is no longer
    reachable from the current HEAD. The intended use is after
    `git reset --hard HEAD~1` or a force-push that removed a fix
    commit: the solution stays recorded in the note even though the
    code change is gone, and `--orphaned` cleans it up.

The flags are mutually exclusive (validated up front with a
UsageError); they only apply to `solution`, not to other parts.

Implementation:

  * `MergaiNote.drop_solutions_at_indices(indices)` removes specific
    indices and rewrites `note_index` references to track the
    compacted positions, so future `add_selective_note` calls keep
    pointing at the right solution. Note-index entries whose only
    field referenced a dropped solution are removed entirely.

  * `MergaiNote.find_orphaned_solution_indices(repo)` collects every
    commit SHA associated with each solution (from
    `solution.commit_sha` for human-synced entries, plus matching
    `note_index` entries) and returns the indices whose SHAs aren't
    reachable from HEAD. Solutions with no associated commit at all
    are left alone (they're uncommitted and have nothing to verify).

  * `_sha_reachable_from_head(repo, sha)` is a small helper using
    `git merge-base --is-ancestor`, mirroring the staleness check in
    `mergai/commands/ci.py`.

`git revert` (which creates a new commit undoing the change) leaves
the original commit reachable from HEAD, so `--orphaned` won't
catch it — the help text recommends `--index N` for that case.
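
Index compaction is the subtle part of `--index N`. The sketch below drops entries and returns an old-to-new index map that a caller could use to rewrite references; how `note_index` entries are actually stored is not shown here, and the function name is hypothetical.

```python
def drop_at_indices(solutions, indices):
    """Return (kept_solutions, remap) where remap maps old index -> new index."""
    drop = set(indices)
    out_of_range = sorted(i for i in drop if not 0 <= i < len(solutions))
    if out_of_range:
        raise IndexError(f"no solution at index {out_of_range}")
    kept, remap = [], {}
    for old_index, solution in enumerate(solutions):
        if old_index in drop:
            continue
        remap[old_index] = len(kept)  # position after compaction
        kept.append(solution)
    return kept, remap
```

Dropping indices 1 and 3 from six solutions moves the entry at index 5 to index 3, which is the renumbering the help text warns about.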
A real `ci fix` run on a clang-tidy failure produced this commit:

    fix(clang-tidy): automated fix attempt 1
    ...changed compile_commands.json from a hard source dependency to
    an optional glob in the compiledb genrules, allowing the build to
    proceed when the file doesn't exist yet.

That's a band-aid. The build was actually broken because of an
upstream merge bringing in code that referenced a target the
.auto_header generator hadn't produced — the fix is on the source
side, not by weakening BUILD.bazel deps. The agent didn't realize it
was on a post-merge branch because the prompt never told it.

Two changes:

1. **Embed the merge note in the CI-fix prompt.** A new `Merge
   Context` section in `build_ci_fix_preamble` (driven by the new
   `prompts/merge_context_for_ci_fix.md`) carries `merge_info`,
   `merge_context`, `conflict_context`, prior `solutions`,
   `pr_comments`, and `user_comment` from the note as JSON. Same
   serialization config that `build_resolve_prompt` uses, plus
   `solutions` so the agent sees what mergai already did on this
   branch (conflict resolution + prior CI fixes). `build_ci_fix_prompt`
   and `build_ci_fix_preamble` gain optional `note` /
   `prompt_config` parameters; `ResolveHandler.execute` and
   `mergai prompt ci` pass them when a note is loaded.

2. **Rewrite `prompts/system_prompt_ci_fix.md`** with an explicit
   "Diagnose first" section that names the three likely root-cause
   buckets for a freshly-merged branch (upstream change, Percona
   code, prior conflict resolution), tells the agent to use git +
   `mergai show <commit>` to inspect, and forbids build-system
   workarounds (weakening BUILD.bazel deps, replacing hard deps
   with optional globs, suppressing warnings) unless they're clearly
   the right fix and not a band-aid. The output-format contract
   (resolved / unresolved / modified / summary / review_notes) is
   unchanged but `summary` now asks the agent to name which bucket
   it identified.

Verified live against PR #205: the rendered prompt now includes
`merge_info.merge_commit = aea0c37039a`, the upstream
`merge_context.merged_commits`, the `conflict_context` files, and
the prior `conflict_resolution` solution with summary "Resolved
all merge conflicts preserving Percona-specific fea[tures]".
Old title was conventional-commits style:
  fix(clang-tidy): automated fix attempt 1
which doesn't fit the rest of the branch — `commit_solution` and
`commit_merge` use a sentence form ("Resolve conflicts for merge
commit '<sha>' into <target_branch>", "Merge commit '<sha>' into
<target_branch>"). A `git log` over a mergai branch should read in
one voice.

New title:
  Fix <workflow> failure for merge commit '<sha>' into <target_branch>

Body keeps the agent's wrapped summary plus Resolved / Unresolved /
Modified sections (same shape `commit_solution` produces) and gains
a `CI: <workflow> run <run_id> (attempt N of M)` trailer for
traceability — useful when bisecting which run a given fix
addressed.
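
As a rough sketch, the new title and trailer could be assembled like this. The Resolved / Unresolved / Modified sections are left out for brevity, and the 72-column wrap width is an assumption.

```python
import textwrap


def ci_fix_commit_message(workflow, merge_sha, target_branch, summary,
                          run_id, attempt, max_attempts):
    """Build a sentence-form title, wrapped summary, and CI trailer."""
    title = (
        f"Fix {workflow} failure for merge commit '{merge_sha}' "
        f"into {target_branch}"
    )
    body = textwrap.fill(summary, width=72)
    trailer = f"CI: {workflow} run {run_id} (attempt {attempt} of {max_attempts})"
    return "\n\n".join([title, body, trailer])
```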
@plebioda plebioda force-pushed the handling-workflows branch from f9b243f to 67f7e39 Compare May 22, 2026 16:42
plebioda and others added 4 commits May 26, 2026 11:43
Pulls the failing-job log fetch + excerpt out of SARIFContextBuilder
into a shared `_job_log` helper so other builders can reuse it
instead of reaching into a private SARIF method. SARIF behavior is
unchanged.

WorkflowContextConfig.artifact_name moves from `str | None` to
`list[str]`. YAML still accepts a bare string (back-compat for
format / clang-tidy); from_dict normalizes both forms and rejects
other types. Builders that take exactly one artifact (diff, sarif)
now read element zero — groundwork for multi-artifact builders.
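
The normalization might look like the sketch below. Treating a missing value as an empty list is an assumption of this example, as is the function name.

```python
def normalize_artifact_name(value):
    """Accept a bare string or a list of strings; always return a list."""
    if value is None:
        return []  # assumed: an unset field means "no artifacts"
    if isinstance(value, str):
        return [value]
    if isinstance(value, list) and all(isinstance(item, str) for item in value):
        return list(value)
    raise TypeError(
        f"artifact_name must be a string or a list of strings, got {value!r}"
    )
```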

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reads Bazel's Build Event Protocol stream + bazel-testlogs/ uploaded
as failure artifacts. Extracts aborted events, failed actions with
their stderr, and non-PASSED testResults — with each test's matching
test.log resolved from the //pkg:tgt label and included inline.
Falls back to the failing job's log when no BEP file is present, via
the shared `_job_log` helper.

Iterates context.artifact_name and uses the first directory that
exists, so one config entry can cover multi-job workflows where each
job uploads its own failure artifact (e.g. PSMDB's build-and-test:
build job uploads build-failure-artifacts, unittests job uploads
unittest-failure-artifacts, only one is present per failing run).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three flaws surfaced while debugging a botched build-and-test fix
against a psmdb merge branch.

1. The agent's "unable to fix in code" verdict (empty resolved +
   modified) was committed as an empty change that nonetheless burned
   a `max_attempts` slot. Now those verdicts land in a new
   `ci_diagnoses[]` list on the note instead of `solutions[]` — no
   commit, no slot consumed. Inspect / publish them with:

       mergai ci diagnosis list [--pending] [-v]
       mergai ci diagnosis post [<run_id>|all] [--dry-run] [--force]

   `ci fix all` and `ci list` skip runs that already have a recorded
   diagnosis so re-running is idempotent.

2. `download_workflow_run_artifacts` pulled stale entries from prior
   attempts of the same workflow run (GitHub returns artifacts
   cumulatively across attempts with no per-attempt endpoint). Skip
   anything whose `created_at` predates the run's
   `run_started_at` — the latest attempt's start. Also stream the
   download in 256 KB chunks with TTY-overwrite / per-10% non-TTY
   progress so a 75 MB artifact isn't silent.

3. The bazel context builder was embedding multi-MB `test.log` files
   under head+tail truncation that systematically dropped the actual
   failure line (the failing test sits in the middle of a 99-test
   suite). Drop the embedding entirely: save each failing job's full
   log to `<artifacts_dir>/_mergai_job_logs/`, point the agent at
   that path plus the BEP file via a short pointer-based `details`
   section, and surface `artifacts_dir` in the prompt JSON so the
   agent can navigate. No assumptions about Google Test or any
   MongoDB-specific layout.
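
The stale-artifact filter from item 2 can be sketched with plain dicts standing in for the API objects. The dict keys are assumptions; the comparison against the run's start time follows the description.

```python
from datetime import datetime


def current_attempt_artifacts(artifacts, run_started_at: datetime):
    """Keep artifacts created at or after the latest attempt's start.

    Each artifact is a dict with at least a `created_at` datetime.
    """
    return [a for a in artifacts if a["created_at"] >= run_started_at]
```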

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously only the "unable to fix" verdict produced text a human could
post (the diagnosis flow); a successful fix committed silently. Unify
both outcomes under one concept so every attempt leaves an explanation.

- Rename `ci_diagnoses` -> `ci_comments` (field + methods). Each entry
  carries `outcome` ("fixed" | "unfixable") and a `commit_sha` (set for
  fixed, None otherwise), embedding the agent response so the renderer
  is self-contained.
- `_fix_one_run` now records a ci_comment on both branches: after
  committing a fix, and for the no-change verdict as before.
- Rename the `ci diagnosis` command group to `ci comment`; the renderer
  branches on outcome ("auto-fixed" vs "unable to auto-fix").
- `ci list` surfaces the comment's posted state alongside the fix.

Records live only in the cache note, so `ci comment post` must run in
the same CI job as `ci fix` (unchanged from the diagnosis flow).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
plebioda and others added 8 commits May 27, 2026 12:22
`ci list` crashed with `IndexError: list index out of range` when run in
the seconds after a push. PyGithub's `_Slice` (used by `runs[:limit]`)
indexes by position and trusts the pagination `Link` header via
`_isBiggerThan`, so when GitHub reports more runs than a page actually
returns — common while new runs are still being created — it overruns
`__elements`.

Add `_take_workflow_runs`, which iterates the PaginatedList directly
(the `__iter__` path never indexes by position) and stops at the limit,
and use it at both slice sites. Degrades to a short list instead of
crashing.
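
Iterating and stopping at a limit needs only `itertools.islice`. This is a generic sketch of the idea behind `_take_workflow_runs`, written against any iterable rather than a paginated list.

```python
from itertools import islice


def take(iterable, limit: int) -> list:
    """Return at most `limit` items by iterating, never by indexing."""
    return list(islice(iterable, limit))
```

If the underlying sequence yields fewer items than the limit, the result is simply shorter.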

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The skip note for a green run leaked the internal config field name
("passed; code_scanning_check not enabled"), which read as jargon.

A passing run is only actionable when the workflow opts into a Code
Scanning check, so:
- not opted in        -> bare "passed" (green, nothing to do)
- opted in            -> every note names "Code Scanning" explicitly,
                         so it's clear why that workflow is treated
                         differently.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`_collect_commits_for_squash` walked all ancestors of HEAD via bare
`iter_commits()` and stopped at the first one matching `target_branch_sha`
in date order. For a mergai merge commit (two parents: prior target tip
and the upstream merge commit), the walk also descended into the second
parent and collected upstream commits dated after `target_branch_sha`
before ever reaching it — producing dozens to hundreds of "commits to
squash" instead of just the merge plus solution commits.

Switch to `iter_commits(target..HEAD, first_parent=True)` and validate
ancestry up front via `git merge-base --is-ancestor`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The workflow needs a single machine-readable answer to "is the whole
watched set done for this commit, and did it pass?" before deciding to
squash, fix, or wait. `ci list` is per-run and human-oriented; this adds
an aggregate gate.

`ci status` takes the latest run of each watched workflow whose head_sha
is the branch HEAD (ignoring superseded/obsolete runs) and reduces them
to one token: in-progress | success | failure | none. `--state` prints
just the token for shell capture (STATE=$(mergai ci status --state));
exit code is always 0. Reuses `_take_workflow_runs` / `_run_head_status`
and the configured workflow set.
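
One possible reduction to a single token is sketched below. The precedence is an assumption of this example: any unfinished run yields `in-progress`, and any finished run that is not a success counts as `failure`.

```python
def reduce_ci_state(runs):
    """Reduce the latest run per watched workflow to one token.

    `runs` is a list of (status, conclusion) pairs for runs at HEAD.
    """
    if not runs:
        return "none"
    if any(status != "completed" for status, _ in runs):
        return "in-progress"
    if all(conclusion == "success" for _, conclusion in runs):
        return "success"
    return "failure"
```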

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`os.getenv("GITHUB_TOKEN")` returns "" when the env var is set but empty,
which `is not None`, so the function returned the empty string and never
tried GH_TOKEN or `gh auth token`. github.Github("") is then falsy, so
app.gh became None -> "GitHub token not found", even when a usable token
was available from the next source.

Test truthiness instead, so an empty value falls through.
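
A sketch of the truthiness-based fallback. The environment is passed in as a mapping and the last-resort source as a callable, both choices made for this example so it stays testable.

```python
def resolve_github_token(env, fallback=None):
    """Return the first non-empty token from the known sources."""
    for name in ("GITHUB_TOKEN", "GH_TOKEN"):
        value = env.get(name)
        if value:  # an empty string falls through to the next source
            return value
    # e.g. a callable that shells out to `gh auth token`
    return fallback() if fallback is not None else None
```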

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…finalizes

The squash finalize message was regenerated from the live first-parent
walk each time, so re-finalizing collapsed a prior squash into a single
line and dropped its modified files. Persist the squashed-commit record
in the note and render the message from the note instead.

- MergaiNote gains an accumulating `squashed_commits` field (sha + first
  line), wired through from_dict/to_dict/note_index/selective notes.
- squash_to_merge expands any prior-squash entries and appends leaf
  commits, so the "Squashed commits:" list grows across finalizes.
- "Modified:" is now sourced from solutions' resolved+modified files
  (minus conflict files) rather than a per-commit git diff, so it
  accumulates too and includes ci-fix changes.
- Validate the squash range: only the HEAD PR merge commit may be
  noteless; a noteless single-parent commit fails with a hint to run
  `commit sync`, and a stray noteless merge is reported.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A clean (textual) merge can still produce a broken result - failing
build/tests - i.e. a semantic conflict. These fixes need to be reviewed
before being squashed into the merge commit, but there was no branch/PR
kind for them: fixes landed directly on the main branch and got squashed
in, making them unreviewable, and squashing an open solution PR destroyed
the conflict-resolution diff under review.

Introduce a `semantic` kind, parallel to `solution` but for semantic
conflicts:

- BranchType.SEMANTIC + semantic_branch naming (auto-wires `branch
  create/delete/push/switch semantic`).
- pr.semantic config (title/labels) + PRTitleBuilder.semantic_title.
- `mergai pr create semantic`: head = semantic branch, base = main
  branch, so the fixes are reviewed against the merge commit and finalize
  later squashes them in.

The CLI is generic; the workflow wiring (route fixes to a semantic
branch/PR, stop squashing open review branches) lives downstream.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>