
CCIP-10030 CCIP-10029 Retrying failed jobs CLI #962

Open
mateusz-sekara wants to merge 5 commits into main from retrying-failed-jobs-cli


@mateusz-sekara
Collaborator

No description provided.

@mateusz-sekara mateusz-sekara changed the title Retrying failed jobs CLI CCIP-10030 CCIP-10029 Retrying failed jobs CLI Mar 23, 2026
@mateusz-sekara mateusz-sekara marked this pull request as ready for review March 23, 2026 12:13
@mateusz-sekara mateusz-sekara requested review from a team and skudasov as code owners March 23, 2026 12:13
Copilot AI review requested due to automatic review settings March 23, 2026 12:13
Contributor

Copilot AI left a comment

Pull request overview

Adds a new ccv job-queue CLI surface to inspect and retry permanently failed verifier jobs stored in Postgres archive tables, complementing the existing ccv chain-statuses CLI.

Changes:

  • Adds job-queue CLI commands (list, reschedule) with parsing/rendering helpers and unit tests.
  • Implements a Postgres-backed Store for listing failed archived jobs and rescheduling them back into active queues.
  • Adds an E2E smoke test for the new CLI and wires it into the smoke workflow matrix.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| verifier/cmd/run_ccv_cli.go | Registers the new ccv job-queue subcommand and lazily constructs DB-backed deps. |
| cli/jobqueue/store.go | Defines queue types, archived job model, and Store interface for CLI operations. |
| cli/jobqueue/postgres_store.go | Postgres implementation for listing failed jobs and rescheduling from archive to active tables. |
| cli/jobqueue/commands.go | Implements list and reschedule CLI commands, flag parsing, and table output. |
| cli/jobqueue/commands_test.go | Unit tests for command behavior, parsing, and output rendering. |
| cli/jobqueue/mocks/mock_Store.go | Mockery-generated mock for the Store interface used in unit tests. |
| build/devenv/tests/e2e/smoke_jobqueue_cli_test.go | E2E smoke test that seeds an archived failed job and verifies list/reschedule behavior. |
| .mockery.yaml | Adds mockery config for generating cli/jobqueue mocks. |
| .github/workflows/test-smoke.yaml | Adds a new smoke-test matrix entry intended to run the job-queue E2E test. |

args = append(args, ownerID)
}

query += " ORDER BY created_at DESC"
Copilot AI Mar 23, 2026

ListFailed orders archive rows by created_at DESC, but the archive tables are indexed by completed_at and the CLI renders this as the archive timestamp. Ordering by completed_at DESC would better match the semantics (most recently failed first) and align with existing indexes to avoid slow sorts on large archives.

Suggested change
- query += " ORDER BY created_at DESC"
+ query += " ORDER BY completed_at DESC"

Comment on lines +213 to +216
archivedAt := "-"
if j.ArchivedAt != nil {
archivedAt = j.ArchivedAt.Format("2006-01-02T15:04:05Z")
}
Copilot AI Mar 23, 2026

The time formatting layout 2006-01-02T15:04:05Z hardcodes a literal Z suffix even if the time.Time value isn’t UTC, which can misrepresent timestamps coming back from Postgres. Consider formatting as RFC3339 with offset (e.g., time.RFC3339 / 2006-01-02T15:04:05Z07:00) or converting to UTC before formatting; cli/chainstatuses/commands.go already uses Z07:00 for this reason.

…the insert

ON CONFLICT DO NOTHING would delete the archive row but leave the active table
unchanged, losing the job with no trace. Removing the clause lets PostgreSQL
surface a unique-constraint violation so the operator can fix the state manually.

Made-with: Cursor
@github-actions

Code coverage report:

| Package | main | retrying-failed-jobs-cli | diff |
| --- | --- | --- | --- |
| github.com/smartcontractkit/chainlink-ccv/aggregator | 47.78% | 47.79% | +0.01% |
| github.com/smartcontractkit/chainlink-ccv/bootstrap | 42.35% | 42.35% | +0.00% |
| github.com/smartcontractkit/chainlink-ccv/cli | 86.39% | 65.13% | -21.26% |
| github.com/smartcontractkit/chainlink-ccv/cmd | 0.00% | 0.00% | +0.00% |
| github.com/smartcontractkit/chainlink-ccv/common | 50.74% | 50.74% | +0.00% |
| github.com/smartcontractkit/chainlink-ccv/executor | 46.42% | 46.42% | +0.00% |
| github.com/smartcontractkit/chainlink-ccv/indexer | 42.74% | 42.74% | +0.00% |
| github.com/smartcontractkit/chainlink-ccv/integration | 45.27% | 45.38% | +0.11% |
| github.com/smartcontractkit/chainlink-ccv/pkg | 100.00% | 100.00% | +0.00% |
| github.com/smartcontractkit/chainlink-ccv/pricer | 0.00% | 0.00% | +0.00% |
| github.com/smartcontractkit/chainlink-ccv/protocol | 68.69% | 68.69% | +0.00% |
| github.com/smartcontractkit/chainlink-ccv/verifier | 33.10% | 33.10% | +0.00% |
