linuxptp-daemon CI Optimization#76532

Open
yaronh12 wants to merge 1 commit into openshift:main from yaronh12:linuxptpdaemon-ci-optimization
@yaronh12
Contributor

Jira: CNF-22511
Related: CNF-17167 (cloud-event-proxy CI optimization)

Problem

The e2e-aws presubmit job for openshift/linuxptp-daemon takes almost 3 hours to complete, according to its job history.

Time Breakdown (from build #2033906116655583232)

| Phase | Duration | % of Total |
| --- | --- | --- |
| Image build + release creation | ~14 min | 8% |
| IPI cluster install (ipi-install-install) | 43m58s | 26% |
| Other pre-install steps | ~4 min | 3% |
| OpenShift e2e conformance suite (openshift-e2e-test) | 1h29m45s | 53% |
| Post (gather artifacts + IPI teardown) | 16m49s | 10% |
| Total | 2h49m05s | 100% |
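
For anyone who wants to re-check the percentages, here is a minimal sketch that recomputes each phase's share of the total from the durations listed above. The helper names are illustrative and not part of any CI tooling. Because two rows are approximate ("~14 min", "~4 min"), the recomputed shares can differ from the table by a point; the ~4 min row, for instance, works out to roughly 2%.

```python
import re

def parse_duration(text: str) -> int:
    """Convert a duration such as '1h29m45s', '43m58s' or '~14 min' to seconds."""
    text = text.replace("~", "").replace(" ", "").replace("min", "m")
    units = {"h": 3600, "m": 60, "s": 1}
    return sum(int(n) * units[u] for n, u in re.findall(r"(\d+)([hms])", text))

# Durations copied from the time breakdown; rows marked "~" are approximate.
PHASES = {
    "Image build + release creation": "~14 min",
    "IPI cluster install": "43m58s",
    "Other pre-install steps": "~4 min",
    "OpenShift e2e conformance suite": "1h29m45s",
    "Post (gather artifacts + IPI teardown)": "16m49s",
}
TOTAL_SECONDS = parse_duration("2h49m05s")

def shares(phases: dict, total_seconds: int) -> dict:
    """Return each phase's share of the total, rounded to a whole percent."""
    return {
        name: round(100 * parse_duration(duration) / total_seconds)
        for name, duration in phases.items()
    }

if __name__ == "__main__":
    for name, pct in shares(PHASES, TOTAL_SECONDS).items():
        print(f"{name}: {pct}%")
```

The conformance suite alone accounts for more than half of the job, which is the figure the rest of the argument depends on.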

Root Cause

The e2e-aws test runs the full OpenShift parallel conformance suite (openshift-e2e-test) — the same generic platform test suite that core components like apiserver and kubelet use. This test:

  • Does NOT deploy or test linuxptp-daemon in any PTP-specific way
  • Does NOT exercise PTP functionality (clock synchronization, ptp4l, phc2sys)
  • Tests generic platform behavior (networking, storage, APIs, scheduling, etc.)
  • Requires provisioning a brand new AWS cluster via IPI for every job run

The built ptp container image is included in the release payload, but linuxptp-daemon is an optional component — it is never deployed on a standard cluster unless the ptp-operator explicitly installs it. The conformance suite does not interact with the ptp image at all. (example build log)

Solution

Remove the e2e-aws test entirely from all linuxptp-daemon CI configurations (main + all release branches).

Why Removal Instead of ClusterClaim Optimization

I initially considered switching from IPI cluster provisioning to Hive ClusterClaim (pre-provisioned clusters), as was done for cloud-event-proxy in PR #74797. However, the situations are different:

| Aspect | cloud-event-proxy | linuxptp-daemon |
| --- | --- | --- |
| Custom test step | Yes: make functests (~2.5 min) | None: runs generic conformance (~1h30m) |
| Tests own component | Yes: deploys and tests cloud-event-proxy | No: tests the entire OCP platform |
| ClusterClaim savings | 95% (1-2h → ~15 min) | Only ~38% (3h → ~1h45m) |

Switching to ClusterClaim would save ~1 hour (eliminating IPI install/teardown), but the conformance suite itself would still take 1.5 hours — and it's not testing anything relevant to linuxptp-daemon.
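
The ~38% figure can be reproduced with a back-of-the-envelope calculation. The sketch below assumes that ClusterClaim removes roughly 48 minutes of installation (the IPI install plus the other pre-install steps from the time breakdown) and roughly 17 minutes of teardown, and that everything else stays the same. Those are simplifying assumptions for illustration, not measurements of a ClusterClaim run.

```python
def clusterclaim_estimate(total_s: int, install_s: int, teardown_s: int) -> tuple:
    """Return (remaining seconds, percent saved) if install and teardown disappear."""
    saved_s = install_s + teardown_s
    return total_s - saved_s, round(100 * saved_s / total_s)

def fmt(seconds: int) -> str:
    """Format seconds as e.g. '1h44m' (leftover seconds are dropped)."""
    hours, rem = divmod(seconds, 3600)
    return f"{hours}h{rem // 60:02d}m"

TOTAL_S = 2 * 3600 + 49 * 60 + 5   # 2h49m05s, from the time breakdown
INSTALL_S = 48 * 60                # assumed: ~44 min IPI install + ~4 min pre-install
TEARDOWN_S = 17 * 60               # assumed: the whole post phase goes away

if __name__ == "__main__":
    remaining, saved_pct = clusterclaim_estimate(TOTAL_S, INSTALL_S, TEARDOWN_S)
    print(f"remaining: {fmt(remaining)}, saved: {saved_pct}%")
```

Under these assumptions the job would still run for about 1h45m, almost all of it spent in the conformance suite.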

Comparable Repos With No Cluster-Based Tests At All

Several CNF/networking/hardware component repos — including ones from the same PTP ecosystem — run only container-based tests (unit tests, image builds, formatting checks) with no cluster provisioning whatsoever:

| Repository | Domain | Presubmit Tests |
| --- | --- | --- |
| openshift/pf-status-relay | PTP/SRIOV relay | unit, verify-deps |
| redhat-cne/hw-event-proxy | Hardware event proxy (PTP/CNF) | image build only |
| openshift/sriov-network-metrics-exporter | SRIOV networking | unit, security, verify-deps |
| openshift/node-feature-discovery | Hardware discovery | unit, verify, verify-deps |
| openshift/rdma-cni | RDMA CNI | security, verify-deps |

pf-status-relay and hw-event-proxy are from the same PTP/CNF ecosystem as linuxptp-daemon and follow the same pattern this PR adopts: container-based presubmit tests only, with integration testing delegated to the operator-level CI and periodic telco5g pipelines.

What Tests Remain

After this change, linuxptp-daemon PRs still run these presubmit tests:

| Test | What It Does | Duration |
| --- | --- | --- |
| unit-test | make test: Go unit tests with coverage | ~minutes |
| gofmt | make fmt: code formatting validation | ~seconds |
| images | Builds the ptp container image from Dockerfile | ~minutes |
| verify-deps | Validates Go module dependencies | ~minutes |
| security | Snyk vulnerability scan (optional) | ~minutes |

Estimated total presubmit time: ~5-10 minutes (down from ~3 hours).

Integration Testing Continues Elsewhere

The removal of the generic conformance suite does not leave linuxptp-daemon untested in a cluster context:

  1. ptp-operator operator-e2e — On ptp-operator PRs, the operator-e2e test deploys the ptp-operator (which includes linuxptp-daemon as a DaemonSet) via OLM and runs validation tests. The ptp-operator CI config pulls linuxptp-daemon as a base_image and substitutes the built image into the operator bundle.

  2. Release gating — The OpenShift release controller runs the full conformance suite as part of release acceptance. If a linuxptp-daemon image in the release payload somehow broke the platform, it would be caught at the release gate level.

  3. telco5g periodic CI — The full PTP conformance tests (make functests from openshift/ptp-operator) run periodically on bare-metal clusters with PTP-capable hardware. These tests exercise actual clock synchronization and are the only tests that meaningfully validate PTP daemon behavior.

Changes Made

  • Removed the e2e-aws test definition from all 23 CI config files across every branch (main, release-4.3 through release-5.0) in ci-operator/config/openshift/linuxptp-daemon/
  • Regenerated Prow job files in ci-operator/jobs/openshift/linuxptp-daemon/ via make ci-operator-prowgen
  • All other presubmit tests (unit-test, gofmt, images, security, verify-deps) remain unchanged
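
To illustrate the shape of the change, the sketch below drops one named entry from the tests list of a ci-operator config that has already been loaded into a Python dict. The example config is hypothetical and heavily trimmed (real files carry many more fields), and the PR itself edited the YAML files and regenerated the jobs with make ci-operator-prowgen; nothing here is the tooling that was actually used.

```python
from copy import deepcopy

def remove_test(config: dict, name: str) -> dict:
    """Return a copy of the config without the test whose `as` field equals name."""
    result = deepcopy(config)
    result["tests"] = [t for t in result.get("tests", []) if t.get("as") != name]
    return result

# Hypothetical, trimmed-down config used only for illustration.
EXAMPLE_CONFIG = {
    "tests": [
        {"as": "unit-test", "commands": "make test", "container": {"from": "src"}},
        {"as": "gofmt", "commands": "make fmt", "container": {"from": "src"}},
        {"as": "e2e-aws", "steps": {"cluster_profile": "aws",
                                    "workflow": "openshift-e2e-aws"}},
    ]
}

if __name__ == "__main__":
    trimmed = remove_test(EXAMPLE_CONFIG, "e2e-aws")
    print([t["as"] for t in trimmed["tests"]])
```

The function returns a copy instead of mutating its input, so the same source dict can be reused when processing several branch configs.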

Pros and Cons

Pros

  • ~3 hours saved per PR — presubmit drops from ~3h to ~5-10 min
  • Reduces CI resource consumption — eliminates AWS cluster provisioning (IPI) and 1.5 hours of e2e test compute per job run, across 23 branches
  • Aligns with community practice — comparable optional operator repos do not run the generic conformance suite
  • No loss of meaningful test coverage — the conformance suite was not testing PTP functionality; actual PTP testing happens in ptp-operator and telco5g CI
  • Low risk of PTP regressions going undetected: unit tests, image builds, ptp-operator integration tests, and release gating all continue

Cons

  • Loses generic platform conformance signal — if a linuxptp-daemon change somehow broke the broader OpenShift platform (extremely unlikely for a userspace PTP daemon), it would not be caught at presubmit time. It would instead be caught by the release controller's gating jobs.
  • Relies on ptp-operator CI for integration testing — if ptp-operator's operator-e2e test is broken or disabled, there is no presubmit cluster-level test for linuxptp-daemon. This is a reasonable tradeoff since the ptp-operator test is actively maintained and tests the actual component deployment.

Alternative: Keep the Conformance Suite on ClusterClaim

If reviewers prefer to keep the full OpenShift conformance suite running for linuxptp-daemon, it can be switched from IPI to ClusterClaim instead of being removed. This would use a pre-provisioned cluster from a Hive pool, eliminating the ~48 min of cluster installation (IPI install plus the other pre-install steps) and the ~17 min teardown while still running the same openshift-e2e-test step. Job time would drop from ~3 hours to ~1h45m, a meaningful improvement.
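
A rough sketch of what that switch might look like, again operating on a test entry loaded into a Python dict. The cluster_claim field names, the generic-claim workflow name, and all values below are assumptions recalled from the ci-operator documentation and must be checked against the current docs and the available Hive pools before use; the version string is a placeholder.

```python
from copy import deepcopy

def to_cluster_claim(test: dict, version: str) -> dict:
    """Return a copy of an IPI-based test entry rewritten to claim a pooled cluster."""
    claimed = deepcopy(test)
    steps = claimed.setdefault("steps", {})
    steps.pop("cluster_profile", None)               # no per-job cloud provisioning
    steps["workflow"] = "generic-claim"              # assumed claim workflow name
    steps["test"] = [{"ref": "openshift-e2e-test"}]  # keep the same conformance step
    claimed["cluster_claim"] = {                     # assumed field names and values
        "architecture": "amd64",
        "cloud": "aws",
        "owner": "openshift-ci",
        "product": "ocp",
        "timeout": "1h0m0s",
        "version": version,
    }
    return claimed

# Hypothetical, trimmed-down form of the existing IPI-based entry.
IPI_TEST = {
    "as": "e2e-aws",
    "steps": {"cluster_profile": "aws", "workflow": "openshift-e2e-aws"},
}

if __name__ == "__main__":
    print(to_cluster_claim(IPI_TEST, version="4.x"))  # "4.x" is a placeholder
```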

@yaronh12 changed the title from "linuxptp-daemon CI Optimization: Removing the Generic e2e Conformance Suite" to "linuxptp-daemon CI Optimization" on Mar 19, 2026
@openshift-ci-robot added the rehearsals-ack label (signifies that rehearsal jobs have been acknowledged) on Mar 19, 2026
@openshift-ci-robot
Contributor

[REHEARSALNOTIFIER]
@yaronh12: no rehearsable tests are affected by this change

Note: If this PR includes changes to step registry files (ci-operator/step-registry/) and you expected jobs to be found, try rebasing your PR onto the base branch. This helps pj-rehearse accurately detect changes when the base branch has moved forward.

@openshift-ci bot requested review from josephdrichard and jzding on Mar 19, 2026 10:36
@openshift-ci bot added the needs-ok-to-test label (indicates a PR that requires an org member to verify it is safe to test) on Mar 19, 2026
@openshift-ci
Contributor

openshift-ci bot commented Mar 19, 2026

Hi @yaronh12. Thanks for your PR.

I'm waiting for an openshift member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-ci
Contributor

openshift-ci bot commented Mar 19, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: yaronh12
Once this PR has been reviewed and has the lgtm label, please assign jzding for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci
Contributor

openshift-ci bot commented Mar 19, 2026

@yaronh12: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/ok-to-test

@shirmoran
Contributor

/ok-to-test

@openshift-ci bot added the ok-to-test label (indicates a non-member PR verified by an org member that is safe to test) and removed the needs-ok-to-test label (indicates a PR that requires an org member to verify it is safe to test) on Mar 19, 2026
@yaronh12
Contributor Author

/cc edcdavid

@openshift-ci bot requested a review from edcdavid on Mar 19, 2026 13:07
@openshift-ci
Contributor

openshift-ci bot commented Mar 19, 2026

@yaronh12: all tests passed!

Full PR test history. Your PR dashboard.


Labels

ok-to-test: Indicates a non-member PR verified by an org member that is safe to test.
rehearsals-ack: Signifies that rehearsal jobs have been acknowledged.

3 participants