fix(actions): retry npm publish, not just the visibility poll#26
Conversation
The previous loop ran `npm publish` exactly once and then polled `npm view` between sleeps. That handles the eventual-consistency case (publish appeared to fail but actually went through) but does NOT retry a real transient failure — registry 5xx, network blip, TLS error — where the artifact never reached the registry. stdnum's previous inline publish loop did retry the publish itself up to 5 times; that semantic is restored here, with a single `already_published` check inside the loop covering both modes: - publish fails AND artifact landed (eventual consistency) → exit 0 - publish fails AND artifact didn't land → next attempt Backoff lengthened to 5s, 10s, 15s, 20s between attempts (50s total across 5 attempts) to give the registry a real chance to recover from a transient. Addresses codex P2 on stella/stdnum#95.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6001bc2924
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (( attempt == MAX_ATTEMPTS )); then | ||
| printf '::error::Failed to publish %s@%s after %d attempts.\n' \ | ||
| "${PACKAGE_NAME}" "${PACKAGE_VERSION}" "${attempt}" >&2 | ||
| exit 1 |
There was a problem hiding this comment.
Poll after the final publish failure before exiting
If the fifth npm publish is the first attempt that reaches the registry but exits non-zero before the version is visible, this branch exits immediately after one already_published check. That reintroduces the eventual-consistency failure for the last attempt: a package can be accepted and become visible seconds later, but the action has already failed. Please keep polling/sleeping after the final failed publish as well before returning failure.
Useful? React with 👍 / 👎.
…blish If the 5th and final `npm publish` is the first attempt that reaches the registry but exits non-zero before the version is visible, the previous control flow ran one `already_published` check and exited immediately. That collapses the eventual-consistency window to ~0s for the last attempt and reintroduces the exact failure mode the retry loop was meant to avoid. After all publish attempts fail, run an additional 1/2/3/4/5s polling loop (15s) for the version to appear. Worst-case total time across all 5 publish attempts + final poll is ~65s. Addresses codex P2 review on PR #26.
There was a problem hiding this comment.
Code Review
This pull request introduces a retry mechanism for the npm publication process to handle transient network errors and registry eventual consistency. The implementation uses a loop to attempt the publish up to five times with an incremental backoff strategy. Feedback was provided regarding the GitHub Actions workflow commands, noting that the error message should be sent to standard output instead of standard error to ensure that UI annotations are correctly generated in the GitHub interface.
| printf '::error::Failed to publish %s@%s after %d attempts.\n' \ | ||
| "${PACKAGE_NAME}" "${PACKAGE_VERSION}" "${attempt}" >&2 |
There was a problem hiding this comment.
According to the GitHub Actions documentation, workflow commands like ::error:: must be written to standard output (stdout) to be processed correctly and create annotations in the UI. Redirecting this message to standard error (>&2) prevents the annotation from being created, though the message will still appear in the logs. To enable UI annotations, please remove the >&2 redirection. This advice also applies to other ::error:: calls in this file.
| printf '::error::Failed to publish %s@%s after %d attempts.\n' \ | |
| "${PACKAGE_NAME}" "${PACKAGE_VERSION}" "${attempt}" >&2 | |
| printf '::error::Failed to publish %s@%s after %d attempts.\n' \ | |
| "${PACKAGE_NAME}" "${PACKAGE_VERSION}" "${attempt}" |
|
CC on behalf of @jan-kubica Codex P2 addressed in the latest commit. After all 5 publish attempts fail, an additional 15s polling window (1+2+3+4+5s sleep cadence) runs before declaring failure — so an attempt-5 publish that lands but lags in registry visibility still gets credited. Worst-case total time across publish + final poll is ~65s. |
GitHub Actions parses `::error::`/`::warning::`/`::notice::` workflow
commands out of the runner's log stream to create UI annotations. Per
the docs, those commands must be written to stdout — `>&2` skips the
annotation pipeline and the message only shows up in raw logs.
Remove `>&2` from every `::error::` printf. The remaining `>&2` on
`cat "${PUBLISH_LOG}" >&2` is kept: that's the npm publish failure
output, which is genuine stderr content rather than a workflow
command.
Addresses gemini medium on PR #26.
|
CC on behalf of @jan-kubica Gemini medium addressed in the latest commit. All `::error::` printfs now go to stdout (the annotation processor reads workflow commands from stdout per GitHub docs). The remaining `>&2` redirect on `cat "$PUBLISH_LOG" >&2` is kept because that's genuine stderr content (npm publish's own error output), not a workflow command. |
Addresses codex P2 review on stella/stdnum#95.
The regression
The composite ran `npm publish` exactly once, then polled `npm view` between sleeps. That covers eventual consistency (publish appeared to fail but actually went through) but does not retry a real transient failure — registry 5xx, network blip, TLS error — where the artifact never reached the registry. The previous inline loops in stdnum and the napi-rs repos retried publish up to 5x; migrating to the composite as-shipped would weaken that.
Fix
Wrap `npm publish` in the retry loop. A single `already_published` check between attempts covers both failure modes:
Backoff lengthened to 5s, 10s, 15s, 20s between attempts (50s total across 5 attempts) — gives the registry a real chance to recover.
Test plan