fix(ci, server): Get lts server pipelines green again #27303
…l on lts
The 'Replacing Lerna with pnpm on LTS' change (microsoft#25537) updated include-set-package-version.yml to declare a required 'interdependencyRange' parameter, but did not update build-docker-service.yml (which invokes that template) to declare or forward it. Server pipelines that extend build-docker-service.yml (server-gitrest, server-historian, server-routerlicious) therefore fail at queue time with:
A value for the 'interdependencyRange' parameter must be provided.
Mirror the wiring already present on main: declare the parameter on build-docker-service.yml with a default of '"^"' and forward it into the include-set-package-version.yml template invocation.
Servers on lts have no .releaseGroup files, so update-package-version.sh ignores the value at runtime; this is purely a YAML wiring fix.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
After PR microsoft#25537 migrated lts from Lerna to pnpm, the root lerna.json was removed. flub 0.5.x relied on lerna.json to identify the repo root, so 'flub generate buildVersion' (run from server/<service>) now fails with:
ERROR: Unknown repo root. Specify it with --root or environment variable _FLUID_ROOT_
build-client.yml on lts already bumped its buildToolsVersionToInstall default to ^0.55.0 (which understands the post-pnpm layout) and runs green. Mirror that on the server pipelines and the docker template so 'flub generate buildVersion' can locate the repo root again.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Bumping the server pipelines to flub ^0.55.0 (the post-pnpm version) revealed that '@fluid-tools/build-cli@0.55.0' requires Node >=20.15.1. The build-docker-service.yml agent was still pinned to Node 18.17.1, so 'flub commands' failed during 'Check Build Tools Installation' with:
SyntaxError: Invalid regular expression flags
build-npm-package.yml on lts (used by build-client) already uses Node 20.15.1 with the same flub version and runs the same set-version steps green. Mirror that here so the docker-service templates also satisfy flub's engine requirements.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…er Dockerfiles on lts
Debian 10 (buster) reached EOL and its packages are no longer available from deb.debian.org, so the 'Docker Build - base' step now fails with:
E: The repository 'http://deb.debian.org/debian buster Release' does not have a Release file.
Bump the base image distro from buster to bookworm (Debian 12) while keeping the Node version (18.17.1) and the slim/full variant unchanged to minimize delta. The apt packages each Dockerfile installs (build-essential, libssl-dev, python3, libcurl4-openssl-dev, make, git, curl, g++, openssl, ca-certificates) are all available in bookworm.
This is the minimum needed to unblock the server pipeline builds on lts.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…packages on lts
PR microsoft#25537 bumped server/routerlicious/packages/kafka-orderer/package.json's eslint devDependency from ~8.6.0 to ~8.55.0, but did not update the other 18 sibling packages. This left the version range inconsistent across the lerna release group, so lerna bootstrap --strict --hoist (used by the Docker image build via the root postinstall:lerna script) now aborts with EHOISTSTRICT:
lerna WARN EHOIST_PKG_VERSION "@fluidframework/server-kafka-orderer" package depends on eslint@~8.55.0, which differs from the hoisted eslint@~8.6.0.
lerna ERR! EHOISTSTRICT Package version inconsistencies found while hoisting. Fix the above warnings and retry.
Align the other 18 packages to ~8.55.0 to match (the newer of the two), and regenerate lerna-package-lock.json / package-lock.json accordingly to restore a consistent hoist target.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
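The kind of mismatch described in this commit can be caught before the Docker build with a small scan over the release group's manifests. The following is a minimal sketch, not part of the PR: the function name find_range_conflicts is hypothetical, the second package name is a placeholder, and the check only compares range strings literally, which is an assumption about what trips the strict hoist rather than a reimplementation of lerna's logic.

```python
from collections import defaultdict


def find_range_conflicts(manifests, dep_fields=("dependencies", "devDependencies")):
    """Return {dependency: {range: [package names]}} for every dependency
    that sibling packages declare with more than one version range.

    `manifests` maps a package name to its parsed package.json (a dict).
    """
    declared = defaultdict(lambda: defaultdict(list))
    for pkg_name, manifest in manifests.items():
        for field in dep_fields:
            for dep, version_range in manifest.get(field, {}).items():
                declared[dep][version_range].append(pkg_name)
    # Keep only dependencies with two or more distinct range strings.
    return {
        dep: {rng: sorted(pkgs) for rng, pkgs in ranges.items()}
        for dep, ranges in declared.items()
        if len(ranges) > 1
    }


# Illustrative manifests only; "hypothetical-sibling-package" is a placeholder.
manifests = {
    "@fluidframework/server-kafka-orderer": {"devDependencies": {"eslint": "~8.55.0"}},
    "hypothetical-sibling-package": {"devDependencies": {"eslint": "~8.6.0"}},
}
conflicts = find_range_conflicts(manifests)
print(conflicts)
```

In a real checkout the manifests dict would be built by loading each package.json under the release group with json.load; an empty result would mean every package agrees on each dependency's range.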
… on lts
The node-rdkafka native module rebuild during the Docker image build of
server-historian and server-routerlicious fails with:
ValueError: invalid mode: 'rU' while trying to load binding.gyp
gyp ERR! configure error
The hoisted node-gyp@5.1.1 is pulled in by npm-lifecycle@3.1.5 (a
transitive dep of @lerna/run-lifecycle@4.0.0). node-gyp 5.x's bundled gyp
opens binding.gyp with mode 'rU', which Python 3.11 (Bookworm's default,
since Issue 4) removed; node-gyp >=8 dropped it.
npm-lifecycle is deprecated and pinned at its final 3.1.5; bumping it
would require upgrading Lerna 4 -> 5+ (a much larger change). Instead,
add a minimal npm 'overrides' entry forcing node-gyp to ^10, and
regenerate both lerna-package-lock.json files (lockfileVersion: 1
preserved) so the hoisted node-gyp is 10.3.1.
node-gyp@10 supports Node >=18 and Python 3.6-3.13, covering both the
current Node 18.17.1 image and Bookworm's Python 3.11.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
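The Python behavior behind this failure can be reproduced without node-gyp. The sketch below is illustrative only (the helper name accepts_universal_newline_mode is made up here): it opens a throwaway file with mode 'rU', the mode the commit says node-gyp 5.x's bundled gyp uses for binding.gyp, and reports whether the running interpreter still accepts it.

```python
import os
import sys
import tempfile
import warnings


def accepts_universal_newline_mode():
    """Return True if this interpreter still accepts open(path, 'rU')."""
    # Create a throwaway file standing in for binding.gyp.
    with tempfile.NamedTemporaryFile("w", suffix=".gyp", delete=False) as tmp:
        tmp.write("{}\n")
        path = tmp.name
    try:
        with warnings.catch_warnings():
            # Older interpreters only emit a DeprecationWarning for 'U'.
            warnings.simplefilter("ignore")
            with open(path, "rU"):
                return True
    except ValueError:
        # Python 3.11+ rejects the mode: "invalid mode: 'rU'".
        return False
    finally:
        os.remove(path)


print(sys.version_info[:2], accepts_universal_newline_mode())
```

On an interpreter at 3.11 or later this reports False, which is the same ValueError the gyp configure step hits in the Bookworm image.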
…y on lts
Building the native node-zookeeper module fails during the
server-routerlicious Docker image build:
/usr/bin/ld: ./.libs/libzookeeper_st.a(zookeeper.o):
in function 'init_ssl_for_socket':
zookeeper.c:(.text+0x52c8): undefined reference to 'FIPS_mode'
collect2: error: ld returned 1 exit status
zookeeper@5.3.2 bundles a libzookeeper C client (Apache ZooKeeper 3.5/3.6
era) that calls FIPS_mode(). That symbol was removed in OpenSSL 3.0.
Bookworm (now in the image after Issue 4) ships OpenSSL 3.x; Buster and
Bullseye shipped OpenSSL 1.1.1 where FIPS_mode still existed.
Apache ZooKeeper 3.8.2 added OpenSSL 3 support (ZOOKEEPER-4541).
node-zookeeper@6.0.0 was the first npm release to bundle ZK C 3.8.2.
zookeeper 7.0.x and 7.1.x still failed to compile in some environments,
so pin to ^7.2.0 (the same range main settled on in microsoft#26493). The API
surface used by services-ordering-zookeeper (new ZooKeeper, init, get,
close, once, removeAllListeners) is unchanged across 5.x -> 7.x.
Mirror main's defense-in-depth: also add a workspace-level override
('zookeeper': '^7.2.0') so any future transitive consumer cannot pin
the build back to a non-OpenSSL-3-compatible version. Today there is
no transitive consumer of zookeeper in routerlicious, so the lockfile
is unchanged by the override.
Regenerate server/routerlicious/lerna-package-lock.json with
lockfileVersion: 1 preserved.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
alexvy86 left a comment
Beautiful PR description 😭 (joy tears).
I'll note that there's probably zero chance we'll ever publish a new server release from the lts branch, and for that reason I feel comfortable with decisions like keeping the base docker images on Node18. The cleanup is still awesome.
I didn't look through the lockfile changes in detail, but did notice lodash getting updated to a version that addressed a recent-memory CVE, which is nice. If there are other old resolved deps that will get flagged by Component Governance, I expect that the client builds for lts are already triggering those.
Yeah I was debating whether to follow this up with a Node update, but figured the first/main priority should be to get the pipeline green.
Description
Tracked by AB#73288.
The server pipelines on lts (server-gitrest, server-historian, server-routerlicious) have been failing since at least Dec. 2, 2025, presumably since #25537 ("Replacing Lerna with pnpm on LTS") was merged; there were no runs between that change landing and Dec. 2 to confirm an earlier start.
Seven distinct issues are addressed here, each surfacing once the previous blocker was cleared. Issues 1–3 and 5 are direct fallout from #25537; Issue 4 is unrelated bit-rot exposed once the pipelines started running again; and Issues 6 and 7 are longstanding latent issues exposed by Issue 4's distro bump (Python 3.11 in Issue 6, OpenSSL 3 in Issue 7).
(server-gitssh sets setVersion: false, so it skips the failing template invocation and is unaffected by Issues 1–3; it is touched only for default consistency.)
Issue 1: interdependencyRange not forwarded through build-docker-service.yml
Symptom
Pipelines fail at queue time, before any job runs:
A value for the 'interdependencyRange' parameter must be provided.
Cause
#25537 updated tools/pipelines/templates/include-set-package-version.yml to declare a required interdependencyRange parameter, but did not update tools/pipelines/templates/build-docker-service.yml (which invokes that template) to declare or forward it. None of the server pipelines that extend build-docker-service.yml pass it, so ADO refuses to expand the templates.
Fix
Mirror the wiring already present on main:
- Declare interdependencyRange (default "^") on build-docker-service.yml.
- Forward it into include-set-package-version.yml in the setVersion: true branch.
Runtime impact
None. No server release group on lts has a .releaseGroup file, so scripts/update-package-version.sh always takes the "independent package" branch and never reads INTERDEPENDENCY_RANGE. This is purely a YAML wiring repair so ADO can expand the templates again.
Issue 2: flub generate buildVersion can't find the repo root
Symptom
After Issue 1 is fixed, the pipelines run far enough to fail in the Set Package Version step:
ERROR: Unknown repo root. Specify it with --root or environment variable _FLUID_ROOT_
Cause
The server pipelines were pinned to buildToolsVersionToInstall: "^0.5.0". flub 0.5.x relied on the root lerna.json to identify the repo root, and #25537 removed that file when migrating to pnpm. The newer flub 0.55.x understands the post-pnpm layout; build-client.yml on lts was already bumped to "^0.55.0" as part of #25537 and is green today.
Fix
Mirror the build-client.yml pattern: bump the buildToolsVersionToInstall default from "^0.5.0" to "^0.55.0" on:
- tools/pipelines/server-gitrest.yml
- tools/pipelines/server-historian.yml
- tools/pipelines/server-routerlicious.yml
- tools/pipelines/server-gitssh.yml (consistency only)
- tools/pipelines/templates/build-docker-service.yml (template default)
Issue 3: Node 18 is incompatible with flub 0.55.x
Symptom
After Issue 2 is fixed, the pipelines run far enough to fail in Check Build Tools Installation (right after the flub commands invocation):
SyntaxError: Invalid regular expression flags
Cause
The npm install of @fluid-tools/build-cli@0.55.0 emits an explicit warning that the package requires Node >=20.15.1.
flub 0.55 uses regex syntax (likely the v flag) that Node 18 doesn't recognize. The agent in build-docker-service.yml was still on Node 18.17.1, whereas build-npm-package.yml (used by build-client.yml, green on lts today) already runs Node 20.15.1.
Fix
Bump the UseNode@1 step in tools/pipelines/templates/build-docker-service.yml from 18.17.1 to 20.15.1 to match build-npm-package.yml and satisfy flub's engine requirement.
Issue 4: Debian 10 ("buster") base image is archived
Symptom
After Issue 3 is fixed, the pipelines run far enough to fail in Docker Build - base:
E: The repository 'http://deb.debian.org/debian buster Release' does not have a Release file.
Cause
The server/{gitrest,historian,routerlicious}/Dockerfile files all build FROM Debian 10 ("buster") variants of the Node image. Debian 10 reached end-of-life and was archived from deb.debian.org, so apt-get update returns 404 for the buster repositories and the build fails before installing the build dependencies. (main has long since moved these to node:22.22.2-bookworm-slim.)
Fix
Minimal bump to the Debian distro only — keep Node 18.17.1 and the existing slim/full variant unchanged:
- server/gitrest/Dockerfile: node:18.17.1-buster → node:18.17.1-bookworm
- server/historian/Dockerfile: node:18.17.1-buster-slim → node:18.17.1-bookworm-slim
- server/routerlicious/Dockerfile: node:18.17.1-buster-slim → node:18.17.1-bookworm-slim
The apt packages each Dockerfile installs (build-essential, libssl-dev, python3, libcurl4-openssl-dev, make, git, curl, g++, openssl, ca-certificates) are all available in bookworm.
Issue 5: eslint version inconsistency in server/routerlicious blocks lerna bootstrap --strict --hoist
Symptom
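After Issue 4 is fixed, server-routerlicious runs as far as Docker Build - base and aborts during the postinstall step of the root npm install --unsafe-perm:
lerna WARN EHOIST_PKG_VERSION "@fluidframework/server-kafka-orderer" package depends on eslint@~8.55.0, which differs from the hoisted eslint@~8.6.0.
lerna ERR! EHOISTSTRICT Package version inconsistencies found while hoisting. Fix the above warnings and retry.
Cause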
PR #25537 bumped only server/routerlicious/packages/kafka-orderer/package.json's eslint devDependency from ~8.6.0 to ~8.55.0, leaving the other 18 sibling packages in the release group at ~8.6.0. lerna bootstrap --strict --hoist (run from the root postinstall:lerna script as part of the Docker image build) refuses to hoist when subpackages declare differing version ranges for the same dependency. The same script succeeded in earlier known-good builds only because all 19 packages then agreed on ~8.6.0.
Fix
Align the other 18 routerlicious subpackages to eslint@~8.55.0 (the newer of the two pinned ranges, matching kafka-orderer) and regenerate server/routerlicious/lerna-package-lock.json and server/routerlicious/package-lock.json accordingly. The lockfile format (lockfileVersion: 1) is preserved (npm install --lockfile-version=1) to avoid unrelated lockfile-format churn. No source files are changed; the new eslint surfaces only warnings (no errors), which the existing npm run lint does not enforce.
Issue 6: node-gyp@5.1.1 (hoisted via deprecated npm-lifecycle) is incompatible with Bookworm's Python 3.11
Symptom
After Issue 5 is fixed, server-historian and server-routerlicious both reach the node-rdkafka native module rebuild during the root npm install --unsafe-perm and fail:
ValueError: invalid mode: 'rU' while trying to load binding.gyp
gyp ERR! configure error
Cause
The hoisted node-gyp@5.1.1 is pulled in transitively by npm-lifecycle@3.1.5 (a dep of @lerna/run-lifecycle@4.0.0). node-gyp 5.x's bundled gyp opens binding.gyp with mode 'rU', which Python 3.11 (Debian Bookworm's default, newly in the image after Issue 4) removed. node-gyp >=8 dropped that mode. npm-lifecycle is itself deprecated and pinned at its final 3.1.5, so bumping it would mean upgrading Lerna 4 -> 5+, which is well out of scope for this PR.
(server-gitrest is unaffected because it doesn't depend on node-rdkafka or any other native module that triggers a node-gyp rebuild.)
Fix
Add a minimal npm overrides entry forcing node-gyp to ^10 in the root package.json of both server/historian and server/routerlicious, and regenerate their lerna-package-lock.json / package-lock.json files (lockfileVersion: 1 preserved) so the hoisted node-gyp resolves to 10.3.1. node-gyp@10 supports Node >=18 and Python 3.6–3.13, covering both the current Node 18.17.1 image and Bookworm's Python 3.11.
Issue 7: node-zookeeper@5.3.2 fails to link against OpenSSL 3 in Bookworm
Symptom
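After Issue 6 unblocks routerlicious's install, the build fails when compiling the bundled node-zookeeper C client:
/usr/bin/ld: ./.libs/libzookeeper_st.a(zookeeper.o): in function 'init_ssl_for_socket':
zookeeper.c:(.text+0x52c8): undefined reference to 'FIPS_mode'
collect2: error: ld returned 1 exit status
Cause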
zookeeper@5.3.2 bundles a libzookeeper C client (Apache ZooKeeper 3.5/3.6 era) that calls FIPS_mode(). That symbol was removed in OpenSSL 3.0. Debian Bookworm (now in the image after Issue 4) ships OpenSSL 3.x; Buster and Bullseye shipped OpenSSL 1.1.1, where FIPS_mode still existed. Apache ZooKeeper 3.8.2 added OpenSSL 3 support (ZOOKEEPER-4541), and node-zookeeper@6.0.0 was the first npm release to bundle ZK C 3.8.2. zookeeper 7.0.x and 7.1.x still failed to compile in some environments, so the safe pin is ^7.2.0, the same range main settled on in #26493.
Fix
Bump zookeeper from ^5.3.2 to ^7.2.0 in services-ordering-zookeeper/package.json and regenerate server/routerlicious/lerna-package-lock.json (lockfileVersion: 1 preserved). Also mirror main's defense-in-depth by adding a workspace-level "zookeeper": "^7.2.0" override in server/routerlicious/package.json, so that no future transitive consumer can pin the build back to a non-OpenSSL-3-compatible version. The API surface used by services-ordering-zookeeper (new ZooKeeper, init, get, close, once, removeAllListeners) is unchanged across 5.x -> 7.x, so no source changes are needed.
Validation
main (Issues 1, 4) or on lts's build-client.yml (Issues 2 and 3); Issue 4 is the minimum bump that keeps Node 18 in the image. Issue 5 is the smallest in-tree alignment that lets lerna bootstrap --strict --hoist succeed without altering build behavior. Issue 6 is a transitive-dep override scoped to two packages that addresses the only node-gyp-using path in the install graph. Issue 7 mirrors the zookeeper bump and root override that main adopted in #26493 ("build(r11s): bump @types/node to ^22, upgrade zookeeper to ^7, and update webpack") for the same OpenSSL 3 incompatibility.
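The lockfile claims in Issues 5 through 7 (lockfileVersion: 1 preserved, hoisted node-gyp resolving to 10.x) could be spot-checked with a short script. This is a sketch under assumptions, not something the PR ships: check_lockfile is a hypothetical helper, and it assumes lerna-package-lock.json uses the standard npm lockfileVersion 1 layout, where hoisted packages sit in the top-level "dependencies" map with a "version" field.

```python
def check_lockfile(lock, expected_majors):
    """Collect human-readable problems found in a parsed lockfileVersion-1 file.

    `expected_majors` maps a hoisted (top-level) package name to the major
    version it should resolve to, e.g. {"node-gyp": 10}.
    """
    problems = []
    if lock.get("lockfileVersion") != 1:
        problems.append(f"lockfileVersion is {lock.get('lockfileVersion')!r}, expected 1")
    hoisted = lock.get("dependencies", {})
    for name, major in expected_majors.items():
        entry = hoisted.get(name)
        if entry is None:
            problems.append(f"{name} is not hoisted to the top level")
            continue
        resolved_major = int(entry["version"].split(".")[0])
        if resolved_major != major:
            problems.append(f"{name} resolves to {entry['version']}, expected major {major}")
    return problems


# In-memory stand-in for a regenerated lockfile; a real check would json.load it.
sample_lock = {
    "lockfileVersion": 1,
    "dependencies": {"node-gyp": {"version": "10.3.1"}},
}
print(check_lockfile(sample_lock, {"node-gyp": 10}))
```

An empty list means both conditions hold for the given file; the same call with {"zookeeper": 7} would only be meaningful if zookeeper is hoisted to the top level of that lockfile.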