fix(ci, server): Get lts server pipelines green again #27303

Merged
ChumpChief merged 7 commits into microsoft:lts from ChumpChief:fix/lts-interdependencyRange-pipeline on May 18, 2026

@ChumpChief ChumpChief commented May 13, 2026

Description

Tracked by AB#73288.

The server pipelines on lts (server-gitrest, server-historian, server-routerlicious) have been failing since at least Dec. 2, 2025. The likely trigger is #25537 ("Replacing Lerna with pnpm on LTS"), but no runs occurred between that change landing and Dec. 2, so an earlier start cannot be confirmed.

Seven distinct issues are addressed here, surfaced one after the other as each previous blocker was unblocked. Issues 1–3 and 5 are direct fallout from #25537; Issue 4 is unrelated bit-rot exposed once the pipelines started running again, and Issues 6 and 7 are longstanding latent issues exposed by Issue 4's distro bump (Python 3.11 in Issue 6, OpenSSL 3 in Issue 7).

(server-gitssh sets setVersion: false, so it skips the failing template invocation and is unaffected by Issues 1–3; it is touched only for default consistency.)

Issue 1 — interdependencyRange not forwarded through build-docker-service.yml

Symptom

Pipelines fail at queue time before any job runs:

A value for the interdependencyRange parameter must be provided.

Cause

#25537 updated tools/pipelines/templates/include-set-package-version.yml to declare a required interdependencyRange parameter, but did not update tools/pipelines/templates/build-docker-service.yml (which invokes that template) to declare or forward it. None of the server pipelines that extend build-docker-service.yml pass it, so ADO refuses to expand the templates.

Fix

Mirror the wiring already present on main:

  • Declare interdependencyRange (default "^") on build-docker-service.yml.
  • Forward it to include-set-package-version.yml in the setVersion: true branch.

Runtime impact

None. None of the server release groups on lts have a .releaseGroup file, so scripts/update-package-version.sh always takes the "independent package" branch and never reads INTERDEPENDENCY_RANGE. This is purely a YAML wiring repair so ADO can expand the templates again.

Issue 2 — flub generate buildVersion can't find the repo root

Symptom

After Issue 1 is fixed, the pipelines now run far enough to fail in the Set Package Version step:

ERROR: Unknown repo root. Specify it with --root or environment variable _FLUID_ROOT_

Cause

The server pipelines were pinned to buildToolsVersionToInstall: "^0.5.0". flub 0.5.x relied on the root lerna.json to identify the repo root. #25537 removed that file when migrating to pnpm. The newer flub 0.55.x understands the post-pnpm layout — build-client.yml on lts was already bumped to "^0.55.0" as part of #25537 and is green today.
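
The failure mode described above, a tool that locates the repo root by looking for a marker file that no longer exists, can be sketched as follows. This is a minimal illustration only: find_repo_root is a hypothetical helper, not flub's actual implementation, and the only detail taken from this PR is that flub 0.5.x relied on the root lerna.json.

```python
from pathlib import Path
from typing import Optional


def find_repo_root(start: Path, marker: str = "lerna.json") -> Optional[Path]:
    """Return the nearest directory at or above `start` that contains `marker`.

    Hypothetical sketch: it only models "walk upward until a marker file is
    found", which is an assumption about how such tools typically work.
    """
    current = start.resolve()
    for candidate in (current, *current.parents):
        if (candidate / marker).is_file():
            return candidate
    # No marker anywhere above `start`: a real tool would report an
    # "unknown repo root" style error at this point.
    return None
```

Once the marker file is deleted from the root, a call such as find_repo_root(Path("server/gitrest")) returns None for every service directory, which matches the shape of the "Unknown repo root" error in the symptom.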

Fix

Mirror the build-client.yml pattern: bump the buildToolsVersionToInstall default from "^0.5.0" to "^0.55.0" on:

  • tools/pipelines/server-gitrest.yml
  • tools/pipelines/server-historian.yml
  • tools/pipelines/server-routerlicious.yml
  • tools/pipelines/server-gitssh.yml (consistency only)
  • tools/pipelines/templates/build-docker-service.yml (template default)

Issue 3 — Node 18 is incompatible with flub 0.55.x

Symptom

After Issue 2 is fixed, the pipelines now run far enough to fail in Check Build Tools Installation (right after the flub commands invocation):

SyntaxError: Invalid regular expression flags

Cause

The npm install of @fluid-tools/build-cli@0.55.0 emits an explicit warning:

npm warn EBADENGINE Unsupported engine {
npm warn EBADENGINE   package: '@fluid-tools/build-cli@0.55.0',
npm warn EBADENGINE   required: { node: '>=20.15.1' },
npm warn EBADENGINE   current: { node: 'v18.17.1', npm: '10.9.8' }
npm warn EBADENGINE }

flub 0.55 uses regex syntax (likely the v flag) that Node 18 doesn't recognize. The agent in build-docker-service.yml was still on Node 18.17.1. build-npm-package.yml (used by build-client.yml, green on lts today) already runs Node 20.15.1.
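
The engine mismatch in the EBADENGINE warning can be checked with a few lines of Python. This is a rough sketch that handles only a plain ">=" minimum with three numeric components, not full semver ranges; parse_version and satisfies_minimum are illustrative names, not part of npm or flub.

```python
def parse_version(text: str) -> tuple[int, int, int]:
    """Parse 'v18.17.1' or '18.17.1' into (18, 17, 1)."""
    major, minor, patch = text.lstrip("v").split(".")
    return int(major), int(minor), int(patch)


def satisfies_minimum(current: str, minimum: str) -> bool:
    """Return True if `current` is at least `minimum` (simple >= check only)."""
    return parse_version(current) >= parse_version(minimum)


# Values taken from the EBADENGINE warning above.
print(satisfies_minimum("v18.17.1", "20.15.1"))  # False
print(satisfies_minimum("v20.15.1", "20.15.1"))  # True
```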

Fix

Bump the UseNode@1 step in tools/pipelines/templates/build-docker-service.yml from 18.17.1 → 20.15.1 to match build-npm-package.yml and satisfy flub's engine requirement.

Issue 4 — Debian 10 ("buster") base image is archived

Symptom

After Issue 3 is fixed, the pipelines now run far enough to fail in Docker Build - base:

E: The repository 'http://deb.debian.org/debian buster Release' does not have a Release file.

Cause

The server/{gitrest,historian,routerlicious}/Dockerfile files all FROM Debian 10 ("buster") variants of the Node image. Debian 10 reached end-of-life and was archived from deb.debian.org, so apt-get update returns 404 for the buster repositories and the build fails before installing the build dependencies. (main has long since moved these to node:22.22.2-bookworm-slim.)

Fix

Minimal bump to the Debian distro only — keep Node 18.17.1 and the existing slim/full variant unchanged:

  • server/gitrest/Dockerfile: node:18.17.1-buster → node:18.17.1-bookworm
  • server/historian/Dockerfile: node:18.17.1-buster-slim → node:18.17.1-bookworm-slim
  • server/routerlicious/Dockerfile: node:18.17.1-buster-slim → node:18.17.1-bookworm-slim

The apt packages each Dockerfile installs (build-essential, libssl-dev, python3, libcurl4-openssl-dev, make, git, curl, g++, openssl, ca-certificates) are all available in bookworm.

Issue 5 — eslint version inconsistency in server/routerlicious blocks lerna bootstrap --strict --hoist

Symptom

After Issue 4 is fixed, server-routerlicious now runs as far as Docker Build - base and aborts during the root npm install --unsafe-perm's postinstall step:

lerna WARN EHOIST_PKG_VERSION "@fluidframework/server-kafka-orderer" package depends on eslint@~8.55.0, which differs from the hoisted eslint@~8.6.0.
lerna ERR! EHOISTSTRICT Package version inconsistencies found while hoisting. Fix the above warnings and retry.

Cause

PR #25537 bumped only server/routerlicious/packages/kafka-orderer/package.json's eslint devDependency from ~8.6.0 to ~8.55.0, leaving the other 18 sibling packages in the release group still at ~8.6.0. lerna bootstrap --strict --hoist (run from the root postinstall:lerna script as part of the Docker image build) refuses to hoist when subpackages declare incompatible version ranges for the same dependency. The same script in earlier-known-good builds succeeded only because all 19 packages then agreed on ~8.6.0.
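
A quick way to spot this kind of drift before the Docker build does is to compare the declared ranges across the group's package.json files. The sketch below is an approximation, not Lerna's hoisting algorithm: it treats any two different range strings as a conflict, and the sibling package name is a placeholder.

```python
from collections import defaultdict


def find_range_conflicts(manifests: dict[str, dict]) -> dict[str, dict[str, list[str]]]:
    """Map each dependency declared with more than one range to {range: [packages]}."""
    seen: dict[str, dict[str, list[str]]] = defaultdict(lambda: defaultdict(list))
    for package_name, manifest in manifests.items():
        for section in ("dependencies", "devDependencies"):
            for dep, version_range in manifest.get(section, {}).items():
                seen[dep][version_range].append(package_name)
    # Keep only dependencies where the packages disagree on the range string.
    return {dep: dict(ranges) for dep, ranges in seen.items() if len(ranges) > 1}


manifests = {
    "@fluidframework/server-kafka-orderer": {"devDependencies": {"eslint": "~8.55.0"}},
    # Placeholder standing in for any of the 18 sibling packages.
    "hypothetical-sibling-package": {"devDependencies": {"eslint": "~8.6.0"}},
}
print(find_range_conflicts(manifests))
```

In practice the manifests dict would be built by loading each package.json under server/routerlicious/packages with json.load; an empty result means every package agrees on every range.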

Fix

Align the other 18 routerlicious subpackages to eslint@~8.55.0 (the newer of the two pinned ranges, matching kafka-orderer) and regenerate server/routerlicious/lerna-package-lock.json and server/routerlicious/package-lock.json accordingly. The lockfile format (lockfileVersion: 1) is preserved via npm install --lockfile-version=1 to avoid unrelated lockfile-format churn. No source files are changed; the new eslint surfaces only warnings (no errors), which the existing npm run lint does not enforce.

Issue 6 — node-gyp@5.1.1 (hoisted via deprecated npm-lifecycle) is incompatible with Bookworm's Python 3.11

Symptom

After Issue 5 is fixed, server-historian and server-routerlicious both reach the node-rdkafka native module rebuild during the root npm install --unsafe-perm and fail:

npm ERR! command sh -c node-gyp rebuild
npm ERR! gyp info using node-gyp@5.1.1
npm ERR! gyp info using node@18.17.1 | linux | x64
npm ERR! gyp info find Python using Python version 3.11.2 found at "/usr/bin/python3"
...
npm ERR! ValueError: invalid mode: 'rU' while trying to load binding.gyp
npm ERR! gyp ERR! configure error

Cause

The hoisted node-gyp@5.1.1 is pulled in transitively by npm-lifecycle@3.1.5 (a dep of @lerna/run-lifecycle@4.0.0). node-gyp 5.x's bundled gyp opens binding.gyp with mode 'rU', which Python 3.11 (Debian Bookworm's default — newly in the image after Issue 4) removed. node-gyp >=8 dropped that mode. npm-lifecycle is itself deprecated and pinned at its final 3.1.5, so bumping it would mean upgrading Lerna 4 -> 5+ — well out of scope for this PR.

(server-gitrest is unaffected because it doesn't depend on node-rdkafka or any other native module that triggers a node-gyp rebuild.)
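
The Python side of this failure can be reproduced without node-gyp. The sketch below simply tries to open a file with mode 'rU' and reports whether the running interpreter still accepts it; accepts_universal_newlines_mode is an illustrative helper, not code from gyp.

```python
import os
import sys
import tempfile


def accepts_universal_newlines_mode(path: str) -> bool:
    """Return True if open(path, 'rU') is still accepted by this interpreter."""
    try:
        with open(path, "rU"):
            return True
    except ValueError:
        # Python 3.11+ rejects the mode with: invalid mode: 'rU'
        return False


with tempfile.TemporaryDirectory() as tmp:
    gyp_file = os.path.join(tmp, "binding.gyp")
    with open(gyp_file, "w") as handle:
        handle.write("{}")
    print(sys.version_info[:2], accepts_universal_newlines_mode(gyp_file))
```

Run with the python3 found in the image, this prints False on Python 3.11 and later, which is the same ValueError the gyp log shows.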

Fix

Add a minimal npm overrides entry forcing node-gyp to ^10 in the root package.json of both server/historian and server/routerlicious, and regenerate their lerna-package-lock.json / package-lock.json files (lockfileVersion: 1 preserved) so the hoisted node-gyp resolves to 10.3.1. node-gyp@10 supports Node >=18 and Python 3.6–3.13, covering both the current Node 18.17.1 image and Bookworm's Python 3.11.
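
For readers unfamiliar with npm overrides, the shape of the resulting package.json entry can be sketched with a small Python helper. The helper with_override and the manifest name are hypothetical; only the "overrides" key and the node-gyp ^10 range come from this PR.

```python
import json


def with_override(manifest: dict, package: str, version_range: str) -> dict:
    """Return a copy of a package.json dict with an npm `overrides` entry added."""
    updated = dict(manifest)
    updated["overrides"] = {**manifest.get("overrides", {}), package: version_range}
    return updated


# Placeholder manifest; the real root package.json files contain far more fields.
root_manifest = {"name": "hypothetical-root", "private": True}
print(json.dumps(with_override(root_manifest, "node-gyp", "^10"), indent=2))
```

The printed JSON shows the top-level "overrides" object that npm reads when resolving the tree. Editing the manifest alone is not enough; the lockfiles still have to be regenerated, as described above.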

Issue 7 — node-zookeeper@5.3.2 fails to link against OpenSSL 3 in Bookworm

Symptom

After Issue 6 unblocks routerlicious's install, the build fails when compiling the bundled node-zookeeper C client:

/usr/bin/ld: ./.libs/libzookeeper_st.a(zookeeper.o): in function `init_ssl_for_socket':
zookeeper.c:(.text+0x52c8): undefined reference to `FIPS_mode'
collect2: error: ld returned 1 exit status

Cause

zookeeper@5.3.2 bundles a libzookeeper C client (Apache ZooKeeper 3.5/3.6 era) that calls FIPS_mode(). That symbol was removed in OpenSSL 3.0. Debian Bookworm (now in the image after Issue 4) ships OpenSSL 3.x; Buster and Bullseye shipped OpenSSL 1.1.1 where FIPS_mode still existed. Apache ZooKeeper 3.8.2 added OpenSSL 3 support (ZOOKEEPER-4541), and node-zookeeper@6.0.0 was the first npm release to bundle ZK C 3.8.2. 7.0.x and 7.1.x still failed to compile in some environments, so the safe pin is ^7.2.0 — the same range main settled on in #26493.

Fix

Bump zookeeper from ^5.3.2 to ^7.2.0 in services-ordering-zookeeper/package.json and regenerate server/routerlicious/lerna-package-lock.json (lockfileVersion: 1 preserved). Also mirror main's defense-in-depth and add a workspace-level "zookeeper": "^7.2.0" override in server/routerlicious/package.json so any future transitive consumer cannot pin the build back to a non-OpenSSL-3-compatible version. The API surface used by services-ordering-zookeeper (new ZooKeeper, init, get, close, once, removeAllListeners) is unchanged across 5.x -> 7.x, so no source changes are needed.

Validation

  • Issues 1–4 are YAML- or Dockerfile-only, and each mirrors a pattern already proven on main (Issues 1, 4) or on lts's build-client.yml (Issues 2 and 3); Issue 4 is the minimum bump that keeps Node 18 in the image. Issue 5 is the smallest in-tree alignment that lets lerna bootstrap --strict --hoist succeed without altering build behavior. Issue 6 is a transitive-dep override scoped to two packages that addresses the only node-gyp-using path in the install graph. Issue 7 mirrors the zookeeper bump and root override that main adopted in #26493 ("build(r11s): bump @types/node to ^22, upgrade zookeeper to ^7, and update webpack") for the same OpenSSL 3 incompatibility.
  • After each commit, the previously-blocked pipelines progress past the prior step and surface the next issue. Verify via the queued ADO runs on this PR.

ChumpChief and others added 7 commits May 13, 2026 10:31
…l on lts

The 'Replacing Lerna with pnpm on LTS' change (microsoft#25537) updated
include-set-package-version.yml to declare a required
'interdependencyRange' parameter, but did not update
build-docker-service.yml (which invokes that template) to declare or
forward it. Server pipelines that extend build-docker-service.yml
(server-gitrest, server-historian, server-routerlicious) therefore fail
at queue time with:

  A value for the 'interdependencyRange' parameter must be provided.

Mirror the wiring already present on main: declare the parameter on
build-docker-service.yml with a default of '"^"' and forward it into
the include-set-package-version.yml template invocation. Servers on lts
have no .releaseGroup files, so update-package-version.sh ignores the
value at runtime — this is purely a YAML wiring fix.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

After PR microsoft#25537 migrated lts from Lerna to pnpm, the root lerna.json was
removed. flub 0.5.x relied on lerna.json to identify the repo root, so
'flub generate buildVersion' (run from server/<service>) now fails with:

  ERROR: Unknown repo root. Specify it with --root or environment variable _FLUID_ROOT_

build-client.yml on lts already bumped its buildToolsVersionToInstall
default to ^0.55.0 (which understands the post-pnpm layout) and runs
green. Mirror that on the server pipelines and the docker template so
'flub generate buildVersion' can locate the repo root again.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Bumping the server pipelines to flub ^0.55.0 (the post-pnpm version)
revealed that '@fluid-tools/build-cli@0.55.0' requires Node >=20.15.1.
The build-docker-service.yml agent was still pinned to Node 18.17.1, so
'flub commands' failed during 'Check Build Tools Installation' with:

  SyntaxError: Invalid regular expression flags

build-npm-package.yml on lts (used by build-client) already uses Node
20.15.1 with the same flub version and runs the same set-version steps
green. Mirror that here so the docker-service templates also satisfy
flub's engine requirements.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…er Dockerfiles on lts

Debian 10 (buster) reached EOL and its packages are no longer available
from deb.debian.org, so the 'Docker Build - base' step now fails with:

  E: The repository 'http://deb.debian.org/debian buster Release' does not have a Release file.

Bump the base image distro from buster to bookworm (Debian 12) while
keeping the Node version (18.17.1) and the slim/full variant unchanged
to minimize delta. The apt packages each Dockerfile installs
(build-essential, libssl-dev, python3, libcurl4-openssl-dev, make, git,
curl, g++, openssl, ca-certificates) are all available in bookworm.

This is the minimum needed to unblock the server pipeline builds on lts.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…packages on lts

PR microsoft#25537 bumped server/routerlicious/packages/kafka-orderer/package.json's
eslint devDependency from ~8.6.0 to ~8.55.0, but did not update the other 18
sibling packages. This left the version-range inconsistent across the lerna
release group, so lerna bootstrap --strict --hoist (used by the Docker image
build via the root postinstall:lerna script) now aborts with EHOISTSTRICT:

    lerna WARN EHOIST_PKG_VERSION "@fluidframework/server-kafka-orderer"
        package depends on eslint@~8.55.0, which differs from the hoisted
        eslint@~8.6.0.
    lerna ERR! EHOISTSTRICT Package version inconsistencies found while
        hoisting. Fix the above warnings and retry.

Align the other 18 packages to ~8.55.0 to match (newer of the two), and
regenerate lerna-package-lock.json / package-lock.json accordingly to
restore a consistent hoist target.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

… on lts

The node-rdkafka native module rebuild during the Docker image build of
server-historian and server-routerlicious fails with:

    ValueError: invalid mode: 'rU' while trying to load binding.gyp
    gyp ERR! configure error

The hoisted node-gyp@5.1.1 is pulled in by npm-lifecycle@3.1.5 (a
transitive dep of @lerna/run-lifecycle@4.0.0). node-gyp 5.x's bundled gyp
opens binding.gyp with mode 'rU', which Python 3.11 (Bookworm's default,
since Issue 4) removed; node-gyp >=8 dropped it.

npm-lifecycle is deprecated and pinned at its final 3.1.5; bumping it
would require upgrading Lerna 4 -> 5+ (a much larger change). Instead,
add a minimal npm 'overrides' entry forcing node-gyp to ^10, and
regenerate both lerna-package-lock.json files (lockfileVersion: 1
preserved) so the hoisted node-gyp is 10.3.1.

node-gyp@10 supports Node >=18 and Python 3.6-3.13, covering both the
current Node 18.17.1 image and Bookworm's Python 3.11.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…y on lts

Building the native node-zookeeper module fails during the
server-routerlicious Docker image build:

    /usr/bin/ld: ./.libs/libzookeeper_st.a(zookeeper.o):
        in function 'init_ssl_for_socket':
    zookeeper.c:(.text+0x52c8): undefined reference to 'FIPS_mode'
    collect2: error: ld returned 1 exit status

zookeeper@5.3.2 bundles a libzookeeper C client (Apache ZooKeeper 3.5/3.6
era) that calls FIPS_mode(). That symbol was removed in OpenSSL 3.0.
Bookworm (now in the image after Issue 4) ships OpenSSL 3.x; Buster and
Bullseye shipped OpenSSL 1.1.1 where FIPS_mode still existed.

Apache ZooKeeper 3.8.2 added OpenSSL 3 support (ZOOKEEPER-4541).
node-zookeeper@6.0.0 was the first npm release to bundle ZK C 3.8.2.
zookeeper 7.0.x and 7.1.x still failed to compile in some environments,
so pin to ^7.2.0 (the same range main settled on in microsoft#26493). The API
surface used by services-ordering-zookeeper (new ZooKeeper, init, get,
close, once, removeAllListeners) is unchanged across 5.x -> 7.x.

Mirror main's defense-in-depth: also add a workspace-level override
('zookeeper': '^7.2.0') so any future transitive consumer cannot pin
the build back to a non-OpenSSL-3-compatible version. Today there is
no transitive consumer of zookeeper in routerlicious, so the lockfile
is unchanged by the override.

Regenerate server/routerlicious/lerna-package-lock.json with
lockfileVersion: 1 preserved.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@ChumpChief ChumpChief changed the title from "fix(ci): Forward interdependencyRange through build-docker-service.yml on lts" to "fix(ci, server): Get lts server pipelines green again" on May 13, 2026
@ChumpChief ChumpChief marked this pull request as ready for review May 13, 2026 21:30
@ChumpChief ChumpChief requested review from a team and msfluid-bot as code owners May 13, 2026 21:30

@alexvy86 alexvy86 left a comment

Beautiful PR description 😭 (joy tears).

I'll note that there's probably zero chance we'll ever publish a new server release from the lts branch, and for that reason I feel comfortable with decisions like keeping the base docker images on Node18. The cleanup is still awesome.

I didn't look through the lockfile changes in detail, but did notice lodash getting updated to a version that addressed a recent-memory CVE, which is nice. If there are other old resolved deps that will get flagged by Component Governance, I expect that the client builds for lts are already triggering those.

@ChumpChief

Yeah I was debating whether to follow this up with a Node update, but figured the first/main priority should be to get the pipeline green.

@ChumpChief ChumpChief requested review from a team and daesunp May 13, 2026 22:59
@ChumpChief ChumpChief merged commit 5ec545b into microsoft:lts May 18, 2026
13 checks passed

4 participants