4 changes: 4 additions & 0 deletions nix/docs/README.md
Expand Up @@ -41,6 +41,10 @@ learn how to play with `postgres` in the [build guide](./build-postgres.md).
- **[Testing PG Upgrade Scripts](./testing-pg-upgrade-scripts.md)** - Testing PostgreSQL upgrades
- **[Docker Image testing](./docker-testing.md)** - How to test the docker images against the pg_regress test suite.

## CI

- **[Nix Build Matrix](./nix-build-matrix-ci.md)** - Understand how the CI Nix build matrix works

## Reference

- **[References](./references.md)** - Useful links and resources
84 changes: 84 additions & 0 deletions nix/docs/nix-build-matrix-ci.md
@@ -0,0 +1,84 @@
# Nix Matrix CI

## Incident Q/A

Something broke the Nix CI and you need a quick-and-dirty fix to unblock yourself as fast as possible? Follow these guides.

### **Q:** A test is failing; how to ignore it and generate an AMI image anyway?

**A:** You can take the nuclear approach and generate the AMI image regardless of the test outcome. To do that, remove the three conditions that check the `nix-build-checks` results in the `if` clause of the `run-testinfra` step in the `.github/workflows/nix-build.yml` file.

I.e., remove the following conditions:

```
(needs.nix-build-checks-aarch64-linux.result == 'skipped' || needs.nix-build-checks-aarch64-linux.result == 'success')
(needs.nix-build-checks-aarch64-darwin.result == 'skipped' || needs.nix-build-checks-aarch64-darwin.result == 'success')
(needs.nix-build-checks-x86_64-linux.result == 'skipped' || needs.nix-build-checks-x86_64-linux.result == 'success')
```

Note: the merge queue check will still block the PR from being merged into develop.

### **Q:** A hosted runner is down, how to reschedule a job somewhere else?

**A:** Edit the `BUILD_RUNNER_MAP` dictionary in the `github_matrix.py` script and change the `labels` entry to match one of the still functional GitHub runners.

You can see the available runners and their associated labels on [this page](https://github.com/supabase/postgres/actions/runners?tab=self-hosted). Note: GitHub lists the blacksmith runners as "self-hosted".
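
As a rough illustration of that edit, the sketch below models a runner map and a label swap. The real structure of `BUILD_RUNNER_MAP` is not shown in this document, so the keys, the label values, and the `reschedule` helper are all hypothetical; in practice the change is a manual edit of the dictionary in `github_matrix.py`.

```python
from copy import deepcopy

# Hypothetical shape: the real BUILD_RUNNER_MAP in github_matrix.py may differ.
BUILD_RUNNER_MAP = {
    "aarch64-linux": {"labels": ["example-arm-runner"]},
    "x86_64-linux": {"labels": ["example-x86-runner"]},
}


def reschedule(runner_map: dict, system: str, new_labels: list) -> dict:
    """Return a copy of runner_map with the labels for `system` replaced."""
    if system not in runner_map:
        raise KeyError(f"unknown system: {system}")
    updated = deepcopy(runner_map)
    updated[system]["labels"] = list(new_labels)
    return updated


# Point aarch64-linux jobs at a (hypothetical) runner that is still up.
patched = reschedule(BUILD_RUNNER_MAP, "aarch64-linux", ["example-fallback-runner"])
print(patched["aarch64-linux"]["labels"])
```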

### **Q:** The eval step is OOM-ing, what should I do?

**A:** Evaluation can be quite costly memory-wise. [nix-eval-jobs](https://github.com/NixOS/nix-eval-jobs) spins up multiple Nix evaluations in parallel to speed things up. The tradeoff is higher memory consumption than a single-process eval.

An easy way to work around an eval OOM is to reduce the number of parallel evals by passing the `--nb-eval-jobs-workers` flag to the github_matrix call in the `.github/workflows/nix-eval.yml` file.

I.e., replace:

```terminal
nix run --accept-flake-config .\#github-matrix -- checks legacyPackages
```

with:

```terminal
nix run --accept-flake-config .\#github-matrix -- --nb-eval-jobs-workers 30 checks legacyPackages
```

Note: by default, `github_matrix.py` spins up one eval instance per CPU. On a `blacksmith-32vcpu-ubuntu-2404` worker, that means 32 Nix eval instances.
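
The default and the override can be checked in isolation. The sketch below reproduces only the argument parsing from `github_matrix.py` as it appears in this change; the `build_parser` wrapper and the worker count of 8 are illustrative choices, not part of the script.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    # Mirrors the arguments declared in github_matrix.py's main().
    parser = argparse.ArgumentParser(
        description="Generate GitHub Actions matrix for Nix builds"
    )
    parser.add_argument(
        "-j",
        "--nb-eval-jobs-workers",
        default=os.cpu_count() or 1,
        type=int,
        help="Number of parallel eval jobs. Defaults to ncpus.",
    )
    parser.add_argument(
        "flake_outputs", nargs="+", help="Nix flake outputs to evaluate"
    )
    return parser


parser = build_parser()

# No flag: one worker per CPU reported by the machine.
default_args = parser.parse_args(["checks", "legacyPackages"])
# Flag given: the explicit value wins.
capped_args = parser.parse_args(
    ["--nb-eval-jobs-workers", "8", "checks", "legacyPackages"]
)

print(default_args.nb_eval_jobs_workers)
print(capped_args.nb_eval_jobs_workers, capped_args.flake_outputs)
```

Lowering the worker count trades evaluation speed for a smaller peak memory footprint, so it may take a few tries to find a value that fits the runner.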

## CI Walkthrough

The Nix artifacts are built from the `Nix CI` workflow defined in the `.github/workflows/nix-build.yml` file.

It runs in four steps, each depending on the previous one.

### Step 1: Eval

Conceptually, this workflow evaluates the `legacyPackages` and `checks` flake outputs using [nix-eval-jobs](https://github.com/NixOS/nix-eval-jobs). This step produces a JSON map containing the jobs to build/check for each architecture. That JSON map is consumed by the subsequent build and check steps.

Implementation-wise, most of the code lives in the `/nix/packages/github-matrix/github_matrix.py` Python script. The script starts an instance of `nix-eval-jobs` and parses its output. Each parsed job is associated with a builder tag, using the first matching rule in the following order:

1. KVM packages -> self-hosted runners
2. Large packages on Linux -> 32vcpu ephemeral runners
3. Darwin packages -> self-hosted runners
4. Default -> ephemeral runners

KVM packages and large packages are identified by the `kvm` and `big-parallel` Nix attributes, respectively.
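
The priority order above can be sketched as a small first-match function. This is not the script's actual code: the job fields (`requiredSystemFeatures`, `system`) and the returned runner names are assumptions made for illustration, and the feature names are written as `kvm` and `big-parallel`, assuming the standard Nix spelling.

```python
def select_runner(job: dict) -> str:
    """Pick a runner kind for a job; the first matching rule wins."""
    # Hypothetical job fields; the real script may read them differently.
    features = set(job.get("requiredSystemFeatures", []))
    system = job.get("system", "")

    if "kvm" in features:
        return "self-hosted"  # 1. KVM packages
    if "big-parallel" in features and system.endswith("-linux"):
        return "ephemeral-32vcpu"  # 2. Large packages on Linux
    if system.endswith("-darwin"):
        return "self-hosted"  # 3. Darwin packages
    return "ephemeral"  # 4. Default


jobs = [
    {"system": "x86_64-linux", "requiredSystemFeatures": ["kvm", "big-parallel"]},
    {"system": "aarch64-linux", "requiredSystemFeatures": ["big-parallel"]},
    {"system": "aarch64-darwin", "requiredSystemFeatures": ["big-parallel"]},
    {"system": "x86_64-linux"},
]
for job in jobs:
    print(job["system"], "->", select_runner(job))
```

Because the rules are checked in order, a job that is both KVM and large lands on a self-hosted runner, and a large Darwin package falls through to the Darwin rule.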

GHA-wise, `.github/workflows/nix-eval.yml` is called by the `nix-build.yml` workflow. `github_matrix.py` is invoked in the `Generate Nix Matrix` step through a `nix run` call. The resulting JSON map is stored in the workflow output and used by the subsequent steps.
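
To make the shape of that map more concrete, here is a minimal sketch that groups evaluated jobs per architecture. It assumes `nix-eval-jobs` emits one JSON object per line with `attr` and `system` fields, plus an `error` field when an attribute fails to evaluate; the sample attribute names and the output layout are hypothetical and may not match what `github_matrix.py` produces.

```python
import json
from collections import defaultdict

# Hypothetical sample of nix-eval-jobs output (one JSON object per line).
SAMPLE_OUTPUT = """\
{"attr": "checks.x86_64-linux.example-check", "system": "x86_64-linux"}
{"attr": "legacyPackages.aarch64-linux.example-pkg", "system": "aarch64-linux"}
{"attr": "legacyPackages.x86_64-linux.example-pkg", "system": "x86_64-linux"}
{"attr": "legacyPackages.x86_64-linux.broken", "error": "evaluation failed"}
"""


def group_by_system(output: str) -> dict:
    """Group job attribute names by target system, skipping failed evals."""
    matrix = defaultdict(list)
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        job = json.loads(line)
        if "error" in job:
            continue  # a real script would report this instead of dropping it
        matrix[job["system"]].append(job["attr"])
    return dict(matrix)


print(json.dumps(group_by_system(SAMPLE_OUTPUT), indent=2, sort_keys=True))
```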

### Step 2: Build

This step is in charge of building the various Nix packages. Build matrices are instantiated for each system architecture.

Implementation-wise, this step is less complex than the eval one. Most of the magic lies in the machine selection: the previous step attached a label to each job indicating which kind of GitHub runner it should run on.

The actual build step is a simple `nix build ${job}` invocation. The result of this build is pushed to the `nix-postgres-artifacts` S3 cache. This step is instantiated three times, once for each of the supported architectures: aarch64 darwin, aarch64 linux and x86_64 linux.

### Step 3: Check

This step reuses the JSON generated by the evaluation step to run various automated tests. Some of those require virtualization and run on the self-hosted runners that support KVM virtualization.

Implementation-wise, this step is very similar to the previous one: a matrix job is instantiated once per target architecture. It's "just" running on a different set of Nix jobs. These tests assume the various plugins have been built and are part of the Nix cache.

### Step 4: Images Build

The last step builds AMI images from the artifacts generated during step 2. It uses the `nix/packages/build-ami.nix` script to generate an AMI image based on Ubuntu Noble. The generation of the image is done in two steps.
2 changes: 2 additions & 0 deletions nix/mkdocs.yml
Expand Up @@ -38,6 +38,8 @@ nav:
- Testing PG Upgrade Scripts: testing-pg-upgrade-scripts.md
- Test Docker Images: docker-testing.md
- NixOS Tests on macOS: nixos-tests-on-macos.md
- CI:
- Nix Build Matrix: nix-build-matrix-ci.md
- References: references.md

validation:
11 changes: 8 additions & 3 deletions nix/packages/github-matrix/github_matrix.py
Expand Up @@ -280,15 +280,20 @@ def main() -> None:
     parser = argparse.ArgumentParser(
         description="Generate GitHub Actions matrix for Nix builds"
     )
+    parser.add_argument(
+        "-j",
+        "--nb-eval-jobs-workers",
+        default=os.cpu_count() or 1,
+        type=int,
+        help="Number of parallel eval jobs. Defaults to ncpus.",
+    )
     parser.add_argument(
         "flake_outputs", nargs="+", help="Nix flake outputs to evaluate"
     )
 
     args = parser.parse_args()
 
-    max_workers: int = os.cpu_count() or 1
-
-    cmd = build_nix_eval_command(max_workers, args.flake_outputs)
+    cmd = build_nix_eval_command(args.nb_eval_jobs_workers, args.flake_outputs)
 
     # Run evaluation and collect packages, warnings, and errors
     packages, warnings_list, errors_list = run_nix_eval_jobs(cmd)