Add pg_lake extension via separate Debian-based image #2
- Add pg_lake extension from Snowflake Labs to the Docker image
- Build pg_lake with all core extensions (pg_map, pg_extension_base, pg_extension_updater, pg_lake_engine, pg_lake_copy, pg_lake_iceberg, pg_lake_table, pg_lake)
- Build and include Apache Avro library required by pg_lake
- Add runtime dependencies for pg_lake (snappy, jansson, lz4, xz, zstd, libpq)
- Update README to document pg_lake extension
- Update Makefile test to verify pg_lake extension loads correctly

Note: DuckDB/pgduck_server integration is not included due to Alpine Linux compatibility constraints. The core pg_lake extensions provide Iceberg table support and data lake file access capabilities.
The pg_lake_engine extension requires gssapi/gssapi.h from the PostgreSQL server headers, which depends on the Kerberos development package.
- Revert main Dockerfile to original Alpine-based image (postgres-plus)
- Add new Dockerfile.pg_lake with Debian base for pg_lake compatibility
- Update CI workflow to build both images (postgres-plus and postgres-plus-lake)
- Update Makefile with targets for both images (build-lake, test-lake, etc.)
- Update README to document both images

The pg_lake extension requires PostgreSQL server internal headers that aren't available in Alpine-based images, so it uses a Debian (bookworm) base instead.
The parallel make was causing race conditions with the raster module even when configured with --without-raster.
CMake couldn't find libjansson because pkg-config was missing.
pg_lake requires PostgreSQL internal headers (like server/rewrite/rewriteManip.h) that are not available in pre-built packages. This follows pg_lake's official Dockerfile approach of building PostgreSQL from source.

Changes:
- Use debian:bookworm-slim as base instead of postgres:17-bookworm
- Build PostgreSQL 17.2 from source with required configure flags
- Include simple entrypoint script for database initialization
- Add all required runtime dependencies
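
One way to confirm that a build environment has the internal headers mentioned above is to probe the server include directory before compiling. The sketch below is illustrative only: `has_server_header` is a hypothetical helper, and the demo uses a scratch directory rather than a real PostgreSQL install. In a real build the directory would typically come from `pg_config --includedir-server`.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a header exists under a server include dir.
# $1 = include directory, $2 = header path relative to that directory.
has_server_header() {
    [ -f "$1/$2" ]
}

# Demo against a scratch directory standing in for the real include dir.
inc_dir=$(mktemp -d)
mkdir -p "$inc_dir/rewrite"
touch "$inc_dir/rewrite/rewriteManip.h"

has_server_header "$inc_dir" rewrite/rewriteManip.h && echo "rewriteManip.h: found"
has_server_header "$inc_dir" rewrite/not_there.h || echo "not_there.h: missing"

rm -rf "$inc_dir"
```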
Pull request overview
This PR adds support for the pg_lake extension (for Iceberg and data lake access) via a new separate Debian-based Docker image while maintaining the existing lean Alpine-based image unchanged.
Changes:
- Added a new `postgres-plus-lake` image that builds PostgreSQL 17.7 from source (required for pg_lake internal headers)
- Extended CI workflow to build and publish both Alpine and Debian images for amd64 and arm64 platforms
- Updated Makefile with parallel targets for building and testing both images
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| Dockerfile.pg_lake | New multi-stage Debian-based Dockerfile that builds PostgreSQL from source and includes all extensions plus pg_lake |
| .github/workflows/docker.yml | Extended to build both postgres-plus and postgres-plus-lake images with proper digest handling and manifest creation |
| Makefile | Added lake-specific targets (build-lake, test-lake, run-lake) alongside existing targets |
| README.md | Updated documentation to describe both images with separate usage examples |
    if [ -z "$(ls -A "$PGDATA" 2>/dev/null)" ]; then
        gosu postgres initdb --username=postgres --pwfile=<(echo "${POSTGRES_PASSWORD:-postgres}")

    # Allow connections from anywhere
Copilot
AI
Jan 11, 2026
The pg_hba.conf configuration allows MD5 password authentication from all hosts (0.0.0.0/0). While this is typical for development containers, it poses a security risk if accidentally used in production. Consider adding a comment warning that this configuration is for development/testing only, or use 'scram-sha-256' instead of 'md5' for better password security.
Suggested change:

    - # Allow connections from anywhere
    + # Allow connections from anywhere (development/testing only; not recommended for production)
    + echo "# WARNING: The following pg_hba.conf entry is intended for development/testing only and is not safe for production use." >> "$PGDATA/pg_hba.conf"
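
For the `scram-sha-256` alternative mentioned in the comment, a minimal sketch of how an entrypoint might append the rule is shown below. `append_hba_rule` is a hypothetical helper, not code from this PR, and the demo writes to a scratch file instead of a real `pg_hba.conf`.

```shell
#!/bin/sh
# Hypothetical helper: append a host rule to a pg_hba.conf file.
# $1 = path to pg_hba.conf, $2 = auth method (defaults to scram-sha-256).
append_hba_rule() {
    hba_file=$1
    auth_method=${2:-scram-sha-256}
    {
        echo "# WARNING: open to all addresses; development/testing only."
        echo "host all all 0.0.0.0/0 $auth_method"
    } >> "$hba_file"
}

# Demo: write into a scratch file and show the result.
hba_demo=$(mktemp)
append_hba_rule "$hba_demo"
cat "$hba_demo"
rm -f "$hba_demo"
```

Since PostgreSQL 14, `password_encryption` defaults to `scram-sha-256`, so a password set by `initdb` on 17.x should be stored in a form this rule can verify.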
    ./configure --prefix=${PGBASEDIR} \
        --with-openssl \
        --with-libxml \
        --with-libxslt \
        --with-icu \
        --with-uuid=ossp \
        --with-lz4 && \
Copilot
AI
Jan 11, 2026
The PostgreSQL source build is missing the '--enable-debug' flag consideration. While RelWithDebInfo is used for the avro library build (line 123), the PostgreSQL itself is built without explicitly setting build type or debug symbols. For production images, this is fine, but consider documenting why debug symbols are intentionally excluded or adding a comment about the chosen build configuration.
    fi
    digests=""
    for digest_file in "$digest_dir"/*; do
Copilot
AI
Jan 11, 2026
The for loop may fail if the digest directory contains unexpected files. The pattern "$digest_dir"/* will match all files including potential hidden files or non-digest files. Consider adding a check to verify the loop actually processes files: add [ ! -e "$digest_file" ] && continue after the for loop declaration, or use a more specific pattern that validates digest filenames.
Suggested change:

    - for digest_file in "$digest_dir"/*; do
    + for digest_file in "$digest_dir"/*; do
    +   [ ! -e "$digest_file" ] && continue
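
The guard matters most when the directory is empty: the shell then leaves the pattern `"$digest_dir"/*` unexpanded, and the loop body runs once with that literal string. The sketch below illustrates the behaviour. `collect_digests` is a hypothetical helper, and the `sha256:` prefix plus the file naming are assumptions about how the workflow stores digests.

```shell
#!/bin/sh
# Hypothetical helper: build a space-separated digest list from file names.
# With an empty directory the pattern stays literal ("$dir/*"), so the
# -e test skips that non-existent path instead of treating it as a digest.
collect_digests() {
    digest_dir=$1
    digests=""
    for digest_file in "$digest_dir"/*; do
        [ -e "$digest_file" ] || continue
        digests="$digests sha256:$(basename "$digest_file")"
    done
    # Strip the leading space before printing.
    echo "${digests# }"
}

demo_dir=$(mktemp -d)
collect_digests "$demo_dir"     # prints an empty line
touch "$demo_dir/abc123" "$demo_dir/def456"
collect_digests "$demo_dir"     # prints: sha256:abc123 sha256:def456
rm -rf "$demo_dir"
```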
    ARG POSTGIS_VERSION=3.5.1
    ARG PG_TEXTSEARCH_VERSION=0.2.0
    ARG PGSODIUM_VERSION=3.1.9
    ARG PG_LAKE_VERSION=main
Copilot
AI
Jan 11, 2026
Using 'main' branch as the default version for pg_lake is risky for production use. The 'main' branch can contain unstable or breaking changes. Consider pinning to a specific release tag or commit SHA to ensure reproducible builds and prevent unexpected breakage.
Suggested change:

    - ARG PG_LAKE_VERSION=main
    + ARG PG_LAKE_VERSION=v0.1.0
    RUN mkdir -p "$PGDATA" && chown -R postgres:postgres "$PGDATA" && chmod 1777 "$PGDATA"

    # Create run directory for socket
    RUN mkdir -p /var/run/postgresql && chown -R postgres:postgres /var/run/postgresql && chmod 3777 /var/run/postgresql
Copilot
AI
Jan 11, 2026
The chmod value '3777' sets both sticky bit and setgid bit on the socket directory. While PostgreSQL runtime directories often use '2775' or '2777', the '3777' (sticky + setgid + full permissions) is overly permissive. Consider using '2775' (setgid + rwxrwxr-x) which is more typical for PostgreSQL socket directories and provides adequate functionality with better security.
Suggested change:

    - RUN mkdir -p /var/run/postgresql && chown -R postgres:postgres /var/run/postgresql && chmod 3777 /var/run/postgresql
    + RUN mkdir -p /var/run/postgresql && chown -R postgres:postgres /var/run/postgresql && chmod 2775 /var/run/postgresql
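
To make the difference between the two modes concrete, the leading octal digit can be decoded bit by bit: 4 is setuid, 2 is setgid, 1 is sticky. The helper below, `describe_special_bits`, is a hypothetical illustration and assumes a four-digit octal mode as input.

```shell
#!/bin/sh
# Hypothetical helper: name the special bits in a 4-digit octal mode.
describe_special_bits() {
    special=${1%???}    # leading digit, e.g. "3" from "3777"
    out=""
    [ $((special & 4)) -ne 0 ] && out="$out setuid"
    [ $((special & 2)) -ne 0 ] && out="$out setgid"
    [ $((special & 1)) -ne 0 ] && out="$out sticky"
    # Strip the leading space before printing.
    echo "${out# }"
}

describe_special_bits 3777    # prints: setgid sticky
describe_special_bits 2775    # prints: setgid
```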
    IMAGE_NAME ?= constructive/postgres
    IMAGE_NAME_LAKE ?= constructive/postgres-lake
Copilot
AI
Jan 11, 2026
There's a naming inconsistency between the Makefile and the CI workflow. The Makefile uses image names 'constructive/postgres' and 'constructive/postgres-lake', but the CI workflow and README reference 'postgres-plus' and 'postgres-plus-lake'. This mismatch will cause confusion when developers try to use local Makefile commands versus pulling from the registry. Consider aligning the Makefile IMAGE_NAME to match the published names (e.g., 'constructive/postgres-plus').
Suggested change:

    - IMAGE_NAME ?= constructive/postgres
    - IMAGE_NAME_LAKE ?= constructive/postgres-lake
    + IMAGE_NAME ?= constructive/postgres-plus
    + IMAGE_NAME_LAKE ?= constructive/postgres-plus-lake
    RUN ldconfig

    # Create data directory
    RUN mkdir -p "$PGDATA" && chown -R postgres:postgres "$PGDATA" && chmod 1777 "$PGDATA"
Copilot
AI
Jan 11, 2026
The chmod value '1777' sets the sticky bit on the PGDATA directory. While this may work, PostgreSQL typically expects '0700' permissions for the data directory for security reasons. The sticky bit (1000) is usually used for shared directories like /tmp, not for database data directories. This could lead to permission issues or security concerns.
Suggested change:

    - RUN mkdir -p "$PGDATA" && chown -R postgres:postgres "$PGDATA" && chmod 1777 "$PGDATA"
    + RUN mkdir -p "$PGDATA" && chown -R postgres:postgres "$PGDATA" && chmod 0700 "$PGDATA"
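
The PostgreSQL server checks the data directory mode at startup and accepts only 0700 or 0750. A small pre-flight check along those lines could look like the sketch below. `pgdata_mode_ok` is a hypothetical helper, and it parses `ls -ld` output rather than calling PostgreSQL itself.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if a directory has a mode the
# PostgreSQL server accepts for its data directory (0700 or 0750).
pgdata_mode_ok() {
    perms=$(ls -ld "$1" | cut -c1-10)
    case "$perms" in
        drwx------|drwxr-x---) return 0 ;;
        *) return 1 ;;
    esac
}

demo_dir=$(mktemp -d)
chmod 1777 "$demo_dir"
pgdata_mode_ok "$demo_dir" || echo "1777: not an accepted data directory mode"
chmod 0700 "$demo_dir"
pgdata_mode_ok "$demo_dir" && echo "0700: accepted"
rm -rf "$demo_dir"
```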
    # If PGDATA is empty, initialize the database
    if [ -z "$(ls -A "$PGDATA" 2>/dev/null)" ]; then
        gosu postgres initdb --username=postgres --pwfile=<(echo "${POSTGRES_PASSWORD:-postgres}")
Copilot
AI
Jan 11, 2026
The entrypoint script uses process substitution with gosu which may not work reliably in all shell environments. The pwfile option with process substitution --pwfile=<(echo ...) can fail in dash or other minimal shells. Consider writing the password to a temporary file and cleaning it up, or using stdin with echo ... | gosu postgres initdb ... --pwfile=/dev/stdin for better compatibility.
Suggested change:

    - gosu postgres initdb --username=postgres --pwfile=<(echo "${POSTGRES_PASSWORD:-postgres}")
    + echo "${POSTGRES_PASSWORD:-postgres}" | gosu postgres initdb --username=postgres --pwfile=/dev/stdin
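
The temporary-file alternative mentioned in the comment can be sketched in POSIX shell as follows. `with_pwfile` is a hypothetical wrapper, and the demo substitutes `echo` for `initdb` so it runs anywhere. In an entrypoint that drops privileges (for example via `gosu`), the file would also need to be readable by the target user; that step is omitted here.

```shell
#!/bin/sh
# Hypothetical wrapper: write a password to a private temporary file, run
# the given command with --pwfile=<path> appended, then delete the file.
# $1 = password, remaining arguments = command to run.
with_pwfile() {
    password=$1
    shift
    pwfile=$(mktemp)
    chmod 600 "$pwfile"
    printf '%s\n' "$password" > "$pwfile"
    "$@" --pwfile="$pwfile"
    status=$?
    rm -f "$pwfile"
    return "$status"
}

# Demo: echo stands in for initdb and prints the arguments it would receive.
with_pwfile "example-password" echo initdb --username=postgres
```

The wrapper avoids process substitution entirely, so it does not depend on the entrypoint being run by bash.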
    # Note: pg_lake image uses Dockerfile default PG_VERSION (full version like 17.7)
    # because it builds PostgreSQL from source and needs exact version number
    cache-from: type=gha,scope=postgres-plus-lake-${{ matrix.arch }}
    cache-to: type=gha,mode=max,scope=postgres-plus-lake-${{ matrix.arch }}
Copilot
AI
Jan 11, 2026
The comment mentions that pg_lake image uses "Dockerfile default PG_VERSION (full version like 17.7)" but there's a mismatch: the workflow sets PG_VERSION='17' (major version), while Dockerfile.pg_lake defaults to PG_VERSION=17.7 (full version). Since the build doesn't pass PG_VERSION as a build-arg, the Dockerfile will always use 17.7 regardless of the workflow env var. The comments should be clarified to explain that the pg_lake build intentionally uses its hardcoded version (17.7) because it needs to download PostgreSQL source, while the Alpine image uses the workflow's major version (17) to reference pre-built postgres:17-alpine base images.
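
The relationship between the two version strings can be shown with a short sketch: the full version selects the source tarball, and its major component selects the pre-built base image tag. `pg_major` is a hypothetical helper. The workflow in this PR keeps the two values separate rather than deriving one from the other.

```shell
#!/bin/sh
# Hypothetical helper: derive the major version from a full version string.
pg_major() {
    echo "${1%%.*}"
}

PG_VERSION=17.7
echo "source build version: $PG_VERSION"
echo "base image tag: postgres:$(pg_major "$PG_VERSION")-alpine"
```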
| Extension | Description |
|-----------|-------------|
| [pgvector](https://github.com/pgvector/pgvector) | Vector similarity search for embeddings |
| [PostGIS](https://postgis.net/) | Spatial and geographic data |
| [pg_textsearch](https://www.tigerdata.com/docs/use-timescale/latest/extensions/pg-textsearch) | BM25 full-text search |
| [pgsodium](https://github.com/michelp/pgsodium) | Encryption using libsodium |
| [pg_lake](https://github.com/Snowflake-Labs/pg_lake) | Iceberg and data lake access |
Copilot
AI
Jan 11, 2026
The extension table for postgres-plus-lake duplicates all entries from postgres-plus. This creates maintenance burden as any changes to the core extensions would need to be updated in two places. Consider restructuring the documentation to list the core extensions once, then clearly indicate that postgres-plus-lake includes all of those plus pg_lake.
Summary
Adds the pg_lake extension from Snowflake Labs via a new separate Docker image. This PR creates two images:
- postgres-plus: the existing Alpine-based image, unchanged
- postgres-plus-lake: a new Debian-based image that adds pg_lake

The pg_lake image uses Debian and builds PostgreSQL 17.7 from source because pg_lake requires PostgreSQL internal headers (like server/rewrite/rewriteManip.h) that aren't available in pre-built packages.

Changes
- Dockerfile.pg_lake: Debian-based multi-stage build with PostgreSQL compiled from source
- Makefile: build-lake, test-lake, run-lake targets

Note: DuckDB/pgduck_server integration is not included due to build complexity (requires vcpkg, Azure SDK, etc.). The core pg_lake extensions still provide Iceberg table support and data lake file access.
Review & Testing Checklist for Human
- `make build-lake && make test-lake`: CI passes, but I did not test locally
- `CREATE EXTENSION pg_lake;` and basic operations (Iceberg table, Parquet file query)
- pg_lake is built from the `main` branch; may want to pin to a specific release tag

Recommended Test Plan
Notes
- … /usr/local/lib/ since it's not a PostgreSQL extension

Link to Devin run: https://app.devin.ai/sessions/1dbdc63238494ca8845442ba53c957b5
Requested by: Dan Lynch (@pyramation)