Skip to content

Add repository package discovery script#89

Draft
AntTheLimey wants to merge 1 commit into
mainfrom
feat/repo-discovery-script
Draft

Add repository package discovery script#89
AntTheLimey wants to merge 1 commit into
mainfrom
feat/repo-discovery-script

Conversation

@AntTheLimey
Copy link
Copy Markdown
Member

@AntTheLimey AntTheLimey commented May 12, 2026

Summary

  • Docker-based tool that queries pgEdge DNF (including noarch) and APT repositories
  • Diffs discovered packages against catalog.json and reports NEW, MATCH, MISSING, EXCLUDED
  • Interactive walkthrough for adding new packages to the catalog (category, name, description)
  • Shell glob exclusion patterns for non-user-facing packages (debuginfo, devel, llvmjit, etc.)
  • Self-test suite with 5 checks, no Docker required

Usage

# Discover and diff (no interactive walkthrough)
./scripts/discover-repo-packages.sh --discover-only

# Full interactive mode
./scripts/discover-repo-packages.sh

# Run self-tests
./scripts/discover-repo-packages.sh --self-test

Test plan

  • Run ./scripts/discover-repo-packages.sh --self-test — all 5 checks pass
  • Run ./scripts/discover-repo-packages.sh --discover-only — exits 0, shows MATCH/EXCLUDED with no unexpected NEW
  • Run ./scripts/discover-repo-packages.sh --discover-only --verbose — excluded packages listed individually
  • Verify scripts/output/ is gitignored and not committed

Summary by CodeRabbit

  • Chores
    • Added automated package discovery tool that identifies packages from system repositories and validates them against catalog configurations.
    • Introduced package exclusion patterns and expanded test fixtures to support discovery validation.
    • Updated build configuration to exclude generated output directories.

Review Change Stack

Docker-based tool that queries pgEdge DNF and APT repositories,
diffs discovered packages against catalog.json, and interactively
walks through adding new packages to the catalog.

Features:
- Queries both pgedge and pgedge-noarch EL repos + DEB repo
- Shell glob exclusion patterns for non-user-facing packages
- PG-version detection (standalone vs versioned)
- Meta-package membership from DEB deps (authoritative source)
- Interactive walkthrough with category/name/description prompts
- Self-test suite (5 checks, no Docker required)
- Flags: --discover-only, --verbose, --el-only, --deb-only

Run: ./scripts/discover-repo-packages.sh --discover-only
Test: ./scripts/discover-repo-packages.sh --self-test
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 12, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 141aae84-dbdb-4d6c-b1e2-43c91500a8b7

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • ✅ Review completed - (🔄 Check again to review again)
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/repo-discovery-script

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codacy-production
Copy link
Copy Markdown

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

View in Codacy

NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.

@cloudflare-workers-and-pages
Copy link
Copy Markdown

Deploying pgedge-docs with  Cloudflare Pages  Cloudflare Pages

Latest commit: df6ba61
Status: ✅  Deploy successful!
Preview URL: https://c3f13c5a.pgedge-docs.pages.dev
Branch Preview URL: https://feat-repo-discovery-script.pgedge-docs.pages.dev

View logs

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (3)
scripts/discover-repo-packages.sh (3)

1027-1027: Stale TODO marker — confirm intent before merging.

# Additional tests added in later tasks reads like a hand-off note from PR drafting. If no follow-up tests are planned for this PR, remove it; otherwise capture the work in a tracking issue so it doesn't linger.

Would you like me to open a follow-up issue describing the remaining self-test coverage (e.g., catalog version-expansion, exclusion-glob edge cases) so this comment can be removed?

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/discover-repo-packages.sh` at line 1027, The inline TODO comment "#
Additional tests added in later tasks" is stale—either remove it or replace it
with a tracked work item; confirm whether more tests will be added for self-test
coverage (catalog version-expansion, exclusion-glob edge cases) and if not
delete the line from scripts/discover-repo-packages.sh, otherwise create a
follow-up issue and replace the comment with the issue number or URL so the
intent is captured and won't linger; look for the exact string "# Additional
tests added in later tasks" to locate and update the comment.

668-669: 💤 Low value

SC2188: use : > file instead of a bare redirection.

> "$file" with no command is a quirk that works in bash but is flagged by shellcheck and isn't portable to other POSIX shells. Replace with the standard : > "$file" idiom in both discover_el and discover_deb.

♻️ Proposed fix (apply at both locations)
-    > "$output_dir/rocky9-packages.txt"
-    > "$output_dir/rocky9-metapkg-deps.txt"
+    : > "$output_dir/rocky9-packages.txt"
+    : > "$output_dir/rocky9-metapkg-deps.txt"
-    > "$output_dir/debian12-packages.txt"
-    > "$output_dir/debian12-metapkg-deps.txt"
+    : > "$output_dir/debian12-packages.txt"
+    : > "$output_dir/debian12-metapkg-deps.txt"

Also applies to: 717-718

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/discover-repo-packages.sh` around lines 668 - 669, The script uses
bare redirections like > "$output_dir/rocky9-packages.txt" (and similarly for
"$output_dir/rocky9-metapkg-deps.txt") inside the discover_el and discover_deb
logic, which ShellCheck SC2188 flags as non-portable; replace each bare
redirection with the portable no-op truncation form : >
"$output_dir/rocky9-packages.txt" and : > "$output_dir/rocky9-metapkg-deps.txt"
(and the analogous two occurrences around lines noted for discover_deb) so files
are created/truncated portably.

458-458: ⚡ Quick win

Hardcoded 18 will silently regress when a new PG major ships.

detect_pg_versions already enumerates ("16" "17" "18") (lines 89, 160), the metapkg repoquery/apt-cache depends calls target …_18 / …-18 directly, and the {ver} expansion in interactive_walkthrough uses 18 as the canonical resolution. The moment PG 19 lands, all three places will need to be updated in lockstep or membership detection will quietly miss the new variants.

Centralize this — for example a single DEFAULT_PG_VERSION constant near the top (sourced from default_pg_version in catalog.json if available), and reference it in discover_el, discover_deb, and interactive_walkthrough.

Also applies to: 660-662, 709-711

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/discover-repo-packages.sh` at line 458, Replace the hardcoded "18" by
introducing a single DEFAULT_PG_VERSION variable near the top (set it from
catalog.json's default_pg_version if present, otherwise derive the latest entry
from detect_pg_versions), then update usage sites: change the rpm_name expansion
that sets check_name (where rpm_name//\{ver\}/18), the metapkg
repoquery/apt-cache depends constructions in discover_el and discover_deb, and
the {ver} resolution in interactive_walkthrough to use ${DEFAULT_PG_VERSION};
keep detect_pg_versions unchanged as the authoritative list but reference
DEFAULT_PG_VERSION for canonical resolution so new PG majors don’t require
multiple edits.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@scripts/discover-repo-packages.sh`:
- Around line 202-211: The pkg_in_metapkg function (and the same loose check
used in interactive_walkthrough for in_full/in_minimal classification) currently
treats any dependency containing $pkg as a match, causing false positives;
change the matching logic to only accept exact tokens or clearly delimited
variants (e.g., match "$dep" == "$pkg" or "$dep" == "${pkg}_*" or split dep
tokens and compare each token exactly) so versioned or suffixed names like
pkg_... are matched intentionally and not by arbitrary substring containment;
update pkg_in_metapkg and the corresponding checks in interactive_walkthrough to
use this stricter comparison.
- Around line 53-58: The load_exclusions loop currently uses line="${line// /}"
which removes all internal spaces and ignores tabs, breaking patterns with
leading tabs; change the trimming logic in load_exclusions so you only strip
leading and trailing whitespace (spaces and tabs) and preserve internal spaces
when populating EXCLUSION_PATTERNS. Replace the single global-space removal with
a two-step trim using POSIX parameter expansion (remove leading whitespace, then
trailing whitespace) before the empty-check and appending to EXCLUSION_PATTERNS;
keep the existing comment-stripping step (line="${line%%#*}") and the rest of
the loop unchanged so is_excluded can match correctly.

---

Nitpick comments:
In `@scripts/discover-repo-packages.sh`:
- Line 1027: The inline TODO comment "# Additional tests added in later tasks"
is stale—either remove it or replace it with a tracked work item; confirm
whether more tests will be added for self-test coverage (catalog
version-expansion, exclusion-glob edge cases) and if not delete the line from
scripts/discover-repo-packages.sh, otherwise create a follow-up issue and
replace the comment with the issue number or URL so the intent is captured and
won't linger; look for the exact string "# Additional tests added in later
tasks" to locate and update the comment.
- Around line 668-669: The script uses bare redirections like >
"$output_dir/rocky9-packages.txt" (and similarly for
"$output_dir/rocky9-metapkg-deps.txt") inside the discover_el and discover_deb
logic, which ShellCheck SC2188 flags as non-portable; replace each bare
redirection with the portable no-op truncation form : >
"$output_dir/rocky9-packages.txt" and : > "$output_dir/rocky9-metapkg-deps.txt"
(and the analogous two occurrences around lines noted for discover_deb) so files
are created/truncated portably.
- Line 458: Replace the hardcoded "18" by introducing a single
DEFAULT_PG_VERSION variable near the top (set it from catalog.json's
default_pg_version if present, otherwise derive the latest entry from
detect_pg_versions), then update usage sites: change the rpm_name expansion that
sets check_name (where rpm_name//\{ver\}/18), the metapkg repoquery/apt-cache
depends constructions in discover_el and discover_deb, and the {ver} resolution
in interactive_walkthrough to use ${DEFAULT_PG_VERSION}; keep detect_pg_versions
unchanged as the authoritative list but reference DEFAULT_PG_VERSION for
canonical resolution so new PG majors don’t require multiple edits.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: aac0f51a-6f47-41ac-9b44-7a05e09693fa

📥 Commits

Reviewing files that changed from the base of the PR and between 63d9be1 and df6ba61.

📒 Files selected for processing (10)
  • .gitignore
  • scripts/discover-repo-packages.sh
  • scripts/excluded-packages.txt
  • scripts/test-fixtures/catalog-test.json
  • scripts/test-fixtures/deb-metapkg-deps.txt
  • scripts/test-fixtures/deb-packages.txt
  • scripts/test-fixtures/el-metapkg-deps.txt
  • scripts/test-fixtures/el-packages.txt
  • scripts/test-fixtures/excluded-test.txt
  • scripts/test-fixtures/expected-output.json

Comment on lines +53 to +58
while IFS= read -r line; do
line="${line%%#*}"
line="${line// /}"
[[ -z "$line" ]] && continue
EXCLUSION_PATTERNS+=("$line")
done < "$file"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

load_exclusions strips all internal spaces and leaves tabs untouched.

line="${line// /}" removes every space character anywhere in the pattern (not just leading/trailing) and does nothing about tabs. Any tab-indented entry in excluded-packages.txt keeps its leading \t, so is_excluded will never match (pgedge-radar\tpgedge-radar*). Use a proper trim that handles spaces and tabs on both ends only.

🛠 Proposed fix
     while IFS= read -r line; do
         line="${line%%#*}"
-        line="${line// /}"
+        # Trim leading/trailing whitespace (spaces and tabs) only.
+        line="${line#"${line%%[![:space:]]*}"}"
+        line="${line%"${line##*[![:space:]]}"}"
         [[ -z "$line" ]] && continue
         EXCLUSION_PATTERNS+=("$line")
     done < "$file"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
while IFS= read -r line; do
line="${line%%#*}"
line="${line// /}"
[[ -z "$line" ]] && continue
EXCLUSION_PATTERNS+=("$line")
done < "$file"
while IFS= read -r line; do
line="${line%%#*}"
# Trim leading/trailing whitespace (spaces and tabs) only.
line="${line#"${line%%[![:space:]]*}"}"
line="${line%"${line##*[![:space:]]}"}"
[[ -z "$line" ]] && continue
EXCLUSION_PATTERNS+=("$line")
done < "$file"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/discover-repo-packages.sh` around lines 53 - 58, The load_exclusions
loop currently uses line="${line// /}" which removes all internal spaces and
ignores tabs, breaking patterns with leading tabs; change the trimming logic in
load_exclusions so you only strip leading and trailing whitespace (spaces and
tabs) and preserve internal spaces when populating EXCLUSION_PATTERNS. Replace
the single global-space removal with a two-step trim using POSIX parameter
expansion (remove leading whitespace, then trailing whitespace) before the
empty-check and appending to EXCLUSION_PATTERNS; keep the existing
comment-stripping step (line="${line%%#*}") and the rest of the loop unchanged
so is_excluded can match correctly.

Comment on lines +202 to +211
pkg_in_metapkg() {
local pkg="$1"
local -n arr="$2"
for dep in "${arr[@]}"; do
if [[ "$dep" == "$pkg" || "$dep" == *"$pkg"* ]]; then
return 0
fi
done
return 1
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Substring fallback in pkg_in_metapkg produces false positives.

[[ "$dep" == *"$pkg"* ]] matches any dependency whose name contains $pkg as a substring — e.g. looking up pgedge-spock will match pgedge-spock50_18, and looking up pgedge-pg will match nearly everything. The same loose check is reused in interactive_walkthrough (lines 463–476) for in_full/in_minimal classification, so newly discovered packages can be silently mis-classified as included in full/minimal.

Consider matching exact tokens, or constrain the substring check to a well-defined prefix/suffix (e.g. "$pkg"_* for versioned variants).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/discover-repo-packages.sh` around lines 202 - 211, The pkg_in_metapkg
function (and the same loose check used in interactive_walkthrough for
in_full/in_minimal classification) currently treats any dependency containing
$pkg as a match, causing false positives; change the matching logic to only
accept exact tokens or clearly delimited variants (e.g., match "$dep" == "$pkg"
or "$dep" == "${pkg}_*" or split dep tokens and compare each token exactly) so
versioned or suffixed names like pkg_... are matched intentionally and not by
arbitrary substring containment; update pkg_in_metapkg and the corresponding
checks in interactive_walkthrough to use this stricter comparison.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant