ENH: eliminate redundancy in code to generate indices #92
+827
−58
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
also:
Co-authored with Claude
Changes Summary: Replace generate-indices.py with IDCIndexDataManager
Overview
Replaced the redundant
generate-indices.pyscript with direct invocation ofIDCIndexDataManagerin the CD workflow to eliminate code duplication and improve maintainability.Files Modified
1.
.github/workflows/cd.yml(Lines 47-53)Before:
After:
Changes:
generate-indices.pywithidc_index_data_manager.py--generate-parquetflag--output-dir release_artifactsparameterGCP_PROJECT(matches manager's default)2.
.pre-commit-config.yaml(Lines 42-49)Added exclusion for
compare_parquet.py:Changes:
scripts/python/compare_parquet.pyfrom ruff lintingscripts/python/compare_parquet.pyfrom ruff formattingFiles Deleted
scripts/python/generate-indices.py(55 lines removed)This script was a redundant wrapper that duplicated functionality already available in
IDCIndexDataManager.generate_index_data_files().Verification:
hatch_build.pyusesidc_index_data_manager.pydirectlyci.ymldoes not use this scriptMissing Functionality Analysis
Result: No missing functionality identified.
The
IDCIndexDataManagerclass provides complete feature parity:assets/andscripts/sql/directoriesprior_versions_index(saves schema without descriptions)--output-dirparameterExpected Behavior
The CD workflow will continue to:
idc_index_data_manager.pywith CLI flagsrelease_artifacts/:Benefits
Code Quality
IDCIndexDataManagerclassClarity
hatch_build.pyalready uses the managerPre-commit Hygiene
compare_parquet.pyfrom linting to prevent unrelated errors in CITesting
✅ Pre-commit checks passed for all modified files:
check yaml- Validated cd.yml syntaxprettier- Formatted YAML filesValidate GitHub Workflows- Validated workflow syntaxruff-check- Passed (with compare_parquet.py excluded)ruff-format- Passed (with compare_parquet.py excluded)Migration Notes
No migration steps required. The change is backward compatible:
release_artifacts/)