Update ingestion helper to use Cloud Run services #1999
Conversation
Code Review
This pull request migrates the import automation infrastructure from Cloud Functions to Cloud Run services, containerizing the ingestion and import helpers and consolidating aggregation logic. Several critical issues were identified: the aggregation_utils.py file is missing from the PR, the SPANNER_GRAPH_DATABASE_ID environment variable is not configured for the new service, and removing DDL management from Terraform will break database initialization. Furthermore, the build process is brittle due to remote schema fetching, and a stale default URL persists in the update script.
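One of the flagged issues, the missing `SPANNER_GRAPH_DATABASE_ID` environment variable, could be addressed in the service configuration. A minimal sketch, assuming the service name, region, and database ID are placeholders (the PR's real values are not shown here):

```
gcloud run services update ingestion-helper \
  --region=us-central1 \
  --update-env-vars=SPANNER_GRAPH_DATABASE_ID=dc-graph-db
```

The same variable would also need to be set wherever the Terraform service definition is declared, so the two sources of configuration don't drift.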
```dockerfile
# Fetch proto file from GitHub
RUN curl -o storage.proto https://raw.githubusercontent.com/datacommonsorg/import/master/pipeline/data/src/main/proto/storage.proto

RUN curl -o schema.sql https://raw.githubusercontent.com/datacommonsorg/import/master/pipeline/spanner/src/main/resources/spanner_schema.sql
```
I think you should remove this. The point is we want to use the schema.sql in this directory. With this line, you're overriding the schema.sql with what's in the import/ repo.
And FYI - for the June 15th milestone, we will be switching over to the new schema, which I believe will no longer need this:

```dockerfile
RUN curl -o storage.proto https://raw.githubusercontent.com/datacommonsorg/import/master/pipeline/data/src/main/proto/storage.proto
```

So we will be able to remove it within a month, and the schema.sql will be the complete schema.
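Acting on this feedback would mean copying the local schema.sql into the image instead of fetching it remotely. A hedged sketch, where the destination path is an assumption about the image layout:

```dockerfile
# Use the schema.sql checked into this directory rather than the remote copy
COPY schema.sql /app/schema.sql

# storage.proto is still fetched remotely until the new-schema milestone
# removes the need for it
RUN curl -o storage.proto https://raw.githubusercontent.com/datacommonsorg/import/master/pipeline/data/src/main/proto/storage.proto
```

Copying from the build context also removes the brittleness of a remote fetch at build time, since the image no longer depends on the import/ repo's master branch being reachable and unchanged.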
gmechali
left a comment
Just one change needed for the schema download in the Dockerfile, but otherwise LGTM.
Pls share this with Sandeep as well, so he doesn't waste time starting from an old branch!
Updated the ingestion workflow, Cloud Build, and Terraform scripts to use Cloud Run services instead of Cloud Functions for the helpers.
Moved the aggregation helper logic into the ingestion helper to consolidate everything under a single Docker image. We can always deploy them as independent services if required.
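One way to consolidate the two helpers under a single image while keeping a later split cheap is route-based dispatch inside one service. The sketch below is purely illustrative; the handler names, routes, and payload shapes are hypothetical, not the PR's actual code:

```python
# Hypothetical sketch of one service hosting both helpers.
# Each handler stays self-contained, so it could later be deployed
# as an independent Cloud Run service without restructuring.

def ingest(payload: dict) -> dict:
    # Placeholder for the ingestion helper's real work.
    return {"status": "ingested", "rows": len(payload.get("rows", []))}

def aggregate(payload: dict) -> dict:
    # Placeholder for the aggregation helper's real work.
    return {"status": "aggregated", "total": sum(payload.get("values", []))}

# The routing table is the only shared surface between the helpers.
ROUTES = {"/ingest": ingest, "/aggregate": aggregate}

def handle(path: str, payload: dict):
    """Dispatch a request path to the matching helper."""
    handler = ROUTES.get(path)
    if handler is None:
        return 404, {"error": f"unknown route {path}"}
    return 200, handler(payload)
```

Because each helper only touches its own entry in the routing table, splitting them into independent services later is a matter of deploying each handler behind its own HTTP entrypoint.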