Update ingestion helper to use cloud run services by vish-cs · Pull Request #1999 · datacommonsorg/data

vish-cs · 2026-05-11T10:51:11Z

Updated the ingestion workflow, Cloud Build, and Terraform scripts to use Cloud Run services instead of Cloud Functions for the helpers.
Moved the aggregation helper logic into the ingestion helper to consolidate both under a single Docker image. We can still deploy them as independent services if required.

gemini-code-assist

Code Review

This pull request migrates the import automation infrastructure from Cloud Functions to Cloud Run services, containerizing the ingestion and import helpers and consolidating aggregation logic. Several critical issues were identified: the aggregation_utils.py file is missing from the PR, the SPANNER_GRAPH_DATABASE_ID environment variable is not configured for the new service, and removing DDL management from Terraform will break database initialization. Furthermore, the build process is brittle due to remote schema fetching, and a stale default URL persists in the update script.

gmechali · 2026-05-11T16:33:16Z

 # Fetch proto file from GitHub
 RUN curl -o storage.proto https://raw.githubusercontent.com/datacommonsorg/import/master/pipeline/data/src/main/proto/storage.proto

+RUN curl -o schema.sql https://raw.githubusercontent.com/datacommonsorg/import/master/pipeline/spanner/src/main/resources/spanner_schema.sql


I think you should remove this. The point is that we want to use the schema.sql in this directory. With this line, you're overwriting schema.sql with the version in the import/ repo.

And FYI: for the June 15th milestone, we will be switching over to the new schema, which I believe will no longer need this line:
RUN curl -o storage.proto https://raw.githubusercontent.com/datacommonsorg/import/master/pipeline/data/src/main/proto/storage.proto
So we should be able to remove it within a month, at which point schema.sql will be the complete schema.
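A minimal sketch of how the Dockerfile fragment might look after the requested change: the schema.sql download is dropped and the checked-in file is used instead. The COPY line is a hypothetical placement that assumes the build context is the helper directory containing schema.sql; the real Dockerfile may already copy it as part of a broader COPY. The sketch also adds curl's -f flag so that an HTTP error fails the build rather than silently saving an error page as storage.proto, which partly addresses the brittleness of remote fetching raised in the automated review.

```dockerfile
# Fetch proto file from GitHub (expected to be removable after the June 15th schema switch).
# -f makes curl exit non-zero on HTTP errors, so a failed download breaks the build.
RUN curl -fsSL -o storage.proto https://raw.githubusercontent.com/datacommonsorg/import/master/pipeline/data/src/main/proto/storage.proto

# No remote download of schema.sql: use the copy checked into this directory.
# Hypothetical placement; assumes the build context is the helper directory.
COPY schema.sql schema.sql
```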

gmechali

Just one change needed for the schema download in the Dockerfile, but otherwise LGTM.
Pls share this with Sandeep as well, so he doesn't waste time starting from an old branch!

gemini-code-assist Bot reviewed May 11, 2026

vish-cs force-pushed the workflow branch from 75c8edd to fc6e1ec on May 11, 2026, 11:10

vish-cs requested a review from gmechali May 11, 2026 11:18

gmechali reviewed May 11, 2026

Comment thread import-automation/workflow/ingestion-helper/schema.sql

Update ingestion helper to use cloud run services

c52f0df

vish-cs force-pushed the workflow branch from fc6e1ec to c52f0df on May 11, 2026, 15:15

gmechali reviewed May 11, 2026

Comment thread import-automation/workflow/import-helper/Dockerfile

gmechali reviewed May 11, 2026

gmechali approved these changes May 11, 2026

vish-cs wants to merge 1 commit into datacommonsorg:master from vish-cs:workflow
