
feat(nvd): use go to upload NVD conversion to gcs upon conversion #5099

Merged
jess-lowe merged 34 commits into google:master from jess-lowe:refactor/nvd-use-gcs
May 5, 2026

Conversation

@jess-lowe
Contributor

@jess-lowe jess-lowe commented Mar 20, 2026

This pull request introduces a Go-native asynchronous Google Cloud Storage (GCS) upload pipeline for the vulnerability converters (nvd-cve-osv and cve-bulk-converter). It moves the GCS upload and synchronization logic out of external bash scripts (gcloud storage rsync and sequential CLI wrappers) and into a concurrent, thread-safe Go worker pool. The change is intended to improve execution speed, reduce resource consumption, and allow in-memory caches to be shared across multiple years.

The refactoring also decouples the core conversion logic (nvd.CVEToOSV) from its file-system write side effects, which makes the code more modular and easier to test.


Key Enhancements & Architecture

1. Highly Concurrent Go-Native GCS Uploader (gcs Package)

  • Worker Pool Architecture: Introduced the gcs.Helper struct which manages a pool of concurrent goroutines (bucketWorker) communicating via an internal upload queue channel.
  • Hash-Based Smart Uploading: Before pushing any JSON payload to GCS, the worker checks the sha256-hash metadata attribute of the existing object on GCS. If it matches the computed SHA256 of the new record, the upload is skipped. This saves network bandwidth, reduces the number of GCS operations, and shortens the run.
  • Resource Cleanup: Context handling and graceful termination via CloseAndWait() ensure that all active writers and storage client handles are cleanly shut down.
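
The worker pool and hash check described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `objectStore`, `pool`, `enqueue`, `closeAndWait`, and `runDemo` are hypothetical names, and an in-memory map stands in for the GCS bucket and its sha256-hash metadata so the example runs without credentials.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sync"
)

// objectStore is a hypothetical in-memory stand-in for a GCS bucket.
// It only tracks the "sha256-hash" metadata value per object name.
type objectStore struct {
	mu      sync.Mutex
	hashes  map[string]string // object name -> stored hash
	uploads int               // number of writes actually performed
}

// uploadIfChanged stores data under name unless the stored hash already
// matches the hash of data. It reports whether an upload happened.
func (s *objectStore) uploadIfChanged(name string, data []byte) bool {
	sum := sha256.Sum256(data)
	h := hex.EncodeToString(sum[:])
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.hashes[name] == h {
		return false // identical content: skip the upload
	}
	s.hashes[name] = h
	s.uploads++
	return true
}

type job struct {
	name string
	data []byte
}

// pool is a simplified, hypothetical analogue of a worker-pool uploader.
type pool struct {
	queue chan job
	wg    sync.WaitGroup
}

// newPool starts the given number of workers draining the queue.
func newPool(store *objectStore, workers int) *pool {
	p := &pool{queue: make(chan job, 64)}
	for i := 0; i < workers; i++ {
		p.wg.Add(1)
		go func() {
			defer p.wg.Done()
			for j := range p.queue {
				store.uploadIfChanged(j.name, j.data)
			}
		}()
	}
	return p
}

func (p *pool) enqueue(name string, data []byte) { p.queue <- job{name, data} }

// closeAndWait stops accepting work and blocks until the queue is drained.
func (p *pool) closeAndWait() {
	close(p.queue)
	p.wg.Wait()
}

// runDemo pushes the same records through the pool twice and returns how
// many uploads were actually performed.
func runDemo() int {
	store := &objectStore{hashes: map[string]string{}}
	records := map[string]string{
		"CVE-2024-0001.json": `{"id":"CVE-2024-0001"}`,
		"CVE-2024-0002.json": `{"id":"CVE-2024-0002"}`,
	}
	for pass := 0; pass < 2; pass++ {
		p := newPool(store, 4)
		for name, body := range records {
			p.enqueue(name, []byte(body))
		}
		p.closeAndWait()
	}
	return store.uploads
}

func main() {
	fmt.Println("uploads performed:", runDemo())
}
```

In this sketch the second pass performs no uploads because every stored hash already matches. Against a real bucket the comparison would presumably read the object's metadata before writing, and that check-then-write is not atomic the way the mutex makes it here.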

2. Unified Local & Cloud Output Manager (writer Package)

  • Replaced the deprecated upload package with a modern writer package that encapsulates all output-oriented operations.
  • Employs a centralized VulnWorker which handles:
    • Enqueuing asynchronous uploads to GCS via gcs.Helper.
    • Falling back to local disk writes cleanly.
    • Fetching and applying overrides from an overrides GCS bucket path.
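
One way to picture the "upload if configured, otherwise write locally" routing is the sketch below. It is an assumption-laden illustration rather than the writer package itself: `uploader`, `memUploader`, `output`, `demoLocal`, and `demoRemote` are hypothetical names, and override handling is left out because its exact semantics are not spelled out here.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// uploader is a hypothetical interface for whatever enqueues GCS uploads.
type uploader interface {
	Enqueue(name string, data []byte)
}

// memUploader records enqueued object names; it stands in for a real uploader.
type memUploader struct{ names []string }

func (m *memUploader) Enqueue(name string, data []byte) {
	m.names = append(m.names, name)
}

// output routes a record to the uploader when one is set, and otherwise
// writes it to local disk under dir.
type output struct {
	up  uploader // nil means "write locally"
	dir string
}

func (o output) write(name string, data []byte) error {
	if o.up != nil {
		o.up.Enqueue(name, data)
		return nil
	}
	path := filepath.Join(o.dir, name)
	if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
		return err
	}
	return os.WriteFile(path, data, 0o644)
}

// demoLocal writes one record with no uploader configured and reads it back.
func demoLocal() (string, error) {
	dir, err := os.MkdirTemp("", "osv-out")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	o := output{dir: dir}
	if err := o.write("2024/CVE-2024-0001.json", []byte(`{"id":"CVE-2024-0001"}`)); err != nil {
		return "", err
	}
	b, err := os.ReadFile(filepath.Join(dir, "2024", "CVE-2024-0001.json"))
	return string(b), err
}

// demoRemote writes one record with an uploader configured and returns how
// many objects were enqueued.
func demoRemote() int {
	m := &memUploader{}
	o := output{up: m}
	_ = o.write("CVE-2024-0001.json", []byte("{}"))
	return len(m.names)
}

func main() {
	content, err := demoLocal()
	fmt.Println("local:", content, err)
	fmt.Println("enqueued:", demoRemote())
}
```

Keeping the routing decision in one place means the converters only hand over a name and bytes, which is the kind of separation the decoupling in this PR aims for.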

3. Decoupled & Optimized NVD CVE Conversion

  • Side-Effect Elimination: Modified nvd.CVEToOSV to return a vulns.Vulnerability object instead of writing directly to disk, which makes it easier to unit test.
  • Shared In-Memory Caching: The year-by-year loop has been moved from the shell script into Go (main.go). The nvd-cve-osv tool now processes the entire nvd-json-dir directory in a single execution run (processing files in reverse chronological order). This enables the vendor-product repo cache (vpRepoCache) and git repo tags cache (repoTagsCache) to be shared across all years, avoiding redundant git listings and fetching overhead.
  • Deterministic JSON & Hashes: To prevent calculated SHA256 hashes from fluctuating due to map iteration randomness, v.Affected records are now sorted alphabetically by repository name before serializing, guaranteeing reliable cache hit rates.
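
The idea behind the deterministic hashing can be sketched as below. The `record` and `affected` types are deliberately simplified, hypothetical stand-ins for the real vulnerability structures, and `stableHash` is an illustrative name. The only point shown is that sorting by repository name before serializing makes the SHA256 independent of the order in which entries were collected.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"sort"
)

// affected and record are simplified, hypothetical shapes for illustration.
type affected struct {
	Repo     string   `json:"repo"`
	Versions []string `json:"versions,omitempty"`
}

type record struct {
	ID       string     `json:"id"`
	Affected []affected `json:"affected"`
}

// stableHash sorts a copy of the affected entries by repository name,
// serializes the record, and returns the hex-encoded SHA256 of the JSON.
// The caller's slice is left untouched.
func stableHash(r record) (string, error) {
	sorted := append([]affected(nil), r.Affected...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Repo < sorted[j].Repo
	})
	r.Affected = sorted // r is a copy, so this does not affect the caller
	b, err := json.Marshal(r)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	a := record{ID: "CVE-2024-0001", Affected: []affected{
		{Repo: "https://example.com/zlib"},
		{Repo: "https://example.com/curl"},
	}}
	b := record{ID: "CVE-2024-0001", Affected: []affected{
		{Repo: "https://example.com/curl"},
		{Repo: "https://example.com/zlib"},
	}}
	ha, _ := stableHash(a)
	hb, _ := stableHash(b)
	fmt.Println("hashes equal:", ha == hb)
}
```

Without the sort, two runs that gather the same entries in different orders would produce different bytes, so the hash comparison on upload would miss and the object would be rewritten unnecessarily.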

4. Drastically Simplified Orchestration Scripts

  • Removed Complex Shell Logic: Eliminated the sequential year-by-year bash loop, local staging directories (gcs_stage), manual find -exec cp operations, and the gcloud storage rsync command pipelines.
  • Streamlined Execution: Both NVD and CVE5 bulk converters now invoke their corresponding Go binaries once, passing --upload-to-gcs=true along with the target bucket name, allowing the Go runtime to manage the entire lifecycle seamlessly.
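
For readers unfamiliar with how a single invocation might carry these options, here is a rough sketch using Go's standard flag package. Only `--upload-to-gcs` is taken from the description above. `nvd-json-dir` is assumed to be a flag because the description names it as a directory, `output-bucket` is a hypothetical name for the bucket flag, and `parseArgs` and `config` are illustrative.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
)

// config holds the options a converter binary might accept.
type config struct {
	uploadToGCS bool
	bucket      string // hypothetical flag: --output-bucket
	jsonDir     string // assumed flag: --nvd-json-dir
}

// parseArgs parses command-line arguments into a config and rejects the
// combination "upload requested but no bucket given".
func parseArgs(args []string) (config, error) {
	var c config
	fs := flag.NewFlagSet("converter", flag.ContinueOnError)
	fs.BoolVar(&c.uploadToGCS, "upload-to-gcs", false, "upload results to GCS instead of local disk")
	fs.StringVar(&c.bucket, "output-bucket", "", "target GCS bucket (hypothetical flag name)")
	fs.StringVar(&c.jsonDir, "nvd-json-dir", "", "directory containing NVD JSON files")
	if err := fs.Parse(args); err != nil {
		return c, err
	}
	if c.uploadToGCS && c.bucket == "" {
		return c, errors.New("--upload-to-gcs requires a bucket")
	}
	return c, nil
}

func main() {
	// Fixed arguments so the sketch runs the same way every time.
	c, err := parseArgs([]string{
		"--upload-to-gcs=true",
		"--output-bucket=example-bucket",
		"--nvd-json-dir=/tmp/nvd",
	})
	fmt.Println(c.uploadToGCS, c.bucket, c.jsonDir, err)
}
```

Boolean flags in the flag package take the `=true` form shown in the scripts, since a separate `true` argument would be read as a positional argument.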

Verification & Testing

Automated Unit Tests

A comprehensive suite of unit and integration tests has been implemented and verified to pass:

  • GCS Worker Pool Tests: vulnfeeds/gcs-tools/gcs_test.go covers worker initialization, concurrent scheduling, and object metadata validation.
  • Writer Tests: vulnfeeds/conversion/writer/writer_test.go validates concurrent local disk writes, remote GCS uploads, override resolution, and metrics generation.
  • Converter Tests: Verified that decoupled NVD and CVE5 conversion tests pass without side-effects.

To run the tests locally:

cd vulnfeeds
go test ./gcs-tools/...
go test ./conversion/writer/...
go test ./conversion/nvd/...

Manual Verification

  • Ran the run_cve_to_osv_generation.sh script locally with --upload-to-gcs=true to confirm:
    1. Correct execution flow across all years.
    2. Successful caching and lookup of CPE and git repositories.
    3. Parallel execution of 30 concurrent uploader workers enqueuing and transferring objects.
    4. Skipping of identical records where GCS object hashes matched computed local hashes.

@jess-lowe jess-lowe requested review from another-rex and michaelkedar and removed request for another-rex March 20, 2026 03:09
Comment thread vulnfeeds/gcs-tools/gcs.go
Comment thread vulnfeeds/gcs-tools/gcs.go Outdated
@jess-lowe jess-lowe requested a review from another-rex March 20, 2026 05:44
jess-lowe added a commit that referenced this pull request Apr 30, 2026
The nvd-cve-osv cron job doesn't seem to be finishing successfully: it is
currently taking forever to upload and go through threshold checks. This should
hopefully speed things up while waiting for #5099
Contributor

@another-rex another-rex left a comment

Nice, mostly looks good. Have you tested it locally and seen how much faster it is compared to the script? (Probably not a big impact here, since locally we have a lot of threads compared to the cronjob.)

Comment thread vulnfeeds/cmd/converters/cve/nvd-cve-osv/main.go Outdated
Comment thread vulnfeeds/conversion/writer/writer.go Outdated
Comment thread vulnfeeds/conversion/writer/writer.go Outdated
Comment thread vulnfeeds/conversion/writer/writer.go
another-rex previously approved these changes May 5, 2026
Contributor

@another-rex another-rex left a comment

Please update the PR description. Otherwise LGTM

@jess-lowe jess-lowe requested a review from another-rex May 5, 2026 03:31
@jess-lowe jess-lowe merged commit 0f1707e into google:master May 5, 2026
21 checks passed
@jess-lowe jess-lowe deleted the refactor/nvd-use-gcs branch May 5, 2026 04:56
2 participants