feat(nvd): use go to upload NVD conversion to gcs upon conversion #5099
Merged
jess-lowe merged 34 commits into google:master on May 5, 2026
another-rex reviewed Mar 20, 2026
…sv.dev into refactor/nvd-use-gcs
jess-lowe added a commit that referenced this pull request Apr 30, 2026
nvd-cve-osv Cron job doesn't seem to be successfully finishing - it is currently taking forever to upload and go threshold checks. This should speed things up hopefully, while waiting for #5099
another-rex reviewed May 1, 2026
Contributor another-rex left a comment:
Nice, mostly looks good. Have you tested it locally to see how much faster it is compared to the script? (Probably not a big impact here, since locally we have a lot of threads compared to the cronjob.)
another-rex reviewed May 1, 2026
another-rex previously approved these changes May 5, 2026
Contributor another-rex left a comment:
Please update the PR description. Otherwise LGTM.
another-rex approved these changes May 5, 2026
This pull request introduces a Go-native, asynchronous Google Cloud Storage (GCS) upload pipeline for the vulnerability converters (`nvd-cve-osv` and `cve-bulk-converter`). It moves the GCS upload and synchronization logic out of external bash scripts (`gcloud storage rsync` and sequential CLI wrappers) and into a concurrent, thread-safe Go worker pool. The change is intended to speed up execution, reduce resource consumption, and enable unified in-memory caching across multiple years.

Additionally, this refactoring decouples the core conversion logic (`nvd.CVEToOSV`) from its file-system write side effects, making the system cleaner, more modular, and easier to test.

Key Enhancements & Architecture
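As a rough picture of the worker-pool design summarized above and detailed in section 1 below, the following is a minimal sketch, not the code from this PR. It reuses the names `Helper`, `bucketWorker`, `CloseAndWait`, and the `sha256-hash` metadata key from the description, but their real signatures are not shown there, so everything else (`fakeBucket`, `object`, `job`, `NewHelper`, `Upload`, the counters) is a hypothetical stand-in. An in-memory map replaces the real GCS client so that the example runs with the standard library alone.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sync"
)

// hashKey is the metadata attribute named in the PR description.
const hashKey = "sha256-hash"

// object is a stand-in for a stored GCS object: its bytes plus metadata.
type object struct {
	data     []byte
	metadata map[string]string
}

// fakeBucket is a hypothetical in-memory substitute for a GCS bucket.
type fakeBucket struct {
	mu      sync.Mutex
	objects map[string]object
}

// job is one queued upload request (hypothetical type).
type job struct {
	name string
	data []byte
}

// Helper owns the upload queue and the worker goroutines.
type Helper struct {
	bucket   *fakeBucket
	queue    chan job
	wg       sync.WaitGroup
	mu       sync.Mutex // guards the two counters below
	uploaded int
	skipped  int
}

// NewHelper starts the requested number of workers on a shared queue.
func NewHelper(b *fakeBucket, workers int) *Helper {
	h := &Helper{bucket: b, queue: make(chan job, 64)}
	for i := 0; i < workers; i++ {
		h.wg.Add(1)
		go h.bucketWorker()
	}
	return h
}

// bucketWorker drains the queue and skips any object whose stored
// sha256-hash metadata already matches the hash of the new payload.
func (h *Helper) bucketWorker() {
	defer h.wg.Done()
	for j := range h.queue {
		sum := sha256.Sum256(j.data)
		digest := hex.EncodeToString(sum[:])

		// A real uploader would read the object attributes over the network
		// here; the lock only protects the in-memory map of this fake bucket.
		h.bucket.mu.Lock()
		existing, ok := h.bucket.objects[j.name]
		unchanged := ok && existing.metadata[hashKey] == digest
		if !unchanged {
			h.bucket.objects[j.name] = object{
				data:     j.data,
				metadata: map[string]string{hashKey: digest},
			}
		}
		h.bucket.mu.Unlock()

		h.mu.Lock()
		if unchanged {
			h.skipped++
		} else {
			h.uploaded++
		}
		h.mu.Unlock()
	}
}

// Upload enqueues one object; it blocks only when the queue is full.
func (h *Helper) Upload(name string, data []byte) {
	h.queue <- job{name: name, data: data}
}

// CloseAndWait stops accepting work and blocks until every worker exits.
func (h *Helper) CloseAndWait() (uploaded, skipped int) {
	close(h.queue)
	h.wg.Wait()
	return h.uploaded, h.skipped
}

func main() {
	b := &fakeBucket{objects: map[string]object{}}

	first := NewHelper(b, 4)
	first.Upload("CVE-0000-0001.json", []byte(`{"id":"CVE-0000-0001"}`))
	first.Upload("CVE-0000-0002.json", []byte(`{"id":"CVE-0000-0002"}`))
	up, sk := first.CloseAndWait()
	fmt.Println("first run: uploaded", up, "skipped", sk)

	// Second run: one record is unchanged, the other has new content.
	second := NewHelper(b, 4)
	second.Upload("CVE-0000-0001.json", []byte(`{"id":"CVE-0000-0001"}`))
	second.Upload("CVE-0000-0002.json", []byte(`{"id":"CVE-0000-0002","v":2}`))
	up, sk = second.CloseAndWait()
	fmt.Println("second run: uploaded", up, "skipped", sk)
}
```

In this sketch the second run uploads only the record whose bytes changed. The skip works only if serialization is byte-for-byte stable between runs, since any change in the bytes changes the SHA256.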
1. Highly Concurrent Go-Native GCS Uploader (`gcs` Package)

- Adds a `gcs.Helper` struct, which manages a pool of concurrent goroutines (`bucketWorker`) communicating via an internal upload queue channel.
- Before uploading, checks the `sha256-hash` metadata attribute of the existing object on GCS. If it matches the computed SHA256 of the new record, the upload is skipped. This saves network bandwidth, operation count, and time.
- `CloseAndWait()` ensures all active writers and storage client handles are cleanly shut down.

2. Unified Local & Cloud Output Manager (`writer` Package)

- Replaces the `upload` package with a `writer` package that encapsulates all output-oriented operations.
- `VulnWorker`, which handles: … `gcs.Helper`.

3. Decoupled & Optimized NVD CVE Conversion
- Changes `nvd.CVEToOSV` to return a `vulns.Vulnerability` object rather than writing directly to disk, which facilitates unit testing.
- … (`main.go`). The `nvd-cve-osv` tool now processes the entire `nvd-json-dir` directory in a single run, handling files in reverse chronological order. This lets the vendor-product repo cache (`vpRepoCache`) and the git repo tags cache (`repoTagsCache`) be shared across all years, avoiding redundant git listings and fetch overhead.
- `v.Affected` records are now sorted alphabetically by repository name before serializing, guaranteeing reliable cache hit rates.

4. Drastically Simplified Orchestration Scripts
- … (`gcs_stage`), manual `find -exec cp` operations, and the `gcloud storage rsync` command pipelines.
- Scripts now pass `--upload-to-gcs=true` along with the target bucket name, allowing the Go runtime to manage the entire lifecycle.

Verification & Testing
Automated Unit Tests
A comprehensive suite of unit and integration tests has been implemented and verified to pass:
- `vulnfeeds/gcs-tools/gcs_test.go` covers worker initialization, concurrent scheduling, and object metadata validation.
- `vulnfeeds/conversion/writer/writer_test.go` validates concurrent local disk writes, remote GCS uploads, override resolution, and metrics generation.

To run the tests locally:
Manual Verification
- Run the `run_cve_to_osv_generation.sh` script locally with `--upload-to-gcs=true` to confirm: