
Add V2_importer to collect advisories from EUVD #2046

Open
Samk1710 wants to merge 5 commits into aboutcode-org:main from Samk1710:add-euvd-importer


@Samk1710
Contributor

Samk1710 commented Nov 26, 2025

Data Source

  • API: https://euvdservices.enisa.europa.eu/api/search
  • Format: JSON with pagination

Dependent on :

Logs:

INFO 2025-11-25 23:53:04.724015 UTC Progress: 100% (452459/452459)
INFO 2025-11-25 23:53:04.735236 UTC Successfully collected 452,441 advisories
INFO 2025-11-25 23:53:04.735444 UTC Step [collect_and_store_advisories] completed in 23064 seconds (6.4 hours)
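
As a quick sanity check on the figures in the log above, the elapsed time and an approximate collection rate can be recomputed from the logged numbers. This is plain arithmetic on the two values shown (23064 seconds and 452,441 advisories); the variable names are only illustrative.

```python
# Values copied from the log lines above.
elapsed_seconds = 23064
collected_advisories = 452_441

# Convert the step duration to hours and derive an average rate.
elapsed_hours = elapsed_seconds / 3600
advisories_per_second = collected_advisories / elapsed_seconds

print(f"{elapsed_hours:.1f} hours")  # prints "6.4 hours"
print(f"about {advisories_per_second:.1f} advisories per second")
```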

Samk1710 mentioned this pull request Nov 26, 2025
Signed-off-by: Sampurna Pyne <sampurnapyne1710@gmail.com>
ziadhany self-requested a review November 26, 2025 12:21
Signed-off-by: Sampurna Pyne <sampurnapyne1710@gmail.com>
@ziadhany
Collaborator

ziadhany left a comment

@Samk1710 Great start! Just a few small tweaks

Comment thread vulnerabilities/pipelines/v2_importers/euvd_importer.py Outdated
Comment thread vulnerabilities/tests/pipelines/v2_importers/test_euvd_importer_v2.py Outdated
Comment thread vulnerabilities/pipelines/v2_importers/euvd_importer.py
Comment thread vulnerabilities/pipelines/v2_importers/euvd_importer.py Outdated
Comment thread vulnerabilities/pipelines/v2_importers/euvd_importer.py Outdated
@Samk1710
Contributor Author

> @Samk1710 Great start! Just a few small tweaks

Thanks a lot @ziadhany for the review. Will make the changes as suggested.

Signed-off-by: Sampurna Pyne <sampurnapyne1710@gmail.com>
@Samk1710
Contributor Author

Hey @ziadhany ,

I’ve pushed the requested updates. Summary of changes:

  • Replaced broad exception handling with specific parsing-related exceptions.
  • Updated the test to use util_tests.check_results_against_json along with expected JSON fixtures.
  • Removed unbounded loops by using the API’s total count field to determine total number of pages.
  • Simplified retry handling. It retains a fixed 3-second delay between retries to prevent rapid page skipping in case of network interruptions.
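
For readers following along, the bounded pagination and fixed-delay retry described in the list above could look roughly like the sketch below. It is not the code in this PR: `fetch_page` is a hypothetical caller-supplied callable that returns one decoded JSON page, the `items` key, the page size of 100, and the retry budget of 3 are assumptions, and only the `total` field and the 3-second delay come from this thread.

```python
import math
import time

PAGE_SIZE = 100          # assumed page size, not confirmed in this thread
RETRY_DELAY_SECONDS = 3  # fixed delay mentioned in the summary above
MAX_RETRIES = 3          # hypothetical retry budget


def fetch_with_retry(fetch_page, page, retries=MAX_RETRIES,
                     delay=RETRY_DELAY_SECONDS, sleep=time.sleep):
    """Call fetch_page(page), waiting a fixed delay between failed attempts."""
    for attempt in range(1, retries + 1):
        try:
            return fetch_page(page)
        except ConnectionError:
            # Re-raise once the budget is spent instead of silently skipping the page.
            if attempt == retries:
                raise
            sleep(delay)


def iter_advisories(fetch_page, page_size=PAGE_SIZE, sleep=time.sleep):
    """Yield items from every page, using the reported total to bound the loop."""
    first = fetch_with_retry(fetch_page, 0, sleep=sleep)
    total = int(first.get("total", 0))
    total_pages = math.ceil(total / page_size)

    yield from first.get("items", [])
    # Page 0 is already consumed; the range is finite, so there is no unbounded loop.
    for page in range(1, total_pages):
        data = fetch_with_retry(fetch_page, page, sleep=sleep)
        yield from data.get("items", [])
```

Passing `sleep` as a parameter keeps the delay testable without actually waiting; a real importer would wrap its HTTP call in `fetch_page`.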

Let me know if you’d like any additional modifications. Thanks again for the feedback and guidance!

Comment thread vulnerabilities/pipelines/v2_importers/euvd_importer.py Outdated
Signed-off-by: Sampurna Pyne <sampurnapyne1710@gmail.com>
@Samk1710
Contributor Author

Samk1710 commented Dec 1, 2025

I have updated the License Expression and added sample test data from the EUVD API as suggested in today's call.

@Samk1710
Contributor Author

Samk1710 commented Dec 1, 2025

Hey @pombredanne @ziadhany
Since caching is not a feasible option, we could use the "total" field returned by the API to populate the total advisories count in the advisories_count function.

"total": 452844

This would mean the importer would take 5-6 hours and fetch the data only once, in collect_advisories.
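
A minimal sketch of that idea, assuming the count can be read from a single small request: `advisories_count` here is a standalone function rather than the pipeline method, `fetch_json` is a hypothetical injectable callable, and the `page` and `size` query parameters are assumptions about the API that this thread does not confirm. Only the URL and the `total` field come from the discussion above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SEARCH_URL = "https://euvdservices.enisa.europa.eu/api/search"


def total_from_response(payload):
    """Return the "total" field of a search response, validating its type."""
    total = payload.get("total")
    if not isinstance(total, int) or total < 0:
        raise ValueError(f"unexpected 'total' value: {total!r}")
    return total


def fetch_first_page():
    """Fetch one small page; the page/size parameter names are assumed."""
    query = urlencode({"page": 0, "size": 1})
    with urlopen(f"{SEARCH_URL}?{query}", timeout=30) as response:
        return json.load(response)


def advisories_count(fetch_json=fetch_first_page):
    """Report the advisory count without downloading every page."""
    return total_from_response(fetch_json())
```

With this shape, the full data set is only downloaded in the collection step, and the count costs one request.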

@ziadhany
Collaborator

ziadhany commented Dec 1, 2025

@Samk1710 Yes, we can use the total field. Since we know the total number of advisories, we can iterate over the endpoint using either the date or the advisory count, if that’s available.

The time isn’t a big issue to get all the available data in under a couple of hours.
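
One way iterating by date could be sketched is to split the overall range into fixed-size windows and query each window in turn. The helper names below are hypothetical, and the `fromDate` and `toDate` parameter names are an assumption about the search API that should be checked against the EUVD documentation.

```python
from datetime import date, timedelta


def date_windows(start, end, days=30):
    """Yield inclusive (window_start, window_end) pairs covering start..end."""
    if days < 1:
        raise ValueError("days must be at least 1")
    current = start
    while current <= end:
        # Clamp the last window so it never runs past the requested end date.
        window_end = min(current + timedelta(days=days - 1), end)
        yield current, window_end
        current = window_end + timedelta(days=1)


def window_query(window_start, window_end):
    """Build query parameters for one window (parameter names are assumed)."""
    return {"fromDate": window_start.isoformat(), "toDate": window_end.isoformat()}
```

Because the windows are inclusive and do not overlap, each advisory date falls into exactly one window, which avoids fetching the same records twice.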

@Samk1710
Contributor Author

Samk1710 commented Dec 1, 2025

> @Samk1710 Yes, we can use the total field. Since we know the total number of advisories, we can iterate over the endpoint using either the date or the advisory count, if that’s available.
>
> The time isn’t a big issue to get all the available data in under a couple of hours.

Yes, the date field is available in the API response. If the time (5-6 hours) isn't an issue, this would be the simplest approach: it avoids double fetching, and caching will not be required. Shall I move forward with this approach?

Signed-off-by: Sampurna Pyne <sampurnapyne1710@gmail.com>
@keshav-space
Member

@Samk1710 here is the repo for the EUVD mirror: https://github.com/aboutcode-org/aboutcode-mirror-euvd. For the data collection script, take a look at the pipeline here: https://github.com/aboutcode-org/aboutcode-mirror-nuget-catalog/blob/main/sync_catalog.py. We want to do something similar for EUVD. The script will be used in a workflow that will be almost identical to what we have here: https://github.com/aboutcode-org/aboutcode-mirror-nuget-catalog/blob/main/.github/workflows/sync.yml
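
To make the mirroring idea concrete, here is a rough sketch of the storage half of such a collection script. It is not derived from sync_catalog.py, and every detail is an assumption: the one-file-per-advisory layout, the `id` key used for the file name, and the `write_advisories` helper itself are all hypothetical.

```python
import json
from pathlib import Path


def write_advisories(advisories, root):
    """Write each advisory to <root>/<id>.json and return how many files changed."""
    root = Path(root)
    written = 0
    for advisory in advisories:
        advisory_id = advisory.get("id")  # assumed identifier field
        if not advisory_id:
            continue
        path = root / f"{advisory_id}.json"
        # Stable key order keeps diffs small when the mirror is committed to git.
        text = json.dumps(advisory, indent=2, sort_keys=True) + "\n"
        if path.exists() and path.read_text(encoding="utf-8") == text:
            continue  # unchanged advisory, nothing to commit
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(text, encoding="utf-8")
        written += 1
    return written
```

Returning the number of changed files lets a scheduled workflow skip the commit step when a sync produces no changes.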

@Samk1710
Contributor Author

Samk1710 commented Dec 2, 2025

> @Samk1710 here is the repo for the EUVD mirror: https://github.com/aboutcode-org/aboutcode-mirror-euvd. For the data collection script, take a look at the pipeline here: https://github.com/aboutcode-org/aboutcode-mirror-nuget-catalog/blob/main/sync_catalog.py. We want to do something similar for EUVD. The script will be used in a workflow that will be almost identical to what we have here: https://github.com/aboutcode-org/aboutcode-mirror-nuget-catalog/blob/main/.github/workflows/sync.yml

Thanks a lot, @keshav-space. I will look into it.

@Samk1710
Contributor Author

Samk1710 commented Dec 8, 2025

Hey, I have added a pipeline to mirror the EUVD advisories:

aboutcode-org/aboutcode-mirror-euvd#1

I look forward to a review so that I can continue work on this importer. Thanks a lot!

Comment thread vulnerabilities/pipelines/v2_importers/euvd_importer.py


Development

Successfully merging this pull request may close these issues.

Collect EUVD data

4 participants