Fix description cache invalidation #1715

Open
emphor11 wants to merge 2 commits into openml:main from emphor11:fix-description-cache-invalidation
Conversation

@emphor11
Metadata

Details

This PR implements time-based cache invalidation for dataset descriptions in _get_dataset_description.

Currently, the dataset description (description.xml) is cached indefinitely once downloaded. However, dataset metadata such as status (e.g., active or deactivated) can change on the server. Because the cache never expires, users keep receiving the old metadata after such a change.

This change introduces a simple TTL (time-to-live) mechanism for the cached description file:

  • A module-level constant DESCRIPTION_CACHE_TTL is introduced (24 hours).
  • When _get_dataset_description is called, the modification time of the cached description.xml file is checked.
  • If the file is older than the TTL, it is removed and the description is fetched again from the server.

This ensures that cached dataset descriptions are periodically refreshed while still preserving the existing caching behavior and minimizing additional API calls.
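The staleness check described above can be sketched roughly as follows. This is a minimal illustration and not the PR's actual diff: the helper names is_description_stale and evict_if_stale are hypothetical, and the sketch assumes DESCRIPTION_CACHE_TTL is expressed in seconds and that the cached file's modification time is the reference point.

```python
import time
from pathlib import Path
from typing import Optional

# Assumption: the TTL is stored in seconds (24 hours, per the PR description).
DESCRIPTION_CACHE_TTL = 24 * 60 * 60


def is_description_stale(
    description_file: Path,
    ttl: float = DESCRIPTION_CACHE_TTL,
    now: Optional[float] = None,
) -> bool:
    """Return True if the cached file exists and is older than ``ttl`` seconds."""
    if not description_file.exists():
        # Nothing cached yet, so there is nothing to invalidate.
        return False
    current = time.time() if now is None else now
    return current - description_file.stat().st_mtime > ttl


def evict_if_stale(
    description_file: Path,
    ttl: float = DESCRIPTION_CACHE_TTL,
    now: Optional[float] = None,
) -> bool:
    """Delete the cached file if it is stale; return True if it was deleted."""
    if is_description_stale(description_file, ttl, now):
        description_file.unlink()
        return True
    return False
```

A caller would run evict_if_stale on the cached description.xml path before the existing "load from cache, else download" logic, so a stale file simply looks like a cache miss and the normal download path handles the refresh.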

The change addresses the existing TODO in the code suggesting that the cache should invalidate itself after some time.

@emphor11
Author

Hi @fkiraly,
I have implemented time-based cache invalidation for _get_dataset_description, which addresses the TODO about cache invalidation.

Thanks

Development

Successfully merging this pull request may close these issues.

Implement time-based cache invalidation for dataset description cache