sudoStacks/retriever-community-cache

Retreivr Community Cache

This repository is a transport index dataset for Retreivr.

It stores mappings from canonical MusicBrainz recording MBIDs to known-good transport identifiers.

Scope

Canonical mapping model:

recording_mbid -> transport sources

Examples of transport identifiers:

  • YouTube video IDs
  • SoundCloud track IDs (future)
  • Other supported transport IDs (future)

MusicBrainz remains the authoritative source of metadata. This repository does not replicate MusicBrainz entity metadata.

Data Layout

Current dataset namespace:

  • youtube/recording/<prefix>/<recording_mbid>.json
  • youtube/video/<prefix>/<video_id>.json (generated reverse index)

Where:

  • prefix is the first two characters of recording_mbid
  • filename stem equals recording_mbid
  • reverse-index prefix is the first two characters of video_id
  • reverse-index filename stem equals video_id
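The sharding rules above can be sketched in a few lines of Python. This is a minimal illustration, not the repository's own tooling: the function names recording_path and video_path are hypothetical, the identifiers in the usage lines are made-up placeholders, and the sketch assumes identifiers are used exactly as given, with no case normalization.

```python
from pathlib import PurePosixPath


def recording_path(recording_mbid: str) -> PurePosixPath:
    """Forward-index path: youtube/recording/<prefix>/<recording_mbid>.json."""
    prefix = recording_mbid[:2]  # first two characters of the MBID
    return PurePosixPath("youtube", "recording", prefix, f"{recording_mbid}.json")


def video_path(video_id: str) -> PurePosixPath:
    """Reverse-index path: youtube/video/<prefix>/<video_id>.json."""
    prefix = video_id[:2]  # first two characters of the video ID
    return PurePosixPath("youtube", "video", prefix, f"{video_id}.json")


# Placeholder identifiers, for illustration only.
example_mbid = "0a1b2c3d-0000-4000-8000-000000000000"
example_video_id = "AbCdEfGhIjK"

print(recording_path(example_mbid))
print(video_path(example_video_id))
```

Because the prefix is derived from the identifier itself, a client can compute the path of a record without listing any directory.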

Record Model

Each record contains:

  • recording_mbid
  • sources[] with transport candidate identifiers and minimal validation fields
  • schema_version

See schema/schema.json for the strict record contract.

Reverse index records contain minimal lookup metadata:

  • video_id
  • recording_mbid
  • confidence
  • verified_at

Reverse index files are generated by promotion tooling and must not be edited manually.
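To make the relationship between the two record types concrete, here is a rough sketch of how reverse-index entries could be derived from a forward record. It is not the promotion tooling itself. The four output fields come from the list above, but the assumption that each entry in sources[] carries video_id, confidence, and verified_at under exactly those names is hypothetical, as are the function name reverse_records and all sample values; schema/schema.json remains the authority on the real field names and types.

```python
def reverse_records(record: dict) -> list[dict]:
    """Build one reverse-index entry per source of a forward record.

    Assumes (hypothetically) that each source has video_id, confidence,
    and verified_at keys; check schema/schema.json for the real contract.
    """
    entries = []
    for source in record["sources"]:
        entries.append(
            {
                "video_id": source["video_id"],
                "recording_mbid": record["recording_mbid"],
                "confidence": source["confidence"],
                "verified_at": source["verified_at"],
            }
        )
    return entries


# Made-up forward record, for illustration only.
sample_record = {
    "recording_mbid": "0a1b2c3d-0000-4000-8000-000000000000",
    "schema_version": 1,
    "sources": [
        {
            "video_id": "AbCdEfGhIjK",
            "confidence": 0.9,
            "verified_at": "2024-01-01T00:00:00Z",
        }
    ],
}

for entry in reverse_records(sample_record):
    print(entry["video_id"], "->", entry["recording_mbid"])
```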

Non-Goals

This repository must not contain:

  • scraped metadata dumps
  • platform search result dumps
  • thumbnails
  • ranking heuristics
  • MusicBrainz entity metadata copies
  • media files or download URLs

CI Guarantees

Validation in .github/workflows/validate.yml enforces:

  • JSON parse validity for dataset files
  • JSON Schema compliance
  • shard-path and filename/MBID consistency
  • no duplicate MBIDs within a namespace
  • stats integrity via scripts/generate_stats.py --check
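Some of these checks are simple enough to sketch with the standard library. The example below is an illustration under stated assumptions, not the contents of validate.yml or of any script in this repository: the function names check_record and duplicate_mbids are hypothetical, and only JSON parsing, the filename/MBID match, the shard-directory match, and duplicate detection are covered. JSON Schema compliance would need a schema validator and is left out.

```python
import json
from collections import Counter
from pathlib import PurePosixPath


def check_record(path: str, text: str) -> list[str]:
    """Return the problems found in one forward-index file (empty if none)."""
    try:
        record = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"{path}: invalid JSON ({exc.msg})"]
    if not isinstance(record, dict):
        return [f"{path}: top-level JSON value is not an object"]

    p = PurePosixPath(path)
    errors = []
    # Filename stem must equal the MBID stored in the record.
    if p.stem != record.get("recording_mbid"):
        errors.append(f"{path}: filename stem does not match recording_mbid")
    # Shard directory must equal the first two characters of the stem.
    if p.parent.name != p.stem[:2]:
        errors.append(f"{path}: shard directory does not match MBID prefix")
    return errors


def duplicate_mbids(paths: list[str]) -> list[str]:
    """Return MBIDs (filename stems) that appear more than once."""
    counts = Counter(PurePosixPath(p).stem for p in paths)
    return sorted(mbid for mbid, n in counts.items() if n > 1)


# Placeholder MBID and paths, for illustration only.
mbid = "0a1b2c3d-0000-4000-8000-000000000000"
good = f"youtube/recording/0a/{mbid}.json"
misplaced = f"youtube/recording/ff/{mbid}.json"
body = json.dumps({"recording_mbid": mbid, "sources": [], "schema_version": 1})

print(check_record(good, body))
print(check_record(misplaced, body))
print(duplicate_mbids([good, misplaced]))
```

Returning a list of messages instead of raising on the first failure lets a CI job report every problem in a single run.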

Purpose

The dataset accelerates transport resolution for Retreivr clients while keeping output deterministic, lightweight, and Git-native.
