feat: add badge endpoint for IPFS availability badges#121
feat: add badge endpoint for IPFS availability badges#121Patrick-Ehimen wants to merge 1 commit into
Conversation
Add an opt-in badge endpoint that generates shields.io-style SVG badges showing real-time IPFS provider availability for a given CID. Features: - Color-coded badges: green (2+ providers), yellow (1), red (0), gray (error/pending) - Shows provider count and last 6 characters of CID - Background checks with non-blocking "checking..." badge - In-memory cache with configurable TTL (default 24h) - Configurable via --enable-badges flag and environment variables Closes ipfs#118
🚀 Build Preview on IPFS ready
|
|
Triage: test / ensure this is an opt-in feature. |
lidel
left a comment
There was a problem hiding this comment.
Hi @Patrick-Ehimen, thanks for the contribution and for adding unit tests.
That said, I can't merge this in its current form. The handler reflects the unvalidated cid query parameter into an image/svg+xml response without escaping, which is an XSS-class issue: the 6-char getCIDSuffix cap happens to block a working PoC today, but a future tweak to the suffix length or status wording would turn it into a real XSS. The same lack of input validation, combined with an unbounded cache and unbounded background goroutines, also makes the endpoint trivially abusable once enabled.
See inline comments with concrete fix suggestions (input validation, SVG escaping, dedup race, cache bounds, Cache-Control on pending/error responses). Happy to take another look once those are addressed. Thanks again!
| func (h *BadgeHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) { | ||
| w.Header().Set("Access-Control-Allow-Origin", "*") | ||
|
|
||
| cidStr := r.URL.Query().Get("cid") | ||
| if cidStr == "" { | ||
| http.Error(w, "missing 'cid' query parameter", http.StatusBadRequest) | ||
| return | ||
| } | ||
|
|
||
| cidSuffix := getCIDSuffix(cidStr) | ||
|
|
||
| // Try to get from cache first (includes completed results) | ||
| if entry, ok := h.cache.Get(cidStr); ok { | ||
| if !entry.Pending { | ||
| // Completed result - serve it | ||
| h.serveSVG(w, entry.SVG) | ||
| return | ||
| } | ||
| // Still pending - serve the pending badge | ||
| h.serveSVG(w, entry.SVG) | ||
| return | ||
| } | ||
|
|
||
| // Check if already pending (avoid duplicate background checks) | ||
| if h.cache.IsPending(cidStr) { | ||
| svg := generatePendingBadgeSVG(cidSuffix) | ||
| h.serveSVG(w, svg) | ||
| return | ||
| } | ||
|
|
||
| // Not in cache and not pending - start background check | ||
| pendingSVG := generatePendingBadgeSVG(cidSuffix) | ||
| h.cache.SetPending(cidStr, cidSuffix, pendingSVG) | ||
|
|
||
| // Trigger background check | ||
| go h.runBackgroundCheck(cidStr, cidSuffix) | ||
|
|
||
| // Return pending badge immediately | ||
| h.serveSVG(w, pendingSVG) | ||
| } |
There was a problem hiding this comment.
cid before doing any work
Right now cidStr comes straight from the query string and is used as the cache key, the SVG status text, and the input to a background runCidCheck goroutine. The first time we actually try to parse it is inside the goroutine, via resolveInput.
Because the endpoint is opt-in but CORS-open (*), anyone can hit /badge?cid=<random> in a loop and trigger:
- a new entry in
BadgeCache(no max-size bound, default 24h TTL), - a new goroutine doing a full DHT + IPNI lookup with a 30s context,
- a cached "error" badge that then sits there for 24h.
Easy fix that closes most of this: parse and validate up-front (the same resolveInput call you already use, or at minimum cid.Decode with a fallback to the IPNS/DNSLink path), return 400 for anything that isn't a valid CID / IPNS name / DNSLink, and only then proceed.
| // buildBadgeSVG constructs the SVG markup for a badge with given label, status, and color. | ||
| func buildBadgeSVG(label, status, color string) []byte { | ||
| // Calculate widths based on text length (approximate) | ||
| statusWidth := len(status)*7 + 10 | ||
| totalWidth := labelWidth + statusWidth | ||
| labelX := labelWidth / 2 | ||
| statusX := labelWidth + statusWidth/2 | ||
|
|
||
| svg := fmt.Sprintf(`<svg xmlns="http://www.w3.org/2000/svg" width="%d" height="%d"> | ||
| <linearGradient id="b" x2="0" y2="100%%"> | ||
| <stop offset="0" stop-color="#bbb" stop-opacity=".1"/> | ||
| <stop offset="1" stop-opacity=".1"/> | ||
| </linearGradient> | ||
| <clipPath id="a"> | ||
| <rect width="%d" height="%d" rx="3" fill="#fff"/> | ||
| </clipPath> | ||
| <g clip-path="url(#a)"> | ||
| <rect width="%d" height="%d" fill="#555"/> | ||
| <rect x="%d" width="%d" height="%d" fill="%s"/> | ||
| <rect width="%d" height="%d" fill="url(#b)"/> | ||
| </g> | ||
| <g fill="#fff" text-anchor="middle" font-family="DejaVu Sans,Verdana,Geneva,sans-serif" font-size="11"> | ||
| <text x="%d" y="15" fill="#010101" fill-opacity=".3">%s</text> | ||
| <text x="%d" y="14">%s</text> | ||
| <text x="%d" y="15" fill="#010101" fill-opacity=".3">%s</text> | ||
| <text x="%d" y="14">%s</text> | ||
| </g> | ||
| </svg>`, | ||
| totalWidth, badgeHeight, | ||
| totalWidth, badgeHeight, | ||
| labelWidth, badgeHeight, | ||
| labelWidth, statusWidth, badgeHeight, color, | ||
| totalWidth, badgeHeight, | ||
| labelX, label, | ||
| labelX, label, | ||
| statusX, status, | ||
| statusX, status, | ||
| ) | ||
|
|
||
| return []byte(svg) | ||
| } |
There was a problem hiding this comment.
status carries user input (the last 6 chars of the raw cid query param) and is dropped into <text>...</text> with fmt.Sprintf, no escaping. The 6-char window makes a clean <script> injection impractical today, so this is more defense-in-depth than an exploitable bug, but the response is image/svg+xml (which browsers happily parse as XML, scripts and all) so it's a sharp edge worth removing.
Run label and status through html.EscapeString (or xml.EscapeText) before formatting. Cheap, and means a future change to the suffix length or wording can't accidentally turn this into a real XSS.
| type BadgeCache struct { | ||
| mu sync.RWMutex | ||
| items map[string]*BadgeCacheEntry | ||
| ttl time.Duration | ||
| } | ||
|
|
||
| // BadgeCacheEntry holds cached badge data for a CID. | ||
| type BadgeCacheEntry struct { | ||
| ProviderCount int | ||
| CIDSuffix string | ||
| SVG []byte | ||
| Timestamp time.Time | ||
| Pending bool // true if check is in progress | ||
| } | ||
|
|
||
| // NewBadgeCache creates a new badge cache with the specified TTL. | ||
| func NewBadgeCache(ttl time.Duration) *BadgeCache { | ||
| cache := &BadgeCache{ | ||
| items: make(map[string]*BadgeCacheEntry), | ||
| ttl: ttl, | ||
| } | ||
| // Start background cleanup goroutine | ||
| go cache.cleanupLoop() | ||
| return cache | ||
| } |
There was a problem hiding this comment.
nit: bound the cache and give it a Stop()
Two small changes that go a long way once the input is validated:
- Cap entries with a
MaxEntries(e.g. 10k) and evict on insertion when full. Even random eviction is fine for v1, LRU is nicer, 2Q even better. Otherwise the only thing keeping memory bounded is the 24h TTL plus a cleanup tick that runs every TTL/2. NewBadgeCachestartsgo cache.cleanupLoop()with no way to stop it. Add aStop()(close a channel, exit the loop) and call it on server shutdown. As-is, every test that creates a cache leaks a ticker + goroutine.
| // Try to get from cache first (includes completed results) | ||
| if entry, ok := h.cache.Get(cidStr); ok { | ||
| if !entry.Pending { | ||
| // Completed result - serve it | ||
| h.serveSVG(w, entry.SVG) | ||
| return | ||
| } | ||
| // Still pending - serve the pending badge | ||
| h.serveSVG(w, entry.SVG) | ||
| return | ||
| } | ||
|
|
||
| // Check if already pending (avoid duplicate background checks) | ||
| if h.cache.IsPending(cidStr) { | ||
| svg := generatePendingBadgeSVG(cidSuffix) | ||
| h.serveSVG(w, svg) | ||
| return | ||
| } | ||
|
|
||
| // Not in cache and not pending - start background check | ||
| pendingSVG := generatePendingBadgeSVG(cidSuffix) | ||
| h.cache.SetPending(cidStr, cidSuffix, pendingSVG) | ||
|
|
||
| // Trigger background check | ||
| go h.runBackgroundCheck(cidStr, cidSuffix) | ||
|
|
||
| // Return pending badge immediately | ||
| h.serveSVG(w, pendingSVG) |
There was a problem hiding this comment.
The "avoid duplicate background checks" guarantee races
The dedup logic is split across three independent locked calls:
if entry, ok := h.cache.Get(cidStr); ok { ... }
if h.cache.IsPending(cidStr) { ... }
h.cache.SetPending(cidStr, ...)
go h.runBackgroundCheck(...)Two concurrent first-time requests for the same CID can both pass Get and IsPending, both hit SetPending, and both spawn a goroutine. The PR description claims this is prevented, but the implementation does not enforce it.
Consider collapsing into one atomic call on BadgeCache, e.g. TryStartCheck(cid, suffix, pendingSVG) (entry *BadgeCacheEntry, started bool) that does the get-or-set under a single write lock and tells the caller whether they should kick off the goroutine.
Bonus: the twoif entry.Pending branches at lines 150-157 currently do the same thing (both call serveSVG(w, entry.SVG)). Either drop the branch or make the pending case actually behave differently.
| func (h *BadgeHandler) serveSVG(w http.ResponseWriter, svg []byte) { | ||
| w.Header().Set("Content-Type", "image/svg+xml") | ||
| w.Header().Set("Cache-Control", "public, max-age=3600") // Browser cache for 1 hour | ||
| w.Write(svg) | ||
| } |
There was a problem hiding this comment.
nit: don't cache pending/error responses for an hour
Cache-Control: public, max-age=3600 is sent unconditionally, including on the "checking..." pending response. That means a user's first request can be cached at the browser/CDN for an hour, even though the real result lands seconds later, which defeats the polling pattern badges normally rely on.
Two related tweaks:
- pending/error responses: return different
Cache-Control: no-cache(ormax-age=10). Settled results: keepmax-age=3600, or align it with--badge-cache-ttl. - in
BadgeCache, give error entries a much shorter TTL than success entries (something like 1-5 min). Today a transient DHT hiccup turns into a 24h visible "error" badge.
| // Badge endpoint (opt-in) | ||
| if badgeConfig != nil && badgeConfig.Enabled { | ||
| badgeCache := NewBadgeCache(badgeConfig.CacheTTL) | ||
| badgeHandler := NewBadgeHandler(d, badgeCache, badgeConfig.CheckTimeout) | ||
| http.Handle("/badge", badgeHandler) | ||
| log.Printf("Badge endpoint enabled at http://%s/badge", webAddr) | ||
| } |
There was a problem hiding this comment.
nit: instrument /badge like /check
/check runs through the Prometheus counter / duration / in-flight middleware at line 265-274.
/badge is registered raw, so we get no visibility into traffic, latency, or in-flight count for an endpoint that triggers background DHT/IPNI work. Worth wrapping with the same chain so operators can actually see what's going on.
Summary
/badgeendpoint that generates shields.io-style SVG badges showing real-time IPFS provider availability--enable-badgesflag andIPFS_CHECK_ENABLE_BADGESenvironment variableTest plan
Closes #118