Remove content app redirect for manifest fetching#2358
Conversation
848ee41 to
5ad30c0
Compare
20d478f to
a3b1c6e
Compare
| blob.touch() | ||
| except models.Blob.DoesNotExist: | ||
| raise BlobNotFound(digest=pk) | ||
| if pk == EMPTY_BLOB: |
There was a problem hiding this comment.
can you explain (and maybe document) this a bit better? What is EMPTY_BLOB and why does it redirect?
There was a problem hiding this comment.
It's a layer that has nothing in it. Each layer(blob) is a gzip-compressed tar and the empty blob is a no-op layer used in images, so we handle it specially and have the empty compressed bytes ready to serve if requested. See the _empty_blob() method in registry.py, there are still bytes that need to be served so that is why it was originally redirected to the content app.
There was a problem hiding this comment.
The name pk is a bit confusing because the EMPTY_BLOB constant seems to be a checksum string
There was a problem hiding this comment.
Do you want me to rename pk? It's just a path parameter of the url and it's equal to the blob's digest field which is the unique field of the blob, but not its pulp_id (db pk).
There was a problem hiding this comment.
It's not needed necessarily, but it would be good to distinguish from pk in the database sense. Not blocking
| blob.touch() | ||
| return redirects.issue_blob_redirect(blob) | ||
|
|
||
| def create_on_demand_blob(self, digest, remote, blob_url): |
There was a problem hiding this comment.
Missing doc comments - would be nice if all functions added here had them
There was a problem hiding this comment.
Also, even in the pull-through cache, you are using on-demand?
There was a problem hiding this comment.
Yep, because we redirect to the content app we can use the content app's ability to stream the artifact from the remote while saving it to Pulp. So that is why we create the blob as on-demand before redirecting to the content app. Technically the content app also has the ability to create the content for pull-through at request time, but that logic is complicated and we already overwrite the content app for pulp-container and have special operations for pending_blobs, so I didn't think it was worth it to try to integrate into pulpcore's pull-through machinery.
fixes: pulp#1974 Assisted by: claude-sonnet-4.6
Fixes #1974
https://redhat.atlassian.net/browse/PULP-486
To properly remove the redirect for manifest fetching I had to remove the PullThroughCaching redirect logic as well. Now the entirety of PullThroughCaching lives on the registry_api side, no more content app logic needed. These are the three scenarios we were handling for pull-through caching:
Upon first fetch you need to create the tag and download the manifest and initiate a task to download the rest of the image in the background (
tag=None, local_manifest=None). If the upstream manifest already exists on Pulp you simply need to create the tag and add it and the image's content to the repo (tag=None, local_manifest is not None). On the second fetch of a tag with pull-through you need to check to see if the tag has been updated on the remote. There are three scenarios for this: one - new upstream manifest that needs to be downloaded to Pulp (tag is not None, local_manifest=None), two - new upstream manifest already in Pulp needs to be added to repo (tag is not None, local_manifest is not None), and three - no changes on upstream (tag is not None, tag.manifest.digest == local_manifest.digest). If the tag doesn't exist on the remote, but we have the tag locally then we should serve our local copy (tag is not None, fetch_manifest raised ApiException).When fetching a manifest by digest we either have it or we don't. If we don't, we check the upstream if it's there. If yes, then we check if the manifest is in Pulp locally and just needs to be added to the repo, or if we need to download the whole image in the background.
When fetching a blob and we don't have it in the repository we check to see if the remote has the blob. If so, we create the blob as an on-demand content and redirect to the content-app for it to finish streaming the blob from the upstream. We also add it to the
pending_blobsto speed up future requests.📜 Checklist
See: Pull Request Walkthrough