Skip to content

Add an overwrite boolean when modifying repo content#7551

Open
daviddavis wants to merge 1 commit intopulp:mainfrom
daviddavis:safe-update
Open

Add an overwrite boolean when modifying repo content#7551
daviddavis wants to merge 1 commit intopulp:mainfrom
daviddavis:safe-update

Conversation

@daviddavis
Copy link
Copy Markdown
Contributor

Add a overwrite boolean parameter to endpoints which modify repository content. When set to false, the update checks whether any content being added would overwrite existing content (based on repo_key_fields) and returns a 409 Conflict response with details instead of silently replacing it.

fixes #7550

Add a `overwrite` boolean parameter to endpoints which modify repository
content. When set to false, the update checks whether any content being
added would overwrite existing content (based on repo_key_fields) and
returns a 409 Conflict response with details instead of silently
replacing it.

fixes pulp#7550
Copy link
Copy Markdown
Contributor

@gerrod3 gerrod3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just added the same feature (in a slightly different manner) to pulp-python to prevent the same problem. pulp/pulp_python#1163

One of things I was concerned about was the performance of trying to validate the modify call, at request time, will not produce an overwrite against the potential thousands of units a user can call with. Have you tested this against your own instance? How large are the modify calls you typically use?

continue

pulp_type = type_obj.get_pulp_type()
incoming_qs = type_obj.objects.filter(pk__in=add_content.filter(pulp_type=pulp_type))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know we have this pattern in our repo-version utils file, but I wonder if it is more performant than a normal type_obj.objects.filter(pk__in=add_content). Theoretically it should filter down the number of content to check to only those of the correct type, but does it actually increase the speed?

}
incoming_by_key = {
tuple(row[f] for f in repo_key_fields): row["pk"]
for row in incoming_qs.filter(find_dup_qs).values("pk", *repo_key_fields)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can remove this query if we add pk to the batch_qs(incoming_qs.values(*repo_key_fields) at the top, then create this dictionary during the assembling of the find_dup_qs.

@daviddavis
Copy link
Copy Markdown
Contributor Author

daviddavis commented Apr 3, 2026

@gerrod3 very interesting. I haven't tested this in our instance yet--we actually validate requests and check them for conflicts/substitutions before passing them to Pulp. We do this using the Pulp API so as you can imagine it's slow and not a great solution.

Performance has been an issue with our current solution but luckily most of our users are only adding one or two packages to a repo at a time. I think adding 30 packages is the biggest request size we see.

Your solution looks really clean. I wonder if it could be moved into pulpcore? The one issue though that would prevent us from being able to adopt it is that our repos are shared by multiple teams/users. So if two users are adding packages to a repo around the same timeframe, one user might want to overwrite packages while the other might not. Perhaps one solution might be to also have a allow_package_substitution parameter for repo modify and package upload that override the repo field?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add a way to reject repo content updates if overwrites would occur

2 participants