feat!(backend): file sharing - prevent overwrites of uploaded file #6381
anna-parker wants to merge 4 commits into
Claude finished @anna-parker's task in 2m 59s

Code Review
The approach is correct: signing the presigned URL with The test updates are thorough and consistent: all A few issues noted via inline comments:
Remaining checklist items: The PR checklist still has open items (docs for disconnected-upload case, manual testing documentation). The disconnected-upload scenario (user gets presigned URL but upload never completes) is worth documenting since clients will now get 412 on retry with the same URL — they'd need to call |
 def upload_embl_file_to_presigned_url(content: str, url: str) -> None:
-    headers = {"Content-Type": "chemical/x-embl-dl-nucleotide"}
+    headers = {"Content-Type": "chemical/x-embl-dl-nucleotide", "If-None-Match": "*"}
     r = requests.put(url, data=content.encode("utf-8"), headers=headers, timeout=60)
The function hardcodes If-None-Match: * instead of using the headers field from FileUploadInfo. Meanwhile, prepro.py already fetches those headers via request_upload() but discards them. This duplicates the contract: if the server-side required headers change (e.g., adding a checksum header), this function will silently diverge.
Consider accepting the headers from FileUploadInfo and merging them in:

    def upload_embl_file_to_presigned_url(content: str, url: str, extra_headers: dict | None = None) -> None:
        headers = {"Content-Type": "chemical/x-embl-dl-nucleotide", **(extra_headers or {})}
        r = requests.put(url, data=content.encode("utf-8"), headers=headers, timeout=60)

And at the call site in prepro.py:

    upload_embl_file_to_presigned_url(file_content, url, upload_info.headers)

        description = "HTTP headers that must be included when making the PUT request to the presigned URL.",
        example = "{\"If-None-Match\": \"*\"}",
    )
    val headers: Map<String, String> = mapOf("If-None-Match" to "*"),
The headers field is hardcoded here independently of what S3Service.createUrlToUploadPrivateFile actually signs into the URL. If the signed headers change on the S3 side without updating FileTypes.kt, clients will get a mismatch. A long-term improvement would be to derive this from the presigned request rather than hardcoding both independently, though for the current single-header case this is fine.
…m response (#6386)

Addresses [@anna-parker's review on #6381](#6381 (comment)).

## Summary

- Refactors `GIVEN a request for a URL THEN returns a valid presigned URL` to read the required upload headers from the `headers` field on the response JSON instead of hardcoding `If-None-Match: *`. This exercises the same client-side flow that real callers (e.g. the preprocessing pipeline) are expected to follow.
- Adds a new test, `GIVEN a presigned URL has been used to upload THEN a second upload to the same URL fails`, that uses the same presigned URL twice and asserts the second PUT is rejected with HTTP 412, the overwrite-prevention guarantee that motivated #6381.

## Test plan

- [x] `./gradlew test --tests 'org.loculus.backend.controller.files.RequestUploadEndpointTest'`: all 16 tests pass, including the new one (412 returned by MinIO on the second PUT).
- [x] `./gradlew ktlintFormat`: no changes.

This PR is targeted at the `file_sharing_nooverwrite` branch so it can land alongside #6381.

🚀 Preview: Add `preview` label to enable

Co-authored-by: theosanderson-agent <theo@theo.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
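The client-side flow that the refactored test exercises can be sketched in Python roughly as follows. This is a minimal sketch, not the preprocessing pipeline's actual code: `build_put_request` is a hypothetical helper, the example URL is a placeholder, and the upload info is assumed to carry a `url` and a `headers` field as the commit message describes. It uses only the standard library and builds the request without sending it.

```python
import urllib.request


def build_put_request(upload_info: dict, content: bytes, content_type: str) -> urllib.request.Request:
    """Build a PUT request for a presigned URL, merging in server-provided headers.

    `upload_info` is assumed (not confirmed by the source) to look like
    {"url": "...", "headers": {"If-None-Match": "*"}}.
    """
    # Read the required headers from the response instead of hardcoding them,
    # so the client keeps working if the signed headers change server-side.
    headers = {"Content-Type": content_type, **upload_info.get("headers", {})}
    return urllib.request.Request(
        upload_info["url"], data=content, headers=headers, method="PUT"
    )


# Placeholder values, for illustration only.
example_info = {
    "url": "https://s3.example.invalid/bucket/file-id?X-Amz-Signature=abc",
    "headers": {"If-None-Match": "*"},
}
request = build_put_request(example_info, b"ID   example", "chemical/x-embl-dl-nucleotide")
```

Sending the request with `urllib.request.urlopen(request)` raises `urllib.error.HTTPError` for a 4xx response, so a second upload to an already-used URL would be expected to surface as an error with `code == 412`.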
resolves #4056
The backend now adds `"If-None-Match": "*"` as a header when requesting presigned URLs on behalf of a user. This prevents writes to S3 if an object already exists at the target key, which prevents accidental overwrites.

Suggested by @tombch; see details in https://docs.aws.amazon.com/AmazonS3/latest/userguide/conditional-writes.html and https://security.stackexchange.com/a/286617
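The conditional-write behaviour described above can be illustrated with a toy in-memory model. This is a sketch only: `ToyBucket` and its methods are hypothetical names, not part of Loculus or any S3 client, and the model assumes the documented S3 behaviour that a PUT carrying `If-None-Match: *` is rejected with HTTP 412 when an object already exists at the key.

```python
class ToyBucket:
    """Hypothetical in-memory stand-in for an S3 bucket (illustration only)."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put_object(self, key: str, body: bytes, headers: dict[str, str]) -> int:
        """Store body under key and return an HTTP-like status code."""
        # HTTP header names are case-insensitive, so normalise before lookup.
        normalised = {name.lower(): value for name, value in headers.items()}
        if normalised.get("if-none-match") == "*" and key in self._objects:
            return 412  # Precondition Failed: an object already exists
        self._objects[key] = body
        return 200

    def get_object(self, key: str) -> bytes:
        return self._objects[key]


bucket = ToyBucket()
first = bucket.put_object("file-id", b"original", {"If-None-Match": "*"})
second = bucket.put_object("file-id", b"overwrite", {"If-None-Match": "*"})
```

In this model the first write succeeds and the second is refused, leaving the original bytes in place; a PUT without the header would still overwrite, which is why the header is signed into the presigned URL.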
Note that this header is not required for multi-part S3 uploads, as the `complete-multipart-upload` request prevents further modification of the object using the presigned URL.

Breaking change
Clients using presigned URLs (i.e. requested via the `/files/request-upload` endpoint) now need to add `"If-None-Match": "*"` to the headers when submitting data using the presigned URL. This is because AWS and other S3 providers reject uploads whose headers do not match those signed into the presigned URL.

PR Checklist