Skip to content

feat!(backend): file sharing - prevent overwrites of uploaded file#6381

Open
anna-parker wants to merge 4 commits into
mainfrom
file_sharing_nooverwrite
Open

feat!(backend): file sharing - prevent overwrites of uploaded file#6381
anna-parker wants to merge 4 commits into
mainfrom
file_sharing_nooverwrite

Conversation

@anna-parker
Copy link
Copy Markdown
Contributor

@anna-parker anna-parker commented May 8, 2026

resolves #4056

The backend now adds "If-None-Match": "*" as a header when requesting presigned URLs on behalf of a user, this prevents writes to the S3 if the S3 already has data - preventing accidental overwrites.

Suggested by @tombch, see details in https://docs.aws.amazon.com/AmazonS3/latest/userguide/conditional-writes.html https://security.stackexchange.com/a/286617

Note that this header is not required for multi-part S3 uploads as the request complete-multipart-upload prevents future modifications of the S3 using the presigned URL.

Breaking change

Clients using presigned URLs (i.e. requested via the /files/request-upload endpoint) now need to add "If-None-Match": "*" to the header when submitting data using the presigned URL (this is because AWS and other S3 providers will block uploads to S3 buckets that do not use the same headers as in the created presigned URL.

PR Checklist

  • Document how users should handle case where upload to S3 is disconnected
  • The implemented feature is covered by appropriate, automated tests.
  • Any manual testing that has been done is documented (i.e. what exactly was tested?)

🚀 Preview: Add preview label to enable

@claude claude Bot added backend related to the loculus backend component preprocessing Issues related to the preprocessing component labels May 8, 2026
@anna-parker anna-parker changed the title feat: prevent overwrites of uploaded file feat(backend): file sharing - prevent overwrites of uploaded file May 8, 2026
@anna-parker anna-parker changed the title feat(backend): file sharing - prevent overwrites of uploaded file feat!(backend): file sharing - prevent overwrites of uploaded file May 8, 2026
@anna-parker anna-parker marked this pull request as ready for review May 9, 2026 09:13
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 9, 2026

Claude finished @anna-parker's task in 2m 59s —— View job


Code Review

  • Gather context and understand the PR
  • Read changed files
  • Review code quality, security, testing, and documentation
  • Post final review

The approach is correct: signing the presigned URL with If-None-Match: * and returning the required headers to clients is exactly the right way to implement this — clients can't independently determine which headers are required since they're embedded in the signature.

The test updates are thorough and consistent: all uploadFile call sites now pass fileIdAndUrl.headers, and the new uploadFile helper on SubmissionConvenienceClient is a clean refactor. The Python FileUploadInfo dataclass correctly gains a headers field.

A few issues noted via inline comments:

preprocessing/nextclade/src/loculus_preprocessing/backend.py:219upload_embl_file_to_presigned_url hardcodes "If-None-Match": "*" rather than using upload_info.headers from request_upload(). The headers returned by the server are already available at the call site in prepro.py but are discarded. If required headers change server-side, this function will silently drift. Fix this →

RequestUploadEndpointTest.kt:88 — The test verifies a single upload succeeds with If-None-Match: *, but there's no test for the overwrite-prevention behaviour (a second upload to the same URL should return 412). Also: the header is hardcoded rather than read from the response JSON, so the test doesn't verify the end-to-end client flow as documented.

RequestUploadEndpointTest.kt — Tests asserting response structure check fileId and url but not the new headers field. Since this is a breaking change for clients, asserting headers contains {"If-None-Match": "*"} would be valuable.

FileTypes.kt:30 — Minor: the headers default is hardcoded independently of what S3Service signs into the URL. A future change to S3Service would need to update both places. Not a blocker for this PR but worth keeping in mind.

Remaining checklist items: The PR checklist still has open items (docs for disconnected-upload case, manual testing documentation). The disconnected-upload scenario (user gets presigned URL but upload never completes) is worth documenting since clients will now get 412 on retry with the same URL — they'd need to call /files/request-upload again to get a fresh URL.

def upload_embl_file_to_presigned_url(content: str, url: str) -> None:
headers = {"Content-Type": "chemical/x-embl-dl-nucleotide"}
headers = {"Content-Type": "chemical/x-embl-dl-nucleotide", "If-None-Match": "*"}
r = requests.put(url, data=content.encode("utf-8"), headers=headers, timeout=60)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function hardcodes If-None-Match: * instead of using the headers field from FileUploadInfo. Meanwhile, prepro.py already fetches those headers via request_upload() but discards them. This duplicates the contract: if the server-side required headers change (e.g., adding a checksum header), this function will silently diverge.

Consider accepting the headers from FileUploadInfo and merging them in:

def upload_embl_file_to_presigned_url(content: str, url: str, extra_headers: dict | None = None) -> None:
    headers = {"Content-Type": "chemical/x-embl-dl-nucleotide", **(extra_headers or {})}
    r = requests.put(url, data=content.encode("utf-8"), headers=headers, timeout=60)

And at the call site in prepro.py:

upload_embl_file_to_presigned_url(file_content, url, upload_info.headers)

description = "HTTP headers that must be included when making the PUT request to the presigned URL.",
example = "{\"If-None-Match\": \"*\"}",
)
val headers: Map<String, String> = mapOf("If-None-Match" to "*"),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The headers field is hardcoded here independently of what S3Service.createUrlToUploadPrivateFile actually signs into the URL. If the signed headers change on the S3 side without updating FileTypes.kt, clients will get a mismatch. A long-term improvement would be to derive this from the presigned request rather than hardcoding both independently, though for the current single-header case this is fine.

…m response (#6386)

Addresses [@anna-parker's review on
#6381](#6381 (comment)).

## Summary
- Refactors `GIVEN a request for a URL THEN returns a valid presigned
URL` to read the required upload headers from the `headers` field on the
response JSON instead of hardcoding `If-None-Match: *`. This exercises
the same client-side flow that real callers (e.g. the preprocessing
pipeline) are expected to follow.
- Adds a new test, `GIVEN a presigned URL has been used to upload THEN a
second upload to the same URL fails`, that uses the same presigned URL
twice and asserts the second PUT is rejected with HTTP 412 — the
overwrite-prevention guarantee that motivated #6381.

## Test plan
- [x] `./gradlew test --tests
'org.loculus.backend.controller.files.RequestUploadEndpointTest'` — all
16 tests pass, including the new one (412 returned by MinIO on the
second PUT).
- [x] `./gradlew ktlintFormat` — no changes.

This PR is targeted at the `file_sharing_nooverwrite` branch so it can
land alongside #6381.

🚀 Preview: Add `preview` label to enable

Co-authored-by: theosanderson-agent <theo@theo.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend related to the loculus backend component preprocessing Issues related to the preprocessing component

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Make files uneditable after submission

2 participants