263 step for alphafold multimer query json generation #271
Merged: AnnaPolensky merged 25 commits into `crosslinking` from `263-step-for-alphafold-multimer-query-json-generation` on Mar 31, 2026.
Commits (25, all by AnnaPolensky):

- `cb608b6` feat: add step for generating a json alphafold multimer query
- `32c657f` feat: add download tab to outputs
- `cfa88ab` merge: merge crosslinking into 263-step-for-alphafold-multimer-query-…
- `fe19dc6` fix: fix broken all_steps after merge
- `ce49260` feat: add download_methods
- `237f903` feat: add download button for generated alphafold json queries
- `1268988` refactor: tidy up code
- `081a912` test: add tests for alphafold multimer query json generation
- `0a817b5` feat: make sure that at least 2 protein ids or 2 copies of one protei…
- `d791ed8` feat: add input to use a specific seed and allow not only space- but …
- `068329e` fix: fix broken test
- `ed7bb2a` fix: also accept input separated with comma and space, add success me…
- `288e01a` feat: add input field for file name of prediction query file
- `2cdf9dd` fix: step only turns green if file was generated
- `c043e76` fix: address code review feedback
- `dedfba1` refactor: rename alphafold-multimer-query-json-generation to alphafol…
- `89e030b` merge: merge crosslinking into 263
- `e494242` refactor: add broken refactoring of download outputs
- `a0c8644` refactor: add working refactoring of download outputs
- `84ff4f8` refactor: add useCertainStepOutputs hook
- `1fdd897` refactor: remove 'any' type hint
- `464da38` fix: fix broken tests
- `a6e791f` style: remove unused parts of the first downloads implementation
- `79a1059` style: remove unused parts of the first downloads implementation - pa…
- `de2604f` chore: fix docstring
New file, diff hunk `@@ -0,0 +1,105 @@`:

```python
import logging

import requests

from backend.protzilla.steps import OutputItem, OutputType


def generate_alphafold_query_json(
    protein_ids: str, number_copies: str, model_seed: int, name: str
) -> dict:
    """
    Generates an AlphaFold JSON query for a set of UniProt protein IDs.
    For each provided UniProt ID, the corresponding amino acid sequence is fetched
    from the UniProt REST API and added to the query with the specified copy number.
    The JSON format is defined here: https://github.com/google-deepmind/alphafold/blob/main/server/README.md

    Protein IDs and copy numbers must be provided as space- or comma-separated strings
    of the same length. If no protein ID is given, if a copy number is not a positive
    integer, or if the lengths do not match, an error message is recorded and a
    ValueError is raised.

    :param protein_ids: Space- or comma-separated list of UniProt protein IDs (e.g. "P69905 P68871").
    :param number_copies: Space- or comma-separated list of integers specifying the number of copies
        for each protein ID (e.g. "2 2").
    :param model_seed: Model seed for the AlphaFold query. If -1, no seed is set so that AlphaFold
        uses a random seed.
    :param name: Name of the AlphaFold job and of the generated file.
    :return: dict (messages, downloads); downloads is an OutputItem whose value maps the name to a
        one-element list holding the AlphaFold query (i.e. wrapped in square brackets in JSON, as
        required by AlphaFold Server).
    :raises ValueError: If no protein IDs are given, or if the copy numbers cannot be parsed as
        positive integers or do not match the number of protein IDs.
    :raises requests.exceptions.HTTPError: If fetching a UniProt FASTA sequence fails.
    """
    messages = []

    # extract protein IDs and number of copies per ID and make sure they have the same length
    uniprot_ids = protein_ids.replace(",", " ").split()
    try:
        copies_per_id = [
            int(value) for value in number_copies.replace(",", " ").split()
        ]
    except ValueError as e:
        msg = "Invalid list of number of copies per id: please provide space- or comma-separated integers."
        messages.append(
            dict(
                level=logging.ERROR,
                msg=msg,
            )
        )
        raise ValueError(msg) from e
    # without this check, empty input would crash in min() below with an unhelpful message
    if not uniprot_ids:
        msg = "Please provide at least one UniProt protein ID."
        messages.append(
            dict(
                level=logging.ERROR,
                msg=msg,
            )
        )
        raise ValueError(msg)
    if len(uniprot_ids) != len(copies_per_id):
        msg = f"There are {len(uniprot_ids)} ids, but {len(copies_per_id)} entries for the number of copies. Please make sure that these numbers match."
        messages.append(
            dict(
                level=logging.ERROR,
                msg=msg,
            )
        )
        raise ValueError(msg)
    if min(copies_per_id) < 1:
        msg = "The number of copies must be a positive integer."
        messages.append(
            dict(
                level=logging.ERROR,
                msg=msg,
            )
        )
        raise ValueError(msg)

    # create the JSON query for AlphaFold
    query = {
        "name": name,
        "modelSeeds": [],
        "sequences": [],
        "dialect": "alphafoldserver",
        "version": 1,
    }

    # an empty seed list lets AlphaFold choose a random seed
    if model_seed != -1:
        query["modelSeeds"] = [model_seed]

    for uniprot_id, copies in zip(uniprot_ids, copies_per_id):
        url = f"https://rest.uniprot.org/uniprotkb/{uniprot_id}.fasta"

        response = requests.get(url, timeout=20)
        response.raise_for_status()

        # drop the FASTA header line(s) and join the wrapped sequence lines
        fasta = response.text
        amino_acid_sequence = "".join(
            line.strip() for line in fasta.splitlines() if not line.startswith(">")
        )
        query["sequences"].append(
            {
                "proteinChain": {
                    "sequence": amino_acid_sequence,
                    "count": copies,
                }
            }
        )
    messages.append(
        dict(level=logging.INFO, msg="Successfully generated a JSON file for AlphaFold.")
    )
    return dict(
        messages=messages,
        downloads=OutputItem(output_type=OutputType.DOWNLOAD, value={name: [query]}),
    )
```
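Because `generate_alphafold_query_json` fetches every sequence from UniProt, its parsing and query-building logic is awkward to exercise offline. The sketch below splits the pure parts into hypothetical helpers (`parse_copy_numbers`, `fasta_to_sequence`, `build_query`; these names are illustrative and not part of the PR) that mirror the comma/space splitting, the copy-number validation, the FASTA header stripping, and the query shape used above. The FASTA text is a placeholder, not a real UniProt record.

```python
def parse_copy_numbers(protein_ids: str, number_copies: str) -> tuple[list[str], list[int]]:
    """Hypothetical helper: split space- and/or comma-separated inputs and validate them."""
    uniprot_ids = protein_ids.replace(",", " ").split()
    copies = [int(value) for value in number_copies.replace(",", " ").split()]
    if not uniprot_ids:
        raise ValueError("at least one protein ID is required")
    if len(uniprot_ids) != len(copies):
        raise ValueError("number of IDs and number of copy counts differ")
    if min(copies) < 1:
        raise ValueError("copy numbers must be positive")
    return uniprot_ids, copies


def fasta_to_sequence(fasta: str) -> str:
    """Drop FASTA header lines (starting with '>') and join the wrapped sequence lines."""
    return "".join(
        line.strip() for line in fasta.splitlines() if not line.startswith(">")
    )


def build_query(name: str, sequences: list[str], copies: list[int], model_seed: int = -1) -> dict:
    """Assemble a job dict with the same keys the step above produces."""
    return {
        "name": name,
        # -1 means "no seed", leaving the choice to AlphaFold
        "modelSeeds": [] if model_seed == -1 else [model_seed],
        "sequences": [
            {"proteinChain": {"sequence": seq, "count": count}}
            for seq, count in zip(sequences, copies)
        ],
        "dialect": "alphafoldserver",
        "version": 1,
    }


# Usage: comma-plus-space input is accepted, as in the step.
ids, copies = parse_copy_numbers("P69905, P68871", "2 2")
# Placeholder FASTA text: one header line and a sequence wrapped over two lines.
fasta = ">placeholder header\nMVLSPADKTN\nVKAAWGKVGA\n"
sequence = fasta_to_sequence(fasta)
query = build_query("hemoglobin", [sequence, sequence], copies, model_seed=42)
```

Keeping the network call out of these helpers means the validation rules can be unit-tested without mocking `requests`; the step itself would then only loop over the IDs, fetch each FASTA, and pass the results to the builder.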
Review comment: This label should probably tell users that the text they enter is only the stem of the file name, since `.json` is appended automatically.
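One way to make the stem behaviour explicit in code as well as in the label is sketched below, assuming the download layer appends `.json` to the user-supplied name and serializes the one-element job list stored under it. `make_download`, the case-insensitive stripping of a user-typed `.json`, and `indent=2` are illustrative assumptions; the PR's actual download handling (the `OutputItem` value and the frontend download button) is not shown here.

```python
import json


def make_download(stem: str, jobs: list[dict]) -> tuple[str, str]:
    """Hypothetical helper: return (file name, file content) for a list of AlphaFold Server jobs."""
    # Strip an extension the user may have typed, so we never produce "name.json.json".
    if stem.lower().endswith(".json"):
        stem = stem[: -len(".json")]
    filename = f"{stem}.json"
    # The jobs are serialized as a JSON list, even when there is only one job.
    content = json.dumps(jobs, indent=2)
    return filename, content


job = {
    "name": "hemoglobin",
    "modelSeeds": [],
    "sequences": [],
    "dialect": "alphafoldserver",
    "version": 1,
}
filename, content = make_download("hemoglobin", [job])
```

Tolerating a typed extension is a complement to, not a replacement for, relabeling the input field as the comment suggests.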