263 step for alphafold multimer query json generation #271

Open
AnnaPolensky wants to merge 17 commits into crosslinking from 263-step-for-alphafold-multimer-query-json-generation

Collaborator

@AnnaPolensky AnnaPolensky commented Mar 2, 2026

Description

Fixes #263.
Added a new output tab, "Downloads":
[screenshot]
To support it, I introduced a new category of step methods, so that we now have calc_method, plot_method and download_method.
I used the new download_method to add a new step, "AlphaFoldMultimerQueryJsonGeneration". The user enters a list of UniProt IDs and the number of copies they want of each protein, and can optionally enter a specific seed. The step then generates a JSON file that can be downloaded via the button in the Downloads tab.
[screenshot]
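
As a rough illustration of what the new step produces, the sketch below builds a query in the AlphaFold Server JSON format from a list of IDs, copy counts, and an optional seed. It is not the code in query_generation.py: the function name `build_query`, its parameters, and the `sequences_by_id` lookup (a stand-in for however the step obtains the amino-acid sequences) are assumptions made for this example, and the field names follow my reading of the AlphaFold Server README.

```python
import json


def build_query(name, uniprot_ids, copies_per_id, sequences_by_id, seed=None):
    """Return a JSON string holding one AlphaFold Server job (hypothetical helper)."""
    if len(uniprot_ids) != len(copies_per_id):
        raise ValueError("Need exactly one copy count per protein ID.")
    job = {
        "name": name,
        # An empty list leaves the seed choice to the server; seeds are sent as strings.
        "modelSeeds": [] if seed is None else [str(seed)],
        "sequences": [
            {"proteinChain": {"sequence": sequences_by_id[pid], "count": count}}
            for pid, count in zip(uniprot_ids, copies_per_id)
        ],
        "dialect": "alphafoldserver",
        "version": 1,
    }
    # The server expects a list of jobs, hence the outer list.
    return json.dumps([job], indent=2)


example = build_query(
    name="demo_job",
    uniprot_ids=["P69905", "P68871"],
    copies_per_id=[2, 2],
    # Short placeholder sequences for illustration only.
    sequences_by_id={"P69905": "MVLSPADKTN", "P68871": "MVHLTPEEKS"},
    seed=42,
)
```

Passing the sequences in as a plain dictionary keeps the sketch free of network calls; the real step may fetch them differently.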

Changes

backend/protzilla/importing/query_generation.py contains the method for generating the JSON.
backend/protzilla/steps.py contains most of the changes for adding a download_method.
frontend/src/components/app/run-screen/run-screen.tsx contains the frontend changes for displaying the Downloads tab and generating a button for each download.

Testing

  1. Add the query generation step (in the importing section).
  2. Enter protein IDs (e.g. I tested with "O43242 O432432" and "2 2", although I believe this query does not make sense from a biological point of view).
  3. Enter a seed or leave it empty. (Maybe try both versions.)
  4. Calculate the step.
  5. Go to the Downloads tab and hit the download button.
  6. Check that the downloaded file follows this format: https://github.com/google-deepmind/alphafold/blob/main/server/README.md
  7. (optional) Go to https://alphafoldserver.com/, continue with your Google account and check that you can upload the JSON file:
[screenshot]

Feel free to try different IDs and different numbers of copies, or try to break the step by entering input that deviates from the specified format.
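
For step 6, a quick structural check can complement reading the file by eye. The sketch below is only a convenience for reviewers, not part of the PR: `check_query` is a hypothetical helper, and the keys it looks for (`name`, `modelSeeds`, `sequences`, `proteinChain`, `count`) are taken from my reading of the linked README rather than from an official schema.

```python
import json


def check_query(text):
    """Return a list of problems found in an AlphaFold Server query string."""
    try:
        jobs = json.loads(text)
    except json.JSONDecodeError as error:
        return [f"Not valid JSON: {error}"]
    if not isinstance(jobs, list) or not jobs:
        return ["Top level must be a non-empty list of jobs."]

    problems = []
    for index, job in enumerate(jobs):
        if not isinstance(job, dict):
            problems.append(f"Job {index} is not an object.")
            continue
        for key in ("name", "modelSeeds", "sequences"):
            if key not in job:
                problems.append(f"Job {index} is missing '{key}'.")
        sequences = job.get("sequences", [])
        if not isinstance(sequences, list):
            problems.append(f"Job {index}: 'sequences' is not a list.")
            continue
        for entity in sequences:
            chain = entity.get("proteinChain") if isinstance(entity, dict) else None
            if isinstance(chain, dict):
                count = chain.get("count", 0)
                if not isinstance(count, int) or count < 1:
                    problems.append(f"Job {index} has a protein chain with count < 1.")
    return problems
```

An empty list means the file passed these basic checks; it does not guarantee that the server will accept the job.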

PR checklist

Development

  • If necessary, I have updated the documentation (README, docstrings, etc.)
  • If necessary, I have created / updated tests.

Mergeability

  • main-branch has been merged into local branch to resolve conflicts
  • The tests and linter have passed AFTER local merge
  • The backend code has been formatted with black
  • The frontend code has been formatted with pnpm format and checked with pnpm lint

Code review

  • I have self-reviewed my code.
  • At least one other developer reviewed and approved the changes

AnnaPolensky requested review from Elena-kal and tE3m on Mar 2, 2026, 12:26
AnnaPolensky self-assigned this on Mar 2, 2026
github-actions bot commented Mar 2, 2026

Coverage report

Click to see where and how coverage changed

File | Statements | Missing | Coverage | Coverage (new stmts) | Lines missing
  backend/main
  views.py 624-650
  views_settings.py 238-251, 255-259, 263-315, 326-340, 347-368, 372-447, 453, 464-482, 486-556, 562, 624-630
  backend/protzilla
  form.py
  form_helper.py
  networking.py 27-32
  run.py 361, 365
  steps.py 143-145, 245-252, 281, 308, 471, 474, 478, 483, 569-577, 681-689
  backend/protzilla/data_analysis
  crosslinking_validation.py 56, 246-247, 317-330
  dimension_reduction.py
  plots.py
  backend/protzilla/data_integration
  enrichment_analysis.py
  enrichment_analysis_gsea.py
  backend/protzilla/importing
  alphafold_protein_structure_load.py 125, 208-211, 235-239, 304-307, 315-318, 361-362, 367, 371, 383, 416-420, 521-524, 556-562, 609-611, 654-656, 664-665, 674-682, 730-732, 743-744, 755-758, 863-864, 878, 900-902
  crosslinking_import.py 148, 152, 202-205, 247-249, 254-259, 293-296, 311, 338-373, 395-429, 478, 480, 483-485, 684, 691, 693-696, 700, 710, 738-739, 774-778, 795-796, 801-802
  import_utils.py
  query_generation.py 57-64
  backend/protzilla/methods
  data_analysis.py 2558, 2616-2645
  importing.py 440, 466, 502, 532, 598, 625
  backend/protzilla/utilities
  clustergram.py
  utilities.py 142-144, 157-180
Project Total  

This report was generated by python-coverage-comment-action

Collaborator

@Elena-kal Elena-kal left a comment

One major problem is that inputting the uniprot ids like this: "P69905, P68871" does not work while "P69905 P68871" is fine. Should be fixed by stripping the whitespaces though.
Another issue I came across is that the step is validated (gets the green checkmark) even though there are error messages. This does not happen in other steps.
I also added a few suggestions to the code but overall the code seems fine.
I tested a few structures and used the json file for alphafold predictions. This worked very well. I am also very convinced of the downloads tab, I like it a lot and I think it could be useful for other steps as well.


# extract protein_ids and number of copies per id and make sure they have the same length
if "," in protein_ids:
    uniprot_ids = protein_ids.split(",")

Collaborator

Maybe also remove all whitespaces. I am not sure whether my problems came from that...

Collaborator Author

I think, too, that this caused your problem. I originally did not remove the whitespaces because I only added the comma-separated lists to support copy-paste from Excel where no whitespaces would be included. Changed this now.
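
One way to accept both separators at once is to split on any run of commas and whitespace. This is a minimal sketch of that idea, not necessarily what the updated step does; `parse_ids` is a name made up for this example.

```python
import re


def parse_ids(protein_ids):
    """Split on commas and/or whitespace and drop empty entries."""
    tokens = re.split(r"[,\s]+", protein_ids.strip())
    return [token for token in tokens if token]


from_excel = parse_ids("P69905,P68871")
with_spaces = parse_ids("P69905, P68871")
space_only = parse_ids("P69905 P68871")
```

All three inputs yield the same two IDs, so a trailing comma or a pasted tab does not produce empty or padded entries.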

    uniprot_ids = protein_ids.split()
try:
    if "," in number_copies:
        copies_per_id = [int(input) for input in number_copies.split(",")]

Collaborator

On the other hand, this split(",") worked I think

Collaborator Author

Because we are having an additional cast to an int here, so the cast kind of removes the whitespaces.
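
The difference can be reproduced in isolation: `int()` ignores surrounding whitespace, while a plain string keeps it. The variable names below are illustrative only.

```python
raw_copies = "2, 2"
copy_parts = raw_copies.split(",")  # ['2', ' 2']: the second item keeps its leading space
copies = [int(part) for part in copy_parts]  # int() ignores surrounding whitespace

raw_ids = "P69905, P68871"
id_parts = raw_ids.split(",")  # ['P69905', ' P68871']: the space survives in the ID
cleaned_ids = [part.strip() for part in id_parts]  # explicit strip is needed for strings
```

So the copy counts parse correctly without extra handling, whereas the IDs need a `strip()` (or a whitespace-aware split) before they are used.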

)
query_as_string = f"[{json.dumps(query)}]"
return dict(
    messages={},

Collaborator

Why don't we return messages? And even if we want this to be empty, wouldn't we want the messages to be an empty list not a dict?

Collaborator Author

Thanks for pointing out, I added a success message.
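
A sketch of how the return value could look with messages as a list, and with the download withheld when an error was collected. The message shape `dict(level=..., msg=...)` and the helper names `has_errors` and `finish_step` are assumptions for this example, not confirmed project API.

```python
import logging


def has_errors(messages):
    """True if any message is at ERROR level or above."""
    return any(message["level"] >= logging.ERROR for message in messages)


def finish_step(query_as_string, name, messages):
    """Return the step output; only offer a download when nothing went wrong."""
    if has_errors(messages):
        return dict(messages=messages)
    success = dict(level=logging.INFO, msg=f"Generated query '{name}'.")
    return dict(messages=messages + [success], downloads={name: query_as_string})


ok = finish_step("[]", "demo", [])
failed = finish_step("[]", "demo", [dict(level=logging.ERROR, msg="Unknown ID")])
```

The same `has_errors` check could also drive the green checkmark, so that a step with error messages is not shown as validated.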

query_as_string = f"[{json.dumps(query)}]"
return dict(
    messages={},
    downloads={f"prediction_query_{'_'.join(uniprot_ids)}": query_as_string},

Collaborator

this file name could become very long if we use too many uniprot ids. maybe we could truncate it to prevent this.

Collaborator Author

I added another input field, the user now enters the filename themselves.
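
For comparison, the truncation suggested above could look roughly like this. It is a sketch of the reviewer's idea, not what was merged into the branch; `truncated_name` and the `max_ids` limit are made up for this example.

```python
def truncated_name(uniprot_ids, max_ids=3):
    """Build a download name from at most max_ids IDs plus a short remainder note."""
    shown = "_".join(uniprot_ids[:max_ids])
    hidden = len(uniprot_ids) - max_ids
    suffix = f"_and_{hidden}_more" if hidden > 0 else ""
    return f"prediction_query_{shown}{suffix}"


short_name = truncated_name(["P69905", "P68871"])
long_name = truncated_name(["A", "B", "C", "D", "E"])
```

A user-chosen name avoids the ambiguity of two different long queries truncating to the same file name, which is one argument for the extra input field.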

@AnnaPolensky
Collaborator Author

> One major problem is that inputting the uniprot ids like this: "P69905, P68871" does not work while "P69905 P68871" is fine. Should be fixed by stripping the whitespaces though. Another issue I came across is that the step is validated (gets the green checkmark) even though there are error messages. This does not happen in other steps. I also added a few suggestions to the code but overall the code seems fine. I tested a few structures and used the json file for alphafold predictions. This worked very well. I am also very convinced of the downloads tab, I like it a lot and I think it could be useful for other steps as well.

You can now enter comma- and space-separated IDs like "P69905, P68871". I also changed the validation of the step. Happy to hear that you like the Downloads tab :)

Collaborator

@tE3m tE3m left a comment

some minor adjustments, but good changes overall

input_fields=[
    TextField(
        name="name",
        label="File name and AlphaFold job name for generated query",

Collaborator

this label should probably tell the user that the text they enter is only the stem of the filename, since .json is appended automatically
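
Independent of the label wording, the code could also tolerate users who type the extension anyway. A minimal sketch, assuming the step appends ".json" itself; `download_filename` is a hypothetical helper name.

```python
def download_filename(user_input):
    """Turn the user's text into a file name with exactly one .json extension."""
    stem = user_input.strip()
    if stem.lower().endswith(".json"):
        stem = stem[: -len(".json")]
    return f"{stem}.json"


plain = download_filename("my_query")
already_suffixed = download_filename("my_query.json")
```

Both calls produce "my_query.json", so the download never ends up as "my_query.json.json".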
