ljwoods2 left a comment
Overall looks good. Can you add a plot to results/figures using the graph method you created for the analysis in this notebook? It would be cool to show John and to look back on later to compare with other methods.
| # %% | ||
| def plot_epitope_non_epitope_stats_9mer( |
Hmm, it's not intuitive from looking at the plot what the minimum represents here. I think the legend should be more descriptive: this is the mean per-amino-acid minimum pLDDT for the 9mer in each 30mer with the lowest minimum single pLDDT. There's probably a better way to word this, but as it stands the plot is a bit misleading. The mean legend could be more descriptive, too.
Also, is this identical to the 30mer mean min? I think the calculation works out the same
Yep, from looking at the values, I think these are equivalent. I'd recommend removing min min from this plot.
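A quick way to see why the two quantities coincide, assuming "min min" means the lowest per-residue pLDDT among all 9mer windows inside a 30mer: every residue of the 30mer falls in at least one 9mer window, so the smallest window minimum is just the 30mer minimum. The helper name and the pLDDT values below are made up for illustration and are not taken from the notebook.

```python
def window_minima(values, width):
    """Return the minimum of every contiguous window of `width` values."""
    return [min(values[i:i + width]) for i in range(len(values) - width + 1)]


# Made-up per-residue pLDDT values for a single 30mer (illustrative only).
plddt_30mer = [
    88.1, 90.4, 76.3, 65.2, 71.9, 83.0, 92.5, 94.1, 60.7, 58.3,
    62.0, 79.8, 85.5, 91.2, 93.3, 89.9, 74.6, 68.4, 55.1, 57.7,
    63.9, 81.2, 86.0, 90.8, 87.3, 72.5, 69.1, 77.7, 84.4, 80.2,
]

minima_9mer = window_minima(plddt_30mer, 9)  # one minimum per 9mer window
lowest_9mer_min = min(minima_9mer)           # the "min min" for this 30mer
min_30mer = min(plddt_30mer)                 # plain 30mer minimum

print(lowest_9mer_min == min_30mer)
```

Since the per-30mer values are identical, averaging them over the dataset gives the same number as the 30mer mean min, which is consistent with dropping the extra series from the plot.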
| print("max:" + str(max_pLDDT)) | ||
| min_pLDDT = dataset.select(pl.col(colname)).min().item() | ||
| print("min:" + str(min_pLDDT)) | ||
| mean_pLDDT = dataset.select(pl.col(colname)).to_series() |
Remove mean_pLDDT; it looks like it isn't doing anything.
| all_statistics, | ||
| "data/hv/peptide/inference", | ||
| ) | ||
add your mass + helix / beta sheet feature extraction methods here
@ljwoods2 I just pushed the code. I wasn't able to do everything I wanted, but I got a good start.
…nd min of the atomic weights of the 9-mers
@spencer2234 can you try using the max BepiPred score per 30mer as a feature instead of the mean? I think that's potentially a fairer way to compare.
@ljwoods2 check out the hv_class and in_class folders in notebooks.
| @@ -0,0 +1,99 @@ | |||
| import polars as pl | |||
Write a brief docstring at the top of each of these feature extraction scripts describing which features it extracts. It's not clear from the name alone what each script is meant to do.
| y_hat_RSA_fp = st.normalized_pLDDT_30mer(all_statistics_in_class_fp, "mean_rsa_slice") | ||
| y_true_RSA = all_statistics_in_class_fp.select(pl.col("epitope")) | ||
| in_class_norm_rsa_mean_30mer_ROC = st.plot_auc_roc_curve( | ||
| y_true_RSA, y_hat_RSA_fp, "in_class Normalized mean RSA values for 30mer fp ROC" |
For your poster, change the titles of these figures so that they don't say "in_class" as the dataset name. I would say "IN1 30mer classification set" as the name or something similar, and then you can define that in the text of the poster.
Same goes for other figures
| in_class_norm_rsa_mean_30mer_ROC = st.plot_auc_roc_curve( | ||
| y_true_RSA, y_hat_RSA_fp, "in_class Normalized mean RSA values for 30mer fp ROC" | ||
| ) | ||
| in_class_norm_rsa_mean_30mer_ROC.savefig( |
The ROC curve is flipped; fix this so that AUC > 0.5.
Same goes for other flipped AUC curves
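One common reason for a flipped ROC curve is that lower scores indicate the positive class, so the classifier ranks epitopes last. A sketch of that effect, using a small hand-rolled AUC (the `roc_auc` helper and the data are illustrative, not the `st.plot_auc_roc_curve` implementation): negating the scores turns an AUC of a into 1 - a. Whether negation is the right fix here depends on how the normalized score is defined.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y]
    neg = [s for y, s in zip(y_true, scores) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels and scores where positives tend to score LOW.
y_true = [True, True, True, False, False, False]
scores = [0.2, 0.3, 0.6, 0.5, 0.8, 0.9]

auc_raw = roc_auc(y_true, scores)                      # below 0.5: flipped
auc_flipped = roc_auc(y_true, [-s for s in scores])    # negated scores

print(auc_raw, auc_flipped)
```

The two values sum to 1, so a feature with an AUC well below 0.5 is still informative once its direction is reversed.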
| # %% | ||
| y_hat_RSA_fp = st.normalized_pLDDT_30mer(all_statistics_in_class_fp, "mean_rsa_slice") |
There are quite a few steps leading up to this, so add a markdown cell above this cell describing what this plot shows. The scoring method isn't immediately obvious: it covers every instance of every 30mer across the focal proteins it appeared in (so duplicate 30mers are allowed), with each 30mer annotated with a true/false value extracted from PepSeq (assay).
| ) | ||
| # %% | ||
| fp_aggrigate_30mer = all_statistics_hv_class_fp.group_by("peptide").agg( |
This plot and the one above it both use the column "mean_pLDDT_slice", but this one refers to the metric as "geometric mean pLDDT". Is it using the geometric mean or not? Rename whichever is incorrect.
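For reference, the two means give different numbers on the same values, so the label matters. A minimal sketch with made-up pLDDT values (the `geometric_mean` helper is illustrative and not part of the notebook code):

```python
import math
from statistics import fmean


def geometric_mean(values):
    """Geometric mean via the mean of logs; requires strictly positive values."""
    return math.exp(fmean(math.log(v) for v in values))


# Made-up per-residue pLDDT values for one slice.
plddt = [90.0, 40.0, 80.0]

arith = fmean(plddt)          # arithmetic mean
geo = geometric_mean(plddt)   # geometric mean, pulled down by the low value

print(arith, geo)
```

The geometric mean is never larger than the arithmetic mean for positive values, and it is more sensitive to a single low-confidence residue, so comparing the plotted values against both is one way to check which one "mean_pLDDT_slice" actually holds.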
| ) | ||
| # %% | ||
| mean_auc = fp_aggrigate_9mer.select("AUC").mean() |
AUC is always None here, something is wrong
Same with the equivalent cell in in_data.py
Fixes #7
Fixes #8
Fixes #9
Fixes #15
Fixes #16