Conversation
notebooks/regression.ipynb
Outdated
| "train_bools = []\n", | ||
| "epitope_bools = []\n", | ||
| "rsa_vals = []\n", | ||
| "for (esm_emb, seq, train_boolmask, epitope_boolmask, rsa) in bp3.iter_rows():\n", |
Just a quality-of-life thing: if you have a large number of rows and don't want to unpack the tuple and name everything, you can always do

```python
for row in bp3.iter_rows(named=True):
    esm_emb = row["esm_emb"]
```
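For reference, a minimal sketch of what the accumulation loop might look like with named rows; the column names are assumed to match the variables in the tuple-unpacking version above, so adjust them if the actual schema differs:

```python
train_bools = []
epitope_bools = []
rsa_vals = []

# iter_rows(named=True) yields each row as a dict keyed by column name,
# so only the columns that are actually needed have to be referenced.
for row in bp3.iter_rows(named=True):
    esm_emb = row["esm_emb"]
    train_bools.append(row["train_boolmask"])
    epitope_bools.append(row["epitope_boolmask"])
    rsa_vals.append(row["rsa"])
```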
notebooks/regression.ipynb
Outdated
```
}
],
"source": [
"# --- Transform to Per-Residue Basis ---\n",
```
This cell could also be replaced with something like

```python
bp3.explode("esm_emb", pl.col("seq").str.split(""), "train_boolmask", "epitope_boolmask", "RSA")
```

although this would require you to load the esm_emb object into the dataframe as a list rather than a tensor, which polars doesn't know the length of.
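As a rough sketch of that approach, assuming esm_emb has been loaded as a Python list so polars stores it as a List column, that the mask and RSA columns are per-residue lists as well, and using an `extract_all(".")` split of the sequence into residues (also an assumption):

```python
import polars as pl

bp3_res = (
    bp3
    # Split each sequence into single-character residues so it can be exploded
    # alongside the per-residue list columns.
    .with_columns(pl.col("seq").str.extract_all(".").alias("residue"))
    # One output row per residue; all exploded columns must have matching lengths.
    .explode("esm_emb", "residue", "train_boolmask", "epitope_boolmask", "RSA")
)
```

Each row of bp3_res would then correspond to a single residue.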
notebooks/regression.ipynb
Outdated
| " esm_embeddings = []\n", | ||
| " for job_num in range(bp3.shape[0]):\n", | ||
| " job_name = bp3.select(\"job_name\")[job_num].item()\n", | ||
| " esm_embeddings.append(torch.load(ESM_ENCODING_DIR / (job_name + \".pt\")))\n", |
I would change this to be a list, since lists are converted into polars lists when inserted as a column and play nicely with polars. Notice how in the printed dataframe polars thinks the tensor is an opaque "object"/binary blob, and polars operations won't work on it.

```python
torch.load(ESM_ENCODING_DIR / (job_name + ".pt")).tolist()
```
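A minimal sketch of how the whole embedding column could be built that way, so it lands in the frame as a proper List column (ESM_ENCODING_DIR, the job_name column, and the .pt files come from the diff above; the rest is an assumed schema):

```python
import polars as pl
import torch

# Convert each loaded tensor to nested Python lists so polars can infer a
# List dtype for the column instead of storing an opaque object.
esm_embeddings = [
    torch.load(ESM_ENCODING_DIR / (job_name + ".pt")).tolist()
    for job_name in bp3["job_name"]
]

bp3 = bp3.with_columns(pl.Series("esm_emb", esm_embeddings))
```

With the column stored as lists, operations like the explode suggested above become available.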
| "X_df = train_df[agg_features]\n", | ||
| "y_df = train_df[\"epitope_bools\"]\n", | ||
| "\n", | ||
| "X = X_df.values\n", |
The polars version of this is

```python
bp3_res.select(agg_features).to_numpy()
```
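As a slightly fuller sketch, both the features and the target could stay in polars until the final numpy conversion (bp3_res and agg_features come from the notebook; using "epitope_bools" as the target column here is an assumption):

```python
# Feature matrix: one row per residue, one column per aggregated feature.
X = bp3_res.select(agg_features).to_numpy()

# Target vector: selecting a single column yields an (n, 1) array, so flatten it.
y = bp3_res.select("epitope_bools").to_numpy().ravel()
```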
| "\n", | ||
| " # --- Scale Features ---\n", | ||
| " scaler = StandardScaler() \n", |
Does BepiPred scale the embedding features? Just curious whether this is necessary.