Commit 976c5fb

committed
Updated documentation for TOPMED
1 parent ca4b1b1 commit 976c5fb

12 files changed

Lines changed: 298 additions & 121 deletions

TOPMED.ipynb

Lines changed: 141 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -4,9 +4,14 @@
44
"cell_type": "markdown",
55
"metadata": {},
66
"source": [
7-
"# TOPMED Presentation\n",
7+
"# Trans-Omics for Precision Medicine | TOPMed \n",
8+
"## Live Presentation of BioNeuralNet.\n",
89
"\n",
9-
"#### Demonstrates a **step-by-step** guide to using BioNeuralNet for multi-omics analysis and research."
10+
"A **step-by-step** guide to **BioNeuralNet**.\n",
11+
"\n",
12+
"- This demonstration was made specifically for the 2025 TOPMed Annual Meeting. Featuring artificial intelligence and machine learning.\n",
13+
"\n",
14+
"- For more information on TOPMed and their mission, please visit [TOPMed](https://topmed.nhlbi.nih.gov/)."
1015
]
1116
},
1217
{
@@ -26,6 +31,13 @@
2631
"!{sys.executable} -m pip install bioneuralnet"
2732
]
2833
},
34+
{
35+
"cell_type": "markdown",
36+
"metadata": {},
37+
"source": [
38+
"### BioNeuralNet Components"
39+
]
40+
},
2941
{
3042
"cell_type": "code",
3143
"execution_count": 2,
@@ -51,8 +63,9 @@
5163
"cell_type": "markdown",
5264
"metadata": {},
5365
"source": [
54-
"- #### Your dataset import may look something like this.\n",
55-
"- #### After loading your data, the remaining steps will be the same."
66+
"## Loading your data:\n",
67+
"- If you data is stored in a csv file, it can be loaded by following the example below.\n",
68+
"- After loading your data, the remaining steps will be the same."
5669
]
5770
},
5871
{
@@ -70,6 +83,17 @@
7083
"clinical_data = pd.read_csv(\"clinical.csv\")"
7184
]
7285
},
86+
{
87+
"cell_type": "markdown",
88+
"metadata": {},
89+
"source": [
90+
"## Easy of component exploration via `DatasetLoader` \n",
91+
"\n",
92+
"- This component allows users to explore BioNeuralNet capabilities.\n",
93+
"\n",
94+
"- DatasetLoader `example1` is synthetic and purely for example purposes."
95+
]
96+
},
7397
{
7498
"cell_type": "code",
7599
"execution_count": 4,
@@ -1091,6 +1115,18 @@
10911115
"display(clinical)"
10921116
]
10931117
},
1118+
{
1119+
"cell_type": "markdown",
1120+
"metadata": {},
1121+
"source": [
1122+
"## Generating a Multi-Omics Network using SmCCNet\n",
1123+
"- SmCCNet stands for: Sparse Multiple Canonical Correlation Network Analysis Tool.\n",
1124+
"\n",
1125+
"- Is one of our external tools offered through `external_tools`component.\n",
1126+
"\n",
1127+
"- For more information on SmCCNet please visit docs [SmCCNet](https://cran.r-project.org/package=SmCCNet)"
1128+
]
1129+
},
10941130
{
10951131
"cell_type": "code",
10961132
"execution_count": null,
@@ -1108,6 +1144,14 @@
11081144
"global_network, smccnet_clusters = smccnet.run()"
11091145
]
11101146
},
1147+
{
1148+
"cell_type": "markdown",
1149+
"metadata": {},
1150+
"source": [
1151+
"## SmCCNet Output\n",
1152+
"- SmCCNet returned a 600x600 network and 3 Sub Netwoks or Clusters."
1153+
]
1154+
},
11111155
{
11121156
"cell_type": "code",
11131157
"execution_count": 6,
@@ -1137,6 +1181,16 @@
11371181
"display(len(smccnet_clusters))"
11381182
]
11391183
},
1184+
{
1185+
"cell_type": "markdown",
1186+
"metadata": {},
1187+
"source": [
1188+
"## Generating Low-Dimensional Embeddings using Graph Neural Networks to capture meaningful biological interactions.\n",
1189+
"- This will return node embeddings that reflect both local connectivity and supervised signals\n",
1190+
"\n",
1191+
"- Each node (omics feature) is associated with a numeric label (e.g., Pearson correlation with phenotype) that guides learning."
1192+
]
1193+
},
11401194
{
11411195
"cell_type": "code",
11421196
"execution_count": null,
@@ -1159,6 +1213,14 @@
11591213
"embeddings_output = embeddings.embed(as_df=True)"
11601214
]
11611215
},
1216+
{
1217+
"cell_type": "markdown",
1218+
"metadata": {},
1219+
"source": [
1220+
"## Output\n",
1221+
"For each Omic or Node in the network, our `GNNEmbedding` function generated a 64 dimmensional representation of that Omics."
1222+
]
1223+
},
11621224
{
11631225
"cell_type": "code",
11641226
"execution_count": 8,
@@ -1191,6 +1253,14 @@
11911253
"embeddings_array = embeddings_output.values "
11921254
]
11931255
},
1256+
{
1257+
"cell_type": "markdown",
1258+
"metadata": {},
1259+
"source": [
1260+
"## Embeddings visualization\n",
1261+
"By visulizing the Emebedding Space in a 2 dimensional space. We can notice some cluster of Nodes/Omics. Highlighting close relationships between them."
1262+
]
1263+
},
11941264
{
11951265
"cell_type": "code",
11961266
"execution_count": 10,
@@ -1211,6 +1281,18 @@
12111281
"fig1 = plot_embeddings(embeddings_array, global_node_labels.to_frame(name=\"phenotype\"), method=\"tsne\")"
12121282
]
12131283
},
1284+
{
1285+
"cell_type": "markdown",
1286+
"metadata": {},
1287+
"source": [
1288+
"## Using the Embeddings\n",
1289+
"- Let use this omics to enrich the representation of the original dataset.\n",
1290+
"\n",
1291+
"- The `GraphEmbedding` function takes our previously generated embeddings and our original Omics Dataset and associated Phenotype\n",
1292+
"\n",
1293+
"- This function will use the embeddings to enrich the orignal dataset. For more details and how this is performed please view our `GNN Embeddings for Multi-Omics` tab."
1294+
]
1295+
},
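The notebook defers the details of the enrichment step to the documentation. As a rough illustration only, one possible scheme is to collapse each feature's embedding to a single scalar weight and scale the matching omics column by it. The sketch below assumes this scheme; the helper names, the choice of the mean as the reduction, and the toy data are hypothetical and should not be read as the `GraphEmbedding` implementation.

```python
from statistics import mean


def collapse_embeddings(embeddings):
    """Reduce each feature's embedding vector to one scalar weight (assumed: its mean)."""
    return {feature: mean(vector) for feature, vector in embeddings.items()}


def enrich(omics_rows, weights):
    """Scale every omics value by its feature's weight; features without a weight are unchanged."""
    return [
        {feature: value * weights.get(feature, 1.0) for feature, value in row.items()}
        for row in omics_rows
    ]


# Hypothetical toy data: two features with 4-dimensional embeddings, two samples.
toy_embeddings = {
    "gene_a": [0.5, 0.5, 1.0, 0.0],
    "gene_b": [2.0, 2.0, 2.0, 2.0],
}
toy_omics = [
    {"gene_a": 4.0, "gene_b": 1.0},
    {"gene_a": 2.0, "gene_b": 3.0},
]

weights = collapse_embeddings(toy_embeddings)
enriched = enrich(toy_omics, weights)
```

The enriched table keeps the same samples and features as the raw one, so it can be passed to any downstream model in place of the original data.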
12141296
{
12151297
"cell_type": "code",
12161298
"execution_count": null,
@@ -1611,6 +1693,18 @@
16111693
"display(enhanced_omics_df)"
16121694
]
16131695
},
1696+
{
1697+
"cell_type": "markdown",
1698+
"metadata": {},
1699+
"source": [
1700+
"## Comparing results\n",
1701+
"Lets compare the enriched dataset vs the raw dataset by using it with a popular machine learning technique, `Random Forest`. In this example we are simply using the high-dimensional omics data to make prediction on the phenotype.\n",
1702+
"\n",
1703+
"- Our entire codebase is publicly avaible at the [Big Data Management and Mining Laboratory](https://github.com/UCD-BDLab) github repository.\n",
1704+
"\n",
1705+
"- For specific details on this code, please visit: [BioNeuralNet](https://github.com/UCD-BDLab/BioNeuralNet)"
1706+
]
1707+
},
16141708
{
16151709
"cell_type": "code",
16161710
"execution_count": null,
@@ -1648,6 +1742,20 @@
16481742
"plot_performance(accuracy_with_embeddings, accuracy_alone, \"Raw Omics vs Enriched Omics\")"
16491743
]
16501744
},
1745+
{
1746+
"cell_type": "markdown",
1747+
"metadata": {},
1748+
"source": [
1749+
"## Network Visulization\n",
1750+
"- As part of this demonstration, I have developed a number of graphic and plotting tools.\n",
1751+
"\n",
1752+
"- Please note that these tools are not part of the BioNeuralNet core, and it is not my intent to present them this way.\n",
1753+
"\n",
1754+
"- This examples highlight how external libraies and your own code can be easily integrated into our workflow. Allowing users to further explore the omics-data.\n",
1755+
"\n",
1756+
"- All this code and visulization aid components will be availble after the presentation."
1757+
]
1758+
},
16511759
{
16521760
"cell_type": "code",
16531761
"execution_count": 15,
@@ -1833,6 +1941,18 @@
18331941
"display(cluster2_mapping.head())\n"
18341942
]
18351943
},
1944+
{
1945+
"cell_type": "markdown",
1946+
"metadata": {},
1947+
"source": [
1948+
"## Correlated Clustering\n",
1949+
"- BioNeuralNet includes internal modules for performing correlated clustering on complex networks. \n",
1950+
"\n",
1951+
"- These methods modify and extend the traditional community detection by integrating phenotype correlation, allowing users to extract biologically relevant, phenotype-associated modules from any network. \n",
1952+
"\n",
1953+
"- For more details on how this performed, please visit our `Correlated Clustering Methods` tab"
1954+
]
1955+
},
18361956
{
18371957
"cell_type": "code",
18381958
"execution_count": null,
@@ -2392,10 +2512,15 @@
23922512
"cell_type": "markdown",
23932513
"metadata": {},
23942514
"source": [
2395-
"### DPMON (Disease Prediction using Multi-Omics Networks) reuses the same GNN architectures but with a different objective: \n",
2396-
"- Instead of node-level MSE regression, DPMON aggregates node embeddings with patient-level omics data. \n",
2397-
"- A downstream classification head (e.g., softmax layer with CrossEntropyLoss) is applied for sample-level disease prediction. \n",
2398-
"- This end-to-end approach leverages both local (node-level) and global (patient-level) network information."
2515+
"## DPMON (Disease Prediction using Multi-Omics Networks) reuses the same GNN architectures but with a different objective: \n",
2516+
"\n",
2517+
"- DPMON aggregates node embeddings with patient-level omics data. \n",
2518+
"\n",
2519+
"- A downstream classification head is applied for sample-level disease prediction.\n",
2520+
"\n",
2521+
"- This end-to-end approach leverages both local (node-level) and global (patient-level) network information.\n",
2522+
"\n",
2523+
"- This single cell bellow captures the entire workflow demonstrated earlier (Generating GNN Embeddins + Integrating these Embeddings back into Omics Dataset) in an end-to-end iterative pipeline. "
23992524
]
24002525
},
24012526
{
@@ -2531,6 +2656,13 @@
25312656
"display(dpmon_predictions[0])"
25322657
]
25332658
},
2659+
{
2660+
"cell_type": "markdown",
2661+
"metadata": {},
2662+
"source": [
2663+
"## DPMON allows BioNeuralNet users to significatly improve phenotype predictions with a few lines of code."
2664+
]
2665+
},
25342666
{
25352667
"cell_type": "code",
25362668
"execution_count": null,
@@ -2568,7 +2700,7 @@
25682700
],
25692701
"metadata": {
25702702
"kernelspec": {
2571-
"display_name": ".test_env",
2703+
"display_name": ".venv",
25722704
"language": "python",
25732705
"name": "python3"
25742706
},
