|
4 | 4 | "cell_type": "markdown", |
5 | 5 | "metadata": {}, |
6 | 6 | "source": [ |
7 | | - "# TOPMED Presentation\n", |
| 7 | + "# Trans-Omics for Precision Medicine | TOPMed \n", |
| 8 | + "## Live Presentation of BioNeuralNet.\n", |
8 | 9 | "\n", |
9 | | - "#### Demonstrates a **step-by-step** guide to using BioNeuralNet for multi-omics analysis and research." |
| 10 | + "A **step-by-step** guide to **BioNeuralNet**.\n", |
| 11 | + "\n", |
| 12 | + "- This demonstration was prepared specifically for the 2025 TOPMed Annual Meeting and features artificial intelligence and machine learning.\n", |
| 13 | + "\n", |
| 14 | + "- For more information on TOPMed and their mission, please visit [TOPMed](https://topmed.nhlbi.nih.gov/)." |
10 | 15 | ] |
11 | 16 | }, |
12 | 17 | { |
|
26 | 31 | "!{sys.executable} -m pip install bioneuralnet" |
27 | 32 | ] |
28 | 33 | }, |
| 34 | + { |
| 35 | + "cell_type": "markdown", |
| 36 | + "metadata": {}, |
| 37 | + "source": [ |
| 38 | + "### BioNeuralNet Components" |
| 39 | + ] |
| 40 | + }, |
29 | 41 | { |
30 | 42 | "cell_type": "code", |
31 | 43 | "execution_count": 2, |
|
51 | 63 | "cell_type": "markdown", |
52 | 64 | "metadata": {}, |
53 | 65 | "source": [ |
54 | | - "- #### Your dataset import may look something like this.\n", |
55 | | - "- #### After loading your data, the remaining steps will be the same." |
| 66 | + "## Loading your data\n", |
| 67 | + "- If your data is stored in a CSV file, it can be loaded as in the example below.\n", |
| 68 | + "- After loading your data, the remaining steps will be the same." |
56 | 69 | ] |
57 | 70 | }, |
58 | 71 | { |
|
70 | 83 | "clinical_data = pd.read_csv(\"clinical.csv\")" |
71 | 84 | ] |
72 | 85 | }, |
| 86 | + { |
| 87 | + "cell_type": "markdown", |
| 88 | + "metadata": {}, |
| 89 | + "source": [ |
| 90 | + "## Easy component exploration via `DatasetLoader`\n", |
| 91 | + "\n", |
| 92 | + "- This component lets users explore BioNeuralNet's capabilities using example datasets.\n", |
| 93 | + "\n", |
| 94 | + "- The `example1` dataset provided by `DatasetLoader` is synthetic and intended purely for demonstration purposes." |
| 95 | + ] |
| 96 | + }, |
73 | 97 | { |
74 | 98 | "cell_type": "code", |
75 | 99 | "execution_count": 4, |
|
1091 | 1115 | "display(clinical)" |
1092 | 1116 | ] |
1093 | 1117 | }, |
| 1118 | + { |
| 1119 | + "cell_type": "markdown", |
| 1120 | + "metadata": {}, |
| 1121 | + "source": [ |
| 1122 | + "## Generating a Multi-Omics Network using SmCCNet\n", |
| 1123 | + "- SmCCNet stands for Sparse Multiple Canonical Correlation Network Analysis Tool.\n", |
| 1124 | + "\n", |
| 1125 | + "- It is one of our external tools, offered through the `external_tools` component.\n", |
| 1126 | + "\n", |
| 1127 | + "- For more information on SmCCNet, please see its documentation on CRAN: [SmCCNet](https://cran.r-project.org/package=SmCCNet)." |
| 1128 | + ] |
| 1129 | + }, |
1094 | 1130 | { |
1095 | 1131 | "cell_type": "code", |
1096 | 1132 | "execution_count": null, |
|
1108 | 1144 | "global_network, smccnet_clusters = smccnet.run()" |
1109 | 1145 | ] |
1110 | 1146 | }, |
| 1147 | + { |
| 1148 | + "cell_type": "markdown", |
| 1149 | + "metadata": {}, |
| 1150 | + "source": [ |
| 1151 | + "## SmCCNet Output\n", |
| 1152 | + "- SmCCNet returned a 600x600 network and 3 sub-networks (clusters)." |
| 1153 | + ] |
| 1154 | + }, |
1111 | 1155 | { |
1112 | 1156 | "cell_type": "code", |
1113 | 1157 | "execution_count": 6, |
|
1137 | 1181 | "display(len(smccnet_clusters))" |
1138 | 1182 | ] |
1139 | 1183 | }, |
| 1184 | + { |
| 1185 | + "cell_type": "markdown", |
| 1186 | + "metadata": {}, |
| 1187 | + "source": [ |
| 1188 | + "## Generating Low-Dimensional Embeddings with Graph Neural Networks to Capture Meaningful Biological Interactions\n", |
| 1189 | + "- This step returns node embeddings that reflect both local connectivity and supervised signals.\n", |
| 1190 | + "\n", |
| 1191 | + "- Each node (omics feature) is associated with a numeric label (e.g., Pearson correlation with phenotype) that guides learning." |
| 1192 | + ] |
| 1193 | + }, |
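As a rough illustration of the node labels mentioned above, the Pearson correlation of every omics feature with the phenotype can be computed with a single pandas call. This is a minimal sketch on synthetic data, not BioNeuralNet's internal code; `compute_node_labels` is a hypothetical helper, and it assumes a samples-by-features `DataFrame` and a numeric phenotype `Series` that share one index.

```python
import numpy as np
import pandas as pd


def compute_node_labels(omics: pd.DataFrame, phenotype: pd.Series) -> pd.Series:
    """Hypothetical helper: Pearson correlation of each omics feature (node) with the phenotype."""
    # corrwith correlates every column with the Series, aligning rows on the shared index
    return omics.corrwith(phenotype, method="pearson")


# Synthetic example: 50 samples, 4 features; feat_0 is constructed to track the phenotype
rng = np.random.default_rng(0)
phenotype = pd.Series(rng.normal(size=50), name="phenotype")
omics = pd.DataFrame(rng.normal(size=(50, 4)), columns=[f"feat_{i}" for i in range(4)])
omics["feat_0"] = phenotype * 2.0 + rng.normal(scale=0.1, size=50)

node_labels = compute_node_labels(omics, phenotype)
print(node_labels.round(2))
```

Each value lies in [-1, 1]; in this toy data `feat_0` should be close to 1 and the pure-noise features close to 0.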
1140 | 1194 | { |
1141 | 1195 | "cell_type": "code", |
1142 | 1196 | "execution_count": null, |
|
1159 | 1213 | "embeddings_output = embeddings.embed(as_df=True)" |
1160 | 1214 | ] |
1161 | 1215 | }, |
| 1216 | + { |
| 1217 | + "cell_type": "markdown", |
| 1218 | + "metadata": {}, |
| 1219 | + "source": [ |
| 1220 | + "## Output\n", |
| 1221 | + "For each omics feature (node) in the network, `GNNEmbedding` generated a 64-dimensional representation of that feature." |
| 1222 | + ] |
| 1223 | + }, |
1162 | 1224 | { |
1163 | 1225 | "cell_type": "code", |
1164 | 1226 | "execution_count": 8, |
|
1191 | 1253 | "embeddings_array = embeddings_output.values " |
1192 | 1254 | ] |
1193 | 1255 | }, |
| 1256 | + { |
| 1257 | + "cell_type": "markdown", |
| 1258 | + "metadata": {}, |
| 1259 | + "source": [ |
| 1260 | + "## Embeddings visualization\n", |
| 1261 | + "By projecting the embedding space into two dimensions (here with t-SNE), we can see clusters of nodes (omics features), suggesting close relationships between them." |
| 1262 | + ] |
| 1263 | + }, |
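The two-dimensional projection described above can be sketched with scikit-learn's `TSNE`. This assumes that the notebook's `plot_embeddings(..., method="tsne")` call performs a comparable projection before plotting (its internals are not shown here); synthetic Gaussian blobs stand in for the real 64-dimensional node embeddings.

```python
import numpy as np
from sklearn.manifold import TSNE

# Synthetic stand-in for embeddings_array: 300 nodes x 64 dimensions,
# drawn around three shifted centers so that clusters are expected in 2D
rng = np.random.default_rng(42)
centers = rng.normal(scale=5.0, size=(3, 64))
embeddings_array = np.vstack([c + rng.normal(size=(100, 64)) for c in centers])

# Project to 2D; perplexity must be smaller than the number of nodes
tsne = TSNE(n_components=2, perplexity=30, init="pca", random_state=0)
coords_2d = tsne.fit_transform(embeddings_array)
print(coords_2d.shape)  # (300, 2)
```

The resulting `coords_2d` can then be drawn as a scatter plot (for example with matplotlib), colored by each node's phenotype correlation.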
1194 | 1264 | { |
1195 | 1265 | "cell_type": "code", |
1196 | 1266 | "execution_count": 10, |
|
1211 | 1281 | "fig1 = plot_embeddings(embeddings_array, global_node_labels.to_frame(name=\"phenotype\"), method=\"tsne\")" |
1212 | 1282 | ] |
1213 | 1283 | }, |
| 1284 | + { |
| 1285 | + "cell_type": "markdown", |
| 1286 | + "metadata": {}, |
| 1287 | + "source": [ |
| 1288 | + "## Using the Embeddings\n", |
| 1289 | + "- Let's use these embeddings to enrich the representation of the original dataset.\n", |
| 1290 | + "\n", |
| 1291 | + "- The `GraphEmbedding` function takes the previously generated embeddings, the original omics dataset, and the associated phenotype.\n", |
| 1292 | + "\n", |
| 1293 | + "- It uses the embeddings to enrich the original dataset. For details on how this is performed, please see our `GNN Embeddings for Multi-Omics` tab." |
| 1294 | + ] |
| 1295 | + }, |
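One way to picture an enrichment step like this: reduce the node embeddings, then append sample-level features built by projecting each sample's standardized omics profile onto them. This is a hypothetical scheme chosen for illustration, not necessarily what `GraphEmbedding` does (the `GNN Embeddings for Multi-Omics` tab describes the actual method); `enrich_with_embeddings`, `n_components`, and the `emb_proj_*` column names are invented here.

```python
import numpy as np
import pandas as pd


def enrich_with_embeddings(omics: pd.DataFrame, embeddings: pd.DataFrame, n_components: int = 8) -> pd.DataFrame:
    """Hypothetical enrichment: append sample-level features obtained by projecting
    each sample's standardized omics profile onto reduced node embeddings.

    Assumes `embeddings` has one row per omics column, indexed by feature name.
    """
    emb = embeddings.loc[omics.columns].to_numpy()  # (n_features, emb_dim)
    # Reduce embedding dimensionality with an SVD of the centered embeddings (PCA)
    centered = emb - emb.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    k = min(n_components, len(s))
    reduced = u[:, :k] * s[:k]  # (n_features, k)
    # Standardize each feature, then project: every new column mixes features
    # according to where those features sit in the embedding space
    z = (omics - omics.mean()) / omics.std(ddof=0).replace(0, 1.0)
    projected = z.to_numpy() @ reduced / len(omics.columns)  # (n_samples, k)
    extra = pd.DataFrame(projected, index=omics.index, columns=[f"emb_proj_{i}" for i in range(k)])
    return pd.concat([omics, extra], axis=1)


# Synthetic usage: 10 samples, 5 features, 64-dimensional node embeddings
rng = np.random.default_rng(1)
features = [f"feat_{i}" for i in range(5)]
omics = pd.DataFrame(rng.normal(size=(10, 5)), columns=features)
embeddings = pd.DataFrame(rng.normal(size=(5, 64)), index=features)
enriched = enrich_with_embeddings(omics, embeddings)
print(enriched.shape)  # (10, 10)
```

The original columns are kept unchanged, so any downstream model sees the raw features plus the embedding-derived ones.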
1214 | 1296 | { |
1215 | 1297 | "cell_type": "code", |
1216 | 1298 | "execution_count": null, |
|
1611 | 1693 | "display(enhanced_omics_df)" |
1612 | 1694 | ] |
1613 | 1695 | }, |
| 1696 | + { |
| 1697 | + "cell_type": "markdown", |
| 1698 | + "metadata": {}, |
| 1699 | + "source": [ |
| 1700 | + "## Comparing results\n", |
| 1701 | + "Let's compare the enriched dataset with the raw dataset using a popular machine learning method, `Random Forest`. In this example, we simply use the high-dimensional omics data to predict the phenotype.\n", |
| 1702 | + "\n", |
| 1703 | + "- Our entire codebase is publicly available in the [Big Data Management and Mining Laboratory](https://github.com/UCD-BDLab) GitHub organization.\n", |
| 1704 | + "\n", |
| 1705 | + "- For details specific to this code, please visit [BioNeuralNet](https://github.com/UCD-BDLab/BioNeuralNet)." |
| 1706 | + ] |
| 1707 | + }, |
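A comparison like the one described above can be sketched with scikit-learn. The notebook's own cell reports `accuracy_with_embeddings` and `accuracy_alone`, but how it splits the data is not shown here, so this sketch assumes 5-fold cross-validation, a categorical phenotype, and synthetic placeholder matrices; `compare_rf` is a hypothetical helper.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def compare_rf(X_raw, X_enriched, y, seed=0):
    """Hypothetical helper: mean 5-fold cross-validated accuracy of a Random Forest per feature set."""
    results = {}
    for name, X in {"raw": X_raw, "enriched": X_enriched}.items():
        clf = RandomForestClassifier(n_estimators=100, random_state=seed)
        results[name] = cross_val_score(clf, X, y, cv=5, scoring="accuracy").mean()
    return results


# Synthetic placeholders; in the notebook these would be the raw and enriched omics tables
rng = np.random.default_rng(0)
X_raw = rng.normal(size=(120, 30))
y = (X_raw[:, 0] + 0.5 * rng.normal(size=120) > 0).astype(int)
X_enriched = np.hstack([X_raw, rng.normal(size=(120, 4))])  # stand-in: raw plus extra columns
results = compare_rf(X_raw, X_enriched, y)
print(results)
```

Passing an integer `cv` with a classifier uses the same stratified folds for both feature sets, so the two accuracies are computed on identical splits.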
1614 | 1708 | { |
1615 | 1709 | "cell_type": "code", |
1616 | 1710 | "execution_count": null, |
|
1648 | 1742 | "plot_performance(accuracy_with_embeddings, accuracy_alone, \"Raw Omics vs Enriched Omics\")" |
1649 | 1743 | ] |
1650 | 1744 | }, |
| 1745 | + { |
| 1746 | + "cell_type": "markdown", |
| 1747 | + "metadata": {}, |
| 1748 | + "source": [ |
| 1749 | + "## Network Visualization\n", |
| 1750 | + "- As part of this demonstration, I have developed a number of graphics and plotting tools.\n", |
| 1751 | + "\n", |
| 1752 | + "- Please note that these tools are not part of the BioNeuralNet core, and I do not intend to present them as such.\n", |
| 1753 | + "\n", |
| 1754 | + "- These examples highlight how external libraries and your own code can easily be integrated into our workflow, allowing users to further explore the omics data.\n", |
| 1755 | + "\n", |
| 1756 | + "- All of this code, including the visualization aids, will be available after the presentation." |
| 1757 | + ] |
| 1758 | + }, |
1651 | 1759 | { |
1652 | 1760 | "cell_type": "code", |
1653 | 1761 | "execution_count": 15, |
|
1833 | 1941 | "display(cluster2_mapping.head())\n" |
1834 | 1942 | ] |
1835 | 1943 | }, |
| 1944 | + { |
| 1945 | + "cell_type": "markdown", |
| 1946 | + "metadata": {}, |
| 1947 | + "source": [ |
| 1948 | + "## Correlated Clustering\n", |
| 1949 | + "- BioNeuralNet includes internal modules for performing correlated clustering on complex networks. \n", |
| 1950 | + "\n", |
| 1951 | + "- These methods modify and extend traditional community detection by integrating phenotype correlation, allowing users to extract biologically relevant, phenotype-associated modules from any network.\n", |
| 1952 | + "\n", |
| 1953 | + "- For more details on how this is performed, please visit our `Correlated Clustering Methods` tab." |
| 1954 | + ] |
| 1955 | + }, |
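To illustrate how phenotype correlation can be used to rank modules, this sketch scores each candidate module by the absolute Pearson correlation between its first principal component and the phenotype. It is a hypothetical helper on synthetic data and does not reproduce BioNeuralNet's correlated clustering algorithm, which folds the correlation into community detection itself.

```python
import numpy as np
import pandas as pd


def module_phenotype_correlation(omics: pd.DataFrame, module: list, phenotype: pd.Series) -> float:
    """Hypothetical helper: |Pearson r| between a module's first principal component and the phenotype."""
    sub = omics[module]
    z = (sub - sub.mean()) / sub.std(ddof=0)
    # First principal component score per sample (a module "eigengene")
    u, s, _ = np.linalg.svd(z.to_numpy(), full_matrices=False)
    pc1 = u[:, 0] * s[0]
    # The sign of a principal component is arbitrary, so take the absolute correlation
    return abs(np.corrcoef(pc1, phenotype.to_numpy())[0, 1])


# Synthetic data: module_A tracks the phenotype, module_B is pure noise
rng = np.random.default_rng(7)
n = 80
phenotype = pd.Series(rng.normal(size=n))
omics = pd.DataFrame({
    "a1": phenotype + rng.normal(scale=0.3, size=n),
    "a2": phenotype + rng.normal(scale=0.3, size=n),
    "b1": rng.normal(size=n),
    "b2": rng.normal(size=n),
})
modules = {"module_A": ["a1", "a2"], "module_B": ["b1", "b2"]}
scores = {name: module_phenotype_correlation(omics, feats, phenotype) for name, feats in modules.items()}
print(sorted(scores, key=scores.get, reverse=True))
```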
1836 | 1956 | { |
1837 | 1957 | "cell_type": "code", |
1838 | 1958 | "execution_count": null, |
|
2392 | 2512 | "cell_type": "markdown", |
2393 | 2513 | "metadata": {}, |
2394 | 2514 | "source": [ |
2395 | | - "### DPMON (Disease Prediction using Multi-Omics Networks) reuses the same GNN architectures but with a different objective: \n", |
2396 | | - "- Instead of node-level MSE regression, DPMON aggregates node embeddings with patient-level omics data. \n", |
2397 | | - "- A downstream classification head (e.g., softmax layer with CrossEntropyLoss) is applied for sample-level disease prediction. \n", |
2398 | | - "- This end-to-end approach leverages both local (node-level) and global (patient-level) network information." |
| 2515 | + "## DPMON (Disease Prediction using Multi-Omics Networks) reuses the same GNN architectures but with a different objective: \n", |
| 2516 | + "\n", |
| 2517 | + "- DPMON aggregates node embeddings with patient-level omics data. \n", |
| 2518 | + "\n", |
| 2519 | + "- A downstream classification head is applied for sample-level disease prediction.\n", |
| 2520 | + "\n", |
| 2521 | + "- This end-to-end approach leverages both local (node-level) and global (patient-level) network information.\n", |
| 2522 | + "\n", |
| 2523 | + "- The single cell below captures the entire workflow demonstrated earlier (generating GNN embeddings and integrating them back into the omics dataset) in one end-to-end, iterative pipeline." |
2399 | 2524 | ] |
2400 | 2525 | }, |
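The DPMON objective described above (aggregate node embeddings with patient-level omics, then classify each sample) can be sketched in plain NumPy. This toy version assumes a softmax classification head trained with cross-entropy and keeps the node embeddings fixed, whereas DPMON trains end to end; `train_toy_dpmon_head` and the mean-then-sigmoid feature weighting are hypothetical choices for illustration only.

```python
import numpy as np


def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def train_toy_dpmon_head(X, node_emb, y, n_classes, lr=0.5, epochs=300, seed=0):
    """Hypothetical sketch: weight patient omics by a per-feature score from fixed node
    embeddings, then fit a softmax classification head with cross-entropy loss."""
    rng = np.random.default_rng(seed)
    # Collapse each node's embedding to one weight in (0, 1): mean over dimensions, then sigmoid
    feature_weight = 1.0 / (1.0 + np.exp(-node_emb.mean(axis=1)))  # (n_features,)
    H = X * feature_weight  # (n_samples, n_features)
    W = rng.normal(scale=0.01, size=(H.shape[1], n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot targets
    losses = []
    for _ in range(epochs):
        P = softmax(H @ W + b)
        losses.append(-np.mean(np.sum(Y * np.log(P + 1e-12), axis=1)))
        grad = (P - Y) / len(X)  # gradient of mean cross-entropy w.r.t. the logits
        W -= lr * H.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b, feature_weight, losses


# Synthetic usage: 200 patients, 20 omics features, 64-dimensional node embeddings
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 20))
y = (X[:, 0] - X[:, 1] > 0).astype(int)
node_emb = rng.normal(size=(20, 64))
W, b, fw, losses = train_toy_dpmon_head(X, node_emb, y, n_classes=2)
pred = softmax((X * fw) @ W + b).argmax(axis=1)
print(f"final loss {losses[-1]:.3f}, train accuracy {(pred == y).mean():.2f}")
```

Because the embeddings are frozen here, only the classification head learns; in an end-to-end pipeline the classification loss would also update the GNN that produces the embeddings.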
2401 | 2526 | { |
|
2531 | 2656 | "display(dpmon_predictions[0])" |
2532 | 2657 | ] |
2533 | 2658 | }, |
| 2659 | + { |
| 2660 | + "cell_type": "markdown", |
| 2661 | + "metadata": {}, |
| 2662 | + "source": [ |
| 2663 | + "## DPMON allows BioNeuralNet users to significantly improve phenotype predictions with a few lines of code." |
| 2664 | + ] |
| 2665 | + }, |
2534 | 2666 | { |
2535 | 2667 | "cell_type": "code", |
2536 | 2668 | "execution_count": null, |
|
2568 | 2700 | ], |
2569 | 2701 | "metadata": { |
2570 | 2702 | "kernelspec": { |
2571 | | - "display_name": ".test_env", |
| 2703 | + "display_name": ".venv", |
2572 | 2704 | "language": "python", |
2573 | 2705 | "name": "python3" |
2574 | 2706 | }, |
|