UCD-BDLab
diff --git a/TOPMED.ipynb b/TOPMED.ipynb
Lines changed: 141 additions & 9 deletions
@@ -4,9 +4,14 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# TOPMED Presentation\n",
+    "# Trans-Omics for Precision Medicine | TOPMed \n",
+    "## Live Presentation of BioNeuralNet.\n",
     "\n",
-    "#### Demonstrates a **step-by-step** guide to using BioNeuralNet for multi-omics analysis and research."
+    "A **step-by-step** guide to **BioNeuralNet**.\n",
+    "\n",
+    "- This demonstration was prepared specifically for the 2025 TOPMed Annual Meeting, featuring artificial intelligence and machine learning.\n",
+    "\n",
+    "- For more information on TOPMed and their mission, please visit [TOPMed](https://topmed.nhlbi.nih.gov/)."
    ]
   },
   {
@@ -26,6 +31,13 @@
     "!{sys.executable} -m pip install bioneuralnet"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### BioNeuralNet Components"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 2,
@@ -51,8 +63,9 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "- #### Your dataset import may look something like this.\n",
-    "- #### After loading your data, the remaining steps will be the same."
+    "## Loading your data:\n",
+    "- If your data is stored in a CSV file, it can be loaded as shown in the example below.\n",
+    "- After loading your data, the remaining steps will be the same."
    ]
   },
   {
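
A minimal sketch of the CSV loading step described above, assuming each file carries a sample identifier column. The column name `sample_id`, the feature names, and the in-memory CSV text are hypothetical stand-ins for real files; only `pandas.read_csv` itself comes from the notebook.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for files on disk.
omics_csv = io.StringIO(
    "sample_id,gene_a,gene_b\ns1,0.1,1.2\ns2,0.4,0.9\ns3,0.7,1.1\n"
)
clinical_csv = io.StringIO("sample_id,age\ns2,61\ns3,48\ns4,55\n")

# index_col makes the sample ID the row index, so tables can be aligned on it.
omics = pd.read_csv(omics_csv, index_col="sample_id")
clinical = pd.read_csv(clinical_csv, index_col="sample_id")

# Keep only the samples present in both tables, in the same order.
shared = omics.index.intersection(clinical.index)
omics, clinical = omics.loc[shared], clinical.loc[shared]

print(omics.shape, clinical.shape)
```

Aligning on a shared index before any downstream step is one way to ensure that every table describes the same samples in the same order.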
@@ -70,6 +83,17 @@
     "clinical_data = pd.read_csv(\"clinical.csv\")"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Easy component exploration via `DatasetLoader`\n",
+    "\n",
+    "- This component allows users to explore BioNeuralNet capabilities.\n",
+    "\n",
+    "- DatasetLoader `example1` is synthetic and purely for example purposes."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 4,
@@ -1091,6 +1115,18 @@
     "display(clinical)"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Generating a Multi-Omics Network using SmCCNet\n",
+    "- SmCCNet stands for: Sparse Multiple Canonical Correlation Network Analysis Tool.\n",
+    "\n",
+    "- It is one of the external tools offered through the `external_tools` component.\n",
+    "\n",
+    "- For more information, please see the [SmCCNet](https://cran.r-project.org/package=SmCCNet) documentation on CRAN."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
@@ -1108,6 +1144,14 @@
     "global_network, smccnet_clusters = smccnet.run()"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## SmCCNet Output\n",
+    "- SmCCNet returned a 600x600 network and 3 subnetworks (clusters)."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 6,
@@ -1137,6 +1181,16 @@
     "display(len(smccnet_clusters))"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Generating Low-Dimensional Embeddings with Graph Neural Networks\n",
+    "- The GNN captures meaningful biological interactions and returns node embeddings that reflect both local connectivity and supervised signals.\n",
+    "\n",
+    "- Each node (omics feature) is associated with a numeric label (e.g., Pearson correlation with phenotype) that guides learning."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
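
The numeric node labels mentioned above (for example, the Pearson correlation of each omics feature with the phenotype) can be sketched with plain pandas. This is an illustration under stated assumptions, not BioNeuralNet's internal code: the data is random, and the names `omics`, `phenotype`, and `node_labels` are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical data: 50 samples x 4 omics features, plus a numeric phenotype
# that is constructed to depend strongly on feature f1.
omics = pd.DataFrame(rng.normal(size=(50, 4)), columns=["f1", "f2", "f3", "f4"])
phenotype = pd.Series(
    2.0 * omics["f1"] + rng.normal(scale=0.1, size=50), name="phenotype"
)

# One label per feature (node): its Pearson correlation with the phenotype.
node_labels = omics.corrwith(phenotype)

print(node_labels.round(2))
```

Each entry of `node_labels` is a value in [-1, 1] attached to one feature, which is the kind of per-node supervised signal the text describes.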
@@ -1159,6 +1213,14 @@
     "embeddings_output = embeddings.embed(as_df=True)"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Output\n",
+    "For each omics feature (node) in the network, `GNNEmbedding` generated a 64-dimensional representation."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 8,
@@ -1191,6 +1253,14 @@
     "embeddings_array = embeddings_output.values  "
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Embeddings visualization\n",
+    "By visualizing the embedding space in two dimensions, we can see clusters of nodes (omics features), which highlight close relationships between them."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 10,
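
To make the idea of a two-dimensional view concrete, here is a small sketch that projects a 64-dimensional embedding matrix onto two axes. Note that it uses PCA (via NumPy's SVD) as a simpler stand-in, not the t-SNE method the notebook passes to `plot_embeddings`, and the embedding values are random placeholders rather than real output.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical embeddings: 600 nodes x 64 dimensions (sizes taken from the text).
embeddings_array = rng.normal(size=(600, 64))

# PCA via SVD: center the data, then project onto the two directions
# of largest variance (the top two right-singular vectors).
centered = embeddings_array - embeddings_array.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
coords_2d = centered @ vt[:2].T

print(coords_2d.shape)
```

PCA is linear and deterministic, which makes it a convenient sanity check; t-SNE is nonlinear and typically separates local clusters more visibly.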
@@ -1211,6 +1281,18 @@
     "fig1 = plot_embeddings(embeddings_array, global_node_labels.to_frame(name=\"phenotype\"), method=\"tsne\")"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Using the Embeddings\n",
+    "- Let's use these embeddings to enrich the representation of the original dataset.\n",
+    "\n",
+    "- The `GraphEmbedding` function takes the previously generated embeddings, the original omics dataset, and the associated phenotype.\n",
+    "\n",
+    "- This function uses the embeddings to enrich the original dataset. For details on how this is performed, please see our `GNN Embeddings for Multi-Omics` tab."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
@@ -1611,6 +1693,18 @@
     "display(enhanced_omics_df)"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Comparing results\n",
+    "Let's compare the enriched dataset with the raw dataset by using each with a popular machine learning technique, `Random Forest`. In this example, we simply use the high-dimensional omics data to predict the phenotype.\n",
+    "\n",
+    "- Our entire codebase is publicly available on the [Big Data Management and Mining Laboratory](https://github.com/UCD-BDLab) GitHub page.\n",
+    "\n",
+    "- For specific details on this code, please visit: [BioNeuralNet](https://github.com/UCD-BDLab/BioNeuralNet)"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
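
The comparison described above can be sketched with scikit-learn as follows. This is a hedged illustration, not the notebook's actual cell: the data is synthetic, the helper `holdout_accuracy` is hypothetical, and the labels are built to depend on the extra columns, so the resulting numbers say nothing about BioNeuralNet's real performance.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical data: 200 samples; "raw" has 20 features, "enriched" adds 5 more.
raw = rng.normal(size=(200, 20))
extra = rng.normal(size=(200, 5))
enriched = np.hstack([raw, extra])

# Synthetic binary phenotype that depends on one raw and one extra column.
labels = (raw[:, 0] + extra[:, 0] > 0).astype(int)


def holdout_accuracy(features, target):
    """Train a Random Forest on 70% of the samples and score it on the rest."""
    x_train, x_test, y_train, y_test = train_test_split(
        features, target, test_size=0.3, random_state=0, stratify=target
    )
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(x_train, y_train)
    return model.score(x_test, y_test)


accuracy_alone = holdout_accuracy(raw, labels)
accuracy_with_embeddings = holdout_accuracy(enriched, labels)

print(f"raw: {accuracy_alone:.2f}, enriched: {accuracy_with_embeddings:.2f}")
```

Using the same split and the same random seed for both runs keeps the comparison limited to the feature set, which is the point of the raw versus enriched contrast.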
@@ -1648,6 +1742,20 @@
     "plot_performance(accuracy_with_embeddings, accuracy_alone, \"Raw Omics vs Enriched Omics\")"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Network Visualization\n",
+    "- As part of this demonstration, I have developed a number of graphics and plotting tools.\n",
+    "\n",
+    "- Please note that these tools are not part of the BioNeuralNet core, and it is not my intent to present them as such.\n",
+    "\n",
+    "- These examples highlight how external libraries and your own code can be easily integrated into our workflow, allowing users to further explore the omics data.\n",
+    "\n",
+    "- All of this code and the visualization aids will be available after the presentation."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 15,
@@ -1833,6 +1941,18 @@
     "display(cluster2_mapping.head())\n"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Correlated Clustering\n",
+    "- BioNeuralNet includes internal modules for performing correlated clustering on complex networks. \n",
+    "\n",
+    "- These methods extend traditional community detection by integrating phenotype correlation, allowing users to extract biologically relevant, phenotype-associated modules from any network.\n",
+    "\n",
+    "- For more details on how this is performed, please visit our `Correlated Clustering Methods` tab."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
@@ -2392,10 +2512,15 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### DPMON (Disease Prediction using Multi-Omics Networks) reuses the same GNN architectures but with a different objective: \n",
-    "- Instead of node-level MSE regression, DPMON aggregates node embeddings with patient-level omics data. \n",
-    "- A downstream classification head (e.g., softmax layer with CrossEntropyLoss) is applied for sample-level disease prediction. \n",
-    "- This end-to-end approach leverages both local (node-level) and global (patient-level) network information."
+    "## DPMON (Disease Prediction using Multi-Omics Networks) reuses the same GNN architectures but with a different objective: \n",
+    "\n",
+    "- DPMON aggregates node embeddings with patient-level omics data. \n",
+    "\n",
+    "- A downstream classification head is applied for sample-level disease prediction.\n",
+    "\n",
+    "- This end-to-end approach leverages both local (node-level) and global (patient-level) network information.\n",
+    "\n",
+    "- The single cell below captures the entire workflow demonstrated earlier (generating GNN embeddings and integrating them back into the omics dataset) in an end-to-end iterative pipeline."
    ]
   },
   {
@@ -2531,6 +2656,13 @@
     "display(dpmon_predictions[0])"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## DPMON allows BioNeuralNet users to significantly improve phenotype predictions with a few lines of code."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
@@ -2568,7 +2700,7 @@
  ],
  "metadata": {
   "kernelspec": {
-   "display_name": ".test_env",
+   "display_name": ".venv",
    "language": "python",
    "name": "python3"
   },