@@ -259,25 +259,11 @@
"for i in gdf_in_bbox.index:\n",
" observation_utils.getCCSSData(gdf_in_bbox.name[i], gdf_in_bbox.code[i], 'Ca', StartDate, EndDate, OBS_OutputFolder)"
]
},
{
"cell_type": "markdown",
"id": "7e4506d2",
"metadata": {},
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"id": "208439e6",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
-"display_name": "cssi_evaluation",
+"display_name": "nwm_env",
"language": "python",
"name": "python3"
},
147 changes: 79 additions & 68 deletions examples/nwm/nwm_swe_point_scale_evaluation.ipynb
@@ -244,7 +244,7 @@
"outputs": [],
"source": [
"# Create a folder to save results. Reuse it if it already exists\n",
-"Path(OBS_OutputFolder).mkdir(exist_ok=False)"
+"Path(OBS_OutputFolder).mkdir(exist_ok=True)"
]
},
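The switch from `exist_ok=False` to `exist_ok=True` makes the cell safe to re-run. A minimal sketch of the behaviour difference, using a temporary directory rather than the notebook's actual output path:

```python
from pathlib import Path
import tempfile

base = Path(tempfile.mkdtemp())
out = base / "obs_outputs"   # stand-in for OBS_OutputFolder

out.mkdir(exist_ok=True)   # creates the folder on the first run
out.mkdir(exist_ok=True)   # re-running the cell is now a no-op

try:
    out.mkdir(exist_ok=False)  # old behaviour: error if the folder exists
except FileExistsError:
    print("exist_ok=False raises on an existing folder")
```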
{
@@ -335,7 +335,7 @@
"outputs": [],
"source": [
"# Create a folder to save results. Reuse it if it already exists.\n",
-"Path(MOD_OutputFolder).mkdir(exist_ok=False)\n",
+"Path(MOD_OutputFolder).mkdir(exist_ok=True)\n",
"\n",
"# Download NWM SWE data for the sites within the watershed bounding box, save to /mod_outputs folder\n",
"input_crs = 'EPSG:4326' # WGS84 lat/lon. This is the CRS of the input geodataframe (gdf_in_bbox) \n",
@@ -842,7 +842,7 @@
"source": [
"<div style=\"color:black; background-color:#f5f5f5; padding:10px; border-left: 5px solid #007acc;\">\n",
"<h4>🧠 Reflect</h4>\n",
-"<p>Looking at the two plots, what could be some reasons for the model having simulated peak SWE so much earlier and less than observed in Paradise Meadow (PDS)? Perhaps try changing the <code>my_site_code</code> from earlier in the notebook to rerun <code>nwm_utils.comparison_plots()</code> to see the timeseries. \n",
+"<p>Looking at the two plots, what could be some reasons for the model having simulated peak SWE so much earlier and less than observed in Paradise Meadow (PDS)? Perhaps try changing the <code>my_site_code</code> from earlier in the notebook to rerun <code>plot_utils.comparison_plots()</code> to see the timeseries. \n",
"\n",
"<br>What happens if you change the year that is plotted? <br>✏️ Try modifying the bar plot code from <code>bar1 = year1.hvplot.bar</code> to <code>bar1 = year2.hvplot.bar</code> </p>\n",
"</div>"
@@ -1086,6 +1086,56 @@
"metrics_df"
]
},
{
"cell_type": "markdown",
"id": "2cfc514c",
"metadata": {},
"source": [
"Next, plot summary statistics for each station. Here we look at Bias and NSE across the watershed:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6d6b779b",
"metadata": {},
"outputs": [],
"source": [
"# Bias scatter\n",
"scatter = metrics_df.hvplot.scatter(\n",
" x='site_id',\n",
" y='bias',\n",
" size=100,\n",
" rot=45,\n",
" ylabel='Bias (m)',\n",
" title='Mean SWE Bias by Station'\n",
")\n",
"\n",
"hline = hv.HLine(0).opts(color='black', line_dash='dashed', line_width=1)\n",
"\n",
"scatter * hline"
]
},
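The bias plotted above is the mean difference between modeled and observed SWE. A self-contained sketch of the computation on toy values (the notebook's `metrics_df` is assumed to hold one such number per station; the series below are hypothetical):

```python
import pandas as pd

# Hypothetical paired SWE series for one station, in metres (not CCSS/NWM data)
obs = pd.Series([0.1, 0.4, 0.8, 0.5, 0.1])
mod = pd.Series([0.1, 0.3, 0.6, 0.4, 0.1])

# Mean error; negative values mean the model underestimates SWE
bias = (mod - obs).mean()
print(round(bias, 3))  # → -0.08
```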
{
"cell_type": "code",
"execution_count": null,
"id": "99110e67",
"metadata": {},
"outputs": [],
"source": [
"# NSE bar chart\n",
"metrics_df.hvplot.bar(\n",
" x='site_id',\n",
" y='nse',\n",
" rot=45,\n",
" ylabel='Nash–Sutcliffe Efficiency',\n",
" title='NSE by Station',\n",
" height=400,\n",
" width=600,\n",
" bar_width=0.5\n",
")\n"
]
},
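For reference, NSE compares the squared model error to the variance of the observations: 1 is a perfect fit, and 0 means the model is no better than predicting the observed mean. A minimal sketch on hypothetical values:

```python
import numpy as np

def nse(obs, mod):
    """Nash–Sutcliffe Efficiency: 1 = perfect, 0 = no better than the observed mean."""
    obs = np.asarray(obs, dtype=float)
    mod = np.asarray(mod, dtype=float)
    return 1.0 - np.sum((obs - mod) ** 2) / np.sum((obs - obs.mean()) ** 2)

obs = [0.1, 0.4, 0.8, 0.5, 0.1]
mod = [0.1, 0.3, 0.6, 0.4, 0.1]
print(round(nse(obs, mod), 3))  # → 0.828
```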
{
"cell_type": "markdown",
"id": "5c521c83",
@@ -1099,6 +1149,30 @@
"</div> "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "dd685949",
"metadata": {},
"outputs": [],
"source": [
"plot_utils.comparison_plots(obs_df, model_df, f'{my_site_code}', f'{my_site_code}', site_label=None)\n",
"\n",
"# Change the site code to see other Snotel Stations --> e.g., '688:CO:SNTL'\n",
"#plot_utils.comparison_plots(obs_df, model_df, '688:CO:SNTL', '688:CO:SNTL', site_label=None)"
]
},
{
"cell_type": "markdown",
"id": "965e400b",
"metadata": {},
"source": [
"<div style=\"color:black; background-color:#f5f5f5; padding:10px; border-left: 5px solid #007acc;\">\n",
"<h4>🧠 Reflect</h4>\n",
"<p>You now have several performance metrics: Bias, Pearson Correlation, Spearman Correlation, NSE, and KGE. If you had to pick just one metric to summarize model performance, which would you choose, and why? As you review the results, compare the peak SWE amounts and the timing of snowmelt onset. Do you see any significant differences? Which dataset indicates an earlier melt?</p>\n",
"</div>"
]
},
{
"cell_type": "markdown",
"id": "ce04cad1",
@@ -1141,53 +1215,6 @@
"For some sites, Pearson and Spearman correlations are both close to 1, suggesting a strong relationship between observed and modeled SWE. As the timeseries plot shows, strong correlation alone does not make a \"good\" model: it does not guarantee accurate timing of key events, such as peak SWE or melt onset. Let's compare these as well. The following code uses the `report_max_dates_and_values` function to identify the peak SWE value and the date it occurs for both the observed (CCSS) and modeled (NWM) datasets. "
]
},
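The notebook's `report_max_dates_and_values` helper is not shown here, but its core idea can be sketched with pandas `idxmax`/`max` on a toy daily series (the dates and values below are hypothetical):

```python
import pandas as pd

# Toy daily SWE series in metres (not CCSS or NWM data)
idx = pd.date_range("2021-01-01", periods=5, freq="D")
swe = pd.Series([0.2, 0.5, 0.9, 0.7, 0.3], index=idx)

peak_date = swe.idxmax()   # date on which peak SWE occurs
peak_value = swe.max()     # peak SWE value
print(peak_date.date(), peak_value)  # → 2021-01-03 0.9
```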
{
"cell_type": "markdown",
"id": "965e400b",
"metadata": {},
"source": [
"<div style=\"color:black; background-color:#f5f5f5; padding:10px; border-left: 5px solid #007acc;\">\n",
"<h4>🧠 Reflect</h4>\n",
"<p>You now have several performance metrics: Bias, Pearson Correlation, Spearman Correlation, NSE, and KGE. If you had to pick just one metric to summarize model performance, which would you choose—and why? As you review the results, compare the peak flow amounts and the timing of snowmelt onset. Do you see any significant differences? Which dataset indicates an earlier melt?</p>\n",
"</div>"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "99110e67",
"metadata": {},
"outputs": [],
"source": [
"metrics_df.hvplot.bar(\n",
" x='site_id',\n",
" y='nse',\n",
" rot=45,\n",
" ylabel='Nash–Sutcliffe Efficiency',\n",
" title='NSE by Station',\n",
" height=400,\n",
" width=600,\n",
" bar_width=0.5\n",
")\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6d6b779b",
"metadata": {},
"outputs": [],
"source": [
"metrics_df.hvplot.scatter(\n",
" x='site_id',\n",
" y='bias',\n",
" size=100,\n",
" rot=45,\n",
" ylabel='Bias (m)',\n",
" title='Mean SWE Bias by Station'\n",
")\n"
]
},
{
"cell_type": "markdown",
"id": "dc3c56e9",
@@ -1201,19 +1228,11 @@
"id": "310c309c",
"metadata": {},
"source": [
"One way to learn more about the model performance is to combine metrics that tell us about different aspects of model behavior—such as timing, variability, and magnitude—rather than relying on a single summary measure.\n",
"\n",
"The Condon diagram separates model performance into quadrants based on two metrics: **Spearman’s rho** (shape/time agreement) and **relative bias** (magnitude error). The horizontal line at 0.5 distinguishes whether the model captures the temporal pattern well (above 0.5 = good shape), while the vertical line is traditionally placed at a relative bias of 1.0, which represents a 100% error. This means the model’s total error is as large as the observed signal itself. This threshold has a clear physical interpretation and is used in the original Condon framework to distinguish acceptable vs. large bias. "
]
},
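Using the two thresholds described above (Spearman's rho of 0.5 for shape/timing, relative bias of 1.0 for magnitude), the quadrant assignment can be sketched as follows. This illustrates only the classification logic, not the notebook's `plot_utils.plot_condon_diagram` implementation:

```python
def condon_quadrant(spearman_rho, relative_bias):
    """Assign a station to a Condon-style quadrant.

    Thresholds follow the text above: rho = 0.5 (temporal pattern),
    |relative bias| = 1.0 (error as large as the observed signal).
    """
    good_shape = spearman_rho >= 0.5
    low_bias = abs(relative_bias) < 1.0
    if good_shape and low_bias:
        return "good shape, low bias"
    if good_shape:
        return "good shape, large bias"
    if low_bias:
        return "poor shape, low bias"
    return "poor shape, large bias"

print(condon_quadrant(0.9, 0.2))  # → good shape, low bias
```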
{
"cell_type": "code",
"execution_count": null,
"id": "10a68672",
"metadata": {},
"outputs": [],
"source": [
"%reload_ext autoreload"
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -1223,14 +1242,6 @@
"source": [
"plot_utils.plot_condon_diagram(metrics_df, variable=\"SWE\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a2ae0058-3437-4ee0-8f0a-5bc1a289da37",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {