From 35a961c23754e4017338867f231b085533dff476 Mon Sep 17 00:00:00 2001 From: danielletijerina Date: Mon, 23 Mar 2026 14:19:46 -0600 Subject: [PATCH 1/2] renamed model notebooks; removed dataCollectionHydrodata_parflow.ipynb and nwm_utils.py --- ...nb => ccss_swe_collect_observations.ipynb} | 0 .../dataCollectionHydrodata_parflow.ipynb | 764 ------------------ ...m_streamflow_point_scale_evaluation.ipynb} | 0 ...b => nwm_swe_point_scale_evaluation.ipynb} | 0 ... nwm_swe_watershed_scale_evaluation.ipynb} | 0 examples/nwm/utils/nwm_utils.py | 690 ---------------- ... parflow_swe_point_scale_evaluation.ipynb} | 713 +++++++++++++++- ...flow_swe_watershed_scale_evaluation.ipynb} | 0 8 files changed, 690 insertions(+), 1477 deletions(-) rename examples/collect_observations/{dataCollection_nwm.ipynb => ccss_swe_collect_observations.ipynb} (100%) delete mode 100644 examples/collect_observations/dataCollectionHydrodata_parflow.ipynb rename examples/nwm/{NWM-Streamflow-Evaluation.ipynb => nwm_streamflow_point_scale_evaluation.ipynb} (100%) rename examples/nwm/{02_point_scale_comparison.ipynb => nwm_swe_point_scale_evaluation.ipynb} (100%) rename examples/nwm/{03_watershed_scale_comparison.ipynb => nwm_swe_watershed_scale_evaluation.ipynb} (100%) delete mode 100644 examples/nwm/utils/nwm_utils.py rename examples/parflow/{02_point_scale_comparison_ParFlow.ipynb => parflow_swe_point_scale_evaluation.ipynb} (56%) rename examples/parflow/{03_watershed_scale_comparison_ParFlow.ipynb => parflow_swe_watershed_scale_evaluation.ipynb} (100%) diff --git a/examples/collect_observations/dataCollection_nwm.ipynb b/examples/collect_observations/ccss_swe_collect_observations.ipynb similarity index 100% rename from examples/collect_observations/dataCollection_nwm.ipynb rename to examples/collect_observations/ccss_swe_collect_observations.ipynb diff --git a/examples/collect_observations/dataCollectionHydrodata_parflow.ipynb b/examples/collect_observations/dataCollectionHydrodata_parflow.ipynb deleted file mode 100644 index 5acd88d..0000000 --- a/examples/collect_observations/dataCollectionHydrodata_parflow.ipynb +++ /dev/null @@ -1,764 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "id": "45fc714b", - "metadata": {}, - "source": [ - "![NWM](../img/NWM.png)\n", - "\n", - "# Use HydroData to Retrieve Modeled and Observed Snow Data for a Watershed of Interest \n", - "Authors: Irene Garousi-Nejad (igarousi@cuahsi.org), Danielle Tijerina-Kreuzer (dtijerina@cuahsi.org) \n", - "Last updated: January 2026 \n", - "\n", - "This notebook uses the Princeton HydroData repository to gather [SWE point observations](https://hf-hydrodata.readthedocs.io/en/latest/available_datasets.html#point-observations) and [ParFlow CONUS1 outputs](https://hf-hydrodata.readthedocs.io/en/latest/available_datasets.html#parflow-simulation-outputs). \n", - "See the `hf_hydrodata` [Getting Started](https://hf-hydrodata.readthedocs.io/en/latest/getting_started.html) page for more information on the package and account. \n" - ] - }, - { - "cell_type": "markdown", - "id": "19823635", - "metadata": {}, - "source": [ - "## 1. Setup" - ] - }, - { - "cell_type": "markdown", - "id": "fd7a7730", - "metadata": {}, - "source": [ - "### 1a. Python Environment \n", - "\n", - "Ensure that the `nwm_env` conda environment is selected as your Jupyter kernel. This environment should already be created if you followed the instructions under section \"Creating your HydroLearnEnv Virtual Environment\" in the `getting_started.md` file." 
- ] - }, - { - "cell_type": "markdown", - "id": "9bf991df", - "metadata": {}, - "source": [ - "Import the libraries needed to run this notebook:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "e77b41a9", - "metadata": { - "tags": [] - }, - "outputs": [], - "source": [ - "import os\n", - "import sys\n", - "\n", - "prefix = os.environ['CONDA_PREFIX']\n", - "os.environ['PROJ_LIB'] = os.path.join(prefix, 'share', 'proj')\n", - "\n", - "# add the src directory to the path so we can import evaluation modules\n", - "sys.path.append('../../src/')\n", - "\n", - "import sys\n", - "import pyproj\n", - "import pandas as pd\n", - "import numpy as np\n", - "import xarray as xr\n", - "import geopandas as gpd\n", - "from dask.distributed import Client\n", - "import matplotlib.pyplot as plt\n", - "import matplotlib.dates as mdates\n", - "import hf_hydrodata as hf\n", - "import subsettools\n", - "import hvplot.xarray\n", - "\n", - "\n", - "from cssi_evaluation.utils import plot_utils\n", - "\n", - "\n", - "%load_ext autoreload\n", - "%autoreload 2\n" - ] - }, - { - "cell_type": "markdown", - "id": "d6cb09f4", - "metadata": {}, - "source": [ - "### 1b. Register Pin and Access HydroData\n", - "\n", - "To access the HydroData catalog you will need to sign up for a [HydroFrame account](https://hydrogen.princeton.edu/signup) (do this only once), [create a 4-digit PIN](https://hydrogen.princeton.edu/pin), and register your pin in order to have access to the HydroData datasets (you will do this in the next code cell below). To note, you PIN will expire after 7 days and will need to recreate it after that time. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "3dea091d", - "metadata": {}, - "outputs": [], - "source": [ - "# You need to register on https://hydrogen.princeton.edu/pin \n", - "# and run the following with your registered information\n", - "# before you can use the hydrodata utilities\n", - "hf.register_api_pin(\"dtt2@princeton.edu\", \"7837\")" - ] - }, - { - "cell_type": "markdown", - "id": "4eb7d35c", - "metadata": {}, - "source": [ - "### 1c. Dask \n", - "\n", - "We'll use dask to parallelize our code. To manage parallel computation and visualize progress of long-running tasks, we initialize a Dask “cluster,” which defines how many workers are used and how much computing power each worker has. \n", - "\n", - "In this setup, we create a Dask client with `Client(n_workers=6, threads_per_worker=1, memory_limit='2GB')`, which launches a cluster with 6 workers. Each worker uses a single thread, typically mapped to one CPU core, allowing for efficient parallel processing across 6 cores. Each worker also has a memory limit of 2 GB, for a total of up to 12 GB across the cluster.\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "72535c32", - "metadata": { - "tags": [] - }, - "outputs": [], - "source": [ - "# use a try accept loop so we only instantiate the client\n", - "# if it doesn't already exist.\n", - "try:\n", - " print('Dashboard link:', client.dashboard_link)\n", - "except: \n", - " # The client should be customized to your workstation resources.\n", - " client = Client(n_workers=6, threads_per_worker=1, memory_limit='2GB') \n", - " print('Dashboard link:', client.dashboard_link)\n", - "print(client)" - ] - }, - { - "cell_type": "markdown", - "id": "f1aed6de", - "metadata": {}, - "source": [ - "## 2. 
Set Paths" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "add1c5c0", - "metadata": { - "tags": [] - }, - "outputs": [], - "source": [ - "# Start and end times of a water year (to note, these dates were chosen to align with the PFCONUS1 early 2000s runs)\n", - "StartDate = '2003-10-01'\n", - "EndDate = '2005-09-30'\n", - "\n", - "domain_data_path = 'examples/parflow/domain_data/' # path to the model domain data\n", - "\n", - "# Path to save results (obs and mod stands for observation and modeled, respectively)\n", - "OBS_OutputFolder = './obs_outputs_PF' \n", - "MOD_OutputFolder = './mod_outputs_PF'" - ] - }, - { - "cell_type": "markdown", - "id": "878cad17", - "metadata": { - "tags": [] - }, - "source": [ - "## 3. Retrieve Observed Snow Data " - ] - }, - { - "cell_type": "markdown", - "id": "65b0d46d", - "metadata": {}, - "source": [ - "### 3a. Define the watershed of interest\n", - "\n", - "One of the simplest ways to gather data and model output from HydroData is by specifying a [Hydrologic Unit Code](https://www.usgs.gov/national-hydrography/watershed-boundary-dataset). Before we retrieve any hydrologic information, we need to indicate a HUC8 code and use it to gather snow water equivalent (SWE) observations from SNOTEL sites \n", - "\n", - "✏️ If you have a specific HUC8 in mind, you can change the variable `huc_8_code` below." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "a5504c19", - "metadata": {}, - "outputs": [], - "source": [ - "# ✏️ Specify HUC8 ID and Name for watershed of interest\n", - "huc_8_code = '14020001' # East-Taylor HUC-8\n", - "print(f'HUC-8 ID: {huc_8_code}')\n", - "\n", - "huc_8_name = 'East-Taylor'\n", - "print(f'HUC-8 name: {huc_8_name}')" - ] - }, - { - "cell_type": "markdown", - "id": "39f2529f", - "metadata": {}, - "source": [ - "Use the Subsettools function `define_huc_domain()` to get the actual CONUS1 indices associated with the East-Taylor HUC-O8. It returns a tuple `(imin, jmin, imax, jmax)` of grid indices that define a bounding box containing our region (or point) of interest (Note: (imin, jmin, imax, jmax) are the west, south, east and north boundaries of the box respectively) and a mask for that domain." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "ae9ba429-f165-4814-844b-12e7561c6199", - "metadata": {}, - "outputs": [], - "source": [ - "ij_bounds, mask = subsettools.define_huc_domain([huc_8_code], 'conus1')\n", - "\n", - "np.save(f'{domain_data_path}domainMask_{huc_8_name}_conus1.npy', mask)\n", - "\n", - "plt.imshow(mask, origin='lower')\n", - "print(ij_bounds)\n", - "print(mask.shape)" - ] - }, - { - "cell_type": "markdown", - "id": "0dd197f8", - "metadata": {}, - "source": [ - "Using the domain mask and the i,j PF-CONUS1 indices, we use a hf_hydrodata function to find and save the associated grid cell center lat/lon pair for each grid cell in the domain. 
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "15b47d36-40f1-4562-bb12-1e0b78c7d09a", - "metadata": {}, - "outputs": [], - "source": [ - "# Extract bounds\n", - "i_min, j_min, i_max, j_max = ij_bounds\n", - "mask_shape = mask.shape #shape of the subset rectangular domain\n", - "\n", - "# Create i/j index ranges\n", - "i_vals = np.arange(i_min, i_max)\n", - "j_vals = np.arange(j_min, j_max)\n", - "\n", - "# Create full 2D grid (note indexing order carefully)\n", - "jj, ii = np.meshgrid(j_vals, i_vals, indexing=\"ij\")" - ] - }, - { - "cell_type": "markdown", - "id": "46e103aa", - "metadata": {}, - "source": [ - "Because the function `hf.to_latlon()` finds the coordinates at the lower left corner of a grid cell, we add 0.5 to each i,j index pair to find the **lat/lon at the grid cell center**." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "947885f0-ec28-44cc-8ad8-e179b9795aa4", - "metadata": {}, - "outputs": [], - "source": [ - "# Compute grid cell centers\n", - "ii_center = ii + 0.5\n", - "jj_center = jj + 0.5\n", - "\n", - "# Convert to lat/lon (vectorized loop)\n", - "lat = np.zeros(mask_shape)\n", - "lon = np.zeros(mask_shape)\n", - "\n", - "for r in range(mask_shape[0]):\n", - " for c in range(mask_shape[1]):\n", - " lat[r, c], lon[r, c] = hf.to_latlon(\"conus1\",\n", - " ii_center[r, c],\n", - " jj_center[r, c])" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "b06edd7c", - "metadata": {}, - "outputs": [], - "source": [ - "# Save 2D arrays of Lat & Lon\n", - "np.save(f\"{domain_data_path}{huc_8_name}_{huc_8_code}_lat_2d.npy\", lat)\n", - "np.save(f\"{domain_data_path}{huc_8_name}_{huc_8_code}_lon_2d.npy\", lon)\n", - "\n", - "# Save the Lat & Lon in a df (as CSV) matched with the ParFlow-CONUS i,j indices \n", - "grid_df = pd.DataFrame({\n", - " \"i\": ii.ravel(),\n", - " \"j\": jj.ravel(),\n", - " \"lat\": lat.ravel(),\n", - " \"lon\": lon.ravel(),\n", - "})\n", - "grid_df.to_csv(f\"{domain_data_path}df_{huc_8_name}_{huc_8_code}_conus1_gridPoints_LatLon.csv\", index=False)\n", - "\n", - "# Save a shapefile of the watershed Lat & Lon\n", - "grid_gdf = gpd.GeoDataFrame(\n", - " grid_df,\n", - " geometry=gpd.points_from_xy(grid_df.lon, grid_df.lat),\n", - " crs=\"EPSG:4326\"\n", - ")\n", - "# Save the grid points / GeoDataFrame to a shapefile for later use\n", - "grid_gdf.to_file(f\"{domain_data_path}{huc_8_name}_{huc_8_code}_conus1_gridPoints_LatLon.shp\")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "542ce08c", - "metadata": {}, - "outputs": [], - "source": [ - "grid_df" - ] - }, - { - "cell_type": "markdown", - "id": "144f8eda", - "metadata": {}, - "source": [ - "### 3b. Explore the available SWE data in a watershed " - ] - }, - { - "cell_type": "markdown", - "id": "c8e955cd", - "metadata": {}, - "source": [ - "
\n", - "

📖 Did you know?

\n", - "

The Snow Telemetry (SNOTEL) network, managed by the USDA Natural Resources Conservation Service (NRCS), monitors snowpack conditions across key watersheds in the western United States to support water supply forecasting and climate monitoring. SNOTEL sites are fully automated stations that continuously measure snow water equivalent (SWE), snow depth, precipitation, temperature, and other meteorological variables throughout the year. Unlike manual snow survey programs, SNOTEL provides high-temporal-resolution observations that enable near–real-time assessment of snowpack evolution and interannual variability. These data are widely used for operational forecasting, drought assessment, and long-term climate analysis.

\n", - "
" - ] - }, - { - "cell_type": "markdown", - "id": "b48ff3e1", - "metadata": {}, - "source": [ - "Explore what SWE data is available at sites within the HUC ID you specified that operated during WY2004 and WY2005. If you want to check other variables besides SWE, you can change the `variable` argument name. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "8f3323af", - "metadata": {}, - "outputs": [], - "source": [ - "avail_df = hf.get_site_variables(variable = \"swe\",\n", - " huc_id = [huc_8_code], grid = 'conus1',\n", - " date_start = StartDate, date_end = EndDate)\n", - "\n", - "# View first five records\n", - "avail_df.head(5)" - ] - }, - { - "cell_type": "markdown", - "id": "808ca604", - "metadata": {}, - "source": [ - "### 3c. Map the SNOTEL stations inside the HUC-08 watershed that have available data in the selected time range \n", - "To note here, we are using pre-loaded shape files for the East-Taylor HUC8, which are located in the `/domain_data/` directory." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "0538d39b", - "metadata": {}, - "outputs": [], - "source": [ - "### Select station locations that fall within the HUC8 watershed\n", - "\n", - "# Path to the watershed shapefile that was just created\n", - "watershed = f'{domain_data_path}East-Taylor_14020001.shp'\n", - "watershed_gdf = gpd.read_file(watershed).to_crs(epsg=4326)\n", - "\n", - "# Create GeoDataFrame of all available stations\n", - "filtered_all_stations_gdf = gpd.GeoDataFrame(\n", - " avail_df,\n", - " geometry=gpd.points_from_xy(\n", - " avail_df.longitude,\n", - " avail_df.latitude\n", - " ),\n", - " crs=\"EPSG:4326\"\n", - ")\n", - "print(\"Sites CRS:\", filtered_all_stations_gdf.crs)\n", - "\n", - "# Combine watershed polygons into one geometry\n", - "watershed_union = watershed_gdf.geometry.unary_union\n", - "\n", - "# Filter stations that fall within the watershed\n", - "sites_in_watershed = filtered_all_stations_gdf[\n", - " filtered_all_stations_gdf.geometry.within(watershed_union)\n", - "].copy()\n", - "\n", - "sites_in_watershed.reset_index(drop=True, inplace=True)\n", - "\n", - "print(f\"Total sites in watershed: {len(sites_in_watershed)}\")\n", - "\n", - "\n" - ] - }, - { - "cell_type": "markdown", - "id": "636ae5af", - "metadata": {}, - "source": [ - "Plot these sites on a map. Then, hover over the pins to see the site names." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "38fccbd4", - "metadata": {}, - "outputs": [], - "source": [ - "## TODO: REPLACE WITH CSSI_EVALUATION.PLOTS FUNCTIONS\n", - "\n", - "# this may take a moment to load, but it should pop up in a new window\n", - "m = plot_utils.plot_sites_within_domain(sites_in_watershed, watershed_gdf, zoom_start=9)\n", - "m" - ] - }, - { - "cell_type": "markdown", - "id": "5b76090d", - "metadata": {}, - "source": [ - "## 4. Retrieve SNOTEL point observations and metadata from HydroData \n", - "Use the `hf.get_point_data()` function to retrieve daily, start-of-day SWE from SNOTEL sites:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "a5ffe02a", - "metadata": { - "tags": [] - }, - "outputs": [], - "source": [ - "# Create a folder to save observations\n", - "isExist = os.path.exists(OBS_OutputFolder)\n", - "if isExist == True:\n", - " exit\n", - "else:\n", - " os.mkdir(OBS_OutputFolder)" - ] - }, - { - "cell_type": "markdown", - "id": "a0833bc5", - "metadata": {}, - "source": [ - "### 4a. 
Get HydroData Observed SWE\n", - "Gather the SNOTEL data for all stations within the watershed and save CSV:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "a7ac417c", - "metadata": {}, - "outputs": [], - "source": [ - "# Request point observations data\n", - "data_df = hf.get_point_data(dataset=\"snotel\", variable=\"swe\", temporal_resolution=\"daily\", aggregation=\"sod\",\n", - " date_start=StartDate, date_end=EndDate,\n", - " huc_id=[huc_8_code], grid='conus1')\n", - " #polygon=watershed_bbox, polygon_crs=watershed_crs)\n", - "\n", - "# save\n", - "data_df.to_csv(f'./{OBS_OutputFolder}/df_{huc_8_name}_{huc_8_code}_SNOTEL.csv', index=False)\n", - "\n", - "# Ensure date column is datetime\n", - "data_df[\"date\"] = pd.to_datetime(data_df[\"date\"])\n", - "\n", - "# View first five records\n", - "data_df.head(5)" - ] - }, - { - "cell_type": "markdown", - "id": "196353da", - "metadata": {}, - "source": [ - "### 4b. Get Metadata for HydroData Observed SWE\n", - "Also, retrieve the metadata for the same stations we retrieved SWE observations for:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "bc8480ee", - "metadata": {}, - "outputs": [], - "source": [ - "# Request site-level attributes for these sites\n", - "metadata_df = hf.get_point_metadata(dataset=\"snotel\", variable=\"swe\", temporal_resolution=\"daily\", aggregation=\"sod\",\n", - " date_start=StartDate, date_end=EndDate,\n", - " huc_id=['14020001'], grid='conus1')\n", - "\n", - "# save\n", - "metadata_df.to_csv(f'./{OBS_OutputFolder}/df_{huc_8_name}_{huc_8_code}_SNOTEL_metadata.csv', index=False)\n", - "\n", - "# View first five records\n", - "metadata_df.head(5)" - ] - }, - { - "cell_type": "markdown", - "id": "c4088b0c", - "metadata": {}, - "source": [ - "The metadata file is an important addition to the observations and it is recommended to always gather and save this for the observations you are using (particularly to support reproducibility within an open-science workflow). The saved file has useful attributes like site names, first and last date of available data, lat/lon, and the query URL. \n", - "\n", - "Additionally, the metadata contains **ParFlow-CONUS1 and ParFlow-CONUS2 `i,j` indices, which indicate the exact model domain grid cell the observation aligns with**. This is a useful HydroData feature that removes the need for users to manually match station latitude/longitude coordinates to the appropriate model grid cell, as this spatial mapping is handled directly within HydroData. We will use these indices below to extract PF-CONUS1 modeled SWE for each SNOTEL station in the section below. " - ] - }, - { - "cell_type": "markdown", - "id": "197fc2ad", - "metadata": { - "tags": [] - }, - "source": [ - "## 5. Retrieve ParFlow-CONUS1 Modeled Snow Data" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "efbac9d1", - "metadata": {}, - "outputs": [], - "source": [ - "# Create a folder to save results\n", - "isExist = os.path.exists(MOD_OutputFolder)\n", - "if isExist == True:\n", - " exit\n", - "else:\n", - " os.mkdir(MOD_OutputFolder)" - ] - }, - { - "cell_type": "markdown", - "id": "3a40df9d", - "metadata": {}, - "source": [ - "The following section retrieves ParFlow-CONUS1 data for each SNOTEL site within our HUC-08 watershed. The code identifies the CONUS1 `i,j` indices associated with each SNOTEL site, indicated in the `metadata_df`. 
It then extracts the CONUS1 modeled SWE output for the site and the period of interest, returning the result as a DataFrame. To fairly compare with SNOTEL, which reports SWE once daily at the start of the local day, model output is aggregated by day, using the argment `\"temporal_resolution\": \"daily\"`. Finally, the processed data is saved as a CSV file for each site. \n", - "\n", - "### 5a. ParFlow CONUS1 Model Dataset Information\n", - "We can print some information about the model output dataset by using the `hf.get_catalog_entry()` to get the CONUS1 model dataset metadata. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "d57c3724", - "metadata": {}, - "outputs": [], - "source": [ - "conus1_options = {\n", - " \"dataset\": \"conus1_baseline_mod\",\n", - " \"variable\": \"swe\"\n", - "}\n", - "hf.get_catalog_entry(conus1_options)" - ] - }, - { - "cell_type": "markdown", - "id": "fb0c4eeb", - "metadata": {}, - "source": [ - "Before we gather model outputs at the specific SNOTEL sites, we can visualize SWE across our HUC-08. This is plotted for one day at 1km lateral resolution." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "e2a8e499", - "metadata": {}, - "outputs": [], - "source": [ - "# retrieve gridded PF-CONUS1 SWE for the entire HUC8 watershed\n", - "grid_swe_options = {\n", - " \"dataset\": \"conus1_baseline_mod\",\n", - " \"variable\": \"swe\",\n", - " \"temporal_resolution\": \"daily\",\n", - " \"start_time\": '2004-04-01', ### TO NOTE: the gridded function has exclusive end date, so this is hardcoded for now \n", - " \"end_time\": '2004-04-02',\n", - " \"huc_id\": huc_8_code\n", - " }\n", - " \n", - " # Get gridded data\n", - "grid_swe = hf.get_gridded_data(grid_swe_options)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "e8258c4e", - "metadata": {}, - "outputs": [], - "source": [ - "grid_swe_map = xr.DataArray(grid_swe[0], dims=(\"y\", \"x\"), name=\"SWE\")\n", - "grid_swe_map.hvplot.image(cmap=\"YlGnBu\", colorbar=True, aspect=\"equal\", title=f\"{huc_8_name} Gridded SWE on 2004-04-01\")\n" - ] - }, - { - "cell_type": "markdown", - "id": "da5e3987", - "metadata": {}, - "source": [ - "Now, grab the PF-CONUS1 modeled SWE from the SNOTEL site locations. Here we use the CONUS1 i and j indices from the `metadata_df` and grab the SWE from those grid cells. 
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "a0b86ddd", - "metadata": {}, - "outputs": [], - "source": [ - "# Copy data_df to model_df so we have the same timestamps and site_id structure\n", - "model_df = data_df.copy()\n", - "\n", - "# Set all non-date columns to NaN to prepare for filling in model data\n", - "non_date_cols = model_df.columns.difference([\"date\"])\n", - "model_df[non_date_cols] = np.nan\n", - "\n", - "# Rename site_id columns for PF outputs \n", - "model_df.columns = [\n", - " col if col == \"date\" else col.replace(\":SNTL\", \"\") + \":PFCONUS1\"\n", - " for col in model_df.columns\n", - "]" - ] - }, - { - "cell_type": "markdown", - "id": "09f1e698", - "metadata": {}, - "source": [ - "Use the function `hf.get_gridded_data()` and PF-CONUS1 `i,j` indices to select the SWE output for the correct location and time period: " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "470e19ec", - "metadata": {}, - "outputs": [], - "source": [ - "# Loop over each station in metadata_df\n", - "for idx, row in metadata_df.iterrows():\n", - " site_id = row[\"site_id\"] # original SNTL site_id\n", - " col_name = site_id.replace(\":SNTL\", \"\") + \":PFCONUS1\" # corresponding column in model_df\n", - " conus_i = int(row[\"conus1_i\"])\n", - " conus_j = int(row[\"conus1_j\"])\n", - " \n", - " # Build options dict for this station\n", - " options = {\n", - " \"dataset\": \"conus1_baseline_mod\",\n", - " \"variable\": \"swe\",\n", - " \"temporal_resolution\": \"daily\",\n", - " \"start_time\": '2003-10-01', ### TO NOTE: the gridded function has exclusive end date, so this is hardcoded for now \n", - " \"end_time\": '2005-10-01',\n", - " \"grid_point\": [conus_i, conus_j]\n", - " }\n", - " \n", - " # Get gridded data\n", - " data = hf.get_gridded_data(options)\n", - " \n", - " # Fill column in model_df\n", - " # Convert to numeric in case hf returns lists or other types\n", - " model_df[col_name] = np.squeeze(np.array(data))\n", - "\n", - "# Ensure date column is datetime\n", - "model_df[\"date\"] = pd.to_datetime(model_df[\"date\"])\n", - "\n", - "# Save\n", - "model_df.to_csv(f'./{MOD_OutputFolder}/df_{huc_8_name}_{huc_8_code}_PFCONUS1.csv', index=False)\n", - " \n", - "model_df.head(5)" - ] - }, - { - "cell_type": "markdown", - "id": "54ae8422", - "metadata": {}, - "source": [ - "## 6. Quick plot sanity check \n", - "Plot a simple timeseries of modeled and observed SWE to make sure our data retrieval was successful. 
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "06701744", - "metadata": {}, - "outputs": [], - "source": [ - "fig, ax = plt.subplots(figsize=(10, 4))\n", - "\n", - "ax.plot(data_df[\"date\"], model_df[\"380:CO:PFCONUS1\"], label=\"Modeled\", linewidth=2)\n", - "\n", - "ax.plot(data_df[\"date\"], data_df[\"380:CO:SNTL\"], label=\"Observed\", linewidth=2)\n", - "\n", - "ax.set_xlabel(\"Date\")\n", - "ax.set_ylabel(\"SWE (mm)\")\n", - "\n", - "# Date formatting for x-axis\n", - "ax.xaxis.set_major_locator(mdates.MonthLocator(interval=3))\n", - "ax.xaxis.set_major_formatter(mdates.DateFormatter('%m-%Y'))\n", - "\n", - "ax.legend(loc='upper left')\n", - "ax.grid(True, alpha=0.3)\n", - "\n", - "plt.tight_layout()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "a0c0c86d", - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "cssi_evaluation", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.10.13" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -} diff --git a/examples/nwm/NWM-Streamflow-Evaluation.ipynb b/examples/nwm/nwm_streamflow_point_scale_evaluation.ipynb similarity index 100% rename from examples/nwm/NWM-Streamflow-Evaluation.ipynb rename to examples/nwm/nwm_streamflow_point_scale_evaluation.ipynb diff --git a/examples/nwm/02_point_scale_comparison.ipynb b/examples/nwm/nwm_swe_point_scale_evaluation.ipynb similarity index 100% rename from examples/nwm/02_point_scale_comparison.ipynb rename to examples/nwm/nwm_swe_point_scale_evaluation.ipynb diff --git a/examples/nwm/03_watershed_scale_comparison.ipynb b/examples/nwm/nwm_swe_watershed_scale_evaluation.ipynb similarity index 100% rename from examples/nwm/03_watershed_scale_comparison.ipynb rename to examples/nwm/nwm_swe_watershed_scale_evaluation.ipynb diff --git a/examples/nwm/utils/nwm_utils.py b/examples/nwm/utils/nwm_utils.py deleted file mode 100644 index 5b420e9..0000000 --- a/examples/nwm/utils/nwm_utils.py +++ /dev/null @@ -1,690 +0,0 @@ -import os -import sys -import pytz -import time -import urllib3 -import datetime -import numpy as np -import pandas as pd -import pyproj -import folium -import hvplot.pandas -import holoviews as hv -import hvplot.xarray -from holoviews import opts -import xarray as xr -import geopandas as gpd -import matplotlib.pyplot as plt -import matplotlib.dates as mdates -import xyzservices.providers as xyz -from scipy.stats import pearsonr, spearmanr -pd.options.mode.chained_assignment = None - -import geoviews as gv -import geoviews.tile_sources as gts -gv.extension('bokeh') - -def getSNOTELData(SiteName, SiteID, StateAbb, StartDate, EndDate, OutputFolder): - url1 = 'https://wcc.sc.egov.usda.gov/reportGenerator/view_csv/customMultiTimeSeriesGroupByStationReport/daily/start_of_period/' - url2 = f'{SiteID}:{StateAbb}:SNTL%7Cid=%22%22%7Cname/' - url3 = f'{StartDate},{EndDate}/' - url4 = 'WTEQ::value?fitToScreen=false' - url = url1+url2+url3+url4 - - dl_start_time = time.time() - - http = urllib3.PoolManager() - response = http.request('GET', url) - data = response.data.decode('utf-8') - i=0 - for line in data.split("\n"): - if line.startswith("#"): - i=i+1 - data = data.split("\n")[i:] - - df = pd.DataFrame.from_dict(data) - df = df[0].str.split(',', 
expand=True) - df.rename(columns={0:df[0][0], - 1:df[1][0]}, inplace=True) - df.drop(0, inplace=True) - df.dropna(inplace=True) - df.reset_index(inplace=True, drop=True) - df["Date"] = pd.to_datetime(df["Date"]) - df.rename(columns={df.columns[1]:'Snow Water Equivalent (m) Start of Day Values'}, inplace=True) - df.iloc[:, 1:] = df.iloc[:, 1:].apply(lambda x: pd.to_numeric(x) * 0.0254) # convert in to m - df['Water_Year'] = pd.to_datetime(df['Date']).map(lambda x: x.year+1 if x.month>9 else x.year) - - df.to_csv(f'./{OutputFolder}/df_{SiteID}_{StateAbb}_SNTL.csv', index=False) - - dl_elapsed = time.time() - dl_start_time - print(f'✅ Retrieved data for {SiteName}, {SiteID} in {dl_elapsed:.2f} seconds\n') - -def getCCSSData(SiteName, SiteID, StartDate, EndDate, OutputFolder): - StateAbb = 'Ca' - url1 = 'https://wcc.sc.egov.usda.gov/reportGenerator/view_csv/customSingleStationReport/daily/start_of_period/' - url2 = f'{SiteID}:CA:MSNT%257Cid=%2522%2522%257Cname/' - url3 = f'{StartDate},{EndDate}/' - url4 = 'WTEQ::value?fitToScreen=false' - url = url1+url2+url3+url4 - - dl_start_time = time.time() - - http = urllib3.PoolManager() - response = http.request('GET', url) - data = response.data.decode('utf-8') - i=0 - for line in data.split("\n"): - if line.startswith("#"): - i=i+1 - data = data.split("\n")[i:] - - df = pd.DataFrame.from_dict(data) - print(df.columns) - df = df[0].str.split(',', expand=True) - df.rename(columns={0:df[0][0], - 1:df[1][0]}, inplace=True) - df.drop(0, inplace=True) - df.dropna(inplace=True) - df.reset_index(inplace=True, drop=True) - df["Date"] = pd.to_datetime(df["Date"]) - df.rename(columns={df.columns[1]:'Snow Water Equivalent (m) Start of Day Values'}, inplace=True) - df.iloc[:, 1:] = df.iloc[:, 1:].apply(lambda x: pd.to_numeric(x) * 0.0254) # convert in to m - df['Water_Year'] = pd.to_datetime(df['Date']).map(lambda x: x.year+1 if x.month>9 else x.year) - - df.to_csv(f'./{OutputFolder}/df_{SiteID}_{StateAbb}_CCSS.csv', index=False) - - dl_elapsed = time.time() - dl_start_time - print(f'✅ Retrieved data for {SiteName}, {SiteID} in {dl_elapsed:.2f} seconds\n') - -def convert_latlon_to_yx(lat, lon, input_crs, ds, output_crs): - """ - This function takes latitude and longitude values along with - input and output coordinate reference system (CRS) and - uses Python's pyproj package to convert the provided values - (as single float values, not arrays) to the corresponding y and x - coordinates in the output CRS. 
- - Parameters: - lat: The latitude value - lon: The longitude value - input_crs: The input coordinate reference system ('EPSG:4326') - output_crs: The output coordinate reference system - - Returns: - y, x: a tuple of the transformed coordinates in the specified output - """ - # Create a transformer - transformer = pyproj.Transformer.from_crs(input_crs, output_crs, always_xy=True) - - # Perform the transformation - x, y = transformer.transform(lon, lat) - - return y, x - -def convert_utc_to_local(state, df): - state_abbreviations = { - "Alabama": "AL", "Alaska": "AK", "Arizona": "AZ", "Arkansas": "AR", - "California": "CA", "Colorado": "CO", "Connecticut": "CT", "Delaware": "DE", - "Florida": "FL", "Georgia": "GA", "Hawaii": "HI", "Idaho": "ID", - "Illinois": "IL", "Indiana": "IN", "Iowa": "IA", "Kansas": "KS", - "Kentucky": "KY", "Louisiana": "LA", "Maine": "ME", "Maryland": "MD", - "Massachusetts": "MA", "Michigan": "MI", "Minnesota": "MN", "Mississippi": "MS", - "Missouri": "MO", "Montana": "MT", "Nebraska": "NE", "Nevada": "NV", - "New Hampshire": "NH", "New Jersey": "NJ", "New Mexico": "NM", "New York": "NY", - "North Carolina": "NC", "North Dakota": "ND", "Ohio": "OH", "Oklahoma": "OK", - "Oregon": "OR", "Pennsylvania": "PA", "Rhode Island": "RI", - "South Carolina": "SC", "South Dakota": "SD", "Tennessee": "TN", - "Texas": "TX", "Utah": "UT", "Vermont": "VT", "Virginia": "VA", - "Washington": "WA", "West Virginia": "WV", "Wisconsin": "WI", "Wyoming": "WY" - } - - state_timezones = { - 'AL': 'US/Central', 'AK': 'US/Alaska', 'AZ': 'US/Mountain', 'AR': 'US/Central', - 'CA': 'US/Pacific', 'CO': 'US/Mountain', 'CT': 'US/Eastern', 'DE': 'US/Eastern', - 'FL': 'US/Eastern', 'GA': 'US/Eastern', 'HI': 'US/Hawaii', 'ID': 'US/Mountain', - 'IL': 'US/Central', 'IN': 'US/Eastern', 'IA': 'US/Central', 'KS': 'US/Central', - 'KY': 'US/Eastern', 'LA': 'US/Central', 'ME': 'US/Eastern', 'MD': 'US/Eastern', - 'MA': 'US/Eastern', 'MI': 'US/Eastern', 'MN': 'US/Central', 'MS': 'US/Central', - 'MO': 'US/Central', 'MT': 'US/Mountain', 'NE': 'US/Central', 'NV': 'US/Pacific', - 'NH': 'US/Eastern', 'NJ': 'US/Eastern', 'NM': 'US/Mountain', 'NY': 'US/Eastern', - 'NC': 'US/Eastern', 'ND': 'US/Central', 'OH': 'US/Eastern', 'OK': 'US/Central', - 'OR': 'US/Pacific', 'PA': 'US/Eastern', 'RI': 'US/Eastern', 'SC': 'US/Eastern', - 'SD': 'US/Central', 'TN': 'US/Central', 'TX': 'US/Central', 'UT': 'US/Mountain', - 'VT': 'US/Eastern', 'VA': 'US/Eastern', 'WA': 'US/Pacific', 'WV': 'US/Eastern', - 'WI': 'US/Central', 'WY': 'US/Mountain' - } - - if len(state) == 2: - state_abbr = state - else: - state_abbr = state_abbreviations.get(state, "State not found") - - # Extract the state abbreviation from the filename - # state_abbr = os.path.basename(filename).split('_')[2] - timezone = state_timezones.get(state_abbr) - - if timezone: - # Convert the 'Date' column to datetime - df['Date'] = pd.to_datetime(df['Date'], utc=True) - - # Convert to local time zone - local_tz = pytz.timezone(timezone) - df['Date_Local'] = df['Date'].dt.tz_convert(local_tz) - - # Save the timezone-aware Date_Local column - df['Date_Local'] = df['Date_Local'].astype(str) - df['Date_Local'] = df['Date_Local'].apply(lambda x: datetime.datetime.strptime(x, '%Y-%m-%d %H:%M:%S%z')) - df['Date_Local'] = df['Date_Local'].apply(lambda x: x.replace(tzinfo=None)) - - else: - print(f"Timezone for state abbreviation {state_abbr} not found.") - - return df - -def combine(obs_files, mod_files, StartDate, EndDate): - - # Create a dictionary to store dataframes - 
dataframes = {} - - # Read SNOTEL files - for file in obs_files: - location = os.path.basename(file).split('_')[1] # Extract location from filename - network = os.path.basename(file).split('_')[-1].split('.')[0] # Extract network from filename - df = pd.read_csv(file) - df['Date'] = pd.to_datetime(df['Date']).dt.date # .dt.date is required because times are not excatly the same between SNOTEL and NWM - dataframes[f'{network}_{location}'] = df.set_index('Date') - - # Read NWM files - for file in mod_files: - location = os.path.basename(file).split('_')[1] # Extract location from filename - df = pd.read_csv(file) - df['Date_Local'] = pd.to_datetime(df['Date_Local']).dt.date # .dt.date is required because times are not excatly the same between SNOTEL and NWM - dataframes[f'NWM_{location}'] = df.set_index('Date_Local') - - # Merge dataframes on Date - combined_df = pd.DataFrame(index=pd.date_range(start=StartDate, end=EndDate)) - for key, df in dataframes.items(): - if 'SNTL' in key: - combined_df[f'{key}_swe_m'] = df['Snow Water Equivalent (m) Start of Day Values'] - if 'CCSS' in key: - combined_df[f'{key}_swe_m'] = df['Snow Water Equivalent (m) Start of Day Values'] - elif 'NWM' in key: - combined_df[f'{key}_swe_m'] = df['NWM_SWE_meters'] - - return combined_df - -def report_max_dates_and_values(df, col_obs, col_mod): - # Ensure the index is datetime - if not pd.api.types.is_datetime64_any_dtype(df.index): - raise ValueError("DataFrame index must be datetime") - - # Find max values and associated dates - max_obs = df[col_obs].max() - date_obs = df[col_obs].idxmax() - - max_mod = df[col_mod].max() - date_mod = df[col_mod].idxmax() - - # Create a summary table as a DataFrame (nice for Jupyter display) - summary_table = pd.DataFrame({ - 'Data Source': [col_obs, col_mod], - 'Peak SWE (mm)': [max_obs, max_mod], - 'Date of Maximum': [date_obs.date(), date_mod.date()] - }) - - return summary_table - -def compute_melt_period(swe_series, min_zero_days=10): - - # Find peak date and maximum SWE - peak_date = swe_series.idxmax() - peak_swe = swe_series.max() - - # Subset data to only include days after peak date - after_peak = swe_series.loc[peak_date:] - - # Find first date where SWE becomes zero and stays zero for at least `min_zero_days` - zero_streak = 0 - melt_end_date = None - - for date, value in after_peak.items(): - if value == 0: - zero_streak += 1 - else: - zero_streak = 0 - - if zero_streak >= min_zero_days: - melt_end_date = date - break - - if melt_end_date is None: - raise ValueError("Could not find a period of at least 10 consecutive zero SWE days after the peak.") - - # Calculate melt period length (days between peak and melt completion) - melt_period_days = (melt_end_date - peak_date).days - - # Calculate melt rate (m/day) - melt_rate = peak_swe / melt_period_days - - # Return results in a dictionary - return { - 'peak_date': peak_date, - 'peak_swe_m': peak_swe, - 'melt_end_date': melt_end_date, - 'melt_period_days': melt_period_days, - 'melt_rate_m/d': melt_rate - } - -def prep_nwm_swe_dataframe(ds, state): - df = ds.to_dataframe() - df.drop(columns=['crs'], inplace=True) - df.reset_index(inplace=True) - df["time"] = pd.to_datetime(df["time"]) - df.rename(columns={df.columns[0]:'Date', df.columns[1]:'NWM_SWE_meters'}, inplace=True) - df.iloc[:, 1:] = df.iloc[:, 1:].apply(lambda x: pd.to_numeric(x)/1000) # convert mm to m - df_local = convert_utc_to_local(state, df) - df_local.index = pd.to_datetime(df_local['Date_Local']) - df_local = df_local.groupby(pd.Grouper(freq='D')).first() - 
df_local = df_local.reset_index(drop=True) - #df_local.to_csv(output_path, index=False) - - return df_local - -def compute_spatial_agg_from_obs(folder_path, agg): - # List all CSV files in the folder - csv_files = [f for f in os.listdir(folder_path) if f.endswith('.csv')] - - if len(csv_files) == 0: - raise ValueError("No CSV files found in the specified folder.") - - # Read all files into a list of DataFrames - dfs = [] - for file in csv_files: - file_path = os.path.join(folder_path, file) - df = pd.read_csv(file_path, parse_dates=['Date']) - dfs.append(df) - - # Concatenate all files into a single DataFrame - combined_df = pd.concat(dfs) - - # Group by Date and Water_Year, compute mean SWE - averaged_df = combined_df.groupby(['Date', 'Water_Year'], as_index=False).agg({ - 'Snow Water Equivalent (m) Start of Day Values': agg - }) - - # Save to output CSV - # averaged_df.to_csv(output_file, index=False) - return averaged_df - - print(f"Averaged CSV saved to: {output_file}") - - - -def plot_sites_within_domain(gdf_sites, domain_gdf, zoom_start=10): - """ - Create and return a folium map showing observation sites within a given watershed boundary. - - Parameters: - - gdf_sites: GeoDataFrame containing site locations. - - domain_gdf: GeoDataFrame containing the watershed boundary. - - zoom_start: Initial zoom level for the map (default=10). - - Returns: - - folium.Map object ready to display. - """ - import folium - import geopandas as gpd - - # Ensure CRS compatibility - if gdf_sites.crs != domain_gdf.crs: - gdf_sites = gdf_sites.to_crs(domain_gdf.crs) - - # Helper to find appropriate column names - def find_column(columns, candidates): - return next((c for c in candidates if c in columns), None) - - # Candidate column names (ordered by preference) - site_name_col = find_column( - gdf_sites.columns, - ['site_name', 'station_name', 'name', 'Site Name'] - ) - site_id_col = find_column( - gdf_sites.columns, - ['site_id', 'station_id', 'site_code', 'code', 'Site Code'] - ) - - # Calculate center of the domain - minx, miny, maxx, maxy = domain_gdf.total_bounds - center_lat = (miny + maxy) / 2 - center_lon = (minx + maxx) / 2 - - # Create folium map - m = folium.Map(location=[center_lat, center_lon], zoom_start=zoom_start) - - # Add site markers - for _, row in gdf_sites.iterrows(): - site_name = row[site_name_col] if site_name_col else "Site" - site_id = row[site_id_col] if site_id_col else "" - - folium.Marker( - location=[row.geometry.y, row.geometry.x], - popup=f"{site_name}
Site ID: {site_id}", - tooltip=f"{site_name} ({site_id})", - icon=folium.Icon(color="green", icon="info-sign") - ).add_to(m) - - # Add watershed boundary - folium.GeoJson( - domain_gdf.to_json(), - name="Watershed Boundary", - style_function=lambda x: { - "color": "lightcyan", - "weight": 2, - "fillOpacity": 0.3, - }, - ).add_to(m) - - # Add Esri Imagery basemap - folium.TileLayer( - tiles="https://server.arcgisonline.com/ArcGIS/rest/services/World_Imagery/MapServer/Tile/{z}/{y}/{x}", - attr="Esri, Maxar, Earthstar Geographics, and the GIS User Community", - name="Esri Imagery", - overlay=False, - control=True, - ).add_to(m) - - # Layer control - folium.LayerControl(collapsed=False).add_to(m) - - return m - - -def compute_stats(df, ts1, ts2): - df = df[[f'{ts1}', f'{ts2}']] - df.dropna(inplace=True) # Both Pearson and Spearman correlations cannot handle NaN values, so make sure to drop nan values before any calculatoin. - - # Compute statistics for each time series - stats = { - 'Mean [L]': [df[f'{ts1}'].mean(), df[f'{ts2}'].mean()], - 'Median [L]': [df[f'{ts1}'].median(), df[f'{ts2}'].median()], - 'Standard Deviation [L]': [df[f'{ts1}'].std(), df[f'{ts2}'].std()], - 'Variance [L^2]': [df[f'{ts1}'].var(), df[f'{ts2}'].var()], - 'Min [L]': [df[f'{ts1}'].min(), df[f'{ts2}'].min()], - 'Max [L]': [df[f'{ts1}'].max(), df[f'{ts2}'].max()] - } - - # Calculate correlation coefficients - pearson_corr, _ = pearsonr(df[f'{ts1}'], df[f'{ts2}']) - spearman_corr, _ = spearmanr(df[f'{ts1}'], df[f'{ts2}']) - - # Compute Mean Bias (mean error) - bias = df[ts2].mean() - df[ts1].mean() - - # Compute Nash-Sutcliffe Efficiency (NSE) - obs_mean = df[ts1].mean() - numerator = np.sum((df[ts2] - df[ts1])**2) - denominator = np.sum((df[ts1] - obs_mean)**2) - nse = 1 - (numerator / denominator) - - # Compute Kling-Gupta Efficiency (KGE) - r = pearson_corr - alpha = df[ts2].std() / df[ts1].std() - beta = df[ts2].mean() / df[ts1].mean() - kge = 1 - np.sqrt((r - 1)**2 + (alpha - 1)**2 + (beta - 1)**2) - - # Create a DataFrame for the statistics - stats_table = pd.DataFrame(stats, index=['observed', 'modeled']) - - # Add Pearson and Spearman correlations as additional rows - stats_table.loc[''] = [''] * len(stats_table.columns) # Blank row for formatting - stats_table.loc['Pearson Correlation [-]'] = [pearson_corr, '', '', '', '', ''] - stats_table.loc['Spearman Correlation [-]'] = [spearman_corr, '', '', '', '', ''] - stats_table.loc['Bias (Modeled - Observed) [L]'] = [bias, '', '', '', '', ''] - stats_table.loc['Nash-Sutcliffe Efficiency (NSE) [-]'] = [nse, '', '', '', '', ''] - stats_table.loc['Kling-Gupta Efficiency (KGE) [-]'] = [kge, '', '', '', '', ''] - - return stats_table - -def compute_stats_period(df, ts_obs, ts_mod, months): - - """ - Create a wrapper for compute_stats to filter dataframe by months before computing stats. - For example, to compute stats for melt season (April to July), use months=[4,5,6,7]. - - Parameters: - - df: DataFrame containing the data. - - ts_obs: Column name for observed timeseries in df. - - ts_mod: Column name for modeled timeseries in df. - - months: List of months to filter the data. - - Returns: - - DataFrame with computed statistics. 
- """ - df_sub = df[df.index.month.isin(months)] - - return compute_stats(df_sub, ts_obs, ts_mod) - - -def comparison_plots(df, ts_obs, ts_mod): - ''' - Create a set of comparison plots (timeseries overlay and scatter plot with 1:1 line) - - Parameters: - df: dataframe with combined observed and modeled timeseries for each site - ts_obs (str): column heading for observed timeseries in df - ts_mod (str): column heading for modeled timeseries in df - ''' - - df = df.copy() - df.index.name = "date" # change the index name to "Date" for better hover tooltip display - - # Timeseries plot (Overlay) - observed_plot = df.hvplot.line( - y=ts_obs, - ylabel='Snow Water Equivalent (mm)', - xlabel='', - label='Observed SWE', - color='blue', - line_width=2, - width=500, - height=400, - ) - - modeled_plot = df.hvplot.line( - y=ts_mod, - ylabel='Snow Water Equivalent (mm)', - xlabel='', - label='Modeled SWE', - color='orchid', - line_width=2, - width=500, - height=400, - ) - - # Overlay (combines both lines into a single visual object) - timeseries_plot = (observed_plot * modeled_plot).opts( - title='Observed vs Modeled SWE\nDaily Time Series', - legend_position='top_right', - ) - - # Scatter plot - scatter_plot = df.hvplot.scatter( - x=ts_obs, - y=ts_mod, - xlabel='Observed SWE (mm)', - ylabel='Modeled SWE (mm)', - color='black', - width=500, - height=400, - size=15, - hover_cols=['date'] # This will add the date (index) to hover tooltip - ) - - # Add 1:1 line (perfect match line) - swe_max = max(df[ts_obs].max(), df[ts_mod].max()) - one_to_one_line = hv.Curve(([0, swe_max], [0, swe_max])).opts( - color='gray', - line_dash='solid', - line_width=1, - ).relabel('1:1 Line') # This is the correct way to set a label for a Curve - - # Combine scatter plot and 1:1 line into an Overlay - scatter_with_line = (scatter_plot * one_to_one_line).opts( - title='Observed vs Modeled SWE\nScatter with 1:1 Line', - legend_position='bottom_right' - ) - - # Combine both into a 1-row, 2-column layout - layout = (timeseries_plot + scatter_with_line).opts(shared_axes=False) - - - return layout - - -def plot_custom_scatter_SWE( - df, - obs_col, - mod_col, - *, - highlight_months=None, - month_col="month", - size=15, - width=500, - height=400, -): - """ - Flexible scatter plot with optional month highlighting and 1:1 line. - - Parameters - ---------- - df : pandas.DataFrame - Input dataframe - obs_col : str - Column name for observed SWE - mod_col : str - Column name for modeled SWE - highlight_months : list[int], optional - Months to highlight (e.g., [10, 11]) - month_col : str - Column containing month values - """ - - df = df.copy() - - # Handle highlighting - if highlight_months is not None and month_col in df.columns: - df["color"] = df[month_col].apply( - lambda m: "teal" if m in highlight_months else "tomato" - ) - color = "color" - else: - color = "black" - - scatter = df.hvplot.scatter( - x=obs_col, - y=mod_col, - xlabel='Observed SWE (mm)', - ylabel='Modeled SWE (mm)', - title='Observed vs. 
Modeled SWE at ' + obs_col, - size=15, - width=500, - height=400, - hover_cols=['index', 'month'], - color='color' - ) - - # 1:1 line - swe_max = max(df[obs_col].max(), df[mod_col].max()) - one_to_one = hv.Curve(([0, swe_max], [0, swe_max])).opts( - color="gray", - line_dash="dashed", - line_width=1, - ).relabel("1:1 Line") - - return (scatter * one_to_one).opts(legend_position="bottom_right") - - -def plot_grid_vector_data(ds_clip, data_var, time_index, shp, sites): - hv.extension('bokeh') - da = ds_clip[data_var] - - # Select one timestep - if isinstance(time_index, int): - da = da.isel(time=time_index) - else: - da = da.sel(time=time_index) - - # Create an interactive map plot - clipped = da.rio.reproject("EPSG:4326") - clipped = clipped.rename({'x': 'longitude', 'y': 'latitude'}) - hvplot_map = clipped.hvplot( - x='longitude', - y='latitude', - geo=True, - project=True, - tiles=gts.ESRI, - cmap='kbc', - alpha=0.6, - frame_height=400, - title=f"Snow Water Equivalent, at {pd.to_datetime(time_index).strftime('%Y-%m-%d %H:%M')}", - clim=(0, 300) - ) - - shp = shp.to_crs("EPSG:4326").reset_index(drop=True) - sites = sites.to_crs("EPSG:4326").reset_index(drop=True) - - # Plot the shapefile outline - shp_plot = shp.hvplot( - geo=True, project=True, - color='none', line_width=2 - ) - - # Plot sites (scatter) - points_plot = sites.hvplot.points( - x='longitude', y='latitude', - geo=True, project=True, - color='red', size=100, hover_cols=['name'] - ) - - # Combine the two by overlaying - combined_map = (hvplot_map * shp_plot * points_plot).opts(framewise=True) - - return combined_map - -def plot_grid_vector_monthly_data(ds_clip, data_var, shp, sites): - hv.extension('bokeh') - - # Create an interactive map plot - clipped = ds_clip[data_var].rio.reproject("EPSG:4326") - clipped = clipped.rename({'x': 'longitude', 'y': 'latitude'}) - - # Plot the shapefile outline - shp_plot = shp.hvplot( - geo=True, project=True, - color='none', line_width=2 - ) - - # Plot sites (scatter) - points_plot = sites.hvplot.points( - x='longitude', y='latitude', - geo=True, project=True, - color='red', size=100, hover_cols=['name'] - ) - - # Split into individual plots (list of plots) - plots = [] - for t in clipped.time.values: - base_plot = clipped.sel(time=t).hvplot( - x='longitude', y='latitude', - geo=True, project=True, - tiles=gts.ESRI, - title=f'SWE (mm) on {pd.to_datetime(t).strftime("%Y-%m-%d")}', - frame_height=200, frame_width=300 - ) - # Overlay shapefile and points on top of SWE map - combined_plot = base_plot * shp_plot * points_plot - plots.append(combined_plot) - - layout = hv.Layout(plots).cols(3) - - return layout \ No newline at end of file diff --git a/examples/parflow/02_point_scale_comparison_ParFlow.ipynb b/examples/parflow/parflow_swe_point_scale_evaluation.ipynb similarity index 56% rename from examples/parflow/02_point_scale_comparison_ParFlow.ipynb rename to examples/parflow/parflow_swe_point_scale_evaluation.ipynb index d2f3906..88d04c8 100644 --- a/examples/parflow/02_point_scale_comparison_ParFlow.ipynb +++ b/examples/parflow/parflow_swe_point_scale_evaluation.ipynb @@ -4,9 +4,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "![NWM](img/NWM.png)\n", + "![NWM](../img/NWM.png)\n", "\n", - "# Analyze ParFlow-CONUS Outputs vs Observed Snow Water Equivalent (SWE)\n", + "# Use HydroData to Retrieve Modeled and Observed Snow Data for a Watershed of Interest with ParFlow-CONUS Outputs vs Observed Snow Water Equivalent (SWE) - Full Evaluation Workflow\n", "Authors: Irene Garousi-Nejad 
(igarousi@cuahsi.org), Danielle Tijerina-Kreuzer (dtijerina@cuahsi.org) \n", "Last updated: Feb 2026" ] }, @@ -18,7 +18,7 @@ "#### Introduction: \n", "This notebook demonstrates how to perform a point-scale analysis comparing modeled and observed SWE at selected SNOTEL sites. We focus on analyzing model performance both for **a single SNOTEL site** and **watershed-scale behavior for multiple stations**, with particular attention to the **magnitude and timing of peak SWE**. \n", "\n", - "This notebook requires ParFlow-CONUS output, SNOTEL data, and metadata CSVs that are created in the `01_HydroData_collection.ipynb` notebook." + "This notebook gathers the ParFlow-CONUS output, SNOTEL observations, and site metadata it needs directly from HydroData in the sections below." ] }, { @@ -75,65 +75,732 @@ "%autoreload 2" ] }, + { + "cell_type": "markdown", + "id": "19adce70", + "metadata": {}, + "source": [ + "# Get Data from HydroData (merged from the `dataCollectionHydrodata_parflow.ipynb` notebook)" + ] + }, + { + "cell_type": "markdown", + "id": "e86aae63", + "metadata": {}, + "source": [ + "## 1. Setup" + ] + }, + { + "cell_type": "markdown", + "id": "e88ed1ef", + "metadata": {}, + "source": [ + "### 1a. Python Environment \n", + "\n", + "Ensure that the `nwm_env` conda environment is selected as your Jupyter kernel. This environment should already be created if you followed the instructions under section \"Creating your HydroLearnEnv Virtual Environment\" in the `getting_started.md` file." + ] + }, + { + "cell_type": "markdown", + "id": "c0f30927", + "metadata": {}, + "source": [ + "Import the libraries needed to run this notebook:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0dfc3fb3", + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "import sys\n", + "\n", + "prefix = os.environ['CONDA_PREFIX']\n", + "os.environ['PROJ_LIB'] = os.path.join(prefix, 'share', 'proj')\n", + "\n", + "# add the src directory to the path so we can import evaluation modules\n", + "sys.path.append('../../src/')\n", + "\n", + "import pyproj\n", + "import pandas as pd\n", + "import numpy as np\n", + "import xarray as xr\n", + "import geopandas as gpd\n", + "from dask.distributed import Client\n", + "import matplotlib.pyplot as plt\n", + "import matplotlib.dates as mdates\n", + "import hf_hydrodata as hf\n", + "import subsettools\n", + "import hvplot.xarray\n", + "\n", + "\n", + "from cssi_evaluation.utils import plot_utils\n", + "\n", + "\n", + "%load_ext autoreload\n", + "%autoreload 2\n" + ] + }, + { + "cell_type": "markdown", + "id": "8e058228", + "metadata": {}, + "source": [ + "### 1b. Register PIN and Access HydroData\n", + "\n", + "To access the HydroData catalog you will need to sign up for a [HydroFrame account](https://hydrogen.princeton.edu/signup) (do this only once), [create a 4-digit PIN](https://hydrogen.princeton.edu/pin), and register your PIN in order to have access to the HydroData datasets (you will do this in the next code cell below). To note, your PIN will expire after 7 days, and you will need to recreate it after that time. 
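\n", + "\n", + "If you would rather not hardcode your email and PIN in a shared notebook, one option is to read them from environment variables before registering (a minimal sketch; the `HF_EMAIL` and `HF_PIN` variable names are our own convention, not part of `hf_hydrodata`):\n", + "\n", + "```python\n", + "import os\n", + "import hf_hydrodata as hf\n", + "\n", + "# Assumed environment variables, set in your shell before launching Jupyter:\n", + "#   export HF_EMAIL='you@example.edu'\n", + "#   export HF_PIN='1234'\n", + "hf.register_api_pin(os.environ['HF_EMAIL'], os.environ['HF_PIN'])\n", + "```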
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a365996d", + "metadata": {}, + "outputs": [], + "source": [ + "# You need to register on https://hydrogen.princeton.edu/pin \n", + "# and run the following with your registered information\n", + "# before you can use the hydrodata utilities\n", + "hf.register_api_pin(\"dtt2@princeton.edu\", \"7837\")" + ] + }, + { + "cell_type": "markdown", + "id": "825c288d", + "metadata": {}, + "source": [ + "### 1c. Dask \n", + "\n", + "We'll use dask to parallelize our code. To manage parallel computation and visualize progress of long-running tasks, we initialize a Dask “cluster,” which defines how many workers are used and how much computing power each worker has. \n", + "\n", + "In this setup, we create a Dask client with `Client(n_workers=6, threads_per_worker=1, memory_limit='2GB')`, which launches a cluster with 6 workers. Each worker uses a single thread, typically mapped to one CPU core, allowing for efficient parallel processing across 6 cores. Each worker also has a memory limit of 2 GB, for a total of up to 12 GB across the cluster.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6f2b08d0", + "metadata": {}, + "outputs": [], + "source": [ + "# use a try accept loop so we only instantiate the client\n", + "# if it doesn't already exist.\n", + "try:\n", + " print('Dashboard link:', client.dashboard_link)\n", + "except: \n", + " # The client should be customized to your workstation resources.\n", + " client = Client(n_workers=6, threads_per_worker=1, memory_limit='2GB') \n", + " print('Dashboard link:', client.dashboard_link)\n", + "print(client)" + ] + }, + { + "cell_type": "markdown", + "id": "b8620cfc", + "metadata": {}, + "source": [ + "## 2. Set Paths" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e50dc99d", + "metadata": {}, + "outputs": [], + "source": [ + "# Start and end times of a water year (to note, these dates were chosen to align with the PFCONUS1 early 2000s runs)\n", + "StartDate = '2003-10-01'\n", + "EndDate = '2005-09-30'\n", + "\n", + "domain_data_path = 'examples/parflow/domain_data/' # path to the model domain data\n", + "\n", + "# Path to save results (obs and mod stands for observation and modeled, respectively)\n", + "OBS_OutputFolder = './obs_outputs_PF' \n", + "MOD_OutputFolder = './mod_outputs_PF'" + ] + }, + { + "cell_type": "markdown", + "id": "feb58871", + "metadata": {}, + "source": [ + "## 3. Retrieve Observed Snow Data " + ] + }, + { + "cell_type": "markdown", + "id": "45ca2832", + "metadata": {}, + "source": [ + "### 3a. Define the watershed of interest\n", + "\n", + "One of the simplest ways to gather data and model output from HydroData is by specifying a [Hydrologic Unit Code](https://www.usgs.gov/national-hydrography/watershed-boundary-dataset). Before we retrieve any hydrologic information, we need to indicate a HUC8 code and use it to gather snow water equivalent (SWE) observations from SNOTEL sites \n", + "\n", + "✏️ If you have a specific HUC8 in mind, you can change the variable `huc_8_code` below." 
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "c8355563",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# ✏️ Specify HUC8 ID and Name for watershed of interest\n",
+    "huc_8_code = '14020001'  # East-Taylor HUC-8\n",
+    "print(f'HUC-8 ID: {huc_8_code}')\n",
+    "\n",
+    "huc_8_name = 'East-Taylor'\n",
+    "print(f'HUC-8 name: {huc_8_name}')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5de02c3b",
+   "metadata": {},
+   "source": [
+    "Use the subsettools function `define_huc_domain()` to get the actual CONUS1 indices associated with the East-Taylor HUC-08. It returns a tuple `(imin, jmin, imax, jmax)` of grid indices that define a bounding box containing our region (or point) of interest, where `imin`, `jmin`, `imax`, and `jmax` are the west, south, east, and north boundaries of the box, respectively, along with a mask for that domain."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "965bd6ea",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "ij_bounds, mask = subsettools.define_huc_domain([huc_8_code], 'conus1')\n",
+    "\n",
+    "np.save(f'{domain_data_path}domainMask_{huc_8_name}_conus1.npy', mask)\n",
+    "\n",
+    "plt.imshow(mask, origin='lower')\n",
+    "print(ij_bounds)\n",
+    "print(mask.shape)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "61745fb2",
+   "metadata": {},
+   "source": [
+    "Using the domain mask and the i,j PF-CONUS1 indices, we use an hf_hydrodata function to find and save the grid cell center lat/lon pair for each grid cell in the domain."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "1ff5c1d4",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Extract bounds\n",
+    "i_min, j_min, i_max, j_max = ij_bounds\n",
+    "mask_shape = mask.shape  # shape of the subset rectangular domain\n",
+    "\n",
+    "# Create i/j index ranges\n",
+    "i_vals = np.arange(i_min, i_max)\n",
+    "j_vals = np.arange(j_min, j_max)\n",
+    "\n",
+    "# Create full 2D grid (note the indexing order carefully)\n",
+    "jj, ii = np.meshgrid(j_vals, i_vals, indexing=\"ij\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c6ae90bf",
+   "metadata": {},
+   "source": [
+    "Because the function `hf.to_latlon()` finds the coordinates at the lower-left corner of a grid cell, we add 0.5 to each i,j index pair to find the **lat/lon at the grid cell center**.",
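+    "\n",
+    "As a single-point illustration (the indices here are arbitrary), the center of the CONUS1 cell at `i=10, j=20` would be looked up as:\n",
+    "\n",
+    "```python\n",
+    "# Offset the lower-left corner indices by half a cell to get the cell center\n",
+    "lat_c, lon_c = hf.to_latlon(\"conus1\", 10.5, 20.5)\n",
+    "print(lat_c, lon_c)\n",
+    "```"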
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "13fbc81f",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Compute grid cell centers\n",
+    "ii_center = ii + 0.5\n",
+    "jj_center = jj + 0.5\n",
+    "\n",
+    "# Convert to lat/lon (hf.to_latlon handles one point at a time,\n",
+    "# so loop over every cell in the domain)\n",
+    "lat = np.zeros(mask_shape)\n",
+    "lon = np.zeros(mask_shape)\n",
+    "\n",
+    "for r in range(mask_shape[0]):\n",
+    "    for c in range(mask_shape[1]):\n",
+    "        lat[r, c], lon[r, c] = hf.to_latlon(\"conus1\",\n",
+    "                                            ii_center[r, c],\n",
+    "                                            jj_center[r, c])"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "aa20e59f",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Save 2D arrays of Lat & Lon\n",
+    "np.save(f\"{domain_data_path}{huc_8_name}_{huc_8_code}_lat_2d.npy\", lat)\n",
+    "np.save(f\"{domain_data_path}{huc_8_name}_{huc_8_code}_lon_2d.npy\", lon)\n",
+    "\n",
+    "# Save the Lat & Lon in a df (as CSV) matched with the ParFlow-CONUS i,j indices\n",
+    "grid_df = pd.DataFrame({\n",
+    "    \"i\": ii.ravel(),\n",
+    "    \"j\": jj.ravel(),\n",
+    "    \"lat\": lat.ravel(),\n",
+    "    \"lon\": lon.ravel(),\n",
+    "})\n",
+    "grid_df.to_csv(f\"{domain_data_path}df_{huc_8_name}_{huc_8_code}_conus1_gridPoints_LatLon.csv\", index=False)\n",
+    "\n",
+    "# Build a GeoDataFrame of the grid points\n",
+    "grid_gdf = gpd.GeoDataFrame(\n",
+    "    grid_df,\n",
+    "    geometry=gpd.points_from_xy(grid_df.lon, grid_df.lat),\n",
+    "    crs=\"EPSG:4326\"\n",
+    ")\n",
+    "# Save the grid points to a shapefile for later use\n",
+    "grid_gdf.to_file(f\"{domain_data_path}{huc_8_name}_{huc_8_code}_conus1_gridPoints_LatLon.shp\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "41e8d63a",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "grid_df"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "84549c32",
+   "metadata": {},
+   "source": [
+    "### 3b. Explore the available SWE data in a watershed"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e088705e",
+   "metadata": {},
+   "source": [
+    "📖 **Did you know?**\n",
+    "\n",
+    "The Snow Telemetry (SNOTEL) network, managed by the USDA Natural Resources Conservation Service (NRCS), monitors snowpack conditions across key watersheds in the western United States to support water supply forecasting and climate monitoring. SNOTEL sites are fully automated stations that continuously measure snow water equivalent (SWE), snow depth, precipitation, temperature, and other meteorological variables throughout the year. Unlike manual snow survey programs, SNOTEL provides high-temporal-resolution observations that enable near-real-time assessment of snowpack evolution and interannual variability. These data are widely used for operational forecasting, drought assessment, and long-term climate analysis."
" + ] + }, + { + "cell_type": "markdown", + "id": "de83d6b6", + "metadata": {}, + "source": [ + "Explore what SWE data is available at sites within the HUC ID you specified that operated during WY2004 and WY2005. If you want to check other variables besides SWE, you can change the `variable` argument name. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3aa8210e", + "metadata": {}, + "outputs": [], + "source": [ + "avail_df = hf.get_site_variables(variable = \"swe\",\n", + " huc_id = [huc_8_code], grid = 'conus1',\n", + " date_start = StartDate, date_end = EndDate)\n", + "\n", + "# View first five records\n", + "avail_df.head(5)" + ] + }, + { + "cell_type": "markdown", + "id": "268915dc", + "metadata": {}, + "source": [ + "### 3c. Map the SNOTEL stations inside the HUC-08 watershed that have available data in the selected time range \n", + "To note here, we are using pre-loaded shape files for the East-Taylor HUC8, which are located in the `/domain_data/` directory." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f5c95f67", + "metadata": {}, + "outputs": [], + "source": [ + "### Select station locations that fall within the HUC8 watershed\n", + "\n", + "# Path to the watershed shapefile that was just created\n", + "watershed = f'{domain_data_path}East-Taylor_14020001.shp'\n", + "watershed_gdf = gpd.read_file(watershed).to_crs(epsg=4326)\n", + "\n", + "# Create GeoDataFrame of all available stations\n", + "filtered_all_stations_gdf = gpd.GeoDataFrame(\n", + " avail_df,\n", + " geometry=gpd.points_from_xy(\n", + " avail_df.longitude,\n", + " avail_df.latitude\n", + " ),\n", + " crs=\"EPSG:4326\"\n", + ")\n", + "print(\"Sites CRS:\", filtered_all_stations_gdf.crs)\n", + "\n", + "# Combine watershed polygons into one geometry\n", + "watershed_union = watershed_gdf.geometry.unary_union\n", + "\n", + "# Filter stations that fall within the watershed\n", + "sites_in_watershed = filtered_all_stations_gdf[\n", + " filtered_all_stations_gdf.geometry.within(watershed_union)\n", + "].copy()\n", + "\n", + "sites_in_watershed.reset_index(drop=True, inplace=True)\n", + "\n", + "print(f\"Total sites in watershed: {len(sites_in_watershed)}\")\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "id": "06a6b39b", + "metadata": {}, + "source": [ + "Plot these sites on a map. Then, hover over the pins to see the site names." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a1e3bc39", + "metadata": {}, + "outputs": [], + "source": [ + "## TODO: REPLACE WITH CSSI_EVALUATION.PLOTS FUNCTIONS\n", + "\n", + "# this may take a moment to load, but it should pop up in a new window\n", + "m = plot_utils.plot_sites_within_domain(sites_in_watershed, watershed_gdf, zoom_start=9)\n", + "m" + ] + }, + { + "cell_type": "markdown", + "id": "354fc021", + "metadata": {}, + "source": [ + "## 4. Retrieve SNOTEL point observations and metadata from HydroData \n", + "Use the `hf.get_point_data()` function to retrieve daily, start-of-day SWE from SNOTEL sites:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d74eeccb", + "metadata": {}, + "outputs": [], + "source": [ + "# Create a folder to save observations\n", + "isExist = os.path.exists(OBS_OutputFolder)\n", + "if isExist == True:\n", + " exit\n", + "else:\n", + " os.mkdir(OBS_OutputFolder)" + ] + }, + { + "cell_type": "markdown", + "id": "b1805ac2", + "metadata": {}, + "source": [ + "### 4a. 
+    "### 4a. Get HydroData Observed SWE\n",
+    "Gather the SNOTEL data for all stations within the watershed and save it to CSV:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "f0f2beb6",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Request point observations data\n",
+    "data_df = hf.get_point_data(dataset=\"snotel\", variable=\"swe\", temporal_resolution=\"daily\", aggregation=\"sod\",\n",
+    "                          date_start=StartDate, date_end=EndDate,\n",
+    "                          huc_id=[huc_8_code], grid='conus1')\n",
+    "\n",
+    "# save\n",
+    "data_df.to_csv(f'./{OBS_OutputFolder}/df_{huc_8_name}_{huc_8_code}_SNOTEL.csv', index=False)\n",
+    "\n",
+    "# Ensure date column is datetime\n",
+    "data_df[\"date\"] = pd.to_datetime(data_df[\"date\"])\n",
+    "\n",
+    "# View first five records\n",
+    "data_df.head(5)"
+   ]
+  },
  {
   "cell_type": "markdown",
+   "id": "fb365fbf",
   "metadata": {},
   "source": [
-    "## 2. Build the Input DataFrame\n",
+    "### 4b. Get Metadata for HydroData Observed SWE\n",
+    "Also, retrieve the metadata for the same stations we retrieved SWE observations for:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "1cb40a7c",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Request site-level attributes for these sites\n",
+    "metadata_df = hf.get_point_metadata(dataset=\"snotel\", variable=\"swe\", temporal_resolution=\"daily\", aggregation=\"sod\",\n",
+    "                                  date_start=StartDate, date_end=EndDate,\n",
+    "                                  huc_id=[huc_8_code], grid='conus1')\n",
+    "\n",
+    "# save\n",
+    "metadata_df.to_csv(f'./{OBS_OutputFolder}/df_{huc_8_name}_{huc_8_code}_SNOTEL_metadata.csv', index=False)\n",
    "\n",
-    "This workflow requires us to organize our observed and modeled snow water equivilent into a single Pandas DataFrame. We'll do this by reading the CSVs created in the `01_HydroData_collection.ipynb` notebook. \n",
    "\n",
-    "**Quick Reminder:** Observed SWE is derived from SNOTEL measurements, while modeled SWE is extracted from ParFlow-CONUS at corresponding station locations. All time series here are evaluated at a daily resolution.\n"
+    "# View first five records\n",
+    "metadata_df.head(5)"
   ]
  },
  {
   "cell_type": "markdown",
+   "id": "d17b371a",
   "metadata": {},
   "source": [
+    "The metadata file is an important companion to the observations, and we recommend always gathering and saving it for the observations you use (particularly to support reproducibility within an open-science workflow). The saved file has useful attributes like site names, first and last date of available data, lat/lon, and the query URL. \n",
+    "\n",
+    "Additionally, the metadata contains **ParFlow-CONUS1 and ParFlow-CONUS2 `i,j` indices, which indicate the exact model domain grid cell each observation aligns with**. This is a useful HydroData feature: it removes the need to manually match station latitude/longitude coordinates to the appropriate model grid cell, since this spatial mapping is handled directly within HydroData. We will use these indices in the next section to extract PF-CONUS1 modeled SWE at each SNOTEL station."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0e50455e",
+   "metadata": {},
+   "source": [
+    "## 5. Retrieve ParFlow-CONUS1 Modeled Snow Data"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "545a9d22",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Create a folder to save results (no-op if it already exists)\n",
+    "os.makedirs(MOD_OutputFolder, exist_ok=True)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "56eb4bb4",
+   "metadata": {},
+   "source": [
+    "The following section retrieves ParFlow-CONUS1 data for each SNOTEL site within our HUC-08 watershed. The code identifies the CONUS1 `i,j` indices associated with each SNOTEL site, as recorded in `metadata_df`. It then extracts the CONUS1 modeled SWE output for each site and the period of interest, returning the result as a DataFrame. To compare fairly with SNOTEL, which reports SWE once daily at the start of the local day, model output is aggregated by day using the argument `\"temporal_resolution\": \"daily\"`. Finally, the processed data is saved as a CSV file for each site. \n",
+    "\n",
+    "### 5a. ParFlow CONUS1 Model Dataset Information\n",
+    "We can print some information about the model output dataset by using `hf.get_catalog_entry()` to get the CONUS1 model dataset metadata."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "10647da1",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "conus1_options = {\n",
+    "    \"dataset\": \"conus1_baseline_mod\",\n",
+    "    \"variable\": \"swe\"\n",
+    "}\n",
+    "hf.get_catalog_entry(conus1_options)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c6fd1306",
+   "metadata": {},
+   "source": [
+    "Before we gather model outputs at the specific SNOTEL sites, we can visualize SWE across our HUC-08. This is plotted for one day at 1 km lateral resolution."
+   ]
+  },
  {
   "cell_type": "code",
   "execution_count": null,
+   "id": "ba48a33a",
   "metadata": {},
   "outputs": [],
   "source": [
-    "# Start and end times of a water year (note that this code currently works for one water year)\n",
-    "StartDate = '2018-10-01'\n",
-    "EndDate = '2020-09-30'"
+    "# retrieve gridded PF-CONUS1 SWE for the entire HUC8 watershed\n",
+    "grid_swe_options = {\n",
+    "    \"dataset\": \"conus1_baseline_mod\",\n",
+    "    \"variable\": \"swe\",\n",
+    "    \"temporal_resolution\": \"daily\",\n",
+    "    # NOTE: get_gridded_data treats end_time as exclusive, so this requests a single day\n",
+    "    \"start_time\": '2004-04-01',\n",
+    "    \"end_time\": '2004-04-02',\n",
+    "    \"huc_id\": huc_8_code\n",
+    "}\n",
+    "\n",
+    "# Get gridded data\n",
+    "grid_swe = hf.get_gridded_data(grid_swe_options)"
   ]
  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b82b9574",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "grid_swe_map = xr.DataArray(grid_swe[0], dims=(\"y\", \"x\"), name=\"SWE\")\n",
+    "grid_swe_map.hvplot.image(cmap=\"YlGnBu\", colorbar=True, aspect=\"equal\", title=f\"{huc_8_name} Gridded SWE on 2004-04-01\")\n"
+   ]
+  },
  {
   "cell_type": "markdown",
-   "id": "2fe4466c",
+   "id": "73a13787",
   "metadata": {},
   "source": [
-    "Read in existing CSV files of observations and model outputs for one HUC-08"
+    "Now, extract the PF-CONUS1 modeled SWE at the SNOTEL site locations, using the CONUS1 `i` and `j` indices from `metadata_df` to select the corresponding grid cells.",
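+    "\n",
+    "HydroData supplies these indices for us, but if a point dataset ever lacked them, they could be recovered from a station's lat/lon. A minimal sketch, assuming `hf.from_latlon()` mirrors `hf.to_latlon()` by returning fractional CONUS1 grid coordinates (the lat/lon pair here is just an illustrative point inside the domain):\n",
+    "\n",
+    "```python\n",
+    "# Hypothetical inverse lookup: fractional grid coordinates -> enclosing cell indices\n",
+    "x, y = hf.from_latlon(\"conus1\", 38.82, -106.96)\n",
+    "conus_i, conus_j = int(x), int(y)\n",
+    "print(conus_i, conus_j)\n",
+    "```"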
" ] }, { "cell_type": "code", "execution_count": null, + "id": "17143151", "metadata": {}, "outputs": [], "source": [ - "# Observations \n", - "obs_df = pd.read_csv(os.path.join('obs_outputs_PF', 'df_East-Taylor_14020001_SNOTEL.csv'))\n", - "obs_df = obs_df.set_index(\"date\")\n", + "# Copy data_df to model_df so we have the same timestamps and site_id structure\n", + "model_df = data_df.copy()\n", "\n", - "# Observations metadata \n", - "metadata_df = pd.read_csv(os.path.join('obs_outputs_PF', 'df_East-Taylor_14020001_SNOTEL_metadata.csv'))\n", + "# Set all non-date columns to NaN to prepare for filling in model data\n", + "non_date_cols = model_df.columns.difference([\"date\"])\n", + "model_df[non_date_cols] = np.nan\n", "\n", - "# Model outputs\n", - "mod_df = pd.read_csv(os.path.join('mod_outputs_PF', 'df_East-Taylor_14020001_PFCONUS1.csv'))\n", - "mod_df = mod_df.set_index(\"date\")" + "# Rename site_id columns for PF outputs \n", + "model_df.columns = [\n", + " col if col == \"date\" else col.replace(\":SNTL\", \"\") + \":PFCONUS1\"\n", + " for col in model_df.columns\n", + "]" + ] + }, + { + "cell_type": "markdown", + "id": "523bd35c", + "metadata": {}, + "source": [ + "Use the function `hf.get_gridded_data()` and PF-CONUS1 `i,j` indices to select the SWE output for the correct location and time period: " ] }, { "cell_type": "code", "execution_count": null, + "id": "a814204c", "metadata": {}, "outputs": [], "source": [ - "# Combine these into a single dataframe\n", - "combined_df = pd.concat([obs_df, mod_df], axis=1)\n", - "combined_df.index = pd.to_datetime(combined_df.index)\n", + "# Loop over each station in metadata_df\n", + "for idx, row in metadata_df.iterrows():\n", + " site_id = row[\"site_id\"] # original SNTL site_id\n", + " col_name = site_id.replace(\":SNTL\", \"\") + \":PFCONUS1\" # corresponding column in model_df\n", + " conus_i = int(row[\"conus1_i\"])\n", + " conus_j = int(row[\"conus1_j\"])\n", + " \n", + " # Build options dict for this station\n", + " options = {\n", + " \"dataset\": \"conus1_baseline_mod\",\n", + " \"variable\": \"swe\",\n", + " \"temporal_resolution\": \"daily\",\n", + " \"start_time\": '2003-10-01', ### TO NOTE: the gridded function has exclusive end date, so this is hardcoded for now \n", + " \"end_time\": '2005-10-01',\n", + " \"grid_point\": [conus_i, conus_j]\n", + " }\n", + " \n", + " # Get gridded data\n", + " data = hf.get_gridded_data(options)\n", + " \n", + " # Fill column in model_df\n", + " # Convert to numeric in case hf returns lists or other types\n", + " model_df[col_name] = np.squeeze(np.array(data))\n", + "\n", + "# Ensure date column is datetime\n", + "model_df[\"date\"] = pd.to_datetime(model_df[\"date\"])\n", + "\n", + "# Save\n", + "model_df.to_csv(f'./{MOD_OutputFolder}/df_{huc_8_name}_{huc_8_code}_PFCONUS1.csv', index=False)\n", + " \n", + "model_df.head(5)" + ] + }, + { + "cell_type": "markdown", + "id": "7464828b", + "metadata": {}, + "source": [ + "## 6. Quick plot sanity check \n", + "Plot a simple timeseries of modeled and observed SWE to make sure our data retrieval was successful. 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "fbe43f6a", + "metadata": {}, + "outputs": [], + "source": [ + "fig, ax = plt.subplots(figsize=(10, 4))\n", + "\n", + "ax.plot(data_df[\"date\"], model_df[\"380:CO:PFCONUS1\"], label=\"Modeled\", linewidth=2)\n", + "\n", + "ax.plot(data_df[\"date\"], data_df[\"380:CO:SNTL\"], label=\"Observed\", linewidth=2)\n", + "\n", + "ax.set_xlabel(\"Date\")\n", + "ax.set_ylabel(\"SWE (mm)\")\n", + "\n", + "# Date formatting for x-axis\n", + "ax.xaxis.set_major_locator(mdates.MonthLocator(interval=3))\n", + "ax.xaxis.set_major_formatter(mdates.DateFormatter('%m-%Y'))\n", + "\n", + "ax.legend(loc='upper left')\n", + "ax.grid(True, alpha=0.3)\n", "\n", - "combined_df" + "plt.tight_layout()" + ] + }, + { + "cell_type": "markdown", + "id": "da3df109", + "metadata": {}, + "source": [ + "# Start of comparison from old notebook" ] }, { diff --git a/examples/parflow/03_watershed_scale_comparison_ParFlow.ipynb b/examples/parflow/parflow_swe_watershed_scale_evaluation.ipynb similarity index 100% rename from examples/parflow/03_watershed_scale_comparison_ParFlow.ipynb rename to examples/parflow/parflow_swe_watershed_scale_evaluation.ipynb From 8b1eeeb08cd23ee5c1499679ae996b024aa8e715 Mon Sep 17 00:00:00 2001 From: danielletijerina Date: Mon, 23 Mar 2026 14:21:45 -0600 Subject: [PATCH 2/2] added DS_store to gitignore --- .gitignore | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/.gitignore b/.gitignore index 63597ef..5d23686 100644 --- a/.gitignore +++ b/.gitignore @@ -1,2 +1,3 @@ **/__pycache__/** -*.coverage \ No newline at end of file +*.coverage +*.DS_Store \ No newline at end of file