This project was done just for fun, to experiment with and highlight the benefits of Sentinel-2's rich metadata annotations. These scripts calculate and resample Sentinel-2 STAC metadata attributes onto the equal-area H3 DGGS using an area-weighted mean. This allows for easy, globally applicable, and fast visualisation of parameters such as cloud cover, without having to download any S2 tiles.
Mappable parameters include:
- cloud cover
- nodata
- snow & ice
- vegetation
- not-vegetation
- water
Note that these values are provided per S2 tile and should not be trusted for analysis, only for querying and selecting the tiles you need. Possibly relevant information, such as the tile IDs per hexagon or revisit counts, is not included in this use case, as my focus was purely on visualisation.
As this is a personal project I wanted to experiment with some features I have not used a lot before:
- **STAC metadata**: While browsing some repositories I noticed the rich metadata annotations of S2 and wanted to experiment with some visualisations. As the data is stored as metadata, and STAC catalogues can be accessed without any credentials (at least on AWS), downloading the metadata is easy and requires little storage space even for global coverage and long time spans.
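As a minimal sketch of what such a credential-free metadata query could look like (this is illustrative, not the contents of `download_stac.py`; it assumes the public AWS Earth Search endpoint and the `pystac-client` package):

```python
# Sketch of a STAC metadata-only query (assumption: AWS Earth Search v1
# endpoint and the pystac-client package; no imagery is downloaded).

def build_search_params(bbox, start, end, collection="sentinel-2-l2a"):
    """Assemble keyword arguments for a STAC item search."""
    return {
        "collections": [collection],
        "bbox": bbox,                  # (min_lon, min_lat, max_lon, max_lat)
        "datetime": f"{start}/{end}",  # ISO-8601 interval
    }

RUN_LIVE = False  # set True to actually query the API (needs network access)
if RUN_LIVE:
    from pystac_client import Client

    client = Client.open("https://earth-search.aws.element84.com/v1")
    search = client.search(**build_search_params(
        (5.0, 45.0, 10.0, 50.0), "2025-01-01", "2025-01-31"))
    for item in search.items():
        # Only the item properties are touched, e.g. the cloud-cover field.
        print(item.id, item.properties.get("eo:cloud_cover"))
```

Since only item properties are fetched, a query like this stays small even over large areas and long time spans.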
- **H3 DGGS**: Since stumbling into the DGGS session at LPS, I was intrigued by the presented solutions for equal-area cells (key for my resampling strategy), global coverage (required for S2), and a hierarchical setup (for different resampling levels or pyramids). I decided to use the H3 library, as its Python API is relatively well maintained and easy to use (though I was again reminded to check lat/lon order, as they use lon/lat...). Using a vector-based grid complicated the resampling, but allows for neat visualisation and very efficient processing later on.
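The coordinate-order pitfall is worth illustrating. GeoJSON stores coordinates as (lon, lat), while the h3 point API takes latitude first; the helper below is a hypothetical example (assuming the h3 v4 Python API), not part of the project scripts:

```python
# Illustration of the lat/lon pitfall (hypothetical helper, not from the
# project). GeoJSON rings are (lon, lat); h3's point API wants (lat, lng).

def geojson_ring_to_latlng(ring):
    """Swap GeoJSON (lon, lat) pairs into (lat, lng) pairs for h3."""
    return [(lat, lon) for lon, lat in ring]

RUN_H3 = False  # set True if the `h3` package (v4 API) is installed
if RUN_H3:
    import h3
    ring = [(7.0, 47.0), (8.0, 47.0), (8.0, 48.0), (7.0, 48.0)]  # lon, lat
    lat, lng = geojson_ring_to_latlng(ring)[0]
    cell = h3.latlng_to_cell(lat, lng, 3)  # note: latitude first here
    print(cell, h3.cell_area(cell, unit="km^2"))
```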
- **Area-weighting**: The mentioned metadata is provided as a single value per tile, and thus needs to be aggregated or resampled onto the target grid. As some tiles cover smaller or larger proportions of a grid cell, I decided to implement a proportional area-based weighting to both simplify and unify the resampling process.
- **Zarr**: While I prefer GPKGs/COGs over Zarr for geospatial data (there is no standard yet), I wanted to experiment with it and see how it performs in this scenario. Converting the geopackage into an xarray and storing the geometries in a separate geopackage seemed unintuitive, but actually worked out great. The dimensionality reduction when transforming the metadata into an xarray saved a tremendous amount of storage (the time index (one per day) and the H3 index can be saved as simple arrays), but re-merging it with the H3 cell geometries feels a bit redundant. I thought about using xvec or manually encoding the polygon geometry as WKB into the zarr output, but ultimately that was too much work compared to the simpler solution implemented here.
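The dimensionality reduction can be sketched as pivoting long-format records into a dense 2-D array (names and cell IDs below are assumptions for illustration, not the project's actual schema):

```python
# Sketch of the dimensionality reduction: long-format (day, h3_cell, value)
# records are pivoted into a dense 2-D array indexed by time and H3 cell,
# which is what makes the zarr so compact. Cell geometries live in a
# separate geopackage keyed by the same H3 IDs.
import numpy as np

records = [
    ("2025-01-01", "831f8dfffffffff", 12.5),
    ("2025-01-01", "831f89fffffffff", 80.0),
    ("2025-01-02", "831f8dfffffffff", 3.0),
]

times = sorted({t for t, _, _ in records})
cells = sorted({c for _, c, _ in records})
t_idx = {t: i for i, t in enumerate(times)}
c_idx = {c: i for i, c in enumerate(cells)}

grid = np.full((len(times), len(cells)), np.nan)  # (time, h3) array
for t, c, v in records:
    grid[t_idx[t], c_idx[c]] = v

# `grid` plus the two coordinate arrays is all an xarray Dataset (and thus
# a zarr store) needs, e.g.:
#   xr.Dataset({"cloud_cover": (("time", "h3"), grid)},
#              coords={"time": times, "h3": cells}).to_zarr("metadata.zarr")
print(grid.shape)  # → (2, 2)
```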
If you want to reproduce this example or just play around with it, use the provided scripts as follows:
- **Data Download**

  `download_stac.py` allows the download of all relevant STAC metadata in your AOI (provide a GeoJSON) and timespan (iterable of date slices). Either manually or automatically split the timespan into smaller processes to allow parallel processing of your requests. Due to bandwidth constraints, expect around half an hour of download time per month of global STAC metadata.

  `simplify.py` allows you to remove unnecessary columns (depending on what you need) to save storage space and processing time later on: from 336 MB to 112 MB for the month of January alone.
- **Prepare H3 DGGS**

  Use `get_h3_cells.py` to generate all H3 cells overlapping your S2 tiles. Select a resolution level (res=3 allows good coverage and fast processing) based on this guide: https://h3geo.org/docs/core-library/restable/. This creates a GPKG containing all H3 cell geometries used for intersecting and plotting.
- **Resampling and Area-Weighting**

  `own_intersection.py` handles the resampling of the irregular S2 tiles onto the uniform H3 grid. This processing is parallelised along the time dimension and vectorised to consume as little CPU and RAM as possible. Nevertheless, it will take some time: approximately 4.8 million S2 tiles are processed for a single year. The data is subsequently saved as a geopackage (840 MB) and repackaged as a zarr (114 MB), with only the zarr provided here as a sample file for 2025.
- **Plotting**

  The resulting data can be plotted and visualised using the `plot_globe.py` script. The zarr file is read in, temporally cropped depending on user input, aggregated, and combined with the spatial H3 geometry data. The result can then be plotted on a globe and displayed as an interactive plot, or saved as a GIF (see the sample above).
This was just a fun side project, follow for possibly more :) PS: I don't know why the colorbar is shifting in the GIF.

Install the dependencies for Python 3.11 with `requirements.txt`.
