
Sentinel-2 STAC Metadata Area-Weighted Visualisation using H3 DGGS

This project was done just for fun, to experiment with and highlight the benefits of Sentinel-2's rich metadata annotations. The scripts calculate and resample Sentinel-2 STAC metadata attributes onto the equal-area H3 DGGS using an area-weighted mean. This allows for an easy, globally applicable, and fast visualisation of parameters such as cloud cover, without having to download any S2 tiles.

Mappable parameters include:

  • cloud cover
  • nodata
  • snow & ice
  • vegetation
  • not-vegetation
  • water

Note that these values are provided per S2 tile and should not be trusted for quantitative analysis, only for querying and selecting the tiles you need. Possibly relevant information, such as the tile IDs per hexagon or the number of revisits, is not included in this use case, as the focus lay purely on visualisation.


Yearly Aggregated Cloud Cover for 2025

Design Choices

As this is a personal project, I wanted to experiment with some features I had not used much before:

  • STAC metadata: While browsing some repositories I noticed the rich metadata annotations of S2 and wanted to experiment with some visualisations. As the data is stored as metadata, and STAC catalogues can be accessed without any credentials (at least on AWS), downloading the metadata is easy and requires little storage space even for global coverage and long timespans.

  • H3 DGGS: Since stumbling into the DGGS session at LPS, I have been intrigued by the presented solutions offering equal-area cells (key for my resampling strategy), global coverage (required for S2), and a hierarchical setup (for different resampling levels or pyramids). I decided to use the H3 library, as its Python API is relatively well maintained and easy to use (though I was again reminded to double-check lat/lon ordering, as coordinate order differs between libraries...). Using a vector-based grid complicated the resampling, but allows for neat visualisation and very efficient processing later on.

  • Area-weighting: The mentioned metadata is provided as a single value per tile, and thus needs to be aggregated or resampled onto the target grid. As some tiles cover smaller or larger proportions of a grid cell, I decided to implement a proportional area-based weighting to both simplify and unify the resampling process.

  • Zarr: While I prefer GPKGs/COGs over Zarr for geospatial data (there is no geospatial standard for Zarr yet), I wanted to experiment with it and see how it performs in this scenario. Converting the geopackage into an xarray and storing the geometries in a separate geopackage seemed unintuitive, but actually worked out great. The dimensionality reduction when transforming the metadata into an xarray saved a tremendous amount of storage (the time index (one step per day) and the H3 indices can be saved as simple arrays), but re-merging with the H3 cell geometries feels a bit redundant. I thought about using xvec or manually encoding the polygon geometries as WKB in the zarr output, but ultimately it was too much work compared to the simpler solution implemented here.

Workflow

If you want to reproduce this example or just play around with it, use the provided scripts accordingly:

  1. Data Download

    download_stac.py downloads all relevant STAC metadata for your AOI (provide a GeoJSON) and timespan (an iterable of date slices). Split the timespan, either manually or automatically, into smaller processes to allow parallel processing of your requests. Due to bandwidth constraints, expect around half an hour of download time per month of global STAC metadata. simplify.py lets you remove unnecessary columns (depending on what you need) to save storage space and processing time later on: from 336 MB to 112 MB for the month of January alone.
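One simple way to generate the date slices for parallel downloads is sketched below. The (start, end) tuple format is an assumption here; adapt it to whatever download_stac.py actually expects.

```python
# Sketch: split a timespan into monthly slices, one per download process.
import pandas as pd

def monthly_slices(start: str, end: str):
    """Yield (start, end) ISO-date pairs, one per calendar month."""
    edges = pd.date_range(start, end, freq="MS")  # "MS" = month starts
    for s, e in zip(edges[:-1], edges[1:]):
        # Each slice runs from the first to the last day of its month.
        yield s.strftime("%Y-%m-%d"), (e - pd.Timedelta(days=1)).strftime("%Y-%m-%d")

slices = list(monthly_slices("2025-01-01", "2025-04-01"))
print(slices)  # [('2025-01-01', '2025-01-31'), ('2025-02-01', '2025-02-28'), ...]
```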

  2. Prepare H3 DGGS

    Use get_h3_cells.py to generate all H3 cells overlapping your S2 tiles. Select a resolution level (res=3 offers good coverage and fast processing) based on this guide: https://h3geo.org/docs/core-library/restable/. This creates a GPKG containing all H3 cell geometries used for intersecting and plotting.

  3. Resampling and Area-Weighting

    own_intersection.py handles the resampling of the irregular S2 tiles onto the uniform H3 grid. This processing is parallelised along the time dimension and vectorised to consume as little CPU and RAM as possible. Nevertheless, the process takes some time, with approximately 4.8 million S2 tiles being processed for a single year. The data is subsequently saved as a geopackage (840 MB) and repackaged as a zarr (114 MB); only the zarr is provided here, as a sample file for 2025.
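The area-weighted mean for one time step can be sketched with GeoPandas. The geometries, values, and column names below are toy assumptions, and a real run would compute the intersection areas in an equal-area projection rather than in planar degrees.

```python
# Toy example: two tile footprints intersected with two grid cells, with each
# intersection's cloud-cover value weighted by its intersection area.
import geopandas as gpd
from shapely.geometry import box

tiles = gpd.GeoDataFrame(
    {"cloud_cover": [80.0, 20.0]},
    geometry=[box(-5, -5, 5, 5), box(5, 0, 15, 10)],
)
cells = gpd.GeoDataFrame(
    {"h3_cell": ["a", "b"]},
    geometry=[box(0, 0, 10, 10), box(10, 0, 20, 10)],
)

# Vectorised intersection of every tile with every overlapping cell.
inter = gpd.overlay(tiles, cells, how="intersection")
inter["w"] = inter.area  # planar area; use an equal-area CRS for real data

# Weighted mean per cell: sum(w * value) / sum(w).
agg = inter.assign(wv=inter["w"] * inter["cloud_cover"]).groupby("h3_cell")[["wv", "w"]].sum()
agg["cloud_cover"] = agg["wv"] / agg["w"]
print(agg["cloud_cover"])  # cell "a": 40.0, cell "b": 20.0
```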

  4. Plotting

    The resulting data can be plotted and visualised using the plot_globe.py script. The zarr file is read in, temporally cropped depending on user input, aggregated, and combined with the spatial H3 geometry data. The result can then be plotted on a globe and displayed as an interactive plot, or saved as a GIF (see the sample above).
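The read-crop-aggregate step can be sketched as below. The variable and coordinate names mirror the layout described earlier but are assumptions, and a small random dataset stands in for the provided 2025 sample zarr.

```python
# Sketch of the temporal crop and aggregation that precede plotting.
import numpy as np
import pandas as pd
import xarray as xr

# Stand-in for the sample zarr: one year of (time, h3_cell) cloud cover.
ds = xr.Dataset(
    {"cloud_cover": (("time", "h3_cell"),
                     np.random.default_rng(1).uniform(0, 100, (365, 4)))},
    coords={"time": pd.date_range("2025-01-01", periods=365), "h3_cell": list("abcd")},
)

# Crop to a user-chosen window, then aggregate over time per H3 cell.
summer = ds.sel(time=slice("2025-06-01", "2025-08-31"))
mean_cc = summer["cloud_cover"].mean("time")
print(mean_cc.values.round(1))
```

The resulting per-cell means are what gets joined with the H3 cell geometries and rendered on the globe.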


Notes

This was just a fun side project, follow for possibly more :) PS: I don't know why the colorbar is shifting in the GIF.

Install the dependencies for Python 3.11 with requirements.txt.
