[BUG] String and Char coordinates incorrectly written as variables with wrong dtype #172

@rhaegar325

Describe the bug

When writing datasets with string/character coordinates (e.g., the type coordinate in land variables), the current write() function has two issues:

  1. String coordinates become variables: Coordinates like type are written as data variables instead of remaining as coordinates
  2. Incorrect dtype: The dtype changes from |S11 (byte string) to <U11 (Unicode string), which is not CMIP6 compliant

Expected Behavior

String coordinates should:

  • Remain as coordinates (appear in ds.coords, not ds.data_vars)
  • Use CF-compliant character encoding with dtype='|S1' and a strlen dimension
  • Match the encoding of standard CMIP6 datasets

Expected encoding:

{
    'dtype': dtype('S1'),
    'char_dim_name': 'type_strlen',
    'original_shape': (11,)
}
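
The expected encoding above corresponds to splitting each fixed-width byte string into single characters along a trailing strlen dimension. A minimal NumPy sketch of that conversion follows, assuming ASCII-only values; to_char_array is a hypothetical helper name, not part of the existing write() code.

```python
import numpy as np


def to_char_array(values):
    """Hypothetical helper: turn fixed-width strings into an S1 character array.

    The result has one extra trailing axis whose length is the string width,
    which is what a <name>_strlen dimension would hold on disk.
    """
    arr = np.asarray(values)
    if arr.dtype.kind not in ("S", "U"):
        raise TypeError(f"expected a string array, got dtype {arr.dtype}")
    # Assumption: values are ASCII, so Unicode converts losslessly to bytes.
    fixed = arr.astype("S")
    # Add a length-1 trailing axis, then view each S<n> item as n S1 items.
    return np.ascontiguousarray(fixed).reshape(fixed.shape + (1,)).view("S1")


# A scalar coordinate such as type = b'bare_ground' becomes 11 single characters.
chars = to_char_array(np.array(b"bare_ground", dtype="S11"))
print(chars.dtype, chars.shape)
```

For a scalar coordinate the character array has shape (11,), matching the original_shape entry in the expected encoding.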

Actual Behavior (Before Fix)

Before writing:

ds.coords:
    type     |S11 11B b'bare_ground'  # Coordinate

After writing and re-reading:

ds.data_vars:
    type     <U11 'bare_ground'  # ❌ Now a variable, wrong dtype

ds.coords:
    # ❌ type is missing here
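
A round-trip check along the following lines could flag both symptoms after re-reading a file. This is only a sketch: it assumes an xarray-like object exposing coords and data_vars mappings, and string_coord_problems is a hypothetical name. The stand-in objects at the end mimic the before and after states shown above without touching a file.

```python
from types import SimpleNamespace

import numpy as np


def string_coord_problems(ds, name):
    """Hypothetical check: describe what is wrong with a string coordinate."""
    problems = []
    if name in ds.data_vars:
        problems.append(f"{name!r} is a data variable, not a coordinate")
        var = ds.data_vars[name]
    elif name in ds.coords:
        var = ds.coords[name]
    else:
        return [f"{name!r} is missing from the dataset"]
    # Byte strings have dtype kind "S"; Unicode strings have kind "U".
    if var.dtype.kind != "S":
        problems.append(f"{name!r} has dtype {var.dtype}, expected a byte string")
    return problems


# Stand-ins for the dataset before writing and after re-reading.
before = SimpleNamespace(coords={"type": np.array(b"bare_ground")}, data_vars={})
after = SimpleNamespace(coords={}, data_vars={"type": np.array("bare_ground")})

print(string_coord_problems(before, "type"))
print(string_coord_problems(after, "type"))
```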

Root Cause

The write() function uses the netCDF4 library directly but doesn't properly handle string coordinates:

  1. Doesn't distinguish between string coordinates and regular variables
  2. Doesn't apply CF-compliant character array encoding (S1 + strlen dimension)
  3. Doesn't add string coordinates to the main variable's coordinates attribute (required for auxiliary/scalar coordinates per CF conventions)
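
The three gaps above suggest a planning step before any netCDF4 calls are made. The sketch below is library-agnostic and not the actual fix: plan_string_coords and its input layout (name mapped to a (dims, values) pair) are assumptions made for illustration. It separates string coordinates from numeric ones, records the S1 dtype and strlen dimension each would need, and builds the value for the main variable's coordinates attribute.

```python
import numpy as np


def plan_string_coords(coords):
    """Hypothetical planner for writing string coordinates.

    coords maps a coordinate name to a (dims, values) pair.
    Returns (plan, coordinates_attr).
    """
    plan = {}
    aux_names = []
    for name, (dims, values) in coords.items():
        arr = np.asarray(values)
        if arr.dtype.kind not in ("S", "U"):
            continue  # numeric coordinates keep their usual code path
        # Assumption: ASCII-only values, so the byte width is the string length.
        strlen = arr.astype("S").dtype.itemsize
        plan[name] = {
            "dtype": "S1",
            "dims": tuple(dims) + (f"{name}_strlen",),
            "strlen": strlen,
        }
        # String-valued coordinates are auxiliary coordinates in CF terms,
        # so they are listed in the main variable's coordinates attribute.
        aux_names.append(name)
    return plan, " ".join(aux_names)


coords = {
    "type": ((), np.array(b"bare_ground")),
    "lat": (("lat",), np.array([0.0, 1.0])),
}
plan, coordinates_attr = plan_string_coords(coords)
print(plan, coordinates_attr)
```

Each plan entry would then map onto a createDimension call for the strlen dimension and a createVariable call with datatype 'S1', with coordinates_attr set on the main variable.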
