Chardata plus encoded datasets by pp-mo · Pull Request #6898 · SciTools/iris

pp-mo · 2026-01-19T13:49:34Z

Closes #6309 + various

Successor to #6850
now incorporating #6851

+ now integrated into netcdf load+save, which use the encoded datasets
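Here is a minimal sketch of the idea behind encoded character data. It assumes a netCDF char variable stores each string as a fixed-width, null-padded row of bytes in a known encoding (netCDF4-python, for example, takes the encoding from an `_Encoding` attribute). The helper names `encode_to_char_rows` and `decode_char_rows` are hypothetical, not Iris API. The string-length dimension counts bytes, not characters, so an over-long string has to be rejected (or truncated) on write. Stripping trailing zero bytes is only safe for ASCII/UTF-8, where a zero byte can only mean U+0000.

```python
def encode_to_char_rows(strings, encoding, width):
    """Hypothetical helper: encode strings as fixed-width, null-padded byte rows.

    `width` is the length of the char (string-length) dimension, in bytes.
    """
    rows = []
    for text in strings:
        data = text.encode(encoding)
        if len(data) > width:
            # Reject overlength writes instead of silently truncating them.
            raise ValueError(
                f"{text!r} needs {len(data)} bytes as {encoding!r}, "
                f"but the string dimension is only {width} wide"
            )
        rows.append(data.ljust(width, b"\x00"))
    return rows


def decode_char_rows(rows, encoding):
    """Hypothetical helper: inverse of encode_to_char_rows.

    Strips the trailing null padding before decoding, which is only safe for
    encodings such as ASCII or UTF-8 where a zero byte always means U+0000.
    """
    return [row.rstrip(b"\x00").decode(encoding) for row in rows]


rows = encode_to_char_rows(["abc", "é"], "utf-8", width=4)
# "é" is one character but two UTF-8 bytes: b"\xc3\xa9\x00\x00"
print(rows)
print(decode_char_rows(rows, "utf-8"))
```

With `width=4`, the string "abcde" would raise a `ValueError`. The multi-byte case only comes out right because the check uses the encoded byte length rather than the character count.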

ukmo-ccbunney

Just one comment at this time.

ukmo-ccbunney · 2026-01-30T10:56:41Z

lib/iris/fileformats/netcdf/_bytecoding_datasets.py

+        encoding = self.read_encoding
+        if "utf-16" in encoding:
+            # Each char needs at least 2 bytes -- including a terminator char
+            strlen = (strlen // 2) - 1


Do we really need to account for a terminating char on "utf-32" and "utf-16" encodings?
When writing to a netCDF file, surely the terminator isn't written? This is just something that is used when storing strings in memory, is it not?

OK - this looks to be the case. Certainly encoding a byte string to "utf-16" or "utf-32" does appear to add an extra null terminator...

And, from my experiments, omitting the extra byte breaks a reverse 'decode' operation.
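For reference on the thread above: in CPython, the "utf-16" and "utf-32" codecs add a byte order mark (BOM) to the start of the output. They do not add a trailing terminator. The BOM still takes up one character's worth of bytes, which is what the `- 1` in `(strlen // 2) - 1` reserves. Decoding with the same codec consumes the BOM again.

The sketch below checks this. `max_chars` is a hypothetical helper, and its "utf-32" branch is an assumed analogue of the quoted "utf-16" rule, not code from the PR.

```python
import codecs


def max_chars(n_bytes: int, encoding: str) -> int:
    """Hypothetical helper: most characters that fit in n_bytes, BOM included.

    Mirrors the quoted "utf-16" rule; the "utf-32" branch is an assumption.
    """
    if "utf-16" in encoding:
        # 2 bytes per BMP character (surrogate pairs need 4), minus the BOM slot.
        return n_bytes // 2 - 1
    if "utf-32" in encoding:
        # 4 bytes per character, minus the BOM slot.
        return n_bytes // 4 - 1
    raise ValueError(f"sketch only handles utf-16/utf-32, not {encoding!r}")


text = "abc"
for enc, unit, boms in (
    ("utf-16", 2, (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)),
    ("utf-32", 4, (codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)),
):
    encoded = text.encode(enc)
    assert encoded[:unit] in boms  # the extra bytes come first: a BOM
    assert len(encoded) == unit * (len(text) + 1)
    assert max_chars(len(encoded), enc) == len(text)
    assert encoded.decode(enc) == text  # decoding consumes the BOM again

# The explicit-endian codecs write no BOM at all.
assert "abc".encode("utf-16-le") == b"a\x00b\x00c\x00"
```

The substring test `"utf-16" in encoding` also matches "utf-16-le" and "utf-16-be", and in Python those codecs emit no BOM. If such names can reach that code, the rule would reserve one character more than needed.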

pp-mo · 2026-03-06T10:37:58Z

Update

Merged from main to unblock CI testing.

pp-mo · 2026-03-09T10:10:43Z

Status Update 2026-03-06

See #6919 (comment)

scitools-ci

Templating

This PR includes changes that may be worth sharing via templating. For each file listed below, please either:

Action the suggestion via a pull request editing/adding the relevant file in the SciTools/.github templates/ directory. ¹
Raise an issue against the SciTools/.github repo for the above action if you really don't have 10mins spare right now. Include an assignee, to avoid it being forgotten.
Dismiss the suggestion if the changes are not suitable for templating.

You will need to dismiss this review before this PR can be merged. Recommend the reviewer does this as their final action before merging, as this text will continually update as commits come in.

Templated files

The following changed files are templated:

.pre-commit-config.yaml, templated by SciTools/.github/templates/.pre-commit-config.yaml

¹ Include this text in the PR body to avoid any notifications about applying the template changes back to the source repo!
@scitools-templating: please no update notification on: iris

codecov · 2026-03-09T17:31:43Z

Codecov Report

❌ Patch coverage is 94.68439% with 16 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.18%. Comparing base (043b0bc) to head (0907fe8).

Files with missing lines	Patch %	Lines
...ib/iris/fileformats/netcdf/_bytecoding_datasets.py	93.92%	7 Missing and 4 partials ⚠️
lib/iris/fileformats/netcdf/_thread_safe_nc.py	71.42%	4 Missing ⚠️
lib/iris/fileformats/netcdf/saver.py	98.75%	0 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #6898      +/-   ##
==========================================
+ Coverage   90.11%   90.18%   +0.07%     
==========================================
  Files          91       92       +1     
  Lines       24912    25075     +163     
  Branches     4675     4688      +13     
==========================================
+ Hits        22449    22615     +166     
- Misses       1684     1685       +1     
+ Partials      779      775       -4
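As a consistency check, both project figures can be reproduced from the table above. This assumes Codecov's documented ratio of hits / (hits + misses + partials) and its default two-decimal `round: down` setting. Ordinary rounding would give 90.19% for the head, so the reported 90.18% reflects truncation. The per-file patch counts also add up: (7 + 4) + 4 + (0 + 1) = 16 lines. `coverage_percent` is a hypothetical helper.

```python
from decimal import ROUND_DOWN, Decimal


def coverage_percent(hits: int, misses: int, partials: int) -> Decimal:
    """Hypothetical helper: hits / (hits + misses + partials), as a percentage
    truncated (not rounded) to two decimal places."""
    total = hits + misses + partials
    percent = Decimal(hits) * 100 / Decimal(total)
    return percent.quantize(Decimal("0.01"), rounding=ROUND_DOWN)


main = coverage_percent(hits=22449, misses=1684, partials=779)  # 24912 lines
head = coverage_percent(hits=22615, misses=1685, partials=775)  # 25075 lines
print(main, head, head - main)  # 90.11 90.18 0.07
```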

pp-mo added 28 commits January 19, 2026 11:49

Initial tests.

041af2d

Get 'create_cf_data_variable' to call 'create_generic_cf_array_var': Mostly working?

65bd9dd

Reinstate decode on load, now in-Iris coded.

d75a7a7

Revert and amend.

07efc06

Hack to preserve the existing order of attributes on saved Coords and Cubes.

2321077

Fix for dataless; avoid FUTURE global state change from temporary tests.

0174e53

Further fix to attribute ordering.

035e28b

Fixes for data packing.

80c4776

Latest test-chararrays.

d4d3ebd

Fix search+replace error.

3f10cc1

Tiny fix in crucial place! (merge error?).

ee2fe4c

Extra mock property prevents weird test crashes.

744826d

Fix another mock problem.

a3e1217

Initial dataset wrappers.

1a4f2f2

Rename; addin parts of old investigation; add temporary notes.

Various notes, choices + changes: Beginnings of encoded-dataset testing.

0148f43

Replace use of encoding functions with test-specific function: Test for overlength writes.

20a5be2

Radically simplify 'make_bytesarray', by using a known specified byte width.

9b621bf

Add read tests.

b366fd2

Remove iris width control (not in this layer).

cf048b2

more notes

e684d1d

Merge branch 'encoded_datasets' into chardata_plus_encoded_datasets

28b124c

Remove temporary test code.

a20cc45

Use iris categorised warnings for unknown encodings.

c995a8d

Clarify the temporary load/save exercising tests (a bit).

f118c18

Use bytecoded_datasets in nc load+save, begin fixes.

c8a27df

Further attempt to satisfy warning category checker.

c4a31a4

Fix overlength error tests.

10831d7

Get temporary iris load/save exercises working (todo: proper tests).

042028e
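The commit "Use iris categorised warnings for unknown encodings." suggests a pattern along the following lines. `codecs.lookup` raises `LookupError` for names Python does not recognise, which gives a cheap validity check. This sketch rests on two assumptions. `UnknownEncodingWarning` is a hypothetical stand-in for whichever Iris warning category the PR actually uses. Falling back to "ascii" is an assumed default, not taken from the PR.

```python
import codecs
import warnings


class UnknownEncodingWarning(UserWarning):
    """Hypothetical warning category, standing in for an Iris warning class."""


def resolve_encoding(name: str, default: str = "ascii") -> str:
    """Return Python's canonical codec name, warning and falling back if unknown."""
    try:
        return codecs.lookup(name).name
    except LookupError:
        # Categorised warning, so callers can filter on the class.
        warnings.warn(
            f"Unknown encoding {name!r}, using {default!r} instead.",
            category=UnknownEncodingWarning,
            stacklevel=2,
        )
        return codecs.lookup(default).name
```

A dedicated category lets users silence or escalate just this warning with `warnings.filterwarnings(..., category=UnknownEncodingWarning)`.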

scitools-ci bot added this to 🚴 Peloton Jan 20, 2026

pp-mo mentioned this pull request Jan 20, 2026

Chardata plus #6850

Closed

ukmo-ccbunney reviewed Jan 30, 2026

pp-mo mentioned this pull request Feb 3, 2026

Fix iris handling of netcdf character array variables #6309

Open

pp-mo added 5 commits February 27, 2026 16:46

Fix mock patches.

2dbdcba

Fix patches in test_CFReader.

a34ea09

Fix variable creation in odd cases.

aa1fe03

Ignore attribute reordering in scaling-packed saves.

f5d50ee

Fix test for refactored proxy constructor.

b2c6d51

pp-mo mentioned this pull request Feb 27, 2026

Chardata plus encoded datasets pp-mo/iris#122

Closed

pp-mo added 2 commits February 27, 2026 18:56

Fix get_cf_var_data to support vlen-string.

dfd4d91

Add back new test results, folder removed in error.

274fae4

pp-mo force-pushed the chardata_plus_encoded_datasets branch from 274fae4 to 31884e9 · March 6, 2026 10:37

pp-mo force-pushed the chardata_plus_encoded_datasets branch 2 times, most recently from e328f94 to 2800dc1 · March 6, 2026 12:31

Merge branch 'latest' into chardata_plus_encoded_datasets

09137c3

pp-mo force-pushed the chardata_plus_encoded_datasets branch from 2800dc1 to 09137c3 · March 6, 2026 12:52

pp-mo mentioned this pull request Mar 6, 2026

Chardata plus encoded datasets prerebase pp-mo/iris#124

Closed

pp-mo added 2 commits March 6, 2026 17:16

Fix string-type check in cf to suit any of the new dtypes.

122dc92

Remove non-working no-unit for label variables.

0bb70e1

pp-mo force-pushed the chardata_plus_encoded_datasets branch from c4a60d5 to 0bb70e1 · March 6, 2026 17:18

Separate asserts for ruff PT018.

3c44c8b
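For context on this commit: ruff's PT018 rule (from flake8-pytest-style) flags composite assertions such as `assert a and b`, because a failure does not say which part was false. Here is a minimal before/after sketch, using a hypothetical `check_row` helper.

```python
def check_row(row: bytes, width: int) -> None:
    """Hypothetical test helper checking one fixed-width char row."""
    # Before (flagged by PT018):
    #     assert isinstance(row, bytes) and len(row) == width
    # After: one condition per assert, so a failure pinpoints the broken check.
    assert isinstance(row, bytes)
    assert len(row) == width


check_row(b"abc\x00", width=4)
```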

pp-mo added 6 commits March 9, 2026 16:24

Make encoding controls public API.

6e0b34a

Fix old label-loading tests for new chardata handling.

2ca9f6e

Review changes, stylistic only.

b81f4b5

Fix test for new dataset type.

2adf6ab

Remove obsolete not-really-a-test.

7e58f7d

Odd pre-commit fixes, and autoupdate.

0907fe8

scitools-ci bot requested changes Mar 9, 2026

pp-mo wants to merge 54 commits into SciTools:main from pp-mo:chardata_plus_encoded_datasets

2 participants
