Skip to content

Chardata plus encoded datasets#6898

Draft
pp-mo wants to merge 54 commits intoSciTools:mainfrom
pp-mo:chardata_plus_encoded_datasets
Draft

Chardata plus encoded datasets#6898
pp-mo wants to merge 54 commits intoSciTools:mainfrom
pp-mo:chardata_plus_encoded_datasets

Conversation

@pp-mo
Copy link
Member

@pp-mo pp-mo commented Jan 19, 2026

Closes #6309 + various

Successor to #6850
now incorporating #6851

+ now integrated usage with netcdf load+save, to use encoded datasets

pp-mo added 28 commits January 19, 2026 11:49
…Mostly working?

Get 'create_cf_data_variable' to call 'create_generic_cf_array_var': Mostly working?
Rename; addin parts of old investigation; add temporary notes.
@pp-mo pp-mo mentioned this pull request Jan 20, 2026
Copy link
Contributor

@ukmo-ccbunney ukmo-ccbunney left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one comment at this time.

encoding = self.read_encoding
if "utf-16" in encoding:
# Each char needs at least 2 bytes -- including a terminator char
strlen = (strlen // 2) - 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need to account for a terminating char on "utf-32" and "utf-16" encodings?
When writing to a netCDF file, surely the terminator isn't written? This is just something that is used when storing strings in memory, is it not?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK - this looks to be the case. Certainly encoding a byte string to "utf-16" or "utf-32" does appear to add an extra null terminator...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK - this looks to be the case. Certainly encoding a byte string to "utf-16" or "utf-32" does appear to add an extra null terminator...

And, from my experiments, omitting the extra byte breaks a reverse 'decode' operation.

@pp-mo pp-mo force-pushed the chardata_plus_encoded_datasets branch from 274fae4 to 31884e9 Compare March 6, 2026 10:37
@pp-mo
Copy link
Member Author

pp-mo commented Mar 6, 2026

Update

merged from main to unblock CI testing

@pp-mo pp-mo force-pushed the chardata_plus_encoded_datasets branch 2 times, most recently from e328f94 to 2800dc1 Compare March 6, 2026 12:31
@pp-mo pp-mo force-pushed the chardata_plus_encoded_datasets branch from c4a60d5 to 0bb70e1 Compare March 6, 2026 17:18
@pp-mo
Copy link
Member Author

pp-mo commented Mar 9, 2026

Status Update 2026-03-06

See #6919 (comment)

Copy link
Contributor

@scitools-ci scitools-ci bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Templating

This PR includes changes that may be worth sharing via templating. For each file listed below, please either:

  • Action the suggestion via a pull request editing/adding the relevant file in the SciTools/.github templates/ directory. 1
  • Raise an issue against the SciTools/.github repo for the above action if you really don't have 10mins spare right now. Include an assignee, to avoid it being forgotten.
  • Dismiss the suggestion if the changes are not suitable for templating.

You will need to dismiss this review before this PR can be merged. Recommend the reviewer does this as their final action before merging, as this text will continually update as commits come in.

Templated files

The following changed files are templated:

Footnotes

  1. Include this text in the PR body to avoid any notifications about applying the template changes back to the source repo!
    @scitools-templating: please no update notification on: iris

@codecov
Copy link

codecov bot commented Mar 9, 2026

Codecov Report

❌ Patch coverage is 94.68439% with 16 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.18%. Comparing base (043b0bc) to head (0907fe8).

Files with missing lines Patch % Lines
...ib/iris/fileformats/netcdf/_bytecoding_datasets.py 93.92% 7 Missing and 4 partials ⚠️
lib/iris/fileformats/netcdf/_thread_safe_nc.py 71.42% 4 Missing ⚠️
lib/iris/fileformats/netcdf/saver.py 98.75% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6898      +/-   ##
==========================================
+ Coverage   90.11%   90.18%   +0.07%     
==========================================
  Files          91       92       +1     
  Lines       24912    25075     +163     
  Branches     4675     4688      +13     
==========================================
+ Hits        22449    22615     +166     
- Misses       1684     1685       +1     
+ Partials      779      775       -4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

Fix iris handling of netcdf character array variables

2 participants