refactor: simplify internal chunk representation by d-v-b · Pull Request #3899 · zarr-developers/zarr-python

d-v-b · 2026-04-12T12:27:10Z

The addition of rectilinear chunks left us with some jank in our internal chunk normalization logic. We had a lot of redundant chunk normalization routines, and we also weren't handling user input correctly, e.g. #3898. We need some internal changes to ensure that user input is handled consistently regardless of whether we are generating regular chunks or irregular chunks. That's what this PR does. This PR also closes #3898.

I will give my summary, then a summary generated by claude.

My summary

`ChunksTuple`

This PR addresses this by introducing a canonical internal representation of the fully normalized chunk layout for an array, which is a tuple called ChunksTuple. Feel free to suggest better names.

ChunksTuple is just tuple[tuple[int, ...], ...], i.e. a representation compatible with regular or irregular chunks, but I wrap this type in NewType.

I use NewType because tuples of tuples of ints can be very easily confused with tuples of ints (regular chunks), or tuples of tuples of tuples of ints (e.g., rectilinear chunking with RLE). So I think it's helpful to be defensive here and reduce ambiguity.
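To make the NewType point concrete, here is a minimal sketch, assuming the tuple-of-tuples form described above. `count_chunks` is a hypothetical helper written for illustration, not something in zarr.

```python
from typing import NewType

# Stand-in for the brand described above: one tuple of chunk sizes per axis.
ChunksTuple = NewType("ChunksTuple", tuple[tuple[int, ...], ...])


def count_chunks(chunks: ChunksTuple) -> int:
    """Total number of chunks: the product of the per-axis chunk counts."""
    total = 1
    for axis_chunks in chunks:
        total *= len(axis_chunks)
    return total


# Regular chunking of a (10, 6) array with chunk shape (5, 3).
regular = ChunksTuple(((5, 5), (3, 3)))
# Irregular (rectilinear) chunking of the same array.
irregular = ChunksTuple(((2, 8), (1, 2, 3)))

# At runtime the brand is the identity function, so these are plain tuples.
assert regular == ((5, 5), (3, 3))
print(count_chunks(regular), count_chunks(irregular))

# A type checker would reject count_chunks((5, 3)): a flat tuple of ints
# is not a ChunksTuple, which is exactly the confusion the brand guards against.
```

The brand costs nothing at runtime; the only effect is that a static checker refuses unbranded values at call sites that expect normalized chunks.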

There are 2 functions that produce ChunksTuple:

- normalize_chunks_nd, which converts a user-friendly request for specific chunks into an explicit chunk layout
- guess_chunks, which converts a user's request for auto chunking into a specific layout. Auto chunking depends on configuration, data type, etc., so this is a separate routine.

`ResolvedChunking`

ChunksTuple is used in ResolvedChunking (bad name, I would rather use ChunkSpec but that's in use already), which is this:

class ResolvedChunking(NamedTuple):
    outer_chunks: ChunksTuple
    inner_chunks: ChunksTuple | None

ResolvedChunking is what you get when you jointly normalize the chunks and shards keyword arguments to create_array.

I introduce some new terminology here for internal purposes. outer_chunks denotes the shape of the chunks qua stored objects, and inner_chunks denotes the shape of the subchunks inside an outer chunk, if that outer chunk uses sharding. If the outer chunk doesn't use sharding, then inner_chunks is None.

These two data types are used to consolidate our chunk normalization routines.
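A rough sketch of the outer/inner split, simplified to one regular size per axis rather than a full ChunksTuple. `ResolvedSketch` and `resolve_sketch` are hypothetical names, and the divisibility check is an assumption about what a valid shard looks like, not a copy of the PR's logic.

```python
from typing import NamedTuple, Optional

Shape = tuple[int, ...]


class ResolvedSketch(NamedTuple):
    outer_chunks: Shape            # shape of the stored objects
    inner_chunks: Optional[Shape]  # sub-chunk shape, or None without sharding


def resolve_sketch(chunks: Shape, shards: Optional[Shape]) -> ResolvedSketch:
    """Jointly interpret chunks= and shards= (illustrative only)."""
    if shards is None:
        # No sharding: the user's chunks are the stored objects.
        return ResolvedSketch(outer_chunks=chunks, inner_chunks=None)
    if len(shards) != len(chunks):
        raise ValueError("shards and chunks must have the same number of dimensions")
    for shard, chunk in zip(shards, chunks):
        if shard % chunk != 0:
            raise ValueError(f"shard size {shard} is not a multiple of chunk size {chunk}")
    # Sharding: the shards are the stored objects, the chunks live inside them.
    return ResolvedSketch(outer_chunks=shards, inner_chunks=chunks)


print(resolve_sketch((10, 10), None))
print(resolve_sketch((10, 10), (20, 40)))
```

The point of the two field names is that neither one changes meaning depending on whether sharding is active.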

Claude's Summary

Refactors chunk and shard handling during array creation to fix a naming ambiguity where chunk_shape meant "outer grid partition" without sharding but "inner sub-chunk" with sharding, silently changing meaning based on context.

Introduces a three-layer architecture for chunk resolution:

Normalization — normalize_chunks_nd and guess_chunks convert raw user input into ChunksTuple, a NewType-branded tuple[tuple[int, ...], ...] that represents both regular and rectilinear chunks uniformly. This is the only boundary between untyped user input and the internal representation.
Resolution — resolve_outer_and_inner_chunks takes a ChunksTuple (the user's chunks=) and raw shard input (shards=), and returns a ResolvedChunking NamedTuple with two unambiguous fields:
- outer_chunks: ChunksTuple — chunk sizes for the chunk grid metadata (shard sizes when sharding, chunk sizes otherwise)
- inner_chunks: ChunksTuple | None — sub-chunk sizes for ShardingCodec, or None when sharding is not active
Metadata construction — create_chunk_grid_metadata takes a ChunksTuple and dispatches to RegularChunkGridMetadata or RectilinearChunkGridMetadata based on whether the chunks are uniform.

Key design decisions

ChunksTuple as a NewType: Zero runtime cost, but the type checker prevents accidentally passing raw user input where normalized chunks are expected. Both regular and rectilinear chunks use the same representation — regular is just the case where each inner tuple has uniform values.
inner_chunks: None models capability, not configuration: An unsharded chunk is opaque (read the whole thing or nothing). A shard has internal structure (an index that enables sub-chunk addressing). None means "this chunk has no internal structure" — it's not a flag you toggle, it's the absence of a capability.
normalize_chunks_nd rejects None: Top-level None means "auto" everywhere else in the codebase. Having normalize_chunks_nd silently treat it as "span all" would be a bug waiting to happen. Callers must use guess_chunks for auto-chunking.
Rectilinear shard detection absorbed into resolve_outer_and_inner_chunks: The function handles all shard input forms (None, "auto", dict, flat tuple, nested sequence) internally, eliminating the shards_for_partition / rectilinear_shard_meta dance that callers previously had to manage.
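The None-rejection and the -1 sentinel can be sketched for a single axis as follows. `normalize_axis_sketch` is a hypothetical function; repeating the chunk size ceil(extent / size) times for the uniform case is an assumption made for illustration and may not match how the PR treats a partial boundary chunk.

```python
from typing import Optional


def normalize_axis_sketch(chunk: Optional[int], extent: int) -> tuple[int, ...]:
    """Expand one per-axis chunk request into explicit chunk sizes (illustrative)."""
    if chunk is None:
        # None means "auto" elsewhere, so it is refused here instead of guessed at.
        raise ValueError("None is not a chunk size; request auto-chunking explicitly")
    if chunk == -1:
        # Sentinel: a single chunk spanning the whole axis.
        return (extent,)
    if chunk <= 0:
        raise ValueError("Chunk size must be positive")
    n_chunks = -(-extent // chunk)  # integer ceiling division
    return (chunk,) * n_chunks


print(normalize_axis_sketch(-1, 100))
print(normalize_axis_sketch(30, 100))
```

Keeping auto-chunking in a separate entry point means a caller that forgets to handle None gets an error, not a silently chosen layout.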

Changes by file

src/zarr/core/chunk_grids.py

Added SHARDED_INNER_CHUNK_MAX_BYTES constant (1 MiB) — replaces magic number used as the auto-chunking ceiling when sharding is active
Added ChunksTuple NewType — branded tuple[tuple[int, ...], ...]
Added ResolvedChunking NamedTuple — (outer_chunks, inner_chunks)
normalize_chunks_nd now returns ChunksTuple, rejects None
guess_chunks now returns ChunksTuple (normalizes via normalize_chunks_nd)
Replaced resolve_shard_shape (returned flat tuple | None) with resolve_outer_and_inner_chunks (returns ResolvedChunking, absorbs rectilinear shard detection)
Removed resolve_chunk_shape (was a lossy flattening wrapper)
Removed guess_chunks_and_shards (was dead code)

src/zarr/core/metadata/v3.py

create_chunk_grid_metadata now accepts ChunksTuple (no longer normalizes internally, no shape parameter)
is_regular_1d rewritten to short-circuit on first mismatch instead of building a full set
RST-style docstring syntax replaced with markdown

src/zarr/core/array.py

init_array: chunk/shard resolution reduced from ~50 lines of interleaved conditionals to a clean pipeline: normalize → resolve → build metadata. Variables chunk_shape_parsed, shard_shape_parsed, chunks_out, shards_for_partition, and rectilinear_shard_meta eliminated in favor of outer_chunks and inner_chunks.
_create (legacy API): same normalize-then-build pattern, consistent outer_chunks naming

tests/conftest.py

create_array_metadata updated to use resolve_outer_and_inner_chunks and create_chunk_grid_metadata instead of manually constructing grid metadata dicts

tests/test_chunk_grids.py

normalize_chunks_nd tests updated: None moved to error cases, typesize parameter removed
Tests use the new function signatures

tests/test_array.py

Shard auto-partition tests updated to use resolve_outer_and_inner_chunks
Auto-chunk-with-sharding test exercises the full pipeline (guess → resolve → verify divisibility)
Uses SHARDED_INNER_CHUNK_MAX_BYTES instead of magic 1048576

Previously rectilinear chunk grids and regular chunk grids normalized chunks inconsistently. This change ensures that chunk specifications are always normalized by the same routines in all cases. This change also ensures that chunks=(-1, ...) consistently normalizes to a full length chunk along that axis.

codecov · 2026-04-12T12:33:33Z

Codecov Report

❌ Patch coverage is 61.53846% with 35 lines in your changes missing coverage. Please review.
✅ Project coverage is 92.70%. Comparing base (9681cf9) to head (c40c5ff).

Files with missing lines	Patch %	Lines
src/zarr/core/chunk_grids.py	55.17%	26 Missing ⚠️
src/zarr/core/array.py	61.11%	7 Missing ⚠️
src/zarr/core/metadata/v3.py	86.66%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3899      +/-   ##
==========================================
- Coverage   92.98%   92.70%   -0.28%     
==========================================
  Files          87       87              
  Lines       11246    11261      +15     
==========================================
- Hits        10457    10440      -17     
- Misses        789      821      +32

Files with missing lines	Coverage Δ
src/zarr/core/metadata/v3.py	`92.39% <86.66%> (-0.38%)`	⬇️
src/zarr/core/array.py	`97.15% <61.11%> (-0.49%)`	⬇️
src/zarr/core/chunk_grids.py	`89.08% <55.17%> (-7.18%)`	⬇️

codecov · 2026-04-12T12:41:14Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.26%. Comparing base (3d354a8) to head (424439b).

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3899      +/-   ##
==========================================
+ Coverage   93.23%   93.26%   +0.03%     
==========================================
  Files          87       87              
  Lines       11695    11719      +24     
==========================================
+ Hits        10904    10930      +26     
+ Misses        791      789       -2

Files with missing lines	Coverage Δ
src/zarr/core/array.py	`97.87% <100.00%> (+0.15%)`	⬆️
src/zarr/core/chunk_grids.py	`96.81% <100.00%> (+0.54%)`	⬆️
src/zarr/core/metadata/v3.py	`94.06% <100.00%> (+0.22%)`	⬆️

... and 1 file with indirect coverage changes

maxrjones

I really like the direction of the refactor.

I found the description of the PR somewhat misleading. The bug fix (make chunk normalization properly handle -1) is totally unrelated to the addition of rectilinear chunk support; the bug report showed the issue on prior releases. The rectilinear chunk addition made the pre-existing jank related to duplicated normalization logic worse.

There are a few cases in the deprecated Array.create() method that possibly regress in this PR:

import warnings

import zarr


def _create_deprecated(**kwargs):
    """Call the deprecated Array.create(), suppressing the deprecation warning."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", DeprecationWarning)
        return zarr.Array.create(**kwargs)


def test_deprecated_underspecified_chunks_padded():
    """Fewer chunk dims than shape dims — missing dims padded from shape."""
    arr = _create_deprecated(store={}, shape=(100, 20, 10), chunks=(30,), dtype="uint8")
    assert arr.metadata.chunk_grid.chunk_shape == (30, 20, 10)


def test_deprecated_underspecified_chunks_with_none():
    """Partial chunks with None — padded from shape."""
    arr = _create_deprecated(store={}, shape=(100, 20, 10), chunks=(30, None), dtype="uint8")
    assert arr.metadata.chunk_grid.chunk_shape == (30, 20, 10)


def test_deprecated_none_per_dimension_sentinel():
    """None inside chunks tuple means 'span the full axis'."""
    arr = _create_deprecated(store={}, shape=(100, 10), chunks=(10, None), dtype="uint8")
    assert arr.metadata.chunk_grid.chunk_shape == (10, 10)

I'm not sure if these were intentional API design choices, versus quirks in the old API. It may be a good time to remove deprecated functions, as a separate PR, first to reduce the surface area for potential regressions when fixing/adding functionality to the new API.

d-v-b · 2026-04-12T19:48:31Z

> I found the description of the PR somewhat misleading. The bug fix (make chunk normalization properly handle -1) is totally unrelated to the addition of rectilinear chunk support; the bug report showed the issue on prior releases. The rectilinear chunk addition made the pre-existing jank related to duplicated normalization logic worse.

good catch, the change that broke -1 normalization was this one: #2761. We basically forked array creation routines and didn't reach feature / testing parity with the new one 🤦

I don't see value in supporting cases like this, other than backwards compatibility.

shape=(100, 20, 10), chunks=(30,)
shape=(100, 20, 10), chunks=(30, None)

Are there any non-deprecated functions that supported this?

d-v-b · 2026-04-12T19:49:32Z

> It may be a good time to remove deprecated functions, as a separate PR, first to reduce the surface area for potential regressions when fixing/adding functionality to the new API.

💯

maxrjones · 2026-04-12T20:07:30Z

> I don't see value in supporting cases like this, other than backwards compatibility. Are there any non-deprecated functions that supported this?

I couldn't find any non-deprecated cases of supporting underspecified chunks (fewer than the number of dims) or using None as a sentinel value like -1.

d-v-b · 2026-04-13T10:11:35Z

#3903 removes the deprecated methods

d-v-b · 2026-04-13T10:29:43Z

the latest changes make the representation of chunks recursive, in order to express nested sharding. This is future-proofing the design here against the possibility that we give our high-level routines a simple way to declare nested sharding.

Switch normalize_chunks_1d to return np.ndarray[tuple[int], np.dtype[np.int64]] instead of tuple[int, ...]. The uniform-chunks branch now constructs in O(1) via np.full, recovering the single-allocation fast path that regressed when the canonical ChunksTuple representation was introduced. Update create_chunk_grid_metadata in v3.py to convert arrays to tuples of ints before passing to is_regular_nd and RectilinearChunkGridMetadata, keeping those downstream functions' signatures unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The previous commit 14788aa was meant to only touch chunk_grids.py (Tasks 2+3 of the ChunksTuple → int64-array refactor). It also modified create_chunk_grid_metadata in v3.py — that change belongs to a later task with a different approach (widen annotations rather than materialize tuples) and a better perf profile. Restoring v3.py to its pre-14788aa state. The proper v3.py change will land in a follow-up commit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…unk_grid_metadata Accept ndarray[int64] in is_regular_1d/is_regular_nd alongside Sequence[int]. Cast only the first element per axis on the regular path so D ints are allocated rather than N*D. Materialize fully on the rectilinear path because _validate_chunk_shapes checks isinstance(dim_spec, int) which rejects np.int64. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…erization Use _assert_chunks_equal for the three call sites that compared a tuple of int64 arrays against a tuple of int tuples. Add three small tests asserting that normalize_chunks_1d returns a 1D int64 ndarray for uniform, explicit-list, and -1 sentinel inputs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Per-element Python iteration over a 100K-element int64 array dominated create_array runtime on the (10**8,) chunks=(1000,) regression case (~6 ms in is_regular_nd, downstream of ChunksTuple normalization). Dispatch on np.ndarray and use a single vectorized comparison instead. End-to-end create_array on the regression case: ~7.9 ms -> ~0.6 ms. Promote numpy to a runtime import (was TYPE_CHECKING-only) for the isinstance dispatch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Names the structure (a layout) rather than the operation that produced it. Reads cleanly for both sharded and unsharded cases, fits the recursive inner-layout pattern, and is what one reaches for when reading the code cold. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

codspeed-hq · 2026-04-21T06:53:04Z

Merging this PR will not alter performance

✅ 66 untouched benchmarks
⏩ 6 skipped benchmarks¹

Comparing d-v-b:refactor/simplify-internal-chunk-representation (4bc4678) with main (dd5a321)

¹ 6 benchmarks were skipped, so the baseline results were used instead.

d-v-b · 2026-05-05T15:37:16Z

@maxrjones your example from #3946 produces the following error message on this branch:

TypeError: Each chunk size must be an integer; got non-integer element(s) ([3, 3],) at indices (0,). Chunk sizes must be declared as a flat sequence of positive integers (e.g. [3, 3, 1]).
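For illustration, a check that reports offending elements together with their indices could look roughly like this. `check_flat_ints_sketch` is a hypothetical helper, and the message only imitates the wording quoted above; it is not the code on this branch.

```python
import numbers
from collections.abc import Sequence


def check_flat_ints_sketch(sizes: Sequence[object]) -> tuple[int, ...]:
    """Require a flat sequence of integers; report every offender at once."""
    bad = [
        (index, element)
        for index, element in enumerate(sizes)
        if not isinstance(element, numbers.Integral)
    ]
    if bad:
        indices = tuple(index for index, _ in bad)
        elements = tuple(element for _, element in bad)
        raise TypeError(
            "Each chunk size must be an integer; got non-integer element(s) "
            f"{elements} at indices {indices}."
        )
    return tuple(int(element) for element in sizes)


print(check_flat_ints_sketch([3, 3, 1]))
try:
    check_flat_ints_sketch([[3, 3], 1])  # run-length-style nesting
except TypeError as exc:
    print(exc)
```

Collecting all offenders before raising lets the user fix the whole spec in one pass instead of one element per attempt.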

maxrjones · 2026-05-05T16:01:06Z

> @maxrjones your example from #3946 produces the following error message on this branch:
>
> TypeError: Each chunk size must be an integer; got non-integer element(s) ([3, 3],) at indices (0,). Chunk sizes must be declared as a flat sequence of positive integers (e.g. [3, 3, 1]).

That's awesome, thank you! Is this branch ready for a review? I can take a look tomorrow AM if so

d-v-b · 2026-05-05T16:02:27Z

yes it's ready

d-v-b · 2026-05-05T16:59:50Z

flagging one behavioral change here: in main, chunks=True was accepted, as we interpreted True as the number 1:

import zarr
arr = zarr.create_array(store="memory://test", shape=(1,), chunks=True, dtype="int8")
print(arr.chunks)
# (True,)

I don't think we ever intended this. Given the choice between canonicalizing the odd behavior in main and rejecting it as an error, this PR makes chunks=True a ValueError.

maxrjones · 2026-05-05T17:53:21Z

> flagging one behavioral change here: in main, chunks=True was accepted, as we interpreted True as the number 1:
>
>     import zarr
>     arr = zarr.create_array(store="memory://test", shape=(1,), chunks=True, dtype="int8")
>     print(arr.chunks)
>     # (True,)
>
> I don't think we ever intended this. Given the choice between canonicalizing the odd behavior in main and rejecting it as an error, this PR makes chunks=True a ValueError.

OOC was this a consequence of the addition of rectilinear chunk grids or always the case? It was indeed not intended by my PR, if the former

d-v-b · 2026-05-05T20:39:48Z

before the rectilinear chunks PR, we converted True to the integer value 1 in the chunks attribute. After that PR, we kept True as a boolean, but practically interpreted it as the integer 1. Both of these are weird and not really intended.
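The reason True can slip through as 1 is that bool is a subclass of int in Python. A minimal sketch of a guard that rejects it, assuming the bool check runs before any integer check; `parse_chunk_size_sketch` is a hypothetical name, not a zarr function.

```python
import numbers


def parse_chunk_size_sketch(value: object) -> int:
    """Accept real integers, refuse booleans (illustrative only)."""
    # This test has to come first: isinstance(True, int) is True.
    if isinstance(value, bool):
        raise ValueError(
            f'chunks={value!r} is not a chunk size; use "auto" for auto-chunking'
        )
    if isinstance(value, numbers.Integral):
        return int(value)
    raise TypeError(f"expected an integer chunk size, got {type(value).__name__}")


print(isinstance(True, int), int(True))
print(parse_chunk_size_sketch(4))
try:
    parse_chunk_size_sketch(True)
except ValueError as exc:
    print(exc)
```

Any check written as isinstance(x, int) or isinstance(x, numbers.Integral) accepts True unless bool is excluded explicitly.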

maxrjones

Thanks for continuing to work on this, I really like the direction.

I don't think the chunks=True fix is working, below is an MVCE:

# /// script
# requires-python = ">=3.11"
# dependencies = [
#     "zarr @ git+https://github.com/d-v-b/zarr-python@refactor/simplify-internal-chunk-representation",
#     "numpy",
# ]
# ///
"""
MVCE: PR #3899 — `chunks=True` was claimed to raise ValueError but silently
produces size-1 chunks.

Author's claim (PR comment):
    "Given the choice between canonicalizing the odd behavior in `main` and
    rejecting it as an error, this PR makes `chunks=True` a `ValueError`."

Actual behavior on the PR branch: no error — the array is created with
chunks=(1, 1, ...), one element per chunk on every dimension. On a large
shape this is a serious footgun (millions of objects in the store).
"""

import zarr

# Case 1: trivial shape — author's exact example. Silently produces (1,).
arr = zarr.create_array(store={}, shape=(1,), chunks=True, dtype="int8")
print(f"shape=(1,)        chunks={arr.chunks}    write_chunk_sizes={arr.write_chunk_sizes}")
assert arr.chunks == (1,), f"expected ValueError, got chunks={arr.chunks}"

# Case 2: realistic shape — silently produces a million size-1 chunks.
arr = zarr.create_array(store={}, shape=(1_000_000,), chunks=True, dtype="int8")
print(f"shape=(1_000_000,) chunks={arr.chunks}  nchunks={arr.nchunks}")
assert arr.chunks == (1,)
assert arr.nchunks == 1_000_000

# Case 3: multi-dim — every axis gets size-1 chunks.
arr = zarr.create_array(store={}, shape=(100, 100, 100), chunks=True, dtype="int8")
print(f"shape=(100,100,100) chunks={arr.chunks} nchunks={arr.nchunks}")
assert arr.chunks == (1, 1, 1)
assert arr.nchunks == 1_000_000

# Direct call into the internal function — same path, no auto-chunking guard
# in init_array catches the bool, so it falls into the Integral branch.
from zarr.core.chunk_grids import normalize_chunks_nd

result = normalize_chunks_nd(True, (5,))
print(f"normalize_chunks_nd(True, (5,)) -> {[a.tolist() for a in result]}")
assert [a.tolist() for a in result] == [[1, 1, 1, 1, 1]]

print("\nAll asserts passed: `chunks=True` does NOT raise ValueError on this branch.")
print("Root cause: `bool` is a subclass of `int`, so:")
print("  - _is_rectilinear_chunks(True) returns False (caught by isinstance(_, int))")
print("  - init_array's `chunks is None or chunks == 'auto'` guard does not match True")
print("  - normalize_chunks_nd's `isinstance(chunks, numbers.Integral)` matches True -> int(True) == 1")

A minor thing, but it'd be nice if you could get Claude to update the bugfix.md and PR description to be up-to-date in case we need to refer back to this PR later on.

d-v-b · 2026-05-06T12:26:45Z

> I don't think the chunks=True fix is working, below is an MVCE:

I hadn't pushed my local changes 🤦 Now that example errors when chunks=True.

> A minor thing, but it'd be nice if you could get Claude to update the bugfix.md and PR description to be up-to-date in case we need to refer back to this PR later on.

Definitely!

d-v-b · 2026-05-06T14:12:22Z

@maxrjones here's claude's summary (I prefer an addition over editing the opening post in the PR)

PR #3899 — refactor: simplify internal chunk representation

Why this PR exists

init_array's chunk/shard resolution path had accumulated branches for
auto-chunking, sharding, and (most recently) rectilinear chunks/shards, with
helper locals such as chunks_flat, shards_for_partition, rectilinear_meta,
chunk_shape_parsed, and shard_shape_parsed running in parallel. Within a
single function chunk_shape could refer to the outer grid partition or the
inner sub-chunk size depending on which branch you were in, which made the
code hard to follow.

While investigating #3898 (chunks=-1 no longer behaving as expected — the
underlying regression came from #2761), it became clear that consolidating
the resolution pipeline behind a single canonical representation would fix
the reported bug and make init_array easier to reason about going forward.

This PR does that consolidation, and folds in several related fixes that
surfaced during the refactor.

Internal architecture

A user-supplied chunk specification now flows through three explicit stages:

1. Normalize raw user input → ChunksTuple
   (normalize_chunks_nd for explicit specs, guess_chunks for auto-chunking).
2. Resolve chunks + shards → ChunkLayout
   (resolve_outer_and_inner_chunks).
3. Materialize grid metadata
   (create_chunk_grid_metadata, dispatching to RegularChunkGridMetadata
   or RectilinearChunkGridMetadata).

init_array is now a composition of these three steps. The legacy
AsyncArray._create path uses the same normalize → grid pipeline.

`ChunksTuple`

ChunksTuple = NewType(
    "ChunksTuple", tuple[np.ndarray[tuple[int], np.dtype[np.int64]], ...]
)

One 1D int64 array per axis. Regular chunks and rectilinear chunks share this
shape — regular grids are simply the case where every value in each per-axis
array is identical (with an optional smaller boundary chunk).

The NewType brand prevents passing raw user input where validated chunks are
expected. The numpy-array element type was chosen after benchmarking. An
earlier iteration of this PR used plain tuple[int, ...] per axis, which on
shape=(10**8,) chunks=(1000,) ended up materializing 100K Python ints per
axis and ran ~16× slower end-to-end than main. Switching to np.int64
arrays restored parity, and the regularity helpers below are vectorized to
keep it that way.

`ChunkLayout`

class ChunkLayout(NamedTuple):
    outer_chunks: ChunksTuple
    inner: ChunkLayout | None = None

outer_chunks is the chunk grid as stored. inner is the sub-structure inside
each chunk: None means no sharding (the chunk is opaque), and a nested
ChunkLayout means the chunk is a shard with its own sub-grid. The recursion
is intentional — it leaves room for a future high-level API that exposes
nested sharding without another schema break.
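A small sketch of how the recursion reads in practice, using plain tuples of ints in place of the int64 arrays. `LayoutSketch` and `nesting_depth` are hypothetical, and expressing each inner grid relative to a single enclosing chunk is an assumption made for the example.

```python
from typing import NamedTuple, Optional

Axes = tuple[tuple[int, ...], ...]


class LayoutSketch(NamedTuple):
    outer_chunks: Axes
    inner: Optional["LayoutSketch"] = None


def nesting_depth(layout: LayoutSketch) -> int:
    """Number of sharding levels below the stored chunks (0 means unsharded)."""
    depth = 0
    node = layout.inner
    while node is not None:
        depth += 1
        node = node.inner
    return depth


# Unsharded: four opaque chunks of 25 along one axis of length 100.
plain = LayoutSketch(outer_chunks=((25, 25, 25, 25),))
# Sharded: one stored shard of 100 holding four sub-chunks of 25.
sharded = LayoutSketch(
    outer_chunks=((100,),),
    inner=LayoutSketch(outer_chunks=((25, 25, 25, 25),)),
)
# Nested sharding: each sub-chunk of 25 is itself split into five pieces of 5.
nested = LayoutSketch(
    outer_chunks=((100,),),
    inner=LayoutSketch(
        outer_chunks=((25, 25, 25, 25),),
        inner=LayoutSketch(outer_chunks=((5, 5, 5, 5, 5),)),
    ),
)

print(nesting_depth(plain), nesting_depth(sharded), nesting_depth(nested))
```

Because the type refers to itself, adding a sharding level requires no new field, only one more nested value.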

`is_regular_1d` / `is_regular_nd`

Vectorized predicates that decide between RegularChunkGridMetadata and
RectilinearChunkGridMetadata at metadata-construction time. The numpy path
short-circuits on the first mismatch; the sequence fallback iterates.
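The sequence fallback can be sketched in a few lines. The names `is_uniform_1d`, `is_uniform_nd`, and `grid_kind` are hypothetical, the vectorized numpy path is omitted, and treating an empty axis as uniform is an assumption.

```python
from collections.abc import Sequence


def is_uniform_1d(sizes: Sequence[int]) -> bool:
    """True if every chunk size along one axis equals the first one."""
    if len(sizes) == 0:
        return True  # assumption: an empty axis counts as regular
    first = sizes[0]
    for size in sizes:
        if size != first:
            return False  # stop at the first mismatch
    return True


def is_uniform_nd(chunks: Sequence[Sequence[int]]) -> bool:
    """True if every axis is uniform; all() also stops at the first failure."""
    return all(is_uniform_1d(axis) for axis in chunks)


def grid_kind(chunks: Sequence[Sequence[int]]) -> str:
    """Pick a metadata flavour from the normalized chunks."""
    return "regular" if is_uniform_nd(chunks) else "rectilinear"


print(grid_kind(((5, 5), (3, 3))))
print(grid_kind(((2, 8), (3, 3))))
```

Returning on the first mismatch avoids scanning the rest of the axis, which matters when an axis holds a very large number of chunk sizes.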

User-visible behavior changes

The chunk-spec input grammar is tightened to a single canonical form per
dimension. The full set of changes is documented in changes/3899.bugfix.md.

| Input | Before | After |
| --- | --- | --- |
| `chunks=-1` | Not handled (#3898) | Works: full extent of axis |
| `chunks=True` | Produced `(1, 1, …)` chunks | `ValueError` pointing at `"auto"` |
| `chunks=(30,)` for a 3D shape | Padded with `shape[len(chunks):]` | `ValueError`: dimension mismatch |
| `chunks=(30, None, None)` | `None` interpreted as full-extent sentinel | `ValueError`: per-dim `None` rejected |
| `chunks=[[3, 3], 1]` (RLE form) | `TypeError` from a nested comparison | `TypeError` reporting offending indices |
| `chunks=0` or list containing `0` | Error raised at later validation step | `ValueError`: "Chunk size must be positive" |
| Array with a 0-length dimension | Inconsistent behavior across input forms | Handled uniformly |

The "padding short tuples" and "None per-dim" forms are not exercised by
the test suite or the documented public API. After this PR, init_array
expects the caller to choose between an explicit per-dimension spec, a scalar,
or auto-chunking.
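The tightened grammar at the N-dimensional level can be sketched like this. `expand_chunks_sketch` is a hypothetical function that mirrors the table rows for scalars, short tuples, and per-dimension None; the error wording is illustrative.

```python
from collections.abc import Sequence
from typing import Optional, Union

ChunkRequest = Union[int, Sequence[Optional[int]], None]


def expand_chunks_sketch(chunks: ChunkRequest, shape: tuple[int, ...]) -> tuple[int, ...]:
    """Turn a scalar or per-dimension request into one size per axis."""
    if chunks is None:
        raise ValueError("None means auto-chunking, which is handled by a separate routine")
    if isinstance(chunks, bool):
        raise ValueError('a boolean is not a chunk size; use "auto" for auto-chunking')
    if isinstance(chunks, int):
        # A scalar applies to every axis.
        return (chunks,) * len(shape)
    if len(chunks) != len(shape):
        # Short tuples are no longer padded from the shape.
        raise ValueError(
            f"dimension mismatch: got {len(chunks)} chunk sizes for {len(shape)} dimensions"
        )
    sizes = []
    for axis, size in enumerate(chunks):
        if size is None:
            raise ValueError(f"None is not accepted for axis {axis}; use -1 for the full extent")
        sizes.append(size)
    return tuple(sizes)


print(expand_chunks_sketch(30, (100, 20, 10)))
print(expand_chunks_sketch((30, -1, -1), (100, 20, 10)))
```

Under this grammar the old spellings `(30,)` and `(30, None, None)` both become `(30, -1, -1)`.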

Files changed

File	What changed
`src/zarr/core/chunk_grids.py`	New: `ChunksTuple`, `ChunkLayout`, `SHARDED_INNER_CHUNK_MAX_BYTES`, `normalize_chunks_1d`, `normalize_chunks_nd`, `guess_chunks`, `resolve_outer_and_inner_chunks`. Removed: `normalize_chunks`, `_auto_partition`, `_guess_chunks` (renamed to `_guess_regular_chunks`).
`src/zarr/core/metadata/v3.py`	New: `is_regular_1d`, `is_regular_nd`, `create_chunk_grid_metadata`. Removed: `resolve_chunks` (its responsibilities split between normalize and create).
`src/zarr/core/array.py`	`init_array` chunk/shard resolution simplified to a normalize → resolve → materialize pipeline. The legacy `AsyncArray._create` path uses the same primitives. About 50 lines of interleaved conditionals collapsed.
`tests/conftest.py`	`create_array_metadata` helper rewritten on top of the new primitives.
`tests/test_chunk_grids.py`	Tests reorganized around the new functions. Parametrized cases covering both happy paths (sentinels, explicit specs, rectilinear) and the new error paths (zero/negative chunks, `True`, length mismatch, RLE rejection, `None` rejection). Added a `_assert_chunks_equal` helper that compares `ChunksTuple` against tuples of int tuples.
`tests/test_array.py`	Shard auto-partition tests updated to call `resolve_outer_and_inner_chunks`. End-to-end test for auto-chunking + sharding using `SHARDED_INNER_CHUNK_MAX_BYTES`.
`tests/test_metadata/test_v3.py`	New: 21 parametrized tests for `is_regular_1d` (sequence + ndarray paths) and `is_regular_nd`.

The non-source-code files in the diff (.github/workflows/*, pyproject.toml,
tests/test_codecs/test_sharding.py, tests/test_store/test_fsspec.py) come
from main merges that have happened during the PR's lifetime and are not
part of the chunk-representation work.

Performance

After the vectorization commits (14788aa, 33ee8a1), create_array on a
1D shape with ~100K chunks is at parity with main. Other shapes were never
measurably affected. The CodSpeed report on the PR shows no regressions
across the existing benchmark suite.

Out-of-scope follow-ups identified during review

These came up in the review thread and were deferred to keep this PR focused.

- The legacy Array.create / AsyncArray.create paths still accept the
  looser v2-era input grammar that this PR tightens elsewhere. #3903
  (chore: remove .create methods from arrays) removes those deprecated
  methods entirely; once that lands, the legacy _create branch can be
  deleted.
- is_regular_1d lives in metadata/v3.py but operates on the runtime
  ChunksTuple defined in chunk_grids.py. Hoisting
  create_chunk_grid_metadata into chunk_grids.py would tighten the
  module boundary, but isn't done here to keep the diff manageable.

Migration notes for downstream code

Nothing publicly exported from the zarr namespace changed. The removed
helpers (zarr.core.chunk_grids.normalize_chunks, zarr.core.metadata.v3.resolve_chunks)
lived under zarr.core.*, the documented internal namespace. If a downstream
project happens to import them, the equivalent calls are:

# old
from zarr.core.chunk_grids import normalize_chunks
chunks_t = normalize_chunks(chunks, shape, item_size)

# new
from zarr.core.chunk_grids import normalize_chunks_nd, guess_chunks
if chunks is None:
    chunks_t = guess_chunks(shape, item_size)
else:
    chunks_t = normalize_chunks_nd(chunks, shape)
# chunks_t is a ChunksTuple — one np.int64 array per axis.

# old
from zarr.core.metadata.v3 import resolve_chunks
grid = resolve_chunks(raw_chunks, shape, item_size)

# new
from zarr.core.metadata.v3 import create_chunk_grid_metadata
grid = create_chunk_grid_metadata(chunks_t)

d-v-b added 3 commits April 10, 2026 18:26

refactor: rename guess_chunks to more clearly indicate that it guesses regular chunks

5990390

refactor: use newtype pattern

9735a85

github-actions Bot added the needs release notes label Apr 12, 2026

docs: changelog

c40c5ff

github-actions Bot removed the needs release notes label Apr 12, 2026

d-v-b requested a review from maxrjones April 12, 2026 12:29

fix: handle 0-length arrays

a9b68d8

d-v-b added 3 commits April 12, 2026 14:50

test: test untested cases of chunk normalization

88e93ad

fix: don't accept inane input

fcd5ab0

test: check error states in normalize_chunks_1d

5c56197

maxrjones reviewed Apr 12, 2026

d-v-b changed the title ~~refactor/simplify internal chunk representation~~ refactor: simplify internal chunk representation Apr 13, 2026

refactor: make resolvedchunking recursive to support nested sharding

9fc3fea

d-v-b added 2 commits April 14, 2026 09:51

Merge branch 'main' into refactor/simplify-internal-chunk-representation

2659f1c

Merge branch 'main' into refactor/simplify-internal-chunk-representation

c6a5095

maxrjones reviewed Apr 20, 2026

d-v-b and others added 6 commits April 20, 2026 19:50

Merge branch 'main' into refactor/simplify-internal-chunk-representation

fb7da3a

test: add _assert_chunks_equal helper for ChunksTuple

f6fc818

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

fix: cast ChunksTuple elements to int at consumer sites in array.py

62ce735

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

d-v-b and others added 4 commits April 20, 2026 23:16

test: cast ChunksTuple elements to int in create_array_metadata fixture

e68cdfc

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

d-v-b added the benchmark label Apr 21, 2026

d-v-b added 2 commits April 30, 2026 16:52

Merge branch 'main' into refactor/simplify-internal-chunk-representation

6b02c13

Merge branch 'main' into refactor/simplify-internal-chunk-representation

4668fe5

d-v-b mentioned this pull request May 5, 2026

Run-Length Encoding input is not properly supported by RectilinearChunkGrid #3946

Open

d-v-b added 3 commits May 5, 2026 11:11

chore: add informative error for invalid chunk grid parameter

b74b78d

chore: make tests compact

2e8e8cb

fix: typos

fc24c87

d-v-b added 2 commits May 5, 2026 12:08

test: add tests for regular grid helper functions

bdb32d2

fix: reject chunks=True

f8a391e

Merge branch 'main' into refactor/simplify-internal-chunk-representation

09eda7e

maxrjones reviewed May 6, 2026

d-v-b added 2 commits May 6, 2026 08:17

Merge branch 'main' of https://github.com/zarr-developers/zarr-python into refactor/simplify-internal-chunk-representation

9f54550

Merge branch 'refactor/simplify-internal-chunk-representation' of https://github.com/d-v-b/zarr-python into refactor/simplify-internal-chunk-representation

2907297

d-v-b added 3 commits May 6, 2026 08:33

docs: update changelog

a86beb0

docs: remove unneeded changelog entry

e283f4e

test: test for explicit 0-sized chunk length rejection

424439b
