[RFC]: Nested tabular headers for uniform nested objects #46

### Type of Change

- [x] Backward-compatible addition
- [x] New optional feature

### Summary

Extend the tabular form (§9.3) so that a column may itself be a uniform nested object, declared recursively in the header as `field{sub1,sub2}`. Rows remain flat delimiter-separated lines; the header alone encodes the nested shape. This generalises the existing `{f1,f2}` field list into a recursive form and unlocks table-collapse savings for the very common "array of records with a grouped sub-object" shape (addresses, customers, dimensions, geo, …).

### Motivation

## Problem

TOON v3's tabular form (`key[N]{fields}:`) is the format's strongest compression mechanism, but it only applies when every column value is a JSON primitive. As soon as one field in the row objects is itself a uniform nested object, detection fails and the entire array falls back to expanded list form (§9.4), repeating every key name `N` times.

Real-world payloads from APIs, event logs, and catalogue data very often have this shape: a flat list of records where one or two fields group related attributes. Today:

```toon
orders[2]:
  - id: 1
    customer:
      name: Alice
      country: DK
    total: 99
  - id: 2
    customer:
      name: Bob
      country: UK
    total: 149
```

Every key in `customer` repeats `N` times; the list-item markers add further overhead. The shape is *already* uniform — the format just has no way to declare it.

## Benefits

Token counts measured against the reference TypeScript implementation on 7 datasets, including a `uniform-nested` dataset of 500 shipment records with sender/receiver/dimensions sub-objects (three of the seven are shown):

| Dataset              | TOON v3 | TOON + nested | JSON compact | Δ vs TOON  | Δ vs JSON |
|----------------------|---------|---------------|--------------|------------|-----------|
| uniform-nested       | 58,701  | **27,111**    | 46,697       | **−53.8%** | **−41.9%** |
| nested-config        | 620     | 591           | 558          | −4.7%      | —         |
| tabular (no nesting) | 49,919  | 49,919        | 79,059       | 0%         | −36.9%    |

For data without nested objects the output is byte-identical to TOON v3; the feature has zero overhead when it does not apply.
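The Δ columns can be reproduced from the raw counts. The following is a minimal sketch, assuming each delta is the relative change `(after - before) / before` rounded to one decimal place; `deltaPercent` is a hypothetical helper name, not part of the reference implementation.

```typescript
// Relative change of `after` versus `before`, in percent, rounded to one decimal.
// Assumption: this is the formula behind the table's Δ columns.
function deltaPercent(before: number, after: number): number {
  return Math.round(((after - before) / before) * 1000) / 10;
}

// uniform-nested row: TOON + nested (27,111) against TOON v3 and JSON compact.
const vsToon = deltaPercent(58701, 27111); // -53.8
const vsJson = deltaPercent(46697, 27111); // -41.9
console.log(vsToon, vsJson);
```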

### Detailed Design

## Proposed Syntax

```toon
orders[2]{id,customer{name,country},total}:
  1,Alice,DK,99
  2,Bob,UK,149
```

The header nests one or more field groups inside `{...}`; row lines are unchanged in shape — plain delimiter-separated primitives at depth +1.

## Grammar Changes (§6 Header Syntax)

Generalise the `fields-seg` / `fieldname` productions so field entries may themselves contain a nested field group:

```abnf
fields-seg    = "{" field-entry *( delim field-entry ) "}"
field-entry   = fieldname [ fields-seg ]
fieldname     = key
```

All other header rules (bracket segment, delimiter selection, trailing colon, §7.3 key encoding) are unchanged. The delimiter in a nested `fields-seg` MUST be the same active delimiter as the enclosing bracket segment, matching the existing rule for flat field lists.
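To illustrate the recursive `fields-seg` production, here is a minimal recursive-descent sketch. It is not the reference implementation: `FieldDescriptor` and `parseFieldsSeg` are hypothetical names, and the sketch assumes unquoted keys that contain no braces or delimiter characters (quoted keys per §7.3 are out of scope).

```typescript
// Hypothetical descriptor type: a leaf has no `children`, a group has at least one.
interface FieldDescriptor {
  name: string;
  children?: FieldDescriptor[];
}

// Parse a fields segment such as "{id,customer{name,country},total}".
// Assumption: keys are unquoted and contain no "{", "}" or delimiter characters.
function parseFieldsSeg(src: string, delim: string = ","): FieldDescriptor[] {
  let pos = 0;

  function parseGroup(): FieldDescriptor[] {
    if (src[pos] !== "{") throw new SyntaxError(`expected "{" at ${pos}`);
    pos++; // consume "{"
    const fields: FieldDescriptor[] = [];
    for (;;) {
      const start = pos;
      while (pos < src.length && src[pos] !== delim && src[pos] !== "{" && src[pos] !== "}") {
        pos++;
      }
      const name = src.slice(start, pos);
      // An empty name also covers the disallowed empty group `field{}`.
      if (name.length === 0) throw new SyntaxError(`empty field name at ${start}`);
      const field: FieldDescriptor = { name };
      if (src[pos] === "{") field.children = parseGroup(); // field-entry = fieldname [ fields-seg ]
      fields.push(field);
      if (src[pos] === delim) { pos++; continue; }
      if (src[pos] === "}") { pos++; return fields; }
      throw new SyntaxError(`unmatched "{" or unexpected character at ${pos}`);
    }
  }

  const fields = parseGroup();
  if (pos !== src.length) throw new SyntaxError(`trailing input at ${pos}`);
  return fields;
}

const tree = parseFieldsSeg("{id,customer{name,country},total}");
console.log(JSON.stringify(tree));
```

The same function handles the pipe form by passing `"|"` as the second argument, since the nested group reuses the active delimiter.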

## Encoding Rules (§9.3)

Extend tabular detection. An array of objects is eligible for tabular encoding when, for every top-level field of the row objects, either:

- **a.** Every row has a primitive at that key (existing rule), or
- **b.** Every row has a *nested-uniform* object at that key, where *nested-uniform* is defined recursively as: non-empty, same set of keys in every row, and each value at each key is itself either a primitive or nested-uniform.

When satisfied:

- The header lists top-level fields in first-row encounter order. Fields matching rule (a) are emitted as a bare `fieldname`; fields matching rule (b) are emitted as `fieldname{...subfields...}`, applying the same rules recursively.
- Row values are written as a single delimiter-separated line at depth +1. Values are laid out by a **depth-first, pre-order** walk of the header descriptors (left-to-right, nested groups expanded in place). The flattened arity of each row MUST equal the total number of leaf fields in the header.
- If any field fails both (a) and (b), encoders MUST fall back to the expanded list form (§9.4) for the entire array, exactly as today.

Empty nested groups (`{}`) are NOT permitted; an object field with no keys disqualifies the array from the nested-uniform rule for that column and forces expanded list form.
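The detection and layout rules above can be sketched as follows. This is an illustrative encoder under stated assumptions, not the reference implementation: all names (`detectFields`, `flattenRow`, `encodeTable`) are hypothetical, cells are written with `String()` and no quoting, and a `null` return stands in for the fallback to expanded list form.

```typescript
type Primitive = string | number | boolean | null;
type Obj = { [key: string]: Json };
type Json = Primitive | Json[] | Obj;
interface Field { name: string; children?: Field[]; } // hypothetical descriptor

const isPrimitive = (v: Json | undefined): v is Primitive =>
  v === null || typeof v === "string" || typeof v === "number" || typeof v === "boolean";
const isObject = (v: Json | undefined): v is Obj =>
  typeof v === "object" && v !== null && !Array.isArray(v);

function sameKeys(row: Json, keys: string[]): boolean {
  if (!isObject(row)) return false;
  const own = Object.keys(row);
  return own.length === keys.length && keys.every(k => own.indexOf(k) !== -1);
}

// Returns descriptors when every column satisfies rule (a) or rule (b);
// returns null when any column fails both, signalling fallback (§9.4).
function detectFields(rows: Json[]): Field[] | null {
  const first = rows[0];
  if (!isObject(first)) return null;
  const keys = Object.keys(first); // first-row encounter order
  if (keys.length === 0) return null; // empty groups are not permitted
  if (!rows.every(r => sameKeys(r, keys))) return null;
  const fields: Field[] = [];
  for (const name of keys) {
    const column = rows.map(r => (r as Obj)[name]);
    if (column.every(isPrimitive)) { fields.push({ name }); continue; } // rule (a)
    const children = column.every(isObject) ? detectFields(column) : null; // rule (b)
    if (children === null) return null;
    fields.push({ name, children });
  }
  return fields;
}

const renderFields = (fields: Field[], delim: string): string =>
  "{" + fields.map(f => f.name + (f.children ? renderFields(f.children, delim) : "")).join(delim) + "}";

// Depth-first, pre-order walk: nested groups are expanded in place.
function flattenRow(row: Obj, fields: Field[], out: Primitive[] = []): Primitive[] {
  for (const f of fields) {
    if (f.children) flattenRow(row[f.name] as Obj, f.children, out);
    else out.push(row[f.name] as Primitive);
  }
  return out;
}

// Assumption: cell quoting and escaping are handled elsewhere; String() is a placeholder.
function encodeTable(key: string, rows: Json[], delim: string = ","): string | null {
  const fields = detectFields(rows);
  if (fields === null) return null; // caller falls back to expanded list form
  const header = `${key}[${rows.length}${delim === "," ? "" : delim}]${renderFields(fields, delim)}:`;
  const lines = rows.map(r => "  " + flattenRow(r as Obj, fields).map(String).join(delim));
  return [header].concat(lines).join("\n");
}
```

Because `flattenRow` walks the same descriptor tree that produced the header, the flattened arity of every row equals the number of leaf fields by construction.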

## Decoding Rules

A decoder encountering a nested field group in a header MUST:

1. Parse field descriptors recursively, tracking brace depth so that nested `{…}` are matched correctly.
2. Compute the ordered list of *leaf* field names via a depth-first walk of the descriptor tree.
3. For each row line at depth +1, split on the active delimiter exactly as today. In strict mode, the number of cells MUST equal the number of leaf fields — this replaces the existing flat `fields.length` equality check.
4. Reconstruct each row object by walking the descriptor tree in the same depth-first order, assigning consecutive row cells to leaf descriptors and wrapping nested groups in plain objects.

Row disambiguation rules (§9.3 "first-unquoted-delimiter vs first-unquoted-colon") are unchanged: rows still contain only primitive values separated by the active delimiter, with no unquoted colons.
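Steps 2 to 4 can be sketched as below. The sketch assumes the header has already been parsed into a descriptor tree and that each row line has already been split and converted to primitives by the existing row parser; `Field`, `countLeaves`, `buildRow`, and `decodeRow` are hypothetical names.

```typescript
type Primitive = string | number | boolean | null;
type Row = { [key: string]: Primitive | Row };
interface Field { name: string; children?: Field[]; } // hypothetical descriptor

// Number of leaf fields, i.e. the required cell count per row in strict mode.
function countLeaves(fields: Field[]): number {
  let n = 0;
  for (const f of fields) n += f.children ? countLeaves(f.children) : 1;
  return n;
}

// Rebuild one row object. `cursor` is shared across the recursion so that
// nested groups consume consecutive cells in depth-first order.
function buildRow(fields: Field[], cells: Primitive[], cursor = { i: 0 }): Row {
  const row: Row = {};
  for (const f of fields) {
    if (f.children) {
      row[f.name] = buildRow(f.children, cells, cursor);
    } else {
      row[f.name] = cells[cursor.i] as Primitive;
      cursor.i++;
    }
  }
  return row;
}

// Strict mode replaces the flat `fields.length` check with a leaf-count check.
function decodeRow(fields: Field[], cells: Primitive[], strict: boolean = true): Row {
  const expected = countLeaves(fields);
  if (strict && cells.length !== expected) {
    throw new RangeError(`expected ${expected} cells, got ${cells.length}`);
  }
  return buildRow(fields, cells);
}

const header: Field[] = [
  { name: "id" },
  { name: "customer", children: [{ name: "name" }, { name: "country" }] },
  { name: "total" },
];
console.log(JSON.stringify(decodeRow(header, [1, "Alice", "DK", 99])));
// {"id":1,"customer":{"name":"Alice","country":"DK"},"total":99}
```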

### Examples

### Before (current spec)

```toon
orders[2]:
  - id: 1
    customer:
      name: Alice
      country: DK
    total: 99
  - id: 2
    customer:
      name: Bob
      country: UK
    total: 149
```

### After (proposed)

```toon
orders[2]{id,customer{name,country},total}:
  1,Alice,DK,99
  2,Bob,UK,149
```

### Multiple nested columns

```toon
shipments[1]{id,sender{name,city},receiver{name,city}}:
  s1,ACME,Berlin,Globex,Oslo
```

### Non-uniform column falls back

```json
{ "entries": [
  { "id": 1, "meta": { "a": 1 } },
  { "id": 2, "meta": { "b": 2 } }
]}
```

`meta` has different keys per row, so the array is encoded in expanded list form (§9.4) exactly as in TOON v3.

### Alternative delimiter (pipe)

```toon
orders[2|]{id|customer{name|country}|total}:
  1|Alice|DK|99
  2|Bob|UK|149
```

### Drawbacks

- Header lines become denser and slightly harder for humans to scan when many columns are nested.
- Decoder parsing of the fields segment needs brace-depth tracking, a small increase over the current flat split. (Reference implementation: ~50 LOC.)
- Adds one more case to §9.3's detection logic; encoders must recurse over each column to classify it as primitive / nested-uniform / non-uniform.
- Introduces a second mechanism for compressing nested structure alongside key folding (§13.4), which could feel redundant for the subset of shapes where both apply.
- Decoders built for v3 will hard-fail on documents using the new syntax (unmatched `{`). That is the correct fail-closed behaviour for a format extension, but it does mean v3.0 and post-RFC implementations cannot freely mix.

### Alternatives Considered

### Alternative 1: Key folding (§13.4) only
Key folding collapses single-key chains into dotted paths. It does not help here because `customer` is not a single-key wrapper — it has multiple sibling keys (`name`, `country`). The two mechanisms are orthogonal: folding targets *chains*, this RFC targets *groups*.

### Alternative 2: Per-row `{}` inline objects inside tabular rows
We could allow row cells to contain literal `{name:Alice,country:DK}` objects. This keeps the header flat but repeats the structural markers on every row and reintroduces unquoted colons inside rows, breaking the §9.3 row disambiguation rule. Strictly worse for tokens and significantly worse for parseability.

### Alternative 3: New header marker (e.g. `key[N]**{…}:`)
Introducing a new sigil to distinguish "nested-aware" tables is more invasive grammatically and provides no extra expressiveness over recursive `fields-seg`. Reusing existing characters (`{`, `}`, the active delimiter) means no new quoting or escape rules are required.

### Alternative 4: Do nothing
Not acceptable for the dominant real-world shape. On the benchmarked dataset of 500 shipment records with nested sender/receiver/dimensions sub-objects, TOON v3 is worse than JSON compact (58,701 vs 46,697 tokens) — precisely the scenario TOON is supposed to win at. Leaving this on the table concedes the format's strongest feature on a very common input shape.

### Impact on Implementations

- **Reference implementation:** Implemented in [toon-format/toon#296](https://github.com/toon-format/toon/pull/296). Encoder: ~130 LOC added across `encode/nested-fields.ts`, `encode/encoders.ts`, and `encode/primitives.ts`. Decoder: ~80 LOC in `decode/parser.ts` (recursive descriptor parser and brace matcher). All 474 pre-existing tests pass unchanged; 8 new tests cover encode, decode, round-trip, multi-field, and non-uniform fallback.
- **Community implementations:** Need to add recursive parsing of `fields-seg` (brace-depth tracking) and a depth-first row-value walker. No new tokens, escapes, or delimiters. A conservative implementation MAY ship decoder support before encoder support.
- **Backward compatibility:** Documents produced by a pre-RFC encoder remain valid and decode unchanged. When no column is a uniform nested object, a post-RFC encoder applying rule (a) produces exactly the same header and rows as TOON v3; the reference implementation confirms byte-identical output on all non-nested benchmark datasets. A pre-RFC decoder encountering a nested header will fail at header parse time (unmatched `{`); this is fail-closed behaviour, which is the correct posture for a format extension.
- **Migration path:** Encoders enable the feature via an opt-in option (`nestedTables: true` in the reference implementation). No action required by users of v3-encoded documents.

### Migration Strategy

_Not a breaking change — no migration required._

## For Implementers

1. Update the fields-segment parser to track brace depth and build a recursive descriptor tree instead of a flat string array.
2. Compute the leaf field list via depth-first walk; use it in place of the flat field list for row-arity validation.
3. On decode, walk the descriptor tree in the same order, wrapping nested groups in plain objects.
4. (Encoder only, optional) Add a `nestedTables` option that extends tabular detection to allow uniform nested-object columns.

## For Users

No action required. Existing TOON documents remain valid and continue to decode under both pre-RFC and post-RFC implementations.

### Test Cases

```json
{
  "name": "nested tabular — basic",
  "input": {
    "orders": [
      { "id": 1, "customer": { "name": "Alice", "country": "DK" }, "total": 99 },
      { "id": 2, "customer": { "name": "Bob",   "country": "UK" }, "total": 149 }
    ]
  },
  "expected": "orders[2]{id,customer{name,country},total}:\n  1,Alice,DK,99\n  2,Bob,UK,149",
  "note": "Uniform nested object column collapses into the header; rows remain flat."
}
```

```json
{
  "name": "nested tabular — multiple nested columns",
  "input": {
    "shipments": [
      { "id": "s1", "sender": { "name": "ACME", "city": "Berlin" }, "receiver": { "name": "Globex", "city": "Oslo" } }
    ]
  },
  "expected": "shipments[1]{id,sender{name,city},receiver{name,city}}:\n  s1,ACME,Berlin,Globex,Oslo",
  "note": "Two sibling nested columns; row values are laid out depth-first."
}
```

```json
{
  "name": "nested tabular — non-uniform fallback",
  "input": {
    "entries": [
      { "id": 1, "meta": { "a": 1 } },
      { "id": 2, "meta": { "b": 2 } }
    ]
  },
  "note": "Different keys in the nested object per row — array MUST fall back to expanded list form (§9.4) exactly as in v3."
}
```

### Affected Specification Sections

- **§6 Header Syntax** — generalise the `fields-seg` / `fieldname` ABNF productions to allow recursion.
- **§9.3 Arrays of Objects — Tabular Form** — add rule (b) (nested-uniform columns) to detection; describe depth-first row value layout and the revised strict-mode row-arity check.
- **§9.4 Mixed / Non-Uniform Arrays — Expanded List** — add a cross-reference noting that a column failing both rules (a) and (b) triggers fallback to §9.4 for the whole array.
- **§13.4 Key Folding and Path Expansion** — one-line note that nested field groups compose cleanly with key folding applied to the header's key prefix.
- **CHANGELOG.md** — entry under the next minor version.

### Unresolved Questions

1. **Hard depth limit?** Should the spec cap nesting depth (e.g. 2 or 3 levels) or leave it unconstrained, as §9.3 does for row count? The reference implementation currently caps nesting at 2 levels as a heuristic.
2. **Empty nested groups.** This RFC disallows `field{}`; an alternative is to allow it as a synonym for "this column is always `{}`". Costs one more grammar special case for marginal benefit.
3. **Interaction with key folding (§13.4).** Folding applies to the *key prefix* of a header only, so it should compose cleanly, but the interaction is worth an explicit note.

### Additional Context

- **Reference implementation:** [toon-format/toon#296](https://github.com/toon-format/toon/pull/296) — maintainer requested this RFC before reviewing the implementation PR.
- **Related but complementary:** [toon-format/spec#45](https://github.com/toon-format/spec/issues/45) ("Object Schema Headers / Nest-Collapse for Keyed Object Collections") targets `Record<string, Object>` containers; this RFC targets arrays of uniform objects. The two proposals could land independently.
- **Not related:** [toon-format/spec#31](https://github.com/toon-format/spec/issues/31) (type annotations) was closed; this RFC does not add or rely on column type hints.
- **Target version:** v3.1 (next minor) per §20 — backward-compatible addition.


