[RFC]: Nested tabular headers for uniform nested objects #46

@Turtle-dev3

Type of Change

  • Backward-compatible addition
  • New optional feature

Summary

Extend the tabular form (§9.3) so that a column may itself be a uniform nested object, declared recursively in the header as field{sub1,sub2}. Rows remain flat delimiter-separated lines; the header alone encodes the nested shape. This generalises the existing {f1,f2} field list into a recursive form and unlocks table-collapse savings for the very common "array of records with a grouped sub-object" shape (addresses, customers, dimensions, geo, …).

Motivation

Problem

TOON v3's tabular form (key[N]{fields}:) is the format's strongest compression mechanism, but it only applies when every column value is a JSON primitive. As soon as one field in the row objects is itself a uniform nested object, detection fails and the entire array falls back to expanded list form (§9.4), repeating every key name N times.

Real-world payloads from APIs, event logs, and catalogue data very often have this shape: a flat list of records where one or two fields group related attributes. Today:

orders[2]:
  - id: 1
    customer:
      name: Alice
      country: DK
    total: 99
  - id: 2
    customer:
      name: Bob
      country: UK
    total: 149

Every key in customer repeats N times; the list-item markers add further overhead. The shape is already uniform — the format just has no way to declare it.

Benefits

Measured against the reference TypeScript implementation on 7 datasets, including a uniform-nested dataset of 500 shipment records with sender/receiver/dimensions sub-objects:

Dataset (token counts)   TOON v3   TOON + nested   JSON compact   Δ vs TOON   Δ vs JSON
uniform-nested            58,701          27,111         46,697      −53.8%      −41.9%
nested-config                620             591            558       −4.7%
tabular (no nesting)      49,919          49,919         79,059          0%      −36.9%

For data without nested objects the output is byte-identical to TOON v3; the feature has zero overhead when it does not apply.
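The Δ columns can be reproduced from the raw token counts. The snippet below is a minimal sketch of that arithmetic; `delta_pct` is a hypothetical helper name, and the numbers are copied from the table. The nested-config row reports no Δ vs JSON, so the value the sketch derives for that cell follows from the listed counts and is not a figure reported by the benchmark.

```python
def delta_pct(new: int, old: int) -> float:
    """Relative change of `new` against `old`, in percent, to one decimal."""
    return round((new - old) / old * 100, 1)


# (TOON v3, TOON + nested, JSON compact) token counts from the table above.
rows = {
    "uniform-nested": (58_701, 27_111, 46_697),
    "nested-config": (620, 591, 558),
    "tabular (no nesting)": (49_919, 49_919, 79_059),
}

for name, (toon_v3, toon_nested, json_compact) in rows.items():
    vs_toon = delta_pct(toon_nested, toon_v3)
    vs_json = delta_pct(toon_nested, json_compact)
    print(f"{name}: {vs_toon:+.1f}% vs TOON, {vs_json:+.1f}% vs JSON")
```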

Detailed Design

Proposed Syntax

orders[2]{id,customer{name,country},total}:
  1,Alice,DK,99
  2,Bob,UK,149

The header nests one or more field groups inside {...}; row lines are unchanged in shape — plain delimiter-separated primitives at depth +1.

Grammar Changes (§6 Header Syntax)

Generalise the fields-seg / fieldname productions so field entries may themselves contain a nested field group:

fields-seg    = "{" field-entry *( delim field-entry ) "}"
field-entry   = fieldname [ fields-seg ]
fieldname     = key

All other header rules (bracket segment, delimiter selection, trailing colon, §7.3 key encoding) are unchanged. The delimiter in a nested fields-seg MUST be the same active delimiter as the enclosing bracket segment, matching the existing rule for flat field lists.
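As an illustration of the recursive production, here is a minimal recursive-descent sketch in Python. It is not the reference implementation: `parse_fields_seg` is a hypothetical name, the `(name, children)` tuple shape is an arbitrary choice, and the sketch assumes unquoted keys that contain no brace or delimiter characters (quoted keys per §7.3 are out of scope here).

```python
def parse_fields_seg(text: str, delim: str = ",") -> list:
    """Parse a fields-seg such as '{id,customer{name,country},total}'.

    Returns a list of (name, children) tuples, where children is None for
    a bare fieldname and a nested list for fieldname{...}.
    Assumption: keys are unquoted and free of braces and delimiters.
    """
    pos = 0

    def parse_group() -> list:
        nonlocal pos
        if pos >= len(text) or text[pos] != "{":
            raise ValueError(f"expected '{{' at offset {pos}")
        pos += 1  # consume "{"
        entries = []
        while True:
            start = pos
            while pos < len(text) and text[pos] not in "{}" + delim:
                pos += 1
            name = text[start:pos]
            if not name:
                # Also rejects empty groups such as 'meta{}'.
                raise ValueError(f"empty field name at offset {start}")
            children = None
            if pos < len(text) and text[pos] == "{":
                children = parse_group()  # field-entry = fieldname [ fields-seg ]
            entries.append((name, children))
            if pos >= len(text):
                raise ValueError("unmatched '{'")
            if text[pos] == "}":
                pos += 1  # consume "}"
                return entries
            if text[pos] != delim:
                raise ValueError(f"expected delimiter at offset {pos}")
            pos += 1  # consume the delimiter

    fields = parse_group()
    if pos != len(text):
        raise ValueError(f"trailing characters at offset {pos}")
    return fields
```

Passing `delim="|"` parses a pipe-delimited header such as `{id|customer{name|country}|total}` into the same tree, since the nested group uses the same active delimiter as the enclosing one.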

Encoding Rules (§9.3)

Extend tabular detection. An array of objects is eligible for tabular encoding when, for every top-level field of the row objects, either:

  • a. Every row has a primitive at that key (existing rule), or
  • b. Every row has a nested-uniform object at that key, where nested-uniform is defined recursively as: non-empty, same set of keys in every row, and each value at each key is itself either a primitive or nested-uniform.

When satisfied:

  • The header lists top-level fields in first-row encounter order. Fields matching rule (a) are emitted as a bare fieldname; fields matching rule (b) are emitted as fieldname{...subfields...}, applying the same rules recursively.
  • Row values are written as a single delimiter-separated line at depth +1. Values are laid out by a depth-first, pre-order walk of the header descriptors (left-to-right, nested groups expanded in place). The flattened arity of each row MUST equal the total number of leaf fields in the header.
  • If any field fails both (a) and (b), encoders MUST fall back to the expanded list form (§9.4) for the entire array, exactly as today.

Empty nested groups ({}) are NOT permitted; an object field with no keys disqualifies the array from the nested-uniform rule for that column and forces expanded list form.
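The detection and layout rules above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the reference encoder: all function names (`describe`, `header`, `flatten`, `encode_table`, `fmt`) are hypothetical, and `fmt` ignores TOON's quoting and number-formatting rules, so it only suits values that need no quoting.

```python
def is_primitive(value) -> bool:
    # JSON primitives: string, number, boolean, null.
    return value is None or isinstance(value, (str, int, float, bool))


def describe(values: list):
    """Classify one column (or the row list itself).

    Returns None for an all-primitive column (rule a), a list of
    (key, descriptor) pairs for a nested-uniform column (rule b),
    and raises ValueError when neither rule holds.
    """
    if all(is_primitive(v) for v in values):
        return None
    if not all(isinstance(v, dict) and v for v in values):
        raise ValueError("mixed, non-object, or empty-object column")
    keys = list(values[0])  # first-row encounter order
    if any(set(v) != set(keys) for v in values):
        raise ValueError("key sets differ between rows")
    return [(k, describe([v[k] for v in values])) for k in keys]


def header(desc: list, delim: str = ",") -> str:
    parts = [k if d is None else k + header(d, delim) for k, d in desc]
    return "{" + delim.join(parts) + "}"


def flatten(obj: dict, desc: list) -> list:
    """Depth-first, pre-order walk: nested groups are expanded in place."""
    cells = []
    for key, sub in desc:
        if sub is None:
            cells.append(obj[key])
        else:
            cells.extend(flatten(obj[key], sub))
    return cells


def fmt(value) -> str:
    # Simplified: a real encoder must also apply TOON quoting rules.
    if value is None:
        return "null"
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def encode_table(key: str, rows: list, delim: str = ","):
    """Return the tabular text, or None when the array must fall back
    to expanded list form (§9.4)."""
    try:
        desc = describe(rows)
    except ValueError:
        return None
    if desc is None:
        return None  # empty array or array of primitives: not a table
    marker = "" if delim == "," else delim
    lines = [f"{key}[{len(rows)}{marker}]{header(desc, delim)}:"]
    for row in rows:
        lines.append("  " + delim.join(fmt(c) for c in flatten(row, desc)))
    return "\n".join(lines)
```

Because `describe` raises as soon as any column fails both rules, the fallback decision is made for the whole array before any row is written, which matches the all-or-nothing requirement.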

Decoding Rules

A decoder encountering a nested field group in a header MUST:

  1. Parse field descriptors recursively, tracking brace depth so that nested {…} are matched correctly.
  2. Compute the ordered list of leaf field names via a depth-first walk of the descriptor tree.
  3. For each row line at depth +1, split on the active delimiter exactly as today. In strict mode, the number of cells MUST equal the number of leaf fields — this replaces the existing flat fields.length equality check.
  4. Reconstruct each row object by walking the descriptor tree in the same depth-first order, assigning consecutive row cells to leaf descriptors and wrapping nested groups in plain objects.

Row disambiguation rules (§9.3 "first-unquoted-delimiter vs first-unquoted-colon") are unchanged: rows still contain only primitive values separated by the active delimiter, with no unquoted colons.
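Steps 2 to 4 can be sketched as below, assuming the header has already been parsed into a descriptor tree. The tree shape (a list of `(name, children)` tuples, with `children` set to `None` for leaves) and the names `leaf_paths` and `build_row` are assumptions made for this example. Cells are left as strings; primitive typing and quoted cells are handled elsewhere in a real decoder.

```python
def leaf_paths(desc: list, prefix: tuple = ()) -> list:
    """Ordered leaf fields as key paths, via a depth-first walk (step 2)."""
    paths = []
    for name, children in desc:
        if children is None:
            paths.append(prefix + (name,))
        else:
            paths.extend(leaf_paths(children, prefix + (name,)))
    return paths


def build_row(desc: list, cells: list) -> dict:
    """Rebuild one row object from its flat cells (steps 3 and 4)."""
    paths = leaf_paths(desc)
    if len(cells) != len(paths):
        # Strict-mode arity check against the leaf count, not the
        # number of top-level fields.
        raise ValueError(f"expected {len(paths)} cells, got {len(cells)}")
    row: dict = {}
    for path, cell in zip(paths, cells):
        target = row
        for key in path[:-1]:
            target = target.setdefault(key, {})  # wrap nested groups
        target[path[-1]] = cell
    return row


# Descriptor tree for: orders[2]{id,customer{name,country},total}:
desc = [
    ("id", None),
    ("customer", [("name", None), ("country", None)]),
    ("total", None),
]
row = build_row(desc, "1,Alice,DK,99".split(","))
```

Here `row` keeps the header's field order, with `customer` rebuilt as a nested object holding `name` and `country`.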

Examples

Before (current spec)

orders[2]:
  - id: 1
    customer:
      name: Alice
      country: DK
    total: 99
  - id: 2
    customer:
      name: Bob
      country: UK
    total: 149

After (proposed)

orders[2]{id,customer{name,country},total}:
  1,Alice,DK,99
  2,Bob,UK,149

Multiple nested columns

shipments[1]{id,sender{name,city},receiver{name,city}}:
  s1,ACME,Berlin,Globex,Oslo

Non-uniform column falls back

{ "entries": [
  { "id": 1, "meta": { "a": 1 } },
  { "id": 2, "meta": { "b": 2 } }
]}

meta has different keys per row, so the array is encoded in expanded list form (§9.4) exactly as in TOON v3.

Alternative delimiter (pipe)

orders[2|]{id|customer{name|country}|total}:
  1|Alice|DK|99
  2|Bob|UK|149

Drawbacks

  • Header lines become denser and slightly harder for humans to scan when many columns are nested.
  • Decoder parsing of the fields segment needs brace-depth tracking, a small increase over the current flat split. (Reference implementation: ~50 LOC.)
  • Adds one more case to §9.3's detection logic; encoders must recurse over each column to classify it as primitive / nested-uniform / non-uniform.
  • Introduces a second mechanism for compressing nested structure alongside key folding (§13.4), which could feel redundant for the subset of shapes where both apply.
  • Decoders built for v3 will hard-fail on documents using the new syntax (unmatched {). That is the correct fail-closed behaviour for a format extension, but it does mean v3.0 and post-RFC implementations cannot freely mix.

Alternatives Considered

Alternative 1: Key folding (§13.4) only

Key folding collapses single-key chains into dotted paths. It does not help here because customer is not a single-key wrapper — it has multiple sibling keys (name, country). The two mechanisms are orthogonal: folding targets chains, this RFC targets groups.

Alternative 2: Per-row {} inline objects inside tabular rows

We could allow row cells to contain literal {name:Alice,country:DK} objects. This keeps the header flat but repeats the structural markers on every row and reintroduces unquoted colons inside rows, breaking the §9.3 row disambiguation rule. Strictly worse for tokens and significantly worse for parseability.

Alternative 3: New header marker (e.g. key[N]**{…}:)

Introducing a new sigil to distinguish "nested-aware" tables is more invasive grammatically and provides no extra expressiveness over recursive fields-seg. Reusing existing characters ({, }, the active delimiter) means no new quoting or escape rules are required.

Alternative 4: Do nothing

Not acceptable for so common a real-world shape. On the benchmarked dataset of 500 shipment records with nested sender/receiver/dimensions sub-objects, TOON v3 uses more tokens than JSON compact (58,701 vs 46,697), which is precisely the scenario TOON is supposed to win. Leaving this unaddressed forfeits the format's strongest feature on a very common input shape.

Impact on Implementations

  • Reference implementation: Implemented in toon-format/toon#296. Encoder: ~130 LOC added across encode/nested-fields.ts, encode/encoders.ts, encode/primitives.ts. Decoder: ~80 LOC in decode/parser.ts (recursive descriptor parser + brace matcher). All 474 pre-existing tests pass unchanged; 8 new tests cover encode, decode, round-trip, multi-field, and non-uniform fallback.
  • Community implementations: Need to add recursive parsing of fields-seg (brace-depth tracking) and a depth-first row-value walker. No new tokens, escapes, or delimiters. A conservative implementation MAY ship decoder support before encoder support.
  • Backward compatibility: Documents produced by a pre-RFC encoder remain valid. When no column is a uniform nested object, every column falls under rule (a), so a post-RFC encoder emits exactly the same header and rows as TOON v3. The reference implementation confirms byte-identical output on all non-nested benchmark datasets. A pre-RFC decoder encountering a nested header will fail at header parse time (unmatched {); this is fail-closed behaviour, which is the correct posture for a format extension.
  • Migration path: Encoders enable the feature via an opt-in option (nestedTables: true in the reference implementation). No action required by users of v3-encoded documents.

Migration Strategy

Not a breaking change — no migration required.

For Implementers

  1. Update the fields-segment parser to track brace depth and build a recursive descriptor tree instead of a flat string array.
  2. Compute the leaf field list via depth-first walk; use it in place of the flat field list for row-arity validation.
  3. On decode, walk the descriptor tree in the same order, wrapping nested groups in plain objects.
  4. (Encoder only, optional) Add a nestedTables option that extends tabular detection to allow uniform nested-object columns.

For Users

No action required. Existing TOON documents remain valid and continue to decode under both pre-RFC and post-RFC implementations.

Test Cases

{
  "name": "nested tabular — basic",
  "input": {
    "orders": [
      { "id": 1, "customer": { "name": "Alice", "country": "DK" }, "total": 99 },
      { "id": 2, "customer": { "name": "Bob",   "country": "UK" }, "total": 149 }
    ]
  },
  "expected": "orders[2]{id,customer{name,country},total}:\n  1,Alice,DK,99\n  2,Bob,UK,149",
  "note": "Uniform nested object column collapses into the header; rows remain flat."
}

{
  "name": "nested tabular — multiple nested columns",
  "input": {
    "shipments": [
      { "id": "s1", "sender": { "name": "ACME", "city": "Berlin" }, "receiver": { "name": "Globex", "city": "Oslo" } }
    ]
  },
  "expected": "shipments[1]{id,sender{name,city},receiver{name,city}}:\n  s1,ACME,Berlin,Globex,Oslo",
  "note": "Two sibling nested columns; row values are laid out depth-first."
}

{
  "name": "nested tabular — non-uniform fallback",
  "input": {
    "entries": [
      { "id": 1, "meta": { "a": 1 } },
      { "id": 2, "meta": { "b": 2 } }
    ]
  },
  "note": "Different keys in the nested object per row — array MUST fall back to expanded list form (§9.4) exactly as in v3."
}

Affected Specification Sections

  • §6 Header Syntax — generalise the fields-seg / fieldname ABNF productions to allow recursion.
  • §9.3 Arrays of Objects — Tabular Form — add rule (b) (nested-uniform columns) to detection; describe depth-first row value layout and the revised strict-mode row-arity check.
  • §9.4 Mixed / Non-Uniform Arrays — Expanded List — add a cross-reference noting that a column failing both rules (a) and (b) triggers fallback to §9.4 for the whole array.
  • §13.4 Key Folding and Path Expansion — one-line note that nested field groups compose cleanly with key folding applied to the header's key prefix.
  • CHANGELOG.md — entry under the next minor version.

Unresolved Questions

  1. Hard depth limit? Should the spec cap nesting depth (e.g. 2 or 3 levels) or leave it unconstrained like row count? The reference implementation currently caps at 2 levels as a heuristic, but the spec itself could impose no hard limit, mirroring how §9.3 treats row count.
  2. Empty nested groups. This RFC disallows field{}; an alternative is to allow it as a synonym for "this column is always {}". Costs one more grammar special case for marginal benefit.
  3. Interaction with key folding (§13.4). Folding applies to the key prefix of a header only, so it should compose cleanly, but the interaction is worth an explicit note.
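If a depth cap is adopted, a decoder could enforce it with a single pass over the fields segment. The sketch below is hypothetical: `header_depth` and `check_depth` are illustrative names, the sketch assumes unquoted keys, and it counts a flat `{a,b}` list as zero nested levels, which may differ from how the reference implementation counts its 2-level cap.

```python
def header_depth(fields_seg: str) -> int:
    """Maximum brace nesting in a fields-seg; a flat '{a,b}' has depth 1.

    Assumption: keys are unquoted, so every brace is structural.
    """
    depth = deepest = 0
    for ch in fields_seg:
        if ch == "{":
            depth += 1
            deepest = max(deepest, depth)
        elif ch == "}":
            depth -= 1
            if depth < 0:
                raise ValueError("unmatched '}'")
    if depth != 0:
        raise ValueError("unmatched '{'")
    return deepest


def check_depth(fields_seg: str, max_nested_levels: int) -> None:
    """Raise ValueError when nesting exceeds the configured cap."""
    nested_levels = header_depth(fields_seg) - 1
    if nested_levels > max_nested_levels:
        raise ValueError(
            f"{nested_levels} nested levels exceed the cap of {max_nested_levels}"
        )
```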

Additional Context

  • Reference implementation: toon-format/toon#296 — maintainer requested this RFC before reviewing the implementation PR.
  • Related but complementary: toon-format/spec#45 ("Object Schema Headers / Nest-Collapse for Keyed Object Collections") targets Record<string, Object> containers; this RFC targets arrays of uniform objects. The two proposals could land independently.
  • Not related: toon-format/spec#31 (type annotations) was closed; this RFC does not add or rely on column type hints.
  • Target version: v3.1 (next minor) per §20 — backward-compatible addition.
