Type of Change
Summary
Extend the tabular form (§9.3) so that a column may itself be a uniform nested object, declared recursively in the header as field{sub1,sub2}. Rows remain flat delimiter-separated lines; the header alone encodes the nested shape. This generalises the existing {f1,f2} field list into a recursive form and unlocks table-collapse savings for the very common "array of records with a grouped sub-object" shape (addresses, customers, dimensions, geo, …).
Motivation
Problem
TOON v3's tabular form (key[N]{fields}:) is the format's strongest compression mechanism, but it only applies when every column value is a JSON primitive. As soon as one field in the row objects is itself a uniform nested object, detection fails and the entire array falls back to expanded list form (§9.4), repeating every key name N times.
Real-world payloads from APIs, event logs, and catalogue data very often have this shape: a flat list of records where one or two fields group related attributes. Today:
orders[2]:
  - id: 1
    customer:
      name: Alice
      country: DK
    total: 99
  - id: 2
    customer:
      name: Bob
      country: UK
    total: 149
Every key in customer repeats N times; the list-item markers add further overhead. The shape is already uniform — the format just has no way to declare it.
Benefits
Measured with the reference TypeScript implementation on 7 datasets, including a uniform-nested dataset of 500 shipment records with sender/receiver/dimensions sub-objects. Token counts for three of the datasets:
| Dataset | TOON v3 | TOON + nested | JSON compact | Δ vs TOON | Δ vs JSON |
|---|---:|---:|---:|---:|---:|
| uniform-nested | 58,701 | 27,111 | 46,697 | −53.8% | −41.9% |
| nested-config | 620 | 591 | 558 | −4.7% | — |
| tabular (no nesting) | 49,919 | 49,919 | 79,059 | 0% | −36.9% |
For data without nested objects the output is byte-identical to TOON v3; the feature has zero overhead when it does not apply.
Detailed Design
Proposed Syntax
orders[2]{id,customer{name,country},total}:
  1,Alice,DK,99
  2,Bob,UK,149
The header nests one or more field groups inside {...}; row lines are unchanged in shape — plain delimiter-separated primitives at depth +1.
Grammar Changes (§6 Header Syntax)
Generalise the fields-seg / fieldname productions so field entries may themselves contain a nested field group:
fields-seg = "{" field-entry *( delim field-entry ) "}"
field-entry = fieldname [ fields-seg ]
fieldname = key
All other header rules (bracket segment, delimiter selection, trailing colon, §7.3 key encoding) are unchanged. The delimiter in a nested fields-seg MUST be the same active delimiter as the enclosing bracket segment, matching the existing rule for flat field lists.
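As an illustration of the recursive productions, the following TypeScript sketch parses a fields segment into a descriptor tree. It splits on the active delimiter only at brace depth 0 and recurses into each nested group. The `FieldDescriptor` type and the function names are hypothetical (not taken from the reference implementation), and the sketch assumes unquoted field names, so quoted keys per §7.3 are not handled. It also rejects `field{}`, per the encoding rules below.

```typescript
// Hypothetical descriptor: a leaf field, or a field with a nested field group.
type FieldDescriptor = { name: string; children?: FieldDescriptor[] };

// Parse the text between a fields-seg's outer braces into descriptors.
// Assumption: field names are unquoted (no §7.3 quoting or escapes).
function parseFieldEntries(body: string, delim: string): FieldDescriptor[] {
  const entries: string[] = [];
  let depth = 0;
  let start = 0;
  for (let i = 0; i < body.length; i++) {
    const ch = body[i];
    if (ch === "{") {
      depth++;
    } else if (ch === "}") {
      depth--;
      if (depth < 0) throw new Error(`unmatched '}' at offset ${i}`);
    } else if (ch === delim && depth === 0) {
      // Only a delimiter at depth 0 separates top-level field entries.
      entries.push(body.slice(start, i));
      start = i + 1;
    }
  }
  if (depth !== 0) throw new Error("unmatched '{' in fields segment");
  entries.push(body.slice(start));

  return entries.map((entry): FieldDescriptor => {
    const open = entry.indexOf("{");
    if (open === -1) {
      if (entry === "") throw new Error("empty field name");
      return { name: entry };
    }
    if (!entry.endsWith("}")) throw new Error(`malformed field entry: ${entry}`);
    const name = entry.slice(0, open);
    const inner = entry.slice(open + 1, -1);
    if (name === "") throw new Error("empty field name");
    if (inner === "") throw new Error(`empty nested group not permitted: ${name}{}`);
    // field-entry = fieldname [ fields-seg ]: recurse into the nested group.
    return { name, children: parseFieldEntries(inner, delim) };
  });
}

// Parse a complete fields-seg such as "{id,customer{name,country},total}".
function parseFieldsSeg(seg: string, delim = ","): FieldDescriptor[] {
  if (!seg.startsWith("{") || !seg.endsWith("}")) {
    throw new Error("fields segment must be wrapped in braces");
  }
  return parseFieldEntries(seg.slice(1, -1), delim);
}
```

For example, `parseFieldsSeg("{id|customer{name|country}|total}", "|")` yields the same tree as the comma form, because the nested group uses the same active delimiter.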
Encoding Rules (§9.3)
Extend tabular detection. An array of objects is eligible for tabular encoding when, for every top-level field of the row objects, either:
- a. Every row has a primitive at that key (existing rule), or
- b. Every row has a nested-uniform object at that key, where nested-uniform is defined recursively as: non-empty, same set of keys in every row, and each value at each key is itself either a primitive or nested-uniform.
When satisfied:
- The header lists top-level fields in first-row encounter order. Fields matching rule (a) are emitted as a bare fieldname; fields matching rule (b) are emitted as fieldname{...subfields...}, applying the same rules recursively.
- Row values are written as a single delimiter-separated line at depth +1. Values are laid out by a depth-first, pre-order walk of the header descriptors (left-to-right, nested groups expanded in place). The flattened arity of each row MUST equal the total number of leaf fields in the header.
- If any field fails both (a) and (b), encoders MUST fall back to the expanded list form (§9.4) for the entire array, exactly as today.
Empty nested groups ({}) are NOT permitted; an object field with no keys disqualifies the array from the nested-uniform rule for that column and forces expanded list form.
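A minimal sketch of the extended detection, assuming JSON-like input in TypeScript. `detectDescriptors` is a hypothetical name: it returns the header descriptor tree when every column satisfies rule (a) or rule (b), and `null` when the encoder must fall back to §9.4. It reads rule (b) strictly per column, so a sub-key must be primitive in every row or nested-uniform in every row; the RFC text implies this consistency but does not spell it out.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };
// Hypothetical descriptor: a leaf field, or a field with a nested field group.
type FieldDescriptor = { name: string; children?: FieldDescriptor[] };

const isPrimitive = (v: Json): boolean => v === null || typeof v !== "object";
const isPlainObject = (v: Json): v is JsonObject =>
  v !== null && typeof v === "object" && !Array.isArray(v);

// Returns descriptors if every column is primitive (rule a) or
// nested-uniform (rule b) across all rows; null means "use §9.4".
function detectDescriptors(rows: JsonObject[]): FieldDescriptor[] | null {
  if (rows.length === 0) return null;
  // First-row encounter order (JS lists integer-like keys first).
  const keys = Object.keys(rows[0]);
  if (keys.length === 0) return null; // no columns, or an empty nested group ({})
  const hasOwn = (o: JsonObject, k: string): boolean =>
    Object.prototype.hasOwnProperty.call(o, k);
  // Every row must carry exactly the same key set.
  for (const row of rows) {
    if (Object.keys(row).length !== keys.length || !keys.every((k) => hasOwn(row, k))) {
      return null;
    }
  }
  const fields: FieldDescriptor[] = [];
  for (const key of keys) {
    const values = rows.map((row) => row[key]);
    if (values.every(isPrimitive)) {
      fields.push({ name: key }); // rule (a)
    } else if (values.every(isPlainObject)) {
      // rule (b): the sub-objects must themselves pass the same test.
      const children = detectDescriptors(values as JsonObject[]);
      if (children === null) return null;
      fields.push({ name: key, children });
    } else {
      return null; // arrays, or a mix of kinds across rows
    }
  }
  return fields;
}
```

On the non-uniform `entries` example in this RFC, the recursive call sees `{ a }` and `{ b }`, the key sets differ, and the whole array falls back to expanded list form.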
Decoding Rules
A decoder encountering a nested field group in a header MUST:
- Parse field descriptors recursively, tracking brace depth so that nested {…} groups are matched correctly.
- Compute the ordered list of leaf field names via a depth-first walk of the descriptor tree.
- For each row line at depth +1, split on the active delimiter exactly as today. In strict mode, the number of cells MUST equal the number of leaf fields; this replaces the existing flat fields.length equality check.
- Reconstruct each row object by walking the descriptor tree in the same depth-first order, assigning consecutive row cells to leaf descriptors and wrapping nested groups in plain objects.
Row disambiguation rules (§9.3 "first-unquoted-delimiter vs first-unquoted-colon") are unchanged: rows still contain only primitive values separated by the active delimiter, with no unquoted colons.
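The decoding steps above can be sketched in TypeScript as follows. The helper names are hypothetical. The sketch behaves like strict mode, expects row content with its indentation already stripped, splits naively on the delimiter (it assumes no quoted cells), and uses a simplified primitive parser rather than TOON's full quoting and number rules.

```typescript
// Hypothetical descriptor: a leaf field, or a field with a nested field group.
type FieldDescriptor = { name: string; children?: FieldDescriptor[] };
type Primitive = string | number | boolean | null;
type Decoded = Primitive | { [key: string]: Decoded };

// Ordered leaf field names via a depth-first walk of the descriptor tree.
function leafNames(fields: FieldDescriptor[]): string[] {
  return fields.flatMap((f) => (f.children ? leafNames(f.children) : [f.name]));
}

// Simplified cell parser (assumption): no quoted strings or escapes.
function parseCell(cell: string): Primitive {
  if (cell === "true") return true;
  if (cell === "false") return false;
  if (cell === "null") return null;
  if (/^-?(0|[1-9]\d*)(\.\d+)?([eE][+-]?\d+)?$/.test(cell)) return Number(cell);
  return cell;
}

// Decode one row (indentation already removed) against the header descriptors.
function decodeRow(line: string, fields: FieldDescriptor[], delim = ","): { [key: string]: Decoded } {
  const cells = line.split(delim); // naive split: assumes no quoted cells contain the delimiter
  const leafCount = leafNames(fields).length;
  // Strict-mode arity check: cells must match leaf fields, not top-level fields.
  if (cells.length !== leafCount) {
    throw new Error(`row has ${cells.length} values but the header declares ${leafCount} leaf fields`);
  }
  let next = 0;
  // Walk the tree in the same depth-first order, consuming one cell per leaf
  // and wrapping each nested group in a plain object.
  const build = (descs: FieldDescriptor[]): { [key: string]: Decoded } => {
    const obj: { [key: string]: Decoded } = {};
    for (const d of descs) {
      obj[d.name] = d.children ? build(d.children) : parseCell(cells[next++]);
    }
    return obj;
  };
  return build(fields);
}
```

With the `orders` header from this RFC, `decodeRow("1,Alice,DK,99", fields)` rebuilds `{ id: 1, customer: { name: "Alice", country: "DK" }, total: 99 }`, while a three-cell row is rejected because the header declares four leaf fields.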
Examples
Before (current spec)
orders[2]:
  - id: 1
    customer:
      name: Alice
      country: DK
    total: 99
  - id: 2
    customer:
      name: Bob
      country: UK
    total: 149
After (proposed)
orders[2]{id,customer{name,country},total}:
  1,Alice,DK,99
  2,Bob,UK,149
Multiple nested columns
shipments[1]{id,sender{name,city},receiver{name,city}}:
  s1,ACME,Berlin,Globex,Oslo
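The depth-first, pre-order layout that produces a row like this can be sketched as follows. `flattenRow` and `encodeRowLine` are hypothetical helpers; the sketch assumes the descriptors already came out of tabular detection and that no value needs TOON quoting, so it simply stringifies each leaf.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };
// Hypothetical descriptor: a leaf field, or a field with a nested field group.
type FieldDescriptor = { name: string; children?: FieldDescriptor[] };

// Depth-first, pre-order walk: leaves in header order, nested groups expanded in place.
function flattenRow(row: JsonObject, fields: FieldDescriptor[]): Json[] {
  const cells: Json[] = [];
  const walk = (obj: JsonObject, descs: FieldDescriptor[]): void => {
    for (const d of descs) {
      const value = obj[d.name];
      if (d.children) {
        walk(value as JsonObject, d.children); // detection guarantees a plain object here
      } else {
        cells.push(value);
      }
    }
  };
  walk(row, fields);
  return cells;
}

// Assumption: no value needs quoting, so each leaf is stringified as-is.
function encodeRowLine(row: JsonObject, fields: FieldDescriptor[], delim = ","): string {
  return flattenRow(row, fields).map((v) => String(v)).join(delim);
}
```

The number of cells produced always equals the number of leaf descriptors, which is exactly the arity that a strict decoder checks.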
Non-uniform column falls back
{ "entries": [
  { "id": 1, "meta": { "a": 1 } },
  { "id": 2, "meta": { "b": 2 } }
]}
meta has different keys per row, so the array is encoded in expanded list form (§9.4) exactly as in TOON v3.
Alternative delimiter (pipe)
orders[2|]{id|customer{name|country}|total}:
  1|Alice|DK|99
  2|Bob|UK|149
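Emitting such a header from a descriptor tree is a short recursive function. This sketch uses hypothetical names and assumes every key is a valid unquoted key. It writes the active delimiter in the bracket segment (leaving a comma implicit, as in TOON v3) and between entries at every nesting level, as the grammar change requires.

```typescript
// Hypothetical descriptor: a leaf field, or a field with a nested field group.
type FieldDescriptor = { name: string; children?: FieldDescriptor[] };
type Delimiter = "," | "|" | "\t";

// Serialise a fields segment; the same delimiter is used at every nesting level.
function formatFieldsSeg(fields: FieldDescriptor[], delim: Delimiter): string {
  const entries = fields.map((f) => (f.children ? f.name + formatFieldsSeg(f.children, delim) : f.name));
  return `{${entries.join(delim)}}`;
}

// Build a full tabular header line of the form key[N<marker>]{fields}:
// Assumption: the key and all field names are valid unquoted keys.
function formatTabularHeader(key: string, length: number, fields: FieldDescriptor[], delim: Delimiter = ","): string {
  const marker = delim === "," ? "" : delim; // comma is implicit in the bracket segment
  return `${key}[${length}${marker}]${formatFieldsSeg(fields, delim)}:`;
}
```

Because nested groups reuse the enclosing delimiter, switching to `|` changes every separator in the header at once and never mixes delimiters within one line.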
Drawbacks
- Header lines become denser and slightly harder for humans to scan when many columns are nested.
- Decoder parsing of the fields segment needs brace-depth tracking, a small increase over the current flat split. (Reference implementation: ~50 LOC.)
- Adds one more case to §9.3's detection logic; encoders must recurse over each column to classify it as primitive / nested-uniform / non-uniform.
- Introduces a second mechanism for compressing nested structure alongside key folding (§13.4), which could feel redundant for the subset of shapes where both apply.
- Decoders built for v3 will hard-fail on documents using the new syntax (unmatched {). That is the correct fail-closed behaviour for a format extension, but it does mean documents that use nested headers cannot be exchanged with v3.0 decoders.
Alternatives Considered
Alternative 1: Key folding (§13.4) only
Key folding collapses single-key chains into dotted paths. It does not help here because customer is not a single-key wrapper — it has multiple sibling keys (name, country). The two mechanisms are orthogonal: folding targets chains, this RFC targets groups.
Alternative 2: Per-row {} inline objects inside tabular rows
We could allow row cells to contain literal {name:Alice,country:DK} objects. This keeps the header flat but repeats the structural markers on every row and reintroduces unquoted colons inside rows, breaking the §9.3 row disambiguation rule. Strictly worse for tokens and significantly worse for parseability.
Alternative 3: New header marker (e.g. key[N]**{…}:)
Introducing a new sigil to distinguish "nested-aware" tables is more invasive grammatically and provides no extra expressiveness over recursive fields-seg. Reusing existing characters ({, }, the active delimiter) means no new quoting or escape rules are required.
Alternative 4: Do nothing
Not acceptable for the dominant real-world shape. On the benchmarked dataset of 500 shipment records with nested sender/receiver/dimensions sub-objects, TOON v3 is worse than JSON compact (58,701 vs 46,697 tokens) — precisely the scenario TOON is supposed to win at. Leaving this on the table concedes the format's strongest feature on a very common input shape.
Impact on Implementations
- Reference implementation: Implemented in toon-format/toon#296. Encoder: ~130 LOC added across encode/nested-fields.ts, encode/encoders.ts, encode/primitives.ts. Decoder: ~80 LOC in decode/parser.ts (recursive descriptor parser + brace matcher). All 474 pre-existing tests pass unchanged; 8 new tests cover encode, decode, round-trip, multi-field, and non-uniform fallback.
- Community implementations: Need to add recursive parsing of fields-seg (brace-depth tracking) and a depth-first row-value walker. No new tokens, escapes, or delimiters. A conservative implementation MAY ship decoder support before encoder support.
- Backward compatibility: Documents produced by a pre-RFC encoder remain valid. When no column is a uniform nested object, a post-RFC encoder still emits exactly the same header and rows as TOON v3 under rule (a); the reference implementation confirms byte-identical output on all non-nested benchmark datasets. A pre-RFC decoder encountering a nested header will fail at header parse time (unmatched {); this is fail-closed behaviour, which is the correct posture for a format extension.
- Migration path: Encoders enable the feature via an opt-in option (nestedTables: true in the reference implementation). No action required by users of v3-encoded documents.
Migration Strategy
Not a breaking change — no migration required.
For Implementers
- Update the fields-segment parser to track brace depth and build a recursive descriptor tree instead of a flat string array.
- Compute the leaf field list via depth-first walk; use it in place of the flat field list for row-arity validation.
- On decode, walk the descriptor tree in the same order, wrapping nested groups in plain objects.
- (Encoder only, optional) Add a nestedTables option that extends tabular detection to allow uniform nested-object columns.
For Users
No action required. Existing TOON documents remain valid and continue to decode under both pre-RFC and post-RFC implementations.
Test Cases
{
  "name": "nested tabular — basic",
  "input": {
    "orders": [
      { "id": 1, "customer": { "name": "Alice", "country": "DK" }, "total": 99 },
      { "id": 2, "customer": { "name": "Bob", "country": "UK" }, "total": 149 }
    ]
  },
  "expected": "orders[2]{id,customer{name,country},total}:\n  1,Alice,DK,99\n  2,Bob,UK,149",
  "note": "Uniform nested object column collapses into the header; rows remain flat."
}
{
  "name": "nested tabular — multiple nested columns",
  "input": {
    "shipments": [
      { "id": "s1", "sender": { "name": "ACME", "city": "Berlin" }, "receiver": { "name": "Globex", "city": "Oslo" } }
    ]
  },
  "expected": "shipments[1]{id,sender{name,city},receiver{name,city}}:\n  s1,ACME,Berlin,Globex,Oslo",
  "note": "Two sibling nested columns; row values are laid out depth-first."
}
{
  "name": "nested tabular — non-uniform fallback",
  "input": {
    "entries": [
      { "id": 1, "meta": { "a": 1 } },
      { "id": 2, "meta": { "b": 2 } }
    ]
  },
  "note": "Different keys in the nested object per row — array MUST fall back to expanded list form (§9.4) exactly as in v3."
}
Affected Specification Sections
- §6 Header Syntax — generalise the fields-seg / fieldname ABNF productions to allow recursion.
- §9.3 Arrays of Objects — Tabular Form — add rule (b) (nested-uniform columns) to detection; describe depth-first row value layout and the revised strict-mode row-arity check.
- §9.4 Mixed / Non-Uniform Arrays — Expanded List — add a cross-reference noting that a column failing both rules (a) and (b) triggers fallback to §9.4 for the whole array.
- §13.4 Key Folding and Path Expansion — one-line note that nested field groups compose cleanly with key folding applied to the header's key prefix.
- CHANGELOG.md — entry under the next minor version.
Unresolved Questions
- Hard depth limit? Should the spec cap nesting depth (e.g. 2 or 3 levels) or leave it unconstrained like row count? The reference implementation currently caps at 2 levels as a heuristic, but the spec itself could impose no hard limit, mirroring how §9.3 treats row count.
- Empty nested groups. This RFC disallows field{}; an alternative is to allow it as a synonym for "this column is always {}". That would cost one more grammar special case for marginal benefit.
- Interaction with key folding (§13.4). Folding applies to the key prefix of a header only, so it should compose cleanly, but the interaction is worth an explicit note.
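Any depth cap first needs a precise notion of depth. One possible convention, sketched below with hypothetical names, counts a flat field list as depth 0 and each nested group as one additional level. This RFC does not say whether it matches how the reference implementation counts its 2-level heuristic.

```typescript
// Hypothetical descriptor: a leaf field, or a field with a nested field group.
type FieldDescriptor = { name: string; children?: FieldDescriptor[] };

// Depth 0 for a flat field list; each nested group adds one level.
function groupDepth(fields: FieldDescriptor[]): number {
  let max = 0;
  for (const f of fields) {
    if (f.children) max = Math.max(max, 1 + groupDepth(f.children));
  }
  return max;
}

// An encoder-side cap: above the limit, fall back to expanded list form.
function withinDepthCap(fields: FieldDescriptor[], maxDepth: number): boolean {
  return groupDepth(fields) <= maxDepth;
}
```

Under this convention, the orders header in this RFC has depth 1. A spec-level cap would also need decoders to agree on the same counting rule.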
Additional Context
- Reference implementation: toon-format/toon#296 — maintainer requested this RFC before reviewing the implementation PR.
- Related but complementary: toon-format/spec#45 ("Object Schema Headers / Nest-Collapse for Keyed Object Collections") targets Record<string, Object> containers; this RFC targets arrays of uniform objects. The two proposals could land independently.
- Not related: toon-format/spec#31 (type annotations) was closed; this RFC does not add or rely on column type hints.
- Target version: v3.1 (next minor) per §20 — backward-compatible addition.