NTON Specification v0.03

1. Introduction

Nested Table Optimized Notation (NTON) is a schema-driven serialization format. It achieves high data density by defining data structures (DEF) and recurring values (REF) in a header, allowing the data body to use positional encoding with optional field names for clarity.

NTON is designed for LLM contexts where token efficiency matters, while maintaining robustness and evolvability.

v0.03 Focus: Eliminates off-by-one errors and enables truncation detection through mandatory naming rules and optional metadata.

2. Document Structure

An NTON document consists of three optional sections, which MUST appear in this order:

  1. Definitions (DEF)
  2. References (REF)
  3. Data Stream (STREAM)

2.1 Encoding

NTON files MUST be encoded in UTF-8.

3. The Header

3.1 Type Definitions (DEF)

Types function as structs. They define the fields and their order in the data stream.

Syntax: DEF <TypeName>: { <field>, <field>:<Type>, ... }

  • Fields without a type default to Primitive (string, number, boolean, null).
  • Arrays are denoted by [].
  • Optional fields are denoted by ?.

Examples:

# Simple type
DEF User: { id, name, email }

# Optional fields
DEF User: { id, name, email?, phone? }

# Typed fields
DEF Group: { name, owner:User, members:User[] }

# Optional arrays
DEF Project: {
  id,
  name,
  status,
  active,
  manager_id?,
  budget,
  milestones:Milestone[]?
}
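To illustrate how a DEF statement maps to field metadata, here is a minimal Python sketch. The names `FieldSpec` and `parse_def` are hypothetical and not part of NTON. The sketch assumes well-formed input with no comments inside the braces, and it assumes `[]` only follows a named type, as in the examples above.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class FieldSpec:
    name: str
    type_name: Optional[str]  # None means the field is a Primitive
    is_array: bool
    optional: bool

# field-name, then optional ":Type" (with optional "[]"), then optional "?"
_FIELD_RE = re.compile(r"^(\w+)(?::(\w+)(\[\])?)?(\?)?$")

def parse_def(text: str) -> Tuple[str, List[FieldSpec]]:
    """Parse 'DEF Name: { a, b:T, c:T[]? }' into (type name, field specs)."""
    m = re.match(r"^\s*DEF\s+(\w+)\s*:\s*\{(.*)\}\s*$", text, re.DOTALL)
    if m is None:
        raise ValueError("not a DEF statement")
    fields = []
    for raw in m.group(2).split(","):
        raw = raw.strip()
        if not raw:
            continue  # tolerate a trailing comma
        fm = _FIELD_RE.match(raw)
        if fm is None:
            raise ValueError(f"bad field: {raw!r}")
        name, type_name, arr, opt = fm.groups()
        fields.append(FieldSpec(name, type_name, arr is not None, opt is not None))
    return m.group(1), fields
```

For example, `parse_def("DEF Group: { name, owner:User, members:User[] }")` yields the type name `Group` and three field specs, the last of which is marked as an array of `User`.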

3.2 Field Naming Rules (CRITICAL)

To prevent off-by-one errors and ambiguity, NTON enforces the following rules:

Rule 1: Optional fields MUST use named syntax when present.

DEF User: { id, name, email?, phone? }

# VALID:
{U1, "Alice", email="alice@example.com"}          # Named optional field
{U2, "Bob", email="bob@example.com", phone="+1-555-1234"}
{U3, "Carol"}                                      # Optional fields omitted

# INVALID:
{U4, "Dave", "dave@example.com"}                  # Ambiguous - is this email or phone?

Rule 2: Positional encoding can only be used for required fields.

DEF Task: { id, title, priority, hours? }

# VALID:
{T1, "Fix bug", 8}                    # Required fields positional, optional omitted
{T2, "Add feature", 5, hours=40}      # Required positional, optional named

# INVALID:
{T3, "Refactor", 3, 20}               # Last field MUST be named (hours=20)

Rationale: This eliminates ambiguity when optional fields are sparse or when data is truncated.
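One way a parser might enforce Rules 1 and 2 is sketched below in Python. The helper `bind_record` and its input shapes are hypothetical: it assumes the record has already been split into `(name, value)` pairs, with `name` set to `None` for positional values. It also assumes that a positional value fills the next required field that is still unset, which the rules above imply but do not state outright.

```python
from typing import Any, Dict, List, Optional, Tuple

Field = Tuple[str, bool]           # (field name, is_optional), in DEF order
Item = Tuple[Optional[str], Any]   # (name, value); name is None for positional values

def bind_record(fields: List[Field], items: List[Item]) -> Dict[str, Any]:
    """Map record items onto field names, enforcing Rules 1 and 2."""
    required = [name for name, optional in fields if not optional]
    known = {name for name, _ in fields}
    record: Dict[str, Any] = {}
    cursor = 0  # index of the next required field a positional value may fill
    for name, value in items:
        if name is None:
            while cursor < len(required) and required[cursor] in record:
                cursor += 1
            if cursor == len(required):
                raise ValueError("positional value has no required field left; "
                                 "optional fields must be named")
            name = required[cursor]
        elif name not in known:
            continue  # unknown field: ignored here (a real parser would warn)
        if name in record:
            raise ValueError(f"duplicate field: {name}")
        record[name] = value
    missing = [n for n in required if n not in record]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    for name, optional in fields:
        if optional:
            record.setdefault(name, None)  # omitted optional fields read as null
    return record
```

With the Task definition above, the record `{T2, "Add feature", 5, hours=40}` binds cleanly, while `{T3, "Refactor", 3, 20}` raises an error because the fourth positional value has no required field left to fill.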

3.3 Reference Tables (REF)

Reference tables allow string deduplication via variable substitution.

Syntax: REF <RefName>: { $<Var>: "<Value>", ... }

  • Variables MUST start with $.
  • Variables can be used anywhere a string value is expected.

Example:

REF Status: {
  $IP: "In Progress",
  $C: "Completed",
  $P: "Planning"
}

REF Departments: {
  $T: "Technology",
  $M: "Marketing",
  $F: "Finance"
}
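Variable substitution can be sketched in Python as a recursive walk over parsed data. The `Var` wrapper and `resolve_variables` function are hypothetical names. The sketch assumes that all REF tables share one flat variable namespace (the examples use `$IP` without a table prefix) and that the tokenizer has already wrapped variable tokens in `Var`, so a quoted string that happens to start with `$` is left alone.

```python
from dataclasses import dataclass
from typing import Any, Dict

@dataclass(frozen=True)
class Var:
    name: str  # variable token including the leading "$", e.g. "$IP"

def resolve_variables(value: Any, refs: Dict[str, str]) -> Any:
    """Recursively replace Var tokens with their REF values."""
    if isinstance(value, Var):
        if value.name not in refs:
            raise KeyError(f"undefined variable: {value.name}")
        return refs[value.name]
    if isinstance(value, list):
        return [resolve_variables(v, refs) for v in value]
    if isinstance(value, dict):
        return {k: resolve_variables(v, refs) for k, v in value.items()}
    return value  # plain strings, numbers, booleans, None
```

Merging the Status and Departments tables into a single `refs` dictionary and passing a parsed record through `resolve_variables` expands `$IP` to "In Progress" wherever it appears, including inside nested arrays.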

4. The Data Stream

4.1 Primitives

  • Booleans: T or F (also true or false)
  • Null: null, ~, or _
  • Strings: Unquoted if alphanumeric with no spaces. Double-quoted otherwise.
  • Numbers: Standard integer or floating-point: 42, 3.14, -17, 1.23e-4
  • Dates: ISO 8601 format: 2025-12-15 or "2025-12-15T10:30:00Z"
  • Variables: References to REF tables (e.g., $IP, $Alice)
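The primitive rules above could be applied to a single token roughly as follows. This Python sketch uses a hypothetical `parse_primitive` helper. It assumes escape sequences inside quoted strings are out of scope, and it leaves dates and `$` variables as raw text for a later stage to interpret.

```python
import re
from typing import Union

_NUMBER_RE = re.compile(r"^-?\d+(\.\d+)?([eE][+-]?\d+)?$")

def parse_primitive(token: str) -> Union[str, int, float, bool, None]:
    """Convert one unquoted or double-quoted token into a Python value."""
    if len(token) >= 2 and token[0] == '"' and token[-1] == '"':
        return token[1:-1]      # quoted: always a string (escapes not handled)
    if token in ("T", "true"):
        return True
    if token in ("F", "false"):
        return False
    if token in ("null", "~", "_"):
        return None
    if _NUMBER_RE.match(token):
        if any(c in token for c in ".eE"):
            return float(token)
        return int(token)
    return token                # bare string, date, or $variable kept as text
```

Note that a bare `T` or `F` is always read as a boolean under these rules, so a string with that exact value has to be quoted.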

4.2 Objects and Arrays

Objects

Objects are enclosed in { } and contain comma-separated values.

Pure Positional (Most Compact):

{U1, "Alice", "alice@example.com"}

Named Fields (Most Readable):

{id=U1, name="Alice", email="alice@example.com"}

Hybrid (Best Balance):

{U1, "Alice", email="alice@example.com"}  # Mix positional and named

Arrays

Arrays are enclosed in [ ] and contain comma-separated items.

[
  {U1, "Alice", "alice@example.com"},
  {U2, "Bob", "bob@example.com"}
]

Nested Structures

Objects and arrays can be nested to any depth using explicit delimiters.

{
  id=P001,
  name="Alpha Initiative",
  milestones=[
    {
      name="Design Phase",
      workers=[
        {W1, 65.50, "Alice"},
        {W2, 85.00, "Bob"}
      ]
    }
  ]
}

4.3 Whitespace Rules

Whitespace is FLEXIBLE:

  • Indentation is cosmetic only (for human readability)
  • Parser ignores leading/trailing whitespace
  • Line breaks are optional
  • Any consistent indentation style is acceptable (2 spaces, 4 spaces, tabs)

All of these are equivalent:

{U1, "Alice", "alice@example.com"}

{ U1, "Alice", "alice@example.com" }

{
  U1,
  "Alice",
  "alice@example.com"
}

4.4 Streams

A stream begins with a STREAM header that names the record type, followed by the records of that type.

Syntax: STREAM <TypeName> [(count=N)]: where the optional count=N declares how many records the stream contains.

Simple Stream:

STREAM User (count=3):
{U1, "Alice", email="alice@example.com"}
{U2, "Bob", email="bob@example.com"}
{U3, "Carol"}  # Optional field omitted

Without count (streaming contexts):

STREAM User:
{U1, "Alice", email="alice@example.com"}
{U2, "Bob", email="bob@example.com"}

Benefit: When count is declared, parsers can detect truncation by comparing the actual number of records against the declared count.
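The count comparison can be sketched in Python as follows. The function names `parse_stream_header` and `count_matches` are hypothetical. The regular expression is an assumption based on the header syntax shown above and tolerates extra spaces around the parentheses.

```python
import re
from typing import Optional, Tuple

_HEADER_RE = re.compile(
    r"^STREAM\s+(\w+)(?:\s*\(\s*count\s*=\s*(\d+)\s*\))?\s*:\s*$")

def parse_stream_header(line: str) -> Tuple[str, Optional[int]]:
    """Return (type name, declared count or None) for a STREAM header line."""
    m = _HEADER_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a STREAM header: {line!r}")
    count = int(m.group(2)) if m.group(2) is not None else None
    return m.group(1), count

def count_matches(declared: Optional[int], actual: int) -> bool:
    """True when no count was declared or the declared count equals the actual one."""
    return declared is None or declared == actual
```

A parser would call `count_matches` once the stream body has been read and emit a warning when it returns `False`.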

Nested Stream:

STREAM Project:
{
  P001,
  "Alpha Initiative",
  status=$IP,
  active=T,
  budget=500000,
  milestones=[
    {name="Design", date=2025-12-15, completion=1.0},
    {name="Development", date=2026-01-20, completion=0.5}
  ]
}

4.5 Comments

# Line comment

{U1, "Alice"}  # Inline comment

/*
   Block comment
   spanning multiple lines
*/
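Comments have to be removed without touching `#` or `/*` characters that appear inside quoted strings. A minimal Python sketch of that step follows. The `strip_comments` name is hypothetical, and the handling of backslash escapes inside strings is an assumption, since this specification does not define string escapes.

```python
def strip_comments(text: str) -> str:
    """Remove # line comments and /* */ block comments outside double-quoted strings."""
    out = []
    i, n = 0, len(text)
    in_string = False
    while i < n:
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                out.append(text[i + 1])  # assumed escape: keep next char verbatim
                i += 1
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
            out.append(ch)
        elif ch == "#":
            while i < n and text[i] != "\n":
                i += 1
            continue                     # the newline itself is kept
        elif text.startswith("/*", i):
            end = text.find("*/", i + 2)
            if end == -1:
                raise ValueError("unterminated block comment")
            i = end + 2
            continue
        else:
            out.append(ch)
        i += 1
    return "".join(out)
```

Running this before tokenization keeps the rest of the parser free of comment handling.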

4.6 Truncation Markers

Use ... to explicitly indicate intentional truncation or partial data display.

# Showing first 2 of 1000 records
STREAM User (count=1000):
{U1, "Alice", email="alice@example.com"}
{U2, "Bob", email="bob@example.com"}
...

Arrays with truncation:

workers=[
  {W1, 65.50, manager_name="Alice"},
  {W2, 85.00, manager_name="Bob"},
  ...
]

Benefit: Distinguishes intentional truncation from parsing errors.
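The distinction between intentional and accidental truncation could be made along these lines. This is a Python sketch with a hypothetical `stream_status` helper and hypothetical status labels. It assumes one record per line, as in the simple stream examples, and treats a trailing `...` line as the explicit marker.

```python
from typing import List, Optional

def stream_status(record_lines: List[str], declared: Optional[int]) -> str:
    """Classify a stream body by its truncation state."""
    lines = [ln.strip() for ln in record_lines if ln.strip()]
    explicit = bool(lines) and lines[-1] == "..."
    actual = len(lines) - (1 if explicit else 0)
    if explicit:
        return "truncated-explicit"      # author marked the cut with "..."
    if declared is not None and actual < declared:
        return "truncated-unexpected"    # fewer records than count promised
    if declared is not None and actual > declared:
        return "count-mismatch"          # more records than declared
    return "complete"
```

Applied to the User stream above (two records followed by `...` with `count=1000`), the result is `"truncated-explicit"`. The same two records without the marker would be reported as `"truncated-unexpected"`.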

5. Grammar (ABNF Summary)

document      = *definition *reference *stream
definition    = "DEF" SP type-name ":" SP "{" field-list "}" LF
field-list    = field *("," field)
field         = field-name [":" type-name ["[]"]] ["?"]
reference     = "REF" SP ref-name ":" SP "{" var-list "}" LF
var-list      = var-entry *("," var-entry)
var-entry     = variable ":" string
stream        = "STREAM" SP type-name [SP "(" "count" "=" number ")"] ":" LF *record ["..." LF]
              ; trailing "..." indicates explicit truncation (section 4.6)
record        = object LF
object        = "{" [field-value *("," field-value)] "}"
field-value   = [field-name "="] value
              ; field-name "=" REQUIRED for optional fields when present
array         = "[" [value *("," value) ["," "..."]] "]"
              ; "..." indicates explicit truncation
value         = primitive / object / array / variable
primitive     = string / number / boolean / null / date
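The object, array, and value rules translate fairly directly into a recursive-descent parser. The Python sketch below is one possible reading of the grammar, not a reference implementation. The names `Obj`, `tokenize`, and `parse_value` are hypothetical. It assumes comments have already been removed, accepts trailing commas, keeps primitives as raw token text, and represents the `...` marker inside arrays with Python's `Ellipsis`.

```python
import re
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple

@dataclass
class Obj:
    """An object as an ordered list of (field name or None, value) pairs."""
    items: List[Tuple[Optional[str], Any]]

_PUNCT = {"{", "}", "[", "]", ",", "="}
# One token: the "..." marker, punctuation, a quoted string, or a bare run.
_TOKEN_RE = re.compile(r'\s*(\.\.\.|[{}\[\],=]|"(?:[^"\\]|\\.)*"|[^\s{}\[\],="]+)')

def tokenize(text: str) -> List[str]:
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = _TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def parse_value(tokens: List[str], i: int) -> Tuple[Any, int]:
    """Parse one value at tokens[i]; return (value, index after it)."""
    if i >= len(tokens):
        raise ValueError("unexpected end of input")
    tok = tokens[i]
    if tok == "{":
        return _parse_object(tokens, i + 1)
    if tok == "[":
        return _parse_array(tokens, i + 1)
    if tok in _PUNCT:
        raise ValueError(f"unexpected token {tok!r}")
    return tok, i + 1  # primitive or $variable, kept as raw text

def _parse_object(tokens, i):
    items = []
    while True:
        if i >= len(tokens):
            raise ValueError("unclosed '{'")
        if tokens[i] == "}":
            return Obj(items), i + 1
        name = None
        if i + 1 < len(tokens) and tokens[i + 1] == "=" and tokens[i] not in _PUNCT:
            name, i = tokens[i], i + 2
        value, i = parse_value(tokens, i)
        items.append((name, value))
        i = _after_separator(tokens, i, "}")

def _parse_array(tokens, i):
    values = []
    while True:
        if i >= len(tokens):
            raise ValueError("unclosed '['")
        if tokens[i] == "]":
            return values, i + 1
        if tokens[i] == "...":
            values.append(Ellipsis)  # explicit truncation marker
            i += 1
        else:
            value, i = parse_value(tokens, i)
            values.append(value)
        i = _after_separator(tokens, i, "]")

def _after_separator(tokens, i, closer):
    """Skip one comma; otherwise require the closing delimiter or end of input."""
    if i < len(tokens) and tokens[i] == ",":
        return i + 1
    if i < len(tokens) and tokens[i] != closer:
        raise ValueError(f"expected ',' or {closer!r}, got {tokens[i]!r}")
    return i
```

Calling `parse_value(tokenize(text), 0)` on a single record returns an `Obj` whose items preserve both order and any field names, which is the shape a later schema-binding step would need.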

6. Optional Fields and Schema Evolution

Optional Fields

Fields marked with ? in the DEF can be omitted or set to null.

IMPORTANT: When present, optional fields MUST use named syntax (see section 3.2).

DEF User: { id, name, email?, phone? }

# All valid:
{U1, "Alice", email="alice@example.com", phone="(555) 1234"}  # Named
{U2, "Bob", email="bob@example.com"}                           # Named
{U3, "Carol", email=null, phone=null}                          # Explicit null
{U4, "Dave"}                                                    # Omitted

# INVALID (v0.03):
{U5, "Eve", "eve@example.com", "(555) 1234"}  # Positional optional fields

Schema Evolution

Adding optional fields is backward compatible:

# Version 1
DEF User: { id, name, email }

# Version 2 (backward compatible)
DEF User: { id, name, email, phone?, created_at? }

Old data remains valid under the new schema.
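The compatibility claim can be checked mechanically. The Python sketch below uses a hypothetical `is_backward_compatible` helper over `(name, is_optional)` pairs. It is deliberately conservative: it also rejects a new schema that drops an old field, even though section 7 suggests parsers only warn about unknown fields.

```python
from typing import List, Tuple

Field = Tuple[str, bool]  # (field name, is_optional), in DEF order

def is_backward_compatible(old: List[Field], new: List[Field]) -> bool:
    """True if records written against `old` still bind the same way under `new`."""
    old_required = [name for name, optional in old if not optional]
    new_required = [name for name, optional in new if not optional]
    if old_required != new_required:
        return False  # positional values would bind to different fields
    new_names = {name for name, _ in new}
    # Every old field must still exist so that named uses remain known.
    return all(name in new_names for name, _ in old)
```

Under this check, Version 2 above is compatible with Version 1 because it only appends optional fields. Adding a new required field, or reordering the required ones, is reported as incompatible.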

7. Error Handling and Validation

Parsers MUST validate:

  • Required fields: Missing required fields are parse errors
  • Optional field syntax: Optional fields present without names are errors (v0.03+)
  • Unclosed delimiters: { [ without matching } ] are errors
  • Type mismatches: Attempt coercion, error if impossible
  • Record counts: Mismatch between declared count and actual count triggers warning

Parsers SHOULD be forgiving:

  • Trailing commas: Allowed and ignored
  • Missing optional fields: Treated as null
  • Extra unknown fields: Ignored with a warning
  • Whitespace variations: Flexible indentation and line breaks

8. File Extension and MIME Type

  • Extension: .nton
  • MIME type: application/nton

9. Complete Example

# GlobalTech Project Data - NTON v0.03

DEF Worker: { id, rate, manager_name? }

DEF Milestone: {
  name,
  date,
  completion,
  workers:Worker[]?
}

DEF Project: {
  id,
  name,
  status,
  active,
  manager_id?,
  budget,
  milestones:Milestone[]?
}

REF Status: {
  $IP: "In Progress",
  $C: "Completed",
  $P: "Planning"
}

REF Managers: {
  $Alice: "Alice",
  $Bob: "Bob",
  $Carol: "Carol"
}

STREAM Project (count=2):
{
  P001,
  "Alpha Initiative",
  status=$IP,
  active=T,
  manager_id=M1,
  budget=500000,
  milestones=[
    {
      "Design Sprints",
      2025-12-15,
      1.0,
      workers=[
        {W1, 65.50, manager_name=$Alice},
        {W2, 85.00, manager_name=$Bob}
      ]
    },
    {
      "Prototype Approval",
      2026-01-20,
      0.9,
      workers=[
        {W1, 65.50, manager_name=$Alice},
        {W3, 72.25, manager_name=$Carol}
      ]
    }
  ]
}

{
  P002,
  "HR Portal V2",
  status=$C,
  active=F,
  budget=120000,
  milestones=[
    {"Requirements", 2025-10-01, 1.0},
    {"Launch", 2025-11-20, 1.0}
  ]
}

10. Key Improvements in v0.02

Feature                        Benefit
Explicit delimiters { } [ ]    Robust parsing, no indentation errors
Optional field names           Clarity when needed, brevity when obvious
Optional fields ?              Schema evolution without breaking changes
Flexible whitespace            LLM-friendly, human-friendly
No count markers               Simpler, less error-prone
Hybrid syntax                  Balance between compression and readability

11. Key Improvements in v0.03

Feature                                Benefit
Mandatory names for optional fields    Eliminates off-by-one errors and ambiguity
Optional stream record counts          Enables truncation detection
Explicit truncation markers (...)      Distinguishes intentional vs accidental truncation
Strict validation rules                Catches errors early in parsing
Required field enforcement             Prevents incomplete data

Breaking change from v0.02: Optional fields MUST use named syntax when present. This trade-off sacrifices ~10-20% token efficiency on sparse data to gain robustness and eliminate parsing ambiguity.