Fix infinite loop in tokenizer for identifiers starting with underscore#55

Draft
Copilot wants to merge 4 commits into main from copilot/fix-parse-termination-bug

Copilot AI commented Dec 24, 2025

The tokenizer enters an infinite loop when encountering identifiers starting with underscore (e.g., _0b1), exhausting memory.

Root Cause

The regex in read_identifier(), [a-zA-Z][a-zA-Z0-9_-]*, requires the first character to be a letter, but the tokenizer also dispatches to read_identifier() when it sees _. On that mismatch the function returns an empty string without advancing the position, so the tokenizer emits Identifier("") tokens indefinitely.
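
The failure mode described above can be reproduced in a few lines. The following is a minimal Python sketch, not the project's MoonBit code: the `tokenize` dispatcher, the token tuples, and the `max_tokens` guard are all assumptions made for illustration. Only the regex pattern is taken from the description.

```python
import re

# Pattern quoted in the root-cause description: first character must be a letter.
OLD_PATTERN = re.compile(r"[a-zA-Z][a-zA-Z0-9_-]*")


def read_identifier(text, pos, pattern):
    """Return the identifier matched at pos, or "" when the pattern does not match."""
    m = pattern.match(text, pos)
    return m.group(0) if m else ""


def tokenize(text, pattern, max_tokens=5):
    """Hypothetical dispatcher that, like the buggy one, treats "_" as an identifier start."""
    tokens, pos = [], 0
    while pos < len(text):
        ch = text[pos]
        if ch.isalpha() or ch == "_":
            ident = read_identifier(text, pos, pattern)
            tokens.append(("Identifier", ident))
            pos += len(ident)  # advances by 0 when ident is empty
        else:
            tokens.append(("Char", ch))
            pos += 1
        if len(tokens) >= max_tokens:  # guard so this demo terminates
            break
    return tokens


stuck = tokenize("_0b1", OLD_PATTERN)
```

Without the `max_tokens` guard the loop would never leave position 0: every iteration appends another empty `Identifier` token, which matches the reported memory exhaustion.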

Changes

  • tokenize.mbt: Updated regex to [a-zA-Z0-9_-]+ to match any valid TOML bare key character per spec
  • lexer_bug_test.mbt: Added tokenization tests for identifiers starting with underscore
  • parser_test.mbt: Added regression test verifying proper error handling
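
To illustrate the effect of the updated pattern, here is a small Python sketch under stated assumptions. `read_bare_key` is a hypothetical helper, not a function from tokenize.mbt, and raising on an empty match is one possible way to guarantee progress rather than something this PR is confirmed to do. The pattern itself is the one listed in the changes.

```python
import re

# Pattern from the change list: any run of TOML bare-key characters.
NEW_PATTERN = re.compile(r"[a-zA-Z0-9_-]+")


def read_bare_key(text, pos):
    """Return (key, next_pos); raise instead of returning an empty match."""
    m = NEW_PATTERN.match(text, pos)
    if m is None:
        # Failing loudly keeps a caller's loop from spinning at the same position.
        raise ValueError(f"expected bare key at offset {pos}")
    return m.group(0), m.end()


key, next_pos = read_bare_key("_0b1", 0)
```

Because the pattern uses `+` rather than `*`, a successful match always consumes at least one character, so a caller that advances to `next_pos` cannot stall.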

Behavior

// Before: infinite loop, memory exhaustion
@toml.parse("key = _0b1")

// After: "key = _0b1" tokenizes correctly, then fails with an "Expected value" error (identifiers are not valid as values)
@toml.parse("_key = \"value\"")  // Works as key

Warning

Firewall rules blocked me from connecting to one or more addresses

I tried to connect to the following addresses, but was blocked by firewall rules:

  • cli.moonbitlang.com
    • Triggering command: /usr/bin/curl curl -fsSL REDACTED (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Original prompt

This section details the original issue you should resolve

<issue_title>bug: parse does not terminate</issue_title>
<issue_description>The following test case causes the parsing process to never terminate, and at the end exhausts all memory resources.

```
///|
test "bad: does not terminate" {
  let toml = @toml.parse("leading-us-bin = _0b1")
  @json.inspect(toml)
}
```
</issue_description>

## Comments on the Issue (you are @copilot in this section)

<comments>
</comments>

Copilot AI and others added 2 commits December 24, 2025 06:35
Co-authored-by: tonyfettes <29998228+tonyfettes@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix parsing process that does not terminate" to "Fix infinite loop in tokenizer for identifiers starting with underscore" Dec 24, 2025
Copilot AI requested a review from tonyfettes December 24, 2025 06:39
Development

Successfully merging this pull request may close these issues.

bug: parse does not terminate

2 participants