fix: add 168 missing lexer tokens to keyword rule #186

Merged
ako merged 3 commits into mendixlabs:main from engalar:fix/keyword-rule-missing-tokens
Apr 13, 2026

Conversation

@engalar (Contributor) commented Apr 10, 2026

Summary

  • The keyword rule in MDLParser.g4 allows lexer tokens to be used as identifiers (entity/attribute/enum names via qualifiedName). 168 word-type tokens were missing, causing parse failures when user-defined names matched keywords like Data, Filter, Match, Empty, Open, Container, Node, Activity, etc.
  • Reorganized the rule by category with alphabetical ordering for easy auditing
  • Removed 3 duplicate entries (STRUCTURES, PAGING, EXECUTE) and 1 phantom token (UI)
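For context, ANTLR parsers commonly allow keywords in identifier positions with a rule of this shape. The fragment below is an illustrative sketch only — the token names, category comments, and rule names are examples, not the actual contents of MDLParser.g4:

```antlr
// Hypothetical sketch of a keyword-as-identifier rule, organized by
// category with alphabetical ordering within each group.
keyword
    // DDL / DML
    : CREATE | DELETE | UPDATE
    // Domain model
    | ATTRIBUTE | ENTITY | ENUMERATION
    // Pages / UI
    | CONTAINER | LAYOUT | NODE
    // General word tokens reclaimed as identifiers
    | ACTIVITY | DATA | EMPTY | FILTER | MATCH | OPEN
    ;

// Identifier positions accept either a plain identifier token or any
// keyword, so a qualifiedName like Module.ENUM_X.Open can parse.
identifier    : ID | keyword ;
qualifiedName : identifier ('.' identifier)* ;
```

Any word-type token omitted from such a rule becomes unusable as a user-defined name, which is exactly the failure mode this PR fixes.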

Before

-- Parse error: "extraneous input '.' expecting {',', ')'}"
$Issue = CREATE MyModule.OrderIssue (
    Status = MyModule.ENUM_IssueStatus.Open
);

After

-- All keyword-named enum values now parse correctly
$Issue = CREATE MyModule.OrderIssue (
    Status = MyModule.ENUM_IssueStatus.Open,     -- was broken
    Kind = MyModule.ENUM_Kind.Data,               -- was broken
    Mode = MyModule.ENUM_Mode.Filter,             -- was broken
    State = MyModule.ENUM_State.Empty,            -- was broken
    Type = MyModule.ENUM_Type.Container,          -- was broken
    Phase = MyModule.ENUM_Phase.Match,            -- was broken
    Method = MyModule.ENUM_M.Get                  -- was broken
);

Scope

Only the keyword rule in MDLParser.g4 changed (+ regenerated parser files). No executor, visitor, or AST changes.

Follow-up to the PR #174 code review, which identified this systemic gap.

Test plan

  • make test — all tests pass, including TestQuotedIdentifierInWidgetAttribute and TestShowPageMicroflowStyleArgsInWidget, which had briefly broken during development when LAYOUT was accidentally dropped
  • Smoke tested 15+ keyword-as-enum-value cases (Data, Filter, Match, Empty, Container, Open, Node, Activity, Condition, Get, Post, etc.)
  • No original keyword tokens dropped (verified via comm -23)
  • No duplicate tokens in final rule (verified via uniq -d)

@github-actions

AI Code Review

What Looks Good

  • Clear problem/solution explanation: The PR effectively explains that 168 missing lexer tokens were causing parse failures when user-defined names matched keywords (like Data, Filter, Match), with concrete before/after examples showing the fix works.
  • Appropriate scope: The change is narrowly focused on fixing the keyword rule in MDLParser.g4 to resolve a systemic parsing gap identified in PR #174 (fix: add OPEN to keyword rule for enum value parsing), with no unnecessary modifications to executor, visitor, or AST layers (as correctly noted).
  • Maintainability improvements: Reorganized the keyword rule by category with alphabetical ordering for easier auditing, removed 3 duplicate entries and 1 phantom token (UI), improving code quality.
  • Thorough testing: Verified via make test (including previously failing tests), smoke-tested 15+ keyword-as-enum-value cases, confirmed no original tokens were dropped, and verified no duplicates remain.
  • Follows conventions: The PR correctly notes that generated parser files (mdl/grammar/parser/) are noise and should be skipped in review, aligning with project instructions.

Recommendation

Approve. This PR successfully resolves the parsing issue by adding missing tokens to the keyword rule, improves maintainability through reorganization and deduplication, and provides adequate test coverage. The scoped change appropriately addresses the foundation layer without requiring modifications to other pipeline components since it fixes core parsing functionality. No changes are needed.


Automated review via OpenRouter (Nemotron Super 120B) — workflow source

@github-actions

AI Code Review

What Looks Good

  • Bug Fix: Correctly addresses parse failures when user-defined names (entity/attribute/enum names) matched missing lexer tokens like Data, Filter, Match, etc. The before/after examples clearly demonstrate the fix works for enum values and other identifiers.
  • Thorough Reorganization: The keyword rule is now categorized by functional area (DDL/DML, Entity/Domain model, Types, etc.) with alphabetical ordering within each group, making future maintenance and auditing significantly easier.
  • Cleanup: Removed 3 duplicate entries (STRUCTURES, PAGING, EXECUTE) and 1 phantom token (UI) as stated, improving accuracy of the keyword list.
  • Verification: Explicitly confirmed no original tokens were dropped (comm -23) and no duplicates remain (uniq -d), showing due diligence.
  • Test Coverage: All tests pass including previously failing tests, plus manual smoke testing of 15+ keyword-as-enum-value cases.
  • Scoped Change: Limited to the grammar layer only (no AST/visitor/executor changes needed), which is appropriate for this type of parsing fix.
  • Generated Files: Properly regenerated and committed ANTLR parser files after grammar change, as required.
  • MDL Syntax Compliance: Maintains all existing MDL syntax patterns; the fix enables proper use of keywords as identifiers without altering the language design.

Recommendation

Approve. This is a well-executed bug fix that resolves a significant parsing gap while improving code maintainability through thoughtful reorganization. The change is minimal, focused, and thoroughly tested. No modifications to downstream pipeline components are needed or appropriate for this grammar-level fix.


Automated review via OpenRouter (Nemotron Super 120B) — workflow source

@ako (Collaborator) left a comment


This is exactly the broad fix I recommended in the PR #174 review.

What's good

  • Solves the systemic problem: 168 word-type tokens missing from keyword is a significant gap. This affects enum values, attribute names, entity names, anywhere qualifiedName is used.
  • Categorical organization with 18 groups + alphabetical ordering makes future audits trivial — easy to find where to add new tokens
  • Cleanup: removes 3 duplicates (STRUCTURES, PAGING, EXECUTE) and 1 phantom (UI)
  • Pure grammar change — no executor/visitor/AST, lowest possible risk
  • Verification approach documented: comm -23 for missing tokens, uniq -d for duplicates

Concerns

No regression tests added. "Smoke tested 15+ cases" is good for confidence but doesn't prevent regression. A test file with CREATE ENUMERATION Test.E (Open, Data, Filter, Match, Empty, Container, Node, ...) would catch any future drift.

No automated lexer/keyword sync check. A make target that greps word-type tokens from MDLLexer.g4 and verifies each is in the keyword rule (or is explicitly excluded) would prevent this from happening again. Worth a follow-up issue.
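Such a sync check could be sketched in Go roughly as follows. This is a hypothetical illustration, not the project's actual keyword_coverage_test.go: the token-file format shown is the standard ANTLR ".tokens" output, but the real file paths, token naming conventions, and exclusion list are assumptions.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// extractWordTokens pulls word-type token names from ANTLR ".tokens"
// output, where each line looks like DATA=10; quoted literal entries
// such as '('=1 are skipped.
func extractWordTokens(tokensFile string) []string {
	re := regexp.MustCompile(`^([A-Z][A-Z0-9_]*)=\d+$`)
	var names []string
	for _, line := range strings.Split(tokensFile, "\n") {
		if m := re.FindStringSubmatch(strings.TrimSpace(line)); m != nil {
			names = append(names, m[1])
		}
	}
	return names
}

// missingFromKeywordRule returns every token that is neither explicitly
// excluded nor mentioned (as a whole word) in the keyword rule body.
func missingFromKeywordRule(tokens []string, keywordRule string, excluded map[string]bool) []string {
	var missing []string
	for _, t := range tokens {
		inRule := regexp.MustCompile(`\b` + t + `\b`).MatchString(keywordRule)
		if !excluded[t] && !inRule {
			missing = append(missing, t)
		}
	}
	return missing
}

func main() {
	// Toy stand-ins for MDLLexer.tokens and the MDLParser.g4 keyword rule.
	tokens := "DATA=10\nFILTER=11\nMATCH=12\n'('=1\n"
	rule := "keyword : DATA | FILTER ;"
	fmt.Println(missingFromKeywordRule(extractWordTokens(tokens), rule, nil))
	// prints [MATCH]
}
```

Wired into a test or make target, a non-empty result would fail the build, turning "keyword rule drift" from a silent parse bug into a CI error.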

PR #174 becomes a partial duplicate since this includes the OPEN fix. Should be closed when this merges.

LGTM.

@github-actions

AI Code Review

What Looks Good

  • Correctly addresses the reported issue: The PR fixes parsing failures when user-defined names matched lexer keywords (like Data, Filter, Match), which was causing extraneous input '.' expecting {',', ')'} errors.
  • Thorough test coverage:
    • Added a dedicated test file mdl-examples/doctype-tests/keyword-as-identifier.mdl with comprehensive examples covering enum values, entity names, and attributes that are keywords
    • Added mdl/grammar/keyword_coverage_test.go to automatically verify all lexer tokens are included in the keyword rule, preventing future regressions
  • Improved maintainability:
    • Reorganized the keyword rule by category with alphabetical ordering within each group
    • Removed 3 duplicate entries (STRUCTURES, PAGING, EXECUTE) and 1 phantom token (UI)
    • The structure makes future auditing and updates much easier
  • Proper pipeline execution:
    • Grammar change in MDLParser.g4
    • Parser regenerated (generated files updated as expected)
    • No unnecessary changes to AST, visitor, or executor layers (as none were needed for this fix)
  • Follows MDL design principles:
    • Allows keywords to be used as identifiers in appropriate contexts (enum values, entity/attribute names) without overloading keyword meanings
    • Maintains correct usage of colon for property definitions (Key: value) and AS for name mappings (not applicable here but respected)
    • Uses proper qualified names (Module.Element) in all test examples
    • Statements read as natural English

Recommendation

Approve the PR. This is a well-focused fix that resolves a significant parsing gap while improving code maintainability through better organization and adding robust test coverage to prevent regressions. The change follows all required procedures for MDL syntax updates and properly wires through the parser layer of the pipeline (which is sufficient for this type of fix). The added tests provide excellent validation that the fix works correctly across various keyword-as-identifier scenarios.


Automated review via OpenRouter (Nemotron Super 120B) — workflow source

engalar added 2 commits April 13, 2026 12:01
The keyword rule in MDLParser.g4 allows lexer tokens to be used as
identifiers (entity names, attribute names, enum values). Many tokens
were missing, causing parse failures when user-defined names matched
keywords like Data, Filter, Match, Empty, Open, Container, etc.

This adds all word-type lexer tokens to the keyword rule, organized
by category with alphabetical ordering for easy auditing. Also removes
duplicate entries (STRUCTURES, PAGING, EXECUTE) and a phantom UI token.
@engalar engalar force-pushed the fix/keyword-rule-missing-tokens branch from f4d9459 to 2780fe2 on April 13, 2026 at 04:07
@github-actions

AI Code Review

What Looks Good

  • Fixes a significant parsing issue where 168 lexer tokens were missing from the keyword rule, causing parse failures when user-defined names matched keywords like Data, Filter, Match, etc.
  • Comprehensive solution: added all missing tokens rather than just the reported ones
  • Improved maintainability: reorganized rule by category with alphabetical ordering for easy auditing
  • Cleanup: removed 3 duplicate entries (STRUCTURES, PAGING, EXECUTE) and 1 phantom token (UI)
  • Excellent test coverage:
    • Added keyword-as-identifier.mdl doctest showing practical usage
    • Added keyword_coverage_test.go unit test that verifies all lexer tokens are in the keyword rule (will catch future omissions)
  • Verified no original tokens were dropped and no duplicates remain
  • All tests pass including previously failing ones (TestQuotedIdentifierInWidgetAttribute, TestShowPageMicroflowStyleArgsInWidget)
  • Smoke tested 15+ keyword-as-enum-value cases
  • Changes are scoped appropriately: only grammar change with regenerated parser files (no executor/visitor/AST changes needed)
  • Follows established patterns for allowing keywords as identifiers in specific contexts

Recommendation

Approve. This is a well-executed fix that addresses a critical parsing gap with comprehensive test coverage and improved code organization. The changes are focused, follow project patterns, and prevent similar regressions in the future.


Automated review via OpenRouter (Nemotron Super 120B) — workflow source

…tests

Replace commonNameKeyword with keyword in all identifier rules
(attributeName, enumValueName, indexColumnName, imageName, parameterName,
selectAlias) so any lexer token can be used as an identifier. Remove the
now-redundant commonNameKeyword rule.

Add keyword-as-identifier.mdl regression test covering enum values,
entity names, and attribute names that match lexer keywords.
Add keyword_coverage_test.go to verify lexer/keyword sync.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@engalar engalar force-pushed the fix/keyword-rule-missing-tokens branch from 2780fe2 to 3e7d2d7 on April 13, 2026 at 04:17
@github-actions

AI Code Review

What Looks Good

The PR effectively addresses a significant gap in the MDL parser where 168 lexer tokens were missing from the keyword rule, causing parse failures when user-defined names matched keywords. The solution is comprehensive and well-executed:

  • Thorough fix: Added all missing tokens to the keyword rule, reorganized by category with alphabetical ordering for maintainability
  • Cleanup: Removed 3 duplicate entries and 1 phantom token (UI)
  • Test coverage:
    • Added mdl-examples/doctype-tests/keyword-as-identifier.mdl with extensive test cases
    • Added mdl/grammar/keyword_coverage_test.go to automatically verify lexer-token/keyword-rule parity
    • All tests pass including previously failing tests
  • Correct scoping: As stated, only the grammar rule and generated parser files changed — no executor, visitor, or AST modifications were needed, since this is purely a parsing/enablement fix
  • Maintainability improvements: The reorganization makes future audits easier, and the new test will prevent regressions
  • Verified fix: The before/after examples clearly demonstrate the issue is resolved for enum values like Open, Data, Filter, etc.

The changes follow the project's architecture correctly - since this affects only how identifiers are parsed (allowing keywords to be used as identifiers in appropriate contexts), it doesn't require changes down the stack to AST, visitor, or executor layers.

Recommendation

Approve the PR. This is a well-scoped, thoroughly tested bug fix that resolves a significant parsing gap without introducing any negative side effects. The added test coverage ensures this issue won't recur.


Automated review via OpenRouter (Nemotron Super 120B) — workflow source

@ako (Collaborator) left a comment


Both review concerns fully addressed in the two new commits:

  1. Automated sync test (keyword_coverage_test.go) parses MDLLexer.tokens and verifies every word-type token is in the keyword rule — this prevents the class of bug from ever recurring. Also catches stale/extra entries. Exactly what I recommended.

  2. Regression test file (keyword-as-identifier.mdl) with 117 lines of enumerations using keyword-named values (Open, Data, Filter, Match, Get, Post, Activity, Layout, Header, etc.).

  3. Unified identifier rules to use the full keyword list — good consistency improvement.

LGTM.

@ako ako merged commit f0d16f0 into mendixlabs:main Apr 13, 2026
1 of 2 checks passed

2 participants