Add semantic tokens support (textDocument/semanticTokens/full and /range) by axlegrinder1 · Pull Request #450 · VHDL-LS/rust_hdl

axlegrinder1 · 2026-03-31T10:53:51Z

Expose VHDL entity classification to the editor via LSP semantic tokens, enabling context-aware coloring for signals, variables, constants, types, functions, ports, generics, and other VHDL constructs.

Uses standard LSP token types (variable, parameter, function, type, etc.) for out-of-the-box theme compatibility. Constants and generics are distinguished via the readonly modifier.

Results are cached per file and invalidated on change.

Closes #314

I built this fairly quickly alongside Claude Opus 4.6, so I would consider it a conceptual implementation that likely has room for improvement, but the results are very promising. I have added a set of tests in the vhdl_server.rs module and run the feature in VS Code + TerosHDL across several large projects (200-500 source files) with good results.

With this feature, once the server is running, the editor can color identifiers based on what they resolve to. That gives valuable context that was previously missing, such as coloring constants and enum literals wherever they are used in the code, not just at their declaration.

This currently works cleanly as a separate module that uses the searcher infrastructure. A tighter integration into the analysis pass could eliminate the redundant walk and guarantee consistency, but it would be a far more involved change within vhdl_lang.

Schottkyc137

Thanks for tackling this!
Overall, the implementation looks like a good starting point. I have left a few comments.
Concerning more involved changes to vhdl_lang: I have nothing against that if there is a real advantage. Just please make sure to structure this, e.g., by opening multiple PRs, so that it's easier to review.

Schottkyc137 · 2026-04-02T07:15:38Z

+    }
+
+    /// Decode delta-encoded semantic tokens to (line, start, length, token_type, modifiers).
+    fn decode_semantic_tokens(tokens: &[SemanticToken]) -> Vec<(u32, u32, u32, u32, u32)> {


Please return a more meaningful data type. A tuple that consists of n u32s is really hard to read.
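A minimal sketch of what a named return type could look like here. The `SemanticToken` struct is a local stand-in that mirrors the field names of `lsp_types::SemanticToken` so the example compiles on its own, and the `DecodedToken` field names are assumptions for illustration, not necessarily what the PR ended up with.

```rust
/// Local stand-in for `lsp_types::SemanticToken` (same field names),
/// so this sketch does not need the crate.
#[derive(Debug, Clone, Copy)]
pub struct SemanticToken {
    pub delta_line: u32,
    pub delta_start: u32,
    pub length: u32,
    pub token_type: u32,
    pub token_modifiers_bitset: u32,
}

/// Hypothetical named replacement for the `(u32, u32, u32, u32, u32)` tuple.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct DecodedToken {
    pub line: u32,
    pub start: u32,
    pub length: u32,
    pub token_type: u32,
    pub modifiers: u32,
}

/// Turn delta-encoded tokens back into absolute positions.
pub fn decode_semantic_tokens(tokens: &[SemanticToken]) -> Vec<DecodedToken> {
    let mut line = 0u32;
    let mut start = 0u32;
    tokens
        .iter()
        .map(|t| {
            if t.delta_line > 0 {
                // New line: the start column is absolute again.
                line += t.delta_line;
                start = t.delta_start;
            } else {
                // Same line: the start column is relative to the previous token.
                start += t.delta_start;
            }
            DecodedToken {
                line,
                start,
                length: t.length,
                token_type: t.token_type,
                modifiers: t.token_modifiers_bitset,
            }
        })
        .collect()
}
```

With named fields, a test assertion can read `decoded[1].start` instead of `decoded[1].1`.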

Schottkyc137 · 2026-04-02T07:21:48Z

                source.change(range.as_ref(), &content_change.text);
            }
            self.project.update_source(&source);
+            self.semantic_token_cache.clear();


Feels weird that the entire cache is cleared (which contains all URIs) if only a single file is updated. Is there any reason for this? (See also above)

It's due to cross-file references. I added a clarifying comment. It's maybe a bit heavy-handed, but it ensures no stale tokens exist for any file.
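For readers following the trade-off, here is a rough sketch of the "clear everything on any change" policy. The `SemanticTokenCache` type, its methods, and the use of plain `String` URIs are hypothetical names for illustration; the PR's actual cache field may be shaped differently.

```rust
use std::collections::HashMap;

/// Hypothetical per-file cache of encoded token data, keyed by document URI.
#[derive(Default)]
pub struct SemanticTokenCache {
    by_uri: HashMap<String, Vec<u32>>,
}

impl SemanticTokenCache {
    /// Return the cached tokens for `uri`, computing them only on a miss.
    pub fn get_or_compute(&mut self, uri: &str, compute: impl FnOnce() -> Vec<u32>) -> &[u32] {
        self.by_uri.entry(uri.to_string()).or_insert_with(compute)
    }

    /// Drop every entry. A change in one file can alter how names resolve in
    /// other files, so no entry is assumed to still be valid.
    pub fn invalidate_all(&mut self) {
        self.by_uri.clear();
    }

    /// Number of files that currently have cached tokens.
    pub fn cached_files(&self) -> usize {
        self.by_uri.len()
    }
}
```

A finer-grained alternative would track which files reference which declarations and invalidate only the dependents, at the cost of maintaining that dependency information.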

Schottkyc137 · 2026-04-02T07:23:16Z

+
+/// Delta-encode sorted tokens, optionally filtering to a range.
+fn encode(
+    tokens: &[(vhdl_lang::Range, u32, u32)],


This tuple (Range, u32, u32) is not descriptive. I suggest changing it to some small struct.
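One way the suggestion could look, as a sketch only. `Position` and `Range` are simplified stand-ins for the `vhdl_lang` types, the `CachedToken` field names are assumed, and the output is the flat `u32` array from the LSP specification (five integers per token) rather than `lsp_types` structs. It assumes the input is already sorted by start position and skips multi-line tokens.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Position {
    pub line: u32,
    pub character: u32,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Range {
    pub start: Position,
    pub end: Position,
}

/// Hypothetical small struct replacing `(Range, u32, u32)`.
#[derive(Debug, Clone, Copy)]
pub struct CachedToken {
    pub range: Range,
    pub token_type: u32,
    pub modifiers: u32,
}

/// Delta-encode tokens (assumed sorted by start), optionally keeping only
/// those whose lines overlap `filter`.
pub fn encode(tokens: &[CachedToken], filter: Option<&Range>) -> Vec<u32> {
    let mut data = Vec::with_capacity(tokens.len() * 5);
    let (mut prev_line, mut prev_start) = (0u32, 0u32);
    for tok in tokens {
        let Range { start, end } = tok.range;
        // A single length cannot describe a token spanning several lines.
        if start.line != end.line {
            continue;
        }
        if let Some(f) = filter {
            if end.line < f.start.line || start.line > f.end.line {
                continue;
            }
        }
        let delta_line = start.line - prev_line;
        let delta_start = if delta_line == 0 {
            start.character - prev_start
        } else {
            start.character
        };
        let length = end.character - start.character;
        data.extend_from_slice(&[delta_line, delta_start, length, tok.token_type, tok.modifiers]);
        prev_line = start.line;
        prev_start = start.character;
    }
    data
}
```

Because deltas are computed against the last emitted token, filtered-out tokens do not disturb the encoding of the ones that remain.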

Schottkyc137 · 2026-04-02T07:24:55Z

+    tokens.sort_by(|a, b| {
+        a.0.start
+            .line
+            .cmp(&b.0.start.line)
+            .then(a.0.start.character.cmp(&b.0.start.character))
+    });


IIRC SrcPos already implements Ord so I think this is a bit overcomplicated

Yep, missed that. Refactored to use Ord
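A small illustration of why the manual comparator is unnecessary once the position type is ordered. `Position` and `Token` below are stand-ins, not the `vhdl_lang` types; deriving `Ord` on a struct compares fields in declaration order, so `line` is compared first and `character` breaks ties.

```rust
/// Stand-in position type. Derived `Ord` compares `line`, then `character`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Position {
    pub line: u32,
    pub character: u32,
}

/// Hypothetical token carrying only what the sort needs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Token {
    pub start: Position,
    pub token_type: u32,
}

/// Sort tokens by start position using the derived ordering.
pub fn sort_tokens(tokens: &mut [Token]) {
    tokens.sort_by_key(|t| t.start);
}
```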

Schottkyc137 · 2026-04-02T07:28:08Z

+/// Check if a token overlaps the filter range by line.
+/// Character-level precision is not needed as clients request full-line ranges.
+fn in_range(token_range: &vhdl_lang::Range, filter: &vhdl_lang::Range) -> bool {
+    token_range.start.line <= filter.end.line && token_range.end.line >= filter.start.line
+}


If this kind of utility doesn't already exist in vhdl_lang::Range, I suggest moving it there.

I've moved the function to Range::overlaps_lines.
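As a sketch, the relocated helper might look roughly like this as a method. The `Position` and `Range` definitions here are simplified stand-ins, and the body is the same line-only comparison as the original `in_range`.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Position {
    pub line: u32,
    pub character: u32,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Range {
    pub start: Position,
    pub end: Position,
}

impl Range {
    /// True if the two ranges share at least one line.
    /// Columns are deliberately ignored.
    pub fn overlaps_lines(&self, other: &Range) -> bool {
        self.start.line <= other.end.line && self.end.line >= other.start.line
    }
}
```

The check is symmetric, so `a.overlaps_lines(&b)` and `b.overlaps_lines(&a)` agree.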

Schottkyc137 · 2026-04-02T07:30:20Z

+    }
+}
+
+fn to_semantic_token(kind: &AnyEntKind) -> Option<(u32, u32)> {


The (u32, u32) is again not really descriptive. It's OKish here compared to the examples below, but I would prefer a small descriptive struct.
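A sketch of the struct-based signature. `EntKind` is a deliberately tiny hypothetical enum standing in for `AnyEntKind`, and the constants repeat only the indices this example needs; the constant-to-readonly mapping follows the PR description.

```rust
pub const VARIABLE: u32 = 0;
pub const FUNCTION: u32 = 4;
pub const MOD_READONLY: u32 = 1 << 0;

/// Named replacement for the `(u32, u32)` pair.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TokenClassification {
    pub token_type: u32,
    pub modifiers: u32,
}

/// Hypothetical, heavily simplified stand-in for `AnyEntKind`.
pub enum EntKind {
    Signal,
    Constant,
    Function,
    Other,
}

/// Map an entity kind to a token type and modifier set, or `None`
/// if the entity should not be highlighted.
pub fn to_semantic_token(kind: &EntKind) -> Option<TokenClassification> {
    let (token_type, modifiers) = match kind {
        EntKind::Signal => (VARIABLE, 0),
        // Constants share the variable type and are told apart by `readonly`.
        EntKind::Constant => (VARIABLE, MOD_READONLY),
        EntKind::Function => (FUNCTION, 0),
        EntKind::Other => return None,
    };
    Some(TokenClassification { token_type, modifiers })
}
```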

Schottkyc137 · 2026-04-02T07:34:32Z

+// Semantic token type indices — order must match TOKEN_TYPES
+const VARIABLE: u32 = 0;
+const PARAMETER: u32 = 1;
+const PROPERTY: u32 = 2;
+const ENUM_MEMBER: u32 = 3;
+const FUNCTION: u32 = 4;
+const TYPE: u32 = 5;
+const CLASS: u32 = 6;
+const NAMESPACE: u32 = 7;
+const STRUCT: u32 = 8;
+const ENUM: u32 = 9;
+
+// Semantic token modifier bits
+const MOD_READONLY: u32 = 1 << 0;
+
+pub const TOKEN_TYPES: &[SemanticTokenType] = &[
+    SemanticTokenType::VARIABLE,    // 0: signals, variables, constants, files
+    SemanticTokenType::PARAMETER,   // 1: subprogram parameters
+    SemanticTokenType::PROPERTY,    // 2: attributes, record fields
+    SemanticTokenType::ENUM_MEMBER, // 3: enum literals
+    SemanticTokenType::FUNCTION,    // 4: functions, procedures
+    SemanticTokenType::TYPE,        // 5: types (general)
+    SemanticTokenType::CLASS,       // 6: protected types, components
+    SemanticTokenType::NAMESPACE,   // 7: libraries, design units, labels
+    SemanticTokenType::STRUCT,      // 8: record types
+    SemanticTokenType::ENUM,        // 9: enum types
+];
+
+pub const TOKEN_MODIFIERS: &[SemanticTokenModifier] = &[
+    SemanticTokenModifier::READONLY, // bit 0: constants, generics
+];


I think this should be a little macro (or similar) to make extending the semantic token types future-proof

I added a define_token_types! macro that generates both the index constants and the TOKEN_TYPES legend array from a single declaration, so they can no longer get out of sync when the list is extended.
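The macro body is not shown in this thread, so the following is only one possible shape for such a macro, not the PR's implementation. To stay self-contained it uses the LSP token type names as plain strings in place of `SemanticTokenType` values, and it derives the indices from a private helper enum (`TokenIndex`, a hypothetical name) so they always match the legend order.

```rust
/// Sketch: generate index constants and the legend from one list.
macro_rules! define_token_types {
    ($($name:ident => $legend:expr),* $(,)?) => {
        // Helper enum whose discriminants count up from 0 in declaration order.
        #[allow(non_camel_case_types, dead_code)]
        #[repr(u32)]
        enum TokenIndex { $($name),* }

        // One `u32` constant per entry, taken from the enum discriminant.
        $(
            #[allow(dead_code)]
            pub const $name: u32 = TokenIndex::$name as u32;
        )*

        // Legend in the same order as the constants.
        pub const TOKEN_TYPES: &[&str] = &[$($legend),*];
    };
}

define_token_types! {
    VARIABLE => "variable",
    PARAMETER => "parameter",
    PROPERTY => "property",
    ENUM_MEMBER => "enumMember",
    FUNCTION => "function",
    TYPE => "type",
    CLASS => "class",
    NAMESPACE => "namespace",
    STRUCT => "struct",
    ENUM => "enum",
}
```

Adding a new token type then means adding one line to the invocation; the index and the legend entry are produced together.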

- Replace bare (u32, u32) tuples with TokenClassification and CachedToken structs
- Replace DecodedToken test tuple with named struct
- Add define_token_types! macro to keep index constants and legend in sync
- Use SrcPos::cmp for sorting instead of manual line/character comparison
- Move in_range to Range::overlaps_lines in vhdl_lang
- Rename Project::semantic_tokens to find_all_entity_references
- Add source file filter in search_decl to guard against cross-file decl_pos
- Match ExternalObjectClass directly instead of converting to ObjectClass
- Skip multi-line tokens in encode instead of computing wrong length

Schottkyc137 requested changes Apr 2, 2026

axlegrinder1 requested a review from Schottkyc137 April 7, 2026 11:42

axlegrinder1 force-pushed the feature/semantic-tokens branch from 42100a1 to bb77ba1 Compare April 7, 2026 12:36
