Add semantic tokens support (textDocument/semanticTokens/full and /range) #450
axlegrinder1 wants to merge 2 commits into VHDL-LS:master
…nge) Expose VHDL entity classification to the editor via LSP semantic tokens, enabling context-aware coloring for signals, variables, constants, types, functions, ports, generics, and other VHDL constructs. Uses standard LSP token types (variable, parameter, function, type, etc.) for out-of-the-box theme compatibility. Constants and generics are distinguished via the readonly modifier. Results are cached per file and invalidated on change. Closes VHDL-LS#314
Schottkyc137 left a comment
Thanks for tackling this!
Overall, the implementation looks like a good starting point. I have left a few comments.
Concerning more involved changes of vhdl_lang: I have nothing against that if there is real advantage. Just please make sure to structure this, e.g., by opening multiple PRs such that it's easier to review.
    }

    /// Decode delta-encoded semantic tokens to (line, start, length, token_type, modifiers).
    fn decode_semantic_tokens(tokens: &[SemanticToken]) -> Vec<(u32, u32, u32, u32, u32)> {
Please return a more meaningful data type. A tuple that consists of n u32s is really hard to read.
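A minimal sketch of what such a return type could look like. The `SemanticToken` struct below is a local stand-in that mirrors the fields of the LSP type so the example is self-contained, and the field names of `DecodedToken` are assumptions rather than the PR's actual definition.

```rust
/// Local stand-in for the LSP semantic token (relative, delta-encoded form).
#[derive(Debug, Clone, Copy)]
pub struct SemanticToken {
    pub delta_line: u32,
    pub delta_start: u32,
    pub length: u32,
    pub token_type: u32,
    pub token_modifiers_bitset: u32,
}

/// Hypothetical named result replacing the five-element tuple.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct DecodedToken {
    pub line: u32,
    pub start: u32,
    pub length: u32,
    pub token_type: u32,
    pub modifiers: u32,
}

/// Decode delta-encoded semantic tokens into absolute positions.
pub fn decode_semantic_tokens(tokens: &[SemanticToken]) -> Vec<DecodedToken> {
    let mut line = 0u32;
    let mut start = 0u32;
    tokens
        .iter()
        .map(|t| {
            if t.delta_line > 0 {
                // A new line: the start column is absolute again.
                line += t.delta_line;
                start = t.delta_start;
            } else {
                // Same line: the start column is relative to the previous token.
                start += t.delta_start;
            }
            DecodedToken {
                line,
                start,
                length: t.length,
                token_type: t.token_type,
                modifiers: t.token_modifiers_bitset,
            }
        })
        .collect()
}
```

With named fields, test assertions can read `decoded[1].start` instead of `decoded[1].1`.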
        source.change(range.as_ref(), &content_change.text);
    }
    self.project.update_source(&source);
    self.semantic_token_cache.clear();
Feels weird that the entire cache is cleared (which contains all URIs) if only a single file is updated. Is there any reason for this? (See also above)
It's due to cross-file references. I added a clarifying comment. It's perhaps a bit heavy-handed, but it ensures no stale tokens exist for any file.
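The trade-off described here can be sketched as a per-file cache that is emptied wholesale on any edit. The type and method names below are hypothetical, and URIs are plain strings to keep the example self-contained; the PR's actual cache may be shaped differently.

```rust
use std::collections::HashMap;

/// Hypothetical per-file cache of encoded semantic tokens, keyed by document URI.
#[derive(Default)]
pub struct SemanticTokenCache {
    entries: HashMap<String, Vec<u32>>,
}

impl SemanticTokenCache {
    /// Return the cached tokens for `uri`, computing them on a cache miss.
    pub fn get_or_compute(&mut self, uri: &str, compute: impl FnOnce() -> Vec<u32>) -> &[u32] {
        let tokens: &Vec<u32> = self.entries.entry(uri.to_owned()).or_insert_with(compute);
        tokens
    }

    /// Drop every entry. An edit in one file can change how names resolve in
    /// other files, so no cached result is assumed to stay valid.
    pub fn invalidate_all(&mut self) {
        self.entries.clear();
    }

    /// Number of files that currently have cached tokens.
    pub fn len(&self) -> usize {
        self.entries.len()
    }
}
```

A finer-grained alternative would track which files depend on the edited one and evict only those entries, at the cost of maintaining a dependency map.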
    /// Delta-encode sorted tokens, optionally filtering to a range.
    fn encode(
        tokens: &[(vhdl_lang::Range, u32, u32)],
This tuple (Range, u32, u32) is not descriptive. I suggest changing it to some small struct.
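One possible shape for such a struct, together with a simplified delta-encoder. `Position` and `Range` are local stand-ins for the `vhdl_lang` types, the field names of `CachedToken` are assumptions, and the optional range filter is left out for brevity.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Position {
    pub line: u32,
    pub character: u32,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Range {
    pub start: Position,
    pub end: Position,
}

/// Hypothetical named replacement for the `(Range, u32, u32)` tuple.
#[derive(Debug, Clone, Copy)]
pub struct CachedToken {
    pub range: Range,
    pub token_type: u32,
    pub modifiers: u32,
}

/// Delta-encode tokens into the flat LSP array (five integers per token).
/// Precondition: `tokens` is sorted by start position.
pub fn encode(tokens: &[CachedToken]) -> Vec<u32> {
    let mut data = Vec::with_capacity(tokens.len() * 5);
    let mut prev = Position { line: 0, character: 0 };
    for t in tokens {
        // A single length cannot describe a token that spans several lines.
        if t.range.start.line != t.range.end.line {
            continue;
        }
        let delta_line = t.range.start.line - prev.line;
        let delta_start = if delta_line == 0 {
            t.range.start.character - prev.character
        } else {
            t.range.start.character
        };
        let length = t.range.end.character - t.range.start.character;
        data.extend_from_slice(&[delta_line, delta_start, length, t.token_type, t.modifiers]);
        prev = t.range.start;
    }
    data
}
```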
    tokens.sort_by(|a, b| {
        a.0.start
            .line
            .cmp(&b.0.start.line)
            .then(a.0.start.character.cmp(&b.0.start.character))
    });
IIRC SrcPos already implements Ord so I think this is a bit overcomplicated
Yep, missed that. Refactored to use Ord.
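To illustrate why the manual comparison is redundant: a derived `Ord` compares fields in declaration order, so a position type declared as line then character already sorts the way the closure did. `Position` and `Token` below are stand-ins, not the actual `vhdl_lang` types.

```rust
/// Stand-in for a source position. The derived `Ord` compares `line` first
/// and `character` second because that is the field declaration order.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Position {
    pub line: u32,
    pub character: u32,
}

/// Hypothetical token carrying only what the sort needs.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Token {
    pub start: Position,
    pub name: &'static str,
}

/// Sort tokens by start position using the derived ordering.
pub fn sort_tokens(tokens: &mut [Token]) {
    tokens.sort_by_key(|t| t.start);
}
```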
    /// Check if a token overlaps the filter range by line.
    /// Character-level precision is not needed as clients request full-line ranges.
    fn in_range(token_range: &vhdl_lang::Range, filter: &vhdl_lang::Range) -> bool {
        token_range.start.line <= filter.end.line && token_range.end.line >= filter.start.line
    }
If this kind of utility doesn't already exist in vhdl_lang::Range, I suggest moving it there.
I've moved the function to Range::overlaps_lines.
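As a rough sketch, the relocated helper could be a method with the same line-level check as `in_range`. The `Position` and `Range` types and the `span` helper below are local stand-ins; the actual signature in `vhdl_lang` may differ.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Position {
    pub line: u32,
    pub character: u32,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Range {
    pub start: Position,
    pub end: Position,
}

impl Range {
    /// True if the two ranges share at least one line. Character offsets are
    /// ignored on purpose, matching the line-level check in `in_range`.
    pub fn overlaps_lines(&self, other: &Range) -> bool {
        self.start.line <= other.end.line && self.end.line >= other.start.line
    }
}

/// Hypothetical helper for building ranges in examples and tests.
pub fn span(l0: u32, c0: u32, l1: u32, c1: u32) -> Range {
    Range {
        start: Position { line: l0, character: c0 },
        end: Position { line: l1, character: c1 },
    }
}
```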
        }
    }

    fn to_semantic_token(kind: &AnyEntKind) -> Option<(u32, u32)> {
The (u32, u32) is again not really descriptive. It's OKish here compared to the examples further down, but I would prefer a small descriptive struct.
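A sketch of how a small struct could replace the pair. `EntityKind` is a deliberately simplified, hypothetical stand-in for `AnyEntKind`, and the mapping covers only a few cases taken from the legend comments in this PR; the index values match the constants shown in the diff.

```rust
// Indices and modifier bit as listed in the PR's constants.
pub const VARIABLE: u32 = 0;
pub const FUNCTION: u32 = 4;
pub const STRUCT: u32 = 8;
pub const MOD_READONLY: u32 = 1 << 0;

/// Named replacement for the `(u32, u32)` pair (field names are assumptions).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TokenClassification {
    pub token_type: u32,
    pub modifiers: u32,
}

/// Hypothetical, heavily reduced stand-in for `AnyEntKind`.
#[derive(Debug, Clone, Copy)]
pub enum EntityKind {
    Signal,
    Constant,
    Function,
    RecordType,
    Other,
}

/// Map an entity kind to a token classification, or `None` if it gets no token.
pub fn to_semantic_token(kind: EntityKind) -> Option<TokenClassification> {
    let (token_type, modifiers) = match kind {
        EntityKind::Signal => (VARIABLE, 0),
        // Constants share the variable type and are marked readonly.
        EntityKind::Constant => (VARIABLE, MOD_READONLY),
        EntityKind::Function => (FUNCTION, 0),
        EntityKind::RecordType => (STRUCT, 0),
        EntityKind::Other => return None,
    };
    Some(TokenClassification { token_type, modifiers })
}
```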
    // Semantic token type indices (order must match TOKEN_TYPES)
    const VARIABLE: u32 = 0;
    const PARAMETER: u32 = 1;
    const PROPERTY: u32 = 2;
    const ENUM_MEMBER: u32 = 3;
    const FUNCTION: u32 = 4;
    const TYPE: u32 = 5;
    const CLASS: u32 = 6;
    const NAMESPACE: u32 = 7;
    const STRUCT: u32 = 8;
    const ENUM: u32 = 9;

    // Semantic token modifier bits
    const MOD_READONLY: u32 = 1 << 0;

    pub const TOKEN_TYPES: &[SemanticTokenType] = &[
        SemanticTokenType::VARIABLE,    // 0: signals, variables, constants, files
        SemanticTokenType::PARAMETER,   // 1: subprogram parameters
        SemanticTokenType::PROPERTY,    // 2: attributes, record fields
        SemanticTokenType::ENUM_MEMBER, // 3: enum literals
        SemanticTokenType::FUNCTION,    // 4: functions, procedures
        SemanticTokenType::TYPE,        // 5: types (general)
        SemanticTokenType::CLASS,       // 6: protected types, components
        SemanticTokenType::NAMESPACE,   // 7: libraries, design units, labels
        SemanticTokenType::STRUCT,      // 8: record types
        SemanticTokenType::ENUM,        // 9: enum types
    ];

    pub const TOKEN_MODIFIERS: &[SemanticTokenModifier] = &[
        SemanticTokenModifier::READONLY, // bit 0: constants, generics
    ];
I think this should be a little macro (or similar) to make extending the semantic token types future-proof
I added a define_token_types! macro that generates both the index constants and the TOKEN_TYPES legend array from a single declaration, so they can no longer get out of sync when the list is extended.
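The PR's macro is not shown in this conversation, so the following is only one way such a macro could be written. It uses a private helper enum (`TokenIndex`, a name invented here) to derive consecutive indices, and plain strings in place of `SemanticTokenType` values so the sketch compiles without `lsp_types`.

```rust
/// Generate index constants and the legend array from one declaration.
/// Sketch only: legend entries are strings instead of LSP token types.
macro_rules! define_token_types {
    ($($name:ident => $legend:expr),+ $(,)?) => {
        // Fieldless enum used solely to number the entries 0, 1, 2, ...
        #[allow(non_camel_case_types, dead_code)]
        #[repr(u32)]
        enum TokenIndex { $($name),+ }

        // One index constant per entry, derived from the enum discriminant.
        $(pub const $name: u32 = TokenIndex::$name as u32;)+

        // Legend in declaration order, so position i matches constant i.
        pub const TOKEN_TYPES: &[&str] = &[$($legend),+];
    };
}

define_token_types! {
    VARIABLE => "variable",
    PARAMETER => "parameter",
    PROPERTY => "property",
    ENUM_MEMBER => "enumMember",
    FUNCTION => "function",
    TYPE => "type",
    CLASS => "class",
    NAMESPACE => "namespace",
    STRUCT => "struct",
    ENUM => "enum",
}
```

Adding a token type then means adding a single line to the invocation; the constant and its legend slot are produced together.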
- Replace bare (u32, u32) tuples with TokenClassification and CachedToken structs
- Replace DecodedToken test tuple with named struct
- Add define_token_types! macro to keep index constants and legend in sync
- Use SrcPos::cmp for sorting instead of manual line/character comparison
- Move in_range to Range::overlaps_lines in vhdl_lang
- Rename Project::semantic_tokens to find_all_entity_references
- Add source file filter in search_decl to guard against cross-file decl_pos
- Match ExternalObjectClass directly instead of converting to ObjectClass
- Skip multi-line tokens in encode instead of computing wrong length
Force-pushed from 42100a1 to bb77ba1
Expose VHDL entity classification to the editor via LSP semantic tokens, enabling context-aware coloring for signals, variables, constants, types, functions, ports, generics, and other VHDL constructs.
Uses standard LSP token types (variable, parameter, function, type, etc.) for out-of-the-box theme compatibility. Constants and generics are distinguished via the readonly modifier.
Results are cached per file and invalidated on change.
Closes #314
I built this functionality fairly quickly with Claude Opus 4.6, so I would consider it a conceptual implementation that likely has room for improvement, but the results are very promising. I have added a set of tests in the vhdl_server.rs module and run the feature in VS Code + TerosHDL across several large projects (200-500 source files) with good results.
With the feature enabled, once the server is running, the editor has much more context within each file. For example, valid constants and enum literals can be coloured where they are used in the code, not just at their declaration.
This currently works cleanly as a separate module built on the searcher infrastructure. A tighter integration into the analysis pass could eliminate the redundant walk and guarantee consistency, but it would be a far more involved change within vhdl_lang.