Add semantic tokens support (textDocument/semanticTokens/full and /range) #450

Open
axlegrinder1 wants to merge 2 commits into VHDL-LS:master from axlegrinder1:feature/semantic-tokens

@axlegrinder1

Expose VHDL entity classification to the editor via LSP semantic tokens, enabling context-aware coloring for signals, variables, constants, types, functions, ports, generics, and other VHDL constructs.

Uses standard LSP token types (variable, parameter, function, type, etc.) for out-of-the-box theme compatibility. Constants and generics are distinguished via the readonly modifier.

Results are cached per file and invalidated on change.

Closes #314

I built this fairly quickly with the help of Claude Opus 4.6, so I would treat it as a proof-of-concept implementation that likely has room for improvement, but the results are very promising. I have added a set of tests in the vhdl_server.rs module and have run the feature in VS Code + TerosHDL across several large projects (200-500 source files) with good results.

With the feature enabled, once the server is running, the editor has much more information about each file and can provide context that was previously missing, such as colouring valid constants and enum literals wherever they are used in the code, not just at their declaration.

This currently works cleanly as a separate module that uses the searcher infrastructure. A tighter integration into the analysis pass could eliminate the redundant walk and guarantee consistency, but it would be a far more involved change within vhdl_lang.

Contributor

@Schottkyc137 left a comment

Thanks for tackling this!
Overall, the implementation looks like a good starting point. I have left a few comments.
Concerning more involved changes to vhdl_lang: I have nothing against that if there is a real advantage. Just please make sure to structure this, e.g., by opening multiple PRs, so that it's easier to review.

Comment thread vhdl_ls/src/vhdl_server.rs Outdated
}

/// Decode delta-encoded semantic tokens to (line, start, length, token_type, modifiers).
fn decode_semantic_tokens(tokens: &[SemanticToken]) -> Vec<(u32, u32, u32, u32, u32)> {
Contributor

Please return a more meaningful data type. A tuple that consists of n u32s is really hard to read.
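
One way the suggestion above could look is sketched here. It is a minimal sketch under stated assumptions, not the code from this PR: `SemanticToken` is a local stand-in with the five fields of the LSP wire format, and the field names of `DecodedToken` are chosen for illustration only.

```rust
/// Local stand-in for the wire-format token (assumption: the real code
/// would use the type provided by its LSP types crate instead).
#[derive(Debug, Clone, Copy)]
pub struct SemanticToken {
    pub delta_line: u32,
    pub delta_start: u32,
    pub length: u32,
    pub token_type: u32,
    pub token_modifiers_bitset: u32,
}

/// A decoded token with absolute coordinates and named fields
/// (field names are illustrative).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct DecodedToken {
    pub line: u32,
    pub start: u32,
    pub length: u32,
    pub token_type: u32,
    pub modifiers: u32,
}

/// Turn delta-encoded tokens back into absolute positions.
pub fn decode_semantic_tokens(tokens: &[SemanticToken]) -> Vec<DecodedToken> {
    let mut line = 0u32;
    let mut start = 0u32;
    tokens
        .iter()
        .map(|t| {
            if t.delta_line == 0 {
                // Same line as the previous token: start is relative to it.
                start += t.delta_start;
            } else {
                // New line: the start column is absolute within that line.
                line += t.delta_line;
                start = t.delta_start;
            }
            DecodedToken {
                line,
                start,
                length: t.length,
                token_type: t.token_type,
                modifiers: t.token_modifiers_bitset,
            }
        })
        .collect()
}
```

With named fields, a test can assert on `decoded[1].start` instead of `decoded[1].1`, which is the readability gain the comment asks for.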

source.change(range.as_ref(), &content_change.text);
}
self.project.update_source(&source);
self.semantic_token_cache.clear();
Contributor

Feels weird that the entire cache is cleared (which contains all URIs) if only a single file is updated. Is there any reason for this? (See also above)

Author

It's due to cross-file references: an edit in one file can change how names in other files resolve. I added a clarifying comment. It's perhaps a bit heavy-handed, but it ensures no stale tokens remain for any file.
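
The trade-off discussed in this thread can be illustrated with a small sketch. It assumes a cache keyed by document URI that stores the already encoded token data; the type `SemanticTokenCache` and its methods are hypothetical names, not the ones used in this PR.

```rust
use std::collections::HashMap;

/// Hypothetical per-file cache of encoded token data, keyed by document URI.
#[derive(Default)]
pub struct SemanticTokenCache {
    entries: HashMap<String, Vec<u32>>,
}

impl SemanticTokenCache {
    /// Return the cached tokens for `uri`, computing them only on a miss.
    pub fn get_or_compute(&mut self, uri: &str, compute: impl FnOnce() -> Vec<u32>) -> &[u32] {
        self.entries
            .entry(uri.to_owned())
            .or_insert_with(compute)
            .as_slice()
    }

    /// Conservative invalidation: drop every entry, because a change in one
    /// file may alter how names in other files are classified.
    pub fn invalidate_all(&mut self) {
        self.entries.clear();
    }

    /// Narrow invalidation: only safe if no other file depends on `uri`.
    pub fn invalidate(&mut self, uri: &str) {
        self.entries.remove(uri);
    }

    /// Number of files that currently have cached tokens.
    pub fn len(&self) -> usize {
        self.entries.len()
    }
}
```

A middle ground would be to invalidate only the changed file and its dependents, but that needs dependency information which this sketch does not model.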


/// Delta-encode sorted tokens, optionally filtering to a range.
fn encode(
tokens: &[(vhdl_lang::Range, u32, u32)],
Contributor

This tuple (Range, u32, u32) is not descriptive. I suggest changing it to some small struct.

Comment on lines +110 to +115
tokens.sort_by(|a, b| {
    a.0.start
        .line
        .cmp(&b.0.start.line)
        .then(a.0.start.character.cmp(&b.0.start.character))
});
Contributor

IIRC SrcPos already implements Ord, so I think this is a bit overcomplicated.

Author

Yep, missed that. Refactored to use Ord.
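
The general idea behind this simplification can be sketched as follows. The `Position` and `Token` types are simplified stand-ins (assumptions), not the `vhdl_lang` types: a derived `Ord` compares fields in declaration order, so putting `line` before `character` gives the same ordering as the manual comparison.

```rust
/// Simplified stand-in for a source position. Field order matters:
/// the derived `Ord` compares `line` first, then `character`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Position {
    pub line: u32,
    pub character: u32,
}

/// Simplified token carrying only what the sort needs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Token {
    pub start: Position,
    pub length: u32,
}

/// Sort tokens by start position using the derived ordering.
pub fn sort_tokens(tokens: &mut [Token]) {
    tokens.sort_by_key(|t| t.start);
}
```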

Comment on lines +91 to +95
/// Check if a token overlaps the filter range by line.
/// Character-level precision is not needed as clients request full-line ranges.
fn in_range(token_range: &vhdl_lang::Range, filter: &vhdl_lang::Range) -> bool {
    token_range.start.line <= filter.end.line && token_range.end.line >= filter.start.line
}
Contributor

If this kind of utility doesn't already exist in vhdl_lang::Range, I suggest moving it there.

Author

I've moved the function to Range::overlaps_lines.
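
As a rough illustration of what such a method might look like, here is a sketch on local stand-in types. The method name `overlaps_lines` comes from the reply above; the `Position` and `Range` definitions and the exact signature are assumptions.

```rust
/// Simplified stand-ins for the position and range types (assumptions).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Position {
    pub line: u32,
    pub character: u32,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Range {
    pub start: Position,
    pub end: Position,
}

impl Range {
    /// True if the two ranges share at least one line.
    /// Columns are deliberately ignored, matching the line-level check
    /// in the excerpt under review.
    pub fn overlaps_lines(&self, other: &Range) -> bool {
        self.start.line <= other.end.line && self.end.line >= other.start.line
    }
}
```

Because only line numbers are compared, two ranges on the same line always count as overlapping, even if their columns are disjoint.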

}
}

fn to_semantic_token(kind: &AnyEntKind) -> Option<(u32, u32)> {
Contributor

The (u32, u32) is again not really descriptive. It's OK-ish here compared to the examples further down, but I'd prefer a small descriptive struct.
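
A sketch of the struct-based return type could look like this. `TokenClassification` is the name used in the follow-up commit notes; its fields, and the reduced `EntityKind` enum standing in for `AnyEntKind`, are assumptions made so the example is self-contained. The mapping follows the legend comments in the excerpt under review (constants are variables with the readonly bit).

```rust
// Indices and modifier bit copied from the excerpt under review.
pub const VARIABLE: u32 = 0;
pub const PARAMETER: u32 = 1;
pub const ENUM_MEMBER: u32 = 3;
pub const FUNCTION: u32 = 4;
pub const MOD_READONLY: u32 = 1 << 0;

/// Named replacement for the bare `(u32, u32)` tuple (fields are illustrative).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TokenClassification {
    pub token_type: u32,
    pub modifiers: u32,
}

/// Hypothetical, heavily reduced stand-in for `AnyEntKind`.
pub enum EntityKind {
    Signal,
    Variable,
    Constant,
    SubprogramParameter,
    EnumLiteral,
    Function,
    Other,
}

/// Map an entity kind to a token type and modifier set, or `None`
/// if the entity should not be highlighted.
pub fn to_semantic_token(kind: &EntityKind) -> Option<TokenClassification> {
    let (token_type, modifiers) = match kind {
        EntityKind::Signal | EntityKind::Variable => (VARIABLE, 0),
        EntityKind::Constant => (VARIABLE, MOD_READONLY),
        EntityKind::SubprogramParameter => (PARAMETER, 0),
        EntityKind::EnumLiteral => (ENUM_MEMBER, 0),
        EntityKind::Function => (FUNCTION, 0),
        EntityKind::Other => return None,
    };
    Some(TokenClassification { token_type, modifiers })
}
```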

Comment on lines +6 to +36
// Semantic token type indices — order must match TOKEN_TYPES
const VARIABLE: u32 = 0;
const PARAMETER: u32 = 1;
const PROPERTY: u32 = 2;
const ENUM_MEMBER: u32 = 3;
const FUNCTION: u32 = 4;
const TYPE: u32 = 5;
const CLASS: u32 = 6;
const NAMESPACE: u32 = 7;
const STRUCT: u32 = 8;
const ENUM: u32 = 9;

// Semantic token modifier bits
const MOD_READONLY: u32 = 1 << 0;

pub const TOKEN_TYPES: &[SemanticTokenType] = &[
    SemanticTokenType::VARIABLE,    // 0: signals, variables, constants, files
    SemanticTokenType::PARAMETER,   // 1: subprogram parameters
    SemanticTokenType::PROPERTY,    // 2: attributes, record fields
    SemanticTokenType::ENUM_MEMBER, // 3: enum literals
    SemanticTokenType::FUNCTION,    // 4: functions, procedures
    SemanticTokenType::TYPE,        // 5: types (general)
    SemanticTokenType::CLASS,       // 6: protected types, components
    SemanticTokenType::NAMESPACE,   // 7: libraries, design units, labels
    SemanticTokenType::STRUCT,      // 8: record types
    SemanticTokenType::ENUM,        // 9: enum types
];

pub const TOKEN_MODIFIERS: &[SemanticTokenModifier] = &[
    SemanticTokenModifier::READONLY, // bit 0: constants, generics
];
Contributor

I think this should be a little macro (or similar) to make extending the semantic token types future-proof.

Author

I added a define_token_types! macro that generates both the index constants and the TOKEN_TYPES legend array from a single declaration, so they can't get out of sync when the list is extended.
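
For readers unfamiliar with the pattern, one possible shape of such a macro is sketched below. Only the macro name is taken from the reply above; the body is an assumption and may differ from the PR. To stay self-contained, the legend entries are plain strings holding the standard LSP token type names rather than `SemanticTokenType` values.

```rust
macro_rules! define_token_types {
    ($($name:ident => $legend:expr),+ $(,)?) => {
        // Hidden fieldless enum: the compiler numbers its variants 0, 1, 2, ...
        // in declaration order, which yields the indices for free.
        #[allow(non_camel_case_types, dead_code)]
        #[repr(u32)]
        enum TokenIndex { $($name),+ }

        // One index constant per entry, derived from the enum discriminant.
        $(
            #[allow(dead_code)]
            pub const $name: u32 = TokenIndex::$name as u32;
        )+

        // The legend is built from the same list, in the same order.
        pub const TOKEN_TYPES: &[&str] = &[$($legend),+];
    };
}

define_token_types! {
    VARIABLE => "variable",
    PARAMETER => "parameter",
    PROPERTY => "property",
    ENUM_MEMBER => "enumMember",
    FUNCTION => "function",
}
```

Adding a token type then means adding a single line to the invocation; the index and the legend entry cannot drift apart because both come from the same list.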

Comment thread vhdl_lang/src/project.rs Outdated
Comment thread vhdl_lang/src/ast/search.rs
- Replace bare (u32, u32) tuples with TokenClassification and
CachedToken structs
- Replace DecodedToken test tuple with named struct
- Add define_token_types! macro to keep index constants and legend in
sync
- Use SrcPos::cmp for sorting instead of manual line/character
comparison
- Move in_range to Range::overlaps_lines in vhdl_lang
- Rename Project::semantic_tokens to find_all_entity_references
- Add source file filter in search_decl to guard against cross-file
decl_pos
- Match ExternalObjectClass directly instead of converting to
ObjectClass
- Skip multi-line tokens in encode instead of computing wrong length
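
The last item in the list above can be illustrated with a sketch of the delta encoding. `CachedToken` is the struct name mentioned in the list, but its fields here are assumptions, as is the stand-in `Position` type. The sketch also assumes the tokens are already sorted by start position and that columns are expressed in the units the client expects; range filtering is left out.

```rust
/// Simplified stand-in for a source position (assumption).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Position {
    pub line: u32,
    pub character: u32,
}

/// Named replacement for the `(Range, u32, u32)` tuple (fields are illustrative).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct CachedToken {
    pub start: Position,
    pub end: Position,
    pub token_type: u32,
    pub modifiers: u32,
}

/// Delta-encode tokens into the flat LSP layout, five numbers per token:
/// [delta_line, delta_start, length, token_type, modifiers].
/// Precondition: `tokens` is sorted by start position.
pub fn encode(tokens: &[CachedToken]) -> Vec<u32> {
    let mut data = Vec::with_capacity(tokens.len() * 5);
    let mut prev = Position { line: 0, character: 0 };
    for t in tokens {
        // A token spanning several lines has no meaningful single-line
        // length, so it is skipped rather than encoded incorrectly.
        if t.start.line != t.end.line {
            continue;
        }
        let delta_line = t.start.line - prev.line;
        let delta_start = if delta_line == 0 {
            // Same line as the previous emitted token: relative column.
            t.start.character - prev.character
        } else {
            // New line: absolute column.
            t.start.character
        };
        let length = t.end.character - t.start.character;
        data.extend_from_slice(&[delta_line, delta_start, length, t.token_type, t.modifiers]);
        prev = t.start;
    }
    data
}
```

Note that a skipped token does not update `prev`, so the deltas of the following token are still computed relative to the last token that was actually emitted.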
@axlegrinder1 force-pushed the feature/semantic-tokens branch from 42100a1 to bb77ba1 on April 7, 2026, 12:36
Development

Successfully merging this pull request may close these issues.

[FEATURE] Semantic Tokens

2 participants