
Fix window size for sliding attention layer #1311

Merged: hiworldwzj merged 1 commit into main from WANDY666-patch-2 on May 18, 2026
Conversation

@WANDY666 (Contributor)

No description provided.

@gemini-code-assist (bot, Contributor) left a comment

Code Review

This pull request updates the sliding window size in the _context_attention_kernel for GPT-OSS models, changing window_size from (self.sliding_window - 1, self.sliding_window - 1) to (self.sliding_window - 1, 0) so that the look-ahead part of the window is zero. A review comment suggests applying the same modification to the _token_attention_kernel method to keep the prefill and decode phases consistent and prevent potential masking issues.
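To make the effect of the change summarized above concrete, here is a minimal sketch of how a (left, right) window could translate into an attention mask. It assumes the common convention in which a query at position i may attend to keys j with i - left <= j <= i + right; the PR does not state which convention the underlying kernel uses, and the helper name sliding_window_mask and the value sliding_window = 3 are hypothetical, chosen only for illustration.

```python
def sliding_window_mask(seq_len, window_size):
    """Return mask[i][j] == True when query i may attend to key j.

    Assumed convention (not confirmed by the PR): window_size is
    (left, right), and query i sees keys in [i - left, i + right].
    """
    left, right = window_size
    return [
        [i - left <= j <= i + right for j in range(seq_len)]
        for i in range(seq_len)
    ]


sliding_window = 3  # hypothetical value, for illustration only
seq_len = 5

# Window before the change: look-back and look-ahead of sliding_window - 1.
old_mask = sliding_window_mask(seq_len, (sliding_window - 1, sliding_window - 1))
# Window after the change: same look-back, no look-ahead.
new_mask = sliding_window_mask(seq_len, (sliding_window - 1, 0))

for name, mask in (("old", old_mask), ("new", new_mask)):
    print(name)
    for row in mask:
        print("".join("x" if allowed else "." for allowed in row))
```

Under this assumed convention, the old setting lets a query see up to sliding_window - 1 later positions unless some other causal mask removes them, while the new setting limits each query to itself and the sliding_window - 1 positions before it.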

      ):
          if self.network_config_["layer_types"][self.layer_num_] == "sliding_attention":
    -         window_size = (self.sliding_window - 1, self.sliding_window - 1)
    +         window_size = (self.sliding_window - 1, 0)
high

The fix for the sliding window size should also be applied to the _token_attention_kernel method (line 95) to ensure consistency between the prefill and decode phases. Currently, _token_attention_kernel still uses (self.sliding_window - 1, self.sliding_window - 1), which is inconsistent with this change and may lead to incorrect attention masking or suboptimal performance during decoding.
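The prefill/decode comparison raised in this comment can be explored with a small sketch. It assumes a (left, right) window in which the query block is aligned to the end of the key sequence, so query i sees keys j with i + offset - left <= j <= i + offset + right, where offset = num_keys - num_queries. Whether the kernels touched by this PR follow that alignment is an assumption, and visible_keys and the sizes used are hypothetical.

```python
def visible_keys(num_queries, num_keys, window_size):
    """List the key indices each query may attend to.

    Assumed convention (not confirmed by the PR): queries are aligned
    to the end of the key sequence, and window_size is (left, right).
    """
    left, right = window_size
    offset = num_keys - num_queries  # position of query 0 among the keys
    return [
        [j for j in range(num_keys)
         if i + offset - left <= j <= i + offset + right]
        for i in range(num_queries)
    ]


sliding_window = 3  # hypothetical value, for illustration only
old_window = (sliding_window - 1, sliding_window - 1)
new_window = (sliding_window - 1, 0)

# Prefill-like case: as many queries as keys.
prefill_old = visible_keys(4, 4, old_window)
prefill_new = visible_keys(4, 4, new_window)

# Decode-like case: one new query against 6 cached keys.
decode_old = visible_keys(1, 6, old_window)
decode_new = visible_keys(1, 6, new_window)

print("prefill old:", prefill_old)
print("prefill new:", prefill_new)
print("decode old: ", decode_old)
print("decode new: ", decode_new)
```

In this sketch the two windows select different keys in the prefill-like case, whereas in the single-token decode case no keys exist beyond the query position, so both windows select the same keys. That holds only under the alignment assumed here; it says nothing definitive about the real _token_attention_kernel, and using the same window_size in both methods still keeps the two code paths consistent.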

@hiworldwzj merged commit 171204e into main on May 18, 2026
1 check passed
@hiworldwzj deleted the WANDY666-patch-2 branch on May 18, 2026 03:17
2 participants