Fix window size for sliding attention layer #1311
Code Review
This pull request updates the sliding window size in _context_attention_kernel for GPT-OSS models. It changes the look-ahead (right) side of the window from sliding_window - 1 to zero. A review comment suggests applying the same change to the _token_attention_kernel method so that the prefill and decode phases stay consistent and potential masking issues are avoided.
     ):
         if self.network_config_["layer_types"][self.layer_num_] == "sliding_attention":
    -        window_size = (self.sliding_window - 1, self.sliding_window - 1)
    +        window_size = (self.sliding_window - 1, 0)
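The following minimal Python sketch shows what the two tuples mean. It assumes the FlashAttention-style `window_size=(left, right)` convention, where query position i may attend to key positions i - left through i + right. The helper `window_mask` is hypothetical and builds the boolean mask by hand instead of calling any attention kernel. In the printed output, `x` marks an allowed query/key pair.

```python
def window_mask(seq_len: int, window_size: tuple[int, int]) -> list[list[bool]]:
    """Return mask[i][j] == True if query i may attend to key j.

    ASSUMPTION: FlashAttention-style local attention with equal query and
    key lengths, so query i sees keys in [i - left, i + right].
    """
    left, right = window_size
    return [
        [i - left <= j <= i + right for j in range(seq_len)]
        for i in range(seq_len)
    ]


def show(mask: list[list[bool]]) -> None:
    # One row per query position, one column per key position.
    for row in mask:
        print("".join("x" if allowed else "." for allowed in row))


sliding_window = 3

# Old setting: looks sliding_window - 1 tokens back AND ahead.
show(window_mask(5, (sliding_window - 1, sliding_window - 1)))
# xxx..
# xxxx.
# xxxxx
# .xxxx
# ..xxx

# New setting: looks sliding_window - 1 tokens back and none ahead.
show(window_mask(5, (sliding_window - 1, 0)))
# x....
# xx...
# xxx..
# .xxx.
# ..xxx
```

In this sketch, the new tuple gives each query at most sliding_window keys, ending at its own position. The old tuple also admits up to sliding_window - 1 future keys.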
The fix for the sliding window size should also be applied to the _token_attention_kernel method (line 95) to ensure consistency between the prefill and decode phases. Currently, _token_attention_kernel still uses (self.sliding_window - 1, self.sliding_window - 1), which is inconsistent with this change and may lead to incorrect attention masking or suboptimal performance during decoding.
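One way to act on this suggestion is to derive the tuple from a single helper, so the prefill path (_context_attention_kernel) and the decode path (_token_attention_kernel) cannot drift apart. The sketch below makes several assumptions. `sliding_window_size`, `LayerWindowSketch`, and its two methods are hypothetical stand-ins and are not code from this repository. The `(-1, -1)` value for non-sliding layers is also an assumption, taken from FlashAttention's "no window" convention, because the diff only shows the sliding branch.

```python
def sliding_window_size(layer_type: str, sliding_window: int) -> tuple[int, int]:
    """Return the (left, right) attention window for one layer.

    Sliding layers look back sliding_window - 1 tokens and never ahead.
    ASSUMPTION: other layers use (-1, -1), FlashAttention's "no window" value.
    """
    if layer_type == "sliding_attention":
        return (sliding_window - 1, 0)
    return (-1, -1)


class LayerWindowSketch:
    """Hypothetical stand-in that models only the window logic of a layer."""

    def __init__(self, layer_types: list[str], layer_num: int, sliding_window: int):
        self.network_config_ = {"layer_types": layer_types}
        self.layer_num_ = layer_num
        self.sliding_window = sliding_window

    def _window_size(self) -> tuple[int, int]:
        # Single source of truth for both attention phases.
        layer_type = self.network_config_["layer_types"][self.layer_num_]
        return sliding_window_size(layer_type, self.sliding_window)

    def prefill_window_size(self) -> tuple[int, int]:
        # Would be used inside _context_attention_kernel (prefill).
        return self._window_size()

    def decode_window_size(self) -> tuple[int, int]:
        # Would be used inside _token_attention_kernel (decode).
        return self._window_size()


# Example layer types; "full_attention" is an illustrative value.
layer = LayerWindowSketch(["sliding_attention", "full_attention"], layer_num=0, sliding_window=128)
print(layer.prefill_window_size(), layer.decode_window_size())  # (127, 0) (127, 0)
```

When both phases go through one helper, any future change to the window only needs to be made in one place.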
No description provided.