
[TMP] Feature/hma support #6

Draft
chanhopark1 wants to merge 3 commits into dev from feature/hma-support



@chanhopark1

No description provided.

chanhopark1 and others added 2 commits March 10, 2026 01:15
…odel names

- Document GatedDeltaNet + GatedAttention group layout for Qwen3.5
- Keep Full+SWA layout as a second example
- Explain why group 0 delegation is correct (standard KV only)
- Remove incorrect gpt-oss-20b/120b references

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ation

Hybrid models (e.g. Qwen3.5) produce mixed kv_caches dicts where
attention layers are torch.Tensor but Mamba/linear-attention layers
are list[torch.Tensor]. LMCache only handles attention KV caches,
so filter out non-tensor entries at both the adapter and grouping layers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
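
The filtering described in the commit message above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function name `filter_attention_kv`, the layer names, and the `FakeTensor` stand-in (used in place of `torch.Tensor` so the snippet runs without PyTorch) are all assumptions made for the example. In the real setting one would presumably pass `torch.Tensor` as `tensor_type`.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class FakeTensor:
    """Hypothetical stand-in for torch.Tensor so the sketch runs without PyTorch."""
    name: str


def filter_attention_kv(kv_caches: dict[str, Any], tensor_type: type) -> dict[str, Any]:
    """Keep only entries whose value is a single tensor (attention KV cache).

    Entries stored as a list of tensors (Mamba / linear-attention state)
    are dropped, since they are not standard KV caches.
    """
    return {
        layer: cache
        for layer, cache in kv_caches.items()
        if isinstance(cache, tensor_type)
    }


# Hypothetical hybrid layout: linear-attention layers hold a list of state
# tensors, attention layers hold one KV tensor. Layer names are made up.
kv_caches = {
    "layers.0.linear_attn": [FakeTensor("conv_state"), FakeTensor("ssm_state")],
    "layers.1.self_attn": FakeTensor("kv"),
    "layers.2.linear_attn": [FakeTensor("conv_state"), FakeTensor("ssm_state")],
    "layers.3.self_attn": FakeTensor("kv"),
}

attention_only = filter_attention_kv(kv_caches, FakeTensor)
print(list(attention_only))  # ['layers.1.self_attn', 'layers.3.self_attn']
```

Checking the value type with `isinstance` rather than matching on layer names keeps the filter independent of any particular model's naming scheme, which may be why the commit applies it at both the adapter and grouping layers.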
@github-actions

This pull request has been automatically marked as stale because it has not had activity within 60 days. It will be automatically closed if no further activity occurs within 30 days.

github-actions bot added the stale label on May 11, 2026
