[tx] Implement Qwen 3.5 model architecture #1228
Conversation
/gemini review
Code Review
This pull request introduces the Qwen 3.5 model architecture, which supports a mix of full attention and linear attention layers. The implementation is comprehensive, including changes to the model configuration, KV caching for the new layer types, and extensive tests comparing against the reference Hugging Face implementation. My review focuses on improving the robustness of the configuration handling and fixing a potential bug in model initialization. Overall, this is a solid contribution.
```python
if layer_types is None:
    interval = getattr(config, "full_attention_interval", 4)
    layer_types = [
        "linear_attention" if (i + 1) % interval else "full_attention" for i in range(config.num_hidden_layers)
    ]
config.layer_types = layer_types
```
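For reference, the interval logic above places a full-attention layer at every `full_attention_interval`-th position and linear attention everywhere else. A minimal standalone sketch of that pattern (the helper name `build_layer_types` is not from the PR):

```python
def build_layer_types(num_hidden_layers: int, full_attention_interval: int = 4) -> list[str]:
    # (i + 1) % interval is falsy exactly on every interval-th layer,
    # so those layers get full attention and the rest get linear attention.
    return [
        "linear_attention" if (i + 1) % full_attention_interval else "full_attention"
        for i in range(num_hidden_layers)
    ]

print(build_layer_types(8))
# → ['linear_attention', 'linear_attention', 'linear_attention', 'full_attention',
#    'linear_attention', 'linear_attention', 'linear_attention', 'full_attention']
```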
Modifying the config object in-place by setting config.layer_types can lead to unexpected side effects if the same config object is used elsewhere. A cleaner approach is to avoid mutating the config. You could compute layer_types and store it as a member of this class, then pass the specific layer_type to each Qwen3_5DecoderLayer during its initialization. This would require a small change to Qwen3_5DecoderLayer.__init__ to accept layer_type as an argument.
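A minimal sketch of the suggested refactor, assuming the class names from the PR but otherwise hypothetical structure: `layer_types` is computed once and kept on the model, and each decoder layer receives its own `layer_type` explicitly instead of reading a mutated config.

```python
# Hypothetical sketch of the reviewer's suggestion; class internals are
# assumptions, only the names Qwen3_5DecoderLayer / layer_types come from the PR.

class Qwen3_5DecoderLayer:
    def __init__(self, config, layer_type: str):
        # The layer's attention variant is passed in explicitly,
        # so the layer never needs to consult config.layer_types.
        self.layer_type = layer_type


class Qwen3_5Model:
    def __init__(self, config):
        layer_types = getattr(config, "layer_types", None)
        if layer_types is None:
            interval = getattr(config, "full_attention_interval", 4)
            layer_types = [
                "linear_attention" if (i + 1) % interval else "full_attention"
                for i in range(config.num_hidden_layers)
            ]
        # Stored on the model instance; config is left untouched.
        self.layer_types = layer_types
        self.layers = [
            Qwen3_5DecoderLayer(config, layer_type) for layer_type in layer_types
        ]
```

With this shape, reusing the same config object to build a second model cannot pick up state left behind by the first.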
This PR implements the Qwen 3.5 model architecture, supporting a mix of linear and full attention layers. To keep it simple, the layers are not stacked yet, and MoE support is not included in this PR.
Here are some examples you can run on 8xH100:
- Qwen/Qwen3.5-27B model
- Qwen/Qwen3.5-4B model
- RL example