Merged
4 changes: 2 additions & 2 deletions README.md
@@ -129,7 +129,7 @@ supported on Twinkle✨ framework.
> The serverless training service accessed via `base_url=https://www.modelscope.cn/twinkle` is
> currently provided through the Tinker-compatible APIs. We will be rolling out services that support
> both the Tinker APIs as well as the full-fledged Twinkle✨ native APIs. The serverless endpoint is backed
> by one training base at a time, and currently it is [Qwen3-30B-A3B-Instruct-2507](https://modelscope.cn/models/Qwen/Qwen3-30B-A3B-Instruct-2507).
> by one training base at a time, and currently it is [Qwen3.5-4B](https://modelscope.cn/models/Qwen/Qwen3.5-4B).

| Model Type | Model ID on [ModelScope](https://modelscope.cn) | Model Size | Requires | Support Megatron | HF Model ID |
|---------------------|-----------------------------------------------------------------------------------------------------------------|:---------------------------------------:|----------------------|:----------------:|:---------------------------------------------------------------------------------------------------------:|
@@ -234,7 +234,7 @@ from twinkle.dataset import Dataset, DatasetMeta
from twinkle.preprocessor import SelfCognitionProcessor
from twinkle.server.common import input_feature_to_datum

base_model = 'ms://Qwen/Qwen3-30B-A3B-Instruct-2507'
base_model = 'ms://Qwen/Qwen3.5-4B'
base_url='your-base-url'
api_key='your-api-key'

4 changes: 2 additions & 2 deletions README_ZH.md
@@ -112,7 +112,7 @@ Twinkle✨支持相同的算法接口运行在单GPU、torchrun多机、Ray、Cl
随着新模型的发布,我们将添加对更多模型的支持。下表列出了 Twinkle✨ 框架当前支持的模型。

>[!Note]
> 通过 `base_url=https://www.modelscope.cn/twinkle` 访问的无服务器训练服务,目前是通过兼容Tinker的API提供的。我们将陆续推出同时支持Tinker API和完整Twinkle✨原生 API的服务。无服务器端点每次由一个训练基座支持,目前使用的是[Qwen3-30B-A3B-Instruct-2507](https://modelscope.cn/models/Qwen/Qwen3-30B-A3B-Instruct-2507)。
> 通过 `base_url=https://www.modelscope.cn/twinkle` 访问的无服务器训练服务,目前是通过兼容Tinker的API提供的。我们将陆续推出同时支持Tinker API和完整Twinkle✨原生 API的服务。无服务器端点每次由一个训练基座支持,目前使用的是[Qwen3.5-4B](https://modelscope.cn/models/Qwen/Qwen3.5-4B)。

| Model Type | Model ID 举例 | Model Size | Requires | Support Megatron | HF Model ID |
|---------------------|-----------------------------------------------------------------------------------------------------------------|:---------------------------------------:|----------------------|:----------------:|:---------------------------------------------------------------------------------------------------------:|
@@ -216,7 +216,7 @@ from twinkle.dataset import Dataset, DatasetMeta
from twinkle.preprocessor import SelfCognitionProcessor
from twinkle.server.common import input_feature_to_datum

base_model = 'ms://Qwen/Qwen3-30B-A3B-Instruct-2507'
base_model = 'ms://Qwen/Qwen3.5-4B'
base_url='your-base-url'
api_key='your-api-key'

12 changes: 6 additions & 6 deletions cookbook/client/server/megatron/server_config.yaml
@@ -36,11 +36,11 @@ applications:

# 3. Sampler Service - Runs inference / sampling using vLLM engine
# Used for generating text from the model (e.g., evaluating LoRA results).
- name: sampler-Qwen3-30B-A3B-Instruct-2507
route_prefix: /api/v1/sampler/Qwen/Qwen3-30B-A3B-Instruct-2507
- name: sampler-Qwen3.5-4B
route_prefix: /api/v1/sampler/Qwen/Qwen3.5-4B
import_path: sampler
args:
model_id: "ms://Qwen/Qwen3-30B-A3B-Instruct-2507" # ModelScope model identifier
model_id: "ms://Qwen/Qwen3.5-4B" # ModelScope model identifier
nproc_per_node: 4 # Number of GPU processes per node
sampler_type: vllm # Inference engine: 'vllm' (fast) or 'torch' (TorchSampler)
engine_args: # vLLM engine-specific settings
@@ -73,12 +73,12 @@ applications:

# 2. Model Service - Hosts the base model for training.
# Configure this worker to match your training hardware.
- name: models-Qwen3-30B-A3B-Instruct-2507
route_prefix: /api/v1/model/Qwen/Qwen3-30B-A3B-Instruct-2507
- name: models-Qwen3.5-4B
route_prefix: /api/v1/model/Qwen/Qwen3.5-4B
import_path: model
args:
use_megatron: true # Use the Megatron backend for training
model_id: "ms://Qwen/Qwen3-30B-A3B-Instruct-2507" # ModelScope model identifier
model_id: "ms://Qwen/Qwen3.5-4B" # ModelScope model identifier
max_length: 16000 # model max length
max_loras: 5 # model max loras
nproc_per_node: 4 # Number of GPU processes per node
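The `route_prefix` values in this config mirror the model ID. A small illustrative helper capturing the mapping as it appears in these YAML files (the helper name is hypothetical, not part of Twinkle):

```python
# Hypothetical helper: derive a service route prefix from a ModelScope model ID,
# matching the /api/v1/<service>/<model> pattern used in server_config.yaml above.
def route_prefix(service: str, model_id: str) -> str:
    model = model_id.removeprefix('ms://')  # drop the ModelScope URI scheme if present
    return f'/api/v1/{service}/{model}'

print(route_prefix('sampler', 'ms://Qwen/Qwen3.5-4B'))  # /api/v1/sampler/Qwen/Qwen3.5-4B
```

The same mapping reproduces the model-service prefix as well, e.g. `route_prefix('model', 'ms://Qwen/Qwen3.5-4B')`.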
1 change: 1 addition & 0 deletions cookbook/client/server/megatron/server_config_4b.yaml
@@ -39,6 +39,7 @@ applications:
import_path: model
args:
use_megatron: true
model_cls: Qwen3_5ForConditionalGeneration
model_id: "ms://Qwen/Qwen3.5-4B" # ModelScope model identifier
max_length: 10240
nproc_per_node: 2 # Number of GPU processes per node
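The new `model_cls` key pins the Transformers class used to load Qwen3.5. How the server turns that string into a class is not shown in this diff; a registry lookup like the sketch below is one plausible mechanism (every name here is a stand-in, not Twinkle's actual implementation):

```python
# Stand-in for the real Transformers class named by model_cls in the YAML above.
class Qwen3_5ForConditionalGeneration:
    pass

# Hypothetical registry mapping model_cls strings to importable classes.
MODEL_CLASSES = {cls.__name__: cls for cls in (Qwen3_5ForConditionalGeneration,)}

def resolve_model_cls(name: str):
    try:
        return MODEL_CLASSES[name]
    except KeyError:
        raise ValueError(f'unknown model_cls: {name!r}') from None

print(resolve_model_cls('Qwen3_5ForConditionalGeneration').__name__)
```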
4 changes: 2 additions & 2 deletions cookbook/client/tinker/modelscope/sample.py
@@ -16,7 +16,7 @@

from tinker import ServiceClient

base_model = 'Qwen/Qwen3-30B-A3B-Instruct-2507'
base_model = 'Qwen/Qwen3.5-4B'
base_url = 'http://www.modelscope.cn/twinkle'

# Step 2: Define the base model and connect to the server
@@ -29,7 +29,7 @@
# The model_path is a twinkle:// URI pointing to a previously saved LoRA checkpoint.
# The server will load the base model and apply the LoRA adapter weights.
sampling_client = service_client.create_sampling_client(
model_path='twinkle://xxx-Qwen_Qwen3-30B-A3B-Instruct-2507-xxx/weights/twinkle-lora-1',
model_path='twinkle://xxx-Qwen_Qwen3.5-4B-xxx/weights/twinkle-lora-1',
base_model=base_model
)

2 changes: 1 addition & 1 deletion cookbook/client/tinker/modelscope/self_cognition.py
@@ -23,7 +23,7 @@
from tinker import ServiceClient

# The base model to fine-tune / evaluate
base_model = 'Qwen/Qwen3-30B-A3B-Instruct-2507'
base_model = 'Qwen/Qwen3.5-4B'
base_url = 'http://www.modelscope.cn/twinkle'


2 changes: 1 addition & 1 deletion cookbook/client/tinker/modelscope/short_math_grpo.py
@@ -38,7 +38,7 @@
logger = get_logger()

# ========== Configuration ==========
BASE_MODEL = 'Qwen/Qwen3-30B-A3B-Instruct-2507'
BASE_MODEL = 'Qwen/Qwen3.5-4B'
NUM_GENERATIONS = 8
MAX_NEW_TOKENS = 4096
LEARNING_RATE = 1e-4
2 changes: 1 addition & 1 deletion cookbook/client/tinker/self_host/sample.py
@@ -27,7 +27,7 @@
# The model_path is a twinkle:// URI pointing to a previously saved LoRA checkpoint.
# The server will load the base model and apply the LoRA adapter weights.
sampling_client = service_client.create_sampling_client(
model_path='twinkle://xxx-Qwen_Qwen3-30B-A3B-Instruct-2507-xxx/weights/twinkle-lora-1',
model_path='twinkle://xxx-Qwen_Qwen3.5-4B-xxx/weights/twinkle-lora-1',
base_model=base_model
)

168 changes: 168 additions & 0 deletions cookbook/client/twinkle/modelscope/multi_modal.py
@@ -0,0 +1,168 @@
# Twinkle Client - Multi-Modal LoRA Training Example
#
# This script demonstrates how to fine-tune a multi-modal model using LoRA
# (Low-Rank Adaptation) through the Twinkle client-server architecture.
# The server must be running first (see server.py and server_config.yaml).

# Step 1: Load environment variables from a .env file (e.g., API tokens)
import dotenv
import os
from twinkle.data_format import Trajectory, Message
from twinkle.preprocessor import Preprocessor

dotenv.load_dotenv('.env')
import numpy as np
import torch
from peft import LoraConfig

from twinkle import get_logger
from twinkle.dataset import DatasetMeta
from twinkle_client import init_twinkle_client
from twinkle.dataloader import DataLoader
from twinkle.dataset import LazyDataset
from twinkle_client.model import MultiLoraTransformersModel

logger = get_logger()

base_model = 'Qwen/Qwen3.5-4B'
base_url = 'http://www.modelscope.cn/twinkle'

# Step 2: Initialize the Twinkle client to communicate with the remote server.
# - base_url: the address of the running Twinkle server
# - api_key: authentication token (loaded from environment variable)
client = init_twinkle_client(base_url=base_url, api_key=os.environ.get('MODELSCOPE_TOKEN'))

# Step 3: Query the server for existing training runs and their checkpoints.
# This is useful for resuming a previous training session.
runs = client.list_training_runs()

resume_path = None
for run in runs:
logger.info(run.model_dump_json(indent=2))
# List all saved checkpoints for this training run
checkpoints = client.list_checkpoints(run.training_run_id)

for checkpoint in checkpoints:
logger.info(checkpoint.model_dump_json(indent=2))
# Uncomment the line below to resume from a specific checkpoint:
# resume_path = checkpoint.twinkle_path


class LatexOCRProcessor(Preprocessor):

    def __call__(self, rows):
        rows = self.map_col_to_row(rows)
        rows = [self.preprocess(row) for row in rows]
        rows = self.map_row_to_col(rows)
        return rows

    def preprocess(self, row) -> Trajectory:
        return Trajectory(
            messages=[
                Message(role='user', content='<image>Using LaTeX to perform OCR on the image.', images=[row['image']]),
                Message(role='assistant', content=row['text']),
            ]
        )


def train():
    # Step 4: Prepare the dataset

    # Load the LaTeX OCR dataset from ModelScope
    dataset = LazyDataset(dataset_meta=DatasetMeta('ms://AI-ModelScope/LaTeX_OCR', data_slice=range(500)))

    # Apply a chat template so the data matches the model's expected input format
    dataset.set_template('Qwen3_5Template', model_id=f'ms://{base_model}', max_length=512)

    # Convert each raw image/text row into a chat-style OCR trajectory
    dataset.map(LatexOCRProcessor)

    # Tokenize and encode the dataset into model-ready input features
    dataset.encode(batched=True)

    # Wrap the dataset into a DataLoader that yields batches of size 4
    dataloader = DataLoader(dataset=dataset, batch_size=4)

    # Step 5: Configure the model

    # Create a multi-LoRA Transformers model pointing to the base model on ModelScope
    model = MultiLoraTransformersModel(model_id=f'ms://{base_model}')

    # Define LoRA configuration: apply low-rank adapters to all linear layers
    lora_config = LoraConfig(target_modules='all-linear')

    # Attach the LoRA adapter named 'default' to the model.
    # gradient_accumulation_steps=2 means gradients are accumulated over 2 micro-batches
    # before an optimizer step, effectively doubling the batch size.
    model.add_adapter_to_model('default', lora_config, gradient_accumulation_steps=2)

    # Set the same chat template used during data preprocessing
    model.set_template('Qwen3_5Template')

    # Set the input processor (pads sequences on the right side)
    model.set_processor('InputProcessor', padding_side='right')

    # Use cross-entropy loss for language modeling
    model.set_loss('CrossEntropyLoss')

    # Use the Adam optimizer with a learning rate of 1e-4
    # (only the Adam optimizer is supported when the server uses Megatron)
    model.set_optimizer('Adam', lr=1e-4)

    # Use a linear learning rate scheduler
    # (LR schedulers are not supported when the server uses Megatron)
    # model.set_lr_scheduler('LinearLR')

    # Step 6: Optionally resume from a previous checkpoint
    if resume_path:
        logger.info(f'Resuming training from {resume_path}')
        model.load(resume_path, load_optimizer=True)

    # Step 7: Run the training loop
    logger.info(model.get_train_configs().model_dump())

    for epoch in range(3):
        logger.info(f'Starting epoch {epoch}')
        for step, batch in enumerate(dataloader):
            for sample in batch:
                for key in sample:
                    if isinstance(sample[key], np.ndarray):
                        sample[key] = sample[key].tolist()
                    elif isinstance(sample[key], torch.Tensor):
                        sample[key] = sample[key].cpu().numpy().tolist()
            # Forward pass + backward pass (computes gradients)
            model.forward_backward(inputs=batch)

            # Optimizer step
            model.clip_grad_and_step()
            # Equivalent to the following steps:
            # # Clip gradients to prevent exploding gradients (max norm = 1.0)
            # model.clip_grad_norm(1.0)
            # # Perform one optimizer step (update model weights)
            # model.step()
            # # Reset gradients to zero for the next iteration
            # model.zero_grad()
            # # Advance the learning rate scheduler by one step
            # model.lr_step()

            # Log the loss every 2 steps (aligned with gradient accumulation)
            if step % 2 == 0:
                # Print metric
                metric = model.calculate_metric(is_training=True)
                logger.info(f'Step {step} of {len(dataloader)}, metric: {metric.result}')

        # Step 8: Save the trained checkpoint
        twinkle_path = model.save(name=f'twinkle-epoch-{epoch}', save_optimizer=True)
        logger.info(f'Saved checkpoint: {twinkle_path}')

        # Step 9: Upload the checkpoint to ModelScope Hub
        # YOUR_USER_NAME = "your_username"
        # hub_model_id = f'{YOUR_USER_NAME}/twinkle-multi-modal'
        # model.upload_to_hub(
        #     checkpoint_dir=twinkle_path,
        #     hub_model_id=hub_model_id,
        #     async_upload=False
        # )
        # logger.info(f"Uploaded checkpoint to hub: {hub_model_id}")


if __name__ == '__main__':
    train()
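The `gradient_accumulation_steps=2` setting above means two `forward_backward` calls feed one optimizer step. That this matches a doubled batch size can be checked with a toy mean-squared-error model in plain Python (no Twinkle or Torch dependency; only the arithmetic is illustrated):

```python
# Toy model: scalar weight w, mean-squared-error loss over a batch of (x, y) pairs.
def grad(w, batch):
    # d/dw mean((w*x - y)^2) = mean(2*(w*x - y)*x)
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

w = 0.5
micro1 = [(1.0, 2.0), (2.0, 3.0)]
micro2 = [(3.0, 5.0), (4.0, 9.0)]

# Accumulate gradients over 2 equally sized micro-batches, then average.
accumulated = (grad(w, micro1) + grad(w, micro2)) / 2

# Same data processed as one batch of doubled size.
full = grad(w, micro1 + micro2)

assert abs(accumulated - full) < 1e-12
```

The equality holds exactly because the micro-batches are the same size, so the average of per-micro-batch mean gradients equals the mean gradient over the union.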
2 changes: 1 addition & 1 deletion cookbook/client/twinkle/modelscope/self_congnition.py
@@ -21,7 +21,7 @@

logger = get_logger()

base_model = 'Qwen/Qwen3-30B-A3B-Instruct-2507'
base_model = 'Qwen/Qwen3.5-4B'
base_url = 'http://www.modelscope.cn/twinkle'

# Step 2: Initialize the Twinkle client to communicate with the remote server.
2 changes: 1 addition & 1 deletion cookbook/transformers/ep_fsdp_qwen3_moe.py
@@ -11,7 +11,7 @@

logger = get_logger()

MODEL_ID = os.environ.get('QWEN3_MODEL_ID', 'ms://Qwen/Qwen3-30B-A3B-Instruct-2507')
MODEL_ID = os.environ.get('QWEN3_MODEL_ID', 'ms://Qwen/Qwen3.5-4B')
DATASET_ID = os.environ.get('DATASET_ID', 'ms://swift/self-cognition')
TEMPLATE_ID = os.environ.get('TEMPLATE_ID', 'Template')
_num_layers_env = os.environ.get('NUM_LAYERS')
6 changes: 3 additions & 3 deletions cookbook/transformers/fsdp2_moe.py
@@ -20,7 +20,7 @@
def eval(model):
# 100 Samples
dataset = Dataset(dataset_meta=DatasetMeta('ms://swift/self-cognition', data_slice=range(100)))
dataset.set_template('Template', model_id='ms://Qwen/Qwen3-30B-A3B-Instruct-2507')
dataset.set_template('Template', model_id='ms://Qwen/Qwen3.5-4B')
dataset.map(SelfCognitionProcessor('twinkle大模型', 'ModelScope社区'))
dataset.encode()
dataloader = DataLoader(dataset=dataset, batch_size=4)
@@ -35,15 +35,15 @@ def train():
# 1000 samples
dataset = Dataset(dataset_meta=DatasetMeta('ms://swift/self-cognition', data_slice=range(1000)))
# Set template to prepare encoding
dataset.set_template('Template', model_id='ms://Qwen/Qwen3-30B-A3B-Instruct-2507')
dataset.set_template('Template', model_id='ms://Qwen/Qwen3.5-4B')
# Preprocess the dataset to standard format
dataset.map(SelfCognitionProcessor('twinkle大模型', 'ModelScope社区'))
# Encode dataset
dataset.encode()
# Global batch size = 8, split across the data-parallel GPUs
dataloader = DataLoader(dataset=dataset, batch_size=8)
# Use a TransformersModel, transformer_cls_names_to_wrap=Qwen3MoeSparseMoeBlock to avoid hang of fsdp2
model = TransformersModel(model_id='ms://Qwen/Qwen3-30B-A3B-Instruct-2507', fsdp_config={'transformer_cls_names_to_wrap':['Qwen3MoeSparseMoeBlock']})
model = TransformersModel(model_id='ms://Qwen/Qwen3.5-4B', fsdp_config={'transformer_cls_names_to_wrap':['Qwen3MoeSparseMoeBlock']})
# Patch MoE model to fix the hang bug, support transformers==4.*
model.apply_patch('ms://twinkle-kit/qwen3_moe_transformers4_patch')
lora_config = LoraConfig(
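In `fsdp2_moe.py`, `batch_size=8` is the global batch; each data-parallel rank then sees a slice of it. A minimal sketch of that chunk-and-shard arithmetic (the helper names are hypothetical; real sharding is handled by the framework):

```python
def global_batches(samples, batch_size):
    # Chunk the dataset into consecutive global batches.
    return [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]

def shard(batch, rank, world_size):
    # Contiguous split of one global batch across data-parallel ranks.
    per_rank = len(batch) // world_size
    return batch[rank * per_rank:(rank + 1) * per_rank]

batch = global_batches(list(range(16)), 8)[0]   # first global batch: 8 samples
print([shard(batch, r, 8) for r in (0, 7)])     # with 8 ranks, 1 sample per GPU
```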
2 changes: 1 addition & 1 deletion docs/source_en/Usage Guide/Introduction-with-Qwen3.5.md
@@ -530,7 +530,7 @@ Alongside the open-source release of Twinkle, ModelScope provides a hosted model

```python
base_url = 'https://www.modelscope.cn/twinkle'
base_model = 'Qwen/Qwen3-30B-A3B-Instruct-2507' # Model currently deployed in the official environment
base_model = 'Qwen/Qwen3.5-4B' # Model currently deployed in the official environment
```

---
10 changes: 5 additions & 5 deletions docs/source_en/Usage Guide/Train-as-a-Service.md
@@ -2,7 +2,7 @@

Alongside the open-source release of the Twinkle framework, we also provide a hosted model training service (Training as a Service) powered by ModelScope's backend infrastructure. Developers can use this service to experience Twinkle's training API for free.

The model currently running on the cluster is [Qwen/Qwen3-30B-A3B-Instruct-2507](https://www.modelscope.cn/models/Qwen/Qwen3-30B-A3B-Instruct-2507). Below are the detailed usage instructions:
The model currently running on the cluster is [Qwen/Qwen3.5-4B](https://www.modelscope.cn/models/Qwen/Qwen3.5-4B). Below are the detailed usage instructions:

## Step 1. Register a ModelScope Account and Apply to Join the twinkle-explorers Organization

@@ -30,7 +30,7 @@ from twinkle.dataset import Dataset, DatasetMeta
from twinkle.preprocessor import SelfCognitionProcessor
from twinkle.server.common import input_feature_to_datum

base_model = 'ms://Qwen/Qwen3-30B-A3B-Instruct-2507'
base_model = 'ms://Qwen/Qwen3.5-4B'
base_url='http://www.modelscope.cn/twinkle'
api_key=os.environ.get('MODELSCOPE_TOKEN')

@@ -64,7 +64,7 @@ for epoch in range(2):
print(f'Saved checkpoint for epoch {epoch} to {result.path}')
```

With the code above, you can train a self-cognition LoRA based on `Qwen/Qwen3-30B-A3B-Instruct-2507`. This LoRA will change the model's name and creator to the names specified during training. To perform inference using this LoRA:
With the code above, you can train a self-cognition LoRA based on `Qwen/Qwen3.5-4B`. This LoRA will change the model's name and creator to the names specified during training. To perform inference using this LoRA:

```python
import os
@@ -79,7 +79,7 @@ init_tinker_client()

from tinker import ServiceClient

base_model = 'Qwen/Qwen3-30B-A3B-Instruct-2507'
base_model = 'Qwen/Qwen3.5-4B'
base_url = 'http://www.modelscope.cn/twinkle'

# Step 2: Define the base model and connect to the server
@@ -92,7 +92,7 @@ service_client = ServiceClient(
# The model_path is a twinkle:// URI pointing to a previously saved LoRA checkpoint.
# The server will load the base model and apply the LoRA adapter weights.
sampling_client = service_client.create_sampling_client(
model_path='twinkle://xxx-Qwen_Qwen3-30B-A3B-Instruct-2507-xxx/weights/twinkle-lora-1',
model_path='twinkle://xxx-Qwen_Qwen3.5-4B-xxx/weights/twinkle-lora-1',
base_model=base_model
)
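The `twinkle://` checkpoint path above follows a recognizable run-id/weights/name layout. A small parser sketch — the layout is inferred from the examples in this PR, not from a documented spec:

```python
from urllib.parse import urlsplit

def parse_twinkle_path(path: str):
    parts = urlsplit(path)              # scheme='twinkle', netloc=training run id
    if parts.scheme != 'twinkle':
        raise ValueError(f'not a twinkle path: {path}')
    run_id = parts.netloc
    kind, _, name = parts.path.lstrip('/').partition('/')
    return run_id, kind, name           # e.g. (run id, 'weights', checkpoint name)

print(parse_twinkle_path('twinkle://xxx-Qwen_Qwen3.5-4B-xxx/weights/twinkle-lora-1'))
```

The `xxx-…-xxx` run id here is the elided placeholder from the docs; a real path substitutes the id returned by `model.save`.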

2 changes: 1 addition & 1 deletion docs/source_zh/使用指引/Qwen3.5最佳实践.md
@@ -530,7 +530,7 @@ Twinkle 框架开源的同时,魔搭社区依托自身算力基础设施,提

```python
base_url = 'https://www.modelscope.cn/twinkle'
base_model = 'Qwen/Qwen3-30B-A3B-Instruct-2507' # 官方环境当前部署的模型
base_model = 'Qwen/Qwen3.5-4B' # 官方环境当前部署的模型
```

---