# gpt-oss 20b support #889
base: dkorzekwa/any_model
@@ -285,3 +285,10 @@ python -m nemo_export/convert_nemo_to_hf --input-ckpt-path path/to/nemo-model --
## Advanced Usage

Modify the `llama-3_1-8B_pruneffn_memory.yaml` file for advanced compression scenarios.

## GptOss - 20b
With this release, the Puzzle algorithm supports only expert removal for Gpt-Oss-20b. This model ships as a quantized checkpoint, i.e. the MoE expert matrices are quantized in the mxfp4 format. In the pruning steps, Puzzle uses the decompressed model (cast back to bf16) to compute statistics and scores; this means that during the conversion to the Puzzle format we decompress the model and store it in bf16. Once pruning is done, i.e. the experts to remove have been identified and the process is finished, the user may want to restore the mxfp4 format of the checkpoint. For this there is an additional script that takes the original and the pruned checkpoints and outputs the pruned checkpoint in mxfp4 format:
```bash
python gpt_oss_pack_mxfp4_vllm.py --student-path /workspaces/any_model_gpt_oss_20b/mip/puzzle_solutions/stats_num_params_18014757184/solutions--checkpoints/solution_0/ --original-path /workspaces/source_model_checkpoints/openai_gpt-oss-20b/ --output-path /workspaces/any_model_gpt_oss_20b/mip/puzzle_solutions/stats_num_params_18014757184/solutions--checkpoints/mxfp4-ckpt/ --deduce-experts --num-layers 24
```

> **Collaborator:** We need to specify from which path to run this command. Alternatively, please check if …
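For context, mxfp4 (OCP Microscaling FP4) stores weights as blocks of 32 FP4 (E2M1) codes that share one power-of-two scale, which is why pruning statistics are computed on a decompressed bf16 copy. A toy dequantization sketch of a single block — illustrative only, not the packing script's actual code:

```python
import numpy as np

# The 16 representable FP4 (E2M1) values, indexed by the 4-bit code.
FP4_E2M1 = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0,
                     -0.0, -0.5, -1.0, -1.5, -2.0, -3.0, -4.0, -6.0])

def dequantize_mxfp4_block(codes: np.ndarray, scale_exp: int) -> np.ndarray:
    """Illustrative mxfp4 block dequantization: 32 FP4 codes share one
    E8M0 (power-of-two) scale. The repo stores the result as bf16."""
    assert codes.shape == (32,)
    return FP4_E2M1[codes] * (2.0 ** scale_exp)

# Toy usage: one 32-element block with shared scale 2**-2.
codes = np.random.randint(0, 16, size=32).astype(np.uint8)
print(dequantize_mxfp4_block(codes, scale_exp=-2)[:4])
```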
@@ -0,0 +1,110 @@

```yaml
defaults:
  - pruning: ffn_pruning
  - scoring: ../validate_solutions_defaults
  - realize_model: ../validate_solutions_defaults
  - bypass:
  - override hydra/hydra_logging: disabled
  - _self_

puzzle_dir: ???
descriptor: llama
```

> **Collaborator** (on `descriptor: llama`): seems wrong
```yaml
teacher_dir: ${puzzle_dir}/ckpts/teacher/
replacement_library_path: ${puzzle_dir}/replacement_library.json
dataset_path: ??? # path to Nemotron-Post-Training-Dataset-v2

skip_realize_model: false

build_replacement_library:
  add_ffn_no_ops: true
  add_attention_no_ops: true

calc_subblock_stats:
  batch_sizes: [64, 96, 128]
  prefill_seq_len: 4096
  generation_seq_len: 4096
  num_active_tokens_override: # Optional override for sequence lengths
  prefill_queue_size: 0
  allocate_prefill_query: false
  benchmark_iterations: # Set to a number (e.g., 1000) to enable runtime benchmarking
  merge_with_existing_stats: false
  subblock_stats_filename: "subblock_stats.json"
  moe_stats_filename: "moe_stats.json"
  runtime_stats:
    backend: trt_torch

scoring:
  descriptor: ${descriptor}
  solutions_to_validate:
  skip_existing_solutions: true

  replacement_library_path: ${replacement_library_path}
  solutions_path: ${to_path:${puzzle_dir}/single_sequence_replacement_solutions.json}
  teacher_dir: ${to_path:${teacher_dir}}
  output_dir: ${puzzle_dir}/single_sequence_replacement_solutions--validation

  eval_samples: 128
  micro_batch_size: 1
  seed: 42
  shuffle_seed: 444
  dataset_path: ${dataset_path}

mip:
  single_block_replacement_validation_dir: ${to_path:${scoring.output_dir}}
  subblock_stats_path: ${to_path:${puzzle_dir}/${calc_subblock_stats.subblock_stats_filename}}
  output_path: ${to_path:${puzzle_dir}/mip/puzzle_solutions}
  gathered_metrics_path:
  puzzle_profile:

  # puzzle_profile:
  objective: metrics.cosine_embedding_loss_hidden_states
  bigger_is_better: false

  subblock_stats_args:
    - batch_size: 96
      weights_dtype: torch.bfloat16
      activations_dtype: torch.bfloat16
      kv_cache_dtype: torch.bfloat16

  report_additional_costs:
    - stats.memory_mib
    - stats.num_params
    - stats.num_kv_heads
    - stats.has_attention
    - stats.has_ffn
    - stats.kv_cache_memory_mib
    - stats.attention_memory_mib
    - stats.ffn_memory_mib
    - stats.ffn_num_params
    - stats.attention_num_params

  human_constraints:
    target_memory: 45_000
    num_params: 3_000_000_000

  mip_constraints:
    metric_overrides:
    max_seconds_per_solution: 60

realize_model:
  descriptor: ${descriptor}
  teacher_dir: ${to_path:${teacher_dir}}
  tokenizer_name: ${to_path:${teacher_dir}}
  replacement_library_path: ${replacement_library_path}
  save_models: true
  solutions_path: # Filled dynamically

  # Validate params
  skip_validation: false # Set to false to validate the model solution
  eval_samples: 128
  micro_batch_size: 1
  seed: 42
  shuffle_seed: 444
  dataset_path: ${dataset_path}

nccl_timeout_minutes: ${timedelta_minutes:10}

# This section redirects Hydra outputs
hydra:
  run:
    dir: ${puzzle_dir}/hydra_logs/${now:%Y-%m-%d}/${now:%H-%M-%S}
```
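A note on the Hydra/OmegaConf syntax used throughout these configs: `???` marks a mandatory value that must be supplied (for example as a command-line override), and `${...}` interpolates other keys; `to_path` and `timedelta_minutes` appear to be custom resolvers registered by this repo. A minimal, repo-independent sketch of the `???`/interpolation mechanics:

```python
from omegaconf import OmegaConf
from omegaconf.errors import MissingMandatoryValue

# Toy config mirroring the structure above (not the actual Puzzle config).
cfg = OmegaConf.create("""
puzzle_dir: ???
teacher_dir: ${puzzle_dir}/ckpts/teacher/
""")

try:
    _ = cfg.puzzle_dir  # '???' marks a mandatory value; reading it raises
except MissingMandatoryValue:
    print("puzzle_dir must be set, e.g. via a Hydra override: puzzle_dir=/workspace/puzzle_dir")

cfg.puzzle_dir = "/workspace/puzzle_dir"
print(cfg.teacher_dir)  # interpolation resolves -> /workspace/puzzle_dir/ckpts/teacher/
```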
@@ -0,0 +1,22 @@

```yaml
defaults:
  - gptoss-20b
  - _self_

# Input Hugging Face model to compress
input_hf_model_path: /workspace/hf_models/openai/gpt-oss-20b

# Dataset path for pruning and NAS scoring
dataset_path: /workspace/datasets/Nemotron-Post-Training-Dataset-v2

# Working directory for compression outputs
puzzle_dir: /workspace/puzzle_dir

# MIP memory constraint (in MiB)
mip:
  human_constraints:
    target_memory: 45_000 # ~45 GiB

# FFN intermediate sizes to search over (heterogeneous architecture)
# teacher_intermediate_size is 8192, so we use proportionally smaller values
pruning:
  intermediate_size_list: [2048, 4096, 6144]
```
> **Collaborator** (on `intermediate_size_list`): is it needed if we prune for num_of_experts?
@@ -0,0 +1,21 @@

```yaml
defaults:
  - pruning_defaults

eval_samples: 2500 #10
activations_log_dir: ${puzzle_dir}/pruning/pruning_scores/expert_removal/${pruning.experiment_id}

pruning_mixin:
  _target_: modelopt.torch.puzzletron.pruning.expert_removal_pruning_mixin.ExpertRemovalPruningMixIn
  layer_descriptor:
    _target_: modelopt.torch.puzzletron.anymodel.models.gpt_oss_20b.gpt_oss_20b_model_descriptor.GptOss20bExpertRemovalLayerDescriptor
    target_name: "mlp.router"

hook_class: ${get_object:utils.activation_hooks.hooks.RankedChoiceVotingHook}
activation_hooks_kwargs: # Additional kwargs to pass to the hook init

num_experts_to_keep_list: [24, 16, 8] # num_experts in teacher is 32
mlp_init_mode: "ExpertRemoval"
mlp_init_config_yaml:
  expert_scores_key: "expert_ranks"
  layer_prefix_template: "model.layers.{layer_idx}.mlp.router"
```
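`RankedChoiceVotingHook` is not shown in this diff, but the config suggests each token's router preferences are aggregated into per-layer expert ranks (stored under `expert_scores_key: "expert_ranks"`). As a hedged illustration of the general idea only — not the repo's implementation — a Borda-style ranked-choice vote over router logits could look like:

```python
import torch

def borda_expert_ranks(router_logits: torch.Tensor) -> torch.Tensor:
    """Illustrative ranked-choice (Borda count) expert ranking.

    router_logits: [num_tokens, num_experts] raw router scores.
    Returns expert ids sorted from most- to least-preferred.
    """
    num_tokens, num_experts = router_logits.shape
    # Each token ranks all experts by its router logit (best first).
    order = router_logits.argsort(dim=-1, descending=True)               # [T, E]
    # Borda points: a token's top expert gets E-1 points, its last gets 0.
    points = torch.arange(num_experts - 1, -1, -1, dtype=torch.float)    # [E]
    scores = torch.zeros(num_experts)
    scores.scatter_add_(0, order.reshape(-1), points.repeat(num_tokens))
    return scores.argsort(descending=True)

# Toy usage: 6 tokens routing over 8 experts; keep the top 4.
logits = torch.randn(6, 8)
print("experts to keep:", borda_expert_ranks(logits)[:4].tolist())
```

Rank-based aggregation is less sensitive to a few tokens with extreme logits than simply averaging router scores, which may be why a voting hook is used here.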
@@ -0,0 +1,34 @@

```yaml
defaults:
  - /validate_model_defaults

model_name_or_path: ${teacher_dir}
experiment_id: ${pruning.eval_samples}samples_diverse_mini
activations_log_dir: ???
activation_hooks_kwargs: ???

descriptor: ${descriptor}

# Data:
eval_samples: 10_000
micro_batch_size: 1
dataset_path: ${dataset_path}
val_dataset_name: train

# Prune ckpts
pruned_ckpts_outpt_dir: ${puzzle_dir}/pruning/${pruning.experiment_id}

## FFN pruning
ffn_list:
mlp_init_mode: "Truncate" # PruneByActivationsLog

## KV-heads pruning
n_heads_in_group_list:
gqa_init_mode: "AverageKV"

## Hidden dimension pruning
hidden_size_list:
hidden_size_init_mode: "PruneByChannelRanking"
linear_init_mode: "FromTeacher"

mlp_init_config_yaml:
  activations_log_dir: ${pruning.activations_log_dir}
```
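For context on the init modes above: `"Truncate"` presumably initializes a pruned FFN from a leading slice of the teacher's intermediate channels, while `PruneByActivationsLog` would select channels using recorded activation statistics. A toy sketch of the truncation idea — the helper name and dimensions are made up, and this is not the repo's actual implementation:

```python
import torch
import torch.nn as nn

def prune_ffn_truncate(up_proj: nn.Linear, down_proj: nn.Linear, new_intermediate: int):
    """Keep the first `new_intermediate` intermediate channels of a dense FFN
    and slice both projections accordingly (illustrative only)."""
    pruned_up = nn.Linear(up_proj.in_features, new_intermediate, bias=up_proj.bias is not None)
    pruned_down = nn.Linear(new_intermediate, down_proj.out_features, bias=down_proj.bias is not None)
    with torch.no_grad():
        pruned_up.weight.copy_(up_proj.weight[:new_intermediate, :])      # rows = intermediate dim
        pruned_down.weight.copy_(down_proj.weight[:, :new_intermediate])  # cols = intermediate dim
        if up_proj.bias is not None:
            pruned_up.bias.copy_(up_proj.bias[:new_intermediate])
        if down_proj.bias is not None:
            pruned_down.bias.copy_(down_proj.bias)
    return pruned_up, pruned_down

# Toy usage: shrink an 8192-wide FFN to 4096 (cf. intermediate_size_list above).
up, down = nn.Linear(2880, 8192), nn.Linear(8192, 2880)
up2, down2 = prune_ffn_truncate(up, down, 4096)
print(up2.weight.shape, down2.weight.shape)  # torch.Size([4096, 2880]) torch.Size([2880, 4096])
```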
@@ -0,0 +1,18 @@

```yaml
model_dtype: torch.bfloat16 # dtype to cast the model for validate_model
autocast_dtype: torch.bfloat16 # dtype for torch.autocast for validate_model
block_size: 8192
bos_rate: 0.5
data_column: messages
val_dataset_name: valid
shuffle_seed: 81436
seed: 42
fim_rate: 0
fim_spm_rate: 0
source_datasets_to_discard:
varlen: false
write_results: false
calc_losses_on_cpu: false
activations_log_dir:
model_name_or_path:
load_dataset_fn: ${get_object:modelopt.torch.puzzletron.utils.data.dataloaders.load_from_disk_fn}
```
@@ -0,0 +1,11 @@

```yaml
defaults:
  - /validate_model_defaults
  - _self_

solutions_to_validate:
skip_validation: false
save_models: false
bigger_is_better: false
sort_solutions_by:
calculate_full_score_ablations: false
```
> **Collaborator** (on the `## GptOss - 20b` README section): Let's not put it at the same level as `## Advanced Usage`; I would put it into a separate MD file (in the model descriptor dir) and link it nicely from the main tutorial. Consult also with @LianaMikael on how best to do it. We want the tutorial to have a great user experience.
>
> Let's also check the English style/grammar, e.g. there should be no comma before "that takes", and likely a comma after "In the pruning steps".