How to compress t5-v1_1-xxl and Gemma-2-2B? #28

@mingyi456

I successfully compressed Cosmos-Predict2 and Chroma, but when I tried to compress the T5 text encoder used by Flux, I got the following error instead:

Traceback (most recent call last):
  File "F:\AI setups\Diffusers\models\compress t5.py", line 42, in <module>
    compress_model(
  File "F:\AI setups\Diffusers\diffusers-venv\Lib\site-packages\dfloat11\dfloat11.py", line 622, in compress_model
    save_file(model.state_dict(), os.path.join(save_path, 'model.safetensors'))
  File "F:\AI setups\Diffusers\diffusers-venv\Lib\site-packages\safetensors\torch.py", line 352, in save_file
    serialize_file(_flatten(tensors), filename, metadata=metadata)
                   ^^^^^^^^^^^^^^^^^
  File "F:\AI setups\Diffusers\diffusers-venv\Lib\site-packages\safetensors\torch.py", line 577, in _flatten
    raise RuntimeError(
RuntimeError:
            Some tensors share memory, this will lead to duplicate memory on disk and potential differences when loading them again: [{'shared.weight', 'encoder.embed_tokens.weight'}].
            A potential way to correctly save your model is to use `save_model`.
            More information at https://huggingface.co/docs/safetensors/torch_shared_tensors

Is this caused by a mistake in my compression code, or is what I am trying to do unsupported? The complete code I used, including the pattern_dict, is below:

import torch
from dfloat11 import compress_model
from transformers import T5EncoderModel

save_path = r".\t5-v1_1-xxl-DF11"
save_single_file = True
check_correctness = True
block_range = (0, 100)

# Load the T5 text encoder shipped with FLUX.1-dev in bfloat16
text_encoder_2 = T5EncoderModel.from_pretrained(
    r"..\models\FLUX.1-dev",
    subfolder="text_encoder_2",
    torch_dtype=torch.bfloat16,
    local_files_only=True,
)

# Regex over module names -> linear layers to compress inside each matching block.
# The key is a raw string so that \. and \d are not treated as string escapes.
pattern_dict = {
    r"block\.\d+": (
        "layer.0.SelfAttention.q",
        "layer.0.SelfAttention.k",
        "layer.0.SelfAttention.v",
        "layer.0.SelfAttention.o",
        "layer.1.DenseReluDense.wi_0",
        "layer.1.DenseReluDense.wi_1",
        "layer.1.DenseReluDense.wo",
    )
}

# Compress the model using DFloat11 compression
compress_model(
    model=text_encoder_2,
    pattern_dict=pattern_dict,
    save_path=save_path,
    save_single_file=save_single_file,
    check_correctness=check_correctness,
    block_range=block_range,
)

Edit: I found an issue with the pattern_dict: block should be replaced with encoder.block. However, the shared tensors error still prevents the file from being saved after the compression process finishes.
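
To illustrate the pattern_dict fix, here is a small sketch of why the block prefix matters. It assumes the library compares each pattern against the full dotted module name with re.fullmatch; that is an assumption about DFloat11's internals, not something confirmed here. The module_names list is a hypothetical sample of what model.named_modules() reports for a T5EncoderModel, and matching_modules is a helper made up for this example.

```python
import re

# Hypothetical sample of dotted module names; the real model has many more.
module_names = [
    "shared",
    "encoder.embed_tokens",
    "encoder.block.0",
    "encoder.block.0.layer.0.SelfAttention.q",
    "encoder.block.23",
    "encoder.final_layer_norm",
]


def matching_modules(pattern, names):
    """Return the names that the pattern matches in full (assumed matching rule)."""
    return [name for name in names if re.fullmatch(pattern, name)]


# Pattern from the original script: no name consists of just "block.<n>".
original = matching_modules(r"block\.\d+", module_names)

# Pattern with the encoder prefix: matches each transformer block module.
corrected = matching_modules(r"encoder\.block\.\d+", module_names)

print(original)
print(corrected)
```

Under that assumption, the original key selects nothing, while the prefixed key selects the block modules whose sub-layers are listed in the tuple.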
