Update dependency cache_dit to v1.3.9 by renovate[bot] · Pull Request #33 · BaizeAI/image-gen-runtime

Quantization is a powerful technique to reduce the memory footprint and computational cost of deep learning models by representing weights and activations with lower precision data types. Cache-DiT supports various quantization methods, including FP8, INT8, and INT4 quantization, to help users achieve faster inference and lower memory usage while maintaining acceptable model performance.

quantization type	description	devices
float8_per_row	quantize weights and activations to float8 (dynamic quantization) with rowwise method. (recommended)	>=sm89, Ada, Hopper or newer
float8_per_tensor	quantize weights and activations to float8 (dynamic quantization) with tensorwise method.	>=sm89, Ada, Hopper or newer
float8_per_block	quantize weights and activations to float8 (dynamic quantization) with the block-wise method, which can provide better precision; activation block size: (1, 128), weight block size: (128, 128)	>=sm89, Ada, Hopper or newer
float8_weight_only	quantize only weights to float8, keep activations in full precision	>=sm89, Ada, Hopper or newer
int8_per_row	quantize weights and activations to int8 (dynamic quantization) with rowwise method.	>=sm80, Ampere or newer
int8_per_tensor	quantize weights and activations to int8 (dynamic quantization) with tensorwise method.	>=sm80, Ampere or newer
int8_weight_only	quantize only weights to int8, keep activations in full precision	>=sm80, Ampere or newer
int4_weight_only	quantize only weights to int4, keep activations in full precision	>=sm90, Hopper or newer, TMA required
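
The device column can be turned into a quick pre-flight check. The sketch below is illustrative only: `MIN_CAPABILITY` and `supported_quant_types` are hypothetical names, not part of Cache-DiT, and the thresholds are simply transcribed from the table above. On a real system the capability tuple could come from `torch.cuda.get_device_capability()`. Compute capability alone does not confirm the TMA requirement listed for int4_weight_only.

```python
# Hypothetical helper (not a Cache-DiT API): minimum compute capability per
# quantization type, transcribed from the table above.
MIN_CAPABILITY = {
    "float8_per_row": (8, 9),
    "float8_per_tensor": (8, 9),
    "float8_per_block": (8, 9),
    "float8_weight_only": (8, 9),
    "int8_per_row": (8, 0),
    "int8_per_tensor": (8, 0),
    "int8_weight_only": (8, 0),
    "int4_weight_only": (9, 0),  # the table also lists TMA as a requirement
}


def supported_quant_types(capability):
    """Return the quantization types whose minimum capability is met."""
    return [name for name, minimum in MIN_CAPABILITY.items() if capability >= minimum]


print(supported_quant_types((8, 0)))  # an sm80 (Ampere) device
print(supported_quant_types((8, 9)))  # an sm89 (Ada) device
```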

FP8 Quantization

Currently, TorchAO is fully integrated into Cache-DiT as the backend for online quantization. You can quantize a model either by calling the quantize API directly or by passing a QuantizeConfig to the enable_cache API (recommended).

For GPUs with low memory capacity, we recommend float8_per_row or float8_per_block, as these methods cause almost no loss in precision. The supported quantization types are:

float8_per_row: quantize both weights and activations to float8 (dynamic quantization) with rowwise method.
float8_per_tensor: quantize both weights and activations to float8 (dynamic quantization) with tensorwise method.
float8_per_block: quantize both weights and activations to float8 (dynamic quantization) with the block-wise method, which can provide better precision; activation block size: (1, 128), weight block size: (128, 128). Not yet supported for distributed inference.
float8_weight_only: quantize only weights to float8, keep activations in full precision.
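
To make the difference in granularity concrete, the sketch below counts how many scale factors each scheme would keep for one 2-D tensor. It assumes one scale per quantization group and uses the block sizes quoted above. `num_scales` and the tensor shapes are hypothetical and chosen only for illustration; this is not how TorchAO or Cache-DiT store scales internally.

```python
import math


def num_scales(shape, granularity, block=None):
    """Count scale factors for a 2-D tensor, assuming one scale per group."""
    rows, cols = shape
    if granularity == "per_tensor":
        return 1
    if granularity == "per_row":
        return rows
    if granularity == "per_block":
        block_rows, block_cols = block
        return math.ceil(rows / block_rows) * math.ceil(cols / block_cols)
    raise ValueError(f"unknown granularity: {granularity}")


# Illustrative shapes (assumed): a 4096x4096 weight and 1024 tokens of activations.
weight, activation = (4096, 4096), (1024, 4096)

print(num_scales(weight, "per_tensor"))                     # 1
print(num_scales(weight, "per_row"))                        # 4096
print(num_scales(weight, "per_block", block=(128, 128)))    # 1024
print(num_scales(activation, "per_block", block=(1, 128)))  # 32768
```

More scale factors let each group follow its own value range more closely, which is the usual intuition for why finer granularity tends to preserve precision better than a single per-tensor scale.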

Here are some examples of how to use quantization with cache-dit. You can directly specify the quantization config in the enable_cache API.

import cache_dit
from cache_dit import DBCacheConfig, ParallelismConfig, QuantizeConfig

# quant_type: float8_per_row, float8_per_tensor, float8_per_block, float8_weight_only,
# int8_per_row, int8_per_tensor, int8_weight_only, int4_weight_only, etc.
# Pass a QuantizeConfig to the `enable_cache` API.
cache_dit.enable_cache( 
    pipe, cache_config=DBCacheConfig(), # w/ default
    parallelism_config=ParallelismConfig(ulysses_size=2),
    quantize_config=QuantizeConfig(quant_type="float8_per_row"),
)

Users can also specify different quantization configs for different components. For example, quantize the transformer to float8_per_row and the text encoder to float8_weight_only.

import cache_dit
from cache_dit import DBCacheConfig, ParallelismConfig, QuantizeConfig

cache_dit.enable_cache( 
    pipe, cache_config=DBCacheConfig(), # w/ default
    parallelism_config=ParallelismConfig(ulysses_size=2),
    quantize_config=QuantizeConfig(
        components_to_quantize={
            "transformer": {
                "quant_type": "float8_per_row",
                "exclude_layers": ["embedder", "embed"],
            },
            "text_encoder": {
                "quant_type": "float8_weight_only",
                "exclude_layers": ["lm_head"],
            }
        }
    ),
)

Or, directly call the quantize API for more fine-grained control.

import cache_dit
from cache_dit import QuantizeConfig

cache_dit.quantize(
    pipe.transformer, 
    quantize_config=QuantizeConfig(quant_type="float8_per_row"),
)
cache_dit.quantize(
    pipe.text_encoder, 
    quantize_config=QuantizeConfig(quant_type="float8_weight_only"),
)

Please also enable torch.compile for better performance with quantization.

import torch
import cache_dit

cache_dit.set_compile_configs()
pipe.transformer = torch.compile(pipe.transformer)
pipe.text_encoder = torch.compile(pipe.text_encoder)

Users can set exclude_layers in QuantizeConfig to exclude some sensitive layers that are not robust to quantization, e.g., embedding layers. Layers that contain any of the keywords in the exclude_layers list will be excluded from quantization. For example:

import cache_dit
from cache_dit import DBCacheConfig, ParallelismConfig, QuantizeConfig

cache_dit.enable_cache( 
    pipe, cache_config=DBCacheConfig(), # w/ default
    parallelism_config=ParallelismConfig(ulysses_size=2),
    quantize_config=QuantizeConfig(
        quant_type="float8_per_row",
        exclude_layers=["embedder", "embed"],
    ),
)
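
The keyword rule described above can be sketched in a few lines. This is a minimal illustration that assumes plain substring matching on the layer name, as the text describes; `is_excluded` and the layer names are hypothetical, and Cache-DiT's actual matching code may differ.

```python
def is_excluded(layer_name, exclude_layers):
    """True if any keyword occurs as a substring of the layer name."""
    return any(keyword in layer_name for keyword in exclude_layers)


# Hypothetical layer names, for illustration only.
layer_names = [
    "x_embedder",
    "time_text_embed.linear_1",
    "transformer_blocks.0.attn.to_q",
    "transformer_blocks.0.ff.linear_in",
]
exclude_layers = ["embedder", "embed"]

to_quantize = [name for name in layer_names if not is_excluded(name, exclude_layers)]
print(to_quantize)
```

With these inputs only the two `transformer_blocks` layers remain, since both embedding-related names contain the keyword "embed".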

The default quant_type is "float8_per_row", chosen for better precision. Users can set it to "float8_per_tensor" to use per-tensor quantization, which may perform better on some hardware.

Regional Quantization

Cache-DiT also supports regional quantization, which allows users to quantize only the repeated blocks in a transformer. This can give a better balance between precision and efficiency. Users can specify the blocks to be quantized via the regional_quantize and repeated_blocks arguments in QuantizeConfig. For example, to quantize the repeated blocks of the FLUX.2 transformer:

import cache_dit
from cache_dit import DBCacheConfig, ParallelismConfig, QuantizeConfig

cache_dit.enable_cache( 
    pipe, cache_config=DBCacheConfig(), # w/ default
    parallelism_config=ParallelismConfig(ulysses_size=2),
    quantize_config=QuantizeConfig(
        quant_type="float8_per_row",
        # Default (True): quantize only the repeated blocks in the transformer if repeated_blocks
        # is specified. If set to False, the whole transformer will be quantized.
        regional_quantize=True, 
        # Specify the block names for the transformer; cache-dit will automatically find the
        # repeated blocks and quantize them in place. The block names can be found in the model
        # architecture, e.g., for FLUX.2 the block names are "Flux2TransformerBlock" and
        # "Flux2SingleTransformerBlock".
        repeated_blocks=['Flux2TransformerBlock', 'Flux2SingleTransformerBlock'],
        # If not set, repeated_blocks is detected automatically from the diffusers transformer
        # class: repeated_blocks = transformer._repeated_blocks if it exists, else None
        # (quantize the whole transformer).
    ),
)
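
The idea of selecting layers by the class name of their enclosing block can be sketched without any deep learning framework. The toy `Module` class below is a stand-in, not `torch.nn.Module`, and `linears_in_region` is a hypothetical helper. It only illustrates the selection rule and says nothing about how Cache-DiT actually walks the model.

```python
class Module:
    """Tiny stand-in for a module tree (not torch.nn.Module)."""

    def __init__(self, **children):
        self.children = children

    def named_modules(self, prefix=""):
        # Yield this module first, then every descendant with a dotted name.
        yield prefix, self
        for name, child in self.children.items():
            child_prefix = f"{prefix}.{name}" if prefix else name
            yield from child.named_modules(child_prefix)


class Linear(Module):
    pass


class Flux2TransformerBlock(Module):
    pass


def linears_in_region(root, repeated_blocks):
    """Names of Linear layers nested inside any block whose class name is listed."""
    selected = []
    for block_name, block in root.named_modules():
        if type(block).__name__ in repeated_blocks:
            for name, module in block.named_modules(block_name):
                if isinstance(module, Linear):
                    selected.append(name)
    return selected


model = Module(
    x_embedder=Linear(),
    block0=Flux2TransformerBlock(to_q=Linear(), to_out=Linear()),
    proj_out=Linear(),
)
print(linears_in_region(model, ["Flux2TransformerBlock"]))
```

In this toy tree only the two layers inside `block0` are selected; `x_embedder` and `proj_out` sit outside the listed block class and would be left unquantized.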

FP8 Per-Tensor Fallback

The per_tensor_fallback option in Cache-DiT's quantization configuration allows users to enable a fallback mechanism for layers that do not support float8 per-row or per-block quantization. This is particularly useful in scenarios where tensor parallelism is applied, and certain layers (e.g., those applied with RowwiseParallel) may encounter memory layout mismatch errors when quantized to float8 per-row.

When per_tensor_fallback is set to True, if a layer cannot be quantized to float8 per-row or per-block, it will automatically fall back to float8 per-tensor quantization instead of raising an error. This ensures that the quantization process can continue smoothly without interruption, while still providing the benefits of reduced precision for supported layers.

To enable this feature, set the per_tensor_fallback flag to True (the default) in the QuantizeConfig when calling the enable_cache API. It is currently supported only for float8 quantization. For example:

import cache_dit
from cache_dit import DBCacheConfig, ParallelismConfig, QuantizeConfig

cache_dit.enable_cache( 
    pipe, cache_config=DBCacheConfig(), # w/ default
    parallelism_config=ParallelismConfig(tp_size=2),
    quantize_config=QuantizeConfig(
        quant_type="float8_per_row",
        # Must be True to enable fp8 per-tensor fallback.
        regional_quantize=True, # default, True.
        repeated_blocks=['Flux2TransformerBlock', 'Flux2SingleTransformerBlock'],
        # Enable fallback to float8 per-tensor quantization, default to True
        # for better compatibility for layers that do not support float8 per-row 
        # quantization, e.g., layers with RowwiseParallel applied in tensor parallelism.
        per_tensor_fallback=True, 
    ),
)

For example, without fp8 per-tensor fallback, cache-dit automatically skips the layers that do not support float8 per-row quantization and raises a warning for them. Performance is worse because fewer layers are quantized (88 layers quantized, 56 skipped).

# w/o fp8 per-tensor fallback, quantize 88 layers, skip 56 layers, performance downgrade.
torchrun --nproc_per_node=2 -m cache_dit.generate flux2_klein_9b_kv_edit \
   --parallel tp --compile --float8-per-row --q-verbose \
   --disable-per-tensor-fallback

-----------------------------------------------------------------------------------
Quantized        Region: ['Flux2TransformerBlock', 'Flux2SingleTransformerBlock']  |
Quantized Linear Layers: 88    float8_per_row     56 (skipped)                     |
Quantized Linear Layers: 88    (total)                                             |
Skipped   Linear Layers: 56    (total)                                             |
Linear           Layers: 144   (total)                                             |
-----------------------------------------------------------------------------------
------------------------------------------------------------------------------------
float8_per_row, skip: attn.to_out.0        : pattern<RowwiseParallel>: 8    layers  |
float8_per_row, skip: attn.to_add_out      : pattern<RowwiseParallel>: 8    layers  |
float8_per_row, skip: ff.linear_out        : pattern<RowwiseParallel>: 8    layers  |
float8_per_row, skip: ff_context.linear_out: pattern<RowwiseParallel>: 8    layers  |
float8_per_row, skip: attn.to_out          : pattern<RowwiseParallel>: 24   layers  |
------------------------------------------------------------------------------------

With fp8 per-tensor fallback enabled, the layers that do not support float8 per-row quantization are quantized to float8 per-tensor instead, and performance is better because more layers are quantized (144 layers quantized, 0 skipped).

# w/ fp8 per-tensor fallback enabled, quantize 144 layers, skip 0 layer, better performance.
torchrun --nproc_per_node=2 -m cache_dit.generate flux2_klein_9b_kv_edit \
   --parallel tp --compile --float8-per-row --q-verbose

# Default, enabled fp8 per-tensor fallback
-----------------------------------------------------------------------------------
Quantized        Region: ['Flux2TransformerBlock', 'Flux2SingleTransformerBlock']  |
Quantized Linear Layers: 88    float8_per_row     0 (skipped)                      |
Quantized Linear Layers: 56    float8_per_tensor  0 (skipped)                      |
Quantized Linear Layers: 144   (total)                                             |
Skipped   Linear Layers: 0     (total)                                             |
Linear           Layers: 144   (total)                                             |
-----------------------------------------------------------------------------------
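
The decision described in this section can be summarised as a small function. The sketch below is a simplified model, not Cache-DiT's implementation: `resolve_quant_type` is a hypothetical name, and whether a layer "supports" the requested type is reduced to a single boolean. The layer counts are taken from the two logs above.

```python
from collections import Counter

# Types that, per the text above, may fall back to float8 per-tensor.
FALLBACK_FROM = ("float8_per_row", "float8_per_block")


def resolve_quant_type(quant_type, layer_supports_it, per_tensor_fallback):
    """Return the type applied to one layer, or None if the layer is skipped."""
    if layer_supports_it:
        return quant_type
    if per_tensor_fallback and quant_type in FALLBACK_FROM:
        return "float8_per_tensor"
    return None


# Counts from the logs above: 56 RowwiseParallel layers out of 144 linear layers.
rowwise_layers = 8 + 8 + 8 + 8 + 24
supports = [False] * rowwise_layers + [True] * (144 - rowwise_layers)

for fallback in (False, True):
    summary = Counter(
        resolve_quant_type("float8_per_row", ok, fallback) for ok in supports
    )
    print("fallback =", fallback, dict(summary))
```

Under these assumptions the sketch reproduces the split shown in the logs: 88 per-row layers in both runs, with the remaining 56 either skipped or moved to float8 per-tensor.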

(Hybrid) Precision Plan

The precision_plan option in QuantizeConfig allows users to specify different quantization types for matched layer-name patterns. It is useful when you want better control of the accuracy and performance trade-off for attention sub-layers (for example, keep to_k/to_v in float8_per_row while using float8_per_tensor for to_q/to_out). Please note:

Layers not matched by precision_plan continue to use the base quant_type.
precision_plan is only valid when regional_quantize=True. If regional quantization is disabled, precision plan will be ignored.
precision_plan is compatible with per_tensor_fallback. If a selected plan type is not supported by a specific layer or hardware path (for example, when rowwise tensor parallelism is used and the base quant_type is float8_per_row), the fallback logic still applies automatically when enabled.

For example: (FLUX.2-Klein-9b-kv)

import cache_dit
from cache_dit import DBCacheConfig, ParallelismConfig, QuantizeConfig

cache_dit.enable_cache(
    pipe,
    cache_config=DBCacheConfig(),
    quantize_config=QuantizeConfig(
        # Default type for unmatched layers in transformer.
        quant_type="float8_per_row",
        regional_quantize=True,
        repeated_blocks=['Flux2TransformerBlock', 'Flux2SingleTransformerBlock'],
        per_tensor_fallback=True,
        precision_plan={
            "attn.to_q": "float8_per_tensor",  # match: **attn.to_q**, best performance. 
            "attn.to_k": "float8_weight_only", # match: **attn.to_k**, best precision.
            "attn.to_v": "float8_per_block",   # match: **attn.to_v**, better precision.
            "attn.to_out": "float8_per_row",   # match: **attn.to_out**, better precision.
        },
    ),
)

# python3 -m cache_dit.generate flux2_klein_9b_kv_edit --config quantize_plan.yaml --compile

Then, the output summary will show the quantization type for each layer, and users can verify the quantization plan is applied correctly.

-----------------------------------------------------------------------------------
Quantized        Region: ['Flux2TransformerBlock', 'Flux2SingleTransformerBlock']  |
Quantized Linear Layers: 96    float8_per_row     0 (skipped)                      |
Quantized Linear Layers: 32    float8_per_tensor  0 (skipped)                      |
Quantized Linear Layers: 8     float8_per_block   0 (skipped)                      |
Quantized Linear Layers: 8     float8_weight_only 0 (skipped)                      |
Quantized Linear Layers: 144   (total)                                             |
Skipped   Linear Layers: 0     (total)                                             |
Linear           Layers: 144   (total)                                             |
-----------------------------------------------------------------------------------
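
How a plan entry is chosen for a given layer can be sketched as follows. This assumes substring matching on the layer name (suggested by the `match: **attn.to_q**` comments in the example) and that the first matching entry wins. Both are assumptions, and `plan_quant_type` and the layer names are hypothetical.

```python
def plan_quant_type(layer_name, base_quant_type, precision_plan):
    """Pick the first plan entry whose pattern occurs in the layer name."""
    for pattern, quant_type in precision_plan.items():
        if pattern in layer_name:
            return quant_type
    # Unmatched layers keep the base quant_type.
    return base_quant_type


precision_plan = {
    "attn.to_q": "float8_per_tensor",
    "attn.to_k": "float8_weight_only",
    "attn.to_v": "float8_per_block",
    "attn.to_out": "float8_per_row",
}

# Hypothetical layer names, for illustration only.
for name in ["blocks.0.attn.to_q", "blocks.0.attn.to_v", "blocks.0.ff.linear_in"]:
    print(name, "->", plan_quant_type(name, "float8_per_row", precision_plan))
```

The last layer matches no pattern, so it keeps the base type, consistent with the first note in the list above.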

INT8/INT4 Quantization

In addition to FP8 quantization, Cache-DiT also supports INT8 and INT4 quantization for weights, which can further reduce the memory footprint of the model. Users can specify int8_per_row, int8_per_tensor, int8_weight_only, or int4_weight_only as the quantization type in the QuantizeConfig when calling the enable_cache API. For example:

import cache_dit
from cache_dit import DBCacheConfig, ParallelismConfig, QuantizeConfig  

cache_dit.enable_cache( 
    # Or "int8_per_tensor", "int8_weight_only", "int4_weight_only", etc.
    pipe, quantize_config=QuantizeConfig(quant_type="int8_per_row"), 
)

INT4 quantization can reduce memory further than FP8 or INT8, but it may cause more precision loss. We recommend trying different quantization types and choosing the one that best fits your trade-off between performance and precision. In most cases, float8 per-row is a good choice for reducing memory while maintaining acceptable precision.
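
As a rough, weights-only illustration of this trade-off, the sketch below estimates storage for each format from its bytes per weight. The 9-billion-parameter figure is an assumed example size, not a measurement. The estimate ignores scale factors, layers left unquantized, and activation memory, so real savings will be smaller.

```python
# Bytes per weight for each storage format (scale factors ignored).
BYTES_PER_WEIGHT = {"bfloat16": 2.0, "float8": 1.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params, dtype):
    """Approximate weight storage in GB (10**9 bytes) for a given format."""
    return num_params * BYTES_PER_WEIGHT[dtype] / 1e9


num_params = 9e9  # assumed example size, not a measured model
for dtype in BYTES_PER_WEIGHT:
    print(f"{dtype:>8}: {weight_memory_gb(num_params, dtype):.1f} GB")
```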

Please note that users should also install the mslk kernel library to enable the INT8/INT4 quantization features. The int4_weight_only w4a16 compute kernel requires architectures >= sm90 (Hopper or newer, TMA required). For older architectures, users can use int8_weight_only quantization for better compatibility.

# stable: mslk, torch and torchao (change cu130 to cu129 if using CUDA 12.9)
uv pip install torch==2.11.0 torchvision torchao triton mslk --index-url https://download.pytorch.org/whl/cu130 --upgrade

# nightly: mslk, torch and torchao (change cu130 to cu129 if using CUDA 12.9)
uv pip install --pre torch torchvision torchao triton mslk --index-url https://download.pytorch.org/whl/nightly/cu130 --upgrade

For distributed inference (context parallelism or tensor parallelism), we recommend using float8 quantization to avoid potential compatibility issues.

Nunchaku (W4A4)

Cache-DiT natively supports the Hybrid Cache + Nunchaku + Context Parallelism scheme. Users can leverage caching and context parallelism to speed up Nunchaku 4-bit W4A4 models.

import torch
import cache_dit
from diffusers import QwenImagePipeline
from nunchaku import NunchakuQwenImageTransformer2DModel

transformer = NunchakuQwenImageTransformer2DModel.from_pretrained(
    "path-to/svdq-int4_r32-qwen-image.safetensors"
)
pipe = QwenImagePipeline.from_pretrained(
   "Qwen/Qwen-Image", transformer=transformer, torch_dtype=torch.bfloat16,
).to("cuda")

cache_dit.enable_cache(pipe, cache_config=..., parallelism_config=...)

`v1.3.4`

Compare Source

hotfix

`v1.3.3`

Compare Source

hotfix

`v1.3.2`

Compare Source

hotfix release for fp8 per-row quantization w/ tensor parallel

Full Changelog: vipshop/cache-dit@v1.3.1...v1.3.2

`v1.3.1`

Compare Source

What's Changed

chore: update load configs docs by @DefTruth in #867
fix: skip fp8 quantize linear w/ bias in tp by @DefTruth in #869
chore: add quick start flags for quantize by @DefTruth in #871
chore: update pypi download badge by @DefTruth in #872
bugfix: remove un-supported quantize type by @DefTruth in #873
feat: expand quantize config by @DefTruth in #874
feat: support async ulysses for flux2 series by @DefTruth in #877
chore: cleanup patch functors codes by @DefTruth in #878
chore: fix docs typo by @DefTruth in #879
chore: safe import metrics funcs by @DefTruth in #880
chore: update quantization docs by @DefTruth in #881
chore: use rel imports for calibrators by @DefTruth in #882
chore: suppress torchao warnings by @DefTruth in #883
chore: add tune alias for max-autotune by @DefTruth in #884
remove manually graph break in cache blocks by @DefTruth in #885
docs: format docs by @DefTruth in #886
docs: fix typos by @DefTruth in #887
[1/N] feat: support flux2-klein kv - tp + compile by @DefTruth in #888
chore: cleanup tp utils codes by @DefTruth in #890
chore: fix api docs typo by @DefTruth in #891
chore: add mcc usage docs by @DefTruth in #892
chore: update mcc usage docs by @DefTruth in #893
chore: add mcc to cache-dit arch by @DefTruth in #894
chore: update mcc docs by @DefTruth in #895
[2/N] feat: support fp8 per-row + tp for flux2-klein kv by @DefTruth in #896
quant: add float8 linear check by @DefTruth in #898
docs: format docs by @DefTruth in #899
deps: bump up torch to 2.11.0 by @DefTruth in #900
quant: refactor torchao backend impl by @DefTruth in #901
feat: support regional quantization by @DefTruth in #902
chore: change docs highlight color by @DefTruth in #903
chore: optimize quant stats summary by @DefTruth in #904
kernel: register comm kernels as torch ops by @DefTruth in #905
kernel: refactor custom triton kernels by @DefTruth in #907
[2/N] kernel: refactor custom triton kernels by @DefTruth in #908
[3/N] kernel: refactor custom triton kernels by @DefTruth in #909
quant: refactor quantize api, deprecated kwargs by @DefTruth in #910
[2/N] quant: refactor quantize api, deprecated kwargs by @DefTruth in #911
chore: suppress diffusers torchao warnings by @DefTruth in #912
chore: fix load configs docs typo by @DefTruth in #913
chore: optimize quant ctx summary by @DefTruth in #914

Full Changelog: vipshop/cache-dit@v1.3.0...v1.3.1

`v1.3.0`: USP, 2D/3D Parallel, FP8 Blockwise, ...

Compare Source

v1.3.0 Major Release: USP, 2D/3D Parallel, FP8 Blockwise, ...

Cache-DiT v1.3.0 is a major release after v1.2.0. The major changes include:

Full Changelog: vipshop/cache-dit@v1.2.0...v1.3.0

`v1.2.3`

Compare Source

What's Changed

feat: support 🔥FireRed-Image-Edit-1.0 by @DefTruth in #797
misc: support custom input height/width by @DefTruth in #799
chore: support compile repeated blocks in examples by @DefTruth in #800
chore: add cache-dit arch by @DefTruth in #802
chore: update cache-dit arch by @DefTruth in #803
chore: update cache-dit arch by @DefTruth in #804
chore: update cache-dit arch by @DefTruth in #805
chore: update cache-dit arch by @DefTruth in #806
chore: update cache-dit arch by @DefTruth in #807
chore: update cache-dit arch by @DefTruth in #809
fix tp flat mesh broken for torch < 2.10 by @DefTruth in #810
chore: only logging at rank 0 by default by @DefTruth in #812
chore: add env docs by @DefTruth in #813

Full Changelog: vipshop/cache-dit@v1.2.2...v1.2.3

`v1.2.2`

Compare Source

What's Changed

fix load config docs typo by @DefTruth in #778
chore: rename hybrid parallel backend by @DefTruth in #779
feat: add an extend context parallel api by @DefTruth in #780
chore: set save_ctx as False for ring p2p by @DefTruth in #782
chore: add flux2-klein edit examples by @DefTruth in #783
fix ring lse fp32 convert error by @DefTruth in #785
feat: support cache for glm-image by @DefTruth in #787
chore: reset rdt as 0.12 in examples for better precision by @DefTruth in #789
chore: update badges by @DefTruth in #790
feat: ring attn w/ npu_fia for ascend npu by @luren55 in #792
feat: support tensor parallel for glm-image by @DefTruth in #794

Full Changelog: vipshop/cache-dit@v1.2.1...v1.2.2

`v1.2.1`: USP, 2D/3D Parallel

Compare Source

🎉 The v1.2.1 release is ready. The major updates include: Ring Attention w/ batched P2P, USP (Hybrid Ring and Ulysses), Hybrid 2D and 3D Parallelism (💥USP + TP), and reduced VAE-P communication overhead.

# Hybrid 2D/3D Parallelism in Cache-DiT is fully compatible w/ torch.compile, 
# Cache Acceleration, Text Encoder Parallelism, VAE Parallelism and more.
torchrun --nproc_per_node=8 -m cache_dit.generate flux2 --config parallel_2d.yaml --compile
torchrun --nproc_per_node=8 -m cache_dit.generate flux2 --config parallel_3d.yaml --compile
torchrun --nproc_per_node=8 -m cache_dit.generate --parallel ulysses_tp --cache --compile

What's Changed

[chore] Align torch generator with example by @BBuf in #723
Fix generator bug in cache-dit by @BBuf in #724
examples: allow custom generator device by @DefTruth in #726
examples: allow custom warmup-steps by @DefTruth in #727
docs: add latest news by @DefTruth in #728
docs: fix docs format by @DefTruth in #729
fix selected metrics print by @66RING in #730
docs: add flux examples to tp docs by @DefTruth in #731
fix ltx-2 i2v example by @DefTruth in #734
Update README.md by @DefTruth in #735
chore: allow use default steps for scm by @DefTruth in #736
[chore] support gpu generator in server by @BBuf in #737
docs: update download badge by @DefTruth in #738
Refine profiler and serving docs by @BBuf in #739
example image-path support url by @BBuf in #742
fix UAA broken while using joint attn by @DefTruth in #743
compile: avoid graph break for UAA by @DefTruth in #744
refactor configs yml in examples by @DefTruth in #745
relax npu attention import by @DefTruth in #747
feat: add set_attn_backend api by @DefTruth in #748
docs: update quick start by @DefTruth in #749
fix ring attn w/ native backend in torch 2.10 by @DefTruth in #750
feat: NPU FA support attention mask by @zhangtao0408 in #751
feat: add cache-dit-generate cli tool by @DefTruth in #752
docs: update ascend npu examples by @DefTruth in #753
feat: support ring attn p2p comm by @DefTruth in #754
feat: support USP -> Ulysses + Ring by @DefTruth in [#755](ht

✂ Note

PR body was truncated to here.

Configuration

📅 Schedule: (UTC)

Branch creation
- At any time (no schedule defined)
Automerge
- At any time (no schedule defined)

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about this update again.

If you want to rebase/retry this PR, check this box

This PR was generated by Mend Renovate. View the repository job log.

renovate Bot force-pushed the renovate/cache_dit-1.x branch from 9463dfe to fd0ef19 Compare February 2, 2026 05:23

renovate Bot changed the title ~~Update dependency cache_dit to v1.2.0~~ Update dependency cache_dit to v1.2.1 Feb 2, 2026

renovate Bot force-pushed the renovate/cache_dit-1.x branch from fd0ef19 to c65665d Compare February 10, 2026 08:50

renovate Bot changed the title ~~Update dependency cache_dit to v1.2.1~~ Update dependency cache_dit to v1.2.2 Feb 10, 2026

renovate Bot force-pushed the renovate/cache_dit-1.x branch from c65665d to 376529f Compare February 26, 2026 08:32

renovate Bot changed the title ~~Update dependency cache_dit to v1.2.2~~ Update dependency cache_dit to v1.2.3 Feb 26, 2026

renovate Bot force-pushed the renovate/cache_dit-1.x branch from 376529f to 8e12ad4 Compare March 11, 2026 12:52

renovate Bot changed the title ~~Update dependency cache_dit to v1.2.3~~ Update dependency cache_dit to v1.3.0 Mar 11, 2026

renovate Bot force-pushed the renovate/cache_dit-1.x branch from 8e12ad4 to 90ea7f3 Compare March 25, 2026 09:55

renovate Bot changed the title ~~Update dependency cache_dit to v1.3.0~~ Update dependency cache_dit to v1.3.1 Mar 25, 2026

renovate Bot force-pushed the renovate/cache_dit-1.x branch from 90ea7f3 to 7f6b76d Compare March 26, 2026 13:02

renovate Bot changed the title ~~Update dependency cache_dit to v1.3.1~~ Update dependency cache_dit to v1.3.3 Mar 26, 2026

renovate Bot force-pushed the renovate/cache_dit-1.x branch from 7f6b76d to ba50475 Compare March 27, 2026 05:24

renovate Bot changed the title ~~Update dependency cache_dit to v1.3.3~~ Update dependency cache_dit to v1.3.4 Mar 27, 2026

renovate Bot force-pushed the renovate/cache_dit-1.x branch from ba50475 to 0b1a44a Compare March 30, 2026 10:15

renovate Bot changed the title ~~Update dependency cache_dit to v1.3.4~~ Update dependency cache_dit to v1.3.5 Mar 30, 2026

renovate Bot force-pushed the renovate/cache_dit-1.x branch from 0b1a44a to 1ce318c Compare May 11, 2026 05:43

renovate Bot changed the title ~~Update dependency cache_dit to v1.3.5~~ Update dependency cache_dit to v1.3.6 May 11, 2026

renovate Bot force-pushed the renovate/cache_dit-1.x branch from 1ce318c to 28fa8b8 Compare May 12, 2026 09:11

renovate Bot changed the title ~~Update dependency cache_dit to v1.3.6~~ Update dependency cache_dit to v1.3.7 May 12, 2026

renovate Bot force-pushed the renovate/cache_dit-1.x branch from 28fa8b8 to 9ea5b4b Compare May 25, 2026 14:11

renovate Bot changed the title ~~Update dependency cache_dit to v1.3.7~~ Update dependency cache_dit to v1.3.8 May 25, 2026

Update dependency cache_dit to v1.3.9

302b19a

renovate Bot force-pushed the renovate/cache_dit-1.x branch from 9ea5b4b to 302b19a Compare May 27, 2026 04:30

renovate Bot changed the title ~~Update dependency cache_dit to v1.3.8~~ Update dependency cache_dit to v1.3.9 May 27, 2026

renovate[bot] wants to merge 1 commit into main from renovate/cache_dit-1.x

renovate Bot commented Jan 16, 2026 (edited)


`v1.3.5`: Quantization
