[BugFix] MultimodalHasher.hash_features risks ndarray shape/dtype hash collisions #7196

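## Problem description

`MultimodalHasher.hash_features()` computes a SHA-256 digest over `np.ndarray.tobytes()` and uses it as the multimodal cache key. However, `tobytes()` serializes only the raw element bytes; **it does not encode the shape or dtype metadata**.

This means:
- Arrays with different shapes but identical flattened bytes (for example `(6,4)` vs `(4,6)`) produce the same hash.
- Arrays with different dtypes whose byte patterns happen to match (for example a `float32` array and a `uint8` reinterpretation of the same memory) also collide.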

```python
import hashlib

import numpy as np

base = np.arange(24, dtype=np.float32)
a = base.reshape(6, 4)
b = base.reshape(4, 6)

# Same underlying bytes, different shapes: the two digests are identical.
print(hashlib.sha256(a.tobytes()).hexdigest() == hashlib.sha256(b.tobytes()).hexdigest())
# True: different shape, same hash
```
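The dtype case can be reproduced the same way. The following is a minimal sketch; `raw_bytes_hash` is a hypothetical stand-in for hashing only `tobytes()`, not the actual body of `hash_features()`, which this issue does not show in full.

```python
import hashlib

import numpy as np


def raw_bytes_hash(arr: np.ndarray) -> str:
    """Hypothetical stand-in: hash only the raw element bytes."""
    return hashlib.sha256(arr.tobytes()).hexdigest()


f32 = np.arange(6, dtype=np.float32)

# Reinterpret the same memory as int32: same shape, different dtype and values.
i32 = f32.view(np.int32)

# Reinterpret the same memory as uint8: both shape and dtype differ.
u8 = f32.view(np.uint8)

same_as_i32 = raw_bytes_hash(f32) == raw_bytes_hash(i32)
same_as_u8 = raw_bytes_hash(f32) == raw_bytes_hash(u8)
print(same_as_i32, same_as_u8)
```

Both comparisons collide because all three arrays are views over the same 24 bytes, even though they are not equal as arrays.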

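## Impact

This hash is used as the key in all three layers of the multimodal cache:
- `ProcessorCacheManager`: cache of preprocessed pixel tensors
- `EncoderCacheManager`: cache of Vision Encoder output features
- `PrefixCacheManager`: KV-block prefix cache (`mm_hash` is injected via `get_block_hash_extra_keys`)

A collision causes a false cache hit, which returns the wrong multimodal features.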
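To illustrate how a collision turns into a wrong result, here is a toy sketch of a hash-keyed cache. The dict-based cache, `raw_bytes_hash`, and `get_or_compute` are hypothetical and far simpler than the real cache managers; they only show the failure mode.

```python
import hashlib

import numpy as np


def raw_bytes_hash(arr: np.ndarray) -> str:
    """Hypothetical stand-in: hash only the raw element bytes."""
    return hashlib.sha256(arr.tobytes()).hexdigest()


cache = {}  # toy cache: hash -> computed result


def get_or_compute(arr, compute):
    """Return the cached result for arr, computing it on a miss."""
    key = raw_bytes_hash(arr)
    if key not in cache:
        cache[key] = compute(arr)
    return cache[key]


base = np.arange(24, dtype=np.float32)
a = base.reshape(6, 4)
b = base.reshape(4, 6)

# The "feature" here is just a per-row sum, so its length depends on the shape.
feat_a = get_or_compute(a, lambda x: x.sum(axis=1))
feat_b = get_or_compute(b, lambda x: x.sum(axis=1))  # false hit: returns feat_a

print(feat_a.shape, feat_b.shape, b.sum(axis=1).shape)
```

The second lookup returns the result computed for `a`, whose length does not match what `b` would have produced.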

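## Current status

All processors in the current inference pipeline use a fixed `float32` dtype and a deterministic reshape path, so the probability of a collision in practice is very low. Even so, the correctness of the hash function should not depend on implicit constraints upheld by its callers.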

## Proposed fix

Prepend a header containing the shape and dtype before the `tobytes()` payload:

```python
header = f"{obj.shape}|{obj.dtype}|".encode()
return hashlib.sha256(header + obj.tobytes()).hexdigest()
```
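As a self-contained check, the snippet above can be wrapped in a function and run against the colliding inputs. This is a sketch only: the function name `hash_ndarray` is hypothetical, and the real signature of `hash_features()` is not shown in this issue.

```python
import hashlib

import numpy as np


def hash_ndarray(obj: np.ndarray) -> str:
    """Hypothetical helper: hash the shape, dtype, and raw bytes together."""
    header = f"{obj.shape}|{obj.dtype}|".encode()
    return hashlib.sha256(header + obj.tobytes()).hexdigest()


base = np.arange(24, dtype=np.float32)
a = base.reshape(6, 4)
b = base.reshape(4, 6)
as_int = a.view(np.int32)  # same shape and bytes as a, different dtype

shape_differs = hash_ndarray(a) != hash_ndarray(b)
dtype_differs = hash_ndarray(a) != hash_ndarray(as_int)
stable = hash_ndarray(a) == hash_ndarray(a.copy())
print(shape_differs, dtype_differs, stable)
```

With the header in place, arrays that differ only in shape or only in dtype get distinct digests, while an equal copy of the same array still maps to the same key.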

## Related PR

#7185
