
Text to image similarity #102

@xiaoooobai

I want to compute the similarity between my prompt and the model's response. Strangely, regardless of whether the prompt is related to the image, the similarity score always falls in the range 0.5-0.6.
Below is my own similarity function built on your model:
    import torch
    from model import longclip  # assumes the Long-CLIP repository layout

    def long_clip_similarity(image, text, model, preprocess, device):
        """Compute the Long-CLIP similarity between an image and a text."""
        # Preprocess the image and add a batch dimension
        image_processed = preprocess(image).unsqueeze(0).to(device)
        # Tokenize the text
        text_input = longclip.tokenize([text], truncate=True).to(device)

        # Compute features
        with torch.no_grad():
            image_features = model.encode_image(image_processed)
            text_features = model.encode_text(text_input)

        # Normalize features to unit length
        image_features = image_features / image_features.norm(dim=-1, keepdim=True)
        text_features = text_features / text_features.norm(dim=-1, keepdim=True)

        # Compute the cosine similarity
        similarity = torch.matmul(image_features, text_features.T).item()

        return similarity
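
For reference, the normalize-then-dot-product step in the function is a cosine similarity. The sketch below reproduces that step in plain Python on toy vectors, and then compares several candidate texts against one image through a scaled softmax, which is one possible way to read relative rather than absolute scores. The vectors, the caption names, and the `scale=100.0` value are assumptions made for illustration; they are not Long-CLIP outputs or settings, and this is not a confirmed explanation of the narrow score range.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (normalize, then dot product)."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)

def softmax(scores, scale=100.0):
    """Turn raw similarities into relative probabilities.

    `scale` is a hypothetical temperature chosen for illustration only.
    """
    scaled = [scale * s for s in scores]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Toy vectors standing in for encoder outputs (not real Long-CLIP features).
image_vec = [0.9, 0.1, 0.4]
text_vecs = {
    "related caption": [0.8, 0.2, 0.5],
    "unrelated caption": [0.5, 0.6, 0.3],
}

# Raw cosine similarity of the image against each candidate text.
raw = {name: cosine_similarity(image_vec, vec) for name, vec in text_vecs.items()}
# Relative scores across the candidates.
probs = dict(zip(raw, softmax(list(raw.values()))))

for name in raw:
    print(f"{name}: raw={raw[name]:.4f} relative={probs[name]:.4f}")
```

In this toy setup both raw scores are fairly high and not far apart, while the relative scores separate the two captions clearly. Whether the same holds for real Long-CLIP features would need to be checked with the actual model.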

Related text and image example:

Image

Unrelated text and image example:

Image

Looking forward to your reply.
