
Hard to get the full model name of long models like sentence-transformers paraphrase #632

@vemonet

Description

The names of the sentence-transformers/paraphrase-multilingual-... models are always truncated with an ellipsis ("..."), so it's hard to tell apart the 2 that are available.

import pandas as pd
from fastembed import TextEmbedding

supported_models = (
    pd.DataFrame(TextEmbedding.list_supported_models())
    .sort_values("size_in_GB")
    .drop(columns=["sources", "model_file", "additional_files"])
    .reset_index(drop=True)
)
supported_models

This produces:

                                                model                                        description       license  size_in_GB   dim                                              tasks
0                              BAAI/bge-small-en-v1.5  Text embeddings, Unimodal (text), English, 512...           mit       0.067   384                                                 {}
1              sentence-transformers/all-MiniLM-L6-v2  Text embeddings, Unimodal (text), English, 256...    apache-2.0       0.090   384                                                 {}
2                              BAAI/bge-small-zh-v1.5  Text embeddings, Unimodal (text), Chinese, 512...           mit       0.090   512                                                 {}
3                 snowflake/snowflake-arctic-embed-xs  Text embeddings, Unimodal (text), English, 512...    apache-2.0       0.090   384                                                 {}
4                  jinaai/jina-embeddings-v2-small-en  Text embeddings, Unimodal (text), English, 819...    apache-2.0       0.120   512                                                 {}
5                                   BAAI/bge-small-en  Text embeddings, Unimodal (text), English, 512...           mit       0.130   384                                                 {}
6                    nomic-ai/nomic-embed-text-v1.5-Q  Text embeddings, Multimodal (text, image), Eng...    apache-2.0       0.130   768                                                 {}
7                  snowflake/snowflake-arctic-embed-s  Text embeddings, Unimodal (text), English, 512...    apache-2.0       0.130   384                                                 {}
8                               BAAI/bge-base-en-v1.5  Text embeddings, Unimodal (text), English, 512...           mit       0.210   768                                                 {}
9   sentence-transformers/paraphrase-multilingual-...  Text embeddings, Unimodal (text), Multilingual...    apache-2.0       0.220   384                                                 {}
10                          Qdrant/clip-ViT-B-32-text  Text embeddings, Multimodal (text&image), Engl...           mit       0.250   512                                                 {}
11                  jinaai/jina-embeddings-v2-base-de  Text embeddings, Unimodal (text), Multilingual...    apache-2.0       0.320   768                                                 {}
12                                   BAAI/bge-base-en  Text embeddings, Unimodal (text), English, 512...           mit       0.420   768                                                 {}
13                 snowflake/snowflake-arctic-embed-m  Text embeddings, Unimodal (text), English, 512...    apache-2.0       0.430   768                                                 {}
14                                  thenlper/gte-base  General text embeddings, Unimodal (text), supp...           mit       0.440   768                                                 {}
15                  jinaai/jina-embeddings-v2-base-en  Text embeddings, Unimodal (text), English, 819...    apache-2.0       0.520   768                                                 {}
16                       nomic-ai/nomic-embed-text-v1  Text embeddings, Multimodal (text, image), Eng...    apache-2.0       0.520   768                                                 {}
17                     nomic-ai/nomic-embed-text-v1.5  Text embeddings, Multimodal (text, image), Eng...    apache-2.0       0.520   768                                                 {}
18            snowflake/snowflake-arctic-embed-m-long  Text embeddings, Unimodal (text), English, 204...    apache-2.0       0.540   768                                                 {}
19                                jinaai/jina-clip-v1  Text embeddings, Multimodal (text&image), Engl...    apache-2.0       0.550   768                                                 {}
20                jinaai/jina-embeddings-v2-base-code  Text embeddings, Unimodal (text), Multilingual...    apache-2.0       0.640   768                                                 {}
21                  jinaai/jina-embeddings-v2-base-zh  Text embeddings, Unimodal (text), supports mix...    apache-2.0       0.640   768                                                 {}
22                  jinaai/jina-embeddings-v2-base-es  Text embeddings, Unimodal (text), supports mix...    apache-2.0       0.640   768                                                 {}
23                 mixedbread-ai/mxbai-embed-large-v1  Text embeddings, Unimodal (text), English, 512...    apache-2.0       0.640  1024                                                 {}
24  sentence-transformers/paraphrase-multilingual-...  Text embeddings, Unimodal (text), Multilingual...    apache-2.0       1.000   768                                                 {}
25                 snowflake/snowflake-arctic-embed-l  Text embeddings, Unimodal (text), English, 512...    apache-2.0       1.020  1024                                                 {}
26                             BAAI/bge-large-en-v1.5  Text embeddings, Unimodal (text), English, 512...           mit       1.200  1024                                                 {}
27                                 thenlper/gte-large  Text embeddings, Unimodal (text), English, 512...           mit       1.200  1024                                                 {}
28                     intfloat/multilingual-e5-large  Text embeddings, Unimodal (text), Multilingual...           mit       2.240  1024                                                 {}
29                          jinaai/jina-embeddings-v3  Multi-task unimodal (text) embedding model, mu...  cc-by-nc-4.0       2.290  1024  {'retrieval.query': 0, 'retrieval.passage': 1,...

Even in full screen on a 36" ultra-large monitor, the names are still cut off with "...".
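The ellipsis does not depend on screen size: when pandas builds a DataFrame's repr, it cuts any cell longer than the `display.max_colwidth` option (50 characters by default). A minimal sketch, using a made-up model name (hypothetical, not a real fastembed model), shows the default truncation and how `pd.option_context` lifts it temporarily:

```python
import pandas as pd

# Hypothetical names; only the second exceeds the default 50-character limit.
names = [
    "BAAI/bge-small-en-v1.5",
    "example-org/an-intentionally-long-hypothetical-model-name-v1",
]
df = pd.DataFrame({"model": names})

# Default repr: cells longer than display.max_colwidth are cut and end in "...".
truncated = repr(df)

# Lift the limit only inside this block; the option reverts when it exits.
with pd.option_context("display.max_colwidth", None):
    full = repr(df)

print(pd.get_option("display.max_colwidth"))  # the default is restored here
print(truncated)
print(full)
```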

I had to modify the documented script to see the full names:

import pandas as pd
from fastembed import TextEmbedding

supported_models = (
    pd.DataFrame(TextEmbedding.list_supported_models())
    .sort_values("size_in_GB")
    .drop(columns=["sources", "model_file", "additional_files"])
    .reset_index(drop=True)
)
# Show full cell contents instead of cutting them at 50 characters
pd.set_option("display.max_colwidth", None)
print(supported_models)
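Note that `pd.set_option` changes the setting for the rest of the session. A sketch of a scoped alternative renders the table inside `pd.option_context`, so the limit is lifted for that one call only, and keeps just a few columns. The helper name `models_table` and the sample records are hypothetical stand-ins shaped like the columns shown above; in practice you would pass `TextEmbedding.list_supported_models()` as in the scripts above.

```python
import pandas as pd


def models_table(models: list[dict]) -> str:
    """Render model metadata as text without truncating the model names."""
    df = (
        pd.DataFrame(models)
        .sort_values("size_in_GB")
        .loc[:, ["model", "dim", "size_in_GB"]]  # keep only the columns of interest
        .reset_index(drop=True)
    )
    # Lift the 50-character column limit for this call only.
    with pd.option_context("display.max_colwidth", None):
        return df.to_string()


# Hypothetical stand-in records; replace with TextEmbedding.list_supported_models().
sample = [
    {"model": "example-org/an-intentionally-long-hypothetical-model-name-v1",
     "dim": 768, "size_in_GB": 1.0},
    {"model": "BAAI/bge-small-en-v1.5", "dim": 384, "size_in_GB": 0.067},
]
table = models_table(sample)
print(table)
```

Because the option is restored when the `with` block exits, later DataFrames in the same notebook keep the usual compact display.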

Ideally, we should not have to write and run a Python script just to see the names of the available models, but that's the ideal case.

At a minimum, the documented script should show the full names without making us dig through the pandas docs.
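For readers who only need the names, pandas is not strictly required. A minimal sketch, assuming (as the DataFrame columns above suggest) that `TextEmbedding.list_supported_models()` returns a list of dicts with `model`, `dim`, and `size_in_GB` keys; the helper `format_model_list` and the sample records are hypothetical:

```python
def format_model_list(models: list[dict]) -> str:
    """One line per model: full name, embedding dimension, size in GB."""
    rows = sorted(models, key=lambda m: m["size_in_GB"])
    # Pad names to the longest one so the columns line up; nothing is truncated.
    width = max((len(m["model"]) for m in rows), default=0)
    return "\n".join(
        f"{m['model']:<{width}}  {m['dim']:>5}  {m['size_in_GB']:>6.3f} GB"
        for m in rows
    )


# Hypothetical stand-in records shaped like the columns shown above.
sample = [
    {"model": "example-org/an-intentionally-long-hypothetical-model-name-v1",
     "dim": 768, "size_in_GB": 1.0},
    {"model": "BAAI/bge-small-en-v1.5", "dim": 384, "size_in_GB": 0.067},
]
listing = format_model_list(sample)
print(listing)

# With fastembed installed (assumed usage, mirroring the scripts above):
# from fastembed import TextEmbedding
# print(format_model_list(TextEmbedding.list_supported_models()))
```

Plain string formatting sidesteps display options entirely, at the cost of pandas conveniences such as sorting by arbitrary columns interactively.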

Metadata

Assignees: No one assigned
Labels: No labels
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests