Principled Question: Usage only of iri and label in encoding #77

@peio

Dear OntoAligner Team,
I'm puzzled by, and curious about, your reasoning for using only the IRI and label, out of all the available literals, in the encoding and the subsequent alignment.

The structure of the GenericOntology class, which also collects comments and synonyms, led me to expect that these additional literals would be used in the subsequent encoding and matching as well. However, the preprocess method of the BaseEncoder class expects strings, which makes me believe there is a design consideration behind this?

The synonyms and comments seem to go unused by the subsequent encoders, so the point of extracting this very useful information is lost on me.

You have given much more thought to the overall process, so I would like to ask early: why should we not concatenate as many literals as possible into the text?

Something like:

```python
from typing import Any, Dict


def get_owl_items(self, owl: Dict) -> Any:
    """
    Extracts the IRI and a combined text (label, synonyms, comments) of a concept from the given OWL item.

    Parameters:
        owl (Dict): A dictionary representing an OWL item, expected to contain 'iri', 'label', 'synonyms' and 'comment' keys.

    Returns:
        Dict: A dictionary containing the IRI and a text built from the label, synonyms and comments of the concept.
    """
    # Flatten the list-valued literals into single strings.
    synonyms = " ".join(owl["synonyms"])
    comments = " ".join(owl["comment"])
    # Concatenate all literals into one text for the encoder.
    return {"iri": owl["iri"], "text": owl["label"] + " has synonyms " + synonyms + " IS DEFINED AS " + comments}
```
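One caveat with the snippet above is that concepts without synonyms or comments would still get the "has synonyms" and "IS DEFINED AS" markers in their text. A slightly more defensive sketch of the same idea is below. The helper name `build_concept_text` is hypothetical and not part of OntoAligner, and it assumes that `synonyms` and `comment` hold lists of plain strings that may be empty or missing:

```python
from typing import Any, Dict, List


def build_concept_text(owl: Dict[str, Any]) -> Dict[str, str]:
    """Hypothetical helper: join label, synonyms and comments, skipping empty fields."""
    # Assumption: 'synonyms' and 'comment' are lists of plain strings and may be absent.
    synonyms: List[str] = [s for s in (owl.get("synonyms") or []) if s]
    comments: List[str] = [c for c in (owl.get("comment") or []) if c]

    # Always start with the label, then append only the parts that carry content.
    parts = [owl["label"]]
    if synonyms:
        parts.append("has synonyms " + ", ".join(synonyms))
    if comments:
        parts.append("is defined as " + " ".join(comments))
    return {"iri": owl["iri"], "text": " ".join(parts)}


# Example with a made-up concept that has one synonym and no comments.
concept = {
    "iri": "http://example.org/onto#Heart",
    "label": "heart",
    "synonyms": ["cardiac organ"],
    "comment": [],
}
print(build_concept_text(concept)["text"])  # heart has synonyms cardiac organ
```

With this variant, a concept that only has a label produces exactly the same text as the current label-only behaviour, so the extra literals would only change the encoding where they actually exist.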
