Principled Question: Usage only of iri and label in encoding

Dear OntoAligner Team,

I'm curious about your reasoning for using only the IRI and the label, out of all the available literals, in the encoding and the subsequent alignment.

The `GenericOntology` class also collects comments and synonyms, which made me expect these additional literals to be used in the subsequent encoding and matching. However, the `preprocess` method of the `BaseEncoder` class expects plain strings, which makes me think there is a design consideration behind this.

The synonyms and comments seem to be unused by the downstream encoders, so the point of extracting this very useful information is lost on me.

You have given the overall process much more thought than I have, so I would like to ask early: why should we not concatenate as many literals as possible into the text?

Something like:

```python
from typing import Any, Dict


# Proposed replacement for the encoder's get_owl_items method.
def get_owl_items(self, owl: Dict) -> Any:
    """
    Extracts the IRI, label, synonyms and comments of a concept from the given OWL item.

    Parameters:
        owl (Dict): A dictionary representing an OWL item, expected to contain
            'iri', 'label', 'synonyms' and 'comment' keys.

    Returns:
        Dict: A dictionary containing the IRI and a single text string built
            from the label, synonyms and comments of the concept.
    """
    # Both fields are assumed to be lists of strings.
    synonyms = " ".join(owl["synonyms"])
    comments = " ".join(owl["comment"])
    text = f"{owl['label']} has synonyms {synonyms} IS DEFINED AS {comments}"
    return {"iri": owl["iri"], "text": text}
```
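One thing I noticed with the snippet above: when a concept has no synonyms or comments, the text still contains the "has synonyms" and "IS DEFINED AS" connectors with nothing after them. A minimal standalone sketch that skips empty fields could look like the following. The function name `build_concept_text` is hypothetical, and I'm assuming that `synonyms` and `comment` are lists of plain strings, which may not match what the parser actually returns.

```python
from typing import Any, Dict, List


def build_concept_text(owl: Dict[str, Any]) -> Dict[str, str]:
    """Join label, synonyms and comments, skipping fields that are empty or absent."""
    parts: List[str] = [str(owl["label"])]

    # Assumption: both fields are lists of plain strings and use the same
    # key names as the snippet above ('synonyms', 'comment').
    synonyms = [s for s in (owl.get("synonyms") or []) if s]
    comments = [c for c in (owl.get("comment") or []) if c]

    if synonyms:
        parts.append("has synonyms " + ", ".join(synonyms))
    if comments:
        parts.append("is defined as " + " ".join(comments))

    return {"iri": owl["iri"], "text": " ".join(parts)}
```

With this, a concept that only has a label is encoded as the bare label, so the current label-only behaviour is preserved for sparse ontologies.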

Principled Question: Usage only of iri and label in encoding #77
