diff --git a/ImplementationGuidance/Specifications/knowledge_level_agent_type_specification.md b/ImplementationGuidance/Specifications/knowledge_level_agent_type_specification.md index 1d61a71e..08dd6b2d 100644 --- a/ImplementationGuidance/Specifications/knowledge_level_agent_type_specification.md +++ b/ImplementationGuidance/Specifications/knowledge_level_agent_type_specification.md @@ -10,41 +10,42 @@ These properties are complemented by a larger model that is being developed to support a more detailed representation of evidence and provenance metadata, and will be documented elsewhere. -The scope of this initial specification covers only **Agent Type** and **Knowledge Level** metadata - which describes the type of knowledge expressed in an edge based on type of agent that originally generated a statement of knowledge encoded in an Edge, and the level of knowledge expressed in an Edge. It is complemented by a [Supplemental Guidance document](https://docs.google.com/document/d/140dtM5CjWM97JiBRdAmDT-9IKqHoOj-xbE_5TWkdYqg/edit) that provides detailed examples implementation support. +The scope of this initial specification covers only **Agent Type** and **Knowledge Level** metadata - which describes the type of knowledge expressed in an edge based on the type of agent that originally generated a statement of knowledge encoded in an Edge, and the level of knowledge expressed in an Edge. It is complemented by a [Supplemental Guidance document](https://docs.google.com/document/d/140dtM5CjWM97JiBRdAmDT-9IKqHoOj-xbE_5TWkdYqg/edit) that provides detailed examples and implementation support. ## Biolink properties and enumerations -Biolink defines hte following properties and enumerations for classifying agent type and knowledge level, which are used to annotate individual edges in knowledge graphs. +Biolink defines the following properties and enumerations for classifying agent type and knowledge level, which are used to annotate individual edges in knowledge graphs. 
### Agent Type **Biolink Edge Property**: - - `agent_type`: describes the high-level category of agent that originally generated a statement of knowledge or other type of information. Permissible values are provided by the **biolink:AgentTypeEnum** enumeration, as defined below: + - `agent_type`: Describes the high-level category of agent who originally generated a statement of knowledge or other type of information. Permissible values are provided by the **biolink:AgentTypeEnum** enumeration, as defined below: **Permissible Values**: - - `manual_agent`: a human agent is responsible for generating the knowledge expressed in the Edge. The human may utilize computationally generated information as evidence for the resulting knowledge, but the human is the one who ultimately generates this knowledge. - - `automated_agent`: an automated agent, typically a software program or tool, is responsible for generating the knowledge expressed in the Edge. Human contribution to the knowledge creation process ends with the definition and coding of algorithms or analysis pipelines that get executed by the automated agent. - - `data_analysis_pipeline`: an automated agent that executes an analysis workflow over data and reports results in an Edge. These typically report statistical associations/correlations between variables in the input data. - - `computational_model`: an automated agent that generates knowledge (typically predictions) based on rules/logic explicitly encoded in an algorithm (e.g. heuristic models, supervised classifiers), or learned from patterns observed in data (e.g. ML models, unsupervised classifiers). - - `text_mining_agent`: an automated agent that uses Natural Language Processing to recognize concepts and/or relationships in text, and generates Edges relating these concepts with formally encoded semantics. 
- - `image_processing_agent`: an automated agent that processes images to generate Edges reporting knowledge derived from the image and/or expressed in text the image depicts (e.g. via OCR). - - ` manual_validation_of_ automated_agent`: a human agent reviews and validates/approves the veracity of knowledge that is initially generated by an automated agent. - - `not_provided`: the agent type is not provided, typically because it cannot be determined from available information if the agent that generated the knowledge is manual or automated. + - `manual_agent`: A human agent who is responsible for generating a statement of knowledge. The human may utilize computationally generated information as evidence for the resulting knowledge, but the human is the one who ultimately interprets/reasons with this evidence to produce a statement of knowledge. + - `automated_agent`: An automated agent, typically a software program or tool, that is responsible for generating a statement of knowledge. Human contribution to the knowledge creation process ends with the definition and coding of algorithms or analysis pipelines that get executed by the automated agent. + - `data_analysis_pipeline`: An automated agent that executes an analysis workflow over data and reports the direct results of the analysis. These typically report statistical associations/correlations between variables in the input dataset, and do not interpret/infer broader conclusions from associations the analysis reveals in the data. + - `computational_model`: An automated agent that generates knowledge statements (typically predictions) based on rules/logic explicitly encoded in an algorithm (e.g. heuristic models, supervised classifiers), or learned from patterns observed in data (e.g. ML models, unsupervised classifiers). + - `text_mining_agent`: An automated agent that uses Natural Language Processing to recognize concepts and/or relationships in text, and report them using formally encoded semantics (e.g. 
as an edge in a knowledge graph). + - `image_processing_agent`: An automated agent that processes images to generate textual statements of knowledge derived from the image and/or expressed in text the image depicts (e.g. via OCR). + - `manual_validation_of_automated_agent`: A human agent reviews and validates/approves the veracity of knowledge that is initially generated by an automated agent. + - `not_provided`: The agent type is not provided, typically because it cannot be determined from available information if the agent that generated the knowledge is manual or automated. -Note that this property indicates the type of agent who *produced a final statement of knowledge*, which is often different from the agent or agents who produced *information used as evidence* to support the generation of this knowledge. For example, if a human curator concludes that a particular gene variant causes a medical condition - based on their interpretation of information produced by computational modeling tools, automated statistical analyses, and robotic laboratory assay systems - the agent type for this statement is "manual_agent" despite all of the evidence being created by automated agents. But if any of these systems is programmed to generate knowledge statements directly and without human assistance, the statement would be attributed to an "automated_agent". +Note that this property indicates the type of agent who *produced a final statement of knowledge*, which is often different from the agent or agents who produced *information used as evidence* to support generation of this knowledge. For example, if a human curator concludes that a particular gene variant causes a medical condition - based on their interpretation of information produced by computational modeling tools, automated data analysis pipelines, and robotic laboratory assay systems - the agent_type for this statement is "manual_agent" - despite all of the evidence being created by automated agents. 
But if any of these systems is programmed to generate knowledge statements directly and without human assistance, the statement would be attributed to an "automated_agent". ### Knowledge Level **Biolink Edge Property**: -- `knowledge_level`: describes the level of knowledge expressed in a statement, based on the reasoning or analysis methods used to generate the statement, or the scope or specificity of what the statement expresses to be true. Permissible values are defined in the **biolink:KnowledgeLevelEnum** enumeration: +- `knowledge_level`: Describes the level of knowledge expressed in a statement, based on the reasoning or analysis methods used to generate the statement, or the scope or specificity of what the statement expresses to be true. Permissible values are defined in the **biolink:KnowledgeLevelEnum** enumeration: **Permissible Values**: - - `knowledge_assertion`: a statement of purported fact that is put forth by an agent as true, based on assessment of direct evidence. Assertions generally have a high confidence of being true based on the strength of evidence supporting them. - - `logical_entailment`: a statement reporting a conclusion that follows logically from premises, which are typically well-established facts or knowledge assertions. (e.g. fingernail part of finger, finger part of hand → fingernail part of hand)). Logical entailments are based on dedictive inference, and generally have a high degree of confidence when based on sound premises and inference logic. - - `prediction`: a statement of a possible fact based on more probabilistic (non-deductive) forms of reasoning over indirect forms of evidence, that lead to more speculative conclusions. Predictions often have a lower degree of confidence based on the indirect nature of their evidence and reasoning supporting them. 
- - `statistical_association`: a statement that reports concepts representing variables in a dataset to be statistically associated in the context of a particular cohort or dataset (e.g. “Metformin Treatment (variable 1) is correlated with Diabetes Diagnosis (variable 2) in EHR dataset X”). These associations are inherently true in that they simple report the results of some statistical analysis, but do not interpret these data to draw broader conclusions about general types in the domain of discourse. - - `observation`: a statement reporting (and possibly quantifying) a phenomenon that was observed to occur - absent any analysis or interpretation that generates a statistical association or supports a broader conclusion or inference. Observation statements are also inherently true in that they simple report what an agent observed - without any interpretation or inference. - - `not_provided`: the knowledge level/type fora statement is not provided, typically because it cannot be determined from available information. + - `knowledge_assertion`: A statement of purported fact that is put forth by an agent as true, based on assessment of direct evidence. Assertions are likely but not definitively true. + - `logical_entailment`: A statement reporting a conclusion that follows logically from premises representing established facts or knowledge assertions (e.g. fingernail part of finger, finger part of hand --> fingernail part of hand). + - `prediction`: A statement of a possible fact based on probabilistic forms of reasoning over more indirect forms of evidence, that lead to more speculative conclusions. + - `statistical_association`: A statement that reports concepts representing variables in a dataset to be statistically associated with each other in a particular cohort (e.g. 'Metformin Treatment (variable 1) is correlated with Diabetes Diagnosis (variable 2) in EHR dataset X'). 
+ - `text_co_occurrence`: A statement reporting that mentions of two concepts in some corpus of text (e.g. the biomedical literature) occur together at a statistically significant frequency - suggesting that a real-world biological or clinical relationship may exist between the concepts. + - `observation`: A statement reporting (and possibly quantifying) a phenomenon that was observed to occur - absent any analysis or interpretation that generates a statistical association or supports a broader conclusion or inference. + - `not_provided`: The knowledge level is not provided, typically because it cannot be determined from available information. -NOTE that the notion of a 'level' of knowledge can in one sense relate to the strength of a statement - i.e. how confident we are that it says something true about our domain of discourse. Here, we can generally consider Knowledge Assertions to be stronger than Entailments to be stronger than Predictions. But in another sense, 'level' of knowledge can refer to the scope or specificity of what a statement expresses - on a spectrum from context-specific results of a data analysis, to generalized assertions of knowledge or fact. Here, Statistical Associations and Observations represent more foundational statements that are only slightly removed from the data on which they are based (the former reporting the direct results of an analysis in terms of correlations between variables in the data, and the latter describing phenomena that were observed/reported to have occurred). +NOTE that the notion of a 'level' of knowledge can in one sense relate to the strength of a statement - i.e. how confident we are that it says something true about our domain of discourse. Here, we can generally consider Assertions to be stronger than Entailments, which are in turn stronger than Predictions. 
But in another sense, 'level' of knowledge can refer to the scope or specificity of what a statement expresses - on a spectrum from context-specific results of a data analysis, to generalized assertions of knowledge or fact. Here, Statistical Associations and Observations represent more foundational statements that are only slightly removed from the data on which they are based (the former reporting the direct results of an analysis in terms of correlations between variables in the data, and the latter describing phenomena that were observed/reported to have occurred). ## Implementation Guidance @@ -52,14 +53,16 @@ NOTE that the notion of a 'level' of knowledge can in one sense relate to the st 2. Knowledge Providers MUST report one and only one `knowledge level` for each Edge returned in a TRAPI message, using a `knowledge_level` property. The value MUST come from the [biolink:KnowledgeLevelEnum](https://biolink.github.io/biolink-model/KnowledgeLevelEnum/). +```json "edge_id": { "subject": "PUBCHEM.COMPOUND:6623", "predicate": "biolink:affects", "object": "HGNC:3467", - "sources": { . . . } + "sources": { . . . }, "knowledge_level": "knowledge_assertion", "agent_type": "manual_agent" } +``` 3. The main challenge in applying this standard concerns selecting appropriate agent type and knowledge level terms for a given Edge. To assist KPs in this task, a [Supplemental Guidance document](https://docs.google.com/document/d/140dtM5CjWM97JiBRdAmDT-9IKqHoOj-xbE_5TWkdYqg/edit) provides additional implementation support beyond the base specification above. This includes clarification of key distinctions, tips for proper term selection, and a corpus of examples illustrating how agent type and knowledge level terms are applied to the diverse kinds of Edges provided in Translator knowledge graphs. 
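
The two reporting requirements above (exactly one `knowledge_level` and one `agent_type` per Edge, each drawn from its Biolink enumeration) lend themselves to a simple pre-flight check. The following is a minimal sketch, not part of the specification: the function name `check_edge_annotations` and the plain-dict edge representation are assumptions made for illustration, and the permissible values are copied from the enumerations listed in this document rather than loaded from the Biolink Model.

```python
# Permissible values as listed in this specification (hard-coded here for
# illustration; a real validator would likely load them from the Biolink Model).
AGENT_TYPES = frozenset({
    "manual_agent",
    "automated_agent",
    "data_analysis_pipeline",
    "computational_model",
    "text_mining_agent",
    "image_processing_agent",
    "manual_validation_of_automated_agent",
    "not_provided",
})

KNOWLEDGE_LEVELS = frozenset({
    "knowledge_assertion",
    "logical_entailment",
    "prediction",
    "statistical_association",
    "text_co_occurrence",
    "observation",
    "not_provided",
})


def check_edge_annotations(edge):
    """Hypothetical helper: return a list of problems; an empty list means the edge passes."""
    problems = []
    for prop, allowed in (("knowledge_level", KNOWLEDGE_LEVELS),
                          ("agent_type", AGENT_TYPES)):
        value = edge.get(prop)
        if value is None:
            problems.append("missing " + prop)
        elif not isinstance(value, str):
            # "One and only one" value: a list of values is rejected.
            problems.append(prop + " must be a single string value")
        elif value not in allowed:
            problems.append(prop + " has non-permissible value " + repr(value))
    return problems


# Edge modelled on the example above; the elided "sources" content is left out.
example_edge = {
    "subject": "PUBCHEM.COMPOUND:6623",
    "predicate": "biolink:affects",
    "object": "HGNC:3467",
    "knowledge_level": "knowledge_assertion",
    "agent_type": "manual_agent",
}
print(check_edge_annotations(example_edge))
```

A check like this only confirms that the values are well-formed; choosing the appropriate term for a given Edge still requires the judgment described in the Supplemental Guidance document.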
diff --git a/ImplementationGuidance/Specifications/pathfinder_query_specification.md b/ImplementationGuidance/Specifications/pathfinder_query_specification.md index 9b00bcf9..08de9b04 100644 --- a/ImplementationGuidance/Specifications/pathfinder_query_specification.md +++ b/ImplementationGuidance/Specifications/pathfinder_query_specification.md @@ -148,7 +148,7 @@ Paths are represented within AuxiliaryGraphs. Each Path references edges from th AuxiliaryGraph's `edges` field. These Edges are unordered; their sequence does not convey Path structure. Instead, the Path must be reconstructed from the graph itself. -Paths are expected to be linear—there should be no way to skip or bypass any node in the Path. However, parallel edges +Paths are expected to be linear — there should be no way to skip or bypass any node in the Path. However, parallel edges between two nodes are allowed. Using the Knowledge Graph above, we can construct the AuxiliaryGraphs shown in the example below: @@ -185,7 +185,7 @@ Path `a2` represents a direct lookup edge between Crohn’s and Parkinson’s. I input nodes, it should be included as a valid path. The Paths shown correspond to the unconstrained version of the initial query. Applying `intermediate_categories` -constraints would yield a slightly different set of paths. As show below in the constrained version of the +constraints would yield a slightly different set of paths. As shown below in the constrained version of the AuxiliaryGraphs, `a2` is removed because it does not contain any `Gene` nodes. Therefore, it is not a valid Path for the constrained version of the query. @@ -212,7 +212,7 @@ for the constrained version of the query. ## Results Each individual Result is structured similarly, with NodeBindings and Analyses. Both input -nodes must be pinned, and therefor no unpinned nodes will exist in the QueryGraph. This means there is only one result +nodes must be pinned, and therefore no unpinned nodes will exist in the QueryGraph. 
This means there is only one result for each query, contained within the Results field of the Message. This Result can have many Analyses, each one corresponding to a different Path. This follows the same Result-merging rules used in other query types, where results that contain the same nodes but different analyses are combined into a single result, with their analyses concatenated. diff --git a/ImplementationGuidance/Specifications/qualifier_rules_and_examples.md b/ImplementationGuidance/Specifications/qualifier_rules_and_examples.md index 167e9e4b..2d4b3058 100644 --- a/ImplementationGuidance/Specifications/qualifier_rules_and_examples.md +++ b/ImplementationGuidance/Specifications/qualifier_rules_and_examples.md @@ -2,28 +2,18 @@ These rules can not be enforced in the schema for TRAPI, but should be implemented in a validation layer. -1. __general rules__ - 1. There MUST be only one of each type of qualifier in any edges.qualifier_constraints.qualifier_set - 1. There MUST be only one qualified_predicate for each set of qualifiers in a QualifierConstraint. - 2. qualified_predicate is an optional qualifier. (see [localization_or_transport.json](../DataExamples/localization_or_transport.json)) - 1. Both the qualified_predicate and the predicate edge properties SHOULD be queried when a predicate is provided. - see [causes_predicate_vs_qualifier.json](../DataExamples/causes_predicate_vs_qualifier.json) - 2. If a KP receives non-empty QEdge.qualifier_constraints, it MUST only return edges that satisfy the entire set of - qualifier_constraints. If a KP does not yet support QEdge.qualifier_constraints, it MUST return an empty response - because no matches are found. - 1. If a knowledge statement contains more qualifiers or differently typed qualifiers than those specified in - edges.qualifier_constraints.qualifier_set in addition to the entire set of qualifier_constraints, the knowledge - statement MAY also be returned. - 3. 
Qualifier constraints should be treated as "or" constraints. -2. __qualifier_value__ - 1. is constrained by either: an enumeration in biolink, or an ontology term. - 1. When an ontology term is used, the assumption is that annotations that use this term or any of its children +1. __general rules for constraints__ + 1. There MUST be only one instance of each qualifier type in a single `QEdge.constraints.qualifiers` object (qualifier-set) + 1. `qualified_predicate` is an optional qualifier (see [localization_or_transport.json](../DataExamples/localization_or_transport.json)). Both the `qualified_predicate` and the `predicates` constraint SHOULD be met when both are present on the same QEdge (see [causes_predicate_vs_qualifier.json](../DataExamples/causes_predicate_vs_qualifier.json)). + 2. If a KP receives a query with `constraints.qualifiers` on a QEdge, the edges it returns for that QEdge MUST satisfy at least one of the `constraints.qualifiers` objects (qualifier-sets) - meaning all constraints within that object. + 1. If an Edge satisfies a `constraints.qualifiers` object and contains other qualifiers not specified in that object, it MAY also be returned. +2. __for the constraint qualifier values__ + 1. They may be from a Biolink enum for the qualifier type or from an ontology. + 1. When an ontology term is used, the assumption is that edges that use this term or any of its descendants should be returned. - 2. When an enumerated value is used, the assumption is that annotations that use this enumerated value or any - of its children should be returned. - 1. For example, if a query asks for "biolink:object_aspect_qualifier" = "abundance", - then, aspects matching any child of "abundance" should also be returned (if the other qualifiers used in this - query are also satisfied). + 2. When a Biolink enum value is used, the assumption is that edges that use this value or any + of its descendants should be returned. + 1. 
For example, if a query asks for "biolink:object_aspect_qualifier" = "abundance", then edges with object_aspect_qualifier matching any descendant of "abundance" should also be returned. ## Biolink Qualifiers Examples @@ -31,7 +21,7 @@ These rules can not be enforced in the schema for TRAPI, but should be implement ### Object qualifiers _“Bisphenol A results in decreased degradation of ESR1 protein”_ -``` +```yaml subject: CHEBI:33216 # Bisphenol A predicate: biolink:affects qualified_predicate: biolink:causes @@ -39,6 +29,7 @@ object: NCBIGene:2099 # ESR1 object_aspect_qualifier: degradation object_direction_qualifier: decreased ``` + * [object_qualifiers.json](../DataExamples/object_qualifiers.json) Note: the predicate chosen should reflect the relationship between the subject and the object, and is not required @@ -47,7 +38,7 @@ not causal. _"Bisphenol A is associated with decreased degradation of ESR1 protein"_ -``` +```yaml subject: CHEBI:33216 # Bisphenol A predicate: biolink:associated_with object: NCBIGene:2099 # ESR1 @@ -59,8 +50,8 @@ object_direction_qualifier: decreased ### Subject and object qualifiers _“Methionine deficiency results in increased expression of ADRB2”_ -``` -subject: "CHEBI:16811", # methionine +```yaml +subject: "CHEBI:16811" # methionine subject_aspect_qualifier: abundance subject_direction_qualifier: decreased predicate: biolink:affects @@ -74,7 +65,7 @@ object_direction_qualifier: increased _"Fenofibrate is an agonist of PPARA protein"_ -``` +```yaml subject: "CHEBI:5001" # Fenofibrate predicate: biolink:affects qualified_predicate: biolink:causes @@ -86,10 +77,9 @@ causal_mechanism_qualifier: agonism ### Complex statement -_"The protein ser/thr kinase activator activity of Ras85D in the plasma membrane directly positively regulates MAPKKK -activity of Raf in the cytoplasm within the EGFR signaling pathway"_ +_"The protein ser/thr kinase activator activity of Ras85D in the plasma membrane directly positively regulates MAPKKK activity 
of Raf in the cytoplasm within the EGFR signaling pathway"_ -``` +```yaml subject: FB:FBgn0003205 # Dmel Ras85D subject_aspect_qualifier: GO:0043539 # protein ser/thr kinase activator activity subject_context_qualifier: GO:0005886 # plasma membrane @@ -101,6 +91,7 @@ object_context_qualifier: GO:0005737 #cytoplasm object_direction_qualifier: increased pathway_context_qualifier: GO:0038134 # ERBB2-EGFR signaling pathway ``` + Please note, pathway_context_qualifier is still under discussion in the Biolink Model. If you are trying to represent GO-CAMs, please contact the Biolink Model team for more information. diff --git a/ImplementationGuidance/Specifications/query_specification.md b/ImplementationGuidance/Specifications/query_specification.md index 72491991..b3506218 100644 --- a/ImplementationGuidance/Specifications/query_specification.md +++ b/ImplementationGuidance/Specifications/query_specification.md @@ -8,52 +8,48 @@ The terms MUST, SHOULD, MAY are used as defined in RFC 2119 https://tools.ietf. ## /asyncquery - Knowledge Providers (KPs) MAY implement /asyncquery - Autonomous Reasoning Agents (ARAs) SHOULD implement /asyncquery -- The /asyncquery endpoint SHOULD be left in an OpenAPI definition for a TRAPI endpoint even if - if it is not implemented, since it is part of the TRAPI core schema - Each TRAPI server MUST indicate with true or false if the /asyncquery endpoint is implemented by the server via the x-trapi asyncquery property as found in the TRAPI core schema template. ## QNode.ids -- MAY be null, or MAY be missing. The meaning is the same. +- MAY be absent. - MUST NOT be an empty array (#199) -- If more than one element is present, the elements MUST be treated accoring to Qnode.set_interpretation. +- If more than one element is present, the elements MUST be treated according to QNode.set_interpretation. 
- The list SHOULD NOT be used by the client to provide equivalent CURIEs to the server - If the server considers a subset of items in the list as equivalent CURIEs, the server SHOULD merge the subset into a single KnowledgeGraph Node ## QNode.set_interpretation -- MAY be null, or MAY be missing. If null or missing, the default is "BATCH". +- MAY be absent, in which case the default is "BATCH". - MUST be one of the following values: "BATCH", "ALL", "MANY", or "COLLATE" - If set_interpretation is "BATCH", each CURIE in the ids list is treated independently. Results MUST include answers for each queried CURIE separately. - If set_interpretation is "ALL", all specified CURIEs MUST appear in each Result. Multiple CURIEs are combined into a set, and the ids field MUST hold a single UUID representing this set, with individual members in member_ids. - If set_interpretation is "MANY", member CURIEs MUST form one or more sets in the Results. Multiple CURIEs are combined into a set, and the ids field MUST hold a single UUID representing this set, with individual members in member_ids. Sets with more members are generally considered more desirable than sets with fewer members. -- If set_interpretation is "COLLATE", it MAY only be used when no ids are provided (QNode.ids is null or missing). Multiple matching nodes MUST be collated into a single Result rather than separated into separate Results. +- If set_interpretation is "COLLATE", it MAY only be used when no ids are provided (QNode.ids is absent). Multiple matching nodes MUST be collated into a single Result rather than put into separate Results. ## QNode.categories -- MAY be null, or MAY be missing. 
The meaning is the same: matching Nodes may be any category -- If QNode.categories is [ 'biolink:NamedThing' ], it means matching Nodes may be any category - (any descendent biolink category NamedThing) +- MAY be absent, meaning the matching Nodes may be any category +- If QNode.categories is `[ 'biolink:NamedThing' ]`, it means matching Nodes may be any category + (any descendant of Biolink base category NamedThing) - MUST NOT be an empty array (#199) -- If more than one element is present, the elements MUST be treated in the sense of an "or" list. +- If more than one element is present, the elements MUST be treated in the sense of an "OR" list. Matching Nodes may be any of the listed QNode.categories -- Biolink category descendents do not need to be specified separately. Queries MUST automatically - match descendents. (e.g. QNode.categories is [ 'biolink:BiologicalEntity' ], then the KP MUST return - Nodes with category biolink:Protein and biolink:Disease if present) -- IF a QNode has non-null QNode.ids (CURIEs), the client SHOULD NOT provide QNode.categories, and - the server SHOULD NOT require that categories are provided to function, and the server MAY provide - different answers for different provided categories. +- Biolink category descendants do not need to be specified separately. Queries MUST automatically + match descendants. (e.g. QNode.categories is `[ 'biolink:BiologicalEntity' ]`, then the KP MUST return + Nodes with descendant categories like biolink:Protein or biolink:Disease) +- IF a QNode has QNode.ids (CURIEs), the client SHOULD NOT provide QNode.categories. ## QEdge.predicates -- MAY be null, or MAY be missing. The meaning is the same. +- MAY be absent. - MUST NOT be an empty array (#199) -- If more than one element is present, the elements MUST be treated in the sense of an "or" list. +- If more than one element is present, the elements MUST be treated in the sense of an "OR" list. Matching Edges may be any of the listed QEdge.predicates. 
This effectively creates a simple batch query mechanism where the response may contain multiple edges, where each one matches at least one of the specified QEdge.predicates. -- Biolink predicate descendents do not need to be specified separately. Queries MUST automatically - match descendents. (e.g. QEdge.predicates is [ 'biolink:regulates' ], then the KP MUST return - Edges with biolink:positively_regulates and biolink:negatively_regulates if present) +- Biolink predicate descendants do not need to be specified separately. Queries MUST automatically + match descendants. (e.g. QEdge.predicates is `[ 'biolink:has_participant' ]`, then the KP MUST return + Edges with descendant predicates like biolink:has_substrate and biolink:has_input) ## QNode.xxxxx - If a server receives a property on a QNode that it does not recognize, it SHOULD generate @@ -65,26 +61,26 @@ The terms MUST, SHOULD, MAY are used as defined in RFC 2119 https://tools.ietf. ## QNode.constraints - If a KP server receives any QNode.constraints, if it does not support all of them, - it MUST immediately respond with an error Code "UnsupportedConstraint" and list + it MUST immediately respond with an error code "UnsupportedConstraint" and list all the specified constraint names that it does not support. - If an ARA server receives any QNode.constraints, it MUST perform one of the following: - Relay all constraints to its KP(s) to satisfy - Withhold one or more constraints from its KP queries and satisfy those constraints itself -- An ARA server MUST ensure that all constraints are satisifed by either trusting its KPs to satisfy them - or by performing the constraining itself. If the ARA cannot ensure this, - it MUST immediately respond with an error Code "UnsupportedConstraint" and list all constraint +- An ARA server MUST ensure that all constraints are satisfied by either trusting its KPs to satisfy them + or by implementing the constraints itself. 
If the ARA cannot ensure this, + it MUST immediately respond with an error code "UnsupportedConstraint" and list all constraint names that it does not support. ## QEdge.constraints - If a KP server receives any QEdge.constraints, it MUST only return edges that are compatible with the constraints. If a KP server receives a query that contains QEdge - constraints that it does not support yet, it MUST immediately respond with an error Code "UnsupportedConstraint" and list all the specified constraints that it does not support. + constraints that it does not support yet, it MUST immediately respond with an error code "UnsupportedConstraint" and list all the specified constraints that it does not support. - If a KP server receives any QEdge.constraints.qualifiers, it MUST NOT return any edges that don't have qualifiers. - If an ARA server receives any QEdge.constraints, it MUST relay all QEdge.constraints to its KP(s) to satisfy. - See [QEdge constraints specification for details](qedge_constraints_specification.md). +See [QEdge constraints specification for details](qedge_constraints_specification.md). ## info.x-trapi.batch_size_limit - This batch size limit refers to the maximum length of any single QNode.ids list. The limit @@ -95,10 +91,10 @@ The terms MUST, SHOULD, MAY are used as defined in RFC 2119 https://tools.ietf. ## Specifying permitted and excluded sources to an ARA - The proper syntax for specifying or excluding specific knowledge sources (infores CURIEs) to an ARA MUST be done - via a `sources` constraint on a QEdge within the `constraints` object. The following is a complete Query example that disallows the + via `constraints.sources` on QEdges. The following is a complete Query example that disallows the use of SemMedDB: -``` +```json { "message": { "query_graph": { @@ -107,7 +103,7 @@ The terms MUST, SHOULD, MAY are used as defined in RFC 2119 https://tools.ietf. 
"object": "n0", "subject": "n1", "predicates": [ - "biolink:entity_negatively_regulates_entity" + "biolink:affects" ], "constraints": { "sources": { @@ -139,8 +135,8 @@ The terms MUST, SHOULD, MAY are used as defined in RFC 2119 https://tools.ietf. } ``` -An allowlist with `behavior` set to "ALLOW" requires at least one of the specified infores CURIEs to be present in the sources of bound edges: -``` +An allowlist (`behavior` set to "ALLOW") requires at least one of the specified infores CURIEs to be present in the `sources` of bound edges: +```json "constraints": { "sources": { "behavior": "ALLOW", @@ -152,8 +148,8 @@ An allowlist with `behavior` set to "ALLOW" requires at least one of the specifi }, ``` -A denylist with `behavior` set to "DENY" excludes all of the specified infores CURIEs from the sources of bound edges: -``` +A denylist (`behavior` set to "DENY") excludes all of the specified infores CURIEs from the `sources` of bound edges: +```json "constraints": { "sources": { "behavior": "DENY", diff --git a/ImplementationGuidance/Specifications/retrieval_provenance_specification.md b/ImplementationGuidance/Specifications/retrieval_provenance_specification.md index 32e78e93..de3c64ac 100644 --- a/ImplementationGuidance/Specifications/retrieval_provenance_specification.md +++ b/ImplementationGuidance/Specifications/retrieval_provenance_specification.md @@ -1,17 +1,18 @@ -# A TRAPI Attribute Specification for Source Retrieval Provenance +# A TRAPI Specification for Source Retrieval Provenance ## Overview -"Source retrieval provenance" describes the set of Information Resources through which the knowledge expressed in an Edge was passed, through various retrieval and/or transform operations, on its way to its current serialized form. For example, the provenance of a Gene-Chemical Edge in a message sent to a Translator ARA (e.g. ARAGORN) might be traced through the Translator KP that provided it (e.g. MolePro), one or more intermediate aggregator resources (e.g. 
ChEMBL), and back to the resource that originally created/curated it (e.g. ClinicalTrials.org). +"Source retrieval provenance" describes the set of Information Resources through which the knowledge expressed in an Edge was passed, through various retrieval and/or transform operations, on its way to its current serialized form. For example, the provenance of a Gene-Chemical Edge in a message sent by a Translator ARA (e.g. ARAGORN) might be traced through the Translator KP that provided it (e.g. MolePro), one or more intermediate aggregator resources (e.g. ChEMBL), and back to the resource that originally created/curated it (e.g. ClinicalTrials.org). ```` ARAGORN --retrieved_from--> MolePro --retrieved_from--> ChEMBL --retrieved_from--> ClinicalTrials.gov ```` -Note that source retrieval provenance concerns the **mechanical retrieval and transformation of data between web accessible information systems**. It does not trace the source of knowledge back to specific publications or data sets. And it is not concerned with the reasoning, inference or analysis activities that generate knowledge in the first place. These types of provenance are handled by a different set of properties in the EPC model (e.g. see the ‘Supporting Publications Specification’ [here]([url](https://github.com/NCATSTranslator/ReasonerAPI/blob/master/ImplementationGuidance/Specifications/supporting_publications_specification.md))). + +Note that source retrieval provenance concerns the **mechanical retrieval and transformation of data between web-accessible information systems**. It does not trace the source of knowledge back to specific publications or data sets. And it is not concerned with the reasoning, inference or analysis activities that generate knowledge in the first place. These types of provenance are handled by a different set of properties in the EPC model (e.g. see the ‘Supporting Publications Specification’ [here](supporting_publications_specification.md)). 
## The Model -While the TRAPI schema uses the generic Attribute class for representing nearly all metadata about Edges in knowledge graphs, metadata about **source retrieval provenance** is an exception - given the need to efficently find and parse this information for purposes of edge merging and debugging. A complete specification will be provided here soon. This early draft provies a brief overview of the model itself, guidance and conventions for implementing the model, and a few data examples to follow. +While the TRAPI schema uses the generic Attribute class for representing nearly all metadata about Edges in knowledge graphs, metadata about **source retrieval provenance** is an exception - given the need to efficiently find and parse this information for purposes of edge merging and debugging. A complete specification will be provided here soon. This early draft provides a brief overview of the model itself, guidance and conventions for implementing the model, and a few data examples to follow. -The diagram below shows the classes and properties defined in the [TRAPI schema]([url](https://github.com/NCATSTranslator/ReasonerAPI/blob/master/TranslatorReasonerAPI.yaml#L1107)) to support representation of source retrieval provenance metadata. +The diagram below shows the classes and properties defined in the [TRAPI schema](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0/docs/reference.md#edge-) to support representation of source retrieval provenance metadata. ![image](https://github.com/NCATSTranslator/ReasonerAPI/assets/5184212/840b8061-2fe4-4e15-968f-97cd87de22ab) @@ -21,19 +22,19 @@ The `Edge.sources` property contains one or more `RetrievalSource` objects - whi | ------------- | ------------- | | resource_id | (required) The CURIE for an InformationResource that served as a source of knowledge expressed in an Edge, or a source of data used to generate this knowledge.
| | resource_role | (required) The role played by the InformationResource in serving as a source for an Edge (primary_knowledge_source, aggregator_knowledge_source, supporting_data_source). | -| upstream_resource_ids| (optional) An upstream InformationResource from which the resource being described directly retrieved a record of the knowledge expressed in the Edge, or data used to generate this knowledge. | -| source_record_urls | (optional) A URL linking to a specific web page or document provided by the source that contains a record of the knowledge expressed in the Edge. | +| upstream_resource_ids | (optional) An upstream InformationResource from which the resource being described directly retrieved a record of the knowledge expressed in the Edge, or data used to generate this knowledge. | +| source_record_urls | (optional) URL(s) linking to specific web pages or documents provided by the source that contains a record of the knowledge expressed in the Edge. | ## Implementation Guidance -A quick guide for implementers. Using the model describd above: +A quick guide for implementers. Using the model described above: -1. All Edges MUST report **one and only one** Retrieval Source serving as the `primary knowledge source`. +1. All Edges MUST report **one and only one** Retrieval Source serving as the `primary_knowledge_source`. -2. All Edges MUST provide a list of any Retrieval Sources that served as `aggregator knowledge sources` by retrieving the knowledge expressed in the Edge from the priamry source of another aggregator. +2. All Edges MUST provide a list of any Retrieval Sources that served as `aggregator_knowledge_source`s by retrieving the knowledge expressed in the Edge from the primary source or another aggregator. -3. All Edges representing knowledge generated through analysis of data by a Translator Knoledge Provider (KP) SHOULD report any Retrieval Sources providing the data that they operated on as a `supporting data source`. -4. 
Values of the `RetrievalSource.resource_id` MUST be an CURIE from the InfoRes Catalog [here]([url](https://github.com/biolink/biolink-model/blob/master/infores_catalog.yaml)) (e.g. “infores:dgidb”, “infores:molepro”) +3. All Edges representing knowledge generated through analysis of data by a Translator Knowledge Provider (KP) SHOULD report any Retrieval Sources providing the data that they operated on as a `supporting_data_source`. +4. Values of the `RetrievalSource.resource_id` MUST be an CURIE from the InfoRes Catalog [here](https://github.com/biolink/information-resource-registry) (e.g. “infores:dgidb”, “infores:molepro”) ## Data Examples @@ -41,95 +42,85 @@ A quick guide for implementers. Using the model describd above: Below we provide JSON data examples illustrating two retrieval scenarios. **Scenario 1**: Knowledge retrieval from a single external knowledge source -A single Edge originates in primary source KS1, and is retrieved through multiple aggregators ending with the UI. Along the way, ARA1 merges the two edges retrieved from KP1 and KP1.  +A single Edge originates in primary source KS1, and is retrieved through multiple aggregators ending with the UI. Along the way, ARA1 merges the two edges retrieved from KP1 and KP2.  ![image](https://github.com/NCATSTranslator/ReasonerAPI/assets/5184212/39f08657-f4a5-4410-b2c4-244a9558ef4b) + *KS = an external Knowledge Source. KP = a Translator Knowledge Provider.  ARA = a Translator Automated Reasoning Agent, UI = the Translator User Interface. 
Each arrow in the diagram below (R1-R5) represents the distinct retrieval of one edge.* - ```` + ````json { - "edges": { "subject": "RXCUI:1544384", "predicate": "biolink:treats", "object": "MONDO:0008383", "sources": [ - "type": biolink:RetrievalSource, + { "resource_id": "infores:KS_1", - "resource_role": "primary knowledge source", + "resource_role": "primary_knowledge_source", }, { # R1 - "type": biolink:RetrievalSource "resource_id": "infores:KP_1", - "resource_role": "aggregator knowledge source", + "resource_role": "aggregator_knowledge_source", "usptream_resource_ids": ["infores:KS_1"] }, { # R2 - "type": biolink:RetrievalSource, "resource_id": "infores:KP_2", - "resource_role": "aggregator knowledge source", + "resource_role": "aggregator_knowledge_source", "usptream_resource_ids": ["infores:KS_1"] }, { # R3, R4 - "type": biolink:RetrievalSource, "resource_id": "infores:ARA1", - "resource_role": "aggregator knowledge source", + "resource_role": "aggregator_knowledge_source", "usptream_resource_ids": ["infores:KP_1", "infores:KP_2"] }, { # R5 - "type": biolink:RetrievalSource, "resource_id": "infores:UI", - "resource_role": "aggregator knowledge source", + "resource_role": "aggregator_knowledge_source", "usptream_resource_ids": ["infores:ARA_1"] }, ] } ```` -**Scenario 2:** Retrieveal of knowledge generated by a KP from data -In this scenario, the knoweldge expressed in the Edge being retrieved was originally generated by a KP based on on analysis of data it retrieved from upstream data sources. This is often the case for KPs like ICEES, COHD, and Multiomics KP that generate Edges reporting statistical corelations between variables in clinical, environmntal, or multiomics datasets. +**Scenario 2:** Retrieval of knowledge generated by a KP from data +In this scenario, the knowledge expressed in the Edge being retrieved was originally generated by a KP based on on analysis of data it retrieved from upstream data sources. 
This is often the case for KPs like ICEES, COHD, and Multiomics KP that generate Edges reporting statistical correlations between variables in clinical, environmental, or multiomics datasets. -In the scenario diagrammed below, data from two soruces (DB1, DB2) is retrieved by KP1, where the data is analyzed to generate an Edge. This makes KP1 the "primary source" of the knowledge, and DB1 and DB2 "supporting data sources". ARA1 then retrieves this edges from KP1 and then passes it along to the UI. +In the scenario diagrammed below, data from two sources (DB1, DB2) is retrieved by KP1, where the data is analyzed to generate an Edge. This makes KP1 the "primary source" of the knowledge, and DB1 and DB2 "supporting data sources". ARA1 then retrieves this edge from KP1 and then passes it along to the UI. ![image](https://github.com/NCATSTranslator/ReasonerAPI/assets/5184212/40cce738-1235-4ab3-8628-fca92e348761) + *DB = an external data source. KP = a Translator Knowledge Provider.  ARA = a Translator Automated Reasoning Agent, UI = the Translator User Interface.
-Each arrow (R1-R5) represents a distinct retrieval event (grey arrows/text indicates the retrieval of *data* rather than knowledge).* +Each arrow (R1-R5) represents a distinct retrieval event (grey arrows/text indicates the retrieval of **data** rather than knowledge).* -```` - { - "edges": { - "id": "e21aa4542" +````json + { "subject": "RXCUI:1544384", "predicate": "biolink:correlated_with", "object": "MONDO:0008383", "sources": [ { - "type": biolink:RetrievalSource, "resource_id": "infores:DB_1", - "resource_role": "supporting data source", + "resource_role": "supporting_data_source", }, { - "type": biolink:RetrievalSource, "resource_id": "infores:DB_2", - "resource_role": "supporting data source", + "resource_role": "supporting_data_source", }, - { - "type": biolink:Source, # R1, R2 + { # R1, R2 "resource_id": "infores:KP_1", - "resource_role": "primary knowledge source", - "upstreams_resource_ids": ["infores:DB_1", "infores:DB_2"] + "resource_role": "primary_knowledge_source", + "upstream_resource_ids": ["infores:DB_1", "infores:DB_2"] }, - { # R3 - "type": biolink:RetrievalSource, + { # R3 "resource_id": "infores:ARA_1", - "resource_role": "aggregator data source", - "upstreams_resource_ids": ["infores:KP_1"] + "resource_role": "aggregator_knowledge_source", + "upstream_resource_ids": ["infores:KP_1"] }, - { # R4 - "type": biolink:RetrievalSource, + { # R4 "resource_id": "infores:UI", - "resource_role": "aggregator data source", - "upstreams_resource_ids": ["infores:ARA_1"] + "resource_role": "aggregator_knowledge_source", + "upstream_resource_ids": ["infores:ARA_1"] }, ] } diff --git a/ImplementationGuidance/Specifications/supporting_publications_specification.md b/ImplementationGuidance/Specifications/supporting_publications_specification.md index 6febbfb9..a6ff3b01 100644 --- a/ImplementationGuidance/Specifications/supporting_publications_specification.md +++ b/ImplementationGuidance/Specifications/supporting_publications_specification.md @@ -7,25 +7,26 @@ 
publications (broadly defined here to include any document made available for pu a declared Edge. The Biolink Model describes the `biolink:publications` attribute as follows: + ```yaml -publications: - aliases: ["supporting publications", "supporting documents"] - is_a: association slot - description: >- - A list of one or more publications that report the statement expressed in an Association, - or provide information used as evidence supporting this statement. - The notion of a ‘Publication’ is considered broadly to include any document made - available for public consumption. It covers scientific journal issues, individual articles, and - books - as well as things like pre-prints, white papers, patents, drug - labels, web pages, protocol documents, and even a part of a publication if - of significant knowledge scope (e.g. a figure, figure legend, or section - highlighted by NLP). - range: publication + publications: + aliases: ['supporting publications', 'supporting documents'] + description: >- + One or more publications that report the statement expressed in an + Association, or provide information used as evidence supporting this statement. + comments: >- + The notion of a ‘Publication’ is considered broadly to include any + document made available for public consumption. It covers journal issues, + individual articles, and books - and also things like article pre-prints, + white papers, patents, drug labels, web pages, protocol documents, etc. + is_a: association slot + multivalued: true + range: publication ``` ## Implementation Guidance -1. When a knowledge source reports one or more publication supporting an Edge, KPs MUST use the `biolink:publications` edge property as the `Attribute.attribute_type_id` field, and capture publications as a list in the `Attribute.value` field. e.g.: +1. 
When a knowledge source reports one or more publications supporting an Edge, KPs MUST use the `biolink:publications` edge property as the `Attribute.attribute_type_id` field, and capture publications as a list in the `Attribute.value` field. e.g.: "attribute_type_id": "biolink:publications", "value": ["PMID:31737390", "PMID:29076384"] @@ -33,7 +34,7 @@ publications: 2. Knowledge sources typically designate supporting publications using a **[CURIE](https://www.w3.org/TR/2010/NOTE-curie-20101216/)** or full **[URI/URL](https://www.w3.org/Addressing/)**, but may in some cases provide only a **free-text string** title or description. Specific syntax and reporting requirements apply to each designator type: - a. When a knowledge source provides a **CURIE** for a publication, the ingesting KP MUST ensure that its prefix matches the **spelling and casing** defined in the Biolink Model [prefix map](https://github.com/biolink/biolink-model/blob/master/prefix-map/biolink-model-prefix-map.json) - and make adjustments as necessary. (e.g "PMID" not "pmid", "doi" not "DOI"). + a. When a knowledge source provides a **CURIE** for a publication, the ingesting KP MUST ensure that its prefix matches the **spelling and casing** defined in the Biolink Model [prefix map](https://github.com/biolink/biolink-model/blob/master/src/biolink_model/prefixmaps/biolink-model-prefix-map.json) - and make adjustments as necessary. (e.g "PMID" not "pmid", "doi" not "DOI"). b. When a knowledge source provides a **URL** for a publication, the ingesting KP MUST report the full URL **EXCEPT** in cases where it contains a Pubmed, Pubmed Central (Europe or NLM), or DOI identifier. Here, the KP MUST convert the URL into CURIE form, e.g.: @@ -42,20 +43,18 @@ publications: http://europepmc.org/articles/PMC6246007 → PMC:6246007 https://doi.org/10.1080/17512433.2018.1398644 → doi:0.1080/17512433.2018.1398644 - c. When a knowledge source provides a **free-text description** of a supporting publication (e.g. 
its title, or a bibliographic reference), the ingesting KP MAY capture this text they see fit. + c. When a knowledge source provides a **free-text description** of a supporting publication (e.g. its title, or a bibliographic reference), the ingesting KP MAY capture this text as they see fit. 3. If a knowledge source reports **multiple publications supporting a single Edge**, the ingesting KP SHOULD organize them into Attribute objects according to the specific instructions below. a. When all publications supporting the Edge are reported **in CURIE or URI/URL format**, the KP SHOULD capture them as a list in a single Attribute object where the `value_type_id` is "linkml:Uriorcurie": - { - "edges": [ +```json { - "id": "Association001", "subject": "CHEBI:3215", "predicate": "biolink:interacts_with", - "object": "NCBIGene:51176", + "object": "NCBIGene:51176", "attributes": [ { "attribute_type_id": "biolink:publications", @@ -69,16 +68,12 @@ publications: } ] } - ] - } - +``` b. When all publications supporting the Edge are reported **as free-text descriptions**, the KP SHOULD capture them as a list in a single Attribute object where the `value_type_id` is "linkml:String": - { - "edges": [ +```json { - "id": "Association001", "subject": "CHEBI:3215", "predicate": "biolink:interacts_with", "object": "NCBIGene:51176", @@ -94,15 +89,12 @@ publications: } ] } - ] - } + ``` c. 
When some of the publications supporting the Edge are in **CURIE/URI format** and others are **free-text**, the ingesting KP MUST **create separate 'publications' Attributes** to hold those reported in CURIE and URI format separately from those described as free-text: - { - "edges": [ +```json { - "id": "Association001", "subject": "CHEBI:3215", "predicate": "biolink:interacts_with", "object": "NCBIGene:51176", @@ -128,17 +120,17 @@ publications: } ] } - ] - } +``` - **NOTE** that the requirement level 'SHOULD' is used above because KPs MAY choose at any time to **separate an individual publication into its own Attribute**, if they wish to provide specific information about it using Attribute fields (e.g. `description`), or using a nested Attribute object. 4. If a knowledge source provides **multiple identifiers for a single publication supporting an Edge** (e.g. a PMID, PMCID, and DOI for the same journal article), KPs MUST report only one identifier per publication, in the following order of preference: PMID > PMCID > PMC > DOI. -5. KPs can expect consumers to obtain **metadata about supporting journal articles** that are index by Pubmed (e.g. title, journal, abstract, dates, equivalent identifiers), from the Text Mining Knowledge Provider’s Publication Metadata API. However, the Knowledge Providers MAY use the `Attribute.description` and `Attribute.value_url` fields to provide additional metadata in the TRAPI message itself. +5. KPs can expect consumers to obtain **metadata about supporting journal articles** that are indexed by Pubmed (e.g. title, journal, abstract, dates, equivalent identifiers). However, the Knowledge Providers MAY use the `Attribute.description` and `Attribute.value_url` fields to provide additional metadata in the TRAPI message itself. 6. 
Finally, in the short term KPs can continue the current practice of including references to **supporting clinical trial records** alongside references to publications in Attribute objects using the `biolink:publications` edge property. Trial identifiers from clinicaltrials.gov MUST be reported in CURIE format using the prefix "clinicaltrials" (e.g. "clinicaltrials:NCT00222573"). +```json { "attribute_type_id": "biolink:publications", "value": [ @@ -147,38 +139,39 @@ publications: "clinicaltrials:NCT00222573", "clinicaltrials:NCT00503152", "clinicaltrials:NCT00634963" - ] + ], "value_type_id": "biolink:Uriorcurie", "value_urls": "https://clinicaltrials.gov/search?id=%22NCT02658760%22OR%22NCT02679560%22OR%22NCT05084573%22", "attribute_source": "infores:chembl" }, +``` - - **NOTE** however that we will soon be moving to **use of a new `supporting_studies` Edge property** to capture supporting clincial trials and other types of studies in a separate Attribute from publications. A specification for this is forthcoming. + - **NOTE** however that we will soon be moving to **use of a new `supporting_studies` Edge property** to capture supporting clinical trials and other types of studies in a separate Attribute from publications. A specification for this is forthcoming. ### An Important Clarification about Retrieval Source URLs vs Supporting Publications -Above we define "publications" broadly to include any publicly available document, and include web pages in this scope. However, if a data provider wants to share web pages that display the source record from which they retrieved knowledge expressed in their edge, a URL for this web page should be captured NOT as a supporting publication per the specification above, but rather in the `RetrievalSource` object, per the [Retrieval Provenance Specification](https://github.com/NCATSTranslator/ReasonerAPI/blob/master/ImplementationGuidance/Specifications/retrieval_provenance_specification.md). 
+Above we define "publications" broadly to include any publicly available document, and include web pages in this scope. However, if a data provider wants to share web pages that display the source record from which they retrieved knowledge expressed in their edge, a URL for this web page should be captured NOT as a supporting publication per the specification above, but rather in the `RetrievalSource` object, per the [Retrieval Provenance Specification](retrieval_provenance_specification.md). For example, consider an edge provided by the **SRI Reference KG** connecting the BRCA2 gene to Hereditary Breast Ovarian Cancer Syndrome, with the following ClinVar record as its primary source: [image](https://github.com/NCATSTranslator/ReasonerAPI/assets/5184212/9f3be816-b7ff-4709-89bd-c7e314f67bfd) -The SRI KG wants to report the **six journal articles** that ClinVar provides as support for this statement, and also provide the **URL of the ClinVar web page** where the user can explore the source record. Technically, Biolink would consider this ClinVar web page as fitting under its broad definition of 'Publication' - and thus allow for it to be captured in an Attribute using the `publications` edge property. However, we provide a dedicated `source_record_url` property in the RetrievalSource object for reporting web pages that display the source record from which the KP retrieved knowledge expressed in their edge. +The SRI KG wants to report the **six journal articles** that ClinVar provides as support for this statement, and also provide the **URL of the ClinVar web page** where the user can explore the source record. Technically, Biolink would consider this ClinVar web page as fitting under its broad definition of 'Publication' - and thus allow for it to be captured in an Attribute using the `publications` edge property. 
However, we provide a dedicated `source_record_urls` property in the RetrievalSource object for reporting web pages that display the source record from which the KP retrieved knowledge expressed in their edge. So in this case, the correct way to capture the six supporting publications and the source record url would be as follows. ````json -"subject": "BRCA2" -"predicate": "associated with" -"object": "Hereditary Breast Ovarian Cancer Syndrome" +"subject": "NCBIGene:675", // BRCA2 +"predicate": "biolink:associated_with", +"object": "DOID:5683", // Hereditary Breast Ovarian Cancer Syndrome "sources": [ { "resource_id": "infores:clinvar", - "resource_role": "primary knowledge source", - "source_record_url": "https://www.ncbi.nlm.nih.gov/clinvar/variation/9342/" + "resource_role": "primary_knowledge_source", + "source_record_urls": ["https://www.ncbi.nlm.nih.gov/clinvar/variation/9342/"] }, { "resource_id": "infores:sri-reference-kg", - "resource_role": "aggregator knowledge source", - "upstream_resource_ids": "infores:clinvar" + "resource_role": "aggregator_knowledge_source", + "upstream_resource_ids": ["infores:clinvar"] } ] "attributes": [ diff --git a/docs/reference.md b/docs/reference.md index c518937c..da8c311f 100644 --- a/docs/reference.md +++ b/docs/reference.md @@ -2,7 +2,7 @@ ## Components -#### Query [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0/TranslatorReasonerAPI.yaml#L277:L315) +#### Query [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0-specification-update/TranslatorReasonerAPI.yaml#L277:L315) The Query class is used to package a user request for information. A Query object consists of a required Message object with optional additional properties. Additional properties are intended to convey implementation-specific or query-independent parameters. 
For example, an additional property specifying a log level could allow a user to override the default log level in order to receive more fine-grained log information when debugging an issue. ##### Fixed Fields @@ -14,7 +14,7 @@ The Query class is used to package a user request for information. A Query objec | message | [Message](#message-) | **REQUIRED**. The query Message is a serialization of the user request. Content of the Message object depends on the intended TRAPI operation. For example, the fill operation requires a non-empty query_graph property as part of the Message, whereas other operations, e.g. overlay, require non-empty results and knowledge_graph properties. | | workflow | [workflow](#workflow-) | List of workflow steps to be executed. | -#### AsyncQuery [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0/TranslatorReasonerAPI.yaml#L316:L339) +#### AsyncQuery [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0-specification-update/TranslatorReasonerAPI.yaml#L316:L339) The AsyncQuery class is effectively the same as the Query class but it requires a callback property. @@ -27,7 +27,7 @@ The AsyncQuery class is effectively the same as the Query class but it requires | --- | :---: | --- | | callback | `string` | **REQUIRED**. Upon completion, this server will send a POST request to the callback URL with `Content-Type: application/json` header and request body containing a JSON-encoded `Response` object. The server MAY POST `Response` objects before work is fully complete to provide interim results with a Response.status value of 'Running'. If a POST operation to the callback URL does not succeed, the server SHOULD retry the POST at least once. 
| -#### QueryParameters [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0/TranslatorReasonerAPI.yaml#L340:L371) +#### QueryParameters [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0-specification-update/TranslatorReasonerAPI.yaml#L340:L371) Query-time parameters that don't affect the semantics of a query or intended workflow, but may affect overall behavior of the server in the execution of this query. The server MUST repeat the parameters it is given in its Response. ##### Fixed Fields @@ -38,7 +38,7 @@ Query-time parameters that don't affect the semantics of a query or intended wor | log_level | [LogLevel](#loglevel-) | The least critical level of logs to return. | | bypass_cache | `boolean` | Set to true in order to request that the agent obtain fresh information from its sources in all cases where it has a viable choice between requesting fresh information in real time and using cached information. The agent receiving this flag MUST also include it in TRAPI sent to downstream sources (e.g., ARS -> ARAs -> KPs). | -#### AsyncQueryResponse [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0/TranslatorReasonerAPI.yaml#L372:L402) +#### AsyncQueryResponse [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0-specification-update/TranslatorReasonerAPI.yaml#L372:L402) The AsyncQueryResponse object contains a payload that must be returned from a submitted async_query. ##### Fixed Fields @@ -49,7 +49,7 @@ The AsyncQueryResponse object contains a payload that must be returned from a su | description | `string` | A brief human-readable description of the result of the async_query submission. | | job_id | `string` | **REQUIRED**. An identifier for the submitted job that can be used with /async_query_status to receive an update on the status of the job. 
| -#### AsyncQueryStatusResponse [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0/TranslatorReasonerAPI.yaml#L403:L445) +#### AsyncQueryStatusResponse [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0-specification-update/TranslatorReasonerAPI.yaml#L403:L445) The AsyncQueryStatusResponse object contains a payload that describes the current status of a previously submitted async_query. ##### Fixed Fields @@ -61,7 +61,7 @@ The AsyncQueryStatusResponse object contains a payload that describes the curren | logs | Array\[[LogEntry](#logentry-)\] | **REQUIRED**. **Minimum items: 1.** A list of LogEntry items, containing errors, warnings, debugging information, etc. List items MUST be in chronological order with earliest first. The most recent entry should be last. Its timestamp will be compared against the current time to see if there is still activity. | | response_url | `string` | Optional URL that can be queried to restrieve the full TRAPI Response. | -#### Response [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0/TranslatorReasonerAPI.yaml#L446:L502) +#### Response [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0-specification-update/TranslatorReasonerAPI.yaml#L446:L502) The Response object contains the main payload when a TRAPI query endpoint interprets and responds to the submitted query successfully (i.e., HTTP Status Code 200). The message property contains the knowledge of the response (query graph, knowledge graph, and results). The status, description, and logs properties provide additional details about the response. 
##### Fixed Fields @@ -77,7 +77,7 @@ The Response object contains the main payload when a TRAPI query endpoint interp | schema_version | `string` | Version label of the TRAPI schema used in this document | | biolink_version | `string` | Version label of the Biolink model used in this document | -#### Message [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0/TranslatorReasonerAPI.yaml#L503:L546) +#### Message [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0-specification-update/TranslatorReasonerAPI.yaml#L503:L546) The message object holds the main content of a Query or a Response in three properties: query_graph, results, and knowledge_graph. The query_graph property contains the query configuration, the results property contains any answers that are returned by the service, and knowledge_graph property contains lists of edges and nodes in the thought graph corresponding to this message. The content of these properties is context-dependent to the encompassing object and the TRAPI operation requested. ##### Fixed Fields @@ -89,7 +89,7 @@ The message object holds the main content of a Query or a Response in three prop | knowledge_graph | [KnowledgeGraph](#knowledgegraph-) | KnowledgeGraph object that contains lists of nodes and edges in the thought graph corresponding to the message | | auxiliary_graphs | Map\[`string`, [AuxiliaryGraph](#auxiliarygraph-)\] | **Minimum properties: 1.** Dictionary of AuxiliaryGraph instances that are used by Knowledge Graph Edges and Result Analyses. These are referenced elsewhere by the dictionary key. | -#### LogEntry [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0/TranslatorReasonerAPI.yaml#L547:L584) +#### LogEntry [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0-specification-update/TranslatorReasonerAPI.yaml#L547:L584) The LogEntry object contains information useful for tracing and debugging across Translator components. 
Although an individual component (for example, an ARA or KP) may have its own logging and debugging infrastructure, this internal information is not, in general, available to other components. In addition to a timestamp and logging level, LogEntry includes a string intended to be read by a human, along with one of a standardized set of codes describing the condition of the component sending the message. ##### Fixed Fields @@ -101,7 +101,7 @@ The LogEntry object contains information useful for tracing and debugging across | code | `string` | One of a standardized set of short codes e.g. QueryNotTraversable, KPNotAvailable, KPResponseMalformed | | message | `string` | **REQUIRED**. A human-readable log message | -#### LogLevel [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0/TranslatorReasonerAPI.yaml#L585:L592) +#### LogLevel [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0-specification-update/TranslatorReasonerAPI.yaml#L585:L592) Logging level `string` @@ -113,7 +113,7 @@ one of: - INFO - DEBUG -#### Result [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0/TranslatorReasonerAPI.yaml#L593:L623) +#### Result [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0-specification-update/TranslatorReasonerAPI.yaml#L593:L623) A Result object specifies the nodes and edges in the knowledge graph that satisfy the structure or conditions of a user-submitted query graph. It must contain a NodeBindings object (list of query graph node to knowledge graph node mappings) and a list of Analysis objects. ##### Fixed Fields @@ -123,7 +123,7 @@ A Result object specifies the nodes and edges in the knowledge graph that satisf | node_bindings | Map\[`string`, [NodeBinding](#nodebinding-)\] | **REQUIRED**. **Minimum properties: 1.** The dictionary of input QNodes to KnowledgeGraph Node bindings where the dictionary keys are the key identifiers of the QNodes and the associated values of those keys are instances of NodeBinding schema type (see below). 
Because a given QNode may have multiple KnowledgeGraph Nodes bound in the result, the NodeBinding object may list multiple KnowledgeGraph Nodes. | | analyses | Array\[[Analysis](#analysis-)\] | **Minimum items: 1.** The list of all Analysis components that contribute to the result. See below for Analysis components. | -#### NodeBinding [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0/TranslatorReasonerAPI.yaml#L624:L643) +#### NodeBinding [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0-specification-update/TranslatorReasonerAPI.yaml#L624:L643) A NodeBinding object defines all relevant KnowledgeGraph Node mappings, identified by the corresponding object key identifier(s) of the Node(s) within the Knowledge Graph. Instances of NodeBinding may include extra annotation in the form of additional properties (such annotation is not yet fully standardized). Each Node Binding must bind directly to a node in the original Query Graph. ##### Fixed Fields @@ -132,7 +132,7 @@ A NodeBinding object defines all relevant KnowledgeGraph Node mappings, identifi | --- | :---: | --- | | ids | Array\[[CURIE](#curie-)\] | **REQUIRED**. **Minimum items: 1.** The CURIEs of one or more Nodes within the Knowledge Graph. | -#### Analysis [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0/TranslatorReasonerAPI.yaml#L644:L714) +#### Analysis [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0-specification-update/TranslatorReasonerAPI.yaml#L644:L714) An Analysis is a dictionary that contains information about the result tied to a particular service. Each Analysis is generated by a single reasoning service, and describes the outputs of analyses performed by the reasoner on a particular Result (e.g. a result score), along with provenance information supporting the analysis (e.g. method or data that supported generation of the score).
##### Fixed Fields @@ -147,7 +147,7 @@ An Analysis is a dictionary that contains information about the result tied to a | scoring_method | `string` | An identifier and link to an explanation for the method used to generate the score. | | attributes | Array\[[Attribute](#attribute-)\] | The attributes of this particular Analysis. | -#### EdgeBinding [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0/TranslatorReasonerAPI.yaml#L715:L734) +#### EdgeBinding [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0-specification-update/TranslatorReasonerAPI.yaml#L715:L734) An EdgeBinding object defines all relevant KnowledgeGraph Edge mappings, identified by the corresponding 'id' object key identifier of the Edge within the knowledge graph. Instances of EdgeBinding may include extra annotation (such annotation is not yet fully standardized). EdgeBindings are captured within a specific reasoner's Analysis object because the Edges in the KnowledgeGraph that get bound to the input QueryGraph may differ between reasoners. ##### Fixed Fields @@ -156,7 +156,7 @@ An EdgeBinding object defines all relevant KnowledgeGraph Edge mappings, identif | --- | :---: | --- | | ids | Array\[`string`\] | **REQUIRED**. **Minimum items: 1.** The key identifiers of specific KnowledgeGraph Edges. | -#### PathBinding [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0/TranslatorReasonerAPI.yaml#L735:L751) +#### PathBinding [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0-specification-update/TranslatorReasonerAPI.yaml#L735:L751) A PathBinding object binds a single QueryGraph path (the key to this object) to one or more relevant AuxiliaryGraph ids containing a list of edges in the path. The AuxiliaryGraph does not convey any order of edges in the path. ##### Fixed Fields @@ -165,7 +165,7 @@ A PathBinding object binds a single QueryGraph path (the key to this object) to | --- | :---: | --- | | ids | Array\[`string`\] | **REQUIRED**. 
**Minimum items: 1.** The key identifiers of specific auxiliary graphs. | -#### AuxiliaryGraph [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0/TranslatorReasonerAPI.yaml#L752:L777) +#### AuxiliaryGraph [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0-specification-update/TranslatorReasonerAPI.yaml#L752:L777) A single AuxiliaryGraph instance that is used by KnowledgeGraph Edges, Result Analysis support graphs, and PathBindings. Edges comprising an AuxiliaryGraph are a subset of the KnowledgeGraph in the message. Data creators can create an AuxiliaryGraph to assemble a specific collection of edges from the KnowledgeGraph into a named graph that can be referenced from an Edge as evidence/explanation supporting that Edge, from a Result Analysis as information used to generate a score, or from a PathBinding as the path for that Analysis. ##### Fixed Fields @@ -174,7 +174,7 @@ A single AuxiliaryGraph instance that is used by KnowledgeGraph Edges, Result An | --- | :---: | --- | | edges | Array\[`string`\] | **REQUIRED**. **Minimum items: 1.** List of edges that form the AuxiliaryGraph. Each item is a reference to a single KnowledgeGraph Edge. This list is not ordered, nor is the order intended to convey any relationship between the edges that form this AuxiliaryGraph. | -#### KnowledgeGraph [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0/TranslatorReasonerAPI.yaml#L778:L804) +#### KnowledgeGraph [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0-specification-update/TranslatorReasonerAPI.yaml#L778:L804) The knowledge graph associated with a set of results. The instances of Node and Edge defining this graph represent instances of biolink:NamedThing (concept nodes) and biolink:Association (relationship edges) representing (Attribute) annotated knowledge returned from the knowledge sources and inference agents wrapped by the given TRAPI implementation. 
##### Fixed Fields @@ -184,7 +184,7 @@ The knowledge graph associated with a set of results. The instances of Node and | nodes | Map\[`string`, [Node](#node-)\] | **REQUIRED**. Dictionary of Node instances used in the KnowledgeGraph, referenced elsewhere in the TRAPI output by the dictionary key. | | edges | Map\[`string`, [Edge](#edge-)\] | Dictionary of Edge instances used in the KnowledgeGraph, referenced elsewhere in the TRAPI output by the dictionary key. | -#### QueryGraph [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0/TranslatorReasonerAPI.yaml#L805:L845) +#### QueryGraph [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0-specification-update/TranslatorReasonerAPI.yaml#L805:L845) A graph representing a biomedical question. It serves as a template for each Result (answer), where each bound knowledge graph node/edge is expected to obey the constraints of the associated QueryGraph element. ##### Fixed Fields @@ -195,7 +195,7 @@ A graph representing a biomedical question. It serves as a template for each Res | edges | Map\[`string`, [QEdge](#qedge-)\] | **Minimum properties: 1.** The edge specifications. The keys of this map are unique edge identifiers and the corresponding values include the constraints on bound edges, in addition to specifying the subject and object QNodes. | | paths | Map\[`string`, [QPath](#qpath-)\] | **Minimum properties: 1.** The QueryGraph path specification, used only for pathfinder type queries. The keys of this map are unique path identifiers and the corresponding values include the constraints on bound paths, in addition to specifying the subject, object, and intermediate QNodes. | -#### QNode [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0/TranslatorReasonerAPI.yaml#L846:L921) +#### QNode [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0-specification-update/TranslatorReasonerAPI.yaml#L846:L921) A node in the QueryGraph used to represent an entity in a query. 
If no CURIEs are specified, any nodes matching the category of the QNode will be returned in the Results. ##### Fixed Fields @@ -208,7 +208,7 @@ A node in the QueryGraph used to represent an entity in a query. If no CURIEs ar | member_ids | Array\[[CURIE](#curie-)\] | **Minimum items: 1.** A list of CURIE identifiers for members of a queried set. This property MUST be populated under a set_interpretation of MANY or ALL, when the 'ids' property holds a UUID representing the set itself. This property MUST NOT be used under a set_interpretation of BATCH or COLLATE or when set_interpretation is absent. | | constraints | Array\[[AttributeConstraint](#attributeconstraint-)\] | **Minimum items: 1.** A list of constraints applied to a query node. If there are multiple items, they must all be true (equivalent to AND) | -#### QEdge [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0/TranslatorReasonerAPI.yaml#L922:L981) +#### QEdge [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0-specification-update/TranslatorReasonerAPI.yaml#L922:L981) An edge in the QueryGraph used as a filter pattern specification in a query. If the optional predicate property is not specified, it is assumed to be a wildcard match to the target knowledge space. If specified, the ontological inheritance hierarchy associated with the term provided is assumed, such that edge bindings returned may be an exact match to the given QEdge predicate term, or to a term that is a descendant of the QEdge predicate term. ##### Fixed Fields @@ -221,7 +221,7 @@ An edge in the QueryGraph used as a filter pattern specification in a query. If | object | `string` | **REQUIRED**. Corresponds to the map key identifier of the object concept node anchoring the query filter pattern for the query relationship edge. | | constraints | [QEdgeConstraints](#qedgeconstraints-) | An object containing all constraints placed on the QEdge.
ALL edges bound to this QEdge MUST conform to ALL given constraints; underlying edges (such as those appearing in supporting graphs) are not required to conform to the given constraints. | -#### QEdgeConstraints [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0/TranslatorReasonerAPI.yaml#L982:L1056) +#### QEdgeConstraints [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0-specification-update/TranslatorReasonerAPI.yaml#L982:L1056) A subschema for constraints that may be placed on a given QEdge. ALL edges bound to the given QEdge MUST conform to ALL given constraints; underlying edges (such as those appearing in supporting graphs) are not required to conform to the given constraints. ##### Fixed Fields @@ -234,7 +234,7 @@ A subschema for constraints that may be placed on a given QEdge. ALL edges bound | qualifiers | Array\[[QualifierSetConstraint](#qualifiersetconstraint-)\] | **Minimum items: 1.** A list of QualifierSetConstraints applied to a QEdge. If multiple QualifierSetConstraints are provided, there is an OR relationship between them. If the QEdge has multiple predicates or if the QNodes that correspond to the subject or object of this QEdge have multiple categories or multiple curies, then constraints.qualifiers MUST NOT be specified because these complex use cases are not supported at this time. | | sources | [AllowDenyConstraint](#allowdenyconstraint-) \| `object` | A list of infores CURIEs which are either allowed or denied in the sources (resource_id) of the bound Edge. If `behavior` is set to "ALLOW", ANY (at least 1) of the given infores CURIEs MUST be present. If `behavior` is set to "DENY", then ALL given infores CURIEs MUST NOT be present. 
| -#### AllowDenyConstraint [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0/TranslatorReasonerAPI.yaml#L1057:L1080) +#### AllowDenyConstraint [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0-specification-update/TranslatorReasonerAPI.yaml#L1057:L1080) A list of values which are to either be allowed or denied. If `behavior` is set to "ALLOW", then ANY (at least 1) of the given values MUST appear in the constrained property in order for it to meet the constraint (OR relationship). If `behavior` is set to "DENY", then ALL of the given values MUST NOT appear in the constrained property in order for it to meet the constraint (NOT (x OR y) relationship). ##### Fixed Fields @@ -244,7 +244,7 @@ A list of values which are to either be allowed or denied. If `behavior` is set | behavior | `string` | **REQUIRED**. | | values | Array\[`string`\] | **REQUIRED**. **Minimum items: 1.** | -#### QPath [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0/TranslatorReasonerAPI.yaml#L1081:L1129) +#### QPath [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0-specification-update/TranslatorReasonerAPI.yaml#L1081:L1129) A path in the QueryGraph used for pathfinder queries. Both subject and object MUST reference QNodes that have a CURIE in their ids property. Paths returned that bind to this QPath MUST represent some relationship between the subject and object. ##### Fixed Fields @@ -256,7 +256,7 @@ A path in the QueryGraph used for pathfinder queries. Both subject and object MU | predicates | Array\[[BiolinkPredicate](#biolinkpredicate-)\] | **Minimum items: 1.** QPath predicates are intended to convey what type of paths are desired, NOT a constraint on the types of predicates that may be in result paths. If no predicate is listed, the ARA SHOULD find paths such that the relationship represented by the path is a "related_to" relationship. These should be Biolink Model predicates and are allowed to be of type 'abstract' or 'mixin'. 
Use of 'deprecated' predicates SHOULD be avoided. | | constraints | Array\[[PathConstraint](#pathconstraint-)\] | **Minimum items: 1.** A list of constraints for the QPath. If multiple constraints are listed, it should be interpreted as an OR relationship. Each path returned MUST comply with at least one constraint. | -#### PathConstraint [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0/TranslatorReasonerAPI.yaml#L1130:L1146) +#### PathConstraint [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0-specification-update/TranslatorReasonerAPI.yaml#L1130:L1146) A constraint for paths. ARAs must comply with constraints when finding paths. ##### Fixed Fields @@ -265,7 +265,7 @@ A constraint for paths. ARAs must comply with constraints when finding paths. | --- | :---: | --- | | intermediate_categories | Array\[[BiolinkEntity](#biolinkentity-)\] | **Minimum items: 1.** A list of Biolink Model categories by which to constrain paths returned. If multiple categories are listed, it should be interpreted as an AND relationship. Each path returned by ARAs MUST contain at least one node of each category listed. | -#### Node [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0/TranslatorReasonerAPI.yaml#L1147:L1181) +#### Node [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0-specification-update/TranslatorReasonerAPI.yaml#L1147:L1181) A node in the KnowledgeGraph which represents some biomedical concept. Nodes are identified by the keys in the KnowledgeGraph Node mapping. ##### Fixed Fields @@ -277,7 +277,7 @@ A node in the KnowledgeGraph which represents some biomedical concept. Nodes are | attributes | Array\[[Attribute](#attribute-)\] | A list of attributes describing the node | | is_set | `boolean` | Indicates that the node represents a set of entities. If this property is absent, it is assumed to be false. 
| -#### Attribute [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0/TranslatorReasonerAPI.yaml#L1182:L1265) +#### Attribute [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0-specification-update/TranslatorReasonerAPI.yaml#L1182:L1265) Generic attribute for a node or an edge that expands the key-value pair concept by including properties for additional metadata. These properties MAY be used to describe the source of the statement made in a key-value pair of the attribute object, or describe the attribute's value itself including its semantic type, or a url providing additional information about it. An attribute may be further qualified with sub-attributes (for example to provide confidence intervals on a value). ##### Fixed Fields @@ -293,7 +293,7 @@ Generic attribute for a node or an edge that expands the key-value pair concept | description | `string` | Human-readable description for the attribute and its value. | | attributes | Array\[[Attribute](#attribute-)\] | A list of attributes providing further information about the parent attribute (for example to provide provenance information about the parent attribute). | -#### Edge [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0/TranslatorReasonerAPI.yaml#L1266:L1348) +#### Edge [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0-specification-update/TranslatorReasonerAPI.yaml#L1266:L1348) A specification of the semantic relationship linking two concepts that are expressed as nodes in the knowledge graph resulting from a query to the service. ##### Fixed Fields @@ -309,7 +309,7 @@ A specification of the semantic relationship linking two concepts that are expre | knowledge_level | `string` | **REQUIRED**. One of the biolink-enumerated permissible values for `knowledge level` that provides the level of knowledge the Edge represents. (See https://biolink.github.io/biolink-model/KnowledgeLevelEnum/) | | agent_type | `string` | **REQUIRED**. 
One of the biolink-enumerated permissible values for `agent type` that provides the kind of agent that originated the knowledge presented by the Edge. (See https://biolink.github.io/biolink-model/AgentTypeEnum/) | -#### Qualifier [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0/TranslatorReasonerAPI.yaml#L1349:L1384) +#### Qualifier [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0-specification-update/TranslatorReasonerAPI.yaml#L1349:L1384) An additional nuance attached to an assertion ##### Fixed Fields @@ -319,7 +319,7 @@ An additional nuance attached to an assertion | qualifier_type_id | [CURIE](#curie-) | **REQUIRED**. CURIE for a Biolink 'qualifier' association slot, generally taken from Biolink association slots designated for this purpose e.g. biolink:subject_aspect_qualifier, biolink:subject_direction_qualifier, biolink:object_aspect_qualifier, etc. Such qualifiers are used to elaborate a second layer of meaning of a knowledge graph edge. Available qualifiers can be found at https://biolink.github.io/biolink-model/qualifiers.html, which mostly have slot names with the suffix string 'qualifier'. | | qualifier_value | `string` | **REQUIRED**. The value associated with the type of the qualifier, drawn from a set of controlled values by the type as specified in the Biolink model (e.g. 'expression' or 'abundance' for the qualifier type 'biolink:subject_aspect_qualifier', etc). The enumeration of qualifier values for a given qualifier type is generally constrained by the category of the edge (i.e. biolink:Association subtype). | -#### QualifierSetConstraint [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0/TranslatorReasonerAPI.yaml#L1385:L1400) +#### QualifierSetConstraint [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0-specification-update/TranslatorReasonerAPI.yaml#L1385:L1400) A constraint on the qualifiers of a bound Edge (types and values). 
A given key-value pair defines the required qualifier_type_id and qualifier_value of one Qualifier, respectively. For example, a QualifierSetConstraint can constrain a "ChemicalX - affects - ?Gene" query to return only edges where ChemicalX specifically affects the 'expression' of the Gene, by constraining on the qualifier_type "biolink:object_aspect_qualifier" with a qualifier_value of "expression". Multiple type-value pairs have an AND relationship. @@ -329,19 +329,19 @@ A constraint on the qualifiers of a bound Edge (types and values). A given key-v | --- | :---: | --- | | ^biolink: | `string` | | -#### BiolinkEntity [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0/TranslatorReasonerAPI.yaml#L1401:L1412) +#### BiolinkEntity [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0-specification-update/TranslatorReasonerAPI.yaml#L1401:L1412) Compact URI (CURIE) for a Biolink class, biolink:NamedThing or a child thereof. The CURIE must use the prefix 'biolink:' followed by the PascalCase class name. `string` (pattern: `^biolink:[A-Z][a-zA-Z]*$`) -#### BiolinkPredicate [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0/TranslatorReasonerAPI.yaml#L1413:L1425) +#### BiolinkPredicate [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0-specification-update/TranslatorReasonerAPI.yaml#L1413:L1425) CURIE for a Biolink 'predicate' slot, taken from the Biolink slot ('is_a') hierarchy rooted in biolink:related_to (snake_case). This predicate defines the Biolink relationship between the subject and object nodes of a biolink:Association defining a knowledge graph edge. 
`string` (pattern: `^biolink:[a-z][a-z_]*$`) -#### CURIE [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0/TranslatorReasonerAPI.yaml#L1426:L1435) +#### CURIE [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0-specification-update/TranslatorReasonerAPI.yaml#L1426:L1435) A Compact URI, consisting of a prefix and a reference separated by a colon, such as UniProtKB:P00738. Via an external context definition, the CURIE prefix and colon may be replaced by a URI prefix, such as https://identifiers.org/uniprot/, to form a full URI. `string` -#### MetaKnowledgeGraph [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0/TranslatorReasonerAPI.yaml#L1436:L1463) +#### MetaKnowledgeGraph [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0-specification-update/TranslatorReasonerAPI.yaml#L1436:L1463) Knowledge-map representation of this TRAPI web service. The meta knowledge graph is composed of the union of most specific categories and predicates for each node and edge. ##### Fixed Fields @@ -351,7 +351,7 @@ Knowledge-map representation of this TRAPI web service. The meta knowledge graph | nodes | Map\[`string`, [MetaNode](#metanode-)\] | **REQUIRED**. **Minimum properties: 1.** Collection of the most specific node categories provided by this TRAPI web service, indexed by Biolink class CURIEs. A node category is only exposed here if there is node for which that is the most specific category available. | | edges | Array\[[MetaEdge](#metaedge-)\] | **REQUIRED**. List of the most specific edges/predicates provided by this TRAPI web service. A predicate is only exposed here if there is an edge for which the predicate is the most specific available. 
| -#### MetaNode [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0/TranslatorReasonerAPI.yaml#L1464:L1487) +#### MetaNode [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0-specification-update/TranslatorReasonerAPI.yaml#L1464:L1487) Description of a node category provided by this TRAPI web service. ##### Fixed Fields @@ -361,7 +361,7 @@ Description of a node category provided by this TRAPI web service. | id_prefixes | Array\[`string`\] | **REQUIRED**. **Minimum items: 1.** List of CURIE prefixes for the node category that this TRAPI web service understands and accepts on the input. | | attributes | Array\[[MetaAttribute](#metaattribute-)\] | Node attributes provided by this TRAPI web service. | -#### MetaEdge [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0/TranslatorReasonerAPI.yaml#L1488:L1552) +#### MetaEdge [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0-specification-update/TranslatorReasonerAPI.yaml#L1488:L1552) Edge in a meta knowledge map describing relationship between a subject Biolink class and an object Biolink class. ##### Fixed Fields @@ -376,7 +376,7 @@ Edge in a meta knowledge map describing relationship between a subject Biolink c | qualifiers | Array\[[MetaQualifier](#metaqualifier-)\] | **Minimum items: 1.** Qualifiers that are possible to be found on this edge type. | | association | [BiolinkEntity](#biolinkentity-) | The Biolink association type (entity) that this edge represents. Associations are classes in Biolink that represent a relationship between two entities. For example, the association 'gene interacts with gene' is represented by the Biolink class, 'biolink:GeneToGeneAssociation'. If association is filled out, then the testing harness can help validate that the qualifiers are being used correctly. 
| -#### MetaQualifier [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0/TranslatorReasonerAPI.yaml#L1553:L1572) +#### MetaQualifier [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0-specification-update/TranslatorReasonerAPI.yaml#L1553:L1572) ##### Fixed Fields @@ -385,7 +385,7 @@ Edge in a meta knowledge map describing relationship between a subject Biolink c | qualifier_type_id | [CURIE](#curie-) | **REQUIRED**. The CURIE of the qualifier type. | | applicable_values | Array\[`string`\] | **Minimum items: 1.** The list of values that are possible for this qualifier. | -#### MetaAttribute [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0/TranslatorReasonerAPI.yaml#L1573:L1610) +#### MetaAttribute [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0-specification-update/TranslatorReasonerAPI.yaml#L1573:L1610) ##### Fixed Fields @@ -397,7 +397,7 @@ Edge in a meta knowledge map describing relationship between a subject Biolink c | constraint_use | `boolean` | Indicates whether this attribute can be used as a query constraint. | | constraint_name | `string` | Human-readable name or label for the constraint concept. Required whenever constraint_use is true. | -#### AttributeConstraint [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0/TranslatorReasonerAPI.yaml#L1611:L1704) +#### AttributeConstraint [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0-specification-update/TranslatorReasonerAPI.yaml#L1611:L1704) Generic query constraint for a query node or query edge ##### Fixed Fields @@ -412,7 +412,7 @@ Generic query constraint for a query node or query edge | unit_id | any | CURIE of the units of the value or list of values in the 'value' property. The Units of Measurement Ontology (UO) should be used if possible. The unit_id MUST be provided for (lists of) numerical values that correspond to a quantity that has units. 
| | unit_name | any | Term name that is associated with the CURIE of the units of the value or list of values in the 'value' property. The Units of Measurement Ontology (UO) SHOULD be used if possible. This property SHOULD be provided if a unit_id is provided. This is redundant but recommended for human readability. | -#### RetrievalSource [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0/TranslatorReasonerAPI.yaml#L1705:L1762) +#### RetrievalSource [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0-specification-update/TranslatorReasonerAPI.yaml#L1705:L1762) Provides information about how a particular InformationResource served as a source from which knowledge expressed in an Edge, or data used to generate this knowledge, was retrieved. ##### Fixed Fields @@ -424,7 +424,7 @@ Provides information about how a particular InformationResource served as a sour | upstream_resource_ids | Array\[[CURIE](#curie-)\] | **Minimum items: 1.** An upstream InformationResource from which the resource being described directly retrieved a record of the knowledge expressed in the Edge, or data used to generate this knowledge. This is an array because there are cases where a merged Edge holds knowledge that was retrieved from multiple sources. e.g. an Edge provided by the ARAGORN ARA can express knowledge it retrieved from both the automat-mychem-info and molepro KPs, which both provided it with records of this single fact. | | source_record_urls | Array\[`string`\] | **Minimum items: 1.** A URL linking to a specific web page or document provided by the source that contains a record of the knowledge expressed in the Edge. If the knowledge is contained in more than one web page on an information resource's site, urls MAY be provided for each. For example, Therapeutic Targets Database (TTD) has separate web pages for 'Imatinib' and its protein target KIT, both of which hold the claim that 'the KIT protein is a therapeutic target for Imatinib'. 
| -#### ResourceRoleEnum [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0/TranslatorReasonerAPI.yaml#L1763:L1775) +#### ResourceRoleEnum [↗](https://github.com/NCATSTranslator/ReasonerAPI/blob/2.0-specification-update/TranslatorReasonerAPI.yaml#L1763:L1775) The role played by the information resource in serving as a source for an Edge. Note that a given Edge should have one and only one 'primary_knowledge_source' source, and may have any number of 'aggregator_knowledge_source' or 'supporting_data_source' sources. This enumeration is found in Biolink Model, but is repeated here for convenience. `string`