NCATSTranslator · colleenXu · May 14, 2026
diff --git a/ImplementationGuidance/Specifications/knowledge_level_agent_type_specification.md b/ImplementationGuidance/Specifications/knowledge_level_agent_type_specification.md
@@ -10,56 +10,59 @@
 
 These properties are complemented by a larger model that is being developed to support a more detailed representation of evidence and provenance metadata, and will be documented elsewhere.
 
-The scope of this initial specification covers only **Agent Type** and **Knowledge Level** metadata - which describes the type of knowledge expressed in an edge based on type of agent that originally generated a statement of knowledge encoded in an Edge, and the level of knowledge expressed in an Edge. It is complemented by a [Supplemental Guidance document](https://docs.google.com/document/d/140dtM5CjWM97JiBRdAmDT-9IKqHoOj-xbE_5TWkdYqg/edit) that provides detailed examples implementation support.
+The scope of this initial specification covers only **Agent Type** and **Knowledge Level** metadata, which describe the type of agent that originally generated the statement of knowledge encoded in an Edge, and the level of knowledge that the Edge expresses. It is complemented by a [Supplemental Guidance document](https://docs.google.com/document/d/140dtM5CjWM97JiBRdAmDT-9IKqHoOj-xbE_5TWkdYqg/edit) that provides detailed examples and implementation support.
 
 ## Biolink properties and enumerations
-Biolink defines hte following properties and enumerations for classifying agent type and knowledge level, which are used to annotate individual edges in knowledge graphs. 
+Biolink defines the following properties and enumerations for classifying agent type and knowledge level, which are used to annotate individual edges in knowledge graphs. 
 
 ### Agent Type
 **Biolink Edge Property**:
-   - `agent_type`: describes the high-level category of agent that originally generated a statement of knowledge or other type of information. Permissible values are provided by the **biolink:AgentTypeEnum** enumeration, as defined below:
+   - `agent_type`: Describes the high-level category of agent who originally generated a statement of knowledge or other type of information. Permissible values are provided by the **biolink:AgentTypeEnum** enumeration, as defined below:
 
 **Permissible Values**:
 
-   - `manual_agent`: a human agent is responsible for generating the knowledge expressed in the Edge.  The human may utilize computationally generated information as evidence for the resulting knowledge, but the human is the one who ultimately generates this knowledge.
-    - `automated_agent`: an automated agent, typically a software program or tool, is responsible for generating the knowledge expressed in the Edge. Human contribution to the knowledge creation process  ends with the definition and coding of algorithms or analysis pipelines that get  executed by the automated agent.
-        - `data_analysis_pipeline`:  an automated agent that executes an analysis workflow over data and reports results in an Edge. These typically  report statistical associations/correlations between variables in the input data.
-        - `computational_model`: an automated agent that generates knowledge (typically predictions) based on rules/logic explicitly encoded in an algorithm (e.g. heuristic models, supervised classifiers), or  learned from patterns observed in data (e.g. ML models, unsupervised classifiers).
-        - `text_mining_agent`:  an automated agent that uses Natural Language Processing to recognize concepts and/or relationships in text, and generates Edges relating these concepts with formally encoded semantics.
-        - `image_processing_agent`: an automated agent that processes images to generate Edges reporting knowledge  derived from the image and/or expressed in text the image depicts (e.g. via OCR).
-   - ` manual_validation_of_ automated_agent`: a human agent reviews and validates/approves the veracity of knowledge that is initially generated by an automated agent.
-   - `not_provided`:  the agent type is not provided, typically because it cannot be determined from available information if the agent that generated the knowledge is manual or automated.
+   - `manual_agent`: A human agent who is responsible for generating a statement of knowledge. The human may utilize computationally generated information as evidence for the resulting knowledge, but the human is the one who ultimately interprets/reasons with this evidence to produce a statement of knowledge.
+   - `automated_agent`: An automated agent, typically a software program or tool, that is responsible for generating a statement of knowledge. Human contribution to the knowledge creation process ends with the definition and coding of algorithms or analysis pipelines that get executed by the automated agent.
+        - `data_analysis_pipeline`: An automated agent that executes an analysis workflow over data and reports the direct results of the analysis. These typically report statistical associations/correlations between variables in the input dataset, and do not interpret/infer broader conclusions from associations the analysis reveals in the data.
+        - `computational_model`: An automated agent that generates knowledge statements (typically predictions) based on rules/logic explicitly encoded in an algorithm (e.g. heuristic models, supervised classifiers), or learned from patterns observed in data (e.g. ML models, unsupervised classifiers).
+        - `text_mining_agent`: An automated agent that uses Natural Language Processing to recognize concepts and/or relationships in text, and reports them using formally encoded semantics (e.g. as an edge in a knowledge graph).
+        - `image_processing_agent`: An automated agent that processes images to generate textual statements of knowledge derived from the image and/or expressed in text the image depicts (e.g. via OCR).
+   - `manual_validation_of_automated_agent`: A human agent who reviews and validates/approves the veracity of knowledge that is initially generated by an automated agent.
+   - `not_provided`: The agent type is not provided, typically because it cannot be determined from available information whether the agent that generated the knowledge is manual or automated.
 
-Note that this property indicates the type of agent who *produced a final statement of knowledge*, which is often different from the agent or agents who produced *information used as evidence* to support the generation of this knowledge. For example, if a human curator concludes that a particular gene variant causes a medical condition - based on their interpretation of information produced by computational modeling tools, automated statistical analyses, and robotic laboratory assay systems - the agent type for this statement is "manual_agent" despite all of the evidence being created by automated agents. But if any of these systems is programmed to generate knowledge statements directly and without human assistance, the statement would be attributed to an "automated_agent".
+Note that this property indicates the type of agent who *produced a final statement of knowledge*, which is often different from the agent or agents who produced *information used as evidence* to support generation of this knowledge. For example, if a human curator concludes that a particular gene variant causes a medical condition - based on their interpretation of information produced by computational modeling tools, automated data analysis pipelines, and robotic laboratory assay systems - the agent_type for this statement is "manual_agent" - despite all of the evidence being created by automated agents. But if any of these systems is programmed to generate knowledge statements directly and without human assistance, the statement would be attributed to an "automated_agent".
 
 ### Knowledge Level
 **Biolink Edge Property**: 
-- `knowledge_level`: describes the level of knowledge expressed in a statement, based on the reasoning or analysis methods used to generate the statement, or the scope or specificity of what the statement expresses to be true. Permissible values are defined in the **biolink:KnowledgeLevelEnum** enumeration:
+- `knowledge_level`: Describes the level of knowledge expressed in a statement, based on the reasoning or analysis methods used to generate the statement, or the scope or specificity of what the statement expresses to be true. Permissible values are defined in the **biolink:KnowledgeLevelEnum** enumeration:
 
 **Permissible Values**:
-   - `knowledge_assertion`: a statement of purported fact that is put forth by an agent as true, based on assessment of direct evidence. Assertions generally have a high confidence of being true based on the strength of evidence supporting them.
-   - `logical_entailment`: a statement reporting a conclusion that follows logically from premises, which are typically well-established facts or knowledge assertions. (e.g. fingernail part of finger, finger part of hand → fingernail part of hand)). Logical entailments are based on dedictive inference, and generally have a high degree of confidence when based on sound premises and inference logic. 
-   - `prediction`: a statement of a possible fact based on more probabilistic (non-deductive) forms of reasoning over indirect forms of evidence, that lead to more speculative conclusions. Predictions often have a lower degree of confidence based on the indirect nature of their evidence and reasoning supporting them.
-   - `statistical_association`: a statement that reports concepts representing variables in a dataset to be statistically associated in the context of a particular cohort or dataset (e.g. “Metformin Treatment (variable 1) is correlated with Diabetes Diagnosis (variable 2) in EHR dataset X”). These associations are inherently true in that they simple report the results of some statistical analysis, but do not interpret these data to draw broader conclusions about general types in the domain of discourse.
-   - `observation`: a statement reporting (and possibly quantifying) a phenomenon that was observed to occur - absent any analysis or interpretation that generates a statistical association or supports a broader conclusion or inference. Observation statements are also inherently true in that they simple report what an agent observed - without any interpretation or inference. 
-   - `not_provided`: the knowledge level/type fora  statement is not provided, typically because it cannot be determined from available information.
+   - `knowledge_assertion`: A statement of purported fact that is put forth by an agent as true, based on assessment of direct evidence. Assertions are likely but not definitively true.
+   - `logical_entailment`: A statement reporting a conclusion that follows logically from premises representing established facts or knowledge assertions (e.g. fingernail part of finger, finger part of hand --> fingernail part of hand). 
+   - `prediction`: A statement of a possible fact based on probabilistic forms of reasoning over more indirect forms of evidence, which lead to more speculative conclusions.
+   - `statistical_association`: A statement that reports concepts representing variables in a dataset to be statistically associated with each other in a particular cohort (e.g. 'Metformin Treatment (variable 1) is correlated with Diabetes Diagnosis (variable 2) in EHR dataset X').
+   - `text_co_occurrence`: A statement reporting that mentions of two concepts in some corpus of text (e.g. the biomedical literature) occur together at a statistically significant frequency - suggesting that a real-world biological or clinical relationship may exist between the concepts.
+   - `observation`: A statement reporting (and possibly quantifying) a phenomenon that was observed to occur - absent any analysis or interpretation that generates a statistical association or supports a broader conclusion or inference. 
+   - `not_provided`: The knowledge level is not provided, typically because it cannot be determined from available information.
 
-NOTE that the notion of a 'level' of knowledge can in one sense relate to the strength of a statement - i.e. how confident we are that it says something true about our domain of discourse. Here, we can generally consider Knowledge Assertions to be stronger than Entailments to be stronger than Predictions. But in another sense, 'level' of knowledge can refer to the scope or specificity of what a statement expresses -  on a spectrum from context-specific results of a data analysis, to generalized assertions of knowledge or fact. Here, Statistical Associations and Observations represent more foundational statements that are only slightly removed from the data on which they are based (the former reporting the direct results of an analysis in terms of correlations between variables in the data, and the latter describing phenomena that were observed/reported to have occurred).
+NOTE that the notion of a 'level' of knowledge can in one sense relate to the strength of a statement - i.e. how confident we are that it says something true about our domain of discourse. Here, we can generally consider Assertions to be stronger than Entailments, which are in turn stronger than Predictions. But in another sense, 'level' of knowledge can refer to the scope or specificity of what a statement expresses - on a spectrum from context-specific results of a data analysis to generalized assertions of knowledge or fact. Here, Statistical Associations and Observations represent more foundational statements that are only slightly removed from the data on which they are based (the former reporting the direct results of an analysis in terms of correlations between variables in the data, and the latter describing phenomena that were observed/reported to have occurred).
 
 ## Implementation Guidance
 
 1. Knowledge Providers MUST report one and only one `agent type` on each Edge they return in a TRAPI message, using an `agent_type` property. The value of this property MUST come from the [biolink:AgentTypeEnum](https://biolink.github.io/biolink-model/AgentTypeEnum/).
 
 2. Knowledge Providers MUST report one and only one `knowledge level` for each Edge returned in a TRAPI message, using a `knowledge_level` property. The value MUST come from the [biolink:KnowledgeLevelEnum](https://biolink.github.io/biolink-model/KnowledgeLevelEnum/).
 
+```json
        "edge_id": {
            "subject": "PUBCHEM.COMPOUND:6623",
            "predicate": "biolink:affects",
            "object": "HGNC:3467",
-           "sources": { . . . }
+           "sources": { . . . },
            "knowledge_level": "knowledge_assertion",
            "agent_type": "manual_agent"
        }
+```
 
 3. The main challenge in applying this standard concerns selecting appropriate agent type and knowledge level terms for a given Edge. To assist KPs in this task, a [Supplemental Guidance document](https://docs.google.com/document/d/140dtM5CjWM97JiBRdAmDT-9IKqHoOj-xbE_5TWkdYqg/edit) provides additional implementation support beyond the base specification above. This includes clarification of key distinctions, tips for proper term selection, and a corpus of examples illustrating how agent type and knowledge level terms are applied to the diverse kinds of Edges provided in Translator knowledge graphs.
 
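Implementation Guidance items 1 and 2 amount to a per-edge check: each Edge carries exactly one `agent_type` and exactly one `knowledge_level`, each drawn from its Biolink enumeration. A minimal Python sketch of such a check follows, assuming the flat property layout of the example edge above (both values as plain strings directly on the edge object). The value sets are copied from the enumerations listed in this revision of the spec rather than loaded from the Biolink Model, and the function name `check_edge` is hypothetical.

```python
from typing import Any, Dict, List

# Permissible values copied from the enumerations in this revision of the spec.
# The authoritative lists are biolink:AgentTypeEnum and biolink:KnowledgeLevelEnum
# in the Biolink Model, which may change between releases.
AGENT_TYPES = frozenset({
    "manual_agent",
    "automated_agent",
    "data_analysis_pipeline",
    "computational_model",
    "text_mining_agent",
    "image_processing_agent",
    "manual_validation_of_automated_agent",
    "not_provided",
})

KNOWLEDGE_LEVELS = frozenset({
    "knowledge_assertion",
    "logical_entailment",
    "prediction",
    "statistical_association",
    "text_co_occurrence",
    "observation",
    "not_provided",
})


def check_edge(edge: Dict[str, Any]) -> List[str]:
    """Return every problem found with an edge's agent_type and knowledge_level.

    Hypothetical helper. Assumes both values sit directly on the edge object
    as plain strings, as in the example edge; an empty list means the edge passes.
    """
    problems: List[str] = []
    for prop, allowed in (("agent_type", AGENT_TYPES),
                          ("knowledge_level", KNOWLEDGE_LEVELS)):
        value = edge.get(prop)
        if value is None:
            problems.append(f"missing {prop}")
        elif not isinstance(value, str):
            # A list (or any other non-string) would mean more than one value.
            problems.append(f"{prop} must be a single string, got {type(value).__name__}")
        elif value not in allowed:
            problems.append(f"{prop} has non-permissible value {value!r}")
    return problems


if __name__ == "__main__":
    edge = {
        "subject": "PUBCHEM.COMPOUND:6623",
        "predicate": "biolink:affects",
        "object": "HGNC:3467",
        "knowledge_level": "knowledge_assertion",
        "agent_type": "manual_agent",
    }
    print(check_edge(edge))  # prints []
```

Collecting problems into a list rather than raising on the first one lets a KP report every violation on an edge in a single pass.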
diff --git a/ImplementationGuidance/Specifications/pathfinder_query_specification.md b/ImplementationGuidance/Specifications/pathfinder_query_specification.md
@@ -148,7 +148,7 @@ Paths are represented within AuxiliaryGraphs. Each Path references edges from th
 AuxiliaryGraph's `edges` field. These Edges are unordered; their sequence does not convey Path structure. Instead, the
 Path must be reconstructed from the graph itself.
 
-Paths are expected to be linear—there should be no way to skip or bypass any node in the Path. However, parallel edges
+Paths are expected to be linear — there should be no way to skip or bypass any node in the Path. However, parallel edges
 between two nodes are allowed.
 
 Using the Knowledge Graph above, we can construct the AuxiliaryGraphs shown in the example below:
@@ -185,7 +185,7 @@ Path `a2` represents a direct lookup edge between Crohn’s and Parkinson’s. I
 input nodes, it should be included as a valid path.
 
 The Paths shown correspond to the unconstrained version of the initial query. Applying `intermediate_categories` 
-constraints would yield a slightly different set of paths. As show below in the constrained version of the 
+constraints would yield a slightly different set of paths. As shown below in the constrained version of the 
 AuxiliaryGraphs, `a2` is  removed because it does not contain any `Gene` nodes. Therefore, it is not a valid Path 
 for the constrained version of the query.
 
@@ -212,7 +212,7 @@ for the constrained version of the query.
 ## Results
 
 Each individual Result is structured similarly, with NodeBindings and Analyses. Both input
-nodes must be pinned, and therefor no unpinned nodes will exist in the QueryGraph.  This means there is only one result
+nodes must be pinned, and therefore no unpinned nodes will exist in the QueryGraph.  This means there is only one result
 for each query, contained within the Results field of the Message. This Result can have many Analyses, each one
 corresponding to a different Path. This follows the same Result-merging rules used in other query types, where results
 that contain the same nodes but different analyses are combined into a single result, with their analyses concatenated.
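
The linearity rule for Paths (no node can be skipped or bypassed, parallel edges allowed) can be checked while reconstructing the Path order from an AuxiliaryGraph's unordered edges. A minimal Python sketch follows, assuming each edge has already been reduced to a hypothetical `(subject, object)` pair of node identifiers and that edge direction does not matter for Path structure. The function name `reconstruct_path` and the input shape are illustrative only, not part of TRAPI.

```python
from collections import defaultdict
from typing import Iterable, List, Optional, Tuple


def reconstruct_path(edges: Iterable[Tuple[str, str]],
                     start: str, end: str) -> Optional[List[str]]:
    """Order the nodes of a set of unordered edges into a linear path.

    Direction is ignored and parallel edges between the same two nodes are
    collapsed. Returns the node sequence from start to end, or None if the
    edges do not form one linear path between the two pinned nodes.
    """
    neighbors = defaultdict(set)
    for subj, obj in edges:
        if subj == obj:
            return None  # a self-loop can never be part of a linear path
        neighbors[subj].add(obj)
        neighbors[obj].add(subj)

    if start not in neighbors or end not in neighbors:
        return None
    # Endpoints touch exactly one distinct neighbor, intermediates exactly two.
    # Any extra neighbor means a branch or a shortcut that bypasses a node.
    for node, adjacent in neighbors.items():
        expected = 1 if node in (start, end) else 2
        if len(adjacent) != expected:
            return None

    # Walk from start, always stepping to the neighbor we did not come from.
    path = [start]
    previous, current = None, start
    while current != end:
        candidates = [n for n in neighbors[current] if n != previous]
        if not candidates:  # defensive; the degree checks rule this out
            return None
        previous, current = current, candidates[0]
        path.append(current)

    # Nodes never reached by the walk (e.g. a detached cycle) are rejected.
    if len(path) != len(neighbors):
        return None
    return path


if __name__ == "__main__":
    # Hypothetical node ids; the two identical pairs stand for parallel edges.
    edges = [("geneA", "parkinsons"), ("crohns", "geneA"), ("crohns", "geneA")]
    print(reconstruct_path(edges, "crohns", "parkinsons"))
    # prints ['crohns', 'geneA', 'parkinsons']
```

Degree checks on the collapsed graph catch branches and shortcuts, while the final length check catches components that are disconnected from the walk.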