Summary
During our migration of OntoMathPRO v8 to a Neo4j graph database, we identified 32 data quality issues:
- 23 cyclic rdfs:subClassOf relationships (causing circular hierarchies)
- 9 orphan nodes (missing parent relationships)
Impact: These issues prevent the class hierarchy from being loaded directly as a DAG (directed acyclic graph), which hierarchy traversal in graph databases and ontology reasoning systems relies on.
Issue #1: Cyclic rdfs:subClassOf Relationships (23 cycles)
Severity
HIGH - Prevents proper hierarchy traversal and reasoning
Description
Cycles occur when rdfs:subClassOf relationships form circular paths, violating the DAG property required for proper ontology hierarchies.
Example cycle:
- A rdfs:subClassOf B
- B rdfs:subClassOf C
- C rdfs:subClassOf A ← Cycle!
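As a rough illustration of how such a cycle can be detected outside Neo4j, here is a minimal sketch of a depth-first search over (child, parent) pairs. The function name `find_cycle` and the edge-list input format are assumptions made for this example; it is not the script used for the analysis in this report.

```python
def find_cycle(edges):
    """Return one cycle as a list of nodes (first node repeated at the end), or None.

    `edges` is an iterable of (child, parent) pairs, one per rdfs:subClassOf triple.
    """
    graph = {}
    for child, parent in edges:
        graph.setdefault(child, []).append(parent)

    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the current path, finished
    color = {}
    stack = []                     # nodes on the current DFS path

    def visit(node):
        color[node] = GREY
        stack.append(node)
        for parent in graph.get(node, []):
            state = color.get(parent, WHITE)
            if state == GREY:
                # Reached a node that is already on the current path: a cycle.
                return stack[stack.index(parent):] + [parent]
            if state == WHITE:
                found = visit(parent)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


example = [("A", "B"), ("B", "C"), ("C", "A")]
print(find_cycle(example))
```

The sketch is recursive, so a very deep hierarchy could need an iterative version or a higher recursion limit.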
Detected Major Cycles
- E34 Cycle (Length: 5)
  - Path: E34 → E1660 → E4830 → E5122 → E6214 → E34
  - Concepts: Mathematical knowledge object chain
- E2844 Cycle (Length: 3)
  - Path: E2844 → E1660 → E34 → E2844
  - Concepts: Element of mathematical analysis chain
- Matrix Cycle (Length: 4)
  - Path: MatrixOperation → SquareMatrix → DiagonalMatrix → MatrixOperation
Plus 20 more cycles (mostly 2-node bidirectional relationships)
Reproduction Steps
Using Protégé:
- Open `ontomathpro_v8.owl` in Protégé
- In the "Reasoner" menu, select "HermiT"
- Run "Start Reasoner"
- Navigate to E34 class
- Expand "SubClass Of" hierarchy
- Observe circular reference
Using Neo4j Cypher (after import):
```cypher
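// Find every path that starts and ends at the same node.
// Each cycle is matched once per starting node, so expect rotated duplicates.
MATCH path = (n:ObjectType)-[:GENERALIZES*]->(n)
RETURN [node IN nodes(path) | node.name] AS cycle_path,
       length(path) AS cycle_length
ORDER BY cycle_length DESC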
```
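Because the pattern matches a cycle once for every node on it, the query returns the same cycle several times with different starting points. A minimal sketch of collapsing those rotations on the client side follows; `canonical_cycle` and `unique_cycles` are hypothetical helper names, and the input is assumed to be the `cycle_path` lists from the query, each with the start node repeated at the end.

```python
def canonical_cycle(path):
    """Rotate a closed path so it starts at its smallest node name."""
    # Drop the repeated end node if the path is closed (first == last).
    nodes = list(path[:-1]) if len(path) > 1 and path[0] == path[-1] else list(path)
    start = nodes.index(min(nodes))
    return tuple(nodes[start:] + nodes[:start])


def unique_cycles(paths):
    """Return each distinct cycle once, in first-seen order."""
    seen = {}
    for path in paths:
        seen.setdefault(canonical_cycle(path), None)
    return list(seen)


rows = [
    ["A", "B", "C", "A"],
    ["B", "C", "A", "B"],
    ["C", "A", "B", "C"],
]
print(unique_cycles(rows))
```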
Recommended Fix
For the E34 cycle specifically, we recommend removing the E34 → E1660 relationship.
Rationale:
- "Mathematical knowledge object" (E34) should NOT be a subclass of "Value" (E1660)
- Counter-examples: Theorem, Operator, and Formula are mathematical knowledge objects but not values
- Keeping E1660 → E34 (Value is-a Mathematical knowledge object) is semantically correct
General approach:
- Analyze semantic correctness of each rdfs:subClassOf in the cycle
- Remove the weakest relationship (least semantically justified)
- Re-validate hierarchy
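The remove-and-re-validate loop above can be sketched with Python's standard `graphlib` module, whose `TopologicalSorter.prepare()` raises `CycleError` when the graph is cyclic. The helper name `is_acyclic` is an assumption for this example, and the edge list contains only the edges that appear in the E34 and E2844 paths listed in this report, not the full ontology.

```python
from graphlib import CycleError, TopologicalSorter


def is_acyclic(edges):
    """True if the (child, parent) edges form a DAG."""
    graph = {}
    for child, parent in edges:
        graph.setdefault(child, set()).add(parent)
    try:
        TopologicalSorter(graph).prepare()  # raises CycleError on any cycle
        return True
    except CycleError:
        return False


# Edges taken from the two reported paths only (child, parent).
reported_edges = [
    ("E34", "E1660"), ("E1660", "E4830"), ("E4830", "E5122"),
    ("E5122", "E6214"), ("E6214", "E34"),                     # E34 cycle
    ("E2844", "E1660"), ("E1660", "E34"), ("E34", "E2844"),   # E2844 cycle
]

# Apply the proposed fix, then re-validate.
after_fix = [e for e in reported_edges if e != ("E34", "E1660")]
print(is_acyclic(reported_edges), is_acyclic(after_fix))
```

If the paths are exactly as listed, the check still fails after removing E34 → E1660, because E1660 → E34 also lies on the E2844 cycle. That is the reason for re-validating after each removal rather than assuming one edge per cycle is enough.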
Issue #2: Orphan Nodes (9 nodes)
Severity
MEDIUM - Reduces hierarchy completeness
Description
9 nodes have no parent relationships due to encoding/naming mismatches in the OWL file.
Detected Orphans
Emden-Fowler Family (5 nodes):
- `Emden–FowlerEquation` (expected parent: E1897)
- `Emden–FowlerTypeEquation`
- `EmdenEquation`
- `Thomas–FermiEquation`
- `Euler–Poisson–DarbouxEquation`
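Root Cause: Encoding mismatch (en dash `–`, which appears as mojibake starting with `â` when its UTF-8 bytes are mis-decoded, vs ASCII hyphen `-`)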
ElementMatrices Family (4 nodes):
- `ElementMatriсesTheory` (Cyrillic 'с')
- `DeterminantMatrix`
- `MatrixOperation`
- `TraceMatrix`
Root Cause: Cyrillic character in parent name (`с` instead of `c`)
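Both root causes come down to non-ASCII characters that look like ASCII ones. A minimal sketch for flagging them in class names, using only the standard `unicodedata` module, is shown below; `suspicious_chars` is a hypothetical helper name, and the sample names are written with escape sequences so the offending characters are visible.

```python
import unicodedata


def suspicious_chars(name):
    """List (character, Unicode name) for every non-ASCII character in a name."""
    return [(ch, unicodedata.name(ch, "UNKNOWN")) for ch in name if ord(ch) > 127]


samples = [
    "Emden\u2013FowlerEquation",   # en dash between the surnames
    "ElementMatri\u0441esTheory",  # Cyrillic 'es' in place of Latin 'c'
    "DeterminantMatrix",           # plain ASCII, nothing to report
]
for name in samples:
    print(name, suspicious_chars(name))
```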
Recommended Fix
- Normalize encoding: Convert all en dashes (`–`, U+2013) in class names to regular hyphens (`-`)
- Fix Cyrillic characters: Replace Cyrillic 'с' with Latin 'c' in `ElementMatricesTheory`
- Add missing relationships:
```xml
<owl:Class rdf:about="EmdenEquation">
    <rdfs:subClassOf rdf:resource="Emden-FowlerEquation"/>
</owl:Class>
```
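The two normalization steps could be scripted roughly as follows. This is a sketch under assumptions: `normalize_name` and the replacement table are illustrative, the table covers only the characters mentioned in this report, and any rename would need to be applied consistently to every reference to the class.

```python
# Characters reported in this issue, mapped to their ASCII replacements.
REPLACEMENTS = {
    "\u2013": "-",  # en dash
    "\u2014": "-",  # em dash, included defensively
    "\u0441": "c",  # Cyrillic small letter es
}
_TABLE = str.maketrans(REPLACEMENTS)


def normalize_name(name):
    """Replace look-alike characters in a class name with ASCII equivalents."""
    return name.translate(_TABLE)


print(normalize_name("Emden\u2013FowlerEquation"))
print(normalize_name("ElementMatri\u0441esTheory"))
```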
Impact Analysis
Current State
- Total Classes: 4,052
- With cycles: 23 classes affected
- Orphaned: 9 classes
- Effective completeness: ~99.2%
Consequences
- ❌ Cannot be used in Neo4j without manual fixes
- ❌ Reasoners may produce incorrect inferences
- ❌ Hierarchy visualization tools fail
- ❌ SPARQL queries return incomplete results
Full Details
For complete analysis including all 23 cycles, reproduction scripts, and detailed recommendations, see our full report:
Repository: [Our internal analysis repository]
Report File: `palantir/docs/ontomathpro_issues_report.md`
Environment
- OntoMathPRO Version: v8 (`ontomathpro_v8.owl`)
- Detection Method: Neo4j graph database migration + Python OWL parsing
- Analysis Date: 2025-11-08
- Reporter: Math Ontology Migration Team
We're happy to provide additional details or collaborate on fixes. Thank you for maintaining this valuable resource!