Ontology Design Considerations for SNOMED CT and other OWL ontologies
This post was inspired by some discussions on the IHTSDO discussion groups about how to model substance hierarchies in SNOMED CT. I think there were two principal points of view: those who’d like to use universal restrictions (ONLY) everywhere, and those who’d rather use existential restrictions (SOME) mostly and keep universal restrictions for the cases where they are indicated. In fact, one of the people in the discussion removed all existential restrictions to control/ensure the ‘desired’ inference! So, what ontology design considerations are relevant for SNOMED CT (or any other ontology, for that matter)?
I think both camps were right in trying to ‘bend’/‘apply’ logic to the ‘problem’. However, in this case I am more inclined to agree in principle with the camp that says we should use existential restrictions (SOME) mostly and use universals in specific cases. The simple reason is that existential restrictions (SOME) serve the more ‘generic’ use cases well, whereas using universal restrictions (ONLY) everywhere makes things more brittle and also unwieldy further down the road (when the ontology is no longer ‘small’ or a ‘toy’).
Having designed formal ontologies (with thousands of concepts) in the past, I’d like to offer the following considerations/principles that might help with this ‘problem’. There are many better-informed people in the community, so I am sure they’ll chip in too. Since this is a long post, here is a summary of the points below:
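To make the SOME/ONLY distinction concrete, here is a minimal Python sketch that checks both restrictions against a fixed toy model. The function names and the ingredient data are made up for illustration, and the check is closed-world (it only looks at the fillers listed), whereas an OWL reasoner works under the open-world assumption, so treat it as an aid to intuition rather than as reasoner behaviour.

```python
def satisfies_some(fillers, cls):
    """SOME: at least one filler of the property belongs to cls."""
    return any(f in cls for f in fillers)


def satisfies_only(fillers, cls):
    """ONLY: every filler belongs to cls (vacuously true when there are none)."""
    return all(f in cls for f in fillers)


# Illustrative class extension and products (hypothetical data).
PARACETAMOL = {"paracetamol"}

plain_tablet = ["paracetamol"]
combination = ["paracetamol", "caffeine"]
no_ingredients = []

for name, fillers in [
    ("plain_tablet", plain_tablet),
    ("combination", combination),
    ("no_ingredients", no_ingredients),
]:
    print(name, satisfies_some(fillers, PARACETAMOL), satisfies_only(fillers, PARACETAMOL))
```

Note the last case: something with no ingredients at all satisfies ONLY but not SOME, which is one reason why swapping every SOME for an ONLY changes what gets classified where.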
1. Try using modular ontologies to separate concerns/use-cases.
2. Complexity in a real ontology is different from what it looks like on paper.
3. Not all knowledge for inferences needs to come from an OWL ontology or Description Logic (DL).
1: Identify & Design to Use Cases
The use cases for an ontology need to be very clearly understood, and in some cases they serve as a way of restricting scope creep. For most of what it was designed for, SNOMED CT (and its constituent hierarchies) works quite well with the use of existential restrictions (SOME). So it is not wise to ‘remove’ all the existential restrictions to fit one particular use case. Using covering axioms (SOME & ONLY) all over the place might have the same effect. Apart from complexity, the other consideration here is the influence of all these covering axioms on the time needed to classify the ontology. For largish ontologies this is non-trivial, and we know that for some modelling (and logical) constructs it is at least EXPTIME, if not worse. Also see #2 below for related debugging issues.
From experience (and not just mine), I think it is often easier to create layers of ontological knowledge (modules), the core of which holds the more generic knowledge (and satisfies the generic use cases), and then add further layers of more specific/restrictive knowledge (to satisfy the more specific use cases). That way, we can choose to ‘load/reason over’ the right level of knowledge for a given use case, without having the ‘brittleness’ spread all over the place. So one approach might be to ‘move’ all these covering axioms into a separate ontology module. Also see #3 below on why not all knowledge belongs in the SNOMED CT space. Having used this approach in the past, I can tell you that it is an elegant design architecture and a useful notion for SNOMED CT.
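The layering idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the Axiom and Module types, the load function, and the concept names are all hypothetical, and in a real OWL setting the same effect would normally come from separate ontology files linked by imports.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Axiom:
    subject: str
    restriction: str  # "SOME" or "ONLY"
    prop: str
    filler: str


@dataclass
class Module:
    name: str
    axioms: list
    imports: list = field(default_factory=list)


def load(module):
    """Collect the axioms of a module plus everything it imports (imports first)."""
    seen, axioms = set(), []

    def visit(m):
        if m.name in seen:
            return
        seen.add(m.name)
        for imported in m.imports:
            visit(imported)
        axioms.extend(m.axioms)

    visit(module)
    return axioms


# Core layer: generic, existential-only knowledge (hypothetical concept names).
core = Module("core", [
    Axiom("Paracetamol_tablet", "SOME", "has_active_ingredient", "Paracetamol"),
])

# Stricter layer: adds the universal restriction and imports the core.
strict = Module("strict", [
    Axiom("Paracetamol_tablet", "ONLY", "has_active_ingredient", "Paracetamol"),
], imports=[core])

generic_view = load(core)     # what a generic use case would reason over
specific_view = load(strict)  # core plus the restrictive layer
```

A generic use case loads only the core and never sees the ONLY axioms, while a use case that needs the tighter inferences loads the stricter module and gets both layers.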
2: Avoid Over-Engineering
It is often tempting to use all the clever bells and whistles (or the seemingly obvious feature) to satisfy a perceived use case. However, ontologies are actually quite a bit like software code, only a lot more complex, with many chances for things to break/misbehave. In fact, unlike regular software code, ontologies have very few debugging tools. Yet ontology designers are likely to make the same mistakes that normal software developers make:
- They believe that their code/ontology is clean/logically consistent.
- They believe that the code/ontology will always compile correctly/stay logically consistent.
- They believe that they will always remember/know why a feature/design decision exists in the code/ontology.
The truth is that very often people end up debugging their code/ontology. Debugging OWL ontologies is somewhat of a ‘black art’, and universal restrictions make it more ‘interesting’ to debug. That said, there are very good reasons to use universal restrictions, so at Manchester University (arguably one of the birthplaces of Protege-OWL) we were always taught to ‘use them carefully’. They quite often lead to insidious errors when the ontology grows (to thousands of classes). While SNOMED CT at this moment does not use all of the expressivity of Description Logic (DL), there is nothing stopping us from using more features to do clever things. Remember, the more ‘clever’ use cases you try to fit into the same ontology, the harder it is likely to be to debug.
An often quoted example involves domains and ranges on object properties. For a given object property has_Topping, it seems logical to specify that the domain is Pizza_Class and the range is Topping_Class. This is correct most of the time, so it works for all types of pizzas (Pizza_Class objects). If I now state that an Ice_Cream_Class has_Topping Chocolate, then this assertion might or might not be flagged as an error. Further down the line you might notice that, due to other inferences, your ontology tells you that an ice cream can be cooked in an oven, which might leave you scratching your head! In fact, stating that an ice cream has a topping leads to it being inferred to be a type of Pizza_Class, because the domain of the has_Topping property was (correctly, at the time) set to Pizza_Class. Of course, when you have a few thousand classes in your ontology, tracking down this innocuous domain restriction as the cause of an inconsistent ontology can be pretty fun, especially if you haven’t touched the Pizza section of the ontology in a few months.
This is just one example of the somewhat curious debugging issues that can arise when people are tempted to use clever things just because the tool/logic allows it.
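The has_Topping surprise can be reproduced with a tiny Python sketch of how a domain axiom behaves: it does not reject the ice cream assertion, it adds an inferred type. The helper names and the my_sundae individual are invented for illustration, and the disjointness between Pizza_Class and Ice_Cream_Class is an assumption added here so that the clash becomes visible.

```python
# Domain axiom from the example: anything that has_Topping is a Pizza_Class.
DOMAIN = {"has_Topping": "Pizza_Class"}

# Assumption for this sketch: pizzas and ice creams are declared disjoint.
DISJOINT = {frozenset({"Pizza_Class", "Ice_Cream_Class"})}


def infer_types(asserted_types, assertions):
    """Add the domain class to every subject that uses a property with a domain."""
    inferred = {ind: set(types) for ind, types in asserted_types.items()}
    for subject, prop, _obj in assertions:
        if prop in DOMAIN:
            inferred.setdefault(subject, set()).add(DOMAIN[prop])
    return inferred


def clashes(inferred):
    """Individuals whose inferred types include both members of a disjoint pair."""
    return sorted(
        ind for ind, types in inferred.items()
        if any(pair <= types for pair in DISJOINT)
    )


asserted = {"my_sundae": {"Ice_Cream_Class"}}           # hypothetical individual
facts = [("my_sundae", "has_Topping", "Chocolate")]

inferred = infer_types(asserted, facts)
print(inferred["my_sundae"])   # now contains Pizza_Class as well
print(clashes(inferred))
```

Without the disjointness declaration nothing would clash at all: the sundae would simply be a pizza too, and would inherit whatever is said about pizzas, which is how the ‘cooked in an oven’ inference sneaks in.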
3: Understand the nature of the Knowledge
While it is tempting to see ‘description logic’ as the super-duper answer to all the shortcomings of other approaches/formalisms, there are cases where naively using a subsumption-type logic might have unintended/undesirable side effects. An often quoted example is that Digoxin is often used for cardiac arrhythmias (atrial node), and since Wolff-Parkinson-White syndrome (WPW) is a type of arrhythmia, subsumption-type logic could be used to infer that Digoxin can be used to treat WPW. However, in the real world Digoxin is contra-indicated in WPW (and could quite possibly kill a patient with WPW). Ontologies are really about ‘intensional’ knowledge, but there is also a need for a different type of ‘extensional’ knowledge, for which the description logic formalism might not be the most appropriate solution (especially if it is used naively). This is perhaps one reason why there are many knowledge modelling formalisms around, and also why classifications exist alongside terminologies!
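The Digoxin example can be sketched in Python as follows. Everything here is toy data taken from the paragraph above; the table names and the idea of keeping contra-indications in a separate lookup outside the subsumption hierarchy are assumptions made for the sketch, not a description of how SNOMED CT or any drug knowledge base stores them.

```python
# Toy subsumption hierarchy and treatment knowledge (illustrative only).
IS_A = {"Wolff-Parkinson-White syndrome": "Cardiac arrhythmia"}
TREATED_BY = {"Cardiac arrhythmia": {"Digoxin"}}

# Hypothetical separate layer of knowledge held outside the hierarchy.
CONTRAINDICATED = {("Digoxin", "Wolff-Parkinson-White syndrome")}


def ancestors(concept):
    """Yield every parent reachable by following IS_A upwards."""
    while concept in IS_A:
        concept = IS_A[concept]
        yield concept


def naive_treatments(concept):
    """Inherit treatments from all ancestors, as naive subsumption would."""
    result = set(TREATED_BY.get(concept, set()))
    for parent in ancestors(concept):
        result |= TREATED_BY.get(parent, set())
    return result


def checked_treatments(concept):
    """Same inheritance, then drop drugs contra-indicated for the concept or its ancestors."""
    scope = {concept, *ancestors(concept)}
    return {
        drug for drug in naive_treatments(concept)
        if not any((drug, c) in CONTRAINDICATED for c in scope)
    }


wpw = "Wolff-Parkinson-White syndrome"
print(naive_treatments(wpw))    # inherits Digoxin from the parent concept
print(checked_treatments(wpw))  # Digoxin removed by the separate lookup
```

The point of the sketch is that the exception lives outside the is-a hierarchy: the subsumption step alone has no way of knowing about it.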
I hope this rant helps… In summary:
- Try using modular ontologies to separate concerns/use-cases.
- Complexity in a real ontology is different from what it looks like on paper.
- Not all knowledge for inferences needs to come from a DL ontology.