From 0e0aa9a6b8e81f513d50ad7db639aba478a2313e Mon Sep 17 00:00:00 2001 From: Inari Listenmaa Date: Thu, 18 Jun 2020 13:53:14 +0200 Subject: [PATCH] More on ontologies + wordnet --- legal_ontology.md | 2 +- ontology.md | 77 +++++++++++++++++++++++++++++++++++++++++++++-- sumo.md | 29 ++++++++++++++++++ wordnet.md | 24 +++++++++++++++ 4 files changed, 128 insertions(+), 4 deletions(-) create mode 100644 sumo.md create mode 100644 wordnet.md diff --git a/legal_ontology.md b/legal_ontology.md index e5508e6..0360403 100644 --- a/legal_ontology.md +++ b/legal_ontology.md @@ -12,7 +12,7 @@ Say that you write a law that says > You have the obligation to pay taxes. -Maybe you link your law to [WordNet](https://wordnet.princeton.edu/) and make sure you have the right sense of "you", "have", "obligation", "pay" and "taxes". Now you can translate it into any other language that has a [linked WordNet](https://github.com/GrammaticalFramework/gf-wordnet#readme) with the same identifiers. +Maybe you link your law to WordNet and make sure you have the right sense of "you", "have", "obligation", "pay" and "taxes". Now you can translate it into any other language that has a [linked WordNet](https://github.com/GrammaticalFramework/gf-wordnet#readme) with the same identifiers. Now you know that _obligation_ in the sense `06785951` 'a legal agreement specifying a payment or action and the penalty for failure to comply' is translated into Bulgarian as _обвързаност_. diff --git a/ontology.md b/ontology.md index 03df2cf..852034a 100644 --- a/ontology.md +++ b/ontology.md @@ -6,8 +6,79 @@ date: "2020-06-17" Collection of concepts and their relationships, in a machine-readable format. -Kind of like this zettelkasten: here we link concepts to each other. In addition, the _links_ themselves may have types and be the subject or object in another relation, linked by something that itself has a type and can be a subject/object of yet another relation, and so on. +SUMO is a big, well-known general ontology. 
-Contrast with a web of free text, this kind of structure allows for e.g. automated question answering. +### Taxonomy -(That is just one kind of ontology, there are other designs. I don't know this area very well. But other ontologies seem to contain _axioms_ or _facts_.) +The basic building block of an ontology is a hierarchy of concepts. Higher nodes represent general concepts, lower nodes more specific. Example: + + Entity + / | \ + Biology … Geography + / | | \ … … / | | \ + … … … … … … … … … … … … … + | \ + Neurology Europe + / | | \ / | \ + … … … … … … … … … … … … … … … + | | + Optic nerve Heathrow airport + + +Just a tree of arbitrary labels isn't particularly useful. That's why ontologies may have some of the following features. + +### Relationships + +So far we've seen only taxonomy membership in a strict tree structure. +* Spice Girls `is-a` Band +* Wannabe `is-a` Song + +But we can also have other _relationships_. +* Spice Girls `perform` Wannabe +* Wannabe `year` 1996 + +Furthermore, the relation links themselves can be the subject or object in another relation. +* Perform `is-a` Action +* is-a `is-a` Relation + +See [RDF](https://en.wikipedia.org/wiki/Resource_Description_Framework), a data model based on such triples. + +### Axioms +TODO + +### Mapping to logic +TODO + +### Mapping to lexical resources + +[Niles and Pease (2003)](http://www.adampease.org/professional/Niles-IKE.pdf) map mid-level entries from SUMO to WordNet. WordNet itself has some relations like synonymy and hypernymy, and I'm not quite sure how they work together with the relations of an ontology (TODO: read the whole 2003 paper). + +Main point is that a concept in an ontology corresponds to one or more synonym sets in WordNet. 
Consider a corner of an ontology like the following: + + Entity + / \ + Abstract Physical + / | | \ / | \ + Attribute … … … … + / | \ + Measure + / \ + TimeMeasure LengthMeasure + + +The concept `Measure` is mapped to a number of WordNet synonym sets, such as +* _space_ 00014887 '3-dimensional expanse in which everything is located' +* _time_ 15160774 'the fourth coordinate that is required (along with three spatial dimensions) to specify a physical event' + +And `LengthMeasure` is mapped only to 00014887 _space_. + + +## Ontology extraction + +Quote from [Herbelot, 2011](https://web.archive.org/web/20130704143830/http://www.peerpress.de/discoursecpp.pdf) + +> [O]ntology extraction — a subfield of natural language processing which, put simply, specialises in producing lists. […] Well-loved ontology extraction tasks include the retrieval of Oscar nominees, chemical reactions and dead presidents. In this kind of research, the machine is asked, for instance, to produce a list of things that are ‘like lorries’ and is expected to duly return (given the current state of the art) +> +> `truck car motorcycle plane engine hamster.` +> +> Because lorries have wheels and hamsters have too. diff --git a/sumo.md b/sumo.md new file mode 100644 index 0000000..fb35fbf --- /dev/null +++ b/sumo.md @@ -0,0 +1,29 @@ +--- +date: "2020-06-18" +--- + +# SUMO + +SUMO (Suggested Upper Merged Ontology) has the approach of _domain_ ontologies and _merge_ ontologies. Example: + +* Top level +``` + Entity + / \ + Abstract Physical + / | | \ / | \ + Attribute … … Object … Process +``` +* Domain ontology +``` + AirportsFromAtoK + / | \ + + / / |  \ \ + Arlanda … … Heathrow … + +``` + +All entries in SUMO are part of a single tree, starting from `Entity`. A domain ontology about airports is linked to the top level ontology, in a distant subtree of physical entities. + +TODO: I don't know how the linking works in practice, or if the technical details have any relevance to the scope of CCLAW readings. 
([Niles, Pease (2001)](https://dl.acm.org/doi/pdf/10.1145/505168.505170) seems to describe the merging process, read later if interested.) diff --git a/wordnet.md b/wordnet.md new file mode 100644 index 0000000..51f9a05 --- /dev/null +++ b/wordnet.md @@ -0,0 +1,24 @@ +--- +date: "2020-06-18" +--- + +# WordNet + +[Princeton WordNet](https://wordnet.princeton.edu/) is a lexical database consisting of _synonym sets_ (synsets). Each synset has +- id +- part-of-speech (noun, verb, adjective or adverb) +- definition +- example(s) of use + +The word "space" belongs to the following synsets. + +* 00028950-__n__ _the unlimited expanse in which everything is located; "they tested his ability to locate objects in space"_ +* 13933399-__n__ _an empty area (usually bounded in some way between things); "the architect left space in front of the building"; "they stopped at an open space in the jungle"; "the space between his teeth"_ +* 08670545-__n__ _an area reserved for some particular purpose; "the laboratory's floor space"_ +* 08517454-__n__ _[astrology] any location outside the Earth's atmosphere; "the astronauts walked in outer space without a tether"; "the first major milestone in space exploration was in 1957, when the USSR's Sputnik 1 orbited the Earth"_ +* 06852240-__n__ _[linguistics, publishing] a blank character used to separate successive words in writing or printing; "he said the space is the most important character in the alphabet"_ +* 15197259-__n__ _[time period] the interval between two times; "it all happened in the space of 10 minutes"_ +* 06401196-__n__ _a blank area; "write your name in the space provided"_ +* 06875252-__n__ _[music] one of the areas between or below or above the lines of a musical staff; "the spaces are the notes F-A-C-E"_ +* 04037131-__n__ _[publishing] a block of type without a raised letter; used for spacing between words or sentences_ +* 01992094-__v__ _place at intervals; "Space the interviews so that you have some time between the different 
candidates"_