Get sense/synset relation metadata #216

goodmami · 2024-11-02T03:03:36Z

Is your feature request related to a problem? Please describe.

There is currently no way to get the metadata for a sense relation or synset relation. It exists in the database, but as relations are not modeled with classes there is nowhere to place a metadata() method.

Describe the solution you'd like

It should be possible to get metadata for any relation. This could be something like adding a with_metadata=True parameter to methods like Sense.relations(), so that the return value maps relation names to lists of (Sense, relation-metadata) tuples. This is not ideal, though, because it changes the return type.

It should also be possible to filter relations based on metadata (at least dc:type). E.g., sense.relations("other", type="material") could only get relations matching the reltype and the dc:type attribute.

Describe alternatives you've considered

One can get the metadata through the non-public wn._queries.get_metadata() function, but they'd have to know the table name and the db-internal rowid of the relation they want the metadata for.

Additional context

This is more urgent now that (as of the Open English Wordnet 2023) the dc:type metadata attribute is used to distinguish other relations (see also #215).

goodmami · 2024-11-02T20:11:09Z

We could also change the key type in the returned dictionary from Sense.relations() or Synset.relations() to some new type inheriting from str. This would not exactly be a breaking change, but we would still see different behavior because relations that may have been previously grouped would then be separate. E.g.:

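from __future__ import annotations

from typing import Optional


class Relation(str):
    def __new__(cls, name: str, type: Optional[str] = None) -> Relation:
        # Calling cls(name) here would recurse into __new__; build the str directly
        obj = super().__new__(cls, name)
        obj.name = name
        obj.type = type
        return obj

    def __str__(self) -> str:
        return self.name

    def __repr__(self) -> str:
        return f"{self.name}({self.type})" if self.type else self.name

    def __hash__(self) -> int:
        # Hash on name plus type so differently typed relations become separate keys
        return hash(repr(self))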
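To illustrate the grouping change mentioned above, here is a minimal, self-contained sketch. It assumes a working version of the hypothetical Relation class; the `group` helper and the sample rows are made up for illustration and are not part of wn.

```python
from __future__ import annotations

from typing import Optional


class Relation(str):
    """Hypothetical str subclass carrying an optional dc:type value."""

    def __new__(cls, name: str, type: Optional[str] = None) -> Relation:
        obj = super().__new__(cls, name)
        obj.name = name
        obj.type = type
        return obj

    def __repr__(self) -> str:
        return f"{self.name}({self.type})" if self.type else self.name

    def __hash__(self) -> int:
        return hash(repr(self))


def group(pairs):
    """Group targets by relation key, like the dict from .relations()."""
    grouped = {}
    for rel, target in pairs:
        grouped.setdefault(rel, []).append(target)
    return grouped


# Made-up (relation name, dc:type, target id) rows
rows = [
    ("other", "agent", "target-1"),
    ("other", "material", "target-2"),
    ("hypernym", None, "target-3"),
]

plain = group((name, target) for name, _, target in rows)
typed = group((Relation(name, typ), target) for name, typ, target in rows)

print(sorted(plain))                   # plain string keys group both "other" relations
print(sorted(repr(k) for k in typed))  # typed keys keep them apart
```

Untyped keys still behave like plain strings, so `typed["hypernym"]` keeps working. One caveat with this sketch: `str.__eq__` still compares only the name, so two keys with the same name and different types compare equal while hashing differently. A real implementation would probably need a matching `__eq__` as well.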

goodmami · 2024-11-13T00:06:36Z

There are issues with the above solution because the base class isn't just str but also the non-public _LexiconElement class (which allows it to store a db-internal ID and lexicon ID), and Python doesn't seem to like the multiple inheritance.

Here is an alternative:

- Relation names stay as strings
- Type metadata can be included in relation name when searching (e.g., sense.relations("other.agent")); if left off (sense.relations("other")), it matches any relations with the given relation name (similar to current behavior)
- Results of .relations() or .get_related() methods are RelatedSense or RelatedSynset objects. These are the same as Sense/Synset objects but they also:
  - Store the db-internal id and lexicon id of the relation traversed to get to the target
  - Have methods like .relation_metadata(), .relation_name(), and .relation_source()
  - This might also open up a solution for Tracing back 'inferred' synsets to their reference lexicons #167
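The "name.type" search syntax in the second point could be sketched roughly as follows. The helper names `split_query` and `matches` are hypothetical, and the sketch assumes relation names never contain a "." themselves.

```python
from typing import Optional


def split_query(query: str) -> tuple[str, Optional[str]]:
    """Split 'other.agent' into ('other', 'agent'); 'other' into ('other', None)."""
    name, _, dc_type = query.partition(".")
    return name, dc_type or None


def matches(query: str, relname: str, dc_type: Optional[str]) -> bool:
    """Return True if a relation with this name and dc:type satisfies the query."""
    name, wanted = split_query(query)
    # With no type in the query, any relation with the given name matches
    return relname == name and (wanted is None or dc_type == wanted)


print(matches("other.agent", "other", "agent"))     # typed query, same type
print(matches("other", "other", "material"))        # untyped query ignores the type
print(matches("other.agent", "other", "material"))  # typed query, different type
```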

I still need to think a bit about how this would work with interlingual searches, subsequent queries, and comparisons (e.g., is a RelatedSynset equal to a Synset or a RelatedSynset from a different relation when the target synset is the same?).

goodmami · 2024-11-13T04:41:03Z

Further refining the above... It probably makes more sense to just put the additional attributes and methods on the Sense and Synset classes instead of creating separate RelatedSense and RelatedSynset classes. The return values of those methods would just be something like None when they were not obtained via relation traversal.

Senses and Synsets now have an `incoming_relation()` method. The value of this method returns a SenseRelation, SenseSynsetRelation, or SynsetRelation object if the Sense/Synset is the result of a relation traversal. Otherwise the method returns `None`. The new relation objects specify the relation name, the source and target IDs of the relation, and the lexicon where the relation originated. Fixes #216 Fixes #167

goodmami · 2024-11-21T07:17:36Z

Ok here's what I have working. @fcbond and @jmccrae, do you agree with the proposed API?

Senses and Synsets now have an .incoming_relation() method that returns a relation object if the sense/synset was the result of a relation traversal; otherwise it returns None. This relation object has attributes for the relation name, source ID, and target ID. It also has methods to get the relation's metadata and the lexicon where the relation was defined.

>>> import wn
>>> oewn = wn.Wordnet('oewn:2024')
>>> dog = oewn.synsets('dog')[0]
>>> dog.incoming_relation()  # None; no relation was traversed
>>> dog.hypernyms()[0].incoming_relation()
SynsetRelation('hypernym', 'oewn-02086723-n', 'oewn-02085998-n')
>>> dog.hypernyms()[0].incoming_relation().lexicon()
<Lexicon oewn:2024 [en]>
>>> dog.hypernyms()[0].incoming_relation().metadata()
{}
>>> oewn.senses('ally', pos="v")[0].get_related("other")[0].incoming_relation().metadata()
{'type': 'agent'}

This also works with interlingual traversals. Even though the source and target are in one lexicon, the lexicon of the relation may be different. For instance:

>>> es = wn.Wordnet('omw-es')  # depends on omw-en by default
>>> perro = es.synsets("perro")[0]  # Spanish for 'dog'
>>> perro.hypernyms()[0]  # hypernym is a Spanish omw-es synset
Synset('omw-es-02083346-n')
>>> perro.hypernyms()[0].words()
[Word('omw-es-cánido-n')]
>>> perro.hypernyms()[0].incoming_relation()  # relation traverses English omw-en synsets
SynsetRelation('hypernym', 'omw-en-02084071-n', 'omw-en-02083346-n')
>>> perro.hypernyms()[0].incoming_relation().lexicon()
<Lexicon omw-en:1.4 [en]>

jmccrae · 2024-11-21T08:36:16Z

I am not sure that I like that a synset returns a different result for incoming_relation based on how you found it. I think this could be quite unintuitive.

Wouldn't it be easier just to add a new method, like synset.relation_objects()?

goodmami · 2024-11-21T23:32:24Z

@jmccrae thanks for the feedback.

> I am not sure that I like that a synset returns a different result for incoming_relation based on how you found it. I think this could be quite unintuitive.

That bothered me, too, even though two synsets or senses that differed only by the traversal (if any) to arrive at them would still compare equal.
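As a rough illustration of that equality behavior, here is a toy sketch using dataclasses. `ToySynset` and `ToyRelation` are stand-ins invented for this example, not the actual wn classes; the point is only that the traversal state is excluded from comparison and hashing.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class ToyRelation:
    name: str
    source_id: str
    target_id: str


@dataclass(frozen=True)
class ToySynset:
    id: str
    # compare=False keeps the traversal out of __eq__ and __hash__
    _incoming: Optional[ToyRelation] = field(default=None, compare=False, repr=False)

    def incoming_relation(self) -> Optional[ToyRelation]:
        return self._incoming


direct = ToySynset("oewn-02085998-n")
via_hypernym = ToySynset(
    "oewn-02085998-n",
    ToyRelation("hypernym", "oewn-02086723-n", "oewn-02085998-n"),
)

print(direct == via_hypernym)            # same synset, however it was reached
print(direct.incoming_relation())        # no traversal recorded
print(via_hypernym.incoming_relation())  # the relation that was traversed
```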

> Wouldn't it be easier just to add a new method, like synset.relation_objects()?

That would be ok, but I wanted something that integrated with the normal ways of traversing relations. In the proposed implementation above, all the existing relation methods (.hypernyms(), .relation_paths(), wn.taxonomy functions, etc.) can be inspected for the relation objects, but if it were localized to a specific function, they would not.

Another alternative is a method like Wordnet.traversals(source, target) that is similar to wn.taxonomy.shortest_path() but returns the relations instead of the synsets and would also work for senses. But it wouldn't guarantee that the path to a given synset is the one that was originally traversed.
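To make that last caveat concrete, here is a toy breadth-first sketch over a made-up edge list. The free function `traversals` only borrows the name of the proposed method; it is not part of wn, and the node names are placeholders.

```python
from collections import deque
from typing import NamedTuple, Optional


class Edge(NamedTuple):
    name: str
    source: str
    target: str


def traversals(edges: list[Edge], source: str, target: str) -> Optional[list[Edge]]:
    """Return one shortest chain of relations from source to target, if any."""
    outgoing: dict[str, list[Edge]] = {}
    for edge in edges:
        outgoing.setdefault(edge.source, []).append(edge)

    queue = deque([(source, [])])
    seen = {source}
    while queue:
        node, path = queue.popleft()
        if node == target:
            return path
        for edge in outgoing.get(node, []):
            if edge.target not in seen:
                seen.add(edge.target)
                queue.append((edge.target, path + [edge]))
    return None


# Two equally short routes from "a" to "d": via "b" and via "c"
edges = [
    Edge("hypernym", "a", "b"),
    Edge("hypernym", "a", "c"),
    Edge("hypernym", "b", "d"),
    Edge("hypernym", "c", "d"),
]

path = traversals(edges, "a", "d")
print([(e.source, e.target) for e in path])
```

The search reports the route through "b" because of edge order, even if the caller originally reached "d" through "c".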

jmccrae · 2024-11-22T16:07:52Z

My preference is still for a different traversal method, as it seems much simpler, but perhaps @fcbond has another opinion?

goodmami · 2024-11-24T21:03:26Z

I can think of three reasons for getting the relation objects that aren't solved by the existing API:

1. To inspect the metadata on a relation
2. To distinguish relations with the same source, target, and relation name with different metadata (related to (1))
3. Given a target sense/synset from an interlingual query, to discover where the relation came from, especially when the target is *INFERRED*

(1) and (2) are easily solved with a method like .relation_objects(), but (3) is not (you'd need to go back to the source synset or sense, iterate over its relation objects, and find one or more that match the relname + target). Furthermore, in interlingual queries, the targets of synset relations are not in the same lexicon as the one being queried (e.g., searching for hypernyms of a synset in omw-fr uses relations from omw-en, then the target ILIs are resolved in omw-fr).

If we don't want to make Sense and Synset objects more stateful than they already are, here's an alternative that expands on the .relation_objects() method: .relation_map(). It returns a dictionary where the keys are the relation objects and they map 1-to-1 to resolved targets. This way you can deterministically identify the relation used to arrive at some target.

Synset.relation_map() -> dict[SynsetRelation, Synset]: ...
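A rough sketch of how such a mapping could be used for case (3), with toy stand-ins rather than the real classes. `ToySynsetRelation` and `relations_to` are hypothetical, synsets are represented by their ID strings, and the second entry is illustrative data. How relations that share a name, source, and target but differ in metadata would be told apart is left open here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToySynsetRelation:
    name: str
    source_id: str
    target_id: str


def relations_to(relation_map: dict, target: str) -> list:
    """Find the relation(s) that resolved to the given target."""
    return [rel for rel, resolved in relation_map.items() if resolved == target]


# Relations defined over omw-en synsets, resolved to omw-es targets
relation_map = {
    ToySynsetRelation("hypernym", "omw-en-02084071-n", "omw-en-02083346-n"):
        "omw-es-02083346-n",
    ToySynsetRelation("hypernym", "omw-en-02084071-n", "omw-en-01317541-n"):
        "omw-es-01317541-n",  # illustrative second hypernym
}

for rel in relations_to(relation_map, "omw-es-02083346-n"):
    print(rel.name, rel.source_id, rel.target_id)
```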

fcbond · 2024-11-24T21:15:38Z

Hi, I think I like this approach better. But I am not sure what happens when, e.g., you have two hypernyms. In this case the SynsetRelation has the same RelationType, but some other property (an internal ID) that distinguishes them from each other? Will we make this ID accessible? Presumably then you can get the metadata from the SynsetRelation key? I hope the question makes sense, ...

-- Francis Bond <https://fcbond.github.io/>

goodmami added the enhancement New feature or request label Nov 2, 2024

goodmami self-assigned this Nov 2, 2024

goodmami mentioned this issue Nov 9, 2024

Add OEWN 2024 to index #221

Closed
