Get sense/synset relation metadata #216

Open
goodmami opened this issue Nov 2, 2024 · 9 comments
@goodmami
Owner

goodmami commented Nov 2, 2024

Is your feature request related to a problem? Please describe.

There is currently no way to get the metadata for a sense relation or synset relation. It exists in the database, but as relations are not modeled with classes there is nowhere to place a metadata() method.

Describe the solution you'd like

It should be possible to get metadata for any relation. This could be something like adding a with_metadata=True parameter to methods like Sense.relations(), and the return value would map relations to lists of tuples of (Sense, relation-metadata). This is not ideal, though, because it changes the return type.

It should also be possible to filter relations based on metadata (at least dc:type). E.g., sense.relations("other", type="material") could only get relations matching the reltype and the dc:type attribute.
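
A rough sketch of that filtering idea, using plain tuples instead of wn's database rows. `Rel`, `filter_relations`, and the sample data are hypothetical names made up for illustration; none of this is existing wn API.

```python
from typing import NamedTuple, Optional


class Rel(NamedTuple):
    """Hypothetical stand-in for one stored sense relation."""
    name: str       # relation name, e.g. "other"
    target: str     # ID of the target sense
    metadata: dict  # e.g. {"type": "material"} for dc:type


def filter_relations(
    rels: list[Rel], name: str, type: Optional[str] = None
) -> list[Rel]:
    """Keep relations matching the name and, if given, the dc:type value."""
    return [
        r for r in rels
        if r.name == name and (type is None or r.metadata.get("type") == type)
    ]


# Invented sample data, not taken from any real lexicon
rels = [
    Rel("other", "sense-a", {"type": "material"}),
    Rel("other", "sense-b", {"type": "agent"}),
    Rel("derivation", "sense-c", {}),
]
material = filter_relations(rels, "other", type="material")
any_other = filter_relations(rels, "other")
```

Leaving `type` as `None` keeps the current behavior of matching on the relation name alone.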

Describe alternatives you've considered

One can get the metadata through the non-public wn._queries.get_metadata() function, but they'd have to know the table name and the db-internal rowid of the relation they want the metadata for.

Additional context

This is more urgent now that (as of the Open English Wordnet 2023) the dc:type metadata attribute is used to distinguish other relations (see also #215).

@goodmami goodmami added the enhancement New feature or request label Nov 2, 2024
@goodmami goodmami self-assigned this Nov 2, 2024
@goodmami
Owner Author

goodmami commented Nov 2, 2024

We could also change the key type in the returned dictionary from Sense.relations() or Synset.relations() to some new type inheriting from str. This would not exactly be a breaking change, but we would still see different behavior because relations that may have been previously grouped would then be separate. E.g.:

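from __future__ import annotations

from typing import Optional

class Relation(str):
    def __new__(cls, name: str, type: Optional[str] = None) -> Relation:
        # str is immutable, so the value is set through the base class __new__
        obj = super().__new__(cls, name)
        obj.name = name
        obj.type = type
        return obj

    def __str__(self) -> str:
        return self.name

    def __repr__(self) -> str:
        return f"{self.name}({self.type})" if self.type else self.name

    def __hash__(self) -> int:
        # include the type in the hash, not just the relation name
        return hash(repr(self))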

@goodmami
Owner Author

There are issues with the above solution because the base class isn't just str but also the non-public _LexiconElement class (which allows it to store a db-internal ID and lexicon ID), and Python doesn't seem to like the multiple inheritance.

Here is an alternative:

  • Relation names stay as strings
  • Type metadata can be included in relation name when searching (e.g., sense.relations("other.agent")); if left off (sense.relations("other")), it matches any relations with the given relation name (similar to current behavior)
  • Results of .relations() or .get_related() methods are RelatedSense or RelatedSynset objects. These are the same as Sense/Synset objects but they also:

I still need to think a bit about how this would work with interlingual searches, subsequent queries, and comparisons (e.g., is a RelatedSynset equal to a Synset or a RelatedSynset from a different relation when the target synset is the same?).
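
A minimal sketch of how a combined name such as "other.agent" could be split and matched. It assumes relation names never contain a "." themselves; `split_relation_query` and `matches` are hypothetical helpers, not part of wn.

```python
from typing import Optional


def split_relation_query(query: str) -> tuple[str, Optional[str]]:
    """Split 'other.agent' into ('other', 'agent') and 'other' into ('other', None)."""
    name, _, type_ = query.partition(".")
    return name, (type_ or None)


def matches(query: str, rel_name: str, rel_type: Optional[str]) -> bool:
    """True if a stored relation (name plus optional dc:type) satisfies the query."""
    name, type_ = split_relation_query(query)
    # A query without a type part matches any type, including none at all.
    return name == rel_name and (type_ is None or type_ == rel_type)
```

With this, `matches("other", "other", "agent")` is true while `matches("other.agent", "other", "material")` is false.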

@goodmami
Owner Author

Further refining the above... It probably makes more sense to just put the additional attributes and methods on the Sense and Synset classes instead of creating separate RelatedSense and RelatedSynset classes. The return values of those methods would just be something like None when they were not obtained via relation traversal.

goodmami added a commit that referenced this issue Nov 21, 2024
Senses and Synsets now have an `incoming_relation()` method. The value
of this method returns a SenseRelation, SenseSynsetRelation, or
SynsetRelation object if the Sense/Synset is the result of a relation
traversal. Otherwise the method returns `None`.

The new relation objects specify the relation name, the source and
target IDs of the relation, and the lexicon where the relation
originated.

Fixes #216
Fixes #167
@goodmami
Owner Author

Ok here's what I have working. @fcbond and @jmccrae, do you agree with the proposed API?

Senses and Synsets now have an .incoming_relation() method, which returns a relation object if the sense/synset was the result of a relation traversal; otherwise the method returns None. This relation object has attributes for the relation name, source id, and target id. It has methods to get the lexicon where the relation was defined and the relation's metadata.

>>> import wn
>>> oewn = wn.Wordnet('oewn:2024')
>>> dog = oewn.synsets('dog')[0]
>>> dog.incoming_relation()  # None; no relation was traversed
>>> dog.hypernyms()[0].incoming_relation()
SynsetRelation('hypernym', 'oewn-02086723-n', 'oewn-02085998-n')
>>> dog.hypernyms()[0].incoming_relation().lexicon()
<Lexicon oewn:2024 [en]>
>>> dog.hypernyms()[0].incoming_relation().metadata()
{}
>>> oewn.senses('ally', pos="v")[0].get_related("other")[0].incoming_relation().metadata()
{'type': 'agent'}

This also works with interlingual traversals. Even though the source and target are in one lexicon, the lexicon of the relation may be different. For instance:

>>> es = wn.Wordnet('omw-es')  # depends on omw-en by default
>>> perro = es.synsets("perro")[0]  # Spanish for 'dog'
>>> perro.hypernyms()[0]  # hypernym is a Spanish omw-es synset
Synset('omw-es-02083346-n')
>>> perro.hypernyms()[0].words()
[Word('omw-es-cánido-n')]
>>> perro.hypernyms()[0].incoming_relation()  # relation traverses English omw-en synsets
SynsetRelation('hypernym', 'omw-en-02084071-n', 'omw-en-02083346-n')
>>> perro.hypernyms()[0].incoming_relation().lexicon()
<Lexicon omw-en:1.4 [en]>

@jmccrae

jmccrae commented Nov 21, 2024

I am not sure that I like that a synset returns a different result for incoming_relation based on how you found it. I think this could be quite unintuitive.

Wouldn't it be easier just to add a new method, like synset.relation_objects()?

@goodmami
Owner Author

@jmccrae thanks for the feedback.

> I am not sure that I like that a synset returns a different result for incoming_relation based on how you found it. I think this could be quite unintuitive.

That bothered me, too, even though two synsets or senses that differed only by the traversal (if any) to arrive at them would still compare equal.
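
One way to get that behavior, sketched with a dataclass whose provenance field is excluded from comparison. `SynsetSketch` and its `incoming` field are hypothetical and say nothing about how wn implements it; the IDs are borrowed from the 'dog' example above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class SynsetSketch:
    id: str
    # compare=False keeps this field out of __eq__ and __hash__,
    # so how the synset was reached does not affect its identity.
    incoming: Optional[tuple] = field(default=None, compare=False)


plain = SynsetSketch("oewn-02085998-n")
via_hypernym = SynsetSketch(
    "oewn-02085998-n",
    incoming=("hypernym", "oewn-02086723-n", "oewn-02085998-n"),
)
```

Here `plain == via_hypernym` holds and both collapse to one element in a set, even though only one of them carries relation information.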

> Wouldn't it be easier just to add a new method, like synset.relation_objects()?

That would be ok, but I wanted something that integrated with the normal ways of traversing relations. In the proposed implementation above, all the existing relation methods (.hypernyms(), .relation_paths(), wn.taxonomy functions, etc.) can be inspected for the relation objects, but if it were localized to a specific function, they would not.

Another alternative is a method like Wordnet.traversals(source, target) that is similar to wn.taxonomy.shortest_path() but returns the relations instead of the synsets and would also work for senses. But it wouldn't guarantee that the path to a given synset is the one that was originally traversed.
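
A simplified sketch of such a traversals() function, as a breadth-first search over directed edges that returns the relations along one shortest path. `Edge`, the toy graph, and the directed-only search are assumptions for illustration; the real wn.taxonomy.shortest_path() works differently.

```python
from collections import deque
from typing import NamedTuple, Optional


class Edge(NamedTuple):
    """Hypothetical relation record: name, source ID, target ID."""
    name: str
    source: str
    target: str


def traversals(edges: list[Edge], source: str, target: str) -> Optional[list[Edge]]:
    """Return the edges along one shortest directed path, or None if unreachable."""
    outgoing: dict[str, list[Edge]] = {}
    for e in edges:
        outgoing.setdefault(e.source, []).append(e)
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        node, path = queue.popleft()
        if node == target:
            return path
        for e in outgoing.get(node, []):
            if e.target not in seen:
                seen.add(e.target)
                queue.append((e.target, path + [e]))
    return None


# Toy graph with invented IDs: two routes lead from "dog" to "animal"
edges = [
    Edge("hypernym", "dog", "canid"),
    Edge("hypernym", "canid", "carnivore"),
    Edge("hypernym", "carnivore", "animal"),
    Edge("hypernym", "dog", "domestic_animal"),
    Edge("hypernym", "domestic_animal", "animal"),
]
path = traversals(edges, "dog", "animal")
```

In this toy graph the search returns the two-step route through "domestic_animal", even if a caller originally reached "animal" through "canid" and "carnivore", which is the limitation noted above.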

@jmccrae

jmccrae commented Nov 22, 2024

My preference is still for a different traversal method, as it seems much simpler, but perhaps @fcbond has another opinion?

@goodmami
Owner Author

I can think of three reasons for getting the relation objects that aren't solved by the existing API:

  1. To inspect the metadata on a relation
  2. To distinguish relations with the same source, target, and relation name with different metadata (related to (1))
  3. Given a target sense/synset from an interlingual query, to discover where the relation came from, especially when the target is *INFERRED*

(1) and (2) are easily solved with a method like .relation_objects(), but (3) is not (you'd need to go back to the source synset or sense, iterate over its relation objects, and find one or more that match the relname + target). Furthermore, in interlingual queries, the targets of synset relations are not in the same lexicon as the one being queried (e.g., searching for hypernyms of a synset in omw-fr uses relations from omw-en, then the target ILIs are resolved in omw-fr).

If we don't want to make Sense and Synset objects more stateful than they already are, here's an alternative that expands on the .relation_objects() method: .relation_map(). It returns a dictionary where the keys are the relation objects and they map 1-to-1 to resolved targets. This way you can deterministically identify the relation used to arrive at some target.

Synset.relation_map() -> dict[SynsetRelation, Synset]: ...
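
A small sketch of how such a map could be used, with synset IDs as plain strings. `RelationSketch`, its `type` field (standing in for dc:type metadata), and `relations_to` are hypothetical; the omw IDs come from the 'perro' example above and the rest are invented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)  # frozen so instances are hashable and usable as dict keys
class RelationSketch:
    name: str
    source_id: str
    target_id: str
    type: Optional[str] = None  # stand-in for dc:type metadata


def relations_to(relation_map: dict[RelationSketch, str], target: str) -> list[RelationSketch]:
    """Find every relation whose resolved target is the given ID."""
    return [rel for rel, resolved in relation_map.items() if resolved == target]


relation_map = {
    # Interlingual case: the relation lives in omw-en, the resolved target in omw-es
    RelationSketch("hypernym", "omw-en-02084071-n", "omw-en-02083346-n"): "omw-es-02083346-n",
    # Same name, source, and target, distinguished only by type
    RelationSketch("other", "s1", "s2", "agent"): "s2",
    RelationSketch("other", "s1", "s2", "material"): "s2",
}
```

Because each key keeps its own source, target, and type, relations that differ only in metadata stay separate, and the relation behind an interlingually resolved target can be looked up directly.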

@fcbond
Collaborator

fcbond commented Nov 24, 2024 via email

3 participants