DTD 1.1 #52
Comments
Why in the lemon RDF vocab, a lexicalEntry has a |
These are all related to the original formats. The modelling of I would see |
Thank you, I thought we could change the schemas and DTDs in this repo freely. It would make sense to use the same terminology on both if possible. |
We can change things of course, but there needs to be good reasons to make changes with the precedents of previous formats. I guess I can close this issue? |
Does no other GWA member want to comment? I would vote to adapt the XML and RDF schemas to a single terminology. |
I'm not a voting member but I'll add that I agree with John. We may not live in the best possible world, but we shouldn't break backward compatibility only for (effectively) aesthetics. You may be interested in #43, however. |
The problem is how to decide whether a modification is only aesthetic. But fine, good to have more opinions. Thank you for the link to the other issue. |
Fair point. For me, I'd ask if the change allows us to do something we couldn't do before, or prevent us from doing something we could do before. If not, it's aesthetic (or "non-functional", etc.). For example, |
I more or less agree with you. Instead of the vague "what you can do and can't", I'd suggest reasoning in terms of information. Some changes are indeed cosmetic such as renames (no info brought in or removed).
In this case, the PartOfSpeech attribute trickles down from the file's name to Lemmas and Synsets, where it hardly brings new information (except for the tricky adjectives, which can split into a or s). Of course you need it after merging, but it can be derived and recorded then. Also, we want maintenance scripts to find it suspicious for wn-noun files to contain Lemmas with verb parts of speech. It is assumed that PartOfSpeech is propagated up from the (unique) Lemma to the LexicalUnit if need be, because we don't want to repeat it at both levels. But it is a "matter of interpretation", as you say, because inheritance does not usually flow from child to parent. I've already argued that LexicalEntry and Lemma stand in a one-to-one relation and that the tags should be merged. We don't need them separate. The current discussion only illustrates this point and is virtually endless: either the PartOfSpeech is propagated down from parent to child or propagated up from the unique child to the parent. Who cares? But one may question whether we should have a parent-child pair here. |
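The maintenance check mentioned above (flagging Lemmas whose part of speech does not match the file they live in) could look roughly like the sketch below. It assumes the element and attribute names `LexicalEntry`, `Lemma`, `writtenForm` and `partOfSpeech`; the sample data and the function name `suspicious_entries` are made up for illustration and are not part of any existing script in this repo.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt of a noun-only file; the ids and forms are invented.
SAMPLE = """
<Lexicon id="demo">
  <LexicalEntry id="e1"><Lemma writtenForm="dog" partOfSpeech="n"/></LexicalEntry>
  <LexicalEntry id="e2"><Lemma writtenForm="run" partOfSpeech="v"/></LexicalEntry>
</Lexicon>
"""

def suspicious_entries(xml_text, expected_pos):
    """Return (entry id, pos) pairs whose Lemma pos is not in expected_pos."""
    root = ET.fromstring(xml_text)
    flagged = []
    for entry in root.iter("LexicalEntry"):
        lemma = entry.find("Lemma")
        if lemma is None:
            continue  # nothing to compare against
        pos = lemma.get("partOfSpeech")
        if pos not in expected_pos:
            flagged.append((entry.get("id"), pos))
    return flagged

# A file expected to hold only nouns should not contain a verb lemma.
print(suspicious_entries(SAMPLE, {"n"}))  # prints [('e2', 'v')]
```

For adjective files the expected set would presumably be `{"a", "s"}`, which is the one case where the attribute carries information the file name does not.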
@1313ou thanks for the further thoughts. While I only meant my definition as an informal rule of thumb, I agree that framing it in terms of encoded information instead of capabilities is better. I'm not convinced that merging |
Are they distinct entities? |
It may help if we think of
I think that is incorrect as the lemma is a realized form. It's just the canonical/dictionary/citation form. Also, not all wordnets use
I wonder if we're talking about different things, as this seems backwards. The senses shouldn't change for alternative forms of the same lexical entry, but we could imagine that the syntactic behaviour could change (e.g., plural nouns in English not requiring a determiner). Currently we do not have a way to encode relationships between |
I'll give you that, though the DTD fails to capture this inheritance: it just copies the element definitions. Both have Pronunciations and Tags. I should have said 'is inflected as' or dropped the 'i.e. ..' altogether. But as you note, a lemma acts as a name ("citation"), so it stands for what it names. Having a parent and a unique child is aesthetic in your terms. It doesn't add information. But it is ineffective in that it scatters information, and more steps are required to retrieve it. Keeping them separate would make (more) sense if multiple lemmas were allowed for a lexical entry (for instance color + colour, realize + realise), following the practice of most dictionaries. The LexicalEntry tag could then group these lemmas and give substance to the feeling that they refer to one and the same entity. The current DTD leaves no option but to have separate lexical entries that are grouped through synset membership. Mine is a database-design principle, as often here, that seeks effectiveness, but I grant you that a point of view based on fine-grained concepts is also legitimate.
As I advocated elsewhere, SyntacticBehaviour is attached to senses. As such it shouldn't be here in the first place, but further down, under the Sense tag. Added to that, the current DTD definition can't distinguish between reference and definition. So it merges them into one tag with
This makes it mandatory to repeat 'Somebody ----s somebody' 4525 times throughout the English WordNet database, for instance. And it's too permissive because it fails to capture that either id OR senses is required. Otherwise, if you want a bag to put just about anything in, here is the perfect fit. |
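Since a DTD cannot express "either id or senses is required", the constraint would have to be checked outside the schema. The following is a minimal sketch of such a check, assuming `SyntacticBehaviour` carries optional `id` and `senses` attributes plus a `subcategorizationFrame`, and reading "OR" as "at least one of the two". The sample entry and the function name `underspecified_behaviours` are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical entry: the first frame is tied to a sense, the second is a
# reusable definition with an id, and the third has neither attribute.
SAMPLE = """
<LexicalEntry id="e1">
  <Lemma writtenForm="see" partOfSpeech="v"/>
  <Sense id="s1" synset="ss1"/>
  <SyntacticBehaviour subcategorizationFrame="Somebody ----s somebody" senses="s1"/>
  <SyntacticBehaviour id="f2" subcategorizationFrame="Somebody ----s something"/>
  <SyntacticBehaviour subcategorizationFrame="Something ----s"/>
</LexicalEntry>
"""

def underspecified_behaviours(xml_text):
    """Return the frames of SyntacticBehaviour elements with neither id nor senses."""
    root = ET.fromstring(xml_text)
    return [
        sb.get("subcategorizationFrame")
        for sb in root.iter("SyntacticBehaviour")
        if sb.get("id") is None and sb.get("senses") is None
    ]

# The DTD accepts all three elements; only this extra check catches the last one.
print(underspecified_behaviours(SAMPLE))  # prints ['Something ----s']
```

A schema language with co-occurrence constraints could state the same rule declaratively, but that is a separate design choice from the one discussed here.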
Why is partOfSpeech an attribute of the Lemma and not an attribute of lexicalEntry?