prevent using of "short" annotation ID and release MMIF 2.0 #228
Comments
I can see some value in always having However, I really like the idea of having |
Oh, I just realized that what I thought was option 2 is really another option. |
If |
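While working on clamsproject/mmif-python#285, I realized that allowing the short form of annotation id values within the view "scope" introduces unrecoverable ambiguity, and hence we MUST stop doing this. Here's an example of output MMIF from an imaginary spell correction app. The Alignment in view v1 cannot express whether each "td1" refers to the original document in the documents list or to the new TextDocument generated in the view.
{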
...
"documents": [
{
"@type": "https://mmif.clams.ai/vocabulary/TextDocument/v1/",
"properties": {
"id": "td1",
"location": "file:///var/archive/spell-errors.txt" }
}
],
"views": [
{
"id": "v1",
"metadata": {
"app": "http://apps.clams.ai/awesome-spell-corrector/v7",
...
},
"annotations": [
{
"@type": "https://mmif.clams.ai/vocabulary/TextDocument/v1/",
"properties": {
"id": "td1",
"text": { "@value": "very correctly spelled text" }
}
},
{
"@type": "https://mmif.clams.ai/vocabulary/Alignment/v1/",
"properties": {
"id": "a1",
"source": "td1",
"target": "td1" # ???
}
},
...
]
}
]
}
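To make the ambiguity concrete, here is a minimal sketch in plain Python dicts (not the mmif-python API). The function name `resolve_candidates` is hypothetical, and the sketch assumes that the long form joins the view ID and the annotation ID with ":". It collects every object a reference could point to: the short "td1" matches two objects, while a view-prefixed ID matches exactly one.

```python
# Assumption: a long-form ID is "<view id>:<annotation id>".
ID_DELIMITER = ":"

# Trimmed-down version of the example MMIF above.
mmif = {
    "documents": [
        {"@type": "https://mmif.clams.ai/vocabulary/TextDocument/v1/",
         "properties": {"id": "td1"}},
    ],
    "views": [
        {
            "id": "v1",
            "annotations": [
                {"@type": "https://mmif.clams.ai/vocabulary/TextDocument/v1/",
                 "properties": {"id": "td1"}},
                {"@type": "https://mmif.clams.ai/vocabulary/Alignment/v1/",
                 "properties": {"id": "a1", "source": "td1", "target": "td1"}},
            ],
        }
    ],
}


def resolve_candidates(mmif, ref, current_view_id):
    """Return (scope, object) pairs for everything `ref` could point to.

    Hypothetical helper for illustration only.
    """
    found = []
    if ID_DELIMITER in ref:
        # Long form: the view is named explicitly, so only that view is searched.
        view_id, short_id = ref.split(ID_DELIMITER, 1)
        for view in mmif["views"]:
            if view["id"] == view_id:
                found += [(view_id, a) for a in view["annotations"]
                          if a["properties"]["id"] == short_id]
        return found
    # Short form: both the documents list and the current view are in scope.
    found += [("documents", d) for d in mmif["documents"]
              if d["properties"]["id"] == ref]
    for view in mmif["views"]:
        if view["id"] == current_view_id:
            found += [(view["id"], a) for a in view["annotations"]
                      if a["properties"]["id"] == ref]
    return found


short_hits = resolve_candidates(mmif, "td1", "v1")
long_hits = resolve_candidates(mmif, "v1:td1", "v1")
print(len(short_hits), len(long_hits))
```

With only short IDs available, nothing in the Alignment itself can tell a reader which of the two candidates was meant.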
|
Yes, that does appear to be an oversight, since we never stated that annotation identifiers must not be the same as identifiers in the documents list. |
The last time @marcverhagen and I talked about this in an in-person meeting, we tentatively decided to update the SDK code to use "long_id" everywhere, but keep the MMIF version at 1.x. That seemed to work with the MMIF spec version. But now that I think of it, if we update the SDK to use long_id everywhere, the new SDK won't be able to read existing MMIF files. Hence, to mark that compatibility boundary, I firmly believe that we MUST release |
Because
The generosity of allowing the short form for annotation id values provides almost no benefit (besides saving some bytes) while adding lots of technical debt and bugs. For example, we have this code in the visualizer: https://github.com/clamsproject/mmif-visualizer/blob/9019c154fccde11a605b77d412ddaca3e7be566c/ocr.py#L65-L68
but source and target values in Alignment annotations almost always use the "long" form (prefixed with the view ID) of the annotation ID, since lots of alignments are done across views, and hence these checks are destined to fail and introduce silent errors.
I suggest we allow only the long form of the ID everywhere in MMIF, so that ann.id and ann.long_id always return the same value. This is a major change in the MMIF spec, and hence leads to MMIF 2.0.
Done when
Either
== checks with ann.id value to a fixed standard and educate all existing and future developers to use the same implementation,
ann.id to return ann.long_id.
Additional context
No response
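The silent failure described in this issue can be sketched as follows. This is not the visualizer code linked above; the helper `to_long_id` and the sample IDs are hypothetical, and the ":" delimiter between view ID and annotation ID is an assumption. The point is only that an `==` check between a short ID and a long ID quietly evaluates to False, while comparing normalized long forms behaves as intended.

```python
# Assumption: a long-form ID is "<view id>:<annotation id>".
ID_DELIMITER = ":"


def to_long_id(view_id: str, ann_id: str) -> str:
    """Prefix a short ID with its view ID; return an already-long ID unchanged.

    Hypothetical helper for illustration only.
    """
    if ID_DELIMITER in ann_id:
        return ann_id
    return f"{view_id}{ID_DELIMITER}{ann_id}"


# A token annotation that lives in view v1 and was written with a short ID.
token_view_id, token_id = "v1", "t1"

# An Alignment in another view (v2) points at that token using the long form.
alignment_view_id, alignment_target = "v2", "v1:t1"

# Naive check: compares a long ID with a short ID, so it never matches
# and raises no error either.
naive_match = alignment_target == token_id

# Normalized check: both sides are expanded to the long form first.
normalized_match = (
    to_long_id(alignment_view_id, alignment_target)
    == to_long_id(token_view_id, token_id)
)

print(naive_match, normalized_match)
```

If ann.id always returned the long form, as proposed, the plain `==` check would be sufficient and no such normalization helper would be needed at each call site.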