-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for retrieving original XMLAnnotation value as string #273
Comments
both of these could perhaps work... |
Sure! The test is to encode the 3 strings below as XMLAnnotation Values, write them to an ome-xml document (inside a larger OME structure of course), then read that ome-xml document back in and recover the original strings exactly. The problem arises as soon as the XMLAnnotations are constructed, before the xml round-trip: from ome_types.model import OME, XMLAnnotation
values = [
'{"foo":1,"bar:"other"}',
' <outer xmlns="http://example.com/"> <inner>ABC</inner> </outer> ',
'hello <mis> </matched> world',
]
ome = OME()
for v in values:
ome.structured_annotations.append(XMLAnnotation(value=v))
def get_xmlannotation_value(a):
# What goes here?
...
values2 = [get_xmlannotation_value(a) for a in ome.structured_annotations]
for v, v2 in zip(values, values2):
print("before:", repr(v))
print("after: ", repr(v2))
print()
assert values2 == values, "value mismatch" I tried a number of different things without success. For example this still leaves a bunch of extra namespace declarations in values 2 and 3, and it seems like a lot of work go back and forth between text and DOM multiple times: import lxml.etree
def get_xmlannotation_value(a):
e = lxml.etree.fromstring(a.value.to_xml())
return e.text + ''.join(lxml.etree.tostring(c, encoding=str) for c in e) Tangentially, I learned that passing certain invalid XML breaks our
Note that the text "after" is outside Maybe we could handle this like the |
I guess there is also the small issue that actual XML content won't have whitespace preserved (like in my second example value) but if it's semantically equivalent that should be fine I guess. Here are 2 examples of actual XML content in XMLAnnotations I found on our OMERO server, for reference: <SegmentDefinition>
<Name>CCND1-high</Name>
<CollectionOrder>1</CollectionOrder>
<DisplayColor>#83e8f5</DisplayColor>
<BlueSelection>3</BlueSelection>
<GreenSelection>3</GreenSelection>
<YellowSelection>3</YellowSelection>
<RedSelection>1</RedSelection>
<Erode>1</Erode>
<Dilate>2</Dilate>
<HoleSize>160</HoleSize>
<ParticleSize>50</ParticleSize>
</SegmentDefinition> <ChannelIntensity>
<Name>Blue</Name>
<Minintensity>10</Minintensity>
<Maxintensity>3594</Maxintensity>
</ChannelIntensity> |
A quick but big 👍🏽 from my side for having strong support of XMLAnnotations in ome-types. This might be something that needs to be bubbled upstream as well so that we can achieve well-defined semantics. That's obviously a bigger project, but since the XMLAnnotation element says:
I could see wrapping any text like JSON with a minimal XML block. (We could go so far as to define a well-known element like |
This is off-topic but since the OME team has been summoned 😁 , why does PathViewer store raw JSON in XMLAnnotations in the first place rather than something like CommentAnnotation? Was it just to hide it from end users (since the interactive clients don't expose XMLAnnotations)? |
For XMLAnnotations, we know from the schema that "The contents of this is not processed as OME XML but should still be well-formed XML". Inside the OMERO code and the Postgres database, the value is indeed just a plain string and no XML processing at all is done. In fact I've noticed that Glencoe's PathViewer app even stores raw JSON in XMLAnnotations (no wrapping root element or anything)!
I'm trying to add support for XMLAnnotations to omero-cli-transfer, but the way ome-types represents the XMLAnnotation Value as a parsed DOM-like AnyElement structure is making it hard to ship XMLAnnotation content faithfully from OMERO A -> ome-xml -> OMERO B. In order to recreate an annotation in the receiving OMERO instance, I need to provide the Value as a string, but without the opening and closing
<Value>
tag. I tried callingto_xml()
on the annotation or the Value, then doing some DOM munging and re-serializing, but since Value can contain mixed XML content it's actually a little tricky and I'm not confident I can reliably obtain the exact original string. I'm currently leaning toward constructing ome_types XMLAnnotation.Value objects explicitly asValue(any_elements=[value_string_from_omero])
so any actual XML content ends up getting xml-escaped and not parsed, but this seems hacky. Do you have any other thoughts? Could ome-types be modified to retain the original string value in addition to the parsed AnyElement thing? Or maybe the AnyElement representation could be produced on-demand from the stored string rather than serve as the internal representation.The text was updated successfully, but these errors were encountered: