Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] First attempt at using Open Annotation Data Model #83

Closed
wants to merge 10 commits into from

Conversation

Treora
Copy link
Contributor

@Treora Treora commented Jul 25, 2014

Annotations can now be serialised into JSON-LD formatted RDF, following
the data model spec: http://www.openannotation.org/spec/core/

Addresses #74

@BigBlueHat
Copy link

Woot! 😄

Can you paste an example of what the output JSON-LD created by this looks like and at what URL(s) it's available?

Thanks!

@Treora
Copy link
Contributor Author

Treora commented Jul 25, 2014

Here's an example annotation. Put the whole thing into the JSON-LD playground to see the data in different formats.

Note this is by no means the final version. I took the freedom to introduce Annotator's selector as http://annotatorjs.org/ns/TextRangeSelector, but this can of course be changed to anything.

The JSON-LD representation is not yet available at any URL. There are two questions here though: which URL(s) is it to be available, and which URI does the annotation itself have (preferably these overlap). The latter question seems more important to me and is not yet addressed in this first attempt. It probably depends on the particular application which URI they wish their annotations to have, and a base URI can be specified to achieve this. The default URI could be the URL that serves this individual annotation, e.g. /annotations/<id>. It would be nice to use the same URL for both the plain json annotation and for the json-ld OA format, as they are just different representations of the same resource. Content negotiation should be used to pick the right representation.

{
  "@context": {
    "cnt": "http://www.w3.org/2011/content#", 
    "prov": "http://www.w3.org/ns/prov#", 
    "annotator": "http://annotatorjs.org/ns/", 
    "oa": "http://www.w3.org/ns/oa#", 
    "dc": "http://purl.org/dc/elements/1.1/", 
    "dctypes": "http://purl.org/dc/dcmitype/", 
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#", 
    "xsd": "http://www.w3.org/2001/XMLSchema#"
  }, 
  "@id": "4zAZEVMpTvq3hQjp8vzBnA", 
  "@type": "oa:Annotation",
  "oa:hasBody": [
    {
      "cnt:chars": "bla bla blub", 
      "@type": "cnt:ContentAsText", 
      "dc:format": "text/plain"
    }
  ],
  "oa:hasTarget": {
    "oa:hasSource": {
      "@id": "http://localhost:4000/dev.html"
    }, 
    "oa:hasSelector": {
      "annotator:endContainer": "/ul[1]/li[1]", 
      "annotator:startOffset": 0, 
      "annotator:startContainer": "/ul[1]/li[1]", 
      "@type": "annotator:TextRangeSelector", 
      "annotator:endOffset": 26
    }, 
    "@type": "oa:SpecificResource"
  },
  "oa:annotatedBy": "alice",
  "oa:annotatedAt": {
    "@value": "2014-07-25T19:17:36.121460+00:00", 
    "@type": "xsd:dateTime"
  },
  "oa:serializedBy": {
    "foaf:name": "annotator-store", 
    "@id": "annotator:annotator-store", 
    "@type": "prov:Software-agent", 
    "foaf:homepage": {
      "@id": "http://annotatorjs.org"
    }
  }, 
  "oa:serializedAt": {
    "@value": "2014-07-25T19:17:36.121478+00:00", 
    "@type": "xsd:dateTime"
  },
  "oa:motivatedBy": [
    {
      "@id": "oa:commenting"
    }
  ]
}

@Treora
Copy link
Contributor Author

Treora commented Jul 28, 2014

A question now is how would we want the content negotiation to happen? It appears that there is no mimetype associated with JSON-LD "because it is just JSON". I do not agree with that decision, as it is a specific and well-defined usage of json, so there should be a mime type that uses the +json suffix [1]. Alas, application/rdf+json is already claimed by another rdf as json format [2].

An alternative to content negotiation is to simply have another URL to obtain the JSON-LD version, but that seems somewhat silly.

@tilgovi
Copy link
Member

tilgovi commented Jul 28, 2014

@tilgovi
Copy link
Member

tilgovi commented Jul 28, 2014

I'm happy to accept a PR that adds the API without adding to store.py first. CONNEG and REST in another step.

@tilgovi
Copy link
Member

tilgovi commented Jul 28, 2014

The instance data should be moved to the top of the class (jsonld_namespaces, jsonld_baseuri). Even though I understand you did it to match the class and property names of the data model, Python isn't RDF or JSON and so I feel the properties should be snake_case.

@Treora
Copy link
Contributor Author

Treora commented Jul 28, 2014

"application/ld+json"

Thanks. I was using exactly that one for now, but I could not remember where I had taken it from. I already made the CONNEG and REST stuff so if it's okay I'll just include it in this pull request.

@tilgovi
Copy link
Member

tilgovi commented Jul 28, 2014

@Treora make it a separate PR. The discussion is cleaner that way. And perhaps @nickstenning and @azaroth42 can take a look at this in isolation and we can get it merged more quickly.

@tilgovi
Copy link
Member

tilgovi commented Jul 28, 2014

If we want to continue to support Python 2.6 we can also add a conditional test against the Python version when declaring the requirements in setup.py if it's desirable to run tests on Travis over the JSON-LD work.

@azaroth42
Copy link
Member

A few simplifications are possible:

  • Just point to the W3C context in @context; l72 could read:
context = "http://www.w3.org/ns/oa-context-20130208.json"
  • L120/121, L134/135: Not sure if there's some magic, or if there'll be just one @type here? Looks like a bug from first read.
  • L110 makes the assertion that tags are about the Target, not about the annotation. I'm happy with that ... please can it be reflected in the documentation?
  • L150/152 could be just motivations.append("oa:(name)ing") -- the context does the mapping to @id
  • L169 should add a new @context to define annotator: prefix. Then its only included when it's needed, not all the time (and makes the @context line at the top correct). See this thread for more info, though: http://lists.w3.org/Archives/Public/public-linked-json/2014Jul/0011.html
    • L189 Semantify 👍
    • L218 Context already defines xsd:dateTime so no need to do it here too. (Yes, the spec document for serialization doesn't list this... That would be a bug!)

Thanks for the work!!

@azaroth42
Copy link
Member

Regarding the contexts, you could also do:

context = ["http://www.w3.org/ns/oa-context-20130208.json", {"annotator" : "http://annotatorjs.org/ns/"}]

And then ignore the comment about L169

@Treora
Copy link
Contributor Author

Treora commented Jul 29, 2014

Thanks for the comments. I did not know w3c published the OA JSON-LD context, that's a nice service, makes our code and JSON-LD documents cleaner.

@@ -40,6 +52,8 @@ class Annotation(es.Model):
__type__ = TYPE
__mapping__ = MAPPING

jsonld_baseurl = ''
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feels better to leave this out to me. Otherwise we might end up setting a base=''. I don't know what JSON-LD processors are supposed to do with that. Let the caller add it to the context if necessary, I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's nice to have some default setting for it, so it's immediately obvious that it can be set, there's a place for its documentation, it shows up in tab completion in IDEs, etcetera. Setting it to None seems to be the elegant approach here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then you have to guard the setting so you don't have null in the JSON there.
On Jul 29, 2014 4:09 PM, "Gerben" [email protected] wrote:

In annotator/annotation.py:

@@ -40,6 +52,8 @@ class Annotation(es.Model):
type = TYPE
mapping = MAPPING

  • jsonld_baseurl = ''

I think it's nice to have some default setting for it, so it's immediately
obvious that it can be set, there's a place for its documentation, it shows
up in tab completion in IDEs, etcetera. Setting it to None seems to be the
elegant approach here.


Reply to this email directly or view it on GitHub
https://github.com/openannotation/annotator-store/pull/83/files#r15558751
.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding the @base was already conditional on the trueness of jsonld_baseurl. (so in fact it would not even end up setting @base="" by default like you feared)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. Seems fine but unnecessary.

@tilgovi
Copy link
Member

tilgovi commented Jul 29, 2014

Looking really good!

@nickstenning
Copy link
Member

I'm very happy to defer to the experts (@azaroth42) on the subject of whether the serialization is good. It certainly looks very sensible to me.

That said, from a programming point of view, I think this should probably not be implemented as a property on Annotation, but as a separate concrete implementation of a Renderer interface, something like:

class OARenderer(object):
    def render(self, annotation):
        out = OrderedDict()
        out['@context'] = self.context
        out['@id'] = annotation.id
        out['@type'] = 'oa:Annotation'
        out['hasBody'] = self._has_body(annotation)
        ...
        return out

This will make it much easier to test, and avoid the problem of having numerous methods on the Annotation class which are nothing about annotations, per se, but about a specific representation of an annotation. As things currently stand in this PR, I think most programmers would be a bit surprised to find that the tags property returns a list of oa:Tag dicts.

@Treora
Copy link
Contributor Author

Treora commented Aug 10, 2014

From a programming point of view there could be many improvements, for example using something like rdflib and pyld could result in more flexible code than this hand-crafted JSON-LD. I just started with an easy first step.
A question is whether an RDF representation is considered just as an output format, or if it is to become the native format at some moment, in which case it would make some sense to keep these methods in the Annotation class.

I think that, to achieve the vision of a distributed annotation network on the web, I would like to have some sort of graph-based, semantic-web-minded storage system at some moment, rather than using Elasticsearch as a primary database with a non-standard data format that can be presented as OA JSON-LD when requested. However, this idea is so far away from the current annotator-store, which is at its core an Elasticsearch wrapper plus a REST API, that it probably makes more sense to start such a system from scratch.

I guess that for annotator-store it would be okay to just have a JSON-LD renderer, as you suggested, and consider it a nice compatibility extra, so OA-based systems can read from it.

@tilgovi
Copy link
Member

tilgovi commented Aug 10, 2014

Agreed on all counts, Gerben. I do think there's something to keeping the
separation between model and renderer, though. Not sure how that should
look.
On Aug 9, 2014 6:56 PM, "Gerben" [email protected] wrote:

From a programming point of view there could be many improvements, for
example using something like rdflib and pyld could result in more
flexible code than this hand-crafted JSON-LD. I just started with an easy
first step.
A question is whether an RDF representation is considered just as an
output format, or if it is to become the native format at some moment, in
which case it would make some sense to keep these methods in the Annotation
class.

I think that, to achieve the vision of a distributed annotation network on
the web, I would like to have some sort of graph-based, semantic-web-minded
storage system at some moment, rather than using Elasticsearch as a primary
database with a non-standard data format that can be presented as OA
JSON-LD when requested. However, this idea is so far away from the current
annotator-store, which is at its core an Elasticsearch wrapper plus a REST
API, that it probably makes more sense to start such a system from scratch.

I guess that for annotator-store it would be okay to just have a JSON-LD
renderer, as you suggested, and consider it a nice compatibility extra, so
OA-based systems can read from it.


Reply to this email directly or view it on GitHub
#83 (comment)
.

@nickstenning
Copy link
Member

Does the code need access to any private information about the model instance in order to render JSON-LD? If not, it probably shouldn't be part of the model class, for a number of different reasons (testability, readability, security to name a few).

More generally, given that a model object has a potentially limitless number of representations, I think it's clear that we couldn't keep adding representation code into the model.

But I'm totally agreed that there need to be some pretty major changes to the annotator-store if it's to be more widely used, and it's worth noting that this repository was originally meant to be a reference implementation of the storage API, not a production-ready storage application.

I'm already firmly of the opinion that using ElasticSearch as a persistence layer is ridiculous, and am hoping to find time to change that soon. That said, the direction I want to move in is a database (AKA postgres) and not a triplestore.

@tilgovi
Copy link
Member

tilgovi commented Aug 11, 2014

+1
On Aug 11, 2014 5:22 AM, "Nick Stenning" [email protected] wrote:

Does the code need access to any private information about the model
instance in order to render JSON-LD? If not, it probably shouldn't be part
of the model class, for a number of different reasons (testability,
readability, security to name a few).

More generally, given that a model object has a potentially limitless
number of representations, I think it's fairly obvious that we couldn't
keep adding representation code into the model.

But I'm totally agreed that there need to be some pretty major changes to
the annotator-store if it's to be more widely used, and it's worth noting
that this repository was originally meant to be a reference implementation
of the storage API, not a production-ready storage application.

I'm already firmly of the opinion that using ElasticSearch as a
persistence layer is ridiculous, and am hoping to find time to change that
soon. That said, the direction I want to move in is a database (AKA
postgres) and not a triplestore.


Reply to this email directly or view it on GitHub
#83 (comment)
.

@azaroth42
Copy link
Member

No complaints from me, about any of it :) OA is intended for
interoperability between systems, not necessarily an internal model.
Given some of the graph boundary issues, I think that some sort of record
store (postgres, mongo, whatever) will be easier for everything other than
graph specific queries. And they could be more easily done by building
just the graph that's needed rather than maintaining all of the information
in a triplestore just in case.

@BigBlueHat
Copy link

To @nickstenning's point about this repo being a reference implementation, I'd like to highlight the following--and perhaps move this discussion elsewhere soon. 😃

Despite what the README currently states, this repo is actually an implementation of the Store API expected by the Annotator.Store plugin. There are other implementations of the Store API, but there could also be other storage-focused plugins that traffic in completely different formats.

Perhaps a good way forward on all these fronts is:

  1. Tighten up relationship and docs between the Annotator.Store plugin and this repo.
  2. (Re)curate a list of Store API implementations--I've found a few lists, and have an implementation of my own that I've not added to any of them...yet! 😄
  3. Encourage (perhaps) the creation of more Annotator plugins to work with other storage endpoints: Linked Data Profile, PouchDB, and AtomPub seem like fun options. 😉

Regardless, we've got a solid JSON-LD representation built by @Treora and I'm excited to see it in use. 😸

@nickstenning
Copy link
Member

There are other implementations of the Store API, but there could also be other storage-focused plugins that traffic in completely different formats.

Yes! And there will need to be, because this naive "REST" API is not going to work for shipping annotations between providers in the bold new world of interoperable annotations we're aiming for. I'm looking forward to discussing this with you in more detail.

@azaroth42
Copy link
Member

Discussing ... on the Open Annotation community group list? :)

@nickstenning
Copy link
Member

Why yes, @azaroth42. You read my mind!

@azaroth42
Copy link
Member

👍 🍻

@nickstenning
Copy link
Member

So, just in case I came across negatively -- I think this is great, @Treora! I'll happily merge this (and the attendant code to expose this repr in the API) if the object creation moves into a separate annotation renderer class with its own tests.

@Treora
Copy link
Contributor Author

Treora commented Aug 11, 2014

No worries, you don't sound negative. I was not aware it was originally intended as a reference implementation. If you wish to keep it more like that, it's totally okay with me to keep it separate (I could put it in Hypothes.is instead, so my effort would not be useless). On the other hand, it's perhaps also nice if there's a reference implementation for converting annotator's annotations to OA.

Treora added 2 commits August 11, 2014 15:53
Annotations can now be serialised into JSON-LD formatted RDF, following
the data model spec: http://www.openannotation.org/spec/core/
Because JSON-LD spec recommends keeping @context at the top.
Treora added 7 commits August 11, 2014 15:54
Let's keep the default very generic, for example we don't even know if
users are Persons. Implementors can subclass Annotation to add more user
info (as with any of the properties).
@gergely-ujvari
Copy link
Contributor

Closing in favor of #119

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants