
Scoping Document

Jeffery Antoniuk edited this page Aug 25, 2017 · 6 revisions

General background

What are the objectives of the entity form?

  • Entity lookup (to include lookups to additional external sources and the soon-to-be implemented TEI entities)
    • person (TEI)
    • organization (TEI)
    • place (TEI)
    • title (MODS as of 2017-08-21)
  • Entity creation/modification forms (to allow CWRC users to create/edit TEI entities)

What problems is the entity form trying to solve? And why is the current one not good enough?

Current entity lookup dialogs:

  • have fewer lookup sources than desired (see Entity sources below)
  • are not easily configurable by people who install CWRC-Writer in another environment and want to use the entity lookups, but with additional/fewer/different sources

Current entity creation/modification forms

  • are tied to the CWRC schema
  • can't be configured to have required fields without modifying the schema (e.g., the standard name field can't be made required in the form without customizing the TEI to make the type attribute required on the name tag)

What is the envisioned list feature?

  • A list of projects is a way of solving the conundrum of Orlando, and likely REED, maintaining separate entity collections.

Specific questions:

  • what will the dialogs connect to?

    • entity creation/modification dialogs will connect to the entity collections within CWRC; these dialogs are relevant to CWRC users only, since CWRC-Writer external users will not be able to save to the CWRC environment.
  • what are the expectations of the interfaces they will be incorporated into?

    • CWRC Writer AND Islandora
  • what are the expectations on the back-end storage?

    • the dialogs will create raw XML (TEI valid) that would get saved into a Fedora datastream
  • what are the expectations on the back-end search

    • that the included TEI tags/attributes will be indexed by Solr (we need to define the fields and pass them on so they can be added to the search interface, e.g., autocomplete for the entity lookups in the MODS forms and the CWRC-Writer dialog lookups)
  • what are the sources of material, and the expectations regarding the different data interchange formats for the different data sources (how do we add new ones)?

    • see below - Entity Sources
    • in terms of EAP (entity aggregation page) - handle as VIAF
  • when will CWRC auth be required?

    • for entity creation/modification - all authentication will be handled from within CWRC
    • for entity lookups - if accessed by external partners - treated as anonymous users - will only see public entities
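To make the Solr expectation more concrete: once the indexed TEI fields are defined, an autocomplete lookup amounts to a prefix query against one of them. The sketch below is hypothetical. It assumes a Solr select URL, a non-tokenized (string) field called `person_name_s`, and that the Islandora `PID` field is returned, and none of these has been defined yet. It escapes Solr's special characters so that typed input is matched literally. Wildcard terms are not analyzed, so matching on a string field is case-sensitive. A tokenized field or Solr's suggester component may suit real autocomplete better.

```typescript
// Characters with special meaning in Solr's standard query parser.
const SOLR_SPECIAL = /[+\-!(){}\[\]^"~*?:\\\/&|\s]/g;

// Backslash-escape special characters so user input is matched literally.
function escapeSolr(term: string): string {
  return term.replace(SOLR_SPECIAL, (ch: string) => "\\" + ch);
}

// Build a prefix ("type-ahead") query URL for an entity lookup.
// solrSelectUrl and nameField are hypothetical: the indexed fields are still to be defined.
function buildAutocompleteUrl(
  solrSelectUrl: string,
  nameField: string,
  typed: string,
  rows: number = 10
): string {
  // Trailing * turns the escaped input into a prefix match on a string field.
  const q = `${nameField}:${escapeSolr(typed.trim())}*`;
  const params = [
    "q=" + encodeURIComponent(q),
    "fl=" + encodeURIComponent("PID," + nameField),
    "rows=" + rows,
    "wt=json",
  ];
  return solrSelectUrl + "?" + params.join("&");
}

console.log(buildAutocompleteUrl("https://example.org/solr/select", "person_name_s", "Pauline J"));
```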

Entity management for different CWRC-Writer implementations

Is this for both the sandbox version and the Islandora version? Are there any other CWRC installations? And are we assuming that someone who installs another CWRC-Writer with some other backend will create their own dialogs? They can use the Wikidata/CWRC dialogs, but if they want to add another entity store, then they'll modify the dialog code directly, i.e., create their own entity lookups that follow the API we've defined for an entity lookup component?

Mihaela:

Entity lookup: both Islandora and sandbox. Entity creation: both Islandora and sandbox (with the "switch" turned to Wikidata and the option to turn it back to CWRC disabled). No other CWRC installations for now but the CWRC main installation (CWRC beta). We are not assuming that someone who installs another CWRC-Writer with some other backend will create their own dialogs. They could use the "out of the box" dialogs, with the CWRC-defined lookups and the Wikidata entity creation option. If they want to customize the dialogs (e.g., adding additional entity lookups, saving their own entities somewhere different from Wikidata, etc.), they should be able to do so.

Dialogs customization

James:

Yes, anyone who wants to use the default dialogs can. But you said: "If they want to customize the dialogs (e.g., adding additional entity lookups, saving their own entities somewhere different from Wikidata, etc.), they should be able to do so." It's the "should be able to do so" part that's unclear. How should they be able to do so?

Mihaela:

On their end, provided the code of the out-of-the-box dialogs is well documented, which I am sure it will be.

James:

Still not clear on what you mean by 'on their end'. Would they clone our dialogs repo (i.e., this very Github repo), modify the javascript code, optionally publish to NPM, and then import their newly modified dialogs into the CWRC-Writer (as we do now, e.g., https://github.com/cwrc/CWRC-GitWriter/blob/master/src/js/app.js#L28-29)? I just want to be perfectly clear about this, because I think the early suggestion was that we would build in some more 'configurable' means to register new entity stores.

Mihaela:

I think the scenario you outline should be correct for the entity creation dialogs. To my understanding, the configurable way to register new entity stores that we talked about referred to the entity lookups. @SusanBrown, please correct me if I am wrong. Also, I think it would be highly complicated and time consuming if you tried to make the entity creation dialogs highly configurable, am I right, @jchartrand?

James:

Yes, introducing some way to allow people to configure entity creation dialogs with some other backend store would be extremely time consuming, and would make the code far more complicated. Even making entity lookup configurable would be very time consuming and complicated. In my opinion, the downsides (complexity, ongoing maintenance, using up all our magic dust to predict the future, i.e., what people might want) outweigh the benefits tenfold or more.

Susan:

Allow people with non-CWRC installations to "create their own entity lookups that follow the API we've defined for an entity lookup component"? Yes: support this by ensuring that the API and the way the entity dialogs interact with it are well documented.
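One way to document "the API we've defined for an entity lookup component" is as a small interface that third parties implement and register. The sketch below is hypothetical, because no such interface exists yet in the dialogs. Each lookup declares which entity types it supports, builds its own query URL and parses the response body. The dialog performs the HTTP request itself. Keeping the network call out of the component means a custom lookup can be tested in isolation.

```typescript
type EntityType = "person" | "organization" | "place" | "title";

interface LookupResult {
  uri: string;
  label: string;
  source: string;
}

// Hypothetical contract for a lookup component: it builds the query URL and
// parses the response body; the dialog performs the HTTP request itself.
interface EntityLookup {
  name: string;
  supports: EntityType[];
  queryUrl(type: EntityType, text: string): string;
  parse(responseBody: string): LookupResult[];
}

const registry: EntityLookup[] = [];

function registerLookup(lookup: EntityLookup): void {
  registry.push(lookup);
}

// Lookups to query when the user searches for a given entity type.
function lookupsFor(type: EntityType): EntityLookup[] {
  return registry.filter((l) => l.supports.indexOf(type) >= 0);
}

// Example third-party lookup against a made-up JSON search service.
registerLookup({
  name: "example",
  supports: ["person"],
  queryUrl: (type, text) =>
    `https://example.org/search?type=${type}&q=${encodeURIComponent(text)}`,
  parse: (body) =>
    (JSON.parse(body) as { id: string; name: string }[]).map((r) => ({
      uri: r.id,
      label: r.name,
      source: "example",
    })),
});
```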

Method of modification: "clone our dialogs repo (i.e., this very Github repo), modify the javascript code, optionally publish to NPM, and then import their newly modified dialogs into the CWRC-Writer"

It would be good if configuration could extend to allowing people to choose which of the entity stores that we point to they want. They won't want them all. That means we can incorporate lookups for i.Sicily (for instance, from that other big LOD epigraphy project, if they have entities as I am assuming they might) or from Pelagios, which won't be relevant to people working in more recent periods, and they won't have to appear in everyone's version of CWRC-Writer.
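The following is a minimal sketch of per-installation selection. It assumes a hypothetical `enabledSources` setting and uses illustrative source keys, with `pelagios` standing in for a period-specific lookup. The bundled order is preserved, so an installation can drop sources without restating the priority. Names that match no bundled lookup are reported rather than silently ignored.

```typescript
// Hypothetical per-installation setting listing the bundled lookups to show.
interface LookupConfig {
  enabledSources: string[];
}

// Lookups that ship with the dialogs, in default display order (illustrative keys).
const BUNDLED_SOURCES: string[] = ["cwrc", "viaf", "dbpedia", "wikidata", "geonames", "pelagios"];

function selectSources(config: LookupConfig): { active: string[]; unknown: string[] } {
  // Preserve the bundled order regardless of the order in the config.
  const active = BUNDLED_SOURCES.filter((s) => config.enabledSources.indexOf(s) >= 0);
  // Report names that don't match any bundled lookup instead of silently ignoring them.
  const unknown = config.enabledSources.filter((s) => BUNDLED_SOURCES.indexOf(s) < 0);
  return { active, unknown };
}

// A project working on recent periods might leave out Pelagios:
console.log(selectSources({ enabledSources: ["wikidata", "cwrc", "viaf", "isicily"] }));
```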

It would also be nice to have a quick-and-dirty lookup option that would allow someone to add a different entity lookup source by pointing at a standard SPARQL endpoint. It might produce ugly results, but it would be a start for those without more technical resources and a preferable alternative to adding other entity values by hand (and we should make it an option to do this rather than relying 100% on lookups).
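A quick-and-dirty SPARQL lookup could look roughly like the sketch below. It makes the following assumptions:

  • the endpoint labels its resources with `rdfs:label` (many endpoints do, but not all)
  • the query is sent as the `query` parameter of a GET request, as the SPARQL 1.1 Protocol allows, and the caller adds an `Accept: application/sparql-results+json` header
  • the response uses the standard SPARQL JSON results format

A substring `FILTER` over all labels is crude and can be slow on large endpoints. However, it works against any conforming endpoint without per-source configuration.

```typescript
// SPARQL 1.1 Query Results JSON format: one binding object per result row.
interface SparqlTerm { type: string; value: string }
interface SparqlJson { results: { bindings: { [variable: string]: SparqlTerm }[] } }

// Assumption: the endpoint labels its resources with rdfs:label.
function buildLabelQuery(text: string, limit: number = 10): string {
  // Escape backslashes and double quotes so the text is a valid SPARQL string literal.
  const literal = text.toLowerCase().replace(/\\/g, "\\\\").replace(/"/g, '\\"');
  return [
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>",
    "SELECT ?item ?label WHERE {",
    "  ?item rdfs:label ?label .",
    `  FILTER(CONTAINS(LCASE(STR(?label)), "${literal}"))`,
    `} LIMIT ${limit}`,
  ].join("\n");
}

// SPARQL 1.1 Protocol: a query can be sent via GET in the "query" parameter.
function sparqlGetUrl(endpoint: string, query: string): string {
  return endpoint + "?query=" + encodeURIComponent(query);
}

// Turn result rows into the same {uri, label} pairs the other lookups show.
function parseSparqlResults(body: string): { uri: string; label: string }[] {
  const json = JSON.parse(body) as SparqlJson;
  return json.results.bindings
    .filter((b) => b.item !== undefined && b.label !== undefined)
    .map((b) => ({ uri: b.item.value, label: b.label.value }));
}
```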

Inventory of CWRC instances

James:

You also said "No other CWRC installations for now but the CWRC main installation (CWRC beta)". Is this CWRC beta installation another name for the Islandora version, or is it yet another version? Just so we are all clear, could you explicitly list the installations that we are talking about? Even better would be specific IP addresses for the installs. The sandbox, for example, is http://208.75.74.217.

Mihaela:

Precisely. I meant the installation at beta.cwrc.ca. We also have a number of additional sandboxes which are mirrors of beta at different stages of the beta code, but I don't think we need the new entity dialogs to be backward compatible with any of these mirrors. @jefferya, please correct me if I am wrong and also provide the IP addresses for the installations you think should be compatible with the new (TEI) dialogs.

James:

Again, to be perfectly clear, is beta.cwrc.ca the Islandora install that Andrew/Jeff are working on upgrading to the new CWRC-Writer? So the two installations that will use the new dialogs (at least initially) are beta.cwrc.ca and the sandbox (http://208.75.74.217/)?

Susan:

Confirmed, seeing if @jefferya has anything to add.

Jeff:

  • Production, yes, but first starting with dev/test servers (e.g., Andrew has a local Vagrant box of Islandora with the CWRC-Writer and integration modules). All the dev and test boxes should be updated with the test TEI entities, including the multisites; we need to liaise with modernistcommons.

CWRC entity service API

We talked about the API for the CWRC entity service (to do lookups and to author new entities) a bit during our last meeting, but I don't know if we ever identified the best documentation for the Entity API? Could someone identify that here in this issue (i.e., provide the URL for the docs)?

Mihaela:

According to Jeff (email approx. 1 week ago), the best bet for documentation is https://github.com/cwrc/cwrc_entities#entities or the CWRC-Dialogs GitHub repository.

XML schemas

James:

In those docs, it says that the data that should be passed into a create entity call is "XML entity object as per the CWRC schema". Just to tie this off, could you give me the link to that schema (to be sure I've got the right one, and the right version, etc.)?

Mihaela:

The CWRC entity schema is here: http://cwrc.ca/schemas/entities.rng. However, we want to replace these CWRC entities with TEI entities, so this schema will become obsolete once we transform all the existing CWRC entities into TEI entities and once the new dialogs that create TEI entities are implemented. I expect to finish drafting the TEI entity templates by the end of this week, after which @SusanBrown and I talked about consulting with some prosopography experts on the resulting templates. To give you an idea of the look and structure of a TEI entity, please refer to the model I created for the TEI person entity: https://github.com/cwrc/CWRC-Schema/blob/master/templates/person_entity_TEI_template.xml
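Whatever the final templates look like, the dialogs will serialize user input into TEI XML before it is saved to a Fedora datastream, so every value must be XML-escaped. The sketch below uses a hypothetical `buildPersonEntityXml` helper that emits only a `<person>` with a `<persName>` in the TEI namespace. It is deliberately much smaller than the CWRC TEI person template linked above and is not a substitute for it.

```typescript
// Escape the five characters that are special in XML text and attribute values.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Hypothetical minimal TEI person entity; the real CWRC template has many more elements.
function buildPersonEntityXml(forename: string, surname: string): string {
  return (
    '<person xmlns="http://www.tei-c.org/ns/1.0">' +
    "<persName>" +
    `<forename>${escapeXml(forename)}</forename>` +
    `<surname>${escapeXml(surname)}</surname>` +
    "</persName>" +
    "</person>"
  );
}

console.log(buildPersonEntityXml("Pauline", "Johnson"));
```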

Absolute URIs

James:

And also, the docs define relative paths for the GET, POST, and PUT. Could someone tell me the absolute URI (i.e., the server where the best version of the entity service is running - so I can start hooking up the new dialogs to that service)?

Mihaela:

Not sure what you mean by absolute URI (i.e., the server where the best version of the entity service is running). We will, I think, end up having multiple entity collections within this collection in beta: http://beta.cwrc.ca/islandora/object/cwrc%3AentityCollection.

Jeff:

  • The URL of the entity endpoint should be part of the conf option passed to the dialog (so it will work on test servers). Examples:
    • //{SERVER_NAME}/islandora/cwrc_entities/v1/person/{PID}
    • //beta.cwrc.ca/islandora/cwrc_entities/v1/person/{PID}
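The following is a minimal sketch that builds the endpoint URL from a configuration object instead of hard-coding the server. It follows the `//{SERVER_NAME}/islandora/cwrc_entities/v1/person/{PID}` pattern above. The `EntityServiceConf` shape is hypothetical. Percent-encoding the PID is also an assumption about what the endpoint accepts: it turns `:` into `%3A`, as in the Islandora object URL quoted earlier on this page.

```typescript
// Hypothetical dialog configuration passed in by the host page.
interface EntityServiceConf {
  serverName: string; // e.g. "beta.cwrc.ca", or a dev/test host
}

// Builds //{SERVER_NAME}/islandora/cwrc_entities/v1/{type}/{PID}.
function entityUrl(conf: EntityServiceConf, type: string, pid: string): string {
  // Percent-encoding the PID (":" becomes "%3A") is an assumption about the endpoint.
  return `//${conf.serverName}/islandora/cwrc_entities/v1/${type}/${encodeURIComponent(pid)}`;
}

console.log(entityUrl({ serverName: "beta.cwrc.ca" }, "person", "cwrc:1234"));
```

A protocol-relative URL keeps the same scheme (http or https) as the page that embeds the dialogs.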

Changes to the back end

James:

Both suggest that the entity service will at some point undergo pretty significant changes (to add support for multiple collections and to change the schema), and so perhaps I should hold off until that is ready? If so, should I still continue on with the wikidata portion?

Mihaela:

The backend (i.e. Islandora) changes to the entity management system are in Jeff's work queue and in the work queue of the yet-to-hire Drupal developer. Jeff has a few other priorities ahead of it (triplestore swap, Compute Canada cloud migration, etc.), so it would be useful if you could work on the wikidata dialogs independently. Do you think this would be an effective way to go?

James:

Yes, I'll work on the Wikidata bit independently. There is still the outstanding question I asked in yet another email, i.e., should we really be building dialogs to interact with Wikidata for entity creation/editing, or would it not be significantly easier (we wouldn't have to deal with authentication, for example) to just point people at the existing Wikidata page for entity creation when they do want to create an entity in Wikidata? Again, Wikidata search would still be directly incorporated into the CWRC-Writer.

Susan:

I am nervous about delaying for reasons of relative priority.

Hopefully once we get the schema in place we can move forward to some extent.

When @jefferya is back we should see whether the multi-collection thing will complicate things much. I would hope not, in which case we might give it priority so that this work can go forward. It's essential to Orlando migration too.

Jeff:

  • The first step I see is identifying all the changes required to the back end: not just the entity dialogs (schema and project) but also things related to editable templates and what's needed for integrating images into CWRC-Writer for side-by-side usage (if that's the right word). Also, the NPM CWRC-Writer version. Some of these features may require substantial changes, and it might be easier to start from scratch rather than try to understand the cwrc/islandora_cwrc_writer and cwrc/islandora_cwrc_document modules.
  • multi-collection: there is a note below

Entity modifications

James:

And actually, one more question: the entity docs define the GET (search), POST (create), and also a PUT (update). I assume we aren't handling updates as well with this? Presumably if someone wants to update an entity entry they could do so through the main CWRC entity forms (i.e., outside of the CWRC-Writer)?

Mihaela:

I believe we want to be able to also edit existing entities in the dialogs (this was a feature available at some point in the old dialogs, I think). At the moment, there is no option to edit the entities from the CWRC interface.

James:

As I outlined in an email yesterday, has anyone considered building an independent web page for interacting (search, add, edit) with the entity service? That might be a better use of time than building editing into the CWRC-Writer.

Mihaela

To some extent, such an interaction service is already available in CWRC beta, as users can search and facet for entities within the repository. Any entity URI redirects to what we call the entity aggregation page for that entity. This is an on-the-fly web page that uses API calls to grab and list all the objects in the repository that reference that entity, together with details from the entity record itself (e.g., Pauline Johnson, but the EAP service is extremely slow now). As for creating and editing entities within CWRC, this has proved problematic in the past. As you know, there is no easy way in Islandora to create/modify XML documents (hence the complexity of the CWRC-Writer integration). There are some Islandora XML forms created by EMiC that were meant to be used with their critical edition toolkit. To my knowledge they were never implemented in CWRC, and cursory testing a while back found them to be buggy. Another option we considered was to rig the CWRC-Writer entity creation/modification dialogs to be called from within the CWRC interface, but I am not sure where that went (I can dig it out and get back to you).

James:

It has to be easier to create an independent page to add/edit entities to cwrc than to do it through the cwrcWriter? And if the entity service (edit/add) is to be usable by other projects, surely there's got to be an independent page?

Susan:

I believe that the "Modify" button for editing entities always took one to a separate page. OK to continue with that.

Jeff:

"Modify" allowed editing within the Dialogs. HTTP POST is both create and edit. HTTP was never used.

CWRC Authentication for the entity service

James:

How does authentication work with the entity service? The docs only mention that a 403 might be returned: "In general 403 means an authenticated user attempted some action and was denied. Either the authenticated user does not have a Drupal role that grants him/her permission to perform that action, or XACML denied the action for that user." Is there any more documentation about how to authenticate against the service, but outside of Drupal?

Mihaela:

I think it's both: the authenticated user's Drupal role should have the permission to perform that action AND the XACML policy of the collection where (s)he wants to create the entity has to give manage privileges to the authenticated user's Drupal role. Have you checked the old dialogs GitHub repository for authentication documentation? If what's there is not enough, I can dig in and ask Jeff.

James:

Yes, there is some documentation in the readme of https://github.com/cwrc/CWRC-Dialogs, but I am admittedly still confused. If Jeff has any more guidance that'd be great, and I'll also ask Andrew. We should get this documentation into the repo for the cwrc entity service itself (https://github.com/cwrc/cwrc_entities) since I think the service is meant to be used independent of the dialogs? But again, it sounds like this service will soon change anyhow (to support multiple collections and to change to a TEI schema) so maybe the documentation can be updated then? Any timeline on when the cwrc entity service will be updated?

Mihaela:

Don't have a precise timeline for this, but I suspect best optimistic prediction would be in the fall.

James:

OK, we should split the CWRC entity portion off from this issue into its own GitHub issue, so we know to come back to it in the fall?

Susan:

Agreed that it will be fall. And we may want to speed things up by pulling you into working on it more directly.

Jeff:

@James, can you clarify your question in this section? What parts are confusing? Are there specific calls that are confusing?

  • The entity REST api is completely independent of the dialogs.

  • 403 message depends on the setup of the Drupal permissions

  • All documentation related to entities in https://github.com/cwrc/CWRC-Dialogs should be considered obsolete and useless (it only causes confusion).

  • Authentication: using the general authentication service within Drupal that is not specific to any one module - does this help answer your question James?:

    • within the CWRC repo domain (i.e., within Drupal) the dialogs will not need authentication, as Drupal adds the session information to the call
    • my understanding is that the dialogs will not be used with authentication outside CWRC (i.e., public objects will be visible, but no editing). If this is wrong, see the notes here (private link)
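Requests made from within the CWRC domain carry the Drupal session, so the remaining client-side task is to turn the service's responses into feedback. The following is a minimal sketch with a hypothetical `interpretStatus` helper. Only the meaning of 403 comes from the entity service docs quoted above; the other messages are illustrative.

```typescript
interface ServiceOutcome {
  ok: boolean;
  message: string;
}

// Hypothetical mapping from entity-service HTTP status to dialog feedback.
function interpretStatus(status: number): ServiceOutcome {
  if (status >= 200 && status < 300) {
    return { ok: true, message: "Entity saved." };
  }
  if (status === 403) {
    // Documented meaning: authenticated, but denied by a Drupal role permission or by XACML.
    return { ok: false, message: "You do not have permission to save to this collection." };
  }
  return { ok: false, message: `The entity service returned HTTP ${status}.` };
}
```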

Entity sources

Could we list here the various entity sources that should be used for each of the entity types? For people it looks like VIAF, CWRC, and Wikidata? How about the other entity types?

Susan:

Entity sources for now, listed in order of priority:

  • Person: CWRC, VIAF, Getty ULAN (http://www.getty.edu/research/tools/vocabularies/ulan/), DBpedia, Wikidata
  • Organization: CWRC, VIAF, DBpedia, Wikidata
  • Place: CWRC, GeoNames, Google GeoCode, DBpedia, Wikidata
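If results from several sources are shown in one list, the priority order above could drive the merge. In this sketch the source keys are illustrative. Dropping a result whose URI has already appeared from a higher-priority source is an assumed rule, not something decided in this discussion.

```typescript
type EntityType = "person" | "organization" | "place";

// Sources per entity type in the priority order listed above (keys are illustrative).
const SOURCE_PRIORITY: Record<EntityType, string[]> = {
  person: ["cwrc", "viaf", "getty_ulan", "dbpedia", "wikidata"],
  organization: ["cwrc", "viaf", "dbpedia", "wikidata"],
  place: ["cwrc", "geonames", "google_geocode", "dbpedia", "wikidata"],
};

interface Hit {
  uri: string;
  label: string;
}

// Concatenate per-source results in priority order, skipping URIs already seen
// (the de-duplication rule is an assumption). Sources not listed are ignored.
function mergeByPriority(
  type: EntityType,
  hitsBySource: { [source: string]: Hit[] }
): (Hit & { source: string })[] {
  const seen: { [uri: string]: boolean } = {};
  const merged: (Hit & { source: string })[] = [];
  for (const source of SOURCE_PRIORITY[type]) {
    for (const hit of hitsBySource[source] || []) {
      if (seen[hit.uri]) continue;
      seen[hit.uri] = true;
      merged.push({ uri: hit.uri, label: hit.label, source: source });
    }
  }
  return merged;
}
```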

We will almost certainly want to add others down the line as possibilities (see what I said about being able to choose which ones you want to show up in the dialogs for a particular installation of CWRC-Writer).


Project logos:

What are the boxed X's beside the names in the entity lookup dialog in your sketch?

Mihaela:

Project logos, which should populate based on the entity collection the entity is stored in (in the example included in the sketch, it could be the CWRC logo, the Orlando logo or the REED logo):

  • CWRC: http://cwrc.ca/logos/CWRC_logos_2016_versions/CWRCLogo-Vert-FullColour.png
  • Orlando: see attached
  • REED: http://reed.utoronto.ca/ (Susan will get the logo file from them; will update you when we have it)

Jeff

  • perhaps consider a lower res image to improve initial load time?

Form mockups clarifications:

The top right corner of the first page of your sketch has a radio button (or similar) to switch between CWRC and Wikidata, with text: 'Where to create. Both options should be available to project members only', but the next page of your sketch has a box that reads 'You have selected a project you don't have access to. Please select another one.', which kind of suggests that someone who wasn't a project member was shown an option that they shouldn't have been shown?

Mihaela:

Page 2 is the result of selecting the "wrong" project in the responsibility dropdown (at the bottom of the form). The "switch" at the top would be available to CWRC members who are also members of at least one CWRC-affiliated research project. Only people who meet this double condition could create new CWRC entities, though they may choose to create a new entity in Wikidata instead, hence the availability of the "switch". Some projects within CWRC will have separate entity collections (i.e., collections that they prefer to curate separately from the pool of CWRC entities). In the backend of the CWRC entity system (i.e., Islandora), we will set up separate entity collections for these projects and give editing permissions for those collections only to users who are members of those projects (e.g., only an Orlando member should be able to create an entity in the Orlando person entity collection). Members of any CWRC-affiliated project could save a new entity to the CWRC entity collections. See the if...else rules outlined on page 1. On page 1, the user is a CWRC member and a member of a CWRC-affiliated project, since she has the option to save either to the CWRC entity management system or to Wikidata. She selects REED in the responsibility project dropdown. This would prompt the system to save the entity in the REED entity collection, but only if she has editing privileges for that collection (i.e., is a member of that project). If she doesn't (if she's a member of CEWW, for example, but not of REED), she should get the feedback outlined on page 2. (In this particular case, she should be able to save to the CWRC entity collection only.)
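The if...else rules described above can be restated as a small decision function, which may help when checking the mockups against the intended behavior. Everything here is hypothetical: the `User` shape, the collection identifiers, and the assumption that REED and Orlando are the only projects with separate collections (as stated elsewhere on this page). Real enforcement would still happen in the back end through Drupal roles and XACML. The dialog only decides what to offer and what feedback to show.

```typescript
interface User {
  isCwrcMember: boolean;
  projects: string[]; // CWRC-affiliated research projects the user belongs to
}

// Projects that curate their own entity collection; collection ids are hypothetical.
const SEPARATE_COLLECTIONS: { [project: string]: string } = {
  REED: "reed_entities",
  Orlando: "orlando_entities",
};

type SaveDecision =
  | { kind: "save"; collection: string }
  | { kind: "error"; message: string };

function decideCollection(user: User, target: "cwrc" | "wikidata", project: string): SaveDecision {
  // Wikidata creation does not depend on CWRC project membership.
  if (target === "wikidata") {
    return { kind: "save", collection: "wikidata" };
  }
  // CWRC entities require CWRC membership AND membership in at least one affiliated project.
  if (!user.isCwrcMember || user.projects.length === 0) {
    return { kind: "error", message: "Only CWRC members affiliated with a CWRC project can create CWRC entities." };
  }
  // Projects without their own collection save to the shared CWRC collection.
  if (!Object.prototype.hasOwnProperty.call(SEPARATE_COLLECTIONS, project)) {
    return { kind: "save", collection: "cwrc_entities" };
  }
  // Separate collections are writable only by members of that project (page 2 feedback otherwise).
  if (user.projects.indexOf(project) < 0) {
    return { kind: "error", message: "You have selected a project you don't have access to. Please select another one." };
  }
  return { kind: "save", collection: SEPARATE_COLLECTIONS[project] };
}
```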

James:

Is there a URI from which I can get the list of projects to show in the dropdown (rather than hard-coding the project names into the form)? Or will there be such a URI? If it is yet to come, when will it be available?

Mihaela:

There is no such list at this point. Since we are not going to list all the CWRC projects, but only the ones that will have curated entity collections (REED, Orlando and CWRC), I think it makes more sense to have them hardcoded into the dialogs. If you remember the dialog mockups I shared with you this week, the selection of a project name would dictate the collection in which the entity would be saved.

James:

Deciding who is authorized to access which collections will, I guess, be handled somehow in Fedora? I think my question here is probably the same as the one I asked just above in 2d, i.e., how does authentication/authorization work with the CWRC entity service when the service is used outside of Islandora (Drupal), i.e., used simply as a REST service? Or is there documentation about how something like the CWRC-Writer, which is embedded in the Drupal page, can make authenticated XMLHttpRequests to the server?

Susan:

Confirmed

Jeff:

  • Projects: I'm against hard-coding the projects in the dialogs. My feeling is that this should be a JSON-formatted config file passed to the dialogs by the Drupal instantiation. It could be hard-coded in the Drupal module or, if there is any chance the list may change, made user-editable via the Drupal admin page for the module. Ideally, in my opinion, we keep the CWRC dependencies close to the CWRC repo.
  • Authentication:
    • within the CWRC repo domain (i.e., within Drupal) the dialogs will not need authentication, as Drupal adds the session information to the call
    • my understanding is that the dialogs will not be used with authentication outside CWRC (i.e., public objects will be visible, but no editing). If this is wrong, see the notes here (private link)
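If the Drupal module passes the project list in as JSON, the dialogs only need to validate and read it. The field names below (`id`, `label`, `collectionPid`, `logoUrl`) and any collection PIDs are hypothetical. Including a logo URL would also cover the project logos in the lookup results. The example reuses the CWRC logo URL given earlier on this page together with a made-up PID.

```typescript
// Hypothetical shape of the project list the Drupal module would pass to the dialogs.
interface ProjectConfig {
  id: string;
  label: string;
  collectionPid: string; // Fedora collection that stores this project's entities
  logoUrl?: string;      // shown next to lookup results from this collection
}

// Validate the JSON up front so a bad config fails loudly instead of in the UI.
function parseProjectConfig(json: string): ProjectConfig[] {
  const data: unknown = JSON.parse(json);
  if (!Array.isArray(data)) {
    throw new Error("project config must be a JSON array");
  }
  return data.map((p: any, i: number) => {
    if (typeof p.id !== "string" || typeof p.label !== "string" || typeof p.collectionPid !== "string") {
      throw new Error(`project config entry ${i} needs string id, label and collectionPid`);
    }
    return {
      id: p.id,
      label: p.label,
      collectionPid: p.collectionPid,
      logoUrl: typeof p.logoUrl === "string" ? p.logoUrl : undefined,
    };
  });
}

// Example payload; "cwrc:examplePersonEntities" is a made-up PID.
const projects = parseProjectConfig(
  '[{"id":"cwrc","label":"CWRC","collectionPid":"cwrc:examplePersonEntities",' +
    '"logoUrl":"http://cwrc.ca/logos/CWRC_logos_2016_versions/CWRCLogo-Vert-FullColour.png"}]'
);
console.log(projects[0].label);
```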

Authentication for entity creation within CWRC

How do we know whether someone who is using the sandbox belongs to a given project or not (and so whether we should show things specific to that project)? The authentication for the sandbox only authenticates with GitHub. I suppose we could have GitHub projects for each editing project, and check against the GitHub project membership?

Mihaela:

The question is moot, I think, since this functionality (creating/editing entities within the CWRC entity management system) would only be available within the CWRC integrated CWRC-Writer. The GitHub sandbox would have lookups to all the sources listed under Entity sources above, but the entity creation form for the sandbox would only have the Wikidata option available. Last but not least, external partners installing CWRC-Writer on their platforms will make their own decisions about how to adapt the dialogs to support saving to their entity management systems.

James:

Is it definitely the case that "The question is moot, I think, since this functionality (creating/editing entities within the CWRC entity management system) would only be available within the CWRC integrated CWRC-Writer."? Couldn't there feasibly be a CWRC-affiliated project that stores its files in GitHub, and so uses the sandbox, but still wants to use the CWRC entity system? Or is that getting too complicated?

Mihaela:

I don't believe there would ever be a case where we would want sandbox users to be able to manage the CWRC entities, given the fact that the management of the CWRC entities has been restricted by a CWRC research board ruling to CWRC members who are also affiliated with at least one CWRC research project.

Susan:

Confirmed