add paratext systemId elements to metadata 2.1.1 #17
Comments
See also #8 (review), under "## add new elements to text metadata 2.0 to capture alternative project types". |
@klassenjm how should we proceed on this to unblock Lois who is trying to upload to DBL a (transliteration) project that has already been uploaded? (https://thedigitalbiblelibrary.org/entry?id=19b20027201977c5)
|
My current thoughts: That we not support upload of the following as regular DBL entries (resource only uploads excepted):
It would be very nice if we could retain basedOn. What do we capture there? The PT project GUID? a name? If we could capture the GUID and see if that was already an entry in DBL - and then set up a relation in the DBL metadata -- would that be good? (I think so -- but I don't feel like I want to come across as though it's a requirement yet). |
@ericpyle @klassenjm Could we do this on the wishlist so we have some hope of finding the discussion when we come to make the decisions? https://github.com/ubsicap/dbl-archive-validation/blob/master/v2/2_1_1/wishlist.md |
Yes |
@klassenjm I know that Biblica and others were hoping we'd support Auxiliary, so they have a workflow that supports forking active projects before uploading them to DBL. I don't remember where that discussion landed given that PT Registry does not host those projects, except that perhaps the PT uploader could borrow the needed metadata from basedOn (unless that just is not workable in all cases). Also, fwiw, DBL already has GlobalAnthropologyNotes as resourceOnly. Certainly the PT uploader should have certain precautions about projects that typically get uploaded, but resourceOnly does allow non-typical projects. We aren't technically running the same schema validation for resourceOnly uploads, but at the same time we are using the same metadata structure for transporting metadata with resourceOnly uploads. |
@mvahowe sorry, please move my comment to the appropriate place. I was expecting a pull-request type place to do the comment, but it doesn't look like that's what we can do on the document you linked to? |
@ericpyle I changed my "current thoughts". Will look at updating the wishlist. |
@mvahowe I'm a little confused about how to maintain a discussion in a wishlist? Please advise. @klassenjm what are your "current thoughts"? Re: basedOn.
Both, as you can see in the example snippet at the top:
I had asked @mvahowe or you about this in the past re: relationships element, and was advised against that (can't remember the reason). Perhaps it's in the old trello card. It would be good to support some kind of actionable linkage on the DBL webpage, at least. |
@ericpyle, add your comments under the others and make a PR for the changes, which I'll approve. (Adding stuff to the wishlist doesn't mean we'll do it, for now we're just collecting the ideas.) |
@ericpyle, insofar as I understand the problem we are trying to solve this looks ok. My main question is how urgently this needs to happen. If it's blocking something, we need to ship 2.1.1, which means discussing the other possible changes, finalizing the spec, consulting with partners, ensuring we can migrate all existing entries, writing the scripts to do that migration and then maybe migrating everything. How urgently does that need to happen? |
@mvahowe it's not urgent as far as 2.1.1 goes. I would just like confirmation of the data structure for projectType and basedOn so I can start using it for my resourceOnly uploads (which is not consumed by LCH partners). I suppose I could version my resourceOnly uploads metadata as "2.1.1" to be more consistent with the specification. Or maybe I could use "2.1.1r" to indicate it's not actually 2.1.1 but 2.1.1-like. |
@ericpyle But doesn't using them for resourceOnly uploads mean that the server needs to validate using 2.1.1? If so I think we need to fix the specification of 2.1.1 first. If we don't, what happens when we do fix the specification and 2.1.1 then means something else? |
@mvahowe I'm planning to do a Python-based validation or just a Schematron validation for resourceOnly uploads. Otherwise we'd need to add resourceOnly as another expression of metadata, but I wouldn't want you to feel the need to add to or refactor what you've already done for all the other schemas. |
I'm quite happy to add a resourceOnly variant to the schema set and I suspect it will create less confusion in the long term (because, eg, any operation we run across all entries will need all entries to be valid according to the same schema.) |
@mvahowe if we made a resourceOnly variant, would you propose that gets added to type/isParatextResourceOnly? |
@ericpyle I think we could find a pithier label but, yes, something like that. |
@mvahowe can you suggest a pithier label? (Many regular uploads can become Paratext resources.) |
@mvahowe @smorrison one thing we need to think about is how to handle book lists for resourceOnly uploads. These will only have source/source.zip files. However, we do need some way for canons data to be able to communicate which books the archivist would like DBL to include in the pt resource download. Perhaps that's just a matter of every src pointing to "source/source.zip" and role being the book name? |
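A minimal sketch of the idea floated above (every `src` pointing to "source/source.zip", with `role` carrying the book name). The `structure` and `content` element names and the helper `resource_only_content` are assumptions made for illustration; nothing here is confirmed as the 2.1.1 layout.

```python
import xml.etree.ElementTree as ET


def resource_only_content(book_codes, src="source/source.zip"):
    """Build one hypothetical <content> entry per requested book.

    Every entry shares the same src (the single uploaded archive);
    only the role attribute, holding the book code, differs.
    """
    parent = ET.Element("structure")  # assumed container name
    for code in book_codes:
        ET.SubElement(parent, "content", {"src": src, "role": code})
    return parent


structure = resource_only_content(["GEN", "EXO", "MAT"])
print(ET.tostring(structure, encoding="unicode"))
```

Under this shape the book list is recoverable from the `role` attributes alone, since `src` no longer distinguishes one book from another.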
When someone does a resourceOnly upload -- if I understand what happens today -- they need a Canon in Paratext, and Paratext only includes in the upload the books mentioned in the Canon. What's the simplest way in the metadata we can refer to that list of books? What do resource downloads do today to know which books should be downloaded (the total of books mentioned in all Canons, I expect)? Does it look at publications for that? If so, could we create one default publication for resourceOnly uploads? |
@klassenjm yes, a canon is needed, although by default in the absence of a project canon, an adhoc in-memory canon will be created on the fly based on all books present. That currently populates the contents/bookList in metadata 1.5 which fills a bookList table. That table is used on download to know which books should be downloaded. For regular text uploads (not resourceOnly), we also append (on the fly) any peripheral books to what was listed in the bookList (since peripherals are not allowed in regular uploads). In typical metadata 1.5 > 2.1 transformations/migration the old content/bookList maps to publication/canonicalContent and the order is given in structure/content. However, unless we are OKAY with letting peripherals in resourceOnly uploads also be listed in canonicalContent, we'd need to provide either another table or at least another metadata source which is joined to provide the full bookList for resourceOnly uploads. Personally, I'm OKAY with canonicalContent being used this way, since it "just works" in the current system, but I understand why @mvahowe might think that's insane |
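A rough sketch of the book-list behavior described in the comment above, as one possible reading of it. The function names, the `PERIPHERALS` subset, and the way stored books are passed in are all assumptions for illustration, not the actual DBL implementation.

```python
# Assumption: an illustrative subset of peripheral book codes.
PERIPHERALS = {"FRT", "INT", "BAK", "CNC", "GLO", "TDX", "NDX", "OTH"}


def upload_book_list(canon_books, books_present):
    """Book list recorded at upload time.

    Uses the project canon when there is one; otherwise falls back to an
    ad hoc canon made of every book present in the project.
    """
    return list(canon_books) if canon_books else list(books_present)


def download_book_list(book_list, books_stored, resource_only):
    """Books to include in a resource download.

    For regular (non-resourceOnly) uploads, peripheral books that are stored
    but not listed are appended on the fly. For resourceOnly uploads the
    recorded list is used as-is.
    """
    books = list(book_list)
    if not resource_only:
        books += [b for b in books_stored if b in PERIPHERALS and b not in books]
    return books
```

For example, `download_book_list(["GEN"], ["GEN", "FRT", "EXO"], resource_only=False)` would add `FRT` but not `EXO`, because only peripherals are appended beyond the recorded list.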
@ericpyle I believe we've now done this? |
@mvahowe we still need |
@mvahowe so, I guess |
@mvahowe although to be more valid,
But if that's too hard to pull off you can include them as well. Perhaps you could catch that in schematron check instead? |
@ericpyle This seems to be one of the things that RelaxNG does make easy. |
In metadata 2.1.1:
Where optional <projectType> can be Standard|Daughter|StudyBible|StudyBibleAdditions|BackTranslation|Auxiliary|TransliterationManual|TransliterationWithEncoder|ConsultantNotes|GlobalConsultantNotes|GlobalAnthropologyNotes
Where optional <basedOn> has required <name> (lenGe2String) and required <id> (ptId)
In text metadata 1.5:
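The 2.1.1 constraints listed above (optional `<projectType>` from a fixed list, optional `<basedOn>` with required `<name>` and `<id>`) could be checked in Python roughly as follows. This is a sketch under assumptions: that both elements sit inside the paratext `systemId` element, that `lenGe2String` means a string of at least two characters, and that `ptId` is a 40-character hex string. The `PT_ID` pattern and the function name are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Allowed values, copied from the issue description.
PROJECT_TYPES = {
    "Standard", "Daughter", "StudyBible", "StudyBibleAdditions",
    "BackTranslation", "Auxiliary", "TransliterationManual",
    "TransliterationWithEncoder", "ConsultantNotes",
    "GlobalConsultantNotes", "GlobalAnthropologyNotes",
}

# Assumption: ptId is a 40-character lowercase hex string.
PT_ID = re.compile(r"^[0-9a-f]{40}$")


def check_paratext_system_id(system_id):
    """Return a list of problems found in a paratext systemId element."""
    problems = []

    # projectType is optional, but must be one of the listed values if present.
    project_type = system_id.find("projectType")
    if project_type is not None:
        value = (project_type.text or "").strip()
        if value not in PROJECT_TYPES:
            problems.append(f"unknown projectType: {value!r}")

    # basedOn is optional, but needs both name and id if present.
    based_on = system_id.find("basedOn")
    if based_on is not None:
        name = based_on.findtext("name")
        if name is None or len(name.strip()) < 2:
            problems.append("basedOn/name missing or shorter than 2 characters")
        pt_id = based_on.findtext("id")
        if pt_id is None or not PT_ID.match(pt_id.strip()):
            problems.append("basedOn/id missing or not a ptId")

    return problems


sample = ET.fromstring(
    '<systemId type="paratext">'
    "<projectType>TransliterationManual</projectType>"
    "<basedOn><name>Example Base</name><id>" + "0" * 40 + "</id></basedOn>"
    "</systemId>"
)
print(check_paratext_system_id(sample))
```

The same rules could equally be expressed in the RelaxNG schema or a Schematron check, as discussed in the thread; the Python form is only meant to make the intended constraints concrete.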