-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
resourceOnly variant plus validating example document #30
Conversation
Thanks, Mark. See my comments below. @mvahowe wrote:
Should we call this type medium
I had mentioned elsewhere that
Sure. it's the same list as in #17
Perhaps we didn't want to allow those roles for regular uploads? Should be okay as long as usx validation will catch any attempts to upload peripherals for now.
Okay. I'd still favor a new element under @smorrison: have any opinions on this?
It's my understanding that @bryceallison advised against fragment ids. He'd rather we just break them into separate files.
I don't have a strong feeling about keeping this or not. I'm certainly okay with populating |
<archivistName>Lois Gourley</archivistName> | ||
<dateArchived>2017-07-19T21:36:30.284353</dateArchived> | ||
<dateUpdated>2017-11-01T17:23:53.336625</dateUpdated> | ||
<comments>Need a comment here</comments> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mvahowe please make "comments" optional.
v2/2_1_1/test_docs/valid_2_1_1.xml
Outdated
</language> | ||
<manifest> | ||
<container uri="source"> | ||
<container uri="p8z-manifests"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mvahowe you can remove "p8z-manifests" and its contents. It's debris from web2py pt resource download behavior for debug purposes (since this keeps creeping into manifests, I'm guessing I should remove that behavior).
@ericpyle I've made another commit addressing some of your concerns. I hear what you're saying about resourceOnly, but I just can't bring myself to create a fundamental "type" in DBL that is explicitly tied to one particular client technology. In practice I don't think anyone is going to try to "hack" resources into PT this way and, also, if that's really a concern, the issue we need to address is server-side security. ie, if Joe DBL User is not supposed to be able to create these things, DBL ought to enforce that restriction. I left projectType in type where I think it belongs and added the enum. I put the 2 basedOn elements as optional ptId subelements (without a basedOn wrapper). In general, we haven't tried to work out which types of systemId can appear with which types of entry and I'm inclined not to start now. Thank-you for humouring me re publications. I think this is the least strange way to handle this data. From my perspective (which has DBL at the centre and PT as one possible HTTP client), resourceOnly entries are, in a sense, published on the PT platform and, as I understand your explanations, the bookList controls which of the available resources are "published". I really don't want to create a precedent for DBL entries where we're acting as a glorified cloud backup solution for opaque, binary data. And I think the srcPart mechanism may be useful for other entry types in the future. I totally agree with @bryceallison that, in an ideal world, we'd have one reference-able "thing" per file. That's why, last week, I was arguing that Paratext should give us the individual SFM files and not a zip. :-) But I understand why you don't want to do that, and I can imagine others making similar arguments about other media types in the future. I think the fragment identifier proposal came from me, but it was in reaction to an earlier proposal to use xpaths as ids. I think xpaths are awesome for finding things but utterly unsuitable as ids because there are literally an infinite number of xpaths to reach any particular node. So, if/when we need to label parts of an XML or HTML document, I think a fragment identifier is preferable to an xpath. But I'd be happy for us to never do either. I can make comments optional if we really need to do this, but it seems like another departure from DBL practice that just tends to make things more opaque. I think I'd prefer a boilerplate comment that might one day turn into something informative. (Presumably these things do get updated, fixed, etc, from time to time?) Finally, the schema doesn't require or forbid any manifest content so, if you don't want p8z-manifests to appear, you just need to stop uploading it. If you do upload it, it's going to appear in the manifest. Making a special case for that file would mean hacks right across DBL. |
@mvahowe wrote:
@mvahowe, you were the one who taught be about the need to tell clients the truth. :) (Btw, At some point we continued to call our metadata
I agree that it might make sense to add a server-side check, especially if that ever happens purposefully or accidentally.
I was just observing that you have two duplicate enumerations of systemIds. one in
Okay, you can keep
I think I had tried to explain that |
@ericpyle The thing is, even in our last audio conversation, the position shifted from "Only PT will ever use these entries" to "There may well be other translation tools that eventually wish to use these entries but they'll have to do it the PT way." FWIW I think DBLMetadata works because DBL was, is and is likely to continue to be the curator of that schema. Apart from that, I accept that there are many things that are different about these entries, but I'm telling myself that it's a question of degree. (If I don't tell myself that, I find myself wanting to remove the entries and put the content somewhere else.) You're right, we could make this systemId specific to resourceOnly media, but then, to be consistent, we ought to try to work out which media types can have a ptreg or dbp systemId. Ultimately we trust people to give us sensible systemIds, and even if we tighten the schema there will still be nothing to stop people making up systemIds by generating random strings, so I don't think complicating the schema gets us much extra value. But if you like I'll treat the paratext systemId as an exception. My point about p8z-manifests is that it isn't a schema issue. There are no specific filenames in the schema. The manifests for migration and auto-upgrading are generated by Python code on the server, so it sounds to me like you need to report a bug against dbl-archive. |
@mvahowe wrote:
You're right. It is possible in the future we'd want to allow owners to the additionally capability to deliver SFM content to non-paratext clients using a more DBL-native authentication/authorization scheme.
Not necessary, but I think you could feel confident about
You're right that it's a DBL bug. https://github.com/ubsicap/dbl-archive/issues/293 |
@ericpyle We seemed to have reached the point where there was an argument for doing things either way so, since you're going to have to deal with this more often than me, I thought I'd try it your way. Latest commit renames the type as paratextZipResource, which tells people the truth :-) and also leaves space for a more conventional paratextResource one day. I also limited the ptSystemId to this type. (Along the way I rediscovered that RelaxNG doesn't like complex logic around interleaved elements with the same name, ie we can't easily make ptSystemId required and other ids optional but, as I said yesterday, I don't think that's the biggest way for clients to mess up systemId if they really want to.) Does this look fit for purpose now? |
@mvahowe wrote:
haha. well, I had come to place where I was okay with your way, but if you're good with this, sounds okay with me. (I tend to err on the side of the YAGNI (you aint going to need it) principle of agile programming.)
Sounds fine.
ptSystemId needs also to be in |
@ericpyle Better? |
v2/2_1_1/schemas/metadata.rng
Outdated
@@ -2,7 +2,7 @@ | |||
<!-- | |||
metadata.rnc v2.1.1 | |||
Generated by mark@notamacbook using makeRNC.py | |||
Generated Tue Feb 6 15:47:19 2018 | |||
Generated Tue Feb 6 17:14:01 2018 | |||
--> | |||
<grammar ns="" xmlns="http://relaxng.org/ns/structure/1.0" datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mvahowe as far as I can tell, you didn't actually add ptSystemId back into textIdentificationElement?
@ericpyle Indeed, and I found some other oddness such as the dbpId not being available for audio. Can you take another look, please? |
@mvahowe wrote:
Probably because at some point I confirmed that dbpId was no longer needed, and thus did not need to be in 2.1. But if it's there I suppose there's no harm in keeping it. I'm guessing that audio metadata migrations have dropped those. |
@ericpyle I don't recall a conversation about dropping dbpId. I tried to put it back recently because we now have hundreds of DBP entries with the DBP ID in the comments section for want of a suitable systemId. |
The valid example document is in v2/2_1_1/test_docs. The gory detail is in the 2.1.1 schema. The easiest way to explore this is to edit the example document with a validating XML editor.
I've attacked the problem from several directions:
In some cases I've made stuff optional for resourceOnly.
In some cases I've modifed the schema for all variants.
Features of note include:
projectType and optional basedOn under type section, which means the current paratext systemId still works. (@ericpyle, can you give me the definitive list of projectTypes?)
Non-canonical book codes are now valid @ROLE values. (I don't know why I hadn't done this before since I had the list of non-canonical books in the schema. Maybe I was waiting for the end of the peripheral content discussion.)
New, optional srcPart attribute for content elements, which provides a path to find content within a manifest entry. Currently the options are
zip://.* for something inside a zip file
xpath://.* for something inside an XML file
#* for an XML/HTML fragment id
canonicalContent is missing because resourceOnly content is not canonical (in the sense of being Scripture), even if it happens to use the same codes.