What goes in a scriptureText burrito? #52

Closed
mvahowe opened this issue Aug 18, 2019 · 13 comments
Labels
Defining Feature or Fix We have a rough idea that needs refining before implementation

@mvahowe
Contributor

mvahowe commented Aug 18, 2019

https://github.com/bible-technology/scripture-burrito/blob/mvh/non_text_scripture/docs/scripture_text_flavor.rst

is where things get real. I suggest that we have the USFM/USX discussion elsewhere. We can talk about the metadata, but I don't think it's very controversial. The fun bit is deciding what resources we are going to require and allow.

The "release" list is quite short. (I think DBL sometimes sees other files but it's complicated to get a list - did I ever mention that what Paratext puts into DBL right now is not documented anywhere to the best of my knowledge?)

The "source" list is based on unzipping archive.zip for CEVUK and removing the files that @FoolRunning said Paratext didn't need. (I'll post his email below for transparency.) I don't pretend to understand PT internals, but, from eyeballing this list,

  • some of what is here can probably be moved to SB metadata or to a SB-generic way of achieving the same thing (license.json, BookNames.xml...)

  • some of this looks to me like workflow around the translation, which we decided early on was definitely out of scope for SB, unless we want to end up inventing a whole social network infrastructure (CommentTags.xml, ProjectUserAccess.xml, ProjectProgress.xml...)

  • some of this looks quite useful in a portability context, with the big assumption that every SB editor maintains that information (hyphenatedWords.txt, TermRenderings.xml...)

  • some of this looks like user settings (Settings.xml), which simply cannot be portable unless there's an SB standard for UX at quite a low level.

We agreed a long time ago that SB was to be 'round trippable', e.g.:

  • There's a project in PT, with settings, styles, term renderings and lots of other cool stuff

  • That project is exported as a burrito and passes through one or more other systems, and is edited, updated and extended along the way

  • The extensively modified project eventually returns to PT

  • Everything works as if the project had never left PTX.

I've expressed concerns about the mechanics of this process on many occasions. We're now looking at the implementation details. AFAICS:

  • For each of these files that PT actually needs, losing the files for a project presumably means losing data

  • Stashing the files in a notional "for PT only" section of the SB, or somewhere in PTX, doesn't seem to help much if the burrito is updated outside of PT. For example, TermRenderings.xml and many other files will be out of date by the time PT gets its burrito back

  • Making each of these files portable requires all compliant SB editors to maintain that file, which requires a specification for each file.

  • All this is just for content produced by PT. Presumably every other editor, present and future, will have its own set of working files. To round trip via PT, PT would need to support all those files.

Or, to put it more succinctly, I don't think we're even close to being able to promise 100% roundtrippability.

What we can do is start with the easy stuff, and that would be better than nothing. So

  • Which of these files don't need to be here at all?

  • Which of these files contain content that could be represented somewhere in SB metadata?

  • Which of these files really belong in some sort of ecosystem project management system like PT Reg?

  • Of the rest, which of these files could we aim to make a standard in the near future? (custom.json has to be one candidate since we've already done work on this.)

Concretely, before SB 0.1 Beta, we need to remove the ??? in the page linked to above, either by defining and justifying the presence of the file or by removing it.

@mvahowe
Contributor Author

mvahowe commented Aug 18, 2019

From an email by @FoolRunning, June 18th:

Mark,

These are the files in the source zip that Paratext definitely does not need:

.hg folder
Any files with the .DIC extension
Any files with the .TXT extension except for hyphenatedWords.txt
ldml.xml
ProjectProgress.tsv
unique.id
wordlist.wdl

Assuming we're not having DBL be an archive for Paratext projects,
these are files Paratext does not need for a resource text:

Anything that starts with Notes_
Anything that starts with PrintDraft
Canons.xml (Is this needed for DBL?)
CheckingStatus.xml
CommentTags.xml
hyphenatedWords.txt
license.json
ProjectProgress.xml
ProjectUserAccess.xml
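
The two exclusion lists in the email above can be sketched as a simple filter. This is only an illustration, not anything Paratext or DBL actually runs: the function name `is_droppable`, the `resource_text` flag, and the choice to match names case-insensitively are all assumptions made for the example.

```python
from pathlib import PurePosixPath

# Exact file names from the "definitely does not need" list (lower-cased).
NEVER_NEEDED_NAMES = {"ldml.xml", "projectprogress.tsv", "unique.id", "wordlist.wdl"}

# Exact names and prefixes from the "not needed for a resource text" list.
RESOURCE_UNNEEDED_NAMES = {
    "canons.xml",
    "checkingstatus.xml",
    "commenttags.xml",
    "hyphenatedwords.txt",
    "license.json",
    "projectprogress.xml",
    "projectuseraccess.xml",
}
RESOURCE_UNNEEDED_PREFIXES = ("notes_", "printdraft")


def is_droppable(path: str, resource_text: bool = True) -> bool:
    """Return True if the email's lists say Paratext does not need this file.

    Case-insensitive matching is an assumption of this sketch.
    """
    p = PurePosixPath(path)
    name = p.name.lower()

    # Anything inside the Mercurial .hg folder is never needed.
    if ".hg" in p.parts:
        return True
    if name.endswith(".dic"):
        return True
    # .TXT files are never needed, except hyphenatedWords.txt.
    if name.endswith(".txt") and name != "hyphenatedwords.txt":
        return True
    if name in NEVER_NEEDED_NAMES:
        return True

    # The second list only applies when the burrito is a resource text.
    if resource_text:
        if name.startswith(RESOURCE_UNNEEDED_PREFIXES):
            return True
        if name in RESOURCE_UNNEEDED_NAMES:
            return True
    return False
```

For example, `is_droppable("hyphenatedWords.txt")` is true for a resource text, but `is_droppable("hyphenatedWords.txt", resource_text=False)` is false, matching the exception in the first list.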

@mvahowe
Contributor Author

mvahowe commented Aug 18, 2019

(Things look a lot easier if, instead of roundtrippability between an arbitrary number of editors, we aim for content to be able to make a one-way trip into Paratext after initial editing elsewhere, and we assume that Paratext will be used for the bulk of consistency and other checks.)

@jonathanrobie
Collaborator

I agree with Tim's list.

Canons.xml might be worth discussing, though.

I think most real implementations will have this kind of data, which is not interoperable among applications. I also think trying to actually store the .hg files would bloat burritos to the point that they would be harder to handle - better to identify the repository for anyone who has the right credentials, and better to avoid sending files that might provide access to the repository for people or applications that do not.

@mvahowe
Contributor Author

mvahowe commented Aug 19, 2019

My understanding was that Canons.xml was a standard file. Does it get extended for custom canons? We have a mechanism for that in SB metadata.

We still need to decide how things are going to work in practice. So, e.g., if

  • I have a project in Send and Receive

  • I send it to some other ecosystem, minus everything bar metadata and USFM files

  • that other ecosystem makes significant edits and additions

  • the metadata and USFM come back to Paratext

how does that work? What happens if, say, the hyphenation, terms and other non-portable files are out of date with respect to the USFM? How does PT behave in those circumstances?

@mvahowe
Contributor Author

mvahowe commented Feb 19, 2020

@jonathanrobie @jag3773 @joelthe1 We're approaching the point where we'll need to commit to details on this level. Some of these files look a little arbitrary. That's an issue because

  • Non-Paratext burrito creators will need to recreate all these files and, also
  • This effectively freezes current Paratext formats for Paratext too.

So maybe we should finally have a conversation about what really needs to be there and whether the formats make sense for inter-system exchanges?

@jag3773 jag3773 added this to the SB 0.2.0-beta milestone Feb 19, 2020
@mvahowe mvahowe added the Defining Feature or Fix We have a rough idea that needs refining before implementation label Feb 20, 2020
@FoolRunning
Collaborator

FoolRunning commented Feb 20, 2020

I was asked to produce a list of files that are needed in a Paratext resource for Paratext to be able to adequately represent the text, project settings, etc. to the user. The list was written above, but was written in a negative way (i.e. what files are currently included that don't need to be included), so I thought it best to create a positive list as well:

  • SFM files (i.e. the USFM text)
  • custom.sty (custom USFM stylesheet)
  • custom.vrs (custom versification)
  • LDML file (language file)
  • Settings.xml (project settings and metadata)

The following files aren't really required, but most users would appreciate them existing (helps with translation, I think):

  • BookNames.xml (names of the books in vernacular language)
  • ErrorDenials.xml (errors from the checks that were denied)
  • TermRenderings.xml (key terms in vernacular language)
  • Lexicon.xml (list of words)
  • SpellingStatus.xml (status of word acceptance)
  • WordAnalyses.xml (morphology of words in the lexicon)

EDIT: Crossed out files that we don't really need (after talking with someone about it).
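
A minimal sketch of how a burrito creator might check a file set against the two lists above, taking both lists as written. The `.sfm` and `.ldml` extensions used to recognise the SFM and LDML files are assumptions for illustration, as are the function names `classify` and `missing_required`.

```python
from pathlib import PurePosixPath
from typing import Iterable, List

# Files from the first list that are identified by exact name (lower-cased).
REQUIRED_NAMES = {"custom.sty", "custom.vrs", "settings.xml"}
# Assumed extensions for the SFM text files and the LDML language file.
REQUIRED_SUFFIXES = {".sfm", ".ldml"}
# Files from the second ("users would appreciate") list.
OPTIONAL_NAMES = {
    "booknames.xml",
    "errordenials.xml",
    "termrenderings.xml",
    "lexicon.xml",
    "spellingstatus.xml",
    "wordanalyses.xml",
}


def classify(path: str) -> str:
    """Return 'required', 'optional' or 'other' for one file path."""
    p = PurePosixPath(path)
    name = p.name.lower()
    if name in REQUIRED_NAMES or p.suffix.lower() in REQUIRED_SUFFIXES:
        return "required"
    if name in OPTIONAL_NAMES:
        return "optional"
    return "other"


def missing_required(paths: Iterable[str]) -> List[str]:
    """List required names (or '*.ext' patterns) with no matching file."""
    names = {PurePosixPath(p).name.lower() for p in paths}
    suffixes = {PurePosixPath(n).suffix for n in names}
    missing = {n for n in REQUIRED_NAMES if n not in names}
    missing |= {"*" + s for s in REQUIRED_SUFFIXES if s not in suffixes}
    return sorted(missing)
```

A file set containing only `Settings.xml` would be reported as missing `*.ldml`, `*.sfm`, `custom.sty` and `custom.vrs`.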

@mvahowe
Contributor Author

mvahowe commented Feb 21, 2020

Thanks @FoolRunning

For the sake of landing something within our lifetimes, I suggest we go with your first list for now.

I think we have the USFM/USX covered in other issues. (With the variant proposal I think we could return either USFM or USX for "Paratext Resources".)

I'll spin up new issues to review the style, versification and LDML files. I can't imagine that their presence is going to be controversial but we might want to tweak the format.

Settings seem to me to be a much more complex conversation. I'll make an issue for that too.

@jag3773
Collaborator

jag3773 commented Oct 8, 2020

We should have a role defined for these:

App specific files:

  • (@jtauber will fix this description) Reuse the idAuthority and map an id to a url and then ... App is free to put specific files that they need into these:

    • ErrorDenials.xml (errors from the checks that were denied)
    • TermRenderings.xml (key terms in vernacular language)
    • Settings.xml (project settings and metadata)

@jag3773 jag3773 removed the Talk About This! Consider putting this issue on the agenda for an SB meeting label Oct 8, 2020
@jag3773
Collaborator

jag3773 commented Oct 22, 2020

This PR would also be included in the non-app specific role definitions: https://github.com/bible-technology/scripture-burrito/pull/158/files

@jtauber
Collaborator

jtauber commented Mar 11, 2021

I've added a localedata role for the LDML file.

Do we then just need versification and the stylesheet roles to call this done?

I'm not sure I fully understand what is needed for "App specific files". Can't an app just put whatever files it likes in the ingredients and, if they're app-specific, either use x- roles or no role at all (and assume the app knows what they are by filename)?

@jag3773
Collaborator

jag3773 commented Mar 11, 2021

> Do we then just need versification and the stylesheet roles to call this done?

That sounds good to me, @jtauber.

I can't make sense of my comment about "app specific files." Seems like we talked about it and that was the solution we came up with, but it's obviously not clear enough to implement! Maybe @jonathanrobie can make sense of it? If not, let's not worry about it for now.

@jag3773
Collaborator

jag3773 commented Mar 11, 2021

Decided today that custom.sty should be an x- role. @jtauber Include a bit in the docs about that.

Also document a recommendation that app specific files show up in a sub-directory of ingredients. This is not globally enforced, nor required, but a recommendation.
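
One way to picture the outcome of this discussion is a small helper that decides where a file lands and which role it gets. This is a sketch under assumptions, not the SB schema: only the `localedata` role name comes from this thread, while the `x-<app>-stylesheet` role name, the `paratext` sub-directory name, and the function name `place_ingredient` are hypothetical.

```python
from typing import Optional, Tuple

# The files named as app-specific earlier in this thread.
APP_SPECIFIC = {"ErrorDenials.xml", "TermRenderings.xml", "Settings.xml"}


def place_ingredient(filename: str, app: str = "paratext") -> Tuple[str, Optional[str]]:
    """Return (path inside the burrito, role or None) for one file.

    The role string for the stylesheet and the sub-directory name are
    hypothetical; only "localedata" is named in the discussion.
    """
    if filename.endswith(".ldml"):
        # The LDML language file gets the localedata role.
        return f"ingredients/{filename}", "localedata"
    if filename == "custom.sty":
        # The custom stylesheet gets an x- role (exact name assumed here).
        return f"ingredients/{filename}", f"x-{app}-stylesheet"
    if filename in APP_SPECIFIC:
        # Recommended, not required: a per-app sub-directory, no role.
        return f"ingredients/{app}/{filename}", None
    # Everything else stays at the top level with no role in this sketch.
    return f"ingredients/{filename}", None
```

The sub-directory branch reflects a recommendation only, so a consuming application should not rely on app-specific files being found there.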

@jtauber
Collaborator

jtauber commented Mar 18, 2021

The remaining issues are now covered by #248 so closing this.

@jtauber jtauber closed this as completed Mar 18, 2021