More structured way of handling authority identifiers in the header #3

Open
christofs opened this issue Apr 25, 2024 · 7 comments

Comments

@christofs

christofs commented Apr 25, 2024

We currently have no really transparent, easy-to-parse way of providing authority identifiers such as VIAF and Wikidata ids, as noted also in #1. One solution for this that does not require a prefixDef would be something along the following lines:

<titleStmt>
<title>...</title>
<author>
<name>last, first</name>
<idno type="wikidata" corresp="https://wikidata.org/wiki/">Q0123456789</idno>
</author>
</titleStmt>

Similarly, for editors or other people, and with an alternative structure:

<respStmt>
<resp>publisher</resp>
<name>Trier University</name>
<idno type="ROR" corresp="https://ror.org/02778hg05"/>
</respStmt>
<respStmt>
<name>Julia Röttgermann</name>
<idno type="ORCID" corresp="https://orcid.org/0000-0002-1918-8117"/>
</respStmt>

I'm not insisting on any of the attributes or particular structures, and happy to see alternative solutions. But what would be nice is to be able to use XPath without a lot of tricks (like looking up base URLs somewhere else depending on the value of an attribute) and without context knowledge (such as base URLs) in order to automatically follow the links implied by these identifiers. I'd be happy to accept some verbosity or even redundancy to make this possible.
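To make the goal concrete, here is a minimal sketch of the kind of extraction this would allow, assuming the two structures proposed above. It uses Python's standard-library ElementTree rather than a full XPath engine; the `<sample>` wrapper element and the function name `authority_links` are only there for illustration, and real TEI files would also need the TEI namespace in the paths.

```python
import xml.etree.ElementTree as ET

# Both proposed structures, wrapped in a hypothetical <sample> root so the
# fragment parses as one document. The Wikidata id is the placeholder used above.
SAMPLE = """
<sample>
  <titleStmt>
    <title>...</title>
    <author>
      <name>last, first</name>
      <idno type="wikidata" corresp="https://wikidata.org/wiki/">Q0123456789</idno>
    </author>
  </titleStmt>
  <respStmt>
    <resp>publisher</resp>
    <name>Trier University</name>
    <idno type="ROR" corresp="https://ror.org/02778hg05"/>
  </respStmt>
</sample>
"""


def authority_links(root):
    """Return (type, full URL) pairs for every idno that carries @corresp."""
    links = []
    for idno in root.iterfind(".//idno[@corresp]"):
        # Base URL from the attribute plus the element content (which may be empty).
        url = idno.get("corresp") + (idno.text or "").strip()
        links.append((idno.get("type"), url))
    return links


links = authority_links(ET.fromstring(SAMPLE))
print(links)
```

The same concatenation covers both variants: with an empty `<idno>` the attribute already holds the full URL, and no lookup table keyed on `@type` is needed.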

@christofs christofs added the enhancement New feature or request label Apr 25, 2024
@lb42
Collaborator

lb42 commented Apr 25, 2024

Why do you want to give the identifier value as an attribute rather than as content of the <idno>? What's wrong with, e.g.,

<idno type="ORCID">https://orcid.org/0000-0002-1918-8117</idno>
<idno type="wikidata">https://wikidata.org/wiki/Q0123456789</idno>

Purists might object that the identifier and the URL using it should be distinguished, I suppose. In which case you could simply do

<ref type="ORCID">https://orcid.org/0000-0002-1918-8117</ref>
<ref type="wikidata">https://wikidata.org/wiki/Q0123456789</ref>

or even

<ref type="ORCID">https://orcid.org/<idno>0000-0002-1918-8117</idno></ref>
<ref type="wikidata">https://wikidata.org/wiki/<idno>Q0123456789</idno></ref>

@morethanbooks

I don't consider myself a purist :) but yes, I would like to see the IDs and the URL explicitly encoded separately. I could live with <ref type="ORCID">https://orcid.org/<idno>0000-0002-1918-8117</idno></ref>, although I can imagine that this might be a bit problematic when reading the files with e.g. lxml.
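As a rough illustration of the mixed-content concern, the sketch below parses the nested variant with Python's standard-library ElementTree; lxml's etree exposes the same `.text` and `itertext()` interface for this case. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

ref = ET.fromstring(
    '<ref type="ORCID">https://orcid.org/<idno>0000-0002-1918-8117</idno></ref>'
)

# .text holds only the text that precedes the first child element,
# so it yields the base URL, not the full link.
prefix_only = ref.text

# The bare identifier lives in the child element.
identifier = ref.find("idno").text

# The full URL has to be reassembled from all text nodes in document order.
full_url = "".join(ref.itertext())

print(prefix_only, identifier, full_url)
```

So the nested form is workable, but anyone who reads `ref.text` expecting the whole link gets only the prefix.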

@lb42
Collaborator

lb42 commented Apr 28, 2024

Actually, thinking about this again, it's clear that supplying the "canonical" reference via an attribute rather than as content of an <idno> makes much more sense, for the simple reason that you might have (e.g.) a VIAF number for an author and also for a title, not to mention others. It is much simpler to specify those values using @ref (or @corresp) on the appropriate element (author, title, etc.). These attributes are by definition URL values, so the full URL must be supplied, possibly abbreviated via a defined prefix.

@christofs
Author

I agree that there are some arguments for supplying the identifier as a @ref on the appropriate element (such as title, author, name). However, when there are multiple relevant identifiers (such as GND, Wikidata and VIAF), this becomes cumbersome and the individual values become somewhat harder to extract.

I think that is why I still prefer the solution proposed above:
<idno type="ORCID">https://orcid.org/0000-0002-1918-8117</idno>
or
<ref type="ORCID">https://orcid.org/0000-0002-1918-8117</ref>

The <idno> element would be a child of the relevant element / entity, to make clear what it refers to. That's pretty clear, too, isn't it?

And I personally don't care for separating the domain / base URL from the identifier itself. Those who want to look up the information can do so using the full link. And those who want to record just the identifier itself can easily remove the base URL, which is always the same for a given identifier type.

I especially don't like the nesting of ref and idno. What about this structure, though?

<idno type="ORCID" xml:base="https://orcid.org/">0000-0002-1918-8117</idno>

That seems pretty neat to me.
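A minimal sketch of how such an element could be resolved, assuming the base URL ends in a slash and the content is a plain relative identifier. It uses Python's standard library; the helper name `resolve` is hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# The xml: prefix is predeclared, so the parser exposes the attribute
# under the XML namespace URI.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

idno = ET.fromstring(
    '<idno type="ORCID" xml:base="https://orcid.org/">0000-0002-1918-8117</idno>'
)


def resolve(element):
    """Join the element's xml:base with its text content as a relative reference."""
    base = element.get(XML_BASE, "")
    return urljoin(base, (element.text or "").strip())


bare_id = idno.text
full_url = resolve(idno)
print(bare_id, full_url)
```

Both the bare identifier and the full link are then available without consulting anything outside the element. Note that relative-reference resolution replaces the last path segment of the base, so a base without the trailing slash would not resolve as intended.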

@christofs christofs mentioned this issue Apr 30, 2024
@lb42
Collaborator

lb42 commented May 3, 2024

Using a child <idno> works well for some elements (like author) but less well for others, e.g. title. It's allowed by the default TEI content model for <title>, but then you either have to have mixed content or weird tagging like this:

<title><title>The real title</title><idno>the identifier</idno></title>

The Guidelines have examples using <idno> in the way you would like but only as children of <publicationStmt> which is not (I think) what we are looking for here.

Adding a @ref attribute, however, is possible and easy for all the elements for which we might want to link to an authority file. Is it really so difficult to extract individual values from multi-valued attributes? (You have to be able to do that to parse any TEI pointer value, after all.) How often will there be multiple values to supply?
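For comparison, a sketch of what extracting individual values from a multi-valued @ref might look like, assuming full URLs separated by whitespace. The sample element combines the ORCID and the placeholder Wikidata id from earlier in the thread purely for illustration, and `refs_by_host` is a hypothetical helper; values abbreviated with a defined prefix would need expanding first.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Illustrative element only: two authority URLs in a single @ref value.
author = ET.fromstring(
    '<author ref="https://orcid.org/0000-0002-1918-8117 '
    'https://wikidata.org/wiki/Q0123456789">last, first</author>'
)


def refs_by_host(element):
    """Split a whitespace-separated @ref and key each URL by its host name."""
    return {urlparse(url).netloc: url for url in element.get("ref", "").split()}


refs = refs_by_host(author)
print(refs)
```

The split itself is a one-liner; the cost is that the authority has to be inferred from the host name, where the `<idno>` variant states it in @type.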

I can, of course, make the schema support either approach, or both!

@lb42
Collaborator

lb42 commented May 3, 2024

And yes, I agree that distinguishing the identifier within the URL is a bit weird. Using @xml:base as you suggest seems simple enough though.

@lb42
Collaborator

lb42 commented May 5, 2024

Another reason for preferring the @ref solution is that it's the one we already explicitly suggest in the ELTeC doc, of course. So it has to remain as a possibility, or we break existing documents :-(

3 participants