OAI-PMH spec appears to require an xml declaration not included in the xoai output (?) #225

Closed
landreev opened this issue Mar 18, 2024 · 1 comment
Labels
bug (Something isn't working), wontfix (This will not be worked on)

Comments

@landreev
Collaborator

landreev commented Mar 18, 2024

I am NOT sure this is really necessary, but opening this following a user issue in the Dataverse repo. It was pointed out that the spec appears to require an opening <?xml ... ?> declaration (spec, 3.2: http://www.openarchives.org/OAI/openarchivesprotocol.html#XMLResponse part 1):

The first tag output is an XML declaration where the version is always 1.0 and the encoding is always UTF-8, eg: <?xml version="1.0" encoding="UTF-8" ?>

I am not aware of a practical situation where the absence of this header is actually causing a problem. The Dataverse repo issue (IQSS/dataverse#10329) was opened under an assumption that the OAI records were unparsable and not well-formed without it when UTF-8 characters were present in the metadata, but in reality the problem they were running into was an instance of the "split UTF-8 character" bug, fixed in #188.

My understanding is that in a practical harvesting scenario the server already unambiguously tells the client to expect UTF-8 encoded XML, via the HTTP header Content-Type: text/xml;charset=UTF-8, so this declaration seems redundant (?). Still, the spec requires it, so I'm opening this issue and leaving it here for your consideration.

P.S. This is a weird coincidence - the fact that the bug the reporting user encountered was in fact caused by the xoai code's attempt to strip this very same xml declaration header from the cached metadata record...

@landreev changed the title from "OAI-PMH appears to require an xml declaration not included in the xoai output (?)" to "OAI-PMH spec appears to require an xml declaration not included in the xoai output (?)" on Mar 18, 2024
@poikilotherm added the bug (Something isn't working) and wontfix (This will not be worked on) labels on Mar 25, 2024
@poikilotherm
Member

poikilotherm commented Mar 25, 2024

IMO this library is not responsible for adding the XML declaration (the XML prolog).

DataProvider.handle() returns an OAIPMH object, which is an XML node, ready to be written, representing the root <OAI-PMH> element. One could argue it is the responsibility of the application using the library to add anything else XML-related that needs to appear before the root element, just as it is responsible for setting headers etc.
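To illustrate what "the application adds it" could look like, here is a minimal, language-neutral sketch in Python. It is not xoai code: `with_xml_declaration` and the sample root element are hypothetical, and the only assumption is that the application already holds the serialized root element as a string before writing the HTTP response body.

```python
import xml.etree.ElementTree as ET

# The declaration exactly as quoted from the OAI-PMH spec, section 3.2.
XML_DECLARATION = b'<?xml version="1.0" encoding="UTF-8" ?>\n'


def with_xml_declaration(serialized_root: str) -> bytes:
    """Encode the root element as UTF-8 and put the XML declaration in front.

    Assumes `serialized_root` is the bare root element with no declaration
    of its own (hypothetical helper, not part of any library).
    """
    return XML_DECLARATION + serialized_root.encode("utf-8")


# Hypothetical, heavily trimmed response root used only for illustration.
sample_root = '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"/>'
response_body = with_xml_declaration(sample_root)

# The result is still well-formed XML and parses as before.
parsed = ET.fromstring(response_body)
```

The same idea applies in any language: write the declaration bytes to the response stream first, then the serialized root element.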

Technically, the prolog is optional, see https://www.w3schools.com/XML/xml_syntax.asp. It becomes mandatory when you use a different encoding or XML 1.1, which AFAIK we are not.
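As a quick client-side check of that point, the following sketch parses a UTF-8 encoded document that has no declaration, using Python's standard-library parser. The response body is a hypothetical, heavily trimmed example, not real xoai output; the non-ASCII text is there only to show that the parser assumes UTF-8 when no declaration is present.

```python
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

# Hypothetical response body: no <?xml ... ?> declaration, non-ASCII content.
body = (
    '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">'
    '<responseDate>2024-03-18T00:00:00Z</responseDate>'
    '<error code="badVerb">Ungültiges Verb</error>'
    '</OAI-PMH>'
).encode("utf-8")

# With no declaration, an XML 1.0 parser falls back to UTF-8 for this input.
root = ET.fromstring(body)
error_text = root.find(OAI_NS + "error").text
```

This only shows that a conforming parser copes without the declaration; it does not change what the OAI-PMH spec text says.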

I'm closing this as "wontfix" for now. Please feel free to reach out if you think this is wrong.
