I am NOT sure this is really necessary, but opening this following a user issue in the Dataverse repo. It was pointed out that the spec appears to require an opening <?xml ... ?> declaration (spec, 3.2: http://www.openarchives.org/OAI/openarchivesprotocol.html#XMLResponse part 1):
The first tag output is an XML declaration where the version is always 1.0 and the encoding is always UTF-8, eg: <?xml version="1.0" encoding="UTF-8" ?>
I am not aware of a practical situation where the absence of this header is actually causing a problem. The Dataverse repo issue (IQSS/dataverse#10329) was opened under an assumption that the OAI records were unparsable and not well-formed without it when UTF-8 characters were present in the metadata, but in reality the problem they were running into was an instance of the "split UTF-8 character" bug, fixed in #188.
My understanding is that in a practical harvesting scenario the server already tells the client unambiguously to expect UTF-8 encoded XML, via the HTTP header Content-Type: text/xml;charset=UTF-8, so this declaration seems redundant (?). Still, the spec says so, so I'm opening and leaving this issue here for your consideration.
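To illustrate the point about the HTTP header, here is a minimal Python sketch of how a harvesting client could take the encoding from the Content-Type value quoted above and decode a response body with it. The helper name `charset_from_content_type` and the sample body are made up for illustration; this is not how xoai or any particular harvester does it.

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(header_value: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type header value, if any.

    Hypothetical helper: it reuses the standard library's header parsing.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() returns the charset parameter in lower case.
    return msg.get_content_charset()


# Header value as quoted in the issue text.
charset = charset_from_content_type("text/xml;charset=UTF-8")

# Made-up response body with no XML declaration, encoded as UTF-8.
body = "<title>Café</title>".encode("utf-8")

# The client can decode the bytes using only the HTTP header information.
text = body.decode(charset)
```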
P.S. This is a weird coincidence - the fact that the bug the reporting user encountered was in fact caused by the xoai code's attempt to strip this very same xml declaration header from the cached metadata record...
landreev changed the title from "OAI-PMH appears to require an xml declaration not included in the xoai output (?)" to "OAI-PMH spec appears to require an xml declaration not included in the xoai output (?)" on Mar 18, 2024
IMO it is not this library's job to add the XML declaration (the XML prolog).
DataProvider.handle() returns an OAIPMH object, which is an XML node, ready to be written, representing the root <OAI-PMH> element. One could argue it is the responsibility of the application using the library to add anything else XML-related that must precede the root element, just as it is responsible for setting headers etc.
Technically, the prolog is optional, see https://www.w3schools.com/XML/xml_syntax.asp. It becomes mandatory when you use a different encoding or XML 1.1, which AFAIK we are not.
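As a rough illustration of both points (the declaration is optional for UTF-8 XML 1.0, and an application can add it in front of the root element itself), here is a small Python sketch. The function `with_declaration` and the sample element content are hypothetical, and the snippet stands in for whatever the calling application does with the serialized root element; it does not use the xoai API.

```python
import xml.etree.ElementTree as ET

# Declaration text as given in the OAI-PMH spec example quoted in this issue.
XML_DECLARATION = b'<?xml version="1.0" encoding="UTF-8" ?>\n'


def with_declaration(serialized_root: bytes) -> bytes:
    """Prepend the XML declaration unless the bytes already start with one.

    Hypothetical application-side helper, not part of any library.
    """
    if serialized_root.startswith(b"<?xml"):
        return serialized_root
    return XML_DECLARATION + serialized_root


# Made-up serialized root element containing non-ASCII characters.
bare = "<OAI-PMH><title>Café Zürich</title></OAI-PMH>".encode("utf-8")
declared = with_declaration(bare)

# Both variants parse; without a declaration the parser assumes UTF-8.
title_bare = ET.fromstring(bare).find("title").text
title_declared = ET.fromstring(declared).find("title").text
```

Both parses yield the same text, which matches the observation that the missing declaration by itself does not make UTF-8 output unparsable.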
I'm closing this as "wontfix" for now. Please feel free to reach out if you think this is wrong.