Add language, other data to the Push to Archive during cataloging #132

Open
twinkietoes-on opened this issue Dec 21, 2021 · 4 comments

@twinkietoes-on
Collaborator

A user suggested:

Language tags at IA are not working: On https://archive.org/details/librivoxaudio, it says there are 16,244 results, but only 69 items in English, 2 in German, and 1 in French. It is not possible to have a list of all LibriVox recordings [at Archive] in a given language.

This would require the language data to be sent when the project is uploaded to IA during cataloging. I doubt we could make it retroactive, but it may be nice to have new projects bring over the language data.

At the same time, it would be nice to bring over the total project duration also, if possible. This is populated in the project database sometime during the "upload to Archive" step. It would be nice to plug that info into the "Run Time" field at Archive as well.

Priority for both: 5 (low)
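
As a rough sketch of how the extra fields could travel with the upload: archive.org's S3-like upload API accepts item metadata as `x-archive-meta-<field>` request headers, and `language` and `runtime` are both fields in its metadata schema. The `project` keys (`language`, `total_time`) and the `build_archive_meta_headers` helper below are hypothetical names for illustration, written in Python rather than the catalog's own code, and the value formats shown are assumptions.

```python
def build_archive_meta_headers(project):
    """Map a (hypothetical) project record to x-archive-meta-* headers."""
    # Hypothetical mapping from catalog keys to archive.org metadata fields.
    field_map = {"language": "language", "total_time": "runtime"}
    headers = {}
    for project_key, archive_field in field_map.items():
        value = project.get(project_key)
        # Skip missing or blank values so nothing empty is sent explicitly.
        if value is None or str(value).strip() == "":
            continue
        headers[f"x-archive-meta-{archive_field}"] = str(value).strip()
    return headers


# Example record; the values are made up for illustration.
example = build_archive_meta_headers(
    {"title": "Example Project", "language": "English", "total_time": "05:12:44"}
)
```

Skipping blank values is a deliberate choice here, so that an empty field is never passed along explicitly.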

@hornc
Contributor

hornc commented Apr 3, 2023

Relates to / (near) duplicate of #8

@hornc
Contributor

hornc commented Apr 3, 2023

I have started looking into these issues and the archive.org metadata for the LibriVox collection. Runtime is populated automatically when things work well, as part of the audio file processing and format generation. 17,650 items in the collection have a runtime. Only 310 items don't, and I'm not yet sure why.

@twinkietoes-on
Collaborator Author

twinkietoes-on commented Apr 3, 2023

Are you speaking of the LibriVox catalog, where some run times are missing?
That usually happens through human error. The cataloging system populates that field when sending the files to Archive, but if the MC has a browser window open with the database entry and makes an edit there, the run time and some other info auto-populated upon Archive delivery are erased, since they weren't in the database entry in their browser window.
As far as I know, they don't get added to the Archive pages unless the MC does it manually.
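
The overwrite described above is a lost update: a stale form saves every field, including ones it loaded before they were filled in. Below is a minimal sketch of one possible guard, assuming the save handler can see both the values the form was loaded with and the values it submitted. All names are hypothetical, and this is not how the catalog currently works.

```python
def apply_form_edit(current, loaded, submitted):
    """Apply only the fields the user actually changed in the form.

    current   -- the row as it is in the database right now
    loaded    -- the values the form showed when the page was opened
    submitted -- the values the form sent back on save
    """
    updated = dict(current)
    for field, new_value in submitted.items():
        # A field counts as edited only if it differs from what was loaded.
        if new_value != loaded.get(field):
            updated[field] = new_value
    return updated


# The MC opened the page before the Archive delivery filled in the run time.
loaded = {"title": "Old Title", "total_time": ""}
current = {"title": "Old Title", "total_time": "05:12:44"}  # set during delivery
submitted = {"title": "New Title", "total_time": ""}        # stale blank value

result = apply_form_edit(current, loaded, submitted)
```

In this sketch the title edit goes through, while the run time set during delivery survives because the form never changed that field.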

@hornc
Contributor

hornc commented Apr 3, 2023

@twinkietoes-on Thanks for the response. I was talking about the archive.org metadata and collection. I was going by the statement from https://archive.org/developers/metadata-schema/index.html#runtime

runtime:

usage notes: Uploader can set this field, but most often we have determined and set this value during the derive process.

Which seems plausible (but I don't know for sure if it's true), and the fact that most of the collection on archive.org has a runtime: https://archive.org/search?query=collection%3Alibrivoxaudio+AND+runtime%3A%2a vs. the ~310 which don't: https://archive.org/search?query=collection%3Alibrivoxaudio+AND+NOT+runtime%3A%2A
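
For anyone wanting to repeat the comparison for other collections, the two search links above can be rebuilt with a few lines of Python. This only constructs the URLs and does not fetch anything; the `runtime_search_url` helper is a hypothetical name.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://archive.org/search"


def runtime_search_url(collection, has_runtime):
    """Build a search URL for items in a collection with or without a runtime."""
    clause = "runtime:*" if has_runtime else "NOT runtime:*"
    query = f"collection:{collection} AND {clause}"
    # urlencode percent-encodes ':' and '*' and turns spaces into '+'.
    params = urlencode({"query": query})
    return f"{SEARCH_URL}?{params}"


with_runtime = runtime_search_url("librivoxaudio", True)
without_runtime = runtime_search_url("librivoxaudio", False)
```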

I can also imagine any auto-generation on the archive.org side being overridden in some cases, if a blank runtime is explicitly passed or something, so maybe it's complicated.

My main goal was to improve the archive.org language metadata for the https://archive.org/details/librivoxaudio collection, but this issue suggests runtime is a problem too. Missing runtime seems a lot rarer than missing language, though: it looks like there is a mechanism to populate runtime, but language was just missing? I've tried to add it in #161

redrun45 added a commit to redrun45/librivox-catalog that referenced this issue Dec 8, 2024
redrun45 added a commit to redrun45/librivox-catalog that referenced this issue Dec 17, 2024