
Source data model #444

Open
5 of 7 tasks
BeritJanssen opened this issue Mar 30, 2021 · 4 comments

Comments

@BeritJanssen
Member

BeritJanssen commented Mar 30, 2021

Since we'll have sources from various platforms, we need to make sure that the interface can handle them. Necessary steps:

  • Get consortium input about what should be in the model
  • Consolidate consortium responses (in progress)
  • Find suitable vocabularies (in progress)
  • Form plan, request feedback
  • Migrate
  • Adjust upload form
  • Add fields on Elasticsearch
BeritJanssen added the labels "consortium agreement" (Was agreed with the consortium) and "enhancement" (New feature or request) Mar 30, 2021
BeritJanssen added this to the Extra mile milestone Mar 30, 2021
@jgonggrijp
Member

Related: #390.

jgonggrijp self-assigned this Mar 31, 2021
@jgonggrijp
Member

jgonggrijp commented Apr 6, 2021

This is the input I have gathered so far on Slack. I consider this stage complete.

| Field | Status quo | Gustavo (social media) | Claire (bibliography) | Alessio (crowdsourcing ontology, subset of schema:MediaObject) | Guillaume (scraped reviews) |
|---|---|---|---|---|---|
| title | required | desirable | | schema:title | page_title (optional, generally same as book_name) |
| author | required | optional (anonymization) | | | author |
| editor | optional | optional | | | |
| fulltext | required | necessary | | | text |
| screenshot/rendering | absent | necessary (eventually) | | | |
| language | required | necessary | | schema:inLanguage | language |
| source type | required, book/article/review/social media post/web content/other | necessary | | | |
| publication date | required | necessary (tweet date) | optional, free-form text, "creation date" more appropriate for archival resources | | date_text (free-form) |
| publisher | optional | optional/irrelevant | optional | | |
| URL | optional | optional (anonymization) | | schema:identifier | url |
| repository | absent | | desirable, free-form (usually archive, location, collection, call, fasc, folio) | | |
| format | absent | | | schema:encodingFormat | |
| thumbnail | absent | | | schema:thumbnailURL | |
| book author | absent | | | | book_author_name |
| book title | absent | | | | book_name |
| date of retrieval | absent | | | | date_of_scrapping |
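
To make the table a bit more concrete, here is a minimal sketch of how the scraped-review keys from the last column could be renamed to the field labels in the first column. This is only an illustration: the function name `to_common_fields`, the `unmapped` bucket and the sample record are hypothetical, and the target names are simply the row labels above, not an agreed model.

```python
# Hypothetical mapping from the scraped-review keys (last table column)
# to the field labels in the first column. Not an agreed-upon model.
SCRAPED_REVIEW_TO_COMMON = {
    "page_title": "title",
    "author": "author",
    "text": "fulltext",
    "language": "language",
    "date_text": "publication date",
    "url": "URL",
    "book_author_name": "book author",
    "book_name": "book title",
    "date_of_scrapping": "date of retrieval",
}


def to_common_fields(record):
    """Rename known keys; keep unknown keys under 'unmapped' so nothing is lost."""
    common = {}
    unmapped = {}
    for key, value in record.items():
        target = SCRAPED_REVIEW_TO_COMMON.get(key)
        if target is None:
            unmapped[key] = value
        else:
            common[target] = value
    if unmapped:
        common["unmapped"] = unmapped
    return common


# Made-up sample record, for illustration only.
sample = {"page_title": "A review", "book_name": "Some book", "rating": 5}
print(to_common_fields(sample))
```

Keeping unrecognized keys in a separate bucket is a design choice for the sketch, so that a new scraper field shows up in the output instead of being dropped silently.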

Desirable sources to import from

Regarding consolidation, some preliminary ideas:

  • We might need three date fields: date of creation, date of publication and date of retrieval. RDF supports dynamic typing, so we can use xsd:date when possible and xsd:string otherwise.
  • Several bits of input hint at future support for other formats than plain text. I'd really like to generalize this so that follow-up projects can add support for additional formats without needing to change the source data model. Plausible formats to support in the future include bitmap, PDF, Twitter JSON (possibly up to whole threads) and HTML.
  • Dublin Core and Schema.org seem to be the most direct candidates for a foundation (which is not too different from what we already have). DC fits very well with the archival perspective. The one "catch" with this is that we are currently using DC for ownership and permission administration; we should probably move the latter to new terms in our own vocab or staff namespaces.
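
The xsd:date/xsd:string fallback from the first bullet could look roughly like the sketch below. It uses only the Python standard library and returns a (lexical form, datatype IRI) pair. The name `typed_date_literal` is hypothetical, and the check is deliberately narrow: it accepts only plain `YYYY-MM-DD` values, whereas xsd:date also allows a timezone suffix, so treat it as a simplification.

```python
import re
from datetime import date

XSD = "http://www.w3.org/2001/XMLSchema#"

# Simplifying assumption: only plain YYYY-MM-DD counts as a date here.
_ISO_DAY = re.compile(r"\d{4}-\d{2}-\d{2}")


def typed_date_literal(raw):
    """Return (lexical form, datatype IRI) for a date-like input string.

    Valid calendar dates in YYYY-MM-DD form are typed as xsd:date;
    everything else (e.g. "circa 1850") falls back to xsd:string.
    """
    text = raw.strip()
    if _ISO_DAY.fullmatch(text):
        try:
            date.fromisoformat(text)  # rejects impossible dates such as 2021-02-30
        except ValueError:
            pass
        else:
            return text, XSD + "date"
    return text, XSD + "string"


for value in ["2021-03-30", "circa 1850", "2021-02-30"]:
    print(value, "->", typed_date_literal(value)[1])
```

The same function would serve all three date fields (creation, publication, retrieval). If the literals end up being built with rdflib, which I am assuming here purely for illustration, the pair could be passed on as `Literal(text, datatype=URIRef(datatype))`.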

@jgonggrijp
Member

@JeltevanBoheemen I think we can merge #86 into this by just adding a public yes/no field.

@JeltevanBoheemen
Contributor

Good idea, I'll add it to the datamodel.

jgonggrijp added a commit that referenced this issue Jul 28, 2022
This seems to have been an omission in 77247e5.
jgonggrijp added a commit that referenced this issue Aug 3, 2022
This seems to have been an omission in 77247e5.
jgonggrijp added a commit that referenced this issue Aug 4, 2022