This repository has been archived by the owner on Feb 7, 2023. It is now read-only.

Clarifying explanation in "2_gallica_subset/Gallica_subset.ipynb" #1

Open
archaeoklammt opened this issue Jun 17, 2022 · 1 comment

Comments

@archaeoklammt
Member

Part on "length of documents"
For the sake of clarity, I would recommend explaining the manifests that are false duplicates. They are duplicates because they turn up several times in the data set, but false because the data records refer to different parts of the same manifest, as is the case with IDs 14254 and 14256 in the sample above. This would go well with your "Update" at the end of the section.
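
To illustrate what I mean by false duplicates, here is a minimal sketch. The field names (`id`, `manifest`, `pages`) and the manifest and page values are hypothetical and not taken from the notebook; only the two IDs come from the sample. It groups records by manifest and reports the manifests that occur more than once with differing page information:

```python
from collections import defaultdict

# Hypothetical records: field names, manifest labels and page strings are
# illustrative assumptions, not the actual columns or values of the data set.
records = [
    {"id": 14254, "manifest": "manifest-A", "pages": "p. 1-10"},
    {"id": 14256, "manifest": "manifest-A", "pages": "p. 11-30"},
    {"id": 20000, "manifest": "manifest-B", "pages": "p. 5-8"},
]


def find_false_duplicates(records):
    """Return {manifest: [record ids]} for manifests shared by several
    records whose page information differs."""
    by_manifest = defaultdict(list)
    for rec in records:
        by_manifest[rec["manifest"]].append(rec)
    return {
        manifest: [r["id"] for r in recs]
        for manifest, recs in by_manifest.items()
        # more than one record, and not all pointing at the same pages
        if len(recs) > 1 and len({r["pages"] for r in recs}) > 1
    }


false_duplicates = find_false_duplicates(records)
print(false_duplicates)
```

Records that share a manifest and also the same page string would be true duplicates and are deliberately not reported here.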

@archaeoklammt
Member Author

Again: this won't change the overall outcome of the procedure, but it might be optimized. The rule for breaking the information down to pagination is not quite clear in one of the cases:
"if twice 'p.' then take the larger interval" - but in theory, don't both intervals designate content to which the data record refers?
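
To make the difference concrete, here is a rough sketch that compares the quoted rule with the alternative of counting both intervals. It assumes the page information looks like `p. 12-15 ; p. 40-60`; that format, the regular expression and the function names are my assumptions, not the notebook's actual code:

```python
import re

# Assumed format: one or more "p. <start>-<end>" intervals in a string.
INTERVAL = re.compile(r"p\.\s*(\d+)\s*-\s*(\d+)")


def page_intervals(text):
    """Extract all (start, end) page intervals from the string."""
    return [(int(a), int(b)) for a, b in INTERVAL.findall(text)]


def length_largest(text):
    """Quoted rule: keep only the larger interval."""
    return max((b - a + 1 for a, b in page_intervals(text)), default=0)


def length_total(text):
    """Alternative: add up the pages of every interval.
    Overlapping intervals would be counted twice in this simple version."""
    return sum(b - a + 1 for a, b in page_intervals(text))


sample = "p. 12-15 ; p. 40-60"
print(length_largest(sample), length_total(sample))
```

For this sample the two readings differ by the four pages of the smaller interval, which is why I would expect little effect on the overall result.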
