-
Notifications
You must be signed in to change notification settings - Fork 0
simonzsolt/rpha-mek
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
This was the procedure to convert the csv files of the printed sources into JSON arrays. - all rmk and rmny files were concatenated into all_NAME.csv files - files were imported via mongochef - 11 documents tom 'rmk' collection - 34 documents 'rmny' collection - pages were converted to JSON files with pagesToJson.js - target: pagesJson/res.json - the script uses csvtojson npm package - in order to parse strings like. '0000' as strings the 'checkType : false' condition was added to the options - the resulting file is not a valid JSON array - the last ',' needs to be removed - the complete list needs to be placed in '[]'-s and formated accordingly - each line within a PAGE.csv was converted to a separate object - the name of the csv page files (i.e. the last part of MEK folder URL) was added to the objects as 'MEK_sub' - 640 documents were imported TODO: - fix pages model - MEK field is duplicate with RMNY and RMK fileds - to model the parts of the MEK URL as follows: - MEKUrl: -prot - host - path1 - path2 - path3 - path4 - path5 - e.g. http://mek.oszk.hu/12200/12273/html/RMK_1_0007_0018.html - prot: "http", - host: "mek.oszk.hu", - path1: "12200" - path2: "12273" - path3: "html" - path4: "RMK_1_0007_0018" - path5: "html" - rename docBibNum in RMK to pageNum - some values ha a \d\/.{4-5} pattern (e.g. '1/00Z6R') - this could be a typo of '1/0066R' LOG RMNY - MEKUrl remodelled page -MEKUrl remodelled rmny-aggreg.js contains a few URLs created with two aggregation steps (sourceRmny and SourceRmnyPageColl)
About
No description, website, or topics provided.
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published