Skip to content

simonzsolt/rpha-mek

Repository files navigation

This was the procedure to convert the csv files of the printed sources into JSON arrays.

    - all rmk and rmny files were concatenated into all_NAME.csv files
        - files were imported via mongochef
        - 11 documents tom 'rmk' collection
        - 34  documents 'rmny' collection
    - pages were converted to JSON files with pagesToJson.js
        - target: pagesJson/res.json
        - the script uses csvtojson npm package
        - in order to parse strings like. '0000' as strings the  'checkType : false'
            condition was added to the options
        - the resulting file is not a valid JSON array
            - the last ',' needs to be removed
            - the complete list needs to be placed in '[]'-s and formated
                accordingly
        - each line within a PAGE.csv was converted to a separate object
        - the name of the csv page files (i.e. the last part of MEK folder URL)
            was added to the objects as 'MEK_sub'
        - 640 documents were imported

TODO:
    - fix pages model
        - MEK field is duplicate with RMNY and RMK fileds
        - to model the parts of the MEK URL as follows:
            - MEKUrl:
                -prot
                - host
                - path1
                - path2
                - path3
                - path4
                - path5
            - e.g. http://mek.oszk.hu/12200/12273/html/RMK_1_0007_0018.html
                    - prot: "http",
                    - host: "mek.oszk.hu",
                    - path1: "12200"
                    - path2: "12273"
                    - path3: "html"
                    - path4: "RMK_1_0007_0018"
                    - path5: "html"
    - rename docBibNum in RMK to pageNum
        - some values ha a \d\/.{4-5} pattern (e.g. '1/00Z6R')
            - this could be a typo of '1/0066R'

LOG

    RMNY
        - MEKUrl remodelled

    page
        -MEKUrl remodelled


rmny-aggreg.js contains a few URLs created with two aggregation steps (sourceRmny and SourceRmnyPageColl)

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published