The SABIO backend requires the following input data structure, separated into 3 files:
The meta file declaring meta-properties of the dataset, such as its name and own URL, as well as (part of the) semantics: (1) properties of objects used for filtering and sorting them and (2) textual properties of objects that will be used for keyword search and scoring by the SABIO algorithms.
The meta file is a JSON dict with the following items:
-
id
: the dataset's ID, will be used in the backend as a key for index (i.e. should be hashable & unique) -
name
: the dataset's display name, that is displayed in the frontend (doesn't need to be unique) -
dataset_url
: the URL of the dataset's website; e.g. the landing page for the public search interface in the case of the NMvW dataset -
dataset_params
: a dict where the keys are the names of columns in the dataset CSV which are to be used as search and sort parameters; the dict's values are themselves dicts which contain the information required to instantiate datasets.DatasetParameter Python objects (see the class definition for possible values and how they are used in the backend) -
text_columns
: names of columns in the dataset CSV to be used in the backend for search and scoring; objects'name
anddescription
fields are used by default (text_columns may be an empty list); in the backend, the columns' values will be joined by "\n" for searching and scoring (which affects neither search results nor scores)
This is the main data table and needs to contain at least the following fields. Fields that are not declared as dataset_params
or text_colums
in the meta JSON (see above) are simply ignored.
ID | start_date | end_date | date_string | name | description | <object_parameter_1> | ... | <object_parameter_n> | <text_column_1> | ... | <text_column_m> | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
dtype | int | ISO date | ISO date | string | string | string | any | ... | any | str | ... | str |
unique values | True | False | False | False | False | False | False | ... | False | False | ... | False |
description | will be used as index into table | format: yyyy-mm-dd | format: yyyy-mm-dd; if empty, will be set to start_date | display date in interface, displayed as-is (no required format); if empty, string will be 'start_date -- end_date' | display name in interface | displayed as part of object details | optional; may be additional parameter to filter and sort (e.g. categorical) | ... | optional; may be additional parameter to filter and sort (e.g. categorical) | optional; may be additional field to filter and score | ... | optional; may be additional field to filter and score |
The CSV holding URLs of images for objects. These URLs are simply passed to the front-end, where they are inserted into the user interface when hovering or clicking on objects. The SABIO backend therefore also works without images but this CSV is still required. So if no images are available, simply fill this CSV with placeholders (such as "").
ID | thumbnail_URL | img_URL | |
---|---|---|---|
dtype | int | URL | URL |
unique values | True | False | False |
description | will be used as index into the dataset CSV | passed as-is to the front-end; shown on hover | passed as-is to the front-end; shown in object details |