Works with Python ^3.9
git clone git@github.com:clarin-eric/resource-families-html-generator.git # via SSH or
git clone https://github.com/clarin-eric/resource-families-html-generator.git # via HTTPS
cd ./resource-families-html-generator/
pip install .
ClarTable is a Python module that generates an HTML presentation layer for tabular data from .csv files.
usage: python -m rfhg [-h] -i PATH -r PATH -o PATH
Create html table from given data and rules.
To navigate static resources within the module prepend `static.`
to the path, eg. `-r static.rules/rules.json`
optional arguments:
-h, --help show this help message and exit
-i PATH path to a .csv file or folder with .csv files
-r PATH path to a .json file with rules
-o PATH path to file where output html table will be generated
The HTML tables for resource families can be generated via GitHub. Push new .csv files to /resouce_families
and, after processing, they will appear in the gh-pages branch.
To create an HTML table from a .csv file with the default rules, the file must contain all of the following columns (order is not important). Column names are case sensitive. If you need the generator to consider additional columns, contact [email protected] or adjust rules.json.
Make sure that your .csv files use ; (semicolon) as the column separator.
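The column and separator requirements above can be checked before running the generator. The following is a minimal sketch, not part of the module: `REQUIRED_COLUMNS` and `missing_columns` are hypothetical names, and the column list is copied from the example table in this README, so compare it against your rules.json.

```python
import csv
import io

# Assumption: column names copied from the example table in this README.
REQUIRED_COLUMNS = [
    "Corpus", "Corpus_URL", "Language", "Size", "Annotation", "Licence",
    "Description", "Buttons", "Buttons_URL", "Publication",
    "Publication_URL", "Note",
]


def missing_columns(csv_text, required=REQUIRED_COLUMNS):
    """Return the required column names absent from the header row.

    The header is parsed with ';' as the delimiter and names are compared
    case-sensitively, so 'language' does not satisfy 'Language'.
    """
    reader = csv.reader(io.StringIO(csv_text), delimiter=";")
    header = next(reader, [])
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]


# A header with a lower-case 'language' is reported as missing 'Language'.
sample = "Corpus;Corpus_URL;language\nNKJP;http://example.org;Polish\n"
print(missing_columns(sample))
```

A file that uses commas instead of semicolons is reported as missing every column, because the whole header is then read as a single field.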
A single cell may contain multiple paragraphs or structures split with the #SEP separator. In the example below, the Description cell consists of 3 paragraphs. Some cells depend on others: the Buttons cell holds 2 button names split with the separator, and Buttons_URL holds the respective URLs.
Corpus | Corpus_URL | Language | Size | Annotation | Licence | Description | Buttons | Buttons_URL | Publication | Publication_URL | Note |
---|---|---|---|---|---|---|---|---|---|---|---|
Example Corpus Name | www.examplaryurl.com | English | 100 million tokens | tokenised, PoS-tagged, lemmatised | CC-BY | First examplary sentence #SEPSecond examplary sentence to be started from new line #SEPExample with <a href="http://some.url">hyperlink</a> in it | Concordancer#SEPDownload | https://www.concordancer.com/ #SEPhttps://www.download.com | Smith et al. (3019) | https://publication.url | Note text to be displayed in button field |
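The #SEP convention shown in the example row can be illustrated with a short Python sketch. It is not the module's own code: `split_cell` and `pair_buttons` are hypothetical helpers, and trimming the whitespace around each part is an assumption about how the generator treats cells.

```python
SEPARATOR = "#SEP"


def split_cell(cell):
    """Split one cell on #SEP and trim whitespace around each part.

    Assumption: surrounding whitespace is not significant.
    """
    return [part.strip() for part in cell.split(SEPARATOR)]


def pair_buttons(row):
    """Pair each button name with the URL at the same position."""
    names = split_cell(row["Buttons"])
    urls = split_cell(row["Buttons_URL"])
    if len(names) != len(urls):
        raise ValueError(f"{len(names)} button names but {len(urls)} URLs")
    return list(zip(names, urls))


# Values shortened from the example row above.
row = {
    "Description": "First sentence #SEPSecond sentence #SEPThird sentence",
    "Buttons": "Concordancer#SEPDownload",
    "Buttons_URL": "https://www.concordancer.com/ #SEPhttps://www.download.com",
}
paragraphs = split_cell(row["Description"])  # 3 paragraphs
buttons = pair_buttons(row)                  # 2 (name, URL) pairs
```

Raising an error on a length mismatch makes a missing URL visible early, rather than silently dropping a button.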
The table title is derived from the .csv file name, which must follow the format X-table_title.csv, where X is an index used for table ordering. Tables can be grouped into sections by storing them in an intermediate directory within corpora, whose name follows the same indexing principle as the .csv files. For example, corpora with the structure:
Historical corpora
├── 1-Historical corpora in the CLARIN infrastructure
│   ├── 1-Monolingual corpora.csv
│   └── 2-Multilingual corpora.csv
└── 2-Other historical corpora
    ├── 1-Monolingual corpora.csv
    └── 2-Multilingual corpora.csv
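Will produce two sections, "Historical corpora in the CLARIN infrastructure" and "Other historical corpora", each containing the tables "Monolingual corpora" and "Multilingual corpora".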
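The X-title naming convention described above can be sketched in Python as follows. This is an illustration only: `index_and_title` and `ordered_titles` are hypothetical names, and sorting the index numerically (so that 10 comes after 2) is an assumption about the generator's ordering.

```python
import re
from pathlib import PurePosixPath

_INDEXED_NAME = re.compile(r"^(\d+)-(.+)$")


def index_and_title(name):
    """Split 'X-title' or 'X-title.csv' into (X as int, title)."""
    stem = name[:-4] if name.lower().endswith(".csv") else name
    match = _INDEXED_NAME.match(stem)
    if match is None:
        raise ValueError(f"expected 'X-title', got {name!r}")
    return int(match.group(1)), match.group(2)


def ordered_titles(paths):
    """Sort relative paths by the index of every path component.

    Returns one tuple of titles per path, e.g. (section, table).
    """
    parsed = [
        [index_and_title(part) for part in PurePosixPath(path).parts]
        for path in paths
    ]
    parsed.sort(key=lambda parts: [index for index, _ in parts])
    return [tuple(title for _, title in parts) for parts in parsed]


# Paths relative to the "Historical corpora" directory from the example.
paths = [
    "2-Other historical corpora/2-Multilingual corpora.csv",
    "1-Historical corpora in the CLARIN infrastructure/2-Multilingual corpora.csv",
    "2-Other historical corpora/1-Monolingual corpora.csv",
    "1-Historical corpora in the CLARIN infrastructure/1-Monolingual corpora.csv",
]
for section, table in ordered_titles(paths):
    print(section, "/", table)
```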
Rules are written as nested JSON made up of tags and fields. Given the rule:
{"tags": [
    {"tag": "<table class=\"table\" cellspacing=\"2\">", "tags": [
        {"tag": "<thead>", "tags": [
            {"tag": "<tr>", "tags": [
                {"tag": "<th>", "text": "Corpus name"}
            ]}
        ]},
        {"tag": "<tbody>", "tags": [
            {"tag": "<tr>", "tags": [
                {"tag": "<td valign=\"top\">", "tags": [
                    {"tag": "<p>", "fields": [
                        {"text": "<strong>Field data</strong> will be inserted here: %s", "columns": ["column_name_in_csv_file"]}
                    ]}
                ]}
            ]}
        ]}
    ]}
]}
The generated HTML table with corpus names, assuming the .csv file contains only 2 rows:
<table class="table" cellspacing="2">
    <thead>
        <tr>
            <th valign="top">Corpus name
            </th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td valign="top">
                <p>
                    <strong>Field data</strong> will be inserted here: NKJP 2.1.4
                </p>
            </td>
        </tr>
    </tbody>
    <tbody>
        <tr>
            <td valign="top">
                <p>
                    <strong>Field data</strong> will be inserted here: Common Crawl
                </p>
            </td>
        </tr>
    </tbody>
</table>
Corpus name |
---|
Field data will be inserted here: NKJP 2.1.4 |
Field data will be inserted here: Common Crawl |
The <tbody> tag encloses the tags and fields used to create a row, and the whole <tbody> block is emitted once per row of the .csv file. Only tags nested within <tbody> ... </tbody> can contain "fields": [].
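To make the rule format more concrete, here is a minimal sketch of how such nested rules could be turned into HTML. It is not the module's actual implementation: `render` and `closing_tag` are hypothetical names, the output is emitted without whitespace or indentation, and cell values are inserted unescaped on the assumption that cells may contain HTML such as hyperlinks.

```python
import re


def closing_tag(opening):
    """Derive '</table>' from an opening tag such as '<table class="table">'."""
    name = re.match(r"<\s*([A-Za-z][\w-]*)", opening).group(1)
    return f"</{name}>"


def render(node, rows, row=None):
    """Render one rule node to an HTML string.

    `rows` is a list of dicts, one per .csv row. `row` is the row currently
    being rendered and is set only inside a <tbody> block.
    """
    opening = node.get("tag", "")
    if opening.startswith("<tbody") and row is None:
        # Repeat the whole <tbody> block once per row.
        return "".join(render(node, rows, r) for r in rows)

    inner = node.get("text", "")
    for field in node.get("fields", []):
        if row is None:
            raise ValueError('"fields" is only allowed inside <tbody>')
        # Fill each %s placeholder with the value of the listed column.
        values = tuple(row[column] for column in field["columns"])
        inner += field["text"] % values
    for child in node.get("tags", []):
        inner += render(child, rows, row)

    # The root node has no "tag" key, so it contributes no wrapper.
    return opening + inner + (closing_tag(opening) if opening else "")


rule = {"tags": [
    {"tag": '<table class="table">', "tags": [
        {"tag": "<thead>", "tags": [
            {"tag": "<tr>", "tags": [{"tag": "<th>", "text": "Corpus name"}]}
        ]},
        {"tag": "<tbody>", "tags": [
            {"tag": "<tr>", "tags": [
                {"tag": "<td>", "fields": [
                    {"text": "Corpus: %s", "columns": ["Corpus"]}
                ]}
            ]}
        ]},
    ]}
]}
rows = [{"Corpus": "NKJP 2.1.4"}, {"Corpus": "Common Crawl"}]
html = render(rule, rows)
```

Deriving the closing tag from the opening tag string keeps the rule file free of explicit closing tags, which matches the rule format shown in this README.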