How to convert WikiTables dataset to your JSON format? #4
Comments
Hi @HaoAreYuDong, I saw your reply to the WDC dataset request. I need it to reproduce your work. Many thanks!
Hello @HaoAreYuDong. I am also interested in replicating parts of your work. I would greatly appreciate it if you could share the preprocessing script for WikiTables, as it is a must for reaching impressive results such as the ones you reported.
Data can be downloaded from: https://drive.google.com/file/d/1XyZAtH9F8UoLsXHsBWriWkh9pZ92e3sX/view?usp=sharing (for wikidata.pt) and https://drive.google.com/file/d/19GYZyJNlOMk8xB_nhn1lfwfEWXWj4Iww (for wikidata.json).
Please find an example preprocessing script at https://drive.google.com/drive/folders/1fEiBs9d6GwV1zq8Kk4bMbWMEUeYrLkeL?usp=sharing.
Please find an example script at https://drive.google.com/drive/folders/1fEiBs9d6GwV1zq8Kk4bMbWMEUeYrLkeL?usp=sharing.
Hi @HaoAreYuDong,
http://l3s.de/~fetahu/wiki_tables/data/table_data/html_data/structured_html_table_data.json.gz
I did not save the original file, but I still have a processed file that largely preserves the information. It is big, though. Is there any way to share it with you, e.g., a shared folder?
@HaoAreYuDong thank you for the quick response, I made sure it has enough free space (about 55 GB).
Great. Thanks.
It's here: https://drive.google.com/file/d/19GYZyJNlOMk8xB_nhn1lfwfEWXWj4Iww/view?usp=drive_link
@HaoAreYuDong I'd like to explore and run inference on a few of my own tables. The data already holds the RI, CI, and Cd (row index, column index, children) fields, which were generated using the file, but the script refers to a few JSON files that consist of four fields: id, caption, header, rows. I can't see those fields in the data. I have also checked the raw data in the download section here: Hope it's not a hassle for you,
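For readers who want to put their own tables into the four-field shape mentioned above (id, caption, header, rows), here is a minimal sketch in Python. It is not the maintainer's preprocessing script. The function name `table_to_record`, the choice to store every cell as a string, and the padding of short rows with empty strings are all assumptions made for illustration; the exact schema the repository's scripts expect is not confirmed in this thread.

```python
import json


def table_to_record(table_id, caption, header, rows):
    """Build one table record with the four fields named in the thread.

    Hypothetical helper: the cell types and row padding are assumptions,
    not the repository's confirmed format.
    """
    width = len(header)
    # Trim long rows and pad short rows so each row has one cell per column.
    normalized = [
        [str(cell) for cell in row[:width]] + [""] * (width - len(row))
        for row in rows
    ]
    return {
        "id": table_id,
        "caption": caption,
        "header": [str(h) for h in header],
        "rows": normalized,
    }


# Example with made-up data: the second row is missing one cell.
record = table_to_record(
    "table-0",
    "Example caption",
    ["City", "Country"],
    [["Paris", "France"], ["Tokyo"]],
)
print(json.dumps(record, ensure_ascii=False))
```

Writing one such record per line to a file gives a JSON Lines layout; whether the scripts want that or a single JSON array would need to be checked against the sample in the repo.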
I've downloaded the dataset but it needs some pre-processing to get it to your format, as in the sample you provide in the repo.
Do you have the scripts for this process?