
How to convert WikiTables dataset to your JSON format? #4

Open
eloukas opened this issue Nov 5, 2021 · 11 comments


eloukas commented Nov 5, 2021

I've downloaded the dataset but it needs some pre-processing to get it to your format, as in the sample you provide in the repo.
Do you have the scripts for this process?


eloukas commented Nov 10, 2021

Hi @HaoAreYuDong, I saw your reply to the WDC dataset request.
Could you also maybe share here the complete pre-processed WikiTables dataset or your sample script for the preprocessing?

I need it to reproduce your work.

Many thanks!


nickmagginas commented Nov 17, 2021

Hello @HaoAreYuDong. I am also interested in replicating parts of your work. I would greatly appreciate it if you could share the preprocessing script for WikiTables, as it seems essential for reproducing the impressive results you reported.
Thank you very much in advance, and thank you for your great work and for open-sourcing it. :)


HaoAreYuDong commented Nov 18, 2021

Data can be downloaded from: https://drive.google.com/file/d/1XyZAtH9F8UoLsXHsBWriWkh9pZ92e3sX/view?usp=sharing (for wikidata.pt) and https://drive.google.com/file/d/19GYZyJNlOMk8xB_nhn1lfwfEWXWj4Iww (for wikidata.json).
Script will be uploaded soon.
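If you just want a quick look at the two files before the script is up, something along these lines should work, assuming wikidata.pt is a regular torch.save() dump; wikidata.json is large, so it only peeks at the first part of it:

```python
# Quick inspection sketch -- assumes wikidata.pt is a plain torch.save()
# object and wikidata.json is UTF-8 text; adjust if your copies differ.
import torch

obj = torch.load("wikidata.pt", map_location="cpu")  # on PyTorch >= 2.6 you may need weights_only=False
print(type(obj))
if hasattr(obj, "__len__"):
    print("number of items:", len(obj))

# wikidata.json is large, so peek at the first couple of kilobytes
# instead of loading the whole file into memory.
with open("wikidata.json", "r", encoding="utf-8") as f:
    print(f.read(2000))
```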

@HaoAreYuDong (Contributor)

> Hello @HaoAreYuDong. I am also interested in replicating parts of your work. I would greatly appreciate it if you could share the preprocessing script for WikiTables, as it seems essential for reproducing the impressive results you reported. Thank you very much in advance, and thank you for your great work and for open-sourcing it. :)

Please find an example preprocessing script at https://drive.google.com/drive/folders/1fEiBs9d6GwV1zq8Kk4bMbWMEUeYrLkeL?usp=sharing.

@HaoAreYuDong (Contributor)

> Hi @HaoAreYuDong, I saw your reply to the WDC dataset request. Could you also maybe share here the complete pre-processed WikiTables dataset or your sample script for the preprocessing?
>
> I need it to reproduce your work.
>
> Many thanks!

Please find an example script at https://drive.google.com/drive/folders/1fEiBs9d6GwV1zq8Kk4bMbWMEUeYrLkeL?usp=sharing.


codingforpleasure commented Aug 3, 2023

Hi @HaoAreYuDong,
Thank you for sharing your work; I find the article very interesting.

  1. I noticed that in tuta/data/pretrain/wiki-table-samples.json each table has a unique id. Is there a way to see the original table on the web?

  2. The file split_wiki.py mentions that updated_structured_html_tables can be downloaded from:

http://l3s.de/~fetahu/wiki_tables/data/table_data/html_data/structured_html_table_data.json.gz

But that link is down (page not found). Could you please post a working link?
Thank you in advance.

@HaoAreYuDong (Contributor)

> Hi @HaoAreYuDong, thank you for sharing your work; I find the article very interesting.
>
> 1. I noticed that in tuta/data/pretrain/wiki-table-samples.json each table has a unique id. Is there a way to see the original table on the web?
> 2. The file split_wiki.py mentions that updated_structured_html_tables can be downloaded from http://l3s.de/~fetahu/wiki_tables/data/table_data/html_data/structured_html_table_data.json.gz, but that link is down (page not found). Could you please post a working link? Thank you in advance.

I did not save the original file, but I still have a processed file that largely preserves the information. It is quite big, though. Is there a way to share it with you, e.g., a shared folder?

@codingforpleasure

@HaoAreYuDong thank you for the quick response. I'd be glad if you could upload the processed file to this Google Drive.

I made sure it has enough free space (about 55 GB).
Thank you!

@HaoAreYuDong (Contributor)

> @HaoAreYuDong thank you for the quick response. I'd be glad if you could upload the processed file to this Google Drive.
>
> I made sure it has enough free space (about 55 GB). Thank you!

Great. Thanks.

@HaoAreYuDong (Contributor)

> @HaoAreYuDong thank you for the quick response. I'd be glad if you could upload the processed file to this Google Drive.
>
> I made sure it has enough free space (about 55 GB). Thank you!

It's here: https://drive.google.com/file/d/19GYZyJNlOMk8xB_nhn1lfwfEWXWj4Iww/view?usp=drive_link
Hope it can help you.


codingforpleasure commented Aug 8, 2023

@HaoAreYuDong I'd like to explore the model and run inference on a few of my own tables. The data you shared at
https://drive.google.com/file/d/19GYZyJNlOMk8xB_nhn1lfwfEWXWj4Iww/view?usp=drive_link
already holds the RI, CI, Cd (row index, column index, children) fields, which were generated with the process_wiki.pt file you posted here.

However, that script refers to a few JSON files consisting of four fields: id, caption, header, rows. I can't find those fields in the shared data. Could you please share at least one JSON file that demonstrates the file structure before the script is run on it? A hedged sketch of what I assume that input looks like is below.

I have also checked the raw data in the download section of TabEL ("A dataset of 1.6M Wikipedia Tables in JSON format", http://websail-fe.cs.northwestern.edu/TabEL/), and neither of those files holds the fields in question.
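For reference, this is roughly what I assumed the pre-script input should look like and how I tried to build it from the TabEL dump; the TabEL field names (_id, tableCaption, tableHeaders, tableData) and the one-record-per-line layout are only my guesses, so please correct me if the real format differs:

```python
# Hedged sketch (my guess, not the authors' actual preprocessing):
# build {id, caption, header, rows} records from TabEL-style table JSON.
# The TabEL field names used below are assumptions and may need adjusting.
import json

def convert_table(tabel_record):
    headers = tabel_record.get("tableHeaders") or [[]]
    return {
        "id": tabel_record.get("_id"),
        "caption": tabel_record.get("tableCaption", ""),
        "header": [cell.get("text", "") for cell in headers[0]],
        "rows": [
            [cell.get("text", "") for cell in row]
            for row in tabel_record.get("tableData", [])
        ],
    }

# Assuming the raw dump is one JSON object per line; adjust if it is
# a single JSON array instead.
with open("tables.json", "r", encoding="utf-8") as fin, \
        open("tables_4field.json", "w", encoding="utf-8") as fout:
    for line in fin:
        if not line.strip():
            continue
        fout.write(json.dumps(convert_table(json.loads(line)), ensure_ascii=False) + "\n")
```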

Hope it's not a hassle for you.
Thank you, I appreciate it!
