Fandom #12

Open
upintheairsheep opened this issue Feb 13, 2023 · 3 comments

Comments

@upintheairsheep

Use this in conjunction with https://github.com/WikiTeam/wikiteam to dump the text of every wiki on Fandom (I don't know whether we should include page histories), or perhaps only the 10,000 most popular wikis. We should also integrate https://archive.org/details/wikia_dump_20200214 , https://archive.org/details/wikia_dump_20180602 , and the Fandom wikis in https://archive.org/details/wikiteam so that the Pile also contains the contents of deleted pages and wikis. Having every Fandom wiki in the Pile would be very beneficial for AI: Fandom holds a vast amount of knowledge about fiction, so models trained on it would be more accurate about fictional works as well as real-world topics.

@upintheairsheep
Author

There is also a less recent dataset: https://www.kaggle.com/datasets/abeserra/wikia-census

@upintheairsheep
Author

To be clear about the order: dump the 1,000 most popular Fandom wikis first, then work down through all the remaining wikis, then merge in the earlier Fandom dumps to recover deleted pages and deleted wikis.
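
A rough sketch of that ordering and merge step, in case it helps. Everything here is hypothetical: `plan_dump_order` and `merge_with_archived` are made-up helper names, popularity is assumed to be a plain number per wiki, and each dump is assumed to be already parsed into a `{page title: text}` mapping. It does not call WikiTeam itself.

```python
from typing import Dict, Iterable, List, Tuple


def plan_dump_order(popularity: Dict[str, int], top_n: int = 1000) -> Tuple[List[str], List[str]]:
    """Split wikis into a first batch (the top_n most popular) and the rest.

    `popularity` maps a wiki name to some popularity score (an assumption;
    the issue does not say which metric to use).
    """
    # Sort by descending popularity; break ties by name so the order is stable.
    ranked = sorted(popularity, key=lambda name: (-popularity[name], name))
    return ranked[:top_n], ranked[top_n:]


def merge_with_archived(current: Dict[str, str],
                        archived_dumps: Iterable[Dict[str, str]]) -> Dict[str, str]:
    """Fill in pages missing from the fresh dump using older dumps.

    `archived_dumps` should be ordered newest first, so the most recent
    surviving copy of a deleted page is the one that is kept.
    """
    merged = dict(current)  # pages in the fresh dump always take priority
    for dump in archived_dumps:
        for title, text in dump.items():
            # Only add pages that no newer source has already provided.
            merged.setdefault(title, text)
    return merged
```

The fresh dump always wins, so the older dumps only contribute pages (or whole wikis) that no longer exist.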

@upintheairsheep
Author

Fandom text is licensed under an open licence.
