http://archive.org/ - Contact the Internet Archive to request a listing of all the data you want. The Internet Archive is a giant library of documents (books, manuals, and miscellaneous PDF files) and other interesting files, including JSON metadata from mirrored online videos, sometimes with their comments. For PDF files, it provides several derived formats, such as OCR text and OCR XML; see https://archive.org/download/andrus-thesis for an example. For a mirrored online video with its metadata, see https://archive.org/download/youtube-DPMluEVUqS0, and see https://archive.org/download/instagram-apple for another commonly used format. The archive also provides directory listings of the contents of common compressed files, so you can scrape those for documents too. See #11 for formats.
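As a rough sketch of how the per-item files could be enumerated without scraping HTML: the Internet Archive exposes item metadata as JSON at `https://archive.org/metadata/<identifier>`, with a `files` list whose entries carry a `name`. The helper names below (`fetch_metadata`, `text_file_urls`) and the choice of suffixes are assumptions made for illustration, not something this issue prescribes.

```python
import json
import urllib.parse
import urllib.request

METADATA_URL = "https://archive.org/metadata/{identifier}"
DOWNLOAD_URL = "https://archive.org/download/{identifier}/{name}"


def fetch_metadata(identifier):
    """Fetch an item's metadata JSON (performs a network request)."""
    url = METADATA_URL.format(identifier=urllib.parse.quote(identifier))
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def text_file_urls(identifier, metadata, suffixes=(".txt", ".xml", ".json")):
    """Return download URLs for files whose names end in one of `suffixes`.

    `metadata` is the dict returned by fetch_metadata; the suffix list is an
    illustrative guess at which derived formats are worth collecting.
    """
    urls = []
    for entry in metadata.get("files", []):
        name = entry.get("name", "")
        if name.lower().endswith(tuple(suffixes)):
            urls.append(
                DOWNLOAD_URL.format(
                    identifier=urllib.parse.quote(identifier),
                    name=urllib.parse.quote(name),  # keeps "/" in subpaths
                )
            )
    return urls
```

Calling `text_file_urls("andrus-thesis", fetch_metadata("andrus-thesis"))` would then give the candidate text-bearing files for that item. Keeping the filter separate from the fetch means it can be checked against a saved metadata dict without touching the network.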