Broken Link: Chapter 5.4 Big Data #595

Open
hrh-bbc-rd opened this issue Jul 17, 2023 · 4 comments

Comments

@hrh-bbc-rd

The link to the PubMed Abstracts dataset in Chapter 5, Section 4 (the 'Big Data' section) is broken.

The broken link appears in this line:

data_files = "https://the-eye.eu/public/AI/pile_preliminary_components/PUBMED_title_abstracts_2019_baseline.jsonl.zst"

Chapter here

@hrh-bbc-rd
Author

I was able to continue the course by using this link instead:

data_files = "https://the-eye.eu/public/AI/pile_v2/data/NIH_ExPORTER_awarded_grant_text.jsonl.zst"
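Since these URLs have moved more than once, one way to guard against a dead link is to check a few candidate URLs and use the first one that responds. The following is a minimal sketch using only the Python standard library. The helper names `url_is_reachable` and `first_reachable` are hypothetical (they are not part of 🤗 Datasets), and neither URL quoted in this thread is guaranteed to be live.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, Optional


def url_is_reachable(url: str, timeout: float = 10.0) -> bool:
    """Return True if a HEAD request to `url` succeeds.

    Assumption: the server answers HEAD requests. Some hosts reject them,
    in which case this check reports False even though a GET would work.
    """
    try:
        # Building the Request can raise ValueError for a malformed URL,
        # so it is kept inside the try block.
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, ValueError, OSError):
        return False


def first_reachable(
    urls: Iterable[str],
    check: Callable[[str], bool] = url_is_reachable,
) -> Optional[str]:
    """Return the first URL for which `check` is True, or None if none pass."""
    for url in urls:
        if check(url):
            return url
    return None
```

Calling `first_reachable([...])` with the original URL followed by the substitute returns whichever responds first, or `None` if both are down. A non-`None` result can then be assigned to `data_files`.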

@tj-cahill

It looks like this URL has changed and broken the link before (see #324).

@tj-cahill

Note that there is another broken link further down the page, in the following code block:

law_dataset_streamed = load_dataset(
    "json",
    data_files="https://the-eye.eu/public/AI/pile_preliminary_components/FreeLaw_Opinions.jsonl.zst",
    split="train",
    streaming=True,
)
next(iter(law_dataset_streamed))
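While the remote file is unavailable, the streaming call above can still be practised against a small local JSON Lines file. The following is a rough sketch, not the course's method. The helper `write_jsonl`, the file name `sample_stream.jsonl`, and the `text` and `meta` keys are all assumptions made for illustration, and the records are placeholders rather than real data.

```python
import json
import tempfile
from pathlib import Path


def write_jsonl(records, path):
    """Write an iterable of dicts to `path`, one JSON object per line."""
    path = Path(path)
    with path.open("w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")
    return path


# Hypothetical placeholder records; the "text" and "meta" keys are assumed.
sample_records = [
    {"text": "First placeholder document.", "meta": {"id": 1}},
    {"text": "Second placeholder document.", "meta": {"id": 2}},
]

# Write the stand-in file to the system temporary directory.
sample_path = write_jsonl(
    sample_records, Path(tempfile.gettempdir()) / "sample_stream.jsonl"
)
```

Passing `data_files=str(sample_path)` in place of the URL in the `load_dataset` call above should then stream these records locally, assuming the `datasets` library is installed.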

@Dboee

Dboee commented Jan 24, 2024

Same issue here. It looks like the Pile has been taken down for copyright reasons.
