Broken Link to PubMed Abstracts dataset #623

yacinebouaouni · 2023-09-26T20:41:37Z

The link given in Section 5, "Big data? 🤗 Datasets to the rescue!", is broken:
data_files = "https://the-eye.eu/public/AI/pile_preliminary_components/PUBMED_title_abstracts_2019_baseline.jsonl.zst"

qualis2006 · 2024-01-08T07:21:17Z

Here is the Hugging Face repository I created for the PubMed abstracts dataset, which you may want to look at:

from datasets import load_dataset
pubmed_dataset = load_dataset("qualis2006/PUBMED_title_abstracts_2020_baseline")
pubmed_dataset

Downloading data: 100% 7.98G/7.98G [11:47<00:00, 9.68MB/s]
Generating train split: 17722096/0 [00:36<00:00, 505376.37 examples/s]

DatasetDict({
    train: Dataset({
        features: ['meta', 'text'],
        num_rows: 17722096
    })
})

mik-tf · 2024-02-26T18:40:51Z

@qualis2006 Nice, thanks! Your code works on my end, but because it returns a DatasetDict, I need to use pubmed_dataset['train'] instead of pubmed_dataset throughout the rest of the page.

To run the code on the page as is, we can instead point data_files at the full URL of the file:

data_files="https://huggingface.co/datasets/qualis2006/PUBMED_title_abstracts_2020_baseline/resolve/main/PUBMED_title_abstracts_2020_baseline.jsonl.zst"

@yacinebouaouni this line should work.
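
For anyone who wants both workarounds in one place, here is a minimal sketch rather than an official fix. It rebuilds the URL above from the repository ID and file name, then loads the file with the generic "json" builder and split="train", so the result can be used directly instead of going through ['train']. The helper names hub_file_url and load_pubmed are hypothetical and introduced only for illustration. Streaming is an assumption on my part to avoid the 7.98G download up front, and reading the .zst file is assumed to need the zstandard package installed.

```python
# Base URL for dataset repositories on the Hugging Face Hub.
HUB_BASE = "https://huggingface.co/datasets"

REPO_ID = "qualis2006/PUBMED_title_abstracts_2020_baseline"
FILENAME = "PUBMED_title_abstracts_2020_baseline.jsonl.zst"


def hub_file_url(repo_id, filename, revision="main"):
    """Build the direct 'resolve' URL of a file in a Hub dataset repo.

    Hypothetical helper for illustration; it only formats a string.
    """
    return f"{HUB_BASE}/{repo_id}/resolve/{revision}/{filename}"


# Same URL as the data_files line quoted in this thread.
data_files = hub_file_url(REPO_ID, FILENAME)


def load_pubmed(streaming=True):
    """Load the PubMed abstracts as a single 'train' split.

    Hypothetical helper. With split="train" the call returns the split
    itself rather than a DatasetDict. With streaming=True (an assumption,
    chosen to skip the full download) the result is iterated lazily.
    """
    # Imported lazily so that building the URL does not require `datasets`.
    from datasets import load_dataset

    return load_dataset(
        "json",
        data_files=data_files,
        split="train",
        streaming=streaming,
    )
```

Calling next(iter(load_pubmed())) should then give the first record, which, going by the features printed earlier in this thread, would carry the 'meta' and 'text' keys.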

mariosasko mentioned this issue Oct 2, 2023

Broken Link to PubMed Abstracts dataset . huggingface/datasets#6273

Open
