Missing Data Files Due to 404 Errors in get-all-data.sh #45

Open
ciioprof0 opened this issue Nov 24, 2024 · 1 comment

@ciioprof0

The script get-all-data.sh downloads most of the data files successfully but fails to download the following files due to 404 errors:

  • data/nmt/eng-fra.txt
  • data/nmt/simplest_eng_fra.csv
  • data/yelp/raw_train.csv

Example Error Content

In the case of raw_train.csv, the downloaded file contains the following HTML, indicating a "404 Not Found" error:

<html lang="en" dir="ltr">
  <meta charset="utf-8">
  <meta name="viewport" content="initial-scale=1, minimum-scale=1, width=device-width">
  <title>Error 404 (Not Found)!!1</title>
  <style>
    /* Truncated for brevity */
  </style>
  <main>
    <a href="//www.google.com">
      <span id="logo" aria-label="Google" role="img"></span>
    </a>
    <p><b>404.</b> <ins>That’s an error.</ins></p>
    <p>The requested URL was not found on this server. <ins>That’s all we know.</ins></p>
  </main>
</html>
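
A quick way to spot this failure mode after running the script is to check whether a downloaded file starts with HTML markup instead of data. The following is a minimal sketch, not part of get-all-data.sh: the function names `looks_like_html_error` and `find_bad_downloads` are hypothetical, and the check assumes that a valid CSV or text data file never begins with `<html` or `<!doctype html`.

```python
from pathlib import Path

# Byte prefixes suggesting an HTML page was saved in place of a data file.
# Assumption: real CSV/TXT data files never start with these markers.
HTML_MARKERS = (b"<html", b"<!doctype html")


def looks_like_html_error(path, probe_bytes=512):
    """Return True if the start of the file looks like HTML rather than data."""
    with open(path, "rb") as f:
        # Ignore leading whitespace and letter case before comparing.
        head = f.read(probe_bytes).lstrip().lower()
    return head.startswith(HTML_MARKERS)


def find_bad_downloads(paths):
    """Return the paths that are missing or appear to contain an HTML page."""
    bad = []
    for p in map(Path, paths):
        if not p.is_file() or looks_like_html_error(p):
            bad.append(str(p))
    return bad
```

For example, calling `find_bad_downloads(["data/nmt/eng-fra.txt", "data/nmt/simplest_eng_fra.csv", "data/yelp/raw_train.csv"])` from the repository root would list whichever of the three files are absent or contain an error page.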

Steps to Reproduce

  1. Run the get-all-data.sh script as described in the book.
  2. Observe that the listed files are not downloaded correctly: the resulting files contain 404 error pages instead of data.

Expected Behavior

The script should download the required data files or provide updated instructions if the files have been moved or removed.

Suggestions

  • Verify whether the file links (Google Drive IDs or other sources) are still valid.
  • Provide updated links or alternative sources for the missing files.
  • If the files are no longer available, include placeholder data or instructions to generate equivalent datasets.
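
The first suggestion could be automated with a small link check. This is only a sketch under stated assumptions: `url_status` is a hypothetical helper, it sends a `HEAD` request with Python's standard `urllib`, and it assumes the server answers `HEAD` requests. Some hosts reject `HEAD` or return 200 with an HTML interstitial page, so a 200 here does not prove that the data file itself is available.

```python
import urllib.error
import urllib.request


def url_status(url, timeout=10.0):
    """Return the HTTP status code for url, or None if the host is unreachable."""
    # HEAD asks for headers only, so no file body is transferred.
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises for 4xx/5xx responses; the code is on the exception.
        return err.code
    except OSError:
        # Covers URLError (DNS failure, refused connection) and timeouts.
        return None
```

A result of 404 for any URL used by the script would confirm that the link is stale and needs replacing.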

Thank you for addressing this issue!

@ciioprof0
Author

Alternate locations for the missing data files:

  • data/nmt/eng-fra.txt: https://github.com/saranshmanu/Neural-Machine-Translation/blob/master/dataset/eng-fra.txt
  • data/nmt/simplest_eng_fra.csv: https://github.com/an1604/Neural-Machine-Translation-NMT-Model/blob/main/simplest_eng_fra.csv
  • data/yelp/raw_train.csv: https://www.kaggle.com/datasets/hhalalwi/yelp-light/data?select=raw_train.csv
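
Note that the two GitHub links are `blob` URLs, which open the web view of the file; fetching them from a script would save an HTML page rather than the data. GitHub serves file contents from `raw.githubusercontent.com`, so a conversion along these lines may help when scripting the download. This is a sketch: `github_blob_to_raw` is a hypothetical helper, it assumes the branch name contains no slash (true for `master` and `main` here), and it has not been checked that these third-party copies match the book's original files. The Kaggle link is a dataset page and is not handled by this helper.

```python
from urllib.parse import urlsplit, urlunsplit


def github_blob_to_raw(url):
    """Convert a github.com 'blob' URL to its raw.githubusercontent.com form."""
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    # Expected path shape: <owner>/<repo>/blob/<ref>/<file path...>
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        raise ValueError(f"not a GitHub blob URL: {url}")
    owner, repo, _, ref, *file_path = segments
    raw_path = "/".join([owner, repo, ref, *file_path])
    return urlunsplit(("https", "raw.githubusercontent.com", "/" + raw_path, "", ""))
```

For instance, the first link above would map to `https://raw.githubusercontent.com/saranshmanu/Neural-Machine-Translation/master/dataset/eng-fra.txt`.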
