
Resource punkt not found. #755

Closed
kpennell opened this issue Jun 21, 2023 · 2 comments
Labels
bug Something isn't working

Comments


kpennell commented Jun 21, 2023

Describe the bug and how to reproduce it
I added some docx and pptx files to the source documents folder (ingestion worked fine with just the state of the union example), and now it fails to ingest.

Expected behavior
I expected it to ingest my documents.

Environment (please complete the following information):

- macOS Monterey 12.6.6 (21G646), 2.6 GHz 6-Core Intel Core i7
- Python 3.12.0b3

Additional context
Here's what I'm getting. I'm not sure what to change on the NLTK front.

Here's a summary of what's happening:

- The script attempts to execute the command python3 ingest.py.
- There are errors from the NLTK library: Error loading averaged_perceptron_tagger and Error loading punkt. The messages say the required NLTK resource punkt cannot be found.
- The script appends to an existing vectorstore located at db.
- It uses an embedded DuckDB with persistence, so data is stored in db.
- It loads documents from a source directory called source_documents.
- While loading new documents, the same NLTK errors appear again.
- Loading is interrupted by a traceback raised inside load_single_document.
- The traceback says the NLTK resource punkt was not found and advises using the NLTK Downloader to obtain it.

In summary, the script ingest.py is encountering errors related to the NLTK library and the missing punkt resource, which is preventing it from loading and processing documents.
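The LookupError is actually a downstream symptom: each automatic download fails first with CERTIFICATE_VERIFY_FAILED, because Python's default HTTPS context requires a verifiable certificate chain, and a framework build of Python on macOS ships without a CA bundle wired up until "Install Certificates.command" is run. A minimal stdlib-only illustration of why the download refuses to proceed (this check is my addition, not part of ingest.py):

```python
import ssl

# urllib builds a default HTTPS context that *requires* a verifiable
# certificate chain; if the local issuer certificate can't be found,
# every nltk.download() call over HTTPS fails before punkt is looked up.
ctx = ssl.create_default_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED)  # verification is on by default
```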

The main error I'm getting:
LookupError:


Resource punkt not found.
Please use the NLTK Downloader to obtain the resource:

import nltk
nltk.download('punkt')

For more information see: https://www.nltk.org/data.html

Attempted to load tokenizers/punkt/PY3/english.pickle

Searched in:
- '/Users/kylepennell/nltk_data'
- '/Users/kylepennell/Desktop/myenv2/nltk_data'
- '/Users/kylepennell/Desktop/myenv2/share/nltk_data'
- '/Users/kylepennell/Desktop/myenv2/lib/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
- ''


Whole thing:

Appending to existing vectorstore at db
Using embedded DuckDB with persistence: data will be stored in: db
Loading documents from source_documents
Loading new documents:   1%|▏                    | 1/98 [00:02<04:42,  2.92s/it][nltk_data] Error loading punkt: <urlopen error [SSL:
[nltk_data]     CERTIFICATE_VERIFY_FAILED] certificate verify failed:
[nltk_data]     unable to get local issuer certificate (_ssl.c:1002)>
[nltk_data] Error loading punkt: <urlopen error [SSL:
[nltk_data]     CERTIFICATE_VERIFY_FAILED] certificate verify failed:
[nltk_data]     unable to get local issuer certificate (_ssl.c:1002)>
[nltk_data] Error loading punkt: <urlopen error [SSL:
[nltk_data]     CERTIFICATE_VERIFY_FAILED] certificate verify failed:
[nltk_data]     unable to get local issuer certificate (_ssl.c:1002)>
[nltk_data] Error loading averaged_perceptron_tagger: <urlopen error
[nltk_data]     [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify
[nltk_data]     failed: unable to get local issuer certificate
[nltk_data]     (_ssl.c:1002)>
[nltk_data] Error loading punkt: <urlopen error [SSL:
[nltk_data]     CERTIFICATE_VERIFY_FAILED] certificate verify failed:
[nltk_data]     unable to get local issuer certificate (_ssl.c:1002)>
[nltk_data] Error loading averaged_perceptron_tagger: <urlopen error
[nltk_data]     [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify
[nltk_data]     failed: unable to get local issuer certificate
[nltk_data]     (_ssl.c:1002)>
[nltk_data] Error loading averaged_perceptron_tagger: <urlopen error
[nltk_data]     [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify
[nltk_data]     failed: unable to get local issuer certificate
[nltk_data]     (_ssl.c:1002)>
[nltk_data] Error loading averaged_perceptron_tagger: <urlopen error
[nltk_data]     [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify
[nltk_data]     failed: unable to get local issuer certificate
[nltk_data]     (_ssl.c:1002)>
Loading new documents:   7%|█▌                   | 7/98 [00:05<01:06,  1.36it/s]
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
                    ^^^^^^^^^^^^^^^^^^^
  File "/Users/kylepennell/Desktop/privateGPT-main/ingest.py", line 89, in load_single_document
    return loader.load()
           ^^^^^^^^^^^^^
  File "/Users/kylepennell/Desktop/myenv2/lib/python3.11/site-packages/langchain/document_loaders/unstructured.py", line 71, in load
    elements = self._get_elements()
               ^^^^^^^^^^^^^^^^^^^^
  File "/Users/kylepennell/Desktop/myenv2/lib/python3.11/site-packages/langchain/document_loaders/word_document.py", line 102, in _get_elements
    return partition_docx(filename=self.file_path, **self.unstructured_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kylepennell/Desktop/myenv2/lib/python3.11/site-packages/unstructured/partition/docx.py", line 144, in partition_docx
    para_element: Optional[Text] = _paragraph_to_element(paragraph)
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kylepennell/Desktop/myenv2/lib/python3.11/site-packages/unstructured/partition/docx.py", line 185, in _paragraph_to_element
    return _text_to_element(text)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kylepennell/Desktop/myenv2/lib/python3.11/site-packages/unstructured/partition/docx.py", line 201, in _text_to_element
    elif is_possible_narrative_text(text):
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kylepennell/Desktop/myenv2/lib/python3.11/site-packages/unstructured/partition/text_type.py", line 76, in is_possible_narrative_text
    if exceeds_cap_ratio(text, threshold=cap_threshold):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kylepennell/Desktop/myenv2/lib/python3.11/site-packages/unstructured/partition/text_type.py", line 273, in exceeds_cap_ratio
    if sentence_count(text, 3) > 1:
       ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kylepennell/Desktop/myenv2/lib/python3.11/site-packages/unstructured/partition/text_type.py", line 222, in sentence_count
    sentences = sent_tokenize(text)
                ^^^^^^^^^^^^^^^^^^^
  File "/Users/kylepennell/Desktop/myenv2/lib/python3.11/site-packages/unstructured/nlp/tokenize.py", line 38, in sent_tokenize
    return _sent_tokenize(text)
           ^^^^^^^^^^^^^^^^^^^^
  File "/Users/kylepennell/Desktop/myenv2/lib/python3.11/site-packages/nltk/tokenize/__init__.py", line 106, in sent_tokenize
    tokenizer = load(f"tokenizers/punkt/{language}.pickle")
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kylepennell/Desktop/myenv2/lib/python3.11/site-packages/nltk/data.py", line 750, in load
    opened_resource = _open(resource_url)
                      ^^^^^^^^^^^^^^^^^^^
  File "/Users/kylepennell/Desktop/myenv2/lib/python3.11/site-packages/nltk/data.py", line 876, in _open
    return find(path_, path + [""]).open()
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kylepennell/Desktop/myenv2/lib/python3.11/site-packages/nltk/data.py", line 583, in find
    raise LookupError(resource_not_found)
LookupError: 
**********************************************************************
  Resource punkt not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt')
  
  For more information see: https://www.nltk.org/data.html

  Attempted to load tokenizers/punkt/PY3/english.pickle

  Searched in:
    - '/Users/kylepennell/nltk_data'
    - '/Users/kylepennell/Desktop/myenv2/nltk_data'
    - '/Users/kylepennell/Desktop/myenv2/share/nltk_data'
    - '/Users/kylepennell/Desktop/myenv2/lib/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - ''
**********************************************************************

"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/kylepennell/Desktop/privateGPT-main/ingest.py", line 166, in <module>
    main()
  File "/Users/kylepennell/Desktop/privateGPT-main/ingest.py", line 150, in main
    texts = process_documents([metadata['source'] for metadata in collection['metadatas']])
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kylepennell/Desktop/privateGPT-main/ingest.py", line 118, in process_documents
    documents = load_documents(source_directory, ignored_files)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kylepennell/Desktop/privateGPT-main/ingest.py", line 107, in load_documents
    for i, docs in enumerate(pool.imap_unordered(load_single_document, filtered_files)):
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/pool.py", line 873, in next
    raise value
LookupError: 
**********************************************************************
  Resource punkt not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt')
  
  For more information see: https://www.nltk.org/data.html

  Attempted to load tokenizers/punkt/PY3/english.pickle

  Searched in:
    - '/Users/kylepennell/nltk_data'
    - '/Users/kylepennell/Desktop/myenv2/nltk_data'
    - '/Users/kylepennell/Desktop/myenv2/share/nltk_data'
    - '/Users/kylepennell/Desktop/myenv2/lib/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - ''
**********************************************************************

(myenv2) kylepennell@Kyles-MacBook-Pro privateGPT-main % python3 ingest.py
[nltk_data] Error loading averaged_perceptron_tagger: <urlopen error
[nltk_data]     [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify
[nltk_data]     failed: unable to get local issuer certificate
[nltk_data]     (_ssl.c:1002)>
Appending to existing vectorstore at db
Using embedded DuckDB with persistence: data will be stored in: db
Loading documents from source_documents
Loading new documents:   0%|                             | 0/98 [00:00<?, ?it/s][nltk_data] Error loading averaged_perceptron_tagger: <urlopen error
[nltk_data]     [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify
[nltk_data]     failed: unable to get local issuer certificate
[nltk_data]     (_ssl.c:1002)>
[nltk_data] Error loading averaged_perceptron_tagger: <urlopen error
[nltk_data]     [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify
[nltk_data]     failed: unable to get local issuer certificate
[nltk_data]     (_ssl.c:1002)>
[nltk_data] Error loading averaged_perceptron_tagger: <urlopen error
[nltk_data]     [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify
[nltk_data]     failed: unable to get local issuer certificate
[nltk_data]     (_ssl.c:1002)>
[nltk_data] Error loading averaged_perceptron_tagger: <urlopen error
[nltk_data]     [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify
[nltk_data]     failed: unable to get local issuer certificate
[nltk_data]     (_ssl.c:1002)>
[nltk_data] Error loading averaged_perceptron_tagger: <urlopen error
[nltk_data]     [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify
[nltk_data]     failed: unable to get local issuer certificate
[nltk_data]     (_ssl.c:1002)>
[nltk_data] Error loading averaged_perceptron_tagger: <urlopen error
[nltk_data]     [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify
[nltk_data]     failed: unable to get local issuer certificate
[nltk_data]     (_ssl.c:1002)>
[nltk_data] Error loading averaged_perceptron_tagger: <urlopen error
[nltk_data]     [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify
[nltk_data]     failed: unable to get local issuer certificate
[nltk_data]     (_ssl.c:1002)>
[nltk_data] Error loading averaged_perceptron_tagger: <urlopen error
[nltk_data]     [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify
[nltk_data]     failed: unable to get local issuer certificate
[nltk_data]     (_ssl.c:1002)>
[nltk_data] Error loading averaged_perceptron_tagger: <urlopen error
[nltk_data]     [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify
[nltk_data]     failed: unable to get local issuer certificate
[nltk_data]     (_ssl.c:1002)>
[nltk_data] Error loading averaged_perceptron_tagger: <urlopen error
[nltk_data]     [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify
[nltk_data]     failed: unable to get local issuer certificate
[nltk_data]     (_ssl.c:1002)>
[nltk_data] Error loading averaged_perceptron_tagger: <urlopen error
[nltk_data]     [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify
[nltk_data]     failed: unable to get local issuer certificate
[nltk_data]     (_ssl.c:1002)>
[nltk_data] Error loading averaged_perceptron_tagger: <urlopen error
[nltk_data]     [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify
[nltk_data]     failed: unable to get local issuer certificate
[nltk_data]     (_ssl.c:1002)>
Loading new documents:   1%|▏                    | 1/98 [00:04<06:30,  4.03s/it][nltk_data] Error loading punkt: <urlopen error [SSL:
[nltk_data]     CERTIFICATE_VERIFY_FAILED] certificate verify failed:
[nltk_data]     unable to get local issuer certificate (_ssl.c:1002)>
[nltk_data] Error loading punkt: <urlopen error [SSL:
[nltk_data]     CERTIFICATE_VERIFY_FAILED] certificate verify failed:
[nltk_data]     unable to get local issuer certificate (_ssl.c:1002)>
[nltk_data] Error loading punkt: <urlopen error [SSL:
[nltk_data]     CERTIFICATE_VERIFY_FAILED] certificate verify failed:
[nltk_data]     unable to get local issuer certificate (_ssl.c:1002)>
[nltk_data] Error loading punkt: <urlopen error [SSL:
[nltk_data]     CERTIFICATE_VERIFY_FAILED] certificate verify failed:
[nltk_data]     unable to get local issuer certificate (_ssl.c:1002)>
[nltk_data] Error loading averaged_perceptron_tagger: <urlopen error
[nltk_data]     [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify
[nltk_data]     failed: unable to get local issuer certificate
[nltk_data]     (_ssl.c:1002)>
[nltk_data] Error loading averaged_perceptron_tagger: <urlopen error
[nltk_data]     [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify
[nltk_data]     failed: unable to get local issuer certificate
[nltk_data]     (_ssl.c:1002)>
[nltk_data] Error loading averaged_perceptron_tagger: <urlopen error
[nltk_data]     [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify
[nltk_data]     failed: unable to get local issuer certificate
[nltk_data]     (_ssl.c:1002)>
[nltk_data] Error loading averaged_perceptron_tagger: <urlopen error
[nltk_data]     [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify
[nltk_data]     failed: unable to get local issuer certificate
[nltk_data]     (_ssl.c:1002)>
Loading new documents:   7%|█▌                   | 7/98 [00:04<01:01,  1.49it/s]
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
                    ^^^^^^^^^^^^^^^^^^^
  File "/Users/kylepennell/Desktop/privateGPT-main/ingest.py", line 96, in load_single_document
    return loader.load()
           ^^^^^^^^^^^^^
  File "/Users/kylepennell/Desktop/myenv2/lib/python3.11/site-packages/langchain/document_loaders/unstructured.py", line 71, in load
    elements = self._get_elements()
               ^^^^^^^^^^^^^^^^^^^^
  File "/Users/kylepennell/Desktop/myenv2/lib/python3.11/site-packages/langchain/document_loaders/word_document.py", line 102, in _get_elements
    return partition_docx(filename=self.file_path, **self.unstructured_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kylepennell/Desktop/myenv2/lib/python3.11/site-packages/unstructured/partition/docx.py", line 144, in partition_docx
    para_element: Optional[Text] = _paragraph_to_element(paragraph)
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kylepennell/Desktop/myenv2/lib/python3.11/site-packages/unstructured/partition/docx.py", line 185, in _paragraph_to_element
    return _text_to_element(text)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kylepennell/Desktop/myenv2/lib/python3.11/site-packages/unstructured/partition/docx.py", line 201, in _text_to_element
    elif is_possible_narrative_text(text):
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kylepennell/Desktop/myenv2/lib/python3.11/site-packages/unstructured/partition/text_type.py", line 76, in is_possible_narrative_text
    if exceeds_cap_ratio(text, threshold=cap_threshold):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kylepennell/Desktop/myenv2/lib/python3.11/site-packages/unstructured/partition/text_type.py", line 273, in exceeds_cap_ratio
    if sentence_count(text, 3) > 1:
       ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kylepennell/Desktop/myenv2/lib/python3.11/site-packages/unstructured/partition/text_type.py", line 222, in sentence_count
    sentences = sent_tokenize(text)
                ^^^^^^^^^^^^^^^^^^^
  File "/Users/kylepennell/Desktop/myenv2/lib/python3.11/site-packages/unstructured/nlp/tokenize.py", line 38, in sent_tokenize
    return _sent_tokenize(text)
           ^^^^^^^^^^^^^^^^^^^^
  File "/Users/kylepennell/Desktop/myenv2/lib/python3.11/site-packages/nltk/tokenize/__init__.py", line 106, in sent_tokenize
    tokenizer = load(f"tokenizers/punkt/{language}.pickle")
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kylepennell/Desktop/myenv2/lib/python3.11/site-packages/nltk/data.py", line 750, in load
    opened_resource = _open(resource_url)
                      ^^^^^^^^^^^^^^^^^^^
  File "/Users/kylepennell/Desktop/myenv2/lib/python3.11/site-packages/nltk/data.py", line 876, in _open
    return find(path_, path + [""]).open()
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kylepennell/Desktop/myenv2/lib/python3.11/site-packages/nltk/data.py", line 583, in find
    raise LookupError(resource_not_found)
LookupError: 
**********************************************************************
  Resource punkt not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt')
  
  For more information see: https://www.nltk.org/data.html

  Attempted to load tokenizers/punkt/PY3/english.pickle

  Searched in:
    - '/Users/kylepennell/nltk_data'
    - '/Users/kylepennell/Desktop/myenv2/nltk_data'
    - '/Users/kylepennell/Desktop/myenv2/share/nltk_data'
    - '/Users/kylepennell/Desktop/myenv2/lib/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - ''
**********************************************************************

"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/kylepennell/Desktop/privateGPT-main/ingest.py", line 173, in <module>
    main()
  File "/Users/kylepennell/Desktop/privateGPT-main/ingest.py", line 157, in main
    texts = process_documents([metadata['source'] for metadata in collection['metadatas']])
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kylepennell/Desktop/privateGPT-main/ingest.py", line 125, in process_documents
    documents = load_documents(source_directory, ignored_files)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kylepennell/Desktop/privateGPT-main/ingest.py", line 114, in load_documents
    for i, docs in enumerate(pool.imap_unordered(load_single_document, filtered_files)):
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/pool.py", line 873, in next
    raise value
LookupError: 
**********************************************************************
  Resource punkt not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt')
  
  For more information see: https://www.nltk.org/data.html

  Attempted to load tokenizers/punkt/PY3/english.pickle

  Searched in:
    - '/Users/kylepennell/nltk_data'
    - '/Users/kylepennell/Desktop/myenv2/nltk_data'
    - '/Users/kylepennell/Desktop/myenv2/share/nltk_data'
    - '/Users/kylepennell/Desktop/myenv2/lib/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - ''
**********************************************************************

(myenv2) kylepennell@Kyles-MacBook-Pro privateGPT-main % python3          
Python 3.11.4 (v3.11.4:d2340ef257, Jun  6 2023, 19:15:51) [Clang 13.0.0 (clang-1300.0.29.30)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import nltk
>>> nltk.download('punkt')
[nltk_data] Error loading punkt: <urlopen error [SSL:
[nltk_data]     CERTIFICATE_VERIFY_FAILED] certificate verify failed:
[nltk_data]     unable to get local issuer certificate (_ssl.c:1002)>
False
>>> exit()
(myenv2) kylepennell@Kyles-MacBook-Pro privateGPT-main % 
kpennell added the bug label on Jun 21, 2023
kpennell (Author) commented:

This seems to work: delip/PyTorchNLPBook#14
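For reference, the workaround in that linked issue is roughly the following sketch: monkey-patch the default HTTPS context so the NLTK downloader skips certificate verification. Caution: this disables verification for every urllib download in the process, so treat it as a stopgap.

```python
import ssl

# Route urllib's default HTTPS context through an unverified one so the
# NLTK downloader stops failing on the local certificate chain.
try:
    _create_unverified_https_context = ssl._create_unverified_context
except AttributeError:
    pass  # very old builds lack this hook and don't verify by default
else:
    ssl._create_default_https_context = _create_unverified_https_context

# With the patch in place, the download that failed above should go through:
# import nltk
# nltk.download('punkt')
```

A cleaner alternative on macOS is to run the "Install Certificates.command" script that ships with the python.org installer, which installs a proper CA bundle and keeps verification enabled.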


DeepReef11 commented Jul 17, 2023

Q) Where am I supposed to run the mentioned code?

A) Add the following to the ingest.py file:

import nltk
nltk.download('punkt')

then run python ingest.py again. If that doesn't work, take a look at the link above.
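To avoid hitting the network on every run of ingest.py, the download can be guarded so it only happens when the resource is actually missing. This helper (ensure_nltk_resource is a hypothetical name, not part of the privateGPT codebase) is one way to do it:

```python
import nltk

def ensure_nltk_resource(find_path: str, package: str) -> None:
    """Download an NLTK package only if it isn't already on nltk.data.path."""
    try:
        nltk.data.find(find_path)  # raises LookupError when missing
    except LookupError:
        nltk.download(package)
```

Usage near the top of ingest.py would look like ensure_nltk_resource("tokenizers/punkt", "punkt"), and similarly for averaged_perceptron_tagger.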
