Problem with partition_pdf module #32

Open
decsousa opened this issue Aug 3, 2023 · 8 comments

Comments

decsousa commented Aug 3, 2023

Hello, when I try to run the code the following error is displayed:

Traceback (most recent call last):
  File "C:\Users\Diego Sousa\Desktop\botchatgpt\botchatgpt\chat02.py", line 35, in <module>
    index = VectorstoreIndexCreator().from_loaders([loader])
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Diego Sousa\AppData\Local\Programs\Python\Python311\Lib\site-packages\langchain\indexes\vectorstore.py", line 72, in from_loaders
    docs.extend(loader.load())
                ^^^^^^^^^^^^^
  File "C:\Users\Diego Sousa\AppData\Local\Programs\Python\Python311\Lib\site-packages\langchain\document_loaders\directory.py", line 137, in load
    self.load_file(i, p, docs, pbar)
  File "C:\Users\Diego Sousa\AppData\Local\Programs\Python\Python311\Lib\site-packages\langchain\document_loaders\directory.py", line 94, in load_file
    raise e
  File "C:\Users\Diego Sousa\AppData\Local\Programs\Python\Python311\Lib\site-packages\langchain\document_loaders\directory.py", line 88, in load_file
    sub_docs = self.loader_cls(str(item), **self.loader_kwargs).load()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Diego Sousa\AppData\Local\Programs\Python\Python311\Lib\site-packages\langchain\document_loaders\unstructured.py", line 86, in load
    elements = self._get_elements()
               ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Diego Sousa\AppData\Local\Programs\Python\Python311\Lib\site-packages\langchain\document_loaders\unstructured.py", line 171, in _get_elements
    return partition(filename=self.file_path, **self.unstructured_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Diego Sousa\AppData\Local\Programs\Python\Python311\Lib\site-packages\unstructured\partition\auto.py", line 221, in partition
    elements = partition_pdf(
               ^^^^^^^^^^^^^
NameError: name 'partition_pdf' is not defined. Did you mean: 'partition_xml'?

Has anyone else had this problem?

@psujit775
+1

@GavinXZhang
+1

@JayKayNJIT
Following

fengmzhu commented Aug 5, 2023

+1

3dylson commented Aug 5, 2023

To make it work I had to do the following.

In the file .../site-packages/unstructured/partition/auto.py, add the line:

from unstructured.partition.pdf import partition_pdf

Then run:

pip3 install pdf2image pdfminer.six

Lastly, if you are on macOS, search for 'Install Certificates.command' in Finder and open it. Then run the following in the terminal:

python3
import nltk
nltk.download()
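
Before editing files under site-packages, it may help to see why the PDF partitioner is unavailable in the first place. The sketch below is only an illustration: `try_import` is a hypothetical helper, not part of unstructured or langchain, and it assumes the NameError hides an earlier import failure inside `unstructured.partition.pdf`. It imports that module directly and reports whatever exception comes back.

```python
import importlib


def try_import(module_name):
    """Return None if the module imports cleanly, else the exception raised.

    Hypothetical helper for diagnosis only; it does not change the install.
    """
    try:
        importlib.import_module(module_name)
        return None
    except Exception as exc:  # report any import-time failure, not just ImportError
        return exc


if __name__ == "__main__":
    # Module path taken from the import line quoted in this thread.
    error = try_import("unstructured.partition.pdf")
    if error is None:
        print("unstructured.partition.pdf imports cleanly")
    else:
        print(f"import failed: {type(error).__name__}: {error}")
```

If this prints a ModuleNotFoundError, the named module is probably the dependency to install, which is less invasive than patching auto.py.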

@bobbyfongprivate
Downgrading to version 0.7.12 resolved the problem for me. You can do this by running the following command in your virtual environment:

pip install unstructured==0.7.12
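
To confirm that the pin actually took effect in the environment the script runs in, a short check like the one below might help. It is a sketch using only the standard library; `installed_version` is a hypothetical helper name, and it only reports what is installed without changing anything.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("unstructured")
    if version is None:
        print("unstructured is not installed in this environment")
    else:
        # 0.7.12 is the version reported as working in this thread.
        print(f"unstructured {version} is installed")
```

Running it with the same interpreter that runs the failing script guards against the pin landing in a different virtual environment.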

fire115 commented Aug 15, 2023

pip install unstructured==0.7.12 works

Zhi0467 commented Jun 28, 2024

I tried the steps @3dylson posted above (adding the import line to auto.py, installing pdf2image and pdfminer.six, and running nltk.download()), but then I got this error:
  File "/Users/wangzhi/anaconda3/envs/chat/lib/python3.12/site-packages/langchain_community/document_loaders/unstructured.py", line 168, in _get_elements
    from unstructured.partition.auto import partition
  File "/Users/wangzhi/anaconda3/envs/chat/lib/python3.12/site-packages/unstructured/partition/auto.py", line 28, in <module>
    from unstructured.partition.pdf import partition_pdf
  File "/Users/wangzhi/anaconda3/envs/chat/lib/python3.12/site-packages/unstructured/partition/pdf.py", line 19, in <module>
    from pillow_heif import register_heif_opener
ModuleNotFoundError: No module named 'pillow_heif'

Any ideas, please? @3dylson
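
The traceback above points at a missing `pillow_heif` module, so more than one optional PDF dependency may be absent. The sketch below checks the modules named in this thread in one pass. It makes several assumptions: `missing_packages` is a hypothetical helper, the module list is only what this thread mentions rather than the complete requirement set of unstructured, and the pip name `pillow-heif` for the `pillow_heif` module should be verified on PyPI.

```python
import importlib.util


def missing_packages(required):
    """Return pip names for every top-level module that cannot be found.

    `required` maps an importable top-level module name to its pip name.
    """
    return [
        pip_name
        for module_name, pip_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    # Modules mentioned in this thread; pip names are assumptions to verify.
    pdf_modules = {
        "pdf2image": "pdf2image",
        "pdfminer": "pdfminer.six",
        "pillow_heif": "pillow-heif",
        "nltk": "nltk",
    }
    missing = missing_packages(pdf_modules)
    if missing:
        print("Try: pip install " + " ".join(missing))
    else:
        print("All listed modules were found")
```

The installation notes for unstructured may also describe an extra that pulls in the PDF dependencies together, which would be worth checking before installing packages one by one.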
