Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Is poppler installed and in PATH #38

Open
uteitler opened this issue Aug 26, 2023 · 1 comment
Open

Is poppler installed and in PATH #38

uteitler opened this issue Aug 26, 2023 · 1 comment

Comments

@uteitler
Copy link

i changed the code slightly to point it to a directory with pdf files ..
loader = DirectoryLoader("Q:/", recursive=True)

And keep getting the following errors ..

I have tried:
pip3 install pdf2image pdfminer.six
somewhere else advised
pip install unstructured==0.7.12

however then I got "pytesseract.pytesseract.TesseractNotFoundError: tesseract is not installed or it's not in your PATH. See README file for more information."
so I did a pip install tesseract

And now I end up back with the poppler error

File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\pdf2image\pdf2image.py", line 568, in pdfinfo_from_path
proc = Popen(command, env=env, stdout=PIPE, stderr=PIPE)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\subprocess.py", line 1026, in init
self._execute_child(args, executable, preexec_fn, close_fds,
File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\subprocess.py", line 1538, in _execute_child
hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [WinError 2] The system cannot find the file specified

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:\Users\User\Downloads\chatgpt-retrieval-main\chatgpt-retrieval-main\chatgpt.py", line 37, in
index = VectorstoreIndexCreator(vectorstore_kwargs={"persist_directory":"persist"}).from_loaders([loader])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\langchain\indexes\vectorstore.py", line 81, in from_loaders
docs.extend(loader.load())
^^^^^^^^^^^^^
File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\langchain\document_loaders\directory.py", line 156, in load
self.load_file(i, p, docs, pbar)
File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\langchain\document_loaders\directory.py", line 105, in load_file
raise e
File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\langchain\document_loaders\directory.py", line 99, in load_file
sub_docs = self.loader_cls(str(item), **self.loader_kwargs).load()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\langchain\document_loaders\unstructured.py", line 86, in load
elements = self._get_elements()
^^^^^^^^^^^^^^^^^^^^
File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\langchain\document_loaders\unstructured.py", line 172, in _get_elements
return partition(filename=self.file_path, **self.unstructured_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\unstructured\partition\auto.py", line 180, in partition
elements = partition_pdf(
^^^^^^^^^^^^^^
File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\unstructured\documents\elements.py", line 138, in wrapper
elements = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\unstructured\file_utils\filetype.py", line 519, in wrapper
elements = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\unstructured\partition\pdf.py", line 83, in partition_pdf
return partition_pdf_or_image(
^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\unstructured\partition\pdf.py", line 141, in partition_pdf_or_image
return _partition_pdf_or_image_with_ocr(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\unstructured\utils.py", line 43, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\unstructured\partition\pdf.py", line 353, in _partition_pdf_or_image_with_ocr
document = pdf2image.convert_from_path(filename)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\pdf2image\pdf2image.py", line 127, in convert_from_path
page_count = pdfinfo_from_path(
^^^^^^^^^^^^^^^^^^
File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\pdf2image\pdf2image.py", line 594, in pdfinfo_from_path
raise PDFInfoNotInstalledError(
pdf2image.exceptions.PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH?

@Junaid-Nazir-828
Copy link

try installing pypdf pip install pypdf

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants