From 7a49bec8bf7f5a47b37d471f0c22b8e4ef643b21 Mon Sep 17 00:00:00 2001
From: Lohith <32676813+kslohith@users.noreply.github.com>
Date: Tue, 7 Nov 2023 10:25:21 -0500
Subject: [PATCH] Verified that issue #1067 is resolved and added documentation for load pdf functionality. (#1343)

Issue #1067, about not being able to load PDF files, was verified to be
resolved using the EvaDB documentation PDF, and a new page on loading
PDFs was added to the documentation.

Co-authored-by: Lohith K S
---
 docs/_toc.yml                            |  1 +
 docs/source/reference/evaql/load_pdf.rst | 16 ++++++++++++++++
 2 files changed, 17 insertions(+)
 create mode 100644 docs/source/reference/evaql/load_pdf.rst

diff --git a/docs/_toc.yml b/docs/_toc.yml
index a50c653579..ca191ce42d 100644
--- a/docs/_toc.yml
+++ b/docs/_toc.yml
@@ -45,6 +45,7 @@ parts:
           - file: source/reference/evaql/load_csv
           - file: source/reference/evaql/load_image
           - file: source/reference/evaql/load_video
+          - file: source/reference/evaql/load_pdf
           - file: source/reference/evaql/select
           - file: source/reference/evaql/explain
           - file: source/reference/evaql/show_functions
diff --git a/docs/source/reference/evaql/load_pdf.rst b/docs/source/reference/evaql/load_pdf.rst
new file mode 100644
index 0000000000..8ced9aeb3f
--- /dev/null
+++ b/docs/source/reference/evaql/load_pdf.rst
@@ -0,0 +1,16 @@
+LOAD PDF
+==========
+
+.. _load-pdf:
+
+.. code:: mysql
+
+   LOAD PDF 'test_pdf.pdf' INTO MyPDFs;
+
+PDFs can be directly imported into a table, where the PDF document is segmented into pages and paragraphs.
+Each row in the table corresponds to a paragraph extracted from the PDF, and the resulting table includes columns for ``name`` , ``page``, ``paragraph``, and ``data``.
+
+| ``name`` signifies the title of the uploaded PDF.
+| ``page`` signifies the specific page number from which the data is retrieved.
+| ``paragraph`` signifies the individual paragraph within a page from which the data is extracted.
+| ``data`` refers to the text extracted from the paragraph on the given page.
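
The following is a minimal Python sketch that makes the documented row layout concrete (one row per paragraph, carrying `name`, `page`, `paragraph` and `data`). It is not EvaDB's PDF reader. It assumes the text of each page has already been extracted, treats blank-line-separated chunks as paragraphs, and numbers pages and paragraphs from 1. `PdfRow` and `segment_pdf` are hypothetical names introduced only for illustration, and EvaDB's actual paragraph segmentation may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PdfRow:
    # Hypothetical row type mirroring the documented table columns.
    name: str       # value for the `name` column (whatever label the caller passes)
    page: int       # page number the paragraph came from (assumed 1-based)
    paragraph: int  # paragraph index within that page (assumed 1-based)
    data: str       # text of the paragraph


def segment_pdf(name: str, pages: List[str]) -> List[PdfRow]:
    """Turn already-extracted page texts into one row per paragraph.

    Assumption for this sketch: paragraphs are separated by blank lines.
    """
    rows: List[PdfRow] = []
    for page_no, text in enumerate(pages, start=1):
        # Split on blank lines and drop chunks that are only whitespace.
        chunks = [c.strip() for c in text.split("\n\n") if c.strip()]
        for para_no, chunk in enumerate(chunks, start=1):
            rows.append(PdfRow(name, page_no, para_no, chunk))
    return rows


if __name__ == "__main__":
    for row in segment_pdf("test_pdf.pdf", ["Intro text.\n\nSecond paragraph.", "Page two."]):
        print(row.name, row.page, row.paragraph, row.data)
```

With this input the sketch yields three rows: paragraphs 1 and 2 of page 1, then paragraph 1 of page 2, all sharing the same `name`, which is the shape of table the page above describes for `MyPDFs`.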