From 7a49bec8bf7f5a47b37d471f0c22b8e4ef643b21 Mon Sep 17 00:00:00 2001
From: Lohith <32676813+kslohith@users.noreply.github.com>
Date: Tue, 7 Nov 2023 10:25:21 -0500
Subject: [PATCH] Verified that issue #1067 is resolved and added documentation
for load pdf functionality. (#1343)
Issue #1067, about not being able to load PDF files, was verified to be
resolved by loading the EvaDB documentation PDF. A new page on loading
PDFs is added to the documentation.
Co-authored-by: Lohith K S
---
docs/_toc.yml | 1 +
docs/source/reference/evaql/load_pdf.rst | 16 ++++++++++++++++
2 files changed, 17 insertions(+)
create mode 100644 docs/source/reference/evaql/load_pdf.rst
diff --git a/docs/_toc.yml b/docs/_toc.yml
index a50c653579..ca191ce42d 100644
--- a/docs/_toc.yml
+++ b/docs/_toc.yml
@@ -45,6 +45,7 @@ parts:
- file: source/reference/evaql/load_csv
- file: source/reference/evaql/load_image
- file: source/reference/evaql/load_video
+ - file: source/reference/evaql/load_pdf
- file: source/reference/evaql/select
- file: source/reference/evaql/explain
- file: source/reference/evaql/show_functions
diff --git a/docs/source/reference/evaql/load_pdf.rst b/docs/source/reference/evaql/load_pdf.rst
new file mode 100644
index 0000000000..8ced9aeb3f
--- /dev/null
+++ b/docs/source/reference/evaql/load_pdf.rst
@@ -0,0 +1,16 @@
+LOAD PDF
+==========
+
+.. _load-pdf:
+
+.. code:: mysql
+
+ LOAD PDF 'test_pdf.pdf' INTO MyPDFs;
+
+PDFs can be directly imported into a table, where the PDF document is segmented into pages and paragraphs.
+Each row in the table corresponds to a paragraph extracted from the PDF, and the resulting table includes columns for ``name``, ``page``, ``paragraph``, and ``data``.
+
+| ``name`` signifies the title of the uploaded PDF.
+| ``page`` signifies the specific page number from which the data is retrieved.
+| ``paragraph`` signifies the individual paragraph within a page from which the data is extracted.
+| ``data`` refers to the text extracted from the paragraph on the given page.