Add model inference documentation (#1030)

This documentation showcases the flexibility of EvaDB's declarative language and the variety of tasks it can accomplish. It also gives a brief introduction to the optimizations in EvaDB; more details will be added once the optimization features become stable.
georgia-tech-db · Sep 16, 2023 · a0ec785
1 parent 04c539a

Showing 6 changed files with 141 additions and 44 deletions.
diff --git a/docs/_toc.yml b/docs/_toc.yml
@@ -9,6 +9,8 @@ parts:
             title: Installation Options
       - file: source/overview/connect-to-database
         title: Connect to Database
+      - file: source/overview/model-inference
+        title: Model Inference
       - file: source/overview/concepts
         title: Concepts
         sections:
@@ -77,6 +79,9 @@ parts:
             title: YOLO 
           - file: source/reference/ai/custom
             title: Custom Model
+
+      - file: source/reference/optimizations
+        title: Optimizations
 
       # - file: source/reference/io
       #   title: IO Descriptors

diff --git a/docs/source/overview/connect-to-database.rst b/docs/source/overview/connect-to-database.rst
@@ -6,52 +6,45 @@ EvaDB supports an extensive range of data sources for structured and unstructure
 Connect to a SQL Database System
 --------------------------------
 
-1. Use the `CREATE DATABASE` statement to connect to an existing SQL database.
+1. Use the ``CREATE DATABASE`` statement to connect to an existing SQL database.
 
-.. code-block:: python
+.. code-block::
 
-   cursor.query("""
-        CREATE DATABASE restaurant_reviews 
-        WITH ENGINE = 'postgres', 
-        PARAMETERS = {
-            "user": "eva",
-            "password": "password",
-            "host": "localhost",
-            "port": "5432",
-            "database": "restaurant_reviews"
-     	   };""").df()
+   CREATE DATABASE restaurant_reviews 
+   WITH ENGINE = 'postgres', 
+   PARAMETERS = {
+       "user": "eva",
+       "password": "password",
+       "host": "localhost",
+       "port": "5432",
+       "database": "restaurant_reviews"
+   };
 
 .. note::
 
    Go over the :ref:`CREATE DATABASE<sql-create-database>` statement for more details. The :ref:`Databases<databases>` page lists all the database systems that EvaDB currently supports.
 
-2. Preview the Available Data Using `SELECT`
+2. Preview the Available Data Using ``SELECT``
 
-You can now preview the available data in the `restaurant_reviews` database with a standard :ref:`SELECT<sql-select>` statement.
+You can now preview the available data in the ``restaurant_reviews`` database with a standard :ref:`SELECT<sql-select>` statement.
 
-.. code-block:: python
+.. code-block:: sql
 
-   cursor.query("""
-      SELECT * 
-      FROM restaurant_reviews.food_review;
-      """).df()
+   SELECT * FROM restaurant_reviews.food_review;
 
-3. Run Native Queries in the Connected Database With `USE`
+3. Run Native Queries in the Connected Database With ``USE``
 
 You can also run native queries directly in the connected database system using the :ref:`USE<sql-use>` statement.
 
-.. code-block:: python
+.. code-block::
 
-   cursor.query(
-      """
-        USE restaurant_reviews {
-                INSERT INTO food_review (name, review) 
-                VALUES (
-                  'Customer 1', 
-                  'I ordered fried rice but it is too salty.'
-                )
-        };
-      """).df()
+   USE restaurant_reviews {
+       INSERT INTO food_review (name, review) 
+       VALUES (
+           'Customer 1', 
+           'I ordered fried rice but it is too salty.'
+       )
+   };
 
 
 Load Unstructured Data
@@ -63,23 +56,17 @@ EvaDB supports diverse types of unstructured data. Here are some examples:
 
 You can load a collection of images obtained from Reddit from the local filesystem into EvaDB using the :ref:`LOAD<sql-load>` statement.
 
-.. code-block:: python
-   
-   cursor.query("""
-      LOAD IMAGE 'reddit-images/*.jpg' 
-      INTO reddit_dataset;
-   """).df()
+.. code-block:: sql
+
+   LOAD IMAGE 'reddit-images/*.jpg' INTO reddit_dataset;
 
 2. Load Video from Cloud Bucket
 
 You can load a video from an S3 cloud bucket into EvaDB using the :ref:`LOAD<sql-load>` statement.
 
-.. code-block:: python
+.. code-block:: sql
 
-   cursor.query("""
-      LOAD VIDEO 's3://bucket/eva_videos/mnist.mp4' 
-      INTO MNISTVid;
-   """).df()
+   LOAD VIDEO 's3://bucket/eva_videos/mnist.mp4' INTO MNISTVid;
 
 .. note::
 

diff --git a/docs/source/overview/model-inference.rst b/docs/source/overview/model-inference.rst
@@ -0,0 +1,94 @@
+.. _model-inference:
+
+Model Inference
+===============
+
+In EvaDB, every model is a function. We can compose SQL queries using these functions as building blocks, just as with conventional SQL functions. EvaDB's `Cascades optimizer <https://faculty.cc.gatech.edu/~jarulraj/courses/8803-s21/slides/22-cascades.pdf>`_ optimizes the evaluation of user-defined functions to lower query latency. Go over :ref:`optimizations` for more details.
+
+.. note::
+
+   EvaDB ships with a variety of built-in user-defined functions. Go over :ref:`models` to see them. Can't find the model you need? Go over :ref:`udf` to create your own user-defined function and contribute it to EvaDB.
+
+1. Projection
+
+   The most common use case is model inference in projections. For example, we can use the `MnistImageClassifier <https://github.com/georgia-tech-db/evadb/blob/staging/evadb/functions/mnist_image_classifier.py>`_ to identify digits in the `MNIST <https://www.dropbox.com/s/yxljxz6zxoqu54v/mnist.mp4>`_ video.
+
+.. code-block:: sql
+
+   SELECT MnistImageClassifier(data).label FROM mnist_vid;
+
+2. Selection
+
+   Another common use case is model inference in selections. In the example below, we use ``TextSummarizer`` and ``TextClassifier`` from :ref:`HuggingFace<hf>` to summarize the negative food reviews.
+
+.. code-block:: sql
+
+   SELECT TextSummarizer(data)
+   FROM food_reviews
+   WHERE TextClassifier(data).label = 'NEGATIVE';
+
+EvaDB also provides specialized array operators for constructing queries. Go over the built-in utility operators and functions for the full list. Below is an example of ``CONTAIN`` (``@>``):
+
+.. code-block:: sql
+
+   SELECT id FROM camera_videos 
+   WHERE ObjectDetector(data).labels @> ['person', 'car'];
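+
+The ``@>`` operator keeps a row only when the array on its left contains every element of the array on its right. The snippet below is a minimal pure-Python sketch of that filter, assuming set-style containment (duplicates and order ignored) and a hypothetical ``detections`` mapping in place of real ``ObjectDetector`` output; it is not EvaDB's implementation.
+
+.. code-block:: python
+
+   from typing import Dict, List, Sequence
+
+
+   def contains(left: Sequence[str], right: Sequence[str]) -> bool:
+       """Return True if every element of `right` appears in `left` (left @> right)."""
+       return set(right).issubset(left)
+
+
+   # Hypothetical detector labels, keyed by frame id.
+   detections: Dict[int, List[str]] = {
+       1: ["person", "car", "dog"],
+       2: ["car"],
+       3: ["person", "bicycle"],
+       4: ["car", "person"],
+   }
+
+   # Mirrors: SELECT id FROM camera_videos
+   #          WHERE ObjectDetector(data).labels @> ['person', 'car'];
+   matching_ids = [
+       frame_id
+       for frame_id, labels in detections.items()
+       if contains(labels, ["person", "car"])
+   ]
+   print(matching_ids)  # [1, 4]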
+
+3. Lateral Join
+
+   In EvaDB, we can also use models in joins.
+   The most powerful use case is a lateral join combined with ``UNNEST``, which flattens the output of *one-to-many* models.
+   The key idea is that such a model can return multiple outputs (e.g., bounding boxes) stored in an array, and ``UNNEST`` unrolls the elements of that array into multiple rows.
+   Typical examples are `face detectors <https://github.com/georgia-tech-db/evadb/blob/staging/evadb/functions/face_detector.py>`_ and `object detectors <https://github.com/georgia-tech-db/evadb/blob/staging/evadb/functions/fastrcnn_object_detector.py>`_.
+   In the example below, we use the `emotion detector <https://github.com/georgia-tech-db/evadb/blob/staging/evadb/functions/emotion_detector.py>`_ to detect emotions from faces in the movie, where a single scene can contain multiple faces.
+
+.. code-block:: sql
+   
+   SELECT EmotionDetector(Crop(data, Face.bbox))
+   FROM movie
+   LATERAL JOIN UNNEST(FaceDetector(data)) AS Face(bbox, conf);
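+
+Conceptually, the lateral join pairs each input row with every element of the array that the model returned for that row. The sketch below models this in plain Python, assuming hypothetical ``face_detector`` and ``emotion_detector`` stubs with canned outputs (the ``Crop`` step is folded into the emotion stub). As in an inner lateral join, a frame with no detected faces contributes no rows in this sketch; EvaDB's exact handling may differ.
+
+.. code-block:: python
+
+   from typing import Dict, Iterator, List, Tuple
+
+   BBox = Tuple[int, int, int, int]
+
+   # Hypothetical FaceDetector output per frame: (bbox, confidence) for each face.
+   FAKE_FACES: Dict[str, List[Tuple[BBox, float]]] = {
+       "frame-1": [((0, 0, 10, 10), 0.98), ((20, 20, 30, 30), 0.91)],
+       "frame-2": [],
+       "frame-3": [((5, 5, 15, 15), 0.87)],
+   }
+
+
+   def face_detector(frame: str) -> List[Tuple[BBox, float]]:
+       """Hypothetical one-to-many model: returns an array of faces per frame."""
+       return FAKE_FACES[frame]
+
+
+   def emotion_detector(frame: str, bbox: BBox) -> str:
+       """Hypothetical stand-in for EmotionDetector(Crop(data, bbox))."""
+       return "happy" if bbox[0] < 10 else "neutral"
+
+
+   def lateral_join_unnest(frames: List[str]) -> Iterator[Tuple[str, BBox, float]]:
+       """Emit one row per (frame, face), i.e. UNNEST the per-frame array."""
+       for frame in frames:
+           for bbox, conf in face_detector(frame):
+               yield frame, bbox, conf
+
+
+   # Mirrors: SELECT EmotionDetector(Crop(data, Face.bbox))
+   #          FROM movie LATERAL JOIN UNNEST(FaceDetector(data)) AS Face(bbox, conf);
+   rows = [
+       (frame, emotion_detector(frame, bbox))
+       for frame, bbox, _conf in lateral_join_unnest(["frame-1", "frame-2", "frame-3"])
+   ]
+   print(rows)  # [('frame-1', 'happy'), ('frame-1', 'neutral'), ('frame-3', 'happy')]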
+
+4. Aggregate Functions
+
+   Models can also be executed on a sequence of frames, which is useful for action recognition. To do this, use ``GROUP BY`` together with ``SEGMENT`` to concatenate consecutive frames into a single segment.
+
+.. code-block:: sql
+
+   SELECT ASLActionRecognition(SEGMENT(data)) 
+   FROM ASL_ACTIONS 
+   SAMPLE 5 
+   GROUP BY '16 frames';
+
+Here is another example grouping paragraphs from PDFs:
+
+.. code-block:: sql
+
+   SELECT SEGMENT(data) FROM MyPDFs GROUP BY '10 paragraphs';
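+
+The ``GROUP BY '16 frames'`` clause turns a stream of frames into consecutive fixed-size segments, and ``SEGMENT`` hands each segment to the model as a single input. The sketch below reproduces that chunking in plain Python over frame ids; it assumes that ``SAMPLE 5`` keeps every fifth frame and that a trailing partial segment is kept, both of which may differ from EvaDB's actual behavior.
+
+.. code-block:: python
+
+   from typing import List, Sequence, TypeVar
+
+   T = TypeVar("T")
+
+
+   def sample(frames: Sequence[T], every: int) -> List[T]:
+       """Assumed SAMPLE semantics: keep every `every`-th frame, starting at the first."""
+       return list(frames[::every])
+
+
+   def segment(frames: Sequence[T], size: int) -> List[List[T]]:
+       """Group consecutive frames into segments of `size` (like GROUP BY '16 frames').
+
+       The trailing partial segment is kept here; EvaDB may treat it differently.
+       """
+       return [list(frames[i:i + size]) for i in range(0, len(frames), size)]
+
+
+   frames = list(range(200))           # stand-in for decoded frame ids 0..199
+   sampled = sample(frames, every=5)   # 40 frames: 0, 5, 10, ..., 195
+   segments = segment(sampled, size=16)
+
+   print(len(sampled), [len(s) for s in segments])  # 40 [16, 16, 8]
+   print(segments[1][:3])                           # [80, 85, 90]
+
+Each inner list would then be passed to a model such as ``ASLActionRecognition`` as one segment.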
+
+5. Order By
+
+   Models (typically feature extractors) can also be used in the ``ORDER BY`` clause for embedding-based similarity search. EvaDB also supports indexes to speed up this type of query.
+   In the example below, we use the `SentenceFeatureExtractor <https://github.com/georgia-tech-db/evadb/blob/staging/evadb/functions/sentence_feature_extractor.py>`_ to find context relevant to the question "When was the NATO created?" from a collection of PDFs that serves as the knowledge base. Go over the `PrivateGPT notebook <https://github.com/georgia-tech-db/evadb/blob/staging/tutorials/13-privategpt.ipynb>`_ for more details.
+
+.. code-block:: sql
+
+   SELECT data FROM MyPDFs
+   ORDER BY Similarity(
+       SentenceFeatureExtractor('When was the NATO created?'),
+       SentenceFeatureExtractor(data)
+   );
+
+We can also use the `SiftFeatureExtractor <https://github.com/georgia-tech-db/evadb/blob/staging/evadb/functions/sift_feature_extractor.py>`_ to find similar images in a collection of images that serves as the gallery. Go over :ref:`image-search` for more details.
+
+.. code-block:: sql
+
+   SELECT name FROM reddit_dataset
+   ORDER BY Similarity(
+       SiftFeatureExtractor(Open('reddit-images/cat.jpg')),
+       SiftFeatureExtractor(data)
+   );
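+
+Both queries rank rows by how close each row's embedding is to the embedding of the query. The sketch below shows that ranking in plain Python, assuming a hypothetical bag-of-words ``embed`` function in place of ``SentenceFeatureExtractor`` and assuming that ``Similarity`` returns a distance, so ascending order puts the closest rows first.
+
+.. code-block:: python
+
+   import math
+   from collections import Counter
+   from typing import Dict, List
+
+
+   def embed(text: str) -> Dict[str, float]:
+       """Hypothetical stand-in for SentenceFeatureExtractor: word counts as a sparse vector."""
+       words = text.lower().replace("?", "").split()
+       return {word: float(count) for word, count in Counter(words).items()}
+
+
+   def distance(a: Dict[str, float], b: Dict[str, float]) -> float:
+       """Euclidean distance between sparse vectors; smaller means more similar."""
+       keys = set(a) | set(b)
+       return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))
+
+
+   paragraphs: List[str] = [
+       "The NATO alliance was created in 1949",
+       "Pasta is best cooked al dente",
+       "When was the Eiffel Tower built",
+   ]
+
+   # Mirrors: SELECT data FROM MyPDFs ORDER BY Similarity(query_embedding, data_embedding);
+   query = embed("When was the NATO created?")
+   ranked = sorted(paragraphs, key=lambda p: distance(query, embed(p)))
+   print(ranked[0])  # The NATO alliance was created in 1949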
+
+
+.. note::
+
+   Go over our :ref:`use cases<sentiment-analysis>` to see more ways of utilizing models in real-world applications.
diff --git a/docs/source/reference/ai/custom.rst b/docs/source/reference/ai/custom.rst
@@ -1,6 +1,5 @@
 .. _udf:
 
-
 Functions
 ======================
 

diff --git a/docs/source/reference/ai/index.rst b/docs/source/reference/ai/index.rst
@@ -1,3 +1,5 @@
+.. _models:
+
 Models
 ------------------------------------------
 
@@ -7,4 +9,4 @@ This section compiles a comprehensive catalog of the model integrations that Eva
 
 Please refer to the following table of contents for easy navigation:
 
-.. tableofcontents::
+.. tableofcontents::
diff --git a/docs/source/reference/optimizations.rst b/docs/source/reference/optimizations.rst
@@ -0,0 +1,10 @@
+.. _optimizations:
+
+Optimizations
+=============
+
+EvaDB optimizes the evaluation of user-defined functions in three ways:
+
+1. Cache expensive function invocations and reuse their results in future invocations.
+2. Reorder predicates based on cost, evaluating fast and selective predicates first.
+3. Distribute inference with Ray. EvaDB not only parallelizes model inference to improve GPU utilization but also builds a pipeline to parallelize CPU processing (i.e., loading and decoding data).
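+
+The first two techniques can be illustrated with the minimal sketch below, which is not EvaDB's implementation. It memoizes a hypothetical ``classify`` stand-in for an expensive model, and it orders independent filters by the classic rank metric ``(selectivity - 1) / cost``, evaluating the lowest rank first. The costs, selectivities, and the ``id > 100`` predicate are made-up values, and EvaDB's own cost model may differ.
+
+.. code-block:: python
+
+   from dataclasses import dataclass
+   from typing import Callable, Dict, List
+
+
+   # 1. Cache expensive function invocations.
+   class CachedFunction:
+       """Memoize a function by its input and count the real (uncached) calls."""
+
+       def __init__(self, fn: Callable[[str], str]) -> None:
+           self.fn = fn
+           self.cache: Dict[str, str] = {}
+           self.calls = 0
+
+       def __call__(self, arg: str) -> str:
+           if arg not in self.cache:
+               self.calls += 1
+               self.cache[arg] = self.fn(arg)
+           return self.cache[arg]
+
+
+   def classify(text: str) -> str:
+       """Hypothetical stand-in for an expensive model such as TextClassifier."""
+       return "NEGATIVE" if "salty" in text else "POSITIVE"
+
+
+   cached_classify = CachedFunction(classify)
+   reviews = ["too salty", "great", "too salty"]
+   labels = [cached_classify(r) for r in reviews]        # first query
+   labels_again = [cached_classify(r) for r in reviews]  # repeated query hits the cache
+   print(labels, cached_classify.calls)  # ['NEGATIVE', 'POSITIVE', 'NEGATIVE'] 2
+
+
+   # 2. Cost-based predicate reordering.
+   @dataclass
+   class Predicate:
+       name: str
+       cost: float         # assumed average cost per row
+       selectivity: float  # assumed fraction of rows that pass, in [0, 1]
+
+
+   def rank(p: Predicate) -> float:
+       """Lower rank is evaluated first (classic metric for independent filters)."""
+       return (p.selectivity - 1.0) / p.cost
+
+
+   def expected_cost(order: List[Predicate]) -> float:
+       """Expected per-row cost when later predicates only see surviving rows."""
+       total, surviving = 0.0, 1.0
+       for p in order:
+           total += surviving * p.cost
+           surviving *= p.selectivity
+       return total
+
+
+   predicates = [
+       Predicate("TextClassifier(data).label = 'NEGATIVE'", cost=50.0, selectivity=0.3),
+       Predicate("id > 100", cost=0.01, selectivity=0.9),
+   ]
+   best = sorted(predicates, key=rank)
+   print([p.name for p in best])  # ['id > 100', "TextClassifier(data).label = 'NEGATIVE'"]
+   print(expected_cost(predicates), expected_cost(best))
+
+With these made-up numbers, evaluating the cheap filter first lowers the expected per-row cost from about 50.003 to about 45.01, because the expensive model only runs on the rows that survive it.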