From a0ec7850768f7d40b4a0a6feb71b557d916b5ecc Mon Sep 17 00:00:00 2001 From: Andy Xu Date: Sat, 16 Sep 2023 16:25:09 -0700 Subject: [PATCH] Add model inference documentation (#1030) This documentation showcases the flexibility of EvaDB's declarative language and the variety of tasks we can accomplish. A brief introduction to the optimizations in EvaDB. Will populate more details over time when optimization features become stable. --- docs/_toc.yml | 5 ++ docs/source/overview/connect-to-database.rst | 71 ++++++--------- docs/source/overview/model-inference.rst | 94 ++++++++++++++++++++ docs/source/reference/ai/custom.rst | 1 - docs/source/reference/ai/index.rst | 4 +- docs/source/reference/optimizations.rst | 10 +++ 6 files changed, 141 insertions(+), 44 deletions(-) create mode 100644 docs/source/overview/model-inference.rst create mode 100644 docs/source/reference/optimizations.rst diff --git a/docs/_toc.yml b/docs/_toc.yml index b049af511c..2b173bfbe1 100644 --- a/docs/_toc.yml +++ b/docs/_toc.yml @@ -9,6 +9,8 @@ parts: title: Installation Options - file: source/overview/connect-to-database title: Connect to Database + - file: source/overview/model-inference + title: Model inference - file: source/overview/concepts title: Concepts sections: @@ -77,6 +79,9 @@ parts: title: YOLO - file: source/reference/ai/custom title: Custom Model + + - file: source/reference/optimizations + title: Optimizations # - file: source/reference/io # title: IO Descriptors diff --git a/docs/source/overview/connect-to-database.rst b/docs/source/overview/connect-to-database.rst index e74de58ca3..45a27d96c0 100644 --- a/docs/source/overview/connect-to-database.rst +++ b/docs/source/overview/connect-to-database.rst @@ -6,52 +6,45 @@ EvaDB supports an extensive range of data sources for structured and unstructure Connect to a SQL Database System -------------------------------- -1. Use the `CREATE DATABASE` statement to connect to an existing SQL database. +1. 
Use the ``CREATE DATABASE`` statement to connect to an existing SQL database. -.. code-block:: python +.. code-block:: - cursor.query(""" - CREATE DATABASE restaurant_reviews - WITH ENGINE = 'postgres', - PARAMETERS = { - "user": "eva", - "password": "password", - "host": "localhost", - "port": "5432", - "database": "restaurant_reviews" - };""").df() + CREATE DATABASE restaurant_reviews + WITH ENGINE = 'postgres', + PARAMETERS = { + "user": "eva", + "password": "password", + "host": "localhost", + "port": "5432", + "database": "restaurant_reviews" + }; .. note:: Go over the :ref:`CREATE DATABASE` statement for more details. The :ref:`Databases` page lists all the database systems that EvaDB currently supports. -2. Preview the Available Data Using `SELECT` +2. Preview the Available Data Using ``SELECT`` -You can now preview the available data in the `restaurant_reviews` database with a standard :ref:`SELECT` statement. +You can now preview the available data in the ``restaurant_reviews`` database with a standard :ref:`SELECT` statement. -.. code-block:: python +.. code-block:: sql - cursor.query(""" - SELECT * - FROM restaurant_reviews.food_review; - """).df() + SELECT * FROM restaurant_reviews.food_review; -3. Run Native Queries in the Connected Database With `USE` +3. Run Native Queries in the Connected Database With ``USE`` You can also run native queries directly in the connected database system by the :ref:`USE` statement. -.. code-block:: python +.. code-block:: - cursor.query( - """ - USE restaurant_reviews { - INSERT INTO food_review (name, review) - VALUES ( - 'Customer 1', - 'I ordered fried rice but it is too salty.' - ) - }; - """).df() + USE restaurant_reviews { + INSERT INTO food_review (name, review) + VALUES ( + 'Customer 1', + 'I ordered fried rice but it is too salty.' + ) + }; Load Unstructured Data @@ -63,23 +56,17 @@ EvaDB supports diverse types of unstructured data. 
Here are some examples: You can load a collection of images obtained from Reddit from the local filesystem into EvaDB using the :ref:`LOAD` statement. -.. code-block:: python - - cursor.query(""" - LOAD IMAGE 'reddit-images/*.jpg' - INTO reddit_dataset; - """).df() +.. code-block:: sql + + LOAD IMAGE 'reddit-images/*.jpg' INTO reddit_dataset; 2. Load Video from Cloud Bucket You can load a video from an S3 cloud bucket into EvaDB using the :ref:`LOAD` statement. -.. code-block:: python +.. code-block:: sql - cursor.query(""" - LOAD VIDEO 's3://bucket/eva_videos/mnist.mp4' - INTO MNISTVid; - """).df() + LOAD VIDEO 's3://bucket/eva_videos/mnist.mp4' INTO MNISTVid; .. note:: diff --git a/docs/source/overview/model-inference.rst b/docs/source/overview/model-inference.rst new file mode 100644 index 0000000000..79799c7f5b --- /dev/null +++ b/docs/source/overview/model-inference.rst @@ -0,0 +1,94 @@ +.. _model-inference: + +Model Inference +=============== + +In EvaDB, every model is a function. We can compose SQL queries using functions as building units similar to conventional SQL functions. EvaDB's `cascades optimizer ` will optimize the evaluation of user-defined functions for lower latency. Go over :ref:`optimizations` for more details. + +.. note:: + + EvaDB ships with a variety of built-in user-defined functions. Go over :ref:`models` to check them. Did not find the desired model? Go over :ref:`udf` to create your own user-defined functions and contribute to EvaDB. + +1. Projection + + The most common use case is model inference in projections. For example, we can use the `MnistImageClassifier `_ to identify numbers from the `MNIST `_ video. + +.. code-block:: sql + + SELECT MnistImageClassifier(data).label FROM minst_vid; + +2. Selection + + Another common use case is model inference in selections. In the example below, we use ``TextSummarizer`` and ``TextClassifier`` from :ref:`HuggingFace` to summarize the negative food reviews. + +.. 
code-block:: sql + + SELECT TextSummarizer(data) + FROM food_reviews + WHERE TextClassifier(data).label = 'NEGATIVE'; + +EvaDB also provides specialized array operators to construct queries. Go over built-in utility operators and functions for all of them. Below is an example of ``CONTAIN``: + +.. code-block:: sql + + SELECT id FROM camera_videos + WHERE ObjectDetector(data).labels @> ['person', 'car']; + +3. Lateral Join + + In EvaDB, we can also use models in joins. + The most powerful use case is a lateral join combined with ``UNNEST``, which helps flatten the output from `one-to-many` models. + The key idea is that a model can return multiple outputs (e.g., bounding boxes) stored in an array. This syntax is used to unroll elements from the array into multiple rows. + Typical examples are `face detectors `_ and `object detectors `_. + In the example below, we use the `emotion detector `_ to detect emotions from faces in the movie, where a single scene can contain multiple faces. + +.. code-block:: sql + + SELECT EmotionDetector(Crop(data, Face.bbox)) + FROM movie + LATERAL JOIN UNNEST(FaceDetector(data)) AS Face(bbox, conf); + +4. Aggregate Functions + + Models can also be executed on a sequence of frames, particularly for action detection. This can be accomplished by utilizing ``GROUP BY`` and ``SEGMENT`` to concatenate consecutive frames into a single segment. + +.. code-block:: sql + + SELECT ASLActionRecognition(SEGMENT(data)) + FROM ASL_ACTIONS + SAMPLE 5 + GROUP BY '16 frames'; + +Here is another example grouping paragraphs from PDFs: + +.. code-block:: sql + + SELECT SEGMENT(data) FROM MyPDFs GROUP BY '10 paragraphs'; + +5. Order By + + Models (typically feature extractors) can also be used in the ``ORDER BY`` clause for embedding-based similarity search. EvaDB also has index support to facilitate this type of query. 
+ In the example below, we use the `SentenceFeatureExtractor `_ to find context relevant to the question `When was the NATO created?` in a collection of PDFs that serves as the knowledge base. Go over the `PrivateGPT notebook `_ for more details. + +.. code-block:: sql + + SELECT data FROM MyPDFs + ORDER BY Similarity( + SentenceFeatureExtractor('When was the NATO created?'), + SentenceFeatureExtractor(data) + ); + +We can also use the `SiftFeatureExtractor `_ to find similar images from a collection of images as the gallery. Go over :ref:`image-search` for more details. + +.. code-block:: sql + + SELECT name FROM reddit_dataset + ORDER BY Similarity( + SiftFeatureExtractor(Open('reddit-images/cat.jpg')), + SiftFeatureExtractor(data) + ); + + +.. note:: + + Go over our :ref:`Usecases` for more ways of utilizing models in real-world use cases. diff --git a/docs/source/reference/ai/custom.rst b/docs/source/reference/ai/custom.rst index d57a0fe059..b528131b72 100644 --- a/docs/source/reference/ai/custom.rst +++ b/docs/source/reference/ai/custom.rst @@ -1,6 +1,5 @@ .. _udf: - Functions ====================== diff --git a/docs/source/reference/ai/index.rst b/docs/source/reference/ai/index.rst index a25fd58815..7cc31ab328 100644 --- a/docs/source/reference/ai/index.rst +++ b/docs/source/reference/ai/index.rst @@ -1,3 +1,5 @@ +.. _models: + Models ------------------------------------------ @@ -7,4 +9,4 @@ This section compiles a comprehensive catalog of the model integrations that Eva Please refer to the following table of contents for easy navigation: -.. tableofcontents:: \ No newline at end of file +.. tableofcontents:: diff --git a/docs/source/reference/optimizations.rst b/docs/source/reference/optimizations.rst new file mode 100644 index 0000000000..624d8856a2 --- /dev/null +++ b/docs/source/reference/optimizations.rst @@ -0,0 +1,10 @@ +.. _optimizations: + +Optimizations +============= + +EvaDB optimizes the evaluation of user-defined functions in three ways. + +1. 
Cache expensive function invocations and reuse their results in future invocations. +2. Cost-based predicate reordering to evaluate fast and selective predicates first. +3. Ray-based distributed inference. EvaDB not only parallelizes model inference to improve GPU utilization but also builds a pipeline to parallelize CPU processing (i.e., loading and decoding data).
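
The first two optimizations in the list above can be illustrated with a small Python sketch. This is not EvaDB's implementation: the ``Predicate`` class, the ``rank`` and ``reorder`` helpers, the ``expensive_model`` stand-in, and all cost and selectivity numbers are hypothetical. The sketch assumes independent predicates and orders them by the classic heuristic of ascending ``cost / (1 - selectivity)``, and it uses ``functools.lru_cache`` as a stand-in for a function-result cache.

```python
from dataclasses import dataclass
from functools import lru_cache
from typing import Callable, List


@dataclass(frozen=True)
class Predicate:
    """Hypothetical predicate with optimizer estimates (not an EvaDB class)."""
    name: str
    fn: Callable[[int], bool]
    cost: float         # estimated cost per row, arbitrary units
    selectivity: float  # estimated fraction of rows that pass, in [0, 1]


def rank(p: Predicate) -> float:
    # Lower rank runs first: cheap predicates that filter out many rows win.
    if p.selectivity >= 1.0:
        return float("inf")  # a predicate that filters nothing goes last
    return p.cost / (1.0 - p.selectivity)


def reorder(predicates: List[Predicate]) -> List[Predicate]:
    return sorted(predicates, key=rank)


CALLS = {"expensive": 0}  # counts real (uncached) model invocations


@lru_cache(maxsize=None)
def expensive_model(frame_id: int) -> str:
    # Stand-in for a costly model call; results are cached per input.
    CALLS["expensive"] += 1
    return "person" if frame_id % 2 == 0 else "car"


def evaluate(rows: List[int], predicates: List[Predicate]) -> List[int]:
    ordered = reorder(predicates)
    # all() short-circuits, so later predicates only see surviving rows.
    return [r for r in rows if all(p.fn(r) for p in ordered)]


cheap = Predicate("id < 4", lambda r: r < 4, cost=1.0, selectivity=0.4)
costly = Predicate(
    "label = person",
    lambda r: expensive_model(r) == "person",
    cost=100.0,
    selectivity=0.5,
)

rows = list(range(10))
first = evaluate(rows, [costly, cheap])   # model runs only on rows 0..3
second = evaluate(rows, [costly, cheap])  # model results come from the cache
print(first, second, CALLS["expensive"])
```

Because the cheap predicate is evaluated first, the stand-in model is invoked only for the rows that survive it, and the second query reuses the cached results instead of calling the model again.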