Add model inference documentation (#1030)
This documentation showcases the flexibility of EvaDB's declarative
language and the variety of tasks we can accomplish with it.

A brief introduction to the optimizations in EvaDB. More details will be
added over time as the optimization features become stable.
xzdandy authored Sep 16, 2023
1 parent 04c539a commit a0ec785
Showing 6 changed files with 141 additions and 44 deletions.
5 changes: 5 additions & 0 deletions docs/_toc.yml
@@ -9,6 +9,8 @@ parts:
title: Installation Options
- file: source/overview/connect-to-database
title: Connect to Database
- file: source/overview/model-inference
title: Model inference
- file: source/overview/concepts
title: Concepts
sections:
@@ -77,6 +79,9 @@ parts:
title: YOLO
- file: source/reference/ai/custom
title: Custom Model

- file: source/reference/optimizations
title: Optimizations

# - file: source/reference/io
# title: IO Descriptors
71 changes: 29 additions & 42 deletions docs/source/overview/connect-to-database.rst
@@ -6,52 +6,45 @@ EvaDB supports an extensive range of data sources for structured and unstructure
Connect to a SQL Database System
--------------------------------

1. Use the `CREATE DATABASE` statement to connect to an existing SQL database.
1. Use the ``CREATE DATABASE`` statement to connect to an existing SQL database.

.. code-block:: python
.. code-block::
cursor.query("""
CREATE DATABASE restaurant_reviews
WITH ENGINE = 'postgres',
PARAMETERS = {
"user": "eva",
"password": "password",
"host": "localhost",
"port": "5432",
"database": "restaurant_reviews"
};""").df()
CREATE DATABASE restaurant_reviews
WITH ENGINE = 'postgres',
PARAMETERS = {
"user": "eva",
"password": "password",
"host": "localhost",
"port": "5432",
"database": "restaurant_reviews"
};
.. note::

Go over the :ref:`CREATE DATABASE<sql-create-database>` statement for more details. The :ref:`Databases<databases>` page lists all the database systems that EvaDB currently supports.

2. Preview the Available Data Using `SELECT`
2. Preview the Available Data Using ``SELECT``

You can now preview the available data in the `restaurant_reviews` database with a standard :ref:`SELECT<sql-select>` statement.
You can now preview the available data in the ``restaurant_reviews`` database with a standard :ref:`SELECT<sql-select>` statement.

.. code-block:: python
.. code-block:: sql
cursor.query("""
SELECT *
FROM restaurant_reviews.food_review;
""").df()
SELECT * FROM restaurant_reviews.food_review;
3. Run Native Queries in the Connected Database With `USE`
3. Run Native Queries in the Connected Database With ``USE``

You can also run native queries directly in the connected database system by the :ref:`USE<sql-use>` statement.

.. code-block:: python
.. code-block::
cursor.query(
"""
USE restaurant_reviews {
INSERT INTO food_review (name, review)
VALUES (
'Customer 1',
'I ordered fried rice but it is too salty.'
)
};
""").df()
USE restaurant_reviews {
INSERT INTO food_review (name, review)
VALUES (
'Customer 1',
'I ordered fried rice but it is too salty.'
)
};
Load Unstructured Data
@@ -63,23 +56,17 @@ EvaDB supports diverse types of unstructured data. Here are some examples:

You can load a collection of images obtained from Reddit from the local filesystem into EvaDB using the :ref:`LOAD<sql-load>` statement.

.. code-block:: python
cursor.query("""
LOAD IMAGE 'reddit-images/*.jpg'
INTO reddit_dataset;
""").df()
.. code-block:: sql
LOAD IMAGE 'reddit-images/*.jpg' INTO reddit_dataset;
2. Load Video from Cloud Bucket

You can load a video from an S3 cloud bucket into EvaDB using the :ref:`LOAD<sql-load>` statement.

.. code-block:: python
.. code-block:: sql
cursor.query("""
LOAD VIDEO 's3://bucket/eva_videos/mnist.mp4'
INTO MNISTVid;
""").df()
LOAD VIDEO 's3://bucket/eva_videos/mnist.mp4' INTO MNISTVid;
.. note::

94 changes: 94 additions & 0 deletions docs/source/overview/model-inference.rst
@@ -0,0 +1,94 @@
.. _model-inference:

Model Inference
===============

In EvaDB, every model is a function. We can compose SQL queries that use these functions as building blocks, just like conventional SQL functions. EvaDB's `Cascades optimizer <https://faculty.cc.gatech.edu/~jarulraj/courses/8803-s21/slides/22-cascades.pdf>`_ optimizes the evaluation of user-defined functions for lower latency. Go over :ref:`optimizations` for more details.

.. note::

   EvaDB ships with a variety of built-in user-defined functions. Go over :ref:`models` to check them. Did not find the desired model? Go over :ref:`udf` to create your own user-defined functions and contribute to EvaDB.

1. Projection

The most common use case is model inference in projections. For example, we can use the `MnistImageClassifier <https://github.com/georgia-tech-db/evadb/blob/staging/evadb/functions/mnist_image_classifier.py>`_ to identify digits in the `MNIST <https://www.dropbox.com/s/yxljxz6zxoqu54v/mnist.mp4>`_ video.

.. code-block:: sql

   SELECT MnistImageClassifier(data).label FROM minst_vid;

2. Selection

Another common use case is model inference in selections. In the example below, we use ``TextSummarizer`` and ``TextClassifier`` from :ref:`HuggingFace<hf>` to summarize the negative food reviews.

.. code-block:: sql

   SELECT TextSummarizer(data)
   FROM food_reviews
   WHERE TextClassifier(data).label = 'NEGATIVE';

EvaDB also provides specialized array operators to construct queries. Go over built-in utility operators and functions for all of them. Below is an example of ``CONTAIN``:

.. code-block:: sql

   SELECT id FROM camera_videos
   WHERE ObjectDetector(data).labels @> ['person', 'car'];

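
To make the filter above concrete, here is a minimal Python sketch of the containment check. It assumes ``@>`` means "the left array contains every element of the right array", as in PostgreSQL; the ``contains`` helper and the ``detections`` data are hypothetical and are not part of EvaDB.

```python
def contains(labels, required):
    """Return True when every item in `required` appears in `labels`."""
    return set(required).issubset(labels)


# Hypothetical detector output: frame id -> detected labels.
detections = {
    1: ["person", "car", "dog"],
    2: ["person"],
    3: ["car", "person"],
}

# Keep only the frames whose labels include both 'person' and 'car'.
matching_ids = [
    frame_id
    for frame_id, labels in detections.items()
    if contains(labels, ["person", "car"])
]
print(matching_ids)
```

Only frames whose label array includes both values survive the filter; the order of labels inside the array does not matter in this sketch.
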
3. Lateral Join

In EvaDB, we can also use models in joins.
The most powerful use case is a lateral join combined with ``UNNEST``, which is very helpful for flattening the output of `one-to-many` models.
The key idea is that a model can return multiple outputs (e.g., bounding boxes) stored in an array. This syntax unrolls the elements of the array into multiple rows.
Typical examples are `face detectors <https://github.com/georgia-tech-db/evadb/blob/staging/evadb/functions/face_detector.py>`_ and `object detectors <https://github.com/georgia-tech-db/evadb/blob/staging/evadb/functions/fastrcnn_object_detector.py>`_.
In the example below, we use the `emotion detector <https://github.com/georgia-tech-db/evadb/blob/staging/evadb/functions/emotion_detector.py>`_ to detect emotions from faces in a movie, where a single scene can contain multiple faces.

.. code-block:: sql

   SELECT EmotionDetector(Crop(data, Face.bbox))
   FROM movie
   LATERAL JOIN UNNEST(FaceDetector(data)) AS Face(bbox, conf);

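
The flattening performed by the lateral join with ``UNNEST`` can be sketched in plain Python. This is only an illustration of the row semantics, not EvaDB's implementation; ``lateral_unnest``, ``fake_face_detector``, and the sample frames are hypothetical names introduced for this example.

```python
def lateral_unnest(rows, model):
    """Pair each input row with every element the model returns for it."""
    for row in rows:
        for output in model(row):
            yield row, output


def fake_face_detector(frame):
    # Hypothetical stand-in for a one-to-many model:
    # returns a list of (bbox, conf) tuples for the frame.
    return frame["faces"]


frames = [
    {"id": 0, "faces": [((0, 0, 10, 10), 0.9), ((20, 20, 30, 30), 0.8)]},
    {"id": 1, "faces": []},  # no faces: contributes no output rows
    {"id": 2, "faces": [((5, 5, 15, 15), 0.7)]},
]

# One output row per detected face, each still tied to its source frame.
flattened = [
    (frame["id"], bbox, conf)
    for frame, (bbox, conf) in lateral_unnest(frames, fake_face_detector)
]
for row in flattened:
    print(row)
```

A frame with two faces yields two rows and a frame with no faces yields none, which is why a downstream function such as ``EmotionDetector`` runs once per face rather than once per frame.
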
4. Aggregate Functions

Models can also be executed on a sequence of frames, particularly for action detection. This can be accomplished by utilizing ``GROUP BY`` and ``SEGMENT`` to concatenate consecutive frames into a single segment.

.. code-block:: sql

   SELECT ASLActionRecognition(SEGMENT(data))
   FROM ASL_ACTIONS
   SAMPLE 5
   GROUP BY '16 frames';

Here is another example grouping paragraphs from PDFs:

.. code-block:: sql

   SELECT SEGMENT(data) FROM MyPDFs GROUP BY '10 paragraphs';

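
The grouping in the two queries above can be pictured with a small Python sketch. It assumes that ``SAMPLE 5`` keeps every fifth frame and that ``GROUP BY '16 frames'`` packs consecutive sampled frames into fixed-size segments; how EvaDB treats a trailing partial segment is not stated here, so this sketch simply keeps it. ``sample_and_segment`` is a hypothetical helper.

```python
def sample_and_segment(frames, sample_rate, segment_size):
    """Keep every `sample_rate`-th frame, then chunk into fixed-size segments."""
    sampled = frames[::sample_rate]
    return [
        sampled[start:start + segment_size]
        for start in range(0, len(sampled), segment_size)
    ]


# 200 frame ids stand in for decoded video frames.
frames = list(range(200))
segments = sample_and_segment(frames, sample_rate=5, segment_size=16)

print(len(segments))     # number of segments a model would be called on
print(segments[0][:3])   # first few frame ids of the first segment
```

Each inner list plays the role of one ``SEGMENT(data)`` value, so an action recognition model is invoked once per segment instead of once per frame.
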
5. Order By

Models (typically feature extractors) can also be used in the ``ORDER BY`` clause for embedding-based similarity search. EvaDB also has index support to speed up this type of query.
In the example below, we use the `SentenceFeatureExtractor <https://github.com/georgia-tech-db/evadb/blob/staging/evadb/functions/sentence_feature_extractor.py>`_ to find context relevant to the question ``When was the NATO created?`` from a collection of PDFs that serves as the knowledge base. Go over the `PrivateGPT notebook <https://github.com/georgia-tech-db/evadb/blob/staging/tutorials/13-privategpt.ipynb>`_ for more details.

.. code-block:: sql

   SELECT data FROM MyPDFs
   ORDER BY Similarity(
       SentenceFeatureExtractor('When was the NATO created?'),
       SentenceFeatureExtractor(data)
   );

We can also use the `SiftFeatureExtractor <https://github.com/georgia-tech-db/evadb/blob/staging/evadb/functions/sift_feature_extractor.py>`_ to find similar images from a collection of images as the gallery. Go over :ref:`image-search` for more details.

.. code-block:: sql

   SELECT name FROM reddit_dataset
   ORDER BY Similarity(
       SiftFeatureExtractor(Open('reddit-images/cat.jpg')),
       SiftFeatureExtractor(data)
   );

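
Conceptually, both similarity queries embed the query item and every row with the same feature extractor and then sort the rows by how close their vectors are. The sketch below assumes that ``Similarity`` behaves like a distance, so smaller values rank first; the Euclidean metric, the toy vectors, and the helper names are assumptions made for illustration only.

```python
import math


def euclidean_distance(a, b):
    """Distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def order_by_similarity(query_vec, rows, extract):
    """Sort rows so that the closest feature vectors come first."""
    return sorted(rows, key=lambda row: euclidean_distance(query_vec, extract(row)))


# Hypothetical gallery with precomputed two-dimensional features.
gallery = [
    {"name": "dog.jpg", "vec": [0.9, 0.1]},
    {"name": "cat2.jpg", "vec": [0.1, 0.9]},
    {"name": "car.jpg", "vec": [1.0, 1.0]},
]
query_vec = [0.0, 1.0]  # stands in for the features of the query image

ranked = order_by_similarity(query_vec, gallery, extract=lambda row: row["vec"])
print([row["name"] for row in ranked])
```

This brute-force version compares the query against every row. An index, as mentioned above, avoids that full scan.
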
.. note::

   Go over our :ref:`use cases<sentiment-analysis>` to check more ways of utilizing models in real-world scenarios.
1 change: 0 additions & 1 deletion docs/source/reference/ai/custom.rst
@@ -1,6 +1,5 @@
.. _udf:


Functions
======================

4 changes: 3 additions & 1 deletion docs/source/reference/ai/index.rst
@@ -1,3 +1,5 @@
.. _models:

Models
------------------------------------------

@@ -7,4 +9,4 @@ This section compiles a comprehensive catalog of the model integrations that Eva

Please refer to the following table of contents for easy navigation:

.. tableofcontents::
.. tableofcontents::
10 changes: 10 additions & 0 deletions docs/source/reference/optimizations.rst
@@ -0,0 +1,10 @@
.. _optimizations:

Optimizations
=============

EvaDB optimizes the evaluation of user-defined functions in three ways.

1. Caching. EvaDB caches expensive function invocations and reuses their results in future invocations.
2. Cost-based predicate reordering. EvaDB evaluates fast and selective predicates first.
3. Ray-based distributed inference. EvaDB not only parallelizes model inference to improve GPU utilization but also builds a pipeline to parallelize CPU processing (i.e., loading and decoding data).
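
The first two ideas can be sketched in a few lines of Python. This is a generic illustration and not EvaDB's optimizer: the memoization uses ``functools.lru_cache``, and the ordering uses the classic ``cost / (1 - selectivity)`` ranking heuristic, which is an assumption rather than EvaDB's documented cost model. All names, costs, and selectivities below are hypothetical. Distributed inference is not covered by this sketch.

```python
import functools
import math

call_count = 0  # counts real (uncached) model invocations


@functools.lru_cache(maxsize=None)
def expensive_model(review_id):
    """Hypothetical stand-in for a costly model call, memoized by argument."""
    global call_count
    call_count += 1
    return "NEGATIVE" if review_id % 2 else "POSITIVE"


def reorder_predicates(predicates):
    """Order predicates so that cheap, highly selective ones run first.

    Each predicate is (name, cost_per_row, selectivity), where selectivity
    is the fraction of rows that pass the predicate.
    """
    def rank(predicate):
        _, cost, selectivity = predicate
        if selectivity >= 1.0:
            return math.inf  # filters nothing, so run it last
        return cost / (1.0 - selectivity)

    return sorted(predicates, key=rank)


# Repeated arguments hit the cache instead of re-running the model.
labels = [expensive_model(i) for i in [1, 2, 1, 2, 1]]
print(call_count)

predicates = [
    ("TextClassifier(data).label = 'NEGATIVE'", 100.0, 0.3),
    ("id < 10", 1.0, 0.1),
    ("length(data) > 5", 1.0, 0.9),
]
ordered = [name for name, _, _ in reorder_predicates(predicates)]
print(ordered)
```

Running the cheap ``id < 10`` filter first means the expensive classifier only sees the rows that survive it, which is the intuition behind evaluating fast and selective predicates first.
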
