RPC Records list filtering #3700

Merged · 18 commits merged from records_list_filter into develop on Jul 26, 2024
Conversation

@mathemancer (Contributor) commented Jul 22, 2024

Related to #3690

Adds filtering to the records.list RPC function.

Technical details

The filters implemented are derived from the current Front End code.

The Filter spec is documented, but some changes warrant extra attention.

First, "and" and "or" are now just filters like the rest. Each takes exactly two operands. So, to get the SQL

WHERE id < 3 AND col1 > 'potato' AND col2 = 'potahto'

you submit

{
    "type": "and",
    "args": [
        {
            "type": "and",
            "args": [
                {
                    "type": "lesser",
                    "args": [
                        {"type": "attnum", "value": 1},
                        {"type": "literal", "value": 3}
                    ]
                },
                {
                    "type": "greater",
                    "args": [
                        {"type": "attnum", "value": 2},
                        {"type": "literal", "value": "potato"}
                    ]
                }
            ]
        },
        {
            "type": "equal",
            "args": [
                {"type": "attnum", "value": 3},
                {"type": "literal", "value": "potahto"}
            ]
        }
    ]
}

In reality, this produces the SQL:

WHERE (id < 3) AND ((col1 > 'potato') AND (col2 = 'potahto'))

The reason for this change is that it lets the recursive function that parses the filter spec be memoryless. It's also much more flexible moving forward.
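The memoryless recursion is easier to see in code. Here is a minimal Python sketch, not Mathesar's actual implementation: the template strings and the attnum-to-column lookup are illustrative assumptions.

```python
# Minimal sketch of a memoryless recursive filter-spec parser.
# TEMPLATES and COLUMNS are illustrative assumptions, not Mathesar code.
TEMPLATES = {
    "and": "(%s) AND (%s)",
    "or": "(%s) OR (%s)",
    "lesser": "(%s) < (%s)",
    "greater": "(%s) > (%s)",
    "equal": "(%s) = (%s)",
}

COLUMNS = {1: "id", 2: "col1", 3: "col2"}  # attnum -> column name


def parse(node):
    """Convert one filter-spec node to a SQL fragment.

    Because every node carries its own "type", the function needs no
    memory of where it is in the tree: it just dispatches on the type
    and recurses into args.
    """
    if node["type"] == "attnum":
        return COLUMNS[node["value"]]
    if node["type"] == "literal":
        return "'%s'" % node["value"]
    template = TEMPLATES[node["type"]]
    return template % tuple(parse(arg) for arg in node["args"])
```

Feeding it the nested spec above yields the fully parenthesized form of the SQL in the description (modulo literal quoting), with each "and" node contributing one pair of parentheses.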

Another change is that the filter type is no longer a key, but is instead a value corresponding to the key "type" in each filter. This is much more typical, and (again) allows the parser to be memoryless.

Finally, and less importantly, we've changed column_id to attnum to conform with the other RPC functions.

Checklist

  • My pull request has a descriptive title (not a vague title like Update index.md).
  • My pull request targets the develop branch of the repository.
  • My commit messages follow best practices.
  • My code follows the established code style of the repository.
  • I added tests for the changes I made (if applicable).
  • I added or updated documentation (if applicable).
  • I tried running the project locally and verified that there are no visible errors.

@mathemancer mathemancer marked this pull request as ready for review July 22, 2024 07:38
@mathemancer mathemancer requested review from Anish9901 and pavish July 22, 2024 08:00
@mathemancer mathemancer added the pr-status: review A PR awaiting review label Jul 22, 2024
@mathemancer mathemancer added this to the Beta milestone Jul 22, 2024
@seancolsen (Contributor) commented:

I've only read the PR description, but I wanted to chime in with some thoughts...

This looks nice! The JSON is basically an AST for a limited set of SQL expressions. Very cool! The way you're using the type field works really well with TypeScript idioms.

I have one minor nitpick that very well may not be worth addressing. I'm just mentioning this as food for thought, on the off chance that it might tip your mental calculus into making an adjustment.

Instead of:

{
  "type": "lesser",
  "args": [
    {
      "type": "attnum",
      "value": 1
    },
    {
      "type": "literal",
      "value": 3
    }
  ]
}

I'd rather see something like:

{
  "type": "lesser",
  "lhs": {
    "type": "attnum",
    "value": 1
  },
  "rhs": {
    "type": "literal",
    "value": 3
  }
}

Why? Because a "less-than" expression always requires exactly two arguments. Structuring the JSON fields statically as lhs and rhs (instead of dynamically as args) allows us to encode that requirement into the type system.

To be clear: I don't mean to suggest that we'd have lhs and rhs values for every type of expression — only for certain types.

Here's an example of a comparison that we might want to add in the future which would be structured differently:

{
  "type": "in",
  "value": {
    "type": "attnum",
    "value": 1
  },
  "values": [
    {
      "type": "literal",
      "value": 3
    },
    {
      "type": "literal",
      "value": 4
    }
  ]
}

Overall this is minor though. Not worth slowing things down unless you happen to see other good reasons for making such a change too.
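The arity argument can be made concrete. Here is a small Python sketch, with hypothetical type names, of how static lhs/rhs fields put the two-operand requirement into the type checker, in the same spirit as the TypeScript idiom:

```python
# Hypothetical sketch: encoding "exactly two operands" statically.
# With lhs/rhs fields a type checker rejects a one- or three-operand
# comparison; with an args list, the arity check happens at runtime.
from dataclasses import dataclass
from typing import Union


@dataclass
class Attnum:
    value: int


@dataclass
class Literal:
    value: object


Operand = Union[Attnum, Literal]


@dataclass
class Lesser:
    lhs: Operand  # exactly one left operand, enforced by the checker
    rhs: Operand  # exactly one right operand


expr = Lesser(lhs=Attnum(1), rhs=Literal(3))
```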

@pavish (Member) left a comment:
@mathemancer The syntax looks great to me! I do not see any concerns.

Adding some thoughts towards @seancolsen's comment:

I still prefer the syntax in the PR description because it follows a uniform structure (with each node containing a type and args/value) which is much easier to parse and work with.

In this example:

{
  "type": "lesser",
  "lhs": {
    "type": "attnum",
    "value": 1
  },
  "rhs": {
    "type": "literal",
    "value": 3
  }
}

I'd prefer finding the lhs and rhs using the type key instead of having them as additional keys.

In the scenario in the PR description, one operand is a column and the other is a literal, so we can deduce that easily using attnum and literal, and it works great.

However, there are scenarios where it might be required, e.g., when both arguments are columns. In that case I'd prefer something like this instead:

{
  "type": "lesser",
  "args": [
    {
      "type": "lhs",
      "value": {
        "operand_type": "attnum",
        "operand": 4
      }
    },
    {
      "type": "rhs",
      "value": {
        "operand_type": "attnum",
        "operand": 3
      }
    }
  ]
}

This allows maximum flexibility w.r.t. structure, and even lets us nest the expressions if we ever want to, e.g., comparing the output of one filter with another.

@pavish pavish removed their assignment Jul 22, 2024
@mathemancer (Contributor, Author) commented Jul 23, 2024

@seancolsen @pavish A couple of quick notes regarding your well-received commentary.

  • My first attempt was similar to what @seancolsen suggested, but the irregularity of the form made things (much) more complex in the implementation of the parser. Basically, every key ends up being another case in the end, and there's kind of a combinatorial explosion because the function is recursive.

  • Please note that the args array is ordered. If you put the literal before the attnum in the args array for a lesser object, you'll end up with something like 42 < mycolumn. It seems like you both may not have noticed this.

  • As for comparing outputs of filters or functions via nesting (as suggested by @pavish), the current form allows for that, and so does the underlying implementation. It's just that nesting is currently not that useful since all currently implemented functions or operators output SQL representing boolean values. But, if (for example) you want a logical implies operator, i.e., with the truth table

    a b r
    t t t
    t f f
    f t t
    f f t

    you can simply nest your filters inside a filter object of type lesser_or_equal. Not sure how useful that would be for a user though.
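A quick check of that claim: with PostgreSQL's boolean ordering (false < true, the same ordering Python uses), a <= b reproduces the implication truth table above. A standalone verification:

```python
# Verify that "a implies b" coincides with a <= b under false < true,
# matching the truth table above.
for a in (True, False):
    for b in (True, False):
        implies = (not a) or b
        assert (a <= b) == implies, (a, b)
print("a <= b matches logical implication for all boolean pairs")
```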

Finally, I should point out that the form and the underlying parser are flexible enough to allow for writing a whole bunch of arbitrary SQL. The parser doesn't actually know it's writing a "filter". I intend to use it for more general function calls in the future.

As a quick example, the json_array_length_lesser_or_equal filter is a bit redundant. We could easily implement a json_array_length 'filter' (really just a preproc function call at that point) by adding

('json_array_length', 'jsonb_array_length(%s::jsonb)')

to the msar.filter_templates table (which would warrant a rename at that point), and then the filter object

{
    "type": "or",
    "args": [
        {
            "type": "lesser",
            "args": [
                {
                    "type": "json_array_length",
                    "args": [
                        {"type": "attnum", "value": 2}
                    ]
                },
                {"type": "literal", "value": 3}
            ]
        },
        {
            "type": "equal",
            "args": [
                {
                    "type": "json_array_length",
                    "args": [
                        {"type": "attnum", "value": 2}
                    ]
                },
                {"type": "literal", "value": 3}
            ]
        }
    ]
}

would produce the SQL

WHERE (jsonb_array_length(col1::jsonb) < '3') OR (jsonb_array_length(col1::jsonb) = '3')

This filter will give the exact same output as json_array_length_lesser_or_equal. I didn't end up going that far on this, because I wanted to minimize scope (on both ends) by keeping the list of functions the same as it was for the REST version. If it would be useful to have any of these 'preproc' functions (json_array_length would be a main candidate) available by beta, just let me know. It's 10 minutes to implement, and 10 minutes to write a quick test.

You may notice that many of our current filter functions could actually just be compositions or argument-swaps of other functions. In the long run, I think we should lean on that feature.
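For instance (an illustrative sketch, not actual Mathesar code): "greater" is "lesser" with swapped arguments, and a compound filter like json_array_length_lesser could be built by composing the hypothetical json_array_length node discussed above with "lesser". The helper functions below are hypothetical, mirroring the spec format from the PR description:

```python
# Illustrative sketch of deriving filters by argument swap and
# composition. These helpers and the json_array_length node are
# hypothetical, following the spec format in the PR description.
def attnum(n):
    return {"type": "attnum", "value": n}


def literal(v):
    return {"type": "literal", "value": v}


def lesser(a, b):
    return {"type": "lesser", "args": [a, b]}


def greater(a, b):
    # a > b is just b < a: an argument swap, not a new template.
    return lesser(b, a)


def json_array_length(a):
    return {"type": "json_array_length", "args": [a]}


# Composition: "length of column 2's JSON array is less than 3".
spec = lesser(json_array_length(attnum(2)), literal(3))
```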

use composition rather than prewrapped functions
add more parentheses to avoid ambiguity
avoid ILIKE, stopping bugs due to inadvertent wildcards
@mathemancer (Contributor, Author) commented Jul 25, 2024

@seancolsen Please take a look at the decomposed functions when/if you have time or interest.

I ended up keeping the lesser_or_equals functions as well as greater_or_equals, since I think it's more convenient that way and people are used to having those operators available.

I also went ahead and made separate case-sensitive and case-insensitive versions of contains and starts_with.

Finally, once I got into the docs I realized that there's a starts_with function that doesn't allow any wildcards at all, saving us potential mistakes. Also, the strpos function returns the position of the substring within the value (truthy) if present, and 0 (falsy) otherwise. Since it also doesn't accept wildcards, I went with that for our 'contains' logic.
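The wildcard hazard that avoiding ILIKE sidesteps can be seen in a standalone Python emulation. The LIKE emulation below is an illustrative approximation (it ignores the '_' wildcard), not Mathesar code:

```python
import re


def like_contains(value, needle):
    # Emulates WHERE value LIKE '%' || needle || '%':
    # any '%' inside the needle acts as a wildcard.
    parts = [re.escape(p) for p in needle.split("%")]
    return re.search(".*".join(parts), value) is not None


def strpos_contains(value, needle):
    # Emulates strpos(value, needle) > 0: plain substring, no wildcards.
    return needle in value


# A needle containing '%' matches unintended rows under LIKE:
assert like_contains("100 percent", "100%")        # wildcard surprise
assert not strpos_contains("100 percent", "100%")  # literal match only
```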

If you find that you don't want to use any of the functions (e.g., the lesser_or_equals function) when you're doing your implementation, just let me know and we can remove the cruft later.

@mathemancer mentioned this pull request Jul 25, 2024
@seancolsen (Contributor) left a comment:

Wonderful!

Comment on lines 3326 to 3327
('json_array_length', 'jsonb_array_length((%s)::jsonb)'),
('json_array_contains', '(%s) @> (%s)'),
A reviewer (Member) commented:
I don't know if we should be explicitly typecasting to jsonb here, as this would allow columns storing a json blob in a text column to be typecast and return a result. However, if that is the intended behavior, it would be good to add typecasting to json_array_contains as well.

@mathemancer (Contributor, Author) replied:

The main goal was to allow this to be used on both json and jsonb. But, your point stands. I'll add typecasting to json_array_contains.

The reviewer (Member) replied:

That makes sense.

@mathemancer mathemancer requested a review from Anish9901 July 26, 2024 06:19
@Anish9901 (Member) left a comment:

Looks good now @mathemancer, Thanks!

@Anish9901 Anish9901 added this pull request to the merge queue Jul 26, 2024
Merged via the queue into develop with commit cdb87c4 Jul 26, 2024
37 checks passed
@Anish9901 Anish9901 deleted the records_list_filter branch July 26, 2024 08:46
@kgodey kgodey modified the milestones: Beta, Pre-beta test build #1 Sep 18, 2024