RPC Records list filtering #3700

Merged · 18 commits merged from records_list_filter into develop on Jul 26, 2024
Conversation

@mathemancer (Contributor) commented Jul 22, 2024

Related to #3690

Adds filtering to the records.list RPC function.

Technical details

The filters implemented are derived from the current Front End code.

The Filter spec is documented, but some changes warrant extra attention.

First, "and" and "or" are now just filters like the rest. Each takes exactly two operands. So, to get the SQL

WHERE id < 3 AND col1 > 'potato' AND col2 = 'potahto'

you submit

{
    "type": "and",
    "args": [
        {
            "type": "and",
            "args": [
                {
                    "type": "lesser",
                    "args": [
                        {"type": "attnum", "value": 1},
                        {"type": "literal", "value": 3}
                    ]
                },
                {
                    "type": "greater",
                    "args": [
                        {"type": "attnum", "value": 2},
                        {"type": "literal", "value": "potato"}
                    ]
                }
            ]
        },
        {
            "type": "equal",
            "args": [
                {"type": "attnum", "value": 3},
                {"type": "literal", "value": "potahto"}
            ]
        }
    ]
}

In reality, this produces the SQL:

WHERE (id < 3) AND ((col1 > 'potato') AND (col2 = 'potahto'))

The reason for this change is that it lets the recursive function that parses the filter spec be memoryless. It's also much more flexible moving forward.
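The memoryless recursion is easier to see in code. Here is a minimal Python sketch, not Mathesar's actual implementation: the template strings and the attnum-to-column lookup are illustrative assumptions.

```python
# Minimal sketch of a memoryless recursive filter-spec parser.
# TEMPLATES and COLUMNS are illustrative assumptions, not Mathesar code.
TEMPLATES = {
    "and": "(%s) AND (%s)",
    "or": "(%s) OR (%s)",
    "lesser": "(%s) < (%s)",
    "greater": "(%s) > (%s)",
    "equal": "(%s) = (%s)",
}

COLUMNS = {1: "id", 2: "col1", 3: "col2"}  # attnum -> column name


def parse(node):
    """Convert one filter-spec node to a SQL fragment.

    Because every node carries its own "type", the function needs no
    memory of where it is in the tree: it just dispatches on the type
    and recurses into args.
    """
    if node["type"] == "attnum":
        return COLUMNS[node["value"]]
    if node["type"] == "literal":
        return "'%s'" % node["value"]
    template = TEMPLATES[node["type"]]
    return template % tuple(parse(arg) for arg in node["args"])
```

Feeding it the nested spec above yields the fully parenthesized form of the SQL in the description (modulo literal quoting), with each "and" node contributing one pair of parentheses.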

Another change is that the filter type is no longer a key, but is instead a value corresponding to the key "type" in each filter. This is much more typical, and (again) allows the parser to be memoryless.

Finally, and less importantly, we've changed column_id to attnum to conform with the other RPC functions.

Checklist

  • My pull request has a descriptive title (not a vague title like Update index.md).
  • My pull request targets the develop branch of the repository.
  • My commit messages follow best practices.
  • My code follows the established code style of the repository.
  • I added tests for the changes I made (if applicable).
  • I added or updated documentation (if applicable).
  • I tried running the project locally and verified that there are no visible errors.

@mathemancer mathemancer marked this pull request as ready for review July 22, 2024 07:38
@mathemancer mathemancer requested review from Anish9901 and pavish July 22, 2024 08:00
@mathemancer mathemancer added the pr-status: review A PR awaiting review label Jul 22, 2024
@mathemancer mathemancer added this to the Beta milestone Jul 22, 2024
@seancolsen (Contributor) commented:

I've only read the PR description, but I wanted to chime in with some thoughts...

This looks nice! The JSON is basically an AST for a limited set of SQL expressions. Very cool! The way you're using the type field works really well with TypeScript idioms.

I have one minor nitpick that very well may not be worth addressing. I'm just mentioning this as food for thought, on the off chance that it might tip your mental calculus into making an adjustment.

Instead of:

{
  "type": "lesser",
  "args": [
    {
      "type": "attnum",
      "value": 1
    },
    {
      "type": "literal",
      "value": 3
    }
  ]
}

I'd rather see something like:

{
  "type": "lesser",
  "lhs": {
    "type": "attnum",
    "value": 1
  },
  "rhs": {
    "type": "literal",
    "value": 3
  }
}

Why? Because a "less-than" expression always requires exactly two arguments. Structuring the JSON fields statically as lhs and rhs (instead of dynamically as args) allows us to encode that requirement into the type system.

To be clear: I don't mean to suggest that we'd have lhs and rhs values for every type of expression — only for certain types.

Here's an example of a comparison that we might want to add in the future which would be structured differently:

{
  "type": "in",
  "value": {
    "type": "attnum",
    "value": 1
  },
  "values": [
    {
      "type": "literal",
      "value": 3
    },
    {
      "type": "literal",
      "value": 4
    }
  ]
}

Overall this is minor though. Not worth slowing things down unless you happen to see other good reasons for making such a change too.
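The arity argument can be made concrete. Here is a small Python sketch, with hypothetical type names, of how static lhs/rhs fields put the two-operand requirement into the type checker, in the same spirit as the TypeScript idiom:

```python
# Hypothetical sketch: encoding "exactly two operands" statically.
# With lhs/rhs fields a type checker rejects a one- or three-operand
# comparison; with an args list, the arity check happens at runtime.
from dataclasses import dataclass
from typing import Union


@dataclass
class Attnum:
    value: int


@dataclass
class Literal:
    value: object


Operand = Union[Attnum, Literal]


@dataclass
class Lesser:
    lhs: Operand  # exactly one left operand, enforced by the checker
    rhs: Operand  # exactly one right operand


expr = Lesser(lhs=Attnum(1), rhs=Literal(3))
```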

@pavish (Member) left a comment:
@mathemancer The syntax looks great to me! I do not see any concerns.

Adding some thoughts towards @seancolsen's comment:

I still prefer the syntax in the PR description because it follows a uniform structure (with each node containing a type and args/value) which is much easier to parse and work with.

In this example:

{
  "type": "lesser",
  "lhs": {
    "type": "attnum",
    "value": 1
  },
  "rhs": {
    "type": "literal",
    "value": 3
  }
}

I'd prefer finding the lhs and rhs using the type key instead of having them as additional keys.

In the scenario in the PR description, one operand is a column and the other is a literal, so we can deduce that easily using attnum and literal, and it works great.

However, there are scenarios where it might be required, e.g., when both arguments are columns. In that case I'd prefer something like this instead:

{
  "type": "lesser",
  "args": [
    {
      "type": "lhs",
      "value": {
        "operand_type": "attnum",
        "operand": 4
      }
    },
    {
      "type": "rhs",
      "value": {
        "operand_type": "attnum",
        "operand": 3
      }
    }
  ]
}

This allows maximum flexibility w.r.t. structure, and even lets us nest the expressions if we ever want to, e.g., comparing the output of one filter with another.

@pavish pavish removed their assignment Jul 22, 2024
@mathemancer (Contributor, Author) commented Jul 23, 2024

@seancolsen @pavish A couple of quick notes regarding your well-received commentary.

  • My first attempt was similar to what @seancolsen suggested, but the irregularity of the form made things (much) more complex in the implementation of the parser. Basically, every key ends up being another case in the end, and there's kind of a combinatorial explosion because the function is recursive.

  • Please note that the args array is ordered. If you put the literal before the attnum in the args array for a lesser object, you'll end up with something like 42 < mycolumn. It seems like you both may not have noticed this.

  • As for comparing outputs of filters or functions via nesting (as suggested by @pavish), the current form allows for that, and so does the underlying implementation. It's just that nesting is currently not that useful since all currently implemented functions or operators output SQL representing boolean values. But, if (for example) you want a logical implies operator, i.e., with the truth table

    a b r
    t t t
    t f f
    f t t
    f f t

    you can simply nest your filters inside a filter object of type lesser_or_equal. Not sure how useful that would be for a user though.
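A quick check of that claim: with PostgreSQL's boolean ordering (false < true, the same ordering Python uses), a <= b reproduces the implication truth table above. A standalone verification:

```python
# Verify that "a implies b" coincides with a <= b under false < true,
# matching the truth table above.
for a in (True, False):
    for b in (True, False):
        implies = (not a) or b
        assert (a <= b) == implies, (a, b)
print("a <= b matches logical implication for all boolean pairs")
```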

Finally, I should point out that the form and the underlying parser are flexible enough to allow for writing a whole bunch of arbitrary SQL. The parser doesn't actually know it's writing a "filter". I intend to use it for more general function calls in the future.

As a quick example, the json_array_length_lesser_or_equal filter is a bit redundant. We could easily implement a json_array_length 'filter' (really just a preproc function call at that point) by adding

('json_array_length', 'jsonb_array_length(%s::jsonb)')

to the msar.filter_templates table (which would warrant a rename at that point), and then the filter object

{
    "type": "or",
    "args": [
        {
            "type": "lesser",
            "args": [
                {
                    "type": "json_array_length",
                    "args": [
                        {"type": "attnum", "value": 2}
                    ]
                },
                {"type": "literal", "value": 3}
            ]
        },
        {
            "type": "equal",
            "args": [
                {
                    "type": "json_array_length",
                    "args": [
                        {"type": "attnum", "value": 2}
                    ]
                },
                {"type": "literal", "value": 3}
            ]
        }
    ]
}

would produce the SQL

WHERE (jsonb_array_length(col1::jsonb) < '3') OR (jsonb_array_length(col1::jsonb) = '3')

This filter will give the exact same output as json_array_length_lesser_or_equal. I didn't end up going that far on this, because I wanted to minimize scope (on both ends) by keeping the list of functions the same as it was for the REST version. If it would be useful to have any of these 'preproc' functions (json_array_length would be a main candidate) available by beta, just let me know. It's 10 minutes to implement, and 10 minutes to write a quick test.

You may notice that many of our current filter functions could actually just be compositions or argument-swaps of other functions. In the long run, I think we should lean on that feature.
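For instance (an illustrative sketch, not actual Mathesar code): "greater" is "lesser" with swapped arguments, and a compound filter like json_array_length_lesser could be built by composing the hypothetical json_array_length node discussed above with "lesser". The helper functions below are hypothetical, mirroring the spec format from the PR description:

```python
# Illustrative sketch of deriving filters by argument swap and
# composition. These helpers and the json_array_length node are
# hypothetical, following the spec format in the PR description.
def attnum(n):
    return {"type": "attnum", "value": n}


def literal(v):
    return {"type": "literal", "value": v}


def lesser(a, b):
    return {"type": "lesser", "args": [a, b]}


def greater(a, b):
    # a > b is just b < a: an argument swap, not a new template.
    return lesser(b, a)


def json_array_length(a):
    return {"type": "json_array_length", "args": [a]}


# Composition: "length of column 2's JSON array is less than 3".
spec = lesser(json_array_length(attnum(2)), literal(3))
```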

use composition rather than prewrapped functions
add more parentheses to avoid ambiguity
avoid ILIKE, stopping bugs due to inadvertent wildcards
@mathemancer (Contributor, Author) commented Jul 25, 2024

@seancolsen Please take a look at the decomposed functions when/if you have time or interest.

I ended up keeping the lesser_or_equals functions as well as greater_or_equals, since I think it's more convenient that way and people are used to having those operators available.

I also went ahead and made separate case-sensitive and case-insensitive versions of contains and starts_with.

Finally, once I got into the docs I realized that there's a starts_with function that doesn't allow any wildcards at all, saving us potential mistakes. Also, the strpos function returns the position of the substring within the value (truthy) if present, and 0 (falsy) otherwise. Since it also doesn't accept wildcards, I went with that for our 'contains' logic.
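The wildcard hazard that avoiding ILIKE sidesteps can be seen in a standalone Python emulation. The LIKE emulation below is an illustrative approximation (it ignores the '_' wildcard), not Mathesar code:

```python
import re


def like_contains(value, needle):
    # Emulates WHERE value LIKE '%' || needle || '%':
    # any '%' inside the needle acts as a wildcard.
    parts = [re.escape(p) for p in needle.split("%")]
    return re.search(".*".join(parts), value) is not None


def strpos_contains(value, needle):
    # Emulates strpos(value, needle) > 0: plain substring, no wildcards.
    return needle in value


# A needle containing '%' matches unintended rows under LIKE:
assert like_contains("100 percent", "100%")        # wildcard surprise
assert not strpos_contains("100 percent", "100%")  # literal match only
```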

If you find that you don't want to use any of the functions (e.g., the lesser_or_equals function) when you're doing your implementation, just let me know and we can remove the cruft later.

@mathemancer mentioned this pull request Jul 25, 2024
@seancolsen (Contributor) left a comment:

Wonderful!

Comment on lines 3326 to 3327
('json_array_length', 'jsonb_array_length((%s)::jsonb)'),
('json_array_contains', '(%s) @> (%s)'),
A reviewer (Member) commented:
I don't know if we should be explicitly typecasting to jsonb here, as this would allow columns storing a json blob in a text column to be typecast and return a result. However, if that is the intended behavior, it would be good to add typecasting to json_array_contains as well.

@mathemancer (Contributor, Author) replied:

The main goal was to allow this to be used on both json and jsonb. But, your point stands. I'll add typecasting to json_array_contains.

The reviewer (Member) replied:

That makes sense.

@mathemancer mathemancer requested a review from Anish9901 July 26, 2024 06:19
@Anish9901 (Member) left a comment:

Looks good now @mathemancer, Thanks!

@Anish9901 Anish9901 added this pull request to the merge queue Jul 26, 2024
Merged via the queue into develop with commit cdb87c4 Jul 26, 2024
37 checks passed
@Anish9901 Anish9901 deleted the records_list_filter branch July 26, 2024 08:46
@kgodey kgodey modified the milestones: Beta, Pre-beta test build #1 Sep 18, 2024