PG17 compatibility: add/fix tests with correlated subqueries that can be pulled to a join #7745

Merged 1 commit, Nov 20, 2024

@colm-mchugh colm-mchugh commented Nov 14, 2024

Fix Test Failure in subquery_in_where, set_operations, dml_recursive in PG17 #7741

The test failures are caused by this commit in PG17 (https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=9f1337639), which enables correlated subqueries to be pulled up to a join. Prior to this, the correlated subquery was implemented as a subplan. In Citus it is not possible to push down a correlated subplan, but with the different plan in PG17 the query can be executed, per the test diff from subquery_in_where:

37,39c37,41
< DEBUG:  generating subplan XXX_1 for CTE event_id: SELECT user_id AS events_user_id, "time" AS events_time, event_type FROM public.events_table
< DEBUG:  Plan XXX query after replacing subqueries and CTEs: SELECT count(*) AS count FROM (SELECT intermediate_result.events_user_id, intermediate_result.events_time, intermediate_result.event_type FROM read_intermediate_result('XXX_1'::text, 'binary'::citus_copy_format) intermediate_result(events_user_id integer, events_time timestamp without time zone, event_type integer)) event_id WHERE (events_user_id OPERATOR(pg_catalog.=) ANY (SELECT users_table.user_id FROM public.users_table WHERE (users_table."time" OPERATOR(pg_catalog.=) event_id.events_time)))
< ERROR:  correlated subqueries are not supported when the FROM clause contains a CTE or subquery
---
>  count
> ---------------------------------------------------------------------
>      0
> (1 row)
> 

This is because in pg17 the `= ANY` subquery in these queries can be implemented as a join, instead of as a subplan filter on a table scan. For example, `SELECT * FROM test a WHERE x IN (SELECT x FROM test b UNION SELECT y FROM test c WHERE a.x = c.x) ORDER BY 1,2` (from set_operations) has this plan in pg17; note that the subquery is the inner side of a nested loop join:

┌───────────────────────────────────────────────────┐
│                    QUERY PLAN                     │
├───────────────────────────────────────────────────┤
│ Sort                                              │
│   Sort Key: a.x, a.y                              │
│   ->  Nested Loop                                 │
│         ->  Seq Scan on test a                    │
│         ->  Subquery Scan on "ANY_subquery"       │
│               Filter: (a.x = "ANY_subquery".x)    │
│               ->  HashAggregate                   │
│                     Group Key: b.x                │
│                     ->  Append                    │
│                           ->  Seq Scan on test b  │
│                           ->  Seq Scan on test c  │
│                                 Filter: (a.x = x) │
└───────────────────────────────────────────────────┘

and this plan in pg16 (and previous pg versions); the subquery is a correlated subplan filter on a table scan:

┌───────────────────────────────────────────────┐
│                  QUERY PLAN                   │
├───────────────────────────────────────────────┤
│ Sort                                          │
│   Sort Key: a.x, a.y                          │
│   ->  Seq Scan on test a                      │
│         Filter: (SubPlan 1)                   │
│         SubPlan 1                             │
│           ->  HashAggregate                   │
│                 Group Key: b.x                │
│                 ->  Append                    │
│                       ->  Seq Scan on test b  │
│                       ->  Seq Scan on test c  │
│                             Filter: (a.x = x) │
└───────────────────────────────────────────────┘

The fix modifies the queries causing the test failures so that an ANY subquery is not folded to a join (in set_operations, for example, the left-hand side `x` becomes `(x + random())`), preserving the expected output of the tests. A similar approach was taken for existing regress tests in the postgres commit. See the join regress test, for example.
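
The difference between the two plans above comes down to whether the plan text contains a SubPlan node or a join node. As a rough illustration only, here is a minimal Python sketch of that check. The helper name `classify_plan` and the abbreviated plan strings are assumptions made for this example; they are not part of Citus or PostgreSQL, and matching on EXPLAIN text is a heuristic, not a planner API.

```python
import re

# Abbreviated from the pg17 EXPLAIN output quoted above (box drawing removed).
PG17_PLAN = """\
Sort
  Sort Key: a.x, a.y
  ->  Nested Loop
        ->  Seq Scan on test a
        ->  Subquery Scan on "ANY_subquery"
              Filter: (a.x = "ANY_subquery".x)
"""

# Abbreviated from the pg16 EXPLAIN output quoted above.
PG16_PLAN = """\
Sort
  Sort Key: a.x, a.y
  ->  Seq Scan on test a
        Filter: (SubPlan 1)
        SubPlan 1
          ->  HashAggregate
"""

# Join node labels as they appear in text-format EXPLAIN output,
# including variants such as "Hash Semi Join".
JOIN_NODE = re.compile(r"->\s+(Nested Loop|Hash (\w+ )?Join|Merge (\w+ )?Join)")


def classify_plan(plan_text: str) -> str:
    """Classify a text-format plan as 'subplan', 'join', or 'other'.

    Hypothetical helper: a SubPlan anywhere in the plan wins, because that
    is the shape Citus cannot push down according to the description above.
    """
    if re.search(r"\bSubPlan \d+", plan_text):
        return "subplan"
    if JOIN_NODE.search(plan_text):
        return "join"
    return "other"


print(classify_plan(PG17_PLAN))
print(classify_plan(PG16_PLAN))
```

A check like this could help when triaging which regression queries changed shape between major versions, but the authoritative signal remains the test diff itself.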

@colm-mchugh colm-mchugh self-assigned this Nov 14, 2024
codecov bot commented Nov 14, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Please upload report for BASE (release-13.0@0fed87a). Learn more about missing BASE report.

Additional details and impacted files
@@               Coverage Diff               @@
##             release-13.0    #7745   +/-   ##
===============================================
  Coverage                ?   89.64%           
===============================================
  Files                   ?      274           
  Lines                   ?    59583           
  Branches                ?     7436           
===============================================
  Hits                    ?    53413           
  Misses                  ?     4037           
  Partials                ?     2133           

@@ -134,7 +134,7 @@ SELECT * FROM test a WHERE x NOT IN (SELECT x FROM test b WHERE y = 1 UNION SELE
SELECT * FROM test a WHERE x IN (SELECT x FROM test b UNION SELECT y FROM test c) ORDER BY 1,2;

-- correlated subquery with union in WHERE clause
SELECT * FROM test a WHERE x IN (SELECT x FROM test b UNION SELECT y FROM test c WHERE a.x = c.x) ORDER BY 1,2;
SELECT * FROM test a WHERE (x + random()) IN (SELECT x FROM test b UNION SELECT y FROM test c WHERE a.x = c.x) ORDER BY 1,2;

@naisila (Member) commented Nov 14, 2024

> A similar approach was taken for existing regress tests in the postgres commit.

They followed this approach in the postgres tests because they were getting EXPLAIN diffs, and they wanted to avoid adding a new alternative test output file for PG17. In Citus, note that in these two tests we are trying to run the query, not to explain it. So when we try to run these queries, both of them unexpectedly work.

My point is, we also need to understand what changed in the Citus planner path, in the codebase, and make sure that Citus is running these queries correctly.

Current fix is great, by the way, no extra output file, but we may need to test this more extensively in Citus through this PR.

@colm-mchugh (Contributor Author) replied

Got it. I think that the Citus planner is running the queries correctly (in pg17) because it is getting a different plan from the pg planner, but I will verify, and see what tests can be added (maybe to pg17 regress test?) to test the new behavior in pg17.

@naisila (Member) replied

Thank you, that sounds great.

> maybe to pg17 regress test

Yes, makes sense.

@colm-mchugh (Contributor Author) replied

The latest push contains a pg17 regress test that tests the pg17 feature of pulling up correlated ANY subqueries. It can be extended to test other 17-related functionality as appropriate.

@naisila (Member) commented Nov 14, 2024

By the way, can we add a similar fix to dml_recursive test to avoid the extra output? #7727

@colm-mchugh (Contributor Author) commented

> By the way, can we add a similar fix to dml_recursive test to avoid the extra output? #7727

Yes, it looks like dml_recursive can have a similar fix. In all three cases - set_operations, subquery_in_where and dml_recursive - the plan created by the Postgres planner pre-pg17 implemented the correlated subquery as a SubPlan filter. In all three cases with pg17 the pg optimizer can fold the correlated subquery to a join, so the pg plan does not have any correlated SubPlans, which seems to avoid the limitations in Citus.

@naisila (Member) left a comment

Beautiful PR, thank you.
We can merge after the team sync on the queries that work, given that we don't discover any issues in that meeting.

@colm-mchugh (Contributor Author) commented

@microsoft-github-policy-service agree company="Microsoft"

@naisila (Member) left a comment

The reason I was holding off on merging is that I forgot to test the PR with PG16 😆 Sorry about that.
So, I realized we are missing the pg17_0.sql file, which is the alternative output file for pg16/pg15/pg14 runs. That's why I am requesting changes with this PR review.
Also, the check-style test is failing; it looks like there is some stray whitespace: https://github.com/citusdata/citus/actions/runs/11861227624/job/33058148915?pr=7745

Additionally, I really like that you provided the query version rewritten with the subquery pulled up to a join, which Citus can execute in all PG versions. So, I was thinking we can include these outputs in the pg17_0.sql file.
Usually a pgxx_0.sql file only has the following lines, as we don't execute the test in previous versions:

--
-- PG16
--
SHOW server_version \gset
SELECT substring(:'server_version', '\d+')::int >= 16 AS server_version_ge_16
\gset
\if :server_version_ge_16
\else
\q
\endif

However, we might let it execute in this case. What do you think?
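
For readers unfamiliar with the guard quoted above: it takes the leading run of digits from `server_version`, casts it to an integer, and quits the script when the major version is below the threshold. A minimal Python sketch of the same logic follows. The function names `major_version` and `should_run` are hypothetical and exist only for illustration; the real guard is the psql snippet itself.

```python
import re


def major_version(server_version: str) -> int:
    """Return the first run of digits in a server_version string.

    Mirrors substring(:'server_version', '\\d+')::int from the guard:
    the first regex match is taken and cast to an integer.
    """
    match = re.search(r"\d+", server_version)
    if match is None:
        raise ValueError(f"no digits in server_version: {server_version!r}")
    return int(match.group())


def should_run(server_version: str, minimum_major: int) -> bool:
    """True when the test body should execute; False corresponds to \\q."""
    return major_version(server_version) >= minimum_major


# Example version strings (assumed for illustration).
for version in ("17.2", "16.4 (Ubuntu 16.4-1.pgdg22.04+1)", "15.8"):
    print(version, should_run(version, 16))
```

Letting the pg17 test execute on older versions, as proposed here, amounts to relaxing or dropping this early exit for the queries that Citus can already run.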

@colm-mchugh (Contributor Author) commented Nov 18, 2024

> The reason I was holding off on merging is that I forgot to test the PR with PG16 😆 Sorry about that. So, I realized we are missing the pg17_0.sql file, which is the alternative output file for pg16/pg15/pg14 runs. That's why I am requesting changes with this PR review. Also, the check-style test is failing; it looks like there is some stray whitespace: https://github.com/citusdata/citus/actions/runs/11861227624/job/33058148915?pr=7745

Ah, I was not aware of the pgxx_0.sql convention; let me address that, and check-style as well.

> However, we might let it execute in this case. What do you think?

I think that's reasonable! (include queries that Citus can run with pg < pg17)

@naisila (Member) left a comment

Looks good for merging to me, thanks! A couple of things:

  • check-style is failing because of some whitespaces I think
  • before merging we need to change the base PR to release-13.0

@naisila naisila changed the title from "Fix Test Failure in subquery_in_where, set_operations in PG17 (#7741)" to "PG17 compatibility: add/fix tests with correlated subqueries that can be pulled to a join" Nov 19, 2024
@colm-mchugh (Contributor Author) commented Nov 19, 2024

>   • check-style is failing because of some whitespaces I think

Fixed; forgot to run after changing the test table population

>   • before merging we need to change the base PR to release-13.0

Just want to sanity-check how changing the base of the PR to release-13.0 is done; is it:

git checkout dev-branch
git rebase -i --onto release-13.0 naisila/pg17_support dev-branch
# in the interactive rebase editor, drop the irrelevant commits
git push -f

?

Thanks!

@colm-mchugh colm-mchugh changed the base branch from naisila/pg17_support to release-13.0 November 19, 2024 21:30
@colm-mchugh colm-mchugh changed the base branch from release-13.0 to naisila/pg17_support November 19, 2024 21:31
@colm-mchugh colm-mchugh changed the base branch from naisila/pg17_support to release-13.0 November 20, 2024 09:23
…in PG17 (#7741)

Change the queries causing the test failures so that the ANY subquery
cannot be pulled up to a join, preserving the expected output of the test.

Add pg17 regress test for correlated ANY subqueries that can be folded
to a join in pg17, and for testing other pg17 features as required.
@colm-mchugh (Contributor Author) commented

The PR has been rebased to release-13.0, should be good to merge pending any relevant checks

@naisila naisila merged commit 680c23f into release-13.0 Nov 20, 2024
121 checks passed
@naisila naisila deleted the cmchugh/pg17-set_operations branch November 20, 2024 11:51
colm-mchugh added a commit that referenced this pull request Nov 22, 2024
Preserve the test error message by adjusting the query so that PG17
cannot pull it up to a join. Another instance of a subquery that can
be pulled up to a join with PG17 (#7745)
colm-mchugh added a commit that referenced this pull request Nov 22, 2024
Preserve the test error message by adjusting the query so that PG17
cannot pull it up to a join. Another instance of a subquery that can be
pulled up to a join with PG17 (#7745)

This should have been fixed in, but slipped by, #7745
m3hm3t pushed a commit that referenced this pull request Nov 28, 2024
… be pulled to a join (#7745)

Fix Test Failure in subquery_in_where, set_operations, dml_recursive in
PG17 #7741

The test failures are caused by [this commit in
PG17](https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=9f1337639),
which enables correlated subqueries to be pulled up to a join. Prior to
this, the correlated subquery was implemented as a subplan. In citus, it
is not possible to pushdown a correlated subplan, but with a different
plan in PG17 the query can be executed, per the test diff from
`subquery_in_where`:

```
37,39c37,41
< DEBUG:  generating subplan XXX_1 for CTE event_id: SELECT user_id AS events_user_id, "time" AS events_time, event_type FROM public.events_table
< DEBUG:  Plan XXX query after replacing subqueries and CTEs: SELECT count(*) AS count FROM ...
< ERROR:  correlated subqueries are not supported when the FROM clause contains a CTE or subquery
---
>  count
> ---------------------------------------------------------------------
>      0
> (1 row)
> 
```

This is because with pg17 `= ANY subquery` in the queries can be
implemented as a join, instead of as a subplan filter on a table scan.
For example, `SELECT * FROM test a WHERE x IN (SELECT x FROM test b
UNION SELECT y FROM test c WHERE a.x = c.x) ORDER BY 1,2` (from
set_operations) has this plan in pg17; note that the subquery is the
inner side of a nested loop join:
```
┌───────────────────────────────────────────────────┐
│                    QUERY PLAN                     │
├───────────────────────────────────────────────────┤
│ Sort                                              │
│   Sort Key: a.x, a.y                              │
│   ->  Nested Loop                                 │
│         ->  Seq Scan on test a                    │
│         ->  Subquery Scan on "ANY_subquery"       │
│               Filter: (a.x = "ANY_subquery".x)    │
│               ->  HashAggregate                   │
│                     Group Key: b.x                │
│                     ->  Append                    │
│                           ->  Seq Scan on test b  │
│                           ->  Seq Scan on test c  │
│                                 Filter: (a.x = x) │
└───────────────────────────────────────────────────┘
```
and this plan in pg16 (and previous pg versions); the subquery is a
correlated subplan filter on a table scan:
```
┌───────────────────────────────────────────────┐
│                  QUERY PLAN                   │
├───────────────────────────────────────────────┤
│ Sort                                          │
│   Sort Key: a.x, a.y                          │
│   ->  Seq Scan on test a                      │
│         Filter: (SubPlan 1)                   │
│         SubPlan 1                             │
│           ->  HashAggregate                   │
│                 Group Key: b.x                │
│                 ->  Append                    │
│                       ->  Seq Scan on test b  │
│                       ->  Seq Scan on test c  │
│                             Filter: (a.x = x) │
└───────────────────────────────────────────────┘
```

The fix modifies the queries causing the test failures so that an ANY
subquery is not folded to a join, preserving the expected output of the
tests. A similar approach was taken for existing regress tests in the
[postgres commit](https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=9f1337639).
See the `join` regress test, for example.

We also add pg17 specific tests that leverage this improvement in Postgres
with Citus distributed planning as well.
m3hm3t pushed a commit that referenced this pull request Nov 28, 2024
Preserve the test error message by adjusting the query so that PG17
cannot pull it up to a join. Another instance of a subquery that can be
pulled up to a join with PG17 (#7745)

This should have been fixed in, but slipped by, #7745
Development

Successfully merging this pull request may close these issues.

📘Fix missing ERROR: cannot push down this subquery in set_operations and subquery_in_where
2 participants