Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Temporary tables #2962

Merged
merged 18 commits into from
Nov 12, 2024
Merged

Temporary tables #2962

merged 18 commits into from
Nov 12, 2024

Conversation

hatyo
Copy link
Contributor

@hatyo hatyo commented Oct 30, 2024

This introduces temporary tables planning and execution. A temporary table is an in-memory buffer that holds a list of QueryResult objects.

They offer means for storing temporary computation results, and can be leveraged to serve as SQL temporary tables in the future. The following APIs are introduced:

  1. TempTable, this wraps a Queue of QueryResult objects, it is mutable, and in its current version, not thread-safe.
  2. Two logical expressions are introduced to model table queue source / sink:
    a. TableValuedCorrelationScanExpression enables scanning from a table-valued correlation, such as a TempTable.
    b. TempTableInsertExpression enables inserting into a TempTable .
  3. Two physical operators are introduced that correspond to an implementation of the above expressions; TableValuedCorrelationScanPlan and TempTableInsertPlan.
  4. Expression implementation rules.

Few tests are added to illustrate the planning capabilities and execution correctness over temporary tables.

Few extra details:

  • Added Ser/De logic for QueryResult.
  • The TempTable can currently serialize its state, i.e. the underlying queue elements.

Few examples:

Inserting into a temp table:

insert into t2 values ((1, 'foo'), (2, 'bar'))

the logical plan looks like this:
image

while the physical plan looks like this:

image

Selecting from a temp table: the following is a logical plan representing a join between two named temp tables:

select * from t1, t2
image

and the physical NLJ implementation of the join:
image

This solves #2961

hatyo added 6 commits October 22, 2024 20:43
- rework continuation semantics and add few continuation tests.
- new implementation of RecursiveUnorderedUnionCursor, incomplete, requires testing.
- new implementation of RecordQueryRecursiveUnorderedUnionPlan, incomplete, requires testing.
- new implementation of DoubleBufferCursor, requires more testing.
@hatyo hatyo added the DO NOT MERGE do not merge label Oct 30, 2024
@hatyo hatyo requested review from normen662 and MMcM October 31, 2024 17:53
@hatyo hatyo changed the title WIP - Table queues Table queues Oct 31, 2024
@hatyo hatyo marked this pull request as ready for review October 31, 2024 17:53
@hatyo hatyo removed the DO NOT MERGE do not merge label Nov 1, 2024
@FoundationDB FoundationDB deleted a comment from foundationdb-ci Nov 1, 2024
@@ -44,17 +44,18 @@ public class Bindings {
/**
* Bindings slots used internally by plan operators.
*/
public enum Internal {
public enum BindingType {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that Internal is too short for what is now a fairly wide variety of special bindings. But BindingType suggests to me that all bindings would be of one of these types, when these are still just the special ones and normally named parameters aren't one of these types. Also, the serialized enum is PBindingKind. It seems that perhaps a longer consistent name is needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only bindings that do not have an Internal are named bindings coming from the heuristic planner. I think everything in the cascades planner actually has a corresponding Internal (mostly CORRELATION or CONSTANT so far).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the newly added enum value TABLE_QUEUE effectively reverting the change, but I renamed Internal to BindingKind (not BindingType) to align with the corresponding PB definition in record_query_plan.proto:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although one might prefer to build a ConstantObjectValue and/or CONSTANT binding, from some higher level query data structure, it works today to call CascadesPlanner with a RecordQuery containing a ParameterComparison and run that plan with an EvaluationContext with a Bindings with a corresponding named parameter. In other words, an external binding kind. These untyped / neutral bindings are not specific to the heuristic planner.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reverted the name back to how it was before. It looks like introducing an External binding could cause wider code changes that are outside the scope of this PR. Let's discuss this as a follow up item.

@SuppressWarnings("unchecked")
@Nonnull
public RecordCursor<QueryResult> getReadCursor(@Nullable byte[] continuation) {
return new ListCursor<>((List<QueryResult>)getReadBuffer(), continuation);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because this exhausts, it does not work to have one part of the plan filling and another part emptying. One must, I think, do all the adds and then getReadCursor to process the queue. Does that mean that the table queue itself ought to have a mode for which direction it is currently handling and callers must set that explicitly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is intended to be used such that there are no interleaving reads and writes, to address that, the consumer must make sure that these accesses are synchronised correctly e.g. by means of read-write lock, otherwise it would might break any expected continuation semantics imposed by the consumer.

@hatyo hatyo changed the title Table queues Temporary tables Nov 7, 2024
@hatyo hatyo force-pushed the table-queues branch 2 times, most recently from 03e4c8d to ff3e1ce Compare November 7, 2024 12:53
- Massive renaming, now we have `TempTableScanExpression`, `TempTableInsertExpression`, respective `TempTableInsertPlan`, and `TempTableScanPlan`, PB names are also consistent with that.
- Moved `PEnumLightValue`, `PUUID`, `PFDBRecordVersion`, `PComparableObject`, `PQueryResult`, and `PTempTable` to `record_metadata.proto` since these are runtime objects.
Removed the name from `TempTable`.
Use `Value` to “refer” to a `TempTable` in the `EvaluationContext` instead of a plain `CorrelationIdentifier`, in all `TempTableScanExpression`, `TempTableInsertExpression`, respective `TempTableInsertPlan`, and `TempTableScanPlan`. Made the expressions’ constructors private and added two factory methods that enable the customer to create an expression using a constant binding, or a correlated binding.
- Fixed the `translateCorrelations` implementations in `UpdateExpression`, `InsertExpression`, and `DeleteExpression`. Now it rebuilds the expression using the correct translated inner.
@FoundationDB FoundationDB deleted a comment from foundationdb-ci Nov 8, 2024
@FoundationDB FoundationDB deleted a comment from foundationdb-ci Nov 8, 2024
@FoundationDB FoundationDB deleted a comment from foundationdb-ci Nov 8, 2024
@FoundationDB FoundationDB deleted a comment from foundationdb-ci Nov 8, 2024
@FoundationDB FoundationDB deleted a comment from foundationdb-ci Nov 8, 2024
@FoundationDB FoundationDB deleted a comment from foundationdb-ci Nov 8, 2024
@FoundationDB FoundationDB deleted a comment from foundationdb-ci Nov 12, 2024
@FoundationDB FoundationDB deleted a comment from foundationdb-ci Nov 12, 2024
@FoundationDB FoundationDB deleted a comment from foundationdb-ci Nov 12, 2024
@FoundationDB FoundationDB deleted a comment from foundationdb-ci Nov 12, 2024
@FoundationDB FoundationDB deleted a comment from foundationdb-ci Nov 12, 2024
@foundationdb-ci
Copy link
Contributor

Result of fdb-record-layer-pr on Linux CentOS 7

  • Commit ID: 97337ec
  • Duration 0:52:40
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@FoundationDB FoundationDB deleted a comment from foundationdb-ci Nov 12, 2024
@MMcM MMcM merged commit 3e84644 into FoundationDB:main Nov 12, 2024
1 check passed
@hatyo hatyo deleted the table-queues branch November 13, 2024 10:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants