Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Quietly prepare unprepared queries in batches #812

Merged
merged 2 commits into from
Sep 27, 2023

Conversation

sylwiaszunejko
Copy link
Contributor

@sylwiaszunejko sylwiaszunejko commented Sep 18, 2023

Motivation

See the referenced issue (#800)

What's done

Information about the types of the bind markers is necessary to make the serialization API safe (i.e. so that the user data won't be misinterpreted when sending it to the database). The problem with unprepared statements in batch is that the driver doesn't have a good way to determine column names and types of the bind markers.

Now before sending a batch, every unprepared statement is quietly prepared on one connection to obtain the information about the bind markers. If the list of values to be bound to the statement is empty, we just send it as an unprepared query.

Fixes: #800

Pre-review checklist

  • I have split my patch into logically separate commits.
  • All commit messages clearly explain what they change and why.
  • I added relevant tests for new features and bug fixes.
  • All commits compile, pass static checks and pass test.
  • PR description sums up the changes and reasons why they should be introduced.
  • I have provided docstrings for the public items that I want to introduce.
  • I have adjusted the documentation in ./docs/source/.
  • I added appropriate Fixes: annotations to PR description.

@piodul
Copy link
Collaborator

piodul commented Sep 20, 2023

Please change the PR title to mention that it's about unprepared queries in batches. Currently, it's too similar to the previous PR's title IMO (#806).

let mut batch: Batch = Default::default();
batch.config = init_batch.config.clone();

let mut to_prepare = HashSet::<usize>::new();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why use a HashSet to store the index? Every index you are going to insert will be unique, so it using just Vec would be simpler.

Besides that, using a HashSet would make more sense if you stored query strings in it - this way if somebody wants to execute a single unprepared multiple times with different parameters, it will only have to be prepared once. I think we should follow this approach.

Copy link
Contributor Author

@sylwiaszunejko sylwiaszunejko Sep 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I replaced HashSet with Vec. I tried to store query strings, but I would still need to know the indexes to maintain the right order of the statements in batch

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the code looks more complicated than before, actually. The vec_id variable and comparing indexes looks tricky...

I replaced HashSet with Vec. I tried to store query strings, but I would still need to know the indexes to maintain the right order of the statements in batch

You can first create a hashset with all needed strings. Then, you can prepare the statements and put them in a hashmap, indexed by statement string. Finally, when you re-create the batch, when you iterate and encounter an unprepared statement/query in the original batch, look up the prepared statement in the hashmap.

values: impl BatchValues,
consistency: Consistency,
serial_consistency: Option<SerialConsistency>,
) -> Result<QueryResult, QueryError> {
let batch = {
let mut batch: Batch = Default::default();
batch.config = init_batch.config.clone();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there are no statements to prepare, can we avoid creating a new batch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added checking if to_prepare is empty

{
let mut value_iter = values.batch_values_iter();
for (id, stmt) in init_batch.statements.iter().enumerate() {
let value = value_iter.next_serialized().transpose()?;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Serializing values for a query can be relatively expensive, so it would be best to avoid it. Let's use skip_next instead of next_serialized in case of BatchStatement::PreparedStatement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@sylwiaszunejko sylwiaszunejko changed the title Quietly prepare unprepared queries Quietly prepare unprepared queries in batches Sep 21, 2023
let mut batch: Batch = Default::default();

if to_prepare.is_empty() {
batch = init_batch.clone();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather avoid cloning anything if we can use the init_batch directly. Can we use Cow for this? If we can use init_batch, assign Cow::Borrowed(init_batch) to batch. Otherwise, construct a new one and assign Cow::Owned(new_batch) to batch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

let mut batch: Batch = Default::default();
batch.config = init_batch.config.clone();

let mut to_prepare = HashSet::<usize>::new();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the code looks more complicated than before, actually. The vec_id variable and comparing indexes looks tricky...

I replaced HashSet with Vec. I tried to store query strings, but I would still need to know the indexes to maintain the right order of the statements in batch

You can first create a hashset with all needed strings. Then, you can prepare the statements and put them in a hashmap, indexed by statement string. Finally, when you re-create the batch, when you iterate and encounter an unprepared statement/query in the original batch, look up the prepared statement in the hashmap.

}

batch
};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finally, let's see if we can extract all of the added logic here to a separate method/function. It's getting quite long.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried, but to extract whole logic I have to pass values as an argument and that's not obvious, because I cannot clone/copy it, and it is used by the rest of the function so I moving is not possible. I am not sure if it is better to extract only part of the logic or to not extract it at all

Copy link
Collaborator

@piodul piodul Sep 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need to copy or clone values, you only need to pass it by reference. As I can see, it's only needed for .batch_values_iter() which takes values by reference - so you can pass values by reference to that function.

I imagine that the final method will look like this (I hope I didn't mess up lifetimes):

async fn prepare_batch<'b>(
    &self,
    batch: &'b Batch,
    values_iter: impl BatchValuesIterator<'_>, // or: `values: &impl BatchValues,`
) -> Cow<'b, Batch> {
    // ...
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a few errors and I have to change it a little bit to:

 async fn prepare_batch<'b>(
        &self,
        init_batch: &'b Batch,
        mut values_iter: impl BatchValuesIterator<'_>,
    ) -> Result<Cow<'b, Batch>, QueryError> {
    //...
}

but it still does not build, there is a problem with lifetimes in session_test.rs:408 (implementation of std::marker::Send is not general enough
|
= note: std::marker::Send would have to be implemented for the type &'0 str, for any lifetime '0...
= note: ...but std::marker::Send is actually implemented for the type &'1 str, for some specific lifetime '1).
I am not sure how to fix it

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Please try passing &values instead. For some reason, the BatchValuesIterator cannot be held across await points, so I suppose there might be issues when trying to pass it to async functions.

@sylwiaszunejko sylwiaszunejko force-pushed the prepare_batch branch 4 times, most recently from 3c58014 to a623af9 Compare September 25, 2023 18:04
Copy link
Collaborator

@piodul piodul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some final nitpicks with regards to the implementation. What about tests?

I'd like to see some simple test that sends a batch, queries the table and sees whether the batch modified the table as expected. No need to involve the proxy. I'd like to see several test cases that exercise different paths in the prepare_batch code:

  • Batch only contains prepared statements,
  • Batch only contains unprepared statements (queries) without values,
  • Batch only contains unprepared statements with values,
  • Batch that contains a combination of the above

init_batch: &'b Batch,
values: impl BatchValues,
) -> Result<Cow<'b, Batch>, QueryError> {
let mut to_prepare = HashSet::<String>::new();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: you can probably use &str as the key type. The strings will be borrowed from the init_batch object which is alive for the whole duration of this function and we don't modify it, so lifetimes should check out, This will save you an allocation when inserting to the hashset.

Comment on lines 855 to 872
let mut batch: Cow<Batch>;

if to_prepare.is_empty() {
batch = Cow::Borrowed(init_batch);
} else {
batch = Cow::Owned(Default::default());
batch.to_mut().config = init_batch.config.clone();
for stmt in &init_batch.statements {
match stmt {
BatchStatement::Query(query) => match prepared_queries.get(&query.contents) {
Some(prepared) => batch.to_mut().append_statement(prepared.clone()),
None => batch.to_mut().append_statement(query.clone()),
},
BatchStatement::PreparedStatement(prepared) => {
batch.to_mut().append_statement(prepared.clone());
}
}
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Rust, ifs are expressions and evaluate to a value - you can use that to assign to batch immediately.

Moreover, in the else part, I suggest creating a value of type Batch, initializing it and then assigning it to Cow<Batch>. This will allow to avoid calling batch.to_mut() in multiple places.

let batch = if to_prepare.is_empty() {
    Cow::Borrowed(init_batch)
} else {
    let mut batch = Batch::default();
    // ...
    Cow::Owned(batch)
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though, if you implement an early return like I suggested before, here you will be able to drop the if altogether and only leave the code from the else branch.

}
}

let mut prepared_queries = HashMap::<String, PreparedStatement>::new();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one can probably take a &str, too.

scylla/src/transport/connection.rs Show resolved Hide resolved
Before sending an unprepared statement in batch, quietly prepare it on one connection to obtain the information about the
bind markers. If the list of values to be bound is empty, we
just send it as an unprepared query.
@piodul piodul merged commit 07b262c into scylladb:main Sep 27, 2023
8 checks passed
piodul added a commit that referenced this pull request Oct 5, 2023
This reverts commit 07b262c, reversing
changes made to 1854c96.
piodul added a commit that referenced this pull request Oct 5, 2023
This reverts commit 07b262c, reversing
changes made to 1854c96.
piodul added a commit that referenced this pull request Oct 6, 2023
This reverts commit 07b262c, reversing
changes made to 1854c96.
@Lorak-mmk Lorak-mmk mentioned this pull request Dec 14, 2023
8 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Serialization refactor: obtain information about column types when doing Session::execute
2 participants