-
Notifications
You must be signed in to change notification settings - Fork 75
Doing a query for a range of a float64 field is unacceptably slow #70
Comments
The query optimizer doesn't yet step in for this case so the query makes a full table scan. Please try (untested): SELECT * FROM
(SELECT * FROM Nodes WHERE Lon > 151)
WHERE Lon < 151.0001 This may help, but only a little. I'd suggest instead doing (untested): rs, _, err := db.Run(nil, "SELECT * FROM Nodes WHERE Lon > 151")
if err != nil {
...
}
if err := rs[0].Do(false, func(data []interface) (bool, error) {
if data[LonColNum].(float64) >= 151.0001 {
return false, nil
}
// process data, Lon is in (151, 151.0001)
return true, nil
}); err != nil {
...
} Using indexes, this should materialize only the rows of the interval and I expect it to be much faster then the inner select hack as the outer select will still have to do a linear scan. #TODO(optimizer) |
The problem with doing the last code snippet is that in the average case it generates a temporary results set of 500K nodes (half the db), which still have to be iterated through.
Just counting the rows returned from the first part of that query took over 30 seconds. Thats to say nothing of the Do(). |
The I promise it will create exactly those rows which have Lon in (151, 151.0001), assuming Lon has an existing index. The first But in the discussed code, the closure passed to the .Do method returns false as soon as Lon >= 151.001. As the physical index is ascending, no more rows are ever read from the DB, only those in the requested interval. |
Impressive :) Is there a way to use |
Unfortunately no, there's no way how to do this using |
Dayum. I guess ill either have to wait for the optimizer or use the normal interface to ql for the server application. Can you recommend any sneaky code for doing the interval select across two fields? IE: |
I have no good idea how to do this as efficiently as the correct use of the indexes can. It really needs the query optimizer to do this work automatically. If you can afford to wait few days or a week or so then maybe I can add this feature in that time. But I have virtually no free time at the moment. |
I have no problem waiting :) Thanks |
Hey man, I was going to try and tackle this as uni break just started. But then I tried to have a look through your code and was like welp haha Perhaps you can give me some pointers? |
Well, where to start from? Perhaps please tell me which specific place/functionality/... you want a pointer to and I'll attempt to give you some ;-) |
Okies, well perhaps a good place to start would be detecting the From the looks for things in stmt.go, all the SQL is compiled into a AST. So what I was thinking of doing was adding more code to selectstmt.exec0 to look into each WHERE clause and detect when someone is doing a > or < for the same field twice? What do you think? |
That place handles certain cross join optimization (
Actually Top level "optimizer" is here. That's the place where multiple relations on the same indexed field should probably be detected and dispatched to an optimize-attempt function (something like this). Currently the optimizer looks only for single columns with indices and a single relational operation on that (indexed) column. The addition would be in
I think the above might get you started. If not, please feel free to suggest/ask more. |
Awesome, that should be enough for me to play around and start looking at how things are operating. Ill fork and start playing when I wake up in the morning (today!) What do you think of re-writing the other way round - changing an expression like |
Ah yes, between is rewritten to < and > thats why |
Specifically, |
What is lhs? Is it the table/column name? and where is the mysterious expand1 function? What do you keep in ctx and arg? |
The current Evaluation
|
Okay, starting to get my head around it. Are the temporary result sets being returned via those callback passed in parameters? Plan is to copy paste the |
The rows produced are returned by the callback ( I'll now create the newBetween to normalize
Maybe the new code will go inside this fn. I cannot find quickly where it's checked the bin op is a simple 'field op expr'. There we need to divert for the case 'field op1 expr1 && field op1 expr2', check the field has index and that the op1 is in |
Awesome, Ill start playing around with the code in that fn so I understand On Sat, Sep 20, 2014 at 6:09 PM, cznic [email protected] wrote:
|
On Sat, Sep 20, 2014 at 10:16 AM, Twitch [email protected] wrote:
Correct. |
Okay so to start off im going to work on the test case:
When the query
Where the query
So now I am going to look at detecting the case we need to optimize in tryUseIndex() |
Yep. BTW: In your example, |
|
Ah yes, you are right.
I just spent the last half hour trying to come up with a recursive function to scan the whole parse tree and re-arrange and detect all the things, my train of thought has died so I think thats a bit too ambitious. I'm going to try a spaghetti of switches now. Fork rebased. |
what does expr.isStatic() do? If it does what I think it does, it might be a handy shortcut for me. |
|
Added some comments at your commit. |
Awesome, that simplifies things. I think I have written code that detects the case in my latest commit. Please have a look. operand1 is the minimum number, operand2 is the max. op1 and op2 are stored so we know if it is ge/le or </>. |
Nice work. Small issue found:
|
It's more complicated then I wrote above. *: Because its equal to |
Not to worry, ive almost finished writing that conditional. Dodgey, but:
|
collate1 does that. Note that it panics when it cannot collate b/c of incompatible types. Before calling An |
Ehm, |
I havent used collate1, but have a look at my latest commit. Moar bugfixes and features. Havent tested parameters but they should work. |
Actually, the easiest way to collate two values is probably like this: // op eg '<', ge, ...
x, err := newBinaryOperation(op, val1, val2) // does ALL necessary static checks
if err != nil {
// fail
}
v, err := x.eval(ctx, arg) // Does ALL necessary dynamic checks
if err != nil {
// fail
}
if v.val.(bool) {
// val1 op val2 == true
} Normally it would be too costly, but here we are doing it only once or twice as a preparation step.
Will do. |
If I pass nil, ctx.arg will that fix it or am I missing something? |
Yes, that should be okay. Otherwise I think I've finished reviewing your last commit, please see the inline comments. IMO you're getting pretty close to the "real" thing! |
Awesome. I'm going to get started on the method which will do the heavy lifting. Any preferences for how things are done? I was just going to follow the design of tryBinOpID() but without the ID crap. |
I suggest to get inspiration from tryBinOp instead. The ID version is derived from it and the ID version handles only one integer type and does not handle nulls (no row |
Hectic. I'm gonna go to bed and hopefully get on to it tomorrow. Feel free to change whatever you want in the meantime. |
I was going to work on this, except I met my ex randomly and we got along way too well and ended up out for dinner. TL;DR my brain is frazzled, cant work on this tonight :/ sorry |
No need to say sorry, you're your only boss in charge of this task ;-) |
This is my attempt at storing and using open street map data for my city, btw. There are over a million nodes. I want the following query to be as fast as possible (<150ms when the db is in RAM) because there are other queries to do nessesary to render a tile.
If I have a table:
And I run the query:
(I will add a Latitude range conditional too when I get this working well).
It takes over a minute to complete, only returning 40 or so rows (out of 1 million). My guess is that ql is generating two sets of data for each WHERE clause and trying to union them, which is taking a while.
Is there a better way I can write my query? or perhaps I am missing an index of some kind? or maybe something that can be done with the backend to speed it up for this kind of thing? (like a RANGE keyword/operator)
Thanks,
Tom
The text was updated successfully, but these errors were encountered: