This repository has been archived by the owner on Nov 2, 2018. It is now read-only.

Presto-odbc with large tables (>300 K rows) #49

Open
nnadtoka opened this issue Sep 30, 2014 · 5 comments

Comments

@nnadtoka

Hi @nezihyigitbasi,

We are seeing a bit of an issue when loading large tables (over 300K-500K rows) into Tableau with the Presto ODBC driver. It becomes very slow and seems to work only with an extract of the data.

Have you seen any issues like that before? What kinds of tables have you tested it on (row counts, number of columns, data types)?

Thanks Nezih.

Best,
Nataliya.

@nezihyigitbasi
Contributor

Hi Nataliya,
We also had similar issues with big tables at Netflix: the driver currently loads the whole result set into memory before returning it to the application/Tableau, which is why it is slow. Please see https://github.com/prestodb/presto-odbc/blob/master/driver/driver.d#L448 for the for loop that exhausts all rows in the returned result set. As you say, the driver is currently more suitable for data extracts.
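
For context, here is a minimal sketch (in Python, not the driver's D code) of what that exhaustion looks like against Presto's HTTP protocol; the host, port, and user are placeholder assumptions:

```python
import requests  # assumption: requests is available

def run_query_eagerly(query, host="http://localhost:8080", user="odbc"):
    # POST the statement; Presto replies with JSON that may contain a page
    # of rows ("data") and a link to the next page ("nextUri").
    resp = requests.post(host + "/v1/statement", data=query,
                         headers={"X-Presto-User": user}).json()
    rows = []
    while True:
        rows.extend(resp.get("data", []))     # every page is kept in memory
        next_uri = resp.get("nextUri")
        if next_uri is None:                  # no more pages: query finished
            return rows
        resp = requests.get(next_uri).json()  # eagerly fetch the next page
```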

Nezih

@yitzchakl

Hi.

In driver.d, how could I change the SQLExecuteImpl method to return the result set to the application in pages (or with similar functionality), rather than only at the end of batch processing?
Is there a way to avoid exhausting all of the result set's rows into memory?

@markisaa
Contributor

@yitzchakl - I just did some checking. This may be possible (though a definitive answer will require yet more investigation).

Notes to a future implementer:

  1. You must not move the call to ExecuteImpl out of SQLPrepare. In particular, it is necessary to check the syntax/validity of the query at the prepare step, as this cannot be done later on. (Presto itself also has no SQLPrepare equivalent or purely syntax-checking query capability.)
  2. The results from Presto are already naturally "paged". We use the HTTP interface, which returns some results along with a link to the "next" batch of results. Presently the code reads all of them up-front, but we only need to read the first page right away to satisfy the error-checking requirement. It may be possible to meet that need while deferring more of the work until it is called for (see the sketch after this list).
  3. This is presently of questionable benefit: because garbage collection is turned off, the memory would not be freed even if we defer reading the input, so it may not buy us anything at all. This is especially true given that Tableau, in my experience, likes to read all of the data up-front anyway. Correctly re-enabling the garbage collector is a highly non-trivial task.
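
Regarding point 2, a rough sketch of what deferring the page reads could look like against Presto's paged REST responses; this is illustrative Python, not the driver's actual API, and the connection details are assumptions:

```python
import requests  # assumption: requests is available

def run_query_lazily(query, host="http://localhost:8080", user="odbc"):
    # Submit the statement; the first response is where a syntax error would
    # show up, which is the part SQLPrepare actually needs right away.
    resp = requests.post(host + "/v1/statement", data=query,
                         headers={"X-Presto-User": user}).json()
    while True:
        yield resp.get("data", [])            # hand back one page at a time
        next_uri = resp.get("nextUri")
        if next_uri is None:                  # no more pages: query finished
            return
        resp = requests.get(next_uri).json()  # fetched only when asked for

# Usage: the fetch calls would drain this generator page by page instead of
# materializing every row before SQLExecute returns.
for page in run_query_lazily("SELECT * FROM some_table"):
    for row in page:
        pass  # convert and hand the row to the application here
```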

@yitzchakl

Ok, thanks @markisaa.

My main goal is to improve the performance of the execute step; right now my testing results are not so good (I'm working with QlikView, not Tableau):
3 minutes for 300,000 rows (14 columns table)
4 minutes for 540,000 rows (14 columns table)
9 minutes (!) for 560,000 rows (31 columns table)

Of course I'm running without the tracing log :-)
Which is the most expensive operation to optimize? The for loop?

Thanks in advance.

@markisaa
Contributor

Sorry to hear that. Performance was definitely not high among the original design considerations; we wanted something that worked, and my internship only lasted so long. The only way to know what's slow would be to benchmark. It would not surprise me if the [mis]use of memory played a role here (though I'm sure there's plenty of code that can simply be optimized anyway).

I'm not in a position to work on this at the moment (swamped with my last term of university), but should that change I will see if I can help on this front. I'll also do some reading on how to properly re-enable the GC.
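
As a starting point for that benchmarking, a hedged sketch that times the execute and fetch phases separately through the DSN (pyodbc, with placeholder DSN and table names); it should show whether the up-front page draining in SQLExecute dominates:

```python
import time
import pyodbc  # assumption: pyodbc is installed and a "Presto" DSN is configured

conn = pyodbc.connect("DSN=Presto", autocommit=True)
cur = conn.cursor()

t0 = time.time()
cur.execute("SELECT * FROM some_table")  # the driver drains every page here
t1 = time.time()
rows = cur.fetchall()                    # should be comparatively cheap today
t2 = time.time()

print(f"execute: {t1 - t0:.1f}s  fetch: {t2 - t1:.1f}s  rows: {len(rows)}")
```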
