Issue with distance join scaleup performance... #91

Open
aocalderon opened this issue Dec 19, 2017 · 1 comment

Comments

@aocalderon

Hello there,

I have a question about the performance of distance join when we increase the number of available cores together with the size of the workload (the size of the datasets to join). I have a group of point datasets (namely 'points' and 'centers') which grow in size with each run. For example, points0.txt has ~20000 points, points1.txt has ~40000, points2.txt has ~60000 and points3.txt has ~80000. The centers datasets grow similarly, ranging from 47663 (centers0.txt) to 190655 (centers3.txt, four times centers0.txt). I then perform a distance join between the corresponding points and centers datasets, increasing the number of cores proportionally for each run: first a distance join between points0 and centers0 using N cores, then between points1 and centers1 using 2N cores, then between points2 and centers2 using 3N cores, and so on.

What we expect is for the execution time of each distance join to be quite similar. However, you can see in [1] that performance improves considerably as the number of cores increases. That sounds very good, but we are wondering why...
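
One way to quantify this expectation is a scaleup ratio: the time of the base run divided by the time of each larger run. The sketch below is only illustrative and is not part of the sample code in [2]; the function name `scaleup_ratios` and its input format are assumptions. A ratio close to 1.0 means the time stayed flat as expected, while a ratio above 1.0 means the larger run finished faster than the base run.

```python
def scaleup_ratios(times):
    """Return T_base / T_i for each run.

    Assumption: times[i] is the wall-clock time (any consistent unit) of the
    run that uses (i + 1) * N cores on (i + 1) times the base workload, so
    times[0] is the base run (points0 joined with centers0 on N cores).
    """
    if not times:
        raise ValueError("need at least one timing")
    if any(t <= 0 for t in times):
        raise ValueError("timings must be positive")
    base = times[0]
    # Ideal scaleup gives 1.0 for every run; values above 1.0 mean the
    # larger run was faster than the base run.
    return [base / t for t in times]
```

Passing the measured times for the four runs, in order, yields one ratio per run that can be compared directly against the ideal value of 1.0.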

We have prepared a sample code you can download from [2]. It has the source code and random datasets and also some scripts to run a similar experiment and plot the data.

I wonder if we are missing a particular parameter or special setting for running distance join in Simba. Any help or feedback would be much appreciated.

Kind regards,
Andres

[1] http://www.cs.ucr.edu/~acald013/public/simba/Centers2Points.pdf
[2] http://www.cs.ucr.edu/~acald013/public/simba/RandomTester.tar.gz

@dongx-psu
Member

dongx-psu commented Dec 19, 2017 via email
