Hi,
I'm using the UR (v0.2.3) template and am having trouble with scalability.
Training takes 18 hours (every day), and for the last 12 hours it uses only one core.
As far as I can see, URAlgorithm.scala (line 144) calls SimilarityAnalysis.cooccurrencesIDSs
with data.actions (12 partitions).
Up to the reduceByKey in AtB.scala it executes in parallel,
but after that it runs in a single thread.
Strangely, when SimilarityAnalysis.scala (line 145) calls
indexedDatasets(0).create(drm, indexedDatasets(0).columnIDs, indexedDatasets(i).columnIDs)
it returns an IndexedDataset with only one partition.
In SimilarityAnalysis.scala (line 63) I see
drmARaw.par(auto = true)
Maybe this is what reduces the number of partitions.
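To check this kind of hypothesis outside the template, a minimal Spark-only sketch like the one below could help. It is not taken from UR or Mahout: the object name `PartitionCheck`, the synthetic data, and the `local[4]` master are all assumptions for illustration. The `coalesce(1)` call only stands in for whatever step collapses the partitions. The sketch shows how to read the partition count at each stage and how an explicit `repartition` restores parallelism.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical stand-alone check, not part of UR or Mahout.
object PartitionCheck {

  /** Returns the partition counts as (input, collapsed, restored). */
  def partitionCounts(): (Int, Int, Int) = {
    val conf = new SparkConf().setAppName("partition-check").setMaster("local[4]")
    val sc = new SparkContext(conf)
    try {
      // Stand-in for data.actions: key/value pairs spread over 12 partitions.
      val actions = sc.parallelize(1 to 1000, numSlices = 12).map(i => (i % 100, 1))

      // Simulate a step that leaves everything in a single partition.
      val collapsed = actions.reduceByKey(_ + _).coalesce(1)

      // Explicitly spread the data over 12 partitions again.
      val restored = collapsed.repartition(12)

      (actions.partitions.length, collapsed.partitions.length, restored.partitions.length)
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (input, collapsed, restored) = partitionCounts()
    println(s"input=$input collapsed=$collapsed restored=$restored")
  }
}
```

If the DRM returned by the cooccurrence step shows a single partition in the same way, that would support the idea that the `par` call (or something near it) is where the parallelism is lost. Whether Mahout's `par` accepts an explicit minimum or exact partition count in this version is something I have not verified.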
In the master branch of Mahout, SimilarityAnalysis has ParOpts:
https://github.com/apache/mahout/blob/master/math-scala/src/main/scala/org/apache/mahout/math/cf/SimilarityAnalysis.scala#L142
Maybe this can fix the problem.
So, am I right about the root cause, and how can I fix it?