groupAll takes a HUGE amount of time #29

pferrel · 2016-11-19T20:51:50Z

Running on a large cluster with medium-sized data (100 MB), this stage takes 9.2 hours, by far the longest phase. Any ideas, @laser13 @alexice? This is not very large data, and it is running on 4 r3.4xlarge AWS instances.

pferrel · 2016-11-19T21:01:55Z

Here is the old implementation. Should I try putting this back in?

template-scala-parallel-universal-recommendation/src/main/scala/URModel.scala

Line 162 in 816275b

    
    def groupAll( fields: Seq[RDD[(String, (Map[String, Any]))]]): RDD[(String, (Map[String, Any]))] = {

alexice · 2016-11-19T21:47:31Z

Yes, it would be good to compare the total time and the stage time with the previous code. This looks weird. Maybe it is because of lazy evaluation, and some other calculations were attributed to this line?
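One way to check the laziness theory is to force the inputs to be computed before groupAll runs, so that upstream work is timed separately. The following is only a rough sketch, assuming the Spark RDD API; `StageTiming`, `timed`, and `materialize` are hypothetical helper names and are not part of the template.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

// Hypothetical helpers for illustration only; not part of the template code.
object StageTiming {

  /** Runs `body`, prints the elapsed wall-clock time, and returns (result, elapsed ms). */
  def timed[A](label: String)(body: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = body
    val elapsedMs = (System.nanoTime() - start) / 1000000L
    println(s"$label took $elapsedMs ms")
    (result, elapsedMs)
  }

  /** Persists each input and forces it with count(), so upstream work runs here. */
  def materialize[T](inputs: Seq[RDD[T]]): Seq[RDD[T]] =
    inputs.map { rdd =>
      val persisted = rdd.persist(StorageLevel.MEMORY_AND_DISK)
      persisted.count() // action: triggers the lazy transformations behind this RDD
      persisted
    }
}
```

With `fields` being the sequence passed to groupAll, something like `StageTiming.timed("inputs")(StageTiming.materialize(fields))` followed by `StageTiming.timed("groupAll") { val g = groupAll(ready); g.count(); g }`, where `ready` is the materialized sequence, should show whether the hours are spent in groupAll itself or in the transformations feeding it. The extra `count()` calls and the persistence add some overhead, so this is for diagnosis rather than production use.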

On Nov 19, 2016, at 23:51, Pat Ferrel wrote:

Running on a large cluster with medium-sized data (100 MB), this stage takes 9.2 hours, by far the longest phase. Any ideas, @laser13 @alexice? This is not very large data, and it is running on 4 r3.4xlarge AWS instances. We are only using popularity, no random or user-defined ranking.


