Provide an example summingbird-scalding job #474

johnynek · 2014-03-10T17:37:42Z

Show how to launch the whole job on a Hadoop cluster.

Take a look at:

https://github.com/twitter/summingbird/blob/develop/summingbird-scalding/src/main/scala/com/twitter/summingbird/scalding/Executor.scala

Which will need one of these:
https://github.com/twitter/summingbird/blob/develop/summingbird-batch-hadoop/src/main/scala/com/twitter/summingbird/batch/state/HDFSState.scala

Here is an example of a configured job for Storm:
https://github.com/twitter/summingbird/blob/develop/summingbird-example/src/main/scala/com/twitter/summingbird/example/StormRunner.scala

jcoveney · 2014-03-10T19:39:16Z

What would we write to? We don't have a good open source batch KV store. We could try HBase or MySQL (the latter more for illustrative purposes)?

caniszczyk · 2014-03-10T19:40:44Z

Leaning towards something that Travis CI supports out of the box as a service would help with tests:
http://docs.travis-ci.com/user/database-setup/

ianoc · 2014-03-10T20:00:12Z

We can run the scalding job without having the MySQL or HBase portion
available. For a more complete environment it would be good to have it, but
our VersionedBatchedStore only needs an HDFS file system. It should be
relatively straightforward if we have a source of input data.

A full example with a client, HBase (or MySQL), and everything would be
interesting. The difficulty with that is mostly around getting a good
data source?


ghost · 2014-05-02T16:38:24Z

Would an AWS machine image with a full setup perhaps make sense? Access to both a real-time and an offline data source would of course be needed. Twitter's filtered API seems like a good fit: https://dev.twitter.com/docs/api/1.1/post/statuses/filter

Is anyone working on this issue yet?


johnynek added the scalding label Mar 10, 2014

snoble pushed a commit to snoble/summingbird that referenced this issue Sep 8, 2017

Merge pull request twitter#474 from twitter/ianoc/qTreeBenchmarkMoreC…

0a30c0b

…overage Ianoc/q tree benchmark more coverage
