Improve Scio on Jupyter #226

anish749 · 2018-09-18T12:28:24Z

I tried to make this more usable.

Since this had been lying unused for almost two years, I took the liberty of breaking some of the APIs.

TL;DR:

Upgrade to Scio 0.6.1.
Add an API to easily create multiple Scio contexts with the same PipelineOptions.
Add some helper functions to make Scio easier to use from Jupyter.

This primarily adds the ability to easily close and recreate Scio contexts with the same pipeline options and run them on Dataflow or other runners. It still doesn't make things very interactive, as outputs are still not in-memory and it takes a few minutes to start a Dataflow job.
This PR however makes it easy to iteratively develop batch pipelines in Scio / Beam.
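
To illustrate the close-and-recreate workflow, here is a minimal sketch. `ContextFactory` and `newContext` are hypothetical names used only for illustration and are not the API this PR adds; the sketch assumes Scio 0.6.x on the classpath and uses the DirectRunner so that it runs locally.

```scala
import com.spotify.scio.ScioContext
import org.apache.beam.sdk.options.{PipelineOptions, PipelineOptionsFactory}

// Hypothetical helper (not the API from this PR): it remembers the
// command-line style arguments and builds a new ScioContext on demand.
class ContextFactory(args: Array[String]) {
  def newContext(): ScioContext = {
    // Re-parse on every call so each context gets its own options instance.
    val opts: PipelineOptions = PipelineOptionsFactory.fromArgs(args: _*).create()
    ScioContext(opts)
  }
}

// Assumption: DirectRunner for a local run; swap in Dataflow flags as needed.
val factory = new ContextFactory(Array("--runner=DirectRunner"))

// First iteration of the pipeline.
val sc1 = factory.newContext()
sc1.parallelize(Seq(1, 2, 3)).map(_ * 2)
sc1.close().waitUntilFinish() // a closed context cannot be reused

// Second iteration: same settings, brand-new context and pipeline.
val sc2 = factory.newContext()
sc2.parallelize(Seq("a", "b")).map(_.toUpperCase)
sc2.close().waitUntilFinish()
```

Each notebook iteration then amounts to asking for a new context, building the pipeline, and closing it.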

Also we can use Taps to temporarily materialize a SCollection to the staging bucket and read the data from there. This makes analysis somewhat easier.
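
A rough sketch of the Tap approach, assuming the Scio 0.6.x behaviour where `materialize` returns a `Future[Tap[T]]` that completes once the job has finished. It uses the DirectRunner, so the temporary files land on local disk rather than in a staging bucket; the values are made up for illustration.

```scala
import com.spotify.scio.ScioContext
import com.spotify.scio.io.Tap
import org.apache.beam.sdk.options.PipelineOptionsFactory
import scala.concurrent.{Await, Future}
import scala.concurrent.duration.Duration

// Assumption: DirectRunner for a local run; on Dataflow the temporary
// output would go to the configured temp/staging location instead.
val opts = PipelineOptionsFactory.fromArgs("--runner=DirectRunner").create()
val sc = ScioContext(opts)

// Ask Scio to write the SCollection to temporary storage.
val futureTap: Future[Tap[Int]] =
  sc.parallelize(1 to 10).map(x => x * x).materialize

// Run the pipeline and wait for it to finish.
sc.close().waitUntilFinish()

// Read the materialized data back into the notebook's memory.
val tap: Tap[Int] = Await.result(futureTap, Duration.Inf)
val squares: List[Int] = tap.value.toList.sorted
```

Element order in the Tap is not guaranteed, hence the `sorted` before inspecting the result.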

* upgrade to scio 0.6.1
* Add api to easily create multiple scio context with same settings.
* Some helper functions to make use of Scio easier.

anish749 · 2018-09-21T17:18:30Z

Hey @alexarchambault, would you kindly review this PR?

alexarchambault · 2018-09-21T22:56:38Z

Hey @anish749, sorry for the delay.

I can merge this and then make a release, if you have an immediate use for it. But most development now happens on the develop branch. It supports Spark via this project, which targets Ammonite; the develop branch just extends it a bit to get some extra Jupyter-specific niceties (progress bars, …).

For scio, I guess a project similar to ammonite-spark could be written, adding scio support to Ammonite. It can then be used as is from the upcoming version of jupyter-scala. I was thinking of maybe renaming ammonite-spark to something like ammonite-bigdata, and add support for scio in it, among other stuff. Is it something that would be useful for you?

anish749 · 2018-09-23T20:56:35Z

Hey @alexarchambault, I think the develop branch is quite far ahead of master. What is the plan for merging it and releasing the next version? If that will take a while, it would be great to have v0.4.3.

If the idea is to separate out the jupyter-scala repo into multiple repos, I feel it might be a good idea to have Scio as ammonite-scio separate from ammonite-spark, given that a majority of users who would plan to use Spark for interactive analysis would not be using Scio at the same time and vice versa.

The problem with Beam / Scio is that it is not very well suited for interactive analysis at the moment, which narrows the use cases in Jupyter. There were times when I really felt the need for a notebook-based environment, and hence started experimenting with this.

I was also wondering about support for Almond in a Docker image. I was testing this locally in a Docker image, which makes collaborative development easier. I was thinking of adding this to https://github.com/jupyter/docker-stacks as well. What do you think?

alexarchambault · 2018-10-05T16:28:20Z

@anish749 Sure, go for it for https://github.com/jupyter/docker-stacks, don't hesitate to ping me there for feedback.

FYI, @Atry already wrote and pushed a docker image for almond, see #214 (comment) (but it's not added to docker-stacks)

Atry · 2018-10-05T16:30:29Z

@anish749 You can find those docker images at https://hub.docker.com/r/popatry/almond-images/

anish749 and others added 2 commits September 18, 2018 14:18

Improve Scio on Jupyter

cc5f451

* upgrade to scio 0.6.1
* Add api to easily create multiple scio context with same settings.
* Some helper functions to make use of Scio easier.

revert back installation script to a non snapshot version

1094ad6
