Improve Scio on Jupyter #226
base: main
* Upgrade to Scio 0.6.1.
* Add an API to easily create multiple Scio contexts with the same settings.
* Add some helper functions to make Scio easier to use.
Hey @alexarchambault, would you kindly review this PR?
Hey @anish749, sorry for the delay. I can merge, then make a release, if you have an immediate use for this. But most development now happens on the … For Scio, I guess a project similar to ammonite-spark could be written, adding Scio support to Ammonite. It could then be used as-is from the upcoming version of jupyter-scala. I was thinking of maybe renaming ammonite-spark to something like ammonite-bigdata and adding support for Scio to it, among other things. Would that be useful for you?
Hey @alexarchambault, I think the dev branch is quite far ahead of master. What is the plan for merging and releasing the next version? If that is going to take a while, it would be great to have v0.4.3. If the idea is to separate out the … The problem with Beam / Scio is that it is not very well suited to interactive analysis at the moment, which narrows its use cases in Jupyter. There were times when I really felt the need for a notebook-based environment, and hence started experimenting with this. I was also wondering about support for Almond in a Docker image. I was testing this locally in a Docker image, which makes collaborative development easier. I was thinking of adding this to https://github.com/jupyter/docker-stacks as well. What do you think?
@anish749 Sure, go for it for https://github.com/jupyter/docker-stacks, and don't hesitate to ping me there for feedback. FYI, @Atry already wrote and pushed a Docker image for almond, see #214 (comment) (but it hasn't been added to docker-stacks).
@anish749 You can find those Docker images at https://hub.docker.com/r/popatry/almond-images/
I tried to make this more usable.
Since this had been lying unused for almost two years, I took the liberty of breaking some of the APIs.
TL;DR:
This primarily adds the ability to easily close and recreate Scio contexts with the same pipeline options and run them on Dataflow or other runners. It still doesn't make things very interactive, since outputs are still not in memory and it takes a few minutes to start a Dataflow job.
This PR does, however, make it easy to iteratively develop batch pipelines in Scio / Beam.
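A minimal sketch of the "same settings, fresh context" idea, written in plain Scala so it runs without Scio on the classpath. `FakeContext`, `ContextFactory`, and `newContext` are hypothetical names, not the API added in this PR; in real Scio code the captured settings would be Beam `PipelineOptions` and each context a `ScioContext` built from them. The option values are placeholders.

```scala
// Hypothetical stand-in for a pipeline context such as Scio's ScioContext.
// It only records the options it was created with and whether it is closed.
final class FakeContext(val options: Map[String, String]) {
  private var closed = false
  def isClosed: Boolean = closed
  def close(): Unit = {
    require(!closed, "context already closed")
    closed = true
  }
}

// Hypothetical helper: captures the pipeline options once and hands out a
// fresh context on demand, closing the previous one first so a notebook
// cell can be re-run without rebuilding the options by hand.
final class ContextFactory(options: Map[String, String]) {
  private var current: Option[FakeContext] = None

  def newContext(): FakeContext = {
    // Close the previously issued context, if it is still open.
    current.filterNot(_.isClosed).foreach(_.close())
    val ctx = new FakeContext(options)
    current = Some(ctx)
    ctx
  }
}

// Placeholder settings; "my-project" is a hypothetical project id.
val factory = new ContextFactory(Map("runner" -> "DataflowRunner", "project" -> "my-project"))
val first = factory.newContext()
val second = factory.newContext()
```

Closing the previous context before handing out a new one is a design choice of this sketch, so that re-running a notebook cell never leaves two live contexts with the same settings around.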
We can also use Taps to temporarily materialize an SCollection to the staging bucket and read the data back from there, which makes analysis somewhat easier.
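In Scio, `materialize` on an `SCollection` writes the data to temporary storage and returns a tap whose contents can later be read with `value`. The sketch below imitates that round trip with local temp files standing in for the staging bucket. `FileTap` and this `materialize` function are hypothetical, plain-Scala stand-ins, not Scio's implementation, and they assume each element is a single line of text with no embedded newlines.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

// Hypothetical, simplified stand-in for a Scio Tap: a handle to data that has
// already been written to storage and can be read back (repeatedly) later.
final class FileTap(path: Path) {
  def value: Iterator[String] = {
    val text = new String(Files.readAllBytes(path), StandardCharsets.UTF_8)
    // Every element was written with a trailing "\n", so the last split
    // fragment is always empty and is dropped.
    text.split("\n", -1).dropRight(1).iterator
  }
}

// Hypothetical helper mirroring the idea of materializing an intermediate
// collection: write each element as one line to a temp file (standing in for
// the staging bucket) and return a tap pointing at it.
// Assumes elements contain no newline characters.
def materialize(elements: Seq[String], stagingDir: Path): FileTap = {
  val out = Files.createTempFile(stagingDir, "materialized-", ".txt")
  Files.write(out, elements.map(_ + "\n").mkString.getBytes(StandardCharsets.UTF_8))
  new FileTap(out)
}

val staging = Files.createTempDirectory("staging-")
val tap = materialize(Seq("alice,3", "bob,5"), staging)
val readBack = tap.value.toList
```

Because the tap only holds a path, the intermediate result can be inspected as many times as needed without re-running the pipeline that produced it.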