« Dockerizing » Swift-Colab #16
Comments
For starters, I'm going to lay out the challenges we will face. Swift-Colab is extremely fine-tuned for the Colab environment. I hard-coded concise URLs such as It is definitely possible to bring this to Docker. For starters, I recommend attempting to run the original swift-jupyter on an x86 machine. Then, we should make a separate repository for building the prototype that runs on both Colab and Docker. Finally, merge it into the main repo to form Swift-Colab 3.0. If you read the summary of Swift-Colab's history, this is exactly the same thing I did with swift-colab-dev and Swift-Colab 2.0. I want to repeat the same tactic with the addition of Docker support because there will be so many breaking changes.
|
I was drafting the above message while you were typing yours, so it wasn't a proper response to your questions. |
np. It does help me understand where you were coming from. I don’t believe Docker has any knowledge of any Apple-specific hardware/tech (it’s basically a Linux layer) [side note: this might someday unlock that: https://github.com/KhaosT/MacVM]. Re: Jupyter running in a Mac environment, I think it does run directly on macOS (so no Docker in between), and thus would have access to the whole hardware. Maybe what can be done here (1) as a starting point would be to port swift-colab to a local Jupyter notebook/JupyterLab env (macOS/Linux). I don’t think Docker is relevant unless we look at running swift-colab in a JupyterHub environment (i.e. the equivalent of Google Colab running on your own cloud/hardware). In that case, we would need to have a working swift-colab on Linux (as described in (1) above). PS: the swift-colab name might not be as appropriate when we get there, but that is for later … :) |
Small update: one can indeed package everything in Docker as described in the original swift-jupyter repo. That is equivalent to packaging it for consumption with JupyterHub. I will look at using swift-colab with JupyterLab and then look at packaging it in Docker. |
I think I skimmed over some of your original points. You were suggesting that we could pre-compile Swift packages. That will require a lot of work and long-term decision making as described here, which is why I wanted to drop the plan. As I look through all the backend libraries, they should compile very quickly in a SwiftPM environment. The exception is the CTensorFlow/X10 dependency, but that is already used as a binary. I might even transform it into a "system library" installed through a mechanism like Brew or APT. It will likely be a mutual dependency of S4TF and our matrix library, if my current plans for hardware acceleration don't change. I was debating adding a new magic command for installing X10 on Google Colab. |
The naming seems to be a done deal. I have used the name in so many places and changing it would break source compatibility with some existing Colab notebooks. I don't think we can change it at this point, unless we create an entirely separate repo for Docker support and maintain that repo indefinitely. Temporarily, yes, we could do that. Perhaps whether we merge it or maintain a fork is best left to discussion by the working group. |
For (a) I would think (but could be mistaken) we would just need to iterate over a few toolchain versions in a GitHub Action to generate .tgz packages which can be downloaded (this would also save everyone from recompiling the same thing, even if it is indeed a relatively fast compile). For (b), do you mean the need to make multiple toolchains available? And re: the name, of course I am not suggesting that it should change, as I am sure that it has been used in too many places for a change to be convenient (and it’s your call regardless). It was a tongue-in-cheek remark, as “colab” is somewhat the Google branding of Jupyter. |
(a) AutoDiff is not currently ABI-stable. S4TF is a major planned use case for Colab, and it relies on AutoDiff. That means if the ABI changes, all the old binaries will become useless unless you commit to relying on an outdated toolchain. This gets even worse if you rely on Swift development snapshots, where the ABI is free to unexpectedly change without giving you notice. Luckily, Swift 5.7 is set to become something we can rely on for AutoDiff for a while. We could stick to using the Swift 5.7 release toolchain for a long time, and update our binaries when we move on to Swift 5.8. For quantity of work, we also have to actively maintain the binaries on each repository. If someone stops doing this after a while, we might need a mechanism to override the stale binaries and force it to recompile. (b) I strive to be backward-compatible. As described on the Swift Forums thread, there are a lot of decisions to make. Some may result in magic commands in Jupyter notebooks, or third-party users caching binaries in their own repositories. If I change my mind on a single decision, then I may introduce API or ABI breakage. Also, check a PR to PythonKit about how far I went to preserve API stability. That's not to say we can't deal with ABI breakage; this is experimental. But I feel like you already have a working alternative to all this online binary caching stuff. Compile the library once on your local machine, using the most aggressive optimizations. It might take minutes, but so do things like downloading a Swift development snapshot or making a cup of coffee. The library might update every week or so, at which point you must compile again. And if you want to pull from the master branch instead of a stable release, just compile in debug mode without optimizations! I'm not ruling out the hosting of online binaries, but I want to wait a great deal of time before committing to that. 
We should see whether the current solution has bottlenecks that noticeably affect productivity and warrant changing Swift-Colab's library importing mechanism. |
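To make the stale-binary concern concrete, here is a minimal sketch of how a cached binary could be keyed so that an ABI-relevant change forces a recompile. It is not Swift-Colab's actual mechanism: the function names, the `force` override, and the toolchain and package strings are all hypothetical, chosen only for illustration.

```python
import hashlib


def cache_key(toolchain, package, revision, config="release"):
    """Derive a short key from everything that can affect binary compatibility.

    All parameter names are hypothetical. A different toolchain, package
    revision, or build configuration produces a different key.
    """
    raw = "\n".join([toolchain, package, revision, config])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:16]


def needs_rebuild(cached_key, toolchain, package, revision,
                  config="release", force=False):
    """Return True when the cached binary must not be reused.

    `force` models a manual override for binaries that nobody maintains
    any more; `cached_key` is None when nothing has been cached yet.
    """
    if force or cached_key is None:
        return True
    return cached_key != cache_key(toolchain, package, revision, config)


# Illustrative values only: a binary cached for one toolchain is rejected
# as soon as the notebook selects another one.
old_key = cache_key("swift-5.7-RELEASE", "ExamplePackage", "abc123")
print(needs_rebuild(old_key, "swift-5.7-RELEASE", "ExamplePackage", "abc123"))
print(needs_rebuild(old_key, "swift-5.8-RELEASE", "ExamplePackage", "abc123"))
```

Keying on the toolchain string means a development snapshot with an unannounced ABI change never picks up an old binary, at the cost of recompiling once per snapshot.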
To start off, a good goal is a kernel that just relays the cell’s input as its output. It doesn’t matter what language you write the dummy Jupyter kernel in; it could even be in Shell (if that’s possible). We just need experimentation and a proof of concept before diving deeper. The final product will be a JIT-compiled Swift binary that might be called from the command line or from a Python script. Until we have that setup and a mechanism for locating Swift toolchains, you could take the old swift-jupyter approach and write the kernel in Python. It’s up to your preference which language it is. |
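A rough sketch of such an echo kernel in Python, following the wrapper-kernel pattern from the Jupyter documentation. It assumes `ipykernel` is installed for the kernel class itself; the helper `echo_messages` and the class name `EchoKernel` are hypothetical names, and nothing here is taken from Swift-Colab or swift-jupyter.

```python
try:
    # ipykernel provides the base class used by Python wrapper kernels.
    from ipykernel.kernelbase import Kernel
except ImportError:
    # Without ipykernel, only the pure helper below is usable.
    Kernel = None


def echo_messages(code, execution_count, silent=False):
    """Build the messages for one execute request (hypothetical helper).

    Returns (stream_content, reply). stream_content is None when the
    frontend asked for silent execution.
    """
    stream = None if silent else {"name": "stdout", "text": code}
    reply = {
        "status": "ok",
        "execution_count": execution_count,
        "payload": [],
        "user_expressions": {},
    }
    return stream, reply


if Kernel is not None:

    class EchoKernel(Kernel):
        implementation = "echo"
        implementation_version = "0.1"
        language_info = {
            "name": "text",
            "mimetype": "text/plain",
            "file_extension": ".txt",
        }
        banner = "Echo kernel (proof of concept)"

        def do_execute(self, code, silent, store_history=True,
                       user_expressions=None, allow_stdin=False):
            # Relay the cell's input back to the frontend as stdout.
            stream, reply = echo_messages(code, self.execution_count, silent)
            if stream is not None:
                self.send_response(self.iopub_socket, "stream", stream)
            return reply
```

To try it, the class can be launched with `IPKernelApp.launch_instance(kernel_class=EchoKernel)` from `ipykernel.kernelapp` and registered through its own kernel spec directory, which avoids overwriting an existing kernel.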
JupyterLab could present a security risk, because it could modify files on your personal computer. Is it possible to run JupyterLab within an encapsulated “venv”? |
For certain it can run in Docker, and maybe in a venv (not sure). That said, it presents a security risk only insofar as it runs as you, directly on top of your system. I’ve started to look at the instructions for installing swift-colab in Google Colab in order to install it in a local JupyterLab env, but halted when I saw that you are doing some overwriting of Jupyter kernels :). Next I plan to spawn a dockerized env and try. Haven’t gotten round to it yet. |
As long as it's physically possible to utilize hardware acceleration from JupyterLab, we can delay that to some time in the future. For now, it's okay if we use Docker. I think it's a virtual machine all the way down to the assembly level, so AMX and OpenCL won't be accessible from there. But it will help Swift-Colab break up its hard-coded URLs and assumptions about the OS. In the long and distant future, we'll use Metal on an M1 Max GPU, performing both hardware-accelerated machine learning and hardware-accelerated linear algebra, in a Jupyter notebook. |
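One way to loosen hard-coded locations is to resolve them from the environment with a fallback default. The sketch below only illustrates that idea: the variable name `SWIFT_KERNEL_ROOT`, the default root, and the subdirectory names are hypothetical and do not describe Swift-Colab's real layout.

```python
import os
from pathlib import PurePosixPath

# Hypothetical default, used when the environment does not override it.
DEFAULT_ROOT = "/opt/swift"


def resolve_paths(env=None):
    """Return install locations derived from one configurable root.

    `env` is any mapping of environment variables; it defaults to
    os.environ. A Colab notebook, a Docker image, or a local JupyterLab
    setup can each choose its own root without code changes.
    """
    env = os.environ if env is None else env
    root = PurePosixPath(env.get("SWIFT_KERNEL_ROOT", DEFAULT_ROOT))
    return {
        "root": str(root),
        "toolchain": str(root / "toolchain"),
        "packages": str(root / "packages"),
    }


# A Docker image could set the variable in its Dockerfile instead of
# relying on the default.
print(resolve_paths({"SWIFT_KERNEL_ROOT": "/home/jovyan/swift"}))
```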
@philipturner : I'm performing the first steps of the dockerization (*), and I hit this: https://github.com/ratranqu/swift-colab/runs/6967462426?check_suite_focus=true#step:6:1121 (*): I basically start from a simple Jupyter image (jupyter/base-notebook) and try to run the |
That's a crash inside the file
Could you narrow the location of the bug? Just litter You will also probably have to rewrite a lot of Swift code that assumes a Python kernel already exists. Good job on the design for the Dockerfile! It is awesome having |
@ratranqu Thanks for your insights. I have built up sufficient motivation to take action personally, and plan to support JupyterLab in a future release. I recently modified the README (see #21 (comment) for a pointer to what changed). I don't plan to invest time supporting Docker, but it's not off the table. Even though Docker is CPU-only, I would incorporate support if you or someone else worked on it. Thus, this GitHub issue should stay open. Do you have any comment? |
In the recent Swift Numeric call, the idea of lowering the bar for scripting in Swift in the context of Data Science was surfaced, which led to a suggestion to see if we could dockerize swift-colab.
From reading through the swift-colab repo, I see the following possible directions:
Each of the above would lower the bar to using swift-colab, as well as allow use in standalone Jupyter environments.
Were you thinking about something else?