Provides a JupyterDisplay protocol package? #21
This sounds like a good idea. I can look into it when I get around to the next major update to Swift-Colab. But probably for the next few months, I’ll be occupied with the Metal/GPU side of my project. We will probably need to wait until changes can be made to Swift-Colab, because Jupyter display stuff is deeply integrated into the kernel. I removed the |
I saw that your fork supports local Jupyter notebooks, a feature I would like to integrate into Swift-Colab via JupyterLab (Swift-…). Could you provide a quick explanation of what’s been going on with the swift-jupyter fork lately? What bug fixes, changes to the README, etc. have you added? I’d like to help your contributions reach a larger audience, and I can leverage the large traffic Swift-Colab gets (48 vs 9 stars). We can start off with a link in my README pointing to your fork, but eventually I might merge your work entirely into Swift-Colab 3.0. For a time frame, here is what I’ve said previously when conversing with you:
|
Instead of a Swift package dependency, this library could be a system module imported like Apple’s Foundation library. You don’t compile that with SwiftPM; it just comes with the toolchain. Same thing with this. In v2.3, I’m planning a way to link modules into each other like they’re system modules, enabling the compilation of https://github.com/s4tf/models. This idea can be repurposed for the package you propose, and I think it will be easier to maintain that way. Plus, more flexibility because I can swap out the backend implementation at any time. Similar to my idea for the Metal and OpenCL backends for S4TF, which use an API-stable C-like interface and virtual function calls. |
Interesting. Yeah, I went a different route by not using
I recently became interested in using notebooks to do more GUI-heavy stuff besides some data crunching. Most work in
One thing I liked about yours is the independent LLDB integration without Python. The CI has been flaky recently and packages were published without the Python LLDB plugin (I guess that will be more of a problem for the VSCode Swift extension than for notebooks? Why no one reported that earlier is super strange). Are you using Bazel for internal development too? Curious why it is slated for v2.3. There is definitely quite a bit more weirdness with Bazel than with SwiftPM (which natively supports compiling frameworks), more than I anticipated when I did it. I am not certain that I have solved it all though. It seems to work for my repo now (which is nontrivial, as it has interactions with several C-heavy / Swift-heavy libraries), but in the past, every time I used it extensively, bugs popped up somehow (mostly around compilation flags, linker flags, or missing module.modulemap files for C libs). |
I have only ever used Bazel to compile tensorflow/tensorflow, but not to compile Swift code. I am not very familiar with it and do not use it internally. I wanted to put it in v2.3 because that's less difficult than adding local notebook support. I was also going to change the inline SwiftPM package spec parser, supporting multiple types of syntax. I prefer the terse %install '.package(url: "https://...", .branch("main"))' // 4.2 style, currently the only style supported
%install '.package(url: "https://...", branch: "main")' // 5.0 style, coming in v2.3
%install '.bazel(url: "https://...", anotherArgument: "@PythonKit//:PythonKit")' // coming in v2.3
%install '.cmake(...)' // distant future; Swift-Colab is easily extensible to more build systems This differs from your fork's new
That was one of the big reasons why I rewrote Swift-Colab to create v2.0. Development toolchains kept crashing, and I didn't know why. I assumed that only release toolchains could be compatible with Swift-Colab. Apparently, dev toolchains were compatible but I had to rewrite the LLDB dependency from scratch. I had a long-term vision of compiling S4TF on Colab months down the road, which only supported dev toolchains at the time. That was a good choice, because I think Swift 5.6 dropped the Python bindings.
From my experience, SwiftPM is quite buggy. I reported two bugs on the swift-package-manager repo, along with an issue in swift-colab.
Working with buggy build systems is nothing new to me, but I am concerned about immediate download time. Google Colab has incredibly fast network bandwidth, so downloading Swift toolchains is fine. I found that downloading via
I want to adopt the mindset of "zero tolerance for latency" for local notebook support, which should create the best possible user experience. For example, users should not need to compile LLDB Python bindings themselves; that takes minutes. The SwiftPM engine is really powerful in this respect, because it compiles in debug mode by default. Bazel should not compile in fully-optimized mode by default, because that increases compile time. However, we should provide a well-documented mechanism to enable release-mode compilation with Bazel. The README can yell out "enable this option because it's fastest!" |
Is there any chance we can translate SwiftPM flags into Bazel build options? For example,
When I designed Swift-Colab, I thought from the end user's perspective more than anything else. They are often impatient and don't know as much about SwiftPM/command-line stuff as we do. However, I can't change the semantic purpose of certain features just for the sake of ergonomics. The idea above would change the semantic purpose of … Balancing theoretical correctness with ergonomics led to some other weird decisions previously. For example, I included a Python code snippet for restarting the runtime, but only in the README. It is not part of the template notebooks. You can expect a lot of deliberation and caution over little changes like this to Swift-Colab. This quality assurance stems from me trying to create the "authentic"/"official" continuation of Swift for TensorFlow. What would Google do if they were making this repository? |
I can go into more detail about how I use Bazel, and in general how it works, if you want. Basically, it is unlike SwiftPM, which has a vibrant GitHub presence where you can simply include a URL and pull in dependencies. However, it is great if you have a project with mixed-language dependencies, such as mine, which mixes C libraries and Python libraries and ties them together into a Swift binary (or notebook).
TBH, this is probably an oversight. Last time this happened, for 5.4, I pinged the #swift channel and the CI was fixed, so we have it in 5.5 :)
Ha, nothing is easy then. I suspect REPL mode is used by very few, especially the library support. But hopefully Swift Playgrounds on various platforms can help with this.
Yeah, at that point, we'd probably have a common cache for people to mount :) TBH, the speed improvement when linking to an optimized version vs. a debug version is night and day.
Yeah, Bazel has 3 build modes for all its supported languages by default (that includes Rust, Go, C, C++, and Swift):
Agreed. Hence I asked why Bazel; it is not exactly a popular choice. However, it does give you some advantages, such as setting up a consistent environment. Now, if you want a quick starter on Bazel as a build system for Swift code: your repo needs to have a … You specify the code to build with …
You can build, or run (if it is an executable target), with the bazel command line:
Here is the thing that probably doesn't map well mentally between Bazel / SwiftPM: SwiftPM has only one … Minor but often important: you can put some Bazel build parameters in … Here is what my setup looks like when using Bazel + swift-jupyter:
Once set up, this is great for me. However, as you can see, it differs from the mental model when operating with SwiftPM. Most notably: with SwiftPM, the notebook can operate almost independently; it doesn't need a repo with … Although you can, in theory, just create a temporary directory with … All this, not to mention that not many Swift projects have Bazel BUILD files, and the support for directly using a SwiftPM Package.swift through https://github.com/cgrindel/rules_spm is not well-tested on non-Apple platforms. |
That's basically … It definitely seems like Bazel is different enough from SwiftPM to warrant a … I think I'll leave the work of Bazel support to your fork for the time being. I'm not ruling it out; I just have finite time and other priorities right now. Swift-Colab v2.3 can just focus on the newest build-system modifications to support compiling S4TF on release toolchains. |
I linked your repository in the README - would you mind reading my additions (+ drop-down) and telling me if there's anything that can be written better? The new text appears between the Table of Contents and |
Thank you! It reads great! |
I'd prefer if we interface with libraries using virtual function calls. For example, a top-level function could take an `any JupyterDisplayProtocol` argument. We might extend the interface as follows: // Encoding happens behind the scenes; the calling library submits uncompressed frames.
// Always let the user decide the codec.
enum JupyterVideoCodec {
case motionJPEG
case automatic // backend chooses the fastest codec
// case mpeg4 - might be added in the future
}
struct JupyterVideoConfig {
var width: Int
var height: Int
var codec: JupyterVideoCodec
// Best practice: rendering library intercepts this variable and sets the timestamp.
var timestamp: Double?
}
protocol JupyterDisplayProtocol {
...
/*static*/ func submitVideoFrame(bytes: Data, config: JupyterVideoConfig)
// Perhaps also a feature for accepting keyboard/mouse input?
/*static*/ func addEventListener(_ body: @escaping (???) -> Void)
/*static*/ func setEventListener(for event: ???, _ body: @escaping (???) -> Void)
} A library function for rendering might do this: import JupyterDisplayModule // pre-installed into the system
func renderGLVideo(display: any JupyterDisplayProtocol, config: JupyterVideoConfig) {
var receivedEvent = false
display.addEventListener { receivedEvent = true }
// Loop ends when the user manually interrupts cell execution, so the program should save game simulation progress accordingly.
while true {
usleep(16_000) // Interrupt might stop the program here.
if receivedEvent { /*Do something*/ }
// Base its size on the video config's `width` and `height`.
let rawFrameBuffer = Data(...)
var currentFrameConfig = config
currentFrameConfig.timestamp = ... // Optional; just enhances encoder's ability to synchronize frames.
display.submitVideoFrame(bytes: rawFrameBuffer, config: currentFrameConfig)
}
} Usage in a Jupyter notebook: import SomeLibrary
setupSimulation()
// Allow interop between IPythonDisplay <-> JupyterDisplay either by:
// (a) Making them typealiases of each other.
// (b) Using conditional compilation for Colab vs. local notebook.
// (c) Retaining `IPythonDisplay` for backward-compatibility, but introducing a new
// `JupyterDisplay` type to Swift-Colab.
let config = JupyterVideoConfig(
width: 720,
height: 1280,
codec: .motionJPEG)
// Several possibilities for how to input the local display variable:
renderGLVideo(display: IPythonDisplay.current, config: config)
renderGLVideo(display: JupyterDisplay.current, config: config)
renderGLVideo(display: jupyterDisplay, config: config)
renderGLVideo(display: JupyterDisplay.self, config: config)
// Alternatively, the user can render frames with custom code:
while true {
usleep(...)
JupyterDisplay.current.submitVideoFrame(...)
} |
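For reference, here is one purely hypothetical way the ??? placeholders above could be filled in; none of these names exist in Swift-Colab or the fork today, they only illustrate the shape an input-event API could take.
enum JupyterInputEvent {
case keyDown(key: String)
case mouseDown(x: Double, y: Double)
case mouseUp(x: Double, y: Double)
case interrupt // the user pressed the stop button on the cell
}
protocol JupyterEventSource {
// The closure receives whichever event the frontend forwarded to the kernel.
func addEventListener(_ body: @escaping (JupyterInputEvent) -> Void)
}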
Yes. The inspiration is mostly from the Python side, where things like matplotlib or tqdm can identify the Jupyter environment automagically and act accordingly. Having an explicit pass-in from the library's point of view is great. But I maintain that also providing singleton access would enable that "automagic" feeling for library users (they don't need to pass in anything to get the benefit). Thinking about the tqdm use case, if we have a Swift equivalent: for i in tqdm(someCollection) {
} that can render a progress bar in Jupyter correctly, it is automagic. If we require a pass-in: for i in tqdm(someCollection, display: IPythonDisplay.current) {
} it is less magical, but OK. However, if we wrap it one layer deeper: final class SomeTrainer {
public func train() {
for i in tqdm(data) {
}
}
}
let trainer = SomeTrainer()
trainer.train() compared with: final class SomeTrainer {
public func train(display: any JupyterDisplayProtocol) {
for i in tqdm(data, display: display) {
}
}
}
let trainer = SomeTrainer()
trainer.train(display: IPythonDisplay.current) Much more needle-threading now. I think for libraries that require such rigor, having the dependency passed in is great. But for libraries that don't, having an escape hatch (global singleton access) would be good too. |
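To make the "escape hatch" concrete, here is a self-contained sketch of a defaulted display parameter that falls back to a global singleton; the names (JupyterDisplayProtocol, JupyterDisplay.current, tqdm) are assumptions for illustration, and the progress bar itself is simplified away.
protocol JupyterDisplayProtocol {
var isEnabled: Bool { get }
func display(html: String)
}
struct NoOpDisplay: JupyterDisplayProtocol {
var isEnabled: Bool { false }
func display(html: String) {}
}
enum JupyterDisplay {
// The kernel-side file would swap this for a real implementation.
static var current: any JupyterDisplayProtocol = NoOpDisplay()
}
// A stand-in for a tqdm-like wrapper; a real one would update the bar per iteration.
func tqdm<C: Collection>(
_ collection: C,
display: any JupyterDisplayProtocol = JupyterDisplay.current
) -> C {
if display.isEnabled {
display.display(html: "<progress max=\"\(collection.count)\"></progress>")
}
return collection
}
// Automagic by default, explicit injection when the caller wants the rigor:
for x in tqdm([1, 2, 3]) { _ = x }
for x in tqdm([1, 2, 3], display: JupyterDisplay.current) { _ = x }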
In that case, I could just create a system library with a concrete type called `JupyterDisplay`. We may not need a protocol to encapsulate the functionality because there's one concrete type. Would you be fine if your current rendering notebooks are not backward-compatible, and have to be rewritten after the final iteration of this design? Also, I should just typealias |
I think you focused too much on the streaming aspect of it :) The proposal to protocolize it is to encapsulate notebook differences, if there are any, when displaying images, text, or HTML. For example, the concrete implementation in mine |
Then we can just add static methods to the … We already have: swift-colab/Sources/include/EnableIPythonDisplay.swift (lines 81 to 93 in 405e271)
How about we change your `JupyterDisplay.swift` to something like this? /// A display manager for a Jupyter notebook.
///
/// This provides necessary methods for other Swift libraries to introspect whether you can
/// display from within a Jupyter notebook, and methods to send HTML, images or texts over
/// to the notebook.
public enum JupyterDisplay {
/// Flush display messages to the notebook immediately. By default, these messages are
/// processed at the end of cell execution.
public static func flush() {}
/// Display a base64-encoded image in the notebook.
public static func display(base64EncodedPNG: String, metadata: String?) {}
/// Display a snippet of HTML in the notebook.
public static func display(html: String, metadata: String?) {}
/// Display plain text in the notebook.
public static func display(text: String, metadata: String?) {}
/// Check whether the notebook is active and accepting display messages.
public private(set) static var isEnabled: Bool = false
/// Get the cell number.
public static var executionCount: Int { 0 /*TODO: Return actual cell count*/ }
public static func enable() {
// TODO: Some other logic
JupyterDisplay.isEnabled = true
}
}
extension JupyterDisplay {
/// Display a base64-encoded image in the notebook.
public static func display(base64EncodedPNG: String) {
display(base64EncodedPNG: base64EncodedPNG, metadata: nil)
}
/// Display a snippet of HTML in the notebook.
public static func display(html: String) {
display(html: html, metadata: nil)
}
/// Display plain text in the notebook.
public static func display(text: String) {
display(text: text, metadata: nil)
}
}
public typealias IPythonDisplay = JupyterDisplay Then import JupyterDisplay
extension JupyterDisplay {
static func foo() {} /* methods for communicating with the kernel */
}
JupyterDisplay.enable() Also, unify IPythonDisplay.swift and JupyterDisplay.swift by parsing the filename before executing the |
Also, would you mind reimplementing your logic for HTTP streaming using Python libraries, with PythonKit installed as a system library (or regular package dependency for now) for all conformant packages? Then source code would work across both Colab and local notebooks, swapping the two different implementations at runtime. You could try different build configurations if this doesn't work out verbatim. #if canImport(PythonKit)
let jpegDependency1 = try? Python.attemptImport("jpegDependency1")
#endif
#if canImport(NIOCore)
// alternative implementation
#endif That puts the burden of streaming logic on the package designer, not the Jupyter kernel. Maybe the kernel can include streaming logic in the future, but this approach provides more flexibility at the moment. |
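As a hedged sketch of that idea, a conformant package could pick its streaming backend at build time; StreamingBackend and both case names are made up for illustration.
#if canImport(PythonKit)
import PythonKit
#endif
#if canImport(NIOCore)
import NIOCore
#endif
enum StreamingBackend {
case python // delegate HTTP streaming to a Python library through PythonKit
case nio    // a pure-Swift implementation on top of SwiftNIO
case none
static var preferred: StreamingBackend {
#if canImport(PythonKit)
return .python
#elseif canImport(NIOCore)
return .nio
#else
return .none
#endif
}
}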
We may be talking about slightly different things. I am not interested in how the file we use for … Thus, when you use it in the notebook, it looks like this: import Gym
%include "EnableJupyterDisplay.swift"
let viewer = MuJoCoViewer()
viewer.render() If it is not in notebook, this will create a window and render a image. If it is in notebook, this will pipe an image through the notebook. To implement this kind of support, viewer.render(with: IPythonDisplay.current) In this way, we still provides a protocol, but library doesn't need to talk to the global instance, and feels "better engineered". As I elaborated earlier, this may not feel as magical as Python, so it is a balanced act. |
So what you're saying is: that package has a mutable variable called `JupyterDisplay`: // EnableJupyterDisplay.swift
import JupyterDisplay
// Renamed the existing `JupyterDisplay` type to `_JupyterDisplay`
enum _JupyterDisplay {
... // already existing code
}
extension _JupyterDisplay: JupyterDisplayProtocol {}
JupyterDisplay.JupyterDisplay = // something that delegates to `_JupyterDisplay` // EnableIPythonDisplay.swift
import JupyterDisplay
enum IPythonDisplay {
... // already existing code
}
extension IPythonDisplay: JupyterDisplayProtocol {}
JupyterDisplay.JupyterDisplay = // something that delegates to `IPythonDisplay` That would be very flexible, although I'm not sure we should expose a global variable like that. Perhaps we at least make it immutable at face-value, but provide an API to mutate it. For example: extension JupyterDisplayProtocol {
public func _setJupyterDisplayInstance(_ instance: any JupyterDisplayProtocol) {
// Optionally do some cleanup on the existing instance.
// Optionally do some validation on the input instance.
JupyterDisplay = instance
}
} I think I'm getting it now, but I'd like to consider alternative approaches as well. For example, we could substitute the protocol with an API that just swaps out function pointers behind a static type, like the following: protocol JupyterDisplayProtocol {
// non-static requirements
var isEnabled: Bool { get }
func display(base64EncodedPNG: ...)
}
enum JupyterDisplay {
private static var delegate: JupyterDisplayProtocol = EmptyJupyterDisplay()
public static func _setCurrentInstance(_ instance: JupyterDisplayProtocol) {
Self.delegate = instance
}
// static function that forwards to the current delegate
static func display(base64EncodedPNG: ...) {
delegate.display(base64EncodedPNG: base64EncodedPNG)
}
static var isEnabled: Bool {
delegate.isEnabled
}
} |
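To show how the registration could look end to end, here is a self-contained variant of that sketch (the facade is renamed JupyterDisplayFacade so it compiles on its own, and ColabDisplay is an invented placeholder):
protocol JupyterDisplayProtocol {
var isEnabled: Bool { get }
func display(base64EncodedPNG: String)
}
enum JupyterDisplayFacade {
private static var delegate: (any JupyterDisplayProtocol)?
static func _setCurrentInstance(_ instance: any JupyterDisplayProtocol) {
delegate = instance
}
static var isEnabled: Bool { delegate?.isEnabled ?? false }
static func display(base64EncodedPNG: String) {
delegate?.display(base64EncodedPNG: base64EncodedPNG)
}
}
// The kernel-side file (e.g. EnableIPythonDisplay.swift) registers its backend once:
struct ColabDisplay: JupyterDisplayProtocol {
var isEnabled: Bool { true }
func display(base64EncodedPNG: String) {
// Forward the encoded image to the kernel here.
}
}
JupyterDisplayFacade._setCurrentInstance(ColabDisplay())
// Library code only ever talks to the static facade:
if JupyterDisplayFacade.isEnabled {
JupyterDisplayFacade.display(base64EncodedPNG: "...")
}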
You stated above that we could make a neutral organization for hosting the Swift package. I'm not yet sure that's the best path forward. If it's specific to the Jupyter kernel and maintained alongside the Jupyter kernel, perhaps it should be embedded inside each Jupyter kernel's repository. Creating an external repo is an irreversible decision; if we change our minds later, people will have Swift package dependencies on a stale repository (or face API breakage). Would you mind trying out alternative options first? In previous comments, I had the idea of embedding … #if canImport(JupyterDisplay)
import JupyterDisplay
#endif
extension MuJoCoViewer: Renderable {
public func render() {
#if canImport(JupyterDisplay)
if JupyterDisplay.isEnabled {
// Check to see if we launched the render server yet.
if httpChannel == nil {
httpChannel = try? renderServer.bind(host: "0.0.0.0", port: .random(in: 10_000..<20_000))
.wait()
}
if JupyterDisplay.executionCount != renderCell {
JupyterDisplay.display(html: renderServer.html)
JupyterDisplay.flush()
renderCell = JupyterDisplay.executionCount
}
}
#endif
...
}
} Another concern is how to detect a module's dependency on the |
Also, why can't we use IPythonDisplay in local Jupyter notebooks? Your README said it crashes after a few dozen cell executions, but I couldn't find any more information on the crash. |
Yes, there are several ways to implement this. It is a kind of dependency injection pattern. The important thing is to expose a protocol-only interface module to other libraries, so the heavyweight dependency can be injected from the notebook end. Your
That's correct, SwiftPM doesn't support system module detection, due to its very cross-compilation-friendly nature (iOS is always cross-compiled on a macOS system). That's why having a protocol-only library as the root dependency is actually ideal.
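For illustration, a protocol-only interface package could be as small as this Package.swift (a hedged sketch; the package and target names are assumptions):
// swift-tools-version:5.5
import PackageDescription
let package = Package(
name: "JupyterDisplay",
products: [
// The interface library other packages depend on; it contains only the protocol,
// so it adds essentially no build time or binary weight.
.library(name: "JupyterDisplay", targets: ["JupyterDisplay"]),
],
targets: [
.target(name: "JupyterDisplay", dependencies: []),
]
)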
TBH, I am a bit burned by all the weirdness of the Swift REPL. The last shenanigan I found is that, by linking to liblinear, a for loop will crash with the Swift 5.6.x REPL. The |
I've decided to take a break from my Metal backend for S4TF and complete Swift-Colab v2.3 now. For starters, I resolved #17. I'm also going through the documentation and slowly completing it, which I've been too lazy to do for a while. I plan to add two new magic commands:
The current documentation of |
I see what you mean now. This allows Colab to inject a concrete implementation directly (or not; it can still be a protocol-only library with the implementation injected later, but direct injection seems to be the purpose). I need to think a bit to weigh the pros and cons of this approach. It definitely has tension with how dependency injection is traditionally done, or with how Bazel / SwiftPM manage their dependencies (both prefer concrete dependencies rather than phantom ones, i.e. dependencies declared in Package.swift rather than implicit ones that don't show up anywhere). For one, it makes unit-testing your code that specializes with JupyterDisplay harder to do (previously, you could inject your FakeJupyterDisplay and run
If possible, I encourage the user to create a concrete dependency in … This feature augments the "clever idea" to bypass S4TF's long compile time, where you rewrite a notebook's contents without recompiling S4TF. A user would have just looked through several tutorials with their one S4TF instance compiled. Now they encounter the tutorial that runs … Other than this specific use case, and
My unit tests are a bunch of Colab notebooks (see Testing). You test that the code works by firing up Swift-Colab and running the notebook cells. This setup does not permit running … However, I would like to add a dedicated SwiftPM testing feature. I often set certain environment variables before running SwiftPM tests, which
You probably already thought this through many times, but have you tried to just support a generic cache directive? This is often supported in CI environments to make CI jobs faster. SwiftPM has everything checked out and compiled under
I recently thought of using Google Drive to cache build products, but it doesn't solve the problem that necessitates … This is unlike the … Google Drive caching seems like a good analogue to local filesystem caching, because it stores data over long periods of time. I am revamping the experience of mounting Google Drives, and I just bypassed a major restriction around calling
The restriction on Google Drive was that I couldn't mount the drive from Swift code. I had to switch over to the Python Jupyter kernel, mount the drive, then switch back. I am bypassing this mechanism by forwarding … For reference, here is the serialization mechanism. Encode 8 bytes, which are an … |
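For concreteness, here is a hedged sketch of a length-prefixed encoding along those lines, assuming the 8-byte header is a little-endian UInt64 payload size; Swift-Colab's actual header layout may differ.
import Foundation
func encodeMessage(_ payload: Data) -> Data {
let header = UInt64(payload.count).littleEndian
var message = withUnsafeBytes(of: header) { Data($0) }
message.append(payload)
return message
}
func decodeMessage(_ message: Data) -> Data? {
guard message.count >= 8 else { return nil }
// Reassemble the little-endian length byte by byte to avoid alignment issues.
let length = message.prefix(8).enumerated().reduce(UInt64(0)) { acc, pair in
acc | (UInt64(pair.element) << (8 * UInt64(pair.offset)))
}
guard let size = Int(exactly: length), message.count - 8 >= size else { return nil }
return message.dropFirst(8).prefix(size)
}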
I was actually talking about providing a publicly accessible cache, so whether it is compiled separately or compiled into a megapackage will be the same. And for a first-time user, it will be just as fast because you pull that from a public cache everyone uses. Whether it is from GDrive or an S3 bucket is an implementation detail.
Similarly, I haven't removed the KernelCommunicator method yet, as it seems more efficient (avoids formatting and copying). But I do provide an alternative
I don't think that's a very good idea. You would have to cache exponentially many variations of different exact Swift toolchains + different build flags + different versions of the external Swift package. I'm not even sure you can distinguish each unique environment. If one factor is different, the … However, making a cache just for the environment combos found in tutorial notebooks could artificially boost performance. It would make tutorials go faster for beginners, but it doesn't augment power users' workflows. In other words, it's optimizing for benchmark performance.
KernelCommunicator pipes the data through the file system, which has the same overhead as the new mechanism I propose (which also pipes). I think it also serializes the data before copying, which is what my new mechanism does. KernelCommunicator got painful when I had to marshal the data through LLDB's C++ API, rather than its Python API. I had to make a recursive serialization protocol just for one Swift <-> C++ bridging function, which used exponentially expanding buffers while encoding - written in straight C. |
It depends on how SwiftPM's .build directory is structured; I didn't read too much into their code to know how they keep these directories up to date. Sharing a cache with Bazel is safe and efficient (due to content hashing). If the cache is at the object / library level, it will benefit both beginners and power users, because the latter can still reuse big chunks of compiled code. Anyway, SwiftPM is out of my expertise; I just want to clarify that some of these claims might be build-system specific, not an inherent limitation of a public read-only cache itself.
Hmm, that doesn't match what I saw in KernelCommunicator.swift inherited from google/swift-jupyter (https://github.com/google/swift-jupyter/blob/main/KernelCommunicator.swift#L60)? For that file, it serializes into JSON and passes the pointer of that serialized JSON back to LLDB. In LLDB, it can use the read-memory method to read it back. That is what I meant by saving one memcpy. But I guess it may not be worth it though. |
I was thinking that the LLDB process is a separate system-level process, which you spawn and communicate with over pipes. I wasn't thinking correctly, because LLDB does run everything in the current process; it's just a bunch of library calls. That means you could access stuff stored in raw memory. However, the lifetime overhead of each data segment is probably dominated by disk writing/piping in |
Yeah, once it is to |
I plan to add a … I should have explained this, but I like to hop around between components of the S4TF project. It's so massive in scope that I can get bored of one part and move on to another. When I work on Swift-Colab, I usually spend a week or two making several disparate changes, culminating in a minor release. The changes I'm discussing in this thread are achievable within that time frame, but local notebook support is not. Thus, local notebook support will come in the round after this, circa early 2023. I'm also simultaneously building SwiftOpenCL* in staggered intervals (April 2022, June/July 2022, probably September 2022), patching up s4tf/s4tf gradually, and creating the Metal backend for S4TF. More info here, which is linked on my GitHub profile. A few days ago, I switched gears from the Metal backend to Swift-Colab.
|
Seems we are the only two people using Swift with notebooks. Anyway, I think we need to collaborate on a new header-like library called JupyterDisplay. I have a prototype: https://github.com/liuliu/swift-jupyter/blob/main/display/JupyterDisplay.swift
It is really simple. The idea is that any other library can depend on this library and provide rich notebook display support. If you look at Python packages, many of them are "batteries-included" for notebook displays, either inside the main package or in a support package.
So far, the EnableJupyterDisplay.swift implementation took matters into its own hands and provides backends for matplotlib, Python, or SwiftPlot. That is really limited. On the technical side, this package simply exposes a JupyterDisplay instance that EnableJupyterDisplay.swift can switch to a different implementation, thus enabling the said behavior. Let me know what you think. I mainly want to propose moving this into a "neutral" org and packaging it lightly (like packaging for SwiftPM and Bazel) so everyone can use it.