Jupyter

The Kotlin Spark API also supports Kotlin Jupyter notebooks. To use it, simply add

%use spark

to the top of your notebook. This fetches the latest version of the API, together with the latest version of Spark. To pin a specific version of Spark or of the API itself, specify it like this:

%use spark(spark=3.2, v=1.1.0)

NOTE: You need kotlin-jupyter-kernel version 0.11.0.83 or later for the Kotlin Spark API to work. Also, if the %use spark magic does not print "Spark session has been started...", and %use spark-streaming doesn't work at all, add %useLatestDescriptors above it.
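
For example, place it directly above the magic:

%useLatestDescriptors
%use spark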

Inside the notebook a Spark session is started automatically and can be accessed via the spark value. The underlying sc: JavaSparkContext can also be accessed directly. Otherwise, the API works much the same as in a regular Kotlin Spark project.
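
As a quick sketch of a notebook cell (assuming the spark and sc values provided by the integration, and the Kotlin Spark API's dsOf helper for creating typed Datasets):

```kotlin
// Create a typed Dataset from the `spark` session provided by the integration.
data class Person(val name: String, val age: Int)

val people = spark.dsOf(Person("Alice", 30), Person("Bob", 25))
people.show()

// `sc` is a JavaSparkContext, so the familiar Java RDD API is available as well.
val numbers = sc.parallelize(listOf(1, 2, 3, 4, 5))
numbers.count()
```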

There is also support for HTML rendering of Datasets and simple (Java)RDDs. Check out the example as well.
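
For instance, ending a cell with a Dataset or RDD expression (instead of calling show()) lets the notebook render it; the people Dataset below is assumed from the sketch above.

```kotlin
// The last expression of a cell is picked up by the HTML renderer
// and displayed as a table.
people
```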

To use the Spark Streaming capabilities, instead use

%use spark-streaming

This does not start a Spark session right away, meaning you can call withSparkStreaming(batchDuration) {} in whichever cell you want. Check out the example.
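
A minimal sketch of a streaming cell, assuming withSparkStreaming also takes a timeout parameter and exposes the JavaStreamingContext as ssc inside the block (batchDuration comes from the API; timeout and ssc are assumptions here):

```kotlin
// Durations is org.apache.spark.streaming.Durations.
withSparkStreaming(batchDuration = Durations.seconds(1), timeout = 10_000) {
    // Read lines from a local socket (e.g. started with `nc -lk 9999`) and print each batch.
    val lines = ssc.socketTextStream("localhost", 9999)
    lines.print()
}
```

Because no Spark session is started by the magic itself, this cell can live anywhere in the notebook.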