TransformedDStream

TransformedDStream is the specialized DStream that results from the transform operator.

It is constructed with a collection of parent dstreams (parents) and a transform function (transformFunc).

Note
When created, it asserts that all dstreams in the input collection use the same StreamingContext and slide interval.
Note
It is acceptable to have more than one parent dstream.

The dependencies are the input collection of dstreams.

The slide interval is exactly that of the first dstream in parents.
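The construction rules above can be sketched with a small, Spark-free model. This is only an illustration under stated assumptions: FakeContext, FakeDStream and TransformedModel are hypothetical stand-ins rather than Spark classes, the error messages are made up, and the non-empty check is an assumed guard that keeps parents.head safe.

```scala
// Hypothetical stand-ins for illustration only; these are NOT Spark's classes.
final case class FakeContext(name: String)                    // plays the role of StreamingContext
final case class FakeDStream(ctx: FakeContext, slideMs: Long) // plays the role of a parent dstream

final class TransformedModel(parents: Seq[FakeDStream]) {
  // Assumed guard (not stated in the text) so that parents.head is safe below.
  require(parents.nonEmpty, "no parent dstreams given")
  // All parents must share one context and one slide interval.
  require(parents.map(_.ctx).distinct.size == 1, "parents use different contexts")
  require(parents.map(_.slideMs).distinct.size == 1, "parents use different slide intervals")

  // The dependencies are the input collection of dstreams.
  val dependencies: List[FakeDStream] = parents.toList

  // The slide interval is taken from the first parent.
  val slideMs: Long = parents.head.slideMs
}
```

In this model, building a TransformedModel from two parents with the same context and slide interval succeeds, while mixing contexts or slide intervals fails with an IllegalArgumentException raised by require.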

When requested to compute an RDD, it goes over every dstream in parents and asks each one to getOrCompute an RDD.

Note
It throws a SparkException when a parent dstream does not produce an RDD for the batch.
Caution
FIXME Prepare an example to face the exception.

It then calls transformFunc with the collection of RDDs.

If the transform function returns null, a SparkException is thrown:

org.apache.spark.SparkException: Transform function must not return null. Return SparkContext.emptyRDD() instead to represent no element as the result of transformation.
	at org.apache.spark.streaming.dstream.TransformedDStream.compute(TransformedDStream.scala:48)

The result of transformFunc is returned.
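The compute steps described above can be modelled without Spark as follows. This is a minimal sketch, not the actual implementation: SketchParent, SketchTransformed and SketchException are hypothetical names, a plain Seq stands in for an RDD, a Long stands in for the batch time, and the message of the first exception is invented for the example.

```scala
// Simplified model with hypothetical stand-in types; none of these are Spark classes.
final class SketchException(message: String) extends RuntimeException(message)

// A parent stream that may or may not have a batch (the "RDD") for a given time.
final class SketchParent[A](batches: Map[Long, Seq[A]]) {
  def getOrCompute(time: Long): Option[Seq[A]] = batches.get(time)
}

final class SketchTransformed[A, B](
    parents: Seq[SketchParent[A]],
    transformFunc: Seq[Seq[A]] => Seq[B]) {

  def compute(time: Long): Seq[B] = {
    // Step 1: ask every parent for its batch; a missing batch is an error.
    val parentBatches = parents.map { parent =>
      parent.getOrCompute(time).getOrElse(
        throw new SketchException(s"Parent produced no batch at time $time"))
    }
    // Step 2: hand the collected parent batches to the transform function.
    val transformed = transformFunc(parentBatches)
    // Step 3: reject null; an empty collection is the way to express "no elements".
    if (transformed == null) {
      throw new SketchException("Transform function must not return null.")
    }
    // Step 4: the result of the transform function is returned as-is.
    transformed
  }
}
```

With two parents that both hold a batch for a given time, compute returns whatever transformFunc builds from their batches. If one parent has no batch for that time, or transformFunc returns null, the model throws SketchException, which mirrors the two SparkException cases in the text.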