TransformedDStream

TransformedDStream is the specialized DStream that is the result of the transform operator.

It is constructed with a collection of parent dstreams (parents) and a transform function (transformFunc).

Note	When created, it asserts that all dstreams in the input collection use the same StreamingContext and slide interval.

Note	It is acceptable to have more than one dependent dstream.

The dependencies are the input collection of dstreams.

The slide interval is exactly the same as that in the first dstream in parents.
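The properties above can be observed with a small sketch that builds a multi-parent transform through `StreamingContext.transform`. The object name `TransformExample`, the helper `unionAll`, and the queue-backed input streams are assumptions made for illustration only. The streaming context is never started, since only the structure of the resulting dstream is inspected.

```scala
import scala.collection.mutable

import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext, Time}
import org.apache.spark.streaming.dstream.DStream

object TransformExample {
  // Hypothetical helper: unions the per-batch RDDs of all parent dstreams.
  // The parents must share one StreamingContext and one slide interval.
  def unionAll(ssc: StreamingContext, parents: Seq[DStream[Int]]): DStream[Int] =
    ssc.transform[Int](parents, (rdds: Seq[RDD[_]], time: Time) =>
      rdds.map(_.asInstanceOf[RDD[Int]]).reduce(_ union _))

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("TransformExample")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Two queue-backed input dstreams created from the same context.
    val a = ssc.queueStream(mutable.Queue(ssc.sparkContext.parallelize(Seq(1, 2))))
    val b = ssc.queueStream(mutable.Queue(ssc.sparkContext.parallelize(Seq(3, 4))))

    val transformed = unionAll(ssc, Seq(a, b))

    // The dependencies are the parents; the slide interval comes from the first parent.
    println(transformed.dependencies.size)
    println(transformed.slideDuration)

    ssc.stop()
  }
}
```

The transform function avoids capturing the StreamingContext, because Spark checks that the closure passed to `transform` is serializable.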

When requested to compute an RDD for a batch, it goes over every dstream in parents and asks it to getOrCompute an RDD.

Note	It may throw a `SparkException` when a dstream does not compute an RDD for a batch.

Caution	FIXME Prepare an example to face the exception.

It then calls transformFunc with the collection of RDDs.

If the transform function returns null, a SparkException is thrown:

org.apache.spark.SparkException: Transform function must not return null. Return SparkContext.emptyRDD() instead to represent no element as the result of transformation.
	at org.apache.spark.streaming.dstream.TransformedDStream.compute(TransformedDStream.scala:48)
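As the exception message suggests, a transform function that has nothing to emit for a batch can return an empty RDD rather than null. A minimal sketch follows; the object `EmptyRddExample`, the helpers `evenSecondsOnly` and `everyOtherBatch`, and the rule of keeping only batches on even seconds are hypothetical and serve only to create a case with no output.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.Time
import org.apache.spark.streaming.dstream.DStream

object EmptyRddExample {
  // Hypothetical rule: keep only batches whose time is a multiple of 2 seconds.
  def evenSecondsOnly(rdd: RDD[Int], time: Time): RDD[Int] =
    if (time.milliseconds % 2000 == 0) rdd
    else rdd.sparkContext.emptyRDD[Int] // an empty RDD, never null

  // Applies the rule to every batch of the stream using the transform operator.
  def everyOtherBatch(stream: DStream[Int]): DStream[Int] =
    stream.transform((rdd: RDD[Int], time: Time) => evenSecondsOnly(rdd, time))
}
```

Keeping the per-batch logic in a plain function such as `evenSecondsOnly` makes it possible to exercise it with an ordinary RDD and a `Time` value, without starting a streaming context.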

The result of transformFunc is returned.
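The compute steps described above can be summarized in a simplified, Spark-free model. This is a sketch of the described behavior, not Spark's source code: `Batch`, `FakeRDD`, `Parent`, `ModelException`, and `ComputeModel` are hypothetical stand-ins, and the wording of the first exception message is illustrative.

```scala
object ComputeModel {
  // Hypothetical stand-ins for a batch time, an RDD, and SparkException.
  final case class Batch(time: Long)
  final case class FakeRDD(elements: Seq[Int])
  final class ModelException(message: String) extends RuntimeException(message)

  // A parent stream that may or may not produce an RDD for a batch.
  trait Parent {
    def getOrCompute(batch: Batch): Option[FakeRDD]
  }

  def compute(
      parents: Seq[Parent],
      transformFunc: (Seq[FakeRDD], Batch) => FakeRDD,
      batch: Batch): FakeRDD = {
    // Step 1: ask every parent for its RDD; fail if any parent has none.
    val parentRDDs = parents.map { parent =>
      parent.getOrCompute(batch).getOrElse(
        throw new ModelException(s"No RDD from a parent for batch ${batch.time}"))
    }

    // Step 2: call the transform function with the collected RDDs.
    val transformed = transformFunc(parentRDDs, batch)

    // Step 3: reject a null result.
    if (transformed == null) {
      throw new ModelException("Transform function must not return null.")
    }

    // Step 4: return the result of the transform function.
    transformed
  }
}
```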
