LocationStrategy
allows a DirectKafkaInputDStream to request Spark executors to execute Kafka consumers as close topic leaders of topic partitions as possible.
LocationStrategy
is used when DirectKafkaInputDStream
computes a KafkaRDD
for a given batch interval and is a means of distributing processing Kafka records across Spark executors.
Location Strategy | Description |
---|---|
PreferBrokers |
Use when executors are on the same nodes as your Kafka brokers. |
PreferConsistent |
Use in most cases as it consistently distributes partitions across all executors. |
PreferFixed |
Use to place particular Accepts a collection of topic partition and host pairs. Any topic partition not specified uses a consistent location. |
Note
|
A topic partition is described using Kafka’s TopicPartition. |
You can create a LocationStrategy
using LocationStrategies factory object.
import org.apache.spark.streaming.kafka010.LocationStrategies
val preferredHosts = LocationStrategies.PreferConsistent