reduceByKey in Spark

The reduceByKey function works only on RDDs whose elements are key-value pairs, i.e. RDDs of two-element tuples (pair RDDs). It is a transformation, which means it is lazily evaluated: nothing is computed until an action is called on the result. It takes an associative and commutative function as its parameter. Spark applies that function to the values of each key in the source RDD and produces a new RDD of key-value pairs holding one reduced value per key. It is a wide operation, because values that share a key may have to be shuffled across partitions.

val x = sc.parallelize(Array(("a", 1), ("b", 1), ("a", 1), ("a", 1), ("b", 1), ("b", 1), ("b", 1), ("b", 1)), 3)

val y = x.reduceByKey((accum, n) => (accum + n))
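To make the result of the call above concrete, here is a minimal plain-Scala sketch (no Spark required) that imitates what reduceByKey does with the same eight pairs: first reduce within each partition, then merge the partial results per key. The names ReduceByKeySketch, partitions, combine and reduceByKeyLocal are hypothetical, and the way the pairs are split into three partitions is an assumption; Spark may distribute them differently.

```scala
// Plain-Scala illustration of reduceByKey semantics; this is not Spark code.
// All names here (ReduceByKeySketch, partitions, combine, reduceByKeyLocal)
// are hypothetical and exist only for this sketch.
object ReduceByKeySketch {

  // The same eight pairs as in the example, hand-split into 3 "partitions".
  // Assumption: Spark's real partitioning of the array may differ.
  val partitions: Seq[Seq[(String, Int)]] = Seq(
    Seq(("a", 1), ("b", 1)),
    Seq(("a", 1), ("a", 1), ("b", 1)),
    Seq(("b", 1), ("b", 1), ("b", 1))
  )

  // Reduce all values that share a key with the function f.
  def combine[K, V](pairs: Seq[(K, V)])(f: (V, V) => V): Map[K, V] =
    pairs.foldLeft(Map.empty[K, V]) { case (acc, (k, v)) =>
      acc.updated(k, acc.get(k).map(f(_, v)).getOrElse(v))
    }

  def reduceByKeyLocal[K, V](parts: Seq[Seq[(K, V)]])(f: (V, V) => V): Map[K, V] = {
    // Phase 1: reduce inside each partition, before any data moves.
    val partial: Seq[Map[K, V]] = parts.map(p => combine(p)(f))
    // Phase 2: bring the partial results for each key together and reduce again.
    combine(partial.flatMap(_.toSeq))(f)
  }

  def main(args: Array[String]): Unit = {
    val result = reduceByKeyLocal(partitions)(_ + _)
    // prints: (a,3), (b,5)
    println(result.toSeq.sortBy(_._1).mkString(", "))
  }
}
```

On a real cluster, y.collect() would return the same two pairs, ("a", 3) and ("b", 5), in no guaranteed order. Because the function is applied once inside partitions and again across them, it has to be associative and commutative for the result to be independent of how the data is split.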


