Pair RDD operations are the ones we use most in real-world projects. Most practical problems can be solved with pair RDDs.
In a distributed environment, a purely value-based approach is not enough for complex problems. Each value should also have a key associated with it.
Remember that in MapReduce, data moves internally as records, each of which is a key-value pair. Similarly, in an RDBMS, multiple columns are associated with one primary key.
Record <Key, Value> : there can be multiple values, but they are all associated with one key.
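Example (prdd1 is a pair RDD because each element is a two-element tuple; prdd2 holds 3-tuples, so it is a plain RDD and not a pair RDD) :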
scala> val namesrdd = sc.parallelize(List("raj", "venkat", "sunil", "kalyan", "anvith", "raju", "dev", "hari"), 2)
namesrdd: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[38] at parallelize at <console>:23
scala> val prdd1 = namesrdd.map(x => (x, x))
prdd1: org.apache.spark.rdd.RDD[(String, String)] = MapPartitionsRDD[39] at map at <console>:23
scala> val prdd2 = namesrdd.map(x => (x, x, x))
prdd2: org.apache.spark.rdd.RDD[(String, String, String)] = MapPartitionsRDD[40] at map at <console>:23
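Key-based methods such as keys and mapValues are available only when the element type is a two-element tuple (K, V). One way to carry several values under one key is to nest them as a tuple in the value position. The following is a minimal sketch of that idea, assuming it is pasted into spark-shell or run with Spark on the classpath; the names `names`, `nested`, `nestedKeys`, `lengths` and the app name are made up for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Reuses the running context inside spark-shell; otherwise starts a local one.
// The app name and master are illustrative assumptions.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("pair-rdd-sketch").setMaster("local[2]"))

val names = sc.parallelize(List("raj", "venkat", "sunil"), 2)

// Key = the name; value = a nested tuple holding two fields (length, upper-case form).
// The element type is (String, (Int, String)), which is still a (K, V) pair.
val nested = names.map(x => (x, (x.length, x.toUpperCase)))

// Pair-only methods now compile, because the element type is (K, V).
val nestedKeys = nested.keys.collect().toList
val lengths = nested.mapValues(_._1).collect().toList
```

Calling `keys` on an RDD of 3-tuples such as prdd2 would not compile, which is a quick way to check whether an RDD is really a pair RDD.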
Example of creating a pair RDD keyed by name length :
scala> val prdd3 = namesrdd.map(x => (x.length, x))
prdd3: org.apache.spark.rdd.RDD[(Int, String)] = MapPartitionsRDD[41] at map at <console>:23
scala> prdd3.collect().foreach(println)
(3,raj)
(6,venkat)
(5,sunil)
(6,kalyan)
(6,anvith)
(4,raju)
(3,dev)
(4,hari)
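Once the names are keyed by length, the key can drive an aggregation. As a small sketch of what the key makes possible, the block below counts how many names share each length using mapValues and reduceByKey. It assumes spark-shell or Spark on the classpath; `byLength` and `countPerLength` are illustrative names.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Reuses the running context inside spark-shell; otherwise starts a local one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("pair-rdd-sketch").setMaster("local[2]"))

val names = sc.parallelize(
  List("raj", "venkat", "sunil", "kalyan", "anvith", "raju", "dev", "hari"), 2)

// Same shape as prdd3 above: (length, name).
val byLength = names.map(x => (x.length, x))

// Replace each name with 1, then sum the 1s per key.
val countPerLength = byLength.mapValues(_ => 1).reduceByKey(_ + _).collect().toMap
```

The result is collected into a Map because the order in which keys come back after a shuffle is not guaranteed.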
Two more ways to create a pair RDD, using keyBy and groupBy :
scala> val prdd3 = namesrdd.keyBy(x => x.length)
prdd3: org.apache.spark.rdd.RDD[(Int, String)] = MapPartitionsRDD[42] at keyBy at <console>:23
scala> prdd3.collect().foreach(println)
(3,raj)
(6,venkat)
(5,sunil)
(6,kalyan)
(6,anvith)
(4,raju)
(3,dev)
(4,hari)
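The keyBy output above matches the earlier map output line for line. A short sketch that checks this, assuming spark-shell or Spark on the classpath (`viaMap` and `viaKeyBy` are illustrative names):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Reuses the running context inside spark-shell; otherwise starts a local one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("pair-rdd-sketch").setMaster("local[2]"))

val names = sc.parallelize(
  List("raj", "venkat", "sunil", "kalyan", "anvith", "raju", "dev", "hari"), 2)

// Build the pair by hand: the function result is the key, the element is the value.
val viaMap = names.map(x => (x.length, x)).collect().toList

// keyBy does the same thing given only the key function.
val viaKeyBy = names.keyBy(_.length).collect().toList

// Both approaches yield identical pairs in identical order.
assert(viaMap == viaKeyBy)
```

keyBy is mainly a readability choice: it states the intent (derive a key, keep the element as the value) without spelling out the tuple.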
scala> val prdd3 = namesrdd.groupBy(x => x.length)
prdd3: org.apache.spark.rdd.RDD[(Int, Iterable[String])] = ShuffledRDD[44] at groupBy at <console>:23
scala> prdd3.collect().foreach(println)
(4,CompactBuffer(raju, hari))
(6,CompactBuffer(venkat, kalyan, anvith))
(3,CompactBuffer(raj, dev))
(5,CompactBuffer(sunil))
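Unlike map and keyBy, groupBy produces a ShuffledRDD, so the data is redistributed by key and the output order differs from the input order. The sketch below, under the same assumptions as before (spark-shell or Spark on the classpath, illustrative names `grouped` and `viaGroupByKey`), shows that groupBy gives the same groups as keyBy followed by groupByKey. Values are sorted before comparing because the order inside each group is not guaranteed.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Reuses the running context inside spark-shell; otherwise starts a local one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("pair-rdd-sketch").setMaster("local[2]"))

val names = sc.parallelize(
  List("raj", "venkat", "sunil", "kalyan", "anvith", "raju", "dev", "hari"), 2)

// One step: group elements by a derived key.
val grouped = names.groupBy(_.length).mapValues(_.toList.sorted).collect().toMap

// Two steps: attach the key first, then group the pair RDD by that key.
val viaGroupByKey =
  names.keyBy(_.length).groupByKey().mapValues(_.toList.sorted).collect().toMap

// Both routes produce the same groups.
assert(grouped == viaGroupByKey)
```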
Thanks,
Arun Mathe
Email ID : arunkumar.mathe@gmail.com
Contact ID : 9704117111