
Column Explode A Map

Hi,
Imagine you have a structure like this:
val events = sqlContext.createDataFrame(
  Seq(
    ("a", Map("a" -> 1, "b" -> 1)),
    ("b", Map("b" -> 1, "c" -> 1)),
    ("c", Map("a" -> 1, "c" -> 1))
  )
).toDF("id", "map")
What I want to achieve is to have the map values as separate columns, one per key.
Basically I want to achieve this:
+---+----+----+----+
| id|   a|   b|   c|
+---+----+----+----+
|  a|   1|   1|null|
|  b|null|   1|   1|
|  c|   1|null|   1|
+---+----+----+----+
I managed to create it with an explode-pivot combo, but for a large dataset
with around 1000 distinct map keys I imagine this will
be prohibitively expensive. I reckon there must be a much easier way to
achieve this than:
import org.apache.spark.sql.functions.{col, explode}
val exploded = events.select(col("id"), explode(col("map"))).groupBy("id").pivot("key").sum("value")
Any help would be appreciated. :)
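
For comparison, here is a minimal sketch of one possible alternative to the explode-pivot combo. It assumes the set of distinct keys is small enough to collect to the driver (around 1000 keys would be), and it reads each key straight out of the map with getItem instead of grouping and pivoting. The object name MapToColumns, the local master setting, and the names keys, keyColumns and wide are only illustrative; whether this is actually cheaper on a given dataset would need to be measured.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.{col, explode}

// Hypothetical wrapper object so the sketch can run on its own.
object MapToColumns {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("map-to-columns").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // Same sample data as in the question.
    val events = sqlContext.createDataFrame(
      Seq(
        ("a", Map("a" -> 1, "b" -> 1)),
        ("b", Map("b" -> 1, "c" -> 1)),
        ("c", Map("a" -> 1, "c" -> 1))
      )
    ).toDF("id", "map")

    // Collect the distinct map keys to the driver.
    // Assumption: the key set is small (about 1000 keys in the question).
    val keys = events
      .select(explode(col("map"))) // explode on a map yields "key" and "value"
      .select("key")
      .distinct()
      .collect()
      .map(_.getString(0))
      .sorted

    // One projected column per key. With default (non-ANSI) settings,
    // a key that is missing from a row's map yields null.
    val keyColumns = keys.map(k => col("map").getItem(k).as(k))
    val wide = events.select((col("id") +: keyColumns): _*)

    wide.show()
    sc.stop()
  }
}
```

If the keys are already known up front, the collect step can be skipped entirely and the key list passed in directly, which leaves a single narrow projection with no shuffle.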

asked Mar 24 2016 at 12:01

Michał Zieliński




Group Spark-user
