[Spark-User] Column explode a map

Hi,
Imagine you have a structure like this:
val events = sqlContext.createDataFrame(
   Seq(
     ("a", Map("a"->1,"b"->1)),
     ("b", Map("b"->1,"c"->1)),
     ("c", Map("a"->1,"c"->1))
   )
 ).toDF("id","map")
What I want to achieve is to have the map values as separate columns.
Basically I want to achieve this:
+---+----+----+----+
| id|   a|   b|   c|
+---+----+----+----+
|  a|   1|   1|null|
|  b|null|   1|   1|
|  c|   1|null|   1|
+---+----+----+----+
I managed to create it with an explode-pivot combo, but for a large dataset
with around 1000 distinct map keys I imagine this will
be prohibitively expensive. I reckon there must be a much easier way to
achieve this than:
val exploded = events.select(col("id"), explode(col("map"))).groupBy("id").pivot("key").sum("value")
Any help would be appreciated. :)
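
One possible alternative to the explode-pivot combo, as a sketch only: if the set of map keys is known, or can be collected once, each key can be read straight out of the map column with getItem, so the wide table is a plain projection with no groupBy or pivot. The SparkSession setup (Spark 2.0+), the local master, and the names spark, keys and wide are assumptions made for illustration; the original snippet uses sqlContext instead. Keys missing from a row are expected to come out as null under default (non-ANSI) settings.

```scala
// Sketch for spark-shell or a Scala script; assumes Spark 2.0+ on the classpath.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode}

// Assumption: a local session. Inside spark-shell, getOrCreate() returns the existing one.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("map-to-columns")
  .getOrCreate()

val events = spark.createDataFrame(
  Seq(
    ("a", Map("a" -> 1, "b" -> 1)),
    ("b", Map("b" -> 1, "c" -> 1)),
    ("c", Map("a" -> 1, "c" -> 1))
  )
).toDF("id", "map")

// Collect the distinct keys once (skip this job if the key list is already known).
val keys: Seq[String] = events
  .select(explode(col("map")))   // explode on a map yields columns "key" and "value"
  .select("key")
  .distinct()
  .collect()
  .map(_.getString(0))
  .sorted
  .toSeq

// One projected column per key, looked up directly in the map column.
val wide = events.select(col("id") +: keys.map(k => col("map").getItem(k).as(k)): _*)

wide.show()
```

With around 1000 keys this builds a 1000-column projection in a single pass over the data. Whether it actually beats the pivot on a given dataset is something to measure rather than assume.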

asked Mar 24 2016 at 12:01

Michał Zieliński
