[Spark-User] DataFrame Python UDF performance too slow

Hi,
I am running Spark 1.6.0 on EMR, and the job fails with an OOM error. I have
a DataFrame with 250 columns and am applying a Python UDF to more than 50 of
them. I register the DataFrame as a temp table and apply the UDF in a
hive_context SQL statement. The UDF is applied after a sort-merge join of two
DataFrames (each around 4 GB) and multiple broadcast joins with 22 dimension
tables.
Below is how I am applying the UDF:
data_frame.registerTempTable("temp_table")
new_df = hive_context.sql("select python_udf(column_1), python_udf(column_2), ... from temp_table")
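With more than 50 columns, a statement like the one above is usually generated rather than typed out. The following is only a minimal sketch of that step, not the code from the actual job: the helper names build_udf_select and apply_udf_to_columns are hypothetical, and it assumes python_udf has already been registered on hive_context (for example with hive_context.registerFunction).

```python
def build_udf_select(columns, udf_name="python_udf", table="temp_table"):
    """Build a SELECT that wraps each listed column in the named UDF.

    Hypothetical helper: the names and defaults mirror the post, but the
    function itself is an illustration, not part of Spark.
    """
    if not columns:
        raise ValueError("columns must not be empty")
    projections = ", ".join(
        "{0}({1})".format(udf_name, col) for col in columns
    )
    return "select {0} from {1}".format(projections, table)


def apply_udf_to_columns(hive_context, data_frame, columns):
    """Register the DataFrame as a temp table and run the generated query.

    Assumes the UDF named python_udf is already registered on hive_context.
    """
    data_frame.registerTempTable("temp_table")
    return hive_context.sql(build_udf_select(columns))


if __name__ == "__main__":
    # Show the statement for two columns without needing a Spark cluster.
    print(build_udf_select(["column_1", "column_2"]))
```

Printing the generated statement before running it makes it easy to check how many UDF calls end up in a single projection.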
There is a JIRA for the same issue
(https://issues.apache.org/jira/browse/SPARK-8632), which is marked resolved
for 1.6.0, but I am still running into a similar issue.
Thanks,
Bijay


asked Mar 24 2016 at 09:20

Bijay Pathak
