Hi,
I want to implement an RDD where the number of partitions is based on the
number of executors that have been set up. Is there some way I can determine
the number of executors within the getPartitions() call?
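One possible approach (an assumption, not a confirmed answer from this thread): in Spark 1.x, SparkContext.getExecutorMemoryStatus reports the block managers that have registered, which you could consult before constructing the RDD. A minimal spark-shell sketch, assuming a live cluster:

```shell
# Sketch only: print the number of registered executors on a running cluster.
# getExecutorMemoryStatus includes the driver's block manager, hence the -1.
echo 'println(sc.getExecutorMemoryStatus.size - 1)' | ./bin/spark-shell --master yarn-client
```

Note that executors may register asynchronously after startup, so the count can be lower than expected if read too early.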
Hello.
I recently upgraded Storm to 0.9.2.
On the Component summary page of the Storm UI, the Executors for a
spout/bolt are not displayed only when its "emitted" count is 0.
Please tell me the solution.
Thanks.
Up until last week we had no problems running a Spark standalone cluster. We
now have a problem registering executors with the driver node in any
application. Although we can run start-all and see the worker on port 8080,
no executors are registered with the BlockManager.
The feedback we have is scant, but we're getting output like this, which
suggests it's a name resolution issue of some kind:
14/04/09
Running on Amazon EMR w/Yarn and Spark 1.1.1, I have trouble getting Yarn
to use the number of executors that I specify in spark-submit:
A cluster with two core nodes will typically end up with only one executor
running at a time. I can play with the memory settings and
num-cores-per-executor, and sometimes I can get 2 executors running
consistently.
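For reference, these are the spark-submit flags that usually govern this on YARN; the values below are illustrative (an assumed two-node layout, and my_job.jar is a hypothetical name), and YARN must have enough container memory and vcore headroom per node to actually grant the requests:

```shell
# Illustrative sketch: explicitly request two executors sized to fit the
# nodes. If executor memory + overhead exceeds YARN's per-container maximum,
# YARN silently grants fewer containers than requested.
spark-submit \
  --master yarn-client \
  --num-executors 2 \
  --executor-cores 2 \
  --executor-memory 4g \
  my_job.jar
```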
Hi,
I am using Spark 1.0.1 on Yarn 2.5, and doing everything through spark
shell.
I am running a job that essentially reads a bunch of HBase keys, looks up
HBase data, and performs some filtering and aggregation. The job works fine
on smaller datasets, but when I try to execute it on the full dataset, the
job never completes. The few symptoms I notice are:
a. The job shows progress for a
I am running Spark 1.1.0 on AWS EMR, with a batch job that seems to be
highly parallelizable, in yarn-client mode. But Spark stops spawning any
more executors after spawning 6, even though the YARN cluster has 15 healthy
m1.large nodes. I even tried providing '--num-executors 60' during
spark-submit, but even that doesn't help. A quick look at the Spark admin UI
suggests
Hi,
I have a simple spout and bolt. My spout reads 10 files, each containing 200
MB of data, and my bolt writes the received tuples to a file, in order to
check how much time this takes in a Storm cluster.
In my cluster I have 3 machines, for nimbus, supervisor-1, and supervisor-2
respectively.
Please suggest how many workers, and how many executors for the spout and
bolt, I should use.
I tried
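One way to experiment without resubmitting the topology each time is the storm CLI's rebalance command, which can change worker and executor counts on a running topology. A sketch, assuming a topology named file-topology with components named "spout" and "bolt" (all three names hypothetical):

```shell
# Adjust a running topology: 2 workers, 2 executors for the spout,
# 4 executors for the bolt. Executor counts cannot exceed the task counts
# fixed at submission time.
storm rebalance file-topology -n 2 -e spout=2 -e bolt=4
```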
Hi,
I'm trying to execute a streaming application using local[4], but I see
only one executor in the web UI. Shouldn't there be more? One executor per
worker thread?
I'm trying to open connections to a MySQL database on all the worker nodes
and keep them open until the end of the stream.
Do you know of a better way to do this? Right now I'm just trying to
create static connections in each
[ https://issues.apache.org/jira/browse/HBASE-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lars Hofhansl updated HBASE-3809:
---------------------------------
Fix Version/s: (was: 0.94.0)
0.96.0
Moving out of 0.94.
> .META. may not come back online if > number of executors servers crash and one of
those > number of executors
Can someone explain what each of these terms means in terms of Spark? I'm
confused about the difference between slaves, workers, and executors.
My understanding is that slaves and workers are interchangeable?
Thanks.
I'm using Spark 1.0.1 on quite a large cluster, with gobs of memory, etc.
Cluster resources are available to me via YARN, and I am seeing these
errors quite often:
ERROR YarnClientClusterScheduler: Lost executor 63 on : remote Akka
client disassociated
This is in an interactive shell session. I don't know a lot about YARN
plumbing and am wondering if there's some constraint in play -- executors
Hi, I asked a similar question before and didn't get any answers, so I'll
try again:
I am using updateStateByKey, pretty much exactly as shown in the examples
shipping with Spark:
def createContext(master: String, dropDir: String, checkpointDirectory: String) = {
  val updateFunc = (values: Seq[Int], state: Option[Int]) => {
    val currentCount = values.sum
    val previousCount = state
I'm launching a Spark shell with the following parameters
./spark-shell --master yarn-client --executor-memory 32g --driver-memory 4g
but when I look at the Spark UI it shows only 209.3 GB total memory.
Executors (10)
- *Memory:* 55.9 GB Used (209.3 GB Total)
This is a 10 node YARN cluster where each node has 48G of memory.
Any idea what I'm missing here?
Thanks
-Soumya
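Part of the gap between the 10 x 32 GB requested and the 209.3 GB shown is expected: in Spark 1.x the Executors page reports storage memory, which is only a fraction of each executor heap (spark.storage.memoryFraction, default 0.6, further scaled by a safety fraction), not the full --executor-memory value. An illustrative launch making that setting explicit (the values shown are the defaults, not a recommendation):

```shell
# Illustrative (Spark 1.x): "Memory Total" in the UI is storage memory,
# roughly memoryFraction * safetyFraction of each heap, so it will always
# read well below num_executors * executor-memory.
./spark-shell --master yarn-client \
  --executor-memory 32g --driver-memory 4g \
  --conf spark.storage.memoryFraction=0.6
```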
I'm trying to compare the performance of Spark running on Mesos vs. YARN.
However, I am having trouble configuring the Spark workload to run in a
similar way on Mesos and YARN.
When running Spark on YARN, you can specify the number of executors per
node. So if I have a node with 4 CPUs, I can specify 6 executors on that
node. When running Spark on Mesos, there doesn't seem to be
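As far as I know, in Spark 1.x there is no --num-executors analogue on Mesos; in coarse-grained mode you bound totals instead, via spark.cores.max and the per-executor memory. A sketch, with an assumed Mesos master address and a hypothetical my_job.jar:

```shell
# Illustrative: on Mesos coarse-grained mode (Spark 1.x), cap the total
# cores the job may take across the cluster rather than setting a per-node
# executor count.
spark-submit \
  --master mesos://master:5050 \
  --conf spark.mesos.coarse=true \
  --conf spark.cores.max=24 \
  --executor-memory 8g \
  my_job.jar
```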
Hi,
When I try requesting a large number of executors - e.g. 242 - it doesn't
seem to actually reach that number. E.g., under the Executors tab, I only
see executor IDs up to 234.
This is despite the fact that there is plenty more memory available, as well
as CPU cores, etc. in the system. In fact, the YARN page shows that 243
containers are running (242 executors + driver).
Anyone
Hi,
I am currently running a single-node Storm deployment with 6 workers.
But when I try to deploy multiple topologies in a way that they utilize all
the workers, there are still idle workers at the end, even though there are
topologies that do not get the number of workers they asked for. The same
applies to the number of executors. With three topologies that each
initialize around 150 executors, the actual
Hi all,
I am running a simple analysis using Spark Streaming. I set the executor
number and the default parallelism both to 300. The program consumes data
from Kafka and does a simple groupBy operation with 300 as the parameter.
The batch size is one minute. In the first two batches, there are around 50
executors. However, after the first two batches, there are always 2
executors for the groupBy operation
I've set up a YARN (Hadoop 2.4.1) cluster with Spark 1.0.1, and I've been
seeing intermittent out-of-memory errors
(java.lang.OutOfMemoryError: unable to create new native thread) when
increasing the number of executors for a simple job (wordcount).
The general format of my submission is:
spark-submit \
--master yarn-client \
--num-executors=$EXECUTORS \
--executor-cores
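That particular OutOfMemoryError usually indicates an OS-level limit on processes/threads per user rather than JVM heap exhaustion, since each executor JVM's threads count against it. Not a confirmed diagnosis of this cluster, but a quick check to run on each NodeManager host:

```shell
# Show the per-user process/thread limit. If many executors share a host,
# their combined thread counts can hit this ceiling and trigger
# "unable to create new native thread".
ulimit -u
```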
We recently upgraded to Storm 0.9.2-incubating, and found that on the UI, Num workers and Num executors are switched.
Example:
In the older version (0.9.0.1): [screenshot attachment]
In the new version (0.9.2-incubating): [screenshot attachment]
Is this a UI bug? Or did something change in Storm core functionality?
Thanks,
Jing
Hi,
I did a small example on Storm in cluster mode, which contains one spout and
one bolt. In my spout I am reading a list of files (10 files, each
containing 100 records), while in my bolt I am just writing the received
tuples to a file.
When I run this application with 2 executors for the bolt, 2 executors for
the spout, and 2 workers, it executes fine. There are no duplicate tuples; I
received 1000