Unneeded checkpoints are not being automatically deleted in my
application.
I.e. the lineage looks something like this, and checkpoints simply
accumulate in a temporary directory (every intermediate RDD, however, is
zipped with a globally permanent one):
PermanentRDD: a global RDD that zips with all the intermediate ones
Intermediate RDDs: A--->B--->C--->D--->E--->F--->G
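For what it's worth, a minimal sketch of one possible mitigation, assuming Spark 1.4 or later: the `spark.cleaner.referenceTracking.cleanCheckpoints` setting asks the context cleaner to delete checkpoint files once the checkpointed RDD reference is garbage-collected. The app name and checkpoint path below are illustrative only:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Ask the ContextCleaner to remove checkpoint files once the
// corresponding RDD reference is garbage-collected (Spark 1.4+).
val conf = new SparkConf()
  .setAppName("checkpoint-cleanup")          // hypothetical app name
  .set("spark.cleaner.referenceTracking.cleanCheckpoints", "true")

val sc = new SparkContext(conf)
sc.setCheckpointDir("/tmp/checkpoints")      // hypothetical path

// ... build the lineage A -> B -> ... -> G, checkpointing intermediates ...
```

Note this only helps once the intermediate RDD references actually go out of scope on the driver; checkpoints still pinned by a live reference are kept.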
Hi,
Using the spark-shell, I can't sc.parallelize to get an RDD.
Looks like a bug.
scala> sc.parallelize(Array("a","s","d"))
java.lang.NullPointerException
        at <init>(<console>:17)
        at <init>(<console>:22)
        at <init>(<console>:24)
        at <init>(<console>:26)
        at <init>(<console>:28)
        at <init>(<console>:30)
        at <init>(<console>:32)
        at <init>(<console>:34)
        at <init>(<console>:36)
        at .<init>(<console>:40)
        at .<clinit>(<console>)
        at .<init>(<console>:11)
        at .<clinit>(<console>)
        at $export(<console>)
        at sun.reflect.NativeMethodAccessorImpl
Hi,
What is the easiest way to skip the first n lines in an RDD?
I am not able to figure this one out.
Thanks
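For what it's worth, a common approach (a sketch, assuming the input file name and n below, which are illustrative) is `zipWithIndex` plus a filter, or `mapPartitionsWithIndex` when the lines to drop are known to sit in the first partition:

```scala
val rdd = sc.textFile("data.txt")   // hypothetical input
val n = 5

// Option 1: index every element, then drop the first n.
// zipWithIndex triggers an extra Spark job to compute partition sizes.
val skipped1 = rdd.zipWithIndex()
  .filter { case (_, idx) => idx >= n }
  .map { case (line, _) => line }

// Option 2: if the first n lines all live in partition 0 (e.g. a file
// header), drop them there without building a global index.
val skipped2 = rdd.mapPartitionsWithIndex { (i, iter) =>
  if (i == 0) iter.drop(n) else iter
}
```

Option 2 is cheaper but only correct when partition 0 actually contains all n lines to skip.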
Hi Spark users,
I need some help collecting stage-level information about Spark
workflows. I have added a listener to the Spark context, and the goal is
to monitor each stage. There are two issues I am struggling with:
1. Trying to find the input data paths for each stage. Looking at the code,
it seems that stage objects do not maintain this information. But is it
possible to obtain it by
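A minimal sketch of such a listener, for reference. `StageInfo` does not expose input paths directly, but the RDD names in `stageInfo.rddInfos` sometimes carry the path for Hadoop-backed RDDs; the class name `StageMonitor` is illustrative:

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerStageCompleted}

// Logs basic stage-level information after each stage completes.
class StageMonitor extends SparkListener {
  override def onStageCompleted(stage: SparkListenerStageCompleted): Unit = {
    val info = stage.stageInfo
    println(s"stage ${info.stageId} (${info.name}) completed; " +
      s"rdds = ${info.rddInfos.map(_.name).mkString(", ")}")
  }
}

sc.addSparkListener(new StageMonitor)
```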
Hi,
I have two RDDs, vertices and edges. Vertices is a plain RDD and edges is
a pair RDD. I want to take a three-way join of these two. Joins work only
when both RDDs are pair RDDs, right? So how am I supposed to take a
three-way join of these RDDs?
Thank You
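One way (a sketch, assuming vertices can be keyed by an id; the element types below are hypothetical) is to turn the plain RDD into a pair RDD with `keyBy`, after which the "three-way" join decomposes into two ordinary pairwise joins:

```scala
// Hypothetical shapes: vertices: RDD[Long], edges: RDD[(Long, Long)]
// where an edge is (srcId, dstId).
val vertexPairs = vertices.keyBy(identity)   // RDD[(Long, Long)]

// Join edges with vertices on the source id, re-key the result by the
// destination id, and join with vertices again.
val joined = edges
  .join(vertexPairs)                                      // (src, (dst, srcV))
  .map { case (src, (dst, srcV)) => (dst, (src, srcV)) }
  .join(vertexPairs)                                      // (dst, ((src, srcV), dstV))
```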
Hi all,
Does Spark support in-memory shuffle now? If not, are there any plans to
support it?
Thanks!
Xudong Zheng
Hi
I am using Spark to distribute computationally intensive tasks across the
cluster. Currently I partition my RDD of tasks randomly. There is a large
variation in how long each job takes to complete, so most partitions are
processed quickly while a couple of partitions take forever. I can
mitigate this problem by increasing the number of partitions to some
extent
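Two sketches of that idea, assuming a hypothetical `taskDescriptions` collection and a hypothetical per-task `costEstimate` function (both illustrative, not from the original message):

```scala
// With many more partitions than cores, the scheduler can balance
// uneven task durations dynamically: fast cores simply pick up more
// of the small partitions.
val numCores = 32                               // hypothetical cluster size
val tasks = sc.parallelize(taskDescriptions, numSlices = 16 * numCores)

// Alternatively, sort tasks by a (hypothetical) estimated cost so the
// expensive ones start early instead of straggling at the end.
val byCost = sc.parallelize(
  taskDescriptions.sortBy(t => -costEstimate(t)),
  numSlices = 16 * numCores)
```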
How can I set the default serializer to Kryo when using the Spark shell?
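A sketch using the standard `spark.serializer` configuration property; the exact flag syntax depends on the Spark version (the `--conf` flag exists from Spark 1.0 onward, while older releases used a Java system property):

```shell
# Launch spark-shell with Kryo as the default serializer (Spark 1.x+).
spark-shell --conf spark.serializer=org.apache.spark.serializer.KryoSerializer

# On older releases, via a system property instead:
SPARK_JAVA_OPTS="-Dspark.serializer=org.apache.spark.serializer.KryoSerializer" spark-shell
```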