Unneeded checkpoints are not being automatically deleted in my
application.
I.e. the lineage looks something like this, and checkpoints simply
accumulate in a temporary directory (every intermediate RDD, however, is
zipped with a globally permanent one):
PermanentRDD: a global RDD that zips with all the intermediate ones
Intermediate RDDs: A--->B--->C--->D--->E--->F--->G
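For what it's worth, a minimal sketch of one possible mitigation, assuming Spark 1.4 or later: the `spark.cleaner.referenceTracking.cleanCheckpoints` setting asks the context cleaner to delete checkpoint files once the checkpointed RDD reference is garbage-collected. The app name and checkpoint path below are illustrative only:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Ask the ContextCleaner to remove checkpoint files once the
// corresponding RDD reference is garbage-collected (Spark 1.4+).
val conf = new SparkConf()
  .setAppName("checkpoint-cleanup")          // hypothetical app name
  .set("spark.cleaner.referenceTracking.cleanCheckpoints", "true")

val sc = new SparkContext(conf)
sc.setCheckpointDir("/tmp/checkpoints")      // hypothetical path

// ... build the lineage A -> B -> ... -> G, checkpointing intermediates ...
```

Note this only helps once the intermediate RDD references actually go out of scope on the driver; checkpoints still pinned by a live reference are kept.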
Hi,
Using the spark-shell, I can't sc.parallelize to get an RDD.
Looks like a bug.
scala> sc.parallelize(Array("a","s","d"))
java.lang.NullPointerException
        at <init>(<console>:17)
        at <init>(<console>:22)
        at <init>(<console>:24)
        at <init>(<console>:26)
        at <init>(<console>:28)
        at <init>(<console>:30)
        at <init>(<console>:32)
        at <init>(<console>:34)
        at <init>(<console>:36)
        at .<init>(<console>:40)
        at .<clinit>(<console>)
        at .<init>(<console>:11)
        at .<clinit>(<console>)
        at $export(<console>)
        at sun.reflect.NativeMethodAccessorImpl
Hi,
What is the easiest way to skip the first n lines in an RDD?
I am not able to figure this one out.
Thanks
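For what it's worth, a common approach (a sketch, assuming the input file name and n below, which are illustrative) is `zipWithIndex` plus a filter, or `mapPartitionsWithIndex` when the lines to drop are known to sit in the first partition:

```scala
val rdd = sc.textFile("data.txt")   // hypothetical input
val n = 5

// Option 1: index every element, then drop the first n.
// zipWithIndex triggers an extra Spark job to compute partition sizes.
val skipped1 = rdd.zipWithIndex()
  .filter { case (_, idx) => idx >= n }
  .map { case (line, _) => line }

// Option 2: if the first n lines all live in partition 0 (e.g. a file
// header), drop them there without building a global index.
val skipped2 = rdd.mapPartitionsWithIndex { (i, iter) =>
  if (i == 0) iter.drop(n) else iter
}
```

Option 2 is cheaper but only correct when partition 0 actually contains all n lines to skip.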
Hi Spark users,
I need some help collecting stage-level information about Spark
workflows. I have added a listener to the Spark context, and the goal is
to monitor each stage. There are two issues I am struggling with:
1. Trying to find the input data paths for each stage. Looking at the code,
it seems that stage objects do not maintain this information. But is it
possible to obtain it by
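A minimal sketch of such a listener, for reference. `StageInfo` does not expose input paths directly, but the RDD names in `stageInfo.rddInfos` sometimes carry the path for Hadoop-backed RDDs; the class name `StageMonitor` is illustrative:

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerStageCompleted}

// Logs basic stage-level information after each stage completes.
class StageMonitor extends SparkListener {
  override def onStageCompleted(stage: SparkListenerStageCompleted): Unit = {
    val info = stage.stageInfo
    println(s"stage ${info.stageId} (${info.name}) completed; " +
      s"rdds = ${info.rddInfos.map(_.name).mkString(", ")}")
  }
}

sc.addSparkListener(new StageMonitor)
```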
Hi,
I have two RDDs, vertices and edges. Vertices is a plain RDD and edges is
a pair RDD. I want to take a three-way join of these two. Joins work only
when both RDDs are pair RDDs, right? So how am I supposed to take a
three-way join of these RDDs?
Thank You
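One way (a sketch, assuming vertices can be keyed by an id; the element types below are hypothetical) is to turn the plain RDD into a pair RDD with `keyBy`, after which the "three-way" join decomposes into two ordinary pairwise joins:

```scala
// Hypothetical shapes: vertices: RDD[Long], edges: RDD[(Long, Long)]
// where an edge is (srcId, dstId).
val vertexPairs = vertices.keyBy(identity)   // RDD[(Long, Long)]

// Join edges with vertices on the source id, re-key the result by the
// destination id, and join with vertices again.
val joined = edges
  .join(vertexPairs)                                      // (src, (dst, srcV))
  .map { case (src, (dst, srcV)) => (dst, (src, srcV)) }
  .join(vertexPairs)                                      // (dst, ((src, srcV), dstV))
```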
Hi all,
Does Spark support in-memory shuffle now? If not, are there any plans to
support it?
Thanks!
Xudong Zheng
Hi
I am using Spark to distribute computationally intensive tasks across the
cluster. Currently I partition my RDD of tasks randomly. There is a large
variation in how long each job takes to complete, so most partitions are
processed quickly while a couple of partitions take forever. I can
mitigate this problem by increasing the number of partitions to some
extent
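Two sketches of that idea, assuming a hypothetical `taskDescriptions` collection and a hypothetical per-task `costEstimate` function (both illustrative, not from the original message):

```scala
// With many more partitions than cores, the scheduler can balance
// uneven task durations dynamically: fast cores simply pick up more
// of the small partitions.
val numCores = 32                               // hypothetical cluster size
val tasks = sc.parallelize(taskDescriptions, numSlices = 16 * numCores)

// Alternatively, sort tasks by a (hypothetical) estimated cost so the
// expensive ones start early instead of straggling at the end.
val byCost = sc.parallelize(
  taskDescriptions.sortBy(t => -costEstimate(t)),
  numSlices = 16 * numCores)
```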
How can I set the default serializer to Kryo when using the Spark shell?
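A sketch using the standard `spark.serializer` configuration property; the exact flag syntax depends on the Spark version (the `--conf` flag exists from Spark 1.0 onward, while older releases used a Java system property):

```shell
# Launch spark-shell with Kryo as the default serializer (Spark 1.x+).
spark-shell --conf spark.serializer=org.apache.spark.serializer.KryoSerializer

# On older releases, via a system property instead:
SPARK_JAVA_OPTS="-Dspark.serializer=org.apache.spark.serializer.KryoSerializer" spark-shell
```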