[Cdh-User] Package and deploy 3rd party jars for CDH4 when submitting jobs programmatically

I recently switched from CDH3 to CDH4, and now I'm getting ClassNotFoundExceptions for any 3rd party jars my Mappers or Reducers reference.  I currently use the maven assembly plugin to build a jar that packages all referenced jars inside the main jar's lib folder.  On CDH3, when a mapper or reducer was executed the Hadoop's JVM was able to find any referenced jars inside the lib folder and use them without any problem.  Do I need to update the class path setting in primary jar's manifest to point to the lib folder?  I didn't have to do this with CDH3.
On CDH4, the hadoop JVM doesn't seem to be able find/use any jars in the lib folder, hence the ClassNotFoundExceptions.  To temporarily get around this problem I've unpackaged the 3rd party jars, so all their .class files are loose in my main jar.  But I'm not a big fan of this approach.  
I have read multiple time that CDH4 changed how 3rd party jars are referenced and/or loaded, and the "preferred" method to reference and deploy them is by declaring them with the "-libjars" flag from the cmd line.  That works great when a person is manually kicking off a MR job from the cmd line, but not when MR jobs are built into a software product, which get invoked by the Hadoop Java API.  How are we supposed to handle 3rd party jars when the MR jobs are being initiated by an application server via the Java Hadoop API?
Thanks,
JohnC

Reply To : Package And Deploy 3rd Party Jars For CDH4 When Submitting Jobs Programmatically

asked Feb 27 2013 at 12:16

John Conwell

2 Replies for : Package And Deploy 3rd Party Jars For CDH4 When Submitting Jobs Programmatically

Hi John, 
Could you also add in if you use MR1 or MR2 in CDH4? 
- show quoted text -
> -- 
> 
> 
> 
-- 
Harsh J

Reply To : Package And Deploy 3rd Party Jars For CDH4 When Submitting Jobs Programmatically

answered Feb 27 2013 at 12:46

Harsh J

Oh yea..I'm using MR1
- show quoted text -

Reply To : Package And Deploy 3rd Party Jars For CDH4 When Submitting Jobs Programmatically

answered Feb 27 2013 at 13:48

John Conwell

Related discussions

Submitting And Running Hadoop Jobs Programmatically

Hi, I am working on a open source project Nectar where i am trying to create the hadoop jobs depending upon the user input. I was using Java Process API to run the bin/hadoop shell script to submit the jobs. But it seems not good way because the process creation model is not consistent across different operating systems . Is there any better way to submit the jobs rather than invoking the shell

Launching Concurrent Jobs Programmatically

Hi all, I have a central app that currently kicks of old-style Hadoop M/R jobs either on-demand or via a scheduling mechanism. My intention is to gradually port this app over to using a Spark standalone cluster. The data will remain on HDFS. Couple of questions: 1. Is there a way to get Spark jobs to load from jars that have been pre-distributed to HDFS? I need to run these jobs programmatically

Creating SchemaField And FieldType Programmatically

I'm creating a some Solr plugins that index and search documents in a special way, and I'd like to make them as easy as possible to configure. Ideally I'd like users to be able to just drop a jar in place without having to copy any configuration into schema.xml, although I suppose they will have to register the plugins in solrconfig.xml. I tried making my UpdateProcessor "core aware" and creating

Adding LoginName And Passwords Programmatically

Hello, I have a database of about 40K members, which I want to port to MySQL. This is not the problem, my problem is that I have the following table: CREATE TABLE Person ( PersonID int(11) NOT NULL auto_increment, Title varchar(10) default NULL, FirstName varchar(35) NOT NULL default '', Initials varchar(5) default NULL, Surname varchar(35) NOT NULL default '', Email varchar

Getting Spark Job Progress Programmatically

I am writing yet another Spark job server and have been able to submit jobs and return/save results. I let multiple jobs use the same spark context but I set job group while firing each job so that I can in future cancel jobs. Further, what I deserve to do is provide some kind of status update/progress on running jobs (a % completion but be awesome) but I am unable to figure out appropriate spark

--map-column-java CLOBCOL=String Programmatically

Hello, It is related to https://issues.apache.org/jira/browse/SQOOP-1079,I am following http://sqoop.apache.org/docs/1.99.2/ClientAPI.html "JOB" "Below given code shows how to create a import job" section. Everything is working perfectly except the CLOB in oracle are not import correctly. Since I am creating and submitting the MJob programmatically. How do I add the "--map-column-java CLOBCOL=String

Bug When Creating New Dsn Programmatically

MyODBC 3.51.03 in setup.c in function set_attributes starting at line 152: /* set the default configuration for new DSN */ if(lpsetupdlg->fNewDSN) { strcpy(lpsetupdlg->aAttr[KEY_DESC].szAttr,"MySQL ODBC 3.51 Driver DSN"); strcpy(lpsetupdlg->aAttr[KEY_DB].szAttr,"test"); strcpy(lpsetupdlg->aAttr[KEY_SERVER].szAttr,"localhost"); strcpy(lpsetupdlg->aAttr[KEY_PORT

Bug When Creating New Dsn Programmatically

How To Programmatically Register Custom Functions In CDH4 Hue/Beeswax

We have many users who utilize our custom functions to access things in our Hive data warehouse via Hue and Beeswax. We usually expose this by manually using the 'create temporary function' command so that they don't have to register these functions themselves. We have tried to use the Thrift API to call this automatically, but we're having trouble getting the Thrift API to work using the command

Bind Programmatically

Hey ! I would like to programmatically marshal and bind my body to do : from("file://C:/Temp/camel/rep1/?noop=true") .split().tokenize("\n") .unmarshal() .bindy(BindyType.Csv, Ticket.class) .process(new Processor() { public void process(Exchange exchange) throws Exception { Ticket ticket = (Ticket) exchange.getIn().getBody() ; // Convert from ticket to CSV which