Hey guys,
We are currently using the JDBC interface to Hive to remotely send Hive queries.
The only problem is that when the statement is executed, it blocks until the Hive query has completed. Is there any way to submit a query and get a handle on some object that I could poll to see how far along (as a percentage, perhaps) the Hive query is?
Thank you, Ryan
asked Jun 24 2010 at 14:29 in Hive-User by Ryan LeCompte

4 Answers

JDBC doesn't have a standard interface for this, but one clumsy way might be to set a unique property on the session, and then scan the job tracker (visiting the job.xml to grab the properties) to find jobs with this property set. Then you could use the standard Hadoop progress reporting (at least for single-job queries).
I just verified that if I do
set xyzzy=1;
I can find the jobs manually from the job tracker web UI.
JVS
answered Jun 24 2010 at 17:49 by John Sichi
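A rough Java sketch of the approach described above, under a few assumptions: the Hive JDBC driver accepts a `set` command through `Statement.execute`, the property reaches the job.xml of the jobs the query launches, and job.xml uses the usual Hadoop `<configuration><property><name/><value/>` layout. The class name `TaggedHiveQuery`, its methods, and the use of a random UUID as the tag are illustrative choices, not part of Hive or Hadoop. The query still blocks, but only on a worker thread, so the caller stays free to look for the tagged job and poll its progress.

```java
import java.io.StringReader;
import java.sql.Connection;
import java.sql.Statement;
import java.util.UUID;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TaggedHiveQuery {
    // Property name taken from the example above; any otherwise unused key would do.
    static final String TAG_KEY = "xyzzy";

    /** Returns a tag that is unlikely to collide with other sessions. */
    static String newTag() {
        return UUID.randomUUID().toString();
    }

    /** Builds the session command that sets the tag property. */
    static String setCommand(String tag) {
        return "set " + TAG_KEY + "=" + tag;
    }

    /**
     * Runs the tagged query on a worker thread.
     * Assumption: the driver accepts "set" through Statement.execute.
     */
    static Future<Boolean> submit(ExecutorService pool, Connection conn,
                                  String tag, String sql) {
        return pool.submit(() -> {
            try (Statement stmt = conn.createStatement()) {
                stmt.execute(setCommand(tag));
                return stmt.execute(sql); // blocks this worker thread only
            }
        });
    }

    /** Returns true if the job.xml text holds TAG_KEY with the given value. */
    static boolean jobXmlHasTag(String jobXml, String tag) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(jobXml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse job.xml", e);
        }
        NodeList props = doc.getElementsByTagName("property");
        for (int i = 0; i < props.getLength(); i++) {
            Element p = (Element) props.item(i);
            if (TAG_KEY.equals(text(p, "name")) && tag.equals(text(p, "value"))) {
                return true;
            }
        }
        return false;
    }

    /** Text of the first child element with the given name, or null if absent. */
    private static String text(Element parent, String child) {
        NodeList nodes = parent.getElementsByTagName(child);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }
}
```

How the job.xml text is fetched (from the job tracker web UI or otherwise) is left out here, since that depends on the cluster setup.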
We do something like this
select /* mynamehere */ ...
This way we know exactly who to blame when the cluster gets saturated from one look at the JobTracker page.
answered Jun 24 2010 at 18:08 by Edward Capriolo
Good ideas!
That works great for manually looking at the job tracker UI... but is there a way to figure out the job ID of the query programmatically in order to track the progress? Or do I need to screen-scrape the job tracker web page somehow?
Thanks, Ryan
answered Jun 24 2010 at 18:16 by Ryan LeCompte
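For what the screen-scraping route might look like, here is a minimal Java sketch that pulls job IDs out of job tracker page HTML for rows mentioning a given tag, such as the `/* mynamehere */` comment. It assumes each job occupies one `<tr>...</tr>` row containing both its ID and the query text, which may not hold for every Hadoop version. The class `JobTrackerScrape` and the method `jobIdsForTag` are made-up names, and fetching the page itself is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class JobTrackerScrape {
    // Hadoop job IDs have the form job_<digits>_<digits>, e.g. job_201006241430_0007.
    private static final Pattern JOB_ID = Pattern.compile("job_\\d+_\\d+");

    // Assumption: one table row per job on the scraped page.
    private static final Pattern ROW =
            Pattern.compile("<tr.*?</tr>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    /** Returns the distinct job IDs found in rows whose text contains the tag. */
    static List<String> jobIdsForTag(String html, String tag) {
        List<String> ids = new ArrayList<>();
        Matcher row = ROW.matcher(html);
        while (row.find()) {
            String rowHtml = row.group();
            if (!rowHtml.contains(tag)) {
                continue; // row belongs to some other query
            }
            Matcher id = JOB_ID.matcher(rowHtml);
            if (id.find() && !ids.contains(id.group())) {
                ids.add(id.group());
            }
        }
        return ids;
    }
}
```

A regex over HTML is brittle by nature, so this is only a fallback if no API access to the job tracker is available.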
I believe anything you can do from the web page you can do from the Hadoop Java API.
JVS
answered Jun 24 2010 at 20:53 by John Sichi