Hi,
is it possible to kill a running query (including all the Hadoop jobs
behind it)?
I think not, because the Hive JDBC driver doesn't implement .close()
and .cancel() on the (prepared) statement.
The attached code shows the problem.
Before the statement gets executed, it spawns a thread that tries to
stop the execution of the query after 10 seconds.
Are there any other ways to stop the job on the cluster?
I could do it over the JobClient, but for that I need the JobId.
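The watchdog-thread approach described above can be sketched as follows. This is not the attached code: the HiveStatement, the query, and the 10-second timeout are replaced by stand-ins (an AtomicBoolean flag and a short spin loop) so the pattern is self-contained; in the real code the watchdog would call .cancel() on the running statement instead.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class CancelAfterTimeout {
    // Stand-in for the statement: in the real code this flag would be
    // a HiveStatement, and cancel() would be called on it instead.
    public static final AtomicBoolean cancelled = new AtomicBoolean(false);

    // Spawns the watchdog, "executes" a fake long-running query, and
    // returns true once the watchdog has cancelled it.
    public static boolean execute(long timeoutMillis) {
        Thread watchdog = new Thread(() -> {
            try {
                TimeUnit.MILLISECONDS.sleep(timeoutMillis); // 10 s in the mail
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            cancelled.set(true); // stmt.cancel() in the real code
        });
        watchdog.start();
        // Simulated query: spins until the watchdog flips the flag.
        while (!cancelled.get()) {
            Thread.onSpinWait();
        }
        return cancelled.get();
    }

    public static void main(String[] args) {
        System.out.println("cancelled: " + execute(100));
    }
}
```

The problem reported in the thread is that with the HiveServer2 driver this cancel() call was a no-op, so the MapReduce jobs kept running on the cluster.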
Thanks a lot.
Best Regards,
Christian.
Asked Jun 25 2013 at 09:51 in Hive-User by Christian Schneider

7 Answers

I figured out that there are two implementations of the Hive JDBC driver in
the hive-jdbc-0.10-cdh4.2.0 jar:
1. org.apache.hadoop.hive.jdbc.HiveStatement
2. org.apache.hive.jdbc.HiveStatement
The first implements .close() and .cancel(), but it still doesn't kill the
running jobs on the cluster.
Any suggestions?
Answered Jun 25 2013 at 10:22 by Christian Schneider
Well... if the query created an MR job on your cluster then there's always:
1. Use the JobTracker to find your job id.
2. Use hadoop job -kill <job_id> to nuke it.
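In command form (the job id below is a made-up placeholder; use whatever the JobTracker actually shows for your query):

```shell
# list running jobs to find the id (or read it off the JobTracker UI)
hadoop job -list
# then kill it; job_... is a placeholder id
hadoop job -kill job_201306251045_0001
```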
you're saying there's no way to interrupt/kill the query from the client?
That very well may be the case.
Answered Jun 25 2013 at 10:44 by Stephen Sprague
Hi Stephen, thanks for the answer.
Identifying the JobId is not that easy. I also thought about this.
Our application now adds a unique prefix to all queries. With this we can
identify the job. Something like this:

-- UUID: 3242-414-124-14...
SELECT * FROM foobar;

Now our application can filter for job names starting with "-- UUID:
3242-414-124-14..." to kill the query.
But I think this is more of a workaround than a reliable solution, isn't it?
Best Regards,
Christian.
Answered Jun 25 2013 at 10:49 by Christian Schneider
Yes, that's a great idea with the comment. And yes, I wouldn't disagree
that it's a workaround.
It would be nice if HiveServer2 could return a job_id before the MR job
kicks off (like the local Hive client does); then at least you'd have an
unequivocal job_id. But ultimately, until the JDBC client implements
cancel() correctly, I don't know of an alternative. And even then the
connection between client and server can sometimes get broken, and you'd
have to fall back to a more ruthless way to kill the job on the server,
like the one above.
So, yeah, you might have to go with rolling your own here. I like it! :)
Answered Jun 25 2013 at 11:05 by Stephen Sprague
Hi Christian,
Sounds like a workaround, but how do you prefix the job with a certain
name? Is that possible within a Hive query statement?
Best regards,
Robin Verlangen
Data Architect
W http://www.robinverlangen.nl
What is CloudPelican? <http://goo.gl/HkB3D>
Answered Jun 25 2013 at 11:05 by Robin Verlangen
All it is is a comment on the line above the first statement, and that will
show up in the JobTracker, just as he shows in his example.
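Given that the comment shows up in the JobTracker's job names, the kill side of the workaround reduces to string matching. A minimal sketch of just that matching logic (the job names and the `matching` helper are made up for illustration; in the real application the names would come from the Hadoop JobClient, and each hit would then be killed with hadoop job -kill):

```java
import java.util.ArrayList;
import java.util.List;

public class JobByPrefix {
    // Returns the job names that carry the query's UUID prefix.
    // Illustrative only: real code would iterate over JobClient's
    // running jobs instead of a List<String> of names.
    static List<String> matching(List<String> jobNames, String uuidPrefix) {
        List<String> hits = new ArrayList<>();
        for (String name : jobNames) {
            if (name.startsWith(uuidPrefix)) {
                hits.add(name); // candidate job to kill
            }
        }
        return hits;
    }
}
```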
Answered Jun 25 2013 at 12:59 by Stephen Sprague
We use https://issues.apache.org/jira/browse/HIVE-3235 and kill jobs if
needed.
Answered Jun 25 2013 at 19:19 by Navis류승우
