Hi,
When a Hive query is submitted to HiveServer2 over JDBC, is there a way to get the Hadoop job ID (and status) for that query?
The JDBC call "statement.execute(hiveQuery)" is a blocking call. Specifically, is there any way to run a query on the same JDBC connection from another thread to find out the job ID?
For now, I am following this approach: Before submitting the actual query, I execute the following on the same statement:
set mapred.job.name=myjob.<pid>.<currentTime>
Here <pid> is the process ID of the submitting Java process and <currentTime> is obtained using System.currentTimeMillis().
This sets the job name for the subsequent queries. I can then look up the job ID for this job name using the JobClient and monitor the job status using that job ID.
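A minimal sketch of the naming step described above, assuming Java 9 or later for ProcessHandle. The class and method names are hypothetical. The JobClient lookup is left out because it needs the Hadoop client libraries on the classpath.

```java
final class JobNameSketch {
    // Builds a job name of the form myjob.<pid>.<currentTime>.
    static String uniqueJobName(long pid, long currentTimeMillis) {
        return "myjob." + pid + "." + currentTimeMillis;
    }

    // Builds the SET command to run on the statement before the real query.
    static String setJobNameCommand(String jobName) {
        return "set mapred.job.name=" + jobName;
    }

    // Returns the process ID of the current JVM (requires Java 9+).
    static long currentPid() {
        return ProcessHandle.current().pid();
    }

    public static void main(String[] args) {
        String name = uniqueJobName(currentPid(), System.currentTimeMillis());
        // The printed command would be passed to statement.execute(...)
        // ahead of the actual Hive query.
        System.out.println(setJobNameCommand(name));
    }
}
```

Keeping the generated name in a variable lets the same string be reused later when searching the cluster's job list by name.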
Let me know if there is a better way to proceed.
-Kiran
Asked May 28 2015 at 21:23 in Hive-User by Lonikar, Kiran

7 Answers

Hi Kiran,
Which version of Hive are you using?
In the 1.2 release, the client can set session-level logging via hive.server2.logging.operation.level. Setting this parameter to EXECUTION should expose the MapReduce job information associated with the query on the client side, and you should be able to retrieve it in a parallel thread while the query is running. This idea is demonstrated in the following hive-unit test:
https://github.com/apache/hive/blob/master/itests/hive-unit/src/test/java/org/apache/hive/service/cli/operation/TestOperationLoggingAPIWithMr.java
More information about the related parameter can be found here:
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.server2.logging.operation.level
For the above parameter to work, hiveserver2 should have logging enabled, i.e. hive.server2.logging.operation.enabled should be set to true (default is true) when you start hiveserver2.
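Once the execution-level log lines reach the client, the job IDs can be pulled out with a regular expression. A minimal sketch, assuming the log text contains Hadoop job IDs of the form job_<identifier>_<sequence>. The class name, the pattern, and the sample lines are illustrative assumptions and are not taken from Hive itself.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class JobIdScraper {
    // Assumed shape of a Hadoop job ID: "job_", an identifier, "_", a sequence number.
    private static final Pattern JOB_ID =
            Pattern.compile("\\bjob_[A-Za-z0-9]+_\\d+\\b");

    // Returns the distinct job IDs found in the log lines, in first-seen order.
    static List<String> extractJobIds(List<String> logLines) {
        Set<String> ids = new LinkedHashSet<>();
        for (String line : logLines) {
            Matcher m = JOB_ID.matcher(line);
            while (m.find()) {
                ids.add(m.group());
            }
        }
        return new ArrayList<>(ids);
    }

    public static void main(String[] args) {
        // Made-up sample lines; real log wording may differ between Hive versions.
        List<String> sample = List.of(
                "Starting Job = job_1432848180000_0007, Tracking URL = http://example.invalid/",
                "Ended Job = job_1432848180000_0007");
        System.out.println(extractJobIds(sample)); // prints [job_1432848180000_0007]
    }
}
```

Matching on the ID shape rather than on the surrounding message text keeps the scraper less sensitive to changes in log wording, although it is still log scraping.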
Thanks
Hari
Answered May 29 2015 at 02:01 by Hari Subramaniyan
Hi Hari,
I am using Hive 0.13 and above. Thanks for the info. The example you provided uses the CLI service client and does not seem to use JDBC. The JDBC calls are blocking (there is no API like executeStatementAsync). Does the class org.apache.hive.service.cli.CLIServiceClient use JDBC internally?
Does it involve log scraping? I would prefer a programmatic interface.
-Kiran
Answered May 29 2015 at 03:17 by Lonikar, Kiran
Hi Kiran,
For async calls, see https://github.com/apache/hive/blob/master/itests/hive-unit/src/test/java/org/apache/hive/service/cli/operation/TestOperationLoggingAPIWithMr.java#L83
The client in this case uses MiniHS2.getClient(), which uses JDBC internally. The suggested method does involve log scraping. I am not sure there is a direct API, other than parsing the logs, for retrieving the MapReduce job IDs associated with Hive queries run via HiveServer2.
Thanks
Hari
Answered May 29 2015 at 04:44 by Hari Subramaniyan
Hi Hari,
Thanks for your prompt replies. I went through HIVE-7615 and HIVE-4629. It appears that the log retrieval also happens over JDBC. It would be great if you could confirm this, and also whether it works when running HS2 over HTTP. The JIRAs talk about modifications to the Thrift interface, not HTTP.
If the log retrieval does not happen over JDBC, or does not work in HS2 HTTP mode, it will not work over Knox. I want a solution that works over Knox.
If it doesn't, I will go with my solution of giving my query a unique job name and then looking up the ID using the job client.
-Kiran
Answered May 30 2015 at 05:39 by Lonikar, Kiran
Hi Kiran,
The HS2 logging (introduced via HIVE-7615 and HIVE-4629) will work with HS2 in HTTP mode. Theoretically, it should work over Knox as well, although I haven't tried it myself.
Thanks
Hari
Answered May 30 2015 at 11:04 by Hari Subramaniyan
Hi Hari,
Thanks. I also found this in the Beeline code: https://github.com/apache/hive/blob/master/beeline/src/java/org/apache/hive/beeline/Commands.java
The execute, createLogRunnable and showRemainingLogsIfAny functions show the usage. Beeline uses a JDBC API extension through the Hive-specific HiveStatement (which implements JDBC's java.sql.Statement). HiveStatement has the APIs "boolean hasMoreLogs()" and "List<String> getQueryLog()".
These APIs were introduced in Hive 0.14, and I do not think they are documented anywhere.
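The polling pattern can be sketched as follows. LogSource is a hypothetical stand-in that mirrors the two HiveStatement methods named above, so the sketch compiles without the hive-jdbc jar. With the real driver, the statement would presumably be cast to HiveStatement and polled from a second thread while execute() blocks. The in-memory fake and the poll interval are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

final class QueryLogPoller {
    // Hypothetical stand-in for the two HiveStatement methods mentioned above.
    interface LogSource {
        boolean hasMoreLogs();
        List<String> getQueryLog() throws Exception;
    }

    // Polls until the source reports no more logs, then fetches once more
    // to pick up anything written after the last poll.
    static List<String> drain(LogSource source, long pollIntervalMillis) throws Exception {
        List<String> collected = new ArrayList<>();
        while (source.hasMoreLogs()) {
            collected.addAll(source.getQueryLog());
            Thread.sleep(pollIntervalMillis);
        }
        collected.addAll(source.getQueryLog());
        return collected;
    }

    public static void main(String[] args) throws Exception {
        // In-memory fake that hands out one batch of lines per call.
        List<List<String>> batches = List.of(List.of("line 1"), List.of("line 2", "line 3"));
        int[] next = {0};
        LogSource fake = new LogSource() {
            public boolean hasMoreLogs() {
                return next[0] < batches.size();
            }
            public List<String> getQueryLog() {
                if (next[0] < batches.size()) {
                    return batches.get(next[0]++);
                }
                return new ArrayList<>();
            }
        };
        // Run the poller on its own thread, as one would alongside a blocking execute().
        Thread poller = new Thread(() -> {
            try {
                drain(fake, 10L).forEach(System.out::println);
            } catch (Exception e) {
                e.printStackTrace();
            }
        });
        poller.start();
        poller.join();
    }
}
```

The final fetch after the loop plays the same role as Beeline's showRemainingLogsIfAny: it collects lines that arrive between the last poll and the end of the query.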
-Kiran
Answered Jun 1 2015 at 23:45 by Lonikar, Kiran
Hi Kiran,
You can find the API documentation for Hive here:
http://hive.apache.org/javadocs/r1.2.0/api/
For an earlier version, please make sure to look at the correct one from http://hive.apache.org/javadocs/
Thanks
Hari
Answered Jun 2 2015 at 18:14 by Hari Subramaniyan