QnaList > Groups > Spark-User > Apr 2016
faq

Standard Deviation In Hive 2 Is Still Incorrect

Hi,
I reported back in April 2015 that what Hive calls Standard Deviation
Function  STDDEV is a pointer to STDDEV_POP. This is incorrect and has not
been rectified in Hive 2
Both Oracle and Sybase point STDDEV to STDDEV_SAMP not STDDEV_POP. Also I
did tests with Spark 1.6 as well and Spark correctly points STTDEV to
STDDEV_SAMP.
The following query was used
SELECT
SQRT((SUM(POWER(AMOUNT_SOLD,2))-(COUNT(1)*POWER(AVG(AMOUNT_SOLD),2)))/(COUNT(1)-1))
AS MYSTDDEV,
        STDDEV(amount_sold) AS STDDEV,
        STDDEV_SAMP(amount_sold) AS STDDEV_SAMP,
        STDDEV_POP(amount_sold) AS STDDEV_POP
from    sales;
The following is from running the above query on Hive where STDDEV -->
 STDDEV_POP which is incorrect
+--------------------+---------------------+--------------------+---------------------+--+
|      mystddev      |       stddev        |    stddev_samp     |
stddev_pop      |
+--------------------+---------------------+--------------------+---------------------+--+
| 260.7270919450411  | 260.72704617040444  | 260.7270722861465  |
260.72704617040444
|
+--------------------+---------------------+--------------------+---------------------+--+
The following is from Spark-sql where STDDEV -->  STDDEV_SAMP which is
correct
+--------------------+---------------------+--------------------+---------------------+--+
|      mystddev      |       stddev        |    stddev_samp     |
stddev_pop      |
+--------------------+---------------------+--------------------+---------------------+--+
| 260.7270919450411  | 260.7270722861637   | 260.7270722861637  |
260.72704617042166  |
+--------------------+---------------------+--------------------+---------------------+--+
Hopefully The Hive one will be corrected.
Thanks
Dr Mich Talebzadeh
LinkedIn * https://www.linkedin.com/profile/view?idÔŅĹEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
*
http://talebzadehmich.wordpress.com

asked Apr 4 2016 at 15:27

Mich Talebzadeh 's gravatar image



Related discussions

Tagged

Group Spark-user

asked Apr 4 2016 at 15:27

active Apr 4 2016 at 15:27

posts:1

users:1

Spark-dev

Spark-user

©2013 QnaList.com