[Hive-User] Hive equivalent of row_number()

I have a table with three columns, A, B, and Score, where A and B are items and Score
is a measure of affinity between A and B. There are N distinct items in each of A and B, so
the table has N^2 rows in total.
Is there a way to fetch the "top 5 items in B" for each item in A? That is, for each distinct
item in A, I want to look up the 5 items in B with the highest Score.
In DB2, I would probably use a windowing function such as row_number().
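
To make the intent concrete, here is a minimal sketch of the row_number() approach described above. It runs against an in-memory SQLite database (version 3.25 or later, for window-function support) purely so that it is self-contained. The table name `affinity`, the column names `a`, `b`, `score`, and the helper `top_n_per_a` are all hypothetical. Whether the same query runs in Hive depends on the Hive version having windowing support.

```python
import sqlite3

# Hypothetical schema: affinity(a, b, score). The inner query numbers the rows
# within each value of a, highest score first; the outer query keeps the first n.
TOP_N_QUERY = """
SELECT a, b, score
FROM (
    SELECT a, b, score,
           ROW_NUMBER() OVER (PARTITION BY a ORDER BY score DESC) AS rn
    FROM affinity
) ranked
WHERE rn <= :n
ORDER BY a, rn
"""


def top_n_per_a(rows, n=5):
    """Return the n highest-scoring (a, b, score) rows for each distinct a."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE affinity (a TEXT, b TEXT, score REAL)")
        conn.executemany("INSERT INTO affinity VALUES (?, ?, ?)", rows)
        return conn.execute(TOP_N_QUERY, {"n": n}).fetchall()
    finally:
        conn.close()
```

Calling `top_n_per_a(rows)` with a list of `(a, b, score)` tuples returns at most five rows per distinct `a`, ordered by descending score within each `a`. Ties in score are broken arbitrarily by ROW_NUMBER(), so which of two equally scored items makes the cut is not guaranteed.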

asked Apr 12 2012 at 20:43

Saurabh S

http://www.quora.com/Hive-computing/How-are-SQL-type-analytic-and-windowing-functions-accomplished-in-Hadoop-Hive
--
Alex K

answered Apr 12 2012 at 20:46

Alex Kozlov
