[Hbase-User] Collation order of items

Is there any way to introduce a different ordering scheme from
the base comparable bytes?  My use case is that I am using UTF-8 data
for my keys, and I would like to have scans use UTF-8 collation.
Could this be done by providing an alternate implementation of
WritableComparable?
Thanks in advance!
--Tom

asked Jun 8 2012 at 16:35

Tom Brown

5 Replies for : Collation Order Of Items

On Fri, Jun 8, 2012 at 9:35 AM, Tom Brown  wrote:
> Is there any way to introduce a different ordering scheme from
> the base comparable bytes?  My use case is that I am using UTF-8 data
> for my keys, and I would like to have scans use UTF-8 collation.
>
> Could this be done by providing an alternate implementation of
> WritableComparable?
>
> Thanks in advance!
>
Unfortunately, no, Tom.  The database is all sorted the same way.
Different sorts per table would complicate system interactions (the
catalog tables would have to change sort by table).  It might be
doable but it would take some work.
Can you store your data as UTF-16 or UTF-32?  It's been a while since I dealt
w/ this stuff, but IIRC, their sort order is byte order?  (WARNING!  I
could be way off here.)
St.Ack

answered Jun 8 2012 at 17:14

Stack

Storing the bytes as native UTF-16 or UTF-32 will not help.  Even
strings in UTF-8 already sort by code point when compared as raw
bytes.  Unfortunately, code-point order is not really useful for collation, as
characters like "è" (U+00E8) should appear between "e" (U+0065) and
"f" (U+0066), but code-point order puts "è" after "f".
Thanks anyway!
--Tom

answered Jun 8 2012 at 17:34

Tom Brown
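Tom's point can be checked with a minimal Java sketch. It assumes Java 9+ for `Arrays.compareUnsigned`, and the class and method names are hypothetical. The code sorts the same three strings two ways:

- by their encoded bytes, using unsigned lexicographic comparison as HBase does for row keys;
- with the JDK's `java.text.Collator` for English.

Both UTF-8 and UTF-16BE put "è" after "f". Only the collator puts it between "e" and "f".

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class ByteOrderVsCollation {

    // Order strings by unsigned lexicographic comparison of their encoded
    // bytes, which is how HBase compares row keys.
    static List<String> sortByEncodedBytes(List<String> words, Charset charset) {
        List<String> sorted = new ArrayList<>(words);
        sorted.sort((a, b) -> Arrays.compareUnsigned(a.getBytes(charset), b.getBytes(charset)));
        return sorted;
    }

    // Order strings with the JDK's locale-aware collator.
    static List<String> sortByCollator(List<String> words, Locale locale) {
        List<String> sorted = new ArrayList<>(words);
        sorted.sort(Collator.getInstance(locale));
        return sorted;
    }

    public static void main(String[] args) {
        // U+00E8 (e-grave) is written as an escape to keep the source ASCII.
        List<String> words = List.of("f", "\u00E8", "e");
        System.out.println(sortByEncodedBytes(words, StandardCharsets.UTF_8));    // order: e, f, e-grave
        System.out.println(sortByEncodedBytes(words, StandardCharsets.UTF_16BE)); // order: e, f, e-grave
        System.out.println(sortByCollator(words, Locale.ENGLISH));                // order: e, e-grave, f
    }
}
```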

Tom, another approach you could take would be to store an ASCII encoded version of the string
as the row key or column qualifier, and then the full UTF-8 string elsewhere (e.g. in the
cell value, or even later in the row key). That wouldn't work out the fine sorting (whether
"è" sorts before or after "e") but it would solve the gross sorting ("è" would always come
before "f"). If you need true UTF-8 collation in the results, you could then implement it
as a layer on top of that (in your app, or maybe a co-processor, I'm not sure about the latter).
But at least with this approach, you'd be able to take advantage of rowkey ranges in your
scans, which would probably make up for any time spent doing a secondary sort.
Ian

answered Jun 8 2012 at 17:40

Ian Varley
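A minimal Java sketch of Ian's layout, under stated assumptions (Java 9+; all names hypothetical, not an HBase API):

- **Folding.** The "ASCII encoded version" is produced by Unicode decomposition (NFD) followed by stripping combining marks. This is only one possible folding, and it leaves letters without a decomposition (such as "ø" or CJK characters) unchanged.
- **Layout.** The folded prefix is followed by a 0x00 separator and then the full UTF-8 string. This picks Ian's "later in the row key" option rather than the cell value.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FoldedPrefixKey {

    // Hypothetical folding: decompose to NFD, then drop combining marks, so
    // e-grave (U+00E8) becomes plain "e". Letters without a decomposition
    // (for example U+00F8 or CJK characters) pass through unchanged.
    static String foldToAscii(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
    }

    // Hypothetical row-key layout: folded prefix, a 0x00 separator, then the
    // full UTF-8 string. Because 0x00 sorts below every printable character,
    // a folded prefix sorts before any longer prefix that starts with it.
    static byte[] rowKey(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] folded = foldToAscii(s).getBytes(StandardCharsets.UTF_8);
        out.write(folded, 0, folded.length);
        out.write(0x00);
        byte[] full = s.getBytes(StandardCharsets.UTF_8);
        out.write(full, 0, full.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<>(List.of("f", "\u00E8", "ez", "e"));
        // Sort by raw unsigned bytes only, as HBase would order the row keys.
        words.sort((a, b) -> Arrays.compareUnsigned(rowKey(a), rowKey(b)));
        System.out.println(words); // order: e, e-grave, ez, f
    }
}
```

This gives the gross ordering Ian describes: "è" lands before "f". Among strings that share a folded prefix, however, the tie is broken by the trailing UTF-8 bytes, which is code-point order rather than true collation. That is why the secondary sort he mentions may still be needed.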

Yet another approach is to transform your keys into byte comparable values
that preserve your desired sort order, and store that instead. The ICU
library has the ability to do this for various collations of UTF strings:
http://userguide.icu-project.org/collation/architecture#TOC-Sort-Keys
So for this case HBase could store the ICU sortkey rather than the actual
UTF string. You then get correct scans, but just as in Ian's example, you
need to implement a layer that converts the UTF strings in your client
requests to sortkeys before they reach HBase. This will almost certainly give you better HBase
performance since memcmp is generally faster than a custom comparator.

answered Jun 8 2012 at 17:58

Jason Frantz
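Jason's suggestion can be sketched with the JDK's built-in `java.text.Collator` standing in for ICU. This substitution is an assumption so the example runs without extra dependencies; ICU4J ships its own `Collator` with more complete collation data. `CollationKey.toByteArray()` is documented to produce byte arrays that compare the same way as the keys themselves, most significant byte first. Plain unsigned byte comparison, which is how HBase orders row keys, therefore reproduces the collation order. The class and method names below are hypothetical, and Java 9+ is assumed for `Arrays.compareUnsigned`.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class CollationSortKeys {

    // English rules at the default (tertiary) strength; the locale is an assumption.
    private static final Collator COLLATOR = Collator.getInstance(Locale.ENGLISH);

    // Map a string to bytes whose unsigned lexicographic order matches the
    // collator's order; these bytes would be stored as the HBase row key.
    static byte[] sortKey(String s) {
        CollationKey key = COLLATOR.getCollationKey(s);
        return key.toByteArray();
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<>(List.of("f", "\u00E8", "e"));
        // Sort using only raw byte comparison of the keys, as HBase would.
        words.sort((a, b) -> Arrays.compareUnsigned(sortKey(a), sortKey(b)));
        System.out.println(words); // order: e, e-grave, f
    }
}
```

The client-side layer Jason mentions would apply the same conversion to scan start and stop rows. A sort key cannot in general be turned back into the original string, so this sketch assumes the original UTF-8 string is kept in a cell value. The keys are also tied to the collator's locale and strength, so changing either means rewriting the stored keys.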

On Fri, Jun 8, 2012 at 10:58 AM, Jason Frantz  wrote:
> Yet another approach is to transform your keys into byte comparable values
> that preserve your desired sort order, and store that instead. The ICU
> library has the ability to do this for various collations of UTF strings:
>
> http://userguide.icu-project.org/collation/architecture#TOC-Sort-Keys
>
> So for this case HBase could store the ICU sortkey rather than the actual
> UTF string. You then get correct scans, but just as in Ian's example, you
> need to implement a layer that converts the UTF strings in your client
> requests to sortkeys before they reach HBase. This will almost certainly give you better HBase
> performance since memcmp is generally faster than a custom comparator.
I love this mailing list. Thanks, you just helped solve a problem for
me unrelated to HBase.
Best regards,
   - Andy
Problems worthy of attack prove their worth by hitting back. - Piet
Hein (via Tom White)

answered Jun 8 2012 at 18:59

Andrew Purtell
