Tom, another approach would be to store an ASCII-folded version of the string (accents
stripped, so "è" becomes "e") as the row key or column qualifier, and keep the full UTF-8
string elsewhere (e.g. in the cell value, or even later in the row key). That wouldn't settle
the fine sorting (whether "è" sorts before or after "e"), but it would solve the gross sorting
("è" would always come before "f"). If you need true Unicode collation in the results, you
could then implement it as a layer on top of that (in your app, or maybe in a coprocessor,
though I'm not sure about the latter).
But at least with this approach, you'd be able to take advantage of row key ranges in your
scans, which would probably make up for any time spent doing a secondary sort.
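
To make that concrete, here's a rough sketch of the key layout in plain Python (no HBase client involved; sorting byte strings just stands in for HBase's byte-wise row key order). The names ascii_fold and make_row_key and the NUL separator are my own choices for illustration, not anything HBase provides, and the folding is deliberately naive: it only handles characters that decompose to an ASCII letter plus combining marks, and silently drops anything else (e.g. "ø", "ß").

```python
import unicodedata

SEP = b"\x00"  # assumed separator; sorts below every printable byte


def ascii_fold(text: str) -> str:
    """Naive ASCII folding: decompose, drop combining marks, drop non-ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.encode("ascii", "ignore").decode("ascii")


def make_row_key(text: str) -> bytes:
    """Folded form first (drives the sort), then the original UTF-8 bytes."""
    return ascii_fold(text).encode("ascii") + SEP + text.encode("utf-8")


names = ["f", "è", "e", "é", "d"]

# Raw UTF-8 bytes put the accented letters after "f" ...
raw_order = sorted(names, key=lambda s: s.encode("utf-8"))

# ... while the folded keys keep them next to "e".
keys = sorted(make_row_key(n) for n in names)
folded_order = [k.split(SEP, 1)[1].decode("utf-8") for k in keys]

# Equivalent of a scan with start row "e" and stop row "f" (stop exclusive).
in_range = [k.split(SEP, 1)[1].decode("utf-8") for k in keys if b"e" <= k < b"f"]

print(raw_order)
print(folded_order)
print(in_range)
```

Within the "e" group the order is still just byte order of the UTF-8 suffix, so that's where the secondary collation pass would go if you need it.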