I have an application that needs to find the n most recently modified
files for a given user id. I started out with a few tables but still couldn't
get what I want. I hope someone can point me in the right direction.
See my tables below.
#1 won't work, because file_id's timeuuid contains the creation time, not the
modification time.
#2 won't work, because I can't order by a non-primary-key
column (modified_date).
#3, #4: although I can now get a time series of modification times for each
file belonging to a user, my returned list may still not be accurate because a
single directory could have a lot of modifications. I basically end up
pulling out a series of modification timestamps for the same directory.
Any suggestions?
Thanks
#1
CREATE TABLE user_file (
user_id uuid,
file_id timeuuid,
PRIMARY KEY(user_id, file_id)
);
#2
CREATE TABLE user_file (
user_id uuid,
file_id timeuuid,
modified_date timestamp,
PRIMARY KEY(user_id, file_id)
);
#3
CREATE TABLE user_file (
user_id uuid,
file_id timeuuid,
modified_date timestamp,
PRIMARY KEY(user_id, file_id, modified_date)
);
#4
CREATE TABLE user_file (
user_id uuid,
modified_date timestamp,
file_id timeuuid,
PRIMARY KEY(user_id, modified_date, file_id)
);
Asked Jul 9 2013 at 23:51 in Cassandra-User by Jimmy Lin

4 Answers

From what you described, this sounds like the most appropriate:
CREATE TABLE user_file (
user_id uuid,
modified_date timestamp,
file_id timeuuid,
PRIMARY KEY(user_id, modified_date)
);
If you normally need more information about the file then either store that as additional fields or pack the data using something like JSON or Protobuf.
Not sure I understand the problem.
Cheers
Aaron Morton
Freelance Cassandra Consultant
New
Answered Jul 11 2013 at 00:00 by aaron morton
What I mean is, I really just want the last modified date instead of a series
of timestamps, and I still want to be able to sort or order by it.
(Maybe I should rephrase my question as: how do I sort or order by the last
modified column in a row?)
CREATE TABLE user_file (
user_id uuid,
modified_date timestamp,
file_id timeuuid,
PRIMARY KEY(user_id, modified_date)
);
e.g. user1 updates file A 3 times in a row, then updates file B, then updates
file A again.
INSERT INTO user_file (user_id, modified_date, file_id) VALUES (user1_uuid, date1, file_a_uuid);
INSERT INTO user_file (user_id, modified_date, file_id) VALUES (user1_uuid, date2, file_a_uuid);
INSERT INTO user_file (user_id, modified_date, file_id) VALUES (user1_uuid, date3, file_a_uuid);
INSERT INTO user_file (user_id, modified_date, file_id) VALUES (user1_uuid, date4, file_b_uuid);
INSERT INTO user_file (user_id, modified_date, file_id) VALUES (user1_uuid, date5, file_a_uuid);
# trying to get the top 3 most recently changed files
SELECT * FROM user_file WHERE user_id = user1_uuid LIMIT 3;
Using CQL, I will get 3 rows back (all file A):
(user1_uuid, date1, file_a_uuid)
(user1_uuid, date2, file_a_uuid)
(user1_uuid, date3, file_a_uuid)
What I want is one row per file, with its last modified date (file A AND file B):
(user1_uuid, date5, file_a_uuid)
(user1_uuid, date4, file_b_uuid)
So how do I order by/sort by the last modified column in a row?
thanks
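The gap between the two result sets above can be reproduced without a cluster. The following is only an in-memory sketch in Python, not Cassandra code: the string ids, the concrete dates, and the helper names `select_limit` and `latest_per_file` are all made up for illustration, and it assumes the wanted row for file A carries its latest modification (date5).

```python
from datetime import datetime, timedelta

# Hypothetical stand-ins for the uuids and dates used in the example.
USER1 = "user1_uuid"
base = datetime(2013, 7, 10)
date1, date2, date3, date4, date5 = (base + timedelta(minutes=i) for i in range(5))

# One row per modification, as with PRIMARY KEY(user_id, modified_date).
rows = [
    (USER1, date1, "file_a_uuid"),
    (USER1, date2, "file_a_uuid"),
    (USER1, date3, "file_a_uuid"),
    (USER1, date4, "file_b_uuid"),
    (USER1, date5, "file_a_uuid"),
]


def select_limit(rows, user_id, limit):
    """Mimic SELECT ... WHERE user_id = ? LIMIT n on an ascending time series."""
    matching = sorted((r for r in rows if r[0] == user_id), key=lambda r: r[1])
    return matching[:limit]


def latest_per_file(rows, user_id, limit):
    """The result the question asks for: one row per file, newest first."""
    newest = {}
    for uid, modified, file_id in rows:
        if uid == user_id and (file_id not in newest or modified > newest[file_id]):
            newest[file_id] = modified
    ranked = sorted(newest.items(), key=lambda kv: kv[1], reverse=True)
    return [(user_id, modified, file_id) for file_id, modified in ranked[:limit]]


got = select_limit(rows, USER1, 3)        # three rows, all for file A
wanted = latest_per_file(rows, USER1, 3)  # file A at date5, then file B at date4
```

The time-series table stores every modification, so a plain LIMIT counts modifications rather than files.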
Answered Jul 11 2013 at 00:39 by Jimmy Lin
Thanks for the suggestion.
I don't care about the history of update times for a file, BUT I do want to
order by the last one.
The reason is that, without that, if I have 10k+ files belonging to a
user, I have to fetch the last modified time of all these 10k+ files,
sort through them in my application, and return only the top N. Kind of
expensive.
I would like to see if it is possible to rely on Cassandra's native storage
to achieve this.
CREATE TABLE user_file (
user_id uuid,
file_id timeuuid,
last_modified_time timestamp,
PRIMARY KEY(user_id, file_id)
);
select * from user_file where user_id=user1_uuid order by
last_modified_time limit 10
The above CQL would be invalid, because last_modified_time is not part of the
compound key, and so is not allowed to be used for ORDER BY.
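For reference, the application-side fallback described above (fetch every file's last modified time, then keep the top N) might look like the sketch below. It is plain Python over made-up data: `top_n_recent` and the generated file ids and timestamps are hypothetical, and the rows are assumed to have already been read out of the (user_id, file_id) table.

```python
import heapq
from datetime import datetime, timedelta


def top_n_recent(files, n):
    """files: iterable of (file_id, last_modified_time). Return the n newest, newest first."""
    return heapq.nlargest(n, files, key=lambda f: f[1])


# 10k made-up files with distinct, shuffled timestamps.
base = datetime(2013, 7, 1)
files = [(f"file_{i}", base + timedelta(seconds=(i * 7919) % 10007)) for i in range(10000)]

top10 = top_n_recent(files, 10)
```

The selection itself is cheap. The cost is that every row for the user has to be read from the partition before any of them can be discarded.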
Answered Jul 11 2013 at 01:10 by Jimmy Lin
I think there is not an extremely simple solution to your problem. You
will probably need to use multiple tables to get the view you need. One
keyed just by file UUID, which tracks some basic metadata about the file
including the last modified time. Another as a materialized view of the
most recently modified files.
When a user updates the file, you'd need to read the current last_modified
time for that file and delete its value out of the most_recently_modified
table before inserting it back in, and updating the last_modified on the
files table.
This is a little bit fragile because it depends on reading then modifying
based on that result - and that's a typical antipattern for
eventually-consistent databases.
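The read, delete, insert sequence can be modelled in memory as follows. This is a sketch in Python rather than driver code: the class and method names are hypothetical, and the key layout assumed for most_recently_modified, (user_id, last_modified, file_id), is a guess that the description above does not spell out.

```python
class FileIndex:
    """In-memory model of the two tables. All names here are hypothetical."""

    def __init__(self):
        # files table, keyed by file id: file_id -> (user_id, last_modified)
        self.files = {}
        # most_recently_modified table: rows of (user_id, last_modified, file_id)
        self.recent = set()

    def record_modification(self, user_id, file_id, modified):
        # 1. Read the current last_modified for the file (the fragile read step).
        previous = self.files.get(file_id)
        # 2. Delete the stale row from the view, if there is one.
        if previous is not None:
            self.recent.discard((previous[0], previous[1], file_id))
        # 3. Insert the new row into the view and update the files table.
        self.recent.add((user_id, modified, file_id))
        self.files[file_id] = (user_id, modified)

    def most_recent(self, user_id, n):
        """Newest first, one row per file, like a LIMIT n read in descending order."""
        rows = sorted((r for r in self.recent if r[0] == user_id),
                      key=lambda r: r[1], reverse=True)
        return [(file_id, modified) for _, modified, file_id in rows[:n]]


index = FileIndex()
# File A three times, then file B, then file A again (integers stand in for timestamps).
for modified, file_id in [(1, "file_a"), (2, "file_a"), (3, "file_a"), (4, "file_b"), (5, "file_a")]:
    index.record_modification("user1", file_id, modified)

result = index.most_recent("user1", 3)
```

If two writers run step 1 for the same file at the same time, both read the same previous value and only one stale row gets deleted, which is the race described above.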
You might consider using a column on the user table using the List type
which keeps track of the most recently modified files for that user; treat
it like a queue and pop off the oldest ones on each file write. This still
ends up being read-then-write, but presumably it is less prone to race
conditions because the user is not modifying many files at the same moment
in time, while many users could be modifying the same file at the same
moment. So it still falls under the antipattern, but at least the failures
will be less likely.
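A rough sketch of the list-as-queue idea, modelling only the list value in Python. `push_recent` and `max_len` are hypothetical names, and moving an already-listed file to the front instead of adding it a second time is an assumption on top of the description above.

```python
def push_recent(recent_files, file_id, max_len):
    """Return a new list with file_id at the front and the oldest entries dropped."""
    # Assumption: a file that is already listed moves to the front rather than repeating.
    updated = [file_id] + [f for f in recent_files if f != file_id]
    return updated[:max_len]


recent = []
# Read the current list, compute the new one, write it back: still read-then-write.
for file_id in ["file_a", "file_a", "file_a", "file_b", "file_a"]:
    recent = push_recent(recent, file_id, max_len=3)

after_example = list(recent)

for file_id in ["file_c", "file_d"]:
    recent = push_recent(recent, file_id, max_len=3)
```

With `max_len=3` the list never grows past three entries, so reading the N most recent files becomes a single-row read as long as N stays within that cap.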
Answered Jul 11 2013 at 05:55 by Eric Stevens