I have an application that need to find out the n most recent modified
files for a given user id. I started out few tables but still couldn't get
what i want, I hope someone get point to some right direction...
See my tables below.
#1 won't work, because file_id's timeuuid contains creation time, not the
modification time.
#2 won't work, because i can't order by a non primary key
column(modified_date)
#3,#4 although i can now get a time series of modification time of each
file belongs to a user, my return list may still not accurate because a
single directory could have lot of modification changes. I basically end up
pulling out series of modification timestamp for the same directory.
Any suggestion?
Thanks
#1
CREATE TABLE user_file (
user_id uuid,
file_id timeuuid,
PRIMARY KEY(user_id, file_id)
);
#2
CREATE TABLE user_file (
user_id uuid,
file_id timeuuid,
modified_date timestamp,
PRIMARY KEY(user_id, file_id)
);
#3
CREATE TABLE user_file (
user_id uuid,
file_id timeuuid,
modified_date timestamp,
PRIMARY KEY(user_id, file_id, modified_date)
);
#4
CREATE TABLE user_file (
user_id uuid,
modified_date timestamp,
file_id timeuuid,
PRIMARY KEY(user_id, modified_date, file_id)
);
Jimmy Lin 's gravatar image asked Jul 9 2013 at 23:51 in Cassandra-User by Jimmy Lin

4 Answers

What you described this sounds like the most appropriate:
CREATE TABLE user_file (
user_id uuid,
modified_date timestamp,
file_id timeuuid,
PRIMARY KEY(user_id, modified_date)
);
If you normally need more information about the file then either store that as additional fields or pack the data using something like JSON or Protobuf.
Not sure I understand the problem.
Cheers
Aaron Morton
Freelance Cassandra Consultant
New
aaron morton 's gravatar image answered Jul 11 2013 at 00:00 by aaron morton
what I mean is, I really just want the last modified date instead of series
of timestamp and still able to sort or order by it.
(maybe I should rephrase my question as how to sort or order by last
modified column in a row)
CREATE TABLE user_file (
user_id uuid,
modified_date timestamp,
file_id timeuuid,
PRIMARY KEY(user_id, modified_date)
);
e.g user1 update file A 3 times in a row, and update file B, then update
file A again.
insert into user_file values(user1_uuid, date1, file_a_uuid);
insert into user_file values(user1_uuid, date2, file_a_uuid);
insert into user_file values(user1_uuid, date3, file_a_uuid);
insert into user_file values(user1_uuid, date4, file_b_uuid);
insert into user_file values(user1_uuid, date5, file_a_uuid);
#trying to get top 3 most recent changed files
select * from user_file where user_id=user1_uuid limit 3
using CQL, I will get 3 rows back(all file a)
(user1_uuid, date1, file_a_uuid);
(user1_uuid, date2, file_a_uuid);
(user1_uuid, date3, file_a_uuid);
what I want is (file a AND file b)
user1_uuid, date1, file_a_uuid
user1_uuid, date4, file_b_uuid
So how do I order by/sort by last modified column in a row?
thanks
Jimmy Lin 's gravatar image answered Jul 11 2013 at 00:39 by Jimmy Lin
Thanks for the suggestion.
I don't care the history of the update time to a file, BUT I do want to
ordered by it.
Reason for that is, without that, and if I have 10k+ file belongs to a
user, I have to fetch all the last modified time of all these 10k+ file and
sort through them in my application and only return the top N. Kind of
expensive.
I would like to see if it is possible to rely on Cassandra native storage
to achieve this.
CREATE TABLE user_file (
user_id uuid,
file_id timeuuid,
last_modified_time timestamp,
PRIMARY KEY(user_id, file_id)
);
select * from user_file where user_id=user1_uuid order by
last_modified_time limit 10
Above CQL would be invalid, because last_modified_time is not part of the
compound key, and is not allowed to used for order by purpose.
Jimmy Lin 's gravatar image answered Jul 11 2013 at 01:10 by Jimmy Lin
I think there is not an extremely simple solution to your problem. You
will probably need to use multiple tables to get the view you need. One
keyed just by file UUID, which tracks some basic metadata about the file
including the last modified time. Another as a materialized view of the
most recently modified files.
When a user updates the file, you'd need to read the current last_modified
time for that file and delete its value out of the most_recently_modified
table before inserting it back in, and updating the last_modified on the
files table.
This is a little bit fragile because it depends on reading then modifying
based on that result - and that's a typical antipattern for
eventually-consistent databases.
You might consider using a column on the user table using the List type
which keeps track of the most recently modified files for that user, treat
it like a queue and pop off the oldest ones each file write. This still
ends up being read-then-write, but presumably it is less prone to race
conditions because the user is not modifying many files at the same moment
in time, while many users could be modifying the same file at the same
moment. So it still falls under the antipattern, but at least the failures
will be less likely.
Eric Stevens 's gravatar image answered Jul 11 2013 at 05:55 by Eric Stevens

Related Discussions