Trying to come up with a relative measure of popularity for items in a recommender. Something that could be used to rank items.
The user-item preference matrix would be the obvious thought: just add up the number of preferences per item. Maybe transpose the preference matrix (the temp DRM created by the recommender), then for each row vector (now that a row = an item) grab the number of non-zero preferences. This gives one measure of popularity. In the case where the preferences are not boolean you'd sum the weights.
However, it might be a better idea to look at the item-item similarity matrix. It doesn't need to be transposed and contains the important similarities--as calculated by LLR, for example. Here similarity means similarity in terms of which users preferred an item. So summing the non-zero weights would perhaps give an even better relative popularity measure. For the same reason, clustering the similarity matrix would yield important clusters.
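Roughly, with a toy SciPy sparse matrix standing in for the DRM (data and names made up purely for illustration), the two measures look something like:

import numpy as np
from scipy.sparse import csr_matrix

# Toy user-item preference matrix: rows = users, columns = items
# (a stand-in for the recommender's temp DRM).
prefs = csr_matrix(np.array([
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 1],
]))

# Measure 1: number of non-zero preferences per item (equivalent to
# transposing and counting non-zeros per row); for weighted preferences
# sum the weights instead.
pref_counts = prefs.getnnz(axis=0)
pref_weight_sums = prefs.sum(axis=0).A1

# Measure 2: sum the non-zero weights in each row of an item-item
# similarity matrix; here plain cooccurrence A'A stands in for an
# LLR-weighted similarity matrix.
item_item = (prefs.T @ prefs).toarray()
np.fill_diagonal(item_item, 0)           # ignore self-similarity
similarity_sums = item_item.sum(axis=1)

print(pref_counts, similarity_sums, np.argsort(-similarity_sums))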
Anyone have intuition about this?
I started to think about this because transposing the user-item matrix seems to yield a format that cannot be sent directly into clustering.
asked Feb 5 2014 at 01:27 in Mahout-User by Pat Ferrel

19 Answers

Well, I think what you are suggesting is to define popularity as being
similar to other items. So in this way most popular items will be
those which are most similar to all other items, like the centroids in
K-means.
I would first check the correlation between this definition and the
standard one (that is, the definition of popularity as having the
highest number of ratings). But my intuition is that they are
different things. For example, an item might lie at the center of the
similarity space but still not be a popular item. However, there
might still be some correlation; it would be interesting to check.
Hope it helps.
answered Feb 6 2014 at 12:41 by Tevfik Aytekin
I have always defined popularity as just the number of ratings/prefs,
yes. You could rank on some kind of 'net promoter score' -- good
ratings minus bad ratings -- though that becomes more like 'most
liked'.
How do you get popularity from similarity -- similarity to what?
Ranking by sum of similarities seems more like a measure of how much
the item is the 'centroid' of all items. Not necessarily most popular
but 'least eccentric'.
answered Feb 6 2014 at 13:15 by Sean Owen
If you look at the indicator matrix (cooccurrence reduced by LLR), you will
usually have asymmetry due to limitations on the number of indicators per
row.
This will give you some interesting results when you look at the column
sums. I wouldn't call it popularity, but it is an interesting measure.
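For reference, the LLR test behind that reduction can be sketched in a few lines (this follows the usual entropy-based G^2 formulation; the counts below are toy numbers):

from math import log

def x_log_x(x):
    return x * log(x) if x > 0 else 0.0

def entropy(*counts):
    # Unnormalized Shannon entropy of a list of counts.
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)

def llr(k11, k12, k21, k22):
    # 2x2 cooccurrence table for items A and B:
    # k11 = users with both, k12 = A only, k21 = B only, k22 = neither.
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))

# A high score marks an "interesting" cooccurrence worth keeping as an
# indicator; rows are then typically truncated to the top-N scores,
# which is what makes the indicator matrix asymmetric.
print(llr(10, 5, 5, 1000))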
answered Feb 6 2014 at 15:14 by Ted Dunning
The problem with the usual preference count is that big hit items can be overwhelmingly popular. If you want to know which items the most people saw, and are likely to have an opinion about, then this seems a good measure. But these hugely popular items may not differentiate taste.
So we calculate the important taste indicators with LLR. The benefit of the similarity matrix is that it attempts to model the important cooccurrences.
There is an effect where hugely popular items really say nothing about similarity of taste. Everyone likes motherhood and apple pie, so it doesn't say much about us if we both do too. This is usually accounted for with something like TF-IDF, so I suppose another weighted popularity measure would be to run the preference matrix through TF-IDF to de-weight non-differentiating preferences.
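A purely illustrative sketch of that TF-IDF-style down-weighting, treating users as 'documents' and items as 'terms':

import numpy as np

# Toy preference matrix: rows = users ("documents"), columns = items ("terms").
prefs = np.array([
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
], dtype=float)

n_users = prefs.shape[0]
df = (prefs > 0).sum(axis=0)                    # how many users touched each item
idf = np.log((n_users + 1) / (df + 1)) + 1.0    # de-weight near-universal items

weighted = prefs * idf                          # TF-IDF-weighted preferences
popularity = weighted.sum(axis=0)               # de-weighted popularity per item
print(popularity)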
answered Feb 6 2014 at 16:16 by Pat Ferrel
Agree - I thought by asking for most popular you meant to look for apple
pie.
Agree with you and Ted that the sum of similarity says something
interesting even if it is not popularity exactly.
answered Feb 6 2014 at 16:33 by Sean Owen
Rising popularity is often a better match to what people want to see on a
"most popular" page.
The best measure for that in my experience is log((new_count + offset) /
(old_count + offset)), where the new and old counts are the number of views
during the periods in question and the offset is used partly to avoid log(0) or
x/0 problems, but also to give a Bayesian grounding to the measure.
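A minimal sketch of that score (the counts and offset below are invented for illustration):

from math import log

def rising_popularity(new_count, old_count, offset=10.0):
    # log((new + offset) / (old + offset)): positive when views are
    # accelerating, negative when they are fading; the offset avoids
    # log(0) and divide-by-zero and acts as a weak prior on the ratio.
    return log((new_count + offset) / (old_count + offset))

views = {"a": (500, 100), "b": (120, 150), "c": (5, 0)}  # item: (new, old)
print(sorted(views, key=lambda i: rising_popularity(*views[i]), reverse=True))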
answered Feb 7 2014 at 00:35 by Ted Dunning
A velocity measure of sorts; that makes a lot of sense for a "what's hot" list.
The particular thing I'm looking at now is how to rank a list of items by some measure of popularity when you don't have a velocity. There is an introduction date, though, so another way to look at popularity might be to decay it with something like e^-t, where t is the item's age. You can see the decay in the views histogram.
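For example, something along these lines, where the half-life is an arbitrary placeholder rather than a recommendation:

from math import exp, log

def decayed_popularity(pref_count, age_days, half_life_days=30.0):
    # Exponential decay e^(-lambda * t), with t = item age since introduction.
    lam = log(2) / half_life_days
    return pref_count * exp(-lam * age_days)

print(decayed_popularity(1000, age_days=90), decayed_popularity(200, age_days=3))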
answered Feb 7 2014 at 02:38 by Pat Ferrel
If this model can be made Bayesian enough to sample from the posterior distribution of total popularity, then you can use the Thompson sampling trick and sort by sampled total views rather than estimated total views. That will give uncertain items (typically new ones) a chance to be shown in the rankings without flooding the list with newcomers.
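One way to sketch that idea is a conjugate Gamma-Poisson model over each item's view rate, sorting by a posterior sample rather than the point estimate (the prior values and data here are placeholders):

import numpy as np

rng = np.random.default_rng()

def thompson_rank(items, prior_shape=1.0, prior_rate=1.0):
    # items: {item_id: (views, days_observed)}.  For each item, sample a
    # view rate from its Gamma posterior (conjugate to Poisson counts) and
    # sort by the sample instead of the point estimate, so uncertain new
    # items occasionally rank high without flooding the list.
    sampled = {
        item: rng.gamma(prior_shape + views, 1.0 / (prior_rate + days))
        for item, (views, days) in items.items()
    }
    return sorted(sampled, key=sampled.get, reverse=True)

items = {"old_hit": (5000, 100), "steady": (900, 100), "newcomer": (30, 2)}
print(thompson_rank(items))   # ordering varies from call to call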
answered Feb 7 2014 at 05:43 by Ted Dunning
Didn't mean to imply I had historical view data yet.
The Thompson sampling trick looks useful for auto-converging to the best of A/B versions and as a replacement for dithering. Below you are proposing another case to replace dithering, this time on a list of popular items? Dithering works on anything you can rank, but Thompson sampling usually implies a time dimension. The initial guess, the first Thompson sample, could be thought of as a form of dithering, I suppose? Haven't looked at the math, but it wouldn't surprise me to find they are very similar things.
While we are talking about it, why aren't we adding things like cross-recommendations, dithering, popularity, and other generally useful techniques to the Mahout recommenders? All the data is there to do these things, and they could be packaged in the same Mahout jobs. They seem to be languishing a bit while technology and the art of recommendations move on.
If we add temporal data to the preference data, a bunch of new features come to mind, like hot lists or asymmetric train/query preference history.
answered Feb 8 2014 at 16:50 by Pat Ferrel
Thompson sampling doesn't require time, other than a sense of what we now
know. It really is just a correct form of dithering that uses our current
knowledge.
For a worked-out version of Thompson sampling with ranking, see this blog:
http://tdunning.blogspot.com/2013/04/learning-to-rank-in-very-bayesian-way.html
The reason that we aren't adding things like cross-rec and the rest is
that "we" have full-time jobs, mostly. Suneel is full-time on Mahout, but
the rest are not. You seem more active than most.
answered Feb 8 2014 at 20:13 by Ted Dunning
I am not full-time on Mahout either and have a full-time job which is unrelated to Mahout. It's just that I have been sacrificing personal time to keep things moving on Mahout.
answered Feb 8 2014 at 21:05 by Suneel Marthi
That was by no means meant to criticize effort level, which has been impressive, especially during the release.
It was more a question about the best place to add these things and whether they are important: whether people see them as custom post-processing or as core.
answered Feb 9 2014 at 00:53 by Pat Ferrel
I have different opinions about each piece.
I think that cross recommendation is as core as RowSimilarityJob and should
be a parallel implementation or integrated. Parallel is probably easier.
It is even plausible to have a version of RowSimilarityJob that doesn't
support all the different distance measures but does support multiple cross
and direct processing using LLR or related cooccurrence-based measures. It
would be very cool if a single pass over the data could do many kinds of
co-occurrence or cross-occurrence operations.
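In matrix terms the core of that is small; a toy sketch, with A as the primary-action user-item matrix and B a secondary action over the same users (the LLR re-weighting would follow as above):

import numpy as np

# Toy user-item matrices over the same users: A = primary action
# (e.g. purchase), B = secondary action (e.g. view).
A = np.array([[1, 0, 1], [1, 1, 0], [0, 1, 1]], dtype=float)
B = np.array([[1, 1, 1], [0, 1, 1], [1, 1, 0]], dtype=float)

cooccurrence = A.T @ A        # primary-action items with each other
cross_occurrence = A.T @ B    # primary-action items with secondary-action items

# Both are raw counts computed from the same pass over the user histories;
# an LLR re-weighting and per-row truncation would follow, as above.
print(cooccurrence)
print(cross_occurrence)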
For dithering, it really is post-processing. That said, it is also the
single largest improvement that anybody typically gets when testing
different options, so it is a bit goofy not to have good support for some
kinds of dithering.
For Thompson sampled recommenders, I am not sure where to start hacking on
our current code.
answered Feb 9 2014 at 03:19 by Ted Dunning
There's been work done on the cross-recommender. There is a Mahout-style XRecommenderJob that has two preference models for two actions or preference types. It uses a matrix multiply to get a cooccurrence-type similarity matrix. If we had a cross-row-similarity-job, it could pretty easily be integrated, and I'd volunteer to integrate it. The XRSJ is probably beyond me right now, so if we can scare up someone to do that we'd be a long way down the road.
I'll put a feature request into Jira and take this to the dev list.
BTW this is already integrated with the solr-recommender.
answered Feb 14 2014 at 18:56 by Pat Ferrel
I'd like to see cross-recommendations added too.
But I also want some automation of the steps required to build a simple
recommender like the solr/mahout example Ted and Ellen have in their
pamphlet.
Lowering the barrier to entry by providing a sample pipeline would help a
lot of folks get started and hopefully would keep them interested. Perhaps
in examples/bin?
answered Feb 14 2014 at 19:51 by Andrew Musselman
Yes!
But it is very hard to find the time.
answered Feb 14 2014 at 20:39 by Ted Dunning
Oh yes. I do have a small team I could enlist to do things like this; is
there a starting point somewhere on Github, Ted?
answered Feb 14 2014 at 20:43 by Andrew Musselman
Not sure if this is what you are looking for. I assume you are talking about Ted's paper describing a Solr-based recommender pipeline?
Much of the paper was implemented in the solr-recommender referenced below, which has a fairly flexible parallel version of a logfile reader that uses Cascading for mapreduce. It picks out columns in delimited text files. You can choose a constant string for your action id, like "purchase" or "thumbs-up", then specify the field index for user, item, and action. It assumes strings for all these inputs and creates string-id->Mahout-Integer-id->string-id bidirectional hashmaps as the dictionary and reverse dictionary. Everything is scalable except the BiHashmaps, which are in-memory; they aren't usually too big for that. There is also a pattern for the input logfile names, and they are searched for recursively from some root directory.
Caveat emptor: not all the options are implemented or tested. One person has already implemented a scaffolded option and their pull request was merged, so feel free to contribute.
It is an example of how to digest logfiles, build Mahout data, and run the recommender. It creates Solr indexing data too, but the output side of the recommender is up to you to implement: either a Solr query or a lookup in the Mahout recommender's DRM output.
https://github.com/pferrel/solr-recommender
answered Feb 14 2014 at 20:45 by Pat Ferrel
Precisely; that's the one, thanks!
answered Feb 14 2014 at 21:33 by Andrew Musselman