Affinity Analysis for Market Basket Recommendation

<span></span><p dir="ltr"><span><b>The Problem</b></span></p><p dir="ltr"><span>Affinity analysis is an analytical technique that aims to discover relationships between activities and preferences that pertain to specific individuals. Based on recorded information, after the analysis, future behavior can be statistically predicted. For a general overview see </span><a href=""><span></span></a><span>. </span><span>Specific applications include clickstream analysis and market basket analysis.</span></p><br/><p dir="ltr"><span>One important area of application is market basket analysis, which has widespread use in planning promotions, designs and sales strategies. Market basket analysis is necessarily somewhat open-ended, but one of the more useful angles of attack is the extraction of association rules </span><a href=""><span></span></a><span>. </span><span>Ultimately we wish to be able to use the set of items purchased (or otherwise accessed) by a user and recommend other items that they have an increased probability of being interested in, however, this should probably be a separate algorithm.</span></p><br/><p dir="ltr"><span><b>The Interface</b></span></p><p dir="ltr"><span>The program should take a DataAPI url to a file with one session per line. A session represents the entities that were bought/used/visited in a single recorded event. This could be the urls seen in a given browsing session or items bought in a single visit to a store. For example</span></p><p dir="ltr"><span>bread milk eggs</span></p><p dir="ltr"><span>beer diapers</span></p><p dir="ltr"><span>bread bottled_water hot_dogs lemonade</span></p><p dir="ltr"><span>The items are not ordered and there is no customer identification. For simplicity, you may ignore multiple item purchases - two or more loaves of bread just count as one bread purchase. The items are separated by whitespace, and the only constraint on the format is that items but be uniquely identifiable by the string and the string may contain no whitespaces or quotation marks.</span></p><p dir="ltr"><span>At minimum the algorithm should return frequent itemsets, preferably with some weight denoting each itemset&#8217;s prevalence. Even better would be full association rule mining, also with weights to indicate prevalence.</span></p><p dir="ltr"><span><b>The Algorithm</b></span></p><p dir="ltr"><span>This is necessarily somewhat open-ended. Mahout includes an implementation of FP-Growth, but it requires hadoop configuration,which is an unacceptable degree of complexity for many users.</span></p><p dir="ltr"><span><a href=""></a></span></p><p dir="ltr"><span>A decent example of this is</span></p><p dir="ltr"><a href=""></a></p>
Fulfilled By
  • A scalable method for finding frequent patterns within large datasets
    • requests
  • {{comment.username}}
Bounty expires in
Bounty expired
(no tags)