paranoia / FpGrowth / 0.2.0

A Java implementation of the Frequent Pattern Growth (FP-Growth) algorithm, a scalable method for finding frequent patterns in large datasets. For example, it can be used to mine association rules and build collaborative-filtering systems, such as "Other people also bought" recommendations.

The algorithm takes three required arguments and one optional argument:

  • Dataset: path to a local (data://...) dataset in which each line represents a single transaction and items are separated by whitespace. (See Example file).
  • Support: the minimum frequency a pattern must have to be reported. In most cases you'll want to increase this number to reduce the size of the output.
  • MinItems: the minimum number of items per association rule. A value of 1 will return each unique item in the dataset and the number of times it appeared. Most applications require a value higher than 1.
  • Output: optional local (data://...) location to which the output JSON should be written. This is required for result sets exceeding 10 MB in size.
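
To make the roles of Dataset, Support and MinItems concrete, here is a minimal Java sketch. It is not this algorithm's implementation and does not build an FP-tree: it counts itemsets by brute force, which is only practical for tiny inputs. The class and method names (SupportSketch, parse, frequentItemsets) are hypothetical, and Support is assumed to be an absolute transaction count rather than a fraction.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only: brute-force itemset counting, NOT FP-Growth.
public class SupportSketch {

    // Dataset format: one transaction per line, items separated by whitespace.
    static List<Set<String>> parse(String text) {
        List<Set<String>> transactions = new ArrayList<>();
        for (String line : text.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            Set<String> items = new TreeSet<>(Arrays.asList(trimmed.split("\\s+")));
            transactions.add(items);
        }
        return transactions;
    }

    // Returns every itemset with at least minItems items that occurs in at
    // least `support` transactions (assumption: support is an absolute count).
    static Map<Set<String>, Integer> frequentItemsets(
            List<Set<String>> transactions, int support, int minItems) {
        Map<Set<String>, Integer> counts = new HashMap<>();
        for (Set<String> transaction : transactions) {
            List<String> items = new ArrayList<>(transaction);
            // Enumerate all non-empty subsets with a bitmask (exponential cost).
            for (int mask = 1; mask < (1 << items.size()); mask++) {
                if (Integer.bitCount(mask) < minItems) {
                    continue; // MinItems filter
                }
                Set<String> subset = new TreeSet<>();
                for (int i = 0; i < items.size(); i++) {
                    if ((mask & (1 << i)) != 0) {
                        subset.add(items.get(i));
                    }
                }
                counts.merge(subset, 1, Integer::sum);
            }
        }
        // Support filter: drop itemsets seen in too few transactions.
        counts.values().removeIf(count -> count < support);
        return counts;
    }

    public static void main(String[] args) {
        String dataset = "bread milk\n"
                + "bread butter milk\n"
                + "butter milk\n"
                + "bread milk eggs\n";
        Map<Set<String>, Integer> result = frequentItemsets(parse(dataset), 2, 2);
        result.forEach((itemset, count) -> System.out.println(itemset + " -> " + count));
    }
}
```

With this four-line dataset, Support 2 and MinItems 2, only two itemsets survive: bread and milk together (3 transactions) and butter and milk together (2 transactions). Raising Support to 3 leaves only the first, which shows how a higher Support shrinks the output.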

This algorithm was featured in the Algorithmia Blog Post: "Mining Product Hunt, Part 2: Building a Recommendation Engine".