Given two documents or texts (strings), this algorithm returns a similarity score between 0 and 1: 1 for texts that are identical and 0 for texts that have nothing in common. It transforms each text into a vector in a k-dimensional space, then computes the cosine similarity between the two vectors (i.e. their dot product divided by the product of their lengths).
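As a minimal sketch of the idea, assuming each text is turned into a vector of raw word counts (the source does not say how the vectors are built), the similarity could be computed like this. The function names `tokenize` and `cosine_similarity` and the sample strings are illustrative only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (an assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two strings."""
    vec_a = Counter(tokenize(text_a))
    vec_b = Counter(tokenize(text_b))

    # Dot product over the terms the two texts share.
    dot = sum(vec_a[term] * vec_b[term] for term in vec_a.keys() & vec_b.keys())

    # Euclidean length of each vector.
    norm_a = math.sqrt(sum(count * count for count in vec_a.values()))
    norm_b = math.sqrt(sum(count * count for count in vec_b.values()))

    # An empty text has no direction, so treat it as unrelated to everything.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up strings for illustration, not the actual product titles.
print(cosine_similarity("olive green cotton kurti", "green cotton kurti"))
print(cosine_similarity("green cotton kurti", "blue denim jeans"))
```

Because word counts are never negative, the score stays in the 0 to 1 range described above. A weighting scheme such as TF-IDF could be swapped in for raw counts without changing the rest of the computation.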
This algorithm is very useful in content-based recommendation engines, for example in the e-commerce domain, where it can recommend products with similar attributes such as title, materials, fabric, color, care tips, and patterns.
Suppose I have two products (taken from fashion sites), such as
http://www.limeroad.com/olive-green-cotton-kurti-mystique-india-p10083527?df_type=new_home and http://www.limeroad.com/green-cotton-kurti-mystique-india-p10083532, each having a title/description.
So, for a particular item, one can recommend similar or related items from a large dataset.
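A rough sketch of that recommendation step, under the assumption that each item is described by a single text field and compared by word-count cosine similarity, might look as follows. The `recommend` helper, the item ids, and the tiny catalog are hypothetical and only serve to show the ranking.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Word-count vector of a text (assumed representation)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(vec_a, vec_b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm = (math.sqrt(sum(c * c for c in vec_a.values()))
            * math.sqrt(sum(c * c for c in vec_b.values())))
    return dot / norm if norm else 0.0


def recommend(query_id, catalog, top_n=2):
    """Return the ids of the top_n items most similar to catalog[query_id]."""
    query_vec = vectorize(catalog[query_id])
    scored = [
        (cosine(query_vec, vectorize(text)), item_id)
        for item_id, text in catalog.items()
        if item_id != query_id  # never recommend the item itself
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item_id for _, item_id in scored[:top_n]]


# Hypothetical catalog with made-up descriptions.
catalog = {
    "p1": "olive green cotton kurti",
    "p2": "green cotton kurti",
    "p3": "blue denim jeans",
    "p4": "green silk saree",
}
print(recommend("p1", catalog))
```

This compares the query item against every other item, which is fine for a small catalog. For a large dataset, the vectors would normally be precomputed and stored rather than rebuilt on every query.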