amitkumargaur / TextSimilarityMeasurement / 0.15.0

Given two documents or texts (strings), this algorithm returns a similarity value between 0 and 1: 1 for texts that are identical and 0 for texts that are completely unrelated. It transforms each text into a vector in a k-dimensional vector space model, then computes the cosine similarity between the two vectors (i.e. their dot product divided by the product of their magnitudes).
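The description above can be illustrated with a minimal Python sketch. It is not the algorithm's actual implementation: the function name `cosine_similarity`, the whitespace tokenization, the lowercasing, and the use of raw term counts as vector components are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity of two texts using raw term-count vectors.

    Assumptions (not taken from the original algorithm): tokens are
    lowercased and split on whitespace, and each dimension of the
    vector is the number of times a term occurs in the text.
    """
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())

    # Dot product only needs the terms present in both texts.
    shared_terms = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[t] * counts_b[t] for t in shared_terms)

    # Euclidean length (magnitude) of each count vector.
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))

    # An empty text has no direction; treat it as unrelated.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(cosine_similarity("olive green cotton kurta", "green cotton kurta"))
```

Because term counts are never negative, the result of this sketch always falls between 0 and 1, which matches the range described above. A production system might weight terms differently (for example with TF-IDF), which would change the scores.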

This algorithm is useful in content-based recommendation engines, for example in the e-commerce domain, for recommending products with similar attributes such as title, materials, fabric, color, care tips, and patterns.

Example:
Input:
Consider two products taken from a fashion site,
http://www.limeroad.com/olive-green-cotton-kurti-mystique-india-p10083527?df_type=new_home and http://www.limeroad.com/green-cotton-kurti-mystique-india-p10083532, with the following titles/descriptions:

["olive green cotton kurta",  "green cotton kurta"]

Similarity index:

0.8660254037844387
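This value can be reproduced by hand under one assumption that is not stated in the original: each title is represented by raw term counts over the vocabulary (olive, green, cotton, kurta), giving the vectors a = (1, 1, 1, 1) and b = (0, 1, 1, 1).

```latex
\[
\cos\theta
  = \frac{\mathbf{a}\cdot\mathbf{b}}{\lVert\mathbf{a}\rVert\,\lVert\mathbf{b}\rVert}
  = \frac{0 + 1 + 1 + 1}{\sqrt{4}\,\sqrt{3}}
  = \frac{3}{2\sqrt{3}}
  = \frac{\sqrt{3}}{2}
  \approx 0.8660
\]
```

The three shared terms contribute to the dot product, while the extra term "olive" only lengthens the first vector, which is why the score is below 1.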

So, for a particular item, one can recommend similar or related items from a large dataset.