Locality-sensitive hashing for near-duplicate detection

Description
<p dir="ltr"><span>Locality-sensitive hashing (LSH) is a technique that hashes items so that similar items are likely to receive the same hash; it is commonly used to reduce the dimensionality of data. In contrast to regular hash functions, which try to avoid collisions, its aim is to maximize the probability of collisions between similar items.</span></p><br/><p dir="ltr"><span>Applications include image similarity and audio similarity, but a potentially simpler (and still very useful) problem is near-duplicate detection. The algorithm should take a piece of text (as a string) and output a hash such that two near-duplicates (texts that differ in only a small number of places) have the same hash. For more information, see </span><a href="http://en.wikipedia.org/wiki/Locality-sensitive_hashing"><span>http://en.wikipedia.org/wiki/Locality-sensitive_hashing</span></a><span> and </span><a href="http://www2007.org/papers/paper215.pdf"><span>http://www2007.org/papers/paper215.pdf</span></a><span>.</span></p>
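One possible starting point is a SimHash-style fingerprint, sketched below in Python. This is only an illustration under stated assumptions, not a reference solution: the function names (`shingles`, `simhash`, `hamming`), the 3-word shingle size, the 64-bit width, and the use of MD5 as the per-shingle hash are all choices made for this example. Note that a fingerprint of this kind gives near-duplicates hashes that differ in only a few bits rather than strictly identical hashes, so texts are compared by Hamming distance.

```python
import hashlib
import re


def shingles(text, k=3):
    """Return the set of k-word shingles of the lowercased text (k is an assumed default)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Short texts: treat the whole text as a single shingle.
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def simhash(text, bits=64, k=3):
    """Compute a SimHash-style fingerprint (bits must be at most 128 with MD5)."""
    counts = [0] * bits
    mask = (1 << bits) - 1
    for sh in shingles(text, k):
        # Hash each shingle to a bits-wide integer.
        h = int.from_bytes(hashlib.md5(sh.encode("utf-8")).digest(), "big") & mask
        # Each shingle votes +1 or -1 on every bit position.
        for i in range(bits):
            counts[i] += 1 if (h >> i) & 1 else -1
    # A fingerprint bit is set where the positive votes win.
    fingerprint = 0
    for i, c in enumerate(counts):
        if c > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")
```

Two texts would then be treated as near-duplicates when `hamming(simhash(a), simhash(b))` falls below a chosen threshold; the threshold is an assumption to tune on real data. If strictly identical hashes are required, one option is to split the fingerprint into bands and use each band as a bucket key, at the cost of some false positives.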
Status
Active
Bounty expires in
Bounty expired
Bounty
0
Tags
(no tags)