PetiteProgrammer / TextSimilarity / 1.0.0
This algorithm compares a set of documents of any kind and reports which pairs of documents are the most similar.
Example uses:
- Plagiarism detection (natural language, programming source, etc.)
- Removal of similar copies within some directory
- Analysis and clustering of documents.
Example
Input:
{ "files": [ ["doc1", "this is an example input"], ["doc2", "this is another example input"], ["doc3", "the third document is not like the others"] ] }
Output:
[ [0.6825611979794738, "doc1", "doc2"], [0.1303428532021814, "doc2", "doc3"], [0.05714684431258296, "doc1", "doc3"] ]
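The README does not say which similarity measure the algorithm uses. As a rough illustration of the input and output shapes only, here is a minimal sketch that ranks document pairs by cosine similarity over word counts. The measure, the tokenizer, and the names `cosine_similarity` and `rank_pairs` are assumptions made for this example, so the scores it produces will not match the values shown above.

```python
import math
import re
from collections import Counter
from itertools import combinations


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    # Empty documents have zero norm; treat them as dissimilar to everything.
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_pairs(files, num_results=100):
    """Return [[score, id1, id2], ...] ordered from most to least similar.

    `files` is a list of [document id, document content] pairs, mirroring
    the "files" argument. Stand-in measure: bag-of-words cosine similarity
    (an assumption, not necessarily what the hosted algorithm does).
    """
    counts = {
        doc_id: Counter(re.findall(r"\w+", text.lower()))
        for doc_id, text in files
    }
    pairs = [
        [cosine_similarity(counts[a], counts[b]), a, b]
        for a, b in combinations(counts, 2)
    ]
    pairs.sort(key=lambda row: row[0], reverse=True)
    # Fewer rows come back when fewer pairs exist than num_results.
    return pairs[:num_results]
```

Called with the three example documents, this sketch also ranks the `doc1`/`doc2` pair first, since those two texts share most of their words.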
Input
| argument | type | description |
| --- | --- | --- |
| files | [[String, String]] | list of document IDs and document contents |
| num_results | Int | (optional) number of results; default = 100 (fewer if fewer document pairs can be computed) |
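A small sketch of assembling the input described here. The helper name `build_input` and the choice of a dict as the starting point are assumptions for illustration; only the `files` and `num_results` keys come from this README.

```python
import json


def build_input(documents, num_results=None):
    """Build the input object from a mapping of document id -> content.

    `num_results` is optional and is left out of the payload when not given.
    """
    payload = {"files": [[doc_id, content] for doc_id, content in documents.items()]}
    if num_results is not None:
        payload["num_results"] = num_results
    return payload


payload = build_input(
    {
        "doc1": "this is an example input",
        "doc2": "this is another example input",
    },
    num_results=10,
)
print(json.dumps(payload))
```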
Output
[[Float, String, String]]: a list of [similarity value, document id 1, document id 2] entries. In the example above, the entries are ordered from most to least similar.
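For a use such as removing near-duplicate copies, one might filter the output by a similarity cutoff. The sketch below assumes the output has already been parsed into a Python list; the function name `pairs_above` and the cutoff of 0.5 are hypothetical choices, not values recommended by the algorithm.

```python
def pairs_above(results, threshold):
    """Keep (id1, id2, score) for every result row scoring at least `threshold`."""
    return [(id1, id2, score) for score, id1, id2 in results if score >= threshold]


# Output copied from the example in this README.
results = [
    [0.6825611979794738, "doc1", "doc2"],
    [0.1303428532021814, "doc2", "doc3"],
    [0.05714684431258296, "doc1", "doc3"],
]

for id1, id2, score in pairs_above(results, threshold=0.5):
    print(f"{id1} and {id2} look similar (score {score:.2f})")
```

A suitable cutoff depends on the documents being compared and would need to be tuned by hand.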