ge / FastStringComparator / 0.2.0

Overview

FastStringComparator from GE Licensing is a fast measure of string similarity.

The metric uses the characters the two strings have in common, and the order of those characters, to produce a similarity value in the range [0, 1]. A value of 0 indicates that the two strings are identical, and a value of 1 indicates that they share no common characters.

FastStringComparator runs in linear O(m+n) time, where m and n are the lengths of the input strings. This is asymptotically faster than other common string metrics such as Levenshtein distance, which runs in O(mn) time.
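To make the complexity difference concrete, here is a minimal Python sketch that contrasts a linear-time measure with the quadratic Levenshtein baseline. The `bag_distance` function is a hypothetical stand-in written for illustration only: it is not the patented FastStringComparator algorithm, and unlike that algorithm it ignores character order. The `levenshtein` function is the textbook dynamic-programming edit distance.

```python
from collections import Counter


def bag_distance(a: str, b: str) -> float:
    """Illustrative O(m+n) stand-in, NOT the patented FastStringComparator.

    Counts the characters the two strings share (ignoring order) and
    returns 0.0 when the character multisets match and 1.0 when the
    strings share no characters at all.
    """
    if not a and not b:
        return 0.0
    # Counter intersection keeps the minimum count of each character.
    shared = sum((Counter(a) & Counter(b)).values())
    return 1.0 - shared / max(len(a), len(b))


def levenshtein(a: str, b: str) -> int:
    """Textbook dynamic-programming edit distance, O(m*n) time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]
```

The stand-in makes one pass over each string, while the inner loop of `levenshtein` runs once for every pair of characters. That pairwise work is what becomes expensive on book-length inputs.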

US Patent 9,269,028
https://patents.google.com/patent/US9269028B2/en?oq=US9269028B2

Applicable Scenarios and Problems

This algorithm works well when comparing large strings. For example, when comparing Dr. Seuss's classic "Green Eggs and Ham" to the novel "Moby Dick", the standard Levenshtein algorithm requires 10 seconds of compute time on Algorithmia, while FastStringComparator takes less than a second. Large DNA sequences, cybersecurity strings, and other applications with large inputs are ideal for GE's Fast String Comparator.

Usage

Input

| Parameter | Description |
| --- | --- |
| A | First string to compare |
| B | Second string to compare |

Output

A 0 means the strings are exactly the same.
A 1 means the strings are nothing alike.
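The input and output handling can be sketched in Python as follows. This is a minimal sketch, not part of the algorithm itself: the helper names `build_payload` and `parse_score` are hypothetical, and the parser assumes the result is always a string of the form `String Similarity: <value>`, as in Example 1.

```python
import json


def build_payload(a: str, b: str) -> str:
    """Serialize the two input strings under the documented keys A and B."""
    return json.dumps({"A": a, "B": b})


def parse_score(result: str) -> float:
    """Extract the number from a 'String Similarity: <value>' string.

    The format is assumed from the single documented example.
    """
    label, _, value = result.partition(":")
    if label.strip() != "String Similarity" or not value.strip():
        raise ValueError(f"unexpected result format: {result!r}")
    return float(value)
```

Parsing the score into a float makes it easy to apply a threshold, for example treating any pair that scores below a chosen cutoff as a likely match.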

Examples

Example 1

Compare two strings with two characters out of place.

Input

{"A":"Looter", "B":"Boofer"}

Output

"String Similarity: 0.3333333333333333"