A fast algorithm for computing the most common prefixes in a large dataset.
Also known as "Starting Pattern Occurrence Frequency".
Results are ranked by frequency, most common first.
Applications:
- Next letter prediction
- Word completion
- DNA sequencing
- Protein sequencing
- Computational linguistics
- Compression algorithms
Optional parameters:
- minScore - minimum frequency a prefix needs to be included in the results [default: 2]
- minLength - minimum prefix length [default: 1]
- startsWith - only return prefixes that begin with this fixed string, useful for predictions [default: not set]
- maxResults - maximum number of results to return [default: all matching results]
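To make the effect of each parameter concrete, here is a minimal Python sketch. It is a naive reference version, not the fast algorithm this project implements. The function name `most_common_prefixes` and the snake_case argument names are hypothetical stand-ins for the parameters above. It also assumes that a whole entry counts as a prefix of itself and that ties are broken alphabetically, neither of which is stated here.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def most_common_prefixes(
    dataset: Iterable[str],
    min_score: int = 2,
    min_length: int = 1,
    starts_with: Optional[str] = None,
    max_results: Optional[int] = None,
) -> Dict[str, int]:
    """Naive prefix counter (illustrative only, not the project's algorithm)."""
    # Shortest prefix worth counting: honour min_length, and never go
    # below the length of the starts_with filter when one is given.
    first_len = max(min_length, 1, len(starts_with or ""))
    counts: Counter = Counter()
    for item in dataset:
        if starts_with is not None and not item.startswith(starts_with):
            continue
        # Count every prefix of the item, up to and including the item itself
        # (assumption: a full entry is treated as a prefix of itself).
        for end in range(first_len, len(item) + 1):
            counts[item[:end]] += 1
    # Apply the frequency cutoff, then rank by count (descending).
    # Ties are broken alphabetically, which is an assumption of this sketch.
    ranked = sorted(
        ((prefix, n) for prefix, n in counts.items() if n >= min_score),
        key=lambda pair: (-pair[1], pair[0]),
    )
    if max_results is not None:
        ranked = ranked[:max_results]
    return dict(ranked)


names = ["Chris", "Christ", "Christine", "Charles", "Charlotte"]
print(most_common_prefixes(names, min_length=4, max_results=3))
# {'Chri': 3, 'Chris': 3, 'Char': 2}
```

Passing `starts_with="Cha"` to the same call restricts the output to prefixes of the entries beginning with "Cha", which is the prediction use case mentioned for startsWith.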
Examples:
{
"minLength": 4,
"maxResults": 10,
"dataset": ["John", "William", "James", "Charles", "George", "Frank", "Joseph", ...]
}
This returns the top 10 baby-name prefixes of length 4 or more from a list of 20th-century US baby names (2.5 MB; the dataset shown above is trimmed). Result:
{"Mari": 1941, "Fran": 1420, "Chris": 1227, "Chri": 1227, "Juli": 1167, "Will": 1151, "Char": 1066, "Christ": 1057, "Marg": 1041, "Kath": 983}
Sample input and output against a "buzzwords" list: