
mheimann / WordOperations / 0.1.0


word2vec is a widely used word embedding that represents each word as a vector. The vectors are learned so that each word is good at predicting the other words that appear around it, so words that occur in similar contexts end up with similar vectors. This algorithm is based on the Gensim implementation of word2vec, using vectors trained on a Google News corpus of over 100 billion words (freely available). When trained on very large corpora, these vectors were observed to capture linguistic regularities, such as "vec(king) - vec(man) + vec(woman) ≈ vec(queen)".
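The vector arithmetic above can be sketched as follows: add and subtract the word vectors, then return the vocabulary word whose vector has the highest cosine similarity to the result, skipping the input words. This is a minimal sketch assuming a tiny hand-made vocabulary; `TOY_VECTORS`, `combine`, and `nearest` are hypothetical names, not part of this algorithm. In Gensim itself, a comparable query is `KeyedVectors.most_similar(positive=[...], negative=[...])`.

```python
import math

# Hypothetical 3-dimensional toy vectors, chosen by hand for illustration only.
# Real word2vec vectors (e.g. the Google News model) have far more dimensions.
TOY_VECTORS = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.2, 0.8],
    "apple": [0.0, 0.1, 0.1],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (assumes neither is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def combine(positive, negative, vectors):
    """Sum the vectors of the positive words and subtract those of the negative words."""
    dim = len(next(iter(vectors.values())))
    result = [0.0] * dim
    for word in positive:
        result = [r + x for r, x in zip(result, vectors[word])]
    for word in negative:
        result = [r - x for r, x in zip(result, vectors[word])]
    return result


def nearest(target, vectors, exclude=()):
    """Return the word whose vector is most cosine-similar to target, skipping excluded words."""
    candidates = [(w, cosine(target, vec)) for w, vec in vectors.items() if w not in exclude]
    return max(candidates, key=lambda pair: pair[1])[0]


# king - man + woman, excluding the query words themselves from the answer.
query = combine(["king", "woman"], ["man"], TOY_VECTORS)
print(nearest(query, TOY_VECTORS, exclude={"king", "man", "woman"}))
```

Excluding the input words matters in practice: with real embeddings, the vector for "king" itself is often the closest match to `king - man + woman`.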

This algorithm lets you perform operations on words, either as arithmetic (word1 - word2 + word3 = ?) or in the familiar analogy format (word1 is to word2 as word3 is to ?). In the input format, each mathematical operator must be preceded by a double backslash, so that an operator can be told apart from an input word that happens to be a mathematical symbol.
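A sketch of how both input styles could be reduced to the same lists of words to add and subtract. An analogy "word1 is to word2 as word3 is to ?" corresponds to word2 - word1 + word3. The tokenization rule here (whitespace-separated tokens, each operator written as a literal double backslash followed by `+` or `-`) and the function names are assumptions for illustration; the exact input format this algorithm expects may differ.

```python
PLUS = r"\\+"   # assumed escaped operator: two backslash characters, then '+'
MINUS = r"\\-"  # assumed escaped operator: two backslash characters, then '-'


def parse_expression(text):
    """Split e.g. 'king \\- man \\+ woman' into (positive_words, negative_words).

    An escaped operator applies to the next word only; a word with no
    preceding operator is added. Any other token, including a bare '+'
    or '-', is treated as an ordinary word.
    """
    positive, negative = [], []
    subtract_next = False
    for token in text.split():
        if token == PLUS:
            subtract_next = False
        elif token == MINUS:
            subtract_next = True
        else:
            (negative if subtract_next else positive).append(token)
            subtract_next = False  # reset so the default for the next word is '+'
    return positive, negative


def analogy_terms(word1, word2, word3):
    """'word1 is to word2 as word3 is to ?' becomes word2 - word1 + word3."""
    return [word2, word3], [word1]


print(parse_expression(r"king \\- man \\+ woman"))
print(analogy_terms("man", "king", "woman"))
```

Both calls yield the same query, `(['king', 'woman'], ['man'])`. With this rule, a bare `+` or `-` in the input is kept as a word, which is the case the double-backslash escape exists to handle.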