Parsey McParseface

Let’s play a game: can you tell the difference between these two sentences?

“Most of the time, travellers worry about their luggage.”

“Most of the time travellers worry about their luggage.”

Whoa, remove the comma and all of a sudden we’re having an entirely different conversation!

The little nuances of language can be hard enough for a human to understand, let alone a computer! How could we possibly teach a computer to understand the difference?

Language is at the heart of most human endeavors: being able to communicate effectively is one of the most important socioeconomic indicators of success. We discussed the process of teaching computers language in our NLP blog post, but there we mostly covered complete tools rather than the underlying components used to build them.

Introduction to Syntaxnet


First off, let’s do something different: we’re going to start by describing what Syntaxnet isn’t.

Syntaxnet isn’t going to be able to tell you the sentiment of a tweet (sentiment analysis), nor can it extract street addresses from written documents (named entity recognition, or NER). Syntaxnet is not a high-level magic bullet for every big NLP task. What it is, however, is a force multiplier.

Most natural language tools are trained to grok semantic meaning by focusing on word frequency and the removal of so-called “stop words.” This works reasonably well, but it inevitably loses resolution.

Consider what was, at one point, the state of the art in document classification.

The most common approaches loosely follow this procedure:

  1. Define your corpus and split it up into separate “documents”.
  2. Use TF-IDF to extract important keywords from your documents, ignoring the others.
  3. Extract latent word representations by feeding the keywords into word2vec.
  4. Either average the word vectors for each document, or store them individually to calculate Word Mover’s Distance.
  5. Store the “document vectors” together in latent space along with a classification label (if any), potentially with a clustering step.
  6. At inference/evaluation time, perform the first four steps for each new document, then use an algorithm such as k-nearest neighbours or non-negative matrix factorization to detect the most likely label.
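To make these steps concrete, here is a deliberately tiny sketch of such a pipeline in plain Python. It is an illustration only, not the code behind any of the tools mentioned below, and it rests on several assumptions: the two-dimensional vectors in `WORD_VECTORS` are hypothetical, hand-made stand-ins for what word2vec would learn, the stop-word list is illustrative, TF-IDF uses a smoothed IDF, step 4 uses averaging (not Word Mover’s Distance), and step 6 uses 1-nearest-neighbour with cosine similarity.

```python
import math
from collections import Counter

# HYPOTHETICAL toy 2-D vectors standing in for the ones word2vec would learn (step 3).
WORD_VECTORS = {
    "pizza": (1.0, 0.0), "pasta": (0.9, 0.1), "cheese": (0.8, 0.2),
    "uber": (0.0, 1.0), "taxi": (0.1, 0.9), "ride": (0.2, 0.8),
}
# Illustrative stop-word list; real pipelines use much longer ones.
STOP_WORDS = {"a", "an", "and", "for", "i", "me", "please", "with"}

def tokenize(text):
    # Lowercase, strip surrounding punctuation and drop stop words.
    words = (w.strip(".,!?").lower() for w in text.split())
    return [w for w in words if w and w not in STOP_WORDS]

def keywords(tokens, df, n_docs, top_k=3):
    # Step 2: score words by TF-IDF (smoothed IDF) and keep the top_k.
    tf = Counter(tokens)
    score = {w: (c / len(tokens)) * (math.log((1 + n_docs) / (1 + df[w])) + 1)
             for w, c in tf.items()}
    return sorted(score, key=lambda w: (-score[w], w))[:top_k]

def doc_vector(words):
    # Step 4: average the vectors of the keywords that have an embedding.
    vecs = [WORD_VECTORS[w] for w in words if w in WORD_VECTORS]
    if not vecs:
        return (0.0, 0.0)
    return tuple(sum(dim) / len(vecs) for dim in zip(*vecs))

def cosine(a, b):
    norm = math.hypot(*a) * math.hypot(*b)
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0

def nearest_label(vec, stored):
    # Step 6: 1-nearest-neighbour by cosine similarity.
    return max(stored, key=lambda item: cosine(vec, item[0]))[1]

# Steps 1 and 5: a tiny labelled corpus, stored as (document vector, label).
train = [("I love pizza and cheese", "food"),
         ("Order pasta with extra cheese", "food"),
         ("Call an uber for a ride", "transport"),
         ("Book a taxi ride home", "transport")]
train_tokens = [tokenize(text) for text, _ in train]
df = Counter(w for toks in train_tokens for w in set(toks))
n_docs = len(train)
stored = [(doc_vector(keywords(toks, df, n_docs)), label)
          for toks, (_, label) in zip(train_tokens, train)]

query = tokenize("Get me a taxi please")
print(nearest_label(doc_vector(keywords(query, df, n_docs)), stored))  # transport
```

Notice that step 2 drops “cheese” from the pasta document because it also appears in another document; whatever TF-IDF discards never reaches the classifier.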

Our existing Document Classifier and Facebook’s FastText both use similar procedures, as do most other NLP libraries and open-source tools. Here are some tutorials and papers for further reading: example 1, example 2, example 3

Have you spotted why the standard approach might not yield the best results?

We’re completely ignoring the grammar and syntax of a sentence, and only considering “rare” words, which in some cases can result in nonsense! You may say, “Sure James, this might not work in every case, but it should get pretty close most of the time, right?”
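Here is a minimal sketch of the problem, assuming a simple bag-of-words preprocessing step (lowercasing, stripping punctuation, dropping stop words, ignoring order). The small `STOP_WORDS` set is illustrative and not taken from any particular library. Once the comma is stripped and word order is discarded, the two sentences from the start of this post become indistinguishable:

```python
import string

# Illustrative stop-word list; real pipelines use longer lists
# (for example the ones shipped with NLTK or scikit-learn).
STOP_WORDS = {"most", "of", "the", "about", "their"}

def bag_of_words(sentence):
    # Lowercase, strip punctuation, drop stop words and ignore word order.
    cleaned = sentence.lower().translate(str.maketrans("", "", string.punctuation))
    return sorted(w for w in cleaned.split() if w not in STOP_WORDS)

a = bag_of_words("Most of the time, travellers worry about their luggage.")
b = bag_of_words("Most of the time travellers worry about their luggage.")
print(a)       # ['luggage', 'time', 'travellers', 'worry']
print(a == b)  # True: the comma, and the meaning it carried, is gone
```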

For any predictive system, the important consideration is often the improvement over a baseline predictor rather than the absolute accuracy. A model that predicts that the weather will be the same as yesterday will often be accurate, but it isn’t adding any value.
The thing about dependency parsing is that about 80% of the dependencies are very easy and unambiguous, which means that a system that predicts only those dependencies correctly is injecting very little extra information that wasn’t trivially available by just looking at each word and its neighbours.
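To make the baseline idea concrete for parsing, here is a small sketch that scores a trivial “attach each word to the next one” predictor using unlabelled attachment score (the fraction of words assigned the correct head). The gold heads are our own hand annotation of a toy sentence following Universal Dependencies conventions, not Parsey McParseface output, and the toy sentence is not meant to reproduce the 80% figure; it only shows how many points a parser can collect without understanding anything:

```python
# Hand-annotated gold heads for "The cat sat on the mat" (punctuation omitted),
# following Universal Dependencies conventions. Head 0 is the artificial ROOT;
# word positions are 1-based.
words = ["The", "cat", "sat", "on", "the", "mat"]
gold_heads = [2, 3, 0, 6, 6, 3]

def next_word_baseline(n):
    # Trivial baseline: attach every word to the word after it,
    # and the last word to ROOT.
    return [i + 2 if i + 1 < n else 0 for i in range(n)]

def attachment_score(predicted, gold):
    # Unlabelled attachment score: fraction of words with the correct head.
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)

baseline = next_word_baseline(len(words))
print(baseline)                                # [2, 3, 4, 5, 6, 0]
print(attachment_score(baseline, gold_heads))  # 0.5
```

A parser’s score is only interesting in comparison with a number like this one.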

For some applications, like the function/info bots behind Alexa or the Google Assistant, a simple heuristic approach gets you “good enough” results. These bots have been designed to understand and process simple requests with basic syntactic structures, like “Alexa, order me a pizza,” or “Hey Google, call an Uber to take me home.”

Those two examples probably worked, but try saying this to a Google Home device: “Hey Google, order me a ride home, but pick the cheapest ride between Lyft and Uber.” How did that turn out? Is there any way you could rephrase it so that Google Home might understand?

If these NLP bots were trained with a powerful syntax parser like Syntaxnet, it’s possible that we could train a bot that’s capable of answering more linguistically challenging requests, even the example above.

I’m intrigued: What is Syntaxnet?

Syntaxnet is a TensorFlow-based syntax parsing framework developed by Google. Within this framework, the fantastic development team created a whimsically named pre-trained model called Parsey McParseface. It’s this pre-trained model that makes our algorithm work!

“What’s a syntax parser?” you ask. We can split that definition into two components: a dependency mapper and a part-of-speech tagger.

A dependency map is a syntactic structure that describes how the words in a sentence relate to each other, in the form of a labelled directed acyclic graph.
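One straightforward way to hold a dependency map in code is as a list of labelled, directed edges from head to dependent, plus an artificial ROOT node. The sketch below uses our own hand-annotated parse of a toy sentence with Universal Dependencies labels (not Parsey McParseface output) and checks the “acyclic” property with Kahn’s topological-sort algorithm:

```python
# Tokens of a toy sentence; index 0 is the artificial ROOT node.
tokens = ["ROOT", "The", "cat", "sat", "on", "the", "mat"]
# Hand-annotated edges as (head_index, dependent_index, label).
edges = [(0, 3, "root"), (3, 2, "nsubj"), (2, 1, "det"),
         (3, 6, "obl"), (6, 4, "case"), (6, 5, "det")]

def is_acyclic(edges, n_nodes):
    # Kahn's algorithm: repeatedly remove nodes that have no incoming edges.
    # If every node can be removed this way, the graph has no cycle.
    indegree = [0] * n_nodes
    children = [[] for _ in range(n_nodes)]
    for head, dep, _ in edges:
        children[head].append(dep)
        indegree[dep] += 1
    ready = [v for v in range(n_nodes) if indegree[v] == 0]
    visited = 0
    while ready:
        v = ready.pop()
        visited += 1
        for child in children[v]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    return visited == n_nodes

for head, dep, label in edges:
    print(f"{tokens[head]} --{label}--> {tokens[dep]}")
print(is_acyclic(edges, len(tokens)))  # True
```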

[Figure: dependency map]

In basic terms, it splits a sentence up into component phrases with arrows pointing between them, eventually breaking phrases down into individual words. These arrows carry “dependency” labels such as negation, contraction and compound, but labels such as preposition, phrase object and appositional modifier are also common. You can find a full list of available labels here. What’s great about capturing dependency labels is that we can now break sentences up into separate components and find patterns that help translate natural language into machine-readable directives.
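As a rough illustration of turning labels into directives, the sketch below takes a hand-annotated parse of “Alexa, order me a pizza” (our own annotation with Universal Dependencies labels, not actual Parsey McParseface output), treats the root verb as the action, and fills slots from its obj, iobj and vocative dependents. The `SLOTS` mapping and the directive format are hypothetical:

```python
# Hand-annotated parse of "Alexa, order me a pizza" (punctuation omitted).
# Each entry is (index, word, head_index, label); head 0 means ROOT.
parse = [
    (1, "Alexa", 3, "vocative"),
    (3, "order", 0, "root"),
    (4, "me", 3, "iobj"),
    (5, "a", 6, "det"),
    (6, "pizza", 3, "obj"),
]

# HYPOTHETICAL mapping from dependency labels to directive slots.
SLOTS = {"vocative": "addressee", "iobj": "recipient", "obj": "object"}

def to_directive(parse):
    # The root word is the requested action; its labelled dependents fill slots.
    root_index, root_word = next((i, w) for i, w, _, label in parse if label == "root")
    directive = {"action": root_word}
    for _, word, head, label in parse:
        if head == root_index and label in SLOTS:
            directive[SLOTS[label]] = word
    return directive

print(to_directive(parse))
# {'action': 'order', 'addressee': 'Alexa', 'recipient': 'me', 'object': 'pizza'}
```

Keying on labels rather than word positions means the same function would also handle a reordered request, as long as the parser assigns the same labels.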

Great, so that’s the basics of a dependency map… but what about part-of-speech tagging? Parts of speech give us even more information about how words are used in a sentence. Let’s revisit our two earlier examples and compare their part-of-speech tags (we use the compact tree output format so the differences are easier to spot):

"src":"Most of the time, travellers worry about their luggage.",

Input: Most of the time, travellers worry about their luggage.


+-- travellers NOUN++NNS nmod
+-- of ADP++IN case
+-- the DET++DT det
+-- time, ADJ++JJ amod
+-- worry ADJ++JJ acl
+-- luggage. NOUN++NN nmod
+-- about ADP++IN case
+-- their PRON++PRP$ nmod:poss

"src":"Most of the time travellers worry about their luggage.",

Input: Most of the time travellers worry about their luggage.


+-- travellers NOUN++NNS nmod
+-- of ADP++IN case
+-- the DET++DT det
+-- time NOUN++NN compound
+-- worry ADJ++JJ acl
+-- luggage. NOUN++NN nmod
+-- about ADP++IN case
+-- their PRON++PRP$ nmod:poss

Notice the difference a comma makes? The word “time” can mean different things depending on context. In the phrase “Most of the time,” time is an adjectival modifier that modifies the word “most.” In the second case, however, time forms a compound with “travellers,” creating “time travellers,” a plural noun.
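Spotting that difference can be automated. The sketch below parses the two tree outputs shown above (copied as they appear here, so without the root line or the original indentation) and lists the tokens whose tags or labels changed. It assumes each line has the form `+-- WORD UPOS++XPOS LABEL` and that both outputs list the tokens in the same order:

```python
TREE_WITH_COMMA = """\
+-- travellers NOUN++NNS nmod
+-- of ADP++IN case
+-- the DET++DT det
+-- time, ADJ++JJ amod
+-- worry ADJ++JJ acl
+-- luggage. NOUN++NN nmod
+-- about ADP++IN case
+-- their PRON++PRP$ nmod:poss"""

TREE_WITHOUT_COMMA = """\
+-- travellers NOUN++NNS nmod
+-- of ADP++IN case
+-- the DET++DT det
+-- time NOUN++NN compound
+-- worry ADJ++JJ acl
+-- luggage. NOUN++NN nmod
+-- about ADP++IN case
+-- their PRON++PRP$ nmod:poss"""

def parse_tree_lines(text):
    # Turn "+-- WORD UPOS++XPOS LABEL" lines into (word, upos, xpos, label).
    rows = []
    for line in text.splitlines():
        word, tags, label = line.split("+-- ", 1)[1].split()
        upos, xpos = tags.split("++")
        rows.append((word, upos, xpos, label))
    return rows

# Compare the two outputs token by token and keep only the rows that differ.
diffs = [(a, b) for a, b in zip(parse_tree_lines(TREE_WITH_COMMA),
                                parse_tree_lines(TREE_WITHOUT_COMMA)) if a != b]
for before, after in diffs:
    print(before, "->", after)
# ('time,', 'ADJ', 'JJ', 'amod') -> ('time', 'NOUN', 'NN', 'compound')
```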

Great, so this tool captures both the parts of speech and the dependency relationships between phrases. Let’s talk about how we can incorporate this into a pipeline.

How to get started using Syntaxnet

Getting started with Syntaxnet is easy. Note, however, that the model takes around 4 seconds to load, so processing sentences in batches is the most efficient approach.
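Here is a minimal sketch of batching, assuming (as in the example below, where one `src` string holds two sentences) that several sentences can be sent in a single request. `batch_requests` and `batch_size` are hypothetical names, any limit on request size is unknown here, and the overhead arithmetic assumes the worst case where every call pays the roughly 4 second load:

```python
LOAD_SECONDS = 4.0  # approximate model load time quoted above

def batch_requests(sentences, batch_size):
    # HYPOTHETICAL helper: join up to batch_size sentences into one "src" string per request.
    return [{"src": " ".join(sentences[i:i + batch_size])}
            for i in range(0, len(sentences), batch_size)]

sentences = [f"This is sentence number {i}." for i in range(1, 101)]
payloads = batch_requests(sentences, batch_size=25)

# Worst case: every call pays the load time once.
print(len(sentences) * LOAD_SECONDS)  # 400.0 seconds if sent one at a time
print(len(payloads) * LOAD_SECONDS)   # 16.0 seconds for 4 batched calls
```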

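Let’s take a look at a quick Python example. It collects the proper nouns (universal part-of-speech tag PROPN) from every sentence; note that the key holding each word’s text ('form' below) is our assumption, so check the algorithm’s documented output format:

import Algorithmia

# Several sentences can be sent in a single request
request = {
    "src": "Algorithmia is a marketplace for algorithms. The Technological Singularity will transform Society."
}
client = Algorithmia.client('YOUR_API_KEY')
result = client.algo('deeplearning/Parsey/1.0.4').pipe(request).result['output']

sentence_proper_nouns = []
for sentence_data in result['sentences']:
    for word in sentence_data['words']:
        if word['universal_pos'] == "PROPN":
            # ASSUMPTION: 'form' holds the word's text; verify against the output schema
            sentence_proper_nouns.append(word['form'])
print(sentence_proper_nouns)
# ['Algorithmia', 'Society']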


Now we’re able to easily extract syntactic components from our sentences! We’ve gained a powerful new tool that can extend our natural language processing tools and make them even better.

Did we whet your appetite for NLP? Here’s a list of more NLP algorithms on our platform:

James Sutton