In this post, we’ll focus on potential use cases. We’ll start with a quick refresher on what this algorithm does, then look at concrete examples of real-world problems it can tackle, and why it makes sense for you to give it a go.
Let’s talk about supervision for a second. What does supervised learning mean in a data science or machine learning context?
Supervised learning is the act of providing annotated or labeled data to a machine learning model to accomplish a particular task. Generally that task is classification, but it doesn’t have to be. Most supervised machine learning models are trained end-to-end, and that can take time. Even with transfer learning finally becoming practical, you still need to fine-tune the model for your specialized task, and that takes time too.
Not every task needs supervision; content recommendation is a great example. In content recommendation, you’re looking for the objects in your dataset that are most similar to a given specimen, so labels don’t matter. What’s neat about unsupervised models is that they don’t need specialized training to work well. They aren’t trained end-to-end; instead, they generally rely on a latent representation of the input data (like a word embedding), which is very cheap to compute compared with training a supervised model.
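Unsupervised recommendation boils down to two steps: turn each document into a vector, then rank documents by how similar they are to the query. Below is a minimal Python sketch of that idea, not the document classifier’s actual implementation. As an assumption, it uses plain word counts with cosine similarity as a stand-in for a real word or document embedding; the helper names (`embed`, `cosine`, `most_similar`) and the sample articles are hypothetical.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in "latent representation": lowercase word counts.
    # A real system would use a pretrained word or document embedding instead.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors (0.0 if either is empty).
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def most_similar(query: str, docs: list[str], k: int = 3) -> list[tuple[float, str]]:
    # No training step: embed the query and every document, then rank by similarity.
    q = embed(query)
    scored = [(cosine(q, embed(d)), d) for d in docs]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]

articles = [
    "how to bake sourdough bread at home",
    "a beginner guide to training for a marathon",
    "sourdough starter troubleshooting tips",
    "choosing running shoes for long distance",
]
for score, doc in most_similar("my sourdough bread is too dense", articles, k=2):
    print(f"{score:.2f}  {doc}")
```

There is no training loop here: adding a document to the pool costs a single `embed` call, which is why this style of model is cheap to keep up to date.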
Semi-supervised learning and one-shot classification
You might be wondering: “what if you add labels to your unsupervised data points? Can you use that for classification?” In short: yes, you can.
By simply labeling some points in your unsupervised dataset, you can keep recommending content and also get a one-shot classifier for free!
A one-shot classifier is great because it doesn’t require multiple passes over your data (which can be very time-consuming), and it’s easy to evolve over time to suit your needs.
That is what the document classifier is. It’s a document recommender at heart, but it’s also a one-shot classifier, and that flexibility lets you do some very powerful things.
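One simple way to get a one-shot classifier out of a similarity model is nearest-labeled-example lookup: store one labeled vector per category and give a new document the label of its most similar stored example. The sketch below assumes that approach (the post doesn’t say exactly how the document classifier does it) and reuses a word-count stand-in for embeddings; `OneShotClassifier` and the example labels are hypothetical.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in for a real embedding: lowercase word counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class OneShotClassifier:
    """Hypothetical nearest-labeled-example classifier (illustrative only)."""

    def __init__(self) -> None:
        self.examples: list[tuple[Counter, str]] = []

    def add_example(self, text: str, label: str) -> None:
        # "Training" is a single append: one labeled example per category is enough.
        self.examples.append((embed(text), label))

    def predict(self, text: str) -> tuple[str, float]:
        # Label of the most similar stored example, plus that similarity score.
        if not self.examples:
            raise ValueError("add at least one labeled example first")
        query = embed(text)
        score, label = max((cosine(query, vec), label) for vec, label in self.examples)
        return label, score

clf = OneShotClassifier()
clf.add_example("I was charged twice this month", "billing")
clf.add_example("the app crashes when I log in", "bug")
print(clf.predict("why was my card charged two times"))

# Evolving the classifier: a new category is just one more labeled example.
clf.add_example("please delete my account", "account")
print(clf.predict("how do I delete my account"))
```

Because prediction is a single comparison against the stored examples, adding or relabeling a category takes effect immediately, with no retraining pass over the data.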
Document classifier use cases
Example 1 – customer support
Does your company have a customer support system? If so, you’ve probably noticed that many customer requests tend to sound the same. Some of them might even have the same answer. You might even have a few pre-made responses ready to go for any of the run-of-the-mill questions you get frequently. Wouldn’t it be great if a tool existed to automate many of these manual processes so you can focus on the more challenging support problems? Or maybe you just want to figure out how to automatically assign the support ticket to the right person?
You’re in luck: whether you’re looking for a tool that suggests responses or one that auto-assigns conversations to key support members, the document classifier does both. We even built a model on our own customer support data to do exactly this.
Automating the simple stuff frees up your support team’s time, allowing them to focus on making sure your customers are getting the best customer experience possible.
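For the support use case, the same nearest-example idea can both route a ticket and suggest a canned reply: find the most similar past ticket and reuse its assignee and response. This is a hypothetical sketch, not the model mentioned above; the ticket fields (`text`, `assignee`, `response`), the sample history, and the `min_score` cutoff for handing unfamiliar tickets to a human are all assumptions.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in for a real embedding: lowercase word counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical resolved tickets: who handled each one and the reply that was sent.
PAST_TICKETS = [
    {"text": "how do I reset my password", "assignee": "tier1",
     "response": "You can reset your password from the login page."},
    {"text": "invoice shows the wrong amount", "assignee": "billing",
     "response": "Sorry about that! Our billing team will correct the invoice."},
    {"text": "API returns a 500 error on upload", "assignee": "engineering",
     "response": "Thanks for the report, we're looking into the upload error."},
]

def suggest(ticket_text: str, history: list[dict], min_score: float = 0.3):
    """Return (assignee, suggested_response, score), or None if nothing is close enough."""
    query = embed(ticket_text)
    score, best = max(
        ((cosine(query, embed(t["text"])), t) for t in history),
        key=lambda pair: pair[0],
    )
    if score < min_score:
        return None  # Unfamiliar request: leave it for a human to triage.
    return best["assignee"], best["response"], score

print(suggest("I forgot my password, how do I reset it?", PAST_TICKETS))
print(suggest("do you ship to Canada", PAST_TICKETS))
```

The cutoff is the design choice that matters most here: set it too low and agents get irrelevant suggestions, too high and the tool rarely helps.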
Example 2 – legal document discovery
Sifting through discovery documents is legendary in its tedium and difficulty. Organizations in litigation sometimes dump thousands of unrelated, hard-to-search documents on opposing counsel. Sometimes it’s unintentional, but it’s not uncommon for this to be a tactic used to overwhelm the other side and hide evidence. This practice is called data dumping, and it can be terrifying for small legal teams.
In SEC v. Collins & Aikman Corp. (S.D.N.Y. 2009), the SEC dumped 1.7 million records (10.6 million pages) on the defendant, saying that the defendant could search them for the relevant evidence and asserting that it didn’t maintain a document collection relating specifically to the subjects addressed. As the court correctly noted, Rule 34 of the Federal Rules of Civil Procedure prohibits “simply dumping large quantities of unrequested materials onto the discovering party along with the items actually sought.” (Source)
But it’s 2018, and machine learning is finally here. Algorithms like the document classifier really can do most of the heavy lifting.
The algorithm can quickly and automatically filter e-discovery documents (or scanned and digitized print documents) by how similar they are to documents you already know are relevant. It takes a few steps, but cutting thousands of documents down to hundreds can be a real lifesaver.
This won’t completely remove the pain of e-discovery data dumps, but it can certainly reduce the strain on your team and on your bottom line.
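The filtering step can be sketched as: score every document in the dump by its best similarity to a handful of known-relevant seed documents, keep the ones above a cutoff, and review them highest-first. This Python sketch is illustrative only; word-count vectors stand in for real embeddings, and the `threshold` value and sample documents are assumptions you would tune against documents your team has already reviewed.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in for a real embedding: lowercase word counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def filter_relevant(seed_docs: list[str], corpus: list[str],
                    threshold: float = 0.3) -> list[tuple[float, str]]:
    """Keep corpus documents whose best similarity to any seed document meets the threshold."""
    seeds = [embed(d) for d in seed_docs]
    kept = []
    for doc in corpus:
        vec = embed(doc)
        score = max(cosine(vec, s) for s in seeds)  # closest known-relevant document
        if score >= threshold:
            kept.append((score, doc))
    # Most promising documents first, for human review.
    return sorted(kept, key=lambda pair: pair[0], reverse=True)

seeds = [
    "revenue recognition for supplier rebates",
    "rebate accounting memo from the cfo",
]
dump = [
    "memo on supplier rebate accounting",
    "cafeteria menu for next week",
    "quarterly revenue recognition policy",
    "parking garage access list",
]
for score, doc in filter_relevant(seeds, dump):
    print(f"{score:.2f}  {doc}")
```

Taking the maximum over seeds (rather than the average) means a document only needs to resemble one known-relevant document to survive the filter, which suits discovery, where relevant material can cover several unrelated topics.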
Example 3 – text-based content recommendation
Recommendation engines are everywhere, and they can be super valuable for your consumer-focused business. But how do you make one?
Product data is primarily text-based, with tags sometimes indicating what genre of product you’re looking at. Helping your customers easily find similar products through this text-based approach is a great way to increase revenue and keep folks on your site.
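A text-based product recommender along these lines can fold a product’s tags into the same vector as its description, so products that share both wording and tags rank highest. The sketch below is a hypothetical illustration: the catalog, the `tag:` prefix that keeps tags separate from description words, and the `tag_weight` boost are assumptions, and word counts again stand in for a real embedding.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in for a real embedding: lowercase word counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical catalog: free-text description plus a few tags per product.
CATALOG = {
    "tent-2p": {"description": "lightweight two person backpacking tent", "tags": ["camping", "outdoor"]},
    "tent-4p": {"description": "roomy four person family camping tent", "tags": ["camping", "outdoor"]},
    "stove": {"description": "compact backpacking stove for camping", "tags": ["camping", "cooking"]},
    "blender": {"description": "high speed kitchen blender", "tags": ["cooking", "kitchen"]},
}

def product_vector(product: dict, tag_weight: int = 2) -> Counter:
    # Description words plus prefixed, up-weighted tag features in one vector.
    vec = embed(product["description"])
    for tag in product["tags"]:
        vec["tag:" + tag.lower()] += tag_weight
    return vec

def similar_products(product_id: str, catalog: dict, k: int = 2) -> list[str]:
    # Rank every other product by cosine similarity to the one being viewed.
    target = product_vector(catalog[product_id])
    scored = [(cosine(target, product_vector(p)), pid)
              for pid, p in catalog.items() if pid != product_id]
    scored.sort(reverse=True)
    return [pid for _, pid in scored[:k]]

print(similar_products("tent-2p", CATALOG))
```

Raising `tag_weight` makes recommendations stick more closely to your existing categories; lowering it lets the description wording dominate.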
The document classifier fits in well as a self-contained part of any web-based recommendation layer, and it solves an important and sometimes difficult problem for any business: figuring out what your customers are looking to buy and putting it in front of them.
The document classifier is a force multiplier: it’s able to improve the productivity of your teams without sacrificing your customer experience.
Why not see whether you can use it to solve a problem facing your industry?
If you run into any issues, feel free to get in touch with us using the little button on the bottom right: we’re always happy to help.
If you’re curious about what other natural language processing (NLP) algorithms we have to offer, take a look at some of these: