Acquiring Data for Document Classification


Available on GitHub.


Retrieve data from a public API, then train the Document Classifier to predict keywords for new text samples

Algorithmia's Document Classifier lets you train a model on a set of documents (blocks of text), each associated with a keyword. Once it has been trained, you can give it a new document and it will return a set of predicted keywords.
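To make the train-then-predict workflow concrete, here is a minimal local sketch. It is a toy word-count stand-in, not the Document Classifier's actual algorithm and not the Algorithmia API; the function names `train` and `predict` and the sample texts are hypothetical and exist only for illustration.

```python
from collections import Counter, defaultdict


def train(samples):
    """Count how often each word appears in documents tagged with each keyword.

    `samples` is a list of (text, keyword) pairs.
    """
    model = defaultdict(Counter)
    for text, keyword in samples:
        model[keyword].update(text.lower().split())
    return model


def predict(model, text, top_n=2):
    """Return up to `top_n` keywords whose training words overlap the new text."""
    words = text.lower().split()
    scores = {kw: sum(counts[w] for w in words) for kw, counts in model.items()}
    # Highest score first; ties are broken alphabetically.
    ranked = sorted(scores, key=lambda kw: (-scores[kw], kw))
    return [kw for kw in ranked if scores[kw] > 0][:top_n]


# Hypothetical training data: each document is paired with one keyword.
samples = [
    ("insulin resistance in type 2 diabetes", "diabetes"),
    ("tumor growth and chemotherapy response", "cancer"),
]
model = train(samples)
print(predict(model, "chemotherapy for tumor patients"))  # ['cancer']
```

The hosted classifier is expected to do something far more robust than counting shared words, but the shape of the data is the same: labeled documents go in during training, and a new document comes back with predicted keywords.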

For the full blog post related to this recipe, see

Getting Started

Create a free Algorithmia account, and install the Algorithmia Python client and BeautifulSoup:

pip install algorithmia
pip install beautifulsoup4

Detailed instructions can be found in the blog post.

How To Run the Script

First, edit the script and replace your_api_key with your Algorithmia API key.

Then, from the command line, navigate to the folder containing your Python file and run:


This sample used PubMed data, but to go further, you can modify the script to use a different data source API or a webpage scraper such as
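As a rough sketch of what swapping in a scraped data source could look like, the snippet below turns the paragraphs of an HTML page into (text, keyword) pairs ready for training. It uses only Python's standard-library `html.parser` so that it runs without extra installs; the same idea can be expressed with BeautifulSoup. The names `ParagraphExtractor` and `scrape_samples`, and the sample HTML, are hypothetical and are not part of the original script.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the whitespace-normalized text of every <p> element."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_paragraph = False
        self._buffer = []

    def _flush(self):
        # Join buffered text fragments and collapse runs of whitespace.
        text = " ".join("".join(self._buffer).split())
        if text:
            self.paragraphs.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._flush()  # handles a previous <p> that was never closed
            self._in_paragraph = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()
            self._in_paragraph = False

    def handle_data(self, data):
        if self._in_paragraph:
            self._buffer.append(data)

    def close(self):
        super().close()
        self._flush()


def scrape_samples(html, keyword):
    """Return one (paragraph_text, keyword) pair per <p> element in `html`."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return [(text, keyword) for text in parser.paragraphs]
```

For example, `scrape_samples("<p>Insulin and <b>glucose</b> levels.</p>", "diabetes")` yields a single pair whose text is `Insulin and glucose levels.`. How a keyword gets assigned to each page is up to you; here it is simply passed in by the caller.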

Built With
