Algorithmia Blog - Deploying AI at scale

NaNoGenMo + Text Analysis with Algorithmia’s Natural Language Processing algorithms

We’ve just wrapped up November, which means aspiring writers all over the world are frantically typing away in an attempt to finish an entire novel in one month as part of National Novel Writing Month, also known as NaNoWriMo. Each November, participants aim to write 50,000 words on a 30 day deadline–a difficult feat for any writer! NaNoWriMo has been around for quite a long time, but for the last couple of years programmers and digital artists have been participating in a cheeky alternative: NaNoGenMo, or National Novel Generation Month.

Internet artist Darius Kazemi started NaNoGenMo after tweeting the idea in 2013:


This November is the third organized installment of NaNoGenMo, and the community keeps growing every year as more and more programmers & artists become interested in the strange intersection of code, language processing, and literature. And because the event is primarily driven by developers, submissions are posted on a GitHub repo as Issues so that participants can comment on one another’s ideas and help each other create some of the most unique and sometimes nonsensical novels written in November.

In the NaNoGenMo world, “novel” is pretty loosely defined. According to the rules,

“The “novel” is defined however you want. It could be 50,000 repetitions of the word “meow”. It could literally grab a random novel from Project Gutenberg. It doesn’t matter, as long as it’s 50k+ words.”

(And of course, someone did make that 50,000 word “meow” book in 2014!)
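
By that definition, the smallest valid entry is only a few lines long. Here is a minimal sketch in Python; the variable names are my own and this is not the code behind the 2014 submission:

```python
# The rules only require 50,000+ words, so plain repetition is enough.
TARGET_WORDS = 50000

novel = " ".join(["meow"] * TARGET_WORDS)

# Count whitespace-separated words to confirm the entry qualifies.
word_count = len(novel.split())
print(word_count)
```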

Novel generation can be much more complicated than it appears from the outside. Some books integrate with social media by pulling text from Twitter to generate dialogue, others go down a recursive rabbit hole, and some even generate graphic novels.

Algorithmia is home to a wide variety of algorithms that are a perfect fit for NaNoGenMo. Even though I don’t have any background in natural language processing or computational linguistics, I found it easy to combine algorithms that not only helped me generate my novel but also gave me insights into the texts I used as a basis.

I chose the texts I wanted to work with based on two things: availability in the public domain and an interesting author demographic. While there are tons of NaNoGenMo books out there that are based on other texts, I wanted to find a really unique set of texts to base my novel on. I also developed an interest in 19th century American literature after reading Uncle Tom’s Cabin when I was 12. Luckily for me, Project Gutenberg is home to many novels and autobiographies that fit this intersection of interests!

First step: compile a corpus of texts. I chose to go with two sets of 7 books to compare. The first set was composed primarily of slave and emancipation narratives from Black female authors. While digging around in these texts, I realized that books as seemingly disparate as Little Women were published at the same time. Somehow I had never really thought about how such drastically different worlds were being exposed in what we now think of as classic American literature, so I decided it would be interesting to compare. The second set of texts are all from white female authors and published around the mid-19th century.

Set one:

  • 1861 – Incidents in the Life of a Slave Girl by Harriet Jacobs
  • 1868 to 1888 (published in serial form) – Trial and Triumph by Frances Ellen Watkins Harper
  • 1868 to 1888 (published in serial form) – Sowing and Reaping: A Temperance Story by Frances Ellen Watkins Harper
  • 1868 to 1888 (published in serial form) – Minnie’s Sacrifice by Frances Ellen Watkins Harper
  • 1868 – Behind the Scenes by Elizabeth Keckley
  • 1891 – From the Darkness Cometh the Light, or, Struggles for Freedom by Lucy Delaney
  • 1892 – Iola Leroy, or Shadows Uplifted by Frances Ellen Watkins Harper

Set two:

  • 1845 – Woman in the Nineteenth Century by Margaret Fuller
  • 1852 – Uncle Tom’s Cabin by Harriet Beecher Stowe
  • 1854 – The Lamplighter by Maria S. Cummins
  • 1854 – Ruth Hall: A Domestic Tale of the Present Time by Fanny Fern (pen name of Sara Payson Willis)
  • 1860 – Rutledge by Miriam Coles Harris
  • 1868 – Little Women by Louisa May Alcott
  • 1869 to 1870 (published in serial form) – An Old Fashioned Girl by Louisa May Alcott
  • 1872 – What Katy Did by Susan Coolidge

Before I started generating my own novel based on these texts, I rolled up my sleeves and got to work on analyzing them. The Algorithmia platform is already full of many text analysis algorithms, so instead of getting lost in learning natural language processing from scratch, it was as simple as choosing an algorithm, passing in my texts, and comparing the results.

Haven’t read any of the books? Don’t worry! The first algorithm I ran on the texts was Summarizer. This algorithm is pretty straightforward to use–input text, get back key sentences and ranked keywords. Read the summaries of Set One and Set Two if you need a literary refresher!

Using the AutoTag algorithm, I set out to discover whether there would be a difference in the topics found between the two author demographics. The AutoTag algorithm uses a variant of Latent Dirichlet Allocation and returns a set of keywords that represent the topics in the text. I then took each of the topics returned by the algorithm and classified them into various categories or themes to see if we could find some common threads.


I had suspected that the second set of books would have more domestic-related themes, and I was mostly unsurprised that there were no autotagged keywords about race or slavery in that set. Interestingly, specific names as keywords were fairly frequent in both sets, averaging 4.8 out of 8 topics for set one and 5.7 out of 8 topics for set two.

While this algorithm gives us some interesting insights into our texts, it can’t tell us everything and sometimes it can even trick you. For example, I grew suspicious of Sowing & Reaping when the AutoTag algorithm returned that one of the topics was “romaine”. I suspected that this book did not in fact focus on a type of lettuce as a main topic. Since I hadn’t read this specific book, I looked it up–turned out to be the last name of a main character!

After running the AutoTag algorithm on my data sets, I decided to check out Sentiment Analysis. This technique, also known as opinion mining, uses text analysis, natural language processing, and computational linguistics to identify subjective information in text. The algorithm I used returns a rating of Very Negative, Negative, Neutral, Positive or Very Positive.

Here’s the breakdown of sentiment by book:

Set One Books                         | Sentiment | Set Two Books                   | Sentiment
Incidents in the Life of a Slave Girl | Negative  | Woman in the Nineteenth Century | Negative
Trial and Triumph                     | Negative  | Uncle Tom’s Cabin               | Negative
From the Darkness Cometh the Light    | Negative  | The Lamplighter                 | Negative
Sowing and Reaping                    | Negative  | Ruth Hall                       | Neutral
Minnie’s Sacrifice                    | Negative  | Rutledge                        | Very Negative
Behind the Scenes                     | Positive  | Little Women                    | Negative
Iola Leroy                            | Negative  | What Katy Did                   | Negative

Unsurprisingly, 12 out of 14 of the books I analyzed were Negative or Very Negative. Rough times in the 19th century!
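
The tally is easy to reproduce from the table above. This is a minimal sketch, with the ratings typed in by hand from the table rather than fetched from the algorithm:

```python
from collections import Counter

# Ratings copied by hand from the table above, in row order.
set_one = ["Negative", "Negative", "Negative", "Negative",
           "Negative", "Positive", "Negative"]
set_two = ["Negative", "Negative", "Negative", "Neutral",
           "Very Negative", "Negative", "Negative"]

counts = Counter(set_one + set_two)
negative_total = counts["Negative"] + counts["Very Negative"]
print(counts)
print(negative_total, "of", len(set_one) + len(set_two))
```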

Next, I decided it might be interesting to see what popped up with Profanity Detection. While getting the data into the algorithm and writing the results back to a file was easy, it turns out that profanity detection requires a lot of double checking by hand. I knew that some words that came up were not really profane back then; words like “queer”, “pussy”, and “muff” were innocent in the context of these 19th century texts.

Interestingly, the frequency of racial profanity in the two data sets ended up being relatively similar:



Of course, running the algorithm doesn’t give you the full picture, since in our second set of data about 95% of the racial profanity came from one book: Uncle Tom’s Cabin. This is unsurprising since it’s the only work in our second set of books that was written by an abolitionist. However, we still don’t quite get the full picture about profanity in these books: many of the words used were not considered slurs back then, and additionally, within the use of dialogue this kind of language takes on a different dimension. The thing we can learn from an algorithm such as Profanity Detection is that there is a very stark difference in who these books focused on as main characters and what kind of world they lived in. Four of the seven books written by white authors had zero instances of these words.
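
Part of the hand-checking can be scripted. The sketch below assumes the detector's output has already been reduced to a word-to-count mapping. That shape, the function name, and the counts are my assumptions for illustration, not the algorithm's actual output:

```python
# Words the detector flags that were innocent in 19th century usage
# (the three examples named earlier in this post).
PERIOD_INNOCENT = {"queer", "pussy", "muff"}

def drop_period_innocent(flagged_counts, innocent=PERIOD_INNOCENT):
    """Return only the flagged words that still need a human look."""
    return {word: count for word, count in flagged_counts.items()
            if word.lower() not in innocent}

# Hypothetical detector output for one book; the counts are made up.
flagged = {"queer": 12, "Muff": 3, "damned": 2}
needs_review = drop_period_innocent(flagged)
print(needs_review)
```

An allowlist like this only removes the obvious false positives. Words whose meaning depends on who is speaking still need a human reader.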

Now that you’ve read through all this and seen the results from all these different algorithms, you might be thinking that you don’t know how to do natural language processing, so maybe this is something you’ll put on a project list and try out later. The most amazing part of this project that I haven’t told you yet is this: every single one of the scripts that I wrote to do NLP and text analysis was under 30 lines of code.

Check out the script I wrote for running the AutoTag algorithm:

import Algorithmia
import os
import json

client = Algorithmia.client('my_api_key')
algo = client.algo('nlp/AutoTag/0.1.4')

rootdir = './clean_books/set_one/'
output_file = 'set_one_autotag_results.txt'
results = ''

# Walk the book directory and tag every file in it
for subdir, dirs, files in os.walk(rootdir):
    for filename in files:
        with open(os.path.join(subdir, filename), 'r') as content_file:
            # Read the whole book as the algorithm input
            input = content_file.read()
        print("Autotagging " + filename)
        results += filename + "\n\n"
        results += json.dumps(algo.pipe(input))
        results += "\n\n"

# Write the collected tags for every book to a single file
with open(output_file, 'w') as f:
    f.write(results)

print("Done!")

After analyzing all my books, it was time to generate my novel to complete NaNoGenMo. This was so easy compared to the text analysis! Once again, with just simple API calls, I generated trigram models based on each set of books. I then made book previews based on each trigram model just to see if you could hear a difference in the books generated on these different demographics.
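
To make the idea concrete, here is a minimal local sketch of a trigram model. It is not the code behind the Algorithmia algorithms; the function names and the tiny sample text are my own illustrative assumptions. The model maps each pair of consecutive words to the words seen after that pair, and generation simply walks that table:

```python
import random
from collections import defaultdict

def build_trigram_model(text):
    """Map each pair of consecutive words to the words that followed it."""
    words = text.split()
    model = defaultdict(list)
    for first, second, third in zip(words, words[1:], words[2:]):
        model[(first, second)].append(third)
    return model

def generate(model, seed_pair, max_words, rng):
    """Walk the model from seed_pair until max_words or a dead end."""
    output = list(seed_pair)
    while len(output) < max_words:
        followers = model.get((output[-2], output[-1]))
        if not followers:
            break  # no trigram starts with this pair
        output.append(rng.choice(followers))
    return " ".join(output)

# Tiny stand-in corpus; the real models were trained on whole books.
sample = "the cat sat on the mat and the cat sat on the rug"
model = build_trigram_model(sample)
preview = generate(model, ("the", "cat"), max_words=8, rng=random.Random(0))
print(preview)
```

Because followers are stored with repeats, a word that follows a pair more often is proportionally more likely to be chosen. That is why dialect spellings that are common in the source books resurface in the generated text.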

The book preview from Set One:

Dem young uns vil kill you dead than to see you. Well, you would be less unhappy marriages if labor were more women in the midst of her nice pudding, as there are no enemies to good old aunt, and confirm themselves in woods and gloomy clouds hung like graceful draperies. Talk about the streets of the ballot in his land, that those who have fitted their children?

Belle, and I live in such dingy, humble quarters. said Mrs. Underhill, from my own sorrow-darkened home, I did, that he had asked them. Do you remember the incident so well were given to Frederick Douglass contributed $200, besides lecturing for us. The President added: Man is a fair specimen of her negro blood in his friendship, but they may be an old woman entered her home with me? If the vessel had been. Reader, I felt humiliated enough.

The book preview from Set Two:

aw! Yes, said Miss Skinlin she hasn’t the first heir to the female figure. The waves dance bright and happy when I forgot to learn, before which she told me to read and study. My Uncle, with a commanding, What are you better than Kintuck.

It was useless to ask one last word I ran down a corridor as dark and narrow streets or the other.

No Oh, Earth! And no one interfered, and it was. but then strangers came so by letting out all fear and distress and doubts of the damned, as well as bodies. What word Can we not get. I don’t resent the sarcasm, and unsettled most of my observing her to rise. Fortunately the gate swinging in the recesses, chrysanthemums and Christmas roses bloomed as freshly as in her voice, what everybody finds in the streets so, for the best thing was insufferably disgusting and loathsome to me. I said a thing as leisure there.

The most interesting difference I found in the text generated from these different data sets was that the text from Set Two sounded much more formal. The first set of books, the ones written by Black authors, tended to have much more dialogue written in such a way as to let the reader hear the accents and dialects of the time. These words became part of the model to generate text, so as you can see in the first sentence of the Set One preview, the algorithm generated text that still makes a lot of sense even with words that are intended to showcase an accent.

In the end, I decided to create a trigram model based on both sets of text and use that to generate my full length novel. I didn’t have to do any fancy code, I merely made another API call to the Generate Trigram Frequencies algorithm, this time passing in the entirety of my data set. Then, to generate my novel, I wrote a quick script that calls into another algorithm: Generate Paragraph From Trigram. This algorithm uses the trained trigram model to generate paragraphs of text. Since NaNoGenMo requires the book to be at least 50,000 words, I simply wrote a loop that calls the Generate Paragraph algorithm until the total word count of the book reaches the goal:

import Algorithmia
import os
import re
from random import randint

client = Algorithmia.client('my_api_key')
text_from_trigram = client.algo('/lizmrush/GenerateParagraphFromTrigram')
trigrams_file = "data://.algo/ngram/GenerateTrigramFrequencies/temp/all-trigrams.txt"

book_title = 'full_book.txt'
book = ''
book_word_length = 50000

# Keep requesting paragraphs until the book reaches the target word count
while len(re.findall(r'\w+', book)) < book_word_length:
    print("Generating new paragraph...")
    input = [trigrams_file, "xxBeGiN142xx", "xxEnD142xx", (randint(1,9))]
    new_paragraph = text_from_trigram.pipe(input)
    book += new_paragraph
    book += '\n\n'
    print("Updated word count:")
    print(len(re.findall(r'\w+', book)))

with open(book_title, 'w') as f:
    f.write(book)

print("Done!")
print("Your book is now complete. Give " + book_title + " a read now!")

Even with extra new lines for readability, the code I needed to generate an entire novel with Algorithmia was still under 30 lines! And I ended up generating a really unique, interesting novel without getting lost in the highly technical parts of natural language generation. Now, the text isn’t perfect: sometimes the sentences don’t quite sound right and there isn’t really any sort of story arc, but for such simple code I think it’s pretty good! My favorite part about using 19th century texts as the data set was that sometimes you can’t tell if the generated text is hard to read because it’s generated and doesn’t make much sense or because it sounds so old-timey. My book includes the following gems that just might pass as human-written text:

I have said of human life when I saw the Ohio river, that you shall work.

Still, falsehood may be hearing you. She only ‘spects something. Them curls may make a noise you shall not.

You can read the book, or rather, attempt to read the book, online or you can download it from the repo.

It’s mindblowingly fast and simple to get the power of these algorithms into your hands once they are behind a simple API call. You can see all the other scripts I wrote in the GitHub repo for this project. If you browse around, you’ll see that each script is nearly identical. The only real changes I had to make were replacing the algorithm I was calling and what I named the files to write results to! The Algorithmia platform is an incredibly powerful tool. Instead of spending days, weeks, months learning how to code my own natural language processing and text analysis algorithms, I could just pop my data into a variety of algorithms with simple API calls. No sweat, just results.

A behind the scenes look at the making of our GeekWire ‘Seattle 10′ napkin

We mentioned previously that GeekWire and a panel of Seattle’s top startup leaders selected Algorithmia as one of the 10 most promising startups in Seattle – an incredible honor to say the least. As part of this award, we were asked to translate our business onto a giant six-foot by six-foot cocktail napkin, which will be unveiled tonight to the public at the annual GeekWire Gala at the Museum of History & Industry (MOHAI).

We’re pretty excited with our napkin, and wanted to offer a sneak peek to readers. The napkin was designed by our developer evangelist, Liz Rush, who also happens to enjoy cartooning in her spare time. 

The concept started as a sketch with the different types of personas (users) that benefit from Algorithmia, and a mission statement: “Algorithmia: Bringing together organizations, academics, researchers, hackers, and engineers to unlock the power of algorithms in an accessible, open marketplace.”


We wanted to include the Algorithmia binary tree logo, punch up the mission statement, and move the design more toward a printed circuit board look:


Getting better! Something was missing, though… one of us had “a ridiculous idea to write the text the way you’d call an algorithm in Python.”


Thankfully, somebody knows some Python around here. Our CTO Kenny Daniel straightened things out:


There we go. The final napkin turned out great. We’re most excited about how we communicate the idea that Algorithmia enables developers to create tomorrow’s smart applications today in a first-of-its-kind marketplace for algorithms, which unlocks the building blocks of human intelligence, and provides access to world class scientific research and artificial intelligence in five lines of code or less.



Click the video below for a time-lapse version of the cocktail napkin design process!

Oh, and we even created the algorithm Purpose, which takes an array of your users as a string, and returns the purpose for your organization. Try it for yourself here.

Get Started Building Intelligent, Serverless Apps Using AWS Lambda and Algorithmia

In this walkthrough, we’ll show you how to quickly make a serverless photo app that creates digital art pieces in less than 300 lines of code using AWS Lambda and Algorithmia. We’ll be using the Quadtree Art Generator algorithm to create our art, and push the new image to our S3 bucket automatically:


AWS Lambda is great because you can run code without provisioning or managing servers. Similarly, Algorithmia lets you tap into the power of the algorithm economy with just a single API call. Together, they let you build and deploy serverless apps within minutes.

Ready? Okay, let’s get started.

Step 1: Create Accounts

You’ll need a free AWS account, as well as an Algorithmia account. We provide you with 10,000 credits to get started, which will be more than enough for this demo and beyond.

Step 2: Create Your S3 Bucket

Now we need to create your S3 bucket for this project.

Start by selecting S3 from the AWS dashboard.


Then select “Create Bucket” from the Actions menu. Give your bucket a unique name (remember: only lowercase names, and no spaces), and then select a region where you want this bucket hosted.


Once your bucket is created, you want to create two folders: input, and output. The input folder is what Lambda will watch for new images. The images will get processed and then returned to the output folder.

Step 3: IAM Role Configuration

Before we can create a Lambda function, we need to first make an IAM execution role. IAM stands for “Identity and Access Management,” and is an AWS service that helps you control access.

First, go to the AWS IAM Roles page, and select “Create Role.”


Under “Select Role Type,” find and select AWS Lambda. Search for AWSLambdaExecute, and select that.

If you need it, find the complete AWS documentation for this step here.

Step 4: Create the Lambda Function

Create a Lambda function by going to the Services menu in the AWS console, and select Lambda from the list.


Hit the blue “Get Started Now” button. On the “Select Blueprint” page, scroll to the bottom and hit “Skip.”

Give your function a name, description, and set the runtime to Node.js.

Now, copy our SDK from this Gist, into the Lambda function code box below.


Replace ‘YOUR_API_KEY_HERE’ with your Algorithmia API key in the Gist. Your API key can be found on the dashboard of your Algorithmia account.


Let’s walk through the top half of the Gist so we can understand how this works. We first define which Algorithmia algorithm we want to use. In this case we’re using the Quadtree Art Generator:

var algo = "algo://besirkurtulmus/quadtree_art/0.1.x";

Then we grab the new image from our S3 bucket:

var s3 = new AWS.S3();
var bucket = event.Records[0].s3.bucket.name;
var key = decodeURIComponent(event.Records[0].s3.object.key.replace(/\+/g, " "));
var params = {Bucket: bucket, Key: key};
var signedUrl = s3.getSignedUrl('getObject', params);
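
The key decoding is the subtle part: S3 event notifications URL-encode object keys, and spaces arrive as plus signs. For readers who think in Python, the same extraction can be sketched as follows. The sample event is a trimmed, hypothetical record with made-up bucket and file names, not real Lambda output:

```python
from urllib.parse import unquote_plus

def bucket_and_key(event):
    """Pull the bucket name and decoded object key from an S3 event."""
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    # unquote_plus turns '+' into spaces and decodes %XX escapes.
    key = unquote_plus(record["s3"]["object"]["key"])
    return bucket, key

# Hypothetical, trimmed event record for illustration only.
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "my-quadtree-bucket"},
                "object": {"key": "input/red+flower%281%29.jpg"}}}
    ]
}
bucket, key = bucket_and_key(sample_event)
print(bucket, key)
```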

We process the image, turning it into quadtree art, and upload the image back to our bucket.

var client = algorithmia(apiKey);
client.algo(algo).pipe(signedUrl).then(function(output) {
    if(output.error) {
        // The algorithm returned an error
        console.log("Error: " + output.error.message);
        // We call context.succeed to avoid Lambda retries, for more information see: 
        context.succeed(output.error.message);
    } else {
        // Upload the result image to the bucket
        var outputKey = 'output/'+key.substring(key.lastIndexOf('/') + 1);
        var params = {Bucket: bucket, Key: outputKey, Body: output.get()};
        s3.upload(params, function(err, data) {
            if (err) {
                console.log("Error uploading data: ", err);
            } else {
                console.log("Successfully uploaded data to bucket");
                context.succeed("Finished processing");
            }
        });
    }
});
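
The output key expression keeps only the file name and places it under output/. Here is a Python sketch of the same mapping; the function name and sample keys are illustrative:

```python
def output_key_for(key):
    """Mirror 'output/' + key.substring(key.lastIndexOf('/') + 1)."""
    # Splitting on the last '/' leaves the bare file name. A key with
    # no '/' is kept whole, matching lastIndexOf returning -1.
    filename = key.rsplit("/", 1)[-1]
    return "output/" + filename

print(output_key_for("input/founders.jpg"))
print(output_key_for("input/2015/team photo.png"))
```

Note that nested folders are flattened, so two uploads with the same file name in different subfolders would overwrite each other in the output folder.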

Got that? Okay, great.

When you’re ready, select the IAM role you created in Step 3. To be on the safe side, set Timeout to the maximum, 5 min, then hit “Next.”

Step 5: Configure Event Sources

Once your function is created, we need to set up the event for Lambda to respond to. Start by clicking the “Event Sources” tab on your function’s detail page. Then select “Add event source.”

Select S3 from the event source type drop-down. Select the bucket you created in Step 2. The event type you want to select is “Object Created (All).” We also want to add a prefix to tell Lambda to watch for new images here. In this case, we’ll use the prefix “input/”. Hit submit and you’re done.
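
The prefix matters for more than tidiness: because results are written under output/, they never match the input/ prefix, so the function does not retrigger on its own uploads. Here is a small sketch of that matching rule, assuming S3 compares the prefix against the start of the object key. The helper name is hypothetical:

```python
WATCH_PREFIX = "input/"

def would_trigger(key, prefix=WATCH_PREFIX):
    """True when an object key falls under the watched prefix."""
    return key.startswith(prefix)

for key in ["input/founders.jpg", "output/founders.jpg", "founders.jpg"]:
    print(key, would_trigger(key))
```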

Congrats, your AWS Lambda + Algorithmia function is ready to go. Lambda will now listen for new events in your S3 bucket, and automatically pass those images to Algorithmia where they will get processed by the Quadtree Art Generator, and then added back to S3 in the /output folder.

Test this out by logging into your S3 bucket, and navigating to the input folder, and uploading an image. Then, navigate to the output folder, where you’ll have your own piece of digital quadtree art!

Here are our founders, Diego Oppenheimer and Kenny Daniel, before:


…and after quadtree art generation:


What’s Next

You now have a working prototype that uses AWS Lambda and Algorithmia. You could use this same workflow to easily detect and crop photos using Smart Thumbnail, transcribe videos using speech to text, or check images for nudity. Learn more about how to leverage Algorithmia and Lambda here.

In a follow-up guide, we’ll teach you how to create a simple Android photo sharing app for uploading photos to S3, where Lambda will pick them up and turn them into digital art for others to enjoy.

DubHacks Spotlight: intuiti0n Helps Find Seminal Research Papers in Any Field


Algorithmia was on hand at the second annual DubHacks hackathon last month, the largest collegiate hackathon in the Pacific Northwest. Over 600 student developers and designers flocked to the University of Washington campus in Seattle to form teams, build projects, and create solutions to real-world problems.

intuiti0n wanted to make the literature review process easier by building a service that finds important research papers across all fields of study. The team was comprised of Nirawit Jittipairoj, Alex Thompson, and Bryant Wong.

We spoke to Bryant Wong from the team, a senior at the University of Washington with a triple major (!) in mathematics, statistics, and computer science, about their intuiti0n hack.

What was the problem you were trying to solve?

“Two of the members of our team have been involved with academic research, which has the goal of trying to push the limits of human knowledge. However, in order to push the limits of human knowledge, you need to know exactly what is in that field, which you do with a literature review. However, literature reviews are kind of a Catch-22 – you need to read the most important papers in a field, but because you don’t know what’s in the field, you don’t know what papers to read. As a result, literature reviews are often spent just hunting for papers that appear relevant, and then discarding most of them as they are often only tangentially related to your field. This makes the whole process tedious and extremely inefficient.”

How did you solve this problem?

“We devised an app that centered around extracting data from papers, and used them to generate topics to make targeted searches to find (other) papers. We were taking the abstract and title from a paper, running an NLP algorithm called Latent Dirichlet Allocation (LDA) on it to generate topics, then running those topics through Google Scholar, parsing the results with Beautiful Soup. The user could set a threshold for the number of papers they would like returned so that the algorithm does not run indefinitely. Our heuristic for judging the importance of a paper was not so good, as we used the number of papers that had cited this paper. Obviously this is not a good metric as there are many irrelevant papers that are cited, but we did not have a better concrete heuristic to judge by.”
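
The ranking heuristic Bryant describes is simple to sketch. The following is not the team's code: the paper titles and citation counts are invented, and top_cited is a hypothetical helper that sorts by citation count and honors the user's threshold.

```python
def top_cited(papers, limit):
    """Return up to `limit` papers, most cited first."""
    ranked = sorted(papers, key=lambda p: p["citations"], reverse=True)
    return ranked[:limit]

# Invented search results for illustration only.
results = [
    {"title": "Paper A", "citations": 120},
    {"title": "Paper B", "citations": 45},
    {"title": "Paper C", "citations": 980},
]
shortlist = top_cited(results, limit=2)
print([p["title"] for p in shortlist])
```

As the team notes, raw citation count rewards papers that are widely cited but off-topic, so a real version would likely need a relevance signal as well.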

How did you utilize Algorithmia in your project?

“We used Algorithmia as the backbone for our machine learning and topic generation, as we ran our data through one of the LDA algorithms available on Algorithmia to generate topics. This provided several advantages for us over implementing the algorithm ourselves: 

1) not having to implement a complicated algorithm

2) not having a powerful enough server to run the algorithm (as our local machines were not particularly powerful)

3) simple integration in our Python scripts. 

This was a no-brainer decision and allowed us to have a half-functioning product by the end of DubHacks.”