The Language Identification microservice from Algorithmia is a straightforward API that accepts a piece of text and attempts to identify the natural language in which it is written.
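As a rough illustration of what calling such a service could look like from Python, here is a minimal sketch. It is not taken from the recipe's script: the algorithm path `nlp/LanguageIdentification/1.0.0`, the `{"sentence": ...}` input shape, and the list-of-candidates result shape are all assumptions, and `identify_language` is a hypothetical helper name. The client is passed in so the function can be tried with a stand-in object.

```python
def identify_language(client, text, algo_path="nlp/LanguageIdentification/1.0.0"):
    """Return (language_code, confidence) for the most likely language of `text`.

    Assumptions (not confirmed by the recipe):
      - `client.algo(path).pipe(payload).result` is how the algorithm is invoked
      - the payload is {"sentence": text}
      - the result is a list of {"language": ..., "confidence": ...} dicts
    """
    response = client.algo(algo_path).pipe({"sentence": text})
    candidates = response.result
    # Pick the candidate the service is most confident about.
    best = max(candidates, key=lambda c: c["confidence"])
    return best["language"], best["confidence"]


# Hypothetical usage with the Algorithmia Python client:
#   import Algorithmia
#   client = Algorithmia.client("your_api_key")
#   code, confidence = identify_language(client, "Bonjour tout le monde")
```

Taking the highest-confidence candidate keeps the helper independent of whatever order the service returns its guesses in.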
The Python script in this recipe examines all the .txt and .docx files in a directory, identifies the language of each file, and moves each one into a subdirectory named for its ISO 639 language code (‘en’, ‘fr’, etc.).
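The file-sorting logic described here can be sketched roughly as follows. This is an illustrative outline rather than the recipe's actual script: `read_text` and `sort_by_language` are hypothetical names, and `detect` is assumed to be any function that maps text to a language code, such as a wrapper around the Language Identification API.

```python
import shutil
from pathlib import Path


def read_text(path):
    """Extract plain text from a .txt or .docx file."""
    if path.suffix.lower() == ".docx":
        import docx  # provided by the python-docx package
        return "\n".join(p.text for p in docx.Document(str(path)).paragraphs)
    return path.read_text(encoding="utf-8", errors="ignore")


def sort_by_language(directory, detect):
    """Move each .txt/.docx file in `directory` into a subfolder named detect(text)."""
    directory = Path(directory)
    moved = {}
    # sorted() builds the full listing up front, so moving files is safe.
    for path in sorted(directory.iterdir()):
        if not path.is_file() or path.suffix.lower() not in {".txt", ".docx"}:
            continue
        code = detect(read_text(path))  # e.g. 'en', 'fr'
        target = directory / code
        target.mkdir(exist_ok=True)
        shutil.move(str(path), str(target / path.name))
        moved[path.name] = code
    return moved
```

Passing the detector in as a parameter means the sorting step can be tried offline with a stand-in function before it is wired to the real service.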
For the full blog post related to this recipe, see Build Your Own Language Detection Microservice.
Create a free Algorithmia account, and install the Algorithmia Python client and the python-docx package:
Detailed instructions can be found in the blog post.
How To Run the Script
First, edit the script and replace:
your_api_key with your Algorithmia API key
/some/file/path/ with the local directory you wish to examine
From the command line, navigate to the folder containing your Python file and run: