sphinx / SpeechRecognition / 0.4.3


This algorithm uses the CMU Sphinx open source library to recognize speech in audio files uploaded to the Data API or in YouTube videos licensed under Creative Commons. Speech recognition is performed with the latest Generic English models published by CMU on their SourceForge website. To find eligible videos, run a keyword search on www.youtube.com, open the search filters, and select Creative Commons.

The first input to the algorithm is the link to the media file (either a Data API URL or a YouTube video URL). An optional second input points to a .tar.gz archive in the Data API containing a custom language model that you have trained. The archive structure should be flat, containing the .lm.dmp file (language model) and the .dict file (dictionary), along with the files required for the acoustic model (means, mixture_weights, etc.).
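As a minimal sketch of the expected archive layout, the following Python snippet packages a model directory into a flat .tar.gz. The file names (mymodel.lm.dmp, mymodel.dict) are hypothetical placeholders for your own trained model files; only the flat structure is the point.

```python
import os
import tarfile
import tempfile

def package_model(model_dir, archive_path):
    # Add every file at the top level of the archive (flat structure),
    # with no intermediate directories in the member names.
    with tarfile.open(archive_path, "w:gz") as tar:
        for name in os.listdir(model_dir):
            tar.add(os.path.join(model_dir, name), arcname=name)

# Hypothetical model files: language model, dictionary, and acoustic model files.
workdir = tempfile.mkdtemp()
model_dir = os.path.join(workdir, "model")
os.makedirs(model_dir)
for fname in ["mymodel.lm.dmp", "mymodel.dict", "means", "mixture_weights"]:
    open(os.path.join(model_dir, fname), "w").close()

archive = os.path.join(workdir, "mymodel.tar.gz")
package_model(model_dir, archive)

with tarfile.open(archive, "r:gz") as tar:
    members = sorted(tar.getnames())
```

After packaging, upload the archive to the Data API and pass its URL as the second input.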

The output is a JSON object that contains the following fields:

text: The transcribed text of the audio file

wordtimes: Timestamps for when each word (or silence) was spoken

best3: The three best guesses for each phrase in the file
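The fields above can be consumed as a plain JSON object. Below is a sketch in Python; the sample response, the exact shape of the wordtimes entries, and the silence marker are invented for illustration and may differ from the actual output.

```python
import json

# Hypothetical response shaped like the fields described above.
response = json.loads("""
{
  "text": "hello world",
  "wordtimes": [
    {"word": "<sil>", "start": 0.0, "end": 0.4},
    {"word": "hello", "start": 0.4, "end": 0.9},
    {"word": "world", "start": 0.9, "end": 1.5}
  ],
  "best3": [["hello world", "hello word", "yellow world"]]
}
""")

# Full transcript of the file.
transcript = response["text"]

# Spoken words only, filtering out the (assumed) silence marker.
spoken = [w["word"] for w in response["wordtimes"] if w["word"] != "<sil>"]

# Top-3 alternatives for the first phrase.
alternatives = response["best3"][0]
```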

Warning: If you are using the website console and the media file is longer than about 4 minutes, the request may time out.