sphinx / AdaptiveSpeechRecognition / 0.3.1


This algorithm uses the CMU Sphinx open-source library to recognize speech in audio files uploaded to the Data API, or in YouTube videos licensed under Creative Commons. It differs from /sphinx/SpeechRecognition in that it attempts to adapt the recognition statistics to the specific speaker in the media file. Note that this adaptation step makes the algorithm run roughly twice as slowly. Speech recognition is performed with the latest Generic English models published by CMU on their SourceForge website.

The first input to the algorithm is the link to the media file (either a Data API URL or a YouTube video URL). An optional second input points to a .tar.gz archive in the Data API containing a new language model that you trained. The archive structure should be flat, containing the .lm.dmp file (language model) and the .dict file (dictionary). The files required for the acoustic model (means, mixture_weights, etc.) should also be included.
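The inputs above can be assembled as a simple list before calling the algorithm. The sketch below is a hedged example: it assumes the algorithm accepts its inputs as a list (media URL first, optional model archive second), and the API key, Data API paths, and `build_input` helper are illustrative placeholders, not part of the documented interface.

```python
def build_input(media_url, model_archive=None):
    """Assemble the algorithm input list: the media URL (Data API or
    YouTube), plus an optional Data API path to a .tar.gz archive
    containing a custom language/acoustic model."""
    inputs = [media_url]
    if model_archive is not None:
        inputs.append(model_archive)
    return inputs

# Example invocation via the Algorithmia Python client (requires an
# account and API key, so it is shown commented out):
#
# import Algorithmia
# client = Algorithmia.client("YOUR_API_KEY")
# algo = client.algo("sphinx/AdaptiveSpeechRecognition/0.3.1")
# result = algo.pipe(build_input("data://.my/audio/sample.wav",
#                                "data://.my/models/custom_model.tar.gz")).result
```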

The output is a JSON object that contains the following fields:

text: The transcribed text of the audio file

wordtimes: Timestamps for when each word (or silence) was spoken

best3: The best 3 guesses for each phrase in the file
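A result with these fields might be processed as sketched below. Only the field names (text, wordtimes, best3) come from the description above; the exact entry structure of wordtimes and best3, and the `<sil>` silence marker, are illustrative assumptions.

```python
import json

# Illustrative response shaped after the documented fields; entry
# structures are assumptions for the sake of the example.
sample = json.loads("""
{
  "text": "hello world",
  "wordtimes": [
    {"word": "hello", "start": 0.12, "end": 0.55},
    {"word": "<sil>", "start": 0.55, "end": 0.80},
    {"word": "world", "start": 0.80, "end": 1.20}
  ],
  "best3": [["hello world", "hello word", "yellow world"]]
}
""")

def spoken_words(result):
    """Return (word, start, end) tuples from wordtimes, skipping the
    assumed silence marker."""
    return [(w["word"], w["start"], w["end"])
            for w in result["wordtimes"]
            if w["word"] != "<sil>"]

print(sample["text"])
print(spoken_words(sample))
```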

Warning: If you are using the website console, the request may time out for media files longer than 4 minutes.