This algorithm uses the open-source CMU Sphinx library to recognize speech in audio files uploaded to the Data API, or in YouTube videos licensed under Creative Commons. It differs from /sphinx/SpeechRecognition in that it attempts to adapt the recognition statistics to the specific speaker in the media file. Note that this adaptation step makes the algorithm run roughly twice as slowly. Recognition is performed with the latest Generic English models published by CMU on their SourceForge site.
The output is a JSON object that contains the following fields:
text: The transcribed text of the audio file
wordtimes: Timestamps indicating when each word (or silence) was spoken
best3: The three best guesses for each phrase in the file
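As a sketch of how the result above might be consumed, the snippet below parses a hand-written sample response. The exact shape of the `wordtimes` entries (field names, the `<sil>` marker for silences) and of `best3` is an assumption for illustration; the real output may differ in detail.

```python
import json

# Hand-written sample output; the structure of "wordtimes" and "best3"
# entries is assumed, not taken from the algorithm's documentation.
sample_output = """
{
  "text": "hello world",
  "wordtimes": [
    {"word": "<sil>", "start": 0.00, "end": 0.35},
    {"word": "hello", "start": 0.35, "end": 0.80},
    {"word": "world", "start": 0.80, "end": 1.30}
  ],
  "best3": [
    ["hello world", "hello word", "yellow world"]
  ]
}
"""

result = json.loads(sample_output)

# Full transcript
print(result["text"])

# Word-level timings, skipping silence entries
for entry in result["wordtimes"]:
    if entry["word"] != "<sil>":
        print(f'{entry["word"]}: {entry["start"]:.2f}-{entry["end"]:.2f}s')

# Alternative hypotheses for the first phrase
print(result["best3"][0])
```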