This algorithm uses the open-source CMU Sphinx library to recognize speech in audio files uploaded to the Data API or in YouTube videos licensed under Creative Commons. Speech recognition is performed with the latest generic English models published by CMU on their SourceForge site. To find eligible videos, search by keyword on www.youtube.com, then open the filters and select Creative Commons.
The output is a JSON object that contains the following fields:
text: The transcribed text of the audio file
wordtimes: The times at which each word was spoken (or at which silences occurred)
best3: The three best guesses for each phrase in the file
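
As a rough illustration of consuming this output, the sketch below parses a response and pulls out the three documented fields. The function name read_transcript and the sample payload are hypothetical, and the inner structure of wordtimes and best3 is not specified here, so the empty lists are placeholders rather than the real format.

```python
import json


def read_transcript(raw_json):
    """Parse the JSON output and return its three documented fields."""
    result = json.loads(raw_json)
    # .get() avoids a KeyError if a field is absent from a given response.
    return {
        "text": result.get("text", ""),
        "wordtimes": result.get("wordtimes"),
        "best3": result.get("best3"),
    }


# Hypothetical payload: the empty lists stand in for the real
# wordtimes and best3 structures, which are assumptions here.
sample = '{"text": "hello world", "wordtimes": [], "best3": []}'
fields = read_transcript(sample)
print(fields["text"])
```

Reading the fields with .get() keeps the caller from failing on a response that lacks one of them; check the actual output before relying on any particular shape for wordtimes or best3.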