IBM said it made improvements in both acoustic and language modeling. Most recently, the advent of deep neural networks was critical in helping achieve the 8 percent and 6.9 percent word error rate results, said George Saon, lead scientist in the IBM Watson Group, in a post. The ultimate goal is to reach or exceed human accuracy, where the error rate is estimated to be around 4 percent on this task, known as Switchboard.

To put this result in perspective, back in 1995 a “high-performance” IBM recognizer achieved a 43 percent error rate. Spurred by a series of Defense Advanced Research Projects Agency evaluations over the past couple of decades, IBM’s system improved steadily. Watson had its finest moment in 2011, when it beat the reigning human champion on the “Jeopardy” television quiz show.

“On the acoustic side, we use a fusion of two powerful, deep neural networks that predict context-dependent phones from the input audio,” Saon said. “The models were trained on 2000 hours of publicly available transcribed audio from the Switchboard, Fisher, and CallHome corpora.”

Saon added, “We are currently working on integrating these technologies into IBM Watson’s state-of-the-art speech-to-text service. By exposing our acoustic and language models to increasing amounts of real-world data, we expect to bridge the gap in performance between the ‘lab setting’ and the deployed service.” The Watson team included Kenny, Tom Sercu, Steven Rennie, and Jeff Kuo.

The IBM Speech API is offered through Watson. This activity uses the IBM Watson Speech to Text API to convert audio to text; it returns the output in JSON string format, and the status of the API call is also returned as an output (a minimal sketch of such a call appears below). For comparison, the Google Speech API requires an API key for authentication, obtained from a JSON file.
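As a rough illustration of the kind of call the activity wraps, here is a minimal Python sketch that posts an audio file to a Watson Speech to Text recognize endpoint and reads back the JSON reply. The endpoint URL, API key, and file name are placeholders for illustration, not values taken from this article.

```python
import requests

# Placeholder values -- the real URL and key come from your own
# IBM Cloud Speech to Text service instance.
STT_URL = "https://api.us-south.speech-to-text.watson.cloud.ibm.com/v1/recognize"
API_KEY = "YOUR_API_KEY"


def transcribe(path):
    """Send a FLAC file to the service; return (HTTP status, transcript text)."""
    with open(path, "rb") as audio:
        response = requests.post(
            STT_URL,
            headers={"Content-Type": "audio/flac"},
            data=audio,
            auth=("apikey", API_KEY),  # API-key basic auth
        )
    result = response.json()  # the service replies with a JSON document
    transcript = " ".join(
        res["alternatives"][0]["transcript"] for res in result.get("results", [])
    )
    return response.status_code, transcript


if __name__ == "__main__":
    status, text = transcribe("sample.flac")
    print("API call status:", status)  # mirrors the activity's status output
    print("Transcript:", text)
```

Returning both the parsed JSON transcript and the HTTP status mirrors the two outputs the activity description mentions.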
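Saon’s description of fusing two networks that each predict context-dependent phones boils down to combining per-frame class scores. The sketch below is not IBM’s implementation; it only shows one common way such a fusion can be done, by taking a weighted average of the two models’ log-posteriors for each frame.

```python
import numpy as np


def fuse_posteriors(log_probs_a, log_probs_b, weight=0.5):
    """Combine two acoustic models' outputs frame by frame.

    log_probs_a, log_probs_b: arrays of shape (frames, n_phone_classes)
    holding per-frame log-posteriors over context-dependent phone classes.
    Returns fused log-posteriors of the same shape.
    """
    fused = weight * log_probs_a + (1.0 - weight) * log_probs_b
    # Renormalize so each frame is again a proper log-probability distribution.
    fused -= np.logaddexp.reduce(fused, axis=1, keepdims=True)
    return fused


# Toy example: 3 frames, 4 context-dependent phone classes.
rng = np.random.default_rng(0)
a = np.log(rng.dirichlet(np.ones(4), size=3))
b = np.log(rng.dirichlet(np.ones(4), size=3))
print(np.argmax(fuse_posteriors(a, b), axis=1))  # fused per-frame decisions
```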
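The percentages quoted above (43, 8, 6.9, and roughly 4 percent) are word error rates: the number of substitutions, deletions, and insertions needed to turn the recognizer’s output into the reference transcript, divided by the number of reference words. A small self-contained sketch of that calculation, on a made-up example sentence:

```python
def word_error_rate(reference, hypothesis):
    """Levenshtein distance over words, divided by the reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # dist[i][j] = edits to turn the first i reference words into the first j hypothesis words
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i
    for j in range(len(hyp) + 1):
        dist[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dist[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dist[i][j] = min(sub, dist[i - 1][j] + 1, dist[i][j - 1] + 1)
    return dist[len(ref)][len(hyp)] / len(ref)


# One wrong word against a 10-word reference gives a 10 percent WER.
ref = "the quick brown fox jumps over the lazy dog today"
hyp = "the quick brown fox jumped over the lazy dog today"
print(round(100 * word_error_rate(ref, hyp), 1))  # 10.0
```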