![speech to text](https://wnvoice.s3.amazonaws.com/Images/TextToSpeech.png)

Speech-to-text technology uses artificial intelligence (AI) to automatically transcribe raw speech into structured written text. It works equally well with live audio and recorded files, aiming to give you accurate transcripts that you can use for sentiment analysis, quality assurance, and agent training. But there is a catch here: how reliable is speech-to-text, really?

## The State of Speech-to-Text Accuracy in 2021

First, let's understand how speech-to-text accuracy is measured. The reliability of speech-to-text hinges on its accuracy rate, i.e., how many errors it contains on average. This is measured as Word Error Rate (WER), the percentage of errors for every 100 words. Technically, accuracy is the exact inverse of WER: if a piece of transcribed text contains 2% errors, it is 98% accurate. Either way, knowing a speech-to-text engine's WER is essential to understanding how reliable it actually is.

Surprisingly, despite the sophisticated nature of AI today, the average accuracy of speech-to-text is far from 100%. As per benchmarks published in March 2020, Amazon had an accuracy of 73% (i.e., 27% WER), Microsoft was 78% accurate, Google came in at 79%, and Rev.ai (a dedicated speech-to-text engine provider) scored a slightly better 84%. This means that in 1,000 words of transcribed text, you could have at least 160 incorrectly transcribed words as per the above benchmarks.

Of course, these numbers are subject to testing conditions and the complexity of the tasks involved. For example, a May benchmark found Microsoft to be 81.01% accurate, AWS to be 83.12% accurate, and Google largely the same at 84.46%. Like Rev, dedicated provider Temi also promises better reliability at 13.9% WER, or 86.1% accuracy. In other words, you should expect anywhere between 15 and 25 errors for every 100 words transcribed across the leading speech-to-text engines available today.

## How to Make Speech-to-Text More Reliable?

There are two ways you can improve accuracy rates for automatic transcriptions: training the AI engine and reducing interference. You can train the AI to accurately interpret the specific accent, inflexion, and voice modulation commonly used by your agents by feeding the engine pre-recorded audio files. You could also reduce interference in the calling vicinity by using superior-quality microphones, keeping ambient noise levels to a minimum, and eliminating sudden interruptions. Finally, you could specially train the AI to accurately transcribe industry-specific terminology that your agents use frequently but that may not be commonplace enough for the engine to pick up correctly the very first time around.

## Apart from Accuracy, Is There Anything Else to Consider?

While WER is undeniably the most relevant indicator of speech-to-text reliability, there is another factor involved: latency. Transcription latency reflects the number of seconds it takes for the speech-to-text engine to convert raw audio into a workable transcription.
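The WER arithmetic described above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's scoring tool: WER is computed as the word-level edit distance (substitutions, insertions, deletions) between a reference transcript and the engine's output, divided by the reference length, and accuracy is simply its inverse. The `wer` helper name is my own.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # Classic dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution ("the" -> "a") in a six-word reference: WER = 1/6.
error_rate = wer("the cat sat on the mat", "the cat sat on a mat")
accuracy = 1 - error_rate
# At 16% WER, 1,000 transcribed words yield roughly 160 errors.
expected_errors = round(0.16 * 1000)
```

This also shows why the article's "160 errors per 1,000 words" figure follows directly from a 16% WER.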
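Transcription latency, mentioned at the end of the article, is straightforward to measure: wrap the transcription call in a timer and record the elapsed seconds. The sketch below uses a stand-in `transcribe` function (simulated with a short sleep) since no real engine API is named in the article; swap in your engine's actual call.

```python
import time

def transcribe(audio_bytes: bytes) -> str:
    # Hypothetical stand-in for a real speech-to-text API call,
    # simulated here with a 50 ms delay.
    time.sleep(0.05)
    return "hello world"

start = time.perf_counter()
text = transcribe(b"\x00" * 16000)  # e.g., one second of 16 kHz 8-bit audio
latency_s = time.perf_counter() - start
print(f"transcription latency: {latency_s:.3f}s")
```

`time.perf_counter()` is used rather than `time.time()` because it is a monotonic, high-resolution clock suited to measuring short durations.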