Amazon AWS Transcribe – Disruption For The Transcription Typing Industry

Transcription, a service in which a real person listens to audio and types out what is said, is now under pressure from AWS Transcribe, an Automatic Speech Recognition (ASR) service created by Amazon.

The transcription industry has been around for years, based on a need to have recorded conversational audio converted to written text. There have been many attempts over the years to automate this process using computer technology, and they have generally failed. Although speech recognition technology is successful in some cases for single-speaker audio (just one person talking), multi-speaker audio has always been a challenge. The best-known voice-to-text software is Dragon by Nuance. It has been around for some 21 years, initially released in 1997, but it has not managed to crack the multi-speaker speech-to-text conundrum that props up the transcription service industry.

Image of Typing Pool using Typewriters

This is all about to change with AWS Transcribe from Amazon. The solution can convert multi-speaker audio to highly accurate text, quickly and cheaply. Not only that, it will identify individual speakers within the audio and tag each word with a confidence score, making it easy for proofreaders to focus on the small number of inaccurate or less confident words. This is going to be a blow to the transcription typing industry, as the need for physical typists will start to fall away. The only glimmer of hope is that proofreaders will still be needed to tidy the text output produced by AWS Transcribe, mainly to format it into paragraphs. AWS Transcribe does insert punctuation into the text.
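To make the confidence-score idea concrete, here is a minimal Python sketch of how a proofreader's tooling might pull out the words worth checking. It assumes the JSON layout that Transcribe documents for its results (an `items` list with string confidence values, plus `speaker_labels` segments keyed by start time). The sample data and the `low_confidence_words` helper are our own hypothetical illustrations, not real output and not part of the service.

```python
def low_confidence_words(result, threshold=0.90):
    """Return (speaker, start_time, word, confidence) for words below threshold."""
    results = result["results"]

    # Map each word's start time to the speaker label assigned to it.
    speaker_at = {}
    for segment in results.get("speaker_labels", {}).get("segments", []):
        for item in segment.get("items", []):
            speaker_at[item["start_time"]] = item["speaker_label"]

    flagged = []
    for item in results["items"]:
        if item["type"] != "pronunciation":
            continue  # skip punctuation marks, which are not spoken words
        best = item["alternatives"][0]
        confidence = float(best["confidence"])  # scores arrive as strings
        if confidence < threshold:
            speaker = speaker_at.get(item["start_time"], "unknown")
            flagged.append((speaker, item["start_time"], best["content"], confidence))
    return flagged


# Hypothetical, hand-written sample shaped like a Transcribe result file.
sample = {
    "results": {
        "speaker_labels": {
            "segments": [
                {"speaker_label": "spk_0", "items": [
                    {"start_time": "0.00", "speaker_label": "spk_0"},
                    {"start_time": "0.42", "speaker_label": "spk_0"},
                ]},
                {"speaker_label": "spk_1", "items": [
                    {"start_time": "1.10", "speaker_label": "spk_1"},
                ]},
            ]
        },
        "items": [
            {"type": "pronunciation", "start_time": "0.00",
             "alternatives": [{"content": "Good", "confidence": "0.99"}]},
            {"type": "pronunciation", "start_time": "0.42",
             "alternatives": [{"content": "morning", "confidence": "0.62"}]},
            {"type": "punctuation",
             "alternatives": [{"content": ".", "confidence": "0.0"}]},
            {"type": "pronunciation", "start_time": "1.10",
             "alternatives": [{"content": "Thanks", "confidence": "0.97"}]},
        ],
    }
}

for row in low_confidence_words(sample):
    print(row)
```

With the default threshold of 0.90, only the word "morning" from speaker `spk_0` is flagged in this sample. The threshold itself is an arbitrary choice for illustration; a real workflow would tune it against how much proofing time is available.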

In the demonstration video below we show how an interview between a male and a female speaker is transcribed by AWS Transcribe. The audio, around 90 seconds long, was taken from a televised interview between an ABC journalist and an Australian politician. You will be able to see how quickly AWS Transcribe processes the audio file, the text output it creates, how the text is assigned to each speaker, and the confidence score tagged to each word.

We estimate that AWS Transcribe will cut voice-to-text conversion costs by 75%-80% and also shorten turnaround time significantly.

In some tests we ran a few years ago using Dragon for Windows to convert our audio to text, the processing time for a one-hour single-speaker audio file was cut from 5 person-hours (transcribed and proofread) to just 75 minutes (proofread only). Given the accuracy of AWS Transcribe, we would expect the proofing time to decrease as well. Turnaround also improves: audio files of one hour or more can be converted to text within 24 hours.
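As a quick check on those figures, the arithmetic can be worked through in a few lines of Python. This only restates the Dragon test numbers quoted above; the variable names are ours, and the result measures labour time rather than any AWS charge.

```python
def reduction(before_minutes, after_minutes):
    """Fraction of time saved when moving from 'before' to 'after'."""
    return (before_minutes - after_minutes) / before_minutes


manual_minutes = 5 * 60      # 5 person-hours: transcribed and proofread
proofread_minutes = 75       # proofread only, after automated conversion

saving = reduction(manual_minutes, proofread_minutes)
print(f"Labour time saved: {saving:.0%}")  # prints "Labour time saved: 75%"
```

The 75% saving in labour time sits at the lower end of our 75%-80% cost estimate, which is why any further drop in proofing time matters.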

Amazon’s AWS Transcribe uses a pay-as-you-go model. There is no expensive software to buy, no local infrastructure to install and no audio profiles to train. Just submit an audio file and have it converted to text. Using the AWS Transcribe APIs, the solution can also take a live feed of audio and convert it on the fly. The use cases for AWS Transcribe are endless, but here are a few:

  • Podcasters – Put a text copy of your podcast online to help with SEO
  • Company AGMs – Record and convert AGMs for your company website
  • Anyone who records interviews or meetings – police, financial advisers, team meetings, social clubs, journalists, students

All the infrastructure required to convert digital audio to text is managed by Amazon in the cloud. When setting up AWS Transcribe you can choose your region if data sovereignty is an issue. Audio is encrypted, and AWS is ticking off the relevant regulatory requirements, such as HIPAA for the health industry.
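For readers curious about what "just submit an audio file" looks like in practice, here is a rough Python sketch using boto3, the AWS SDK for Python. It is an illustration under assumptions, not our production process: the job name, bucket and file are hypothetical, the audio is assumed to be already uploaded to S3, and the Sydney region (`ap-southeast-2`) is used only as an example of keeping data in one region. The `build_job_request` and `submit` helpers are our own names.

```python
def build_job_request(job_name, media_uri, media_format="mp3",
                      language_code="en-AU", max_speakers=2):
    """Assemble keyword arguments for Transcribe's StartTranscriptionJob call."""
    return {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": media_uri},
        "MediaFormat": media_format,
        "LanguageCode": language_code,
        "Settings": {
            "ShowSpeakerLabels": True,        # ask for speaker identification
            "MaxSpeakerLabels": max_speakers,
        },
    }


def submit(request, region="ap-southeast-2"):
    """Send the request to the service. Needs boto3 and AWS credentials."""
    import boto3  # imported here so the request builder works without it
    client = boto3.client("transcribe", region_name=region)
    return client.start_transcription_job(**request)


request = build_job_request(
    "interview-demo",                        # hypothetical job name
    "s3://example-bucket/interview.mp3",     # hypothetical S3 location
)
print(request)
```

Calling `submit(request)` would start the job in the chosen region. The request is built separately from the call so it can be inspected, or logged, before anything is sent to AWS.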

Here at Dictate Australia we are working to produce a streamlined process for you to submit your audio to us. It might be an interview or a meeting, recorded on one of the fantastic Olympus digital voice recorders that we sell, on your iPhone or iPad (perhaps with a Rode iOS mic connected), or on the Olympus Dictation app. However you record your audio, we will process it through AWS Transcribe for you if you are not comfortable creating an Amazon AWS account and the cloud infrastructure that goes with it. Watch this space.

