This post revised November 2012 with updated Dragon version details for Mac and Windows.
Another update, early 2017. The below still pretty much holds true, but there have been some advancements in voice-to-text software since this blog was originally written. We are starting to see some success with transcribing one-on-one interviews. Multiple speakers talking over each other with lots of background noise will always be an issue for voice-to-text software. Click here to see our 2017 post with an example of voice to text for a one-on-one interview.
Another update, 2019. ASR, or Automatic Speech Recognition, for interviews and meetings is now possible with Amazon AWS Transcribe – see this post.
No, is the short answer. Voice or speech recognition software will only work successfully with one spoken voice, and for best results that speaker needs to have trained the software to understand their voice and style of speaking, known as creating a voice profile. Most people who use voice recognition software typically pop on a headset and mic, chat away to their Windows PC or Mac, and watch as their voice is magically (or using the Dragon voice recognition engine if you don’t believe in magic) converted to text.
Voice recognition is currently going through a spike in popularity, especially since the iPhone 4S was released back in October 2011 running iOS 5 with Siri. Siri has definitely helped raise awareness of just how far voice recognition technology has evolved over the last couple of years, with Nuance being the company leading the way in this technology on Mac, Windows and now inside both iOS 5 and iOS 6, working closely with Apple.
Most of us now have an iPhone or smartphone capable of recording audio, and most who need to capture important meetings or interviews also have a high quality digital voice recorder. Dragon NaturallySpeaking version 12 Premium and Professional editions on Windows and the recently released Dragon Dictate 3 for Mac can take recorded audio and transcribe it, but only for the single speaker whose voice has been trained. So it is understandable that the leap is often made that if it can transcribe one voice it can transcribe many. Not so. The technology is not there yet and in my opinion won’t be for quite some time.
Nuance cleverly have a couple of pages on their website dedicated to Transcribing Interviews or Transcribing Lectures using a technique called voice writing or parroting. The idea is simple: you record your interview, meeting, lecture or whatever it is that has multiple voices, or a single voice that you cannot get trained in Dragon (possibly the voice of someone making a speech or presenting at a conference). Then, when you are sitting in front of your Windows PC or Mac, you listen to your recorded audio and you speak what you hear. It is therefore your trained voice that Dragon hears as you repeat other people’s words from your recording. Sounds simple, doesn’t it? It isn’t. I have tried it a few times and have become extremely frustrated. You may find that the voice you are listening to is too fast for you to remember and repeat back, which means you are constantly stopping and rewinding the audio, which on an iPhone app or basic digital voice recorder is not always that easy. You will likely often have to stop the audio to correct mistakes made by Dragon, for example when unfamiliar names or terms are used which have not been trained into your user profile. It will work, but it does take a lot of concentration and a large amount of practice. You can give it a try: just search YouTube for a boardroom meeting and see if you can repeat what you hear while wearing a headset and mic.
So what are your practical options for transcribing audio or video of meetings and interviews or in fact any audio that you need turned from voice to text?
Option 1 – Type The Audio Yourself
You could transcribe the audio yourself using some specialised transcription software to help you. Transcription is the process of typing what you hear. When I say help, it won’t actually do the work of typing it for you, but it will assist greatly with the audio playback. For example, if you are a slow typist you could slow the speed of audio playback down to match your typing speed, or conversely speed up the audio if your fingers are a blur when you type. Using transcription software you can also easily rewind the audio, again set up to match your typing ability, so you could either rewind back half a second or, if you are a poor typist like me, rewind the audio back up to 5 seconds. The biggest benefit of using transcription software is that audio can be controlled from inside the word processor that you are typing in. By that I mean that, using a USB connected foot pedal or pre-defined keys on your keyboard, you can stop/play/rewind the audio without leaving the document that you are typing. This is a huge time saver. Often people try to transcribe using Windows Media Player or QuickTime for their audio playback and they have to constantly switch from the document to the audio player to stop/rewind/play and then back to the document, which adds a considerable amount of time to the whole transcribing process.
Pros Of Doing It Yourself
- Cost, from the point of view that you are not paying someone else to do the work.
- Cost, from the point of view that transcription software is free or relatively cheap depending on what you buy.
Cons Of Doing It Yourself
- It is hard work and not as easy as it sounds.
- Transcription typing does take a long time even if you are experienced.
Just as an indication of how long transcription takes, a good quality one hour audio file of an interview between two people would take an experienced transcription typist around 2-3 hours to transcribe. If you are not experienced expect that to rise to 4-5 hours and without the help of transcription software even longer.
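As a rough illustration of the arithmetic above, here is a minimal sketch that scales those ratios to any length of recording. The function name and parameters are hypothetical, and the ratios are simply the ones quoted above for good quality audio typed with transcription software. It does not try to model poor audio or working without transcription software, since no figure is given for those cases.

```python
def estimate_transcription_hours(audio_hours, experienced=True):
    """Return a (low, high) estimate in hours for typing up a recording.

    Ratios are taken from the indication above: 2-3 hours of typing per
    hour of audio for an experienced typist, 4-5 hours otherwise.
    """
    low_ratio, high_ratio = (2, 3) if experienced else (4, 5)
    return (audio_hours * low_ratio, audio_hours * high_ratio)


# A 90 minute interview typed by someone without transcription experience.
low, high = estimate_transcription_hours(1.5, experienced=False)
print(f"Allow roughly {low:g} to {high:g} hours")
```

Treat the result as a planning figure only. More speakers, background noise or unfamiliar terminology will push the real number up.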
- Express Scribe from NCH Software – Free
- Express Scribe Pro from NCH Software – approx. AU$40 – Plays .dss .ds2 .dct .dvf .avi .mov .wmv audio and video files.
- Olympus AS-2400 Digital Transcription Kit – Includes software, headset and USB foot pedal – RRP AU$425 but usually it sells online for less. Windows & Mac compatible.
Option 2 – Outsource To A Professional Transcription Service
If you need your audio transcribed quickly and accurately then this option is for you. Just Google transcription service and see how many businesses are listed both organically and in the paid Google ads. Transcription has become a boom business over the last few years and that makes total sense. With the availability and high quality of digital voice recorders and iPhone apps, just about everyone is recording their meetings, interviews, podcasts, vodcasts, focus groups, lectures, letters, assessments; the list goes on. It is great to have a record of your meeting and the like stored digitally, where it can be easily shared within your business on your intranet or with the world on the internet, but more often than not a written transcript is also required for offline reading, compliance or public record. There are all kinds of reasons audio and video are transcribed into text these days.
This is where a professional transcription service can come in to help you. Simply send them your digital audio or video via the internet and they can set to work using their team of experienced transcription typists to convert your voice to text. Here is a security tip: always make sure that the transcription service you use has their own secure file server and that the data you send them is encrypted using either FTPS (not FTP) or HTTPS (not HTTP). This is important for two reasons. First, your audio/video is encrypted as it travels across the internet from you to their servers. Second, by running their own secure server they have full control over what happens on that server. Many transcription services use third party providers like SendThisFile.com, and neither they nor you have any control over your data on that third party’s servers. Ask any transcription service that you approach about their security and file transfer mechanisms. Many still use FTP and not FTPS. Transferring files across FTP alone does not encrypt the file contents, be it audio, video or worse still your transcribed documents.
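If you are handed an upload address by a transcription service, a quick first check is whether the address itself indicates an encrypted transfer. The sketch below is a minimal illustration only. The function name and the example addresses are made up, and it assumes the provider publishes its upload location as a URL with an explicit scheme. Some FTPS setups are advertised with a plain ftp:// address and upgraded to encryption by the client, so a failed check is a prompt to ask the provider, not proof that the transfer is insecure.

```python
from urllib.parse import urlparse

# Schemes named in the tip above as encrypted in transit.
ENCRYPTED_SCHEMES = {"https", "ftps"}


def is_encrypted_transfer(url):
    """Return True if the URL scheme indicates an encrypted transfer."""
    scheme = urlparse(url).scheme.lower()
    return scheme in ENCRYPTED_SCHEMES


# Hypothetical upload addresses, for illustration only.
for address in ("https://files.example.com/upload",
                "ftps://files.example.com/incoming",
                "ftp://files.example.com/incoming",
                "http://files.example.com/upload"):
    verdict = "encrypted" if is_encrypted_transfer(address) else "NOT encrypted"
    print(f"{address}: {verdict}")
```

This only covers the first of the two reasons above. It says nothing about who runs the server at the other end, so you still need to ask the service whether the server is their own.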
Any good professional transcription service will turn your audio around quickly and the final transcript will be very accurate. Typically there are two phases to the transcription process. Phase 1 is the transcription typist, who listens to your audio/video and types it as text. Phase 2 is quality assurance, where a proofreader listens to your audio and proofs it against what has been transcribed. This double-checks the typist’s work, and the document can then be formatted to your specific requirements.
As audio and video for transcription is mainly digital, you can use the services of a transcription company anywhere in the world. You may want to do this for cost reasons, but more often than not it is to take advantage of time zones. Many UK and US businesses use Australian based transcription companies for just this reason, as Australia tends to work its business day while Europe and the US sleep.
Pros Of Outsourcing
- Speed, this is a transcription service’s core business. Reputable companies will turn around audio and video very quickly.
- Accuracy, again as a core business this is how transcription services are rated from a customer’s point of view. A good transcription firm will always produce high quality, accurate transcriptions because of the two phase approach.
- Ability to adapt, regardless of how much audio or video you send you should always expect the same fast, accurate turnaround of transcripts.
- You don’t have to do it so you can focus your time on more productive work.
Cons Of Outsourcing
- Cost is the only con. You are paying for a service, so compared to doing it yourself there will be a cost involved. Obviously the larger, more reputable transcription companies will have different rates from the work-from-home type of micro transcription service.
So there you have it! Those are the reasons why Dragon NaturallySpeaking for Windows or MacSpeech Scribe for Apple Mac are not good for transcribing meetings or interviews, and the options open to you for converting recorded audio or video into text.