This post was revised in November 2012 with updated Dragon version details for Mac and Windows.
Another update, early 2017: the below still largely holds true, but voice-to-text software has advanced since this blog was originally written. We are starting to see some success with transcribing one-on-one interviews. Multiple speakers talking over each other with lots of background noise will always be an issue for voice-to-text software. Click here to see our 2017 post with an example of voice-to-text for a one-on-one interview.
Another update, 2019: ASR (Automatic Speech Recognition) for interviews and meetings is now possible with Amazon AWS Transcribe – see this post.
Latest update, September 2020: Microsoft now allows voice-to-text transcription of meetings and interviews in Word, which can be used on a Mac or a PC. Details and a demo video are in this blog post: dictat.es/word-transcribe. And yes, it’s free and works really well.
No, is the short answer. [We wrote this blog post 9 years ago and it is still relevant. Dragon is falling behind compared to the cloud-based solutions offered by Philips (SpeechLive), AWS (Transcribe), Google (Voice Typing in Google Docs) and Word online from Microsoft (works on a Mac and PC).] Voice or speech recognition software will only work successfully with one spoken voice, and for best results that speaker needs to have trained the software to understand their voice and style of speaking, which is known as creating a voice profile. Most people who use voice recognition software typically pop on a headset and mic, chat away to their Windows PC or Mac and watch as their voice is magically (or, if you don’t believe in magic, by the Dragon voice recognition engine) converted to text.
Voice recognition is currently going through a spike in popularity, especially since the iPhone 4S was released in October 2011 running iOS 5 with Siri. Siri has definitely helped raise awareness of just how far voice recognition technology has evolved over the last couple of years, with Nuance leading the way in this technology on Mac and Windows and now, working closely with Apple, inside both iOS 5 and iOS 6.
Most of us now have an iPhone or smartphone capable of recording audio, and most people who need to capture important meetings or interviews also have a high-quality digital voice recorder. Dragon NaturallySpeaking 12 Premium and Professional editions on Windows and the recently released Dragon Dictate 3 for Mac can take recorded audio and transcribe it, but only for the single speaker who has trained the software. So it is understandable that people often leap to the conclusion that if it can transcribe one voice it can transcribe many. Not so. The technology is not there yet and in my opinion won’t be for quite some time.
Nuance cleverly has a couple of pages on its website dedicated to Transcribing Interviews and Transcribing Lectures using a technique called voice writing or parroting. The idea is simple: you record your interview, meeting, lecture or whatever it is that has multiple voices, or a single voice that you cannot get trained in Dragon (possibly someone making a speech or presenting at a conference). Then, sitting in front of your Windows PC or Mac, you listen to the recorded audio and speak what you hear. It is therefore your trained voice that Dragon hears as you repeat other people’s words from the recording. Sounds simple, doesn’t it? It isn’t. I have tried it a few times and have become extremely frustrated. You may find that the voice you are listening to is too fast for you to remember and repeat back, which means you are constantly stopping and rewinding the audio, and that is not always easy on an iPhone app or basic digital voice recorder. You will also often have to stop the audio to correct mistakes made by Dragon, for example when unfamiliar names or terms come up that have not been trained into your user profile. It will work, but it takes a lot of concentration and a large amount of practice. You can give it a try: just search YouTube for a boardroom meeting and see if you can repeat what you hear while wearing a headset and mic.
So what are your practical options for transcribing audio or video of meetings and interviews or in fact any audio that you need turned from voice to text?
Option 1 – Type The Audio Yourself
You could transcribe the audio yourself with the help of specialised transcription software (transcription being the process of typing what you hear). The software won’t actually do the typing for you, but it will assist greatly with audio playback. For example, if you are a slow typist you can slow playback down to match your typing speed, or conversely speed it up if your fingers are a blur when you type. Transcription software also makes it easy to rewind the audio by an amount set to match your typing ability, so you could rewind half a second or, if you are a poor typist like me, up to 5 seconds. The biggest benefit of transcription software is that audio can be controlled from inside the word processor you are typing in: using a USB foot pedal or predefined keys on your keyboard, you can stop, play and rewind the audio without leaving your document. This is a huge time saver. People often try to transcribe using Windows Media Player or QuickTime for playback and have to constantly switch from the document to the audio player to stop/rewind/play and then back to the document, which adds a considerable amount of time to the whole transcription process.
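To make those playback controls a little more concrete, here is a minimal Python sketch of what the foot pedal or hotkeys are driving. The PlaybackControl class and its parameters are hypothetical and are not the API of Express Scribe or any other product; it simply models play/stop, a configurable rewind step (anything from half a second to 5 seconds) and a playback speed multiplier.

```python
from dataclasses import dataclass


@dataclass
class PlaybackControl:
    """Hypothetical model of transcription-software playback controls.

    Positions are in seconds of audio; speed is a multiplier (0.5 = half speed).
    """

    duration: float            # length of the recording in seconds
    rewind_step: float = 2.0   # seconds jumped back per rewind, e.g. 0.5 to 5
    speed: float = 1.0         # playback speed multiplier
    position: float = 0.0      # current position in the recording, in seconds
    playing: bool = False

    def play(self) -> None:
        self.playing = True

    def stop(self) -> None:
        self.playing = False

    def advance(self, wall_seconds: float) -> None:
        """Move forward by real (wall-clock) seconds of listening, if playing."""
        if self.playing:
            self.position = min(self.duration, self.position + wall_seconds * self.speed)

    def rewind(self) -> None:
        """Jump back by the configured step, never before the start of the file."""
        self.position = max(0.0, self.position - self.rewind_step)

    def listening_time(self) -> float:
        """Wall-clock seconds needed to play the whole recording once at this speed."""
        return self.duration / self.speed


pedal = PlaybackControl(duration=3600, rewind_step=5, speed=0.5)
print(pedal.listening_time() / 3600, "hours to play one hour of audio at half speed")
```

Note how slowing playback to half speed turns a one-hour file into two hours of listening before any rewinding, which is part of why typing time runs well beyond the length of the recording.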
Pros Of Doing It Yourself
- Cost, from the point of view that you are not paying someone else to do the work.
- Cost, from the point of view that transcription software is free or relatively cheap depending on what you buy.
Cons Of Doing It Yourself
- It is hard work and not as easy as it sounds.
- Transcription typing does take a long time even if you are experienced.
Just as an indication of how long transcription takes, a good quality one hour audio file of an interview between two people would take an experienced transcription typist around 2-3 hours to transcribe. If you are not experienced expect that to rise to 4-5 hours and without the help of transcription software even longer.
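The arithmetic behind those figures is simple enough to put in a few lines of Python. This is only a sketch: the multipliers are the rule-of-thumb ranges quoted above, not measured rates, and the extra time needed without transcription software is not modelled.

```python
# Rule-of-thumb multipliers from the estimates above (hours of typing per hour
# of good-quality audio). These are rough indications, not measured rates.
HOURS_PER_AUDIO_HOUR = {
    "experienced": (2.0, 3.0),
    "inexperienced": (4.0, 5.0),
}


def estimate_typing_hours(audio_hours: float, skill: str) -> tuple[float, float]:
    """Return a (low, high) range of typing hours for a recording of audio_hours."""
    if audio_hours < 0:
        raise ValueError("audio_hours must be non-negative")
    low, high = HOURS_PER_AUDIO_HOUR[skill]
    return audio_hours * low, audio_hours * high


for skill in HOURS_PER_AUDIO_HOUR:
    low, high = estimate_typing_hours(1.5, skill)
    print(f"90 minutes of audio, {skill} typist: {low:g} to {high:g} hours")
```

For a 90-minute interview that works out at roughly 3 to 4.5 hours for an experienced typist and 6 to 7.5 hours for an inexperienced one.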
Transcription Software
- Express Scribe from NCH Software – $FREE
- Express Scribe Pro from NCH Software – approx. AU$40 – plays .dss .ds2 .dct .dvf .avi .mov .wmv audio and video files.
- Olympus AS-2400 Digital Transcription Kit – includes software, headset and USB foot pedal – RRP AU$425 but it usually sells online for less. Windows & Mac compatible.
Option 2 – Outsource To A Professional Transcription Service
If you need your audio transcribed quickly and accurately, this option is for you. Just Google “transcription service” and see how many businesses are listed, both organically and in the paid Google ads. Transcription has become a boom business over the last few years, and that makes total sense. With the availability and high quality of digital voice recorders and iPhone apps, just about everyone is recording their meetings, interviews, podcasts, vodcasts, focus groups, lectures, letters, assessments; the list goes on. It is great to have a digital record of your meeting that can easily be shared within your business on your intranet or with the world on the internet, but more often than not a written transcript is also required for offline reading, compliance or the public record. There are all kinds of reasons audio and video are transcribed into text these days.
This is where a professional transcription service can help. Simply send them your digital audio or video via the internet and they can set their team of experienced transcription typists to work converting your voice to text. Here is a security tip: always make sure that the transcription service you use has its own secure file server and that the data you send is encrypted using either FTPS (not FTP) or HTTPS (not HTTP). This is important for two reasons: 1. your audio/video is encrypted as it travels across the internet from you to their servers; 2. by running their own secure server they have full control over, and access to, what happens on that server. Many transcription services use third-party providers like SendThisFile.com, and neither they nor you have any control over your data on that third party’s servers. Ask any transcription service you approach about their security and file transfer mechanisms. Many still use FTP rather than FTPS, and transferring files over plain FTP does not encrypt the file contents, be it audio, video or, worse still, your transcribed documents.
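If a service gives you an upload address, a quick first check is simply which protocol it uses. Here is a minimal Python sketch using the standard library’s urllib.parse; the example addresses are hypothetical, and only the two encrypted protocols named in the tip (FTPS and HTTPS) are accepted.

```python
from urllib.parse import urlparse

# Only the two encrypted protocols named in the tip above are accepted.
ENCRYPTED_SCHEMES = {"https", "ftps"}


def is_encrypted_transfer(url: str) -> bool:
    """Return True if the upload address uses FTPS or HTTPS rather than FTP or HTTP."""
    return urlparse(url).scheme.lower() in ENCRYPTED_SCHEMES


for url in [
    "https://upload.example.com/files",   # hypothetical service addresses
    "ftps://ftp.example.com/incoming",
    "ftp://ftp.example.com/incoming",
    "http://upload.example.com/files",
]:
    print(url, "->", "encrypted" if is_encrypted_transfer(url) else "NOT encrypted")
```

This only covers the first half of the tip. It says nothing about who owns and controls the server at the other end, so it is still worth asking the service directly.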
Any good professional transcription service will turn your audio around quickly and the final transcript will be very accurate. Typically there are two phases to the transcription process. Phase 1 is the transcription typist, who listens to your audio/video and types it up as text. Phase 2 is quality assurance: a proofreader listens to your audio and checks it against what has been transcribed. This double-checks the typist’s work, and the document can then be formatted to your specific requirements.
As audio and video for transcription is mainly digital, you can use a transcription company anywhere in the world. You may want to do this for cost reasons, but more often than not it is to take advantage of time zones. Many UK and US businesses use Australian transcription companies for just this reason, as Australia tends to work its business day while Europe and the US sleep.
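The time zone advantage is easy to check with a little date arithmetic. This Python sketch converts a Sydney 9-to-5 day into London and New York time. It assumes fixed mid-January UTC offsets (Sydney on daylight saving at UTC+11, London at UTC+0, New York at UTC-5); real scheduling code should use proper time zone data, because daylight saving shifts these offsets during the year.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC offsets for mid-January, an assumption for this example only:
# Sydney on daylight saving (AEDT, UTC+11), London on GMT, New York on EST.
SYDNEY = timezone(timedelta(hours=11))
LONDON = timezone(timedelta(hours=0))
NEW_YORK = timezone(timedelta(hours=-5))


def local_times(sydney_time: datetime) -> dict[str, str]:
    """Show what a Sydney wall-clock time corresponds to in London and New York."""
    return {
        "London": sydney_time.astimezone(LONDON).strftime("%Y-%m-%d %H:%M"),
        "New York": sydney_time.astimezone(NEW_YORK).strftime("%Y-%m-%d %H:%M"),
    }


start = datetime(2024, 1, 15, 9, 0, tzinfo=SYDNEY)   # Monday 9am in Sydney
end = datetime(2024, 1, 15, 17, 0, tzinfo=SYDNEY)    # Monday 5pm in Sydney
print("Sydney day starts:", local_times(start))
print("Sydney day ends:  ", local_times(end))
```

Under these assumptions the Sydney working day runs from 10pm to 6am in London and from 5pm to 1am in New York, so audio uploaded at the end of a UK day is worked on overnight.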
Pros
- Speed, as this is a transcription service’s core business. Reputable companies will turn around audio and video very quickly.
- Accuracy, again as a core business, is how transcription services are rated from a customer’s point of view. A good transcription firm will consistently produce high-quality, accurate transcripts thanks to the two-phase approach.
- Ability to adapt, as regardless of how much audio or video you send you should always expect the same fast, accurate turnaround of transcripts.
- You don’t have to do it so you can focus your time on more productive work.
Cons
- Cost is the only con: you are paying for a service, so compared to doing it yourself there will be a cost involved. Obviously the larger, more reputable transcription companies will charge different rates from the work-from-home type of micro transcription service.
So there you have it! Reasons why Dragon NaturallySpeaking for Windows or MacSpeech Scribe for Apple Mac are not good for transcribing meetings or interviews, and the options open to you for converting recorded audio or video into text.
Recent developments in European-funded research have been addressing speech recognition for meetings for some time (AMIDA). A commercial cloud service based on the work of AMIDA has been released, called Koemei.com. A microphone array system from Dev-Audio (www.dev-audio.com), integrated with Koemei, enables hardware-segmented multi-party speech recognition, particularly for group meetings.
Thank you for visiting and for your comment, Temitope.
I am familiar with Koemei and the superb Microcone multi-channel mic from dev-audio (an up-and-coming Australian company that recently won an Australian International Design Award), as I have sent a couple of high-quality two-speaker sample audio files for transcription. Although there is still some work to be done, the concept is sound. These are exciting times as Koemei pushes the boundaries of multi-speaker voice recognition, and I hope that we at Dictate Australia can help with development and support in the future. The dev-audio Microcone is not just a superb design; used with its intuitive multi-channel recording software, it will prove to be quite an amazing package in the near future.
Koemei and dev-audio are a couple of businesses that we are certainly keeping an eye on and look forward to working with.
Dave
Dictate Australia
I have also been following the development of Dragon and other speech recognition systems with interest over the last ten years, since I began transcribing. I think as transcription services we have little to ‘fear’ in the near future with interviews and focus groups etc. since I have also tried the ‘parroting’ route, and I found it quicker to type!
I think it’s great, though, to be involved in future developments such as the Microcone, rather than being an ostrich and burying one’s head in the sand.
Hey Anne
Thank you for your comment and for dropping by my blog. Agreed, there is currently little to fear in the transcription world for multi-speaker audio for the foreseeable future. Single-speaker audio, on the other hand, is fast becoming something people put through Dragon rather than send to a transcriptionist, and this trend will continue to grow.
Dave
Dictate Australia
Court reporters all over the country use Dragon NaturallySpeaking 11.5 and now version 12 to respeak (voice write, parrot) live TV broadcasts verbatim for captioning for the hearing impaired here in the USA, so do not believe for one moment that our current job descriptions, yours and mine, are secure. Perhaps another 5 years, at the most. Transcribing by QWERTY will be trashed. Those of us in the transcribing business will become TEXT EDITORS. If you don’t like it, tough, do something else. Apple, Microsoft, Nuance with Dragon, Siri by Apple, others in Europe: they are all vying for the Holy Grail, offering very cheap transcription of prerecorded audio for a penny a word, and realtime independent speech recognition for a penny a word. Hey, whatever happened to the coachmen and the horse-drawn carriages? It’s coming, and we have to retool and be re-educated for something else, UNLESS we decide to make it our business to know what we think is our enemy and embrace it instead, understand what it can and will do, and then tell clients: no matter where you go, we can transcribe what you say within moments, edited by us just the way you want it, for use at your next meeting.
Hey Steve
Thanks for dropping by and for your comment. I agree, we need to be prepared for the future and embrace it early. We have been doing that for years in our transcription business: early on we became Nuance Gold Resellers, and our niche is the Mac side of the dictation and transcription world. We are already offering a service whereby users can send us their recorded voice and their Dragon profile, and we process it through Dragon and optionally proofread and update their profile with the changes.
Now that Apple has made voice mainstream with the release of Siri a couple of years ago, and more recently with Dictation built into OS X 10.8, everybody knows that voice to text is here to stay. A friend recently asked me if I could recommend a learn-to-type software program for her 6-year-old son. I told her not to bother; he will be using his voice to type the majority of what he needs to. As I have said a few times, the future is voice.
Thanks
Dave
Dictate Australia
Which of these recorders would be best to record with and then plug into the computer for translation?
Hello Faye
Pretty much any voice recorder can be used to record your voice to pass through Dragon or to use with the new Transcribe function in Microsoft Word. There are a couple of things to keep in mind:
1. Always record in a quiet environment with no background noise.
2. Record in a high-quality audio format and make sure that format is compatible with the voice-to-text solution you are using.
You can of course also use your smartphone to record, although smartphones are not as good as a dedicated voice recorder at picking up multiple speakers (interviews/meetings).
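If your recorder saves WAV files, you can check the basic quality settings of a recording with Python’s built-in wave module before sending it off. This is only a sketch: the 16 kHz and 16-bit thresholds are assumptions (common choices for speech), not published requirements of Dragon or Word, and it does not handle MP3, DSS or other formats.

```python
import wave

# Assumed minimums for speech-to-text input; check the requirements of the
# product you actually use before relying on these numbers.
MIN_SAMPLE_RATE = 16_000   # samples per second
MIN_SAMPLE_BYTES = 2       # 2 bytes = 16-bit samples


def check_wav(path: str) -> list[str]:
    """Return warnings about a WAV recording's format (an empty list means it looks fine)."""
    warnings = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()  # bytes per sample
    if rate < MIN_SAMPLE_RATE:
        warnings.append(f"sample rate {rate} Hz is below {MIN_SAMPLE_RATE} Hz")
    if width < MIN_SAMPLE_BYTES:
        warnings.append(f"{width * 8}-bit samples; 16-bit or higher is preferable")
    return warnings
```

Running check_wav("interview.wav") (a hypothetical file name) returns an empty list for a typical 44.1 kHz, 16-bit recording.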
Hope that helps.
Dave
Dictate