Here is an iPhone app I wanted to show you. It is called Downright Transcription and was written by Ronald Lo, a 32-year-old Hong Kong resident. Ronald had the idea for the app after a friend of his who works in a hospital needed some help converting voice to text for meetings. Just a warning: Downright Transcription is fairly new, and I believe it is at version 1.0 at the time of writing, so there will be updates to come. On top of that, it is not perfect. But it is one well worth watching, which is why I am writing about it, and here is why:
- It is trying to convert speech to text for audio with multiple speakers (more than one person talking). Dragon only does single-speaker voice to text.
- It uses open source speech-to-text technology, which Ronald is tweaking to get the best results.
- It processes the audio on the device. No audio is sent to a server; it is converted on your iPhone, so it is fast.
I have done some testing using my own voice, and it was OK. It transcribed probably around 80% of my speech correctly; however, I was outside with traffic noise and not using a microphone. As with all speech recognition software, Downright Transcription works well only with clear audio and little to no background noise.
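My 80% figure is an estimate by eye, not a measurement. If you want to put a number on your own tests, one common approach is to compare the transcript against what was actually said, word by word, and count the substitutions, insertions and deletions. Here is a minimal Python sketch of that idea. The function names and the sample sentences are made up for illustration and have nothing to do with how Downright Transcription works internally.

```python
def word_errors(reference, hypothesis):
    """Return (errors, reference_word_count) using word-level edit distance.

    errors counts the substitutions, insertions and deletions needed to turn
    the reference words into the hypothesis words.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1], len(ref)


def word_accuracy(reference, hypothesis):
    """Return 1 - (errors / reference words), floored at 0."""
    errors, total = word_errors(reference, hypothesis)
    if total == 0:
        raise ValueError("reference text is empty")
    return max(0.0, 1.0 - errors / total)


if __name__ == "__main__":
    # Hypothetical example: what was said vs. what the app produced.
    spoken = "the meeting will start at ten in the morning today"
    transcript = "the meeting will start at then in the warning today"
    print(f"word accuracy: {word_accuracy(spoken, transcript):.0%}")
```

This ignores punctuation and treats every word as equally important, so take the result as a rough guide only.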
The app itself is not that intuitive and takes some thought to figure out; this will be addressed in later releases. But after playing with it for just a few minutes you can work out how to record and then have that audio transcribed. Anything that reduces the effort of typing up audio has to be a good thing, and I hope to send Ronald some suggestions on how his app can be improved over time. I love it when people take on a challenge. It would be easy to sit back and try existing solutions, so well done Ronald for giving this a go. I hope the project continues and improves over time, because speech recognition for meetings is the holy grail of the voice recognition industry.
I have made some suggestions to Ronald on how his app can be improved and evolve. These suggestions include a built-in transcription kit in the app, which would help with proofreading the emailed transcript on your PC or Mac. The idea is that you control the playback of your recorded audio on your iPhone using your earbuds while you proof the emailed text. The ability to listen back, stop, play and rewind the audio while proofing the document would be a big plus.
The open source speech recognition packages that Ronald has referenced in the making of this app are the Julius Large Vocabulary CSR Engine and the CMU Sphinx Open Source Toolkit for Speech Recognition.