Microsoft Enables Multi Speaker Voice Recognition in Word for Mac and PC

Microsoft, via Office 365, has officially joined the speech recognition club by enabling the new “Transcribe” function in Word. Transcribe is essentially automatic transcription of one or more voices to text. Yes, voices! Transcribe can convert speech from more than one speaker to text and identify those speakers. We have an example in the video below.


  • … you can automatically convert recorded interviews or meetings to text
  • … you can record a meeting or interview as it happens and convert the audio to text
  • … it is very accurate
  • … you can do this on a Mac or a Windows PC
  • … it is free (for now, for up to 300 minutes of audio per month)

Make no mistake, this is a game-changer for the transcription industry and for people who need quick and easy voice-to-text solutions. With the Office suite so widely adopted on Windows and macOS alike, this is voice-to-text for the masses. Good work Microsoft, we like where you have been going the last few years.

Microsoft Word Transcribe has two options to pass audio:

Option 1. You can record yourself speaking, your interview or your meeting directly on your PC or Mac. The recording can be paused if you take a break. When you finish, the audio is uploaded and converted to text, the speakers are identified, and you get an easy-to-use proofreading panel.

Option 2. Use a voice recorder or your smartphone to record yourself, a meeting or an interview, and pass that recording to Transcribe to convert to text. Same deal: speakers are identified and proofing is easy before you paste the text into the Word document.

Microsoft now joins Amazon with Amazon Transcribe, Philips with SpeechLive and many others in moving this technology from expensive installs on local devices to a cloud pay-as-you-go model.

In a blog post released last week, on 25th August 2020, Microsoft lays out the ins and outs of the new Word Transcribe functionality:

[Image: Microsoft Word Transcribe blog post, linked to the Microsoft Blog]

The Microsoft blog details how Dictate can be used in Word to speak and see your voice converted to text in real time. With Transcribe in Word you then have two options:

  1. Record your voice (or interviews/meetings) and have that transcribed
  2. Upload a recording of your voice (or interviews/meetings), perhaps from a digital voice recorder or a voice recording app on your smartphone

How is Word Transcribe Option 1 above different from Word Dictate?

Option 1 essentially uses your device to record the spoken voices, which are then passed to the Microsoft Cloud to be converted to text; you do not see the text created in real time. Choosing this option has a couple of effects: a) you can speak for much longer (e.g. narrate a book or thesis, record a meeting or interview) and b) Microsoft will use the audio, with AI, to learn your voice and context. Word Dictate is more suited to dictating short paragraphs or sentences.

Click here for Microsoft’s How To for audio transcription in Word.

Should you choose to upload your recorded audio, it must be in one of the following audio formats:

  • .wav
  • .mp4
  • .mp3
  • .m4a (the audio format created when recording on an iPhone)
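If you have a folder of recordings, the format list above can be turned into a quick pre-upload check. This is only a minimal sketch: the function name `is_supported_audio` and the constant are made up for illustration, it is not part of Word or any Microsoft API, and it looks only at the file extension, not at what is actually inside the file.

```python
from pathlib import Path

# Extensions listed in this post as accepted for Word Transcribe uploads.
SUPPORTED_EXTENSIONS = {".wav", ".mp4", ".mp3", ".m4a"}


def is_supported_audio(filename: str) -> bool:
    """Return True if the file extension is in the list above (case-insensitive)."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```

For example, an iPhone recording named `interview.m4a` would pass this check, while a file with any other extension would need converting first.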

As is the way with cloud solutions, you will be looking at a dedicated monthly/yearly cost, a pay-as-you-go model, or both. With Word Transcribe you get 300 minutes per month for free, so you can really test out Microsoft’s voice recognition capabilities and see whether it is for you.
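To get a feel for how far 300 minutes goes, here is a rough Python sketch of the arithmetic. The helper name `remaining_free_minutes` is hypothetical, and it simply assumes each recording counts for its full length; how Word actually counts partial minutes is not covered in this post.

```python
from typing import Iterable

# Monthly free allowance quoted in this post.
FREE_MINUTES_PER_MONTH = 300


def remaining_free_minutes(recording_minutes: Iterable[float]) -> float:
    """Return the free minutes left this month after the given recordings (never below 0)."""
    used = sum(recording_minutes)
    return max(0.0, FREE_MINUTES_PER_MONTH - used)
```

By this count, three recordings of 45, 60 and 90 minutes would leave 105 free minutes, and five one-hour interviews would use up the whole month's allowance.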



2 thoughts on “Microsoft Enables Multi Speaker Voice Recognition in Word for Mac and PC”

  1. Thank you Dave for this blog. I tried it, and found it better than the current alternative I was using. The transcription was more accurate. The user interface was better. It was more accurate at distinguishing different speakers.

  2. Thank you Richard for sharing your experience. The voice recognition world is going through quite a transformation currently, with more options and more pay-as-you-go or free, very cost-effective ways of converting voice to text, whether in real time or from a recording.
