Voice recognition

Add this to the list of 'cool things I want'.

David Pogue, “Like Having a Secretary in Your PC”: I’m wearing a headset, talking, and my PC is writing down everything I say in Microsoft Word. I’m speaking at full speed, perfectly normally except that I’m pronouncing the punctuation (comma), like this (period).

Let’s try something a little tougher. Pyridoxine hydrochloride. Antagonistic Lilliputians. Infinitesimal zithers.

... The software I’m using is Dragon NaturallySpeaking 9.0 (www.nuance.com), the latest version of the best-selling speech-recognition software for Windows. This software, which made its debut Tuesday, is remarkable for two reasons.

Reason 1: You don’t have to train this software. That’s when you have to read aloud a canned piece of prose that it displays on the screen — a standard ritual that has begun the speech-recognition adventure for thousands of people.

I can remember, in the early days, having to read 45 minutes’ worth of these scripts for the software’s benefit. But each successive version of NaturallySpeaking has required less training time; in Version 8, five minutes was all it took.

And now they’ve topped that: NatSpeak 9 requires no training at all.

I gave it a test. After a fresh installation of the software, I opened a random page in a book and read a 1,000-word passage — without doing any training.

The software got 11 words wrong, which means it got 98.9 percent of the passage correct. Some of those errors were forgivable, like when it heard “typology” instead of “topology.”

But Nuance says that you’ll get even better accuracy if you do read one of the training scripts, so I tried that, too. I trained the software by reading its “Alice in Wonderland” excerpt. This time, when I read the same 1,000 words from my book, only six errors popped up. That’s 99.4 percent correct.

The best part is that these are the lowest accuracy rates you’ll get, because the software gets smarter the more you use it — or, rather, the more you correct its errors.

You do this entirely by voice. You say, “correct ‘typology,’ ” for example; beneath that word on the screen, a numbered menu of alternate transcriptions pops up. You see that alternate 1 is “topology,” for example, so you say “choose 1.” The software instantly corrects the word, learns from its mistake and deposits your blinking insertion point back at the point where you stopped dictating, ready for more.

Over time, therefore, the accuracy improves. When I tried the same 1,000-word excerpt after importing my time-polished voice files from Version 8, I got 99.6 percent accuracy. That’s four words wrong out of a thousand — including, of course, “topology.”

For this reason, it doesn’t much matter whether or not you skip the initial training; the accuracy of the two approaches will eventually converge toward 100 percent.

NatSpeak 9 is remarkable for a second reason, too: it’s a new version containing very little new.

Yes, they’ve eliminated the training requirement. And yes, the new NatSpeak is 20 percent more accurate than before if you do the initial training. Then again, what’s a 20 percent improvement in a program that’s already 99.4 percent accurate — 99.5? That’s maybe one less error every 1,000 words.
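To check the arithmetic in the quoted passage, here is a minimal sketch in Python. It assumes that "20 percent more accurate" means 20 percent fewer errors, which is my reading rather than anything Nuance or Pogue spells out; the function names are made up for illustration.

```python
def accuracy_percent(errors: float, words: int = 1000) -> float:
    """Percent of words transcribed correctly, given the error count."""
    return 100.0 * (words - errors) / words


def errors_after_reduction(errors: float, reduction: float) -> float:
    """Error count after cutting errors by a fraction (0.20 = 20 percent).

    Assumption: "20 percent more accurate" is read as "20 percent fewer errors".
    """
    return errors * (1.0 - reduction)


# Error counts per 1,000 words, as reported in the article.
untrained = accuracy_percent(11)   # no training
trained = accuracy_percent(6)      # after the "Alice in Wonderland" script
imported = accuracy_percent(4)     # with imported Version 8 voice files

# Hypothetical: apply a 20 percent error reduction to the trained result.
projected_errors = errors_after_reduction(6, 0.20)
projected = accuracy_percent(projected_errors)

print(f"untrained: {untrained:.1f}%")
print(f"trained:   {trained:.1f}%")
print(f"imported:  {imported:.1f}%")
print(f"projected: {projected_errors:.1f} errors, {projected:.1f}%")
```

Under that reading, six errors per thousand drops to roughly five, which matches the "maybe one less error every 1,000 words" in the column.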

Of course, I still haven't managed to find time to install Windows on either of our Intel Macs (D has an Intel iMac, I have this MacBook Pro). Once I do, NatSpeak will probably be one of the first purchases....

NatSpeak also runs beautifully on the Macintosh. The setup is a bit involved: you need a recent Intel-based Mac, Apple’s free Boot Camp utility, a copy of Windows XP, and a U.S.B. adapter on your headset. And you have to restart the Mac in Windows each time you want to use NatSpeak. But if you can look past all that fine print, NatSpeak on Macintosh is extremely fast and accurate.

It will probably run in Parallels Desktop too.

(P.S. This is day 3 of our phone/internet outage. A pox on all companies that scrimp on customer support and tech support, such as XO Communications.)

About this Entry

This page contains a single entry by swanksalot published on July 22, 2006 12:23 PM.
