

Text-to-speech overview in Scientific American

Scientific American recently posted an overview article on text-to-speech. I know other bloggers have already mentioned it, but since I'm a TTS-centric blogger, I thought I should put in a plug for it here.

One of the interesting questions the authors raise is this: "Should machine speech be indistinguishable from a human speaker, as in the well-known Turing test for artificial intelligence?" The authors conclude, "probably not." They say that a "better goal" (rather than a voice that could 'trick' a human listener) is a voice that is "pleasing [and expressive] to which people feel comfortable listening." Do the authors truly believe this? Or do they mean a more realistic goal rather than a better one? I find it hard to believe that every TTS engineer isn't trying to make voices that sound as realistic as possible. Just because you can make a voice that tricks a user doesn't mean it has to be deployed that way. If you can create a voice that would trick a user, you could just as easily tweak it to be less realistic and perhaps better suited to a warning system or to "video games." (I suspect the authors aren't gamers; otherwise they wouldn't have suggested that natural human speech is not the most appropriate choice for video games.) You can bet that if the folks at AT&T Labs could make a voice that tricks a user, they would have written a very different ending to their article.

Comments

  • Anonymous
    June 17, 2005
    The comment has been removed
  • Anonymous
    June 21, 2005
    It's been a while since I've seen "2001"; however, it seemed to me that the voice was in fact a real person's voice. Does anybody out there know for sure?

    Now that's another interesting psychological twist: what if you think the voice is fake, but it is actually REAL? Mmmm...