SpVoice Phoneme event (SAPI 5.3)

Microsoft Speech API 5.3

Object: SpVoice (Events)

Phoneme Event

The Phoneme event occurs when the text-to-speech (TTS) engine detects a phoneme boundary while speaking a stream for the SpVoice object.

SpVoice.Phoneme(
     StreamNumber As Long,
     StreamPosition As Variant,
     Duration As Long,
     NextPhoneId As Integer,
     Feature As SpeechVisemeFeature,
     CurrentPhoneId As Integer
)


  • StreamNumber
    The number of the stream that generated the event. When a voice enqueues more than one stream by speaking asynchronously, the stream number is needed to associate each event with the stream that raised it.
  • StreamPosition
    The character position in the output stream at which the phoneme begins.
  • Duration
    The duration of the phoneme, in milliseconds.
  • NextPhoneId
    The phone ID of the phoneme that follows the current one.
  • Feature
    The SpeechVisemeFeature, which may indicate emphasis or stress on the viseme.
  • CurrentPhoneId
    The phone ID of the current phoneme.
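
As a rough illustration of how StreamNumber can be used, the sketch below speaks two strings asynchronously, keeps the stream number that each Speak call returns, and labels every Phoneme event by the stream it came from. It assumes a form with a command button called Command1 and a project reference to the Microsoft Speech Object Library. The variable names and the StreamLabel helper are hypothetical and not part of SAPI.

```vb
Option Explicit

Public WithEvents vox As SpeechLib.SpVoice

' Stream numbers returned by the two Speak calls (illustrative names).
Private greetingStream As Long
Private detailStream As Long

Private Sub Form_Load()
    Set vox = New SpeechLib.SpVoice
End Sub

Private Sub Command1_Click()
    ' Speak returns the number of the stream it enqueues.
    greetingStream = vox.Speak("Hello.", SVSFlagsAsync)
    detailStream = vox.Speak("Here are the details.", SVSFlagsAsync)
End Sub

' Hypothetical helper: maps a stream number to a readable label.
Private Function StreamLabel(ByVal StreamNumber As Long) As String
    Select Case StreamNumber
        Case greetingStream
            StreamLabel = "greeting"
        Case detailStream
            StreamLabel = "details"
        Case Else
            StreamLabel = "unknown"
    End Select
End Function

Private Sub vox_Phoneme(ByVal StreamNumber As Long, ByVal StreamPosition As Variant, _
        ByVal Duration As Long, ByVal NextPhoneId As Integer, _
        ByVal Feature As SpeechLib.SpeechVisemeFeature, ByVal CurrentPhoneId As Integer)
    ' Write one line per phoneme to the Immediate window.
    Debug.Print StreamLabel(StreamNumber) & ": phoneme " & CurrentPhoneId & _
        ", " & Duration & " ms"
End Sub
```

Keeping the lookup in a separate function means the event handler itself stays short, which matters because it runs once for every phoneme spoken.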


When the engine synthesizes a phoneme composed of more than one phoneme element, it raises an event for each element. For example, when a Japanese TTS engine speaks the phoneme "KYA," which is composed of the phoneme elements "KI" and "XYA," it raises an SPEI_PHONEME event for each element. Because the element "KI" in this case modifies the sound of the element that follows it rather than initiating a sound of its own, the duration of its SPEI_PHONEME event is zero.
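
Because an element can arrive with a duration of zero, a handler that measures speech time may want to count such events separately from the ones that carry sound. The following is a minimal sketch under that assumption. The counters and the TallyPhoneme procedure are hypothetical names introduced for illustration, and the code assumes vox is assigned elsewhere (for example, in Form_Load).

```vb
Option Explicit

Public WithEvents vox As SpeechLib.SpVoice

Private totalMs As Long        ' Sum of the durations of all events seen so far
Private soundingCount As Long  ' Events with a non-zero duration
Private zeroCount As Long      ' Events with a duration of zero

' Hypothetical helper: records one Phoneme event by its duration.
Private Sub TallyPhoneme(ByVal Duration As Long)
    If Duration = 0 Then
        zeroCount = zeroCount + 1
    Else
        soundingCount = soundingCount + 1
        totalMs = totalMs + Duration
    End If
End Sub

Private Sub vox_Phoneme(ByVal StreamNumber As Long, ByVal StreamPosition As Variant, _
        ByVal Duration As Long, ByVal NextPhoneId As Integer, _
        ByVal Feature As SpeechLib.SpeechVisemeFeature, ByVal CurrentPhoneId As Integer)
    TallyPhoneme Duration
End Sub
```

With this split, dividing totalMs by soundingCount gives an average that zero-duration elements do not pull down.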


The following Visual Basic form code demonstrates the Phoneme event. To run this code, add a project reference to the Microsoft Speech Object Library and create a form with the following controls:

  • A command button called Command1
  • Two text boxes called Text1 and Text2

Paste this code into the Declarations section of the form.

The Form_Load procedure puts a text string in Text1 and creates a voice object, leaving all of its properties at their default settings. The Command1_Click procedure calls the Speak method, which causes the TTS engine to send Phoneme events to the voice. The Phoneme event handler appends each phoneme ID to Text2.

Option Explicit

' WithEvents lets the form receive the events raised by the voice.
Public WithEvents vox As SpeechLib.SpVoice

Private Sub Command1_Click()

    ' SVSFlagsAsync makes Speak return immediately; events arrive while the text is spoken.
    vox.Speak Text1.Text, SVSFlagsAsync

End Sub

Private Sub Form_Load()

    Set vox = New SpVoice
    Text1.Text = "This is text in a text box."

End Sub

Private Sub vox_Phoneme(ByVal StreamNumber As Long, ByVal StreamPosition As Variant, ByVal Duration As Long, ByVal NextPhoneId As Integer, ByVal Feature As SpeechLib.SpeechVisemeFeature, ByVal CurrentPhoneId As Integer)

    ' Append the ID of each phoneme to Text2, separated by spaces.
    Text2.Text = Text2.Text & CurrentPhoneId & " "

End Sub