Voice - Use SSML for TTS

NOTE:

To add this product to your account, contact a Telesign expert. This product is available for full-service accounts only.

This page explains how to use Speech Synthesis Markup Language (SSML) with the text-to-speech (TTS) option in Telesign Voice.

What is SSML?

SSML is a W3C standard markup language for generating synthetic speech. Use SSML when you want to control the fine details of how your message is converted to speech, rather than relying on the default conversion applied to a plain text message.

Reserved characters

The following characters are reserved in SSML, so use the associated escape code when including them in the content of your message.

Character    Escape code
"            &quot;
&            &amp;
'            &apos;
<            &lt;
>            &gt;
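
The substitution above can be automated before text is placed in a message. The following is a minimal sketch using Python's standard library; the function name escape_ssml_text is hypothetical and not part of any Telesign SDK. It conservatively escapes all five reserved characters, including quotation marks in positions where escaping is optional.

```python
from xml.sax.saxutils import escape

# escape() handles &, < and > on its own; the quotation marks are
# passed as extra entities so all five reserved characters are covered.
_QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}


def escape_ssml_text(text: str) -> str:
    """Return text with every SSML reserved character replaced by its escape code.

    Hypothetical helper for illustration only.
    """
    return escape(text, _QUOTE_ENTITIES)


print(escape_ssml_text("Terms & conditions: 5 < 10"))
# prints: Terms &amp; conditions: 5 &lt; 10
```

Escape only the message content, not the SSML tags themselves; otherwise the tags are read aloud or rejected instead of being interpreted.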

Using quotation marks

There are a few special rules related to the " and ' characters above:

Double quotation marks

  • Must always be escaped when in an attribute value delimited by double quotes.
  • Do not need to be escaped when in textual context in the message. For example: <speak>He said, "Do. Or do not. There is no try."</speak>
  • Do not need to be escaped when in an attribute value delimited by single quotes.

Single quotation marks

  • Must be escaped when used as an apostrophe.
  • Do not need to be escaped when in textual context in the message.
  • Do not need to be escaped when in an attribute value delimited by double quotes.
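
The attribute-value rules above follow from general XML syntax, so they can be checked locally with any XML parser. The sketch below uses Python's xml.etree.ElementTree; the element name tag and attribute name attr are placeholders chosen for illustration, not SSML tags supported by Telesign, and the check covers XML well-formedness only, not how the TTS engine pronounces the result.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Double quotes in message text need no escaping.
print(is_well_formed('<speak>He said, "Do. Or do not."</speak>'))  # True

# An unescaped double quote inside a double-quoted attribute value breaks parsing.
print(is_well_formed('<tag attr="say "hi""/>'))  # False

# Escaping it, or delimiting the value with single quotes, fixes the problem.
print(is_well_formed('<tag attr="say &quot;hi&quot;"/>'))  # True
print(is_well_formed("<tag attr='say \"hi\"'/>"))  # True

# A single quote inside a double-quoted attribute value is also fine.
print(is_well_formed('<tag attr="it\'s"/>'))  # True
```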

Supported tags

Telesign's implementation of SSML supports the following tags.

<speak>

Enclose your entire message within these tags.

Example (SSML)

<speak>Do. Or do not. There is no try.</speak>

<break>

Add a pause in your message.

Attributes

Name      Values                  Meaning
strength  none                    No pause. Use this to remove a normally occurring pause, such as after a period.
          x-weak                  No pause. Same effect as none.
          weak                    Pause of the same duration as one after a comma.
          medium                  Pause of the same duration as one after a comma. Same effect as weak.
          strong                  Pause of the same duration as one after a sentence.
          x-strong                Pause of the same duration as one after a paragraph.
time      {n}s (max = 10s)        The duration of the pause, in seconds.
          {n}ms (max = 10000ms)   The duration of the pause, in milliseconds.

Example (SSML)

<speak>Do<break strength="strong"/> Or do not<break time="1s"/> There is no try.</speak>
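
When messages are assembled in code, it can help to validate the attribute values before sending. The following is a minimal sketch, not a Telesign API: the helper name break_tag and its parameters are hypothetical. It checks the strength values and the 10-second maximum listed above. Because this page does not describe how strength and time combine, the sketch assumes at most one of them is supplied, and it also assumes a negative duration is invalid.

```python
from typing import Optional

# Values and limit taken from the attribute table for <break>.
STRENGTHS = {"none", "x-weak", "weak", "medium", "strong", "x-strong"}
MAX_PAUSE_MS = 10_000


def break_tag(strength: Optional[str] = None, time_ms: Optional[int] = None) -> str:
    """Build a <break> tag after validating its attribute (hypothetical helper)."""
    if strength is not None and time_ms is not None:
        # Assumption: the combined behavior is undocumented, so reject it.
        raise ValueError("supply either strength or time_ms, not both")
    if strength is not None:
        if strength not in STRENGTHS:
            raise ValueError(f"unsupported strength: {strength!r}")
        return f'<break strength="{strength}"/>'
    if time_ms is not None:
        if not 0 <= time_ms <= MAX_PAUSE_MS:
            raise ValueError(f"time must be between 0 and {MAX_PAUSE_MS} ms")
        return f'<break time="{time_ms}ms"/>'
    # No attributes: the pause length then depends on nearby punctuation.
    return "<break/>"


print(break_tag(strength="strong"))  # <break strength="strong"/>
print(break_tag(time_ms=1000))       # <break time="1000ms"/>
print(break_tag())                   # <break/>
```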

If you do not include any attributes, the effect varies depending on whether the tag is next to punctuation:

  • Next to comma: Has the same effect as including the attribute strength="strong".
  • Next to period: Has the same effect as including the attribute strength="x-strong".
  • Not next to punctuation: Has the same effect as including the attribute strength="medium".

Example (SSML)

<speak>Do.<break/> Or do not.<break/> There is no try.</speak>

<prosody>

Control the volume, speaking rate, and pitch of the voice. At least one attribute must be included. <prosody> tags can be nested within other <prosody> tags.

Attributes

Name    Values                                      Meaning
volume  default                                     Resets to the default volume level for the selected voice.
        silent, x-soft, soft, medium, loud, x-loud  Sets the volume to a predefined value for the selected voice.
        +{n}dB, -{n}dB                              Changes the volume from the current level to higher (+) or lower (-) by a measurement in decibels. Examples: +6dB approximately doubles the current volume; -6dB approximately halves it.
rate    x-slow, slow, medium, fast, x-fast          Sets the speaking rate to a predefined value for the selected voice.
        {n}% (min = 20%, max = 200%)                Sets the speaking rate to this percentage of the current rate. Examples: 50% halves the current rate; 200% doubles it.
pitch   default                                     Resets to the default pitch for the selected voice.
        x-low, low, medium, high, x-high            Sets the pitch to a predefined value for the selected voice.
        +{n}%, -{n}%                                Increases (+) or decreases (-) the pitch by this percentage.
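
The "+6dB approximately doubles" and "-6dB approximately halves" figures for the volume attribute can be checked with a short calculation. This sketch assumes the usual amplitude convention for decibels (ratio = 10^(dB/20)); this page does not state which convention the TTS engine applies, so treat the numbers as illustrative. The function name is hypothetical.

```python
def db_to_amplitude_ratio(db: float) -> float:
    """Convert a decibel change to an amplitude ratio (assumes the 20*log10 convention)."""
    return 10 ** (db / 20)


print(round(db_to_amplitude_ratio(6), 3))   # 1.995, roughly double
print(round(db_to_amplitude_ratio(-6), 3))  # 0.501, roughly half
```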

Example (SSML)

<speak><prosody volume="loud" rate="slow">Do. </prosody><prosody volume="-6dB" rate="150%">Or do <prosody pitch="low">not.</prosody> </prosody>There is no try.</speak>
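
Nested <prosody> tags like the ones in this example are easier to keep balanced when they are generated by a helper. The following is a minimal sketch; the function names prosody and check_rate are hypothetical and not part of any Telesign SDK. It enforces the rule that at least one attribute is present and the 20% to 200% range for percentage rates, and it assumes the content passed in has already been escaped.

```python
import re
from typing import Optional

NAMED_RATES = {"x-slow", "slow", "medium", "fast", "x-fast"}


def check_rate(rate: str) -> str:
    """Accept a named rate or a percentage between 20% and 200%."""
    if rate in NAMED_RATES:
        return rate
    match = re.fullmatch(r"(\d+)%", rate)
    if match and 20 <= int(match.group(1)) <= 200:
        return rate
    raise ValueError(f"unsupported rate: {rate!r}")


def prosody(content: str, volume: Optional[str] = None,
            rate: Optional[str] = None, pitch: Optional[str] = None) -> str:
    """Wrap already-escaped SSML content in a <prosody> tag (hypothetical helper)."""
    if rate is not None:
        rate = check_rate(rate)
    attrs = {"volume": volume, "rate": rate, "pitch": pitch}
    given = [f'{name}="{value}"' for name, value in attrs.items() if value is not None]
    if not given:
        raise ValueError("at least one attribute must be included")
    return f"<prosody {' '.join(given)}>{content}</prosody>"


# Rebuild the example message, nesting one <prosody> inside another.
inner = prosody("not.", pitch="low")
message = (
    "<speak>"
    + prosody("Do. ", volume="loud", rate="slow")
    + prosody(f"Or do {inner} ", volume="-6dB", rate="150%")
    + "There is no try.</speak>"
)
print(message)
```

Because an inner <prosody> is passed in as the content of the outer one, every opening tag is closed in the right order by construction.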