Voice Verify API - Text-to-speech tips

📘

NOTE:

To add this product to your account, contact a Telesign expert. This product is available for full-service accounts only.

This page provides tips on how to use text-to-speech with Voice Verify API.

When making a voice call, you can have the text of a message converted to speech with text-to-speech (TTS) instead of using Telesign’s default message. To use the TTS feature, make a Voice Verify API request that includes both the text content of the message and the language code from this list of supported languages.

Most generic text is spoken as expected, with pauses for commas, semicolons, dashes, and at the end of a sentence. To create a longer pause between words, use a newline character ( "\n" ). How you write the newline character depends on the programming language you are using; in Python, for example, "\n" is correct. For a shorter pause, use a colon ( ":" ) or semicolon ( ";" ).
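
As a minimal sketch of the newline handling described above, the following Python snippet builds a message with a short pause and a long pause, then form-encodes it to show that the newline travels as a single character. The field names tts_message and language, and the sample values, are assumptions for illustration only; check the Voice Verify API reference for the actual parameter names.

```python
from urllib.parse import urlencode

# In a normal Python string literal, "\n" is a single newline character.
message = "Hello; your code is\n1;2;3"

# In a raw string, the backslash and the letter n stay as two characters,
# so no newline (and no long pause) would be sent.
not_a_newline = r"Hello\n"

# Hypothetical request fields; the real parameter names may differ.
payload = {"tts_message": message, "language": "en-US"}

# Form-encode the payload; the newline appears as %0A in the result.
encoded = urlencode(payload)
print(encoded)
```

If the encoded body shows %5Cn instead of %0A, the message contains a literal backslash followed by "n" rather than a newline.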

📘

NOTE:

Unicode characters are permissible for Voice Verify API text-to-speech.

TTS is generally quite robust for spoken text, because the words are typically in general use. However, there are some caveats, especially when using TTS for non-words (such as URLs) or for made-up words (such as company names).

The only reliable way to know that the speech sounds the way you want is to generate a test call with the string in the language you want to use. Having someone, ideally a native speaker, listen to the call is the best way to troubleshoot this feature.

General hints

  • Strings sent to the TTS engine that are not typical dictionary words, such as URLs, email addresses, and company names, may be fully pronounced, spelled out, or a combination of the two.
  • The system does what it can to pronounce all words included in the message text. The result mostly depends on how similar a word in the string is to a word found in the dictionary. Therefore, even made-up words can be interpreted if they resemble a dictionary word, although the pronunciation becomes less predictable the further the word is from a dictionary word.
  • Telesign attempts to pronounce even words that are unpronounceable. This can lead to rather odd results when a phrase in one language is pronounced by a voice in another language or dialect. In some cases, the result is not as expected (for example, “gmail.com” may be pronounced as “graymail.com”).
  • Semicolons are the most reliable way to include a space between pronounced words/numbers. A long pause can also be included by adding a newline in the message. Therefore, “1;2;3\n4;5;6” (the backslash is essential) pronounces the first group of three digits, pauses, and then pronounces the second group.
  • The tips on this page for numbers, URLs, and made-up words are based on English, but the same lessons are likely to apply to other languages.
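
The semicolon and newline hint above can be sketched as a small Python helper. The function name space_digits and its group_size parameter are hypothetical and not part of any Telesign library; the helper only builds the string and does not place a call.

```python
def space_digits(code: str, group_size: int = 3) -> str:
    """Separate digits with semicolons and put a newline between groups."""
    # Keep only the digits, so input such as "123-456" also works.
    digits = [ch for ch in code if ch.isdigit()]
    # Split the digits into consecutive groups of group_size.
    groups = [digits[i:i + group_size] for i in range(0, len(digits), group_size)]
    # Semicolons give a short pause inside a group; a newline gives a long pause.
    return "\n".join(";".join(group) for group in groups)


print(repr(space_digits("123456")))  # prints '1;2;3\n4;5;6'
```

How long each pause actually sounds still depends on the language and voice, so a test call remains the way to confirm the result.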

Numbers

Most of the following tips for numbers apply to English only, using standard text with common words found in the dictionary. Less common words are less likely to be pronounced properly, and the pronunciation of non-dictionary words is unpredictable and varies with the dialect and language.

  • Semicolons are the most reliable way to put space between pronounced words/numbers. For example, “1;2;3;4;5” reads the digits in sequence and is slower than “1 2 3 4 5”. “12345” sounds very much like the version with spaces, but “12,345” is read as “twelve thousand three hundred forty five”, “123,45” as “one hundred twenty three (pause) forty five”, and “1234,5” as “twelve thirty four (pause) five”.
  • Commas work as effective pauses in some languages, but in other languages (French, for example) the comma is read aloud (as “virgule” in French). So in English, “1,2,3,4,5” reads the digits separately with space between them. In French, the digits are read with “virgule” spoken between them.
  • Digits used in strings are read (in “en-US” at least) in pairs, but it depends on the context. For example, a phone number like “1-559-555-5643” in “en-US” is read digit by digit (with pauses at the dashes), but “1;559;555;5643” is read as “one five fifty nine five fifty five fifty six forty three”. This varies from locale to locale: in some combinations the “-” is silent, in others it is read as “dash”, and in others as “hyphen”. In some locales, “559” is read as “five hundred fifty nine”. In “en-IN”, “5643” is read as “five hundred sixty four three”; in another locale it is “five thousand six hundred and forty three”, and in another it is “fifty six forty three”.
  • Most other punctuation acts as a slight pause, but not a long pause. Most punctuation is not pronounced.
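
Because the way digits are read varies with the separator, language, and dialect, one possible approach is to generate several renderings of the same digits and listen to a test call for each. The Python sketch below only builds the candidate strings; the function name separator_variants is hypothetical, and the set of separators is simply the ones discussed in the list above.

```python
def separator_variants(digits: str) -> dict:
    """Return the same digits joined by each separator discussed above."""
    separators = {"semicolon": ";", "space": " ", "comma": ",", "dash": "-"}
    # str.join on a string places the separator between every character.
    variants = {name: sep.join(digits) for name, sep in separators.items()}
    # Also keep the unseparated form, which may be read as one number.
    variants["none"] = digits
    return variants


for name, text in separator_variants("12345").items():
    print(f"{name}: {text}")
```

Each string can then be sent as the message text in a test call for the target language, keeping in mind that a comma may be spoken aloud in languages such as French.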