Voice Verify API - Text-to-speech tips
NOTE: To add this product to your account, contact a Telesign expert. This product is available for full-service accounts only.
This page provides tips on how to use text-to-speech with Voice Verify API.
When making a voice call, you can have your own message text converted to speech with text-to-speech (TTS) instead of using Telesign's default message. To use the TTS feature, make a Voice Verify API request that includes both the text of the message and a language code from the list of supported languages.
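A request like the one described above could be assembled roughly as follows. This is a minimal Python sketch that assumes the endpoint accepts a form-encoded body. The field names `phone_number`, `language`, and `tts_message` and the function `build_tts_request_body` are hypothetical placeholders, not confirmed Voice Verify API parameters, so check the API reference before relying on them. The sketch only builds the body and does not send anything.

```python
from urllib.parse import urlencode


def build_tts_request_body(phone_number: str, message: str, language: str) -> str:
    """Form-encode a hypothetical Voice Verify TTS request body.

    ASSUMPTION: the field names below are illustrative placeholders;
    the real parameter names come from the Voice Verify API reference.
    """
    fields = {
        "phone_number": phone_number,  # destination number (hypothetical field name)
        "language": language,          # code from the supported-languages list
        "tts_message": message,        # text to be spoken (hypothetical field name)
    }
    # urlencode percent-encodes ";" as %3B and the newline as %0A,
    # and turns spaces into "+".
    return urlencode(fields)


body = build_tts_request_body("15555550100", "Your code is 1;2;3\n4;5;6", "en-US")
print(body)
```

Percent-encoding carries the semicolons and the newline in the body as ordinary encoded characters, so the pause markers in the message text are preserved exactly as written.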
Most ordinary text is spoken as expected, with pauses at commas, semicolons, dashes, and the end of each sentence. To create a longer pause between words, insert a newline character ("\n"). How you create the newline character depends on the programming language you are using; in Python, for example, the escape sequence "\n" in a string literal produces it. For a shorter pause, use a colon (":") or semicolon (";").
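To make the newline point concrete in Python, the short sketch below builds a message with a long pause (newline) before a code whose digits are separated by short pauses (semicolons). The helper names `join_with_long_pauses` and `join_with_short_pauses` are hypothetical. The assertions show why the escape sequence matters: `"\n"` is a single newline character, while the raw string `r"\n"` is two characters, a backslash followed by the letter n.

```python
# Hypothetical helpers for placing pauses. The pause behavior comes from the
# guidance above; how a particular voice renders it should be checked by ear.


def join_with_long_pauses(phrases: list[str]) -> str:
    """Join phrases with a newline character, which produces a longer pause."""
    return "\n".join(phrases)


def join_with_short_pauses(words: list[str]) -> str:
    """Join words with semicolons, which produce a shorter pause."""
    return ";".join(words)


message = join_with_long_pauses(
    ["Your verification code is", join_with_short_pauses(["4", "7", "1"])]
)

# In Python, "\n" in a normal string literal is ONE character (a real newline);
# r"\n" (or "\\n") is TWO characters: a backslash and the letter n.
assert len("\n") == 1
assert len(r"\n") == 2

print(repr(message))
```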
NOTE: Unicode characters are permitted in Voice Verify API text-to-speech messages.
TTS is generally robust for ordinary spoken text, because most messages use common, general-purpose words. However, there are caveats, especially when using TTS for non-words (such as URLs) or made-up words (such as some company names).
The only reliable way to know that the speech sounds the way you want is to place a test call with the string in the language you plan to use. Having the call reviewed by a listener, ideally a native speaker, is the best way to troubleshoot this feature.
General hints
- Strings outside typical dictionary vocabulary, such as URLs, email addresses, and company names, may be pronounced as words, spelled out, or a combination of the two.
- The system tries to pronounce every word in the message text. How well it succeeds depends mostly on how closely each word resembles a word in the dictionary, so even made-up words can be pronounced sensibly if they resemble a dictionary word. However, pronunciation becomes less predictable the further a word departs from the expected dictionary word.
- Telesign attempts to pronounce even words that are unpronounceable. This can lead to odd results, for example when a phrase in one language is pronounced by a voice for another language or dialect. In some cases the result is not what you expect (for example, "gmail.com" may be pronounced as "graymail.com").
- Semicolons are the most reliable way to insert a space between pronounced words or numbers. A longer pause can be added by including a newline in the message. For example, "1;2;3\n4;5;6" (the backslash is essential) pronounces the first group of three digits, pauses, and then pronounces the second group.
- The guidance on numbers, URLs, and made-up words is based on English, but the same lessons are likely to apply to other languages.
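The grouping pattern from the hints above ("1;2;3\n4;5;6") can be generated from a plain digit string. The following is a minimal Python sketch; `format_digit_groups` and its `group_size` parameter are hypothetical names, and the output should still be checked with a test call in the target language.

```python
import string


def format_digit_groups(code: str, group_size: int = 3) -> str:
    """Separate digits with semicolons and put a newline between groups.

    Non-digit characters (spaces, dashes, and so on) are dropped first.
    """
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    digits = [ch for ch in code if ch in string.digits]
    # Split the digits into consecutive groups of group_size (the last may be shorter).
    groups = [digits[i:i + group_size] for i in range(0, len(digits), group_size)]
    # Semicolons: short pauses between digits. Newlines: longer pauses between groups.
    return "\n".join(";".join(group) for group in groups)


# Example: format_digit_groups("123456") returns "1;2;3\n4;5;6"
print(repr(format_digit_groups("123456")))
```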
Numbers
Most of the following tips apply only to English-language voices and assume standard text made up of common dictionary words. Less common words are less likely to be pronounced correctly, and the pronunciation of non-dictionary words is unpredictable and varies by language and dialect.
- Semicolons are the most reliable way to put space between pronounced words or numbers. For example, "1;2;3;4;5" reads the digits in sequence and more slowly than "1 2 3 4 5". "12345" sounds very similar to the version with spaces, but "12,345" is read as "twelve thousand three hundred forty five", "123,45" as "one hundred twenty three (pause) forty five", and "1234,5" as "twelve thirty four (pause) five".
- Commas act as effective pauses in some languages, but in others (French, for example) the comma is read aloud (in French, "virgule"). So in English, "1,2,3,4,5" reads the digits separately with pauses between them, while in French the digits are read with "virgule" spoken between them.
- Digits in strings are often read in pairs (at least in "en-US"), but this depends on context. For example, in "en-US" a phone number such as "1-559-555-5643" is read digit by digit (with pauses at the dashes), but "1;559;555;5643" is read as "one five fifty nine five fifty five fifty six forty three". This behavior varies by language and dialect: in some, the "-" is silent, in others it is read as "dash", and in others as "hyphen". In some, "559" is read as "five hundred fifty nine". In "en-IN", "5643" is read as "five hundred sixty four three"; in other dialects it may be read as "five thousand six hundred and forty three" or as "fifty six forty three".
- Most other punctuation produces a slight pause rather than a long one, and is generally not pronounced.
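Based on the number tips above, one heuristic for phone numbers is to separate every digit with a semicolon (avoiding pair readings and avoiding commas, which some languages speak aloud) and to replace dashes and other separators with newlines (since a dash may be silent, "dash", or "hyphen" depending on the voice). The Python sketch below applies that heuristic. `phone_number_for_tts` is a hypothetical helper, and the result is not guaranteed to sound right in every dialect, so verify it with a test call.

```python
import re


def phone_number_for_tts(number: str) -> str:
    """Heuristic formatting for reading a phone number digit by digit.

    - Groups are split on any non-digit characters (dashes, spaces, dots,
      parentheses), so no separator is left for the voice to pronounce.
    - Digits within a group are joined with semicolons, so they are read one
      at a time rather than in pairs, and no comma is present to be spoken.
    - Groups are joined with newlines for a longer pause between them.
    """
    groups = re.findall(r"[0-9]+", number)
    return "\n".join(";".join(group) for group in groups)


print(repr(phone_number_for_tts("1-559-555-5643")))
```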