Voice - Play audio and perform automatic speech recognition

📘

NOTE

To add this product to your account, contact a Telesign expert. This product is available for full-service accounts only.

This page explains how to use Telesign Voice to play a message and detect speech from your end user using automatic speech recognition (ASR). You can do this in an outbound call or in response to an inbound call. Typical uses are building simple phone trees and collecting information from your customer.

Before you begin

Make sure you have the following before you start:

If you do not want to use your own number, you should also have:

  • Telesign phone number: A voice-capable phone number you have purchased from Telesign to use as a caller ID. Contact our Customer Support Team for details.

📘

NOTE

For full API reference details for this service, including definitions for each parameter, see POST https://rest-ww.telesign.com/v2/voice.

🚧

CAUTION

Use only codecs and standards for audio files supported by Voice. See Voice - Supported standards and codecs for more details.

Implement stream audio and ASR

This section walks you through dialing a number, playing a message, and detecting speech from an end user. You could also begin this process by receiving an inbound call, then playing a message and detecting speech.

To set up the call, do the following:

  1. Create an outbound call using the dial Action. Set to to the destination phone number and caller_id_number to your caller ID (the phone number you bought from Telesign). Using a caller ID is optional but recommended.

Outbound call with Python

import json
from base64 import b64encode

import requests

customer_id = 'Your Customer ID goes here.'
api_key = 'Your API key goes here.'

destination_number = 'The complete phone number you want to call, including country code, with no special characters or spaces.'
caller_id_number = 'The phone number you purchased from Telesign goes here.'
external_id = 'An external ID you generated for this transaction.'  # Optional parameter

url = "https://rest-ww.telesign.com/v2/voice"

# Request body for the dial Action.
payload = {
    "jsonrpc": "2.0",
    "method": "dial",
    "params": {
        "to": destination_number,
        "caller_id_number": caller_id_number,
        "external_id": external_id
    }
}

# b64encode() requires bytes in Python 3, so encode the credentials first.
credentials = "{}:{}".format(customer_id, api_key).encode('utf-8')

headers = {
    'Accept': "application/json",
    'Content-Type': "application/json",
    'Authorization': "Basic {}".format(b64encode(credentials).decode('utf-8')),
}

response = requests.post(url, data=json.dumps(payload), headers=headers)

print(response.text)
  2. Telesign notifies you with a dial_completed Event. Your code should check that the status of this Event is answered.

  3. If the status is answered, respond to the Event with the speak Action, which plays a message and detects speech. Here is an example of what the payload of that response might look like:

{
  "method": "speak",
  "params": {
    "tts": {
      "message": "<speak>Say <prosody volume='loud'>1</prosody> for your account balance. Say <prosody volume='loud'>2</prosody> to speak with a customer service representative.</speak>",
      "language": "en-US",
      "type": "ssml"
    },
    "detect_speech": {
        "timeout": 5000,
        "inter_speech_timeout": 3000,
        "max_speech_duration": 10000,
        "language": "en-US"
    }
  }
}
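The payload above can also be assembled in code. The following is a minimal sketch, not a definitive implementation: the function names build_speak_action and respond_to_dial_completed are hypothetical, and the location of the call status inside the Event body (event["data"]["status"]) is an assumption that this page does not confirm. Check the API reference for the actual Event structure before relying on it.

```python
def build_speak_action(message, language="en-US", timeout=5000,
                       inter_speech_timeout=3000, max_speech_duration=10000):
    """Build a speak Action that plays an SSML message and detects speech.

    The default values mirror the example payload shown above.
    """
    for name, value in (("timeout", timeout),
                        ("inter_speech_timeout", inter_speech_timeout),
                        ("max_speech_duration", max_speech_duration)):
        if not isinstance(value, int) or value <= 0:
            raise ValueError("{} must be a positive integer".format(name))
    return {
        "method": "speak",
        "params": {
            "tts": {"message": message, "language": language, "type": "ssml"},
            "detect_speech": {
                "timeout": timeout,
                "inter_speech_timeout": inter_speech_timeout,
                "max_speech_duration": max_speech_duration,
                "language": language,
            },
        },
    }


def respond_to_dial_completed(event, message):
    """Return a speak Action for an answered call, otherwise a hangup Action."""
    # ASSUMPTION: the status is nested under "data"; this path is illustrative only.
    status = (event.get("data") or {}).get("status")
    if event.get("event") == "dial_completed" and status == "answered":
        return build_speak_action(message)
    return {"method": "hangup", "params": {}}
```

Keeping the payload construction in one function makes it easier to keep the timeout values consistent between your documentation, tests, and the Event handler.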

Here is a code sample showing how you might handle Telesign Events:

Play and detect speech

from json import dumps
from bottle import post, request, run
# This is an example response to a dial_completed event that will play a message and detect speech from an end user.
# {
#   "method": "speak",
#   "params": {
#     "tts": {
#       "message": "<speak>Say <prosody volume='loud'>1</prosody> for your account balance. Say <prosody volume='loud'>2</prosody> to speak with a customer service representative.</speak>",
#       "language": "en-US",
#       "type": "ssml"
#     },
#     "detect_speech": {
#         "timeout": 5000,
#         "inter_speech_timeout": 3000,
#         "max_speech_duration": 10000,
#         "language": "en-US"
#     }
#   }
# }

class Response:
    def __init__(self, method, params=None):
        if params is None:
            params = {}
        self.json_rpc = "2.0"
        self.method = method
        self.params = params
    def to_json(self):
        return dumps(self.__dict__)

@post('/')
def telesign_event():
    # Throughout a call session Telesign will notify you of all events.
    # Each event requires you to send us an appropriate action defined in our documentation.
    # This endpoint needs to match the URL stored in our system to properly communicate.
    #
    # In this example, the server will respond to the dial_completed event.
    # First extract the event from the JSON body in the request.
    event = request.json.get('event')
    if event == 'dial_completed':
        # Check for an 'answered' status here before continuing (this example skips that check).
        # For 'answered', fill out your parameters:
        message = "<speak>Say <prosody volume='loud'>1</prosody> for your account balance. Say <prosody volume='loud'>2</prosody> to speak with a customer service representative.</speak>"
        language = "en-US"
        timeout = 5000 # How long to wait for speech from the user before ending recognition.
        inter_speech_timeout = 3000 # How long to wait for the next utterance before ending recognition, after at least one utterance has been spoken.
        max_speech_duration = 10000 # Maximum duration of the whole voice recognition.

        # Generate the command in the JSON format, matching the example response above.
        return Response(method='speak',
          params={
            'tts': { 'message': message, 'language': language, 'type': 'ssml' },
            'detect_speech': { 'timeout': timeout, 'inter_speech_timeout': inter_speech_timeout, 'max_speech_duration': max_speech_duration, 'language': language }
          }
        ).to_json()
    else:
        return Response(method='hangup').to_json()

run(host='localhost', port=8080, debug=True)

For a diagram showing the outbound call logic, refer to Possible call flows.

  4. If playing the message and detecting speech succeeds, Telesign sends you the play_completed Event. You can then end the call by sending the hangup Action, or continue it by sending additional play or speak Actions.
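To illustrate how a simple phone tree might choose between hanging up and speaking again, here is a hedged sketch. It assumes your application has already extracted the recognized text from the Event; this page does not show which field carries that text, so the function takes a plain string. The function names and the reply messages are hypothetical.

```python
import re


def normalize(transcript):
    """Lowercase the transcript and drop punctuation so words compare cleanly."""
    return re.sub(r"[^a-z0-9 ]", "", transcript.lower()).strip()


def speak(message, language="en-US"):
    """Build a plain-text speak Action without speech detection."""
    return {
        "method": "speak",
        "params": {"tts": {"message": message, "language": language}},
    }


def next_action(transcript):
    """Map what the caller said to the next Action in the phone tree."""
    words = normalize(transcript).split()
    if "1" in words or "one" in words:
        # Hypothetical reply for the account balance branch.
        return speak("Please hold while we look up your account balance.")
    if "2" in words or "two" in words:
        # Hypothetical reply for the customer service branch.
        return speak("Connecting you to a customer service representative.")
    # Nothing recognizable was said, so end the call.
    return {"method": "hangup", "params": {}}
```

For example, next_action("Two, please") returns a speak Action for the customer service branch, while an empty transcript returns a hangup Action. A production phone tree would likely repeat the prompt a limited number of times before hanging up.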