Understand SMS encoding, character limits, and splitting

This page provides an overview of the SMS encodings Telesign supports and the character limit for a single SMS in each encoding. These limits apply to both the SMS API and the SMS Verify API. The page also explains how messages that exceed the single-SMS limit are split into multiple segments.

NOTE:

Some behaviors, features, and default settings can vary depending on factors such as the carrier, country, language, and other variables. For more detailed information, please contact our Customer Support Team.

SMS encodings

Telesign supports several encodings for SMS. Here's an overview of each one:

| Encoding | Description |
|---|---|
| GSM 03.38 | Most characters take 7 bits. Characters reached through the escape code take 14 bits (the escape code plus the character). We mostly use it in its 8-bit form, storing one 7-bit character per byte. For the purposes of splitting, each byte of a GSM 03.38 encoded payload is assumed to carry 7 bits. |
| ASCII | Each character occupies exactly 8 bits. The extended ASCII set is not supported: any value greater than 0x7F is invalid. |
| Latin-1 | Each character occupies exactly 8 bits. We use this encoding only if you connect to us via SMPP. |
| UTF-16-BE | A Unicode encoding that uses 16 or 32 bits per character. UTF-16-BE is backward compatible with UCS2, the official encoding the SMPP standard uses for DCS 8: every UCS2 sequence is valid UTF-16-BE. Characters that need 32 bits (encoded as surrogate pairs) fall outside the UCS2 character set, so some of them, such as flag emojis, may not display as a single character on the end user's device. Modern devices, however, typically interpret and display these sequences correctly. |
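To make these per-character costs concrete, here is a minimal Python sketch that counts how many units a message consumes in GSM 03.38 (septets, with extension-table characters costing two) and in UTF-16-BE (bytes). The function names are hypothetical, and the character tables are the standard GSM 03.38 default alphabet and extension table, not an export of Telesign's internal tables.

```python
from typing import Optional

# GSM 03.38 default alphabet (basic table) in code order, omitting 0x1B,
# which is the escape code itself.
GSM_BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Extension table: each character is sent as ESC + code, i.e. 2 septets (14 bits).
GSM_EXTENSION = "\f^{}\\[~]|€"


def gsm_septets(text: str) -> Optional[int]:
    """Count GSM 03.38 septets, or return None if a character is not representable."""
    septets = 0
    for ch in text:
        if ch in GSM_BASIC:
            septets += 1
        elif ch in GSM_EXTENSION:
            septets += 2
        else:
            return None
    return septets


def utf16be_bytes(text: str) -> int:
    """Bytes needed in UTF-16-BE: 2 per 16-bit character, 4 per surrogate pair."""
    return len(text.encode("utf-16-be"))


print(gsm_septets("Code: 1234"))    # 10
print(gsm_septets("Pay €5 [ref]"))  # 15: "€", "[" and "]" cost 2 septets each
print(gsm_septets("Привет"))        # None: Cyrillic is not in GSM 03.38
```

A flag emoji such as the US flag is two regional-indicator code points, each stored as a 4-byte surrogate pair, so `utf16be_bytes` returns 8 for it even though it renders as one symbol on most modern devices.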

Input and output encodings differ depending on whether you are using our REST API or SMPP.

Input encodings for REST messages

Submit message content as UTF-8 encoded Unicode text, or in a format that can be converted to UTF-8. Otherwise, the request is rejected. Accepting UTF-8 lets you send messages in any language through our API, including special characters and non-text elements such as emojis.
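If your application receives text from sources that might not be UTF-8, you can validate or convert it before building the request. The following is a minimal Python sketch; `require_utf8` and `convert_to_utf8` are hypothetical helper names, and the check only mirrors the documented requirement rather than calling the Telesign API.

```python
def require_utf8(payload: bytes) -> str:
    """Decode a payload as UTF-8, failing locally instead of at the API."""
    try:
        return payload.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError(f"message is not valid UTF-8: {exc}") from exc


def convert_to_utf8(payload: bytes, source_encoding: str) -> bytes:
    """Re-encode text from a known source encoding (e.g. 'latin-1') as UTF-8."""
    return payload.decode(source_encoding).encode("utf-8")


latin1_bytes = "Olá".encode("latin-1")  # b'Ol\xe1', which is not valid UTF-8
utf8_bytes = convert_to_utf8(latin1_bytes, "latin-1")
print(require_utf8(utf8_bytes))         # Olá
```

Conversion only works when you know the source encoding; guessing it can silently corrupt characters, so failing early is usually the safer default.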

Output encodings for REST messages

After receiving your UTF-8 encoded message, Telesign tries to re-encode it using the following encodings, in this order, moving to the next one when the message can't be represented in the current one:

  1. GSM 03.38
  2. ASCII
  3. UTF-16-BE
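The fallback order above can be sketched in Python as follows. This is a simplified model under stated assumptions: `choose_output_encoding` is a hypothetical name, the tables are the standard GSM 03.38 ones, and carrier- or country-specific content adjustments are not modeled.

```python
# Standard GSM 03.38 default alphabet (minus the ESC code) and extension table.
GSM_BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
GSM_EXTENSION = "\f^{}\\[~]|€"


def fits_gsm(text: str) -> bool:
    """True if every character is in the GSM basic or extension table."""
    return all(ch in GSM_BASIC or ch in GSM_EXTENSION for ch in text)


def choose_output_encoding(text: str) -> str:
    """Return the first encoding, in the documented order, that can represent text."""
    if fits_gsm(text):
        return "GSM 03.38"
    if text.isascii():  # every code point is <= 0x7F
        return "ASCII"
    return "UTF-16-BE"  # represents any Unicode text, including emojis


print(choose_output_encoding("Your code is 1234"))  # GSM 03.38
print(choose_output_encoding("Run `verify` now"))   # ASCII: "`" is not in GSM 03.38
print(choose_output_encoding("Код: 1234"))          # UTF-16-BE
```

ASCII sits between the other two because some ASCII characters, such as the backtick, have no GSM 03.38 equivalent, while UTF-16-BE halves the number of characters per SMS.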

NOTE:

Some countries and carriers do not support non-ASCII characters. To avoid delivery failures to these destinations, we may need to adjust the message content.

Input encodings for SMPP messages

The input encoding for SMPP messages is determined by the Data Coding Scheme (DCS) value provided in the request. If the DCS value falls outside the supported range, an "Invalid Data Coding Scheme" response is returned.
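As an illustration, the lookup could be modeled in Python as a table from the SMPP `data_coding` value to an input encoding. The mapping below is an assumption based on common SMPP 3.4 `data_coding` values (it is not a list of the DCS values Telesign accepts), and `InvalidDataCodingScheme` and `input_encoding_for` are hypothetical names.

```python
# Assumed mapping based on common SMPP 3.4 data_coding values. The exact set
# of DCS values Telesign accepts is not listed on this page.
DCS_TO_ENCODING = {
    0x00: "GSM 03.38",  # SMSC default alphabet, assumed to be GSM 03.38 here
    0x01: "ASCII",      # IA5 (CCITT T.50) / ASCII
    0x03: "Latin-1",    # ISO-8859-1
    0x08: "UTF-16-BE",  # UCS2; UTF-16-BE is a compatible superset
}


class InvalidDataCodingScheme(ValueError):
    """Hypothetical error standing in for the 'Invalid Data Coding Scheme' response."""


def input_encoding_for(dcs: int) -> str:
    """Map a DCS value to an input encoding, rejecting unsupported values."""
    try:
        return DCS_TO_ENCODING[dcs]
    except KeyError:
        raise InvalidDataCodingScheme(f"Invalid Data Coding Scheme: {dcs:#04x}") from None


print(input_encoding_for(0x08))  # UTF-16-BE
```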

Output encodings for SMPP messages

The output encoding is determined from the input message.

NOTE:

Some countries and carriers do not support non-ASCII characters. To avoid delivery failures to these destinations, we may need to adjust the message content. Additionally, for specific providers and operators, we may need to override the output DCS and encoding to ensure proper message delivery.

SMS character limits and splitting

A single SMS message carries up to 140 bytes of user data, so the number of characters that fit depends on the encoding. When your message exceeds the character limit for a single SMS, we split it into multiple segments, which the end user's device reassembles into one message.

The table below lists the character limit for a single message in each encoding. When a message is split, a 6-byte User Data Header (UDH) is added to the beginning of each segment to enable concatenation (recombining the segments into a single message on the end user's device). The UDH reduces the character limit per segment, as shown in the last column of the table.

| Encoding | Character limit for a single message | Character limit for a long message (per segment) |
|---|---|---|
| GSM 03.38 | 160 | 153 |
| ASCII | 140 | 134 |
| Latin-1 | 140 | 134 |
| UTF-16-BE | 70 | 67 |
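The limits in this table follow from simple arithmetic: take 140 bytes (minus 6 bytes of UDH for a segment), multiply by 8 bits, and divide by the bits per character. The minimal Python sketch below reproduces the table. It assumes the 6-byte header is the standard concatenation information element with an 8-bit reference number (IEI 0x00); `concat_udh` and `char_limit` are hypothetical names.

```python
SMS_PAYLOAD_BYTES = 140  # user-data capacity of one SMS
UDH_BYTES = 6            # concatenation header added to each segment

# Bits per character, per the encodings table. UTF-16-BE counts 16-bit code
# units, so a character that needs a surrogate pair uses two of them.
BITS_PER_CHAR = {"GSM 03.38": 7, "ASCII": 8, "Latin-1": 8, "UTF-16-BE": 16}


def concat_udh(ref: int, total: int, seq: int) -> bytes:
    """Build a 6-byte concatenation UDH (assumed: IEI 0x00, 8-bit reference)."""
    # UDHL=0x05 (5 bytes follow), IEI=0x00, IEDL=0x03 (3 data bytes),
    # then reference number, total number of parts, and this part's number.
    return bytes([0x05, 0x00, 0x03, ref, total, seq])


def char_limit(bits_per_char: int, payload_bytes: int) -> int:
    """Whole characters that fit into payload_bytes."""
    return payload_bytes * 8 // bits_per_char


LIMITS = {
    enc: (
        char_limit(bits, SMS_PAYLOAD_BYTES),              # single message
        char_limit(bits, SMS_PAYLOAD_BYTES - UDH_BYTES),  # per segment
    )
    for enc, bits in BITS_PER_CHAR.items()
}
print(LIMITS)
# {'GSM 03.38': (160, 153), 'ASCII': (140, 134), 'Latin-1': (140, 134), 'UTF-16-BE': (70, 67)}
```

For GSM 03.38 there is a further detail that gives the same answer: the 48 header bits plus one fill bit occupy 7 septets, leaving 160 - 7 = 153 characters per segment.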

A concatenated message is typically limited to 10 parts, and some carriers may not support more than 10.
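To estimate how many parts a message will use, divide its length in encoding units by the per-segment limit and round up. A minimal Python sketch using the values from the table above; `segment_count` and `may_exceed_carrier_limit` are hypothetical names. It is an estimate only: an implementation may keep a GSM escape pair or a UTF-16 surrogate pair together in one segment, which can occasionally require one more segment than computed here.

```python
import math

# (single message, per segment) limits from the table above, in encoding units:
# GSM 03.38 septets, ASCII/Latin-1 bytes, UTF-16-BE 16-bit code units.
LIMITS = {
    "GSM 03.38": (160, 153),
    "ASCII": (140, 134),
    "Latin-1": (140, 134),
    "UTF-16-BE": (70, 67),
}
TYPICAL_MAX_PARTS = 10


def segment_count(units: int, encoding: str) -> int:
    """Estimate the number of SMS parts for a message of `units` encoding units."""
    single, per_segment = LIMITS[encoding]
    if units <= single:
        return 1  # fits in one SMS, so no UDH is needed
    return math.ceil(units / per_segment)


def may_exceed_carrier_limit(units: int, encoding: str) -> bool:
    """True if the message would need more parts than carriers typically accept."""
    return segment_count(units, encoding) > TYPICAL_MAX_PARTS


print(segment_count(160, "GSM 03.38"))              # 1
print(segment_count(161, "GSM 03.38"))              # 2
print(segment_count(71, "UTF-16-BE"))               # 2
print(may_exceed_carrier_limit(1531, "GSM 03.38"))  # True (11 parts)
```

Note the jump at the boundary: one character past the single-message limit costs a whole second segment, and both segments then carry the smaller per-segment limit.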

Smart splitting

In some cases, message parts can't be reassembled on the end user's device, causing them to appear as separate messages.

The smart splitting algorithm is designed to avoid splitting words across message parts (a "word" is any sequence of characters without whitespace). Each part ends with the last complete word that fits, which may leave some bytes unused. This keeps critical elements such as URLs and email addresses intact, which improves end users' click-through and conversion rates. A word that is longer than a single segment is still split across parts.
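The idea can be illustrated with a greedy, word-preserving splitter. The following is a minimal Python sketch, not Telesign's algorithm. `smart_split` is a hypothetical name, `limit` is the per-segment budget (for example 153 for GSM 03.38), and `cost` defaults to `len` for simplicity. The sketch also normalizes runs of whitespace to single spaces, which a production implementation might not do.

```python
from typing import Callable, List


def _fitting_prefix_len(word: str, limit: int, cost: Callable[[str], int]) -> int:
    """Length of the longest prefix of word whose cost fits within limit."""
    n = 0
    while n < len(word) and cost(word[: n + 1]) <= limit:
        n += 1
    return max(n, 1)  # always make progress, even if one character is too costly


def smart_split(text: str, limit: int, cost: Callable[[str], int] = len) -> List[str]:
    """Split text into parts of at most `limit` cost without breaking words."""
    parts: List[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if cost(candidate) <= limit:
            current = candidate  # the word still fits in the current part
            continue
        if current:
            parts.append(current)  # close the part after the last whole word
        # A word longer than one segment has to be split across parts.
        while cost(word) > limit:
            cut = _fitting_prefix_len(word, limit, cost)
            parts.append(word[:cut])
            word = word[cut:]
        current = word
    if current:
        parts.append(current)
    return parts


print(smart_split("Your code is 123456. Visit https://example.com/verify now", 30))
# ['Your code is 123456. Visit', 'https://example.com/verify now']
```

With the same 30-character budget, a fixed-length cut would end the first part with "htt" and break the URL; the word-aware split keeps it whole at the cost of 4 unused characters in the first part.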

In certain countries, message splitting behaves as if the smart splitting feature is always enabled. To obtain a list of these countries or to enable the smart splitting feature for your account, contact our Customer Support Team.