How to deliver natural conversational experiences using Amazon Lex Streaming APIs

Natural conversations often include pauses and interruptions. During customer service calls, a caller may ask to pause the conversation or hold the line while they look up the necessary information before continuing to answer a question. For example, callers often need time to retrieve credit card details when making bill payments. Interruptions are also common. Callers may interrupt a human agent with an answer before the agent finishes asking the entire question (for example, “What’s the CVV code for your credit card? It is the three-digit code in the top right corner…”). Just as when conversing with a human agent, a caller interacting with a bot may interrupt or instruct the bot to hold the line. Previously, you had to orchestrate such dialog on Amazon Lex by managing client attributes and writing code via an AWS Lambda function. Implementing a hold pattern required code to keep track of the previous intent so that the bot could continue the conversation. The orchestration of these conversations was complex to build and maintain, and lengthened the time to market for conversational interfaces. Moreover, the user experience was disjointed because the properties of prompts, such as the ability to interrupt, were defined in the session attributes on the client.

Amazon Lex’s new streaming conversation APIs allow you to deliver sophisticated natural conversations across different communication channels. You can now easily configure pauses, interruptions and dialog constructs while building a bot with the Wait and Continue and Interrupt features. This simplifies the overall design and implementation of the conversation and makes it easier to manage. By using these features, the bot builder can quickly enhance the conversational capability of virtual agents or IVR systems.

In the new Wait and Continue feature, the ability to put the conversation into a waiting state is surfaced during slot elicitation. You can configure the slot to respond with a “Wait” message such as “Sure, let me know when you’re ready” when a caller asks for more time to retrieve information. You can also configure the bot to continue the conversation with a “Continue” response based on defined cues such as “I’m ready for the policy ID. Go ahead.” Optionally, you can set a “Still waiting” prompt to play messages like “I’m still here” or “Let me know if you need more time.” You can set the frequency of these messages to play and configure a maximum wait time for user input. If the caller doesn’t provide any input within the maximum wait duration, Amazon Lex resumes the dialog by prompting for the slot. The following screenshot shows the wait and continue configuration options on the Amazon Lex console.
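
To make the timing concrete, the following is a minimal sketch in plain Java of how a “Still waiting” frequency and a maximum wait time could interact. The class and method names (WaitSchedule, stillWaitingOffsets) are hypothetical and are not part of the Amazon Lex API. The sketch also assumes that the first reminder plays one full interval after the wait begins and that no reminder plays once the maximum wait time is reached; check the Amazon Lex documentation for the exact behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of the Amazon Lex SDK.
final class WaitSchedule {

    /**
     * Returns the offsets (in seconds from the start of the wait) at which a
     * "Still waiting" reminder would play, assuming one reminder per full
     * interval strictly before the maximum wait time elapses.
     */
    static List<Integer> stillWaitingOffsets(int frequencySeconds, int maxWaitSeconds) {
        if (frequencySeconds <= 0 || maxWaitSeconds <= 0) {
            throw new IllegalArgumentException("durations must be positive");
        }
        List<Integer> offsets = new ArrayList<>();
        for (int t = frequencySeconds; t < maxWaitSeconds; t += frequencySeconds) {
            offsets.add(t);
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Remind every 10 seconds and stop waiting after 35 seconds.
        System.out.println(stillWaitingOffsets(10, 35)); // prints [10, 20, 30]
    }
}
```

Under these assumptions, a 10-second frequency with a 35-second maximum wait yields three reminders before the bot re-prompts for the slot.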

The Interrupt feature enables callers to barge in while the bot is playing a prompt. A caller may interrupt the bot and answer a question before the prompt is completed. This capability is surfaced at the prompt level and provided as a default setting. On the Amazon Lex console, navigate to the Advanced Settings and, under Slot prompts, enable the setting to allow users to interrupt the prompt.

After configuring these features, you can initiate a streaming interaction with the Lex bot by using the StartConversation API. The streaming capability enables you to capture user input, manage state transitions, handle events, and deliver the responses required as part of a conversation. The input can be one of three types: audio, text, or DTMF, whereas the response can be either audio or text. The dialog progresses by eliciting an intent, populating any slots, confirming the intent, and finally closing the intent. As the conversation progresses, the intent moves through states such as InProgress, Waiting, Confirmed, Denied, Fulfilled, ReadyForFulfillment, or Failed. For a detailed list of dialog and intent states, see the documentation for Amazon Lex intents.

The StartConversation API uses HTTP/2 for bidirectional communication between the client and Amazon Lex. Clients are expected to maintain a single long-running stream throughout the conversation and exchange required information with the server (and vice versa) via different events. To learn more about streaming for Amazon Lex bots, refer to the Lex streaming documentation.

Amazon Lex Streaming APIs in action

Let’s put all of this together in a bot. We first use the Amazon Lex console to configure a bot and then use the following sample telephony application to interact with the bot. For the purposes of this post, we use a simple bot to complete a bill payment with a credit card. Because callers often need time to retrieve their credit card details, we enable the Wait and Continue construct when eliciting the credit card number. We also set up a service-side interrupt to manage interruptions.

The following is a sample conversation with our bot:

Caller:    I’d like to make a payment on my account
Bot:        Sure. Can we start off with your account ID?
Caller:    Yes the account ID is 1234
Bot:        Great. The balance due is $100. How much would you like to pay today?
Caller:    $100
Bot:        Ok. Can you share the 16-digit number on your credit card?
Caller:    Can you wait a minute?
Bot:        Sure

—————- a few seconds later —————-

Bot:        Let me know when ready. No rush.

—————- a few seconds later —————-

Caller:    Ok I am back
Bot:        Ok please go ahead
Caller:    The credit card number is 1234 5678 1234 5678
Bot:        Got it. Let me repeat that for you 1234 5678 1234 5678. Is that correct?
Caller:    Yes
Bot:        Ok. What’s the CVV code? You can find it in the top right on the signature panel on the back of the card.
Caller:    123
Bot:        Awesome. I will go ahead and process the $100 payment on your account using the card ending in 5678.
Caller:    Ok
Bot:        The payment went through. Your confirmation code is 1234.

The first step is to build an Amazon Lex bot with intents to process payment and get balance on the account. The ProcessPayment intent elicits the information required to process the payment, such as the payment amount, credit card number, CVV code, and expiration date. The GetBalanceAmount intent provides the balance on the account. The FallbackIntent is triggered when the user input can’t be processed by either of the two configured intents.

Deploying the sample bot

To create the sample bot, complete the following steps. This creates an Amazon Lex bot called PaymentsBot.

On the Amazon Lex console, choose Create Bot.
In the Bot configuration section, give the bot the name PaymentsBot.
Specify the AWS Identity and Access Management (IAM) permissions and the COPPA flag.
Choose Next.
Under Languages, choose English (US).
Choose Done.
Add the ProcessPayment and GetBalanceAmount intents to your bot.
For the ProcessPayment intent, add the following slots:
1. PaymentAmount slot using the built-in AMAZON.Number slot type
2. CreditCardNumber slot using the built-in AMAZON.AlphaNumeric slot type
3. CVV slot using the built-in AMAZON.Number slot type
4. ExpirationDate slot using the built-in AMAZON.Date slot type
Configure slot elicitation prompts for each slot.
Configure a closing response for the ProcessPayment intent.
Similarly, add and configure slots and prompts for the GetBalanceAmount intent.
Choose Build to test your bot.

For more information about creating a bot, see the Lex V2 documentation.

Configuring Wait and Continue

Choose the ProcessPayment intent and navigate to the CreditCardNumber slot.
Choose Advanced Settings to open the slot editor.
Enable Wait and Continue for the slot.
Provide the Wait, Still Waiting, and Continue responses.
Save the intent and choose Build.

The bot is now configured to support the Wait and Continue dialog construct. Now let’s configure the client code. You can use a telephony application to interact with your Lex bot. You can download the code for setting up a telephony IVR interface via Twilio from the GitHub project. The project contains instructions for setting up a telephony interface as well as client application code that communicates between the telephony interface and Amazon Lex.

Now, let’s review the client-side setup that uses the bot configuration we just enabled on the Amazon Lex console. The client application uses the Java SDK to capture payment information. First, you send a ConfigurationEvent to set up the conversation parameters. Then, depending on the input type, you send input events (AudioInputEvent, TextInputEvent, or DTMFInputEvent) to pass user input to the bot. When sending audio data, you need to send multiple AudioInputEvent events, with each event containing a slice of the data.
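
Slicing the audio can be sketched with the JDK alone. The helper below is a hypothetical illustration (AudioChunker and split are not names from the sample project), and the chunk size is an assumption rather than a Lex requirement. For example, 320 bytes would hold 20 ms of audio if the stream is 8 kHz, 16-bit mono PCM.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of the sample project.
final class AudioChunker {

    /** Splits raw audio bytes into consecutive slices of at most chunkSize bytes each. */
    static List<ByteBuffer> split(byte[] audio, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<ByteBuffer> chunks = new ArrayList<>();
        for (int offset = 0; offset < audio.length; offset += chunkSize) {
            int length = Math.min(chunkSize, audio.length - offset);
            // wrap() shares the backing array, so no bytes are copied here.
            chunks.add(ByteBuffer.wrap(audio, offset, length));
        }
        return chunks;
    }
}
```

Each resulting buffer could then be handed to a method such as the writeAudioEvent(ByteBuffer) shown later in this post, one AudioInputEvent per slice.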

The service first responds with a TranscriptEvent containing the transcription, then sends an IntentResultEvent to surface the intent classification results. Subsequently, Amazon Lex sends a response event (TextResponseEvent or AudioResponseEvent) that contains the response to play back to the caller. If the caller asks the bot to hold the line, the intent moves to the Waiting state and Amazon Lex sends another set of TranscriptEvent, IntentResultEvent, and response event. When the caller asks to continue the conversation, the intent returns to the InProgress state and the service again sends a TranscriptEvent, an IntentResultEvent, and a response event. While the dialog is in the Waiting state, Amazon Lex sends an IntentResultEvent and a response event for every “Still waiting” message (there is no transcript event for server-initiated responses). If the caller interrupts the bot prompt at any time, Amazon Lex returns a PlaybackInterruptionEvent.
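
The event flow above can be summarized as a small state model. The following is a simplified sketch in plain Java, not the Lex SDK: the class, enum, and trigger names are hypothetical, event types are represented as strings, and only the Waiting and InProgress states are modeled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the event flow described above; it does not call the Lex SDK.
final class WaitAndContinueModel {

    enum IntentState { IN_PROGRESS, WAITING }

    enum Trigger { CALLER_ASKS_TO_WAIT, CALLER_READY, STILL_WAITING_TIMER }

    private IntentState state = IntentState.IN_PROGRESS;

    IntentState state() {
        return state;
    }

    /** Applies a trigger and returns the event types the client should expect, in order. */
    List<String> apply(Trigger trigger) {
        if (trigger == Trigger.CALLER_ASKS_TO_WAIT) {
            state = IntentState.WAITING;
        } else if (trigger == Trigger.CALLER_READY) {
            state = IntentState.IN_PROGRESS;
        } else if (state != IntentState.WAITING) {
            throw new IllegalStateException("\"Still waiting\" prompts only play in the Waiting state");
        }

        List<String> events = new ArrayList<>();
        // Server-initiated "Still waiting" prompts have no caller speech to transcribe.
        if (trigger != Trigger.STILL_WAITING_TIMER) {
            events.add("TranscriptEvent");
        }
        events.add("IntentResultEvent");
        events.add("ResponseEvent"); // stands for TextResponseEvent or AudioResponseEvent
        return events;
    }

    public static void main(String[] args) {
        WaitAndContinueModel model = new WaitAndContinueModel();
        Trigger[] turns = {
            Trigger.CALLER_ASKS_TO_WAIT, Trigger.STILL_WAITING_TIMER, Trigger.CALLER_READY
        };
        for (Trigger turn : turns) {
            System.out.println(turn + " -> " + model.apply(turn) + ", state=" + model.state());
        }
    }
}
```

A client can use the same distinction when logging or testing: a turn without a TranscriptEvent was initiated by the service rather than by the caller.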

Let’s walk through the main elements of the client code:

Create the Amazon Lex client:

AwsCredentialsProvider awsCredentialsProvider = StaticCredentialsProvider
        .create(AwsBasicCredentials.create(accessKey, secretKey));

LexRuntimeV2AsyncClient lexRuntimeServiceClient = LexRuntimeV2AsyncClient.builder()
        .region(region)
        .credentialsProvider(awsCredentialsProvider)
        .build();

Create a handler to publish data to server:

EventsPublisher eventsPublisher = new EventsPublisher();

Create a handler to process bot responses:

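// Imports assume the AWS SDK for Java 2.x and Log4j 1.x (which provides Logger.getLogger(Class)).
import org.apache.log4j.Logger;
import software.amazon.awssdk.core.async.SdkPublisher;
import software.amazon.awssdk.services.lexruntimev2.model.*;

public class BotResponseHandler implements StartConversationResponseHandler {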

    private static final Logger LOG = Logger.getLogger(BotResponseHandler.class);


    @Override
    public void responseReceived(StartConversationResponse startConversationResponse) {
        // On success, the response carries a 2XX status and the request ID.
        LOG.info("Successfully established the connection with the server. Request ID: " + startConversationResponse.responseMetadata().requestId());
    }

    @Override
    public void onEventStream(SdkPublisher<StartConversationResponseEventStream> sdkPublisher) {

        sdkPublisher.subscribe(event -> {
            if (event instanceof PlaybackInterruptionEvent) {
                handle((PlaybackInterruptionEvent) event);
            } else if (event instanceof TranscriptEvent) {
                handle((TranscriptEvent) event);
            } else if (event instanceof IntentResultEvent) {
                handle((IntentResultEvent) event);
            } else if (event instanceof TextResponseEvent) {
                handle((TextResponseEvent) event);
            } else if (event instanceof AudioResponseEvent) {
                handle((AudioResponseEvent) event);
            }
        });
    }

    @Override
    public void exceptionOccurred(Throwable throwable) {
        LOG.error(throwable);
        System.err.println("got an exception:" + throwable);
    }

    @Override
    public void complete() {
        LOG.info("on complete");
    }

    private void handle(PlaybackInterruptionEvent event) {
        LOG.info("Got a PlaybackInterruptionEvent: " + event);

        LOG.info("Done with a PlaybackInterruptionEvent: " + event);
    }

    private void handle(TranscriptEvent event) {
        LOG.info("Got a TranscriptEvent: " + event);
    }


    private void handle(IntentResultEvent event) {
        LOG.info("Got an IntentResultEvent: " + event);

    }

    private void handle(TextResponseEvent event) {
        LOG.info("Got a TextResponseEvent: " + event);

    }

    private void handle(AudioResponseEvent event) { // carries the audio response to play back to the caller
        LOG.info("Got an AudioResponseEvent: " + event);
    }

}

Initiate the connection with the bot:

StartConversationRequest.Builder startConversationRequestBuilder = StartConversationRequest.builder()
        .botId(botId)
        .botAliasId(botAliasId)
        .localeId(localeId);

// configure the conversation mode with bot (defaults to audio)
startConversationRequestBuilder = startConversationRequestBuilder.conversationMode(ConversationMode.AUDIO);

// assign a unique identifier for the conversation
startConversationRequestBuilder = startConversationRequestBuilder.sessionId(sessionId);

// build the initial request
StartConversationRequest startConversationRequest = startConversationRequestBuilder.build();

CompletableFuture<Void> conversation = lexRuntimeServiceClient.startConversation(
        startConversationRequest,
        eventsPublisher,
        botResponseHandler);

Establish the configurable parameters via ConfigurationEvent:

public void configureConversation() {
    String eventId = "ConfigurationEvent-" + eventIdGenerator.incrementAndGet();

    ConfigurationEvent configurationEvent = StartConversationRequestEventStream
            .configurationEventBuilder()
            .eventId(eventId)
            .clientTimestampMillis(System.currentTimeMillis())
            .responseContentType(RESPONSE_TYPE)
            .build();

    eventWriter.writeConfigurationEvent(configurationEvent);
    LOG.info("sending a ConfigurationEvent to server:" + configurationEvent);
}

Send audio data to server:

public void writeAudioEvent(ByteBuffer byteBuffer) {
    String eventId = "AudioInputEvent-" + eventIdGenerator.incrementAndGet();

    AudioInputEvent audioInputEvent = StartConversationRequestEventStream
            .audioInputEventBuilder()
            .eventId(eventId)
            .clientTimestampMillis(System.currentTimeMillis())
            .audioChunk(SdkBytes.fromByteBuffer(byteBuffer))
            .contentType(AUDIO_CONTENT_TYPE)
            .build();

    eventWriter.writeAudioInputEvent(audioInputEvent);
}

Manage interruptions on the client side:

private void handle(PlaybackInterruptionEvent event) {
    LOG.info("Got a PlaybackInterruptionEvent: " + event);

    callOperator.pausePlayback();

    LOG.info("Done with a PlaybackInterruptionEvent: " + event);
}
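
What pausing playback involves depends on the telephony layer. As one possible approach, the sketch below buffers bot audio and discards whatever has not yet been played when a barge-in occurs. PlaybackBuffer and its methods are hypothetical and are not part of the sample project or the Lex SDK. Dropping the remaining audio, rather than resuming it later, is an assumption made here because the rest of an interrupted prompt is usually no longer relevant.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for the telephony playback layer; for illustration only.
final class PlaybackBuffer {

    private final Deque<byte[]> pending = new ArrayDeque<>();

    /** Queues a chunk of bot audio for playback to the caller. */
    synchronized void enqueue(byte[] audioChunk) {
        pending.addLast(audioChunk);
    }

    /** Returns the next chunk to play, or null if nothing is queued. */
    synchronized byte[] next() {
        return pending.pollFirst();
    }

    /** Drops all queued audio on barge-in and returns the number of chunks discarded. */
    synchronized int interrupt() {
        int dropped = pending.size();
        pending.clear();
        return dropped;
    }
}
```

In this design, a handler for PlaybackInterruptionEvent would call interrupt() so that the caller stops hearing the prompt as soon as the service reports the barge-in.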

Close the connection:

public void disconnect() {

    String eventId = "DisconnectionEvent-" + eventIdGenerator.incrementAndGet();

    DisconnectionEvent disconnectionEvent = StartConversationRequestEventStream
            .disconnectionEventBuilder()
            .eventId(eventId)
            .clientTimestampMillis(System.currentTimeMillis())
            .build();

    eventWriter.writeDisconnectEvent(disconnectionEvent);

    LOG.info("sending a DisconnectionEvent to server:" + disconnectionEvent);
}

You can now deploy the bot on your desktop to test it out.

Things to know

The following are a few important things to keep in mind when you’re using the Amazon Lex V2 console and APIs:

Regions and languages – The Streaming APIs are available in all existing Regions and support all current languages.
Interoperability with Lex V1 console – Streaming APIs are only available in the Lex V2 console and APIs.
Integration with Amazon Connect – As of this writing, Lex V2 APIs are not supported on Amazon Connect. We plan to provide this integration as part of our near-term roadmap.
Pricing – Please see the details on the Lex pricing page.

Try it out

Amazon Lex Streaming API is available now and you can start using it today. Give it a try, design a bot, launch it and let us know what you think! To learn more, please see the Lex streaming API documentation.

About the Authors

Esther Lee is a Product Manager for AWS Language AI Services. She is passionate about the intersection of technology and education. Out of the office, Esther enjoys long walks along the beach, dinners with friends and friendly rounds of Mahjong.

Swapandeep Singh is an engineer with the Amazon Lex team. He works on making interactions with bots smoother and more human-like. Outside of work, he likes to travel and learn about different cultures.