Introduction
Generative AI has gone mainstream. Over the past year, models and products like ChatGPT and DALL-E 2 have appeared, allowing you to generate text, images, and audio. OpenAI, the creator of ChatGPT and DALL-E 2, has opened up these powerful tools for developers to access and build imaginative new applications.
This article shows how to integrate the Vonage Voice API and the OpenAI API. We will create an application that receives a prompt from a user via a phone call and sends it to a generative AI service. We will then send the AI's response back to the user with the Vonage Messages API. Let's dive in!
Prerequisites
We've already developed a starter Vonage application that receives a call, captures the user's response, sends it to OpenAI, and then delivers the reply over WhatsApp. With the starter application, you just need to add your credentials to the .env file and deploy the app using GitHub Codespaces. You can also edit the starter application and experiment with it.
To get started:
Fork this repository. Open it in Codespaces by clicking "Create codespace on main"
Vonage CLI - Once Node.js is installed, you can install it with npm install -g @vonage/cli. This tool allows you to create and manage your Vonage applications.
Alternatively, you can deploy the app with Node.js and ngrok.
Create a new Vonage app
Sign in or sign up for free at developer.vonage.com; to use the Vonage Voice API, you'll have to create a Vonage Application from the developer portal.
All requests to the Vonage Voice API require authentication. You should therefore generate a private key with the Application API, which allows you to create JSON Web Tokens (JWTs) to make the requests. For demo purposes, we will use the API key and API secret.
In the left menu, click API Settings. Under the API keys tab, you will find your API key and account secret (API secret).
Inside the Codespaces environment, copy and paste your API key and API secret into the .env file. Be careful not to store this information in the GitHub repository itself.
API_KEY=b**********
API_SECRET=******************
A Vonage application contains the security and configuration information you need to interact with the Vonage Voice APIs.
Let's create an application using the Vonage Developer Dashboard.
Click Applications in the left-side menu, then click "Create a new application". You'll need to give your application a name, for example, "VoiceApp". You will also need to generate a public and private key. Lastly, enable the Voice capability.
Check it using the Vonage CLI. First, set the CLI configuration with your API key and API secret; the expected output is a confirmation that the configuration was saved.
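A sketch of what that step looks like; the exact command name and flags can differ between versions of @vonage/cli, so verify them with vonage --help:
# Store your account credentials in the CLI configuration (flag names assumed)
vonage config:set --apiKey=$API_KEY --apiSecret=$API_SECRET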
We need to buy a virtual number for our app to accept phone calls. You can do this with the Vonage CLI:
Search and buy virtual phone numbers. Choose a number that lists Voice in the Capabilities column. We can search for numbers by country code; the Vonage Numbers API uses ISO Alpha-2 codes. Find the country codes listed here.
The following CLI commands let us search for and buy a virtual number:
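This is a sketch; the command names and the example number (15551234567) are assumptions, so adjust them for your CLI version and chosen number:
# Search for available numbers in a country (ISO Alpha-2 code, e.g. US or GB)
vonage numbers:search US
# Buy one of the numbers returned by the search
vonage numbers:buy 15551234567 US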
Alternatively, search for and buy virtual phone numbers using the Vonage Dashboard. Select the Voice feature from the dropdown menu.
Find Our App
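Listing your applications with the CLI should show the new app together with its Application ID, which we need for the next step (a sketch; verify the command with vonage --help):
# Lists your applications and their Application IDs
vonage apps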
Link phone number and the App
For example, run the command sketched below; the CLI confirms once the number is linked.
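A sketch of the link command; the flag name is an assumption based on the @vonage/cli documentation, so check it before running:
# Link the purchased number to the application (replace the placeholders with your values)
vonage apps:link [APP_ID] --number=15551234567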
You can also link numbers using the Vonage Dashboard: go to Applications, open the related app (e.g., VoiceApp), and click the 'Link' button in the list of numbers.
Create Call Control Object
Speech Recognition (ASR)
Automatic Speech Recognition (ASR) enables apps to support voice input for cases such as IVR, identification, and various kinds of voice bots and assistants. Using this feature, our app receives the user's speech transcribed to text whenever it expects the user to answer a question by speaking rather than entering digits (DTMF), and can then continue the call flow according to its business logic based on what the user said. The following scheme shows how our application interacts with the Vonage API through the Nexmo Call Control Object (NCCO).
You can use the input action to collect a user's digit (DTMF) or speech input. This action is synchronous: Vonage processes the input and forwards it to the eventUrl webhook endpoint that you configure in your request. Your webhook endpoint should return another NCCO that replaces the existing NCCO and controls the call based on the user input.
We can see how this all works in our index.js file. First, the app waits to be triggered at the /webhooks/answer endpoint. Then the application runs through the following three actions:
{
  // Greet the caller and ask for an image prompt
  action: 'talk',
  text: 'Hi, describe an image that you want to generate'
},
{
  // Capture the caller's speech and POST the transcription to our ASR webhook
  eventMethod: 'POST',
  action: 'input',
  eventUrl: ['[Codespace-or-server-URL]/webhooks/asr'],
  type: ['speech'],
  speech: {
    language: 'en-gb',
    endOnSilence: 0.1
  }
},
{
  // Acknowledge the caller
  action: 'talk',
  text: 'Thank you'
}
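For context, here is a minimal sketch of how such an NCCO might be returned from the answer webhook in an Express app; the starter's index.js may differ in the details:
const express = require('express')
const app = express()
app.use(express.json())

// Vonage calls this endpoint when someone dials the linked number;
// the NCCO we return drives the call flow shown above.
app.get('/webhooks/answer', (request, response) => {
  const ncco = [
    { action: 'talk', text: 'Hi, describe an image that you want to generate' },
    {
      action: 'input',
      eventMethod: 'POST',
      eventUrl: [process.env.EVENT_URL], // the Codespace or server URL ending in /webhooks/asr
      type: ['speech'],
      speech: { language: 'en-gb', endOnSilence: 0.1 }
    },
    { action: 'talk', text: 'Thank you' }
  ]
  response.json(ncco)
})

app.listen(3000)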
Useful links:
Configure OpenAI
OpenAI released new image generation capabilities with their DALL·E models. There are a couple of things that are very cool about the OpenAI API. First, as of January 2023, users are credited $18 in free credit that can be used during their first three months. That's a lot of credit to play with! For example, it is enough to create or edit 900 images at 1024x1024. Secondly, according to the 'Your Content' section of OpenAI's Terms of Use: "... OpenAI hereby assigns to you all its right, title and interest in and to Output." This means that you fully own the generated images for any use, even for commercial purposes.
First, after registering and confirming your phone number, you need to generate your API key.
With this API key, we can move forward.
Paste it into your .env file:
API_KEY=b**********
API_SECRET=******************
OPENAI_API_KEY=sk-**************************************
In this tutorial, we use the Images API to generate an image.
To generate an image, we use the following POST request:
// assumes unirest is required earlier in index.js and openaiApiKey is loaded from process.env.OPENAI_API_KEY
var req = unirest('POST', 'https://api.openai.com/v1/images/generations')
  .headers({
    'Content-Type': 'application/json',
    'Authorization': 'Bearer ' + openaiApiKey
  })
We send it with the following JSON payload, where you can manage these parameters:
n - the number of images generated; you can request 1-10 images at a time.
size - the image size; available sizes are 256x256, 512x512, or 1024x1024 pixels. Smaller sizes are faster to generate.
.send(JSON.stringify({
  "prompt": promptText,
  "n": 1,
  "size": "1024x1024"
}))
We parse the prompt text from the user's spoken response, which we receive as a webhook:
let promptText = request.body.speech.results[0].text
After we receive a response from OpenAI API, we will parse the image URL from the body.
let imgUrl = res.body.data[0].url
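Putting these pieces together (and continuing the Express sketch above), the ASR webhook handler might look roughly like this; names such as sentMsg mirror the snippets in this article, and the caller-number field is an assumption, so the starter app's actual code may differ slightly:
// Vonage POSTs the transcribed speech to this endpoint (the eventUrl from the NCCO)
app.post('/webhooks/asr', (request, response) => {
  let promptText = request.body.speech.results[0].text
  let phoneNumber = request.body.from // assumed field carrying the caller's number

  unirest('POST', 'https://api.openai.com/v1/images/generations')
    .headers({
      'Content-Type': 'application/json',
      'Authorization': 'Bearer ' + openaiApiKey
    })
    .send(JSON.stringify({ prompt: promptText, n: 1, size: '1024x1024' }))
    .end((res) => {
      let imgUrl = res.body.data[0].url
      sentMsg(phoneNumber, imgUrl) // deliver the generated image over WhatsApp (see below)
    })

  // Acknowledge the webhook (the starter's handler may also return a follow-up NCCO here)
  response.status(200).end()
})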
Configure the Vonage Messages API
The Vonage Messages API allows you to send and receive messages over SMS, MMS, Facebook Messenger, Viber, and WhatsApp! In our use case, we use WhatsApp, but you can change the code to use Facebook Messenger or Viber instead. Remember to check out the Messages API documentation for more information.
We will use Vonage Messages API WhatsApp sandbox to receive a message with content or a link.
We created the sentMsg function, which receives two parameters: phoneNumber, the caller's phone number, and imgUrl, the image URL we parse from the OpenAI response.
function sentMsg(phoneNumber, imgUrl)
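A minimal sketch of what sentMsg might do with the Messages API WhatsApp sandbox; the sandbox URL and the shared sandbox number are taken from the Messages API Sandbox docs, but verify them in your dashboard, and the starter's implementation may differ:
// Sends the generated image to the caller over the WhatsApp sandbox
function sentMsg(phoneNumber, imgUrl) {
  unirest('POST', 'https://messages-sandbox.nexmo.com/v1/messages')
    .headers({ 'Content-Type': 'application/json' })
    .auth(process.env.API_KEY, process.env.API_SECRET, true) // the sandbox accepts basic auth with your API key and secret
    .send(JSON.stringify({
      message_type: 'image',
      image: { url: imgUrl },
      to: phoneNumber,
      from: '14157386102', // shared WhatsApp sandbox number
      channel: 'whatsapp'
    }))
    .end((res) => console.log(res.body))
}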
To allowlist your phone with the sandbox, open WhatsApp on your smartphone and tap the photo icon. Next, scan the QR code and hit send on the pre-filled message.
Open the Messages API Sandbox documentation if you need additional information or want to use another messenger.
Deploy Our App in Codespace
Open GitHub Codespace in your fork.
In the Codespace terminal, run the following command to install our Node packages:
npm install
Run the following command in the terminal to get the GitHub Codespace URL for the webhooks:
echo "https://${CODESPACE_NAME}-3000.preview.app.github.dev/webhooks/asr"
Copy and paste the output into EVENT_URL= in the .env file:
API_KEY=b**********
API_SECRET=******************
OPENAI_API_KEY=sk-**************************************
EVENT_URL=https://******************************************-3000.preview.app.github.dev/webhooks/asr
Now we need to update our Vonage app settings to listen for our Codespace webhooks. We can do this using the Dashboard: go to Applications in the left menu, choose the current app, and click the 'Edit' button.
Alternatively, update the app settings using the Vonage CLI. Paste your Codespace URL or server URL in place of [Codespace-or-server-URL] in the following CLI command:
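This is a sketch; the flag names are assumptions based on the @vonage/cli help, so verify them before running:
# Point the voice webhooks at your Codespace or server
vonage apps:update [APP_ID] --voice_answer_url=[Codespace-or-server-URL]/webhooks/answer --voice_event_url=[Codespace-or-server-URL]/webhooks/events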
Run the App
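Assuming index.js is the entry point (as referenced earlier), start the app from the Codespace terminal:
node index.js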
In the terminal panel, open the Ports tab. Click Private in the Visibility column and change it to Public.
Everything is ready
Try this out by calling the number that is linked with the app: **732**56**
Tell the bot your prompt
Wait for the content in the corresponding messenger
Monitor the console
Below is a sample image that you might receive on your phone.
Prompt text: Ukrainian Carpathians montane meadow, photograph, photorealistic 8K, HD
Wrap-up
Congratulations! You've now built a bot answering service for an inbound call with Vonage Voice API that sends messages with Vonage Messages API. And it's all hosted on GitHub Codespaces. You could extend this project with Vonage AI Studio, adding a dynamic workflow to respond differently according to caller input. Or, since we've already integrated with OpenAI, you could integrate ChatGPT.
Show off your AI creations or let us know how we can help! Join the conversation on our Vonage Community Slack or send us a message on Twitter.