Introduction
Generative AI has gone mainstream. Over the past year, models and products like ChatGPT and DALL-E 2 have appeared, allowing you to generate text, images, and audio. OpenAI, the creator of ChatGPT and DALL-E 2, has opened up these powerful tools for developers to access and build imaginative new applications.
This article shows how to integrate the Vonage Voice API and the OpenAI API. We will create an application that receives a prompt from a user via a phone call and sends it to a generative AI service. We will then send the AI's response back to the user with the Vonage Messages API. Let's dive in!
Prerequisites
We've already developed a starter Vonage application that receives a call, captures the user's response, sends it to OpenAI, and then delivers the reply over WhatsApp. With the starter application, you just need to add your credentials to the .env file and deploy the app using GitHub Codespaces. You can also edit the starter application and experiment with it.
To get started:
Fork this repository. Open it in Codespaces by clicking "Create codespace on main"
Vonage CLI - Once Node.js is installed, you can install it with npm install -g @vonage/cli. This tool allows you to create and manage your Vonage applications.
Alternatively, you can deploy the app with Node.js and ngrok.
Create a new Vonage app
Sign in or sign up for free at developer.vonage.com; to use the Vonage Voice API, you'll have to create a Vonage Application from the developer portal.
All requests to the Vonage Voice API require authentication. You should therefore generate a private key with the Application API, which allows you to create JSON Web Tokens (JWTs) to make the requests. For demo purposes, we will use the API key and API secret.
In the left menu, click API Settings. Under the API keys tab, you will find your API key and account secret (API secret).
Inside the Codespaces environment, copy and paste your API key and API secret into the .env file. Be careful not to store this information in the GitHub repository itself.
API_KEY=b**********
API_SECRET=******************
A Vonage application contains the security and configuration information you need to interact with the Vonage Voice APIs.
Let's create an application using the Vonage Developer Dashboard.
Click Applications in the left-side menu, then click "Create a new application". You'll need to give your application a name, for example, "VoiceApp". You will also need to generate a public and private key. Lastly, enable the Voice capability.
Check it using the Vonage CLI. First, set the CLI configuration with your API key and API secret; the expected output is a confirmation that the configuration was saved.
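A sketch of what that step looks like; the exact command name and flags can differ between versions of @vonage/cli, so verify them with vonage --help:
# Store your account credentials in the CLI configuration (flag names assumed)
vonage config:set --apiKey=$API_KEY --apiSecret=$API_SECRET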
We need to buy a virtual number for our app to accept phone calls. You can do this with the Vonage CLI:
Search and buy virtual phone numbers. Choose a number that lists Voice in the Capabilities column. We can search for numbers by country code; the Vonage Numbers API uses ISO Alpha-2 codes. Find the country codes listed here.
The following CLI commands let us search for and buy a virtual number:
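This is a sketch; the command names and the example number (15551234567) are assumptions, so adjust them for your CLI version and chosen number:
# Search for available numbers in a country (ISO Alpha-2 code, e.g. US or GB)
vonage numbers:search US
# Buy one of the numbers returned by the search
vonage numbers:buy 15551234567 US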
Alternatively, search for and buy virtual phone numbers using the Vonage Dashboard. Select the Voice feature from the dropdown menu.
Find Our App
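Listing your applications with the CLI should show the new app together with its Application ID, which we need for the next step (a sketch; verify the command with vonage --help):
# Lists your applications and their Application IDs
vonage apps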
Link phone number and the App
For example, run the command sketched below; the CLI confirms once the number is linked.
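A sketch of the link command; the flag name is an assumption based on the @vonage/cli documentation, so check it before running:
# Link the purchased number to the application (replace the placeholders with your values)
vonage apps:link [APP_ID] --number=15551234567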
You can also link numbers using the Vonage Dashboard: go to Applications, open the related app (e.g., VoiceApp), and click the 'Link' button in the list of numbers.
Create Call Control Object
Speech Recognition (ASR)
Automatic Speech Recognition (ASR) enables apps to support voice input for cases such as IVR, identification, and various kinds of voice bots and assistants. Using this feature, our app receives the user's speech transcribed to text whenever it expects the user to answer a question by speaking rather than entering digits (DTMF), and can then continue the call flow according to its business logic based on what the user said. The following scheme shows how our application interacts with the Vonage API through the Nexmo Call Control Object (NCCO).
You can use the input action to collect a user's digit (DTMF) or speech input. This action is synchronous: Vonage processes the input and forwards it to the eventUrl webhook endpoint that you configure in your request. Your webhook endpoint should return another NCCO that replaces the existing NCCO and controls the call based on the user input.
We can see how this all works in our index.js file. First, the app waits to be triggered at the /webhooks/answer endpoint. Then the application runs through the following three actions:
{
  // Greet the caller and ask for an image prompt
  action: 'talk',
  text: 'Hi, describe an image that you want to generate'
},
{
  // Capture the caller's speech and POST the transcription to our ASR webhook
  eventMethod: 'POST',
  action: 'input',
  eventUrl: ['[Codespace-or-server-URL]/webhooks/asr'],
  type: ['speech'],
  speech: {
    language: 'en-gb',
    endOnSilence: 0.1
  }
},
{
  // Acknowledge the caller
  action: 'talk',
  text: 'Thank you'
}
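For context, here is a minimal sketch of how such an NCCO might be returned from the answer webhook in an Express app; the starter's index.js may differ in the details:
const express = require('express')
const app = express()
app.use(express.json())

// Vonage calls this endpoint when someone dials the linked number;
// the NCCO we return drives the call flow shown above.
app.get('/webhooks/answer', (request, response) => {
  const ncco = [
    { action: 'talk', text: 'Hi, describe an image that you want to generate' },
    {
      action: 'input',
      eventMethod: 'POST',
      eventUrl: [process.env.EVENT_URL], // the Codespace or server URL ending in /webhooks/asr
      type: ['speech'],
      speech: { language: 'en-gb', endOnSilence: 0.1 }
    },
    { action: 'talk', text: 'Thank you' }
  ]
  response.json(ncco)
})

app.listen(3000)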
Useful links:
Configure OpenAI
OpenAI released new image generation capabilities with their DALL·E models. There are a couple of things that are very cool about the OpenAI API. First, as of January 2023, users are credited $18 in free credit that can be used during their first three months. That's a lot of credit to play with! For example, it is enough to create or edit 900 images at 1024x1024. Secondly, according to the 'Your Content' section of OpenAI's Terms of Use: "... OpenAI hereby assigns to you all its right, title and interest in and to Output." This means that you fully own the generated images for any use, even for commercial purposes.
First, after registering and confirming your phone number, you need to generate your API key.
With this API key, we can move forward.
Paste it into your .env file:
API_KEY=b**********
API_SECRET=******************
OPENAI_API_KEY=sk-**************************************
In this tutorial, we use the Images API to generate an image.
To generate an image, we use the following POST request:
// assumes unirest is required earlier in index.js and openaiApiKey is loaded from process.env.OPENAI_API_KEY
var req = unirest('POST', 'https://api.openai.com/v1/images/generations')
  .headers({
    'Content-Type': 'application/json',
    'Authorization': 'Bearer ' + openaiApiKey
  })
We send it with the following JSON payload, where you can manage these parameters:
n - the number of images generated; you can request 1-10 images at a time.
size - the image size; available sizes are 256x256, 512x512, or 1024x1024 pixels. Smaller sizes are faster to generate.
.send(JSON.stringify({
  "prompt": promptText,
  "n": 1,
  "size": "1024x1024"
}))
We parse the prompt text from the user's spoken response, which we receive as a webhook:
let promptText = request.body.speech.results[0].text
After we receive a response from OpenAI API, we will parse the image URL from the body.
let imgUrl = res.body.data[0].url
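Putting these pieces together (and continuing the Express sketch above), the ASR webhook handler might look roughly like this; names such as sentMsg mirror the snippets in this article, and the caller-number field is an assumption, so the starter app's actual code may differ slightly:
// Vonage POSTs the transcribed speech to this endpoint (the eventUrl from the NCCO)
app.post('/webhooks/asr', (request, response) => {
  let promptText = request.body.speech.results[0].text
  let phoneNumber = request.body.from // assumed field carrying the caller's number

  unirest('POST', 'https://api.openai.com/v1/images/generations')
    .headers({
      'Content-Type': 'application/json',
      'Authorization': 'Bearer ' + openaiApiKey
    })
    .send(JSON.stringify({ prompt: promptText, n: 1, size: '1024x1024' }))
    .end((res) => {
      let imgUrl = res.body.data[0].url
      sentMsg(phoneNumber, imgUrl) // deliver the generated image over WhatsApp (see below)
    })

  // Acknowledge the webhook (the starter's handler may also return a follow-up NCCO here)
  response.status(200).end()
})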
Configure the Vonage Messages API
The Vonage Messages API allows you to send and receive messages over SMS, MMS, Facebook Messenger, Viber, and WhatsApp! In our use case, we use WhatsApp, but you can change the code to use Facebook Messenger or Viber instead. Remember to check out the Messages API documentation for more information.
We will use Vonage Messages API WhatsApp sandbox to receive a message with content or a link.
We created the sentMsg function, which receives two parameters: phoneNumber, the caller's phone number, and imgUrl, the image URL we parse from the OpenAI response.
function sentMsg(phoneNumber, imgUrl)
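A minimal sketch of what sentMsg might do with the Messages API WhatsApp sandbox; the sandbox URL and the shared sandbox number are taken from the Messages API Sandbox docs, but verify them in your dashboard, and the starter's implementation may differ:
// Sends the generated image to the caller over the WhatsApp sandbox
function sentMsg(phoneNumber, imgUrl) {
  unirest('POST', 'https://messages-sandbox.nexmo.com/v1/messages')
    .headers({ 'Content-Type': 'application/json' })
    .auth(process.env.API_KEY, process.env.API_SECRET, true) // the sandbox accepts basic auth with your API key and secret
    .send(JSON.stringify({
      message_type: 'image',
      image: { url: imgUrl },
      to: phoneNumber,
      from: '14157386102', // shared WhatsApp sandbox number
      channel: 'whatsapp'
    }))
    .end((res) => console.log(res.body))
}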
To allowlist your phone with the sandbox, open WhatsApp on your smartphone and tap the photo icon. Next, scan the QR code and hit send on the pre-filled message.
Open the Messages API Sandbox documentation if you need additional information or want to use another messenger.
Deploy Our App in Codespace
Open GitHub Codespace in your fork.
In the Codespace terminal, run the following command to install our Node packages:
npm install
Run the following command in the terminal to get the GitHub Codespace URL for the webhooks:
echo "https://${CODESPACE_NAME}-3000.preview.app.github.dev/webhooks/asr"
Copy and paste the output into EVENT_URL= in the .env file:
API_KEY=b**********
API_SECRET=******************
OPENAI_API_KEY=sk-**************************************
EVENT_URL=https://******************************************-3000.preview.app.github.dev/webhooks/asr
Now we need to update our Vonage app settings to listen for our Codespace webhooks. We can do this using the Dashboard: go to Applications in the left menu, choose the current app, and click the 'Edit' button.
Alternatively, update the app settings using the Vonage CLI. Paste your Codespace URL or server URL in place of [Codespace-or-server-URL] in the following CLI command:
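This is a sketch; the flag names are assumptions based on the @vonage/cli help, so verify them before running:
# Point the voice webhooks at your Codespace or server
vonage apps:update [APP_ID] --voice_answer_url=[Codespace-or-server-URL]/webhooks/answer --voice_event_url=[Codespace-or-server-URL]/webhooks/events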
Run the App
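Assuming index.js is the entry point (as referenced earlier), start the app from the Codespace terminal:
node index.js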
In the terminal panel, open the Ports tab. Click Private in the Visibility column and change it to Public.
Everything is ready
Try this out by calling the number that is linked with the app: **732**56**
Tell the bot your prompt
Wait for the content in the corresponding messenger
Monitor the console
Below is a sample image that you might receive on your phone.
Prompt text: Ukrainian Carpathians montane meadow, photograph, photorealistic 8K, HD
Wrap-up
Congratulations! You've now built a bot answering service for an inbound call with Vonage Voice API that sends messages with Vonage Messages API. And it's all hosted on GitHub Codespaces. You could extend this project with Vonage AI Studio, adding a dynamic workflow to respond differently according to caller input. Or, since we've already integrated with OpenAI, you could integrate ChatGPT.
Show off your AI creations or let us know how we can help! Join the conversation on our Vonage Community Slack or send us a message on Twitter.