How to Build an Advanced IVR / Voice Bot

This guide demonstrates how to build a voice-based AI agent using the Vonage Voice API and OpenAI. You will create a Voice Bot that answers inbound calls, listens to a user's question using Automatic Speech Recognition (ASR), and responds with an intelligent answer generated by an LLM.

Prerequisites

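Before you begin, ensure you have:

  • A Vonage account with access to the Vonage Dashboard
  • An OpenAI API key
  • Node.js and npm installed
  • ngrok installed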

Set Up Your Local Environment

Create a new directory for your project and install the required dependencies:

mkdir vonage-voice-bot
cd vonage-voice-bot
npm init -y
npm install express openai

Expose Your Local Server

Vonage needs to send webhooks to your local machine. Use ngrok to expose your server:

ngrok http 3000

ngrok forwards requests from a public URL, such as https://{random id}.ngrok.app, to port 3000 on your machine (the port your server listens on). This public URL is the Base URL used in the steps below.

Keep this terminal open.

Provision Your Vonage Resources

Log in to the Vonage Dashboard to start.

Create a Voice Application

  1. Navigate to Applications > Create a new application.
  2. Give it a name (e.g., Voice AI Bot).
  3. Under Capabilities, enable Voice.
  4. In the Answer URL field, enter your Base URL followed by /webhooks/answer (e.g., https://{random id}.ngrok.app/webhooks/answer). Set the method to GET.
  5. In the Event URL field, enter your Base URL followed by /webhooks/events. Set the method to POST.
  6. Click Generate public and private key. Save the private.key file in your project folder (though we won't use it for this basic ASR flow, it's required for app creation).
  7. Click Save changes.

Link a Phone Number

  1. Go to Numbers > Buy Numbers and purchase a voice-enabled number.
  2. Go to Your applications, select your bot application, and click Edit.
  3. Under the Numbers tab, click Link next to your newly purchased number.

Build the Voice Bot

Create a file named index.js and add the following code. Replace YOUR_OPENAI_API_KEY with your actual key.
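
A minimal sketch of what index.js could contain, not a definitive implementation. It assumes Express-style request and response objects and an ASR webhook at /webhooks/asr. The helper names (buildAnswerNcco, extractTranscript, registerRoutes, askLlm), the prompt texts, and the model name are assumptions for illustration. The Express and OpenAI wiring is shown as comments at the bottom so the helpers themselves have no dependencies.

```javascript
// NCCO returned from GET /webhooks/answer: greet the caller, then listen.
function buildAnswerNcco(baseUrl) {
  return [
    { action: 'talk', text: 'Hello! What would you like to ask?' },
    {
      action: 'input',
      type: ['speech'],
      eventUrl: [`${baseUrl}/webhooks/asr`], // ASR results are posted here
      speech: { language: 'en-US', endOnSilence: 1 },
    },
  ];
}

// Return the first transcript from the ASR webhook body, or null if none.
function extractTranscript(body) {
  const results = body && body.speech && body.speech.results;
  return Array.isArray(results) && results.length > 0 ? results[0].text : null;
}

// Register the webhook routes on an Express-style app.
// askLlm is an async function: (question) => answer string.
function registerRoutes(app, askLlm) {
  app.get('/webhooks/answer', (req, res) => {
    const baseUrl = `${req.protocol}://${req.get('host')}`;
    res.json(buildAnswerNcco(baseUrl));
  });

  app.post('/webhooks/asr', async (req, res) => {
    const question = extractTranscript(req.body);
    if (!question) {
      return res.json([{ action: 'talk', text: "Sorry, I didn't catch that." }]);
    }
    try {
      const answer = await askLlm(question);
      res.json([{ action: 'talk', text: answer }]);
    } catch (err) {
      console.error('LLM request failed:', err);
      res.json([{ action: 'talk', text: 'Sorry, something went wrong.' }]);
    }
  });

  // Call status events only need an acknowledgement.
  app.post('/webhooks/events', (req, res) => res.sendStatus(200));
}

// Assumed wiring at the bottom of index.js (not executed in this sketch):
//   const express = require('express');
//   const { OpenAI } = require('openai');
//   const openai = new OpenAI({ apiKey: 'YOUR_OPENAI_API_KEY' });
//   const askLlm = async (question) => {
//     const completion = await openai.chat.completions.create({
//       model: 'gpt-4o-mini', // assumed model name
//       messages: [{ role: 'user', content: question }],
//     });
//     return completion.choices[0].message.content;
//   };
//   const app = express();
//   app.use(express.json());
//   registerRoutes(app, askLlm);
//   app.listen(3000);
```

Keeping the NCCO builders separate from the route handlers makes them easy to check without placing a call.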

Note: When running locally with ngrok, req.protocol/req.get('host') may not match your public tunnel URL. If webhooks fail, set your tunnel base URL in config (for example an env var) and build eventUrl from that instead.
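
One way to apply this note is sketched below: prefer a configured base URL and fall back to the request only when none is set. PUBLIC_BASE_URL is a hypothetical environment variable name, and getBaseUrl and buildEventUrl are illustrative helpers, not part of Express or the Vonage API.

```javascript
// Prefer the configured tunnel URL; fall back to what the request reports.
// PUBLIC_BASE_URL is a hypothetical variable name chosen for this sketch.
function getBaseUrl(req, env = process.env) {
  const configured = (env.PUBLIC_BASE_URL || '').trim();
  if (configured) {
    return configured.replace(/\/+$/, ''); // drop trailing slashes
  }
  return `${req.protocol}://${req.get('host')}`;
}

// NCCO eventUrl values are arrays containing a single URL.
function buildEventUrl(req, path, env = process.env) {
  return [`${getBaseUrl(req, env)}${path}`];
}
```

With this in place you could start the server as PUBLIC_BASE_URL=https://{random id}.ngrok.app node index.js and use buildEventUrl(req, '/webhooks/asr') wherever an eventUrl is needed.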

Test the Application

  1. Run your server:

    node index.js
  2. Dial your Vonage number from your phone.

  3. When prompted, ask a question (e.g., "Why is the sky blue?" or "Tell me a joke").

  4. The bot will capture your speech, send it to OpenAI, and read the response back to you using Text-to-Speech.

Enable Contextual Conversation

To make the conversation feel natural, we must modify the app to remember previous exchanges and re-prompt the user for more input.

Note: When running locally with ngrok, req.get('host') may not match your public tunnel host. If webhooks fail, build eventUrl using your public tunnel base URL (for example from config/env) instead of the request host.

Update your index.js with this stateful logic:
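
A minimal sketch of the stateful logic, under stated assumptions: the ASR and event webhook bodies both carry the call uuid, and the event webhook reports status "completed" when the call ends. The names sessions, getHistory, buildTurnNcco, handleAsr, handleEvent, and SYSTEM_PROMPT are illustrative, and chat stands in for a function that sends the message array to OpenAI and returns the reply text.

```javascript
const SYSTEM_PROMPT = 'You are a helpful voice assistant. Keep answers short.';

// One message history per call, keyed by the call uuid.
const sessions = new Map();

function getHistory(uuid) {
  if (!sessions.has(uuid)) {
    sessions.set(uuid, [{ role: 'system', content: SYSTEM_PROMPT }]);
  }
  return sessions.get(uuid);
}

// Speak the reply, then listen again so the conversation continues.
function buildTurnNcco(text, baseUrl) {
  return [
    { action: 'talk', text },
    {
      action: 'input',
      type: ['speech'],
      eventUrl: [`${baseUrl}/webhooks/asr`],
      speech: { language: 'en-US', endOnSilence: 1 },
    },
  ];
}

// chat is an async function: (messages) => reply string.
async function handleAsr(body, baseUrl, chat) {
  const results = (body.speech && body.speech.results) || [];
  if (results.length === 0) {
    return buildTurnNcco("Sorry, I didn't catch that. Could you repeat it?", baseUrl);
  }
  const history = getHistory(body.uuid);
  history.push({ role: 'user', content: results[0].text });
  const reply = await chat(history); // the whole history gives the model context
  history.push({ role: 'assistant', content: reply });
  return buildTurnNcco(reply, baseUrl);
}

// Free the history once the call has ended.
function handleEvent(body) {
  if (body.status === 'completed') {
    sessions.delete(body.uuid);
  }
}
```

In an Express route this could be used as res.json(await handleAsr(req.body, baseUrl, chat)), with handleEvent(req.body) called from the /webhooks/events handler. An in-memory Map is lost on restart, so a shared store would be needed if you run more than one server process.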

What Changed

  • The Session Map: We use the uuid to keep different callers' histories separate.
  • Recursive NCCO: Instead of a simple talk action, we now return a talk followed by an input action. This keeps the line open.
  • Memory: By passing the entire history array to OpenAI, the bot now understands follow-up questions like "Tell me more about that."

Try the updated application by restarting your server and dialing the Vonage number linked to your application, as in the Test the Application step.

Add "Connect to Human" Tool

This step adds a tool definition for the LLM and a branch in your ASR logic that returns the Vonage connect action.

Update index.js

Add the new tool definition and modify the asr webhook to handle the transfer:
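
A minimal sketch of the tool definition and the transfer branch. It assumes the assistant message returned by OpenAI's chat completions API is passed in as message, and that tools is supplied as the tools parameter of that request. The phone numbers are placeholders, and wantsHuman, buildTransferNcco, and buildNextNcco are illustrative names.

```javascript
// Placeholder numbers in E.164 format without a leading "+"; replace with your own.
const VONAGE_NUMBER = '447700900000';
const HUMAN_AGENT_NUMBER = '447700900001';

// Tool definition to pass as `tools` in the chat completions request.
const tools = [
  {
    type: 'function',
    function: {
      name: 'connect_to_human',
      description:
        'Transfer the caller to a human agent when they ask for a person or the request is too complex.',
      parameters: { type: 'object', properties: {}, required: [] },
    },
  },
];

// True when the assistant message asks to call connect_to_human.
function wantsHuman(message) {
  return (message.tool_calls || []).some(
    (call) => call.function && call.function.name === 'connect_to_human'
  );
}

// Announce the transfer, then bridge the caller to the agent's phone.
function buildTransferNcco() {
  return [
    { action: 'talk', text: 'Please hold while I connect you to a human representative.' },
    {
      action: 'connect',
      from: VONAGE_NUMBER,
      endpoint: [{ type: 'phone', number: HUMAN_AGENT_NUMBER }],
    },
  ];
}

// Either hand off to a human or continue the talk + input loop.
function buildNextNcco(message, baseUrl) {
  if (wantsHuman(message)) {
    return buildTransferNcco(); // no input action, so the ASR loop stops here
  }
  return [
    { action: 'talk', text: message.content || 'Sorry, could you say that again?' },
    {
      action: 'input',
      type: ['speech'],
      eventUrl: [`${baseUrl}/webhooks/asr`],
      speech: { language: 'en-US', endOnSilence: 1 },
    },
  ];
}
```

In the ASR webhook you would pass completion.choices[0].message to buildNextNcco and return the result with res.json.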

How It Works

  • The Intent: When the user says "I want to speak to a manager" or "Help me, this is too hard", the LLM recognizes the intent and triggers the connect_to_human function.
  • The Hand-off: Your server stops the ASR loop and sends the connect action to Vonage.
  • The Connection: Vonage creates a new outbound leg to the HUMAN_AGENT_NUMBER and bridges the two calls together. The AI is no longer "listening" once the connection is made.

Restart your server and call your application's Vonage number. When you ask the bot to speak to a human, it should say "Please hold while I connect you to a human representative" and then connect you to the phone number you set as HUMAN_AGENT_NUMBER.

Next steps

  • Custom Voices: Change the voice name in the talk action for a more branded experience.
  • WebSocket Streaming: For lower latency, use WebSockets to stream audio in real-time.
  • Endpoints: Connect to your PBX or Contact Center via SIP, or build your own web interface for a human agent with the Client SDK.
  • .NET version: See the same IVR/voice-bot scenario implemented in .NET in this blog post.
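
As a starting point for the Custom Voices item above, here is a small sketch of a talk action with voice options. It assumes the talk action's language, style, and premium options; the values shown are examples only, so check the NCCO reference for the combinations available to your account. buildBrandedTalk is an illustrative helper name.

```javascript
// Build a talk action with explicit voice options.
// The default values are examples, not recommendations.
function buildBrandedTalk(text, voice = { language: 'en-GB', style: 0, premium: false }) {
  return {
    action: 'talk',
    text,
    language: voice.language, // locale of the synthesized voice
    style: voice.style,       // selects one of the voices for that language
    premium: voice.premium,   // whether to request a premium voice
  };
}
```

You could then replace the plain talk objects in your NCCOs with buildBrandedTalk(reply) so every prompt uses the same voice.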