Vonage Video API AI Features and Solutions

Vonage Video provides APIs and SDKs that let you access the video and audio streams of a live video session and modify them using Vonage tools, third-party AI tools, and other services.

These modifications can be made in real time during an ongoing Vonage video session, so participants experience them as they happen.

Enhance Video Applications with AI

The Vonage Media Processor provides access to the raw audio and video streams through the Video client SDKs so that you can perform transform operations on individual video and audio tracks. You can modify the streams with Vonage tools or route them through third-party services to make the modifications you need.
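
To make the idea of a transform operation concrete, here is a minimal sketch in Python. It does not use the Vonage SDKs: the `Frame` type, the `invert` and `stamp_corner` transforms, and the `process` helper are all hypothetical stand-ins. They only illustrate the general pattern of passing each raw frame through a chain of transforms before it is published.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List


@dataclass
class Frame:
    """Hypothetical stand-in for one raw video frame (8-bit grayscale)."""
    width: int
    height: int
    pixels: bytearray  # row-major, one byte per pixel


Transform = Callable[[Frame], Frame]


def invert(frame: Frame) -> Frame:
    """Example transform: invert every pixel value."""
    return Frame(frame.width, frame.height,
                 bytearray(255 - p for p in frame.pixels))


def stamp_corner(size: int, value: int = 255) -> Transform:
    """Build a transform that paints a size x size block in the top-left
    corner, a crude placeholder for a watermark."""
    def apply(frame: Frame) -> Frame:
        out = bytearray(frame.pixels)  # copy so the input frame is untouched
        for y in range(min(size, frame.height)):
            for x in range(min(size, frame.width)):
                out[y * frame.width + x] = value
        return Frame(frame.width, frame.height, out)
    return apply


def process(frames: Iterable[Frame],
            transforms: List[Transform]) -> Iterator[Frame]:
    """Run every incoming frame through each transform, in order."""
    for frame in frames:
        for transform in transforms:
            frame = transform(frame)
        yield frame
```

Each transform returns a new frame rather than mutating its input, which keeps the chain easy to reorder. A real transformer would work on the frame objects that the client SDK supplies.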

The Live Captions API transcribes audio streams to provide real-time captions within the video session. It delivers instant transcriptions for Automatic Speech Recognition (ASR) and Speech-to-Text (STT) use cases, such as on-screen captioning, transcription, and translation.
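
On-screen captioning usually has to cope with interim results that are revised before a final version arrives. The sketch below shows one way to manage that in Python. The event shape (`speaker`, `text`, `is_final`) and the `CaptionBoard` class are assumptions made for illustration, not the actual payload or API of Live Captions.

```python
from typing import Dict, List


class CaptionBoard:
    """Tracks the latest interim caption per speaker and recent final lines."""

    def __init__(self, max_lines: int = 3) -> None:
        self.max_lines = max_lines
        self.final_lines: List[str] = []
        self.interim: Dict[str, str] = {}

    def on_caption(self, speaker: str, text: str, is_final: bool) -> None:
        """Handle one caption event (hypothetical event shape)."""
        if is_final:
            # The final text replaces whatever interim text was showing.
            self.interim.pop(speaker, None)
            self.final_lines.append(f"{speaker}: {text}")
            # Keep only the most recent lines for on-screen display.
            self.final_lines = self.final_lines[-self.max_lines:]
        else:
            # Interim results overwrite each other until a final one arrives.
            self.interim[speaker] = text

    def render(self) -> List[str]:
        """Return the lines to draw: final lines first, then interim ones."""
        lines = list(self.final_lines)
        lines += [f"{s}: {t}..." for s, t in self.interim.items()]
        return lines
```

Because only the last `max_lines` final lines are kept, this is suited to an overlay rather than a full transcript. Store final lines separately if you also need meeting notes.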

Figure: Video and AI Topology

Similarly, the Audio Connector provides access to the raw audio from a Vonage video session through a WebSocket interface. You can then send that audio to AI services that you manage yourself, such as Microsoft's Azure AI Speech service.

Post-Call Transcription provides transcriptions and summaries of a video session based on its recorded file. This is useful for meeting notes, meeting summarization, and compliance use cases.

We have collected a variety of use cases, along with the features and sample code that can help you get started. Check the Related resources section below, or use the following table to see which of our features might work best for you.
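
A service on the receiving end of that WebSocket typically has to tell control messages apart from audio and decode the audio before handing it to a speech model. The following Python sketch assumes that text messages carry JSON metadata and that binary messages carry 16-bit little-endian mono PCM. Check the Audio Connector documentation for the actual format and sample rate. The function names are hypothetical, and the WebSocket server itself is omitted.

```python
import json
import math
import struct
from typing import List, Union


def decode_pcm16(chunk: bytes) -> List[int]:
    """Decode 16-bit little-endian mono PCM into signed sample values."""
    count = len(chunk) // 2  # ignore a trailing odd byte, if any
    return list(struct.unpack(f"<{count}h", chunk[: count * 2]))


def rms_level(samples: List[int]) -> float:
    """Root-mean-square amplitude, normalized to the range 0.0 to 1.0."""
    if not samples:
        return 0.0
    mean_square = sum(s * s for s in samples) / len(samples)
    return math.sqrt(mean_square) / 32768.0


def handle_message(message: Union[str, bytes],
                   silence_threshold: float = 0.01) -> dict:
    """Classify one WebSocket message (assumed framing, see lead-in)."""
    if isinstance(message, str):
        # Assumption: text frames are JSON metadata about the stream.
        return {"type": "metadata", "data": json.loads(message)}
    samples = decode_pcm16(message)
    level = rms_level(samples)
    return {
        "type": "audio",
        "samples": len(samples),
        "level": level,
        "silent": level < silence_threshold,
    }
```

A simple level check like this can be used to skip silent chunks before forwarding audio to a paid speech service. The threshold value is arbitrary and would need tuning.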

I want to... | Feature name | Code References (documentation, samples, how-tos)

SPEECH TO TEXT
Record my meeting with a transcription | Post-Call Transcription | Documentation, Sample App, Webinar
Summary of my meeting | Post-Call Transcription w/Summary | Documentation, Sample App
Add captions to my meetings | Live Captions | Documentation, Sample App, Webinar
Capture real-time notes | Live Captions | Documentation
Translate users into another language (text) | Audio Connector | Blog Article
Moderate user audio | Signaling, Publisher Mute | Blog Article

BASIC MEDIA PROCESSING
Blur video background (Web) | Background Blur | Background Blur/Replacement, Sample App
Blur video background (Native) | Media Processor APIs | Documentation
Replace video background (Web) | Background Replacement | Background Blur/Replacement
Replace video background (Native) | Media Processor APIs | Documentation
Add watermarks to streams | Media Processor APIs | Watermarks & QR Codes
Add QR Codes | Media Processor APIs | Watermarks & QR Codes
Remove background noise | Noise Suppression | Noise Suppression
Modulate audio with effects | Media Processor APIs |

COMPUTER VISION
Notice when someone raises their hand | Media Processor APIs | Blog Article
Detect when someone is looking off-screen | Media Processor APIs | Blog Article
See objects in the frame | Media Processor APIs | Image Analysis

BOTS AND AGENTIC AI
Get real-time sentiment of the caller | Audio Connector | Sentiment Analysis
Traditional Chatbot | Audio Connector |
Automatically translate users' speech into another language | Audio Connector |
Build an AI pipeline (STT → LLM/AI Agent → TTS) | Audio Connector | Sample App
Speech to Speech AI Agent processing | Audio Connector |
Play an AI voice announcement into video session | Audio Connector |
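
The STT, LLM/AI Agent, and TTS use case listed above describes a three-stage pipeline. As a rough sketch of that shape in Python, each stage below is a hypothetical placeholder: no real speech or language model is called, and none of the names come from a Vonage SDK.

```python
from typing import Callable

SpeechToText = Callable[[bytes], str]
Agent = Callable[[str], str]
TextToSpeech = Callable[[str], bytes]


def build_pipeline(stt: SpeechToText, agent: Agent,
                   tts: TextToSpeech) -> Callable[[bytes], bytes]:
    """Compose the three stages into one audio-in, audio-out function."""
    def run(audio_in: bytes) -> bytes:
        transcript = stt(audio_in)
        if not transcript.strip():
            return b""  # nothing was said, so there is nothing to answer
        reply = agent(transcript)
        return tts(reply)
    return run


# Placeholder stages so the sketch runs without any external service.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8", errors="ignore")


def fake_agent(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")
```

Keeping each stage behind a plain callable makes it possible to swap one provider for another without touching the rest of the pipeline.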

Related resources

  • Media Processor: Discover how to capture and modify video streams.
  • Audio Connector: Learn how to access and modify raw audio from live Vonage video sessions.
  • Blog Posts: See example tutorials on how live Vonage video sessions can be improved using AI.