Vonage Video API AI Features and Solutions
Vonage Video provides APIs and SDKs that let you access the video and audio streams of a live video session and modify them using Vonage tools, third-party AI tools, and other services.
These modifications can be made in real time within an ongoing Vonage video session, so members of the session experience them as they happen.
Enhance Video Applications with AI
The Vonage Media Processor provides access to the raw audio and video streams through the Video client SDKs so that you can apply transform operations to individual video and audio tracks. You can modify the streams with Vonage's own tools or route them through third-party services to make the changes you need.
The Live Captions API transcribes audio streams to provide real-time captions within the video session. It delivers instant transcriptions for Automatic Speech Recognition (ASR) and Speech-to-Text (STT) use cases such as on-screen captioning, transcription, and translation.
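To make the idea of a transform operation on a video track concrete, here is a minimal sketch of the kind of per-frame work a transformer performs. It is not the Vonage Media Processor API: `toGrayscale` is a hypothetical helper that operates on a plain RGBA pixel buffer (the layout used by the browser's `ImageData`), and wiring such a function into the client SDK is covered by the Media Processor documentation rather than by this example.

```typescript
// Hypothetical helper, not part of any Vonage SDK.
// Converts an RGBA pixel buffer (4 bytes per pixel, as in ImageData.data)
// to grayscale in place and leaves the alpha channel untouched.
function toGrayscale(pixels: Uint8ClampedArray): Uint8ClampedArray {
  for (let i = 0; i + 3 < pixels.length; i += 4) {
    // Rec. 601 luma weights for red, green, and blue.
    const luma =
      0.299 * pixels[i] + 0.587 * pixels[i + 1] + 0.114 * pixels[i + 2];
    pixels[i] = luma;     // red
    pixels[i + 1] = luma; // green
    pixels[i + 2] = luma; // blue
  }
  return pixels;
}

// Demo frame: one red pixel and one white pixel, both fully opaque.
const demoFrame = new Uint8ClampedArray([255, 0, 0, 255, 255, 255, 255, 255]);
toGrayscale(demoFrame);
```

The same shape (read a frame, change its pixels, pass it on) applies to effects such as blurring or watermarking; only the body of the loop changes.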

Similarly, the Audio Connector provides access to the raw audio from a Vonage video session through a WebSocket interface, so you can send that audio to AI services you manage yourself, such as Microsoft's Azure AI Speech Service.
Post-Call Transcription provides transcriptions and summaries of a video session based on a recorded file. This is useful for meeting notes, meeting summaries, and compliance use cases.
We have collected a variety of use cases, along with the features and sample code that can help you get started. The table below shows which of our features might work best for you, and the Related resources section lists further material.
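As a rough illustration of what a WebSocket consumer might do with this audio before handing it to a speech service, the sketch below decodes one chunk of raw audio and measures its loudness. It assumes the payload is 16-bit signed little-endian linear PCM, which should be checked against the Audio Connector documentation. `decodePcm16` and `rmsLevel` are hypothetical helper names, and the WebSocket server itself is omitted.

```typescript
// Hypothetical helpers; the audio format (16-bit signed little-endian PCM)
// is an assumption to verify against the Audio Connector documentation.

// Decode raw bytes into samples normalized to the range [-1, 1).
function decodePcm16(chunk: Uint8Array): Float32Array {
  const view = new DataView(chunk.buffer, chunk.byteOffset, chunk.byteLength);
  const samples = new Float32Array(Math.floor(chunk.byteLength / 2));
  for (let i = 0; i < samples.length; i++) {
    samples[i] = view.getInt16(i * 2, true) / 32768; // true = little-endian
  }
  return samples;
}

// Root-mean-square level, a simple loudness measure (0 for silence).
function rmsLevel(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    sumSquares += samples[i] * samples[i];
  }
  return Math.sqrt(sumSquares / samples.length);
}

// Two samples, 16384 (0x4000) and -16384 (0xC000), in little-endian bytes.
const demoChunk = new Uint8Array([0x00, 0x40, 0x00, 0xc0]);
const demoLevel = rmsLevel(decodePcm16(demoChunk));
```

A level like this could be used, for example, to skip silent chunks before forwarding audio to a transcription or sentiment service.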
| I want to... | Feature name | Code References (documentation, samples, how-tos) |
|---|---|---|
| SPEECH TO TEXT | | |
| Record my meeting with a transcription | Post-Call Transcription | Documentation, Sample App, Webinar |
| Get a summary of my meeting | Post-Call Transcription w/Summary | Documentation, Sample App |
| Add captions to my meetings | Live Captions | Documentation, Sample App, Webinar |
| Capture real-time notes | Live Captions | Documentation |
| Translate users' speech into another language (text) | Audio Connector | Blog Article |
| Moderate user audio | Signaling, Publisher Mute | Blog Article |
| BASIC MEDIA PROCESSING | | |
| Blur video background (Web) | Background Blur | Background Blur/Replacement, Sample App |
| Blur video background (Native) | Media Processor APIs | Documentation |
| Replace video background (Web) | Background Replacement | Background Blur/Replacement |
| Replace video background (Native) | Media Processor APIs | Documentation |
| Add watermarks to streams | Media Processor APIs | Watermarks & QR Codes |
| Add QR Codes | Media Processor APIs | Watermarks & QR Codes |
| Remove background noise | Noise Suppression | Noise suppression |
| Modulate audio with effects | Media Processor APIs | |
| COMPUTER VISION | | |
| Notice when someone raises their hand | Media Processor APIs | Blog Article |
| Detect when someone is looking off-screen | Media Processor APIs | Blog Article |
| See objects in the frame | Media Processor APIs | Image Analysis |
| BOTS AND AGENTIC AI | | |
| Get real-time sentiment of the caller | Audio Connector | Sentiment Analysis |
| Build a traditional chatbot | Audio Connector | |
| Automatically translate users' speech into another language | Audio Connector | |
| Build an AI pipeline (STT→LLM/AI Agent → TTS) | Audio Connector | Sample App |
| Speech-to-speech AI agent processing | Audio Connector | |
| Play an AI voice announcement into a video session | Audio Connector | |
Related resources
- Media Processor: Discover how to capture and modify video streams.
- Audio Connector: Learn how to access and modify raw audio from live Vonage video sessions.
- Blog Posts: See example tutorials on how live Vonage video sessions can be improved using AI.