Technical Details
Looking for AI?
To build Video + AI experiences, see the Video + AI overview, Live Captions, and Post-Call Transcription.
Before you start, see the definitions in the Glossary.
Architecture at a glance
Every Vonage Video app has a client and an app server, and runs inside a session hosted in the Vonage Video Cloud.
Client
- Uses the client SDK (Web/iOS/Android/Desktop)
- Obtains sessionId + token from your server
- Connects to the session, publishes local A/V
- Subscribes to remote streams, handles events
App Server
- Uses a Server SDK / REST API
- Creates sessions; generates tokens and roles
- Implements business logic & webhooks/callbacks
- Controls recording, broadcast, moderation, storage
Session
- “Room” hosted in Vonage Video Cloud
- Signaling, events, media routing
- Publish/subscribe of streams
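The relationship between clients, streams, and a session can be pictured with a toy in-memory model. This is only a conceptual sketch: the `Session` class and its methods are hypothetical names chosen for illustration, not part of any Vonage SDK, and the real session also handles signaling and media routing in the cloud.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Toy stand-in for a hosted session: tracks connections and published streams."""
    session_id: str
    connections: set[str] = field(default_factory=set)
    streams: dict[str, str] = field(default_factory=dict)  # stream id -> publisher

    def connect(self, connection_id: str) -> None:
        self.connections.add(connection_id)

    def publish(self, connection_id: str, stream_id: str) -> None:
        # A client has to be connected to the session before it can publish.
        if connection_id not in self.connections:
            raise ValueError("must connect before publishing")
        self.streams[stream_id] = connection_id

    def subscribable_streams(self, connection_id: str) -> list[str]:
        """Streams a client would subscribe to: everything published by others."""
        return sorted(s for s, owner in self.streams.items() if owner != connection_id)


room = Session("demo-session")
room.connect("alice")
room.connect("bob")
room.publish("alice", "alice-camera")
room.publish("bob", "bob-camera")
print(room.subscribable_streams("alice"))  # ['bob-camera']
```

Each client publishes its own stream and subscribes to the streams of the other participants, which is why Alice sees only Bob's camera in the output.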
Developer map — where to find things
Client SDKs
- Web (JavaScript) — quickstart, API reference, and browser samples.
- iOS — quickstart, API reference, and sample projects.
- Android — quickstart, API reference, and sample projects.
- React Native
- Windows
- macOS
- Linux
Server SDKs
- Server SDKs (overview) — create sessions, issue tokens/roles, manage archives, broadcasts, SIP, and callbacks.
Starter guides & samples
- Getting Started — step-by-step to your first call.
- Sample apps — ready-to-run (1:1, group, screen share, moderation).
Tools
- Inspector — debug a session (participants, quality, events, errors).
- Developer tools — network tests and diagnostics.
- Callbacks & webhooks — event payloads for archives, broadcasts, and more.
Authentication & request flow (conceptual)
- Credentials — Your backend holds the Vonage API Key and API Secret (never expose the Secret to clients).
- Create a session — Backend creates a sessionId using a Server SDK or REST.
- Issue a client token — Backend generates a short-lived token (optionally with a role/TTL).
- Client joins — Frontend fetches { apiKey, sessionId, token } from your endpoint and connects via the client SDK.
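The flow above could be sketched on the server side roughly as follows. `StubVideoBackend`, `Credentials`, and `join_payload` are hypothetical names invented for this example; the stub stands in for a Server SDK or REST client and does not reproduce the real API. The 600-second TTL is likewise an arbitrary assumption.

```python
import secrets
import time
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Credentials:
    api_key: str
    api_secret: str  # server-side only; never sent to clients


@dataclass
class StubVideoBackend:
    """Hypothetical stand-in for a Server SDK or REST client (not the real API)."""
    creds: Credentials
    issued: dict = field(default_factory=dict)  # token -> (session_id, role, expiry)

    def create_session(self) -> str:
        return "session-" + secrets.token_hex(4)

    def generate_token(self, session_id: str, role: str = "publisher",
                       ttl_seconds: int = 3600) -> str:
        token = "token-" + secrets.token_hex(8)
        self.issued[token] = (session_id, role, time.time() + ttl_seconds)
        return token


def join_payload(backend: StubVideoBackend, session_id: str) -> dict:
    """Body a hypothetical join endpoint would return to the frontend."""
    token = backend.generate_token(session_id, role="publisher", ttl_seconds=600)
    return {
        "apiKey": backend.creds.api_key,
        "sessionId": session_id,
        "token": token,
    }


backend = StubVideoBackend(Credentials("example-key", "example-secret"))
session_id = backend.create_session()
payload = join_payload(backend, session_id)
print(sorted(payload))  # ['apiKey', 'sessionId', 'token']
```

The point of the shape is that the response carries only the three values the client needs; the API Secret is used on the server and never appears in the payload.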
Feature overview — what you can build
Core real-time
- Sessions & publish/subscribe — multi-party A/V using WebRTC.
- Screen sharing — share a window or the whole screen.
- Messaging & signaling — in-session data messages for chat and controls.
- Layouts & UI patterns — grid, active speaker, picture-in-picture.
Recording & delivery
- Archiving (recording) — start/stop from your server; store and retrieve recordings.
- Broadcasting (HLS/RTMP) — stream a session to CDN destinations for large audiences.
Interconnect & moderation
- SIP interconnect — bridge PSTN/SIP into a session.
- Moderation controls — roles, force mute, disconnect, token expiry.
Backend responsibilities
In production, your backend typically:
- Creates sessions and issues tokens with roles,
- Starts/stops archives and broadcasts; manages lifecycle and storage,
- Handles webhooks/callbacks (archive completed, broadcast status),
- Orchestrates moderation and enforces business rules,
- Initiates SIP interconnect when needed,
- Brokers speech/AI pipelines (e.g., via Audio Connector) and manages secrets/keys,
- Audits usage and maps session data to your domain model.
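Handling callbacks usually comes down to parsing the request body and routing on the event type and status. The sketch below assumes a JSON payload with `event`, `status`, `id`, and `sessionId` fields; treat those field names and values, along with the `dispatch_callback` helper, as assumptions to check against the callback reference before relying on them.

```python
import json
from typing import Callable


def dispatch_callback(raw_body: str, handlers: dict[tuple, Callable[[dict], None]]) -> str:
    """Route a callback body to a handler keyed by (event, status)."""
    try:
        payload = json.loads(raw_body)
    except json.JSONDecodeError:
        return "rejected"
    if not isinstance(payload, dict):
        return "rejected"
    handler = handlers.get((payload.get("event"), payload.get("status")))
    if handler is None:
        return "ignored"  # acknowledge events this backend does not care about
    handler(payload)
    return "handled"


completed_archives: list[str] = []
handlers = {
    # Assumed event/status pair for "archive completed".
    ("archive", "available"): lambda p: completed_archives.append(p["id"]),
}

body = json.dumps({
    "event": "archive",
    "status": "available",
    "id": "archive-123",
    "sessionId": "session-abc",
})
print(dispatch_callback(body, handlers))        # handled
print(dispatch_callback("not json", handlers))  # rejected
print(completed_archives)                       # ['archive-123']
```

Returning "ignored" rather than failing on unknown events keeps the endpoint tolerant of event types added later.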
Note: TokBox is now Vonage Video.