Video Connector
The Vonage Video Connector Python library enables you to programmatically participate in Vonage Video API sessions as a server-side participant. This library allows you to connect to video sessions, publish and subscribe to streams, and process real-time audio and video data.
The library handles WebRTC connectivity, media processing, and session management automatically, allowing you to focus on building your application logic. Audio is delivered as Linear PCM 16-bit data at configurable sample rates and channel configurations, and video is delivered as 8-bit frames in YUV420P, RGB24, or ARGB32 format at configurable resolutions.
Important: The Vonage Video Connector Python library is designed for server-side applications and requires valid Vonage Video API credentials and tokens with appropriate permissions.
This topic includes the following sections:
- Private beta
- Requirements
- Data structures
- Connecting to a session
- Session settings
- Publishing streams
- Subscribing to streams
- Audio data handling
- Video data handling
- Media buffer management
- Event callbacks
Private beta
Vonage Video Connector is in private beta. Contact us to get early access.
Requirements
This library requires Python 3.13 running on Linux (AMD64 or ARM64). We recommend Debian Bookworm, the distribution on which the library has been most thoroughly tested.
Data structures
The Vonage Video Connector Python library uses several key data structures to represent sessions, connections, streams, and audio data. Understanding these structures is essential for working with the library effectively.
Session
Represents a Vonage Video API session that clients can connect to.
The Session object is passed to various callback functions to identify which session triggered the event.
Connection
Represents a participant's connection to a session.
Connection data can be used to store custom metadata about participants, such as user IDs or roles.
Stream
Represents a media stream (audio/video) published by a participant.
Streams are created when participants publish media and are used for subscribing to receive their audio/video data.
Publisher
Represents your published stream in the session.
The Publisher object is used in publisher-related callbacks and represents your own published media stream.
Subscriber
Represents a subscription to another participant's stream.
The Subscriber object is used in subscriber-related callbacks and represents your subscription to receive another participant's media.
AudioData
Represents audio data being transmitted or received.
Audio format requirements:
- Sample buffer must contain 16-bit signed integers
- Valid sample rates: 8000, 12000, 16000, 24000, 32000, 44100, 48000 Hz
- Channels: 1 (mono) or 2 (stereo)
- Buffer size must accommodate number_of_frames * number_of_channels samples
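As a quick worked example (pure arithmetic; the variable names mirror the requirement above), a 20ms chunk at 48000 Hz in stereo sizes out as follows:

```python
# Sizing a 20ms audio buffer; names mirror the requirement above.
sample_rate = 48000         # Hz
number_of_channels = 2      # stereo
chunk_duration = 0.020      # 20ms

number_of_frames = int(sample_rate * chunk_duration)    # 960 frames
total_samples = number_of_frames * number_of_channels   # 1920 samples
buffer_bytes = total_samples * 2                        # 16-bit samples -> 3840 bytes
```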
VideoFrame
Represents video frame data being transmitted or received.
Video format requirements:
- Frame buffer must contain 8-bit unsigned chars
- Valid formats: YUV420P, RGB24 (BGR), ARGB32 (BGRA)
- Maximum resolution: 1920x1080 pixels (2,073,600 total pixels)
- Buffer size varies by format and resolution
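The size follows directly from the pixel layout of each format; for example, for a 1280x720 frame:

```python
width, height = 1280, 720
pixels = width * height                 # 921,600 (within the 2,073,600 limit)

yuv420p_bytes = pixels * 3 // 2         # full-size Y plane + quarter-size U and V planes
rgb24_bytes = pixels * 3                # 3 bytes per pixel
argb32_bytes = pixels * 4               # 4 bytes per pixel
```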
VideoResolution
Represents the dimensions of a video frame.
MediaBufferStats
Provides statistics about media buffers.
Configuration structures
SessionSettings
Configures session-level behavior.
SessionAVSettings
Configures audio and video settings for the session.
Understanding audio configuration:
The SessionAVSettings allows you to configure different audio formats for publishing and receiving:
- audio_publisher: Defines the format for audio data you provide via add_audio(). The audio data you send must match this configuration's sample rate and number of channels.
- audio_subscribers_mix: Defines the format for the mixed audio you receive from all subscribed streams via the on_audio_data_cb callback. The library automatically handles mixing multiple subscribers' audio and resampling/channel conversion to match your specified format.
This separation allows you to optimize for your use case. For example:
- Publish in stereo (2 channels) for high-quality output while receiving a mono mix (1 channel) to simplify processing
- Publish at 16kHz for speech while receiving at 48kHz for high-fidelity playback
- Use different sample rates for publishing and subscription based on your audio processing pipeline requirements
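As a minimal sketch of the first scenario, assuming the structures can be constructed with keyword arguments named after the audio properties above (the exact beta API may differ):

```python
# Field names (sample_rate, number_of_channels) are assumptions.
av_settings = SessionAVSettings(
    # Format of the audio you supply via add_audio(): stereo at 48 kHz.
    audio_publisher=SessionAudioSettings(sample_rate=48000, number_of_channels=2),
    # Format of the mixed audio delivered to on_audio_data_cb: mono at 16 kHz.
    audio_subscribers_mix=SessionAudioSettings(sample_rate=16000, number_of_channels=1),
)
```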
SessionAudioSettings
Configures audio format for publishing or receiving audio data.
SessionVideoPublisherSettings
Configures video settings for publishing.
LoggingSettings
Controls logging verbosity.
PublisherSettings
Configures your published stream.
Note: At least one of has_audio or has_video must be True.
PublisherAudioSettings
Configures audio settings for your published stream.
Discontinuous transmission (DTX) stops sending audio packets during silence, saving bandwidth.
SubscriberSettings
Configures subscriber behavior.
Note: At least one of subscribe_to_audio or subscribe_to_video must be True.
SubscriberVideoSettings
Configures video preferences for subscribers.
Understanding preferred settings:
When subscribing to routed streams that use simulcast, the Vonage Video API SFU (Selective Forwarding Unit) can send different quality layers of the video. The preferred_resolution and preferred_framerate settings allow you to request a specific quality layer:
- preferred_resolution: Requests a specific spatial layer (resolution). The SFU will send the layer that most closely matches your preference.
- preferred_framerate: Requests a specific temporal layer (frame rate). The SFU will send the layer that most closely matches your preference.
These preferences help optimize bandwidth usage and processing requirements on the subscriber side by requesting only the quality level you need, rather than always receiving the highest quality available.
Data structure relationships
The data structures form a natural hierarchy: a Session contains the Connections of its participants; each Connection can publish Streams; a Publisher represents your own published stream, while a Subscriber represents your subscription to a remote participant's Stream.
Connecting to a session
Basic connection
To connect to a Vonage Video API session, you need your application ID (or API key, for TokBox accounts), a session ID, and a valid token:
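The sketch below assumes the library exposes a top-level connect() and is importable as vonage_video_connector; both the module name and the keyword arguments are placeholders rather than confirmed beta API:

```python
import vonage_video_connector as vvc  # hypothetical module name

connector = vvc.connect(                   # signature assumed
    application_id="YOUR_APPLICATION_ID",  # API key for TokBox accounts
    session_id="YOUR_SESSION_ID",
    token="YOUR_TOKEN",
)
```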
Connection with all callbacks
For full session management, implement all available callbacks:
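Only on_audio_data_cb, on_render_frame_cb, on_ready_for_audio_cb, and on_media_buffer_drained_cb are named in this guide; the other callback names and all parameter lists in this sketch are assumptions:

```python
import vonage_video_connector as vvc  # hypothetical module name


def on_audio_data(audio_data):
    """Mixed audio from all subscribed streams (see Audio data handling)."""


def on_render_frame(subscriber, frame):
    """Per-subscriber video frames (see Video data handling)."""


def on_ready_for_audio(publisher):
    """From this point it is safe to call add_audio()."""


def on_media_buffer_drained(publisher):
    """Your outgoing media buffer ran dry; increase your production rate."""


connector = vvc.connect(
    application_id="YOUR_APPLICATION_ID",
    session_id="YOUR_SESSION_ID",
    token="YOUR_TOKEN",
    on_audio_data_cb=on_audio_data,
    on_render_frame_cb=on_render_frame,
    on_ready_for_audio_cb=on_ready_for_audio,
    on_media_buffer_drained_cb=on_media_buffer_drained,
    # Session, connection, stream, publisher, and subscriber callbacks
    # (names assumed) would be registered here as well.
)
```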
Disconnecting from a session
Disconnect from the session when done:
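Continuing the sketch above (the method name is an assumption based on this section's wording):

```python
connector.disconnect()  # leave the session and release its resources
```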
Session settings
Audio and video configuration
Configure audio and video settings for the session to control the format of media data:
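A sketch of wiring the AV settings into connect(), reusing the hypothetical names from the earlier examples; the av_settings and video_publisher field names and the SessionVideoPublisherSettings arguments are assumptions:

```python
settings = vvc.SessionSettings(
    av_settings=vvc.SessionAVSettings(
        audio_publisher=vvc.SessionAudioSettings(sample_rate=48000, number_of_channels=2),
        audio_subscribers_mix=vvc.SessionAudioSettings(sample_rate=16000, number_of_channels=1),
        video_publisher=vvc.SessionVideoPublisherSettings(width=1280, height=720, framerate=30),
    ),
)

connector = vvc.connect(
    application_id="YOUR_APPLICATION_ID",
    session_id="YOUR_SESSION_ID",
    token="YOUR_TOKEN",
    settings=settings,   # keyword name assumed
)
```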
Logging configuration
Control the verbosity of console logging:
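For example (the level field and its values are assumptions):

```python
settings = vvc.SessionSettings(
    logging=vvc.LoggingSettings(level="warning"),  # field name and value assumed
)
```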
Session migration
Enable automatic session migration in case of SFU rotation.
Publishing streams
Publisher configuration
Configure publisher settings before starting to publish:
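A sketch using the fields documented above; the name field and the exact constructor are assumptions:

```python
publisher_settings = vvc.PublisherSettings(
    name="server-bot",   # stream display name; field assumed
    has_audio=True,      # at least one of has_audio/has_video must be True
    has_video=True,
    audio=vvc.PublisherAudioSettings(dtx=True),  # DTX saves bandwidth during silence
)
```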
Start publishing
Begin publishing a stream to the session:
Important:
If you're publishing audio (has_audio=True), you must wait for the on_ready_for_audio_cb callback to be invoked before calling add_audio(). This callback indicates that the audio system is initialized and ready to accept audio data. This requirement does not apply to video-only publishing scenarios.
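A common pattern is to gate add_audio() on that callback with a threading.Event. The publish() signature is an assumption; the callback itself is registered at connect() time, as shown earlier:

```python
import threading

ready_for_audio = threading.Event()


def on_ready_for_audio(publisher):
    ready_for_audio.set()  # audio system is initialized


connector.publish(publisher_settings)  # signature assumed

# Block until the audio pipeline is ready before sending audio data.
ready_for_audio.wait()
```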
Adding audio data
Send audio data to your published stream:
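For example, pushing 20ms chunks of silence at a steady cadence (the AudioData constructor is an assumption; the sizing rule comes from the format requirements above):

```python
import time

SAMPLE_RATE = 48000
CHANNELS = 1
FRAMES = SAMPLE_RATE // 50                      # 20ms -> 960 frames

silence = bytes(FRAMES * CHANNELS * 2)          # zero-filled 16-bit samples

for _ in range(250):                            # ~5 seconds of audio
    audio = vvc.AudioData(                      # constructor assumed
        sample_buffer=memoryview(silence),
        sample_rate=SAMPLE_RATE,
        number_of_channels=CHANNELS,
        number_of_frames=FRAMES,
    )
    connector.add_audio(audio)
    time.sleep(0.020)                           # keep a steady 20ms cadence
```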
Stop publishing
Stop publishing when done.
Subscribing to streams
Subscribe to streams
When a new stream is received, subscribe to it to receive audio and/or video data:
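Subscribing usually happens from a stream callback. The callback and subscribe() names are assumptions; the SubscriberSettings fields come from the structures documented above:

```python
def on_stream_received(session, stream):        # callback name assumed
    connector.subscribe(                        # method name assumed
        stream,
        vvc.SubscriberSettings(
            subscribe_to_audio=True,
            subscribe_to_video=True,
            video=vvc.SubscriberVideoSettings(  # field name assumed
                preferred_resolution=vvc.VideoResolution(width=640, height=360),
                preferred_framerate=15,
            ),
        ),
    )
```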
Receiving subscribed media
When you subscribe to streams, the library delivers audio and video data through different callbacks:
Video data: Video frames are delivered individually per subscribed stream through the on_render_frame_cb callback. Each callback invocation includes the subscriber object that identifies which stream the video frame belongs to. This allows you to process video from different participants separately.
Audio data: Audio is delivered as a single mixed stream through the on_audio_data_cb callback registered during connect(). The library automatically mixes audio from all subscribed streams together into a single audio stream. You cannot distinguish between individual participants' audio in this callback.
This design allows you to:
- Process video from each participant independently for tasks like layout management, individual recording, or per-stream video effects
- Receive pre-mixed audio optimized for playback or further processing without manual mixing
- Configure the mixed audio format via audio_subscribers_mix in SessionAVSettings to match your processing requirements
Unsubscribe from streams
Stop receiving media from a specific stream.
Audio data handling
Audio format
Audio data is delivered as Linear PCM 16-bit signed integers with the following characteristics:
- Sample rates: 8000, 12000, 16000, 24000, 32000, 44100, or 48000 Hz
- Channels: 1 (mono) or 2 (stereo)
- Format: 16-bit signed integers in a memoryview buffer
- Frame size: Typically 20ms chunks (varies by sample rate)
Processing audio data
Handle incoming audio in the audio data callback:
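For example, viewing the mixed audio as a NumPy array (the AudioData attribute names and interleaved sample order are assumptions):

```python
import numpy as np


def on_audio_data(audio_data):
    # 16-bit signed PCM; attribute names assumed.
    samples = np.frombuffer(audio_data.sample_buffer, dtype=np.int16)
    if audio_data.number_of_channels == 2:
        samples = samples.reshape(-1, 2)        # interleaved stereo -> (frames, 2)
    rms = np.sqrt(np.mean(samples.astype(np.float64) ** 2))
    print(f"audio level (RMS): {rms:.0f}")
```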
Creating audio data
When adding audio, create properly formatted AudioData objects:
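For instance, a 440 Hz test tone matching a mono 48 kHz publisher configuration (constructor and field names remain assumptions):

```python
import numpy as np

SAMPLE_RATE = 48000
FRAMES = SAMPLE_RATE // 50                      # one 20ms chunk

t = np.arange(FRAMES) / SAMPLE_RATE
tone = (0.2 * 32767 * np.sin(2 * np.pi * 440 * t)).astype(np.int16)

audio = vvc.AudioData(                          # constructor assumed
    sample_buffer=memoryview(tone.tobytes()),
    sample_rate=SAMPLE_RATE,
    number_of_channels=1,
    number_of_frames=FRAMES,
)
connector.add_audio(audio)
```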
Audio data continuity
When you publish audio, the library manages audio continuity automatically in several scenarios:
Initial publishing:
When you start publishing audio (via publish() with has_audio=True), the library automatically sends silence (zero-filled audio frames) until you provide your first audio data via add_audio(). This ensures the audio stream is immediately available to subscribers without waiting for your application to generate audio data.
Silence tolerance:
If you temporarily stop providing audio data via add_audio(), the library tolerates brief gaps by not sending any audio packets. This hysteresis period prevents unnecessary silence packets during momentary processing delays.
Explicit silence: After the tolerance period, if no new audio data is available, the library switches to sending explicit silence frames (zero-filled audio). This maintains the audio stream while indicating that no active audio is being provided.
Buffer flush: If you provide less than a full period's worth of audio data, the library will flush the remaining data and pad it with silence to maintain the correct timing and prevent audio drift.
Best practices:
- Maintain a consistent audio rate by calling add_audio() at regular intervals matching your configured sample rate
- Monitor buffer statistics using get_media_buffer_stats() to ensure adequate audio data
- Handle the on_media_buffer_drained_cb callback to detect when your audio buffer is depleted
- Consider implementing an audio generation strategy that adapts to varying processing loads
This automatic audio management ensures that your published audio stream remains continuous and properly timed even during temporary gaps in data availability.
Video data handling
Video format
Video data is delivered as 8-bit unsigned chars in one of three formats:
- YUV420P: Planar YUV format with 4:2:0 chroma subsampling
- RGB24: 24-bit BGR format (8 bits per channel)
- ARGB32: 32-bit BGRA format with alpha channel
Video specifications:
- Resolutions: Up to 1920x1080 (Full HD)
- Frame rates: 1-30 FPS
- Format: 8-bit unsigned chars in a memoryview buffer
Processing video frames
Handle incoming video frames in the render frame callback:
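For example, extracting the luma plane of a YUV420P frame (VideoFrame attribute names and the format value are assumptions):

```python
import numpy as np


def on_render_frame(subscriber, frame):
    w, h = frame.resolution.width, frame.resolution.height  # attributes assumed
    if frame.format == "YUV420P":                           # representation assumed
        data = np.frombuffer(frame.buffer, dtype=np.uint8)
        y_plane = data[: w * h].reshape(h, w)  # full-size luma; U/V follow at quarter size
        print(f"frame {w}x{h}, mean luma {y_plane.mean():.1f}")
```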
Creating video frames
When publishing video, create properly formatted VideoFrame objects:
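For example, a solid mid-gray RGB24 frame (the VideoFrame constructor and the format value are assumptions; the 3-bytes-per-pixel layout comes from the format section above):

```python
import numpy as np

WIDTH, HEIGHT = 640, 360

# RGB24 uses 3 bytes per pixel (BGR byte order per the format notes above).
pixels = np.full((HEIGHT, WIDTH, 3), 128, dtype=np.uint8)

frame = vvc.VideoFrame(                         # constructor assumed
    buffer=memoryview(pixels.tobytes()),
    format="RGB24",                             # representation assumed
    resolution=vvc.VideoResolution(width=WIDTH, height=HEIGHT),
)
connector.add_video(frame)
```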
Video frame continuity
When you publish video, the library manages frame continuity automatically in several scenarios:
Initial publishing:
When you start publishing video (via publish() with has_video=True), the library automatically sends black frames until you provide your first frame via add_video(). This ensures the video stream is immediately available to subscribers without waiting for your application to generate video data.
Last frame repetition:
If you stop providing video frames via add_video(), the library will automatically repeat the last frame you provided. This ensures smooth playback for subscribers without interruption. The last frame will be repeated for up to 2 seconds.
Black frame fallback: After the maximum repetition period (2 seconds), the library switches to publishing black frames. This indicates to subscribers that video data is no longer actively being provided while maintaining the video stream.
Best practices:
- Maintain a consistent frame rate by calling add_video() at regular intervals matching your configured FPS
- Monitor buffer statistics using get_media_buffer_stats() to ensure adequate video data
- Handle the on_media_buffer_drained_cb callback to detect when your video buffer is depleted
- Consider implementing a frame generation strategy that adapts to varying processing loads
This automatic frame management ensures that your published video stream remains continuous even during temporary gaps in data availability.
Media buffer management
Checking buffer stats
Monitor the state of your media buffers:
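get_media_buffer_stats() is named in this guide, but the MediaBufferStats attributes are not, so the sketch below just inspects the returned object:

```python
stats = connector.get_media_buffer_stats()
print(stats)   # inspect what is buffered for audio and video
```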
Clearing media buffers
Clear both audio and video buffers when needed.
Buffer drained callback
Handle buffer drain events:
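A minimal handler might simply log the event; remember that it is registered as on_media_buffer_drained_cb at connect() time (its parameter list is an assumption):

```python
def on_media_buffer_drained(publisher):
    # Fired once per drain; not re-fired until the buffer refills and
    # empties again (see the hysteresis note below).
    print("media buffer drained -- produce audio/video faster")
```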
Understanding buffer drain events:
The on_media_buffer_drained_cb callback is invoked when the internal audio or video buffers are depleted. This occurs when media data is being transmitted to the session at a rate that exceeds the rate at which new media data is being provided via add_audio() or add_video() calls.
This callback serves as a notification that you should increase your media production rate or adjust your publishing strategy to maintain continuous media flow. Monitoring these events helps prevent gaps or interruptions in your published stream.
Callback hysteresis behavior:
The callback implements hysteresis to prevent excessive triggering. After the initial drain event, the callback will not be invoked again until the buffer is replenished with new media data and subsequently becomes depleted again. This prevents a flood of repeated notifications while the buffer remains empty.
Getting connection info
Retrieve your local connection information.
Event callbacks
These callbacks are registered when you call connect(); see Connection with all callbacks above for a combined example.
Session callbacks
Handle session-level events, such as connecting to and disconnecting from the session.
Connection callbacks
Monitor participant connections as they join and leave.
Stream callbacks
Handle stream events as new streams are published and existing streams are destroyed.
Publisher callbacks
Handle publishing events, such as on_ready_for_audio_cb.
Subscriber callbacks
Handle subscription events.
Media buffer callbacks
Handle media buffer events, such as on_media_buffer_drained_cb.
Resource cleanup
Always clean up resources properly:
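For example, pairing connect() with disconnect() in a try/finally block (method names as assumed throughout this guide):

```python
connector = vvc.connect(
    application_id="YOUR_APPLICATION_ID",
    session_id="YOUR_SESSION_ID",
    token="YOUR_TOKEN",
)
try:
    # ... publish, subscribe, and process media ...
    pass
finally:
    connector.disconnect()   # always release the session, even on errors
```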