Video Connector

The Vonage Video Connector Python library enables you to programmatically participate in Vonage Video API sessions as a server-side participant. This library allows you to connect to video sessions, publish and subscribe to streams, and process real-time audio and video data.

The library handles WebRTC connectivity, media processing, and session management automatically, allowing you to focus on building your application logic. Audio is delivered as 16-bit linear PCM at configurable sample rates and channel counts; video is delivered as 8-bit frames in YUV420P, RGB24, or ARGB32 formats at configurable resolutions.

Important: The Vonage Video Connector Python library is designed for server-side applications and requires valid Vonage Video API credentials and tokens with appropriate permissions.

Private beta

Vonage Video Connector is in private beta. Contact us to request early access.

Requirements

This library requires Python 3.13 running on Linux (AMD64 and ARM64). We recommend Debian Bookworm, as it is the distribution on which the library has been most thoroughly tested.

Data structures

The Vonage Video Connector Python library uses several key data structures to represent sessions, connections, streams, and audio data. Understanding these structures is essential for working with the library effectively.

Session

Represents a Vonage Video API session that clients can connect to.

The Session object is passed to various callback functions to identify which session triggered the event.

Connection

Represents a participant's connection to a session.

Connection data can be used to store custom metadata about participants, such as user IDs or roles.

Stream

Represents a media stream (audio/video) published by a participant.

Streams are created when participants publish media and are used for subscribing to receive their audio/video data.

Publisher

Represents your published stream in the session.

The Publisher object is used in publisher-related callbacks and represents your own published media stream.

Subscriber

Represents a subscription to another participant's stream.

The Subscriber object is used in subscriber-related callbacks and represents your subscription to receive another participant's media.
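
The library defines the exact fields of these five structures. The sketch below only illustrates the shape implied by the descriptions above; every field name in it is an illustrative assumption, not confirmed API.

```python
# Illustrative shapes only; field names are assumptions, not the library's API.
class Session:
    session_id: str          # identifies which session triggered a callback

class Connection:
    connection_id: str
    data: str                # custom metadata, e.g. user IDs or roles

class Stream:
    stream_id: str
    connection: Connection   # the participant who published this stream
    has_audio: bool
    has_video: bool

class Publisher:
    stream: Stream           # your own published stream

class Subscriber:
    stream: Stream           # the remote stream you receive media from
```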

AudioData

Represents audio data being transmitted or received.

Audio format requirements:

  • Sample buffer must contain 16-bit signed integers
  • Valid sample rates: 8000, 12000, 16000, 24000, 32000, 44100, 48000 Hz
  • Channels: 1 (mono) or 2 (stereo)
  • Buffer size must accommodate: number_of_frames * number_of_channels samples
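
For example, the buffer for a 20 ms mono chunk at 16 kHz must hold 320 samples. The computation below follows the rules above; the AudioData import path and keyword names are assumptions to check against the beta reference.

```python
import array

from vonage_video_connector import AudioData  # hypothetical import path

sample_rate = 16000                            # one of the valid rates above
number_of_channels = 1                         # mono
number_of_frames = sample_rate * 20 // 1000    # 20 ms -> 320 frames

# Buffer must accommodate number_of_frames * number_of_channels 16-bit samples.
samples = array.array("h", bytes(2 * number_of_frames * number_of_channels))

audio_data = AudioData(                        # hypothetical keyword names
    sample_buffer=memoryview(samples),
    number_of_frames=number_of_frames,
    number_of_channels=number_of_channels,
    sample_rate=sample_rate,
)
```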

VideoFrame

Represents video frame data being transmitted or received.

Video format requirements:

  • Frame buffer must contain 8-bit unsigned chars
  • Valid formats: YUV420P, RGB24 (BGR), ARGB32 (BGRA)
  • Maximum resolution: 1920x1080 pixels (2,073,600 total pixels)
  • Buffer size varies by format and resolution
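
The per-format sizes follow directly from the pixel layouts: YUV420P stores 1.5 bytes per pixel (a full-resolution luma plane plus two quarter-resolution chroma planes), RGB24 stores 3, and ARGB32 stores 4. A small helper to compute them:

```python
def frame_buffer_size(width: int, height: int, fmt: str) -> int:
    """Bytes required for one frame in each supported format."""
    if fmt == "YUV420P":            # planar 4:2:0 -> 1.5 bytes per pixel
        return width * height * 3 // 2
    if fmt == "RGB24":              # 3 bytes per pixel (BGR byte order)
        return width * height * 3
    if fmt == "ARGB32":             # 4 bytes per pixel (BGRA byte order)
        return width * height * 4
    raise ValueError(f"unsupported format: {fmt}")

print(frame_buffer_size(1920, 1080, "YUV420P"))   # 3110400 bytes at Full HD
```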

VideoResolution

Represents the dimensions of a video frame.

MediaBufferStats

Provides statistics about media buffers; see Media buffer management below for usage.

Configuration structures

SessionSettings

Configures session-level behavior.

SessionAVSettings

Configures audio and video settings for the session.

Understanding audio configuration:

SessionAVSettings lets you configure different audio formats for publishing and receiving:

  • audio_publisher: Defines the format for audio data you provide via add_audio(). The audio data you send must match this configuration's sample rate and number of channels.

  • audio_subscribers_mix: Defines the format for the mixed audio you receive from all subscribed streams via the on_audio_data_cb callback. The library automatically handles mixing multiple subscribers' audio and resampling/channel conversion to match your specified format.

This separation allows you to optimize for your use case. For example:

  • Publish in stereo (2 channels) for high-quality output while receiving a mono mix (1 channel) to simplify processing
  • Publish at 16kHz for speech while receiving at 48kHz for high-fidelity playback
  • Use different sample rates for publishing and subscription based on your audio processing pipeline requirements

SessionAudioSettings

Configures audio format for publishing or receiving audio data.
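
Combining the two structures for the stereo-publish/mono-mix example given above; the import path and keyword names are assumptions.

```python
from vonage_video_connector import (   # hypothetical import path
    SessionAVSettings,
    SessionAudioSettings,
)

av_settings = SessionAVSettings(
    # Format your add_audio() data must match.
    audio_publisher=SessionAudioSettings(sample_rate=48000, number_of_channels=2),
    # Format of the mixed audio delivered via on_audio_data_cb.
    audio_subscribers_mix=SessionAudioSettings(sample_rate=48000, number_of_channels=1),
)
```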

SessionVideoPublisherSettings

Configures video settings for publishing.
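
A sketch built from the video specifications documented in this topic (three supported formats, up to 1920x1080, 1-30 FPS); the keyword names are assumptions.

```python
video_settings = SessionVideoPublisherSettings(          # hypothetical keywords
    format="YUV420P",                                    # or RGB24 / ARGB32
    resolution=VideoResolution(width=1280, height=720),  # up to 1920x1080
    framerate=30,                                        # 1-30 FPS
)
```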

LoggingSettings

Controls logging verbosity; see Logging configuration under Session settings below for usage.

PublisherSettings

Configures your published stream.

Note: At least one of has_audio or has_video must be True.

PublisherAudioSettings

Configures audio settings for your published stream.

Discontinuous transmission (DTX) stops sending audio packets during silence, saving bandwidth.

SubscriberSettings

Configures subscriber behavior.

Note: At least one of subscribe_to_audio or subscribe_to_video must be True.

SubscriberVideoSettings

Configures video preferences for subscribers.

Understanding preferred settings:

When subscribing to routed streams that use simulcast, the Vonage Video API SFU (Selective Forwarding Unit) can send different quality layers of the video. The preferred_resolution and preferred_framerate settings allow you to request a specific quality layer:

  • preferred_resolution: Requests a specific spatial layer (resolution). The SFU will send the layer that most closely matches your preference.
  • preferred_framerate: Requests a specific temporal layer (frame rate). The SFU will send the layer that most closely matches your preference.

These preferences help optimize bandwidth usage and processing requirements on the subscriber side by requesting only the quality level you need, rather than always receiving the highest quality available.
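
For example, requesting a lower spatial and temporal layer to save bandwidth. Only the structure names and the two preferred_* fields come from this topic; the nesting and other keywords are assumptions.

```python
subscriber_settings = SubscriberSettings(
    subscribe_to_audio=True,
    subscribe_to_video=True,            # at least one of the two must be True
    video=SubscriberVideoSettings(      # hypothetical nesting and keywords
        preferred_resolution=VideoResolution(width=640, height=360),
        preferred_framerate=15,         # request a lower temporal layer
    ),
)
```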

Data structure relationships

The data structures are related in the following hierarchy.
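
The sketch below is inferred from the descriptions in this topic rather than taken from the library reference:

```
Session
└── Connection         (one per participant)
    └── Stream         (media published by that participant)
        ├── Publisher  (wraps your own published stream)
        └── Subscriber (wraps a remote stream you subscribed to)
```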

Connecting to a session

Basic connection

To connect to a Vonage Video API session, you need your application ID (or API key, for TokBox accounts), a session ID, and a valid token.
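
A minimal sketch. Only connect() and the on_audio_data_cb name come from this topic; the module path, VideoConnector entry point, and parameter names are assumptions to check against the beta reference.

```python
from vonage_video_connector import VideoConnector  # hypothetical import path

connector = VideoConnector()                       # hypothetical entry point

def on_audio_data(session, audio_data):
    pass  # mixed audio from all subscribed streams arrives here

connector.connect(                                 # hypothetical parameter names
    application_id="YOUR_APPLICATION_ID",          # API key for TokBox accounts
    session_id="YOUR_SESSION_ID",
    token="YOUR_TOKEN",
    on_audio_data_cb=on_audio_data,
)
```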

Connection with all callbacks

For full session management, implement all available callbacks.
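
Extending the sketch above with the other callbacks named in this topic (on_ready_for_audio_cb, on_render_frame_cb, on_media_buffer_drained_cb); the callback signatures shown are assumptions.

```python
def on_ready_for_audio(publisher):
    pass  # audio system initialized; add_audio() is safe from here on

def on_render_frame(subscriber, video_frame):
    pass  # per-stream video frames

def on_media_buffer_drained(publisher):
    pass  # your publish buffer ran dry

connector.connect(
    application_id="YOUR_APPLICATION_ID",
    session_id="YOUR_SESSION_ID",
    token="YOUR_TOKEN",
    on_audio_data_cb=on_audio_data,
    on_ready_for_audio_cb=on_ready_for_audio,
    on_render_frame_cb=on_render_frame,
    on_media_buffer_drained_cb=on_media_buffer_drained,
)
```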

Disconnecting from a session

Disconnect from the session when you are done.
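
The method name below is an assumption; the topic only specifies that you should disconnect when done.

```python
connector.disconnect()   # hypothetical method name
```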

Session settings

Audio and video configuration

Configure audio and video settings for the session to control the format of media data; see the combined sketch at the end of this section.

Logging configuration

Control the verbosity of console logging; see the combined sketch at the end of this section.

Session migration

Enable automatic session migration to survive SFU rotation.
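
A combined sketch for the three subsections above. The SessionSettings composition, the logging level value, and the migration flag name are all assumptions; only the structure names come from this topic.

```python
session_settings = SessionSettings(            # hypothetical keyword names
    av_settings=av_settings,                   # SessionAVSettings from earlier
    logging=LoggingSettings(level="warning"),  # console log verbosity
    enable_migration=True,                     # survive SFU rotation
)

connector.connect(
    application_id="YOUR_APPLICATION_ID",
    session_id="YOUR_SESSION_ID",
    token="YOUR_TOKEN",
    settings=session_settings,                 # hypothetical parameter name
)
```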

Publishing streams

Publisher configuration

Configure publisher settings before starting to publish.
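
A sketch using the PublisherSettings fields described above; the audio nesting and the dtx keyword are assumptions.

```python
publisher_settings = PublisherSettings(
    has_audio=True,
    has_video=False,                         # at least one must be True
    audio=PublisherAudioSettings(dtx=True),  # skip packets during silence
)
```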

Start publishing

Begin publishing a stream to the session.

Important: If you're publishing audio (has_audio=True), you must wait for the on_ready_for_audio_cb callback to be invoked before calling add_audio(). This callback indicates that the audio system is initialized and ready to accept audio data. This requirement does not apply to video-only publishing scenarios.
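
A sketch of the required ordering. publish(), add_audio(), and on_ready_for_audio_cb come from this topic; the event-based wait is just one way to implement the gate.

```python
import threading

ready_for_audio = threading.Event()

def on_ready_for_audio(publisher):      # registered via connect(), as shown above
    ready_for_audio.set()

connector.publish(publisher_settings)   # hypothetical signature

ready_for_audio.wait()                  # required before the first add_audio()
```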

Adding audio data

Send audio data to your published stream.
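
Each chunk must match the audio_publisher sample rate and channel count configured in SessionAVSettings; audio_data here is built as in the AudioData sketch earlier.

```python
connector.add_audio(audio_data)   # 16-bit PCM matching your publisher settings
```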

Stop publishing

Stop publishing when you no longer need to send media.

Subscribing to streams

Subscribe to streams

When a new stream is received, subscribe to it to receive audio and/or video data.
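
A sketch assuming a stream-received callback and a subscribe() method; only SubscriberSettings and its two subscribe_to_* fields come from this topic.

```python
def on_stream_received(session, stream):   # hypothetical callback name
    connector.subscribe(                   # hypothetical method name
        stream,
        SubscriberSettings(subscribe_to_audio=True, subscribe_to_video=True),
    )
```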

Receiving subscribed media

When you subscribe to streams, the library delivers audio and video data through different callbacks:

Video data: Video frames are delivered individually per subscribed stream through the on_render_frame_cb callback. Each callback invocation includes the subscriber object that identifies which stream the video frame belongs to. This allows you to process video from different participants separately.

Audio data: Audio is delivered as a single mixed stream through the on_audio_data_cb callback registered during connect(). The library automatically mixes audio from all subscribed streams together into a single audio stream. You cannot distinguish between individual participants' audio in this callback.

This design allows you to:

  • Process video from each participant independently for tasks like layout management, individual recording, or per-stream video effects
  • Receive pre-mixed audio optimized for playback or further processing without manual mixing
  • Configure the mixed audio format via audio_subscribers_mix in SessionAVSettings to match your processing requirements

Unsubscribe from streams

Stop receiving media from a specific stream by unsubscribing.

Audio data handling

Audio format

Audio data is delivered as Linear PCM 16-bit signed integers with the following characteristics:

  • Sample rates: 8000, 12000, 16000, 24000, 32000, 44100, or 48000 Hz
  • Channels: 1 (mono) or 2 (stereo)
  • Format: 16-bit signed integers in a memoryview buffer
  • Frame size: Typically 20ms chunks (varies by sample rate)

Processing audio data

Handle incoming audio in the audio data callback.
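
A sketch that reinterprets the 16-bit sample buffer for processing; the audio_data field name is an assumption.

```python
import array

def on_audio_data(session, audio_data):
    samples = array.array("h")                           # 16-bit signed integers
    samples.frombytes(bytes(audio_data.sample_buffer))   # hypothetical field name
    # One callback delivers number_of_frames * number_of_channels samples.
    rms = (sum(s * s for s in samples) / max(len(samples), 1)) ** 0.5
    print(f"mixed audio: {len(samples)} samples, RMS {rms:.0f}")
```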

Creating audio data

When adding audio, create properly formatted AudioData objects.
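
For example, one 20 ms chunk of a 440 Hz test tone at 48 kHz mono; the AudioData keyword names are assumptions.

```python
import array
import math

SAMPLE_RATE = 48000
FRAMES = SAMPLE_RATE * 20 // 1000        # 960 frames per 20 ms chunk

tone = array.array("h", (
    int(10_000 * math.sin(2 * math.pi * 440 * i / SAMPLE_RATE))
    for i in range(FRAMES)
))

audio_data = AudioData(                  # hypothetical keyword names
    sample_buffer=memoryview(tone),
    number_of_frames=FRAMES,
    number_of_channels=1,
    sample_rate=SAMPLE_RATE,
)
```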

Audio data continuity

When you publish audio, the library manages audio continuity automatically in several scenarios:

Initial publishing: When you start publishing audio (via publish() with has_audio=True), the library automatically sends silence (zero-filled audio frames) until you provide your first audio data via add_audio(). This ensures the audio stream is immediately available to subscribers without waiting for your application to generate audio data.

Silence tolerance: If you temporarily stop providing audio data via add_audio(), the library tolerates brief gaps by not sending any audio packets. This hysteresis period prevents unnecessary silence packets during momentary processing delays.

Explicit silence: After the tolerance period, if no new audio data is available, the library switches to sending explicit silence frames (zero-filled audio). This maintains the audio stream while indicating that no active audio is being provided.

Buffer flush: If you provide less than a full period's worth of audio data, the library will flush the remaining data and pad it with silence to maintain the correct timing and prevent audio drift.

Best practices:

  • Maintain a consistent audio rate by calling add_audio() at regular intervals matching your configured sample rate
  • Monitor buffer statistics using get_media_buffer_stats() to ensure adequate audio data
  • Handle the on_media_buffer_drained_cb callback to detect when your audio buffer is depleted
  • Consider implementing an audio generation strategy that adapts to varying processing loads

This automatic audio management ensures that your published audio stream remains continuous and properly timed even during temporary gaps in data availability.
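
One way to follow the first best practice above: pace add_audio() against a monotonic clock so the cadence matches the 20 ms chunk size. The chunk producer and stop condition are application-specific placeholders.

```python
import time

def pump_audio(connector, next_chunk, should_run):
    """Feed one 20 ms AudioData chunk per iteration at a steady cadence."""
    chunk_seconds = 0.02
    next_deadline = time.monotonic()
    while should_run():
        connector.add_audio(next_chunk())   # your AudioData producer
        next_deadline += chunk_seconds
        time.sleep(max(0.0, next_deadline - time.monotonic()))
```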

Video data handling

Video format

Video data is delivered as 8-bit unsigned chars in one of three formats:

  • YUV420P: Planar YUV format with 4:2:0 chroma subsampling
  • RGB24: 24-bit BGR format (8 bits per channel)
  • ARGB32: 32-bit BGRA format with alpha channel

Video specifications:

  • Resolutions: Up to 1920x1080 (Full HD)
  • Frame rates: 1-30 FPS
  • Format: 8-bit unsigned chars in a memoryview buffer

Processing video frames

Handle incoming video frames in the render frame callback.
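
A sketch of a render-frame callback; apart from the formats and limits documented above, the video_frame field names are assumptions.

```python
def on_render_frame(subscriber, video_frame):
    width = video_frame.resolution.width       # hypothetical field names
    height = video_frame.resolution.height
    if video_frame.format == "YUV420P":
        y_plane = video_frame.buffer[: width * height]   # luma plane comes first
        print(f"frame {width}x{height}, {len(y_plane)} luma bytes")
```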

Creating video frames

When publishing video, create properly formatted VideoFrame objects.
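
For example, a black 640x360 YUV420P frame: zero luma with chroma at the neutral value 128, giving the width * height * 3 / 2 buffer size noted above. Constructor keywords are assumptions.

```python
WIDTH, HEIGHT = 640, 360

y_plane = bytes(WIDTH * HEIGHT)                    # luma 0 -> black (full range)
uv_planes = bytes([128]) * (WIDTH * HEIGHT // 2)   # two quarter-size chroma planes
frame_buffer = y_plane + uv_planes                 # WIDTH * HEIGHT * 3 // 2 bytes

video_frame = VideoFrame(                          # hypothetical keyword names
    buffer=memoryview(frame_buffer),
    format="YUV420P",
    resolution=VideoResolution(width=WIDTH, height=HEIGHT),
)
connector.add_video(video_frame)
```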

Video frame continuity

When you publish video, the library manages frame continuity automatically in several scenarios:

Initial publishing: When you start publishing video (via publish() with has_video=True), the library automatically sends black frames until you provide your first frame via add_video(). This ensures the video stream is immediately available to subscribers without waiting for your application to generate video data.

Last frame repetition: If you stop providing video frames via add_video(), the library will automatically repeat the last frame you provided. This ensures smooth playback for subscribers without interruption. The last frame will be repeated for up to 2 seconds.

Black frame fallback: After the maximum repetition period (2 seconds), the library switches to publishing black frames. This indicates to subscribers that video data is no longer actively being provided while maintaining the video stream.

Best practices:

  • Maintain a consistent frame rate by calling add_video() at regular intervals matching your configured FPS
  • Monitor buffer statistics using get_media_buffer_stats() to ensure adequate video data
  • Handle the on_media_buffer_drained_cb callback to detect when your video buffer is depleted
  • Consider implementing a frame generation strategy that adapts to varying processing loads

This automatic frame management ensures that your published video stream remains continuous even during temporary gaps in data availability.

Media buffer management

Checking buffer stats

Monitor the state of your media buffers.
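
get_media_buffer_stats() is documented above; the MediaBufferStats field shown below is an assumption.

```python
stats = connector.get_media_buffer_stats()

if stats.audio_buffered_ms < 40:     # hypothetical field: under two 20 ms chunks
    refill_audio()                   # your logic to enqueue more audio
```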

Clearing media buffers

Clear both audio and video buffers when needed.

Buffer drained callback

Handle buffer drain events.

Understanding buffer drain events:

The on_media_buffer_drained_cb callback is invoked when the internal audio or video buffers are depleted. This occurs when media data is being transmitted to the session at a rate that exceeds the rate at which new media data is being provided via add_audio() or add_video() calls.

This callback serves as a notification that you should increase your media production rate or adjust your publishing strategy to maintain continuous media flow. Monitoring these events helps prevent gaps or interruptions in your published stream.

Callback hysteresis behavior:

The callback implements hysteresis to prevent excessive triggering. After the initial drain event, the callback will not be invoked again until the buffer is replenished with new media data and subsequently becomes depleted again. This prevents a flood of repeated notifications while the buffer remains empty.
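
A sketch of a drain handler; because of the hysteresis described above, expect one notification per drain/replenish cycle.

```python
def on_media_buffer_drained(publisher):
    # Refill promptly: the callback will not fire again until the buffer
    # has been replenished and subsequently runs dry a second time.
    connector.add_audio(make_silence_chunk())   # your recovery strategy
```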

Getting connection info

Retrieve your local connection information.

Event callbacks

Session callbacks

Handle session-level events, such as the mixed audio delivered through the on_audio_data_cb callback. Callbacks are registered when you call connect(), as shown in Connection with all callbacks above.

Connection callbacks

Monitor participants connecting to and disconnecting from the session; each participant is represented by the Connection structure described above.

Stream callbacks

Handle streams being created and destroyed as participants publish and stop publishing media; see Subscribing to streams above for how to subscribe when a new stream arrives.

Publisher callbacks

Handle publishing events, such as on_ready_for_audio_cb signalling that the audio system is initialized and add_audio() may be called.

Subscriber callbacks

Handle subscription events, such as per-stream video frames delivered through on_render_frame_cb.

Media buffer callbacks

Handle media buffer events, such as on_media_buffer_drained_cb signalling that your publish buffer is depleted; see Media buffer management above.

Resource cleanup

Always clean up resources properly.
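
A sketch of a shutdown path; disconnect() is a hypothetical method name, as noted under Disconnecting from a session.

```python
try:
    run_application_logic(connector)   # your session workload
finally:
    connector.disconnect()             # hypothetical method; always disconnect
```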