In education and events settings in particular, actively moderating participants' video can be very useful, because it lets you block inappropriate content before other participants see it. The application we build in this tutorial also lets you save data during the call and run post-call analysis of detection performance.
In this blog post, we will implement a video moderation application using the Vonage Video API and AWS Rekognition. The application moderates the camera and screen-sharing video of every publisher in the session. If it detects inappropriate content, it mutes the offending publisher's video and sends a notification to all participants.
Want to jump ahead? You can find the code for this tutorial on GitHub and the video tutorial and demo on YouTube.
Prerequisites
A Vonage Video API account. If you don't have one already, you can create an account in the Video Dashboard.
An AWS account
Project Architecture
The application backend is implemented using AWS Serverless components such as AWS Lambda, AWS API Gateway, AWS DynamoDB and AWS Rekognition service.
The backend is contained in the src/functions folder. There are two main functions:
api/room.js: handles room creation in DynamoDB and assigns a Vonage Video API sessionId to the given room name
api/moderation.js: receives the base64 image from the client, sends the image to the AWS Rekognition service, and sends the result back to the client
The room function receives a parameter called roomName. Based on the roomName, it checks if the room exists. If so, it sends back the sessionId related to the existing room and the token to join the room. If not, it creates a new sessionId, saves it in DynamoDB and sends back the credentials (sessionId and token).
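The get-or-create flow described above can be sketched as follows. This is a minimal illustration rather than the code from the repository: findRoom, saveRoom, createSession and generateToken are hypothetical helpers standing in for the DynamoDB and Vonage Video API calls, and they are injected so the logic can be read in isolation.

```javascript
// Hypothetical sketch of the get-or-create logic behind api/room.js.
// The four helpers in `deps` are assumptions: they stand in for DynamoDB
// reads/writes and for Vonage Video API session and token creation.
async function getRoomCredentials(roomName, deps) {
  const { findRoom, saveRoom, createSession, generateToken } = deps;

  // Reuse the session when the room already exists.
  const existing = await findRoom(roomName);
  let sessionId = existing ? existing.sessionId : null;

  if (!sessionId) {
    // Otherwise create a new session and persist the room-to-session mapping.
    sessionId = await createSession();
    await saveRoom({ roomName, sessionId });
  }

  // Every caller gets its own token for the room's session.
  const token = await generateToken(sessionId);
  return { sessionId, token };
}
```

Two calls with the same roomName should therefore return the same sessionId, with only the first call creating a session.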
The moderation function receives base64-encoded images from the client-side camera or screen share. Before sending an image to the AWS Rekognition service, the function decodes the base64 string into a binary buffer.
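The decoding step can be sketched as follows: the client sends each frame as a base64 string, while the Image.Bytes parameter of Rekognition takes binary data, so the function converts one into the other. The helper name decodeBase64Image is hypothetical, and stripping a data URL prefix is an assumption that only matters if the client sends a full data URL.

```javascript
// Sketch: turn the base64 string sent by the client into a binary Buffer
// suitable for Rekognition's Image.Bytes parameter.
// Assumption: the string may carry a "data:image/...;base64," prefix.
function decodeBase64Image(base64Image) {
  const payload = base64Image.replace(
    /^data:image\/[a-zA-Z0-9.+-]+;base64,/,
    ""
  );
  return Buffer.from(payload, "base64");
}
```

The returned Buffer can be passed directly as the imageBuffer argument of a function like detectModerationLabels.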
const AWS = require("aws-sdk");
const Rekognition = new AWS.Rekognition();
const config = require("../config.json");

// Ask Rekognition for moderation labels on a binary image buffer.
function detectModerationLabels(imageBuffer) {
  const params = {
    Image: {
      Bytes: imageBuffer,
    },
    // Labels with a confidence below this threshold (0-100) are not returned.
    MinConfidence: Number(config.AWS_REKOGNITION_MIN_CONFIDENCE),
  };
  return Rekognition.detectModerationLabels(params).promise();
}
Then, it calls the detectModerationLabels function, which returns the moderation labels detected in the image together with their confidence scores. If nothing is detected, the returned array is empty. Otherwise, the array of detected labels is sent back to the client.
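As a rough sketch of the last step, the Rekognition result can be mapped to the payload the client reads. Rekognition returns the labels in a ModerationLabels array; the helper name buildModerationResponse and the rule "any label means inappropriate" are assumptions, and the field name innapropriate is spelled the way the client-side hook reads it.

```javascript
// Sketch: map a DetectModerationLabels result to the response body
// the client consumes (labels and innapropriate).
// Assumption: any returned label marks the frame as inappropriate.
function buildModerationResponse(rekognitionResult) {
  // Rekognition lists the detected labels under ModerationLabels.
  const labels = rekognitionResult.ModerationLabels || [];
  return {
    labels,
    // Spelling matches the field read by the client-side hook.
    innapropriate: labels.length > 0,
  };
}
```

Because MinConfidence already filters low-confidence labels on the Rekognition side, a simple "is the array empty" check is enough for this sketch; a stricter policy could also inspect specific label names.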
Client Side
The client-side application is a React Single Page Application. The entry point of the project is the src/client/index.js file. The index file imports the App file, which contains the Routes and Component definition.
Pages
The routes are defined in the App.js file. The code uses the react-router-dom module to declare the routes. There are two main routes:
Waiting Room: The user can set up their microphone and camera settings and run a pre-call test on this page. Then, they can join the video call.
Video Room: The user can connect to the session, publish their stream, and subscribe to each stream inside the room.
The key thing to note on the Video Room page is the custom hook useModeration (hooks/useModeration). Every second, the useModeration hook sends a screenshot of the camera (or the screen) to the moderation API function.
For live streaming, it's ideal to have a process that periodically extracts frames and analyzes them with the image-based Rekognition API. This lets you get the detection response asynchronously and extend your AI/ML pipeline in the future, since most machine learning models work on images. Sending one screenshot per second is therefore a good compromise between timely content detection and the CPU and bandwidth usage of the client running the video application. To get the screenshot of the stream, the application uses the getImgData function provided by the Video SDK.
useInterval(
  () => {
    if (
      currentPublisher &&
      !currentPublisher.isLoading() &&
      currentPublisher.stream &&
      currentPublisher.stream.hasVideo &&
      isModerationActive
    ) {
      sendImage(currentPublisher.getImgData()).then((res) => {
        if (res && res.error) {
          return;
        }
        if (res && res.data && res.data.labels && res.data.labels.length) {
          setModerationLabels(parseModerationLabels(res.data.labels));
          setWarnOpenSnackbar(true);
          setCameraIsInappropriate(res.data.innapropriate);
        }
      });
    }
  },
  isIntervalRunning ? intervalDelay : null
);
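The hook above calls a sendImage helper that this post does not show. A minimal sketch of what such a helper could look like is below. The /moderation path, the { image } request body and the injectable fetchImpl option are assumptions for illustration, and the repository's implementation may differ. It resolves to { data } on success and { error } on failure, which matches the checks in the hook.

```javascript
// Hypothetical sendImage helper: POSTs one base64 frame to the moderation
// endpoint. The URL, the request body shape and `fetchImpl` are assumptions.
async function sendImage(base64Image, options = {}) {
  const { url = "/moderation", fetchImpl = globalThis.fetch } = options;
  try {
    const response = await fetchImpl(url, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ image: base64Image }),
    });
    if (!response.ok) {
      return { error: new Error("Moderation request failed: " + response.status) };
    }
    return { data: await response.json() };
  } catch (error) {
    // Network failures are reported the same way, so the caller can skip the frame.
    return { error };
  }
}
```

Never rejecting keeps the interval callback simple: a failed request only means that one frame goes unchecked, and the next tick tries again.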
If the moderation function detects inappropriate content, the useModeration hook shows a warning snackbar to the current publisher and disables their webcam or screen for a defined period of time (for example, 10 seconds). The hook also sends a signal to the other participants telling them that the publisher's video has been disabled because of inappropriate content.
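The reaction step could be sketched like this. The publisher and session arguments are expected to expose publishVideo() and signal() as in the Vonage Video client SDK; the function name muteForModeration, the signal type "moderation", the payload and the injectable schedule option are assumptions, not the hook's actual code.

```javascript
// Sketch of the reaction to a positive detection: stop the video,
// notify the other participants, and re-enable the video after a delay.
// Assumptions: the signal type/payload and the 10-second default.
function muteForModeration(publisher, session, options = {}) {
  const { muteMs = 10000, schedule = setTimeout } = options;

  // Stop sending video immediately.
  publisher.publishVideo(false);

  // Tell the other participants why the video disappeared.
  session.signal(
    {
      type: "moderation",
      data: JSON.stringify({ reason: "inappropriate-content" }),
    },
    (error) => {
      if (error) console.error("Could not send moderation signal", error);
    }
  );

  // Re-enable the video once the mute period is over.
  return schedule(() => publisher.publishVideo(true), muteMs);
}
```

Other clients would listen for the same signal type on their session object and show their own notification when it arrives.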
Conclusion
This post demonstrates how to integrate a content moderation API, AWS Rekognition, with the Vonage Video API. How the application reacts to inappropriate content is completely customizable based on your use case: it can mute the audio or video of the publisher, or even forcefully disconnect the user and ban them from joining the session again.
For more details on how you can moderate content using Vonage Video API, have a look at this article.
Resources: https://github.com/nexmo-se/video-api-aws-moderation
Enrico is a former Vonage team member. He worked as a Solutions Engineer, helping the sales team with his technical expertise. He is passionate about the cloud, startups, and new technologies. He is the Co-Founder of a WebRTC Startup in Italy. Out of work, he likes to travel and taste as many weird foods as possible.