Virtual backgrounds are one of the hot topics among remote workers at the moment. With many of us isolated because of the Covid-19 pandemic, a lot of people have to take video calls in order to carry on their work. Some video conferencing tools let you set a virtual background, so users can create a friendlier atmosphere for these calls.

The clearer picture
Photo by timJ / Unsplash
Interested in more stories like this? Follow me on Twitter at @b_dmarius and I'll post there every new article.

As a programmer, I was naturally intrigued the first time I used such a virtual background. How does it work? Could I build one myself? And if so, how? Spoiler: it did not go well! Still, I think it was a good educational exercise, and I didn't find much information on this topic while researching it. So, as I do with everything I learn, I decided to document it here in case someone else benefits from it.

So in this tutorial we are going to try a basic approach to building a virtual background with Computer Vision techniques, using Python and OpenCV.

Introduction

The goal of this project is to take a video, figure out which parts of it are background and which are foreground, remove the background and replace it with a picture: the virtual background. Because we are going to use trivial methods, we need to assume that the foreground generally has colors different from the background's. But first, let's look at our tools.

Computer Vision

Computer Vision is an interdisciplinary field that deals with how computers can process and (maybe) understand images and videos. We call it interdisciplinary because it borrows many concepts from different disciplines (computer science, algebra, geometry and so on) and combines them to solve a range of complex tasks, like object tracking, detection, recognition and segmentation in images and videos.

OpenCV

OpenCV is a library built for solving computer vision tasks. It is open source and available for several programming languages, including Python and C++. It offers a tremendous number of computer vision features, some based on mathematical and statistical approaches, others based on Machine Learning.

Python

If you've made it this far in this article, you probably know what Python is 😀

Building a virtual background

The approach I tried for this was the following. I'll show code snippets for every step and at the end of the article you'll have the full code.

1. Import dependencies
import numpy as np
import cv2

2. Load the video from the local environment and initialize data

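cap = cv2.VideoCapture('video6.mp4')  # open the input video
ret = True            # becomes False once cap.read() runs out of frames
frameCounter = 0      # decides whether a frame is stored as previousFrame or nextFrame
previousFrame = None
nextFrame = None
iterations = 0        # total number of frames read so far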

3. Load the substitute background image from the local environment

backgroundImage = cv2.imread("image1.jpg")

4. Split the video frame by frame

while ret:
        ret, frame = cap.read()
        if not ret:  # end of the video: frame is None, so stop here
            break

5. Take every pair of two frames

        if frameCounter % 2 == 1:
            nextFrame = frame

        if frameCounter % 2 == 0:
            frameCounter = 0
            previousFrame = frame

        frameCounter = frameCounter + 1
        iterations = iterations + 1
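To see which frames actually get compared, here is a small sketch that replays the bookkeeping from steps 5 and 6 using frame indices instead of images. simulate_pairing is a hypothetical helper written only for this illustration; the variable names mirror the ones in the snippets.

```python
def simulate_pairing(num_frames):
    """Replay the frame-pairing logic of steps 5 and 6 on frame indices and
    return the (previousFrame, nextFrame) pairs that would be differenced."""
    frameCounter = 0
    iterations = 0
    previousFrame = None
    nextFrame = None
    pairs = []
    for frame in range(num_frames):  # the index stands in for the image
        if frameCounter % 2 == 1:
            nextFrame = frame
        if frameCounter % 2 == 0:
            frameCounter = 0
            previousFrame = frame
        frameCounter = frameCounter + 1
        iterations = iterations + 1
        if iterations > 2:  # same guard as step 6
            pairs.append((previousFrame, nextFrame))
    return pairs


print(simulate_pairing(6))  # [(2, 1), (2, 3), (4, 3), (4, 5)]
```

So from the third frame on, every new frame is differenced against the one just before it; only the order inside each pair alternates, which doesn't matter because cv2.absdiff is symmetric. One side effect is that nextFrame, whose pixels are copied in step 10, is the older frame of the pair on every other iteration.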

6. Compute the absolute difference between the two frames and convert it to grayscale to obtain a mask.

        if iterations > 2:
            diff = cv2.absdiff(previousFrame, nextFrame)
            mask = cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY)

Every image consists of pixels. You can imagine an image as a 2D matrix with rows and columns, where every cell is a pixel (color images actually have a third dimension for the color channels, but for simplicity we can ignore that).

We obtain the difference by going pixel by pixel through the first image (cell by cell through the first matrix) and subtracting the corresponding pixel of the other image (the corresponding cell of the other matrix), keeping the absolute value so the result is never negative.
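The "absolute" part matters because OpenCV frames are arrays of 8-bit unsigned integers. Plain subtraction on such arrays silently wraps around instead of going negative, while cv2.absdiff computes |a - b| for each pixel. This sketch shows the difference with NumPy alone, using two made-up 1x3 grayscale "frames"; the int16 cast is just one way to reproduce what cv2.absdiff gives.

```python
import numpy as np

# Two tiny 1x3 grayscale "frames" stored as uint8, like OpenCV images.
a = np.array([[10, 200, 50]], dtype=np.uint8)
b = np.array([[20, 180, 50]], dtype=np.uint8)

naive = a - b  # uint8 arithmetic wraps around: 10 - 20 becomes 246
absolute = np.abs(a.astype(np.int16) - b.astype(np.int16)).astype(np.uint8)

print(naive)     # [[246  20   0]]
print(absolute)  # [[10 20  0]]
```

With the naive version, a pixel that got slightly darker would look like a huge change.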

Now here's the trick: if a pixel has not changed between the two frames, the result will be 0. How can a pixel differ between two frames? If the video is completely static (nothing moves in the image), the difference will be 0 for every pixel between every pair of frames, because nothing changes. But if something moves, we can identify where it moved by detecting the pixel differences. And we can assume that, in a video conference, the things that move are in the foreground (that's you) and the static part is the background.

And why is this 0 so important? Every pixel with a value of 0 shows up as black, and we are going to use that to our advantage.
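A toy example helps make this concrete. The sketch below builds two synthetic 6x6 grayscale frames: a uniform gray "wall" with a bright 2x2 "hand" that moves one column to the right. The frames, values and the motion_mask helper are all made up for illustration.

```python
import numpy as np


def motion_mask(previous, current):
    """Absolute per-pixel difference of two single-channel uint8 frames."""
    return np.abs(previous.astype(np.int16) - current.astype(np.int16)).astype(np.uint8)


# Synthetic 6x6 frames: a gray wall (value 100) with a bright 2x2 hand (value 255).
frame1 = np.full((6, 6), 100, dtype=np.uint8)
frame2 = frame1.copy()
frame1[1:3, 1:3] = 255  # hand at rows 1-2, columns 1-2
frame2[1:3, 2:4] = 255  # hand moved one column to the right

mask = motion_mask(frame1, frame2)
print(np.argwhere(mask > 0).tolist())  # [[1, 1], [1, 3], [2, 1], [2, 3]]
```

Everything on the wall stays 0 (black), as expected. Notice, though, that column 2 is also 0: the hand covers it in both frames, so a uniformly colored object that moves only "lights up" at its edges. That is one reason a frame-difference mask tends to produce a patchy foreground.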

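7. Find the cells in the mask that are above a threshold value. In practice, camera noise and video compression mean that static pixels rarely differ by exactly 0, so a small threshold filters out that noise. I've chosen 3, but you can experiment with different values. A larger value will remove more of the background, but may also remove more of the foreground.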

            th = 3
            isMask = mask > th
            nonMask = mask <= th
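To get a feel for the threshold, this sketch simulates a completely static wall seen in two frames with a little random sensor noise (plus or minus 4 intensity levels, an arbitrary assumption), and counts how many pixels each threshold would wrongly keep as foreground.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical static scene: the same wall in two frames, plus small noise.
wall = np.full((100, 100), 120, dtype=np.int16)
frame_a = (wall + rng.integers(-4, 5, size=wall.shape)).astype(np.uint8)
frame_b = (wall + rng.integers(-4, 5, size=wall.shape)).astype(np.uint8)
mask = np.abs(frame_a.astype(np.int16) - frame_b.astype(np.int16))

for th in (0, 3, 6, 9):
    kept = int((mask > th).sum())  # static pixels that would be treated as foreground
    print(f"th={th}: {kept} of {mask.size} static pixels kept as foreground")
```

Nothing moves here, so ideally every count would be 0. Low thresholds let noise through; higher ones remove it but would also start eating into a real foreground whose color is close to the wall's. If you prefer an OpenCV-native version of this step, cv2.threshold with cv2.THRESH_BINARY produces a 0/255 mask from the same comparison.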

8. Create an empty image (0 for every cell) with the same size as either of the two frames.

            result = np.zeros_like(nextFrame, np.uint8)

9. Resize the background image so that it has the same size as the frames.

            resized = cv2.resize(backgroundImage, (result.shape[1], result.shape[0]), interpolation = cv2.INTER_AREA)

10. For every cell from the mask that is bigger than the threshold, copy from the original frame.

            result[isMask] = nextFrame[isMask]

11. For every cell of the mask that is at or below the threshold, copy the pixel from the substitute background image.

            result[nonMask] = resized[nonMask]
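Steps 10 and 11 work even though the mask is 2D and the images are 3D: indexing an (H, W, 3) array with an (H, W) boolean mask selects all three color channels of each chosen pixel. This sketch demonstrates that on a hypothetical 2x2 frame and background, and shows np.where as an equivalent one-liner.

```python
import numpy as np

# Hypothetical 2x2 color frame (all white) and background (pure blue in BGR order).
frame = np.full((2, 2, 3), 255, dtype=np.uint8)
background = np.zeros((2, 2, 3), dtype=np.uint8)
background[..., 0] = 255  # channel 0 is blue in OpenCV's BGR layout

isMask = np.array([[True, False],
                   [False, True]])  # one boolean per pixel

# Steps 10 and 11: each selected pixel brings along all 3 channels.
result = np.zeros_like(frame)
result[isMask] = frame[isMask]
result[~isMask] = background[~isMask]  # ~isMask is the same as nonMask

# Equivalent one-liner: add a channel axis so the mask broadcasts over (H, W, 3).
same = np.where(isMask[..., None], frame, background)
print(np.array_equal(result, same))  # True
```

The np.where form avoids allocating the empty image in step 8, but both versions produce the same result.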

12. Save the result frame to the local environment.

            cv2.imwrite("output" + str(iterations) + ".jpg", result)

Results and conclusion

So what are the results? Honestly, I was a bit disappointed. Then I did more research and the reason became obvious: this problem needs a more advanced approach, and it's no surprise that big companies invest lots of resources in it.

Here's a screenshot of the video I tried. It's basically a video of my hand moving in front of a wall.

Virtual background Python and OpenCV tutorial - input

And here's a screenshot of the output image. For the background I used a photo of me in Rasnov, Romania.

Virtual background Python and OpenCV tutorial - output

As I said, I am not very satisfied with the result, but I am satisfied with what I learned from this project. It was a fun learning experience and a nice way to spend time working with concepts I'm not yet comfortable with.

Other approaches to creating a virtual background

If a problem seems very complicated and seems to require a level of intelligence you wouldn't expect from computer software, then the answer might be Machine Learning. 😀

There are already Deep Learning models that can perform this sort of task. But training such a model requires large datasets and lots of processing power, neither of which I had when writing this article. The task such a deep learning model solves is called image segmentation.
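I haven't trained such a model, so I can't show one here, but it's worth sketching how its output would plug into the rest of the pipeline. Segmentation models commonly output a per-pixel foreground probability between 0 and 1; that would replace the frame-difference mask, and instead of a hard threshold you can blend, which softens the edges around the person. The probability values below are made up, and blend_with_probability_mask is a hypothetical helper.

```python
import numpy as np


def blend_with_probability_mask(frame, background, prob):
    """Alpha-blend a frame over a background using a per-pixel foreground
    probability in [0, 1] (for example, the output of a segmentation model)."""
    alpha = prob.astype(np.float32)[..., None]  # (H, W, 1) broadcasts over channels
    out = alpha * frame.astype(np.float32) + (1.0 - alpha) * background.astype(np.float32)
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)


frame = np.full((1, 3, 3), 200, dtype=np.uint8)
background = np.full((1, 3, 3), 100, dtype=np.uint8)
prob = np.array([[1.0, 0.5, 0.0]])  # definitely person, uncertain edge, definitely background
print(blend_with_probability_mask(frame, background, prob)[0, :, 0])  # [200 150 100]
```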

Another approach would be a computer vision method for estimating the distance between the camera and the objects in the image (depth estimation). You would then set a distance threshold to separate the foreground from the background, and use the same kind of mask I used to remove the background and introduce a new one.
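Assuming you already have a depth map (from a depth camera or some depth estimation method, neither of which is covered here), the thresholding part is straightforward. In this sketch the depth values, the 1.5 m cutoff and the depth_mask helper are all hypothetical; the compositing is the same np.where form of steps 10 and 11.

```python
import numpy as np


def depth_mask(depth_m, max_distance_m):
    """Foreground = pixels closer to the camera than max_distance_m (in meters)."""
    return depth_m < max_distance_m


# Hypothetical 2x3 depth map in meters: a person about 0.8 m away, a wall about 2.5 m away.
depth = np.array([[0.8, 0.9, 2.5],
                  [0.7, 2.4, 2.6]])
frame = np.full((2, 3, 3), 255, dtype=np.uint8)
background = np.zeros((2, 3, 3), dtype=np.uint8)

isMask = depth_mask(depth, max_distance_m=1.5)
result = np.where(isMask[..., None], frame, background)  # keep near pixels, replace the rest
print(isMask.astype(int).tolist())  # [[1, 1, 0], [1, 0, 0]]
```

Unlike the frame-difference mask, this one doesn't depend on you moving or on your clothes contrasting with the wall.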

Thank you so much for reading this.