(Quasi-) real-time video processing on iOS with AVFoundation

In previous posts, I showed you how to create a custom camera using AVFoundation and how to process an image with the Accelerate framework. Let’s now combine both results to build (quasi-) real-time video processing (I’ll explain later what I mean by quasi).

Custom camera preview

To appreciate what we are going to do, we need to build a custom camera preview. If we want to process a video buffer and show the result in real time, we cannot use the AVCaptureVideoPreviewLayer as shown in this post, because that preview layer renders the signal directly and offers no way to process it before rendering. To make this possible, you need to take the video buffer, process it, and then render it on a custom CALayer. Let’s see how to do that.

As I already demonstrated here, setting up the AVFoundation stack is quite straightforward (thank you, Apple): you need to create a capture session (AVCaptureSession), then a capture device (AVCaptureDevice), and add it to the session as a device input (AVCaptureDeviceInput). Translated into source code, this becomes:

The output buffer

Up to here, nothing is new with respect to the previous post. This is where the new stuff comes in. First of all, we need to define a video data output (AVCaptureVideoDataOutput) and add it to the session:

Here, I defined the output format as YUV (YpCbCr 4:2:0). If you don’t know what I am talking about, I suggest you take a look at this article. YUV or, more correctly, YCbCr is a very common video format, and I use it here because, unless the color carries some useful information, you usually use gray-level images for image processing. The YUV format provides a signal with an intensity component (the Y) and 2 chromatic components (the U and the V).
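To make the 4:2:0 layout a bit more concrete, here is a minimal sketch in plain C (which compiles unchanged inside an Objective-C file). It assumes a bi-planar layout, that is, one full-resolution Y plane followed by one plane of interleaved Cb/Cr pairs subsampled by 2 in both directions, and it assumes rows without padding. The names PlaneSizes and plane_sizes_420 are mine, made up for illustration, and are not part of any Apple API.

```c
#include <assert.h>
#include <stddef.h>

/* Sizes, in bytes, of the two planes of a bi-planar 4:2:0 frame.
   Assumption: tightly packed rows (no padding) and even dimensions. */
typedef struct {
    size_t luma_bytes;    /* Y plane: one byte per pixel */
    size_t chroma_bytes;  /* CbCr plane: one Cb/Cr byte pair per 2x2 pixel block */
} PlaneSizes;

static PlaneSizes plane_sizes_420(size_t width, size_t height) {
    PlaneSizes sizes;
    assert(width % 2 == 0 && height % 2 == 0);
    sizes.luma_bytes = width * height;
    sizes.chroma_bytes = (width / 2) * (height / 2) * 2;
    return sizes;
}
```

For a 640x480 frame, plane_sizes_420(640, 480) gives 307200 bytes of luma and 153600 bytes of chroma, so the gray-level image used for processing is two thirds of the whole frame.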

The destination layer

Additionally, we need to create a new layer and use it as our rendering destination:

We can add this layer to any other layer. I’m going to add it to my view controller’s view layer:

Let’s go

The last step of the initial configuration is to create a GCD queue that is going to manage the video buffer, and to set our class as the delegate of the video data output sample buffer (AVCaptureVideoDataOutputSampleBufferDelegate):

Final setup

Now, remember to add the following frameworks to your project:

  • AVFoundation
  • CoreMedia
  • CoreVideo
  • CoreGraphics

Video rendering

Since the view controller is now the delegate of the capture video data output, you can implement the following callback:

AVFoundation fires this delegate method as soon as it has a data buffer available. So, you can use it to collect the video buffer frames, process them, and render them on the layer that we previously created. For the moment, let’s collect the video buffer frames and render them on the layer. Later, we’ll take a look at the image processing.

The previous delegate method receives a sampleBuffer of type CMSampleBufferRef. This is a Core Media object that we can bring into Core Video:

Let’s lock the buffer base address:

Then, let’s extract some useful image information:

Remember the video buffer is in YUV format, so I extract the luma component from the buffer in this way:
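One detail worth keeping in mind when reading the luma plane: the number of bytes per row reported for a plane can be larger than the image width, because rows may be padded. The following plain C sketch shows how a padded luma plane could be copied into a tightly packed buffer. The function name copy_luma_plane and its parameters are hypothetical; in a real project the source pointer, width, height and bytes per row would come from the locked pixel buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy the visible part of a (possibly padded) luma plane into a
   tightly packed buffer.
   src:           first byte of the Y plane
   bytes_per_row: stride of the source plane in bytes (>= width)
   dst:           caller-allocated buffer of at least width * height bytes */
static void copy_luma_plane(const uint8_t *src, size_t width, size_t height,
                            size_t bytes_per_row, uint8_t *dst) {
    assert(bytes_per_row >= width);
    for (size_t row = 0; row < height; row++) {
        /* Skip the padding at the end of each source row. */
        memcpy(dst + row * width, src + row * bytes_per_row, width);
    }
}
```

If you pass the stride along to the later stages instead (both Core Graphics and vImage accept a bytes-per-row value), this copy is not needed; the sketch only illustrates what the stride means.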

Now, let’s render this buffer on the layer. To do so, we need to use Core Graphics: create a color space, create a graphics context, and render the buffer onto the graphics context using the created color space:

So, the dstImage is a Core Graphics image (CGImage), created from the captured buffer. Finally, we render this image on the layer, changing its contents. We do that on the main queue:

Now, let’s do some clean-up (we are good citizens, right?).

If you build and run, you’ll see the camera in action with your camera preview.

Image processing

Now, let’s start with the fun stuff. Let’s process the buffer before rendering it. For this, I am going to use the Accelerate framework. The Pixel_8 *lumaBuffer is the input of my algorithm. I need to wrap this buffer in a vImage_Buffer and prepare a second vImage_Buffer for the output of the image processing algorithm.
Add this code after the line generating the lumaBuffer:
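A vImage_Buffer is just a small C struct with four fields: data, height, width and rowBytes. To show the idea without depending on Accelerate, here is a sketch that uses a stand-in struct with the same four fields. PlanarBuffer, wrap_pixels and make_output_like are hypothetical names of mine; with the real framework you would fill a vImage_Buffer in the same way.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in that mirrors the four fields of vImage_Buffer. */
typedef struct {
    void  *data;      /* pointer to the top-left pixel */
    size_t height;    /* number of rows */
    size_t width;     /* number of pixels per row */
    size_t rowBytes;  /* distance in bytes between two rows */
} PlanarBuffer;

/* Describe existing pixels without copying them (the input image). */
static PlanarBuffer wrap_pixels(uint8_t *pixels, size_t width,
                                size_t height, size_t row_bytes) {
    PlanarBuffer buffer = { pixels, height, width, row_bytes };
    return buffer;
}

/* Allocate a zeroed, tightly packed buffer with the same dimensions
   (the output image). The caller must free .data when done. */
static PlanarBuffer make_output_like(const PlanarBuffer *input) {
    PlanarBuffer output = { calloc(input->height, input->width),
                            input->height, input->width, input->width };
    return output;
}
```

The input buffer only points at the luma data, so it costs nothing; the output buffer owns its memory and has to be released once the frame has been rendered.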

The -maxFromImage:toImage: method does all the work. Just for fun, I process the input image with a morphological operator that minimizes a region of interest within the image. Here it is:
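Judging by its name, this method applies a max operator. The Accelerate framework offers vImageMax_Planar8 for this kind of operation on 8-bit planar images, but whether the method uses exactly that call is an assumption on my side. As a reference for what a morphological max does, here is a deliberately naive plain C sketch. The function max_filter_planar8 and its radius parameter are hypothetical, and the code is far slower than an optimized implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Morphological max (dilation) on an 8-bit planar image with a square
   kernel of side 2 * radius + 1. Each output pixel is the largest value
   found in the kernel window around it; pixels outside the image are
   ignored. src and dst must not overlap and share the same row_bytes. */
static void max_filter_planar8(const uint8_t *src, uint8_t *dst,
                               size_t width, size_t height,
                               size_t row_bytes, size_t radius) {
    for (size_t y = 0; y < height; y++) {
        /* Clamp the window to the image borders. */
        size_t y0 = y > radius ? y - radius : 0;
        size_t y1 = y + radius < height ? y + radius : height - 1;
        for (size_t x = 0; x < width; x++) {
            size_t x0 = x > radius ? x - radius : 0;
            size_t x1 = x + radius < width ? x + radius : width - 1;
            uint8_t best = 0;
            for (size_t ky = y0; ky <= y1; ky++) {
                for (size_t kx = x0; kx <= x1; kx++) {
                    uint8_t value = src[ky * row_bytes + kx];
                    if (value > best) best = value;
                }
            }
            dst[y * row_bytes + x] = best;
        }
    }
}
```

With a radius of 1, a single bright pixel in the input grows into a 3x3 bright square in the output: bright regions expand and dark details shrink.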

If you now run it, rendering the outImage on the custom preview, you should obtain something like this:

Pixelate effect

You can download the example from here.

Final considerations

As I mentioned at the beginning of this post, this processing is done in quasi-real-time. The limitation derives from the Accelerate framework: it is optimized for the CPU, which is a limited resource anyway. Depending on the final application, this limitation might not matter. However, if you start adding more processing before the rendering, you will see what I mean. Again, the result really depends on the application, but if you really want to process and display the results in real time, maybe you should think of using the GPU… but this is something for a future post.
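A quick way to reason about this is the time budget per frame. The sketch below, in plain C, only does the arithmetic: the capture rate you pass in (for example 30 frames per second) is an assumption that depends on your session configuration, and frame_budget_ms and fits_frame_budget are hypothetical helper names.

```c
/* Time available to process one frame, in milliseconds,
   at the given capture rate in frames per second. */
static double frame_budget_ms(double fps) {
    return 1000.0 / fps;
}

/* Nonzero when the measured processing time fits in the frame budget,
   that is, when the pipeline can keep up with the camera. */
static int fits_frame_budget(double processing_ms, double fps) {
    return processing_ms <= frame_budget_ms(fps);
}
```

At 30 frames per second the budget is roughly 33 ms. Once the processing of a frame takes longer than that, frames start to arrive faster than they can be rendered, and the preview stops being real time.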



