Capture Video with AVFoundation and Swift

AVFoundation allows you to capture multimedia data generated by different input sources (camera, microphone, …) and redirect it to any output destination (screen, speakers, render context, …).

Some years ago, I wrote this post on how to build a custom video camera based on AVFoundation using Objective-C. At that time, Swift did not exist. Recently, we have received many requests to show how to build the same custom video camera using Swift, so here I am going to show you how to do that.

AVFoundation allows you to create custom playback and capture solutions for audio, video and still images. The advantage of using AVFoundation instead of off-the-shelf solutions such as UIImagePickerController is that you get access to the individual camera pixels. In this way, you can process video frames in real time using frameworks such as Metal, Core Image, Core Audio or Accelerate.

AVFoundation Capture Session

AVFoundation is based on the concept of a session. A session controls the flow of data from the input devices to the outputs. You initialize a session in a very straightforward way:
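A minimal sketch of this step, written in current Swift syntax, could look like the following. The constant name session is my own choice for illustration.

```swift
import AVFoundation

// Create the capture session that coordinates the data flow
// between the inputs and outputs added to it later.
let session = AVCaptureSession()
```

A freshly created session has no inputs or outputs and is not running until you call startRunning() on it.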

The session allows you to define the audio and video recording quality using the sessionPreset property. For this example, I am going to use the low-quality session preset (just to keep battery consumption low):
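As a sketch, setting the preset might look like this in current Swift, where the low-quality preset is spelled AVCaptureSession.Preset.low (older Swift versions used the AVCaptureSessionPresetLow constant). Checking canSetSessionPreset first is a defensive step I added, not something the text requires.

```swift
import AVFoundation

let session = AVCaptureSession()

// Apply the low-quality preset only if the session supports it.
if session.canSetSessionPreset(.low) {
    session.sessionPreset = .low
}
```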

AVFoundation Capture Device

After the capture session has been created, you need to define the capture device you want to use. It can be a camera or a microphone. In this example, I am going to use the AVMediaTypeVideo type that supports videos and images:
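A possible way to obtain the default video device in current Swift is sketched below. Here the media type is written AVMediaType.video, which is the modern spelling of AVMediaTypeVideo. The call returns nil when no camera is available (for example, in the Simulator), so the result is an optional.

```swift
import AVFoundation

// Ask AVFoundation for the default device that captures video.
// The result is nil when the system has no camera.
let captureDevice = AVCaptureDevice.default(for: .video)

if captureDevice == nil {
    print("No video capture device is available")
}
```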

AVFoundation Capture Device Input

Next, you need to define the input of the capture device and add it to the session. The next chunk of code does everything you need:
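The sketch below shows one way this configuration could be written in current Swift. The class name CameraController, the method configure(with:) and the queue label are hypothetical names chosen for illustration. The numbered comments (1) to (7) are my own annotations, and I am assuming they line up with the line references used in the walkthrough that follows.

```swift
import AVFoundation

// Hypothetical helper class; it acts as the delegate of the data output.
final class CameraController: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {
    let session = AVCaptureSession()

    func configure(with captureDevice: AVCaptureDevice) {
        do {
            // Creating the input can throw, hence the do-try-catch.
            let deviceInput = try AVCaptureDeviceInput(device: captureDevice)

            session.beginConfiguration()                        // (1)
            if session.canAddInput(deviceInput) {
                session.addInput(deviceInput)
            }

            let dataOutput = AVCaptureVideoDataOutput()         // (2)
            dataOutput.videoSettings = [                        // (3)
                kCVPixelBufferPixelFormatTypeKey as String:
                    kCVPixelFormatType_420YpCbCr8BiPlanarFullRange
            ]
            dataOutput.alwaysDiscardsLateVideoFrames = true     // (4)

            if session.canAddOutput(dataOutput) {
                session.addOutput(dataOutput)
            }
            session.commitConfiguration()                       // (5)

            // Serial queue on which the delegate receives the frames.
            let queue = DispatchQueue(label: "com.example.videoQueue")  // (6)
            dataOutput.setSampleBufferDelegate(self, queue: queue)      // (7)
        } catch {
            print("Could not create the capture device input: \(error)")
        }
    }
}
```

You would call configure(with:) once with the capture device obtained earlier and then start the session with session.startRunning().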

Since the initialization of the capture device input can throw an error, you use the do-try-catch Swift construct (please check The Swift Programming Language book for details).

Inside the do scope, I start the session configuration (line 1). Then, I check if I can add an input to the session. In line 2, I instantiate the video data output and set a couple of its properties. The first property (line 3) is videoSettings. This is a dictionary containing either the compression settings keys (JPEG, H.264, …) or the pixel buffer attributes (RGBA32, ARGB32, 422YpCbCr8, …). When I apply image processing algorithms to extract information from the image content, I usually need to process only the image luminance. Since converting an RGBA signal to a gray-level signal can be computationally expensive, I usually ask AVFoundation to perform this job for me directly in hardware. So, as shown in line 3, I use the kCVPixelFormatType_420YpCbCr8BiPlanarFullRange pixel format. This pixel format stores each frame in two planes of 8-bit samples. The first plane contains the luma (Y), while the second plane contains the two chroma components (blue-difference Cb and red-difference Cr), interleaved and subsampled by two in each direction. This format is commonly called YCbCr, or YCC for short.
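To make the two-plane layout more concrete, the following sketch allocates an empty pixel buffer in this format with Core Video and lists the size of each plane. No camera is involved. The function name planeSizes(width:height:) and the 640 x 480 size are hypothetical choices for illustration.

```swift
import CoreVideo

// Hypothetical helper: allocates a bi-planar YCbCr buffer of the given size
// and returns the dimensions of each of its planes (empty on failure).
func planeSizes(width: Int, height: Int) -> [(width: Int, height: Int)] {
    var pixelBuffer: CVPixelBuffer?
    let status = CVPixelBufferCreate(kCFAllocatorDefault,
                                     width,
                                     height,
                                     kCVPixelFormatType_420YpCbCr8BiPlanarFullRange,
                                     nil,
                                     &pixelBuffer)
    guard status == kCVReturnSuccess, let buffer = pixelBuffer else {
        return []
    }
    // Plane 0 is the luma plane, plane 1 is the interleaved chroma plane.
    return (0..<CVPixelBufferGetPlaneCount(buffer)).map { plane in
        (width: CVPixelBufferGetWidthOfPlane(buffer, plane),
         height: CVPixelBufferGetHeightOfPlane(buffer, plane))
    }
}

let sizes = planeSizes(width: 640, height: 480)
for (index, size) in sizes.enumerated() {
    print("plane \(index): \(size.width) x \(size.height)")
}
```

The luma plane has the full frame size, which is why it can be used directly as a gray-level image without any color conversion.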

The second property (line 4), alwaysDiscardsLateVideoFrames, should be set to true. There are some cases where you want to wait for the generation of the next frame but, in general, you don’t want to block the capture pipeline. With this setting, a video frame that arrives too late is discarded. You can count the number of discarded frames using the captureOutput:didDropSampleBuffer:fromConnection: method defined in the AVCaptureVideoDataOutputSampleBufferDelegate protocol.
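Counting dropped frames could be sketched as follows. In current Swift the delegate method is spelled captureOutput(_:didDrop:from:). The class name DropCounter and the droppedFrameCount property are hypothetical names used only for this example.

```swift
import AVFoundation

// Hypothetical delegate that only keeps track of discarded frames.
final class DropCounter: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {
    private(set) var droppedFrameCount = 0

    // Called on the delegate queue each time a late frame is discarded.
    func captureOutput(_ output: AVCaptureOutput,
                       didDrop sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        droppedFrameCount += 1
    }
}
```

Because the method is invoked on the serial queue passed to setSampleBufferDelegate, the counter should be read from that same queue or protected appropriately.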

Finally, I check if I can add an output to the session and commit the configuration (line 5). The last two lines (line 6 and line 7) define a GCD serial queue and the delegate object of the data output. The session sends each frame to the delegate object. You can collect each frame by implementing the captureOutput:didOutputSampleBuffer:fromConnection: delegate method.

This method belongs to the AVCaptureVideoDataOutputSampleBufferDelegate protocol. The second argument of this method provides the sample buffer of type CMSampleBuffer that represents a camera frame.
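A sketch of such a delegate is shown below, using the current Swift spelling captureOutput(_:didOutput:from:). It assumes the bi-planar YCbCr pixel format discussed above, so plane 0 is the luma plane. The class name FrameReceiver and the lastFrameSize property are hypothetical.

```swift
import AVFoundation

// Hypothetical delegate that inspects the luma plane of each frame.
final class FrameReceiver: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {
    // Size of the luma plane of the most recent frame, if any.
    private(set) var lastFrameSize: (width: Int, height: Int)?

    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        // The pixel data of the frame lives in the image buffer.
        guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else {
            return
        }

        // The buffer must be locked while its memory is accessed.
        CVPixelBufferLockBaseAddress(pixelBuffer, .readOnly)
        defer { CVPixelBufferUnlockBaseAddress(pixelBuffer, .readOnly) }

        // Plane 0 holds the luma samples (assuming the bi-planar format).
        let width = CVPixelBufferGetWidthOfPlane(pixelBuffer, 0)
        let height = CVPixelBufferGetHeightOfPlane(pixelBuffer, 0)
        lastFrameSize = (width: width, height: height)

        if let lumaBase = CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 0) {
            let bytesPerRow = CVPixelBufferGetBytesPerRowOfPlane(pixelBuffer, 0)
            // Process the gray-level image here; rows are bytesPerRow apart.
            _ = (lumaBase, bytesPerRow)
        }
    }
}
```

Keep the work done inside this method short. Otherwise, late frames are discarded when alwaysDiscardsLateVideoFrames is true.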

Camera Preview

Besides collecting the camera frames, we can also preview them on the App UI. This can be done using an instance of AVCaptureVideoPreviewLayer. This is a special Core Animation layer that AVFoundation uses to render frames in real time. Hence, let’s add the following property to our view controller:
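One way to set this up is sketched below. The helper makePreviewLayer(for:), the CameraViewController class and the choice of resizeAspectFill are assumptions made for illustration. The view controller part is compiled only where UIKit is available.

```swift
import AVFoundation
#if canImport(UIKit)
import UIKit
#endif

// Hypothetical helper: builds a preview layer bound to the given session.
func makePreviewLayer(for session: AVCaptureSession) -> AVCaptureVideoPreviewLayer {
    let layer = AVCaptureVideoPreviewLayer(session: session)
    // Fill the layer bounds while preserving the aspect ratio.
    layer.videoGravity = .resizeAspectFill
    return layer
}

#if canImport(UIKit)
final class CameraViewController: UIViewController {
    let session = AVCaptureSession()

    // The preview layer is created lazily, the first time it is needed.
    lazy var previewLayer: AVCaptureVideoPreviewLayer = makePreviewLayer(for: self.session)

    override func viewDidLoad() {
        super.viewDidLoad()
        previewLayer.frame = view.bounds
        view.layer.addSublayer(previewLayer)
    }

    override func viewWillAppear(_ animated: Bool) {
        super.viewWillAppear(animated)
        // startRunning() blocks until the session starts, so call it off the main thread.
        DispatchQueue.global(qos: .userInitiated).async { [session] in
            session.startRunning()
        }
    }
}
#endif
```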

The above piece of code is very simple and does not handle device rotations. I leave this to you as an exercise.

Source Example

You can download the complete example from here.


In this post, I showed you how to set up a custom video camera in Swift. Please refer to the old posts if you want to add additional features. In one of the next posts, I’ll show you how to pass the camera frames to Metal for real-time video processing.


Geppy Parziale (@geppyp) is cofounder of InvasiveCode (@invasivecode). He has developed iOS applications and taught iOS development since 2008. He worked at Apple as iOS and OS X Engineer in the Core Recognition team. He has developed several iOS and OS X apps and frameworks for Apple, and many of his development projects are top-grossing iOS apps that are featured in the App Store.


