Capture Video with AVFoundation and Swift
AVFoundation allows you to capture multimedia data generated by different input sources (camera, microphone, …) and redirect it to any output destination (screen, speakers, render context, …).
Some years ago, I wrote a post on how to build a custom video camera based on AVFoundation using Objective-C. At that time, Swift did not exist. Recently, we have received many requests to show how to build the same custom video camera using Swift. So, here I am going to show you how to do that.
AVFoundation allows you to create custom playback and capture solutions for audio, video, and still images. The advantage of using AVFoundation instead of off-the-shelf solutions such as UIImagePickerController is that you get access to the individual camera pixels. In this way, you can process video frames in real time using frameworks such as Metal, Core Image, Core Audio or Accelerate.
AVFoundation Capture Session
AVFoundation is based on the concept of a session. A session is used to control the flow of data from the input device to the output device. You initialize a session in a very straightforward way:
let cameraSession = AVCaptureSession()
The session allows you to define the audio and video recording quality using the sessionPreset property. For this example, I am going to use a low-quality session preset (just to keep the battery consumption low):
cameraSession.sessionPreset = AVCaptureSessionPresetLow
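Presets are not guaranteed to be available on every device, so you may want to verify a preset with canSetSessionPreset(_:) before assigning it. A minimal sketch (the medium-quality fallback is just an arbitrary choice for this example):

if cameraSession.canSetSessionPreset(AVCaptureSessionPresetLow) {
    cameraSession.sessionPreset = AVCaptureSessionPresetLow
} else {
    // Fallback preset chosen arbitrarily for this sketch
    cameraSession.sessionPreset = AVCaptureSessionPresetMedium
}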
AVFoundation Capture Device
After the capture session has been created, you need to define the capture device you want to use. It can be a camera or a microphone. In this example, I am going to use the AVMediaTypeVideo media type, which covers both video and still images:
let captureDevice = AVCaptureDevice.defaultDevice(withMediaType: AVMediaTypeVideo)!
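The line above returns the default video device, which is the back camera. If you need a specific camera (for example, the front one), you can enumerate the available devices and filter them by position. Here is a small sketch using the same Swift 3-era API as the rest of this post; the frontCamera() helper is just an illustrative name:

// Illustrative helper: returns the front camera, if one is available
func frontCamera() -> AVCaptureDevice? {
    let devices = AVCaptureDevice.devices(withMediaType: AVMediaTypeVideo) as? [AVCaptureDevice]
    return devices?.filter { $0.position == .front }.first
}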
AVFoundation Capture Device Input
Next, you need to define the input of the capture device and add it to the session. The next chunk of code does everything you need:
do {
    let deviceInput = try AVCaptureDeviceInput(device: captureDevice)

    cameraSession.beginConfiguration() // 1

    if (cameraSession.canAddInput(deviceInput) == true) {
        cameraSession.addInput(deviceInput)
    }

    let dataOutput = AVCaptureVideoDataOutput() // 2

    dataOutput.videoSettings = [(kCVPixelBufferPixelFormatTypeKey as NSString) : NSNumber(value: kCVPixelFormatType_420YpCbCr8BiPlanarFullRange as UInt32)] // 3
    dataOutput.alwaysDiscardsLateVideoFrames = true // 4

    if (cameraSession.canAddOutput(dataOutput) == true) {
        cameraSession.addOutput(dataOutput)
    }

    cameraSession.commitConfiguration() // 5

    let queue = DispatchQueue(label: "com.invasivecode.videoQueue") // 6
    dataOutput.setSampleBufferDelegate(self, queue: queue) // 7

} catch let error as NSError {
    NSLog("\(error), \(error.localizedDescription)")
}
Since the initialization of the capture device input can throw an error, you use the do-try-catch Swift construct (please check The Swift Programming Language book for details).
Inside the do scope, I start the session configuration (line 1). Then, I check if I can add an input to the session. In line 2, I instantiate the video data output and set a couple of its properties. The first property (line 3) is videoSettings. This is a dictionary containing either the compression settings keys (JPEG, H.264, …) or the pixel buffer attributes (RGBA32, ARGB32, 422YpCbCr8, …). When I apply image processing algorithms to extract information from the image content, I usually need to process only the image luminance. Since converting an RGBA signal to a gray-level signal can be computationally expensive, I usually ask AVFoundation to perform this job for me directly in hardware. So, as shown in line 3, I use the kCVPixelFormatType_420YpCbCr8BiPlanarFullRange pixel format. This is a bi-planar format with 8-bit components: the first plane contains the luma (Y), while the second plane contains the two interleaved chroma components (Cb and Cr). This format is also commonly referred to as YCbCr (or YCC).
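If, instead, you need full-color frames (for example, to feed them to Core Image), you could request 32-bit BGRA pixel buffers rather than the bi-planar YCbCr format. A possible alternative setting, not used in this example:

// Alternative (not used here): request 32-bit BGRA frames when full color is needed
dataOutput.videoSettings = [(kCVPixelBufferPixelFormatTypeKey as NSString) : NSNumber(value: kCVPixelFormatType_32BGRA as UInt32)]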
The second property (line 4) should be set to true. There are some cases where you want to wait for the generation of the next frame but, in general, you don't want to block the processing. In this way, if a video frame arrives too late, it is simply discarded. You can count the number of discarded frames by implementing the captureOutput(_:didDropSampleBuffer:from:) method defined in the AVCaptureVideoDataOutputSampleBufferDelegate protocol.
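As an illustration, a very small dropped-frame counter could look like the sketch below. The signature matches the Swift 3-era SDK used throughout this post (later SDKs renamed the method to captureOutput(_:didDrop:from:)), and droppedFrameCount is just a hypothetical property:

var droppedFrameCount = 0 // hypothetical counter property

func captureOutput(_ captureOutput: AVCaptureOutput!, didDropSampleBuffer sampleBuffer: CMSampleBuffer!, from connection: AVCaptureConnection!) {
    // Called on the sample buffer queue whenever a late frame is discarded
    droppedFrameCount += 1
}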
Finally, I check if I can add an output to the session and commit the configuration (line 5). The last two lines (lines 6 and 7) define a GCD serial queue and the delegate object of the data output. The session sends each frame to the delegate object. You can collect each frame by implementing the following method:
func captureOutput(_ captureOutput: AVCaptureOutput!, didOutputSampleBuffer sampleBuffer: CMSampleBuffer!, from connection: AVCaptureConnection!) {

}
This method belongs to the AVCaptureVideoDataOutputSampleBufferDelegate protocol. The second argument of this method provides the sample buffer, of type CMSampleBuffer, that represents a camera frame.
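To actually touch the pixels, you typically extract the CVPixelBuffer from the sample buffer and lock it before reading. The following is only a sketch of what the body of the delegate method could look like; the actual processing is up to you:

func captureOutput(_ captureOutput: AVCaptureOutput!, didOutputSampleBuffer sampleBuffer: CMSampleBuffer!, from connection: AVCaptureConnection!) {
    // Extract the pixel buffer carrying the frame data
    guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }

    CVPixelBufferLockBaseAddress(pixelBuffer, .readOnly)

    // With the bi-planar YCbCr format, plane 0 is the luma (Y) plane
    let width = CVPixelBufferGetWidthOfPlane(pixelBuffer, 0)
    let height = CVPixelBufferGetHeightOfPlane(pixelBuffer, 0)
    let lumaBaseAddress = CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 0)

    // ... process the luma plane here ...
    _ = (width, height, lumaBaseAddress) // silences unused-variable warnings in this sketch

    CVPixelBufferUnlockBaseAddress(pixelBuffer, .readOnly)
}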
Camera Preview
Besides collecting the camera frames, we can also preview them in the app UI. This can be done using an instance of AVCaptureVideoPreviewLayer. This is a special Core Animation layer that AVFoundation uses to render frames in real time. Hence, let's add the following property to our view controller:
lazy var previewLayer: AVCaptureVideoPreviewLayer = {
    let preview = AVCaptureVideoPreviewLayer(session: self.cameraSession)!
    preview.bounds = CGRect(x: 0, y: 0, width: self.view.bounds.width, height: self.view.bounds.height)
    preview.position = CGPoint(x: self.view.bounds.midX, y: self.view.bounds.midY)
    preview.videoGravity = AVLayerVideoGravityResize
    return preview
}()
The above piece of code is very simple and does not handle device rotations. I leave that to you as an exercise.
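To see something on screen, you also need to add the preview layer to the view hierarchy and start the session. A minimal way to wire everything up in the view controller could look like this; setupCameraSession() is a hypothetical helper wrapping the input/output configuration shown earlier:

override func viewDidLoad() {
    super.viewDidLoad()
    view.layer.addSublayer(previewLayer)
    setupCameraSession() // hypothetical helper containing the do-try-catch configuration above
}

override func viewDidAppear(_ animated: Bool) {
    super.viewDidAppear(animated)
    // Start the flow of data from the inputs to the outputs
    cameraSession.startRunning()
}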
Source Example
You can download the complete example from here.
Conclusions
In this post, I showed you how to set up a custom video camera in Swift. Please refer to the older posts if you want to add additional features. In one of the next posts, I'll show you how to pass the camera frames to Metal for real-time video processing.
Geppy
Geppy Parziale (@geppyp) is cofounder of InvasiveCode (@invasivecode). He has developed iOS applications and taught iOS development since 2008. He worked at Apple as an iOS and OS X engineer on the Core Recognition team. He has developed several iOS and OS X apps and frameworks for Apple, and many of his development projects are top-grossing iOS apps featured in the App Store.
Hi Geppy, I'm currently working on a computer vision project that requires object detection using Haar cascades. I'm currently processing everything on the CPU using OpenCV, which is obviously too slow. Any chance you could show us how to pass these video frames to the GPU for processing, using either Metal or GPUImage?
Hi, thanks for reading this. Yes, I have already prepared a post to show how to pass the video frames to Metal. Stay tuned.
Did you ever explain how to do this processing? Your tutorial was great!
Coming soon... promised.
Any idea how to capture from both the rear and front cameras at the same time?
This is not possible at the moment. You can try to switch between the two cameras (with some flickering on the image), but you can never use the two cameras at the same time.
Your tutorial is very easy to understand, in both the Objective-C and Swift versions. Thank you very much!