Metal: Blazing Fast Image Processing
Metal allows developers to perform graphics rendering and parallel computing directly on the GPU. Before Metal, OpenGL ES was the only framework available on iOS for graphics rendering, while parallel computing on the GPU was available only on OS X through OpenCL. Recently, Apple brought Metal to OS X as well, extended it with new features, and introduced two companion frameworks: MetalKit and Metal Performance Shaders.
In this post, I am going to show you how to use Metal to perform parallel processing directly on the GPU. As an example, I am going to process an image using a custom filter. This is something you can also do with Core Image, and in fact I recommend Core Image over Metal for this kind of work, for a couple of reasons: Core Image is simpler to set up, it lets you build your own custom filters, and starting with iOS 9 it was fully rewritten to take advantage of Metal. However, since you can use Metal to perform any kind of processing (for example, audio processing), I want to show you how to set up and create Metal-based projects.
Metal is only available on a physical device, so the following examples and source code cannot be compiled for the iOS Simulator.
Metal vs OpenGL/OpenCL
The main advantage of Metal over OpenGL and OpenCL is that Metal was built entirely by Apple to take advantage of Apple hardware. While OpenGL and OpenCL are multi-platform, Metal works only on iOS, OS X, and tvOS devices. Writing Metal from scratch allowed Apple to perform massive optimizations that are almost impossible with OpenGL and OpenCL.
Before we start, here is a small diagram showing the components we are going to put together and how they are organized.
Device and GPU
When using Metal, you perform processing and rendering operations directly on the GPU. In Metal, the GPU is represented by an object that conforms to the MTLDevice protocol. As you will see in the rest of this post, almost every Metal functionality is exposed by the framework through protocols rather than concrete classes.
In iOS and OS X, you create a Metal device using the MTLCreateSystemDefaultDevice() function. In OS X, you can also select the Metal device from the list of devices returned by the MTLCopyAllDevices() function.
```swift
import Metal

let device: MTLDevice? = MTLCreateSystemDefaultDevice()
```
If you are using Swift, MTLCreateSystemDefaultDevice() returns an optional, so please handle it correctly.
Once you have the Metal device, you can use it to create different kinds of non-transient objects: command queues, textures and data buffers, sampler states, compute states, function libraries, and pipeline states. I am going to describe many of these objects in this post.

Since these non-transient objects can be expensive to create, you should create them as early as possible and reuse them throughout the lifetime of your app. Avoid creating them in performance-sensitive code.

Other types of Metal objects (command buffers and command encoders) are instead lightweight objects that should be created and consumed immediately.
Command Queue
After creating a Metal device, you need to create a command queue. A command queue holds command buffers and organizes their order of execution. A command queue is a non-transient object: you create a single command queue per device and keep it for the lifetime of your app.
To create a command queue, call one of the following methods on a MTLDevice object:

```swift
makeCommandQueue()
makeCommandQueue(maxCommandBufferCount:)
```
For example:

```swift
let commandQueue: MTLCommandQueue? = device?.makeCommandQueue()
```
These methods create an object that conforms to the MTLCommandQueue protocol. In general, command queues are thread-safe and allow multiple outstanding command buffers to be encoded simultaneously.
Command Buffers
Command buffer objects are created by the MTLCommandQueue object and can only be committed for execution on the command queue that created them. All command buffers sent to a single command queue are guaranteed to execute in the order in which they were enqueued.
You use the makeCommandBuffer() method on the command queue object to create a command buffer. The command buffer holds a strong reference to any objects that are needed to finish executing the commands encoded in it:

```swift
let commandBuffer: MTLCommandBuffer = commandQueue.makeCommandBuffer()
```
Command buffers are lightweight, single-use objects that store encoded commands to be committed for execution by the GPU, so you create them as needed. After a command buffer has been committed for execution, the only valid operations on it are to wait for it to be scheduled or completed (using synchronous calls or handler blocks) and to check the status of its execution. When used, the scheduled and completed handlers are blocks that are invoked in execution order. These handlers should perform quickly; if expensive or blocking work needs to happen, defer that work to another thread.
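For example, instead of blocking with waitUntilCompleted(), you can register handlers on the command buffer before committing it. A minimal sketch, assuming commandBuffer was created as above (the print statements stand in for your own quick bookkeeping):

```swift
// A minimal sketch: registering handlers instead of blocking the CPU.
// Handlers must be added before commit() and should return quickly.
commandBuffer.addScheduledHandler { _ in
    print("Command buffer scheduled on the GPU")
}
commandBuffer.addCompletedHandler { buffer in
    // Defer expensive work (e.g. reading back output) to another thread.
    print("Execution finished with status: \(buffer.status.rawValue)")
}
commandBuffer.commit()
```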
In a multithreaded app, it's advisable to break your overall task into subtasks that can be encoded separately. You create a command buffer for each chunk of work, then you call the enqueue() method on these command buffer objects to establish the order of execution.
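A minimal sketch of this pattern follows; the encoding functions are hypothetical placeholders for your own per-buffer encoding code:

```swift
// A minimal sketch: reserve execution order up front, then encode each
// chunk of work on its own thread.
let firstBuffer = commandQueue.makeCommandBuffer()
let secondBuffer = commandQueue.makeCommandBuffer()

// enqueue() fixes the order on the queue without committing anything yet.
firstBuffer.enqueue()
secondBuffer.enqueue()

DispatchQueue.global().async {
    encodeFirstPass(firstBuffer)   // hypothetical encoding function
    firstBuffer.commit()
}
DispatchQueue.global().async {
    encodeSecondPass(secondBuffer) // hypothetical encoding function
    secondBuffer.commit()
}
```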
Command Encoders
You use the command buffer to create a command encoder object and fill the buffer with commands. The MTLCommandEncoder protocol defines the common interface for objects that can write commands into a command buffer.
In Metal, there are many different kinds of command encoders, each providing a different set of commands that can be encoded into the buffer:
- Compute Command Encoder: an object conforming to the MTLComputeCommandEncoder protocol, used to encode computational tasks.
- Render Command Encoder: an object conforming to the MTLRenderCommandEncoder protocol, used to encode graphics rendering tasks.
- Blit Command Encoder: an object conforming to the MTLBlitCommandEncoder protocol, used for memory management tasks.
- Parallel Render Command Encoder: an object conforming to the MTLParallelRenderCommandEncoder protocol, used for multiple graphics rendering tasks encoded in parallel.
In this post, we will focus only on compute command encoders. In general, you can create different types of command encoders for each command buffer. In our case, we can create a compute command encoder from the command buffer:

```swift
let commandEncoder: MTLComputeCommandEncoder = commandBuffer.makeComputeCommandEncoder()
```
Once you have the command encoder, you can use it to encode the following data-parallel compute processing commands:

- setComputePipelineState(_:), passing it the MTLComputePipelineState object (see Compute Pipeline State) that contains the compute function to be executed.
- setTexture(_:at:) and setBuffer(_:at:), to specify the resources that hold the input data (or output destination) for the compute function. The index represents the location of each resource in the corresponding argument table. See later Resources and Data.
- dispatchThreadgroups(_:threadsPerThreadgroup:), to encode the compute function with a specified number of threadgroups for the grid (see later Thread Groups) and the number of threads per threadgroup.
- Finally, endEncoding(), to finish encoding the compute commands onto the command buffer.
The following source code shows an example of how to use a compute command encoder:

```swift
// Creates the command buffer
let commandBuffer = commandQueue.makeCommandBuffer()
// Creates the command encoder
let commandEncoder = commandBuffer.makeComputeCommandEncoder()
// Encodes the compute pipeline state
commandEncoder.setComputePipelineState(pipelineState)
// Encodes the input texture at location 0
commandEncoder.setTexture(inTexture, at: 0)
// Encodes the output texture at location 1
commandEncoder.setTexture(outTexture, at: 1)
// Encodes the dispatch of threadgroups (see later)
commandEncoder.dispatchThreadgroups(threadGroups, threadsPerThreadgroup: threadGroupCount)
// Ends the encoding of the commands
commandEncoder.endEncoding()
// Commits the command buffer to the command queue
commandBuffer.commit()
// Waits for the execution of the commands
commandBuffer.waitUntilCompleted()
```
To fully understand this chunk of code we need to introduce additional concepts:
- Compute Pipeline States
- Libraries and Functions
- Threadgroups
- Resources and Data

Let's take a look at each of them. I know, there are a lot of objects to keep track of. However, you write all this boilerplate code only once; you will spend most of your time writing Metal functions (or shaders).
Compute Pipeline State
The MTLComputePipelineState protocol defines the interface for a lightweight object used to encode a reference to a compiled compute program. A compiled compute program is a set of C++ functions that we use to process buffers and textures.

A MTLComputePipelineState object is fully thread-safe and can be used by many MTLComputeCommandEncoder objects, even if they are associated with different command buffers.
Because creating a compute pipeline state object can be expensive, you usually create these pipeline state objects during the initialization of your app and reuse them throughout its lifetime.
You create a compute pipeline state using the Metal device, passing it either a Metal function (an object conforming to the MTLFunction protocol) or a Metal compute pipeline descriptor (an object of type MTLComputePipelineDescriptor). The creation of the compute pipeline state can be done synchronously or asynchronously.
In the MTLDevice protocol, you find these methods supporting all these options:

```swift
makeComputePipelineState(function:) throws
makeComputePipelineState(function:options:reflection:) throws
makeComputePipelineState(function:completionHandler:)
makeComputePipelineState(function:options:completionHandler:)

makeComputePipelineState(descriptor:) throws
makeComputePipelineState(descriptor:options:reflection:) throws
makeComputePipelineState(descriptor:completionHandler:)
makeComputePipelineState(descriptor:options:completionHandler:)
```
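For instance, to avoid blocking while the pipeline is compiled, you could use the asynchronous variant. A sketch, assuming kernelFunction is a MTLFunction already fetched from a library (see the next section):

```swift
// A minimal sketch: asynchronous pipeline state creation.
device.makeComputePipelineState(function: kernelFunction) { pipelineState, error in
    guard let pipelineState = pipelineState else {
        fatalError("Pipeline creation failed: \(String(describing: error))")
    }
    // The returned object is thread-safe and can be cached for reuse.
    self.pipelineState = pipelineState
}
```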
Metal Library and Function
To create the compute pipeline state, you can use a MTLFunction object. The MTLFunction protocol defines the interface for an object that represents a single Metal shader function that can be executed by the device as part of a graphics shader or compute function. To obtain a MTLFunction object, first create a MTLLibrary object from the device and then retrieve a MTLFunction object from the library.
```swift
let defaultLibrary: MTLLibrary? = device?.newDefaultLibrary()

guard let kernelFunction = defaultLibrary?.makeFunction(name: "pixelate") else {
    fatalError("Impossible to load the kernel function")
}
do {
    pipelineState = try device?.makeComputePipelineState(function: kernelFunction)
} catch {
    fatalError("Impossible to setup Metal")
}
```
The name of the function is the name of the kernel (or shader) that we are going to write to process the image. A MTLLibrary object can contain Metal shading language code that is compiled during the app build process, or compiled at runtime from a text string containing Metal shading language source code.

The best option is to compile the library at build time. This avoids the overhead of compiling the library at runtime, something very common in OpenGL.
Use a MTLDevice method to create a MTLLibrary object. To create a MTLLibrary object from a Metal library binary, call one of these MTLDevice methods:
- newDefaultLibrary()
- makeLibrary(filepath:) throws
- makeLibrary(data:) throws
To create a MTLLibrary object by compiling source code, call one of these MTLDevice methods (see the sketch after this list):
- makeLibrary(source:options:) throws
- makeLibrary(source:options:completionHandler:)
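As a sketch, compiling a trivial kernel from a source string at runtime might look like this (the kernel source and names are illustrative, and device is assumed to be a non-optional MTLDevice):

```swift
// A minimal sketch: building a library from Metal source at runtime.
// Prefer the precompiled default library in shipping code.
let source =
    "#include <metal_stdlib>\n" +
    "using namespace metal;\n" +
    "kernel void passthrough(texture2d<float, access::read> inTexture [[texture(0)]],\n" +
    "                        texture2d<float, access::write> outTexture [[texture(1)]],\n" +
    "                        uint2 gid [[thread_position_in_grid]]) {\n" +
    "    outTexture.write(inTexture.read(gid), gid);\n" +
    "}\n"
do {
    let library = try device.makeLibrary(source: source, options: nil)
    let function = library.makeFunction(name: "passthrough")
    print("Compiled \(String(describing: function))")
} catch {
    print("Runtime compilation failed: \(error)")
}
```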
A MTLLibrary contains a set of functions (objects conforming to the MTLFunction protocol).
For a rendering pass, you specify a MTLFunction object as a vertex or fragment shader when you configure a MTLRenderPipelineDescriptor object. For data-parallel compute processing, you specify a MTLFunction object as a compute function when you create a MTLComputePipelineState object.
You can query the MTLFunction properties at runtime. The functionType property defines what kind of function it is (MTLFunctionType.vertex, MTLFunctionType.fragment, or MTLFunctionType.kernel), and the vertexAttributes property describes the arguments it takes.
The makeFunction(name:) method fetches a function from the library, making that code available as a shader, either through a MTLRenderPipelineState object for a render command encoder or through a MTLComputePipelineState object for a compute command encoder.
Resources and Data
If you want to process data on the GPU, you need to convert that data into Metal resources. The MTLResource protocol defines the interface for any resource object that represents an allocation of GPU memory. In Metal, there are two types of MTLResource objects:
- Buffers: objects conforming to the MTLBuffer protocol. They represent an allocation of unformatted memory that can contain any type of data. Buffers are often used for vertex, shader, and compute state data.
- Textures: objects conforming to the MTLTexture protocol. They represent an allocation of formatted image data with a specified texture type and pixel format. Textures can be 1D, 2D, or 3D. Texture objects are used as source textures for vertex, fragment, or compute functions, as well as to store graphics rendering output (that is, as an attachment).
To create a MTLBuffer object, you use one of the following MTLDevice methods (see the sketch after this list):
- makeBuffer(length:options:) creates a MTLBuffer object with a new storage allocation.
- makeBuffer(bytes:length:options:) creates a MTLBuffer object by copying data from an existing storage allocation into a new allocation.
- makeBuffer(bytesNoCopy:length:options:deallocator:) creates a MTLBuffer object that reuses an existing storage allocation and does not allocate any new storage.
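For example, copying an array of floats into a shared buffer might look like this (a sketch; the data is illustrative and device is assumed non-optional):

```swift
// A minimal sketch: copying an array of floats into a new MTLBuffer.
var inputData: [Float] = [0.5, 1.0, 0.25, 0.75]
let byteLength = inputData.count * MemoryLayout<Float>.size
let inputBuffer = device.makeBuffer(bytes: &inputData,
                                    length: byteLength,
                                    options: .storageModeShared)
```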
The following methods create and return a MTLTexture object:
- The makeTexture(descriptor:) method of the MTLDevice protocol creates a texture object with a new storage allocation for the texture image data, using a MTLTextureDescriptor object to describe the texture's properties.
- The makeTextureView(pixelFormat:) and makeTextureView(pixelFormat:textureType:levels:slices:) methods of the MTLTexture protocol create and return a new texture object that shares the same storage allocation as the source texture object. Because they share the same storage, any changes to the pixels of the new texture are reflected in the source texture, and vice versa. For the newly created texture, these methods reinterpret the existing texture image data in the storage allocation of the source texture as if this data were stored in the new specified pixel format. The pixel format of the new texture must be compatible with the pixel format of the source texture.
- In iOS, the makeTexture(descriptor:offset:bytesPerRow:) method of the MTLBuffer protocol creates and returns a new texture object that shares the storage allocation of the source buffer object as its texture image data. Because they share the same storage, any changes to the pixels of the new texture are reflected in the source buffer, and vice versa.
A MTLTextureDescriptor object is used to configure new texture objects. To create a new texture, first create a MTLTextureDescriptor object and set its property values, including the texture's type, size (width, height, and depth), pixel format, number of mipmap levels, sample count (for multisampling), and memory allocation behavior. You can reuse a MTLTextureDescriptor object, modifying its property values as needed, to create more MTLTexture objects. Texture descriptor properties are used only during the creation of a MTLTexture object; after the texture has been created, changes to its descriptor have no further effect on it.
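For example, creating a 2D texture that a compute function can both read and write might look like this (a sketch; the size and pixel format are illustrative):

```swift
// A minimal sketch: describing and creating a 512x512 RGBA8 texture.
let descriptor = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: .rgba8Unorm,
                                                          width: 512,
                                                          height: 512,
                                                          mipmapped: false)
// Allow the compute function to both read and write this texture.
descriptor.usage = [.shaderRead, .shaderWrite]
let texture = device.makeTexture(descriptor: descriptor)
```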
After you create a texture, you can call replace(region:mipmapLevel:slice:withBytes:bytesPerRow:bytesPerImage:) or replace(region:mipmapLevel:withBytes:bytesPerRow:) to populate the storage allocation of the texture object with image data from system memory. Conversely, you call getBytes(_:bytesPerRow:bytesPerImage:fromRegion:mipmapLevel:slice:) or getBytes(_:bytesPerRow:fromRegion:mipmapLevel:) to copy image data from a texture object and store the copied data into system memory.
Resources Organization

Buffers and textures are organized in buffer and texture argument tables. There are a maximum of 31 entries in the buffer table and 31 entries in the texture table. Each entry is identified by an index. You pass this index to the command encoder, and the same index is then used by the Metal function (kernel) as an attribute of the function argument, as we will see later.
Thread Groups
When you perform parallel computing on the GPU, the execution of a kernel is decomposed into multiple threads. Threads are organized into thread groups. Threads in a thread group cooperate by sharing data through thread group memory and by synchronizing their execution to coordinate memory accesses to both device and thread group memory.
It is good practice to ask the compute pipeline state for the maximum number of threads per thread group, available through its maxTotalThreadsPerThreadgroup property. This value is a multiple of the thread execution width (the threadExecutionWidth property). For best performance, the total number of threads in a thread group should be a multiple of threadExecutionWidth and must not exceed maxTotalThreadsPerThreadgroup.
You can define the size of a thread group using, for example, the size of the texture to be processed:

```swift
let threadGroupCount = MTLSizeMake(16, 16, 1)
let threadGroups: MTLSize = MTLSizeMake(Int(inTexture.width) / threadGroupCount.width,
                                        Int(inTexture.height) / threadGroupCount.height,
                                        1)
commandEncoder.dispatchThreadgroups(threadGroups, threadsPerThreadgroup: threadGroupCount)
```
The last line of the previous chunk of code encodes the command to dispatch the thread groups.
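One caveat: integer division truncates, so if the texture width or height is not an exact multiple of 16, the trailing pixels are never processed. A common fix is to round the thread group count up; a sketch (the kernel must then guard gid against the texture bounds before writing):

```swift
// A minimal sketch: rounding the number of threadgroups up so the grid
// covers the whole texture even when its size is not a multiple of 16.
let widthCount = (inTexture.width + threadGroupCount.width - 1) / threadGroupCount.width
let heightCount = (inTexture.height + threadGroupCount.height - 1) / threadGroupCount.height
let threadGroups = MTLSizeMake(widthCount, heightCount, 1)
```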
Let’s build an example
Let's build an example showing how to process an image using Metal. Create a new Xcode project, call it MetalImage, and use Swift as the main language. Open ViewController.swift and add two outlets:
```swift
@IBOutlet var imageView: UIImageView!
@IBOutlet var pixelSizeSlider: UISlider!
```
In the storyboard, add an image view and a slider to the view controller's view, then connect the first outlet to the image view and the second outlet to the slider.
I am going to use the slider to control an input parameter of the kernel function.
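The parameter in question is a pixelSize variable that applyFilter() (shown later) encodes into a Metal buffer. The post does not show the slider wired up; here is a minimal sketch, with the action name being my own assumption (it uses the queue property declared below):

```swift
// The kernel parameter encoded into a Metal buffer in applyFilter().
var pixelSize: uint = 60

// A hypothetical slider action: update the parameter and re-run the filter.
@IBAction func pixelSizeChanged(_ sender: UISlider) {
    pixelSize = uint(sender.value)
    queue.async {
        self.applyFilter()
        let finalResult = self.image(from: self.outTexture)
        DispatchQueue.main.async {
            self.imageView.image = finalResult
        }
    }
}
```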
Go back to ViewController.swift and add import Metal at the top of the file. Also import MetalKit (we will discuss MetalKit in a future post). Let's add some properties to our view controller:

```swift
// Metal device
lazy var device: MTLDevice! = {
    MTLCreateSystemDefaultDevice()
}()

// Metal library
lazy var defaultLibrary: MTLLibrary! = {
    self.device.newDefaultLibrary()
}()

// The Metal command queue
lazy var commandQueue: MTLCommandQueue! = {
    NSLog("\(self.device.name!)")
    return self.device.makeCommandQueue()
}()

// This is the input texture
var inTexture: MTLTexture!

// This is the output texture
var outTexture: MTLTexture!

// A Metal compute pipeline state
var pipelineState: MTLComputePipelineState!

// The number of threads per thread group
let threadGroupCount = MTLSizeMake(16, 16, 1)

// The number of thread groups
lazy var threadGroups: MTLSize = {
    MTLSizeMake(Int(self.inTexture.width) / self.threadGroupCount.width,
                Int(self.inTexture.height) / self.threadGroupCount.height,
                1)
}()

// The queue to process the setup and the computation of the Metal objects
let queue = DispatchQueue(label: "com.invasivecode.metalQueue")
```
Then, in the viewDidLoad() method, let's add the following:

```swift
override func viewDidLoad() {
    super.viewDidLoad()
    queue.async {
        self.setUpMetal()
    }
}
```
The setUpMetal() method creates the device, library, function, and pipeline state objects:

```swift
func setUpMetal() {
    if let kernelFunction = defaultLibrary.makeFunction(name: "pixelate") {
        do {
            pipelineState = try device.makeComputePipelineState(function: kernelFunction)
        } catch {
            fatalError("Impossible to setup Metal")
        }
    }
}
```
In viewDidAppear(_:), we can call the remaining methods:

```swift
override func viewDidAppear(_ animated: Bool) {
    super.viewDidAppear(animated)
    queue.async {
        self.importTexture()
        self.applyFilter()
        let finalResult = self.image(from: self.outTexture)
        DispatchQueue.main.async {
            self.imageView.image = finalResult
        }
    }
}
```
Here, I dispatch the methods importTexture(), applyFilter(), and image(from:) on a secondary queue. Finally, I dispatch the rendering of the final image on the main queue.
The first method, importTexture(), is used to convert a UIImage into a Metal texture. I could create a MTLTextureDescriptor object containing the specifications of the image and use it to create the texture. However, in iOS 9 Apple introduced MetalKit, which offers an API to import images using a MTKTextureLoader object. So, I initialize a texture loader, passing it the Metal device. The actual conversion happens in the texture(from:) helper:
```swift
func texture(from image: UIImage) -> MTLTexture {
    guard let cgImage = image.cgImage else {
        fatalError("Can't open image \(image)")
    }
    let textureLoader = MTKTextureLoader(device: self.device)
    do {
        let textureOut = try textureLoader.newTexture(with: cgImage)
        let textureDescriptor = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: textureOut.pixelFormat,
                                                                         width: textureOut.width,
                                                                         height: textureOut.height,
                                                                         mipmapped: false)
        outTexture = self.device.makeTexture(descriptor: textureDescriptor)
        return textureOut
    } catch {
        fatalError("Can't load texture")
    }
}
```
Once I create a texture from the input image, I get its texture descriptor and use it to create the output texture.
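Note that viewDidAppear(_:) calls importTexture(), which the post does not show; a minimal sketch, assuming it simply wraps texture(from:) with a bundled image (the asset name is hypothetical):

```swift
func importTexture() {
    // "nature" is a placeholder name for an image in the app bundle.
    guard let image = UIImage(named: "nature") else {
        fatalError("Can't read image")
    }
    inTexture = texture(from: image)
}
```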
The next method I need to build is applyFilter(). This is the central part of the chain: this is where I encode commands and send them to the command buffer.
```swift
func applyFilter() {
    // Creates the command buffer from the command queue
    let commandBuffer = commandQueue.makeCommandBuffer()
    // Creates the command encoder from the command buffer
    let commandEncoder = commandBuffer.makeComputeCommandEncoder()
    // Encodes the pipeline state command
    commandEncoder.setComputePipelineState(pipelineState)
    // Encodes the input texture command
    commandEncoder.setTexture(inTexture, at: 0)
    // Encodes the output texture command
    commandEncoder.setTexture(outTexture, at: 1)
    // Encodes the input buffer
    let buffer = device.makeBuffer(bytes: &pixelSize,
                                   length: MemoryLayout<uint>.size,
                                   options: MTLResourceOptions.storageModeShared)
    commandEncoder.setBuffer(buffer, offset: 0, at: 0)
    // Encodes the thread group dispatch
    commandEncoder.dispatchThreadgroups(threadGroups, threadsPerThreadgroup: threadGroupCount)
    commandEncoder.endEncoding()
    // Commits the commands to the command buffer
    commandBuffer.commit()
    // Waits until the commands are executed
    commandBuffer.waitUntilCompleted()
}
```
The last method called in viewDidAppear(_:) is just boilerplate code to convert the output texture into a UIImage object so that you can display it on the screen.
```swift
func image(from texture: MTLTexture) -> UIImage {
    // Bytes per pixel (assuming a 4-channel, 8-bit-per-channel texture)
    let bytesPerPixel = 4
    // The total number of bytes of the texture
    let imageByteCount = texture.width * texture.height * bytesPerPixel
    // The number of bytes for each image row
    let bytesPerRow = texture.width * bytesPerPixel
    // An empty buffer that will contain the image
    var src = [UInt8](repeating: 0, count: Int(imageByteCount))
    // Gets the bytes from the texture
    let region = MTLRegionMake2D(0, 0, texture.width, texture.height)
    texture.getBytes(&src, bytesPerRow: bytesPerRow, fromRegion: region, mipmapLevel: 0)
    // Creates an image context
    let bitmapInfo = CGBitmapInfo(rawValue: (CGBitmapInfo.byteOrder32Big.rawValue | CGImageAlphaInfo.premultipliedLast.rawValue))
    let bitsPerComponent = 8
    let colorSpace = CGColorSpaceCreateDeviceRGB()
    let context = CGContext(data: &src,
                            width: texture.width,
                            height: texture.height,
                            bitsPerComponent: bitsPerComponent,
                            bytesPerRow: bytesPerRow,
                            space: colorSpace,
                            bitmapInfo: bitmapInfo.rawValue)
    // Creates the image from the graphics context
    let dstImage = context?.makeImage()
    // Creates the final UIImage
    return UIImage(cgImage: dstImage!, scale: 0.0, orientation: .up)
}
```
Kernel Functions or Shaders
Now, we need to define the kernel function, or compute shader. This is a C++ function that is executed on the GPU; Apple chose C++11 as the basis for the Metal shading language. I would have preferred to use Swift for this piece of the chain as well, but this is what we have for now. I will talk more about kernels (or compute shaders) in future posts. For the moment, add a Metal file to your project and add the following code:
```cpp
#include <metal_stdlib>
using namespace metal;

kernel void pixelate(texture2d<float, access::read> inTexture [[texture(0)]],
                     texture2d<float, access::write> outTexture [[texture(1)]],
                     device unsigned int *pixelSize [[buffer(0)]],
                     uint2 gid [[thread_position_in_grid]]) {
    const uint2 pixellateGrid = uint2((gid.x / pixelSize[0]) * pixelSize[0],
                                      (gid.y / pixelSize[0]) * pixelSize[0]);
    const float4 colorAtPixel = inTexture.read(pixellateGrid);
    outTexture.write(colorAtPixel, gid);
}
```
You can download the entire project from here.
Conclusions
In this post, I showed you how to set up a Metal project for parallel computing on the GPU. Next time, we will see how to use it to perform real-time image processing with the iPhone video camera. Check also this post that I wrote in 2012 demonstrating how to build a custom video camera; you will then be able to compare Metal to the Accelerate framework and appreciate its computing speed.
Geppy
Geppy Parziale (@geppyp) is cofounder of InvasiveCode (@invasivecode). He has developed iOS applications and taught iOS development since 2008. He worked at Apple as iOS and OS X Engineer in the Core Recognition team. He has developed several iOS and OS X apps and frameworks for Apple, and many of his development projects are top-grossing iOS apps that are featured in the App Store.