Machine Learning for iOS

WWDC16 just ended, and Apple left us with many new and innovative APIs. This year, speech recognition, proactive applications, machine learning, user intents, and neural networks were the most frequently used terms during the conference. So, besides the rich new version 3 of Swift, almost every new addition to iOS, tvOS, and macOS is related to artificial intelligence. For example, Metal and Accelerate in iOS 10 provide an implementation of convolutional neural networks (CNNs) for the GPU and the CPU, respectively. During the keynote, Craig Federighi (Apple’s Senior Vice President of Software Engineering) showed how the Photos app on iOS organizes our photos according to different smart criteria. He highlighted that the Photos app uses deep learning to provide this functionality. Federighi also showed how Siri, now available to developers, can suggest what we need.

Artificial Intelligence and Machine Learning

Artificial Intelligence (AI) is a very hot topic these days. You often hear and read terms such as artificial intelligence, machine learning, or deep learning.

AI is not a new science. It already appears in Greek myths, which show the human desire for human-like machines able to decide autonomously by reacting to stimuli from the external environment. However, I would say that modern AI begins with the creation of the first computers in the 20th century.

Artificial Intelligence is a broad term that includes many subfields. One of these subfields is Machine Learning (ML). ML is the study of algorithms that can learn from data and make predictions on it. In general, ML tasks are classified into three main categories:

  • *Supervised Learning*: The algorithms learn from labelled examples we provide. They then use these examples as a reference to make decisions about unlabelled data.
  • *Unsupervised Learning*: The algorithms learn from examples without labels, and must find the structure and the meaning of the data on their own.
  • *Reinforcement Learning*: The algorithm interacts with a dynamic environment in which it must achieve a certain goal, without a teacher explicitly telling it whether it has come close to that goal.

Machine Learning tasks can also be classified in other ways, but I don’t want to go into too much detail now. If you are interested in learning more, please check this webpage: Machine Learning.

Artificial Neural Network

An important task in ML is data classification. For example, let’s imagine you want to group some of the photos in your iPhone photo library into 7 classes, representing 7 family members or friends. This is a typical classification problem. If you are like me, you probably have 20,000 or more pictures in your photo library. If you have the time and patience to go through every single picture, you can organize them into the 7 groups. Recognizing whether a photo contains one of the seven people seems really easy for a human, but inspecting every single picture in a large photo library is a really tedious task. The manual classification could require several days of work. So, can a machine perform this task for us, so that we don’t need to look at every single picture in the library?

Recent advances in machine learning, combined with the increase in GPU computing power and the decrease in GPU costs, can help us solve the above classification problem (and many others) using a special type of algorithm known as Artificial Neural Networks (ANNs, or simply NNs). An ANN is a mathematical tool we can use to solve the classification problem in an almost automatic way. Neural networks are not new to the scientific community and industry, but they were almost abandoned because of the enormous amount of computation they required. When the costs of Graphics Processing Units (GPUs) started to drop, scientists and developers reconsidered neural networks. In particular, the work done during the past 4 or 5 years on a type of ANN known as the Convolutional Neural Network (CNN) has driven the development of new applications based on neural networks. This work also opened a new field of Machine Learning known as Deep Learning.

The work done by an ANN is not completely automatic, because the network must be modeled on the specific classification problem. A neural network used for classification, like the one described here, falls into the Supervised Learning category (see the definition above). So, you need to provide the ANN with a fairly large amount of labelled data, so that it can learn from these examples. The manual work needed to label the examples can be quite tedious. Let me explain why.

The figure below shows an ANN. For the moment, think of it as a black box. Later, we will see what an ANN looks like and how to implement it in Swift.

Neural Network Overview

As you can see in the above picture, the ANN has a single input (an image of a person) and 7 outputs. The number of inputs and outputs depends on the particular classification problem we want to solve. For each input image we provide, the neural network tells us who the person in the image is. Of course, an ANN cannot be 100% accurate. Instead of a definitive answer, a neural network provides, for each output, an estimate of the likelihood that the input image shows the corresponding person (for example, Steve). In the previous picture, this likelihood is represented by a number between zero (likelihood = 0%) and one (likelihood = 100%).
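To make the output interpretation concrete, here is a small sketch that picks the most likely person from the 7 outputs. It assumes the outputs arrive as an array of `Float` values between 0 and 1, one per class, in a fixed order; the function name `mostLikelyPerson` and the class names are hypothetical.

```swift
// Hypothetical class labels for the 7 outputs (names are illustrative only).
let people = ["Steve", "Anna", "Marco", "Lucy", "Paul", "Sara", "Tom"]

/// Returns the label whose output has the highest likelihood, together with that likelihood.
/// Assumes `outputs` holds one value in [0, 1] per label, in the same order as `labels`.
func mostLikelyPerson(outputs: [Float], labels: [String]) -> (name: String, likelihood: Float)? {
    precondition(outputs.count == labels.count, "one output per label expected")
    // Index of the largest output value; nil when there are no outputs at all.
    guard let best = outputs.indices.max(by: { outputs[$0] < outputs[$1] }) else { return nil }
    return (labels[best], outputs[best])
}

// Made-up network outputs for one input image.
let outputs: [Float] = [0.91, 0.03, 0.02, 0.01, 0.01, 0.01, 0.01]
if let result = mostLikelyPerson(outputs: outputs, labels: people) {
    print("\(result.name): \(Int(result.likelihood * 100))%")
}
```

Taking the largest output is the simplest decision rule; in practice you might also reject a prediction when even the largest likelihood is low.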

How does the ANN compute this likelihood? It learns from data through a process known as training. In this process, we provide the ANN with labelled examples, and for each labelled example, we force the ANN to adjust its internal state. The process works as follows. You present a labelled image to the ANN. Then, the ANN computes the likelihood for each of its outputs. If you provide an image of Steve, you would expect the likelihood of the output “Steve” to be almost one, while the likelihood of the remaining outputs should be almost zero. However, since the ANN is just starting to learn and doesn’t really know who Steve is yet, it can produce an erroneous result. So, we compute the error between the expected result (Steve = 1, other people = 0) and the actual result. Then, using an algorithm known as back propagation, we inject this error back into the ANN through its outputs. The error is propagated backward until it reaches the ANN’s input. The back-propagation process changes the internal state of the ANN. Once this is done, we present a new photo to the ANN and wait for the neural network to compute the new results. We repeat this process many times until the output error no longer changes. At that point, we say that the neural network has converged. We stop the training here and collect the internal state, represented by a large set of floating-point numbers called weights (see later). This data set is what we then use in the final application.
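The loop above can be sketched for the simplest possible network, a single sigmoid neuron, on a toy problem (learning the logical OR of two inputs). This is a minimal illustration under stated assumptions, not a full multi-layer back-propagation: it assumes a squared-error measure, one weight update per example, a learning rate of 0.5, and a convergence test that stops when the total error changes by less than 1e-7 between passes. The names `train`, `learningRate`, and `tolerance` are hypothetical. For a single neuron, back propagation reduces to the one gradient step shown; a deeper network would propagate `delta` backward through each hidden layer.

```swift
import Foundation

// Sigmoid activation: squashes any real number into (0, 1).
func sigmoid(_ x: Double) -> Double { 1 / (1 + exp(-x)) }

/// Trains one sigmoid neuron with gradient descent on a squared-error measure.
/// Each sample is (inputs, expected output). Stops when the error stops changing.
func train(samples: [([Double], Double)], learningRate: Double = 0.5,
           tolerance: Double = 1e-7, maxEpochs: Int = 100_000) -> (weights: [Double], bias: Double) {
    let inputCount = samples.first?.0.count ?? 0
    var weights = [Double](repeating: 0.1, count: inputCount)
    var bias = 0.0
    var previousError = Double.infinity

    for _ in 0..<maxEpochs {
        var totalError = 0.0
        for (inputs, expected) in samples {
            // Forward pass: weighted sum plus bias, then activation.
            let sum = zip(inputs, weights).reduce(bias) { $0 + $1.0 * $1.1 }
            let output = sigmoid(sum)
            // Error between the expected and the actual result.
            let error = expected - output
            totalError += 0.5 * error * error
            // Backward pass: error scaled by the sigmoid derivative, then a weight update.
            let delta = error * output * (1 - output)
            for i in weights.indices { weights[i] += learningRate * delta * inputs[i] }
            bias += learningRate * delta
        }
        // Converged: the total error no longer changes appreciably.
        if abs(previousError - totalError) < tolerance { break }
        previousError = totalError
    }
    return (weights, bias)
}

// Usage: learn logical OR, a tiny linearly separable problem.
let orSamples: [([Double], Double)] = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
let trained = train(samples: orSamples)
print("weights:", trained.weights, "bias:", trained.bias)
```

The collected `weights` and `bias` play the role of the trained data set the paragraph describes: the app only needs these numbers to run the neuron afterward.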

If you want your app to classify images, sounds, or any other type of digital signal, you need a neural network trained with the back-propagation algorithm. The back-propagation process is automatic, but it is computationally very expensive. This is where GPUs become really helpful: they can enormously reduce the training time. So, instead of spending days training the neural network, you can use arrays of GPUs and reduce the training time to just a few hours.

Besides the processing time, you need to prepare many labelled samples. There are ways to generate these labelled samples automatically, but this is not always possible. So, in some cases, you need to collect the data and then manually label them one by one. These labelled data are also known as the training set. There are some studies on how large the training set should be to train a neural network, but I am not going to cover these details here.

As I mentioned above, in iOS 10, macOS 10.12, and tvOS 10, the Metal and Accelerate frameworks will provide developers with an implementation of Convolutional Neural Networks. Unfortunately, the training algorithms will not be available. So, you will need to train your neural network using third-party tools or develop your own. There are different alternatives. For example, you can use TensorFlow from Google or Caffe from the Berkeley Vision and Learning Center. Other commercial and open-source solutions are also available on the market. An even better alternative is to contact us and let us help you with your specific classification problem. The advantage of working with us is that we also take care of any pre-processing stage your data might need before the neural network stage. We can also help you define the neural network architecture, specifying the optimal number of layers and neurons (and other parameters). We can help with all these steps, and you can use our algorithms. At INVASIVECODE, we developed training algorithms based on the Metal and Accelerate frameworks. Additionally, because we are experts in computer vision and pattern recognition, we can preprocess your image or audio data and prepare them for the neural network. Our convolutional neural network supports iOS 8, iOS 9, and iOS 10. We also optimized the algorithms for tvOS and macOS.

Neuron, Synapse and Layer

Let’s look at the main components of an artificial neural network. The most important component is certainly the Neuron, or Node. An ANN can contain hundreds or thousands of neurons organized in Layers. Each neuron is connected to other neurons through Synapses, or Connections. The following image shows an ANN with four layers (L0, L1, L2, and L3).

Example of a neural network

An ANN can contain any number of layers, and each layer can contain any number of neurons. The first layer (L0 in the previous figure) is known as the input layer. The last layer (L3 in the previous figure) is the output layer. The layers between the input and output layers are known as hidden layers. The ANN in the previous figure is also a fully-connected neural network (FCNN), because each neuron of a hidden layer is connected to every neuron of the previous and next layers. FCNNs are convenient for practical implementations, but you can also have networks where each neuron is connected to only some of the neurons of the previous or next layer.
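A quick way to see why fully-connected networks become expensive is to count their synapses: each pair of adjacent layers with n and m neurons contributes n * m weights. The sketch below uses hypothetical layer sizes, since the figure’s exact sizes are not stated in the text; `connectionCount` is an illustrative name, and biases are not counted.

```swift
/// Counts the synapses (weights) in a fully-connected network whose
/// layer sizes are listed from the input layer to the output layer.
func connectionCount(layerSizes: [Int]) -> Int {
    // Pair each layer with the next one; every neuron connects to every neuron of the next layer.
    zip(layerSizes, layerSizes.dropFirst()).reduce(0) { $0 + $1.0 * $1.1 }
}

// Hypothetical sizes for a four-layer network (L0 to L3).
let sizes = [4, 5, 5, 2]
print(connectionCount(layerSizes: sizes))   // prints 55 (4*5 + 5*5 + 5*2)
```

Because the count grows with the product of adjacent layer sizes, image inputs with thousands of pixels quickly lead to millions of weights, which is one reason training benefits so much from GPUs.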

Synapses and neurons apply transformations to the input data. The data flow from the input layer to the first hidden layer, then from the first hidden layer to the second hidden layer, and so on, until they reach the output layer.

The input of each synapse is the output of a neuron. In the same way, the output of the synapse becomes the input of another neuron. A synapse multiplies its input by a value known as its weight. Each connection has its own weight. During the neural network training process, the back-propagation algorithm modifies the weight of every synapse.

Synapses can be implemented in Swift in a very simple way:
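A minimal sketch, assuming a synapse is modeled as a struct holding a single `Float` weight; `Synapse` and `transmit` are hypothetical names, not a framework API:

```swift
/// A minimal synapse: it stores one weight and multiplies its input by it.
struct Synapse {
    // `var` because training (back propagation) updates the weight.
    var weight: Float

    /// Turns the output of the source neuron into an input for the destination neuron.
    func transmit(_ input: Float) -> Float {
        input * weight
    }
}

var synapse = Synapse(weight: 0.5)
print(synapse.transmit(2.0))   // prints 1.0
synapse.weight = -0.25         // e.g. after a training update
print(synapse.transmit(2.0))   // prints -0.5
```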

Neurons perform two sequential operations on their inputs (see figure below):

  1. Step 1: the neuron sums up all its inputs
  2. Step 2: the neuron applies an activation function to the sum to obtain the final output

A Neuron

Building a neuron in Swift is just as simple:
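A minimal sketch of such a neuron, assuming its inputs have already been multiplied by their synapse weights and using a sigmoid activation; `Neuron` and `output(for:)` are hypothetical names. The two statements in `output(for:)` correspond to the two steps listed above:

```swift
import Foundation

/// Sigmoid activation function: 1 / (1 + e^(-x)).
func sigmoid(_ x: Float) -> Float {
    1 / (1 + exp(-x))
}

/// A minimal neuron: sums its (already weighted) inputs starting from a bias,
/// then applies the sigmoid activation function to that sum.
struct Neuron {
    var bias: Float = 0

    func output(for inputs: [Float]) -> Float {
        let sum = inputs.reduce(bias, +)   // Step 1: sum the inputs, starting from the bias
        return sigmoid(sum)                // Step 2: apply the activation function
    }
}

let neuron = Neuron(bias: 0)
print(neuron.output(for: [0.5, -0.5]))   // the sum is 0, so this prints 0.5
```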

The first step computes the sum of the neuron’s inputs. Instead of starting the sum from zero, it is very common to start from a nonzero value known as the bias of the neuron.

The second step applies the activation function to the previously computed sum. Scientists have proposed different types of activation functions. Here, I use a sigmoid function (see next figure).

Sigmoid function

Other activation functions proposed in the literature include the hyperbolic tangent, the rectified linear unit (ReLU), the leaky rectified linear unit (leaky ReLU), and the exponential linear unit (ELU).

The following code snippet shows the implementation of the most important activation functions:
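A sketch of these functions for `Float` values follows. The leaky-ReLU slope (0.01) and the ELU alpha (1) are assumed defaults, not values given in this article, and the function names are illustrative.

```swift
import Foundation

/// Sigmoid: squashes x into (0, 1).
func sigmoid(_ x: Float) -> Float { 1 / (1 + exp(-x)) }

/// Hyperbolic tangent: squashes x into (-1, 1); wraps Foundation's `tanh`.
func hyperbolicTangent(_ x: Float) -> Float { tanh(x) }

/// Rectified linear unit: 0 for negative inputs, x otherwise.
func relu(_ x: Float) -> Float { max(0, x) }

/// Leaky ReLU: like ReLU, but negative inputs are scaled by a small slope instead of zeroed.
func leakyRelu(_ x: Float, slope: Float = 0.01) -> Float { x >= 0 ? x : slope * x }

/// Exponential linear unit: x for non-negative inputs, alpha * (e^x - 1) otherwise.
func elu(_ x: Float, alpha: Float = 1) -> Float { x >= 0 ? x : alpha * (exp(x) - 1) }

// Compare the functions on a few sample inputs.
for x in [-2, 0, 2] as [Float] {
    print(x, sigmoid(x), hyperbolicTangent(x), relu(x), leakyRelu(x), elu(x))
}
```

ReLU and its variants avoid the flat regions of the sigmoid and tanh for large positive inputs, which is one reason they are popular in deep networks.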

With a little extra code, you can build a fully functional neural network.
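As a sketch of what that extra code might look like, here is a forward pass through a small fully-connected network of sigmoid neurons. It assumes each layer stores one row of weights per neuron plus one bias per neuron; `Layer`, `Network`, and the hand-picked weights are hypothetical, and no training is included.

```swift
import Foundation

func sigmoid(_ x: Double) -> Double { 1 / (1 + exp(-x)) }

/// A fully-connected layer: weights[j][i] is the weight of the synapse
/// from input i to neuron j of this layer; biases[j] is neuron j's bias.
struct Layer {
    var weights: [[Double]]
    var biases: [Double]

    func forward(_ inputs: [Double]) -> [Double] {
        zip(weights, biases).map { row, bias in
            // Each neuron: weighted sum of its inputs plus bias, then activation.
            sigmoid(zip(row, inputs).reduce(bias) { $0 + $1.0 * $1.1 })
        }
    }
}

/// A feed-forward network: the output of each layer is the input of the next.
struct Network {
    var layers: [Layer]

    func predict(_ inputs: [Double]) -> [Double] {
        layers.reduce(inputs) { activations, layer in layer.forward(activations) }
    }
}

// Hypothetical weights chosen by hand for illustration (not trained):
// 2 inputs -> 2 hidden neurons -> 1 output neuron.
let network = Network(layers: [
    Layer(weights: [[0.5, -0.5], [0.3, 0.8]], biases: [0, -0.1]),
    Layer(weights: [[1.0, -1.0]], biases: [0.2]),
])
print(network.predict([1.0, 0.0]))
```

The `reduce` in `predict` mirrors the data flow described earlier: from the input layer through the hidden layers to the output layer.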


So, what can you do with an artificial neural network? The number of applications is enormous. If you think carefully about the classification problem I described above, you will find that this type of problem is very common in many applications. For example, face recognition is a classification problem. If you want a machine to understand your mood just by looking at your facial expression, that is also a classification problem. Another typical application is handwriting recognition. Navigation assistance algorithms use CNNs to decide whether the object moving in front of the car is a pedestrian or another car.

Neural networks are used nowadays in many fields: military, health, social, risk management, traffic, entertainment and so on.


Machine learning and computer vision open up new kinds of applications for iOS, tvOS, and macOS devices. Smarter, more user-friendly, and more convenient applications are already starting to appear on mobile devices. But we are just at the beginning. To make it even more convenient for you, we have developed preprocessing and training algorithms that will be available for purchase very soon. Stay tuned!


Geppy Parziale (@geppyp) is cofounder of InvasiveCode (@invasivecode). He has developed iOS applications and taught iOS development since 2008. He worked at Apple as an iOS and OS X engineer in the Core Recognition team. He has developed several iOS and OS X apps and frameworks for Apple, and many of his development projects are top-grossing iOS apps featured in the App Store. Geppy is an expert in computer vision and machine learning.


