Nobody likes it when an App crashes. You don’t like it and neither do your users. And when your App crashes, you, as the developer, are very interested in knowing why, as soon as possible. You want to know what is causing the crash so you can fix it. To do so, it’s invaluable to have crash reports delivered to you and your team without delay. And that is why you install a framework to detect crashes and generate crash reports.
Some crash report libraries are open-source and you can integrate them yourself. Other libraries are provided as commercial products, offered by well-known companies. But remember, some of these commercial offerings are based on the same open-source libraries you can find on GitHub.
You might be wondering why I still have not mentioned the names of these crash report services/libraries, or posted their respective links. This is on purpose. This post is not meant to be a critique of crash report providers. Rather, it is intended to show you how crashes are captured and how crash log generation works. Once you understand how this is done, you will be able to decide whether it is worth using a third-party crash report service or whether to trust the crash logs provided by Apple in Xcode.
What do we need to do to detect that our App is about to crash? The answer is to handle Mach exceptions. A Mach exception is a synchronous interruption to the normal flow of program control, caused by the program itself. Be aware that not all exceptions cause an app crash, so we do not need to handle all of them.
When an App is about to crash, a Mach exception occurs. Mach provides an IPC-based exception-handling facility in which exceptions are converted to messages. When an exception occurs, a message is delivered to an exception port, and the thread that caused the exception (aka the victim thread) blocks until it receives a reply indicating whether the exception was successfully handled by an exception handler. The exception handler is in charge of creating a crash report with the current state of the thread that caused the crash. Remember that, while waiting for the reply, the thread that caused the crash is paused, so we might think that it is safe to generate the report at this point.
Well, we are not so safe at this point. We are in a multi-threaded environment, so while one thread is paused, other threads are still running, and that means that they can be changing things around. Moreover, they can be trying to change the same thing that caused the crash in our crashed thread. And that is bad.
If we do not pause all the other threads as soon as we detect the crash, the crash report that we create will contain inaccurate information that does not reflect what has just happened.
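As a rough sketch of that step, the C function below suspends every thread in the current task except the caller, using the Mach calls task_threads and thread_suspend. The function name suspend_other_threads is made up for this illustration and is not taken from any particular crash reporter. The Mach headers only exist on Apple platforms, so the sketch falls back to a stub everywhere else.

```c
#if defined(__APPLE__)
#include <mach/mach.h>

/* Hypothetical helper: suspends every thread of this task except the caller.
 * Returns the number of threads suspended, or -1 if the thread list
 * could not be obtained. */
int suspend_other_threads(void) {
    task_t task = mach_task_self();
    thread_t self = mach_thread_self();   /* new send right, released below */
    thread_act_array_t threads = NULL;
    mach_msg_type_number_t count = 0;
    int suspended = -1;

    if (task_threads(task, &threads, &count) == KERN_SUCCESS) {
        suspended = 0;
        for (mach_msg_type_number_t i = 0; i < count; i++) {
            if (threads[i] != self &&
                thread_suspend(threads[i]) == KERN_SUCCESS) {
                suspended++;
            }
            /* task_threads hands out one port right per thread; release it. */
            mach_port_deallocate(task, threads[i]);
        }
        /* The array itself was allocated in our address space by the kernel. */
        vm_deallocate(task, (vm_address_t)threads, count * sizeof(*threads));
    }
    mach_port_deallocate(task, self);
    return suspended;
}
#else
/* Mach is not available on this platform; report failure to the caller. */
int suspend_other_threads(void) { return -1; }
#endif
```

Note that even this is not atomic: each thread keeps running until thread_suspend reaches it, so some state can still change between the crash and the moment the last thread stops.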
I said before that the message is sent to an exception port. You can set an exception port at thread level, task level, and/or at host level, in this order. If the message fails to be delivered or processed at a thread level, then the kernel will try at the task level, and then at the host level. This is great, because we can put our exception handler at a task level. That means that the exception handler can be running in another task, i.e. in another App. If our App crashes, the other App will then handle the crash. It will be able to pause all our threads and take a real picture of all the information. Whatever the other App is doing, it will not interfere with our state.
This is how debuggers work. When you set a breakpoint in the debugger, hitting that breakpoint raises a Mach exception, which causes a message to be delivered to, and handled by, the debugger task (an external process). That is also how Apple handles crashes and generates the crash reports for our Apps. This is an out-of-process technique. By now you must be realizing that only Apple or LLDB can do that. You cannot do it for a sandboxed iOS or OS X App, because a sandboxed app cannot communicate with external processes. So we cannot do it out of process. The only thing we can do is set an exception port at a thread level (inside the process) and try to use that thread when the crash occurs.
If you are using two libraries that handle a crash at the same time, the result could be an inaccurate crash log. In this case, you must use just one crash report library and deactivate all the others. But what happens with the Apple crash log? Since Apple is also detecting the crashes of our Apps and you cannot deactivate this service, your library will interfere with the Apple crash reporter. The Apple crash reporter is, correctly, an out-of-process service. So Apple does the best job of detecting crashes, not the other libraries or services you are using.
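Inside the process, registering an exception port might look roughly like the sketch below. It is an illustration under assumptions, not the code of any specific crash reporter: the names install_task_exception_port and exception_port are invented here, and the exception mask is just one plausible choice. The sketch registers the port for the whole task with task_set_exception_ports; the per-thread call, thread_set_exception_ports, takes a thread port as its first argument and otherwise the same arguments. Outside Apple platforms it compiles to a stub.

```c
#if defined(__APPLE__)
#include <mach/mach.h>

/* Hypothetical storage for the port our handler thread would listen on. */
static mach_port_t exception_port = MACH_PORT_NULL;

/* Hypothetical helper: creates a port and asks the kernel to deliver
 * selected exceptions of this task to it. Returns 0 on success, -1 on error. */
int install_task_exception_port(void) {
    task_t task = mach_task_self();
    mach_port_t port = MACH_PORT_NULL;

    /* A receive right lets this process read the exception messages. */
    if (mach_port_allocate(task, MACH_PORT_RIGHT_RECEIVE, &port)
            != KERN_SUCCESS) {
        return -1;
    }
    /* The kernel needs a send right in order to deliver messages to it. */
    if (mach_port_insert_right(task, port, port, MACH_MSG_TYPE_MAKE_SEND)
            != KERN_SUCCESS) {
        mach_port_mod_refs(task, port, MACH_PORT_RIGHT_RECEIVE, -1);
        return -1;
    }
    /* Assumed mask: only exceptions that normally end in a crash. */
    exception_mask_t mask = EXC_MASK_BAD_ACCESS |
                            EXC_MASK_BAD_INSTRUCTION |
                            EXC_MASK_ARITHMETIC;
    if (task_set_exception_ports(task, mask, port,
                                 EXCEPTION_DEFAULT, THREAD_STATE_NONE)
            != KERN_SUCCESS) {
        mach_port_mod_refs(task, port, MACH_PORT_RIGHT_RECEIVE, -1);
        return -1;
    }
    exception_port = port;
    return 0;
}
#else
/* Mach is not available on this platform; report failure to the caller. */
int install_task_exception_port(void) { return -1; }
#endif
```

The sketch stops at registration. A dedicated thread still has to wait on the port with mach_msg and reply to each message; if nothing ever replies, the victim thread stays blocked.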
Creating the report
Let’s assume at this point that we are able to detect a crash in a safe manner and that we are not messing things up. We have received the message and have paused all the other threads except one: we do not pause the thread we are using to handle the exception and generate the crash report.
At this point, you can do very few things. Your App is in an inconsistent state, so you should try to read that state without corrupting it, save it, and nothing else.
You can only use async-safe functions. If you use a function that is not async-safe or not reentrant, bad things will happen: allocating memory with malloc, for example, can deadlock if one of the paused threads was holding the allocator’s lock. And even if you do finally manage to generate a report, its data may be corrupted.
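The same restriction applies to POSIX signal handlers, where it is known as async-signal-safety, and that setting makes for a compact illustration. The sketch below is a minimal example under assumptions, not a complete crash reporter: the names report_fd, last_signal, format_hex, on_fatal_signal and install_crash_handler are all invented for it. The handler formats a line by hand instead of calling snprintf and emits it with write, which is on the POSIX list of async-signal-safe functions.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical report destination, opened before any crash can happen. */
static int report_fd = STDERR_FILENO;

/* Last signal seen by the handler; sig_atomic_t is safe to write from it. */
static volatile sig_atomic_t last_signal = 0;

/* Writes value as lowercase hex into out (no terminator) without snprintf,
 * which is not async-signal-safe. Returns the number of characters written;
 * if cap is too small, the least significant digits are dropped. */
static size_t format_hex(uintptr_t value, char *out, size_t cap) {
    static const char digits[] = "0123456789abcdef";
    char tmp[sizeof(uintptr_t) * 2];
    size_t n = 0;
    do {                       /* collect digits, least significant first */
        tmp[n++] = digits[value & 0xf];
        value >>= 4;
    } while (value != 0);
    size_t written = 0;
    while (n > 0 && written < cap) {
        out[written++] = tmp[--n];
    }
    return written;
}

/* Handler: no allocation, no stdio, no locks; stack buffer and write only. */
static void on_fatal_signal(int signo, siginfo_t *info, void *context) {
    (void)context;
    static const char head[] = "signal 0x";
    static const char addr[] = " fault address 0x";
    char line[64];
    size_t len = 0;

    memcpy(line + len, head, sizeof head - 1);
    len += sizeof head - 1;
    len += format_hex((uintptr_t)signo, line + len, sizeof line - len);
    if (signo == SIGSEGV || signo == SIGBUS) {   /* si_addr is valid here */
        memcpy(line + len, addr, sizeof addr - 1);
        len += sizeof addr - 1;
        len += format_hex((uintptr_t)info->si_addr,
                          line + len, sizeof line - len);
    }
    line[len++] = '\n';
    if (write(report_fd, line, len) < 0) {
        /* Nothing safe is left to do if the write fails. */
    }
    last_signal = signo;
}

/* Installs the handler for one signal. Returns 0 on success, -1 on error. */
int install_crash_handler(int signo) {
    struct sigaction action;
    memset(&action, 0, sizeof action);
    action.sa_sigaction = on_fatal_signal;
    action.sa_flags = SA_SIGINFO;
    sigemptyset(&action.sa_mask);
    return sigaction(signo, &action, NULL);
}
```

Calling install_crash_handler(SIGUSR1) and then raise(SIGUSR1) exercises the handler without crashing anything. A handler for real crash signals would typically finish by restoring the default action and re-raising the signal, so that the system crash reporter still sees the crash; that step is left out here.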
One more question: what happens to the privacy of your users? Go to your iPhone Settings App. Tap on Privacy, and then Diagnostics & Usage. Apple asks for these settings when you are setting up your device for the first time. That should give you a sense of how important the privacy of this information is for Apple, for its customers, and therefore for your users. Since the commercial and open-source libraries you are using do not request any user permission before uploading a crash log to their servers, you are doing a very bad job. Be aware of this too.
This is Apple’s job
Apple is doing a great job with crash debugging and crash reporting. And they are the ones who can do it correctly, because they can do it out of process, since they can control the sandbox environment. Each year Apple also introduces new interesting features:
- At WWDC 2015, Apple introduced the Address Sanitizer. See the session video: Advanced Debugging and the Address Sanitizer
- At WWDC 2014, they introduced Activity Tracing. See the session video: Fix Bugs Faster using Activity Tracing
You can use these features to augment the information provided during a crash. Check out Activity Tracing. It is an amazingly powerful framework that does not burden your App with additional, useless external libraries.
During the labs at this year’s WWDC, I talked with some Apple engineers about crash reports. Since I had been researching this subject for some time, I wanted to know from them if I was doing it right. Their recommendation was very clear: “Do not use another crash report service”. Remember, every time we add a crash reporter to our App, we are interfering with the Apple crash reporter.
If you believe that it is better to run your own crash reporter, make sure that you are using the right tool and you can trust the results. Make sure that it is not messing things up with Apple’s crash handlers.
I hope that you now have a better understanding of what using a third-party crash reporter means, how they work, and the implications and side effects of implementing your own or using a third-party library.
If you want to dig deeper into how Mach exceptions, Unix signals, and Unix systems work, you can also check these books:
- Mac OS X Internals: A Systems Approach. Author: Amit Singh. Addison-Wesley
- Advanced Programming in the UNIX® Environment. Third Edition. Authors: W. Richard Stevens and Stephen A. Rago. Addison-Wesley
- Mac OS X and iOS Internals: To the Apple’s Core. Author: Jonathan Levin. Wrox.
All of them are available in the iBooks Store.