Reposted

Stalls in iOS Apps Caused by GCD

I have recently been investigating the various kinds of stalls that show up in iOS apps and how to fix them.

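Stalls in iOS apps are probably more common than most people imagine, especially in the flagship apps of large companies. One reason is that features keep piling up: product teams rarely coordinate with each other, everyone is busy shipping more functionality, and system resources become the bottleneck. The other reason is that old devices are replaced slowly. iOS devices are extremely durable: plenty of iPhone 4S units are still in service, the iPhone 6, a known problem device, still has a large installed base, and by some estimates devices older than the iPhone 6s account for as much as 40% of the total.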

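So if you add a stall-detection tool to a production app, you will find that stalls occur at a startlingly high rate. Detecting and fixing them is not easy, though, mainly because they are hard to reproduce on development devices.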

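I wrote an earlier article about monitoring stalls on the main thread. The mainstream approach today seems to be to observe Runloop event callbacks and check whether the interval between callbacks exceeds a threshold; if it does, the call stacks of all threads in the app are recorded.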

A while ago I came across this call stack in the stall logs reported to our backend:

> 0 libsystem_kernel.dylib __workq_kernreturn
> 1 libsystem_pthread.dylib _pthread_workqueue_addthreads
> 2 libdispatch.dylib _dispatch_queue_wakeup_global_slow
> 3 libdispatch.dylib _dispatch_queue_wakeup_with_qos_slow
> 4 libdispatch.dylib dispatch_async

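In other words, the stall occurred inside dispatch_async. As far as I understood GCD at the time, dispatch_async should never stall: its main job, as I saw it, is to take a worker thread from the system thread pool and hand the block to that thread for execution.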

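Yet this call stack really did appear, and in a sizable number of samples. The last function called (frame 0) is clearly a kernel call. Judging by the function names, GCD was probably trying to get a thread from the pool, found every existing thread busy, and asked the kernel to create a new one. But is the kernel call that creates a thread really that slow? Slow enough to stall the main thread? With these questions in mind I searched through a lot of material; the most relevant thing I found was this article: http://newosxbook.com/articles/GCD.html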

It contains this passage:

This isn’t due to 10.9’s GCD being different - rather, it demonstrates the true asynchronous nature of GCD: The main thread has yet to return from requesting the worker (which it does by pthread_workqueue_addthreads_np, as I’ll describe later), and already the worker thread has spawned and is mid execution, possibly on another CPU core. The exact state of the main thread with respect to the worker is largely unpredictable.

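As I read it, the author is saying that the thread GCD obtains may be one that is busy with another task, and that the main thread has to wait for this busy thread to return before it can continue. I have my doubts about this.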

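With nowhere else to turn, I decided to spend one of my precious TSIs (Technical Support Incidents) and ask Apple's engineers directly. It is worth pointing out that asking Apple for technical support is a valuable and perfectly workable option: every developer account gets two incidents per year, and it would be a shame not to use them.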

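After I sent the question, I got a reply from an engineer on Apple's kernel team. Below is a condensed version of the exchange, presented as questions and answers: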

Q: looks like even if it’s async dispatching, the main thread still has to wait for the other thread to return, during which time, the other thread happen to be in mid execution of sth. this confuses me, what exactly is the main thread waiting for?

Why does the main thread have to wait for dispatch_async to return, and what exactly is it waiting for?

A: It’s hard to say with just a user space backtrace. Frame 0 has clearly sent the current thread into the kernel, and this specific kernel call is /way/ too complex to analyse from outside [1].

A user-space call stack alone cannot answer the question; the possible kernel states are too complex.

Q: I know it’s suggested that we create limited amount of serial queue, and use target queue probably. but what could happen if we don’t follow that rule?

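Apple has always recommended that when you create your own serial GCD queues you keep their number under control and preferably set a target queue, or you will run into problems. I had long been curious what those problems actually are, so I took this opportunity to ask.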

A:

* On macOS, where the system is happier to over commit, you end up with a thread explosion.  That in turn can lead to problems running out of memory, running out of Mach ports, and so on.
* On iOS, which is not happy about over committing, you find that the latency between a block being queued and it running can skyrocket.  This can, in turn, have knock-on effects.  For example, the last time I looked at a problem like this I found that `NSOperationQueue` was dispatching blocks to the global queue for internal maintenance tasks, so when one subsystem within the app consumed all the dispatch worker threads other subsystems would just stall horribly.

Note: In the context of dispatch, an “over commit” is where the system had to allocate more threads to a queue then there are CPU cores.  In theory this should never be necessary because work you dispatch to a queue should never block waiting for resources.  In practice it’s unavoidable because, at a minimum, the work you queue can end up blocking on the VM subsystem.

Despite this, it’s still best to structure your code to avoid the need for over committing, especially when the over commit doesn’t buy you anything.  For example, code like this:

group = dispatch_group_create();
for (url in urlsToFetch) {
    dispatch_group_enter(group);
    dispatch_async(dispatch_get_global_queue(…), ^{
        … fetch `url` synchronously …
        dispatch_group_leave(group);
    });
}
dispatch_group_wait(group, …);

is horrible because it ties up 10 dispatch worker threads for a very long time without any benefit.  And while this is an extreme example (from dispatch’s perspective, networking is /really/ slow), there are less extreme examples that are similarly problematic.  From dispatch’s perspective, even the disk drive is slow (-:
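One common way to avoid tying up a worker per URL is to cap how many blocking tasks run at once. The sketch below shows the idea in portable C with POSIX threads standing in for GCD: a fixed number of workers pull task indices from a shared counter, so the thread count never grows with the number of URLs. Everything here is an assumption made for illustration: `run_bounded`, `fetch_stub`, `fetch_stats_t`, and the cap of 64 workers are hypothetical. In a real app an asynchronous networking API would likely be a better fit than any blocking fetch.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

typedef void (*task_fn)(size_t index, void *ctx);

typedef struct {
    pthread_mutex_t lock;
    size_t next;   /* index of the next task to hand out */
    size_t count;  /* total number of tasks */
    task_fn fn;
    void *ctx;
} bounded_batch_t;

/* Each worker repeatedly claims the next unclaimed index and runs it. */
static void *worker_main(void *arg) {
    bounded_batch_t *b = arg;
    for (;;) {
        pthread_mutex_lock(&b->lock);
        size_t i = b->next;
        if (i < b->count) b->next++;
        pthread_mutex_unlock(&b->lock);
        if (i >= b->count) return NULL;
        b->fn(i, b->ctx);
    }
}

/* Runs fn(0 .. count-1) on at most max_workers threads and waits for all of
 * them. Returns 0 on success, -1 if no worker thread could be started. */
int run_bounded(size_t count, size_t max_workers, task_fn fn, void *ctx) {
    assert(fn != NULL && max_workers > 0);
    enum { WORKER_CAP = 64 }; /* hypothetical hard upper bound */
    if (max_workers > WORKER_CAP) max_workers = WORKER_CAP;
    if (max_workers > count) max_workers = count;
    bounded_batch_t b = { .next = 0, .count = count, .fn = fn, .ctx = ctx };
    pthread_mutex_init(&b.lock, NULL);
    pthread_t threads[WORKER_CAP];
    size_t started = 0;
    for (size_t i = 0; i < max_workers; i++) {
        if (pthread_create(&threads[started], NULL, worker_main, &b) != 0) break;
        started++;
    }
    for (size_t i = 0; i < started; i++) pthread_join(threads[i], NULL);
    pthread_mutex_destroy(&b.lock);
    return (started > 0 || count == 0) ? 0 : -1;
}

/* Stand-in for "fetch url synchronously": it only records how many tasks
 * are in flight at the same time. */
typedef struct {
    pthread_mutex_t lock;
    int in_flight, peak, done;
} fetch_stats_t;

void fetch_stub(size_t index, void *ctx) {
    fetch_stats_t *s = ctx;
    (void)index;
    pthread_mutex_lock(&s->lock);
    if (++s->in_flight > s->peak) s->peak = s->in_flight;
    pthread_mutex_unlock(&s->lock);
    /* a real fetch would block here */
    pthread_mutex_lock(&s->lock);
    s->in_flight--;
    s->done++;
    pthread_mutex_unlock(&s->lock);
}
```

Calling `run_bounded(10, 3, fetch_stub, &stats)` processes all ten items while never running more than three at a time, which is the property the dispatch-group snippet above lacks.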

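This reply is interesting. Anyone who has read the GCD source will know that every queue GCD creates by default has a priority, and that each priority actually maps to two root queues: for example, default-priority and default-priority-overcommit. Roughly speaking, work submitted to the global queues goes to the default-priority queue, while the overcommit variant is the one allowed to bring up threads beyond the core count (serial queues you create yourself target it by default).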

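On macOS, GCD tolerates overcommit, which means dispatch_async may keep creating new threads whenever the existing ones are busy. Even when the system is overcommitted, these surplus threads simply compete for the CPU according to priority.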

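On iOS, GCD keeps overcommit in check: once the queue for a given priority is overcommitted, the tasks queued behind it have to wait. CPU resources are tighter on mobile devices, so this design makes sense.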

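So if you create too many serial queues on iOS, tasks submitted later may be left waiting for a long time. This is why we need to strictly control the number of queues and their hierarchy. Ideally, each subsystem in the app is allotted a fixed number of queues at fixed priorities, which avoids the situation where thread explosion keeps code from running in time.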

Q: I know the system watchdog can kill an app if the main thread is taking too long to respond. I also heard rumors that there are two other cases that may gets your app killed by watchdog. the first is too many new threads are being created like by random usage of dispatching work to global concurrent queue? the second case is if CPU has been kept too busy like 100% for too long, watchdog kills app too?

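I also took the chance to ask why the system watchdog force-kills apps. Rumor has long had it that, apart from the main thread being unresponsive for too long, creating too many threads or running the CPU at full load for too long can also get an app killed.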

A: I’m not aware of any specific watchdog check along those lines, but it’s not hard to imagine that the above-mentioned knock-on effects might jam up your app sufficiently for the watchdog to kill it for other reasons. Running the CPU for too long generates a crash report but it doesn’t actually kill the app. It’s essentially a ‘warning’ crash report about the problem.

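Creating too many threads does not directly cause the watchdog to kill the app, but it can keep the main thread from being serviced in time, so the app ends up being killed for other reasons. Sustained CPU overload does not get the app killed either, but the system generates a report to warn the developer. I have indeed seen quite a few of these ‘this is not a crash’ crash logs.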

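There were a few more questions and answers that are not directly related to my original question, so I have left them out. To finish, here is one more interesting reply. Before reading it, take a moment to think about the question yourself: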

dispatch_async(myQueue, ^{
    // line A
});
// line B

Which runs first, line A or line B?

Consider a snippet like this:

dispatch_async(myQueue, ^{
    // line A
});
// line B

there’s clearly a race condition between lines A and B, that is, between the `dispatch_async` returning and the block running on the queue.  This can pan out in multiple ways, including:

* If `myQueue` (which we’re assuming is a serial queue) is busy, A has to wait so B will definitely run before A.

* If `myQueue` is empty, there’s no idle CPU, and `myQueue` has a higher priority then the thread that called `dispatch_async`, you could imagine the kernel switching the CPU to `myQueue` so that it can run A.

* The thread that called `dispatch_async` could run out of its time quantum after scheduling B on `myQueue` but before returning from `dispatch_async`, which again results in A running before B.

* If `myQueue` is empty and there’s an idle CPU, A and B could end up running simultaneously.

The Answer

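In the end I did not get the precise answer I was hoping for. Perhaps, as the reply says, there are too many possible situations and they are too complex to infer the kernel's state from a single user-space call stack. Still, a few valuable points became reasonably clear: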

Takeaway 1

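iOS is at heart a system for scheduling and allocating resources. CPU, disk I/O, VM, and so on are all scarce, and they affect one another. A stall on the main thread may look like a CPU bottleneck, but the kernel may equally be busy with other resources: heavy disk reads and writes in progress, say, or large amounts of memory being allocated and reclaimed. Any of these can make the simple thread-creating kernel call below stall: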

libsystem_kernel.dylib __workq_kernreturn

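So the only way forward is to analyze the call stacks of all threads yourself and work out, from the user scenario, which system resources are being consumed. By going through recently committed code, I did eventually find that some very expensive disk I/O tasks had been added (on a background thread, admittedly), and that is what produced this seemingly unrelated call stack. After the change was reverted, the stall alerts went away.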

Takeaway 2

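Existing stall-detection tools can only dump call stacks at the moment a timeout fires, but the timeout may be the combined effect of tasks A, B, and C. A and B may be the truly expensive tasks, while C is cheap but happens to be last, so C gets blamed as the culprit and A and B never appear in the reported logs. I have not yet come up with a particularly good solution. Clearly, libsystem_kernel.dylib __workq_kernreturn is exactly such a cheap task C.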
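The misattribution is easy to see with a small worked example. The sketch below is purely illustrative and assumes tasks run back to back on one thread with known durations; the function names and the numbers in the note that follows are hypothetical. It compares the task a timeout-triggered snapshot would catch with the task that actually consumed the most time.

```c
#include <assert.h>
#include <stddef.h>

/* Index of the task that is running when the cumulative time first exceeds
 * threshold_ms, or -1 if the threshold is never exceeded. This is the task
 * whose stack a timeout-triggered dump would capture. */
int task_blamed_by_snapshot(const unsigned *durations_ms, size_t n,
                            unsigned threshold_ms) {
    unsigned long elapsed = 0;
    for (size_t i = 0; i < n; i++) {
        elapsed += durations_ms[i];
        if (elapsed > threshold_ms) return (int)i;
    }
    return -1;
}

/* Index of the task with the largest duration, or -1 for an empty list. */
int longest_task(const unsigned *durations_ms, size_t n) {
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (best < 0 || durations_ms[i] > durations_ms[best]) best = (int)i;
    }
    return best;
}
```

With durations of 300 ms, 195 ms, and 10 ms for A, B, and C and a 500 ms threshold, the snapshot lands in C (index 2) even though A (index 0) is the longest task.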

Takeaway 3

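When creating queues with GCD, or more generally when an app uses GCD to run work off the main thread, it is best to have a queue-usage policy that every team working on the app can follow. That way you avoid creating too many threads and running into unexpected thread starvation where code cannot run in time. This is hard, especially at large companies where teams easily run to hundreds of people.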
