iOS 底层 - 性能优化之CPU、GPU

本文源自本人的学习记录整理与理解，其中参考阅读了部分优秀的博客和书籍，尽量以通俗简单的语句转述。引用到的地方如有遗漏或未能一一列举原文出处还望见谅与指出，另文章内容如有不妥之处还望指教，万分感谢！

性能优化是开发中不可避开的一个环节，也是开发者需要深入研究的一个课题；提高APP的性能也是一名高级开发者的必备技能！

这里只聊聊关于iOS系统下的APP性能那些事！想了解APP的性能就需要知道它体现在那些方面，以及如何监控它，并优化它！

APP的性能大概体现在以下方面：
CPU 占用率、 内存使用情况、启动时间、卡顿、FPS、使用时崩溃率、耗电量监控、流量监控、网络状况监控、等等。

本文会从硬件CPU的工作原理，到卡顿和离屏渲染、到APP的启动、电池的能耗、再到安装包瘦身；做逐一认识和尽可能的提出优化办法

1. CPU 占用率

CPU作为中央处理器，是手机最关键的组成部分，所有应用程序都需要它来调度运行，且资源有限。所以当我们的APP因设计不当，使 CPU 持续以高负载运行，将会出现APP卡顿、手机发热发烫、电量消耗过快等等严重影响用户体验的现象。

由此对于应用在CPU中占用率的监控，将变得尤为重要。那么应该如何来获取CPU的占有率呢？！

我们知道APP在运行的时候，会对应一个Mach Task，而Task下可能有多条线程同时执行任务，每个线程都是作为利用CPU的基本单位。所以我们可以通过获取当前Mach Task下所有线程占用 CPU 的情况，来计算APP的 CPU 占有率。

在《OS X and iOS Kernel Programming》是这样描述 Mach task 的：

任务（task）是一种容器（container）对象，虚拟内存空间和其他资源都是通过这个容器对象管理的，这些资源包括设备和其他句柄。严格地说，Mach 的任务并不是其他操作系统中所谓的进程，因为 Mach 作为一个微内核的操作系统，并没有提供“进程”的逻辑，而只是提供了最基本的实现。不过在 BSD 的模型中，这两个概念有1：1的简单映射，每一个 BSD 进程（也就是 OS X 进程）都在底层关联了一个 Mach 任务对象。

Mac OS X 中进程子系统组成的概念图.png

iOS 是基于 Apple Darwin 内核，由操作系统(kernel)、XNU内核和Runtime库 组成，而 XNU 是 Darwin 的内核，它是“X is not UNIX”的缩写，是一个混合内核，由 Mach 微内核和BSD组成。Mach 内核是轻量级的平台，只能完成操作系统最基本的职责，比如：进程和线程、虚拟内存管理、任务调度、进程通信和消息传递机制等。其他的工作，例如文件操作和设备访问，都由 BSD 层实现。

iOS 的线程技术与Mac OS X类似，也是基于 Mach 线程技术实现的，在 Mach 层中 thread_basic_info 结构体封装了单个线程的基本信息：

struct thread_basic_info {
    time_value_t  user_time;      /* user run time */
    time_value_t  system_time;    /* system run time */
    integer_t    cpu_usage;       /* scaled cpu usage percentage */
    policy_t     policy;          /* scheduling policy in effect */
    integer_t    run_state;       /* run state (see below) */
    integer_t    flags;           /* various flags (see below) */
    integer_t    suspend_count;   /* suspend count for thread */
    integer_t    sleep_time;      /* number of seconds that thread  has been sleeping */
}

一个Mach Task包含它的线程列表。内核提供了task_threads API 调用获取指定 task 的线程列表，然后可以通过thread_infoAPI调用来查询指定线程的信息，在thread_act.h中有相关定义。

task_threads 将target_task 任务中的所有线程保存在act_list数组中，act_listCnt表示线程个数：

kern_return_t task_threads
(
    task_t target_task,
    thread_act_array_t *act_list,
    mach_msg_type_number_t *act_listCnt
);

thread_info结构如下：

kern_return_t thread_info
(
    thread_act_t target_act,
    thread_flavor_t flavor,  // 传入不同的宏定义获取不同的线程信息
    thread_info_t thread_info_out,  // 查询到的线程信息
    mach_msg_type_number_t *thread_info_outCnt  // 信息的大小
);

所以我们如下来获取CPU的占有率：

#import "LSLCpuUsage.h"
#import <mach/task.h>
#import <mach/vm_map.h>
#import <mach/mach_init.h>
#import <mach/thread_act.h>
#import <mach/thread_info.h>

@implementation LSLCpuUsage

+ (double)getCpuUsage {
    kern_return_t           kr;
    thread_array_t          threadList;         // 保存当前Mach task的线程列表
    mach_msg_type_number_t  threadCount;        // 保存当前Mach task的线程个数
    thread_info_data_t      threadInfo;         // 保存单个线程的信息列表
    mach_msg_type_number_t  threadInfoCount;    // 保存当前线程的信息列表大小
    thread_basic_info_t     threadBasicInfo;    // 线程的基本信息
    
    // 通过“task_threads”API调用获取指定 task 的线程列表
    //  mach_task_self_，表示获取当前的 Mach task
    kr = task_threads(mach_task_self(), &threadList, &threadCount);
    if (kr != KERN_SUCCESS) {
        return -1;
    }
    double cpuUsage = 0;
    for (int i = 0; i < threadCount; i++) {
        threadInfoCount = THREAD_INFO_MAX;
        // 通过“thread_info”API调用来查询指定线程的信息
        //  flavor参数传的是THREAD_BASIC_INFO，使用这个类型会返回线程的基本信息，
        //  定义在 thread_basic_info_t 结构体，包含了用户和系统的运行时间、运行状态和调度优先级等
        kr = thread_info(threadList[i], THREAD_BASIC_INFO, (thread_info_t)threadInfo, &threadInfoCount);
        if (kr != KERN_SUCCESS) {
            return -1;
        }
        
        threadBasicInfo = (thread_basic_info_t)threadInfo;
        if (!(threadBasicInfo->flags & TH_FLAGS_IDLE)) {
            cpuUsage += threadBasicInfo->cpu_usage;
        }
    }
    
    // 回收内存，防止内存泄漏
    vm_deallocate(mach_task_self(), (vm_offset_t)threadList, threadCount * sizeof(thread_t));

    return cpuUsage / (double)TH_USAGE_SCALE * 100.0;
}
@end

当然也可以更直观的看CPU的占有率，用神奇的Xcode

2. 内存

虽然现在的手机内存越来越大，但毕竟是有限的，如果因为我们的应用设计不当造成内存过高，可能面临被系统“干掉”的风险，这对用户来说是毁灭性的体验。

Mach task 的内存使用信息存放在mach_task_basic_info结构体中，其中resident_size 为应用使用的物理内存大小，virtual_size为虚拟内存大小，在task_info.h中：

#define MACH_TASK_BASIC_INFO     20         /* always 64-bit basic info */
struct mach_task_basic_info {
        mach_vm_size_t  virtual_size;       /* virtual memory size (bytes) */
        mach_vm_size_t  resident_size;      /* resident memory size (bytes) */
        mach_vm_size_t  resident_size_max;  /* maximum resident memory size (bytes) */
        time_value_t    user_time;          /* total user run time for
                                               terminated threads */
        time_value_t    system_time;        /* total system run time for
                                               terminated threads */
        policy_t        policy;             /* default policy for new threads */
        integer_t       suspend_count;      /* suspend count for task */
};

获取方式是通过 task_info API 根据指定的 flavor 类型，返回 target_task 的信息，在task.h中：

kern_return_t task_info
(
    task_name_t target_task,
    task_flavor_t flavor,
    task_info_t task_info_out,
    mach_msg_type_number_t *task_info_outCnt
);

笔者尝试过使用如下方式获取内存情况，基本和腾讯的GT的相近，但是和Xcode和Instruments的值有较大差距：

// 获取当前应用的内存占用情况，和Xcode数值相差较大
+ (double)getResidentMemory {
    struct mach_task_basic_info info;
    mach_msg_type_number_t count = MACH_TASK_BASIC_INFO_COUNT;
    if (task_info(mach_task_self(), MACH_TASK_BASIC_INFO, (task_info_t)&info, &count) == KERN_SUCCESS) {
        return info.resident_size / (1024 * 1024);
    } else {
        return -1.0;
    }
}

后来看了一篇博主讨论了这个问题，说使用phys_footprint才是正解，博客地址。亲测，基本和Xcode的数值相近。

// 获取当前应用的内存占用情况，和Xcode数值相近
+ (double)getMemoryUsage {
    task_vm_info_data_t vmInfo;
    mach_msg_type_number_t count = TASK_VM_INFO_COUNT;
    if(task_info(mach_task_self(), TASK_VM_INFO, (task_info_t) &vmInfo, &count) == KERN_SUCCESS) {
        return (double)vmInfo.phys_footprint / (1024 * 1024);
    } else {
        return -1.0;
    }
}

博主文中提到：关于 phys_footprint 的定义可以在 XNU 源码中，找到 osfmk/kern/task.c 里对于 phys_footprint 的注释，博主认为注释里提到的公式计算的应该才是应用实际使用的物理内存。

/*
 * phys_footprint
 *   Physical footprint: This is the sum of:
 *     + (internal - alternate_accounting)
 *     + (internal_compressed - alternate_accounting_compressed)
 *     + iokit_mapped
 *     + purgeable_nonvolatile
 *     + purgeable_nonvolatile_compressed
 *     + page_table
 *
 * internal
 *   The task's anonymous memory, which on iOS is always resident.
 *
 * internal_compressed
 *   Amount of this task's internal memory which is held by the compressor.
 *   Such memory is no longer actually resident for the task [i.e., resident in its pmap],
 *   and could be either decompressed back into memory, or paged out to storage, depending
 *   on our implementation.
 *
 * iokit_mapped
 *   IOKit mappings: The total size of all IOKit mappings in this task, regardless of
     clean/dirty or internal/external state].
 *
 * alternate_accounting
 *   The number of internal dirty pages which are part of IOKit mappings. By definition, these pages
 *   are counted in both internal *and* iokit_mapped, so we must subtract them from the total to avoid
 *   double counting.
 */

当然我也是赞同这点的😁😁😁😁😁😁。

3. FPS

通过维基百科我们知道，FPS是Frames Per Second 的简称缩写，意思是每秒传输帧数，也就是我们常说的“刷新率（单位为Hz）。

FPS是测量用于保存、显示动态视频的信息数量。每秒钟帧数愈多，所显示的画面就会愈流畅，FPS值越低就越卡顿，所以这个值在一定程度上可以衡量应用在图像绘制渲染处理时的性能。一般我们的APP的FPS只要保持在50-60之间，用户体验都是比较流畅的。

苹果手机屏幕的正常刷新频率是每秒60次，即可以理解为FPS值为60。我们都知道CADisplayLink是和屏幕刷新频率保存一致，所以我们是否可以通过它来监控我们的FPS呢？！

首先CADisplayLink是什么

CADisplayLink是CoreAnimation提供的另一个类似于NSTimer的类，它总是在屏幕完成一次更新之前启动，它的接口设计的和NSTimer很类似，所以它实际上就是一个内置实现的替代，但是和timeInterval以秒为单位不同，CADisplayLink有一个整型的frameInterval属性，指定了间隔多少帧之后才执行。默认值是1，意味着每次屏幕更新之前都会执行一次。但是如果动画的代码执行起来超过了六十分之一秒，你可以指定frameInterval为2，就是说动画每隔一帧执行一次（一秒钟30帧）。

使用CADisplayLink监控界面的FPS值，参考自YYFPSLabel：

import UIKit

class LSLFPSMonitor: UILabel {

    private var link: CADisplayLink = CADisplayLink.init()
    private var count: NSInteger = 0
    private var lastTime: TimeInterval = 0.0
    private var fpsColor: UIColor = UIColor.green
    public var fps: Double = 0.0
    
    // MARK: - init
    
    override init(frame: CGRect) {
        var f = frame
        if f.size == CGSize.zero {
            f.size = CGSize(width: 55.0, height: 22.0)
        }
        super.init(frame: f)
        
        self.textColor = UIColor.white
        self.textAlignment = .center
        self.font = UIFont.init(name: "Menlo", size: 12.0)
        self.backgroundColor = UIColor.black
        
        link = CADisplayLink.init(target: LSLWeakProxy(target: self), selector: #selector(tick))
        link.add(to: RunLoop.current, forMode: RunLoopMode.commonModes)
    }
    
    deinit {
        link.invalidate()
    }
    
    required init?(coder aDecoder: NSCoder) {
        fatalError("init(coder:) has not been implemented")
    }
    
    // MARK: - actions
    
    @objc func tick(link: CADisplayLink) {
        guard lastTime != 0 else {
            lastTime = link.timestamp
            return
        }
        
        count += 1
        let delta = link.timestamp - lastTime
        guard delta >= 1.0 else {
            return
        }
        
        lastTime = link.timestamp
        fps = Double(count) / delta
        let fpsText = "\(String.init(format: "%.3f", fps)) FPS"
        count = 0
        
        let attrMStr = NSMutableAttributedString(attributedString: NSAttributedString(string: fpsText))
        if fps > 55.0{
            fpsColor = UIColor.green
        } else if(fps >= 50.0 && fps <= 55.0) {
            fpsColor = UIColor.yellow
        } else {
            fpsColor = UIColor.red
        }
        attrMStr.setAttributes([NSAttributedStringKey.foregroundColor:fpsColor], range: NSMakeRange(0, attrMStr.length - 3))
        attrMStr.setAttributes([NSAttributedStringKey.foregroundColor:UIColor.white], range: NSMakeRange(attrMStr.length - 3, 3))
        DispatchQueue.main.async {
            self.attributedText = attrMStr
        }
    }
}

通过CADisplayLink的实现方式，并真机测试之后，确实是可以在很大程度上满足了监控FPS的业务需求和为提高用户体验提供参考，但是和Instruments的值可能会有些出入。下面我们来讨论下使用CADisplayLink的方式，可能存在的问题。

(1). 和Instruments值对比有出入，原因如下:

CADisplayLink运行在被添加的那个RunLoop之中（一般是在主线程中），因此它只能检测出当前RunLoop下的帧率。RunLoop中所管理的任务的调度时机，受任务所处的RunLoopMode和CPU的繁忙程度所影响。所以想要真正定位到准确的性能问题所在，最好还是通过Instrument来确认。

(2). 使用CADisplayLink可能存在的循环引用问题。

例如以下写法：

let link = CADisplayLink.init(target: self, selector: #selector(tick))

let timer = Timer.init(timeInterval: 1.0, target: self, selector: #selector(tick), userInfo: nil, repeats: true)

原因：以上两种用法，都会对 self 强引用，此时 timer持有 self，self 也持有 timer，循环引用导致页面 dismiss 时，双方都无法释放，造成循环引用。此时使用 weak 也不能有效解决:

weak var weakSelf = self
let link = CADisplayLink.init(target: weakSelf, selector: #selector(tick))

那么我们应该怎样解决这个问题，有人会说在deinit(或dealloc)中调用定时器的invalidate方法，但是这是无效的，因为已经造成循环引用了，不会走到这个方法的。

YYKit作者提供的解决方案是使用 YYWeakProxy，这个YYWeakProxy不是继承自NSObject而是继承NSProxy。

NSProxy

An abstract superclass defining an API for objects that act as stand-ins for other objects or for objects that don’t exist yet.

NSProxy是一个为对象定义接口的抽象父类，并且为其它对象或者一些不存在的对象扮演了替身角色。具体的可以看下NSProxy的官方文档

万能替身上线

修改后代码如下，亲测定时器如愿释放，LSLWeakProxy的具体实现代码已经同步到github中。

let link = CADisplayLink.init(target: LSLWeakProxy(target: self), selector: #selector(tick))

4. CPU和GPU

在屏幕成像的过程中，CPU和GPU起着至关重要的作用

**CPU (Central Processing Unit)

对象的创建和销毁、对象属性的调整、布局计算、文本的计算的排版、图片的格式转换和解码、图像的绘制 (Core Graphics)

GPU (Graphics Processing Unit) 图像处理器

纹理的渲染
纹理是一个用来保存图像颜色的元素值的缓存，渲染是指将数据生成图像的过程。纹理渲染则是将保存在内存中的颜色值等数据，生成图像的过程。

现在的手机设备基本都是采用双缓存+垂直同步（即V-Sync）屏幕显示技术。

`屏幕显示内容的过程`

屏幕显示过程.png

如上图所示，系统内CPU、GPU和显示器是协同完成显示工作的。其中CPU负责计算显示的内容，例如视图创建、布局计算、图片解码、文本绘制等等。
随后CPU将计算好的内容提交给GPU，由GPU进行变换、合成、渲染。
GPU会预先渲染好一帧放入一个缓冲区内，让视频控制器读取，当下一帧渲染好后，GPU会直接将视频控制器的指针指向第二个容器（双缓存原理）。这里，GPU会等待显示器的VSync（即垂直同步）信号发出后，才进行新的一帧渲染和缓冲区更新（这样能解决画面撕裂现象，也增加了画面流畅度，但需要消费更多的计算资源，也会带来部分延迟）。

离屏渲染

在OpenGL中，GPU有2种渲染方式

On-Screen Rendering: 当前屏幕渲染，在当前屏幕用于显示的屏幕缓冲区进行渲染操作
Off-Screen Rendering: 离屏渲染，在当前屏幕缓冲区以外新开辟一个缓冲区进行渲染操作；这个操作是非常消耗性能的
离屏渲染消耗性能的原因

需要创建新的缓冲区
离屏渲染的整个过程，需要多次切换上下文环境, 先是从当前屏幕(On - Screen)切换到离屏(Off-Screen); 等到离屏渲染结束以后，将离屏缓冲区的渲染结果显示到屏幕上时，又需要将上下文环境从离屏切换到当前屏幕

那些操作会触发离屏渲染？

光栅化，是将几何数据经过一系列变换后最终转换为像素，从而呈现在显示设备上的过程，光栅化的本质是坐标变换、几何离散化；

我们使用 UITableView 和 UICollectionView 时经常会遇到各个 Cell 的样式是一样的，这时候我们可以使用这个属性提高性能:
layer.shouldRasterize = YES
layer.rasterizationScale = [[UIScreen mainScreen] scale]

遮罩，layer.mask
圆角，同时设置layer.masksToBounds = YES、layer.cornerRadius大于0
圆角可以通过CoreGraphics这种技术绘制裁剪圆角，如果是固定图片可以让UI提供带圆角图片
阴影，layer.shadow相关API ; 但如果设置了layer.shadowPath就不会产生离屏渲染

5. 卡顿

卡顿的原因：

掉帧.png

由上面屏幕显示的原理，采用了垂直同步机制的手机设备。如果在一个VSync 时间内，CPU 或GPU 没有完成内容提交，则那一帧就会被丢弃，等待下一次机会再显示，而这时显示屏会保留之前的内容不变。例如在主线程里添加了阻碍主线程去响应点击、滑动事件、以及阻碍主线程的UI绘制等的代码，都是造成卡顿的常见原因。

卡顿监控：

卡顿监控一般有两种实现方案：

(1). 主线程卡顿监控。通过子线程监测主线程的runLoop，判断结束休眠到再次休眠两个状态区域之间的耗时是否达到一定阈值。
(2). FPS监控。要保持流畅的UI交互，App 刷新率应该当努力保持在 60fps。FPS的监控实现原理，上面已经探讨过这里略过。

在使用FPS监控性能的实践过程中，发现 FPS 值抖动较大，造成侦测卡顿比较困难。为了解决这个问题，通过采用检测主线程每次执行消息循环的时间，当这一时间大于规定的阈值时，就记为发生了一次卡顿的方式来监控。

这也是美团的移动端采用的性能监控Hertz 方案，微信团队也在实践过程中提出来类似的方案--微信读书 iOS 性能优化总结。

美团Hertz方案流程图.png

/* Run Loop Observer Activities */
typedef CF_OPTIONS(CFOptionFlags, CFRunLoopActivity) {
    kCFRunLoopEntry = (1UL << 0),                 // 即将进入Loop
    kCFRunLoopBeforeTimers = (1UL << 1),          // 即将处理Timer
    kCFRunLoopBeforeSources = (1UL << 2),         // 即将处理Source
    kCFRunLoopBeforeWaiting = (1UL << 5),         // 即将进入休眠
    kCFRunLoopAfterWaiting = (1UL << 6),          // 刚从休眠中唤醒
    kCFRunLoopExit = (1UL << 7),                  // 即将退出Loop
    kCFRunLoopAllActivities = 0x0FFFFFFFU         // 所有状态
};

方案的提出，是根据滚动引发的Sources事件或其它交互事件总是被快速的执行完成，然后进入到kCFRunLoopBeforeWaiting状态下；假如在滚动过程中发生了卡顿现象，那么RunLoop必然会保持kCFRunLoopAfterWaiting或者kCFRunLoopBeforeSources这两个状态之一。

所以监控主线程卡顿的方案一：

开辟一个子线程，然后实时计算 kCFRunLoopBeforeSources 和 kCFRunLoopAfterWaiting 两个状态区域之间的耗时是否超过某个阀值，来断定主线程的卡顿情况。

(南栀倾寒)给出了自己的解决方案，Swift的卡顿检测第三方ANREye。这套卡顿监控方案大致思路为：创建一个子线程进行循环检测，每次检测时设置标记位为YES，然后派发任务到主线程中将标记位设置为NO。接着子线程沉睡超时阙值时长，判断标志位是否成功设置成NO，如果没有说明主线程发生了卡顿。

结合这套方案，当主线程处在Before Waiting状态的时候，通过派发任务到主线程来设置标记位的方式处理常态下的卡顿检测：

#define lsl_SEMAPHORE_SUCCESS 0
static BOOL lsl_is_monitoring = NO;
static dispatch_semaphore_t lsl_semaphore;
static NSTimeInterval lsl_time_out_interval = 0.05;


@implementation LSLAppFluencyMonitor

static inline dispatch_queue_t __lsl_fluecy_monitor_queue() {
    static dispatch_queue_t lsl_fluecy_monitor_queue;
    static dispatch_once_t once;
    dispatch_once(&once, ^{
        lsl_fluecy_monitor_queue = dispatch_queue_create("com.dream.lsl_monitor_queue", NULL);
    });
    return lsl_fluecy_monitor_queue;
}

static inline void __lsl_monitor_init() {
    static dispatch_once_t onceToken;
    dispatch_once(&onceToken, ^{
        lsl_semaphore = dispatch_semaphore_create(0);
    });
}

#pragma mark - Public
+ (instancetype)monitor {
    return [LSLAppFluencyMonitor new];
}

- (void)startMonitoring {
    if (lsl_is_monitoring) { return; }
    lsl_is_monitoring = YES;
    __lsl_monitor_init();
    dispatch_async(__lsl_fluecy_monitor_queue(), ^{
        while (lsl_is_monitoring) {
            __block BOOL timeOut = YES;
            dispatch_async(dispatch_get_main_queue(), ^{
                timeOut = NO;
                dispatch_semaphore_signal(lsl_semaphore);
            });
            [NSThread sleepForTimeInterval: lsl_time_out_interval];
            if (timeOut) {
                [LSLBacktraceLogger lsl_logMain];       // 打印主线程调用栈
//                [LSLBacktraceLogger lsl_logCurrent];    // 打印当前线程的调用栈
//                [LSLBacktraceLogger lsl_logAllThread];  // 打印所有线程的调用栈
            }
            dispatch_wait(lsl_semaphore, DISPATCH_TIME_FOREVER);
        }
    });
}

- (void)stopMonitoring {
    if (!lsl_is_monitoring) { return; }
    lsl_is_monitoring = NO;
}

@end

其中LSLBacktraceLogger是获取堆栈信息的类，详情见代码Github。

打印日志如下:

2018-08-16 12:36:33.910491+0800 AppPerformance[4802:171145] Backtrace of Thread 771:
======================================================================================
libsystem_kernel.dylib         0x10d089bce __semwait_signal + 10
libsystem_c.dylib              0x10ce55d10 usleep + 53
AppPerformance                 0x108b8b478 $S14AppPerformance25LSLFPSTableViewControllerC05tableD0_12cellForRowAtSo07UITableD4CellCSo0kD0C_10Foundation9IndexPathVtF + 1144
AppPerformance                 0x108b8b60b $S14AppPerformance25LSLFPSTableViewControllerC05tableD0_12cellForRowAtSo07UITableD4CellCSo0kD0C_10Foundation9IndexPathVtFTo + 155
UIKitCore                      0x1135b104f -[_UIFilteredDataSource tableView:cellForRowAtIndexPath:] + 95
UIKitCore                      0x1131ed34d -[UITableView _createPreparedCellForGlobalRow:withIndexPath:willDisplay:] + 765
UIKitCore                      0x1131ed8da -[UITableView _createPreparedCellForGlobalRow:willDisplay:] + 73
UIKitCore                      0x1131b4b1e -[UITableView _updateVisibleCellsNow:isRecursive:] + 2863
UIKitCore                      0x1131d57eb -[UITableView layoutSubviews] + 165
UIKitCore                      0x1133921ee -[UIView(CALayerDelegate) layoutSublayersOfLayer:] + 1501
QuartzCore                     0x10ab72eb1 -[CALayer layoutSublayers] + 175
QuartzCore                     0x10ab77d8b _ZN2CA5Layer16layout_if_neededEPNS_11TransactionE + 395
QuartzCore                     0x10aaf3b45 _ZN2CA7Context18commit_transactionEPNS_11TransactionE + 349
QuartzCore                     0x10ab285b0 _ZN2CA11Transaction6commitEv + 576
QuartzCore                     0x10ab29374 _ZN2CA11Transaction17observer_callbackEP19__CFRunLoopObservermPv + 76
CoreFoundation                 0x109dc3757 __CFRUNLOOP_IS_CALLING_OUT_TO_AN_OBSERVER_CALLBACK_FUNCTION__ + 23
CoreFoundation                 0x109dbdbde __CFRunLoopDoObservers + 430
CoreFoundation                 0x109dbe271 __CFRunLoopRun + 1537
CoreFoundation                 0x109dbd931 CFRunLoopRunSpecific + 625
GraphicsServices               0x10f5981b5 GSEventRunModal + 62
UIKitCore                      0x112c812ce UIApplicationMain + 140
AppPerformance                 0x108b8c1f0 main + 224
libdyld.dylib                  0x10cd4dc9d start + 1

======================================================================================

方案二: 是结合CADisplayLink的方式实现

在检测FPS值的时候，我们就详细介绍了CADisplayLink的使用方式，在这里也可以通过FPS值是否连续低于某个值开进行监控。

卡顿解决的主要思路

尽可能减少CPU、GPU资源消耗
按照60FPS的刷帧率，每隔16ms就会有一次VSync信号；
也就是说在16毫秒内完成CPU和GPU的操作就不会出现卡顿; (1秒钟等于1000毫秒，1000➗60 就等于16.666666666667)

优化CPU

尽量用轻量级的对象，比如用不到事件处理的地方，可以考虑使用CALayer取代UIView
不要频繁的调用UIView的相关属性，比如frame、bounds、transform等属性，尽量减少不必要的修改；
- 尽量提前计算好布局，在有需要时一次性调整对应的属性，不要多次修改属性
- Autolayout会比直接设置frame消耗更多的CPU资源
图片的size最好跟UIImageView的size保持一致，这样就不需要拉伸重绘
控制一下线程的最大并发数量，不要开启一些不必要的线程
尽量把耗时的操作放在子线程，比如：文本处理(尺寸计算、绘制)、图片处理(编码解码、绘制)
图片解码可以考虑先获取CGImage，再创建位图上下文，把图形数据绘制到上面，这个过程就是解码；最终把CGImage通过[UIImage imageWithCGImage: CGImage]转为图片赋值给UIImageView

主要功能代码

     // context
        CGContextRef context = CGBitmapContextCreate(NULL, width, height, 8, 0, CGColorSpaceCreateDeviceRGB(), bitmapInfo);

        // draw
        CGContextDrawImage(context, CGRectMake(0, 0, width, height), cgImage);

        // get CGImage
        cgImage = CGBitmapContextCreateImage(context);

        // into UIImage
        UIImage *newImage = [UIImage imageWithCGImage:cgImage];

类似的实现在SDWebImage的SDWebImageImageIOCoder.m文件的sd_decompressedAndScaledDownImageWithImage:有用到

- (nullable UIImage *)sd_decompressedAndScaledDownImageWithImage:(nullable UIImage *)image {

 CGContextRef destContext;
    
    // autorelease the bitmap context and all vars to help system to free memory when there are memory warning.
    // on iOS7, do not forget to call [[SDImageCache sharedImageCache] clearMemory];
    @autoreleasepool {
        CGImageRef sourceImageRef = image.CGImage;
        
        CGSize sourceResolution = CGSizeZero;
        sourceResolution.width = CGImageGetWidth(sourceImageRef);
        sourceResolution.height = CGImageGetHeight(sourceImageRef);
        float sourceTotalPixels = sourceResolution.width * sourceResolution.height;
        // Determine the scale ratio to apply to the input image
        // that results in an output image of the defined size.
        // see kDestImageSizeMB, and how it relates to destTotalPixels.
        float imageScale = kDestTotalPixels / sourceTotalPixels;
        CGSize destResolution = CGSizeZero;
        destResolution.width = (int)(sourceResolution.width*imageScale);
        destResolution.height = (int)(sourceResolution.height*imageScale);
        
        // device color space
        CGColorSpaceRef colorspaceRef = SDCGColorSpaceGetDeviceRGB();
        BOOL hasAlpha = SDCGImageRefContainsAlpha(sourceImageRef);
        // iOS display alpha info (BGRA8888/BGRX8888)
        CGBitmapInfo bitmapInfo = kCGBitmapByteOrder32Host;
        bitmapInfo |= hasAlpha ? kCGImageAlphaPremultipliedFirst : kCGImageAlphaNoneSkipFirst;
        
        // kCGImageAlphaNone is not supported in CGBitmapContextCreate.
        // Since the original image here has no alpha info, use kCGImageAlphaNoneSkipLast
        // to create bitmap graphics contexts without alpha info.
        destContext = CGBitmapContextCreate(NULL,
                                            destResolution.width,
                                            destResolution.height,
                                            kBitsPerComponent,
                                            0,
                                            colorspaceRef,
                                            bitmapInfo);
        
        if (destContext == NULL) {
            return image;
        }
        CGContextSetInterpolationQuality(destContext, kCGInterpolationHigh);
        
        // Now define the size of the rectangle to be used for the
        // incremental blits from the input image to the output image.
        // we use a source tile width equal to the width of the source
        // image due to the way that iOS retrieves image data from disk.
        // iOS must decode an image from disk in full width 'bands', even
        // if current graphics context is clipped to a subrect within that
        // band. Therefore we fully utilize all of the pixel data that results
        // from a decoding opertion by achnoring our tile size to the full
        // width of the input image.
        CGRect sourceTile = CGRectZero;
        sourceTile.size.width = sourceResolution.width;
        // The source tile height is dynamic. Since we specified the size
        // of the source tile in MB, see how many rows of pixels high it
        // can be given the input image width.
        sourceTile.size.height = (int)(kTileTotalPixels / sourceTile.size.width );
        sourceTile.origin.x = 0.0f;
        // The output tile is the same proportions as the input tile, but
        // scaled to image scale.
        CGRect destTile;
        destTile.size.width = destResolution.width;
        destTile.size.height = sourceTile.size.height * imageScale;
        destTile.origin.x = 0.0f;
        // The source seem overlap is proportionate to the destination seem overlap.
        // this is the amount of pixels to overlap each tile as we assemble the ouput image.
        float sourceSeemOverlap = (int)((kDestSeemOverlap/destResolution.height)*sourceResolution.height);
        CGImageRef sourceTileImageRef;
        // calculate the number of read/write operations required to assemble the
        // output image.
        int iterations = (int)( sourceResolution.height / sourceTile.size.height );
        // If tile height doesn't divide the image height evenly, add another iteration
        // to account for the remaining pixels.
        int remainder = (int)sourceResolution.height % (int)sourceTile.size.height;
        if(remainder) {
            iterations++;
        }
        // Add seem overlaps to the tiles, but save the original tile height for y coordinate calculations.
        float sourceTileHeightMinusOverlap = sourceTile.size.height;
        sourceTile.size.height += sourceSeemOverlap;
        destTile.size.height += kDestSeemOverlap;
        for( int y = 0; y < iterations; ++y ) {
            @autoreleasepool {
                sourceTile.origin.y = y * sourceTileHeightMinusOverlap + sourceSeemOverlap;
                destTile.origin.y = destResolution.height - (( y + 1 ) * sourceTileHeightMinusOverlap * imageScale + kDestSeemOverlap);
                sourceTileImageRef = CGImageCreateWithImageInRect( sourceImageRef, sourceTile );
                if( y == iterations - 1 && remainder ) {
                    float dify = destTile.size.height;
                    destTile.size.height = CGImageGetHeight( sourceTileImageRef ) * imageScale;
                    dify -= destTile.size.height;
                    destTile.origin.y += dify;
                }
                CGContextDrawImage( destContext, destTile, sourceTileImageRef );
                CGImageRelease( sourceTileImageRef );
            }
        }
        
        CGImageRef destImageRef = CGBitmapContextCreateImage(destContext);
        CGContextRelease(destContext);
        if (destImageRef == NULL) {
            return image;
        }
        UIImage *destImage = [[UIImage alloc] initWithCGImage:destImageRef scale:image.scale orientation:image.imageOrientation];
        CGImageRelease(destImageRef);
        if (destImage == nil) {
            return image;
        }
        return destImage;
    }

}

优化GPU

尽量避免短时间内大量图片的显示，尽可能将多张图片合成一张进行显示(每张图都是大图就算了)
GPU能处理的最大纹理尺寸是4096x4096，一旦超过这个尺寸，就需要更多的时间处理，这样CPU的处理空间就会变小，所以纹理尽量不要超过这个尺寸
尽量减少视图数量和层次
减少透明的视图(alpha<1),不透明的就设置opaque属性为YES;
opaque: 确定视图是否透明的布尔值
尽量避免出现离屏渲染

更多相关内容：启动时间、耗电量监控、使用时崩溃率、流量监控、网络状况监控、等等。，由于篇幅太长，将作为第二篇文中发出，欢迎交流探讨。

iOS 底层 - 性能优化之启动和电池能耗
 iOS 底层 - 性能优化之安装包瘦身(App Thinning)

参考文章：

青苹果园

最后编辑于：2020.05.19 12:52:39

人面猴
序言：七十年代末，一起剥皮案震惊了整个滨河市，随后出现的几起案子，更是在滨河造成了极大的恐慌，老刑警刘岩，带你破解...
沈念sama阅读 203,547评论 6赞 477
死咒
序言：滨河连续发生了三起死亡事件，死亡现场离奇诡异，居然都是意外死亡，警方通过查阅死者的电脑和手机，发现死者居然都...
沈念sama阅读 85,399评论 2赞 381
救了他两次的神仙让他今天三更去死
文/潘晓璐我一进店门，熙熙楼的掌柜王于贵愁眉苦脸地迎上来，“玉大人，你说我怎么就摊上这事。” “怎么了？”我有些...
开封第一讲书人阅读 150,428评论 0赞 337
道士缉凶录：失踪的卖姜人
文/不坏的土叔我叫张陵，是天一观的道长。经常有香客问我，道长，这世上最难降的妖魔是什么？我笑而不...
开封第一讲书人阅读 54,599评论 1赞 274
港岛之恋（遗憾婚礼）
正文为了忘掉前任，我火速办了婚礼，结果婚礼上，老公的妹妹穿的比我还像新娘。我一直安慰自己，他们只是感情好，可当我...
茶点故事阅读 63,612评论 5赞 365
恶毒庶女顶嫁案：这布局不是一般人想出来的
文/花漫我一把揭开白布。她就那样静静地躺着，像睡着了一般。火红的嫁衣衬着肌肤如雪。梳的纹丝不乱的头发上，一...
开封第一讲书人阅读 48,577评论 1赞 281
城市分裂传说
那天，我揣着相机与录音，去河边找鬼。笑死，一个胖子当着我的面吹牛，可吹牛的内容都是我干的。我是一名探鬼主播，决...
沈念sama阅读 37,941评论 3赞 395
双鸳鸯连环套：你想象不到人心有多黑
文/苍兰香墨我猛地睁开眼，长吁一口气：“原来是场噩梦啊……” “哼！你这毒妇竟也来了？” 一声冷哼从身侧响起，我...
开封第一讲书人阅读 36,603评论 0赞 258
万荣杀人案实录
序言：老挝万荣一对情侣失踪，失踪者是张志新（化名）和其女友刘颖，没想到半个月后，有当地人在树林里发现了一具尸体，经...
沈念sama阅读 40,852评论 1赞 297
护林员之死
正文独居荒郊野岭守林人离奇死亡，尸身上长有42处带血的脓包…… 初始之章·张勋以下内容为张勋视角年9月15日...
茶点故事阅读 35,605评论 2赞 321
白月光启示录
正文我和宋清朗相恋三年，在试婚纱的时候发现自己被绿了。大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
茶点故事阅读 37,693评论 1赞 329
活死人
序言：一个原本活蹦乱跳的男人离奇死亡，死状恐怖，灵堂内的尸体忽然破棺而出，到底是诈尸还是另有隐情，我是刑警宁泽，带...
沈念sama阅读 33,375评论 4赞 318
日本核电站爆炸内幕
正文年R本政府宣布，位于F岛的核电站，受9级特大地震影响，放射性物质发生泄漏。R本人自食恶果不足惜，却给世界环境...
茶点故事阅读 38,955评论 3赞 307
男人毒药：我在死后第九天来索命
文/蒙蒙一、第九天我趴在偏房一处隐蔽的房顶上张望。院中可真热闹，春花似锦、人声如沸。这庄子的主人今日做“春日...
开封第一讲书人阅读 29,936评论 0赞 19
一桩弑父案，背后竟有这般阴谋
文/苍兰香墨我抬头看了看天上的太阳。三九已至，却和暖如春，着一层夹袄步出监牢的瞬间，已是汗流浃背。一阵脚步声响...
开封第一讲书人阅读 31,172评论 1赞 259
情欲美人皮
我被黑心中介骗来泰国打工，没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留，地道东北人。一个月前我还...
沈念sama阅读 43,970评论 2赞 349
代替公主和亲
正文我出身青楼，却偏偏与公主长得像，于是被迫代替她去往敌国和亲。传闻我的和亲对象是个残疾皇子，可洞房花烛夜当晚...
茶点故事阅读 42,414评论 2赞 342