Like a coredump, a ramdump is a memory dump file, but it is much larger because it is essentially an image of the entire physical memory: apart from some security-protected regions that cannot be captured, almost all of DRAM is dumped. Some problems reproduce only rarely, and some are caused by memory corruption; such problems usually cannot be analyzed from logs alone. If the memory image can be saved at the moment a problem occurs, it is extremely helpful for analysis.
Capturing a ramdump on the MTK platform
Every platform captures ramdumps in its own way, so the MTK platform is used as an example here. On MTK, a ramdump can be captured on user/userdebug builds as follows.
Note that as the MTK platform, Android, and the kernel are upgraded, the steps below may no longer work; if you cannot capture a ramdump this way, consult the chip vendor.
Step 1: modify the LK makefile "bootable/bootloader/lk/app/mt_boot/rules.mk":
build_mt_ramdump := yes ## originally "no"; change it to "yes" here
ifneq ($(TARGET_BUILD_VARIANT),user)
ifeq ($(ARCH_HAVE_MT_RAMDUMP),yes)
build_mt_ramdump := yes
endif
endif
Step 2: in the user-build kernel config (kernel/arch/arm/configs/xxx_defconfig or
kernel/arch/arm64/configs/xxx_defconfig), change the following (options that do not exist in your config can be skipped):
# CONFIG_MTK_AEE_MRDUMP is not set
# CONFIG_HAVE_DDR_RESERVE_MODE is not set
to the enabled form:
CONFIG_MTK_AEE_MRDUMP=y
CONFIG_HAVE_DDR_RESERVE_MODE=y
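Before building, it is worth a quick check that both options really ended up enabled in the defconfig you edited, e.g.:
$ grep -E "MTK_AEE_MRDUMP|HAVE_DDR_RESERVE_MODE" kernel/arch/arm64/configs/xxx_defconfig
CONFIG_MTK_AEE_MRDUMP=y
CONFIG_HAVE_DDR_RESERVE_MODE=y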
Step 3: after building and flashing, make sure mtklogger is turned on while reproducing the issue; if it is not on, the ramdump will not be captured.
Introduction to the crash tool
Like GDB, crash is a very useful analysis tool. On its official site the latest release at the time of writing is 7.2.3, and recent versions support the ARM64 platform. Download the source, unpack it, and build a version with arm64 support:
taotaomami@taotaomami$ ls
alpha.c dev.c gdb_interface.c lkcd_dump_v7.h lkcd_x86_trace.c netdump.h s390dbf.c unwind.c vmware_vmss.c xen_hyper_dump_tables.c
arm64.c diskdump.c global_data.c lkcd_dump_v8.h lkcd_x86_trace.h ppc64.c s390_dump.c unwind_decoder.c vmware_vmss.h xen_hyper_global_data.c
arm.c diskdump.h help.c lkcd_fix_mem.c main.c ppc.c s390x.c unwind.h x86_64.c
binqQtgK4pQ66.bin extensions ia64.c lkcd_fix_mem.h makedumpfile.c qemu.c sadump.c unwind_i.h x86.c
build_data.c extensions.c ibm_common.h lkcd_v1.c makedumpfile.h qemu-load.c sadump.h unwind_x86_32_64.c xen_dom0.c
cmdline.c filesys.c ipcs.c lkcd_v2_v3.c Makefile qemu-load.h sparc64.c unwind_x86_64.h xen_dom0.h
configure gdb-7.6 kernel.c lkcd_v5.c memory.c ramdump.c symbols.c unwind_x86.h xendump.c
configure.c gdb-7.6.patch kvmdump.c lkcd_v7.c memory_driver README task.c vas_crash.h xendump.h
COPYING3 gdb-7.6-ppc64le-support.patch kvmdump.h lkcd_v8.c mips.c remote.c test.c va_server.c xen_hyper.c
crash.8 gdb-7.6.tar.gz lkcd_common.c lkcd_vmdump_v1.h net.c rse.h tools.c va_server.h xen_hyper_command.c
defs.h gdb.files lkcd_dump_v5.h lkcd_vmdump_v2_v3.h netdump.c s390.c unwind_arm.c va_server_v1.c xen_hyper_defs.h
taotaomami@taotaomami$ make target=arm64
TARGET: ARM64
CRASH: 7.1.7++
GDB: 7.6
gcc -g -O2 -I. -I. -I./common -I./config -DLOCALEDIR="\"/usr/local/share/locale\"" -DCRASH_MERGE -DHAVE_CONFIG_H -I./../include/opcode -I./../opcodes/.. -I./../readline/.. -I../bfd -I./../bfd -I./../include -I../libdecnumber -I./../libdecnumber -I./gnulib/import -Ibuild-gnulib/import -DTUI=1 -Wall -Wdeclaration-after-statement -Wpointer-arith -Wformat-nonliteral -Wno-pointer-sign -Wno-unused -Wunused-value -Wunused-function -Wno-switch -Wno-char-subscripts -Wmissing-prototypes -Wdeclaration-after-statement -Wempty-body -c -o symtab.o -MT symtab.o -MMD -MP -MF .deps/symtab.Tpo symtab.c
Making init.c
gcc -g -O2 -I. -I. -I./common -I./config -DLOCALEDIR="\"/usr/local/share/locale\"" -DCRASH_MERGE -DHAVE_CONFIG_H -I./../include/opcode -I./../opcodes/.. -I./../readline/.. -I../bfd -I./../bfd -I./../include -I../libdecnumber -I./../libdecnumber -I./gnulib/import -Ibuild-gnulib/import -DTUI=1 -Wall -Wdeclaration-after-statement -Wpointer-arith -Wformat-nonliteral -Wno-pointer-sign -Wno-unused -Wunused-value -Wunused-function -Wno-switch -Wno-char-subscripts -Wmissing-prototypes -Wdeclaration-after-statement -Wempty-body -c -o init.o -MT init.o -MMD -MP -MF .deps/init.Tpo init.c
cc -c -g -DARM64 -DGDB_7_6 build_data.c
cc -c -g -DARM64 -DGDB_7_6 main.c
cc -c -g -DARM64 -DGDB_7_6 tools.c
cc -c -g -DARM64 -DGDB_7_6 global_data.c
cc -c -g -DARM64 -DGDB_7_6 memory.c
cc -c -g -DARM64 -DGDB_7_6 filesys.c
cc -c -g -DARM64 -DGDB_7_6 help.c
cc -c -g -DARM64 -DGDB_7_6 task.c
cc -c -g -DARM64 -DGDB_7_6 kernel.c
cc -c -g -DARM64 -DGDB_7_6 test.c
cc -c -g -DARM64 -DGDB_7_6 gdb_interface.c
cc -c -g -DARM64 -DGDB_7_6 net.c
cc -c -g -DARM64 -DGDB_7_6 dev.c
cc -c -g -DARM64 -DGDB_7_6 alpha.c
cc -c -g -DARM64 -DGDB_7_6 x86.c -DMCLX
When the build finishes, a crash executable is produced in the current directory; run it to analyze the ramdump. Besides the ramdump itself, analysis also requires a vmlinux file, which is the kernel's symbol file.
taotaomami@taotaomami$ crash vmlinux SYS_COREDUMP
crash 7.1.7++
Copyright (C) 2002-2016 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.
GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "--host=x86_64-unknown-linux-gnu --target=aarch64-elf-linux"...
Like GDB, crash has its own command help, which will not be repeated here; interested readers can go through the commands one by one. As before, a real problem is analyzed below to show how crash and a ramdump help with debugging.
crash> help
* files mach repeat timer
alias foreach mod runq tree
ascii fuser mount search union
bt gdb net set vm
btop help p sig vtop
dev ipcs ps struct waitq
dis irq pte swap whatis
eval kmem ptob sym wr
exit list ptov sys q
extend log rd task
crash version: 7.1.7++ gdb version: 7.6
For help on any command above, enter "help <command>".
For help on input options, enter "help input".
For help on output options, enter "help output".
Problem background
On one device model, after running monkey for a while, almost every unit would hang. While hung, neither adb nor the serial console could connect, and to make things worse, after a reboot the log directory on the SD card kept only a small early portion of the logs; everything later was gone. The only information available was the trace file under /data/anr:
"Binder_1" prio=5 tid=8 Blocked
| group="main" sCount=1 dsCount=0 obj=0x12c5c0a0 self=0x7fabe09400
| sysTid=1042 nice=0 cgrp=default sched=0/0 handle=0x7fabdff440
| state=S schedstat=( 155829476357 56801688642 366831 ) utm=11187 stm=4395 core=1 HZ=100
| stack=0x7fabd04000-0x7fabd06000 stackSize=1009KB
| held mutexes=
at com.android.server.MountService.getVolumeList(MountService.java:3139)
- waiting to lock <0x07a7279e> (a java.lang.Object) held by thread 86
at android.os.storage.IMountService$Stub.onTransact(IMountService.java:1751)
at android.os.Binder.execTransact(Binder.java:462)
"Binder_2" prio=5 tid=9 Blocked
| group="main" sCount=1 dsCount=0 obj=0x12c5f0a0 self=0x7fa2fadc00
| sysTid=1043 nice=0 cgrp=default sched=0/0 handle=0x7fabd01440
| state=S schedstat=( 152468392027 56015549452 365405 ) utm=11086 stm=4160 core=0 HZ=100
| stack=0x7fabc06000-0x7fabc08000 stackSize=1009KB
| held mutexes=
at com.android.server.MountService.getVolumeList(MountService.java:3139)
- waiting to lock <0x07a7279e> (a java.lang.Object) held by thread 86
at android.os.storage.IMountService$Stub.onTransact(IMountService.java:1751)
at android.os.Binder.execTransact(Binder.java:462)
"Binder_6" prio=5 tid=86 Native
| group="main" sCount=1 dsCount=0 obj=0x1404b0a0 self=0x7f92fac600
| sysTid=1691 nice=0 cgrp=default sched=0/0 handle=0x7f929bf440
| state=S schedstat=( 149288745355 56180291820 364873 ) utm=10576 stm=4352 core=1 HZ=100
| stack=0x7f928c4000-0x7f928c6000 stackSize=1009KB
| held mutexes=
at libcore.io.Posix.statvfs(Native method)
at libcore.io.BlockGuardOs.statvfs(BlockGuardOs.java:298)
at java.io.File.getTotalSpace(File.java:1126)
at android.os.storage.StorageManager.getStorageLowBytes(StorageManager.java:978)
at android.os.storage.VolumeInfo.buildStorageVolume(VolumeInfo.java:371)
at com.android.server.MountService.getVolumeList(MountService.java:3146)
- locked <0x07a7279e> (a java.lang.Object)
at android.os.storage.IMountService$Stub.onTransact(IMountService.java:1751)
at android.os.Binder.execTransact(Binder.java:462)
From this information, all of system_server's binder threads are stuck, and most of them are blocked in the call that queries the SD card's capacity. Putting that together with the fact that the SD card could not keep the logs, the suspicion was that the SD card had stopped working. At the time the SD card was backed by the FUSE file system, whose user-space daemon is simply named sdcard; but with neither adb nor the serial console available, there was no way to know the state of the sdcard process. Since the problem reproduced fairly often, and the hang came with a system_server SWT (software watchdog timeout), the decision was to deliberately crash the kernel when the system_server SWT fired and capture a ramdump for analysis.
echo c > /proc/sysrq-trigger can be used to make the kernel crash and go through an abnormal reboot.
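For reference, doing this by hand from a root shell (or from wherever the SWT handling is hooked) is simply:
sync                               # flush whatever can still be written out
echo c > /proc/sysrq-trigger       # immediate kernel panic; the ramdump is captured during the subsequent reboot
Root permission is required, and the panic takes effect immediately, so anything you still want on storage should be synced first.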
The FUSE file system is split into a user-space layer and a kernel layer. If the kernel layer were broken, other kernel modules would most likely be misbehaving as well, but from the several occurrences observed the kernel still seemed to be working fine, so attention turned to the user-space sdcard daemon:
Filter for all processes in the D state (UN = UNINTERRUPTIBLE):
crash> ps | grep UN
116 2 1 ffffffc003274000 UN 0.0 0 0 [display_esd_che]
121 2 0 ffffffc0b6202000 UN 0.0 0 0 [display_idle_de]
176 2 0 ffffffc0b5334000 UN 0.0 0 0 [hang_detect]
206 2 0 ffffffc0b4b52000 UN 0.0 0 0 [C2K_TX_ASC]
207 2 0 ffffffc0b45a2000 UN 0.0 0 0 [C2K_RX_ASC]
229 2 0 ffffffc0b46a8000 UN 0.0 0 0 [bat_routine_thr]
230 2 0 ffffffc0b45d0000 UN 0.0 0 0 [bat_update_thre]
3046 315 5 ffffffc095fdd000 UN 0.0 20136 2056 sdcard
3048 315 4 ffffffc0adade000 UN 0.0 20136 2056 sdcard
21528 2 0 ffffffc01e2df000 UN 0.0 0 0 [kworker/u16:6]
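Incidentally, crash's foreach command can run a sub-command over every task in a given state, so the backtraces of all uninterruptible tasks can also be dumped in one shot:
crash> foreach UN bt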
The two sdcard threads 3046 and 3048 are both in the D state; compared with a healthy device this is abnormal, so get their backtraces:
crash> bt 3046
PID: 3046 TASK: ffffffc095fdd000 CPU: 5 COMMAND: "sdcard"
#0 [ffffffc095e43880] __switch_to at ffffffc000087204
#1 [ffffffc095e438b0] __schedule at ffffffc000ae4164
#2 [ffffffc095e43a60] schedule at ffffffc000ae4590
#3 [ffffffc095e43a70] schedule_preempt_disabled at ffffffc000ae45f0
#4 [ffffffc095e43a80] __mutex_lock_slowpath at ffffffc000ae5da4
#5 [ffffffc095e43ae0] mutex_lock at ffffffc000ae5eb8
#6 [ffffffc095e43b00] fuse_reverse_inval_entry at ffffffc0002ab150
#7 [ffffffc095e43b50] fuse_notify_delete at ffffffc0002a6078
#8 [ffffffc095e43bc0] fuse_dev_do_write at ffffffc0002a7a24
#9 [ffffffc095e43c70] fuse_dev_write at ffffffc0002a81a4
#10 [ffffffc095e43cf0] do_sync_readv_writev at ffffffc00019fb28
#11 [ffffffc095e43d70] do_readv_writev at ffffffc0001a1444
#12 [ffffffc095e43e70] vfs_writev at ffffffc0001a1524
#13 [ffffffc095e43e80] sys_writev at ffffffc0001a1620
#14 [ffffffc095e43ed0] el0_svc_naked at ffffffc00008542c
PC: 0000007f9deef9c0 LR: 0000007f9e105a50 SP: 0000007f9dbbd260
X29: 0000007f9dbbd260 X28: 0000007f9dbbf360 X27: 000000000000005d
X26: 00000000009eaa0a X25: 0000007f9dbbf3d0 X24: 0000007f9e108360
X23: 0000007fe5ccd56c X22: 0000007fe5ccb448 X21: 0000007f9d546e00
X20: 0000007f9d542c00 X19: 0000000000000006 X18: 0000007f9dd43c00
X17: 0000007f9deef9b8 X16: 0000007f9e119d70 X15: 000000000000005a
X14: 0000007fe5ccb440 X13: 0000007fe5cca440 X12: 0000000000000000
X11: 0101010101010101 X10: 0000007fe5ccd56c X9: 0000000000000018
X8: 0000000000000042 X7: 0000000000000010 X6: 000000000000003d
X5: 0000000000000015 X4: 0000000000000014 X3: 0000007f9dbbd2a8
X2: 0000000000000003 X1: 0000007f9dbbd2d0 X0: 000000000000000a
ORIG_X0: 000000000000000a SYSCALLNO: 42 PSTATE: 00000000
crash> bt 3048
PID: 3048 TASK: ffffffc0adade000 CPU: 4 COMMAND: "sdcard"
#0 [ffffffc0ac517880] __switch_to at ffffffc000087204
#1 [ffffffc0ac5178b0] __schedule at ffffffc000ae4164
#2 [ffffffc0ac517a60] schedule at ffffffc000ae4590
#3 [ffffffc0ac517a70] schedule_preempt_disabled at ffffffc000ae45f0
#4 [ffffffc0ac517a80] __mutex_lock_slowpath at ffffffc000ae5da4
#5 [ffffffc0ac517ae0] mutex_lock at ffffffc000ae5eb8
#6 [ffffffc0ac517b00] fuse_reverse_inval_entry at ffffffc0002ab150
#7 [ffffffc0ac517b50] fuse_notify_delete at ffffffc0002a6078
#8 [ffffffc0ac517bc0] fuse_dev_do_write at ffffffc0002a7a24
#9 [ffffffc0ac517c70] fuse_dev_write at ffffffc0002a81a4
#10 [ffffffc0ac517cf0] do_sync_readv_writev at ffffffc00019fb28
#11 [ffffffc0ac517d70] do_readv_writev at ffffffc0001a1444
#12 [ffffffc0ac517e70] vfs_writev at ffffffc0001a1524
#13 [ffffffc0ac517e80] sys_writev at ffffffc0001a1620
#14 [ffffffc0ac517ed0] el0_svc_naked at ffffffc00008542c
PC: 0000007f9deef9c0 LR: 0000007f9e105a50 SP: 0000007f9d9c1260
X29: 0000007f9d9c1260 X28: 0000007f9d9c3360 X27: 0000007f9d6ce3e0
X26: 0000000000120edc X25: 0000007f9d9c33d0 X24: 0000007f9e108360
X23: 0000007fe5d4d62c X22: 0000007fe5cc9418 X21: 0000007f9d546e00
X20: 0000007f9d542b80 X19: 0000000000000006 X18: 00000000000001fd
X17: 0000007f9deef9b8 X16: 0000007f9e119d70 X15: 0000000000000000
X14: 00000000000081b4 X13: 0000000000000000 X12: 0000000000000000
X11: 0101010101010101 X10: 0000007fe5d4d62c X9: 0000000000000018
X8: 0000000000000042 X7: 0000000000000010 X6: 000000000000003d
X5: 0000000000000015 X4: 0000000000000014 X3: 0000007f9d9c12a8
X2: 0000000000000003 X1: 0000007f9d9c12d0 X0: 0000000000000004
ORIG_X0: 0000000000000004 SYSCALLNO: 42 PSTATE: 00000000
From the backtraces, both threads are deleting files and both end up waiting for a mutex in kernel space. mutex_lock and struct mutex are defined as follows:
crash> whatis mutex_lock
void mutex_lock(struct mutex *);
crash> whatis struct mutex
struct mutex {
atomic_t count;
spinlock_t wait_lock;
struct list_head wait_list;
struct task_struct *owner;
struct optimistic_spin_queue osq;
}
SIZE: 40
A mutex records who currently holds it, so if we can get at the mutex's contents we can identify the holder. Here is a simple way to find the holder: start by disassembling mutex_lock:
crash> dis mutex_lock
0xffffffc000ae5e74 <mutex_lock>: stp x29, x30, [sp,#-32]!
0xffffffc000ae5e78 <mutex_lock+4>: mov x29, sp
0xffffffc000ae5e7c <mutex_lock+8>: str x19, [sp,#16]
0xffffffc000ae5e80 <mutex_lock+12>: mov x19, x0 // the mutex argument is copied into X19
0xffffffc000ae5e84 <mutex_lock+16>: ldxr w1, [x0]
0xffffffc000ae5e88 <mutex_lock+20>: sub w1, w1, #0x1
0xffffffc000ae5e8c <mutex_lock+24>: stlxr w2, w1, [x0]
0xffffffc000ae5e90 <mutex_lock+28>: cbnz w2, 0xffffffc000ae5e84 <mutex_lock+16>
0xffffffc000ae5e94 <mutex_lock+32>: dmb ish
0xffffffc000ae5e98 <mutex_lock+36>: tbnz w1, #31, 0xffffffc000ae5eb8 <mutex_lock+68>
0xffffffc000ae5e9c <mutex_lock+40>: mov x0, sp
0xffffffc000ae5ea0 <mutex_lock+44>: and x0, x0, #0xffffffffffffc000
0xffffffc000ae5ea4 <mutex_lock+48>: ldr x0, [x0,#16]
0xffffffc000ae5ea8 <mutex_lock+52>: str x0, [x19,#24]
0xffffffc000ae5eac <mutex_lock+56>: ldr x19, [sp,#16]
0xffffffc000ae5eb0 <mutex_lock+60>: ldp x29, x30, [sp],#32
0xffffffc000ae5eb4 <mutex_lock+64>: ret
0xffffffc000ae5eb8 <mutex_lock+68>: bl 0xffffffc000ae5d10 <__mutex_lock_slowpath> // this is where mutex_lock calls __mutex_lock_slowpath
0xffffffc000ae5ebc <mutex_lock+72>: mov x0, sp
0xffffffc000ae5ec0 <mutex_lock+76>: and x0, x0, #0xffffffffffffc000
0xffffffc000ae5ec4 <mutex_lock+80>: ldr x0, [x0,#16]
0xffffffc000ae5ec8 <mutex_lock+84>: str x0, [x19,#24]
0xffffffc000ae5ecc <mutex_lock+88>: ldr x19, [sp,#16]
0xffffffc000ae5ed0 <mutex_lock+92>: ldp x29, x30, [sp],#32
0xffffffc000ae5ed4 <mutex_lock+96>: ret
From 0xffffffc000ae5e80 through 0xffffffc000ae5eb8 the X19 register is never modified again, so the value in X19 is the address of the struct mutex. Next, look at the disassembly of __mutex_lock_slowpath:
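A side note on the fast path above: the mov x0, sp / and x0, x0, #0xffffffffffffc000 / ldr x0, [x0,#16] sequence fetches the current task from the thread_info at the bottom of the kernel stack, and str x0, [x19,#24] then records it as the mutex owner, owner being the member at offset 24 of the 40-byte struct mutex shown earlier. If you want crash to print member offsets explicitly, the struct command accepts a -o option:
crash> struct mutex -o
which lists every member of struct mutex together with its byte offset.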
crash> dis __mutex_lock_slowpath
0xffffffc000ae5d10 <__mutex_lock_slowpath>: stp x29, x30, [sp,#-96]!
0xffffffc000ae5d14 <__mutex_lock_slowpath+4>: mov x1, sp
0xffffffc000ae5d18 <__mutex_lock_slowpath+8>: mov x29, sp
0xffffffc000ae5d1c <__mutex_lock_slowpath+12>: and x1, x1, #0xffffffffffffc000
0xffffffc000ae5d20 <__mutex_lock_slowpath+16>: stp x19, x20, [sp,#16] // X19 is saved at [sp,#16]
0xffffffc000ae5d24 <__mutex_lock_slowpath+20>: stp x21, x22, [sp,#32]
0xffffffc000ae5d28 <__mutex_lock_slowpath+24>: stp x23, x24, [sp,#48]
0xffffffc000ae5d2c <__mutex_lock_slowpath+28>: mov x19, x0
0xffffffc000ae5d30 <__mutex_lock_slowpath+32>: mov w0, #0x1 // #1
0xffffffc000ae5d34 <__mutex_lock_slowpath+36>: ldr x22, [x1,#16]
0xffffffc000ae5d38 <__mutex_lock_slowpath+40>: bl 0xffffffc0000c7bac <preempt_count_add>
0xffffffc000ae5d3c <__mutex_lock_slowpath+44>: mov x1, #0x0 // #0
0xffffffc000ae5d40 <__mutex_lock_slowpath+48>: mov x0, x19
0xffffffc000ae5d44 <__mutex_lock_slowpath+52>: mov w2, w1
0xffffffc000ae5d48 <__mutex_lock_slowpath+56>: bl 0xffffffc0000e6554 <mutex_optimistic_spin>
0xffffffc000ae5d4c <__mutex_lock_slowpath+60>: uxtb w0, w0
0xffffffc000ae5d50 <__mutex_lock_slowpath+64>: cbnz w0, 0xffffffc000ae5e08 <__mutex_lock_slowpath+248>
......
0xffffffc000ae5e60 <__mutex_lock_slowpath+336>: ldp x19, x20, [sp,#16]
0xffffffc000ae5e64 <__mutex_lock_slowpath+340>: ldp x21, x22, [sp,#32]
0xffffffc000ae5e68 <__mutex_lock_slowpath+344>: ldp x23, x24, [sp,#48]
0xffffffc000ae5e6c <__mutex_lock_slowpath+348>: ldp x29, x30, [sp],#96
0xffffffc000ae5e70 <__mutex_lock_slowpath+352>: ret
From the disassembly above, the struct mutex address (X19) gets saved at an offset of 16 bytes from __mutex_lock_slowpath's stack pointer, and __mutex_lock_slowpath's stack pointer is ffffffc095e43a80:
#4 [ffffffc095e43a80] __mutex_lock_slowpath at ffffffc000ae5da4
#5 [ffffffc095e43ae0] mutex_lock at ffffffc000ae5eb8
Dump the memory between ffffffc095e43a80 and ffffffc095e43ae0:
crash> rd ffffffc095e43a80 -e ffffffc095e43ae0
ffffffc095e43a80: ffffffc095e43ae0 ffffffc000ae5ebc .:.......^......
ffffffc095e43a90: ffffffc00f387b98 00000000ffffffec .{8.............
ffffffc095e43aa0: ffffffc00f387b98 ffffffc095e43b98 .{8......;......
ffffffc095e43ab0: 0000007f9d542c00 000000000000003d .,T.....=.......
ffffffc095e43ac0: ffffffc095e43ae0 ffffffc00f387ba0 .:.......{8.....
ffffffc095e43ad0: ffffffc00f387ba0 ffffffc095fdd000 .{8.............
The word at ffffffc095e43a90 (stack pointer + 16) is ffffffc00f387b98, i.e. the address of the struct mutex. The same trick can be used to recover the contents of whatever variable you are after. Printing the mutex reveals its holder:
crash> struct mutex ffffffc00f387b98
struct mutex {
count = {
counter = -1
},
wait_lock = {
{
rlock = {
raw_lock = {
owner = 1,
next = 1
}
}
}
},
wait_list = {
next = 0xffffffc095e43ac8,
prev = 0xffffffc095e43ac8
},
owner = 0xffffffc00bf4e000,
osq = {
tail = {
counter = 0
}
}
}
crash> struct task_struct.pid 0xffffffc00bf4e000
pid = 5446
So the mutex that sdcard thread 3046 is waiting for is held by task 5446; next, get 5446's backtrace:
crash> bt 5446
PID: 5446 TASK: ffffffc00bf4e000 CPU: 4 COMMAND: "io"
#0 [ffffffc0079b3a70] __switch_to at ffffffc000087204
#1 [ffffffc0079b3aa0] __schedule at ffffffc000ae4164
#2 [ffffffc0079b3c50] schedule at ffffffc000ae4590
#3 [ffffffc0079b3c60] wait_answer_interruptible at ffffffc0002a4f34
#4 [ffffffc0079b3cc0] __fuse_request_send at ffffffc0002a5424
#5 [ffffffc0079b3d30] fuse_request_send at ffffffc0002a83d4
#6 [ffffffc0079b3d40] fuse_unlink at ffffffc0002a8fb0
#7 [ffffffc0079b3d80] vfs_unlink at ffffffc0001af044
#8 [ffffffc0079b3dc0] do_unlinkat at ffffffc0001af2f4
#9 [ffffffc0079b3ec0] sys_unlinkat at ffffffc0001af350
#10 [ffffffc0079b3ed0] el0_svc_naked at ffffffc00008542c
PC: f751c298 LR: f7513639 SP: e9570150 PSTATE: 60070010
X12: 33116380 X11: 32e89670 X10: 348c61f0 X9: eeed1200
X8: 348c61f0 X7: 00000148 X6: eeed63c0 X5: 713b8ea0
X4: df154348 X3: 000081b0 X2: 00000000 X1: df154348
X0: ffffff9c
From this backtrace, thread 5446 is deleting a file as well, but here it is submitting the unlink request to the FUSE file system; when the FUSE kernel code receives the request it has to route it back up to the user-space sdcard daemon, so one of threads 3046/3048 above is exactly the one handling 5446's unlink request. Applying the same method to 3048 shows that its mutex is held by 12051, and from 12051's backtrace it, too, is deleting a file:
crash> struct task_struct.pid 0xffffffc0082ac000
pid = 12051
crash> bt 12051
PID: 12051 TASK: ffffffc0082ac000 CPU: 5 COMMAND: "RxComputationSc"
#0 [ffffffc00ab7ba70] __switch_to at ffffffc000087204
#1 [ffffffc00ab7baa0] __schedule at ffffffc000ae4164
#2 [ffffffc00ab7bc50] schedule at ffffffc000ae4590
#3 [ffffffc00ab7bc60] wait_answer_interruptible at ffffffc0002a4f34
#4 [ffffffc00ab7bcc0] __fuse_request_send at ffffffc0002a5374
#5 [ffffffc00ab7bd30] fuse_request_send at ffffffc0002a83d4
#6 [ffffffc00ab7bd40] fuse_unlink at ffffffc0002a8fb0
#7 [ffffffc00ab7bd80] vfs_unlink at ffffffc0001af044
#8 [ffffffc00ab7bdc0] do_unlinkat at ffffffc0001af2f4
#9 [ffffffc00ab7bec0] sys_unlinkat at ffffffc0001af350
#10 [ffffffc00ab7bed0] el0_svc_naked at ffffffc00008542c
PC: 0000007f89fc7930 LR: 0000007f89fb72d8 SP: 0000007f6d1be4e0
X29: 0000007f6d1be4e0 X28: 000000001341e240 X27: 0000000013424180
X26: 000000001341c4a0 X25: 000000001341e240 X24: 0000000013210ea0
X23: 000000001341ba90 X22: 0000000013424180 X21: 0000007f6d1be5c8
X20: 0000007f80254c00 X19: 0000007f69a44980 X18: 0000007f86444b78
X17: 0000007f89fc7928 X16: 0000007f8a035510 X15: 0000000000000000
X14: 0000000000000001 X13: 0000000000000017 X12: 0000000000000000
X11: 0000000000000036 X10: 0000007f8643c9d0 X9: 0000007f8643a000
X8: 0000000000000023 X7: 0000000000430000 X6: 0000000000000000
X5: 000000000000002f X4: 0000000000430000 X3: 0000000000000100
X2: 0000000000000000 X1: 0000007f69a44980 X0: 00000000ffffff9c
ORIG_X0: 00000000ffffff9c SYSCALLNO: 23 PSTATE: 20000000
At this point we roughly know that two threads are deleting files through the FUSE file system at the same time and that this is what hangs the system. It already feels like a deadlock, except not an ordinary deadlock between threads: it is a deadlock between processes, spanning user space and kernel space. Since everything is busy deleting files, which files exactly? Using the same technique as for the mutex address, we recover the files being deleted from the contexts of threads 5446 and 12051.
crash> whatis fuse_unlink
int fuse_unlink(struct inode *, struct dentry *); // the dentry contains the path information
The file that thread 5446 is deleting (found by dumping its stack frame and interpreting the saved dentry pointer):
crash> bt 5446
PID: 5446 TASK: ffffffc00bf4e000 CPU: 4 COMMAND: "io"
#0 [ffffffc0079b3a70] __switch_to at ffffffc000087204
#1 [ffffffc0079b3aa0] __schedule at ffffffc000ae4164
#2 [ffffffc0079b3c50] schedule at ffffffc000ae4590
#3 [ffffffc0079b3c60] wait_answer_interruptible at ffffffc0002a4f34
#4 [ffffffc0079b3cc0] __fuse_request_send at ffffffc0002a5424
#5 [ffffffc0079b3d30] fuse_request_send at ffffffc0002a83d4
#6 [ffffffc0079b3d40] fuse_unlink at ffffffc0002a8fb0
#7 [ffffffc0079b3d80] vfs_unlink at ffffffc0001af044
#8 [ffffffc0079b3dc0] do_unlinkat at ffffffc0001af2f4
#9 [ffffffc0079b3ec0] sys_unlinkat at ffffffc0001af350
#10 [ffffffc0079b3ed0] el0_svc_naked at ffffffc00008542c
PC: f751c298 LR: f7513639 SP: e9570150 PSTATE: 60070010
X12: 33116380 X11: 32e89670 X10: 348c61f0 X9: eeed1200
X8: 348c61f0 X7: 00000148 X6: eeed63c0 X5: 713b8ea0
X4: df154348 X3: 000081b0 X2: 00000000 X1: df154348
X0: ffffff9c
crash> rd ffffffc0079b3cc0 -e ffffffc0079b3d30
ffffffc0079b3cc0: ffffffc0079b3d30 ffffffc0002a83d8 0=........*.....
ffffffc0079b3cd0: ffffffc0235981a0 ffffffc06a521e40 ..Y#....@.Rj....
ffffffc0079b3ce0: ffffffc098daa800 ffffffc06a521e40 ........@.Rj....
ffffffc0079b3cf0: ffffffc00f387b00 ffffffc0079b3e20 .{8..... >......
ffffffc0079b3d00: ffffffc0079b3d10 ffffffc0001aa028 .=......(.......
ffffffc0079b3d10: ffffffc0079b3d60 ffffffc0002c9bc8 `=........,.....
ffffffc0079b3d20: ffffffc06a521e40 ffffffc06a521e40 @.Rj....@.Rj....
crash> struct dentry ffffffc06a521e40
struct dentry {
d_flags = 4718788,
d_seq = {
sequence = 2
},
d_hash = {
next = 0x0,
pprev = 0xffffffc0b7c77f28
},
d_parent = 0xffffffc0052f96c0,
d_name = {
{
{
hash = 114945031,
len = 20
},
hash_len = 86014290951
},
name = 0xffffffc06a521e78 "10048316241362964103"
},
d_inode = 0xffffffc033afcb00,
d_iname = "10048316241362964103\000\000s\000l\000bytes",
d_lockref = {
{
lock_count = 4295294981,
{
lock = {
{
rlock = {
raw_lock = {
owner = 5,
next = 5
}
}
}
},
count = 1
}
}
},
d_op = 0xffffffc000b3fec0 <fuse_dentry_operations>,
d_sb = 0xffffffc098daa000,
d_time = 4299504276,
d_fsdata = 0x0,
d_lru = {
next = 0xffffffc095eff088,
prev = 0xffffffc024a50140
},
d_child = {
next = 0xffffffc03bec35d0,
prev = 0xffffffc0052f9760
},
d_subdirs = {
next = 0xffffffc06a521ee0,
prev = 0xffffffc06a521ee0
},
d_u = {
d_alias = {
next = 0x0,
pprev = 0xffffffc033afcc08
},
d_rcu = {
next = 0x0,
func = 0xffffffc033afcc08
}
}
}
The file that 12051 is deleting (its dentry address, ffffffc007383180, was recovered from 12051's stack in the same way):
crash> struct dentry ffffffc007383180
struct dentry {
d_flags = 4718788,
d_seq = {
sequence = 2
},
d_hash = {
next = 0x0,
pprev = 0xffffffc0b7d70318
},
d_parent = 0xffffffc03ad70780,
d_name = {
{
{
hash = 3934655135,
len = 20
},
hash_len = 89834001055
},
name = 0xffffffc0073831b8 "15919563341606802260"
},
d_inode = 0xffffffc00398b2c0,
d_iname = "15919563341606802260\000m\000re.bytes",
d_lockref = {
{
lock_count = 4295491592,
{
lock = {
{
rlock = {
raw_lock = {
owner = 8,
next = 8
}
}
}
},
count = 1
}
}
},
d_op = 0xffffffc000b3fec0 <fuse_dentry_operations>,
d_sb = 0xffffffc003290800,
d_time = 4299504227,
d_fsdata = 0x0,
d_lru = {
next = 0xffffffc00acf1d40,
prev = 0xffffffc059711980
},
d_child = {
next = 0xffffffc059711990,
prev = 0xffffffc00acf1d50
},
d_subdirs = {
next = 0xffffffc007383220,
prev = 0xffffffc007383220
},
d_u = {
d_alias = {
next = 0x0,
pprev = 0xffffffc00398b3c8
},
d_rcu = {
next = 0x0,
func = 0xffffffc00398b3c8
}
}
}
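In principle the full path can be recovered by hand: take the dentry's d_parent (0xffffffc0052f96c0 for the first dentry above), dump that parent's d_name and d_parent, and keep walking up until the root is reached. A sketch of the manual walk (outputs omitted; <next d_parent> stands for whatever parent pointer the previous step printed):
crash> struct dentry.d_name,d_parent ffffffc0052f96c0
crash> struct dentry.d_name,d_parent <next d_parent>
......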
Looked at this way it is still not very intuitive; crash turns out to have another very handy command, files, which prints the file path directly:
crash> files -d ffffffc06a521e40
DENTRY INODE SUPERBLK TYPE PATH
ffffffc06a521e40 ffffffc033afcb00 ffffffc098daa000 REG 10048316241362964103
// For some reason the full path cannot be printed from this particular ramdump,
// but in the other ramdumps analyzed at the time it could be, and it was likewise a file
// under a com.xxx directory being deleted, only with a different path prefix. This is related
// to Android M's runtime permission mechanism: a system service process such as surfaceflinger
// and an ordinary app process may access the same file on the SD card but see different paths,
// e.g. surfaceflinger sees paths starting with /storage/runtime/default while an ordinary
// app sees /mnt/runtime/write.
crash> files -d ffffffc007383180
DENTRY INODE SUPERBLK TYPE PATH
ffffffc007383180 ffffffc00398b2c0 ffffffc003290800 REG /storage/runtime/default/emulated/0/com.xxx/15919563341606802260
From the above we know that the lock 3048 is waiting on is held by 12051, and that 12051 is deleting /storage/runtime/default/emulated/0/com.xxx/15919563341606802260. Can we also find out from 3048 which file it is deleting? If its context shows it is deleting 10048316241362964103, that would prove this is a deadlock. Using the same method once more, work out from 3048's stack context which file it is deleting.
crash> bt 3048
PID: 3048 TASK: ffffffc0adade000 CPU: 4 COMMAND: "sdcard"
#0 [ffffffc0ac517880] __switch_to at ffffffc000087204
#1 [ffffffc0ac5178b0] __schedule at ffffffc000ae4164
#2 [ffffffc0ac517a60] schedule at ffffffc000ae4590
#3 [ffffffc0ac517a70] schedule_preempt_disabled at ffffffc000ae45f0
#4 [ffffffc0ac517a80] __mutex_lock_slowpath at ffffffc000ae5da4
#5 [ffffffc0ac517ae0] mutex_lock at ffffffc000ae5eb8
#6 [ffffffc0ac517b00] fuse_reverse_inval_entry at ffffffc0002ab150
#7 [ffffffc0ac517b50] fuse_notify_delete at ffffffc0002a6078
#8 [ffffffc0ac517bc0] fuse_dev_do_write at ffffffc0002a7a24
#9 [ffffffc0ac517c70] fuse_dev_write at ffffffc0002a81a4
#10 [ffffffc0ac517cf0] do_sync_readv_writev at ffffffc00019fb28
#11 [ffffffc0ac517d70] do_readv_writev at ffffffc0001a1444
#12 [ffffffc0ac517e70] vfs_writev at ffffffc0001a1524
#13 [ffffffc0ac517e80] sys_writev at ffffffc0001a1620
#14 [ffffffc0ac517ed0] el0_svc_naked at ffffffc00008542c
crash> whatis fuse_notify_delete
int fuse_notify_delete(struct fuse_conn *, unsigned int, struct fuse_copy_state *);
crash> whatis fuse_reverse_inval_entry
int fuse_reverse_inval_entry(struct super_block *, u64, u64, struct qstr *);
crash> dis fuse_reverse_inval_entry
0xffffffc0002ab108 <fuse_reverse_inval_entry>: stp x29, x30, [sp,#-80]!
0xffffffc0002ab10c <fuse_reverse_inval_entry+4>: mov x29, sp
0xffffffc0002ab110 <fuse_reverse_inval_entry+8>: add x4, x29, #0x50
0xffffffc0002ab114 <fuse_reverse_inval_entry+12>: stp x19, x20, [sp,#16]
0xffffffc0002ab118 <fuse_reverse_inval_entry+16>: stp x21, x22, [sp,#32]
0xffffffc0002ab11c <fuse_reverse_inval_entry+20>: stp x23, x24, [sp,#48]
0xffffffc0002ab120 <fuse_reverse_inval_entry+24>: str x1, [x4,#-8]!
0xffffffc0002ab124 <fuse_reverse_inval_entry+28>: mov x23, x2
0xffffffc0002ab128 <fuse_reverse_inval_entry+32>: adrp x2, 0xffffffc0002b0000 <fuse_writepages+60>
0xffffffc0002ab12c <fuse_reverse_inval_entry+36>: mov x22, x3 // the struct qstr argument is moved into X22
0xffffffc0002ab130 <fuse_reverse_inval_entry+40>: add x2, x2, #0xfa8
0xffffffc0002ab134 <fuse_reverse_inval_entry+44>: mov x3, x4
0xffffffc0002ab138 <fuse_reverse_inval_entry+48>: bl 0xffffffc0001ba738 <ilookup5>
0xffffffc0002ab13c <fuse_reverse_inval_entry+52>: mov x19, x0
0xffffffc0002ab140 <fuse_reverse_inval_entry+56>: cbz x0, 0xffffffc0002ab1e4 <fuse_reverse_inval_entry+220>
0xffffffc0002ab144 <fuse_reverse_inval_entry+60>: add x21, x0, #0x98
0xffffffc0002ab148 <fuse_reverse_inval_entry+64>: mov w20, #0xffffffec // #-20
0xffffffc0002ab14c <fuse_reverse_inval_entry+68>: mov x0, x21
0xffffffc0002ab150 <fuse_reverse_inval_entry+72>: bl 0xffffffc000ae5e74 <mutex_lock> // execution branches to mutex_lock from here
......
Looking again at the disassembly of mutex_lock and __mutex_lock_slowpath, neither of them modifies X22, and __mutex_lock_slowpath saves X22 into its stack frame (as the second register of the stp x21, x22, [sp,#32] pair, i.e. at SP + 40):
crash> dis __mutex_lock_slowpath
0xffffffc000ae5d10 <__mutex_lock_slowpath>: stp x29, x30, [sp,#-96]!
0xffffffc000ae5d14 <__mutex_lock_slowpath+4>: mov x1, sp
0xffffffc000ae5d18 <__mutex_lock_slowpath+8>: mov x29, sp
0xffffffc000ae5d1c <__mutex_lock_slowpath+12>: and x1, x1, #0xffffffffffffc000
0xffffffc000ae5d20 <__mutex_lock_slowpath+16>: stp x19, x20, [sp,#16]
0xffffffc000ae5d24 <__mutex_lock_slowpath+20>: stp x21, x22, [sp,#32] // X21 is saved at [sp,#32] and X22 at [sp,#40]
0xffffffc000ae5d28 <__mutex_lock_slowpath+24>: stp x23, x24, [sp,#48]
......
crash> rd ffffffc0ac517a80 -e ffffffc0ac517ae0
ffffffc0ac517a80: ffffffc0ac517ae0 ffffffc000ae5ebc .zQ......^......
ffffffc0ac517a90: ffffffc04f747098 00000000ffffffec .ptO............
ffffffc0ac517aa0: ffffffc04f747098 ffffffc0ac517b98 .ptO.....{Q.....
ffffffc0ac517ab0: 0000007f9d542b80 000000000000003d .+T.....=.......
ffffffc0ac517ac0: ffffffc0ac517ae0 ffffffc04f7470a0 .zQ......ptO....
ffffffc0ac517ad0: ffffffc04f7470a0 ffffffc0adade000 .ptO............
crash> struct qstr ffffffc0ac517b98
struct qstr {
{
{
hash = 114945031,
len = 20
},
hash_len = 86014290951
},
name = 0xffffffc010143000 "10048316241362964103" // sure enough, 3048 is deleting the file 10048316241362964103
}
At this point it is clear that this really is an inter-process deadlock. Based on what had been learned, having two processes with different permission views (for example surfaceflinger and an ordinary app) delete files under the same SD card directory at the same time did reproduce the problem reliably, giving a 100%-reproducible path. After the finding was passed to the BSP colleagues, they concluded it was a defect in the FUSE file system itself, but because the project schedule was tight, only a workaround was applied at the time.
Extracting a process coredump from a ramdump
Since a ramdump is an image of the entire memory, in theory the coredump of any single process can be extracted from it. After digging through a pile of material online, the method turned up on the crash extension modules page: crash is extensible and supports plugin-like modules, and gcore is one of them; with this extension the coredump of an individual process can be pulled out of a ramdump.
Create a core dump file of a user-space task that was running in a kernel dumpfile. The module supports the X86, X86_64, ARM, ARM64, MIPS and PPC64 architectures.
If /usr/include/crash/defs.h exists, the module may be built like so:
$ tar xzf crash-gcore-command-1.4.0.tar.gz
$ cd crash-gcore-command-1.4.0
$ make -f gcore.mk
Alternatively, defs.h may be copied to the current directory:
$ tar xzf crash-gcore-command-1.4.0.tar.gz
$ cd crash-gcore-command-1.4.0
$ cp <path-to>/defs.h .
$ make -f gcore.mk
To build the module from the top-level crash-<version> directory, enter:
$ tar xzf crash-gcore-command-1.4.0.tar.gz
$ mv crash-gcore-command-1.4.0/* extensions
$ make extensions
This module's source files are self-contained in crash-gcore-command-1.4.0.tar.gz.
Author: Daisuke Hatayama d.hatayama@jp.fujitsu.com
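Once built, the module is loaded into a running crash session with the extend command, after which a gcore command becomes available. A typical session looks roughly like this (taking the sdcard thread from the case above as an example; the resulting core file is written into the current directory and can then be examined with an aarch64 gdb together with the process's own symbol files):
crash> extend gcore.so
crash> gcore 3046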