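1. Object file format
On Linux, executables, object files (.o), static libraries (.a, which are archives of object files), and shared libraries (.so) are all stored in ELF format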
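- ELF (Executable and Linkable Format) file types
  - Relocatable file: contains code and data whose symbol addresses can still be fixed up during linking. Object files and the members of static libraries are of this type.
  - Executable file: contains a program that can be run directly.
  - Shared object file: mainly shared libraries, which are either linked against or loaded as part of a process at run time.
  - Core dump file: saves a process's address space and some other information when the process or the system crashes.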
-
$ file <filename>
reports which type of ELF file a given file is
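The four file types listed above correspond to values of the e_type field in the ELF header. The sketch below maps that field to a readable name; the numeric values are the ET_* constants from the ELF specification, while the function name elf_type_name is made up for this example. The file command inspects more than this one field, so treat this as an illustration rather than a description of how file works.

```c
#include <stdint.h>

/* Map the e_type field of an ELF header to a readable name.
 * Values are the ET_* constants; elf_type_name is a hypothetical helper. */
const char *elf_type_name(uint16_t e_type)
{
    switch (e_type) {
    case 1:  return "relocatable";   /* ET_REL: .o files, members of .a archives */
    case 2:  return "executable";    /* ET_EXEC */
    case 3:  return "shared object"; /* ET_DYN: .so files */
    case 4:  return "core dump";     /* ET_CORE */
    default: return "unknown";
    }
}
```

For the object file examined later in these notes, whose header stores e_type as 1, elf_type_name returns "relocatable".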
2. Contents of an object file
Source code, SimpleSection.c:
int printf( const char *format, ... );

int global_init_var = 84;
int global_uninit_var;

void func( int i )
{
    printf( "%d\n", i );
}

int main(void)
{
    static int static_var = 85;
    static int static_var2;
    int a = 1;
    int b;
    func(static_var + static_var2 + a + b);
    return 0;
}
An object file contains a file header, various sections, a symbol table, debugging information, string tables, and so on
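- Compiled machine instructions are stored in the code section (.text)
- Compiled data is stored in the data sections
  - Initialized global variables and static local variables are stored in .data
  - Uninitialized global variables and static local variables are stored in .bss. The .bss section only reserves space and occupies no space in the file. (Some compilers do not place uninitialized global variables in .bss either and only reserve a symbol for them; in this example global_uninit_var shows up as a COMMON symbol.)
- Benefits of keeping instructions and data separate
  - After the program is loaded, instructions and data are mapped into two separate virtual memory regions. The data region is readable and writable, while the instruction region is read-only, which prevents instructions from being overwritten, deliberately or by accident
  - When several copies of a program are running, only one copy of its instructions needs to be kept in memory, because the instructions are shared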
3. Analysis of the key sections
$ gcc -c SimpleSection.c -o SimpleSection.o -m32
Section Header
$ objdump -h SimpleSection.o
prints basic information about the key sections
SimpleSection.o: file format elf32-i386
Sections:
Idx Name Size VMA LMA File off Algn
0 .text 00000050 00000000 00000000 00000034 2**2
CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
1 .data 00000008 00000000 00000000 00000084 2**2
CONTENTS, ALLOC, LOAD, DATA
2 .bss 00000004 00000000 00000000 0000008c 2**2
ALLOC
3 .rodata 00000004 00000000 00000000 0000008c 2**0
CONTENTS, ALLOC, LOAD, READONLY, DATA
4 .comment 0000002b 00000000 00000000 00000090 2**0
CONTENTS, READONLY
5 .note.GNU-stack 00000000 00000000 00000000 000000bb 2**0
CONTENTS, READONLY
6 .eh_frame 00000058 00000000 00000000 000000bc 2**2
CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
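objdump -h lists 7 sections in this object file. Besides the code and data sections there are the read-only data section (.rodata), the comment section (.comment), the stack note section (.note.GNU-stack), and the exception-handling frame section (.eh_frame)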
$ objdump -s -d SimpleSection.o
-s prints the contents of the key sections in hexadecimal; -d disassembles the code
SimpleSection.o: file format elf32-i386
Contents of section .text:
0000 5589e583 ec188b45 08894424 04c70424 U......E..D$...$
0010 00000000 e8fcffff ffc9c355 89e583e4 ...........U....
0020 f083ec20 c7442418 01000000 8b150400 ... .D$.........
0030 0000a100 00000001 d0034424 18034424 ..........D$..D$
0040 1c890424 e8fcffff ffb80000 0000c9c3 ...$............
Contents of section .data:
0000 54000000 55000000 T...U...
Contents of section .rodata:
0000 25640a00 %d..
Contents of section .comment:
0000 00474343 3a202855 62756e74 752f4c69 .GCC: (Ubuntu/Li
0010 6e61726f 20342e36 2e332d31 7562756e naro 4.6.3-1ubun
0020 74753529 20342e36 2e3300 tu5) 4.6.3.
Contents of section .eh_frame:
0000 14000000 00000000 017a5200 017c0801 .........zR..|..
0010 1b0c0404 88010000 1c000000 1c000000 ................
0020 00000000 1b000000 00410e08 8502420d .........A....B.
0030 0557c50c 04040000 1c000000 3c000000 .W..........<...
0040 1b000000 35000000 00410e08 8502420d ....5....A....B.
0050 0571c50c 04040000 .q......
Disassembly of section .text:
00000000 <func>:
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: 83 ec 18 sub $0x18,%esp
6: 8b 45 08 mov 0x8(%ebp),%eax
9: 89 44 24 04 mov %eax,0x4(%esp)
d: c7 04 24 00 00 00 00 movl $0x0,(%esp)
14: e8 fc ff ff ff call 15 <func+0x15>
19: c9 leave
1a: c3 ret
0000001b <main>:
1b: 55 push %ebp
1c: 89 e5 mov %esp,%ebp
1e: 83 e4 f0 and $0xfffffff0,%esp
21: 83 ec 20 sub $0x20,%esp
24: c7 44 24 18 01 00 00 movl $0x1,0x18(%esp)
2b: 00
2c: 8b 15 04 00 00 00 mov 0x4,%edx
32: a1 00 00 00 00 mov 0x0,%eax
37: 01 d0 add %edx,%eax
39: 03 44 24 18 add 0x18(%esp),%eax
3d: 03 44 24 1c add 0x1c(%esp),%eax
41: 89 04 24 mov %eax,(%esp)
44: e8 fc ff ff ff call 45 <main+0x2a>
49: b8 00 00 00 00 mov $0x0,%eax
4e: c9 leave
4f: c3 ret
.text section
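The output above shows the contents of the .text section and the assembly code those bytes represent
- According to the section header, the code section is 0x50 bytes long, which matches the size of "Contents of section .text"
- The first byte, 0x55, is the instruction push %ebp
- The last byte, 0xc3, is the instruction ret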
.data section
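The source contains two initialized int variables, global_init_var (84) and static_var (85)
- They are stored starting at file offset 0x00000084
  - 0x84 holds global_init_var, stored as the bytes 54 00 00 00 (4 bytes, little-endian, value 0x54)
  - 0x88 holds static_var, stored as the bytes 55 00 00 00 (4 bytes, little-endian, value 0x55)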
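To see why the bytes 54 00 00 00 are the value 84, it helps to decode them by hand. The following is a minimal sketch, assuming the .data bytes have already been read into memory; read_le32 is a hypothetical helper name, not something provided by elf.h.

```c
#include <stdint.h>

/* Assemble a 32-bit value from four bytes stored least-significant first
 * (little-endian), which is how i386 object files store their data. */
uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

Applied to the eight .data bytes shown by objdump -s, the first word decodes to 84 (global_init_var) and the second to 85 (static_var).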
.rodata section
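The source contains one string, "%d\n", which is stored in the read-only data section and occupies 4 bytes (three characters plus the terminating NUL)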
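$ hexdump -C SimpleSection.o
dumps the contents of the whole binary file
$ objdump -h SimpleSection.o
prints basic information about the key sections; the most important fields are each section's file offset and size
$ objdump -s -d SimpleSection.o
dumps the contents of the key sections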
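4. Structure of an object file
Overall structure of an ELF object file
An ELF file covers a lot of ground; here we look only at the file header, the tables, and the symbols
/usr/include/elf.h
defines all the data types and structures used by ELF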
File Header
- Structure (Elf32_Ehdr)
/* The ELF file header. This appears at the start of every ELF file. */
#define EI_NIDENT (16)
typedef struct
{
unsigned char e_ident[EI_NIDENT]; /* Magic number and other info */
Elf32_Half e_type; /* Object file type */
Elf32_Half e_machine; /* Architecture */
Elf32_Word e_version; /* Object file version */
Elf32_Addr e_entry; /* Entry point virtual address */
Elf32_Off e_phoff; /* Program header table file offset */
Elf32_Off e_shoff; /* Section header table file offset (where the section header table starts in the file) */
Elf32_Word e_flags; /* Processor-specific flags */
Elf32_Half e_ehsize; /* ELF header size in bytes */
Elf32_Half e_phentsize; /* Program header table entry size */
Elf32_Half e_phnum; /* Program header table entry count */
Elf32_Half e_shentsize; /* Section header table entry size */
Elf32_Half e_shnum; /* Section header table entry count */
Elf32_Half e_shstrndx; /* Section header string table index */
} Elf32_Ehdr;
-
$ hexdump -C SimpleSection.o
The header contents in hexadecimal
00000000 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 |.ELF............|
00000010 01 00 03 00 01 00 00 00 00 00 00 00 00 00 00 00 |................|
00000020 74 01 00 00 00 00 00 00 34 00 00 00 00 00 28 00 |t.......4.....(.|
00000030 0d 00 0a 00
-
$ readelf -h SimpleSection.o
The header contents, formatted
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: REL (Relocatable file)
Machine: Intel 80386
Version: 0x1
Entry point address: 0x0
Start of program headers: 0 (bytes into file)
Start of section headers: 372 (bytes into file)
Flags: 0x0
Size of this header: 52 (bytes)
Size of program headers: 0 (bytes)
Number of program headers: 0
Size of section headers: 40 (bytes)
Number of section headers: 13
Section header string table index: 10
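7f 45 4c 46 is the file's magic number (0x7f 'E' 'L' 'F')
01 01 01 are, in order, the file class (0 = invalid, 1 = 32-bit, 2 = 64-bit), the byte order (0 = invalid, 1 = little-endian, 2 = big-endian), and the ELF version (always 1)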
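The byte-by-byte reading of e_ident can be written out as a short check. This is a sketch under the assumption that the first 16 bytes of the file are already in memory; the struct ident_info and the function parse_ident are names invented for this example.

```c
#include <string.h>

/* Hypothetical result type for this example. */
struct ident_info {
    int is_elf;        /* 1 if the magic number matches, else 0 */
    int bits;          /* 32 or 64, or 0 if the class byte is invalid */
    int little_endian; /* 1 = little-endian, 0 = big-endian, -1 = invalid */
    int version;       /* ELF version byte, expected to be 1 */
};

/* Decode the leading bytes of e_ident: magic (0-3), class (4),
 * byte order (5), and version (6). */
struct ident_info parse_ident(const unsigned char *e_ident)
{
    static const unsigned char magic[4] = {0x7f, 'E', 'L', 'F'};
    struct ident_info info = {0, 0, -1, 0};

    info.is_elf = (memcmp(e_ident, magic, 4) == 0);
    if (!info.is_elf)
        return info;

    if (e_ident[4] == 1)
        info.bits = 32;
    else if (e_ident[4] == 2)
        info.bits = 64;

    if (e_ident[5] == 1)
        info.little_endian = 1;
    else if (e_ident[5] == 2)
        info.little_endian = 0;

    info.version = e_ident[6];
    return info;
}
```

For the header bytes of SimpleSection.o this reports a 32-bit, little-endian ELF file of version 1, which agrees with the Class, Data, and Version lines printed by readelf -h.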
Section Header Table
- Section header (section descriptor) structure (Elf32_Shdr)
/* Section header. */
typedef struct
{
Elf32_Word sh_name; /* Section name (string tbl index) */
Elf32_Word sh_type; /* Section type */
Elf32_Word sh_flags; /* Section flags */
Elf32_Addr sh_addr; /* Section virtual addr at execution */
Elf32_Off sh_offset; /* Section file offset */
Elf32_Word sh_size; /* Section size in bytes */
Elf32_Word sh_link; /* Link to another section */
Elf32_Word sh_info; /* Additional section information */
Elf32_Word sh_addralign; /* Section alignment */
Elf32_Word sh_entsize; /* Entry size if section holds table */
} Elf32_Shdr;
-
$ hexdump -C SimpleSection.o
The section header table contents in hexadecimal
From the ELF header we can read off:
- Start of section headers: 372 (bytes into file), the file offset of the section header table
- Size of section headers: 40 (bytes), the size of each section header (sizeof(Elf32_Shdr))
- Number of section headers: 13, the number of section headers
- The section header table therefore spans file offsets 372 (0x174) up to 892 (0x37C), because 372 + 13 * 40 = 892
00000170 6d 65 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |me..............|
00000180 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000190 00 00 00 00 00 00 00 00 00 00 00 00 1f 00 00 00 |................|
000001a0 01 00 00 00 06 00 00 00 00 00 00 00 34 00 00 00 |............4...|
000001b0 50 00 00 00 00 00 00 00 00 00 00 00 04 00 00 00 |P...............|
000001c0 00 00 00 00 1b 00 00 00 09 00 00 00 00 00 00 00 |................|
000001d0 00 00 00 00 e4 04 00 00 28 00 00 00 0b 00 00 00 |........(.......|
000001e0 01 00 00 00 04 00 00 00 08 00 00 00 25 00 00 00 |............%...|
000001f0 01 00 00 00 03 00 00 00 00 00 00 00 84 00 00 00 |................|
00000200 08 00 00 00 00 00 00 00 00 00 00 00 04 00 00 00 |................|
00000210 00 00 00 00 2b 00 00 00 08 00 00 00 03 00 00 00 |....+...........|
00000220 00 00 00 00 8c 00 00 00 04 00 00 00 00 00 00 00 |................|
00000230 00 00 00 00 04 00 00 00 00 00 00 00 30 00 00 00 |............0...|
00000240 01 00 00 00 02 00 00 00 00 00 00 00 8c 00 00 00 |................|
00000250 04 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 |................|
00000260 00 00 00 00 38 00 00 00 01 00 00 00 30 00 00 00 |....8.......0...|
00000270 00 00 00 00 90 00 00 00 2b 00 00 00 00 00 00 00 |........+.......|
00000280 00 00 00 00 01 00 00 00 01 00 00 00 41 00 00 00 |............A...|
00000290 01 00 00 00 00 00 00 00 00 00 00 00 bb 00 00 00 |................|
000002a0 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 |................|
000002b0 00 00 00 00 55 00 00 00 01 00 00 00 02 00 00 00 |....U...........|
000002c0 00 00 00 00 bc 00 00 00 58 00 00 00 00 00 00 00 |........X.......|
000002d0 00 00 00 00 04 00 00 00 00 00 00 00 51 00 00 00 |............Q...|
000002e0 09 00 00 00 00 00 00 00 00 00 00 00 0c 05 00 00 |................|
000002f0 10 00 00 00 0b 00 00 00 08 00 00 00 04 00 00 00 |................|
00000300 08 00 00 00 11 00 00 00 03 00 00 00 00 00 00 00 |................|
00000310 00 00 00 00 14 01 00 00 5f 00 00 00 00 00 00 00 |........_.......|
00000320 00 00 00 00 01 00 00 00 00 00 00 00 01 00 00 00 |................|
00000330 02 00 00 00 00 00 00 00 00 00 00 00 7c 03 00 00 |............|...|
00000340 00 01 00 00 0c 00 00 00 0b 00 00 00 04 00 00 00 |................|
00000350 10 00 00 00 09 00 00 00 03 00 00 00 00 00 00 00 |................|
00000360 00 00 00 00 7c 04 00 00 65 00 00 00 00 00 00 00 |....|...e.......|
00000370 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 |................|
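The range of the section header table can also be computed straight from the raw header bytes. The sketch below assumes a little-endian Elf32_Ehdr held in memory and reads the three relevant fields at their byte offsets in that struct (e_shoff at 32, e_shentsize at 46, e_shnum at 48). The names shdr_extent, le16, le32, and section_table_extent are all invented for this example.

```c
#include <stdint.h>

/* Hypothetical result type: where the section header table starts and ends. */
struct shdr_extent {
    uint32_t start;  /* e_shoff */
    uint32_t end;    /* offset of the first byte past the table */
    uint16_t count;  /* e_shnum */
};

/* Little-endian decoding helpers. */
static uint16_t le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* ehdr points at the 52 bytes of a little-endian Elf32_Ehdr. */
struct shdr_extent section_table_extent(const unsigned char *ehdr)
{
    struct shdr_extent t;
    uint16_t entsize = le16(ehdr + 46);   /* e_shentsize */

    t.start = le32(ehdr + 32);            /* e_shoff */
    t.count = le16(ehdr + 48);            /* e_shnum */
    t.end = t.start + (uint32_t)entsize * t.count;
    return t;
}
```

Fed the 52 header bytes from the hexdump of SimpleSection.o, this yields start 0x174, count 13, and end 0x37c, the same range as the dump above.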
-
$ readelf -S SimpleSection.o
The section header table, formatted
There are 13 section headers, starting at offset 0x174:
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .text PROGBITS 00000000 000034 000050 00 AX 0 0 4
[ 2] .rel.text REL 00000000 0004e4 000028 08 11 1 4
[ 3] .data PROGBITS 00000000 000084 000008 00 WA 0 0 4
[ 4] .bss NOBITS 00000000 00008c 000004 00 WA 0 0 4
[ 5] .rodata PROGBITS 00000000 00008c 000004 00 A 0 0 1
[ 6] .comment PROGBITS 00000000 000090 00002b 01 MS 0 0 1
[ 7] .note.GNU-stack PROGBITS 00000000 0000bb 000000 00 0 0 1
[ 8] .eh_frame PROGBITS 00000000 0000bc 000058 00 A 0 0 4
[ 9] .rel.eh_frame REL 00000000 00050c 000010 08 11 8 4
[10] .shstrtab STRTAB 00000000 000114 00005f 00 0 0 1
[11] .symtab SYMTAB 00000000 00037c 000100 10 12 11 4
[12] .strtab STRTAB 00000000 00047c 000065 00 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings)
I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
O (extra OS processing required) o (OS specific), p (processor specific)
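Each row of the readelf -S table is one 40-byte Elf32_Shdr, so a row can be reproduced by decoding ten consecutive little-endian 32-bit words. The sketch below does this for a single entry; shdr32 and parse_shdr32 are names invented here, and the struct simply mirrors the field order of Elf32_Shdr without depending on elf.h.

```c
#include <stdint.h>

/* Mirrors the field order of Elf32_Shdr (ten 32-bit words, 40 bytes). */
struct shdr32 {
    uint32_t name, type, flags, addr, offset;
    uint32_t size, link, info, addralign, entsize;
};

static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* p points at one 40-byte little-endian section header. */
struct shdr32 parse_shdr32(const unsigned char *p)
{
    struct shdr32 s;

    s.name      = le32(p + 0);   /* index into .shstrtab */
    s.type      = le32(p + 4);   /* 1 = PROGBITS, 8 = NOBITS, 9 = REL, ... */
    s.flags     = le32(p + 8);   /* 1 = W, 2 = A, 4 = X */
    s.addr      = le32(p + 12);
    s.offset    = le32(p + 16);
    s.size      = le32(p + 20);
    s.link      = le32(p + 24);
    s.info      = le32(p + 28);
    s.addralign = le32(p + 32);
    s.entsize   = le32(p + 36);
    return s;
}
```

The second entry of the table starts at 0x174 + 40 = 0x19c in the hexdump. Decoding those 40 bytes gives type 1 (PROGBITS), flags 6 (A and X), offset 0x34, size 0x50, and alignment 4, which is the .text row.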
.symtab, the symbol table section
Symbols in an ELF file live in a symbol table, and that table is itself a section of the ELF file: .symtab
- Symbol structure (Elf32_Sym)
typedef struct
{
Elf32_Word st_name; /* Symbol name (string tbl index) */
Elf32_Addr st_value; /* Symbol value */
Elf32_Word st_size; /* Symbol size */
unsigned char st_info; /* Symbol type and binding */
unsigned char st_other; /* Symbol visibility */
Elf32_Section st_shndx; /* Section index */
} Elf32_Sym;
$ readelf -s SimpleSection.o
Symbol table '.symtab' contains 16 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 00000000 0 NOTYPE LOCAL DEFAULT UND
1: 00000000 0 FILE LOCAL DEFAULT ABS SimpleSection.c
2: 00000000 0 SECTION LOCAL DEFAULT 1
3: 00000000 0 SECTION LOCAL DEFAULT 3
4: 00000000 0 SECTION LOCAL DEFAULT 4
5: 00000000 0 SECTION LOCAL DEFAULT 5
6: 00000004 4 OBJECT LOCAL DEFAULT 3 static_var.1236
7: 00000000 4 OBJECT LOCAL DEFAULT 4 static_var2.1237
8: 00000000 0 SECTION LOCAL DEFAULT 7
9: 00000000 0 SECTION LOCAL DEFAULT 8
10: 00000000 0 SECTION LOCAL DEFAULT 6
11: 00000000 4 OBJECT GLOBAL DEFAULT 3 global_init_var
12: 00000004 4 OBJECT GLOBAL DEFAULT COM global_uninit_var
13: 00000000 27 FUNC GLOBAL DEFAULT 1 func
14: 00000000 0 NOTYPE GLOBAL DEFAULT UND printf
15: 0000001b 53 FUNC GLOBAL DEFAULT 1 main
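The Type and Bind columns printed by readelf -s are both packed into the single st_info byte: the binding sits in the high four bits and the type in the low four bits. The sketch below unpacks them the same way the ELF32_ST_BIND and ELF32_ST_TYPE macros in elf.h do; the function names and the two name tables are invented for this example and cover only the values that appear in the output above.

```c
/* High four bits of st_info: symbol binding. */
unsigned st_bind(unsigned char st_info)
{
    return st_info >> 4;
}

/* Low four bits of st_info: symbol type. */
unsigned st_type(unsigned char st_info)
{
    return st_info & 0x0f;
}

/* Names for the binding values 0..2 (STB_LOCAL, STB_GLOBAL, STB_WEAK). */
const char *bind_name(unsigned bind)
{
    static const char *names[] = {"LOCAL", "GLOBAL", "WEAK"};
    return bind < 3 ? names[bind] : "OTHER";
}

/* Names for the type values 0..4 (STT_NOTYPE .. STT_FILE). */
const char *type_name(unsigned type)
{
    static const char *names[] = {"NOTYPE", "OBJECT", "FUNC", "SECTION", "FILE"};
    return type < 5 ? names[type] : "OTHER";
}
```

A global function such as main therefore has st_info 0x12 (GLOBAL, FUNC), and a local static variable such as static_var has st_info 0x01 (LOCAL, OBJECT).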
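Bind gives the binding of the symbol (a variable or a function)
- LOCAL: a local symbol, not visible outside this object file
- GLOBAL: a global symbol, visible to other files
- WEAK: a weak symbol or weak reference
  - If a definition of the symbol exists, the linker resolves the reference to it
  - If the symbol is not defined anywhere, the linker does not report an error and treats its value as 0
  - The compiler's __attribute__((weak)) declares a symbol as weak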
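The weak behaviour can be tried out with a few lines of C. This is a sketch that assumes GCC or Clang targeting ELF on Linux; optional_hook is a hypothetical function that is deliberately left undefined, and call_hook_if_present is likewise a made-up name.

```c
/* Weak reference: if no object file in the link defines optional_hook,
 * the linker reports no error and the symbol's address is 0.
 * optional_hook is a hypothetical name used only for this example. */
extern void optional_hook(void) __attribute__((weak));

/* Returns 1 if the hook was present and called, 0 if it was absent. */
int call_hook_if_present(void)
{
    if (optional_hook) {
        optional_hook();
        return 1;
    }
    return 0;
}
```

When nothing in the link defines optional_hook, call_hook_if_present returns 0, and readelf -s on the compiled object shows optional_hook with WEAK in the Bind column. Linking in another object file that defines the function makes the same call return 1 without any change to this code.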
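Ndx gives the section the symbol belongs to
- If the symbol is defined in a section of this file, the value is that section's index in the section header table
- Symbols that do not live in a section of this file use special values
  - ABS: the symbol holds an absolute value, such as the symbol for the file name
  - COM (COMMON): the symbol is a "common block" symbol; uninitialized global variables are of this type
  - UND (undefined): the symbol is not defined in this file and refers to a definition in another file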
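The special Ndx values are reserved section indices stored in st_shndx. A minimal sketch of the lookup follows; the numeric values are SHN_UNDEF, SHN_ABS, and SHN_COMMON from the ELF specification, and shndx_kind is a name invented for this example.

```c
#include <stdint.h>

/* Classify the st_shndx field of a symbol table entry.
 * Indices from 0xff00 upward are reserved and do not name a real section. */
const char *shndx_kind(uint16_t st_shndx)
{
    switch (st_shndx) {
    case 0x0000: return "UND";   /* SHN_UNDEF: defined elsewhere */
    case 0xfff1: return "ABS";   /* SHN_ABS: absolute value */
    case 0xfff2: return "COM";   /* SHN_COMMON: common block */
    default:
        return st_shndx >= 0xff00 ? "reserved" : "section";
    }
}
```

In the symbol table above, printf maps to "UND", SimpleSection.c to "ABS", global_uninit_var to "COM", and func (Ndx 1) to an ordinary section index.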
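Value means different things for different symbols
- In a relocatable file, for a variable or function defined in a section, Value is its offset from the start of that section
- For a COMMON symbol, Value holds the required alignment (4 for global_uninit_var above)
- In an executable file, Value is the symbol's virtual address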