序言
show character set;
--查看mysql支持的字符集,部分输出如下
Character | Describption | Default collation | Maxlen |
---|---|---|---|
ascii | US ASCII | ascii_general_ci | 1 |
big5 | Big5 Traditional Chinese(繁体中文) | big5_chinese_ci | 2 |
binary | Binary pseudo charset | binary | 1 |
gb18030 | China National Standard GB18030 | gb18030_chinese_ci | 4 |
gb2312 | GB2312 Simplified Chinese | gb2312_chinese_ci | 2 |
gbk | GBK Simplified Chinese | gbk_chinese_ci | 2 |
utf16 | UTF-16 Unicode | utf16_general_ci | 4 |
utf16le | UTF-16LE Unicode | utf16le_general_ci | 4 |
utf32 | UTF-32 Unicode | utf32_general_ci | 4 |
utf8 | UTF-8 Unicode | utf8_general_ci | 3 |
utf8mb4 | UTF-8 Unicode | utf8mb4_0900_ai_ci | 4 |
show collation;
--查看mysql支持的collation
指定字符集与排序规则
服务器的字符集与排序规则
-
character_set_server
的默认值为utf8mb4
show variables like 'character_set_server'; --输出的Value列为utf8mb4
-
在
my.ini
中可以设置character_set_server
的值:character-set-server=utf8; --在my.ini中添加上述行,重启服务,即可设置character_set_server的值
-
在运行过程中,可以更改
character_set_server
的值:set character_set_server = gbk;
如果未在
create database
语句中指定字符集和排序规则,则会将character_set_server
和collation_server
作为默认值若只指定了字符集没有指定排序规则,则使用字符集对应的默认的排序规则
数据库的字符集与排序规则
-
可以在
create database
和alter database
时指定字符集和排序规则create database d character set gbk collate gbk_bin; alter database d character set gb2312 collate gb2312_bin; use d; SELECT @@character_set_database, @@collation_database; --此sql语句用于输出当前数据库的字符集与排序规则
-
当前数据库的字符集和排序规则,可以通过查看
character_set_database
和collation_database
来确定。若当前没有默认数据库,则character_set_database
和collation_database
与character_set_server
和collation_server
相同create database d0 character set gbk collate gbk_bin; create database d1 character set utf8 collate utf8_bin; use d0; show variables like 'character_set_database'; --输出的character_set_database值为gbk use d1; show variables like 'character_set_database'; --输出的character_set_database值为utf8
-
数据库的字符集和排序规则会产生以下影响:
create table
未指定字符集和排序规则时,则采用数据库的字符集和排序规则load data
未指定字符集和排序规则,则使用数据库的字符集来解释文件中的信息-
创建的存储过程和函数未指定字符集和排序规则,则使用数据库的字符集和排序规则
use d0; --create database d0 character set gbk collate gbk_bin; delimiter \ create procedure p0(in str varchar(111)) begin select charset(str); end\ delimiter ; call p0("szn"); --输出:gbk
use d1; --database d1 character set utf8 collate utf8_bin; delimiter \ create procedure p1(in str varchar(111)) begin select charset(str); end\ delimiter ; call p1("szn"); --输出:utf8
use d1; --database d1 character set utf8 collate utf8_bin; delimiter \ create procedure p_set(in str varchar(111) character set gb2312) begin select charset(str); end\ delimiter ; call p_set("szn"); --输出:gb2312
表的字符集与排序规则
-
可以在
create table
和alter table
时指定表的字符集和排序规则create table t0(name varchar(1024)) character set gbk collate gbk_bin; alter table t0 character set utf8;
若未指定表的字符集和排序规则,则使用对应数据库的字符集和排序规则
若表中的列未指定字符集和排序规则,则对应的列使用表的字符集和排序规则
列的字符集与排序规则
-
在
create table
和alter table
时指定列的字符集和排序规则create table t2(name varchar(1024) character set gbk collate gbk_bin); alter table t2 modify column name varchar(1024) character set utf8 collate utf8_bin;
使用
alter table
将列的字符集进行更改时,若字符集不兼容,则可能会丢失数据
字符串字面量的字符集与排序规则
字符串字面量的默认字符集和排序规则由
character_set_connection
和collation_connection
指定-
字符串字面量可以设定字符集与排序规则:
show variables like 'character_set_connection'; --输出值:utf8 set @b = "szn"; select charset(@b); --输出值:utf8 set @c = _gbk"哈" COLLATE gbk_bin; --字符串字面量前指定编码方式,这种方式称为介绍器(Character Set Introducers) select charset(@c); --输出值:gbk
-
字符串字面量指定字符集并不会更改其值,解析器最终仍按照
character_set_connection
进行处理set @c0 = _gbk"哈" COLLATE gbk_bin; set @c1 = _utf8"哈" COLLATE utf8_bin; select hex(@c0), hex(@c1); --两个输出均为"哈"的utf8编码 E59388
Character Set Introducers
字符串字面量,十六进制字面量,位字面量都有一个可选的字符集和排序规则设置,这称为介绍器
-
介绍器告诉解析器后面字符串使用的字符集,但是不会改变字符串的值
set @s0 = _utf8 0xE59388; set @s1 = _gbk 0xB9FE; set @s2 = _gbk X'B9FE'; set @s3 = _utf8 b'111001011001001110001000'; set @s4 = _utf8 0b111001011001001110001000; select @s0, @s1, @s2, @s3, @s4; --输出:哈,哈,哈,哈, 哈
SET @v1 = X'000D' | X'0BC0'; --@V1是bigint类型 SET @v2 = _binary X'000D' | X'0BC0'; --@v2是字符串 SELECT HEX(@v1), HEX(@v2); --输出:BCD, 0BCD select @v1 + 1, @v2 + 1, "0BCD" + "1";. --输出:3022, 1, 1
连接的字符集与排序规则
每个客户端的连接都有特定且可更改的字符集和排序规则
-
客户端发起
sql
查询,至结果返回到客户端:-
character_set_client
表明客户端发送过来的sql
语句的字符集 - 服务器将接收到的
sql
语句的字符集转换为character_set_connection
- 服务器将
sql
的执行结果的字符集转换为character_set_results
-
-
character_set_client
不允许一些字符集设置:ucs2 utf16 utf16le utf32
-
set character set
会同时将character_set_client
和character_set_results
设定为给定值,并且将character_set_connection
设置为character_set_database
的值set character_set_client = ascii; set character_set_results = big5; set character_set_database = gb2312; set character_set_connection = gbk; set character set latin2; show variables like 'character%';
Variable_name Value character_set_client latin2 character_set_connection gb2312 character_set_database gb2312 character_set_results latin2 my.ini
中添加default-character-set=latin2
,重启服务,将会同时更改character_set_client
,character_set_connection
,character_set_results
为latin2
可以在登录
mysql
时:mysql -u root -p --default-character-set=latin1
set names gb2312; --等效于下面三句代码 set character_set_client = gb2312; set character_set_results = gb2312; set character_set_connection = gb2312;
Unicode支持
The Unicode Standard includes characters from the Basic Multilingual(使用多种语言的) Plane (BMP) and supplementary(追加的) characters that lie outside the BMP.
BMP characters have these characteristics:
- Their code point values are between 0 and 65535 (or
U+0000
andU+FFFF
). - They can be encoded in a variable-length encoding using 8, 16, or 24 bits (1 to 3 bytes).
- They can be encoded in a fixed-length encoding using 16 bits (2 bytes).
- They are sufficient(充分的) for almost all characters in major(主要的) languages.
Supplementary characters lie outside the BMP:
- Their code point values are between
U+10000
andU+10FFFF
). - 占用的空间比BMP内的字符大,最多占用4字节
mysql
支持的unicode
:
字符集 | 单个字符占用空间 | 支持的字符 | 描述 |
---|---|---|---|
utf8mb4 |
1到4字节 | BMP and supplementary | A UTF-8 encoding of the Unicode character set |
utf8mb3 | 1到3字节 | BMP only | A UTF-8 encoding of the Unicode character set |
utf8 | 1到3字节 | BMP only | An alias for utf8mb3 . |
ucs2 | 2字节 | BMP and supplementary | The UCS-2 encoding of the Unicode character set |
utf16 | 2或4字节 | BMP and supplementary | The UTF-16 encoding for the Unicode character set . Like ucs2 but with an extension for supplementary characters. |
utf16le | 2或4字节 | BMP and supplementary | The UTF-16LE encoding for the Unicode character set. Like utf16 but little-endian rather than big-endian. |
utf32 | 4字节 | BMP and supplementary | The UTF-32 encoding for the Unicode character set |
utf8
当前是utf8mb3
的别称,但是utf8mb3
将在mysql
将来的版本被删除。有时候utf8
会变成utf8mb4
的别称,所以为了避免模糊不清,请考虑使用utf8mb4
对于BMP字符,
utf8mb4
与utf8mb3
具有相同编码值,占用的空间也相同。即utf8mb4
是utf8mb3
的超集对于补充字符,
utf8mb4
需要四个字节来存储