Download
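Download the IK build that matches your Elasticsearch version from the project page, https://github.com/medcl/elasticsearch-analysis-ik. Take a prebuilt zip from the releases page so you can skip the Maven build.
If you downloaded the source instead of a release package, enter the extracted directory and run mvn package; the plugin zip is then generated under target/releases.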
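As a rough sketch of the download step, the helper below builds the release URL for a given version. It assumes the release assets follow the naming pattern used in the project's README (releases/download/v<version>/elasticsearch-analysis-ik-<version>.zip); the function name ik_release_url is hypothetical, and the pattern should be checked against the releases page before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: print the release zip URL for a given IK version.
# Assumption: assets are named after the pattern shown in the project's README.
ik_release_url() {
    version="$1"
    printf 'https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v%s/elasticsearch-analysis-ik-%s.zip\n' \
        "$version" "$version"
}

# The IK version should match the Elasticsearch version (7.4.2 in this walkthrough).
ik_release_url 7.4.2

# To download the file (requires network access):
#   wget "$(ik_release_url 7.4.2)"
```

Building the URL from a single version variable keeps the plugin version and the Elasticsearch version from drifting apart.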
Install
Upload the zip package to the Elasticsearch plugins directory, then unzip it into its own subdirectory:
[es@Centos-51 plugins]$ ls
elasticsearch-analysis-ik-7.4.2.zip
[es@Centos-51 plugins]$ mkdir elasticsearch-analysis-ik-7.4.2
[es@Centos-51 plugins]$ ls
elasticsearch-analysis-ik-7.4.2 elasticsearch-analysis-ik-7.4.2.zip
[es@Centos-51 plugins]$ unzip elasticsearch-analysis-ik-7.4.2.zip -d ./elasticsearch-analysis-ik-7.4.2
Archive: elasticsearch-analysis-ik-7.4.2.zip
inflating: ./elasticsearch-analysis-ik-7.4.2/elasticsearch-analysis-ik-7.4.2.jar
inflating: ./elasticsearch-analysis-ik-7.4.2/httpclient-4.5.2.jar
inflating: ./elasticsearch-analysis-ik-7.4.2/httpcore-4.4.4.jar
inflating: ./elasticsearch-analysis-ik-7.4.2/commons-logging-1.2.jar
inflating: ./elasticsearch-analysis-ik-7.4.2/commons-codec-1.9.jar
inflating: ./elasticsearch-analysis-ik-7.4.2/plugin-descriptor.properties
inflating: ./elasticsearch-analysis-ik-7.4.2/plugin-security.policy
creating: ./elasticsearch-analysis-ik-7.4.2/config/
inflating: ./elasticsearch-analysis-ik-7.4.2/config/surname.dic
inflating: ./elasticsearch-analysis-ik-7.4.2/config/quantifier.dic
inflating: ./elasticsearch-analysis-ik-7.4.2/config/extra_stopword.dic
inflating: ./elasticsearch-analysis-ik-7.4.2/config/suffix.dic
inflating: ./elasticsearch-analysis-ik-7.4.2/config/extra_single_word_full.dic
inflating: ./elasticsearch-analysis-ik-7.4.2/config/extra_single_word.dic
inflating: ./elasticsearch-analysis-ik-7.4.2/config/preposition.dic
inflating: ./elasticsearch-analysis-ik-7.4.2/config/IKAnalyzer.cfg.xml
inflating: ./elasticsearch-analysis-ik-7.4.2/config/main.dic
inflating: ./elasticsearch-analysis-ik-7.4.2/config/stopword.dic
inflating: ./elasticsearch-analysis-ik-7.4.2/config/extra_main.dic
inflating: ./elasticsearch-analysis-ik-7.4.2/config/extra_single_word_low_freq.dic
[es@Centos-51 plugins]$ ls
elasticsearch-analysis-ik-7.4.2 elasticsearch-analysis-ik-7.4.2.zip
[es@Centos-51 plugins]$ cd elasticsearch-analysis-ik-7.4.2
[es@Centos-51 elasticsearch-analysis-ik-7.4.2]$ ls
commons-codec-1.9.jar commons-logging-1.2.jar config elasticsearch-analysis-ik-7.4.2.jar httpclient-4.5.2.jar httpcore-4.4.4.jar plugin-descriptor.properties plugin-security.policy
[es@Centos-51 elasticsearch-analysis-ik-7.4.2]$ cd ..
[es@Centos-51 plugins]$ ls
elasticsearch-analysis-ik-7.4.2 elasticsearch-analysis-ik-7.4.2.zip
[es@Centos-51 plugins]$ rm -rf elasticsearch-analysis-ik-7.4.2.zip
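Before restarting the node, a quick check of the unpacked directory can catch a zip that was extracted one level too deep. The sketch below relies only on what the listing above shows, namely that plugin-descriptor.properties sits directly inside the plugin directory; the function name check_ik_plugin_dir is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the directory looks like an unpacked plugin,
# i.e. plugin-descriptor.properties is directly inside it.
check_ik_plugin_dir() {
    dir="$1"
    if [ -f "$dir/plugin-descriptor.properties" ]; then
        echo "ok: $dir contains plugin-descriptor.properties"
    else
        echo "error: plugin-descriptor.properties not found in $dir" >&2
        return 1
    fi
}

# Example, run from the plugins directory:
#   check_ik_plugin_dir elasticsearch-analysis-ik-7.4.2
```

Elasticsearch loads plugins at startup, so the node has to be restarted before the IK analyzers become available. After the restart, running bin/elasticsearch-plugin list from the Elasticsearch home directory should include the IK plugin.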
Verify
Tokenization result for "我是中国人" with ik_smart:
{
"tokens": [
{
"token": "我",
"start_offset": 0,
"end_offset": 1,
"type": "CN_CHAR",
"position": 0
},
{
"token": "是",
"start_offset": 1,
"end_offset": 2,
"type": "CN_CHAR",
"position": 1
},
{
"token": "中国人",
"start_offset": 2,
"end_offset": 5,
"type": "CN_WORD",
"position": 2
}
]
}
Tokenization result for the same text with ik_max_word:
{
"tokens": [
{
"token": "我",
"start_offset": 0,
"end_offset": 1,
"type": "CN_CHAR",
"position": 0
},
{
"token": "是",
"start_offset": 1,
"end_offset": 2,
"type": "CN_CHAR",
"position": 1
},
{
"token": "中国人",
"start_offset": 2,
"end_offset": 5,
"type": "CN_WORD",
"position": 2
},
{
"token": "中国",
"start_offset": 2,
"end_offset": 4,
"type": "CN_WORD",
"position": 3
},
{
"token": "国人",
"start_offset": 3,
"end_offset": 5,
"type": "CN_WORD",
"position": 4
}
]
}
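The requests behind these two results are not shown. A minimal sketch, assuming they came from the standard _analyze API on a node listening at http://localhost:9200; the address and the helper names analyze_body and ik_analyze are assumptions, not part of the original setup.

```shell
#!/bin/sh
ES_URL="${ES_URL:-http://localhost:9200}"   # assumed node address

# Build the JSON body for an _analyze request.
# The text is inserted verbatim, so it must not contain quotes or backslashes.
analyze_body() {
    printf '{"analyzer":"%s","text":"%s"}\n' "$1" "$2"
}

# Send the request (needs a running node with the IK plugin installed).
ik_analyze() {
    curl -s -X POST "$ES_URL/_analyze?pretty" \
         -H 'Content-Type: application/json' \
         -d "$(analyze_body "$1" "$2")"
}

# Usage:
#   ik_analyze ik_smart    '我是中国人'
#   ik_analyze ik_max_word '我是中国人'
```

If the plugin did not load, a request naming ik_smart or ik_max_word is expected to fail with an error about an unknown analyzer, which makes this a convenient installation check.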
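ik_max_word: splits the text at the finest granularity and exhausts the possible word combinations. For example, "我是中国人" ("I am Chinese") is split into "我, 是, 中国人, 中国, 国人".
ik_smart: splits the text at the coarsest granularity. For example, "我是中国人" is split into "我, 是, 中国人".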
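One common way to combine the two analyzers, following the example in the IK project's README, is to index with ik_max_word and search with ik_smart. The sketch below only builds the mapping body; the field name content, the index name my_index, and the helper name ik_mapping_body are illustrative assumptions.

```shell
#!/bin/sh
ES_URL="${ES_URL:-http://localhost:9200}"   # assumed node address

# Build a mapping body for one text field:
# fine-grained tokens at index time, coarse-grained tokens at search time.
ik_mapping_body() {
    printf '{"properties":{"%s":{"type":"text","analyzer":"ik_max_word","search_analyzer":"ik_smart"}}}\n' "$1"
}

# Apply it to an existing index (hypothetical index name):
#   curl -s -X PUT "$ES_URL/my_index/_mapping" \
#        -H 'Content-Type: application/json' \
#        -d "$(ik_mapping_body content)"
```

Whether this split suits a given workload depends on the queries involved, so treat it as a starting point.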