四.IK分词器

针对词条查询（TermQuery）,查看默认中文分词器的效果:

[itstar@hadoop105 elasticsearch]$ curl -XGET 'http://hadoop105:9200/_analyze?

pretty&analyzer=standard' -d '中华人民共和国'

{

"tokens" : [

{

"token" : "中",

"start_offset" : 0,

"end_offset" : 1,

"type" : "<IDEOGRAPHIC>",

"position" : 0

{

"token" : "华",

"start_offset" : 1,

"end_offset" : 2,

"type" : "<IDEOGRAPHIC>",

"position" : 1

{

"token" : "人",

"start_offset" : 2,

"end_offset" : 3,

"type" : "<IDEOGRAPHIC>",

"position" : 2

{

"token" : "民",

"start_offset" : 3,

"end_offset" : 4,

"type" : "<IDEOGRAPHIC>",

"position" : 3

{

"token" : "共",

"start_offset" : 4,

"end_offset" : 5,

"type" : "<IDEOGRAPHIC>",

"position" : 4

{

"token" : "和",

"start_offset" : 5,

"end_offset" : 6,

"type" : "<IDEOGRAPHIC>",

"position" : 5

{

"token" : "国",

"start_offset" : 6,

"end_offset" : 7,

"type" : "<IDEOGRAPHIC>",

"position" : 6

}

]

}

4.1 IK分词器的安装

4.1.1 前期准备工作

1）CentOS联网

配置CentOS能连接外网。Linux虚拟机ping www.baidu.com 是畅通的

2）jar包准备

（1）elasticsearch-analysis-ik-master.zip

(下载地址:https://github.com/medcl/elasticsearch-analysis-ik)

（2）apache-maven-3.0.5-bin.tar.gz

4.1.2 jar包安装

~~1）Maven解压、配置MAVEN_HOME和PATH。~~

[itstar@hadoop102 software]# tar -zxvf apache-maven-3.0.5-bin.tar.gz -C /opt/module/

[itstar@hadoop102 apache-maven-3.0.5]# sudo vi /etc/profifile

#MAVEN_HOME export MAVEN_HOME=/opt/module/apache-maven-3.0.5 export

PATH=$PATH:$MAVEN_HOME/bin

[itstar@hadoop101 software]#source /etc/profifile

验证命令：mvn -version

2）Ik分词器解压、打包与配置

ik分词器解压

[itstar@hadoop102 software]$ unzip elasticsearch-analysis-ik-master.zip -d ./

进入ik分词器所在目录

[itstar@hadoop102 software]$ cd elasticsearch-analysis-ik-master

使用maven进行打包

[itstar@hadoop102 elasticsearch-analysis-ik-master]$ mvn package -Pdist,native -DskipTests -

Dtar

打包完成之后，会出现 target/releases/elasticsearch-analysis-ik-{version}.zip

[itstar@hadoop102 releases]$ pwd /opt/software/elasticsearch-analysis-ik�

master/target/releases

对zip文件进行解压，并将解压完成之后的文件拷贝到es所在目录下的/plugins/

[itstar@hadoop102 releases]$ unzip elasticsearch-analysis-ik-6.0.0.zip

[itstar@hadoop102 releases]$ cp -r elasticsearch /opt/module/elasticsearch-5.6.1/plugins/

需要修改plugin-descriptor.properties文件，将其中的es版本号改为你所使用的版本号，即完成ik分词

器的安装

[itstar@hadoop102 elasticsearch]$ vi plugin-descriptor.properties

71行 elasticsearch.version=6.0.0 修改为 elasticsearch.version=5.6.1

至此，安装完成，重启ES！

注意：需选择与es相同版本的ik分词器。

安装方法（2种）：

./elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.1.1/elasticsearch-analysis-ik-6.1.1.zip

cp elasticsearch-analysis-ik-6.1.1.zip ./elasticsearch-6.1.1/plugins/

unzip elasticsearch-analysis-ik-6.1.1.zip -d ik-analyzer

elasticsearch-plugin install -f file:///usr/local/elasticsearch-analysis-ik- 6.1.1.zip

4.2 IK分词器的使用

4.2.1 命令行查看结果

ik_smart模式

[itstar@hadoop102 elasticsearch]$ curl -XGET 'http://hadoop104:9200/_analyze?

pretty&analyzer=ik_smart' -d '中华人民共和国'

curl -H "Content-Type:application/json" -XGET

'http://192.168.109.133:9200/_analyze?pretty' -d

'{"analyzer":"ik_smart","text":"中华人民共和国"}'

{

"tokens" : [

{

"token" : "中华人民共和国",

"start_offset" : 0,

"end_offset" : 7,

"type" : "CN_WORD",

"position" : 0

}

]

}

ik_max_word模式

[itstar@hadoop102 elasticsearch]$ curl -XGET 'http://hadoop104:9200/_analyze?

pretty&analyzer=ik_max_word' -d '中华人民共和国'

curl -H "Content-Type:application/json" -XGET

'http://192.168.109.133:9200/_analyze?pretty' -d

'{"analyzer":"ik_max_word","text":"中华人民共和国"}'

{

"tokens" : [

{

"token" : "中华人民共和国",

"start_offset" : 0,

"end_offset" : 7,

"type" : "CN_WORD",

"position" : 0

{

"token" : "中华人民",

"start_offset" : 0,

"end_offset" : 4,

"type" : "CN_WORD",

"position" : 1

{

"token" : "中华",

"start_offset" : 0,

"end_offset" : 2,

"type" : "CN_WORD",

"position" : 2

{

"token" : "华人",

"start_offset" : 1,

"end_offset" : 3,

"type" : "CN_WORD",

"position" : 3

{

"token" : "人民共和国",

"start_offset" : 2,

"end_offset" : 7,

"type" : "CN_WORD",

"position" : 4

{

"token" : "人民",

"start_offset" : 2,

"end_offset" : 4,

"type" : "CN_WORD",

"position" : 5

{

"token" : "共和国",

"start_offset" : 4,

"end_offset" : 7,

"type" : "CN_WORD",

"position" : 6

{

"token" : "共和",

"start_offset" : 4,

"end_offset" : 6,

"type" : "CN_WORD",

"position" : 7

{

"token" : "国",

"start_offset" : 6,

"end_offset" : 7,

"type" : "CN_CHAR",

"position" : 8

}

]

}

4.2.2 JavaAPI操作

1）创建索引

//创建索引(数据库)

@Test

public void createIndex() {

//创建索引

client.admin().indices().prepareCreate("blog4").get();

//关闭资源

client.close();

}

2）创建mapping

//创建使用ik分词器的mapping

@Test

public void createMapping() throws Exception {

// 1设置mapping

XContentBuilder builder = XContentFactory.jsonBuilder()

.startObject()

.startObject("article")

.startObject("properties")

.startObject("id1")

.field("type", "string")

.field("store", "yes")

.field("analyzer","ik_smart")

.endObject()

.startObject("title2")

.field("type", "string")

.field("store", "no")

.field("analyzer","ik_smart")

.endObject()

.startObject("content")

.field("type", "string")

.field("store", "yes")

.field("analyzer","ik_smart")

.endObject()

.endObject();

// 2 添加mapping

PutMappingRequest mapping =

Requests.putMappingRequest("blog4").type("article").source(builder);

client.admin().indices().putMapping(mapping).get();

// 3 关闭资源

client.close();

}

3）插入数据

//创建文档,以map形式

@Test

public void createDocumentByMap() {

HashMap<String, String> map = new HashMap<>();

map.put("id1", "2");

map.put("title2", "Lucene");

map.put("content", "它提供了一个分布式的web接口");

IndexResponse response = client.prepareIndex("blog4", "article",

"3").setSource(map).execute().actionGet();

//打印返回的结果

System.out.println("结果:" + response.getResult());

System.out.println("id:" + response.getId());

System.out.println("index:" + response.getIndex());

System.out.println("type:" + response.getType());

System.out.println("版本:" + response.getVersion());

//关闭资源

client.close();

}

4 词条查询

```java

//词条查询

@Test

public void queryTerm() {

SearchResponse response =

client.prepareSearch("blog4").setTypes("article").setQuery(QueryBuilders.termQue

ry("content","提供")).get();

//获取查询命中结果

SearchHits hits = response.getHits();

System.out.println("结果条数:" + hits.getTotalHits());

for (SearchHit hit : hits) {

System.out.println(hit.getSourceAsString());

}

```

5）结果查看

Store 的解释：

使用 elasticsearch 时碰上了很迷惑的地方，我看官方文档说 store 默认是 no ，我想当然的理解为也就是说这个 fifield 是不会 store 的，但是查询的时候也能查询出来，经过查找资料了解到原来 store 的意思是，是否在 source 之外在独立存储一份，这里要说一下 _source 这是源文档，当你索引数据的时候，elasticsearch 会保存一份源文档到 _source ，如果文档的某一字段设置了 store 为 yes (默认为 no)，这时候会在 _source 存储之外再为这个字段独立进行存储，这么做的目的主要是针对内容比较多的字段，放到 _source 返回的话，因为source 是把所有字段保存为一份文档，命中后读取只需要一次 IO，包含内容特别多的字段会很占带宽影响性能，通常我们也不需要完整的内容返回(可能只关心摘要)，这时候就没必要放到 _source 里一起返回了(当然也可以在查询时指定返回字段)。

人面猴
序言：七十年代末，一起剥皮案震惊了整个滨河市，随后出现的几起案子，更是在滨河造成了极大的恐慌，老刑警刘岩，带你破解...
沈念sama阅读 202,100评论 5赞 474
死咒
序言：滨河连续发生了三起死亡事件，死亡现场离奇诡异，居然都是意外死亡，警方通过查阅死者的电脑和手机，发现死者居然都...
沈念sama阅读 84,862评论 2赞 378
救了他两次的神仙让他今天三更去死
文/潘晓璐我一进店门，熙熙楼的掌柜王于贵愁眉苦脸地迎上来，“玉大人，你说我怎么就摊上这事。” “怎么了？”我有些...
开封第一讲书人阅读 148,993评论 0赞 335
道士缉凶录：失踪的卖姜人
文/不坏的土叔我叫张陵，是天一观的道长。经常有香客问我，道长，这世上最难降的妖魔是什么？我笑而不...
开封第一讲书人阅读 54,309评论 1赞 272
港岛之恋（遗憾婚礼）
正文为了忘掉前任，我火速办了婚礼，结果婚礼上，老公的妹妹穿的比我还像新娘。我一直安慰自己，他们只是感情好，可当我...
茶点故事阅读 63,303评论 5赞 363
恶毒庶女顶嫁案：这布局不是一般人想出来的
文/花漫我一把揭开白布。她就那样静静地躺着，像睡着了一般。火红的嫁衣衬着肌肤如雪。梳的纹丝不乱的头发上，一...
开封第一讲书人阅读 48,421评论 1赞 281
城市分裂传说
那天，我揣着相机与录音，去河边找鬼。笑死，一个胖子当着我的面吹牛，可吹牛的内容都是我干的。我是一名探鬼主播，决...
沈念sama阅读 37,830评论 3赞 393
双鸳鸯连环套：你想象不到人心有多黑
文/苍兰香墨我猛地睁开眼，长吁一口气：“原来是场噩梦啊……” “哼！你这毒妇竟也来了？” 一声冷哼从身侧响起，我...
开封第一讲书人阅读 36,501评论 0赞 256
万荣杀人案实录
序言：老挝万荣一对情侣失踪，失踪者是张志新（化名）和其女友刘颖，没想到半个月后，有当地人在树林里发现了一具尸体，经...
沈念sama阅读 40,689评论 1赞 295
护林员之死
正文独居荒郊野岭守林人离奇死亡，尸身上长有42处带血的脓包…… 初始之章·张勋以下内容为张勋视角年9月15日...
茶点故事阅读 35,506评论 2赞 318
白月光启示录
正文我和宋清朗相恋三年，在试婚纱的时候发现自己被绿了。大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
茶点故事阅读 37,564评论 1赞 329
活死人
序言：一个原本活蹦乱跳的男人离奇死亡，死状恐怖，灵堂内的尸体忽然破棺而出，到底是诈尸还是另有隐情，我是刑警宁泽，带...
沈念sama阅读 33,286评论 4赞 318
日本核电站爆炸内幕
正文年R本政府宣布，位于F岛的核电站，受9级特大地震影响，放射性物质发生泄漏。R本人自食恶果不足惜，却给世界环境...
茶点故事阅读 38,826评论 3赞 305
男人毒药：我在死后第九天来索命
文/蒙蒙一、第九天我趴在偏房一处隐蔽的房顶上张望。院中可真热闹，春花似锦、人声如沸。这庄子的主人今日做“春日...
开封第一讲书人阅读 29,875评论 0赞 19
一桩弑父案，背后竟有这般阴谋
文/苍兰香墨我抬头看了看天上的太阳。三九已至，却和暖如春，着一层夹袄步出监牢的瞬间，已是汗流浃背。一阵脚步声响...
开封第一讲书人阅读 31,114评论 1赞 259
情欲美人皮
我被黑心中介骗来泰国打工，没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留，地道东北人。一个月前我还...
沈念sama阅读 42,705评论 2赞 348
代替公主和亲
正文我出身青楼，却偏偏与公主长得像，于是被迫代替她去往敌国和亲。传闻我的和亲对象是个残疾皇子，可洞房花烛夜当晚...
茶点故事阅读 42,269评论 2赞 341

四.IK分词器

4.1 IK分词器的安装

4.1.2 jar包安装

4.2 IK分词器的使用

4.2.1 命令行查看结果

4.2.2 JavaAPI操作

推荐阅读更多精彩内容