Getting Started with Elasticsearch

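Elasticsearch is an open-source, highly scalable full-text search and analytics engine. It can store large volumes of data, search and analyze that data in near real time, and support complex query requirements.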

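Typical use cases for Elasticsearch include:

Online store search

Log analysis (the ELK stack)

Monitoring product price changes

Fast investigation, analysis, visualization, and ad hoc querying of large volumes of data

Elasticsearch is powerful and easy to use. The rest of this article covers how to set up an Elasticsearch cluster and the basics of using it, so that you can get started quickly.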

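Basic Concepts

Cluster

A cluster consists of one or more nodes and is identified by a unique name, which defaults to elasticsearch. If several Elasticsearch clusters run in the same network environment, give each one a different name: nodes join a cluster by its name, so with identical names a node could join the wrong cluster.

Node

A node is a single server in the cluster. A node is also identified by a name, which defaults to a UUID assigned at startup and can be configured. A cluster can contain any number of nodes, and a single node can form a cluster by itself.

Index

An index is a collection of documents. A cluster can hold as many indices as its resources allow.

Type

An index can define one or more types. A type is a logical category within an index; documents that share a common set of fields are usually placed in the same type.

Document

A document is the basic unit of information in an index.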

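Shards and Replicas

An index can hold more data than the hardware of a single node can handle. For example, an index of one billion documents that takes up 1 TB of disk space may not fit on a single node, or may be too slow to answer queries when served from a single node.

To solve this, Elasticsearch can split an index into multiple pieces called shards. The number of shards can be set when the index is created. Each shard is a fully functional, independent index in its own right and can be stored on any node in the cluster.

Sharding matters for two reasons:

1. It splits the data horizontally.

2. Operations can run across shards in parallel, which raises throughput.

In network and cloud environments failures can happen at any time, so a failover mechanism is essential. Elasticsearch can create one or more copies of each shard, called replica shards.

Replicas bring two benefits:

1. High availability.

2. Higher search throughput.

In short, every index can be split into multiple shards, and each shard can be replicated, which gives primary shards and replica shards. Both the number of shards and the number of replicas can be specified when the index is created. The difference is that the number of primary shards cannot be changed afterwards, whereas the number of replicas can be changed at any time.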
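One way to see why the number of primary shards is fixed: Elasticsearch chooses the shard for a document as hash(routing) % number_of_primary_shards, where the routing value defaults to the document id. The sketch below only illustrates that idea. shard_for is a hypothetical helper, and zlib.crc32 is a stand-in for the Murmur3 hash that Elasticsearch uses, so the shard numbers will not match a real cluster.

```python
import zlib


def shard_for(doc_id: str, num_primary_shards: int) -> int:
    """Pick a shard for a document id (illustrative only).

    zlib.crc32 is an assumption made for this sketch; Elasticsearch
    hashes the routing value with Murmur3.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


doc_ids = [str(i) for i in range(1000)]

# Count the documents that would be looked up on a different shard
# if the primary shard count changed from 5 to 6.
moved = sum(1 for d in doc_ids if shard_for(d, 5) != shard_for(d, 6))
print(f"{moved} of {len(doc_ids)} documents would change shard")
```

Because most documents would map to a different shard under the new count, existing data could no longer be found where the formula points, which is why changing the shard count means reindexing.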

By default, each index has 5 primary shards and 1 replica, that is, 5 primary shards plus one replica copy of each.
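The arithmetic behind these defaults can be sketched as follows. Both function names are hypothetical. The second function relies on the fact that Elasticsearch never places a replica on the same node as its primary, so a cluster needs more than one data node before replicas can be assigned.

```python
def total_shards(primaries: int, replicas: int) -> int:
    """Primary shards plus all replica copies for one index."""
    return primaries * (1 + replicas)


def assignable_replicas(replicas: int, data_nodes: int) -> int:
    """Replica copies per primary that can actually be allocated.

    A replica never shares a node with its primary, so at most
    data_nodes - 1 copies of each primary can be placed.
    """
    return min(replicas, max(data_nodes - 1, 0))


print(total_shards(5, 1))         # default settings
print(assignable_replicas(1, 1))  # single-node cluster
print(assignable_replicas(1, 3))  # three data nodes
```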

Each Elasticsearch shard is a Lucene index, and a Lucene index has a maximum number of documents. As of LUCENE-5843 the limit is 2,147,483,519 (= Integer.MAX_VALUE - 128). Shard sizes can be monitored with _cat/shards.

curl -XGET gd01:9200/_cat/shards/20171229

20171229 1 p STARTED 1904509 369.5mb 132.98.16.178 data-178
20171229 1 r STARTED 1902986 383.6mb 132.98.16.176 master-176
20171229 3 r STARTED 1898048 349.7mb 132.98.16.178 data-178
20171229 3 p STARTED 1898595 492.2mb 132.98.16.177 data-177
20171229 2 r STARTED 1903094 481.2mb 132.98.16.178 data-178
20171229 2 p STARTED 1904497 526.9mb 132.98.16.176 master-176
20171229 4 p STARTED 1902180   487mb 132.98.16.178 data-178
20171229 4 r STARTED 1900635 586.9mb 132.98.16.176 master-176
20171229 0 p STARTED 1902472 421.6mb 132.98.16.177 data-177
20171229 0 r STARTED 1901511 511.8mb 132.98.16.176 master-176
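The listing above can also be processed programmatically. The following sketch parses the same text, assuming the default _cat/shards column order (index, shard, prirep, state, docs, store, ip, node), and compares the largest shard with the Lucene document limit. parse_cat_shards is a hypothetical helper, not part of any Elasticsearch client.

```python
LUCENE_MAX_DOCS = 2_147_483_519  # Integer.MAX_VALUE - 128

CAT_SHARDS_OUTPUT = """\
20171229 1 p STARTED 1904509 369.5mb 132.98.16.178 data-178
20171229 1 r STARTED 1902986 383.6mb 132.98.16.176 master-176
20171229 3 r STARTED 1898048 349.7mb 132.98.16.178 data-178
20171229 3 p STARTED 1898595 492.2mb 132.98.16.177 data-177
20171229 2 r STARTED 1903094 481.2mb 132.98.16.178 data-178
20171229 2 p STARTED 1904497 526.9mb 132.98.16.176 master-176
20171229 4 p STARTED 1902180 487mb 132.98.16.178 data-178
20171229 4 r STARTED 1900635 586.9mb 132.98.16.176 master-176
20171229 0 p STARTED 1902472 421.6mb 132.98.16.177 data-177
20171229 0 r STARTED 1901511 511.8mb 132.98.16.176 master-176
"""

COLUMNS = ["index", "shard", "prirep", "state", "docs", "store", "ip", "node"]


def parse_cat_shards(text):
    """Parse `_cat/shards` text output into a list of dicts."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(COLUMNS):
            continue  # skip blank lines and rows with missing columns
        row = dict(zip(COLUMNS, fields))
        row["docs"] = int(row["docs"])
        rows.append(row)
    return rows


rows = parse_cat_shards(CAT_SHARDS_OUTPUT)
primaries = [r for r in rows if r["prirep"] == "p"]
total_docs = sum(r["docs"] for r in primaries)
largest = max(r["docs"] for r in rows)

print(f"{len(primaries)} primary shards, {total_docs} documents in total")
print(f"largest shard uses {largest / LUCENE_MAX_DOCS:.4%} of the Lucene limit")
```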

Installing an Elasticsearch Cluster

Elasticsearch depends on JDK 1.8, so install JDK 1.8 before installing Elasticsearch.

Download the installation package

curl -L -O https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.6.5.tar.gz

Extract it

tar -xvf elasticsearch-5.6.5.tar.gz

Start a single node

elasticsearch-5.6.5/bin/elasticsearch

Cluster configuration

Sample elasticsearch.yml

# ======================== Elasticsearch Configuration =========================
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
# Before you set out to tweak and tune the configuration, make sure you
# understand what are you trying to accomplish and the consequences.
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html

# ---------------------------------- Cluster -----------------------------------
# Use a descriptive name for your cluster:
cluster.name: es-gotcha

# ------------------------------------ Node ------------------------------------
# Use a descriptive name for the node:
node.name: node-${HOSTNAME}
# Add custom attributes to the node:
#node.attr.rack: r1
node.master: true
node.data: false

# ----------------------------------- Paths ------------------------------------
# Path to directory where to store the data (separate multiple locations by comma):
path.data: /var/data/es
# Path to log files:
path.logs: /var/log/es

# ----------------------------------- Memory -----------------------------------
# Lock the memory on startup:
bootstrap.memory_lock: true
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
# Elasticsearch performs poorly when the system is swapping the memory.

# ---------------------------------- Network -----------------------------------
# Set the bind address to a specific IP (IPv4 or IPv6):
network.host: 132.98.16.176
# Set a custom port for HTTP:
#http.port: 9200
# For more information, consult the network module documentation.

# --------------------------------- Discovery ----------------------------------
# Pass an initial list of hosts to perform discovery when new node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
discovery.zen.ping.unicast.hosts: ["132.98.16.176", "132.98.16.177", "132.98.16.179", "132.98.16.180", "132.98.16.182", "132.98.16.183", "132.98.16.184"]
# Prevent the "split brain" by configuring the majority of nodes (total number of master-eligible nodes / 2 + 1):
discovery.zen.minimum_master_nodes: 3
# For more information, consult the zen discovery module documentation.

# ---------------------------------- Gateway -----------------------------------
# Block initial recovery after a full cluster restart until N nodes are started:
#gateway.recover_after_nodes: 3
# For more information, consult the gateway module documentation.

# ---------------------------------- Various -----------------------------------
# Require explicit names when deleting indices:
action.destructive_requires_name: true

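The settings that need to be configured are:

cluster.name

node.name

node.master: whether the node is eligible to be elected master

node.data: whether the node stores data

network.host

discovery.zen.ping.unicast.hosts: the list of hosts in the Elasticsearch cluster

discovery.zen.minimum_master_nodes: the minimum number of master-eligible nodes that must be present to form a cluster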
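The comment in the sample configuration gives the rule for discovery.zen.minimum_master_nodes as (total number of master-eligible nodes / 2) + 1. A minimal sketch of that arithmetic, with minimum_master_nodes as a hypothetical helper name:

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Majority quorum: (master-eligible nodes / 2) + 1, using integer division."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


for n in (1, 2, 3, 4, 5, 7):
    print(f"{n} master-eligible node(s) -> minimum_master_nodes: {minimum_master_nodes(n)}")
```

By this rule, the value 3 in the sample configuration fits a cluster with four or five master-eligible nodes. The article does not say how many of the seven listed hosts are master-eligible.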

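Deploy Elasticsearch on several machines and start the nodes one after another. The nodes discover each other automatically and form a cluster.

Trying Out the Cluster

Elasticsearch provides a REST API and a Java API. The examples below use the REST API. With the API we can:

Check the health, status, and statistics of the cluster, nodes, and indices

Manage the cluster, nodes, index data, and metadata

Perform CRUD operations

Run advanced searches with paging, sorting, filtering, scripting, aggregations, and more

Cluster health

curl -XGET 'gd01:9200/_cat/health?v'

epoch      timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1514533722 15:48:42  esbds   green           3         3     20  10    0    0        0             0                  -                100.0%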

Listing the nodes

curl -XGET 'gd01:9200/_cat/nodes?v'

ip            heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
132.98.16.176           64          26   5    1.45    1.49     1.43 mdi       *      master-176
132.98.16.177           84          19   8    1.27    1.43     1.60 di        -      data-177
132.98.16.178           57          78  16    2.24    2.40     2.45 di        -      data-178

Listing the indices

curl -XGET 'gd01:9200/_cat/indices?v'

health status index    uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   20171228 ij-Y05EEQIimzEDYPyzvjw   5   1    7810000      3922446      2.6gb          1.3gb
green  open   20171229 FUabFhc5TYyi4K_y81GJ9w   5   1    9905546      6122165      4.2gb          2.2gb

Creating an index

curl -XPUT 'gd01:9200/test_idx?pretty'

{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "test_idx"
}

Creating a document

Create a document of type external with id 1 in the test_idx index.

curl -XPUT 'gd01:9200/test_idx/external/1?pretty' -H 'Content-Type: application/json' -d '
{
  "name": "John Doe"
}'

{
  "_index" : "test_idx",
  "_type" : "external",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "failed" : 0
  },
  "created" : true
}

Retrieving a document

curl -XGET 'gd01:9200/test_idx/external/1?pretty'

{
  "_index" : "test_idx",
  "_type" : "external",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "name" : "John Doe"
  }
}

Bulk operations

Creating documents in bulk

curl -XPOST 'gd01:9200/test_idx/external/_bulk?pretty' -H 'Content-Type: application/json' -d '
{"index":{"_id":"1"}}
{"name": "John Doe" }
{"index":{"_id":"2"}}
{"name": "Jane Doe" }
'

A single bulk request can mix different operations

curl -XPOST 'gd01:9200/test_idx/external/_bulk?pretty' -H 'Content-Type: application/json' -d '
{"update":{"_id":"1"}}
{"doc": { "name": "John Doe becomes Jane Doe" } }
{"delete":{"_id":"2"}}
'
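The bulk body is newline-delimited JSON: one action line, followed by a source line for actions that take one, and a newline at the end of the body. When the body is generated by a program rather than typed by hand, it can be assembled as in the sketch below. build_bulk_body is a hypothetical helper, not part of any Elasticsearch client.

```python
import json


def build_bulk_body(actions):
    """Serialize (action, metadata, source) tuples to the bulk NDJSON format.

    `source` is None for actions without a body line, such as delete.
    """
    lines = []
    for action, metadata, source in actions:
        lines.append(json.dumps({action: metadata}))
        if source is not None:
            lines.append(json.dumps(source))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body([
    ("update", {"_id": "1"}, {"doc": {"name": "John Doe becomes Jane Doe"}}),
    ("delete", {"_id": "2"}, None),
])
print(body, end="")
```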

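Searching

In Elasticsearch, search criteria can be passed in the URL or in the request body.

Search criteria in the URL

curl -XGET 'gd01:9200/test_idx/external/_search?q=John'

The document that matches is shown below. A full _search response wraps each match in a hits.hits array together with a relevance score.

{
  "_index" : "test_idx",
  "_type" : "external",
  "_id" : "1",
  "_version" : 2,
  "found" : true,
  "_source" : {
    "name" : "John Doe"
  }
}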

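Search criteria in the request body

A match query analyzes the search text the same way the name field is analyzed at index time. A term query looks up the exact, unanalyzed value instead, so with the default mapping a term query for "John Doe" on name would find nothing.

curl -XPOST 'gd01:9200/test_idx/external/_search?pretty' -H 'Content-Type: application/json' -d '
{
  "query": {
    "match": {
      "name": "John Doe"
    }
  }
}'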
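How a query on name behaves depends on the query type, because a text field is analyzed into lowercase tokens at index time while a term query is not analyzed. The following is a simplified model of that difference, not Elasticsearch code: analyze is a rough stand-in for the standard analyzer, and all three function names are hypothetical.

```python
def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase, split on whitespace."""
    return text.lower().split()


def term_matches(indexed_tokens, query_value):
    """A term query looks up the query value as-is, without analysis."""
    return query_value in indexed_tokens


def match_matches(indexed_tokens, query_text):
    """A match query analyzes the query text, then matches if any token is indexed."""
    return any(token in indexed_tokens for token in analyze(query_text))


indexed = analyze("John Doe")  # tokens stored for the name field

print(term_matches(indexed, "John Doe"))   # exact value is not among the tokens
print(term_matches(indexed, "john"))       # a single lowercase token is
print(match_matches(indexed, "John Doe"))  # analyzed query tokens are found
```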

Beyond simple queries, Elasticsearch also supports:

Filtering: see https://www.elastic.co/guide/en/elasticsearch/reference/5.6/_executing_filters.html

Aggregations: see https://www.elastic.co/guide/en/elasticsearch/reference/5.6/_executing_aggregations.html

Last edited: 2017.12.29 18:05:43
