Elasticsearch

2021-03-29 2022-03-22

database


Official site: https://www.elastic.co/cn/
ELK stack: Elasticsearch, Logstash, Kibana, etc.

Elasticsearch

Elasticsearch is a full-text search engine built on Lucene. It is based on an inverted index and exposes a RESTful API.
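An inverted index maps each term to the documents that contain it, so a search looks terms up instead of scanning every document. The Python sketch below shows only that idea. It assumes a toy analyzer (lowercase, split on whitespace). Elasticsearch instead relies on Lucene analyzers such as the IK plugin, and its postings also record positions and frequencies. The helper names are hypothetical.

```python
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Toy analyzer (assumption): lowercase, then split on whitespace.
    return text.lower().split()


def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    # Map every term to the ids of the documents that contain it.
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return dict(index)


def search_any(index: dict[str, set[int]], query: str) -> set[int]:
    # Union of the posting lists of all query terms (OR semantics).
    result: set[int] = set()
    for term in tokenize(query):
        result |= index.get(term, set())
    return result


docs = {1: "OnePlus8", 2: "OnePlus8 pro", 3: "Xiaomi 10 Pro"}
index = build_inverted_index(docs)
print(sorted(search_any(index, "Pro")))  # [2, 3]
```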

Default ports: 9200 for HTTP (the REST API); 9300 for TCP (the transport protocol used between nodes).

Note: when installing, make sure the versions of Elasticsearch, Kibana, and analysis-ik match.

elasticsearch-head

A web admin UI for Elasticsearch. To use it, install the corresponding Chrome extension.

Project: https://github.com/mobz/elasticsearch-head

Kibana

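Kibana is a Node.js-based tool for analyzing data stored in Elasticsearch indices. It uses Elasticsearch aggregations to build charts such as bar charts, line charts, and pie charts. Default port: 5601.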

IK Analysis

Project: https://github.com/medcl/elasticsearch-analysis-ik

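Analyzers: ik_smart (coarsest-grained segmentation) and ik_max_word (finest-grained segmentation)
To verify the installation, open the Kibana console and send the request below. If the text comes back split into words, the plugin is installed correctly.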

POST _analyze
{
  "analyzer": "ik_smart",
  "text":     "我是中国人"
}

Index operations (indices)

Create an index

PUT index
{
  "settings": {
    # number of primary shards and number of replicas
    "number_of_shards": 3,
    "number_of_replicas": 2
  },
  # mapping configuration
  "mappings": {
  }
}
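The settings above are easier to read with a little shard arithmetic. Each primary shard gets `number_of_replicas` copies. Elasticsearch never allocates two copies of the same shard to one node. So 3 shards with 2 replicas means 9 shard copies, and at least 3 nodes are needed before every replica can be assigned (green status). On a single node the index stays yellow. The function names below are hypothetical helpers for this back-of-the-envelope calculation.

```python
def total_shard_copies(number_of_shards: int, number_of_replicas: int) -> int:
    # Each primary shard has `number_of_replicas` replica copies.
    return number_of_shards * (1 + number_of_replicas)


def min_nodes_for_green(number_of_replicas: int) -> int:
    # Two copies of the same shard never share a node, so a primary
    # plus all of its replicas need this many distinct nodes.
    return 1 + number_of_replicas


print(total_shard_copies(3, 2))  # 9
print(min_nodes_for_green(2))    # 3
```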

View indices

GET index
GET *

Delete an index

DELETE index

Mapping configuration

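Mapping types (Type) are deprecated in Elasticsearch 7.x and removed in 8.0. Document APIs use _doc where the type name used to go, and the mapping API no longer takes a type.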

Create mapping fields

PUT /index_name/_mapping
{
  "properties": {
    "field_name": {
      "type": "type",             # e.g. text, long, short, date, integer, object
      "index": true,              # whether the field is indexed
      "store": true,              # whether the field is stored separately
      "analyzer": "analyzer_name"
    }
  }
}

PUT index/_mapping
{
  "properties": {
    "title": {
      "type": "text",
      "analyzer": "ik_max_word"
    },
    "images": {
      "type": "keyword",
      "index": false
    },
    "price": {
      "type": "float"
    }
  }
}

View mappings

GET /index_name/_mapping

Field properties

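type: field type
- String types, in two kinds:
  - text: analyzed (split into terms); cannot be used in aggregations
  - keyword: not analyzed; the value is matched as a whole and can be used in aggregations
- Numeric types, in two groups:
  - basic types: long, integer, short, byte, double, float, half_float
  - high-precision floating point: scaled_float
    - requires a scaling factor, such as 10 or 100. Elasticsearch multiplies the real value by this factor and stores the result, then divides it back when the value is read.
- Date: date type
  - Elasticsearch can store dates as formatted strings, but storing them as millisecond timestamps (long) is recommended to save space.
index: whether the field is indexed
- a field that is not indexed cannot be searched
- defaults to true
store: whether the field value is stored separately
- defaults to false
- when a document is indexed, Elasticsearch keeps the full original document in _source, so the field's value can be returned in results whatever store is set to.
boost: boost factor, which raises the field's weight in relevance scoring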
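The scaled_float and millisecond-date notes above come down to simple arithmetic, sketched here in Python. Two assumptions apply. First, the scaled value is rounded half-up, as Java's Math.round does. Second, a yyyy-MM-dd date is read as midnight UTC. The helper names are hypothetical, and the code does not reproduce Elasticsearch's internals.

```python
import math
from datetime import datetime, timezone


def encode_scaled_float(value: float, scaling_factor: float) -> int:
    # Multiply by the scaling factor and round to the nearest long
    # (assumption: half-up rounding, like Java's Math.round).
    return math.floor(value * scaling_factor + 0.5)


def decode_scaled_float(stored: int, scaling_factor: float) -> float:
    # Reverse the scaling when the value is read back.
    return stored / scaling_factor


def to_epoch_millis(day: str) -> int:
    # Interpret a yyyy-MM-dd date as midnight UTC and convert it to milliseconds.
    dt = datetime.strptime(day, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(dt.timestamp()) * 1000


stored = encode_scaled_float(2699.99, 100)
print(stored)                            # 269999
print(decode_scaled_float(stored, 100))  # 2699.99
# Precision finer than 1/scaling_factor is lost:
print(decode_scaled_float(encode_scaled_float(1.234, 100), 100))  # 1.23
print(to_epoch_millis("2014-10-28"))     # 1414454400000
```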

Document operations (documents)

Add documents

Single document

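# POST /index/_doc       -> Elasticsearch generates the id
# PUT  /index/_doc/{id}  -> you supply the id (POST also accepts an id)
POST /index/_doc
{
    "title": "小米手机",
    "price": 2699.00
}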

Bulk add

POST /index/_bulk
{ "index":{} }
{ "title":"OnePlus8","price":3999 }
{ "index":{} }
{ "title":"OnePlus8 pro","price":4999 }
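The _bulk body is newline-delimited JSON. Each document takes two lines, an action line and a source line. Neither line may be pretty-printed, and the whole body must end with a newline. The Python sketch below builds such a body. build_bulk_body is a hypothetical helper, and sending the request is left out.

```python
import json


def build_bulk_body(docs: list[dict]) -> str:
    # Each document becomes an action line followed by a source line.
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {}}))
        lines.append(json.dumps(doc, ensure_ascii=False))
    # The bulk API requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


body = build_bulk_body([
    {"title": "OnePlus8", "price": 3999},
    {"title": "OnePlus8 pro", "price": 4999},
])
print(body, end="")
```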

Update documents

By id

# PUT to an existing id replaces the whole document
PUT /index/_doc/3
{
    "title": "超大米手机",
    "price": 3899.00
}

Bulk update

POST goods/_update_by_query
{
  "script": {
    "source": "ctx._source.price = params.price",
    "params": {
      "price": 9999
    }
  },
  "query": {"match_all": {}}
}

Delete documents

By id

DELETE /index_name/_doc/id

Bulk delete

POST goods/_delete_by_query
{
  "query": {"match_all": {}}
}

View documents

Get by id

GET /index/_doc/3

Search

Basic syntax

GET /index_name/_search
{
    # query
    "query": {
        "query_type": {
            "field": "value"
        }
    },

    # fields to return
    "_source": ["field", ...],

    # sorting
    "sort": [{"field": {"order": "desc|asc"}}, ...],

    # pagination
    "from": offset of the first hit (0-based),
    "size": number of hits per page
}

// or
GET index/_doc/{id}
GET index/_search?q=field:value

query
- query types: match_all, match, term, range, etc.
- query condition: the document field to search and the value to match
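The pieces of the basic syntax fit together as in the Python sketch below, which builds a request body as a dict. It also shows the from/size arithmetic for page-based paging. build_search_body and its 1-based page parameter are hypothetical; Elasticsearch itself only sees from and size. The 10,000 limit is the default index.max_result_window, discussed under scroll queries.

```python
from typing import Optional


def build_search_body(query: dict, page: int = 1, size: int = 10,
                      source: Optional[list] = None,
                      sort: Optional[list] = None,
                      max_result_window: int = 10_000) -> dict:
    # `page` is 1-based in this helper (an assumption); ES only knows from/size.
    if page < 1 or size < 1:
        raise ValueError("page and size must be >= 1")
    offset = (page - 1) * size
    # from + size may not exceed index.max_result_window (10,000 by default).
    if offset + size > max_result_window:
        raise ValueError("page is beyond max_result_window; use scroll instead")
    body: dict = {"query": query, "from": offset, "size": size}
    if source is not None:
        body["_source"] = source
    if sort is not None:
        body["sort"] = sort
    return body


body = build_search_body({"match": {"title": "小米手机"}}, page=3, size=20,
                         source=["title", "price"],
                         sort=[{"price": {"order": "desc"}}])
print(body["from"], body["size"])  # 40 20
```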

_source: choose which fields to return

List the fields directly: "_source": ["field", ...],

Or use includes and excludes:

"_source": {
    "includes": "{field}",    # fields to return
    "excludes": "{field}"     # fields to leave out
},

sort: sorting

"sort": [
    {"price": {"order": "desc"}},
    {"_score": {"order": "desc"}}
]
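With several sort keys, the first key decides the order and later keys only break ties. The Python sketch below applies the same order as the sort above (price descending, then _score descending) to a few hypothetical hits.

```python
# Hypothetical hits, as if returned by a search.
hits = [
    {"_id": "1", "_score": 0.8, "price": 3899.0},
    {"_id": "2", "_score": 1.2, "price": 2699.0},
    {"_id": "3", "_score": 1.5, "price": 3899.0},
]
# Same order as [{"price": desc}, {"_score": desc}]:
# price decides first; _score only breaks ties between equal prices.
ordered = sorted(hits, key=lambda h: (-h["price"], -h["_score"]))
print([h["_id"] for h in ordered])  # ['3', '1', '2']
```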

match_all: match all documents

Example

GET index/_search
{
    "query":{
        "match_all": {}
    }
}

Response

{
  "took" : 1,                  # time the search took, in ms
  "timed_out" : false,         # whether the search timed out
  "_shards" : {                # shard statistics
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {                   # overview of the search results
    "total" : {                # number of matching documents
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : 1.0,         # highest document score among all hits
    "hits" : [
      {
        "_index" : "index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,        # document score
        "_source" : {          # original document
          "title" : "超大米手机",
          "price" : 3899.0
        }
      },
      ...
    ]
  }
}

match query

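A match query analyzes (tokenizes) the query text and then searches for the resulting terms. By default the terms are combined with OR.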

OR (default)

GET index/_search
{
  "query": {
    "match": {
      "title": "小米手机"
    }
  }
}

AND

GET index/_search
{
  "query": {
    "match": {
      "title": {
        "query": "小米手机",
        "operator": "and"
      }
    }
  }
}
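The difference between the two operators can be simulated in Python. The sketch assumes the analyzer splits "小米手机" into ["小米", "手机"] (Xiaomi / phone); real IK output may differ. The documents and the match helper are hypothetical.

```python
def match(doc_terms: set[str], query_terms: list[str], operator: str = "or") -> bool:
    # "or": any query term is enough; "and": every query term must be present.
    if operator == "or":
        return any(t in doc_terms for t in query_terms)
    if operator == "and":
        return all(t in doc_terms for t in query_terms)
    raise ValueError(f"unknown operator: {operator}")


# Assumption: the analyzer splits "小米手机" into ["小米", "手机"].
query_terms = ["小米", "手机"]
docs = {
    "1": {"小米", "手机"},  # "小米手机" (Xiaomi phone)
    "2": {"华为", "手机"},  # "华为手机" (Huawei phone)
    "3": {"小米", "电视"},  # "小米电视" (Xiaomi TV)
}
or_hits = [d for d, terms in docs.items() if match(terms, query_terms)]
and_hits = [d for d, terms in docs.items() if match(terms, query_terms, "and")]
print(or_hits)   # ['1', '2', '3']
print(and_hits)  # ['1']
```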

multi_match query

Similar to match, but it searches across multiple fields.

# search in both the title and subTitle fields
GET index/_search
{
  "query": {
    "multi_match": {
      "query": "小米手机",
      "fields": ["title", "subTitle"]
    }
  }
}

term: exact term matching

A term query matches exact values: numbers, dates, booleans, or strings that are not analyzed (such as keyword fields).

GET index/_search
{
    "query": {  
      "term": {
        "price": 3899.0
      }
    }
}
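A term query does not analyze its input, which is why it suits keyword, numeric, date, and boolean fields. The Python sketch below shows the usual pitfall with a text field. It assumes a toy analyzer (lowercase plus whitespace split) as a stand-in for a real one, and the helper names are hypothetical.

```python
def analyze(text: str) -> set[str]:
    # Toy analyzer (assumption): lowercase and split on whitespace.
    return set(text.lower().split())


def term_query(indexed_terms: set[str], value: str) -> bool:
    # term does not analyze its input: the value is looked up exactly as given.
    return value in indexed_terms


text_field = analyze("Apple iPhone")  # a text field indexes {"apple", "iphone"}
keyword_field = {"Apple iPhone"}      # a keyword field indexes the whole value

print(term_query(text_field, "Apple iPhone"))     # False
print(term_query(text_field, "apple"))            # True
print(term_query(keyword_field, "Apple iPhone"))  # True
```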

terms: exact matching on multiple values

A terms query works like term but accepts several values. A document matches if the field contains any one of them.

GET index/_search
{
    "query": {  
      "terms": {
        "price": [2699.0, 3899.0]
      }
    }
}

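query_string query

query_string parses its query text with Lucene query syntax (so operators such as AND, OR and wildcards are allowed) and searches the field given by default_field.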

GET index/_search
{
    "query": {
        "query_string": {
            "default_field": "title",
            "query": "小米手机"
        }
    }
}

range: range query

A range query finds numbers or dates that fall within a given interval.

Operators
- gt, gte, lt, lte (greater than, greater than or equal to, less than, less than or equal to)

GET index/_search
{
  "query": {
    "range": {
      "price": {
        "gte": 2000,
        "lte": 3000
      }
    }
  }
}
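The inclusive and exclusive bounds behave like the Python predicate below. in_range is a hypothetical helper, and the prices are made-up sample values.

```python
def in_range(value, gt=None, gte=None, lt=None, lte=None) -> bool:
    # gt/lt exclude the bound, gte/lte include it; omitted bounds are open.
    if gt is not None and not value > gt:
        return False
    if gte is not None and not value >= gte:
        return False
    if lt is not None and not value < lt:
        return False
    if lte is not None and not value <= lte:
        return False
    return True


prices = [1999.0, 2000.0, 2699.0, 3000.0, 3899.0]
print([p for p in prices if in_range(p, gte=2000, lte=3000)])  # [2000.0, 2699.0, 3000.0]
print([p for p in prices if in_range(p, gt=2000, lt=3000)])    # [2699.0]
```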

fuzzy: fuzzy query

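A fuzzy query is the fuzzy counterpart of a term query. It tolerates spelling differences between the search term and the indexed term, as long as the edit distance does not exceed 2.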

GET index/_search
{
  "query": {
    "fuzzy": {
      "title": "appla"
    }
  }
}
# also matches documents whose title contains "apple"
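The edit distance behind fuzzy can be sketched in Python. This is an illustration only; Lucene uses Levenshtein automata. The sketch computes the optimal-string-alignment distance. An insertion, deletion, or substitution counts as one edit. A swap of two adjacent characters also counts as one edit, which matches fuzzy's default transpositions setting. auto_fuzziness models the default AUTO fuzziness (exact match for terms of 1-2 characters, 1 edit for 3-5, 2 edits for longer terms). All helper names are hypothetical.

```python
def edit_distance(a: str, b: str, transpositions: bool = True) -> int:
    # Dynamic programming table: d[i][j] = distance between a[:i] and b[:j].
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[len(a)][len(b)]


def auto_fuzziness(term: str) -> int:
    # Default AUTO rule: 0 edits for 1-2 chars, 1 for 3-5, 2 for longer terms.
    n = len(term)
    if n <= 2:
        return 0
    if n <= 5:
        return 1
    return 2


def fuzzy_match(query_term: str, indexed_term: str) -> bool:
    return edit_distance(query_term, indexed_term) <= auto_fuzziness(query_term)


print(edit_distance("appla", "apple"))  # 1
print(fuzzy_match("appla", "apple"))    # True
```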

bool: compound boolean query

bool combines other queries with must (AND), must_not (NOT), and should (OR).

GET index/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "price": {
              "gt": 3000
            }
          }
        },
        {
          "match": {
            "title": "小米"
          }
        }
      ]
    }
  }
}
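The matching rules of bool (scoring aside) can be modelled in Python with clauses written as predicates. By default, a bool with should clauses but no must or filter clause needs at least one should clause to match; otherwise should clauses are optional. The documents, clause functions, and bool_matches helper below are hypothetical. Titles are given as already-analyzed term sets.

```python
from typing import Callable, Optional, Sequence

Clause = Callable[[dict], bool]


def bool_matches(doc: dict, must: Sequence[Clause] = (),
                 must_not: Sequence[Clause] = (),
                 should: Sequence[Clause] = (),
                 minimum_should_match: Optional[int] = None) -> bool:
    if not all(clause(doc) for clause in must):      # must: AND
        return False
    if any(clause(doc) for clause in must_not):      # must_not: NOT
        return False
    if minimum_should_match is None:
        # Default: 1 if there are only should clauses, otherwise 0.
        minimum_should_match = 1 if should and not must else 0
    return sum(clause(doc) for clause in should) >= minimum_should_match  # should: OR


docs = [
    {"title": {"小米", "手机"}, "price": 3899.0},  # Xiaomi phone
    {"title": {"小米", "电视"}, "price": 2699.0},  # Xiaomi TV
    {"title": {"华为", "手机"}, "price": 4999.0},  # Huawei phone
]


def price_gt_3000(d: dict) -> bool:      # range: {"price": {"gt": 3000}}
    return d["price"] > 3000


def title_has_xiaomi(d: dict) -> bool:   # match: {"title": "小米"}
    return "小米" in d["title"]


hits = [i for i, d in enumerate(docs)
        if bool_matches(d, must=[price_gt_3000, title_has_xiaomi])]
print(hits)  # [0]
```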

filter

filter is used inside a bool query, and a filter can itself contain another bool query.

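If we need to filter the results without letting the filter condition affect scoring, we should not put it among the query clauses. We should use filter instead, because every query clause contributes to a document's score and therefore to its ranking.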

GET index/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "title": "小米手机"
          }
        }
      ],
      "filter": {
        "range": {
          "price": {
            "gte": 2000,
            "lte": 3000
          }
        }
      }
    }
  }
}
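The point about scoring is that a filter clause only includes or excludes documents and never changes their scores. The Python sketch below makes this visible. It uses a toy score (the number of matching query terms) in place of Elasticsearch's real BM25 scoring. The documents and helpers are hypothetical, and titles are given as already-analyzed term sets.

```python
def toy_score(doc_terms: set[str], query_terms: list[str]) -> int:
    # Toy relevance (assumption): count matching query terms. ES really uses BM25.
    return sum(1 for t in query_terms if t in doc_terms)


def search(docs: dict, query_terms: list[str], filter_clause) -> list[tuple[str, int]]:
    results = []
    for doc_id, doc in docs.items():
        score = toy_score(doc["title"], query_terms)
        if score == 0:              # must/match: scored; at least one term must match
            continue
        if not filter_clause(doc):  # filter: yes/no only, never changes the score
            continue
        results.append((doc_id, score))
    return sorted(results, key=lambda r: -r[1])


docs = {
    "1": {"title": {"小米", "手机"}, "price": 2699.0},
    "2": {"title": {"小米", "手机"}, "price": 3899.0},
    "3": {"title": {"华为", "手机"}, "price": 2999.0},
}
query = ["小米", "手机"]
print(search(docs, query, lambda d: 2000 <= d["price"] <= 3000))  # [('1', 2), ('3', 1)]
print(search(docs, query, lambda d: True))                        # [('1', 2), ('2', 2), ('3', 1)]
```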

Scroll queries

Docs: https://www.elastic.co/guide/cn/elasticsearch/guide/current/scroll.html

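Why use it:

- By default, from/size pagination can only reach the first 10,000 hits (the limit is the index.max_result_window setting and can be changed).
- It performs better when retrieving large amounts of data.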

// scroll: how long Elasticsearch keeps the search context (the cursor) alive
GET /index/_search?scroll=1m
{
    "query": { "match_all": {}},
    "size":  100
}

// scroll_id: the _scroll_id returned by the previous request
GET /_search/scroll
{
    "scroll": "1m",
    "scroll_id" : ""
}
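A client drives a scroll with a simple loop. It sends the first search, then keeps sending the latest _scroll_id until a page comes back with no hits. The Python sketch below shows that loop. FakeScrollServer is a hypothetical in-memory stand-in so the loop can run without a cluster. A real scroll_next call would send {"scroll": "1m", "scroll_id": ...} to /_search/scroll. When finished, a real client should release the context with DELETE /_search/scroll.

```python
from typing import Callable, Iterator


def scroll_all(search: Callable[[dict], dict],
               scroll_next: Callable[[str], dict],
               size: int = 100) -> Iterator[dict]:
    # First request: opens the scroll context (GET /index/_search?scroll=1m).
    resp = search({"query": {"match_all": {}}, "size": size})
    while True:
        hits = resp["hits"]["hits"]
        if not hits:  # an empty page means every hit has been returned
            break
        yield from hits
        # Follow-up request: GET /_search/scroll with the latest _scroll_id.
        resp = scroll_next(resp["_scroll_id"])


class FakeScrollServer:
    """Hypothetical in-memory stand-in for Elasticsearch, only for running the loop."""

    def __init__(self, docs: list[dict]):
        self.docs = docs
        self.cursors: dict[str, tuple[int, int]] = {}

    def search(self, body: dict) -> dict:
        scroll_id = f"ctx-{len(self.cursors)}"
        self.cursors[scroll_id] = (0, body["size"])
        return self._next_page(scroll_id)

    def scroll(self, scroll_id: str) -> dict:
        return self._next_page(scroll_id)

    def _next_page(self, scroll_id: str) -> dict:
        pos, size = self.cursors[scroll_id]
        self.cursors[scroll_id] = (pos + size, size)
        return {"_scroll_id": scroll_id, "hits": {"hits": self.docs[pos:pos + size]}}


server = FakeScrollServer([{"_id": str(i)} for i in range(250)])
all_hits = list(scroll_all(server.search, server.scroll))
print(len(all_hits))  # 250
```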

Null values in ES

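Elasticsearch treats a field as missing when it has no indexed value, for example when the value is null, [], [null], or an empty object {}. An empty string "" does count as a value. The query below uses must_not with exists to find documents that have no title.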

GET /index/_search
{
  "query": {
    "bool": {
      "must_not": [
        {
          "exists": { "field": "title"}
        }
      ]
    }
  }
}
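The rule for when exists treats a field as missing can be approximated in Python. The sketch follows the exists-query documentation: null, [] and [null] are missing, while "" and [null, "foo"] count as values. It is a simplification. Mapping options such as "index": false or ignore_above can also leave a field without an indexed value, and those are not modelled. has_indexed_value is a hypothetical helper.

```python
def has_indexed_value(value) -> bool:
    # Simplified rule (assumption): mapping options such as "index": false
    # or ignore_above are ignored here.
    if value is None:
        return False
    if isinstance(value, list):
        # [] and [null] are missing; [null, "foo"] is not.
        return any(has_indexed_value(v) for v in value)
    if isinstance(value, dict):
        # An object field exists only if one of its sub-fields has a value.
        return any(has_indexed_value(v) for v in value.values())
    # Everything else, including "" and "-", counts as a value.
    return True


for v in [None, [], [None], {}, "", [None, "foo"], "OnePlus8"]:
    print(repr(v), has_indexed_value(v))
```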

Aggregations

Basic concepts

Buckets:

A bucket aggregation groups data in some way. Each group is called a bucket in ES, similar to GROUP BY in SQL.

Elasticsearch offers many ways to divide data into buckets:

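Date Histogram Aggregation: groups by date intervals; for example, with an interval of one week, each week becomes a bucket
Histogram Aggregation: groups by numeric intervals (interval), like the date version
Terms Aggregation: groups by term; documents with exactly the same term fall into the same bucket
Range Aggregation: groups by numeric or date ranges; you specify the start and end of each range
...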

Bucketing syntax:

GET cars/_search
{
  "size": 0,              # return no hits, only the aggregation results
  "aggs": {               # aggregations
    "NAME": {             # aggregation name
      "AGG_TYPE": {}      # bucketing method
    }
  }
}

Metrics:

After grouping, we usually compute values over the data in each group, such as the average, maximum, minimum, or sum. In ES these are called metrics.

Commonly used metric aggregations:

Avg Aggregation: average
Max Aggregation: maximum
Min Aggregation: minimum
Percentiles Aggregation: percentiles
Stats Aggregation: returns avg, max, min, sum, and count at once
Sum Aggregation: sum
Top hits Aggregation: the top matching documents in each bucket
Value Count Aggregation: number of values
...

Aggregating into buckets

Prepare the data

PUT /cars
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "properties": {
      "color": {
        "type": "keyword"
      },
      "make": {
        "type": "keyword"
      }
    }
  }
}

POST /cars/_bulk
{ "index": {}}
{ "price" : 10000, "color" : "red", "make" : "honda", "sold" : "2014-10-28" }
{ "index": {}}
{ "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }
{ "index": {}}
{ "price" : 30000, "color" : "green", "make" : "ford", "sold" : "2014-05-18" }
{ "index": {}}
{ "price" : 15000, "color" : "blue", "make" : "toyota", "sold" : "2014-07-02" }
{ "index": {}}
{ "price" : 12000, "color" : "green", "make" : "toyota", "sold" : "2014-08-19" }
{ "index": {}}
{ "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }
{ "index": {}}
{ "price" : 80000, "color" : "red", "make" : "bmw", "sold" : "2014-01-01" }
{ "index": {}}
{ "price" : 25000, "color" : "blue", "make" : "ford", "sold" : "2014-02-12" }

Aggregation

GET cars/_search
{
  "size": 0,
  "aggs": {
    "popular_colors": {
      "terms": {
        "field": "color",
        "size": 10,          # number of buckets to return
        "order": {
          "_key": "asc"      # sort buckets by key (the field value)
        }
      }
    }
  }
}
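On the eight cars documents indexed above, this terms aggregation amounts to counting documents per color and sorting the buckets by key. The Python sketch below reproduces that grouping on the color values copied from the bulk request. It is an illustration of the grouping, not how Elasticsearch computes it.

```python
from collections import Counter

# The "color" values of the eight documents indexed into cars above.
colors = ["red", "red", "green", "blue", "green", "red", "red", "blue"]

counts = Counter(colors)
# "order": {"_key": "asc"} sorts buckets by key; "size": 10 keeps at most 10 buckets.
buckets = [{"key": key, "doc_count": n} for key, n in sorted(counts.items())][:10]
print(buckets)
```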

Metrics within buckets

By default, an aggregation returns only the number of documents in each bucket, but we usually need richer per-bucket metrics.

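For that we add a metric. We nest another aggs inside the bucket aggregation so that it runs within each bucket; in other words, a metric is itself an aggregation.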

Average price per bucket

GET cars/_search
{
  "size": 0, 
  "aggs": {
    "popular_colors": {
      "terms": {
        "field": "color",
        "size": 10
      },
      "aggs": {
        "avg_price": {
          "avg": {
            "field": "price"
          }
        }
      }
    }
  }
}

Response

...
  "aggregations" : {
    "popular_colors" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "red",
          "doc_count" : 4,
          "avg_price" : {
            "value" : 32500.0
          }
        },
        {
          "key" : "blue",
          "doc_count" : 2,
          "avg_price" : {
            "value" : 20000.0
          }
        },
        {
          "key" : "green",
          "doc_count" : 2,
          "avg_price" : {
            "value" : 21000.0
          }
        }
      ]
    }
  }
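The averages in this response can be checked by hand. The Python sketch below groups the (price, color) pairs from the bulk request by color, then averages each group. It orders buckets by doc_count descending with ties broken by key ascending, which is the order the response above shows. This is a cross-check of the numbers, not Elasticsearch's implementation.

```python
from collections import defaultdict

# (price, color) of the eight documents indexed into cars above.
cars = [(10000, "red"), (20000, "red"), (30000, "green"), (15000, "blue"),
        (12000, "green"), (20000, "red"), (80000, "red"), (25000, "blue")]

prices_by_color: dict[str, list[int]] = defaultdict(list)
for price, color in cars:
    prices_by_color[color].append(price)  # terms bucket on "color"

buckets = [
    {"key": color, "doc_count": len(prices),
     "avg_price": {"value": sum(prices) / len(prices)}}  # avg sub-aggregation
    for color, prices in prices_by_color.items()
]
# Order buckets by doc_count descending, ties broken by key ascending.
buckets.sort(key=lambda b: (-b["doc_count"], b["key"]))
for b in buckets:
    print(b["key"], b["doc_count"], b["avg_price"]["value"])
```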

Title: Elasticsearch
Author: tcbaby
Link: http://tcbaby.github.io/2021/03/29/database/es/
Published: 2021-03-29
License: Unless otherwise stated, all posts on this blog are licensed under CC BY-NC-SA 4.0. Please credit the source when reposting.

database, es
