Learning Elasticsearch Indexing

Creating an Index

Specify the number of shards when creating the index:

    http put :9200/indexsetting number_of_shards=1 number_of_replicas=1

    {
        "acknowledged": true
    }
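The same request can be sketched in Python. This is a minimal sketch, not a definitive client: the helper name `build_create_index_request` is hypothetical, the node address `http://localhost:9200` is an assumption, and nesting the values under a `settings` key is the request shape assumed here. The request is only built, not sent, so the code runs without a server.

```python
import json
import urllib.request


def build_create_index_request(index, shards, replicas,
                               base_url="http://localhost:9200"):
    """Build (but do not send) a PUT request that creates an index.

    base_url is an assumption: a single local node on the default port.
    """
    body = {
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        }
    }
    return urllib.request.Request(
        url=f"{base_url}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    req = build_create_index_request("indexsetting", shards=1, replicas=1)
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To send it to a running node, uncomment:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.read().decode("utf-8"))
```

Passing the numbers as integers rather than strings keeps the JSON body typed, although the command above shows that string values are accepted as well.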

Mapping Configuration

Before we configure a mapping by hand, Elasticsearch can guess the type of each field in a document from its JSON. For example:

    http post :9200/test/auto field1='20' field:=10

    {
        "_id": "AVHNbr0WRh7yMB73pVgC",
        "_index": "test",
        "_shards": {
            "failed": 0,
            "successful": 1,
            "total": 2
        },
        "_type": "auto",
        "_version": 1,
        "created": true
    }

    http :9200/test/auto/_mapping

    {
        "test": {
            "mappings": {
                "auto": {
                    "properties": {
                        "field": {
                            "type": "long"
                        },
                        "field1": {
                            "type": "string"
                        }
                    }
                }
            }
        }
    }

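As shown, field was sent as a JSON number and is mapped as long, while field1 was sent as the string '20' and is mapped as string. We can also set the numeric_detection parameter to true in a type's mapping to enable more aggressive detection of numbers inside string values.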

    // Create the mapping for type notauto
    http put :9200/test/_mapping/notauto notauto:='{"numeric_detection":true}'

    {
        "acknowledged": true
    }

    // Add a document
    http post :9200/test/notauto f1='10' f2='20'

    {
        "_id": "AVHNeiW1Rh7yMB73pVgG",
        "_index": "test",
        "_shards": {
            "failed": 0,
            "successful": 1,
            "total": 2
        },
        "_type": "notauto",
        "_version": 1,
        "created": true
    }

    // Check the field types
    http :9200/test/notauto/_mapping

    {
        "test": {
            "mappings": {
                "notauto": {
                    "numeric_detection": true,
                    "properties": {
                        "f1": {
                            "type": "long"
                        },
                        "f2": {
                            "type": "long"
                        }
                    }
                }
            }
        }
    }

One limitation remains: boolean values cannot be inferred from strings, so such fields have to be defined explicitly in the mapping.
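The guessing rules described so far can be modelled in a few lines of Python. This is a simplified sketch of the behaviour shown in the examples, not Elasticsearch's actual implementation: the function name `guess_type` is hypothetical, the mapping of JSON floats to double is an assumption, and date detection is left out entirely.

```python
def guess_type(value, numeric_detection=False):
    """Return the field type a JSON value would be mapped to (simplified)."""
    # bool must be tested before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"  # assumption: JSON floats map to double
    if isinstance(value, str):
        if numeric_detection:
            # Try an integer first, then a floating-point number.
            try:
                int(value)
                return "long"
            except ValueError:
                pass
            try:
                float(value)
                return "double"
            except ValueError:
                pass
        # Strings are never turned into booleans, even "true" or "false".
        return "string"
    raise TypeError(f"unsupported JSON value: {value!r}")


if __name__ == "__main__":
    for sample in ("20", 10, True, "true"):
        print(repr(sample), guess_type(sample), guess_type(sample, True))
```

The last sample illustrates the limitation: the string "true" stays a string with or without numeric_detection.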

Dates are another type that can be detected. We can set "dynamic_date_formats" : ["yyyy-MM-dd hh:mm"]; this parameter takes an array of formats.
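As a small illustration, the mapping body carrying this parameter can be built as a Python dictionary. This is only a sketch: the helper name `date_detection_mapping` is hypothetical, the type name notauto is simply reused from the earlier example, and placing the parameter at the top level of the type mapping is the layout assumed here.

```python
import json


def date_detection_mapping(type_name, formats):
    """Build a type mapping that lists the date formats to try on strings."""
    return {type_name: {"dynamic_date_formats": list(formats)}}


if __name__ == "__main__":
    body = date_detection_mapping("notauto", ["yyyy-MM-dd hh:mm"])
    print(json.dumps(body, indent=4))
```

The printed JSON is what would be sent as the body of a put-mapping request for that type.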

Disabling Field Type Guessing

To stop new fields from being added automatically, set the dynamic property to false.

    http put :9200/test/_mapping/my my:='{"dynamic":false,"properties":{"ff1":{"type":"string"},"ff2":{"type":"string"}}}'

    {
        "acknowledged": true
    }

    http :9200/test/my/_mapping

    {
        "test": {
            "mappings": {
                "my": {
                    "dynamic": "false",
                    "properties": {
                        "ff1": {
                            "type": "string"
                        },
                        "ff2": {
                            "type": "string"
                        }
                    }
                }
            }
        }
    }
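The effect of dynamic set to false can be pictured with a small model. This sketch assumes that fields missing from the mapping are silently left out of the index rather than rejected, while the document itself is still accepted; the function name `split_by_mapping` is hypothetical.

```python
def split_by_mapping(mapped_fields, document):
    """Split a document into fields that get indexed and fields that are ignored.

    mapped_fields: names declared under "properties" in the mapping.
    """
    indexed = {k: v for k, v in document.items() if k in mapped_fields}
    ignored = sorted(k for k in document if k not in mapped_fields)
    return indexed, ignored


if __name__ == "__main__":
    mapped = {"ff1", "ff2"}
    doc = {"ff1": "hello", "ff3": "not in the mapping"}
    indexed, ignored = split_by_mapping(mapped, doc)
    print("indexed:", indexed)
    print("ignored:", ignored)
```

In this model the mapping never grows: ff3 is not added to it, so a search on ff3 would find nothing.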

Index Structure Mapping

For example:

    cat posts.json

    {
        "mappings": {
            "post": {
                "properties": {
                    "id": {
                        "type": "long",
                        "store": "yes",
                        "precision_step": "0"
                    },
                    "name": {
                        "type": "string",
                        "store": "yes",
                        "index": "analyzed"
                    },
                    "published": {
                        "type": "date",
                        "store": "yes",
                        "precision_step": "0"
                    },
                    "contents": {
                        "type": "string",
                        "store": "no",
                        "index": "analyzed"
                    }
                }
            }
        }
    }

    http put :9200/posts < posts.json

    {
        "acknowledged": true
    }

    http :9200/posts/_mapping

    {
        "posts": {
            "mappings": {
                "post": {
                    "properties": {
                        "contents": {
                            "type": "string"
                        },
                        "id": {
                            "precision_step": 1,
                            "store": true,
                            "type": "long"
                        },
                        "name": {
                            "store": true,
                            "type": "string"
                        },
                        "published": {
                            "format": "strict_date_optional_time||epoch_millis",
                            "precision_step": 1,
                            "store": true,
                            "type": "date"
                        }
                    }
                }
            }
        }
    }

Core Types

string
number
date
boolean
binary

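Attributes Common to Every Type

index_name: the name under which the field is stored in the index; if not set, the field's own name is used.
index: can be set to analyzed or no; string fields can also be set to not_analyzed. With analyzed, the field is analyzed and indexed so that it can be searched. With no, the field cannot be searched. The default is analyzed. For a string field, not_analyzed means the value is indexed as-is without analysis, so a search has to match the whole value exactly.
store: yes or no; specifies whether the field's original value is written into the index.
boost: defaults to 1. Defines the importance of this field within the document; the higher the value, the more important the field.
null_value: the value written into the index when the field is not part of the indexed document. By default the field is simply omitted.
copy_to: names a field to which all values of this field are copied.
include_in_all: specifies whether the field is included in the `_all` field. By default every field is included in `_all`.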
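The difference between the three index settings can be illustrated with a toy model. This is a rough sketch only: lowercasing and splitting on non-word characters is a loose stand-in for a real analyzer, and the function names `index_terms` and `matches` are hypothetical.

```python
import re


def index_terms(value, index="analyzed"):
    """Return the terms that would end up in the index for a string value."""
    if index == "no":
        return []          # nothing is indexed, so nothing can be searched
    if index == "not_analyzed":
        return [value]     # the whole value becomes a single term
    if index == "analyzed":
        # Crude approximation of analysis: lowercase, split into words.
        return re.findall(r"\w+", value.lower())
    raise ValueError(f"unknown index setting: {index!r}")


def matches(term, value, index="analyzed"):
    """True if an exact term lookup would find the value."""
    return term in index_terms(value, index)


if __name__ == "__main__":
    title = "Quick Brown Fox"
    for setting in ("analyzed", "not_analyzed", "no"):
        print(setting, index_terms(title, setting))
```

Under this model a lookup for quick finds the analyzed field but not the not_analyzed one, which only matches the full original string.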

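String Type

String fields also accept the following attributes:

term_vector: can be set to no, yes, with_offsets, with_positions or with_positions_offsets. Defines whether Lucene term vectors are computed for the field; highlighting that relies on term vectors needs them to be computed.
omit_norms: can be set to true or false. The default is false for analyzed string fields and true for string fields that are indexed but not analyzed. When set to true, Lucene's norms calculation is disabled for the field.
analyzer: the name of the analyzer used for both indexing and searching.
index_analyzer: the name of the analyzer used at index time.
search_analyzer: the name of the analyzer used at query time.
norms.enabled: whether norms are kept for the field. The default is true, or false for non-analyzed fields.
norms.loading: can be set to eager or lazy. eager means the norms for this field are always loaded; lazy means they are loaded only when needed.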

Numeric Types

byte
short
integer
long
float
double
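The four integer types differ in how large a value they can hold. The sketch below assumes they follow the usual signed 8, 16, 32 and 64-bit ranges and picks the narrowest one that fits a given value; the helper name `smallest_integer_type` is hypothetical.

```python
# (type name, width in bits); assumed to be signed two's-complement ranges.
INTEGER_TYPES = [("byte", 8), ("short", 16), ("integer", 32), ("long", 64)]


def smallest_integer_type(value):
    """Return the narrowest integer type whose range contains value."""
    for name, bits in INTEGER_TYPES:
        low = -(1 << (bits - 1))
        high = (1 << (bits - 1)) - 1
        if low <= value <= high:
            return name
    raise OverflowError(f"{value} does not fit in a 64-bit long")


if __name__ == "__main__":
    for n in (100, 1000, 100000, 10**12):
        print(n, smallest_integer_type(n))
```

Choosing the narrowest suitable type is a common way to keep the index compact, at the cost of rejecting values that later outgrow it.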

IP Address Type

A field can be given the ip type to hold IP address data.
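One way to picture why a dedicated type is useful: an IPv4 address is a 32-bit number, so it can be compared and range-queried numerically. The sketch below uses Python's standard ipaddress module; treating the ip type as IPv4 handled numerically is an assumption about the Elasticsearch version used here, and the helper name `ipv4_to_long` is hypothetical.

```python
import ipaddress


def ipv4_to_long(text):
    """Convert a dotted IPv4 string to its integer value.

    Raises ValueError (ipaddress.AddressValueError) for malformed input.
    """
    return int(ipaddress.IPv4Address(text))


if __name__ == "__main__":
    low = ipv4_to_long("192.168.0.1")
    high = ipv4_to_long("192.168.0.255")
    candidate = ipv4_to_long("192.168.0.42")
    # A range check on addresses becomes a plain numeric comparison.
    print(low <= candidate <= high)
```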

Bulk Operations

    cat bulk.json

    {"index":{"_index":"test", "_type":"bulk"}}
    {  "name":"rcx", "age":14}
    {"index":{"_index":"test", "_type":"bulk"}}
    { "name":"rcx1", "age":28}

    http post :9200/test/bulk/_bulk < bulk.json

    {
        "errors": false,
        "items": [
            {
                "create": {
                    "_id": "AVHOPSjBRh7yMB73pVgS",
                    "_index": "test",
                    "_shards": {
                        "failed": 0,
                        "successful": 1,
                        "total": 2
                    },
                    "_type": "bulk",
                    "_version": 1,
                    "status": 201
                }
            },
            {
                "create": {
                    "_id": "AVHOPSjBRh7yMB73pVgT",
                    "_index": "test",
                    "_shards": {
                        "failed": 0,
                        "successful": 1,
                        "total": 2
                    },
                    "_type": "bulk",
                    "_version": 1,
                    "status": 201
                }
            }
        ],
        "took": 23
    }
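The bulk file is newline-delimited JSON: one action line followed by one document line, each on its own line. A minimal Python sketch for generating such a body from a list of documents follows; the function name `build_bulk_body` is hypothetical, and only the index action used above is covered.

```python
import json


def build_bulk_body(index, doc_type, documents):
    """Serialize documents into a bulk body of action/source line pairs."""
    lines = []
    for doc in documents:
        # Action line: tells the bulk endpoint where the next document goes.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        # Source line: the document itself, kept on a single line.
        lines.append(json.dumps(doc))
    # Every line, including the last one, ends with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    docs = [{"name": "rcx", "age": 14}, {"name": "rcx1", "age": 28}]
    print(build_bulk_body("test", "bulk", docs), end="")
```

Because json.dumps never emits raw newlines inside a value, each document is guaranteed to stay on one line, which the format requires.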

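Index Internals

Every document has its own identifier and type. A document has two internal identifiers.

_uid: the unique identifier of the document within the index, composed of the document's type and identifier. It needs no configuration and is always indexed.
_id: the actual identifier. It is usually supplied when the document is created; if it is not, one is generated automatically.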
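How the two identifiers relate can be sketched as follows. The type#id layout with a # separator is an assumption about the Elasticsearch version these notes use, and the helper names `make_uid` and `split_uid` are hypothetical.

```python
def make_uid(doc_type, doc_id):
    """Combine a document's type and identifier into a _uid-style string."""
    return f"{doc_type}#{doc_id}"


def split_uid(uid):
    """Split a _uid-style string back into (type, id).

    Splits on the first '#' only, so an id containing '#' survives intact.
    """
    doc_type, _, doc_id = uid.partition("#")
    return doc_type, doc_id


if __name__ == "__main__":
    uid = make_uid("auto", "AVHNbr0WRh7yMB73pVgC")
    print(uid)
    print(split_uid(uid))
```

Including the type makes the value unique across the whole index even when two types contain documents with the same _id.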

**_type field**

By default the document's type is also indexed, but it is neither analyzed nor stored.

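**_all field**

Elasticsearch uses the `_all` field to hold the data from the other fields so that it can be searched easily. It is handy when you want a simple search across all data without having to think about field names. `_all` is enabled by default. It can also be disabled entirely, or individual fields can be excluded from it. Disabling it requires the following change to the mapping: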

    {
        "book" : {
            "_all" : {
                "enabled" : "false"
            },
            "properties" : {
                ...
            }
        }
    }
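Conceptually, the catch-all field behaves like one text built from the values of the other fields. The toy model below shows the idea together with per-field exclusion; it is a simplified sketch, not how Elasticsearch builds the field internally, and the function name `build_all_text` is hypothetical.

```python
def build_all_text(document, excluded=()):
    """Join the values of all non-excluded fields into one searchable text."""
    parts = [str(value) for name, value in document.items()
             if name not in excluded]
    return " ".join(parts)


if __name__ == "__main__":
    book = {"title": "Elasticsearch notes", "author": "rcx", "pages": 300}
    print(build_all_text(book))
    # Excluding a field mirrors setting include_in_all to false on it.
    print(build_all_text(book, excluded=("pages",)))
```

A search against the combined text finds a word no matter which field it came from, which is why no field name is needed in the query.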

**_source field**

This field stores the original JSON document. It is enabled by default. If you do not need it, it can be disabled in the same way as _all.

**_index field**

Stores information about the index the document belongs to.

**_size field**

Disabled by default. When enabled, it automatically indexes the original size of the _source field and stores it alongside the document.

**_timestamp field**

_ttl field

Short for time to live. It lets you define a document's lifetime, after which the document is deleted automatically. It is disabled by default.
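The lifetime rule can be sketched as simple arithmetic on timestamps. This model assumes the expiry moment is the document's timestamp plus its time to live, both in milliseconds; the function names `expires_at` and `is_expired` are hypothetical, and the real deletion is performed by the server at some point after expiry rather than at that exact instant.

```python
def expires_at(timestamp_ms, ttl_ms):
    """Moment (epoch milliseconds) at which the document's lifetime ends."""
    return timestamp_ms + ttl_ms


def is_expired(timestamp_ms, ttl_ms, now_ms):
    """True once the current time has reached the expiry moment."""
    return now_ms >= expires_at(timestamp_ms, ttl_ms)


if __name__ == "__main__":
    created = 1_450_000_000_000          # hypothetical indexing time
    one_day = 24 * 60 * 60 * 1000        # time to live of one day
    print(is_expired(created, one_day, now_ms=created + 1000))
    print(is_expired(created, one_day, now_ms=created + one_day))
```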

[References]

Elasticsearch服务器开发
