# mapping 的创建以及复杂 mapping 详解

本章会记录 3个 章节的笔记

    1. 初识搜索引擎_ mapping 的核心数据类型以及 dynamic mapping
    1. 初识搜索引擎_手动建立和修改 mapping 以及定制 string 类型数据是否分词
    1. 初识搜索引擎_ mapping 复杂数据类型以及 object 类型数据底层结构大揭秘

# 核心的数据类型

  • string
  • byte,short,integer,long
  • float,double
  • boolean
  • date

# dynamic mapping 规则

就是自动识别类型

  • true or false --> boolean
  • 123 --> long
  • 123.45 --> double
  • 2017-01-01 --> date
  • "hello world" --> string/text

# 查看 mapping

语法

GET /index/_mapping/type
1

# 如何建立索引时指定 mapping

语法如下

PUT /index
{
  "index": {
    "type":{
      "properties": {
        "field":{
          "type": "text", // 数据类型
          "index":"",    // 索引类型
          "analyzer": "english" // 分词类型
        }
      }
    }
  }
}
1
2
3
4
5
6
7
8
9
10
11
12
13
14

索引类型有如下值:

  • analyzed : 全文 full text
  • not_analyzed : 精准匹配 exact value
  • no :不索引

TIP

只能创建 index 时手动建立 mapping,或者新增 field mapping,但是不能 update field mapping

PUT /website
{
  "mappings": {
    "article": {
      "properties": {
        "author_id": {
          "type": "long"
        },
        "title": {
          "type": "text",
          "analyzer": "english"
        },
        "content": {
          "type": "text"
        },
        "post_date": {
          "type": "date"
        },
        "publisher_id": {
          "type": "text",
          "index": "not_analyzed"
        }
      }
    }
  }
}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26

如果你尝试再次执行上面的语句就会看到报错了,不能修改

{
  "error": {
    "root_cause": [
      {
        "type": "index_already_exists_exception",
        "reason": "index [website/icXrvvkcRj6z4uNaNhf6uA] already exists",
        "index_uuid": "icXrvvkcRj6z4uNaNhf6uA",
        "index": "website"
      }
    ],
    "type": "index_already_exists_exception",
    "reason": "index [website/icXrvvkcRj6z4uNaNhf6uA] already exists",
    "index_uuid": "icXrvvkcRj6z4uNaNhf6uA",
    "index": "website"
  },
  "status": 400
}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17

# 新增字段 mapping

对于已经存在的字段不能修改 mapping,新增字段则可以指定

PUT /website/_mapping/article
{
  "properties" : {
    "new_field" : {
      "type" :    "string",
      "index":    "not_analyzed"
    }
  }
}
1
2
3
4
5
6
7
8
9

# 测试 mapping

可以使用如下语法进行查看 mapping 的分词效果,如

title 的 analyzer 是 english,可以看到把 a 干掉了

GET /website/_analyze
{
  "field": "title",
  "text": "a dogs"
}

--------------- 响应

{
  "tokens": [
    {
      "token": "dog",
      "start_offset": 2,
      "end_offset": 6,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
GET /website/_analyze
{
  "field": "content",
  "text": "my-dogs"
}
1
2
3
4
5

如果 index = not_analyzed 的话。使用该 api 就会报错;如 new_field 字段

GET website/_analyze
{
  "field": "new_field",
  "text": "my dogs"
}

--------------------------------- 响应

{
  "error": {
    "root_cause": [
      {
        "type": "remote_transport_exception",
        "reason": "[sEvAlYx][127.0.0.1:9300][indices:admin/analyze[s]]"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "Can't process field [new_field], Analysis requests are only supported on tokenized fields"
  },
  "status": 400
}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21

mapping 复杂数据类型以及 object 类型数据底层结构大揭秘

# multivalue field

建立索引时与 string 是一样的,数据类型不能混

{ "tags": [ "tag1", "tag2" ]}
1

# empty field

主要是空值:null,[],[null]

# object field

对象类型的就比较复杂了,先来创建一个文档,再查看 es 自动为我们创建的 mapping 是什么样的

PUT /company/employee/1
{
  "address": {
    "country": "china",
    "province": "guangdong",
    "city": "guangzhou"
  },
  "name": "jack",
  "age": 27,
  "join_date": "2017-01-01"
}

1
2
3
4
5
6
7
8
9
10
11
12

GET /company/_mapping/employee 可以看到返回的数据嵌套很复杂了。

address 下面还有一个 properties ,那么 address 就是一个 object field

{
  "company": {
    "mappings": {
      "employee": {
        "properties": {
          "address": {
            "properties": {
              "city": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              },
              "country": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              },
              "province": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              }
            }
          },
          "age": {
            "type": "long"
          },
          "join_date": {
            "type": "date"
          },
          "name": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      }
    }
  }
}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56

比如这一条数据,它在 es 里面被分词之后,可能是这样的

{
  "address": {
    "country": "china",
    "province": "guangdong",
    "city": "guangzhou"
  },
  "name": "jack",
  "age": 27,
  "join_date": "2017-01-01"
}

-----------------------------------

{
    "name":            [jack],
    "age":          [27],
    "join_date":      [2017-01-01],
    "address.country":         [china],
    "address.province":   [guangdong],
    "address.city":  [guangzhou]
}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21

上面的都是比较简单的内容,如果是稍微复杂一点的,就可能是下面这样了

{
    "authors": [
        { "age": 26, "name": "Jack White"},
        { "age": 55, "name": "Tom Jones"},
        { "age": 39, "name": "Kitty Smith"}
    ]
}

-------------------------------

{
    "authors.age":    [26, 55, 39],
    "authors.name":   [jack, white, tom, jones, kitty, smith]
}
1
2
3
4
5
6
7
8
9
10
11
12
13
14