Elasticsearch: the keyword and text types

2017-02-01

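Overview

Many Chinese-language blogs explain the difference between keyword and text in Elasticsearch, and most of them are based on the official documentation. Here I reproduce the post from the official Elastic blog with a brief translation, for others to refer to.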

Text vs. keyword

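With the release of Elasticsearch 5.0, the string type was replaced (it is deprecated in 5.x and removed in 6.0). The type had always caused a lot of confusion, because Elasticsearch has two different ways of searching strings: you can search on the whole value, or on the individual tokens. The former generally required not_analyzed in the mapping, while the latter required analyzed. The change described here still holds in Elasticsearch 6.1.

To avoid this confusion, the string type was split into two types, text and keyword: the former is for full-text search, the latter for keyword (exact-value) search.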
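The distinction can be illustrated with a toy model in Python. This is only a sketch under loud assumptions: `toy_analyze` is a hypothetical stand-in that lowercases and splits on non-alphanumeric characters, not Elasticsearch's actual analyzer, and the two matching functions are deliberate simplifications of how text and keyword fields behave.

```python
import re

def toy_analyze(value):
    """Hypothetical stand-in for an analyzer: lowercase, split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9a-z]+", value.lower()) if tok]

def text_matches(stored, query):
    """text-like: match if any analyzed query token is among the stored tokens."""
    stored_tokens = set(toy_analyze(stored))
    return any(tok in stored_tokens for tok in toy_analyze(query))

def keyword_matches(stored, query):
    """keyword-like: the whole value must be identical, with no analysis."""
    return stored == query

stored = "Quick Brown Fox"
print(text_matches(stored, "brown"))               # True
print(keyword_matches(stored, "brown"))            # False
print(keyword_matches(stored, "Quick Brown Fox"))  # True
```

In this model a text-like field finds the document by a single word, while a keyword-like field only matches the complete, unmodified value.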

New defaults

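The default dynamic mapping for strings was changed as well: a string is now dynamically indexed as both text and keyword.

For example, if you index the following document: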

{
  "foo": "bar"
}

the resulting dynamic mapping is:

{
  "foo": {
    "type": "text",
    "fields": {
      "keyword": {
        "type": "keyword",
        "ignore_above": 256
      }
    }
  }
}
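With this default mapping, full-text queries would typically target `foo`, while exact-value queries and aggregations would target the `foo.keyword` sub-field. Because of `ignore_above: 256`, strings longer than 256 characters are not indexed in that sub-field. The sketch below only builds request bodies as Python dicts; `match`, `term` and the `terms` aggregation are query DSL names, while the helper names and the `by_value` aggregation name are hypothetical.

```python
import json

def full_text_query(field, text):
    """Analyzed search against the text field itself."""
    return {"query": {"match": {field: text}}}

def exact_value_query(field, value):
    """Exact-value search against the keyword sub-field."""
    return {"query": {"term": {field + ".keyword": value}}}

def terms_aggregation(field, name="by_value"):
    """Bucket documents by the exact value stored in the keyword sub-field."""
    return {"size": 0, "aggs": {name: {"terms": {"field": field + ".keyword"}}}}

for body in (
    full_text_query("foo", "bar"),
    exact_value_query("foo", "bar"),
    terms_aggregation("foo"),
):
    print(json.dumps(body))
```

Sending these bodies to a cluster is left out on purpose, since the client and endpoint depend on the setup.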

Of course, you can define the mapping explicitly to avoid having the field indexed as both types.
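As a rough illustration, an explicit mapping for a field needs only a single type. The helper below is a hypothetical sketch that builds just the `properties` part of a mapping; how that part is wrapped in an index-creation request differs between Elasticsearch versions (5.x and 6.x nest it under a mapping type name), so the wrapper is deliberately left out.

```python
import json

ALLOWED_TYPES = {"text", "keyword"}

def explicit_properties(field_types):
    """Build the "properties" part of a mapping from {field name: type}.

    Hypothetical helper: only the two string types discussed here are accepted.
    """
    properties = {}
    for name, field_type in field_types.items():
        if field_type not in ALLOWED_TYPES:
            raise ValueError("unsupported type for %s: %s" % (name, field_type))
        properties[name] = {"type": field_type}
    return {"properties": properties}

# Map "foo" as keyword only, so no text field is created for it.
print(json.dumps(explicit_properties({"foo": "keyword"}), indent=2))
```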

How to migrate

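The following shows how to map string fields to the type you need.

For string fields that used to be analyzed, simply use text with index: true. The first mapping below is the old form and the second is its replacement.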

{
  "foo": {
    "type": "string",
    "index": "analyzed"
  }
}

{
  "foo": {
    "type": "text",
    "index": true
  }
}

For string fields that used to be not_analyzed, simply use keyword with index: true. Again, the old mapping comes first and its replacement second.

{
  "foo": {
    "type": "string",
    "index": "not_analyzed"
  }
}

{
  "foo": {
    "type": "keyword",
    "index": true
  }
}
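The two migration rules above can be sketched as a small Python function. `migrate_string_field` is a hypothetical helper, not part of any Elasticsearch client. It covers only the two cases shown here, assumes that an old string field without an explicit index setting was analyzed, and drops any other settings of the old mapping.

```python
def migrate_string_field(old):
    """Translate a pre-5.0 string field mapping into text or keyword.

    Hypothetical helper covering only "analyzed" and "not_analyzed".
    """
    if old.get("type") != "string":
        # Not a legacy string field: return an unchanged copy.
        return dict(old)
    # Assumption: a missing "index" setting means the field was analyzed.
    index = old.get("index", "analyzed")
    if index == "analyzed":
        return {"type": "text", "index": True}
    if index == "not_analyzed":
        return {"type": "keyword", "index": True}
    raise ValueError("index setting not covered by this sketch: %r" % (index,))

print(migrate_string_field({"type": "string", "index": "analyzed"}))
print(migrate_string_field({"type": "string", "index": "not_analyzed"}))
```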

References

Strings are dead, long live strings!

Warning

This article was last updated on 2017-02-01. Its content may be out of date, so use it with caution.
