docker for Windows : 18.03.1-ce-win65 (17513)
springBoot : 2.2.2.RELEASE
springDataElasticSearch : 3.2.3
elasticSearch Image : 6.8.5
elasticSearch-analysis-ik : 6.8.5
mySql : 5.6.40-log
JDK : 1.8
gradle : 6.0.1
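Why learn Elasticsearch? Because it is fast, because it offers good Chinese word segmentation, because it is distributed, and because Spring Boot already integrates it. The real trigger was a recent project: we imported roughly one million product records from JD (京东), and some of our existing queries started taking more than ten seconds to load. That pushed me to optimize the SQL again (splitting joins, adding composite indexes, making requests asynchronous), which made me review how indexes work and want to understand how a search engine actually differs from MySQL's full-text index. Here I use Docker + Elasticsearch + Spring Boot for a first look at Elasticsearch.
docker pull elasticsearch reported that there is no latest tag, so I picked 6.8.5 from Docker Hub for testing. This version is fairly stable and fairly recent.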
HTTP/1.1 200 OK
content-type: application/json; charset=UTF-8
content-length: 578

{
  "tokens" : [
    { "token" : "我", "start_offset" : 0, "end_offset" : 1, "type" : "<IDEOGRAPHIC>", "position" : 0 },
    { "token" : "爱", "start_offset" : 1, "end_offset" : 2, "type" : "<IDEOGRAPHIC>", "position" : 1 },
    { "token" : "中", "start_offset" : 2, "end_offset" : 3, "type" : "<IDEOGRAPHIC>", "position" : 2 },
    { "token" : "国", "start_offset" : 3, "end_offset" : 4, "type" : "<IDEOGRAPHIC>", "position" : 3 }
  ]
}
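The segmentation is poor: the default analyzer splits the sentence into single characters, the way someone who cannot read Chinese would.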
HTTP/1.1 200 OK
content-type: application/json; charset=UTF-8
content-length: 424

{
  "tokens" : [
    { "token" : "我", "start_offset" : 0, "end_offset" : 1, "type" : "CN_CHAR", "position" : 0 },
    { "token" : "爱", "start_offset" : 1, "end_offset" : 2, "type" : "CN_CHAR", "position" : 1 },
    { "token" : "中国", "start_offset" : 2, "end_offset" : 4, "type" : "CN_WORD", "position" : 2 }
  ]
}
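Responses like the two above are what the Elasticsearch _analyze endpoint returns. As a rough sketch of how such a request could be sent from plain Java 8, assuming the container is reachable at http://localhost:9200 and that the analyzers are named standard and ik_smart (the class and method names here are made up for illustration, not taken from the project):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original project.
class AnalyzeRequestSketch {

    // Escapes backslashes and double quotes so the value fits inside a JSON string.
    // Control characters are not handled; this is enough for short test sentences.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Builds the request body for POST /_analyze.
    static String buildBody(String analyzer, String text) {
        return "{\"analyzer\":\"" + escape(analyzer) + "\",\"text\":\"" + escape(text) + "\"}";
    }

    // Sends the body to <baseUrl>/_analyze and returns the raw JSON response.
    static String analyze(String baseUrl, String analyzer, String text) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) new URL(baseUrl + "/_analyze").openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json; charset=UTF-8");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(buildBody(analyzer, text).getBytes(StandardCharsets.UTF_8));
        }
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return new String(buf.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // The test sentence from the responses above, written with Unicode escapes.
        String text = "\u6211\u7231\u4e2d\u56fd";
        System.out.println(buildBody("standard", text));
        System.out.println(buildBody("ik_smart", text));
        // Only talk to Elasticsearch when a base URL is passed,
        // e.g. http://localhost:9200 (an assumed address).
        if (args.length > 0) {
            System.out.println(analyze(args[0], "standard", text));
            System.out.println(analyze(args[0], "ik_smart", text));
        }
    }
}
```

Run without arguments it only prints the two request bodies. Passing the base URL as the first argument sends them, and the ik_smart call only works once the elasticsearch-analysis-ik plugin is installed in the container.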
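There are plenty of integration guides online, so I will mention just one point: in my setup the @Field annotations did not give me the IK mapping I wanted, so I wrote the mapping myself in a JSON file: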
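import lombok.Getter;
import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Mapping;

@Getter
@Mapping(mappingPath = "es_article_mapping.json") // field mapping is read from this JSON file
@Document(indexName = "article", type = "article")
public class ArticleEsEntity {

    @Id
    private String id;
    private String title;
    private String content;
    private long createTime;

    public ArticleEsEntity(String title, String content) {
        // nanoTime is used as a simple id for this test data
        this.id = System.nanoTime() + "";
        this.title = title;
        this.content = content;
        this.createTime = System.currentTimeMillis();
    }
}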
{
  "article": {
    "properties": {
      "id": { "type": "text" },
      "create_time": { "type": "long" },
      "content": {
        "type": "text",
        "analyzer": "ik_smart",
        "search_analyzer": "ik_smart",
        "fields": {
          "keyword": { "type": "keyword", "ignore_above": 10000 }
        }
      },
      "title": {
        "type": "text",
        "analyzer": "ik_smart",
        "search_analyzer": "ik_smart",
        "fields": {
          "keyword": { "type": "keyword", "ignore_above": 256 }
        }
      }
    }
  }
}
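With a mapping like this, title and content are analyzed for full-text search, while the keyword sub-fields keep the original string for exact matching. A small sketch of the two kinds of query body that could be posted to the article index's _search endpoint (the class and method names are hypothetical, and the bodies are built by hand rather than through Spring Data):

```java
// Hypothetical helper, not part of the original project.
class ArticleQuerySketch {

    // Escapes backslashes and double quotes so the value fits inside a JSON string.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Full-text query: the input is run through the field's search analyzer
    // before matching, so it is compared token by token.
    static String matchTitle(String text) {
        return "{\"query\":{\"match\":{\"title\":\"" + escape(text) + "\"}}}";
    }

    // Exact query against the un-analyzed keyword sub-field:
    // the whole title has to be identical.
    static String exactTitle(String title) {
        return "{\"query\":{\"term\":{\"title.keyword\":\"" + escape(title) + "\"}}}";
    }

    public static void main(String[] args) {
        // Example input written with Unicode escapes (the word tokenized as CN_WORD above).
        String word = "\u4e2d\u56fd";
        System.out.println(matchTitle(word));
        System.out.println(exactTitle(word));
    }
}
```

The match form is the one that benefits from IK. The term form on title.keyword behaves more like an equality check in SQL.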
Both MySQL and Elasticsearch hold the same data set, 120,000+ records in total.
One more thing: MySQL's full-text index does not segment Chinese text well.