Reposted

elasticsearch-dsl: a flexible Python ORM for Elasticsearch

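Query DSL is Elasticsearch's domain-specific language for queries.

You can think of it as the SQL of Elasticsearch,

except that it is really just a pile of JSON.

elasticsearch-dsl is the officially released Python package for working with Query DSL.

Using it feels a bit like Django's ORM.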
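
To make the "just JSON" point concrete, the sketch below writes one Query DSL request body as a plain Python dict and serializes it with the standard json module. The field names (title, is_ready) and the bool/must/filter layout are assumptions chosen for illustration; the exact body a cluster accepts depends on its Elasticsearch version.

```python
import json

# A hand-written Query DSL body: full-text match on title,
# restricted by an exact-value filter on is_ready.
# Field names are illustrative assumptions, not taken from a real index.
query_body = {
    "query": {
        "bool": {
            "must": [{"match": {"title": "abc"}}],
            "filter": [{"term": {"is_ready": True}}],
        }
    }
}

# This string is what would be sent as the body of a _search request.
payload = json.dumps(query_body, sort_keys=True)
print(payload)
```

A package like elasticsearch-dsl exists so that you compose such bodies from Python objects instead of nesting dicts by hand.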

Installation

$ pip install elasticsearch-dsl

ref:

https://github.com/elastic/elasticsearch-dsl-py

https://elasticsearch-dsl.readthedocs.org/en/latest/index.html

Indices and Types

in app/documents.py

from elasticsearch_dsl import DocType, String, Boolean
from elasticsearch_dsl.connections import connections

connections.create_connection(hosts=['127.0.0.1', ])


class AlbumDoc(DocType):
    upc = String(index='not_analyzed')
    title = String(analyzer='ik', fields={'raw': String(index='not_analyzed')})
    artist = String(analyzer='ik')
    is_ready = Boolean()

    class Meta:
        index = 'dps'
        doc_type = 'album'

    @classmethod
    def sync(cls, album):
        album_doc = AlbumDoc(meta={'id': album.id})
        album_doc.upc = album.get_upcs(output_str=False)
        album_doc.title = album.name
        album_doc.artist = album.artist.name
        album_doc.is_ready = album.is_ready
        album_doc.save()

    def save(self, *args, **kwargs):
        return super(AlbumDoc, self).save(*args, **kwargs)

    def get_model_obj(self):
        from svapps.dps.models import Album
        return Album.objects.get(id=self.meta.id)


# to create mappings
AlbumDoc.init()
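You must run YourDocType.init() once.

Only then does Elasticsearch create the mapping that corresponds to your DocType.

Otherwise, Elasticsearch builds a mapping the first time you index data, inferring it from the data types of that data,

so settings such as the analyzer end up as the default, standard.

You can check the result through the _mapping API: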

http://127.0.0.1:9200/dps/_mapping/track

http://127.0.0.1:9200/dps/_mapping/album
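
The _mapping response is nested JSON, so a small lookup helper makes the check less error-prone. The sketch below assumes the response layout of Elasticsearch versions that still use string/not_analyzed (index name, then "mappings", then document type, then "properties"). The sample dict is hand-written for illustration, not captured from a real cluster, and field_setting is a hypothetical helper.

```python
# Hand-written sample of what GET /dps/_mapping/album might return
# after AlbumDoc.init(); assumed layout, not real cluster output.
sample_response = {
    "dps": {
        "mappings": {
            "album": {
                "properties": {
                    "upc": {"type": "string", "index": "not_analyzed"},
                    "title": {
                        "type": "string",
                        "analyzer": "ik",
                        "fields": {
                            "raw": {"type": "string", "index": "not_analyzed"}
                        },
                    },
                    "artist": {"type": "string", "analyzer": "ik"},
                    "is_ready": {"type": "boolean"},
                }
            }
        }
    }
}


def field_setting(response, index, doc_type, field, key, default=None):
    """Look up one setting (e.g. 'analyzer') for a field in a mapping response."""
    properties = response[index]["mappings"][doc_type]["properties"]
    return properties.get(field, {}).get(key, default)


# If init() had been skipped, 'analyzer' would be absent for title
# and the field would fall back to the default analyzer.
print(field_setting(sample_response, "dps", "album", "title", "analyzer"))
print(field_setting(sample_response, "dps", "album", "upc", "index"))
```

A missing "analyzer" key on a field you expected to use ik is a quick sign that the mapping was inferred from data rather than created by init().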

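Fields that need full-text search should be analyzed (string fields are analyzed by default).

Fields that do not need full-text search, that is, fields that require exact matching such as username, email address, or zip code, can be set to not_analyzed.

The catch is that a term query no longer works as an exact match on an analyzed field, because only the analyzed tokens are indexed,

unless you add an extra raw sub-field for that field.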
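
The difference is easier to see with a toy model. The sketch below is plain Python and does not use Elasticsearch: toy_analyze is a deliberately crude stand-in for an analyzer (lowercase, split on non-word characters), not the real standard or ik analyzer, and term_matches is a hypothetical helper that mimics how a term query compares its input, unmodified, against the indexed tokens.

```python
import re


def toy_analyze(text):
    """Crude stand-in for an analyzer: lowercase, split on non-word characters."""
    return [token for token in re.split(r"\W+", text.lower()) if token]


def term_matches(indexed_tokens, term):
    """A term query compares the term, unmodified, against the indexed tokens."""
    return term in indexed_tokens


title = "Hello World"

analyzed_tokens = toy_analyze(title)  # what an analyzed field would index
raw_tokens = [title]                  # what a not_analyzed raw sub-field would index

print(term_matches(analyzed_tokens, "Hello World"))  # False: no single token equals the full title
print(term_matches(analyzed_tokens, "hello"))        # True: equals one lowercased token
print(term_matches(raw_tokens, "Hello World"))       # True: the original value was kept whole
```

This is why the title field in AlbumDoc carries both an analyzed version for match queries and a raw version for exact lookups.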

ref:

https://elasticsearch-dsl.readthedocs.org/en/latest/persistence.html

https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-term-query.html#CO59-2

Store Data

album_doc = AlbumDoc(meta={'id': 42})
album_doc.upc = ['887375000619', '887375502069']
album_doc.title = 'abc'
album_doc.artist = 'xyz'
album_doc.is_ready = True
album_doc.save()

# Query as usual; it does not matter that upc holds a list
search = AlbumDoc.search().filter('term', upc='887375000619')
response = search.execute()

Because Elasticsearch is schemaless, and any field may hold more than one value,

you can still store a list

in a field that you defined as String.
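
One way to picture why a term filter on upc still matches when the field holds a list: each value is indexed as its own term, and a document matches if any of them equals the query term. The toy functions below model only that behaviour in plain Python. index_terms and term_filter are hypothetical names, the second document is made up for the example, and nothing here talks to Elasticsearch.

```python
def index_terms(value):
    """Flatten a scalar or a list into the terms stored for a not_analyzed field."""
    if isinstance(value, (list, tuple)):
        return [str(item) for item in value]
    return [str(value)]


def term_filter(docs, field, term):
    """Return ids of documents where any stored term of `field` equals `term`."""
    return [
        doc_id
        for doc_id, doc in docs.items()
        if term in index_terms(doc.get(field, []))
    ]


docs = {
    42: {"upc": ["887375000619", "887375502069"], "title": "abc"},
    43: {"upc": "887375000000", "title": "def"},  # made-up single-value document
}

print(term_filter(docs, "upc", "887375000619"))  # [42]
print(term_filter(docs, "upc", "887375000000"))  # [43]
```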

Search Data

from elasticsearch_dsl import Q, Search

# TrackDoc is a DocType defined along the same lines as AlbumDoc (not shown here).

# A filter plus a single match query
search = TrackDoc.search() \
    .filter('term', is_ready=True) \
    .query('match', title=u'沒有的啊')

# Several match queries combined with & (all of them must match)
search = TrackDoc.search() \
    .filter('term', is_ready=True) \
    .query(
        Q('match', title='沒有的啊') &
        Q('match', artist='那我懂你意思了') &
        Q('match', album='沒有的, 啊!?')
    )

# An explicit bool query; track_name, album_name and artist_name are input strings
q = Q(
    'bool',
    must=[
        Q('match', title={'query': track_name, 'fuzziness': 'AUTO'}),
    ],
    should=[
        Q('match', album={'query': album_name, 'minimum_should_match': '60%'}),
        Q('match', artist={'query': artist_name, 'minimum_should_match': '80%'}),
    ],
    minimum_should_match=1
)
search = TrackDoc.search().filter('term', is_ready=True).query(q)

# One keyword searched across several fields and doc types;
# keyword is the raw search string entered by the user
q = Q(
    'bool',
    should=[
        Q('term', isrc=keyword),
        Q('term', upc=keyword),
        Q('match', **{'title.raw': {'query': keyword}}),
        Q('multi_match', query=keyword, fields=['title', 'artist', 'album']),
    ],
)
search = Search(index='dps', doc_type=['track', 'album']).query(q)

# Only fetch the first 20 hits
search = search[:20]

# print the raw Query DSL
import uniout
from pprint import pprint
pprint(search.to_dict())

response = search.execute()
print(response.hits.total)
print(response[0].title)
print(response[0].artist)
print(response[0].album)
print(response[0].is_ready)
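
For comparison with what pprint(search.to_dict()) prints, the sketch below assembles by hand a request body of the kind a bool query with must and should clauses describes, using only the standard library. bool_query and paginate are hypothetical helpers written for this illustration, the input strings are placeholders, and the dict the library actually emits may differ in detail between versions.

```python
import json


def bool_query(must=None, should=None, minimum_should_match=None):
    """Build a Query DSL bool clause, leaving out empty parts."""
    body = {}
    if must:
        body["must"] = list(must)
    if should:
        body["should"] = list(should)
    if minimum_should_match is not None:
        body["minimum_should_match"] = minimum_should_match
    return {"bool": body}


def paginate(query, start, stop):
    """Wrap a query in a request body with from/size, like slicing search[start:stop]."""
    return {"query": query, "from": start, "size": stop - start}


# Placeholder inputs standing in for track_name, album_name and artist_name.
track_name, album_name, artist_name = "abc", "def", "xyz"

query = bool_query(
    must=[{"match": {"title": {"query": track_name, "fuzziness": "AUTO"}}}],
    should=[
        {"match": {"album": {"query": album_name, "minimum_should_match": "60%"}}},
        {"match": {"artist": {"query": artist_name, "minimum_should_match": "80%"}}},
    ],
    minimum_should_match=1,
)
body = paginate(query, 0, 20)
print(json.dumps(body, indent=2, sort_keys=True))
```

Reading the generated JSON next to the Python that produced it is a practical way to debug a query that returns unexpected hits.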

ref:

https://elasticsearch-dsl.readthedocs.org/en/latest/search_dsl.html