Elasticsearch / Query
-
Exploring Your Data | Elasticsearch Reference [6.5] | Elastic
-
I’ve prepared a sample of fictitious JSON documents of customer bank account information. Each document has the following SCHEMA: (For the curious, this data was generated using www.json-generator.com/)
{ "account_number": 0, "balance": 16623, "firstname": "Bradshaw", "lastname": "Mckenzie", "age": 29, "gender": "F", "address": "244 Columbus Place", "employer": "Euron", "email": "bradshawmckenzie@euron.com", "city": "Hobucken", "state": "CO" }
-
You can download the sample dataset (`accounts.json`) from here. Extract it to our current directory and let’s load it into our cluster as follows: use the Bulk API to upload the 1000 test documents in `accounts.json` (JSON Lines).

    $ curl -L -o accounts.json "https://github.com/elastic/elasticsearch/blob/master/docs/src/test/resources/accounts.json?raw=true"
    $ head -4 accounts.json
    {"index":{"_id":"1"}}
    {"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"amberduke@pyrami.com","city":"Brogan","state":"IL"}
    {"index":{"_id":"6"}}
    {"account_number":6,"balance":5686,"firstname":"Hattie","lastname":"Bond","age":36,"gender":"M","address":"671 Bristol Street","employer":"Netagy","email":"hattiebond@netagy.com","city":"Dante","state":"TN"}
    $ curl -H "Content-Type: application/json" -XPOST "localhost:9200/bank/_doc/_bulk?pretty&refresh" --data-binary "@accounts.json"
    $ curl "localhost:9200/_cat/indices?v"
    health status index uuid                   pri rep docs.count docs.deleted store.size pri.store.size
    yellow open   bank  vA1OFhDGR9GkXDM1EVbZpg   5   1       1000            0    103.8kb        103.8kb
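The Bulk API body alternates an action line with a document line, as the `head -4` output shows. A minimal sketch of reading that layout, assuming only `index` actions and using abbreviated documents; `parse_bulk_lines` is a hypothetical helper, not part of any Elasticsearch client:

```python
import json

def parse_bulk_lines(lines):
    """Pair each action line with the document line that follows it."""
    non_empty = iter(line for line in lines if line.strip())
    pairs = []
    # zip(it, it) consumes two lines per step: the action, then the document.
    for action_line, doc_line in zip(non_empty, non_empty):
        action = json.loads(action_line)
        pairs.append((action["index"]["_id"], json.loads(doc_line)))
    return pairs

# Abbreviated copies of the first two records shown by `head -4`.
sample = [
    '{"index":{"_id":"1"}}',
    '{"account_number":1,"balance":39225,"firstname":"Amber"}',
    '{"index":{"_id":"6"}}',
    '{"account_number":6,"balance":5686,"firstname":"Hattie"}',
]
pairs = parse_bulk_lines(sample)
for doc_id, doc in pairs:
    print(doc_id, doc["firstname"], doc["balance"])
```

Each pair corresponds to one document that the `_bulk` request would index under the given `_id`.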
-
-
The Search API | Elasticsearch Reference [6.5] | Elastic #ril
-
Introducing the Query Language | Elasticsearch Reference [6.5] | Elastic #ril
-
Executing Searches | Elasticsearch Reference [6.5] | Elastic #ril
-
Executing Filters | Elasticsearch Reference [6.5] | Elastic #ril
-
Search | Elasticsearch Reference [6.5] | Elastic #ril
- The search API allows you to execute a search query and get back SEARCH HITS that match the query. The query can either be provided using a simple QUERY STRING as a parameter (i.e. URI Search), or using a REQUEST BODY. Here, search hits are the matching documents themselves; the response also reports how many documents matched.
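- Or we can search across all available indices using `_all`: `GET /_all/_search?q=tag:wow` Does omitting `_all/` also mean all indices?? (It does: `GET /_search` with no index in the path searches all indices.)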
-
Request Body Search | Elasticsearch Reference [6.5] | Elastic #ril
- Both HTTP GET and HTTP POST can be used to execute search with body. Since not all clients support GET with body, POST is allowed as well. POST sounds like the more reasonable choice.
-
Source filtering | Elasticsearch Reference [6.5] | Elastic #ril
-
Script Fields | Elasticsearch Reference [6.5] | Elastic #ril
-
Doc value Fields | Elasticsearch Reference [6.5] | Elastic #ril
-
Query DSL | Elasticsearch Reference [6.5] | Elastic
- Elasticsearch provides a full Query DSL (Domain Specific Language) based on JSON to define queries. ... consisting of two types of CLAUSES: Leaf, Compound
- Leaf query clauses look for a PARTICULAR VALUE IN A PARTICULAR FIELD, such as the `match`, `term` or `range` queries. These queries can be used by themselves. (In contrast, compound query clauses are used to combine other clauses.)
- Compound query clauses WRAP other leaf or compound queries and are used to COMBINE MULTIPLE QUERIES in a logical fashion (such as the `bool` or `dis_max` query), or to alter their behaviour (such as the `constant_score` query). The phrase "multiple queries" may be misleading: it is rather that one query is composed of leaf/compound (query) clauses. The official docs say "XXX Query" everywhere, though, so just remember that it refers to a query clause.
- Query clauses behave differently depending on whether they are used in QUERY CONTEXT or FILTER CONTEXT.
-
Query and filter context | Elasticsearch Reference [6.5] | Elastic
-
A query clause used in query context answers the question “How well does this document match this query clause?” Besides deciding whether or not the document matches, the query clause also calculates a `_score` representing HOW WELL the document matches, relative to other documents. Query context is in effect whenever a query clause is passed to a `query` parameter, such as the `query` parameter in the `search` API. Besides the match condition, a score is also produced to tell apart different degrees of matching => strictly speaking, the clause does not "calculate" the score but "affects" it: the score goes up or down a little depending on whether the condition holds.
-
In filter context, a query clause answers the question “Does this document match this query clause?” The answer is a simple YES OR NO; no scores are calculated. Filter context is mostly used for filtering STRUCTURED DATA, e.g. Does this `timestamp` fall into the range 2015 to 2016? Is the `status` field set to "published"? There is only match or no match, with no degrees => strictly speaking, the clause "does not affect" the score.
-
Frequently used filters will be CACHED AUTOMATICALLY by Elasticsearch, to speed up performance.
-
Filter context is in effect whenever a query clause is passed to a `filter` parameter, such as the `filter` or `must_not` parameters in the `bool` query, the `filter` parameter in the `constant_score` query, or the `filter` aggregation. The docs usually point out explicitly that something is "executed in filter context"; both the `filter` and `must_not` parameters of `bool` mentioned above carry that note.
-
Use query clauses in query context for conditions which SHOULD AFFECT the score of matching documents (i.e. how well does the document match), and use all other query clauses in filter context. Use filter context for plain filtering; use query context only when the results then need to be ranked by how well they match.
-
Below is an example of query clauses being used in query and filter context in the search API. This query will match documents where all of the following conditions are met:
    GET /_search
    {
      "query": {                                   <-- query context
        "bool": {
          "must": [
            { "match": { "title": "Search" }},
            { "match": { "content": "Elasticsearch" }}
          ],
          "filter": [                              <-- filter context
            { "term": { "status": "published" }},
            { "range": { "publish_date": { "gte": "2015-01-01" }}}
          ]
        }
      }
    }
The `bool` and two `match` clauses are used in query context, which means that they are USED TO SCORE how well each document matches. The `term` and `range` clauses are used in filter context. They will filter out documents which do not match, but they will NOT AFFECT THE SCORE for matching documents. Starting from the outermost `query`, everything below defaults to query context until a filter context is entered, so the `bool` and `match` clauses are in query context while the `term` and `range` clauses are in filter context. As for `must` and `filter`, they are not query clauses, only parameters of the `bool` clause.
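The split between scoring clauses and filtering clauses can be made explicit when building the request body in code. A minimal sketch that only assembles the JSON from the example above and sends nothing to a cluster; `bool_query` is a hypothetical helper name:

```python
import json

def bool_query(must, filters):
    """Build a search body: `must` clauses run in query context (scored),
    `filter` clauses run in filter context (yes/no only)."""
    return {"query": {"bool": {"must": must, "filter": filters}}}

body = bool_query(
    must=[
        {"match": {"title": "Search"}},
        {"match": {"content": "Elasticsearch"}},
    ],
    filters=[
        {"term": {"status": "published"}},
        {"range": {"publish_date": {"gte": "2015-01-01"}}},
    ],
)
print(json.dumps(body, indent=2))
```

Keeping the two lists as separate arguments makes it harder to put a pure yes/no condition into `must` by accident.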
-
-
Match All Query | Elasticsearch Reference [6.5] | Elastic Feels like it is only useful for testing queries; where would it be used in practice??
-
The most simple query, which matches all documents, giving them all a `_score` of `1.0`.

    GET /_search
    { "query": { "match_all": {} } }
-
The `_score` can be changed with the `boost` parameter: When would this be used?

    GET /_search
    { "query": { "match_all": { "boost" : 1.2 } } }
-
Match None Query - This is the inverse of the `match_all` query, which matches no documents. When would this be used?

    GET /_search
    { "query": { "match_none": {} } }
-
-
Full text queries | Elasticsearch Reference [6.5] | Elastic #ril
-
Match Query | Elasticsearch Reference [6.5] | Elastic #ril
-
`match` queries accept text/numerics/dates, analyze them, and construct a query. For example:

    GET /_search
    { "query": { "match" : { "message" : "this is a test" } } }

Note, `message` is the name of a field; you can substitute the name of any field instead. But only one field is allowed??
-
-
Match Phrase Query | Elasticsearch Reference [6.5] | Elastic #ril
-
Match Phrase Prefix Query | Elasticsearch Reference [6.5] | Elastic #ril
-
Multi Match Query | Elasticsearch Reference [6.5] | Elastic #ril
-
Common Terms Query | Elasticsearch Reference [6.5] | Elastic #ril
-
Query String Query | Elasticsearch Reference [5.4] | Elastic #ril
-
A query that uses a QUERY PARSER in order to parse its content. The `query_string` query parses the input and splits text around operators. Each TEXTUAL PART is ANALYZED INDEPENDENTLY of each other. (Here a textual part is what remains once the operators are removed.) For instance the following query:

    GET /_search
    {
      "query": {
        "query_string" : {
          "default_field" : "content",
          "query" : "(new york city) OR (big apple)"
        }
      }
    }

will be split into `new york city` and `big apple` and each part is then analyzed independently by the analyzer configured for the field.
-
Whitespaces are not considered operators; this means that `new york city` will be passed "as is" to the analyzer configured for the field. If the field is a `keyword` field the analyzer will create a SINGLE TERM `new york city` and the QUERY BUILDER will use this term in the query. If you want to query each term separately you need to add explicit operators around the terms (e.g. `new AND york AND city`). Is that because the text is inside parentheses?? I ask because the description of `default_operator` says "with a default operator of `OR`, the query `capital of Hungary` is translated to `capital OR of OR Hungary`".
-
When multiple fields are provided it is also possible to modify how the different field queries are combined inside each textual part using the `type` parameter. The possible modes are described here and the default is `best_fields`. ??
-
The `query_string` query can also run against multiple fields. Fields can be provided via the `fields` parameter. The idea of running the `query_string` query against multiple fields is to expand each query term to an OR clause like this: `field1:query_term OR field2:query_term | ...` Somewhat similar to Multi Match Query??
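To make the per-field expansion concrete, here is a minimal sketch that renders one term over several fields as a `query_string` expression. It only illustrates the idea quoted above; `expand_over_fields` is a hypothetical helper and not how Elasticsearch implements the expansion internally:

```python
def expand_over_fields(term, fields):
    """Render `term` as an OR over every field, in query_string syntax."""
    return " OR ".join(f"{field}:{term}" for field in fields)

expanded = expand_over_fields("query_term", ["field1", "field2"])
print(expanded)  # field1:query_term OR field2:query_term
```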
-
-
Simple Query String Query | Elasticsearch Reference [5.4] | Elastic #ril
-
A query that uses the `SimpleQueryParser` to parse its context. Unlike the regular `query_string` query, the `simple_query_string` query will NEVER THROW AN EXCEPTION, and discards invalid parts of the query. So the `query_string` query does throw errors. Taking `(new york city OR (big apple)` as an example (the `)` after `new york city` is missing), you get HTTP 400:

    {
      "error": {
        "root_cause": [
          {
            "type": "query_shard_exception",
            "reason": "Failed to parse query [(new york city OR (big apple)]",
            "index_uuid": "yPMioLAkRpaDWxO5aAWvog",
            "index": "myindex-1"
          },
          {
            "type": "query_shard_exception",
            "reason": "Failed to parse query [(new york city OR (big apple)]",
            "index_uuid": "C9qPMmhHQSeCCOfTKivmxw",
            "index": "myindex-2"
          }
        ]
      },
      ...
    }
-
Simple Query String Syntax uses `+` / `|` for AND/OR, unlike Query String Query??
-
- Starts-With Phrase Matching | Elastic (2013-02-04) #ril
- Term level queries | Elasticsearch Reference [6.5] | Elastic #ril
- While the full text queries will analyze the query string before executing, the term-level queries operate on the exact terms that are stored in the inverted index, and will normalize terms before executing only for keyword fields with normalizer property.
- Keyword datatype | Elasticsearch Reference [6.5] | Elastic #ril
-
How scoring works in Elasticsearch - Compose Articles (2016-02-18) #ril
-
What Is Relevance? | Elasticsearch: The Definitive Guide [2.x] | Elastic #ril
-
Theory Behind Relevance Scoring | Elasticsearch: The Definitive Guide [2.x] | Elastic #ril
-
Relevancy looks wrong - Getting consistent scoring | Elasticsearch Reference [6.5] | Elastic #ril
- If you notice that two documents with the same content get different scores or that an exact match is not ranked first, then the issue might be related to SHARDING. By default, Elasticsearch makes each shard responsible for producing ITS OWN SCORES.
- However since index statistics are an important contributor to the scores, this only works well if shards have SIMILAR INDEX STATISTICS. The ASSUMPTION is that since documents are routed EVENLY to shards by default, then index statistics should be very similar and scoring would work as expected. What is "routed evenly" based on?? (By default a document is routed to a shard by a hash of its `_id`.)
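A small numeric sketch of why per-shard statistics matter. It uses the BM25 idf formula that Elasticsearch 6.x prints in explain output, `log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))`. The shard names and the second shard's statistics are hypothetical, chosen only to show the effect:

```python
import math

def bm25_idf(doc_freq, doc_count):
    """idf as printed in explain output for BM25."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

# Hypothetical per-shard statistics for one term: (docFreq, docCount).
shards = {"shard_0": (1, 191), "shard_1": (4, 205)}

per_shard = {name: bm25_idf(df, n) for name, (df, n) in shards.items()}
global_idf = bm25_idf(
    sum(df for df, _ in shards.values()),
    sum(n for _, n in shards.values()),
)

for name, value in per_shard.items():
    print(name, round(value, 3))
print("global", round(global_idf, 3))
```

The same term gets a noticeably different idf on each shard, so two identical documents can score differently depending on where they live. Index-wide statistics, which is roughly what the DFS step gathers, give a single value in between.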
-
Relevance Is Broken! | Elasticsearch: The Definitive Guide [2.x] | Elastic #ril
- Don’t use `dfs_query_then_fetch` in production. It really isn’t required. Just HAVING ENOUGH DATA will ensure that your term frequencies are well distributed. There is no reason to add this extra DFS step to every query that you run. If it should not be used in production, this parameter should not exist at all!! Still, with little data early on this really is a problem; maybe split into more shards only once the data has grown enough??
-
Understanding "Query Then Fetch" vs "DFS Query Then Fetch" | Elastic (2013-02-10) #ril
-
Practical BM25 - Part 1: How Shards Affect Relevance Scoring in Elasticsearch | Elastic (2018-04-19) #ril
-
Practical BM25 - Part 2: The BM25 Algorithm and its Variables | Elastic (2018-04-19) #ril
-
Practical BM25 - Part 3: Considerations for Picking b and k1 in Elasticsearch | Elastic (2018-04-19) #ril
-
Similarity module | Elasticsearch Reference [6.5] | Elastic #ril
-
TFIDFSimilarity (Lucene 6.0.1 API)
- `docFreq` - the number of documents which contain the term
- `docCount` - the total number of documents in the collection??
-
BM25 docFreq, docCount and avgFieldLength seems to be wrong · Issue #24429 · elastic/elasticsearch (2017-05-02) jimczi: (member)
- Don't forget that ES creates an index with 5 shards by default and that `docFreq` and `docCount` are computed PER SHARD.
- You can create an index with 1 shard or use the `dfs` mode to compute distributed stats: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-search-type.html#dfs-query-then-fetch Merging everything into 1 shard will cause problems later, right??
-
fieldLength not an integer · Issue #25916 · elastic/elasticsearch #ril
-
Explain | Elasticsearch Reference [6.5] | Elastic #ril
-
Enables explanation for each hit on how its score was computed.
    GET /_search
    {
      "explain": true,   <-- just add this
      "query" : { "term" : { "user" : "kimchy" } }
    }
-
Taking the test data provided by Explain | Elasticsearch Reference [6.5] | Elastic as an example:
    GET bank/_doc/32

    {
      "_index": "bank",
      "_type": "_doc",
      "_id": "32",
      "_version": 1,
      "found": true,
      "_source": {
        "account_number": 32,
        "balance": 48086,
        "firstname": "Dillard",
        "lastname": "Mcpherson",
        "age": 34,
        "gender": "F",
        "address": "702 Quentin Street",   <-- first check the document that will rank first
        "employer": "Quailcom",
        "email": "dillardmcpherson@quailcom.com",
        "city": "Veguita",
        "state": "IN"
      }
    }

    ---

    POST bank/_search
    {
      "explain": true,
      "query": { "match": { "address": { "query": "street quentin", "operator": "and" } } }
    }

    ---

    {
      ...
      "_explanation": {
        "value": 5.9542274,   <-- address:street (1.1056647) + address:quentin (4.8485627)
        "description": "sum of:",
        "details": [
          {
            "value": 1.1056647,
            "description": "weight(address:street in 0) [PerFieldSimilarity], result of:",
            "details": [
              {
                "value": 1.1056647,   <-- idf (1.1064554) x tfNorm (0.99928534)
                "description": "score(doc=0,freq=1.0 = termFreq=1.0 ), product of:",
                "details": [
                  {
                    "value": 1.1064554,
                    "description": "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
                    "details": [
                      { "value": 63, "description": "docFreq", "details": [] },    <-- 63 documents contain street, so it is not very discriminating
                      { "value": 191, "description": "docCount", "details": [] }   <-- this shard holds 191 documents (5 shards by default)
                    ]
                  },
                  {
                    "value": 0.99928534,
                    "description": "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
                    "details": [
                      { "value": 1, "description": "termFreq=1.0", "details": [] },
                      { "value": 1.2, "description": "parameter k1", "details": [] },
                      { "value": 0.75, "description": "parameter b", "details": [] },
                      { "value": 2.9947643, "description": "avgFieldLength", "details": [] },
                      { "value": 3, "description": "fieldLength", "details": [] }   <-- the number of terms??
                    ]
                  }
                ]
              }
            ]
          },
          {
            "value": 4.8485627,
            "description": "weight(address:quentin in 0) [PerFieldSimilarity], result of:",
            "details": [
              {
                "value": 4.8485627,
                "description": "score(doc=0,freq=1.0 = termFreq=1.0 ), product of:",
                "details": [
                  {
                    "value": 4.8520303,
                    "description": "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
                    "details": [
                      { "value": 1, "description": "docFreq", "details": [] },
                      { "value": 191, "description": "docCount", "details": [] }
                    ]
                  },
                  {
                    "value": 0.99928534,
                    "description": "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
                    "details": [
                      { "value": 1, "description": "termFreq=1.0", "details": [] },
                      { "value": 1.2, "description": "parameter k1", "details": [] },
                      { "value": 0.75, "description": "parameter b", "details": [] },
                      { "value": 2.9947643, "description": "avgFieldLength", "details": [] },
                      { "value": 3, "description": "fieldLength", "details": [] }
                    ]
                  }
                ]
              }
            ]
          }
        ]
      }
    }
-
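The numbers in the explain output above can be re-derived by hand from the two formulas it prints. A minimal sketch in double precision; Lucene computes in single precision, so the last digits may differ slightly. The function names are my own:

```python
import math

K1, B = 1.2, 0.75  # "parameter k1" and "parameter b" from the explain output

def idf(doc_freq, doc_count):
    """log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))"""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

def tf_norm(freq, field_length, avg_field_length):
    """(freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength))"""
    return (freq * (K1 + 1)) / (
        freq + K1 * (1 - B + B * field_length / avg_field_length)
    )

def term_score(doc_freq, doc_count, freq, field_length, avg_field_length):
    """Per-term score: idf multiplied by tfNorm."""
    return idf(doc_freq, doc_count) * tf_norm(freq, field_length, avg_field_length)

# docFreq, docCount, termFreq, fieldLength, avgFieldLength as reported above.
street = term_score(63, 191, 1.0, 3, 2.9947643)
quentin = term_score(1, 191, 1.0, 3, 2.9947643)
total = street + quentin
print(street, quentin, total)
```

The rare term `quentin` (docFreq 1) contributes far more than the common term `street` (docFreq 63), and the two weights add up to the hit's `_score`.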
-
Customize relevance with Elasticsearch – Sravanthi Naraharisetti – Medium (2018-04-16) #ril
-
Ranking Evaluation API | Elasticsearch Reference [6.5] | Elastic #ril
-
Scores are not reproducible - Getting consistent scoring | Elasticsearch Reference [6.5] | Elastic Related to replicas #ril
-
Elasticsearch Query-Time Strategies and Techniques for Relevance: Part II - Compose Articles (2016-03-31) #ril
-
Tuning Relevance in Elasticsearch with Custom Boosting – Marco Bonzanini (2015-06-22) #ril
-
Index Boost | Elasticsearch Reference [6.5] | Elastic
-
Allows to configure different boost level per index when searching across more than one indices. This is very handy when hits coming from ONE INDEX MATTER MORE than hits coming from another index (think social graph where each user has an index).
-
You can also specify it as an array to control the order of boosts.
GET /_search { "indices_boost" : [ { "alias1" : 1.4 }, { "index*" : 1.3 } ] }
-
This is important when you use aliases or wildcard expression. If multiple matches are found, the first match will be used. For example, if an index is included in both `alias1` and `index*`, boost value of `1.4` is applied. An index is boosted at most once.
-
How should the boost value be chosen? With `{ "wiki": 5 }` the score jumped from 2.96 to 14.80; how does the boost relate to the score formula?? It works best together with an index-per-document-type strategy.
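The two observed scores are consistent with the index boost acting as a plain multiplier on the hit's score (2.96 × 5 = 14.80). A minimal check of that arithmetic; the multiplier reading is an inference from these two numbers, not something the passages quoted here state, and `apply_index_boost` is a hypothetical name:

```python
def apply_index_boost(score, boost):
    """Assumption: the index boost simply multiplies the unboosted score."""
    return score * boost

boosted = apply_index_boost(2.96, 5)
print(round(boosted, 2))  # 14.8
```

If that reading holds, a boost of 1.0 leaves scores unchanged, and boosts are best chosen relative to each other across the indices being searched.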
-
-
Boosting Query | Elasticsearch Reference [6.5] | Elastic #ril
-
Lucene’s Practical Scoring Function | Elasticsearch: The Definitive Guide [2.x] | Elastic #ril
- Elasticsearch: Building AutoComplete functionality – Hacker Noon (2017-12-31) #ril
- Elasticsearch: Using Completion Suggester to build AutoComplete (2018-01-26) #ril
- How to Build a “Did You Mean” Feature with Elasticsearch (2018-01-04) #ril
- Suggesters | Elasticsearch Reference [6.5] | Elastic #ril
- Term suggester | Elasticsearch Reference [6.5] | Elastic #ril
- Practical guide to grouping results with elasticsearch (2016-06-06) #ril
- Executing Aggregations | Elasticsearch Reference [6.5] | Elastic #ril
- Aggregations | Elasticsearch Reference [6.5] | Elastic #ril
- 5 Best Elasticsearch GUI clients as of 2018 - Slant Postman ranked first!? #ril
- The Sense UI | Sense Documentation | Elastic Is Sense UI a Kibana app? #ril
- ElasticHQ - Elasticsearch Management and Monitoring Looks professional, but leans toward system administration #ril
- mobz/elasticsearch-head: A web front end for an elastic search cluster #ril
- appbaseio/dejavu: The Missing Web UI for Elasticsearch: Import, browse and edit data with rich filters and query views, create search UIs visually. #ril
- jettro/elasticsearch-gui: An angularJS client for elasticsearch as a plugin #ril
- appbaseio/mirage: GUI for simplifying Elasticsearch Query DSL #ril
- ElasticSearch 1.7.2 Query DSL Builder #ril
- danpaz/bodybuilder: An elasticsearch query body builder #ril
- sudo-suhas/elastic-builder: A Node.js implementation of the elasticsearch Query DSL #ril
- KunihikoKido/atom-elasticsearch-client: elasticsearch-client #ril
- searchkit/searchkit: React UI components / widgets. The easiest way to build a great search experience with Elasticsearch. Provides front-end search UI components #ril