Elasticsearch / Query


Getting Started ?? {: #getting-started }

Exploring Your Data | Elasticsearch Reference [6.5] | Elastic

I’ve prepared a sample of fictitious JSON documents of customer bank account information. Each document has the following SCHEMA: (For the curious, this data was generated using www.json-generator.com/)

```
{
    "account_number": 0,
    "balance": 16623,
    "firstname": "Bradshaw",
    "lastname": "Mckenzie",
    "age": 29,
    "gender": "F",
    "address": "244 Columbus Place",
    "employer": "Euron",
    "email": "bradshawmckenzie@euron.com",
    "city": "Hobucken",
    "state": "CO"
}
```

You can download the sample dataset (accounts.json) from here. Extract it to our current directory and let’s load it into our cluster as follows: use the Bulk API to upload the 1,000 test documents in accounts (newline-delimited JSON).

```
$ curl -L -o accounts.json https://github.com/elastic/elasticsearch/blob/master/docs/src/test/resources/accounts.json?raw=true
$ head -4 accounts.json
{"index":{"_id":"1"}}
{"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"amberduke@pyrami.com","city":"Brogan","state":"IL"}
{"index":{"_id":"6"}}
{"account_number":6,"balance":5686,"firstname":"Hattie","lastname":"Bond","age":36,"gender":"M","address":"671 Bristol Street","employer":"Netagy","email":"hattiebond@netagy.com","city":"Dante","state":"TN"}

$ curl -H "Content-Type: application/json" -XPOST "localhost:9200/bank/_doc/_bulk?pretty&refresh" --data-binary "@accounts.json"
$ curl "localhost:9200/_cat/indices?v"

health status index     uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   bank      vA1OFhDGR9GkXDM1EVbZpg   5   1       1000            0    103.8kb        103.8kb
```
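The bulk file interleaves one action line and one source line per document, and the body has to end with a newline. A minimal Python sketch of building such a body from a list of dicts; `to_bulk_ndjson` is a hypothetical helper, and taking `_id` from `account_number` is an assumption based on the `head` output above. Nothing is sent to the cluster.

```python
import json

def to_bulk_ndjson(docs, id_field="account_number"):
    """Build a Bulk API body: an action line plus a source line per document."""
    lines = []
    for doc in docs:
        # Action/metadata line; the _id is taken from id_field, as in accounts.json
        lines.append(json.dumps({"index": {"_id": str(doc[id_field])}}, separators=(",", ":")))
        # Source line: the whole document on a single line
        lines.append(json.dumps(doc, separators=(",", ":")))
    # The Bulk API requires the body to end with a newline
    return "\n".join(lines) + "\n"

accounts = [
    {"account_number": 1, "balance": 39225, "firstname": "Amber", "lastname": "Duke"},
    {"account_number": 6, "balance": 5686, "firstname": "Hattie", "lastname": "Bond"},
]
print(to_bulk_ndjson(accounts), end="")
```

This is also why the curl command uses `--data-binary`: plain `-d` strips the newlines from the file, which breaks the line-oriented bulk format.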

The Search API | Elasticsearch Reference [6.5] | Elastic #ril
Introducing the Query Language | Elasticsearch Reference [6.5] | Elastic #ril
Executing Searches | Elasticsearch Reference [6.5] | Elastic #ril
Executing Filters | Elasticsearch Reference [6.5] | Elastic #ril

Search API ??

Search APIs | Elasticsearch Reference [6.5] | Elastic #ril
Search | Elasticsearch Reference [6.5] | Elastic #ril
- The search API allows you to execute a search query and get back SEARCH HITS that match the query. The query can either be provided using a simple QUERY STRING as a parameter (i.e., URI Search), or using a REQUEST BODY. Here, "search hits" means the matching documents (their count is hits.total).
- Or we can search across all available indices using _all: GET /_all/_search?q=tag:wow Omitting _all/ (GET /_search?q=tag:wow) also searches all indices.
URI Search | Elasticsearch Reference [6.5] | Elastic #ril
Request Body Search | Elasticsearch Reference [6.5] | Elastic #ril
- Both HTTP GET and HTTP POST can be used to execute search with body. Since not all clients support GET with body, POST is allowed as well. So POST sounds like the more sensible choice.
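The URI Search docs say the `q` parameter maps to the query_string query, so a URI search and a request-body search can carry the same query. A small sketch that only builds both forms with the standard library; `uri_search` and `body_search` are hypothetical helpers, and `http://localhost:9200` is assumed from the curl examples above.

```python
import json
from urllib.parse import urlencode

BASE = "http://localhost:9200"  # assumed local cluster, as in the curl examples

def uri_search(index, q):
    """URI Search: the query travels in the q= parameter."""
    path = f"/{index}/_search" if index else "/_search"  # no index = all indices
    return f"{BASE}{path}?{urlencode({'q': q})}"

def body_search(q):
    """Request Body Search: the same query as a query_string query (send with POST)."""
    return json.dumps({"query": {"query_string": {"query": q}}})

print(uri_search("_all", "tag:wow"))
print(uri_search(None, "tag:wow"))
print(body_search("tag:wow"))
```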
Query | Elasticsearch Reference [6.5] | Elastic #ril
Sort | Elasticsearch Reference [6.5] | Elastic #ril
Source filtering | Elasticsearch Reference [6.5] | Elastic #ril
Fields | Elasticsearch Reference [6.5] | Elastic #ril
Script Fields | Elasticsearch Reference [6.5] | Elastic #ril
Doc value Fields | Elasticsearch Reference [6.5] | Elastic #ril
Post filter | Elasticsearch Reference [6.5] | Elastic #ril
Rescoring | Elasticsearch Reference [6.5] | Elastic #ril
Scroll | Elasticsearch Reference [6.5] | Elastic #ril
Preference | Elasticsearch Reference [6.5] | Elastic #ril
Version | Elasticsearch Reference [6.5] | Elastic #ril
min_score | Elasticsearch Reference [6.5] | Elastic #ril

Query DSL

Query DSL | Elasticsearch Reference [6.5] | Elastic
- Elasticsearch provides a full Query DSL (Domain Specific Language) based on JSON to define queries. ... consisting of two types of CLAUSES: Leaf, Compound
- Leaf query clauses look for a PARTICULAR VALUE IN A PARTICULAR FIELD, such as the match, term or range queries. These queries can be used by themselves. In contrast, compound query clauses are used to combine other clauses.
- Compound query clauses WRAP other leaf or compound queries and are used to COMBINE MULTIPLE QUERIES in a logical fashion (such as the bool or dis_max query), or to alter their behaviour (such as the constant_score query). The phrase "multiple queries" may be misleading here? It is rather that one query is built from leaf/compound (query) clauses. The official docs say "XXX Query" everywhere, though; just keep in mind that it means a query clause.
- Query clauses behave differently depending on whether they are used in QUERY CONTEXT or FILTER CONTEXT.
Query and filter context | Elasticsearch Reference [6.5] | Elastic
- A query clause used in query context answers the question “How well does this document match this query clause?” Besides deciding whether or not the document matches, the query clause also calculates a _score representing HOW WELL the document matches, relative to other documents. Query context is in effect whenever a query clause is passed to a query parameter, such as the query parameter in the search API. Besides the match itself, a score is computed to tell degrees of match apart => strictly speaking, the clause does not so much "compute" the score as "affect" it: points are added or deducted depending on whether the condition holds.
- In filter context, a query clause answers the question “Does this document match this query clause?” The answer is a simple YES OR NO — no scores are calculated. Filter context is mostly used for filtering STRUCTURED DATA, e.g. Does this timestamp fall into the range 2015 to 2016? Is the status field set to "published"? Only match or no match, no degrees in between => strictly speaking, the clause "does not affect" the score.
- Frequently used filters will be CACHED AUTOMATICALLY by Elasticsearch, to speed up performance.
- Filter context is in effect whenever a query clause is passed to a filter parameter, such as the filter or must_not parameters in the bool query, the filter parameter in the constant_score query, or the filter aggregation. The docs usually call this out explicitly with "executed in filter context"; the bool query's filter and must_not parameters above both say so.
- Use query clauses in query context for conditions which SHOULD AFFECT the score of matching documents (i.e. how well does the document match), and use all other query clauses in filter context. Use filter context for plain filtering; use query context only when you then need to tell degrees of match apart.
- Below is an example of query clauses being used in query and filter context in the search API. This query will match documents where all of the following conditions are met:
```
GET /_search
{
  "query": { <-- query context
    "bool": {
      "must": [
        { "match": { "title":   "Search"        }},
        { "match": { "content": "Elasticsearch" }}
      ],
      "filter": [ <-- filter context
        { "term":  { "status": "published" }},
        { "range": { "publish_date": { "gte": "2015-01-01" }}}
      ]
    }
  }
}
```
  The bool and two match clauses are used in query context, which means that they are USED TO SCORE how well each document matches. The term and range clauses are used in filter context. They will filter out documents which do not match, but they will NOT AFFECT THE SCORE for matching documents. Starting from the outermost query, everything below is in query context by default until a filter context is entered, so the bool and match clauses are in query context, while the term and range clauses are in filter context. must and filter are not query clauses themselves, just parameters of the bool clause.
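To make the "affects the score vs. only yes/no" distinction concrete, here is a toy model in Python. It is not how Elasticsearch scores: `toy_bool`, `contains` and `equals` are hypothetical helpers, and the "score" is a plain word count standing in for BM25.

```python
def contains(field, word):
    """Hypothetical match-like clause: score = how often word occurs in the field."""
    return lambda doc: doc[field].lower().split().count(word.lower())

def equals(field, value):
    """Hypothetical term-like clause used as a filter: a plain yes/no."""
    return lambda doc: doc[field] == value

def toy_bool(docs, must, filter_):
    """Toy bool query: filter clauses only gate; must clauses gate AND score."""
    hits = []
    for doc in docs:
        # Filter context: yes/no only, nothing is added to the score
        if not all(f(doc) for f in filter_):
            continue
        # Query context: every must clause has to match and adds to the score
        scores = [q(doc) for q in must]
        if all(s > 0 for s in scores):
            hits.append((doc["id"], sum(scores)))
    # Highest score first, like search hits sorted by _score
    return sorted(hits, key=lambda h: h[1], reverse=True)

docs = [
    {"id": 1, "title": "Search search tips", "status": "published"},
    {"id": 2, "title": "Search basics", "status": "published"},
    {"id": 3, "title": "Search search search", "status": "draft"},
]
print(toy_bool(docs, must=[contains("title", "search")], filter_=[equals("status", "published")]))
print(toy_bool(docs, must=[contains("title", "search")], filter_=[]))
```

Dropping the filter lets doc 3 back in, but the scores of docs 1 and 2 stay the same: the filter decides membership only.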
Match All Query | Elasticsearch Reference [6.5] | Elastic Seems useful only for testing queries; where would it be used in practice??
- The most simple query, which matches all documents, giving them all a _score of 1.0.
```
GET /_search
{
    "query": {
        "match_all": {}
    }
}
```
- The _score can be changed with the boost parameter: When would you use this?
```
GET /_search
{
    "query": {
        "match_all": { "boost" : 1.2 }
    }
}
```
- Match None Query - This is the inverse of the match_all query, which matches no documents. When would you use this?
```
GET /_search
{
    "query": {
        "match_none": {}
    }
}
```

Full Text Query ??

Full text queries | Elasticsearch Reference [6.5] | Elastic #ril
Match Query | Elasticsearch Reference [6.5] | Elastic #ril
- match queries accept text/numerics/dates, analyzes them, and constructs a query. For example:
```
GET /_search
{
    "query": {
        "match" : {
            "message" : "this is a test"
        }
    }
}
```
  Note, message is the name of a field, you can substitute the name of any field instead. Only one field per match query, though; for several fields use multi_match (see Multi Match Query).
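A rough sketch of "analyzes them, and constructs a query": the text is split into terms, and with the default operator (or) any term may match, so the result behaves like a bool should of term clauses. Assumptions: `simple_analyze` is only a crude stand-in for the standard analyzer (lowercase, split on non-word characters), and `match_to_bool` is a hypothetical helper, not Elasticsearch's actual query builder.

```python
import json
import re

def simple_analyze(text):
    """Crude stand-in for the standard analyzer: lowercase, split on non-word chars."""
    return [t for t in re.split(r"\W+", text.lower()) if t]

def match_to_bool(field, text, operator="or"):
    """Approximate what a match query expands to: one term clause per analyzed term."""
    clauses = [{"term": {field: t}} for t in simple_analyze(text)]
    # operator "or" -> any term may match (should); "and" -> all terms must match
    occur = "should" if operator == "or" else "must"
    return {"bool": {occur: clauses}}

print(json.dumps(match_to_bool("message", "this is a test")))
print(json.dumps(match_to_bool("address", "street quentin", operator="and")))
```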
Match Phrase Query | Elasticsearch Reference [6.5] | Elastic #ril
Match Phrase Prefix Query | Elasticsearch Reference [6.5] | Elastic #ril
Multi Match Query | Elasticsearch Reference [6.5] | Elastic #ril
Common Terms Query | Elasticsearch Reference [6.5] | Elastic #ril
Query String Query | Elasticsearch Reference [5.4] | Elastic #ril
- A query that uses a QUERY PARSER in order to parse its content. The query_string query parses the input and splits text around operators. Each TEXTUAL PART is ANALYZED INDEPENDENTLY of each other. For instance the following query: A "textual part" is what remains once the operators are removed.
```
GET /_search
{
    "query": {
        "query_string" : {
            "default_field" : "content",
            "query" : "(new york city) OR (big apple)"
        }
    }
}
```
  will be split into new york city and big apple and each part is then analyzed independently by the analyzer configured for the field.
- Whitespaces are not considered operators, this means that new york city will be passed "as is" to the analyzer configured for the field. If the field is a keyword field the analyzer will create a SINGLE TERM new york city and the QUERY BUILDER will use this term in the query. If you want to query each term separately you need to add explicit operators around the terms (e.g. new AND york AND city). Is that because of the parentheses?? The default_operator docs say "with a default operator of OR, the query capital of Hungary is translated to capital OR of OR Hungary".
- When multiple fields are provided it is also possible to modify how the different field queries are combined inside each textual part using the type parameter. The possible modes are described here and the default is best_fields. ??
- The query_string query can also run against multiple fields. Fields can be provided via the fields parameter. The idea of running the query_string query against multiple fields is to expand each query term to an OR clause like this: field1:query_term OR field2:query_term | ... Somewhat similar to Multi Match Query??
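A toy version of "splits text around operators, then analyzes each textual part independently". It only knows `(`, `)`, `AND`, `OR` and `NOT` (the real Lucene parser also handles field:, quotes, +/- and more), and `textual_parts` and both analyzer functions are hypothetical stand-ins. Whitespace is not an operator, so consecutive words stay in one part, and an unbalanced parenthesis raises an error, much like query_string returning HTTP 400.

```python
import re

OPERATORS = {"AND", "OR", "NOT"}

def textual_parts(query):
    """Split a query_string-style query around operators and parentheses (toy)."""
    depth = 0
    parts, current = [], []
    for tok in re.findall(r"\(|\)|[^\s()]+", query):
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced ')'")
        if tok in ("(", ")") or tok in OPERATORS:
            # An operator or parenthesis ends the current textual part
            if current:
                parts.append(" ".join(current))
                current = []
        else:
            current.append(tok)  # whitespace is NOT an operator: keep words together
    if depth != 0:
        raise ValueError("unbalanced '('")
    if current:
        parts.append(" ".join(current))
    return parts

def keyword_analyze(part):
    """keyword analyzer: the whole part becomes a single term."""
    return [part]

def standard_like_analyze(part):
    """Rough standard-analyzer stand-in: lowercase, split on whitespace."""
    return part.lower().split()

for part in textual_parts("(new york city) OR (big apple)"):
    print(part, "->", keyword_analyze(part), standard_like_analyze(part))
```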
Simple Query String Query | Elasticsearch Reference [5.4] | Elastic #ril
- A query that uses the SimpleQueryParser to parse its context. Unlike the regular query_string query, the simple_query_string query will NEVER THROW AN EXCEPTION, and discards invalid parts of the query. So the query_string query does throw errors: for example, (new york city OR (big apple) (a ) is missing after new york city) returns HTTP 400:
```
{
  "error": {
    "root_cause": [
      {
        "type": "query_shard_exception",
        "reason": "Failed to parse query [(new york city OR (big apple)]",
        "index_uuid": "yPMioLAkRpaDWxO5aAWvog",
        "index": "myindex-1"
      },
      {
        "type": "query_shard_exception",
        "reason": "Failed to parse query [(new york city OR (big apple)]",
        "index_uuid": "C9qPMmhHQSeCCOfTKivmxw",
        "index": "myindex-2"
      }
    ]
  },
  ...
}
```
- Simple Query String Syntax uses + and | for AND and OR, unlike Query String Query??

Phrase Matching ??

Starts-With Phrase Matching | Elastic (2013-02-04) #ril

Term Level Query ??

Term level queries | Elasticsearch Reference [6.5] | Elastic #ril
- While the full text queries will analyze the query string before executing, the term-level queries operate on the exact terms that are stored in the inverted index, and will normalize terms before executing only for keyword fields with normalizer property.
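A toy inverted index showing why a term-level query can miss what a full text query finds: the text field stores analyzed (lowercased) terms, the term query looks its value up as-is, and the match query analyzes first. The addresses come from the sample accounts; `analyze`, `term_query` and `match_query` are hypothetical stand-ins, and the analyzer is only an approximation of the standard analyzer.

```python
import re
from collections import defaultdict

def analyze(text):
    """Rough standard-analyzer stand-in: lowercase, split on non-word chars."""
    return [t for t in re.split(r"\W+", text.lower()) if t]

# Toy inverted index for an analyzed text field: term -> set of doc ids
docs = {32: "702 Quentin Street", 1: "880 Holmes Lane"}
index = defaultdict(set)
for doc_id, address in docs.items():
    for term in analyze(address):
        index[term].add(doc_id)

def term_query(value):
    """Term-level: look the value up exactly as given, no analysis."""
    return index.get(value, set())

def match_query(text):
    """Full text: analyze first, then OR together the per-term lookups."""
    hits = set()
    for term in analyze(text):
        hits |= index.get(term, set())
    return hits

print(term_query("Quentin"))   # set(): the index only holds "quentin"
print(term_query("quentin"))   # {32}
print(match_query("Quentin"))  # {32}
```

This is why term-level queries are normally pointed at keyword (non-analyzed) fields.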
Keyword datatype | Elasticsearch Reference [6.5] | Elastic #ril

Scoring / Ranking / Relevance / Similarity {: #scoring }

How scoring works in Elasticsearch - Compose Articles (2016-02-18) #ril
What Is Relevance? | Elasticsearch: The Definitive Guide [2.x] | Elastic #ril
Theory Behind Relevance Scoring | Elasticsearch: The Definitive Guide [2.x] | Elastic #ril
Relevancy looks wrong - Getting consistent scoring | Elasticsearch Reference [6.5] | Elastic #ril
- If you notice that two documents with the same content get different scores or that an exact match is not ranked first, then the issue might be related to SHARDING. By default, Elasticsearch makes each shard responsible for producing ITS OWN SCORES.
- However since index statistics are an important contributor to the scores, this only works well if shards have SIMILAR INDEX STATISTICS. The ASSUMPTION is that since documents are routed EVENLY to shards by default, then index statistics should be very similar and scoring would work as expected. What is "routed evenly" based on?? By default a document goes to shard hash(_routing) % number_of_primary_shards, where _routing defaults to the document _id.
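A sketch of the documented routing formula `shard = hash(_routing) % number_of_primary_shards`. Elasticsearch hashes with Murmur3; this sketch uses `zlib.crc32` purely as a stand-in, so the shard numbers will not match a real cluster. It also assumes the bank sample uses ids 0 to 999 and relies on the 6.x default of 5 primary shards. `shard_for` is a hypothetical helper.

```python
import zlib
from collections import Counter

NUM_PRIMARY_SHARDS = 5  # the 6.x default

def shard_for(routing):
    """shard = hash(_routing) % number_of_primary_shards.

    Stand-in hash: Elasticsearch uses Murmur3, not CRC32, so real
    assignments differ; only the shape of the scheme is illustrated.
    """
    return zlib.crc32(routing.encode("utf-8")) % NUM_PRIMARY_SHARDS

# _routing defaults to _id; assume the 1000 sample docs have ids "0".."999"
counts = Counter(shard_for(str(i)) for i in range(1000))
print(sorted(counts.items()))
```

The point is that no coordination is needed: any node can compute where a document lives, and with many ids the per-shard counts come out close but not identical (compare docCount 191 in the explain output below).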
Relevance Is Broken! | Elasticsearch: The Definitive Guide [2.x] | Elastic #ril
- Don’t use dfs_query_then_fetch in production. It really isn’t required. Just HAVING ENOUGH DATA will ensure that your term frequencies are well distributed. There is no reason to add this extra DFS step to every query that you run. If it shouldn't be used in production, why does the parameter exist at all!! Still, with little data early on this really is a problem; start with fewer shards and split once the data has grown??
Search Type | Elasticsearch Reference [6.5] | Elastic #ril
Understanding "Query Then Fetch" vs "DFS Query Then Fetch" | Elastic (2013-02-10) #ril
Practical BM25 - Part 1: How Shards Affect Relevance Scoring in Elasticsearch | Elastic (2018-04-19) #ril
Practical BM25 - Part 2: The BM25 Algorithm and its Variables | Elastic (2018-04-19) #ril
Practical BM25 - Part 3: Considerations for Picking b and k1 in Elasticsearch | Elastic (2018-04-19) #ril
Similarity module | Elasticsearch Reference [6.5] | Elastic #ril
TFIDFSimilarity (Lucene 6.0.1 API)
- docFreq - the number of documents which contain the term
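- docCount - the total number of documents in the collection?? (In Lucene's CollectionStatistics, docCount is the number of documents that have at least one term for the field.)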
BM25 docFreq, docCount and avgFieldLength seems to be wrong · Issue #24429 · elastic/elasticsearch (2017-05-02) jimczi: (member)
- Don't forget that ES creates an index with 5 shards by default and that docFreq and docCount are computed PER SHARD.
- You can create an index with 1 shard or use the dfs mode to compute distributed stats: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-search-type.html#dfs-query-then-fetch Wouldn't merging everything into 1 shard cause problems later??
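A small sketch of why per-shard statistics matter, using the idf formula printed in the explain output (`log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))`). The first shard's numbers are from the explain output; the second shard's numbers are made up for illustration. Summing the shard stats mimics what the DFS step gathers before scoring.

```python
import math

def bm25_idf(doc_freq, doc_count):
    """idf as printed by explain: log(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

# (docFreq, docCount) for the same term "street" on two shards;
# shard 0 is from the explain output, shard 1 is hypothetical
shards = {"shard 0": (63, 191), "shard 1": (20, 210)}
for name, (df, n) in shards.items():
    print(name, round(bm25_idf(df, n), 4))

# dfs_query_then_fetch first collects the stats from every shard and scores with the totals
total_df = sum(df for df, _ in shards.values())
total_n = sum(n for _, n in shards.values())
print("global", round(bm25_idf(total_df, total_n), 4))
```

The same term gets a noticeably different idf on each shard, so identical documents on different shards can score differently.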
fieldLength not an integer · Issue #25916 · elastic/elasticsearch #ril

Explain | Elasticsearch Reference [6.5] | Elastic #ril

Enables explanation for each hit on how its score was computed.

```
GET /_search
{
    "explain": true, <-- just add this
    "query" : {
        "term" : { "user" : "kimchy" }
    }
}
```

Following Explain | Elasticsearch Reference [6.5] | Elastic, with the bank sample data loaded above as an example:

```
GET bank/_doc/32

{
  "_index": "bank",
  "_type": "_doc",
  "_id": "32",
  "_version": 1,
  "found": true,
  "_source": {
    "account_number": 32,
    "balance": 48086,
    "firstname": "Dillard",
    "lastname": "Mcpherson",
    "age": 34,
    "gender": "F",
    "address": "702 Quentin Street", <-- first check the document that will rank #1 below
    "employer": "Quailcom",
    "email": "dillardmcpherson@quailcom.com",
    "city": "Veguita",
    "state": "IN"
  }
}

---

POST bank/_search
{
  "explain": true,
  "query": {
    "match": {
      "address": {
        "query": "street quentin",
        "operator": "and"
      }
    }
  }
}

---

{
  ...
  "_explanation": {
    "value": 5.9542274, <-- address:street (1.1056647) + address:quentin (4.8485627)
    "description": "sum of:",
    "details": [
      {
        "value": 1.1056647,
        "description": "weight(address:street in 0) [PerFieldSimilarity], result of:",
        "details": [
          {
            "value": 1.1056647, <-- idf (1.1064554) x tfNorm (0.99928534)
            "description": "score(doc=0,freq=1.0 = termFreq=1.0 ), product of:",
            "details": [
              {
                "value": 1.1064554,
                "description": "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
                "details": [
                  {
                    "value": 63, <-- 63 documents (in this shard) contain "street", so it is not very discriminative
                    "description": "docFreq",
                    "details": []
                  },
                  {
                    "value": 191, <-- the shard holding this document has 191 documents (5 shards by default)
                    "description": "docCount",
                    "details": []
                  }
                ]
              },
              {
                "value": 0.99928534,
                "description": "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
                "details": [
                  {
                    "value": 1,
                    "description": "termFreq=1.0",
                    "details": []
                  },
                  {
                    "value": 1.2,
                    "description": "parameter k1",
                    "details": []
                  },
                  {
                    "value": 0.75,
                    "description": "parameter b",
                    "details": []
                  },
                  {
                    "value": 2.9947643,
                    "description": "avgFieldLength",
                    "details": []
                  },
                  {
                    "value": 3, <-- number of terms in the field: 702, quentin, street
                    "description": "fieldLength",
                    "details": []
                  }
                ]
              }
            ]
          }
        ]
      },
      {
        "value": 4.8485627,
        "description": "weight(address:quentin in 0) [PerFieldSimilarity], result of:",
        "details": [
          {
            "value": 4.8485627,
            "description": "score(doc=0,freq=1.0 = termFreq=1.0 ), product of:",
            "details": [
              {
                "value": 4.8520303,
                "description": "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
                "details": [
                  {
                    "value": 1,
                    "description": "docFreq",
                    "details": []
                  },
                  {
                    "value": 191,
                    "description": "docCount",
                    "details": []
                  }
                ]
              },
              {
                "value": 0.99928534,
                "description": "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
                "details": [
                  {
                    "value": 1,
                    "description": "termFreq=1.0",
                    "details": []
                  },
                  {
                    "value": 1.2,
                    "description": "parameter k1",
                    "details": []
                  },
                  {
                    "value": 0.75,
                    "description": "parameter b",
                    "details": []
                  },
                  {
                    "value": 2.9947643,
                    "description": "avgFieldLength",
                    "details": []
                  },
                  {
                    "value": 3,
                    "description": "fieldLength",
                    "details": []
                  }
                ]
              }
            ]
          }
        ]
      }
    ]
  }
}
```
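The explain numbers can be recomputed by hand with the formulas printed in the `description` fields. A Python sketch that plugs in the shard-local statistics copied from the output above; small differences in the last digits would come from Lucene computing in 32-bit floats.

```python
import math

K1, B = 1.2, 0.75  # "parameter k1" and "parameter b" from the explain output

def idf(doc_freq, doc_count):
    """log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))"""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

def tf_norm(freq, field_length, avg_field_length, k1=K1, b=B):
    """(freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength))"""
    return (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * field_length / avg_field_length))

# Statistics copied from the explain output (all per shard)
AVG_LEN, FIELD_LEN, DOC_COUNT = 2.9947643, 3, 191
street = idf(63, DOC_COUNT) * tf_norm(1, FIELD_LEN, AVG_LEN)
quentin = idf(1, DOC_COUNT) * tf_norm(1, FIELD_LEN, AVG_LEN)
print(street, quentin, street + quentin)
```

"quentin" dominates because only 1 document in the shard contains it (high idf), while "street" appears in 63; both share the same tfNorm because they occur once in the same 3-term field.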

Explain API | Elasticsearch Reference [6.5] | Elastic #ril
Customize relevance with Elasticsearch – Sravanthi Naraharisetti – Medium (2018-04-16) #ril
Ranking Evaluation API | Elasticsearch Reference [6.5] | Elastic #ril
Scores are not reproducible - Getting consistent scoring | Elasticsearch Reference [6.5] | Elastic Related to replicas. #ril

Boosting ??

Elasticsearch Query-Time Strategies and Techniques for Relevance: Part II - Compose Articles (2016-03-31) #ril
Tuning Relevance in Elasticsearch with Custom Boosting – Marco Bonzanini (2015-06-22) #ril
Index Boost | Elasticsearch Reference [6.5] | Elastic
- Allows to configure different boost level per index when searching across more than one indices. This is very handy when hits coming from ONE INDEX MATTER MORE than hits coming from another index (think social graph where each user has an index).
- You can also specify it as an array to control the order of boosts.
```
GET /_search
{
    "indices_boost" : [
        { "alias1" : 1.4 },
        { "index*" : 1.3 }
    ]
}
```
- This is important when you use aliases or wildcard expression. If multiple matches are found, the first match will be used. For example, if an index is included in both alias1 and index*, boost value of 1.4 is applied. So an index is boosted at most once.
- How should the boost value be chosen? With { "wiki": 5 } the score jumped from 2.96 to 14.80 (exactly 2.96 × 5, so it looks like a plain multiplier); how does it relate to the scoring formula?? It works best combined with an index-per-document-type strategy.
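A sketch of the "first match wins" rule for `indices_boost`, plus the multiplier reading of the 2.96 → 14.80 observation. Assumptions: the alias table is hypothetical, wildcards are approximated with `fnmatch`, an index with no matching entry keeps a boost of 1.0, and the boost is assumed to multiply the score. `resolve_boost` and `boosted_score` are hypothetical helpers, not an Elasticsearch API.

```python
from fnmatch import fnmatch

# Hypothetical alias table: alias name -> indices it points to
ALIASES = {"alias1": {"index-a"}}

def resolve_boost(index, indices_boost, aliases=ALIASES):
    """Return the boost for an index; the first matching entry wins."""
    for entry in indices_boost:
        (name, boost), = entry.items()  # each entry is a one-key object
        if index in aliases.get(name, set()) or fnmatch(index, name):
            return boost
    return 1.0  # assumption: no matching entry leaves the score unchanged

def boosted_score(score, index, indices_boost):
    """Assumes the boost simply multiplies the score (2.96 x 5 = 14.80 above)."""
    return score * resolve_boost(index, indices_boost)

indices_boost = [{"alias1": 1.4}, {"index*": 1.3}]
print(resolve_boost("index-a", indices_boost))  # 1.4: alias1 is listed first
print(resolve_boost("index-b", indices_boost))  # 1.3: only index* matches
print(boosted_score(2.96, "wiki", [{"wiki": 5}]))
```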
boost | Elasticsearch Reference [6.5] | Elastic #ril
Boosting Query | Elasticsearch Reference [6.5] | Elastic #ril
Lucene’s Practical Scoring Function | Elasticsearch: The Definitive Guide [2.x] | Elastic #ril

Pagination ??

From / Size | Elasticsearch Reference [6.5] | Elastic #ril

Highlighting ??

Highlighting | Elasticsearch Reference [6.5] | Elastic #ril

Suggestion ??

Elasticsearch: Building AutoComplete functionality – Hacker Noon (2017-12-31) #ril
Elasticsearch: Using Completion Suggester to build AutoComplete (2018-01-26) #ril
How to Build a “Did You Mean” Feature with Elasticsearch (2018-01-04) #ril
Suggesters | Elasticsearch Reference [6.5] | Elastic #ril
Term suggester | Elasticsearch Reference [6.5] | Elastic #ril

Aggregation ??

Practical guide to grouping results with elasticsearch (2016-06-06) #ril
Executing Aggregations | Elasticsearch Reference [6.5] | Elastic #ril
Aggregations | Elasticsearch Reference [6.5] | Elastic #ril

Security ??

Setting Up Field and Document Level Security | X-Pack for the Elastic Stack [6.2] | Elastic #ril

Tools {: #tools }

5 Best Elasticsearch GUI clients as of 2018 - Slant Postman ranks first!? #ril
The Sense UI | Sense Documentation | Elastic Is Sense UI a Kibana app? #ril
ElasticHQ - Elasticsearch Management and Monitoring Looks very professional, but leans toward system administration #ril
mobz/elasticsearch-head: A web front end for an elastic search cluster #ril
appbaseio/dejavu: The Missing Web UI for Elasticsearch: Import, browse and edit data with rich filters and query views, create search UIs visually. #ril
jettro/elasticsearch-gui: An angularJS client for elasticsearch as a plugin #ril
appbaseio/mirage: GUI for simplifying Elasticsearch Query DSL #ril
ElasticSearch 1.7.2 Query DSL Builder #ril
danpaz/bodybuilder: An elasticsearch query body builder #ril
sudo-suhas/elastic-builder: A Node.js implementation of the elasticsearch Query DSL #ril
KunihikoKido/atom-elasticsearch-client: elasticsearch-client #ril
searchkit/searchkit: React UI components / widgets. The easiest way to build a great search experience with Elasticsearch. Provides front-end search UI components #ril
