Elasticsearch / Query

Getting Started ?? {: #getting-started }

Search API ??

Query DSL

  • Query DSL | Elasticsearch Reference [6.5] | Elastic

    • Elasticsearch provides a full Query DSL (Domain Specific Language) based on JSON to define queries. ... consisting of two types of CLAUSES: Leaf, Compound
    • Leaf query clauses look for a PARTICULAR VALUE IN A PARTICULAR FIELD, such as the match, term or range queries. These queries can be used by themselves. By contrast, compound query clauses are used to combine other clauses.
    • Compound query clauses WRAP other leaf or compound queries and are used to COMBINE MULTIPLE QUERIES in a logical fashion (such as the bool or dis_max query), or to alter their behaviour (such as the constant_score query). The phrase "multiple queries" may be misleading here? It is rather that a single query is composed of leaf/compound (query) clauses. The official docs say "XXX Query" everywhere, though, so it is enough to know that this refers to a query clause.
    • Query clauses behave differently depending on whether they are used in QUERY CONTEXT or FILTER CONTEXT.
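To make the leaf/compound distinction concrete, here is a minimal sketch that builds the request body as plain Python dicts. The field names (title, publish_date) and the variable names are assumptions for illustration; only the clause names (match, range, bool, constant_score) come from the notes above.

```python
import json

# Leaf clauses: each one looks for a value in a single field.
# The field names (title, publish_date) are illustrative assumptions.
leaf_match = {"match": {"title": "search"}}
leaf_range = {"range": {"publish_date": {"gte": "2015-01-01"}}}

# Compound clause: bool wraps leaf clauses and combines them logically.
compound_bool = {"bool": {"must": [leaf_match], "filter": [leaf_range]}}

# Compound clauses can wrap other compound clauses as well;
# constant_score is the "alter their behaviour" kind of compound clause.
compound_nested = {"constant_score": {"filter": compound_bool}}

# The whole tree is still ONE query, passed under the "query" key.
search_body = {"query": compound_nested}
print(json.dumps(search_body, indent=2))
```

However deep the nesting goes, the search API still receives a single query, which is the point made in the note about "multiple queries".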
  • Query and filter context | Elasticsearch Reference [6.5] | Elastic

    • A query clause used in query context answers the question “How well does this document match this query clause?” Besides deciding whether or not the document matches, the query clause also calculates a _score representing HOW WELL the document matches, relative to other documents. Query context is in effect whenever a query clause is passed to a query parameter, such as the query parameter in the search API. Besides the match condition, a score is also worked out to distinguish degrees of matching => strictly speaking, a clause does not "calculate" the score by itself but "affects" it: each condition adds or removes a bit of score depending on whether it holds.

    • In filter context, a query clause answers the question “Does this document match this query clause?” The answer is a simple YES OR NO; no scores are calculated. Filter context is mostly used for filtering STRUCTURED DATA, e.g. Does this timestamp fall into the range 2015 to 2016? Is the status field set to "published"? Only match or no match, with no degrees => strictly speaking, the clause "does not affect" the score.

    • Frequently used filters will be CACHED AUTOMATICALLY by Elasticsearch, to speed up performance.

    • Filter context is in effect whenever a query clause is passed to a filter parameter, such as the filter or must_not parameters in the bool query, the filter parameter in the constant_score query, or the filter aggregation. The docs usually call this out explicitly with "executed in filter context"; both the filter and must_not parameters of bool mentioned above carry that note.

    • Use query clauses in query context for conditions which SHOULD AFFECT the score of matching documents (i.e. how well does the document match), and use all other query clauses in filter context. Use filter context for plain filtering; use query context only when degrees of matching need to be told apart.

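    • Below is an example of query clauses being used in query and filter context in the search API. This query will match documents where all of the following conditions are met: the title field contains the word search, the content field contains the word elasticsearch, the status field contains the exact word published, and the publish_date field contains a date from 1 Jan 2015 onwards.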

      GET /_search
      {
        "query": { <-- query context
          "bool": {
            "must": [
              { "match": { "title":   "Search"        }},
              { "match": { "content": "Elasticsearch" }}
            ],
            "filter": [ <-- filter context
              { "term":  { "status": "published" }},
              { "range": { "publish_date": { "gte": "2015-01-01" }}}
            ]
          }
        }
      }
      

      The bool and two match clauses are used in query context, which means that they are USED TO SCORE how well each document matches. The term and range clauses are used in filter context. They will filter out documents which do not match, but they will NOT AFFECT THE SCORE for matching documents. Starting from the outermost query, everything below is in query context by default until a filter context is entered, so the bool and match clauses are in query context while the term and range clauses are in filter context. As for must and filter, they are not query clauses, just parameters of the bool clause.
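The split between the two contexts can be sketched as a small builder that keeps scoring clauses and yes/no clauses apart. The helper name build_search_body and its parameters are hypothetical; the clauses themselves are the ones from the example above.

```python
import json

def build_search_body(scoring_clauses, filter_clauses):
    """Hypothetical helper: scoring clauses go under must (query context),
    yes/no clauses go under filter (filter context)."""
    return {
        "query": {
            "bool": {
                "must": list(scoring_clauses),
                "filter": list(filter_clauses),
            }
        }
    }

body = build_search_body(
    scoring_clauses=[
        {"match": {"title": "Search"}},
        {"match": {"content": "Elasticsearch"}},
    ],
    filter_clauses=[
        {"term": {"status": "published"}},
        {"range": {"publish_date": {"gte": "2015-01-01"}}},
    ],
)
print(json.dumps(body, indent=2))
```

Keeping the two lists separate at the call site makes it a deliberate choice whether a condition should influence _score or only include/exclude documents.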

  • Match All Query | Elasticsearch Reference [6.5] | Elastic Feels like it is only useful for testing queries; where would it be used in practice??

    • The most simple query, which matches all documents, giving them all a _score of 1.0.

      GET /_search
      {
          "query": {
              "match_all": {}
          }
      }
      
    • The _score can be changed with the boost parameter: When to use it?

      GET /_search
      {
          "query": {
              "match_all": { "boost" : 1.2 }
          }
      }
      
    • Match None Query - This is the inverse of the match_all query, which matches no documents. When to use it?

      GET /_search
      {
          "query": {
              "match_none": {}
          }
      }
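
One possible answer to the "when to use it?" questions, offered as an assumption about typical application code rather than something the reference states: match_all can serve as the fallback when the user typed no search text, and match_none as a short-circuit when nothing may be returned at all. The helper name and the field names (content, status) are hypothetical.

```python
import json

def build_search_body(user_text, allowed_statuses):
    """Hypothetical helper; the field names content and status are made up."""
    if not allowed_statuses:
        # Nothing may be shown: short-circuit with a query that matches no documents.
        return {"query": {"match_none": {}}}
    if user_text:
        scoring_clause = {"match": {"content": user_text}}
    else:
        # No search text: match everything and let the filter narrow it down.
        scoring_clause = {"match_all": {}}
    return {
        "query": {
            "bool": {
                "must": [scoring_clause],
                "filter": [{"terms": {"status": sorted(allowed_statuses)}}],
            }
        }
    }

print(json.dumps(build_search_body("", ["published"]), indent=2))
```

With this shape the calling code always sends a structurally similar request, whether or not there is any search text.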
      

Full Text Query ??

  • Full text queries | Elasticsearch Reference [6.5] | Elastic #ril

  • Match Query | Elasticsearch Reference [6.5] | Elastic #ril

    • match queries accept text/numerics/dates, analyzes them, and constructs a query. For example:

      GET /_search
      {
          "query": {
              "match" : {
                  "message" : "this is a test"
              }
          }
      }
      

      Note, message is the name of a field, you can substitute the name of any field instead. But only a single field??
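As far as I understand the reference, a match clause targets exactly one field, and multi_match is the clause for running the same text against several fields. A minimal sketch of both request bodies follows; the helper names and the subject field are assumptions for illustration.

```python
import json

def match_clause(field, text, operator="or"):
    """Long form of match: one field, with options next to the query text."""
    return {"match": {field: {"query": text, "operator": operator}}}

def multi_match_clause(fields, text):
    """multi_match runs the same analyzed text against several fields."""
    return {"multi_match": {"query": text, "fields": list(fields)}}

# "message" comes from the example above; "subject" is a made-up second field.
single = match_clause("message", "this is a test", operator="and")
several = multi_match_clause(["subject", "message"], "this is a test")

print(json.dumps({"query": single}, indent=2))
print(json.dumps({"query": several}, indent=2))
```

The long form of match is also where options such as operator go; the short form shown in the reference example takes only the text.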

  • Match Phrase Query | Elasticsearch Reference [6.5] | Elastic #ril

  • Match Phrase Prefix Query | Elasticsearch Reference [6.5] | Elastic #ril

  • Multi Match Query | Elasticsearch Reference [6.5] | Elastic #ril

  • Common Terms Query | Elasticsearch Reference [6.5] | Elastic #ril

  • Query String Query | Elasticsearch Reference [5.4] | Elastic #ril

    • A query that uses a QUERY PARSER in order to parse its content. The query_string query parses the input and splits text around operators. Each TEXTUAL PART is ANALYZED INDEPENDENTLY of each other. For instance the following query: Here "textual part" means what is left once the operators are removed.

      GET /_search
      {
          "query": {
              "query_string" : {
                  "default_field" : "content",
                  "query" : "(new york city) OR (big apple)"
              }
          }
      }
      

      will be split into new york city and big apple and each part is then analyzed independently by the analyzer configured for the field.

    • Whitespaces are not considered operators, this means that new york city will be passed "as is" to the analyzer configured for the field. If the field is a keyword field the analyzer will create a SINGLE TERM new york city and the QUERY BUILDER will use this term in the query. If you want to query each term separately you need to add explicit operators around the terms (e.g. new AND york AND city). Is that because the terms are inside parentheses?? Asking because the default_operator docs say "with a default operator of OR, the query capital of Hungary is translated to capital OR of OR Hungary".

    • When multiple fields are provided it is also possible to modify how the different field queries are combined inside each textual part using the type parameter. The possible modes are described here and the default is best_fields. ??

    • The query_string query can also run against multiple fields. Fields can be provided via the fields parameter. The idea of running the query_string query against multiple fields is to expand each query term to an OR clause like this: field1:query_term OR field2:query_term | ... Somewhat like the Multi Match Query??
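The two variations discussed above (explicit operators between terms, and the fields parameter) can be sketched as request bodies. The helper join_terms is hypothetical, and the title field is an assumption; default_field, fields and query are the parameters named in the notes.

```python
import json

def join_terms(terms, operator="AND"):
    """Hypothetical helper: put an explicit operator between every term."""
    return f" {operator} ".join(terms)

# Explicit operators, so each term is queried separately.
explicit = {
    "query_string": {
        "default_field": "content",
        "query": join_terms(["new", "york", "city"]),
    }
}

# Same parser, but run against several fields via the fields parameter.
multi_field = {
    "query_string": {
        "fields": ["title", "content"],
        "query": "(new york city) OR (big apple)",
    }
}

print(json.dumps({"query": explicit}, indent=2))
print(json.dumps({"query": multi_field}, indent=2))
```

Building the operator string in code avoids relying on how whitespace is treated, which is the open question in the note above.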

  • Simple Query String Query | Elasticsearch Reference [5.4] | Elastic #ril

    • A query that uses the SimpleQueryParser to parse its context. Unlike the regular query_string query, the simple_query_string query will NEVER THROW AN EXCEPTION, and discards invalid parts of the query. So the query_string query does throw errors: for example, (new york city OR (big apple) (the closing ) after new york city is missing) yields HTTP 400:

      {
        "error": {
          "root_cause": [
            {
              "type": "query_shard_exception",
              "reason": "Failed to parse query [(new york city OR (big apple)]",
              "index_uuid": "yPMioLAkRpaDWxO5aAWvog",
              "index": "myindex-1"
            },
            {
              "type": "query_shard_exception",
              "reason": "Failed to parse query [(new york city OR (big apple)]",
              "index_uuid": "C9qPMmhHQSeCCOfTKivmxw",
              "index": "myindex-2"
            }
          ]
        },
        ...
      }
      
    • Simple Query String Syntax uses +/| to express AND/OR, unlike the Query String Query??
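A minimal sketch of a simple_query_string request body using that syntax. The query text and the field names (title, content) are illustrative assumptions; the operators follow the Simple Query String Syntax as I understand it (+ for AND, | for OR, - to negate a token, quotes for a phrase, parentheses for precedence).

```python
import json

# + means AND, | means OR, - negates a token, quotes wrap a phrase,
# parentheses set precedence. The field names are illustrative assumptions.
query_text = '"new york" +(city | state) -jersey'

search_body = {
    "query": {
        "simple_query_string": {
            "query": query_text,
            "fields": ["title", "content"],
            "default_operator": "and",
        }
    }
}
print(json.dumps(search_body, indent=2))
```

Because invalid parts are discarded instead of raising an error, this clause looks like the safer choice for text typed directly by end users, while query_string suits cases where a parse error should surface.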

Phrase Matching ??

Term Level Query ??

Scoring / Ranking / Relevance / Similarity {: #scoring }

Boosting ??

Pagination ??

Highlighting ??

Suggestion ??

Aggregation ??

Security ??

Tools {: #tools }