The built-in language analyzers are available globally and don’t need to be configured before being used. They can be specified directly in the field mapping:
PUT /my_index
{
  "mappings": {
    "blog": {
      "properties": {
        "title": {
          "type": "string",
          "analyzer": "english" (1)
        }
      }
    }
  }
}
(1) The title field will use the english analyzer instead of the default standard analyzer.
Of course, by passing text through the english analyzer, we lose information:
GET /my_index/_analyze?field=title (1)
I'm not happy about the foxes
(1) Emits the tokens: i'm, happi, about, fox
We can't tell if the document mentions one fox or many foxes; the word not is a stopword and is removed, so we can't tell whether the document is happy about foxes or not. By using the english analyzer, we have increased recall as we can match more loosely, but we have reduced our ability to rank documents accurately.
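The information loss can be illustrated outside Elasticsearch with a toy sketch in Python. Everything here is an assumption made for illustration: the functions standard_like and english_like, the stopword set, and the stem table are hypothetical stand-ins, hard-coded to reproduce the tokens shown above. They are not the real analyzers, which use a full stopword list and a stemming algorithm.

```python
import re

# Hypothetical stand-ins, hard-coded for this example only.
STOPWORDS = {"not", "the", "for", "this"}
STEMS = {"happy": "happi", "foxes": "fox"}

def standard_like(text):
    """Lowercase the text and split it into words, keeping every token."""
    return re.findall(r"[a-z0-9']+", text.lower())

def english_like(text):
    """Tokenize as above, then drop stopwords and apply the stem table."""
    return [STEMS.get(t, t) for t in standard_like(text) if t not in STOPWORDS]

a = "I'm not happy about the foxes"
b = "I'm happy about the fox"

print(english_like(a))                       # ["i'm", 'happi', 'about', 'fox']
print(english_like(a) == english_like(b))    # True: the two sentences collapse
print(standard_like(a) == standard_like(b))  # False: the difference survives
```

Two sentences with opposite meanings produce identical token streams under the English-like analysis, while the standard-like analysis keeps them distinct.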
To get the best of both worlds, we can use multi-fields to index the title field twice: once with the english analyzer and once with the standard analyzer:
PUT /my_index
{
  "mappings": {
    "blog": {
      "properties": {
        "title": { (1)
          "type": "string",
          "fields": {
            "english": { (2)
              "type": "string",
              "analyzer": "english"
            }
          }
        }
      }
    }
  }
}
(1) The main title field uses the standard analyzer.
(2) The title.english sub-field uses the english analyzer.
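As a rough illustration of how the sub-field is addressed, the Python sketch below walks the mapping body and lists each searchable field path with its analyzer. The helper field_analyzers is hypothetical and not part of any Elasticsearch client; it only assumes that a field without an explicit analyzer falls back to standard, as described above.

```python
import json

# The same mapping body as above, written as a Python dict.
mapping = {
    "mappings": {
        "blog": {
            "properties": {
                "title": {
                    "type": "string",
                    "fields": {
                        "english": {"type": "string", "analyzer": "english"}
                    },
                }
            }
        }
    }
}

def field_analyzers(properties, default="standard"):
    """Hypothetical helper: map each field path to the analyzer it uses."""
    result = {}
    for name, spec in properties.items():
        result[name] = spec.get("analyzer", default)
        # Sub-fields are addressed as "<field>.<sub-field>".
        for sub_name, sub_spec in spec.get("fields", {}).items():
            result[f"{name}.{sub_name}"] = sub_spec.get("analyzer", default)
    return result

paths = field_analyzers(mapping["mappings"]["blog"]["properties"])
print(json.dumps(paths, indent=2))
```

For this mapping the helper reports title with the standard analyzer and title.english with the english analyzer.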
With this mapping in place, we can index some test documents to demonstrate how to use both fields at query time:
PUT /my_index/blog/1
{ "title": "I'm happy for this fox" }
PUT /my_index/blog/2
{ "title": "I'm not happy about my fox problem" }
GET /_search
{
  "query": {
    "multi_match": {
      "type": "most_fields", (1)
      "query": "not happy foxes",
      "fields": [ "title", "title.english" ]
    }
  }
}
(1) Use the most_fields query type to match the same text in as many fields as possible.
Even though neither of our documents contains the word foxes, both documents are returned as results thanks to the word stemming on the title.english field. The second document is ranked as more relevant, because the word not matches on the title field.
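The ranking can be mimicked with a toy Python sketch. It is a simplification under stated assumptions: the two analyzer functions are hypothetical stand-ins with a hard-coded stopword set and stem table, and the score is just the number of matching query terms summed over both fields. Real Elasticsearch relevance scoring is more involved, but the sketch shows why matching in more fields lifts a document.

```python
import re

# Hypothetical stand-ins for the standard and english analyzers.
STOPWORDS = {"not", "the", "for", "this"}
STEMS = {"happy": "happi", "foxes": "fox"}

def standard_like(text):
    return re.findall(r"[a-z0-9']+", text.lower())

def english_like(text):
    return [STEMS.get(t, t) for t in standard_like(text) if t not in STOPWORDS]

# Each field analyzes both the indexed title and the query text.
FIELDS = {"title": standard_like, "title.english": english_like}

def most_fields_score(query, title):
    """Toy score: count matching query terms in each field and sum them."""
    score = 0
    for analyze in FIELDS.values():
        indexed = set(analyze(title))
        score += sum(1 for term in set(analyze(query)) if term in indexed)
    return score

docs = {
    1: "I'm happy for this fox",
    2: "I'm not happy about my fox problem",
}
scores = {doc_id: most_fields_score("not happy foxes", title)
          for doc_id, title in docs.items()}
ranked = sorted(scores, key=scores.get, reverse=True)

print(scores)  # {1: 3, 2: 4}
print(ranked)  # [2, 1]
```

Both documents match happi and fox on the English-like field, so both are returned. Document 2 gains an extra point because not survives on the standard-like field, which places it first.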