This plugin exposes the keyboard_layout term suggester, which suggests terms according to the switched keyboard layout, i.e. as if the text had been typed with the other layout active. For example:
шзрщту ч 64пи ⟶ iphone x 64gb
nt[yjkjus] ⟶ технології
dszdbo ytrfkmrb dsgflrfo ⟶ выявіў некалькі выпадкаў
тшлу rhjccjdrb runner 2 ⟶ nike кроссовки runner 2
;tcnrbq lbcr 1n, ⟶ жесткий диск 1тб
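To make the idea concrete, here is a minimal Python sketch of switching between the Russian ЙЦУКЕН layout and US QWERTY. It is an illustration only, not the plugin's implementation: the key tables, the function names switch_token and switch_text, and the rule for choosing the direction (any Cyrillic letter means the token is mapped back to QWERTY) are assumptions, and only lowercase keys are covered.

```python
# Hypothetical illustration of layout switching between the Russian ЙЦУКЕН
# layout and US QWERTY. Lowercase keys only; not the plugin's actual code.
QWERTY = "`qwertyuiop[]asdfghjkl;'zxcvbnm,."
JCUKEN = "ёйцукенгшщзхъфывапролджэячсмитьбю"

EN_TO_RU = dict(zip(QWERTY, JCUKEN))  # Latin key -> Cyrillic letter on the same key
RU_TO_EN = dict(zip(JCUKEN, QWERTY))  # Cyrillic letter -> Latin key on the same key


def switch_token(token: str) -> str:
    """Re-type a token as if it had been typed on the other layout."""
    # Assumption: a token containing any Cyrillic letter was typed on the
    # Russian layout by mistake; anything else was meant to be Russian.
    table = RU_TO_EN if any(ch in RU_TO_EN for ch in token) else EN_TO_RU
    # Characters without a mapping (digits, for instance) are kept unchanged.
    return "".join(table.get(ch, ch) for ch in token)


def switch_text(text: str) -> str:
    """Switch every whitespace-separated token independently."""
    return " ".join(switch_token(tok) for tok in text.split())


print(switch_text("шзрщту ч 64пи"))     # iphone x 64gb
print(switch_text(";tcnrbq lbcr 1n,"))  # жесткий диск 1тб
```

A purely mechanical switch would also turn runner into кгттук, so deciding which tokens to switch needs term statistics from the index; this is presumably why the suggester reports document frequencies and offers the min_freq and max_freq options.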
The following keyboard layouts are supported:
- Russian
- Ukrainian
- Belarusian
Feel free to open a pull request adding other keyboard layouts.
This plugin can be combined with the default term suggester, which is based on string similarity, to build a Google-like search experience known as "did you mean?".
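A minimal sketch, assuming the content field and the russian language used in this README's example request, of a single search body that asks for both kinds of suggestions over one global suggest text. The helper did_you_mean_request and the suggestion names keyboard_suggestion and spelling_suggestion are hypothetical; the term entry is Elasticsearch's standard term suggester.

```python
import json


def did_you_mean_request(text: str, field: str = "content", language: str = "russian") -> dict:
    """Hypothetical helper: one search body asking for both kinds of suggestions."""
    return {
        "suggest": {
            "text": text,  # global suggest text shared by both suggestions
            # Layout-switch suggestions from this plugin.
            "keyboard_suggestion": {
                "keyboard_layout": {"field": field, "language": language}
            },
            # Elasticsearch's built-in term suggester (string similarity).
            "spelling_suggestion": {
                "term": {"field": field}
            },
        }
    }


print(json.dumps(did_you_mean_request("шзрщту ч 64пи"), ensure_ascii=False, indent=2))
```

How the two result lists are merged (for instance, preferring a keyboard_layout option when one is found and falling back to the term suggester otherwise) is a design choice left to the application.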
Please note that, due to a serialization issue, this plugin is available only for Elasticsearch 7.0.0 and above.
To install the plugin, choose a version and run:
$ bin/elasticsearch-plugin install URL
where URL points to the zip file of the release that corresponds to your Elasticsearch version.
❗ The plugin must be installed on every node in the cluster, and each node must be restarted after installation.
For example, for Elasticsearch 7.6.0:
# install plugin on Elasticsearch 7.6.0
$ bin/elasticsearch-plugin install https://github.com/papahigh/elasticsearch-keyboard-layout/raw/7.6.0/dist/keyboard-layout-7.6.0.zip
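If you script the installation across nodes, a small helper can derive the URL from the Elasticsearch version. This is a sketch under a stated assumption: the URL pattern is inferred from the single 7.6.0 example above and may not hold for every release, and the names plugin_url and install_command are hypothetical.

```python
# Hypothetical helper. The URL pattern is inferred from the single 7.6.0
# example above; check the project's releases before relying on it.
REPO = "https://github.com/papahigh/elasticsearch-keyboard-layout"


def plugin_url(es_version: str) -> str:
    major = int(es_version.split(".")[0])
    if major < 7:
        # The plugin is only available for Elasticsearch 7.0.0 and above.
        raise ValueError(f"unsupported Elasticsearch version: {es_version}")
    return f"{REPO}/raw/{es_version}/dist/keyboard-layout-{es_version}.zip"


def install_command(es_version: str) -> str:
    # Must be run on every node, and each node restarted afterwards.
    return f"bin/elasticsearch-plugin install {plugin_url(es_version)}"


print(install_command("7.6.0"))
```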
After installation, the plugin exposes a new token filter and a new term suggester, both named keyboard_layout.
You can start using the keyboard_layout suggester by adding it to the suggest section of a search request:
POST _search
{
  "suggest": {
    "text": "шЗрщту ЧЫ 64пи",
    "keyboard_suggestion": {
      "keyboard_layout": {
        "field": "content",
        "language": "russian",
        "lowercase_token": true,
        "preserve_case": true,
        "add_original": false
      }
    }
  }
}
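For readers calling the REST API directly, here is a minimal Python sketch using only the standard library that builds the request above. The base URL http://localhost:9200 and the function name build_suggest_request are assumptions; the request is only constructed here, and the commented-out lines show how it could be sent to a running cluster.

```python
import json
import urllib.request


def build_suggest_request(base_url: str = "http://localhost:9200") -> urllib.request.Request:
    """Build (but do not send) the POST _search request with the suggest section."""
    body = {
        "suggest": {
            "text": "шЗрщту ЧЫ 64пи",
            "keyboard_suggestion": {
                "keyboard_layout": {
                    "field": "content",
                    "language": "russian",
                    "lowercase_token": True,
                    "preserve_case": True,
                    "add_original": False,
                }
            },
        }
    }
    return urllib.request.Request(
        f"{base_url}/_search",
        data=json.dumps(body, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_suggest_request()
# To send it to a running cluster:
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
```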
For each token of the suggest text, the response contains its start offset and length in the suggest text and, if any were found, its switched keyboard layout options.
Each option in an options array includes the suggested text, its document frequency (freq) and a switch flag. You can also request the original token and its frequency by setting the add_original option.
{
  "suggest": {
    "keyboard_suggestion": [
      {
        "text": "шЗрщту",
        "offset": 0,
        "length": 6,
        "options": [
          {
            "text": "iPhone",
            "freq": 4,
            "switch": true
          }
        ]
      },
      {
        "text": "ЧЫ",
        "offset": 7,
        "length": 2,
        "options": [
          {
            "text": "XS",
            "freq": 2,
            "switch": true
          }
        ]
      },
      {
        "text": "64пи",
        "offset": 10,
        "length": 4,
        "options": [
          {
            "text": "64gb",
            "freq": 1,
            "switch": true
          }
        ]
      }
    ]
  },
  ...
}
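Building on the sample response, here is a minimal Python sketch that turns the per-token options into a single corrected phrase by splicing each option back into the suggest text at its offset and length. The helper name did_you_mean and the policy of picking the option with the highest freq are assumptions, not part of the plugin. Offsets are used as Python string indices, which matches this example because all of its characters are in the Basic Multilingual Plane.

```python
def did_you_mean(text: str, entries: list) -> str:
    """Hypothetical helper: replace each token with its most frequent option."""
    pieces, cursor = [], 0
    for entry in entries:
        start = entry["offset"]
        end = start + entry["length"]
        pieces.append(text[cursor:start])  # text between tokens, e.g. spaces
        options = entry.get("options", [])
        best = max(options, key=lambda o: o["freq"]) if options else None
        # Tokens without options are kept exactly as typed.
        pieces.append(best["text"] if best else text[start:end])
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


response = {
    "suggest": {
        "keyboard_suggestion": [
            {"text": "шЗрщту", "offset": 0, "length": 6,
             "options": [{"text": "iPhone", "freq": 4, "switch": True}]},
            {"text": "ЧЫ", "offset": 7, "length": 2,
             "options": [{"text": "XS", "freq": 2, "switch": True}]},
            {"text": "64пи", "offset": 10, "length": 4,
             "options": [{"text": "64gb", "freq": 1, "switch": True}]},
        ]
    }
}

print(did_you_mean("шЗрщту ЧЫ 64пи", response["suggest"]["keyboard_suggestion"]))
# prints: iPhone XS 64gb
```

With add_original enabled, the original token also appears among the options with its own frequency, so the same highest-frequency rule would keep a correctly typed word when it is more common than its switched form.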
An extension for the Go client github.com/olivere/elastic is available at: https://github.com/aaerofeev/go-elasic-keyboard-layout
The supported suggester options are:

| Option | Description |
| --- | --- |
| text | The suggest text. This is a required option that must be set either globally or per suggestion. |
| field | The field to fetch the candidate suggestions from. This is a required option that must be set either globally or per suggestion. |
| language | The language of the keyboard layout. This is a required option. Available values correspond to the supported layouts listed above; the example request uses russian. |
| analyzer | The analyzer used to analyze the suggest text. Defaults to the whitespace analyzer. |
| lowercase_token | Lowercases terms after the suggest text has been analyzed and before their frequencies are evaluated. Defaults to false. |
| preserve_case | Whether case should be preserved in the switched suggest options. When lowercase_token is set to true, this option restores the original case. Defaults to false. |
| min_freq | The minimum number of documents a suggestion must appear in. It can be specified as an absolute number or as a value relative to the number of documents. This can improve quality by suggesting only high-frequency terms. If a value higher than 1 is specified, it cannot be fractional. Defaults to 0f (disabled). Shard-level document frequencies are used for this option. |
| max_freq | The maximum number of documents a suggest text token can appear in and still be included. It can be specified as a relative value (e.g. 0.4) or as an absolute document frequency. If a value higher than 1 is specified, it cannot be fractional. Defaults to -1 (disabled). This can be used to exclude high-frequency tokens from switched keyboard layout suggestions. Shard-level document frequencies are used for this option. |
| add_original | Whether the original term and its frequency should be included in the suggest options. Defaults to false. |
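The two frequency thresholds in the table can be read as follows. This is a minimal sketch, assuming a relative value is a fraction of the shard's document count, that values of 1 and above are absolute (a convention borrowed from Lucene's DirectSpellChecker), and that relative thresholds are truncated; the plugin's exact boundary handling and rounding are not documented here. The function names are hypothetical.

```python
from typing import Optional


def resolve_threshold(value: float, num_docs: int) -> Optional[int]:
    """Turn a min_freq/max_freq value into an absolute document count.

    Returns None when the threshold is disabled. num_docs is the shard's
    document count, since shard-level frequencies are used.
    """
    if value <= 0:
        return None  # 0 (min_freq default) and -1 (max_freq default) disable it
    if value >= 1:
        # Assumption: 1 and above is absolute; absolute values must be whole.
        if value != int(value):
            raise ValueError("absolute thresholds cannot be fractional")
        return int(value)
    return int(value * num_docs)  # relative: fraction of documents, truncated


def keep_option(option_freq: int, num_docs: int, min_freq: float = 0.0) -> bool:
    """min_freq: a switched option must appear in at least this many documents."""
    floor = resolve_threshold(min_freq, num_docs)
    return floor is None or option_freq >= floor


def consider_token(token_freq: int, num_docs: int, max_freq: float = -1.0) -> bool:
    """max_freq: a suggest text token found in more documents is not switched."""
    ceiling = resolve_threshold(max_freq, num_docs)
    return ceiling is None or token_freq <= ceiling
```

For example, with 1000 documents on a shard, max_freq set to 0.4 would leave tokens found in more than 400 documents unswitched, which is how already-correct, common words are kept out of the suggestions.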
Use the issue tracker and/or open pull requests.
This project is released under version 2.0 of the Apache License.