Now that we have some data stored in Elasticsearch, we can get to work on the business requirements for this application. The first requirement is the ability to retrieve individual employee data.
This is easy in Elasticsearch. We execute an HTTP GET request and specify the “address” of the document: the index, type, and ID. Using those three pieces of information, we can retrieve the original JSON document:
GET /megacorp/employee/1
The response contains some metadata about the document, and John Smith’s original JSON document as the _source field:
{
  "_index" : "megacorp",
  "_type" : "employee",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "first_name" : "John",
    "last_name" : "Smith",
    "age" : 25,
    "about" : "I love to go rock climbing",
    "interests": [ "sports", "music" ]
  }
}
In the same way that we changed the HTTP verb from PUT to GET in order to retrieve the document, we could use the DELETE verb to delete the document, and the HEAD verb to check whether the document exists. To replace an existing document with an updated version, we just PUT it again.
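As a rough illustration of how the verb alone selects the operation, the following Python sketch builds (but does not send) one request per verb using only the standard library. The base URL http://localhost:9200 and the helper name build_request are assumptions made for this example; they do not come from the text above.

```python
import json
import urllib.request

# Assumption: a single Elasticsearch node listening on the default local address.
BASE_URL = "http://localhost:9200"


def build_request(method, path, body=None):
    """Build (without sending) an HTTP request for the given verb and document path."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


doc_path = "/megacorp/employee/1"
requests_by_verb = {
    # Retrieve the document.
    "GET": build_request("GET", doc_path),
    # Check whether the document exists.
    "HEAD": build_request("HEAD", doc_path),
    # Delete the document.
    "DELETE": build_request("DELETE", doc_path),
    # Replace the document with a new version (hypothetical, shortened body).
    "PUT": build_request("PUT", doc_path, {"first_name": "John", "last_name": "Smith"}),
}

for verb, req in requests_by_verb.items():
    print(req.get_method(), req.full_url)
```

To actually send one of these requests, it could be passed to urllib.request.urlopen, which raises urllib.error.HTTPError for a 404 response such as a HEAD on a missing document.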
A GET is fairly simple: you get back the document that you ask for. Let’s try something a little more advanced, like a simple search!
The first search we will try is the simplest search possible. We will search for all employees, with this request:
GET /megacorp/employee/_search
You can see that we’re still using the index megacorp and the type employee, but instead of specifying a document ID, we now use the _search endpoint. The response includes all three of our documents in the hits array. By default, a search will return the top 10 results.
{
  "took": 6,
  "timed_out": false,
  "_shards": { ... },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "megacorp",
        "_type": "employee",
        "_id": "3",
        "_score": 1,
        "_source": {
          "first_name": "Douglas",
          "last_name": "Fir",
          "age": 35,
          "about": "I like to build cabinets",
          "interests": [ "forestry" ]
        }
      },
      {
        "_index": "megacorp",
        "_type": "employee",
        "_id": "1",
        "_score": 1,
        "_source": {
          "first_name": "John",
          "last_name": "Smith",
          "age": 25,
          "about": "I love to go rock climbing",
          "interests": [ "sports", "music" ]
        }
      },
      {
        "_index": "megacorp",
        "_type": "employee",
        "_id": "2",
        "_score": 1,
        "_source": {
          "first_name": "Jane",
          "last_name": "Smith",
          "age": 32,
          "about": "I like to collect rock albums",
          "interests": [ "music" ]
        }
      }
    ]
  }
}
Note: The response not only tells us which documents matched, but also includes the whole document itself: all of the information that we need to display the search results to the user.
Next, let’s try searching for employees who have “Smith” in their last name. To do this, we’ll use a “lightweight” search method that is easy to use from the command line. This method is often referred to as a query-string search, since we pass the search as a URL query-string parameter:
GET /megacorp/employee/_search?q=last_name:Smith
We use the same _search endpoint in the path, and we add the query itself in the q= parameter. The results that come back show all the Smiths:
{
  ...
  "hits": {
    "total": 2,
    "max_score": 0.30685282,
    "hits": [
      {
        ...
        "_source": {
          "first_name": "John",
          "last_name": "Smith",
          "age": 25,
          "about": "I love to go rock climbing",
          "interests": [ "sports", "music" ]
        }
      },
      {
        ...
        "_source": {
          "first_name": "Jane",
          "last_name": "Smith",
          "age": 32,
          "about": "I like to collect rock albums",
          "interests": [ "music" ]
        }
      }
    ]
  }
}
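When a query-string search is issued from a program rather than typed by hand, the q= value has to be URL-encoded. The sketch below shows one way to assemble such a URL with the Python standard library; the base URL and the function name query_string_search_url are assumptions for illustration only.

```python
from urllib.parse import urlencode

# Assumption: a single Elasticsearch node listening on the default local address.
BASE_URL = "http://localhost:9200"


def query_string_search_url(index, doc_type, query):
    """Return the _search URL with the query passed in the q= parameter."""
    params = urlencode({"q": query})  # percent-encodes characters such as ':'
    return BASE_URL + "/" + index + "/" + doc_type + "/_search?" + params


url = query_string_search_url("megacorp", "employee", "last_name:Smith")
print(url)
```

The colon is emitted as %3A, which is the percent-encoded form of the same last_name:Smith query.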
Query-string search is handy for ad hoc searches from the command line, but it has its limitations (see Search Lite). Elasticsearch provides a rich, flexible query language called the Query DSL, which allows us to build much more complicated, robust queries.
The DSL (domain-specific language) is specified using a JSON request body. We can represent the previous search for all the Smiths like so:
GET /megacorp/employee/_search
{
  "query" : {
    "match" : {
      "last_name" : "Smith"
    }
  }
}
This will return the same results as the previous query. You can see that a number of things have changed. For one, we are no longer using query-string parameters, but instead a request body. This request body is built with JSON, and uses a match query (one of several types of queries, which we will learn about later).
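Because the request body is plain JSON, it can be assembled as an ordinary data structure and serialized. A minimal sketch in Python follows; the helper name match_query is hypothetical and exists only for this example.

```python
import json


def match_query(field, text):
    """Return a search request body containing a single match query."""
    return {"query": {"match": {field: text}}}


body = match_query("last_name", "Smith")

# Serialize the body as it would be sent with the _search request.
print(json.dumps(body, indent=2))
```

Building the body as a dictionary rather than by string concatenation leaves quoting and escaping to the JSON serializer.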
Let’s make the search a little more complicated. We still want to find all employees with a last name of “Smith”, but we only want employees who are older than 30. Our query will change a little to accommodate a filter, which allows us to execute structured searches efficiently:
GET /megacorp/employee/_search
{
  "query" : {
    "filtered" : {
      "filter" : {
        "range" : {
          "age" : { "gt" : 30 } (1)
        }
      },
      "query" : {
        "match" : {
          "last_name" : "smith" (2)
        }
      }
    }
  }
}
(1) This portion of the query is a range filter, which will find all ages older than 30; gt stands for “greater than”.
(2) This portion of the query is the same match query that we used before.
Don’t worry about the syntax too much for now; we will cover it in detail later. Just recognize that we’ve added a filter that performs a range search, and reused the same match query as before. Now our results show only one employee, who happens to be 32 and is named “Jane Smith”:
{
  ...
  "hits": {
    "total": 1,
    "max_score": 0.30685282,
    "hits": [
      {
        ...
        "_source": {
          "first_name": "Jane",
          "last_name": "Smith",
          "age": 32,
          "about": "I like to collect rock albums",
          "interests": [ "music" ]
        }
      }
    ]
  }
}
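The logic of combining a filter with a query can be modeled in a few lines of Python over the three example employees. This is only a sketch of the semantics, not of how Elasticsearch executes the request: the helper names are hypothetical, and the lowercasing in match is a simplified stand-in for text analysis, which is why the lowercase “smith” still finds “Smith”.

```python
EMPLOYEES = [
    {"first_name": "John", "last_name": "Smith", "age": 25},
    {"first_name": "Jane", "last_name": "Smith", "age": 32},
    {"first_name": "Douglas", "last_name": "Fir", "age": 35},
]


def range_gt(field, threshold):
    """Filter: keep documents whose field is strictly greater than threshold."""
    return lambda doc: doc[field] > threshold


def match(field, text):
    """Simplified match: case-insensitive comparison of whole words."""
    wanted = set(text.lower().split())
    return lambda doc: bool(wanted & set(doc[field].lower().split()))


def filtered_search(docs, doc_filter, query):
    """A document is a hit only if it passes both the filter and the query."""
    return [doc for doc in docs if doc_filter(doc) and query(doc)]


hits = filtered_search(EMPLOYEES, range_gt("age", 30), match("last_name", "smith"))
print([d["first_name"] + " " + d["last_name"] for d in hits])
```

Douglas Fir passes the age filter but not the match, and John Smith passes the match but not the filter, so only Jane Smith remains.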
The searches so far have been simple: single names, filtering by age. Let’s try a more advanced, full-text search, a task that traditional databases would really struggle with.
We are going to search for all employees who enjoy “rock climbing”:
GET /megacorp/employee/_search
{
  "query" : {
    "match" : {
      "about" : "rock climbing"
    }
  }
}
You can see that we use the same match query as before to search the about field for “rock climbing”. We get back two matching documents:
{
  ...
  "hits": {
    "total": 2,
    "max_score": 0.16273327,
    "hits": [
      {
        ...
        "_score": 0.16273327, (1)
        "_source": {
          "first_name": "John",
          "last_name": "Smith",
          "age": 25,
          "about": "I love to go rock climbing",
          "interests": [ "sports", "music" ]
        }
      },
      {
        ...
        "_score": 0.016878016, (1)
        "_source": {
          "first_name": "Jane",
          "last_name": "Smith",
          "age": 32,
          "about": "I like to collect rock albums",
          "interests": [ "music" ]
        }
      }
    ]
  }
}
(1) The relevance scores.
By default, Elasticsearch sorts matching results by their relevance score, that is, by how well each document matched the query. The first and highest-scoring result is obvious: John Smith’s about field clearly says “rock climbing” in it.
But why did Jane Smith come back as a result? Her document was returned because the word “rock” was mentioned in her about field. Because only “rock” was mentioned, and not “climbing”, her _score is lower than John’s.
This is a good example of how Elasticsearch can search within full text fields and return the most relevant results first. This concept of relevance is important to Elasticsearch, and is a concept that is completely foreign to traditional relational databases where a record either matches or it doesn’t.
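To make the idea of ranking by relevance concrete, here is a deliberately simplified Python sketch. The scoring function toy_score is an assumption invented for this example (the fraction of query words found in the field); it is not the formula Elasticsearch uses, so its numbers differ from the _score values shown above. It does reproduce the ordering, though: a document matching both words ranks above one matching a single word, and a document matching neither is not returned.

```python
ABOUT = {
    "John Smith": "I love to go rock climbing",
    "Jane Smith": "I like to collect rock albums",
    "Douglas Fir": "I like to build cabinets",
}


def toy_score(text, query):
    """Fraction of the query's words that appear in the text (NOT Elasticsearch's formula)."""
    query_terms = query.lower().split()
    doc_terms = set(text.lower().split())
    return sum(term in doc_terms for term in query_terms) / len(query_terms)


def ranked_matches(docs, query):
    """Return (name, score) pairs with a score above zero, highest score first."""
    scored = [(name, toy_score(text, query)) for name, text in docs.items()]
    matching = [pair for pair in scored if pair[1] > 0]
    return sorted(matching, key=lambda pair: pair[1], reverse=True)


results = ranked_matches(ABOUT, "rock climbing")
print(results)
```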
Finding individual words in a field is all well and good, but sometimes you want to match exact sequences of words or phrases. For instance, we could perform a query that will match only employee documents containing both “rock” and “climbing”, with the words next to each other in the phrase “rock climbing”.
To do this, we use a slight variation of the match query called the match_phrase query:
GET /megacorp/employee/_search
{
  "query" : {
    "match_phrase" : {
      "about" : "rock climbing"
    }
  }
}
As expected, this returns only John Smith’s document:
{
  ...
  "hits": {
    "total": 1,
    "max_score": 0.23013961,
    "hits": [
      {
        ...
        "_score": 0.23013961,
        "_source": {
          "first_name": "John",
          "last_name": "Smith",
          "age": 25,
          "about": "I love to go rock climbing",
          "interests": [ "sports", "music" ]
        }
      }
    ]
  }
}
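The difference between matching individual words and matching a phrase can be sketched in Python as a check for consecutive words. The function contains_phrase is a hypothetical helper for illustration, with whitespace splitting and lowercasing as simplifying assumptions; it says nothing about how Elasticsearch implements phrase matching internally.

```python
def contains_phrase(text, phrase):
    """True if the phrase's words occur consecutively, in order, in the text."""
    words = text.lower().split()
    target = phrase.lower().split()
    last_start = len(words) - len(target)
    # Slide a window of len(target) words across the text and compare.
    return any(words[i:i + len(target)] == target for i in range(last_start + 1))


john = "I love to go rock climbing"
jane = "I like to collect rock albums"

print(contains_phrase(john, "rock climbing"))
print(contains_phrase(jane, "rock climbing"))
```

Jane’s text contains “rock” but not immediately followed by “climbing”, so only John’s text satisfies the phrase check.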
Many applications like to highlight snippets of text from each search result so that the user can see why the document matched their query. Retrieving highlighted fragments is very easy in Elasticsearch.
Let’s rerun our previous query, but add a new highlight parameter:
GET /megacorp/employee/_search
{
  "query" : {
    "match_phrase" : {
      "about" : "rock climbing"
    }
  },
  "highlight": {
    "fields" : {
      "about" : {}
    }
  }
}
When we run this query, the same hit is returned as before, but now we get a new section in the response called highlight. This contains a snippet of text from the about field with the matching words wrapped in <em></em> HTML tags:
{
  ...
  "hits": {
    "total": 1,
    "max_score": 0.23013961,
    "hits": [
      {
        ...
        "_score": 0.23013961,
        "_source": {
          "first_name": "John",
          "last_name": "Smith",
          "age": 25,
          "about": "I love to go rock climbing",
          "interests": [ "sports", "music" ]
        },
        "highlight": {
          "about": [
            "I love to go <em>rock</em> <em>climbing</em>" (1)
          ]
        }
      }
    ]
  }
}
(1) The highlighted fragment from the original text.
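The shape of a highlighted fragment can be imitated in Python with a regular expression that wraps each matching word in tags. This is a sketch of the output format only: the function highlight and its pre_tag and post_tag parameters are hypothetical names chosen for this example, and real highlighting works on analyzed text rather than on a regular expression.

```python
import re


def highlight(text, query, pre_tag="<em>", post_tag="</em>"):
    """Wrap each whole-word, case-insensitive occurrence of a query word in tags."""
    terms = [re.escape(term) for term in query.split()]
    if not terms:
        return text
    pattern = re.compile(r"\b(" + "|".join(terms) + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: pre_tag + m.group(0) + post_tag, text)


fragment = highlight("I love to go rock climbing", "rock climbing")
print(fragment)
```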
You can read more about the highlighting of search snippets in the highlighting reference documentation (search-request-highlighting.html).