This repository contains project models for hands on lab sessions about elasticsearch.
Thanks to cloudbees for the answers branch build status:
First, you can download the latest version of Elasticsearch:
curl -OL -k http://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.0.1.zip
Modify config/elasticsearch.yml
file. Elasticsearch use by default:
elasticsearch
as cluster name- multicast discovery
# cluster.name: elasticsearch
cluster.name: handson
# discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.multicast.enabled: false
You can also download some nice GUI plugins:
- Marvel Plugin (free for dev)
- Kopf Plugin
bin/plugin -i elasticsearch/marvel/latest
bin/plugin -install lmenezes/elasticsearch-kopf
git clone https://github.com/elasticsearchfr/hands-on.git
mvn compile
mvn test
Tests should fail as you have to fill blanks!
Slides are available on Slideshare
When running NodeTest, you should bring to live an Elasticsearch node.
When successful, launch an external Elasticsearch node, using the optional download above and look at logs. You should see that when running test, both nodes see each other.
You can add the following line at the end of NodeTest:
Thread.sleep(120000);
And open in your browser: http://localhost:9200/_plugin/marvel/ to see one node, then both nodes and then only one node. Have a look also at http://localhost:9200/_plugin/kopf/
All tests are in IndexTest.java.
We index one beer in index meal, type beer: Heineken, PALE, 0.33 L, 3 EUROS. We have defined a Javabean to handle Beer documents.
We will:
- use Jackson to generate JSon from object
- index the beer
- get back the generated ID for this document
- ask Elasticsearch to get the document back
- deserialize it from JSon to javabean Beer
- check that sent and indexed objects are equals
- remove the beer document and check that it does not exist anymore
We index 1 000 beers with the Elasticsearch bulk feature.
We will:
- index all beers
- check that there is no failure
- remove all beers
You can increase the number of beer to see how fast is it on your personal computer.
Before each test we index 1000 beers. After each test, we remove them.
Tests are in SearchTest.java.
We search for all documents. We will learn to build a very simple query and see how we can retrieve a single hit from results.
Even if you build well the query, it will failed. Can you explain why and how to fix it?
We want to search for term "Heineken" in brand field. We will get 0 hit. Can you explain why?
We fix the previous test. We should get more than 0 beer.
We can search for HEINEKEN or HeiNEken in brand field. We should get more than 0 beer.
What is the main difference between term and match queries?
We search for HEINEKEN in all documents.
We can also use here a Lucene syntax. For example, you can also search for "+heineken -pale" or for "+heineken pale" and see the differences.
We build here a query based on beer price. We want to get beers with price between 5 and 10.
We build a boolean query to get HEINEKEN beers with price between 5 and 10.
We will check the first 10 hits that they answer to this rule.
This test is disabled by default. We will talk about it if we have enough time during the lab.
We use the same query as the previous exercise but we want to filter results with at least 1 litter beers.
We will check the first 10 hits that they answer to this rule.
We will ask for the first 100 results of "HEINEKEN pale". Let's see in logger how scoring order results.
We will ask for the first 100 results of "HEINEKEN pale^3". Let's see in logger how scoring order results. We also want to "highlight" some fields: colour and brand to see how it appears in the JSon raw output.
We want to run two queries at once:
- The first query is a queryString with "pale". We want only one result.
- The second query is a matchQuery on "brand" with "HEINEKEN". We want only one result.
We want to build Fuzzy Query and we want to find heinezken (mispelling).
Before each test we index 1000 beers. After each test, we remove them.
Tests are in FacetTest.java.
We want to count beers by brand. We will create a matchAll query and add a "bybrand" terms facet on field "brand".
We will check that the sum of all counts is equal to 1000.
We want to count beers by brand. We will search for brand equals to "heineken" and add a "bybrand" terms facet on field "brand".
We will check that the sum of all counts is less than 1000.
This test is disabled by default. We will talk about it if we have enough time during the lab.
We want to count beers by brand. We will search for all beers and apply a term filter on brand to get only "heineken" beers and we will add a "bybrand" terms facet on field "brand".
We will see that the sum of all counts is 1000 unless we only get less than 1000 results. Could you explain why?
Term Filter with Terms Facet on beer brand with filter - brand_termsFacet_withFilter_termFilter() - Ignored
This test is disabled by default. We will talk about it if we have enough time during the lab.
We want to count beers by brand. We will search for all beers and apply a term filter on brand to get only "heineken" beers and we will add a "bybrand" terms facet on field "brand" and we will reduce it with a the same filter as above.
We will check that the sum of all counts is less than 1000.
We want to get some statistics on beer prices. We will create a matchAll query and add a "byprice" range facet on field "price":
- price up to 3
- price between 3 and 6
- price more than 6
We will check that the sum of all counts is equal to 1000. Because all ranges are exclusive. What will happen if we use the following ranges:
- price up to 3
- price between 2 and 8
- price more than 6
TODO : Let's play with multiple nodes, with sharding and replicas. And let's monitor it with: