Skip to content

Latest commit

 

History

History
201 lines (138 loc) · 5.07 KB

README.md

File metadata and controls

201 lines (138 loc) · 5.07 KB

Solr Load Balancing

This is just a summary of the modifications needed to enable load balancing on Solr in case i forgot in a few months how to do it.

Load balancing is made possible by using haproxy

Solr Replication is done by following the Solr Replication Guideline

Local

Once everything is set on the localhost:

  • haproxy is installed and configured accordingly to local/haproxy/haproxy.cfg
  • the replication is enabled on the solr master and configured on the solr slave instances

, the haproxy statistics can be seen on the page:

[]http://localhost/haproxy?stats](http://localhost/haproxy?stats)

I started the master solr with

   $ bin/solr start -p 8984

and the slave solr with

   $ bin/solr start -p 8985

Docker

Via Docker, it is easier to see what this example is about. In the directory docker/ there are a few things to take into consideration:

  • haproxy/haproxy.cfg takes into account the names of the containers solr-master and solr-slave for the network connectivity
  • in the file solr/slave/sortdemo/conf/solrconfig.xml the masterUrl for the replicationHandler is referencing solr-master host

Starting up the ecosystem is done via

  docker$ sudo docker-compose -f consul.yml up -d

This command will start the following instances:

  • docker_lb : the haproxy instance linked to consul
  • progrium/consul : the consul-template instance
  • gliderlabs/registrator : service registry bridge for Docker

The code required is taken with small modifications (for supporting solr master and slaves behind the same host) on the haproxy.tmpl file as it is from docker-consul-demo repository.

Once the load balancing infrastructure is in place, the next step is to load the solr master and slave instance(s).

  docker$ sudo docker-compose up -d

In order to scale the solr slave instance, just run the following command

  docker$ sudo docker-compose scale solr-slave=2

The exposed URLs are:

Once everything is setup, you should see the haproxy statistics:

haproxy stats

and consul statistics:

consul stats for the solr-slave instances

as well as executing solr queries

solr query

Test data


# query will be handled by solr-master
curl http://solr.127.0.0.1.xip.io/solr/sortdemo/update?commitWithin=3000 -d '
[
  {
    "id": "2014_1",
    "name": "Tyrion",
    "city": "New York",
    "year" : 2014,
    "month" : 1
  },
  {
    "id": "2015_2",
    "name": "John",
    "city": "New York",
    "year" : 2015,
    "month" : 2
  },
  {
    "id": "2014_2",
    "name": "Marc",
    "city": "San Francisco",
    "year" : 2014,
    "month" : 2
  },
  {
    "id": "2014_5",
    "name": "Mary",
    "city": "San Jose",
    "year" : 2014,
    "month" : 5
  },
  {
    "id": "2013_4",
    "name": "Barry",
    "city": "New Hampshire",
    "year" : 2013,
    "month" : 4
  },
  {
    "id": "2016_5",
    "name": "Betty",
    "city": "Cincinatti",
    "year" : 2015,
    "month" : 5
  }
]'


# query will be handled by solr-master
curl http://solr.127.0.0.1.xip.io/solr/sortdemo/update?commitWithin=3000 -d '
[
  {
    "id": "201x_1",
    "name": "No Year",
    "city": "New York",
    "month" : 1
  },
  {
    "id": "2014_x",
    "name": "No Month",
    "city": "New York",
    "year" : 2014
  }
]'


# query will be handled by one of the solr slaves
curl http://solr.127.0.0.1.xip.io/solr/sortdemo/select?indent=on&q=*:*&sort=city%20asc,%20name%20asc&wt=json

Open issues

The number of slaves used can be scaled up and down with a simple command on docker compose, but there are a few issues that should still be handled to make such a prototype production ready.

First of all, when scaling up the solr slaves, they are immediately accessible in the cluster, even though they are not warmed with the data existent on the master solr instance. This is why the number of results for a search query may vary very much, depending on the replication state of the solr slave.

Another point is that haproxy doesn't support at the time of this writing reconfiguration of the backend servers without a restart (this is why after scaling up / down the solr slaves can be seen a different PID for the haproxy process compared to the previous one). I suppose that such an operation comes also with dropped requests.

Projects used in coming up with this PoC