Jürgen Jakobitsch edited this page Aug 25, 2016 · 4 revisions
Website http://lucene.apache.org/solr/
Supported versions Apache Solr 6.1.0
Current responsible(s) Jürgen Jakobitsch @ SWC -- [email protected]
Docker image(s) bde2020/solr:latest
More info http://lucene.apache.org/solr/resources.html

Short description

Apache Solr is a full-text indexing server based on Apache Lucene. It provides high availability through distributed indexing, replication and load balancing. Apache Solr is highly customizable with respect to index configuration and text analysis. Distributed indexes are managed by Apache ZooKeeper.

Example usage

To make use of Apache Solr's distributed cloud features, it is highly recommended to manage the configuration in Apache ZooKeeper, independently of a running BDE analysis pipeline, and to map directories to the host filesystem for persistence. To set up a cloud index that is configured in Apache ZooKeeper and distributed among several nodes of an arbitrary BDE cluster, follow the workflow below:

  • Extend bde2020/solr by adding a solr-zk-init.json command file (see below for an example), a solr-startup.json file (see below), and a suitable Solr configuration, which is uploaded to an already running ZooKeeper instance.
[
        {
                "sh":"/app/server/scripts/cloud-scripts/zkcli.sh",
                "-zkhost":"192.168.88.219:2181/bde-solr-x",
                "-confname":"bde-solr-x",
                "-solrhome":"data",
                "-confdir":"/config/myconf",
                "-cmd":"upconfig"
        },
        {
                "sh":"/app/server/scripts/cloud-scripts/zkcli.sh",
                "-zkhost":"192.168.88.219:2181/bde-solr-x",
                "-cmd":"putfile",
                "/solr.xml":"/config/myconf/solr.xml"
        }
]
  • bde2020/solr includes solr-bin.py, which turns the above JSON structure into a command that can be run in a shell environment.
  • For more information on managing configuration files in ZooKeeper, please refer to: https://cwiki.apache.org/confluence/display/solr/Using+ZooKeeper+to+Manage+Configuration+Files
  • The above example adds a whole configuration directory to /config, which is then uploaded to ZooKeeper's "bde-solr-x" node.
  • Additionally, a required solr.xml is uploaded.
  • Note that, depending on the use case, additional files may be necessary.
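
How solr-bin.py performs the JSON-to-command conversion is not shown on this page. The following is only a minimal sketch of one plausible mapping, assuming that each key is emitted as a token followed by its value, and that an empty value marks a bare flag. The function name build_commands and the constant ZK_INIT are hypothetical and not part of bde2020/solr.

```python
import json
import shlex

# The solr-zk-init.json example from above, embedded as a string.
ZK_INIT = """
[
  {
    "sh": "/app/server/scripts/cloud-scripts/zkcli.sh",
    "-zkhost": "192.168.88.219:2181/bde-solr-x",
    "-confname": "bde-solr-x",
    "-solrhome": "data",
    "-confdir": "/config/myconf",
    "-cmd": "upconfig"
  },
  {
    "sh": "/app/server/scripts/cloud-scripts/zkcli.sh",
    "-zkhost": "192.168.88.219:2181/bde-solr-x",
    "-cmd": "putfile",
    "/solr.xml": "/config/myconf/solr.xml"
  }
]
"""


def build_commands(json_text):
    """Turn a JSON array of objects into a list of argument lists.

    Assumption (not confirmed by the source): every key becomes one token,
    followed by its value unless the value is an empty string.
    """
    commands = []
    # json.loads preserves the key order of each object (Python 3.7+).
    for entry in json.loads(json_text):
        argv = []
        for key, value in entry.items():
            argv.append(key)
            if value != "":
                argv.append(str(value))
        commands.append(argv)
    return commands


for argv in build_commands(ZK_INIT):
    # shlex.quote keeps each token safe when the line is pasted into a shell.
    print(" ".join(shlex.quote(token) for token in argv))
```

Under this assumption, the second entry ends with the tokens /solr.xml and /config/myconf/solr.xml, which matches the argument order that zkcli.sh expects for putfile (ZooKeeper path first, local file second).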
[
        {
                "/app/bin/solr":"start",
                "-f":"",
                "-cloud":"",
                "-s":"/data/",
                "-p":"8983",
                "-z":"192.168.88.219:2181,192.168.88.220:2181,192.168.88.221:2181/bde-solr-x"
        }
]
  • The above solr-startup.json command is also parsed into a runnable shell command by solr-bin.py.

  • The above example simply starts Solr in cloud mode on port 8983, using the configuration that was uploaded to ZooKeeper in the previous step.

  • To run the previously described steps in one go, it is only necessary to run the included "solr-init" bash script, which looks for solr-zk-init.json and solr-startup.json in the /config directory.

  • Important note on creating a distributed collection: Solr cloud collection creation depends on all Solr nodes running, so it is not possible to create a collection that is, for example, distributed among three nodes until all three nodes are running. To create a distributed index it is therefore necessary to create a simple docker image, which only needs to
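
Because collection creation depends on all nodes being up, it can help to check the number of live nodes first. The sketch below assumes the Collections API action CLUSTERSTATUS, whose JSON response lists live nodes under cluster.live_nodes. The function names, the node names in the sample and the base URL format are illustrative assumptions, not something this page prescribes.

```python
import json
import urllib.request


def fetch_cluster_status(base_url):
    """Ask a running Solr node for CLUSTERSTATUS and return the parsed JSON.

    base_url is assumed to look like "http://<host>:8983/solr".
    """
    url = base_url.rstrip("/") + "/admin/collections?action=CLUSTERSTATUS&wt=json"
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def count_live_nodes(status):
    """Count the entries of cluster.live_nodes in a CLUSTERSTATUS response."""
    return len(status.get("cluster", {}).get("live_nodes", []))


def ready_for_collection(status, expected_nodes):
    """Return True once at least expected_nodes Solr nodes are live."""
    return count_live_nodes(status) >= expected_nodes


# Hand-written sample response with hypothetical node names (no network access needed).
sample_status = {
    "cluster": {
        "live_nodes": ["node1:8983_solr", "node2:8983_solr"],
    }
}

# Only two of three expected nodes are live, so creation should wait.
print(ready_for_collection(sample_status, 3))
```

In a real setup, the result of fetch_cluster_status would be passed to ready_for_collection in a retry loop before any collection is created.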

Scaling

Scaling distributed indexes can be achieved by creating cloud collections that are replicated and sharded across multiple servers. For more information on scaling, please refer to Apache Solr's documentation on the topic.
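
As a rough illustration, a sharded collection can be requested through the Collections API action CREATE. The sketch below only builds the request URL. The host, the collection name "bde-collection" and the shard and replica counts are assumptions chosen for illustration, while the config name "bde-solr-x" and port 8983 are taken from the examples on this page.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards, replication_factor, config_name):
    """Build a Collections API CREATE request URL for a sharded collection."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        # Name of the configuration previously uploaded to ZooKeeper.
        "collection.configName": config_name,
        "wt": "json",
    }
    return base_url.rstrip("/") + "/admin/collections?" + urlencode(params)


# Hypothetical example: three shards, one replica each, spread over three nodes.
url = create_collection_url(
    "http://localhost:8983/solr", "bde-collection", 3, 1, "bde-solr-x"
)
print(url)
```

Raising replicationFactor adds copies of each shard. With more than one replica per node, Solr 6 may also need the maxShardsPerNode parameter, which is left out here.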
