The recipe configures an instance of the Solr indexing server. Solr is an open source enterprise search server based on the Lucene Java search library, with XML/HTTP and JSON APIs, hit highlighting, faceted search, caching, replication, and a web administration interface.
Git Repository and issue tracker: https://github.com/collective/collective.recipe.solrinstance
- This version of the recipe supports Solr 3.5, 4.x (including 4.0, 4.1, 4.2 and 4.5). Please use a release from the 2.x series if you are using Solr 1.4.
- This version supports Genshi templates only. Please use a release earlier than 5.x if you require Cheetah templating and do not require Python 3 support. If you require Python 3 support, you must convert any custom templates to use the Genshi text templating language.
The recipe supports the following options.
- solr-location
- Path to the location of the Solr installation. This should be the top-level installation directory.
- host
- Name or IP address of the Solr server, e.g. some.server.com. Defaults to localhost.
- port
- Server port. Defaults to 8983.
- basepath
Base path to the Solr service on the server. The final URL to the Solr service will be made of:
    $host:$port/$basepath
to which the actual commands will be appended. Defaults to /solr.
- vardir
- Optional override for the location of the directory where Solr stores its indexes and log files. Defaults to ${buildout:directory}/var/solr. This option and the script option make it possible to create multiple Solr instances in a single buildout and dedicate one or more of the instances to automated functional testing.
- logdir
- Optional override for the location of the Solr log files. Defaults to ${buildout:directory}/var/solr.
- pidpath
- Optional override for the location of the Solr pid file. Defaults to ${buildout:directory}/var/solr.
- jetty-template
- Optional override for the jetty.xml template. Defaults to templates/jetty.xml.tmpl.
- log4j-template
- Optional override for the log4j.properties template. Defaults to templates/log4j.properties.tmpl.
- logging-template
- Optional override for the logging.properties template. Defaults to templates/logging.properties.tmpl.
- jetty-destination
- Optional override for the directory where the jetty.xml file will be generated. Defaults to the Solr default location.
- extralibs
Optional includes of custom Java libraries. The option takes one path and one regular expression per line, separated by a colon. The regular expression is optional and defaults to .*\.jar (all jar files in a directory). Example:
    extralibs =
        /my/global/java/path
        some/special/libs:.*\.jarx
- script
- Optional override for the name of the generated Solr instance control script. Defaults to solr-instance. This option and the vardir option make it possible to create multiple Solr instances in a single buildout and dedicate one or more of the instances to automated functional testing.
- java_opts
Optional. Parameters to pass to the Java Virtual Machine (JVM) used to run Solr. Each option is specified on a separate line. For example:
    [solr-instance]
    ...
    java_opts =
        -Xms512M
        -Xmx1024M
    ...
- config-destination
- Optional override for the directory where the solrconfig.xml file will be generated. Defaults to the Solr default location.
- config-template
- Optional override for the template used to generate the solrconfig.xml file. Defaults to the template contained in the recipe, i.e. templates/solrconfig.xml.tmpl.
- max-num-results
- The maximum number of results the Solr server returns. This sets the rows option for the request handlers. Defaults to 500.
- maxWarmingSearchers
- Maximum number of searchers that may be warming in the background. Defaults to 4. For read-only slaves, a value of 1 or 2 is recommended.
- useColdSearcher
- If a request comes in without a warm searcher available, immediately use one of the warming searchers to handle the request. Defaults to false.
- mergeFactor
- Specify the index defaults merge factor. This value determines how many segments of equal size exist before being merged to a larger segment. With the default of 10, nine segments of 1000 documents will be created before they are merged into one containing 10000 documents, which in turn will be merged into one containing 100000 documents once that size is reached.
- ramBufferSizeMB
- Sets the amount of RAM that may be used by Lucene indexing for buffering added documents and deletions before they are flushed to the directory. Defaults to 16 MB.
- unlockOnStartup
- If true (the recipe's default), unlock any held write or commit locks on startup. This defeats the locking mechanism that allows multiple processes to safely access a Lucene index.
- abortOnConfigurationError
- If set to true, the Solr instance will not start up if there are configuration errors. This is useful in development environments to debug potential issues with schema and solrconfig. Defaults to false.
- spellcheckField
- Configures the field used as a source for the spellcheck search component. Defaults to default.
- autoCommitMaxDocs
- Lets you enable auto commit handling and force a commit once at least the given number of documents have been added. This is disabled by default.
- autoCommitMaxTime
- Lets you enable auto commit handling after a specified time in milliseconds. This is disabled by default.
- requestParsers-multipartUploadLimitInKB
- Optional <requestParsers /> parameter, useful if you are submitting very large documents to Solr. This may be the case if Solr is indexing binaries extracted from a request.
- directoryFactory
- Solr 4 allows for different directory factories: solr.StandardDirectoryFactory, solr.MMapDirectoryFactory, solr.NIOFSDirectoryFactory, solr.SimpleFSDirectoryFactory, solr.RAMDirectoryFactory or solr.NRTCachingDirectoryFactory. The default is solr.NRTCachingDirectoryFactory. If you are running a Solr instance for unit testing of an application, it could be useful to use solr.RAMDirectoryFactory.
- additional-solrconfig
- Optional additional configuration to be included inside the solrconfig.xml. For instance, <requestHandler /> directives.
- additional-solrconfig-query
- Optional additional configuration to be included inside the query section of solrconfig.xml. For instance, <listener /> directives.
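As a rough illustration of how several of the solrconfig.xml options above might be combined in one part, here is a minimal sketch. It assumes a solr-download part like the one in the examples at the end of this document. The numeric values and the /example handler name are hypothetical, chosen only for illustration, and are not tuning recommendations.

```ini
[solr]
recipe = collective.recipe.solrinstance
solr-location = ${solr-download:location}
# Fail fast on configuration errors while developing.
abortOnConfigurationError = true
# Hypothetical values: commit after 1000 added documents or 60 seconds.
autoCommitMaxDocs = 1000
autoCommitMaxTime = 60000
maxWarmingSearchers = 2
java_opts =
    -Xms512M
    -Xmx1024M
# Hypothetical extra handler, inserted verbatim into solrconfig.xml.
additional-solrconfig =
    <requestHandler name="/example" class="solr.SearchHandler" />
```

Both auto commit options are disabled unless set, so leaving them out keeps commits fully under the control of the indexing client.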
Fine grained control of query caching as described at http://wiki.apache.org/solr/SolrCaching.
The supported options are:
filterCacheSize
filterCacheInitialSize
filterCacheAutowarmCount
queryResultCacheSize
queryResultCacheInitialSize
queryResultCacheAutowarmCount
documentCacheSize
documentCacheInitialSize
documentCacheAutowarmCount
(only for Solr 4)
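A sketch of how the cache options listed above could be set in a part. The sizes are hypothetical and only show the syntax; suitable values depend on the index and query load. The solr-download part is assumed to be defined as in the examples at the end of this document.

```ini
[solr]
recipe = collective.recipe.solrinstance
solr-location = ${solr-download:location}
# Hypothetical sizes, for illustration only.
filterCacheSize = 16384
filterCacheInitialSize = 4096
filterCacheAutowarmCount = 4096
queryResultCacheSize = 512
queryResultCacheInitialSize = 128
queryResultCacheAutowarmCount = 32
documentCacheSize = 512
documentCacheInitialSize = 512
```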
- schema-destination
- Optional override for the directory where the schema.xml file will be generated. Defaults to the Solr default location.
- schema-template
- Optional override for the template used to generate the schema.xml file. Defaults to the template contained in the recipe, i.e. templates/schema.xml.tmpl.
- stopwords-template
- Optional override for the template used to generate the stopwords.txt file. Defaults to the template contained in the recipe, i.e. templates/stopwords.txt.tmpl.
- extra-field-types
- Configure the extra field types available to be used in the index option. You can create custom field types with special analyzers and tokenizers; see Solr's complete reference: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
- extra-conf-files
- Add extra files to the conf folder, such as synonyms.txt or hunspell files (see https://wiki.apache.org/solr/Hunspell).
- filter
Configure filters for analyzers for the default field types. These accept tokens produced by a given tokenizer and process them in series to either add, change or remove tokens. After all filters have been applied, the resulting token stream is indexed into the given field.
This option applies to the default analyzer for a given field. By default, Solr considers this to apply to both query and index analyzers. If you want to configure separate analyzers, see the filter-query and filter-index options below.
Each filter is configured on a separate line and each filter will be applied to tokens (during Solr operation) in the order specified.
Each line should read like:
    text solr.EdgeNGramFilterFactory minGramSize="2" maxGramSize="15" side="front"
In the above example, text is the type, one of the built-in field types; solr.EdgeNGramFilterFactory is the class for this filter; and minGramSize="2" maxGramSize="15" side="front" are the parameters for the filter's configuration. They should be formatted as XML attributes.
By default, for the default analyzer (being both query and index), text fields are filtered using:
    solr.ICUFoldingFilterFactory
    solr.WordDelimiterFilterFactory
    solr.TrimFilterFactory
    solr.StopFilterFactory
To suppress default behaviour, configure the filter option accordingly. If you want no filters, then set filter = (as an empty option) in your Buildout configuration. This is useful in the situation where you want no default filters and want full control over specifying filters on a per-analyzer basis.
Check the available filters in Solr's documentation: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#TokenFilterFactories
- filter-query
Configure filters for default field types for query analyzers only. This option is like filter but only applies to the query analyzer for a given field.
Configuration syntax is the same as the filter option above. Options specified here will be added after any that apply from usage of the main filter option.
- filter-index
Configure filters for default field types for index analyzers only. This option is like filter but only applies to the index analyzer for a given field.
Configuration syntax is the same as the filter option above. Options specified here will be added after any that apply from usage of the main filter option.
- char-filter
Configure character filters (CharFilterFactories) for analyzers for the default field types. These are pre-processors for input characters in Solr fields or queries (consuming and producing a character stream) that can add, change or remove characters while preserving character position information.
This option applies to the default analyzer for a given field. By default, Solr considers this to apply to both query and index analyzers. If you want to configure separate analyzers, see the char-filter-query and char-filter-index options below.
Each char filter is configured on a separate line, following the same configuration syntax as the filter option above. Char filters will be applied to the character stream (during Solr operation) in the order specified.
By default, no char filters are specified for any analyzers.
Information about available character filters is available in Solr's documentation: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#CharFilterFactories
- char-filter-query
Configure character filters for default field types for query analyzers only. This option is like char-filter but only applies to the query analyzer for a given field type.
Configuration syntax is the same as the filter option above. Options specified here will be added after any that apply from usage of the main char-filter option.
- char-filter-index
Configure character filters for default field types for index analyzers only. This option is like char-filter but only applies to the index analyzer for a given field type.
Configuration syntax is the same as the filter option above. Options specified here will be added after any that apply from usage of the main char-filter option.
- tokenizer
Configure tokenizers for analyzers for the default field types.
This option applies to the default analyzer for a given field. By default, Solr considers this to apply to both query and index analyzers. If you want to configure separate analyzers, see the tokenizer-query and tokenizer-index options below.
Each tokenizer is configured on a separate line, following the same configuration syntax as the filter option above. Only one tokenizer may be specified per analyzer type for a given field type. If you specify multiple tokenizers for the same field type, the last one specified will take precedence.
By default, for the default analyzer (being both query and index):
    text fields are tokenized using solr.ICUTokenizerFactory
    text_ws fields are tokenized using solr.WhitespaceTokenizerFactory
- tokenizer-query
Configure a tokenizer for default field types for query analyzers only. This option is like tokenizer, but only applies to the query analyzer for a given field type.
Configuration syntax is the same as the filter option above. Options specified here will override any that apply from usage of the main tokenizer option. For instance, if you specified a text_ws tokenizer within the tokenizer option, and re-specify another text_ws tokenizer here, then this will take precedence. Other field types will not be affected if not overridden.
- tokenizer-index
Configure a tokenizer for default field types for index analyzers only. This option is like tokenizer, but only applies to the index analyzer for a given field type.
Configuration syntax is the same as the filter option above. Options specified here will override any that apply from usage of the main tokenizer option. For instance, if you specified a text_ws tokenizer within the tokenizer option, and re-specify another text_ws tokenizer here, then this will take precedence. Other field types will not be affected if not overridden.
- index
Configures the different types of index fields provided by the Solr instance. Each field is configured on a separate line. Each line contains a whitespace-separated list of [key]:[value] pairs which define options associated with the index. Common field options are detailed at http://wiki.apache.org/solr/SchemaXml#Common_field_options and are illustrated in the examples at the end of this document.
A special [key]:[value] pair is available for Copy Fields: if you specify copyfield:dest_field, then a <copyField> declaration will be included in the schema that copies the given field into dest_field.
- unique-key
- Optional override for declaring a field to be unique for all documents. See http://wiki.apache.org/solr/SchemaXml for more information. Defaults to 'uid'.
- default-search-field
- Configure a default search field, which is used when no field was explicitly given. See http://wiki.apache.org/solr/SchemaXml.
- default-operator
- The default operator to use for queries. Valid values are AND and OR. Defaults to OR.
- additional-schema-config
- Optional additional configuration to be included inside the schema.xml. For instance, custom <copyField /> directives and anything else that's part of the schema configuration (see http://wiki.apache.org/solr/SchemaXml).
- additionalFieldConfig
Optional additional configuration which is placed inside the <fields>...</fields> directive in schema.xml. Use this to insert dynamic fields. For example:
    additionalFieldConfig =
        <dynamicField name="..." type="string" indexed="true" stored="true" />
Defaults to '' (empty string).
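To make the interplay of the analyzer options above more concrete, here is a minimal sketch of a part that configures the text field type. The field names (title, fulltext) are hypothetical, and the solr-download part is assumed to be defined as in the examples at the end of this document. It relies on the behaviour described above: setting filter replaces the default filters, and filter-index entries are added after those from filter.

```ini
[solr]
recipe = collective.recipe.solrinstance
solr-location = ${solr-download:location}
# "uid" is the default unique-key, so a matching field is declared.
index =
    name:uid type:string indexed:true stored:true required:true
    name:title type:text indexed:true stored:true copyfield:fulltext
    name:fulltext type:text indexed:true stored:false
# Applies to both the query and the index analyzer of "text".
tokenizer =
    text solr.WhitespaceTokenizerFactory
filter =
    text solr.LowerCaseFilterFactory
# Index analyzer only: added after the filters from "filter".
filter-index =
    text solr.EdgeNGramFilterFactory minGramSize="2" maxGramSize="15" side="front"
char-filter-index =
    text solr.HTMLStripCharFilterFactory
```

Under these assumptions, the index analyzer for text would strip HTML, split on whitespace, lowercase, and then produce edge n-grams, while the query analyzer would only split on whitespace and lowercase. The generated schema.xml is the place to confirm the resulting chains.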
The following options only apply if collective.recipe.solrinstance:mc is specified. They are optional if the normal recipe is being used.
All options defined in the solr-instance section are inherited by the cores.
A core can override a previously defined option.
- cores
A list of identifiers of Buildout configuration sections that correspond to individual Solr core configurations. Each identifier specified will have the section it relates to processed according to the given options above to generate Solr configuration files for each core. See Multi-core Solr for an example.
Each identifier specified will result in a Solr instanceDir being created and entries for each core placed in Solr's solr.xml configuration.
- default-core-name
- Optional. This option controls which core is set as the default for incoming requests that do not specify a core name. This corresponds to the defaultCoreName option described at http://wiki.apache.org/solr/CoreAdmin#cores.
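The inheritance rule and the default-core-name option described above might be combined as in the following sketch. The core names and values are hypothetical, and the solr-download part is assumed to be defined as in the examples at the end of this document.

```ini
[solr-mc]
recipe = collective.recipe.solrinstance:mc
solr-location = ${solr-download:location}
# Set at instance level, so inherited by every core unless overridden.
max-num-results = 100
cores =
    core1
    core2
# Requests that name no core are routed to core1.
default-core-name = core1

[core1]
unique-key = uid
index =
    name:uid type:string indexed:true stored:true required:true

[core2]
# Overrides the inherited value for this core only.
max-num-results = 66
unique-key = uid
index =
    name:uid type:string indexed:true stored:true required:true
```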
- section-name
- Name of the product-config section to be generated for zope.conf. Defaults to solr.
- zope-conf
Optional override for the configuration snippet that is generated to be included in zope.conf by other recipes. Defaults to:
    <product-config ${part:section-name}>
        address ${part:host}:${part:port}
        basepath ${part:basepath}
    </product-config>
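One way another recipe could pick up the generated snippet is sketched below. This assumes a Zope instance built with plone.recipe.zope2instance and its zope-conf-additional option, and a Solr part named solr. Neither is prescribed by this recipe, and the user and port values are placeholders.

```ini
[instance]
recipe = plone.recipe.zope2instance
user = admin:admin
http-address = 8080
# Interpolates the snippet exported by the Solr part into zope.conf.
zope-conf-additional = ${solr:zope-conf}
```

Following the default template above, with section-name = SOLR, host = 127.0.0.1, port = 1234 and the default basepath, the snippet would contain a product-config section named SOLR with address 127.0.0.1:1234 and basepath /solr.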
A simple example of how a single Solr configuration could look:
    [buildout]
    parts =
        solr-download
        solr

    [solr-download]
    recipe = hexagonit.recipe.download
    strip-top-level-dir = true
    url = http://mirrorservice.nomedia.no/apache.org//lucene/solr/3.5.0/apache-solr-3.5.0.zip

    [solr]
    recipe = collective.recipe.solrinstance
    solr-location = ${solr-download:location}
    host = 127.0.0.1
    port = 1234
    max-num-results = 500
    section-name = SOLR
    unique-key = uniqueID
    index =
        name:uniqueID type:string indexed:true stored:true required:true
        name:Foo type:text copyfield:Baz
        name:Bar type:date indexed:false stored:false required:true multivalued:true omitnorms:true copyfield:Baz
        name:Foo bar type:text
        name:Baz type:text
        name:Everything type:text
    filter =
        text solr.LowerCaseFilterFactory
    char-filter-index =
        text solr.HTMLStripCharFilterFactory
    tokenizer-query =
        text solr.WhitespaceTokenizerFactory
    additional-schema-config =
        <copyField source="*" dest="Everything"/>
To configure Solr for multiple cores, you must use the collective.recipe.solrinstance:mc recipe. An example of a multi-core Solr configuration could look like the following:
    [buildout]
    parts =
        solr-download
        solr-mc

    [solr-download]
    recipe = hexagonit.recipe.download
    strip-top-level-dir = true
    url = http://mirrorservice.nomedia.no/apache.org//lucene/solr/3.5.0/apache-solr-3.5.0.zip

    [solr-mc]
    recipe = collective.recipe.solrinstance:mc
    solr-location = ${solr-download:location}
    host = 127.0.0.1
    port = 1234
    section-name = SOLR
    directoryFactory = solr.NRTCachingDirectoryFactory
    cores =
        core1
        core2

    [core1]
    max-num-results = 99
    unique-key = uniqueID
    index =
        name:uniqueID type:string indexed:true stored:true required:true
        name:Foo type:text copyfield:Baz
        name:Bar type:date indexed:false stored:false required:true multivalued:true omitnorms:true copyfield:Baz
        name:Foo bar type:text
        name:Baz type:text
        name:Everything type:text
    filter =
        text solr.LowerCaseFilterFactory
    char-filter-index =
        text solr.HTMLStripCharFilterFactory
    tokenizer-query =
        text solr.WhitespaceTokenizerFactory
        text solr.LowerCaseFilterFactory
    additional-schema-config =
        <copyField source="*" dest="Everything"/>

    [core2]
    max-num-results = 66
    unique-key = uid
    index =
        name:uid type:string indexed:true stored:true required:true
        name:La type:text
        name:Le type:date indexed:false stored:false required:true multivalued:true omitnorms:true
        name:Lau type:text
    filter =
        text solr.LowerCaseFilterFactory
    char-filter-query =
        text solr.HTMLStripCharFilterFactory
    tokenizer-index =
        text solr.WhitespaceTokenizerFactory
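The vardir and script options are described above as a way to run a second instance dedicated to functional testing. A minimal sketch of such a setup follows. The part name solr-test, the port numbers and the directory names are hypothetical, and the solr-download part is assumed to provide a Solr 4 download, since solr.RAMDirectoryFactory is listed among the Solr 4 directory factories.

```ini
[buildout]
parts =
    solr-download
    solr
    solr-test

[solr]
recipe = collective.recipe.solrinstance
solr-location = ${solr-download:location}
port = 8983
index =
    name:uid type:string indexed:true stored:true required:true

[solr-test]
recipe = collective.recipe.solrinstance
solr-location = ${solr-download:location}
# Separate port, data directories and control script for the test instance.
port = 8984
vardir = ${buildout:directory}/var/solr-test
logdir = ${buildout:directory}/var/solr-test
pidpath = ${buildout:directory}/var/solr-test
script = solr-test
# Keep the test index in memory only.
directoryFactory = solr.RAMDirectoryFactory
index =
    name:uid type:string indexed:true stored:true required:true
```

Depending on the recipe version, the two parts may also need distinct config-destination, schema-destination and jetty-destination values so that their generated files do not overwrite each other.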