An example of using JesterJ to index HTML, which will hopefully grow into more than a trivial example
-
SolrCloud installed where JesterJ can reach both Solr and ZooKeeper.
-
Java 11 installed (17+ won’t work for JesterJ yet)
-
Ability to build or download the Solr ref guide
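Since the Java version matters here, a quick check before starting can save a failed run. The following is a minimal sketch, not part of this project: the function names `parse_java_major` and `jesterj_java_status` are made up for illustration, and the "untested" label for anything other than 11 below 17 is an assumption, because the note above only says that 11 works and 17+ does not.

```shell
# Extract the major version from a Java version string,
# e.g. "11.0.19" -> 11, "1.8.0_292" -> 8, "17" -> 17.
parse_java_major() {
  _v="$1"
  case "$_v" in
    1.*) _v="${_v#1.}" ;;   # legacy scheme: 1.8.0 means Java 8
  esac
  # Drop everything from the first non-digit onward.
  printf '%s\n' "${_v%%[!0-9]*}"
}

# Classify a major version against the constraint stated above.
# "untested" is an assumption: the README does not cover those versions.
jesterj_java_status() {
  if [ "$1" -eq 11 ]; then
    echo "ok"
  elif [ "$1" -ge 17 ]; then
    echo "unsupported"
  else
    echo "untested"
  fi
}

# Usage (assumes JAVA_HOME points at the JDK you intend to use):
#   raw=$("$JAVA_HOME/bin/java" -version 2>&1 \
#         | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
#   jesterj_java_status "$(parse_java_major "$raw")"
```

The parsing is kept separate from the call to `java -version` so it can be tried on a plain string without a JDK at hand.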
-
Get the ref guide. You will need to build the ref guide as shown below (this may get checked in for convenience at a later date)
cd /dir/you/checked/out/solr/in
./gradlew :solr:solr-ref-guide:buildLocalSite
-
Update the Gradle build with the ZooKeeper address of your Solr installation:
solr {
  zkHost 'localhost:2181/solr'  // <<<<<<<<<< update this if required
  confName 'ref-guide'
  confDir 'src/main/solr/configs/ref-guide'
}
-
Start Solr, then upload the configset for this project via
cd /dir/you/checked/out/this/project
./gradlew upconfig
-
Then in org.jesterj.index.refguide.SolrRefguideConfig paste the absolute path to /solr/solr/solr-ref-guide/build/site into
File refGuideAbsoluteLocation = new File("YOUR__PATH__HERE");
-
Adjust the ZooKeeper settings here:
sendToSolrBuilder
    .named("solr_sender")
    .withProcessor(
        new SendToSolrCloudProcessor.Builder()
            .named("solr_processor")
            .withZookeeper("YOUR_ZK:2181/ZK_ROOT_IF_APPLICABLE")
-
Download the JesterJ 1.0.0 node jar (jesterj-ingest-1.0.0-node.jar) from https://github.com/nsoft/jesterj/releases
-
Build this project with Java 11 (adjust JAVA_HOME to wherever your own JDK 11 lives) via
export JAVA_HOME=/home/gus/.jdks/azul-11.0.19
./gradlew packageUnoJar
-
Index the ref guide!
export JAVA_HOME=/home/gus/.jdks/azul-11.0.19
$JAVA_HOME/bin/java -jar jesterj-ingest-1.0.0-node.jar build/libs/index-solr-ref-guide-1.0-SNAPSHOT-dep.jar solrrefguide s3cret
You will see a few exceptions, but these come from some image files that Tika doesn’t like; after 3 tries JesterJ marks those files dead and ignores them from then on. JesterJ keeps running and checks the files for changes every minute. Any ref guide files that are updated will be re-indexed.
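If the run stops right away instead of showing only the Tika exceptions described above, it may be worth confirming that the inputs from the earlier steps are actually in place. Here is a small preflight sketch under that assumption. The function names `has_html` and `preflight` are hypothetical and not part of JesterJ or this project; the checks only look at the file system and say nothing about Solr or ZooKeeper.

```shell
# Succeeds if the directory exists and contains at least one .html file.
has_html() {
  [ -d "$1" ] &&
    [ -n "$(find "$1" -type f -name '*.html' -print 2>/dev/null | head -n 1)" ]
}

# Report which inputs to the indexing run are missing.
# $1 = built ref guide site dir, $2 = JesterJ node jar, $3 = this project's jar
preflight() {
  _site="$1"; _node="$2"; _proj="$3"; _rc=0
  has_html "$_site" || { echo "no HTML under: $_site"; _rc=1; }
  [ -f "$_node" ]   || { echo "missing node jar: $_node"; _rc=1; }
  [ -f "$_proj" ]   || { echo "missing project jar: $_proj"; _rc=1; }
  return "$_rc"
}

# Usage, run from this project's directory (replace the first path with
# the absolute path you pasted into SolrRefguideConfig):
#   preflight /path/to/solr-ref-guide/build/site \
#     jesterj-ingest-1.0.0-node.jar \
#     build/libs/index-solr-ref-guide-1.0-SNAPSHOT-dep.jar
```

Each missing input is printed on its own line, and the function returns non-zero if anything is missing, so it can gate the `java -jar` command with `&&`.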
If you want to re-index for some reason, the easiest thing to do is delete ~/.jj/solrrefguide, which holds all the JesterJ logs and the files for the Cassandra database.
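The delete can be wrapped in a small helper so that a typo cannot widen it to the whole of ~/.jj. This is only a sketch: `reset_jj_state` is a hypothetical name, and it assumes the directory name under ~/.jj matches the `solrrefguide` argument given on the command line, which the matching names suggest but the text above does not state. Stopping JesterJ before deleting is probably the safest order, though that is also an assumption.

```shell
# Remove the JesterJ state directory for one id.
# $1 = id (assumed to match the directory name, e.g. "solrrefguide")
# $2 = JesterJ home, defaulting to ~/.jj as described above
reset_jj_state() {
  _id="$1"
  _home="${2:-$HOME/.jj}"
  # Refuse empty or path-like ids so a typo cannot expand the delete.
  case "$_id" in
    ""|*/*|.|..)
      echo "refusing suspicious id: '$_id'" >&2
      return 2
      ;;
  esac
  rm -rf -- "$_home/$_id"
}

# Usage:
#   reset_jj_state solrrefguide
```

Because this removes the logs along with the Cassandra files, copy out any logs you still want first.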
You may notice that JesterJ regularly logs a Graphviz visualization of its current state. You can paste that into https://dreampuf.github.io/GraphvizOnline/ and you will get something that looks like: