Skip to content

Latest commit

 

History

History
40 lines (25 loc) · 1.27 KB

README.md

File metadata and controls

40 lines (25 loc) · 1.27 KB

FUSEARCH

Python package badge https://pypi.python.org/pypi/fusearch twitter badge@plarroy

A Python3, console based full-text search for document collections. It converts different types of documents such as PDF, word files etc to text and creates a simple inverted index for queries.

The index is kept in a sqlite file in the indexed directory.

This software is ALPHA status

How to run

Recommend to create and activate a venv

virtualenv -p(which python3) venv
source venv/bin/activate

Edit fusearch.yaml and add some directory to index.

Start the daemon in foreground mode (-f) and see the indexing process take place.

pip install -e .
fusearchd.py -f -c fusearch.yaml

Dependencies

From textract:

apt-get install python-dev libxml2-dev libxslt1-dev antiword unrtf poppler-utils pstotext tesseract-ocr
flac ffmpeg lame libmad0 libsox-fmt-mp3 sox libjpeg-dev swig