Pre_Processors
# Pre Processors
Pre processors convert the raw text of the question entered into the bot before it is passed to the brain to be answered.
Pre processors are chained together and are called in the order in which they are defined in the configuration file preprocessors.conf. Each pre processor is listed by its full Python class path.
As an example, the current Y-Bot pre processor configuration looks like this. A line starting with # indicates that the pre processor is commented out and will not be run.
programy.processors.pre.normalize.NormalizePreProcessor
programy.processors.pre.removepunctuation.RemovePunctuationPreProcessor
#programy.processors.pre.demojize.DemojizePreProcessor
#programy.processors.pre.splitchinese.SplitChinesePreProcessor
#programy.processors.pre.toupper.ToUpperPreProcessor
#programy.processors.pre.translate.TranslatorPreProcessor
#programy.processors.pre.wordtagger.WordTaggerPreProcessor
#programy.processors.pre.lemmatize.LemmatizePreProcessor
#programy.processors.pre.stemming.StemmingPreProcessor
#programy.processors.pre.stopwords.StopWordsPreProcessor
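The way such a file can be read is easy to sketch: keep each non-blank line in order and skip any line that starts with #. The function name `active_class_paths` is hypothetical and this is not Program-Y's own loader, only an illustration of the format described above.

```python
def active_class_paths(config_text):
    """Return the class paths that are not commented out, in file order."""
    paths = []
    for line in config_text.splitlines():
        line = line.strip()
        # Skip blank lines and lines disabled with a leading '#'
        if not line or line.startswith("#"):
            continue
        paths.append(line)
    return paths


example = """\
programy.processors.pre.normalize.NormalizePreProcessor
programy.processors.pre.removepunctuation.RemovePunctuationPreProcessor
#programy.processors.pre.toupper.ToUpperPreProcessor
"""

# Only the two uncommented entries are returned, in the order listed
print(active_class_paths(example))
```

Because the order of the returned list matches the order of the file, moving a line up or down in the file changes where that pre processor sits in the chain.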
The pre processors currently available are:
- NormalizePreProcessor. Normalises the input, replacing abbreviations and punctuation, and replacing numbers with words.
- RemovePunctuationPreProcessor. Removes all punctuation from the sentence.
- DemojizePreProcessor. Replaces emoji character strings with word equivalents.
- SplitChinesePreProcessor. Splits the sentence based on Chinese language grammar rules.
- ToUpperPreProcessor. Converts all text to upper case.
- TranslatorPreProcessor. Translates the text using Google Translate.
- WordTaggerPreProcessor. Uses NLP to tag each word with its grammar definition. See NLP for more details.
- LemmatizePreProcessor. Lemmatizes the string, for example replacing plural words with their singular form, e.g. mice to mouse. See NLP for more details.
- StemmingPreProcessor. Applies stemming to the string, reducing each word to its base stem, e.g. troubled, troubles and troubling to troubl. See NLP for more details.
- StopWordsPreProcessor. Removes all stop words from the string, e.g. as, is, the. See NLP for more details.
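To make the chaining concrete, here is a minimal sketch using two toy stand-ins for the punctuation and upper-case steps. The classes `ToyRemovePunctuation` and `ToyToUpper` and the helper `run_chain` are hypothetical and far simpler than the real Program-Y classes; they only borrow the `process(bot, clientid, string)` signature described on this page.

```python
import string


class ToyRemovePunctuation:
    """Simplified stand-in: strips ASCII punctuation only."""

    def process(self, bot, clientid, text):
        return text.translate(str.maketrans("", "", string.punctuation))


class ToyToUpper:
    """Simplified stand-in: converts the text to upper case."""

    def process(self, bot, clientid, text):
        return text.upper()


def run_chain(processors, bot, clientid, text):
    # Each processor receives the output of the previous one
    for processor in processors:
        text = processor.process(bot, clientid, text)
    return text


chain = [ToyRemovePunctuation(), ToyToUpper()]
result = run_chain(chain, None, "user1", "Hello, world!")
print(result)  # HELLO WORLD
```

The bot argument is passed as None here only because the toy classes ignore it; a real pre processor may need it.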
Pre processors inherit from the abstract base class
programy.processors.processing.PreProcessor
The class has a single method, process, which takes the bot, the client ID and the string to pre-process, and should return the processed string.
class PreProcessor(Processor):

    def __init__(self):
        Processor.__init__(self)

    @abstractmethod
    def process(self, bot, clientid, string):
        pass
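A custom pre processor then only needs to override process. The sketch below defines a local stand-in for the base class so that it runs without Program-Y installed; in a real bot you would presumably import PreProcessor from programy.processors.processing instead. `CollapseWhitespacePreProcessor` is a hypothetical example, not a class shipped with the library.

```python
from abc import ABC, abstractmethod


class PreProcessor(ABC):
    """Local stand-in for programy.processors.processing.PreProcessor (assumption)."""

    @abstractmethod
    def process(self, bot, clientid, string):
        pass


class CollapseWhitespacePreProcessor(PreProcessor):
    """Hypothetical custom pre processor: collapses runs of whitespace."""

    def process(self, bot, clientid, string):
        # split() with no arguments drops leading, trailing and repeated whitespace
        return " ".join(string.split())


processor = CollapseWhitespacePreProcessor()
cleaned = processor.process(None, "user1", "  what   is  your name  ")
print(cleaned)  # what is your name
```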
Once built and tested, the path to the class needs to be appended to the PYTHONPATH environment variable so that the class can be imported by its full class path.
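The reason the path matters is that a dotted class path can only be resolved if its module is importable. A rough sketch of that resolution using the standard importlib module follows; the helper `load_class` and its `extra_dir` parameter are hypothetical and this is not Program-Y's own loading code. Appending to sys.path at runtime has a similar effect to listing the directory in PYTHONPATH before starting Python.

```python
import importlib
import sys


def load_class(class_path, extra_dir=None):
    """Resolve a dotted path such as 'package.module.ClassName' to a class."""
    if extra_dir and extra_dir not in sys.path:
        # Runtime counterpart of adding the directory to PYTHONPATH
        sys.path.append(extra_dir)
    module_name, _, class_name = class_path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Demonstrated with a standard library class so the example runs anywhere
cls = load_class("collections.OrderedDict")
print(cls.__name__)  # OrderedDict
```

If the module cannot be found, import_module raises ModuleNotFoundError, which is the usual symptom of a missing PYTHONPATH entry.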
Twitter: @keiffster | Facebook: keith.sterling | LinkedIn: keithsterling