=== Rule Engine ===

A simple system that processes work items based on the rules defined for them. The idea is to alleviate the pain of site-specific parsing when extracting relevant content. Three types of rules can be defined:

CSS based extraction
Regex based extraction
Executing JS on the HTML to extract content (not done yet, WIP)
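As an illustration of the regex-based rule type, here is a minimal sketch using only `java.util.regex`. The class name `RegexRule` and its `extract` method are hypothetical and are not taken from this project's source. The sketch also assumes every rule pattern contains exactly one capture group, which marks the content to extract.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of a regex-based extraction rule; not the project's actual API.
final class RegexRule {
    private final Pattern pattern;

    // Assumption: the regex contains one capture group marking the content to extract.
    RegexRule(String regex) {
        // DOTALL lets '.' match across line breaks, which are common in HTML.
        this.pattern = Pattern.compile(regex, Pattern.DOTALL);
    }

    // Returns the trimmed first capture group of every match, in document order.
    List<String> extract(String html) {
        List<String> results = new ArrayList<>();
        Matcher matcher = pattern.matcher(html);
        while (matcher.find()) {
            results.add(matcher.group(1).trim());
        }
        return results;
    }

    public static void main(String[] args) {
        RegexRule title = new RegexRule("<h1[^>]*>(.*?)</h1>");
        System.out.println(title.extract("<html><h1> Hello </h1></html>")); // prints [Hello]
    }
}
```

Regular expressions are brittle on nested or malformed markup, so a rule like this is probably best kept for small, predictable fragments. CSS-based rules would cover the structural cases.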

The rule engine returns results based on the best fit for a particular URL. For example, if rules were defined as below:

*:
    title: h1
    comments: div#comments, div#comment, div#user_comment
    user: div#comment span.user

xyz.com:
    title: div#name
    comments: div#comment_bar span.comments

Now, if a user wanted to scrape title, comments and user for xyz.com, the title and comments rules defined for xyz.com would take effect, and the user rule defined at * would take effect as the fallback. (WIP)
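The best-fit lookup described above can be sketched as a two-level map with a wildcard fallback. This is only an illustration under stated assumptions: `RuleResolver`, `define`, and `resolve` are hypothetical names, and since the feature is marked WIP, the project's actual resolution logic may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of best-fit rule resolution; names are illustrative only.
final class RuleResolver {
    static final String WILDCARD = "*";

    // Maps site -> (field -> selector).
    private final Map<String, Map<String, String>> rules = new LinkedHashMap<>();

    void define(String site, String field, String selector) {
        rules.computeIfAbsent(site, key -> new LinkedHashMap<>()).put(field, selector);
    }

    // Prefers the site-specific selector, then the wildcard one; returns null if neither exists.
    String resolve(String site, String field) {
        Map<String, String> specific = rules.get(site);
        if (specific != null && specific.containsKey(field)) {
            return specific.get(field);
        }
        Map<String, String> fallback = rules.get(WILDCARD);
        return fallback == null ? null : fallback.get(field);
    }

    public static void main(String[] args) {
        RuleResolver resolver = new RuleResolver();
        resolver.define("*", "title", "h1");
        resolver.define("*", "comments", "div#comments, div#comment, div#user_comment");
        resolver.define("*", "user", "div#comment span.user");
        resolver.define("xyz.com", "title", "div#name");
        resolver.define("xyz.com", "comments", "div#comment_bar span.comments");

        System.out.println(resolver.resolve("xyz.com", "title")); // prints div#name
        System.out.println(resolver.resolve("xyz.com", "user"));  // prints div#comment span.user
    }
}
```

Resolving each field separately, rather than picking a whole rule set per site, is what lets a site override only the selectors that differ from the defaults.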


brewkode/rule_engine
