=== Rule Engine ===

A simple system which can work and process work items based on rules define for them. The idea was to alieviate the pain of site specific parsing to extract relevant content Three types of rules can be defined.

CSS based extraction
Regex based extraction
Executing JS on the HTML to extract content ( Not done yet - WIP)

The Rule engine would return results based on the best fit for a particular URL For ex., if Rules were defined as below:

*: title: h1 comments: div#comments, div#comment, div#user_comment user: div#comment span.user

xyz.com: title: div#name comments: div#comment_bar span.comments

Now, if a user wanted to scrape title, comments & user for xyz.com the title, comments would xyz.com would take effect. And, the rule for user at * will take effect. (WIP)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Files

README.md

Latest commit

History

README.md

File metadata and controls