Skip to content

asaini/QProber

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Web Based Databases

This project usage is restricted to Educational & Research purposes only. It consists of 2 parts:

Part 1

Implements a web database classification algorithm that is described in

'QProber: A System for Automatic Classification of Hidden-Web Databases' by Gravano, Ipeirotis, and Sahami

Part 1 tries to classify web databases using a meta searching, whereby a query is made to Bing. This query searches for particular keywords listed in a directory,say root.txt and classifies database in a particular category by looking at the number of documents returned for those particular keywords

Part 2

Implements a simplified version of content-summary extraction algorithm, which is fully described in

'Classification Aware Hidden-Web Text Database Selection' by Ipeirotis and Gravano

Part 2 generates a dictionary containing all the terms listed on that website.

To run the program:

$ python qproberGeneric.py

Once running, it will ask for the following parameters

  • Coverage Threshold
  • Specificity Threshold
  • Database Url

Here are a few examples of input parameters

Coverage Threshold 100
Specificity Threshold: 0.6
Database Url: www.diabetes.org

Coverage Threshold: 100
Specificity Threshold: 0.6
Database Url: www.microsoft.com

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages