Create a tool that can crawl the web on a specific topic and produce a synthesis of the content found. The challenge is to
weight the sources according to their quality. The solution should show content from various sources and rate those sources
based on the quality of their content. It should also be able to crawl automatically and learn from both the content and the source.
1. Input the query from the user.
2. Convert the text query into a search query URL to gather results from various search engines such as Google, Yahoo, Bing and DuckDuckGo.
3. Scrape the search result pages using our Custom Parser and store the gathered links in a file.
4. Visit each link gathered in the above step and get the content of the webpage.
5. Process each webpage and rank the pages solely on the basis of their content with the help of our Custom Ranking algorithm.
6. Render results to the user.
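As an illustration of step 3, here is a minimal sketch of pulling outbound links from a search result page. It is not the project's Custom Parser: `LinkCollector` and `extract_result_links` are hypothetical names, and the sketch uses the standard library's `html.parser` to stay dependency-free, whereas the project itself lists BeautifulSoup. It assumes that links pointing back to the search engine's own host are navigation and can be skipped.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect unique absolute http(s) links from <a> tags (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.base_host = urlparse(base_url).netloc
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page they were found on.
        absolute = urljoin(self.base_url, href)
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https"):
            return  # skip mailto:, javascript:, and similar schemes
        if parsed.netloc == self.base_host:
            return  # assumption: same-host links are search engine navigation
        if absolute not in self.links:
            self.links.append(absolute)


def extract_result_links(html, base_url):
    """Return outbound links found in `html`, in document order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links
```

The returned list could then be written to a file, one URL per line, before the pages are visited in step 4.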
To acquire a set of relevant sites that serve content related to the entered keywords, the first step is to boot the spider with the keywords and let it gather links from four major search engines:
* Google
* Bing
* Yahoo
* DuckDuckGo
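Turning the keywords into one search URL per engine might look like the sketch below. The endpoint URLs and query parameter names in `SEARCH_ENDPOINTS` are assumptions based on the engines' public search pages, not values taken from this project, and `build_search_urls` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Assumed endpoints and query parameter names; verify before relying on them.
SEARCH_ENDPOINTS = {
    "google": ("https://www.google.com/search", "q"),
    "bing": ("https://www.bing.com/search", "q"),
    "yahoo": ("https://search.yahoo.com/search", "p"),
    "duckduckgo": ("https://duckduckgo.com/html/", "q"),
}


def build_search_urls(query):
    """Map each engine name to a search URL for the given keywords."""
    cleaned = " ".join(query.split())  # collapse repeated whitespace
    if not cleaned:
        raise ValueError("query must not be empty")
    return {
        name: f"{base}?{urlencode({param: cleaned})}"
        for name, (base, param) in SEARCH_ENDPOINTS.items()
    }
```

Each resulting URL can then be fetched (for example with Requests) and handed to the parser. Search engines may rate-limit or block automated requests, so a real spider would likely need delays and a descriptive User-Agent.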
The ranking algorithm compares all results with each other and returns the results that share the most information with the rest.
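One simple way to realise "most common information" is to score each page by its average similarity to every other page. The sketch below is a stand-in and not the project's Custom Ranking algorithm: it assumes plain word counts and cosine similarity are an acceptable proxy for shared content, and `term_counts`, `cosine` and `rank_by_agreement` are hypothetical names. It uses only the standard library; the same idea could be expressed with NumPy or SciPy vectors.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lower-case the text and count alphanumeric word tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two Counter term vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_agreement(pages):
    """Rank pages by mean similarity to all other pages.

    `pages` maps a URL to the page's extracted text. Returns a list of
    (url, score) pairs sorted from highest to lowest score.
    """
    vectors = {url: term_counts(text) for url, text in pages.items()}
    scores = {}
    for url, vec in vectors.items():
        others = [cosine(vec, v) for u, v in vectors.items() if u != url]
        scores[url] = sum(others) / len(others) if others else 0.0
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

A page that repeats what most other sources say scores high, while an off-topic page scores near zero. Note that agreement is not the same as quality: widely copied but wrong content would also rank well under this scheme.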
* NumPy
* SciPy
* BeautifulSoup
* Requests
* Tornado