nandishaivalli/semantic-web-mining

semantic-web-mining

Meaning-based web mining and scraping.

A new type of search engine.

Search engines such as Google and Bing offer a robust way of searching for data on the internet. In this project we propose a model that searches and ranks pages based on the semantic similarity between the search query and the text of each website (for instance, all the headers and paragraphs on the page).

The project combines web scraping and NLP to produce the final results.

1. Web Scraping

Scrapy + Beautiful Soup

2. Data Comparison (NLP)

Universal Sentence Encoder + cosine similarity
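
The comparison step can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the project's notebook: `embed` is a hypothetical stand-in that builds simple word-count vectors so the example runs without downloading a model, whereas the project obtains its vectors from the Universal Sentence Encoder. The names `cosine_similarity` and `rank_pages` and the example URLs are also illustrative.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text, vocabulary):
    """Hypothetical stand-in for a sentence encoder: a word-count vector.

    Assumption: the real project would call the Universal Sentence
    Encoder here instead of counting words.
    """
    counts = Counter(tokenize(text))
    return [float(counts[word]) for word in vocabulary]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def rank_pages(query, pages):
    """Rank pages (a dict of url -> text) by similarity to the query.

    Returns a list of (url, score) pairs, highest score first.
    """
    vocabulary = sorted(
        {word for text in [query, *pages.values()] for word in tokenize(text)}
    )
    query_vec = embed(query, vocabulary)
    scored = [
        (url, cosine_similarity(query_vec, embed(text, vocabulary)))
        for url, text in pages.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Illustrative data only; these pages are not from the project.
pages = {
    "https://example.com/a": "Scrapy is a framework for crawling web pages",
    "https://example.com/b": "A recipe for baking sourdough bread",
}
ranking = rank_pages("crawling web pages", pages)
```

Replacing `embed` with a real sentence encoder leaves the ranking logic unchanged, since cosine similarity only needs two vectors of equal length.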


To Run

Run the spider from the Scrapy project directory:

scrapy crawl getData -o data.json

A data.json file is created with the scraped items.

Open the Jupyter notebook to run the remaining code.
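
For orientation, the scraped items could be read back from data.json along these lines. Scrapy's JSON feed export writes a single JSON array of items, but the field names used here (`url` and `text`) are assumptions, as is the helper name `load_page_texts`; the spider's actual item fields may differ.

```python
import json
from pathlib import Path


def load_page_texts(path, url_field="url", text_field="text"):
    """Load scraped items and return a dict mapping url -> page text.

    Assumption: each item is a JSON object with a URL field and a text
    field; the field names are hypothetical and should be adjusted to
    match the spider's items.
    """
    items = json.loads(Path(path).read_text(encoding="utf-8"))
    pages = {}
    for item in items:
        url = item.get(url_field)
        text = item.get(text_field)
        if not url or not text:
            continue  # skip items missing either field
        if isinstance(text, list):
            # Headers and paragraphs may be stored as a list of strings.
            text = " ".join(text)
        pages[url] = text
    return pages


# Only attempt to load the file if the crawl has already produced it.
if Path("data.json").exists():
    pages = load_page_texts("data.json")
```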

