semantic-web-mining

Meaning-based web mining and scraping.

A new type of search engine.

Search engines such as Google and Bing offer a robust way to search for data on the internet. In this project we propose a model that ranks pages by the semantic similarity between the search query and the content of each website (for instance, all the headers and paragraphs in the page).

The model combines web scraping and NLP to produce the final results.

1. Web Scraping

Scrapy + BeautifulSoup
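
As a rough illustration of the extraction step (collecting the headers and paragraphs of a page), here is a minimal sketch. It is not the project's spider: it swaps in Python's standard-library html.parser as a stand-in for BeautifulSoup so that it runs without extra dependencies, and the names HeaderParagraphParser and extract_text_blocks are hypothetical.

```python
from html.parser import HTMLParser

# Tags whose text is kept: headers and paragraphs.
TEXT_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6", "p"}


class HeaderParagraphParser(HTMLParser):
    """Collects the text of header and paragraph elements (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.blocks = []   # finished text blocks, in document order
        self._depth = 0    # > 0 while inside a tracked tag
        self._parts = []   # text fragments of the block being built

    def handle_starttag(self, tag, attrs):
        if tag in TEXT_TAGS:
            self._depth += 1

    def handle_endtag(self, tag):
        if tag in TEXT_TAGS and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                # Join fragments and collapse runs of whitespace.
                text = " ".join("".join(self._parts).split())
                if text:
                    self.blocks.append(text)
                self._parts = []

    def handle_data(self, data):
        # Keep text only while inside a header or paragraph.
        if self._depth > 0:
            self._parts.append(data)


def extract_text_blocks(html):
    """Return the header and paragraph texts found in an HTML string."""
    parser = HeaderParagraphParser()
    parser.feed(html)
    parser.close()
    return parser.blocks


if __name__ == "__main__":
    page = "<h1>Title</h1><div>menu</div><p>Hello <b>world</b></p>"
    print(extract_text_blocks(page))
```

This sketch assumes well-formed markup with explicit closing tags; pages that omit closing paragraph tags would need a more forgiving parser, which is one reason to use BeautifulSoup in practice.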

2. Data Comparison (NLP)

Universal sentence encoder + cosine similarity
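
The comparison step can be sketched as follows, assuming the query and each page have already been turned into embedding vectors (in this project, by the Universal Sentence Encoder). The encoder call itself is left out, and the function names and the dictionary layout (URL mapped to vector) are assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction; treat it as not similar.
        return 0.0
    return dot / (norm_a * norm_b)


def rank_pages(query_vec, page_vecs):
    """Return (url, score) pairs sorted from most to least similar.

    page_vecs is assumed to map each URL to its embedding vector.
    """
    scored = [
        (url, cosine_similarity(query_vec, vec))
        for url, vec in page_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy 2-dimensional vectors; real sentence embeddings are much longer.
    query = [1.0, 0.0]
    pages = {"page-a": [0.0, 1.0], "page-b": [1.0, 0.1]}
    for url, score in rank_pages(query, pages):
        print(url, round(score, 3))
```

Cosine similarity depends only on the direction of the vectors, not their length, so a long page and a short query can still score as close when their meanings align.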

To Run

scrapy crawl getData -o data.json

A data.json file is created with the scraped items.

Open the Jupyter notebook to run the remaining code.
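
Scrapy's JSON feed export writes the scraped items as a single JSON array of objects, so the notebook side can start by loading data.json into a dictionary of page texts. The sketch below shows one way to do that; the field names url and text are assumptions, since the actual item fields depend on how the getData spider is defined.

```python
import json


def load_page_texts(path, url_key="url", text_key="text"):
    """Load scraped items and return a {url: text} dictionary.

    url_key and text_key are hypothetical field names; adjust them
    to match the fields the spider actually yields.
    """
    with open(path, encoding="utf-8") as f:
        items = json.load(f)  # a list of item dictionaries

    texts = {}
    for item in items:
        text = item.get(text_key)
        # A spider may yield a list of strings (e.g. one per paragraph).
        if isinstance(text, list):
            text = " ".join(text)
        if item.get(url_key) and text:
            texts[item[url_key]] = text
    return texts
```

A call such as load_page_texts("data.json") would then give the text per page, ready to be encoded and compared against the search query.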

nandishaivalli/semantic-web-mining