
Natural-language-Processing

Text Summarization & Web Scraping

This file takes a piece of text scraped from a website and applies basic text pre-processing techniques such as lemmatization, word and sentence tokenization, stopword removal, punctuation removal, upper- to lower-case conversion, and digit removal. The pre-processed text is then used to compute a frequency distribution of words, and finally for text ranking and summarization using TF-IDF and Gensim, and the results are compared.
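A minimal, dependency-free sketch of the frequency-based part of this pipeline (the actual notebook uses NLTK and Gensim; the tiny stopword list and helper names below are illustrative assumptions, not the repo's code):

```python
import re
from collections import Counter

# A tiny stopword list for illustration; the notebook uses NLTK's full list.
STOPWORDS = {"the", "a", "an", "is", "are", "and", "of", "to", "in", "it"}

def preprocess(text):
    """Lower-case, strip punctuation and digits, tokenize, drop stopwords."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # removes punctuation and digits
    return [w for w in text.split() if w not in STOPWORDS]

def summarize(text, n_sentences=1):
    """Rank sentences by the summed frequency of their words; keep the top ones."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(preprocess(text))
    ranked = sorted(sentences,
                    key=lambda s: sum(freq[w] for w in preprocess(s)),
                    reverse=True)
    return " ".join(ranked[:n_sentences])
```

TF-IDF-based ranking replaces the raw counts with a term weight that discounts words common across all sentences; the sentence-scoring loop stays the same.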

Text Summarization with N-Grams

The same approach as above is applied to this text data, but N-grams are used to compute the word frequencies for summarization. Unigrams, bigrams, and trigrams are created first and then used for the frequency count.
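The N-gram construction can be sketched in a few lines of pure Python (the sample sentence and function name are illustrative):

```python
from collections import Counter

def ngrams(tokens, n):
    """Slide a window of size n over the token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "the cat sat on the mat".split()
unigrams = Counter(ngrams(tokens, 1))  # single words
bigrams = Counter(ngrams(tokens, 2))   # adjacent word pairs
trigrams = Counter(ngrams(tokens, 3))  # adjacent word triples
```

These counters then play the same role as the word-frequency table in the previous section.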

Word Prediction with N-Grams

Created word tokens from the sentences, found the frequency of each unigram and the relative frequency of each bigram, then performed word prediction using the relative frequencies as probabilities.

NER & De-Identification Using spaCy

Used the spaCy library to perform Named Entity Recognition on a web-scraped news article.
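De-identification then replaces each recognized entity with its label. In spaCy, entities come from `doc.ents` (after `nlp = spacy.load("en_core_web_sm")`), each with `start_char`, `end_char`, and `label_`. The redaction step itself can be sketched without the model (the helper name and sample spans are illustrative):

```python
def deidentify(text, entities):
    """Replace each entity span with its label.

    `entities` is a list of (start, end, label) tuples, e.g. built from
    spaCy via [(e.start_char, e.end_char, e.label_) for e in doc.ents].
    """
    out, last = [], 0
    for start, end, label in sorted(entities):
        out.append(text[last:start])   # text before the entity
        out.append(f"[{label}]")       # redacted placeholder
        last = end
    out.append(text[last:])            # trailing text
    return "".join(out)
```

Applied to "Alice met Bob in Paris." with PERSON and GPE spans, this yields "[PERSON] met [PERSON] in [GPE]."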