This is a web application for performing sentiment analysis on tweets and creating maps and histograms based on the results.
Text analysis is a major application field for machine learning algorithms. However, the raw data, a sequence of symbols, cannot be fed directly to the algorithms themselves, as most of them expect numerical feature vectors of a fixed size rather than raw text documents of variable length. To address this, scikit-learn provides utilities for the most common ways to extract numerical features from text content, namely:
- tokenizing strings and assigning an integer id to each possible token, for instance by using white-spaces and punctuation as token separators.
- counting the occurrences of tokens in each document.
- normalizing and weighting with diminishing importance tokens that occur in the majority of samples / documents.

A corpus of documents can thus be represented by a matrix with one row per document and one column per token (e.g. word) occurring in the corpus. We call vectorization the general process of turning a collection of text documents into numerical feature vectors. This specific strategy (tokenization, counting and normalization) is called the Bag of Words or Bag of n-grams representation; a minimal sketch of it follows.
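As an illustration (not part of the project code), scikit-learn's `CountVectorizer` performs the tokenization and counting steps described above; the toy corpus below is made up for the example:

```python
from sklearn.feature_extraction.text import CountVectorizer

# A toy corpus standing in for the tweet texts used by the app.
corpus = [
    "the flight was great",
    "the flight was delayed and the staff was rude",
    "great staff, terrible flight",
]

# Tokenize on word boundaries and count occurrences of each token.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)

print(vectorizer.get_feature_names_out())  # one column per token in the corpus
print(X.toarray())                         # one row per document, one count per token
```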
- tf(t, d): the number of times a term 't' occurs in a document 'd'
- idf(t): the inverse document frequency of term 't' across the corpus
- tf-idf(t, d) = tf(t, d) x idf(t)
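For example, scikit-learn's `TfidfVectorizer` combines the counting step above with tf-idf weighting; the smoothed idf and row normalization noted in the comments are scikit-learn defaults, not something defined by this project:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "the flight was great",
    "the flight was delayed and the staff was rude",
    "great staff, terrible flight",
]

# TfidfVectorizer = CountVectorizer followed by tf-idf weighting.
# By default scikit-learn uses a smoothed idf and L2-normalizes each row.
tfidf = TfidfVectorizer()
X = tfidf.fit_transform(corpus)

# Terms that appear in most documents (e.g. "flight") receive lower weights
# than terms that are rare in the corpus.
print(dict(zip(tfidf.get_feature_names_out(), X.toarray()[0].round(3))))
```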
After downloading the file, type in your command prompt:

```sh
streamlit run US_Airlines_Tweets.py
```
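For context, a Streamlit app of this kind typically follows the pattern sketched below. This is only a minimal, hypothetical sketch with assumed file and column names (`Tweets.csv`, `airline_sentiment`, `latitude`, `longitude`), not the actual contents of US_Airlines_Tweets.py:

```python
import pandas as pd
import streamlit as st

# Hypothetical input file and column names, used only for this sketch.
DATA_FILE = "Tweets.csv"

st.title("Sentiment Analysis of Tweets about US Airlines")

@st.cache_data
def load_data():
    return pd.read_csv(DATA_FILE)

data = load_data()

# Histogram of tweet counts per sentiment class.
sentiment_counts = data["airline_sentiment"].value_counts()
st.bar_chart(sentiment_counts)

# Map of tweet locations, filtered by the sentiment chosen in the sidebar.
choice = st.sidebar.selectbox("Sentiment", ["positive", "neutral", "negative"])
located = data[data["airline_sentiment"] == choice].dropna(subset=["latitude", "longitude"])
st.map(located[["latitude", "longitude"]])
```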
This project uses the following software and libraries:
Email: [email protected]
Project Link: https://github.com/pranaykankariya97/Web-App-For-Sentiment-Analysis-Of-Tweets