wenyingw/Natural-Language-Processing-Project

Natural Language Processing Project

Overview:

This project applies text analysis techniques to unstructured text spread across multiple documents, with the aim of surfacing insights and uncovering hidden themes in those documents. As a result, 42 txt files were grouped into 5 topics, and the overall sentiment of each file was classified. The process includes:

  • Data understanding and preparation, including removing punctuation marks, converting all letters to lowercase, stemming, etc.
  • Exploratory data analysis, including word frequency, TF-IDF, word clouds, and bigrams
  • Clustering using k-means, hierarchical clustering, and network graphs
  • Latent semantic analysis, such as semantic similarity and sentiment analysis
  • Topic modelling utilising the Latent Dirichlet Allocation (LDA) algorithm
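
As an illustration of the preparation and exploratory steps listed above, here is a minimal sketch in Python using only the standard library. It is not the project's code (the project itself is written in R); the function names `tokenize`, `bigrams`, and `tf_idf` and the three sample sentences are hypothetical, and stemming is left out because it would need an external library. The TF-IDF variant shown (relative term frequency times the natural log of inverse document frequency) is one common formulation and may differ from the weighting used in the project.

```python
import math
import string
from collections import Counter


def tokenize(text):
    """Lowercase the text, strip ASCII punctuation, and split on whitespace."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return cleaned.split()


def bigrams(tokens):
    """Return adjacent token pairs, e.g. [a, b, c] -> [(a, b), (b, c)]."""
    return list(zip(tokens, tokens[1:]))


def tf_idf(docs_tokens):
    """Compute one {term: weight} dict per document.

    tf  = term count / number of tokens in the document
    idf = ln(number of documents / number of documents containing the term)
    """
    n_docs = len(docs_tokens)
    doc_freq = Counter()
    for tokens in docs_tokens:
        doc_freq.update(set(tokens))  # count each term once per document

    weights = []
    for tokens in docs_tokens:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Made-up sample documents, standing in for the project's txt files.
docs = [
    "The cat sat on the mat.",
    "The dog chased the cat!",
    "Dogs and cats are pets.",
]
tokens_per_doc = [tokenize(d) for d in docs]

# Word frequency across the whole corpus.
word_freq = Counter(t for tokens in tokens_per_doc for t in tokens)

# Bigrams and TF-IDF weights per document.
bigrams_per_doc = [bigrams(tokens) for tokens in tokens_per_doc]
weights = tf_idf(tokens_per_doc)

print(word_freq.most_common(3))
print(sorted(weights[0].items(), key=lambda kv: kv[1], reverse=True)[:3])
```

In this sketch a term that occurs in every document gets a weight of zero, which is why TF-IDF tends to highlight words that distinguish one file from the others rather than words that are merely frequent.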

Output:

The approach used, assumptions, and supporting rationale for each stage of the CRISP-DM framework. Results and recommendations, including supporting visualisations and summary data. An evaluation of the results of the different techniques, with reasons for the final approach.

An appendix including working code.

A blog post reflecting on the use of the techniques of text analysis in the workplace.

Edited on May 39, 2020

About

Natural language processing on unstructured data with R
