Natural Language Processing Project

Overview:

This project applies text analysis techniques to unstructured text spread across multiple documents, with the aim of providing insights and uncovering hidden themes in those documents. As a result, 42 txt files were grouped into 5 topics, and the overall sentiment of each file was classified. The process included:

Data understanding and preparation, including removing punctuation marks, converting all letters to lowercase, stemming, etc.
Exploratory data analysis, including word frequency, TF-IDF, word clouds, and bigrams
Clustering using k-means, hierarchical clustering, and network graphs
Latent semantic analysis, such as semantic similarity, and sentiment analysis
Topic modelling using the Latent Dirichlet Allocation (LDA) algorithm
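
The preparation and exploratory steps listed above can be illustrated with a minimal Python sketch. This is not the project's own code: the language, the three sample sentences, and the helper names (`preprocess`, `bigrams`, `tf_idf`, `cosine`) are all assumptions made for illustration, and only the standard library is used. Stemming is left out because it would normally rely on an external library, and the TF-IDF weighting shown here (relative term frequency times the log of inverse document frequency) is one common variant, not necessarily the one used in the project.

```python
import math
import string
from collections import Counter

# Hypothetical stand-in documents; the project's 42 txt files are not reproduced here.
docs = [
    "The cat sat on the mat.",
    "The dog sat on the log!",
    "Cats and dogs are pets.",
]

def preprocess(text):
    """Lowercase the text, strip ASCII punctuation, and split on whitespace."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return text.split()

def bigrams(tokens):
    """Return adjacent token pairs in order of appearance."""
    return list(zip(tokens, tokens[1:]))

def tf_idf(tokenized_docs):
    """Return one {term: weight} dict per document.

    weight = (term count / document length) * log(number of docs / document frequency)
    """
    n_docs = len(tokenized_docs)
    doc_freq = Counter()
    for tokens in tokenized_docs:
        doc_freq.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in tokenized_docs:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append(
            {t: (c / total) * math.log(n_docs / doc_freq[t]) for t, c in counts.items()}
        )
    return weights

def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

tokenized = [preprocess(d) for d in docs]
word_freq = Counter(t for tokens in tokenized for t in tokens)
weights = tf_idf(tokenized)

print(word_freq.most_common(3))
print(bigrams(tokenized[0]))
print(cosine(weights[0], weights[1]), cosine(weights[0], weights[2]))
```

In this toy corpus the first two sentences share several words, so their cosine similarity is positive, while the first and third share none and score zero. The same sparse vectors could in principle feed a clustering step, though that part is not sketched here.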

Output:

Written Report

Covers the approach used, assumptions, and supporting rationale for each stage of the CRISP-DM framework; the results and recommendations, including supporting visualisations and summary data; and an evaluation of the results of the different techniques, with reasons for the final approach.

Workfile

An appendix containing the working code.

Reflection Blog

A blog post reflecting on the use of text analysis techniques in the workplace.

Edit on May 39, 2020
