Results of the final model can be viewed here:
Data source:
- https://www.kaggle.com/tboyle10/medicaltranscriptions
- Medical transcription data scraped from mtsamples.com
Python libraries used:
- pandas
- scikit-learn
- gensim
- boto3
- plotly
- pyLDAvis
Templates followed:
- https://github.com/aws/amazon-sagemaker-examples/blob/master/introduction_to_amazon_algorithms/lda_topic_modeling/LDA-Introduction.ipynb
- https://github.com/XuanX111/Friends_text_generator/blob/master/Friends_LDAvis_Xuan_Qi.ipynb
- https://towardsdatascience.com/nlp-extracting-the-main-topics-from-your-dataset-using-lda-in-minutes-21486f5aa925
- https://github.com/priya-dwivedi/Deep-Learning/blob/master/topic_modeling/LDA_Newsgroup.ipynb
- https://alexandersimes.com/unsupervised/machine/learning/nlp/sagemaker/2019/09/01/got.html
- https://github.com/aws/amazon-sagemaker-examples/blob/master/scientific_details_of_algorithms/lda_topic_modeling/LDA-Science.ipynb
- https://www.machinelearningplus.com/nlp/topic-modeling-visualization-how-to-present-results-lda-models/#6.-What-is-the-Dominant-topic-and-its-percentage-contribution-in-each-document
- https://www.kaggle.com/ykhorramz/lda-and-t-sne-interactive-visualization
- https://www.machinelearningplus.com/nlp/topic-modeling-gensim-python/#17howtofindtheoptimalnumberoftopicsforlda
- https://rare-technologies.com/what-is-topic-coherence/
Link to previous proposal review: