NLP-in-Python

Lynn Cherny, @arnicas / arnicas@gmail

Intro to some NLP concepts and libraries in Python for a class at CMU, Feb 2015.

Lots of libraries are required - see here for install info.

Notebook viewer links:

0. Reading in Files: How I made the data files, mostly Gutenberg operations. Add your own URLs!

1. Tokenizing, Stemming, POS - the very basics. POS is "parts of speech" not "piece of %#@t".

2. Wordclouds - entirely optional, but shows off interactive widgets to live-filter stopwords for visual effect

3. TF-IDF, Clustering, Pattern - getting to the meat! Hierarchical clustering here too.

Bonus: Doing some TF-IDF NLP with Node's "natural" package (caveats apply): see here

4. Naive Bayes Classification - the infamous 50 Shades Sex Scene Detection because spam is boring

5. Naive Bayes in Scikit-Learn - very quick intro to the main ML package in Python, for comparison purposes; same sex scene data.
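For a quick feel of what notebooks 1 and 3 cover before opening them, here is a minimal, library-free sketch of tokenizing and TF-IDF weighting. It is not code from the notebooks, which rely on the installed libraries instead. The function names `tokenize` and `tf_idf`, the sample sentences, and the particular weighting variant (tf as relative frequency, idf as `log(N / df)`) are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Crude tokenizer: lowercase, then keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    Real libraries usually add smoothing and normalization on top of this.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


# Hypothetical mini-corpus, just to have something to score.
docs = [
    "The whale swam past the ship.",
    "The ship sailed at dawn.",
    "The captain hunted the whale.",
]
weights = tf_idf(docs)

# Show the three highest-weighted terms of the first document.
top = sorted(weights[0].items(), key=lambda kv: kv[1], reverse=True)[:3]
print(top)
```

A word like "the" appears in every document, so its idf is `log(1) = 0` and it drops out of the ranking, while words unique to one document score highest. That is the same intuition the stopword filtering and clustering notebooks build on.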

There are some links to libraries and books in [Intro NLP Links.md](Intro%20NLP%20Links.md).