Investigation of text mining tools using R, including bag-of-words models and information retrieval using the term frequency-inverse document frequency (tf-idf) approach. Advanced topics such as document clustering are also covered. A variety of texts are analyzed, ranging from tweets on Twitter to digitized books from Project Gutenberg.

Prerequisites: DATA 511 or permission of department chair.

4 Credits

