3.3 Packages for Quantitative Text Analysis in R

There are several packages that we can use for quantitative text analysis in R, such as tm, tidytext, RTextTools, corpus and koRpus (Welbers et al., 2017). Many of these packages offer specialised features that can be very useful, but in this book we will mainly rely on quanteda (Benoit et al., 2018), which is currently in its fourth major version. The advantage of quanteda over other packages is that it integrates into a common framework many of the text analysis functions of R that were previously spread across different packages (Welbers et al., 2017). In addition, many quanteda functions can be easily combined with functions from other packages, while the package as a whole has simple and logical commands and a well-maintained website.

The current version of quanteda at the time of writing is 4.0. This version works best with R version 4.0.1 or higher. To check whether your system meets this requirement, type R.Version() in your console. The result is a list; look for $version.string to see which version of R you are running. If your version is older than this, follow the steps above to install the latest version.
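The same check can also be done programmatically. The following is a minimal sketch that uses only base R; the names min_version, r_info and r_ok are our own illustrative choices and not part of any package.

```r
# Minimum R version recommended in the text for quanteda 4.0
min_version <- "4.0.1"

# R.Version() returns a list; version.string holds the readable version
r_info <- R.Version()
print(r_info$version.string)

# getRversion() returns the running version in a form that can be compared
r_ok <- getRversion() >= min_version

if (r_ok) {
  message("R is recent enough for quanteda.")
} else {
  message("Please update R to version ", min_version, " or higher.")
}
```

Comparing with getRversion() avoids reading the version number off the string by eye, which can be useful at the top of a script that others will run.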

To install quanteda, type:

install.packages("quanteda", dependencies = TRUE)

Note that because we set dependencies = TRUE, this command also installs three quanteda helper packages, which extend the basic tools already inside quanteda. More of these helper packages can be expected to extend the main quanteda package even further. Until they get an official release, we can already find them in development on GitHub. In this book, we will install three of them: quanteda.classifiers, which we will use for supervised learning methods, quanteda.dictionaries, which we will use for dictionary analysis, and quanteda.corpora, which provides additional text corpora:

library(devtools)
install_github("quanteda/quanteda.classifiers", dependencies = TRUE)
install_github("kbenoit/quanteda.dictionaries", dependencies = TRUE)
install_github("quanteda/quanteda.corpora", dependencies = TRUE)

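In addition to quanteda, we use the following packages. The first of these, kripp.boot, is installed from GitHub and therefore requires devtools to be loaded, as above: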

install_github("mikegruz/kripp.boot", dependencies = TRUE)
install.packages("ca", dependencies = TRUE)
install.packages("combinat", dependencies = TRUE)
install.packages("DescTools", dependencies = TRUE)
install.packages("FactoMineR", dependencies = TRUE)
install.packages("factoextra", dependencies = TRUE)
install.packages("Factoshiny", dependencies = TRUE)
install.packages("Hmisc", dependencies = TRUE)
install.packages("httr", dependencies = TRUE)
install.packages("jsonlite", dependencies = TRUE)
install.packages("manifestoR", dependencies = TRUE)
install.packages("readr", dependencies = TRUE)
install.packages("readtext", dependencies = TRUE)
install.packages("reshape2", dependencies = TRUE)
install.packages("RTextTools", dependencies = TRUE)
install.packages("R.temis", dependencies = TRUE)
install.packages("rvest", dependencies = TRUE)
install.packages("seededlda", dependencies = TRUE)
install.packages("stm", dependencies = TRUE)
install.packages("tidyverse", dependencies = TRUE)
install.packages("topicmodels", dependencies = TRUE)
install.packages("magick", dependencies = TRUE)
install.packages("vader", dependencies = TRUE)

Some of these are specialised packages for text analysis, others for statistical estimation and visualisation. After installation, you will find these packages, as well as the quanteda and devtools packages, under the Packages tab in RStudio.
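Installation can also be confirmed from the console rather than through the Packages tab. The sketch below assumes only base R. The vector pkgs lists a handful of the packages above for illustration and can be extended, and the names installed and missing_pkgs are our own.

```r
# A few of the packages installed above (illustrative subset; extend as needed)
pkgs <- c("quanteda", "readtext", "stm", "topicmodels", "seededlda")

# requireNamespace() returns TRUE if a package can be loaded, FALSE otherwise
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Names of the packages that still need to be installed
missing_pkgs <- names(installed)[!installed]

if (length(missing_pkgs) == 0) {
  message("All listed packages are installed.")
} else {
  message("Not yet installed: ", paste(missing_pkgs, collapse = ", "))
}

# Report the quanteda version if the package is available
if (installed[["quanteda"]]) {
  message("quanteda version: ", as.character(packageVersion("quanteda")))
}
```

Any package reported as not yet installed can be installed again with the corresponding install.packages() or install_github() command above.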

References

Benoit, K., Watanabe, K., Wang, H., Nulty, P., Obeng, A., Müller, S., & Matsuo, A. (2018). quanteda: An R package for the quantitative analysis of textual data. Journal of Open Source Software, 3(30), 774. https://doi.org/10.21105/joss.00774
Welbers, K., Van Atteveldt, W., & Benoit, K. (2017). Text Analysis in R. Communication Methods and Measures, 11(4), 245–265. https://doi.org/10.1080/19312458.2017.1387238