3.3 Packages for Quantitative Text Analysis in R
There are several packages that we can use for quantitative text analysis in R, such as tm, tidytext, RTextTools, corpus and koRpus (Welbers et al., 2017). Many of these packages offer specialised features that can be very useful, but in this book, we will mainly rely on quanteda (Benoit et al., 2018), which is currently in its fourth version. The advantage of quanteda over other packages is that it brings many of the text analysis functions of R that were previously spread across different packages into a common framework (Welbers et al., 2017). In addition, many quanteda functions can easily be combined with functions from other packages, and the package as a whole has simple, logical commands and a well-maintained website.
At the time of writing, the current version of quanteda is 4.0. This version works best with R version 4.0.1 or higher. To check which version you have, type R.Version() in your console. The result is a list, and its $version.string element shows the version number of your R installation. If your version is older than 4.0.1, follow the steps above to install the latest version of R.
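The same check can also be done in code. The following is a minimal sketch (our own addition, not taken from the quanteda documentation) that uses base R's getRversion(), which returns a version object that can be compared directly with a version string. The threshold 4.0.1 is the one given in the text above.

```r
# R.Version() returns a list; its version.string element holds the full text
R.Version()$version.string

# getRversion() returns a comparable version object, so it can be checked
# directly against the minimum version recommended for quanteda 4.0
r_ok <- getRversion() >= "4.0.1"
if (!r_ok) {
  message("This R installation is older than 4.0.1; consider updating R.")
}
r_ok
```

If r_ok is TRUE, no update is needed before installing quanteda.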
To install quanteda, type:
install.packages("quanteda", dependencies = TRUE)
Note that because we wrote dependencies = TRUE, this command also installs three other quanteda helper packages (quanteda.textmodels, quanteda.textstats and quanteda.textplots), which extend the basic tools that are already inside quanteda. More of these helper packages can be expected to extend the main quanteda package even further in the future. Before such helper packages get an official release, we can already find them in development on GitHub. In this book, we will install three packages from GitHub: quanteda.classifiers, which we will use for supervised learning methods; quanteda.dictionaries, which we will use for dictionary analysis; and quanteda.corpora, which provides additional corpora:
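# devtools provides install_github(); if it is not installed yet,
# run install.packages("devtools") first
library(devtools)
install_github("quanteda/quanteda.classifiers", dependencies = TRUE)  # supervised learning
install_github("kbenoit/quanteda.dictionaries", dependencies = TRUE)  # dictionary analysis
install_github("quanteda/quanteda.corpora", dependencies = TRUE)      # additional corpora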
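To see which helper packages the dependencies = TRUE option pulled in alongside quanteda, we can read the Suggests field of quanteda's installed DESCRIPTION file: with dependencies = TRUE, install.packages() installs the Depends, Imports, LinkingTo and Suggests packages of the package being installed. The sketch below assumes quanteda is already installed (if it is not, it simply returns an empty result), and parse_dep_field() is a hypothetical helper written only for this example.

```r
# Hypothetical helper (not part of quanteda or base R): turn a DESCRIPTION
# field such as "knitr, quanteda.textplots (>= 0.94)" into package names
parse_dep_field <- function(field) {
  if (is.null(field) || is.na(field)) return(character(0))
  deps <- trimws(strsplit(field, ",", fixed = TRUE)[[1]])
  deps <- sub("\\s*\\([^)]*\\)$", "", deps)  # drop version constraints
  deps[deps != ""]
}

# Suggests field of the installed quanteda; packageDescription() returns NA
# (with a warning, suppressed here) if quanteda is not installed
suggests <- suppressWarnings(packageDescription("quanteda", fields = "Suggests"))

# Keep only the quanteda.* helper packages among the suggested ones
helpers <- grep("^quanteda\\.", parse_dep_field(suggests), value = TRUE)
helpers
```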
In addition to quanteda, we use the following packages in this book:
# kripp.boot is installed from GitHub, so devtools must be loaded (see above)
install_github("mikegruz/kripp.boot", dependencies = TRUE)
install.packages("ca", dependencies = TRUE)
install.packages("combinat", dependencies = TRUE)
install.packages("DescTools", dependencies = TRUE)
install.packages("FactoMineR", dependencies = TRUE)
install.packages("factoextra", dependencies = TRUE)
install.packages("Factoshiny", dependencies = TRUE)
install.packages("Hmisc", dependencies = TRUE)
install.packages("httr", dependencies = TRUE)
install.packages("jsonlite", dependencies = TRUE)
install.packages("manifestoR", dependencies = TRUE)
install.packages("readr", dependencies = TRUE)
install.packages("readtext", dependencies = TRUE)
install.packages("reshape2", dependencies = TRUE)
install.packages("RTextTools", dependencies = TRUE)
install.packages("R.temis", dependencies = TRUE)
install.packages("rvest", dependencies = TRUE)
install.packages("seededlda", dependencies = TRUE)
install.packages("stm", dependencies = TRUE)
install.packages("tidyverse", dependencies = TRUE)
install.packages("topicmodels", dependencies = TRUE)
install.packages("magick", dependencies = TRUE)
install.packages("vader", dependencies = TRUE)
Some of these are specialised packages for text analysis, others for statistical estimation and visualisation. After installation, you will find these packages, as well as the quanteda and devtools packages, under the Packages tab in RStudio.
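Instead of calling install.packages() once per package, the CRAN packages from the list above can also be handled in one go. The following is a minimal sketch, not something the rest of the book relies on: install_missing() is a hypothetical helper that uses base R's installed.packages() to skip any package that is already present, so rerunning the setup does not reinstall everything.

```r
# Hypothetical helper (not part of base R or any package used in this book):
# install only those CRAN packages from a list that are not yet installed
install_missing <- function(pkgs, dry_run = FALSE) {
  # Drop every package that is already present in the library paths
  to_install <- setdiff(pkgs, rownames(installed.packages()))
  if (length(to_install) > 0 && !dry_run) {
    install.packages(to_install, dependencies = TRUE)
  }
  # Names that were installed, or would be installed when dry_run = TRUE
  invisible(to_install)
}

# The CRAN packages from the list above (kripp.boot comes from GitHub)
cran_pkgs <- c(
  "ca", "combinat", "DescTools", "FactoMineR", "factoextra", "Factoshiny",
  "Hmisc", "httr", "jsonlite", "manifestoR", "readr", "readtext",
  "reshape2", "RTextTools", "R.temis", "rvest", "seededlda", "stm",
  "tidyverse", "topicmodels", "magick", "vader"
)

# Preview which of them are still missing, without installing anything
print(install_missing(cran_pkgs, dry_run = TRUE))
```

Calling install_missing(cran_pkgs) without dry_run then installs only the missing packages. This covers the CRAN packages only; the GitHub packages still need install_github().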