1.3 Required Packages

R has several packages for text analysis, including tm, tidytext, RTextTools, corpus and koRpus. While each of these packages offers special features that are useful in certain contexts, we will primarily rely on the quanteda package here (Benoit et al., 2018). We do so because it is efficient, has a logical design, and integrates well with other packages. It is available on CRAN, remains under active development (see https://quanteda.io/), and has a well-maintained website with extensive documentation, tutorials and vignettes. To install it, run:

install.packages("quanteda", dependencies = TRUE)

The core quanteda package contains the basic tools, while companion “helper” packages handle more specialised tasks. Some of these are released on CRAN; others are still under development and are installed from GitHub:

install.packages("quanteda.textmodels", dependencies = TRUE)
install.packages("quanteda.textstats", dependencies = TRUE)
install.packages("quanteda.textplots", dependencies = TRUE)

# devtools must be installed first: install.packages("devtools")
library(devtools)

install_github("quanteda/quanteda.classifiers", dependencies = TRUE)
install_github("kbenoit/quanteda.dictionaries", dependencies = TRUE)
install_github("quanteda/quanteda.corpora", dependencies = TRUE)

Besides quanteda, we need several other packages before we can start. Note that the prefix devtools:: tells R to call the function that follows from the devtools package without first attaching the package with library(). This is also useful when several attached packages contain functions with the same name:
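As a minimal sketch of the :: notation, the snippet below calls file_ext() from the tools package, which ships with R, without running library(tools) first. The file name is a made-up example, and tools is used only because it needs no installation; the same pattern applies to devtools::install_github().

```r
# Call a function through its namespace without attaching the package.
# "manifesto.txt" is a hypothetical file name used only for illustration.
ext <- tools::file_ext("manifesto.txt")
print(ext)  # "txt"

# The namespace is now loaded, but the package is not necessarily attached:
# search() lists only the attached packages.
print(isNamespaceLoaded("tools"))
print("package:tools" %in% search())
```

In a fresh R session the last line normally prints FALSE, which shows that :: gives access to a single function without adding the whole package to the search path.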

# Install from GitHub
devtools::install_github("mikegruz/kripp.boot", dependencies = TRUE)
devtools::install_github("matthewjdenny/preText", dependencies = TRUE)

# Install from CRAN
install.packages(c(
  "ca",           # Correspondence Analysis
  "caret",        # Machine Learning
  "combinat",     # Combinatorics
  "coop",         # Cosine Similarity
  "DescTools",    # Descriptive Statistics
  "ggdendro",     # Dendrograms
  "FactoMineR",   # Correspondence Analysis
  "factoextra",   # Visualisations for FactoMineR
  "Factoshiny",   # Shiny app for FactoMineR
  "Hmisc",        # Collection of useful functions
  "httr",         # Tools for working with URLs and HTTP
  "irr",          # For Krippendorff's alpha
  "jsonlite",     # Tools for working with JSON
  "lsa",          # Latent Semantic Analysis
  "manifestoR",   # Access Manifesto Project data
  "readr",        # Read delimited files such as .csv
  "readtext",     # Read text files (.txt and other formats)
  "reshape2",     # Reshape Data
  "R.temis",      # Text Mining
  "rvest",        # Scrape Web Pages
  "seededlda",    # Semi-supervised Latent Dirichlet Allocation
  "stm",          # Structural Topic Models
  "tidyverse",    # Tools for data science
  "topicmodels",  # Topic Models
  "magick",       # Advanced Graphics
  "vader"         # Vader Sentiment Analysis
), dependencies = TRUE)
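With this many packages, an installation can fail for one of them without being noticed. The following is a small sketch, using only base R, of how one might list the packages that are still missing. The helper missing_packages() and the vector wanted are hypothetical names introduced here for illustration, and wanted holds only a few of the packages from the list above.

```r
# Hypothetical helper: return the names in `pkgs` that are not installed
# in any library on the current library path.
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

# A few of the packages named above, used as an example.
wanted <- c("quanteda", "readtext", "stm", "seededlda")
print(missing_packages(wanted))
```

An empty result, character(0), means every package in wanted is installed. Otherwise, the returned names can be passed to install.packages() to retry only the ones that failed.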

After successfully installing a package, we can find it in RStudio under the Packages tab. To use it, we can either tick its checkbox there or run the library() command:

library(quanteda)
## Package version: 4.3.1
## Unicode version: 15.1
## ICU version: 74.2
## Parallel computing: 16 of 16 threads used.
## See https://quanteda.io for tutorials and examples.
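The start-up message reports the installed version, and functions can change between major releases of a package. As a hedged sketch, the version can also be checked from code with packageVersion(). The helper has_min_version() is a hypothetical name, and the threshold "4.0.0" is an assumption chosen only for illustration.

```r
# Hypothetical helper: TRUE if `pkg` is installed at `min_version` or later,
# FALSE if it is older or not installed at all.
has_min_version <- function(pkg, min_version) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(FALSE)
  }
  packageVersion(pkg) >= package_version(min_version)
}

# "4.0.0" is an assumed threshold, not a requirement stated in the text.
print(has_min_version("quanteda", "4.0.0"))
```

If this prints FALSE, quanteda is either not installed or older than the chosen threshold, and running install.packages("quanteda") again would update it.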

References

Benoit, K., Watanabe, K., Wang, H., Nulty, P., Obeng, A., Müller, S., & Matsuo, A. (2018). quanteda: An R package for the quantitative analysis of textual data. Journal of Open Source Software, 3(30), 774. https://doi.org/10.21105/joss.00774