3.4 Import .csv

Sometimes, text data comes pre-processed as a document-term matrix (DTM) or term-frequency matrix stored in a CSV file. A DTM typically has documents as rows, terms (or words) as columns, and cell values representing word counts. There are two main ways to import CSV files: R’s built-in read.csv() or read_csv() from the readr package. Both treat the first row as column names by default:

data_dtm <- read.csv("your_dtm_file.csv", header = FALSE)  # If the first row does NOT contain the column names
data_dtm <- read.csv("your_dtm_file.csv", header = TRUE)  # Default: the first row contains the column names
data_dtm <- readr::read_csv("your_dtm_file.csv", col_names = FALSE)  # If the first row does NOT contain the column names
data_dtm <- readr::read_csv("your_dtm_file.csv")  # Default (col_names = TRUE): the first row contains the column names
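
Either function returns a data frame rather than a matrix. As a minimal sketch, assuming the first column holds the document identifiers (the column name doc_id and the toy counts below are made up for illustration), the counts could be converted into a matrix with documents as row names:

```r
# Write a tiny, made-up DTM to a temporary CSV file (illustrative data only)
tmp <- tempfile(fileext = ".csv")
writeLines(c("doc_id,apple,banana,cherry",
             "doc1,2,0,1",
             "doc2,0,3,0"), tmp)

# The first row holds the column names, so the default header = TRUE applies
dtm_df <- read.csv(tmp)

# Keep only the count columns and move the document identifiers into row names
dtm <- as.matrix(dtm_df[, -1])
rownames(dtm) <- as.character(dtm_df$doc_id)

dim(dtm)      # number of documents, number of terms
colSums(dtm)  # total count per term across all documents
```

Checking the dimensions and column totals right after import is a quick way to confirm that the header row and the identifier column were interpreted as intended.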

Remember that importing a pre-computed matrix means you inherit the pre-processing choices made when it was created. Also, keep in mind that some files with a .csv extension are delimited not by a comma but by a semicolon (;) or a tab. In that case, we import the file with readr’s read_delim() and state the delimiter explicitly:

data_dtm <- readr::read_delim("your_dtm_file.csv", delim = ";", escape_double = FALSE)  # Semicolon-delimited file
data_dtm <- readr::read_delim("your_dtm_file.csv", delim = "\t", escape_double = FALSE)  # Tab-delimited file
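
If it is unclear which delimiter a file uses, looking at its first line usually settles the question. The following is a rough sketch in base R, not a robust detector: the toy file and the candidate list are assumptions made for illustration, and the heuristic simply picks the candidate that occurs most often in the header line.

```r
# Write a tiny, made-up semicolon-delimited DTM to a temporary file
tmp <- tempfile(fileext = ".csv")
writeLines(c("doc_id;apple;banana",
             "doc1;2;0",
             "doc2;0;3"), tmp)

# Inspect the first line of the file
header_line <- readLines(tmp, n = 1)

# Count how often each candidate delimiter appears in the header line
candidates <- c(comma = ",", semicolon = ";", tab = "\t")
counts <- sapply(candidates, function(d) {
  lengths(regmatches(header_line, gregexpr(d, header_line, fixed = TRUE)))
})

# Pick the most frequent candidate (heuristic only)
delim <- candidates[[names(which.max(counts))]]

# Import with the detected delimiter using base R;
# readr::read_delim(tmp, delim = delim) is the readr counterpart
dtm_df <- read.delim(tmp, sep = delim)
str(dtm_df)
```

A header line that itself contains commas inside quoted term names would mislead this count, so it is worth opening the file in a text editor when the result looks wrong.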