6.2 Keywords in Context

One simple but effective way to learn more about our texts is to look at keywords in context (KWIC). For a given word, we look at which other words appear around it in our texts. This is also known as looking at the concordance of our texts. With our tokens object, this is easy to do. Let's take all words that start with 'secur' and look at the three words that occur before and after each of them. We can then run:

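library(quanteda)  # provides kwic()
# Find every token matching the glob "secur*", keeping 3 words of context on each side
kwic_output <- kwic(data_tokens, pattern = "secur*", valuetype = "glob", window = 3)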
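To make the role of `pattern` and `window` concrete, here is a minimal base-R sketch of a keyword-in-context search. It is not quanteda's implementation: `kwic_sketch()` is a hypothetical helper that works on a single character vector of tokens, converts the glob pattern to a regular expression with `utils::glob2rx()`, matches case-insensitively, and returns the words before the match, the match itself, and the words after it. The toy tokens are taken from one of the snippets printed at the end of this section.

```r
# Hypothetical helper: a base-R sketch of a keyword-in-context search,
# not quanteda's own implementation.
kwic_sketch <- function(tokens, pattern, window = 3) {
  n <- length(tokens)
  # glob2rx() converts a glob such as "secur*" into the regex "^secur"
  hits <- grep(utils::glob2rx(pattern), tokens, ignore.case = TRUE)
  # Join the tokens at positions idx, dropping positions outside the text
  join <- function(idx) paste(tokens[idx[idx >= 1 & idx <= n]], collapse = " ")
  data.frame(
    from    = hits,
    pre     = vapply(hits, function(i) join((i - window):(i - 1)), character(1)),
    keyword = tokens[hits],
    post    = vapply(hits, function(i) join((i + 1):(i + window)), character(1)),
    stringsAsFactors = FALSE
  )
}

# Toy tokens (illustrative only)
toks <- c("western", "allies", "desired", "security", "system",
          "democratic", "governments")
kwic_sketch(toks, "secur*", window = 3)
```

Here the single match is at position 4, so `pre` is "western allies desired" and `post` is "system democratic governments". A match at the very start or end of a text simply gets an empty `pre` or `post` string.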

The resulting object contains a column labelled pre and another labelled post, which hold the words that came before and after each match of 'secur*'. The matched word itself is stored in the column keyword. We can easily extract these and paste them back together:

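# Pull out the context before, the matched keyword, and the context after
text_pre <- kwic_output$pre
text_post <- kwic_output$post
text_word <- kwic_output$keyword
# paste() joins them element by element into one snippet per match
text <- as.data.frame(paste(text_pre, text_word, text_post))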

We then attach the name of the document each match came from, so that we know which text every snippet belongs to:

# Put the document name next to each snippet and give the columns clear names
extracted <- cbind(kwic_output$docname, text)
names(extracted) <- c("docname", "text")
head(extracted)

##   docname                                                                 text
## 1  text10             making allowances peace security ushering period détente
## 2  text27 establishment maintenance post-war security scholars contend western
## 3  text27        western allies desired security system democratic governments
## 4  text27  churchill's mainly centered securing control mediterranean ensuring
## 5  text34    peace enforcement capacity security council effectively paralyzed
## 6  text44        leaders establishing secret security force prevent subversion
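The same two-column table can also be built in a single `data.frame()` call, which names the columns as it creates them and makes the separate `names()` step unnecessary. This sketch assumes `kwic_output` has the docname, pre, keyword and post columns used above. To keep it self-contained, a hypothetical hand-made data frame `kwic_toy`, holding two rows copied from the output above, stands in for `kwic_output`.

```r
# Hypothetical stand-in for kwic_output, with two rows copied from the
# output above; with real data, use kwic_output instead of kwic_toy
kwic_toy <- data.frame(
  docname = c("text27", "text34"),
  pre     = c("establishment maintenance post-war", "peace enforcement capacity"),
  keyword = c("security", "security"),
  post    = c("scholars contend western", "council effectively paralyzed"),
  stringsAsFactors = FALSE
)

# paste() is vectorised, so each row's pre, keyword and post are joined
# with single spaces; data.frame() names the columns directly
extracted_toy <- data.frame(
  docname = kwic_toy$docname,
  text    = paste(kwic_toy$pre, kwic_toy$keyword, kwic_toy$post),
  stringsAsFactors = FALSE
)
extracted_toy
```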