7.1 Classical Dictionary Analysis

We can either build a dictionary ourselves or use an off-the-shelf version. For the latter, we can import dictionary files we already have into R or use one of the dictionaries that come with the quanteda.dictionaries package. To use these, we first load the package:

library(quanteda.dictionaries)

We then apply one of these dictionaries to the document-feature matrix (dfm) we created in the previous chapter. As a dictionary, we will use the one made by Laver & Garry (2000), which is meant for estimating policy positions from political texts. We first inspect this dictionary by printing it and then apply it to the dfm using the dfm_lookup command:

data_dictionary_LaverGarry
dictionary_results <- dfm_lookup(data_dfm, data_dictionary_LaverGarry)
dictionary_results
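Because the Laver and Garry dictionary nests subcategories inside broader categories, it can help to see how dfm_lookup treats such a hierarchy. The following is a minimal sketch on toy data: the texts, the object names, and the two-level dictionary toy_nested are made up for illustration and are not part of the chapter's data. It assumes only that quanteda is installed.

```r
library(quanteda)

# Toy texts, not the documents used in this chapter
toy_texts <- c(doc_a = "We will nationalise the railways and raise taxes.",
               doc_b = "We will privatise the post office and protect the opera.")
toy_dfm <- dfm(tokens(toy_texts, remove_punct = TRUE))

# Hypothetical two-level dictionary: category -> subcategory -> patterns
toy_nested <- dictionary(list(
  economy = list(more_state = c("nationalis*", "tax*"),
                 less_state = c("privatis*")),
  culture = list(high = c("opera"))
))

# Default: one feature per lowest-level subcategory
by_subcategory <- dfm_lookup(toy_dfm, toy_nested)

# levels = 1: counts are collapsed into the top-level categories
by_category <- dfm_lookup(toy_dfm, toy_nested, levels = 1)

by_subcategory
by_category
```

In the first result the feature names join the levels with a dot (for example economy.more_state), while levels = 1 reports one column per top-level category. The same argument can be tried on data_dictionary_LaverGarry when only the broad policy areas are of interest.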

Apart from using off-the-shelf dictionaries, we can also create our own, which may suit our research question better. One approach is to use prior theory to come up with different categories and their associated words. Another approach is to use reference texts to come up with categories and words. We can also combine different dictionaries, as illustrated by Young & Soroka (2012), or combine dictionaries with keywords from the categories of a manual coding scheme (Lind et al., 2019). Finally, we can use expert or crowd coding assessments to determine the words that best match the different categories in a dictionary (Haselmayer & Jenny, 2017).
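To make the idea of combining dictionaries concrete, here is a small sketch that merges two word lists category by category before turning them into a quanteda dictionary. The two sources (source_a, source_b) and their words are invented for illustration; they do not reproduce the procedure of any of the cited studies.

```r
library(quanteda)

# Two hypothetical word lists that share the same categories
source_a <- list(positive = c("good", "great"),
                 negative = c("bad"))
source_b <- list(positive = c("great", "excellent"),
                 negative = c("awful", "bad"))

# Merge per category and drop duplicate entries
keys <- union(names(source_a), names(source_b))
merged <- lapply(setNames(keys, keys), function(k) {
  unique(c(source_a[[k]], source_b[[k]]))
})

# Turn the merged list into a dictionary object
dic_merged <- dictionary(merged)
dic_merged
```

Merging the plain lists first keeps the step transparent: each category ends up with the union of the words from both sources, and the result can be passed to dfm_lookup like any other dictionary.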

If we want to create our own dictionary in quanteda, we use the same commands as above, but we first have to create the dictionary. To do so, we specify the words in a named list. The names of this list are the keys (the categories), and each key holds the values belonging to it (the words or patterns we want to look for). We then transform this list into a dictionary. Here, we choose some words which we believe will allow us to identify the different parties with ease:

dic_list <- list(economy = c("tax*", "invest*", "trade"),
                 war = c("army", "troops", "fight"),
                 diplomacy = c("nato", "comintern", "un"),
                 government = c("washington", "moscow", "beijing"))

dic_created <- dictionary(dic_list, tolower = FALSE)
dic_created

## Dictionary object with 4 key entries.
## - [economy]:
##   - tax*, invest*, trade
## - [war]:
##   - army, troops, fight
## - [diplomacy]:
##   - nato, comintern, un
## - [government]:
##   - washington, moscow, beijing
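The entries ending in an asterisk are wildcard patterns. The sketch below, using a made-up sentence and a copy of the dictionary under the hypothetical name toy_dic, illustrates how such patterns behave under the default valuetype = "glob" and what changes when the patterns are instead matched literally with valuetype = "fixed".

```r
library(quanteda)

# Toy sentence, made up for illustration
toy_text <- c(d1 = paste("Taxes and taxation hurt investment and trade",
                         "while the UN met in Moscow under pressure"))
toy_dfm <- dfm(tokens(toy_text))  # dfm() lowercases tokens by default

toy_dic <- dictionary(list(
  economy    = c("tax*", "invest*", "trade"),
  war        = c("army", "troops", "fight"),
  diplomacy  = c("nato", "comintern", "un"),
  government = c("washington", "moscow", "beijing")
))

# Default valuetype = "glob": "*" stands for any run of characters,
# so "tax*" matches both "taxes" and "taxation"
glob_hits <- dfm_lookup(toy_dfm, toy_dic)

# valuetype = "fixed": every pattern must equal a token exactly,
# so "tax*" and "invest*" no longer match anything
fixed_hits <- dfm_lookup(toy_dfm, toy_dic, valuetype = "fixed")

glob_hits
fixed_hits
```

Note that an entry without a wildcard, such as "un", only matches the whole token "un" and not longer words like "under". Short entries are therefore safe from partial matches, but they can still pick up unrelated uses of the same word.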

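If you compare the dic_created object with the data_dictionary_LaverGarry object, you will find that both are dictionary objects with the same basic structure of keys and their associated words. To apply our own dictionary to the dfm, we use the same dfm_lookup command as before: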

dictionary_created <- dfm_lookup(data_dfm, dic_created)
dictionary_created

## Document-feature matrix of: 205 documents, 4 features (89.39% sparse) and 0 docvars.
##        features
## docs    economy war diplomacy government
##   text1       0   0         0          0
##   text2       0   0         0          0
##   text3       0   0         0          0
##   text4       0   0         0          0
##   text5       0   0         0          0
##   text6       0   0         0          0
## [ reached max_ndoc ... 199 more documents ]
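Raw counts depend on document length, and many documents may contain no dictionary terms at all, as the zeros above suggest. A possible next step, sketched here on toy data with hypothetical object names, is to flag the documents that contain at least one hit and to express the hits as a share of all tokens in each document.

```r
library(quanteda)

# Toy texts, not the documents used in this chapter
toy_texts <- c(text1 = "tax cuts boost trade",
               text2 = "the army will fight",
               text3 = "nothing relevant appears here")
toy_dfm <- dfm(tokens(toy_texts))

toy_dic <- dictionary(list(economy = c("tax*", "invest*", "trade"),
                           war = c("army", "troops", "fight")))
toy_counts <- dfm_lookup(toy_dfm, toy_dic)

# Which documents contain at least one dictionary term?
has_hits <- rowSums(as.matrix(toy_counts)) > 0

# Dictionary hits as a share of all tokens in each document
toy_shares <- as.matrix(toy_counts) / ntoken(toy_dfm)

has_hits
toy_shares
```

Dividing by ntoken(toy_dfm) works row by row because the vector of document lengths has one element per document. Whether counts or shares are more appropriate depends on the research question.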

Also note that if you would like to convert this dfm into a regular data frame, you can use the convert command included in quanteda:

dictionary_df <- convert(dictionary_created, to = "data.frame")
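To see what the converted object looks like, the following sketch repeats the conversion on a two-document toy example (all object names are hypothetical). As far as we are aware, recent quanteda versions store the document names in a first column called doc_id; treat that column name as an assumption and check names(toy_df) on your own installation.

```r
library(quanteda)

# Toy texts and dictionary, made up for illustration
toy_texts <- c(text1 = "tax cuts boost trade",
               text2 = "the army will fight")
toy_dic <- dictionary(list(economy = c("tax*", "trade"),
                           war = c("army", "fight")))
toy_lookup <- dfm_lookup(dfm(tokens(toy_texts)), toy_dic)

# Convert the dfm to a regular data frame
toy_df <- convert(toy_lookup, to = "data.frame")
str(toy_df)

# Ordinary data frame operations now apply, e.g. sorting by a category
toy_df[order(-toy_df$war), ]
```

Once the results are in a data frame, they can be merged with other document-level variables or passed to plotting and modelling functions that do not accept a dfm.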

Moreover, while we could look at this data frame by either printing it in the console or opening it from the Environment pane, we can also turn it into an interactive HTML widget using the DT package:

DT::datatable(dictionary_df)

References

Haselmayer, M., & Jenny, M. (2017). Sentiment analysis of political communication: Combining a dictionary approach with crowdcoding. Quality & Quantity, 51(6), 2623–2646. https://doi.org/10.1007/s11135-016-0412-4

Laver, M., & Garry, J. (2000). Estimating policy positions from political texts. American Journal of Political Science, 44(3), 619–634. https://doi.org/10.2307/2669268

Lind, F., Eberl, J.-M., Heidenreich, T., & Boomgaarden, H. G. (2019). When the journey is as important as the goal: A roadmap to multilingual dictionary construction. International Journal of Communication, 13, 4000–4020.

Young, L., & Soroka, S. (2012). Lexicoder sentiment dictionary. http://www.snsoroka.com/data-lexicoder/