7.1 Classical Dictionary Analysis
As for our dictionaries, we can either make the dictionary ourselves or use an off-the-shelf version. For the latter, we can either import dictionary files we already have into R or use one of the dictionaries that come with the quanteda.dictionaries package. For this, we first load the package:
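The loading step can be sketched as follows. This is a minimal sketch that assumes both quanteda and quanteda.dictionaries are already installed; to our knowledge the latter is distributed via GitHub rather than CRAN, so the installation line in the comment is an assumption about your setup and not part of the original text.

```r
# Assumption: quanteda.dictionaries was installed beforehand, for example with
# remotes::install_github("kbenoit/quanteda.dictionaries")
library(quanteda)
library(quanteda.dictionaries)

# Check that the Laver & Garry dictionary is available as a dictionary object
is.dictionary(data_dictionary_LaverGarry)
```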
We then apply one of these dictionaries to the document-feature matrix (dfm) we created in the previous chapter. As a dictionary, we will use the one made by Laver & Garry (2000), which is meant for estimating policy positions from political texts. We first print this dictionary to inspect it and then run it on the dfm using the dfm_lookup command:
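# Print the dictionary to inspect its categories and words
data_dictionary_LaverGarry
# Count, per document, how many features match each dictionary category
dictionary_results <- dfm_lookup(data_dfm, data_dictionary_LaverGarry)
dictionary_results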
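Dictionaries such as this one can be nested, with categories that contain sub-categories. The sketch below illustrates how the levels argument of dfm_lookup controls whether results are reported per sub-category or collapsed to the top level. The texts, the dictionary and all object names (toy_texts, toy_dfm, toy_dict) are invented for illustration and are not taken from Laver & Garry (2000).

```r
library(quanteda)

# Hypothetical two-document corpus, used only for this illustration
toy_texts <- c(d1 = "We will cut taxes and raise spending on the police.",
               d2 = "Higher taxes fund more spending.")
toy_dfm <- dfm(tokens(toy_texts, remove_punct = TRUE))

# Hypothetical nested dictionary: economy has two sub-categories, law has none
toy_dict <- dictionary(list(
  economy = list(more_state = c("spending"),
                 less_state = c("tax*", "cut")),
  law = c("police")
))

# Default: every level is kept, so sub-categories get their own columns
toy_nested <- dfm_lookup(toy_dfm, toy_dict)
toy_nested

# levels = 1: sub-categories are merged into their top-level category
toy_top <- dfm_lookup(toy_dfm, toy_dict, levels = 1)
toy_top
```

Collapsing to the top level is useful when the sub-categories are too sparse to interpret on their own.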
Apart from off-the-shelf dictionaries, it is also possible to create our own, which could suit our research question better. One approach is to use prior theory to come up with different categories and their associated words. Another approach is to use reference texts to come up with categories and words. We can also combine different dictionaries, as illustrated by Young & Soroka (2012), or combine dictionaries with keywords from the categories of a manual coding scheme (Lind et al., 2019). Finally, we can use expert or crowd coding assessments to determine the words that best match different categories in a dictionary (Haselmayer & Jenny, 2017).
If we want to create our own dictionary in quanteda, we use the same commands as above, but we first have to create the dictionary. To do so, we specify the words in a named list. The names of this list are the keys (the categories), and each key holds the values (the words or patterns we want to look for) that belong to it. A trailing asterisk, as in tax*, is a wildcard that matches any word starting with those letters. We then transform this list into a dictionary. Here, we choose some words which we believe will allow us to identify the different parties with ease:
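# Named list: each name is a category (key), each vector holds its patterns
dic_list <- list(
  economy    = c("tax*", "invest*", "trade"),
  war        = c("army", "troops", "fight"),
  diplomacy  = c("nato", "comintern", "un"),
  government = c("washington", "moscow", "beijing")
)
# tolower = FALSE keeps the dictionary values exactly as typed
dic_created <- dictionary(dic_list, tolower = FALSE)
dic_created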
## Dictionary object with 4 key entries.
## - [economy]:
##   - tax*, invest*, trade
## - [war]:
##   - army, troops, fight
## - [diplomacy]:
##   - nato, comintern, un
## - [government]:
##   - washington, moscow, beijing
If you compare the dic_created object with the data_dictionary_LaverGarry object, you will find that both are dictionaries with the same basic structure of keys and values. To see the result, we can use the same dfm_lookup command:
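The call itself presumably mirrors the earlier lookup, with our own dictionary in place of the off-the-shelf one. The sketch below rebuilds the dictionary so that it runs on its own. The object name dictionary_created and the two-document fallback are assumptions added for illustration; with the fallback instead of the dfm from the previous chapter, the printed result will differ from the output shown here.

```r
library(quanteda)

# The dictionary from the text, rebuilt here so the block runs on its own
dic_created <- dictionary(list(
  economy    = c("tax*", "invest*", "trade"),
  war        = c("army", "troops", "fight"),
  diplomacy  = c("nato", "comintern", "un"),
  government = c("washington", "moscow", "beijing")
), tolower = FALSE)

# Assumption: data_dfm comes from the previous chapter. If it is missing,
# fall back to a two-document stand-in so the call can still be tried out.
if (!exists("data_dfm")) {
  data_dfm <- dfm(tokens(c(text1 = "Troops fight while trade and taxes fall.",
                           text2 = "Moscow and Washington met at the UN.")))
}

# Hypothetical object name; same call as before, with our own dictionary
dictionary_created <- dfm_lookup(data_dfm, dic_created)
dictionary_created
```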
## Document-feature matrix of: 205 documents, 4 features (89.39% sparse) and 0 docvars.
##        features
## docs    economy war diplomacy government
##   text1       0   0         0          0
##   text2       0   0         0          0
##   text3       0   0         0          0
##   text4       0   0         0          0
##   text5       0   0         0          0
##   text6       0   0         0          0
## [ reached max_ndoc ... 199 more documents ]
Also note that if you would like to convert this dfm into a regular data frame, you can use the convert command included in quanteda:
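A minimal sketch of this conversion, using a small stand-in dfm so that it runs on its own. The texts and the object names (toy_dfm, toy_dict, toy_results, toy_df) are hypothetical; in practice you would pass the result of your own dfm_lookup call to convert.

```r
library(quanteda)

# Hypothetical stand-in for the dictionary results from the text
toy_dfm <- dfm(tokens(c(text1 = "Taxes fund the army.",
                        text2 = "Trade and investment grow.")))
toy_dict <- dictionary(list(economy = c("tax*", "invest*", "trade"),
                            war     = c("army", "troops", "fight")))
toy_results <- dfm_lookup(toy_dfm, toy_dict)

# Convert the dfm to a data frame: one row per document, one column per category
toy_df <- convert(toy_results, to = "data.frame")
str(toy_df)
```

The resulting data frame also carries a column with the document identifiers, so the counts can be merged with other document-level data.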
Moreover, while we could look at this data frame by either calling it in the console or looking at it in the Environment, we can also turn it into an HTML widget using the DT and data.table packages:
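One way to build such a widget is the datatable function from DT. The sketch below is an assumption about how this step could look, not the original code: it uses only DT, since datatable accepts an ordinary data frame, and the data frame toy_df is a hypothetical stand-in for the converted dictionary results.

```r
library(DT)

# Hypothetical stand-in for the data frame of dictionary results
toy_df <- data.frame(doc_id  = c("text1", "text2"),
                     economy = c(1, 2),
                     war     = c(1, 0))

# Build an interactive, sortable and searchable HTML table
widget <- datatable(toy_df, rownames = FALSE)
widget
```

Printing the widget in RStudio or in a knitted HTML document displays the interactive table.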