4.5 Exercises

  1. Using the UK Manifestos corpus (data_corpus_ukmanifestos), calculate readability scores for each manifesto with textstat_readability, optionally using several measures. Add these scores to the document variables and plot one of them against the Year docvar. Is there a discernible trend in the readability of UK political manifestos over the selected period?

  2. Calculate keyness statistics with textstat_keyness, using the Labour Party manifestos as the target and the manifestos of all other parties in the filtered corpus as the reference. Which 20 terms are most distinctive of Labour, i.e. have the highest keyness scores? Visualise these terms with textplot_keyness.

  3. Using textstat_simil with method = "cosine" and margin = "features", explore the similarity of features (words) in the UK manifestos DFM (data_dfm). Can you identify pairs of words that tend to appear in similar contexts, i.e. in the same documents? (Hint: look for high values in the similarity matrix.)

  4. Calculate the entropy of each feature in the UK Manifestos DFM with textstat_entropy (margin = "features"). Identify features with very low entropy (close to 0) and examine the documents in which they are concentrated, either with kwic() or by inspecting the DFM directly. What might this tell you about those documents, or about how those specific terms are used?

  5. Apply textstat_frequency, grouped by party, to find the most frequent terms for the Conservative, Labour and Liberal Democrat parties. Using the ggplot2 package, create bar plots comparing the frequencies of a few selected stems (e.g. ‘econom’, ‘social’, ‘europ’) across these three parties. Compare the result with the grouped frequency plot above, then add more terms to the comparison.
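For exercise 1, the sketch below shows the mechanics on a tiny corpus, assuming quanteda v3 or later (where textstat_readability lives in the quanteda.textstats package). The three texts and their Year values are invented stand-ins for data_corpus_ukmanifestos, so the scores say nothing about real manifestos.

```r
library(quanteda)
library(quanteda.textstats)

# Invented stand-in for data_corpus_ukmanifestos: three short "manifestos"
# with a Year docvar (texts and years are made up for illustration)
corp <- corpus(
  c(doc_a = "We will cut taxes. We will build new homes for families.",
    doc_b = "Our comprehensive programme of constitutional modernisation will fundamentally transform parliamentary accountability.",
    doc_c = "Jobs first. Schools first. Hospitals first."),
  docvars = data.frame(Year = c(1983, 1997, 2010))
)

# One row per document, one column per requested measure
read <- textstat_readability(corp, measure = c("Flesch", "FOG"))

# Rows come back in document order, so the scores can be attached directly
docvars(corp, "Flesch") <- read$Flesch
docvars(corp, "FOG") <- read$FOG

# Plot one measure against Year (a higher Flesch score means easier to read)
plot(Flesch ~ Year, data = docvars(corp), type = "b",
     xlab = "Year", ylab = "Flesch reading ease")
```

To do the exercise itself, start from data_corpus_ukmanifestos (with whatever filtering was applied earlier in the chapter) instead of the invented corp.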
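For exercise 2, a minimal keyness sketch, again assuming quanteda v3 or later. The corpus and its Party docvar are invented; the party variable in the real corpus may be named or coded differently. Passing a logical vector as target makes the Labour documents the target group and pools all other documents into the reference group.

```r
library(quanteda)
library(quanteda.textstats)

# Invented mini-corpus; the texts and Party codes are made up for illustration
corp <- corpus(
  c(lab_1 = "Labour will invest in public services and the national health service.",
    lab_2 = "Labour will protect workers and invest in public transport.",
    con_1 = "We will lower taxes and reward enterprise and free markets.",
    ld_1  = "We will reform the voting system and protect civil liberties."),
  docvars = data.frame(Party = c("Lab", "Lab", "Con", "LD"))
)

# Tokenise, drop punctuation and English stopwords, build the DFM
toks <- tokens(corp, remove_punct = TRUE)
dfmat <- dfm_remove(dfm(toks), stopwords("en"))

# Labour documents form the target; all other documents form the reference
key <- textstat_keyness(dfmat, target = docvars(dfmat, "Party") == "Lab")

# Results are sorted by chi-squared, so the first rows are the key Labour terms
head(key, 20)
```

The key object can then be passed to textplot_keyness(key, n = 20) from the quanteda.textplots package; with the real manifestos this gives the visualisation the exercise asks for.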
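For exercise 3, a sketch on an invented four-document corpus in which two groups of words co-occur. With margin = "features", each word is represented by its counts across documents, so words with a cosine similarity near 1 are spread over the documents in almost the same way. The 0.99 threshold is chosen for this toy example only.

```r
library(quanteda)
library(quanteda.textstats)

# Invented mini-corpus in which two groups of words co-occur
corp <- corpus(c(d1 = "tax tax economy growth",
                 d2 = "tax economy growth jobs",
                 d3 = "health nurses hospitals",
                 d4 = "health hospitals nurses doctors"))
dfmat <- dfm(tokens(corp))

# Cosine similarity between features, computed from their counts across documents
sim <- textstat_simil(dfmat, margin = "features", method = "cosine")
simmat <- as.matrix(sim)

# Keep each pair once: blank out the diagonal and the lower triangle
simmat[lower.tri(simmat, diag = TRUE)] <- NA

# List the word pairs at or above the chosen similarity threshold
idx <- which(simmat >= 0.99, arr.ind = TRUE)
pairs_df <- data.frame(word1 = rownames(simmat)[idx[, "row"]],
                       word2 = colnames(simmat)[idx[, "col"]],
                       cosine = simmat[idx])
pairs_df
```

On the full data_dfm a dense feature-by-feature matrix can be very large, so trimming rare features first (for example with dfm_trim() and its min_termfreq argument) and using a lower threshold is likely to be more practical.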
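For exercise 4, a sketch using an invented corpus in which one word is confined to a single document. It assumes textstat_entropy returns a data frame with feature and entropy columns, and it adds a minimum-frequency filter (an extra assumption, not part of the exercise) because a word that occurs only once always has zero entropy.

```r
library(quanteda)
library(quanteda.textstats)

# Invented mini-corpus: "referendum" occurs in only one document
corp <- corpus(c(d1 = "europe economy schools referendum referendum",
                 d2 = "europe economy schools",
                 d3 = "europe economy schools"))
toks <- tokens(corp)
dfmat <- dfm(toks)

# Entropy of each feature's distribution across documents (base 2 by default);
# 0 means the feature is concentrated in a single document
ent <- textstat_entropy(dfmat, margin = "features")

# Words seen only once are trivially concentrated, so require a minimum frequency
freq <- colSums(dfmat)[ent$feature]
low <- ent$feature[ent$entropy < 0.1 & freq >= 2]

# Inspect where the low-entropy features occur, in context and in the DFM
kw <- kwic(toks, pattern = low, window = 3)
kw
dfmat[, low]
```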
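For exercise 5, a sketch with an invented one-document-per-party corpus and a hypothetical Party docvar. The DFM is stemmed with dfm_wordstem so that stems like those in the exercise (‘econom’, ‘social’, ‘europ’) appear as features; in quanteda v3 the groups argument of textstat_frequency can name a docvar directly.

```r
library(quanteda)
library(quanteda.textstats)
library(ggplot2)

# Invented one-document-per-party corpus (texts and Party codes are made up)
corp <- corpus(
  c(con = "Economic growth and economic confidence, social stability and Europe.",
    lab = "Social justice, social housing, economic fairness and Europe.",
    ld  = "Economic reform, social liberalism and a place in Europe."),
  docvars = data.frame(Party = c("Con", "Lab", "LD"))
)

# Stem the DFM so that, e.g., "economic" becomes "econom" and "Europe" becomes "europ"
dfmat <- dfm_wordstem(dfm(tokens(corp, remove_punct = TRUE)))

# Term frequencies computed separately within each party
freq <- textstat_frequency(dfmat, groups = Party)

# Keep a few stems of interest and compare them across parties
terms <- c("econom", "social", "europ")
sel <- subset(freq, feature %in% terms)

p <- ggplot(sel, aes(x = feature, y = frequency, fill = group)) +
  geom_col(position = "dodge") +
  labs(x = NULL, y = "Frequency", fill = "Party")
print(p)
```

Adding further stems to terms extends the comparison in the way the exercise suggests.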