4.5 Exercises
1. Using the UK Manifestos corpus (data_corpus_ukmanifestos), calculate each manifesto’s readability scores (textstat_readability), perhaps using multiple measures. Add these scores to the document variables and plot a chosen readability score against the Year docvar. Is there a discernible trend in the readability of UK political manifestos over the selected period?

2. Calculate keyness statistics (textstat_keyness) to compare Labour Party manifestos with all other parties in the filtered corpus. What are the 20 most important terms for the Labour Party? Create a visualisation of these terms using textplot_keyness.

3. Using the cosine method (margin = "features"), explore the similarity of features (words) in the UK manifestos (data_dfm). Can you identify pairs of words that tend to appear in similar contexts? (Hint: examine the similarity matrix for high values.)

4. Calculate the entropy (textstat_entropy) for features in the UK Manifestos DFM. Identify features with very low entropy (close to 0) and examine the documents in which they are concentrated using the kwic() function or by examining the DFM directly. What might this tell you about those documents or the use of those specific terms?

5. Apply textstat_frequency grouped by party to find the most frequent terms for the Conservative, Labour and Liberal Democrat parties. Create bar plots using the ggplot2 package to visualise the frequencies of a few selected terms (e.g. ‘econom’, ‘social’, ‘europ’) across these three parties. Compare this to the grouped frequency plot above, and add more terms for comparison.
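As a possible starting point for the readability exercise, the following is a minimal sketch rather than a model answer. The helper name add_readability is hypothetical, and the example runs on data_corpus_inaugural (which ships with quanteda and also has a Year docvar) so that it works without extra data. It assumes that data_corpus_ukmanifestos is already loaded in your session and can be swapped in where indicated.

```r
library(quanteda)
library(quanteda.textstats)
library(ggplot2)

# Hypothetical helper: compute one readability measure and store it as a docvar.
add_readability <- function(corp, measure = "Flesch") {
  scores <- textstat_readability(corp, measure = measure)
  # textstat_readability() returns a data frame with one column per measure
  docvars(corp, measure) <- scores[[measure]]
  corp
}

# Assumption: replace data_corpus_inaugural with data_corpus_ukmanifestos
# (or your filtered version of it) when working on the exercise itself.
corp_scored <- add_readability(data_corpus_inaugural, measure = "Flesch")

# Build a plotting data frame; Year is coerced defensively in case it is
# stored as a factor or character rather than a number.
plot_data <- docvars(corp_scored)
plot_data$Year <- as.numeric(as.character(plot_data$Year))

p <- ggplot(plot_data, aes(x = Year, y = Flesch)) +
  geom_point() +
  geom_smooth(method = "loess", se = FALSE) +
  labs(x = "Year", y = "Flesch reading ease")
p
```

Passing a vector such as c("Flesch", "Flesch.Kincaid") directly to textstat_readability returns several measures at once, which is one way to approach the "multiple measures" part of the exercise.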