6.4 Exercises

  1. Apply the Wordscores method to the inaugural speeches corpus (data_corpus_inaugural). Select a set of reference speeches and assign them hypothetical scores on a dimension of interest (e.g. populism, scored using external knowledge). Estimate the scores for the remaining speeches and visualise the results. Discuss the estimated positions and the uncertainty around them.

  2. Apply the Wordfish method to the corpus of UK party manifestos used in the Wordscores example. Select two manifestos to define the direction of the scale. Interpret the resulting dimension based on the words that load highly on it and the manifestos’ positions. Compare the Wordfish results with the Wordscores results.

  3. Find a dataset with categorical text annotations, or create a categorised dataset from a text corpus (for example, by coding themes in a small set of documents). Perform Multiple Correspondence Analysis on these data. Interpret the main dimensions based on the categories that contribute most to them. Visualise the categories and/or individuals in the MCA space.

  4. Apply Simple Correspondence Analysis to a document-feature matrix of your choice using the textmodel_ca() function. Interpret the first two dimensions based on the words and documents located at the extremes. Visualise the results and discuss the relationships revealed by the plot.

  5. Use the textmodel_affinity() function to compute the affinity matrix for a corpus. Explore the matrix to identify documents with a high affinity to specific documents of interest. Apply a clustering method (e.g. hierarchical clustering) to the affinity matrix and interpret the resulting clusters.

  6. Research other scaling methods for text analysis, such as ideal point models or specialised techniques for specific text data types. How do they differ from Wordscores, Wordfish and Correspondence Analysis in terms of their assumptions and applications?
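
As a possible starting point for exercise 4, the following is a minimal sketch, not a model answer. It assumes the quanteda and quanteda.textmodels packages are installed. The choice of corpus (inaugural speeches after 1980), the preprocessing steps, and the trimming threshold are illustrative assumptions only; substitute a document-feature matrix of your own.

```r
library(quanteda)
library(quanteda.textmodels)

# Assumption: post-1980 inaugural speeches serve as an example corpus
corp <- corpus_subset(data_corpus_inaugural, Year > 1980)

# Assumption: basic preprocessing; adapt to your own data
toks <- tokens(corp, remove_punct = TRUE, remove_numbers = TRUE)
toks <- tokens_remove(toks, stopwords("en"))
dfmat <- dfm_trim(dfm(toks), min_termfreq = 5)

# Fit Simple Correspondence Analysis
tmod_ca <- textmodel_ca(dfmat)

# Document coordinates on the first two dimensions
dat <- data.frame(
  dim1 = coef(tmod_ca, doc_dim = 1)$coef_document,
  dim2 = coef(tmod_ca, doc_dim = 2)$coef_document
)

# Plot the documents in the two-dimensional CA space
plot(dat$dim1, dat$dim2, type = "n",
     xlab = "Dimension 1", ylab = "Dimension 2")
text(dat$dim1, dat$dim2, labels = docnames(dfmat), cex = 0.7)

# Documents at the two extremes of the first dimension
ord <- order(dat$dim1)
docnames(dfmat)[head(ord, 3)]
docnames(dfmat)[tail(ord, 3)]
```

The feature coordinates can be inspected in the same way through the coef_feature element returned by coef(), which helps when interpreting what separates the documents at each extreme.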