Chapter 4 Describe
Now that we have loaded our texts into R, it is time to understand what they are about, who their authors are, and what we expect to find in them. This chapter focuses on techniques for exploring and summarising text data, including keywords-in-context, visualisations, and text statistics. Before diving into these techniques, we will briefly discuss two concepts: the corpus, which is central to working with text data in quanteda, and the DFM (document-feature matrix), which we derive from it. Throughout this chapter, we will use the Manifesto Project corpus, specifically the UK manifestos, to illustrate these concepts and techniques. These data are available in the quanteda.corpora package as data_corpus_ukmanifestos.