4.5 Import from an API
Another way to import texts is by using an Application Programming Interface (API). While comparable to web scraping, APIs are much more user-friendly and communicate better with R. This makes it easier to download a large amount of data at once and import the results into R. There are APIs for many popular websites, such as Wikipedia, Twitter, YouTube, Weather Underground, The New York Times, the European Union and so on. Note, however, that you often, if not always, need to register before you can use an API. Moreover, social media platforms such as Facebook and Twitter have recently introduced restrictions on the use of their APIs that have limited researchers’ ability to conduct critical scholarly research (Bruns, 2019). For instance, Facebook has taken steps to restrict access to its public APIs for research purposes. As such, free research on Facebook users’ posts is no longer an option (Freelon, 2018; Perriam et al., 2020). Even more recently, Twitter (rebranded as ‘X’ in July 2023) has eliminated free access to its API for third-party developers. At the time of writing, the ‘Basic’ subscription, which costs $100 per month, allows you to create a project to pull up to 10,000 Tweets.
While web scraping, in general, is easy with the rvest package, for APIs you often need a specific package. For example, for Twitter there is the rtweet package, for Facebook Rfacebook, and for Google Maps ggmap. Moreover, there are many APIs with associated R packages made by researchers for researchers, such as manifestoR, a package that provides access to the corpus of the Manifesto Project (Merz et al., 2016).
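To give a flavour of how such a package works, the sketch below shows a typical manifestoR workflow: you register for an API key on the Manifesto Project website and can then query the corpus directly from R. The key is a placeholder and the query is only an illustrative example:
library(manifestoR)

# Authenticate with the key obtained after registering at
# https://manifesto-project.wzb.eu (placeholder below)
mp_setapikey(key = "[YOUR_MANIFESTO_API_KEY]")

# Download, for example, all German manifestos in the corpus
german_manifestos <- mp_corpus(countryname == "Germany")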
Let’s look at an example using an API for the New York Times. If you look at the New York Times’s API page (https://developer.nytimes.com/), you will find that you can use the API to extract information ranging from opinion articles to book reviews, movie reviews, and so on. In our example, we will use the API to extract a corpus of movie reviews that were originally published in the New York Times.
Before we start, we first have to gain permission to use the API. For this, you need to register an account at the website and log in. Then, create a new app under https://developer.nytimes.com/my-apps and make sure you select the movie reviews. You can then click on the new app to find your key under ‘API Keys’. It is this string of letters and digits that you will have to place at the [YOUR_API_KEY_HERE] placeholder shown below.
Now, let us first load the necessary packages. For the code below, we need at least jsonlite, which provides the fromJSON() function, and the tidyverse, which provides the %>% pipe:
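library(jsonlite)    # fromJSON() to query the API and parse its JSON response
library(tidyverse)   # the %>% pipe and data-wrangling functions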
We can then build our request. As you can see on the API page, the request requires a search term (here we choose “love”). We can also set a time frame from which we want to extract the reviews:
# Query the movie reviews API for reviews matching "love",
# restricted to movies that opened between 2000 and 2020
reviews <- fromJSON("https://api.nytimes.com/svc/movies/v2/reviews/search.json?query=love&opening-date=2000-01-01:2020-01-01&api-key=[YOUR_API_KEY_HERE]")
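The result is a JSON object, which appears in your environment as a nested list. To get a first impression of what the API returned, you can inspect its top level; the field names you will see there (such as num_results and results) follow the NYT response format and may change over time:
# Show the top-level structure of the parsed response
str(reviews, max.level = 1)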
While JSON (JavaScript Object Notation) is a generic format that makes information easy to share, and is therefore widely used, it is not an ideal form to work with in R. So, we convert the JSON information into a data frame using the following:
# Repeat the request, but flatten nested JSON fields and
# convert the result into a data frame
reviews_df <- fromJSON("https://api.nytimes.com/svc/movies/v2/reviews/search.json?query=love&opening-date=2000-01-01:2020-01-01&api-key=[YOUR_API_KEY_HERE]",
                       flatten = TRUE) %>%
  data.frame()
You can now find all the information in the new reviews_df object, which also contains other useful metadata about each movie. As we can see, having a dedicated package makes things easier, though also more limited.
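For instance, to turn the reviews into a small corpus, you could keep only the columns containing the review text and some metadata. The column names used below (results.display_title, results.summary_short, and results.publication_date) reflect the flattened NYT movie reviews response at the time of writing and may differ:
# Select the movie title, the short review text, and the publication date
# (column names follow the flattened NYT response and may change)
reviews_corpus <- reviews_df %>%
  select(results.display_title, results.summary_short, results.publication_date)

head(reviews_corpus)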