3.5 Import from an API

Application Programming Interfaces (APIs) provide structured ways to request and receive data directly from web services (e.g., social media platforms, news organisations, databases). When available, using an API is generally more reliable and efficient than web scraping. There are some considerations to keep in mind:

  • Registration/Authentication: Most APIs require registration to obtain an API key or token for authentication.
  • Rate Limits: APIs usually limit the number of requests allowed within a given period (e.g., per minute or per day).
  • Terms of Service: Always review the API’s terms of service regarding data usage and restrictions.
  • API Changes & Restrictions: APIs can change. Notably, access to platforms like Twitter/X and Facebook has become significantly restricted and often requires payment or enhanced verification. For instance, the Rfacebook package is no longer actively maintained. Always check the current status and documentation.
  • R Packages: Specific R packages often exist to simplify interaction with popular APIs (e.g., rtweet for Twitter/X, RedditExtractoR for Reddit, WikipediR for Wikipedia, manifestoR for the Manifesto Project corpus). If no dedicated package exists, you can use general HTTP packages like httr or httr2 combined with jsonlite to handle requests and responses.
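
The authentication and rate-limit points above can be sketched in base R. This is a minimal illustration under stated assumptions, not a definitive implementation: the environment variable name MY_API_KEY, the helper names get_api_key() and fetch_politely(), and the six-second default pause are all hypothetical. Check the documentation of the API you use for its actual limits.

```r
# Read the key from an environment variable instead of hard-coding it in a script.
# "MY_API_KEY" is a hypothetical name; it could be set, for example, in ~/.Renviron.
get_api_key <- function(var = "MY_API_KEY") {
  key <- Sys.getenv(var)
  if (!nzchar(key)) {
    stop("Environment variable ", var, " is not set.", call. = FALSE)
  }
  key
}

# Apply `fetch_one` to each input, pausing between calls to stay under a rate limit.
# `pause` is in seconds; the default of 6 is an assumption, not a documented limit.
fetch_politely <- function(inputs, fetch_one, pause = 6) {
  results <- vector("list", length(inputs))
  for (i in seq_along(inputs)) {
    # Wrapping in list() keeps the slot even if fetch_one() returns NULL
    results[i] <- list(fetch_one(inputs[[i]]))
    if (i < length(inputs)) Sys.sleep(pause)
  }
  results
}
```

In practice, fetch_one would be a function that sends one request (for instance with httr::GET()) and returns the parsed result; keeping the pause logic separate makes it easy to adjust when a provider changes its limits.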

To demonstrate how this works, let us have a look at the New York Times Movie Reviews API (requires registering for an API key at https://developer.nytimes.com/):

library(httr)
library(jsonlite)
library(tidyverse)

nyt_api_key <- "[YOUR_API_KEY_HERE]"  # Replace '[YOUR_API_KEY_HERE]' with your actual key

# Construct the API request URL and query parameters
base_url <- "https://api.nytimes.com/svc/movies/v2/reviews/search.json"
query_params <- list(
  query = "love",
  `opening-date` = "2000-01-01:2020-01-01",
  `api-key` = nyt_api_key
)

response <- GET(base_url, query = query_params)  # Make the API request using httr::GET()
stop_for_status(response)  # Stop with an error if the request failed (e.g., invalid key)

# Parse the JSON content
content_json <- content(response, as = "text", encoding = "UTF-8")
reviews_list <- fromJSON(content_json, flatten = TRUE)
reviews_df <- as_tibble(reviews_list$results)  # Convert the relevant part of the list (results) to a tibble

This example retrieves reviews matching the keyword “love” for films with an opening date between 1 January 2000 and 1 January 2020. The response is in JSON format; jsonlite::fromJSON() converts it into an R list, and as_tibble() then turns the results element of that list into a tibble (a type of data frame).
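
To see what the parsing step does without making a network request, the sketch below applies fromJSON() to a small hand-written JSON string. The structure and field names (display_title, link, url) are made up for illustration and are not claimed to match the actual NYT response.

```r
library(jsonlite)

# A hand-written JSON string standing in for an API response (hypothetical fields)
mock_json <- '{
  "status": "OK",
  "num_results": 2,
  "results": [
    {"display_title": "Film A", "link": {"url": "https://example.com/a"}},
    {"display_title": "Film B", "link": {"url": "https://example.com/b"}}
  ]
}'

mock_list <- fromJSON(mock_json, flatten = TRUE)

# The top level is an R list; its "results" element is simplified to a data frame
mock_df <- mock_list$results

# With flatten = TRUE, the nested "link" object becomes a "link.url" column
names(mock_df)
```

Without flatten = TRUE, the nested link object would be kept as a data frame column inside the data frame, which is harder to work with in most tidyverse functions.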