Streamlining TikTok Research: Introducing our Open-Source Library

At Cybersecurity for Democracy, one of our goals is to increase transparency in the tech industry and to build tools that improve researchers' ability to study online platforms.

Recently we’ve been diving into TikTok’s ecosystem to better understand the platform's influence on polarizing topics and how people communicate on social media.

To aid in this research, we’ve been using the TikTok Research API, a tool the company makes available to researchers worldwide who have been granted access through an application process.

We’ve discussed examples and first impressions of the API in the past, and we still believe it’s a worthwhile source of data for researchers. To streamline our efforts, we’ve developed an open-source, in-house library to interface with TikTok’s API.

The library is designed to simplify data acquisition, manage long-running queries and make ad-hoc interactions with the API much smoother.

Practical Limitations of the TikTok Research API

One of the challenges we've encountered is that the TikTok Research API, while powerful, only allows low-level interfacing. This creates several limitations:

  • Interactivity and prototyping: It’s difficult to experiment with API requests and responses quickly. Everything needs to be done through low-level HTTP requests (for example with curl), and access tokens expire every two hours.
  • Long-running queries: Requests that return large amounts of data (think multiple weeks of videos) can take a long time to complete, or need to be spread over multiple sessions due to rate limits. Keeping track of large-scale requests and gracefully saving the data in intermediate steps is a key goal of our library.
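
To make the token problem concrete, here is a minimal sketch of the kind of bookkeeping a client has to do when tokens expire every two hours. It is not the library's implementation: `TokenCache`, `fetch_token` and `safety_margin` are hypothetical names chosen for illustration, and `fetch_token` stands in for whatever call actually requests a new token.

```python
import time
from typing import Callable, Optional, Tuple


class TokenCache:
    """Caches an access token and refreshes it shortly before it expires.

    Hypothetical helper for illustration; not part of the library.
    """

    def __init__(
        self,
        fetch_token: Callable[[], Tuple[str, float]],
        clock: Callable[[], float] = time.monotonic,
        safety_margin: float = 60.0,
    ) -> None:
        # fetch_token returns (token, lifetime_in_seconds).
        self._fetch_token = fetch_token
        self._clock = clock
        self._safety_margin = safety_margin
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        """Return a valid token, fetching a new one only when needed."""
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._safety_margin:
            token, lifetime = self._fetch_token()
            self._token = token
            self._expires_at = now + lifetime
        return self._token
```

Calling `cache.get()` before every request keeps a long-running job from failing halfway through just because the two-hour window elapsed.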

How to Use Our Library

The library is available on GitHub and can be installed directly via PyPI. Here’s a quick overview of how to get started.

1. Configuring Secrets

To use the TikTok Research API, you'll need Research access from TikTok. Once you have your API credentials, you can provide them to the library in a secret.yaml file containing the relevant client_secret and client_key fields.
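
For context, these two values are what a client exchanges for a short-lived access token. The sketch below only builds that request and does not send it. The endpoint URL, the form field names and the `client_credentials` grant type reflect our reading of TikTok's public documentation and should be treated as assumptions to verify; `build_token_request` is a hypothetical helper, not a function from the library, which handles this step for you.

```python
import urllib.parse
import urllib.request

# Assumption: client-credentials token endpoint as described in TikTok's docs.
TOKEN_URL = "https://open.tiktokapis.com/v2/oauth/token/"


def build_token_request(client_key: str, client_secret: str) -> urllib.request.Request:
    """Build (but do not send) a form-encoded POST asking for an access token."""
    form = urllib.parse.urlencode(
        {
            "client_key": client_key,
            "client_secret": client_secret,
            "grant_type": "client_credentials",
        }
    ).encode("ascii")
    return urllib.request.Request(
        TOKEN_URL,
        data=form,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Placeholder values; real credentials come from your secret.yaml file.
request = build_token_request("my-client-key", "my-client-secret")
print(request.get_method(), request.full_url)
```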

2. Interacting with the API

One of the library’s core strengths is how it simplifies interacting with the TikTok API. Whether you’re querying for video metadata or comments, our interface helps you quickly prototype requests and test different queries. For example, a researcher might try obtaining all videos that contain both the Garfield and Lasagna hashtags and were published in the US with the following code:
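
As a rough illustration of what such a query amounts to at the raw API level, the sketch below builds the JSON request body by hand. This is not the library's own interface. The field names (`hashtag_name`, `region_code`), the `EQ` operation, the `and` clause and the `YYYYMMDD` date format follow our understanding of TikTok's documented query format; the helper names and the date range are assumptions made up for the example.

```python
import json
from typing import Any, Dict, List


def eq(field_name: str, value: str) -> Dict[str, Any]:
    """One condition: the field must equal the given value."""
    return {"operation": "EQ", "field_name": field_name, "field_values": [value]}


def build_video_query(
    hashtags: List[str],
    region_code: str,
    start_date: str,
    end_date: str,
    max_count: int = 100,
) -> Dict[str, Any]:
    """Request body for videos that carry ALL hashtags in one region."""
    conditions = [eq("hashtag_name", tag) for tag in hashtags]
    conditions.append(eq("region_code", region_code))
    return {
        "query": {"and": conditions},  # every condition must hold
        "start_date": start_date,      # YYYYMMDD
        "end_date": end_date,          # YYYYMMDD
        "max_count": max_count,
    }


# Illustrative date range only.
body = build_video_query(["garfield", "lasagna"], "US", "20240101", "20240131")
print(json.dumps(body, indent=2))
```

Writing and debugging bodies like this by hand for every experiment is exactly the friction a higher-level query interface is meant to remove.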

Our library supports all video, comments and user-info queries that the TikTok API currently allows. To view the results, we just call fetch on our client:

This workflow makes it faster to prototype and debug queries of interest before we leave them running for larger data acquisition.
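
Behind a single fetch there is usually more than one API call, because results come back in pages. The following is a minimal sketch of that loop, assuming the response carries `videos`, `has_more`, `cursor` and `search_id` under a `data` key, as we understand TikTok's documentation. `iter_videos` and the injected `send` callable are hypothetical and do not come from the library.

```python
from typing import Any, Callable, Dict, Iterator


def iter_videos(
    send: Callable[[Dict[str, Any]], Dict[str, Any]],
    body: Dict[str, Any],
    max_requests: int,
) -> Iterator[Dict[str, Any]]:
    """Yield videos page by page, stopping after max_requests API calls.

    `send` posts one request body and returns the decoded JSON response.
    """
    request = dict(body)  # copy so the caller's dict is not modified
    for _ in range(max_requests):
        data = send(request).get("data", {})
        yield from data.get("videos", [])
        if not data.get("has_more"):
            return
        # Ask for the next page of the same search.
        request["cursor"] = data["cursor"]
        request["search_id"] = data["search_id"]
```

Capping the number of requests keeps an exploratory query from eating into the daily quota while you are still adjusting it.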

3. Handling Large-Scale Data Acquisition

Data collection at scale often presents bottlenecks, especially with rate-limited APIs like TikTok’s, which allows a maximum of 1,000 API calls per day. To help with this, the library supports both SQLite and PostgreSQL databases, making it easy to manage and store large amounts of data efficiently.
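
A quick back-of-the-envelope calculation shows why collection can stretch over several days. The sketch assumes each request returns at most 100 videos, which is our understanding of the API's per-request cap and should be verified; the 1,000 calls per day figure is the one quoted above. The function names are hypothetical.

```python
import math

MAX_VIDEOS_PER_REQUEST = 100  # assumption: per-request cap on returned videos
DAILY_REQUEST_QUOTA = 1000    # daily call limit mentioned in this post


def requests_needed(total_videos: int, per_request: int = MAX_VIDEOS_PER_REQUEST) -> int:
    """Minimum number of API calls to page through total_videos results."""
    return math.ceil(total_videos / per_request)


def days_needed(
    total_videos: int,
    per_request: int = MAX_VIDEOS_PER_REQUEST,
    daily_quota: int = DAILY_REQUEST_QUOTA,
) -> int:
    """Minimum number of days if the whole daily quota goes to this query."""
    return math.ceil(requests_needed(total_videos, per_request) / daily_quota)


print(days_needed(250_000))  # -> 3
```

In practice pages are not always full, so this is a lower bound rather than an estimate.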

We recommend SQLite for smaller queries that might only take a few days and where only one researcher is accessing the data, while Postgres is better suited to large amounts of data that multiple researchers may use for analysis.
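
To illustrate the idea of saving data in intermediate steps, here is a small sketch using Python's built-in `sqlite3` module. The schema, table names and function names are invented for this example and are not the library's actual storage layout. Each page of results is written together with the position to resume from, so an interrupted job can pick up where it stopped.

```python
import json
import sqlite3
from typing import Any, Dict, List, Optional, Tuple


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a database with a hypothetical two-table schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS video (id INTEGER PRIMARY KEY, payload TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS crawl_state "
        "(name TEXT PRIMARY KEY, next_cursor INTEGER, search_id TEXT)"
    )
    return conn


def save_page(
    conn: sqlite3.Connection,
    crawl_name: str,
    videos: List[Dict[str, Any]],
    next_cursor: int,
    search_id: str,
) -> None:
    """Store one page and the resume position in a single transaction."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO video (id, payload) VALUES (?, ?)",
            [(v["id"], json.dumps(v)) for v in videos],
        )
        conn.execute(
            "INSERT OR REPLACE INTO crawl_state (name, next_cursor, search_id) "
            "VALUES (?, ?, ?)",
            (crawl_name, next_cursor, search_id),
        )


def load_state(conn: sqlite3.Connection, crawl_name: str) -> Optional[Tuple[int, str]]:
    """Return (next_cursor, search_id), or None if the crawl has not started."""
    return conn.execute(
        "SELECT next_cursor, search_id FROM crawl_state WHERE name = ?",
        (crawl_name,),
    ).fetchone()
```

Because videos are keyed by `id`, re-fetching a page after a restart overwrites existing rows instead of duplicating them.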

4. Command Line Interface (CLI)

The library also includes a CLI to give you control over your data-fetching tasks directly from the command line. This makes automating long-running jobs simple and efficient. The example above, now saving in a local SQLite database, can be rewritten as:

Going forward

We’re actively using this library in several of our research projects and are constantly improving it. We encourage you to try it out, contribute to the codebase, or share any issues or feedback. 

Links