Sexually Transmitted Disease-Related Reddit Posts During the COVID-19 Pandemic: Latent Dirichlet Allocation Analysis

Johnson AK, Bhaumik R, Nandi D, Roy A, Mehta SD.

Journal of Medical Internet Research

Background: Sexually transmitted diseases (STDs) are common and costly, impacting approximately 1 in 5 people annually. Reddit, the sixth most used internet site in the world, is a social media discussion platform built on user-generated content that may be useful for monitoring discussion about STD symptoms and exposure.

Objective: This study sought to identify patterns in, and gain insights into, STD-related discussions on Reddit over the course of the COVID-19 pandemic.

Methods: We extracted posts from Reddit from March 2019 through July 2021. We used a topic modeling method, Latent Dirichlet Allocation, to identify the most common topics discussed in the Reddit posts. We then used word clouds, qualitative topic labeling, and spline regression to characterize the content and distribution of the topics observed.
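
As an illustration of the topic modeling step, the following is a minimal sketch of Latent Dirichlet Allocation fitted by collapsed Gibbs sampling, using only the Python standard library. It is not the authors' pipeline: the inference method, the toy tokenized posts, the hyperparameters (alpha, beta), the number of topics, and the function names fit_lda and top_words are all assumptions made for demonstration.

```python
import random
from collections import Counter


def fit_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; docs is a list of token lists."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    # Count tables: topics per document, words per topic, tokens per topic.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []
    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Conditional weight of each topic given all other tokens.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word


def top_words(topic_word, n=3):
    """Return the n most frequent words assigned to each topic."""
    return [[w for w, c in tw.most_common(n) if c > 0] for tw in topic_word]


# Hypothetical, already tokenized posts (not data from the study).
docs = [
    ["test", "clinic", "result", "test", "appointment"],
    ["symptom", "rash", "itch", "symptom", "pain"],
    ["clinic", "appointment", "result", "test"],
    ["rash", "pain", "itch", "symptom"],
]
doc_topic, topic_word = fit_lda(docs, num_topics=2)
print(top_words(topic_word))
```

The top words per topic are what a qualitative labeling step would inspect. In practice, a maintained library implementation and a coherence score would be used to choose the number of topics; the sketch only shows the mechanics of the model.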

Results: Our extraction yielded 24,311 posts in total. Latent Dirichlet Allocation topic modeling showed that with 8 topics for each time period, we achieved high coherence values (pre-COVID-19=0.41, prevaccination=0.42, and postvaccination=0.44). Although most topic categories remained the same over time, the relative proportion of topics changed and new topics emerged. Spline regression revealed that, for some key terms, the percentage of posts varied in ways that coincided with the pre-COVID-19 and post-COVID-19 periods, whereas for others it was uniform across the study periods.
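
To make the spline regression idea concrete, the following is a minimal sketch that fits a piecewise linear spline to the monthly percentage of posts containing a key term. The abstract does not state the spline type or knot placement, so the linear truncated-power basis, the knots at months 12 and 21 (counting from March 2019), and the synthetic percentages are all assumptions, as are the function names.

```python
def spline_basis(t, knots):
    """Linear spline basis: intercept, slope, and one hinge term per knot."""
    return [1.0, float(t)] + [max(0.0, t - k) for k in knots]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_linear_spline(ts, ys, knots):
    """Ordinary least squares on the spline basis via the normal equations."""
    X = [spline_basis(t, knots) for t in ts]
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * y for row, y in zip(X, ys)) for i in range(p)]
    return solve(XtX, Xty)


def predict(coef, t, knots):
    """Evaluate the fitted spline at time t."""
    return sum(c * b for c, b in zip(coef, spline_basis(t, knots)))


# Months 0..28 stand for March 2019 through July 2021.
months = list(range(29))
knots = [12, 21]  # assumed period boundaries, not taken from the paper
# Synthetic percentages with slope changes at the knots (not study data).
percent = [5.0 + 0.1 * t - 0.5 * max(0, t - 12) + 0.6 * max(0, t - 21) for t in months]

coef = fit_linear_spline(months, percent, knots)
print([round(c, 3) for c in coef])
```

Each hinge coefficient is the change in slope at its knot, so a key term whose use shifts between periods shows nonzero hinge terms, while a uniformly used term shows hinge terms near zero.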

Conclusions: Our study's use of Reddit is a novel way to gain insights into STD symptoms experienced, potential exposures, testing decisions, common questions, and behavior patterns (eg, during lockdown periods). For example, a reduction in STD screening may lead to negative health outcomes from missed cases, which also affects onward transmission. As Reddit use is anonymous, users may discuss sensitive topics in greater detail and more freely than in clinical encounters. Data from anonymous Reddit posts may be leveraged to enhance the understanding of the distribution of disease and the need for targeted outreach or screening programs. This study provides evidence for the feasibility and utility of Reddit as a means to enhance the understanding of sexual behaviors, STD experiences, and needed health engagement with the public.

