ArXiv Preprint
An emerging trend on social media platforms is their use as safe spaces for
peer support. Particularly in healthcare, where many medical conditions contain
harsh stigmas, social media has become a stigma-free way to engage in dialogues
regarding symptoms, treatments, and personal experiences. Many existing works
have employed NLP algorithms to facilitate quantitative analysis of health
trends. Notably absent from existing works are keyphrase extraction (KE) models
for social health posts-a task crucial to discovering emerging public health
trends. This paper presents a novel, theme-driven KE dataset, SuboxoPhrase, and
a qualitative annotation scheme with an overarching goal of extracting targeted
clinically-relevant keyphrases. To the best of our knowledge, this is the first
study to design a KE schema for social media healthcare texts. To demonstrate
the value of this approach, this study analyzes Reddit posts regarding
medications for opioid use disorder, a paramount health concern worldwide.
Additionally, we benchmark ten off-the-shelf KE models on our new dataset,
demonstrating the unique extraction challenges in modeling user-generated
health texts. The proposed theme-driven KE approach lays the foundation of
future work on efficient, large-scale analysis of social health texts, allowing
researchers to surface useful public health trends, patterns, and knowledge
gaps.
William Romano, Omar Sharif, Madhusudan Basak, Joseph Gatto, Sarah Preum
2023-01-27