@Article{info:doi/10.2196/29238,
  author     = "Matsuda, Shinichi and Ohtomo, Takumi and Tomizawa, Shiho and Miyano, Yuki and Mogi, Miwako and Kuriki, Hiroshi and Nakayama, Terumi and Watanabe, Shinichi",
  title      = "Incorporating Unstructured Patient Narratives and Health Insurance Claims Data in Pharmacovigilance: Natural Language Processing Analysis of Patient-Generated Texts About Systemic Lupus Erythematosus",
  journal    = "JMIR Public Health Surveill",
  year       = "2021",
  month      = jun,
  day        = "29",
  volume     = "7",
  number     = "6",
  pages      = "e29238",
  keywords   = "social media; adverse drug reaction; pharmacovigilance; text mining; systemic lupus erythematosus; natural language processing; NLP; lupus; chronic disease; narrative; insurance; data; epidemiology; burden; Japan; patient-generated",
  abstract   = "Background: Gaining insights that cannot be obtained from health care databases from patients has become an important topic in pharmacovigilance.
    Objective: Our objective was to demonstrate a use case, in which patient-generated data were incorporated in pharmacovigilance, to understand the epidemiology and burden of illness in Japanese patients with systemic lupus erythematosus.
    Methods: We used data on systemic lupus erythematosus, an autoimmune disease that substantially impairs quality of life, from 2 independent data sets. To understand the disease's epidemiology, we analyzed a Japanese health insurance claims database. To understand the disease's burden, we analyzed text data collected from Japanese disease blogs (t{\={o}}by{\={o}}ki) written by patients with systemic lupus erythematosus. Natural language processing was applied to these texts to identify frequent patient-level complaints, and term frequency--inverse document frequency was used to explore patient burden during treatment. We explored health-related quality of life based on patient descriptions.
    Results: We analyzed data from 4694 and 635 patients with systemic lupus erythematosus in the health insurance claims database and t{\={o}}by{\={o}}ki blogs, respectively. Based on health insurance claims data, the prevalence of systemic lupus erythematosus is 107.70 per 100,000 persons. T{\={o}}by{\={o}}ki text data analysis showed that pain-related words (eg, pain, severe pain, arthralgia) became more important after starting treatment. We also found an increase in patients' references to mobility and self-care over time, which indicated increased attention to physical disability due to disease progression.
    Conclusions: A classical medical database represents only a part of a patient's entire treatment experience, and analysis using solely such a database cannot represent patient-level symptoms or patient concerns about treatments. This study showed that analysis of t{\={o}}by{\={o}}ki blogs can provide added information on patient-level details, advancing patient-centric pharmacovigilance.",
  issn       = "2369-2960",
  doi        = "10.2196/29238",
  url        = "https://publichealth.jmir.org/2021/6/e29238",
  eprint     = "34255719",
  eprinttype = "pubmed"
}
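The abstract states that term frequency-inverse document frequency (TF-IDF) was used to see which words gained importance once treatment began. The Python sketch below illustrates that general idea only; it is not the authors' pipeline. Assumptions made here: the tokens are hypothetical English placeholders (real tōbyōki posts are Japanese and would first need morphological segmentation), each treatment period is pooled into a single document, and the smoothed idf formula is a choice of this sketch, since the abstract does not specify the weighting.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return one {term: weight} dict per tokenized document.

    tf  = count of the term in the document / number of tokens in the document
    idf = ln((1 + N) / (1 + df)) + 1, a smoothed idf (the same formula
          scikit-learn uses by default), where N is the number of documents
          and df is the number of documents containing the term.
    """
    n_docs = len(docs)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy tokens, one pooled "document" per treatment period.
before_treatment = ["fatigue", "rash", "fever", "fatigue"]
after_treatment = ["pain", "severe_pain", "arthralgia", "pain", "fatigue"]

pre_scores, post_scores = tf_idf([before_treatment, after_treatment])

# Rank post-treatment terms by TF-IDF weight (ties keep first-seen order).
top_post = [term for term, _ in sorted(post_scores.items(), key=lambda kv: -kv[1])[:3]]
print(top_post)  # ['pain', 'severe_pain', 'arthralgia']
```

Because a term that appears in both periods gets a smaller idf, a word such as "fatigue" loses weight after treatment while period-specific words such as "pain" rise. With the unsmoothed ln(N/df) and only two documents, shared terms would score exactly zero, which is why the smoothed variant is used in this sketch.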