This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Public Health and Surveillance, is properly cited. The complete bibliographic information, a link to the original publication on https://publichealth.jmir.org, as well as this copyright and license information must be included.
Gaining insights from patients that cannot be obtained from health care databases has become an important topic in pharmacovigilance.
Our objective was to demonstrate a use case, in which patient-generated data were incorporated in pharmacovigilance, to understand the epidemiology and burden of illness in Japanese patients with systemic lupus erythematosus.
We used data on systemic lupus erythematosus, an autoimmune disease that substantially impairs quality of life, from 2 independent data sets. To understand the disease’s epidemiology, we analyzed a Japanese health insurance claims database. To understand the disease’s burden, we analyzed text data collected from Japanese disease blogs (tōbyōki) written by patients with systemic lupus erythematosus. Natural language processing was applied to these texts to identify frequent patient-level complaints, and term frequency–inverse document frequency was used to explore patient burden during treatment. We explored health-related quality of life based on patient descriptions.
We analyzed data from 4694 and 635 patients with systemic lupus erythematosus in the health insurance claims database and tōbyōki blogs, respectively. Based on health insurance claims data, the prevalence of systemic lupus erythematosus was 107.70 per 100,000 persons. Tōbyōki text data analysis showed that pain-related words (eg, pain, severe pain, arthralgia) became more important after starting treatment. We also found an increase in patients’ references to mobility and self-care over time, which indicated increased attention to physical disability due to disease progression.
A classical medical database represents only a part of a patient's entire treatment experience, and analysis using solely such a database cannot represent patient-level symptoms or patient concerns about treatments. This study showed that analysis of tōbyōki blogs can provide added information on patient-level details, advancing patient-centric pharmacovigilance.
Pharmacovigilance, monitoring drugs during their product lifecycle to detect, assess, understand, and prevent adverse effects or other problems [
To expand the scope of pharmacovigilance to patients’ viewpoints, it is necessary to include data sources that can be used to analyze patient situations. Several studies [
Although several studies [
Systemic lupus erythematosus is a complex, autoimmune disease; information from multiple sources should be considered in disease management. In Japan, limited epidemiological information on systemic lupus erythematosus is available [
To understand epidemiology, treatments, and disease burden in patients with systemic lupus erythematosus, we analyzed 2 independent data sets: health insurance claims data and
For each data source used in this study—health insurance claims data or tōbyōki blogs—basic characteristics such as data structures, data points, and contents are shown.
The study protocol was reviewed and approved by the Research Institute of Healthcare Data Science (RI2018008). The board waived informed consent because the data sources do not contain identifying information.
We analyzed a Japanese health insurance claims database (JMDC Inc), which contains data from more than 3 million individuals enrolled in the database as of 2015. Patient data from January 1, 2015 to December 31, 2016 were extracted. International Classification of Diseases tenth revision [
As reported previously [
Using the health insurance claims data, we identified patients with prevalent systemic lupus erythematosus, defined as systemic lupus erythematosus diagnosed between January 1, 2015 and December 31, 2016, and calculated the prevalence (with 95% CI) overall, by age, and by gender. We also estimated the incidence (with 95% CI) as the number of patients with incident systemic lupus erythematosus, defined as an initial diagnosis between January 1, 2015 and December 31, 2016 with no systemic lupus erythematosus diagnosis in the preceding 12 months, divided by the total population during both years.
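As an illustration of the calculation described above, the following minimal sketch computes a rate per 100,000 persons with a 95% CI. It assumes a Wilson score interval, which is only one of several common choices; the interval method used in this study is not stated here. The function name `rate_per_100k` and the example counts are hypothetical and are not study data.

```python
import math


def rate_per_100k(cases, population, z=1.96):
    """Return (rate, lower, upper) per 100,000 persons.

    Uses a Wilson score interval (an assumption; other interval
    methods would give slightly different bounds).
    """
    if population <= 0 or not 0 <= cases <= population:
        raise ValueError("need 0 <= cases <= population and population > 0")
    p = cases / population
    k = z ** 2 / population
    centre = (p + k / 2) / (1 + k)
    half_width = z * math.sqrt(
        p * (1 - p) / population + z ** 2 / (4 * population ** 2)
    ) / (1 + k)
    scale = 100_000
    return p * scale, (centre - half_width) * scale, (centre + half_width) * scale


# Hypothetical counts for illustration only.
rate, lower, upper = rate_per_100k(cases=120, population=100_000)
print(f"{rate:.2f} per 100,000 (95% CI {lower:.2f}-{upper:.2f})")
```

The same function can be applied to incident cases, or to each age and gender stratum, by passing the stratum-specific numerator and denominator.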
Data from patients with systemic lupus erythematosus were used to identify medications during patients’ follow-up periods. Medications were coded according to the Anatomical Therapeutic Chemical classification system [
Unstructured text written by patients was deconstructed into words using morphological analysis. Drug names mentioned in blogs were analyzed and summarized descriptively.
Symptom outcomes cannot be obtained from health insurance claims data.
We explored patients’ skin abnormality and photosensitivity symptoms, which are characteristic of systemic lupus erythematosus [
We assumed that
The number of pain-related words used in relation to systemic lupus erythematosus treatments was analyzed. Term frequency–inverse document frequency (TF-IDF) analysis, which assigns a weight to each term based on the frequency of its occurrence in the document, was conducted to highlight the characteristic words of each text; a higher score may indicate that the term
where
We also sought to explore information on health-related quality of life from the unstructured patient narratives using the EQ-5D-5L questionnaire (EuroQol Group), which is a widely used validated instrument, consisting of 5 dimensions (mobility, self-care, usual activities, pain/discomfort, and anxiety/depression), for assessing health-related outcomes in both the general population and patients [
SAS software (version 9.4; SAS Institute) was used for data analysis. To process the unstructured text, we performed morphological analysis using MeCab [
We analyzed health insurance claims data from 4694 patients with systemic lupus erythematosus and
Patient characteristics.
Age category | Health insurance claims data | | | Tōbyōki blogs | | |
 | Total, n (%) | Male, n (%) | Female, n (%) | Total, n (%) | Male, n (%) | Female, n (%) | Unknown, n (%)
Total | 4694 (100) | 994 (100) | 3700 (100) | 671 (100) | 36 (100) | 634 (100) | 1 (100)
≤19 years old | 275 (5.9) | 86 (8.7) | 189 (5.1) | 125 (18.6) | 5 (13.9) | 120 (18.9) | 0 (0.0)
20-34 years old | 449 (9.6) | 123 (12.4) | 326 (8.8) | 233 (34.7) | 15 (41.7) | 218 (34.4) | 0 (0.0)
35-49 years old | 2175 (46.3) | 337 (33.9) | 1838 (49.7) | 71 (10.6) | 6 (16.7) | 65 (10.3) | 0 (0.0)
50-64 years old | 1557 (33.2) | 379 (38.1) | 1178 (31.8) | 5 (0.7) | 0 (0.0) | 5 (0.8) | 0 (0.0)
≥65 years old | 238 (5.1) | 69 (6.9) | 169 (4.6) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0)
Unknown | 0 (0.0) | 0 (0.0) | 0 (0.0) | 237 (35.3) | 10 (27.8) | 226 (35.6) | 1 (100)
Using health insurance claims data, we found that the overall prevalence of systemic lupus erythematosus was 107.70 per 100,000 persons and was 4.4 times higher for females than that for males; females had a higher prevalence than males in all age groups (
(A) Prevalence and (B) incidence of systemic lupus erythematosus for each age range, stratified by sex. Error bars represent 95% confidence intervals.
Based on health insurance claims data, immunosuppressants, such as oral corticosteroids, and disease-modifying antirheumatic drugs were frequently prescribed to patients with systemic lupus erythematosus (
Systemic lupus erythematosus drug treatments.
Drug treatments | Patients, n (%)
Health insurance claims data (drug class [ATC code])^a |
Total | 4694 (100)
Oral corticosteroids, plain [H02A2] | 2529 (53.9)
Proton pump inhibitors [A02B2] | 1622 (34.6)
Antirheumatics, nonsteroidal plain [M01A1] | 1432 (30.5)
All other antiulcerants [A02B9] | 1333 (28.4)
Other immunosuppressants [L04X-] | 1266 (27.0)
Bisphosphonates for osteoporosis and related disorders [M05B3] | 1224 (26.1)
Vitamin D [A11C2] | 1129 (24.1)
Nonnarcotics and antipyretics [N02B-] | 1089 (23.2)
Topical antirheumatics and analgesics [M02A-] | 1083 (23.1)
Systemic antihistamines [R06A-] | 1023 (21.8)
Plain topical corticosteroids [D07A-] | 891 (19.0)
Statins (HMG-CoA reductase inhibitors) [C10A1] | 771 (16.4)
H2 antagonists [A02B1] | 745 (15.9)
Expectorants [R05C-] | 738 (15.7)
Angiotensin-II antagonists, plain [C09C-] | 731 (15.6)
Tōbyōki blogs |
Total | 671 (100)
Steroid | 499 (74.4)
Prednisolone | 470 (70.0)
Loxoprofen sodium hydrate | 220 (32.8)
Tacrolimus hydrate | 190 (28.3)
Alendronate sodium hydrate | 114 (17.0)
Aspirin | 109 (16.2)
Acetaminophen | 104 (15.5)
Lidocaine, Adrenaline bitartrate | 101 (15.1)
Cyclophosphamide hydrate | 99 (14.8)
Azathioprine | 93 (13.9)
Alfacalcidol | 89 (13.3)
Aztreonam | 88 (13.1)
Calcium L-aspartate hydrate | 83 (12.4)
Cyclophosphamide hydrate | 82 (12.2)
Mycophenolate mofetil | 80 (11.9)
^a ATC: Anatomical Therapeutic Chemical classification.
Because steroids were frequently used as treatments in both data sets, we analyzed their dose information using health insurance claims data (
Distribution of the maximum daily dose of steroids: (A) 0-500 mg, (B) 500-1000 mg, (C) 1000-1500 mg, and (D) 1500-2000 mg.
Patient-level complaints that are not necessarily recognized as disease names cannot be derived from health insurance claims data. Symptoms that commonly present with systemic lupus erythematosus, such as “pain” and “feeling tired,” and some disease-specific symptoms, such as “moon face” and “arthralgia,” appeared frequently in blog text (
Symptoms of systemic lupus erythematosus identified from tōbyōki blog data.
Symptoms mentioned in tōbyōki blogs | Patients^a, n (%)
Total | 671 (100)
Pain | 508 (75.7)
Symptom | 504 (75.1)
Anxiety | 498 (74.2)
Adverse drug reaction | 495 (73.8)
Stress | 467 (69.6)
Aggravation | 430 (64.1)
Appetite | 416 (62.0)
Headache | 389 (58.0)
Shock symptom | 386 (57.5)
Feeling tired | 382 (56.9)
Recovery | 354 (52.8)
Feeling itchy | 326 (48.6)
Cough | 322 (48.0)
Inflammation | 297 (44.3)
Feeling abnormal | 296 (44.1)
Swelling | 296 (44.1)
Nausea | 296 (44.1)
Moon face | 295 (44.0)
Arthralgia | 292 (43.5)
Slight fever | 292 (43.5)
^a Number of patients who described each symptom at least once in their tōbyōki blog.
We also conducted word co-occurrence network analysis to understand the characteristics of photosensitivity and erythema, which are 2 symptoms that are specific to systemic lupus erythematosus. In the word co-occurrence network analysis for photosensitivity (
Network of words co-occurring with photosensitivity in tōbyōki blogs of patients with systemic lupus erythematosus. Because the original language of the blogs is Japanese, English translations are shown.
Network of words co-occurring with erythema in tōbyōki blogs of patients with systemic lupus erythematosus. Because the original language of the blogs is Japanese, English translations are shown.
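The counting step behind a word co-occurrence network such as those in the two figures above can be sketched as follows. This is a simplified illustration, not the procedure used in this study: it assumes the blog text has already been split into sentences and tokenized (eg, by morphological analysis), and it counts how many sentences contain both the target word and each other word. The function name `cooccurrence_counts` and the example tokens are hypothetical.

```python
from collections import Counter


def cooccurrence_counts(tokenized_sentences, target):
    """Count, per word, the sentences in which it appears with `target`."""
    counts = Counter()
    for tokens in tokenized_sentences:
        unique_tokens = set(tokens)  # count each word once per sentence
        if target in unique_tokens:
            counts.update(unique_tokens - {target})
    return counts


# Hypothetical English tokens for illustration only.
sentences = [
    ["photosensitivity", "sunlight", "rash"],
    ["sunlight", "photosensitivity", "sunscreen"],
    ["fever", "rash"],
]
edges = cooccurrence_counts(sentences, "photosensitivity")
print(edges.most_common())
```

Each (word, count) pair can then serve as a weighted edge from the target word in a network plot; a minimum count threshold is often applied to keep the graph readable.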
Among 671
Pain-related words had higher TF-IDF values after therapy had been mentioned than those before therapy was mentioned (
Changes in the importance of pain-related words before and after mentioning treatments. TF-IDF: term frequency–inverse document frequency; TM: therapy mentioned.
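To make the before-and-after comparison concrete, the following minimal sketch computes TF-IDF weights for two token lists, one for text written before a therapy was mentioned and one for text written after. It assumes one common formulation (term frequency as a relative count, inverse document frequency as the natural logarithm of the number of documents divided by the number of documents containing the term); the exact variant used in this study is not assumed. The function name `tf_idf` and the example tokens are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    tf  = count of term in document / number of tokens in document
    idf = ln(number of documents / number of documents containing term)
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))
    scores = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical tokens for illustration only.
before_tm = ["fever", "rash", "fatigue"]
after_tm = ["pain", "pain", "fatigue", "arthralgia"]
before_scores, after_scores = tf_idf([before_tm, after_tm])
print(after_scores["pain"], after_scores["fatigue"])
```

In this formulation, a word that appears in every document receives a weight of 0, so only words that distinguish one period from the other stand out.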
We examined the distribution of health-related quality of life words in the
Health-related quality of life estimated from prespecified keywords mentioned in tōbyōki blogs, corresponding to the 5 dimensions of the EQ-5D-5L questionnaire. TM: therapy mentioned.
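A keyword-based mapping of blog text to the 5 EQ-5D-5L dimensions can be sketched as below. The keyword lists are hypothetical placeholders, not the prespecified keywords used in this study, and the sketch assumes posts are already tokenized. It counts the number of distinct patients who mention at least one keyword for each dimension.

```python
from collections import defaultdict

# Hypothetical keyword lists; the study's prespecified keywords are not reproduced here.
DIMENSION_KEYWORDS = {
    "mobility": {"walk", "stairs"},
    "self-care": {"bathe", "dress"},
    "usual activities": {"work", "housework"},
    "pain/discomfort": {"pain", "ache"},
    "anxiety/depression": {"anxiety", "depressed"},
}


def patients_per_dimension(posts):
    """posts: iterable of (patient_id, tokens). Returns {dimension: patient count}."""
    patients = defaultdict(set)
    for patient_id, tokens in posts:
        token_set = set(tokens)
        for dimension, keywords in DIMENSION_KEYWORDS.items():
            if token_set & keywords:  # at least one keyword present
                patients[dimension].add(patient_id)
    return {dimension: len(patients[dimension]) for dimension in DIMENSION_KEYWORDS}


# Hypothetical posts for illustration only.
posts = [
    ("p1", ["pain", "walk"]),
    ("p1", ["pain"]),
    ("p2", ["anxiety"]),
]
print(patients_per_dimension(posts))
```

Running the function separately on posts from each time window (for example, before and after a therapy is mentioned) would give per-period counts that can be compared across dimensions.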
In this study, analysis of
The overall prevalence (
In both health insurance claims data and
Although information on how the symptoms of the primary disease change (improve or deteriorate) with treatment and adverse events is vital in pharmacovigilance, it is impossible to obtain patient-level symptom information from health insurance claims data alone. In the clinical course of systemic lupus erythematosus, anorexia, general malaise, skin symptoms, and swelling of the face are known to occur [
TF-IDF analysis showed that pain-related words became more important after the start of treatment than they were before the start of treatment (
In health-related quality of life data from
As a strength of this study, we applied several unique approaches to obtain effective insights from
This study has several limitations. First, because
A classical medical database represents only a part of a patient's entire treatment experience, and analysis using solely such a database cannot represent patient-level symptoms or patient concerns about treatments. This study showed that web-based text data from patients could add detailed patient-level information, which can be used to advance patient-centric pharmacovigilance.
TF-IDF: term frequency–inverse document frequency
We thank JMDC Inc for providing health insurance claims data. The authors thank Initiative Inc for providing
All authors contributed to the conception and study design. Data analysis and interpretation were performed by SM, TO, ST, YM, MM, and HK. SM drafted the initial manuscript, with support from TO, ST, MM, and HK. All authors contributed to revising the manuscript and approved the final version.
All authors are employees of Chugai Pharmaceutical Co Ltd, which provided support in the form of salaries for all authors but did not have any additional role in study design, data analysis, manuscript preparation, or the decision to publish the manuscript.