Published on in Vol 10 (2024)

Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/52353, first published .
A Multimorbidity Analysis of Hospitalized Patients With COVID-19 in Northwest Italy: Longitudinal Study Using Evolutionary Machine Learning and Health Administrative Data


Original Paper

1Centre for Biostatistics, Epidemiology, and Public Health, Department of Clinical and Biological Sciences, University of Turin, Turin, Italy

2Modeling and Data Science, Department of Mathematics, University of Turin, Turin, Italy

3Data Analysis and Modeling Unit, Department of Veterinary Sciences, University of Turin, Turin, Italy

4Department of Translational Medicine, University of Piemonte Orientale, Novara, Italy

5Unit of Epidemiology, Regional Health Service, Local Health Unit Torino 3, Turin, Italy

Corresponding Author:

Dayana Benny, PhD

Centre for Biostatistics, Epidemiology, and Public Health

Department of Clinical and Biological Sciences

University of Turin

Regione Gonzole 10

Orbassano

Turin, 10043

Italy

Phone: 39 0116705440

Email: dayana.benny@unito.it


Background: Multimorbidity is a significant public health concern, characterized by the coexistence and interaction of multiple preexisting medical conditions. This complex condition has been associated with an increased risk of COVID-19. Individuals with multimorbidity who contract COVID-19 often face a significant reduction in life expectancy. The postpandemic period has also highlighted an increase in frailty, emphasizing the importance of integrating existing multimorbidity details into epidemiological risk assessments. Managing clinical data that include medical histories presents significant challenges, particularly due to the sparsity of data arising from the rarity of multimorbidity conditions. Also, the complex enumeration of combinatorial multimorbidity features introduces challenges associated with combinatorial explosions.

Objective: This study aims to assess the severity of COVID-19 in individuals with multiple medical conditions, considering their demographic characteristics such as age and sex. We propose an evolutionary machine learning model designed to handle sparsity, analyzing preexisting multimorbidity profiles of patients hospitalized with COVID-19 based on their medical history. Our objective is to identify the optimal set of multimorbidity feature combinations strongly associated with COVID-19 severity. We also apply the Apriori algorithm to these evolutionarily derived predictive feature combinations to identify those with high support.

Methods: We used data from 3 administrative sources in Piedmont, Italy, involving 12,793 individuals aged 45-74 years who tested positive for COVID-19 between February and May 2020. From their 5-year pre–COVID-19 medical histories, we extracted multimorbidity features, including drug prescriptions, disease diagnoses, sex, and age. Focusing on COVID-19 hospitalization, we segmented the data into 4 cohorts based on age and sex. Addressing data imbalance through random resampling, we compared various machine learning algorithms to identify the optimal classification model for our evolutionary approach. Using 5-fold cross-validation, we evaluated each model’s performance. Our evolutionary algorithm, utilizing a deep learning classifier, generated prediction-based fitness scores to pinpoint multimorbidity combinations associated with COVID-19 hospitalization risk. Finally, the Apriori algorithm was applied to identify frequent combinations with high support.

Results: We identified multimorbidity predictors associated with COVID-19 hospitalization, indicating more severe COVID-19 outcomes. Frequently occurring morbidity features in the final evolved combinations were age>53, R03BA (glucocorticoid inhalants), and N03AX (other antiepileptics) in cohort 1; A10BA (biguanide or metformin) and N02BE (anilides) in cohort 2; N02AX (other opioids) and M04AA (preparations inhibiting uric acid production) in cohort 3; and G04CA (alpha-adrenoreceptor antagonists) in cohort 4.

Conclusions: When combined with other multimorbidity features, even less prevalent medical conditions show associations with the outcome. This study provides insights beyond COVID-19, demonstrating how repurposed administrative data can be adapted and contribute to enhanced risk assessment for vulnerable populations.

JMIR Public Health Surveill 2024;10:e52353

doi:10.2196/52353



Background

COVID-19, classified as a highly infectious disease, poses a severe threat to vulnerable populations, making it a critical public health concern and a significant global epidemiological issue. The first Italian case of COVID-19 was diagnosed in the Lombardy region on February 21, 2020. The virus quickly spread across the country, leading to a nationwide lockdown and overwhelming the health care system. Italy was among the countries hardest hit by the COVID-19 pandemic, with Piedmont, a region in the northwest, experiencing a high number of cases during the first wave.

Multimorbidity refers to the presence of multiple coexisting medical conditions in a patient, which interact with each other, resulting in a complex and multidimensional health condition [1]. At the population level, it has been established that interactions between diseases can increase the severity of the overall medical condition and complicate the treatment of other diseases within the combination [2,3]. In people infected with SARS-CoV-2, multimorbidity can increase the severity of the infection [4,5]. Therefore, it is important to identify specific disease combinations that could impact the severity of COVID-19 in individuals with multimorbidity.

It is necessary to point out that having one or more of these chronic health conditions does not guarantee severe COVID-19 development, but it does increase the risk. Different diseases may affect COVID-19 outcomes differently. Therefore, identifying specific disease combinations and studying interactions between different chronic health conditions are essential for understanding the severity of COVID-19 among individuals with multimorbidity. This can help health care professionals identify those at the highest risk of severe complications and provide appropriate prevention, care, and treatment.

Studying multimorbidity using traditional methods can be labor-intensive, requiring the identification of high-dimensional combinatorial features. Also, there is no universally accepted list of medical conditions to define multimorbidity. To address these challenges, efforts must focus on identifying low-dimensional representations of multimorbidity features for effective outcome prediction. High-order input features make machine learning models more prone to overfitting, and identifying meaningful high-order combinatorial features requires extensive effort from experts with domain knowledge.

Contrary to misconceptions, the concept of multimorbidity analysis in patients with COVID-19 is not outdated. Our study introduces a cutting-edge tool designed to analyze the complex interactions among diverse chronic health conditions and their collective impact, which could be valuable in situations similar to recent health crises.

Traditionally, research on multimorbidity has focused on counting the total number of chronic conditions rather than considering individual experiences and the effects of different combinations of diseases [6]. Count-based measures of multimorbidity have been utilized to predict emergency hospitalizations [7,8]. Common combinations of medical conditions have been documented to delineate patterns of multimorbidity [9,10]. Previous studies have explored multimorbidity combinations using methods such as latent class analysis [11], cluster analysis [12], network analysis [13], factor analysis [14], association rules, and tree-based analysis [13,15].

Rare features, such as diseases and drugs with low occurrence rates in the data, can pose significant challenges for both statistical and machine learning analyses. Their lower prevalence in the data can result in sparsity, which may lead to poorer predictions.

Some studies using machine learning to investigate multimorbidity patterns tackled data set sparsity by strategies such as removing sparsity-inducing features [16], consolidating feature categories after one-hot encoding [17], or clustering rare features [18]. However, while these methods can alleviate sparsity, they may also result in the loss of important information and impede the meaningful interpretation of multimorbidity features [19].

Importance of the Proposed Method for Multimorbidity Research

With the increasing prevalence of electronic health records and other large data sets, there is a rising demand for efficient and effective methods to analyze and comprehend multimorbidity. By utilizing machine learning algorithms and other advanced computational techniques, researchers can gain deeper insights into the underlying mechanisms and risk factors associated with multimorbidity. This understanding can significantly inform more effective prevention and treatment strategies.

An evolutionary algorithm coupled with deep learning–based feature scoring represents a powerful approach for analyzing multimorbidity data [20]. This method involves multiple steps aimed at identifying the most relevant features for predicting the target variable while minimizing the number of features used.

Initially, the data set undergoes preprocessing by using a feature binning approach to generate various subsets or bins of the multimorbidity features. This step aims to reduce sparsity in the data and facilitates more effective feature scoring. Subsequently, deep learning is utilized to score the features within each subset based on their relevance for predicting the target variable. The result of this process is a feature score assigned to each feature within every subset. Next, an evolutionary algorithm is applied to select the optimal subset of features based on their scores. The algorithm initiates by generating a population of candidate feature subsets and iteratively enhances this population through selection, crossover, and mutation operations [21]. The fitness of each candidate solution is assessed using a fitness function that integrates the deep learning–based feature scores derived from each subset or bin of features.

The output of the evolutionary algorithm is a subset of features that are most relevant for predicting the target variable. These features can be utilized for further analysis, such as constructing a predictive model or uncovering the underlying associations of multimorbidity patterns. In summary, using an evolutionary algorithm with deep learning–based feature scoring offers a robust approach to analyzing multimorbidity data, pinpointing the most influential features for predicting the target variable. This approach can result in enhanced model performance, faster training times, and improved interpretability in complex data sets featuring multimorbidity [22].

Study Goal

This study aims to identify multimorbidity patterns that may serve as predictors of COVID-19 hospitalization (as a proxy for a more severe COVID-19 outcome) using evolutionary algorithms. To assess the effectiveness of our approach, we conducted a comparative analysis to justify the use of deep learning as a classifier over other established machine learning algorithms in terms of prediction accuracy. The evolutionary model may excel not only because of its superior predictive performance but also because it effectively manages sparsity. This capability could result in better identification of key features, more stable predictions, or enhanced performance within specific subgroups of the data [23].

A logistic model might reveal complex multimorbidity patterns [24]. However, linear models, despite their high interpretability, may struggle in sparse data sets where effective feature selection matters [25]. In such cases, evolutionary algorithms, particularly genetic algorithms, excel by handling complex feature interactions and identifying optimal feature subsets, a task that proves challenging for linear models in sparse data scenarios [26].

The selected models are interpreted using Shapley Additive Explanations (SHAP) values [27] to understand the relationship between the multimorbidity features across different cohort data and hospitalization outcomes. The proposed method also generates a feature-engineered data set containing a specified number of outcome-associated combinations or bins of multimorbidity. Then, the best performing bins are analyzed to explore the frequency of various multimorbidity patterns across all cohorts.

This study demonstrates how our innovative tool has the potential to revolutionize traditional risk assessment approaches. By incorporating complex combinations of diseases, the tool aims to improve the accuracy of predicting severe outcomes for individuals with multiple chronic conditions. With its adaptable design, it ensures applicability even in evolving scenarios involving different communicable diseases, highlighting its ongoing relevance. This study focuses on investigating the complexities of disease interactions, demonstrating how our tool could reshape risk assessment in similar contexts.


Study Design

This retrospective cohort study is designed to exhaustively examine the presence or absence of various multimorbidities in patients over a 5-year period leading up to the onset of COVID-19. The core of our analysis is the longitudinal tracking of these multimorbidities, relevant for understanding their impact on subsequent health outcomes, particularly hospitalizations due to COVID-19. Central to the study’s design is its longitudinal nature, involving systematic analysis of patient data collected over a specified time frame to assess how individual and collective health conditions influence the risk and severity of COVID-19–related hospitalizations. The retrospective cohort framework enables the use of existing medical records, including hospital discharge summaries and drug prescription data, to construct a comprehensive picture of each patient’s health status in the years leading up to the pandemic. Through an analysis of these long-term health patterns, our aim is to understand how preexisting conditions influence the severity of COVID-19.

This study involves examining individuals’ multimorbidity history over a 5-year period, encompassing both pre– and post–COVID-19 diagnosis periods. It further investigates the relationship between multimorbidity history, COVID-19 positivity, and the subsequent severity of COVID-19. This design allows for observing participants over an extended time frame and evaluating outcomes not only before but also after the critical event of COVID-19 diagnosis.

Multimorbidity Data Set

Data for the multimorbidity analyses were collected from the Piedmont Longitudinal Study (PLS), a health administrative cohort comprising anonymized records linked at the individual level from various social, health, and administrative databases [5,19]. Since February 2020, the PLS has been augmented by the regional COVID-19 platform, which collects data on COVID-19 infections. From these databases, we utilized registers for (1) hospital discharges, (2) drug prescription data, and (3) COVID-19 hospitalizations of individuals diagnosed with SARS-CoV-2 infection for the first time between February 22, 2020, and May 31, 2020. We retrieved the 5-year medical history of patients positive for COVID-19 from these data sets. The extracted data comprise 12,793 individuals aged 45-74 years who tested positive for the first time for SARS-CoV-2 infection. Our study specifically focused on this age group to eliminate potential influences from both younger and older individuals on the results. Also, this approach helped to mitigate bias associated with patients residing in nursing homes.

As the study was integrated into the National Statistical Plan, no ethical approvals or permits were required, and the database used for the analyses contained only anonymized data. Further information regarding ethical considerations and data availability can be found in the “Ethical Considerations” section.

Ethical Considerations

This study is part of the PLS, a project proposed by the National Statistical System (SISTAN) and integrated into the Italian National Statistical Program (Programma Statistico Nazionale [PSN]), an initiative endorsed by the National Institute of Statistics (Istituto Nazionale di Statistica [ISTAT]). This project is approved annually by the Italian Parliament. Since 2003, a dedicated form (PIE-00001 “Monitoring of socio-economic differences in mortality and morbidity through longitudinal studies”) has been included in the PSN, currently effective for the 3-year period from 2020 to 2022 [28] and recently renewed for 2023-2025.

Ethical approval or permits from the ethics committee are not necessary for this research. Access to PLS data within the responsible institution does not require informed consent, as stipulated by the Presidential Decree published in the Official Gazette of the Italian Republic N. 122 of May 26, 2022, under the PSN [29].

Consequently, informed consent is not required for this study. All analyses adhered to the principles of the World Medical Association’s Declaration of Helsinki, and to preserve privacy, the data used for analysis underwent deidentification.

Construction of the Exposure Variables

In this longitudinal cohort study, patients’ multimorbidity status over the preceding 5 years (2015-2019) was examined in relation to a specific outcome (hospitalization due to COVID-19). Multimorbidity was defined using records from hospital discharge and drug prescription registers. In the data sets for hospital discharges and drug prescriptions, multiple entries exist for each patient with COVID-19. The drug prescriptions data set comprises approximately 1 million records, while the hospital discharges data set includes around 19,000 entries. From the drug prescriptions data set, the Anatomical Therapeutic Chemical (ATC) classification system codes were used. All distinct ATC codes up to the 4th level (the first 5 characters of the ATC codes) were considered in this study. One-hot encoding was applied to convert categorical codes into separate feature columns with binary values (0 or 1) indicating the absence or presence of drugs in each patient’s prescription history. Similarly, from the hospital discharge data, the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) codes [30] (as diagnosis codes) were used, and one-hot encoding was applied. Following these transformations, only drug codes and diagnosis codes meeting the criterion “at least 100 patients with this code in the COVID-19-positive patients’ data” were retained. Consequently, 194 features were derived from drug codes (n=112) and diagnosis codes (n=82) as multimorbidity features from the entire data set, where the presence and absence of these features are denoted as 1 and 0, respectively. In addition, 2 features, age and sex, were included, with sex coded as 1 for females and 0 for males. Subsequently, the preprocessed data were segmented into 4 data sets based on age and sex. The data set transformation steps are illustrated in Figure 1.
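The encoding and prevalence filter described above can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the function name, the `prefix_len` parameter, and the toy records are assumptions, and the threshold is lowered from 100 to 2 patients so that the toy data produce a visible result.

```python
from collections import defaultdict


def one_hot_with_min_prevalence(records, min_patients=100, prefix_len=5):
    """Build a binary patient-by-code matrix from (patient_id, code) rows.

    Codes are truncated to `prefix_len` characters (5 characters = ATC 4th level)
    and kept only if at least `min_patients` distinct patients carry them.
    """
    codes_by_patient = defaultdict(set)
    for patient_id, code in records:
        codes_by_patient[patient_id].add(code[:prefix_len])

    # Count distinct patients per code, not raw prescription rows.
    patients_per_code = defaultdict(int)
    for codes in codes_by_patient.values():
        for code in codes:
            patients_per_code[code] += 1

    kept = sorted(c for c, n in patients_per_code.items() if n >= min_patients)
    matrix = {
        pid: [1 if code in codes else 0 for code in kept]
        for pid, codes in codes_by_patient.items()
    }
    return kept, matrix


# Hypothetical toy records; repeated prescriptions count once per patient.
toy_records = [
    ("p1", "A10BA02"), ("p1", "A10BA02"), ("p1", "N02BE01"),
    ("p2", "A10BA02"), ("p3", "R03BA01"),
]
kept_codes, feature_matrix = one_hot_with_min_prevalence(toy_records, min_patients=2)
```

Patients with no record at all do not appear in this sketch; in a real cohort they would need to be added as all-zero rows.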

Subsets of various cohorts were obtained by dividing the study population into 2 age ranges, “aged 45-59 years” and “aged 60-74 years.” This subdivision is made because individuals aged 60 and above are often categorized as part of the older population [5]. The median age within each range is used as the threshold for discretizing the age feature in this study: 53 years for the “45-59” range and 68 years for the “60-74” range.

In the data sets for the younger cohorts (cohorts 1 and 2), the age feature was converted into a binary variable, where 1 represents age>53 and 0 represents age≤53. The age values were derived from the 2020 COVID-19 data, and the age of 53 years was used as a threshold to divide the younger population into 2 subgroups (45-53 and 54-59 years). Similarly, the older population was divided into 2 subgroups (60-68 and 69-74 years), where the age feature was converted into a binary variable, with 1 indicating age>68 and 0 indicating age≤68. All 4 cohort data sets were treated as distinct binary classification problems. The input variables, comprising multimorbidity history and age, along with the outcome variable indicating whether a patient was hospitalized due to COVID-19, were represented as binary values.
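As a small illustration of the age discretization, the following sketch binarizes ages at the median of an age range. The helper name and the sample ages are hypothetical; only the rule (1 for age above the threshold, 0 otherwise) comes from the text.

```python
from statistics import median


def binarize_age(ages, threshold=None):
    """Return (threshold, flags), with flag 1 for age > threshold and 0 otherwise."""
    if threshold is None:
        threshold = median(ages)  # median of the age range, as described above
    return threshold, [1 if age > threshold else 0 for age in ages]


# Hypothetical ages from the younger range (45-59 years).
younger_ages = [45, 50, 53, 53, 57, 59]
age_threshold, age_flags = binarize_age(younger_ages)
```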

In our study, multimorbidity features included the presence and absence of prescribed drugs and diagnosed diseases, as well as patient age and sex. However, due to the rarity of many medical conditions in the study population, the resulting data set became sparse when encoding absence as 0 values.

Figure 1. Transformation of data sets: The multimorbidity data, derived from prescription and hospital discharge data sets, are merged with the COVID-19 database containing patients who tested positive. The resulting data set is then subdivided into 4 cohorts based on age and sex. ATC: Anatomical Therapeutic Chemical; ICD: International Classification of Diseases, Ninth Revision, Clinical Modification.

Data Imbalance Rectification

A significant challenge when working with clinical data is predicting rare events, which can result in an imbalance problem when the target variable has more observations in one class than in others. Therefore, it is beneficial to handle imbalanced raw data properly to prevent bias toward a particular class. All data sets used in this study exhibit imbalance, and resampling is recommended as a solution. To address this, randomly balanced samples were drawn from the unbalanced original data set to achieve class balance. Subsequently, a statistical hypothesis test, specifically the one-proportion z-test, was performed. This test compares the proportion of the sampled population with that of the raw data population, ensuring the representativeness of the randomly balanced sample data compared with the original cohort data set and mitigating potential biases.

The steps performed to obtain an unbiased balanced data set with significant features are as follows:

  • Extract all minority and majority samples attributed to the outcome value from the original cohort data set.
  • Randomly select samples belonging to the majority class so that they are equal in number to the minority class to achieve a balanced data set.
  • Calculate the prevalence of each feature in the randomly selected samples and compare it with the prevalence in the original population.
  • Perform a one-proportion z-test on all nonzero variables to assess whether the frequency distribution of a feature in the resampled data is representative of the same feature in the original cohort data set, using a significance level of .05.
  • Evaluate the results of the one-proportion z-test, considering the test statistic and P values, to determine the significance and eliminate nonsignificant features from the sampled data.
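The steps above can be sketched in plain Python. This is an illustrative outline under stated assumptions rather than the study's code: the function names, the fixed seed, and the toy cohort are hypothetical, and the sketch only reports the test result per feature, leaving the decision about which features to drop to the caller.

```python
import math
import random


def undersample_majority(rows, labels, seed=0):
    """Randomly drop majority-class rows until both classes have equal size."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    keep = sorted(minority + rng.sample(majority, len(minority)))
    return [rows[i] for i in keep], [labels[i] for i in keep]


def one_proportion_z_test(successes, n, p0):
    """Two-sided one-proportion z-test of a sample proportion against p0."""
    if not 0.0 < p0 < 1.0:
        raise ValueError("p0 must lie strictly between 0 and 1")
    z = (successes / n - p0) / math.sqrt(p0 * (1 - p0) / n)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


def compare_prevalences(original_rows, sampled_rows, alpha=0.05):
    """Test each feature's sampled prevalence against its original prevalence."""
    report = {}
    for j in range(len(original_rows[0])):
        p0 = sum(row[j] for row in original_rows) / len(original_rows)
        if p0 in (0.0, 1.0):
            continue  # the test is undefined for constant features
        k = sum(row[j] for row in sampled_rows)
        z, p_value = one_proportion_z_test(k, len(sampled_rows), p0)
        report[j] = {"z": z, "p_value": p_value, "differs": p_value < alpha}
    return report


# Hypothetical toy cohort: 2 binary features, outcome 1 = hospitalized.
rows = [[1, 0], [1, 1], [0, 0], [0, 1], [1, 0], [0, 0]]
labels = [1, 1, 0, 0, 0, 0]
balanced_rows, balanced_labels = undersample_majority(rows, labels, seed=42)
report = compare_prevalences(rows, balanced_rows)
```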

Model Development

Machine Learning Algorithms

To select the best model, we evaluated the performance of various supervised machine learning algorithms. Using labeled health records enables the application of supervised learning, specifically binary classification to classify a patient’s multimorbidity profile. Deep learning and other machine learning algorithms were applied to all cohort data sets, as depicted in Figure 2. Results were compared using a scoring grid with average cross-validated scores.
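A minimal sketch of how an average cross-validated AUC score might be computed is given below. The names are hypothetical, the folds are stratified by outcome (an assumption; the text only specifies 5-fold cross-validation), and `fit_predict` stands in for whichever classifier is being compared.

```python
import random


def roc_auc(y_true, scores):
    """AUC as the chance that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs both classes")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_k_fold(y, k=5, seed=0):
    """Yield (train, test) index lists, dealing each class round-robin into k folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in (0, 1):
        members = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(members)
        for position, i in enumerate(members):
            folds[position % k].append(i)
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, folds[i]


def cross_validated_auc(X, y, fit_predict, k=5, seed=0):
    """Return the mean and sample SD of the test AUC across the k folds."""
    aucs = []
    for train, test in stratified_k_fold(y, k, seed):
        scores = fit_predict([X[i] for i in train], [y[i] for i in train],
                             [X[i] for i in test])
        aucs.append(roc_auc([y[i] for i in test], scores))
    mean = sum(aucs) / k
    sd = (sum((a - mean) ** 2 for a in aucs) / (k - 1)) ** 0.5
    return mean, sd


# Hypothetical stand-in "model": scores each test row by its first feature.
def toy_fit_predict(X_train, y_train, X_test):
    return [row[0] for row in X_test]


X = [[i] for i in range(20)]
y = [0] * 10 + [1] * 10
mean_auc, sd_auc = cross_validated_auc(X, y, toy_fit_predict)
```

Each candidate algorithm would be passed in as a different `fit_predict`, and the resulting means and SDs compared in a scoring grid.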

Figure 2. Selecting the best ML model for each cohort data set: A streamlined process of selecting the optimal ML model for cohort data sets, using supervised algorithms for binary classification of multimorbidity profiles, with comparison based on a scoring grid featuring average cross-validated scores. AUC: area under the curve; ML: machine learning.

SHAP Analysis

SHAP values were used to elucidate the contribution of individual features in predicting hospitalization outcomes across all cohorts. The SHAP values for all features were plotted as beeswarm plots to explore the distribution of influence that each feature has on the model outcome. Features are ordered along the y-axis by importance, with features of greater importance positioned higher on the graph. Each data point for a feature corresponds to a single patient, with the position of the data point (SHAP value) on the x-axis indicating the effect of that feature on the model outcome for that specific patient. In the SHAP beeswarm plots, when a multimorbidity feature is present (indicated by a feature value of 1 in red), a higher positive SHAP value suggests that this feature acts as a risk factor for hospitalization. Conversely, a more negative SHAP value in the presence of the feature indicates that it acts as a protective factor against hospitalization risk for the patient. These findings are illustrated in Figure 3.
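To make the per-patient interpretation concrete, the sketch below computes exact Shapley values for one patient by enumerating feature subsets. It is not the SHAP library's estimator: the toy risk model, the all-zero baseline, and the function names are assumptions chosen so that the result can be checked by hand.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for one instance; 'absent' features take baseline values."""
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical risk model: feature 0 raises risk, more so when feature 1 is also present.
def toy_risk(z):
    return 0.1 + 0.3 * z[0] + 0.2 * z[0] * z[1]


patient = [1, 1, 0]          # features 0 and 1 present, feature 2 absent
baseline = [0, 0, 0]         # reference patient with no conditions
phi = shapley_values(toy_risk, patient, baseline)
```

Here feature 0 receives 0.4 (its main effect plus half of the interaction), feature 1 receives 0.1, and the absent feature 2 receives 0; the values sum to the difference between the patient's risk and the baseline risk.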

Figure 3. SHAP beeswarm plots illustrating the impact of all features on COVID-19 hospitalization for all 4 models. SHAP: Shapley Additive Explanations.

Deep Learning With Sparse Data

This study addresses a sparse health care data set that includes rare medical conditions and drugs, posing challenges for statistical and machine learning analyses due to their low prevalence [23]. To tackle this issue, the study utilizes sequential deep learning with the Adaptive Gradient Algorithm (AdaGrad), an optimization algorithm well-suited for handling sparse data [31]. AdaGrad adapts the learning rate of each parameter individually, which reduces the need for manual tuning and enhances robustness compared with stochastic gradient descent. The study also uses early stopping to improve the model’s performance.
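The per-parameter scaling that makes AdaGrad attractive for sparse inputs can be illustrated with a single logistic unit trained on binary cross-entropy. This is a sketch of the update rule only, not the study's sequential network; the learning rate, the toy data, and all function names are assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bce_loss_and_grad(weights, X, y):
    """Mean binary cross-entropy and its gradient for a single logistic unit."""
    n = len(X)
    loss = 0.0
    grad = [0.0] * len(weights)
    for row, target in zip(X, y):
        p = sigmoid(sum(w * v for w, v in zip(weights, row)))
        p = min(max(p, 1e-12), 1 - 1e-12)  # guard against log(0)
        loss -= target * math.log(p) + (1 - target) * math.log(1 - p)
        for j, v in enumerate(row):
            grad[j] += (p - target) * v
    return loss / n, [g / n for g in grad]


def adagrad_step(weights, grad, accum, lr=0.5, eps=1e-8):
    """In-place AdaGrad update: each weight is scaled by its own gradient history."""
    for j, g in enumerate(grad):
        accum[j] += g * g
        weights[j] -= lr * g / (math.sqrt(accum[j]) + eps)


# Hypothetical sparse toy data: feature 0 is always on, feature 2 is never on.
X = [[1, 0, 0], [1, 1, 0], [1, 0, 0], [1, 1, 0]]
y = [0, 1, 0, 1]
weights = [0.0, 0.0, 0.0]
accum = [0.0, 0.0, 0.0]
initial_loss, _ = bce_loss_and_grad(weights, X, y)
for _ in range(200):
    _, grad = bce_loss_and_grad(weights, X, y)
    adagrad_step(weights, grad, accum)
final_loss, _ = bce_loss_and_grad(weights, X, y)
```

A feature that is rarely active accumulates little squared gradient, so its effective step size stays large when it does appear; a feature that never appears, like feature 2 here, is simply left untouched.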

In all deep learning models, dropout has been used as a regularization technique to mitigate overfitting during training [32]. Specifically, a dropout layer with a 20% dropout rate has been introduced after the first and second layers in the sequential model. Given the binary classification nature of the problem, the default loss function used is binary cross-entropy loss [33].

Feature Selection for Discovering the Optimal Set of Multimorbidity Features

Feature selection as a preprocessing method eliminates irrelevant and redundant information, aiding in dimensionality reduction [34]. There are 3 main methods of feature selection: filter-based, embedded, and wrapper-based methods. Filter-based methods typically generate models with reduced predictive performance compared with the other 2 methods. The embedded method performs an optimal feature subset search while constructing the model, whereas the wrapper method selects the best feature subset based on the classifier’s performance. In our study, we used a wrapper method that utilizes deep learning as the classifier algorithm and an evolutionary algorithm as the search strategy to generate feature subsets (bins). The best performing bin is determined using the area under the curve (AUC) metric and selected as the optimal subset of multimorbidity features highly associated with COVID-19 hospitalization.

Evolutionary Machine Learning

The use of evolutionary algorithms represents a promising approach for extracting a reduced set of meaningful and accurate rare associations, particularly beneficial for addressing challenges such as sparse data, epistatic associations with features, and high-dimensional representations of features. Evolutionary machine learning is a hybrid method that leverages evolutionary computation to overcome challenges encountered in various machine learning tasks [35]. Compared with traditional algorithms that rely on exhaustive search-based techniques, evolutionary algorithms offer a more robust solution.

Several key considerations arise when performing feature engineering with evolutionary algorithms: (1) a feature’s lack of prevalence does not necessarily imply irrelevance; it could still strongly influence the outcome; (2) addressing data sparsity poses a challenge for many machine learning methods, particularly concerning features with near-0 variance; and (3) evaluating combinations of features may yield greater predictive power than assessing isolated features alone, emphasizing the significance of exploring feature interactions.

We used a genetic algorithm to create an optimized feature matrix. Initially, features were randomly grouped into bins, each forming a feature matrix. These bins were then regrouped using a genetic algorithm and a wrapper-based method interacting with a classifier. The study adopted the elitism principle to preserve the best-performing bins as checkpoints. The final feature matrix represents the evolved engineering matrix after all iterations, designed to address issues of data sparsity and incorporate interactions among various multimorbidity features.

The proposed evolutionary approach in the study is an evolutionary algorithm–based wrapper method, illustrated in Figure 4. It is a modified version of an existing evolutionary algorithm known as the Relevant Association Rare-variant-bin Evolver [23]. The proposed method differs from the existing approach in several ways: it utilizes a prediction-based method with separate training and testing phases, incorporates a deep learning technique with an AdaGrad optimizer, and estimates the frequency of occurrence of specific features within the best performing feature combinations. Also, the scores produced by the deep learning model serve as fitness scores to assess the performance of multimorbidity combinations in each cycle.
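The overall loop (random bins, fitness scoring, elitism, crossover, and mutation) can be sketched as below. This is a simplified illustration and not the authors' implementation: the fitness here is the AUC of a simple per-patient count of present features in a bin, standing in for the deep learning score used in the study, and all names, parameter values, and the toy data are hypothetical.

```python
import random


def auc(y_true, scores):
    """AUC as the chance that a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evolve_bins(X, y, n_bins=6, bin_size=2, generations=20,
                elite_frac=0.3, mutation_rate=0.2, seed=0):
    rng = random.Random(seed)
    n_features = len(X[0])
    cache = {}

    def fitness(bin_):
        # Stand-in fitness (assumption): AUC of the count of present features in the bin.
        if bin_ not in cache:
            cache[bin_] = auc(y, [sum(row[j] for j in bin_) for row in X])
        return cache[bin_]

    # Random initialization: each bin is a sorted tuple of feature indices.
    population = [tuple(sorted(rng.sample(range(n_features), bin_size)))
                  for _ in range(n_bins)]
    n_elite = max(1, int(elite_frac * n_bins))
    history = []
    for _ in range(generations):
        ranked = sorted(sorted(set(population)), key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        next_population = ranked[:n_elite]  # elitism: best bins survive unchanged
        while len(next_population) < n_bins:
            # Selection: binary tournament among the current distinct bins.
            parent_a = max(rng.sample(ranked, min(2, len(ranked))), key=fitness)
            parent_b = max(rng.sample(ranked, min(2, len(ranked))), key=fitness)
            # Crossover: the child draws its features from both parents.
            pool = sorted(set(parent_a) | set(parent_b))
            child = rng.sample(pool, bin_size)
            # Mutation: occasionally swap one feature for one outside the bin.
            if rng.random() < mutation_rate:
                outside = [j for j in range(n_features) if j not in child]
                if outside:
                    child[rng.randrange(bin_size)] = rng.choice(outside)
            next_population.append(tuple(sorted(child)))
        population = next_population
    best = max(sorted(set(population)), key=fitness)
    return best, fitness(best), history


# Hypothetical toy data: 6 binary features; outcome is 1 only when features 0 and 1 co-occur.
X = [[(i >> j) & 1 for j in range(6)] for i in range(64)]
y = [1 if row[0] and row[1] else 0 for row in X]
best_bin, best_fitness, history = evolve_bins(X, y)
```

Because the elite bins are carried over unchanged, the best fitness recorded in `history` can never decrease from one generation to the next, which is the checkpointing role that elitism plays in the description above.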

Figure 4. Illustration of the evolutionary approach carried out in this study: The process begins by randomly initializing subsets of features. Through the execution of evolutionary computation, a final feature matrix is generated. Subsequently, frequently occurring combinations and features are identified.

Frequent Multimorbidity Features

The most prevalent multimorbidity combinations were identified to discern patterns among patients with COVID-19 using the Apriori algorithm. Applied to the final bins data set, which includes various multimorbidity feature combinations obtained from the evolutionary algorithm, the Apriori algorithm utilized the support measure to gauge the commonality of feature combinations across rows in the final bins. To focus on relevant feature combinations, only the most common multimorbidity patterns were analyzed. Frequent combinations of features were examined using a minimum support threshold (smin) set at 0.5 to derive frequent itemsets.
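A compact version of the Apriori search over final bins might look like the following. It is a sketch for illustration: each bin is treated as one transaction, the minimum support of 0.5 follows the text, and the example bins are invented (they reuse feature labels from this paper but do not reproduce its results).

```python
from itertools import combinations


def apriori(transactions, min_support=0.5):
    """Return {itemset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        level = {c: support(c) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-itemsets, then prune candidates that contain
        # any infrequent (k-1)-subset (the Apriori property).
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        current = {c for c in candidates
                   if all(frozenset(sub) in level for sub in combinations(c, k - 1))}
    return frequent


# Hypothetical final bins, one set of feature labels per bin.
final_bins = [
    {"age>53", "R03BA", "N03AX"},
    {"age>53", "R03BA"},
    {"age>53", "A10BA"},
    {"R03BA", "N03AX"},
]
frequent_itemsets = apriori(final_bins, min_support=0.5)
```

In this toy example, the pair of age>53 and R03BA appears in 2 of the 4 bins and is therefore retained at a support of 0.5, whereas A10BA, present in only 1 bin, is not.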


Characteristics of the COVID-19 Population

Table 1 summarizes the characteristics of the COVID-19 population, while Table 2 presents the distribution of hospitalized and nonhospitalized patients.

Table 1. Characteristics of the COVID-19 population.

| Demographics | Values |
| --- | --- |
| Age groups (years), n/N (%) | |
| 45-53^a | 4179/7324 (57.06) |
| 54-59^a | 3145/7324 (42.94) |
| 60-68^b | 3296/5469 (60.27) |
| 69-74^b | 2173/5469 (39.73) |
| Female, n/N (%) | |
| Younger age group | 4477/7324 (61.13) |
| Older age group | 2355/5479 (42.98) |
| Male, n/N (%) | |
| Younger age group | 2847/7324 (38.87) |
| Older age group | 3114/5479 (56.84) |
| Age (years), mean (SD) | |
| Younger age group | 52.3 (4.18) |
| Older age group | 67 (4.55) |

^a Considered the younger age group.

^b Considered the older age group.

Table 2. Distribution of hospitalized and nonhospitalized patients with COVID-19.

| Age group (years) | Hospitalized: male, n/N | Hospitalized: female, n/N | Nonhospitalized: male, n/N | Nonhospitalized: female, n/N |
|---|---|---|---|---|
| 45-59 | 1101/1717 | 616/1717 | 1746/5607 | 3861/5607 |
| 45-53 | 522/825 | 303/825 | 1031/3354 | 2323/3354 |
| 54-59 | 579/892 | 313/892 | 715/2253 | 1538/2253 |
| 60-74 | 1974/2927 | 953/2927 | 1140/2542 | 1402/2542 |
| 60-68^a | 1073/1585 | 512/1585 | 740/1711 | 971/1711 |
| 69-74^a | 901/1342 | 441/1342 | 400/831 | 431/831 |

^a Considered the older age group.

One-Proportion z-Test Results

The one-proportion z-test was conducted on all features, and the results comparing randomly sampled data with the original cohort data sets are presented in Multimedia Appendix 1.
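
For readers unfamiliar with the test, the sketch below shows one common form of the one-proportion z-test. The text does not state which variant was used, so the two-sided alternative and the use of the reference proportion in the standard error are assumptions, and the example counts are invented.

```python
from math import erfc, sqrt

def one_proportion_ztest(successes, n, p0):
    """Two-sided one-proportion z-test.

    successes / n is the observed proportion (for example, the prevalence
    of a feature in a sampled data set) and p0 is the reference proportion
    (for example, its prevalence in the full cohort).
    """
    p_hat = successes / n
    se = sqrt(p0 * (1 - p0) / n)      # standard error under the null
    z = (p_hat - p0) / se
    p_value = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value

# Invented numbers: 60 of 100 sampled patients carry a feature whose
# prevalence in the reference data is 50%.
z, p_value = one_proportion_ztest(successes=60, n=100, p0=0.5)
print(round(z, 2), round(p_value, 4))
```

A large P value in this setting would indicate that the sampled proportion is compatible with the reference proportion.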

Performance of Machine Learning Models and Model Selection

Table 3 illustrates the performance evaluation of the deep learning model used across all 4 cohorts. The evaluation of other machine learning models is presented in Multimedia Appendix 2.

For each cohort, as depicted in Figure 5, 2 line plots were generated to validate the model’s effectiveness using cross-validation.

Table 3. Performance evaluation of the deep learning model.

| Cohort | AUC^a score, 5-fold CV^b (SD), % | Training AUC score (loss) | Test AUC score (loss) | Accuracy, % | Precision, % | Recall, % | F1-score, % |
|---|---|---|---|---|---|---|---|
| 1 | 77 (1.87) | 82% (0.28) | 80% (0.29) | 76 | 85 | 63 | 72 |
| 2 | 68 (1.94) | 71% (0.30) | 67% (0.32) | 62 | 62 | 61 | 62 |
| 3 | 67 (1.87) | 74% (0.31) | 69% (0.32) | 67 | 70 | 60 | 65 |
| 4 | 61 (2.44) | 65% (0.34) | 62% (0.34) | 63 | 62 | 68 | 65 |

^a AUC: area under the curve.

^b CV: cross-validation.

Figure 5. Model loss plot and AUC score over epochs—validation of model efficiency for each cohort through 2 line plots: The topmost plot depicts the binary cross-entropy loss for the epochs for the training and validation data sets, and the bottommost one presents the classification performance (AUC score) over epochs. AUC: area under the curve.

In cohort 1, the model learned the problem efficiently and rapidly, achieving an AUC score of 82% on the training data set and 80% on the test data set. The close similarity between these scores suggests that the model was neither overfitting nor underfitting. The cross-entropy loss plot showed that the model had converged, with acceptable loss values on both data sets, and the classification performance plot further indicated convergence. Together, the model’s performance and convergence suggest that cross-entropy loss is suitable for this neural network problem. In cohort 2, the model achieved AUC scores of 71% on the training data set and 67% on the test data set, with reasonable loss values; the small difference between these scores indicates that the model learned the problem satisfactorily. In cohort 3, the model achieved a training AUC score of 74% and a test AUC score of 69%. Because there was no significant improvement after 30 epochs, early stopping could be implemented during model training to prevent overfitting and stabilize the validation loss. In cohort 4, although the loss plot appeared well converged, the model showed slightly lower classification performance than the models in the other cohorts.
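
The early stopping idea mentioned for cohort 3 can be sketched without any deep learning framework. The function below is a hypothetical, patience-based rule applied to a list of per-epoch validation losses; the `patience` and `min_delta` values and the loss curve are invented for illustration and are not the study's settings.

```python
def early_stopping_epoch(val_losses, patience=5, min_delta=0.0):
    """Return the epoch index at which training would stop.

    Training stops once the validation loss has failed to improve by more
    than `min_delta` for `patience` consecutive epochs. If that never
    happens, the index of the last epoch is returned.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss  # new best validation loss, reset the counter
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_losses) - 1

# Invented validation-loss curve that plateaus after the fifth epoch.
losses = [0.60, 0.45, 0.38, 0.34, 0.33, 0.33, 0.34, 0.33, 0.35]
stop = early_stopping_epoch(losses, patience=3)
print(stop)
```

Most deep learning libraries provide an equivalent callback; the plain function is used here only to make the stopping rule explicit.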

Influence of Individual Features on COVID-19 Hospitalization: Most Prevalent Multimorbidity Features in Evolved Bins

The accuracy scores of the evolutionarily obtained final bins were calculated. Using the evolutionary approach to find the best outcome-associated subsets of features, the highest accuracy was achieved for cohort 1, reaching 71.43% (95% CI 67.31-67.97) with 64 features. For cohort 2, the accuracy was 63% (95% CI 59.43-59.75) using 69 features. Cohort 3 achieved an accuracy of 62.38% (95% CI 59.84-60.09) with 53 features, while cohort 4 achieved an accuracy of 58% (95% CI 55.42-55.63) using 61 features. These results were then compared with the accuracy score of the deep learning model that utilized all features, as illustrated in Figure 6.

Figure 6. Maximum classification accuracy achieved by a bin versus number of features in that bin using evolutionary approach (left side) and the accuracy score achieved exclusively by the deep learning model (right side) with all the available features in the cohort.

In cohort 1, frequently occurring multimorbidity features included age>53, R03BA (glucocorticoid inhalants), and N03AX (other antiepileptics). For cohort 2, A10BA (biguanide or metformin) and N02BE (anilides) were prevalent. Cohort 3 exhibited frequent occurrences of N02AX (other opioids) and M04AA (preparations inhibiting uric acid production), while G04CA (alpha-adrenoreceptor antagonists) was notable in cohort 4.

Table 4 displays the multimorbidity features that occurred most frequently in the final bins data set across all cohorts, using a minimum support (smin) measure of 0.6. It includes the prevalence of these features in the sampled data set. Detailed statistics for all other features can be found in Multimedia Appendix 3.

Table 4. Frequently occurred morbidity features in the evolutionarily obtained final bins data set with support measure with corresponding P values, and the prevalence of the features in the sampled data set utilized for the predictive analysis.

| Cohort | Category | Feature | Description | P value | Support | Prevalence |
|---|---|---|---|---|---|---|
| 1 | Age | >53 | ^a | <.001 | 0.84 | 41.15 |
| 1 | ATC#^b | R03BA | Glucocorticoids | <.001 | 0.85 | 15.5 |
| 1 | ATC# | N03AX | Other antiepileptics | <.001 | 0.82 | 5.6 |
| 1 | ATC# | R06AX | Other antihistamines for systemic use | <.001 | 0.79 | 6.74 |
| 1 | ATC# | J01XX | Other antibacterials | <.001 | 0.78 | 14.2 |
| 1 | ATC# | C03CA | Sulfonamides, plain | <.001 | 0.76 | 5.19 |
| 1 | ATC# | N02AX | Other opioids | <.001 | 0.74 | 6.9 |
| 1 | ATC# | A11CC | Vitamin D and analogs | <.001 | 0.73 | 23.05 |
| 1 | ATC# | C09CA | Angiotensin II receptor blockers, plain | <.001 | 0.69 | 5.44 |
| 1 | ATC# | J01CA | Penicillins with extended spectrum | <.001 | 0.66 | 14.12 |
| 1 | ATC# | J01EE | Combinations of sulfonamides and trimethoprim, including derivatives | .03 | 0.61 | 2.44 |
| 1 | ICD#^c | 298 | Other nonorganic psychoses | .16 | 0.68 | 0.16 |
| 1 | ICD# | 411 | Other acute and subacute forms of ischemic heart disease | .32 | 0.62 | 0.08 |
| 2 | ATC# | A10BA | Biguanides | <.001 | 0.86 | 4.31 |
| 2 | ATC# | N02BE | Anilides | <.001 | 0.79 | 6.4 |
| 2 | ATC# | J05AB | Nucleosides and nucleotides (excluding reverse transcriptase inhibitors) | <.001 | 0.76 | 2.91 |
| 2 | ATC# | C03CA | Sulfonamides, plain | <.001 | 0.76 | 4.09 |
| 2 | ATC# | M04AA | Preparations inhibiting uric acid production | <.001 | 0.74 | 5.13 |
| 2 | ATC# | C09CA | Angiotensin II receptor blockers, plain | <.001 | 0.71 | 8.4 |
| 2 | ATC# | C02CA | Alpha-adrenoreceptor antagonists | <.001 | 0.65 | 3.22 |
| 2 | ATC# | C08CA | Dihydropyridine derivatives | <.001 | 0.65 | 7.4 |
| 2 | ATC# | J02AC | Triazole and tetrazole derivatives | .03 | 0.64 | 6.18 |
| 2 | ATC# | N06AB | Selective serotonin reuptake inhibitors | .03 | 0.63 | 8.58 |
| 2 | ATC# | S01EE | Prostaglandin analogs | .07 | 0.62 | 0.68 |
| 2 | ATC# | N03AG | Fatty acid derivatives | .08 | 0.61 | 2.5 |
| 2 | ATC# | M01AB | Acetic acid derivatives and related substances | .001 | 0.61 | 8.21 |
| 2 | ATC# | N03AE | Benzodiazepine derivatives | .17 | 0.6 | 1.54 |
| 2 | ICD# | V64 | Surgical or other procedures not carried out because of contraindications | >.99 | 0.64 | 0.18 |
| 2 | ICD# | V54 | Other orthopedic aftercare | .26 | 0.64 | 0.32 |
| 2 | ICD# | 188 | Malignant neoplasm of the bladder | .32 | 0.63 | 0.18 |
| 2 | ICD# | 735 | Acquired deformities of the toe | >.99 | 0.6 | 0.18 |
| 2 | ICD# | 454 | Varicose veins of lower extremities | .83 | 0.6 | 1.04 |
| 2 | ICD# | 820 | Fractures of the neck of the femur | .32 | 0.6 | 0.05 |
| 3 | ATC# | N02AX | Other opioids | <.001 | 0.84 | 12.96 |
| 3 | ATC# | M04AA | Preparations inhibiting uric acid production | <.001 | 0.82 | 8.5 |
| 3 | ATC# | C03EA | Low-ceiling diuretics and potassium-sparing agents | <.001 | 0.76 | 5.35 |
| 3 | ATC# | A02BA | H2-receptor antagonists | .004 | 0.75 | 4.04 |
| 3 | ATC# | B01AB | Heparin group | <.001 | 0.73 | 12.59 |
| 3 | ATC# | N03AX | Other antiepileptics | <.001 | 0.7 | 11.7 |
| 3 | ATC# | N02AA | Natural opium alkaloids | <.001 | 0.68 | 13.9 |
| 3 | ATC# | J05AB | Nucleosides and nucleotides (excluding reverse transcriptase inhibitors) | .008 | 0.65 | 5.77 |
| 3 | ATC# | A12AA | Calcium | .11 | 0.62 | 4.67 |
| 3 | ATC# | C07BB | Beta blocking agents, selective, and thiazides | .16 | 0.62 | 2.62 |
| 3 | ATC# | B03BB | Folic acid and derivatives | <.001 | 0.61 | 9.23 |
| 3 | ATC# | R03AC | Selective beta-2-adrenoreceptor agonists | .005 | 0.6 | 7.19 |
| 3 | ICD# | 295 | Schizophrenic disorders | .03 | 0.68 | 0.73 |
| 3 | ICD# | 813 | Fractures of the radius and ulna | .62 | 0.68 | 0.84 |
| 4 | ATC# | G04CA | Alpha-adrenoreceptor antagonists | .02 | 0.8 | 25.75 |
| 4 | ATC# | J01CA | Penicillins with extended spectrum | .008 | 0.73 | 14.47 |
| 4 | ATC# | C09DA | Angiotensin II receptor blockers and diuretics | .07 | 0.66 | 13.11 |
| 4 | ATC# | C09AA | ACE inhibitors, plain | .03 | 0.66 | 26.32 |
| 4 | ATC# | B01AA | Vitamin K antagonists | .001 | 0.64 | 4.61 |
| 4 | ATC# | C03CA | Sulfonamides, plain | .002 | 0.62 | 16.49 |
| 4 | ICD# | 995 | Certain adverse effects not elsewhere classified | >.99 | 0.61 | 0.44 |

^a Not available.

^b ATC: Anatomical Therapeutic Chemical.

^c ICD: 9th International Classification of Diseases.

The graph in Figure 7 illustrates the combinations derived from analyzing all 2-variable combinations with a minimum support (smin) of 0.5. Detailed results for these combinations can be found in Multimedia Appendix 4.

Figure 7. Frequent outcome-associated multimorbidity feature combinations (2 variable combinations with smin=0.5) in each cohort.

We observed that certain multimorbidity features appear consistently across most outcome-associated bins. Additionally, some features are common and frequent across the final bins of various cohorts. Table 5 tabulates the features and combinations that frequently appeared in the final bins data set, using a support (s) threshold between 0.7 and 1.0. These findings are graphically presented in Figure 8.

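Table 5. Frequently appeared features and combinations in the final bins data set when the support (s) is configured between 0.7 and 1.0.

| Support | Length of the combination | Frequent features | Cohort |
|---|---|---|---|
| 0.85 | 1 | ATC R03BA | 1 |
| 0.84 | 1 | Age>53 | 1 |
| 0.82 | 1 | ATC N03AX | 1 |
| 0.79 | 1 | ATC R06AX | 1 |
| 0.78 | 1 | ATC J01XX | 1 |
| 0.76 | 1 | ATC C03CA | 1 |
| 0.74 | 1 | ATC N02AX | 1 |
| 0.74 | 2 | Age>53, ATC R03BA | 1 |
| 0.73 | 1 | ATC A11CC | 1 |
| 0.72 | 2 | ATC N03AX, ATC R03BA | 1 |
| 0.72 | 2 | Age>53, ATC N03AX | 1 |
| 0.86 | 1 | ATC A10BA | 2 |
| 0.79 | 1 | ATC N02BE | 2 |
| 0.76 | 1 | ATC C03CA | 2 |
| 0.76 | 1 | ATC J05AB | 2 |
| 0.74 | 1 | ATC M04AA | 2 |
| 0.71 | 1 | ATC C09CA | 2 |
| 0.84 | 1 | ATC N02AX | 3 |
| 0.82 | 1 | ATC M04AA | 3 |
| 0.76 | 1 | ATC C03EA | 3 |
| 0.75 | 1 | ATC A02BA | 3 |
| 0.73 | 1 | ATC B01AB | 3 |
| 0.71 | 2 | ATC M04AA, ATC N02AX | 3 |
| 0.7 | 1 | ATC N03AX | 3 |
| 0.8 | 1 | ATC G04CA | 4 |
| 0.73 | 1 | ATC J01CA | 4 |
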
Figure 8. Illustration of the features and combinations that frequently appeared in the final bins data set when configuring the support (s) between 0.7 and 1.0 as radar chart, with features presented in more than 1 cohort stacked.

Principal Findings

The primary findings of the study highlight prevalent multimorbidity patterns identified within the evolved data set. These patterns, characterized by specific ATC codes and ICD codes, show significant associations with hospitalization outcomes, particularly among distinct demographic groups. This analysis not only provides insights into COVID-19 but also suggests potential broader applications. Repurposing data originally collected for administrative purposes, this innovative approach shows promise for multimorbidity analysis in public health. It shows the adaptability and versatility of the methodology, capable of extracting valuable insights from existing data sets to inform effective public health strategies and interventions.

While our evolutionary machine learning model shows only marginal superiority over other prediction models, even slight improvements in predictive performance can hold significant value in real-world applications, particularly in critical fields such as health care, where accuracy is essential. Moreover, we acknowledge that achieving the highest prediction performance is not the sole objective of our study.

In the baseline method, variables are transformed into a binary format using one-hot encoding, which creates a large, sparse matrix [26]. Evolutionary models typically handle high-dimensional data better than linear models because they can explore the search space more effectively [36]. Evolutionary approaches do have drawbacks, including limited interpretability, lower computational efficiency, and a higher risk of overfitting [37]. However, although linear models are simple and clearly interpretable, evolutionary models excel at managing complex, high-dimensional data and at capturing complex feature interactions. This makes them particularly suitable for studies focused on detailed aspects of multimorbidity patterns. Using a novel evolutionary machine learning approach, we show that meaningful results can be derived even from rare events. Our model’s successful application in uncovering prevalent morbidity patterns linked to COVID-19 outcomes underscores its potential to yield valuable insights across diverse data sets, particularly where data sparsity poses challenges. While acknowledging its computational demands, we emphasize the model’s adaptability for analyzing complex medical data and its potential as a tool in medical research.

We identified prevalent morbidity patterns from the evolved data set, focusing on multimorbidity combinations or feature subsets closely associated with the outcome. This research targets clinically significant patterns directly. Utilizing an evolutionary algorithm to identify these combinations ensures the analysis is grounded in a robust, data-driven process. Analyzing the frequency of these subsets provides a measure of their prevalence and significance in the studied population. This step helps validate the relevance of the identified combinations, ensuring that the observed patterns are not random but indicative of common trends in patient data. Focusing on the most prevalent combinations, the study aims to yield findings with practical implications for health care providers. These findings can inform clinical decision-making by helping practitioners identify patients at higher risk due to specific multimorbidity patterns, enabling them to tailor treatment approaches accordingly.

Multimorbidity features such as older age combined with specific ATC codes (N03AX and R03BA) were frequently observed in outcome-related bins, particularly among middle-aged females. Likewise, the analysis of SHAP values in cohort 1 showed that the use of inhaled corticosteroid medication for asthma (R03BA) had a significantly positive impact on the likelihood of hospitalization. This observation aligns with findings from the OpenSAFELY study, which identified asthma as a significant risk factor for mortality in patients with COVID-19 and, specifically, found that individuals using inhaled corticosteroids face the highest risk in this context [38].

The ATC N03AX group encompasses various antiepileptic medications used in treating bipolar disorder, epilepsy, migraine, and sometimes schizophrenia. Individuals with severe mental illnesses have shown a slightly higher risk of severe clinical outcomes from COVID-19 compared with those without prior mental health conditions [39]. Also, there have been reports linking the use of antiepileptic medications with vitamin D deficiency [40]. In our study, the presence of A11CC (vitamin D and analogs) in the multimorbidity history makes middle-aged females more vulnerable to hospitalization. Conversely, for older-age females, the presence of this feature is associated with smaller SHAP values, indicating that its presence in their history is protective against hospitalization.

In a multimorbidity study of hospitalized patients with COVID-19 [41], the ATC group most closely associated with prolonged hospital stays was M04AA, which includes preparations inhibiting uric acid production. In our study, among older-age females, the combinations of M04AA and N03AX were notably frequent. M04AA also featured prominently in middle-aged males, while G04CA (alpha-adrenoreceptor antagonists), used for benign prostatic hypertrophy, was notable among older-age males. Research indicates that male COVID-19 cohorts experience more unfavorable clinical outcomes than female cohorts [42,43]. Specifically, while patients with cancer are at an increased risk of SARS-CoV-2 infection, individuals undergoing androgen-deprivation therapy for prostate cancer appear to have some level of protection against the infection [43].

Strengths and Limitations

Each row in the data set represents a comprehensive aggregation of each patient’s multimorbidity history over a 5-year period, including all relevant instances of diseases and conditions. This approach ensures a holistic view of each patient’s health status. To minimize subjectivity in the selection process, the criteria for including health records in the data set are consistent and objective. The aggregation process is governed by standardized criteria, uniformly applied across all patients. Also, aggregating multiple health records into a single patient instance helps mitigate bias that could arise from selectively choosing one entry over another.
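
A minimal sketch of this kind of aggregation is shown below. The record layout (one `(patient_id, code)` pair per health record), the code prefixes, and the function name are assumptions made for illustration; the study's actual data model and inclusion criteria are not reproduced here.

```python
from collections import defaultdict

def aggregate_history(records):
    """Collapse (patient_id, code) records into one binary row per patient.

    Repeated occurrences of the same code for a patient are merged, so each
    patient contributes exactly one row regardless of how many records exist.
    """
    codes_by_patient = defaultdict(set)
    for patient_id, code in records:
        codes_by_patient[patient_id].add(code)
    # One column per distinct code observed in the data.
    columns = sorted({code for _, code in records})
    rows = {
        pid: [1 if c in codes else 0 for c in columns]
        for pid, codes in codes_by_patient.items()
    }
    return columns, rows

# Invented records: patient p1 has a repeated prescription and one diagnosis.
records = [
    ("p1", "ATC:R03BA"), ("p1", "ATC:R03BA"), ("p1", "ICD:411"),
    ("p2", "ATC:A10BA"),
]
columns, rows = aggregate_history(records)
print(columns)
print(rows)
```

Because every code is treated the same way for every patient, no single record has to be chosen over another, which is the source of the reduced selection bias described above.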

In many clinical scenarios, disease probabilities alone are not enough; the implications of false positives and false negatives must also be understood. Although metrics such as Pietra and sBrier [44] and the average deviation about the probability threshold (ADAPT) index [45] are valuable, especially when patients seek to understand disease probabilities, we believe that traditional metrics such as AUC, accuracy, precision, recall, and F1-score, along with a confusion matrix, offer a comprehensive evaluation of the prediction models in this study. The use of a confusion matrix as an evaluation tool enables us to customize model assessment to reflect different clinical priorities, which is particularly relevant when the prediction model informs treatment plans or risk assessments [46].
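
The relationship between a confusion matrix and the traditional metrics named above can be made concrete with a short sketch. The counts in the example are invented and do not correspond to any cohort in this study; the function name is illustrative.

```python
def classification_metrics(tp, fp, fn, tn):
    """Derive accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of predicted positives that are true
    recall = tp / (tp + fn) if tp + fn else 0.0     # share of actual positives that are found
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Invented counts: 40 true positives, 10 false positives,
# 20 false negatives, and 30 true negatives.
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this example precision (0.800) exceeds recall (0.667), meaning false negatives are the more frequent error; whether that trade-off is acceptable depends on the clinical priority, for instance whether missing a patient at risk of hospitalization is costlier than a false alarm.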

Evolutionary algorithms inherently favor the best performing choices available, despite their stochastic nature. These biases contribute to their improved performance. Each evolutionary cycle involves evaluating bin fitness and performing genetic operations to identify the best performing group of features. In this study, the evolutionary algorithm is used not only for feature selection in sparse data but also to indirectly assess epistatic associations between features in each evolutionary cycle. Multimorbidity features are grouped into bins and scored based on a deep learning classifier’s predictive ability for the outcome. The features within bins are regrouped iteratively after each evolutionary cycle.

Many studies using machine learning to investigate multimorbidity patterns handle sparse data sets by either removing sparsity-generating features or merging feature categories to reduce sparsity. However, these methods often result in information loss and less precise interpretation of multimorbidity features [19]. Instead of relying solely on a sequential deep learning model, we aggregated all evolved bins to create a new data set. This allowed us to analyze the evolved bins and identify frequent multimorbidity features and combinations.

Analyzing all possible combinations of multimorbidity features in a data set can be computationally expensive, and many irrelevant combinations may not warrant further analysis. To address this, we applied an evolutionary algorithm to extract meaningful combinations, prioritizing even less prevalent features. Consequently, our focus shifted to investigating only the most common multimorbidity features found in the top bins.
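
The scale of the problem is easy to quantify. The sketch below counts feature subsets with `math.comb`; the value 64 is borrowed from the number of features reported for the best cohort 1 bin purely as an illustrative size, and the cutoff of 5 features per combination is an arbitrary choice for this example.

```python
from math import comb

def n_combinations(n_features, max_size):
    """Number of distinct feature subsets of size 1 up to max_size."""
    return sum(comb(n_features, k) for k in range(1, max_size + 1))

pairs_64 = comb(64, 2)                 # all 2-feature combinations
up_to_5_of_64 = n_combinations(64, 5)  # all combinations of 1 to 5 features
all_subsets_64 = 2 ** 64 - 1           # every nonempty subset

print(pairs_64)
print(up_to_5_of_64)
print(all_subsets_64)
```

Even when restricted to combinations of at most 5 features, the count runs into the millions, which is why exhaustive evaluation with a trained classifier per combination is impractical and a guided search is used instead.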

Conclusions

We identified associations with the outcome even for less prevalent medical conditions when they were combined with other multimorbidity features. Discovering hidden interconnections among different multimorbidity features opens new research pathways for studying multidimensional medical conditions in combination.

Using an innovative evolutionary machine learning approach, we identified prevalent morbidity patterns linked to hospitalization risk, especially among specific age and gender cohorts. Our findings highlight the adaptability of this methodology, demonstrating its ability to yield significant insights even in scenarios involving rare events. In addition to this, we repurposed administrative data for multimorbidity analysis, offering a novel path for public health research. This approach has the potential to influence future studies and interventions, encompassing areas such as polypharmacy and long COVID-19 research. By deepening our understanding of COVID-19 dynamics, this study emphasizes the broader utility of such methodologies in shaping effective public health strategies and interventions.

Data Availability

The data sets used in this study are not publicly available. As a result of ethical committee restrictions, raw data cannot be publicly or freely shared to ensure the privacy and protection of individual-level data. However, researchers may request access to aggregated data by contacting the corresponding author (DB) through a reasonable inquiry. The code utilized to derive the results in this study is accessible in Multimedia Appendix 5.

Conflicts of Interest

None declared.

Multimedia Appendix 1

One-proportion z-test results.

PDF File (Adobe PDF File), 311 KB

Multimedia Appendix 2

Performance evaluation of other machine learning algorithms.

PDF File (Adobe PDF File), 308 KB

Multimedia Appendix 3

Outcome association of the feature and the support.

PDF File (Adobe PDF File), 261 KB

Multimedia Appendix 4

Most prevalent multimorbidity feature combinations in evolved bins.

PDF File (Adobe PDF File), 206 KB

Multimedia Appendix 5

The code used to derive the study results.

ZIP File (Zip Archive), 1220 KB

  1. Radner H, Yoshida K, Smolen JS, Solomon DH. Multimorbidity and rheumatic conditions-enhancing the concept of comorbidity. Nat Rev Rheumatol. Apr 2014;10(4):252-256.
  2. Larsen FB, Pedersen MH, Friis K, Glümer C, Lasgaard M. A latent class analysis of multimorbidity and the relationship to socio-demographic factors and health-related quality of life. A national population-based study of 162,283 Danish adults. PLoS One. Jan 5, 2017;12(1):e0169426.
  3. Zhang Y, Chen R, Tang J, Stewart WF, Sun J. LEAP: learning to prescribe effective and safe treatment combinations for multimorbidity. New York, NY. Association for Computing Machinery; 2017. Presented at: KDD '17: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; August 13 - 17, 2017:1315-1324; Halifax, NS, Canada.
  4. Clark A, Jit M, Warren-Gash C, Guthrie B, Wang HHX, Mercer SW, et al. Centre for the Mathematical Modelling of Infectious Diseases COVID-19 working group. Global, regional, and national estimates of the population at increased risk of severe COVID-19 due to underlying health conditions in 2020: a modelling study. Lancet Glob Health. Aug 2020;8(8):e1003-e1017.
  5. Catalano A, Dansero L, Gilcrease W, Macciotta A, Saugo C, Manfredi L, et al. Multimorbidity and SARS-CoV-2-related outcomes: analysis of a cohort of Italian patients. JMIR Public Health Surveill. Feb 09, 2023;9:e41404.
  6. Stirland LE, González-Saavedra L, Mullin DS, Ritchie CW, Muniz-Terrera G, Russ TC. Measuring multimorbidity beyond counting diseases: systematic review of community and population studies and guide to index choice. BMJ. Feb 18, 2020;368:m160.
  7. Wallace E, McDowell R, Bennett K, Fahey T, Smith SM. Comparison of count-based multimorbidity measures in predicting emergency admission and functional decline in older community-dwelling adults: a prospective cohort study. BMJ Open. Sep 20, 2016;6(9):e013089.
  8. Sundararajan V, Henderson T, Perry C, Muggivan A, Quan H, Ghali WA. New ICD-10 version of the Charlson comorbidity index predicted in-hospital mortality. J Clin Epidemiol. Dec 2004;57(12):1288-1294.
  9. Fernández-Niño JA, Guerra-Gómez JA, Idrovo AJ. Multimorbidity patterns among COVID-19 deaths: proposal for the construction of etiological models. Rev Panam Salud Publica. 2020;44:e166.
  10. van den Bussche H, Koller D, Kolonko T, Hansen H, Wegscheider K, Glaeske G, et al. Which chronic diseases and disease combinations are specific to multimorbidity in the elderly? Results of a claims data based cross-sectional study in Germany. BMC Public Health. Feb 14, 2011;11:101.
  11. Hall M, Dondo TB, Yan AT, Mamas MA, Timmis AD, Deanfield JE, et al. Multimorbidity and survival for patients with acute myocardial infarction in England and Wales: latent class analysis of a nationwide population-based cohort. PLoS Med. Mar 2018;15(3):e1002501.
  12. Bisquera A, Gulliford M, Dodhia H, Ledwaba-Chapman L, Durbaba S, Soley-Bori M, et al. Identifying longitudinal clusters of multimorbidity in an urban setting: a population-based cross-sectional study. Lancet Reg Health Eur. Apr 2021;3:100047.
  13. Hernández B, Reilly RB, Kenny RA. Investigation of multimorbidity and prevalent disease combinations in older Irish adults using network analysis and association rules. Sci Rep. Oct 10, 2019;9(1):14567.
  14. Prados-Torres A, Poblador-Plou B, Calderón-Larrañaga A, Gimeno-Feliu LA, González-Rubio F, Poncel-Falcó A, et al. Multimorbidity patterns in primary care: interactions among chronic diseases using factor analysis. PLoS One. 2012;7(2):e32190.
  15. König HH, Leicht H, Bickel H, Fuchs A, Gensichen J, Maier W, et al. MultiCare study group. Effects of multiple chronic conditions on health care costs: an analysis based on an advanced tree-based regression model. BMC Health Serv Res. Jun 15, 2013;13:219.
  16. Dong G, Zhang Z, Feng J, Zhao X. MorbidGCN: prediction of multimorbidity with a graph convolutional network based on integration of population phenotypes and disease network. Brief Bioinform. Jul 18, 2022;23(4):bbac255.
  17. Nouri, Lizotte D, Sedig K, Abdullah S. VISEMURE: a visual analytics system for making sense of multimorbidity using electronic medical record data. Data. Aug 04, 2021;6(8):85.
  18. Gadd C, Nirantharakumar K, Yau C. mmVAE: multimorbidity clustering using relaxed Bernoulli β-variational autoencoders. J Mach Learn Res. Nov 28, 2022;193:88-102.
  19. Benny D, Giacobini M, Costa G, Gnavi R, Ricceri F. Multimorbidity in middle-aged women and COVID-19: binary data clustering for unsupervised binning of rare multimorbidity features and predictive modeling. BMC Med Res Methodol. Apr 24, 2024;24(1):95.
  20. Marzouki F, Bouattane O. Deep learning based model for automatic multimorbidity pattern prognosis. 2023. Presented at: 3rd International Conference on Innovative Research in Applied Science, Engineering and Technology (IRASET 2023); May 18-19, 2023:1-6; Casablanca, Morocco.
  21. Guha, Ghosh M, Kapri S, Shaw S, Mutsuddi S, Bhateja V, et al. Deluge based genetic algorithm for feature selection. Evol Intel. Mar 07, 2019;14(2):357-367.
  22. Vaishali R, Sasikala R, Ramasubbareddy S, Remya S, Nalluri S. Genetic algorithm based feature selection and MOE Fuzzy classification algorithm on Pima Indians Diabetes dataset. 2017. Presented at: 2017 International Conference on Computing Networking and Informatics (ICCNI); October 29-31, 2017:1-5; Covenant University, Canaanland, Ota, Nigeria.
  23. Dasariraju S, Urbanowicz RJ. RARE: evolutionary feature engineering for rare-variant bin discovery. 2021. Presented at: Genetic and Evolutionary Computation Conference Companion; July 10-14, 2021; Lille, France.
  24. Chen Y, Shi L, Zheng X, Yang J, Xue Y, Xiao S, et al. Patterns and determinants of multimorbidity in older adults: study in health-ecological perspective. Int J Environ Res Public Health. Dec 14, 2022;19(24):16756.
  25. Tong DL, Mintram R. Genetic algorithm-neural network (GANN): a study of neural network activation functions and depth of genetic algorithm search applied to feature selection. Int J Mach Learn Cyber. Sep 10, 2010;1(1-4):75-87.
  26. Sheikhalishahi S, Bhattacharyya A, Celi LA, Osmani V. An interpretable deep learning model for time-series electronic health records: case study of delirium prediction in critical care. Artif Intell Med. Oct 2023;144:102659.
  27. Lundberg SM, Lee SI. A unified approach to interpreting model predictions. 2017. Presented at: NIPS'17: 31st International Conference on Neural Information Processing Systems; December 4-9, 2017; Long Beach, CA.
  28. Established Legislative Decree no. 322/1989 concerning National Statistical System organization the Sistan includes: the National Institute of Statistics (ISTAT); public bodies and statistical information bodies (INEA, ISFOL); the statistical offices of the State administrations and other public bodies, of the Government Offices of the Government, of the Regions and Autonomous Provinces, of the Provinces, of the Chambers of Commerce (CCIAA), of the Municipalities, single or associated, and the statistics offices of other public and private institutions that perform public interest functions. SISTAN. URL: https://www.sistan.it/index.php?id=422 [accessed 2023-06-06]
  29. The Official Gazette of the Italian Republic: Approval of the National Statistical Program 2020-2022, Decree of the President of the Republic. Italian Government: Gazzetta Ufficiale. Mar 09, 2022. URL: https://www.gazzettaufficiale.it/eli/gu/2022/05/26/122/so/20/sg/pdf [accessed 2023-05-27]
  30. Centers for Disease Control and Prevention (CDC). ICD - ICD-9-CM - International Classification of Diseases, Ninth Revision, Clinical Modification (2021). CDC. 2021. URL: https://www.cdc.gov/nchs/icd/icd9cm.htm [accessed 2023-08-27]
  31. Halgamuge M, Daminda E, Nirmalathas A. Best optimizer selection for predicting bushfire occurrences using deep learning. Nat Hazards. May 29, 2020;103(1):845-860.
  32. Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R. Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res. 2014;15:1929-1958.
  33. Liu L, Qi H. Learning effective binary descriptors via cross entropy. 2017. Presented at: IEEE Winter Conference on Applications of Computer Vision (WACV); March 24-31, 2017; Santa Rosa, CA.
  34. Zebari R, Abdulazeez A, Zeebaree D, Zebari D, Saeed JA. A comprehensive review of dimensionality reduction techniques for feature selection and feature extraction. Journal of Applied Science and Technology Trends. 2020;1(1):56-70.
  35. Zhang J, Zhan Z, Lin Y, Chen N, Gong Y, Zhong J, et al. Evolutionary computation meets machine learning: a survey. IEEE Comput Intell Mag. Nov 2011;6(4):68-75.
  36. Chen K, Xue B, Zhang M, Zhou F. An evolutionary multitasking-based feature selection method for high-dimensional classification. IEEE Trans Cybern. Jul 2022;52(7):7172-7186.
  37. Al-Sahaf H, Bi Y, Chen Q, Lensen A, Mei Y, Sun Y, et al. A survey on evolutionary machine learning. Journal of the Royal Society of New Zealand. May 05, 2019;49(2):205-228.
  38. Williamson EJ, Walker AJ, Bhaskaran K, Bacon S, Bates C, Morton CE, et al. Factors associated with COVID-19-related death using OpenSAFELY. Nature. Aug 2020;584(7821):430-436.
  39. Lee SW, Yang JM, Moon SY, Yoo IK, Ha EK, Kim SY, et al. Association between mental illness and COVID-19 susceptibility and clinical outcomes in South Korea: a nationwide cohort study. Lancet Psychiatry. Dec 2020;7(12):1025-1031.
  40. Nadalin S, Jakovac H, Peitl V, Karlović D, Buretić-Tomljanović A. Dysregulated inflammation may predispose patients with serious mental illnesses to severe COVID-19. Mol Med Rep. Aug 2021;24(2):611.
  41. Malekpour M, Abbasi-Kangevari M, Shojaee A, Saeedi Moghaddam S, Ghamari S, Rashidi M, et al. Effect of the chronic medication use on outcome measures of hospitalized COVID-19 patients: evidence from big data. Front Public Health. 2023;11:1061307.
  42. Strope JD, Chau CH, Figg WD. Are sex discordant outcomes in COVID-19 related to sex hormones? Semin Oncol. Oct 2020;47(5):335-340.
  43. Montopoli M, Zumerle S, Vettor R, Rugge M, Zorzi M, Catapano C, et al. Androgen-deprivation therapies for prostate cancer and risk of infection by SARS-CoV-2: a population-based study (N = 4532). Ann Oncol. Aug 2020;31(8):1040-1045.
  44. Wu Y, Lee W. Alternative performance measures for prediction models. PLoS One. 2014;9(3):e91249.
  45. Lee W, Wu Y. Characterizing decision-analysis performances of risk prediction models using ADAPT curves. Medicine (Baltimore). Jan 2016;95(2):e2477.
  46. Forbes A. Classification-algorithm evaluation: five performance measures based on confusion matrices. J Clin Monitor Comput. May 1995;11(3):189-206.


Abbreviations

AdaGrad: Adaptive Gradient Algorithm
ADAPT: average deviation about the probability threshold
ATC: Anatomical Therapeutic Chemical
AUC: area under the curve
ICD-9-CM: International Classification of Diseases, Ninth Revision, Clinical Modification
ISTAT: Istituto Nazionale di Statistica
PLS: Piedmont Longitudinal Study
PSN: Programma Statistico Nazionale
SHAP: Shapley Additive Explanations
SISTAN: National Statistical System


Edited by A Mavragani; submitted 31.08.23; peer-reviewed by WC Lee, E Menasalvas, P Wang; comments to author 08.12.23; revised version received 31.01.24; accepted 16.05.24; published 18.07.24.

Copyright

©Dayana Benny, Mario Giacobini, Alberto Catalano, Giuseppe Costa, Roberto Gnavi, Fulvio Ricceri. Originally published in JMIR Public Health and Surveillance (https://publichealth.jmir.org), 18.07.2024.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Public Health and Surveillance, is properly cited. The complete bibliographic information, a link to the original publication on https://publichealth.jmir.org, as well as this copyright and license information must be included.