1. Introduction
In December 2019, an emerging disease called COVID-19 originated in Wuhan, China, and rapidly spread worldwide. The resulting global pandemic affected all continents [1, 2]. In Iran, the first case of COVID-19 was reported on February 19, 2020, in the city of Qom [3, 4]. Various diagnostic methods have been used to detect COVID-19, including laryngoscopy, initial blood tests, and polymerase chain reaction (PCR) tests [5-7]. Both the World Health Organization (WHO) and the Infectious Disease Controlling Center recommended PCR as an accurate way to diagnose the disease [8, 9].
When a large share of hospitalized patients is likely to die of COVID-19, estimating the mortality rate requires assessing the relationship between risk factors and deaths attributed to the disease [10, 11]. Epidemiological studies have indicated that certain factors are likely to influence mortality risk in patients with a positive PCR test [5, 12-14]. For example, Li et al. reported that age over 60 was one of the leading mortality factors in patients with a PCR+ diagnosis [5]. Older age weakens the immune system, which normally acts as a barrier against viral infections. In addition, underlying diseases such as high blood pressure, diabetes, and cardiovascular disease increase the risk of illness and mortality from COVID-19 in older patients.
However, traditional statistical models such as logistic regression may not adequately account for the impact of factors such as age and underlying medical conditions on COVID-19-related deaths. Failing to account for these factors can lead to biased estimates of the association between COVID-19 and mortality [13-15]. This bias can arise when risk factors are not properly matched among the groups of COVID-19 patients being compared: differences in risk factors between the groups may themselves produce an apparent association between COVID-19 and mortality [16-19].
Several estimation methods have recently been developed to obtain unbiased estimators from data in which risk factors are improperly matched among different COVID-19 patient groups [18-22]. The propensity score matching (PSM) method has been introduced in certain studies as a novel approach for matching risk factors [14, 18]. This method balances the distribution of risk factors among the groups of COVID-19 patients. The propensity score (PS) is defined as the conditional probability that individual i has a positive PCR test (PCR+), given the risk factors for that individual, as in Equation 1:
$$\mathrm{PS}_i = P(Z_i = 1 \mid X_i) \qquad (1)$$
where Zi=1 for an individual i with PCR+ and Zi=0 for an individual with PCR-, and Xi denotes the vector of risk factors for individual i, which are measured before the PCR test or whose values are not affected by the PCR result. In practice, PSM is implemented with a logistic regression model and a matching procedure [15, 16]. In the PSM approach, the main variables observed in the treated (PCR+, Zi=1) and control (PCR-, Zi=0) groups are balanced according to Equation 2:
This approach, known as nearest neighbor matching (NNM) or PSM, is a quasi-experimental method in which the researcher uses statistical techniques to construct an artificial control group by matching each treated unit with a control unit that has similar characteristics. The PS is used to estimate the results before and after matching. The balance between the treated and control groups, and any outlying observations, can be assessed using the standardized mean difference [23, 24].
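As an illustration of the procedure described above, the following Python sketch estimates a propensity score with a hand-written logistic regression and then performs greedy 1:1 nearest neighbor matching without replacement. It is a minimal sketch and not the analysis carried out in this study: the function names, the learning rate, the order in which treated units are processed, and the toy data are all assumptions made for illustration.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def propensity_score(beta, xi):
    """Estimated probability that a unit with covariates xi is treated (Z=1)."""
    return sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], xi)))


def fit_propensity_model(X, z, lr=0.1, epochs=2000):
    """Fit P(Z=1 | X) by gradient ascent on the mean log-likelihood.

    Returns [intercept, coef_1, ..., coef_p]. Illustrative only; a real
    analysis would rely on an established statistics package.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, zi in zip(X, z):
            err = zi - propensity_score(beta, xi)
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def nearest_neighbor_match(ps, z):
    """Greedy 1:1 matching without replacement on |ps_i - ps_j|.

    Treated units (z == 1) are processed from highest to lowest score, and
    each takes the closest still-unmatched control (z == 0).
    Returns a list of (treated_index, control_index) pairs.
    """
    treated = sorted((i for i, zi in enumerate(z) if zi == 1), key=lambda i: -ps[i])
    controls = [i for i, zi in enumerate(z) if zi == 0]
    pairs = []
    for i in treated:
        if not controls:
            break
        j = min(controls, key=lambda c: abs(ps[i] - ps[c]))
        pairs.append((i, j))
        controls.remove(j)
    return pairs


# Hypothetical toy data: one standardized covariate; Z=1 for PCR+, Z=0 for PCR-.
X = [[-1.0], [-1.0], [-1.0], [1.0], [1.0], [1.0]]
z = [0, 0, 1, 1, 1, 0]
beta = fit_propensity_model(X, z)
scores = [propensity_score(beta, xi) for xi in X]
matched_pairs = nearest_neighbor_match(scores, z)
```

Greedy matching is only one possible design choice; matching with replacement, calipers, or optimal matching would give different pairs, and the text does not specify which variant was used.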
This study aims to assess COVID-19 mortality by examining the causal association of risk factors among COVID-19 patients admitted to hospitals in Golestan Province, Iran. The data and statistics presented in this article were obtained from the Hospital Information System (HIS) software installed in hospitals affiliated with Golestan University of Medical Sciences. Among the 30114 hospitalized patients suspected of having the coronavirus, PCR tests were conducted on 6379 patients. It should be noted that during the initial months of the pandemic, testing availability was limited, and not all patients underwent PCR testing.
The study focuses on risk factors such as SpO2 level, gender, age, and duration of ICU hospitalization, and investigates their influence on the risk of death in COVID-19 patients with a positive PCR diagnosis. Initially, logistic regression analysis was used to estimate the effect of COVID-19 on the death rate among patients. However, the underlying risk factors were not adequately adjusted for in this analysis. To address this, the PS was used to match deceased patients with recovered patients who had similar scores. This matching approach was mainly utilized when a higher number of hospitalized patients were expected to die. The study also compared the results obtained from logistic regression analysis with those from the PSM estimation method.
2. Materials and Methods
Study design and sample size
The study participants were 6379 patients with COVID-19 symptoms for whom PCR test results were available. They were referred to hospitals with different service qualities in Golestan Province, north-east Iran, in 2020. Golestan Province has 26 hospitals. The main hospitals admitting COVID-19 patients are Sayyad Shirazi Gorgan and Payambar Azam Gonbad hospitals; the other hospitals admitted patients in emergencies. All hospitals were under the supervision of Golestan University of Medical Sciences. Note that not all patients were tested with PCR between the beginning of the COVID-19 outbreak in Golestan Province and the end of 2020, although a total of 30114 people were hospitalized in the hospitals of this province.
Inclusion criteria
Patients who were diagnosed and registered as COVID-19 cases in the study hospitals and whose PCR tests were positive were included.
Exclusion criteria
Patients who were discharged from the hospital with partial recovery or who died were excluded.
Questionnaires and data collection
The hospital admissions unit registers the basic information of each patient at admission. Using the hospital HIS software, the collected information was transferred to the central database of the University of Medical Sciences. A separate application used in the laboratory of the provincial health center recorded the PCR test results for coronavirus patients. The patient's information was then processed under a specific code linked to the patient's profile, so that it can be identified and tracked.
Gender, age, SpO2 (pulse oximeter), hospitalization, and duration of ICU (intensive care unit) stay were obtained from the hospital HIS. The PCR test result and the treatment outcome (died or discharged) were treated as the independent and dependent variables, respectively.
Statistical data analysis
This study used logistic regression analysis as an initial step to estimate the effect of COVID-19 on the death rate among patients. After the initial logistic regression analysis, PS were used to adjust for underlying risk factors such as age, gender, and SpO2 level. PS were used to match deceased patients with recovered patients who had similar PS. This approach helped to create more comparable groups and to reduce bias in estimating the effect of COVID-19 on the death rate. The PS of each patient was estimated from the underlying risk factors using the probability in expression 1. The PSM method relies on the distance d in expression 2, which was calculated before and after matching. The means of the treatment and control groups were then balanced. We also used the traditional multiple logistic regression model for comparison with the PSM results. R software, version 4.3.2, and SPSS software, version 22, were used for statistical analysis.
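For reference, a common way to write a logistic propensity model and a nearest neighbor matching distance is sketched below. This is an assumption about the general form only: the coefficients \beta_0 and \beta and the pairwise distance d_{ij} are notation introduced for this illustration, and the exact form of expressions 1 and 2 used in the analysis may differ.

```latex
\begin{align}
\mathrm{PS}_i &= P(Z_i = 1 \mid X_i)
  = \frac{\exp(\beta_0 + \beta^{\top} X_i)}{1 + \exp(\beta_0 + \beta^{\top} X_i)}, \\
d_{ij} &= \lvert \mathrm{PS}_i - \mathrm{PS}_j \rvert, \qquad Z_i = 1,\; Z_j = 0 .
\end{align}
```

Under this form, each PCR+ patient i would be paired with the PCR- patient j that minimizes d_{ij}.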
3. Results
Of 6379 inpatients, 5581 (87.5%) were discharged/recovered, and 798 (12.5%) died. Moreover, 1954 patients (30.6%) had positive and 4425 (69.4%) had negative PCR test results. The mean age was 53.91±23.61 years; 3130 (49.1%) were male, and 3249 (50.9%) were female.
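The percentages above follow directly from the reported counts. The short Python sketch below recomputes them; the dictionary keys are labels chosen for this illustration, and only the counts come from the text.

```python
# Counts reported in the text for the 6379 inpatients.
total = 6379
counts = {
    "discharged": 5581,
    "died": 798,
    "pcr_positive": 1954,
    "pcr_negative": 4425,
    "male": 3130,
    "female": 3249,
}

# Percentage of the total, rounded to one decimal place.
percentages = {name: round(100 * n / total, 1) for name, n in counts.items()}

for name, pct in percentages.items():
    print(f"{name}: {counts[name]} ({pct}%)")
```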
Table 1 presents the characteristics of the patients under the study based on their treatment results (discharged vs died).
The chi-square test showed a significant association of the treatment outcome with the PCR result and with ICU stay. Moreover, the t-test showed that the mean age of patients who died was significantly higher than that of patients who recovered, and that their mean SpO2 was considerably lower (P<0.001).
Table 2 presents the causal association between the underlying covariates and treatment results (discharged vs dead) for 6379 subjects before matching and 3960 after matching. Results showed a considerable change in the percentage of deaths of patients with PCR+ after matching.
Table 2 also indicates that the proportion of PCR+ patients among the deceased increased from 38% before matching to 59.1% after matching. Moreover, the share of males among the deceased changed from 54% to 56%, and that of females from 45% to 43%, after matching. This finding implies that the number of deaths due to COVID-19 is higher in men than in women. In contrast, there were no significant differences in average age and SpO2 among the deceased before and after matching.
The binary logistic regression results showed that, before matching, PCR+, low SpO2, older age, male gender, and hospitalization in ICU were associated with an increased risk of death among the patients (Table 3).
After matching, the binary logistic regression results based on the PS approach showed a larger positive effect and an increased death ratio among patients with PCR+. Matching also reflected the influence of hospitalization in the ICU (Table 4).
According to Table 5, before matching, the mean propensity score (distance) was 0.3161 in the PCR+ group and 0.3020 in the PCR- group. After matching, it was 0.3161 in the PCR+ group and 0.3160 in the PCR- group; that is, the value for the PCR+ group was unchanged. The standardized mean difference decreased considerably, from 0.2968 before matching to 0.0007 after matching. The variance ratio moved from 0.6148 before matching to 1.0011 after matching, reflecting greater homogeneity of variances between the groups after matching.
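The two balance diagnostics discussed here can be computed as in the following Python sketch. It assumes that the standardized mean difference is scaled by the standard deviation of the treated group, which is one common convention (a pooled standard deviation is another); the text does not state which was used, and the scores below are hypothetical values for illustration only.

```python
import statistics


def standardized_mean_difference(treated, control):
    """Difference in means divided by the treated-group standard deviation."""
    return (statistics.mean(treated) - statistics.mean(control)) / statistics.stdev(treated)


def variance_ratio(treated, control):
    """Sample variance of the treated group over that of the control group."""
    return statistics.variance(treated) / statistics.variance(control)


# Hypothetical propensity scores, for illustration only.
treated_scores = [0.2, 0.4, 0.6]
control_scores = [0.1, 0.3, 0.5]
smd = standardized_mean_difference(treated_scores, control_scores)
vr = variance_ratio(treated_scores, control_scores)
```

A standardized mean difference near 0 and a variance ratio near 1 indicate that the two groups have similar distributions of the quantity being compared.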
Table 5 summarizes the PSM results for the death rate of hospitalized COVID-19 patients according to PCR test status. Each patient's initial diagnosis at admission was based on clinical symptoms, after which the PCR test was performed. The initial data included 6379 patients.
Figure 1 illustrates that matching considerably affected the groups' homogeneity in the other studied variables. The outlying observations in Figure 1 correspond to estimated probabilities of receiving treatment close to 0 or 1; therefore, even with limited overlap, the PSM method can generate estimates that are approximately unchanged in bias and precision.
4. Discussion
This study aimed to examine the impact of various risk factors on the mortality of COVID-19 patients treated in Golestan hospitals with varying healthcare capabilities. The measured risk factors in this study included gender, age, SpO2 (pulse oximeter), hospitalization, and duration of ICU stay. These factors were documented by referencing the HIS.
Usually, logistic regression procedures are employed in routine practice to address such analyses; however, they come with inherent flaws. For example, the association of certain risk factors, such as age, may confound the effect of COVID-19 on mortality. Younger patients with shorter hospitalization and ICU stays exhibit a lower mortality risk than older patients with longer stays.
Consequently, researchers during the COVID-19 pandemic aimed to determine whether certain risk factors, such as age and gender, impact COVID-19-related deaths. These risk factors can introduce bias in estimating the mortality rate of COVID-19, especially in individuals with underlying diseases. While several studies have been conducted in our country to evaluate the influence of risk factors on COVID-19-related mortality outcomes, they may not comprehensively account for the role and effect of all potential risk factors for COVID-19-related deaths [
1-3].
In our study, we observed that the risk of mortality increased in all age groups after the age of 53, with a significant rise in individuals aged 65 and older. However, the mortality risk among elderly patients was comparable to that of the general population, suggesting that although age is a significant risk factor for COVID-19-related mortality outcomes, other factors may also contribute.
We also observed that traditional multiple logistic regression methods fail to estimate unbiased mortality rates. This finding underscores the need for more advanced statistical methods to accurately determine the causal association between COVID-19 and mortality rate. Specifically, the traditional logistic regression output in Table 3 indicates interference in the data when analyzing the effect of PCR+ on death. In contrast, the results in Table 4 demonstrate the advantages of using the matched data over the original data, as evidenced by an increase in Means Control (from 0.3020 to 0.3160) and a decrease in Std. Diff (Table 5). According to Table 5, after removing the unmatched observations, the mean values of the treatment and control groups in the matched data became closer. Consequently, the standardized mean difference decreased from 0.2968 to 0.0007, and the variance ratio increased from 0.6148 to 1.0011. These indicators point to better balance between the treatment and control groups.
Furthermore, a standardized mean difference that tends towards 0 and a variance ratio close to 1 suggest improved matching for the PCR+ group. This improvement was achieved by equalizing the standard deviation between the control and treatment groups, which logistic regression fails to do. Additionally, the box plot diagram illustrates the average propensity score. According to our findings, the measured risk factors mentioned earlier resulted in a 72% increase in mortality risk among patients with a positive PCR result when employing the PSM estimation approach, whereas the multiple logistic regression model yielded an increase of 46%. This outcome suggests that the PSM approach is more effective in controlling for measured risk factors.
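If the 72% and 46% figures are read as percent changes in the odds of death implied by a logistic regression coefficient (an assumption; the text does not state how they were derived), the conversion between a coefficient and a percent change can be sketched in Python as follows. The function names are introduced here for illustration.

```python
import math


def percent_change_in_odds(log_odds_coefficient):
    """Percent change in the odds implied by a logistic regression coefficient."""
    return (math.exp(log_odds_coefficient) - 1.0) * 100.0


def coefficient_for_percent_change(percent):
    """Inverse mapping: the coefficient that implies a given percent change."""
    return math.log(1.0 + percent / 100.0)


# Coefficients that would correspond to the two figures quoted in the text.
beta_psm = coefficient_for_percent_change(72.0)       # about 0.542
beta_logistic = coefficient_for_percent_change(46.0)  # about 0.378
```

Under this reading, a 72% increase corresponds to an odds ratio of 1.72 and a 46% increase to an odds ratio of 1.46.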
Our findings indicated that patients who died of COVID-19 were significantly older than those who recovered. This could be attributed to various physiological and anatomical changes associated with aging, which render older individuals more susceptible to severe outcomes and complications of COVID-19 and lead to a higher mortality rate in this population. This finding is consistent with that of Sheikhi et al. [2]. We also observed that patients with a positive PCR test had a higher risk of mortality than those with a negative result. These findings align with the previous research by Azizmohammad Looha et al. [3], further supporting the association between a positive PCR test and an increased risk of mortality from COVID-19.
In our study, older age, male gender, and dyspnea were risk factors. Li et al. concluded that after matching, the effect of these factors was well reflected. Moreover, they found that without matching, the mortality rate can increase in older male patients [5]. We observed a similar pattern for male and older patients. The higher risk of death observed in men could be attributed to lifestyle factors such as higher rates of smoking and alcohol consumption.
The study conducted by Kim et al. demonstrated an increase in the mortality rate from COVID-19 [6]. Before matching, the difference in mortality rate was not significant. According to our observations in Table 5, females exhibited a significantly lower risk of death than males. These findings align with the previous research conducted by Her et al. in South Korea [7].
Elze et al. argued that using moderating covariates and matching through the PS approach yielded better and more accurate results [10]. Our findings align with these studies, further supporting the notion that matching by PS and considering moderating covariates can enhance the accuracy of mortality rate analysis.
Our findings revealed that the SpO2 levels of deceased patients were significantly lower than those of discharged patients. This difference may reflect the severity of COVID-19 infection: patients with more severe symptoms and complications tend to have lower SpO2 readings due to impaired lung function caused by inflammation or fluid accumulation.
Additionally, the risk of death from the disease was found to be higher among patients hospitalized in the ICU compared to those who were not. The higher mortality may be because patients requiring ICU hospitalization are often those with more severe symptoms and complications related to COVID-19. These individuals may have compromised respiratory function, organ failure, or other critical conditions that increase their chance of mortality. These results are consistent with a study by Martinez-Martinez et al. [
11].
In this study, after matching risk factors by applying the PS approach, the effect of PCR+ increased significantly. This finding agrees with findings in other studies [
8]. One advantage of this study is its use of the PSM approach, which has rarely been used in previous studies. This method correctly models the nature of the relationship between the PS and the outcome.
However, the PSM approach yields an adjusted estimate. Studies also show that methods such as G-estimation can, in principle, be adopted to assess the causal association of the risk factors.
5. Conclusion
The PSM estimation method showed a high risk of death in patients with a PCR+ test. Specifically, with the PSM estimation approach, which accounts for the measured confounding factors, the risk of death in patients with PCR+ increased by 72%, whereas the multiple logistic regression model indicated an increase of 46%. This discrepancy might be due to better control of the impact of the measured risk factors under PSM. Therefore, the PSM approach is more effective in controlling the impact of confounding factors.
Study limitations
The study encountered certain limitations. First, the hospitals under the supervision of Golestan University of Medical Sciences differed in the quality of their services. Second, it was not possible to adjust precisely for the exact time of COVID-19 infection. These factors may limit the interpretation of our results.
Ethical Considerations
Compliance with ethical guidelines
This work was approved by the Ethics Committee of Medical Research in Iran (Code: IR.GOUMS.REC.1401.185). Although the proposal has been approved by the Biomedical Research Ethics Committee, meeting the professional and legal requirements is the sole responsibility of the PI and other project collaborators.
Funding
The paper was extracted from the master's thesis of the first author, supervised by the second author, at the Department of Statistics, Faculty of Sciences, Golestan University, Gorgan, Iran.
Authors' contributions
Conceptualization: Hassan Khorsha; Initial idea, methodology and writing original draft: Manoochehr Babanezhad; Editing: Naser Behnampour; Investigation: All authors.
Conflict of interest
The authors declared no conflict of interest.
Acknowledgments
The authors would like to thank the Golestan University authorities.
References