Introduction
Lung cancer (LC) is a major global health challenge because of its high prevalence and mortality, placing a substantial burden on health care systems worldwide; approximately 2.48 million new cases occur globally, and the 5-year prevalence in Iran is 4.21 per 100,000 population [1-3]. Rapid progression, late-stage diagnosis, and limited opportunities for effective treatment often result in poor survival for LC patients [4]. Therefore, understanding the factors that influence LC survival is crucial for improving patient outcomes and informing clinical decision-making [5].
Previous research has explored various factors influencing LC survival, highlighting the roles of tumor stage, histology, and genetic mutations in prognosis. However, the literature remains conflicting, particularly regarding demographic and tumor-related factors [6]. For example, while some studies suggest that gender and age significantly influence survival (the latter possibly because older patients have weaker immune systems than younger adults) [5, 7, 8], others report inconsistent or non-significant associations [5, 9, 10]. Previous studies indicate that biological and behavioral differences (such as smoking profiles) may contribute substantially to differences in survival between male and female LC patients [11, 12]. However, a few recent studies have found no difference in LC survival by gender, which may reflect changes over time in the distribution of behavioral factors such as smoking among males and females, or differences in life expectancy [9, 13-16]. Similarly, the impact of histology on survival remains debated: some studies indicate better outcomes for squamous cell carcinoma (SCC), whereas others emphasize the role of adenocarcinoma (AC) and its molecular subtypes [7, 12, 17]. These contradictions highlight the complexity of LC and the need for more nuanced analyses to clarify the roles of these factors.
Several studies in Iran have analyzed LC survival data in different provinces. Abedi et al. [18] investigated patients with non-small cell LC and small cell LC in Sari, Iran, using several survival analysis methods, and found that demographic factors, gender, and cancer type significantly affected survival, with non-small cell LC patients having notably higher survival rates. Zahir et al. [19] in Yazd, Iran, reported that SCC was associated with higher survival rates, although this relationship was not statistically significant. Babanejhad et al. [20] in Tehran, Iran, identified age at diagnosis, tumor type, and brain metastasis as factors influencing survival. However, most of these studies were based on small samples of LC patients, and to the best of our knowledge, no study has analyzed the most recent data on LC survival and its determinants in Mazandaran Province, Iran.
LC survival rates vary significantly by region due to a combination of factors, including genetics, ethnicity, behavior (e.g. smoking) [21-23], occupation, environmental conditions (e.g. air quality) [24-26], access to healthcare, and recent advances in treatment options [27]. These developments, combined with the unique demographic and environmental characteristics of specific regions, underscore the need to study the most recent LC patient data within distinct populations. In this context, investigating LC survival among the northern Iranian population, particularly in Mazandaran Province, is crucial.
Furthermore, many studies have used classical statistical analyses to identify variables associated with the survival of LC patients. However, given the complex and multifaceted nature of this disease, classical analyses alone may be insufficient to model the intricate relationships between variables and to predict survival patterns accurately. Some studies have therefore applied more complex statistical and machine learning methods. For instance, Zhang et al. [28] developed a model based on ridge and lasso regression, Chaudhry et al. [29] demonstrated that conditional survival analysis better captures dynamic survival patterns, and Nguyen et al. [30] showed that the Cox model is an effective tool for identifying factors that influence survival. These studies highlight the importance of demographic and clinical variables in predicting survival and the need for advanced models that allow more precise survival analyses. Reliance on classical methods that cannot adequately capture complex relationships among variables has thus left a significant gap in understanding the factors that affect survival. A comprehensive analysis comparing several survival models is therefore needed to address the limitations of previous studies and to identify the best-fitting model.
Given these gaps in the literature, the present study aimed to analyze up-to-date, comprehensive data from LC patients in Mazandaran Province using a diverse set of more sophisticated survival models to identify factors influencing survival and cancer-specific survival patterns.
Materials and Methods
This prospective cohort study included 708 LC patients residing in urban and rural areas of Mazandaran Province, Iran. Data were obtained from the Cancer Registry Center of Mazandaran University of Medical Sciences (MazUMS) using census-based sampling, in which all LC patients diagnosed between 2017 and 2019 were enrolled. Patients were followed up by telephone until February 2023, and the data were analyzed in November 2024 (at that time, only records up to 2019 had been prepared for analysis).
Inclusion and exclusion criteria
Patients with a confirmed diagnosis of LC who resided in Mazandaran Province at the time of diagnosis, were registered at the MazUMS Cancer Registry Center, and whose complete medical records were accessible to the researchers were enrolled. Patients with incomplete data or non-LC diagnoses, those transferred to other centers, those who died of causes unrelated to LC, and those unwilling to continue participation were excluded from the study. Patients with more than 10% missing data in key variables (age, gender, tumor type, or survival status) were excluded from the analysis. For patients with a missing value in only one variable, listwise deletion was applied in the relevant analysis; no imputation was used.
Study variables
The study variables included gender (male/female), age group (≤50 years, 51–60 years, 61–70 years, and >70 years), residential location (urban/rural), lung tumor type (non-small cell LC/small cell LC), and histological grade (level of differentiation).
Statistical analyses
Survival time was defined as the interval between the date of LC diagnosis (2017–2019, corresponding to 1396–1398 in the Iranian calendar) and the date of death or the last follow-up (February 2023).
Both non-parametric and semi-parametric approaches were applied, including Kaplan–Meier survival curves, log-rank tests, and univariate and multivariate Cox proportional hazards (PH) regression models. The PH assumption was evaluated using Schoenfeld residuals and log-minus-log survival plots. The results showed that the PH assumption was violated for tumor type (P<0.001), while no violations were observed for other covariates. To account for this, a Cox regression model with time-varying covariates was also fitted. This model revealed a time-dependent effect of tumor type; however, due to the very small standard errors and unstable estimates, its results were interpreted with caution.
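As a minimal illustration of the log-rank comparison described above, the sketch below implements the standard two-group log-rank test in plain Python: at each distinct death time it accumulates observed minus expected deaths in one group together with the hypergeometric variance, then refers the statistic to a chi-square distribution with 1 degree of freedom. The function name, the 0/1 group coding, and the toy data are hypothetical; the study itself used Stata and R, not this code.

```python
import math


def logrank_test(times, events, groups):
    """Two-group log-rank test.

    times:  follow-up time for each patient (e.g. days from diagnosis)
    events: 1 if death was observed, 0 if censored at last follow-up
    groups: 0 or 1 (hypothetical coding, e.g. 0 = non-small cell, 1 = small cell)
    Returns (chi-square statistic with 1 df, two-sided P-value).
    """
    data = list(zip(times, events, groups))
    death_times = sorted({t for t, e, _ in data if e == 1})
    o_minus_e = 0.0  # observed minus expected deaths in group 1
    variance = 0.0   # sum of hypergeometric variances
    for t in death_times:
        at_risk = [g for ti, _, g in data if ti >= t]
        n = len(at_risk)
        n1 = sum(at_risk)  # group-1 patients still at risk at time t
        d = sum(1 for ti, e, _ in data if ti == t and e == 1)
        d1 = sum(1 for ti, e, g in data if ti == t and e == 1 and g == 1)
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("zero variance: the groups never overlap in the risk set")
    chi2 = o_minus_e ** 2 / variance
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical toy data (not the study data)
times = [30, 90, 150, 200, 400, 500, 700, 900]
events = [1, 1, 1, 0, 1, 1, 0, 1]
groups = [1, 1, 1, 1, 0, 0, 0, 0]
chi2, p = logrank_test(times, events, groups)
print(f"log-rank chi2 = {chi2:.3f}, P = {p:.4f}")
```

In practice the same test is available as `survdiff()` in R's survival package or `sts test` in Stata; swapping the group labels leaves the statistic unchanged.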
To obtain more robust estimates, we further applied a range of parametric survival models, including exponential, Weibull, log-normal, log-logistic, Gompertz, and generalized gamma distributions, each with and without frailty terms, under the PH and/or accelerated failure time (AFT) frameworks as permitted by each distribution. Frailty was modeled using a gamma distribution to capture unobserved heterogeneity across patients (e.g. comorbidities, genetic predisposition, or treatment-related differences). Model fit was compared using the Akaike information criterion (AIC), Bayesian information criterion (BIC), and log-likelihood values. Among all candidates, the log-logistic AFT frailty model provided the best fit and the most stable estimates; therefore, our main interpretations were based on this model.
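To show what the selected model assumes, the sketch below writes out a log-logistic AFT survival function and the marginal survival obtained after integrating out an individual-level gamma frailty with mean 1 and variance θ. The parametrization, S(t|x) = 1/[1 + (exp(−x′β)·t)^(1/γ)] and S_θ(t) = [1 − θ·ln S(t|x)]^(−1/θ), is assumed to match the form documented for Stata's streg; apart from θ = 1.26 (taken from the Results), every parameter value below is hypothetical.

```python
import math


def loglogistic_survival(t, linear_predictor, gamma):
    """Conditional log-logistic AFT survival S(t | x).

    linear_predictor: x'beta; increasing it stretches survival time by exp(x'beta)
    gamma: ancillary parameter; 1/gamma is the log-logistic shape
    """
    scaled_time = math.exp(-linear_predictor) * t
    return 1.0 / (1.0 + scaled_time ** (1.0 / gamma))


def gamma_frailty_survival(s_conditional, theta):
    """Marginal survival after integrating out gamma frailty (mean 1, variance theta)."""
    return (1.0 - theta * math.log(s_conditional)) ** (-1.0 / theta)


# Hypothetical baseline values; only theta = 1.26 comes from the Results section.
beta0, gamma_, theta = 6.0, 0.7, 1.26
for t_days in (180, 365, 730):
    s_cond = loglogistic_survival(t_days, beta0, gamma_)
    s_marg = gamma_frailty_survival(s_cond, theta)
    print(f"t={t_days:>3} d  conditional S={s_cond:.3f}  marginal S={s_marg:.3f}")

# AFT property: adding b to x'beta is the same as rescaling time by exp(-b)
b = 0.9
print(loglogistic_survival(500, beta0 + b, gamma_),
      loglogistic_survival(500 * math.exp(-b), beta0, gamma_))
```

The median of the conditional distribution is exp(x′β), which is why AFT coefficients act on survival time rather than on the hazard. When 1/γ > 1, the log-logistic hazard first rises and then falls, a non-monotone shape that a PH Weibull model cannot represent.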
Cox regression, parametric survival, and frailty models were fitted using Stata software, version 17 (StataCorp LLC, College Station, TX, USA). R software, version 4.5.0 (R Foundation for Statistical Computing, Vienna, Austria) was used primarily to generate Kaplan–Meier curves, log-minus-log plots, and other visualizations, using the “survival” and “survminer” packages. The gamma frailty term was intended to capture latent patient-level factors that were not available in the registry data.
Results
Patient characteristics
Of the 708 LC patients, 432 (61.02%) died during the follow-up period. The study included patients whose diagnoses were registered in the MazUMS Cancer Registry between 2017 and 2019 (corresponding to 1396–1398 in the Iranian calendar); these patients were followed up until February 2023, so the maximum potential follow-up was approximately 6 years. Follow-up time varied across individuals, with a median of 489 days (interquartile range [IQR]: 121.5–1118.75). The mean age of LC patients was 64±12.42 years. Regarding histologic grade, 31 (4.38%), 24 (3.39%), and 53 (7.49%) patients had well-differentiated, moderately differentiated, and poorly differentiated tumors, respectively. The majority of tumors were non-small cell LC (58.33%) (Table 1).
Overall survival
The mean survival time was 612.49±500.8 days (median: 489 days; IQR: 121.5–1118.75; mode: 5 days; minimum: 1 day; maximum: 1453 days). Table 2 presents the survival probabilities at various time points.
The survival probability was 0.69 (95% CI: 0.66–0.73) at 6 months, 0.54 (95% CI: 0.51–0.585) at 1 year, 0.44 (95% CI: 0.41–0.48) at 2 years, and 0.39 (95% CI: 0.36–0.43) at 3 years (Figure 1).
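Survival probabilities at fixed time points, such as those above, are read off the Kaplan–Meier step function. The minimal sketch below computes the product-limit estimate and evaluates it at 6 months and at 1, 2, and 3 years. The toy data are hypothetical, time is assumed to be in days, and 6 months is approximated as 183 days (also an assumption); the confidence intervals reported in the study are omitted.

```python
def kaplan_meier(times, events):
    """Product-limit estimate; returns [(death_time, S(death_time)), ...]."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Patients who die or are censored at t are all still at risk at t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed
    return curve


def survival_at(curve, t):
    """Value of the right-continuous step function S(t)."""
    s = 1.0
    for death_time, prob in curve:
        if death_time > t:
            break
        s = prob
    return s


# Hypothetical toy data: days from diagnosis, 1 = death, 0 = censored
times = [5, 60, 120, 200, 200, 365, 480, 700, 900, 1100, 1300, 1453]
events = [1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0]
curve = kaplan_meier(times, events)
for label, day in (("6 months", 183), ("1 year", 365), ("2 years", 730), ("3 years", 1095)):
    print(f"{label:>8}: S = {survival_at(curve, day):.2f}")
```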
Univariate Cox regression, log-rank, and PH test
Table 3 demonstrates the association of demographic, clinical, and tumor-related variables with patient survival and includes the hazard ratio (HR) with 95% CI, the univariate Cox regression results, log-rank test results, and PH test results for each variable.
Our analysis showed no significant difference in survival between males and females or between patients living in urban and rural areas. Likewise, none of the age groups was significantly associated with survival (all P>0.05) (Figure 2).
For patients with small cell LC, the HR was 0.93 (95% CI: 0.76–1.12), with a univariate Cox regression P-value of 0.432 and a log-rank P-value of 0.429, indicating no significant difference in survival between small cell LC and non-small cell LC patients. However, the P-value for the PH test was 0.0009, indicating that the PH assumption was violated for tumor type (P<0.001). Consistent with this, the survival curves of the two tumor type subgroups (small cell LC and non-small cell LC) cross in Figure 2. The PH assumption was evaluated using both Schoenfeld residuals and log-minus-log survival plots: the log-minus-log survival curves for tumor type were not parallel (Figure 3), and the Schoenfeld residuals test indicated a significant violation for tumor type (P<0.001; Table 3). No other variables violated the PH assumption. Given this violation and the lack of significant associations for all variables, a multivariate Cox regression model with a time-varying effect of tumor type (allowing its effect to change over time) was expected to provide more accurate results.
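The log-minus-log check used above rests on a simple identity: under proportional hazards, S1(t) = S0(t)^HR, so ln[−ln S1(t)] − ln[−ln S0(t)] = ln HR at every t, and the two curves are parallel. The sketch below computes that vertical gap at shared time points for hypothetical survival values; a gap that drifts or changes sign (crossing curves), as described for tumor type, points to non-proportional hazards. Function names and inputs are illustrative assumptions, and this visual check complements rather than replaces the Schoenfeld residual test.

```python
import math


def log_minus_log(s):
    """ln(-ln S(t)); defined only for 0 < S(t) < 1."""
    if not 0.0 < s < 1.0:
        raise ValueError("survival probability must be strictly between 0 and 1")
    return math.log(-math.log(s))


def lml_gaps(surv_reference, surv_comparison):
    """Vertical gap between two log-minus-log curves at shared time points."""
    return [log_minus_log(b) - log_minus_log(a)
            for a, b in zip(surv_reference, surv_comparison)]


def curves_cross(gaps):
    """True if the gap changes sign, i.e. the log-minus-log curves intersect."""
    return any(g1 * g2 < 0 for g1, g2 in zip(gaps, gaps[1:]))


# Hypothetical reference-group survival at four shared time points
s_ref = [0.80, 0.60, 0.45, 0.35]

# Proportional hazards with HR = 1.5 implies S_comp(t) = S_ref(t) ** 1.5,
# so every gap should equal ln(1.5) up to floating-point error
s_ph = [s ** 1.5 for s in s_ref]
print(lml_gaps(s_ref, s_ph), math.log(1.5))

# Hypothetical non-proportional pattern: worse early, better later
s_nonph = [0.70, 0.55, 0.47, 0.40]
gaps = lml_gaps(s_ref, s_nonph)
print(gaps, curves_cross(gaps))
```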
Time-varying multivariate Cox regression analysis
The total number of patients was 708, of whom 432 died during the follow-up period (total time at risk: 433,640 days). The likelihood-ratio (LR) chi-square test showed that the Cox model was statistically significant overall (LR chi2(7)=47.22, P<0.001). As shown in Table 4, males had a higher but non-significant hazard compared with females (HR: 1.14; 95% CI: 0.91–1.42; P=0.246). Similarly, no significant associations were observed for age group or residential status.
In contrast, tumor type emerged as a strong and statistically significant predictor of mortality. Patients diagnosed with small cell LC had a substantially higher risk of death than those with non-small cell LC (HR=1.69; 95% CI: 1.28–2.23; P<0.001). However, the log-minus-log survival curves (Figure 3) were clearly non-parallel, indicating a violation of the PH assumption for this variable. To account for this non-proportionality, the time-varying Cox model included an interaction term between tumor type and time. This interaction was statistically significant (HR=0.998; 95% CI: 0.997–0.998; P<0.001), suggesting a modest but consistent decline over time in the relative mortality risk associated with small cell LC. Clinically, this trend aligns with the known biological behavior of small cell LC, an aggressive malignancy with high early mortality followed by a reduction in risk among long-term survivors, whereas non-small cell LC typically shows a more stable risk profile over time. Overall, incorporating time-dependent effects into the multivariate Cox regression improved the temporal resolution of risk estimation and underscores the need for flexible modeling approaches in survival analysis, particularly for rapidly progressing diseases such as small cell LC.
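To see what the interaction estimate implies, the sketch below assumes that the time-varying model has the common form h(t | x) = h0(t)·exp(βx + γ·x·t) with t measured in days, so that HR(t) = exp(β)·exp(γ)^t = 1.69 × 0.998^t and the HR equals 1 at t* = −β/γ. Both the functional form and the time unit are assumptions, because the Methods do not state them. Under this reading the implied HR falls below 1 within the first year. Because 0.998 is rounded to three decimals, the crossover day is sensitive to rounding: 0.9985 would move it from about 262 to about 350 days.

```python
import math

# Rounded estimates from the Results; the per-day time unit is an assumption.
HR_AT_DIAGNOSIS = 1.69     # exp(beta): small cell vs non-small cell LC at t = 0
HR_CHANGE_PER_DAY = 0.998  # exp(gamma): multiplicative change in the HR per day


def hazard_ratio_at(t_days, hr0=HR_AT_DIAGNOSIS, per_day=HR_CHANGE_PER_DAY):
    """HR(t) = exp(beta + gamma * t) = hr0 * per_day ** t."""
    return hr0 * per_day ** t_days


def crossover_day(hr0=HR_AT_DIAGNOSIS, per_day=HR_CHANGE_PER_DAY):
    """Day t* at which HR(t*) = 1, i.e. t* = -beta / gamma."""
    return math.log(hr0) / -math.log(per_day)


for t in (0, 90, 180, 365):
    print(f"day {t:>3}: HR = {hazard_ratio_at(t):.2f}")
print(f"implied HR reaches 1 at about day {crossover_day():.0f}")
print(f"with per-day HR 0.9985: about day {crossover_day(per_day=0.9985):.0f}")
```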
Parametric survival models
According to Table 5, the log-logistic AFT model with a gamma frailty distribution yielded the lowest AIC (2418.36) and BIC (2459.43) and the highest log-likelihood (−1200.181) among all parametric survival models tested. These metrics indicate the best goodness-of-fit and suggest that this model most appropriately represents the observed survival patterns in LC patients.
Table 6 summarizes the estimated hazard ratios and corresponding statistics from the log-logistic AFT model with gamma frailty. The overall model was statistically significant (likelihood-ratio chi-square=28.21, P<0.001). No significant associations with survival were detected for gender (HR=1.27; 95% CI: 0.82–1.98; P=0.280), residence (HR=1.1; 95% CI: 0.75–1.62; P=0.626), or the 51–60 (HR=0.9; 95% CI: 0.48–1.7; P=0.754) and 61–70 (HR=1; 95% CI: 0.55–1.82; P=0.993) age groups. However, patients aged over 70 years had a higher risk of death that approached statistical significance (HR=1.71; 95% CI: 0.93–3.15; P=0.086). Tumor histology emerged as a strong and statistically significant predictor of survival: patients with small cell LC had a markedly higher risk of death than those with non-small cell LC (HR=2.63; 95% CI: 1.72–4.04; P<0.0001). The frailty variance was θ=1.26 (95% CI: 0.87–1.82), and the likelihood-ratio test of θ=0 was statistically significant (P<0.001), confirming substantial unobserved heterogeneity in survival risk. This finding underscores the importance of accounting for latent individual-level variability when modeling survival outcomes in LC populations.
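The reported fit statistics can be checked against the usual definitions AIC = 2k − 2 ln L and BIC = k·ln(n) − 2 ln L. The number of estimated parameters is not reported, so the sketch assumes k = 9: a constant, six covariate coefficients (gender, residence, three age-group indicators, and tumor type), the log-logistic shape parameter, and θ. It also assumes that BIC was computed with n = 708 observations. Under these assumptions, the computed values agree with Table 5 to within rounding of the reported log-likelihood.

```python
import math


def aic(log_likelihood, k):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood, k, n):
    """Bayesian information criterion: k ln(n) - 2 ln L."""
    return k * math.log(n) - 2 * log_likelihood


LOG_LIKELIHOOD = -1200.181  # reported for the log-logistic AFT gamma-frailty model
K_PARAMS = 9                # assumption: inferred, not reported
N_OBS = 708                 # assumption: BIC computed with all 708 patients

print(f"AIC = {aic(LOG_LIKELIHOOD, K_PARAMS):.2f}")
print(f"BIC = {bic(LOG_LIKELIHOOD, K_PARAMS, N_OBS):.2f}")
```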
Discussion
This study investigated survival outcomes in a cohort of 708 LC patients from Mazandaran Province, Iran, evaluating the prognostic value of demographic, clinical, and tumor-related variables. In the multivariate analyses (the time-varying Cox model and the log-logistic AFT frailty model), tumor histology showed a strong and statistically significant association with survival. It was not significant in the univariate Cox regression or the log-rank test, likely because its effect was not proportional over time (Table 3). In contrast, demographic factors such as sex, residential status, and age group were not significant.
The time-varying Cox model was applied to illustrate the non-proportional effect of tumor type over time. However, this model generated very small standard errors and narrow confidence intervals, indicating instability in the estimates. Therefore, these results were interpreted with caution and mainly used to demonstrate the violation of the PH assumption rather than as the basis for conclusions. To overcome these limitations, we emphasized the results of the log-logistic AFT frailty model, which provided more robust and reliable estimates without relying on the PH assumption. The frailty component also captured unobserved heterogeneity among patients, further improving the validity of the findings.
In the present study, the 6-month, 1-year, 2-year, and 3-year survival rates of the 708 LC patients were 69%, 54%, 44%, and 39%, respectively. Studies conducted in other provinces of Iran have reported considerably different survival rates. In a study by Bahari et al. [31] of 73 Kurdish Iranian LC patients, the 1-year, 2-year, and beyond-2-year survival rates were 27%, 22%, and 16%, respectively. Abedi et al. [18] reported 1-, 2-, and 3-year survival rates of 57%, 34%, and 29%, respectively, among 102 LC patients in Sari. Babanejhad et al. [20] assessed 259 LC patients in Tehran (the capital of Iran) and estimated 1-, 2-, and 3-year survival rates of 63%, 53%, and 46%, respectively. Abazari et al. [9] studied 355 LC patients in Western Azerbaijan and found 1-, 2-, and 3-year survival rates of 39%, 18%, and 0.07%, respectively. Zahir et al. [19] reported that only 25% of 148 LC patients in Yazd survived longer than 1 year after diagnosis.
These disparities may reflect differences in health development indices across Iranian provinces. Nemati et al. [32] found a geographical disparity of less than 15% in survival from lethal malignancies (such as LC) in Iran, with the lowest chance of survival in Western Azerbaijan. Iran exhibits significant regional inequalities in health care access, infrastructure, and socioeconomic development. More economically developed provinces such as Mazandaran and Tehran may have better health care facilities, earlier diagnosis, and more advanced treatment options than less developed regions such as Western Azerbaijan and Yazd. Such variation is not limited to Iran: Janssen-Heijnen et al. [33] reported differences in survival rates even among economically developed European countries. Public awareness of cancer symptoms and of the importance of early medical consultation can also vary across regions; in more developed provinces, higher literacy rates may encourage individuals to seek medical attention earlier, leading to better survival outcomes.
These discrepancies in survival rates may also reflect when the studies were conducted [34, 35]. For instance, Lu et al. [36] found that LC survival increased over time. Similarly, Howlader et al. [27] found that survival increased from 26% among patients diagnosed in 2001 to 35% among those diagnosed in 2014. Older studies may have relied on less advanced diagnostic tools, such as chest x-rays or basic CT scans, which are less sensitive for detecting early-stage lung cancer; this could result in a higher proportion of patients being diagnosed at advanced stages and therefore in lower reported survival rates. Studies conducted before targeted therapies (e.g. EGFR and ALK inhibitors) and immunotherapy (e.g. PD-1/PD-L1 inhibitors) became widely available may also report lower survival rates, because treatment was limited to chemotherapy and conventional radiotherapy, which are less effective for certain subtypes of lung cancer. In contrast, more recent studies may have benefited from advanced imaging techniques, such as high-resolution CT, and from molecular diagnostics, and are likely to reflect the benefits of targeted therapies, which have improved outcomes for patients with specific genetic mutations.
In the present study, the mean age of patients was 64±12.42 years (median: 65 years). Other studies conducted in Iran reported similar mean ages: 63.5±13.5 years in Abazari et al.’s study [9], 63±12.8 years in Zahir et al.’s study [19], 62.9±12.8 years in Abedi et al.’s study [18], 59.49±11.12 years in Bahari et al.’s study [31], and 62.86±12.46 years in Babanejhad et al.’s study [20]. In contrast, LC patients in international studies were older. In Agarwal et al.’s study in the United States [37], the median age was 69 years, and only 18% of patients were younger than 60, whereas 37.5% of the LC patients in the present study were younger than 60. The younger age of LC patients in Iranian studies compared with international studies should serve as a warning to Iranian health care policymakers. Future studies should assess trends in LC among the Iranian population.
Our findings indicated that place of residence had no statistically significant effect on survival among LC patients. In contrast, studies by Atkins et al. [38], Logan et al. [39], and Pozet et al. [40] identified rural residence as a factor associated with lower survival. This discrepancy may be attributable to geographical and infrastructural differences between rural areas in the United States and those in Mazandaran Province. Unlike in the United States, rural and urban areas in Mazandaran are often close together, separated by only a few minutes by car, and both populations have similar socioeconomic status [39, 41]. As a result, access to health care facilities and treatment options does not differ substantially between rural and urban populations in this region, which may explain the lack of a significant survival disparity. To further explore the relationship between residence and survival, similar studies could be conducted in other regions of Iran where rural and urban areas are more geographically and socioeconomically dispersed.
Tumor type was identified as a significant predictor of survival. Patients with small cell LC faced a higher risk of mortality compared to those with non-small cell LC (HR: 1.693; P<0.001). This finding is consistent with studies such as Teixidor-Vilà et al. [42], Zhang et al. [28], Abedi et al. [18], and Francisci et al. [43], which have confirmed the more aggressive nature of small cell LC.
Notably, the time-varying Cox regression analysis revealed that the prognostic impact of tumor type, particularly for small cell LC, decreases gradually over time. This dynamic risk pattern, rarely addressed in prior survival studies, suggests that the aggressiveness of small cell LC is most pronounced in the early months after diagnosis. Complementing this finding, our parametric survival analysis based on the log-logistic AFT model with a gamma frailty component confirmed that small cell LC was the most powerful predictor of poor prognosis. Specifically, small cell LC was associated with substantially reduced survival time (HR: 2.63; 95% CI: 1.72–4.04; P<0.0001), even after adjusting for unobserved heterogeneity among patients. These results reinforce the biologically aggressive nature of small cell LC and underscore the importance of early intervention strategies tailored to this high-risk subgroup.
Conclusion
This study, based on a comprehensive survival analysis of 708 patients with LC in Mazandaran Province, Iran, identified key prognostic factors influencing patient outcomes. The overall survival rates were 69% at 6 months, 54% at 1 year, 44% at 2 years, and 39% at 3 years. Among all variables assessed, tumor type was the strongest predictor of survival. Consequently, clinical protocols should prioritize early, aggressive intervention for patients with small cell LC, and follow-up strategies should be risk-adaptive, with intensified monitoring during the initial high-risk period. Future research should validate these dynamic patterns in larger cohorts to refine personalized care pathways and improve patient outcomes.
Study limitations
This study has several limitations. The single-center design may limit generalizability. Missing data were handled by listwise deletion, potentially introducing bias. The time-varying Cox model produced unstable estimates with narrow confidence intervals. Unmeasured confounders, such as smoking status or comorbidities, could influence survival outcomes despite frailty adjustment.
Ethical Considerations
Compliance with ethical guidelines
This study was approved by the Ethics Committee of Mazandaran University of Medical Sciences, Sari, Iran (Code: IR.MAZUMS.IMAMHOSPITAL.REC.1396.10), and written informed consent was obtained from all participants.
Funding
This research did not receive any grant from funding agencies in the public, commercial, or non-profit sectors.
Authors' contributions
Conceptualization: Mahmood Moosazadeh, Erfan Ghadirzadeh, Ghasem Janbabaei, and Ali Asghar Nadi Ghara; Data curation: Mobina Gheibi, Ramin Shekarriz, Mej, and Ehsan Zaboli; Formal analysis: Mahmood Moosazadeh and Ali Asghar Nadi Ghara; Methodology: Akbar Hedayatizadeh Omran, Mahmood Moosazadeh, Ali Asghar Nadi Ghara, and Erfan Ghadirzadeh; Project administration: Erfan Ghadirzadeh, Ali Asghar Nadi Ghara, and Mahmood Moosazadeh; Resources: Mahmood Moosazadeh, Ghasem Janbabaei, Akbar Hedayatizadeh Omran, Ramin Shekarriz, Ehsan Zaboli, and Mohammad Eslami Jouybari; Software: Mahmood Moosazadeh and Ali Asghar Nadi Ghara; Supervision: Mahmood Moosazadeh and Ghasem Janbabaei; Validation: Erfan Ghadirzadeh and Mobina Gheibi; Visualization: Erfan Ghadirzadeh, Ali Asghar Nadi Ghara, and Mobina Gheibi; Writing the original draft: Erfan Ghadirzadeh and Mobina Gheibi; Review and editing: All authors.
Conflict of interest
The authors declared no conflict of interest.
References