Ethics code: IR.USWR.REC.1395.7
Clinical trials code: 0

Lotfalinezhad E, Barati F, Sahaf R, Shati M, Abolfathi Momtaz Y, Forughan M, et al. Psychometric Properties of the Persian Kessler Psychological Distress Scale Among Iranian Older Adults. Iran J Health Sci 2023; 11(1):21-28
Department of Public Health, School of Health, Mazandaran University of Medical Sciences, Sari, Iran, shahabpapi@yahoo.com
1. Introduction
In many countries, the population is aging rapidly [1, 2]; that is, the number of older adults (≥60 years of age) is expected to increase in the coming years. This demographic shift, if the projections are accurate, means that the burden from the somatic, mental, and social changes that occur with aging will increase as well [3]. These changes may include or lead to the development of psychological distress (PD) [4]. PD is generally defined as “emotional suffering characterized by symptoms of depression and anxiety that may be tied with somatic symptoms”. Although PD is a general term (classification F43.0) [5], it is an important predictor of healthy aging and an important indicator of health status among older adults [4]. For instance, greater PD is associated with unhealthy diet and activity patterns, sedentary behaviors, smoking, increased risk of cardiovascular diseases, problems in performing basic activities of daily living, poor well-being, and poor social support [6].
However, like other mental health issues, PD may easily go undetected for several reasons, including the lack of valid detection tools [7]. This is unfortunate because PD is treatable, and its timely detection and appropriate management may not only shorten the duration of suffering for older adults but may also improve their quality of life (QoL) and reduce social impairment in the long term [8]. Of the many detection tools available, the Kessler Psychological Distress Scale is one of the most widely used for detecting PD [9]. Although it is not a diagnostic instrument, it has been used across a variety of mental illnesses as well as clinical and cultural contexts [10, 11]. Also, unlike other instruments [12], the Kessler questionnaire is short, takes less time, and is as useful as lengthy instruments and interviews [13]. The Kessler questionnaire has been used in world mental health surveys as well [9]. Moreover, it is often recommended as a standardized measure when fuller assessments of the presence and severity of PD are not possible [9]. The cross-cultural adaptation of detection tools such as the Kessler questionnaire is also important, since PD is related to cultural factors, such as social change, family stability, and characteristics that belong to communities rather than individuals [14].
With this in mind, we conducted a study whose primary objective was to assess the psychometric properties of a Persian Kessler Psychological Distress Scale among older adults in Gorgan (Golestan). With this study, we expect to fill the gaps regarding the prospects of timely detection and appropriate management of PD among older adults.

2. Materials and Methods
For validation studies, there is no single a priori guideline on how to compute the sample size. The recommendations vary from two to 20 participants per questionnaire item [15], with an absolute minimum of 100 [16]. Others rate sample sizes as follows: 0-100 (poor), 101-200 (fair), 201-300 (good), 301-500 (very good), and 501+ (excellent). Moreover, for studies with factor analysis, the recommended sample size varies from three to ten subjects per questionnaire item [16].
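The rules of thumb above can be turned into a small calculation. The following is a minimal sketch, not the authors' procedure: the function names are hypothetical, and the item count of 10 used in the usage note is an assumption, since the text above does not state how many items the questionnaire has.

```python
def sample_size_range(n_items, min_per_item, max_per_item, absolute_minimum=100):
    """Return (low, high) sample sizes implied by a subjects-per-item rule.

    Both ends are raised to `absolute_minimum` when the per-item rule
    alone would give a smaller number.
    """
    low = max(n_items * min_per_item, absolute_minimum)
    high = max(n_items * max_per_item, absolute_minimum)
    return low, high


def adequacy_label(n):
    """Map a sample size onto the bands quoted in the text."""
    if n <= 100:
        return "poor"
    if n <= 200:
        return "fair"
    if n <= 300:
        return "good"
    if n <= 500:
        return "very good"
    return "excellent"
```

Under the assumed 10 items, `sample_size_range(10, 2, 20)` gives `(100, 200)`, and `adequacy_label(200)` returns `"fair"`.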
The participants of our study were recruited from the three most relevant sampling units nested in the general population of Gorgan district (Golestan): the main Park, the Elder Leisure Center, and the main Mosque. Throughout our country, the federal government has established one main park, one central mosque, and one elder leisure center in each district. To be recruited, subjects were required to be at least 60 years of age, male or female, a resident of the district for at least six months, and willing to participate independently. At all exit points of the sampling units simultaneously, after a random first contact, the field staff approached every third individual for inclusion. This process continued until the required number of participants was reached. Following the above recommendations, we aimed to recruit about 200 eligible subjects.

As recommended [17], a 10-expert panel of trained specialists (psychology, mental health, psychiatry, neurology, public health, geriatrics, and gerontology) with fluency in English was constituted. The English-language Kessler questionnaire was then translated into Persian, and the panel discussed the translation until there was complete consensus on the translated version. Next, the Persian version was back-translated into English by an external professional translator who was fluent in the Persian language and medical terminology. Both versions were then examined side-by-side by the same panel, which assessed the items for feasibility, readability, consistency of style and format, and clarity of language [18]. Panel responses were used to alter or modify the items. During this discussion, two questions, “in the past four weeks, how often did you feel restless or fidgety” and “in the past four weeks, how often did you feel that everything was an effort”, came up for re-translation, after which the final version was ready. Thereafter, preliminary pilot testing was conducted on a small sample of 20 older adults from the intended population [19], who were not part of the final sample. After completing the translated questionnaire, each respondent was asked to elaborate on what they thought each questionnaire item and its corresponding response meant. This approach allowed us to make sure that the translated questionnaire retained the same meaning as the original and that there was no confusion regarding the translated version.
For content validity, the content validity index (CVI) was measured for relevance, clarity, and brevity for each item and for the overall scale, using the ratings of each item. A CVI score higher than 0.75 was considered acceptable [20]. To account for chance inter-rater agreement, we also estimated the modified Kappa statistic [21, 22]. The receiver operating characteristic (ROC) of our questionnaire was measured vis-à-vis the hospital anxiety and depression scale (HADS). HADS was used because it is one of the tools recommended by the National Institute for Health and Care Excellence (NICE) for diagnosing depression and anxiety [23].
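As an illustration of the content validity calculation, the sketch below computes an item-level CVI and a chance-adjusted modified Kappa in the commonly used form, where the probability of chance agreement is taken from a binomial with p = 0.5. This is an assumption about the formula; the text does not spell out the exact computation, and the function names and the example counts are hypothetical.

```python
from math import comb


def item_cvi(n_agree, n_experts):
    """Item-level CVI: proportion of experts rating the item as acceptable."""
    return n_agree / n_experts


def modified_kappa(n_agree, n_experts):
    """Chance-adjusted agreement for one item.

    p_chance is the binomial probability (p = 0.5) that exactly
    `n_agree` of `n_experts` agree by chance.
    """
    i_cvi = item_cvi(n_agree, n_experts)
    p_chance = comb(n_experts, n_agree) * 0.5 ** n_experts
    return (i_cvi - p_chance) / (1 - p_chance)
```

For example, if 9 of 10 experts endorse an item (hypothetical counts), `item_cvi(9, 10)` is 0.9 and `modified_kappa(9, 10)` is approximately 0.899, so the adjustment for chance is small with a panel of this size.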
For factor analysis, we conducted both exploratory factor analysis (EFA) and confirmatory factor analysis (CFA), including fit indices, the standardized size of the residual, and the coefficient of determination. For EFA, we used the standard criteria of eigenvalue ≥1.0 and factor loading ≥0.40. The Kaiser-Meyer-Olkin (KMO) test was done to determine sampling adequacy. Bartlett’s test of sphericity was also performed. The reliability of our questionnaire was measured for internal consistency and temporal stability by estimating the alpha coefficient and the intraclass correlation (ICC), respectively. The test-retest was conducted with a two-week interval between administrations.
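For readers unfamiliar with the alpha coefficient, the following minimal sketch shows the standard Cronbach's alpha formula, alpha = k/(k-1) × (1 − Σ item variances / variance of total scores). It is not the authors' analysis code: the function name is hypothetical, and the use of the sample variance (n − 1 denominator) is an assumption.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents' item scores.

    `rows` is a list of equal-length lists: one inner list per
    respondent, one value per questionnaire item.
    """
    k = len(rows[0])                      # number of items
    items = list(zip(*rows))              # transpose: one tuple per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)
```

With the toy data `[[1, 2], [2, 4], [3, 3]]`, each item has variance 1 and the total scores have variance 3, so alpha is 2 × (1 − 2/3) ≈ 0.67. The function needs at least two respondents and non-constant total scores.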
All data were entered in MS Excel and systematically analyzed; Amos 24 was used for CFA. Ethics permission was obtained from the Institutional Review Board of the University of Social Welfare and Rehabilitation Sciences. All subjects were requested to provide written informed consent before participating.

3. Results
In total, 200 subjects were approached, of whom 190 (≥60 years of age) participated; their mean age was 70.3 (95% CI: 69.2-71.4) years. Of these subjects, 90 (47.3%) were male and 100 (52.7%) were female (P=0.2). A total of 35.8% had PD (i.e., score >20), 40 (21.0%) were illiterate, and 38 (20.0%) of the total sample were living alone. A total of 117 (61.6%) subjects reported, as a subjective observation, that they were satisfied with their lives. Regarding the validation of our questionnaire, which is the focus of this report, the overall alpha coefficient was 0.88, with no individual item having a coefficient below 0.86. The alpha coefficients for reporting yes to an item (0.85) and no to an item (0.90) were also similar. The ICC for test-retest at a two-week interval was 0.90, and the inter-rater agreement was 0.89 (95% CI 0.83-0.94). EFA showed a one-factor structure of the underlying construct (Figure 1), which explained a cumulative variance of 97.1%. The KMO test for sampling adequacy was 0.89 and Bartlett’s test of sphericity was significant. The ROC vis-à-vis HADS was lower than the expected value of 0.70. Further, CFA showed high fit indices (0.91) and a low standardized residual of 0.05 (Table 1 and Figure 1).

The goodness-of-fit (GOF) indices from CFA indicated that the model had a good fit (Table 2).

Table 3 shows that all questions had a significant relationship with the depression questionnaire.

4. Discussion
The methods of our study were adequately robust. For instance, the sample was made up of residents from general community-based sources nested in the population. Moreover, the majority of those who were approached agreed to participate. Earlier studies have evaluated PD in specific situations [24], whereas we addressed a broad, non-exceptional context. We focused on older adults, who are at high risk of poor mental health [25, 26]. Also, our sample had a balanced representation of males and females (P=0.2). Moreover, our sample size was in line with the general recommendations for validation studies. In addition, Persian is spoken by about 110 million people in eight countries, which implies a direct application of our work to a relatively broad audience. Besides, we used the Kessler questionnaire, which is well-recognized for detecting PD; its relevance to PD has already been discussed in the introduction.
We examined the reliability, stability, and various forms of validity, including face, content, concurrent, and construct validity. The Cronbach alpha and ICC values of our questionnaire were higher than the typically accepted value of 0.70 for questionnaires [27]. These results suggest that all items of our questionnaire measure the same underlying construct and that the application of these items is likely to remain stable over time. We had a delay of two weeks between the testing and re-testing of our questionnaire. A two-week delay should be acceptable because PD is a chronic condition that may not dissipate on its own over a short time [28]. A two-week delay also reduced the likelihood that respondents would simply answer from their memory of the items.
Face validity is often looked down upon as unreliable. However, it is simple and feasible and provides essential perspectives [29]. For instance, if a questionnaire does not look “good at face”, it cannot generate initial interest, motivation, and acceptance from respondents and practitioners. Further, we also estimated the content validity of our questionnaire based on the parameters of relevance, clarity, and brevity [30] (Table 1).

These content validity indices were higher than the typically expected value of 0.75 [21]. Moreover, the value of the inter-rater agreement was also reasonably high, which suggests that the content validity was unlikely to have been obtained by chance alone (Table 1). The legitimacy of content validity, like that of face validity, is often criticized, but this is possibly due to the lack of distinction between content and face validity [31].
We measured the ROC value of our questionnaire vis-à-vis HADS to detect whether it could distinguish between anxiety and depressive states. HADS has 14 items and is considered useful for both the diagnosis and the tracking of the symptomatic progression of anxiety and depressive states. It is also one of the tools recommended by the National Institute for Health and Care Excellence (NICE) for diagnosing depression and anxiety [23]. However, the ROC discrimination value was lower than the typical value of 0.70 expected from a questionnaire with diagnostic ability. These ROC results align with the results of EFA, which showed a single-factor underlying structure of our questionnaire (Figure 1). Overall, these results match reasonably well with the fact that the Kessler questionnaire was originally designed as a unidimensional scale, with all its items indicating a single underlying construct [32] to measure a non-specific PD [9]. All items, therefore, serve as indicators of a single underlying factor of “distress” [32].
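To make the 0.70 benchmark concrete, the sketch below computes the area under the ROC curve (AUC) as the probability that a randomly chosen case scores higher than a randomly chosen non-case, counting ties as one half. It assumes that the ROC value reported here is an AUC and that the reference classification comes from a binary HADS result; both are assumptions, and the function name and data are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC from questionnaire scores and binary reference labels (1 = case)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    # Compare every case with every non-case; a tie counts as half a win.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

For instance, with scores `[10, 30, 20, 40]` and labels `[0, 0, 1, 1]`, three of the four case versus non-case comparisons favor the case, giving an AUC of 0.75. A value of 0.5 corresponds to no discrimination.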
There are plenty of population-specific and individual-specific factors that may affect the diagnostic ability of the questionnaire; for instance, the risk of anxiety and depression varies by gender [33]. Also, distress relates chronologically more to anxiety than to depression, and with aging, one may increase while the other decreases [34]. The validity of a single-factor structure of the Kessler questionnaire is further corroborated by the fact that, in our study, the cumulative variance explained by the single-factor structure was 97.0% (Figure 1). We also performed CFA, which showed adequate factor structure parameters, for instance, a fit index of 0.91. These values are similar to the recommended values [35].
Lastly, our work has a few limitations. For instance, we did not identify factors that may affect PD or those that mediate (or buffer against) distress, anxiety, and depression. The major limitation of the Kessler questionnaire is the lack of consistency in its factor structure. Earlier studies have provided conflicting results, for instance, a four-factor model [36], a two-factor fit [37], or an inadequate fit [38]. Nevertheless, the Kessler questionnaire was originally designed as a unidimensional scale with all its items indicating a single underlying construct [32].
Moreover, the sample was derived from three community-based sources, which were best suited to the context of our population and to an urban set-up. For instance, unlike in rural set-ups, door-to-door or other similar household approaches are less viable in urban areas. However, we made efforts to compensate for these limitations. For instance, the recruitment from the main Mosque was conducted on Thursday and Friday, which are the mandatory days for coming to a Mosque to pray. Similarly, Parks in Iran are an important venue for spending leisure time and are accessible to a majority of the population [39], in some cities up to 98.0% of the population. However, access to and usage of the main parks may vary from one place to another.
5. Conclusion
To conclude, a Persian-language questionnaire for detecting PD among older adults was found to be adequately valid, reliable, and stable. This questionnaire fills the prior gap in validated tools for the timely detection and appropriate management of PD among older adults in the cultural contexts of the Persian language, including Iran, Afghanistan, and Tajikistan. The study also emphasizes that the Kessler questionnaire is a unidimensional scale measuring a non-specific PD. All items, therefore, serve as indicators of a single underlying “distress” factor.

Ethical Considerations
Compliance with ethical guidelines

This study was performed in line with the principles of the Declaration of Helsinki. The Ethics Committee of the University of Social Welfare and Rehabilitation Sciences, Tehran, Iran approved the study (Code of Ethics: IR.USWR.REC.1395.7).

This research did not receive any grant from funding agencies in the public, commercial, or non-profit sectors. 

Authors' contributions
Conceptualization and supervision: Elham Lotfalinezhad, Robab Sahaf, and Devender Bhalla; Methodology: Mahshid Forughan, Fatemeh Barati; Investigation, Writing-original draft, and writing-review and editing: Shahab Papi; Data collection: Fatemeh Amini; Data analysis: Mohsen Shati, Yadollah Abolfathi Momtaz, and Abolfazl Hosseinnataj; Funding acquisition and resources: All authors.

Conflict of interest
All authors declare no conflict of interest.

We thank Kanoon Jahan Didegan Kanon (a daycare center) for providing administrative support.

